Hypothetical Observational Data:

We'll create a contingency table with observed counts for each combination of habitat type and plant presence.
Habitat         Plant Present   Plant Absent   Row Total
Forest          25              15             40
Grassland       10              30             40
Wetland         20              10             30
Column Total    55              55             110

Chi-squared Test:
The Chi-squared test will help us determine if there is a significant association between the type of habitat and the presence of the plant species.
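
For reference, the quantities the test relies on can be written out as follows. The notation is an assumption introduced here for illustration, not taken from the original notes: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, N is the grand total, and r and c are the number of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

For the 3 x 2 habitat table used here, r = 3 and c = 2, so df = 2.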

Observed Contingency Table:
Habitat         Plant Present   Plant Absent
Forest          25              15
Grassland       10              30
Wetland         20              10


In [1]:
import numpy as np
from scipy.stats import chi2

In [2]:
# Observed frequencies
observed = np.array([[25, 15],
                     [10, 30],
                     [20, 10]])

In [3]:
# Step 1: Calculate the row totals, column totals, and grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()


In [4]:
# Step 2: Calculate the expected frequencies
expected = np.outer(row_totals, col_totals) / grand_total

# Step 3: Compute the Chi-squared statistic
chi_squared_statistic = ((observed - expected) ** 2 / expected).sum()

# Step 4: Determine the degrees of freedom
degrees_of_freedom = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Step 5: Calculate the p-value
p_value = chi2.sf(chi_squared_statistic, degrees_of_freedom)

# Step 6: Compare the p-value with the significance level
significance_level = 0.05
reject_null = p_value < significance_level

In [5]:
# Output the results
chi_squared_statistic, p_value, degrees_of_freedom, reject_null, expected


(15.833333333333332,
 0.0003646156887302733,
 2,
 True,
 array([[20., 20.],
        [20., 20.],
        [15., 15.]]))
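
As a sanity check, the manual result can be compared against SciPy's built-in `chi2_contingency`, which returns the statistic, p-value, degrees of freedom, and expected frequencies in one call. This is a minimal sketch rather than part of the original analysis; the `check_*` variable names are illustrative. `correction=False` is passed explicitly so that no continuity correction is applied (SciPy applies that correction only when there is one degree of freedom, so it would not change this 3 x 2 table either way).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same observed counts as in the manual calculation
observed = np.array([[25, 15],
                     [10, 30],
                     [20, 10]])

# The first four returned values are: statistic, p-value, dof, expected table
check_stat, check_p, check_dof, check_expected = chi2_contingency(
    observed, correction=False
)

print(check_stat, check_p, check_dof)
print(check_expected)
```

If the manual steps are correct, these values should agree with the statistic, p-value, degrees of freedom, and expected table shown above.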

The p-value of about 0.0004 is well below the 0.05 significance level, so we reject the null hypothesis of independence: there is a statistically significant association between habitat type and the presence of the plant species.

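The chi-squared statistic of 15.83 measures how far the observed frequencies deviate from the frequencies expected under the null hypothesis; a higher value indicates greater deviation. With 2 degrees of freedom, the critical value at the 0.05 level is about 5.99, so 15.83 falls well inside the rejection region.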
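
To see where the deviation comes from, the statistic can be broken down into per-cell contributions, (observed - expected)^2 / expected, and summed by habitat. The following is a small illustrative sketch; the names `contributions`, `habitat_contributions`, and `habitats` are introduced here and are not part of the original notebook.

```python
import numpy as np

# Same observed counts as above (rows: Forest, Grassland, Wetland)
observed = np.array([[25, 15],
                     [10, 30],
                     [20, 10]])

# Expected frequencies under independence
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()

# Each cell's share of the chi-squared statistic
contributions = (observed - expected) ** 2 / expected

# Sum across the two columns to get one value per habitat
habitat_contributions = contributions.sum(axis=1)

habitats = ["Forest", "Grassland", "Wetland"]
for name, value in zip(habitats, habitat_contributions):
    print(f"{name}: {value:.2f}")
```

With these counts, Grassland contributes 10.00 of the total 15.83, compared with 2.50 for Forest and 3.33 for Wetland, so Grassland is the habitat that departs most from what independence would predict.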