In [5]:
import pandas as pd
from scipy.stats import chi2_contingency

In [7]:
# Create the contingency table
data = [[20, 30],   # Male: [Like, Dislike]
        [25, 25]]   # Female: [Like, Dislike]

# Create a DataFrame for easier viewing
df = pd.DataFrame(data, columns=["Like", "Dislike"], index=["Male", "Female"])

# Perform the Chi-Squared Test.
# For a 2x2 table (1 degree of freedom), SciPy applies Yates' continuity
# correction by default (correction=True).
chi2, p, dof, expected = chi2_contingency(df)

#Display results
print("Chi-Squared Statistic:", chi2)
print("Degrees of Freedom:", dof)
print("P-value:", p)
print("Expected Frequencies:\n", expected)

Chi-Squared Statistic: 0.6464646464646464
Degrees of Freedom: 1
P-value: 0.4213795037428696
Expected Frequencies:
 [[22.5 27.5]
 [22.5 27.5]]
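To show where these numbers come from, the expected counts can be reproduced by hand. Each expected cell is (row total × column total) / grand total. The minimal NumPy sketch below assumes the same 2×2 table. It also applies Yates' continuity correction, which shrinks each |O − E| by 0.5 before squaring, because SciPy applies that correction by default when there is one degree of freedom. The result should match the statistic (about 0.6465) and the expected counts printed above.

```python
import numpy as np

# Observed counts: rows = gender (Male, Female), columns = (Like, Dislike)
observed = np.array([[20, 30],
                     [25, 25]], dtype=float)

n = observed.sum()                  # grand total (100)
row_totals = observed.sum(axis=1)   # [50, 50]
col_totals = observed.sum(axis=0)   # [45, 55]

# Expected count per cell under independence: row total * column total / n
expected = np.outer(row_totals, col_totals) / n

# Yates' continuity correction: reduce each |O - E| by 0.5, never below zero
abs_diff = np.maximum(np.abs(observed - expected) - 0.5, 0.0)
chi2_yates = (abs_diff ** 2 / expected).sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected)
print(chi2_yates, dof)
```

Dropping the `- 0.5` term gives the uncorrected statistic (about 1.0101), which is what `correction=False` returns.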


*Applications*

Market Research: Analyzing the association between customer demographics and product preferences.

Healthcare: Studying the relationship between patient characteristics and disease incidence.

Social Sciences: Investigating the link between social factors (e.g., education level) and behavioral outcomes (e.g., voting patterns).

Education: Examining the connection between teaching methods and student performance.

Quality Control: Assessing the association between manufacturing conditions and product defects.



In [8]:
import numpy as np
from scipy.stats import chi2_contingency

In [10]:
# Observed counts: rows = gender (Male, Female), columns = preference (Like, Dislike)
data = np.array([
    [20, 30],  # Male: [Like, Dislike]
    [25, 25]   # Female: [Like, Dislike]
])

# Perform the Chi-Squared Test without Yates' continuity correction
chi2, p, dof, expected = chi2_contingency(data, correction=False)

# Calculate Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = data.sum()
cramers_v = np.sqrt(chi2 / (n * (min(data.shape) - 1)))

print("Chi-Squared Statistic:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)
print("Cramér's V:", cramers_v)

Chi-Squared Statistic: 1.0101010101010102
P-value: 0.3148786413364169
Degrees of Freedom: 1
Expected Frequencies:
 [[22.5 27.5]
 [22.5 27.5]]
Cramér's V: 0.10050378152592121
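As a cross-check on the hand-written formula, recent SciPy versions (1.7 and later) provide `scipy.stats.contingency.association`, which computes Cramér's V directly. It does not apply Yates' correction by default, so it should agree with the uncorrected calculation above. This is a sketch under that assumption. Note that `association` requires an integer array of counts.

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import association

# Same table: rows = gender (Male, Female), columns = (Like, Dislike).
# association() expects integer counts, so keep the default integer dtype.
data = np.array([[20, 30],
                 [25, 25]])

# Built-in Cramér's V
v_builtin = association(data, method="cramer")

# Manual formula using the uncorrected chi-squared statistic
chi2_stat = chi2_contingency(data, correction=False)[0]
n = data.sum()
v_manual = np.sqrt(chi2_stat / (n * (min(data.shape) - 1)))

print(v_builtin, v_manual)
```

Both values should be about 0.1005.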


Using a chi-square distribution table, we compare the calculated chi-square value (1.010, from the test without Yates' correction) with the critical value for one degree of freedom at a significance level of α = 0.05. The critical value, taken from chi-square distribution tables, is approximately 3.841.

Since 1.010 < 3.841 (equivalently, p ≈ 0.315 > 0.05), we fail to reject the null hypothesis. There is therefore no statistically significant association between gender and product preference in this sample. Cramér's V ≈ 0.10 also indicates that any association is weak.
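The table lookup can also be done in code. The critical value is the (1 − α) quantile of the chi-squared distribution with the test's degrees of freedom. The p-value is the upper-tail area beyond the observed statistic. The sketch below uses `scipy.stats.chi2`, imported under the alias `chi2_dist` so that it does not overwrite the `chi2` statistic variable used in the notebook cells.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

# Same gender / preference table, uncorrected test
data = np.array([[20, 30],
                 [25, 25]])
statistic, p_value, dof, _ = chi2_contingency(data, correction=False)

alpha = 0.05

# Critical value: the point with an upper-tail area of alpha
critical_value = chi2_dist.ppf(1 - alpha, dof)

# p-value recomputed as the survival function (upper-tail area) at the statistic
p_from_sf = chi2_dist.sf(statistic, dof)

reject_null = statistic > critical_value
print(critical_value, p_from_sf, reject_null)
```

The critical value should come out near 3.841 and the p-value near 0.315, so `reject_null` is `False`. Comparing the statistic with the critical value and comparing the p-value with α always lead to the same decision.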

In [11]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

In [13]:
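# Observed counts: rows = smoking status, columns = disease status
data2 = np.array([
    [50, 30],   # Smoker: [Disease, No Disease]
    [20, 100]   # Non-Smoker: [Disease, No Disease]
])

# Perform the Chi-Squared Test (Yates' correction is applied by default for 2x2 tables)
chi2, p, dof, expected = chi2_contingency(data2)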

#Display results
print("Chi-Squared Statistic:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)


Chi-Squared Statistic: 42.33058608058608
P-value: 7.707766001215449e-11
Degrees of Freedom: 1
Expected Frequencies:
 [[28. 52.]
 [42. 78.]]


This result provides very strong evidence of an association between smoking status and disease presence.


Interpretation
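	•	Chi-Squared Statistic: 42.33. This large value (computed with Yates' continuity correction, SciPy's default for 2×2 tables) reflects a large gap between the observed and expected frequencies. A gap this large would be very unlikely to arise by chance alone if smoking and disease were independent.

	•	P-value: 7.71 × 10⁻¹¹. This p-value is far below the conventional 0.05 threshold, so the association is statistically significant.

	•	Degrees of Freedom: 1. This is correct for a 2×2 table: (2 − 1) × (2 − 1) = 1.

	•	Expected Frequencies (under no association):

	•	Smoker: 28 with disease, 52 without.

	•	Non-Smoker: 42 with disease, 78 without.
	The observed frequencies for smokers (50 with disease, 30 without) and non-smokers (20 with disease, 100 without) differ sharply from what would be expected if there were no association.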
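One way to see which cells drive the large statistic is to look at Pearson residuals, (O − E) / √E. This step is an addition here and was not part of the original analysis. Cells with large absolute residuals contribute most to χ². The sketch below assumes the smoking table above and uses the uncorrected test so that the squared residuals sum exactly to the statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# rows = (Smoker, Non-Smoker), columns = (Disease, No Disease)
data2 = np.array([[50, 30],
                  [20, 100]])

# correction=False so the squared residuals add up to the returned statistic
chi2_uncorrected, _, _, expected = chi2_contingency(data2, correction=False)

# Pearson residuals: positive = more observations than expected under independence
residuals = (data2 - expected) / np.sqrt(expected)

print(np.round(residuals, 2))
print(np.isclose((residuals ** 2).sum(), chi2_uncorrected))
```

Smokers with the disease show the largest positive residual (about 4.2). Non-smokers with the disease and smokers without it fall well below expectation.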

What it Means

	•	The data provide overwhelming evidence that smoking status is associated with disease.

	•	The null hypothesis (no association) is strongly rejected.

	•	The observed difference is very unlikely to be due to chance alone, which suggests a real relationship in the population the sample represents. The test establishes association, not causation.

Practical Takeaway

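A large chi-square statistic combined with a very small p-value signals a statistically significant association between the categorical variables (here: smoking and disease presence). Significance alone does not measure strength, however, because the statistic grows with sample size. Cramér's V measures strength. For this table, using the uncorrected statistic (about 44.32), V = √(44.32 / (200 × 1)) ≈ 0.47. By common rules of thumb this is a moderate association, not a value close to 1.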
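The effect size for the smoking table can be computed in the same way as in the Cramér's V cell for the gender example. This short sketch uses the statistic without Yates' correction, matching that earlier cell.

```python
import numpy as np
from scipy.stats import chi2_contingency

# rows = (Smoker, Non-Smoker), columns = (Disease, No Disease)
data2 = np.array([[50, 30],
                  [20, 100]])

# Uncorrected statistic, as in the gender example
chi2_uncorrected = chi2_contingency(data2, correction=False)[0]

n = data2.sum()
cramers_v = np.sqrt(chi2_uncorrected / (n * (min(data2.shape) - 1)))
print(round(cramers_v, 2))
```

For a 2×2 table, Cramér's V equals the absolute value of the phi coefficient, so the result (about 0.47) can also be checked as (50·100 − 30·20) / √(80·120·70·130).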