#### About

> Contingency tables and chi-square tests

Contingency tables and chi-square tests are statistical tools used to analyze the relationship between two categorical variables.

A contingency table displays the joint frequency distribution of two or more categorical variables. Each cell holds the count of observations with a specific combination of values of those variables. Contingency tables are also known as cross-tabulations.

The chi-square test is a statistical test used to determine if there is a significant association between two categorical variables in a contingency table. It compares the observed frequencies in the contingency table with the expected frequencies under the null hypothesis of no association.
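To make the comparison of observed and expected frequencies concrete, here is a minimal sketch that computes the statistic by hand as the sum of (observed - expected)^2 / expected over all cells. The counts in `observed` are hypothetical values chosen for illustration only, and no continuity correction is applied.

```python
import numpy as np

# Hypothetical 2x2 table of observed counts (illustration only)
observed = np.array([[10, 20],
                     [30, 40]])

# Marginal totals
row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# Expected count under no association: row total * column total / grand total
expected = row_totals @ col_totals / grand_total

# Pearson chi-square statistic (no continuity correction)
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print('Expected frequencies:', expected)
print('Chi-square statistic:', chi2_stat)
print('Degrees of freedom:', dof)
```

For a 2x2 table, `scipy.stats.chi2_contingency` applies Yates' continuity correction by default, so its statistic matches this hand computation only when called with `correction=False`.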




In [1]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

In [2]:
data = {'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Female'],
        'Smoker': ['Yes', 'No', 'No', 'Yes', 'No', 'No']}
df = pd.DataFrame(data)

In [3]:

# Create a contingency table
contingency_table = pd.crosstab(df['Gender'], df['Smoker'])

In [4]:
contingency_table

| Gender / Smoker | No | Yes |
|---|---|---|
| Female | 2 | 1 |
| Male | 2 | 1 |


In [5]:
# Perform the chi-square test
stat, p, dof, expected = chi2_contingency(contingency_table)

In [6]:
# Display the test results
print('Chi-square statistic:', stat)
print('p-value:', p)
print('Degrees of freedom:', dof)
print('Expected frequencies:', expected)

Chi-square statistic: 0.0
p-value: 1.0
Degrees of freedom: 1
Expected frequencies: [[2. 1.]
 [2. 1.]]


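The p-value (1.0) is greater than 0.05, so we fail to reject the null hypothesis of no association: this sample shows no significant association between gender and smoking status. With only six observations, the result should be read as an illustration of the procedure rather than as evidence about the population.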
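Every expected count in this table is below 5, a range in which the chi-square approximation is generally considered unreliable. For 2x2 tables, an exact test is a common alternative. The sketch below is one possible cross-check, assuming Fisher's exact test is acceptable for the analysis; it rebuilds the same counts by hand instead of reusing `contingency_table`.

```python
import numpy as np
from scipy.stats import fisher_exact

# Same counts as the contingency table above
# (rows: Female, Male; columns: No, Yes)
table = np.array([[2, 1],
                  [2, 1]])

# Two-sided Fisher's exact test; returns the sample odds ratio and the p-value
odds_ratio, p_exact = fisher_exact(table, alternative='two-sided')

print('Odds ratio:', odds_ratio)
print('p-value (Fisher exact):', p_exact)
```

The exact test leads to the same conclusion here: the data give no evidence of an association.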