# Chi-square Test

* Chi-square Test of Independence / Associations
* Chi-square Test of Goodness of fit
* Chi-square Test of Homogeneity 

In [1]:
import pandas as pd
from scipy.stats import chi2_contingency

# Two different ways of running a chi-square test
import researchpy as rp
from scipy import stats


# Chi-square Test of Independence / Associations

### Example 1

In [2]:
# Read the data in
df = pd.read_csv(r"C:\Users\User\Desktop\Python Code\Inferential Statistics\Unit 1\Chi-Square Test\Chi-square test Independence.csv")

In [3]:
df.head()

Unnamed: 0,Gender,Voting_Preference
0,Male,Republic
1,Male,Republic
2,Male,Republic
3,Male,Republic
4,Male,Republic


In [4]:
df.columns

Index(['Gender', 'Voting_Preference'], dtype='object')

In [5]:
df.Gender.value_counts()

Female    600
Male      400
Name: Gender, dtype: int64

### Hypotheses

Ho: There is no relationship between Gender and Voting_Preference (they are independent).

H1: There is a relationship between Gender and Voting_Preference (they are dependent).

In [6]:
table, results = rp.crosstab(df['Gender'], df['Voting_Preference'], test= 'chi-square') #,prop='row')
    
table

Gender / Voting_Preference,Democratic,Indian_Party,Republic,All
Female,300,50,250,600
Male,150,50,200,400
All,450,100,450,1000


In [10]:
results  # Alpha = 0.05 

Unnamed: 0,Chi-square test,results
0,Pearson Chi-square ( 2.0) =,16.2037
1,p-value =,0.0003
2,Cramer's V =,0.1273
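
Cramér's V in the results table can be recomputed by hand from the chi-square statistic. The sketch below assumes the usual definition, V = sqrt(chi2 / (n * min(rows - 1, cols - 1))), and copies the statistic and sample size from the outputs above; the variable names are illustrative only.

```python
import math

# Values copied from the crosstab and results table above
chi2_stat = 16.2037      # Pearson chi-square statistic
n = 1000                 # total number of observations
n_rows, n_cols = 2, 3    # Gender levels x Voting_Preference levels

# Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))
cramers_v = math.sqrt(chi2_stat / (n * min(n_rows - 1, n_cols - 1)))
print(round(cramers_v, 4))
```

V ranges from 0 (no association) to 1 (perfect association), so it complements the p-value by describing how strong the relationship is.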


### Conclusion
* The p-value (0.0003) is less than alpha (0.05), so we reject Ho.
* There is a relationship between Gender and Voting_Preference (they are dependent).
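
The same test can be cross-checked with `chi2_contingency` from SciPy, which is imported at the top of the notebook. This is a minimal sketch that types in the observed counts from the crosstab instead of reading the CSV; the array name `observed` is an assumption made for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the crosstab above (the "All" margins are excluded)
observed = np.array([
    [300, 50, 250],   # Female: Democratic, Indian_Party, Republic
    [150, 50, 200],   # Male:   Democratic, Indian_Party, Republic
])

# Returns the statistic, p-value, degrees of freedom and expected counts
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-square = {chi2_stat:.4f}, dof = {dof}, p-value = {p_value:.4f}")
print(expected)  # counts expected if Gender and Voting_Preference were independent
```

The expected counts are computed as row total * column total / grand total, which is what independence between the two variables would imply.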

# Chi-square Test of Goodness of fit

* Acme Toy Co. prints baseball cards.
* The company claims that 30% of the cards are rookies,
* 60% are veterans but not All-Stars, and 10% are veteran All-Stars.
* Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-Stars.
* Is this consistent with Acme’s claims? Use a 0.05 level of significance.

### Hypotheses

* Ho: The proportions of rookies, veterans, and All-Stars are 30%, 60%, and 10%, as Acme claims.
* H1: At least one of these proportions differs from the claimed value.


In [2]:
import scipy.stats as stats

In [3]:
def expected_value(n, p1, p2, p3):
    # Expected count in each category = sample size * claimed proportion
    e1 = n * p1
    e2 = n * p2
    e3 = n * p3
    return [e1, e2, e3]

In [4]:
expected_value(100,0.30,0.60,0.10)

[30.0, 60.0, 10.0]
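
The helper above is fixed at three categories. A possible generalization, sketched here under the assumption that the claimed proportions are passed as a list, handles any number of categories; `expected_counts` is a hypothetical name and is not part of the original notebook.

```python
def expected_counts(n, proportions):
    """Return the expected count per category for a sample of size n."""
    # The claimed proportions should describe a full distribution
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]


# Same inputs as the three-category call above
acme_expected = expected_counts(100, [0.30, 0.60, 0.10])
print(acme_expected)
```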

In [15]:
expected = [30, 60, 10]  # counts implied by Acme's claimed proportions
observed = [50, 45, 5]   # counts found in the sample of 100 cards

In [18]:
stats.chisquare(f_obs=observed, f_exp=expected)

Power_divergenceResult(statistic=19.583333333333336, pvalue=5.5915626856371765e-05)

In [None]:
# p-value = 0.0000559 <= alpha = 0.05

### Conclusion
* The p-value (about 0.00006) is less than alpha (0.05), so we reject Ho.
* The sample is not consistent with Acme's claimed proportions of 30% rookies, 60% veterans, and 10% All-Stars.
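
To see where the statistic reported by `stats.chisquare` comes from, it can be recomputed directly as the sum of (observed - expected)^2 / expected over the categories. The sketch below is a hand calculation under that standard formula, with k - 1 = 2 degrees of freedom; the critical-value comparison is an added illustration, not a step from the original notebook.

```python
from scipy.stats import chi2

observed = [50, 45, 5]    # rookies, veterans, All-Stars in the sample
expected = [30, 60, 10]   # counts implied by Acme's claim for n = 100

# Pearson chi-square statistic: sum of (O - E)^2 / E
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a goodness-of-fit test: categories - 1
dof = len(observed) - 1

# Upper-tail probability and the 5% critical value of the chi-square distribution
p_value = chi2.sf(statistic, dof)
critical_value = chi2.ppf(0.95, dof)

print(f"statistic = {statistic:.4f}, p-value = {p_value:.7f}")
print(f"critical value at alpha = 0.05: {critical_value:.3f}")
```

Rejecting Ho because the p-value is below alpha is equivalent to rejecting it because the statistic exceeds the critical value.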


# Chi-square Test of Homogeneity

### Example 1

### Hypotheses

* Ho: The proportion of boys who prefer each program (Lone_Ranger, Sesame_Street, The_Simpsons) is identical to the proportion of girls who prefer it.

Vs

* H1: For at least one program, the proportion of boys who prefer it differs from the proportion of girls.

In [7]:
# Read the data in
df = pd.read_csv(r"C:\Users\User\Desktop\Python Code\Inferential Statistics\Unit 1\Chi-Square Test\Chi-square test Homogeneity.csv")

In [8]:
df.columns

Index(['Gender', 'Viewing_Preference'], dtype='object')

In [9]:
df

Unnamed: 0,Gender,Viewing_Preference
0,Boys,Lone_Ranger
1,Boys,Lone_Ranger
2,Boys,Lone_Ranger
3,Boys,Lone_Ranger
4,Boys,Lone_Ranger
...,...,...
295,Girls,The_Simpsons
296,Girls,The_Simpsons
297,Girls,The_Simpsons
298,Girls,The_Simpsons


In [9]:
table, results = rp.crosstab(df['Gender'], df['Viewing_Preference'], test= 'chi-square') #,prop='row')
    
table

Gender / Viewing_Preference,Lone_Ranger,Sesame_Street,The_Simpsons,All
Boys,50,30,20,100
Girls,50,80,70,200
All,100,110,90,300
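
Because the two groups have different sizes (100 boys, 200 girls), the raw counts are hard to compare directly. A homogeneity test compares the within-group proportions, which can be sketched as follows; the `counts` frame is typed in from the table above and is an illustrative stand-in for the CSV data.

```python
import pandas as pd

# Counts copied from the crosstab above, without the "All" margins
counts = pd.DataFrame(
    {
        "Lone_Ranger": [50, 50],
        "Sesame_Street": [30, 80],
        "The_Simpsons": [20, 70],
    },
    index=["Boys", "Girls"],
)

# Divide each row by its own total to get the share of each program per group
row_props = counts.div(counts.sum(axis=1), axis=0)
print(row_props)
```

Under Ho the two rows of `row_props` would be roughly equal; large gaps between them are what the chi-square statistic picks up.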


In [10]:
results  # Alpha = 0.05

Unnamed: 0,Chi-square test,results
0,Pearson Chi-square ( 2.0) =,19.3182
1,p-value =,0.0001
2,Cramer's V =,0.2538


### Conclusion
* The p-value (0.0001) is less than alpha (0.05), so we reject Ho.
* For at least one program, the proportion of boys who prefer it differs from the proportion of girls.
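
Rejecting Ho does not say which program drives the difference. One common follow-up, shown here as a hedged sketch and not as part of the original analysis, is to look at the Pearson residuals (observed - expected) / sqrt(expected) from `chi2_contingency`; the counts are typed in from the crosstab above.

```python
import numpy as np
from scipy.stats import chi2_contingency

programs = ["Lone_Ranger", "Sesame_Street", "The_Simpsons"]
groups = ["Boys", "Girls"]

# Observed counts copied from the crosstab (margins excluded)
observed = np.array([
    [50, 30, 20],   # Boys
    [50, 80, 70],   # Girls
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Pearson residuals: how far each cell is from its expected count, in standard units
residuals = (observed - expected) / np.sqrt(expected)

# Locate the cell with the largest absolute residual
row, col = np.unravel_index(np.abs(residuals).argmax(), residuals.shape)

print(f"chi-square = {chi2_stat:.4f}, dof = {dof}, p-value = {p_value:.4f}")
print(f"largest residual: {groups[row]} / {programs[col]} = {residuals[row, col]:.3f}")
```

Cells with large positive residuals are over-represented relative to Ho, and cells with large negative residuals are under-represented.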