# Hypothesis Tests - Chi-square tests
- The chi-square test evaluates hypotheses about categorical data by comparing observed frequencies with the frequencies expected under the null hypothesis. It is used either to check whether two variables are related (test of independence) or whether the data fit a specific distribution (goodness-of-fit test).
- The null hypothesis (Ho) states that there is no relationship, or no difference from the expected distribution, while the alternative hypothesis (Ha) states that there is a relationship or a significant difference.
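
For reference, the statistic compared against the chi-square distribution can be written as below. This is the standard textbook formulation, added here as background rather than taken from the notebook; the symbols $O_{ij}$, $E_{ij}$, $R_i$, $C_j$, $N$, $r$ and $c$ are notation assumed for this sketch. The formulas for $E_{ij}$ and the degrees of freedom apply to the test of independence on an $r \times c$ table.

$$
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{df} = (r-1)(c-1)
$$

Here $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total. A large $\chi^2$ (small p-value) means the observed counts are far from what independence would predict.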

## Q1 : Survey on Drink and Local Sale
- In a sample survey of public opinion, the answers to the following questions are tabulated below:
    - Do you drink?
    - Are you in favour of local option on sale of liquor?
-  ![image.png](attachment:605d3a51-7c2e-4e03-8666-7298a9959481.png)
-  Hypothesis
    - Ho : {Drinking and opinion on local sale are independent.}
    - Ha : {They are not independent (association exists).} 

In [2]:
import numpy as np
from scipy.stats import chi2_contingency

In [3]:
# Contingency table (rows: Local Sale Yes/No, columns: Drink Yes/No)
table = np.array([[56, 31],
                  [18, 6]])
table

array([[56, 31],
       [18,  6]])

In [4]:
chi2, p, dof, expected = chi2_contingency(table)

print("Chi-square statistic:", round(chi2, 3))
print("Degrees of freedom:", dof)
print("p-value:", round(p, 4))
print("\nExpected frequencies:\n", expected)

Chi-square statistic: 0.538
Degrees of freedom: 1
p-value: 0.4632

Expected frequencies:
 [[58. 29.]
 [16.  8.]]


- p = 0.4632 > 0.05 : Fail to reject Ho
- There is no significant association between whether people drink and their opinion on local sale of liquor at the 5% level.
- The data do not provide evidence of a relationship between drinking habits and opinion on local liquor sale; the result is consistent with the two responses being independent.
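
The statistic of 0.538 reported above includes Yates' continuity correction, which `chi2_contingency` applies by default when the table has one degree of freedom (a 2x2 table). The following is a minimal sketch, not part of the original analysis, that recomputes the statistic by hand with and without the correction and cross-checks it against SciPy. The variable names (`expected_manual`, `chi2_plain`, `chi2_yates`, and so on) are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same 2x2 table as in Q1
table = np.array([[56, 31],
                  [18, 6]])

# Expected counts under independence: (row total * column total) / N
row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)
n = table.sum()
expected_manual = row_totals * col_totals / n

# Pearson statistic without any continuity correction
chi2_plain = ((table - expected_manual) ** 2 / expected_manual).sum()

# Yates' correction: shrink each |O - E| by 0.5 before squaring
chi2_yates = ((np.abs(table - expected_manual) - 0.5) ** 2 / expected_manual).sum()

# Cross-check against SciPy with the correction switched off and on
chi2_off, p_off, _, _ = chi2_contingency(table, correction=False)
chi2_on, p_on, _, _ = chi2_contingency(table, correction=True)

print("Uncorrected:", round(chi2_plain, 3), round(chi2_off, 3), "p =", round(p_off, 4))
print("Yates:      ", round(chi2_yates, 3), round(chi2_on, 3), "p =", round(p_on, 4))
```

Both versions give a p-value above 0.05, so the decision (fail to reject Ho) does not depend on whether the correction is used.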

## Q2 : Is liking of soft drink influenced by particular category of employees
- From the following data, find whether there is any significant difference in the habit of taking soft drinks among the categories of employees.
- ![image.png](attachment:cbaf9265-043d-498c-8c1c-2d4c63524cb8.png)
- Ho : {Soft-drink preference is independent of employee category.}
- Ha : {Soft-drink preference depends on employee category.}
- This is again a chi-square test of independence, used to check whether type of employee (Clerks, Teachers, Officers) and preference for soft drink (Pepsi, Thums Up, Coke) are associated or independent.

In [5]:
import numpy as np
from scipy.stats import chi2_contingency

In [6]:
# Contingency table: rows = Soft drink, columns = Employee category
table = np.array([
    [10, 25, 65],
    [15, 30, 65],
    [50, 60, 30]
])
table

array([[10, 25, 65],
       [15, 30, 65],
       [50, 60, 30]])

In [7]:
chi2, p, dof, expected = chi2_contingency(table)

print("Chi-square statistic:", round(chi2, 3))
print("Degrees of freedom:", dof)
print("p-value:", round(p, 4))
print("\nExpected frequencies:\n", expected)

Chi-square statistic: 60.234
Degrees of freedom: 4
p-value: 0.0

Expected frequencies:
 [[21.42857143 32.85714286 45.71428571]
 [23.57142857 36.14285714 50.28571429]
 [30.         46.         64.        ]]


- p ≈ 0.0 (below 0.0001; the printed value is rounded to 4 decimals) < 0.05 : Reject Ho (independence)
- There is a significant difference among the categories of employees in the type of soft drink they like.
- There is a significant association between type of employee and preference for soft drink. In other words:
    - The liking or habit of taking soft drinks differs significantly among Clerks, Teachers, and Officers.
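
The printed p-value of 0.0 is a rounding effect, and the test itself does not say which cells drive the association. The sketch below, which is not part of the original analysis, prints the p-value in scientific notation and computes Pearson residuals, (observed - expected) / sqrt(expected), for each cell. The name `residuals` is illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same 3x3 table as in Q2
table = np.array([
    [10, 25, 65],
    [15, 30, 65],
    [50, 60, 30]
])

chi2, p, dof, expected = chi2_contingency(table)

# Scientific notation shows the small p-value that round(p, 4) flattens to 0.0
print(f"p-value: {p:.3e}")

# Pearson residuals: (observed - expected) / sqrt(expected)
# Their squares sum to the chi-square statistic (no Yates correction when dof > 1)
residuals = (table - expected) / np.sqrt(expected)
print(np.round(residuals, 2))
```

Cells with a large absolute residual (a common rule of thumb is beyond roughly ±2) contribute most to the statistic; a positive value means more observations than independence would predict, a negative value means fewer.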

![image.png](attachment:9f864af4-7ca8-4a1c-8aef-4bbfba448096.png)

# End of Chi-Square Test of Independence