In [5]:
import scipy.stats as stat
import numpy as np

In [6]:
## Number of hours students study daily
# sum(observed_data) SHOULD BE EQUAL TO sum(expected_data)
expected_data = [8, 6, 7, 9, 6, 9, 7] # Expected frequencies
observed_data = [7, 8, 6, 9, 9, 6, 7] # Observed frequencies (from survey)

# Chi-Square Goodness of Fit Test
chisquare_test_statistic, p = stat.chisquare(observed_data, expected_data)
chisquare_test_statistic, p

(3.4345238095238093, 0.7526596580922865)

## 1. scipy.stats.chisquare()
#### scipy.stats.chisquare() performs the one-way Chi-Square Goodness of Fit test. It compares observed frequencies with expected frequencies to test whether the observed data are consistent with a theoretical (expected) distribution.

In [2]:
# scipy.stats.chisquare(f_obs, f_exp=None, ddof=0)

#### f_obs: The observed frequencies (a list or array of counts).
#### f_exp: The expected frequencies (optional). If not provided, the default assumption is that all categories are equally likely.
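#### ddof: "Delta degrees of freedom", an adjustment to the degrees of freedom used for the p-value. The p-value is computed from a Chi-Square distribution with k - 1 - ddof degrees of freedom, where k is the number of categories. The default is 0; it is typically set to the number of distribution parameters estimated from the data.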
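
#### A minimal sketch of how f_exp and ddof behave, reusing the observed counts from the study-hours example. The value ddof=1 is chosen only for illustration and does not correspond to any parameter estimated in this notebook.

```python
import numpy as np
import scipy.stats as stat

observed = np.array([7, 8, 6, 9, 9, 6, 7])

# With f_exp omitted, every category is assumed equally likely,
# so each expected count equals the mean of the observed counts.
stat_default, p_default = stat.chisquare(observed)
uniform_expected = np.full(len(observed), observed.mean())
stat_uniform, p_uniform = stat.chisquare(observed, f_exp=uniform_expected)

# ddof does not change the statistic; it only changes the degrees of
# freedom (k - 1 - ddof) used to compute the p-value.
stat_ddof, p_ddof = stat.chisquare(observed, ddof=1)  # ddof=1 is illustrative only
k = len(observed)
p_manual = stat.chi2.sf(stat_ddof, k - 1 - 1)

print(stat_default, p_default)
print(stat_ddof, p_ddof, p_manual)
```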

## Interpretation:
#### The Chi-Square statistic measures how much the observed frequencies deviate from the expected frequencies.
#### The p-value is the probability, assuming the null hypothesis is true, of obtaining a Chi-Square statistic at least as large as the one observed. If the p-value is less than your significance level (say 0.05), you reject the null hypothesis (that the data fit the expected distribution); otherwise, you fail to reject it.
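
#### To make the interpretation concrete, the sketch below recomputes the statistic and p-value by hand for the study-hours data, assuming the usual formula sum((O - E)^2 / E) and k - 1 degrees of freedom. The variable names are illustrative.

```python
import numpy as np
import scipy.stats as stat

expected = np.array([8, 6, 7, 9, 6, 9, 7], dtype=float)
observed = np.array([7, 8, 6, 9, 9, 6, 7], dtype=float)

# Chi-Square statistic: sum over categories of (O - E)^2 / E
manual_statistic = np.sum((observed - expected) ** 2 / expected)

# p-value: upper-tail probability of a Chi-Square distribution
# with k - 1 degrees of freedom
dof = len(observed) - 1
manual_p = stat.chi2.sf(manual_statistic, dof)

alpha = 0.05  # significance level
if manual_p < alpha:
    decision = 'Reject NULL hypothesis'
else:
    decision = 'Fail to reject NULL hypothesis'

print(manual_statistic, manual_p, decision)
```

#### Both values should agree with the output of stat.chisquare(observed_data, expected_data) shown earlier.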

In [11]:
# Find Critical Value
significance_value = 0.05
dof = len(expected_data) - 1
critical_value = stat.chi2.ppf(1 - significance_value, dof)
critical_value

12.591587243743977

## 2. scipy.stats.chi2.ppf()
#### stats.chi2.ppf() is the Percent Point Function (ppf) of the Chi-Square distribution, that is, the inverse of the cumulative distribution function (CDF). It is used here to calculate critical values.

#### chi2 refers to the Chi-Square distribution.
#### ppf(q, df) returns the value x such that a Chi-Square variable with df degrees of freedom is less than or equal to x with probability q.
#### In hypothesis testing, you often compare your test statistic (e.g., Chi-Square statistic) to a critical value. This critical value is determined by the degrees of freedom and the significance level. The ppf() function allows you to compute that critical value.

In [11]:
# scipy.stats.chi2.ppf(q, df)

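#### q: The lower-tail (cumulative) probability, here 1 - significance level. For example, if you're testing at a 5% significance level (95% confidence level), q = 1 - 0.05 = 0.95.
#### df: The degrees of freedom for the test (in the example above, the number of categories minus 1).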
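
#### A short check of the inverse relationship between ppf() and the CDF, using q = 0.95 and df = 6 as in the example above. chi2.sf() and chi2.isf() are the upper-tail counterparts of cdf() and ppf().

```python
import scipy.stats as stat

q = 0.95  # 1 - significance level
df = 6    # 7 categories - 1

critical_value = stat.chi2.ppf(q, df)

# Feeding the critical value back into the CDF recovers q ...
recovered_q = stat.chi2.cdf(critical_value, df)
# ... and the upper-tail area beyond it equals the significance level.
upper_tail = stat.chi2.sf(critical_value, df)
# isf(alpha, df) is an equivalent way to obtain the same critical value.
critical_value_isf = stat.chi2.isf(1 - q, df)

print(critical_value, recovered_q, upper_tail, critical_value_isf)
```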

In [15]:
# If chi square Statistic is GREATER than critical value, we REJECT NULL
if chisquare_test_statistic > critical_value :
    print('Reject NULL hypothesis')
else :
    print('Fail to reject NULL hypothesis')

Fail to reject NULL hypothesis


#### When performing a Chi-Square test:

#### You first calculate the Chi-Square statistic using chisquare().
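#### Then, you compare the statistic to the critical value obtained using chi2.ppf() to decide whether to reject or fail to reject the null hypothesis. A statistical test never "accepts" the null hypothesis; it only fails to find enough evidence against it.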
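
#### The two steps can be wrapped in a single helper, sketched below. The function name goodness_of_fit_decision is hypothetical and not part of SciPy. It also applies the p-value rule, which leads to the same decision as the critical-value rule at the same significance level.

```python
import scipy.stats as stat

def goodness_of_fit_decision(observed, expected, alpha=0.05):
    """Hypothetical helper: run the test and apply both decision rules."""
    # Step 1: Chi-Square statistic and p-value
    statistic, p_value = stat.chisquare(observed, expected)
    # Step 2: critical value for k - 1 degrees of freedom
    dof = len(observed) - 1
    critical_value = stat.chi2.ppf(1 - alpha, dof)
    # Both rules give the same decision
    reject_by_critical_value = bool(statistic > critical_value)
    reject_by_p_value = bool(p_value < alpha)
    return statistic, p_value, critical_value, reject_by_critical_value, reject_by_p_value

result = goodness_of_fit_decision([7, 8, 6, 9, 9, 6, 7], [8, 6, 7, 9, 6, 9, 7])
print(result)
```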