## <font color = 'green'> Chi-square test </font>:


**Description :**

    The chi-square (χ²) test measures how closely two sets of frequency counts agree. The inputs must be categorical counts, not percentages or ratios.

    Note that the counts should be large enough for the test to be reliable: as a rule of thumb, the expected frequency in each cell should be at least 5 (an occasional lower figure is acceptable, as long as it is not part of a pattern of low figures).
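The rule of thumb about small counts can be checked before trusting the test. The following is a minimal sketch, assuming the 2×3 table of counts analysed in Step 4 of this section; the variable names `observed`, `min_expected` and `n_small` are illustrative only. It uses the expected frequencies that `scipy.stats.chi2_contingency` returns as its fourth value.

```python
import numpy as np
from scipy import stats

# Observed counts (the 2x3 table analysed in Step 4 of this section)
observed = np.array([[138, 83, 64], [64, 67, 84]])

# The fourth return value holds the expected frequencies under independence
_, _, _, expected = stats.chi2_contingency(observed)

# Rule-of-thumb check: look for cells whose expected count is below 5
min_expected = expected.min()
n_small = int((expected < 5).sum())

print("Smallest expected count: %.2f" % min_expected)
print("Cells with expected count below 5: %d" % n_small)
```

If `n_small` is greater than zero, merging sparse categories or using an exact test is usually a safer choice than reading the chi-square p-value at face value.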

### Chi-square test of independence (`scipy.stats.chi2_contingency`)

- `The chi-square test can also be used in the opposite direction to goodness of fit: when two sets of counts are compared, just as you can show that they align, you can also test whether they do not align.`


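- ***`Goodness-of-fit and independence tests read the chi-square table in the same way: both reject the null hypothesis when the statistic falls in the upper tail. A table indexed by upper-tail probability lists the critical values under 0.05, 0.01 or 0.001; a table indexed by cumulative probability lists the same values under 0.95, 0.99 or 0.999 (this is why the table has two ends to it). What differs between the two tests is how the expected counts and the degrees of freedom are obtained.`***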

![title](ChiSquare_Ind.png)
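The two ends of a printed chi-square table can be related numerically: an upper-tail probability of 0.05 and a cumulative probability of 0.95 pick out the same critical value. A minimal sketch with `scipy.stats.chi2`, assuming 2 degrees of freedom as an example; the dictionary `critical` is an illustrative name.

```python
from scipy import stats

df = 2  # assumed example: degrees of freedom of a 2x3 contingency table

critical = {}
for alpha in (0.05, 0.01, 0.001):
    # Critical value looked up by upper-tail probability alpha
    upper_tail = stats.chi2.isf(alpha, df)
    # The same value looked up by cumulative probability 1 - alpha
    cumulative = stats.chi2.ppf(1 - alpha, df)
    critical[alpha] = upper_tail
    print("alpha=%.3f  isf=%.4f  ppf(1-alpha)=%.4f" % (alpha, upper_tail, cumulative))
```

A statistic larger than `critical[alpha]` leads to rejecting the null hypothesis at that significance level.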

    The chi-square test of independence is an omnibus test, meaning it tests the data as a whole. When the contingency table is larger than 2×2, a significant result does not by itself show which levels (categories) of the variables are responsible for the relationship.
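One common follow-up to a significant omnibus result (not something this section prescribes) is to inspect the Pearson residuals, (observed − expected) / √expected, cell by cell. A minimal sketch, assuming the 2×3 table of counts analysed in Step 4; `pearson_resid` is an illustrative name.

```python
import numpy as np
from scipy import stats

# Observed counts (the 2x3 table analysed in Step 4 of this section)
observed = np.array([[138, 83, 64], [64, 67, 84]])

chi2, p, dof, expected = stats.chi2_contingency(observed)

# Pearson residual per cell; the squared residuals sum to the chi-square statistic
pearson_resid = (observed - expected) / np.sqrt(expected)

print(np.round(pearson_resid, 2))
```

Cells with the largest absolute residuals contribute most to the statistic, which gives a rough indication of where the association comes from.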

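### Step 1: State the null and alternative hypotheses:

  **Null hypothesis ($H_0$)**: The quality of the products is independent of whether they were manufactured by male or female workers (there is no difference in quality between the two groups)

  **Alternative hypothesis ($H_A$)**: The quality of the products depends on whether they were manufactured by male or female workers (there is a significant difference in quality between the two groups)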

### Step 2: Decide the significance level
    
            Here we select α = 0.05

### Step 3: Identify the test statistic

     Both variables are categorical, so we use the chi-square test of independence to check whether they are related

### Step 4: Calculate the chi-square statistic and the p-value

In [1]:
import pandas      as pd
import numpy       as np
import scipy.stats as stats

quality_array = np.array([[138, 83, 64],[64, 67, 84]])
chi_sq_Stat, p_value, deg_freedom, exp_freq = stats.chi2_contingency(quality_array)

print('Chi-square statistic %3.5f P value %1.6f Degrees of freedom %d' %(chi_sq_Stat, p_value,deg_freedom))

Chi-square statistic 22.15247 P value 0.000015 Degrees of freedom 2
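The values printed above can be reproduced from the definition of the test. This is a minimal sketch, not a replacement for `chi2_contingency`: the expected count of each cell is (row total × column total) / N, the statistic is the sum of (observed − expected)² / expected, and the degrees of freedom are (rows − 1) × (columns − 1). The variable names are illustrative.

```python
import numpy as np
from scipy import stats

observed = np.array([[138, 83, 64], [64, 67, 84]])

# Marginal totals and grand total
row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
n = observed.sum()

# Expected counts under independence: row total * column total / N
expected = row_totals * col_totals / n

# Chi-square statistic and degrees of freedom
chi_sq = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution
p_value = stats.chi2.sf(chi_sq, dof)

print('Chi-square statistic %3.5f P value %1.6f Degrees of freedom %d' % (chi_sq, p_value, dof))
```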


### Step 5: Decide whether to reject or fail to reject the null hypothesis

    In this example, the p-value is 0.000015, which is less than 0.05, so we reject the null hypothesis.

So, we conclude that there is a significant difference in the quality of the products manufactured by male and female workers.

In [2]:
# Function for testing the association between two nominal (categorical) columns

def ChiSquare(df, cols):
    """Run a chi-square test of independence on two categorical columns of df."""
    # Contingency table of observed counts for the two columns
    crosstab = pd.crosstab(df[cols[0]], df[cols[1]])

    chi_sq_Stat, p_value, deg_freedom, exp_freq = stats.chi2_contingency(crosstab)

    print('Chi-square statistic %3.5f P value %1.6f Degrees of freedom %d' %(chi_sq_Stat, p_value,deg_freedom))

    if p_value <= 0.05:
        print('We reject the Null Hypothesis and we retain Alternative Hypothesis : Two columns are dependent')
    else:
        print('We failed to reject the Null Hypothesis : no evidence that the two columns are dependent')

In [3]:
data = pd.read_csv("AB_NYC_2019.csv",parse_dates=[0])
cols = input("Enter multiple Columns: ").split()
ChiSquare(data, cols)

Enter multiple Columns: neighbourhood_group room_type
Chi-square statistic 1559.58035 P value 0.000000 Degrees of freedom 8
We reject the Null Hypothesis and we retain Alternative Hypothesis : Two columns are dependent
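With a large number of rows, the p-value can be close to zero even for a weak relationship, so it says little about the degree of association. One common measure of that strength for nominal variables is Cramér's V, which ranges from 0 (no association) to 1. The sketch below is an addition, not part of the original analysis: `cramers_v` is a hypothetical helper, and it is applied to the quality table from Step 4 rather than to the CSV data.

```python
import numpy as np
from scipy import stats

def cramers_v(table):
    """Cramér's V for a contingency table of counts (hypothetical helper)."""
    table = np.asarray(table)
    # Uncorrected chi-square statistic (no Yates continuity correction)
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.sum()
    # Smaller of (rows - 1) and (columns - 1)
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

quality_array = np.array([[138, 83, 64], [64, 67, 84]])
v = cramers_v(quality_array)
print("Cramér's V: %.3f" % v)
```

The same helper accepts the output of `pd.crosstab`, since it converts its input to a NumPy array first.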
