https://www.spss-tutorials.com/cramers-v-what-and-why/

https://www.kaggle.com/omarayman/chi-square-test-in-python/data

### Cramér's V

Cramér's V is a number between 0 and 1 that indicates how strongly two categorical variables are associated.

- If we'd like to know whether two categorical variables are associated, our first option is the **chi-square independence test**.

***A p-value (significance level) close to zero means that our variables are very unlikely to be completely unassociated in the population. However, this does not mean the variables are strongly associated; a weak association in a large sample may also result in p = 0.000.***
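
To make the point about sample size concrete, here is a minimal sketch using made-up counts (not the survey data). The helper `cramers_v_from_table` is a hypothetical name, and it computes the plain, uncorrected Cramér's V. The same weak association is tested at two sample sizes.

```python
import numpy as np
from scipy import stats

def cramers_v_from_table(table):
    """Uncorrected Cramér's V for a contingency table (illustrative helper)."""
    table = np.asarray(table)
    # correction=False disables the Yates continuity correction for 2x2 tables,
    # so the statistic scales exactly with the sample size.
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * min(r - 1, k - 1)))

# Made-up counts with a weak association (52% vs 48% in each row).
small = np.array([[52, 48],
                  [48, 52]])
large = small * 100  # same proportions, 100 times the sample size

for name, table in [("small", small), ("large", large)]:
    p_value = stats.chi2_contingency(table, correction=False)[1]
    v = cramers_v_from_table(table)
    print(f"{name}: n={table.sum()}, p={p_value:.4f}, V={v:.3f}")
```

Under these assumptions the p-value drops from clearly non-significant to practically zero as the sample grows, while V stays at 0.04 in both cases, because the strength of the association has not changed.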

![title](cramers_v.png)

- What we need is something that looks like correlation but works with categorical values. More formally, we're looking for a measure of association between two categorical features. Introducing: Cramér's V.


- It is based on a nominal variation of Pearson's chi-square test and comes with some useful properties built in:

  1. Like correlation, the output is bounded: it lies in the range [0, 1], where 0 means no association and 1 means full association. (Unlike correlation, there are no negative values, because an association between nominal categories has no direction. Either there is one, or there isn't.)
  2. Like correlation, Cramér's V is symmetrical: it is insensitive to swapping x and y.
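
Both properties can be checked on toy data. The sketch below uses made-up category labels and a hypothetical helper, `cramers_v_uncorrected`, that computes the plain (uncorrected) Cramér's V from two sequences; it is an illustration, not the function used later in this notebook.

```python
import numpy as np
import pandas as pd
from scipy import stats

def cramers_v_uncorrected(x, y):
    """Uncorrected Cramér's V for two categorical sequences (illustrative helper)."""
    crosstab = pd.crosstab(pd.Series(x, name="x"), pd.Series(y, name="y"))
    chi2 = stats.chi2_contingency(crosstab, correction=False)[0]
    n = crosstab.to_numpy().sum()
    r, k = crosstab.shape
    return np.sqrt(chi2 / (n * min(r - 1, k - 1)))

# Made-up data: 60 observations, three categories in x.
x = ["a", "b", "c"] * 20

# y fully determined by x: full association.
y_perfect = [value.upper() for value in x]

# y split evenly within every category of x: no association.
y_indep = ["u" if (i // 3) % 2 == 0 else "v" for i in range(len(x))]

# y determined by x for the first half only: partial association.
y_mixed = y_perfect[:30] + ["A"] * 30

print("full association:", cramers_v_uncorrected(x, y_perfect))
print("no association:  ", cramers_v_uncorrected(x, y_indep))
print("x vs y:", cramers_v_uncorrected(x, y_mixed))
print("y vs x:", cramers_v_uncorrected(y_mixed, x))
```

The first two calls hit the ends of the range (1 and 0), and the last two agree because transposing the contingency table leaves the chi-square statistic unchanged.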

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

In [22]:
data = pd.read_csv('chi2.csv')

data

Unnamed: 0,Have you ever taken a course in statistics?,Do you have any previous experience with programming?,What's your interest in data science?,"Just for fun, do you prefer dogs or cat?"
0,Yep,Nope,I want to get a job where I use data science,Cats 🐱
1,Yep,I have quite a bit of experience,I want to get a job where I use data science,Dogs 🐶
2,Yep,I have a little bit of experience,It will help me in my current job,Dogs 🐶
3,Nope,I have a little bit of experience,Just curious,Cats 🐱
4,"Yes, but I've forgotten everything",I have quite a bit of experience,I want to get a job where I use data science,Neither 🙅
...,...,...,...,...
1244,"Yes, but I've forgotten everything",I have a little bit of experience,I want to get a job where I use data science,Dogs 🐶
1245,Nope,I have quite a bit of experience,I want to get a job where I use data science,Dogs 🐶
1246,Yep,I have a little bit of experience,Just curious,Neither 🙅
1247,Yep,Nope,Just curious,Cats 🐱


In [24]:
data = data.rename(columns = {"Have you ever taken a course in statistics?": "stats","Do you have any previous experience with programming?":"program"})
data

Unnamed: 0,stats,program,What's your interest in data science?,"Just for fun, do you prefer dogs or cat?"
0,Yep,Nope,I want to get a job where I use data science,Cats 🐱
1,Yep,I have quite a bit of experience,I want to get a job where I use data science,Dogs 🐶
2,Yep,I have a little bit of experience,It will help me in my current job,Dogs 🐶
3,Nope,I have a little bit of experience,Just curious,Cats 🐱
4,"Yes, but I've forgotten everything",I have quite a bit of experience,I want to get a job where I use data science,Neither 🙅
...,...,...,...,...
1244,"Yes, but I've forgotten everything",I have a little bit of experience,I want to get a job where I use data science,Dogs 🐶
1245,Nope,I have quite a bit of experience,I want to get a job where I use data science,Dogs 🐶
1246,Yep,I have a little bit of experience,Just curious,Neither 🙅
1247,Yep,Nope,Just curious,Cats 🐱


In [25]:

# Functions for measuring the association between nominal (categorical) variables

def ChiSquare(df, cols):
    """Run a chi-square test of independence on the two columns named in cols."""
    # Contingency table of observed counts for the two columns
    crosstab = pd.crosstab(df[cols[0]], df[cols[1]])
    chi_sq_Stat, p_value, deg_freedom, exp_freq = stats.chi2_contingency(crosstab)
    print('Chi-square statistic %3.5f P value %1.6f Degrees of freedom %d' %(chi_sq_Stat, p_value,deg_freedom))

    # Compare the p-value with the 5% significance level
    if p_value <= 0.05:
        print('We reject the Null Hypothesis and we retain Alternative Hypothesis : Two columns are dependent')
    else:
        print('We fail to reject the Null Hypothesis : no evidence that the two columns are dependent')

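def cramers_v(df,cols):
    """Print the bias-corrected Cramér's V for the two columns named in cols."""
    crosstab = pd.crosstab(df[cols[0]], df[cols[1]])
    chi2 = stats.chi2_contingency(crosstab)[0]
    n = crosstab.sum().sum()    # total number of observations
    phi2 = chi2/n               # phi squared
    r,k = crosstab.shape        # number of rows and columns in the table
    # Bias correction for phi squared and for the table dimensions
    phi2corr = max(0, phi2-((k-1)*(r-1))/(n-1))
    rcorr = r-((r-1)**2)/(n-1)
    kcorr = k-((k-1)**2)/(n-1)
    association = np.sqrt(phi2corr/min((kcorr-1),(rcorr-1)))

    print('The Association between {0} and {1} is : {2:.3f}'.format(cols[0],cols[1],association))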
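
Written as formulas, the `cramers_v` function above appears to follow the bias-corrected form of Cramér's V (usually attributed to Bergsma) rather than the plain definition. This reading is inferred from the code, not stated in the notebook. With $\chi^2$ the chi-square statistic, $n$ the number of observations, and $r \times k$ the shape of the contingency table, the steps are:

$$
\tilde{\varphi}^2 = \max\left(0,\ \frac{\chi^2}{n} - \frac{(k-1)(r-1)}{n-1}\right)
$$

$$
\tilde{r} = r - \frac{(r-1)^2}{n-1}, \qquad \tilde{k} = k - \frac{(k-1)^2}{n-1}
$$

$$
\tilde{V} = \sqrt{\frac{\tilde{\varphi}^2}{\min(\tilde{k}-1,\ \tilde{r}-1)}}
$$

For comparison, the uncorrected statistic is $V = \sqrt{\chi^2 / \left(n \cdot \min(k-1,\ r-1)\right)}$. The corrected and uncorrected values get closer as $n$ grows, since the correction terms shrink with $1/(n-1)$.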


In [26]:
#data = pd.read_csv("AB_NYC_2019.csv",parse_dates=[0])
x = input("Enter multiple Columns: ").split()
ChiSquare(data,x)

Enter multiple Columns: stats	program	
Chi-square statistic 16.82763 P value 0.031955 Degrees of freedom 8
We reject the Null Hypothesis and we retain Alternative Hypothesis : Two columns are dependent


In [6]:
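# Note: neighbourhood_group and room_type are not columns of chi2.csv; this run
# appears to use the AB_NYC_2019.csv data from the commented-out read_csv above.
x = input("Enter multiple Columns: ").split()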
cramers_v(data,x)

Enter multiple Columns: neighbourhood_group room_type
The Association between neighbourhood_group and room_type is : 0.126


**Note:**

The chi-square function tells us whether two columns are dependent, and Cramér's V tells us how strong the association between them is.