### Cramer's V 
@Citations:  
* [Cramer's V Interpretation paper](https://www.researchgate.net/publication/307963787_Cramer's_V)  
* [Stack Overflow Cramer's code citation](https://stackoverflow.com/questions/20892799/using-pandas-calculate-cram%C3%A9rs-coefficient-matrix)  
* [Wikipedia: Cramer's V (bias correction snippet)](https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V)

#### Why Cramer's V?

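Cramer's V is built on the chi-squared (chi-sq) test of independence between two categorical columns. The raw chi-sq statistic, however, is unbounded: it grows with the number of rows, so values from datasets of different sizes cannot be compared directly. Cramer's V divides it by the sample size (and by the smaller table dimension minus one), so its values lie between 0 (no association) and 1 (perfect association). This makes it easier to compare with other correlation/association measures.  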
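A small illustration of the normalization point, assuming a made-up 2x3 contingency table (not from the source). Multiplying every count by 10 keeps the proportions, and therefore the association, unchanged, yet chi-sq grows tenfold. The plain (uncorrected) `V = sqrt((chi2 / n) / min(k - 1, r - 1))` stays the same. The helper name `cramers_v_and_chi2` is hypothetical, and `correction=False` is passed so SciPy never applies Yates' continuity correction.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v_and_chi2(table):
    """Uncorrected Cramer's V and the chi-sq statistic for a 2-D array of counts."""
    table = np.asarray(table)
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    r, k = table.shape
    return np.sqrt((chi2 / n) / min(k - 1, r - 1)), chi2

# Hypothetical counts: rows = one categorical column, columns = another
small = np.array([[20, 15, 5],
                  [10, 15, 35]])
large = small * 10  # same proportions, ten times the sample size

v_small, chi2_small = cramers_v_and_chi2(small)
v_large, chi2_large = cramers_v_and_chi2(large)

print(f"chi2: {chi2_small:.2f} -> {chi2_large:.2f}")  # grows tenfold
print(f"V:    {v_small:.3f} -> {v_large:.3f}")        # unchanged
```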

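Also, chi-sq is non-parametric and treats every category as an unordered label, so it ignores any ordering between levels. Avoid using it on ordinal columns, where that ordering carries information. 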
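To see why the ordering is lost, here is a sketch with a hypothetical ordinal column `size` (`low` < `medium` < `high`) and a nominal column `group`, both invented for illustration. Shuffling the order of the `size` categories leaves chi-sq unchanged, and since Cramer's V depends only on chi-sq, the sample size and the table shape, V is unchanged too. It cannot distinguish a monotonic trend from any other pattern.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: 'size' is ordinal (low < medium < high), 'group' is nominal
df = pd.DataFrame({
    "size":  ["low"] * 30 + ["medium"] * 30 + ["high"] * 30,
    "group": ["a"] * 20 + ["b"] * 10 + ["a"] * 12 + ["b"] * 18 + ["a"] * 5 + ["b"] * 25,
})

def chi2_of(table):
    # Uncorrected chi-squared statistic of a contingency table
    return chi2_contingency(table, correction=False)[0]

table = pd.crosstab(df["size"], df["group"])
natural = table.loc[["low", "medium", "high"]]    # ordinal order
scrambled = table.loc[["medium", "high", "low"]]  # arbitrary order

# Equal up to floating-point rounding: the category order is ignored
print(chi2_of(natural), chi2_of(scrambled))
```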

```python
import numpy as np
import pandas as pd
import scipy.stats as stat
```

```python
def helper_cramersVpair(pdf, colname1, colname2):
    """
    Calculate the bias-corrected Cramer's V statistic for a
    categorical-categorical (nominal) association.
    Uses the bias correction from W. Bergsma,
    Journal of the Korean Statistical Society 42 (2013): 323-328
    (see the Wikipedia citation above).

    Inputs:
    pdf: pandas DataFrame
    colname1: column name of the first nominal column
    colname2: column name of the second nominal column

    Outputs:
    cramersV: bias-corrected Cramer's V in [0, 1], a single score
              for the association strength of the 2 columns

    Usage:
    cramersV = helper_cramersVpair(data, 'col1', 'col2')
    """
    # contingency table of frequency counts for the 2 nominal columns
    pairCrossTab = pd.crosstab(pdf[colname1], pdf[colname2])
    # keep only the chi-sq statistic; correction=False disables Yates'
    # continuity correction (SciPy's default for 2x2 tables), since
    # Cramer's V is defined on the uncorrected statistic
    chi2 = stat.chi2_contingency(pairCrossTab, correction=False)[0]
    # apply the bias correction and convert to Cramer's V
    n = pairCrossTab.to_numpy().sum()
    phi2 = chi2 / n
    r, k = pairCrossTab.shape
    phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
    rcorr = r - ((r - 1) ** 2) / (n - 1)
    kcorr = k - ((k - 1) ** 2) / (n - 1)
    return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))
```
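To see what the bias correction in `helper_cramersVpair` does, a rough check is to compare plain and corrected V on two columns generated independently, where the true association is zero. The helper `cramers_v_both` below is a hypothetical standalone re-implementation of the same formulas so the block runs on its own; the random data, seed and sample size are arbitrary assumptions. With only 100 rows and 5x5 categories, the uncorrected V typically sits noticeably above zero purely from sampling noise, while the corrected value is pulled towards zero.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_both(x, y):
    """Return (uncorrected V, bias-corrected V) for two categorical Series.

    Hypothetical helper mirroring the formulas of helper_cramersVpair.
    """
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    phi2 = chi2 / n
    v_raw = np.sqrt(phi2 / min(k - 1, r - 1))
    # Bias correction: shrink phi^2 and the effective table dimensions
    phi2corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    rcorr = r - (r - 1) ** 2 / (n - 1)
    kcorr = k - (k - 1) ** 2 / (n - 1)
    v_corr = np.sqrt(phi2corr / min(kcorr - 1, rcorr - 1))
    return v_raw, v_corr

rng = np.random.default_rng(0)
# Two independent columns with 5 categories each: true association is 0
x = pd.Series(rng.integers(0, 5, size=100), name="x")
y = pd.Series(rng.integers(0, 5, size=100), name="y")

v_raw, v_corr = cramers_v_both(x, y)
print(f"uncorrected V = {v_raw:.3f}, corrected V = {v_corr:.3f}")
```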