# Exercise 6: Correlation

In this series of exercises, we'll look at how to calculate basic measures of correlation in Python.

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import os

# For retina displays only
# %config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [None]:
# Creating some data
N = 500 # number of samples
X = np.random.normal(loc=3.0, scale=5.0, size=N)
noise = np.random.normal(loc=0, scale=7.0, size=N)
Y = 2.2*X + noise

In [None]:
# Always plot your data
fig = plt.figure(figsize=(5,5))
gs = fig.add_gridspec(2, 2, width_ratios=[1, .1], height_ratios=[.1, 1])
ax = fig.add_subplot(gs[1, 0])
ax_histx = fig.add_subplot(gs[0, 0], sharex=ax)
ax_histy = fig.add_subplot(gs[1, 1], sharey=ax)
ax.scatter(X, Y)
ax_histx.hist(X, bins=100)
ax_histy.hist(Y, bins=100, orientation='horizontal')
plt.tight_layout()

## Pearson's correlation ($r$)

In [None]:
# using numpy
np.corrcoef(X, Y)

In [None]:
# using scipy.stats
import scipy.stats

scipy.stats.pearsonr(X, Y)

In [None]:
# Using pandas
df = pd.DataFrame({'x': X, 'y': Y})

df.corr(method='pearson')
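
All three calls above should report the same coefficient. As a rough sketch of what they compute, the cell below evaluates Pearson's $r$ directly as the covariance divided by the product of the standard deviations. The helper name `pearson_r`, the `_demo` variable names, and the fixed seed are assumptions made for illustration; the data are regenerated with the same parameters as in the setup so that the cell runs on its own.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: sum of co-deviations over the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Regenerate data with the same parameters as the setup cell (seed is an assumption)
rng = np.random.default_rng(0)
x_demo = rng.normal(loc=3.0, scale=5.0, size=500)
y_demo = 2.2 * x_demo + rng.normal(loc=0, scale=7.0, size=500)

r_manual = pearson_r(x_demo, y_demo)
r_numpy = np.corrcoef(x_demo, y_demo)[0, 1]  # off-diagonal entry of the 2x2 matrix

# Population correlation implied by Y = 2.2*X + noise with sd(X) = 5 and sd(noise) = 7
rho_theory = 2.2 * 5.0 / np.sqrt((2.2 * 5.0) ** 2 + 7.0 ** 2)

print(r_manual, r_numpy, rho_theory)
```

The sample value should land close to `rho_theory`, but not exactly on it, because each run draws a finite sample.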

## Spearman's correlation ($s$)

In [None]:
# using scipy.stats
scipy.stats.spearmanr(X, Y)

In [None]:
# Using pandas
df = pd.DataFrame({'x': X, 'y': Y})

df.corr(method="spearman")

In [None]:
# Verifying that Spearman's correlation is Pearson's correlation computed on the ranks
scipy.stats.pearsonr(scipy.stats.rankdata(X), scipy.stats.rankdata(Y))

In [None]:
# Exercise 1: Demonstrate that sample size changes the p-value but not the correlation












In [None]:
# Exercise 2: Play around with different X-Y associations and calculate $r$ and $s$










## Kendall's tau

In [None]:
# Exercise 3: Find the function from scipy that allows you to calculate Kendall's tau. Use it and compare the result to the other forms of correlation.







In [None]:
# Exercise 4: How do you calculate Kendall's tau using pandas? 








## Cramer's V

Python's standard library does not have a built-in function to calculate Cramer's V from a contingency table (recent SciPy versions do provide `scipy.stats.contingency.association`), **but** it's not hard to write your own function if you can calculate $\chi^2$.
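
For reference, the uncorrected statistic is usually written as follows, where $n$ is the total number of counts in the table and $n_{\text{rows}}$ and $n_{\text{cols}}$ are its dimensions (these symbol names are chosen here for illustration):

$$
V = \sqrt{\frac{\chi^2}{n \cdot \min(n_{\text{rows}} - 1,\; n_{\text{cols}} - 1)}}
$$

$V$ ranges from 0 (no association) to 1 (perfect association).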




In [None]:
# Here's a made-up contingency table

data = np.random.randint(1, 20, size=(3,2))
print(data)

In [None]:
# Exercise 5: find the scipy function to calculate chi2 and use it on the data above. 






In [None]:
# Exercise 6: Figure out how to derive the remaining values from `data` that you need to calculate Cramer's V (i.e. total number of counts, n, and minimum dimension size - 1). Use these and chi2 above to write your own Cramer's V function 

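n = None  # TODO: total number of counts
min_dim = None  # TODO: minimum dimension size - 1

def CramersV(data):  # TODO: add any further parameters you need
    raise NotImplementedError  # TODO: compute and return Cramer's V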

In [None]:
# Bonus Exercise 7: It turns out the "naive" formula for Cramer's V is biased. Look up the correction and modify your function to include it.




