# Statistical Analysis II - Practicum 1

## Non-parametric statistics

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, kendalltau

### Spearman $\rho$ and Kendall $\tau$ correlation coefficients

Example adapted from [Statology](https://www.statology.org/spearman-correlation-python/)

In statistics, correlation refers to the strength and direction of a relationship between two variables. The value of a correlation coefficient can range from -1 to 1, with the following interpretations:

- -1: a perfect negative relationship between two variables
- 0: no linear (or, for rank-based coefficients, no monotonic) relationship between two variables
- 1: a perfect positive relationship between two variables

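One special type of correlation is the **Spearman Rank Correlation**, which measures the strength and direction of the monotonic relationship between two variables. It is the Pearson correlation computed on the ranks of the values rather than on the values themselves.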

Kendall’s $\tau$ is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement (ranks grow together), and values close to -1 indicate strong disagreement (ranks follow an opposite trend).

(For example, the rank of a student’s math exam score vs. the rank of their English exam score in a class.)

![title](images/spearman.png)

![title](images/kendall.png)

In [None]:
# Create a DataFrame of exam scores, one row per student
df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'English': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})
df.index = df.student

Let us first inspect the data with a scatter plot.

In [None]:
df.plot.scatter(x='math', y='English')
plt.show()

Let's first compute Spearman's $\rho$ with SciPy, then rank the scores by hand to see what the coefficient is built from.

In [None]:
# Calculate the Spearman rank correlation and the corresponding p-value
rho, p_rho = spearmanr(df['math'], df['English'])

# Print the Spearman rank correlation and its p-value
print(rho)
print(p_rho)

In [None]:
# Rank the math scores (rank 1 = lowest score)
df['math_rank'] = df['math'].rank()

In [None]:
# Rank the English scores the same way
df['English_rank'] = df['English'].rank()

In [None]:
df.sort_values(by='math_rank')
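
With both rank columns in place, Spearman's $\rho$ can be obtained from the rank differences. The following is a minimal, self-contained sketch in plain Python, assuming there are no tied scores (which holds for this data set). The helper `ranks` and the variable names are illustrative and not part of the notebook.

```python
# Scores copied from the DataFrame above
math_scores = [70, 78, 90, 87, 84, 86, 91, 74, 83, 85]
english_scores = [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]


def ranks(values):
    """Return 1-based ranks (1 = smallest value). Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


math_rank = ranks(math_scores)
english_rank = ranks(english_scores)

# Sum of squared rank differences
n = len(math_scores)
d_squared = sum((a - b) ** 2 for a, b in zip(math_rank, english_rank))

# Shortcut formula, valid only when there are no ties
rho_manual = 1 - 6 * d_squared / (n * (n ** 2 - 1))
print(rho_manual)
```

Without ties, this shortcut is equivalent to the Pearson correlation of the ranks, so the result should agree with the value returned by `spearmanr` above.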

In [None]:
# Calculate Kendall's tau and the corresponding p-value
tau, p_tau = kendalltau(df['math'], df['English'])

# Print Kendall's tau and its p-value
print(tau)
print(p_tau)
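
Kendall's $\tau$ can also be reproduced by counting concordant and discordant pairs of students. The sketch below is a plain-Python illustration under the assumption of no tied scores; the variable names are illustrative and not part of the notebook.

```python
from itertools import combinations

# Scores copied from the DataFrame above
math_scores = [70, 78, 90, 87, 84, 86, 91, 74, 83, 85]
english_scores = [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]

concordant = 0
discordant = 0

# Compare every pair of students exactly once
for (xa, ya), (xb, yb) in combinations(zip(math_scores, english_scores), 2):
    sign = (xa - xb) * (ya - yb)
    if sign > 0:
        concordant += 1   # both scores order the pair the same way
    elif sign < 0:
        discordant += 1   # the two scores order the pair oppositely

n = len(math_scores)
n_pairs = n * (n - 1) / 2

# Tau-a: (concordant - discordant) / number of pairs
tau_manual = (concordant - discordant) / n_pairs
print(concordant, discordant, tau_manual)
```

`kendalltau` computes the tau-b variant by default, which corrects for ties. With no ties, as here, tau-a and tau-b coincide, so the value should match the SciPy result above.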

In [None]:
# Squared raw scores, needed for the computational form of Pearson's r
df['x2'] = df['math']**2
df['y2'] = df['English']**2

In [None]:
# Pearson's r computed by hand on the raw scores (not on the ranks)
n = len(df)
r = (n*(df.math*df.English).sum() - df.math.sum()*df.English.sum())/\
    (np.sqrt(n*df.x2.sum() - df.math.sum()**2)*np.sqrt(n*df.y2.sum() - df.English.sum()**2))
r
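
The expression in the cell above is the computational form of Pearson's correlation coefficient. Written as a formula, with $x$ the math scores, $y$ the English scores, and $n$ the number of students (symbols chosen here for illustration):

$$
r = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}
         {\sqrt{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\;\sqrt{n\sum_i y_i^2 - \left(\sum_i y_i\right)^2}}
$$

Applying the same formula to the rank columns instead of the raw scores gives Spearman's $\rho$.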

Further material is available in the [SciPy `spearmanr` documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html).