### Day 7 - Pearson Correlation Coefficient. Spearman's Rank Correlation Coefficient
________________________________________________

  <br/>

- [Background](#Background)
- [Task 1](#Task)
- [Task 2](#task2)

  <br/>

#### Background 
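__Covariance__ is a measure of how two random variables change together. It is positive when the variables tend to move in the same direction and negative when they tend to move in opposite directions. Its magnitude depends on the units of the variables, so on its own it does not show how strong the relationship is.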

Consider two random variables, $ X $ and $ Y $, each with $ n $ values (i.e., $ x_1, x_2, ..., x_n $ and $ y_1, y_2, ..., y_n $). The covariance of $ X $ and $ Y $ can be found using the following formula:
$$
cov(X,Y) = \frac{1}{n} {\sum_{i=1}^n {(x_i-\bar{x}) \cdot (y_i-\bar{y})}}
$$,
where $ \bar{x} $ is the mean of $ X $ and $ \bar{y} $ is the mean of $ Y $.
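
The formula above can be sketched in a few lines of plain Python. The function name `covariance` and the sample lists are illustrative assumptions, not part of the HackerRank task. The sketch divides by $ n $ (population covariance), as the formula does.

```python
def covariance(xs, ys):
    """Population covariance: the mean of the products of the deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n


# Hypothetical data: ys grows with xs, so the covariance is positive.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(covariance(xs, ys))  # 2.5
```

Reversing one of the lists makes the two variables move in opposite directions, and the result becomes negative.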

The Pearson correlation coefficient, $ \rho_{X,Y} $, is given by:

$$
\rho_{X,Y} = \frac{cov(X,Y)}{\sigma_X\sigma_Y}  = \frac{{\sum_{i=1}^n {(x_i-\bar{x}) \cdot (y_i-\bar{y})}}}{n\sigma_X\sigma_Y}
$$,
where  $ \sigma_X $ is the standard deviation of $ X $ and $ \sigma_Y $ is the standard deviation of $ Y $.
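
As a rough illustration of why the covariance is divided by $ \sigma_X\sigma_Y $, the sketch below checks that the coefficient does not change when $ X $ is rescaled and shifted. The helper name `pearson` and the data are assumptions made for this example only.

```python
import math


def pearson(xs, ys):
    """Pearson coefficient using population standard deviations (divide by n)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    x_std = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
    y_std = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / n)
    return cov / (x_std * y_std)


# Hypothetical data, not taken from the task.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(xs, ys)
# Changing the units of X (multiply by 10, add 3) leaves the coefficient unchanged.
r_scaled = pearson([10 * x + 3 for x in xs], ys)
print(f"{r:.3f} {r_scaled:.3f}")
```

Both printed values should be about 0.8, whereas the covariance itself would grow by a factor of 10 under the same rescaling.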

[Full tutorial link on HackerRank](https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/tutorial)


If $ Rank_X $ and $ Rank_Y $ denote the respective ranks of each data point, then Spearman's rank correlation coefficient, $ r_S $, is the Pearson correlation coefficient of $ Rank_X $ and $ Rank_Y $.

If $ X $ and $ Y $ don't contain duplicates:

$$
r_S = 1 - \frac{6 \cdot {\sum_{i=1}^n {d_i^2}}}{n(n^2-1)}
$$,
where $ d_i $ is the difference between the $ i $-th values of $ Rank_X $ and $ Rank_Y $.
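
A minimal sketch, assuming all values are distinct, that compares the definition (Pearson coefficient of the ranks) with the shortcut formula. The helper names `ranks`, `pearson` and `spearman_shortcut` are illustrative; the data is the sample input used for Task 2 in this notebook.

```python
import math


def ranks(values):
    """Return 1-based ranks (1 = smallest); assumes all values are distinct."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


def pearson(xs, ys):
    """Pearson coefficient using population standard deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    x_std = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
    y_std = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / n)
    return cov / (x_std * y_std)


def spearman_shortcut(xs, ys):
    """1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid only without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


xs = [10, 9.8, 8, 7.8, 7.7, 1.7, 6, 5, 1.4, 2]
ys = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

by_definition = pearson(ranks(xs), ranks(ys))
by_shortcut = spearman_shortcut(xs, ys)
print(f"{by_definition:.3f} {by_shortcut:.3f}")  # 0.903 0.903
```

With duplicate values the two results can differ, which is why the shortcut is stated only for data without duplicates.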

[Full tutorial link on HackerRank](https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/tutorial)


#### Task

Given two __n__-element data sets, __X__ and __Y__, calculate the value of the Pearson correlation coefficient.


##### Input Format

The first line contains an integer, __n__ , denoting the size of data sets __X__ and __Y__.
The second line contains __n__ space-separated real numbers (scaled to at most one decimal place), defining data set __X__.
The third line contains __n__ space-separated real numbers (scaled to at most one decimal place), defining data set __Y__.

###### Constraints
- 10 <= n <= 100
- 1 <= X<sub>i</sub> <= 500
- 1 <= Y<sub>i</sub> <= 500
- Data set X contains unique values.
- Data set Y contains unique values.

##### Output Format

Print the value of the Pearson correlation coefficient, rounded to a scale of 3 decimal places.



In [1]:
# Solution without using numpy
import math

n = int(input())
X = list(map(float, input().split()))
Y = list(map(float, input().split()))

# Means of both data sets
X_mean = sum(X) / n
Y_mean = sum(Y) / n

# Population standard deviations (divide by n, matching the formula above)
X_std = math.sqrt(sum((x - X_mean) ** 2 for x in X) / n)
Y_std = math.sqrt(sum((y - Y_mean) ** 2 for y in Y) / n)

# Sum of the products of deviations, divided by n * sigma_X * sigma_Y
pear_c = sum((x - X_mean) * (y - Y_mean) for x, y in zip(X, Y)) / (n * X_std * Y_std)

print(f"{pear_c:0.3f}")



10
10 9.8 8 7.8 7.7 7 6 5 4 2
200 44 32 24 22 17 15 12 8 4
0.612


In [2]:
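# Solution with numpy
import numpy as np

n = int(input())  # read only to consume the first input line
X = list(map(float, input().split()))
Y = list(map(float, input().split()))

# corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is rho(X, Y)
print(f"{np.corrcoef(X, Y)[0, 1]:0.3f}")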

10
10 9.8 8 7.8 7.7 7 6 5 4 2
200 44 32 24 22 17 15 12 8 4
0.612


#### Task<a name="task2" />

Given two __n__-element data sets, __X__ and __Y__, calculate the value of Spearman's rank correlation coefficient.

##### Input Format

The first line contains an integer, __n__ , denoting the size of data sets __X__ and __Y__.
The second line contains __n__ space-separated real numbers (scaled to at most one decimal place), defining data set __X__.
The third line contains __n__ space-separated real numbers (scaled to at most one decimal place), defining data set __Y__.

###### Constraints
- 10 <= n <= 100
- 1 <= X<sub>i</sub> <= 500
- 1 <= Y<sub>i</sub> <= 500
- Data set X contains unique values.
- Data set Y contains unique values.

##### Output Format

Print the value of Spearman's rank correlation coefficient, rounded to a scale of 3 decimal places.



In [5]:
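# Solution without using numpy or scipy
n = int(input())
X = list(map(float, input().split()))
Y = list(map(float, input().split()))

# Sort each data set once, then look up the 1-based rank of every value.
# This works because the constraints guarantee unique values (no ties).
sorted_X = sorted(X)
sorted_Y = sorted(Y)
rank_X = [sorted_X.index(x) + 1 for x in X]
rank_Y = [sorted_Y.index(y) + 1 for y in Y]

# Shortcut formula: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_X, rank_Y))
spearmans_c = 1 - (6 * d_squared) / (n * (n**2 - 1))

print(f"{spearmans_c:0.3f}")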


10
10 9.8 8 7.8 7.7 1.7 6 5 1.4 2
200 44 32 24 22 17 15 12 8 4
0.903


In [6]:
# Solution with scipy
import scipy.stats

n = int(input())
X = list(map(float,input().split()))
Y = list(map(float,input().split()))

print(f"{scipy.stats.spearmanr(X,Y).correlation:0.3f}")


10
10 9.8 8 7.8 7.7 1.7 6 5 1.4 2
200 44 32 24 22 17 15 12 8 4
0.903
