# Correlation

Types of correlation coefficients:

- Pearson's correlation coefficient
- Spearman's rank correlation coefficient
- Kendall's rank correlation coefficient
- Point-Biserial correlation coefficient
- Biserial correlation coefficient
- Phi coefficient
- Cramer's V

## Pearson's correlation coefficient

Pearson's correlation coefficient is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
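
As a worked instance of the formula, take the five-point dataset used in this notebook's code cells, $x = (1, 2, 3, 4, 5)$ and $y = (2, 4, 5, 4, 5)$. The means are $\bar{x} = 3$ and $\bar{y} = 4$, so the three sums in the formula are:

$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = (-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1) = 6$$

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10, \qquad \sum_{i=1}^{n}(y_i - \bar{y})^2 = 4 + 0 + 1 + 0 + 1 = 6$$

$$r_{xy} = \frac{6}{\sqrt{10}\sqrt{6}} = \frac{6}{\sqrt{60}} \approx 0.7746$$

This agrees with the value printed by the code cells for the same data.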

## Spearman's rank correlation coefficient

Spearman's rank correlation coefficient is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Like other correlation coefficients, this one varies between +1 and −1 with 0 implying no correlation. Correlations of −1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

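$$r_s = \frac{\sum_{i=1}^{n}(R(x_i) - \bar{R}_x)(R(y_i) - \bar{R}_y)}{\sqrt{\sum_{i=1}^{n}(R(x_i) - \bar{R}_x)^2}\sqrt{\sum_{i=1}^{n}(R(y_i) - \bar{R}_y)^2}}$$

Here $R(x_i)$ and $R(y_i)$ are the ranks of $x_i$ and $y_i$ within their own datasets (tied values receive the average of their ranks), and $\bar{R}_x$, $\bar{R}_y$ are the mean ranks. In other words, Spearman's coefficient is Pearson's coefficient computed on the ranks instead of the raw values.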
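
To illustrate the difference between a linear and a monotonic relationship, here is a minimal sketch using only the standard library. The helper names `pearson_r` and `ranks` and the cubic toy data are assumptions made for this illustration, and `ranks` deliberately ignores ties because the toy data has none.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    x_ss = sum((a - x_mean) ** 2 for a in x)
    y_ss = sum((b - y_mean) ** 2 for b in y)
    return cov / math.sqrt(x_ss * y_ss)

def ranks(values):
    """1-based ranks; assumes no tied values (true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# y grows with x, but along a curve rather than a straight line
x = list(range(1, 11))
y = [v ** 3 for v in x]

r = pearson_r(x, y)                    # linear association on raw values
r_s = pearson_r(ranks(x), ranks(y))    # Spearman: Pearson on the ranks

print(f"Pearson r:    {r:.4f}")
print(f"Spearman r_s: {r_s:.4f}")
```

Because `y` always increases when `x` increases, the ranks of the two lists are identical and Spearman's coefficient reaches 1, while Pearson's coefficient stays below 1 since the points do not lie on a straight line.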

# Example of Pearson's correlation coefficient




In [2]:
# Pearson's correlation coefficient

import pandas as pd
import numpy as np

def pearson(x, y):
    """Return Pearson's r for two equal-length numeric sequences."""
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    # np.std defaults to the population standard deviation (ddof=0),
    # which matches dividing the summed products by n below.
    x_std = np.std(x)
    y_std = np.std(y)
    n = len(x)
    return sum((x - x_mean) * (y - y_mean)) / (n * x_std * y_std)


# Example dataset

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = pearson(x, y)
print(f"Pearson Correlation Coefficient: {r}")


# Describe the strength and direction of the correlation

if 0 < r < 0.6:
    print("Positive Correlation")
elif r >= 0.6:
    print("Highly Positive Correlation")
elif -0.6 < r < 0:
    print("Negative Correlation")
elif r <= -0.6:
    print("Highly Negative Correlation")
else:
    print("No Correlation")

Pearson Correlation Coefficient: 0.7745966692414834
Highly Positive Correlation


In [4]:
# Spearman's rank correlation coefficient

def spearman(x, y):
    """Return Spearman's r_s: Pearson's r computed on the ranks."""
    # rank() gives tied values the average of their ranks by default
    x_rank = pd.Series(x).rank()
    y_rank = pd.Series(y).rank()
    return pearson(x_rank, y_rank)

r_s = spearman(x, y)
print(f"Spearman Correlation Coefficient: {r_s}")

# Describe the strength and direction of the correlation

if 0 < r_s < 0.6:
    print("Positive Correlation")
elif r_s >= 0.6:
    print("Highly Positive Correlation")
elif -0.6 < r_s < 0:
    print("Negative Correlation")
elif r_s <= -0.6:
    print("Highly Negative Correlation")
else:
    print("No Correlation")

Spearman Correlation Coefficient: 0.7378647873726218
Highly Positive Correlation
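
The Pearson and Spearman cells repeat the same threshold logic, so it could be factored into one helper. The sketch below is one possible way to do that. The function name `describe_correlation` and its `strong` parameter are hypothetical, and the 0.6 cutoff is simply carried over from the cells above rather than being a standard threshold.

```python
def describe_correlation(r, strong=0.6):
    """Map a coefficient in [-1, 1] to a coarse text label.

    `strong` is the cutoff (assumed 0.6, as in the cells above) at which
    a correlation is called "highly" positive or negative.
    """
    if r >= strong:
        return "Highly Positive Correlation"
    if r > 0:
        return "Positive Correlation"
    if r <= -strong:
        return "Highly Negative Correlation"
    if r < 0:
        return "Negative Correlation"
    return "No Correlation"

# Try the helper on a few representative coefficients
for value in (0.77, 0.3, 0.0, -0.3, -0.77):
    print(f"{value:+.2f}: {describe_correlation(value)}")
```

Writing each range as an explicit comparison against `r` avoids the pitfall of chained expressions such as `r < 0.6 > 0`, which Python evaluates as `r < 0.6 and 0.6 > 0` and which is therefore true for every negative `r` as well.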


# Other methods to compute correlation

In [5]:
import pandas as pd 
import numpy as np 

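# Example dataset
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Pearson's correlation coefficient

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry [0, 1] is the correlation between x and y.
corr_matrix = np.corrcoef(x, y)
print(f"Pearson Correlation Coefficient: {corr_matrix[0, 1]}")

Pearson Correlation Coefficient: 0.7745966692414834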


In [7]:
# Create an example dataset
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

# Pearson's correlation coefficient (the default method of Series.corr)
pearson_corr = x.corr(y)
print(f"Pearson Correlation Coefficient: {pearson_corr}")

Pearson Correlation Coefficient: 0.7745966692414834


In [8]:
df = pd.DataFrame({'x':x,'y':y})
df.head()

   x  y
0  1  2
1  2  4
2  3  5
3  4  4
4  5  5


In [None]:
# Using the correlation matrix in pandas

df = pd.DataFrame({'x': x, 'y': y})

# Pairwise correlation matrices, one per method

pearson_corr = df.corr(method='pearson')
spearman_corr = df.corr(method='spearman')
kendall_corr = df.corr(method='kendall')
