## Measures of Correlation Between Pairs of Data

If:
```
x = [1,2,3,4]
y = [1,2,3,4]
```

Then the pairs of x and y are:
```
pairs_x_y = [
    (1,1),
    (2,2),
    (3,3),
    (4,4)
]
```
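As a minimal sketch, the pairing can be built with Python's built-in `zip()`, which matches elements by position. It assumes both lists have the same length:

```python
x = [1, 2, 3, 4]
y = [1, 2, 3, 4]

# zip() pairs elements position by position; list() materializes the pairs
pairs_x_y = list(zip(x, y))
print(pairs_x_y)
```

If the lists differ in length, `zip()` silently stops at the shorter one, so it is worth checking `len(x) == len(y)` first.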

Correlation between pairs of data usually falls into one of three patterns:

- Positive correlation exists when larger values of 𝑥 correspond to larger values of 𝑦 and vice versa.
- Negative correlation exists when larger values of 𝑥 correspond to smaller values of 𝑦 and vice versa.
- Weak or no correlation exists if there is no such apparent relationship.

![image.png](https://files.realpython.com/media/py-stats-08.5a1e9f3e3aa4.png)
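The three cases can be illustrated with small made-up arrays. This is only a sketch: the data below is hypothetical, chosen so the pattern is obvious, and the check uses `np.corrcoef()`, which is covered in the Correlation Coefficient section.

```python
import numpy as np

# hypothetical toy data, not part of the original notebook
x = np.array([-2, -1, 0, 1, 2])
y_pos = 2 * x + 1                    # grows as x grows
y_neg = -3 * x                       # shrinks as x grows
y_none = np.array([0, 1, -2, 1, 0])  # no linear trend in either direction

# element [0, 1] of the correlation matrix is the correlation of the two inputs
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
r_none = np.corrcoef(x, y_none)[0, 1]
print(r_pos, r_neg, r_none)
```

The first value should come out positive, the second negative, and the third essentially zero.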

In [1]:
import math
import statistics
import numpy as np
import scipy.stats 
import pandas as pd

In [3]:
# initial data
x = list(range(-10, 11))
y = [0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14]
x_, y_ = np.array(x), np.array(y)
x__, y__ = pd.Series(x_), pd.Series(y_)

### Covariance
The sample covariance is a measure that quantifies the strength and direction of a relationship between a pair of variables.


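The sign is read the same way as above: a positive covariance goes with positive correlation, a negative covariance goes with negative correlation, and a covariance close to zero goes with a weak linear relationship.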
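To make the definition concrete, here is a sketch that computes the sample covariance by hand in pure Python, using the standard formula $s_{xy} = \frac{1}{n-1}\sum_{i}(x_i - \bar{x})(y_i - \bar{y})$. It reuses the notebook's `x` and `y`; the names `mean_x`, `mean_y`, and `cov_xy_manual` are introduced here only for illustration.

```python
x = list(range(-10, 11))
y = [0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# sum of the products of deviations from the means, divided by n - 1
cov_xy_manual = sum(
    (xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)
) / (n - 1)
print(cov_xy_manual)
```

Up to floating-point rounding, the result should agree with the off-diagonal element that `np.cov()` returns for the same data.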

In [5]:
n = len(x)
cov_matrix = np.cov(x_, y_)
cov_matrix

array([[38.5       , 19.95      ],
       [19.95      , 13.91428571]])

In [6]:
# example:
# The upper-left element of the covariance matrix is the covariance of x and x, or the variance of x.
# Similarly, the lower-right element is the covariance of y and y, or the variance of y

In [7]:
x_.var(ddof=1)

38.5

In [8]:
y_.var(ddof=1)

13.914285714285711

In [10]:
cov_xy = x__.cov(y__)
cov_xy

19.95

In [11]:
cov_yx = y__.cov(x__)
cov_yx

19.95

### Correlation Coefficient
The correlation coefficient, or Pearson product-moment correlation coefficient, is denoted by the symbol 𝑟.

You can think of it as a **standardized covariance**.

- The value 𝑟 > 0 indicates positive correlation. 
- The value 𝑟 < 0 indicates negative correlation.
- The value 𝑟 = 1 is the maximum possible value of 𝑟. It corresponds to a perfect positive linear relationship between variables.
- The value 𝑟 = −1 is the minimum possible value of 𝑟. It corresponds to a perfect negative linear relationship between variables.
- The value 𝑟 ≈ 0 means that the linear correlation between variables is weak.
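The "standardized covariance" idea can be sketched directly: divide the sample covariance by the product of the two sample standard deviations, $r = \frac{s_{xy}}{s_x s_y}$. The snippet below rebuilds the notebook's arrays so it runs on its own; `std_x`, `std_y`, and `r_manual` are illustrative names.

```python
import numpy as np

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

cov_xy = np.cov(x_, y_)[0, 1]  # sample covariance (np.cov divides by n - 1 by default)
std_x = x_.std(ddof=1)         # sample standard deviation of x
std_y = y_.std(ddof=1)         # sample standard deviation of y

# Pearson's r as covariance rescaled by both standard deviations
r_manual = cov_xy / (std_x * std_y)
print(r_manual)
```

Because the covariance is divided by both standard deviations, the result no longer depends on the units of 𝑥 and 𝑦, which is why 𝑟 always lies between −1 and 1.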

In [16]:
# scipy
r, p = scipy.stats.pearsonr(x_, y_)
print(r)
print(p) # p-value, skip this for now


0.8619500056316062
5.122760847201132e-07


In [15]:
# the matrix
corr_matrix = np.corrcoef(x_, y_)
corr_matrix

array([[1.        , 0.86195001],
       [0.86195001, 1.        ]])

In [18]:
r = corr_matrix[0, 1]
r

0.8619500056316061

In [20]:
# using scipy's linregress()
# the result object bundles slope, intercept, rvalue, pvalue, and standard errors
scipy.stats.linregress(x_, y_)


In [21]:
# a variable regressed on itself gives a perfect fit: slope 1, intercept 0, rvalue 1
scipy.stats.linregress(x_, x_)


LinregressResult(slope=1.0, intercept=0.0, rvalue=1.0, pvalue=1.3080864538736422e-188, stderr=0.0, intercept_stderr=0.0)

In [22]:
result = scipy.stats.linregress(x_, y_)
result.rvalue

0.861950005631606
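As a rough cross-check, the fields returned by `linregress()` can be tied back to the quantities computed earlier: for a simple least-squares line, the slope equals the covariance divided by the variance of 𝑥, and `rvalue` is the same Pearson 𝑟. The sketch below assumes the notebook's data; `slope_from_cov` is an illustrative name.

```python
import numpy as np
import scipy.stats

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

result = scipy.stats.linregress(x_, y_)

# least-squares slope for one predictor: cov(x, y) / var(x)
slope_from_cov = np.cov(x_, y_)[0, 1] / x_.var(ddof=1)

print(result.slope, slope_from_cov)
print(result.rvalue, np.corrcoef(x_, y_)[0, 1])
```

Each printed pair should match up to floating-point rounding.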

In [23]:
# pandas Series .corr()
r = x__.corr(y__) # same as y__.corr(x__)
r

0.8619500056316061