## Measures of linear relationship between pairs of data
Say there are $n$ pairs of observations $(x_{1},y_{1}), (x_{2},y_{2}), \ldots, (x_{n},y_{n})$.

```python
import numpy as np
import pandas as pd
import scipy.stats

x = list(range(-10, 11))
y = [0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14]
x_, y_ = np.array(x), np.array(y)          # NumPy arrays
x__, y__ = pd.Series(x_), pd.Series(y_)    # pandas Series
```

### Sample Covariance
$S_{xy}=\frac{1}{n-1} \sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})$, where $n$ is the number of observation pairs and $\bar{x}$, $\bar{y}$ are the sample means.
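Here is a hand check of the formula on the example data. Because $x$ runs symmetrically from $-10$ to $10$, $\bar{x}=0$ and $\sum_i x_i = 0$, so the $\bar{y}$ term drops out. Working through the 21 products $x_i y_i$ gives $\sum_i x_i y_i = 399$:

$$
S_{xy}=\frac{1}{n-1}\sum_{i=1}^{n}x_{i}(y_{i}-\bar{y})
=\frac{1}{n-1}\left(\sum_{i=1}^{n}x_{i}y_{i}-\bar{y}\sum_{i=1}^{n}x_{i}\right)
=\frac{399-\bar{y}\cdot 0}{21-1}=\frac{399}{20}=19.95
$$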

```python
# Pure Python
def samp_cov(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(x) != len(y):
        raise ValueError('Lists are of different length!')
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((x[k] - x_mean) * (y[k] - y_mean) for k in range(n)) / (n - 1)
```

```python
samp_cov(x, y)
```

```text
19.95
```

```python
# NumPy's cov() returns the covariance matrix; the off-diagonal entry is cov(x, y)
cov_matrix = np.cov(x_, y_)
cov_matrix[0, 1]
```

```text
19.95
```
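The next sketch prints the whole matrix that `np.cov` returns. The diagonal holds the two sample variances and the off-diagonal holds the covariance. It also shows the effect of the `ddof` argument: the default `ddof=1` divides by $n-1$, while `ddof=0` divides by $n$ and gives the population covariance. The data is re-created so the block runs on its own.

```python
import numpy as np

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

# With two 1-D inputs, np.cov treats each as one variable (row) and returns
# [[var(x),    cov(x, y)],
#  [cov(y, x), var(y)   ]]
cov_matrix = np.cov(x_, y_)
var_x, var_y = cov_matrix[0, 0], cov_matrix[1, 1]
print(cov_matrix)

# ddof=0 divides by n instead of n - 1 (population covariance)
n = len(x_)
pop_cov = np.cov(x_, y_, ddof=0)[0, 1]
print(pop_cov, cov_matrix[0, 1] * (n - 1) / n)   # the two values agree
```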

```python
# pandas: Series.cov() computes the sample covariance (ddof=1 by default)
x__.cov(y__)   # or y__.cov(x__)
```

```text
19.95
```
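When the variables sit together in a table, `DataFrame.cov()` returns the same pairwise covariance matrix as `np.cov`, with column labels attached. The sketch below assumes the two series are stored as columns named `"x"` and `"y"`. Those names are chosen for illustration and are not from the original notebook.

```python
import pandas as pd

x = list(range(-10, 11))
y = [0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14]

# Hypothetical column names "x" and "y"
df = pd.DataFrame({"x": x, "y": y})

# Pairwise sample covariance matrix (ddof=1 by default)
cov_df = df.cov()
print(cov_df)
print(cov_df.loc["x", "y"])
```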

### Sample Correlation
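$r_{xy}=\frac{S_{xy}}{\sqrt{S_{x}^{2}\,S_{y}^{2}}}=\frac{S_{xy}}{S_{x}S_{y}}$, where $S_{x}^{2}$ and $S_{y}^{2}$ are the sample variances of $x$ and $y$ (with the same $n-1$ denominator as $S_{xy}$) and $S_{x}$, $S_{y}$ are the corresponding standard deviations.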
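The formula maps directly onto the covariance matrix. Divide the off-diagonal entry by the square root of the product of the two diagonal entries, which are the variances. The minimal sketch below does this and compares the result with `np.corrcoef`.

```python
import math
import numpy as np

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

cov_matrix = np.cov(x_, y_)
s_xy = cov_matrix[0, 1]                          # sample covariance
s_x2, s_y2 = cov_matrix[0, 0], cov_matrix[1, 1]  # sample variances

# r = S_xy / sqrt(S_x^2 * S_y^2)
r_xy = s_xy / math.sqrt(s_x2 * s_y2)
print(r_xy, np.corrcoef(x_, y_)[0, 1])
```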

```python
def samp_corr(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError('Lists are of different length!')
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov_xy = sum((x[k] - x_mean) * (y[k] - y_mean) for k in range(n)) / (n - 1)
    x_var = sum((item - x_mean) ** 2 for item in x) / (n - 1)
    y_var = sum((item - y_mean) ** 2 for item in y) / (n - 1)
    x_std, y_std = x_var ** 0.5, y_var ** 0.5
    return cov_xy / (x_std * y_std)
```

```python
samp_corr(x, y)
```

```text
0.861950005631606
```

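```python
# SciPy: pearsonr() returns the correlation coefficient and the two-sided p-value
# (a plain tuple in older SciPy; newer versions return a result object that unpacks the same way)
scipy.stats.pearsonr(x_, y_)
```

```text
(0.8619500056316061, 5.122760847201135e-07)
```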
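The p-value from `pearsonr` tests the null hypothesis of zero correlation. Under that null, and assuming normally distributed data, the statistic $t = r\sqrt{(n-2)/(1-r^2)}$ follows a Student $t$ distribution with $n-2$ degrees of freedom. The sketch below recomputes the two-sided p-value this way, as a cross-check rather than as a description of SciPy's internal code path.

```python
import math
import numpy as np
from scipy import stats

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])
n = len(x_)

r, p = stats.pearsonr(x_, y_)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided tail probability
print(p, p_manual)
```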

`scipy.stats.linregress()` gives the same coefficient. It takes `x_` and `y_`, fits a least-squares regression line, and returns the results: `slope` and `intercept` define the equation of the regression line, while `rvalue` is the correlation coefficient.
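The regression output is tied to the quantities above. The least-squares slope is $S_{xy}/S_{x}^{2}$, the intercept makes the line pass through $(\bar{x}, \bar{y})$, and $r = \text{slope}\cdot S_{x}/S_{y}$. The minimal sketch below recomputes all three from the covariance matrix and compares them with `linregress`.

```python
import numpy as np
from scipy import stats

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

cov_matrix = np.cov(x_, y_)
s_xy, s_x2, s_y2 = cov_matrix[0, 1], cov_matrix[0, 0], cov_matrix[1, 1]

slope = s_xy / s_x2                          # least-squares slope
intercept = y_.mean() - slope * x_.mean()    # line passes through (mean x, mean y)
r = slope * np.sqrt(s_x2 / s_y2)             # r = slope * S_x / S_y

res = stats.linregress(x_, y_)
print(slope, res.slope)
print(intercept, res.intercept)
print(r, res.rvalue)
```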

```python
scipy.stats.linregress(x_, y_)
```

```text
LinregressResult(slope=0.5181818181818181, intercept=5.714285714285714, rvalue=0.861950005631606, pvalue=5.122760847201165e-07, stderr=0.06992387660074979, intercept_stderr=0.4234100995002589)
```

```python
# rvalue holds the correlation coefficient
scipy.stats.linregress(x_, y_).rvalue
```

```text
0.861950005631606
```

```python
# NumPy's corrcoef() returns the correlation matrix
np.corrcoef(x_, y_)
```

```text
array([[1.        , 0.86195001],
       [0.86195001, 1.        ]])
```
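`np.corrcoef` also accepts several variables at once. By default (`rowvar=True`) each row is one variable, so stacking three rows gives a 3×3 matrix. The sketch below adds a hypothetical third variable, $z = x^2$, to illustrate that these measures capture only *linear* relationships. Here $z$ is an exact function of $x$, yet its correlation with the symmetric $x$ is zero (up to floating-point error), because $\sum_i x_i^3 = 0$.

```python
import numpy as np

x_ = np.arange(-10, 11)
y_ = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])
z_ = x_ ** 2   # hypothetical variable: perfectly determined by x, but not linearly

# Rows are variables (rowvar=True by default) -> 3x3 correlation matrix
corr = np.corrcoef(np.vstack([x_, y_, z_]))
print(corr.round(4))
```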

```python
# pandas: Series.corr() computes Pearson's r by default
x__.corr(y__)   # or y__.corr(x__)
```

```text
0.8619500056316061
```
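A practical difference between the two measures is how they respond to a change of units. Covariance is expressed in units of $x$ times units of $y$, so it changes when either variable is rescaled. Correlation is unitless and stays the same under any positive linear rescaling; a negative factor on one variable would flip its sign. The sketch below checks this with the pandas methods above. The factors 3, 7, 0.5 and -2 are arbitrary illustrative choices.

```python
import pandas as pd

x__ = pd.Series(range(-10, 11))
y__ = pd.Series([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

# Arbitrary positive linear rescalings (hypothetical change of units)
x_scaled = 3 * x__ + 7
y_scaled = 0.5 * y__ - 2

print(x__.cov(y__), x_scaled.cov(y_scaled))    # covariance is multiplied by 3 * 0.5
print(x__.corr(y__), x_scaled.corr(y_scaled))  # correlation is unchanged
```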