# Regression

## Covariance

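Covariance describes the direction in which two variables change together: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. It does not measure the strength of the relationship; correlation does. The magnitude of the covariance depends on the scale (units) of the data.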

# $Cov(X,Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{N}$

In [1]:
import numpy as np

In [2]:
# Population covariance by default; pass sample=True to divide by N - 1
def covariance(x, y, sample=False):
    x_bar = np.mean(x)
    y_bar = np.mean(y)
    # Products of the paired deviations from the means
    products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

    if sample:
        return sum(products) / (len(products) - 1)
    return sum(products) / len(products)

In [3]:
xlist = [12,30,15,24,14,18,28,26,19,27]
ylist = [20,60,27,50,21,30,61,54,32,57]

In [4]:
covariance(xlist, ylist)

96.24
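
The remark above that the covariance depends on the size of the numbers can be checked with a short sketch. It assumes NumPy's `np.cov` (with `bias=True` for the population form) and `np.corrcoef`; the rescaling factor of 10 is an arbitrary choice for illustration.

```python
import numpy as np

x = np.array([12, 30, 15, 24, 14, 18, 28, 26, 19, 27])
y = np.array([20, 60, 27, 50, 21, 30, 61, 54, 32, 57])

# Population covariance (bias=True divides by N) of the original data
cov_xy = np.cov(x, y, bias=True)[0, 1]

# Rescale x by 10 (an arbitrary factor, e.g. a change of units)
cov_scaled = np.cov(10 * x, y, bias=True)[0, 1]

# Correlation is unaffected by the rescaling
corr_xy = np.corrcoef(x, y)[0, 1]
corr_scaled = np.corrcoef(10 * x, y)[0, 1]

print(cov_xy, cov_scaled)
print(corr_xy, corr_scaled)
```

The covariance grows by the same factor as the data, while the correlation stays the same.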

## Variance

Variance measures how spread out the values are around their mean: it is the average squared deviation from the mean.

# $\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$

In [5]:
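# Population variance by default; pass sample=True to divide by N - 1
def variance(x, sample=False):
    x_bar = np.mean(x)
    # Squared deviations from the mean
    squared_devs = [(xi - x_bar) ** 2 for xi in x]
    if sample:
        return sum(squared_devs) / (len(squared_devs) - 1)
    return sum(squared_devs) / len(squared_devs)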

In [6]:
variance(xlist, False)

37.809999999999995

In [7]:
np.var(xlist)

37.81000000000001
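
The `sample=True` branch divides by N - 1 instead of N. As a hedged cross-check, NumPy exposes the same choice through the `ddof` argument of `np.var`; the sketch below only compares the two divisors on the same list and is not part of the original notebook.

```python
import numpy as np

xlist = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]
n = len(xlist)

pop_var = np.var(xlist)             # ddof=0 (default): divide by N
sample_var = np.var(xlist, ddof=1)  # ddof=1: divide by N - 1

# The two differ only by the factor N / (N - 1)
print(pop_var, sample_var, pop_var * n / (n - 1))
```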

## Standard Deviation

In [8]:
# Square root of the variance; population by default
def standard_dev(x, sample=False):
    return np.sqrt(variance(x, sample))
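
The function above is never called on its own in this notebook. As an independent sanity check, the standard library's `statistics` module offers `pstdev` (population) and `stdev` (sample), which should correspond to the two branches; this is a sketch for comparison, not the notebook's own method.

```python
import math
import statistics

xlist = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]

pop_sd = statistics.pstdev(xlist)    # population: divisor N
sample_sd = statistics.stdev(xlist)  # sample: divisor N - 1

# Both are square roots of the corresponding variances
print(pop_sd, math.sqrt(statistics.pvariance(xlist)))
print(sample_sd, math.sqrt(statistics.variance(xlist)))
```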

## Regression - Closed Form

In [9]:
xl2 = [95, 85, 80, 70, 60]
yl2 = [85, 95, 70, 65, 70]

In [10]:
def closed_form(x, y):
    # Format the fitted line as "y = intercept + slope x"
    return f"y = {cf_intercept(x, y):.2f} + {cf_slope(x, y):.4f}x"

def cf_slope(x, y):
    # Least-squares slope: Cov(x, y) / Var(x)
    return covariance(x, y) / variance(x)

def cf_intercept(x, y):
    # The fitted line passes through the point of means (x_bar, y_bar)
    x_bar = np.mean(x)
    y_bar = np.mean(y)
    return y_bar - cf_slope(x, y) * x_bar

In [11]:
closed_form(xl2, yl2)

'y = 26.78 + 0.6438x'
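
The closed-form slope and intercept can be cross-checked against a general least-squares routine. The sketch below assumes `np.polyfit` with degree 1, which returns coefficients from the highest power down, so the slope comes first; it is a comparison aid rather than part of the closed-form derivation.

```python
import numpy as np

xl2 = [95, 85, 80, 70, 60]
yl2 = [85, 95, 70, 65, 70]

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(xl2, yl2, 1)

print(f"y = {intercept:.2f} + {slope:.4f}x")
```

Up to floating-point error, the coefficients should agree with `cf_slope` and `cf_intercept`.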

## Pearson Correlation

# $\rho_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$

In [12]:
from scipy.stats import pearsonr

In [13]:
def pearson_relation(x, y):
    return covariance(x, y)/(standard_dev(x)*standard_dev(y))

In [14]:
pearsonr(xl2, yl2)

(0.6930525298193005, 0.19446749009400918)

In [15]:
pearson_relation(xl2, yl2)

0.6930525298193004

## Scipy Linear Regression

In [16]:
from scipy.stats import linregress
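
The import above is not followed by a call in this notebook. A minimal usage sketch, assuming the same `xl2` and `yl2` data as the closed-form section, might look like this; `linregress` returns a result whose `slope`, `intercept`, `rvalue`, and `pvalue` attributes can be compared with the values computed earlier.

```python
from scipy.stats import linregress

xl2 = [95, 85, 80, 70, 60]
yl2 = [85, 95, 70, 65, 70]

result = linregress(xl2, yl2)

# Slope and intercept of the fitted line
print(result.slope, result.intercept)
# Pearson correlation coefficient and its two-sided p-value
print(result.rvalue, result.pvalue)
```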

## Multiple Linear Regression

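With more than one predictor, the fitted model is no longer a line: with two predictors it is a plane, $y = b_0 + b_1 x_1 + b_2 x_2$, and with more it is a hyperplane.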

In [40]:
M = [[1,2],
     [3,4],
     [5,6]]

In [41]:
def transpose_zip(matrix):
    # zip(*matrix) groups the i-th entries of every row; each new row is a tuple
    return list(zip(*matrix))

def transpose(matrix):
    # len(matrix) is the number of rows, len(matrix[0]) the number of columns;
    # the transpose has one row per original column
    return [[matrix[j][i] for j in range(len(matrix))] for i in range(len(matrix[0]))]

In [42]:
M_t = transpose_zip(M)

In [43]:
transpose(M)

[[1, 3, 5], [2, 4, 6]]
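
One place the transpose appears in multiple linear regression is the normal equations, $X^T X \beta = X^T y$. The sketch below is an assumption about where this section is heading, not something the notebook states. The data are made up so that y = 1 + 2*x1 + 3*x2 holds exactly, and `np.linalg.solve` is used in place of an explicit matrix inverse.

```python
import numpy as np

# Made-up data: two predictors per row, generated from y = 1 + 2*x1 + 3*x2
features = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = np.array([9, 8, 19, 18, 26], dtype=float)

# Prepend a column of ones so the first coefficient is the intercept
X = np.column_stack([np.ones(len(features)), features])

# Solve the normal equations (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # intercept, coefficient of x1, coefficient of x2
```

Because the made-up data lie exactly on a plane, the recovered coefficients should match the generating values up to floating-point error.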