# Statistics Tutorial

https://www.youtube.com/playlist?list=PLGLfVvz_LVvQjNJr85J4U_lxDg8vgqvcO
<br />
https://www.newthinktank.com/2020/07/statistics-every-day/

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import math

## Part 1
https://www.youtube.com/watch?v=YCPYNXtwKAc&list=PLGLfVvz_LVvQjNJr85J4U_lxDg8vgqvcO&index=1&ab_channel=DerekBanas

#### Mean: Average of Data Set
Mean Population: $\mu$
<br />
Mean Sample: $\bar{x}$
$$
\mu = \frac{1}{N} \sum_{i = 1}^{N} x_i 
\qquad \qquad 
\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_i
$$

In [2]:
def mean(*args):
    val_sum = sum(args)
    return val_sum / len(args)

print('Mean: ', mean(1,2,3,4,5))

Mean:  3.0


#### Median: Number at Center of Data Set

In [3]:
def median(*args):
    # The median is defined on ordered data, so sort first
    values = sorted(args)
    mid = len(values) // 2
    if len(values) % 2 == 0:
        # Even count: average the two middle values
        return mean(values[mid - 1], values[mid])
    else:
        # Odd count: take the middle value
        return values[mid]

print('Median: ', median(0,1,2,3,4))

Median:  2
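
The median is defined on ordered data, so the order in which values are passed in should not change the result. As a cross-check, here is a minimal sketch using the standard library's `statistics.median`, which is not used in the original tutorial; the sample lists `odd_count` and `even_count` are made up for illustration.

```python
import statistics

# Hypothetical sample data, deliberately unsorted
odd_count = [4, 0, 3, 1, 2]
even_count = [7, 1, 3, 5]

# statistics.median sorts internally before picking the center
median_odd = statistics.median(odd_count)    # sorted: 0, 1, 2, 3, 4
median_even = statistics.median(even_count)  # sorted: 1, 3, 5, 7

print('Median (odd count): ', median_odd)
print('Median (even count):', median_even)
```

With an even number of values, the result is the mean of the two middle values after sorting.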


#### Mode: Number that occurs the Most

In [4]:
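def mode(*args):
    # Count how often each value occurs
    frequencies = {arg : args.count(arg) for arg in args}
    # Find the highest count once, then keep every value that reaches it
    max_count = max(frequencies.values())
    return [key for key, val in frequencies.items() if val == max_count]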

print('Mode: ', mode(1,1,2,2,3))

Mode:  [1, 2]
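
For larger datasets, counting each value with `args.count` rescans the data once per value. One possible alternative, sketched here with `collections.Counter` from the standard library, counts everything in a single pass. The function name `modes` is hypothetical and not part of the original tutorial.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)            # value -> number of occurrences
    highest = max(counts.values())      # the largest count seen
    # Keep all values tied for the highest count, in first-seen order
    return [value for value, count in counts.items() if count == highest]

print('Mode: ', modes([1, 1, 2, 2, 3]))
```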


#### Variance: How Data is Spread around the Mean
Variance Population: $\sigma^2$
<br />
Variance Sample: $S^2$
$$
\sigma^2 = \frac{1}{N} \sum_{i = 1}^{N} (x_i - \mu)^2 
\qquad \qquad
S^2 = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_i - \bar{x})^2
$$

In [5]:
def variance(*args):
    # Sample variance: divide by n - 1 rather than n
    mean_val = mean(*args)
    numerator = sum((arg - mean_val) ** 2 for arg in args)
    return numerator / (len(args) - 1)


print('Variance: ', variance(4,3,6,5,2))

Variance:  2.5
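
The two formulas above differ only in the divisor: $N$ for a population and $n - 1$ for a sample. A small sketch of that difference on the same data, assuming the standard library's `statistics` module (not used in the original tutorial):

```python
import statistics

data = [4, 3, 6, 5, 2]

# Sample variance: sum of squared deviations divided by n - 1
sample_var = statistics.variance(data)

# Population variance: the same sum divided by N
population_var = statistics.pvariance(data)

print('Sample variance:    ', sample_var)
print('Population variance:', population_var)
```

The sample variance is always the larger of the two, and the gap shrinks as the number of data points grows.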


#### Standard Deviation

Standard Deviation Population: $\sigma$
<br />
Standard Deviation Sample: $S$

In [6]:
def standard_deviation(*args):
    return math.sqrt(variance(*args))

print('Standard Deviation: ', standard_deviation(4,3,6,5,2))

Standard Deviation:  1.5811388300841898


#### Coefficient of Variation: How Data is Spread in relation to the Mean

$$
\text{Coefficient of Variation} = \frac{S}{\bar{x}}
$$

In [17]:
def coefficient_variance(*args):
    return standard_deviation(*args) / mean(*args)


miles = [3, 4, 4.5, 3.5]
kms = [4.828, 6.437, 7.242, 5.632]
print('Coefficient Variance (m): ',  coefficient_variance(*miles))
print('Coefficient Variance (km): ', coefficient_variance(*kms))


Coefficient Variance (m):  0.17213259316477408
Coefficient Variance (km):  0.17214686292344047
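
The two results agree to four decimal places because the coefficient of variation has no units: multiplying every value by the same factor scales the standard deviation and the mean equally. The small remaining difference comes from the kilometre values being given to only three decimal places. The sketch below converts with the exact factor instead. The names `coefficient_of_variation`, `KM_PER_MILE`, and `kms_exact` are illustrative, not from the original.

```python
import math

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    n = len(values)
    mean_val = sum(values) / n
    sample_var = sum((v - mean_val) ** 2 for v in values) / (n - 1)
    return math.sqrt(sample_var) / mean_val

KM_PER_MILE = 1.609344  # exact definition of the international mile

miles = [3, 4, 4.5, 3.5]
kms_exact = [m * KM_PER_MILE for m in miles]  # no rounding applied

cv_miles = coefficient_of_variation(miles)
cv_kms = coefficient_of_variation(kms_exact)

# The two values match up to floating-point error
print('Same coefficient of variation:', math.isclose(cv_miles, cv_kms))
```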


#### Covariance: Which Direction 2 Datasets are Moving

$$
\text{COV}(X, Y) = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_i - \bar{x}) (y_i - \bar{y})
$$

COV > 0: The datasets tend to move in the same direction
<br />
COV < 0: The datasets tend to move in opposite directions
<br />
COV = 0: The datasets have no linear relationship


In [29]:
def covariance(x, y):
    if (len(x) != len(y)):
        return None
        
    mean_x = mean(*x)
    mean_y = mean(*y)
    numerator = 0
    for i in range(0, len(x)):
        numerator += (x[i] - mean_x) * (y[i] - mean_y)
    return numerator / (len(x) - 1)


market_cap = [1532, 1488, 1343, 928, 615]
earnings = [58, 35, 75, 41, 17]
print('Covariance: ', covariance(market_cap, earnings))

    

Covariance:  5803.200000000001
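
To make the sign of the covariance concrete, here is a minimal, self-contained sketch. The function `sample_covariance` and the datasets `hours`, `rising`, and `falling` are made up for illustration and are not part of the original tutorial.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(x) != len(y):
        raise ValueError('x and y must have the same length')
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum the products of the paired deviations from each mean
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (len(x) - 1)

hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # grows as hours grows
falling = [10, 8, 6, 4, 2]   # shrinks as hours grows

cov_rising = sample_covariance(hours, rising)
cov_falling = sample_covariance(hours, falling)

print('Moving together: ', cov_rising)    # positive
print('Moving opposite: ', cov_falling)   # negative
```

The magnitude depends on the units of both datasets, so only the sign is easy to interpret on its own.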


#### Correlation Coefficient: How Dependent Datasets are to Each Other
$$
r = \frac{COV(X,Y)}{S(X) \cdot S(Y)}, \qquad -1 \le r \le 1
$$

r = 1: Perfect positive correlation
<br />
The closer r is to 1, the stronger the direct relationship between the two datasets
<br />
The closer r is to -1, the stronger the inverse relationship between the two datasets
<br />
r = 0: No linear correlation

In [31]:
def correlation_coefficient(x, y):
    if (len(x) != len(y)):
        return None
        
    return covariance(x, y) / (standard_deviation(*x) * standard_deviation(*y))

print('Correlation Coefficient: ', correlation_coefficient(market_cap, earnings))

Correlation Coefficient:  0.660125602195931
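
One caveat when reading r: it measures only linear relationships, so a value of 0 does not by itself guarantee that the datasets are unrelated. The sketch below illustrates this with a made-up dataset where `y` is fully determined by `x` yet r comes out as 0. The helper `pearson_r` is hypothetical and written for this example only.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient; the 1 / (n - 1) factors cancel out."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]
squared = [v ** 2 for v in x]     # depends on x, but not linearly
linear = [2 * v + 1 for v in x]   # exact straight-line relationship

r_squared = pearson_r(x, squared)
r_linear = pearson_r(x, linear)

print('r for y = x^2:   ', r_squared)
print('r for y = 2x + 1:', r_linear)
```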


## Part 2
https://www.youtube.com/watch?v=ger_Won5sRQ&list=PLGLfVvz_LVvQjNJr85J4U_lxDg8vgqvcO&index=2&ab_channel=DerekBanas