# Statistics Fundamentals 

_October 27, 2020_

Agenda today:
- Measures of central tendency: mean, median, mode
- Measures of dispersion: variance, standard deviation
- Measures of relationship: covariance and correlation

In [61]:
import numpy as np
import matplotlib.pyplot as plt

## Part I. Mean, Median, and Mode
What are the definitions of these three measures?
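As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module. The variable names are only illustrative, and the values are the same ones used in the following cell, repeated here so the block runs on its own.

```python
import statistics

# same values as the array in the next cell, repeated so this block is self-contained
values = [10, 11, 11, 12, 11, 13, 14, 16, 17, 18, 19, 20, 22, 24, 26, 22, 24]

mean = statistics.mean(values)        # sum of the values divided by how many there are
median = statistics.median(values)    # middle value once the data is sorted
modes = statistics.multimode(values)  # most frequent value(s), returned as a list

print(mean, median, modes)
```

`multimode` (Python 3.8+) is used rather than `mode` so that ties between equally frequent values are all reported.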

In [62]:
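array = [10,11,11,12,11,13,14,16,17,18,19,20,22,24,26,22,24]
# plot it out and examine it
plt.style.use('fivethirtyeight')
# assumption: the plotting call is missing here; a histogram is one reasonable way to examine the values
plt.hist(array)
plt.show()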

What is the above plot called? What kind of values can it be used to represent?

## Part II. Measure of Dispersion
The two measures of dispersion we will be concerned with are **variance** and **standard deviation**. Both measure the variability of a dataset. Why might we need a measure of variability in addition to central tendency?
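One way to approach that question is to compare two datasets that share a mean but differ in spread. The two small lists below are made-up values chosen only for illustration.

```python
import numpy as np

# hypothetical data: both lists have a mean of 10
tight = [9, 10, 11]
wide = [0, 10, 20]

# the means are identical, so central tendency alone cannot tell the lists apart
print(np.mean(tight), np.mean(wide))

# variance and standard deviation show how differently the values are spread
print(np.var(tight), np.var(wide))
print(np.std(tight), np.std(wide))
```

The mean alone would describe these two lists as the same; the dispersion measures are what separate them.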

#### Variance calculation:
$$ \large \sigma^2 = \dfrac{1}{n}\displaystyle\sum^n_{i=1}(x_i-\mu)^2 $$

#### Standard deviation calculation:
$$ \large \sigma = \sqrt{\dfrac{1}{n}\displaystyle\sum^n_{i=1}(x_i-\mu)^2} $$

In [63]:
# exercises

# can you write a function that takes in an array and calculates its variance and standard deviation?
def calculate_variance(array):
    '''
    calculate the variance of an array
    '''
    n = len(array)
    sum1 = 0
    for i in range(n):
        sum1 = sum1 + array[i]
    mean = sum1/n
    sum2=0
    for i in range(n):
        sum2 = sum2 + (array[i] - mean)**2
    var = sum2/n
    return var
print(calculate_variance(array))
print(np.var(array))

26.52595155709342
26.52595155709342


In [64]:
def calculate_std(array):
    '''
    calculate the standard deviation of an array
    '''
    # ** binds tighter than /, so **1/2 would halve the variance; **0.5 takes the square root
    return calculate_variance(array)**0.5
print(round(calculate_std(array), 4))

5.1503


## Part III. Covariance and Correlation
Covariance and correlation both measure the degree to which two variables vary together.

#### Covariance calculation:
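$$Cov_{X,Y} = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(x_i -\mu_x)(y_i - \mu_y)$$

This is the population covariance: it divides by $n$, like the variance formula above. `np.cov` divides by $n-1$ by default (the sample covariance), as does the exercise below; pass `bias=True` to `np.cov` to divide by $n$ instead.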

#### Correlation calculation:
$$ r = \dfrac{Cov_{X,Y}}{\sigma_x \sigma_y} $$

<img src= 'https://raw.githubusercontent.com/learn-co-curriculum/dsc-correlation-covariance/master/images/correx.svg'>
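To make the two formulas concrete, here is a minimal sketch that evaluates them directly and checks the results against NumPy. The arrays `x` and `y` are made-up values, and `bias=True` is passed to `np.cov` so that it divides by $n$ as the formula does.

```python
import numpy as np

# hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# population covariance: mean of the products of deviations from each mean
cov_pop = np.mean((x - x.mean()) * (y - y.mean()))

# correlation: covariance divided by the product of the standard deviations
r = cov_pop / (x.std() * y.std())

print(cov_pop, np.cov(x, y, bias=True)[0, 1])
print(r, np.corrcoef(x, y)[0, 1])

# rescaling x multiplies the covariance by the same factor but leaves r unchanged
cov_scaled = np.cov(10 * x, y, bias=True)[0, 1]
r_scaled = np.corrcoef(10 * x, y)[0, 1]
print(cov_scaled, r_scaled)
```

The last two lines hint at why correlation is often easier to interpret: it does not depend on the units of either variable, while covariance does.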

In [65]:
## exercises

# write functions that calculate the covariance and correlation of two arrays
array2 = [11,12,13,14,13,12,11,14,15,16,11,19,13,19,22,28,22]

def calculate_covariance(array1, array2):
    '''
    calculate the 2x2 sample covariance matrix of two arrays
    (divides by n - 1, matching the default of np.cov)
    '''
    n = len(array1)
    mean1 = np.mean(array1)
    mean2 = np.mean(array2)
    sum_xx = 0
    sum_xy = 0
    sum_yy = 0
    for i in range(n):
        sum_xx += (array1[i] - mean1) * (array1[i] - mean1)
        sum_xy += (array1[i] - mean1) * (array2[i] - mean2)
        sum_yy += (array2[i] - mean2) * (array2[i] - mean2)
    # convert to plain floats so the nested list prints cleanly
    var_x = float(sum_xx / (n - 1))
    cov_xy = float(sum_xy / (n - 1))
    var_y = float(sum_yy / (n - 1))
    # rows and columns follow the same layout as np.cov
    return [[var_x, cov_xy], [cov_xy, var_y]]

print(calculate_covariance(array,array2))
print(np.cov(array,array2))

[[28.18382352941176, 18.713235294117645], [18.713235294117645, 23.38235294117647]]
[[28.18382353 18.71323529]
 [18.71323529 23.38235294]]


In [68]:
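def calculate_correlation(array1, array2):
    '''
    calculate the Pearson correlation of two arrays
    '''
    # the n (or n - 1) factors cancel in the ratio, so the sample covariance matrix works here
    cov = np.cov(array1, array2)
    return cov[0, 1] / (cov[0, 0] * cov[1, 1]) ** 0.5
print(round(calculate_correlation(array, array2), 8))
print(np.corrcoef(array, array2))

0.72896188
[[1.         0.72896188]
 [0.72896188 1.        ]]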
