# Data Science, Prediction, and Forecasting - Assignment 1

## Expectation, Variance, and Covariance

Briefly discuss:

- What is the intuition behind expectation, variance, and covariance? 
- What properties do they describe, and how do the equations line up with your explanations?
- What is the difference between covariance and correlation?
- How do you interpret a covariance matrix?
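
One way to see the difference between covariance and correlation is to rescale one variable. The sketch below uses made-up data (the names `a`, `b`, and the fixed seed are assumptions for illustration, not part of the assignment) and suggests that covariance changes with the units while correlation does not.

```python
import numpy as np

# Hypothetical example data; fixed seed so the run is repeatable
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = 2.0 * a + rng.normal(size=100)

# Covariance: mean product of deviations (depends on the units of a and b)
cov_ab = np.mean((a - a.mean()) * (b - b.mean()))

# Correlation: covariance rescaled by both standard deviations
corr_ab = cov_ab / (a.std() * b.std())

# Rescaling a changes the covariance but leaves the correlation unchanged
a_scaled = 100 * a
cov_scaled = np.mean((a_scaled - a_scaled.mean()) * (b - b.mean()))
corr_scaled = cov_scaled / (a_scaled.std() * b.std())

print(f"cov: {cov_ab:.3f}, corr: {corr_ab:.3f}")
print(f"cov (scaled): {cov_scaled:.3f}, corr (scaled): {corr_scaled:.3f}")
```

The hand-computed correlation can be compared against `np.corrcoef(a, b)[0, 1]`.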


Exercises: 

- Calculate the expectation of the vector _x_ in the code chunk below
- Implement functions for calculating variance and covariance (eq. 1.39 and 1.41)

In [1]:
# Numpy is the go-to library for anything involving vectors 
# and matrices in Python
import numpy as np

In [32]:
# sampling 10 random values from a normal distribution with mean 0 and sd 1
x = np.random.randn(10)
print(f"x: {x}")

x: [-0.55028918 -0.38581381  0.93777909 -0.15409486 -1.89389919 -0.20819534
  0.00486944 -0.60464846 -0.5540889  -1.93240368]


In [33]:

# calculate the expectation of x 
Ex = np.mean(x)
print(f"E(x): {Ex}")

E(x): -0.534078490124977


In [34]:

# calculate the expectation of x manually (sum divided by number of values)
Ex = sum(x)/len(x)
print(f"E(x): {Ex}")

E(x): -0.534078490124977


In [14]:
def var(x):
    "Calculate variance following eq. 1.39"
    return np.mean(x**2) - np.mean(x)**2 

# check if your results match numpy's built-in function
print(f"my var: {var(x)}")
print(f"np.var: {np.var(x)}")

my var: 0.41464244377689025
np.var: 0.4146424437768902


In [35]:
# sample 10 more random values
y = np.random.randn(10)
print(f"y: {y}")

y: [-1.15433257  0.42435208 -0.60877069 -0.26527399 -0.13635247 -0.21176816
  0.53403857  0.17117869  0.22072002  0.66646836]


In [23]:

def covar(x, y):
    "Calculate covariance following eq. 1.41"
    return np.mean(x*y) - np.mean(x)*np.mean(y)

# check if the result matches numpy
print(f"my covar: {covar(x, y)}")
# np.cov calculates the entire covariance matrix; [0][1] extracts the 
# covariance of the variables
print(f"np.cov: {np.cov(x, y, ddof=0)[0][1]}") 
# What does ddof=0 mean?

my covar: 0.059073644239441114
np.cov: 0.05907364423944109
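
As a hint for the `ddof` question, the sketch below compares dividing by `n` with dividing by `n - 1`. The data `u`, `v` and the seed are illustrative assumptions; the only library behaviour relied on is that `np.cov` normalises by `n - ddof` and uses `n - 1` when `ddof` is not given.

```python
import numpy as np

# Hypothetical data; fixed seed so the run is repeatable
rng = np.random.default_rng(1)
u = rng.normal(size=10)
v = rng.normal(size=10)
n = len(u)

# Sum of products of deviations from the means
s = np.sum((u - u.mean()) * (v - v.mean()))

cov_n = s / n                # matches ddof=0 (divide by n)
cov_n_minus_1 = s / (n - 1)  # matches np.cov's default (divide by n - 1)

print(np.isclose(cov_n, np.cov(u, v, ddof=0)[0, 1]))
print(np.isclose(cov_n_minus_1, np.cov(u, v)[0, 1]))
```

Note that `np.var` divides by `n` unless told otherwise, whereas `np.cov` divides by `n - 1`, which is why `ddof=0` is passed explicitly in the comparison above.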


In [24]:
# How do you interpret the covariance matrix? What does the diagonal describe?
np.cov(x, y, ddof=0)

array([[0.41464244, 0.05907364],
       [0.05907364, 1.10210914]])
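
As a hint for reading the matrix, the sketch below checks numerically which quantities sit where. The variables `p` and `q` and the seed are illustrative assumptions, not the `x` and `y` from the cells above.

```python
import numpy as np

# Hypothetical data; fixed seed so the run is repeatable
rng = np.random.default_rng(2)
p = rng.normal(size=50)
q = rng.normal(size=50)

C = np.cov(p, q, ddof=0)

# Diagonal entries: the variance of each variable on its own
print(np.allclose(np.diag(C), [np.var(p), np.var(q)]))

# Off-diagonal entries: the covariance between p and q (matrix is symmetric)
print(np.isclose(C[0, 1], C[1, 0]))
```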

Why do we care about covariance matrices? 
- They are useful for, e.g., specifying and understanding (multivariate) normal distributions! Play around with the first interactive figure in [this link](https://distill.pub/2019/visual-exploration-gaussian-processes/).
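
To make the link to normal distributions concrete, here is a minimal sketch that draws samples from a two-dimensional normal distribution with a chosen covariance matrix and then estimates that matrix back from the samples. The matrix `Sigma`, the sample size, and the seed are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

mean = np.zeros(2)
# Hypothetical covariance matrix: unit variances, covariance 0.8
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# Each row of `samples` is one draw of the two variables
samples = rng.multivariate_normal(mean, Sigma, size=5000)

# rowvar=False tells np.cov that the columns (not rows) are the variables
estimated = np.cov(samples, rowvar=False, ddof=0)
print(estimated)
```

With this many samples the estimate should land close to `Sigma`; try changing the off-diagonal value and see how a scatter plot of the samples changes.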

_Extra_: Implement the following two (possibly more intuitive) equations for variance and covariance:

$$var(x) =\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n}$$

$$cov(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n}$$
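
A short derivation of why these forms should agree with `var` and `covar` above, assuming eq. 1.39 and 1.41 are the "mean of the product minus the product of the means" forms used in that code, and writing $\bar{x}$ for the sample mean:

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^{2} = \frac{1}{n}\sum_{i=1}^{n}x_i^{2} - 2\bar{x}\,\frac{1}{n}\sum_{i=1}^{n}x_i + \bar{x}^{2} = \frac{1}{n}\sum_{i=1}^{n}x_i^{2} - \bar{x}^{2}$$

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum_{i=1}^{n}x_i y_i - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y} = \frac{1}{n}\sum_{i=1}^{n}x_i y_i - \bar{x}\bar{y}$$

Any difference between the two implementations should therefore come only from floating-point rounding.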

In [25]:
def var2(x):
    "Calculate variance"
    return sum((x - np.mean(x))**2)/len(x)

In [28]:
print(f"my var: {var(x)}")
print(f"my var2: {var2(x)}")
print(f"np.var: {np.var(x)}")

my var: 0.41464244377689025
my var2: 0.4146424437768902
np.var: 0.4146424437768902


In [26]:
def covar2(x, y):
    "Calculate covariance"
    return sum((x - np.mean(x))*(y - np.mean(y)))/len(x)

In [31]:
print(f"my covar: {covar(x, y)}")
print(f"my covar2: {covar2(x, y)}")
print(f"np.cov: {np.cov(x, y, ddof=0)[0][1]}")

my covar: 0.059073644239441114
my covar2: 0.05907364423944108
np.cov: 0.05907364423944109


Done already? Take a stab at any other exercises from the book!