### Data Science, Week 1

# Model comparison and covariance

__Goal__: Learning to turn equations into code


## Model comparison

In your groups, briefly discuss:

- What are some methods to do model comparison?
- List some pros/cons of each method
- What can we do to increase the generalizability of our models?

Exercises:
- Implement the sum of squares and ridge regression loss functions (Bishop, p. 10)
- How do the loss functions behave as the weights _w_ increase or decrease? 
- Extra: implement the [__lasso__ loss function](https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c) 


In [1]:
# NumPy is the go-to library for anything involving vectors 
# and matrices in Python
import numpy as np

In [2]:
# Vector of true values
y = np.array([1, 2, 3, 4, 5, 6])

# For simplicity, let's assume we just have a vector of predicted values yhat 
# instead of evaluating the expression y(x_n, w) in eq. 1.4
yhat = np.array([1, 2, 2.8, 3.7, 5.2, 6])

In [3]:
# Write a function calculating the sum of squares
def ss(y, yhat):
    return sum((y - yhat)**2)

In [4]:
# Ridge regression loss needs two more parameters, the weights w and 
# the lambda regularization weight
w = np.array([0.5, 1.5]) # Setting two arbitrary weights (betas in a linear regression)
lamb = 0.2 # Arbitrary lambda > 0 (negative values are not allowed: they would reward large weights)

In [5]:
# Implement the ridge regression loss function
def ridge(y, yhat, w, lamb):
    # Three equivalent ways to compute the penalty term; the first one is active
    return 0.5*(ss(y, yhat) + lamb*ss(w, 0))
    # return 0.5*(ss(y, yhat) + lamb*sum(w**2))
    # return 0.5*(ss(y, yhat) + lamb*sum(w*w))

In [6]:
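# Pass the arguments in the order the functions declare them: (y, yhat).
# The squared error is symmetric, so the printed values are unaffected.
print(f"sum of squares: {ss(y, yhat)}")
print(f"ridge: {ridge(y, yhat, w, lamb)}")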

sum of squares: 0.17000000000000004
ridge: 0.335
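The cells above use a fixed `yhat`, so the weights only enter through the penalty term. As a rough sketch of how the weights would also produce the predictions, the example below evaluates a straight-line model with `np.polynomial.polynomial.polyval`. The data (`x_in`, `t`), the weights `w_fit`, and the function names `sum_of_squares` and `ridge_loss` are made up for illustration and are not part of the exercise.

```python
import numpy as np

def sum_of_squares(y, yhat):
    """Sum of squared differences between targets and predictions."""
    return np.sum((y - yhat) ** 2)

def ridge_loss(y, yhat, w, lamb):
    """Half the sum of squares plus half of lambda times the squared weight norm."""
    return 0.5 * (sum_of_squares(y, yhat) + lamb * np.sum(w ** 2))

# Hypothetical toy data: the targets lie exactly on the line t = 1 + 2*x
x_in = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])

# Weights ordered from lowest to highest degree: w0 + w1*x
w_fit = np.array([1.0, 2.0])

# Predictions computed from the weights instead of being given directly
pred = np.polynomial.polynomial.polyval(x_in, w_fit)

print(f"sum of squares: {sum_of_squares(t, pred)}")
print(f"ridge:          {ridge_loss(t, pred, w_fit, 0.2)}")
```

Even though the line fits the toy data perfectly, the ridge loss is not zero: the penalty term still charges for the size of the weights.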


In [7]:
# What happens to the loss if you make the weights in w larger or smaller?
w = 2*w
print(f"ridge (w doubled): {ridge(y, yhat, w, lamb)}")

w = w/4 # a quarter of the doubled value, i.e. half the original
print(f"ridge (w halved): {ridge(y, yhat, w, lamb)}")
w = 2*w # reset to original value

# What happens if you increase or decrease lambda?
lamb = 2*lamb
print(f"ridge (lambda doubled): {ridge(y, yhat, w, lamb)}")

lamb = lamb/4 # a quarter of the doubled value, i.e. half the original
print(f"ridge (lambda halved): {ridge(y, yhat, w, lamb)}")
lamb = 2*lamb # reset to original value

ridge (w doubled): 1.085
ridge (w halved): 0.14750000000000002
ridge (lambda doubled): 0.585
ridge (lambda halved): 0.21000000000000002
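For the extra exercise, one possible sketch of a lasso loss is shown below. It swaps the squared weights of ridge for absolute values. The scaling of the two terms (a factor 0.5 on the error, none on the penalty) is an assumption, since conventions differ between sources. The names `lasso_loss`, `y_true`, `y_pred`, and `w_demo` are chosen here so that nothing defined earlier is overwritten.

```python
import numpy as np

def lasso_loss(y, yhat, w, lamb):
    """Half the sum of squared errors plus lambda times the L1 norm of w.

    The 0.5 factor on the error term is an assumed convention.
    """
    return 0.5 * np.sum((y - yhat) ** 2) + lamb * np.sum(np.abs(w))

# Same numbers as in the ridge cells above, under different names
y_true = np.array([1, 2, 3, 4, 5, 6])
y_pred = np.array([1, 2, 2.8, 3.7, 5.2, 6])
w_demo = np.array([0.5, 1.5])

print(f"lasso:             {lasso_loss(y_true, y_pred, w_demo, 0.2)}")
print(f"lasso (w doubled): {lasso_loss(y_true, y_pred, 2 * w_demo, 0.2)}")

# Compare how the two penalties react when the weights are doubled
l1_ratio = np.sum(np.abs(2 * w_demo)) / np.sum(np.abs(w_demo))
l2_ratio = np.sum((2 * w_demo) ** 2) / np.sum(w_demo ** 2)
print(f"L1 penalty grows by a factor of {l1_ratio}, L2 by a factor of {l2_ratio}")
```

Doubling the weights doubles the L1 penalty but quadruples the L2 penalty, so ridge punishes large weights more heavily than lasso does.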


## Expectation, variance, and covariance

Briefly discuss:

- What is the intuition behind expectation, variance, and covariance? 
- What properties do they describe; how do the equations line up with your explanations?
- What is the difference between covariance and correlation?
- How do you interpret a covariance matrix?


Exercises: 

- Calculate the expectation of the vector _x_ in the code chunk below
- Implement functions for calculating variance and covariance (eq. 1.39 and 1.42)

In [8]:
# sampling 10 random values from a normal distribution with mean 0 and sd 1
x = np.random.randn(10)
print(x)

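# calculate the expectation of x 
## -> x was sampled from a standard Gaussian, whose expectation is 0.
## -> The sample mean, np.mean(x), only estimates this value and will generally
##    not be exactly 0 for a finite sample.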

[-0.22296133 -0.8088917   0.16412208 -0.47816916  0.43129079  0.9476529
 -0.23414641 -0.59098412 -1.42949745 -1.40072738]
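To make the difference between the expectation of the distribution and the average of a sample concrete, here is a small sketch. The seed and the sample sizes are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator so the numbers are reproducible (the seed is arbitrary)
rng = np.random.default_rng(seed=0)

small = rng.standard_normal(10)       # same sample size as x above
large = rng.standard_normal(100_000)  # a much larger sample

# The sample mean is the usual estimate of the expectation
print(f"mean of 10 samples:      {np.mean(small)}")
print(f"mean of 100,000 samples: {np.mean(large)}")
```

With only 10 values the sample mean can land fairly far from 0; with many values it should settle close to the true expectation.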


In [9]:
def var(x):
    "Calculate variance following eq. 1.39"
    return np.mean(x**2) - np.mean(x)**2

# check if your results match numpy's built in function
print(f"my var: {var(x)}")
print(f"np.var: {np.var(x)}")

my var: 0.5141129737505766
np.var: 0.5141129737505766


In [10]:
# sample 10 more random values
y = np.random.randn(10)

def covar(x, y):
    "Calculate covariance following eq. 1.42"
    # For 1-D arrays .T is a no-op; it is kept only to mirror the equation's notation
    return np.mean(x*y.T) - np.mean(x)*np.mean(y.T)

# check if the result matches numpy
print(f"my covar: {covar(x, y)}")
# np.cov calculates the entire covariance matrix; [0][1] extracts the 
# covariance between x and y
print(f"np.cov: {np.cov(x, y, ddof=0)[0][1]}") 
# What does ddof=0 mean?
## -> ddof=0 divides by N, like covar above, which averages over all N values.
##    Without it, np.cov divides by N - 1 by default.
## -> See https://numpy.org/doc/stable/reference/generated/numpy.cov.html

my covar: -0.15638134484943925
np.cov: -0.15638134484943922
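The effect of `ddof` can be checked directly. The sketch below uses freshly sampled vectors `a` and `b` (hypothetical names, arbitrary seed) so that `x` and `y` from the cells above are left untouched.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
a = rng.standard_normal(10)
b = rng.standard_normal(10)
n = a.size

# ddof=0: divide the summed products of deviations by n
cov_pop = np.cov(a, b, ddof=0)[0, 1]

# default: divide by n - 1
cov_sample = np.cov(a, b)[0, 1]

# The same quantity written out by hand, dividing by n
manual = np.mean((a - a.mean()) * (b - b.mean()))

print(f"ddof=0:  {cov_pop}")
print(f"default: {cov_sample}")
print(f"manual:  {manual}")
print(f"default / ddof=0 = {cov_sample / cov_pop}, n / (n - 1) = {n / (n - 1)}")
```

The two versions differ only by the constant factor n / (n - 1), which matters for small samples and fades as n grows.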


In [11]:
# How do you interpret the covariance matrix? What does the diagonal describe?
## -> The diagonal contains the *variances* of the individual vectors (here x and y)
## -> The off-diagonal elements are symmetric and describe how a pair of different vectors covaries
np.cov(x, y, ddof=0)

array([[ 0.51411297, -0.15638134],
       [-0.15638134,  2.23681511]])
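One way to explore the discussion question about covariance versus correlation: correlation is the covariance divided by the two standard deviations, which removes the units. The sketch below builds a correlation matrix from a covariance matrix and compares it with `np.corrcoef`. The vectors `a` and `b` and the factor 100 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed
a = rng.standard_normal(50)
b = 0.5 * a + rng.standard_normal(50)  # b depends partly on a

cov_mat = np.cov(a, b, ddof=0)

# Standard deviations are the square roots of the diagonal (the variances)
sd = np.sqrt(np.diag(cov_mat))

# Divide every entry cov[i, j] by sd[i] * sd[j]
corr_mat = cov_mat / np.outer(sd, sd)

print(corr_mat)
print(np.corrcoef(a, b))

# Changing the units of a (say, metres to centimetres) rescales the covariance
# but leaves the correlation unchanged
cov_scaled = np.cov(100 * a, b, ddof=0)
corr_scaled = np.corrcoef(100 * a, b)
print(f"covariance:  {cov_mat[0, 1]} -> {cov_scaled[0, 1]}")
print(f"correlation: {corr_mat[0, 1]} -> {corr_scaled[0, 1]}")
```

The diagonal of the correlation matrix is always 1, and the off-diagonal entries are bounded between -1 and 1, which makes them easier to compare across variable pairs than raw covariances.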

Why do we care about covariance matrices? 
- Useful for e.g. calculating and understanding normal distributions! Play around with the first interactive figure in [this link](https://distill.pub/2019/visual-exploration-gaussian-processes/).
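As a small illustration of that link, the sketch below draws samples from a two-dimensional normal distribution defined by a mean vector and a covariance matrix, then checks that the empirical covariance of the samples approximates the matrix that was put in. The mean, the covariance values, the seed, and the sample size are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed

mean = np.zeros(2)
# Variances of 1 on the diagonal, covariance of 0.8 between the two variables
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# Each row of `samples` is one draw; each column is one variable
samples = rng.multivariate_normal(mean, cov, size=50_000)

# rowvar=False tells np.cov that the variables are in the columns
emp_cov = np.cov(samples, rowvar=False, ddof=0)

print(emp_cov)
```

Try changing the off-diagonal value (keeping the matrix symmetric and positive semi-definite) and see how the empirical covariance follows.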

_Extra_: Implement the following two (possibly more intuitive) equations for variance and covariance

$$var(x) =\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n}$$

$$cov(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n}$$
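A possible solution sketch for the two equations above, checked against NumPy. The function names `var_centered` and `covar_centered` and the test vectors `a` and `b` are chosen here for illustration.

```python
import numpy as np

def var_centered(x):
    """Mean squared deviation from the mean (divides by n)."""
    return np.sum((x - np.mean(x)) ** 2) / len(x)

def covar_centered(x, y):
    """Mean product of the deviations of x and y from their means (divides by n)."""
    return np.sum((x - np.mean(x)) * (y - np.mean(y))) / len(x)

rng = np.random.default_rng(seed=4)  # arbitrary seed
a = rng.standard_normal(10)
b = rng.standard_normal(10)

print(f"var_centered:   {var_centered(a)}")
print(f"np.var:         {np.var(a)}")
print(f"covar_centered: {covar_centered(a, b)}")
print(f"np.cov:         {np.cov(a, b, ddof=0)[0, 1]}")
```

Note that the covariance of a vector with itself is its variance, so `covar_centered(a, a)` should agree with `var_centered(a)`.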

Done already? Take a stab at some probability theory exercises from [here](https://www.math.kth.se/matstat/gru/sf1901/TCOMK/exercises.pdf).