<a href="https://colab.research.google.com/github/DavideScassola/PML2024/blob/main/Notebooks/02_numpy_pandas_sklearn/021_numpy_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numpy exercises

Probabilistic Machine Learning -- Spring 2024, UniTS

### <ins>No loops allowed!</ins>
(List comprehensions are not allowed either.)

## Exercise 1

Compute the empirical correlation coefficient between the given observations `x` and `y`:

$$\rho[x,y] = \frac{cov[x,y]}{\sigma_x \sigma_y}$$ 

In [1]:
import numpy as np
np.random.seed(0)
x = np.random.normal(0, 1, 1000)
y = 2*x + np.random.normal(0, 1, 1000)

### Solution

In [2]:
(np.mean(x*y) - np.mean(x)*np.mean(y))/(np.std(x)*np.std(y))

0.8951815276982953
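
As a sanity check (not part of the original solution), the same quantity can be compared against NumPy's built-in `np.corrcoef`, which returns the 2x2 correlation matrix of the two arrays. The variable names `rho_manual` and `rho_numpy` are illustrative.

```python
import numpy as np

# Same data as in the exercise
np.random.seed(0)
x = np.random.normal(0, 1, 1000)
y = 2*x + np.random.normal(0, 1, 1000)

# cov[x, y] = E[xy] - E[x]E[y]; np.std uses the population convention (ddof=0)
rho_manual = (np.mean(x*y) - np.mean(x)*np.mean(y)) / (np.std(x)*np.std(y))

# Off-diagonal entry of the 2x2 correlation matrix
rho_numpy = np.corrcoef(x, y)[0, 1]

print(np.isclose(rho_manual, rho_numpy))
```

The two values agree because the normalization factor (`1/n` or `1/(n-1)`) cancels between numerator and denominator.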

## Exercise 2

Define a function that computes the mean squared error between an array of observations `y` and an array of predictions `pred`:

$$MSE(y,\hat{y}) := \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

In [3]:
np.random.seed(0)
y = np.random.normal(0, 1, 1000)
pred = y + np.random.normal(0, 0.4, 1000)

### Solution

In [4]:
def mse(pred, y):
    return np.mean((pred - y)**2)

mse(pred, y)

0.15000306805541663

## Exercise 3

Compute a matrix $M$ where $M_{i,j} := i \times j$ for $i \in \{1,\ldots,10\}$ and $j \in \{1,\ldots,10\}$

Hint: use broadcasting

### Solution

In [5]:
x = np.arange(1, 11)
x.reshape(-1,1) * x.reshape(1,-1)

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100]])

## Exercise 4

Given the following `(8,3)` matrix `x`, build a matrix `y` whose $i$-th row is $y_{i} = x_{2i} + x_{2i-1}$ (rows are indexed from 1, so $i \in \{1,\ldots,4\}$).

In [6]:
x = np.arange(24).reshape(8,3)
x

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])

### Solution

In [7]:
x[::2] + x[1::2]

array([[ 3,  5,  7],
       [15, 17, 19],
       [27, 29, 31],
       [39, 41, 43]])
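
An alternative sketch, offered only for comparison with the slicing solution above: reshaping the matrix to `(4, 2, 3)` groups consecutive pairs of rows, so summing over the middle axis adds each pair. The names `y_slices` and `y_reshape` are illustrative.

```python
import numpy as np

x = np.arange(24).reshape(8, 3)

# Slicing version: even-indexed rows plus odd-indexed rows (0-based)
y_slices = x[::2] + x[1::2]

# Reshape version: axis 1 of length 2 holds each pair of consecutive rows
y_reshape = x.reshape(4, 2, 3).sum(axis=1)

print(np.all(y_slices == y_reshape))
```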

## Exercise 5

Write a function that, given a scalar-valued function `f`, an interval `[a,b]` and a number of steps `n`, approximates $\int_a^b f(x) \ dx$ using the [trapezoidal rule](https://www.wikiwand.com/en/Trapezoidal_rule). Then test it on the function `g` given below over the interval `[-10,10]`.

Hint: use the function `np.linspace`

In [8]:
def g(x):
    return (x**3 - 2*x - x**5 * np.sin(x))*np.exp(-0.2*x**2)

def trapezoidal_rule(f, a, b, n):
    pass

### Solution
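
One possible solution, given as a minimal sketch rather than the official one. It assumes that `n` counts the sub-intervals (so the grid has `n + 1` nodes), which the exercise text does not pin down. On an equally spaced grid with step $h = (b-a)/n$, the composite rule is $h \left( \sum_k f(x_k) - \frac{f(a) + f(b)}{2} \right)$. The choice `n = 10_000` for the test is arbitrary.

```python
import numpy as np

def g(x):
    return (x**3 - 2*x - x**5 * np.sin(x)) * np.exp(-0.2 * x**2)

def trapezoidal_rule(f, a, b, n):
    # Assumption: n is the number of sub-intervals, hence n + 1 grid nodes
    x = np.linspace(a, b, n + 1)
    y = f(x)                      # f must accept NumPy arrays
    h = (b - a) / n               # width of each sub-interval
    # Interior nodes have weight 1, the two endpoints have weight 1/2
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

approx = trapezoidal_rule(g, -10, 10, 10_000)
print(approx)
```

Evaluating `f` once on the whole grid keeps the function free of explicit loops.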

## Exercise 6

Given the following `(1000, 5)` matrix `m`, representing a set of $n = 1000$ observations with `5` features, compute the empirical correlation matrix:
$$ R = \frac{1}{n} X^T X $$
where $X$ is the standardized data (each feature centered at 0 and rescaled to unit standard deviation).

Hint: use `.dot()` or `@` for matrix multiplication and `.T` for transposing

In [9]:
np.random.seed(0)
m = np.random.normal(0, 1, (1000, 5)).cumsum(axis=1)

### Solution
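
One possible solution, as a sketch under the assumption that "rescaled to 1" means dividing each column by its population standard deviation (`ddof=0`), which is the convention consistent with the $\frac{1}{n}$ factor in the formula. The comparison with `np.corrcoef` is an added check, not part of the exercise.

```python
import numpy as np

np.random.seed(0)
m = np.random.normal(0, 1, (1000, 5)).cumsum(axis=1)

n = m.shape[0]

# Standardize each column; broadcasting applies the (5,) statistics to every row
X = (m - m.mean(axis=0)) / m.std(axis=0)

# Empirical correlation matrix, shape (5, 5)
R = X.T @ X / n

# Added check: rowvar=False tells np.corrcoef that columns are the variables
print(np.allclose(R, np.corrcoef(m, rowvar=False)))
```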

## Exercise 7

Compute a series of 1000 coordinates $(x_i,y_i)$ such that
$(x_0,y_0) = (0,0)$ and $(x_{i+1},y_{i+1}) = (x_i + \epsilon^x_i ,y_i + \epsilon^y_i)$, where

$\epsilon^x_i \sim \mathcal{N}(0,1)$ and $\epsilon^y_i \sim \mathcal{N}(0, 1)$.

Then visualize it using the function `plot_random_walk` defined below, which takes an array of shape `(n,2)` as input.

Hint: use the function `np.cumsum` 

In [10]:
import matplotlib.pyplot as plt
plt.figure(dpi=400)

def plot_random_walk(x):
    plt.plot(x[:,0], x[:,1], alpha=0.8)

<Figure size 2400x1600 with 0 Axes>

### Solution
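
One possible solution, as a sketch. It assumes that the 1000 coordinates include the starting point $(0,0)$, so only 999 Gaussian increments are drawn. The seed and the names `steps` and `walk` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# 999 independent increments (eps_x, eps_y), each drawn from N(0, 1)
steps = rng.normal(0, 1, size=(999, 2))

# Positions are cumulative sums of the increments; prepend the origin
walk = np.concatenate([np.zeros((1, 2)), np.cumsum(steps, axis=0)], axis=0)

print(walk.shape)
```

Passing `walk` to `plot_random_walk` then draws the trajectory.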

## Exercise 8

Implement the [ReLU (rectified linear unit) activation function](https://www.wikiwand.com/en/Rectifier_(neural_networks)) in all three ways shown in the following formula:



<div style="background-color:white;">
    <img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/1b43bb9d7f851aa7c8ce8e4dacbd943d18512528" width="600">
</div>

Then apply it to the given array `x` and verify that the results are equal.

Hint: use `np.maximum`, `np.abs` and `np.where` to compute ReLU; use `==` and `np.all` to check equality.

In [11]:
x = np.arange(10)-5

### Solution
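
One possible solution, as a sketch. It assumes the three forms in the formula are $\max(0, x)$, $\frac{x + |x|}{2}$ and the piecewise definition ($x$ if $x > 0$, else $0$), which matches the functions named in the hint. The function names are illustrative.

```python
import numpy as np

x = np.arange(10) - 5

def relu_max(x):
    # Element-wise maximum between 0 and x
    return np.maximum(0, x)

def relu_abs(x):
    # (x + |x|) / 2 is x for positive entries and 0 otherwise (float result)
    return (x + np.abs(x)) / 2

def relu_where(x):
    # Piecewise definition: keep x where x > 0, otherwise 0
    return np.where(x > 0, x, 0)

all_equal = np.all(relu_max(x) == relu_abs(x)) and np.all(relu_abs(x) == relu_where(x))
print(all_equal)
```

The second form returns floats while the other two return integers here; `==` compares values, so the check still succeeds.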

## Exercise 9

The entropy of a discrete distribution $p$ is defined as:

$$\text{H}[p] := -\mathbb{E}[\log p(x)] = -\sum_x p(x) \log p(x)$$

Compute it for the given array `p` representing a discrete distribution, using the convention $0 \log 0 = 0$.

In [12]:
p = np.array([0.1, 0.2, 0.0, 0.2, 0.5])

### Solution
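
One possible solution, as a sketch. It assumes the natural logarithm (entropy in nats) and treats zero-probability outcomes as contributing nothing to the sum, which avoids evaluating `np.log(0)`. The names `support` and `entropy` are illustrative.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.0, 0.2, 0.5])

# Boolean mask selecting the outcomes with nonzero probability
support = p > 0

# H[p] = -sum_x p(x) log p(x), restricted to the support
entropy = -np.sum(p[support] * np.log(p[support]))
print(entropy)
```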

## Exercise 10

The following array `price`, with shape `(365, 7)`, contains the prices of 7 different assets recorded at the end of 365 different days. The array `portfolio`, of shape `(7,)`, contains the amount of each asset that you possess.

Compute:
- the total value of the portfolio $v_i$ at the end of each day
- the value difference between consecutive days $v_i - v_{i-1}$
- the value ratio between consecutive days $\frac{v_i}{v_{i-1}}$

Then plot (using `plt.plot`):
- the price series of the third asset
- $v_i$

In [13]:
import matplotlib.pyplot as plt
plt.figure(dpi=400)

def generate_prices(n):
    rng = np.random.default_rng(13)
    x = rng.normal(0,0.01,size=(n,7))
    x[:,0] = 0.3*x[:,0] + 0.8*x[:,1] + 0.3*x[:,2]
    x[:,4] = x[:,4] * 2
    x[:,5:] = x[:,5:] * 0.5
    x[:, 3] += 0.001
    x = np.exp(np.cumsum(x, axis=0))
    return x

price = generate_prices(365)

portfolio = np.array([12, 200, 100, 125, 50, 5, 100])

<Figure size 2400x1600 with 0 Axes>

### Solution
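
One possible solution for the numerical part, as a sketch. The block repeats `generate_prices` from the exercise so that it runs on its own, and it stores the result in a variable named `price` as in the exercise text. The names `value`, `value_diff` and `value_ratio` are illustrative.

```python
import numpy as np

def generate_prices(n):
    # Copied from the exercise cell
    rng = np.random.default_rng(13)
    x = rng.normal(0, 0.01, size=(n, 7))
    x[:, 0] = 0.3*x[:, 0] + 0.8*x[:, 1] + 0.3*x[:, 2]
    x[:, 4] = x[:, 4] * 2
    x[:, 5:] = x[:, 5:] * 0.5
    x[:, 3] += 0.001
    return np.exp(np.cumsum(x, axis=0))

price = generate_prices(365)
portfolio = np.array([12, 200, 100, 125, 50, 5, 100])

# Daily portfolio value: (365, 7) @ (7,) -> (365,)
value = price @ portfolio

# v_i - v_{i-1} for consecutive days, shape (364,)
value_diff = np.diff(value)

# v_i / v_{i-1} for consecutive days, shape (364,)
value_ratio = value[1:] / value[:-1]
```

For the plots, `plt.plot(price[:, 2])` draws the third asset (index 2 with 0-based indexing) and `plt.plot(value)` draws the portfolio value.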