# Practical session 2

In this session we are going to explore the concept of gradients and how they can be used to more efficiently find a model that is a good fit for the data.

A gradient describes how one quantity changes as another one changes. Imagine that you have two related quantities A and B. If A increases as B increases, then the _gradient of A with respect to B_ is positive. If A decreases as B increases, then the _gradient of A with respect to B_ is negative. Gradients also have a magnitude, which indicates how fast A increases (or decreases) as B increases.

Gradients are very useful because we have mathematical expressions for them, which means that we don't need to see the plot of A vs. B to know how A changes with respect to B. At a point where the gradient of A with respect to B is positive, increasing B will increase A. At a point where the gradient of A with respect to B is negative, increasing B will reduce A.
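As a small illustration (not part of the exercises), take the hypothetical relationship A = B², whose gradient with respect to B is 2B. The sketch below compares that expression with a numerical estimate obtained by nudging B slightly; the function names and the step size are assumptions made for this example.

```python
def A(B):
    """Hypothetical relationship between the two quantities: A = B squared."""
    return B ** 2


def gradient_of_A(B):
    """Mathematical expression for the gradient of A with respect to B."""
    return 2 * B


def numerical_gradient(f, B, step=1e-6):
    """Estimate the gradient by nudging B slightly in both directions."""
    return (f(B + step) - f(B - step)) / (2 * step)


# The sign says whether A rises or falls as B increases,
# the magnitude says how quickly.
for B in (-2.0, 0.5, 3.0):
    estimate = numerical_gradient(A, B)
    print(f"B = {B}: formula = {gradient_of_A(B)}, estimate = {estimate:.4f}")
```

At B = -2 the gradient is negative, so increasing B reduces A; at B = 3 it is positive and larger in magnitude, so A grows quickly as B increases.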

In the modelling world, gradients are central to finding good models automatically and efficiently. Think about it! If you define an error function for your "model family", you can calculate the gradient of the error with respect to your parameters. The sign of the gradient tells you in which direction to change your parameter to reduce the error, and the magnitude of the gradient tells you how fast the error will change!

The method of updating your parameters by moving in the opposite direction of the gradient is called _gradient descent_. It underpins many machine learning algorithms, including the training of neural networks.
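To make the idea concrete, here is a minimal sketch of gradient descent on a hypothetical one-parameter error function with its minimum at p = 3. The error function, the starting point, the learning rate (the size of each step) and the number of steps are all assumptions chosen for illustration.

```python
def error(p):
    """Hypothetical error function with its minimum at p = 3."""
    return (p - 3.0) ** 2


def error_gradient(p):
    """Gradient of the error with respect to the parameter p."""
    return 2.0 * (p - 3.0)


def gradient_descent(p_start, learning_rate=0.1, n_steps=50):
    """Repeatedly move p a small step against the gradient."""
    p = p_start
    for _ in range(n_steps):
        p = p - learning_rate * error_gradient(p)
    return p


p_final = gradient_descent(p_start=-5.0)
print(f"Final parameter: {p_final:.4f}, final error: {error(p_final):.6f}")
```

Each update subtracts the gradient scaled by the learning rate, so the parameter moves towards lower error whichever side of the minimum it starts on.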

Using gradient descent means that you are no longer limited to optimising only models that you can see in a plot. You can update every parameter in your model simultaneously and, as long as each step is small enough, expect the new error to be smaller than the previous one.


## Practical exercise 1: a simple correlation, with help

In this exercise we will become familiar with how gradient descent works. We will use the same weights and heights dataset as before. This time, however, we have two extra tools at our disposal.

The first one is the mean absolute error (MAE) that the model makes on the observed data. We will use the MAE to measure how bad our model is. The MAE is calculated as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_{\mathrm{pred},i} - y_{\mathrm{true},i} \right|$$
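The MAE takes the absolute value of each error before averaging, so that errors above and below the data do not cancel out. A minimal NumPy sketch, using made-up numbers and a hypothetical helper name, could look like this:

```python
import numpy as np


def mean_absolute_error(y_pred, y_true):
    """Average size of the prediction errors, ignoring their sign."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean(np.abs(y_pred - y_true))


# Made-up values, for illustration only.
y_true = [120.0, 130.0, 150.0]
y_pred = [125.0, 128.0, 150.0]

# The individual errors are 5, -2 and 0, so the absolute errors are 5, 2 and 0.
print(mean_absolute_error(y_pred, y_true))
```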

The second tool at our disposal will be the gradients of the MAE with respect to the slope and the cutoff point (the value the line takes when the height is 0).
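For a straight-line model, these gradients have a simple form: each data point contributes +1 or -1 depending on whether the model predicts too high or too low, weighted by the input value in the case of the slope. The sketch below works this out with NumPy on made-up data; the helper name is hypothetical, and the notebook cell itself obtains the gradients automatically rather than through this formula.

```python
import numpy as np


def mae_gradients(x, y_true, slope, cutoff):
    """Gradients of the MAE of the line slope * x + cutoff."""
    x = np.asarray(x, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    # +1 where the model predicts too high, -1 where it predicts too low.
    error_sign = np.sign(slope * x + cutoff - y_true)
    dslope = np.mean(error_sign * x)
    dcutoff = np.mean(error_sign)
    return dslope, dcutoff


# Made-up data, for illustration only.
x = np.array([60.0, 65.0, 70.0])
y = np.array([110.0, 130.0, 150.0])

dslope, dcutoff = mae_gradients(x, y, slope=1.0, cutoff=0.0)
print(f"Gradient for the slope: {dslope}, gradient for the cutoff: {dcutoff}")
```

Here every prediction is too low, so both gradients are negative: increasing either parameter would reduce the MAE.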

In the cell below you can set a slope and a cutoff point. Running the cell will plot your model and the data (same as before), and it will also print the MAE and the gradient of the MAE with respect to each parameter.

Your mission is to find a good slope and cutoff point, using the gradient information to update the parameters at each try. As you experiment, think about:

- What happens if you only update one parameter at a time?
- How low can you get the MAE? Can you get the MAE to 0?
- What happens to the gradients as your MAE decreases?

In [None]:
# Data reading, run only once

import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("heights_weights.csv")
height = data["Height(Inches)"].to_numpy()
weight = data["Weight(Pounds)"].to_numpy()

In [None]:
## -- Your parameters here -- ##

SLOPE = 
CUTOFF = 


## --- NO EDIT START --- ##

import numpy as np
import numpy.typing as npt
import tensorflow as tf

tf_slope = tf.Variable(float(SLOPE))
tf_cutoff = tf.Variable(float(CUTOFF))

with tf.GradientTape(persistent=True) as tape:
    y_vals = height * tf_slope + tf_cutoff
    mae = tf.reduce_mean(tf.abs(y_vals - weight))

grad = tape.gradient(mae, [tf_slope, tf_cutoff])
dslope = grad[0].numpy()
dcutoff = grad[1].numpy()

print(f"Your current mean absolute error is {mae}")
print(f"The gradient for the slope is {dslope}")
print(f"The gradient for the cutoff is {dcutoff}")
plt.figure(figsize=(5,5))
plt.scatter(height, weight, s=1, label="data")
plt.plot(height, y_vals, 'y', label="model")
plt.xlabel("Height [Inches]")
plt.ylabel("Weight [Pounds]")
plt.legend()

## --- NO EDIT END --- ##

## Practical exercise 2: another intergalactic experience

We will now repeat the second exercise from the previous session, but with a twist. This time we have changed planets (so the same parameters as before won't work) and we are throwing the ball, not just letting it fall. This means that the A_1 parameter will no longer be 0.

Your mission is to find which planet we are on now (i.e. what the gravity is) and the initial velocity at which we are throwing the ball. You have at your disposal the same tools as before: the mean absolute error and the gradients of that error with respect to each of the parameters.
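Once you have found good parameters, you can translate them into physical quantities. The sketch below assumes the standard constant-acceleration relation, distance(t) = d0 + v0 * t + 0.5 * g * t², with distance measured in the direction of the fall, so that A_1 plays the role of the initial velocity and A_2 is half the gravity. The helper name and the example numbers are made up for illustration.

```python
def physical_parameters(a_0, a_1, a_2):
    """Translate fitted polynomial coefficients into physical quantities.

    Assumes distance(t) = d0 + v0 * t + 0.5 * g * t**2.
    """
    return {
        "initial_distance": a_0,   # d0
        "initial_velocity": a_1,   # v0
        "gravity": 2.0 * a_2,      # g, because a_2 = 0.5 * g
    }


# Made-up coefficients, for illustration only.
print(physical_parameters(a_0=0.0, a_1=2.0, a_2=4.9))
```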

In [None]:
# Data reading, run only once
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("thrown_fall.csv")
time = data["time"].to_numpy()
distance_travelled = data["distance"].to_numpy()

In [None]:
## -- Your parameters here -- ##

A_0 = 
A_1 = 
A_2 = 


## --- NO EDIT START --- ##

import numpy as np
import tensorflow as tf

tf_a0 = tf.Variable(float(A_0))
tf_a1 = tf.Variable(float(A_1))
tf_a2 = tf.Variable(float(A_2))

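# Sort the observations by time, keeping each distance paired with its time
order = np.argsort(time)
sorted_time = time[order]
sorted_distance = distance_travelled[order]

with tf.GradientTape(persistent=True) as tape:
    y_vals = tf_a0 + tf_a1 * sorted_time + tf_a2 * sorted_time ** 2
    mae = tf.reduce_mean(tf.abs(y_vals - sorted_distance))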

grads = tape.gradient(mae, [tf_a0, tf_a1, tf_a2])

da0 = grads[0].numpy()
da1 = grads[1].numpy()
da2 = grads[2].numpy()

print(f"Your current mean absolute error is {mae}")
print(f"The gradient for a_0 is {da0}")
print(f"The gradient for a_1 is {da1}")
print(f"The gradient for a_2 is {da2}")

plt.figure(figsize=(5,5))
plt.scatter(time, distance_travelled, s=1, label="data")
plt.plot(sorted_time, y_vals, 'y', label="model")
plt.xlabel("Time [seconds]")
plt.ylabel("Distance Travelled [meters]")
plt.legend()

## --- NO EDIT END --- ##