### Exercises

#### Question 1

The accompanying file `data.csv` contains observations of some quantity `x` measured at time `t`.

Given this data, we want to calculate the rate of change of this value over time. We'll do this by taking two consecutive observations, say $x(t_i)$ and $x(t_{i+1})$, and approximating the rate of change using this formula:

$$
v(t_{i+1}) = \frac{x(t_{i+1}) - x(t_i)}{t_{i+1} - t_i}
$$

For example, if the data looks like this:

```
t     x
0.1   10
0.2   12
0.4   14
0.5   15
```

Then the first row of data would be considered $t_0$, the second row $t_1$, etc.

Writing $v_i$ for $v(t_i)$, we can start approximating the rate of change at $v_1$, which would be calculated as:

$$
v_1 = \frac{12 - 10}{0.2 - 0.1} = 20.0
$$

Similarly, $v_2$ would be calculated as:

$$
v_2 = \frac{14 - 12}{0.4 - 0.2} = 10.0
$$
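The two hand calculations above can be reproduced with NumPy. The following is a minimal sketch that uses only the four-row sample table (not the contents of `data.csv`); the array names `t`, `x` and `v` are chosen here for illustration. `np.diff` returns the differences between consecutive elements, so dividing the two difference arrays applies the formula to every pair of consecutive rows at once.

```python
import numpy as np

# Sample table from the example above (not the contents of data.csv)
t = np.array([0.1, 0.2, 0.4, 0.5])
x = np.array([10.0, 12.0, 14.0, 15.0])

# np.diff(a) is a[1:] - a[:-1], so v[0] corresponds to v_1, v[1] to v_2, ...
v = np.diff(x) / np.diff(t)

print(v)  # approximately [20. 10. 10.], up to floating-point rounding
```

The result has one element fewer than the input, since there is no earlier observation from which to compute a rate at $t_0$.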

Use NumPy arrays to create an array that holds the calculated rates of change and determine the minimum, maximum, average and standard deviation of the rate of change.
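As a rough illustration of the requested statistics, here is a sketch computed on the four-row sample table rather than on the real data. The name `rates` is hypothetical. Note that `np.std` computes the population standard deviation by default (`ddof=0`); the exercise does not say whether that or the sample standard deviation (`ddof=1`) is intended, so both are shown.

```python
import numpy as np

# Rates of change for the four-row sample table (illustration only)
t = np.array([0.1, 0.2, 0.4, 0.5])
x = np.array([10.0, 12.0, 14.0, 15.0])
rates = np.diff(x) / np.diff(t)

print("min: ", rates.min())
print("max: ", rates.max())
print("mean:", rates.mean())
print("std: ", rates.std())        # population standard deviation (ddof=0)
print("std: ", rates.std(ddof=1))  # sample standard deviation (ddof=1)
```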

In [18]:
import numpy as np
from csv import reader

with open("data.csv", "r") as f:
    content = reader(f)
    next(content)  # skip the first row (assumed to be the header)
    data = np.array(tuple(content)).astype(float)

data

array([[9.20000000e-02, 1.47656750e+01],
       [2.00000000e-01, 2.02592269e+01],
       [2.96000000e-01, 2.52463647e+01],
       [3.90000000e-01, 2.85919601e+01],
       [4.94000000e-01, 3.55838752e+01],
       [6.05000000e-01, 3.99240561e+01],
       [6.99000000e-01, 4.49001430e+01],
       [8.06000000e-01, 5.01119987e+01],
       [8.90000000e-01, 5.53374484e+01],
       [1.00300000e+00, 6.11368215e+01],
       [1.10900000e+00, 6.45004525e+01],
       [1.19500000e+00, 6.94338229e+01],
       [1.30400000e+00, 7.52142913e+01],
       [1.39400000e+00, 8.06802446e+01],
       [1.51000000e+00, 8.56904599e+01],
       [1.59600000e+00, 9.06736538e+01],
       [1.69900000e+00, 9.42614600e+01],
       [1.80100000e+00, 1.00150181e+02],
       [1.89300000e+00, 1.04022361e+02],
       [2.00600000e+00, 1.10574714e+02],
       [2.09800000e+00, 1.14100175e+02],
       [2.19300000e+00, 1.20511360e+02],
       [2.30200000e+00, 1.25753567e+02],
       [2.40200000e+00, 1.29984891e+02],
       [2.508000

#### Question 2

In linear regression we try to find the coefficients `m` (slope) and `c` (y-intercept) of a straight line

$$
y = mx + c
$$

that provides the "best" fit given some `x` and `y` data. This formula then allows us to predict `y` values for given `x` values.

Given an array of `n` `(x, y)` data pairs, these coefficients can be calculated very simply.

A bit of terminology first:

- Let `X` mean the column of `x` values.
- Let `Y` mean the column of `y` values.
- Let `XX` mean a column calculated by multiplying each `x` in the `X` column by itself.
- Let `XY` mean a column calculated by multiplying (pairwise) the `x` and `y` values from the `X` and `Y` columns.

Then, given some column (say `X`), this symbol: $\sum{X}$ means the sum of all the elements in the column.

Similarly, the symbol $\sum{XY}$ means the sum of the values obtained by multiplying (pairwise) the values in `X` and `Y`.
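In NumPy terms, these columns and sums map onto element-wise multiplication and `.sum()`. The sketch below uses made-up values purely for illustration; the variable names simply mirror the terminology above.

```python
import numpy as np

# Made-up columns, for illustration only
X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 5.0, 6.0])

XX = X * X  # each x multiplied by itself: 1, 4, 9
XY = X * Y  # pairwise products of x and y: 4, 10, 18

print(X.sum(), Y.sum(), XX.sum(), XY.sum())  # 6.0 15.0 14.0 32.0
```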

Given those definitions, the formulas for calculating the "best" values of `m` and `c` are given by:

$$
m = \frac{n\sum{XY} - \sum{X}\sum{Y}}{n\sum{XX} - (\sum{X})^2}
$$

$$
c = \frac{\sum{Y}\sum{XX} - \sum{X}\sum{XY}}{n\sum{XX} - (\sum{X})^2}
$$

(where `n` is the number of `(x,y)` pairs in our data set.)
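One way to check an implementation of these formulas is to feed it points that lie exactly on a known line. The sketch below does that with made-up data on $y = 2x + 1$; the function name `fit_line` is hypothetical, and `np.polyfit` is used only as an independent cross-check, not as the method the exercise asks for.

```python
import numpy as np

def fit_line(X, Y):
    """Return (m, c) for y = m*x + c using the sum formulas above."""
    n = len(X)
    sum_x, sum_y = X.sum(), Y.sum()
    sum_xx, sum_xy = (X * X).sum(), (X * Y).sum()
    denom = n * sum_xx - sum_x ** 2  # denominator shared by m and c
    m = (n * sum_xy - sum_x * sum_y) / denom
    c = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return m, c

# Made-up points lying exactly on y = 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([1.0, 3.0, 5.0, 7.0])

m, c = fit_line(X, Y)
print(m, c)  # 2.0 1.0

# Cross-check: a degree-1 least-squares fit returns [slope, intercept]
print(np.polyfit(X, Y, 1))
```

Both formulas share the same denominator, so it is computed once and reused.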

Using the same data as in Question 1, calculate the values of `m` and `c` with the formulas above.

You can think of the `t` column in the data as the `X` column, and the `x` column in the data as the `Y` column: we are trying to predict the value of `x` given a value of `t`.

This will result in a straight line that "best" fits through the data.

Compare the slope of this regression line to the average rate of change you calculated in Question 1.
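To get a feel for this comparison before running it on `data.csv`, here is a sketch on the four-row sample table from Question 1. It uses `np.polyfit` as a stand-in for the sum formulas (a swapped-in shortcut, not the approach the exercise asks for), and the names `slope`, `intercept` and `mean_rate` are hypothetical.

```python
import numpy as np

# Four-row sample table from Question 1 (not the contents of data.csv)
t = np.array([0.1, 0.2, 0.4, 0.5])
x = np.array([10.0, 12.0, 14.0, 15.0])

# Average of the consecutive-difference rates of change
mean_rate = (np.diff(x) / np.diff(t)).mean()

# Least-squares line through the same points: x is approximately slope * t + intercept
slope, intercept = np.polyfit(t, x, 1)

print("mean rate of change:", mean_rate)
print("regression slope:   ", slope)
```

On this small table the two numbers differ (roughly 13.3 versus 12.0), so the two quantities are related but not necessarily equal.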

In [19]:
import numpy as np
from csv import reader

with open("data.csv", "r") as f:
    content = reader(f)
    next(content)  # skip the first row (assumed to be the header)
    data = np.array(tuple(content)).astype(float)

data

array([[9.20000000e-02, 1.47656750e+01],
       [2.00000000e-01, 2.02592269e+01],
       [2.96000000e-01, 2.52463647e+01],
       [3.90000000e-01, 2.85919601e+01],
       [4.94000000e-01, 3.55838752e+01],
       [6.05000000e-01, 3.99240561e+01],
       [6.99000000e-01, 4.49001430e+01],
       [8.06000000e-01, 5.01119987e+01],
       [8.90000000e-01, 5.53374484e+01],
       [1.00300000e+00, 6.11368215e+01],
       [1.10900000e+00, 6.45004525e+01],
       [1.19500000e+00, 6.94338229e+01],
       [1.30400000e+00, 7.52142913e+01],
       [1.39400000e+00, 8.06802446e+01],
       [1.51000000e+00, 8.56904599e+01],
       [1.59600000e+00, 9.06736538e+01],
       [1.69900000e+00, 9.42614600e+01],
       [1.80100000e+00, 1.00150181e+02],
       [1.89300000e+00, 1.04022361e+02],
       [2.00600000e+00, 1.10574714e+02],
       [2.09800000e+00, 1.14100175e+02],
       [2.19300000e+00, 1.20511360e+02],
       [2.30200000e+00, 1.25753567e+02],
       [2.40200000e+00, 1.29984891e+02],
       [2.508000