### Exercises

#### Question 1

The accompanying file `data.csv` contains observations of a quantity `x` recorded at times `t`.

Given this data, we want to calculate the rate of change of this value over time. We do this by taking two consecutive observations, say $x(t_i)$ and $x(t_{i+1})$, and approximating the rate of change with this formula:

$$
v(t_{i+1}) = \frac{x(t_{i+1}) - x(t_i)}{t_{i+1} - t_i}
$$

For example, if the data looks like this:

```
t     x
0.1   10
0.2   12
0.4   14
0.5   15
```

Then the first row of data would be considered $t_0$, the second row $t_1$, etc.

We can start approximating the rate of change at $v_1$ (writing $v_i$ as shorthand for $v(t_i)$), which would be calculated as:

$$
v_1 = \frac{12 - 10}{0.2 - 0.1} = 20.0
$$

Similarly, $v_2$ would be calculated as:

$$
v_2 = \frac{14 - 12}{0.4 - 0.2} = 10.0
$$

Use NumPy arrays to create an array that holds the calculated rates of change and determine the minimum, maximum, average and standard deviation of the rate of change.


#### Solution

In [6]:
import numpy as np
import pandas as pd

# Read the data from the CSV file
data = pd.read_csv('data.csv')

# Get the 't' and 'x' columns from the data
t_values = data['t'].values
x_values = data['x'].values

# Calculate the rates of change
v_values = np.diff(x_values) / np.diff(t_values)

# Calculate the statistics of the rates of change
minimum = np.min(v_values)
maximum = np.max(v_values)
average = np.mean(v_values)
std_deviation = np.std(v_values)

print("Minimum rate of change:", minimum)
print("Maximum rate of change:", maximum)
print("Average rate of change:", average)
print("Standard deviation of rate of change:", std_deviation)


Minimum rate of change: 29.42739859222208
Maximum rate of change: 69.07300506151955
Average rate of change: 49.98125178748103
Standard deviation of rate of change: 9.043463532187475
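
As a quick sanity check, the same `np.diff` approach can be run on the four-row sample table from the question, where the expected rates are known by hand. This is a minimal sketch that uses only the sample values, not the contents of `data.csv`; the variable names `t`, `x` and `v` are chosen here for illustration.

```python
import numpy as np

# Sample table from the question statement (not the contents of data.csv)
t = np.array([0.1, 0.2, 0.4, 0.5])
x = np.array([10.0, 12.0, 14.0, 15.0])

# np.diff(a)[i] is a[i+1] - a[i], so this yields v_1, v_2, v_3
v = np.diff(x) / np.diff(t)
print(v)  # approximately [20, 10, 10], up to floating-point rounding

# There is always one fewer rate of change than there are observations
print(len(v) == len(t) - 1)

# np.std defaults to the population standard deviation (ddof=0);
# passing ddof=1 gives the sample standard deviation instead
print(np.std(v), np.std(v, ddof=1))
```

The first two values match $v_1$ and $v_2$ from the worked example in the question. Whether the population or the sample standard deviation is wanted depends on how the exercise is interpreted; the solution above uses the NumPy default.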


#### Question 2

In linear regression we try to find the coefficients `m` (slope) and `c` (y-intercept) of a straight line

$$
y = mx + c
$$

that provides the "best" fit given some `x` and `y` data. This formula then allows us to predict `y` values for given `x` values.

Given an array of `n` `(x, y)` data pairs, these coefficients can be calculated very simply.

A bit of terminology first:

- Let `X` mean the column of `x` values.
- Let `Y` mean the column of `y` values.
- Let `XX` mean a column calculated by multiplying each `x` in the `X` column by itself.
- Let `XY` mean a column calculated by multiplying (pairwise) the `x` and `y` values from the `X` and `Y` columns.

Then, given some column (say `X`), this symbol: $\sum{X}$ means the sum of all the elements in the column.

Similarly, the symbol $\sum{XY}$ means the sum of the values obtained by multiplying (pairwise) the values in `X` and `Y`.

Given those definitions, the formulas for calculating the "best" values of `m` and `c` are given by:

$$
m = \frac{n\sum{XY} - \sum{X}\sum{Y}}{n\sum{XX} - (\sum{X})^2}
$$

$$
c = \frac{\sum{Y}\sum{XX} - \sum{X}\sum{XY}}{n\sum{XX} - (\sum{X})^2}
$$

(where `n` is the number of `(x,y)` pairs in our data set.)
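
To illustrate how the sums plug into these formulas, here is the arithmetic for the small four-row sample table shown in Question 1 (with `t` as `X` and `x` as `Y`). This only uses the sample values, not the contents of `data.csv`.

$$
n = 4, \quad \sum{X} = 1.2, \quad \sum{Y} = 51, \quad \sum{XX} = 0.46, \quad \sum{XY} = 16.5
$$

$$
m = \frac{4 \times 16.5 - 1.2 \times 51}{4 \times 0.46 - 1.2^2} = \frac{4.8}{0.4} = 12.0
$$

$$
c = \frac{51 \times 0.46 - 1.2 \times 16.5}{4 \times 0.46 - 1.2^2} = \frac{3.66}{0.4} = 9.15
$$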

Using the same data we saw in Question 1, calculate the values for `m` and `c` for that data set given the formulas above.

You can think of the `t` column in the data as the `X` column, and the `x` values in the data as the `Y` column - we are trying to predict the value of `x` given a value of `t`.

This will result in a straight line that "best" fits through the data.

Compare the slope of this regression line to the average rate of change you calculated in Question 1.

#### Solution

In [9]:
import numpy as np
import pandas as pd

# Read the data from the CSV file
data = pd.read_csv('data.csv')

# 't' plays the role of the X column and 'x' the role of the Y column
t_values = data['t'].values
x_values = data['x'].values

# Calculate the sums used by the formulas
n = len(t_values)
sum_t = np.sum(t_values)               # sum of X
sum_x = np.sum(x_values)               # sum of Y
sum_tt = np.sum(t_values ** 2)         # sum of XX
sum_tx = np.sum(t_values * x_values)   # sum of XY

# Calculate slope and intercept
denominator = n * sum_tt - sum_t ** 2
m = (n * sum_tx - sum_t * sum_x) / denominator
c = (sum_x * sum_tt - sum_t * sum_tx) / denominator

print("Slope (m):", m)
print("Y-Intercept (c):", c)


In [10]:
# 'average' is the mean rate of change computed in the Question 1 solution
print("Average rate of change:", average)
print("Slope of regression line:", m)
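
As a cross-check on the closed-form formulas from the question, the following sketch compares them against `np.polyfit`, which fits a least-squares polynomial (degree 1 here). The data is synthetic, a noisy line with slope 3 and intercept 2 chosen purely for illustration, and is an assumption standing in for `data.csv`.

```python
import numpy as np

# Synthetic stand-in for data.csv (an assumption; the real file is not used)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
x = 3.0 * t + 2.0 + rng.normal(0.0, 0.1, size=t.size)

# Closed-form sums with X = t and Y = x
n = len(t)
sum_t, sum_x = t.sum(), x.sum()
sum_tt, sum_tx = (t * t).sum(), (t * x).sum()
denom = n * sum_tt - sum_t ** 2
m = (n * sum_tx - sum_t * sum_x) / denom
c = (sum_x * sum_tt - sum_t * sum_tx) / denom

# np.polyfit with deg=1 returns the coefficients as [slope, intercept]
m_ref, c_ref = np.polyfit(t, x, 1)
print(np.isclose(m, m_ref), np.isclose(c, c_ref))

# Mean of the consecutive-point rates of change, for comparison with the slope
v = np.diff(x) / np.diff(t)
print(m, v.mean())
```

The regression slope and the mean rate of change both estimate how fast `x` grows with `t`, so they should be of similar size, but they are computed differently and need not agree exactly.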
