# Worksheet 5 - Data analysis task 2

- This worksheet should be used in conjunction with the Intro to Python course notes [here](https://uniexeterrse.github.io/intro-to-python/). 
- All information contained in this worksheet can be found in the course notes. 
- This worksheet highlights tasks that can be completed during the sessions.

## 1. Ensure data is loaded

We are working in a new notebook, so we need to make sure that our data has been loaded in this notebook instance.

In [None]:
import numpy as np
# Load the comma-separated inflammation data into a 2D NumPy array
filepath = 'data/inflammation-01.csv'
data = np.loadtxt(fname=filepath, delimiter=',')
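As a quick sanity check after loading, it can help to confirm that `np.loadtxt` returned a 2D array with one row per patient and one column per day. The sketch below wraps the loading step in a hypothetical helper, `load_inflammation`. To avoid depending on the course data, it writes a tiny temporary CSV file on the fly and loads that. The `ndmin=2` argument is a real `np.loadtxt` option that keeps a single-row or single-column file two-dimensional.

```python
import os
import tempfile

import numpy as np


def load_inflammation(path):
    """Load a comma-separated inflammation file (hypothetical helper).

    Returns a 2D array where rows are patients and columns are days.
    """
    # ndmin=2 stops a one-row or one-column file from collapsing to 1D
    return np.loadtxt(fname=path, delimiter=',', ndmin=2)


# Write a tiny 2-patient, 3-day CSV file so the example runs without the course data
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, 'demo.csv')
    with open(demo_path, 'w') as f:
        f.write('0,1,2\n1,2,3\n')
    demo = load_inflammation(demo_path)

n_patients, n_days = demo.shape
print(f'{n_patients} patients, {n_days} days')
```

With the notebook's own file you would call `load_inflammation(filepath)` and check that `data.shape` matches the expected number of patients and days.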

## 2. Visualisation using matplotlib

`matplotlib` offers a huge number of graph types and plotting functions.

Check out the `matplotlib` documentation [here](https://matplotlib.org/stable/api/pyplot_summary.html). 

We are going to use `matplotlib.pyplot`, which we import under the conventional alias `plt`.

In [None]:
import matplotlib.pyplot as plt
image = plt.imshow(data)
plt.show()

Here we are using `plt.imshow()`, which draws our numerical data as a 2D raster image. It's a very quick way of generating a heat map.

Each row in the heat map corresponds to a patient in the clinical trial dataset, and each column corresponds to a day in the dataset. Blue pixels in this heat map represent low values, while yellow pixels represent high values. As we can see, the number of inflammation flare-ups for the patients generally rises and then falls over the 40-day period.
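To make the orientation of the heat map explicit, you could label the axes and add a colour bar that shows which value each colour stands for. The sketch below uses a hypothetical helper, `plot_heatmap`, and runs it on a small made-up array rather than the trial data. With the notebook's `data`, you would call `plot_heatmap(data)` instead.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_heatmap(values):
    """Draw a patients-by-days heat map with labelled axes and a colour bar."""
    fig, ax = plt.subplots()
    # Rows become the y axis (patients), columns become the x axis (days)
    image = ax.imshow(values)
    ax.set_xlabel('day')
    ax.set_ylabel('patient')
    # The colour bar maps each colour back to an inflammation value
    fig.colorbar(image, ax=ax, label='inflammation')
    return fig, ax


# Toy 3-patient, 4-day array, used only to illustrate the layout
toy = np.array([[0, 1, 2, 1],
                [1, 3, 4, 2],
                [0, 2, 3, 0]])
fig, ax = plot_heatmap(toy)
plt.show()
```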

So far, so good: this is in line with our knowledge of the clinical trial and Dr. Maverick’s claims:

* the patients take their medication once their inflammation flare-ups begin
* it takes around 3 weeks for the medication to take effect and begin reducing flare-ups
* flare-ups appear to drop to zero by the end of the clinical trial.

## 3. Line plots with matplotlib

In [None]:
ave_inflammation = np.mean(data, axis=0)
ave_plot = plt.plot(ave_inflammation)
plt.show()

Here, we have stored the average inflammation per day across all patients in the variable `ave_inflammation`, then asked `matplotlib.pyplot` to create and display a line graph of those values. The result is a reasonably linear rise and fall, in line with Dr. Maverick’s claim that the medication takes 3 weeks to take effect. But a good data scientist doesn’t just consider the average of a dataset, so let’s have a look at two other statistics, the daily maximum and the daily minimum:
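The `axis=0` argument makes `np.mean` average down each column, across patients, so it returns one value per day. `axis=1` would average along each row instead. A small worked example on a toy array, not the trial data, shows the difference:

```python
import numpy as np

# Toy array: 2 patients (rows) x 3 days (columns)
toy = np.array([[0, 1, 2],
                [2, 3, 4]])

per_day = np.mean(toy, axis=0)      # average down each column -> one value per day
per_patient = np.mean(toy, axis=1)  # average along each row -> one value per patient

print(per_day)      # [1. 2. 3.]
print(per_patient)  # [1. 3.]
```

The same `axis` argument applies to `np.max` and `np.min` in the cells below.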

In [None]:
max_plot = plt.plot(np.max(data, axis=0))
plt.show()

In [None]:
min_plot = plt.plot(np.min(data, axis=0))
plt.show()

The maximum value rises and falls linearly, while the minimum seems to be a step function. Neither trend seems particularly likely, so either there’s a mistake in our calculations or something is wrong with our data. This insight would have been difficult to reach by examining the numbers alone, without visualisation tools.
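One way to back up what the plots suggest is to look at the day-to-day changes with `np.diff`. A constant step size means a perfectly straight line, and runs of zeros mark the flat parts of a step function. The arrays below are small made-up examples of each shape, not values from the trial. On the real data, you would pass `np.max(data, axis=0)` or `np.min(data, axis=0)` instead.

```python
import numpy as np

# Made-up examples of the two suspicious shapes seen in the plots
linear_rise_fall = np.array([0, 1, 2, 3, 2, 1, 0])
step_function = np.array([0, 0, 1, 1, 2, 2])

# np.diff returns the change from each day to the next
print(np.diff(linear_rise_fall))  # [ 1  1  1 -1 -1 -1]
print(np.diff(step_function))     # [0 1 0 1 0]
```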

## 4. Grouping and saving plots

In [None]:
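# Create a figure 10 inches wide and 3 inches tall
fig = plt.figure(figsize=(10.0, 3.0))

# Add three subplots in a grid of 1 row and 3 columns
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(np.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(np.max(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(np.min(data, axis=0))

# Adjust the spacing so the subplots' labels do not overlap
fig.tight_layout()

# Save to a PNG file before plt.show(), which may close the figure
plt.savefig('inflammation.png')
plt.show()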
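The three `add_subplot` calls above can also be written with `plt.subplots`, which creates the figure and a grid of axes in one call. The plotting code can then loop over the statistics. The sketch below uses a hypothetical helper, `plot_daily_stats`. It saves with `fig.savefig`, which targets this particular figure rather than whichever figure is current. To show that it runs, it is applied to a small made-up array and saves into a temporary directory.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np


def plot_daily_stats(values, filename=None):
    """Plot per-day average, max and min side by side; optionally save to a file."""
    stats = [('average', np.mean), ('max', np.max), ('min', np.min)]
    fig, axes = plt.subplots(1, 3, figsize=(10.0, 3.0))
    # Pair each subplot with one statistic computed down the columns (per day)
    for ax, (label, func) in zip(axes, stats):
        ax.set_ylabel(label)
        ax.plot(func(values, axis=0))
    fig.tight_layout()
    if filename is not None:
        fig.savefig(filename)
    return fig


# Toy 2-patient, 4-day array, used only to show that the function runs
toy = np.array([[0, 1, 3, 1],
                [1, 2, 2, 0]])
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, 'inflammation.png')
    fig = plot_daily_stats(toy, out_path)
    saved = os.path.exists(out_path)
plt.show()
```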

## 5. Questions

**Q**: Why do all of our plots stop just short of the upper end of our graph?

We can use `axes3.set_ylim(0, 6)` to set the y-axis limits of a subplot explicitly, or we can compute the limits from the data instead:

In [None]:
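min_data = np.min(data, axis=0)
# Start a fresh figure: the one from the previous cell has already been shown
fig, axes3 = plt.subplots(figsize=(4.0, 3.0))
axes3.set_ylabel('min')
axes3.plot(min_data)
# Pad the upper limit by 10% so the line stays clear of the top edge
axes3.set_ylim(np.min(min_data), np.max(min_data) * 1.1)
plt.show()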
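To explore the question above, you could compare the limits that matplotlib chooses automatically with the range of the data, using `get_ylim`, and then override them with `set_ylim`. The sketch below uses a made-up series of values. The exact automatic limits it prints may depend on your matplotlib version and settings.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up series standing in for one of the per-day statistics
values = np.array([0, 1, 1, 2, 3, 3, 2])

fig, ax = plt.subplots()
ax.plot(values)

# Limits chosen automatically by matplotlib, compared with the data range
auto_bottom, auto_top = ax.get_ylim()
print('data range:', values.min(), 'to', values.max())
print('automatic y-limits:', auto_bottom, 'to', auto_top)

# Override the automatic limits with explicit ones
ax.set_ylim(0, 6)
plt.show()
```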

**Q**: In the center and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

In [None]:
fig = plt.figure(figsize=(10.0, 3.0))

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(np.mean(data, axis=0), drawstyle='steps-mid')

axes2.set_ylabel('max')
axes2.plot(np.max(data, axis=0), drawstyle='steps-mid')

axes3.set_ylabel('min')
axes3.plot(np.min(data, axis=0), drawstyle='steps-mid')

fig.tight_layout()

plt.show()
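To see the effect of `drawstyle` in isolation, the sketch below plots the same made-up step-like series three ways:

* with the default straight-line joins between points
* with `drawstyle='steps-mid'`, as in the cell above
* with `Axes.step(..., where='mid')`, which draws the same stepped style

The series is illustrative only and is not taken from the trial data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up integer-valued series, like a daily minimum
values = np.array([0, 0, 1, 1, 2, 2, 1, 1, 0])

fig, (ax_default, ax_steps, ax_step) = plt.subplots(1, 3, figsize=(10.0, 3.0))

# Default: consecutive points are joined by straight (possibly slanted) segments
ax_default.plot(values)
ax_default.set_title('default')

# 'steps-mid': each value is held constant, changing halfway between points
ax_steps.plot(values, drawstyle='steps-mid')
ax_steps.set_title("drawstyle='steps-mid'")

# Axes.step with where='mid' draws the same stepped line
ax_step.step(np.arange(len(values)), values, where='mid')
ax_step.set_title("step(where='mid')")

fig.tight_layout()
plt.show()
```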

## 6. Exercises

**Exercise 1**: Create a plot showing the standard deviation (`np.std`) of the inflammation data for each day across all patients.

**Exercise 2**: Modify the program to display the three plots on top of one another instead of side by side.