# Programming with Python

## Episode 3 - Visualising Tabular Data


Questions
- How can I visualise tabular data in Python?
- How can I group several plots together?

Objectives
- Plot simple graphs from data.
- Group several graphs in a single figure.

Let's load the dataset that we were using in the previous episode, so that we can visualise it:

```python
import numpy
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
```

### Visualising data

The mathematician Richard Hamming once said, “The purpose of computing is insight, not numbers,” and the best way to develop insight is often to visualise data.

Visualisation deserves an entire workshop of its own, but we can explore a few features of Python's `matplotlib` library here. While there is no official plotting library, `matplotlib` is the de facto standard. First, we will import the `pyplot` module from `matplotlib` and use two of its functions to create and display a heat map of our data:

```python
import matplotlib.pyplot
# imshow draws the 2D array as an image; show displays it.
plot = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()
```

#### Heatmap of the Data

Blue pixels in this heat map represent low values, while yellow pixels represent high values. As we can see, inflammation rises and falls over a 40-day period.
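A colour bar and axis labels make the colour-to-value mapping explicit. The sketch below uses `matplotlib.pyplot.colorbar()` on a randomly generated stand-in array, which is a hypothetical substitute so the snippet runs without the CSV file; with the real data, drop the stand-in lines and reuse `data` from above. It assumes rows are patients and columns are days, as in the inflammation files.

```python
import numpy
import matplotlib.pyplot

# Hypothetical stand-in for the inflammation data: 60 patients (rows)
# by 40 days (columns) of random integers between 0 and 19.
rng = numpy.random.default_rng(seed=0)
data = rng.integers(0, 20, size=(60, 40))

image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.xlabel('Day')
matplotlib.pyplot.ylabel('Patient')
# The colour bar shows which value each colour in the heat map stands for.
colorbar = matplotlib.pyplot.colorbar(image, label='Inflammation')
matplotlib.pyplot.show()
```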

#### Some IPython Magic

If you're using a Jupyter notebook, you may need to execute the following command so that your matplotlib plots appear in the notebook when `show()` is called:

```python
%matplotlib inline
```

The `%` indicates an IPython magic function: a command that is only valid within IPython and Jupyter notebooks, not in plain Python scripts. You only have to execute it once per notebook.

Let's take a look at the average inflammation over time:

```python
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()
```

Here, we have put the average per day across all patients in the variable `ave_inflammation`, then asked `matplotlib.pyplot` to create and display a line graph of those values. The result is a roughly linear rise and fall, which is suspicious: we might instead expect a sharper rise and slower fall. 
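The "roughly linear" impression can also be checked numerically. If a curve rises and falls linearly, its day-to-day changes, computed with `numpy.diff`, stay roughly constant: positive while rising, negative while falling. The sketch below shows this on a small made-up triangular series (hypothetical values, not the inflammation data); `numpy.diff(ave_inflammation)` is the analogous check on the real averages.

```python
import numpy

# Hypothetical daily averages that rise by 1 per day, then fall by 1 per day.
ave_example = numpy.array([0, 1, 2, 3, 4, 3, 2, 1, 0], dtype=float)

# Day-to-day change: element i is ave_example[i + 1] - ave_example[i].
daily_change = numpy.diff(ave_example)
print(daily_change)
```

A constant block of positive steps followed by a constant block of negative steps is the numerical signature of a linear rise and fall.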

Let's look at two other statistics. First, the maximum inflammation across all patients each day:

```python
max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
matplotlib.pyplot.show()
```

Then, the minimum inflammation across all patients each day:

```python
min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
matplotlib.pyplot.show()
```

The maximum value rises and falls smoothly, while the minimum seems to be a step function. Neither trend seems particularly likely, so either there's a mistake in our calculations or something is wrong with our data. This insight would have been difficult to reach by examining the numbers themselves without visualisation tools.
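Another way to compare the three statistics is to draw them on the same axes, each with a `label`, and add a legend with `matplotlib.pyplot.legend()`. This sketch uses a random stand-in array (hypothetical; substitute the `data` loaded above).

```python
import numpy
import matplotlib.pyplot

# Hypothetical stand-in for the inflammation data (patients x days).
rng = numpy.random.default_rng(seed=1)
data = rng.integers(0, 20, size=(60, 40))

# One line per statistic, computed across patients (axis=0) for each day.
matplotlib.pyplot.plot(numpy.mean(data, axis=0), label='average')
matplotlib.pyplot.plot(numpy.max(data, axis=0), label='max')
matplotlib.pyplot.plot(numpy.min(data, axis=0), label='min')
matplotlib.pyplot.xlabel('Day')
matplotlib.pyplot.ylabel('Inflammation')
# The legend lists each line using the label given to plot().
legend = matplotlib.pyplot.legend()
matplotlib.pyplot.show()
```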

### Grouping plots

You can group similar plots in a single figure using subplots. The script below uses several new commands. The function `matplotlib.pyplot.figure()` creates a space into which we will place all of our plots. The parameter `figsize` tells Python how big to make this space, as a (width, height) pair in inches.

Each subplot is placed into the figure using its `add_subplot` method, which takes 3 parameters. The first is the total number of rows of subplots, the second is the total number of columns, and the third is the position of this subplot in the grid, counting from 1 left-to-right, then top-to-bottom. Each subplot is stored in a different variable (`axes1`, `axes2`, `axes3`).
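To see the numbering order used by `add_subplot`, the sketch below fills a 2-row, 3-column grid and titles each subplot with the call that created it. The grid shape and figure size are arbitrary choices for illustration.

```python
import matplotlib.pyplot

# A 2-row by 3-column grid; figsize is (width, height) in inches.
fig = matplotlib.pyplot.figure(figsize=(9.0, 5.0))

for index in range(1, 7):
    # Index 1 is the top-left subplot; numbering runs left to right,
    # then continues on the next row.
    axes = fig.add_subplot(2, 3, index)
    axes.set_title(f'add_subplot(2, 3, {index})')

fig.tight_layout()
matplotlib.pyplot.show()
```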

Once a subplot is created, its axes can be labelled using the `set_xlabel()` and `set_ylabel()` methods. Here are our three plots side by side:


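```python
import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

# A figure 15 inches wide and 5 inches tall to hold the three subplots.
fig = matplotlib.pyplot.figure(figsize=(15.0, 5.0))

# One row, three columns; the last argument picks the position.
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
plot = axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel('max')
plot = axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel('min')
plot = axes3.plot(numpy.min(data, axis=0))

# Adjust the spacing so labels fit without overlapping neighbouring subplots.
fig.tight_layout()

matplotlib.pyplot.show()
```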


##### The Previous Plots as Subplots

The call to `loadtxt` reads our data, and the rest of the program tells the plotting library how large we want the figure to be, that we're creating three subplots, what to draw for each one, and that we want a tight layout. (If we leave out that call to `fig.tight_layout()`, the graphs will actually be squeezed together more closely.)

Exercise: See if you can add the label `Days` to the x-axis of each subplot.

##### Scientists Dislike Typing. 
We will always use the syntax `import numpy` to import NumPy. However, to save typing, many people use a shortcut: `import numpy as np`. If you ever see Python code online using a NumPy function with `np` (for example, `np.loadtxt(...)`), it's because they've used this shortcut. When working with other people, it is important to agree on a convention for how common libraries are imported.

In other words:

```python
import numpy
numpy.random.rand()
```

is the same as:

```python
import numpy as np
np.random.rand()
```
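Because `import numpy as np` only binds a second name to the same module, both names refer to one and the same object, as this short check shows:

```python
import numpy
import numpy as np

# Both names point at one module object, so their functions are identical.
print(np is numpy)
print(np.loadtxt is numpy.loadtxt)
```

Both lines print `True`.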


## Exercises

### Plot Scaling
Why do all of our plots stop just short of the upper end of our graph?

If we want to change this, we can use the `set_ylim(min, max)` method of each axes object, for example:

```python
axes3.set_ylim(0, 6)
```
Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)
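One possible approach is to compute the limits from the data itself with `numpy.min` and `numpy.max`, leaving a margin above the largest value, and to apply the same limits to every subplot so they share a scale. The sketch uses a hypothetical random stand-in for `data`; the 10% margin is an arbitrary choice.

```python
import numpy
import matplotlib.pyplot

# Hypothetical stand-in for the inflammation data (patients x days).
rng = numpy.random.default_rng(seed=2)
data = rng.integers(0, 20, size=(60, 40))

# Shared limits: the smallest value in the data, and 10% above the largest.
lower = numpy.min(data)
upper = numpy.max(data) * 1.1

fig = matplotlib.pyplot.figure(figsize=(15.0, 5.0))
statistics = [('average', numpy.mean), ('max', numpy.max), ('min', numpy.min)]

for position, (label, function) in enumerate(statistics, start=1):
    axes = fig.add_subplot(1, 3, position)
    axes.set_ylabel(label)
    axes.plot(function(data, axis=0))
    axes.set_ylim(lower, upper)

fig.tight_layout()
matplotlib.pyplot.show()
```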

### Drawing Straight Lines
In the centre and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

Try adding a `drawstyle` parameter to your plotting calls:

```python
axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
```
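By default, `plot` joins consecutive points with straight segments, so a change in value between two days is drawn as a sloped line. With `drawstyle='steps-mid'`, each value is drawn as a horizontal segment and the vertical jump falls halfway between neighbouring points. The sketch below plots a small hypothetical integer series both ways for comparison.

```python
import numpy
import matplotlib.pyplot

# Hypothetical integer-valued series, like a daily minimum.
values = numpy.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])

fig = matplotlib.pyplot.figure(figsize=(10.0, 4.0))
axes1 = fig.add_subplot(1, 2, 1)
axes2 = fig.add_subplot(1, 2, 2)

axes1.set_title('default')
# Consecutive points are joined by straight, possibly sloped, segments.
default_line, = axes1.plot(values)

axes2.set_title("drawstyle='steps-mid'")
# Each value becomes a horizontal segment; jumps between values are vertical.
step_line, = axes2.plot(values, drawstyle='steps-mid')

fig.tight_layout()
matplotlib.pyplot.show()
```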

### Make Your Own Plot
Create a plot showing the standard deviation (using `numpy.std`) of the inflammation data for each day across all patients.
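A minimal sketch, again using a hypothetical random stand-in for `data`: `numpy.std` with `axis=0` collapses the patient rows and returns one standard deviation per day.

```python
import numpy
import matplotlib.pyplot

# Hypothetical stand-in for the inflammation data (patients x days).
rng = numpy.random.default_rng(seed=3)
data = rng.integers(0, 20, size=(60, 40))

# axis=0 collapses the patient rows, leaving one value per day.
std_inflammation = numpy.std(data, axis=0)

std_plot = matplotlib.pyplot.plot(std_inflammation)
matplotlib.pyplot.xlabel('Day')
matplotlib.pyplot.ylabel('Standard deviation')
matplotlib.pyplot.show()
```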

### Moving Plots Around
Modify the program to display the three plots vertically rather than side by side.
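Stacking the plots vertically means three rows and one column, so the first two arguments of `add_subplot` swap. A sketch with a hypothetical random stand-in for `data`; the taller `figsize` is an arbitrary choice.

```python
import numpy
import matplotlib.pyplot

# Hypothetical stand-in for the inflammation data (patients x days).
rng = numpy.random.default_rng(seed=4)
data = rng.integers(0, 20, size=(60, 40))

# figsize is (width, height) in inches; make the figure tall instead of wide.
fig = matplotlib.pyplot.figure(figsize=(5.0, 15.0))

# Three rows, one column: subplot 1 is at the top, subplot 3 at the bottom.
axes1 = fig.add_subplot(3, 1, 1)
axes2 = fig.add_subplot(3, 1, 2)
axes3 = fig.add_subplot(3, 1, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))

fig.tight_layout()
matplotlib.pyplot.show()
```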

## Key Points
- Use the `pyplot` library from `matplotlib` for creating simple visualisations.

# Save your changes

- Save your work: `File -> Save`