# Learning Python Data Analysis

## Visualizing Tabular Data in Python

Setup: https://swcarpentry.github.io/python-novice-inflammation/instructor/index.html#setup

Instruction: https://swcarpentry.github.io/python-novice-inflammation/instructor/03-matplotlib.html

Objectives:
* Plot simple graphs from data.
* Plot multiple graphs in a single figure.

In [None]:
# Load NumPy and data
import numpy
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [None]:
import matplotlib.pyplot # Only load the pyplot module from matplotlib, using dot notation
# The pyplot docs: https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html

# Plot the data
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()

# Each row in the heat map corresponds to a patient in the clinical trial dataset
# Each column corresponds to a day in the dataset
# With the default colormap, blue pixels represent low values and yellow pixels represent high values
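
To make the color-to-value mapping of the heat map explicit, a colorbar can be attached to the image. The sketch below uses a small made-up array (`toy`) rather than the inflammation file so that it runs on its own; the axis labels and the colorbar label are illustrative choices, not part of the lesson.

```python
import numpy
import matplotlib.pyplot

# Made-up stand-in for the inflammation data: 3 "patients" (rows) by 5 "days" (columns)
toy = numpy.array([[0.0, 1.0, 2.0, 1.0, 0.0],
                   [0.0, 2.0, 4.0, 2.0, 0.0],
                   [0.0, 1.0, 3.0, 1.0, 0.0]])

# Draw the heat map, then attach a colorbar that shows which color maps to which value
image = matplotlib.pyplot.imshow(toy)
colorbar = matplotlib.pyplot.colorbar(image)
colorbar.set_label('inflammation')

# Label the axes the same way the heat map is read: rows are patients, columns are days
matplotlib.pyplot.xlabel('day')
matplotlib.pyplot.ylabel('patient')
matplotlib.pyplot.show()
```

Passing `data` instead of `toy` should give the same kind of figure for the real dataset.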

## About the Data
* Clinical trial data for an imaginary miracle drug that promises to cure arthritis
* The patients take the medication once their inflammation flare-ups begin
* It takes around 3 weeks for the medication to take effect and begin reducing flare-ups

In [None]:
# Let's look at the average inflammation over time

# Get the average inflammation per day across all patients in the variable ave_inflammation
ave_inflammation = numpy.mean(data, axis=0)

# Then create (and display) a line graph of those values 
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()

# The result is a roughly linear rise and fall, consistent with the medication taking around 3 weeks to take effect
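
The `axis` argument decides which dimension is collapsed by the calculation. A minimal sketch on a made-up 2 x 3 array (`toy`, not part of the dataset) shows the difference between `axis=0` and `axis=1`:

```python
import numpy

# A tiny made-up array: 2 "patients" (rows) by 3 "days" (columns)
toy = numpy.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]])

# axis=0 collapses the rows: one mean per column (per day)
mean_per_day = numpy.mean(toy, axis=0)

# axis=1 collapses the columns: one mean per row (per patient)
mean_per_patient = numpy.mean(toy, axis=1)

print(mean_per_day)      # [2. 3. 4.]
print(mean_per_patient)  # [2. 4.]
```

This is why `numpy.mean(data, axis=0)` yields one value per day: the result has as many entries as `data` has columns.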

In [None]:
# Plot the maximum inflammation per day across all patients

max_plot = matplotlib.pyplot.plot(numpy.amax(data, axis=0))
matplotlib.pyplot.show()

In [None]:
# Plot the minimum inflammation per day across all patients

min_plot = matplotlib.pyplot.plot(numpy.amin(data, axis=0))
matplotlib.pyplot.show()

Conclusion: The maximum value rises and falls linearly, while the minimum looks like a step function. Neither trend seems plausible for real patient data.

There’s either a mistake in our calculations or something is wrong with our data. This insight would have been difficult to reach by examining the numbers themselves without visualization tools.

## Grouping plots
You can group similar plots in a single figure using subplots.

In [None]:
# Create 3 plots side by side
import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0)) # figure size as (width, height) in inches

# The add_subplot method takes 3 parameters:
# 1. the total number of rows of subplots
# 2. the total number of columns of subplots
# 3. which subplot the variable refers to (counted left-to-right, top-to-bottom)

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.amax(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.amin(data, axis=0))

fig.tight_layout()

matplotlib.pyplot.savefig('inflammation.png')
matplotlib.pyplot.show()
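
The same three-panel layout can also be built in one call with `matplotlib.pyplot.subplots`, which returns the figure together with its axes. A minimal sketch, using a small made-up array (`toy`) in place of the inflammation file so that it runs on its own:

```python
import numpy
import matplotlib.pyplot

# Made-up stand-in for the inflammation data: 3 "patients" (rows) by 4 "days" (columns)
toy = numpy.array([[0.0, 1.0, 2.0, 1.0],
                   [0.0, 2.0, 4.0, 2.0],
                   [0.0, 3.0, 3.0, 0.0]])

# Create the figure and a 1-row, 3-column grid of axes in a single call
fig, (axes1, axes2, axes3) = matplotlib.pyplot.subplots(1, 3, figsize=(10.0, 3.0))

axes1.set_ylabel('average')
axes1.plot(numpy.mean(toy, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.amax(toy, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.amin(toy, axis=0))

fig.tight_layout()
matplotlib.pyplot.show()
```

Whether to use `add_subplot` or `subplots` is largely a matter of taste; `subplots` is shorter when the grid shape is known up front.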

Exercise - create your own plot showing the standard deviation (numpy.std) of the inflammation data for each day across all patients.
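
One possible approach to the exercise, sketched on a small made-up array (`toy`) so that it runs without the data file; replace `toy` with `data` to plot the real values. Note that `numpy.std` computes the population standard deviation by default (`ddof=0`).

```python
import numpy
import matplotlib.pyplot

# Made-up stand-in for the inflammation data: 2 "patients" (rows) by 2 "days" (columns)
toy = numpy.array([[1.0, 2.0],
                   [3.0, 6.0]])

# Standard deviation per day across all patients (axis=0 collapses the rows)
std_per_day = numpy.std(toy, axis=0)

# Line graph of the per-day standard deviation
std_plot = matplotlib.pyplot.plot(std_per_day)
matplotlib.pyplot.ylabel('standard deviation')
matplotlib.pyplot.show()
```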