# Just Plot It!

## Introduction

### The System

In this course we will work with a set of "experimental" data to illustrate going from "raw" measurement (or simulation) data through exploratory visualization to an (almost) paper-ready figure.

In this scenario, we have fabricated (or simulated) 25 cantilevers. There is some value (suggestively called "control") that varies between the cantilevers, and we want to see how the properties of the cantilevers are affected by "control".

To see what this will look like physically, take apart a "clicky" pen. Hold one end of the spring in your fingers and flick the free end.

Or just watch this cat:

In [1]:
from IPython.display import YouTubeVideo
# extra keyword arguments become URL parameters; start the clip at 19 s
YouTubeVideo('4aTagDSnclk', start=19)

Springs, and our cantilevers, are part of a class of systems known as (Damped) Harmonic Oscillators. To measure the natural frequency and damping rate, we deflect each cantilever by the same amount and then observe its position as a function of time as the vibrations damp out.
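For intuition, here is a minimal sketch of what such a ring-down looks like. It assumes the textbook underdamped model, the solution of $\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = 0$ with $\gamma < \omega_0$, which is $x(t) = A e^{-\gamma t}\cos(\omega_d t + \phi)$ with $\omega_d = \sqrt{\omega_0^2 - \gamma^2}$. The function name `ring_down` and all parameter values are hypothetical. They are not taken from the course's simulation code.

```python
import numpy as np


def ring_down(t, amplitude, gamma, omega0, phase=0.0):
    """Displacement of an underdamped harmonic oscillator.

    Solves x'' + 2*gamma*x' + omega0**2 * x = 0 for gamma < omega0,
    giving x(t) = A * exp(-gamma * t) * cos(omega_d * t + phase).
    """
    omega_d = np.sqrt(omega0**2 - gamma**2)  # damped angular frequency
    return amplitude * np.exp(-gamma * t) * np.cos(omega_d * t + phase)


# Hypothetical parameters, chosen only for illustration.
t = np.linspace(0, 10, 4_112)
x = ring_down(t, amplitude=1.0, gamma=0.5, omega0=2 * np.pi)

# The exp(-gamma * t) envelope means early swings are much larger than late ones.
print(np.abs(x[:100]).max() > np.abs(x[-100:]).max())
```

The natural frequency sets how fast the trace oscillates, and the damping rate sets how fast the envelope shrinks. These are the two quantities we want to extract from each cantilever.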

### The Tools

We are going to make use of:

- [jupyter](https://jupyter.org)
- [numpy](https://numpy.org)
- [matplotlib](https://matplotlib.org)
- [scipy](https://www.scipy.org/scipylib/index.html)
- [xarray](http://xarray.pydata.org/en/stable/index.html)
- [pandas](https://pandas.pydata.org/docs/)

We are only going to scratch the surface of what any of these libraries can do! For the purposes of this course we assume you know numpy and Matplotlib at least to the level of LINKS TO OTHER COURSES. We will only be using one aspect of scipy (least-squares fitting), so no prior familiarity is needed. Similarly, we will only be making superficial use of pandas and xarray to provide access to structured data. No prior familiarity is required, and if you want to learn more see LINK TO OTHER COURSES.
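As a preview of the one scipy feature we rely on, here is a minimal least-squares sketch using `scipy.optimize.curve_fit` on synthetic data. The model function, noise level, and starting guess are assumptions for illustration, not the course's own helpers.

```python
import numpy as np
from scipy.optimize import curve_fit


def ring_down(t, amplitude, gamma, omega_d, phase):
    # Underdamped oscillator, parameterized directly by its damped frequency.
    return amplitude * np.exp(-gamma * t) * np.cos(omega_d * t + phase)


rng = np.random.default_rng(0)
t = np.linspace(0, 10, 4_112)
true_params = (1.0, 0.5, 2 * np.pi, 0.0)
# Synthetic "measurement": the model plus a little Gaussian noise.
x = ring_down(t, *true_params) + rng.normal(scale=0.01, size=t.shape)

# Oscillatory models need a reasonable starting guess (p0); a frequency guess
# far from the truth can converge to a wrong local minimum.
p0 = (0.9, 0.45, 6.2, 0.05)
popt, pcov = curve_fit(ring_down, t, x, p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
print(popt.round(3), perr.round(4))
```

`curve_fit` returns the best-fit parameters and their covariance matrix. The square root of its diagonal is a common quick estimate of the parameter uncertainties.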

In [1]:
# interactive figures, requires ipympl!
%matplotlib widget
# use this instead for static (non-interactive) figures
#%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import xarray as xa

### Philosophy

While this course uses Matplotlib for the visualization, its high-level lessons are transferable to any plotting tool (in any language).

At its core, programming is the process of taking existing tools (libraries) and building new tools better fit to your purpose. This course walks through a concrete example, starting with a pile of data and ending with a paper figure, of how to think about and design scientific visualization tools tuned to exactly *your* data and questions.

## The Data

### Accessing data

As a rule of thumb, I/O logic should be kept out of the inner loops of analysis or plotting. In the medium term, this leads to more reusable and maintainable code. Remember, your most frequent collaborator is yourself in six months. Be kind to your (future) self and write reusable, maintainable, and understandable code now ;)
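A minimal sketch of that separation, assuming a simple two-column (time, displacement) CSV format. The functions `load_trace` and `plot_trace` are hypothetical names for illustration, and an in-memory string stands in for a file. The plotting function only ever sees arrays and an `Axes`, so it does not care where the data came from.

```python
import io

import numpy as np
from matplotlib.figure import Figure


def load_trace(source):
    """All I/O lives here: read a two-column (time, displacement) CSV."""
    data = np.loadtxt(source, delimiter=",")
    return data[:, 0], data[:, 1]


def plot_trace(ax, t, x, **kwargs):
    """All plotting lives here: no file handling, just arrays and an Axes."""
    (line,) = ax.plot(t, x, **kwargs)
    return line


# Stand-in for a file on disk; a path or any file-like object also works.
csv = io.StringIO("0.0,1.0\n0.1,0.5\n0.2,-0.2\n")
t, x = load_trace(csv)

# Figure() builds a figure without touching pyplot's global state.
fig = Figure()
ax = fig.subplots()
line = plot_trace(ax, t, x, label="trace")
```

If the storage format changes later, only `load_trace` needs to change. Every plotting function built on top of it keeps working.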

In this case, we have a data (simulation) function `get_data` that simulates the experiment and returns an [`xarray.DataArray`](http://xarray.pydata.org/en/stable/quick-overview.html#create-a-dataarray). An `xarray.DataArray` is (roughly) an N-dimensional numpy array enriched with coordinates and indices on its axes, plus metadata.
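To make the idea concrete, here is a minimal sketch that builds a `DataArray` by hand. It assumes a layout like the one `get_data` returns (rows labeled by a `control` coordinate, columns by a `time` coordinate). The dimension names, the control values, and the all-zero data are hypothetical placeholders.

```python
import numpy as np
import xarray as xa

# Hypothetical stand-in: one row per cantilever (labeled by "control")
# and one column per time sample.
control = np.linspace(0.0, 2.4, 25)
time = np.linspace(0.0, 10.0, 4_112)
values = np.zeros((control.size, time.size))

da = xa.DataArray(
    values,
    dims=("control", "time"),                  # a name for each axis
    coords={"control": control, "time": time},  # labels along each axis
    name="displacement",
)
print(da.sizes)
```

The underlying numbers are still a plain `(25, 4112)` numpy array (available as `da.values`). The dimension names and coordinates ride along with it.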

`xarray` has much more functionality than we will use in this course!

In [2]:
# put the course helpers in ../scripts on the import path (simplest option in a notebook)
import sys
sys.path.append('../scripts')

In [3]:
from data_gen import get_data, fit

### First look

Using the function `get_data`, we can pull an `xarray.DataArray` into our namespace and then use xarray's HTML repr to get a first look at the data.

In [4]:
d = get_data(25)
d

From this we can see that we have a (more or less) 2D array with 25 rows, each of which is a measurement consisting of a 4,112-point time series. Because this is a DataArray, it also carries **coordinates** giving the value of **control** for each row and the time for each column.

If we pull out just one row we can see a single experimental measurement.

In [5]:
d[6]

We can see that the **control** coordinate now holds a single value, but the **time** coordinate is still a vector. We can access these values via attribute access (which we will use later):

In [6]:
d[6].control

In [7]:
d[6].time
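Attribute access returns `DataArray` objects, not bare numbers. Here is a minimal sketch, on a small hypothetical array with the same assumed layout, of how to get plain Python or numpy values back out (the later `float(m.control)` call relies on this).

```python
import numpy as np
import xarray as xa

# Small hypothetical DataArray with the same layout as the course data.
da = xa.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("control", "time"),
    coords={"control": [0.5, 1.0, 1.5], "time": [0.0, 0.1, 0.2, 0.3]},
)

row = da[1]                      # positional indexing on the first axis
same_row = da.isel(control=1)    # the same selection, by dimension name

control_value = float(row.control)  # 0-d DataArray -> Python float
time_values = row.time.values       # 1-d DataArray -> numpy array
```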

## The Plotting

### Plot it?
Looking at (truncated) lists of numbers is neither intuitive nor informative for most people. To get a better sense of what this data looks like, let's plot it! We know that `Axes.plot` can plot multiple lines at once, so let's try naively throwing `d` at `ax.plot`!

In [8]:
fig, ax = plt.subplots()
ax.plot(d);

While this does look sort of cool, it is not *useful*. What has happened is that Matplotlib has looked at our `(25, 4_112)` array and said "Clearly, you have a table that is 4k columns wide and 25 rows long. What you want is each column plotted!". Thus, what we are seeing is "the deflection at a fixed time as a function of cantilever ID number". This plot does accurately reflect the data that we passed in, but it is a nearly meaningless plot!
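The column-wise behavior is easy to check on a small stand-in array. This sketch uses a hypothetical 5-row by 8-column random array in place of the real data.

```python
import numpy as np
from matplotlib.figure import Figure

# Hypothetical stand-in for the data: 5 rows ("cantilevers") x 8 columns ("times").
data = np.random.default_rng(0).normal(size=(5, 8))

fig = Figure()
ax = fig.subplots()

# Axes.plot treats each *column* of a 2D array as a separate line...
lines = ax.plot(data)
print(len(lines))  # 8: one line per column

# ...and, with no x given, each line's x values are just the row indices 0..4,
# which here means "cantilever ID number".
print(lines[0].get_xdata())
```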

Visualization, just like writing, is a tool for communication and you need to think about the story you want to tell as you make the plots.

### Sidebar: Explicit vs Implicit Matplotlib API

There are two related but distinct APIs for using Matplotlib: the "Explicit" (née "Object Oriented") and the "Implicit" (née "pyplot/pylab"). The Implicit API is implemented on top of the Explicit API. Anything you can do with the Implicit API you can do with the Explicit API, but some functionality of the Explicit API is not exposed through the Implicit API. It is also possible to mix the two APIs, but, with one exception, doing so is not recommended.

The core conceptual difference is that in the Implicit API Matplotlib has a notion of the "current figure" and "current axes" to which all of the calls are redirected. For example, the implementation of `plt.plot` (once you scroll past the docstring) is only one line:

In [9]:
?? plt.plot
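Here is a minimal sketch contrasting the two styles. It shows that `plt.plot` draws on whatever `plt.gca()` returns, and that creating a new figure silently moves that "current" pointer. The plotted values are arbitrary.

```python
import matplotlib.pyplot as plt

# Implicit API: pyplot routes the call to the "current" Axes (creating one if needed).
plt.figure()
(implicit_line,) = plt.plot([0, 1, 2], [0, 1, 4])
current_ax = plt.gca()

# Explicit API: we hold a reference to the Axes and call its method directly.
fig, ax = plt.subplots()
(explicit_line,) = ax.plot([0, 1, 2], [0, 1, 4])

print(implicit_line.axes is current_ax)  # True: plt.plot used the current Axes
print(plt.gca() is ax)                   # True: the new figure is now "current"
```

The second `print` is the global state at work. Any later `plt.*` call would land on `ax`, not on the first figure.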

While the Implicit API reduces the boilerplate required to get some things done and is convenient when working in a terminal, it comes at the cost of Matplotlib maintaining global state about which Axes is currently active! When scripting, this can quickly become a headache to manage.

When using Matplotlib with one of the GUI backends, the library does need to keep track of some global state so that the plot windows remain responsive. If you are embedding Matplotlib in your own GUI application you are responsible for this, but when working at an IPython prompt, `pyplot` takes care of this for you.

With the exception of creating new figures, this course always uses the Explicit API.

### Plot it!

What we really want to see is the transpose of the above (a line per experiment as a function of time):

In [10]:
fig, ax = plt.subplots()
ax.plot(d.T);

Which is better! If we squint a bit (or zoom in, if we are using `ipympl` or a GUI backend), we can sort of see each of the individual oscillators ringing down over time.

### Just one at a time

To make it easier to see, let's plot just one of the curves:

In [11]:
fig, ax = plt.subplots()
ax.plot(d[6]);

### Pass freshman physics

While we do have just one line on the axes and can see what is going on, this plot would rightly receive little-to-no credit if turned in as part of a freshman physics lab! We do not have a meaningful value on the x-axis, no legend, and no axis labels!

In [17]:
fig, ax = plt.subplots()
m = d[6]
ax.plot(m.time, m, label=f'control = {float(m.control):.1f}')
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement (mm)')
ax.legend();

At this point we have a minimally acceptable plot! It shows us one curve with axis labels (with units!) and a legend.
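Since we will want this same labeled plot for every cantilever, it is natural to wrap the cell above in a small function. What follows is a minimal sketch. The name `plot_measurement` and its signature are hypothetical, and it takes plain arrays (not a DataArray) so that it stays independent of how the data was loaded. The labels and units are copied from the cell above.

```python
import numpy as np
from matplotlib.figure import Figure


def plot_measurement(ax, time, displacement, control):
    """Plot one ring-down with the labels a lab report needs.

    Takes plain arrays plus the control value so it works with any data
    source, and returns the Line2D so callers can restyle it later.
    """
    (line,) = ax.plot(time, displacement, label=f"control = {control:.1f}")
    ax.set_xlabel("time (ms)")
    ax.set_ylabel("displacement (mm)")
    ax.legend()
    return line


# Usage with hypothetical synthetic data.
fig = Figure()
ax = fig.subplots()
t = np.linspace(0, 10, 500)
line = plot_measurement(ax, t, np.exp(-0.5 * t) * np.cos(2 * np.pi * t), control=1.5)
```

With the course data, the call would look like `plot_measurement(ax, m.time, m, float(m.control))`.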

### Sidebar: xarray plotting

Because xarray knows more about the structure of your data than a couple of numpy arrays in your local namespace or in a dictionary do, it can make smarter choices about the automatic visualization:

In [18]:
fig, ax = plt.subplots()
m.plot(ax=ax);

While this is helpful for exploratory plotting, `xarray` makes some choices that make it difficult to compose plots of multiple data sets.
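The automatic labels come from the array's name and `units` attributes, and from those of its coordinates. Here is a minimal sketch on a hypothetical one-row array. The name, units, and control value are assumptions chosen to mirror the labels used earlier, not metadata read from `get_data`.

```python
import numpy as np
import xarray as xa
from matplotlib.figure import Figure

time = np.linspace(0.0, 10.0, 200)
m = xa.DataArray(
    np.exp(-0.5 * time) * np.cos(2 * np.pi * time),
    dims=("time",),
    # (dims, data, attrs) tuple: attach units to the time coordinate
    coords={"time": ("time", time, {"units": "ms"}), "control": 1.5},
    name="displacement",
    attrs={"units": "mm"},
)

fig = Figure()
ax = fig.subplots()
m.plot(ax=ax)  # labels are built from name/attrs; the scalar coord goes in the title
print(ax.get_xlabel(), "|", ax.get_ylabel(), "|", ax.get_title())
```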