# A little bit of analysis goes a long way

As great as looking at raw data is, it is not our ultimate goal. To remind ourselves why we are (hypothetically) doing this: we want to determine how *control* affects the physical properties of a cantilever. We know that we can model this system as a damped harmonic oscillator, so the displacement away from equilibrium as a function of time is given by:

$$D(t) = A e^{-\zeta\omega_0t} \sin\left(\sqrt{1 - \zeta^2}\omega_0t + \varphi\right)$$

where $A$ and $\varphi$ are set by the initial conditions, $\zeta$ is the damping factor and $\omega_0$ is the natural frequency of the oscillator.
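As a quick sanity check on the model, here is a minimal NumPy sketch of $D(t)$. The function name `damped_oscillator` and its argument order are our own choices for illustration, not necessarily how the helper module defines the model, and it assumes an under-damped oscillator ($0 \le \zeta < 1$) so the square root is real.

```python
import numpy as np


def damped_oscillator(t, A, zeta, omega_0, phi):
    """Evaluate D(t) = A exp(-zeta*omega_0*t) sin(sqrt(1 - zeta**2)*omega_0*t + phi).

    Assumes 0 <= zeta < 1 (under-damped) so that the square root is real.
    """
    t = np.asarray(t, dtype=float)
    # exponentially decaying envelope
    envelope = A * np.exp(-zeta * omega_0 * t)
    # the damped angular frequency is slightly lower than omega_0
    omega_d = np.sqrt(1 - zeta**2) * omega_0
    return envelope * np.sin(omega_d * t + phi)


t = np.linspace(0, 10, 501)
D = damped_oscillator(t, A=1.0, zeta=0.1, omega_0=2 * np.pi, phi=np.pi / 2)
```

With `zeta = 0` this reduces to an undamped sine wave, and for any $\zeta$ the envelope $A e^{-\zeta\omega_0 t}$ bounds the oscillation from above and below.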

## (non-linear) fitting

```python
# not sure how else to get the helpers on the path!
import sys
sys.path.append('../scripts')
```

```python
from data_gen import get_data, fit, Params
d = get_data(25)
m = d[6]
```

In order to extract the fit parameters from our (noisy) data we will use [`scipy.optimize.curve_fit`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html). In the interest of time, we have implemented this in the helper module and will import it.

In the same spirit as our `plot_one` function, the `fit` function wraps the underlying `curve_fit` function and provides two improvements for our purposes:

1. Specializes from general non-linear fitting to "damped harmonic fitting" by baking in the functional form we know we want to fit
2. Enriches the returned parameters from a numpy array to a `namedtuple` with nice reprs and a `sample` method.
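We do not reproduce the helper module here, but to make the two points above concrete, here is a minimal sketch of what such a wrapper *could* look like. `fit_sketch`, `ParamsSketch`, and the field names `A`, `zeta`, `omega`, and `phi` are assumptions for illustration (they mirror the `.zeta`, `.omega`, and `.sample` used by `plot_one` below), and unlike the real `fit` it takes plain `t` and `y` arrays rather than an xarray object.

```python
from collections import namedtuple

import numpy as np
from scipy.optimize import curve_fit


def _model(t, A, zeta, omega, phi):
    # 1. bake in the damped harmonic oscillator form D(t)
    return A * np.exp(-zeta * omega * t) * np.sin(np.sqrt(1 - zeta**2) * omega * t + phi)


class ParamsSketch(namedtuple("ParamsSketch", ["A", "zeta", "omega", "phi"])):
    """2. Hypothetical stand-in for `Params`: named fields, nice repr, and `sample`."""

    __slots__ = ()

    def sample(self, t):
        """Evaluate the fitted curve at the times `t`."""
        return _model(np.asarray(t, dtype=float), *self)


def fit_sketch(t, y, p0=(1.0, 0.1, 1.0, 0.0)):
    """Fit the damped oscillator to (t, y); `p0` is the initial guess."""
    popt, _pcov = curve_fit(_model, t, y, p0=p0)
    return ParamsSketch(*popt)


# usage on synthetic data with a little noise
t = np.linspace(0, 10, 400)
truth = ParamsSketch(A=1.0, zeta=0.1, omega=2.0, phi=0.3)
noisy = truth.sample(t) + np.random.default_rng(0).normal(0, 0.01, t.size)
p = fit_sketch(t, noisy, p0=(0.9, 0.12, 2.05, 0.2))
```

Because this is a non-linear fit, a reasonable initial guess `p0` matters: started far from the truth, the optimizer can wander to a different local minimum.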

### TODO

Detail design review of `fit` and `Params`

```python
fit(m)
```

## Add fit to plot

```python
# interactive figures, requires ipympl!
%matplotlib widget
#%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import xarray as xa
```

```python
def plot_one(
    ax: "Axes",
    experiment: "OneExperiment",
    fit: Params,
    offset: float = 0,
    *,
    line_style: "Dict[str, Any]" = None,
) -> "Dict[str, Artist]":
    """Given a curve, plot it and its fit (with an offset) and format a label for a legend.

    Parameters
    ----------
    ax : mpl.Axes
        The axes to add the plot to

    experiment : OneExperiment
        An xarray DataArray with a vector 'time' and scalar 'control' coordinates.

    fit : Params
        The fit parameters

    offset : float, optional
        A vertical offset to apply before plotting

    line_style : dict, optional
        Any keywords that can be passed to `matplotlib.axes.Axes.plot`

    Returns
    -------
    dict : Dict[str, Artist]
        A mapping of the Artists created by this function.

        The expected keys are {'raw', 'fit', 'annotation'}
    """
    # if the user does not give us a style, start from an empty dict;
    # copy a user-supplied dict so that setdefault below does not mutate it
    line_style = dict(line_style) if line_style is not None else {}
    # if the user passes label in the line_style dict, let it win!
    line_style.setdefault("label", f"control = {float(experiment.control):.1f}")
    # do the plot!
    t = experiment.time
    (ln,) = ax.plot(t, experiment + offset, **line_style)
    (fit_ln,) = ax.plot(t, fit.sample(t) + offset, color="k")
    # Add annotation with the fit parameters
    ann = ax.annotate(
        f"$\\zeta={fit.zeta:.2g}$, $\\omega_0={fit.omega:.2f}$",
        # This controls how xy is interpreted
        xycoords=ax.get_yaxis_transform(),
        # units are (axes-fraction, data)
        # anchor the text at 95% of the width, and an additional 0.5 above the offset
        xy=(0.95, offset + 0.5),
        # set the text alignment.  This anchors the text on the lower-right corner
        ha="right",
        va="bottom",
    )
    # return all three artists
    return {"raw": ln, "fit": fit_ln, "annotation": ann}
```

Modifying our standard example to do the fit and pass the result into `plot_one` we get:

```python
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    plot_one(ax, d[indx], fit(d[indx]), j*4, line_style={'linewidth': .75})
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
ax.legend();
```

This is better, and promising for our experiment since the fits look good, but the legend is poorly placed. Before we address the legend placement, we are going to discuss the choices behind the updated signature of `plot_one`. If you are only interested in the Matplotlib content, skip the next two sections.

### What is the input (or do not cross the streams)

Before we address the issue with the legend, we should talk about the choice of doing the fit inside or outside of `plot_one` and the more structured return type.

On one hand, if we made the call to `fit` inside of `plot_one`, it would not require any changes to the API of `plot_one`! Instead of "plot the experimental data" its job would become "plot the experimental data and the best-fit curve". This is a very reasonable extension of the scope of `plot_one` and would ensure that the fit and the raw data were always correctly associated. However, doing so would mix analysis and visualization in a fundamentally inseparable way. If we wanted to use the fit parameters in some other part of the code, we would have to either re-call `fit` or return the values of the fit from `plot_one`.

If we were to re-call `fit` in multiple places in the code, it would waste time (you may have noticed a bit of lag when running the plotting cell just above) and would risk different parts of the code calling different versions of `fit` (or, for techniques that are not fully deterministic, getting slightly different results), which would lead to inconsistencies. Consider if we wrote a `fit_but_better`: we would then have to find _every_ place in our code that calls `fit` and change it.

To avoid having to re-call `fit`, we could have `plot_one` return the fit values; however, this would make plotting a prerequisite for fitting! Consider the case where we had hundreds of experiments: it would be farcical to have to plot (and then discard) a figure for every data set just to extract all of the fit parameters!

This exact scenario happened recently in my work. A colleague and I were updating a notebook and package left by a now-departed post-doc. When running a function called `compute_circular_average` we were (temporarily) stymied by a file permissions error because that function was plotting and saving a PNG to disk as part of doing the analysis! While we did eventually find where the file path was set and update it to a directory where we had write permissions, it was a debugging adventure unrelated to our goal for the day.

While it may seem convenient to mix your plotting and analysis in the short term, in the long term it will cause problems. Thus, we have chosen to pass the fit results into `plot_one`!
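A minimal sketch of that separation, using toy decaying sine waves in place of the real experiments. The names `analyze` and `plot_with_result` are illustrative, not part of the helper module, and we build a `matplotlib.figure.Figure` directly (instead of `plt.subplots`) only so the snippet runs without a GUI backend.

```python
import numpy as np
from matplotlib.figure import Figure


def analyze(t, y):
    """Analysis only: arrays in, numbers out; no Figure and no files."""
    i = int(np.argmax(np.abs(y)))
    return {"peak": float(np.abs(y[i])), "t_peak": float(t[i])}


def plot_with_result(ax, t, y, result):
    """Visualization only: consumes a result that was computed elsewhere."""
    (ln,) = ax.plot(t, y)
    ann = ax.annotate(f"peak = {result['peak']:.2f}", xy=(result["t_peak"], result["peak"]))
    return {"raw": ln, "annotation": ann}


t = np.linspace(0, 1, 200)
# hypothetical stand-ins for "hundreds of experiments"
experiments = [np.exp(-k * t) * np.sin(20 * t) for k in range(1, 301)]

# step 1: analyze everything, with no plotting involved
results = [analyze(t, y) for y in experiments]

# step 2: plot only the one we care about, reusing the stored result
fig = Figure()
ax = fig.subplots()
arts = plot_with_result(ax, t, experiments[0], results[0])
```

Because `analyze` never touches Matplotlib, it can run over all 300 data sets (or in a batch job on a cluster) without ever creating a figure.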

### What to return

Instead of returning a single `Line2D` we now return a dictionary of Artists. When writing functions that create Artists and add them to a Figure, it is good practice to return them to the caller. Updating an Artist after the fact (setting new data, changing its style, toggling its visibility, or adding interactions) requires a handle to the Python object that represents it, and returning the Artists gives the caller that handle.
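For example, with the returned dictionary in hand the caller can restyle or re-target the Artists without re-plotting. This is a minimal sketch with a hypothetical `plot_demo` standing in for `plot_one` (it draws a sine wave rather than an experiment), using `Figure` directly so it runs without a GUI.

```python
import numpy as np
from matplotlib.figure import Figure

fig = Figure()
ax = fig.subplots()
t = np.linspace(0, 10, 200)


def plot_demo(ax, t):
    """Hypothetical stand-in for plot_one that returns the Artists it creates."""
    (ln,) = ax.plot(t, np.sin(t))
    ann = ax.annotate("first run", xy=(0.95, 0.5), xycoords=ax.get_yaxis_transform(), ha="right")
    return {"raw": ln, "annotation": ann}


arts = plot_demo(ax, t)
arts["raw"].set_ydata(np.cos(t))         # swap in new data
arts["raw"].set_linewidth(3)             # change the style
arts["annotation"].set_text("updated")   # update the label in place
```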

With only three things, it would also be reasonable to return a tuple like
```python
def plot_one(...):
    ...
    return ln, fit_ln, ann

```
However, this bakes some constraints into the API that will be hard to change later. For example, the fact that `plot_one` returns exactly three Artists is now part of the API. If you were to write:
```python
a, b, c = plot_one(...)
```
in your code and later increase the number of values returned from `plot_one` (we just went from 1 to 3, so it seems reasonable to expect it to go up again!), that line will raise a `ValueError` (too many values to unpack). Alternatively, by returning a dictionary, so long as we do not remove a key (or change its meaning!), we will be able to add additional Artists to the return value in a backwards-compatible way. When designing APIs, considering what a choice will _prevent_ you from doing in the future can be as important as what it lets you do now.
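A small, self-contained illustration of that failure mode. The `plot_one_v1`, `plot_one_v2`, and `plot_one_dict_v2` functions are hypothetical, and plain strings stand in for the Artists so that no Matplotlib is needed.

```python
# Strings stand in for the Artists so this runs without Matplotlib.
def plot_one_v1():
    """Hypothetical tuple-returning version: three 'artists'."""
    return "raw", "fit", "annotation"


def plot_one_v2():
    """Hypothetical later version that grew a fourth 'artist'."""
    return "raw", "fit", "annotation", "residuals"


a, b, c = plot_one_v1()  # fine today

try:
    a, b, c = plot_one_v2()  # unchanged caller code, new version of the function
    broke = False
except ValueError:  # too many values to unpack
    broke = True


def plot_one_dict_v2():
    """Dict-returning version: adding a key does not disturb existing callers."""
    return {"raw": "raw", "fit": "fit", "annotation": "annotation", "residuals": "residuals"}


raw = plot_one_dict_v2()["raw"]  # still works after the new key was added
```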

Additionally, by returning the three items as a tuple we have to pick an order, but in this case it is not clear that there is a "natural" or "correct" order for these three things. Should it be `(ln, fit_ln, ann)` or `(ln, ann, fit_ln)` or `(ann, fit_ln, ln)` ... (and so on through all the permutations)? Due to this ambiguity, when using `plot_one` you will have to remember (or check the docs for) the order. In contrast, dictionaries do not have an order you need to rely on (yes, they iterate in insertion order, but we do not want to use that here) and instead give you names for the things. It is easier to remember "`plot_one` returns the `'raw'`, `'fit'`, and `'annotation'` Artists" than to remember that plus what order they are in.

One issue with returning a dictionary like this is that the flexibility that was an asset above is also a liability, as the keys could be _anything_. If you were reading code that used `plot_one` and came across
```python
arts = plot_one(...)
# will this work?!
arts['raw'].set_lw(5)
```
you have no idea what is in there, how many Artists there are, etc. If you can interactively inspect `arts` then you could see the keys `{'raw', 'fit', 'annotation'}`, but from static analysis alone you are out of luck (short of reading the source of `plot_one`). Returning a dictionary could also be criticized because you have to access the contents via `__getitem__` rather than attribute access. To address this, Python has three built-in options:
1. [collections.namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple)
2. [types.SimpleNamespace](https://docs.python.org/3/library/types.html#types.SimpleNamespace)
3. [dataclasses](https://docs.python.org/3/library/dataclasses.html#module-dataclasses)

For the rest of this course we are going to stick with returning dictionaries because it is simple and provides enough flexibility without any extra overhead. However, in larger projects some of these techniques are worth considering.

#### SimpleNamespace

For `SimpleNamespace`, which primarily solves the problem of wanting attribute access vs `__getitem__` access, we could do
```python
from types import SimpleNamespace

def plot_one(...):
    ...
    return SimpleNamespace(raw=ln, fit=fit_ln, annotation=ann)
```
However, this comes at the cost of the easy iteration that a dictionary has via `.items()`.
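If you do need to iterate, `vars()` hands back the namespace's underlying `__dict__`, which recovers `.items()`. A minimal sketch, with strings standing in for the Artists:

```python
from types import SimpleNamespace

# strings stand in for the Line2D / Annotation objects plot_one would create
ret = SimpleNamespace(raw="raw-line", fit="fit-line", annotation="annotation-text")

raw = ret.raw  # attribute access instead of ret["raw"]

# there is no ret.items(), but vars() exposes the underlying dict
names = [name for name, artist in vars(ret).items()]
```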

#### namedtuple

To use `namedtuple`, which we mentioned above as the type behind `Params`, we define

```python
from collections import namedtuple
PlotOneReturn = namedtuple('PlotOneReturn', ['raw', 'fit', 'annotation'])
?PlotOneReturn
```

Then, from `plot_one` we would return

```python
def plot_one(...):
    ...
    return PlotOneReturn(ln, fit_ln, ann)
```
This addresses some of the issues with returning 3 items from `plot_one`: we now have names associated with the elements and can access them via `ret.raw`. However, `PlotOneReturn` is still a `tuple` subclass, so the concerns about the order and fixed length still hold.

On the other hand, if it is very unlikely that the number of returned Artists will change, this is a lightweight way to add more structure to your code.
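To see both sides of that trade-off at once, here is a small sketch (strings again stand in for the Artists). The named access is new, but positional indexing and unpacking still work, which is exactly why order and length remain part of the API; `_asdict()` gives a dictionary copy when you want to iterate by name.

```python
from collections import namedtuple

PlotOneReturn = namedtuple("PlotOneReturn", ["raw", "fit", "annotation"])
# strings stand in for the Line2D / Annotation objects
ret = PlotOneReturn("raw-line", "fit-line", "annotation-text")

by_name = ret.raw          # attribute access by name...
by_position = ret[0]       # ...but it is still a tuple, so position is API too
raw, fit_ln, ann = ret     # and unpacking still depends on the length
as_dict = ret._asdict()    # a dict copy when you want to iterate by name
```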

#### dataclasses

Dataclasses are the youngest of the three options, having been added only in Python 3.7.

To create a dataclass for our return we could write


```python
from dataclasses import dataclass
from matplotlib.lines import Line2D
from matplotlib.text import Annotation

@dataclass
class PlotOneReturnDC:
    """Class to track the returns from plot_one"""
    raw: Line2D
    fit: Line2D
    annotation: Annotation

? PlotOneReturnDC
```

To use this in `plot_one` we would do
```python
def plot_one(...):
    ...
    return PlotOneReturnDC(raw=ln, fit=fit_ln, annotation=ann)
```


Because `dataclass` is a class-decorator (rather than a factory function like `namedtuple`), you can add additional methods such as

```python
@dataclass
class PlotOneReturnDC:
    """Class to track the returns from plot_one"""
    raw: Line2D
    fit: Line2D
    annotation: Annotation

    def toggle_annotation(self, *, visible=None):
        """
        Toggle visibility of annotation text.

        Parameters
        ----------
        visible : bool, optional
            If not None, set the visibility, otherwise toggle the current state.
        """
        if visible is None:
            visible = not self.annotation.get_visible()
        self.annotation.set_visible(visible)

? PlotOneReturnDC.toggle_annotation
```
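Here is a sketch of calling that method on a real set of Artists. The dataclass is repeated from the cell above so the snippet runs on its own, and a sine wave stands in for an experiment; `Figure` is used directly so no GUI backend is required.

```python
from dataclasses import dataclass

import numpy as np
from matplotlib.figure import Figure
from matplotlib.lines import Line2D
from matplotlib.text import Annotation


@dataclass
class PlotOneReturnDC:
    """Repeated from the cell above so this snippet is self-contained."""
    raw: Line2D
    fit: Line2D
    annotation: Annotation

    def toggle_annotation(self, *, visible=None):
        if visible is None:
            visible = not self.annotation.get_visible()
        self.annotation.set_visible(visible)


fig = Figure()
ax = fig.subplots()
t = np.linspace(0, 10, 200)
(ln,) = ax.plot(t, np.sin(t))
(fit_ln,) = ax.plot(t, np.sin(t), color="k")
ann = ax.annotate("label", xy=(0.95, 0.5), xycoords=ax.get_yaxis_transform(), ha="right")

ret = PlotOneReturnDC(raw=ln, fit=fit_ln, annotation=ann)
ret.toggle_annotation()               # visible -> hidden
ret.toggle_annotation()               # hidden -> visible
ret.toggle_annotation(visible=False)  # force hidden regardless of current state
```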


### Fix the legend

By default `ax.legend` looks at the Artists in the Axes and uses a heuristic to find the "best" place to put the legend based on how much it would overlap the data. However, as is evident in this figure, the heuristic can fail. In that case we will need to try something different.

The legends in Matplotlib are very flexible.  For more details see [the Legend Guide](https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html) tutorial.

#### Draggable legend

If you only need these figures interactively, or for one-off manually saved figures, we can make the legend draggable and put it someplace better:

```python
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    plot_one(ax, d[indx], fit(d[indx]), j*4, line_style={'linewidth': .75})
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
leg = ax.legend()
leg.set_draggable(True)
```

However, this manual intervention is antithetical to reproducible science, so can we do better?

#### A wide external legend

Given the shape of the data, there is not really space inside the Axes for the legend. One solution is to put the legend outside of the Axes to ensure that it cannot collide with the actual plot!

Adapting the first example in 
[this subsection of the Legend Guide](https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html#legend-location) we do:

```python
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    plot_one(ax, d[indx], fit(d[indx]), j*4, line_style={'linewidth': 2})
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
leg = ax.legend(
    bbox_to_anchor=(0., 1.02, 1., .102),
    loc='lower left',
    ncol=3,
    mode="expand",
    borderaxespad=0.
)
```

which is definitely better! However, it is a bit weird that some information (the fit parameters) is in the annotation while other information (the control value) is in the legend. This plot, as it stands, also has significant accessibility issues. If it were printed in black and white, it would be very difficult to correctly associate the legend entries with the lines (does the left-to-right order of the legend match the lines from top to bottom or from bottom to top?). To solve this we can move the legend into the annotation with the fit parameters.

#### Annotation as legend

Expanding on the annotation we already have in `plot_one`, let's add the control information as well. At its core, `ax.annotate` is a way to specify:
1. a point in the Axes
2. an offset from that point
3. a text label.

There is significant complexity in how we define each of those things. For an extensive tutorial see [the Annotation Tutorial](https://matplotlib.org/stable/tutorials/text/annotations.html) in the Matplotlib documentation.
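To make those three pieces concrete, here is a small standalone sketch on toy data (not our experiments) that sets all of them. Note that it uses `xytext` with `textcoords="offset points"` for the offset, which `plot_one` does not do; there the text sits directly at `xy`.

```python
from matplotlib.figure import Figure

fig = Figure()  # no GUI backend needed
ax = fig.subplots()
ax.plot([0, 1, 2], [0, 1, 0])

ann = ax.annotate(
    "peak",                           # 3. the text label
    xy=(1, 1),                        # 1. the point being annotated (data coordinates)
    xytext=(10, -20),                 # 2. where the text goes, relative to xy...
    textcoords="offset points",       # ...measured in points
    arrowprops={"arrowstyle": "->"},  # optional arrow from the text back to xy
)
```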

```python
def plot_one(
    ax: "Axes",
    experiment: "OneExperiment",
    fit: Params,
    offset: float = 0,
    *,
    line_style: "Dict[str, Any]" = None,
) -> "Dict[str, Artist]":
    """Given a curve, plot it and its fit (with an offset) and annotate it.

    Parameters
    ----------
    ax : mpl.Axes
        The axes to add the plot to

    experiment : OneExperiment
        An xarray DataArray with a vector 'time' and scalar 'control' coordinates.

    fit : Params
        The fit parameters

    offset : float, optional
        A vertical offset to apply before plotting

    line_style : dict, optional
        Any keywords that can be passed to `matplotlib.axes.Axes.plot`

    Returns
    -------
    dict : Dict[str, Artist]
        A mapping of the Artists created by this function.

        The expected keys are {'raw', 'fit', 'annotation'}
    """
    # if the user does not give us a style, start from an empty dict;
    # copy a user-supplied dict so that setdefault below does not mutate it
    line_style = dict(line_style) if line_style is not None else {}
    # even though we are going to add the control value to the annotation,
    # still set the label so a legend can be made
    label = f"control = {float(experiment.control):.1f}"
    # if the user passes label in the line_style dict, let it win!
    line_style.setdefault("label", label)
    # do the plot!
    t = experiment.time
    (ln,) = ax.plot(t, experiment + offset, **line_style)
    (fit_ln,) = ax.plot(t, fit.sample(t) + offset, color="k")
    # Add annotation with the control value and the fit parameters
    ann = ax.annotate(
        f"{label}\n$\\zeta={fit.zeta:.2g}$, $\\omega_0={fit.omega:.2f}$",
        # This controls how xy is interpreted
        xycoords=ax.get_yaxis_transform(),
        # units are (axes-fraction, data)
        # anchor the text at 95% of the width, and an additional 0.5 above the offset
        xy=(0.95, offset + 0.5),
        # set the text alignment.  This anchors the right end of the text's baseline at xy
        ha="right",
        va="baseline",
    )
    # return all three artists
    return {"raw": ln, "fit": fit_ln, "annotation": ann}
```

```python
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    plot_one(ax, d[indx], fit(d[indx]), j*4, line_style={'linewidth': 2})
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
```

Going through the `ax.annotate` call one line at a time:

```python
    ann = ax.annotate(
```
The first argument we pass in is the label we want to use.
```python
        f"{label}\n$\\zeta={fit.zeta:.2g}$, $\\omega_0={fit.omega:.2f}$",
```
Note the use of both math mode (text between a pair of `$`) and the `'\n'` newline escape to insert a line break. The default font that Matplotlib uses (DejaVu Sans) also supports a wide range of alphabets, including Latin, Cyrillic, and Greek, so we could also use Unicode for the Greek letters.
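For comparison, here are the two ways of writing the same label side by side, with made-up values for $\zeta$ and $\omega_0$. The Unicode version assumes the font has glyphs for ζ, ω, and the subscript zero.

```python
zeta, omega = 0.0123, 6.2832  # made-up values for illustration

# mathtext: parsed and typeset by Matplotlib between pairs of $
mathtext_label = f"$\\zeta={zeta:.2g}$, $\\omega_0={omega:.2f}$"
# plain Unicode: the characters are drawn directly from the font
unicode_label = f"\u03b6={zeta:.2g}, \u03c9\u2080={omega:.2f}"
print(unicode_label)  # ζ=0.012, ω₀=6.28
```

Mathtext gives TeX-like typesetting (italic math letters, proper subscripts) while the Unicode string keeps the label readable when printed or logged as plain text.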

The next argument is `xycoords`, which sets the coordinate system Matplotlib should use to interpret the position of the annotation (the argument after it).
```python
        # This controls how xy is interpreted
        xycoords=ax.get_yaxis_transform(),
```
Within a Figure and an Axes there are several coordinate systems available to us, in three main families:
1. the "data" coordinates, the "natural" coordinates in an Axes
2. the "Axes" coordinates, both absolute and fractional, from the lower left of the Axes
3. the "Figure" coordinates, both absolute and fractional, from the lower left of the Figure.

For more details please see the [Transformations Tutorial](https://matplotlib.org/stable/tutorials/advanced/transforms_tutorial.html) in the Matplotlib documentation, which explains the coordinate systems and the transformation functions between them. Here we are using a "blended transformation", where the x and y coordinates come from different coordinate systems: the x-coordinate is in axes fraction (0 is the left spine, 1 is the right spine) and the y-coordinate is in data coordinates. This allows us to specify the location of our annotation relative to the edges of the Axes horizontally and relative to the data vertically.
```python
        # units are (axes-fraction, data)
        # anchor the text at 95% of the width, and an additional 0.5 above the offset
        xy=(0.95, offset + 0.5),
```
If you pan or zoom the figure above, you will see that the text does not move horizontally and stays 0.5 (in data coordinates) above its line.
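As a sketch of what `ax.get_yaxis_transform()` is doing, we can build the same blend by hand with `blended_transform_factory` and check that both map a point to the same pixel. Changing the x-limits (a stand-in for panning) leaves the x pixel position unchanged, because x is measured in axes fraction rather than data. The limits and the point are arbitrary choices for illustration.

```python
import numpy as np
from matplotlib.figure import Figure
from matplotlib.transforms import blended_transform_factory

fig = Figure()  # no GUI backend needed
ax = fig.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(-5, 5)

# x in axes fraction, y in data coordinates, built by hand
blended = blended_transform_factory(ax.transAxes, ax.transData)
xy = (0.95, 2.5)  # 95% of the way across, 2.5 data units up

px_builtin = ax.get_yaxis_transform().transform(xy)
px_manual = blended.transform(xy)

# "pan" the x-axis: the transforms update live, but x is in axes fraction
ax.set_xlim(100, 110)
px_after_pan = ax.get_yaxis_transform().transform(xy)
```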

Finally, we specify the alignment of the text, that is, how to place the bounding box of the text relative to the point given by `xy`. Here we use a horizontal alignment of `"right"`, so the right edge of the bounding box is at the given x-position, and a vertical alignment of `"baseline"`, so the text baseline (the line the letters sit on, not the bottom of the bounding box) is at the given y-position.
```python
        # set the text alignment.  This anchors the right end of the text's baseline at xy
        ha="right",
        va="baseline",
    )
```
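To see the difference between `va="bottom"` (used in the first version of `plot_one`) and `va="baseline"` concretely, the sketch below measures both bounding boxes using an Agg canvas. It places the anchor in axes-fraction coordinates for both x and y purely for simplicity. With `"baseline"`, the bottom of the box sits below the anchor by the font's descent; with `"bottom"`, it sits exactly on the anchor.

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

fig = Figure()
canvas = FigureCanvasAgg(fig)  # attach an Agg canvas so we can measure text
ax = fig.subplots()
renderer = canvas.get_renderer()

anchor = (0.95, 0.5)  # axes fraction for both x and y, for simplicity
common = dict(xycoords="axes fraction", ha="right")
bottom = ax.annotate("control = 1.0", xy=anchor, va="bottom", **common)
baseline = ax.annotate("control = 1.0", xy=anchor, va="baseline", **common)

# bounding boxes in display (pixel) coordinates
bb_bottom = bottom.get_window_extent(renderer=renderer)
bb_baseline = baseline.get_window_extent(renderer=renderer)
anchor_px = ax.transAxes.transform(anchor)
```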