# Try a function on for size

In [None]:
# interactive figures, requires ipympl!
%matplotlib widget
#%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import xarray as xa

In [None]:
# not sure how else to get the helpers on the path!
import sys
sys.path.append('../scripts')

In [None]:
from data_gen import get_data, fit
d = get_data(25)

## Plot more than one curve

In the previous lesson we got as far as making a plot with a single vibration curve in it:

In [None]:
fig, ax = plt.subplots()
m = d[6]
ax.plot(m.time, m, label=f'control = {float(m.control):.1f}')
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement (mm)')
ax.legend();

But we know that the scientifically interesting effect we want to see is how these curves change as a function of *control*, so we really want to be able to see more than one curve at the same time.  Via copy-paste-edit we can get three curves on the axes:

In [None]:
fig, ax = plt.subplots()
m = d[6]
ax.plot(m.time, m, label=f'control = {float(m.control):.1f}')
m = d[0]
ax.plot(m.time, m, label=f'control = {float(m.control):.1f}')
m = d[-1]
ax.plot(m.time, m, label=f'control = {float(m.control):.1f}')
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement (mm)')
ax.legend();

### Add an offset

While this is better than the earlier "plot everything" approach, it is still a bit too busy to be readily understood.  One technique we can use is to add an offset to the data before plotting to separate the curves visually:

In [None]:
fig, ax = plt.subplots()
m = d[6]
ax.plot(m.time, m + 0, label=f'control = {float(m.control):.1f}')
m = d[0]
ax.plot(m.time, m + 4 , label=f'control = {float(m.control):.1f}')
m = d[-1]
ax.plot(m.time, m + 8, label=f'control = {float(m.control):.1f}')
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
ax.legend();

### Refactor to a loop

Looking at this cell there is a fair amount of (nearly identical) duplicated code.  This suggests that we should try using a loop to reduce the duplication.  This will make the code easier to read (as it will be clear what is different on each pass through the loop), make it easier to make future updates (as the change only has to be made once), and make it easier to change the number of curves plotted (by changing the loop):

In [None]:
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    m = d[indx]
    ax.plot(m.time, m + j * 4, label=f'control = {float(m.control):.1f}')
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
ax.legend();

## With a little help from my ~friends~ function

Looking at the body of that loop we have a section of code that does a well-scoped task: "Given a curve, plot it (with an offset), making sure it has a good label".  We want to pull this out into a function (it is only two lines now, but it will grow!) so that we can re-use this logic.  However, we are now faced with a design choice: what should the signature of our function be?  We could mechanically lift the loop body out and make all of the variables inputs:


```python
def plot_one(ax: Axes, d: FullDataSet, indx: int, j: int):
    ...
```

This would allow us to copy-paste our loop body into the function and go on our way (and is also what some IDEs might offer to do for you!), but this is not the best design.  It tells the function both too much and not enough.  Because we are passing in the whole data set and an index, we are offering the function more information than it needs to do its job; it only needs the curve it cares about!  Further, because we are also passing in the index of the full data set to pull out, if we were to end up having just one curve and wanted to use this function we would have to re-wrap the curve in something that the function could then index:

```python
plot_one(ax, [single_curve], 0, 0)  # why do this?
```

The first change we should make to the signature is take in a single curve rather than the full data set and an index:

```python
def plot_one(ax: Axes, experiment: OneExperiment, j: int):
    ...
```

Now we should look at *j*, which is passing too little information into the function!  In the loop we had a hard-coded factor of `4` in the offset computation.  As currently proposed we would only be able to offset the curves in multiples of 4!  We could relax the API a bit to allow *j* to also be a float, but then when using this function you would have to know about the magic number 4 and do

```python
plot_one(ax, single_curve, the_offset_I_want / 4)  # why do this?
```

Another option would be to pass in both the index and the step size, as

```python
def plot_one(..., j: int, step_size: float):
    offset = j * step_size
    ...
```

however this would result in the *j* and *step_size* arguments being very tightly coupled (as they are immediately multiplied together!) and there would be an infinite number of ways to call the function that would result in the same output!  There is nothing technically wrong with this, but it can lead to confusion later when it is not obvious that

```python
plot_one(..., 1, 10)  # do these have
plot_one(..., 10, 1)  # the same offset?
```

are equivalent calls!  That said, it is possible this function could be extended to have more functionality and we would need to be able to express "half way to the next curve" in the code.

A third way to structure this API would be

```python
def plot_one(..., offset: float):
    ...
```

which has the virtue of being the simplest and easiest to explain.  It "does what it says on the tin" and offsets the data by whatever value you pass in.


Going with the simplest option, we select an API of:

```python
def plot_one(ax: Axes, experiment: OneExperiment, offset: float=0):
    ...
```

where we also set a default value for the offset.

Adapting the function body to match this signature we write

In [None]:
def plot_one(ax: 'Axes', experiment: 'OneExperiment', offset: float=0) -> 'List[Line2D]':
    """Given a curve, plot it (with an offset) and format a label for a legend.

    Parameters
    ----------
    ax : mpl.Axes
        The axes to add the plot to

    experiment : OneExperiment
        An xarray DataArray with a vector 'time' and scalar 'control' coordinates.

    offset : float, optional
        A vertical offset to apply to the data before plotting

    Returns
    -------
    curves : list of Line2D
        The Line2D objects returned by `ax.plot` (one entry for this curve)
    """
    return ax.plot(
        experiment.time, 
        experiment + offset, 
        label=f'control = {float(experiment.control):.1f}'
    )

The docstring (which is indeed currently longer than the function body!) follows the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) standard.  While it is not the only docstring convention in use, you will see a lot of docstrings in this format because it is followed by many of the core projects (numpy, scipy, Matplotlib, scikit-learn, ...) of the scientific Python ecosystem.

In [None]:
type(d[6])

In [None]:
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    plot_one(ax, d[indx], j*4)
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
ax.legend();

We now ask "was it worth it?" for creating this function.  Currently we only have one statement in the function body and in our calling cell we only saved ourselves one local variable!  In the next couple of lessons we are going to expand this function, but even if we stopped here, I think this function is worth having.  What it expresses, in addition to the `ax.plot` call, is that the data in **this** use-case is a 1D vector which carries with it associated *time* and *control* attributes.  This may seem trivial, but by using xarray (or pandas, awkward array, or a dictionary of numpy arrays) we structurally encode the important relationships between the parts of our data, and this function knows how to use that structure to "do the right thing".

## Style the curves

Our function, despite its upsides, has in fact cost us some functionality: we can no longer directly control any of the properties of the Line2D object!

### Sidebar: variadic keyword arguments and API design


In addition to *label*, `ax.plot` can take a wide range of keyword arguments to control the styling of the line.  To get this back, one option is to pass all extra keyword arguments through to the `ax.plot` call like

```python

def plot_one(..., **kwargs_for_plot):
    return ax.plot(..., **kwargs_for_plot)
```

which is a very common pattern when wrapping APIs.  However, a Python function can have exactly one "collect all the extra keywords" parameter.  If you are only wrapping one thing, this can lead to very natural extensions of existing APIs that are better suited to your purpose, but if we want to route arguments to more than one underlying function we quickly get into trouble.  One option is to pass the `**kwargs` to every function like

```python
def wrapper(..., **kwargs):
    a(**kwargs)
    b(**kwargs)
    ...
```
which works so long as both functions have the same API _and_ you want to pass the same arguments to each of them.  One way to get this pattern to work is to have your inner functions accept and ignore any keyword arguments they do not know about.  While this will work, and almost never raise an exception, it leads to APIs that are extremely difficult to use: if you mistype a keyword name, rather than Python raising a `TypeError` and telling you, your code just eats it!  As a rule of thumb, if you write a function that takes `**kwargs` then you should do at least one of the following:

1. pass them on to an inner function
2. validate that you only have an expected sub-set of keys
3. document that the user-controlled keyword names are part of the input (like the `dict` `__init__` method!).
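
As a rough illustration of the second option, the sketch below checks the keyword names against an allow-list before using them.  The function `styled_label` and its allowed keys are hypothetical and exist only for this example.

```python
def styled_label(text, **kwargs):
    """Return a style dict for *text*, rejecting unknown keyword names."""
    # hypothetical allow-list, chosen only for this example
    allowed = {"color", "linewidth"}
    unexpected = set(kwargs) - allowed
    if unexpected:
        # fail loudly, the same way Python does for an unknown keyword
        raise TypeError(f"unexpected keyword argument(s): {sorted(unexpected)}")
    return {"label": text, **kwargs}


print(styled_label("curve", color="red"))

try:
    styled_label("curve", colour="red")  # typo in the keyword name
except TypeError as err:
    print(f"TypeError: {err}")
```

With the check in place a mistyped keyword is reported at the call site rather than silently ignored.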

If you are lucky enough that the keywords are non-overlapping it may be possible to split the keys and route to the correct function

```python
def wrapper(..., **kwargs):
    a_kwargs, b_kwargs = split_kwargs(kwargs)
    a(**a_kwargs)
    b(**b_kwargs)
    ...
```

however it may not be trivial to write `split_kwargs`.  If the keywords do overlap (for example both `a` and `b` take a *color* keyword argument), then there is no way to split the input.  You could route the overlapping keywords to _both_ functions, but that does not cover the case where you want to send different values to the two functions.
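
For the non-overlapping case, one possible sketch of such a helper uses `inspect.signature` to look up which names each function accepts.  The three-argument form of `split_kwargs` and the functions `draw_line` and `draw_marker` are assumptions made for this example, and the approach only works for functions whose signatures can be introspected and that name their keywords explicitly.

```python
from inspect import Parameter, signature


def split_kwargs(kwargs, func_a, func_b):
    """Split *kwargs* by which function accepts each name (hypothetical helper)."""

    def names(func):
        # collect the parameter names that can be passed by keyword
        return {
            name
            for name, p in signature(func).parameters.items()
            if p.kind in (Parameter.POSITIONAL_OR_KEYWORD, Parameter.KEYWORD_ONLY)
        }

    a_names, b_names = names(func_a), names(func_b)
    # a key that both functions accept can not be routed unambiguously
    overlap = a_names & b_names & set(kwargs)
    if overlap:
        raise TypeError(f"ambiguous keyword(s): {sorted(overlap)}")
    # a key that neither function accepts is most likely a typo
    unknown = set(kwargs) - a_names - b_names
    if unknown:
        raise TypeError(f"unexpected keyword(s): {sorted(unknown)}")
    a_kwargs = {k: v for k, v in kwargs.items() if k in a_names}
    b_kwargs = {k: v for k, v in kwargs.items() if k in b_names}
    return a_kwargs, b_kwargs


def draw_line(*, linewidth=1.0, linestyle="-"):
    return ("line", linewidth, linestyle)


def draw_marker(*, marker="o", markersize=6):
    return ("marker", marker, markersize)


a_kwargs, b_kwargs = split_kwargs(
    {"linewidth": 2, "marker": "x"}, draw_line, draw_marker
)
print(a_kwargs, b_kwargs)
```

Note that this sketch refuses overlapping names instead of guessing, so it does nothing to solve the overlapping-keyword problem described above.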

In these situations, we can use the idiom of

```python
def wrapper(..., *, a_kwargs=None, b_kwargs=None):
    a_kwargs = a_kwargs if a_kwargs is not None else {}
    b_kwargs = b_kwargs if b_kwargs is not None else {}
    a(**a_kwargs)
    b(**b_kwargs)
    ...
```

which is not ideal (it is very hard for IDEs and static analyzers to help you out), but sometimes it is the least-bad option!
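
A small runnable version of this idiom is sketched below.  The functions `draw_curve`, `draw_points`, and `draw_both` are hypothetical stand-ins; the point is only that both inner functions take a *color* keyword and can still receive different values.

```python
def draw_curve(color="black", linewidth=1.0):
    return f"curve(color={color}, linewidth={linewidth})"


def draw_points(color="black", size=6):
    return f"points(color={color}, size={size})"


def draw_both(*, curve_kwargs=None, points_kwargs=None):
    # None is the sentinel for "the caller passed nothing"
    curve_kwargs = curve_kwargs if curve_kwargs is not None else {}
    points_kwargs = points_kwargs if points_kwargs is not None else {}
    return draw_curve(**curve_kwargs), draw_points(**points_kwargs)


result = draw_both(
    curve_kwargs={"color": "red"},
    points_kwargs={"color": "blue", "size": 3},
)
print(result)
```

A mistyped key inside one of the dictionaries still raises a `TypeError`, but only when the inner function is called.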

#### Sidebar: types of arguments

As of Python 3.8 the inputs to a function can be:

- positional only
- positional or keyword
- keyword only

and independently required or optional (has a default value) with the restriction that once a positional argument is optional, all later positional arguments must also be optional to avoid ambiguity.  This means that in 

In [None]:
from inspect import signature

def example(a, /, b, c=None, *, d, e=None):
    ...
    
sig = signature(example)
{k: (v, v.kind) for k, v in sig.parameters.items()}

*a* can only be passed by position (not by keyword) so

In [None]:
example(a=1, b=2, d=0)

fails with `TypeError`.  Similarly *d* can _only_ be passed as a keyword argument so

In [None]:
example(1, 2, 3, 4)

also fails with `TypeError`.

When thinking about how to design APIs you can use these to nudge your users (or your future self) in the right direction.  For example, use positional-only arguments if you want to keep the names of your parameters out of the API.  On the other hand, keyword-only arguments are extremely useful for cases where you may have (too) many inputs and you want to make sure the _order_ of the parameters does not leak into your API.
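
As an illustration of both nudges together, here is a minimal sketch (requiring Python 3.8 or newer) with a hypothetical function `scale`: the data is positional-only so its parameter name can change later without breaking callers, and the tuning knobs are keyword-only so their order never matters.

```python
def scale(values, /, *, factor=1.0, offset=0.0):
    """Return ``factor * v + offset`` for each v in *values*."""
    return [factor * v + offset for v in values]


# keyword-only arguments can be given in any order
print(scale([1, 2, 3], offset=1, factor=2))

bad_calls = [
    lambda: scale(values=[1, 2, 3]),  # 'values' can not be passed by keyword
    lambda: scale([1, 2, 3], 2, 1),  # 'factor' / 'offset' can not be positional
]
for bad_call in bad_calls:
    try:
        bad_call()
    except TypeError as err:
        print(f"TypeError: {err}")
```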

#### Sidebar: mutable defaults: not even once

Be very cautious about using mutable default arguments; they can lead to very surprising results:

In [None]:
def wat(a, b=[]):
    b.append(a)
    print(a + len(b))
    
    
wat(1)
wat(1)
wat(1)
wat(1, [])
wat(1)

What is going on is that the default value of *b* is evaluated once when the function is defined.  Each subsequent call, when not explicitly passed *b*, uses the same `list` instance, hence the length of *b* keeps growing!  While this can be exploited for inter-call caching / memoization, it is probably better to reach for `functools.lru_cache`.
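
The usual way around this is a `None` sentinel, and caching can be left to the standard library.  The sketch below shows both; `collect` and `slow_square` are hypothetical names used only for illustration.

```python
from functools import lru_cache


def collect(a, b=None):
    """Append *a* to *b*, creating a fresh list when *b* is not given."""
    # a new list is created on every call, so nothing leaks between calls
    b = b if b is not None else []
    b.append(a)
    return b


print(collect(1))
print(collect(1))  # same result as the first call


@lru_cache(maxsize=None)
def slow_square(x):
    """Stand-in for an expensive computation."""
    print(f"computing {x}**2")
    return x * x


slow_square(4)  # computed, prints the message
slow_square(4)  # served from the cache, nothing is printed
print(slow_square.cache_info())
```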

### Route arguments to plot

As discussed above, the least-bad API choice we have, assuming we will want to route to additional methods in the near future, is

In [None]:
def plot_one(
    ax: "Axes",
    experiment: "OneExperiment",
    offset: float = 0,
    *,
    line_style: "Optional[Dict[str, Any]]" = None,
) -> "List[Line2D]":
    """Given a curve, plot it (with an offset) and format a label for a legend.

    Parameters
    ----------
    ax : mpl.Axes
        The axes to add the plot to

    experiment : OneExperiment
        An xarray DataArray with a vector 'time' and scalar 'control' coordinates.

    offset : float, optional
        A vertical offset to apply to the data before plotting

    line_style : dict, optional
        Any keywords that can be passed to `matplotlib.axes.Axes.plot`

    Returns
    -------
    curves : list of Line2D
        The Line2D objects returned by `ax.plot` (one entry for this curve)
    """
    # if the user does not give us input, start from an empty dict;
    # otherwise copy so that we never modify the caller's dict
    line_style = dict(line_style) if line_style is not None else {}
    # if the user passes label in the line_style dict, let it win!
    line_style.setdefault("label", f"control = {float(experiment.control):.1f}")
    # do the plot!
    return ax.plot(experiment.time, experiment + offset, **line_style)
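
One detail worth care when filling in defaults with `setdefault` is whether the caller's dictionary gets modified along the way.  The sketch below uses a hypothetical helper, `with_default_label`, which is not part of the lesson's code, to merge the default into a copy so that a single style dict can be reused for several curves.

```python
def with_default_label(line_style, default_label):
    """Return a copy of *line_style* with 'label' filled in if missing."""
    # copy first so the caller's dict is never modified
    merged = dict(line_style) if line_style is not None else {}
    # a user-supplied label wins over the default
    merged.setdefault("label", default_label)
    return merged


shared = {"linewidth": 0.75}
first = with_default_label(shared, "control = 1.0")
second = with_default_label(shared, "control = 2.0")
print(first)
print(second)
print(shared)  # the caller's dict is unchanged
print(with_default_label({"label": "mine"}, "control = 3.0"))
```

If `setdefault` were applied to `shared` directly, the label from the first call would stick and every later curve would reuse it.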


Using this new functionality we can, for example, make all of the lines thinner than the default:

In [None]:
fig, ax = plt.subplots()
for j, indx in enumerate([6, 0, -1]):
    plot_one(ax, d[indx], j*4, line_style={'linewidth': .75})
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
ax.legend();

If we want more control over the way the styles cycle we can use `cycler`:

In [None]:
from cycler import cycler

#### Cycle the Style

The [cycler](https://matplotlib.org/cycler/) library lets us create complex cycles of styles by composing simple cycles.  For example, to create a style cycle that varies both the color and the linestyle we would do:

In [None]:
color = cycler(color=['red', 'green', 'blue'])
linestyles = cycler(linestyle=['-', '--', ':'])

my_cycle = color + linestyles

my_cycle

This gives us a nice HTML repr.  If we iterate over `my_cycle` we see that each element is a dictionary:

In [None]:
for sty in my_cycle:
    print(f'{type(sty)=} {sty=}')

Because we added a way to pass the line style in to our helper `plot_one` we can directly use the `Cycler` to control the styling of our plots:

In [None]:
fig, ax = plt.subplots()
for j, (indx, sty) in enumerate(zip([6, 0, -1], my_cycle)):
    plot_one(ax, d[indx], j*4, line_style={'linewidth': .75, **sty})
ax.set_xlabel('time (ms)')
ax.set_ylabel('displacement [offset] (mm)')
ax.legend();