<a href="#Overview"></a>
# Overview
* <a href="#49ca6ee2-0a50-40e0-9fde-1351bbac51dc">Overview</a>
* <a href="#b8858321-e871-43ec-ad87-81e5ccea28a2">Goal</a>
  * <a href="#c17864c9-2db3-4d3e-8d01-15ae018016ca">Learning goals</a>
* <a href="#cab730f3-c6a4-44aa-a854-b2888c68605f">Getting started</a>
  * <a href="#e0e57657-4e3d-4d0c-a449-86737da006ec">Why import?</a>
* <a href="#6a94ae4c-9dfe-4755-8704-44c7fdff25a4">Loading the data</a>
  * <a href="#4c3d708e-27bd-4377-b268-c8cae24c575d">Exercise 1: inspecting our data</a>
* <a href="#c440ebfe-921a-4254-9b96-93481b541ddd">Plotting the data</a>
  * <a href="#658c03f5-d5b7-4a5e-ae94-4e40456d62eb">Exercise 2: Creating figures with subplots and axes in Matplotlib</a>
* <a href="#472ca876-af48-4984-b6d3-20c952550c51">String formatting</a>
  * <a href="#bbac55dc-ae84-4d60-925e-091839404daa">Exercise 3: using string formatting to define `filename`</a>
* <a href="#7e69be7f-2182-4a64-817f-c287ef53fec8">Yay functions!</a>
  * <a href="#7ff42e34-c432-486e-95b6-5148d2cc719a">Exercise 4: create a function to load data from a designated file</a>
* <a href="#3fdde124-cd71-48b0-b5e2-b5ddc83051ba">Creating multiple plots</a>
  * <a href="#b07f0ca7-3722-4705-9211-a61a53bdeb21">Exercise 5: plotting on a single subplot from an array of subplots</a>
* <a href="#20de2154-8f0b-4383-900b-6d1b92dc8380">Iterating through files with for loops</a>
  * <a href="#5b78a607-f68a-41b3-b6fe-2bbf2ab5fc5e">Exercise 6: using loops to make multiple subplots</a>

<a id="49ca6ee2-0a50-40e0-9fde-1351bbac51dc"></a>
# Overview
<a href="#Overview">Return to overview</a>
We will be analyzing data from a go-nogo psychoacoustical task in which the subjects are asked to listen to a four-second tone that may be amplitude-modulated. After listening to the tone, they must provide a response if they think they heard amplitude modulation. The difficulty of the task is varied by changing the degree of amplitude modulation from 0% (no modulation) to 100% (maximum modulation).

<img src="schematic.png" />

It's called a go-nogo task because the subject is supposed to *go* (i.e., provide a response) if they hear amplitude modulation and *no-go* (i.e., do nothing) if they don't.

<a id="b8858321-e871-43ec-ad87-81e5ccea28a2"></a>
# Goal
<a href="#Overview">Return to overview</a>
When running experiments, it's important to be able to inspect the preliminary data you've collected to ensure that there are no problems.  Today we are going to write functions that allow us to load and plot data for individual sessions from individual subjects.

<a id="c17864c9-2db3-4d3e-8d01-15ae018016ca"></a>
## Learning goals
<a href="#Overview">Return to overview</a>
* How to inspect a data file and figure out how to load it into Python
* String formatting
* How to make reproducible plots
* Functions
* For loops

<a id="cab730f3-c6a4-44aa-a854-b2888c68605f"></a>
# Getting started
<a href="#Overview">Return to overview</a>
First, let's get the boring stuff out of the way and import our libraries.

<a id="e0e57657-4e3d-4d0c-a449-86737da006ec"></a>
## Why import?
<a href="#Overview">Return to overview</a>
As an aside (not for discussion during class unless you have questions): if you're a Matlab user, you might think that the need to explicitly specify imports is a major disadvantage of Python. However, there are two advantages to specifying imports. First, you can reuse function names as long as they live in different modules. For example, there are several `log` functions available:

    from math import log
    from numpy import log
   
The first one (available in `math`, which is part of the Python standard library) isn't smart enough to work with Numpy arrays or Pandas dataframes, but the second one (available in `numpy`, which is a third-party library) is. So, you just import the one you want to use. In Matlab, however, there's no good way to say which one you want. Instead, the functions would likely be named `log` and `numpy_log` to avoid *name collisions*.

The second reason specifying imports is often a good thing is that Matlab must load **every** single installed library and toolbox before it is ready to run your code. This can be quite slow if you have a lot of third-party libraries and/or toolboxes installed. In contrast, Python only loads the modules you specify using `import`.
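
To make the name-collision point concrete, here is a small sketch (the array `values` is made up for illustration). Importing the modules themselves, rather than the bare `log` names, keeps both functions available side by side:

```python
import math
import numpy as np

# Made-up values, only for illustration
values = np.array([1.0, 10.0, 100.0])

# numpy's log works element-wise on a whole array
array_result = np.log(values)

# math's log accepts a single number
scalar_result = math.log(10.0)

# Passing the whole array to math.log raises a TypeError
try:
    math.log(values)
    math_handles_arrays = True
except TypeError:
    math_handles_arrays = False

print(array_result)
print(scalar_result)
print(math_handles_arrays)
```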

In [None]:
# We have some missing imports. We can't be making today's exercise too easy.
import matplotlib.pyplot as plt

<a id="6a94ae4c-9dfe-4755-8704-44c7fdff25a4"></a>
# Loading the data
<a href="#Overview">Return to overview</a>

<a id="4c3d708e-27bd-4377-b268-c8cae24c575d"></a>
## Exercise 1: inspecting our data
<a href="#Overview">Return to overview</a>
The data is split into multiple files (one file per session per subject). Your first job is to figure out how to read in one file and plot it. You have already learned the function that can be used to read in the file, but you may (hint) need to specify values for certain function parameters that you have not used in the past. As part of the challenge, you will need to figure out what library to import. Imports can occur anywhere in Python code. The import can be part of the answer cell below (i.e., you don't have to put it in the cell above with the other imports).

Ready for the challenge? Your assignment is to first inspect the file [data/S0_20191122_20dBSPL_11360Hz.dat](data/S0_20191122_20dBSPL_11360Hz.dat) to determine how to load it. Go ahead and open it in your browser (if you're lucky, clicking on the filename above will work).

***Discussion***: Does everyone have it open? Great, now take a few minutes to look through it and discuss as a class. Tell us what you notice about it. What is the format? What library do you think will work best for reading this format?

Good, now that you think you know what to do, go ahead and make it so. Load the file into `data`. You'll know you did it correctly when `data.shape` is `(178, 2)`.

In [None]:
filename = 'data/S0_20191122_20dBSPL_11360Hz.dat'

%load "answers/answer_001.txt"

Now, look at the first few rows of data. How do you do that?

In [None]:
%load "answers/answer_002.txt"

Each row is the data for a single trial. Since there are 178 rows, there were 178 trials for this particular session. The first column is `depth`, which is the modulation depth for that trial (expressed as a percentage from 0 to 100). The second column is the subject's response (0 is no response, 1 is response). Remember that the subject is expected to provide a response if they think they heard an amplitude modulation. As the amplitude modulation approaches 0%, the subject will have greater difficulty detecting the amplitude modulation and, therefore, be less likely to provide a response.

<a id="c440ebfe-921a-4254-9b96-93481b541ddd"></a>
# Plotting the data
<a href="#Overview">Return to overview</a>

<a id="658c03f5-d5b7-4a5e-ae94-4e40456d62eb"></a>
## Exercise 2: Creating figures with subplots and axes in Matplotlib
<a href="#Overview">Return to overview</a>
Now that we have our data loaded, we want to compute and plot the probability, as a percent, of the subject's response for each modulation depth (i.e., their psychometric function). If there are 10 trials with a `depth` of 50 and they provide a response on 3 of them, the probability of the response will be 30% ($3/10\times100$). 
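
The arithmetic above can be checked on a handful of made-up trials. This is only a plain-Python sketch with hypothetical `(depth, response)` pairs; it is not the real data file, and it is not necessarily how you will want to do it with the loaded data:

```python
# Hypothetical toy trials as (depth, response) pairs; not the real data file
trials = [(50, 1)] * 3 + [(50, 0)] * 7 + [(100, 1)] * 4 + [(100, 0)]

n_trials = {}     # number of trials at each depth
n_responses = {}  # number of trials with a response at each depth
for depth, response in trials:
    n_trials[depth] = n_trials.get(depth, 0) + 1
    n_responses[depth] = n_responses.get(depth, 0) + response

# Percent of trials with a response at each depth
percent = {depth: 100 * n_responses[depth] / n_trials[depth] for depth in n_trials}
print(percent)  # {50: 30.0, 100: 80.0}
```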

**Before we get started:**
In the past, you might have used the `plot` method available on DataFrames, or you might have called the `plt.plot` function. This time we're going to do things a little bit differently. You're probably familiar with the following approach:

    # Don't do this!!!
    plt.plot(...)
    plt.xlabel(...)
    plt.ylabel(...)
    
Here, Matplotlib is automatically creating the figure and axes (i.e., the canvas on which the plot is shown). However, this is not the recommended approach for using Matplotlib. Instead, we are going to explicitly create the figure and axes. As you will see later in the exercise, this gives you quite a bit of flexibility in laying out your figures (and gets you a step closer to uninstalling Adobe Illu\\$trator and Micro\\$oft Powerpoint).

    # Do this!!!
    figure, ax = plt.subplots(1, 1)
    ax.plot(...)
    ax.set_xlabel(...)
    ax.set_ylabel(...)
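
Filled in with made-up numbers (the values and axis labels here are placeholders, not the class data), the recommended pattern might look like this:

```python
import matplotlib.pyplot as plt

# Made-up values, only for illustration
x = [0, 25, 50, 100]
y = [5, 20, 60, 95]

# Explicitly create the figure and a single axes, then draw on that axes
figure, ax = plt.subplots(1, 1)
ax.plot(x, y)
ax.set_xlabel('x values')
ax.set_ylabel('y values')
```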

Let's walk through this step-by-step. First, we call `plt.subplots` to create the figure and axes.

***Discussion***. We have given `plt.subplots` two arguments. What do they mean? How can you figure it out?

The function `plt.subplots` returns a two-element tuple, `(figure, axes)`. We then *unpack* this tuple into two variables. This is tuple unpacking, one of my favorite features of Python. We've discussed it before, but as a refresher:

In [None]:
def my_function():
    return ('hello', 'goodbye')

# A more verbose way of doing this
result = my_function()
a = result[0]
b = result[1]
print(f'a={a}; b={b}')

# Slightly less verbose
result = my_function()
a, b = result
print(f'a={a}; b={b}')

# Cut out the middleman altogether
a, b = my_function()
print(f'a={a}; b={b}')

Ok, back to how we're creating the figure and axes. For now, we can ignore the figure and just focus on the axes, which has been saved to the variable, `ax`. If you're a bit confused about terminology:

![image.png](attachment:image.png)

* **figure**: A figure that can contain multiple axes (i.e. panels or subplots)
* **axes**: Analogous to a panel in a figure, can contain multiple plots (i.e., lines). This is also commonly called a `subplot` (thanks to Matlab).

***Discussion*** Going back to our original code. We've done the following:

    figure, ax = plt.subplots(1, 1)

This creates a figure with 1 row by 1 column of axes. How many axes do we have total? What does the variable `ax` look like? Is it an array, list, tuple or a scalar value? How can you tell?

In [None]:
%load "answers/answer_003.txt"

Later we'll start working with figures with multiple subplots (i.e., more than one row and/or column). At that point you'll start to appreciate why `plt.subplots` is a superior approach to `plt.plot` and `plt.subplot` (trust us for now). For now, we are working with only one subplot.

Now that we have new tools to make and manage plots let's give it a try!

For this exercise we are going to compute and plot the psychometric function. We want to plot the probability of a **response** (as a percent) for a given **depth**. Therefore, the x-axis will be modulation **depth** and the y-axis will be the percent of trials with a **response**. Let's go ahead and try it. You have several steps:

* For each modulation depth you will need to compute the percent of trials with a response.
* Next you will plot the result. Make sure that it is a scatterplot connected by lines. How do you specify that you want both markers and lines?
* Finally, label the x and y axes appropriately.

You'll know you got it right when your plot looks like this:

![image.png](attachment:image.png)

***Discussion*** How might we tackle the above steps? Ideas?

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:
1. Add a grid to the plot.
2. Make the top and right border invisible. [Hint: Let me Google this for you](https://lmgtfy.com/?q=how+do+i+hide+the+top+and+right+border+in+matplotlib).
3. Change the color of the plot to your favorite [HTML color name](https://www.w3schools.com/colors/colors_names.asp).

In [None]:
figure, ax = plt.subplots(1, 1)

%load "answers/answer_004.txt"

Ok, this is great, but there's one concern. In these experiments, we don't have the same number of trials at each modulation depth. So, this information would be useful to include on the plot. We can do this by creating a `scatter` plot where the size of each marker in the plot can vary.

In [None]:
# x-values
x = [1,    2,    3,    4,    5]
# y-values
y = [0.1,  0.2,  0.5,  0.8,  1.0]
# size of marker
n = [100,  50,   500,  1000, 100]

# Create the axes
figure, ax = plt.subplots(1, 1)
# Plot a line connecting the points
ax.plot(x, y, '-')
# Plot the points. s is used to set the size of the markers.
ax.scatter(x, y, s=n);

***Discussion***: How might we calculate the number of trials per modulation depth?

Ok, go ahead and make it so that `n` contains the answer (to double-check your answer, there are 85 trials for a modulation depth of 0% and 2 at a modulation depth of 70%).

In [None]:
%load "answers/answer_005.txt"

***Discussion***: What's the datatype of `n`?

***Discussion***: We've worked with `scatter` before. Is it a smart function? Does it know how to deal with things like Pandas objects? If not, what do we have to do?

Now, make the `scatter` plot. You'll know you got it right if your plot looks like this:

![image.png](attachment:image.png)

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:
1. Figure out how to set the `edgecolors` of the scatter points to `coral`. Try playing with `linewidths` as well to change the appearance of the plot to taste. Start with `linewidths=2` and go from there.
2. Set the axes limits so that both the x and y axes go from -10 to 110% (if we set it to the range 0 to 100%, it clips some of the plot).
3. Hide the top and right spines and create a grid.
4. I don't like the automatically-chosen tick spacing of 20\%. I'd rather have a tick spacing of 25\%.

In [None]:
x_values = mean_response.index.values
y_values = mean_response.values

# We're multiplying by 10 because this helps emphasize differences between the points.
size = n.values*10

%load "answers/answer_006.txt"

<a id="472ca876-af48-4984-b6d3-20c952550c51"></a>
# String formatting
<a href="#Overview">Return to overview</a>

<a id="bbac55dc-ae84-4d60-925e-091839404daa"></a>
## Exercise 3: using string formatting to define `filename`
<a href="#Overview">Return to overview</a>
Now, let's take a closer look at the filename. It has four components that tell you something about the experiment:
* Subject number
* Date of session in YYYYMMDD format
* Stimulus level in dB SPL
* Stimulus frequency in Hz

The format of the filename is:

    data/S[subject]_[date]_[level]dBSPL_[frequency]Hz.dat
    
That's a very elegantly formatted filename. A bit too elegant to have been created by a scientist. We're going to want to be able to inspect how our subjects are doing on a per-session basis. This is important whenever doing psychoacoustics work. You want to track how your subjects do throughout the multiple weeks of training and testing to make sure they're not getting tired of the task.

First, let's segue into our [tutorial from a few weeks ago on string formatting](../200128_supplement/). Go ahead and open it up.
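
As a quick refresher, here is a minimal f-string sketch. It deliberately builds a plot title rather than the filename, and the variable names and values are placeholders:

```python
# Placeholder values, only for illustration
animal = 7
gain = 12.5

# Expressions inside {} are replaced by their values; :.2f formats with two decimals
title = f'Animal {animal}, gain {gain:.2f} dB'
print(title)  # Animal 7, gain 12.50 dB
```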

Ok, now that you're experts in string formatting, write the code that takes the following variables and computes the formatted `filename`. Once you've done it properly, you should get `data/S1_20191122_60dBSPL_2840Hz.dat`.

In [None]:
subject = 1
date = '20191122'
level = 60
frequency = 2840

%load "answers/answer_007.txt"

<a id="7e69be7f-2182-4a64-817f-c287ef53fec8"></a>
# Yay functions!
<a href="#Overview">Return to overview</a>

<a id="7ff42e34-c432-486e-95b6-5148d2cc719a"></a>
## Exercise 4: create a function to load data from a designated file
<a href="#Overview">Return to overview</a>
Now, we are going to write a function with four parameters that computes the filename, loads the data from the file and returns it. What are the steps to creating a function?

* Start with `def` followed by the name of your function. 
* Then, in parentheses, specify the list of parameters it accepts. Parameters must be separated by a comma.
* Then, starting with the next line, indent the code belonging to the function by a tab or four spaces.
* Most of the code inside the function can be cut-and-pasted from previous answers. You need to perform three steps in the function:
    * Compute the filename
    * Load data from the file
    * `return` the result
* Once you've written your function, call it with `load_data(1, '20191122', 60, 2840)`.

Here's a template you can cut-and-paste:

    def load_data(subject, date, level, frequency):
        # All code indented is part of the function
        ...
        
    # This code is not part of the function because it's not indented
    data = load_data(3, '20191127', 20, 2840)
    data.shape
    
Is the shape `(197, 2)`? Congratulations. You've got it.

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:
1. Since we want to manually inspect our data, we will be doing a lot of typing. However, we are primarily interested in performance for a level of 20 dB SPL and frequency of 2840 Hz. So, let's make those the *default* values for those parameters. That will allow us to type `load_data(3, '20191127')` instead of `load_data(3, '20191127', 20, 2840)`. Go ahead and do it.
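
If default values are new to you, here is a generic sketch using a made-up function (`describe_stimulus` is hypothetical and unrelated to `load_data`). Parameters with defaults must come after those without:

```python
def describe_stimulus(name, level=20, frequency=2840):
    # level and frequency fall back to their defaults when omitted
    return f'{name}: {level} dB SPL at {frequency} Hz'

print(describe_stimulus('tone'))                  # uses both defaults
print(describe_stimulus('tone', 60))              # overrides level by position
print(describe_stimulus('tone', frequency=5680))  # overrides frequency by keyword
```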

In [None]:
%load "answers/answer_008.txt"

Now, write a function, `plot_data`, that takes two parameters, `data` and `ax`, and plots the data on the provided axes. The function should incorporate the code we used to create our `scatter` plot above. You can cut and paste the following template:

    # Define function here
    ...
    
    data = load_data(0, '20191122', 40, 11360)
    figure, ax = plt.subplots(1, 1)
    plot_data(data, ax)
    
You'll know you got it right when your plot looks like this:

![image.png](attachment:image.png)

It looks different because we changed the file you're plotting for this exercise (it's a good way to catch mistakes). You don't need to implement all the bonus steps described above if you don't have time. Just make sure that the X and Y values match.

In [None]:
%load "answers/answer_009.txt"

<a id="3fdde124-cd71-48b0-b5e2-b5ddc83051ba"></a>
# Creating multiple plots
<a href="#Overview">Return to overview</a>

Remember how we promised that `plt.subplots` is a superior approach? You didn't believe us, right? Well, hopefully we can convince you now. Let's start by creating a two by three grid of subplots using `plt.subplots`. Think back to our earlier discussion. What are the first two parameters of `plt.subplots`? Ok, quick, just write out the answer. All you need is *one* line of code which calls *one* function to create the two by three grid of subplots. Not sure which function to use? Hint: I've mentioned it *three* times in this paragraph.

In [None]:
%load "answers/answer_010.txt"

We're used to seeing:

    figure, ax = plt.subplots(1, 1)
    
But, I decided to call it `axes` in the answer above. Why is that? What does `axes` look like? How can we figure it out?

In [None]:
%load "answers/answer_011.txt"

Aha! It's no longer a scalar value (i.e., a single AxesSubplot object). That's why I decided to call it `axes` instead of `ax` (i.e., it's a reminder to me that I'm working with an array of axes instead of a single axes).

***Discussion*** Let's take a closer look at `axes`. What type of attributes can we look at? Let's take a look at them.

In [None]:
%load "answers/answer_012.txt"

<a id="b07f0ca7-3722-4705-9211-a61a53bdeb21"></a>
## Exercise 5: plotting on a single subplot from an array of subplots
<a href="#Overview">Return to overview</a>

Let's plot the data from subject 0 for session `'20191126'` on the axes in the first row, second column. Remember we wrote `plot_data` so it takes both the data and the axes you want to plot on. So, you need to figure out how to extract the `ax` you want to plot on from the `axes` array. Hint: Python has zero-based indexing (e.g., 0 is the first element, 1 is the second, etc.). Once you're done, your answer will look like this:

![image.png](attachment:image.png)
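
The zero-based row and column indexing from the hint can be tried on a plain Numpy array first. This sketch uses a made-up array of labels standing in for a grid of subplots:

```python
import numpy as np

# A 2x3 array of labels standing in for a 2x3 grid of subplots
grid = np.array([['r0c0', 'r0c1', 'r0c2'],
                 ['r1c0', 'r1c1', 'r1c2']])

print(grid.shape)   # (2, 3): two rows, three columns
print(grid[0, 0])   # first row, first column
print(grid[1, 2])   # second row, third column
```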

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:

1. Whoa. The axes are overlapping. There's not enough space in between. One easy way to fix this is to make the figure a little bigger and **tight**en up the **layout** (hint). Fortunately, `plt.subplots` takes an argument that allows you to specify the size of the figure. Figure out that argument and then figure out the method you need to call *at the end of your code* to tighten up the layout (there are two steps).

In [None]:
data = load_data(0, '20191126')
figure, axes = plt.subplots(2, 3)

%load "answers/answer_013.txt"

<a id="20de2154-8f0b-4383-900b-6d1b92dc8380"></a>
# Iterating through files with for loops
<a href="#Overview">Return to overview</a>

We've talked about `for` loops a bit. Let's review! The structure of a `for` loop is:

    for variable in iterable:
        # do something with variable
        
***Discussion*** What is an **iterable**? It's an object that has multiple elements. What are some objects that have multiple elements?

Great, now let's look at a basic `for` loop. Before we run it, how many lines do we expect it to print (there will be one line each time it loops)?

In [None]:
my_iterable = ['a', 'b', 'c']

for my_variable in my_iterable:
    print(f'my_variable = {my_variable}')

What if `my_iterable` is a tuple instead of a list? Does that change things?

In [None]:
my_iterable = ('a', 'b', 'c')

for my_variable in my_iterable:
    print(f'my_variable = {my_variable}')

What about this? No cheating! Answer before you run it.

In [None]:
my_iterable = 'abc'

for my_variable in my_iterable:
    print(f'my_variable = {my_variable}')

Many, many things in Python are **iterables**. This includes Numpy arrays.

In [None]:
import numpy as np
my_iterable = np.array(['a', 'b', 'c'])

for my_variable in my_iterable:
    print(f'my_variable = {my_variable}')

One more thing. Remember how to use `enumerate` to count the number of loops?

In [None]:
my_iterable = ['a', 'b', 'c']

for i, my_variable in enumerate(my_iterable):
    print(f'loop {i}: my_variable = {my_variable}')

Remember the following:
* Python uses zero-based indexing, which is why `i` is zero on the first loop.
* On each loop, `enumerate` produces a two-element tuple. The first value is the loop counter, the second value is the corresponding element from `my_iterable`.
* We are using tuple-unpacking to unpack the return value of `enumerate` into two variables, `i` and `my_variable`.

This is **very** common syntax in Python. You'll encounter it quite a bit so it's important to be comfortable with tuple-unpacking, `for` loops and `enumerate`.
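
To see those two-element tuples directly, you can convert the result of `enumerate` to a list. A small sketch:

```python
my_iterable = ['a', 'b', 'c']

# Converting to a list shows the (counter, element) tuple produced on each loop
pairs = list(enumerate(my_iterable))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]

# Unpacking one of those tuples by hand, just as the for loop does
i, my_variable = pairs[0]
print(i, my_variable)  # 0 a
```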

<a id="5b78a607-f68a-41b3-b6fe-2bbf2ab5fc5e"></a>
## Exercise 6: using loops to make multiple subplots
<a href="#Overview">Return to overview</a>

Now that we know how to:
* Load our data for any subject and any session,
* Plot that data on a single axes and
* Create a 2x3 grid of subplots,

We could do the following:

In [None]:
figure, axes = plt.subplots(2, 3, figsize=(10, 7))

data = load_data(0, '20191122')
ax = axes[0, 0]
plot_data(data, ax)

data = load_data(0, '20191125')
ax = axes[0, 1]
plot_data(data, ax)

data = load_data(0, '20191126')
ax = axes[0, 2]
plot_data(data, ax)

data = load_data(1, '20191122')
ax = axes[1, 0]
plot_data(data, ax)

data = load_data(1, '20191125')
ax = axes[1, 1]
plot_data(data, ax)

data = load_data(1, '20191126')
ax = axes[1, 2]
plot_data(data, ax)

figure.tight_layout()


That's a lot of typing! There must be an easier way to do this. Can `for` loops allow us to *iterate* through a list of sessions and plot on each panel?

Let's start by creating our grid of axes and defining a list of sessions:

    figure, axes = plt.subplots(2, 3, figsize=(10, 7))
    sessions = ['20191122', '20191125', '20191126']
    
Now, we are going to plot so each column is a different session. The first row plots the three sessions for subject 0 and the second row plots the three sessions for subject 1. You can use the following as a template.

    for ... in ...:
        # Load data for subject 0 for session
        data = load_data(0, session)
        ... # pull out axes to plot
        plot_data(data, ax)
        
        # Load data for subject 1 for session
        data = load_data(1, session)
        ... # pull out axes to plot
        plot_data(data, ax)
        
Hint: you probably want to iterate through each `session` in `sessions` and use `enumerate` to generate a counter, `i`. This counter will tell you the column index. You already know the row index since subject 0 is the first row and subject 1 is the second row.

You'll know you got it right when your plot looks like the one generated by the verbose code above this cell.

In [None]:
figure, axes = plt.subplots(2, 3, figsize=(10, 7))
sessions = ['20191122', '20191125', '20191126']

%load "answers/answer_014.txt"

Finished early? Great, please put up your **green flag**, then go on to this bonus step:

We have four subjects. Let's create a 4x3 grid so that there are now four rows (one for each subject). Update the code to use nested for loops to plot each subject. Use the template:

    figure, axes = plt.subplots(4, 3, figsize=(10, 10))
    sessions = ['20191122', '20191125', '20191126']
    subjects = [0, 1, 2, 3]
    for ... in enumerate(subjects):
        for ... in enumerate(sessions):
            data = load_data(subject, session)
            ... # code to select axes
            plot_data(data, ax)
    plt.tight_layout()
    
You'll know you got it right if the image looks like this:

![image.png](attachment:image.png)
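
If nested loops with `enumerate` feel unfamiliar, this toy sketch (made-up lists, no plotting) shows how the two counters vary:

```python
rows = ['top', 'bottom']
columns = ['left', 'middle', 'right']

visited = []
for i, row in enumerate(rows):            # outer loop: runs once per row
    for j, column in enumerate(columns):  # inner loop: runs fully for each row
        visited.append((i, j, row, column))
        print(f'i={i}, j={j}: {row}-{column}')
```

The inner counter `j` restarts at zero every time the outer counter `i` advances.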

In [None]:
%load "answers/answer_015.txt"

Still have some time? Great, go on to another bonus step. We have three stimulus frequencies for each session. They're saved in different files. By default, we set up the `load_data` function to load 2840 Hz if the frequency isn't specified. Let's modify your answer from the previous bonus exercise. First, define the following:

    figure, axes = plt.subplots(4, 3, figsize=(10, 10))
    sessions = ['20191122', '20191125', '20191126']
    subjects = [0, 1, 2, 3]
    
Now, add a **third** `for` loop that plots each frequency:

    for ... in ...:
        for ... in ...:
            for ... in ...:
                data = load_data(subject, session, frequency=frequency)
                ... # code to select axes
                plot_data(data, ax)
    plt.tight_layout()
    
You'll know you got it right if the image looks like this:

![image.png](attachment:image.png)

In [None]:
%load "answers/answer_016.txt"