# Overview
We will be analyzing data from a psychoacoustical task in which the subjects (gerbils) listen to a five-second tone that may be amplitude-modulated. After listening to the tone, they must provide a response if they think it was amplitude-modulated. The difficulty of the task is varied by changing the degree of amplitude modulation (the modulation depth) from 0% (no modulation) to 100% (maximum modulation).

TODO:
* Add wav file illustrating stimulus
* Add image schematic for stimulus

# Goal
When running experiments, it's important to be able to inspect the data you are collecting as it comes in to ensure that there are no problems. Today we are going to write functions that allow us to load and plot data for individual sessions from individual animals. 

TODO:
* update as we add exercises

## Learning goals
* How to inspect a data file and figure out how to load it into Python
* String formatting
* Functions

# Getting started
First, let's get the boring stuff out of the way. 

## Why import?
As an aside (not for discussion during class unless you have questions): if you're a Matlab user, you might think that the need to explicitly specify imports is a major disadvantage of Python. However, there are two advantages to specifying imports. First, you can reuse function names as long as they live in different modules. For example, there are several `log` functions available:

    from math import log
    from numpy import log
   
The first one (available in `math`, which is always bundled with the core Python distribution) isn't smart enough to work with Numpy arrays or Pandas dataframes. The second one (available in `numpy`, which is a third-party library) is. So, you'd just import the one that you wanted to use (if you run both lines, the second import replaces the first). In Matlab, by contrast, there's no good way to say which one you want to use. Instead, the functions would likely be named `log` and `numpy_log` to avoid *name collisions*.
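If you want both `log` functions available side by side, one option (a minimal sketch, not something this lesson requires) is to import the modules themselves and use the module name as a prefix. The `np` alias is just a common convention.

```python
import math
import numpy as np

# Both functions are named `log`, but the module prefix says which one we mean.
scalar_result = math.log(100)                         # natural log of one number
array_result = np.log(np.array([1.0, 10.0, 100.0]))   # element-wise natural log

print(scalar_result)
print(array_result)

# math.log cannot handle an array with several elements; it raises a TypeError.
try:
    math.log(np.array([1.0, 10.0, 100.0]))
except TypeError as error:
    print('math.log failed:', error)
```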

The second advantage of specifying imports is speed. Matlab must load **every** single installed library and toolbox before it is ready to run your code, which can be quite slow if you have a lot of third-party libraries and/or toolboxes installed. In contrast, Python only loads the modules you specify using `import`.

In [None]:
# We have some missing imports. We can't be making today's exercise too easy.
from pathlib import Path
import matplotlib.pyplot as plt

# Loading the data

## Exercise - inspecting our data
The data is split into multiple files (one file per session). Your first job is to figure out how to read in one file and plot it. You have already learned the function that can be used to read in the file, but you may (hint) need to specify values for certain function parameters that you have not used in the past. As part of the challenge, you will need to figure out what library to import.

Ready for the challenge? Your assignment is to inspect the file [data/S0_20191122_20dBSPL_11360Hz.dat](data/S0_20191122_20dBSPL_11360Hz.dat) to determine how to load it. Go ahead and open it in your browser (if you're lucky, clicking on the filename above will work).

***Discussion time***: Does everyone have it open? Great, now take a few minutes to look through it and discuss as a class. Tell us what you notice about it. What is the format? What library do you think will work best for reading this format?

Good, now that you think you know what to do, go ahead and make it so. Load the file into `data`. You'll know you did it correctly when `data.shape` is `(178, 2)`.

In [None]:
filename = 'data/S0_20191122_20dBSPL_11360Hz.dat'

# Answer
import pandas as pd
data = pd.read_csv(filename, comment=';')
data.shape

Now, look at the first few rows of data. How do you do that?

In [None]:
# Answer
data.head()

Now, we want to compute and plot the psychometric function. The x-axis will be modulation **depth** and the y-axis will be the percent of trials with a **response**. You've done this before. Go ahead and do it! You have several steps:
* For each modulation depth, compute the percent of trials with a response.
* Plot the result. Make sure that it is a scatterplot connected by lines. How do you specify that you want both markers and lines?
* Label the x and y axes appropriately.

Good to know:
We're going to start using a different approach to plotting. In the past, you might have used the `plot` method available on DataFrames, or you might have called the `plt.plot` function. However, we are going to use the recommended approach in Matplotlib for the solution. Instead of having Matplotlib implicitly generate the axes (i.e., the canvas on which the plot is shown):

    # Don't do this!!!
    plt.plot(...)
    plt.xlabel(...)
    plt.ylabel(...)
    
You will explicitly create it yourself and interact directly with the axes. *Note that the way you **set** the axes labels is a bit different* using this approach:

    # Do this!!!
    ax = plt.subplot(111)
    ax.plot(...)
    ax.set_xlabel(...)
    ax.set_ylabel(...)

This is a superior approach to `plt.plot` because it allows you to juggle multiple axes in the same figure. So, get in the habit of using this approach. If you're a bit confused about terminology:

* **figure**: The top-level container (the whole window or image), which can contain multiple axes (i.e., panels)
* **axes**: Analogous to a panel in a figure, can contain multiple plots (i.e., lines)
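To make the distinction concrete, here is a small sketch that creates one figure holding two axes and draws on each one separately. It uses `plt.subplots` (a convenience function that returns the figure and its axes together) rather than `plt.subplot(111)`, and the numbers are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# One figure containing two axes (panels) arranged side by side.
figure, (left_ax, right_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Made-up values, for illustration only.
x = [0, 25, 50, 75, 100]

# Each axes has its own plots and its own labels.
left_ax.plot(x, [0, 10, 40, 80, 100], 'o-')
left_ax.set_xlabel('x (left panel)')

right_ax.plot(x, [100, 80, 40, 10, 0], 's--')
right_ax.set_xlabel('x (right panel)')

# The figure keeps track of the axes it contains.
print(len(figure.axes))
```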

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:
* Add a grid to the plot
* Make the top and right border invisible. [Hint: Let me Google this for you](https://lmgtfy.com/?q=how+do+i+hide+the+top+and+right+border+in+matplotlib).

In [None]:
ax = plt.subplot(111)

# Answer
mean_response = data.groupby('depth')['response'].mean() * 100
ax.plot(mean_response, 'o-')
ax.set_xlabel('Modulation depth (%)')
ax.set_ylabel('Percent response (%)')
ax.grid()

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

Ok, this is great, but there's one concern. In these experiments, we don't have the same number of trials at each modulation depth. So, this information would be useful to include on the plot.

***Discussion time***: How might we calculate the number of trials per modulation depth?

Ok, go ahead and make it so that `n` contains the answer.

In [None]:
# Answer
n = data.groupby('depth').size()
n

***Discussion point***: What's the datatype of `n`?
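If you want to check your intuition on something smaller, the sketch below builds a tiny made-up table and inspects what `groupby(...).size()` returns. The values are hypothetical; only the column names `depth` and `response` match our data file.

```python
import pandas as pd

# Hypothetical trial log: one row per trial.
toy = pd.DataFrame({
    'depth': [0, 0, 0, 50, 50, 100],
    'response': [0, 0, 1, 1, 0, 1],
})

toy_n = toy.groupby('depth').size()

print(type(toy_n))          # what kind of object did we get back?
print(toy_n.index.values)   # the unique depths
print(toy_n.values)         # number of trials at each depth
```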

Now, make the `scatter` plot. Hint: there are some great examples on the Matplotlib website. Need me to Google it for you? Didn't think so. But remember, you might find an example that uses `plt.scatter` instead of the NEUS642-sanctioned approach of `ax.scatter`. No cheating by using `plt.scatter`.

***Discussion point***: We've worked with `scatter` before. Is it a smart function? Does it know how to deal with things like Pandas Series? If not, what do we have to do?

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:
* Figure out how to set the `edgecolors` of the scatter points to `coral`. Try playing with `linewidths` as well to change the appearance of the plot to taste. Start with `linewidths=2` and go from there.
* Set the axes limits so that both the x and y axes go from -10 to 110% (if we set it to the range 0 to 100%, it clips some of the plot).

In [None]:
x_values = mean_response.index.values
y_values = mean_response.values

# We're multiplying by 10 because this helps emphasize differences between the points.
size = n.values*10

# Answer
ax = plt.subplot(111)
ax.plot(x_values, y_values, '-')
ax.scatter(x_values, y_values, size, edgecolors='coral', linewidths=3)
ax.set_xlabel('Modulation depth (%)')
ax.set_ylabel('Percent response (%)')
ax.axis(xmin=-10, xmax=110, ymin=-10, ymax=110)

Now, let's take a closer look at the filename. It has four components that tell you something about the experiment:
* Subject number
* Date of experiment in YYYYMMDD format
* Stimulus level in dB SPL
* Stimulus frequency in Hz

The format of the filename is:

    data/S[subject]_[date]_[level]dBSPL_[frequency]Hz.dat
    
That's a very elegantly formatted filename. A bit too elegant to have been created by a scientist. We're going to want to be able to inspect how our subjects are doing on a per-session basis. This is important whenever doing animal psychoacoustics work. You want to track how your animals do throughout the multiple weeks of training and testing to make sure they're not getting tired of the task.

First, let's segue into our [tutorial from a few weeks ago on string formatting](../200128_supplement/). Go ahead and open it up.

Ok, now that you're experts in string formatting, write the code that takes the following variables and computes the formatted `filename`. Once you've done it properly, you should get `data/S1_20191122_60dBSPL_2840Hz.dat`.

In [None]:
subject = 1
date = '20191122'
level = 60
frequency = 2840

# Answer

## Option 1: I'm a crotchety Matlab programmer 
filename = 'data/S' + str(subject) + '_' + date + '_' + str(level) + 'dBSPL_' + str(frequency) + 'Hz.dat'

## Option 2: There's something cool about retro code
filename = 'data/S{}_{}_{}dBSPL_{}Hz.dat'.format(subject, date, level, frequency)

## Option 3: I'm a sleek, modern coder
filename = f'data/S{subject}_{date}_{level}dBSPL_{frequency}Hz.dat'
filename

Now, we are going to write a function with four parameters that computes the filename, loads the data from the file and returns it. What are the steps to creating a function?

* Start with `def` followed by the name of your function. 
* Then, in parentheses, specify the list of parameters it accepts. Parameters must be separated by a comma.
* Then, starting with the next line, indent your code by a tab or four spaces.
* Most of the code inside the function can be cut-and-pasted from previous answers.
* Don't forget to `return` the result.
* Once you've written your function, call it with `load_data(3, '20191127', 20, 2840)`.

Here's a template you can cut-and-paste:

    def load_data(subject, date, level, frequency):
        # All code indented is part of the function
        ...
        
    # This code is not part of the function because it's not indented
    data = load_data(3, '20191127', 20, 2840)
    data.shape
    
Is the shape `(197, 2)`? Congratulations. You've got it.

Finished early? Great, please put up your **green flag**, then go on to the bonus steps:
* Since we want to manually inspect our data, we will be doing a lot of typing. However, we are primarily interested in performance for a level of 20 dB SPL and frequency of 2840 Hz. So, let's make those the *default* values for those parameters. That will allow us to type `load_data(3, '20191127')` instead of `load_data(3, '20191127', 20, 2840)`. Go ahead and do it.
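As a reminder of how default values behave, here is a minimal sketch using a hypothetical helper, `describe_session` (it is not part of this lesson's code). Parameters with defaults must come after the parameters without them.

```python
def describe_session(subject, date, level=20, frequency=2840):
    # `level` and `frequency` fall back to their defaults when omitted.
    return f'S{subject} on {date}: {level} dB SPL, {frequency} Hz'


print(describe_session(3, '20191127'))                   # uses both defaults
print(describe_session(3, '20191127', 60))               # overrides level only
print(describe_session(3, '20191127', frequency=11360))  # override by keyword
```

Passing `frequency=11360` by keyword lets you change the last parameter without having to spell out `level` as well.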

In [None]:
# Answer
def load_data(subject, date, level=20, frequency=2840):
    filename = f'data/S{subject}_{date}_{level}dBSPL_{frequency}Hz.dat'
    data = pd.read_csv(filename, comment=';')
    return data

data = load_data(3, '20191127', 20, 2840)
print(data.shape)

data = load_data(3, '20191127')
print(data.shape)

Now, write a function, `plot_data`, that takes two parameters, `axes` and `data`, and plots the data on the provided axes. Use the plotting code we created for the `scatter` plot.
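If you get stuck, here is one possible shape for such a function. Treat it as a sketch rather than the official answer: it assumes `data` has the columns `depth` and `response`, it sets the axes limits with `set_xlim` and `set_ylim`, and it runs on a made-up table so that it works without the data files.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_data(axes, data):
    # Assumes `data` has the columns 'depth' and 'response'.
    grouped = data.groupby('depth')['response']
    mean_response = grouped.mean() * 100   # percent of trials with a response
    n = grouped.size()                     # number of trials per depth

    # scatter wants plain arrays, so pull them out of the Series.
    x_values = mean_response.index.values
    y_values = mean_response.values

    axes.plot(x_values, y_values, '-')
    # Marker size scales with the number of trials (same factor of 10 as before).
    axes.scatter(x_values, y_values, n.values * 10,
                 edgecolors='coral', linewidths=3)
    axes.set_xlabel('Modulation depth (%)')
    axes.set_ylabel('Percent response (%)')
    axes.set_xlim(-10, 110)
    axes.set_ylim(-10, 110)


# Made-up data, for illustration only.
toy = pd.DataFrame({'depth': [0, 0, 50, 50, 100, 100],
                    'response': [0, 0, 1, 0, 1, 1]})
ax = plt.subplot(111)
plot_data(ax, toy)
```

Because the function receives the axes as a parameter instead of creating them, the same function can later draw into any panel of a multi-panel figure.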