<img src="../resources/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Visual Behavior Neuropixels Dataset Exercises</h1> 
<h2 align="center">Summer Workshop on the Dynamic Brain</h2> 

```python
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
```

```python
import getpass
import platform

platstring = platform.platform()

data_dirname = 'visual-behavior-neuropixels'
use_static = False
if 'Darwin' in platstring or 'macOS' in platstring:
    # macOS
    data_root = "/Volumes/Brain2022/"
elif 'Windows' in platstring:
    # Windows (replace with the drive letter of the USB drive)
    data_root = "E:/"
elif 'amzn' in platstring:
    # then on AWS
    data_root = "/data/"
    data_dirname = 'visual-behavior-neuropixels-data'
    use_static = True
else:
    # then your own Linux platform
    # EDIT location where you mounted the hard drive
    # ("$USERNAME" is not expanded inside a Python string, so build the path explicitly)
    data_root = f"/media/{getpass.getuser()}/Brain2022/"
```

```python
from allensdk.brain_observatory.behavior.behavior_project_cache.\
    behavior_neuropixels_project_cache \
    import VisualBehaviorNeuropixelsProjectCache

# this path should point to the location of the dataset on your platform
cache_dir = os.path.join(data_root, data_dirname)

cache = VisualBehaviorNeuropixelsProjectCache.from_local_cache(
            cache_dir=cache_dir, use_static_cache=use_static)
```

<div class="alert alert-block alert-success">

## Exercise 1: Playing with the trials table and examining licking behavior

In this starter exercise, we'll practise working with the trials table and examine the licking behavior of one of the mice in the dataset. For this exercise, there are comments with detailed prompts that act as guiderails.
    
The tasks we'll undertake are:

1. Creating a new column for trial "type" (hit, miss, false alarm, etc.) in the trials table and plotting the number of trials of each type.
1. Comparing lick times (measured from the start of each trial) for hit trials and aborted trials
1. Comparing lick latency from the preceding stimulus flash for hit, aborted and false alarm trials
1. Computing lick bouts
    
**Note:** With all of these exercises, there are multiple ways of accomplishing the same end goal. For each of the tasks in Exercise 1, we've provided prompts that take you through one logical sequence of steps to complete these analyses (which we think is simple and teaches useful concepts). The objective is to give you a relatively easy algorithm you can follow, so that you can first focus on getting used to writing pandas code. But feel free to try completing the task objectives using your own algorithm first, and consulting our prompts if you get stuck.

</div>

```python
# Setup: first, let's get the relevant tables
session_id = 1065437523  # This is a good session for looking at lick behavior
session = cache.get_ecephys_session(ecephys_session_id=session_id)

stimulus_presentations = session.stimulus_presentations
trials = session.trials
licks = session.licks
```

<div class="alert alert-block alert-success">

### 1.1 Creating a new column for trial "type" in the trials table, and plotting the number of trials of each type

The different types of trials are: `hit`, `miss`, `false_alarm`, `correct_reject`, `aborted`, and `auto_rewarded`. Each of these trial types has a separate boolean column in the `trials` table indicating whether a particular trial is of that type. For any given trial, only one of these columns will be True, and the rest will be False.
    
We are going to convert these boolean columns into a new column in the `trials` table called `trial_type`, which contains the trial type as a string. Then, we'll create a bar plot indicating how many trials there are of each type.
    
If you want to try coming up with the logic for this on your own (i.e., without following the prompts), then this is a good point to pause, think and/or try coding.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.1.a:** Create a new column called `trial_type` filled with NaNs.
    
</div>

<div class="alert alert-block alert-success">

**Prompt 1.1.b:** Select the hit trials (rows of the trials table where the column `hit` is True), and set the `trial_type` column for those rows to be equal to the string `'hit'`.

</div>

<div class="alert alert-block alert-success">

We'll now repeat this action for all trial types using a for loop.

**Prompt 1.1.c:** Create a Python list with the names of all the different trial types (no harm in including `'hit'` again). Then, loop over this list and, for each trial type, select the rows of the trials table of that type and set their `trial_type` column to the name of the type.

</div>
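A minimal sketch of prompts 1.1.a to 1.1.c, run on a small hypothetical trials table instead of `session.trials` so that it works on its own. One choice worth flagging: the NaN column is created with `object` dtype, because assigning strings into a float column of NaNs raises an incompatible-dtype FutureWarning in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for session.trials: one boolean column per
# trial type, with exactly one True value in each row
trial_types = ['hit', 'miss', 'false_alarm', 'correct_reject',
               'aborted', 'auto_rewarded']
labels = ['aborted', 'hit', 'miss', 'aborted', 'false_alarm',
          'correct_reject', 'hit', 'auto_rewarded']
trials = pd.DataFrame({t: [label == t for label in labels] for t in trial_types})

# 1.1.a: a column of NaNs (object dtype so it can later hold strings)
trials['trial_type'] = pd.Series(np.nan, index=trials.index, dtype=object)

# 1.1.b and 1.1.c: label the rows where each boolean column is True
for trial_type in trial_types:
    trials.loc[trials[trial_type], 'trial_type'] = trial_type

print(trials['trial_type'].tolist())
```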

<div class="alert alert-block alert-success">

**Prompt 1.1.d:** Examine the `trials` table to check that this worked.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.1.e:** Now, use the `value_counts` function on the `trial_type` column of the `trials` data frame to get the number of rows of each trial type. Rename the output Series to `'trial_type_tally'`, which is more descriptive than the default name.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.1.f:** Make the index of this series a new column using `reset_index`, and rename the new column to `trial_type`.

</div>
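A sketch of prompts 1.1.e and 1.1.f on a hypothetical `trial_type` column. The name of the column that `reset_index` creates depends on the pandas version (`index` in pandas 1.x; in pandas 2.x, `value_counts` names the index after the original column), so the sketch calls `rename_axis` first to get `trial_type` in either case.

```python
import pandas as pd

# Hypothetical trial_type column, standing in for the one built in 1.1.c
trials = pd.DataFrame({'trial_type': ['aborted', 'hit', 'aborted', 'miss',
                                      'aborted', 'hit', 'false_alarm']})

# 1.1.e: count rows per trial type and give the Series a descriptive name
tally = trials['trial_type'].value_counts().rename('trial_type_tally')

# 1.1.f: name the index 'trial_type', then move it into a column
tally = tally.rename_axis('trial_type').reset_index()
print(tally)
```

The resulting columns can be passed straight to `plt.barh` for prompt 1.1.g, e.g. `plt.barh(y=tally.index, width=tally['trial_type_tally'], tick_label=tally['trial_type'])`.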

<div class="alert alert-block alert-success">

**Prompt 1.1.g:** Finally, plot these values using a horizontal bar plot. When using matplotlib, you can use the `plt.barh` function: supply the `y`, `width` and `tick_label` parameters. Give the plot a meaningful title.
    
</div>

<div class="alert alert-block alert-success">

### 1.2 Comparing lick times for hit trials and aborted trials

Next, we'll compute the time between the start of the trial and the point at which the mouse licked, and compare this time interval across aborted trials and hit trials. To do this, we'll first create a new column in the `trials` table containing the aforementioned time interval, and then plot histograms of the time interval for hit and aborted trials.

Since the mouse correctly waited until the stimulus changed in hit trials, the distribution of first-lick times in hit trials should closely match the distribution of change times (plus a short reaction time). In aborted trials, on the other hand, the mouse licked before the change, so first-lick times should be shorter overall; the exact shape of that distribution depends on when the mouse tended to lick.

> **Note**: Do _not_ use the 'licks' _column_ from the `trials` table to get the times at which licks occurred. Use information from the `licks` _table_ instead. This is because the `trials` table contains lick timestamps as recorded by the task control computer, while the `licks` table contains lick timestamps as recorded by the lick sensor; the latter is more accurate.

If you want to try coming up with the logic for this on your own (i.e., without following the prompts), then this is a good point to pause, think and/or try coding.
    
</div>

<div class="alert alert-block alert-success">

**Prompt 1.2.a:** First, find the initial lick in each trial. The `licks` table contains a `timestamps` column that has all time instants at which the mouse licked. Use numpy's `np.searchsorted` function to find, for each trial, the index of the first lick timestamp at or after the trial's `start_time`. Add these indices as a new column to the `trials` table called `first_lick_indices`.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.2.b:** Now that we have the indices of rows in the `licks` table that correspond to the first lick in each trial, we can use these indices to find the _times_ at which the first licks occurred.

But the `first_lick_indices` we have computed cannot be used directly to index the licks table, because the last trial (or the last few trials) may have no licks after their start. For such trials, `searchsorted` returns an index equal to the number of rows in the licks table, i.e., one past the last valid index (indicating that these trials started _after_ the last lick).

Examine the last few rows of the `first_lick_indices` column and you'll notice that some indices are out of range for the `licks` table. Remove these rows and create a new data frame of subselected trials, which only contains those `first_lick_indices` that correspond to valid licks.

</div>
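A sketch of prompts 1.2.a and 1.2.b on hypothetical toy `trials` and `licks` tables. One caveat the prompts don't spell out: a trial with no licks of its own (e.g., a miss) still gets a valid index, pointing at a lick from a later trial. That doesn't matter for what follows, because only hit, aborted and false alarm trials, which always contain a lick, are analyzed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables standing in for session.trials and session.licks
trials = pd.DataFrame({'start_time': [0.0, 5.0, 10.0, 15.0]})
licks = pd.DataFrame({'timestamps': [1.2, 1.3, 6.8, 11.1, 11.2]})

# 1.2.a: index of the first lick at or after each trial's start_time
# (side='left' is the default, so a lick exactly at start_time counts)
trials['first_lick_indices'] = np.searchsorted(
    licks['timestamps'].values, trials['start_time'].values)

# 1.2.b: trials starting after the last lick get len(licks), which is out of range
valid_trials = trials[trials['first_lick_indices'] < len(licks)]
print(valid_trials)
```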

<div class="alert alert-block alert-success">

**Prompt 1.2.c:** Use the `first_lick_indices` from the subselected trials to _index_ the timestamps column of the `licks` table; store the timestamps in a new pandas Series.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.2.d:** By default, this Series will inherit its index from the `licks` table. Change the index of this new Series to the index of the subselected trials. This is required to add the timestamps of the first licks back to the appropriate rows of the `trials` table.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.2.e:** Add the new Series containing the timestamps of first licks to the `trials` table, in the form of a new column.
    
</div>

<div class="alert alert-block alert-success">

**Prompt 1.2.f:** Create a new column containing the time difference between the start of the trial and the first lick.

</div>
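Prompts 1.2.c to 1.2.f, sketched on the same hypothetical toy tables (rebuilt so the block runs on its own). The column names `first_lick_time` and `time_to_first_lick` are our own choices, not names from the dataset. Positional lookup with `.iloc` matches the positions that `searchsorted` returns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables, rebuilt here so the block runs on its own
trials = pd.DataFrame({'start_time': [0.0, 5.0, 10.0, 15.0]})
licks = pd.DataFrame({'timestamps': [1.2, 1.3, 6.8, 11.1, 11.2]})
trials['first_lick_indices'] = np.searchsorted(
    licks['timestamps'].values, trials['start_time'].values)
valid_trials = trials[trials['first_lick_indices'] < len(licks)]

# 1.2.c: look up lick times by position; the result keeps the licks index
first_lick_times = licks['timestamps'].iloc[valid_trials['first_lick_indices'].values]

# 1.2.d: relabel with the subselected trials' index so it aligns with trials
first_lick_times.index = valid_trials.index

# 1.2.e: assignment aligns on the index; trials without a valid lick get NaN
trials['first_lick_time'] = first_lick_times

# 1.2.f: time from trial start to first lick
trials['time_to_first_lick'] = trials['first_lick_time'] - trials['start_time']
print(trials)
```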

<div class="alert alert-block alert-success">

**Prompt 1.2.g:** Sub-select only the hit trials and the aborted trials.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.2.h:** Plot a histogram of the time from the start of the trial to the first lick, colored by trial type.
Use bins 0.75 seconds wide, so that each bin spans one stimulus presentation cycle (a 250 ms image flash followed by a 500 ms gray screen).

For plotting, you can call matplotlib's `plt.hist` function twice, once for each trial type. Supply the data, and additionally set the `bins`, `alpha` and `label` arguments. Call `plt.legend()`, which will make use of the label arguments you supplied to create a legend. Don't forget to label your x- and y-axes.
    
</div>
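For prompts 1.2.g and 1.2.h, a sketch that builds the 0.75 s bin edges with `np.arange` and uses `np.histogram` to compute the counts that `plt.hist(..., bins=bins)` would draw for each trial type. The toy `time_to_first_lick` values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical trial types and times from trial start to first lick (s)
trials = pd.DataFrame({
    'trial_type': ['hit', 'aborted', 'aborted', 'miss', 'hit', 'aborted'],
    'time_to_first_lick': [4.9, 0.4, 1.6, np.nan, 6.1, 0.8],
})

# 1.2.g: keep only hit and aborted trials
subset = trials[trials['trial_type'].isin(['hit', 'aborted'])]

# 1.2.h: bin edges every 0.75 s, from 0 to just past the longest time
bins = np.arange(0, subset['time_to_first_lick'].max() + 0.75, 0.75)

# Per-type counts; plt.hist(group['time_to_first_lick'], bins=bins, ...) draws these
counts_by_type = {}
for trial_type, group in subset.groupby('trial_type'):
    counts, _ = np.histogram(group['time_to_first_lick'], bins=bins)
    counts_by_type[trial_type] = counts
print(counts_by_type)
```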

<div class="alert alert-block alert-success">

Does anything about this plot look unusual? What sort of distribution should we have expected in aborted trials, if the mouse licked on each stimulus presentation uniformly at random?

</div>

<div class="alert alert-block alert-success">

### 1.3 Comparing lick latency from the stimulus flash for hit trials and aborted/false alarm trials

Next, we'll compare the latency between the first lick in each trial and the _preceding stimulus flash_ (rather than the start of the trial), across hit, aborted and false alarm trials.

To get information about stimulus presentations, we'll have to look at the `stimulus_presentations` table. Each stimulus presentation (i.e., flash) occurs within a trial: we'll associate each stimulus presentation with its trial, and then use the first lick times we computed in Exercise 1.2 to find the stimulus presentation preceding each first lick.

For hit trials, we expect the licks to be stimulus-locked, occurring at a consistent delay after a flash. If this is also true for aborted trials, it could mean that the mouse is using a "timing" strategy: licking after a particular number of flashes, rather than in response to the image change.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.3.a:** First, use `searchsorted` to find, for each stimulus presentation, the index of the last trial whose `start_time` is at or before the presentation's `start_time`. Assign these indices to a new column of the stimulus presentations table called `trials_id`.

</div>
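A sketch of prompt 1.3.a on hypothetical toy tables. Using `side='right'` and subtracting 1 gives, for each presentation, the position of the last trial starting at or before it. This assumes the trials index is a plain 0 to n-1 range, so positions equal index labels; a presentation starting before the first trial would get -1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables; the trials index is a plain 0..n-1 range here
trials = pd.DataFrame({'start_time': [0.0, 5.0, 10.0]})
stimulus_presentations = pd.DataFrame(
    {'start_time': [0.0, 0.75, 1.5, 5.25, 6.0, 10.5]})

# side='right' puts a flash that starts exactly at a trial's start_time into
# that trial; subtracting 1 gives the last trial starting at or before it
stimulus_presentations['trials_id'] = np.searchsorted(
    trials['start_time'].values,
    stimulus_presentations['start_time'].values,
    side='right') - 1
print(stimulus_presentations)
```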

<div class="alert alert-block alert-success">

**Prompt 1.3.b:** Next, merge the `stimulus_presentations` table with the `trials` table using the newly created `trials_id` column.

While merging, note that both the `stimulus_presentations` table and the `trials` table have some columns of the same name, e.g., `start_time`. To distinguish these columns after merging, use the `suffixes` argument. The default suffixes are `('_x', '_y')`, but it's better to rename them to be more meaningful.

</div>
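Prompt 1.3.b sketched with `DataFrame.merge` on hypothetical toy tables. It assumes the trial number is the index of the trials table (as in the toy table), hence `right_index=True`; `how='left'` keeps every stimulus presentation, and the suffixes `'_stim'` and `'_trial'` are our own choice.

```python
import pandas as pd

# Hypothetical toy tables; the trials table is indexed by trials_id
trials = pd.DataFrame({'start_time': [0.0, 5.0],
                       'trial_type': ['aborted', 'hit']},
                      index=pd.Index([0, 1], name='trials_id'))
stimulus_presentations = pd.DataFrame({'start_time': [0.0, 0.75, 5.25],
                                       'trials_id': [0, 0, 1]})

# Match each presentation's trials_id column against the trials index;
# the shared column name (start_time) gets the suffixes instead of _x/_y
merged = stimulus_presentations.merge(
    trials, how='left', left_on='trials_id', right_index=True,
    suffixes=('_stim', '_trial'))
print(merged.columns.tolist())
```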

<div class="alert alert-block alert-success">

**Prompt 1.3.c:** Print out the columns of the merged table to check that everything is there.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.3.d:** Now that we have all the info we need, subselect the hit, false alarm and aborted trials.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.3.e:** Next, select the stimulus presentations that start before the first lick timestamp of their trial, and create a copy of the returned data frame. Calling `.copy()` after selecting avoids a `SettingWithCopyWarning` when we add new columns later.

</div>

<div class="alert alert-block alert-success">

To find the stimulus presentation *just* before the first lick timestamp, we can group the data frame (of all stimulus presentations preceding the first lick timestamp) by trial index and select the stimulus presentation with the largest stimulus presentation index.

**Prompt 1.3.f:** First, create a new column called `stim_id` to contain the stimulus presentation index (move the index to a column using `reset_index`).

</div>

<div class="alert alert-block alert-success">

**Prompt 1.3.g:** Then, group rows by `trials_id`, and find the maximum `stim_id` in each trial.
    
The resulting data frame will associate every trial with a `stim_id`, which is the last stimulus presentation preceding the first lick in that trial.

</div>
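Prompts 1.3.e to 1.3.g on a hypothetical merged table; `first_lick_time` is an assumed name for the first-lick times added to the trials table in 1.2. Because the toy index is unnamed, `reset_index` creates a column called `index`, which is renamed to `stim_id`; if your index has a name, rename that column instead.

```python
import pandas as pd

# Hypothetical merged stimulus_presentations/trials table, one row per flash
merged = pd.DataFrame({
    'start_time_stim': [0.0, 0.75, 1.5, 5.25, 6.0, 6.75],
    'trials_id': [0, 0, 0, 1, 1, 1],
    'first_lick_time': [1.2, 1.2, 1.2, 6.5, 6.5, 6.5],
})

# 1.3.e: flashes that start before their trial's first lick; .copy() avoids a
# SettingWithCopyWarning if we add columns to the selection later
before_lick = merged[merged['start_time_stim'] < merged['first_lick_time']].copy()

# 1.3.f: move the (unnamed) stimulus presentation index into a 'stim_id' column
before_lick = before_lick.reset_index().rename(columns={'index': 'stim_id'})

# 1.3.g: the last flash before the first lick has the largest stim_id in its trial
last_stim = before_lick.groupby('trials_id')['stim_id'].max()
print(last_stim)
```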

<div class="alert alert-block alert-success">

**Prompt 1.3.h:** Now, choose those rows from the merged `stimulus_presentations`-`trials` table that have these `stim_id`s.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.3.i:** Finally, compute the latency between the first lick in each trial and the start of the preceding stimulus flash. Add this to a new column.

</div>
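Prompts 1.3.h and 1.3.i on hypothetical toy data, continuing the previous sketch. `merged.loc` works here because the `stim_id` values are labels of the merged table's index. For prompt 1.3.j, grouping `preceding` by `trial_type` gives one `lick_latency` series per type to pass to `plt.hist`.

```python
import numpy as np
import pandas as pd

# Hypothetical merged table (one row per flash) and the 1.3.g result
merged = pd.DataFrame({
    'start_time_stim': [0.0, 0.75, 1.5, 5.25, 6.0, 6.75],
    'trials_id': [0, 0, 0, 1, 1, 1],
    'trial_type': ['aborted'] * 3 + ['hit'] * 3,
    'first_lick_time': [1.2, 1.2, 1.2, 6.5, 6.5, 6.5],
})
last_stim = pd.Series([1, 4], index=pd.Index([0, 1], name='trials_id'),
                      name='stim_id')

# 1.3.h: keep only the flash immediately preceding each trial's first lick
preceding = merged.loc[last_stim.values].copy()

# 1.3.i: latency from the onset of that flash to the first lick
preceding['lick_latency'] = preceding['first_lick_time'] - preceding['start_time_stim']
print(preceding[['trials_id', 'trial_type', 'lick_latency']])
```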

<div class="alert alert-block alert-success">

**Prompt 1.3.j:** Plot a histogram of lick latencies, colored by trial type.

</div>

<div class="alert alert-block alert-success">

Since licks in both hit and aborted trials appear to be stimulus-locked, this mouse may be using a timing strategy.
    
> **Note:** We could have gone about Exercise 1.3 in many other ways too. The reason we chose this method was to practise using `merge` and `groupby`. For instance, here are some other possible algorithms:
> 1. We could have used `searchsorted` to directly find the stimulus presentations preceding the first lick in each trial.
> 2. Or, we could have computed the `lick_latency` column after 1.3.d, and then selected the rows with smallest positive lick latencies, which would have avoided the `groupby`.

</div>
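The first alternative in the note can be sketched in a few lines. This assumes the flash onset times are sorted (presentations are in time order) and uses hypothetical toy values; `side='left'` minus 1 picks the last flash starting strictly before each lick, matching the `<` comparison in prompt 1.3.e.

```python
import numpy as np
import pandas as pd

# Hypothetical flash onset times (sorted) and first-lick time per trial
stim_start_times = np.array([0.0, 0.75, 1.5, 5.25, 6.0, 6.75])
first_lick_times = pd.Series([1.2, 6.5],
                             index=pd.Index([0, 1], name='trials_id'))

# Position of the last flash that starts strictly before each first lick
preceding_pos = np.searchsorted(stim_start_times, first_lick_times.values,
                                side='left') - 1

# Latency from that flash's onset to the first lick, indexed by trial
lick_latency = first_lick_times - stim_start_times[preceding_pos]
print(lick_latency)
```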

<div class="alert alert-block alert-success">

### 1.4 Computing lick bouts

Mice tend to lick in rapid bursts. In this exercise, we'll examine how to group these licks into lick "bouts".

We'll compute the inter-lick intervals from the `licks` table, and use a histogram of these intervals to find a suitable lick bout "threshold": a cut-off interval (>100 ms and &lt;1 s) below which two consecutive licks are considered part of the same bout.
    
Then, we'll add a column to the licks table to indicate the lick bout number that each lick is part of. Finally, we'll plot a histogram of inter-bout intervals.
    
</div>

<div class="alert alert-block alert-success">

**Prompt 1.4.a:** Compute the inter-lick interval from the `licks` table. You can use the `np.diff` function from numpy to compute differences of adjacent elements.

</div>
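A sketch of prompt 1.4.a on hypothetical lick times standing in for `session.licks`.

```python
import numpy as np
import pandas as pd

# Hypothetical lick times in seconds, standing in for session.licks
licks = pd.DataFrame({'timestamps': [1.20, 1.35, 1.50, 4.00, 4.14, 9.00]})

# 1.4.a: differences between consecutive lick times (one fewer than licks)
inter_lick_interval = np.diff(licks['timestamps'].values)
print(inter_lick_interval)
```

For prompt 1.4.b, `plt.hist(inter_lick_interval, bins=1000, log=True)` draws the histogram with a log-scaled y-axis.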

<div class="alert alert-block alert-success">

**Prompt 1.4.b:** Plot a histogram of the inter-lick interval to find a reasonable threshold for lick bouts (use 1000 bins and log scaling on the y-axis to better evaluate where the lick bout cuts off).

</div>

<div class="alert alert-block alert-success">

**Prompt 1.4.c:** Set the lick bout threshold (roughly) at the interval where the histogram drops off.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.4.d:** (This step is not needed for what follows.) Add the inter-lick intervals to the `licks` table. Since the inter-lick interval is not defined for the very first lick, the array of intervals has one element fewer than the table has rows: fill in that first row with either `inf` or `NaN`.

</div>

<div class="alert alert-block alert-success">

**Prompt 1.4.e:** Create a `lick_bout` column in the `licks` table, to contain the "bout index" that each lick is part of: to compute the bout index, we need to increment a counter every time the inter-lick interval exceeds the lick bout threshold that we defined above. `np.cumsum` computes a cumulative sum, and can be used to do this efficiently.

</div>
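A sketch of prompts 1.4.d and 1.4.e on hypothetical lick times. The 0.7 s threshold is a placeholder, not a value read off the real histogram. Filling the first interval with `inf` makes the first lick start bout 1, because `inf > threshold` is True; with `NaN` the comparison is False, so the first bout would be numbered 0 instead (the grouping is the same either way).

```python
import numpy as np
import pandas as pd

# Hypothetical lick times in seconds
licks = pd.DataFrame({'timestamps': [1.20, 1.35, 1.50, 4.00, 4.14, 9.00]})
lick_bout_threshold = 0.7  # hypothetical; choose yours from the 1.4.b histogram

# 1.4.d: the first lick has no preceding lick, so its interval is set to inf
licks['inter_lick_interval'] = np.concatenate(
    ([np.inf], np.diff(licks['timestamps'].values)))

# 1.4.e: a new bout starts whenever the interval exceeds the threshold;
# the running count of bout starts is each lick's bout number
new_bout = (licks['inter_lick_interval'] > lick_bout_threshold).to_numpy()
licks['lick_bout'] = np.cumsum(new_bout)
print(licks)
```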

<div class="alert alert-block alert-success">

**Prompt 1.4.f:** Create a table with the timestamps of lick bouts, i.e., timestamps of the first lick in each bout. You could use a `groupby` to do this.

</div>
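Prompt 1.4.f with a `groupby`, on hypothetical licks that already carry a `lick_bout` column. The name `bout_start_time` is our own choice.

```python
import pandas as pd

# Hypothetical licks with bout numbers from 1.4.e
licks = pd.DataFrame({
    'timestamps': [1.20, 1.35, 1.50, 4.00, 4.14, 9.00],
    'lick_bout': [1, 1, 1, 2, 2, 3],
})

# 1.4.f: licks are in time order, so the first timestamp in each bout is its start
bouts = (licks.groupby('lick_bout')['timestamps'].first()
         .rename('bout_start_time')
         .reset_index())
print(bouts)
```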

<div class="alert alert-block alert-success">

**Prompt 1.4.g:** Check what the histogram of inter-bout intervals looks like (leaving out intervals of 1 minute or longer).

</div>
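A sketch of the filtering step in prompt 1.4.g, on hypothetical bout start times; the filtered array is what you would pass to `plt.hist`.

```python
import numpy as np

# Hypothetical bout start times in seconds (e.g., bouts['bout_start_time'])
bout_start_times = np.array([1.2, 4.0, 9.0, 95.0, 97.5])

# Intervals between consecutive bouts, dropping those of 1 minute or longer
inter_bout_interval = np.diff(bout_start_times)
inter_bout_interval = inter_bout_interval[inter_bout_interval < 60]
print(inter_bout_interval)
```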