<img src="../resources/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Visual Behavior Neuropixels Dataset Exercises</h1> 
<h2 align="center">Summer Workshop on the Dynamic Brain</h2> 

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [None]:
import platform
platstring = platform.platform()

data_dirname = 'visual-behavior-neuropixels'
use_static = False
if 'Darwin' in platstring or 'macOS' in platstring:
    # macOS 
    data_root = "/Volumes/Brain2022/"
elif 'Windows'  in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif ('amzn' in platstring):
    # then on AWS
    data_root = "/data/"
    data_dirname = 'visual-behavior-neuropixels-data'
    use_static = True
else:
    # then your own linux platform
    # EDIT location where you mounted hard drive
    data_root = "/media/$USERNAME/Brain2022/"

In [None]:
from allensdk.brain_observatory.behavior.behavior_project_cache.\
    behavior_neuropixels_project_cache \
    import VisualBehaviorNeuropixelsProjectCache

# this path should point to the location of the dataset on your platform
cache_dir = os.path.join(data_root, data_dirname)

cache = VisualBehaviorNeuropixelsProjectCache.from_local_cache(
            cache_dir=cache_dir, use_static_cache=use_static)

<div class="alert alert-block alert-success">

<h2>Exercise 1: Playing with the trials table and examining licking behavior</h2>

<p>
In this starter exercise, we'll practise working with the trials table and examine the licking behavior of one of the mice in the dataset. For this exercise, there are comments with detailed prompts that act as guardrails.
</p>

<p>
The tasks we'll undertake are:
</p>
<ol>
<li>Creating a new column for trial "type" (hit, miss, false alarm, etc.) in the trials table and plotting the number of trials of each type.</li>
<li>Comparing lick times (from start of each trial) for hit trials and aborted trials.</li>
<li>Comparing lick latency from the stimulus flash for hit, aborted and false alarm trials.</li>
<li>Computing lick bouts.</li>
</ol>
    
<p>
<strong>Note:</strong> With all of these exercises, there are multiple ways of accomplishing the same end goal. For each of the tasks in Exercise 1, we've provided prompts that take you through one logical sequence of steps to complete these analyses (which we think is simple and teaches useful concepts). The objective is to give you a relatively easy algorithm you can follow, so that you can first focus on getting used to writing pandas code. But feel free to try completing the task objectives using your own algorithm first, and consulting our prompts if you get stuck.
</p>

</div>

In [None]:
# Setup: first, let's get the relevant tables
session_id = 1065437523  # This is a good session for looking at lick behavior
session = cache.get_ecephys_session(ecephys_session_id=session_id)

stimulus_presentations = session.stimulus_presentations
trials = session.trials
licks = session.licks

<div class="alert alert-block alert-success">

<h3>1.1 Creating a new column for trial "type" in the trials table, and plotting the number of trials of each type</h3>

<p>
The different types of trials are: <code>hit</code>, <code>miss</code>, <code>false_alarm</code>, <code>correct_reject</code>, <code>aborted</code>, and <code>auto_rewarded</code>. Each of these trial types has a separate boolean column in the <code>trials</code> table indicating whether a particular trial is of that type. For any given trial, only one of these columns will be True, and the rest will be False.
</p>

<p>
We are going to convert these boolean columns into a new column in the <code>trials</code> table called <code>trial_type</code>, which contains the trial type as a string. Then, we'll create a bar plot indicating how many trials there are of each type.
</p>

<p>
If you want to try coming up with the logic for this on your own (i.e., without following the prompts), then this is a good point to pause, think and/or try coding.
</p>

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.1.a:</strong> Create a new column called <code>trial_type</code> filled with NaNs.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.1.b:</strong> Select the hit trials (rows of the trials table where the column <code>hit</code> is True), and set the <code>trial_type</code> column for those rows to be equal to the string <code>'hit'</code>.

</div>

<div class="alert alert-block alert-success">

<p>
We'll now repeat this action for all trial types using a for loop.
</p>

<p>
<strong>Prompt 1.1.c:</strong> Create a python list with the names of all the different trial types (no harm in including <code>'hit'</code> again). Then, using a for loop over the list of trial types you just created, select rows of the trials table of each type, and assign the <code>trial_type</code> column for those rows to equal the name of the trial's type.
</p>

</div>
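One possible way to carry out Prompts 1.1.a to 1.1.c is sketched below on a small made-up table rather than on the real session data. The name `toy_trials` and its values are hypothetical, and creating the new column with `object` dtype is a choice made here so that strings can be written into it without dtype warnings on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for session.trials (values are made up)
toy_trials = pd.DataFrame({
    'hit':            [True,  False, False, False],
    'miss':           [False, True,  False, False],
    'false_alarm':    [False, False, False, False],
    'correct_reject': [False, False, False, False],
    'aborted':        [False, False, True,  True],
    'auto_rewarded':  [False, False, False, False],
})

# 1.1.a: a new column filled with NaNs (object dtype, so it can hold strings later)
toy_trials['trial_type'] = pd.Series(np.nan, index=toy_trials.index, dtype=object)

# 1.1.b and 1.1.c: use each boolean column as a row mask and write the label
trial_types = ['hit', 'miss', 'false_alarm', 'correct_reject',
               'aborted', 'auto_rewarded']
for trial_type in trial_types:
    toy_trials.loc[toy_trials[trial_type], 'trial_type'] = trial_type

print(toy_trials['trial_type'])
```

Using `.loc` with a boolean mask and a column label modifies the table in place, which avoids chained indexing.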

<div class="alert alert-block alert-success">

<strong>Prompt 1.1.d:</strong> Examine the trials table to see how that worked.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.1.e:</strong> Now, use the <code>value_counts</code> function on the <code>trial_type</code> column of the <code>trials</code> data frame to get the number of rows having each of these trial types. Rename the output Series to <code>'trial_type_tally'</code>, which is more descriptive.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.1.f:</strong> Make the index of this series a new column using <code>reset_index</code>, and rename the new column to <code>trial_type</code>.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.1.g:</strong> Finally, plot these values using a horizontal bar plot. When using matplotlib, you can use the <code>plt.barh</code> function: supply the <code>y</code>, <code>width</code> and <code>tick_label</code> parameters. Give the plot a meaningful title.
    
</div>
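The tallying steps in Prompts 1.1.e and 1.1.f might look like the sketch below, again on a made-up `toy_trials` table (hypothetical). Because the default names produced by `value_counts` differ between pandas versions, this sketch names the index explicitly with `rename_axis` before calling `reset_index`; that is one option among several.

```python
import pandas as pd

# Hypothetical trials table that already has a trial_type column
toy_trials = pd.DataFrame(
    {'trial_type': ['hit', 'aborted', 'miss', 'aborted', 'hit', 'aborted']}
)

# 1.1.e: count rows per trial type and give the Series a descriptive name
tally = toy_trials['trial_type'].value_counts().rename('trial_type_tally')

# 1.1.f: move the index (the trial types) into a column called trial_type
tally_table = tally.rename_axis('trial_type').reset_index()

print(tally_table)
```

For Prompt 1.1.g, the two columns of `tally_table` can then be supplied to `plt.barh` as the `width` and `tick_label` arguments, with something like `range(len(tally_table))` as `y`.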

<div class="alert alert-block alert-success">

<h3>1.2 Comparing lick times for hit trials and aborted trials</h3>

<p>
Next, we'll compute the time interval between the start of the trial and the point at which the mouse licked, and compare this time interval across aborted trials and hit trials. To do this, we'll first create a new column in the <code>trials</code> table containing the aforementioned time interval, and then plot histograms of the time interval for hit and aborted trials.
</p>

<p>
Since the mouse correctly waited until the stimulus changed in hit trials, the distribution of first-lick times should closely match the distribution of change times. In aborted trials, on the other hand, the first lick should come much sooner after the start of the trial, although this will depend on how the mouse licked in those trials.
</p>

<blockquote>
<p>
<strong>Note</strong>: Do <em>not</em> use the 'licks' <em>column</em> from the <code>trials</code> table to get the times at which licks occurred. Use information from the <code>licks</code> <em>table</em> instead. This is because the <code>trials</code> table contains lick timestamps as recorded by the task control computer, while the <code>licks</code> table contains lick timestamps as recorded by the lick sensor; the latter is more accurate.
</p>

<p>
<strong>Hint</strong>: The first step in this exercise is to find the initial lick in every trial. That is, for every <code>start_time</code> in the <code>trials</code> table, we need to find the first lick timestamp in the <code>licks</code> table coming immediately after it. To do this efficiently, we'll use the numpy function <a href="https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html"><code>np.searchsorted</code></a>: more information on using this function is available in the first prompt for this exercise.
</p>
</blockquote>

<p>
If you want to try coming up with the logic for this on your own (i.e., without following the prompts), then this is a good point to pause, think and/or try coding.
</p>

</div>

<div class="alert alert-block alert-success">

<p>
<strong>Prompt 1.2.a:</strong> First, find the initial lick in each trial. The <code>licks</code> table contains a <code>timestamps</code> column that has all time instants at which the mouse licked. Use numpy's <code>np.searchsorted</code> function to find the indices of the licks that come immediately after each trial's <code>start_time</code>. (These indices may not make sense for trials where the mouse did not lick, but later, we will be considering only hit and aborted trials, in which we know for sure that the mouse did lick.) You should have one lick index for each trial: add these indices as a new column to the <code>trials</code> table called <code>first_lick_indices</code>.
</p>

<blockquote>
<p>
<a href="https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html"><code>np.searchsorted</code></a> can efficiently find the index at which a number <code>v</code> must be inserted into a sorted array <code>a</code> to maintain sorted order. <code>v</code> can also be an array, in which case <code>searchsorted</code> returns one index for each element of <code>v</code>.
</p>
<p>
To take an example, let's consider a sorted array, <code>a = [1, 4, 7, 9]</code> and suppose <code>v = [5, 0, 10]</code>. Then, <code>r = np.searchsorted(a, v)</code> would produce <code>r = [2, 0, 4]</code>. Observe that <code>v[0] = 5</code> needs to be inserted into <code>a</code> before <code>7</code>, which is at <code>a[2]</code>. Thus, <code>r[0] = 2</code> because that is the index in <code>a</code> before which <code>v[0]</code> needs to be inserted.
</p>
<p>
Similarly, <code>v[1] = 0</code> needs to be inserted before <code>a[0] = 1</code>, so <code>r[1] = 0</code>. Finally, <code>v[2] = 10</code> needs to be inserted at the end of the array, i.e., at index <code>r[2] = 4</code> (even though this is <em>currently</em> not a valid index for <code>a</code> &mdash; valid indices run from 0 through 3 &mdash; it would be valid after the insertion).
</p>
<p>
Note that the <em>length</em> of the returned array <code>r</code> is equal to the length of <code>v</code>, while the <em>elements</em> of <code>r</code> are between <code>0</code> and <code>len(a)</code> (inclusive).
</p>
</blockquote>

</div>
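The worked example above, followed by a possible approach to Prompt 1.2.a, can be sketched as follows. The tables `toy_licks` and `toy_trials` are made-up stand-ins for the real `licks` and `trials` tables, and the sketch assumes the lick timestamps are sorted in increasing order.

```python
import numpy as np
import pandas as pd

# The example from the text
a = np.array([1, 4, 7, 9])
v = np.array([5, 0, 10])
r = np.searchsorted(a, v)
print(r)  # [2 0 4]

# Hypothetical stand-ins for the licks and trials tables (times in seconds)
toy_licks = pd.DataFrame({'timestamps': [1.2, 1.4, 5.0, 5.1, 9.3]})
toy_trials = pd.DataFrame({'start_time': [0.0, 3.0, 8.0, 12.0]})

# For each trial start, the position of the first lick at or after it
toy_trials['first_lick_indices'] = np.searchsorted(
    toy_licks['timestamps'].values, toy_trials['start_time'].values
)
print(toy_trials)
```

Note that the last toy trial starts after the final lick, so its index equals `len(toy_licks)` and does not point at a real row.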

<div class="alert alert-block alert-success">

<p>
<strong>Prompt 1.2.b:</strong> Now that we have the indices of rows in the <code>licks</code> table that correspond to the first lick in each trial, we can use these indices to find the <em>times</em> at which the first licks occurred.
</p>

<p>
But the <code>first_lick_indices</code> we have computed cannot be used directly to index the licks table, because the last trial (or the last few) may have had no licks. In this case, <code>searchsorted</code> returns an index equal to the number of rows in the licks table, i.e., one greater than the last valid row position (to indicate that these trials started <em>after</em> the last lick).
</p>

<p>
Examine the last few rows of the <code>first_lick_indices</code> column and you'll notice that some indices come after the last index in the <code>licks</code> table. Remove these rows and create a new dataframe of subselected trials, which only contains those <code>first_lick_indices</code> that correspond to valid licks.
</p>

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.2.c:</strong> Use the <code>first_lick_indices</code> from the subselected trials to <em>index</em> the timestamps column of the <code>licks</code> table; store the timestamps in a new pandas Series.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.2.d:</strong> By default, this Series will inherit its index from the <code>licks</code> table. Change the index of this new Series to the index of the subselected trials. This is required to add the timestamps of the first licks back to the appropriate rows of the <code>trials</code> table.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.2.e:</strong> Add the new Series containing the timestamps of first licks to the <code>trials</code> table, in the form of a new column.
    
</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.2.f:</strong> Create a new column containing the time difference between the start of the trial and the first lick.

</div>
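Prompts 1.2.b through 1.2.f could be strung together roughly as in the sketch below. Everything here is illustrative: the toy tables are made up, and the column names `first_lick_time` and `time_to_first_lick` are hypothetical choices, not names used by the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins; the trials index is deliberately not 0..n-1
toy_licks = pd.DataFrame({'timestamps': [1.2, 1.4, 5.0, 5.1, 9.3]})
toy_trials = pd.DataFrame({'start_time': [0.0, 3.0, 8.0, 12.0]},
                          index=[10, 11, 12, 13])
toy_trials['first_lick_indices'] = np.searchsorted(
    toy_licks['timestamps'].values, toy_trials['start_time'].values
)

# 1.2.b: keep only trials whose index points at a real row of the licks table
valid_trials = toy_trials[toy_trials['first_lick_indices'] < len(toy_licks)]

# 1.2.c: look up the lick timestamps by position
first_lick_times = toy_licks['timestamps'].iloc[
    valid_trials['first_lick_indices'].values
]

# 1.2.d: relabel the Series with the trial index so it aligns with trials
first_lick_times.index = valid_trials.index

# 1.2.e: add as a new column (trials with no valid first lick get NaN)
toy_trials['first_lick_time'] = first_lick_times

# 1.2.f: time from the start of the trial to the first lick
toy_trials['time_to_first_lick'] = (
    toy_trials['first_lick_time'] - toy_trials['start_time']
)
print(toy_trials)
```

Assigning a Series to a column aligns on the index, which is why step 1.2.d matters: without it, the timestamps would be matched against lick row labels rather than trial labels.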

<div class="alert alert-block alert-success">

<strong>Prompt 1.2.g:</strong> Sub-select only the hit trials and the aborted trials.

</div>

<div class="alert alert-block alert-success">

<p>
<strong>Prompt 1.2.h:</strong> Plot a histogram of the time from the start of the trial to the first lick, colored by trial type.
Use bins in steps of 0.75 seconds so that each histogram column represents licks within a stimulus presentation.
</p>

<p>
For plotting, you can call matplotlib's <code>plt.hist</code> function twice, once for each trial type. Supply the data, and additionally set the <code>bins</code>, <code>alpha</code> and <code>label</code> arguments. Call <code>plt.legend()</code>, which will make use of the label arguments you supplied to create a legend. Don't forget to label your x- and y-axes.
</p>

</div>
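The binning described in Prompts 1.2.g and 1.2.h can be checked without plotting, as in this sketch. The table and the `time_to_first_lick` column are made up for illustration, and `np.histogram` is used here only to show the counts that a histogram with the same bins would display.

```python
import numpy as np
import pandas as pd

# Hypothetical trials table with a precomputed time-to-first-lick column
toy_trials = pd.DataFrame({
    'trial_type': ['hit', 'hit', 'aborted', 'aborted', 'aborted', 'miss'],
    'time_to_first_lick': [4.9, 6.4, 0.3, 1.0, 1.2, np.nan],
})

# 1.2.g: keep only hit and aborted trials
subset = toy_trials[toy_trials['trial_type'].isin(['hit', 'aborted'])]

# 1.2.h: bin edges every 0.75 s, so each bin spans one stimulus presentation
bins = np.arange(0, 9.75, 0.75)

# Per-trial-type bin counts
counts = {
    trial_type: np.histogram(group['time_to_first_lick'], bins=bins)[0]
    for trial_type, group in subset.groupby('trial_type')
}
print(counts)
```

The same `bins` array can be passed to `plt.hist` for each trial type, along with `alpha` and `label`, to draw the overlaid histograms.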

<div class="alert alert-block alert-success">

Does anything about this plot look unusual? What sort of distribution should we have expected in aborted trials, if the mouse licked on each stimulus presentation uniformly at random?

</div>

<div class="alert alert-block alert-success">

<h3>1.3  Comparing lick latency from the stimulus flash for hit trials and aborted/false alarm trials</h3>

<p>
Next, we'll compare the latency between the first lick in each trial and the start of the <em>preceding stimulus flash</em> (rather than the start of the trial), across hit, aborted and false alarm trials.
</p>

<p>    
To get information about stimulus presentations, we'll have to look at the <code>stimulus_presentations</code> table. Recall that every trial consists of multiple stimulus presentations (i.e., image flashes). We'll associate each stimulus presentation with the trial it falls within: this will involve merging the <code>stimulus_presentations</code> and <code>trials</code> tables. Then, we'll use the first lick times that we computed for each trial in Exercise 1.2 to find the stimulus presentation preceding each lick. Finally, we'll plot histograms of the lick latency for hit, aborted and false alarm trials.
</p>

<p>
For hit trials, we expect the licks to be stimulus-locked, with a response after a clear delay. Based on the histograms we plotted in Exercise 1.2, this mouse appears to be licking in aborted trials at time intervals that are similar to how it licks on hit trials. This suggests that its licks are not impulsive or random; rather, they might be visually evoked. In this exercise, we'll see whether the mouse's licking is stimulus-locked on aborted trials too.
</p>

<p>
If you want to try coming up with the logic for this on your own (i.e., without following the prompts), then this is a good point to pause, think and/or try coding.
</p>

</div>

<div class="alert alert-block alert-success">

<p>
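<strong>Prompt 1.3.a:</strong> First, use <code>searchsorted</code> to find, for each stimulus presentation, the index of the trial that most recently started <em>before</em> the presentation's <code>start_time</code>. Keep in mind that <code>searchsorted</code> returns an insertion position, so its output needs to be offset by one to point at the preceding trial. Assign these indices to a new column of the stimulus presentations table called <code>trials_id</code>.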
</p>

<p>
Examine the stimulus presentations table to see how this looks. The new <code>trials_id</code> column should tell us which trial each stimulus presentation was part of, so that we can merge the <code>stimulus_presentations</code> table with the <code>trials</code> table in the next step.
</p>

</div>
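A minimal sketch of Prompt 1.3.a on made-up data is shown below. It assumes that trials are labelled 0, 1, 2, ... in order of `start_time`, and it uses `side='right'` so that a presentation starting exactly at a trial's start is assigned to that trial; both are assumptions made for this toy example, and the real tables may call for a different choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the trials and stimulus_presentations tables
toy_trials = pd.DataFrame({'start_time': [0.0, 3.0, 6.0]})
toy_stim = pd.DataFrame(
    {'start_time': [0.0, 0.75, 1.5, 3.0, 3.75, 6.0, 6.75]}
)

# Insertion position minus one gives the last trial that has already started
toy_stim['trials_id'] = np.searchsorted(
    toy_trials['start_time'].values,
    toy_stim['start_time'].values,
    side='right',
) - 1
print(toy_stim)
```

With this convention, any presentation that begins before the first trial would receive `-1`, which is worth checking for before merging.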

<div class="alert alert-block alert-success">

<p>
<strong>Prompt 1.3.b:</strong> Next, merge the <code>stimulus_presentations</code> table with the <code>trials</code> table using the newly created <code>trials_id</code> column.
</p>

<p>
While merging, note that both the <code>stimulus_presentations</code> table and the <code>trials</code> table have some columns of the same name, e.g., <code>start_time</code>. To distinguish these columns after merging, use the <code>suffixes</code> argument. The default suffixes are <code>('_x', '_y')</code>, but it's better to rename them to be more meaningful.
</p>

</div>
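The merge in Prompt 1.3.b might be written as in the sketch below, using small made-up tables. The suffixes `'_stim'` and `'_trial'` are hypothetical names chosen for readability.

```python
import pandas as pd

# Hypothetical stand-ins; both tables have a start_time column
toy_trials = pd.DataFrame(
    {'start_time': [0.0, 3.0], 'trial_type': ['hit', 'aborted']}
)
toy_stim = pd.DataFrame(
    {'start_time': [0.0, 0.75, 3.0, 3.75], 'trials_id': [0, 0, 1, 1]}
)

# Match the trials_id column on the left to the index of the trials table
merged = toy_stim.merge(
    toy_trials,
    left_on='trials_id',
    right_index=True,
    suffixes=('_stim', '_trial'),
)
print(merged.columns.tolist())
print(merged)
```

Only overlapping column names receive a suffix, so `trial_type` and `trials_id` keep their original names.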

<div class="alert alert-block alert-success">

<strong>Prompt 1.3.c:</strong> Print out the columns of the merged table to check that everything is there.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.3.d:</strong> Now that we have all the info we need, subselect the hit, false alarm and aborted trials.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.3.e:</strong> Next, select the stimulus presentations that precede the first lick timestamp, and create a copy of the returned data frame. We need to call <code>.copy()</code> after selecting to prevent a <code>SettingWithCopyWarning</code> later.

</div>

<div class="alert alert-block alert-success">

<p>
To find the stimulus presentation <em>just</em> before the first lick timestamp, we can group the data frame (of all stimulus presentations preceding the first lick timestamp) by trial index and select the stimulus presentation with the largest stimulus presentation index.
</p>

<p>
<strong>Prompt 1.3.f:</strong> First, create a new column called <code>stim_id</code> to contain the stimulus presentation index (move the index to a column using <code>reset_index</code>).
</p>

</div>

<div class="alert alert-block alert-success">

<p>
<strong>Prompt 1.3.g:</strong> Then, group rows by <code>trials_id</code>, and find the maximum <code>stim_id</code> in each trial.
</p>

<p>
The resulting data frame will associate every trial with a <code>stim_id</code>, which is the last stimulus presentation preceding the first lick in that trial.
</p>

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.3.h:</strong> Now, choose those rows from the merged <code>stimulus_presentations</code>-<code>trials</code> table that have these <code>stim_id</code>s.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.3.i:</strong> Finally, compute the latency between the first lick in each trial and the start of the preceding stimulus flash. Add this to a new column.

</div>
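Prompts 1.3.e through 1.3.i can be sketched end to end on a made-up merged table. The column names `start_time_stim`, `first_lick_time` and `lick_latency`, and the index name `stimulus_presentations_id`, are assumptions for this toy example; check the names in your own merged table.

```python
import numpy as np
import pandas as pd

# Hypothetical merged stimulus_presentations-trials table
merged = pd.DataFrame(
    {
        'trials_id':       [0, 0, 0, 0, 1, 1, 1],
        'start_time_stim': [0.0, 0.75, 1.5, 2.25, 3.0, 3.75, 4.5],
        'first_lick_time': [1.9, 1.9, 1.9, 1.9, 3.2, 3.2, 3.2],
        'trial_type':      ['hit'] * 4 + ['aborted'] * 3,
    },
    index=pd.Index(range(100, 107), name='stimulus_presentations_id'),
)

# 1.3.e: presentations that start before the first lick of their trial
before_lick = merged[
    merged['start_time_stim'] <= merged['first_lick_time']
].copy()

# 1.3.f: move the presentation index into a column called stim_id
before_lick = before_lick.reset_index().rename(
    columns={'stimulus_presentations_id': 'stim_id'}
)

# 1.3.g: the last such presentation in each trial
last_stim = before_lick.groupby('trials_id')['stim_id'].max()

# 1.3.h: pick those rows out of the merged table
selected = merged.loc[last_stim.values].copy()

# 1.3.i: latency from the preceding flash to the first lick
selected['lick_latency'] = (
    selected['first_lick_time'] - selected['start_time_stim']
)
print(selected[['trials_id', 'trial_type', 'lick_latency']])
```

Taking the maximum `stim_id` works here because presentation indices increase with time, so the largest index among the preceding presentations is the most recent one.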

<div class="alert alert-block alert-success">

<strong>Prompt 1.3.j:</strong> Plot a histogram of lick latencies, colored by trial type.

</div>

<div class="alert alert-block alert-success">

<blockquote>
<strong>Note:</strong> We could have gone about Exercise 1.3 in many other ways too. The reason we chose this method was to practise using <code>merge</code> and <code>groupby</code>. For instance, here are some other possible algorithms:
<ol>
<li>We could have used <code>searchsorted</code> to directly find the stimulus presentations preceding the first lick in each trial.</li>
<li>Or, we could have computed the <code>lick_latency</code> column after 1.3.d, and then selected the rows with smallest positive lick latencies, which would have avoided the <code>groupby</code>.</li>
</ol>
</blockquote>

</div>
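The first alternative in the note above, using `searchsorted` directly, can be sketched on made-up arrays as follows. The arrays `stim_starts` and `first_lick_times` are hypothetical, and the sketch assumes that every first lick comes after at least one stimulus presentation.

```python
import numpy as np

# Hypothetical sorted stimulus start times and per-trial first lick times
stim_starts = np.array([0.0, 0.75, 1.5, 2.25, 3.0, 3.75, 4.5])
first_lick_times = np.array([1.9, 3.2])

# Position of the last presentation that starts at or before each lick
prev_stim = np.searchsorted(stim_starts, first_lick_times, side='right') - 1

# Latency from that presentation to the lick
lick_latency = first_lick_times - stim_starts[prev_stim]
print(prev_stim, lick_latency)
```

This avoids both the merge and the `groupby`, at the cost of working with bare arrays instead of labelled tables.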

<div class="alert alert-block alert-success">

<h3>1.4  Computing lick bouts</h3>

<p>
Mice tend to lick quickly in rapid succession. In this exercise, we'll examine how to separate these licks into lick "bouts".
</p>

<p>    
We'll compute the inter-lick interval from the <code>licks</code> table, and use a histogram of these intervals to find a suitable lick bout "threshold". This will be a cut-off time interval (between 100 ms and 1 second): two licks separated by less than this interval will be considered part of the same bout.
</p>

<p>
Then, we'll add a column to the licks table to indicate the lick bout number that each lick is part of. Finally, we'll plot a histogram of inter-bout intervals.
</p>

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.a:</strong> Compute the inter-lick interval from the <code>licks</code> table. You can use the <code>np.diff</code> function from numpy to compute differences of adjacent elements.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.b:</strong> Plot a histogram of the inter-lick interval to find a reasonable threshold for lick bouts (use 1000 bins and log scaling on the y-axis to better evaluate where the lick bout cuts off).

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.c:</strong> Set the lick bout threshold to roughly where the histogram cuts off.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.d:</strong> (This step is not needed for what follows.) Add the inter-lick intervals to the <code>licks</code> table. The array of intervals has one element fewer than the table has rows, since the inter-lick interval is not defined for the very first lick: set the value for that row to either <code>inf</code> or <code>NaN</code>.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.e:</strong> Create a <code>lick_bout</code> column in the <code>licks</code> table, to contain the "bout index" that each lick is part of: to compute the bout index, we need to increment a counter every time the inter-lick interval exceeds the lick bout threshold that we defined above. <code>np.cumsum</code> computes a cumulative sum, and can be used to do this efficiently.

</div>
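The bout-labelling logic of Prompts 1.4.a, 1.4.d and 1.4.e is sketched below on a handful of made-up lick times. The threshold of 0.7 s is an arbitrary placeholder for this toy data, not a recommended value; in the exercise it should come from your histogram. The column name `inter_lick_interval` is also hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical lick timestamps (seconds): three bursts of licking
toy_licks = pd.DataFrame(
    {'timestamps': [1.00, 1.15, 1.30, 4.00, 4.12, 9.50]}
)

# 1.4.a: differences between adjacent lick times
inter_lick_interval = np.diff(toy_licks['timestamps'].values)

# 1.4.c: placeholder threshold, chosen only to suit this toy data
bout_threshold = 0.7

# 1.4.d: pad with inf so the first lick counts as the start of a bout
toy_licks['inter_lick_interval'] = np.concatenate(
    [[np.inf], inter_lick_interval]
)

# 1.4.e: every interval above threshold starts a new bout;
# subtract 1 so that bout numbering starts at 0
toy_licks['lick_bout'] = np.cumsum(
    toy_licks['inter_lick_interval'].values > bout_threshold
) - 1
print(toy_licks)
```

The cumulative sum of a boolean array counts how many bout boundaries have been crossed so far, which is exactly the bout index of each lick.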

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.f:</strong> Create a table with the timestamps of lick bouts, i.e., timestamps of the first lick in each bout. You could use a <code>groupby</code> to do this.

</div>

<div class="alert alert-block alert-success">

<strong>Prompt 1.4.g:</strong> Check what the histogram of inter-bout intervals looks like (leaving out intervals &gt;=1 minute in length).

</div>
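For Prompts 1.4.f and 1.4.g, one possible approach is sketched below on a made-up licks table that already carries a `lick_bout` column. The names `bouts`, `bout_start` and `inter_bout_interval` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical licks table with bout labels already assigned
toy_licks = pd.DataFrame({
    'timestamps': [1.00, 1.15, 1.30, 4.00, 4.12, 9.50, 100.0],
    'lick_bout':  [0, 0, 0, 1, 1, 2, 3],
})

# 1.4.f: timestamp of the first lick in each bout
bouts = (
    toy_licks.groupby('lick_bout')['timestamps']
    .first()
    .rename('bout_start')
    .reset_index()
)

# 1.4.g: intervals between consecutive bout starts, dropping those >= 1 minute
inter_bout_interval = np.diff(bouts['bout_start'].values)
inter_bout_interval = inter_bout_interval[inter_bout_interval < 60]
print(bouts)
print(inter_bout_interval)
```

The filtered array can be passed straight to `plt.hist`. Here the interval is measured between bout starts, which is one reasonable definition; measuring from the last lick of one bout to the first lick of the next is another.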