![Image](resources/banner.jpg)

<h1 align="center">Allen Brain Observatory Visual Coding Neuropixels </h1> 
<h2 align="center"> Day 1, Morning Session. SWDB 2024 </h2> 

<h3 align="center">Monday, August 19, 2024</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
The Allen Brain Observatory Visual Coding Neuropixels dataset is a large-scale survey of physiological activity in mouse visual cortex in response to a variety of visual stimuli under passive viewing conditions.  The animals are head-fixed but free to run on a disc.  Electrophysiological recordings are made with Neuropixels probes in different areas and layers.  This notebook is a brief introduction to get you started with this dataset and to point you to resources for further exploration.

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
***What kind of questions can you answer with this dataset?***

This dataset contains recordings of activity in response to a variety of natural and artificial visual stimuli.  This makes it suitable for a variety of coding questions.

- How are stimuli and features from the external world encoded in neural responses?  
- How do the encoding properties differ across areas and layers?  In different cell lines?
- Can you build predictive models of response from stimuli?
- How are running activity and pupil size related to cortical activity?
- How can information about the stimuli and/or the animal's state be extracted from neural activity?  Can you decode stimuli?
- Do neurons coordinate their activity?  Do they act in ensembles?  
- Is there any spatial aspect to neural information?

These are just some of the questions that might be addressed from this type of data.  

***Why electrophysiology?***

- You get high temporal resolution.
- You are able to see low firing rate activity (but not a complete lack of activity).

***Why NOT electrophysiology?***

- A more constrained spatial arrangement of recordings.
- No firing whatsoever means you never record the cell.
- Limited number of units per area relative to two-photon imaging.

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
**Databook**

The databook is a resource for more in-depth information and examples for the Allen Brain Observatory Visual Coding Neuropixels dataset.  You can find the pages for this dataset here:  https://allenswdb.github.io/physiology/ephys/visual-coding/vcnp.html

![Image](resources/databook_vcnp.png)

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
***Remember the tools you have!***

- Use the databook as a reference; this notebook contains only a small portion of what is in the databook!
- Use the `help` function to look up a function's arguments.
- Use `dir` to see the data and functions in an object.
- Use tab completion in Jupyter.

</div>
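
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
As a quick illustration of these tools, the sketch below inspects a small made-up class.  `ToySession` and its method are hypothetical stand-ins, not part of `allensdk`; the same calls should work on any Python object.  `inspect.signature` and `inspect.getdoc` come from the standard library and report much of what `help` prints.

</div>

```python
import inspect


class ToySession:
    """Hypothetical stand-in for a data object; not an allensdk class."""

    def __init__(self):
        self.units = ["unit_a", "unit_b"]

    def get_spikes(self, unit_id, start=0.0, stop=None):
        """Return spike times for one unit (always empty in this toy example)."""
        return []


def public_names(obj):
    """List attribute and method names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


toy = ToySession()
names = public_names(toy)                            # what is inside the object?
signature = str(inspect.signature(toy.get_spikes))   # which arguments does it take?
doc = inspect.getdoc(toy.get_spikes)                 # the text that help() would show

print(names)
print(signature)
print(doc)
```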

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Using the Python objects we'll show you below, you can extract information about this dataset such as how many recordings are available.

For each ...

- Spike times for identified units
- Quality control metrics

</div>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import platform, os

%matplotlib inline

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
The following cell sets up a path variable so that this notebook will work on the cloud or using data accessed locally, e.g. from your hard drive.

</div>

In [None]:
# Set file location based on platform. 
platstring = platform.platform()
if ('Darwin' in platstring) or ('macOS' in platstring):
    # macOS 
    data_root = "/Volumes/Brain2024/"
elif 'Windows'  in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif ('amzn' in platstring):
    # then on Code Ocean
    data_root = "/data/"
else:
    # then your own linux platform
    # EDIT location where you mounted hard drive
    data_root = "/media/$USERNAME/Brain2024/"

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
This dataset is accessed via the `allensdk` Python package.  It requires instantiating an `EcephysProjectCache` object, which we usually call `cache`.  You'll access all of the data for this dataset using this object.

</div>
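
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
A mistyped or unmounted data location can be caught early by checking that the directory exists before building the cache.  The helper below, `describe_data_root`, is a hypothetical convenience function rather than part of `allensdk`, and it is demonstrated on a temporary directory so that it runs anywhere.  In this notebook you would pass your own `data_root` instead.

</div>

```python
import os
import tempfile


def describe_data_root(path):
    """Return a short status string saying whether `path` is a usable directory."""
    if os.path.isdir(path):
        n_entries = len(os.listdir(path))
        return "found ({} entries)".format(n_entries)
    return "missing"


# Demonstrate on a temporary directory holding one placeholder file.
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "manifest.json"), "w").close()
    status_ok = describe_data_root(tmp_dir)
    status_bad = describe_data_root(os.path.join(tmp_dir, "not_mounted"))

print(status_ok)
print(status_bad)
```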

In [None]:
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

In [None]:
manifest_path = os.path.join(data_root, "allen-brain-observatory/visual-coding-neuropixels/ecephys-cache/manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

In [None]:
cache.get_all_session_types()

![Image](resources/neuropixels_stimulus_sets.webp)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
You can get information about which sessions are available with the following:

</div>

In [None]:
sessions = cache.get_session_table()
sessions.head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Data on an individual session can be accessed by calling the `get_session_data` method of the `cache` object.  It takes a session id as its argument; session ids are the index of the `sessions` table returned above.

If your data is not mounted correctly, you should get a download warning here.

</div>

In [None]:
session_id = 715093703
session_data = cache.get_session_data(session_id)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Two important sources of information about each session are the `metadata` and the `structurewise_unit_counts`.

</div>

In [None]:
session_data.metadata

In [None]:
session_data.structurewise_unit_counts

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Information on each unit is contained in the units table accessed with `session_data.units`. This is indexed on the unique id for each unit.

</div>

In [None]:
session_data.units.head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Let's get the id for the first unit.

</div>

In [None]:
unit_id = session_data.units.index[0]
print(unit_id)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Recorded spike times can be accessed with `session_data.spike_times`, which returns a dictionary whose keys are unit ids and whose values are numpy arrays of individual spike times in seconds.  Let's look at the unit we identified above.

</div>

In [None]:
session_data.spike_times[unit_id]
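
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
An array of spike times is easy to summarize with numpy.  The sketch below uses made-up spike times in place of `session_data.spike_times[unit_id]` and a hypothetical helper, `binned_rate`, to turn spike counts in fixed-width bins into firing rates.  It also computes inter-spike intervals with `np.diff`.  The bin width and time range are arbitrary choices for illustration.

</div>

```python
import numpy as np

# Made-up, sorted spike times in seconds standing in for one unit's spikes.
spike_times = np.array([0.1, 0.4, 0.45, 1.2, 1.9, 2.05, 2.5, 3.7])


def binned_rate(st, t_start, t_stop, bin_width):
    """Count spikes in fixed-width bins and convert the counts to spikes per second."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = np.linspace(t_start, t_stop, n_bins + 1)
    counts, _ = np.histogram(st, bins=edges)
    return edges, counts / bin_width


edges, rates = binned_rate(spike_times, 0.0, 4.0, 1.0)
isi = np.diff(spike_times)  # inter-spike intervals in seconds

print(rates)
print(isi)
```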

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Let's make a raster plot of spike times for all the units in VISp.

</div>

In [None]:
session_units = session_data.units
visp_units = session_units[session_units.ecephys_structure_acronym=='VISp']
visp_units.head()

In [None]:
visp_spike_times = {uid: st for uid, st in session_data.spike_times.items() if uid in visp_units.index}
len(visp_spike_times)

In [None]:
fig, ax = plt.subplots(figsize=(15,5))

for i, (unit_id, st) in enumerate(visp_spike_times.items()):
    ax.plot(st, np.zeros(len(st))+i, 'ko', markersize=1)
    
ax.set_ylabel('VISp unit index')
ax.set_xlabel('time (s)')
ax.set_title('Spike raster for VISp units recorded in session {}'.format(session_id))

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
Why are there gaps in the spike times?  Occasionally there were issues with a recording session that did not invalidate the whole experiment, but did invalidate time intervals within the experiment.  You can see these times directly with `session_data.get_invalid_times`.  For some analyses you may have to be aware of these times and explicitly account for them.

</div>

In [None]:
session_data.get_invalid_times()
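
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
One way to account for invalid intervals is to drop the spikes that fall inside them before further analysis.  The sketch below is a minimal illustration on made-up data.  `drop_invalid_spikes` is a hypothetical helper, and the intervals are given as plain `(start, stop)` pairs in seconds; how you build those pairs from the table returned above is left as an assumption to check against its actual columns.

</div>

```python
import numpy as np


def drop_invalid_spikes(st, invalid_intervals):
    """Return the spike times that fall outside every half-open [start, stop) interval."""
    keep = np.ones(len(st), dtype=bool)
    for start, stop in invalid_intervals:
        keep &= ~((st >= start) & (st < stop))
    return st[keep]


# Made-up spike times and invalid intervals, all in seconds.
spike_times = np.array([1.0, 2.5, 3.0, 7.2, 8.8, 12.0])
invalid_intervals = [(2.0, 4.0), (8.0, 9.0)]

clean = drop_invalid_spikes(spike_times, invalid_intervals)
print(clean)
```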

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
There isn't much detail in that raster plot above, so let's zoom in:

</div>

In [None]:
fig, ax = plt.subplots(figsize=(15,5))

t_start = 1000
t_end = 1010

for i, (unit_id, st) in enumerate(visp_spike_times.items()):
    st_temp = st[(st >= t_start) & (st < t_end)]  # keep spikes inside the time window
    ax.plot(st_temp, np.zeros(len(st_temp))+i, 'ko', markersize=1)
    
ax.set_ylabel('VISp unit index')
ax.set_xlabel('time (s)')
ax.set_title('Spike raster for VISp units recorded in session {}'.format(session_id))
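
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
If you select time windows many times, slicing with `np.searchsorted` can be an alternative to building a boolean mask for every window.  The sketch below assumes the spike times of a unit are sorted in ascending order, which you should verify for your data.  `spikes_in_window` is a hypothetical helper and the spike times are made up.

</div>

```python
import numpy as np


def spikes_in_window(st, t_start, t_end):
    """Slice a sorted spike-time array to the half-open window [t_start, t_end)."""
    first = np.searchsorted(st, t_start, side="left")
    last = np.searchsorted(st, t_end, side="left")
    return st[first:last]


# Made-up, sorted spike times in seconds.
spike_times = np.array([998.2, 1000.0, 1003.4, 1009.99, 1010.0, 1015.1])

window = spikes_in_window(spike_times, 1000, 1010)
masked = spike_times[(spike_times >= 1000) & (spike_times < 1010)]

print(window)
print(np.array_equal(window, masked))
```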

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
We can retrieve the time intervals during which certain stimulus types were shown via `get_stimulus_epochs`.

</div>

In [None]:
stimulus_epoch_table = session_data.get_stimulus_epochs()
stimulus_epoch_table
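
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
An epoch table like this can be combined with spike times to compare activity across stimulus types.  The sketch below is an illustration on made-up data, not a recommended analysis.  It builds a toy table with the `start_time`, `stop_time`, and `stimulus_name` columns and uses a hypothetical helper, `rate_per_stimulus`, to compute one unit's mean firing rate per stimulus name.  It assumes the spike times are sorted.

</div>

```python
import numpy as np
import pandas as pd

# Made-up epochs and spike times (seconds); the values are for illustration only.
epochs = pd.DataFrame({
    "start_time": [0.0, 10.0, 30.0, 40.0],
    "stop_time": [10.0, 30.0, 40.0, 60.0],
    "stimulus_name": ["spontaneous", "gabors", "spontaneous", "flashes"],
})
spike_times = np.array([1.0, 2.0, 12.0, 15.0, 18.0, 29.0, 35.0, 41.0, 59.0])


def rate_per_stimulus(st, epoch_table):
    """Mean firing rate (spikes/s) of one unit, pooled over epochs of each stimulus name."""
    table = epoch_table.copy()
    table["duration"] = table["stop_time"] - table["start_time"]
    # On sorted spike times, the difference of insertion indices counts spikes in [start, stop).
    table["n_spikes"] = (np.searchsorted(st, table["stop_time"].to_numpy())
                         - np.searchsorted(st, table["start_time"].to_numpy()))
    totals = table.groupby("stimulus_name")[["n_spikes", "duration"]].sum()
    return totals["n_spikes"] / totals["duration"]


rates = rate_per_stimulus(spike_times, epochs)
print(rates)
```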

In [None]:
fig, ax = plt.subplots(figsize=(15,5))

for i, (unit_id, st) in enumerate(visp_spike_times.items()):
    ax.plot(st, np.zeros(len(st))+i, 'ko', markersize=1)
    
ax.set_ylabel('VISp unit index')
ax.set_xlabel('time (s)')
ax.set_title('Spike raster for VISp units recorded in session {}'.format(session_id))

# Shade each stimulus epoch, one color per stimulus name (colors repeat if there are more names than colors)
colors = ['blue','orange','green','red','yellow','purple','magenta','gray','lightblue']
for c, stim_name in enumerate(session_data.stimulus_names):
    stim = stimulus_epoch_table[stimulus_epoch_table.stimulus_name==stim_name]
    for j in range(len(stim)):
        ax.axvspan(xmin=stim["start_time"].iloc[j], xmax=stim["stop_time"].iloc[j], color=colors[c % len(colors)], alpha=0.1)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
***Explore Further***

- In addition to the session table we looked at above, the cache object also has tables for the probes, the channels, and the individual units.  Call the functions that return these tables.  What information do they contain?

- The running speed and pupil size are also available in these data.  Find out how to return them and add them to the plot above.

- Units have quality control metrics and there are default values that we consider "good" units.  What are these default values and what is the distribution of these metrics?  Plot these distributions.

- These data also have LFP available.  How do you access it?  Plot the LFP for a single probe in a session.

:::{admonition} Hint
:class: dropdown
Remember to check the [Databook](https://allenswdb.github.io/physiology/ephys/visual-coding/vcnp.html)!
:::

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

***Homework 1***

How many sessions are there with Pvalb mice that used the `brain_observatory_1.1` stimulus set?

</div>


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

***Homework 2***

Select a recording session.  Make a plot of the distribution of mean firing rates per brain structure.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

***Homework 3***

Select a recording session, then select a unit within that session.  To which image does that unit have its largest mean response?  Make a raster plot of this unit's response to that image.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

***Homework 4***

We plotted the stimulus epochs above.  Pick an individual stimulus (e.g. a particular natural scene) and remake the plot above by shading when that particular stimulus was shown.  For natural stimuli, make a figure with the exact stimulus shown.

:::{admonition} Hint #1
:class: dropdown
You will need the `stimulus_table`.  Look inside the `session_data` object or check the databook to see how to find this.
:::
:::{admonition} Hint #2
:class: dropdown
For natural stimuli, you'll want the `stimulus_template`.  Look inside the `cache` object or check the databook.
:::

</div>