# Neuropixels Utilities

This notebook contains code for working with the Neuropixels data, which is conveniently available on our DataHub.

In [1]:
# Import the Neuropixels Cache
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

# We have all of this data on the datahub! This is where it lives.
manifest_path = '/datasets/allen-brain-observatory/visual-coding-neuropixels/ecephys-cache/manifest.json' 

# Create the EcephysProjectCache object
cache = EcephysProjectCache.fixed(manifest=manifest_path)

# Get the sessions available in this dataset
sessions = cache.get_session_table()
print('Total number of sessions: ' + str(len(sessions)))
sessions.head()

Total number of sessions: 58


| id | published_at | specimen_id | session_type | age_in_days | sex | full_genotype | unit_count | channel_count | probe_count | ecephys_structure_acronyms |
|---|---|---|---|---|---|---|---|---|---|---|
| 715093703 | 2019-10-03T00:00:00Z | 699733581 | brain_observatory_1.1 | 118.0 | M | Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt | 884 | 2219 | 6 | [CA1, VISrl, nan, PO, LP, LGd, CA3, DG, VISl, ... |
| 719161530 | 2019-10-03T00:00:00Z | 703279284 | brain_observatory_1.1 | 122.0 | M | Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt | 755 | 2214 | 6 | [TH, Eth, APN, POL, LP, DG, CA1, VISpm, nan, N... |
| 721123822 | 2019-10-03T00:00:00Z | 707296982 | brain_observatory_1.1 | 125.0 | M | Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt | 444 | 2229 | 6 | [MB, SCig, PPT, NOT, DG, CA1, VISam, nan, LP, ... |
| 732592105 | 2019-10-03T00:00:00Z | 717038288 | brain_observatory_1.1 | 100.0 | M | wt/wt | 824 | 1847 | 5 | [grey, VISpm, nan, VISp, VISl, VISal, VISrl] |
| 737581020 | 2019-10-03T00:00:00Z | 718643567 | brain_observatory_1.1 | 108.0 | M | wt/wt | 568 | 2218 | 6 | [grey, VISmma, nan, VISpm, VISp, VISl, VISrl] |


Let's say we only want sessions that include recordings from CA1. We can build the list of matching session IDs as follows.

In [2]:
# Create a session list based on some criteria

session_list = []

for idx,structure_list in enumerate(sessions['ecephys_structure_acronyms']):
    if 'CA1' in structure_list:
        session_list.append(sessions.index[idx])   
        
print('There are '+str(len(session_list))+' sessions that meet this criterion:')
print(session_list)

There are 52 sessions that meet this criterion:
[715093703, 719161530, 721123822, 743475441, 744228101, 746083955, 750332458, 750749662, 751348571, 754312389, 754829445, 755434585, 756029989, 757216464, 757970808, 758798717, 759883607, 760345702, 761418226, 762602078, 763673393, 766640955, 767871931, 768515987, 771160300, 771990200, 773418906, 774875821, 778240327, 778998620, 779839471, 781842082, 786091066, 787025148, 789848216, 791319847, 793224716, 794812542, 797828357, 798911424, 799864342, 816200189, 819186360, 819701982, 821695405, 829720705, 831882777, 835479236, 839068429, 839557629, 840012044, 847657808]
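
The same selection can also be written as a single pandas filter. The sketch below is only an illustration: instead of the real `sessions` table it uses a small stand-in (`sessions_demo`, a hypothetical name) built from the five rows printed above, with the `nan` entries dropped and the structure lists cut off where the printed output is truncated. It also assumes each `ecephys_structure_acronyms` entry is a Python list.

```python
import pandas as pd

# Stand-in for cache.get_session_table(): only the five sessions shown above.
# Lists are cut where the printed output is truncated, and nan entries are omitted.
sessions_demo = pd.DataFrame(
    {
        "ecephys_structure_acronyms": [
            ["CA1", "VISrl", "PO", "LP", "LGd", "CA3", "DG", "VISl"],
            ["TH", "Eth", "APN", "POL", "LP", "DG", "CA1", "VISpm"],
            ["MB", "SCig", "PPT", "NOT", "DG", "CA1", "VISam", "LP"],
            ["grey", "VISpm", "VISp", "VISl", "VISal", "VISrl"],
            ["grey", "VISmma", "VISpm", "VISp", "VISl", "VISrl"],
        ]
    },
    index=pd.Index(
        [715093703, 719161530, 721123822, 732592105, 737581020], name="id"
    ),
)

# Boolean mask: True for every session whose structure list contains 'CA1'.
has_ca1 = sessions_demo["ecephys_structure_acronyms"].apply(lambda s: "CA1" in s)

# Keep the matching rows and read the session IDs off the index.
ca1_session_ids = sessions_demo[has_ca1].index.tolist()
print(ca1_session_ids)
```

With the real table, replacing `sessions_demo` with `sessions` should give the same IDs as the loop above, provided the column holds lists as assumed.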


Now, we can use the session list to get the data we want. Unfortunately, it looks like we can only extract one experiment at a time, so if you want to do this for multiple experiments, you'll need to call the `get_session_data` method in a loop over your entire `session_list`. For example, your workflow might be:

1. Extract one session.
2. Look for units recorded from your brain region of interest in that session.
3. Extract whatever metric you're interested in (e.g., firing rate).
4. Append those values to a list of firing rates.
5. Loop back around to the next session.
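
The steps above can be sketched as a small helper function. This is a rough outline rather than a tested pipeline: `collect_firing_rates` is a hypothetical name, and it assumes the units table exposes the `ecephys_structure_acronym` and `firing_rate` columns that `session.units.columns` reports for this dataset.

```python
def collect_firing_rates(cache, session_ids, region):
    """Gather the firing rates of all units recorded in `region`.

    `cache` is expected to behave like the EcephysProjectCache created
    earlier, i.e. to provide get_session_data(session_id).
    """
    firing_rates = []
    for session_id in session_ids:
        # 1. Extract one session (slow for real data: each session is large).
        session = cache.get_session_data(session_id)
        units = session.units
        # 2. Keep only units recorded from the region of interest.
        in_region = units[units["ecephys_structure_acronym"] == region]
        # 3-4. Pull out the metric and append it to the running list.
        firing_rates.extend(in_region["firing_rate"].tolist())
        # 5. The for-loop then moves on to the next session.
    return firing_rates
```

A call such as `collect_firing_rates(cache, session_list, 'CA1')` would then return one flat list of firing rates across all selected sessions. Expect it to take a long time on the full list, since every session has to be loaded.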

Here, we'll just take the first session as an example.

In [3]:
session = cache.get_session_data(session_list[0])

Much of the interesting, processed data for us exists in the `units` dataframe. This dataframe contains information about a number of things, both in terms of how well the unit was sorted, and what its properties might be:
* **firing rate**: mean spike rate during the entire session
* **presence ratio**: fraction of session when spikes are present
* **ISI violations**: rate of refractory period violations
* **Isolation distances**: distance to nearest cluster in Mahalanobis space
* **d'**: classification accuracy based on LDA
* **SNR**: signal to noise ratio
* **Maximum drift**: Maximum change in spike depth during recording
* **Cumulative drift**: Cumulative change in spike depth during recording
* **1D Waveform features**: For more information on these see [quality metrics](https://github.com/AllenInstitute/ecephys_spike_sorting/tree/master/ecephys_spike_sorting/modules/quality_metrics) and [mean_waveforms](https://github.com/AllenInstitute/ecephys_spike_sorting/tree/master/ecephys_spike_sorting/modules/mean_waveforms)
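
These metrics can be used to keep only well-isolated units. The sketch below shows the general pattern on a tiny stand-in table: the values in `units_demo` are made up for illustration, and the two thresholds are assumptions chosen for the example, not recommendations from this notebook. The column names `isi_violations` and `amplitude_cutoff` are the ones the units dataframe uses.

```python
import pandas as pd

# Made-up values for illustration only; real values come from session.units.
units_demo = pd.DataFrame(
    {
        "isi_violations": [0.02, 0.90, 0.10],
        "amplitude_cutoff": [0.01, 0.05, 0.30],
        "firing_rate": [4.2, 11.0, 0.7],
    },
    index=pd.Index([1, 2, 3], name="unit_id"),
)

# Illustrative thresholds (assumptions): lower is better for both metrics.
MAX_ISI_VIOLATIONS = 0.5
MAX_AMPLITUDE_CUTOFF = 0.1

# Combine both conditions with & to keep units that pass every check.
well_isolated = units_demo[
    (units_demo["isi_violations"] < MAX_ISI_VIOLATIONS)
    & (units_demo["amplitude_cutoff"] < MAX_AMPLITUDE_CUTOFF)
]
print(well_isolated.index.tolist())
```

Only the first unit passes both checks here: the second fails the ISI-violation threshold and the third fails the amplitude-cutoff threshold.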

You can use `session.units.columns` to see everything that this dataframe contains. Here "units" are individually identified neurons.

In [7]:
session.units.columns.sort_values()

Index(['L_ratio', 'amplitude_cutoff', 'anterior_posterior_ccf_coordinate',
       'area_rf', 'azimuth_rf', 'c50_dg', 'channel_local_index', 'cluster_id',
       'cumulative_drift', 'd_prime', 'dorsal_ventral_ccf_coordinate',
       'ecephys_structure_acronym', 'ecephys_structure_id', 'elevation_rf',
       'f1_f0_dg', 'fano_dg', 'fano_fl', 'fano_ns', 'fano_rf', 'fano_sg',
       'firing_rate', 'firing_rate_dg', 'firing_rate_fl', 'firing_rate_ns',
       'firing_rate_rf', 'firing_rate_sg', 'g_dsi_dg', 'g_osi_dg', 'g_osi_sg',
       'image_selectivity_ns', 'isi_violations', 'isolation_distance',
       'left_right_ccf_coordinate', 'lifetime_sparseness_dg',
       'lifetime_sparseness_fl', 'lifetime_sparseness_ns',
       'lifetime_sparseness_rf', 'lifetime_sparseness_sg', 'local_index_unit',
       'location', 'max_drift', 'mod_idx_dg', 'nn_hit_rate', 'nn_miss_rate',
       'on_off_ratio_fl', 'p_value_rf', 'peak_channel_id',
       'pref_image_multi_ns', 'pref_image_ns', 'pref_ori_dg',
 

In [5]:
# See the units dataframe for this session; this takes a few minutes to load.
session.units.head()

| unit_id | amplitude_cutoff | max_drift | d_prime | waveform_halfwidth | waveform_velocity_above | cluster_id | local_index_unit | nn_miss_rate | silhouette_score | isolation_distance | ... | ecephys_structure_id | ecephys_structure_acronym | anterior_posterior_ccf_coordinate | dorsal_ventral_ccf_coordinate | left_right_ccf_coordinate | probe_description | location | probe_sampling_rate | probe_lfp_sampling_rate | probe_has_lfp_data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 950910352 | 0.0577 | 34.38 | 4.576155 | 0.096147 | -0.137353 | 6 | 6 | 0.008277 | 0.080869 | 69.455405 | ... | 215.0 | APN | 8157 | 3521 | 6697 | probeA | | 29999.954846 | 1249.998119 | True |
| 950910364 | 0.065649 | 23.43 | 5.602703 | 0.274707 | -0.61809 | 7 | 7 | 0.002786 | 0.153496 | 102.847616 | ... | 215.0 | APN | 8154 | 3513 | 6698 | probeA | | 29999.954846 | 1249.998119 | True |
| 950910371 | 0.015509 | 57.44 | 5.061817 | 0.164824 | 0.274707 | 8 | 8 | 0.007975 | 0.089229 | 76.90761 | ... | 215.0 | APN | 8146 | 3487 | 6701 | probeA | | 29999.954846 | 1249.998119 | True |
| 950910392 | 0.025891 | 33.65 | 4.219074 | 0.109883 | 0.0 | 11 | 11 | 0.002874 | 0.139601 | 65.671206 | ... | 215.0 | APN | 8133 | 3444 | 6707 | probeA | | 29999.954846 | 1249.998119 | True |
| 950910435 | 0.010061 | 27.84 | 6.393051 | 0.123618 | -0.068677 | 17 | 17 | 0.016699 | 0.146494 | 294.002222 | ... | 215.0 | APN | 8110 | 3367 | 6719 | probeA | | 29999.954846 | 1249.998119 | True |


From here, you can subset for neurons in your region of interest (in the `ecephys_structure_acronym` column). For example, `session.units[session.units.ecephys_structure_acronym == 'VISp']`.
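
As a small illustration of subsetting by region, the sketch below filters a stand-in units table and summarizes firing rate per region. The rows in `units_demo` are made up for the example (they are not values from this session); only the column names `ecephys_structure_acronym` and `firing_rate` are taken from the units dataframe.

```python
import pandas as pd

# Made-up stand-in for session.units with just the two columns we need.
units_demo = pd.DataFrame(
    {
        "ecephys_structure_acronym": ["APN", "VISp", "CA1", "VISp"],
        "firing_rate": [5.0, 2.0, 8.0, 4.0],
    },
    index=pd.Index([101, 102, 103, 104], name="unit_id"),
)

# Subset: keep only the units recorded in primary visual cortex (VISp).
visp_units = units_demo[units_demo["ecephys_structure_acronym"] == "VISp"]
print(len(visp_units))

# Summary: mean firing rate for every region in the table.
mean_rate_by_region = (
    units_demo.groupby("ecephys_structure_acronym")["firing_rate"].mean()
)
print(mean_rate_by_region)
```

The same two lines should work on the real table by swapping `units_demo` for `session.units`.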

## About this notebook

The code here is taken from the [SWDB_2019 notebook](https://github.com/AllenInstitute/SWDB_2019/blob/master/DynamicBrain/Neuropixels_walkthrough.ipynb). Check out that notebook as well as [this one](https://allensdk.readthedocs.io/en/latest/_static/examples/nb/ecephys_session.html#Running-speed) for more examples of what you can do with this data.