<a href="https://colab.research.google.com/github/IanQS/neuromatch_project/blob/main/steinmetz_modeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modeling of the Steinmetz dataset

- uses [Neuromatch Load Steinmetz Decisions](https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/main/projects/neurons/load_steinmetz_decisions.ipynb#scrollTo=DJ-jzsE5eLxX) as a base

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore
from sklearn.decomposition import PCA
import concurrent.futures
from multiprocessing import Pool

!pip install -q ipython-autotime
%load_ext autotime

time: 320 µs (started: 2023-07-20 00:20:28 +00:00)


In [2]:
# @title Data Downloading And Stacking
import os, requests

fname = []
for j in range(3):
  fname.append('steinmetz_part%d.npz'%j)
url = ["https://osf.io/agvxh/download"]
url.append("https://osf.io/uv3mw/download")
url.append("https://osf.io/ehmw2/download")

for j in range(len(url)):
  if not os.path.isfile(fname[j]):
    try:
      r = requests.get(url[j])
    except requests.ConnectionError:
      print("!!! Failed to download data !!!")
    else:
      if r.status_code != requests.codes.ok:
        print("!!! Failed to download data !!!")
      else:
        with open(fname[j], "wb") as fid:
          fid.write(r.content)

all_ds = np.array([])
for j in range(len(fname)):
  all_ds = np.hstack((all_ds,
                      np.load('steinmetz_part%d.npz'%j,
                              allow_pickle=True)['dat']))

time: 49.9 s (started: 2023-07-20 00:20:28 +00:00)


# Dataset Description

(taken and modified from the Neuromatch Load Steinmetz Decisions notebook)

## High-level

`all_ds` contains 39 sessions from 10 mice, data from Steinmetz et al, 2019. Time bins for all measurements are 10ms, starting 500ms before stimulus onset. The mouse had to determine which side has the highest contrast. For each `curr_ds = all_ds[k]`, you have the fields below. For extra variables, check out the extra notebook and extra data files (lfp, waveforms and exact spike times, non-binned).

## Fields present (not all are used)

* `curr_ds['mouse_name']`: mouse name
* `curr_ds['date_exp']`: when a session was performed
* `curr_ds['spks']`: neurons by trials by time bins.    
* `curr_ds['brain_area']`: brain area for each neuron recorded.
* `curr_ds['ccf']`: Allen Institute brain atlas coordinates for each neuron.
* `curr_ds['ccf_axes']`: axes names for the Allen CCF.
* `curr_ds['contrast_right']`: contrast level for the right stimulus, which is always contralateral to the recorded brain areas.
* `curr_ds['contrast_left']`: contrast level for left stimulus.
* `curr_ds['gocue']`: when the go cue sound was played.
* `curr_ds['response_time']`: when the response was registered, which has to be after the go cue. The mouse can turn the wheel before the go cue (and nearly always does!), but the stimulus on the screen won't move before the go cue.  
* `curr_ds['response']`: which side the response was (`-1`, `0`, `1`). When the right-side stimulus had higher contrast, the correct choice was `-1`. `0` is a no go response.
* `curr_ds['feedback_time']`: when feedback was provided.
* `curr_ds['feedback_type']`: if the feedback was positive (`+1`, reward) or negative (`-1`, white noise burst).  
* `curr_ds['wheel']`: turning speed of the wheel that the mice uses to make a response, sampled at `10ms`.
* `curr_ds['pupil']`: pupil area  (noisy, because pupil is very small) + pupil horizontal and vertical position.
* `curr_ds['face']`: average face motion energy from a video camera.
* `curr_ds['licks']`: lick detections, 0 or 1.   
* `curr_ds['trough_to_peak']`: measures the width of the action potential waveform for each neuron. Widths `<=10` samples are "putative fast spiking neurons".
* `curr_ds['%X%_passive']`: same as above for `X` = {`spks`, `pupil`, `wheel`, `contrast_left`, `contrast_right`} but for  passive trials at the end of the recording when the mouse was no longer engaged and stopped making responses.
* `curr_ds['prev_reward']`: time of the feedback (reward/white noise) on the previous trial in relation to the current stimulus time.
* `curr_ds['reaction_time']`: ntrials by 2. First column: reaction time computed from the wheel movement as the first sample above `5` ticks/10ms bin. Second column: direction of the wheel movement (`0` = no move detected).  


The original dataset is here: https://figshare.com/articles/dataset/Dataset_from_Steinmetz_et_al_2019/9598406

In [3]:
regions = ["vis ctx", "thal", "hipp", "other ctx", "midbrain", "basal ganglia", "cortical subplate", "other"]
region_colors = ['blue', 'red', 'green', 'darkblue', 'violet', 'lightblue', 'orange', 'gray']
brain_groups = [["VISa", "VISam", "VISl", "VISp", "VISpm", "VISrl"],  # visual cortex
                ["CL", "LD", "LGd", "LH", "LP", "MD", "MG", "PO", "POL", "PT", "RT", "SPF", "TH", "VAL", "VPL", "VPM"], # thalamus
                ["CA", "CA1", "CA2", "CA3", "DG", "SUB", "POST"],  # hippocampal
                ["ACA", "AUD", "COA", "DP", "ILA", "MOp", "MOs", "OLF", "ORB", "ORBm", "PIR", "PL", "SSp", "SSs", "RSP","TT"],  # non-visual cortex
                ["APN", "IC", "MB", "MRN", "NB", "PAG", "RN", "SCs", "SCm", "SCig", "SCsg", "ZI"],  # midbrain
                ["ACB", "CP", "GPe", "LS", "LSc", "LSr", "MS", "OT", "SNr", "SI"],  # basal ganglia
                ["BLA", "BMA", "EP", "EPd", "MEA"]  # cortical subplate
                ]

# Assign each area an index
area_to_index = dict(root=0)
counter = 1
for group in brain_groups:
    for area in group:
        area_to_index[area] = counter
        counter += 1

# Figure out which areas are in each dataset
areas_by_dataset = np.zeros((counter, len(all_ds)), dtype=bool)
for j, d in enumerate(all_ds):
    for area in np.unique(d['brain_area']):
        i = area_to_index[area]
        areas_by_dataset[i, j] = True


time: 5.69 ms (started: 2023-07-20 00:21:18 +00:00)


In [4]:
DATASET_IDX = 11
curr_ds = all_ds[DATASET_IDX]

dt = curr_ds["bin_size"]
NUM_NEURONS_RECORDED = curr_ds["spks"].shape[0]
NUM_TRIALS = curr_ds["spks"].shape[1]
NUM_BINNED_TIMES = curr_ds["spks"].shape[2]

if DATASET_IDX != 11:
    raise Exception("Code is only meant for DATASET_IDX=11")
else:
    NUM_REGIONS = 4
    NUM_NEURONS_RECORDED = len(curr_ds["brain_area"])  # The string idx version of

brain_subregions = NUM_REGIONS * np.ones(NUM_NEURONS_RECORDED, )  # last one is "other"
for j in range(NUM_REGIONS):
  brain_subregions[
      np.isin(curr_ds['brain_area'], brain_groups[j])
      ] = j  # assign a number to each region


time: 2.6 ms (started: 2023-07-20 00:21:18 +00:00)


# Creating the dataset

1) Create the labels

2) Create a dataset dictionary where the keys are brain regions and the values are all the neuron readings that are in that region

3) Create a dataset dictionary where the keys are brain areas (sub-regions) and the values are all the neuron readings that are in that area/sub-region

In [6]:
LABELS = curr_ds["response"]  # RIGHT - NO_GO - LEFT (-1, 0, 1)
y = LABELS

time: 760 µs (started: 2023-07-20 00:23:20 +00:00)


In [9]:
def log_shapes(ds):
    _ds = ds['spks']
    print(f"All spikes shape: {_ds.shape}")
    _ds_brain_region = _ds[brain_subregions == 0]
    print(f"\t- Spike shape for sample brain region (0-th): {_ds_brain_region.shape}")

    _ds_0th_left_response = _ds_brain_region[:, y >= 0]
    print(f"\t- Spike shape for sample brain region (0-th) left responses: {_ds_0th_left_response.shape}")

    averaged_over_left_response = _ds_0th_left_response.mean(axis=(0, 1))
    print(f"\t- Averaged brain region (0-th) left responses: {averaged_over_left_response.shape}")

log_shapes(curr_ds)


All spikes shape: (698, 340, 250)
	- Spike shape for sample brain region (0-th): (145, 340, 250)
	- Spike shape for sample brain region (0-th) left responses: (145, 199, 250)
	- Averaged brain region (0-th) left responses: (250,)
time: 60.2 ms (started: 2023-07-20 00:23:35 +00:00)


In [29]:
# def dataset_by_brain_region(num_regions, ds):
#     spike_partitioned = {}  # brain region to spike mapping
#     for i in range(num_regions):
#         spks = ds["spks"][brain_subregions == j]
#         spikes_for_left_response = spks[:, y < 0]
#         spikes_for_right_response = spks[:, y > 0]
#         spikes_for_no_response = spks[:, y == 0]

#         spike_partitioned[regions[i]] = [
#             spikes_for_left_response,
#             spikes_for_no_response,
#             spikes_for_right_response
#         ]
#     return spike_partitioned

# brain_region_data_dict = dataset_by_brain_region(NUM_REGIONS + 1, curr_ds)


def dataset_by_subregion(arr_of_subregions, ds):
    spike_partitioned = {}  # brain region to spike mapping
    unique_subregions = set(arr_of_subregions)
    for subregion in unique_subregions:
        subregion_idxs = arr_of_subregions == subregion
        subregion_data = ds["spks"][subregion_idxs]

        spikes_for_left_response = subregion_data[:, y > 0]
        spikes_for_right_response = subregion_data[:, y > 0]
        spikes_for_no_response = subregion_data[:, y == 0]

        spike_partitioned[subregion] = [
            spikes_for_left_response,
            spikes_for_no_response,
            spikes_for_right_response
        ]
    return spike_partitioned

subregion_data_dict = dataset_by_subregion(curr_ds["brain_area"], curr_ds)

time: 81.8 ms (started: 2023-07-20 00:55:20 +00:00)


In [26]:

"""
["VISa", "VISam", "VISl", "VISp", "VISpm", "VISrl"],  # visual cortex
["CL", "LD", "LGd", "LH", "LP", "MD", "MG", "PO", "POL", "PT", "RT", "SPF", "TH", "VAL", "VPL", "VPM"], # thalamus
["CA", "CA1", "CA2", "CA3", "DG", "SUB", "POST"],  # hippocampal
["ACA", "AUD", "COA", "DP", "ILA", "MOp", "MOs", "OLF", "ORB", "ORBm", "PIR", "PL", "SSp", "SSs", "RSP","TT"],  # non-visual cortex
["APN", "IC", "MB", "MRN", "NB", "PAG", "RN", "SCs", "SCm", "SCig", "SCsg", "ZI"],  # midbrain
["ACB", "CP", "GPe", "LS", "LSc", "LSr", "MS", "OT", "SNr", "SI"],  # basal ganglia
["BLA", "BMA", "EP", "EPd", "MEA"]  # cortical subplate
"""

for k, v in subregion_data_dict.items():
    print(k, v[0].shape, v[1].shape, v[2].shape)


SUB (105, 141, 250) (105, 64, 250) (105, 135, 250)
VISam (79, 141, 250) (79, 64, 250) (79, 135, 250)
LH (18, 141, 250) (18, 64, 250) (18, 135, 250)
CA1 (50, 141, 250) (50, 64, 250) (50, 135, 250)
MD (126, 141, 250) (126, 64, 250) (126, 135, 250)
PL (56, 141, 250) (56, 64, 250) (56, 135, 250)
ACA (16, 141, 250) (16, 64, 250) (16, 135, 250)
VISp (66, 141, 250) (66, 64, 250) (66, 135, 250)
LGd (11, 141, 250) (11, 64, 250) (11, 135, 250)
root (100, 141, 250) (100, 64, 250) (100, 135, 250)
DG (65, 141, 250) (65, 64, 250) (65, 135, 250)
MOs (6, 141, 250) (6, 64, 250) (6, 135, 250)
time: 11.9 ms (started: 2023-07-20 00:45:54 +00:00)


In [28]:
for k, v in brain_region_data_dict.items():
    print(k, v[0].shape, v[1].shape, v[2].shape)

vis ctx (78, 141, 250) (78, 64, 250) (78, 135, 250)
thal (78, 141, 250) (78, 64, 250) (78, 135, 250)
hipp (78, 141, 250) (78, 64, 250) (78, 135, 250)
other ctx (78, 141, 250) (78, 64, 250) (78, 135, 250)
midbrain (78, 141, 250) (78, 64, 250) (78, 135, 250)
time: 1.1 ms (started: 2023-07-20 00:46:24 +00:00)
