<a href="https://colab.research.google.com/github/IanQS/neuromatch_project/blob/main/NMA_2023_IBL_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Download the external data sources

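Download the spike-sorting output (spike cluster assignments and spike times) for one probe from the IBL public server, fetching the files in parallel.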

In [18]:
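# Write the file URLs to srcs.txt (">" on the first line so re-running the cell does not duplicate entries)
!echo "https://ibl.flatironinstitute.org/public/mainenlab/Subjects/ZFM-01576/2020-12-01/001/alf/probe00/pykilosort/spikes.clusters.9f648cc5-9574-410a-9f5c-717cb5e1f7b8.npy" > srcs.txt
!echo "https://ibl.flatironinstitute.org/public/mainenlab/Subjects/ZFM-01576/2020-12-01/001/alf/probe00/pykilosort/spikes.times.cbe0311b-075a-4f3d-a825-ebf5666990a4.npy" >> srcs.txt

# Download in parallel: -n 1 passes one URL per wget call, -P 2 runs up to two wget processes at once
!cat srcs.txt | xargs -n 1 -P 2 wget -q

# Give the downloaded files shorter names
!mv "spikes.times.cbe0311b-075a-4f3d-a825-ebf5666990a4.npy" spike_times.npy
!mv "spikes.clusters.9f648cc5-9574-410a-9f5c-717cb5e1f7b8.npy" spike_clusters.npy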

# Setup

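First install the packages we need: `nma-ibl`, the IBL DataJoint pipeline curated by NMA to contain only the behavioral data, plus `ONE-api` and `ibllib` for the IBL ephys data. We then import the pipeline's database modules.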

In [19]:
# install IBL pipeline package to access and navigate the pipeline
!pip install --quiet nma-ibl ONE-api ibllib

Configure DataJoint to connect to the public IBL database used by NMA:

In [20]:
import datajoint as dj
dj.config['database.host'] = 'datajoint-public.internationalbrainlab.org'
dj.config['database.user'] = 'ibl-public'
dj.config['database.password'] = 'ibl-public'

from nma_ibl import reference, subject, action, acquisition, data, behavior, behavior_analyses

In [21]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
import tqdm

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


# Silence ibllib's logging output (only critical messages are shown)
import logging
logger = logging.getLogger('ibllib')
logger.setLevel(logging.CRITICAL)


# We combine two data sources: the NMA-curated IBL behavioral data (loaded through
# DataJoint above) and the spike-sorted IBL ephys data (loaded through ONE below).
# Imports for the IBL ephys data:
from one.api import ONE
from brainbox.io.one import SpikeSortingLoader
# Newer releases move the atlas into the separate iblatlas package:
# from iblatlas.atlas import AllenAtlas
from ibllib.atlas import AllenAtlas

# Spike Dataset Construction

- Bins the spike times into fixed-width time bins and groups consecutive bins into non-overlapping windows of spike counts. We can later align these windows with the subject's responses.

In [None]:
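import numpy as np

# Spike times (in seconds) of every spike recorded on this probe.
# Named `spike_times` rather than `data` so it does not shadow the `data` module imported from nma_ibl.
spike_times = np.load('spike_times.npy')
print(spike_times.shape)


def construct_window(spike_times: np.ndarray, time_interval=0.1, window_length=25, dataset_size=10, start_time=0.0):
    """Bin spike times and group the bins into consecutive, non-overlapping windows.

    Returns an int array of shape (dataset_size, window_length); entry [i, j] is the
    number of spikes in the half-open bin that starts at
    start_time + (i * window_length + j) * time_interval and has width time_interval.
    """
    n_bins = window_length * dataset_size
    # Bin edges computed by multiplication, which avoids the drift of repeatedly adding time_interval
    edges = start_time + time_interval * np.arange(n_bins + 1)
    # On sorted times, searchsorted(side='left') gives the number of spikes strictly before
    # each edge, so consecutive differences are the counts in each half-open bin.
    sorted_times = np.sort(spike_times)
    counts = np.diff(np.searchsorted(sorted_times, edges, side='left'))
    return counts.reshape(dataset_size, window_length)


windows = construct_window(
    spike_times, time_interval=0.001, window_length=10, dataset_size=10
)
windows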
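`construct_window` produces consecutive, non-overlapping windows. If overlapping (sliding) windows are wanted instead, one option is to bin once and then take strided views over the count vector. This is a minimal sketch assuming NumPy >= 1.20 (for `sliding_window_view`); `bin_spike_counts` and `sliding_windows` are hypothetical helper names, and the toy spike times are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def bin_spike_counts(spike_times, time_interval, n_bins, start_time=0.0):
    """Hypothetical helper: spike counts in n_bins consecutive half-open bins."""
    edges = start_time + time_interval * np.arange(n_bins + 1)
    sorted_times = np.sort(spike_times)
    return np.diff(np.searchsorted(sorted_times, edges, side='left'))


def sliding_windows(counts, window_length, step=1):
    """Hypothetical helper: overlapping windows over a 1-D count vector.

    Returns shape (n_windows, window_length); step=window_length gives non-overlapping windows.
    """
    return sliding_window_view(counts, window_length)[::step]


# Toy spike times in seconds (made up for illustration)
toy_times = np.array([0.0005, 0.0012, 0.0015, 0.0031, 0.0047])
counts = bin_spike_counts(toy_times, time_interval=0.001, n_bins=5)
print(counts)   # [1 2 0 1 1]
windows = sliding_windows(counts, window_length=3)
print(windows)  # rows [1 2 0], [2 0 1], [0 1 1]
```

Because `sliding_window_view` returns a read-only view, the windows cost no extra memory until they are copied (for example, when passed to a model).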

Now we set up the interface to the IBL ephys database: first we authenticate with the public password, then define a list of **probe IDs** (`pids`), look up the session ID (`eid`) for one **pid** with `one.pid2eid`, and load its spike-sorting output with `SpikeSortingLoader()`.

In [24]:
# Connect to the public IBL database (OpenAlyx) using the public password
ONE.setup(base_url='https://openalyx.internationalbrainlab.org', silent=True)
one = ONE(password='international')
ba = AllenAtlas()  # Allen brain atlas, used to map channel locations to brain regions

# Probe insertion IDs (pids) available for analysis
pids = [
    '1a276285-8b0e-4cc9-9f0a-a3a002978724',
    '1e104bf4-7a24-4624-a5b2-c2c8289c0de7',
    '5d570bf6-a4c6-4bf1-a14b-2c878c84ef0e',
    '5f7766ce-8e2e-410c-9195-6bf089fea4fd',
    '6638cfb3-3831-4fc2-9327-194b76cf22e1',
    '749cb2b7-e57e-4453-a794-f6230e4d0226',
    'd7ec0892-0a6c-4f4f-9d8f-72083692af5c',
    'da8dfec1-d265-44e8-84ce-6ae9c109b8bd',
    'dab512bd-a02d-4c1f-8dbc-9155a163efc0',
    'dc7e9403-19f7-409f-9240-05ee57cb7aea',
    'e8f9fba4-d151-4b00-bee7-447f0f3e752c',
    'eebcaf65-7fa4-4118-869d-a084e84530e2',
    'fe380793-8035-414e-b000-09bfe5ece92a',
]
pid = pids[0]
# Session ID (eid) and probe name for this insertion
eid, name = one.pid2eid(pid)

# Load spikes, clusters and channels for this probe, then merge the
# channel information into the clusters table
sl = SpikeSortingLoader(pid=pid, one=one, atlas=ba)
spikes, clusters, channels = sl.load_spike_sorting()
clusters = sl.merge_clusters(spikes, clusters, channels)
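A quick sanity check on the loaded spike sorting is each cluster's mean firing rate over the recording. This is a minimal sketch assuming the `spikes` object exposes `times` (seconds) and `clusters` (non-negative integer cluster IDs) as in IBL's spike-sorting output; `mean_firing_rates` is a hypothetical helper, and the toy arrays below are made up.

```python
import numpy as np


def mean_firing_rates(spike_times, spike_clusters, n_clusters=None):
    """Hypothetical helper: mean firing rate (Hz) of each cluster.

    The recording duration is approximated by the span from the first to the
    last spike, so rates are slightly overestimated if recording extends beyond it.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    duration = spike_times.max() - spike_times.min()
    # bincount[k] = number of spikes assigned to cluster k
    counts = np.bincount(np.asarray(spike_clusters), minlength=n_clusters or 0)
    return counts / duration


# Toy data: 5 spikes over 4 s from 3 clusters
rates = mean_firing_rates(np.array([0.0, 1.0, 2.0, 3.0, 4.0]), np.array([0, 0, 1, 2, 0]))
print(rates)  # [0.75 0.25 0.25]
```

On the real data this would be called as `mean_firing_rates(spikes['times'], spikes['clusters'])`.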

After loading the datasets, we should extract the relevant fields into a structure that is easier to work with in NumPy and scikit-learn.

The W1D3 notebook loads the Steinmetz data as a dictionary with two entries: `spikes`, an array of normalized spike rates with shape (n_trials, n_neurons), and `choices`, a vector of 0s and 1s with length n_trials indicating the animal's behavioural response.
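A minimal sketch of building the same `spikes`/`choices` layout from spike times, assuming we have one alignment time per trial (for example, stimulus onset) and choices already coded as 0/1. The window, the helper `trial_rate_matrix`, the dictionary name `steinmetz_style`, and the toy arrays are all hypothetical; the rates are raw spikes/s, not normalized as in W1D3.

```python
import numpy as np


def trial_rate_matrix(spike_times, spike_clusters, event_times, window=(0.0, 0.1), cluster_ids=None):
    """Hypothetical helper: firing rate (spikes/s) of each cluster around each event.

    Counts spikes in [event + window[0], event + window[1]) and returns an
    array of shape (n_trials, n_neurons).
    """
    spike_times = np.asarray(spike_times, dtype=float)
    spike_clusters = np.asarray(spike_clusters)
    event_times = np.asarray(event_times, dtype=float)
    if cluster_ids is None:
        cluster_ids = np.unique(spike_clusters)
    duration = window[1] - window[0]
    rates = np.zeros((len(event_times), len(cluster_ids)))
    for j, cid in enumerate(cluster_ids):
        t = np.sort(spike_times[spike_clusters == cid])  # this cluster's spike times
        start = np.searchsorted(t, event_times + window[0], side='left')
        stop = np.searchsorted(t, event_times + window[1], side='left')
        rates[:, j] = (stop - start) / duration
    return rates


# Toy data (made up): 6 spikes from 2 clusters, 2 trials
toy_times = np.array([0.05, 0.12, 0.15, 1.02, 1.08, 1.5])
toy_clusters = np.array([0, 1, 0, 0, 1, 1])
toy_events = np.array([0.0, 1.0])   # e.g. stimulus-onset times
toy_choices = np.array([0, 1])      # assumed already coded as 0/1

steinmetz_style = {
    'spikes': trial_rate_matrix(toy_times, toy_clusters, toy_events),
    'choices': toy_choices,
}
print(steinmetz_style['spikes'])  # [[10, 0], [10, 10]] spikes/s
```

With the IBL data, the first two arguments would be `spikes['times']` and `spikes['clusters']`; using a separate dictionary name avoids overwriting the `spikes` object loaded above.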



In [25]:
# TODO: reshape the data into something easy to work with; a dictionary of arrays (as in W1D3) works

Then train a model. Instead of reporting plain training accuracy, we can use k-fold cross-validation to estimate how well the model generalizes to held-out trials.

In [26]:
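# Here `spikes` must be the (n_trials, n_neurons) rate matrix, not the IBL `spikes` object loaded above.
# penalty=None requires scikit-learn >= 1.2 (older versions use penalty='none').
# accuracies = cross_val_score(LogisticRegression(penalty=None), spikes, choices, cv=8)  # 8-fold cross-validation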
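Until the real rate matrix and choices exist, the cross-validation call can be exercised on synthetic data. This sketch uses made-up Poisson "rates" and a choice that depends only on the first neuron, so there is signal to decode; the array names are hypothetical. With an integer `cv`, scikit-learn uses stratified k-fold for classifiers, and `cross_val_score` returns one accuracy per fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 20
# Synthetic stand-ins: spike-count "rates" (n_trials, n_neurons) and 0/1 choices
X = rng.poisson(5.0, size=(n_trials, n_neurons)).astype(float)
choices = (X[:, 0] > 5).astype(int)  # choice driven by the first neuron only

model = LogisticRegression(max_iter=1000)
accuracies = cross_val_score(model, X, choices, cv=8)  # 8-fold cross-validation
print(accuracies.shape)   # (8,)
print(accuracies.mean())  # average held-out accuracy
```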

If each neuron is a feature, we will likely have far more features than samples (trials), which makes an unregularized model prone to overfitting.
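One standard remedy is L2 regularization, with its strength chosen by cross-validation. This is a minimal sketch on synthetic data with many more features than trials (only the first five "neurons" carry signal); the data, the grid of `C` values, and the variable names are illustrative assumptions. In scikit-learn's `LogisticRegression`, a smaller `C` means a stronger L2 penalty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_neurons = 60, 500  # far more features than samples
X = rng.normal(size=(n_trials, n_neurons))
# Only the first 5 "neurons" carry information about the choice
y = (X[:, :5].sum(axis=1) > 0).astype(int)

# Mean 5-fold accuracy for each regularization strength (default penalty is L2)
mean_acc = {}
for C in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=5000)
    mean_acc[C] = cross_val_score(model, X, y, cv=5).mean()

best_C = max(mean_acc, key=mean_acc.get)
print(mean_acc)
print(best_C)
```

Choosing `C` with the same cross-validated score that is then reported makes that score optimistic; nested cross-validation (for example, `LogisticRegressionCV` inside an outer `cross_val_score`) avoids this.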