# Pluralistic Approach to Decoding Motor Activity from Different Regions in the Rodent Brain
##### Project Contributors: Narotam Singh, Vaibhav, Rishika Mohanta, Prakriti Nayak

##### Done as part of [Neuromatch Academy](https://github.com/NeuromatchAcademy/course-content) July 13-31 2020

## Population Spike Code Approach Pipeline

The original data is from Steinmetz, Nicholas A., et al. "Distributed coding of choice, action and engagement across the mouse brain." Nature 576.7786 (2019): 266-273. We further cleaned it to keep only recordings from motor-related areas with more than 50 neurons from at least 2 mice. We considered only the open-loop condition, i.e. data between stimulus onset and the go cue, so that representations of the moving stimulus do not appear in the neural data we analyse.
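
As a rough illustration of the open-loop window, the sketch below converts stimulus-onset and go-cue times into bin indices and slices a spike array accordingly. The helper name `open_loop_bins` and all numeric values are hypothetical placeholders, not values read from the dataset; the go cue is assumed to be measured relative to stimulus onset, as in the training code further down.

```python
import numpy as np

def open_loop_bins(stim_onset_s, gocue_s, bin_size_s):
    """Return (start, end) bin indices from stimulus onset to go cue.

    Hypothetical helper: gocue_s is assumed to be relative to stimulus onset.
    """
    start = int(round(stim_onset_s / bin_size_s))
    end = int(round((stim_onset_s + gocue_s) / bin_size_s))
    return start, end

# Placeholder timings (assumed for illustration only)
start, end = open_loop_bins(stim_onset_s=0.5, gocue_s=0.8, bin_size_s=0.01)

# Toy spike array with shape (neurons, time bins)
toy_spikes = np.zeros((3, 250))
open_loop_spikes = toy_spikes[:, start:end]
print(start, end, open_loop_spikes.shape)
```

This sketch rounds to the nearest bin, whereas the training code truncates with `int(...)`; the two can differ by one bin when a time is not an exact multiple of the bin size.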

We implemented a General Linear Model with Spiking History pipeline in ``python 3.6`` to decode the motor output (``wheel``) from the spike data of 50 randomly sampled neurons and 100 randomly sampled trials (80 for training, 20 for testing) for each (Session, Brain Area) pair. We implemented this using the cross-validated ridge regressor (``RidgeCV``) in the ``scikit-learn`` package. The length of the temporal kernel, i.e. the spiking history a decoding model needs to decode optimally, can vary from region to region. So we evaluate kernel sizes between 50 and 250 ms and choose the optimal kernel size for the analysis.
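
The idea of scanning kernel sizes can be sketched on synthetic data. The example below is a minimal illustration, not the pipeline used in this notebook: the helper `lagged_design`, the simulated Poisson spikes, the 8-bin ground-truth kernel, and the `alphas` grid are all assumptions made for the demonstration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def lagged_design(spikes, d):
    """Stack the current bin and the d-1 preceding bins of every neuron.

    spikes: (n_neurons, n_bins) array. Returns (n_bins, n_neurons * d).
    """
    n_neurons, n_bins = spikes.shape
    padded = np.hstack([np.zeros((n_neurons, d - 1)), spikes])
    return np.stack([padded[:, t:t + d].ravel() for t in range(n_bins)])

rng = np.random.default_rng(0)
spikes = rng.poisson(1.0, size=(5, 400)).astype(float)  # synthetic spike counts

# Hypothetical ground truth: the output depends on 8 bins of history
true_kernel = rng.normal(size=(5, 8))
wheel = lagged_design(spikes, 8) @ true_kernel.ravel() + 0.1 * rng.normal(size=400)

# Fit on the first 300 bins, score (R^2) on the last 100, for several kernel sizes
scores = {}
for d in (2, 4, 8, 12):
    X = lagged_design(spikes, d)
    model = RidgeCV(alphas=np.logspace(-3, 3, 7)).fit(X[:300], wheel[:300])
    scores[d] = model.score(X[300:], wheel[300:])

best_d = max(scores, key=scores.get)
print(scores, best_d)
```

Kernels shorter than the true history miss part of the signal, so their held-out score is lower; kernels longer than needed add parameters without adding information.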

### Import packages

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression,SGDRegressor,RidgeCV
from sklearn.linear_model import BayesianRidge
import matplotlib.pyplot as plt
from scipy.stats import linregress
from sklearn.metrics import r2_score
import pandas as pd
from numpy.random import default_rng

### Import Data

In [None]:
alldata = np.load('../cleaned_dataset/train.npz', allow_pickle=True)['arr_0']

### Defining Functions for Random Sampling, Temporal History Transformation, and Training

In [None]:
## Function for Random Sampling of neurons and trials with train-test split
def sampler(data, N = 50, Train_N = 80, Test_N = 20, brain_region = None):  ## Samples for each Session
  rng = default_rng()
  # Draw Train_N + Test_N distinct trials, then split them without overlap
  trial_index = rng.choice(data['spks'].shape[1], size=Train_N + Test_N, replace=False)
  if brain_region is not None:
    # Indices into the full 'spks' array, so they stay valid in get_relevant_trial
    candidates = np.flatnonzero(data['brain_area'] == brain_region)
  else:
    candidates = np.arange(data['spks'].shape[0])
  neuron_index = rng.choice(candidates, size=N, replace=False)
  train_trial_index = trial_index[:Train_N]
  test_trial_index = trial_index[Train_N:Train_N + Test_N]  # all Test_N held-out trials
  return neuron_index, train_trial_index, test_trial_index

## Function for returning the relevant Trial Data
def get_relevant_trial(data, neuron_idx, idx, start, end):
    motion = data["wheel"][0][idx]
    spikes = data["spks"][neuron_idx, idx,:]
    return spikes[:,start:end], motion[start:end]

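## Function for Transforming Spiking Timeseries to Temporal History Data with d*10 ms kernels
def designmatrix(X,d=20):
   # X has shape (neurons, time bins); left-pad with d-1 zero bins so every bin has a full history
   padded_stim = np.column_stack((np.zeros((X.shape[0],d-1)),X))
   X_design = np.zeros((X.shape[1],d*X.shape[0]))
   for t in range(X.shape[1]):
     # Row t holds, for each neuron, the d bins ending at bin t
     X_design[t] = padded_stim[:,t:t+d].flatten()
   # Prepend a constant column for the offset term
   return np.column_stack((np.ones(X.shape[1]),X_design))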

## Function for training given session data
def train_loop(data, neuron_index, train_trial_index, test_trial_index, d = 10, select_motion = True):
  start = int(data["stim_onset"]/data["bin_size"])
  model = RidgeCV()#SGDRegressor(penalty='l2')
  
  X_train,y_train = [],[]
  for idx in train_trial_index:
    end = int((data["stim_onset"]+data["gocue"][idx])/data["bin_size"])
    X_train_part, y_train_part = get_relevant_trial(data, neuron_index, idx, start, end)
    X_train_part = designmatrix(X_train_part,d = d)
    X_train.append(X_train_part)
    y_train.append(y_train_part)
  
  X_train = np.concatenate(X_train)
  y_train = np.concatenate(y_train)  
  model.fit(X_train,y_train)
  y_pred = model.predict(X_train)
  r2_value_train = r2_score(y_train, y_pred)
  rvalue_train = linregress(y_train,y_pred).rvalue

  X_test,y_test = [],[]
  for idx in test_trial_index:
    end = int((data["stim_onset"]+data["gocue"][idx])/data["bin_size"])
    X_test_part, y_test_part = get_relevant_trial(data, neuron_index, idx, start, end)
    X_test_part = designmatrix(X_test_part,d = d)  
    X_test.append(X_test_part)
    y_test.append(y_test_part)
    
  X_test = np.concatenate(X_test)
  y_test = np.concatenate(y_test)  
  y_pred = model.predict(X_test)
  r2_value = r2_score(y_test, y_pred)
  rvalue = linregress(y_test,y_pred).rvalue

  return model, r2_value, rvalue
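
The row layout produced by `designmatrix` can be checked on a tiny input. The sketch below is a vectorized variant written for illustration (the name `designmatrix_vectorized` is hypothetical, and `sliding_window_view` requires NumPy 1.20 or newer); it is intended to mirror the loop-based function above, not to replace it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def designmatrix_vectorized(X, d=20):
    """Hypothetical loop-free variant: X is (neurons, bins); returns (bins, 1 + neurons*d)."""
    n_neurons, n_bins = X.shape
    padded = np.column_stack((np.zeros((n_neurons, d - 1)), X))
    # windows[i, t, k] == padded[i, t + k]
    windows = sliding_window_view(padded, d, axis=1)
    # Reorder to (bins, neurons, lags) and flatten each bin's history into one row
    history = windows.transpose(1, 0, 2).reshape(n_bins, n_neurons * d)
    return np.column_stack((np.ones(n_bins), history))

# Two neurons, three time bins, two-bin kernel
X = np.array([[1., 2., 3.],
              [4., 5., 6.]])
D = designmatrix_vectorized(X, d=2)
print(D)
# [[1. 0. 1. 0. 4.]
#  [1. 1. 2. 4. 5.]
#  [1. 2. 3. 5. 6.]]
```

Each row starts with the constant term, followed by each neuron's `d` most recent bins (oldest first), with zeros where the history reaches before the start of the window.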

### Summarizing the Data to Analyze
Here we go through the cleaned data and list the brain areas and the number of neurons recorded in each session.

In [None]:
for i in range(len(alldata)):
  dat = alldata[i]
  print(f"index {i}  brain areas {set(dat['brain_area'])} Neurons {dat['spks'].shape[0]}")

### Running GLM analysis for all (Session, Brain Area) pairs and finding the optimal kernel size

In [None]:
from tqdm import tqdm 

result = []  # accumulates rows across all repetitions
for jj in tqdm(range(3)):
  for i in range(len(alldata)):
    dat = alldata[i]
    ba = set(dat['brain_area'])
    for bb in ba:
      neuron_index, train_trial_index, test_trial_index = sampler(dat, brain_region = bb)
      
      # Finding optimal 'd' (kernel length in bins) for every brain region.
      best_d, best_r2 = -1, -np.inf

      for d in range(6, 25):  # kernel lengths of 6 to 24 bins
        try:
          model, r2, corr_coeff = train_loop(dat, neuron_index, train_trial_index, test_trial_index, d)
        except Exception:
          continue  # skip this kernel size instead of reusing a stale r2
        if r2 > best_r2:
          best_d, best_r2 = d, r2

      if best_d < 0:
        continue  # no kernel size could be fit for this (Session, Brain Area) pair

      for _ in range(10):
        neuron_index, train_trial_index, test_trial_index = sampler(dat, brain_region = bb)
        try:
          model, r2, corr_coeff = train_loop(dat, neuron_index, train_trial_index, test_trial_index, best_d)
          result.append([jj, i, bb, best_d, [neuron_index], [train_trial_index], [test_trial_index], model, corr_coeff ,r2])
        except Exception:
          continue  # skip samples for which the fit fails

df = pd.DataFrame(result)
df.columns = ["Exp_No","#Session_Number", "Brain_Areas", "Optimal_d", "#Neurons_Used","Train_index", "Test_index", "Model", "Correlation_Coefficient","R2_score"]

df.to_csv('../results/result.csv', index=False)