<h3 align="center">NEU 437/537</h3>
<h4 align="center">Princeton University, Spring 2023</h4>

---
# Homework 1: Low Dimensional Dynamics during Decision Making
#### Due: **Friday, Mar 3rd at MIDNIGHT** (*10% off per day late*)
---


## Instructions
- Go to the menu File->Save a copy in Drive to make your own copy of the notebook that you can run and modify. Please prepare your homework submission completely within your own copy of that colab notebook.

- For each problem or sub-problem, please **limit yourself to one Code cell and/or one Markdown cell** as appropriate (switch between them by using the menu at the top, or the shortcuts `Ctrl+M M` for Markdown and `Ctrl+M B` for Code). 

- **Submitting your homework**:  Please submit an .ipynb file via the assignment tab in Canvas. (From your notebook, File->Download->Download .ipynb).  Late submissions will be penalized 10% per day.

- **Test before submitting**: Before submitting, make sure to verify that your code runs without errors by selecting `Runtime -> Restart & Run All`. 

- **Code Hygiene**: Giving variables human-readable names will make your code easier for both you and us to interpret. Similarly, when plotting, give your axes labels (using ```plt.xlabel()``` and ```plt.ylabel()```).

- **Looking up Documentation**: In several places, you are given suggestions for pre-existing Python packages to make use of to complete the assignment. Links are provided to documentation in those cases, which you will need to click on and read through.

## Setup

Import necessary packages and set plotting constants

In [None]:
# interactive debugger. uncomment this if you want errors to stop your code and give you interactive access to the workspace
%pdb off 

# import packages
import os
import numpy as np
import scipy.io as sio
import matplotlib as mpl
import matplotlib.pyplot as plt
from statsmodels.stats.proportion import proportion_confint

# set default font sizes
SMALL_SIZE = 12
MEDIUM_SIZE = 14
BIGGER_SIZE = 16
plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

## Introduction: Neural Population Recordings during Auditory Evidence Accumulation in Rats

In this problem set we will be exploring dimensionality reduction of neural population recordings. 

We'll be analyzing a real dataset (documented further [here](https://github.com/Brody-Lab/adrian_striatum_analysis)), that consists of neural recordings from a rat performing an auditory decision making task called the ["Poisson Clicks" task](https://www.science.org/doi/10.1126/science.1233912?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed/) (see Fig. 1). 

**IMPORTANT NOTE: We ask that you do not distribute or make use of these data outside the context of this problem set.**

<figure>
<img src="https://github.com/Brody-Lab/adrian_striatum_analysis/raw/master/images/Poisson%20Clicks%20Task.jpeg" class="center" alt="schematic illustration of the Poisson Clicks task" height=330>
<figcaption align = "left"><b>Figure 1.</b> Schematic illustration of the Poisson Clicks Task</figcaption>
</figure>

<figure>
<img src="https://iiif.elifesciences.org/lax/59716%2Felife-59716-fig1-v2.tif/full/1500,/0/default.jpg" class="center" alt="illustration of chronic Neuropixels recordings in rats" height=330 >
<figcaption align = "left"><b>Figure 2.</b> Chronic Neuropixels Recordings in Rats. Figure taken from <a href="https://elifesciences.org/articles/59716/">Luo*, Bondy* et al (2020). eLife</a>. </figcaption>
</figure>


Briefly, here's how the task works. The rat was placed in a behavior box with three "ports" it could interact with by inserting its nose into them. To initiate a trial, the rat had to poke into the center port and hold its nose there. After a variable delay, the rat was presented with a sequence of randomly timed auditory clicks from speakers to its left and right. The sequence of clicks could last from 0.2s to 1s.  At the end of the stimulus, the rat had to report whether there were more clicks to the left or right by poking its nose in either the left or right side port. The rat was motivated to perform the task through delivery of liquid reward for a correct choice. The task is designed to force the rats to gradually integrate auditory "evidence" provided by the experimenters in the form of the clicks. Thus, it allows us to control the decision process while we record from the animal's brain.

The rat had a [Neuropixels](https://www.nature.com/articles/nature24636) probe chronically implanted in a part of its brain called the anterior dorsal striatum (ADS). Neuropixels probes are silicon probes with around 1000 recording sites distributed along a 1cm shank. We were able to record simultaneously from many ADS neurons at once (~130 in this example dataset) using this approach (see Fig. 2 for a visual depiction of the recording method). We chose ADS because it [had been previously shown](https://elifesciences.org/articles/34929) to contain single neurons whose firing rates correlate with the value of accumulated evidence for the animal's decision.

In this problem set, you'll go beyond single neuron responses to explore the representation of the evolving decision at the population level.


## Loading the data

First we'll load the data from Github. 

In [None]:
# load data from github repo
dataset_url = "https://github.com/Brody-Lab/adrian_striatum_analysis/raw/master/datasets/"
dataset_name = "pset1_data.mat"
print('Loading %s from %s\n' % (dataset_name,dataset_url))
system_call = "wget -O {dataset_name} {dataset_url}{dataset_name} > /dev/null".format(dataset_name=dataset_name,dataset_url=dataset_url)
os.system(system_call)
pset1_data = sio.loadmat(dataset_name,mat_dtype=True)

# load variables into workspace, removing useless dimensions and performing type conversions when necessary
def load_vars(data,var_names):
  vals = [];
  for i in range(len(var_names)):
    vals.append(np.squeeze(data[var_names[i]]))
  return tuple(vals) 

time_s,rat,sess_date,resolution_s,n_left_clicks,n_right_clicks,stim_off,went_right,response,smooth_std_s,is_correct = \
  load_vars(pset1_data,['time_s','rat','sess_date','resolution_s','n_left_clicks','n_right_clicks','stim_off','went_right','spikes','smooth_std_s','is_correct'])
stim_off = stim_off.astype('int');
went_right = went_right==1
evidence_strength = np.log(n_right_clicks/n_left_clicks)
n_evidence_bins = 6
evidence_strength_bins = np.percentile(evidence_strength,np.linspace(0,100,n_evidence_bins+1))  # define 7 bin edges (i.e. 6 bins) for breaking up trials according to momentary evidence strength
time_range_s=[0.1,0.75] # min and max time after stimulus onset (in seconds) to plot and for certain analyses

# get and print some basic information about the dataset
ntrials,nbins,ncells = response.shape
print('Dataset for rat %s from %s includes:\n - %d trials\n - %.1f s of spike data per trial\n - %d cells\n\nTemporal resolution is %.0f ms per time bin.\n\nSmoothed with %d ms causal Gaussian filter.' \
      % (rat,sess_date,ntrials,nbins*resolution_s,ncells,resolution_s*1e3,smooth_std_s*1e3) )

---
<h3>The data consists of several key variables, described in detail below.</h3>

*(TLDR: Feel free to skim this for now and come back to it as a reference as you work through the problem set.)*

---

**```response```**: an array of size ```ntrials``` $\times$ ```nbins``` $\times$ ```ncells```, where ```ntrials``` is the number of trials, ```nbins``` is the number of time bins per trial and ```ncells``` is the number of recorded cells. These are the recorded spiking data in response to the clicks stimulus, broken down by trial, timepoints and cell number. The raw spike counts have been smoothed with a causal half-Gaussian filter with 75 ms s.d. and divided by the bin duration (see ```resolution_s``` below) so that the values represent a smoothed estimate of instantaneous firing rate in spikes per second. The data is aligned to the time of stimulus onset (i.e. the first click) and extends for 1s. Each bin represents 5ms. Stimulus duration was variable, ranging from 0.2 s to 1s. On trials with stimulus duration shorter than 1s, the times after stimulus offset are filled with NaNs to maintain a uniform array size.

**```n_left_clicks```**: a vector of integers of length ```ntrials``` giving the number of left clicks on each trial.

**```n_right_clicks```**: a vector of integers of length ```ntrials``` giving the number of right clicks on each trial.

**```evidence_strength```**: a numeric vector of length ```ntrials``` giving the instantaneous evidence strength for each trial. This is analogous to motion coherence in the classic "random dot motion" task, and is equal to $\log(\frac{n\_right\_clicks}{n\_left\_clicks})$.

**```time_s```**: a numeric vector of length ```nbins``` giving the time (in seconds relative to stimulus onset) of each time bin 

**```time_range_s```**: a numeric vector of length 2 giving the starting and ending time (in seconds relative to stimulus onset) of data for plotting purposes and certain analyses. The default value is [0.1,0.75]. Times earlier than this contain little information about the decision and times after this contain too few trials for sufficient statistical power.

**```resolution_s```**: a numeric scalar giving the bin duration in seconds (0.005 in this case)

**```went_right```**: a Boolean vector of length ```ntrials``` specifying the animal's choice (```True``` for right, ```False``` for left).

**```is_correct```**: a Boolean vector of length ```ntrials``` specifying whether the animal's choice was rewarded (i.e. if the animal chose the side with more clicks).

## Examining the dataset


### Plotting the psychometric curve

First, let's verify that the animal can do the task. We'll make a plot showing how the subject's choices depended on the amount of accumulated evidence in the stimulus. We expect this to show that the rat reliably chose the side with more evidence, and did so more reliably when the accumulated evidence was stronger. This kind of plot is called a "psychometric curve" and we'll define an aptly named function (```plot_psychometric_curve```) to generate one.



In [None]:
def plot_psychometric_curve(signal,went_right,bins=9):

  # calculate binned choice fractions
  edge = np.max(np.abs(signal))+1
  bin_edges = np.linspace(-edge,edge,bins+1)
  choice_frac = np.zeros(bins)
  n = np.zeros(bins)
  for i in range(0,bins):
    idx = (signal>bin_edges[i]) & (signal<=bin_edges[i+1])
    n[i] = np.count_nonzero(idx)
    choice_frac[i] = np.mean(went_right[idx])
  bin_mid=(bin_edges[:len(bin_edges)-1]+bin_edges[1:])/2
  
  # get binomial confidence intervals
  err = proportion_confint(count=n*choice_frac, nobs=n, alpha=0.05)  
  
  # plot
  plt.figure()
  plt.errorbar(bin_mid,choice_frac,yerr=abs(err-choice_frac),marker="o")
  plt.ylim(0,1)
  plt.xlim(bin_mid[0]-1,bin_mid[-1]+1)
  plt.xlabel("# right - # left clicks")
  plt.ylabel("Fraction chose right")

plot_psychometric_curve(n_right_clicks - n_left_clicks,went_right)

Indeed, this shows that the animal's choices are systematically modulated by the total stimulus evidence (i.e. the difference between number of right and left clicks). This is one (but by no means the only) piece of evidence demonstrating that rats do indeed perform gradual evidence accumulation to perform the Poisson Clicks task.
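To build some intuition for what a psychometric curve measures, here is a minimal sketch on simulated data (not the homework dataset). It assumes a hypothetical observer whose probability of choosing right is a logistic function of the click difference, with a made-up slope of 0.3, and it uses a simple normal-approximation interval for the error bars. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated (hypothetical) observer: P(right) is a logistic function of the click difference
click_diff = rng.integers(-20, 21, size=5000)
p_right = 1.0 / (1.0 + np.exp(-0.3 * click_diff))
sim_went_right = rng.random(5000) < p_right

# Bin trials by click difference (7 bins spanning the full range)
sim_edges = np.linspace(-21, 21, 8)
bin_masks = [(click_diff > lo) & (click_diff <= hi)
             for lo, hi in zip(sim_edges[:-1], sim_edges[1:])]

# Fraction of rightward choices and number of trials in each bin
sim_frac = np.array([sim_went_right[mask].mean() for mask in bin_masks])
sim_n = np.array([np.count_nonzero(mask) for mask in bin_masks])

# Normal-approximation 95% interval half-width for each binned proportion
sim_half_width = 1.96 * np.sqrt(sim_frac * (1 - sim_frac) / sim_n)

print(np.round(sim_frac, 2))
print(np.round(sim_half_width, 3))
```

The binned fractions rise from near 0 to near 1 as the evidence shifts from left to right, which is the sigmoidal shape we look for in the real data.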

### Looking at PETHs

As we said earlier, we know that single neurons in ADS (like those in monkey parietal cortex during the classic random dot motion task) have firing rates that correlate with the value of accumulated evidence. To visualize this, we can plot the firing rate of a neuron as a function of time since the onset of the stimulus. Such plots are called "peri-event time histograms" (or PETHs).
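At its core, a PETH is just an average across trials at each time bin. The sketch below illustrates this on a small made-up array for one cell, assuming (as in ```response```) that bins after stimulus offset are padded with NaNs. The array and its values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical, not the homework dataset): 4 trials x 6 time bins for one cell
toy_response = rng.uniform(5.0, 15.0, size=(4, 6))
toy_response[1, 4:] = np.nan   # trial 1's stimulus ended after 4 bins
toy_response[3, 3:] = np.nan   # trial 3's stimulus ended after 3 bins

# The PETH is the across-trial average at each time bin, skipping the NaN padding
toy_peth = np.nanmean(toy_response, axis=0)

# Number of trials that actually contribute to each time bin
n_contributing = np.sum(~np.isnan(toy_response), axis=0)

print(toy_peth)
print(n_contributing)
```

Note that later time bins are averaged over fewer trials, which is one reason the analyses in this problem set are restricted to ```time_range_s```.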

We'll break down trials into six groups based on the strength of momentary evidence (i.e. the value of ```evidence_strength```, defined above) and only include correct trials. 

In [None]:
### THIS CODE CELL DEFINES VARIABLES AND FUNCTIONS NEEDED FOR PLOTTING PETHs. 
### YOU DON'T NEED TO HAVE A DEEP UNDERSTANDING OF WHAT IS GOING ON HERE TO COMPLETE THE PROBLEM SET. 

# define some constants needed for PETH plotting
example_cells=[3,21,26,108,109] # example choice-selective cells for T219_12_20_2019 pset1_data.mat (zero-based index)
legend_str=["strong left evidence","middle left evidence","weak left evidence","weak right evidence","middle right evidence","strong right evidence"] # trial condition names
cmap = plt.get_cmap('viridis')  # plt.get_cmap works across matplotlib versions (mpl.cm.get_cmap was removed in matplotlib 3.9)
colors=cmap(np.linspace(0,1,n_evidence_bins))  
min_time_bin = np.count_nonzero(time_s<=time_range_s[0])
max_time_bin = np.count_nonzero(time_s<time_range_s[1])

def get_peth(response,nboot=int(1e2),q=[25,75]):
  # calculates the PETH (internal function used by plot_peth_example_cells)
  # response should be an array of size ntrials x nbins (i.e. the response of one cell for all trials and timepoints)
  # nboot defines the number of bootstrap resamples to use for calculating error bars
  # q defines the bootstrap coverage interval of the error bar in percentiles
  boots=np.empty((nboot,max_time_bin-min_time_bin))
  peth = np.empty((n_evidence_bins,max_time_bin-min_time_bin))
  peth[:] = np.nan
  lower = peth.copy()
  upper = peth.copy()
  for i in range(0,n_evidence_bins):
    trial_idx = (evidence_strength>=evidence_strength_bins[i]) & (evidence_strength<evidence_strength_bins[i+1]) & is_correct;
    this_response = response[trial_idx,min_time_bin:max_time_bin]
    ntrials=np.count_nonzero(trial_idx)
    for k in range(0,nboot):
      idx = np.random.randint(low=0, high=ntrials, size=(ntrials,));
      boots[k,:] = np.nanmean(this_response[idx,:],axis=0)
    lower[i,:]=np.percentile(boots,q[0],axis=0)
    upper[i,:]=np.percentile(boots,q[1],axis=0)
    peth[i,:] = np.nanmean(this_response,axis=0)    
  return peth, lower, upper

def plot_peth(data,xlabel="Time after stimulus onset (s)", ylabel="Spikes/s", legend_str = legend_str):
  # plots the PETH of a single cell (or a single PC or decision variable) on the current axes,
  # illustrating its smoothed response to the stimulus,
  # with trials broken down by evidence strength.
  # data is an array of size ntrials x nbins (i.e. one value per trial and time bin).
  # shaded bands show the bootstrap interval returned by get_peth.
  xs=np.arange(min_time_bin,max_time_bin)*resolution_s     
  peth,lower,upper = get_peth(data)
  plt.gca().set_prop_cycle('color',colors)     
  for i in range(0,n_evidence_bins):   
    plt.fill_between(xs,lower[i,:],upper[i,:])    
  plt.xlabel(xlabel)
  plt.ylabel(ylabel)  
  plt.legend(legend_str);    

def plot_peth_example_cells(data,cells):
  # makes a figure with a subplot for each cell,
  # illustrating its smoothed response to the stimulus,
  # with trials broken down by signal strength.
  # data is an array of size ntrials x nbins x m where m indexes cells. 
  # cells indexes which cells in data to be plotted.
  fig, axs = plt.subplots(1, len(cells), figsize=(30,6))
  for cellno in range(0,len(cells)):
    plt.sca(axs[cellno])
    plot_peth(np.squeeze(data[:,:,cells[cellno]]))
    axs[cellno].set_title(("cell {cellno:d}").format(cellno=cells[cellno]));
    if cellno>0:
      plt.xlabel("")
      plt.ylabel("")  
      plt.legend([]);           

plot_peth_example_cells(response,example_cells)       


You can see that the example neurons each have a preferred choice; that is, they fire more on trials when the evidence favored that choice. More importantly, the rate at which their firing rate changes depends on the momentary strength of evidence favoring that choice.

From data like these, we conclude that these neurons reflect the value of an internal "decision variable" that is integrating the momentary sensory evidence and which could be used by the animal to decide which way to go at the end of the trial.

## Question 1: Reducing the dimensionality of the data

If it's really the case, as these PETHs seem to indicate, that ADS neuron responses reflect a "latent" decision process, we should be able to reduce the dimensionality of the data without losing too much information.

To do this, we'll start by performing principal components analysis (PCA). This is an unsupervised approach which finds an orthogonal set of axes in neural state space such that the data has most variance along axis 1, the second most along axis 2, and so forth. These new axes are called the "principal components." By confining subsequent analysis to just the top set of principal components, we get a low-dimensional representation of the data that preserves the maximum variance.



### Data preprocessing
Before we perform PCA, we'll need to do a bit of data wrangling/processing.

First, while we want to perform PCA on a response vector $\vec{r}(t)$ where $t$ can be any time during the recording, the variable ```response``` is an array of size ```ntrials``` $\times$ ```nbins``` $\times$ ```ncells```. The computation will be much easier if we define a version of ```response``` that has a single time dimension that concatenates across all timepoints in the dataset.  Let's define a variable called ```r_t``` that contains the same data as ```response``` but has dimensions (```ntrials``` $*$ ```nbins```) $\times$ ```ncells```. 
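If it's not obvious how the rows of the reshaped array map back onto trials and time bins, the following toy sketch may help. It uses a tiny made-up array (the sizes and names are hypothetical) and relies on NumPy's default row-major ordering.

```python
import numpy as np

# Toy array (hypothetical sizes): 3 trials x 4 time bins x 2 cells
toy_ntrials, toy_nbins, toy_ncells = 3, 4, 2
toy_response = np.arange(toy_ntrials * toy_nbins * toy_ncells, dtype=float).reshape(
    (toy_ntrials, toy_nbins, toy_ncells))

# Collapse trials and time bins into a single "time" dimension
toy_r_t = toy_response.reshape((toy_ntrials * toy_nbins, toy_ncells))

# With row-major order, row (trial * nbins + bin) holds that trial's time bin
trial, time_bin = 2, 1
row = trial * toy_nbins + time_bin
print(np.array_equal(toy_r_t[row], toy_response[trial, time_bin]))

# Reshaping back recovers the original trial x bin x cell layout
toy_restored = toy_r_t.reshape((toy_ntrials, toy_nbins, toy_ncells))
print(np.array_equal(toy_restored, toy_response))
```

In other words, the time bins of trial 0 come first, followed by those of trial 1, and so on, so no information about trial structure is lost.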



In [None]:
r_t = response.reshape((ntrials*nbins,ncells))

Next, we need to standardize $\vec{r}(t)$. Often when using PCA, data is first z-scored (so that each feature has zero-mean and unit-variance). This prevents the PCs from being trivially biased towards the features that happen to have a greater range.  However, this isn't really a desirable approach for neural data. This is because neurons with low firing rates have variance that is dominated by shot noise, so treating their variance equally with high-firing rate neurons will just add noise. Instead, we'll define a function ```normalize_response``` that preserves some, but not all, of the firing rate variance across the population. Then we'll apply this to  ```r_t```.
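To see the difference between the two approaches, here is a small sketch comparing z-scoring with "soft" normalization (dividing by the range plus a constant) on two made-up neurons: one that is mostly noise and one that is strongly modulated. The simulated rates and the constant of 5 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical neurons: one barely modulated (mostly noise), one strongly modulated
n_samples = 1000
t = np.linspace(0, 1, n_samples)
quiet_cell = 1.0 + 0.5 * rng.standard_normal(n_samples)     # small range
active_cell = 20.0 + 15.0 * np.sin(2 * np.pi * t)           # range of about 30 spikes/s
toy_rates = np.column_stack([quiet_cell, active_cell])

# z-scoring: every cell ends up with unit variance, so the noisy cell counts as much as the active one
centered = toy_rates - toy_rates.mean(axis=0)
z_scored = centered / toy_rates.std(axis=0)

# "Soft" normalization: subtract the mean, then divide by (range + constant)
soft_constant = 5.0
soft_normed = centered / (np.ptp(toy_rates, axis=0) + soft_constant)

print("z-scored variances:       ", z_scored.var(axis=0))
print("soft-normalized variances:", soft_normed.var(axis=0))
```

After soft normalization the weakly modulated cell retains much less variance than the strongly modulated one, so it has less influence on the PCs.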

In [None]:
def normalize_response(response,norm_factor=5):
  # transforms input data to have zero mean and variance that scales with its normalized range (range+norm_factor)
  # increasing norm_factor will make the output less sensitive to the data range
  # approach taken from: Russo et al. (2018). Neuron. 
  # response must be of size ntimepoints x ncells
  norm_response=response.copy()
  for cell in range(0,ncells): # loop over cells
    this_response = response[:,cell]
    this_response = this_response - np.nanmean(this_response); # subtract mean
    ptp = np.nanmax(this_response) - np.nanmin(this_response)
    norm_response[:,cell] = this_response/(ptp+norm_factor); # normalize variance
  return norm_response

r_t = normalize_response(r_t);

Argh! One last bit of data wrangling.  Because the stimuli have different lengths, but it was desirable to make a uniform-sized array of responses, some trials have fewer than ```nbins``` elements and have been padded with NaNs. We'll define a variable ```missing_rows``` that indexes the timepoints (rows) of ```r_t``` that contain NaNs so we can remove them later on.

In [None]:
missing_rows = np.isnan(r_t[:,0]);

### Question 1a: Computing the PCs

Phew! Now we can find the principal components $V$ of $\vec{r}(t)$ (i.e. the variable ```r_t```).

To jog your memory, recall that the principal components (PCs) are equivalent to the eigenvectors of $\Sigma$, the covariance matrix of $\vec{r}(t)$. The variances of the data projected onto the PCs are the eigenvalues of $\Sigma$.

In other words, when we diagonalize $\Sigma$ (i.e. define $V$ and $\Lambda$ such that $\Sigma = V \Lambda V^T $), then the columns of $V$ are the eigenvectors (principal components) and the diagonal elements of the diagonal matrix $\Lambda$ are the eigenvalues (principal component variances). (Note that the columns of $V$ and $\Lambda$ are sorted so that the principal component variances are in descending order, by convention.) 
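The relationship between $\Sigma$, $V$ and $\Lambda$ can be checked numerically. The sketch below does this with NumPy on a small made-up dataset (500 timepoints by 3 "cells"); the data and variable names are hypothetical, and this is just one of several equivalent ways to carry out the decomposition.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data (hypothetical): 500 timepoints x 3 "cells" with correlated activity
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
toy_data = rng.standard_normal((500, 3)) @ mixing.T

# Covariance matrix of the data (columns are variables)
toy_cov = np.cov(toy_data, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(toy_cov)

# Re-sort so that the variances are in descending order, as is conventional for PCA
order = np.argsort(eigvals)[::-1]
toy_Lambda = eigvals[order]
toy_V = eigvecs[:, order]

# Check the diagonalization: Sigma = V Lambda V^T
reconstructed = toy_V @ np.diag(toy_Lambda) @ toy_V.T
print(np.allclose(reconstructed, toy_cov))

# The variance of the data projected onto each PC equals the matching eigenvalue
toy_centered = toy_data - toy_data.mean(axis=0)
projected_var = np.var(toy_centered @ toy_V, axis=0, ddof=1)
print(np.allclose(projected_var, toy_Lambda))
```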

In the code block below, compute the full matrix of PCs $V$. This should be a square matrix of size ```ncells``` $\times$ ```ncells```. (Remember to exclude the rows of the data indexed by ```missing_rows``` in your calculation. IOW, use ```r_t[~missing_rows,:]``` instead of ```r_t```).

HINT: While you could find the answer to this and later questions by implementing the linear algebra in code, there are packages in Python that will do most of the work of PCA for you. We suggest checking out [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) from the scikit-learn package. (Note, however, that this package stores the principal components as the rows of its ```components_``` attribute, that is, as $V^T$ instead of $V$.)

<font color='red'>YOUR ANSWER IN CODE BELOW (3 pts)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 1a

# calculate PCs
# V = 

### Question 1b: Principal Component Variances

In the code block below, make a plot of the cumulative fraction of explained variance of $\vec{r}(t)$ as a function of the number of PCs. 

<font color='red'>YOUR ANSWER IN CODE BELOW (2 pts)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 1b

# plot cumulative fraction variance explained as a function of number of PCs


What does the plot above look like? Roughly how many PCs do we need to explain 70% of the variance in the original data? How do you interpret this?

<font color='red'>YOUR ANSWER IN TEXT HERE (2 pts)</font>

### Question 1c: Computing the PC Scores

Next, in the code block below:
- Compute the principal component scores  $\vec{s}(t) = \vec{r}(t)  V$. This is the transformation of the data in the new coordinate space defined by the principal components. (Again, remember to exclude the missing rows of the data indexed by ```missing_rows``` in your calculation.) Name the variable corresponding to the principal component scores ```s_t```.
- Then once you've obtained ```s_t```, reinsert missing values (using ```np.nan```) at the timepoints indexed by ```missing_rows``` and reshape the array to be the same size as the original ```ntrials``` $\times$ ```nbins``` $\times$ ```ncells``` array ```response```.

<font color='red'>YOUR ANSWER IN CODE BELOW (2 pts)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 1c
# calculate PC scores

# initialize s_t with NaNs
s_t = np.empty(r_t.shape)
s_t[:]=np.nan

# fill in non-missing rows with the transformed data
s_t[~missing_rows,:] = 

# reshape to be size of original "response" variable (ntrials x nbins x ncells)
s_t = 

### Question 1d: Visualizing the PC Scores

Use the function ```plot_peth_scores_2d``` (defined in the code block below) to make a 2d plot illustrating the neural population trajectory in a space spanned by the first two PCs. Then repeat this with any other pair of PCs.

<font color='red'>YOUR ANSWER IN CODE BELOW (2 pts)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 1d

def plot_peth_scores_2d(s_t,PC1,PC2):
  # makes a 2d plot showing the trajectory of the population along two PCs, with trials broken down by evidence strength
  # s_t is an array of PC scores of size ntrials x nbins x npcs. 
  # PC1 and PC2 are the one-based indices giving the desired pair of PCs to plot (i.e. lowest possible value is 1)
  # example: plot_peth_scores_2d(scores,1,2) will make a plot of the population trajectory across PCs 1 and 2
  peth,_,_ = get_peth(s_t[:,:,PC1-1],nboot=1)
  peth2,_,_ = get_peth(s_t[:,:,PC2-1],nboot=1)
  plt.gca().set_prop_cycle('color',colors)     
  plt.plot(np.transpose(peth),np.transpose(peth2),linewidth=5)    
  plt.xlabel("PC {pc:d}".format(pc=PC1))
  plt.ylabel("PC {pc:d}".format(pc=PC2))
  plt.legend(legend_str);    

# plot PC 1 versus PC 2

# plot some other PCs (use same axis range for better comparison)



Describe what you observe in the plots you just made. What do you conclude about how the evolving decision process is represented in this population of neurons?

<font color='red'>YOUR ANSWER IN TEXT HERE (2 pts)</font>

## Question 2: Projecting Neural Activity on the "Choice Axis"

In Question 1, we performed PCA, a type of unsupervised dimensionality reduction. Now we'll perform **targeted dimensionality reduction**, where we seek to find a low-dimensional subspace that best aligns with some feature extrinsic to the neural data.

Specifically, we'll use **logistic regression** to define an axis in neural state space that best predicts the animal's choices. In the neuroscience of decision making, this axis is sometimes called **"the choice axis"** and the projection of the population onto that axis **"the decision variable"** or "DV" for short.

We'll define the vector $\vec{\beta}$ as the set of weights defining the choice axis in neural state space (see Fig. 3, left panel). $DV(t)$ is the projection of the neural population response on trial $t$ onto this axis: that is, $ DV(t) = \vec{r}(t) \vec{\beta}  $. The goal is to find $\vec{\beta}$ that maximizes our ability to predict the animal's choices from $DV(t)$. (Since there is only one choice per trial, we'll use $t$ to index trials in Question 2, and we'll let $\vec{r}(t)$ be the time-averaged response of the neural population on trial $t$.) 

Because $DV(t)$ can take any real value while choices are binary, we'll model them as being related in the following way: $ p(R) = logistic(DV(t)) = \frac{1}{1+e^{-DV(t)}} $. Under this model, the probability $p(R)$ of a rightward choice is a sigmoidal function of the decision variable (see Fig. 3, right panel). 

<figure>
<table><tr>
  <td bgcolor="#FFFFFF"><img src="https://github.com/Brody-Lab/adrian_striatum_analysis/raw/master/images/logistic%20regression%201.png" class="center" alt="Schematic of Choice Axis in Neural State Space" height="400"></td>
<td bgcolor="#FFFFFF">  <img src="https://github.com/Brody-Lab/adrian_striatum_analysis/raw/master/images/logistic%20regression%202.png" class="center" alt="Schematic of Choice Axis in Neural State Space" height="370"></td>
</tr></table>

<figcaption><b>Figure 3.</b> <i>Left panel</i> Illustration of the choice axis, defined by vector $\vec{\beta}$, in neural state space. The projection of the average neural response on trial $t$ onto this axis defines the magnitude of the decision variable on that trial. The grey square indicates the plane defined by $DV=0$.  <i>Right panel</i> The probability of a rightward choice is a sigmoidal function of the decision variable. A nice feature of the logistic model is that, for $DV=\alpha$, a rightward choice is $e^\alpha$ times as likely as a leftward choice.</figcaption></figure>
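The claim that the odds of a rightward choice equal $e^{DV}$ under the logistic model is easy to verify numerically. Here is a minimal sketch using a few made-up values of the decision variable (the helper function and variable names are illustrative).

```python
import numpy as np

def logistic(dv):
    # Maps a real-valued decision variable to a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-dv))

toy_dv = np.array([-2.0, 0.0, 1.0, 3.0])   # hypothetical DV values
p_right = logistic(toy_dv)
p_left = 1.0 - p_right

# Odds of a rightward choice relative to a leftward one
odds = p_right / p_left
print(np.allclose(odds, np.exp(toy_dv)))
```

Equivalently, the decision variable is the log odds of a rightward choice, so $DV=0$ corresponds to the two choices being equally likely.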


### Question 2a: Calculating the choice axis using principal components regression

As we learned in Question 1, neural responses are largely confined to a low-dimensional manifold in neural state space. Therefore, it seems reasonable to expect that $\vec{\beta}$ will also live along that manifold. (In other words, it would be weird if the dimension in neural state space that best predicted choice was one of the low-variance dimensions in the data.) Therefore, instead of trying to estimate $\vec{\beta}$ in the traditional manner outlined above, we're actually going to estimate $\vec{\beta}_{PCR}$ such that $ DV(t) =   \vec{s_L}(t)\space \vec{\beta}_{PCR}$ where $\vec{s_L}(t)$ are the $L$ PC scores corresponding to the top $L$ principal components of $\vec{r}(t)$. This explicitly imposes the constraint that the choice axis must live in the subspace spanned by the top $L$ principal components of the data. This is referred to as principal components regression or PCR.

One way to think about PCR is as a form of regularization, akin to other methods you may have heard of (like ridge regression, which shrinks coefficients toward 0, or lasso, which sets some of them exactly to 0). PCR sets some coefficients to 0 under the assumption that they are likely capturing noise rather than signal, and it provides a specific way of deciding which ones: those that correspond to low-variance principal components. 
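To make the subspace constraint concrete, the sketch below shows that a set of weights on the top $L$ PC scores corresponds to an axis $\vec{\beta} = V_L \vec{\beta}_{PCR}$ in the original neural state space, where $V_L$ holds the first $L$ columns of $V$. It uses made-up data and arbitrary stand-in weights rather than fitted coefficients; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data (hypothetical): 200 trials x 5 cells, mean-centered
toy_r = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
toy_r = toy_r - toy_r.mean(axis=0)

# PCs of the toy data, sorted by descending variance
eigvals, eigvecs = np.linalg.eigh(np.cov(toy_r, rowvar=False))
toy_V = eigvecs[:, np.argsort(eigvals)[::-1]]

# Keep the top L PCs and pick arbitrary (made-up) weights in PC space
L = 2
toy_scores = toy_r @ toy_V[:, :L]        # ntrials x L
toy_beta_pcr = np.array([1.5, -0.7])     # stand-in for fitted coefficients

# Decision variable computed from the PC scores
dv_from_scores = toy_scores @ toy_beta_pcr

# The same axis expressed in the original neural state space
toy_beta = toy_V[:, :L] @ toy_beta_pcr   # one weight per cell
dv_from_cells = toy_r @ toy_beta

print(np.allclose(dv_from_scores, dv_from_cells))

# The implied axis has no component along the discarded PCs
print(np.allclose(toy_V[:, L:].T @ toy_beta, 0.0))
```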

As we mentioned above, for the purposes of Question 2, $t$ now indexes trials rather than time bins. In the code block below, take the scores from Question 1 (i.e. the variable ```s_t```) and average across timepoints, obtaining a single value for each trial and PC. Save this in a new variable called ```s_t_average``` of size ```ntrials``` $\times$ ```ncells```. Ignore the first 0.2s of data from each trial, when the decision process is still early in its evolution and there is little information with which to decode choice. 

<font color='red'>YOUR ANSWER IN CODE BELOW (1 pt)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 2a, part 1

# remove first 0.2 s from each trial of PC scores and average over time bins (hint: use np.nanmean instead of np.mean to ignore missing time bins)
# s_t_average =  

Now you're ready to perform (logistic) PCR using the time-averaged PC scores (the variable ```s_t_average```) as the set of predictors and the variable ```went_right``` (i.e. the animal's choices) as the response. The first question we want to address is: how many PCs (columns of ```s_t_average```) should we use? This will tell us how small a subspace of the neural state space is sufficient for decoding choice.

In the code block below:
- Fit a series of models from L = 1 to L = ```ncells```, each time using the first L columns of ```s_t_average``` as the predictors. Save the model accuracy (i.e. fraction of correctly predicted choices) for each of the L models.
- Define a variable ```best_L``` which is the value of L that leads to the best cross-validated model accuracy and save the vector of ```best_L``` estimated coefficients ($\vec{\beta}_{PCR}$) of that model as ```B_PCR```.
- Finally, plot the model accuracy as a function of L.

It is important that you calculate the model accuracy on held-out data so that it reflects the model's ability to generalize, rather than its ability to fit every wiggle in the training data. To do this, we'll use ten-fold cross-validation.  For more about cross-validation and the problem of overfitting that it helps address, there are many resources on the web, like [this Nature article](https://www.nature.com/articles/nmeth.3968).
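If you haven't seen K-fold cross-validation before, the sketch below shows the bookkeeping involved, using plain NumPy and a made-up number of trials. It only builds the train/test splits and does not fit a model; the names are hypothetical and this is not how scikit-learn implements it internally.

```python
import numpy as np

rng = np.random.default_rng(5)

toy_ntrials = 23      # hypothetical number of trials
n_folds = 10

# Shuffle the trial indices, then split them into K nearly equal folds
shuffled = rng.permutation(toy_ntrials)
folds = np.array_split(shuffled, n_folds)

for k, test_idx in enumerate(folds):
    # Train on every fold except the k-th and evaluate on the k-th
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    assert np.intersect1d(train_idx, test_idx).size == 0
    # (a model would be fit on train_idx and scored on test_idx here)

# Every trial is held out exactly once across the K folds
held_out = np.sort(np.concatenate(folds))
print(np.array_equal(held_out, np.arange(toy_ntrials)))
```

The cross-validated accuracy is then the average of the K test-set accuracies, so each trial contributes to the estimate only when the model has not seen it during fitting.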

There are many packages for doing logistic regression in Python, but we'll use [LogisticRegressionCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html) from the scikit-learn package. This will fit each of the L models in a single line (including the cross-validation step). You will need to click the link above and read over the documentation to be able to complete this question.

<font color='red'>YOUR ANSWER IN CODE BELOW (4 pts)</font>

In [None]:
# Note: Don't be surprised if this code cell takes a minute or two to execute!
###### YOUR CODE HERE #######

# import LogisticRegressionCV and turn off annoying warnings
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import KFold
from warnings import filterwarnings
filterwarnings('ignore') # suppresses the useless barrage of warnings from LogisticRegressionCV

# looping over number of pcs, calculate choice prediction accuracy (under k-fold cross-validation)
K = 10;
cv_obj = KFold(n_splits=K, shuffle=True, random_state=1)
model = LogisticRegressionCV(cv=cv_obj,Cs=[float('inf')],scoring='accuracy')
accuracy=np.empty(ncells)
best_accuracy=0
for L in range(ncells):
  # fit the model (initialized above) to the animal's choices using the time-averaged scores of the top PCs (i.e. the leading columns of s_t_average)
  # note: L is zero-based in this loop, so iteration L corresponds to a model with L+1 PCs
  # YOUR CODE HERE, use LogisticRegressionCV's "fit" method

  # then calculate the cross-validated accuracy of the fitted model (code provided)
  accuracy[L]=np.mean(model.scores_[1]) # take the average accuracy across the K cros-validation folds

  # if the current value of accuracy is the best yet,
  # define best_L as the current number of PCs used
  # and define B_PCR as the current model coefficients
  if (YOUR LOGICAL STATEMENT HERE):
    best_accuracy=
    B_PCR = 
    best_L =

# plot choice prediction accuracy versus number of PCs
# YOUR CODE HERE

What do you observe in the plot you just made? Does choice prediction accuracy increase monotonically as the number of PCs included in the model grows? If not, why not?

<font color='red'>YOUR ANSWER IN TEXT (2 pts)</font>

### Question 2b: Calculate and plot the "decision variable" trajectory on single trials

Now, in the code block below:
- Compute $DV(t) = \vec{s_L}(t)\space \vec{\beta}_{PCR}$. In words, this means taking the best-fitting coefficients from above as our estimate of the choice axis ($\vec{\beta}_{PCR}$) and projecting the first ```best_L``` PC scores onto this axis to define a "decision variable." Call the new variable containing this projection ```DV_t```. This variable should be of size ```ntrials``` $\times$ ```nbins```. NOTE: since we averaged across time bins, ```s_t_average``` only contains a single timepoint per trial. But we want to visualize the evolution of the decision variable across time on single trials. So let's instead project the unaveraged PC scores (i.e. variable ```s_t``` from Question 1, or to be more precise, those columns of ```s_t``` corresponding to the top ```best_L``` PCs) onto $\vec{\beta}_{PCR}$.
- In a single plot, plot the trajectory of the decision variable on the first 20 right-choice trials and the first 20 left-choice trials. Use the variable ```time_s``` as the x-values and confine the x-axis range to ```time_range_s```. Use separate colors for the two trial types.
- Finally, pass the variable ```DV_t``` to the function ```plot_peth``` defined earlier to make a plot showing the average trajectory across trials separated by evidence strength.
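
To see what this kind of projection does to array shapes, here is a small sketch on random numbers. It assumes the PC scores are stored as a trials × time bins × PCs array, as the description of ```s_t``` suggests; `scores_toy`, `beta_toy`, and `n_keep` are hypothetical stand-ins, not variables from this notebook.

```python
import numpy as np

# Made-up sizes and values, for illustration only
rng = np.random.default_rng(0)
n_trials_toy, n_bins_toy, n_pcs_toy = 6, 4, 5
scores_toy = rng.normal(size=(n_trials_toy, n_bins_toy, n_pcs_toy))

n_keep = 3                            # number of leading PCs to keep (assumption)
beta_toy = rng.normal(size=n_keep)    # one coefficient per kept PC

# np.dot of an N-D array with a 1-D vector sums over the LAST axis of the array,
# so the PC axis is collapsed and a (trials x time bins) array remains
proj_toy = np.dot(scores_toy[:, :, :n_keep], beta_toy)

# The same computation spelled out with explicit index labels
proj_check = np.einsum('tbk,k->tb', scores_toy[:, :, :n_keep], beta_toy)

print(proj_toy.shape)
```

The coefficient vector must be one-dimensional with one entry per retained PC for the shapes to line up, which is why a squeeze step is useful when a fitted model returns its coefficients as a 1 × L array.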

<font color='red'>YOUR ANSWER IN CODE BELOW (4 pts)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 2b

### Project the first L PC scores (i.e. "s_t" but including only the first L elements of its last dimension) onto the choice axis (i.e. "B_PCR") to get an estimate of the decision variable (DV_t)
### By "project A onto B" we mean "take the dot product of A and B"
B_PCR = np.squeeze(B_PCR) # np.squeeze removes singleton dimension to make B_PCR one-dimensional
DV_t =   # (Hint: use np.dot)

# plot decision variable trajectory on first 20 left-choice trials and first 20 right-choice trials
#DV_t_left = YOUR CODE HERE # get values of DV_t, selecting only the trials where the animal went left
#DV_t_right = YOUR CODE HERE # get values of DV_t, selecting only the trials where the animal went right 
plt.figure()
plt.plot(time_s,np.transpose(DV_t_right[:20,:]),'y-');
plt.plot(time_s,np.transpose(DV_t_left[:20,:]),'b-');
plt.xlim(time_range_s)

# add x and y labels
# YOUR CODE HERE

# plot trial-average decision variables broken down by momentary evidence strength (using function "plot_peth")
# YOUR CODE HERE


What do you observe in the two plots above? Are there any trials in the first plot where the trajectory of DV seems to change sign? How does the evidence strength influence the way the decision variable evolves across time? Are your observations consistent with the idea that the decision variable gives us a readout of the ongoing decision process in the animal's brain? Why or why not?

<font color='red'>YOUR ANSWER IN TEXT HERE (2 pts)</font>

## EXTRA CREDIT Question 3: Analyzing putative "changes of mind" on single trials

Let's explore the cases where DV changes sign on single trials. These sign changes have been interpreted in past studies as putative **"changes of mind" (CoMs)**: that is, moments in time where the subject's provisional decision changes. If that is really true, it shouldn't occur randomly but instead should exhibit some statistical regularities. For one, we'd expect CoMs to occur less frequently as the trial progresses, since the subject will have had more time to develop confidence in their choice. By similar reasoning, we'd expect to see CoMs less often on trials with stronger evidence.

In the code block below, determine whether or not the data support these predictions:
- First, define a variable ```CoMs``` that contains a 1 for time points in each trial when ```DV_t``` changes sign and a 0 otherwise.
- Make a plot showing the frequency of CoMs as a function of time across trials (i.e. plot the value of ```CoMs``` averaged over trials).
- Make a plot showing the per-timepoint frequency of CoMs as a function of evidence strength. To do this, group trials into three groups according to evidence strength (i.e. according to ```np.abs(evidence_strength)```) and then, within each such group, find the average value of ```CoMs``` across trials and timepoints.

**NOTE**: Confine your CoM analysis to time bins in the range given by ```time_range_s``` (defined earlier). Timepoints earlier than this contain too many CoMs to be meaningful (since very little evidence has been presented yet) and timepoints after this include data from too few trials for sufficient statistical power.
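
One possible way to mark sign changes and summarize them is sketched below on a tiny hand-made array. Several choices here are assumptions rather than requirements of the assignment: the first time bin is marked 0 because it has no predecessor, the three evidence groups are formed from terciles of absolute strength, and the names `dv_small`, `strength_small`, and `com_small` are invented for the example. The restriction to ```time_range_s``` is left out for brevity.

```python
import numpy as np

# Toy decision variable: 6 trials x 5 time bins (made-up numbers)
dv_small = np.array([
    [ 1.0,  2.0, -1.0, -2.0,  3.0],   # 2 sign changes
    [ 0.5, -0.5,  1.0, -1.0, -2.0],   # 3 sign changes
    [-1.0, -1.0, -1.0,  1.0,  1.0],   # 1 sign change
    [ 1.0,  1.0,  1.0,  1.0,  1.0],   # none
    [-1.0, -2.0, -3.0, -4.0, -5.0],   # none
    [ 1.0,  2.0,  3.0,  4.0,  5.0],   # none
])
# Toy signed evidence strength, one value per trial
strength_small = np.array([0.1, -0.2, -0.4, 0.5, 0.9, -0.8])

# A sign change at bin t means sign(DV[t]) differs from sign(DV[t-1]).
# Caveat: np.sign(0) is 0, so an exact zero would be counted as a change.
flips = np.diff(np.sign(dv_small), axis=1) != 0      # shape (trials, bins - 1)
com_small = np.zeros(dv_small.shape, dtype=int)
com_small[:, 1:] = flips                             # first bin stays 0

# Frequency of sign changes per time bin, averaged over trials
com_rate_by_time = com_small.mean(axis=0)

# Split trials into three groups by terciles of absolute strength (assumption)
abs_strength = np.abs(strength_small)
edges = np.quantile(abs_strength, [1 / 3, 2 / 3])
group = np.digitize(abs_strength, edges)             # 0 = weakest, 2 = strongest

# Per-timepoint frequency within each group, averaged over trials and bins
com_rate_by_group = np.array([com_small[group == g].mean() for g in range(3)])

print(com_rate_by_time)
print(com_rate_by_group)
```

Padding the first bin with 0 keeps the sign-change array the same shape as the decision variable, so it can be indexed with the same trial and time masks; dropping that bin instead would give slightly higher per-timepoint rates.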

<font color='red'>YOUR ANSWER IN CODE BELOW (5 pts)</font>

In [None]:
###### YOUR CODE HERE #######

# Question 3
# Analyzing CoMs

# index CoMs
# CoMs = 

# plot frequency of CoMs across time bins

# plot frequency of CoMs by evidence strength


What do you observe in the plots above? Are these observations consistent with the predictions we made?

<font color='red'>YOUR ANSWER IN TEXT (2 pts)</font>

---
*fin*