
![Image](./resources/cropped-SummerWorkshop_Header.png)

<h1 align="center">Encoding </h1> 
<h2 align="center"> Day 1, Morning Session. SWDB 2023 </h2> 

<h3 align="center">Monday, August 21, 2023</h3> 

In [None]:
# We need to import these modules to get started
import numpy as np
import pandas as pd
import os
import platform
import matplotlib.pyplot as plt
%matplotlib inline

# This patch of code just ensures we get an easy to read font size for the duration of the notebook
import matplotlib
font = {'weight' : 'bold',
        'size'   : 15}
matplotlib.rc('font', **font)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>0 Today</h2>

<p>Hubel and Wiesel found orientation- and direction-selective cells using single-wire recordings in the cat primary visual cortex.

To do this, they used a "bespoke" technique: homing in on an individual cell with a high-impedance electrode, then manually determining the "best" stimulus to drive that cell's behavior.

In our phrasing, we will say that Hubel and Wiesel showed that there exist neurons in V1 that "encode" direction-selective information. This was (and still is!) exciting, but it also raises more questions.

Using this idea as a starting place, this workshop will show how to scale this kind of analysis to survey a population of neurons, and highlight some of the challenges that go with this.

</div>


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<h2>0.1 Our Questions here </h2>

(1) Do neurons in the mouse visual cortex encode direction?

(2) How reliable are these responses?

(3) What do these responses look like in the neural population? Is this a common phenomenon? Is it consistent across cells?

(4) How can we mathematically formalize this analysis?

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>All of our examples today will use the Allen Brain Observatory 2-photon dataset. Before we get started, let's find the dataset and ensure we can access it:

<p>The main entry point is the `BrainObservatoryCache` class. This class is responsible for downloading any requested data or metadata as needed and storing it in well-known locations. For this workshop, all of the data has been preloaded into data assets on Code Ocean. These data are big, and this will save us a lot of bandwidth and time.

    
<p> We begin by importing the `BrainObservatoryCache` class and instantiating it. Here, `manifest_path` is a path to the manifest file. 
</div>

In [None]:
# Set file location based on platform. 
platstring = platform.platform()
if ('Darwin' in platstring) or ('macOS' in platstring):
    # macOS 
    data_root = "/Volumes/Brain2023/"
elif 'Windows'  in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif ('amzn' in platstring):
    # then on Code Ocean
    data_root = "/data/"
else:
    # then your own linux platform
    # EDIT location where you mounted hard drive
    data_root = "/media/$USERNAME/Brain2023/"

In [None]:
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

manifest_file = os.path.join(data_root,'allen-brain-observatory/visual-coding-2p/manifest.json')

boc = BrainObservatoryCache(manifest_file=manifest_file)

The `BrainObservatoryCache` object contains all of the functions you need to access Brain Observatory data. Recall from bootcamp that you can see the available functions for an object by pressing 'tab' after typing the period that follows the object's name.

In [None]:
#boc.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1 Do neurons in the mouse visual cortex encode direction?

To answer this question, we will use the Allen Brain Observatory dataset(s). These data include a battery of visual stimuli, with accompanying neural recordings from early visual areas. These include static gratings, drifting gratings, locally sparse noise, natural scenes, and some natural movies. The data also includes periods of no stimulus, which allow for "spontaneous" (or at least unstimulated) neural activity. 

![VisualStimuli.JPG](attachment:29e4246f-60bc-4c77-9182-1e62e2996208.JPG)




We can use the BrainObservatoryCache to see all the stimuli used:


In [None]:
boc.get_all_stimuli()

For today's workshop, though, our question is focused on direction selectivity. This means that it makes the most sense to use drifting gratings. Here, gratings are shown across the whole screen and drift across it in specific directions at specific temporal frequencies. The BrainObservatoryCache can rapidly give us an idea of how much data we might have to answer this question. Hint: it's a lot.

In [None]:
sessions_with_drifting_gratings = boc.get_ophys_experiments(stimuli=['drifting_gratings'])
len(sessions_with_drifting_gratings)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1.1 Record Neurons; Every recording is biased; it's important to know how.

Our goal here is to look at the activity of individual neurons. In this course, you will see data from two primary ways of doing this: in vivo electrophysiology (e.g., Neuropixels data) and 2-photon calcium imaging.

There is no one right way to record neural activity; each method has its pluses and minuses. Many of you may have an in-depth understanding of one or both of these methods, but let's think for a moment about how you might select a recording technique to tackle a particular scientific question.
    
We will discuss these methods briefly now, but the *Data Book* goes into more detail. While this may seem academic, choosing the right data set for the right question will be important to planning your projects!

## 1.1.1 In Vivo Electrophysiology 
In vivo electrophysiology is where you park an electrode next to a cell and record extracellular ion flow from each action potential. 

![Model of mouse v1_Barry Isralewitz_braincontest2021.jpg](attachment:a776080c-0557-4f70-90ef-d41d1ad6de8e.jpg)








<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


## 1.1.2 2-Photon Calcium imaging
Calcium imaging is an optical technique. Neurons are genetically modified to express GCaMP, a genetically encoded calcium indicator that attaches Green Fluorescent Protein (GFP) to calmodulin, a calcium-binding protein. As a result, cells fluoresce at a green wavelength when there is a calcium influx into the cell. Since calcium concentration spikes during action potentials, this allows for optical imaging of a given cell's activity.

Here is what this looks like when you stimulate a neuron and simultaneously record its electrophysiological (in this case, via patch clamp) and Ca2+ response (Stosiek et al. 2003, PNAS). Notice that there is a single, slow calcium transient for a much faster burst of spikes.

![EphysAndCa2.JPG](attachment:f367a0da-3fbc-4a48-a412-be0657695805.JPG)    
    
As a historical aside, GFP was discovered here, at Friday Harbor, from the fluorescent jellyfish *Aequorea victoria*.

![https://www.thoughtco.com/green-fluorescent-protein-facts-4153062](./resources/AequoreaVictoria.jpg)


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


## 1.2 Looking at Ca2+ data
For today's question, we want to characterize a population of neurons in visual cortex. Because we want to look at a large population of neurons in a defined spatial region, Ca2+ data makes sense to answer our questions. We are therefore going to dive deeper into data from the Allen Brain Observatory optical dataset. You will learn more about an analogous ephys dataset tomorrow!
    
    

As an aside: if you are wondering how brain areas in these recordings are identified, see the *Data Book* description of "Intrinsic Signal Imaging." This technique optically measures hemodynamics (effectively, changes in blood oxygenation) and can be used in conjunction with visual stimuli to identify visual areas on the surface of the brain.

![ISI.JPG](attachment:f5bdbb9c-83d6-45cd-bd33-380884daf8c9.JPG)



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Experiment containers</h3>
 
<p> Now we are ready to access some data!
    
<p>An experiment container describes a set of 3 experiment sessions performed at the same location (targeted area and imaging depth) in the same mouse that targets the same set of cells. Note that this targeting is not perfect- many cells will overlap between sessions, but not all. 

Each experiment container has a unique ID number. For now, let's just choose one to work with.
</div>

In [None]:
expt_container_id = 637998953

In [None]:
# Get the sessions in this experiment container
expt_session_info = boc.get_ophys_experiments(experiment_container_ids=[expt_container_id],)

# Convert to a pandas dataframe for easy access
expt_session_info_df = pd.DataFrame(expt_session_info)
expt_session_info_df

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>The Dataset Object</h3>
<p>Each session has a unique `session_id`. These can be used to query a `data_set` object, which contains information specific to that session.

</div>

In [None]:
# Find experiments that include the "drifting grating" stimulus in our chosen experiment container
expt_session_info = boc.get_ophys_experiments(experiment_container_ids=[expt_container_id],
                                             stimuli=['drifting_gratings'])

# Again, Convert to dataframe for easy access
session_info  = pd.DataFrame(expt_session_info)

# Load the data from our chosen session
session_id = session_info.id.iloc[0]
data_set = boc.get_ophys_experiment_data(ophys_experiment_id=session_id)

The dataset object has a selection of metadata associated with it. 

In [None]:
data_set.get_metadata()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1.2.1 Cre Lines, Cortical Depth, and other metadata.
    
The dataset metadata include a number of key details for understanding the experiment. 

Let's highlight some key metadata features. These are further addressed both in today's *Take Home Exercises* and in the *Data Book*.
    
<b>Cre Line:</b> Calcium imaging only works if a calcium indicator (here GCaMP) is expressed in cells. Mice are bred to express these indicators in particular cell types using a genetic toolkit based around the Cre-Lox system. A genetic line of mice is called a "Cre line," with each Cre line expressing GCaMP only in a certain set of genetically targeted cells. Ideally, a Cre line will target a particular cell type (though not all are perfect).
    
In this case, we have a Fezf2-Cre line. This will drive GCaMP expression in corticofugal excitatory neurons in layer 5.

<b>Imaging Depth:</b> Because Ca2+ imaging is an optical technique, recordings must be targeted to a specific focal depth of the microscope, corresponding to how deep in the tissue the images were collected. 
    
Here, we are relatively deep (400 µm) to see layer 5.

https://help.brain-map.org/display/observatory/Allen+Brain+Observatory+-+Overview#:~:text=Intrinsic%20signal%20imaging%20%28ISI%29%20measures%20hemodynamic%20response%20to,field%20to%20corresponding%20locations%20within%20responsive%20cortical%20areas.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1.2.2 DF/F Traces

Calcium imaging is a measurement of fluorescence, in this case the intensity of fluorescence at the soma of the cell. Because the amount of calcium in a cell is never 0 (and because all cells have some native fluorescence even without the indicator active), we cannot simply use the raw fluorescence signal. Instead, what we care about is the change in fluorescence relative to some baseline value, that is, $\Delta f/f$ (typically written df/f or simply dff). The *Data Book* contains more information on how the Allen Ca2+ data were processed.
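
To make the idea of "change relative to baseline" concrete, here is a minimal sketch on synthetic data. The baseline estimate used here (a low percentile of the whole trace) is an assumption chosen for illustration, and it is not the procedure used to process the Allen data; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic raw fluorescence: a nonzero baseline, some noise, and one transient.
n_frames = 300
raw_f = 100.0 + rng.normal(0.0, 1.0, n_frames)
raw_f[150:180] += 50.0 * np.exp(-np.arange(30) / 10.0)

# Hypothetical baseline estimate: a low percentile of the whole trace.
f0 = np.percentile(raw_f, 10)

# df/f is the change from baseline, expressed as a fraction of baseline.
dff = (raw_f - f0) / f0

print("Baseline estimate:", f0)
print("Peak df/f:", dff.max())
```

Because df/f is a ratio, it is dimensionless: a value of 0.5 means the fluorescence rose to 50% above baseline.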
    


In [None]:
# Get the df/f traces for every cell from this session. 
# "get_dff_traces" returns a tuple with two values, the timestamps and the dff_traces.
timestamps,dff_traces = data_set.get_dff_traces()

# Take a look at the size and shape
num_cells = dff_traces.shape[0]
print("Number of cells: " + str(num_cells))
print("Number of frames: " + str(len(timestamps)))



In [None]:
# Let's grab a particular cell
cell_specimen_id = 662191687
cell_idx = data_set.get_cell_specimen_indices([cell_specimen_id])[0]
cell_idx

In [None]:
# And plot its dff trace
fig,ax = plt.subplots()
ax.plot(timestamps,dff_traces[cell_idx,:])
ax.set_xlabel('Time (seconds)')
ax.set_ylabel('df/f')


Stimulus information is contained in tables as part of the dataset object

In [None]:
stim_table = data_set.get_stimulus_table("drifting_gratings")
stim_table.head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
<h3>Now we have data to work with!</h3>

This means we can start actually plotting some of the responses to specific stimuli.

Let's start by plotting df/f for a single trial. For comparison, we will grab a little bit of time (1 second) on either side of the two-second trial.

In [None]:

fig,ax = plt.subplots()
# Plot one trial worth of df/f
ax.plot(dff_traces[cell_idx,int(stim_table.start[0])-30:int(stim_table.end[0])+30])
ax.axvspan(30,90, color='gray', alpha=0.3) #this shades the period when the stimulus is being presented
ax.set_ylabel("df/f")
ax.set_xlabel("Frames");


Now let's grab a (more interesting) trial.

In [None]:
# Pick the index of the trial we want to plot
interesting_trial = 438

fig,ax = plt.subplots()
# Plot the df/f trace
ax.plot(dff_traces[cell_idx,stim_table.start[interesting_trial]-30:stim_table.end[interesting_trial]+30])
ax.axvspan(30,90, color='gray', alpha=0.3) #this shades the period when the stimulus is being presented
ax.set_ylabel("DF/F")
ax.set_xlabel("Frames")



We can filter the stim table to look for all the trials of this type.

In [None]:
stim_table.iloc[interesting_trial]

In [None]:
# Get info for this trial
intersting_freq= stim_table.iloc[interesting_trial].temporal_frequency
intersting_orientation= stim_table.iloc[interesting_trial].orientation

# And query stim table for trials with the same parameters
this_stim_table = stim_table[(stim_table.temporal_frequency==intersting_freq) & (stim_table.orientation==intersting_orientation)]
this_stim_table

In [None]:
len(this_stim_table)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
    <h3> Task 1 </h3>
    
Plot the df/f traces for this cell on all of the "interesting" trials.

In [None]:
fig,ax = plt.subplots()
ax.axvspan(30,90, color='gray', alpha=0.3) 
# THIS IS A SUPER HANDY WAY TO ITERATE THROUGH A TABLE!!!
for ii,row in this_stim_table.iterrows():
    # Your Code Here!
    continue


In [None]:
# hint:
row

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1.2.3 Event detection
Calcium signal is a continuous variable. We know, however, that most large influxes of calcium are associated with one or more spikes taking place in the neuron in question. These spikes are associated with a large influx of calcium into the cell, followed by a slow decay in the fluorescence signal. Importantly, this also presents a potential problem with calcium imaging; signals in the past contaminate signals in the future.
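
The way past signals contaminate future ones can be seen in a toy forward model. This sketch assumes that each event adds an instantaneous jump that then decays exponentially; the decay constant `tau` and the event times are made-up values for illustration, not properties of GCaMP or of these data.

```python
import numpy as np

frame_rate = 30.0  # Hz, the approximate imaging rate used in this notebook
tau = 0.5          # seconds; hypothetical decay constant for illustration

# Two unit-sized events, half a second (15 frames) apart.
n_frames = 120
events_toy = np.zeros(n_frames)
events_toy[30] = 1.0
events_toy[45] = 1.0

# Exponential decay kernel, sampled once per frame.
kernel = np.exp(-np.arange(n_frames) / (tau * frame_rate))

# Convolve events with the kernel and keep the first n_frames samples.
trace_toy = np.convolve(events_toy, kernel)[:n_frames]

print("Trace at first event: ", trace_toy[30])
print("Trace at second event:", trace_toy[45])
```

Both events have the same size, but the trace is larger at the second one because the first has not finished decaying. Undoing this mixing is the job of deconvolution.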

One solution to this problem is to deconvolve the calcium events from the time course of the df/f trace. Deconvolution is the process of separating two signals (in this case, the cell's response from the GCaMP and imaging dynamics). Deconvolution is a computationally hard and expensive problem. One such algorithm has already been run for us on these data. This algorithm makes some reasonable assumptions about the dynamics of df/f responses relative to calcium traces, and uses these to extract event times and event amplitudes from the df/f trace. An excellent description of this method is available at https://jewellsean.github.io/fast-spike-deconvolution/index.html. We will work with its outputs for the duration of this course.

A quick word on nomenclature: We don't actually know what caused a spike in calcium. It was most likely a single action potential or a burst of spikes in quick succession, but we did not explicitly observe these and do not know how many spikes there might have been. We therefore refer to calcium "events" as a source-agnostic way to describe this calcium activity.

Events for each session have been precomputed.

In [None]:
events = boc.get_ophys_experiment_events(ophys_experiment_id=session_id)

In [None]:
cell_idx

Each event is discrete, and has both a timestamp and an amplitude. Let's look at how these relate to the same df/f trace from before.

We know the frame rate for our data is about 30 Hz, and we know that stimuli were shown for 2 seconds. We can use this information to plot data centered around the trial start, with a buffer on each side.

In [None]:
fig,ax = plt.subplots()
ax.axvspan(30,90, color='gray', alpha=0.3) 

# Plot the DF/F trace
ax.plot(dff_traces[cell_idx,stim_table.start[interesting_trial]-30:stim_table.end[interesting_trial]+30])
ax.set_ylabel("DF/F")
ax.set_xlabel("Frames")

# Define a new axis that shares the X axis, but has a different Y
ax_ = ax.twinx()

# Plot the events 
trial_events = events[cell_idx,stim_table.start[interesting_trial]-30:stim_table.end[interesting_trial]+30]
ax_.stem(np.arange(0,len(trial_events)),# Stem requires a x axis input: just use 0:1:N
                 trial_events, 'g', markerfmt='g.', basefmt='g-')
ax_.set_ylabel("Event Magnitude")

Like df/f, event magnitudes are dimensionless units describing how much that event changed the Ca2+ signal. 

In [None]:
fig,ax = plt.subplots(figsize = (5,7))
ax.axvspan(30,90, color='gray', alpha=0.3) #this shades the period when the stimulus is being presented

# Since iterrows uses the dataframe index as an iterator, we need our own trial counter
trlidx = 0

for ii,row in this_stim_table.iterrows():
    trial_events = events[cell_idx,int(row.start)-30:int(row.end)+30]
    
    # The stem plot returns multiple plotting objects, so that we can interact with them separately.
    markerline, stemlines, baseline = ax.stem(np.arange(0,len(trial_events)),
            trial_events+trlidx,
            bottom = trlidx,markerfmt=' ',basefmt='-')
    
    # This line just makes sure the line and stems are the same color 
    plt.setp(stemlines, 'color', plt.getp(baseline,'color'))

    trlidx+=1 # Update trial counter
    
# Axis labels only need to be set once, after the loop
ax.set_ylabel("Trial+Event Amplitude")
ax.set_xlabel("Frames")

A quick note- here, we are approximating time by assuming a 30 Hz sampling rate. This isn't strictly true; if you look at the timestamps for a session, they do not always have consistent intervals. What is more, if we round stimulus presentation to the nearest timestamp we do not always get exactly the same trial length (some are 60 frames, some are 59). So, while this approximation is fine for what we are doing here today, it might not always be a safe thing to do! 
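
The variable trial length can be reproduced on synthetic data. In this sketch, the jittered timestamps and the stimulus onset times are invented for illustration, and `np.searchsorted` is used as one possible way to map a time in seconds onto a frame index.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic frame timestamps: nominally 30 Hz, with small jitter per frame.
n_frames = 3000
intervals = (1.0 / 30.0) + rng.normal(0.0, 1e-4, n_frames)
timestamps_toy = np.cumsum(intervals)

# Hypothetical stimulus onsets (seconds), each lasting 2 seconds.
onsets = np.array([10.0, 25.3, 61.7])
duration = 2.0

# First frame at or after each onset, and at or after each offset.
start_frames = np.searchsorted(timestamps_toy, onsets)
end_frames = np.searchsorted(timestamps_toy, onsets + duration)

trial_lengths = end_frames - start_frames
print("Frames per trial:", trial_lengths)
```

Depending on where each onset falls relative to the frame clock, a 2-second trial can span a slightly different number of frames, which is why fixed 60-frame windows are only an approximation.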



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1.3 Do neurons in the mouse visual cortex encode direction?


For each trial, let's get the orientation, temporal frequency, and response for this cell.

In [None]:
# Create arrays for orientation, temporal frequency, and response
# Initially these will just be zeros, but we will fill them in with a loop
orientation = np.zeros((len(stim_table)))
temp_freq = np.zeros((len(stim_table)))

# We now have two response variables, one for dF/F and one for events. Since we
# are mostly going to be analysing events, we won't put a qualifier on the name
# of this variable.
dff_response = np.zeros((len(stim_table),60))
response = np.zeros((len(stim_table),60))

# Loop through each stimulus presentation, store its parameters
for ii in range(len(stim_table)):
    orientation[ii] = stim_table.orientation[ii]
    temp_freq[ii] = stim_table.temporal_frequency[ii]
    dff_response[ii,:] = dff_traces[cell_idx,stim_table.start[ii]:stim_table.start[ii]+60]
    response[ii,:] = events[cell_idx,stim_table.start[ii]:stim_table.start[ii]+60]

In [None]:
# The response matrix here is a 2D array, with each row representing the
# response on a single trial.
response.shape

For now, we can subsample only to the temporal frequency we were looking at before.

In [None]:
orientation = orientation[temp_freq==intersting_freq]
dff_response = dff_response[temp_freq==intersting_freq,:]
response = response[temp_freq==intersting_freq,:]

The `unique` command is useful here. Not only will it tell us the list of unique stimuli shown, but it can also provide an (optional) "inverse" variable that can be used to reconstruct the original array. Let's take a look at both outputs.
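
As a warm-up on a tiny made-up array (the values below are not from the dataset), this sketch shows what `np.unique` returns when `return_inverse=True`, and how sorting on the inverse groups trials by stimulus.

```python
import numpy as np

# Hypothetical direction shown on each of six trials.
toy_directions = np.array([90.0, 0.0, 270.0, 90.0, 0.0, 180.0])

# Sorted unique values, plus the index into them for every original entry.
toy_unique, toy_inverse = np.unique(toy_directions, return_inverse=True)
print(toy_unique)    # the distinct directions, in ascending order
print(toy_inverse)   # which distinct direction each trial used

# Indexing the unique values with the inverse rebuilds the original array.
toy_rebuilt = toy_unique[toy_inverse]

# A stable sort on the inverse groups trials by direction, keeping trial order.
toy_order = np.argsort(toy_inverse, kind="stable")
print(toy_directions[toy_order])
```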

In [None]:
orientations,ix = np.unique(orientation,return_inverse=True)
print(orientations)


In [None]:
# The variable 'ix' maps the values in 'orientation' to the values in
# 'orientations'
print(ix[1:10])
print(orientation[1:10])
print(orientations[ix[1:10]])

In [None]:
# sorting on the inverse index will put the trials in order of orientation
sort_order = np.argsort(ix)

fig,ax = plt.subplots(figsize=(7,10))
# Plot the dff matrix
ax.imshow(dff_response[sort_order,:],aspect='auto')
ax.set_ylabel('Stimulus direction')
ax.set_xlabel('time (frames from stim onset)')

# The unique command can also return the first time a unique value occurs.
# Here, we will use that to denote the blocks of stimuli in our sorted plot
_,first_instance = np.unique(np.sort(ix),return_index=True)
ax.set_yticks(first_instance)
ax.set_yticklabels(orientations);

In [None]:
_,first_instance = np.unique(np.sort(ix),return_index=True)


Before we get too far, we are going to divide the data in half. Don't worry too much about this right now - eventually it will allow us to quantify some of the variability in our data, but we will talk about this at length later.

In [None]:

divider = len(orientation)//2
orientation_a = orientation[:divider]
orientation_b = orientation[divider:]
temp_freq_a = temp_freq[:divider]
temp_freq_b = temp_freq[divider:]
response_a = response[:divider,:]
response_b = response[divider:,:]
ix_a = ix[:divider]
ix_b = ix[divider:]


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
    <h3> Task 2 </h3>
    
     Plot the sorted events for the first and second halves of the session

In [None]:
fig,ax = plt.subplots(ncols=2,figsize = (10,10))

# We keep our own trial counter so that each trial is drawn on its own row
trlidx = 0

sort_order = np.argsort(ix_a)
for ii,ss in enumerate(sort_order):
    markerline, stemlines, baseline = ax[0].stem(np.arange(0,60),
            3*response_a[ss,:]+trlidx,
            bottom = trlidx,markerfmt=' ',basefmt='-')
    plt.setp(stemlines, 'color', plt.getp(baseline,'color')) # This line just makes sure the line and stems are the same color 
    trlidx+=1 # Update trial counter
    
ax[0].set_ylabel("Trial+Event Amplitude")
ax[0].set_xlabel("Frames")
ax[0].set_title("1st half")

# Denote the blocks of stimuli in our sorted plot
_,first_instance = np.unique(np.sort(ix_a),return_index=True)
ax[0].set_yticks(first_instance)
ax[0].set_yticklabels(orientations);

# Add your code for the second half of the session here


It is going to be easier to look at each response as a single number, rather than a 2-second time series. 
For this, if we were working with df/f traces, one option would be to take the average of the full 2 second window. Because all our time windows here are the same length, this will be proportional to the area under each response curve.

Even though we are looking at events, we can still average or sum the event magnitudes over the same window. Our response magnitude will now be measured as an average (giving an event magnitude rate) or a sum (giving a total/cumulative event magnitude).
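
Because every window has the same length, the two summaries differ only by a constant factor. A quick check on made-up sparse "event" data (random values, not real events) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical events: 5 trials x 60 frames, mostly zeros.
n_trials, n_window = 5, 60
magnitudes = rng.random((n_trials, n_window))
is_event = rng.random((n_trials, n_window)) > 0.9
toy_events = magnitudes * is_event

# One number per trial: the mean or the sum over the window.
window_mean = toy_events.mean(axis=1)
window_sum = toy_events.sum(axis=1)

# With equal-length windows, the sum is just the mean times the window length.
print(np.allclose(window_sum, window_mean * n_window))
```

This means the shape of any tuning curve is the same whichever summary is used; only the units on the y axis change.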

In [None]:
mean_response_a = response_a.mean(axis =1)
mean_response_b = response_b.mean(axis =1)

In [None]:
print(mean_response_a.shape)
print(mean_response_b.shape)

In [None]:
fig,ax = plt.subplots()
ax.plot(orientation_a, mean_response_a, 'o')
ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16)

In [None]:
tuning_a = np.zeros(orientations.shape)
for ii in range(orientations.shape[0]):
    tuning_a[ii] = mean_response_a[ix_a==ii].mean()
    
fig,ax = plt.subplots()
ax.plot(orientations,tuning_a, 'o-')
ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 1.4 Descriptive statistics
    
What we have done thus far is describe the response of our cell in terms of its average response to each grating direction. 
    
It's worth noting that a common way of analysing this kind of data is to define a single descriptive statistic.

In this case, a popular choice is the *Direction selectivity index* or *DSI*. This is an index that compares the response of the cell to its preferred direction, $R_{pref}$, with its response to the opposing direction, $R_{null}$:
    
$$DSI = \frac{R_{pref}-R_{null}}{R_{pref}+R_{null}}$$
    
For event data (where events must have values >=0), DSI is bounded between 0 and 1, where 0 would mean a cell had no selectivity and 1 would mean that selectivity is entirely tuned in one direction.
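
As a worked example of the formula, consider a made-up tuning curve over eight directions (these numbers are invented for illustration and are not from the dataset):

```python
import numpy as np

# Hypothetical mean response at each of eight directions (degrees).
toy_dirs = np.arange(0, 360, 45, dtype=float)
toy_tuning = np.array([0.1, 0.2, 0.8, 0.2, 0.1, 0.1, 0.2, 0.1])

# Preferred direction: wherever the response is largest (here, 90 degrees).
pref_idx = int(np.argmax(toy_tuning))
r_pref = toy_tuning[pref_idx]

# Null direction: 180 degrees away, wrapping around 360 (here, 270 degrees).
null_dir = (toy_dirs[pref_idx] + 180) % 360
null_idx = int(np.where(toy_dirs == null_dir)[0][0])
r_null = toy_tuning[null_idx]

toy_dsi = (r_pref - r_null) / (r_pref + r_null)
print("Preferred:", toy_dirs[pref_idx], "Null:", null_dir, "DSI:", toy_dsi)
```

Here $R_{pref}=0.8$ and $R_{null}=0.2$, so $DSI = (0.8-0.2)/(0.8+0.2) = 0.6$.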


In [None]:
# Define a function that computes DSI
def dsi(ori_vals,tuning):
    """
    Computes the direction selectivity of a cell. 
    De Vries 2019

    Parameters
    ----------
    ori_vals : float array of length N
         List of orientation values, in degrees
    tuning : float array of length N
        Each value the (averaged) response of the cell at a different
        orientation, in the same order as ori_vals

    Returns
    -------
    dsi : float
        Direction selectivity Index
    """
    # Find the index of the max response
    pref_idx = np.argmax(tuning) 
    # Find the preferred direction
    pref_dir = ori_vals[pref_idx] 
    # and preferred response.
    R_pref = tuning[pref_idx]
    
    # Null direction is opposing, so 180 degrees from preferred
    # Here the % sign is the modulus, so the direction wraps around 360.
    null_dir = (pref_dir +180)%360 
    # Find the index of the null direction
    null_idx = list(ori_vals).index(null_dir)
    # Find the null response
    R_null = tuning[null_idx]
    
    return (R_pref-R_null)/(R_pref+R_null)

In [None]:
dsi(orientations,tuning_a)

What does this selectivity index mean? On its own, it is difficult to interpret. We will come back to that later in the workshop, and you can dive deeper into it in the "Take Home" Exercise if you like.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 2 How reliable are these responses?


We can use our tuning curve as a simple model of the receptive field if we use it like a look up table: for each stimulus direction, we know what the mean response of the cell will be. Using the index established above, we can query the tuning curve for the expected response amplitude of the cell.
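
The look-up itself is a single indexing operation. This sketch uses a made-up three-direction tuning curve and trial sequence to show how indexing a tuning curve with a per-trial stimulus index yields one prediction per trial:

```python
import numpy as np

# Hypothetical mean response for each of three directions.
toy_tuning = np.array([0.1, 0.5, 0.2])

# Hypothetical direction index (0, 1, or 2) shown on each of five trials.
toy_ix = np.array([2, 0, 1, 1, 0])

# Indexing the tuning curve with the trial indices looks up one value per trial.
toy_prediction = toy_tuning[toy_ix]
print(toy_prediction)
```

Trials that share a direction always receive the same predicted value, because the table stores only one number per direction.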

In [None]:
prediction_aa = tuning_a[ix_a]

In [None]:
fig,ax = plt.subplots()
ax.plot(orientation_a,prediction_aa, '.')
ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Predicted Event Rate", fontsize=16);

# Notice that even though there are multiple trials at each direction, 
# they all get the same prediction.

One method for quantifying how well a prediction does is to simply compute the prediction error, that is, how far off the prediction was from the observed value. These are also called the model residuals.

In [None]:
residual = prediction_aa- mean_response_a

fig,ax = plt.subplots()
ax.hist(residual,bins=50);
ax.set_xlabel('Trial Residuals (event magnitude)')
ax.set_ylabel('Trial Count');

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
By itself, though, simply computing the residual error on every point can be difficult to interpret.
    
One popular metric for assessing this kind of model is the "coefficient of determination," or $R^2$. $R^2$ compares the sum of squared residuals in our model, $SS_{residuals}$, to the total sum of squares, $SS_{total}$, effectively asking "How much of the variance in our data is explained by our model?" $R^2$ is defined as:
    
 $$R^2 = 1-\frac{SS_{residuals}}{SS_{total}}$$

Subtracting this ratio from 1 lends interpretability. A perfect model has zero residuals, and therefore $R^2=1$. A model that explains no variance (one that does no better than always predicting the mean) has $R^2=0$. Values in between can be interpreted as the "fraction of variance explained."
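
To build intuition for the two ends of this scale, here is a small sketch on made-up numbers. The helper `toy_r2` is a hypothetical name used only for this illustration.

```python
import numpy as np

def toy_r2(observed, predicted):
    """R^2 = 1 - SS_residuals / SS_total, for two equal-length arrays."""
    ss_residuals = np.sum((observed - predicted) ** 2)
    ss_total = np.sum((observed - observed.mean()) ** 2)
    return 1 - ss_residuals / ss_total

toy_observed = np.array([1.0, 2.0, 3.0, 4.0])

# A perfect prediction, a mean-only prediction, and a badly wrong prediction.
r2_perfect = toy_r2(toy_observed, toy_observed.copy())
r2_mean_only = toy_r2(toy_observed, np.full(4, toy_observed.mean()))
r2_reversed = toy_r2(toy_observed, toy_observed[::-1])

print(r2_perfect, r2_mean_only, r2_reversed)
```

The last case is worth noting: a prediction that does worse than simply guessing the mean gives a negative $R^2$. This cannot happen for a tuning curve evaluated on the data it was averaged from, but it can happen on new data.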

Time to try it!

In [None]:
residuals_aa  =  np.mean((residual)**2)
residuals_aa

In [None]:
total_variance_aa = np.mean((mean_response_a-np.mean(mean_response_a))**2)
total_variance_aa

In [None]:
r2_aa = 1-residuals_aa/total_variance_aa
r2_aa

In [None]:
stdev_a = np.zeros(orientations.shape)
for ii in range(orientations.shape[0]):
    stdev_a[ii] = mean_response_a[ix_a==ii].std()

fig,ax = plt.subplots()
ax.errorbar(orientations,tuning_a,stdev_a)
ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16);

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
  
## 2.1 Splitting and Validation
    
You will probably have noticed that we split our data in half before we fit our initial model.
    
This is because, up to this point, we have been building our model on the same data we are using to quantify its predictive power. What we really want to know, though, is how well this model can predict data that it has never seen before. Splitting the data early on gives us an additional set of trials with which to test these model predictions.
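
The difference between in-sample and held-out performance can be sketched on simulated data. Everything here is an assumption for illustration: the "true" tuning curve, the Gaussian noise (which, unlike real events, can go negative), and the helper name `toy_r2`.

```python
import numpy as np

rng = np.random.default_rng(3)

def toy_r2(observed, predicted):
    """R^2 = 1 - SS_residuals / SS_total."""
    ss_residuals = np.sum((observed - predicted) ** 2)
    ss_total = np.sum((observed - observed.mean()) ** 2)
    return 1 - ss_residuals / ss_total

# Simulated cell: four directions, 80 trials, noisy responses.
true_tuning = np.array([0.1, 0.1, 0.6, 0.1])
ix_toy = np.tile(np.arange(4), 20)
resp_toy = true_tuning[ix_toy] + rng.normal(0.0, 0.2, ix_toy.size)

# Split in half: fit on the first half, test on the second.
half = ix_toy.size // 2
ix_train, ix_test = ix_toy[:half], ix_toy[half:]
resp_train, resp_test = resp_toy[:half], resp_toy[half:]

# "Fitting" is just averaging the training responses for each direction.
fit_tuning = np.array([resp_train[ix_train == k].mean() for k in range(4)])

r2_train = toy_r2(resp_train, fit_tuning[ix_train])
r2_test = toy_r2(resp_test, fit_tuning[ix_test])
print("R2 on training half:", r2_train)
print("R2 on held-out half:", r2_test)
```

The training score is computed on the same trials that set the averages, so it partly reflects noise the model has absorbed. The held-out score is usually the more honest estimate of predictive power.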
    

Look at the data from both parts of the split.

In [None]:
fig,ax = plt.subplots()
ax.scatter(orientation_a, mean_response_a,label = 'A')
ax.scatter(orientation_b, mean_response_b,label = 'B')

ax.set_xticks(orientations)
ax.legend()
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16)

We can make predictions about our 'testing' data using the same indexing technique.

In [None]:
prediction_ab = tuning_a[ix_b]

Computing $R^2$ is something that we are likely going to want to do many times. It is therefore helpful to write a function that tackles what we did earlier.

In [None]:
def compute_coefficient_of_determination(mean_response,prediction):
    """
    Compute the coefficient of determination (R^2) for a set of data
    
    Parameters
    ----------
    mean_response : np.array
        The mean response for each stimulus
    prediction : np.array
        The predicted response for each stimulus
    
    Returns
    -------
    r2 : float
        The coefficient of determination
    """

    # Compute SS residuals
    residual = mean_response-prediction
    sum_of_squared_residual  =  np.mean((residual)**2)
    
    # Compute Total SS
    total = np.mean(mean_response)-mean_response
    sum_of_squared_total  =  np.mean((total)**2)
    
    # And finally R2
    r2 = 1-sum_of_squared_residual/sum_of_squared_total
    return r2

In [None]:
# How well does a model fit on A predict B?
r2_ab = compute_coefficient_of_determination(mean_response_b,prediction_ab)
r2_ab

In [None]:
# How well did this same model do at predicting itself?
r2_aa

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
 
  <p>  
So, our model did pretty well (63% of variance explained) at predicting the data it was trained on, but less well (only 16% of variance explained) on held-out data. 
    </p>
<p>   
In this case, we split our data in half in time. This means either (a) the cell is noisy and not particularly consistent across trials, or (b) the cell's responses were not consistent between the first and second halves of the session. Later in the workshop, we will look further at which of these we think is true.
    </p>   
    
In the meantime (and to get a better sense of what these numbers mean), let's try looking at another cell.

Here, we repeat what we did above, but this time use an additional for-loop to compute the response for every cell on every trial. This will allow us to query a table of cell responses, and save us from having to redo this math every time we want to look at additional cells.

In [None]:
# Orientation and temp freq won't change from before
#orientation = np.zeros((len(stim_table)))
#temp_freq = np.zeros((len(stim_table)))
mean_response_all = np.zeros((len(stim_table),num_cells))
for ii in range(len(stim_table)):
    #orientation[ii] = stim_table.orientation[ii]
    #temp_freq[ii] = stim_table.temporal_frequency[ii]
    for cc in range(num_cells):
        this_response =  events[cc,stim_table.start[ii]:stim_table.start[ii]+60]
        mean_response_all[ii,cc] = this_response.mean()

# re-filter for the temp_freq = 1
mean_response_all = mean_response_all[temp_freq==intersting_freq,:]
#orientation = orientation[temp_freq==intersting_freq]
#temp_freq = temp_freq[temp_freq==intersting_freq]

# Split the mean response
divider= len(mean_response_all)//2      
mean_response_all_a = mean_response_all[:divider,:]
mean_response_all_b = mean_response_all[divider:,:]


Similarly, we probably want a function that does the math on computing tuning curves:

In [None]:
def compute_tuning_curve(mean_response,orientation,orientations):
    """
    Compute the tuning curve for a set of responses and orientations
    
    Parameters
    ----------
    mean_response : np.array
        The mean response for each stimulus
    orientation : np.array
        Orientation of each stimulus
    orientations: np.array
        All orientations to compute tuning over
        Useful when a subset of orientations are needed.
    
    Returns
    -------
    tuning : np.array
        mean response at each orientation
    stdev: np.array 
        Standard deviation of responses at each orientation
    """
    tuning = np.zeros(orientations.shape)
    stdev = np.zeros(orientations.shape)
    for ii,ori in enumerate(orientations):
        tuning[ii] = mean_response[orientation==ori].mean()
        stdev[ii] = mean_response[orientation==ori].std()
    return tuning,stdev
    

Let's grab a different cell.

In [None]:
another_idx = 59

In [None]:

this_tuning_a,this_stdev_a = compute_tuning_curve(mean_response_all_a[:,another_idx],orientation_a,orientations) 

fig,ax = plt.subplots()
ax.plot(orientations,this_tuning_a, 'o-')

ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16)

What is the direction selectivity index (DSI) for this cell?

In [None]:
dsi(orientations,this_tuning_a)

It looks like this cell has a sharp tuning peak at 180 degrees. 

How much variance does our model explain?

In [None]:
compute_coefficient_of_determination(mean_response_all_a[:,another_idx],this_tuning_a[ix_a])

In [None]:
fig,ax = plt.subplots()

ax.errorbar(orientations,this_tuning_a,this_stdev_a,)
ax.scatter(orientation_a,mean_response_all_a[:,another_idx])

ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Compare this figure to the one we made for the last cell. Before, the amplitude of the cell's response might have been a bit smaller, but it still seemed to have a preferred direction. We quantified this by splitting our data and showing that a model fit on the first half of the data did reasonably well on the second half.
    
What happens if we compute $R^2$ on our held out data?
    

In [None]:
compute_coefficient_of_determination(mean_response_all_b[:,another_idx],this_tuning_a[ix_b])


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
Here, a negative $R^2$ value means that not only did we explain none of the variance in the second half of our data, but our model did worse than if we had just guessed the mean of these data. This is a pretty clear sign that we have built a bad model.
    
To visualize this, we should look at the responses for each stimulus.

In [None]:
fig,ax = plt.subplots()
ax.errorbar(orientations,this_tuning_a,this_stdev_a,label = '1st half')
ax.scatter(orientation_a,mean_response_all_a[:,another_idx])

this_tuning_b,this_stdev_b = compute_tuning_curve(mean_response_all_b[:,another_idx],orientation_b,orientations) 

ax.errorbar(orientations,this_tuning_b,this_stdev_b,label = '2nd half')
ax.scatter(orientation_b,mean_response_all_b[:,another_idx])
ax.set_xticks(orientations)
ax.set_xlim(-10,325)
ax.set_xlabel("Direction", fontsize=16)
ax.set_ylabel("Trial Event Rate", fontsize=16)
ax.legend()


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

The moral of this story is that a 'peak' in a tuning curve DOES NOT necessarily mean the cell is reliably tuned! 
    
What is more, simply plotting or quantifying tuning curves would have failed to diagnose this problem, since our second cell appeared to have a large peak in its response curve.
    
We call this technique for assessing models "data splitting". The segment of the data we built our model with is called the "training set," and the set of data we tested with is called the "testing set." 
    
Next, we will use this technique to scale up our analysis and look at every cell in this session.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 3 What do responses look like in a neural population?

In [None]:

many_r2_train = np.zeros(num_cells)
many_r2_test = np.zeros(num_cells)

tuning_train = np.zeros((num_cells,orientations.shape[0]))
stdev_train = np.zeros((num_cells,orientations.shape[0]))

tuning_test = np.zeros((num_cells,orientations.shape[0]))
stdev_test = np.zeros((num_cells,orientations.shape[0]))

for cc in range(num_cells):

    tuning_train[cc,:],stdev_train[cc,:] = compute_tuning_curve(mean_response_all_a[:,cc],orientation_a,orientations)
    tuning_test[cc,:],stdev_test[cc,:] = compute_tuning_curve(mean_response_all_b[:,cc],orientation_b,orientations)

    this_prediction_train = tuning_train[cc,ix_a]
    this_prediction_test = tuning_train[cc,ix_b]

    many_r2_train[cc] = compute_coefficient_of_determination(mean_response_all_a[:,cc],this_prediction_train)
    many_r2_test[cc] = compute_coefficient_of_determination(mean_response_all_b[:,cc],this_prediction_test)
    

In [None]:
fig,ax = plt.subplots()
ax.hist(many_r2_train,bins=20);
ax.set_xlabel('$R^2$')
ax.set_ylabel('Number of cells')
ax.set_title('$R^2$ distribution on training set');


In [None]:

fig,ax = plt.subplots()

# r2 may not always be defined on the test set. This can happen when no events
# occurred, and our math tells us we have "infinite" error.
# We'll set these to nan (a marker of missing data) so they don't mess up our
# analysis.
many_r2_test[np.isinf(many_r2_test)]=np.nan

ax.hist(many_r2_test,bins=50);
ax.set_xlabel('$R^2$')
ax.set_ylabel('Number of cells')
ax.set_title('$R^2$ distribution on test set');
# Yikes!

In [None]:
fig,ax=plt.subplots(figsize=(5,5))
ax.scatter(many_r2_train,many_r2_test,label = 'All')
ax.scatter(r2_aa,r2_ab,c='r',s=100,label = 'Reliable Example')
ax.scatter(many_r2_train[another_idx],many_r2_test[another_idx],c='y',s=100,label = 'Not So Reliable Example')

ax.plot([-1,1],[-1,1],'k--')
ax.plot([-1,1],[0,0],'k')
ax.plot([0,0],[-1,1],'k')
ax.set_xlabel('R2 for first half')
ax.set_ylabel('R2 for second half')
ax.set_xlim(-1,1) 
ax.set_ylim(-1,1)
ax.set_aspect('equal')

ax.legend()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
    <h3> Task 3 </h3>
    
Where did the yellow dot for our "bad" cell go? Adjust the figure above to see if you can find it.

This kind of analysis can also be useful for exploring larger datasets in a structured way. 

We can, for example, sort our cells based on how well our model performed on the test set.

In [None]:
# sort models from worst to best performance on the test set
order = np.argsort(many_r2_test)

fig,ax = plt.subplots(ncols=3,nrows=  2,figsize=(15,5))
ax[0,0].errorbar(orientations, tuning_train[order[-1]],stdev_train[order[-1]],)
ax[0,0].scatter(orientation_a,mean_response_all_a[:,order[-1]])
ax[0,0].errorbar(orientations, tuning_test[order[-1]],stdev_test[order[-1]],)
ax[0,0].scatter(orientation_b,mean_response_all_b[:,order[-1]])
ax[0,0].set_title('1st Most Reliable Direction Tuning')

ax[0,1].errorbar(orientations, tuning_train[order[-2]],stdev_train[order[-2]],)
ax[0,1].scatter(orientation_a,mean_response_all_a[:,order[-2]])
ax[0,1].errorbar(orientations, tuning_test[order[-2]],stdev_test[order[-2]],)
ax[0,1].scatter(orientation_b,mean_response_all_b[:,order[-2]])
ax[0,1].set_title('2nd Most Reliable Direction Tuning')

ax[0,2].errorbar(orientations, tuning_train[order[-3]],stdev_train[order[-3]],)
ax[0,2].scatter(orientation_a,mean_response_all_a[:,order[-3]])
ax[0,2].errorbar(orientations, tuning_test[order[-3]],stdev_test[order[-3]],)
ax[0,2].scatter(orientation_b,mean_response_all_b[:,order[-3]])

ax[0,2].set_title('3rd Most Reliable Direction Tuning')


ax[1,0].errorbar(orientations, tuning_train[order[0]],stdev_train[order[0]],)
ax[1,0].scatter(orientation_a,mean_response_all_a[:,order[0]])
ax[1,0].errorbar(orientations, tuning_test[order[0]],stdev_test[order[0]],)
ax[1,0].scatter(orientation_b,mean_response_all_b[:,order[0]])
ax[1,0].set_title('1st Least Reliable Direction Tuning')


ax[1,1].errorbar(orientations, tuning_train[order[1]],stdev_train[order[1]],)
ax[1,1].scatter(orientation_a,mean_response_all_a[:,order[1]])
ax[1,1].errorbar(orientations, tuning_test[order[1]],stdev_test[order[1]],)
ax[1,1].scatter(orientation_b,mean_response_all_b[:,order[1]])
ax[1,1].set_title('2nd Least Reliable Direction Tuning')

ax[1,2].errorbar(orientations, tuning_train[order[2]],stdev_train[order[2]],)
ax[1,2].scatter(orientation_a,mean_response_all_a[:,order[2]])

ax[1,2].errorbar(orientations, tuning_test[order[2]],stdev_test[order[2]],)
ax[1,2].scatter(orientation_b,mean_response_all_b[:,order[2]])

ax[1,2].set_title('3rd Least Reliable Direction Tuning')

for ii,x in enumerate(ax.flatten()):
    x.set_xticks(orientations)
    x.set_xlim(-10,325)
    x.set_xlabel("Direction", fontsize=16)
    x.set_ylabel("Trial Event Rate", fontsize=16)
    


fig.set_tight_layout(True)


It's worth returning to the direction selectivity statistic from before. By plotting the direction selectivity against the reliability metric we just computed, we can see how the descriptive statistic maps onto cell reliability.

In [None]:
dir_sel_idx= np.zeros(num_cells)
for ii in range(num_cells):
    dir_sel_idx[ii] = dsi(orientations,tuning_train[ii,:])

In [None]:
fig,ax = plt.subplots()
ax.hist(dir_sel_idx,20)
ax.axvline(dir_sel_idx[cell_idx],c = 'r',label = 'More Reliable Cell')
ax.axvline(dir_sel_idx[another_idx],c = 'y',label = 'Less Reliable Cell')
ax.legend()
ax.set_xlabel('DSI')
ax.set_ylabel('# Cells')

In [None]:
fig,ax = plt.subplots()
ax.scatter(dir_sel_idx,many_r2_test)
ax.scatter(dir_sel_idx[cell_idx],many_r2_test[cell_idx],c= 'r')

ax.set_xlabel('Direction selectivity index')
ax.set_ylabel('R2 for second half')
ax.set_xlim(-.1,1.1) # The lower bound here is arbitrary...
ax.set_ylim(-3,1)


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

## 3.1 There is more than one way to split your data.
Up till now, all our analysis has focused on data that were split into the first and second halves of the session. 

How, exactly, you split your data can have a profound impact on the question you are asking. Here, this means that our question wasn't just about the consistency of data; it was about the stability of neural responses between the first and second half of the session. 

One valid alternative would have been to grab random halves. Another might have been to grab alternating trials under each experiment condition. All of these would have told us something about neural consistency, but each would have asked subtly different questions about our data and each would have made subtly different assumptions going in. There is no one answer for the 'best' way to split your data. Instead, you will always need to formulate a split to answer your question at hand.
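To make the three options concrete, here is a minimal sketch of each split applied to trial indices. The condition labels are made up for illustration, and the variable names (`time_a`, `random_a`, `alt_a`, and so on) are hypothetical rather than anything used elsewhere in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the random split is repeatable

# Made-up condition labels for 12 trials (three directions, four trials each)
toy_orientation = np.array([0, 90, 180, 0, 90, 180, 0, 90, 180, 0, 90, 180])
n_trials = len(toy_orientation)
trial_ids = np.arange(n_trials)

# (1) Split in time: first half vs. second half of the session
time_a, time_b = trial_ids[:n_trials // 2], trial_ids[n_trials // 2:]

# (2) Random halves: shuffle the trial indices, then cut in the middle
shuffled = rng.permutation(trial_ids)
random_a = np.sort(shuffled[:n_trials // 2])
random_b = np.sort(shuffled[n_trials // 2:])

# (3) Alternating trials within each condition: the 1st, 3rd, ... presentation
# of each direction goes to A, and the 2nd, 4th, ... goes to B
alt_a, alt_b = [], []
for ori in np.unique(toy_orientation):
    these_trials = trial_ids[toy_orientation == ori]
    alt_a.extend(these_trials[0::2])
    alt_b.extend(these_trials[1::2])
alt_a, alt_b = np.sort(alt_a), np.sort(alt_b)

print(time_a, time_b)
print(random_a, random_b)
print(alt_a, alt_b)
```

Only the split in time keeps the two halves separated in time. The random split mixes early and late trials, and the alternating split interleaves them on purpose while also guaranteeing every condition is represented equally in both halves.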

## 3.1.1 You can use careful data splits to answer scientific questions.

So far, we have thought about variability as a problem. Now, though, let's try a strategic split across experimental conditions to see if we can learn something.
    
In this case, we are going to look at how an animal's running might impact the tuning of a cell.

We can get the running speed from the dataset information. 

In [None]:
# Load running data
run_speed,run_ts = data_set.get_running_speed()

In [None]:
# Plot
fig,ax = plt.subplots()
ax.plot(run_ts,run_speed)
ax.set_xlabel('time (s)')
ax.set_ylabel('speed (cm/s)') 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
    
    
<h3> Task 4 </h3>
    
    <p>
There is clearly a problem with this running trace. A little bit of negative running is normal (mice can walk backwards!), but the trace has one point at -545 cm/s. This would mean the mouse instantaneously moved backwards at 12 mph (20 kph) for 1/30 of a second... which seems more likely to be a problem with the data than anything else.
    
<p>
We often use a "nan" value to represent missing data. "nan" stands for "Not A Number." This is safer than, say, setting data to some arbitrary value that might throw off computations down the road. Your task: set the spurious data point to equal "np.nan" and replot the data.
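If you have not worked with "nan" before, the sketch below shows how it behaves on a made-up speed trace. The `toy_speed` values and the -100 cutoff are arbitrary choices for illustration, not a recommendation for the real data.

```python
import numpy as np

# A made-up speed trace (cm/s) with one obviously spurious sample
toy_speed = np.array([2.0, 3.0, -500.0, 4.0, 3.0])

# Copy first so the original array is left untouched, then mark the bad sample
clean_speed = toy_speed.copy()
clean_speed[clean_speed < -100] = np.nan  # -100 is an arbitrary toy cutoff

# Ordinary reductions propagate nan; the nan-aware versions skip it
plain_mean = clean_speed.mean()
safe_mean = np.nanmean(clean_speed)
print(plain_mean, safe_mean)  # nan 3.0
```

One thing to keep in mind: any later average that includes a "nan" sample with a plain `.mean()` will itself come out as "nan", so nan-aware functions such as `np.nanmean` are one option when summarizing data that has been cleaned this way.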

In [None]:
# Solution


Running has the same frame sampling as the stimulus table, so we can get the average running speed for each trial.

In [None]:
# Average running speed across the trial
trial_running_speed = np.zeros(len(stim_table))
for ii,row in stim_table.iterrows():
    trial_running_speed[ii] = run_speed[int(row.start):int(row.end)].mean()


In [None]:
# Look at the distribution of running 
fig,ax = plt.subplots()
ax.hist(trial_running_speed,100)
ax.set_ylabel('# of trials')
ax.set_xlabel('speed (cm/s)')

# Hint: play around with ax.semilogy and ax.semilogx to get a better sense of the data.
# This can be useful with long tails!

Let's set a threshold on speed. Trials above it will be "running" and trials at or below it will be "stopped."

In [None]:
threshold = 1
stopped = trial_running_speed<=threshold
running = trial_running_speed>threshold

We will need to recompute some of the variables above, since we filtered our data earlier.

In [None]:
orientation = np.zeros((len(stim_table)))
temp_freq = np.zeros((len(stim_table)))
response = np.zeros((len(stim_table),60))
mean_response_all = np.zeros((len(stim_table),num_cells))
for ii in range(len(stim_table)):
    orientation[ii] = stim_table.orientation[ii]
    temp_freq[ii] = stim_table.temporal_frequency[ii]
    for cc in range(num_cells):
        this_response =  events[cc,stim_table.start[ii]:stim_table.start[ii]+60]
        mean_response_all[ii,cc] = this_response.mean()



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
 
    <h2> Saving data </h2>
  <p>  
A quick aside: We just recomputed some things we had already computed before. Generally speaking, this is a waste of time for both you and your computer. It would have been better to save the data upfront.
    </p>
    
    <p>
So that we don't need to do this again, let's save our data somewhere. On Code Ocean, we will use the "scratch" folder so that we can access it again if/when we restart our capsule, or if we want to get to it outside this notebook.
        

In [None]:
save_loc = os.path.join('/scratch','Workshop1')

# Make the folder if it does not yet exist.
# With exist_ok=True, nothing happens if the directory is already there,
# so we have the directory we need and can move on either way.
os.makedirs(save_loc, exist_ok=True)
    
# actually save stuff!!
np.save(os.path.join(save_loc,str(session_id)+'_orientation.npy'),orientation)
np.save(os.path.join(save_loc,str(session_id)+'_temp_freq.npy'),temp_freq)
np.save(os.path.join(save_loc,str(session_id)+'_mean_response_all.npy'),mean_response_all)
# Lets also save our trial averaged running speed
np.save(os.path.join(save_loc,str(session_id)+'_trial_running_speed.npy'),trial_running_speed)
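Getting the data back later is the mirror image of saving it. The sketch below is a self-contained round trip using `np.save` and `np.load`. A temporary directory stands in for the scratch folder, and the file name is made up for illustration.

```python
import os
import tempfile
import numpy as np

toy_array = np.arange(6, dtype=float).reshape(2, 3)

# A temporary directory stands in for the scratch folder in this sketch
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_mean_response.npy')
    np.save(toy_path, toy_array)   # write the array to disk
    reloaded = np.load(toy_path)   # read it back into memory

print(np.array_equal(toy_array, reloaded))  # True
```

The `.npy` format stores the shape and data type along with the values, so the reloaded array matches the original without any extra bookkeeping.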


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
 
Now we can move forward with our analysis.

In [None]:
# Keep only trials at our temporal frequency of interest.
# Some stimuli don't have orientation, so this also removes those trials.
keep_me = temp_freq==intersting_freq
orientation = orientation[keep_me]
mean_response_all = mean_response_all[keep_me,:]
trial_running_speed = trial_running_speed[keep_me]
running = running[keep_me] 
stopped = stopped[keep_me]
 
orientations,ix = np.unique(orientation,return_inverse=True)


Next, we are going to divide up our data. We have the obvious divide of "running" and "stopped" trials but - as we learned before - this isn't an entirely fair comparison. We need to also compare two halves of our "running" and "not running" data. 

Before, we split our session into the first and second halves of our data. Here, we will show a different technique: we will randomly assign running trials to each half. How does this change the assumptions we are making about our data?

In [None]:
# If you don't seed your random number generator, 
# you will get different results each time you run this notebook
np.random.seed(42)

# Try out permutations
np.random.permutation([0,1,2,3,4,5,6,7,8,9])
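As an aside, NumPy also offers a generator-style interface, `np.random.default_rng`, that keeps the seed attached to a specific generator object instead of to global state. The sketch below is an optional alternative, not what the rest of this notebook uses. It shows the two properties we rely on here: the same seed gives the same shuffle, and a permutation only reorders its input.

```python
import numpy as np

# Two generators built from the same seed produce the same shuffle
rng_1 = np.random.default_rng(42)
rng_2 = np.random.default_rng(42)
shuffle_1 = rng_1.permutation(10)
shuffle_2 = rng_2.permutation(10)
print(np.array_equal(shuffle_1, shuffle_2))  # True

# A permutation only reorders: every index still appears exactly once
print(np.array_equal(np.sort(shuffle_1), np.arange(10)))  # True
```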

In [None]:
# Find trials where mouse is running
running_trials = np.where(running)[0]
# Shuffle trials
perm_running_trials = np.random.permutation(running_trials)
#Split data
running_trials_1st_half = perm_running_trials[:len(running_trials)//2]
running_trials_2st_half = perm_running_trials[len(running_trials)//2:]

In [None]:
tuning_running,stdev_running = compute_tuning_curve(mean_response_all[running,cell_idx],orientation[running],orientations) 
tuning_stopped,stdev_stopped = compute_tuning_curve(mean_response_all[stopped,cell_idx],orientation[stopped],orientations) 

fig,ax = plt.subplots()

ax.errorbar(orientations,tuning_stopped,stdev_stopped)
ax.scatter(orientation[stopped],mean_response_all[stopped,cell_idx],label = 'Stopped')
ax.errorbar(orientations,tuning_running,stdev_running)
ax.scatter(orientation[running],mean_response_all[running,cell_idx],label = 'Running')

ax.legend()


In [None]:
many_r2_running = np.zeros(num_cells)
many_r2_stopped = np.zeros(num_cells)

tuning_running = np.zeros((num_cells,orientations.shape[0]))
tuning_running_1 = np.zeros((num_cells,orientations.shape[0]))

for cc in range(num_cells):
    # Using running to predict stopped response
    tuning_running[cc,:],_ =compute_tuning_curve(mean_response_all[running_trials,cc],orientation[running_trials],orientations)
    prediction_stopped = tuning_running[cc,ix[stopped]]
    many_r2_stopped[cc] = compute_coefficient_of_determination(mean_response_all[stopped,cc],prediction_stopped)

    # Use half of running to predict other half.
    tuning_running_1[cc,:],s_ =compute_tuning_curve(mean_response_all[running_trials_1st_half,cc],orientation[running_trials_1st_half],orientations) 
    prediction_running = tuning_running_1[cc,ix[running_trials_2st_half]]
    many_r2_running[cc] = compute_coefficient_of_determination(mean_response_all[running_trials_2st_half,cc],prediction_running)

In [None]:
print('R^2 running-->running: ' + str(many_r2_running[cell_idx]))
print('R^2 running-->stopped: ' + str(many_r2_stopped[cell_idx]))

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


So what did this teach us? 

A model fit to "running" trials does better at predicting other "running" trials than "stopped" trials. In other words, we found a cell that changes its tuning based on the mouse's behavior!
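To see why this comparison is informative, it can help to run it on a simulated cell where we know the answer. The sketch below is entirely synthetic: the tuning shape, the noise level, and the assumption that running scales responses up (a gain of 2.0 versus 0.5) are all made up for illustration and are not estimates from the dataset. `simulate_trials` and `r_squared` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
directions = np.arange(0, 360, 45)

# Hypothetical cell: a bump of tuning centered on 180 degrees
base_tuning = np.exp(-0.5 * ((directions - 180) / 45.0) ** 2)

def simulate_trials(gain, n_repeats=40):
    """Return (direction index, response) per trial for a given response gain."""
    ix = np.repeat(np.arange(len(directions)), n_repeats)
    response = gain * base_tuning[ix] + rng.normal(0, 0.1, size=ix.size)
    return ix, response

def r_squared(observed, predicted):
    ss_residual = np.mean((observed - predicted) ** 2)
    ss_total = np.mean((observed - observed.mean()) ** 2)
    return 1 - ss_residual / ss_total

# Assumption for this sketch: running scales the response up
ix_run_fit, resp_run_fit = simulate_trials(gain=2.0)
ix_run_test, resp_run_test = simulate_trials(gain=2.0)
ix_stop, resp_stop = simulate_trials(gain=0.5)

# Fit: mean response at each direction on one set of running trials
tuning_fit = np.array([resp_run_fit[ix_run_fit == k].mean()
                       for k in range(len(directions))])

# Test the same fitted tuning curve within and across behavioral states
r2_run_to_run = r_squared(resp_run_test, tuning_fit[ix_run_test])
r2_run_to_stop = r_squared(resp_stop, tuning_fit[ix_stop])
print(r2_run_to_run, r2_run_to_stop)
```

In this simulation the preferred direction never changes, yet the running model predicts stopped trials poorly because the response amplitude differs. A drop in $R^2$ across conditions therefore tells us the tuning curve changed, but not by itself whether the change is in gain, preferred direction, or something else.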
   
You will recall that, earlier, we were comparing the reliability of our cells between the first and second halves of the session. While our chosen cell was somewhat reliable, we also saw that it was far from being our most reliable cell. If you look at the running trace for this session, do you think that what we now know about the cell's behavior with respect to running speed might influence this reliability?

After lunch, we will learn to formalize some of this analysis, including tools for systematic data splitting and for evaluating the inclusion and formulation of model parameters. For now, though, let's eat!    