<img src="../code/Resources/cropped-SummerWorkshop_Header.png"> 

<h1 align="center">Workshop 1: Tutorial on neuronal decoding and behavior</h1> 
<h3 align="center">Summer Workshop on the Dynamic Brain</h3> 
<h3 align="center">Thursday, August 26th, 2025</h3> 
<h4 align="center">Day 2</h4> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
# 0.0 Neural Coding 

Neural coding describes how neurons represent information about the world. Coding can be studied by asking whether external or internal events lead to changes in neural activity (<b>encoding</b>), or by asking whether different types of information can be read out from neural activity (<b>decoding</b>). In this workshop we will focus on this latter problem. Specifically, we will try to read out information about stimulus identity from neurons recorded during the Dynamic Routing Task. 

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
### 0.1 The 'visual change detection' task

As a working example for understanding encoding and decoding, today we are going to look at a dataset built around the visual change detection task. Allen folks like to refer to this as the "Visual Behavior Task" dataset.

There is a very nice description of this task in the <b> data book </b>. Let's start by reminding ourselves how this task works:

https://allenswdb.github.io/physiology/stimuli/visual-behavior/VB-Behavior.html#change-detection-task

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
### 0.2 Our questions 

- How can we decode information from neurons and populations of neurons?
- How do we decide if our decoding is any good?
- Can we use these tools to learn something about neural dynamics in visual cortex?


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
# 1.0 Setup

To use Python, we first need packages. Let's start with some you are familiar with. <b>Numpy</b>, <b>pandas</b>, and <b>matplotlib</b> should be favorites of yours by now. In addition, let's grab your new friend <b>pynwb</b> so that we can actually look at some data!



In [None]:
# Let's start by importing some basic packages
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib as mpl

# pynwb is a Python package for reading NWB files
import pynwb

pd.set_option('display.max_columns', None)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


We also need to point our computer to the data. This will depend a bit on how you choose to do your compute. For CodeOcean, the platform is <i> 'amzn'</i>


In [None]:
import platform
from pathlib import Path
platstring = platform.platform()

if 'Darwin' in platstring:
    # macOS 
    data_root = Path("/Volumes/Brain2023/")
elif 'Windows'  in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = Path("E:/")
elif ('amzn' in platstring):
    # then on CodeOcean
    data_root = Path("/data/")
else:
    # then your own linux platform
    # EDIT location where you mounted hard drive
    data_root = Path("/media/$USERNAME/Brain2025/")

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.1 An example session.

We will all use the same one to get started. 

In [None]:
example_session = 1139846596 # Other good ones to play with: 1152811536, 1069461581
this_session = str(example_session)
this_filename = f'ecephys_session_{this_session}.nwb'
nwb_path = data_root/'visual-behavior-neuropixels'/'behavior_ecephys_sessions'/this_session/this_filename
print(nwb_path)
# And read the nwb
session = pynwb.NWBHDF5IO(nwb_path).read()

In [None]:
# Lets take a look at the session object.
session

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.2 The trials table

Remember that the trials table lists all the per-trial event data.

In [None]:
# get trials data
trials = session.trials.to_dataframe()
trials.iloc[20:25]

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.3 Stimulus table(s)

But wait...We were trying to decode images.

The 'trials' table for this task is designed around the change task structure. However, there are other ways to look at these data: for example, by image presentations, by flash presentations, etc. 

These data are stored in the 'intervals' section of the nwb.

In [None]:
stimuli = session.intervals['Natural_Images_Lum_Matched_set_ophys_H_2019_presentations'].to_dataframe()

In [None]:
stimuli.head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

There are both active and passive stimuli in this table. For now, let's only look at active trials. Trying to distinguish between these cases might be a good place to start a project, later! 

Read more about the task structure here: https://allenswdb.github.io/physiology/ephys/visual-behavior/VB-Neuropixels.html#experiment-design

In [None]:
active_stimuli = stimuli[stimuli.active == True]
# passive_stimuli = stimuli[stimuli.active == False]

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Visualize the stimulus table

Let's plot the information in the stimulus table to get a better sense of how it's organized.
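
The plotting code relies on `np.unique` with `return_inverse=True`. As a minimal sketch on made-up labels (the image names here are placeholders, not the names stored in this session), the call returns the sorted unique values plus, for every entry, an integer index into that sorted list:

```python
import numpy as np

# Hypothetical image names, one per stimulus presentation
image_names = np.array(['im_b', 'im_a', 'im_b', 'omitted', 'im_a'])

# unq holds the sorted unique names; idx maps each presentation to its name
unq, idx = np.unique(image_names, return_inverse=True)

print(unq)       # sorted unique labels
print(idx)       # integer code for each presentation
print(unq[idx])  # indexing with idx rebuilds the original array
```

The integer codes are what make it easy to color points by stimulus or to use stimulus identity as a class label later on.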

In [None]:
# Let's take a moment to highlight my favorite numpy command!
# Get the image shown on each stimulus presentation
unq_stim,stim_id = np.unique(active_stimuli.image_name,return_inverse = True)
print(unq_stim)
print(stim_id[:10])

In [None]:
# Create a new plot. We need a tall one for this.
fig,ax = plt.subplots()

# This is just some fancy code to make the discrete colors work
# (plt.get_cmap is used because mpl.cm.get_cmap was removed in matplotlib 3.9)
cmap = plt.get_cmap('tab10',
                    len(unq_stim))

norm = mpl.colors.BoundaryNorm(boundaries=np.arange(len(unq_stim)+1)-0.5,
                               ncolors=len(unq_stim))

# Count the index of each flash in each trial
_,trl_idx,trl_counts = np.unique(active_stimuli.trials_id,return_inverse=True,return_counts = True)
event_indices = np.zeros_like(trl_idx)
for i in range(len(trl_counts)):
    event_indices[trl_idx == i] = np.arange(trl_counts[i])

# The actual plotting code
z = ax.scatter(event_indices,
               trl_idx,
               s = 50,
               c = stim_id.T,
               cmap = cmap,
               norm = norm
              )

# Label stuff!
ax.set_xlabel('Flash in trial')
ax.set_ylabel('Trial ID')
ax.set_ylim([0,stimuli.trials_id.values[-1]])
ax.set_title('Task image stimuli')

# Get the color key
tick_locs = np.arange(len(unq_stim))
cbar = plt.colorbar(z, ticks=tick_locs)
cbar.set_label('Image identity')
cbar.set_ticklabels(unq_stim)

ax.set_ylim([0,20])
ax.set_yticks(np.arange(0,21));

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.4 The units table

Now that we have our stimuli, we need unit information! Just as the trials table contains information about each trial, the "units" table contains a good chunk of information about each unit. Let's take a look.

In [None]:
# get units table 
units_table = session.units.to_dataframe()
units_table.head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
<h3> How many units are in the units table?? </h3>
    

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.5 Where are our units??

The units table doesn't actually tell us where our units were recorded from (it does in some datasets, just not this one). Instead, it stores the electrode id of the peak channel that each unit was recorded from. Where were these channels? For that we need to look at the electrodes table.



In [None]:
electrodes_table = session.electrodes.to_dataframe()
electrodes_table.head(2)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Data in the electrodes table are organized by unique channel identifiers. This setup allows us to do a dataframe join on the peak_channel_id from the units table. The result of this operation will be a table that contains all of the unit information, PLUS the electrode information for the keyed channel.
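
To see what the join does, here is a small sketch on toy tables. The ids, locations, and firing rates are invented for illustration, and the only column name borrowed from the real tables is `peak_channel_id`:

```python
import pandas as pd

# Hypothetical electrodes table, indexed by channel id
electrodes = pd.DataFrame(
    {'location': ['VISp', 'CA1', 'LP']},
    index=pd.Index([100, 101, 102], name='id'),
)

# Hypothetical units table; each unit stores the id of its peak channel
units = pd.DataFrame(
    {'peak_channel_id': [101, 100, 100, 102],
     'firing_rate': [3.2, 11.0, 0.7, 5.5]},
    index=pd.Index([1, 2, 3, 4], name='unit_id'),
)

# join() looks up each peak_channel_id in the electrodes index (a left join),
# so the result keeps exactly one row per unit
units_with_location = units.join(electrodes, on='peak_channel_id')
print(units_with_location)
```

Because several units can share a peak channel, the same electrode row can be copied onto more than one unit, but the number of rows should match the units table.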



In [None]:
units_electrode_table = units_table.join(electrodes_table,on = 'peak_channel_id')
print('Length before join: ' + str(len(units_table)))
print('Length after join: ' + str(len(units_electrode_table)))

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
<h3> Take a look at the units table. </h3>

Can you find which brain area each unit is localized to now? hint: use .head or .iloc

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.6 Unit QC

That's a lot of units. But do we trust them all?

The truth is, one of the biggest challenges with ephys analysis is sorting units and figuring out which are safe to use. Neuropixels, by virtue of having many contacts near each other, make it possible to record many units simultaneously. This is great, but it creates an additional challenge of knowing which units are "good" for analysis. Because large recordings are almost always sorted algorithmically, selecting good units is particularly important: no human has manually identified which units are safe for further analysis.

The units table contains many unit QC metrics. Rather than blindly trusting every output of the automated sorting, we typically impose some constraints on the data. Here are a few examples of useful ones:

+ isi_violations (SpikeInterface calls this isi_violations_ratio): what fraction of spikes happen closer together than should be possible for a neuron. If this is too high, it's a sign of either non-neuronal noise or more than one neuron getting merged into a unit.
+ amplitude_cutoff: An estimate of the fraction of spikes missed by the sorter, based on the amplitude histogram for the neuron.
+ presence_ratio: recordings are not 100% static. If there is mechanical drift, the unit won't be present for the entire session. This adds all sorts of problems for analysis down the road: how can you compare across sessions if your unit comes or goes between them? We therefore want to use only units that we can follow through the recording.

Note that exactly which unit QC criteria you use may vary based on the questions you ask. Have a question that doesn't really depend on well-isolated neurons? Try loosening these criteria. Need to know about the differences of well-isolated units over a long timescale? Try tightening them. As a starting place, though, these numbers are reasonable. If you want to get fancy, more information about QC criteria can be found here: https://spikeinterface.readthedocs.io/en/stable/modules/qualitymetrics.html 
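
As a rough sketch of how such criteria combine, the toy table here uses invented metric values for five units. The column names and thresholds mirror the ones in this tutorial's QC cell, but treat them as assumptions if your table differs:

```python
import pandas as pd

# Hypothetical QC metrics for five units (values invented for illustration)
qc = pd.DataFrame({
    'isi_violations':   [0.05, 0.90, 0.10, 0.00, 0.30],
    'amplitude_cutoff': [0.01, 0.02, 0.40, 0.05, 0.08],
    'presence_ratio':   [0.99, 0.98, 0.99, 0.60, 0.97],
})

# Each comparison gives a boolean Series; & requires all criteria to hold
passes = (
    (qc.isi_violations < 0.5)
    & (qc.amplitude_cutoff < 0.1)
    & (qc.presence_ratio > 0.95)
)
print(passes.sum(), 'of', len(qc), 'units pass')

# Counting failures per criterion shows which threshold is doing the work
print((qc.isi_violations >= 0.5).sum(), 'fail isi_violations')
print((qc.amplitude_cutoff >= 0.1).sum(), 'fail amplitude_cutoff')
print((qc.presence_ratio <= 0.95).sum(), 'fail presence_ratio')
```

Checking the per-criterion failure counts is a quick way to see how sensitive your unit yield is to each threshold before you loosen or tighten it.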

In [None]:
# QC criteria? 
good_units = units_electrode_table[
    (units_electrode_table.isi_violations<.5) &
    (units_electrode_table.amplitude_cutoff<.1) &
    (units_electrode_table.presence_ratio>.95)
    ]
print(len(good_units))


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
<h3> How many 'good' VISp neurons do we have?? </h3>

np.unique has a handy "return_counts" option. Try using it to count the good VISp neurons. Never used this command before? Try using "shift+tab" to see the help menu!
    

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 1.7 Select an area to analyze



In [None]:
this_structure_units_table  = good_units[good_units.location == 'VISp']
len(this_structure_units_table)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

# 2 Can we decode stimulus identity from a single neuron?

It's time to look at our first neuron of the day! (WOOT!)

A very important comment as you get ready to start your projects: In this course, we will fit many models with increasing levels of abstraction to our data. It is always going to be tempting to jump straight into modeling, and gloss over the actual data. Don't be tempted to do this! Looking at and understanding your data is, at the end of the day, always more important than whatever model you might fit. After all, if your data are bad, so is your model.

Fortunately, someone here has already done a good chunk of the quality control on our data. So here, "data" really means the spike times for a given unit.

With this in mind, let's look at the activity of a single neuron as the mouse is presented with its stimulus. First, we need some data.


In [None]:
# Choose a neuron to start.
unit = 6
# Get the spike times for this neuron
spike_times = this_structure_units_table.spike_times.values[unit]
print(spike_times)


In [None]:
# and get the times that the stimulus presentation started
stim_times=  active_stimuli.start_time.values
print(stim_times[:10])

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 2.1 Rasters and PSTHs

What we want to know right now is, how does this neuron respond to a stimulus?

To answer this, we need to grab a window of time around when the stimulus was presented. We will then look at only the spikes that happen within this window. 

We are going to make two types of plots to visualize these data. 

+ The first we call a <i> raster </i> plot. Here, we will represent each spike by a dot, and each trial as a row in our plot.
+ The raster plot is useful for visualizing activity across trials, but it can be difficult to quantify. With this in mind, we will also make a <i> Peristimulus Time Histogram </i> or <i> PSTH </i>. This is the stimulus-triggered average for the neuron or, equivalently, a histogram showing the average of the raster plot.
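
The arithmetic behind a PSTH can be sketched on synthetic data. Everything here is invented for illustration: four presentations, each followed by one spike at +0.05 s and one at +0.15 s, so every 100 ms bin should come out at 10 spikes/second:

```python
import numpy as np

# Synthetic data: 4 stimulus presentations, each followed by spikes at
# +0.05 s and +0.15 s (values invented for illustration)
stim_times = np.array([1.0, 2.0, 3.0, 4.0])
spike_times = np.sort(np.concatenate([stim_times + 0.05, stim_times + 0.15]))

bin_size = 0.1
bins = np.linspace(0.0, 0.2, 3)  # bin edges: 0, 0.1, 0.2 s after the stimulus

# Raster: stimulus-aligned spike times, one array per presentation
raster = []
for stim_time in stim_times:
    mask = ((spike_times >= stim_time + bins[0]) &
            (spike_times < stim_time + bins[-1]))
    raster.append(spike_times[mask] - stim_time)

# PSTH: pool spikes across presentations, histogram them, then divide by
# the number of presentations and the bin width to get spikes/second
counts, _ = np.histogram(np.concatenate(raster), bins=bins)
rate = counts / len(stim_times) / bin_size
print(rate)
```

Note that the divisor is the number of presentations, not the number of spikes or the largest trial index.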

In [None]:
# Define a stimulus window.
pre_window  = .2 # How far before the stimulus should we look?
post_window = .75 # How far after the stimulus should we look?
bin_size = .01 # What size bins do we want for our PSTH?
bins = np.arange(-pre_window,post_window+bin_size,bin_size) # Set up bins
bin_centers = bins[:-1]+bin_size/2
# Storage for data.
triggered_spike_times = []
triggered_stim_index = []

# Loop through the stimuli!!
for i, stim_time in enumerate(stim_times):
    # Select spikes that fall within the time window around this stimulus
    mask = ((spike_times >= stim_time - pre_window) & 
            (spike_times < stim_time + post_window))
    
    # Align spike times to stimulus onset (0 = stimulus)
    trial_spikes = spike_times[mask] - stim_time

    triggered_spike_times.append(trial_spikes)
    triggered_stim_index.append(np.ones(len(trial_spikes))*i)

# triggered_spike_times now has the times of each spike per trial.  
print(triggered_spike_times[:3])
# triggered_stim_index is for keeping track of which stimulus presentation this spike was from
print(triggered_stim_index[:3])

In [None]:
# For plotting, we are going to want to concatenate these data into one big vector
triggered_spike_times = np.concatenate(triggered_spike_times)
triggered_stim_index = np.concatenate(triggered_stim_index)

In [None]:
# Instantiate a plot
fig,ax = plt.subplots(nrows = 2,figsize =(5,10))

# Plot the raster! Its just dots, so we use scatter.
# The 'k' here is a shoutout to all the matlab users...
ax[0].scatter(triggered_spike_times,triggered_stim_index,s = 1,c = 'k')
ax[0].set_xlabel('Time from stimulus (seconds)')
ax[0].set_ylabel('Stim number (sorted)')
ax[0].axvline(0,c = 'r')

# and make the histogram.
a,b = np.histogram(triggered_spike_times,bins = bins)
# Divide by # of trials, then bin size to get a rate estimate in Spikes/Sec = Hz
# (use the number of presentations, not the largest trial index, which is off by one)
a = a/len(stim_times)/bin_size
ax[1].plot(bin_centers,a,c = 'k')
ax[1].set_xlabel('Time from stimulus (seconds)')
ax[1].set_ylabel('Spike Rate (Hz)')


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Great! We have a neuron that responds to the stimulus!

But remember our goal here: we want to know if we can decode the difference between stimuli from this neuron. With that in mind, let's try separating out the different stimulus identities.


In [None]:
fig, ax = plt.subplots(nrows=2, figsize=(5, 10))

# Define a stimulus window.
pre_window  = .2 # How far before the stimulus should we look?
post_window = .75 # How far after the stimulus should we look?
bin_size = .01 # What size bins do we want for our PSTH?
bins = np.arange(-pre_window,post_window+bin_size,bin_size) # Set up bins
bin_centers = bins[:-1]+bin_size/2

n_trials = len(stim_id)

# Ensure integer trial ids for indexing
triggered_stim_index = triggered_stim_index.astype(int, copy=False)

# Use unique to compute how many of each type of trial there are 
_,counts  = np.unique(stim_id,return_counts=True)

# Blocks will be offset by the number of trials of each type
offsets = np.zeros(len(unq_stim) + 1, dtype=int)
offsets[1:] = np.cumsum(counts)

# We are remapping trials to rows, and we need a place to store the output
trial_to_row = np.empty(n_trials, dtype=int)


# Loop through trial types
for i, stim in enumerate(unq_stim):
    
    this_trials = np.flatnonzero(stim_id == i)  # trial ids for stim i
    trial_to_row[this_trials] = offsets[i] + np.arange(len(this_trials), dtype=int)

    if len(this_trials) == 0:
        continue

    mask = np.isin(triggered_stim_index, this_trials)
    this_times = triggered_spike_times[mask]
    this_rows  = trial_to_row[triggered_stim_index[mask]]  # OK now that tsi is int

    # Plot this chunk of the raster
    ax[0].scatter(this_times, this_rows, s=.1)  # add marker='|' for look

    # Plot this PSTH
    a, _ = np.histogram(this_times, bins=bins)
    rate = a / this_trials.size / bin_size
    ax[1].plot(bin_centers, rate, label=str(stim))


# Labels
ax[0].set_xlabel('Time from stimulus (seconds)')
ax[0].set_ylabel('Trial # (sorted)')
ax[1].set_xlabel('Time from stimulus (seconds)')
ax[1].set_ylabel('Spike Rate (Hz)')
ax[1].legend()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 2.2 Linear Classifier
So far these data are promising: on average, the neuron doesn't respond the same way to every stimulus.

We are now ready to use a mathematical model to see if we can decode stimulus identity from this neuron. Note that, despite these promising averages, success is not guaranteed: the variability across trials may still make this very difficult.

For now, we will start with a linear classifier. Specifically, we will use <b>sklearn</b>'s implementation of a linear <i>Support Vector Classifier</i>, or <i>SVC</i>. These are part of a broader class of algorithms known as <i> Support Vector Machines (SVMs) </i>. Importantly, this means we have officially made it to the machine learning part of the course. Yay!




In [None]:
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

A linear classifier will, in effect, attempt to find a linear divider to separate different classes of data. 

An SVC is a type of <i> supervised </i> classifier. This means that it is trained to do classification using data that have known labels.

Before we get to the neural data, let's work through a simple example of what this all means with some fake data.

In [None]:
# Get some randomly generated data
x_1 = np.random.normal(loc = 1,scale = 5,size = 10000)
x_2 = np.random.normal(loc = 6,scale = 5,size = 10000)
# Class identities
y_1 = np.ones(x_1.shape)
y_2 = np.ones(x_2.shape)*2

# make these into one big vector
x = np.concatenate([x_1,x_2])
y = np.concatenate([y_1,y_2])

# Plot them!
fig,ax = plt.subplots()
tmp_bins = np.arange(-10,20,.15)
ax.hist(x_1,tmp_bins,color= 'teal')
ax.hist(x_2,tmp_bins,color = 'darkorange')
ax.set_xlabel('Value')
ax.set_ylabel('# Samples')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

One important thing to note here is that we are passing labeled data in an effort to learn the distinction between classes. This is a great way to build a model, but it comes with risks: because we are training our model using these data, it would not be fair to evaluate model performance using the same data.

In cases where we don't want to go out and collect more data, we can split our data into "training" and "testing." <b>Sklearn</b> provides a handy function for doing this called "train_test_split."
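
As a small sketch of what the split produces, here is `train_test_split` on toy data. The arguments `test_size`, `random_state`, and `stratify` are optional extras that the tutorial's own call leaves at their defaults; they are spelled out here only to make the behavior visible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, two balanced classes
x = np.arange(100, dtype=float)
y = np.repeat([1, 2], 50)

# 25% of samples are held out (this is also the default); random_state makes
# the split repeatable, and stratify=y keeps class proportions similar
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

print(len(x_train), len(x_test))
# No sample appears on both sides of the split
print(np.intersect1d(x_train, x_test).size)
```

Stratifying is worth considering when some classes are rare, since a purely random split can leave very few examples of a rare class in the test set.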


In [None]:
x_train,x_test,y_train,y_test = train_test_split(x,y)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Now we are ready to fit our model. Again, <b>sklearn</b> provides a useful interface for this.

Note: In this tutorial we are using SVC. However, one of the nice things about <b>sklearn</b> is that it uses a standardized interface for all of its model fits. This makes it very easy to play with different classifiers, model types, etc.!
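
To illustrate the shared interface, this sketch fits two different classifiers with identical code. `LogisticRegression` is just a stand-in for "some other sklearn classifier" and is not used elsewhere in this tutorial; the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy 1-D data: two well separated classes
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)]).reshape(-1, 1)
y = np.repeat([1, 2], 200)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Every sklearn classifier exposes the same fit / predict / score methods,
# so swapping models only changes the line that constructs them
scores = {}
for model in (LinearSVC(max_iter=10000), LogisticRegression()):
    model.fit(x_train, y_train)
    scores[type(model).__name__] = model.score(x_test, y_test)

print(scores)
```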

In [None]:
# Create a model fitting object
svc = LinearSVC()
svc

In [None]:
# SVC requires that inputs have shape (n_samples, n_features).
# When using 1-d arrays, we need to do some reshaping to follow this convention
svc.fit(x_train.reshape(-1, 1),y_train.ravel())

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Now that we have this model, we can use it to predict new data. One way of doing this is to predict our held-out data, and see how well we did.

In [None]:
y_prediction = svc.predict(x_test.reshape(-1, 1))
score = np.sum(y_prediction==y_test)/(len(y_test))
score

In [None]:
# This can also be accomplished using the built in "score" function
svc.score(x_test.reshape(-1, 1),y_test)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Alternatively, we can pass in x values to find out what class would have been predicted. This can be very useful for understanding how our model is actually doing its classification. 
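
For a one-dimensional, two-class problem, "how the model does its classification" boils down to a single threshold. The sketch here uses synthetic data (not the tutorial's arrays) and reads the fitted weight and intercept from the standard `coef_` and `intercept_` attributes to locate that threshold:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Toy 1-D data: class 1 centered on 0, class 2 centered on 10
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)]).reshape(-1, 1)
y = np.repeat([1, 2], 200)

svc = LinearSVC(max_iter=10000).fit(x, y)

# For two classes the model computes w * x + b and predicts the second
# class when that value is positive, so the boundary sits at x = -b / w
w = svc.coef_[0, 0]
b = svc.intercept_[0]
boundary = -b / w
print(f'decision boundary at x = {boundary:.2f}')

# Points on either side of the boundary get different labels
print(svc.predict([[boundary - 1.0], [boundary + 1.0]]))
```

With more than two classes, sklearn fits one such linear function per class and picks the class with the largest value, which is why a single feature can only carve the axis into a limited number of regions.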

In [None]:
# numbers to test
tmp_bins = np.arange(-10,20.15,.15)

# Plot our original distributions
fig,ax = plt.subplots(nrows = 2)
ax[0].hist(x_1,tmp_bins,color= 'teal')
ax[0].hist(x_2,tmp_bins,color = 'darkorange')
ax[0].set_xlabel('Value')
ax[0].set_ylabel('# Samples')

# Predict the class for each test value and plot it against the class bands
bins_prediction = svc.predict(tmp_bins.reshape(-1,1))
ax[1].axhline(1,color=  'teal',linewidth = 20,alpha = .6)
ax[1].axhline(2,color= 'darkorange',linewidth = 20,alpha = .6)
ax[1].plot(tmp_bins,bins_prediction,'k')


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 2.3 Now let's try with our neuron!!

The challenge, of course, is that our neural data is high dimensional. To keep things interpretable, let's try to pick a small window from the overall spike train that will give us a decent chance of decoding information from this neuron.

In [None]:
fig,ax = plt.subplots(nrows = 2,figsize =(5,10))

# Stimulus identity of the presentation that each spike came from
spike_stim_id = stim_id[triggered_stim_index.astype(int)]

# A counter, useful for stacking plots
counter = 0
for ii in range(len(unq_stim)):
    # spike times for this trial type
    spike_mask = spike_stim_id==ii
    this_triggered_spike_times = triggered_spike_times[spike_mask]

    # presentations of this trial type, and the raster row (0, 1, 2, ...)
    # of the presentation that each spike belongs to
    this_trials = np.flatnonzero(stim_id==ii)
    this_trl_idx = np.searchsorted(this_trials,triggered_stim_index[spike_mask].astype(int))

    ax[0].scatter(this_triggered_spike_times,counter + this_trl_idx,s = 1)


    # stack the plots
    counter += len(this_trials)
    
    # Plot the PSTH just for this stimulus type (normalize by # of presentations)
    a,b = np.histogram(this_triggered_spike_times,bins = bins)
    a = a/len(this_trials)/bin_size
    ax[1].plot(bin_centers,a,label = unq_stim[ii]) # Note that we are labeling each plot
    ax[1].set_xlabel('Time from stimulus (seconds)')
    ax[1].set_ylabel('Spike Rate (Hz)')

ax[0].axvspan(xmin = 0,xmax = .35,color = 'Gray',alpha = .2)
ax[0].set_xlabel('Time from stimulus (seconds)')
ax[0].set_ylabel('Trial number (sorted)')
ax[1].axvspan(xmin = 0,xmax = .35,color = 'Gray',alpha = .2)
ax[1].legend() # Plot a legend using the established labels.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

We can start by counting the number of spikes on each trial within this window. We will then go about fitting a linear classifier to attempt to decode trial type from the number of spikes within our window!
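
The counting step can be sketched on synthetic data. The loop version uses a boolean mask per presentation; the `np.searchsorted` version is a swapped-in alternative (not the approach used in this tutorial) that assumes the spike times are sorted:

```python
import numpy as np

# Synthetic, sorted spike times and stimulus onsets (invented for illustration)
spike_times = np.array([0.10, 0.20, 1.05, 1.10, 1.50, 2.30])
stim_times = np.array([0.0, 1.0, 2.0])
start, stop = 0.0, 0.35  # counting window relative to stimulus onset

# Loop version: boolean mask per presentation
loop_counts = np.array([
    np.sum((spike_times >= t + start) & (spike_times < t + stop))
    for t in stim_times
])

# Vectorized version: index of the first spike at or after each window edge;
# the difference is the number of spikes inside [start, stop)
first = np.searchsorted(spike_times, stim_times + start, side='left')
last = np.searchsorted(spike_times, stim_times + stop, side='left')
fast_counts = last - first

print(loop_counts, fast_counts)
```

Both versions give one spike count per presentation, which is exactly the single feature the classifier will see.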

In [None]:
spike_times = this_structure_units_table.spike_times.values[unit]
stim_times = active_stimuli.start_time

start= 0
stop = .350

spike_count = []
trial_index = []

for i, stim_time in enumerate(stim_times):
    # Select spikes that fall within the time window around this stimulus
    mask = ((spike_times >= stim_time + start) & 
            (spike_times < stim_time + stop))
    
    # Count spikes in this bin
    spike_count.append(len(spike_times[mask]))
    
spike_count = np.array(spike_count)
trial_index = np.arange(len(spike_count))
trial_id_types,trial_id = np.unique(active_stimuli.image_name.values,return_inverse=  True)

# Get information for manual control of colors
prop_cycle = plt.rcParams['axes.prop_cycle']
color_list = [entry['color'] for entry in prop_cycle]

fig,ax = plt.subplots()
# Loop over stimulus types (avoid hard-coding the number of classes)
for jj in range(len(unq_stim)):
    ax.scatter(spike_count[trial_id==jj],np.random.random(len(spike_count[trial_id==jj]))/2+jj-.25,s = 1)
    box = ax.boxplot(spike_count[trial_id==jj],
                        positions = [jj],widths=[.75],
                        showfliers=False,vert= False,
                        patch_artist = True,
                        medianprops=dict(color="black", linewidth=2))
    box['boxes'][0].set_facecolor(color_list[jj])
    box['boxes'][0].set_alpha(.3)

# Fix the tick positions before labeling them
ax.set_yticks(np.arange(len(unq_stim)))
ax.set_yticklabels(unq_stim,rotation = 0)
ax.set_xlabel('# of spikes')
ax.set_ylabel('Stimulus')
ax.set_title('Spike count by trial type')

In [None]:
# Just as before, we need to split our data.
x_train,x_test,y_train,y_test = train_test_split(spike_count,trial_id)

# Note that we are using the same x = predictor, y = class 
# label convention that we were using before. 
print(f'x train shape: {x_train.shape}')
print(f'y train shape: {y_train.shape}')
print(f'x test shape: {x_test.shape}')
print(f'y test shape: {y_test.shape}')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
    
<h3> Ready? </h3>
Go ahead and fit a linear SVC to the training data!

In [None]:
# Make a new model, call it svc
svc = 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h3> Try passing a few spike rates to your shiny new model </h3> 

How do things compare to the histogram we just made?


In [None]:
import matplotlib as mpl
prop_cycle = plt.rcParams['axes.prop_cycle']
color_list = [entry['color'] for entry in prop_cycle]

input_spike_count = np.arange(0,np.max(spike_count),.1) 
predicted_class = svc.predict(input_spike_count.reshape(-1,1))
fig,ax = plt.subplots(nrows = 2)


for jj in range(len(unq_stim)):
    ax[0].scatter(spike_count[trial_id==jj],np.random.random(len(spike_count[trial_id==jj]))/2+jj-.25,s = 1)
    box = ax[0].boxplot(spike_count[trial_id==jj],
                        positions = [jj],widths=[.75],
                        showfliers=False,vert= False,
                        patch_artist = True,
                        medianprops=dict(color="black", linewidth=2))
    box['boxes'][0].set_facecolor(color_list[jj])
    box['boxes'][0].set_alpha(.3)


ax[0].set_yticklabels(unq_stim,rotation = 0)
ax[0].set_xlabel('# of spikes')
ax[0].set_ylabel('Stimulus')
ax[0].set_title('Spike count by trial type')
ax[0].legend()

for ii in range(len(unq_stim)):
    ax[1].axhline(ii,c = color_list[ii],linewidth=10,alpha = .5)
ax[1].plot(input_spike_count,predicted_class,'k',linewidth = 4)
ax[1].set_yticks(np.arange(len(unq_stim)))
ax[1].set_yticklabels(unq_stim,rotation = 0)
ax[1].set_xlabel('# of spikes')
ax[1].set_ylabel('predicted class')
ax[1].set_title('Prediction by spike count')


fig.tight_layout()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

But how good is this model, really?

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h3> Compute the model score on your held out testing data</h3> 

Is this good? Do you think it's better than "chance"?


In [None]:
# Get the predictions for our held out test data
prediction = svc.predict(x_test.reshape(-1,1))

In [None]:
# Find the model score
score = svc.score(x_test.reshape(-1,1),y_test.ravel())
score

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Is this any good? When examining the answer to that question, it is probably most useful to ask, "how much better was our classifier than if we had just guessed by chance?"

In cases where you have evenly distributed classes, chance estimates are easy. For example, if we just had 8 evenly distributed classes, chance performance would be 1/8. 

However, we don't have exactly even numbers of trials: there are, for example, fewer omission trials. The mouse also spent more time struggling with some stimuli than others. In cases like this, we can sometimes estimate a "chance" figure by asking, "if we shuffled trial identities, what fraction of the time would they line up with the true trial identity."
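
To make the shuffle idea concrete, here is a sketch on toy labels with unequal class sizes (50, 30, and 20 trials, invented for illustration). It compares the shuffle estimate with the value you would expect analytically, the sum of squared class proportions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labels with unequal class sizes: 50, 30, and 20 trials
labels = np.repeat([0, 1, 2], [50, 30, 20])

# Shuffle the labels many times and record how often they match the originals
matches = [np.mean(rng.permutation(labels) == labels) for _ in range(1000)]
shuffle_chance = np.mean(matches)

# The expected match rate is the sum of squared class proportions
proportions = np.bincount(labels) / len(labels)
analytic_chance = np.sum(proportions ** 2)

# A classifier that always guesses the most common class does better still
majority_chance = proportions.max()

print(shuffle_chance, analytic_chance, majority_chance)
```

With imbalanced classes, the "always guess the most common class" rate can be a stricter baseline than the shuffle estimate, so it is worth reporting both.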

In [None]:
chance_est = []
for ii in range(1000):
    # Shuffle the trial labels and ask how often they match the true labels
    shuffled_trial_id = np.random.permutation(trial_id)
    chance_est.append(np.sum(trial_id==shuffled_trial_id)/len(trial_id))

chance  =np.mean(chance_est)
print(f'Chance estimate: {chance}')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

It is worth saying that, because we have more than one class, we can dig a little deeper into our prediction.

Specifically, when we guess incorrectly, misclassifications often have structure to them. A confusion matrix is a useful tool to understand the mistakes a classifier makes. Here we plot, given a "true" class, the distribution of predictions made by our classifier.
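
A tiny worked example may help with reading the matrix. The labels are invented, and `normalize='true'` is chosen so that each row sums to one, i.e. each row is the distribution of predictions for one true class:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for three classes (invented for illustration)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# normalize='true' divides each row by the number of trials of that true
# class, so row i is the distribution of predictions given true class i
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2], normalize='true')
print(cm)

# The diagonal holds the per-class accuracy
print(np.diag(cm))
```

Passing `labels` explicitly keeps the matrix the same size even if some class never shows up in the test set, which keeps tick labels aligned when plotting.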

In [None]:
from sklearn.metrics import confusion_matrix

prediction = svc.predict(x_test.reshape(-1,1))

fig,ax = plt.subplots()
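# normalize='true' makes each row the distribution of predictions for one true class;
# labels keeps the matrix aligned with the tick labels even if a class is missing
im  = ax.imshow(confusion_matrix(y_true=y_test, y_pred=prediction,
                                 labels=np.arange(len(unq_stim)), normalize='true'))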
ax.set_xlabel('Predicted Class')
ax.set_ylabel('True Class')
ax.set_xticks(np.arange(len(unq_stim)))
ax.set_yticks(np.arange(len(unq_stim)))
ax.set_xticklabels(unq_stim,rotation = 90)
ax.set_yticklabels(unq_stim)
cbar = plt.colorbar(im)
cbar.set_label('Fraction Guessed')


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h3> Compare the confusion matrix to the histogram for this neuron </h3> 

Can you understand why the classifier is making the errors that it is?

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

# 3.0 Now for a neural population.

While the mathematical formalism is useful to understand what our chosen neuron is doing here, if we are being honest it's probably overkill for understanding this single neuron's activity. There are, in fact, whole classes of regression models that might be better suited for asking what information a neuron encodes. 

Why, then, have we spent so much time on this example? Decoding gives us a mathematical way to look at what information we can extract from a neural population. As this population becomes larger, however, it can become increasingly difficult to visualize and intuit what our decoder is doing under the hood. Starting with this one-dimensional example will help with intuition as we move to this harder case.


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

With that said, let's use decoding to take a hard look at the timescale of VISp's encoding of stimulus identity. 

Doing this is going to require a fair bit of data wrangling. Before, we only needed the number of spikes per trial for a single neuron. But if we are looking over time and over neurons, we need the number of spikes for each neuron, on each trial, in each time bin. 

Here is one way to do this, though you will notice that it is a little on the slow side.

In [None]:
%%timeit -n 1 -r 1

n_neurons = len(this_structure_units_table.spike_times.values)
stim_times = active_stimuli.start_time


bins = np.arange(-.2,.5,.05)
storage = np.empty((n_neurons,len(stim_times),len(bins)-1))

for nn in range(n_neurons):
    spike_times = this_structure_units_table.spike_times.values[nn]

    spike_count = []
    trial_index = []
    
    for i, stim_time in enumerate(stim_times):
        # Select spikes that fall within the time window around this stimulus

        mask = ((spike_times >= stim_time + np.min(bins)) & 
                (spike_times < stim_time + np.max(bins)))
        
        # Align spike times to stimulus onset (0 = stimulus)
        trial_spikes,_ = np.histogram(spike_times[mask] - stim_time,bins)
        
        spike_count.append(trial_spikes)
    
    storage[nn,:,:] = np.array(spike_count)
trial_index = np.arange(len(spike_count))
trial_id_types,trial_id = np.unique(active_stimuli.image_name.values,return_inverse=  True)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

This code is actually pretty hard to follow. It would be more readable if we converted the core piece of it into a function:

In [None]:
def get_binned_triggered_spike_times(spike_times,stim_times,bins):
    spike_count = []
    trial_index = []
    
    for i, stim_time in enumerate(stim_times):
        # Select spikes that fall within the time window around this stimulus
        
        mask = ((spike_times >= stim_time + np.min(bins)) & 
                (spike_times < stim_time + np.max(bins)))
        
        # Align spike times to stimulus onset (0 = stimulus)
        trial_spikes,_ = np.histogram(spike_times[mask] - stim_time,bins)
        
        spike_count.append(trial_spikes)
    return np.array(spike_count)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Note, though, that while wrapping the loop in a function is a nice way to organize our code, it won't actually make anything go any faster.

Now, we are going to give you the *fast* version of this function. If you end up working with triggered spike trains much during this course, the difference in time here can be pretty meaningful.
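
The speed-up in the fast version comes from np.searchsorted: because spike times are sorted, we can look up where each bin edge would fall in the spike train and count spikes by differencing those positions, instead of masking the whole spike train on every trial. Here is a minimal sketch of that idea on a toy spike train (all values are made up for illustration), checked against np.histogram.

```python
import numpy as np

# Toy, sorted spike train in seconds (made-up values, illustration only)
spike_times = np.array([0.05, 0.12, 0.31, 0.33, 0.48, 0.90])
stim_time = 0.30                                     # hypothetical stimulus onset
bins = np.array([-0.3, -0.2, -0.1, 0.0, 0.1, 0.2])   # bin edges relative to onset

# Absolute bin edges for this trial
edges = stim_time + bins
# Index at which each edge would be inserted into the sorted spike train
idx = np.searchsorted(spike_times, edges, side='left')
# Spikes between successive edges = difference of insertion indices
counts_fast = np.diff(idx)

# Reference: histogram of stimulus-aligned spike times
counts_hist, _ = np.histogram(spike_times - stim_time, bins)

print(counts_fast, counts_hist)
```

One small difference to keep in mind: np.histogram includes spikes that land exactly on the right edge of the last bin, while the searchsorted approach with side='left' does not.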

In [None]:
def get_binned_triggered_spike_counts_fast(spike_times, stim_times, bins):
    """
    Fast peri-stimulus time histogram using searchsorted.

    Parameters
    ----------
    spike_times : 1D array_like, sorted
        Times of all spikes (e.g. in seconds).
    stim_times : 1D array_like
        Times of stimulus onsets.
    bins : 1D array_like
        Bin edges *relative* to stimulus (e.g. np.linspace(-0.1, 0.5, 61)).

    Returns
    -------
    counts : 2D ndarray, shape (n_trials, len(bins)-1)
        counts[i, j] is the number of spikes in bin j of trial i.
    """
    # ensure numpy arrays
    spike_times = np.asarray(spike_times)
    stim_times = np.asarray(stim_times)
    bins = np.asarray(bins)

    # If your spike_times isn't already sorted, uncomment:
    # spike_times = np.sort(spike_times)

    n_trials = stim_times.size
    n_bins = bins.size - 1
    counts = np.zeros((n_trials, n_bins), dtype=int)

    for i, stim in enumerate(stim_times):
        # compute the absolute edges for this trial
        edges = stim + bins
        # find the insertion indices for each edge
        idx = np.searchsorted(spike_times, edges, side='left')
        # differences between successive indices = counts per bin
        counts[i, :] = np.diff(idx)

    return counts

In [None]:
%%timeit -n 1 -r 1
n_neurons = len(this_structure_units_table.spike_times.values)
stim_times = active_stimuli.start_time


bins = np.arange(-.2,.5,.1)
storage = np.empty((n_neurons,len(stim_times),len(bins)-1))

for nn in range(n_neurons):
    spike_times = this_structure_units_table.spike_times.values[nn]

    storage[nn,:,:]  = get_binned_triggered_spike_counts_fast(spike_times,stim_times,bins)

# One index and one image label per stimulus presentation
trial_index = np.arange(len(stim_times))
trial_id_types,trial_id = np.unique(active_stimuli.image_name.values,return_inverse=  True)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Variables assigned inside a %%timeit cell are not kept, so the code above did not save its outputs. Let's run this fast version one more time for real.

In [None]:
n_neurons = len(this_structure_units_table.spike_times.values)
stim_times = active_stimuli.start_time


bins = np.arange(-.2,.75,.1)
storage = np.empty((n_neurons,len(stim_times),len(bins)-1))

for nn in range(n_neurons):
    spike_times = this_structure_units_table.spike_times.values[nn]

    storage[nn,:,:]  = get_binned_triggered_spike_counts_fast(spike_times,stim_times,bins)

# One index and one image label per stimulus presentation
trial_index = np.arange(len(stim_times))
trial_id_types,trial_id = np.unique(active_stimuli.image_name.values,return_inverse=  True)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


Now, even though we binned at a higher temporal rate than before, we can still fit a classifier that is analogous to our single-neuron version.
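
The main bookkeeping step is getting from our (neurons, trials, time bins) array to the (trials, features) matrix that sklearn expects. As a minimal sketch with a small made-up array (the names toy_storage and keep_trials are hypothetical, used only for this illustration): select trials, sum over the time bins in the window, and transpose.

```python
import numpy as np

# Made-up spike counts with shape (neurons, trials, time bins)
n_neurons, n_trials, n_bins = 3, 6, 4
toy_storage = np.arange(n_neurons * n_trials * n_bins).reshape(n_neurons, n_trials, n_bins)

# Hypothetical boolean mask selecting which trials to decode from
keep_trials = np.array([True, False, True, True, False, True])

# Sum counts over time bins 1 and 2 (the slice end is exclusive), then transpose
# so that rows are trials (samples) and columns are neurons (features).
X_toy = toy_storage[:, keep_trials, 1:3].sum(axis=2).T

print(X_toy.shape)  # (number of kept trials, number of neurons)
```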


In [None]:
stimulus_change_number = active_stimuli.flashes_since_change.values

In [None]:
change_number = 0

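# Select a time window to use for decoding (100-300 ms after stimulus onset).
# Compare left bin edges with a small tolerance: np.arange produces float edges
# that can land just above or just below 0.1 and 0.3.
tol = 1e-9
inc_time_idx = np.where((bins[:-1]>.1-tol) & (bins[:-1]<.3-tol))[0] # select time bins to include
start_idx = np.min(inc_time_idx)
end_idx = np.max(inc_time_idx)+1 # slices exclude their end index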

# Find the number of spikes in the selected window
X = np.sum(storage[:,stimulus_change_number==change_number,start_idx:end_idx],axis=2).T 
# And the trial identity for each of the selected stimuli
y = trial_id[stimulus_change_number==change_number]

# Split the data
x_train,x_test,y_train,y_test = train_test_split(X,y)

# Fit the model
svc = LinearSVC()
svc.fit(x_train,y_train)
print(f'Model score: {svc.score(x_test,y_test)}')

# make and plot the confusion matrix
prediction = svc.predict(x_test)
fig,ax = plt.subplots()
# normalize='true': each row (true class) sums to 1
im  = ax.imshow(confusion_matrix(y_true=y_test, y_pred=prediction, normalize='true'))
ax.set_xlabel('Predicted Class')
ax.set_ylabel('True Class')
ax.set_xticks(np.arange(len(unq_stim)))
ax.set_yticks(np.arange(len(unq_stim)))
ax.set_xticklabels(unq_stim,rotation = 90)
ax.set_yticklabels(unq_stim)

cbar = plt.colorbar(im)
cbar.set_label('Fraction Guessed')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### 3.1 Timescale of population dynamics  

What this means is that we can, using all of our VISp neurons, <b>very</b> reliably decode stimulus identity. This is, in and of itself, pretty exciting. But we can use our new decoding technique to dig deeper into the behavior of the neural population within our chosen time window.

To do this, we can loop through the time bins and fit a model to each one. Once we have each model, we can compute its score on both the training and testing partitions of our dataset.

In [None]:
score_train = np.zeros(len(bins)-1)
score_test = np.zeros(len(bins)-1) 

# Loop through and fit a model to each time bin.
for ii in range(len(bins)-1):
    svc = LinearSVC()

    # Find the number of spikes in the selected window
    X = storage[:,stimulus_change_number==change_number,ii].T
    # And the trial identity for each of the selected stimuli
    y = trial_id[stimulus_change_number==change_number]
        
    x_train,x_test,y_train,y_test = train_test_split(X,y)
    svc.fit(x_train,y_train)
    # Find score on both the training and the test data.
    score_train[ii] = svc.score(x_train,y_train)
    score_test[ii] = svc.score(x_test,y_test)

In [None]:
# Now Plot the score on the test data.
fig,ax = plt.subplots()
ax.plot(bins[:-1]+.05,score_test,label = 'Test')
ax.axhline(np.mean(chance),linestyle = '--',label = 'Chance est.',c = 'r')
ax.legend()
ax.set_xlabel('Time from stimulus')
ax.set_ylabel('Model Score')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Remember how we made such a big deal about splitting our data before? The importance of doing this becomes especially visible now that we have moved to the high-dimensional, many-neuron case. Because the dimensionality here is so high, models tend to do very, very well at predicting the data they were trained on, whether or not they have learned anything that generalizes. 
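
As a minimal sketch of how extreme this can get, the toy example below fits a LinearSVC to pure noise: a made-up "population" with more units than trials, and labels that have nothing to do with the activity. The sizes and random seeds are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_units = 200, 400                      # more "units" than trials
X_noise = rng.normal(size=(n_trials, n_units))    # activity that carries no information
y_noise = rng.integers(0, 8, size=n_trials)       # 8 labels, unrelated to X_noise

xn_train, xn_test, yn_train, yn_test = train_test_split(X_noise, y_noise, random_state=0)
noise_svc = LinearSVC(max_iter=10000).fit(xn_train, yn_train)

noise_train_score = noise_svc.score(xn_train, yn_train)
noise_test_score = noise_svc.score(xn_test, yn_test)
print(f'Train score: {noise_train_score:.2f}, test score: {noise_test_score:.2f}')
```

The training score should come out near perfect even though there is nothing to learn, while the test score should sit near chance (about 1/8 here). Only the held-out score tells us whether the decoder found real structure.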

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h3> Plot the testing and training data scores </h3> 

How do the two compare to each other? How do they compare to our chance estimate?


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
### 3.2 Cross validation

So far, we have seen only a single train/test split for each dataset. However, it is often useful to split your data multiple times. Depending on how you split, this can give you a more fine-grained characterization of either the variability or the time course of what you are decoding.

To make this easy, <b>sklearn</b> includes a cross_validate function that automates much of what you could achieve yourself using a for loop. Its default is to do 5-fold cross validation. This means that the data is split into fifths. Five models are then fit, each with 4/5 of the data for training and the remaining 1/5 for testing. 

Let's try it here for time bin 3, which with our current bins covers 100-200 ms after stimulus onset.

Note: There is nothing special about the number 5; you can use n-fold cross validation as suits your question. In the extreme case, you can use "leave-one-out" cross validation, where n equals the number of samples. 
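
To see what cross_validate is automating, here is a minimal sketch of the equivalent for loop on synthetic data (generated with sklearn's make_classification, not workshop data). For a classifier and an integer cv, sklearn splits with StratifiedKFold, so that is what the manual loop uses; the random_state values are fixed only to make the comparison repeatable.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import LinearSVC

# Synthetic 3-class dataset (illustration only)
X_syn, y_syn = make_classification(n_samples=200, n_features=20, n_informative=5,
                                   n_classes=3, random_state=0)
model = LinearSVC(random_state=0, max_iter=10000)

# Manual 5-fold loop: fit on 4/5 of the data, score on the held-out 1/5
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X_syn, y_syn):
    fold_model = clone(model)                       # fresh, unfitted copy for each fold
    fold_model.fit(X_syn[train_idx], y_syn[train_idx])
    manual_scores.append(fold_model.score(X_syn[test_idx], y_syn[test_idx]))
manual_scores = np.array(manual_scores)

# The same thing in one call
auto_scores = cross_validate(model, X_syn, y_syn, cv=5)['test_score']

print(manual_scores)
print(auto_scores)
```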


In [None]:
from sklearn.model_selection import cross_validate
cross_validate(LinearSVC(),storage[:,:,3].T,trial_id)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Now, we can loop through the time bins to get a sense of how variable our decoder is over time. This effectively allows us to draw error bars on our decoder!


In [None]:
n_cross = 5
scores = np.empty([len(bins)-1,n_cross])       

for ii in range(len(bins)-1):
    # Find the number of spikes in the selected window
    X = storage[:,stimulus_change_number==change_number,ii].T
    # And the trial identity for each of the selected stimuli
    y = trial_id[stimulus_change_number==change_number]
    
    scores[ii,:] = cross_validate(LinearSVC(),X,y,cv = n_cross)['test_score']


In [None]:
fig,ax = plt.subplots()
mean_score = []
bin_centers = bins[:-1] + np.diff(bins)/2  # plot each score at the center of its time bin
for ii in range(len(bins)-1):
    px = plt.scatter([bin_centers[ii]]*(n_cross),scores[ii,:],c = [1,2,3,4,5])
    mean_score.append(np.mean(scores[ii,:]))
plt.plot(bin_centers,mean_score)
ax.axhline(np.mean(chance),linestyle = '--',label = 'Chance est.',c = 'r')
ax.set_xlabel('Time from stimulus')
ax.set_ylabel('Model Score')
cbar = plt.colorbar(px)
cbar.set_ticks([1,2,3,4,5])
cbar.set_label('Fold #')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### A couple quick notes: 
### 3.4 Firing rates
Let's quickly look at the distributions of the maximum and mean spike counts (per time bin) for our VISp units.

In [None]:
rates = storage[:,:,3].T

In [None]:

fig,ax = plt.subplots(nrows = 2)
ax[0].hist(np.max(rates,axis=0),25)
ax[0].set_xlabel('Max Rate')
ax[0].set_ylabel('# of units')
ax[0].set_title('Max')

ax[1].hist(np.mean(rates,axis=0),25)
ax[1].set_xlabel('Mean Rate')
ax[1].set_ylabel('# of units')
ax[1].set_title('Mean')

fig.tight_layout()


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

There is a danger here: depending on how you do your decoding, the high-firing-rate neurons could end up carrying far more weight than the low-rate neurons.

A common solution is to (1) center our data around zero by subtracting the mean and (2) normalize the variance by dividing by the standard deviation. In other words, we need to "Z-score" our data or "standardize" it. 

$ \vec{x}_{rescaled} = \frac{\vec{x}-\mathrm{mean}(\vec{x})}{\mathrm{stdev}(\vec{x})}$

<b>Sklearn</b> provides a handy interface to do this using the "<b>StandardScaler</b>."
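
As a minimal sketch of what the scaler does, the toy example below Z-scores made-up spike counts by hand and compares the result to StandardScaler. The Poisson rates are arbitrary assumptions for illustration. Note that the scaler works column by column, so each unit (column) gets its own mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up spike counts, shape (trials, units), for units with very different rates
toy_counts = rng.poisson(lam=[0.5, 5.0, 50.0], size=(200, 3)).astype(float)

# Z-score by hand: subtract each unit's mean, divide by each unit's standard deviation
manual_z = (toy_counts - toy_counts.mean(axis=0)) / toy_counts.std(axis=0)

# The same operation with sklearn
scaled = StandardScaler().fit_transform(toy_counts)

print(np.allclose(manual_z, scaled))
```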

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
S = StandardScaler()
rates_rescaled = S.fit_transform(rates)

In [None]:
fig,ax = plt.subplots(nrows = 2)
ax[0].hist(np.max(rates_rescaled,axis=0),25)
ax[0].set_xlabel('Max Rate')
ax[0].set_ylabel('# of units')
ax[0].set_title('Max')

ax[1].hist(np.mean(rates_rescaled,axis=0),25)
ax[1].set_xlabel('Mean Rate')
ax[1].set_ylabel('# of units')
ax[1].set_title('Mean')

fig.tight_layout()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### BUT WAIT!

If this scaling was such a risk, why have we made it this far into the workshop without worrying about it?

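It turns out that linear models are one of the cases where rescaling changes the behavior of our model very little. Rescaling will, of course, change the weights fit by the linear model. But, because Z-scoring is itself a linear operation, any decision boundary available on the raw data is also available on the rescaled data, so predictions are largely unaffected. (LinearSVC regularizes its weights, so the two sets of scores below can differ slightly.) 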
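
To spell out the algebra, write the Z-scored activity of unit $i$ as $z_i = (x_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are that unit's mean and standard deviation (symbols introduced here for illustration). Any linear decision function on the raw data can then be rewritten as a linear decision function on the rescaled data:

$ \vec{w}\cdot\vec{x} + b = \sum_i w_i(\sigma_i z_i + \mu_i) + b = \sum_i (w_i\sigma_i)\, z_i + \left(b + \sum_i w_i\mu_i\right) = \vec{w}'\cdot\vec{z} + b' $

with $w'_i = w_i\sigma_i$ and $b' = b + \sum_i w_i\mu_i$. The weights change, but the value of the decision function for every trial does not.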

In [None]:
mdl = cross_validate(LinearSVC(),rates,stim_id)
print(mdl['test_score'])

mdl_rescaled = cross_validate(LinearSVC(),rates_rescaled,stim_id)
print(mdl_rescaled['test_score'])

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### So...Should I rescale?

It depends. In many cases (like here!), it won't matter. 

However, there are cases where it can make a big difference. Rescaling is, for example, often an important step before principal component analysis; without it, units with large variance would dominate the resulting PCs.

Even if rescaling won't change the predictions of your model, it can make the model easier to understand. Model weights depend on the scale of each unit's firing rate. As a result, if you want to compare the contributions of different units to your linear model, rescaling puts their weights on a common footing and makes them easier to interpret.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### A few quick notes: 
### 3.5 Population size impacts decoding

Up till now, we have seen decoding with both a single neuron and our entire VISp population.

As it turns out, there is a lot of space in between these two extremes.

In [None]:
# The random library is useful for sub selecting data
import random

# Here is an example of sampling, without replacement, from a set of indices.
idx_list = list(np.arange(0,len(this_structure_units_table)))
samples = random.sample(idx_list,6)
subset  = np.array(samples).astype(int)
print(subset)

In [None]:
# number of neuron sets to grab
n_selections = 25
# number of neurons per set.
n_neurons = np.arange(1,len(this_structure_units_table),1)

# Storage for our model scores
scores = np.zeros([n_selections,len(n_neurons)])

# Choose time bin 3 (100-200 ms after the stimulus with our current bins)
X = storage[:,stimulus_change_number==change_number,3].T
# And the trial identity for each of the selected stimuli
y = trial_id[stimulus_change_number==change_number]

# Loop through population sizes and random subsets of neurons
for nn,neuron_count in enumerate(n_neurons):
    for ii in range(n_selections):
        samples = random.sample(idx_list,neuron_count)
        subset  = np.array(samples).astype(int)
        scores[ii,nn] = np.mean(cross_validate(LinearSVC(),X[:,subset],y,cv = 3)['test_score'])



In [None]:
# find the mean and standard error of the mean for each neuron number
means = np.mean(scores,axis =0)
st_err = np.std(scores,axis = 0)/np.sqrt(n_selections)
fig,ax = plt.subplots()

ax.errorbar(n_neurons,
             means,
             yerr = st_err,
           label = 'Model Score')
ax.set_xlabel('# of neurons')
ax.set_ylabel('Average Model Score')
ax.axhline(chance,c = 'r',linestyle = '--',label = 'chance')
ax.legend()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

# 4.0 What's next?

So far, we have seen that visual cortex encodes image identity in a visual task.

This, in and of itself, may not be that surprising to you. But as you have seen, decoders are a useful tool in understanding the dynamics of a population - we have used them to understand the time course of visual responses, as well as to understand how broadly distributed the visual code might be.

Now, let's try moving beyond image identity. This is, after all, a change detection task. Even though "change" images are drawn from the same set as the non-change images, can we differentiate change image presentations? 

To answer this, we can use the same "storage" matrix that we created before; it's just a matter of changing the 'y' class labels that we are decoding.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h3> Try decoding change images. </h3> 

Can you plot the timecourse of change encoding?


In [None]:
# Here is some code to get things started
fig,ax = plt.subplots()
median_score = np.zeros(len(bins)-1)
for ii in range(len(bins)-1):
    X = storage[:,:,ii].T
    y = active_stimuli.is_change.values
    ...


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<h3> How did you do? A little too well? </h3> 

Let's see if we can figure out what went wrong here.

In [None]:
# Wait... why are we doing so well?
# Fraction of the presentations we decoded from that were NOT change images
prob_of_no_change = 1-np.sum(active_stimuli.is_change)/len(active_stimuli)
print('Success rate if you always guessed no: ' + str(prob_of_no_change))

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

In cases with very uneven sampling, the 'balanced' option for the Linear SVC can be very helpful!
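
To see why plain accuracy is misleading when classes are this uneven, here is a minimal sketch on made-up labels with 5% "change" trials. It uses sklearn's DummyClassifier as a stand-in for a decoder that always guesses the most common class, and balanced_accuracy_score (the average of the per-class hit rates), a metric we have not used elsewhere in this workshop.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up labels: 5 change trials, 95 non-change trials
y_toy = np.array([True] * 5 + [False] * 95)
X_toy_empty = np.zeros((len(y_toy), 1))    # features are irrelevant for this "decoder"

# A "decoder" that ignores the data and always guesses the most frequent class
always_no = DummyClassifier(strategy='most_frequent').fit(X_toy_empty, y_toy)
guess = always_no.predict(X_toy_empty)

acc = accuracy_score(y_toy, guess)               # 0.95: looks impressive
bal_acc = balanced_accuracy_score(y_toy, guess)  # 0.5: no better than a coin flip
print(acc, bal_acc)
```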

In [None]:
# Decode change vs. no-change, weighting the two classes equally during fitting.
fig,ax = plt.subplots()
median_score = []
for ii in range(len(bins)-1):
    scores = cross_validate(LinearSVC(class_weight='balanced'),storage[:,:,ii].T,active_stimuli.is_change.values,cv = n_cross)
    scores = scores['test_score']
    ax.scatter([bins[ii]+np.median(np.diff(bins))/2]*n_cross,scores,c = [0,1,2,3,4])
    median_score.append(np.median(scores))
ax.plot(bins[:-1]+np.median(np.diff(bins))/2,median_score)
ax.axhline(.5,linestyle = '--',c = 'r')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

# 4.1 Closing thoughts.

This workshop has focused on decoding and how to assess your decoding. Here, we largely used cross validation to assess how well a model was doing across a nominally homogeneous block of trials. But what we are really asking with cross validation is "how well does my model do on data it hasn't seen before?" 

As you saw in Shawn's talk this morning, however, mice and mouse behavior are often far from homogeneous. 

It is often useful to fit a model in one condition and test it in another; this is a way to assess whether population encoding has changed. Hold on to this idea - we will see it again later this afternoon!
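
As a minimal sketch of that idea, the toy simulation below fits a decoder in one made-up condition and tests it in another. Everything here is an assumption for illustration: the simulate helper, the Gaussian "activity", and the deliberately extreme case in which condition B swaps the population response to the two stimuli.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_units = 200, 20
class_means = rng.normal(size=(2, n_units))   # mean population response to each of 2 stimuli

def simulate(means):
    """Hypothetical helper: noisy population responses around the given class means."""
    labels = rng.integers(0, 2, size=n_trials)
    activity = means[labels] + rng.normal(size=(n_trials, n_units))
    return activity, labels

X_a, y_a = simulate(class_means)         # condition A
X_b, y_b = simulate(class_means[::-1])   # condition B: the code for the two stimuli is swapped

# Fit in condition A, holding out some A trials for the within-condition score
Xa_train, Xa_test, ya_train, ya_test = train_test_split(X_a, y_a, random_state=0)
cond_svc = LinearSVC(max_iter=10000).fit(Xa_train, ya_train)

score_within = cond_svc.score(Xa_test, ya_test)   # held-out trials, same condition
score_across = cond_svc.score(X_b, y_b)           # trials from the other condition
print(f'Within condition: {score_within:.2f}, across conditions: {score_across:.2f}')
```

A drop from the within-condition score to the across-condition score is the signature of a change in encoding; in real data the change is usually far subtler than the swap simulated here.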

