
![Image](./resources/cropped-SummerWorkshop_Header.png)

<h1 align="center">Encoding </h1> 
<h2 align="center"> Day 1, Exercises. SWDB 2023 </h2> 

<h3 align="center">Monday, August 21, 2023</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">
There are more exercises here than you can likely do in an afternoon. That said, please try a few of them - this is your first chance to use your newly minted Python skills to dig into a new data set! 
    
Remember, if you get stuck there are TAs all around to help you. Don't waste time beating your head against a problem. We are here to help!
    

In [1]:
# We need to import these modules to get started
import numpy as np
import pandas as pd
import platform
import os
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Set file location based on platform. 
platstring = platform.platform()
if ('Darwin' in platstring) or ('macOS' in platstring):
    # macOS 
    data_root = "/Volumes/Brain2023/"
elif 'Windows'  in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif ('amzn' in platstring):
    # then on Code Ocean
    data_root = "/data/"
else:
    # then your own linux platform
    # EDIT location where 
    # you mounted hard drive
    data_root = "/media/$USERNAME/Brain2023/"

In [3]:
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

manifest_file = os.path.join(data_root,'allen-brain-observatory/visual-coding-2p/manifest.json')

boc = BrainObservatoryCache(manifest_file=manifest_file)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 1: Explore direction tuning. </h2>
     <p>
   
The workshops earlier today looked deeply at only a few examples of direction tuning in excitatory cells in primary visual cortex (VISp). Try digging a little deeper into the Brain Observatory data. 
</p>   

<p>
There is absolutely no right answer to this exercise, and everyone will end up with a different adventure. 
</p>   
    
Some ideas include:
<ul>
  <li>Try a different brain area. </li>
  <li>Or try a different Cre line. Maybe an inhibitory line would be particularly interesting?</li>
  <li>Try a different imaging depth.</li> 
  <li>Find a session with more running, or one where the mouse never runs.  </li> 
</ul>
   
<p> 
Do these areas/Cre lines/depths have direction tuning? Does it look different than in this morning's workshops? Are cells in the population more or less reliable?
</p>
    
<p>   
There is a very large search space. One strategy might be to team up with someone else so you can systematically try and compare a few different combinations.
</p>

To get you started, here is a quick refresher on how to query sessions in the data. We will go over two ways, one using primarily the AllenSDK and the other outsourcing some of your queries to Pandas.

Recall: the BrainObservatoryCache will tell you all of the available brain areas, Cre lines, etc.

In [4]:
print(boc.get_all_targeted_structures())
print(boc.get_all_cre_lines())

['VISal', 'VISam', 'VISl', 'VISp', 'VISpm', 'VISrl']
['Cux2-CreERT2', 'Emx1-IRES-Cre', 'Fezf2-CreER', 'Nr5a1-Cre', 'Ntsr1-Cre_GN220', 'Pvalb-IRES-Cre', 'Rbp4-Cre_KL100', 'Rorb-IRES2-Cre', 'Scnn1a-Tg3-Cre', 'Slc17a7-IRES2-Cre', 'Sst-IRES-Cre', 'Tlx3-Cre_PL56', 'Vip-IRES-Cre']


You can then use the Cache object to query for your session(s) of interest. 

If you are unsure where to start, we recommend choosing the Sst-IRES-Cre line. Here, expression of GCaMP is driven in SST-positive interneurons, a type of inhibitory cell. This will form a nice juxtaposition to the excitatory cell line used in this morning's example. 

One note: you may notice that there are relatively few inhibitory neurons in any given session. This is because there are proportionally fewer inhibitory cells in cortex, and therefore fewer in any experiment's field of view.

In [5]:
sessions_list = boc.get_ophys_experiments(stimuli=['drifting_gratings'],
                                          cre_lines=['Sst-IRES-Cre'],
                                          targeted_structures=['VISpm'])
sessions_table = pd.DataFrame(sessions_list)
sessions_table.head()

Unnamed: 0,id,imaging_depth,targeted_structure,cre_line,reporter_line,acquisition_age_days,experiment_container_id,session_type,donor_name,specimen_name,fail_eye_tracking
0,603763073,325,VISpm,Sst-IRES-Cre,Ai148(TIT2L-GC6f-ICL-tTA2),102,603763385,three_session_A,323528,Sst-IRES-Cre;Ai148(CAM)-323528,True
1,639117196,375,VISpm,Sst-IRES-Cre,Ai148(TIT2L-GC6f-ICL-tTA2),97,639117194,three_session_A,340377,Sst-IRES-Cre;Ai148-340377,True
2,639251932,375,VISpm,Sst-IRES-Cre,Ai148(TIT2L-GC6f-ICL-tTA2),96,639251930,three_session_A,340854,Sst-IRES-Cre;Ai148-340854,True
3,599909878,300,VISpm,Sst-IRES-Cre,Ai148(TIT2L-GC6f-ICL-tTA2),111,599920955,three_session_A,315562,Sst-IRES-Cre;Ai148(CAM)-315562,False
4,603188560,275,VISpm,Sst-IRES-Cre,Ai148(TIT2L-GC6f-ICL-tTA2),103,602397921,three_session_A,321822,Sst-IRES-Cre;Ai148(CAM)-321822,False


From here, you can select a session ID to try using the code from this morning!
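
The Pandas route mentioned above can be sketched as follows. This is a minimal illustration on a hand-built table that copies a few columns from the output shown above, not a live query, so it runs without the data drive. With the real data you would build the table from `boc.get_ophys_experiments(...)` as in the previous cell. The depth cutoff of 300 is an arbitrary choice for the example.

```python
import pandas as pd

# Hand-copied subset of the table above (illustration only, not a live query)
sessions_table = pd.DataFrame({
    'id': [603763073, 639117196, 639251932, 599909878, 603188560],
    'imaging_depth': [325, 375, 375, 300, 275],
    'targeted_structure': ['VISpm'] * 5,
    'cre_line': ['Sst-IRES-Cre'] * 5,
    'fail_eye_tracking': [True, True, True, False, False],
})

# Boolean masks can be combined with & (and) or | (or)
good_eye = ~sessions_table.fail_eye_tracking
shallow = sessions_table.imaging_depth <= 300  # arbitrary cutoff for the example
selected = sessions_table[good_eye & shallow]

# Pull out the session IDs to pass on to boc.get_ophys_experiment_data
selected_ids = selected.id.values
print(selected_ids)
```

Filtering in Pandas is handy when you want to combine several criteria, or criteria that the cache query does not expose as arguments.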

In [28]:

sessions_with_spont_list_rbp4 = boc.get_ophys_experiments(stimuli=['spontaneous'], 
                                                cre_lines = ['Rbp4-Cre_KL100'], 
                                                targeted_structures=['VISp'])
#print the number of sessions
print('The number of sessions with Rbp4 in VISp: ', len(sessions_with_spont_list_rbp4) )

#create a dataframe of the sessions
sessions_table = pd.DataFrame(sessions_with_spont_list_rbp4)

#show the first 5 rows of the dataframe
sessions_table.head()

#determine the number of fail_eye_tracking sessions in the dataframe and print the number and isolate the session ids
fail_eye_tracking = sessions_table[sessions_table.fail_eye_tracking == True]
print('The number of sessions with failed eye tracking: ', len(fail_eye_tracking))
fail_eye_tracking_ids = fail_eye_tracking.id.values
print('The ids of the sessions with failed eye tracking: ', fail_eye_tracking_ids)

#create the dataframe of the sessions without failed eye tracking
sessions_table_without_failedeyetracking = sessions_table[sessions_table.fail_eye_tracking == False]

#show the first 5 rows of the dataframe
sessions_table_without_failedeyetracking.head()


The number of sessions with Rbp4 in VISp:  21
The number of sessions with failed eye tracking:  6
The ids of the sessions with failed eye tracking:  [502962794 638754561 502665019 502741583 637672042 637115675]


Unnamed: 0,id,imaging_depth,targeted_structure,cre_line,reporter_line,acquisition_age_days,experiment_container_id,session_type,donor_name,specimen_name,fail_eye_tracking
0,652340572,375,VISp,Rbp4-Cre_KL100,Ai93(TITL-GCaMP6f),105,649401934,three_session_B,352161,Rbp4-Cre_KL100;Camk2a-tTA;Ai93-352161,False
1,649401936,375,VISp,Rbp4-Cre_KL100,Ai93(TITL-GCaMP6f),89,649401934,three_session_A,352161,Rbp4-Cre_KL100;Camk2a-tTA;Ai93-352161,False
2,591537010,375,VISp,Rbp4-Cre_KL100,Ai93(TITL-GCaMP6f),139,588503721,three_session_C2,301125,Rbp4-Cre_KL100;Camk2a-tTA;Ai93-301125,False
3,571494829,375,VISp,Rbp4-Cre_KL100,Ai93(TITL-GCaMP6f),122,571137444,three_session_C2,288600,Rbp4-Cre;Camk2a-tTA;Ai93-288600,False
5,571137446,375,VISp,Rbp4-Cre_KL100,Ai93(TITL-GCaMP6f),117,571137444,three_session_A,288600,Rbp4-Cre;Camk2a-tTA;Ai93-288600,False


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 2: To df/f or not to df/f, that is the question. </h2>

In the workshop, we chose to use detected Ca2+ events to build our receptive fields and analyse our data. Did this matter? How might this have changed our results?
    
<p> To test this, find a few cells with known tuning behavior. Try building a tuning curve using both the df/f traces and the detected event magnitudes. What does this decision do to the tuning curves? In what situations might this distinction matter more or less? </p> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3: What else might be encoded in visual cortex?</h2>
   
    <p>
Remember, these mice saw more than just drifting grating stimuli! Let's see if we can learn about the spatial selectivity of these cells; that is, we can ask whether cells encode the spatial location of stimuli.
   
        <p>
To do this, we will use a stimulus set known as "locally sparse noise." Here, mice were shown movies consisting of a mostly grey screen with non-neighboring (locally sparse) black and white pixels positioned randomly (noise) on the screen. 


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3a: Get the dataset and look at the noise stimulus</h2>
    
First, let's get the session containing the locally sparse noise and plot a frame of the stimulus.

In [None]:
sessions_list = boc.get_ophys_experiments(experiment_container_ids= [637998953],
                                          stimuli=['locally_sparse_noise_8deg'],)
sessions_table = pd.DataFrame(sessions_list)
sessions_table

In [None]:
# Load the dataset
session_id = sessions_table.id.values[0]
data_set = boc.get_ophys_experiment_data(ophys_experiment_id=sessions_table.id.values[0])

In [None]:
# Get the stimulus table
stim_table = data_set.get_stimulus_table('locally_sparse_noise_8deg')
stim_table.head()

In [None]:
# Get the stimulus movie
# It is called the "stimulus template"
stimulus_template = data_set.get_stimulus_template('locally_sparse_noise_8deg')
stimulus_template.shape

In [None]:
# Plot a single frame
fig,ax = plt.subplots()
ax.imshow(stimulus_template[0],cmap= 'Greys')
ax.set_xlabel('X pixels')
ax.set_ylabel('Y pixels')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3b: Get per-trial responses  </h2>
    
Here, we need to make a decision about how to bin time for each trial. How large should the trial window be? Should it extend beyond the length of the trial? Why? 

In [35]:
# create a function that iterates the input ids which is a list of session ids 
# and prints the number of cells in each session
def print_number_of_cells(input_ids): 
    for session_id in input_ids: #for each session id in the list of session ids
        data_set = boc.get_ophys_experiment_data(ophys_experiment_id=session_id) #get the data set for that session id
        print('The number of cells in session ' + str(session_id) + ' is ' + str(data_set.get_cell_specimen_ids().shape[0]))
        
#call the function on the sessions with failed eye tracking
print_number_of_cells(fail_eye_tracking_ids)

#create a function that plots the mean response of a random cell to a stimulus 

The number of cells in session 502962794 is 79
The number of cells in session 638754561 is 20
The number of cells in session 502665019 is 71
The number of cells in session 502741583 is 76
The number of cells in session 637672042 is 15
The number of cells in session 637115675 is 23


In [None]:
# Choose a window. 
# We will (somewhat arbitrarily) start with 15 frames, or about 1/2 second
window = 15

In [None]:
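# Get the average response amplitude on each trial
# Assumption: `events` and `num_cells` are not defined earlier in this notebook,
# so we load the detected events (cells x time) for this session here
events = boc.get_ophys_experiment_events(ophys_experiment_id=session_id)
num_cells = events.shape[0]
responses = np.zeros((len(stim_table),num_cells))
for ii,row in stim_table.iterrows():
    for cc in range(num_cells):
        responses[ii,cc] = events[cc,row.start:row.start+window].mean()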
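
The same windowed average can be computed without the nested loops. The sketch below uses synthetic stand-ins (a random `events` array and made-up `trial_starts`) so that it runs on its own. With the real data, `trial_starts` would be the `start` column of the stimulus table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 3 cells, 200 time points, 10 trials
num_cells, num_frames = 3, 200
events = rng.random((num_cells, num_frames))
trial_starts = np.arange(0, 100, 10)  # hypothetical trial start frames

window = 15  # frames, as chosen above

# Build a (trials x window) array of frame indices,
# then average over the window axis
idx = trial_starts[:, None] + np.arange(window)[None, :]
responses = events[:, idx].mean(axis=2).T  # shape: (trials, cells)
```

Note that with a 15-frame window and trials starting every 10 frames, neighboring windows overlap. This is exactly the kind of consequence to keep in mind when choosing a window that extends past the end of a trial.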

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3c: Build a design matrix using this stimulus</h2>
    
We can represent each frame of the two-dimensional image by "flattening" the image into a vector. Stacking these vectors will allow us to build a design matrix.

In [None]:
# What is the shape of one frame?
stimulus_template[0].shape

In [None]:
# What does this look like flattened?
flat_stim = stimulus_template[0].flatten()
flat_stim.shape

In [None]:
fig,ax = plt.subplots()
ax.plot(flat_stim)
ax.set_xlabel('pixel id')
ax.set_ylabel('pixel intensity')

In [None]:
# Do this for every trial
design_matrix = np.zeros((len(stim_table),len(flat_stim)))
for ii in range(len(stim_table)):
    design_matrix[ii,:] = stimulus_template[ii].flatten()
design_matrix
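
As an aside, the loop above can be replaced by a single `reshape`. The sketch below checks that the two give the same matrix, using a small synthetic stand-in for the stimulus template (20 random frames of 8x14 pixels, sizes chosen only for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): 20 frames of 8x14 pixels drawn from
# the three grey levels used by the stimulus
stimulus_template = rng.choice([0, 127, 255], size=(20, 8, 14)).astype(np.uint8)
n_trials = 20

# Loop version, as in the cell above
design_loop = np.zeros((n_trials, 8 * 14))
for ii in range(n_trials):
    design_loop[ii, :] = stimulus_template[ii].flatten()

# Vectorized version: keep the first axis, flatten the remaining ones
design_vec = stimulus_template[:n_trials].reshape(n_trials, -1).astype(float)
```

Both versions assume that row `ii` of the stimulus table corresponds to frame `ii` of the template, which is worth checking against the stimulus table before trusting the design matrix.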

In [None]:
# For ease of interpretability, change white values to 1s, greys to 0, and blacks to -1:
design_matrix[design_matrix==255] = 1
design_matrix[design_matrix==0] = -1
design_matrix[design_matrix==127]=0

In [None]:
# Check that it worked
fig,ax = plt.subplots()
ax.plot(design_matrix[0,:])
ax.set_xlabel('pixel id')
ax.set_ylabel('pixel intensity')

In [None]:
# What does this look like?
# It's too big to plot the whole thing, but let's look at a section; 
# The first 150 rows is a good start
fig,ax = plt.subplots()


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3d: Split your data into training and testing sets</h2>

In [None]:
# Import model fitting code
from sklearn.model_selection import train_test_split
# Split
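
One possible way to do the split is sketched below on synthetic stand-ins for `design_matrix` and `responses`, so the block runs on its own. The 80/20 split and the fixed `random_state` are choices made for the example, not requirements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 100 trials, 8x14 pixel frames, 5 cells
n_trials, n_pixels, n_cells = 100, 8 * 14, 5
design_matrix = rng.choice([-1.0, 0.0, 1.0], size=(n_trials, n_pixels))
responses = rng.random((n_trials, n_cells))

# Hold out 20% of trials; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    design_matrix, responses, test_size=0.2, random_state=0)
```

Passing both arrays in one call keeps the rows of the design matrix and the responses aligned after shuffling.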


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3e: Fit a model. </h2>
    
<p>
We recommend cell_specimen_id = 662194111, but try several!
   
    <p>
Try plotting the model coefficients. Because we flattened the stimulus frame early, you will need to use np.reshape to get the vectorized coefficients back into the shape of the original frame in the stimulus template. What does the pattern tell you? 
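
The reshaping step can be sketched as follows. Everything here is synthetic: a simulated cell that responds to a single pixel, and an 8x14 frame chosen for illustration (use `stimulus_template[0].shape` for the real frame). Ridge regression is a swapped-in choice for the example. Any sklearn linear regressor exposes `coef_` in the same way.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): an 8x14 frame and a simulated cell
# that responds to a white pixel at row 3, column 5
frame_shape = (8, 14)
n_trials = 500
X = rng.choice([-1.0, 0.0, 1.0], size=(n_trials, frame_shape[0] * frame_shape[1]))
true_rf = np.zeros(frame_shape)
true_rf[3, 5] = 1.0
y = X @ true_rf.flatten() + 0.1 * rng.standard_normal(n_trials)

# Fit one coefficient per pixel
model = Ridge(alpha=1.0).fit(X, y)

# Undo the flattening: coefficients go back to (rows, columns)
rf_estimate = model.coef_.reshape(frame_shape)
peak = np.unravel_index(np.argmax(rf_estimate), frame_shape)
```

Plotting `rf_estimate` with `imshow` then shows the coefficients in screen coordinates. In this simulation the largest coefficient sits at the pixel the cell was built to respond to.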
        


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 3f: Fit a model for all the cells in the session</h2>

<p> Plot the coefficients for the most and least reliable models. What do you notice about the population of responses?
   

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 4: What else might be encoded in visual cortex?</h2>
    
Try to see if you can find a neuron that reliably encodes another of the presented stimulus sets. 
    
Some places to try:
<ul>
  <li>Temporal frequencies: We filtered this morning's data to select only one temporal frequency. Was there temporal frequency tuning as well?</li>
  <li>Natural Scenes: Scenes were shown more than one time. Are there cells with consistent responses across presentations? </li>
  <li>Natural Movies: Are there cells with a consistent response timecourse as a movie unfolds?</li>
</ul>
    
Answering each of these questions will require thought about the assumptions you make about the data. For example, what time window makes sense to use? Given the timecourse of Ca2+ data, should you include time after a stimulus for short presentations? Similarly, how might you bin data in a natural movie? 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 5: The Bootstrap - Another way to assess reliability</h2>
  
<p>
In this morning's tutorial, we saw the use of one particular statistic, the direction selectivity index, or DSI. We also saw the key problem with DSI and many similar metrics - they will tell you how selective a cell is (on average) to a particular stimulus, but not how reliable this selectivity is.
    </p>   
<p>
One approach to address this problem is to ask the hypothetical, "What would this metric be for my cell if my cell's responses were independent of the stimulus?" By breaking the relationship between a stimulus and a cell's response, we can get a sense of whether our observed metric was higher than what we might have expected by chance. 
    </p>    

<p>
We can break this relationship by shuffling the cell's response relative to whatever value in the stimulus we want to test. Because we have already collected the data, we can shuffle the same data set many times to build a distribution of "null" values, and look at where our observed value falls in this distribution. 
    </p>
    

<p>
This is a version of a technique known as "bootstrapping," because our resampling allows us to "pull our data up by its bootstraps" - resample our data without needing to collect more.
</p>
    
    
<p>
While bootstrapping does not explicitly quantify reliability, it is an indirect way of assessing it. If a cell has a reliable response to a particular stimulus, shuffling the stimulus identity will greatly reduce the DSI.

<p>
Here, let's try some bootstrapping using the data from this morning's session. Conveniently, we have already saved these data in a way that will make them easy to work with for this exercise:


In [None]:
# Load data from workshop
import os

session_id = 637998955
load_loc = os.path.join('/scratch','Workshop1')

orientation = np.load(os.path.join(load_loc,str(session_id)+'_orientation.npy'))
temp_freq = np.load(os.path.join(load_loc,str(session_id)+'_temp_freq.npy'))
mean_response_all = np.load(os.path.join(load_loc,str(session_id)+'_mean_response_all.npy'))

#filter for temp_freq = 1:
orientation = orientation[temp_freq==1]
mean_response_all = mean_response_all[temp_freq==1,:]
temp_freq = temp_freq[temp_freq==1]

# Get all orientations
orientations = np.unique(orientation)

In [None]:
# Define a function that computes DSI
def dsi(ori_vals,tuning):
    """
    Computes the direction selectivity of a cell. 
    De Vries 2019

    Parameters
    ----------
    ori_vals : float array of length N
         List of orientation values, in degrees
    tuning : float array of length N
        Each value is the (averaged) response of the cell at a different
        orientation, in the same order as ori_vals

    Returns
    -------
    dsi : float
        Direction selectivity Index
    """
    # Find the index of the max response
    pref_idx = np.argmax(tuning) 
    # Find the preferred direction
    pref_dir = ori_vals[pref_idx] 
    # and preferred response.
    R_pref = tuning[pref_idx]
    
    # Null direction is opposing, so 180 degrees from preferred
    # Here % sign is the modulus, so wraps around 360.
    null_dir = (pref_dir +180)%360 
    # Find the index of the null direction
    null_idx = list(ori_vals).index(null_dir)
    # Find the null response
    R_null = tuning[null_idx]
    
    return (R_pref-R_null)/(R_pref+R_null)

def compute_tuning_curve(mean_response,orientation,orientations):
    """
    Compute the tuning curve for a set of responses and orientations
    
    Parameters
    ----------
    mean_response : np.array
        The mean response for each stimulus
    orientation : np.array
        Orientation of each stimulus
    orientations: np.array
        All orientations to compute tuning over
        Useful when a subset of orientations are needed.
    
    Returns
    -------
    tuning : np.array
        mean response at each orientation
    stdev: np.array 
        Standard deviation of responses at each orientation
    """
    tuning = np.zeros(orientations.shape)
    stdev = np.zeros(orientations.shape)
    for ii,ori in enumerate(orientations):
        tuning[ii] = mean_response[orientation==ori].mean()
        stdev[ii] = mean_response[orientation==ori].std()
    return tuning,stdev
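
To make the DSI definition concrete, here is a small worked example on a made-up tuning curve. It repeats the steps of the `dsi` function above inline, so that it runs on its own. The response values are invented for illustration.

```python
import numpy as np

# Eight directions, 45 degrees apart
ori_vals = np.arange(0, 360, 45)

# Made-up tuning curve: strongest response at 90 degrees (3.0),
# weaker response at the opposite direction, 270 degrees (1.0)
tuning = np.array([0.5, 0.5, 3.0, 0.5, 0.5, 0.5, 1.0, 0.5])

pref_idx = np.argmax(tuning)                 # index 2, i.e. 90 degrees
null_dir = (ori_vals[pref_idx] + 180) % 360  # 270 degrees
null_idx = list(ori_vals).index(null_dir)    # index 6

R_pref, R_null = tuning[pref_idx], tuning[null_idx]
example_dsi = (R_pref - R_null) / (R_pref + R_null)
print(example_dsi)  # (3 - 1) / (3 + 1) = 0.5
```

A DSI of 0 means equal responses to the preferred and null directions, and a DSI of 1 means no response at all to the null direction.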
    

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 5a: Compute DSI</h2>

In [None]:
# Let's grab the same cell we were using before
cell_idx = 17

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

Compute the DSI for this cell. Note that here we used the full session whereas this morning we used only the first half, which is why this number is not exactly the same.

In [None]:
tuning,_ = compute_tuning_curve(mean_response_all[:,cell_idx],orientation,orientations)
observed_dsi = dsi(orientations,tuning) 
observed_dsi

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

Now let's shuffle responses, and recompute the DSI. How does it compare?
    
How you shuffle your data is an important part of what you will learn with any bootstrapping analysis. Here, we are randomly permuting cell responses, so that the statistics of the cell remain the same but the stimulus relationship is broken. However, randomly shifting responses in time would have achieved the same effect, while preserving the cell's temporal statistics as well. When choosing shuffling techniques, it is important to think about and understand what relationship(s) is/are being broken by your shuffling - these are what you will assess with your analysis.

In [None]:
shuffled_responses = np.random.permutation(mean_response_all[:,cell_idx])
shuffled_tuning,_ = compute_tuning_curve(shuffled_responses,orientation,orientations)
shuffled_dsi = dsi(orientations,shuffled_tuning)
shuffled_dsi
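
The two shuffling strategies discussed above can be compared side by side. The sketch below uses a short made-up response sequence and assumes the responses are stored in time order, which is what makes a circular shift meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up response sequence for one cell (assumption: trials are in time order)
responses = np.array([0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0, 0.0])

# Option 1: random permutation. Same values, order fully scrambled.
permuted = rng.permutation(responses)

# Option 2: circular shift by a random, non-zero offset. Same values, and
# neighbors in time stay neighbors (apart from the wrap-around point).
offset = rng.integers(1, len(responses))
shifted = np.roll(responses, offset)
```

Both options keep the set of response values unchanged. Only the circular shift keeps slow structure in time, such as a burst of large responses on consecutive trials.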

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 5b: Build a distribution</h2>
    
Using the for loop below, build a distribution of shuffled responses for this cell. Plot the histogram of these data, and compare this to the observed DSI value for this cell.

In [None]:
n_shuffles= 10000
shuffled_dsi = np.zeros(n_shuffles)
for ii in range(n_shuffles):
    None # delete this and fill in your code!
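
If you get stuck, here is one possible shape for the loop, written as a self-contained sketch on synthetic data (a simulated cell that responds strongly to 90 degrees). The helper `tuning_curve` and the compact `dsi` are stand-ins for the functions defined above, and the 1000 shuffles are only there to keep the sketch fast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 8 directions x 15 repeats,
# with a cell that responds strongly to 90 degrees only
orientations = np.arange(0, 360, 45)
orientation = np.repeat(orientations, 15)
cell_response = rng.random(len(orientation)) * 0.2
cell_response[orientation == 90] += 1.0

def tuning_curve(resp, ori, oris):
    # Mean response at each direction
    return np.array([resp[ori == o].mean() for o in oris])

def dsi(oris, tuning):
    # Same definition as the dsi function above, in compact form
    pref_idx = np.argmax(tuning)
    null_idx = list(oris).index((oris[pref_idx] + 180) % 360)
    return (tuning[pref_idx] - tuning[null_idx]) / (tuning[pref_idx] + tuning[null_idx])

observed_dsi = dsi(orientations, tuning_curve(cell_response, orientation, orientations))

n_shuffles = 1000
shuffled_dsi = np.zeros(n_shuffles)
for ii in range(n_shuffles):
    # Break the stimulus-response relationship, then recompute the metric
    shuffled = rng.permutation(cell_response)
    shuffled_dsi[ii] = dsi(orientations, tuning_curve(shuffled, orientation, orientations))
```

Note that the shuffled DSI values are not centered on zero: because the DSI is always computed relative to whichever direction happens to have the largest mean, even shuffled data gives positive values.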

In [None]:
# Plot the histogram
fig,ax = plt.subplots()


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 5c: Significance test</h2>
   
    <p>
Now that we have a distribution, we can ask "What is the probability that we would have observed this DSI value if there were no relationship to orientation tuning?" This is, by definition, a p-value for our cell's tuning. Like all p-values, we can assess "significance" for our cell using a significance threshold. 
    </p>
<p> 
Here, compute the p-value for this cell. Common significance thresholds are e.g. $p<.05$, $p<.01$. Did we pass?

In [None]:
pval = 1-np.sum(shuffled_dsi<observed_dsi)/(n_shuffles+1)
pval
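
The expression above may look unusual, so here is a small check on made-up numbers showing that it equals the more common "count the shuffles at least as large as the observed value" form. The `+1` treats the observed value as one more draw from the null distribution, which keeps the p-value from ever being exactly zero.

```python
import numpy as np

# Made-up null distribution and observed value (illustration only)
shuffled_dsi = np.array([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
observed_dsi = 0.65
n_shuffles = len(shuffled_dsi)

# Form used in the cell above
pval_a = 1 - np.sum(shuffled_dsi < observed_dsi) / (n_shuffles + 1)

# Equivalent form: shuffles at least as large as the observed value,
# plus one for the observed value itself
pval_b = (np.sum(shuffled_dsi >= observed_dsi) + 1) / (n_shuffles + 1)

print(pval_a, pval_b)
```

Here 7 of the 9 shuffled values fall below the observed value, so both forms give 3/10. This also shows that the smallest p-value you can report is 1/(n_shuffles + 1).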

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 5d: Do this for the population</h2>
    <p>
        
Now scale this analysis up for the population of cells in this recording session. For each cell, compute the observed DSI and its p-value. Using a significance threshold of $p<.05$, plot the distributions of DSI for significant and non-significant cells.
    </p>  
    <p>
    
Note: One of the challenges of bootstrapping is computational cost. We used 10,000 shuffles above, but this will take a few minutes if you do it for all your data. We recommend starting with 1000 shuffles here (but take a moment to think about what this might do to the result).
    </p>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 6: Running speed as a categorical variable.</h2>
 
    <p>
    Earlier we carried out our regression treating running speed as a continuous variable. But the difference between running and not running might be far more consequential than changes in the value of running speed per se. This would motivate treating running speed as a categorical variable, just as we have treated the stimulus orientations.  
</p>   

<p>
Try repeating the regression analysis in this way. Do the results change? Is the answer sensitive to how we choose to define our running threshold?
</p>   
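
Turning running speed into a categorical regressor can be as simple as thresholding it. The sketch below uses made-up per-trial speeds, and the threshold of 1 cm/s is a hypothetical starting point, not a recommended value.

```python
import numpy as np

# Made-up per-trial mean running speeds in cm/s (illustration only)
running_speed = np.array([0.0, 0.3, 5.2, 12.0, 0.8, 2.5, 30.1])

# Hypothetical threshold; the exercise asks how sensitive results are to it
run_threshold = 1.0

# 1 where the mouse counts as "running", 0 where it counts as "stationary"
is_running = (running_speed > run_threshold).astype(float)

# Use as a single regressor column (trials x 1), for example alongside
# one-hot orientation columns in the design matrix
X_run = is_running[:, None]
```

Repeating the fit for a few values of `run_threshold` is a direct way to answer the sensitivity question above.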

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 7: Systematic evaluation of orientation- and running-modulation.</h2>
 
    <p>
    Our regression analysis focused on just one arbitrarily selected cell. Presumably, our scientific question did not pertain to this specific cell in this specific mouse <i>per se</i>, but more generally, cells belonging to the visual cortex of mice. Accordingly, any conclusions drawn on the basis of our analysis depend critically on the degree to which our selected cell is representative of this broader population.  
</p>   

<p>
In previous analyses, you've seen that things can look quite a bit different across different cells. For this exercise, try carrying out a more systematic investigation of tuning and running modulation of visual cortical cells. The goal is to obtain a better understanding of variability at the scale that our question concerns.
</p>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #DFF0D8; ">

<h2> Exercise 8: Constructing a generalized linear model (GLM).</h2>
 
    <p>
    We tried to spice up our linear model using polynomials -- still, this analysis assumes a particular distribution for our dependent variable (in particular, the residuals).  
</p>   

<p>
An additional change for the sake of flexibility would be to equip our model with a <i>link function</i> via what's known as a <i>Generalized</i> Linear Model (GLM). GLMs are routinely used to model spiking neural responses to stimuli, as spiking events are discrete, integer observations (counts) that are not Gaussian (but rather <i>Poisson</i>) distributed.
</p>

<p>
The GLM is defined as follows for some input $\vec{x}_i$ and output $y_i$:
</p>
    
<p>
$P(y_i|\vec{x}_i;\vec{w}) = F(g^{-1}(\vec{w}\cdot\vec{x}_i))$
</p>
    
<p>
Where $g$ is called the "link function", $F(m)$ represents some probability distribution with mean $m$, and $\vec{w}$ is a vector of fitted parameters. These parameters are fit by finding the $\vec{w}$ that maximizes $\prod_{i=1}^N P(y_i|\vec{x}_i;\vec{w})$ for some dataset of $N$ samples. Note that when $g$ is the identity, and $F$ is the normal distribution with some fixed variance, this is just a linear regression problem.
</p>
 
<p>
Fortunately, the sklearn package that we've been using so far also implements multiple GLMs with common link functions. These regressors can be imported from sklearn.linear_model as before, and they are equipped with the same functions (e.g., <i>fit</i>, <i>score</i>, and <i>predict</i>).
</p>
    
<p>
Below, try examining two different GLM fits for the event data. Recall that above, we have taken the mean of the calcium event trace. How might the best link function differ if we used df/f instead? Compare model fits below.
</p>
    
   <p>
    <b> Note: </b> "score" will no longer correspond to $R^2$ once we've incorporated a (non-identity) link function! Hence, quantitative comparison across models is non-trivial. For now, though, try to create tuning curves based on these various linear models (i.e., linear regression as before, and GLMs with different link functions). How do these modeled tuning curves compare in appearance?
   </p>
    
    <p>
        <b> Hint: </b> The GLM regressors default to including a pretty strong regularization term ($L_2$ penalty). Start out by calling these regressors by explicitly setting this value to 0, as demonstrated in the next cell.
   </p>
 
</div>
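
A minimal sketch of calling a GLM regressor with the penalty switched off is given here. The data are synthetic (Poisson counts that peak at one direction), the one-hot design matrix is an assumption made for the example, and `PoissonRegressor` (which uses a log link) stands in for whichever GLM you choose to try.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 8 directions, one-hot encoded, with
# Poisson-distributed "event counts" that peak at direction index 2
n_dirs, n_repeats = 8, 50
direction_idx = np.repeat(np.arange(n_dirs), n_repeats)
X = np.eye(n_dirs)[direction_idx]
true_rate = np.full(n_dirs, 0.5)
true_rate[2] = 5.0
y = rng.poisson(true_rate[direction_idx]).astype(float)

# alpha=0 switches off the default L2 penalty.
# fit_intercept=False because the one-hot columns already span a constant.
glm = PoissonRegressor(alpha=0, fit_intercept=False, max_iter=1000).fit(X, y)
lin = LinearRegression(fit_intercept=False).fit(X, y)

# Modeled tuning curves: the predicted response to each direction
glm_tuning = glm.predict(np.eye(n_dirs))
lin_tuning = lin.predict(np.eye(n_dirs))
```

With a purely categorical design like this one, both models can match the mean response at each direction, so the tuning curves should look nearly identical. Differences between link functions become visible once continuous regressors such as running speed are added.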