<img src="../resources/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Brain Observatory - Visual Behavior </h1> 
<h2 align="center">Summer Workshop on the Dynamic Brain </h2> 
<h3 align="center">Monday, August 26, 2019</h3> 


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Visual Behavior Exercises Overview </h2>
    
<p> This notebook contains exercises covering several topics including behavior performance, single cell physiology and across session analysis.
  
<p> The exercises can be done in sequential order, or you can skip around to the exercises that interest you most. 
    
<p> If you get stuck, please ask a TA for help, look it up on Stack Overflow, or check the solutions notebook. 
    We don't expect you to know everything already - the goal is to learn, and learning requires making mistakes and asking questions. 

<h3>Outline </h3>

<p><b>Behavior Exercises</b>
<ul><li>Exercise 1.1: Plot the distribution of reaction times for go trials from one session
<li>Exercise 1.2: Plot reward rate over time during a session
<li>Exercise 1.3: Compute and plot hit rate over time during a session
<li>Exercise 1.4: Plot the average hit rate for each image
<li>Exercise 1.5: Plot the response probability for all image transitions
</ul>
<p><b>Single Cell Physiology Exercises</b>
<ul><li>Exercise 2.1: Plot activity across stimulus repetitions using the flash response dataframe
<li>Exercise 2.2: Correlate neural activity with running speed on a flash by flash basis
<li>Exercise 2.3: Plot the trial averaged response across images for one cell using the trial response dataframe
<li>Exercise 2.4: Create a heatmap of all cells' trial averaged responses following an image change
<li>Exercise 2.5: Compute a sparseness metric for one cell
<li>Exercise 2.6: Identify image responsive cells and the mean lifetime sparseness across the population
</ul>
<p><b>Across Session Physiology Exercises</b>
<ul><li>Exercise 3.1: Follow along to learn some useful pandas tricks for multi session data comparison
<li>Exercise 3.2: Compare the activity of matched cells across sessions
<li>Exercise 3.3: Merge all the experiments from a single container and plot the mean response for different image sets
<li>Exercise 3.4: Plot the trial averaged response for passive vs. active sessions from one container
</ul>
</div>

In [None]:
# you will need these libraries for computation & data manipulation
import os
import numpy as np
import pandas as pd

# matplotlib is a standard python visualization package
import matplotlib.pyplot as plt
%matplotlib inline

# seaborn is another library for statistical data visualization
# seaborn style & context settings make plots pretty & legible automatically
import seaborn as sns
sns.set_context('notebook', font_scale=1.5, rc={'lines.markeredgewidth': 2})
sns.set_style('white');
sns.set_palette('deep');

In [None]:
# Import allensdk modules for loading and interacting with the data
from allensdk.brain_observatory.behavior.swdb import behavior_project_cache as bpc
# Import allensdk utilities for Visual Behavior
import allensdk.brain_observatory.behavior.swdb.utilities as tools

In [None]:
# AWS path
cache_path = r'/data/dynamic-brain-workshop/visual_behavior/2019'

# Mac/Linux path
cache_path = r'/Volumes/Brain2019/dynamic-brain-workshop/visual_behavior/2019'

# Windows path
cache_path = r'H:\dynamic-brain-workshop\visual_behavior\2019'

cache = bpc.BehaviorProjectCache(cache_path)

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Pick an experiment session based on its <code>cre_line</code>, <code>imaging_depth</code> and <code>stage_name</code></b>

<p>1) Filter the experiments table according to your metadata of interest and get the <code>ophys_experiment_id</code> for a session of your choosing. 
    
<p>Hint: use pandas Boolean indexing to filter by multiple column values.
    
 __[Documentation for Boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)__

</div>
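<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> A minimal sketch of Boolean indexing on several columns at once. The table is a toy stand-in built by hand: the column names match those named above, but every value (the IDs, Cre line labels, depths and stage names) is invented for illustration and does not come from the real experiments table.
</div>

```python
import pandas as pd

# Toy stand-in for the experiments table; all values are hypothetical.
toy_experiments = pd.DataFrame({
    'ophys_experiment_id': [101, 102, 103, 104],
    'cre_line': ['Cre-A', 'Cre-B', 'Cre-A', 'Cre-A'],
    'imaging_depth': [175, 375, 375, 175],
    'stage_name': ['stage_1', 'stage_1', 'stage_2', 'stage_2'],
})

# Wrap each condition in parentheses and combine them with & (and) or | (or).
selected = toy_experiments[
    (toy_experiments.cre_line == 'Cre-A') &
    (toy_experiments.imaging_depth == 175) &
    (toy_experiments.stage_name == 'stage_2')
]

# Pull the ID out of the filtered table.
experiment_id = selected.ophys_experiment_id.values[0]
print(experiment_id)
```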


In [None]:
experiments = cache.experiment_table

In [None]:
# get an ophys_experiment_id for a session of interest


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>Load the session object for your experiment

</div>

In [None]:
# load a session from the cache


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Behavior Exercises</h2>

<p> This set of exercises explores basic behavior metrics, using the 'trials' dataframe. The 'trials' dataframe is organized around the times of stimulus identity changes (go trials) and sham change times (catch trials). It contains data and metadata for each trial, including lick times, reward times, and image identity.
    
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 1.1: Plot the distribution of reaction times for go trials from one session</b>

<p> 1) get the <code>trials</code> dataframe from the session object. 
    
<p> 2) Filter the trials dataframe to get go trials only. 
    
<p> 3) Use the values of the <code>response_latency</code> column to plot a histogram of reaction times. 
    
<p> <code>response_latency</code> is the first lick time, in seconds, relative to the change time. 
    
</div>
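<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> The steps above can be sketched on a small hand-made table. The <code>go</code> and <code>response_latency</code> column names follow the trials dataframe, but the values and the bin edges are made up. The sketch uses <code>np.histogram</code> to show the binning explicitly; passing the same values and bins to <code>plt.hist()</code> would draw the corresponding plot.
</div>

```python
import numpy as np
import pandas as pd

# Toy trials table; values are hypothetical.
toy_trials = pd.DataFrame({
    'go': [True, True, False, True, True, False],
    'response_latency': [0.41, 0.52, 0.60, 0.38, 0.55, 0.47],
})

# Keep go trials only, then take the latency values as an array.
go_trials = toy_trials[toy_trials.go]
latencies = go_trials.response_latency.values

# Bin the latencies; bin_edges are in seconds relative to the change time.
bin_edges = np.linspace(0, 0.75, 4)
counts, _ = np.histogram(latencies, bins=bin_edges)
print(counts, bin_edges)
```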

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 1.2: Plot reward rate over time during a session</b>

<p> 1) Use the <code>reward_rate</code> and <code>change_time</code> columns of the trials dataframe to plot reward rate over time. 
    
<p> The <code>reward_rate</code> on each trial has been pre-computed as the number of rewards per minute, over a 25 trial rolling window. 
   
<p> Was the mouse actively performing the task and earning rewards during the entire session? 
    
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 1.3: Compute and plot hit rate over time during a session</b>

<p> 1) Filter the trials dataframe to get the <code>response_binary</code> column for <code>go</code> trials only. Assign this to a new variable called <code>go_responses</code> for further computation. Note: Make sure that you get <code>go_responses</code> as a pandas series, without calling <code>.values</code>, so that the next step will work properly. 
    
<p> The <code>response_binary</code> column of the trials dataframe contains a 1 for all trials where there was a licking response within the 750ms reward window and a 0 where there was not. 

<p> 2) Apply the pandas <code>rolling()</code> method to <code>go_responses</code> followed by <code>.mean()</code> to take a rolling mean across go trials. Set <code>window = 25, center = True</code> in the call to <code>rolling()</code> for a centered window over 25 trials. Set the output of this step to a variable called <code>rolling_hit_rate</code>. 

 __[Documentation for pandas.rolling()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html)__
       
</div>
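<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> A small illustration of how a centered rolling mean behaves, using a made-up series of seven responses and a window of 3 rather than 25 so the result is easy to check by hand. Note that positions where the window does not fit are returned as NaN unless <code>min_periods</code> is set.
</div>

```python
import pandas as pd

# Toy stand-in for go_responses (1 = response, 0 = no response); values are hypothetical.
go_responses = pd.Series([1, 1, 0, 1, 0, 0, 1])

# Centered window of 3 trials: each value is the mean of a trial and its two neighbors.
rolling_hit_rate = go_responses.rolling(window=3, center=True).mean()

# The first and last entries are NaN because the centered window is incomplete there.
print(rolling_hit_rate)
```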

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

<p> 3) Plot the <code>rolling_hit_rate</code>. Label your axes.
       
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 1.4: Plot the average hit rate for each image</b>

<p>1) Loop through the 8 images in this session and quantify the fraction of go trials where there was a correct response for each image to get the mean hit rate across the session. 
    Hint: Use <code>session.trials.change_image_name.unique()</code> to get the image names.

<p>2) Plot the average hit rate for each image, with image names along the x-axis. 
    
<p> Bonus: Sort hit rate values in ascending order and apply the same sorting to the image names along the x-axis. Hint: Useful functions include np.sort() and np.argsort()
    
</div>
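<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> A sketch of the loop and the sorting hint on a toy table with three images instead of eight. The column names follow the trials dataframe; the image names and responses are invented. <code>np.argsort()</code> returns the order of indices that sorts the hit rates, and the same order can be applied to the image names.
</div>

```python
import numpy as np
import pandas as pd

# Toy trials table; values are hypothetical.
toy_trials = pd.DataFrame({
    'go': [True] * 6 + [False] * 2,
    'change_image_name': ['im_a', 'im_a', 'im_b', 'im_b', 'im_c', 'im_c', 'im_a', 'im_b'],
    'response_binary': [1, 1, 0, 1, 0, 0, 1, 0],
})

go_trials = toy_trials[toy_trials.go]
image_names = go_trials.change_image_name.unique()

# Hit rate per image = mean of response_binary over go trials with that change image.
hit_rates = np.array([
    go_trials[go_trials.change_image_name == name].response_binary.mean()
    for name in image_names
])

# argsort gives the index order that sorts hit_rates in ascending order.
order = np.argsort(hit_rates)
sorted_hit_rates = hit_rates[order]
sorted_image_names = image_names[order]
print(list(zip(sorted_image_names, sorted_hit_rates)))
```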

In [None]:
# get the hit rate for each image


In [None]:
# sort the hit rates in ascending order and sort the image labels in the same order


In [None]:
# plot hit rate by image with image names on the x-axis


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 1.5: Plot the response probability for all image transitions</b>

<p> The task cycles through 8 different images, resulting in 64 possible image transitions. Some image transitions might be easier for the mouse to detect than others. 

<p> 1) Use pandas <code>pivot_table</code> on the trials table to aggregate and average the <code>response_binary</code> values by <code>initial_image_name</code> and <code>change_image_name</code>. This will create a matrix of response probability for all image transitions. 
    
 __[Documentation for pandas.pivot_table()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html)__ 
 
</div>
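<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> To illustrate what <code>pivot_table</code> produces, here is a sketch on a toy table with two images, so the matrix is 2 x 2 rather than 8 x 8. The column names follow the trials dataframe; the values are invented.
</div>

```python
import pandas as pd

# Toy trials table; values are hypothetical.
toy_trials = pd.DataFrame({
    'initial_image_name': ['im_a', 'im_a', 'im_a', 'im_b', 'im_b', 'im_b'],
    'change_image_name':  ['im_b', 'im_b', 'im_a', 'im_a', 'im_a', 'im_b'],
    'response_binary':    [1, 0, 0, 1, 1, 0],
})

# Rows = image before the change, columns = image after the change,
# cell values = mean of response_binary for that transition.
response_matrix = pd.pivot_table(
    toy_trials,
    values='response_binary',
    index='initial_image_name',
    columns='change_image_name',
    aggfunc='mean',
)
print(response_matrix)
```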

In [None]:
# use pivot table to make a matrix of response probability


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> 2) Plot the hit transition matrix as a heatmap. Try using seaborn's heatmap function.

 __[Documentation for seaborn.heatmap()](https://seaborn.pydata.org/generated/seaborn.heatmap.html#seaborn.heatmap)__ 
    

<p> Did the mouse respond similarly for image changes compared to the same image repeated on catch trials? Are some image transitions more detectable than others? 

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>Bonus: Plot response probability across image transitions for engaged trials only


<p> 1) Filter out trials where the mouse wasn't reliably performing the task using the value of the <code>reward_rate</code> column, with a threshold of 2 rewards per minute to distinguish engaged from disengaged periods, then plot the transition heatmap again. 

<p> Does varying engagement influence how we should analyze neural activity?

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2> Single Cell Physiology Exercises</h2>
    
<p> These exercises explore neural activity aligned to trials or to all stimulus flashes. They make use of the 'trial_response_df' and 'flash_response_df' dataframes that have been pre-computed for you, after temporal alignment between ophys and stimulus data streams. 

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p><b>Info about trial and flash response dataframes</b>
    
<p> The <code>trial_response_df</code> contains the response of each cell to each behavioral trial during the task.  
    
<p> The <code>flash_response_df</code> contains the response of each cell to each individual stimulus presentation during the session. 
    
<p> Both dataframes have a column called <code>dff_trace</code> that contains a segment of each cell's fluorescence trace over a window of time. For the <code>trial_response_df</code> this window is [-4, 8] seconds relative to the <code>change_time</code> for each trial. For the <code>flash_response_df</code> this window is [-0.5, 0.75] seconds relative to the stimulus <code>start_time</code> for each flash. 
    
<p> The duration of the window over which the <code>dff_trace</code> was extracted can be found in the <code>analysis_files_metadata</code> attribute of the cache object. 
    
<p> Both dataframes also have a column called <code>mean_response</code> that contains each cell's response averaged over 500ms after the <code>change_time</code> or the <code>start_time</code>. The period of time used for averaging to get the <code>mean_response</code> is also stored in the <code>analysis_files_metadata</code> attribute of the cache object. 
    

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

<p> Examine the <code>analysis_files_metadata</code> to get a better understanding of how <code>trial_response_df</code> and <code>flash_response_df</code> were created. 

</div>

In [None]:
# examine the metadata in the cache object
cache.analysis_files_metadata

In [None]:
# get the window relative to the 'change_time' that is used to create the 'dff_trace' column of the 'trial_response_df'
print('window for dff_traces in trial_response_df =', 
      cache.analysis_files_metadata['trial_response_df_params']['window_around_timepoint_seconds'],
     'relative to change_time')

In [None]:
# get the window relative to the 'start_time' that is used to create the 'dff_trace' column of the 'flash_response_df'
print('window for dff_traces in flash_response_df =', 
      cache.analysis_files_metadata['flash_response_df_params']['window_around_timepoint_seconds'],
     'relative to start_time')

In [None]:
# what is the duration of time used for averaging to create the 'mean_response' column of either dataframe? 
print('duration of mean_response window =',
      cache.analysis_files_metadata['flash_response_df_params']['response_window_duration_seconds'])

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 2.1: Plot activity across stimulus repetitions using the flash response dataframe</b>

<p> 1) Assign session.flash_response_df to a variable called <code>fr</code>. Use .copy() when you assign the variable to avoid accidentally changing values of the original dataframe. 

<p> 2) Pick a cell and get all flashes of the preferred image for that cell by filtering <code>fr</code> by both the <code>cell_specimen_id</code> column and the <code>pref_stim</code> column. The preferred stimulus for each cell was pre-computed as the image that evoked the largest average response for that cell.

</div>

In [None]:
# assign flash_response_df to a shorter variable name


In [None]:
# get all flashes of preferred image for one cell


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> 3) Get the values of the <code>dff_trace</code> column from your filtered dataframe. Loop through these values and plot the traces for all flashes of the preferred image. Set color='gray'. 
    
Bonus: Plot with time relative to the stimulus onset, in seconds, on the x-axis. Hint: Use the <code>dff_trace_timestamps</code> and <code>start_time</code> columns to get time relative to stimulus onset for one flash.
    
<p> 4) Take the mean of the <code>dff_trace</code> values and plot the average response on the same figure as the individual trials, this time setting color='b'.

Does this cell have a reliable response to its preferred image? 

</div>
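<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> A sketch of the averaging and time-alignment steps on three made-up flashes with five samples each. The column names follow the <code>flash_response_df</code>, but the traces and timestamps are invented, and the sketch assumes that <code>dff_trace_timestamps</code> holds times on the same clock as <code>start_time</code>.
</div>

```python
import numpy as np
import pandas as pd

# Toy flashes for one cell; all values are hypothetical.
relative_times = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])
start_times = [10.0, 20.0, 30.0]
toy_flashes = pd.DataFrame({
    'start_time': start_times,
    'dff_trace': [np.array([0., 0., 1., 2., 1.]),
                  np.array([0., 0., 3., 2., 1.]),
                  np.array([0., 0., 2., 2., 1.])],
    'dff_trace_timestamps': [t + relative_times for t in start_times],
})

# Stack the per-flash traces into a (n_flashes, n_samples) array, then average over flashes.
traces = np.stack(toy_flashes.dff_trace.values)
mean_trace = traces.mean(axis=0)

# Time relative to stimulus onset for one flash: timestamps minus that flash's start_time.
first_flash = toy_flashes.iloc[0]
time_from_onset = first_flash.dff_trace_timestamps - first_flash.start_time
print(time_from_onset, mean_trace)
```

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> With these two arrays, each row of <code>traces</code> can be plotted against <code>time_from_onset</code> in gray, followed by <code>mean_trace</code> in blue.
</div>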

In [None]:
# plot flash responses for one cell


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">   

<p> 5) Create this plot for multiple cells. How do different cells respond?
    
Bonus: Plot multiple cells in one figure on different axes. Use <code>fig, ax = plt.subplots()</code> followed by <code>ax = ax.ravel()</code> to create iterable axes. 

 __[Documentation for matplotlib.pyplot.subplots()](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.subplots.html)__ 
    
 __[Why use fig, ax = plt.subplots() ? ](https://stackoverflow.com/questions/34162443/why-do-many-examples-use-fig-ax-plt-subplots-in-matplotlib-pyplot-python )__ 

</div>

In [None]:
# plot flash responses for 15 cells


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 2.2: Correlate neural activity with running speed on a flash by flash basis</b>

<p> Trial to trial variability could arise from differences in animal behavior. One possibility is modulation by running speed. 

<p> 1) Pick a cell and select all flashes of its preferred stimulus. 
    
<p> 2) Create a scatterplot of running speed vs neural response magnitude using the <code>mean_response</code> and <code>mean_running_speed</code> columns of the <code>flash_response_df</code>. 

<p> The <code>mean_running_speed</code> is the average of the running_speed trace during the 250ms stimulus presentation for each image flash. 
    
</div>

In [None]:
# get preferred stimulus flashes for one cell


In [None]:
# plot scatter plot of mean response vs. running speed


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> 3) Compute the Pearson correlation between the flash-wise <code>mean_response</code> and <code>mean_running_speed</code> using <code>scipy.stats.pearsonr()</code>. Is there a correlation? 

</div>
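<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> For intuition about what the correlation coefficient measures, here is a sketch that computes Pearson's r by hand and with <code>np.corrcoef</code> on synthetic data. This is a NumPy stand-in, not the <code>scipy.stats.pearsonr()</code> call requested in the exercise, which returns the same r together with a p-value. The data are random numbers generated with an arbitrary built-in dependence on running speed; they are not real measurements.
</div>

```python
import numpy as np

# Synthetic flash-wise data: response depends weakly on running speed plus noise.
rng = np.random.RandomState(0)
mean_running_speed = rng.uniform(0, 40, size=200)
mean_response = 0.01 * mean_running_speed + rng.normal(0, 0.05, size=200)

# Pearson r from the correlation matrix (off-diagonal element).
r = np.corrcoef(mean_running_speed, mean_response)[0, 1]

# The same quantity written out: covariance divided by the product of the spreads.
x_dev = mean_running_speed - mean_running_speed.mean()
y_dev = mean_response - mean_response.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

print(r, r_manual)
```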

In [None]:
# get pearson correlation 
import scipy.stats as st


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

<p> Bonus: Compute the correlation between <code>mean_response</code> and <code>mean_running_speed</code> for all cells in the session and plot a histogram of the values. 

</div>

In [None]:
# get pearson correlation values for all cells


In [None]:
# plot distribution of pearson r values


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 2.3: Plot the trial averaged response across images for one cell using the trial response dataframe</b>
   
<p> 1) Assign the <code>trial_response_df</code> to a variable called <code>tr</code>. Don't forget to use <code>.copy()</code>.
    
<p> 2) Which cell had the largest <code>mean_response</code>? What image was shown on that trial? Was it a go trial or a catch trial?  
    
</div>

In [None]:
# get trial_response_df


In [None]:
# get the trial with the largest value of mean_response


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> 3) Plot the average <code>dff_trace</code> across trials for the cell, image name, and trial type identified in the step above. Plot the x-axis in seconds relative to the <code>change_time</code>. 
    
Bonus: Show the time of the change flash (from 0 to 0.25 seconds after the change time) using <code>ax.axvspan()</code>. 
</div>

In [None]:
# plot the trial averaged trace for the conditions identified above


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
 
<p> 4) Loop through the images shown in this session and plot the average dF/F trace for each image for this cell.
    
</div>

In [None]:
# plot the mean response for all images


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 2.4: Create a heatmap of all cells' trial averaged responses following an image change</b>

<p> The SDK has utility functions to make some computations easier. Import the utilities using the code below, then run help() on the function <code>get_mean_df</code>. What are its inputs and outputs? 
    
</div>

In [None]:
# import SDK utilities 
import allensdk.brain_observatory.behavior.swdb.utilities as tools

help(tools.get_mean_df)

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

<p> 1) Filter the <code>trial_response_df</code> to get only go trials, then pass to the <code>get_mean_df</code> function. Set <code>conditions = ['cell_specimen_id', 'change_image_name']</code>. Assign the output of the function to <code>mean_df</code>. 

</div>

In [None]:
mean_df = tools.get_mean_df(tr[tr.go], conditions=['cell_specimen_id', 'change_image_name'])
mean_df.head()

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;"> 
<p> 2) Filter <code>mean_df</code> by <code>pref_stim</code> = True to limit the data to each cell's preferred image. 
    
<p> 3) Get the values in the <code>mean_trace</code> column and convert to an array using <code>np.stack()</code>.  This response array should be m x n where m is the number of unique cells in the session and n is the length of the <code>mean_trace</code> in frames.
    
</div>

In [None]:
# create a matrix of the mean dF/F traces for the preferred image for all cells


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;"> 
<p> 4) Plot a heatmap of all cells' mean trace for their preferred image. Set the vmax of the heatmap equal to the 95th percentile value of the response array using <code>np.percentile()</code>. Set vmin to 0. 
    
 __[Documentation for numpy.percentile()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html)__ 

<p> What structure do you see in this population? Bonus: Sort the cells to help see structure in the data.

<p> Extra bonus: Set xticklabels to display time in seconds relative to the change time. Hint: You can use the <code>ophys_frame_rate</code> to convert between ophys frames (the units of the <code>mean_trace</code>) and time in seconds. <code>ophys_frame_rate</code> can be obtained using the <code>metadata</code> attribute of the session object, or in the <code>analysis_files_metadata</code> attribute of the cache object. You may also need to know the window around the change time that was used in the creation of the <code>trial_response_df</code>. This can be found in <code>cache.analysis_files_metadata</code>, or you can recall the default value of [-4, 8].

</div>
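<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> A sketch of the color scaling and frame-to-seconds conversion described above, using a random array in place of the real response array. The frame rate of 31 Hz is a placeholder assumption (read the actual <code>ophys_frame_rate</code> from the metadata), and the real <code>mean_trace</code> length may differ slightly from window duration times frame rate.
</div>

```python
import numpy as np

ophys_frame_rate = 31.0   # hypothetical value; use the one stored in the metadata
window = [-4, 8]          # seconds around the change time, as stated in the text

# Random stand-in for the (n_cells, n_frames) response array.
rng = np.random.RandomState(1)
n_cells = 6
n_frames = int((window[1] - window[0]) * ophys_frame_rate)
response_array = rng.gamma(shape=1.0, scale=0.05, size=(n_cells, n_frames))

# Upper color limit: the value below which 95% of all entries fall.
vmax = np.percentile(response_array, 95)

# Time in seconds relative to the change for every frame.
frame_times = np.arange(n_frames) / ophys_frame_rate + window[0]

# Tick positions (in frames) that correspond to every 2 seconds.
tick_seconds = np.arange(window[0], window[1] + 1, 2)
tick_frames = (tick_seconds - window[0]) * ophys_frame_rate

# For a heatmap: sns.heatmap(response_array, vmin=0, vmax=vmax), then
# ax.set_xticks(tick_frames) and ax.set_xticklabels(tick_seconds).
print(vmax, tick_frames)
```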

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 2.5: Compute a sparseness metric for one cell</b>

<p> Lifetime sparseness is a metric for how selective and sparse a cell's activity is across conditions, bounded between 0 and 1. A high value of this metric indicates high selectivity - a differential response to one or a few stimulus conditions over others. A low value of this metric indicates a similar response across all conditions. 

<p> 1) Create an array containing the mean response across all flashes for each of the 8 images in the session for one cell. Hint: Use the <code>get_mean_df</code> function introduced in Exercise 2.4 to create a dataframe with the mean response by image for all cells, using the <code>flash_response_df</code>. 
    
<p> 2) Plot this array to visualize the cell's tuning for images. 
    
</div>

In [None]:
# get mean_df for flashes


In [None]:
# get array of image responses for one cell and plot it


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

<p> 3) Provide the image response array to the function below to compute the lifetime sparseness metric for your cell. How selective is this cell? 
    
</div>

In [None]:
def compute_lifetime_sparseness(image_responses):
    # image_responses should be an array of the trial averaged responses to each image
    # sparseness = (1 - ((sum of responses / N) squared) / (sum of (squared responses / N))) / (1 - (1/N))
    # N = number of images
    # after Vinje & Gallant, 2000; Froudarakis et al., 2014
    N = float(len(image_responses))
    ls = ((1-(1/N) * ((np.power(image_responses.sum(axis=0),2)) / (np.power(image_responses,2).sum(axis=0)))) / (1-(1/N)))
    return ls
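<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> To build intuition for the metric, this sketch repeats the function above and evaluates it on three made-up tuning curves: an equal response to all 8 images, a response to only one image, and an equal response to two images. The input arrays are invented for illustration.
</div>

```python
import numpy as np

def compute_lifetime_sparseness(image_responses):
    # same formula as the function above, repeated so this cell runs on its own
    N = float(len(image_responses))
    ls = ((1-(1/N) * ((np.power(image_responses.sum(axis=0),2)) / (np.power(image_responses,2).sum(axis=0)))) / (1-(1/N)))
    return ls

# Hypothetical tuning curves across 8 images.
flat = np.ones(8)                                          # same response to every image
one_hot = np.array([1., 0., 0., 0., 0., 0., 0., 0.])       # responds to a single image
two_of_eight = np.array([1., 1., 0., 0., 0., 0., 0., 0.])  # responds equally to two images

ls_flat = compute_lifetime_sparseness(flat)
ls_one_hot = compute_lifetime_sparseness(one_hot)
ls_two = compute_lifetime_sparseness(two_of_eight)
print(ls_flat, ls_one_hot, ls_two)
```

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> The flat curve gives 0 and the single-image curve gives 1, matching the bounds described above; the two-image curve falls in between at 6/7.
</div>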

In [None]:
# compute lifetime sparseness using the provided function


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 2.6: Identify image responsive cells and the mean lifetime sparseness across the population</b>
    
<p> We only want to quantify lifetime sparseness for cells with a significant image response, otherwise we would be including noise in our measurement. Before taking a population average, let's first identify responsive cells. 
    
<p> The <code>p_value</code> column of the <code>flash_response_df</code> is computed as a one-way ANOVA comparing the values of the dF/F trace in the 500ms after the flash with the activity during the spontaneous activity period. 

<p> Let's define responsive cells as having at least 10% of trials with a <code>p_value</code> < 0.005

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> 1) Loop through all cells in the <code>flash_response_df</code>, evaluate the fraction of trials with <code>p_value</code> < 0.005 and make a list of the indices of responsive cells. Assign the list to a variable called <code>responsive_cells</code>.

<p> Alternatively, you can use the output of <code>get_mean_df()</code>, which includes a column called <code>fraction_significant_responses</code> where this value was computed for each cell for the given conditions. Provide <code>conditions=['cell_specimen_id']</code> to get the <code>fraction_significant_responses</code> across all images for each cell.
                                                                                                           
<p>What fraction of cells in this experiment were responsive following a stimulus change? 

</div>
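<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> The loop described above can be sketched on a toy table with three cells and four flashes each. The <code>cell_specimen_id</code> and <code>p_value</code> column names follow the <code>flash_response_df</code>; the IDs and p-values are invented, and the thresholds are the ones stated in the text.
</div>

```python
import pandas as pd

# Toy flash response table; values are hypothetical.
toy_fr = pd.DataFrame({
    'cell_specimen_id': [1] * 4 + [2] * 4 + [3] * 4,
    'p_value': [0.001, 0.2, 0.5, 0.9,
                0.3, 0.4, 0.6, 0.8,
                0.001, 0.002, 0.7, 0.004],
})

p_threshold = 0.005   # significance threshold for a single flash
min_fraction = 0.10   # minimum fraction of significant flashes for a responsive cell

cell_ids = toy_fr.cell_specimen_id.unique()
responsive_cells = []
for cell_id in cell_ids:
    cell_p_values = toy_fr[toy_fr.cell_specimen_id == cell_id].p_value
    # The mean of a Boolean series is the fraction of True values.
    fraction_significant = (cell_p_values < p_threshold).mean()
    if fraction_significant >= min_fraction:
        responsive_cells.append(cell_id)

fraction_responsive = len(responsive_cells) / float(len(cell_ids))
print(responsive_cells, fraction_responsive)
```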

In [None]:
# get responsive cells 


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>2) Loop through each cell in <code>responsive_cells</code>, compute lifetime sparseness as you did above, and add the value to a list. 

<p>3) Convert the list to an array and take the mean. How does the average selectivity across the population compare with the single cell you measured in the previous exercise? 

</div>

In [None]:
# get the mean lifetime_sparseness for responsive cells


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2> Across Session Physiology Exercises</h2>
  
<p>This section deals with comparing neural activity across different experiment sessions, both at the single cell and population level. It also teaches you some neat pandas tricks for reformatting data in useful ways.  
    
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 3.1: Follow along to learn some useful pandas tricks for multi session data comparison</b>

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
One interesting aspect of this dataset is that there are multiple behavior + ophys sessions from each animal. In some cases we might want to perform analyses that compare multiple sessions from the same container to know how behavior or neural responses change with each stage of the task.     

</div>

In [None]:
experiments = cache.experiment_table
experiments.head(15)[['container_id','stage_name','ophys_experiment_id']]

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
From there, we can select two experiment dataframes we wish to analyze from the same container.
</div>

In [None]:
#an active, image_set A dataset
a_trials = cache.get_session(792815735).trials
#an active, image_set B dataset
b_trials = cache.get_session(795953296).trials

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
We can compare many metrics between these two sessions. As an example, we plot the reward rate over time in each session, using code like that developed above.
</div>

In [None]:
fig, ax = plt.subplots(2,figsize=(10,4))

ax[0].plot(a_trials.change_time, a_trials.reward_rate)
ax[0].set_title('image set A session')
ax[0].set_ylabel('reward rate')

ax[1].plot(b_trials.change_time, b_trials.reward_rate)
ax[1].set_xlabel('time in session (sec)')
ax[1].set_title('image set B session')
ax[1].set_ylabel('reward rate')
plt.tight_layout()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
For more in-depth analysis and easier plotting, we can concatenate the two dataframes, passing <code>keys</code> so that the rows from each session keep their labels.
</div>

In [None]:
a_b = pd.concat([a_trials,b_trials], keys=['a', 'b'])
a_b.head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
Note that looking at the tail lets you see the b trial dataframe that was just merged
</div>

In [None]:
a_b.tail()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
We can calculate simple metrics by grouping by the index. In a multi-index dataframe, the index levels can be referenced by number; here, level 0 holds the session keys.
</div>

In [None]:
a_b.groupby(level=0).trial_length.mean()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
We can also quickly plot metrics across the two sessions
</div>

In [None]:
ax = sns.boxplot(data=a_b[a_b.go].reset_index(),x='level_0', y='response_latency')
ax.set_xlabel('image set')

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
We can also check whether the two sessions have different numbers of trials of a given type, in this case hit trials.
</div>

In [None]:
sns.countplot(data=a_b.reset_index()[['level_0','hit']],x='hit',hue='level_0')

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> <b>Exercise 3.2: Compare the activity of matched cells across sessions</b>
   
<p> In our experimental design, the same 2-photon field of view is imaged across multiple sessions. A cell may be observed in one or more sessions, depending on whether or not it is active on different days. Cells that are identified across multiple days have the same 'cell_specimen_id' in all sessions in which they were observed. 
    
<p> To compare activity across multiple sessions, you can use another useful tool provided in the SDK utilities - the <code>create_multi_session_mean_df</code> function.      
    
<p> 1) Run help on  <code>create_multi_session_mean_df</code>. What are its inputs and outputs? 
    
<p> 2) Create a multi session df using the same 2 experiment sessions that were used above (<code>experiment_ids = [792815735, 795953296]</code>) and assign the output to a variable called <code>multi_session_df</code>. Set <code>flashes = True</code> to merge across the <code>flash_response_df</code> for the 2 sessions. Note: If <code>flashes = False</code> (the default setting), the function will merge the <code>trial_response_df</code>. 
    
</div>

In [None]:
help(tools.create_multi_session_mean_df)

In [None]:
multi_session_df = tools.create_multi_session_mean_df(cache, [792815735,795953296], flashes=True, conditions=['cell_specimen_id','image_name'])

In [None]:
# what are the columns? 


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
By using the <code>groupby()</code> command you can quickly generate comparisons of the same cells across sessions. 
</div>

In [None]:
# get the mean response across images for each experiment session, for each cell_specimen_id
multi_session_df.groupby(['cell_specimen_id','experiment_id'])['mean_response'].mean()


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> Using the <code>unstack()</code> command you can regroup these values by cell
    
<p> How can you tell if a cell was identified in both sessions?
</div>

In [None]:
# unstack the grouped dataframe to get each cell's mean_response for different experiments as columns
cell_exp_mean = multi_session_df.groupby(['cell_specimen_id','experiment_id'])['mean_response'].mean()
cell_exp_mean.unstack(level=-1).head()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
If the function you want to apply after the <code>groupby()</code> command operates on something other than a numeric value, such as the <code>mean_trace</code> array, you might need to use the <code>.apply(&lt;function&gt;)</code> command. 
    
</div>

In [None]:
# get the mean trace across images for each experiment, for each cell
mean_trace = multi_session_df.groupby(['cell_specimen_id','experiment_id'])['mean_trace'].apply(np.mean)
mean_trace.head()

In [None]:
# unstack to get experiments as columns
mean_trace = mean_trace.unstack(level=-1)
mean_trace.head()

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;"> 
<p> 3) Use the <code>mean_trace</code> dataframe generated above to plot the dF/F trace for one cell across the 2 experiment sessions. Include the <code>experiment_id</code> that each trace came from in the figure legend. Make sure it is a cell that has a <code>mean_trace</code> in both sessions. 
    
<p> How does the response differ across days? 
    
</div>

In [None]:
# plot the average flash response for one cell across 2 sessions

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p> <b>Exercise 3.3: Merge all the experiments from a single container and plot the mean response for different image sets</b>   
    
<p> 1) Get all experiment_ids for a single container_id. 

<p> 2) Merge the <code>trial_response_dfs</code> across sessions using <code>create_multi_session_mean_df</code> and assign the output to <code>container_trial_mean_df</code>. 
</div>

In [None]:
# get experiment_ids for one container


In [None]:
# create multi_session dataframe with experiments from this container for trials


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">    
<p> 3) Plot the mean response for image set A vs. image set B for each cell as a scatter plot. Hint: use <code>groupby()</code> and <code>unstack()</code> as demonstrated above to make it more efficient. 
</div>

In [None]:
# create scatterplot of cell by cell responses to image set A vs. B


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">    
<p> 4) Plot the population average trace across cells for image set A vs. image set B using the values of the <code>mean_trace</code> column. Indicate the image set for each trace in a figure legend. 
<p> Bonus: plot the x-axis in seconds. 
<p> Which image set evokes stronger activity across the population? 
</div>

In [None]:
# plot the population average trace for image set A vs B


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 3.4: Plot the trial averaged response for passive vs. active sessions from one container</b>
   
<p> Use the pandas skills learned above to plot the mean response on passive vs. active sessions. 
    
<p> 1) Plot the cell by cell mean response for passive vs. active as a scatterplot. Use the <code>passive</code> column of the <code>container_trial_mean_df</code> dataframe to differentiate passive vs. active.  
</div>

In [None]:
# create a scatterplot of cell by cell responses averaged across passive vs. active sessions


In [None]:
# plot the population average trace across all cells for active vs. passive
