![cropped-SummerWorkshop_Header.png](attachment:02a8fa98-5280-4e58-972b-b49e606371b3.png)

<h1 align="center">Day 1 Workshop 2: Science Quest</h1>  
<h2 align="center"> How do neuron interactions change after learning? </h2>   
<h2 align="center"> SWDB 2025</h2> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
<h2>Goal</h2>
    
In this workshop, we will use our data access skills and dataset knowledge to ask a specific question: 

<b>How do neuron interactions change after learning?</b>
   
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
How should we go about addressing this question? 
    
#### <b> Step 1: </b>Which datasets have we learned about that would be useful to analyze to address this question? 
    
We will want a dataset with a behavior task where the mouse has to learn something, and we will want there to be a period of time before and after learning that we can use to compare activity to see if it changes. 
    
Check our dataset supertable and find a dataset with a behavior task and a spontaneous activity period pre and post task learning. 
    
We can use... the BCI learning dataset! 
    
<b>Why use this dataset?</b> In this dataset, mice learn to control the activity of a specific neuron in order to receive rewards. The activity of the conditioned neuron is read out in real time and is linked to the movement of a reward spout. Over a few trials, the activity of the conditioned neuron becomes coupled to the movement of the spout, and the mouse "learns" to activate that neuron to move the spout towards itself and get rewards. In contrast to other behavior tasks, this task lets us study how learning in a single neuron can affect other cells in the network. See below for additional details of this dataset. 
    
#### <b> Step 2: </b>What analysis method can we use to address this question? 
    
One simple way to see if neurons could be interacting is to quantify how correlated their activity patterns are. If neurons are consistently co-active, they are likely to be connected or receive shared input as part of an interacting circuit. 
        
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Let's use our new data access skills to load the BCI data and compare neuron activity correlations during the spontaneous periods before and after a mouse learns the BCI task!

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<h2>Outline:</h2>
    
<h4>Part 1: Accessing the BCI data</h4> 
    - Metadata table & pynwb

    
<h4>Part 2: What are the conditions in this dataset? </h4>
    - Intervals, Epochs and Trials

<h4>Part 3: Quantifying activity correlations before and after BCI learning</h4>
    - Cell activity traces, selecting time periods of interest, computing correlations

<h4>Part 4: Do correlations depend on distance between neurons?</h4>
    - Segmented ROI masks and spatial relationships
<h4>   </h4>
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>BCI dataset overview </h2> 
    
**Overview:** In this notebook, we will analyze neural activity and connectivity changes 
during learning using a dataset from the [Credit Assignment During Learning](https://www.allenneuraldynamics.org/projects/credit-assignment-during-learning) 
project at the Allen Institute for Neural Dynamics. This dataset provides a unique window 
into how cortical circuits adapt as mice learn to volitionally control a brain-computer interface (BCI) 
using single neurons in motor cortex. This tutorial focuses on population activity and event-aligned responses
in one experimental paradigm; the broader dataset also covers multi-day learning, longitudinal connectivity mapping, 
and causal perturbations across many animals.
 
**Dataset:** 
Neural activity was recorded from layer 2/3 excitatory neurons in the primary motor cortex 
of head-fixed mice using two-photon calcium imaging. Imaging was performed over multiple days
 as each animal learned and performed a BCI task. Each day, a new conditioned neuron (CN) was selected, and 
 the activity of this neuron was mapped in real time to the position of a motorized reward port. 
 To receive water rewards, mice had to learn to increase the activity of the CN to move the port into reach.
The dataset also includes simultaneous photostimulation and calcium imaging sessions, in which individual or 
groups of neurons were optogenetically stimulated to assess their causal influence on the surrounding network. 
These connection mapping sessions were repeated daily to measure how connectivity changed as learning progressed.
 Imaging data were preprocessed using Suite2p and include motion-corrected fluorescence traces, 
 extracted ROIs, inferred spiking events, and stimulus-aligned behavioral metadata. 
 All data are registered across days to track the same neurons longitudinally.

 **Experiment:**
 This experiment was designed to test competing models of learning rules—such as Hebbian learning, 
 long-range input modulation, and biologically plausible approximations of error backpropagation—by 
 directly measuring changes in neural activity and inferred connectivity during learning. 
 The core task involved a closed-loop BCI paradigm in which a single neuron’s activity controlled 
 a reward mechanism. Because the mapping from activity to behavior was fully defined by the experimenter,
  this paradigm enables ground-truth labeling of neurons as behaviorally causal (e.g., the CN) versus merely correlated.
To probe learning-related circuit changes, cellular-resolution two-photon photostimulation
 was used to perturb neurons before and after learning. By analyzing evoked responses, 
 researchers could infer the presence and strength of functional connections. 
 Learning-induced changes in connectivity were then compared to predictions 
 from recurrent neural network models trained with different plasticity rules, 
 enabling discrimination between competing learning algorithms.



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Import packages and load data

In [None]:
# General imports 
import os 
import re
import numpy as np
import pandas as pd
from pathlib import Path
import scipy.stats as stats 
from skimage import measure
import matplotlib.pyplot as plt

# Pynwb imports
from hdmf_zarr import NWBZarrIO
from nwbwidgets import nwb2widget

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Part 1: Accessing the BCI data</h2>   

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Metadata

The 'metadata.csv' in the '/data' folder contains relevant information about the experimental session and the subject. 

| Column    | Description |
| -------- | ------- |
| id | data asset id |
| name | filename of data asset (raw) |
| subject_id| numerical id for animal subject  |
| session_time |  experiment date (%Y-%m-%d %H:%M:%S)   |
| session_type   |  experiment identifier  |
| genotype  | subject genotype   |
| virus   | injected virus type  |
| ophys_fov   | field of view identifier  |
| session_number    | behavior training session number   |

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Load the metadata csv file as a pandas dataframe. What are the column values? 

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

What are the unique values for `genotype`? What aspects of the genotype might be useful for this experiment?

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

Sort the dataframe by `subject_id` and `session_time` to group experiments from the same mice and put them in chronological order. 
    
Hint: Make sure to reset the index after sorting the dataframe.

</div>
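
One possible way to work through the three metadata exercises above is sketched here. To keep the example self-contained, it reads a tiny made-up CSV from a string instead of the real `metadata.csv`; the rows, genotype labels, and subject ids are invented for illustration, and only the column names come from the table above.

```python
import io
import pandas as pd

# Made-up stand-in for the metadata file (all values are illustrative only)
csv_text = """subject_id,session_time,genotype,session_number
222,2025-01-03 10:00:00,genotype_B,2
111,2025-01-02 09:30:00,genotype_A,2
222,2025-01-02 11:15:00,genotype_B,1
111,2025-01-01 09:00:00,genotype_A,1
"""

# With the real file this would be: pd.read_csv('/data/metadata.csv')
metadata = pd.read_csv(io.StringIO(csv_text))

# Column names and the unique genotype values
print(metadata.columns.tolist())
print(metadata['genotype'].unique())

# Group sessions by mouse, order them in time, and renumber the rows from 0
metadata = metadata.sort_values(['subject_id', 'session_time']).reset_index(drop=True)
print(metadata)
```

Resetting the index matters because later cells pick sessions by row position; without it, the rows keep their pre-sort labels.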

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Let's select a mouse and session to look at


In [None]:
# Pick the first mouse
subject_id = metadata['subject_id'].unique()[0]
# Save just the year, month, and day of the session at position 3 for this mouse (the 4th session, since indexing starts at 0)
date = metadata[metadata['subject_id']==subject_id].session_time.iloc[3][0:10] 
print('mouse_id:',subject_id, 'date:', date)

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Load the NWB file for this session using pynwb and examine its contents
    
Hint: Use `NWBZarrIO`

</div>

In [None]:
# First go through all the datasets in this capsule and find the folder that has our mouse's subject_id and session date in the folder name
data_folder = [folder for folder in os.listdir(r'/data/') if str(subject_id) in folder and str(date) in folder][0]

# Set the directory to load the file
data_dir = os.path.join(r'/data/', data_folder)

# Now find the NWB file and set the path to load it
nwb_file = [file for file in os.listdir(data_dir) if 'nwb' in file][0]
nwb_path = os.path.join(data_dir, nwb_file)
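
With the path in hand, a minimal sketch of the loading step might look like the following. The helper name `load_nwb_zarr` is made up for this example; it assumes the file is a Zarr-backed NWB file that `NWBZarrIO` can open in read mode, as the hint suggests.

```python
def load_nwb_zarr(nwb_path):
    """Open a Zarr-backed NWB file read-only and return (io, nwbfile).

    Hypothetical helper for illustration; not part of pynwb or hdmf_zarr.
    """
    # Imported here so the function can be defined even if hdmf_zarr is missing
    from hdmf_zarr import NWBZarrIO

    io = NWBZarrIO(str(nwb_path), mode='r')
    nwbfile = io.read()
    # Data are read lazily, so keep `io` open for as long as `nwbfile` is in use
    return io, nwbfile

# Usage in this notebook (requires the data to be attached):
# io, nwbfile = load_nwb_zarr(nwb_path)
```

Returning the `io` object alongside `nwbfile` is a design choice: closing the reader too early can make later data access fail.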

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### NWB files

As you'll recall from workshop 1, NWB is a standardized file format for systems neuroscience experiments. NWB files are organized like HDF5 containers, with data stored in a directory-like hierarchy. There are "containers" (i.e. directories) for each type of data.  

The key types of data in an NWB file are: 
* metadata (subject information, recording methodology, devices used, etc.)
* events (stimulus tables, discrete task epochs, etc.)
* processed data (cell traces, segmented ROI masks, etc.)
    
The data in each container are typically provided as dynamic tables or as matrices. 


In [None]:
nwbfile

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

`nwb2widget` creates an interactive widget to easily explore the hierarchical contents of the NWB file. The widget can also render interactive data plots (e.g. calcium activity traces, image segmentation masks). 

In [None]:
nwb2widget(nwbfile) 


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Containers are accessed via dot (.) notation. If the data within the container are strings, they can also be accessed with dot notation. Like this: 

In [None]:
nwbfile.subject.genotype

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

If the data within a container is an object (another container, tables, or matrices), it can be accessed like items in a dictionary. Like this: 


In [None]:
nwbfile.devices['442_Bergamo_2p_photostim']

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Part 2: What are the conditions in this dataset? </h2>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Epoch Table 
    
The epochs table contains the start and stop times/frames for each experimental epoch. This tells you what types of conditions there are in the experiment.


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Load the `epochs` table from the *<b>intervals</b>* container of the NWB file. What epochs are present?
    
Hint: Make sure to turn the data into a dataframe to view it.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Load the `dff_traces` from the *<b>processing</b>* container of the NWB file. What are the rows and columns? 
    
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

Plot the dFF trace for ROI = 100 with the stimulus epochs overlaid in color. Does its activity change across epochs?
    
</div>

In [None]:
# Pick an ROI
ROI = 100

# Plot dff trace for selected ROI 
plt.rcParams["figure.figsize"] = (10, 4)
plt.plot(dff_traces[:, ROI], label=f'ROI {ROI}', color='black')

# Add shaded regions for stimulus epochs 
stimulus_names = epoch_table.stimulus_name.unique()

# One color per epoch type, sampled from the 'Paired' colormap
# (plt.get_cmap is used because matplotlib.cm.get_cmap was removed in newer matplotlib versions)
colors = plt.get_cmap('Paired')(np.linspace(0, 1, len(stimulus_names)))

for c, stimulus_name in enumerate(stimulus_names):
    stim_epoch = epoch_table[epoch_table.stimulus_name==stimulus_name]
    for j in range(len(stim_epoch)):
        # Label only the first span of each epoch type so the legend has no duplicate entries
        plt.axvspan(xmin=stim_epoch.start_frame.iloc[j], xmax=stim_epoch.stop_frame.iloc[j], color=colors[c], alpha=0.3,
                    label=stimulus_name if j == 0 else None)

plt.ylabel('dF/F')
plt.xlabel('2P Frame')
plt.title('Stimulus epochs and dF/F for ROI '+str(ROI))
plt.legend(bbox_to_anchor=(1.0, 1.0))
plt.show()

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### Stimulus and trials tables
 

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** What types of stimulus tables are available? Check the keys of the *<b>stimulus</b>* container.
    
Does every epoch have a corresponding stimulus table? 

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Stimulus table (2p optogenetic activation stimulus) 
    
The stimulus table contains information about each 2p optogenetic stimulation trial. Optogenetic stimulation is used in this experiment to probe connectivity between neurons. In each trial, one neuron is stimulated, and all other neurons are recorded. Neurons with short latency responses after the optostim can be considered to be connected. 

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** What stimuli are delivered during the photostim epoch? 
    
Load the `PhotostimTrials` table from the *<b>stimulus</b>* container and turn it into a dataframe, then check the columns. 

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
    
What are the values of the `stimulus_name` column? 

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

There are two photostimulation epochs - one before and one after the BCI epoch.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Here is what the columns mean:

| Column    | Description |
| -------- | ------- |
| start_time  | stimulus start (s)  |
| stop_time | stimulus end (s)   |
| start_frame | stimulus start (frame)     |
| stop_frame    | stimulus end (frame)  |
| tiff_file   | data source file name  |
| stimulus_name    | stimulus name   |
| laser_x    | x coordinate of stimulated neuron (pixel)   |
| laser_y    | y coordinate of stimulated neuron (pixel)  |
| power    | stimulus intensity (mW)  |
| duration    | trial duration (s)  |
| stimulus_function    | stimulus template   |
| group_index    | identity of stimulated neuron(s)   |
| closest_roi    | index in dff that corresponds to the photostimulated neuron   |


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### BCI Behavior Table 
    
The behavior trials table contains information about what the mouse did during each trial, such as whether it licked or got a reward, and when in the trial these events happened. It also includes the ID of the conditioned neuron. 

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** What happens during the BCI task epoch? 
    
Load the `Trials` table from the *<b>stimulus</b>* container, turn it into a dataframe, then check the columns. 

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Here is what the columns mean:

| Column    | Description |
| -------- | ------- |
| start_time  | trial start (s)  |
| stop_time | trial end (s)   |
| go_cue |  time of go cue relative to start time (s)   |
| hit   |  boolean of whether trial was hit   |
| lick_l  | lick times (s)   |
| reward_time   | reward delivery time (s)   |
| threshold_crossing_times    | time when reward port crossed position threshold (s)   |
| zaber_steps_times   | position of reward port  |
| tiff_file    | data source file  |
| start_frame    | trial start (frame)  |
| stop_frame    | trial end (frame)  |
| conditioned_neuron_x    | coordinate for conditioned neuron (pixels)  |
| conditioned_neuron_y    | coordinate for conditioned neuron (pixels)  |
| closest_roi    | index in dff that corresponds to the conditioned neuron  |


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

Could the learning that happens during the BCI task epoch change the activity of the other neurons in the imaging plane? 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Part 3: Quantifying activity correlations before and after BCI learning</h2>
    - Cell activity traces, selecting time periods of interest, computing correlations
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

We can quantify correlations during the spontaneous activity periods before and after BCI learning to see if neuronal interactions have changed as a result of learning. 


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Extract cell activity during spontaneous activity epochs. 
    
First get the start and end frames of the spontaneous activity epochs before and after the BCI task. Include both `spont` and `spont_again` epochs for spontaneous activity before the task, so that it is a similar time frame as the `spontpost` epoch (~250 seconds)
    
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

Now get the dF/F traces limited to the spontaneous epochs. Create one new array for each epoch (pre and post). 
    
Transpose the array so that the rows are cell_ids and columns are 2P frames. 

</div>
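
The two steps above can be sketched on synthetic data as follows. The array and table here are random stand-ins: the `start_frame`, `stop_frame`, and `stimulus_name` columns match the epoch table used in the plotting cell, the `spont`, `spont_again`, and `spontpost` names come from the exercise text, and the other two epoch names are placeholders. The helper `epoch_traces` is made up for this example, and it treats `stop_frame` as exclusive, which is an assumption worth checking against the real table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in: dff_traces is (frames x ROIs), as in the plotting cell above
n_frames, n_rois = 1000, 6
dff_traces = rng.normal(size=(n_frames, n_rois))

# Made-up epoch boundaries; 'placeholder_a' and 'placeholder_b' are not real epoch names
epoch_table = pd.DataFrame({
    'stimulus_name': ['spont', 'placeholder_a', 'spont_again', 'placeholder_b', 'spontpost'],
    'start_frame':   [0, 100, 300, 400, 800],
    'stop_frame':    [100, 300, 400, 800, 1000],
})

def epoch_traces(dff, epochs, names):
    """Concatenate the frames of the named epochs and return (ROIs x frames)."""
    selected = epochs[epochs['stimulus_name'].isin(names)]
    chunks = [dff[int(start):int(stop), :]
              for start, stop in zip(selected['start_frame'], selected['stop_frame'])]
    # Stack epochs along the frame axis, then transpose so rows are ROIs
    return np.concatenate(chunks, axis=0).T

dff_pre = epoch_traces(dff_traces, epoch_table, ['spont', 'spont_again'])
dff_post = epoch_traces(dff_traces, epoch_table, ['spontpost'])
print(dff_pre.shape, dff_post.shape)
```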

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Visualize the dFF traces for each spontaneous period as a heatmap. Make sure the x-axis is 2P frames and the y-axis is # ROIs.
    
Hint: use [pcolormesh](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pcolormesh.html) to plot, it makes nicer heatmaps than imshow and interprets the axes nicely (but takes a bit longer).

</div>
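
A rough sketch of the heatmap exercise on random data is shown below. The arrays are synthetic (ROIs x frames), and the color limits are arbitrary choices for illustration, not values taken from the dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic (ROIs x frames) arrays standing in for the pre and post dF/F traces
dff_pre = rng.normal(size=(20, 300))
dff_post = rng.normal(size=(20, 300))

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
panels = zip(axes, [dff_pre, dff_post], ['Spontaneous pre', 'Spontaneous post'])
for ax, data, title in panels:
    # Rows of `data` map to the y-axis (ROIs), columns to the x-axis (frames)
    mesh = ax.pcolormesh(data, cmap='viridis', vmin=-1, vmax=3)
    ax.set_xlabel('2P frame (within epoch)')
    ax.set_title(title)
axes[0].set_ylabel('ROI')

# Both panels share the same color limits, so one colorbar describes both
fig.colorbar(mesh, ax=axes, label='dF/F')
```

Using the same `vmin` and `vmax` in both panels keeps the colors comparable between pre and post.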

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Whoa, that's a lot of ROIs. Are they all valid? 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Image segmentation ROI masks
    
The ROI table contains the output of Suite2p segmentation for this imaging plane. Depending on the parameters used in data processing, Suite2p can pick up a lot of ROIs or just a few. The probabilities of each ROI being a soma or a dendrite are provided and can be used to filter for valid soma ROIs. 
    
Here is what the columns of the ROI table mean:

| Column    | Description |
| -------- | ------- |
| is_soma  | ==1 if ROI classified as soma, ==0 if not  |
| soma_probability | if >0.5 classified as soma  |
| is_dendrite |  ==1 if ROI classified as dendrite, ==0 if not   |
| dendrite_probability   |  if >0.5 classified as dendrite  |
| image_mask  | HxW sparse array defining image masks|

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Filter ROIs to limit data to high probability cell somas. 
    
Load the image segmentation masks from the *<b>data_interfaces</b>* container within the *<b>processing</b>* container. 

Turn the `roi_table` into a dataframe and check the columns.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

Use a `soma_probability` threshold of 0.5 and get the corresponding ROI ids. 

Filter your dff traces arrays for the pre and post spontaneous periods to limit to these ROI ids and plot the heatmaps again. 

</div>
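
The filtering step can be sketched on toy data as below. It assumes that the row index of `roi_table` matches the row order of the (ROIs x frames) dF/F arrays; that alignment should be verified on the real data. The probabilities are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Invented soma probabilities for 8 ROIs, plus matching (ROIs x frames) traces
roi_table = pd.DataFrame({'soma_probability': [0.9, 0.2, 0.7, 0.4, 0.95, 0.1, 0.6, 0.3]})
dff_pre = rng.normal(size=(8, 50))
dff_post = rng.normal(size=(8, 50))

# Row labels of the ROIs that pass the threshold
soma_ids = roi_table.index[roi_table['soma_probability'] > 0.5].to_numpy()

# Keep only those rows in both arrays (assumes table index == array row)
dff_pre_soma = dff_pre[soma_ids, :]
dff_post_soma = dff_post[soma_ids, :]
print(soma_ids, dff_pre_soma.shape)
```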

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Quantifying correlations
    
Correlations between neuron activity traces could indicate a direct connection between the neurons or, more likely, that they receive shared input and are part of an interacting network. Let's compute the pairwise correlations between all the neurons and compare them for the pre and post spontaneous periods. 

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Quantify the correlation between every cell pair's dff traces for the pre and post spontaneous periods and plot each as a heatmap. Don't forget to label your axes and include a title so you remember which is pre and which is post. 
    
Hint: Use [np.corrcoef](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) to do this. 

</div>
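
As a small illustration of what `np.corrcoef` returns, the sketch below builds synthetic traces in which the "post" cells share a common signal, so their pairwise correlations come out higher. The data are random and say nothing about the real result. `np.corrcoef` treats each row as one variable, which is why the arrays are arranged as cells x frames.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells, n_frames = 5, 400

# "Pre": independent noise per cell. "Post": the same kind of noise plus a shared signal.
shared = rng.normal(size=n_frames)
dff_pre_soma = rng.normal(size=(n_cells, n_frames))
dff_post_soma = rng.normal(size=(n_cells, n_frames)) + shared

# Each row is one cell, so the result is an (n_cells x n_cells) Pearson correlation matrix
correlations_pre = np.corrcoef(dff_pre_soma)
correlations_post = np.corrcoef(dff_post_soma)

# Mean over off-diagonal entries only (the diagonal is always 1)
off_diag = ~np.eye(n_cells, dtype=bool)
print(correlations_pre[off_diag].mean(), correlations_post[off_diag].mean())
```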

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Plot the distribution of correlation values as histogram comparing pre and post. 
    
Hint 1: Make sure to flatten your matrix into a 1D array to plot the histogram. 
    
Hint 2: Use `histtype='step'` as an argument to `plt.hist` to more easily view the results.
    
Do a t-test with [scipy.stats.ttest_ind](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html) - are the distributions statistically different? Include the p-value in the plot title.
    
</div>
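
The comparison can be sketched on synthetic matrices as follows. Instead of flattening the full matrix, this version keeps only the upper triangle, which is a deliberate variation: it drops the diagonal (always 1) and counts each cell pair once. Because the same cell pairs appear in both samples, a paired test such as `scipy.stats.ttest_rel` may also be worth considering.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_cells, n_frames = 10, 300

# Synthetic correlation matrices (random data, for illustration only)
correlations_pre = np.corrcoef(rng.normal(size=(n_cells, n_frames)))
correlations_post = np.corrcoef(rng.normal(size=(n_cells, n_frames)))

# Indices of the upper triangle, excluding the diagonal (k=1)
upper = np.triu_indices(n_cells, k=1)
r_pre = correlations_pre[upper]
r_post = correlations_post[upper]

# Independent two-sample t-test, as suggested in the exercise
t_stat, p_value = stats.ttest_ind(r_pre, r_post)
title = f'Pairwise correlations, pre vs post (p = {p_value:.3g})'
print(title)
```

The `title` string can be passed to `plt.title` after plotting `r_pre` and `r_post` with `plt.hist(..., histtype='step')`.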

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Part 4: Do correlations depend on distance between neurons?</h2>
    - Segmented ROI masks and spatial relationships


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### ROI Image Masks 
    
ROI masks are extracted from the ophys data to identify regions corresponding to cells and dendrites. Here we want to identify the ROIs corresponding to the cells we computed correlations for earlier and quantify how the distance between ROIs relates to their correlations. 

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Get the `roi_table` from the *<b>data_interfaces</b>* container within the *<b>processing</b>* container and limit it to ROIs with `soma_probability` > 0.5 like we did earlier for the dF/F traces. 
    
Plot the `image_mask` for one of the ROIs. 

In [None]:
# In case you forgot how to find the roi_table
roi_table = nwbfile.processing["processed"].data_interfaces["image_segmentation"].plane_segmentations["roi_table"].to_dataframe()
# Limit to real somas
roi_table = roi_table[roi_table['soma_probability']>0.5]

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### Calculate the centroid of each ROI image mask 

The ROI image masks are represented as an HxW sparse array with non-zero values that span the ROI area. We can find the centroid of the ROI by calculating the mean of the x,y indices of each mask. We've pre-written the function `get_roi_centroids` to do this. 

In [None]:
def get_roi_centroids(roi_table):
    centroids = []
    for mask in roi_table['image_mask']:
        ys, xs = np.where(mask)
        x = np.mean(xs)
        y = np.mean(ys)
        centroids.append((x, y))
    return np.array(centroids)

In [None]:
# Calculate centroids for each ROI and plot 

centroids = get_roi_centroids(roi_table)
centroidX = centroids[:, 0]
centroidY = centroids[:, 1]

plt.plot(centroidX, centroidY, 'ko', alpha = 0.2, label = 'ROI centroids')
plt.xlabel('X position') 
plt.ylabel('Y position')
plt.title('ROI centroids for high probability cell somas')
plt.legend(bbox_to_anchor=(1.0, 1.0))

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Calculate the distance between one pair of cells using [math.dist](https://docs.python.org/3/library/math.html#math.dist)


In [None]:
import math
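
A minimal example of `math.dist` on made-up centroids is shown below; the coordinates are invented so that the answer is easy to check by hand (a 3-4-5 triangle).

```python
import math
import numpy as np

# Made-up (x, y) centroids in pixels
centroids = np.array([[0.0, 0.0],
                      [3.0, 4.0],
                      [6.0, 8.0]])

# Euclidean distance between the first two cells
distance_01 = math.dist(centroids[0], centroids[1])
print(distance_01)  # 5.0
```

The result is in pixels; converting to microns would require the imaging resolution, which is not covered here.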

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Create pairs data table

Let's make a table that contains a row for each cell pair, with columns for each cell index, their correlation pre and post, and the distance between them. This can make analysis easier and helps us keep track of which piece of data goes with which cell pair.


In [None]:
pairs_data = []
for roi_id_1 in range(len(roi_table.index.values)): 
    for roi_id_2 in range(len(roi_table.index.values)): 
        centroid_cell_1 = centroids[roi_id_1, :]
        centroid_cell_2 = centroids[roi_id_2, :]
        distance = math.dist(centroid_cell_1, centroid_cell_2)
        r_value_pre = correlations_pre[roi_id_1, roi_id_2]
        r_value_post = correlations_post[roi_id_1, roi_id_2]
        pairs_data.append([roi_id_1, roi_id_2, r_value_pre, r_value_post, distance])


In [None]:
pairs = pd.DataFrame(pairs_data, columns=['roi_id_1', 'roi_id_2', 'r_value_pre', 'r_value_post', 'distance'])
pairs.head()
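
The loop above includes each pair twice, as (i, j) and (j, i), plus every cell paired with itself. If only unique pairs are wanted, one alternative is sketched below on synthetic data; this is a variation, not the notebook's method, and the name `pairs_unique` is made up for the example.

```python
import math
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_cells = 4

# Synthetic centroids and correlation matrices (illustration only)
centroids = rng.uniform(0, 512, size=(n_cells, 2))
correlations_pre = np.corrcoef(rng.normal(size=(n_cells, 200)))
correlations_post = np.corrcoef(rng.normal(size=(n_cells, 200)))

# All index pairs with i < j: each pair once, no self-pairs
i_idx, j_idx = np.triu_indices(n_cells, k=1)

# Euclidean distance for every pair in one vectorized step
distance = np.linalg.norm(centroids[i_idx] - centroids[j_idx], axis=1)

pairs_unique = pd.DataFrame({
    'roi_id_1': i_idx,
    'roi_id_2': j_idx,
    'r_value_pre': correlations_pre[i_idx, j_idx],
    'r_value_post': correlations_post[i_idx, j_idx],
    'distance': distance,
})
print(pairs_unique)
```

Note that `roi_id_1` and `roi_id_2` here, as in the loop above, are positions within the filtered arrays rather than the original ROI labels.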

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**Exercise:** Plot activity correlations for pre and post spontaneous periods as a function of distance
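
A scatter plot of every pair can be hard to read, so one option is to average correlations within distance bins first. The sketch below does this on a synthetic `pairs` table in which correlations are made to decay with distance; that decay is built into the fake data and is not a finding. The bin width of 100 pixels is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n_pairs = 500
distance = rng.uniform(0, 400, size=n_pairs)

# Synthetic correlations that decay with distance (illustration only, not a result)
pairs = pd.DataFrame({
    'distance': distance,
    'r_value_pre': 0.3 * np.exp(-distance / 100) + rng.normal(scale=0.05, size=n_pairs),
    'r_value_post': 0.4 * np.exp(-distance / 100) + rng.normal(scale=0.05, size=n_pairs),
})

# Assign each pair to a 100-pixel-wide distance bin
bins = np.arange(0, 401, 100)
pairs['distance_bin'] = pd.cut(pairs['distance'], bins=bins, include_lowest=True)

# Mean pre and post correlation within each bin
binned = pairs.groupby('distance_bin', observed=True)[['r_value_pre', 'r_value_post']].mean()
print(binned)
```

The `binned` table can then be plotted as two lines (pre and post) against the bin centers, alongside or instead of the raw scatter.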

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; ">

#### Questions and analyses to explore further:
 
* How does connectivity change across the photostimulation periods? 
        
* Do correlated neurons tend to have stronger connections? 
    
* Quantify the degree to which the conditioned neuron increases its activity throughout the task. 
    
* Do differences in the transgenic lines or injected viruses impact connectivity and photostimulation measurements? 
    
* Do non-conditioned neurons change activity during the BCI task? 
    