<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


## Space
The analysis we did during the workshop did not take physical space into account. If nearby neurons have more similar functional properties, and nearby neurons are also more likely to be connected, then the effect we saw might be explained by spatial factors alone. How big are those effects? Can they explain this shift?

There is good evidence that, among synapses of equal size, those located close to the cell body are functionally stronger than those farther from the cell body.

This exercise is longer and more complex than the others.

In [3]:
import numpy as np
import os 
import caveclient
import scipy.stats  # scipy.stats.binned_statistic is used later in this exercise
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 
%matplotlib inline



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### We will start with recalculating the dataframe from the workshop

In [4]:
import platform
import os

platstring = platform.platform()
if ('Darwin' in platstring) or ('macOS' in platstring):
    # macOS 
    data_root = "/Volumes/Brain2023/"
elif 'Windows'  in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif ('amzn' in platstring):
    # then on Code Ocean
    data_root = "/data/"
else:
    # then your own linux platform
    # EDIT location where you mounted hard drive
    data_root = "/media/$USERNAME/Brain2023/"
    
data_dir = os.path.join(data_root, 'microns_in_silico')

# you can just override this if the location of the data varies
# data_dir = '/Users/forrestc/Downloads/microns_in_silico/'

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### Units dataframe column description

* animal_id: uniformly 17797, this experiment has just one animal
* scan_session: what day the experiment was done on
* scan_idx: which scan on that day the recording came from. NOTE: it is the COMBINATION of session and scan that uniquely defines a recording; there are multiple scans with scan_idx = 5, 6, 7, and 9.
* unit_id: an index on the ROIs recorded during that session + scan combination.
* row_idx: what is the index of this unit in the response numpy matrix
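
The note about session and scan can be checked directly. The sketch below uses a small made-up table (not the real `nat_unit.csv`) to show that `scan_idx` plus `unit_id` can repeat across sessions, while adding `scan_session` makes the key unique. The values and the name `toy_units` are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for units_df; the values are invented for illustration
toy_units = pd.DataFrame({
    "scan_session": [4, 4, 5, 5],
    "scan_idx":     [7, 7, 7, 7],
    "unit_id":      [1, 3, 1, 3],
    "row_idx":      [0, 1, 2, 3],
})

# scan_idx + unit_id alone is ambiguous: the same pair shows up in two sessions
ambiguous = bool(toy_units.duplicated(subset=["scan_idx", "unit_id"]).any())

# adding scan_session to the key makes every row unique
unique_key = not toy_units.duplicated(
    subset=["scan_session", "scan_idx", "unit_id"]
).any()

print(ambiguous, unique_key)
```

The same two `duplicated` calls should work on the real `units_df` if you want to confirm the key before merging.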

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### Coregistration columns
* id: a unique index for this coregistration point
* target_id: the nucleus ID for this coregistration (target because it's a reference to a nucleus)
* session: the session (aka scan_session from responses)
* scan_idx: the scan index (same as responses)
* unit_id: the unit id in the session+scan (same as responses)
* field: which of the [1-8] 2d fields the unit came from
* residual: a coregistration QC statistic (see paper)
* score: a coregistration QC separation statistic (see paper)
* pt_root_id: the segmentation id of the object (possible for one segmentation to have 0,1,2+ nuclei)
* pt_position: the x,y,z position of nucleus (in um because of desired_resolution=[1000,1000,1000])
* volume: the volume of the nucleus in um^3
  
Columns you can safely ignore for this exercise
* id_ref: the id of the referenced annotation (in this case a nucleus_id so same as target_id)
* created: when this was created
* created_ref: when the nucleus annotation this references was created
* pt_supervoxel_id: the supervoxel underneath this nucleus location
* bb_start_position: spots for bounding box start (nan in this dataset)
* bb_end_position: spot for bounding boxes end (nan in this dataset)
* valid: an internal check variable

</div>

In [18]:
# we are going to load up the data and prepare the dataframe like we did 
# in class but with fewer comments

# load up the in-silico responses as a pandas dataframe from a numpy array 
resp=pd.DataFrame(np.load(os.path.join(data_dir, 'nat_resp.npy')))

# load up the csv of metadata about the 104171 units
units_df = pd.read_csv(os.path.join(data_dir, 'nat_unit.csv'))

# set the index to be the row_idx of the units_df
resp.index = units_df['row_idx']

# if we are on code ocean, the CAVEsetup helped you make your token an environment variable
if 'amzn' in platstring:
    client= caveclient.CAVEclient('minnie65_public', auth_token=os.environ['API_SECRET'])
else:
    # otherwise, if you are running locally, the token should be saved to a file
    # on your hard drive that the caveclient knows how to read.
    client= caveclient.CAVEclient('minnie65_public')

# let's pull out the manually coregistered neurons
# desired_resolution describes how many nanometers you want each unit to be
# so 1000,1000,1000 gives positions in microns for x,y and z
coreg_df = client.materialize.query_table('coregistration_manual_v3', desired_resolution=[1000,1000,1000])

#print the shape of units_df
print('The shape of units_df is: ', units_df.shape)

#print the shape of resp
print('The shape of resp is: ', resp.shape)

#print the shape of coreg_df
print('The shape of coreg_df is: ', coreg_df.shape)


The shape of units_df is:  (104171, 5)
The shape of resp is:  (104171, 5000)
The shape of coreg_df is:  (13925, 19)


In [19]:
units_df.head()

Unnamed: 0,animal_id,scan_session,scan_idx,unit_id,row_idx
0,17797,4,7,1,0
1,17797,4,7,3,1
2,17797,4,7,4,2
3,17797,4,7,5,3
4,17797,4,7,6,4


In [20]:
# let's merge these dataframes so we get the row_idx of each coregistered unit
# we merge on the corresponding columns, however scan was called something
# slightly different in one csv vs the CAVE table
coreg_in_silico=pd.merge(units_df, coreg_df, # left and right dataframes
         left_on=['scan_session', 'scan_idx', 'unit_id'], # left columns
          right_on=['session','scan_idx', 'unit_id']) # right columns

#print the shape of coreg_in_silico
print('The shape of coreg_in_silico is: ', coreg_in_silico.shape)

The shape of coreg_in_silico is:  (12094, 22)
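
The merged dataframe has fewer rows than `coreg_df` because `pd.merge` performs an inner join by default, so coregistration rows without a matching unit are dropped. A minimal sketch on invented data (the frames `toy_units` and `toy_coreg` and all their values are hypothetical) shows the same `left_on`/`right_on` pattern:

```python
import pandas as pd

# Made-up stand-ins for units_df and coreg_df
toy_units = pd.DataFrame({
    "scan_session": [4, 4, 5],
    "scan_idx":     [7, 7, 3],
    "unit_id":      [1, 2, 1],
    "row_idx":      [0, 1, 2],
})
toy_coreg = pd.DataFrame({
    "session":   [4, 5, 6],
    "scan_idx":  [7, 3, 2],
    "unit_id":   [2, 1, 9],
    "target_id": [111, 222, 333],
})

# inner join (the default): only key combinations present in BOTH frames survive
toy_merged = pd.merge(
    toy_units, toy_coreg,
    left_on=["scan_session", "scan_idx", "unit_id"],
    right_on=["session", "scan_idx", "unit_id"],
)

print(toy_merged[["scan_session", "scan_idx", "unit_id", "row_idx", "target_id"]])
```

Here the coregistration row for session 6 has no matching unit and is dropped, and the unit with `unit_id` 1 in session 4 has no coregistration and is dropped as well.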


In [21]:
coreg_in_silico.head()

Unnamed: 0,animal_id,scan_session,scan_idx,unit_id,row_idx,id_ref,created_ref,valid_ref,volume,pt_supervoxel_id,...,created,valid,target_id,session,field,residual,score,pt_position,bb_start_position,bb_end_position
0,17797,4,7,648,517,516506,2020-09-28 22:44:43.650751+00:00,t,276.767375,105487283075464806,...,2023-04-05 22:38:59.933339+00:00,t,516506,4,2,6.18126,11.443982,"[1184.832, 378.752, 632.12]","[nan, nan, nan]","[nan, nan, nan]"
1,17797,4,7,662,530,452329,2020-09-28 22:45:02.852190+00:00,t,311.618437,101758289725398433,...,2023-04-05 22:39:40.786518+00:00,t,452329,4,2,5.42279,12.006788,"[1076.992, 395.328, 735.28]","[nan, nan, nan]","[nan, nan, nan]"
2,17797,4,7,665,533,451461,2020-09-28 22:41:51.543636+00:00,t,263.504036,102531727972419182,...,2023-04-05 22:39:41.400499+00:00,t,451461,4,2,1.28047,15.025886,"[1099.456, 376.256, 881.84]","[nan, nan, nan]","[nan, nan, nan]"
3,17797,4,7,671,539,420222,2020-09-28 22:45:01.445495+00:00,t,309.88288,100491446037118564,...,2023-04-05 22:39:27.234487+00:00,t,420222,4,2,0.704028,21.035651,"[1039.744, 389.44, 678.96]","[nan, nan, nan]","[nan, nan, nan]"
4,17797,4,7,682,549,420058,2020-09-28 22:44:36.438460+00:00,t,267.026432,98661652329244694,...,2023-04-05 22:36:19.482289+00:00,t,420058,4,2,9.64831,2.79444,"[985.792, 383.424, 634.92]","[nan, nan, nan]","[nan, nan, nan]"


In [22]:
# reset the index to make sure that we have the index
coreg_in_silico.reset_index(inplace=True) 

coreg_in_silico.head()

Unnamed: 0,index,animal_id,scan_session,scan_idx,unit_id,row_idx,id_ref,created_ref,valid_ref,volume,...,created,valid,target_id,session,field,residual,score,pt_position,bb_start_position,bb_end_position
0,0,17797,4,7,648,517,516506,2020-09-28 22:44:43.650751+00:00,t,276.767375,...,2023-04-05 22:38:59.933339+00:00,t,516506,4,2,6.18126,11.443982,"[1184.832, 378.752, 632.12]","[nan, nan, nan]","[nan, nan, nan]"
1,1,17797,4,7,662,530,452329,2020-09-28 22:45:02.852190+00:00,t,311.618437,...,2023-04-05 22:39:40.786518+00:00,t,452329,4,2,5.42279,12.006788,"[1076.992, 395.328, 735.28]","[nan, nan, nan]","[nan, nan, nan]"
2,2,17797,4,7,665,533,451461,2020-09-28 22:41:51.543636+00:00,t,263.504036,...,2023-04-05 22:39:41.400499+00:00,t,451461,4,2,1.28047,15.025886,"[1099.456, 376.256, 881.84]","[nan, nan, nan]","[nan, nan, nan]"
3,3,17797,4,7,671,539,420222,2020-09-28 22:45:01.445495+00:00,t,309.88288,...,2023-04-05 22:39:27.234487+00:00,t,420222,4,2,0.704028,21.035651,"[1039.744, 389.44, 678.96]","[nan, nan, nan]","[nan, nan, nan]"
4,4,17797,4,7,682,549,420058,2020-09-28 22:44:36.438460+00:00,t,267.026432,...,2023-04-05 22:36:19.482289+00:00,t,420058,4,2,9.64831,2.79444,"[985.792, 383.424, 634.92]","[nan, nan, nan]","[nan, nan, nan]"


In [23]:
# this will pull out the responses to the coregistered units
# by using the row_idx that was provided in the metadata
coreg_resp = resp.loc[coreg_in_silico.row_idx,:]

In [24]:
# now with a reduced set of units, we can calculate the Pearson correlation
# between their responses
corr_M = np.corrcoef(coreg_resp.values) 
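
As a reminder of what `np.corrcoef` returns, here is a small sketch on synthetic data (the array `toy_resp` is made up). Each row is treated as one variable, so an input of shape (units, stimuli) gives a square (units, units) matrix that is symmetric with ones on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 synthetic "units" responding to 200 synthetic "stimuli"
toy_resp = rng.normal(size=(4, 200))
# make unit 1 closely track unit 0 so that one pair is strongly correlated
toy_resp[1] = toy_resp[0] + 0.1 * rng.normal(size=200)

# rows are variables, columns are observations
toy_corr = np.corrcoef(toy_resp)

print(toy_corr.shape)
print(np.round(toy_corr, 2))
```

Because row `i` and column `i` of the matrix refer to the same unit, the order of rows in `coreg_resp` determines which unit each entry of `corr_M` belongs to.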

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

However, this time let's make a dataframe that contains all the correlations
along with the nucleus IDs on both sides of each correlation,
and then merge in the nucleus positions so we can measure the
soma-to-soma distance for each pair.


In [8]:
coreg_in_silico.head(10)

Unnamed: 0,index,animal_id,scan_session,scan_idx,unit_id,row_idx,id,created,valid,target_id,...,score,id_ref,created_ref,valid_ref,volume,pt_supervoxel_id,pt_root_id,pt_position,bb_start_position,bb_end_position
0,0,17797,4,7,648,517,2043,2023-04-05 22:38:59.933339+00:00,t,516506,...,11.443982,516506,2020-09-28 22:44:43.650751+00:00,t,276.767375,105487283075464806,864691135348268503,"[1184.832, 378.752, 632.12]","[nan, nan, nan]","[nan, nan, nan]"
1,1,17797,4,7,662,530,10173,2023-04-05 22:39:40.786518+00:00,t,452329,...,12.006788,452329,2020-09-28 22:45:02.852190+00:00,t,311.618437,101758289725398433,864691135700505634,"[1076.992, 395.328, 735.28]","[nan, nan, nan]","[nan, nan, nan]"
2,2,17797,4,7,665,533,10871,2023-04-05 22:39:41.400499+00:00,t,451461,...,15.025886,451461,2020-09-28 22:41:51.543636+00:00,t,263.504036,102531727972419182,864691135776919981,"[1099.456, 376.256, 881.84]","[nan, nan, nan]","[nan, nan, nan]"
3,3,17797,4,7,671,539,8088,2023-04-05 22:39:27.234487+00:00,t,420222,...,21.035651,420222,2020-09-28 22:45:01.445495+00:00,t,309.88288,100491446037118564,864691135472842290,"[1039.744, 389.44, 678.96]","[nan, nan, nan]","[nan, nan, nan]"
4,4,17797,4,7,682,549,1480,2023-04-05 22:36:19.482289+00:00,t,420058,...,2.79444,420058,2020-09-28 22:44:36.438460+00:00,t,267.026432,98661652329244694,864691135349237975,"[985.792, 383.424, 634.92]","[nan, nan, nan]","[nan, nan, nan]"
5,5,17797,4,7,686,553,1106,2023-04-05 22:36:19.163677+00:00,t,420403,...,10.74727,420403,2020-09-28 22:42:27.376760+00:00,t,304.421274,99999071321769531,864691135345445279,"[1025.28, 394.048, 779.04]","[nan, nan, nan]","[nan, nan, nan]"
6,6,17797,4,7,690,557,692,2023-04-05 22:38:09.793823+00:00,t,389167,...,2.665841,389167,2020-09-28 22:41:51.558752+00:00,t,346.720829,98099595997372919,864691135334495209,"[969.024, 409.92, 713.08]","[nan, nan, nan]","[nan, nan, nan]"
7,7,17797,4,7,691,558,779,2023-04-05 22:38:09.863476+00:00,t,389164,...,5.579912,389164,2020-09-28 22:45:11.330457+00:00,t,332.952207,98029227252939503,864691135688112096,"[967.36, 408.832, 705.32]","[nan, nan, nan]","[nan, nan, nan]"
8,8,17797,4,7,702,568,3658,2023-04-05 22:39:03.611226+00:00,t,389183,...,12.02662,389183,2020-09-28 22:45:05.663650+00:00,t,317.565665,96551414972912364,864691135234029401,"[924.032, 406.784, 727.96]","[nan, nan, nan]","[nan, nan, nan]"
9,9,17797,4,7,703,569,4908,2023-04-05 22:39:07.425785+00:00,t,389458,...,8.079086,389458,2020-09-28 22:44:30.630418+00:00,t,258.159903,97676765526421836,864691135478783942,"[957.44, 390.912, 844.6]","[nan, nan, nan]","[nan, nan, nan]"


In [6]:
#print the column names 
print(coreg_in_silico.columns)

Index(['index', 'animal_id', 'scan_session', 'scan_idx', 'unit_id', 'row_idx',
       'id', 'created', 'valid', 'target_id', 'session', 'field', 'residual',
       'score', 'id_ref', 'created_ref', 'valid_ref', 'volume',
       'pt_supervoxel_id', 'pt_root_id', 'pt_position', 'bb_start_position',
       'bb_end_position'],
      dtype='object')


In [10]:
# get an array of the nucleus IDs of each row/column of the corr_M
# this should be in the 'target_id' column
array_of_nucleus_ids = coreg_in_silico['target_id'].values
print(array_of_nucleus_ids)


# use the row and column indices to get an array of nucleus IDs on each side of the correlation matrix 


# use fancy indexing to pull out the correlation values


[516506 452329 451461 ... 454429 391240 454708]
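
One possible way to approach the row/column step, shown on a tiny made-up matrix rather than the real `corr_M`: `np.triu_indices` returns the row and column indices of the upper triangle, which visits each unordered pair once and skips the diagonal. Whether to use only the upper triangle or every entry is a choice this sketch makes; the IDs and values below are invented.

```python
import numpy as np

toy_ids = np.array([101, 202, 303])          # hypothetical nucleus IDs
toy_corr = np.array([[1.0, 0.2, 0.5],
                     [0.2, 1.0, 0.1],
                     [0.5, 0.1, 1.0]])

# row/column indices of the upper triangle, excluding the diagonal (k=1)
rows, cols = np.triu_indices(toy_corr.shape[0], k=1)

# nucleus ID on each side of every pair
nuc1 = toy_ids[rows]
nuc2 = toy_ids[cols]

# fancy indexing: one correlation value per (row, col) pair
pair_corr = toy_corr[rows, cols]

print(nuc1, nuc2, pair_corr)
```

With roughly 12,000 coregistered units this yields on the order of 70 million pairs, which is why memory use matters in the next step.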


In [None]:
# construct a dataframe using these 3 columns
# hint: use a dictionary to name the columns
# and include "copy=False" to avoid blowing up memory
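
A minimal sketch of the dictionary hint, using three made-up pairs. The column names `nuc1`, `nuc2`, and `C` are assumptions chosen for illustration, not names the exercise requires.

```python
import numpy as np
import pandas as pd

# stand-ins for the three arrays built in the previous step (values invented)
nuc1 = np.array([101, 101, 202])
nuc2 = np.array([202, 303, 303])
pair_corr = np.array([0.2, 0.5, 0.1])

# dictionary keys become column names; copy=False asks pandas not to
# duplicate the input arrays, which matters when they are very large
toy_pairs = pd.DataFrame(
    {"nuc1": nuc1, "nuc2": nuc2, "C": pair_corr},
    copy=False,
)

print(toy_pairs)
```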


In [None]:
# get the nucleus positions dataframe
# converting the positions to microns
# and using standard transform to adjust them to be flat
nuc_df = client.materialize.query_view('nucleus_detection_lookup_v1', 
                                        select_columns = ['id', 'pt_root_id', 'pt_position'],
                                        desired_resolution=[1,1,1])
from standard_transform.datasets import minnie_transform_nm
tform=minnie_transform_nm()
nuc_df['pt_position']=tform.apply(nuc_df.pt_position)
nuc_df['pt_position']=nuc_df.pt_position.apply(np.array)

In [None]:
# merge on the pre and post positions


In [None]:
# visualize the first few rows of your dataframe


In [None]:
# measure the distance between the soma of nuc1 and nuc2

# also measure the distance in x,z only; this is the distance along the surface of cortex

# hints: look at np.vstack, np.linalg.norm
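
The hints above can be sketched as follows on two made-up pairs. The column names `pt_position_1` and `pt_position_2` are hypothetical, and the sketch assumes each cell holds a length-3 array in the order x, y, z, with y as depth, so that x and z span the cortical surface.

```python
import numpy as np
import pandas as pd

# two invented pairs of soma positions in microns
toy = pd.DataFrame({
    "pt_position_1": [np.array([0.0, 0.0, 0.0]), np.array([10.0, 5.0, 0.0])],
    "pt_position_2": [np.array([3.0, 4.0, 0.0]), np.array([10.0, 25.0, 0.0])],
})

# stack the per-row arrays into (n_pairs, 3) matrices
p1 = np.vstack(toy["pt_position_1"].values)
p2 = np.vstack(toy["pt_position_2"].values)
delta = p1 - p2

# full 3D euclidean distance
toy["dist"] = np.linalg.norm(delta, axis=1)

# distance using only x (column 0) and z (column 2)
toy["dist_xz"] = np.linalg.norm(delta[:, [0, 2]], axis=1)

print(toy[["dist", "dist_xz"]])
```

The second pair differs only in depth, so its x,z distance is zero even though its 3D distance is 20 microns.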


In [None]:
# filter out distances of <2 microns


In [None]:
# using binned statistic, lets measure the avg C as a function of euclidean distance

# make up some distance bins from 2-250 microns


# use scipy.stats.binned_statistic
# to measure correlation as a function of distance
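
For reference, a sketch of how `scipy.stats.binned_statistic` can give a mean, a count, and a standard error per distance bin. The data here are synthetic (a made-up decaying relationship plus noise), and the choice of 10 equal-width bins is an assumption, not a recommendation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# synthetic distances (microns) and correlations that decay with distance
toy_dist = rng.uniform(2, 250, size=5000)
toy_C = 0.1 * np.exp(-toy_dist / 100) + rng.normal(scale=0.05, size=5000)

# 10 equal-width bins spanning 2 to 250 microns (11 edges)
bins = np.linspace(2, 250, 11)

mean_C, edges, _ = stats.binned_statistic(toy_dist, toy_C, statistic="mean", bins=bins)
count, _, _ = stats.binned_statistic(toy_dist, toy_C, statistic="count", bins=bins)
std_C, _, _ = stats.binned_statistic(toy_dist, toy_C, statistic="std", bins=bins)

# standard error of the mean in each bin
sem_C = std_C / np.sqrt(count)

# bin centers are convenient x-values for plotting
bin_centers = 0.5 * (edges[:-1] + edges[1:])

print(np.round(mean_C, 3))
```

Arrays like `bin_centers`, `mean_C`, and `sem_C` can be passed to `plt.errorbar` for the plots requested later.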



In [None]:
# what about using the cortical distance
# use the same bins




In [None]:
# make a plot of mean correlation with std error bars as a function of distance
# put both distances on same plot


In [None]:
# make a plot of how many pairs fall in each of these distance bins


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Thought question

What explains these curves? 

In [29]:
# make the same plot but for connected pairs of neurons
# first let's reconstruct the dataframe from the workshop
# this path logic lets the code run both from the solutions
# directory and from one level up
if 'solutions' in os.getcwd():
    workshop2file = '../../workshop2/all_prf_coreg_conn_v661.pkl'
else:
    workshop2file = '../workshop2/all_prf_coreg_conn_v661.pkl'
all_syn_df = pd.read_pickle(workshop2file)

# lets merge on the pre and post-synaptic positions of these connections

# renaming the positions as pre and post depending on how we did the merge
# and drop the duplicate id columns
all_syn_dfm=all_syn_df.merge(nuc_df[['id', 'pt_position']], left_on='pre_nuc_id', right_on='id')\
.rename({'pt_position':'pre_pt_position'}, axis=1)\
.merge(nuc_df[['id', 'pt_position']], left_on='post_nuc_id', right_on='id')\
.rename({'pt_position':'post_pt_position'}, axis=1)\
.drop(['id_x', 'id_y'], axis=1)

# now lets merge in the neurons that are coregistered with responses

# we have to drop duplicates to avoid the few cells that were coregistered twice 
# being double counted
all_syn_dfm2=all_syn_dfm.merge(coreg_in_silico[['index','target_id', 'scan_session', 'scan_idx', 'field','unit_id', 'score', 'residual']],
                  left_on='pre_nuc_id', 
                  right_on='target_id')\
.merge(coreg_in_silico[['index','target_id', 'scan_session', 'scan_idx', 'field','unit_id','score', 'residual']],
                  left_on='post_nuc_id', 
                  right_on='target_id',
                  suffixes=['_pre', '_post'])\
.drop(['target_id_pre', 'target_id_post'],axis=1)\
.drop_duplicates(subset=['pre_nuc_id', 'post_nuc_id'])
all_syn_dfm2

# now use fancy indexing to pull out the correlation associated with each of these connections
all_syn_dfm2['C']=corr_M[all_syn_dfm2.index_pre, all_syn_dfm2.index_post]




In [None]:
# lets cut the dataframe down to just the columns we need
df_conn=all_syn_dfm2[['pre_nuc_id', 'post_nuc_id', 'n_syn', 'sum_size', 'C', 'pre_pt_position', 'post_pt_position']]

In [None]:
df_conn.head()

In [None]:
# calculate the intersoma distance


In [None]:
# filter out the soma distances of <2 microns
# to discount the double roi cells


In [None]:
# use scipy.stats.binned_statistic
# to measure correlation as a function of distance for connected


In [None]:
# plot both curves, for connected pairs and all pairs, on top of one another
# make a plot of mean correlation with std error bars as a function of distance


In [None]:
# plot both curves, for connected pairs and all pairs, on top of one another
# make a plot of mean correlation with std error bars as a function of distance


In [None]:
# plot how many connected pairs are in each distance bin


In [None]:
# plot how many are connected in a cortical distance bin


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

#### Thought questions

What explains the difference between this curve and the overall distribution of pairs of recorded ROIs? 

Does this curve match your expectation for what cortical connectivity should look like?

If space explains a lot of the differences between connected and unconnected pairs,
what does that mean?

How does it affect your interpretation of the effects?

Are there spatial effects that go beyond just soma to soma distance that would be important to control for in order to interpret a finding as being evidence for a particular mechanism?

#### Extensions/Project Ideas

Can you make a model which resamples all pairs to match the spatial distribution found in the connected dataset? Are the results significant compared to that null model?

Does this explain the variation seen in the single-cell effects, given that some cells have closer targets and others have targets farther away in the brain?
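
As a starting point for the resampling idea, here is one possible sketch of a distance-matched null model on synthetic data. It is not the workshop's method: the arrays, the bin edges, and the helper name `resample_matched_mean` are all assumptions made for illustration. The idea is to draw, from all pairs, a random sample with the same number of pairs per distance bin as the connected set, then repeat to build a null distribution of the mean correlation.

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic stand-ins: distances and correlations for all pairs,
# plus distances for a smaller "connected" set that skews closer
all_dist = rng.uniform(2, 250, size=20000)
all_C = 0.1 * np.exp(-all_dist / 100) + rng.normal(scale=0.05, size=20000)
conn_dist = rng.uniform(2, 100, size=500)

# assign every pair to one of 10 distance bins (indices 0..9)
bins = np.linspace(2, 250, 11)
all_bin = np.digitize(all_dist, bins[1:-1])
conn_bin = np.digitize(conn_dist, bins[1:-1])

# how many connected pairs fall in each bin
conn_counts = np.bincount(conn_bin, minlength=len(bins) - 1)

def resample_matched_mean(rng):
    """Mean C of a random all-pairs sample matching the connected distance histogram."""
    picks = [
        rng.choice(np.flatnonzero(all_bin == b), size=n, replace=True)
        for b, n in enumerate(conn_counts)
        if n > 0
    ]
    return all_C[np.concatenate(picks)].mean()

# null distribution of the mean correlation under distance matching
null_means = np.array([resample_matched_mean(rng) for _ in range(200)])

print(null_means.mean(), all_C.mean())
```

The observed mean correlation of the real connected pairs could then be compared against `null_means`, for example by asking what fraction of null samples are at least as large.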

</div>