<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


## Space
The analysis we did during the workshop did not take physical space into account. If neurons near each other have more similar functional properties, and neurons near one another are also more likely to be connected, then the effect we saw might be explained by spatial factors alone. How big are those effects? Can they explain this shift?

There is good evidence that, among synapses of equal size, those located close to the cell body are functionally stronger than those farther from it.

This exercise is longer and more complex than the others.

In [1]:
import numpy as np
import os
import caveclient
import scipy
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

### We will start by recalculating the dataframe from the workshop

In [2]:
import platform
import os

platstring = platform.platform()
if ('Darwin' in platstring) or ('macOS' in platstring):
    # macOS 
    data_root = "/Volumes/Brain2023/"
elif 'Windows' in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif ('amzn' in platstring):
    # then on Code Ocean
    data_root = "/data/"
else:
    # then your own linux platform
    # EDIT location where you mounted hard drive
    data_root = "/media/$USERNAME/Brain2023/"
    
data_dir = os.path.join(data_root, 'microns_in_silico')

# you can just override this if the location of the data varies
# data_dir = '/Users/forrestc/Downloads/microns_in_silico/'

In [3]:
# we are going to load up the data and prepare the dataframe like we did 
# in class but with fewer comments

# load up the in-silico responses as a pandas dataframe from a numpy array 
resp=pd.DataFrame(np.load(os.path.join(data_dir, 'nat_resp.npy')))

# load up the csv of metadata about the 104171 units
units_df = pd.read_csv(os.path.join(data_dir, 'nat_unit.csv'))

# set the index to be the row_idx of the units_df
resp.index = units_df['row_idx']

# if we are on Code Ocean, the CAVEsetup helped you make your token an environment variable
if 'amzn' in platstring:
    client = caveclient.CAVEclient('minnie65_public', auth_token=os.environ['API_SECRET'])
else:
    # otherwise, if you are running locally, the token should be saved to a file
    # on your hard drive that the caveclient knows how to find
    client = caveclient.CAVEclient('minnie65_public')

# let's pull out the manually coregistered neurons
# desired_resolution describes how many nanometers you want each unit to be
# so 1000,1000,1000 gives positions in microns for x, y and z
coreg_df = client.materialize.query_table('coregistration_manual_v3', desired_resolution=[1000,1000,1000])

# let's merge these dataframes so we get the row_idx of each coregistered unit
# we merge on the corresponding columns; however, the session column is named
# slightly differently in the csv than in the CAVE table
coreg_in_silico=pd.merge(units_df, coreg_df, 
         left_on=['scan_session', 'scan_idx', 'unit_id'],
          right_on=['session','scan_idx', 'unit_id'])
# reset the index so the rows are numbered 0..N-1, in the same order
# as the rows and columns of the correlation matrix computed below
coreg_in_silico.reset_index(inplace=True)

# this will pull out the responses to the coregistered units
# by using the row_idx that was provided in the metadata
coreg_resp = resp.loc[coreg_in_silico.row_idx,:]

# now with a reduced set of units, we can calculate the Pearson correlation
# between their responses
corr_M = np.corrcoef(coreg_resp.values)
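
The merge step above joins two tables whose key columns do not all share a name. As a minimal sketch of that pattern, the toy tables below use made-up values; only the column names `scan_session`, `session`, `scan_idx`, `unit_id`, `row_idx` and `target_id` come from the cell above.

```python
import pandas as pd

# toy stand-in for the unit metadata csv (all values are made up)
toy_units = pd.DataFrame({
    'row_idx': [0, 1, 2],
    'scan_session': [4, 4, 5],
    'scan_idx': [7, 7, 3],
    'unit_id': [10, 11, 10],
})

# toy stand-in for the coregistration table (all values are made up)
toy_coreg = pd.DataFrame({
    'session': [4, 5],
    'scan_idx': [7, 3],
    'unit_id': [11, 10],
    'target_id': [900, 901],
})

# left_on and right_on are matched up position by position,
# so 'scan_session' on the left is compared with 'session' on the right
toy_merged = pd.merge(
    toy_units, toy_coreg,
    left_on=['scan_session', 'scan_idx', 'unit_id'],
    right_on=['session', 'scan_idx', 'unit_id'],
)

# the default inner merge keeps only units present in both tables
print(toy_merged[['row_idx', 'target_id']])
```

Only the two units that appear in both toy tables survive the merge, which is why the coregistered set is much smaller than the full set of units.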




<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

This time, however, let's make a dataframe that contains all the correlations
along with the nucleus IDs on both sides of each correlation.
We can then merge in the nucleus positions to measure the
soma-to-soma distance for each pair.


In [4]:
# get an array of the nucleus IDs of each row/column of the corr_M
# (.values gives a plain numpy array, so the indexing below is purely positional)
nuc_ids = coreg_in_silico.target_id.values

# get the row and column indices of the upper right triangle
# of this matrix
rows, cols = np.triu_indices_from(corr_M,1)

# use the row and column indices to get an array of nucleus IDs on each side of the correlation
nuc1 = nuc_ids[rows]
nuc2 = nuc_ids[cols]

# use fancy indexing to pull out the correlation values
Cs = corr_M[(rows,cols)]
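
To see what the upper-triangle indexing does, here is a small sketch on a toy 4x4 correlation matrix. The nucleus IDs and responses are made up for illustration; only the indexing pattern mirrors the cell above.

```python
import numpy as np

# made-up nucleus IDs for four hypothetical units
toy_ids = np.array([101, 202, 303, 404])

# made-up responses: 4 units x 50 stimuli
rng = np.random.default_rng(0)
toy_resp = rng.normal(size=(4, 50))

# np.corrcoef treats each row as one variable, giving a 4x4 matrix
toy_corr = np.corrcoef(toy_resp)

# k=1 skips the diagonal, so each unordered pair appears exactly once
r, c = np.triu_indices_from(toy_corr, k=1)

# the same index arrays pick out the IDs and the correlation of each pair
pairs = list(zip(toy_ids[r].tolist(), toy_ids[c].tolist()))
toy_Cs = toy_corr[r, c]

print(pairs)
```

Four units give 4 * 3 / 2 = 6 pairs, and entry `i` of `toy_Cs` is the correlation between the two units in `pairs[i]`.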


In [5]:

# wrap each of the 3 arrays in a named Series so they can become dataframe columns
# (np.asarray drops any pandas index, so the rows line up by position)
Cseries = pd.Series(Cs, name='C')
Nuc1Series = pd.Series(np.asarray(nuc1), name='nuc1')
Nuc2Series = pd.Series(np.asarray(nuc2), name='nuc2')




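In [9]:
# combine the three Series as columns of one dataframe
# reset_index(drop=True) ensures every column shares the same 0..N-1 index,
# so concat can line the rows up by position
Cdf = pd.concat([Cseries,
                 Nuc1Series.reset_index(drop=True),
                 Nuc2Series.reset_index(drop=True)], axis=1)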
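
The next step described above, merging in nucleus positions to get a soma-to-soma distance for each pair, might look roughly like the sketch below. It uses toy data only: the position table, its `nuc_id`, `x`, `y`, `z` columns and the `dist_um` column are hypothetical names chosen for illustration, and the real position columns in the CAVE table may be laid out differently.

```python
import numpy as np
import pandas as pd

# toy position table: three nuclei with made-up positions in microns
# (column names here are hypothetical)
toy_pos = pd.DataFrame({
    'nuc_id': [101, 202, 303],
    'x': [0.0, 3.0, 0.0],
    'y': [0.0, 4.0, 0.0],
    'z': [0.0, 0.0, 12.0],
})

# toy pair table with the same layout as the correlation dataframe
toy_pairs = pd.DataFrame({
    'C': [0.10, 0.25, -0.05],
    'nuc1': [101, 101, 202],
    'nuc2': [202, 303, 303],
})

# merge the positions in twice, once for each side of the pair;
# add_suffix keeps the two sets of coordinate columns apart
merged = (
    toy_pairs
    .merge(toy_pos.add_suffix('_1'), left_on='nuc1', right_on='nuc_id_1')
    .merge(toy_pos.add_suffix('_2'), left_on='nuc2', right_on='nuc_id_2')
)

# Euclidean distance between the two somas of each pair
delta = merged[['x_1', 'y_1', 'z_1']].values - merged[['x_2', 'y_2', 'z_2']].values
merged['dist_um'] = np.linalg.norm(delta, axis=1)

print(merged[['nuc1', 'nuc2', 'C', 'dist_um']])
```

With distance as a column next to the correlation, the two can be compared directly, for example by binning pairs by distance.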


In [None]:
# delete the correlation matrix to save memory
del corr_M

In [20]:
len(Cs)

73126371

In [24]:
len(nuc2)

73126371
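
As a quick sanity check on that pair count, the sketch below recovers the number of coregistered units from it and estimates how much memory the full correlation matrix takes. The memory figure assumes the matrix is stored as float64 (8 bytes per entry).

```python
import math

n_pairs = 73126371  # the length of Cs reported above

# invert n_pairs = n * (n - 1) / 2 to recover the number of units n
n_units = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
assert n_units * (n_units - 1) // 2 == n_pairs

# a dense matrix stores n_units**2 entries; assume 8 bytes each (float64)
corr_bytes = n_units ** 2 * 8

print(n_units, round(corr_bytes / 1e9, 2))  # prints: 12094 1.17
```

The pair count is consistent with 12094 units, and a dense matrix of that size is over a gigabyte, which is why deleting it once the pairs are extracted is worthwhile.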
