# Reproducible activity without attractors in the mouse cortex 

Analysis code to reproduce all panels in figures 1 and 2 of the paper by Guarino, Filipchuk, Destexhe (2022)   
preprint link: https://www.biorxiv.org/content/10.1101/2022.05.24.493230v2

All this code is hosted on a github [repository](https://github.com/dguarino/Guarino-Filipchuk-Destexhe) (with a Zenodo DOI persistent identifier [here](https://zenodo.org)) and can be interactively executed here.  
The repository also contains a copy of the required data files from the [MICrONS project phase1](https://www.microns-explorer.org/phase1) (freely available on the project website), to ease the setup on Binder. 

This notebook performs loading and selection of the MICrONS data, structural and dynamical analyses, and plots the results as in the paper panels.

We divided the analysis code into:
- `imports_functions.py` : performs the imports and definition of various helper functions.
- `structural_analysis.py` : creates a graph from the connectivity matrix and computes several graph measures (using [igraph](https://igraph.org)).
- `dynamical_analysis.py` : performs the same population event analysis as in [Filipchuk et al. 2022](https://www.biorxiv.org/content/10.1101/2021.08.31.458322v2) and then also extracts the core neurons of the events.


In [32]:
from platform import python_version
print(python_version())

from builtins import exec
exec(open("./imports_functions.py").read())

3.10.4


## Loading curated data from MICrONS project phase 1

The following code for data loading and selection is taken from   
https://github.com/AllenInstitute/MicronsBinder/blob/master/notebooks/intro/MostSynapsesInAndOut.ipynb   
https://github.com/AllenInstitute/MicronsBinder/blob/master/notebooks/vignette_analysis/function/structure_function_analysis.ipynb

`Neurons.pkl` contains the `segment_id` for each pyramidal neuron in the EM volume.    
`Soma.pkl` contains the soma position for all the cells in the EM volume.   
`calcium_trace.pkl` contains the calcium imaging traces (including deconvolved spikes).    
`soma_subgraph_synapses_spines_v185.csv` contains the list of synapses with root pre-/post-synaptic somas.

In [140]:
if not os.path.exists("MICrONS_data/calcium_trace.pkl"):
    print("Downloading 2photon calcium traces ...")
    resp = wget.download("https://zenodo.org/record/5646567/files/calcium_trace.pkl?download=1", "MICrONS_data/calcium_trace.pkl")
    print("... Done: "+resp)
    
if os.path.exists("MICrONS_data/calcium_trace.pkl"):
    calcium_trace = pd.read_pickle("MICrONS_data/calcium_trace.pkl")
    calcium_trace_df = pd.DataFrame.from_dict(calcium_trace, orient='index')
# print(calcium_trace)
# print(calcium_trace_df.columns) # ['scan', 'trace_raw', 'trace', 'spike', 'stimulus']
# print(len(calcium_trace_df.index) ) # 112
# print(len(calcium_trace_df[calcium_trace_df['scan']==scan_id].index))

**CAUTION: The cell below might take some time to load the data.**

In [141]:
if not os.path.exists("MICrONS_data/pni_synapses_v185.csv"):
    print("Downloading Synapse table ...")
    resp = wget.download("https://zenodo.org/record/3710459/files/pni_synapses_v185.csv?download=1", "MICrONS_data/pni_synapses_v185.csv")
    print("... Done: "+resp)

if not os.path.exists("MICrONS_data/soma_subgraph_synapses_spines_v185.csv"):
    print("Downloading soma_subgraph_synapses_spines_v185 ...")
    resp = wget.download("https://zenodo.org/record/3710459/files/soma_subgraph_synapses_spines_v185.csv?download=1", "MICrONS_data/soma_subgraph_synapses_spines_v185.csv")
    print("... Done: "+resp)

with open("MICrONS_data/Neuron.pkl", 'rb') as handle:
    Neuron = pickle.load(handle)
with open("MICrONS_data/Soma.pkl", 'rb') as handle:
    Soma = pickle.load(handle)

syn_spines_df = pd.read_csv('MICrONS_data/soma_subgraph_synapses_spines_v185.csv')
# id, pre_root_id, post_root_id, cleft_vx, spine_vol_um3
print(syn_spines_df.shape)

syn_df = pd.read_csv('MICrONS_data/pni_synapses_v185.csv')
print(syn_df.shape)
# print(syn_df)

(1961, 17)
(3239275, 16)


Get the IDs and number of recorded pyramidal neurons

In [3]:
pyc_list = Neuron["segment_id"]
n_pyc = pyc_list.shape[0]

Set the folder to which all results will be saved, and the frame duration (from the MICrONS docs).

In [4]:
exp_path = os.getcwd()
frame_duration = 0.0674 # sec, 14.8313 frames per second

#### Accessing 2-photon Calcium imaging data subset

We are interested in reading only the Ca-imaging data of the cells for which also the EM reconstruction is available.   

##### CAUTION: next cell can take some time to load all calcium imaging data!

In [161]:
print("Pyramidal neurons recorded with 2-photon Calcium imaging: ",len(calcium_trace))
ophys_cell_ids = list(calcium_trace.keys())
n_frames = len(calcium_trace[ophys_cell_ids[0]]['spike'])
print(n_frames)
start_time = 0 # 200 frames of blank screen are already removed from the data
stop_time = (n_frames)*frame_duration
time = np.arange(start_time,stop_time,frame_duration)

spiketrains = [[],[],[],[],[]] # five scans
ophys_scan_ids = [[],[],[],[],[]] # five scans
for ocell_id in ophys_cell_ids:
    decst = calcium_trace[ocell_id]["spike"]
    spiketrains[(calcium_trace[ocell_id]["scan"])-1].append( time[:][np.nonzero(decst)]) # deconvolved Ca spiketrains
    ophys_scan_ids[(calcium_trace[ocell_id]["scan"])-1].append( ocell_id )

print("... producing spike rasterplot")
fig = plt.figure(figsize=[12.8,4.8])
rowg = 0
rowc = ['b','g','r','c','m']
for scanid,scan in enumerate(spiketrains):
    for row,train in enumerate(scan):
        plt.scatter( train, [row+rowg]*len(train), marker='o', edgecolors='none', s=1, c=rowc[scanid] )
    rowg += row+1
    # add here timing of oriented stimulus
    
plt.ylabel("cell IDs")
plt.xlabel("time (s)")
fig.savefig(exp_path+'/results/rasterplot.png', transparent=False, dpi=800)
plt.close()
fig.clear()
fig.clf()

Pyramidal neurons recorded with 2-photon Calcium imaging:  112
27100
... producing spike rasterplot


#### Create the cell indexes from the list of IDs

In [145]:
ophys_cell_indexes = range(len(ophys_cell_ids))

#### Get soma center locations

They are provided in voxels coordinates of 4,4,40 nm

In [146]:
pyc_soma_loc = np.zeros((n_pyc, 3))
for i in range(n_pyc):
    seg_id = pyc_list[i]
    pyc_soma_loc[i,:] = get_soma_loc(Soma, seg_id)

Join cell indexes with their position

In [147]:
pyc_ca_soma_loc = np.zeros((len(ophys_cell_indexes), 3))
for i in ophys_cell_indexes:
    seg_id = ophys_cell_ids[i]
    idx = np.where(pyc_list==seg_id)[0][0]
    pyc_ca_soma_loc[i,:] = pyc_soma_loc[idx,:]

---
## Structural Analysis

First, we build an adjacency matrix for all the EM-imaged neurons:

In [148]:
adjacency_matrix = np.zeros((len(ophys_cell_indexes), len(ophys_cell_indexes)))

for i in ophys_cell_indexes:
    root_id = ophys_cell_ids[i]
    root_id_postsyn_list = syn_df[syn_df['pre_root_id'] == root_id]['post_root_id'].tolist()
    for ps in root_id_postsyn_list:
        if ps in ophys_cell_ids:
            ips = ophys_cell_ids.index(ps)
            adjacency_matrix[i][ips]=1
np.save(exp_path+'/results/adjacency_matrix.npy', adjacency_matrix)

Then we make another one 2p-scan-specific:

In [162]:
scan_adjacency_matrix = {}
for scan_id in range(5):
    
    gshape = len(ophys_scan_ids[scan_id]) # num cells in the scan
    print(gshape)
    adjacency_matrix = np.zeros((gshape, gshape))
    
    for i,root_id in enumerate(ophys_scan_ids[scan_id]):
        root_id_postsyn_list = syn_df[syn_df['pre_root_id'] == root_id]['post_root_id'].tolist()
        for ps in root_id_postsyn_list:
            if ps in ophys_scan_ids[scan_id]:
                ips = ophys_scan_ids[scan_id].index(ps)
                adjacency_matrix[i][ips]=1
        scan_adjacency_matrix[scan_id] = adjacency_matrix
    
np.save(exp_path+'/results/scan_adjacency_matrix.npy', scan_adjacency_matrix)

35
21
22
22
12


## Are co-active cells also connected?

We first measure 1-lag correlation across all cells.

In [170]:
for scan_id,scan in enumerate(spiketrains):
    # make binary spiketrains
    binary_spiketrains = np.zeros( (len(scan),len(time)+2) )
    # print(binary_spiketrains.shape)
    
    for row,train in enumerate(scan):
        # iterate over spiketrains assigning 1 to the binary_spiketrains at the corresponding position
        tidxs = np.trunc(np.array(train)/frame_duration).astype(int)
        tidxs[tidxs>len(time)] = len(time) 
        binary_spiketrains[row][tidxs] = 1

    functional_adjacency_matrix = []
    for irow,bsti in enumerate(binary_spiketrains):
        row_xcorr = []
        for jrow,bstj in enumerate(binary_spiketrains):
            if irow==jrow:
                row_xcorr.append(0.0) # no self connections
                continue
            row_xcorr.append(crosscorrelation(bsti, bstj, maxlag=1, mode='corr')[2])
        functional_adjacency_matrix.append(row_xcorr)
    functional_adjacency_matrix = np.array(functional_adjacency_matrix)
    np.save(exp_path+"/results/functional_adjacency_matrix_%d.npy"%scan_id, functional_adjacency_matrix)
    
    # plot
    fig = plt.figure()
    # norm = MidpointNormalize(vmin=np.amin(functional_adjacency_matrix), vmax=np.amax(functional_adjacency_matrix), midpoint=0)
    norm = MidpointNormalize(vmin=-0.04, vmax=0.12, midpoint=0)
    plt.pcolormesh(functional_adjacency_matrix, cmap='coolwarm', norm=norm)
    cbar = plt.colorbar()
    fig.savefig(exp_path+"/results/functional_adjacency_matrix_scan%d.png"%scan_id, transparent=True)
    plt.close()
    fig.clear()
    fig.clf()
    
    # masking
    maskedmatrix = functional_adjacency_matrix*scan_adjacency_matrix[scan_id]
    fig = plt.figure()
    # norm = MidpointNormalize(vmin=np.amin(maskedmatrix), vmax=np.amax(maskedmatrix), midpoint=0)
    norm = MidpointNormalize(vmin=-0.04, vmax=0.1, midpoint=0)
    plt.pcolormesh(maskedmatrix, cmap='coolwarm', norm=norm)
    cbar = plt.colorbar()
    fig.savefig(exp_path+"/results/EMmasked_functional_adjacency_matrix_scan%d.png"%scan_id, transparent=True)
    plt.close()
    fig.clear()
    fig.clf()

### Functional and EM connectivity together
We mask the functional adjacency matrix using the EM connectivity matrix.    

## Are co-tuned cells also co-active and connected?
Reading the stimulation protocol.


In [78]:
stimulus_df = pd.read_csv('MICrONS_data/visual_stimulus/stimulus_label_scan1.csv')
print(stimulus_df.shape)
print(stimulus_df)
print(stimulus_df.index[stimulus_df['label'].notnull()].tolist())
print(stimulus_df['label'].value_counts().sort_index())


(27300, 1)
       label
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
...      ...
27295    NaN
27296    NaN
27297    NaN
27298    NaN
27299    NaN

[27300 rows x 1 columns]
[216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 7

---
# Structural analysis

In [10]:
global_degree_counts = []
global_degree_distribution = []
global_structural_betweeness = []
global_structural_motifs = []
global_structural_motifsratio = []
global_structural_motifsurrogates = []

exec(open("./structural_analysis.py").read())

global_structural_betweeness.append(betweenness_centrality)
global_degree_counts.append(degree_counts)
global_degree_distribution.append(degrees)
global_structural_motifs.append(motifs)
global_structural_motifsurrogates.append(surrogate_motifs)
global_structural_motifsratio.append(motifsratio)

... adjacency matrix
... loaded
334
    number of vertices: 334
... Network nodes degrees
... Degree distributions
... Local Clustering Coefficient
    min 0.0
    mean 0.15509339039514802
    max 1.0
... Betweenness centrality
... Motifs




---
## Dynamical Analysis

Here we first population events, we quantify them, and we extract their core neurons.   
This analysis extends (from step 5 on) that performed by Filipchuk et al. 2022:
1. Compute population instantaneous firing rate (bin)

2. Establish significance threshold for population events   
    2.1 compute Inter-Spike Intervals (ISI) of the original spiketrains   
    2.2 reshuffle ISI to create (1000) surrogates   
    2.3 compute the population instantaneous firing rate for each surrogate time-binned rasterplot   

3. Find population events   
    3.1 smoothed firing rate   
    3.2 instantaneous threshold is the 99% of the surrogate population instantaneous firing rate   
    3.3 the peaks above intersections of smoothed fr and threshold mark population events   
    3.4 the minima before and after a peak are taken as start and end times of the population event   
    
4. Find clusters of events   
    4.1 produce a cell id signature vector of each population event   
    4.2 perform clustering linkage by complete cross-correlation of event vectors   
    4.3 produce surrogates clusters to establish a cluster significance threshold (95%)     
    4.4 find the event reproducibility within each cluster (cluster events cross-correlation)   

5. Find core neurons   
    5.1 take all neurons participating to a cluster of events   
    5.2 use the a percentage (from 60 to 99%) of the cluster event reproducibility as core significance threshold   
    5.3 if the occurrence frequency of a neuron is beyond threshold, then the neuron is taken as core   
    5.4 remove core neurons if firing unspecifically within and outside their cluster   
    
### All panels of Figure 1

are produced in the next cell by the file `dynamical_analysis.py`.

In [11]:
global_structural_motif_cores = {k: 0 for k in range(16)}
global_structural_motif_others = {k: 0 for k in range(16)}
global_events_sec = []
global_events_duration = []
global_cluster_number = []
global_cluster_selfsimilarity = []

core_reproducibility_perc = 60 # threshold for detecting cores
exec(open("./dynamical_analysis.py").read())

global_events_sec.append(events_sec)
global_events_duration.extend(events_durations_f)
global_cluster_number.append(nclusters)
global_cluster_selfsimilarity.extend(reproducibility_list)

... firing statistics
    population firing: 1.23±1.14 sp/frame
    smoothing
... generating surrogates to establish population event threshold
    cells firing rate: 0.01±0.10 sp/s
    event size threshold (mean): 3.2139256165099335
... find population events in the trial
... signatures of population events
    number of events: 226
    number of events per sec: 0.1228247519048706
    events duration: 0.674±0.229
    events size: 8.000±3.775
... Similarity of events matrix
... clustering
    linkage
    surrogate events signatures for clustering threshold
    cluster reproducibility threshold: 0.20448144105807747
    cluster size threshold: 2
    Total number of clusters: 97
    # clusters (after removing those below reproducibility threshold): 10
... finding cluster cores
    removing cores firing unspecifically
    gathering cores from all clusters
    # cores: 35
    # non-cores: 77
    cores per cluster: 4.64±1.92 (min 2, max 8)
    others per cluster: 107.36±1.92 (min 104, max 11

  arr = np.asanyarray(arr)


### Are cores more functionally connected?
How likely is that a core is functionally efficient to elicit a response in a core or others.

In [None]:
# efficacy probability as 1-lag correlations
core2core_efficacy = [] # probability
core2other_efficacy = []
other2core_efficacy = []
other2other_efficacy = []
for dyn_core in clusters_cores:
    dyn_core_indexes = [ophys_cell_ids.index(strid) for strid in dyn_core]
    dyn_other_indexes = list(set(ophys_cell_indexes).symmetric_difference(set(dyn_core_indexes)))
    # selection
    core2core_efficacy.extend( [conns for cid in functional_adjacency_matrix[dyn_core_indexes,:] for conns in cid[dyn_core_indexes]] )
    core2other_efficacy.extend( [conns for cid in functional_adjacency_matrix[dyn_core_indexes,:] for conns in cid[dyn_other_indexes]] )
    other2core_efficacy.extend( [conns for cid in functional_adjacency_matrix[dyn_other_indexes,:] for conns in cid[dyn_core_indexes]] )
    other2other_efficacy.extend( [conns for cid in functional_adjacency_matrix[dyn_other_indexes,:] for conns in cid[dyn_other_indexes]] )

print("    {:d} core2core 1-lag R: {:1.3f}±{:1.2f}".format(len(core2core_efficacy), np.mean(core2core_efficacy),np.std(core2core_efficacy)) )
print("    {:d} core2other 1-lag R: {:1.3f}±{:1.2f}".format(len(core2other_efficacy), np.mean(core2other_efficacy),np.std(core2other_efficacy)) )
print("    {:d} other2core 1-lag R: {:1.3f}±{:1.2f}".format(len(other2core_efficacy), np.mean(other2core_efficacy),np.std(other2core_efficacy)) )
print("    {:d} other2other 1-lag R: {:1.3f}±{:1.2f}".format(len(other2other_efficacy), np.mean(other2other_efficacy),np.std(other2other_efficacy)) )
# significativity
kwstat,pval = stats.kruskal(core2core_efficacy, other2other_efficacy)
print("    core-core vs other-other 1-lag R Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(core2core_efficacy, other2other_efficacy) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
kwstat,pval = stats.kruskal(core2core_efficacy, core2other_efficacy)
print("    core-core vs core-other 1-lag R Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(core2core_efficacy, core2other_efficacy) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
kwstat,pval = stats.kruskal(core2core_efficacy, other2core_efficacy)
print("    core-core vs other-core 1-lag R Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(core2core_efficacy, other2core_efficacy) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
# all spine volumes by type
fig, ax = plt.subplots()
xs = np.random.normal(0, 0.04, len(core2core_efficacy))
plt.scatter(xs, core2core_efficacy, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(1, 0.04, len(core2other_efficacy))
plt.scatter(xs, core2other_efficacy, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(2, 0.04, len(other2core_efficacy))
plt.scatter(xs, other2core_efficacy, edgecolor='silver', facecolor=('#C0C0C04d'))
xs = np.random.normal(3, 0.04, len(other2other_efficacy))
plt.scatter(xs, other2other_efficacy, edgecolor='silver', facecolor=('#C0C0C04d'))
vp = ax.violinplot([core2core_efficacy,core2other_efficacy,other2core_efficacy,other2other_efficacy], [0,1,2,3], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc in vp['bodies'][0:1]:
    pc.set_facecolor('#228B224d')
for pc in vp['bodies'][1:]:
    pc.set_facecolor('#D3D3D34d')
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Efficacy (1-lag R)')
plt.xticks([0, 1, 2, 3], ["core-core\n(n={:d})".format(len(core2core_efficacy)), "core-other\n(n={:d})".format(len(core2other_efficacy)),"other-core\n(n={:d})".format(len(other2core_efficacy)),"other-other\n(n={:d})".format(len(other2other_efficacy))])
fig.savefig(exp_path+'/results/global_cores_others_efficacy.png', transparent=True, dpi=1500)
# fig.savefig(exp_path+'/results/global_cores_others_efficacy.svg', transparent=True)
plt.close()
fig.clf()

---
## Mixing structural and dynamical analyses results to characterize core connectivity

Here, we collect the evidence contrasting the hypothesis that core neurons are strongly connected.   
We tested two fundamental attractor-driven assumptions:
- synapses between cores are more numerous and stronger compared to others   
- circuits made by cores involve more recursive connections toward cores

We can take the **number** and **volume** of post-synaptic spines as proxy for their functional efficacy. 

But two-photon imaged neurons are few (N=112), and core neurons even fewer (3 to 8 per cluster, 35 total). Therefore their count of (proofread) spines leads to underpowered statistics.    
We took three inclusive solutions: 
- we lowered the threshold for core identification to a minimum (participation to 60% of the events)
- we did not differentiate spines based on their point of contact (e.g. axo-somatic, axo-dendritic, axo-axonic, etc)
- we considered the number of spines to and from all neurons that were not part of the two-photon imaged dataset (based on the total count of spines, which lacks the proofread volume of the spines). 

### Core vs other to/from all spine number and volume (panel 2A)
We want to know whether cores and others differ globally in terms of their spines.    
Knowing the global properties of cores is relevant to assess the significance of the (underpowered) subsequent statistics limited to core vs other spine volume.

In [15]:
print("... postsynaptic spines on cores or others from all sources in the EM volume")
all2core_spine_vol = [] # µm3
core2all_spine_vol = []
all2other_spine_vol = []
other2all_spine_vol = []

all2core_spine_num = [] # num
core2all_spine_num = []
all2other_spine_num = []
other2all_spine_num = []
norm_all2core_spine_num = 0.0 # normalized num
norm_core2all_spine_num = 0.0
norm_all2other_spine_num = 0.0
norm_other2all_spine_num = 0.0

set_ids = set(ophys_cell_ids)
for dyn_core_ids in clusters_cores:
    dyn_other_ids = set_ids.symmetric_difference(dyn_core_ids)
    # searching
    
    all2core_synapse_df = syn_df.query(f'(post_root_id in {list(dyn_core_ids)})')
    if not all2core_synapse_df.empty:
        all2core_spine_num.extend( all2core_synapse_df.groupby('post_root_id').size() )
        norm_all2core_spine_num += all2core_synapse_df.groupby('post_root_id').size().sum()/(syn_df.shape[0]*len(list(dyn_core_ids))) # normalized by source*target

    all2other_synapse_df = syn_df.query(f'(post_root_id in {list(dyn_other_ids)})')
    if not all2other_synapse_df.empty:
        all2other_spine_num.extend( all2other_synapse_df.groupby('post_root_id').size() )
        norm_all2other_spine_num += all2other_synapse_df.groupby('post_root_id').size().sum()/(syn_df.shape[0]*len(list(dyn_other_ids)))

    core2all_synapse_df = syn_df.query(f'(pre_root_id in {list(dyn_core_ids)})')
    if not core2all_synapse_df.empty:
        core2all_spine_num.extend( core2all_synapse_df.groupby('pre_root_id').size() )
        norm_core2all_spine_num += core2all_synapse_df.groupby('pre_root_id').size().sum()/(syn_df.shape[0]*len(list(dyn_core_ids)))

    other2all_synapse_df = syn_df.query(f'(pre_root_id in {list(dyn_other_ids)})')
    if not other2all_synapse_df.empty:
        other2all_spine_num.extend( other2all_synapse_df.groupby('pre_root_id').size() )
        norm_other2all_spine_num += other2all_synapse_df.groupby('pre_root_id').size().sum()/(syn_df.shape[0]*len(list(dyn_other_ids)))

    # id, pre_root_id, post_root_id, cleft_vx, spine_vol_um3
    all2core_synapse_df = syn_spines_df.query(f'(post_root_id in {list(dyn_core_ids)})')
    if not all2core_synapse_df.empty:
        all2core_spine_vol.extend( all2core_synapse_df['spine_vol_um3'].tolist() )
    all2other_synapse_df = syn_spines_df.query(f'(post_root_id in {list(dyn_other_ids)})')
    if not all2other_synapse_df.empty:
        all2other_spine_vol.extend( all2other_synapse_df['spine_vol_um3'].tolist() )
    core2all_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)})')
    if not core2all_synapse_df.empty:
        core2all_spine_vol.extend( core2all_synapse_df['spine_vol_um3'].tolist() )
    other2all_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)})')
    if not other2all_synapse_df.empty:
        other2all_spine_vol.extend( other2all_synapse_df['spine_vol_um3'].tolist() )
        
# number description
print("    all2core spines number: {:1.3f}±{:1.2f} ".format(np.mean(all2core_spine_num),np.std(all2core_spine_num)) )
print("    core2all spines number: {:1.3f}±{:1.2f} ".format(np.mean(core2all_spine_num),np.std(core2all_spine_num)) )
print("    all2other spines number: {:1.3f}±{:1.2f} ".format(np.mean(all2other_spine_num),np.std(all2other_spine_num)) )
print("    other2all spines number: {:1.3f}±{:1.2f} ".format(np.mean(other2all_spine_num),np.std(other2all_spine_num)) )
# number significativity
kwstat,pval = stats.kruskal(all2core_spine_num, all2other_spine_num)
print("    all-core vs all-other spine number Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(all2core_spine_num, all2other_spine_vol) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
kwstat,pval = stats.kruskal(core2all_spine_num, other2all_spine_num)
print("    core-all vs other-all spine number Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(core2all_spine_num, other2all_spine_num) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

# plot
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(core2all_spine_num))
plt.scatter(xs, core2all_spine_num, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(2, 0.04, len(other2all_spine_num))
plt.scatter(xs, other2all_spine_num, edgecolor='silver', facecolor=('#D3D3D34d'))
xs = np.random.normal(3, 0.04, len(all2core_spine_num))
plt.scatter(xs, all2core_spine_num, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(4, 0.04, len(all2other_spine_num))
plt.scatter(xs, all2other_spine_num, edgecolor='silver', facecolor=('#D3D3D34d'))
vp = ax.violinplot([core2all_spine_num,other2all_spine_num,all2core_spine_num,all2other_spine_num], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d','#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Spine number per cell')
plt.yscale('log')
plt.xticks([1,2,3,4], ["core-all", "other-all", "all-core", "all-other"])
fig.savefig(exp_path+'/results/global_all_cores_others_spine_num.svg', transparent=True)
plt.close()
fig.clf()

# normalized spine number by type
x = np.array(["all-core", "all-other", "core-all", "other-all"])
y = np.array([norm_all2core_spine_num, norm_all2other_spine_num, norm_core2all_spine_num, norm_other2all_spine_num])
fig, ax = plt.subplots()
plt.bar(x, y, color=['forestgreen','silver','forestgreen','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Normalized number of spines')
fig.savefig(exp_path+'/results/global_all_cores_others_spine_normnum.svg', transparent=True)
fig.clf()
plt.close()

print()
print("... postsynaptic spines on cores or others from sources in the EM volume (proofread with measured volume)")
# volume description
print("    {:d} all2core spines, volume: {:1.3f}±{:1.2f} µm3".format(len(all2core_spine_vol), np.mean(all2core_spine_vol),np.std(all2core_spine_vol)) )
# print("    "+str(stats.describe(all2core_spine_vol)) )
print("    {:d} core2all spines, volume: {:1.3f}±{:1.2f} µm3".format(len(core2all_spine_vol), np.mean(core2all_spine_vol),np.std(core2all_spine_vol)) )
# print("    "+str(stats.describe(all2core_spine_vol)) )
print("    {:d} all2other spines, volume: {:1.3f}±{:1.2f} µm3".format(len(all2other_spine_vol), np.mean(all2other_spine_vol),np.std(all2other_spine_vol)) )
# print("    "+str(stats.describe(all2other_spine_vol)) )
print("    {:d} other2all spines, volume: {:1.3f}±{:1.2f} µm3".format(len(other2all_spine_vol), np.mean(other2all_spine_vol),np.std(other2all_spine_vol)) )
# print("    "+str(stats.describe(all2other_spine_vol)) )
# volume significativity
kwstat,pval = stats.kruskal(all2core_spine_vol, all2other_spine_vol)
print("    all-core vs all-other spine volume Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(all2core_spine_vol, all2other_spine_vol) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
kwstat,pval = stats.kruskal(core2all_spine_vol, other2all_spine_vol)
print("    core-all vs other-all spine volume Kruskal-Wallis test results:",kwstat,pval)
d,_ = stats.ks_2samp(core2all_spine_vol, other2all_spine_vol) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

# plot
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(core2all_spine_vol))
plt.scatter(xs, core2all_spine_vol, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(2, 0.04, len(other2all_spine_vol))
plt.scatter(xs, other2all_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
xs = np.random.normal(3, 0.04, len(all2core_spine_vol))
plt.scatter(xs, all2core_spine_vol, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(4, 0.04, len(all2other_spine_vol))
plt.scatter(xs, all2other_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
vp = ax.violinplot([core2all_spine_vol,other2all_spine_vol,all2core_spine_vol,all2other_spine_vol], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d','#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Spine Volume (µm^3)')
plt.xticks([1,2,3,4], ["core-all\n(n={:d})".format(len(core2all_spine_vol)), "other-all\n(n={:d})".format(len(other2all_spine_vol)), "all-core\n(n={:d})".format(len(all2core_spine_vol)), "all-other\n(n={:d})".format(len(all2other_spine_vol))])
fig.savefig(exp_path+'/results/global_all_cores_others_spine_vol.svg', transparent=True)
plt.close()
fig.clf()

... postsynaptic spines on cores or others from all sources in the EM volume
    all2core spines number: 2262.490±985.38 
    core2all spines number: 16.560±22.24 
    all2other spines number: 2690.411±1031.29 
    other2all spines number: 13.524±23.78 
    all-core vs all-other spine number Kruskal-Wallis test results: 7.504163352539384 0.006155652888035151
    Kolmogorov-Smirnov Effect Size: 1.000
    core-all vs other-all spine number Kruskal-Wallis test results: 4.734519917290584 0.029563097881958964
    Kolmogorov-Smirnov Effect Size: 0.187

... postsynaptic spines on cores or others from sources in the EM volume (proofread with measured volume)
    242 all2core spines, volume: 0.069±0.07 µm3
    122 core2all spines, volume: 0.078±0.07 µm3
    5863 all2other spines, volume: 0.081±0.08 µm3
    1770 other2all spines, volume: 0.076±0.07 µm3
    all-core vs all-other spine volume Kruskal-Wallis test results: 3.3893914505796774 0.06561716292555728
    Kolmogorov-Smirnov Effect Size: 0.

### Core vs others  
The number of cores and non-cores for each cluster is different. Therefore we have to normalize this count to evaluate.

For each set of reproducible cluster we count:    
- the number of synapses made by a cell type (core or not) towards others, weighted by the squared number of target cells    
    - the expectation is that core-to-core and core-to-other synapses should be numerous in order to pull the dynamics
- the post-synaptic spine volume of synapses made by a cell type (core or not) towards others.   
    - the expectation is that core-to-core and core-to-other spines should be larger in order to pull the dynamics

**Synapses between core neurons of each cluster are less than every other combination.**    
Note that the resulting normalized synapse counts (for the others) check with the network density.

In [17]:
# the density of the directed graph.
network_density = dgraph.density(loops=True)
print("... network density (ratio between the edges present and the maximum number of edges that the graph can contain):", network_density )
# spine number
core2core_spine_num = 0.0 # to be normalized
core2other_spine_num = 0.0
other2core_spine_num = 0.0
other2other_spine_num = 0.0
# spine volume
core2core_spine_vol = [] # µm3
core2other_spine_vol = []
other2core_spine_vol = []
other2other_spine_vol = []

set_ids = set(ophys_cell_ids)
cluster_colors = [color for color in cluster_color_array if color!='gray']
for cluster_k,dyn_core_ids in zip(cluster_colors,clusters_cores):
    if cluster_k=='gray':
        continue
    dyn_other_ids = set_ids.symmetric_difference(dyn_core_ids)
    
    # spine number
    core2core_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    core2core_spine_num += len(core2core_synapse_df['id'].tolist())/(len(dyn_core_ids)*len(dyn_core_ids)) # normalized by source*target

    core2other_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    core2other_spine_num += len(core2other_synapse_df['id'].tolist())/(len(dyn_core_ids)*len(dyn_other_ids)) 

    other2core_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    other2core_spine_num += len(other2core_synapse_df['id'].tolist())/(len(dyn_core_ids)*len(dyn_other_ids)) 
    
    other2other_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    other2other_spine_num += len(other2other_synapse_df['id'].tolist())/(len(dyn_other_ids)*len(dyn_other_ids)) 

    # spine volume
    # id, pre_root_id, post_root_id, cleft_vx, spine_vol_um3
    core2core_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    if not core2core_synapse_df.empty:
        core2core_spine_vol.extend( core2core_synapse_df['spine_vol_um3'].tolist() )
    
    core2other_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    if not core2other_synapse_df.empty:
        core2other_spine_vol.extend( core2other_synapse_df['spine_vol_um3'].tolist() )
    
    other2core_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    if not other2core_synapse_df.empty:
        other2core_spine_vol.extend( other2core_synapse_df['spine_vol_um3'].tolist() )
 
    other2other_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    if not other2other_synapse_df.empty:
        other2other_spine_vol.extend( other2other_synapse_df['spine_vol_um3'].tolist() )

# description
# number
print("... Normalized number of spines")
print("    {:f} core2core normalized spines number".format((core2core_spine_num)) )
print("    {:f} core2other normalized spines number".format((core2other_spine_num)) )
print("    {:f} other2core normalized spines number".format((other2core_spine_num)) )
print("    {:f} other2other normalized spines number".format((other2other_spine_num)) )

# spines
print("... Spine volumes")
print("    {:d} core2core spines, volume: {:1.3f}±{:1.2f} µm3".format(len(core2core_spine_vol), np.mean(core2core_spine_vol),np.std(core2core_spine_vol)) )
# print("    "+str(stats.describe(core2core_spine_vol)) )
print("    {:d} core2other spines, volume: {:1.3f}±{:1.2f} µm3".format(len(core2other_spine_vol), np.mean(core2other_spine_vol),np.std(core2other_spine_vol)) )
# print("    "+str(stats.describe(core2other_spine_vol)) )
print("    {:d} other2core spines, volume: {:1.3f}±{:1.2f} µm3".format(len(other2core_spine_vol), np.mean(other2core_spine_vol),np.std(other2core_spine_vol)) )
# print("    "+str(stats.describe(other2core_spine_vol)) )
print("    {:d} other2other spines, volume: {:1.3f}±{:1.2f} µm3".format(len(other2other_spine_vol), np.mean(other2other_spine_vol),np.std(other2other_spine_vol)) )
# print("    "+str(stats.describe(other2other_spine_vol)) )

# significativity
# this is just to test the significativity if the number of samples was correct
kwstat,pval = stats.kruskal(other2core_spine_vol, other2other_spine_vol)
print("   core vs other spine size Kruskal-Wallis test results:",kwstat,pval)

# plotting
# all spine number by type
x = np.array(["core-core", "core-other", "other-core", "other-other"])
y = np.array([core2core_spine_num, core2other_spine_num, other2core_spine_num, other2other_spine_num])
fig, ax = plt.subplots()
plt.bar(x, y, color=['forestgreen','forestgreen','silver','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Normalized number of spines')
fig.savefig(exp_path+'/results/global_cores_others_spine_num.svg', transparent=True)
fig.clf()
plt.close()

# normalized spine number by type
x = np.array(["all\ncore", "all\nother", "core\nall", "other\nall", "core\ncore", "core\nother", "other\ncore", "other\nother"])
y = np.array([norm_all2core_spine_num, norm_all2other_spine_num, norm_core2all_spine_num, norm_other2all_spine_num, core2core_spine_num, core2other_spine_num, other2core_spine_num, other2other_spine_num])
fig, ax = plt.subplots()
plt.bar(x, y, width=0.5, color=['forestgreen','silver','forestgreen','silver','forestgreen','forestgreen','silver','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Connection probability (relative spine freq.)')
fig.savefig(exp_path+'/results/global_all_cores_others_spine_normnum.svg', transparent=True)
fig.clf()
plt.close()


# all spine volumes by type
fig, ax = plt.subplots()
# xs = np.random.normal(0, 0.04, len(core2core_spine_vol))
# plt.scatter(xs, core2core_spine_vol, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(1, 0.04, len(core2other_spine_vol))
plt.scatter(xs, core2other_spine_vol, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(2, 0.04, len(other2core_spine_vol))
plt.scatter(xs, other2core_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
xs = np.random.normal(3, 0.04, len(other2other_spine_vol))
plt.scatter(xs, other2other_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
# vp = ax.violinplot([core2core_spine_vol,core2other_spine_vol,other2core_spine_vol,other2other_spine_vol], [0,1,2,3], widths=0.3, showextrema=False, showmedians=True)
vp = ax.violinplot([core2other_spine_vol,other2core_spine_vol,other2other_spine_vol], [1,2,3], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc in vp['bodies'][0:1]:
    pc.set_facecolor('#228B224d')
for pc in vp['bodies'][1:]:
    pc.set_facecolor('#D3D3D34d')
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Spine Volume (µm^3)')
plt.xticks([0, 1, 2, 3], [ "core-core\n(n={:d})".format(len(core2core_spine_vol)),"core-other\n(n={:d})".format(len(core2other_spine_vol)),"other-core\n(n={:d})".format(len(other2core_spine_vol)),"other-other\n(n={:d})".format(len(other2other_spine_vol))])
fig.savefig(exp_path+'/results/global_cores_others_spine_vol.svg', transparent=True)
fig.clf()
plt.close()

... network density (ratio between the edges present and the maximum number of edges that the graph can contain): 0.017578615224640538
... Normalized number of spines
    0.000000 core2core normalized spines number
    0.095146 core2other normalized spines number
    0.052929 other2core normalized spines number
    0.042870 other2other normalized spines number
... Spine volumes
    0 core2core spines, volume: nan±nan µm3
    50 core2other spines, volume: 0.089±0.07 µm3
    26 other2core spines, volume: 0.092±0.06 µm3
    494 other2other spines, volume: 0.074±0.06 µm3
   core vs other spine size Kruskal-Wallis test results: 5.629426395826594 0.017661409795233805


  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
  ret = ret.dtype.type(ret / rcount)


### Non-Ca-imaged and outside EM volume inputs (panel 2AB)

Core responses could be due to non-imaged and outside volume sources. How can we rule this out (or reduce our lack of knowledge)?   
We can ask *Are there more or stronger spines made by non-imaged neurons (either local or far) on cores or others?*   
We have this information since we know the cell ID of all somas in the volume. We can take the spines having presynaptic ID different from the known Ca-imaged IDs or different from the somas within the EM volume.

### Are core more mutually connected than others?

We started by asking whether a global measure such as assortativity - - gives a clear summary of mutuality between all cores.

In [14]:
print('... assortativity')
print('    preparing vertex labels for cores and others')
dgraph.vs["ophys_cell_id"] = ophys_cell_ids
is_id_core = np.array( [0] * len(ophys_cell_ids) )
is_id_core[core_indexes] = 1
dgraph.vs["is_core"] = is_id_core.tolist()
pyc_ca_syn_df = syn_df.query(f'(pre_root_id in {ophys_cell_ids}) and (post_root_id in {ophys_cell_ids})')
is_syn_core = np.array( [0] * len(pyc_ca_syn_df) )
for cid in [item for sublist in clusters_cores for item in sublist]:
    is_syn_core[pyc_ca_syn_df['pre_root_id'] == cid] = 1
dgraph.es["is_core"] = is_syn_core.tolist()

# is a preference for a network's nodes to attach to others that are similar in some way
print("    overall:", dgraph.assortativity_nominal("is_core", directed=True) )
# cores degree distro vs others degree distro
# biological networks typically show negative assortativity, or disassortative mixing, or disassortativity, as high degree nodes tend to attach to low degree nodes.
print("    assortativity degree:", dgraph.assortativity_degree(directed=True) )


... assortativity
    preparing vertex labels for cores and others
    overall: 0.024409568514006098
    assortativity degree: 0.007494609849787891


### 3-Motif connectivity of cores and others (panel 2D)

This measure of the network reports the participation of cores (or non-cores) in triplet motifs.    
Note that the triplets are not exclusively made of cores (or non-cores).

In [15]:
# For each set of reproducible cluster cores we count their connectivity motifs.
set_indexes = set(ophys_cell_indexes)
for dyn_core_ids in clusters_cores:
    dyn_core_indexes = set([ophys_cell_ids.index(strid) for strid in dyn_core_ids])
    dyn_other_indexes = set_indexes.symmetric_difference(dyn_core_indexes)
    for mclass, mlist in motif_vertices.items():
        for mtriplet in mlist:
            intersection_cores = len(list(dyn_core_indexes.intersection(mtriplet)))
            intersection_others = len(list(dyn_other_indexes.intersection(mtriplet)))
            global_structural_motif_cores[mclass] += intersection_cores
            global_structural_motif_others[mclass] += intersection_others

fig = plt.figure()
plt.bar(global_structural_motif_cores.keys(), global_structural_motif_cores.values(), color='forestgreen')
plt.ylabel('cores occurrences')
plt.yscale('log')
plt.ylim([0.7,plt.ylim()[1]])
plt.xlabel('motifs types')
fig.savefig(exp_path+'/results/global_motifs_cores.svg', transparent=True)
plt.close()
fig.clear()
fig.clf()
fig = plt.figure()
plt.bar(global_structural_motif_others.keys(), global_structural_motif_others.values(), color='silver')
plt.ylabel('non-cores occurrences')
plt.yscale('log')
plt.ylim([0.7,plt.ylim()[1]])
plt.xlabel('motifs types')
fig.savefig(exp_path+'/results/global_motifs_others.svg', transparent=True)
plt.close()
fig.clear()
fig.clf()
print("... saved mutual connectivity of cores and others")

... saved mutual connectivity of cores and others


In [16]:
# dgraph is already defined from the structural_analysis included file
print("    graph diameter (#vertices):", dgraph.diameter(directed=True, unconn=True, weights=None))
print("    graph average path length (#vertices):", dgraph.average_path_length(directed=True, unconn=True))

    graph diameter (#vertices): 7
    graph average path length (#vertices): 2.9393095768374167


## Centrality of cores

If cores are not more mutually connected compared to others, then what is their characterizing feature?    
In the cells above, we saw indications of more interconnections between cores and others than within the same type.     
This could hint at some form of centrality.

### Degree centrality of cores is not different from others (panel 2E)

In [17]:
print('... degree centrality')
degree_centrality_cores = dgraph.degree(core_indexes, mode='out', loops=True)
degree_centrality_others = dgraph.degree(other_indexes, mode='out', loops=True)
# description
print("    cores: "+str(stats.describe(degree_centrality_cores)) )
print("    others: "+str(stats.describe(degree_centrality_others)) )
# significativity
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(degree_centrality_cores, degree_centrality_others, equal_var=False))
d,_ = stats.ks_2samp(degree_centrality_cores, degree_centrality_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(degree_centrality_cores))
plt.scatter(xs, degree_centrality_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(degree_centrality_others))
plt.scatter(xs, degree_centrality_others, alpha=0.3, c='silver')
vp = ax.violinplot([degree_centrality_cores,degree_centrality_others], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Degree')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(degree_centrality_cores)), "other\n(n={:d})".format(len(degree_centrality_others))])
fig.savefig(exp_path+'/results/global_cores_others_degree.svg', transparent=True)
plt.close()
fig.clf()

... degree centrality
    cores: DescribeResult(nobs=35, minmax=(0, 79), mean=11.542857142857143, variance=319.66722689075624, skewness=1.9980140270397448, kurtosis=4.196511544747042)
    others: DescribeResult(nobs=77, minmax=(0, 39), mean=5.753246753246753, variance=92.13568010936429, skewness=1.6772884894874924, kurtosis=2.118802706844791)
    Welch t test:  1.801 p= 0.079
    Kolmogorov-Smirnov Effect Size: 0.169


### Betweenness cenrality of cores is not different from others (panel 2F)

In [18]:
print('... betweenness')
cores_betweenness = np.array(dgraph.betweenness(vertices=core_indexes, directed=True))
others_betweenness = np.array(dgraph.betweenness(vertices=other_indexes, directed=True))
print("    cores: "+str(stats.describe(cores_betweenness)) )
print("    others: "+str(stats.describe(others_betweenness)) )
# significativity
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(cores_betweenness, others_betweenness, equal_var=False))
d,_ = stats.ks_2samp(cores_betweenness, others_betweenness) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(cores_betweenness))
plt.scatter(xs, cores_betweenness, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(others_betweenness))
plt.scatter(xs, others_betweenness, alpha=0.3, c='silver')
vp = ax.violinplot([cores_betweenness,others_betweenness], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
# plt.yscale('log')
# plt.ylim([0.00001,plt.ylim()[1]])
plt.ylabel('Betweenness')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(cores_betweenness)), "other\n(n={:d})".format(len(others_betweenness))])
fig.savefig(exp_path+'/results/global_cores_others_betweenness.svg', transparent=True)
plt.close()
fig.clf()

... betweenness
    cores: DescribeResult(nobs=35, minmax=(0.0, 2611.278988942133), mean=372.4290036027808, variance=404149.9675061276, skewness=1.9705740574279296, kurtosis=3.2944463171590233)
    others: DescribeResult(nobs=77, minmax=(0.0, 1683.5995508593448), mean=225.43407589305858, variance=172752.41816079392, skewness=1.9865603004296337, kurtosis=3.191352629595454)
    Welch t test:  1.252 p= 0.217
    Kolmogorov-Smirnov Effect Size: 0.151


### Hub scores of cores is not different from others (panel 2G)

In [19]:
print("... authority score")
# what is the overlap of cores and hubs?
# Hub
authority_scores = np.array(dgraph.authority_score(weights=None, scale=True, return_eigenvalue=False))
authority_scores_cores = authority_scores[core_indexes]
authority_scores_others = authority_scores[other_indexes]
print("    authority cores: "+str(stats.describe(authority_scores_cores)) )
print("    authority others: "+str(stats.describe(authority_scores_others)) )
# significativity
print("    Kruskal-Wallis test:  %.3f p= %.3f" % stats.ttest_ind(authority_scores_cores, authority_scores_others))
d,_ = stats.ks_2samp(authority_scores_cores, authority_scores_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
# all eccentricity by type
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(authority_scores_cores))
plt.scatter(xs, authority_scores_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(authority_scores_others))
plt.scatter(xs, authority_scores_others, alpha=0.3, c='silver')
vp = ax.violinplot([authority_scores_cores,authority_scores_others], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Hub score')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(authority_scores_cores)), "other\n(n={:d})".format(len(authority_scores_others))])
fig.savefig(exp_path+'/results/global_cores_others_authority_score.svg', transparent=True)
plt.close()
fig.clf()

print("... hub score")
# what is the overlap of cores and hubs?
# Hub
hub_scores = np.array(dgraph.hub_score(weights=None, scale=True, return_eigenvalue=False))
hub_scores_cores = hub_scores[core_indexes]
hub_scores_others = hub_scores[other_indexes]
print("    hub cores: "+str(stats.describe(hub_scores_cores)) )
print("    hub others: "+str(stats.describe(hub_scores_others)) )
# significativity
print("    Kruskal-Wallis test:  %.3f p= %.3f" % stats.ttest_ind(hub_scores_cores, hub_scores_others))
d,_ = stats.ks_2samp(hub_scores_cores, hub_scores_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
# all eccentricity by type
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(hub_scores_cores))
plt.scatter(xs, hub_scores_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(hub_scores_others))
plt.scatter(xs, hub_scores_others, alpha=0.3, c='silver')
vp = ax.violinplot([hub_scores_cores,hub_scores_others], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Hub score')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(hub_scores_cores)), "other\n(n={:d})".format(len(hub_scores_others))])
fig.savefig(exp_path+'/results/global_cores_others_hub_score.svg', transparent=True)
plt.close()
fig.clf()

... authority score
    authority cores: DescribeResult(nobs=35, minmax=(0.050932284009955996, 0.6393118539241462), mean=0.28633501916044823, variance=0.025168617579472842, skewness=0.4045893820631807, kurtosis=-0.7041437958817056)
    authority others: DescribeResult(nobs=77, minmax=(0.009830045486743951, 1.0), mean=0.26912859944504264, variance=0.03275365462579275, skewness=1.7294508148746914, kurtosis=4.733534247428858)
    Kruskal-Wallis test:  0.484 p= 0.629
    Kolmogorov-Smirnov Effect Size: 0.145
... hub score
    hub cores: DescribeResult(nobs=35, minmax=(9.140308603652596e-18, 1.0), mean=0.12379854977870158, variance=0.04505489094516889, skewness=2.5058978577266298, kurtosis=6.831877154233105)
    hub others: DescribeResult(nobs=77, minmax=(9.140308603652596e-18, 0.4848643968219558), mean=0.05749152510277729, variance=0.01151653843129077, skewness=2.2962041715178856, kurtosis=5.109152512168375)
    Kruskal-Wallis test:  2.199 p= 0.030
    Kolmogorov-Smirnov Effect Size: 0.226

## Cores control the flow of cortical activity

So far, we used structural (graph) measures of neurons selected by looking at their reproducibility (a form of regular activity). In a sense, we were already crossing structural and dynamical information about the network.    

However, we could push this further.    

### Structural underpinnings of clusters (panel 2H)

What is the origin of pattern reproducibility?    
We can look at **how** the underlying connectivity structure supports each events activities.    

We can consider each event as a _network flow problem_, in which the activity can be transferred through cells along the available connections.    

For each event, we compute the max flow between the cells IDs according to their firing sequence.      
**Do core neurons sustain more flow compared to others?**

And we also consider 

In [20]:
# each edge has a capacity and each edge receives a flow. 
# The amount of flow on an edge cannot exceed the capacity of the edge.
# therefore, edges with high capacity will be more important for the flow.
# here we test the hypothesis that edges towards cores have higher capacity
# or that the sum of edges towards cores have a higher total capacity
cell_total_capacity = {cid:list() for cid in ophys_cell_ids}
edges_sourcing = {cid:0 for cid in ophys_cell_ids}
edges_targeting = {cid:0 for cid in ophys_cell_ids}

for cluster_k,events_cellids in sorted_events_cidlist.items():
    if cluster_k == 'gray':
        continue

    for vnt in events_cellids:
        for posi,vidj in enumerate(vnt[1:]):
            vidi = vnt[posi] # enumerate will go from 0
            # print(vidi, vidj)

            # check beginning and end are not the same
            if dgraph.vs.find(ophys_cell_id=vidi).index == dgraph.vs.find(ophys_cell_id=vidj).index:
                continue
            # # check there is a path between the two
            # if len(spinesgraph.get_all_shortest_paths(spinesgraph.vs.find(name=vidi).index, to=spinesgraph.vs.find(name=vidj).index, weights=None, mode='out'))>0:
            #     continue

            # Take the maximum flow between the previous and next vertices
            mfres = dgraph.maxflow(dgraph.vs.find(ophys_cell_id=vidi).index, dgraph.vs.find(ophys_cell_id=vidj).index)
            # print(mfres)
            # returns a tuple containing the following:
            # graph - the graph on which this flow is defined
            # value - the value (capacity) of the maximum flow between the given vertices
            # flow - the flow values on each edge. For directed graphs, this is simply a list where element i corresponds to the flow on edge i.
            # cut - edge IDs in the minimal cut corresponding to the flow.
            # partition - vertex IDs in the parts created after removing edges in the cut
            # es - an edge selector restricted to the edges in the cut.

            # we get a flow value for each edge contributing to the flow.
            # source
            mfres_value = mfres.value
            if vidi in np.array(ophys_cell_ids)[core_indexes]:
                mfres_value /= len(core_indexes)
            else:
                mfres_value /= len(other_indexes)
            cell_total_capacity[vidi].append(mfres_value)
            # target
            mfres_value = mfres.value
            if vidj in np.array(ophys_cell_ids)[core_indexes]:
                mfres_value /= len(core_indexes)
            else:
                mfres_value /= len(other_indexes)
            cell_total_capacity[vidj].append(mfres_value)
            
            # Iterate over the edges identified by the flow.
            # count the edges sourcing from cores, and those targeting cores. Which is more?
            for edge in mfres.es:
                sourceid = int(dgraph.vs[edge.source]['ophys_cell_id'])
                targetid = int(dgraph.vs[edge.target]['ophys_cell_id'])
                if sourceid in cell_total_capacity.keys():
                    edges_sourcing[sourceid] +=1 # just count
                if targetid in cell_total_capacity.keys():
                    edges_targeting[targetid] +=1 # just count

# Flow
# print(cell_total_capacity)
flowvalue_cores = []
for cid in np.array(ophys_cell_ids)[core_indexes]:
    flowvalue_cores.extend(cell_total_capacity[cid])
flowvalue_others = []
for cid in np.array(ophys_cell_ids)[other_indexes]:
    flowvalue_others.extend(cell_total_capacity[cid])

# description
print("    Flow cores: "+str(stats.describe(flowvalue_cores)) )
print("    Flow others: "+str(stats.describe(flowvalue_others)) )
# significativity
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(flowvalue_cores, flowvalue_others, equal_var=False))
d,_ = stats.ks_2samp(flowvalue_cores, flowvalue_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(flowvalue_cores))
plt.scatter(xs, flowvalue_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(flowvalue_others))
plt.scatter(xs, flowvalue_others, alpha=0.3, c='silver')
vp = ax.violinplot([flowvalue_cores,flowvalue_others], widths=0.15, showextrema=False, showmeans=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmeans'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Normalized flow value')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(flowvalue_cores)), "other\n(n={:d})".format(len(flowvalue_others))])
fig.savefig(exp_path+'/results/global_cores_others_flowvalue.svg', transparent=True)
plt.close()
fig.clf()

print()
# Cuts
# print(edges_sourcing)
# print(edges_targeting)
flowcuts_core_sources = []
flowcuts_core_targets = []
for cid in np.array(ophys_cell_ids)[core_indexes]:
    flowcuts_core_sources.append(edges_sourcing[cid]/len(core_indexes))
    flowcuts_core_targets.append(edges_targeting[cid]/len(core_indexes))
flowcuts_other_sources = []
flowcuts_other_targets = []
for cid in np.array(ophys_cell_ids)[other_indexes]:
    flowcuts_other_sources.append(edges_sourcing[cid]/len(other_indexes))
    flowcuts_other_targets.append(edges_targeting[cid]/len(other_indexes))

# description
print("    Cut edges sourcing from cores: "+str(stats.describe(flowcuts_core_sources)) )
print("    Cut edges targeting cores: "+str(stats.describe(flowcuts_core_targets)) )
print("    Cut edges sourcing from others: "+str(stats.describe(flowcuts_other_sources)) )
print("    Cut edges targeting others: "+str(stats.describe(flowcuts_other_targets)) )
# significativity
print("    Core targets vs sources Welch t test:  %.3f p= %.3f" % stats.ttest_ind(flowcuts_core_targets, flowcuts_core_sources, equal_var=False))
d,_ = stats.ks_2samp(flowcuts_core_targets, flowcuts_core_sources) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

print("    Core targets vs Other targets Welch t test:  %.3f p= %.3f" % stats.ttest_ind(flowcuts_core_targets, flowcuts_other_targets, equal_var=False))
d,_ = stats.ks_2samp(flowcuts_core_targets, flowcuts_other_targets) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(flowcuts_core_sources))
plt.scatter(xs, flowcuts_core_sources, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(flowcuts_core_targets))
plt.scatter(xs, flowcuts_core_targets, alpha=0.3, c='forestgreen')
xs = np.random.normal(3, 0.04, len(flowcuts_other_sources))
plt.scatter(xs, flowcuts_other_sources, alpha=0.3, c='silver')
xs = np.random.normal(4, 0.04, len(flowcuts_other_targets))
plt.scatter(xs, flowcuts_other_targets, alpha=0.3, c='silver')
vp = ax.violinplot([flowcuts_core_sources,flowcuts_core_targets,flowcuts_other_sources,flowcuts_other_targets], widths=0.15, showextrema=False, showmeans=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc in vp['bodies'][0:2]:
    pc.set_facecolor('#228B224d')
for pc in vp['bodies'][2:]:
    pc.set_facecolor('#D3D3D34d')
vp['cmeans'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Normalized edges in the cut')
plt.xticks([1, 2, 3, 4], ["core as\nsource", "core as\ntarget", "other as\nsource", "other as\ntarget"])
fig.savefig(exp_path+'/results/global_cores_others_cutvalue.svg', transparent=True)
plt.close()
fig.clf()



    Flow cores: DescribeResult(nobs=491, minmax=(0.0, 0.6285714285714286), mean=0.054524294442828046, variance=0.006535403188250653, skewness=1.9746861750038214, kurtosis=6.168518072312809)
    Flow others: DescribeResult(nobs=179, minmax=(0.0, 0.2857142857142857), mean=0.019371689762751217, variance=0.0014466480981574071, skewness=3.0221010239560293, kurtosis=13.700985247727274)
    Welch t test:  7.600 p= 0.000
    Kolmogorov-Smirnov Effect Size: 0.283

    Cut edges sourcing from cores: DescribeResult(nobs=35, minmax=(0.0, 1.6), mean=0.29469387755102044, variance=0.1933623735208369, skewness=1.6545061248825716, kurtosis=1.6924146771017323)
    Cut edges targeting cores: DescribeResult(nobs=35, minmax=(0.0, 1.1714285714285715), mean=0.236734693877551, variance=0.09174412622191734, skewness=1.7458774827764796, kurtosis=2.3082162325499427)
    Cut edges sourcing from others: DescribeResult(nobs=77, minmax=(0.0, 0.5064935064935064), mean=0.040647664024287405, variance=0.0065350092291708

### Cores are targets of multiple paths

If cores are more often than others part of the paths, it means that they might not be central by virtue of their degree, but by how many event trajectory path (not just any path as in the betweenness, or hubness) pass through them.     
More in detail, cores are important because they are more often the target of cut flow edges. The **pagerank** - where a node rank is proportional to the total rank of the other nodes pointing to it - is a way to measure it.

In [21]:
print('... PageRank centrality')
pagerank_cores = np.array(dgraph.personalized_pagerank(vertices=core_indexes, directed=True, damping=0.85, reset="is_core"))
pagerank_others = np.array(dgraph.personalized_pagerank(vertices=other_indexes, directed=True, damping=0.85, reset="is_core"))
# description
print("    cores: "+str(stats.describe(pagerank_cores)) )
print("    others: "+str(stats.describe(pagerank_others)) )
# significativity
print("    Kruskal-Wallis test:  %.3f p= %.3f" % stats.kruskal(pagerank_cores, pagerank_others))
d,_ = stats.ks_2samp(pagerank_cores, pagerank_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(pagerank_cores))
plt.scatter(xs, pagerank_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(pagerank_others))
plt.scatter(xs, pagerank_others, alpha=0.3, c='silver')
vp = ax.violinplot([pagerank_cores,pagerank_others], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('PageRank')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(pagerank_cores)), "other\n(n={:d})".format(len(pagerank_others))])
fig.savefig(exp_path+'/results/global_cores_others_pagerank.svg', transparent=True)
plt.close()
fig.clf()

... PageRank centrality
    cores: DescribeResult(nobs=35, minmax=(0.006725200919659086, 0.013983399707699865), mean=0.008086067919655509, variance=2.2740236820600384e-06, skewness=2.139231064286641, kurtosis=5.142654286263989)
    others: DescribeResult(nobs=77, minmax=(7.176666085246508e-06, 0.008494477755171185), mean=0.0011266826080292174, variance=1.4417684587524277e-06, skewness=3.5408949693202096, kurtosis=17.3788598833623)
    Kruskal-Wallis test:  68.815 p= 0.000
    Kolmogorov-Smirnov Effect Size: 0.987


## Relationship between structural and dynamical cores

### Hierarchical modularity
The log-log linear relationship between Clustering coefficient and degree is plotted in the file `hierarchical modularity.png` (and .svg) produced by the `structural_analysis.py` and already stored in the `results` folder.

### Bow-tie structure of modules
Local bow-ties analysis as in Fujita et al. 2019.
Identify communities based on (multiple trials) random walks as information flows.    
With very sparsely connected networks, as MICrONS, the library igraph finds only one module (see [here](https://stackoverflow.com/questions/20364939/community-detection-with-infomap-algorithm-producing-one-massive-module)). Using the same algorithm (with teleportation) from the InfoMap authors (see [here](https://mapequation.github.io/infomap/python/)).


In [23]:
from infomap import Infomap # with teleportation to ensure no local solution
im = Infomap(no_self_links=True, flow_model="directed", seed=2**32-2, prefer_modular_solution=True)
im.add_networkx_graph( dgraph.to_networkx() ) # infomap accepts only networkx format
print("    starting infomap analysis")
im.run()
print(f"    found {im.num_top_modules} modules with codelength: {im.codelength:.4f}  entropy: {im.entropy_rate:.4f}")
previous_id = 1
communities_tot = []
communities_lens = []
community = []
structural_cores = []
for node_id, module_id in sorted(im.modules, key=lambda x: x[1]):
    if module_id>previous_id: # simple module handling
        community_graph = dgraph.subgraph(community) # community contains the indexes in dgraph
        imcommunity = Infomap(no_self_links=True, flow_model="directed", seed=2**32-2, prefer_modular_solution=True, silent=True)
        imcommunity.add_networkx_graph( community_graph.to_networkx() )
        imcommunity.run()
        # unspecific submodule specification
        if imcommunity.num_non_trivial_top_modules > 2:
            communities_lens.append(len(community))
        communities_tot.append(len(community))
        # get central community cores
        community_cores = []
        for imnode, immodules in imcommunity.get_multilevel_modules().items():
            if immodules[0]==1: # only the center
                community_cores.append(imnode)
        structural_cores.append(community_cores)
        # simple module handling
        previous_id=module_id
        community = []
    # print(node_id, module_id)
    community.append(node_id)
print("    bow-tie score:", len(communities_lens)/len(communities_tot))
print("    communities lens:",stats.describe(communities_lens))


    starting infomap analysis
    found 44 modules with codelength: 7.8376  entropy: 1.4132
. Found 2 levels with codelength 7.837594297

=> Trial 1/1 finished in 0.014791879s with codelength 7.8375943


Summary after 1 trial
Best end modular solution in 2 levels:
Per level number of modules:         [         44,           0] (sum: 44)
Per level number of leaf nodes:      [          0,         334] (sum: 334)
Per level average child degree:      [         44,     7.59091] (average: 11.829)
Per level codelength for modules:    [3.506240903, 0.000000000] (sum: 3.506240903)
Per level codelength for leaf nodes: [0.000000000, 4.331353394] (sum: 4.331353394)
Per level codelength total:          [3.506240903, 4.331353394] (sum: 7.837594297)

  Infomap ends at 2022-12-18 09:40:33
  (Elapsed time: 0.02138768s)
  Infomap v2.6.1 starts at 2022-12-18 09:43:26
  -> Input network: 
  -> No file output!
  -> Configuration: no-self-links
                    flow-model = directed
                    s

### (OPTIONAL) Random rewiring to test bow-tie score

The EM data gives a certain bow-tie score.    
To see whether the score can be improved or worsen by different connectivities, we can rewire at random (easy), and study their statistics.   
**Uncomment the lines below to run the (rather long) rewiring tests.**

In [24]:
# rewired_bowtie_score = {}
# for rewireprob in np.linspace(0.01, 0.05, num=10):
#     print("    \nrewiring probability:",rewireprob)
#     rewired_bowtie_score[rewireprob] = []
#
#     for trial in range(0,10):
#         rewired_graph = dgraph.copy()
#
#         rewired_graph.rewire_edges(prob=rewireprob, loops=False, multiple=True) # in place!
#
#         # Clustering Coefficient of only excitatory cells
#         local_clustering_coefficients = np.array(rewired_graph.transitivity_local_undirected(vertices=None, mode="zero"))
#         # plot
#         paramsfit = [2, 1.05] # 334 EM-only all proofread
#         pfit = powerlaw(degrees, *paramsfit)
#         fig = plt.figure()
#         summer = mpcm.summer
#         plt.scatter( degrees,local_clustering_coefficients, marker='o', facecolor='#111111', s=50, edgecolors='none', alpha=0.5) #
#         plt.plot(degrees,pfit,c='k')
#         plt.yscale('log')
#         plt.xscale('log')
#         ax = plt.gca()
#         ax.spines['top'].set_visible(False)
#         ax.spines['right'].set_visible(False)
#         plt.ylabel('LCC')
#         plt.xlabel('degree')
#         plt.tick_params(axis='both', bottom='on', top='on', left='off', right='off')
#         plt.tight_layout()
#         #fig.savefig(exp_path+'/results/rewiring/hierarchical_modularity'+str(rewireprob)+str(trial)+'.png', transparent=True, dpi=900)
#         fig.savefig(exp_path+'/results/rewiring/hierarchical_modularity'+str(rewireprob)+"_"+str(trial)+'.svg', transparent=True)
#         plt.close()
#         fig.clf()
#
#         # Local bow-ties analysis as in FujitaKichikawaFujiwaraSoumaIyetomi2019
#         from infomap import Infomap
#         im = Infomap(silent=True, no_self_links=True, flow_model="directed", seed=2**32-1, core_loop_limit=10, prefer_modular_solution=True, inner_parallelization=True, num_trials=10)
#         im.add_networkx_graph( rewired_graph.to_networkx() ) # infomap accepts only networkx format
#         im.run()
#         previous_id = 1
#         communities_tot = []
#         communities_lens = []
#         community = []
#         for node_id, module_id in sorted(im.modules, key=lambda x: x[1]):
#             if module_id>previous_id: # simple module handling
#                 community_graph = rewired_graph.subgraph(community) # community contains the indexes in rewired_graph
#                 imcommunity = Infomap(no_self_links=True, flow_model="directed", seed=2**32-1, core_loop_limit=10, prefer_modular_solution=True, silent=True, num_trials=10)
#                 imcommunity.add_networkx_graph( community_graph.to_networkx() )
#                 imcommunity.run()
#                 if imcommunity.num_non_trivial_top_modules > 2:
#                     communities_lens.append(len(community))
#                 communities_tot.append(len(community))
#                 previous_id=module_id
#                 community = []
#             community.append(node_id)
#         if len(communities_tot)<5:
#             continue
#         bowtie_score = len(communities_lens)/len(communities_tot)
#         print("    trial:", trial, "score:",bowtie_score)
#         rewired_bowtie_score[rewireprob].append( bowtie_score )
#
# for rwiredk, rewiredv in rewired_bowtie_score.items():
#     print("dgraph rewired with prob:",rwiredk)
#     print("    bow-tie score avg:", stats.describe(rewiredv))


### Overlap between structural and dynamical clusters

Check the consistency of our hypothesis chain by looking at the overlap between dynamically-identified core neurons – those reliably participating in multiple clustered population events – and structurally-identified core neurons – those repetitively found in bow-tie modules.    
(it requires the file with dynamical cores)

In [31]:
if os.path.exists('./results/clusters_cores.npy'):
    clusters_cores = np.load('./results/clusters_cores.npy', allow_pickle=True)
    print("... loaded:", clusters_cores.shape)
    # print(clusters_cores)
    # print(clusters_cores.shape)

    core_indexes = []
    other_indexes = []
    clusters_cores_indexes = []
    for dyn_core in clusters_cores:
        core_indexes.extend( [ophys_cell_ids.index(strid) for strid in dyn_core] )
        clusters_cores_indexes.append( [ophys_cell_ids.index(strid) for strid in dyn_core] )
    core_indexes = np.unique(core_indexes)
    print("    # cores:",len(core_indexes))
    other_indexes = [i for i in range(len(ophys_cell_ids)) if i not in core_indexes]
    print("    # non-cores:",len(other_indexes))

    print("... overlap between structural cores and dynamical cores")
    # for each set of dynamical cores from a cluster
    # we compare it with all the sets of structral core nodes identified by the InfoMap (and take the max to avoid duplicates)
    overlapSD = {}
    overlap_ratio = []
    ccs_len = []
    scs_len = []
    for iccs,ccs in enumerate(clusters_cores_indexes):
        ccs_len.append(len(ccs))
        kccs = "{}_{}_".format(iccs,len(ccs))
        overlapSD[kccs+"lens"] = []
        overlapSD[kccs+"ratio"] = []
        for scs in structural_cores:
            scs_len.append(len(scs))
            overlapSD[kccs+"lens"].append( "{}/{}".format( len(set(ccs)&set(scs)), len(ccs) ) )
            overlapSD[kccs+"ratio"].append( len(set(ccs)&set(scs))/len(ccs) )
        # print(len(scs),len(ccs))
        # choose which structural_cores better matches
        index_max = max(range(len(overlapSD[kccs+"ratio"])), key=overlapSD[kccs+"ratio"].__getitem__)
        print(overlapSD[kccs+"ratio"][index_max], overlapSD[kccs+"lens"][index_max])
        overlap_ratio.append(overlapSD[kccs+"ratio"][index_max])
    print(stats.describe(overlap_ratio))

    # plot all ratio
    fig, ax = plt.subplots()
    xs = np.random.normal(1, 0.04, len(overlap_ratio))
    plt.scatter(xs, overlap_ratio, alpha=0.3, c='gray', edgecolors='none')
    vp = ax.violinplot([overlap_ratio], widths=0.15, showextrema=False, showmeans=True)
    for pc in vp['bodies']:
        pc.set_edgecolor('black')
    for pc,cb in zip(vp['bodies'],['black']):
        pc.set_facecolor(cb)
    vp['cmeans'].set_color('orange')
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.ylim([0,0.7])
    plt.ylabel('Overlap ratio')
    fig.set_figwidth(1.5)
    fig.tight_layout()
    fig.savefig('./results/dynamical_structural_cores.svg', transparent=True)
    plt.close()
    fig.clf()

... loading clusters
... loaded: (11,)
    # cores: 35
    # non-cores: 77
... overlap between structural cores and dynamical cores
0.0 0/2
0.375 3/8
0.14285714285714285 1/7
0.25 1/4
0.6666666666666666 2/3
0.25 1/4
0.2 1/5
0.2 1/5
0.0 0/2
0.0 0/7
0.25 1/4
DescribeResult(nobs=11, minmax=(0.0, 0.6666666666666666), mean=0.21222943722943724, variance=0.03775229334157905, skewness=0.9909598007383054, kurtosis=0.8177289227244207)


---
## Supplementary figure 3
   
To have keep cores within the attractor framework, cores activity could be sustained by indirect synaptic feedback, through highly connected secondary paths.   
To back up the attractor idea, one would expect that core neurons would have shorter paths or cycles, compared to others. 

### Shortest paths of cores and others (panel S3A)

In [39]:
print("... number of paths in a complete graph of the same size:", (np.math.factorial(112-2)*np.e))
print('... number of shortest paths between cores')
core_shortestpaths = []
for coreidx in core_indexes:
    othercores = list(core_indexes)
    othercores.remove(coreidx)
    shrtpth = dgraph.get_shortest_paths(coreidx, to=othercores, weights=None, mode='out', output='vpath')
    for strp in shrtpth:
        core_shortestpaths.append(len(strp))
other_shortestpaths = []
for otheridx in other_indexes:
    otherothers = list(other_indexes)
    otherothers.remove(otheridx)
    shrtpth = dgraph.get_shortest_paths(otheridx, to=otherothers, weights=None, mode='out', output='vpath')
    for strp in shrtpth:
        other_shortestpaths.append(len(strp))
print("    cores shortest paths: "+str(stats.describe(core_shortestpaths)) )
print("    others shortest paths: "+str(stats.describe(other_shortestpaths)) )
print("    equal variances? "+str(stats.levene(core_shortestpaths, other_shortestpaths)) )
# significativity
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(core_shortestpaths, other_shortestpaths, equal_var=False))
d,_ = stats.ks_2samp(core_shortestpaths, other_shortestpaths) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(core_shortestpaths))
plt.scatter(xs, core_shortestpaths, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(other_shortestpaths))
plt.scatter(xs, other_shortestpaths, alpha=0.3, c='silver')
vp = ax.violinplot([core_shortestpaths,other_shortestpaths], widths=0.15, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Shortest path length')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(core_shortestpaths)), "other\n(n={:d})".format(len(other_shortestpaths))])
fig.savefig(exp_path+'/results/global_cores_others_shortestpath.svg', transparent=True)
plt.close()
fig.clf()

... number of paths in a complete graph of the same size: 4.317298994652368e+178
... number of shortest paths between cores
    cores shortest paths: DescribeResult(nobs=110, minmax=(0, 5), mean=0.9727272727272728, variance=2.485487906588825, skewness=1.131712439974105, kurtosis=-0.412811249534613)
    others shortest paths: DescribeResult(nobs=10100, minmax=(0, 8), mean=0.6076237623762376, variance=2.021983658807508, skewness=2.163248016247377, kurtosis=3.4165268418042007)
    equal variances? LeveneResult(statistic=7.156175120461955, pvalue=0.007482545093017935)
    Welch t test:  2.418 p= 0.017
    Kolmogorov-Smirnov Effect Size: 0.124


  shrtpth = dgraph.get_shortest_paths(coreidx, to=othercores, weights=None, mode='out', output='vpath')
  shrtpth = dgraph.get_shortest_paths(otheridx, to=otherothers, weights=None, mode='out', output='vpath')


### Cycles between cores or others (panel S3B)

Cycles are built starting from a core (or other) and iterating neighbors of different lenghts, where the last vertex is the starting one.

In [None]:
print('... cycles')
# breadth first search of paths and unique cycles
def get_cycles(adj, paths, maxlen):
    # tracking the actual path length:
    maxlen -= 1
    nxt_paths = []
    # iterating over all paths:
    for path in paths['paths']:
        # iterating neighbors of the last vertex in the path:
        for nxt in adj[path[-1]]:
            # attaching the next vertex to the path:
            nxt_path = path + [nxt]
            if path[0] == nxt and min(path) == nxt:
                # the next vertex is the starting vertex, we found a cycle
                # we keep the cycle only if the starting vertex has the
                # lowest vertex id, to avoid having the same cycles
                # more than once
                paths['cycles'].append(nxt_path)
                # if you don't need the starting vertex
                # included at the end:
                # paths$cycles <- c(paths$cycles, list(path))
            elif nxt not in path:
                # keep the path only if we don't create
                # an internal cycle in the path
                nxt_paths.append(nxt_path)
    # paths grown by one step:
    paths['paths'] = nxt_paths
    if maxlen == 0:
        # the final return when maximum search length reached
        return paths
    else:
        # recursive return, to grow paths further
        return get_cycles(adj, paths, maxlen)
# Comparison of core based cycles vs other based cycles
maxlen = 10 # the maximum length to limit computation time
# creating an adjacency list
adj = [[n.index for n in v.neighbors()] for v in dgraph.vs]
# recursive search of cycles
# for each core vertex as candidate starting point
core_cycles = []
for start in core_indexes:
    core_cycles += get_cycles(adj,{'paths': [[start]], 'cycles': []}, maxlen)['cycles']
print("    # core-based cycles:", len(core_cycles) )
# count the length of loops involving 1 core
core_cycles_lens = [len(cycle) for cycle in core_cycles]
print("    core-based cycles length: "+str(stats.describe(core_cycles_lens)) )

other_cycles = []
for start in other_indexes:
    other_cycles += get_cycles(adj,{'paths': [[start]], 'cycles': []}, maxlen)['cycles']
print("    # other-based cycles:", len(other_cycles) )
# count the length of loops involving 1 core
other_cycles_lens = [len(cycle) for cycle in other_cycles]
print("    other-based cycles length: "+str(stats.describe(other_cycles_lens)) )

d,_ = stats.ks_2samp(core_cycles_lens, other_cycles_lens) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
# all cycles by type
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(core_cycles_lens))
plt.scatter(xs, core_cycles_lens, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(other_cycles_lens))
plt.scatter(xs, other_cycles_lens, alpha=0.3, c='silver')
bp = ax.boxplot([core_cycles_lens,other_cycles_lens], notch=0, sym='', showcaps=False, zorder=10)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Cycles length')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(core_cycles_lens)), "other\n(n={:d})".format(len(other_cycles_lens))])
fig.savefig(exp_path+'/results/global_cores_others_cyclelens.png', transparent=True, dpi=1500)
plt.close()
fig.clf()

... cycles


### Do the cores of each cluster form more cliques than others? (Panel S3C)

If the cores of each cluster are pattern completion units, they should participate in more cliques (set of vertices where an edge is present between any two of them) than other non-core neurons.

In [45]:
cliques = dgraph.cliques(min=2)

cliques_cores = []
cliques_others = []

for cluster_cids in clustered_spectrums:
    cluster_core_indices = []
    # we take the index of the cell participating in this cluster
    cluster_indices = [ophys_cell_ids.index(strid) for strid in cluster_cids]
    # we take the cores of this cluster
    cluster_core_indices = list(set(core_indexes).intersection(cluster_indices))
    cluster_other_indices = list(set(other_indexes).intersection(cluster_indices))
    # we take the edges between the cores
    for clique in cliques:
        if set(clique).issubset(cluster_core_indices):
            cliques_cores.append(clique)
        if set(clique).issubset(cluster_other_indices):
            cliques_others.append(clique)
print(cliques_cores)
cores_cliques_count = len(cliques_cores)/len(core_indexes)
others_cliques_count = len(cliques_others)/len(other_indexes)

print("    cliques made by cores:",cores_cliques_count)
print("    cliques made by others:",others_cliques_count)

# print(core_edges)
x = np.array(["cores", "others"])
y = np.array([cores_cliques_count, others_cliques_count])
fig, ax = plt.subplots()
plt.bar(x, y, color=['forestgreen','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Normalized count of cliques')
plt.xticks([0, 1], ["core\n(n={:.3f})".format(cores_cliques_count), "other\n(n={:.3f})".format(others_cliques_count)])
fig.savefig(exp_path+'/results/global_cores_others_cliques.svg', transparent=True)
plt.close()
fig.clf()
    

  cliques = dgraph.cliques(min=2)


[(21, 24), (24, 30), (5, 21), (24, 30), (24, 30), (5, 21), (5, 21), (5, 21), (5, 21), (21, 24), (5, 21), (24, 30), (21, 24), (21, 24), (5, 21), (21, 24), (24, 30), (21, 24), (24, 30), (5, 21), (21, 24), (21, 24), (5, 21)]
    cliques made by cores: 2.090909090909091
    cliques made by others: 2.8613861386138613


### Clusters of events are not reproducible trajectories of the population dynamics

Clusters of population events are found by correlating population vectors, which only retain the cell IDs while ignoring the time of firing.    
We can consider also time.

Each recorded frame (~67ms) is an instantaneous population state defined by all its cells (112 of them are known for their firing, the others are unkown).    
A sequence of population states is a trajectory in the population dynamical state space.    
In this space, clusters of reproducible population events are represented by reproducible trajectories. 

We can compare the event trajectories visited within a cluster by comparing their patterns.    
Events are made by cells firing (often multiple times) during the event interval, so each sequence is a 2D submatrix of the population rasterplot. This gives a measure of trajectory reproducibility. 

In [46]:
# print(cluster_events_spiketrains) # already expressed in integer (ms)

print("... sequence internal consistency")

# cycle over clusters
for cluster_k, events_cellindexes in sorted_events_indexes.items():
    if cluster_k == 'gray':
        continue
    print()

    # We want to compare the trajectories of this cluster.
    # Trajectories should have same shape. We will subract them to get the difference (/num of events).
    
    # Finding the common-shape trajectory
    # n is the maximal number of cells participating to events in this cluster
    maxcells = max(events_cellindexes, key = lambda i: len(i))
    # m is the largest interval between the first min spiketrain and the last max spiketrain of all events in the cluster
    events_spiketrains = cluster_events_spiketrains[cluster_k]
    # print(events_spiketrains)
    maxinterval = 0 # 
    for evt_spktrains in events_spiketrains:
        mint = np.amin([x for xs in evt_spktrains for x in xs]) # for cases of just one spiketinme in list
        if isinstance(mint, list): mint = mint[0] # for cases of list
        maxt = np.amax([x for xs in evt_spktrains for x in xs])
        if isinstance(maxt, list): maxt = maxt[-1]
        if maxt-mint > maxinterval:
            maxinterval = maxt-mint
    print("    common trajectory pattern with n cells:", len(maxcells), " and m intervals:", maxinterval)
    
    # cluster trajectories, one per event, all same shape
    cluster_trajectories = []
    for evt_indexes,evt_spktrains in zip(events_cellindexes,events_spiketrains):
        # create empty trajectory of shape n cell, m interval
        trajectory = np.zeros((len(maxcells),maxinterval+1))
        mint = np.amin([x for xs in evt_spktrains for x in xs]) # take local mintime to find the trajectory m index
        if isinstance(mint, list): mint = mint[0] # for cases of just one spiketinme in list
        for ncell,spktrain in enumerate(evt_spktrains):
            trajectory[ncell][spktrain-mint] = 1
        cluster_trajectories.append(trajectory)
    
    # correlation between trajectories
    # very simple (probably too much) measure of trajectory correspondence
    trajR = []
    for itr,itrajectory in enumerate(cluster_trajectories):
        for jtr,jtrajectory in enumerate(cluster_trajectories):
            if itr!=jtr:
                trajR.append( np.nanmean(np.corrcoef(itrajectory,jtrajectory)) )
    print("    correlation across all trajectories: {:1.3f}±{:1.2f}".format(np.mean(trajR),np.std(trajR)))

    print("... searching for repeating sequences in the ordered firing of cell IDs")
    size = 2
    # size = 3
    cluster_sequences = [x for xs in events_cellindexes for x in xs]
    # print(cluster_sequences)
    windows = [
        tuple(window)
        for window in more_itertools.windowed(cluster_sequences, size)
    ]
    counter = collections.Counter(windows)
    for window, count in counter.items():
        if count > 1:
            print("   ",window, count)
            print(core_indexes)
        

... sequence internal consistency

    common trajectory pattern with n cells: 6  and m intervals: 674
    correlation across all trajectories: 0.310±0.02
... searching for repeating sequences in the ordered firing of cell IDs
    (92, 93) 2
[ 5  8 21 24 30 36 47 50 81 92 93]

    common trajectory pattern with n cells: 9  and m intervals: 809
    correlation across all trajectories: 0.172±0.04
... searching for repeating sequences in the ordered firing of cell IDs
    (47, 81) 2
[ 5  8 21 24 30 36 47 50 81 92 93]

    common trajectory pattern with n cells: 14  and m intervals: 877
    correlation across all trajectories: 0.216±0.03
... searching for repeating sequences in the ordered firing of cell IDs
    (24, 5) 2
[ 5  8 21 24 30 36 47 50 81 92 93]

    common trajectory pattern with n cells: 7  and m intervals: 337
    correlation across all trajectories: 0.271±0.03
... searching for repeating sequences in the ordered firing of cell IDs
    (5, 21) 2
[ 5  8 21 24 30 36 47 50 81 92

  c /= stddev[:, None]
  c /= stddev[None, :]
