# How-to: core neurons are crossroads of cortical dynamics 

Analysis code to reproduce all panels in figures 1 and 2 of the paper by Guarino, Filipchuk, Destexhe (2022)   
preprint link: https://www.biorxiv.org/content/10.1101/2022.05.24.493230v2

All this code is hosted on a github [repository](https://github.com/dguarino/Guarino-Filipchuk-Destexhe) (with a Zenodo DOI persistent identifier [here](https://zenodo.org)) and can be interactively executed here.  
The repository also contains a copy of the required data files from the [MICrONS project phase1](https://www.microns-explorer.org/phase1) (freely available on the project website), to ease the setup on Binder. 

This notebook performs loading and selection of the MICrONS data, structural and dynamical analyses, and plots the results as in the paper panels.

We divided the analysis code into:
- `imports_functions.py` : performs the imports and defines various helper functions.
- `structural_analysis.py` : creates a graph from the connectivity matrix and computes several graph measures (using [igraph](https://igraph.org)).
- `dynamical_analysis.py` : performs the same population event analysis as in [Filipchuk et al. 2022](https://www.biorxiv.org/content/10.1101/2021.08.31.458322v2) and then also extracts the core neurons of the events.


In [1]:
from platform import python_version
print(python_version())

from builtins import exec
exec(open("./imports_functions.py").read())

3.10.4


## Loading curated data from MICrONS project phase 1

The following code for data loading and selection is taken from   
https://github.com/AllenInstitute/MicronsBinder/blob/master/notebooks/intro/MostSynapsesInAndOut.ipynb   
https://github.com/AllenInstitute/MicronsBinder/blob/master/notebooks/vignette_analysis/function/structure_function_analysis.ipynb

`Neuron.pkl` contains the `segment_id` for each pyramidal neuron in the EM volume.    
`Soma.pkl` contains the soma position for all the cells in the EM volume.   
`calcium_trace.pkl` contains the calcium imaging traces (including deconvolved spikes).    
`soma_subgraph_synapses_spines_v185.csv` contains the list of synapses with root pre-/post-synaptic somas.   
`pni_synapses_v185.csv` contains the synapse table of the EM volume (pre-/post-synaptic root IDs of each synapse).

**CAUTION: The cell below might take some time to load the data.**

In [2]:
if not os.path.exists("MICrONS_data/calcium_trace.pkl"):
    print("Downloading 2photon calcium traces ...")
    resp = wget.download("https://zenodo.org/record/5646567/files/calcium_trace.pkl?download=1", "MICrONS_data/calcium_trace.pkl")
    print("... Done: "+resp)

if not os.path.exists("MICrONS_data/pni_synapses_v185.csv"):
    print("Downloading Synapse table ...")
    resp = wget.download("https://zenodo.org/record/3710459/files/pni_synapses_v185.csv?download=1", "MICrONS_data/pni_synapses_v185.csv")
    print("... Done: "+resp)

if not os.path.exists("MICrONS_data/soma_subgraph_synapses_spines_v185.csv"):
    print("Downloading soma_subgraph_synapses_spines_v185 ...")
    resp = wget.download("https://zenodo.org/record/3710459/files/soma_subgraph_synapses_spines_v185.csv?download=1", "MICrONS_data/soma_subgraph_synapses_spines_v185.csv")
    print("... Done: "+resp)

with open("MICrONS_data/Neuron.pkl", 'rb') as handle:
    Neuron = pickle.load(handle)
with open("MICrONS_data/Soma.pkl", 'rb') as handle:
    Soma = pickle.load(handle)
if os.path.exists("MICrONS_data/calcium_trace.pkl"):
    calcium_trace = pd.read_pickle("MICrONS_data/calcium_trace.pkl")
# print(calcium_trace)

syn_spines_df = pd.read_csv('MICrONS_data/soma_subgraph_synapses_spines_v185.csv')
# id, pre_root_id, post_root_id, cleft_vx, spine_vol_um3
print(syn_spines_df.shape)

syn_df = pd.read_csv('MICrONS_data/pni_synapses_v185.csv')
print(syn_df.shape)

(1961, 17)
(3239275, 16)


Get the IDs and number of pyramidal neurons in the EM volume.

In [3]:
pyc_list = Neuron["segment_id"]
n_pyc = pyc_list.shape[0]

Set the folder to which all results will be saved, and the frame duration (from the MICrONS docs).

In [4]:
exp_path = os.getcwd()
os.makedirs(exp_path+'/results', exist_ok=True) # make sure the output folder exists
frame_duration = 0.0674 # sec, 14.8313 frames per second

#### Accessing 2-photon Calcium imaging data subset

We are interested in reading only the Ca-imaging data of the cells for which the EM reconstruction is also available.   

##### CAUTION: the next cell can take some time to load all calcium imaging data!

In [5]:
print("Pyramidal neurons recorded with 2-photon Calcium imaging: ",len(calcium_trace))
ophys_cell_ids = list(calcium_trace.keys())
n_frames = len(calcium_trace[ophys_cell_ids[0]]['spike'])
start_time = 200*frame_duration # 200 frames of blank screen
stop_time = (200+n_frames)*frame_duration
time = np.arange(start_time,stop_time,frame_duration)

decs = []
for ocell_id in ophys_cell_ids:
    decs.append(calcium_trace[ocell_id]["spike"]) # deconvolved Ca spiketrains

fig, ax = plt.subplots()
ax.plot(range(n_frames), decs[0])
fig.savefig(exp_path+'/results/deconvolved_Ca_spikes0.png', dpi=300, transparent=True)
plt.close()
fig.clf()
spiketrains = []
for decst in decs:
    spiketrains.append( time[:][np.nonzero(decst)] )

print("... producing spike rasterplot")
fig = plt.figure(figsize=[12.8,4.8])
for row,train in enumerate(spiketrains):
    plt.scatter( train, [row]*len(train), marker='o', edgecolors='none', s=1, c='k' )
plt.ylabel("cell IDs")
plt.xlabel("time (s)")
fig.savefig(exp_path+'/results/rasterplot.png', transparent=False, dpi=800)
plt.close()
fig.clear()
fig.clf()

Pyramidal neurons recorded with 2-photon Calcium imaging:  112
... producing spike rasterplot


#### Create the cell indexes from the list of IDs

In [6]:
ophys_cell_indexes = range(len(ophys_cell_ids))

#### Get soma center locations

They are provided in voxel coordinates, with a voxel size of 4 × 4 × 40 nm.

In [7]:
pyc_soma_loc = np.zeros((n_pyc, 3))
for i in range(n_pyc):
    seg_id = pyc_list[i]
    pyc_soma_loc[i,:] = get_soma_loc(Soma, seg_id)
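
Because the voxels are anisotropic (z is ten times coarser than x and y), soma locations should be scaled to physical units before computing distances between cells. A minimal sketch, assuming the locations are stored as an `(N, 3)` array in `(x, y, z)` voxel order, as in `pyc_soma_loc`; the helper names are hypothetical:

```python
import numpy as np

# Voxel size stated above for the soma coordinates (assumed x, y, z order), in nm
VOXEL_SIZE_NM = np.array([4.0, 4.0, 40.0])

def voxels_to_um(loc_vx):
    """Convert (N, 3) voxel coordinates to micrometres."""
    return np.asarray(loc_vx, dtype=float) * VOXEL_SIZE_NM / 1000.0

def pairwise_distances_um(loc_vx):
    """Euclidean distance matrix (in µm) between all pairs of somas."""
    loc_um = voxels_to_um(loc_vx)
    diff = loc_um[:, None, :] - loc_um[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Hypothetical toy locations: one voxel along z is 0.04 µm, one voxel along x only 0.004 µm
example = [[0, 0, 0], [0, 0, 1], [1, 0, 0]]
dist = pairwise_distances_um(example)
```

Applied to this notebook, `pairwise_distances_um(pyc_ca_soma_loc)` would give the soma-to-soma distances of the co-registered neurons.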

Join the cell indexes with their soma positions.

In [8]:
pyc_ca_soma_loc = np.zeros((len(ophys_cell_indexes), 3))
for i in ophys_cell_indexes:
    seg_id = ophys_cell_ids[i]
    idx = np.where(pyc_list==seg_id)[0][0]
    pyc_ca_soma_loc[i,:] = pyc_soma_loc[idx,:]

---
## Structural Analysis

First, we build an adjacency matrix of the 2p/EM-imaged neurons:

In [9]:
adjacency_matrix = np.zeros((len(ophys_cell_indexes), len(ophys_cell_indexes)))
for i in ophys_cell_indexes:
    root_id = ophys_cell_ids[i]
    root_id_postsyn_list = syn_df[syn_df['pre_root_id'] == root_id]['post_root_id'].tolist()
    # print(root_id_postsyn_list)
    for ps in root_id_postsyn_list:
        if ps in ophys_cell_ids:
            # ips = np.argwhere(ophys_cell_ids==ps)[0][0]
            ips = ophys_cell_ids.index(ps)
            # print(ps, ips)
            adjacency_matrix[i][ips]=1
np.save(exp_path+'/results/adjacency_matrix.npy', adjacency_matrix)

Next, we compute several global, purely structural measures.    
This includes **panel 2B** (with inset).

In [10]:
global_degree_counts = []
global_degree_distribution = []
global_structural_betweeness = []
global_structural_motifs = []
global_structural_motifsratio = []
global_structural_motifsurrogates = []

exec(open("./structural_analysis.py").read())

global_structural_betweeness.append(betweenness_centrality)
global_degree_counts.append(degree_counts)
global_degree_distribution.append(degrees)
global_structural_motifs.append(motifs)
global_structural_motifsurrogates.append(surrogate_motifs)
global_structural_motifsratio.append(motifsratio)

... adjacency matrix
... loaded
    number of vertices: 112
... Network nodes degrees
... Degree distributions
... Betweenness centrality
... Motifs




## Dynamical Analysis

Here we first detect population events, then quantify them, and finally extract their core neurons.   
This analysis extends (from step 5 on) the one performed by Filipchuk et al. 2022:
1. Compute the population instantaneous firing rate (binned)

2. Establish a significance threshold for population events   
    2.1 compute the Inter-Spike Intervals (ISI) of the original spiketrains   
    2.2 reshuffle the ISIs to create (1000) surrogates   
    2.3 compute the population instantaneous firing rate for each surrogate time-binned rasterplot   

3. Find population events   
    3.1 smooth the firing rate   
    3.2 the instantaneous threshold is the 99th percentile of the surrogate population instantaneous firing rate   
    3.3 the peaks of the smoothed firing rate that rise above the threshold (between its crossings with the threshold) mark population events   
    3.4 the minima before and after a peak are taken as start and end times of the population event   

4. Find clusters of events   
    4.1 produce a cell ID signature vector for each population event   
    4.2 perform complete-linkage clustering on the cross-correlation of the event vectors   
    4.3 produce surrogate clusters to establish a cluster significance threshold     
    4.4 find the event reproducibility within each cluster (cross-correlation of the cluster's events)   

5. Find core neurons   
    5.1 take all neurons participating in a cluster of events   
    5.2 use 99% of the cluster event reproducibility as the significance threshold   
    5.3 if the occurrence frequency of a neuron is above threshold, the neuron is taken as a core   
    5.4 remove core neurons that fire unspecifically, both within and outside their cluster   
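
Steps 1 to 3 can be sketched in plain NumPy as follows. This is a simplified illustration, not the code in `dynamical_analysis.py`: spike trains are assumed to be arrays of frame indices (a hypothetical input format), and the smoothing kernel, the frame-wise percentile, and the peak/minimum search are assumptions chosen to mirror the list above.

```python
import numpy as np

def population_rate(spike_frames_per_cell, n_frames):
    """Step 1: number of spikes across all cells in each imaging frame."""
    rate = np.zeros(n_frames)
    for frames in spike_frames_per_cell:
        np.add.at(rate, np.asarray(frames, dtype=int), 1)
    return rate

def shuffle_isi(frames, rng):
    """Steps 2.1-2.2: keep the first spike and randomly permute the ISIs."""
    frames = np.sort(np.asarray(frames, dtype=int))
    if frames.size < 2:
        return frames
    isi = rng.permutation(np.diff(frames))
    return frames[0] + np.concatenate(([0], np.cumsum(isi)))

def smooth(rate, kernel=(0.25, 0.5, 0.25)):
    """Step 3.1: smoothing with a small symmetric kernel (the kernel is an assumption)."""
    return np.convolve(rate, np.asarray(kernel), mode="same")

def find_events(spike_frames_per_cell, n_frames, n_surrogates=1000, pct=99, seed=0):
    rng = np.random.default_rng(seed)
    rate = smooth(population_rate(spike_frames_per_cell, n_frames))
    # Steps 2.3 and 3.2: frame-wise percentile of the surrogate population rate
    surrogates = np.array([
        population_rate([shuffle_isi(f, rng) for f in spike_frames_per_cell], n_frames)
        for _ in range(n_surrogates)
    ])
    threshold = np.percentile(surrogates, pct, axis=0)
    events = []
    for p in range(1, n_frames - 1):
        # Step 3.3: local peaks of the smoothed rate lying above the threshold
        if rate[p] > threshold[p] and rate[p] >= rate[p - 1] and rate[p] > rate[p + 1]:
            start, end = p, p
            # Step 3.4: walk down to the minima on either side of the peak
            while start > 0 and rate[start - 1] < rate[start]:
                start -= 1
            while end < n_frames - 1 and rate[end + 1] < rate[end]:
                end += 1
            events.append((start, p, end))
    return rate, threshold, events

# Hypothetical toy data: 20 cells with 5 random spikes each, all firing together at frame 100
rng = np.random.default_rng(1)
n_frames = 200
cells = []
for _ in range(20):
    background = rng.choice(np.setdiff1d(np.arange(n_frames), [100]), size=5, replace=False)
    cells.append(np.sort(np.append(background, 100)))
rate, threshold, events = find_events(cells, n_frames, n_surrogates=200)
```

Each event is returned as `(start, peak, end)` frames; multiplying by `frame_duration` converts them to seconds.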
    
### All panels of Figure 1

are produced in the next cell by the file `dynamical_analysis.py`.

In [11]:
global_structural_motif_cores = {k: 0 for k in range(16)}
global_structural_motif_others = {k: 0 for k in range(16)}
global_events_sec = []
global_events_duration = []
global_cluster_number = []
global_cluster_selfsimilarity = []

core_reproducibility_perc = 85 # change this to relax the threshold for detecting cores
exec(open("./dynamical_analysis.py").read())

global_events_sec.append(events_sec)
global_events_duration.extend(events_durations_f)
global_cluster_number.append(nclusters)
global_cluster_selfsimilarity.extend(reproducibility_list)

... firing statistics
    population firing: 1.23±1.14 sp/frame
    smoothing
... generating surrogates to establish population event threshold
    cells firing rate: 0.01±0.10 sp/s
    event size threshold (mean): 3.2139256164294947
... find population events in the trial
... signatures of population events
    number of events: 225
    number of events per sec: 0.12228127955130923
    events duration: 0.674±0.255
    events size: 8.000±3.919
... Similarity of events matrix
... clustering
    linkage
    surrogate events signatures for clustering threshold
    cluster reproducibility threshold: 0.25135164220675643
    cluster size threshold: 2
    Total number of clusters: 91
    removing below size threshold clusters: 3
    removing below reproducibility threshold clusters: 86
... finding cluster cores
    removing cores firing unspecifically
    gathering cores from all clusters
    # cores: 19
    # non-cores: 93
    plotting single events rasterplots ...
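
The core criterion of step 5 can be sketched as follows. This is a hypothetical simplification, not the rule implemented in `dynamical_analysis.py`: a cell is taken as a core of a cluster when it participates in at least `min_fraction` of that cluster's events (loosely mirroring `core_reproducibility_perc = 85`), and, for step 5.4, it is dropped when it also participates in at least `max_outside_fraction` of the events outside the cluster (both parameter names are assumptions).

```python
import numpy as np

def cluster_cores(signatures, labels, min_fraction=0.85, max_outside_fraction=0.5):
    """
    signatures: (n_events, n_cells) binary matrix, 1 if the cell fired in the event.
    labels: cluster label (int) of each event.
    Returns {cluster label: array of core cell indexes}.
    """
    signatures = np.asarray(signatures, dtype=float)
    labels = np.asarray(labels)
    cores = {}
    for c in np.unique(labels):
        inside = signatures[labels == c]
        outside = signatures[labels != c]
        frac_in = inside.mean(axis=0)  # participation within the cluster (step 5.3)
        frac_out = outside.mean(axis=0) if len(outside) else np.zeros(signatures.shape[1])
        # Step 5.4 (hypothetical criterion): discard cells that are also reliable elsewhere
        is_core = (frac_in >= min_fraction) & (frac_out < max_outside_fraction)
        cores[int(c)] = np.flatnonzero(is_core)
    return cores

# Hypothetical toy data: 5 events over 4 cells; cell 1 fires in every event (unspecific)
sig = np.array([[1, 1, 0, 0],
                [1, 1, 0, 1],
                [1, 1, 0, 0],
                [0, 1, 1, 0],
                [0, 1, 1, 0]])
lab = np.array([0, 0, 0, 1, 1])
cores = cluster_cores(sig, lab)   # cluster 0 -> cell 0, cluster 1 -> cell 2
```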


---
## Mixing structural and dynamical analyses results to characterize core connectivity

Here, we collect the evidence against the hypothesis that core neurons are strongly connected to each other.   
We tested two fundamental assumptions of attractor-driven dynamics:
- synapses between cores are more numerous and stronger than those involving other cells   
- circuits made by cores involve more recurrent connections toward cores

### Spine number and volume (panel 2A, 2B)

We can take the **number** (2A) and **volume** (2B) of post-synaptic spines as a proxy for their functional efficacy.   
The number of cores and non-cores differs across clusters, so the counts must be normalized before they can be compared.

For each reproducible cluster we count:    
- the number of synapses made by a cell type (core or not) onto the others, divided by the number of possible pre-/post-synaptic pairs (the product of the two group sizes)    
    - the expectation is that core-to-core and core-to-other synapses should be numerous in order to pull the dynamics
- the post-synaptic spine volume of synapses made by a cell type (core or not) onto the others.   
    - the expectation is that core-to-core and core-to-other spines should be larger in order to pull the dynamics

**Synapses between the core neurons of each cluster are fewer than in every other combination.**    
Note that the resulting normalized synapse counts (for the others) are consistent with the network density.
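
The normalization divides, for each cluster, the number of synapses between two groups by the number of possible (pre, post) pairs. A minimal pandas sketch of that quantity on a hypothetical toy table; it uses `Series.isin` instead of formatting long ID lists into `query` strings, and the helper name is an assumption, not part of the notebook:

```python
import pandas as pd

def normalized_synapse_count(syn_df, pre_ids, post_ids):
    """Synapses from pre_ids onto post_ids, divided by the number of possible (pre, post) pairs."""
    pre_ids, post_ids = list(pre_ids), list(post_ids)
    if not pre_ids or not post_ids:
        return 0.0
    mask = syn_df["pre_root_id"].isin(pre_ids) & syn_df["post_root_id"].isin(post_ids)
    return int(mask.sum()) / (len(pre_ids) * len(post_ids))

# Hypothetical toy synapse table: cells 1, 2 are cores, cells 3, 4 are others
toy = pd.DataFrame({"pre_root_id":  [1, 2, 1, 3, 4, 3],
                    "post_root_id": [2, 1, 3, 4, 3, 1]})
cores, others = {1, 2}, {3, 4}
counts = {
    "core-core": normalized_synapse_count(toy, cores, cores),
    "core-other": normalized_synapse_count(toy, cores, others),
    "other-core": normalized_synapse_count(toy, others, cores),
    "other-other": normalized_synapse_count(toy, others, others),
}
```

In the notebook these per-cluster values are then summed over `clusters_cores`.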

In [None]:
# the density of the directed graph.
network_density = dgraph.density(loops=True)
print("... network density (ratio between the edges present and the maximum number of edges that the graph can contain):", network_density )
# spine number
core2core_spine_num = 0.0 # to be normalized
core2other_spine_num = 0.0
other2core_spine_num = 0.0
other2other_spine_num = 0.0
# spine volume
core2core_spine_vol = [] # µm3
core2other_spine_vol = []
other2core_spine_vol = []
other2other_spine_vol = []

set_ids = set(ophys_cell_ids)
for dyn_core_ids in clusters_cores:
    dyn_other_ids = set_ids.symmetric_difference(dyn_core_ids)

    # id, pre_root_id, post_root_id, cleft_vx, spine_vol_um3
    core2core_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    if not core2core_synapse_df.empty:
        core2core_spine_vol.extend( core2core_synapse_df['spine_vol_um3'].tolist() )
        core2core_spine_num += len(core2core_synapse_df['spine_vol_um3'].tolist())/(len(dyn_core_ids)*len(dyn_core_ids)) # normalized by target
    
    core2other_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_core_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    if not core2other_synapse_df.empty:
        core2other_spine_vol.extend( core2other_synapse_df['spine_vol_um3'].tolist() )
        core2other_spine_num += len(core2other_synapse_df['spine_vol_um3'].tolist())/(len(dyn_core_ids)*len(dyn_other_ids)) 
    
    other2core_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    if not other2core_synapse_df.empty:
        other2core_spine_vol.extend( other2core_synapse_df['spine_vol_um3'].tolist() )
        other2core_spine_num += len(other2core_synapse_df['spine_vol_um3'].tolist())/(len(dyn_core_ids)*len(dyn_other_ids)) 
 
    other2other_synapse_df = syn_spines_df.query(f'(pre_root_id in {list(dyn_other_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    if not other2other_synapse_df.empty:
        other2other_spine_vol.extend( other2other_synapse_df['spine_vol_um3'].tolist() )
        other2other_spine_num += len(other2other_synapse_df['spine_vol_um3'].tolist())/(len(dyn_other_ids)*len(dyn_other_ids)) 

# description
# number
print("... Normalized number of spines")
print("    {:f} core2core normalized spines number".format((core2core_spine_num)) )
print("    {:f} core2other normalized spines number".format((core2other_spine_num)) )
print("    {:f} other2core normalized spines number".format((other2core_spine_num)) )
print("    {:f} other2other normalized spines number".format((other2other_spine_num)) )

# spines
print("... Spine volumes")
print("    {:d} core2core spines, volume: {:1.3f}±{:1.2f} µm3".format(len(core2core_spine_vol), np.mean(core2core_spine_vol),np.std(core2core_spine_vol)) )
# print("    "+str(stats.describe(core2core_spine_vol)) )
print("    {:d} core2other spines, volume: {:1.3f}±{:1.2f} µm3".format(len(core2other_spine_vol), np.mean(core2other_spine_vol),np.std(core2other_spine_vol)) )
# print("    "+str(stats.describe(core2other_spine_vol)) )
print("    {:d} other2core spines, volume: {:1.3f}±{:1.2f} µm3".format(len(other2core_spine_vol), np.mean(other2core_spine_vol),np.std(other2core_spine_vol)) )
# print("    "+str(stats.describe(other2core_spine_vol)) )
print("    {:d} other2other spines, volume: {:1.3f}±{:1.2f} µm3".format(len(other2other_spine_vol), np.mean(other2other_spine_vol),np.std(other2other_spine_vol)) )
# print("    "+str(stats.describe(other2other_spine_vol)) )

# plotting
# all spine number by type
x = np.array(["core-core", "core-other", "other-core", "other-other"])
y = np.array([core2core_spine_num, core2other_spine_num, other2core_spine_num, other2other_spine_num])
fig, ax = plt.subplots()
plt.bar(x, y, color=['forestgreen','forestgreen','silver','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Normalized number of spines')
fig.savefig(exp_path+'/results/global_cores_others_spine_num.svg', transparent=True)
fig.clf()
plt.close()

# all spine volumes by type
fig, ax = plt.subplots()
xs = np.random.normal(0, 0.04, len(core2other_spine_vol))
plt.scatter(xs, core2other_spine_vol, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(1, 0.04, len(other2core_spine_vol))
plt.scatter(xs, other2core_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
xs = np.random.normal(2, 0.04, len(other2other_spine_vol))
plt.scatter(xs, other2other_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
vp = ax.violinplot([core2other_spine_vol,other2core_spine_vol,other2other_spine_vol], [0,1,2], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc in vp['bodies'][0:1]:
    pc.set_facecolor('#228B224d')
for pc in vp['bodies'][1:]:
    pc.set_facecolor('#D3D3D34d')
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Spine Volume (µm^3)')
plt.xticks([0, 1, 2], [ "core-other\n(n={:d})".format(len(core2other_spine_vol)),"other-core\n(n={:d})".format(len(other2core_spine_vol)),"other-other\n(n={:d})".format(len(other2other_spine_vol))])
fig.savefig(exp_path+'/results/global_cores_others_spine_vol.svg', transparent=True)
fig.savefig(exp_path+'/results/global_cores_others_spine_vol.png', transparent=True, dpi=1200)
fig.clf()
plt.close()

### Non-Ca-imaged and outside EM volume inputs (panel 2B, last two boxes)

Core responses could be driven by sources that were not Ca-imaged or that lie outside the EM volume. How can we rule this out (or at least reduce our uncertainty)?   
We can ask: *are there more, or stronger, spines made by non-imaged neurons (either local or far) on cores or on others?*   
We have this information because we know the cell IDs of all somas in the volume: we can take the spines whose presynaptic ID differs from the known Ca-imaged IDs, or from the somas within the EM volume.
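
The selection can be expressed with boolean masks: keep the synapses whose presynaptic ID is not Ca-imaged, then split them by whether the postsynaptic (imaged) cell is a core. A minimal sketch on a hypothetical toy table, assuming the core IDs are a subset of the imaged IDs; the helper name is an assumption:

```python
import pandas as pd

def external_input_volumes(syn_df, imaged_ids, core_ids):
    """Spine volumes of synapses from non-imaged presynaptic cells,
    split by whether the postsynaptic imaged cell is a core or not."""
    imaged_ids, core_ids = list(imaged_ids), list(core_ids)
    external = ~syn_df["pre_root_id"].isin(imaged_ids)
    on_core = syn_df["post_root_id"].isin(core_ids)
    on_other = syn_df["post_root_id"].isin(imaged_ids) & ~on_core
    vol = syn_df["spine_vol_um3"]
    return {"far-core": vol[external & on_core].tolist(),
            "far-other": vol[external & on_other].tolist()}

# Hypothetical toy table: cells 1, 2, 3 are imaged, cell 1 is a core, 7, 8, 9 are not imaged
toy = pd.DataFrame({"pre_root_id":   [9, 9, 8, 1, 8],
                    "post_root_id":  [1, 2, 3, 2, 7],
                    "spine_vol_um3": [0.10, 0.20, 0.05, 0.30, 0.40]})
vols = external_input_volumes(toy, imaged_ids={1, 2, 3}, core_ids={1})
```

The two lists can then be compared as in the next cell, with `stats.kruskal` and `stats.ks_2samp`.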

In [None]:
print("... postsynaptic spines on cores or others from sources non-imaged or without soma in the volume")
far2core_spine_vol = [] # µm3
far2other_spine_vol = []
set_ids = set(ophys_cell_ids)
for dyn_core_ids in clusters_cores:
    dyn_other_ids = set_ids.symmetric_difference(dyn_core_ids)
    # searching
    # id, pre_root_id, post_root_id, cleft_vx, spine_vol_um3
    far2core_synapse_df = syn_spines_df.query(f'(pre_root_id not in {list(set_ids)}) and (post_root_id in {list(dyn_core_ids)})')
    if not far2core_synapse_df.empty:
        far2core_spine_vol.extend( far2core_synapse_df['spine_vol_um3'].tolist() )
    far2other_synapse_df = syn_spines_df.query(f'(pre_root_id not in {list(set_ids)}) and (post_root_id in {list(dyn_other_ids)})')
    if not far2other_synapse_df.empty:
        far2other_spine_vol.extend( far2other_synapse_df['spine_vol_um3'].tolist() )
        
# description
print("    {:d} far2core spines, volume: {:1.3f}±{:1.2f} µm3".format(len(far2core_spine_vol), np.mean(far2core_spine_vol),np.std(far2core_spine_vol)) )
# print("    "+str(stats.describe(far2core_spine_vol)) )
print("    {:d} far2other spines, volume: {:1.3f}±{:1.2f} µm3".format(len(far2other_spine_vol), np.mean(far2other_spine_vol),np.std(far2other_spine_vol)) )
# print("    "+str(stats.describe(far2other_spine_vol)) )

# significance (only defined when both groups are non-empty)
if len(far2core_spine_vol)>0 and len(far2other_spine_vol)>0:
    kwstat,pval = stats.kruskal(far2core_spine_vol, far2other_spine_vol)
    print("    far-core vs far-other spine size Kruskal-Wallis test results:",kwstat,pval)
    d,_ = stats.ks_2samp(far2core_spine_vol, far2other_spine_vol) # non-parametric measure of effect size [0,1]
    print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

# all spine volumes by type
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(far2core_spine_vol))
plt.scatter(xs, far2core_spine_vol, edgecolor='forestgreen', facecolor=('#228B224d'))
xs = np.random.normal(2, 0.04, len(far2other_spine_vol))
plt.scatter(xs, far2other_spine_vol, edgecolor='silver', facecolor=('#D3D3D34d'))
vp = ax.violinplot([far2core_spine_vol,far2other_spine_vol], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Spine Volume (µm^3)')
plt.xticks([1, 2], ["far-core\n(n={:d})".format(len(far2core_spine_vol)), "far-other\n(n={:d})".format(len(far2other_spine_vol))])
fig.savefig(exp_path+'/results/global_far_cores_others_spine_vol.svg', transparent=True)
fig.savefig(exp_path+'/results/global_far_cores_others_spine_vol.png', transparent=True, dpi=1200)
plt.close()
fig.clf()

### Are cores more mutually connected than others?

We start by asking whether a global measure such as assortativity (the preference of a network's nodes to attach to others that are similar in some way) gives a clear summary of the mutual connectivity among all cores.
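
For a binary attribute such as `is_core`, nominal assortativity compares the fraction of edges linking vertices of the same category with the fraction expected if edges ignored the category: +1 means cores connect only to cores (and others only to others), -1 means edges only cross categories. A minimal NumPy sketch of Newman's definition, which is what `Graph.assortativity_nominal` is documented to compute; it is an illustration, not igraph's code:

```python
import numpy as np

def nominal_assortativity(adj, labels):
    """Newman's assortativity coefficient for a categorical vertex attribute.
    adj[i, j] = 1 for a directed edge i -> j; labels[i] is the category of vertex i."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    cats = np.unique(labels)
    # e[a, b]: fraction of edges going from category a to category b
    e = np.array([[adj[np.ix_(labels == a, labels == b)].sum() for b in cats] for a in cats])
    e /= e.sum()
    expected = np.dot(e.sum(axis=1), e.sum(axis=0))  # same-category fraction expected by chance
    return (np.trace(e) - expected) / (1.0 - expected)

labels = [0, 0, 1, 1]  # hypothetical: 1 = core, 0 = other
same = np.zeros((4, 4))
for i, j in [(0, 1), (1, 0), (2, 3), (3, 2)]:
    same[i, j] = 1     # edges only within categories
cross = np.zeros((4, 4))
for i, j in [(0, 2), (2, 0), (1, 3), (3, 1)]:
    cross[i, j] = 1    # edges only across categories
r_same = nominal_assortativity(same, labels)
r_cross = nominal_assortativity(cross, labels)
```

The coefficient is undefined when all vertices share one category (the denominator is zero).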

In [None]:
dgraph.vs["ophys_cell_id"] = ophys_cell_ids
is_id_core = np.array( [0] * len(ophys_cell_ids) )
is_id_core[core_indexes] = 1
dgraph.vs["is_core"] = is_id_core.tolist()
# an edge is "core" when its presynaptic (source) cell is a core;
# one value per graph edge, so the list length always matches dgraph.ecount()
dgraph.es["is_core"] = [int(is_id_core[e.source]) for e in dgraph.es]
color_dict = {0: "gray", 1: "green"}
ig.plot(dgraph, exp_path+'/results/all_ring.svg', layout=dgraph.layout("circle"),
        edge_curved=0.2,
        edge_color=[color_dict[is_core] for is_core in dgraph.es["is_core"]],
        edge_width=0.5,
        edge_arrow_size=0.1,
        vertex_size=5,
        vertex_color=[color_dict[is_core] for is_core in dgraph.vs["is_core"]],
        vertex_frame_color=[color_dict[is_core] for is_core in dgraph.vs["is_core"]],
        margin=50)

print('... assortativity')
# is a preference for a network's nodes to attach to others that are similar in some way
print("    overall:", dgraph.assortativity_nominal("is_core", directed=True) )
# cores degree distro vs others degree distro
# biological networks typically show negative assortativity, or disassortative mixing, or disassortativity, as high degree nodes tend to attach to low degree nodes.
print("    assortativity degree:", dgraph.assortativity_degree(directed=True) )


### 3-Motif connectivity of cores and others (panel 2D)

This measure of the network reports the participation of cores (or non-cores) in triplet motifs.    
Note that the triplets are not exclusively made of cores (or non-cores).

In [None]:
# For each set of reproducible cluster cores we count their connectivity motifs.
set_indexes = set(ophys_cell_indexes)
for dyn_core_ids in clusters_cores:
    dyn_core_indexes = set([ophys_cell_ids.index(strid) for strid in dyn_core_ids])
    dyn_other_indexes = set_indexes.symmetric_difference(dyn_core_indexes)
    for mclass, mlist in motif_vertices.items():
        for mtriplet in mlist:
            intersection_cores = len(list(dyn_core_indexes.intersection(mtriplet)))
            intersection_others = len(list(dyn_other_indexes.intersection(mtriplet)))
            global_structural_motif_cores[mclass] += intersection_cores
            global_structural_motif_others[mclass] += intersection_others

fig = plt.figure()
plt.bar(global_structural_motif_cores.keys(), global_structural_motif_cores.values(), color='forestgreen')
plt.ylabel('cores occurrences')
plt.yscale('log')
plt.ylim([0.7,plt.ylim()[1]])
plt.xlabel('motifs types')
fig.savefig(exp_path+'/results/global_motifs_cores.svg', transparent=True)
plt.close()
fig.clear()
fig.clf()
fig = plt.figure()
plt.bar(global_structural_motif_others.keys(), global_structural_motif_others.values(), color='silver')
plt.ylabel('non-cores occurrences')
plt.yscale('log')
plt.ylim([0.7,plt.ylim()[1]])
plt.xlabel('motifs types')
fig.savefig(exp_path+'/results/global_motifs_others.svg', transparent=True)
plt.close()
fig.clear()
fig.clf()
print("... saved mutual connectivity of cores and others")

In [None]:
# dgraph is already defined from the structural_analysis included file
print("    graph diameter (#vertices):", dgraph.diameter(directed=True, unconn=True, weights=None))
print("    graph average path length (#vertices):", dgraph.average_path_length(directed=True, unconn=True))

### Do the cores of each cluster form a clique? (panel 2E)

If the cores of each cluster are pattern-completion units, they should participate in more cliques (sets of vertices in which every pair is connected by an edge) than non-core neurons.
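
Counting the cliques that lie entirely inside a group of cells is the same as counting all cliques (of size at least 2, not only maximal ones) of the subgraph induced by that group. A minimal pure-Python sketch, assuming edge direction is ignored as in an undirected clique search; each clique is enumerated once by only extending it with higher-ranked common neighbors:

```python
import numpy as np

def count_cliques_within(adj, subset, min_size=2):
    """Number of cliques (size >= min_size) whose vertices all belong to subset."""
    A = np.asarray(adj, dtype=bool)
    A = A | A.T                    # ignore edge direction
    np.fill_diagonal(A, False)     # self-loops do not form cliques
    nodes = sorted(subset)

    def extend(size, candidates):
        total = 0
        for idx, v in enumerate(candidates):
            if size + 1 >= min_size:
                total += 1         # the current clique plus v is itself a clique
            # only later candidates adjacent to v can grow this clique further
            nxt = [u for u in candidates[idx + 1:] if A[v, u]]
            total += extend(size + 1, nxt)
        return total

    return extend(0, nodes)

# Hypothetical toy graph: triangle 0-1-2 plus an edge 2 -> 3
adj = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 0), (2, 3)]:
    adj[i, j] = 1
n_core_cliques = count_cliques_within(adj, {0, 1, 2})    # 3 edges + 1 triangle = 4
n_other_cliques = count_cliques_within(adj, {2, 3})      # 1 edge
```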

In [None]:
cliques = dgraph.cliques(min=2)

cliques_cores = []
cliques_others = []

for cluster_cids in clustered_spectrums:
    cluster_core_indices = []
    # we take the index of the cell participating in this cluster
    cluster_indices = [ophys_cell_ids.index(strid) for strid in cluster_cids]
    # we take the cores of this cluster
    cluster_core_indices = list(set(core_indexes).intersection(cluster_indices))
    cluster_other_indices = list(set(other_indexes).intersection(cluster_indices))
    # keep the cliques made exclusively of this cluster's cores (or exclusively of its non-cores)
    for clique in cliques:
        if set(clique).issubset(cluster_core_indices):
            cliques_cores.append(clique)
        if set(clique).issubset(cluster_other_indices):
            cliques_others.append(clique)

cores_cliques_count = len(cliques_cores)
others_cliques_count = len(cliques_others)

print("    cliques made by cores:",cores_cliques_count)
print("    cliques made by others:",others_cliques_count)

# print(core_edges)
x = np.array(["cores", "others"])
y = np.array([cores_cliques_count, others_cliques_count])
fig, ax = plt.subplots()
plt.bar(x, y, color=['forestgreen','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Count of cliques')
plt.xticks([0, 1], ["core\n(n={:d})".format(cores_cliques_count), "other\n(n={:d})".format(others_cliques_count)])
fig.savefig(exp_path+'/results/global_cores_others_cliques.svg', transparent=True)
plt.close()
fig.clf()
    

## Centrality of cores

If cores are not more mutually connected compared to others, then what is their characterizing feature?    
In the cells above, we saw indications of more interconnections between cores and others than within the same type.     
This could hint at some form of centrality.

### Simple centrality measures of cores do not differ from those of others (panel 2F)

PageRank centrality, however, generalizes degree centrality: degree centrality counts only the direct neighbors of a node, whereas PageRank (closely related to Katz centrality) also accounts for the nodes connected to it through longer paths, with the contributions of distant nodes damped.
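
A minimal power-iteration sketch of PageRank on a directed adjacency matrix, assuming the usual damping factor of 0.85 and a uniform jump from cells without outgoing edges. In practice `dgraph.pagerank(directed=True)` computes it directly; this sketch only shows what the score measures:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank of a directed graph given as adjacency matrix (adj[i, j] = 1 for i -> j)."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    safe_deg = np.where(out_deg > 0, out_deg, 1.0)
    P = A / safe_deg[:, None]       # row-stochastic transition matrix
    P[out_deg == 0] = 1.0 / n       # dangling cells: jump to any cell uniformly
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - damping) / n + damping * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Hypothetical toy graphs: a fully connected triplet, and three cells converging on cell 0
ring = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
r_ring = pagerank(ring)             # all equal to 1/3
star = np.zeros((4, 4))
star[1:, 0] = 1
r_star = pagerank(star)             # cell 0 gets the highest score
```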

In [None]:
print('... degree centrality')
degree_centrality_cores = dgraph.degree(core_indexes, mode='out', loops=True)
degree_centrality_others = dgraph.degree(other_indexes, mode='out', loops=True)
# description
print("    cores: "+str(stats.describe(degree_centrality_cores)) )
print("    others: "+str(stats.describe(degree_centrality_others)) )
# significance
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(degree_centrality_cores, degree_centrality_others, equal_var=False))
d,_ = stats.ks_2samp(degree_centrality_cores, degree_centrality_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(degree_centrality_cores))
plt.scatter(xs, degree_centrality_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(degree_centrality_others))
plt.scatter(xs, degree_centrality_others, alpha=0.3, c='silver')
vp = ax.violinplot([degree_centrality_cores,degree_centrality_others], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Degree')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(degree_centrality_cores)), "other\n(n={:d})".format(len(degree_centrality_others))])
fig.savefig(exp_path+'/results/global_cores_others_degree.svg', transparent=True)
plt.close()
fig.clf()

### Betweenness of cores and others (panel 2G)

In [None]:
print('... betweenness')
cores_betweenness = np.array(dgraph.betweenness(vertices=core_indexes, directed=True))
others_betweenness = np.array(dgraph.betweenness(vertices=other_indexes, directed=True))
print("    cores: "+str(stats.describe(cores_betweenness)) )
print("    others: "+str(stats.describe(others_betweenness)) )
# significance
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(cores_betweenness, others_betweenness, equal_var=False))
d,_ = stats.ks_2samp(cores_betweenness, others_betweenness) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(cores_betweenness))
plt.scatter(xs, cores_betweenness, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(others_betweenness))
plt.scatter(xs, others_betweenness, alpha=0.3, c='silver')
vp = ax.violinplot([cores_betweenness,others_betweenness], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Betweenness')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(cores_betweenness)), "other\n(n={:d})".format(len(others_betweenness))])
fig.savefig(exp_path+'/results/global_cores_others_betweenness.svg', transparent=True)
plt.close()
fig.clf()

### Hub scores of cores and others (panel 2H)

Are the cores also hubs of the network?
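
The hub score used below is Kleinberg's HITS hub score: a cell is a good hub if it projects to good authorities, and a good authority if it receives input from good hubs. A minimal power-iteration sketch, scaled so the maximum is 1 (as with `scale=True`); it assumes a unique leading eigenvector of `A @ A.T` and is an illustration, not igraph's implementation:

```python
import numpy as np

def hub_scores(adj, max_iter=1000, tol=1e-12):
    """HITS hub scores of a directed graph (adj[i, j] = 1 for i -> j), maximum scaled to 1."""
    A = np.asarray(adj, dtype=float)
    h = np.ones(A.shape[0])
    for _ in range(max_iter):
        a = A.T @ h            # authority of j: sum of hub scores of the cells projecting to j
        h_new = A @ a          # hub of i: sum of authority scores of the cells i projects to
        top = h_new.max()
        if top == 0:           # no edges: every score is zero
            return h_new
        h_new /= top
        if np.abs(h_new - h).max() < tol:
            return h_new
        h = h_new
    return h

# Hypothetical toy graphs: cell 0 projects to cells 1-3; cells 0 and 1 both project to cell 2
fan_out = np.zeros((4, 4))
fan_out[0, 1:] = 1
two_hubs = np.zeros((3, 3))
two_hubs[0, 2] = two_hubs[1, 2] = 1
h_fan = hub_scores(fan_out)      # [1, 0, 0, 0]
h_two = hub_scores(two_hubs)     # [1, 1, 0]
```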

In [None]:
print("... hub score")
# what is the overlap of cores and hubs?
# Hub
hub_scores = np.array(dgraph.hub_score(weights=None, scale=True, return_eigenvalue=False))
hub_scores_cores = hub_scores[core_indexes]
hub_scores_others = hub_scores[other_indexes]
print("    hub cores: "+str(stats.describe(hub_scores_cores)) )
print("    hub others: "+str(stats.describe(hub_scores_others)) )
# significance
print("    Kruskal-Wallis test:  %.3f p= %.3f" % stats.kruskal(hub_scores_cores, hub_scores_others))
d,_ = stats.ks_2samp(hub_scores_cores, hub_scores_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
# all hub scores by type
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(hub_scores_cores))
plt.scatter(xs, hub_scores_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(hub_scores_others))
plt.scatter(xs, hub_scores_others, alpha=0.3, c='silver')
vp = ax.violinplot([hub_scores_cores,hub_scores_others], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Hub score')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(hub_scores_cores)), "other\n(n={:d})".format(len(hub_scores_others))])
fig.savefig(exp_path+'/results/global_cores_others_hub_score.svg', transparent=True)
plt.close()
fig.clf()
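
The effect size printed next to each test is the two-sample Kolmogorov-Smirnov statistic D. It is the largest vertical distance between the empirical cumulative distribution functions of the two samples, so it lies in [0, 1]. Here is a minimal NumPy sketch of D; `ks_statistic` is a hypothetical helper, not part of SciPy.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # both empirical CDFs only jump at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

print(ks_statistic([1, 2, 3], [4, 5, 6]))        # 1.0: fully separated samples
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5: partially overlapping samples
```

For the same inputs this should agree with `stats.ks_2samp(a, b).statistic`, which the notebook unpacks as `d, _`. D = 0 means identical empirical distributions. D = 1 means the two samples do not overlap at all.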

## Cores control the flow of cortical activity

So far, we used structural (graph) measures of neurons selected by looking at their reproducibility (a form of regular activity). In a sense, we were already crossing structural and dynamical information about the network.    

However, we could push this further.    

### Structural underpinnings of clusters (panel 2XXX)

What is the origin of pattern reproducibility?    
We can look at the underlying connectivity structure. 

Given a state in which a certain cell fires, we should be able to predict which cell will fire next based on the connectivity graph, e.g. the most frequent (shortest or simple) path, or max flow (assigning a capacity proportional to the spine volume).   

Take the full list of cells, with the full list of synapses and spine volumes. 
For each event, compute (a combination of) the max flow (considering spine volumes as weights), and number of shortest paths, between the current state (reduced to just one cid, if possible) and the next state (just one cid).    
Do we get a good prediction?    
Do core neurons have more shortest paths leading to them than others? If so, this could also be quantified with PageRank (already implemented below).

Another approach is to look at the minimum cuts of the flow (also already implemented below).
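
The max-flow idea can be illustrated without igraph. In the notebook, `spinesgraph.maxflow(source, target, capacity='spine_vol_um3')` returns the flow value together with a minimum cut.

The sketch below is a pure-Python Edmonds-Karp implementation, which finds augmenting paths by breadth-first search. It runs on a hypothetical four-cell circuit whose capacities stand in for spine volumes (µm³). Parallel edges, i.e. several synapses between the same two cells, simply add up their capacities. By the max-flow/min-cut theorem, the flow value equals the total capacity of the returned cut. `max_flow_min_cut` is a hypothetical helper.

```python
from collections import defaultdict, deque

def max_flow_min_cut(edges, source, target, eps=1e-12):
    """Edmonds-Karp maximum flow from `source` to `target`.

    `edges` is a list of (pre, post, capacity) tuples; parallel edges add up their
    capacities. Returns the flow value and the (pre, post) pairs crossing a minimum cut.
    """
    residual = defaultdict(lambda: defaultdict(float))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0.0  # make sure the reverse residual edge exists
    flow_value = 0.0
    while True:
        # breadth-first search for a shortest augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            break  # no augmenting path left: the flow is maximal
        # walk back from the target to collect the path and its bottleneck
        path, v = [], target
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow_value += bottleneck
    # the last search reached exactly the source side of a minimum cut
    reachable = set(parent)
    cut = sorted({(u, v) for u, v, _ in edges if u in reachable and v not in reachable})
    return flow_value, cut

# hypothetical toy circuit: capacities stand in for spine volumes (um^3)
toy_edges = [(0, 1, 0.5), (0, 2, 0.3), (1, 2, 0.2), (1, 3, 0.3), (2, 3, 0.4)]
value, cut = max_flow_min_cut(toy_edges, source=0, target=3)
print(round(value, 6), cut)  # 0.7 [(1, 3), (2, 3)]
```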

 

In [29]:
# print(syn_spines_df.shape)
# print(syn_spines_df.columns)
# print(syn_spines_df.head())

pre_root_ids = syn_spines_df.pre_root_id.unique().tolist()
# print("pre_root_ids",len(pre_root_ids))
post_root_ids = syn_spines_df.post_root_id.unique().tolist()
# print("post_root_ids",len(post_root_ids))

# print("intersections pre post")
# print( len(list(set(pre_root_ids).intersection(post_root_ids))) )
# print( len(list(set(post_root_ids).intersection(pre_root_ids))) )
# # so, post_root_ids contain all others

# print("intersections ophys pre post")
# print( len(list(set(ophys_cell_ids).intersection(post_root_ids))) )
# print( len(list(set(ophys_cell_ids).intersection(pre_root_ids))) )

# creating a larger graph containing all synapses with measured spines 
# def index_from_postid(row):
#     if row['post_root_id'] in ophys_cell_ids:
#         return ophys_cell_ids.index(row['post_root_id'])
#     else:
#         return post_root_ids.index(row['post_root_id'])+1000
# def index_from_preid(row):
#     if row['pre_root_id'] in ophys_cell_ids:
#         return ophys_cell_ids.index(row['pre_root_id'])
#     else:
#         return post_root_ids.index(row['pre_root_id'])+1000
# # print(syn_spines_df.apply(lambda row: index_from_postid(row), axis=1))
# syn_spines_df['source'] = syn_spines_df.apply(lambda row: index_from_preid(row), axis=1)
# syn_spines_df['target'] = syn_spines_df.apply(lambda row: index_from_postid(row), axis=1)
# print(syn_spines_df[["source", "pre_root_id"]])
# print(list(zip(ophys_cell_ids,list(ophys_cell_indexes))))
# print(sorted(syn_spines_df.source.unique()))
# print(sorted(syn_spines_df.target.unique()))
# print(syn_spines_df.columns)

spinesgraph = ig.Graph.DataFrame(syn_spines_df[["pre_root_id", "post_root_id", "spine_vol_um3"]], directed=True)
# plotting
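# spinesgraph vertices follow the root ids found in syn_spines_df (and are more numerous than ophys_cell_ids),
# so cores are flagged by matching vertex names (root ids) rather than by position in ophys_cell_ids
core_root_ids = [ophys_cell_ids[i] for i in core_indexes]
spinesgraph.vs["is_core"] = [int(any(name == cid for cid in core_root_ids)) for name in spinesgraph.vs["name"]]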
color_dict = {0: "gray", 1: "green"}
ig.plot(spinesgraph, exp_path+'/results/spines_ring.svg', layout=spinesgraph.layout("circle"),
        edge_curved=0.2,
        edge_width = spinesgraph.es['spine_vol_um3'],
        edge_arrow_size=0.1,
        vertex_size=5,
        vertex_color=[color_dict[is_core] for is_core in spinesgraph.vs["is_core"]],
        vertex_frame_color=[color_dict[is_core] for is_core in spinesgraph.vs["is_core"]],
        margin=50)
# print(spinesgraph)

# for each event in a cluster, take each cellindex as source and search the graph for the path to the next cell firing
print("... cluster pathways")
# print(sorted_events_indexes)

# we do not want the sorted_events_indexes... we want the id/idx in the temporal sequence

for cluster_k,events_cellindexes in sorted_events_indexes.items():
    print()
    # level of sequence regularity
    middle_paths = []
    if cluster_k == 'gray':
        continue
    print(cluster_k,events_cellindexes)
    for vnt in events_cellindexes:
        for posi,vidj in enumerate(vnt[1:]):
            vidi = vnt[posi]
            
            print(vidi, vidj, ophys_cell_ids[vidi],ophys_cell_ids[vidj])
            
            # Take the maximum flow between the source and target vertices
            mfres = spinesgraph.maxflow(spinesgraph.vs.find(name=ophys_cell_ids[vidi]).index, spinesgraph.vs.find(name=ophys_cell_ids[vidj]).index, capacity='spine_vol_um3')
            # returns a Flow object with the following attributes:
            # graph - the graph on which this flow is defined
            # value - the value (capacity) of the maximum flow between the given vertices
            # flow - the flow values on each edge. For directed graphs, this is simply a list where element i corresponds to the flow on edge i.
            # cut - edge IDs in the minimal cut corresponding to the flow.
            # partition - vertex IDs in the parts created after removing edges in the cut
            # es - an edge selector restricted to the edges in the cut.
            print(mfres)
            print(mfres.value)
            # print(mfres.flow)
            # print(sum(mfres.flow))
            
            # Iterate over the edges in the minimal cut (mfres.es).
            # count the edges sourcing from cores, and those targeting cores. Which is more?
            for edge in mfres.es:
                print(edge.source, edge.target, spinesgraph.vs[edge.target])

            # exploratory run: only the first transition of the first event of each cluster is inspected
            break
        break
        
            # spaths = spinesgraph.get_all_shortest_paths(spinesgraph.vs.find(name=ophys_cell_ids[vidi]), to=spinesgraph.vs.find(name=ophys_cell_ids[vidj]), mode='out')
            # spaths = spinesgraph.get_all_simple_paths(spinesgraph.vs.find(name=ophys_cell_ids[vidi]), spinesgraph.vs.find(name=ophys_cell_ids[vidj]), mode='out')
            # print(spaths)

            # for spts in spaths:
            #     # print(spts, spts[1:-1])
            #     middle_paths.extend(spts[1:-1])

# print(middle_paths)

... cluster pathways


#ff5f30 [[107, 35, 39, 41, 94], [94, 39, 42, 45, 13, 29, 41, 21, 92, 35], [35, 74, 10, 94, 31], [86, 39, 21, 53, 18, 94, 35, 10]]
107 35 648518346349537741 648518346349539862
Graph flow (68 edges, 41 vs 40 vertices, value=7.1815)
7.181497876857552
64 27 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a0>, 27, {'name': 6.485183463495313e+17, 'is_core': 0})
64 69 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a0>, 69, {'name': 6.485183463495384e+17, 'is_core': 0})
64 75 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a0>, 75, {'name': 6.485183463495395e+17, 'is_core': 0})
64 75 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a0>, 75, {'name': 6.485183463495395e+17, 'is_core': 0})
64 75 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a0>, 75, {'name': 6.485183463495395e+17, 'is_core': 0})
64 69 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a0>, 69, {'name': 6.485183463495384e+17, 'is_core': 0})
64 80 igraph.Vertex(<igraph.Graph object at 0x7f362a1867a

### Flow of cores vs others (panel 2J)

To understand how core centrality could affect population events, we considered the flow. Here this means the number and identity of the connections that must be cut to interrupt the circuit between the first and the last firing neuron of each population event (e.g. the subgraphs made by neurons active in the events depicted in Fig. 1E).
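
A set of edges whose removal leaves no directed path from the first to the last firing cell is an s-t cut. It is minimal if no edge in it can be spared. We understand igraph's `Graph.all_st_cuts` (for directed graphs) to list every such minimal cut once.

For a toy graph, the minimal cuts can be enumerated by brute force. Each minimal cut is the set of edges leaving a vertex set S that contains s but not t, and S must satisfy two conditions:
- every vertex of S is reachable from s inside S;
- the head of every cut edge still reaches t outside S.

The sketch below lists the cuts and counts each cut edge once, as touching a core or not. It is exponential in the number of vertices, so it is for toy graphs only. All names and the graph are hypothetical.

```python
from itertools import combinations

def reachable(adj, start, allowed):
    """Vertices reachable from `start` moving only through vertices in `allowed`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def minimal_st_cuts(vertices, edges, s, t):
    """Brute-force list of the minimal s-t edge cuts of a small directed graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    middle = [v for v in vertices if v not in (s, t)]
    cuts = []
    for k in range(len(middle) + 1):
        for extra in combinations(middle, k):
            S = {s, *extra}
            T = set(vertices) - S
            # S must be exactly what s still reaches once the cut edges are removed
            if reachable(adj, s, S) != S:
                continue
            cut = [(u, v) for u, v in edges if u in S and v in T]
            # every cut edge must be necessary: its head still reaches t inside T
            if cut and all(t in reachable(adj, v, T) for _, v in cut):
                cuts.append(cut)
    return cuts

vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]  # first firing cell 0, last firing cell 3
cores = {2}
cuts = minimal_st_cuts(vertices, edges, s=0, t=3)
n_core = sum(1 for cut in cuts for u, v in cut if u in cores or v in cores)
n_other = sum(1 for cut in cuts for u, v in cut if u not in cores and v not in cores)
print(len(cuts), n_core, n_other)  # 4 5 4
```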

In [None]:
print('... flow between beginning and end of event cells')
# Flow
# Returns all the cuts between the source and target vertices in a directed graph.
# This function lists all edge-cuts between a source and a target vertex. Every cut is listed exactly once.
core_edges = []   # one entry per cut edge touching at least one core
other_edges = []  # one entry per cut edge between two non-core cells
for sts,stscol in zip(source_target_cidx,source_target_color):
    cuts = dgraph.all_st_cuts(source=sts[0], target=sts[1])
    for cut in cuts:
        for edge in cut.es:
            source_vertex_id = edge.source
            target_vertex_id = edge.target
            if source_vertex_id in core_indexes:
                core_edges.append(source_vertex_id)
            elif target_vertex_id in core_indexes:
                core_edges.append(target_vertex_id)
            else:
                # count the edge once (not once per endpoint), like the core edges
                other_edges.append((source_vertex_id, target_vertex_id))
# clusters_cores_by_color
cores_edges_count = len(core_edges)
others_edges_count = len(other_edges)
print("    cores in the edges removed to stop the flow:",cores_edges_count)
print("    others in the edges removed to stop the flow:",others_edges_count)

# print(core_edges)
x = np.array(["cores", "others"])
y = np.array([cores_edges_count, others_edges_count])
fig, ax = plt.subplots()
plt.bar(x, y, color=['forestgreen','silver'])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Count of cutting-flow edges')
plt.xticks([0, 1], ["core\n(n={:d})".format(cores_edges_count), "other\n(n={:d})".format(others_edges_count)])
fig.savefig(exp_path+'/results/global_cores_others_flow.svg', transparent=True)
plt.close()
fig.clf()

### Cores are crossroads of multiple paths

If cores are part of these paths more often than others, they might be central not by virtue of their degree, but by the number of event trajectory paths passing through them. This is not just any path, as in betweenness or hub score.    
Their **PageRank** is a closer approximation of this kind of centrality.

In [None]:
print('... PageRank centrality')
pagerank_cores = np.array(dgraph.personalized_pagerank(vertices=core_indexes, directed=True, damping=0.85, reset="is_core"))
pagerank_others = np.array(dgraph.personalized_pagerank(vertices=other_indexes, directed=True, damping=0.85, reset="is_core"))
# description
print("    cores: "+str(stats.describe(pagerank_cores)) )
print("    others: "+str(stats.describe(pagerank_others)) )
# significance
print("    Kruskal-Wallis test:  %.3f p= %.3f" % stats.kruskal(pagerank_cores, pagerank_others))
d,_ = stats.ks_2samp(pagerank_cores, pagerank_others) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)

fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(pagerank_cores))
plt.scatter(xs, pagerank_cores, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(pagerank_others))
plt.scatter(xs, pagerank_others, alpha=0.3, c='silver')
vp = ax.violinplot([pagerank_cores,pagerank_others], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('PageRank')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(pagerank_cores)), "other\n(n={:d})".format(len(pagerank_others))])
fig.savefig(exp_path+'/results/global_cores_others_pagerank.svg', transparent=True)
plt.close()
fig.clf()

---
## Supplementary figure 3
   
To keep cores within the attractor framework, core activity could be sustained by indirect synaptic feedback through highly connected secondary paths.   
If this attractor picture holds, one would expect core neurons to have shorter paths or cycles than others.

### Shortest paths of cores and others (panel S3A)
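The shortest paths here are unweighted and directed (`weights=None, mode='out'`), so a breadth-first search finds them. Two details matter when turning igraph's paths into lengths:
- `get_shortest_paths(..., output='vpath')` returns vertex lists, so a path with k edges has k + 1 entries;
- to our knowledge, targets that cannot be reached come back as empty lists (with a warning), which should be skipped rather than counted as length 0.

Below is a pure-Python sketch of within-group shortest-path lengths, measured in edges and skipping unreachable pairs. The helper names and the graph are hypothetical.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: number of edges on a shortest directed path from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def within_group_path_lengths(adj, group):
    """Shortest-path lengths (in edges) over ordered pairs of distinct group members,
    skipping pairs without any directed path."""
    lengths = []
    for s in group:
        dist = hop_distances(adj, s)
        lengths.extend(dist[t] for t in group if t != s and t in dist)
    return lengths

adj = [[1], [2], [0, 3], []]  # hypothetical graph: 0->1->2->0 and 2->3
print(within_group_path_lengths(adj, [0, 2]))  # [2, 1]
print(within_group_path_lengths(adj, [3, 0]))  # [3]: 3 cannot reach 0, so that pair is skipped
```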

In [None]:
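import math  # np.math was only an alias of the standard-library math module
print("... approx. number of simple paths between two vertices of a complete graph of the same size (n=112):", math.factorial(112-2)*math.e)
print('... shortest path lengths between cores, and between others')
core_shortestpaths = []
for coreidx in core_indexes:
    othercores = list(core_indexes)
    othercores.remove(coreidx)
    shrtpth = dgraph.get_shortest_paths(coreidx, to=othercores, weights=None, mode='out', output='vpath')
    for strp in shrtpth:
        if strp:  # unreachable targets come back as empty paths: skip them instead of counting length 0
            core_shortestpaths.append(len(strp))  # number of vertices on the path, source included
other_shortestpaths = []
for otheridx in other_indexes:
    otherothers = list(other_indexes)
    otherothers.remove(otheridx)
    shrtpth = dgraph.get_shortest_paths(otheridx, to=otherothers, weights=None, mode='out', output='vpath')
    for strp in shrtpth:
        if strp:  # unreachable targets come back as empty paths: skip them instead of counting length 0
            other_shortestpaths.append(len(strp))  # number of vertices on the path, source included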
print("    cores shortest paths: "+str(stats.describe(core_shortestpaths)) )
print("    others shortest paths: "+str(stats.describe(other_shortestpaths)) )
print("    equal variances? "+str(stats.levene(core_shortestpaths, other_shortestpaths)) )
# significativity
print("    Welch t test:  %.3f p= %.3f" % stats.ttest_ind(core_shortestpaths, other_shortestpaths, equal_var=False))
d,_ = stats.ks_2samp(core_shortestpaths, other_shortestpaths) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(core_shortestpaths))
plt.scatter(xs, core_shortestpaths, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(other_shortestpaths))
plt.scatter(xs, other_shortestpaths, alpha=0.3, c='silver')
vp = ax.violinplot([core_shortestpaths,other_shortestpaths], widths=0.3, showextrema=False, showmedians=True)
for pc in vp['bodies']:
    pc.set_edgecolor('black')
for pc,cb in zip(vp['bodies'],['#228B224d','#D3D3D34d']):
    pc.set_facecolor(cb)
vp['cmedians'].set_color('orange')
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Shortest path length')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(core_shortestpaths)), "other\n(n={:d})".format(len(other_shortestpaths))])
fig.savefig(exp_path+'/results/global_cores_others_shortestpath.svg', transparent=True)
plt.close()
fig.clf()
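
The first print of the cell above uses e·(n − 2)! with n = 112 as the number of paths in a complete graph. Here is why. For a fixed ordered pair of vertices in a complete graph on n vertices, a simple path picks an ordered sequence of k of the other n − 2 vertices as intermediates. The count is therefore the sum over k = 0..n−2 of (n−2)!/(n−2−k)!, which equals floor(e·(n−2)!) for n ≥ 3.

A brute-force check for small n is below (the helper names are hypothetical). The notebook's value omits the floor, which is negligible at n = 112.

```python
import itertools
import math

def count_simple_paths_brute_force(n):
    """Count simple paths from vertex 0 to vertex 1 in the complete graph on n vertices."""
    intermediates = range(2, n)
    return sum(
        1
        for k in range(n - 1)  # k = 0 .. n-2 intermediate vertices
        for _ in itertools.permutations(intermediates, k)
    )

def count_simple_paths_closed_form(n):
    """Sum over k of (n-2)! / (n-2-k)!, which equals floor(e * (n-2)!) for n >= 3."""
    m = n - 2
    return sum(math.factorial(m) // math.factorial(m - k) for k in range(m + 1))

for n in range(3, 9):
    exact = count_simple_paths_brute_force(n)
    assert exact == count_simple_paths_closed_form(n) == math.floor(math.e * math.factorial(n - 2))
    print(n, exact)  # 2, 5, 16, 65, 326, 1957 for n = 3..8
```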

### Cycles between cores or others (panel S3B)

Cycles are built by starting from a core (or other) vertex and iteratively extending paths through neighbors, up to a maximum length. A path becomes a cycle when its next vertex is the starting one.

In [None]:
print('... cycles')
# breadth first search of paths and unique cycles
def get_cycles(adj, paths, maxlen):
    # tracking the actual path length:
    maxlen -= 1
    nxt_paths = []
    # iterating over all paths:
    for path in paths['paths']:
        # iterating neighbors of the last vertex in the path:
        for nxt in adj[path[-1]]:
            # attaching the next vertex to the path:
            nxt_path = path + [nxt]
            if path[0] == nxt and min(path) == nxt:
                # the next vertex is the starting vertex, we found a cycle
                # we keep the cycle only if the starting vertex has the
                # lowest vertex id, to avoid having the same cycles
                # more than once
                paths['cycles'].append(nxt_path)
                # if you don't need the starting vertex
                # included at the end:
                # paths['cycles'].append(path)
            elif nxt not in path:
                # keep the path only if we don't create
                # an internal cycle in the path
                nxt_paths.append(nxt_path)
    # paths grown by one step:
    paths['paths'] = nxt_paths
    if maxlen == 0:
        # the final return when maximum search length reached
        return paths
    else:
        # recursive return, to grow paths further
        return get_cycles(adj, paths, maxlen)
# Comparison of core based cycles vs other based cycles
maxlen = 10 # the maximum length to limit computation time
# creating an adjacency list
# note: v.neighbors() defaults to mode="all", so edge direction is ignored here
adj = [[n.index for n in v.neighbors()] for v in dgraph.vs]
# recursive search of cycles
# for each core vertex as candidate starting point
core_cycles = []
for start in core_indexes:
    core_cycles += get_cycles(adj,{'paths': [[start]], 'cycles': []}, maxlen)['cycles']
print("    # core-based cycles:", len(core_cycles) )
# count the length of loops involving 1 core
core_cycles_lens = [len(cycle) for cycle in core_cycles]
print("    core-based cycles length: "+str(stats.describe(core_cycles_lens)) )

other_cycles = []
for start in other_indexes:
    other_cycles += get_cycles(adj,{'paths': [[start]], 'cycles': []}, maxlen)['cycles']
print("    # other-based cycles:", len(other_cycles) )
# count the length of loops involving 1 other (non-core)
other_cycles_lens = [len(cycle) for cycle in other_cycles]
print("    other-based cycles length: "+str(stats.describe(other_cycles_lens)) )

d,_ = stats.ks_2samp(core_cycles_lens, other_cycles_lens) # non-parametric measure of effect size [0,1]
print('    Kolmogorov-Smirnov Effect Size: %.3f' % d)
# all cycles by type
fig, ax = plt.subplots()
xs = np.random.normal(1, 0.04, len(core_cycles_lens))
plt.scatter(xs, core_cycles_lens, alpha=0.3, c='forestgreen')
xs = np.random.normal(2, 0.04, len(other_cycles_lens))
plt.scatter(xs, other_cycles_lens, alpha=0.3, c='silver')
bp = ax.boxplot([core_cycles_lens,other_cycles_lens], notch=0, sym='', showcaps=False, zorder=10)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.ylabel('Cycles length')
plt.xticks([1, 2], ["core\n(n={:d})".format(len(core_cycles_lens)), "other\n(n={:d})".format(len(other_cycles_lens))])
fig.savefig(exp_path+'/results/global_cores_others_cyclelens.png', transparent=True, dpi=1500)
plt.close()
fig.clf()
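
`get_cycles` grows paths breadth-first. It stores a path as a cycle only when the path closes on its starting vertex and that vertex has the lowest id on the path, so each cycle is reported from a single starting vertex. The adjacency list above is built with `v.neighbors()`, whose default mode in python-igraph is `"all"`, i.e. edge direction is ignored.

The sketch below re-implements the same growth rule iteratively in pure Python. `find_cycles`, `all_cycles` and the toy graphs are hypothetical. It compares out-neighbour lists with direction-ignoring lists on a directed triangle 0→1→2→0 with an extra edge 2→3.

```python
def find_cycles(adj, start, maxlen):
    """Cycles through `start` with at most `maxlen` edges, grown breadth-first;
    a cycle is kept only if `start` is its lowest-numbered vertex."""
    paths, cycles = [[start]], []
    for _ in range(maxlen):
        grown = []
        for path in paths:
            for nxt in adj[path[-1]]:
                if nxt == path[0] and min(path) == nxt:
                    cycles.append(path + [nxt])  # closed on the lowest vertex: keep it
                elif nxt not in path:
                    grown.append(path + [nxt])   # extend without internal repeats
        paths = grown
    return cycles

def all_cycles(adj, maxlen):
    return [c for start in range(len(adj)) for c in find_cycles(adj, start, maxlen)]

out_adj = [[1], [2], [0, 3], []]            # out-neighbours: direction respected
all_adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # in- and out-neighbours: direction ignored
print(all_cycles(out_adj, 4))  # [[0, 1, 2, 0]]
print(all_cycles(all_adj, 4))  # [[0, 1, 0], [0, 2, 0], [0, 1, 2, 0], [0, 2, 1, 0], [1, 2, 1], [2, 3, 2]]
```

With direction ignored, every edge also yields a two-step back-and-forth "cycle" such as [0, 1, 0], and the triangle is reported once per traversal direction. Building the list with `dgraph.get_adjlist(mode="out")` would keep directed cycles only.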

### Clusters of events are not reproducible trajectories of the population dynamics

Clusters of population events are found by correlating population vectors, which retain only the cell IDs and ignore the time of firing.    
We can also take firing time into account.

Each recorded frame (~67 ms) is an instantaneous population state defined by all its cells. The firing of 112 of them is known; the others are unknown.    
A sequence of population states is a trajectory in the population dynamical state space.    
In this space, clusters of reproducible population events are represented by reproducible trajectories. 

We can compare the event trajectories visited within a cluster by comparing their patterns.    
Events are made of cells firing (often multiple times) during the event interval, so each event trajectory is a 2D (cells × time) submatrix of the population raster plot. Comparing these submatrices across the events of a cluster gives a measure of trajectory reproducibility.
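
A trajectory can be built as a binary cells × frames matrix aligned on the first spike of the event. Note how `np.corrcoef(itrajectory, jtrajectory)` behaves with two 2D arrays:
- it treats every row (cell) of both matrices as a separate variable;
- it returns the full correlation matrix between all of them;
- so its mean also includes the diagonal ones and the cell pairs within the same event.

One alternative is a single Pearson correlation between the two flattened matrices, comparing the events bin by bin. The sketch below assumes spike times are integer frame indexes. `event_trajectory` and `trajectory_similarity` are hypothetical helpers.

```python
import numpy as np

def event_trajectory(spike_trains, n_cells, n_frames):
    """Binary cells x frames matrix, with time measured from the event's first spike."""
    t0 = min(min(train) for train in spike_trains if len(train) > 0)
    traj = np.zeros((n_cells, n_frames), dtype=float)
    for cell, train in enumerate(spike_trains):
        for t in train:
            traj[cell, t - t0] = 1.0
    return traj

def trajectory_similarity(a, b):
    """Pearson correlation between two same-shape trajectories, compared bin by bin."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# hypothetical events of one cluster: spike frames per cell
ev1 = event_trajectory([[10, 12], [11]], n_cells=2, n_frames=3)
ev2 = event_trajectory([[20, 22], [21]], n_cells=2, n_frames=3)  # same pattern, shifted in time
ev3 = event_trajectory([[30], [31, 32]], n_cells=2, n_frames=3)
print(trajectory_similarity(ev1, ev2))  # 1.0 (up to rounding)
print(trajectory_similarity(ev1, ev3))  # about 0.333
```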

In [None]:
# print(cluster_events_spiketrains) # already expressed in integer (ms)

print("... sequence internal consistency")

# cycle over clusters
for cluster_k, events_cellindexes in sorted_events_indexes.items():
    if cluster_k == 'gray':
        continue
    print()

    # We want to compare the trajectories of this cluster.
    # Trajectories should have the same shape, so that they can be compared element by element (e.g. subtracted or correlated).
    
    # Finding the common-shape trajectory
    # n is the maximal number of cells participating in the events of this cluster
    maxcells = max(events_cellindexes, key = lambda i: len(i))
    # m is the largest interval between the first and the last spike time of a single event in the cluster
    events_spiketrains = cluster_events_spiketrains[cluster_k]
    # print(events_spiketrains)
    maxinterval = 0
    for evt_spktrains in events_spiketrains:
        evt_times = [x for xs in evt_spktrains for x in xs]  # spike times of all cells in the event
        mint = np.amin(evt_times)
        maxt = np.amax(evt_times)
        if maxt-mint > maxinterval:
            maxinterval = maxt-mint
    print("    common trajectory pattern with n cells:", len(maxcells), " and m intervals:", maxinterval)
    
    # cluster trajectories, one per event, all same shape
    cluster_trajectories = []
    for evt_indexes,evt_spktrains in zip(events_cellindexes,events_spiketrains):
        # create empty trajectory of shape n cell, m interval
        trajectory = np.zeros((len(maxcells),maxinterval+1))
        mint = np.amin([x for xs in evt_spktrains for x in xs]) # local start time of the event, i.e. column 0 of the trajectory
        for ncell,spktrain in enumerate(evt_spktrains):
            # spike times as column indexes relative to the first spike of the event
            trajectory[ncell][np.asarray(spktrain, dtype=int) - int(mint)] = 1
        cluster_trajectories.append(trajectory)
    
    # correlation between trajectories
    # very simple (probably too much) measure of trajectory correspondence
    trajR = []
    for itr,itrajectory in enumerate(cluster_trajectories):
        for jtr,jtrajectory in enumerate(cluster_trajectories):
            if itr!=jtr:
                trajR.append( np.nanmean(np.corrcoef(itrajectory,jtrajectory)) )
    print("    correlation across all trajectories: {:1.3f}±{:1.2f}".format(np.mean(trajR),np.std(trajR)))

    print("... searching for repeating sequences in the ordered firing of cell IDs")
    size = 2
    # size = 3
    # windows of consecutive cell indexes, taken within each event so that no window spans two events
    windows = [
        tuple(evt[i:i + size])
        for evt in events_cellindexes
        for i in range(len(evt) - size + 1)
    ]
    counter = collections.Counter(windows)
    print("    core indexes:", core_indexes)
    for window, count in counter.items():
        if count > 1:
            print("   ",window, count)
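
When counting repeated runs of cell indexes, windows should stay inside single events. If the events of a cluster are concatenated first, a pair formed by the last cell of one event and the first cell of the next can be counted as a spurious repeat. Below is a dependency-free sketch that counts runs within events only; `repeated_runs` and the events are hypothetical.

```python
from collections import Counter

def repeated_runs(events, size=2):
    """Runs of `size` consecutive cell indexes seen more than once, counted within events only."""
    counts = Counter(
        tuple(event[i:i + size])
        for event in events
        for i in range(len(event) - size + 1)
    )
    return {run: n for run, n in counts.items() if n > 1}

events = [[1, 2], [3, 1], [2, 5]]  # hypothetical events of one cluster (cell indexes in firing order)
print(repeated_runs(events))       # {}: no pair repeats inside events
concatenated = [c for event in events for c in event]
print(repeated_runs([concatenated]))  # {(1, 2): 2}: one of the two (1, 2) pairs spans two events
```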
        