## Number of synapses per connection

In this notebook you will analyze and validate one of the parameters of the connectome: the number of synapses per connection.

Two cells A and B are connected if the axon of one makes at least one synapse on the other (gap junctions are not considered).

A connection has a direction, so between A and B there are two possible pathways: A->B means that the axon of A forms the synapse(s) on B, while B->A means that the axon of B forms the synapse(s) on A.

Moreover, a connection may consist of one or multiple synapses, which affects both the anatomy and the physiology of the network.
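To make these definitions concrete, here is a minimal sketch with a made-up list of synapses, each described only by its (presynaptic cell, postsynaptic cell) pair. A connection is an ordered pair with at least one synapse, and its number of synapses is how many synapses share that pair; note that A->B and B->A are counted separately.

```python
from collections import Counter

# Hypothetical synapses, each identified only by its (pre, post) cell pair.
synapses = [
    ("A", "B"), ("A", "B"), ("A", "B"),  # three synapses from A onto B
    ("B", "A"),                          # one synapse from B onto A
    ("A", "C"), ("A", "C"),              # two synapses from A onto C
]

# Each distinct ordered (pre, post) pair is one connection;
# the count is its number of synapses.
nsyn_per_connection = Counter(synapses)

print(nsyn_per_connection[("A", "B")])  # 3
print(nsyn_per_connection[("B", "A")])  # 1
print(len(nsyn_per_connection))         # 3 connections in total
```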

In this notebook, you will analyze the number of synapses per connection in all the possible pathways.

---

Import the required Python packages.

In [None]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn

from bluepysnap import Circuit
from bluepysnap.bbp import Cell

Load the circuit and select the neuron node population and the chemical-synapse edge population.

In [None]:
circuit_path = '/home/data-bbp/20191017/circuit_config.json'
circuit = Circuit(circuit_path)
cells = circuit.nodes["hippocampus_neurons"]
conn = circuit.edges["hippocampus_neurons__hippocampus_neurons__chemical"]

### Analysis

Initialize the data structures that will store the results.

Since all the possible pathways form a 2D matrix (presynaptic m-type by postsynaptic m-type), it is convenient to store the means in one matrix and the standard deviations in another.

Furthermore, you are going to compare the results with values extracted from the literature. Among those values is the number of synapses per connection for parvalbumin-positive (PV+) cells, a group that includes SP_PVBC, SP_BS, and SP_AA. The PV group is already defined in the circuit, but you have to include it in the matrices.
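The PV entries summarize connections involving three m-types. One way to combine them (an assumption here, not necessarily the method intended by the course) is to pool all sampled connections before computing the mean; averaging the three per-m-type means instead gives each m-type equal weight regardless of how many connections it contributes. With made-up synapse counts, the two approaches give different results:

```python
import numpy as np

# Hypothetical synapse counts per sampled connection for the three PV+ m-types.
samples = {
    "SP_PVBC": np.array([4, 6, 5, 7, 8, 6]),
    "SP_BS": np.array([2, 3]),
    "SP_AA": np.array([9]),
}

# Option 1: pool every sampled connection, then take the mean.
pooled = np.concatenate(list(samples.values()))
pooled_mean = pooled.mean()

# Option 2: average the per-m-type means (each m-type weighs the same).
mean_of_means = np.mean([s.mean() for s in samples.values()])

print(pooled_mean, mean_of_means)
```

Pooling weights each m-type by the number of connections sampled from it, which is usually what "mean over PV+ connections" refers to.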

In [None]:
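# property_values returns an unordered set; sort it so the matrices have a stable, ordered index
mtypes = sorted(cells.property_values(Cell.MTYPE))
# rows: presynaptic m-types; columns: postsynaptic m-types plus the PV group
model_mean = pd.DataFrame(index=mtypes, columns=mtypes + ['PV'], dtype=float)
model_std = pd.DataFrame(index=mtypes, columns=mtypes + ['PV'], dtype=float)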
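A small sketch of the layout these matrices are meant to have, using made-up m-types instead of the circuit ones: rows are presynaptic m-types, columns are postsynaptic m-types plus the extra PV column, and a single entry is addressed with `.loc[pre, post]`. The m-types are sorted into a list because recent pandas versions reject an unordered `set` as index or columns.

```python
import numpy as np
import pandas as pd

# Hypothetical m-types; in the notebook they come from the circuit.
mtypes = sorted({"SP_PC", "SP_PVBC", "SP_AA"})

# Rows: presynaptic m-types. Columns: postsynaptic m-types plus the PV group.
model_mean = pd.DataFrame(np.nan, index=mtypes, columns=mtypes + ["PV"], dtype=float)

# Store a made-up value for the pathway SP_PVBC -> SP_PC.
model_mean.loc["SP_PVBC", "SP_PC"] = 6.0

print(model_mean.shape)                    # (3, 4)
print(model_mean.loc["SP_PVBC", "SP_PC"])  # 6.0
```

Using `.loc[row, column]` in a single call also avoids chained indexing such as `df[col][row] = value`, which may silently write to a copy in recent pandas versions.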

The analysis can be quite expensive, so it is better to limit the number of connections sampled per pathway.

Furthermore, since you will repeat the same analysis many times, it is convenient to create a helper function.

In [None]:
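nsample = 100  # maximum number of connections sampled per pathway

def sample_nsyn(pre, post):
    """Return the synapse counts of (at most) the first nsample pre -> post connections."""
    # with return_edge_count=True, each item is (source node, target node, number of edges)
    it = conn.iter_connections(pre, post, return_edge_count=True)
    return np.array([p[2] for p in itertools.islice(it, nsample)])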
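To see what the helper returns without loading a circuit, the sketch below feeds the same `islice` logic with a hypothetical generator, `fake_iter_connections`, that mimics the `(source, target, count)` tuples produced with `return_edge_count=True`. The result is a 1D array with at most `nsample` elements, and an empty array when the pathway has no connections. Because `islice` keeps the *first* items, the sample is only as random as the iteration order; bluepysnap's `iter_connections` also accepts a `shuffle` argument, but check the documentation of your installed version before relying on it.

```python
import itertools
import numpy as np

def fake_iter_connections(n_connections):
    """Hypothetical stand-in for conn.iter_connections(..., return_edge_count=True):
    yields (source_id, target_id, n_synapses) tuples."""
    for i in range(n_connections):
        yield (i, i + 1000, 1 + i % 5)

def sample_nsyn_from(iterator, nsample):
    """Hypothetical variant of sample_nsyn that takes the iterator directly."""
    # keep only the third element (synapse count) of the first nsample tuples
    return np.array([p[2] for p in itertools.islice(iterator, nsample)])

few = sample_nsyn_from(fake_iter_connections(3), nsample=100)
many = sample_nsyn_from(fake_iter_connections(1000), nsample=100)
empty = sample_nsyn_from(fake_iter_connections(0), nsample=100)

print(few)         # [1 2 3]
print(len(many))   # 100
print(empty.size)  # 0
```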

Here, you run the analysis.

Note that the function sample_nsyn returns a 1D array with the number of synapses of each sampled connection (at most nsample of them) for a given pathway; the array is empty if the pathway has no connections.

From this array, you calculate the mean and standard deviation and store them in the result matrices. The loop prints each presynaptic m-type once it has been processed, so you can follow its progress.

In [None]:
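for pre_mtype in mtypes:
    for post_mtype in mtypes:
        # presynaptic cells are restricted to regions matching 'mc2.*'
        data = sample_nsyn(
            pre={Cell.MTYPE: pre_mtype, Cell.REGION: {'$regex': 'mc2.*'}},
            post={Cell.MTYPE: post_mtype}
        )
        if len(data) != 0:
            # rows are presynaptic m-types, columns are postsynaptic m-types
            model_mean.loc[pre_mtype, post_mtype] = data.mean()
            model_std.loc[pre_mtype, post_mtype] = data.std()
    # progress report: this presynaptic m-type is done
    print(pre_mtype)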
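The loop stores `data.std()`, and NumPy's `std` divides by n by default (`ddof=0`). Experimental studies often report the sample standard deviation (divisor n - 1); which convention the literature file follows is not stated here, so treat this as something to keep in mind rather than a correction. The difference is small for 100 connections but noticeable for pathways with only a few, as this made-up example shows:

```python
import numpy as np

# Hypothetical synapse counts for one pathway with only five connections.
data = np.array([3, 5, 4, 6, 7])

population_std = data.std()    # ddof=0: divides by n (what the loop stores)
sample_std = data.std(ddof=1)  # ddof=1: divides by n - 1

print(data.mean(), population_std, sample_std)
```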

In [None]:
model_mean

Plot the result as a heatmap.

Note that a white cell means that no connection was found between the two m-types (at least in the sample tested). The PV column is also white, because the loop above only iterates over the m-types of the circuit.

In [None]:
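ax = seaborn.heatmap(model_mean)

fig = plt.gcf()
fig.suptitle('Number of synapses per connection')

# seaborn draws the DataFrame columns along x and the rows along y:
# columns are postsynaptic m-types, rows are presynaptic m-types
ax.set_xlabel('postsynaptic mtype')
ax.set_ylabel('presynaptic mtype')

ax.collections[0].colorbar.set_label("# synapses")

plt.show()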
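Reading white cells off a large heatmap is error-prone. The sketch below lists the NaN (white) pathways explicitly for a made-up 2x2 matrix; in the notebook the same comprehension could be applied to `model_mean`.

```python
import numpy as np
import pandas as pd

# Hypothetical result matrix: rows are presynaptic, columns postsynaptic m-types.
model_mean = pd.DataFrame(
    [[4.0, np.nan], [np.nan, 6.0]],
    index=["SP_AA", "SP_PC"],
    columns=["SP_AA", "SP_PC"],
)

# NaN entries are the ones drawn as white cells in the heatmap.
missing = model_mean.isna()
unconnected = [
    (pre, post)
    for pre in model_mean.index
    for post in model_mean.columns
    if missing.loc[pre, post]
]
print(unconnected)  # [('SP_AA', 'SP_PC'), ('SP_PC', 'SP_AA')]
```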

### Validation

After analyzing the circuit, you can compare the model with experimental data from the literature.

The next cell loads the experimental data and puts it in a pandas dataframe.

In [None]:
bio_path = '/home/data-bbp/20191017/bioname/nsyn_per_connection_20190131.tsv'

In [None]:
# whitespace-separated file: skip its header line and keep the first four columns
df = pd.read_csv(bio_path, skiprows=1, names=['pre', 'post', 'bio_mean', 'bio_std'],
                 usecols=[0, 1, 2, 3], sep=r'\s+')
df.head()
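The file layout is assumed here rather than documented: a single header line followed by whitespace-separated columns, the first four being pre, post, mean, and std. A self-contained sketch with made-up content shows how `skiprows`, `names`, and `usecols` combine; it uses `sep=r"\s+"`, which recent pandas versions recommend over the deprecated `delim_whitespace=True`.

```python
import io
import pandas as pd

# Made-up file content: one header line, then pre, post, mean, std.
text = """pre post mean std
SP_PVBC SP_PC 6.0 2.0
SP_PC SP_PVBC 1.5 0.5
"""

df_demo = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",      # any run of whitespace separates fields
    skiprows=1,      # skip the file's own header line
    names=["pre", "post", "bio_mean", "bio_std"],
    usecols=[0, 1, 2, 3],
)
print(df_demo)
```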

As you can see, experimental data are available for only a limited number of pathways.

From the result matrices, extract only the pathways for which experimental data are available.

In [None]:
df['mod_mean'] = np.nan
df['mod_std'] = np.nan

In [None]:
for idx in df.index:
    pre = df.loc[idx, 'pre']
    post = df.loc[idx, 'post']
    # the model matrices are indexed as [presynaptic m-type, postsynaptic m-type]
    df.loc[idx, 'mod_mean'] = model_mean.loc[pre, post]
    df.loc[idx, 'mod_std'] = model_std.loc[pre, post]
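The loop above raises a `KeyError` if the experimental table names a pathway that is not a row or column of the model matrices, for example an m-type spelled differently. A hypothetical helper, `lookup`, returns NaN instead, so such pathways simply drop out of the plot; the tables below are made up.

```python
import numpy as np
import pandas as pd

def lookup(matrix, pre, post):
    """Hypothetical helper: matrix.loc[pre, post], or NaN if the pathway is absent."""
    if pre in matrix.index and post in matrix.columns:
        return matrix.loc[pre, post]
    return np.nan

# Made-up model matrix and experimental pathways.
model_mean = pd.DataFrame([[6.0]], index=["SP_PVBC"], columns=["SP_PC"])
exp = pd.DataFrame({"pre": ["SP_PVBC", "SP_CCKBC"], "post": ["SP_PC", "SP_PC"]})

exp["mod_mean"] = [lookup(model_mean, pre, post)
                   for pre, post in zip(exp["pre"], exp["post"])]
print(exp)
```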

Now plot the results. The closer the points lie to the diagonal, the closer the model is to the experimental values.

In [None]:
x = df['mod_mean'].values
y = df['bio_mean'].values
# upper end of the identity line, ignoring pathways without a model value (NaN)
lim = max(np.nanmax(x), np.nanmax(y))
diag = np.linspace(0, lim, 50)

fig, ax = plt.subplots()
fig.suptitle('Synapses per connection')
# one marker per pathway, with model (x) and experimental (y) standard deviations
ax.errorbar(x, y, xerr=df['mod_std'].values, yerr=df['bio_std'].values, fmt='o', ecolor='g', capthick=2)
# identity line: points on it match the experimental value exactly
ax.plot(diag, diag, 'k--')
ax.set_xlabel('Model (#)')
ax.set_ylabel('Experiment (#)')

plt.show()
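The diagonal plot is qualitative. A possible numerical summary, not part of the original analysis, is the model/experiment ratio per pathway together with the root-mean-square error over the pathways where both values exist. With made-up numbers:

```python
import numpy as np

# Made-up model and experimental means for four pathways; NaN marks a
# pathway with no sampled connection in the model.
model = np.array([5.0, 12.0, np.nan, 3.0])
bio = np.array([6.0, 10.0, 8.0, 3.0])

# keep only pathways where both values are available
valid = ~np.isnan(model) & ~np.isnan(bio)
ratio = model[valid] / bio[valid]  # 1.0 means the point lies on the diagonal
rmse = np.sqrt(np.mean((model[valid] - bio[valid]) ** 2))

print(ratio)
print(rmse)
```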

### Exercise #1
Calculate the average number of synapses per connection for the four classes of connections (EE, EI, IE, II), where E denotes excitatory and I inhibitory cells. Put the answers in a list called _ans\_1_ in the order (EE, EI, IE, II).

### Exercise #2
Calculate the distribution of the number of synapses per connection from SP_PVBC to SP_PC. Set _ans\_2_ to a list whose first element is the mean and whose second element is the standard deviation of that distribution.

In [None]:
# Work here

In [None]:
import json

# This cell generates the answers to paste into the submission box below.
# After defining the variables with your answers, run this cell and the next one,
# then copy and paste the output into the box below.
print(json.dumps({"ans_1": ans_1, "ans_2": ans_2}))

In [None]:
import single_cell_mooc_client as sc_mc
s = sc_mc.Submission()