# It looks like the FAFB graph doesn't have all the connections that the connections table does

In [2]:
import polars as pl

import vnc_networks

connections = vnc_networks.connections.Connections(
    CR=vnc_networks.connectome_reader.FAFB_v783()
)
connections_table = pl.from_pandas(connections.connections)

Attribute class_1 not found in the graph. Adding it.


In [2]:
print(f"connections table has {len(connections_table)} connections")
print(f"connections graph has {len(connections.graph.edges)} connections")

connections table has 2709829 connections
connections graph has 2484163 connections


It turns out that the FAFB connections table sometimes has multiple entries between the same pair of neurons.

The [FlyWire Codex download documentation](https://codex.flywire.ai/api/download) explains why: "More than one row can be present for the same pair of cells if they synapse in multiple neuropils (regions)."

In [3]:
connections_table.group_by(["start_bid", "end_bid"]).agg(
    pl.col("syn_count").len().alias("number_of_connections")
).filter(pl.col("number_of_connections") > 1).sort(
    "number_of_connections", descending=True
)

| start_bid (i64) | end_bid (i64) | number_of_connections (u32) |
|---|---|---|
| 720575940624402173 | 720575940622160705 | 12 |
| 720575940620540507 | 720575940618932763 | 11 |
| 720575940612718563 | 720575940623122125 | 11 |
| 720575940639242303 | 720575940622160705 | 11 |
| 720575940622915060 | 720575940616551029 | 10 |
| … | … | … |
| 720575940613455986 | 720575940637085503 | 2 |
| 720575940620795784 | 720575940607689394 | 2 |
| 720575940609658377 | 720575940630758418 | 2 |
| 720575940630403269 | 720575940636250292 | 2 |


Looking at all the connections between the first pair of neurons:

In [4]:
connections_table.filter(
    (pl.col("start_bid") == 720575940624402173)
    & (pl.col("end_bid") == 720575940622160705)
)


| start_bid (i64) | end_bid (i64) | syn_count (i64) | nt_type (str) | eff_weight (i64) | subdivision_start (i64) | subdivision_end (i64) | syn_count_norm (f64) | eff_weight_norm (f64) | start_uid (i64) | end_uid (i64) |
|---|---|---|---|---|---|---|---|---|---|---|
| 720575940624402173 | 720575940622160705 | 45 | "ACH" | 45 | 0 | 0 | 0.034695 | 0.034695 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 12 | "ACH" | 12 | 0 | 0 | 0.009252 | 0.009252 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 11 | "ACH" | 11 | 0 | 0 | 0.008481 | 0.008481 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 6 | "ACH" | 6 | 0 | 0 | 0.004626 | 0.004626 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 13 | "ACH" | 13 | 0 | 0 | 0.010023 | 0.010023 | 131982 | 72220 |
| … | … | … | … | … | … | … | … | … | … | … |
| 720575940624402173 | 720575940622160705 | 6 | "ACH" | 6 | 0 | 0 | 0.004626 | 0.004626 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 25 | "ACH" | 25 | 0 | 0 | 0.019275 | 0.019275 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 35 | "ACH" | 35 | 0 | 0 | 0.026985 | 0.026985 | 131982 | 72220 |
| 720575940624402173 | 720575940622160705 | 61 | "ACH" | 61 | 0 | 0 | 0.047032 | 0.047032 | 131982 | 72220 |


In [5]:
connections.graph.edges[131982, 72220]


{'syn_count': 33,
 'nt_type': 'ACH',
 'eff_weight': 33,
 'syn_count_norm': 0.025443330763299923,
 'eff_weight_norm': 0.025443330763299923,
 'weight': 33}

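The graph kept only one of these 12 rows (the 33-synapse one, hidden in the truncated output above) and dropped the rest. A networkx `DiGraph` holds at most one edge per ordered pair of nodes, so each repeated row overwrites the attributes of the previous one.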
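To see the overwrite in isolation, here is a minimal toy example with plain networkx and pandas (made-up IDs and counts; it does not go through `vnc_networks`). `nx.from_pandas_edgelist` updates the same edge for every repeated row, so in this toy case the last row's attributes win. `vnc_networks` may build its graph in a different order, so which row survives in the real graph can differ.

```python
import networkx as nx
import pandas as pd

# Toy table: three rows for the same (start_uid, end_uid) pair, e.g. one per neuropil
toy = pd.DataFrame(
    {
        "start_uid": [1, 1, 1],
        "end_uid": [2, 2, 2],
        "syn_count": [45, 12, 61],
    }
)

g = nx.from_pandas_edgelist(
    toy,
    source="start_uid",
    target="end_uid",
    edge_attr=["syn_count"],
    create_using=nx.DiGraph,
)

print(g.number_of_edges())  # 1: one edge, not three
print(g.edges[1, 2]["syn_count"])  # 61: only the last row's attributes remain
print(toy["syn_count"].sum() - g.edges[1, 2]["syn_count"])  # 57 synapses silently dropped
```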

## Quick check for MANC

The MANC v1.2 connections table must already be aggregated over neuron pairs (I think I actually did this when downloading the data from neuPrint), so there's no problem here.

Aggregating is safe for MANC because the neurotransmitter predictions are only made at the level of whole neurons: we can aggregate first and then label each connection's synapse type according to the presynaptic neuron's neurotransmitter type.

In [6]:
def check_connections_table_and_graph_count(
    connectome_reader: vnc_networks.connectome_reader.ConnectomeReader,
):
    """Compare connection and synapse counts between the table and the graph."""
    c = vnc_networks.connections.Connections(connectome_reader)
    # Compute totals outside the f-strings so this also runs before Python 3.12
    table_synapses = c.connections["syn_count"].sum()
    graph_synapses = sum(syn for _, _, syn in c.graph.edges(data="syn_count"))
    print(
        f"connections table has {len(c.connections)} connections with {table_synapses} synapses"
    )
    print(
        f"connections graph has {len(c.graph.edges)} connections with {graph_synapses} synapses"
    )

check_connections_table_and_graph_count(vnc_networks.connectome_reader.MANC_v_1_2())

Attribute class_1 not found in the graph. Adding it.
connections table has 1372588 connections with 24151003 synapses
connections graph has 1372588 connections with 24151003 synapses


MANC v1.0 is all good too.

In [7]:
check_connections_table_and_graph_count(vnc_networks.connectome_reader.MANC_v_1_0())

Attribute class_1 not found in the graph. Adding it.
connections table has 1548657 connections with 27387970 synapses
connections graph has 1548657 connections with 27387970 synapses


## How many synapses are affected?

In [8]:
connections_table_synapses = connections_table["syn_count"].sum()
graph_synapses = sum(
    connections.graph.edges[e]["syn_count"] for e in connections.graph.edges
)

print(
    f"We lost {(diff := connections_table_synapses - graph_synapses)} out of {connections_table_synapses} synapses, which is {diff/connections_table_synapses*100:.1f}%"
)

We lost 3549543 out of 31574890 synapses, which is 11.2%


Weirdly, there are 12285 neuron pairs whose rows are labelled with more than one neurotransmitter:

In [60]:
connections_table.group_by(["start_bid", "end_bid"]).agg(
    pl.col("nt_type").value_counts().alias("different_neurotransmitters"),
    pl.col("nt_type").n_unique().alias("num_different_neurotransmitters"),
    pl.col("syn_count").sum().alias("num_synapses"),
).filter(pl.col("num_different_neurotransmitters") > 1).sort(
    pl.col("num_different_neurotransmitters"), descending=True
)


| start_bid (i64) | end_bid (i64) | different_neurotransmitters (list[struct[2]]) | num_different_neurotransmitters (u32) | num_synapses (i64) |
|---|---|---|---|---|
| 720575940623317321 | 720575940639278399 | [{"GLUT",2}, {"GABA",3}, … {"ACH",1}] | 4 | 156 |
| 720575940641501648 | 720575940621268651 | [{"SER",1}, {"GABA",1}, … {"ACH",1}] | 4 | 57 |
| 720575940631763909 | 720575940613096089 | [{"GABA",1}, {"GLUT",1}, … {"ACH",1}] | 4 | 49 |
| 720575940641501648 | 720575940629399370 | [{"GABA",2}, {"ACH",1}, {"GLUT",1}] | 3 | 76 |
| 720575940622263066 | 720575940607384242 | [{"GLUT",1}, {"ACH",1}, {"GABA",1}] | 3 | 25 |
| … | … | … | … | … |
| 720575940610384082 | 720575940623682195 | [{"ACH",4}, {"SER",1}] | 2 | 39 |
| 720575940627209990 | 720575940624202424 | [{"GABA",1}, {"GLUT",1}] | 2 | 10 |
| 720575940617505629 | 720575940618392912 | [{"GLUT",1}, {"GABA",2}] | 2 | 48 |
| 720575940621801866 | 720575940609819512 | [{"GLUT",2}, {"GABA",1}] | 2 | 27 |


If we count the total number of synapses per neurotransmitter type for each of these pairs, things look kind of messy. Many presynaptic neurons even have connections to a single partner labelled with two or more of acetylcholine, glutamate and GABA, which shouldn't really be possible. Maybe it's just mislabelling? I'd need to check the [neurotransmitter paper](https://doi.org/10.1016/j.cell.2024.03.016).

I think it could be important to know that a neuron which primarily uses a fast-acting neurotransmitter can also have neuromodulatory connections, and it would be nice to keep these.

In [145]:
connections_table.group_by(["start_bid", "end_bid", "nt_type"]).agg(
    pl.col("syn_count").sum().alias("num_nt_synapses"),
).group_by(["start_bid", "end_bid"]).agg(
    pl.struct("nt_type", "num_nt_synapses").alias("num_synapses_per_nt"),
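    # Note: after the first group_by each nt_type occurs once per pair, so these
    # counts are all 1; the list length is the number of distinct neurotransmitters
    pl.col("nt_type").value_counts().alias("num_connections_per_nt"),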
).filter(pl.col("num_synapses_per_nt").list.len() > 1).explode(
    "num_synapses_per_nt"
).unnest("num_synapses_per_nt").pivot(
    on="nt_type",
    index=["start_bid", "end_bid", "num_connections_per_nt"],
    values="num_nt_synapses",
).sort(pl.col("num_connections_per_nt").list.len(), descending=True)


| start_bid (i64) | end_bid (i64) | num_connections_per_nt (list[struct[2]]) | ACH (i64) | GLUT (i64) | GABA (i64) | SER (i64) | DA (i64) | OCT (i64) |
|---|---|---|---|---|---|---|---|---|
| 720575940631763909 | 720575940613096089 | [{"GABA",1}, {"GLUT",1}, … {"SER",1}] | 33 | 5 | 6 | 5 | | |
| 720575940641501648 | 720575940621268651 | [{"ACH",1}, {"SER",1}, … {"GLUT",1}] | 5 | 8 | 39 | 5 | | |
| 720575940623317321 | 720575940639278399 | [{"GLUT",1}, {"GABA",1}, … {"ACH",1}] | 11 | 53 | 86 | | 6 | |
| 720575940630911991 | 720575940625310014 | [{"GLUT",1}, {"GABA",1}, {"ACH",1}] | 7 | 11 | 7 | | | |
| 720575940622263066 | 720575940620543025 | [{"GABA",1}, {"GLUT",1}, {"ACH",1}] | 8 | 13 | 31 | | | |
| … | … | … | … | … | … | … | … | … |
| 720575940633356883 | 720575940636316023 | [{"GLUT",1}, {"GABA",1}] | | 65 | 23 | | | |
| 720575940639071977 | 720575940620818017 | [{"GABA",1}, {"ACH",1}] | 7 | | 6 | | | |
| 720575940630026745 | 720575940617190361 | [{"GLUT",1}, {"GABA",1}] | | 26 | 16 | | | |
| 720575940626995880 | 720575940629010356 | [{"ACH",1}, {"GABA",1}] | 23 | | 25 | | | |


## What can we do?

If all the neurotransmitter types for the same neuron pair were the same, we could just aggregate the synapse counts, but this isn't the case.

One easy option that might work is networkx's `MultiDiGraph`: a directed graph that allows multiple edges between the same pair of nodes. It may work with some of the graph methods and not others, though.

In [11]:
import networkx as nx

fafb_multigraph = nx.from_pandas_edgelist(
    connections.connections,
    source="start_uid",
    target="end_uid",
    edge_attr=[
        "syn_count",  # absolute synapse count
        "nt_type",
        "eff_weight",  # signed synapse count (nt weighted)
        "syn_count_norm",  # input normalized synapse count
        "eff_weight_norm",  # input normalized signed synapse count
    ],
    create_using=nx.MultiDiGraph,
)

In [12]:
# Compute totals outside the f-strings so this also runs before Python 3.12
table_synapses = connections.connections["syn_count"].sum()
multigraph_synapses = sum(syn for _, _, syn in fafb_multigraph.edges(data="syn_count"))

print(
    f"connections table has {len(connections.connections)} connections with {table_synapses} synapses"
)
print(
    f"connections graph has {len(fafb_multigraph.edges)} connections with {multigraph_synapses} synapses"
)


connections table has 2709829 connections with 31574890 synapses
connections graph has 2709829 connections with 31574890 synapses


Whether this works for all the methods that operate on the graph, I'm not sure.
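One concrete example of methods treating parallel edges differently: `nx.to_numpy_array` sums parallel edges by default (its `multigraph_weight` argument defaults to `sum`), so an adjacency matrix built from the multigraph keeps every synapse, whereas converting back with `nx.DiGraph(...)` keeps only one edge per pair. A toy sketch with made-up counts:

```python
import networkx as nx
import pandas as pd

toy = pd.DataFrame(
    {
        "start_uid": [0, 0, 1],
        "end_uid": [1, 1, 0],
        "syn_count": [45, 12, 6],
    }
)

multi = nx.from_pandas_edgelist(
    toy,
    source="start_uid",
    target="end_uid",
    edge_attr=["syn_count"],
    create_using=nx.MultiDiGraph,
)

# Parallel 0 -> 1 edges are summed into one matrix entry: 45 + 12 = 57
adj = nx.to_numpy_array(multi, nodelist=[0, 1], weight="syn_count")
print(adj)

# Converting to a plain DiGraph collapses the parallel edges again
print(multi.number_of_edges(), nx.DiGraph(multi).number_of_edges())  # 3 2
```

So anything that goes through an adjacency matrix should be fine, but code that converts to a `DiGraph` or indexes edges by `(u, v)` alone would need checking.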

Alternatively, we can reuse the existing functionality of splitting neurons to split each neuron with multiple different neurotransmitter outputs into "virtual neurons", each with only one type of neurotransmitter.

Then everything still fits in a normal `DiGraph`, and we can still represent it as an adjacency matrix.