# Loading Larva L1EM CNS Data

In this notebook, we detail the process by which the connectomics data from the Larva L1EM Central Nervous System (CNS) is loaded into a running NeuroArch instance.

The raw data is taken from the publicly available CATMAID dataset hosted on the [Virtual Fly Brain](https://l1em.catmaid.virtualflybrain.org/).

The folder structure is as follows:
```
.
├── Load_Larva_L1EM.ipynb             # this file
├── Load_Larva_L1EM.py                # executable script
├── connectors_published.csv          # synapse information
├── meshes                            # folder of all mesh files
├── neurons_published_2020_12_04.csv  # neuron information
└── swc                               # folder of all SWC neuron skeleton files
```

Please refer to the [README](https://github.com/FlyBrainLab/datasets/tree/main/l1em/README.md) for instructions on downloading the SWC files required by this code.
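
Before running the cells below, it can help to confirm that the inputs listed in the folder structure are in place. The following is a minimal sketch using only the standard library; the helper name `missing_inputs` and the `EXPECTED` list are our own illustrative names, not part of NeuroArch or the dataset tooling.

```python
from pathlib import Path

# Input files and folders named in the folder structure above.
EXPECTED = [
    'connectors_published.csv',
    'neurons_published_2020_12_04.csv',
    'meshes',
    'swc',
]

def missing_inputs(root='.'):
    """Return the expected inputs that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]

# An empty list means every expected input was found.
print(missing_inputs('.'))
```

If the list is not empty, the README linked above describes how to obtain the missing files.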

In [1]:
import neuroarch.na as na
import pandas as pd
from tqdm import tqdm
import numpy as np

### Define regions
The regions defined here are the high-level organizing nodes in the NeuroArch database, under which neurons and synapses are defined. They are mostly neuropils, with the exception of CNS, which is the entire mesh.

Each region is a dictionary entry of the following format:
```python
entry_name: {
    'System': system_name_that_entry_is_included_in,
    'Neuropil': name_of_the_neuropil_of_the_system,
    'synonyms': list_of_synonyms_of_the_entry,
    'morphology': path_to_mesh_file
}
```
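
A misspelled key in an entry would be silently ignored by the loading loop, so a quick check of each entry can be useful. The sketch below is an illustration only: `check_region_entry` and `ALLOWED_KEYS` are hypothetical names, and treating `'System'` as required is our assumption based on the entries in this notebook.

```python
# Hypothetical helper (not part of NeuroArch): sanity-check one region entry.
ALLOWED_KEYS = {'System', 'Neuropil', 'synonyms', 'morphology'}

def check_region_entry(name, entry):
    """Return a list of problems found in a region entry (empty if none)."""
    problems = []
    unknown = set(entry) - ALLOWED_KEYS
    if unknown:
        problems.append('{}: unexpected keys {}'.format(name, sorted(unknown)))
    # Assumption: every region must name the system it belongs to.
    if 'System' not in entry:
        problems.append('{}: missing System'.format(name))
    if not isinstance(entry.get('synonyms', []), list):
        problems.append('{}: synonyms must be a list'.format(name))
    return problems

example = {
    'System': 'CNS',
    'Neuropil': 'AL',
    'synonyms': ['right antennal lobe', 'antennal lobe'],
    'morphology': './meshes/AL_um.json',
}
print(check_region_entry('AL', example))
# A misspelled key and a missing 'System' are both reported.
print(check_region_entry('AL', {'Neuropil': 'AL', 'synonym': []}))
```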

In [2]:
cns = {
    'System': 'CNS', 
    'morphology': './meshes/CNS_um.json'
}
neuropils = {
    'AL':{
        'System':'CNS',
        'Neuropil':'AL',
        'synonyms':['right antennal lobe','antennal lobe','al_r','al','right al'],
        'morphology': './meshes/AL_um.json'
    },
    'al':{
        'System':'CNS',
        'Neuropil':'al',
        'synonyms':['left antennal lobe','antennal lobe','al_l','al','left al'],
        'morphology': './meshes/AL_um.json'
    },
    'MB':{
        'System':'CNS',
        'Neuropil':'MB',
        'synonyms':['right mushroom body','mushroom body','right mb','mb_r','mb'],
        'morphology': './meshes/MB_um.json'
    },
    'mb':{
        'System':'CNS',
        'Neuropil':'mb',
        'synonyms':['left mushroom body','mushroom body','left mb','mb_l','mb'],
        'morphology': './meshes/MB_um.json'
    },
    'LON':{
        'System':'CNS',
        'Neuropil':'LON',
        'synonyms':['right larva optic neuropil','optic neuropil','right lon','lon_r','lon']
    },
    'lon':{
        'System':'CNS',
        'Neuropil':'lon',
        'synonyms':['left larva optic neuropil','optic neuropil','left lon','lon_l','lon']
    },
    'LH':{
        'System':'CNS',
        'Neuropil':'LH',
        'synonyms':['right lateral horn','lateral horn','right lh','lh']
    },
    'lh':{
        'System':'CNS',
        'Neuropil':'lh',
        'synonyms':['left lateral horn','lateral horn','left lh','lh']
    },
    'unknown':{
        'System':'CNS',
        'Neuropil':'unknown',
        'synonyms': ['left unknown', 'left unspecified', 'unspecified', 'na', 'unknown']
    },
    'UNKNOWN':{
        'System':'CNS',
        'Neuropil':'UNKNOWN',
        'synonyms': ['right unknown', 'right unspecified', 'unspecified', 'na', 'unknown']
    },
    'vnc':{
        'System':'CNS',
        'Neuropil':'vnc',
        'synonyms': ['left vnc', 'left ventral nerve cord', 'left ventral nerve', 'ventral nerve cord', 'vnc']
    },
    'VNC':{
        'System':'CNS',
        'Neuropil':'VNC',
        'synonyms': ['right vnc', 'right ventral nerve cord', 'right ventral nerve', 'ventral nerve cord', 'vnc']
    },
    'sez':{
        'System':'CNS',
        'Neuropil':'sez',
        'synonyms': ['left sez', 'left subesophageal zone', 'left subesophageal', 'subesophageal', 'sez']
    },
    'SEZ':{
        'System':'CNS',
        'Neuropil':'SEZ',
        'synonyms': ['right sez', 'right subesophageal zone', 'right subesophageal', 'subesophageal', 'sez']
    },
}


In [3]:
neuron_synonyms = {
    'PR': ['PR', 'photoreceptor'],
    'PN': ['PN', 'Projection Neuron'],
    'uPN': ['upn', 'Uni PN', 'uni-PN', 'uniglomerular projection neuron', 'uniglomerular pn'],
    'mPN': ['mpn', 'Multi PN', 'multi-PN', 'multi Projection Neuron', 'Multi Projection Neuron', 'Multiglomeruli PN', 'Multiglomeruli Projection Neuron'],
    'OSN': ['OSN', 'ORN', 'Olfactory Receptor Neuron', 'Olfactory Sensory Neuron'],
    'KC': ['kc', 'kenyon cell'],
    'MBON': ['mbon', 'mushroom body output neuron', 'Output Neuron'],
    'MBIN': ['mbin', 'mushroom body input neuron'],
    'DAN': ['dan', 'Dopaminergic Neuron'],
    'OAN': ['oan', 'Octopaminergic Neuron'],
    'LHN': ['lhn', 'lateral horn neuron'],
    'LHON': ['lhon', 'lateral horn output neuron'],
    'CN': ['cn', 'convergence neuron'],
    'MB2ON': ['mb2on'],
    'VPN': ['vpn', 'visual projection neuron'],
    'LN': ['ln', 'antennal lobe local neuron'],
    'Motor': ['motor', 'motor neuron'],
    'PMN': ['pmn', 'premotor neuron', 'pre-motor neuron'],
    'APL': ['apl', 'anterior paired lateral'],
    'FFN': ['ffn', 'feedforward neuron'],
    'FBN': ['fbn', 'feedback neuron'],
    'FAN': ['fan', 'feedacross neuron']
}

### Create and connect to database

Mode `'o'` overwrites the entire database.

In [4]:
l1em = na.NeuroArch('localhost', 'l1em', mode = 'o')

### Create a species

In [5]:
species = l1em.add_species('Drosophila Melanogaster', stage = 'larva',
                                sex = 'female',
                                synonyms = ['larva fruit fly'])

### Create a datasource under the species

In [6]:
version = '1.0'
datasource = l1em.add_DataSource('L1EM', version = version,
                                 url = 'https://l1em.catmaid.virtualflybrain.org/',
                                 species = species)
l1em.default_DataSource = datasource

Setting default DataSource to L1EM version 1.0


### Create subsystems, tracts, neuropils and subregions under the datasource

In [7]:
cns = l1em.add_Subsystem('CNS', synonyms=['larva CNS', 'cns'], 
                   morphology={'type':'mesh', 'filename':'./meshes/CNS_um.json'})

In [8]:
for k, v in neuropils.items():
    if 'morphology' in v:
        l1em.add_Neuropil(k,
                           morphology = {'type': 'mesh', 'filename': v['morphology']},
                           subsystem = v['System']
        )
    else:
        l1em.add_Neuropil(k,
                   subsystem = v['System']
        )

### Load Neurons

The `neuron_list` dataframe contains a list of neurons with the following fields:

1. `uname`: globally unique name in the database
2. `label`: display name of the neuron. Follows the `neuron_type-index` structure whenever possible. When `neuron_type` has not been characterized by the FlyBrainLab developers, the label follows `catmaid_names`
3. `catmaid_names`: names of the neuron in the CATMAID database
4. `type`: neuron type
5. `side`: left/right side of the brain. Bilateral neurons are not currently handled by the database and are assumed to be on the left side by default.
6. `neurotransmitter`: neurotransmitter information of the neuron whenever available.
7. `locality`: whether a neuron is `input` to, `output` from, or `local` to the associated `neuropil`
8. `neuropil`: name of neuropil that neuron is associated with
9. `source`: publication where data is originally released
10. `skid`: skeleton id, the identifier for neuron skeletons in the CATMAID database
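
The `side` and `neuropil` fields together determine which region a neuron is attached to: the region dictionary above uses lowercase keys for left-side neuropils and uppercase keys for right-side ones. The sketch below mirrors the rule used in the loading loop later in this notebook; the function name `neuropil_key` is our own, and returning `None` for `'unknown'` reflects that the loop assigns no arborization in that case.

```python
def neuropil_key(neuropil_field, side):
    """Map the `neuropil` and `side` fields of a neuron to a region key.

    Only the first name of a comma-separated `neuropil` field is used.
    Any side other than 'left' is treated as right, as in the loading loop.
    """
    if neuropil_field == 'unknown':
        return None
    first = neuropil_field.split(',')[0]
    return first.lower() if side.lower() == 'left' else first.upper()

print(neuropil_key('AL', 'left'))       # al
print(neuropil_key('AL', 'right'))      # AL
print(neuropil_key('unknown', 'left'))  # None
```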

In [9]:
neuron_list = pd.read_csv('./neurons_published_2020_12_04.csv', index_col=0)
swc_dir = './swc'  # no trailing slash; the separator is added when the file path is built

In [10]:
neuron_list.head(3)

| | uname | label | catmaid_names | type | side | neurotransmitter | locality | neuropil | source | skid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OSN-13a_7527710 | OSN-13a | 13a ORN left | OSN | left | acetylcholine | input | AL | Berck2016 | 7527710 |
| 1 | OSN-13a_4073353 | OSN-13a | 13a ORN right | OSN | right | acetylcholine | input | AL | Berck2016 | 4073353 |
| 2 | OSN-1a_2611805 | OSN-1a | 1a ORN left | OSN | left | acetylcholine | input | AL | Berck2016 | 2611805 |


In [11]:
pbar = tqdm(neuron_list.iterrows(), desc='Loading Neurons', total=neuron_list.shape[0])
for i, row in pbar:
    bodyID = row['skid']
    cell_type = row['type']
    uname = row['uname']
    name = row['label']

    info = {
        'source': row.source,
    }
    
    if row.catmaid_names != 'unknown':
        info['catmaid_name'] = row.catmaid_names
    
    
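    # Use a separate name so the `neuropils` region dictionary is not overwritten.
    neuropil_names = row['neuropil'].split(',')
    if row.neuropil != 'unknown':
        # Left-side neuropils use lowercase keys, right-side neuropils uppercase keys.
        npl = neuropil_names[0].lower() if row.side.lower() == 'left' else neuropil_names[0].upper()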
        arborization = {
            'type': 'neuropil',
             'dendrites': {npl: 1},
             'axons': {}
        }
    else:
        arborization = None
    l1em.add_Neuron(uname, # uname
                     name, # name
                     locality=row.locality == 'local',
                     synonyms=neuron_synonyms.get(cell_type),
                     neurotransmitters=row.neurotransmitter if row.neurotransmitter != 'unknown' else None,
                     referenceId = str(bodyID), #referenceId
                     info = info if len(info) else None,
                     morphology = {'type': 'swc', 'filename': '{}/{}.swc'.format(swc_dir, bodyID), 'scale': 1.},
                     arborization = arborization)
pbar.close()

Loading Neurons: 100%|██████████| 1051/1051 [03:45<00:00,  4.67it/s]


In [12]:
# If restarting the kernel after loading neurons, start with this
# l1em = na.NeuroArch('localhost', 'l1em', mode = 'w')
# l1em.default_DataSource = l1em.find_objs('DataSource', name = 'L1EM')[0]

In [13]:
# find all the neurons so they can be keyed by their referenceId.
neurons = l1em.sql_query('select from Neuron').nodes_as_objs
# set the cache so there is no need for database access.
for neuron in neurons:
    l1em.set('Neuron', neuron.uname, neuron, l1em.default_DataSource)
neuron_ref_to_obj = {int(neuron.referenceId): neuron for neuron in neurons}
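
The keys are converted to integers because `referenceId` is stored as a string, while skids read from a CSV file are numeric. Looking up a skid that has no loaded neuron raises a `KeyError`, so it can be worth checking for missing references first. The sketch below uses plain strings in place of NeuroArch neuron objects; `find_missing_skids` is a hypothetical helper name.

```python
def find_missing_skids(ref_to_obj, pairs):
    """Return the sorted skids used in (pre, post) pairs that have no entry in ref_to_obj."""
    referenced = {skid for pair in pairs for skid in pair}
    return sorted(referenced - set(ref_to_obj))

# Stand-in data: strings take the place of NeuroArch neuron objects.
loaded = {
    int(ref): name
    for ref, name in [('40045', 'OSN-22c_40045'), ('7865652', 'PN-22c_7865652')]
}
pairs = [(40045, 7865652), (40045, 8877971)]
print(find_missing_skids(loaded, pairs))  # [8877971]
```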

### Load synapses

The `synapse_df` dataframe contains a list of synaptic connections with the following fields:

1. `N`: number of synapses
2. `presynaptic`: `uname` of presynaptic neuron
3. `postsynaptic`: `uname` of postsynaptic neuron
4. `pre_skid`: `skid` of presynaptic neuron
5. `post_skid`: `skid` of postsynaptic neuron
6. `uname`: `uname` of the synapse, which follows the `presynaptic--postsynaptic` structure
7. `x`: list of `x`-coordinates for synapse locations
8. `y`: list of `y`-coordinates for synapse locations
9. `z`: list of `z`-coordinates for synapse locations
10. `r`: list of synapse radii. If `0`, the minimum size specified in the `Neu3D` renderer is used when the synapses are rendered
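
The coordinate and radius fields are stored as strings in the style of a printed NumPy array, with values separated by spaces and possibly wrapped over several lines. As an alternative to `eval`, the following minimal sketch parses such a string with plain string operations. It assumes each cell holds the complete array text, and `parse_array_string` is our own illustrative name.

```python
def parse_array_string(text):
    """Parse a string such as '[81335.2 81399.8\n 81361.8]' into a list of floats."""
    # Drop the surrounding brackets, then split on any run of whitespace
    # (spaces or newlines). Commas are treated as separators as well.
    body = text.strip().lstrip('[').rstrip(']')
    return [float(token) for token in body.replace(',', ' ').split()]

print(parse_array_string('[81335.2 81399.8\n 81361.8]'))
print(parse_array_string('[0. 0.   0.]'))
```

Because the values are converted with `float` rather than evaluated as Python code, unexpected cell content raises a `ValueError` instead of being executed.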

In [14]:
synapse_df = pd.read_csv('./connectors_published.csv', index_col=0)

In [15]:
synapse_df.head(3)

| | N | postsynaptic | presynaptic | r | uname | x | y | z | post_skid | pre_skid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | picky-2_8877971 | OSN-22c_40045 | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] | OSN-22c_40045--picky-2_8877971 | [81335.2 81399.8 81399.8 81361.8 81399.8 80161... | [61841.2 61879.2 61879.2 59747.4 61879.2 59603... | [37450. 37200. 37200. 37650. 37200. 38250. 380... | 8877971 | 40045 |
| 1 | 23 | keystone_5030808 | OSN-22c_40045 | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ... | OSN-22c_40045--keystone_5030808 | [81335.2 81206. 81221.2 81221.2 81358. 81399... | [61841.2 62320. 61427. 61427. 61921. 61879... | [37450. 37900. 37550. 37550. 37350. 37200. 372... | 5030808 | 40045 |
| 2 | 70 | PN-22c_7865652 | OSN-22c_40045 | [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ... | OSN-22c_40045--PN-22c_7865652 | [81335.2 81206. 81221.2 81358. 81399.8 81399... | [61841.2 62320. 61427. 61921. 61879.2 61879... | [37450. 37900. 37550. 37350. 37200. 37200. 372... | 7865652 | 40045 |


In [16]:
pbar = tqdm(synapse_df.iterrows(), total=synapse_df.shape[0], desc='Loading Synapses')
for i, row in pbar:
    pre_neuron = neuron_ref_to_obj[row['pre_skid']]
    post_neuron = neuron_ref_to_obj[row['post_skid']]
    
    x = eval(row['x'].replace('\n', '').replace('  ', ' ').replace(', ', ',').replace(' ', ','))
    y = eval(row['y'].replace('\n', '').replace('  ', ' ').replace(', ', ',').replace(' ', ','))
    z = eval(row['z'].replace('\n', '').replace('  ', ' ').replace(', ', ',').replace(' ', ','))
    r = eval(row['r'].replace('\n', '').replace('  ', ' ').replace(', ', ',').replace(' ', ','))
    
    content = {'type': 'swc'}
    # x+x concatenates the list with itself, so every point is stored twice.
    content['x'] = [round(i, 3) for i in np.array(x+x)/1000.] # convert from nm to um
    content['y'] = [round(i, 3) for i in np.array(y+y)/1000.] # convert from nm to um
    content['z'] = [round(i, 3) for i in np.array(z+z)/1000.] # convert from nm to um
    content['r'] = [round(i, 3) for i in np.array(r+r)/1000.] # convert from nm to um
    content['parent'] = [-1]*len(x) + [i+1 for i in range(len(x))]
    content['identifier'] = [7]*len(x) + [8]*len(x)
    content['sample'] = [i+1 for i in range(len(content['x']))]
    l1em.add_Synapse(pre_neuron, post_neuron, N = row['N'],
                      morphology = content)
pbar.close()

Loading Synapses: 100%|██████████| 30350/30350 [02:18<00:00, 218.78it/s]
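
The loop above stores every synapse location twice in an SWC-style record: the first copy has identifier `7` and parent `-1`, and the second copy has identifier `8` with its parent set to the sample number of the matching first copy. The sketch below rebuilds the same structure as a standalone function in pure Python, which makes the layout easier to inspect on a small input. The function name `build_synapse_morphology` is hypothetical, and the inputs are assumed to be plain lists in nanometers.

```python
def build_synapse_morphology(x_nm, y_nm, z_nm, r_nm):
    """Build the SWC-style dictionary used for synapse morphology.

    Inputs are lists in nanometers; outputs are in micrometers.
    """
    n = len(x_nm)

    def to_um(values):
        # Duplicate the list, convert nm to um, and round to 3 decimals.
        return [round(v / 1000.0, 3) for v in values + values]

    return {
        'type': 'swc',
        'x': to_um(x_nm),
        'y': to_um(y_nm),
        'z': to_um(z_nm),
        'r': to_um(r_nm),
        # First copies are roots; second copies point back to them.
        'parent': [-1] * n + [i + 1 for i in range(n)],
        'identifier': [7] * n + [8] * n,
        'sample': [i + 1 for i in range(2 * n)],
    }

content = build_synapse_morphology([1000.0, 2500.0], [0.0, 500.0], [3000.0, 3000.0], [0.0, 0.0])
print(content['parent'])      # [-1, -1, 1, 2]
print(content['identifier'])  # [7, 7, 8, 8]
print(content['sample'])      # [1, 2, 3, 4]
```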
