# Tutorial: using the MICrONS datacleaner package

The package makes it easy to download and process data from the MICrONS mm3 dataset. This notebook shows the typical workflow for obtaining the data for the first time.

The first step is to import all the relevant libraries.

In [1]:
#Cell magic for autoreloading files
%reload_ext autoreload
%autoreload 2

#Import general packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#Standard-library utilities for paths (e.g. to locate the package and data folders)
import os

#Import our package
import microns_datacleaner as mic

The package is built around a single class, `MicronsDataCleaner`, that bundles all the necessary functionality. First, we initialize the class, which sets up the correct CAVEClient to start downloading. The data will be downloaded into the directory given by the `datadir` argument of the class constructor. Please note that this path is relative to the location where the code is executed (i.e. where this notebook is located, or the folder from which `python` is invoked).

In [7]:
#Data for dataset version 1300 will be stored under ../data (relative to the working directory)
cleaner = mic.MicronsDataCleaner(datadir="../data", version=1300, custom_tables={"func_props": None})

['nucleus_detection_v0', 'aibs_metamodel_celltypes_v661', 'proofreading_status_and_strategy', 'nucleus_functional_area_assignment', 'coregistration_manual_v4']
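To see where the files will actually end up, the relative `datadir` can be resolved against the current working directory. The sketch below uses a hypothetical helper, `data_location`, that is not part of the package; it assumes the version-specific layout `<datadir>/<version>/raw`, which is what the merge output later in this notebook shows.

```python
import os

def data_location(datadir: str, version: int) -> str:
    """Absolute path of the version-specific raw-data folder (assumed layout)."""
    # A relative datadir is resolved against the current working directory
    return os.path.abspath(os.path.join(datadir, str(version), "raw"))

print(os.getcwd())                       # where the notebook / python is running
print(data_location("../data", 1300))    # where downloaded tables should appear
```

Running this from a different folder changes the result, which is exactly why the same `datadir="../data"` can point to different places depending on where the notebook lives.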


If needed, the full functionality of CAVEClient is available through `cleaner.client`. For example, one can list all the available tables with:

In [8]:
cleaner.get_table_list()

['baylor_gnn_cell_type_fine_model_v2',
 'nucleus_alternative_points',
 'allen_column_mtypes_v2',
 'bodor_pt_cells',
 'aibs_metamodel_mtypes_v661_v2',
 'allen_v1_column_types_slanted_ref',
 'aibs_column_nonneuronal_ref',
 'nucleus_ref_neuron_svm',
 'apl_functional_coreg_vess_fwd',
 'vortex_compartment_targets',
 'baylor_log_reg_cell_type_coarse_v1',
 'functional_properties_v3_bcm',
 'gamlin_2023_mcs',
 'l5et_column',
 'pt_synapse_targets',
 'coregistration_manual_v4',
 'cg_cell_type_calls',
 'synapses_pni_2',
 'nucleus_detection_v0',
 'vortex_manual_nodes_of_ranvier',
 'vortex_astrocyte_proofreading_status',
 'bodor_pt_target_proofread',
 'nucleus_functional_area_assignment',
 'coregistration_auto_phase3_fwd_apl_vess_combined_v2',
 'synapse_target_structure',
 'coregistration_auto_phase3_fwd_v2',
 'gamlin_2023_mcs_met_types',
 'vortex_manual_myelination_v0',
 'proofreading_status_and_strategy',
 'synapse_target_predictions_ssa',
 'aibs_metamodel_celltypes_v661']
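Because the table names are plain strings, related tables are easy to find by substring. A small sketch with a hypothetical helper, `tables_matching`, using a few names copied from the output above; in practice you would pass the result of `cleaner.get_table_list()`, assuming it is a regular Python list as its printed form suggests.

```python
# A few names copied from the list returned by cleaner.get_table_list() above
tables = [
    "coregistration_manual_v4",
    "nucleus_detection_v0",
    "synapses_pni_2",
    "coregistration_auto_phase3_fwd_v2",
    "aibs_metamodel_celltypes_v661",
]

def tables_matching(tables, keyword):
    """Return the table names that contain `keyword`, sorted alphabetically."""
    return sorted(t for t in tables if keyword in t)

print(tables_matching(tables, "coregistration"))
# ['coregistration_auto_phase3_fwd_v2', 'coregistration_manual_v4']
```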

Usually, however, we will not need to call the client manually. The tables to download for each version are manually curated and can be downloaded and processed automatically. The processing step (`process_nucleus_data`) links all classifications and auxiliary tables to the nucleus reference table, producing a unified table of units with all the information useful for modelling. The steps are:

1. The brain cell types and the estimated brain area are linked to the nucleus reference.
2. The proofreading status and the functional properties (OSI and preferred orientation from MICrONS) are added.
3. The coordinates are transformed into distances referenced to the pial surface, stored in three separate columns.
4. The y coordinate is discretized into 10 μm segments. Using the neuron classification from step 1, a layer is assigned to each segment. Finally, the layer boundaries are computed and returned.
5. Duplicate entries and invalid `pt_root_id` values are removed.
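Step 4 can be illustrated with a minimal sketch. It assumes that `pt_position_y` is a pial depth in μm, and it uses a hypothetical mapping from excitatory cell-type labels (such as `5P-ET` or `6P-CT`, which appear in the output below) to layers, followed by a simple majority vote per 10 μm segment. This is not the package's implementation: its actual output (for example, L1 starting at y ≈ 1.22) shows that the real boundaries are not restricted to a 10 μm grid, and this sketch does not label L1 at all.

```python
import pandas as pd

# Hypothetical mapping from excitatory cell-type labels to cortical layers
# (an assumption for illustration, not the package's actual rule)
CELLTYPE_TO_LAYER = {
    "23P": "L2/3", "4P": "L4",
    "5P-IT": "L5", "5P-ET": "L5", "5P-NP": "L5",
    "6P-IT": "L6", "6P-CT": "L6",
}

def layer_boundaries(units: pd.DataFrame, bin_um: float = 10.0) -> pd.DataFrame:
    """Bin the y depth into bin_um segments, label each segment by the majority
    layer of its excitatory neurons, and return start/end depth per layer."""
    df = units.assign(layer_from_type=units["cell_type"].map(CELLTYPE_TO_LAYER))
    df = df.dropna(subset=["layer_from_type"])  # drop cells with no mapped layer
    # Integer index of the segment each neuron falls into
    df = df.assign(bin=(df["pt_position_y"] // bin_um).astype(int))
    # Majority vote: most frequent layer label within each segment
    bin_layer = df.groupby("bin")["layer_from_type"].agg(lambda s: s.value_counts().idxmax())
    # A layer spans from the top of its first segment to the bottom of its last
    bounds = bin_layer.reset_index().groupby("layer_from_type")["bin"].agg(["min", "max"])
    out = pd.DataFrame({
        "y_start": bounds["min"] * bin_um,
        "y_end": (bounds["max"] + 1) * bin_um,
    })
    return out.rename_axis("layer").sort_values("y_start").reset_index()
```

Taking the first and last segment of each layer assumes the layers do not interleave along y; real data are noisier, which is one reason the package's procedure is more involved.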

`process_nucleus_data` returns the table of units and a table with the start and end of each layer. Note that no synapse information has been processed yet.

In [13]:
#Download all nucleus-related data: reference table, the various classifications and functional matches
#(only needed the first time; uncomment to run)
#cleaner.download_nucleus_data()

#Process the data and obtain units and segment tables
units, segments = cleaner.process_nucleus_data()

Index(['Unnamed: 0', 'id', 'created', 'superceded_id', 'valid',
       'pt_position_x', 'pt_position_y', 'pt_position_z', 'valid_id',
       'status_dendrite', 'status_axon', 'strategy_dendrite', 'strategy_axon',
       'pt_supervoxel_id', 'pt_root_id'],
      dtype='object')
strategy_dendrite


Transform positions: 100%|██████████| 94014/94014 [00:00<00:00, 120970.27it/s]


In [14]:
units

| | pt_root_id | id | pt_position_x | pt_position_y | pt_position_z | classification_system | cell_type | brain_area | strategy_dendrite | strategy_axon | layer |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 864691136090135607 | 373879 | 828.189723 | 638.553801 | 783.72 | excitatory_neuron | 6P-CT | V1 | none | none | L6 |
| 1 | 864691135194387242 | 408486 | 891.157407 | 662.693656 | 1002.96 | nonneuron | oligo | V1 | none | none | L6 |
| 4 | 864691136041571414 | 199883 | 514.440335 | 481.908658 | 1044.32 | excitatory_neuron | 5P-ET | V1 | none | none | L5 |
| 6 | 864691135777840480 | 200523 | 513.635721 | 509.463386 | 652.56 | excitatory_neuron | 6P-CT | V1 | none | none | L6 |
| 7 | 864691136273924109 | 372649 | 829.039163 | 557.615841 | 1039.92 | excitatory_neuron | 6P-IT | V1 | none | none | L6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 94009 | 864691135783968051 | 167679 | 427.922269 | 496.375161 | 1005.96 | excitatory_neuron | 5P-ET | V1 | none | none | L5 |
| 94010 | 864691135940617126 | 303490 | 668.626983 | 461.348673 | 1003.64 | excitatory_neuron | 5P-ET | V1 | none | none | L5 |
| 94011 | 864691135430122800 | 233088 | 533.829149 | 454.887682 | 1007.28 | excitatory_neuron | 5P-ET | V1 | none | none | L5 |
| 94012 | 864691135884866160 | 232979 | 519.002564 | 451.791678 | 986.64 | excitatory_neuron | 5P-ET | V1 | none | none | L5 |
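The units table is an ordinary pandas DataFrame, so standard pandas tools apply directly. As a small example, the sketch below counts excitatory cell types per layer with `pd.crosstab`; it uses only the first five rows shown above so that it runs on its own. With the real data you would use the `units` returned by `process_nucleus_data` instead of this stand-in.

```python
import pandas as pd

# The first five rows of the `units` output above (only the columns needed here)
units = pd.DataFrame({
    "pt_root_id": [864691136090135607, 864691135194387242, 864691136041571414,
                   864691135777840480, 864691136273924109],
    "classification_system": ["excitatory_neuron", "nonneuron", "excitatory_neuron",
                              "excitatory_neuron", "excitatory_neuron"],
    "cell_type": ["6P-CT", "oligo", "5P-ET", "6P-CT", "6P-IT"],
    "layer": ["L6", "L6", "L5", "L6", "L6"],
})

# Keep only excitatory neurons, then count each cell type within each layer
exc = units[units["classification_system"] == "excitatory_neuron"]
counts = pd.crosstab(exc["layer"], exc["cell_type"])
print(counts)
```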


In [6]:
segments

| | layer | region_index | y_start | y_end | height |
|---|---|---|---|---|---|
| 0 | L1 | 0 | 1.224539 | 70.981459 | 69.75692 |
| 1 | L2/3 | 0 | 70.981459 | 250.356396 | 179.374937 |
| 2 | L4 | 0 | 250.356396 | 359.974413 | 109.618017 |
| 3 | L5 | 0 | 359.974413 | 509.453527 | 149.479114 |
| 4 | L6 | 0 | 509.453527 | 828.342305 | 318.888777 |
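Since each layer's `y_end` equals the next layer's `y_start`, the segments table can be turned into bin edges and used to assign a layer to any depth, for example positions that are not in the units table. A minimal sketch with a hypothetical helper, `depth_to_layer`, using the boundary values printed above and assuming contiguous, sorted segments within a single `region_index`:

```python
import pandas as pd

# Layer boundaries copied from the `segments` output above (region_index 0)
segments = pd.DataFrame({
    "layer": ["L1", "L2/3", "L4", "L5", "L6"],
    "y_start": [1.224539, 70.981459, 250.356396, 359.974413, 509.453527],
    "y_end": [70.981459, 250.356396, 359.974413, 509.453527, 828.342305],
})

def depth_to_layer(y, segments: pd.DataFrame) -> pd.Series:
    """Map y depths to layer labels using contiguous [y_start, y_end) intervals."""
    edges = list(segments["y_start"]) + [segments["y_end"].iloc[-1]]
    # right=False makes each interval closed on the left and open on the right;
    # depths outside the outermost edges come back as NaN
    return pd.cut(pd.Series(y), bins=edges, labels=list(segments["layer"]), right=False)

print(depth_to_layer([638.553801, 481.908658, 20.0], segments))
```

The first two depths are taken from rows of the units table above, and the labels this returns (L6 and L5) agree with the `layer` column shown there.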


The next step is to download the synapses; again, the library automates this. Downloading the whole synapse dataset takes a very long time, so one can instead download only the synapses between a subset of pre- and postsynaptic neurons.

As an example, we sample 120 random units as postsynaptic neurons and download all their potential presynaptic connections from any unit in the table.

In [9]:
postids = units['pt_root_id'].sample(n=120)  # 120 random postsynaptic neurons
preids  = units['pt_root_id']                # every unit is a candidate presynaptic neuron
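`Series.sample` draws a different subset on every run. If you want the same postsynaptic neurons each time (for instance, to resume an interrupted download with identical inputs), pass `random_state`. The sketch below uses a stand-in Series of hypothetical IDs in place of `units['pt_root_id']`.

```python
import pandas as pd

# Stand-in for units['pt_root_id'] (hypothetical IDs, for illustration only)
root_ids = pd.Series(range(1000, 1300), name="pt_root_id")

# Fixing random_state makes the postsynaptic sample reproducible across runs
postids = root_ids.sample(n=120, random_state=0)
preids = root_ids  # all units remain candidate presynaptic neurons
```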

Now the download happens. **Queries to the API can be slow and may need to be retried.** To minimize problems, the downloader requests the data in small chunks of `neur_per_steps=500` neurons. If a request fails, the downloader waits `delay=5` seconds and retries, up to `max_retries=10` times.
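The chunk-and-retry strategy described above can be sketched generically. This is an illustration, not the package's code: `fetch_chunk` is a hypothetical callable standing in for one API query, and only the parameter names `neur_per_steps`, `delay` and `max_retries` are borrowed from the description.

```python
import time

def chunked_download(ids, fetch_chunk, neur_per_steps=500, delay=5, max_retries=10):
    """Call fetch_chunk on consecutive slices of `ids`, retrying failed slices.

    fetch_chunk is a hypothetical callable standing in for a single API query.
    """
    ids = list(ids)
    results = []
    for start in range(0, len(ids), neur_per_steps):
        chunk = ids[start:start + neur_per_steps]
        # One initial attempt plus up to max_retries retries for this chunk
        for attempt in range(max_retries + 1):
            try:
                results.append(fetch_chunk(chunk))
                break
            except Exception as err:
                if attempt == max_retries:
                    raise RuntimeError(f"chunk starting at {start} failed") from err
                time.sleep(delay)  # wait before retrying the same chunk
    return results
```

Retrying a single chunk, rather than restarting the whole query, means a transient network error only costs one small request.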

You can also choose whether to keep every individual synapse or only one effective synaptic volume per pair of neurons. With `drop_synapses_duplicates=False`, you download every individual synapse between each pair of neurons, together with its position in space and its synaptic volume; with `drop_synapses_duplicates=True` (the default), you get, for each pair, the sum of the synaptic volumes across all synapses between those two neurons.
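The difference between the two modes amounts to a group-and-sum over neuron pairs. The sketch below is an illustration in pandas, not the package's code; it uses hypothetical synapse rows and assumes the per-synapse volume lives in a `size` column, as in the merged table shown later in this notebook.

```python
import pandas as pd

# Hypothetical individual synapses: one row per synapse
syn = pd.DataFrame({
    "pre_pt_root_id": [1, 1, 1, 2],
    "post_pt_root_id": [10, 10, 11, 10],
    "size": [100, 250, 40, 60],
})

# Collapse to one row per (pre, post) pair: total volume and number of synapses
pairs = syn.groupby(["pre_pt_root_id", "post_pt_root_id"], as_index=False).agg(
    size=("size", "sum"),
    n_synapses=("size", "count"),
)
print(pairs)
```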

**Caution! The code below can be slow!**

In [23]:
cleaner.download_synapse_data(preids, postids)

Postsynaptic neurons queried so far: 0...


Estimated remaining time: 0s


If the process fails for any reason (download attempts exceeded, internet connection lost, ...), check how many chunks were downloaded inside the folder `data/raw/synapses/` and look at the last `connections_table_X` file. Then call `cleaner.download_synapse_data(preids, postids, start_index=X-1)` to resume the download right where it stopped.
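Finding `X` by eye is error-prone when there are many chunks, because file browsers often sort names alphabetically (so `connections_table_10` appears before `connections_table_2`). A minimal sketch with a hypothetical helper, `last_chunk_index`, that extracts the largest integer suffix; it assumes only that chunk files start with `connections_table_` followed by the number.

```python
import re
from pathlib import Path

def last_chunk_index(folder):
    """Return the largest X among files named connections_table_X... in folder, or None."""
    pattern = re.compile(r"connections_table_(\d+)")
    indices = [
        int(m.group(1))
        for p in Path(folder).iterdir()
        if (m := pattern.match(p.name))
    ]
    return max(indices) if indices else None

# Usage (the folder path follows the text above; adjust it to your datadir):
# X = last_chunk_index("data/raw/synapses")
# if X is not None:
#     cleaner.download_synapse_data(preids, postids, start_index=X - 1)
```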

Once the download above has completed successfully, all the downloaded chunks can be merged into a single synapse table with:

In [24]:
cleaner.merge_synapses(syn_table_name="example_merged_synapses")

Merged 1 tables into /home/victor/Fisica/Research/Milan/MICrONS-datacleaner/tutorial/../data/1300/raw/example_merged_synapses.csv
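Conceptually, merging amounts to concatenating the chunk files in order. The sketch below is not the package's `merge_synapses`; it assumes the chunks are CSV files named `connections_table_X.csv` in one folder (the `.csv` extension and the folder layout are assumptions), sorts them numerically rather than alphabetically, and writes the result without the index so that re-reading it does not create an extra `Unnamed: 0` column.

```python
from pathlib import Path

import pandas as pd

def merge_chunks(folder, out_csv):
    """Concatenate connections_table_X.csv files in numeric order of X (assumed naming)."""
    files = sorted(
        Path(folder).glob("connections_table_*.csv"),
        key=lambda p: int(p.stem.rsplit("_", 1)[1]),  # numeric sort: 2 before 10
    )
    # pd.concat raises ValueError if no chunk files were found
    merged = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    merged.to_csv(out_csv, index=False)  # no index column in the output file
    return merged
```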


In [25]:
synapses = pd.read_csv("../data/1300/raw/example_merged_synapses.csv")
synapses

| | Unnamed: 0 | pre_pt_root_id | post_pt_root_id | size |
|---|---|---|---|---|
| 0 | 0 | 864691134884743674 | 864691135952861219 | 1292 |
| 1 | 1 | 864691134884749306 | 864691135347037599 | 884 |
| 2 | 2 | 864691134884762106 | 864691135275949029 | 16620 |
| 3 | 3 | 864691134884807418 | 864691135661074928 | 5140 |
| 4 | 4 | 864691134884855290 | 864691135784120883 | 760 |
| ... | ... | ... | ... | ... |
| 14069 | 14069 | 864691137198939713 | 864691135645073391 | 1056 |
| 14070 | 14070 | 864691137198988097 | 864691135275477477 | 4764 |
| 14071 | 14071 | 864691137199116097 | 864691135492906591 | 3548 |
| 14072 | 14072 | 864691137199116097 | 864691136057941464 | 5628 |
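The extra `Unnamed: 0` column above is the index that was written into the CSV file. Passing `index_col=0` to `pd.read_csv` turns it back into the DataFrame index instead. The sketch reproduces this on two rows copied from the table above, assuming the merged file stores a single unnamed index column.

```python
import io

import pandas as pd

# Two rows from the merged synapse table, laid out as a CSV saved with its index
# (the empty first header field is that index)
csv_text = """,pre_pt_root_id,post_pt_root_id,size
0,864691134884743674,864691135952861219,1292
1,864691134884749306,864691135347037599,884
"""

# index_col=0 uses the unnamed first column as the index rather than as data
synapses = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(synapses.columns.tolist())  # ['pre_pt_root_id', 'post_pt_root_id', 'size']
```

Under that assumption, the same argument applied to `"../data/1300/raw/example_merged_synapses.csv"` should give the table without the `Unnamed: 0` column.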


Now you are ready to start analysing your data!