# `FANCDataset` demo

The `FANCDataset` class integrates tables from different sources. This notebook demonstrates what information can be accessed through a `FANCDataset` object. The information is organized as Pandas dataframes, so familiarize yourself with Pandas first if needed.

In [1]:
import pandas as pd
from datetime import datetime

import ysp_bot

The first thing to note is that an object of the `FANCDataset` class models a version of the FANC data dump. While it is possible to create a `FANCDataset` using its `__init__` method, the intended way to instantiate it is to use one of the following class methods:
- `dataset = FANCDataset.get_latest()`: This downloads and materializes the latest version if it isn't already cached, or loads it from the cache if it is.
- `dataset = FANCDataset.from_path(version_data_dir, mat_timestamp)`: This loads an already downloaded (and materialized) dataset. See the docstring for more info.

On the server, `FANCDataset.get_latest()` is called once every hour. When developing, it's handy to use either of the methods above to create a dataset object in isolation from the server. Here we will get the latest version:

In [2]:
dataset = ysp_bot.dataset.FANCDataset.get_latest()

We can check when the dataset was materialized and where it is stored on the disk:

In [3]:
materialization_time = datetime.fromtimestamp(dataset.mat_timestamp)
version_path = dataset.version_data_dir
print(f'The dataset was materialized at {materialization_time}. '
      f'It is stored at {version_path}.')

The dataset was materialized at 2023-03-27 06:00:04. It is stored at /home/sibwang/Data/fanc/ysp_bot/dump/bc_dump_1679889604.
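
Note that `datetime.fromtimestamp` returns local time when no time zone is given, so the printed time depends on the machine running the notebook. The sketch below makes the time zone explicit. It assumes that the numeric suffix of the dump directory (`1679889604`) is the same Unix timestamp as `dataset.mat_timestamp`; this is consistent with the output above on a UTC+2 machine, but it is an assumption rather than documented behavior.

```python
from datetime import datetime, timezone

# Assumption: this value mirrors dataset.mat_timestamp; it is copied from the
# directory name bc_dump_1679889604 shown above.
mat_timestamp = 1679889604

# Passing tz yields a timezone-aware datetime that does not depend on the
# local settings of the machine.
materialization_time_utc = datetime.fromtimestamp(mat_timestamp, tz=timezone.utc)
print(materialization_time_utc.isoformat())  # 2023-03-27T04:00:04+00:00
```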


## Tables from BrainCircuits hourly dump

Now, let's look at what can be accessed in the dataset. The first two useful tables are the `node_table` and `edge_table`.

### Node table
The `node_table` is indexed by the segment ID and contains information on how many pre- and post-synapses (`nr_pre/post`) are attached to the segment. It also indicates how many up/downstream partners (`nr_up/downstream_partner`) it has. Note that a segment can synapse onto a downstream partner through more than one synapse.

In [4]:
dataset.node_table.head()

segment_id,size,nr_pre,nr_downstream_partner,nr_post,nr_upstream_partner
648518346504017760,3161098536,10151.0,7356.0,2515.0,948.0
648518346496404378,2617142755,18.0,2.0,17.0,1.0
648518346492002161,2084178692,10.0,1.0,10.0,1.0
648518346497002904,1678711950,1539.0,1026.0,147.0,89.0
648518346477750423,1550661059,1926.0,1251.0,244.0,112.0
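
Because the table is indexed by segment ID, a single segment can be looked up with `.loc`, and derived quantities can be computed column-wise. The following is a minimal sketch on a toy dataframe built from the first rows shown above. It assumes that `nr_pre` counts the synapses made onto downstream partners, which the column ordering suggests but the source does not state explicitly.

```python
import pandas as pd

# Toy stand-in for dataset.node_table, copied from the first rows shown above.
node_table = pd.DataFrame(
    {
        'size': [3161098536, 2617142755, 2084178692],
        'nr_pre': [10151.0, 18.0, 10.0],
        'nr_downstream_partner': [7356.0, 2.0, 1.0],
        'nr_post': [2515.0, 17.0, 10.0],
        'nr_upstream_partner': [948.0, 1.0, 1.0],
    },
    index=pd.Index(
        [648518346504017760, 648518346496404378, 648518346492002161],
        name='segment_id',
    ),
)

# Look up one segment by its ID (the index).
row = node_table.loc[648518346496404378]

# Average number of synapses per downstream partner, for every segment.
synapses_per_downstream_partner = (
    node_table['nr_pre'] / node_table['nr_downstream_partner']
)
print(row['nr_pre'], synapses_per_downstream_partner.loc[648518346496404378])
```

With the real dataset, replace the toy dataframe with `dataset.node_table`.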


### Edge table
The `edge_table` contains three columns: the source segment ID (`src`), the destination segment ID (`dst`), and the number of synapses from the source to the destination segment (`count`). There is one row for every pair of (directionally) connected segments.

In [5]:
dataset.edge_table.head()

,src,dst,count
0,648518346490423132,648518346491541672,875
1,648518346494533687,648518346489751895,816
2,648518346501894762,648518346492864174,815
3,648518346491135521,648518346499440219,769
4,648518346489439941,648518346526236119,623


Note that `src` and `dst` are _not_ set as the index of the table, which makes the table easier to query. For example, to get all connections from segment ID 648518346490423132, you can simply run:
```Python
dataset.edge_table[dataset.edge_table['src'] == 648518346490423132]
```

(Note that by the time you are running this notebook, the segment ID 648518346490423132 might no longer be valid. Simply treat this as a demo.)
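
The same boolean-mask pattern works in the other direction and can be combined with a synapse-count threshold. The sketch below uses a toy edge table: the first row is copied from the output above, while the remaining segment IDs (`111`, `222`) and the threshold of 50 are hypothetical values chosen for illustration.

```python
import pandas as pd

# Toy stand-in for dataset.edge_table. The first row is copied from the output
# above; the other rows use hypothetical segment IDs for illustration only.
edge_table = pd.DataFrame({
    'src': [648518346490423132, 648518346490423132, 111],
    'dst': [648518346491541672, 222, 648518346490423132],
    'count': [875, 12, 40],
})

segment_id = 648518346490423132

# Downstream partners: rows where the segment is the source.
downstream = edge_table[edge_table['src'] == segment_id]
# Upstream partners: rows where the segment is the destination.
upstream = edge_table[edge_table['dst'] == segment_id]

# Keep only strong outgoing connections (arbitrary threshold), strongest first.
strong = downstream[downstream['count'] >= 50].sort_values('count', ascending=False)
print(len(downstream), len(upstream), strong['dst'].tolist())
```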

## Tables from CAVE

You can also access a selection of tables from CAVE (Connectome Annotation Versioning Engine). These tables are downloaded at an older checkpoint but are re-materialized whenever a new FANC data dump version is downloaded. This means that all entries in these tables have a segment ID that is consistent with the version in the `node_table` and `edge_table` above. This up-to-date segment ID is stored in the `remat_segment_id` column.

> ⚠️ You should always use the segment ID in the `remat_segment_id` column instead of `target_id`!

For the rest of the tutorial, we will always set the index to the `remat_segment_id` column.

### Soma table
The first table from CAVE is the soma table. The following description is by the uploader of the table (Sumiya Kuroda):

> Information about all the cell bodies in FANC. Their nuclei were detected using a convolutional neural network
> - `pt_position` (x,y,z): the center
> - `volume` (um^3): the volume
> - `bb` (x,y,z): the bounding box of connected components

The table also contains other columns with more technical information.

In [6]:
selected_columns = [
    'volume',
    'pt_position',
    'bb_start_position', 'bb_end_position',
    'x', 'y', 'z'  # same as pt_position, but split into individual columns
]
dataset.soma_table.set_index('remat_segment_id')[selected_columns].head()

remat_segment_id,volume,pt_position,bb_start_position,bb_end_position,x,y,z
648518346489008618,1.589229,"[25528, 84220, 2199]","[25136.0, 84000.0, 2167.0]","[25920.0, 84544.0, 2210.0]",25528,84220,2199
648518346494311794,2.80719,"[27260, 86824, 2002]","[27008.0, 86544.0, 1982.0]","[27408.0, 87104.0, 2023.0]",27260,86824,2002
648518346499891411,3.072168,"[28028, 92796, 2662]","[27696.0, 92448.0, 2634.0]","[28512.0, 93248.0, 2700.0]",28028,92796,2662
648518346505341738,3.302213,"[35088, 188960, 791]","[34832.0, 188656.0, 771.0]","[35344.0, 189264.0, 812.0]",35088,188960,791
648518346493996811,3.782539,"[26584, 83696, 2034]","[26176.0, 83344.0, 2013.0]","[26992.0, 84048.0, 2060.0]",26584,83696,2034
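
Since `remat_segment_id` is consistent with the index of `node_table`, the two tables can be joined on it. The following is a minimal sketch on toy data: the segment IDs and volumes are copied from the output above, whereas the synapse counts in the toy `node_table` are hypothetical. Whether every soma has a matching `node_table` row is not stated in the source, so the sketch uses a left join that keeps unmatched somas.

```python
import pandas as pd

# Toy stand-in for dataset.soma_table; IDs and volumes copied from the output above.
soma_table = pd.DataFrame({
    'remat_segment_id': [648518346489008618, 648518346494311794],
    'volume': [1.589229, 2.80719],
})
# Toy stand-in for dataset.node_table with hypothetical synapse counts.
node_table = pd.DataFrame(
    {'nr_pre': [120.0], 'nr_post': [340.0]},
    index=pd.Index([648518346489008618], name='segment_id'),
)

# Index by the re-materialized ID, then left-join so that somas without a
# matching node_table row are kept (their counts become NaN).
somas = soma_table.set_index('remat_segment_id')
somas_with_counts = somas.join(node_table[['nr_pre', 'nr_post']], how='left')
print(somas_with_counts[['volume', 'nr_pre', 'nr_post']])
```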


### Motor neuron tables
Next, we have the motor neuron tables for the legs (`leg_mn_table`), the halteres (`haltere_mn_table`), the wings (`wing_mn_table`), and the neck (`neck_mn_table`). We will use leg motor neurons as an example here.

The following description is provided by the uploader of the table (Sumiya Kuroda):

> This table contains all T1, T2, and T3 leg motor neurons traced on CATMAID.
> - `classification_system` column indicates the bundle and L vs R neuromeres
> - `cell_type` column shows their skeleton_ids
> - `pt_position` are placed on their nuclei
>
> This table uses soma_jan2022 as a reference table. 

The table also contains other columns with more technical information.

In [7]:
selected_columns = [
    'classification_system',
    'cell_type',
    'volume',
    'pt_position',
    'bb_start_position', 'bb_end_position',
    'x', 'y', 'z'  # same as pt_position, but split into individual columns
]
dataset.leg_mn_table.set_index('remat_segment_id')[selected_columns].head()

remat_segment_id,classification_system,cell_type,volume,pt_position,bb_start_position,bb_end_position,x,y,z
648518346501223267,L1_T1R,395849,4.174681,"[51580, 86328, 3018]","[51024.0, 86112.0, 2993.0]","[52160.0, 86544.0, 3043.0]",51580,86328,3018
648518346495199158,L7_T2L,430353,5.442486,"[8128, 130928, 2067]","[7760.0, 130688.0, 2037.0]","[8496.0, 131168.0, 2099.0]",8128,130928,2067
648518346489708815,A3_T1L,254411,6.495155,"[18616, 87288, 2182]","[17984.0, 86848.0, 2151.0]","[19248.0, 87728.0, 2195.0]",18616,87288,2182
648518346491966859,L6_T2L,429997,7.975113,"[7560, 133928, 2130]","[7184.0, 133536.0, 2104.0]","[7936.0, 134320.0, 2157.0]",7560,133928,2130
648518346467270407,L6_T2L,396805,8.472266,"[10952, 128576, 2082]","[10672.0, 128208.0, 2053.0]","[11232.0, 128944.0, 2112.0]",10952,128576,2082
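
The `classification_system` labels can be split into their components to make filtering easier. The sketch below assumes the layout `<bundle>_<neuromere><side>` (for example `L1_T1R`), inferred from the rows shown above; labels that do not match this pattern would yield `NaN` in every extracted column.

```python
import pandas as pd

# classification_system values copied from the leg_mn_table output above.
leg_mn = pd.DataFrame(
    {'classification_system': ['L1_T1R', 'L7_T2L', 'A3_T1L']},
    index=pd.Index(
        [648518346501223267, 648518346495199158, 648518346489708815],
        name='remat_segment_id',
    ),
)

# Assumed label layout: <bundle>_<neuromere><side>, e.g. L1_T1R.
pattern = r'^(?P<bundle>[^_]+)_(?P<neuromere>T[1-3])(?P<side>[LR])$'
parts = leg_mn['classification_system'].str.extract(pattern)
print(parts)

# Example filter: motor neurons in the left T1 neuromere.
left_t1 = parts[(parts['neuromere'] == 'T1') & (parts['side'] == 'L')]
```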


### Nerve bundle table
The nerve bundle table marks nerve bundle fibers that either leave or enter the VNC from, for example, a leg. It contains both motor neurons and sensory neurons. The following description is provided by this table's uploader, Leila Elabbady (Tuthill Lab):

> This table includes point locations for incoming and outgoing axons from the following nerves: T1_L, both T2 leg nerves, both T3 leg nerves, ADMN_L, ADMN_R, AbNT, and both Halteres. Classification denotes the nerve and directionality of the fiber. Cell_type denotes the finest label we have to date.

The table also contains other columns with more technical information.

In [8]:
selected_columns = [
    'classification_system',
    'cell_type',
    'pt_position',
    'x', 'y', 'z'  # same as pt_position, but split into individual columns
]
dataset.nerve_bundle_table.set_index('remat_segment_id')[selected_columns].head()

remat_segment_id,classification_system,cell_type,pt_position,x,y,z
648518346488815453,T1_L_Afferent,CO_Claw,"[9147, 100730, 3740]",9147,100730,3740
648518346501717065,T1_L_Afferent,Hair_plate,"[8904, 100497, 3740]",8904,100497,3740
648518346490998904,T1_L_Afferent,Bristle,"[11043, 100638, 3740]",11043,100638,3740
648518346494804106,T1_L_Afferent,Sensory,"[7191, 103094, 3740]",7191,103094,3740
648518346472907126,T1_L_Afferent,Bristle,"[10942, 100620, 3740]",10942,100620,3740
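
To summarize the fibers by nerve, directionality, and cell type, the classification label can be split and the rows grouped. This is a minimal sketch on the rows shown above; it assumes that the text after the last underscore of `classification_system` encodes the directionality, which matches the sample (`T1_L_Afferent`) but is not confirmed for every label.

```python
import pandas as pd

# Rows copied from the nerve_bundle_table output above.
nerve_bundle = pd.DataFrame({
    'classification_system': ['T1_L_Afferent'] * 5,
    'cell_type': ['CO_Claw', 'Hair_plate', 'Bristle', 'Sensory', 'Bristle'],
})

# Assumption: the text after the last underscore encodes directionality.
labels = nerve_bundle['classification_system'].str.rsplit('_', n=1, expand=True)
nerve_bundle['nerve'] = labels[0]
nerve_bundle['direction'] = labels[1]

# Count fibers per (nerve, direction, cell_type) combination.
counts = (
    nerve_bundle
    .groupby(['nerve', 'direction', 'cell_type'])
    .size()
    .sort_values(ascending=False)
)
print(counts)
```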


## Other tables

### Neck connective table
This table marks ascending and descending neurons that go through the neck connective. Note that, for now, this is not the same as the connective table on CAVE: I generated this table from a seed plane at y level 75200 marked by the Jefferis Lab. This is because the connective table from FANC is defined above an artificial 0-indexed supervoxel and includes many artifacts (namely very short sections of the axon). **As with the CAVE tables, you should use the `remat_segment_id` column.**

This table contains the following self-explanatory columns:

In [9]:
dataset.neck_connective_table.set_index('remat_segment_id').head()

remat_segment_id,x,y,z,side
648518346509725129,40903.0,75200,1223,L
648518346475465856,41010.003906,75200,1212,L
648518346496839576,41175.003906,75200,1212,L
648518346500337624,41229.0,75200,1226,L
648518346493808014,41039.0,75200,1225,L
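
Because all tables share the same up-to-date segment IDs, they can be combined through the `edge_table`. As an illustration, the sketch below selects connections from neck connective neurons onto leg motor neurons. All segment IDs and synapse counts are hypothetical toy values; with the real dataset, the two ID collections would presumably come from the `remat_segment_id` columns of `dataset.neck_connective_table` and `dataset.leg_mn_table`.

```python
import pandas as pd

# Toy stand-ins with hypothetical segment IDs (not real FANC segments).
neck_ids = pd.Index([101, 102], name='remat_segment_id')  # neck connective neurons
mn_ids = pd.Index([201, 202], name='remat_segment_id')    # leg motor neurons
edge_table = pd.DataFrame({
    'src': [101, 101, 102, 301],
    'dst': [201, 301, 202, 201],
    'count': [15, 7, 3, 22],
})

# Keep edges whose source passes through the neck connective and whose
# destination is a leg motor neuron.
mask = edge_table['src'].isin(neck_ids) & edge_table['dst'].isin(mn_ids)
neck_to_mn = edge_table[mask].sort_values('count', ascending=False)
print(neck_to_mn)
```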
