<img src="resources/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Workshop SWDB 2024 </h1> 
<h3 align="center">EM data access via CAVE</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
   
The connectome data (synapses, cell types, etc.) can be accessed programmatically via CAVE, the <a href=https://caveconnectome.github.io/sections/cave_overview.html>Connectome Annotation Versioning Engine</a>. 
    
This notebook assumes that you have already completed the <a href=https://allenswdb.github.io/anatomy/microns-em/em-caveclient-setup.html>CAVE setup</a>.
    
</div>

In [1]:
%load_ext autoreload

In [2]:
import caveclient
import pandas as pd
import os
import numpy as np

## Initialize CAVEclient with a datastack
    
Datasets in CAVE are organized as datastacks. Each datastack combines an EM dataset, a segmentation, and a set of annotations. The datastack for the MICrONS public release is `minnie65_public`. When you instantiate your client with this datastack, it loads all the information needed to access it.
    

In [3]:
client = caveclient.CAVEclient("minnie65_public", auth_token=os.environ["API_SECRET"])

## Materialization versions

Data in CAVE is timestamped and periodically versioned - each (materialization) version corresponds to a specific timestamp. Individual versions are made publicly available. The materialization service provides annotation queries to the dataset. It is available under `client.materialize`.

Currently the following versions are publicly available (in this workshop we will be using 1078):



In [4]:
client.materialize.get_versions()

[1078, 117, 661, 343, 795, 943]

And these are their associated timestamps (all timestamps are in UTC):



In [5]:
for version in client.materialize.get_versions():
    print(f"Version {version}: {client.materialize.get_timestamp(version)}")

Version 1078: 2024-06-05 10:10:01.203215+00:00
Version 117: 2021-06-11 08:10:00.215114+00:00
Version 661: 2023-04-06 20:17:09.199182+00:00
Version 343: 2022-02-24 08:10:00.184668+00:00
Version 795: 2023-08-23 08:10:01.404268+00:00
Version 943: 2024-01-22 08:10:01.497934+00:00


The client will automatically query the latest materialization version. You can specify a `materialization_version` for every query if you want to access a specific version.
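As a minimal sketch of pinning a version, the snippet below picks the most recent version from the timestamps printed above and shows how it could be passed to a query. `latest_version` and `query_pinned` are hypothetical helper names introduced here for illustration; `client` is assumed to be the CAVEclient created earlier.

```python
from datetime import datetime

# Version timestamps copied from the output above (all UTC).
version_timestamps = {
    1078: "2024-06-05 10:10:01.203215+00:00",
    117: "2021-06-11 08:10:00.215114+00:00",
    661: "2023-04-06 20:17:09.199182+00:00",
    343: "2022-02-24 08:10:00.184668+00:00",
    795: "2023-08-23 08:10:01.404268+00:00",
    943: "2024-01-22 08:10:01.497934+00:00",
}


def latest_version(timestamps):
    """Return the version with the most recent timestamp (hypothetical helper)."""
    return max(timestamps, key=lambda v: datetime.fromisoformat(timestamps[v]))


def query_pinned(client, table_name, version):
    """Query a table at a fixed materialization version (hypothetical helper)."""
    return client.materialize.query_table(table_name, materialization_version=version)


pinned_version = latest_version(version_timestamps)
print(pinned_version)
```

Pinning the version explicitly keeps results reproducible even after newer versions are released.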



## Tables and generally useful information

A datastack has a large number of tables, which can be intimidating to navigate at first. CAVE provides several ways to find the tables you may want to use. To list all available tables, run:

In [6]:
client.materialize.get_tables()

['proofreading_status_and_strategy',
 'synapse_target_structure',
 'aibs_metamodel_celltypes_v661',
 'nucleus_alternative_points',
 'allen_column_mtypes_v2',
 'bodor_pt_cells',
 'aibs_metamodel_mtypes_v661_v2',
 'allen_v1_column_types_slanted_ref',
 'aibs_column_nonneuronal_ref',
 'nucleus_ref_neuron_svm',
 'apl_functional_coreg_vess_fwd',
 'vortex_compartment_targets',
 'baylor_log_reg_cell_type_coarse_v1',
 'functional_properties_v3_bcm',
 'l5et_column',
 'pt_synapse_targets',
 'proofreading_status_public_release',
 'coregistration_auto_phase3_fwd_apl_vess_combined',
 'coregistration_manual_v4',
 'nucleus_neuron_svm',
 'coregistration_manual_v3',
 'vortex_manual_myelination_v0',
 'synapses_pni_2',
 'nucleus_detection_v0',
 'vortex_manual_nodes_of_ranvier',
 'bodor_pt_target_proofread',
 'vortex_astrocyte_proofreading_status',
 'nucleus_functional_area_assignment',
 'coregistration_auto_phase3_fwd']
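Since the list is long, a simple keyword search can help narrow it down. The sketch below works on a handful of names copied from the listing above; `find_tables` is a hypothetical helper, not part of CAVEclient, and in practice you would pass it the result of `client.materialize.get_tables()`.

```python
# A few table names copied from the listing above.
tables = [
    "proofreading_status_and_strategy",
    "aibs_metamodel_celltypes_v661",
    "allen_v1_column_types_slanted_ref",
    "proofreading_status_public_release",
    "synapses_pni_2",
    "nucleus_detection_v0",
    "vortex_astrocyte_proofreading_status",
]


def find_tables(table_names, keyword):
    """Return the sorted table names containing `keyword`, ignoring case."""
    return sorted(name for name in table_names if keyword.lower() in name.lower())


celltype_tables = find_tables(tables, "celltypes")
proofreading_tables = find_tables(tables, "proofreading")
print(celltype_tables)
print(proofreading_tables)
```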

For each datastack, CAVE stores information about key data sources and parameters. These can be accessed through:

In [7]:
client.info.get_datastack_info()

{'aligned_volume': {'id': 1,
  'name': 'minnie65_phase3',
  'display_name': 'Minnie65',
  'description': "This is the second alignment of the IARPA 'minnie65' dataset, completed in the spring of 2020 that used the seamless approach.",
  'image_source': 'precomputed://https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/em'},
 'segmentation_source': 'graphene://https://minnie.microns-daf.com/segmentation/table/minnie65_public',
 'skeleton_source': 'precomputed://https://minnie.microns-daf.com/skeletoncache/api/v1/minnie65_public/precomputed/skeleton/',
 'analysis_database': None,
 'viewer_site': 'https://neuroglancer.neuvue.io',
 'synapse_table': 'synapses_pni_2',
 'soma_table': 'nucleus_detection_v0',
 'local_server': 'https://minnie.microns-daf.com',
 'description': 'This is the publicly released version of the minnie65 volume and segmentation. ',
 'viewer_resolution_x': 4.0,
 'viewer_resolution_y': 4.0,
 'viewer_resolution_z': 40.0,
 'proofreading_status_table': No

For instance, the synapse table is defined as `synapses_pni_2` and the cell body table as `nucleus_detection_v0`. 

## Query 1: Querying cells and their types

### Querying cell bodies

The basic query function in CAVE is `client.materialize.query_table`. It requires at least a table name as a parameter. Let's query the table of all automatically segmented nuclei:

In [8]:
nucleus_table_name = client.info.get_datastack_info()["soma_table"]
nucleus_df = client.materialize.query_table(nucleus_table_name, split_positions=True)
nucleus_df.head(5)

Unnamed: 0,id,created,superceded_id,valid,volume,pt_position_x,pt_position_y,pt_position_z,bb_start_position_x,bb_start_position_y,bb_start_position_z,bb_end_position_x,bb_end_position_y,bb_end_position_z,pt_supervoxel_id,pt_root_id
0,730537,2020-09-28 22:40:41.780734+00:00,,t,32.307937,381312,273984,19993,,,,,,,0,0
1,373879,2020-09-28 22:40:41.781788+00:00,,t,229.045043,228816,239776,19593,,,,,,,96218056992431305,864691136090135607
2,601340,2020-09-28 22:40:41.782714+00:00,,t,426.13801,340000,279152,20946,,,,,,,0,0
3,201858,2020-09-28 22:40:41.783784+00:00,,t,93.753836,146848,213600,26267,,,,,,,84955554103121097,864691135373893678
4,600774,2020-09-28 22:40:41.785273+00:00,,t,135.189791,339120,276112,19442,,,,,,,0,0


Every annotation table has at least one position column (here: `pt_position`), which serves as an anchor to the segmentation. These positions are automatically associated with the segmentation through `pt_root_id`s, which can be thought of as segment or cell IDs. Beyond positions and their associated IDs, every table stores metadata. For instance, the nucleus table contains the `volume` of each cell body.
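The positions in the query above were returned without a `desired_resolution`, so they are in voxel units. The sketch below converts two of the rows shown above to nanometers, assuming the voxel size matches the viewer resolution reported by the datastack info (4 x 4 x 40 nm). That match is an assumption; the table's own resolution is recorded in its metadata.

```python
import pandas as pd

# Two rows copied from the nucleus table output above (voxel coordinates).
nucleus_df = pd.DataFrame(
    {
        "id": [730537, 373879],
        "pt_position_x": [381312, 228816],
        "pt_position_y": [273984, 239776],
        "pt_position_z": [19993, 19593],
    }
)

# Assumed voxel size in nm, taken from viewer_resolution_x/y/z in the datastack info.
voxel_resolution_nm = {"x": 4.0, "y": 4.0, "z": 40.0}

# Scale each axis from voxels to nanometers.
for axis, resolution in voxel_resolution_nm.items():
    nucleus_df[f"pt_position_{axis}_nm"] = nucleus_df[f"pt_position_{axis}"] * resolution

print(nucleus_df[["id", "pt_position_x_nm", "pt_position_y_nm", "pt_position_z_nm"]])
```

Passing `desired_resolution=[1, 1, 1]` to `query_table`, as the later queries in this notebook do, requests positions in nanometers directly and avoids this manual step.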

Every table has a description and metadata attached to it that describes how the data was generated, limitations of it, and papers to cite when using it:

In [9]:
client.materialize.get_table_metadata(nucleus_table_name)

{'created': '2020-11-02T18:56:35.530100',
 'schema': 'nucleus_detection',
 'table_name': 'nucleus_detection_v0',
 'id': 38256,
 'aligned_volume': 'minnie65_phase3',
 'valid': True,
 'schema_type': 'nucleus_detection',
 'user_id': '121',
 'description': 'A table of nuclei detections from a nucleus detection model developed by Shang Mu, Leila Elabbady, Gayathri Mahalingam and Forrest Collman. Pt is the centroid of the nucleus detection. id corresponds to the flat_segmentation_source segmentID. Only included nucleus detections of volume>25 um^3, below which detections are false positives, though some false positives above that threshold remain. ',
 'notice_text': None,
 'reference_table': None,
 'flat_segmentation_source': 'precomputed://https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nuclei',
 'write_permission': 'PRIVATE',
 'read_permission': 'PUBLIC',
 'last_modified': '2022-10-25T19:24:28.559914',
 'segmentation_source': '',
 'pcg_table_name': 'minnie3_v1',
 'l

### Querying cell type information 

Cell types in the MICrONS dataset were classified in two distinct ways: manually and automatically. Manual annotations are available for ~1,000 neurons (`allen_v1_column_types_slanted_ref`); automated classifications, which are based on these manual annotations, are available for all cell bodies (`aibs_metamodel_celltypes_v661`). Because they annotate existing annotations, these annotations are introduced as a "reference" table:

In [10]:
ct_manual_not_merged_df = client.materialize.query_table("allen_v1_column_types_slanted_ref", desired_resolution=[1, 1, 1], 
                                                         split_positions=True, merge_reference=False)

ct_manual_not_merged_df.head(5)

Unnamed: 0,id,created,superceded_id,valid,target_id,classification_system,cell_type
0,1,2023-03-18 14:13:21.577622+00:00,,t,298945,aibs_coarse_excitatory,5P-IT
1,2,2023-03-18 14:13:21.578408+00:00,,t,294715,aibs_coarse_excitatory,23P
2,3,2023-03-18 14:13:21.579093+00:00,,t,264649,aibs_coarse_inhibitory,MC
3,4,2023-03-18 14:13:21.579790+00:00,,t,294489,aibs_coarse_excitatory,23P
4,5,2023-03-18 14:13:21.580463+00:00,,t,292708,aibs_coarse_excitatory,23P


Reference annotations contain `target_id` to merge them onto the table they target (here: the nucleus table). But do not worry, CAVE automatically merges them onto their target table by default (`merge_reference=True`):

In [11]:
ct_manual_df = client.materialize.query_table("allen_v1_column_types_slanted_ref", desired_resolution=[1, 1, 1], split_positions=True)

# remove segments with multiple cell bodies
ct_manual_df.drop_duplicates("pt_root_id", keep=False, inplace=True)
ct_manual_df.head(5)

Unnamed: 0,id_ref,created_ref,valid_ref,volume,pt_position_x,pt_position_y,pt_position_z,bb_start_position_x,bb_start_position_y,bb_start_position_z,...,bb_end_position_y,bb_end_position_z,pt_supervoxel_id,pt_root_id,id,created,valid,target_id,classification_system,cell_type
0,258319,2020-09-28 22:40:42.476911+00:00,t,261.806162,713600.0,572992.0,849520.0,,,,...,,,89309001002848425,864691137054893686,50,2023-03-18 14:13:21.613360+00:00,t,258319,aibs_coarse_excitatory,23P
1,276438,2020-09-28 22:40:42.700226+00:00,t,277.317714,718592.0,1035072.0,943880.0,,,,...,,,89465269428261699,864691136487559186,1119,2023-03-18 14:13:22.506660+00:00,t,276438,aibs_coarse_excitatory,6P-CT
2,260552,2020-09-28 22:40:42.745779+00:00,t,230.111805,709632.0,631872.0,840080.0,,,,...,,,89170256379033022,864691136040432126,35,2023-03-18 14:13:21.602813+00:00,t,260552,aibs_coarse_excitatory,23P
3,260263,2020-09-28 22:40:42.746658+00:00,t,274.324193,677760.0,632512.0,810640.0,,,,...,,,88044356338331571,864691136334881203,95,2023-03-18 14:13:21.644304+00:00,t,260263,aibs_coarse_excitatory,23P
4,262898,2020-09-28 22:40:42.749245+00:00,t,230.092308,690048.0,701120.0,878560.0,,,,...,,,88468836747612860,864691135759892302,81,2023-03-18 14:13:21.634505+00:00,t,262898,aibs_coarse_inhibitory,BPC


The reference table added two data columns: `classification_system` and `cell_type`. The `classification_system` divides the cells into excitatory and inhibitory neurons as well as non-neuronal cells. `cell_type` provides finer-grained cell type annotations.
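To make the reference merge concrete, the sketch below reproduces it locally with pandas on a few values copied from the outputs above. It illustrates the idea of joining on `target_id` and is not CAVE's actual implementation; the `_ref` suffix is chosen to mimic the column names in the merged output.

```python
import pandas as pd

# Target table: a few nucleus rows (values copied from the merged output above).
nucleus_df = pd.DataFrame(
    {
        "id": [258319, 276438, 260552],
        "volume": [261.806162, 277.317714, 230.111805],
        "pt_root_id": [864691137054893686, 864691136487559186, 864691136040432126],
    }
)

# Reference table: cell type annotations pointing at nucleus IDs via target_id.
ct_ref_df = pd.DataFrame(
    {
        "id": [50, 1119],
        "target_id": [258319, 276438],
        "classification_system": ["aibs_coarse_excitatory", "aibs_coarse_excitatory"],
        "cell_type": ["23P", "6P-CT"],
    }
)

# Join each annotation onto the nucleus it targets; columns of the target
# table that clash with the reference table receive the "_ref" suffix.
merged_df = ct_ref_df.merge(
    nucleus_df, left_on="target_id", right_on="id", how="inner", suffixes=("", "_ref")
)
print(merged_df[["id_ref", "volume", "pt_root_id", "id", "target_id", "cell_type"]])
```

Nuclei without an annotation (here `260552`) drop out of the inner join, which is why the merged table only contains annotated cells.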

Next, we query the automatically classified cell type information. The query works the same way:

In [12]:
ct_auto_df = client.materialize.query_table("aibs_metamodel_celltypes_v661", desired_resolution=[1, 1, 1], split_positions=True)

# remove segments with multiple cell bodies
ct_auto_df.drop_duplicates("pt_root_id", keep=False, inplace=True)
ct_auto_df.head(5)

Unnamed: 0,id_ref,created_ref,valid_ref,volume,pt_position_x,pt_position_y,pt_position_z,bb_start_position_x,bb_start_position_y,bb_start_position_z,...,bb_end_position_y,bb_end_position_z,pt_supervoxel_id,pt_root_id,id,created,valid,target_id,classification_system,cell_type
0,336365,2020-09-28 22:42:48.966292+00:00,t,272.488202,839040.0,723328.0,1083040.0,,,,...,,,93606511657924288,864691136274724621,36916,2023-12-19 22:47:18.659864+00:00,t,336365,excitatory_neuron,5P-IT
1,110648,2020-09-28 22:45:09.650639+00:00,t,328.533443,425792.0,518528.0,1016400.0,,,,...,,,79385153184885329,864691135489403194,1070,2023-12-19 22:38:00.472115+00:00,t,110648,excitatory_neuron,23P
2,112071,2020-09-28 22:43:34.088785+00:00,t,272.929423,414784.0,597888.0,623320.0,,,,...,,,79035988248401958,864691136147292311,1099,2023-12-19 22:38:00.898837+00:00,t,112071,excitatory_neuron,23P
3,197927,2020-09-28 22:43:10.652649+00:00,t,91.308851,574400.0,744768.0,1058840.0,,,,...,,,84529699506051734,864691136050858227,13259,2023-12-19 22:41:14.417986+00:00,t,197927,nonneuron,oligo
4,198087,2020-09-28 22:41:36.677186+00:00,t,161.744978,551808.0,763776.0,1094440.0,,,,...,,,83756261929388963,864691135809440972,13271,2023-12-19 22:41:14.685474+00:00,t,198087,nonneuron,astrocyte
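The `keep=False` argument in the cell above matters: it drops every row whose `pt_root_id` appears more than once, rather than keeping one representative. A small sketch with made-up IDs (illustrative values only, not from the dataset) shows the difference from the pandas default:

```python
import pandas as pd

# Made-up example: root IDs 101 and 202 each contain two cell bodies.
cells_df = pd.DataFrame(
    {
        "id_ref": [1, 2, 3, 4, 5],
        "pt_root_id": [101, 202, 202, 303, 101],
    }
)

# keep=False removes all rows of a duplicated root ID (used in this notebook).
single_soma_df = cells_df.drop_duplicates("pt_root_id", keep=False)

# The default keep="first" would retain one cell body per merged segment.
first_only_df = cells_df.drop_duplicates("pt_root_id")

print(single_soma_df)
print(first_only_df)
```

Removing all copies is the conservative choice here, because a segment with several cell bodies is a merge error and none of its annotations can be trusted to describe a single cell.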


In [13]:
ct_auto_df["classification_system"].value_counts()

classification_system
excitatory_neuron    63758
nonneuron            18689
inhibitory_neuron     7849
Name: count, dtype: int64

In [14]:
ct_auto_df["cell_type"].value_counts()

cell_type
23P          19643
4P           14722
6P-IT        11636
5P-IT         7887
astrocyte     7105
oligo         6899
6P-CT         6755
BC            3310
MC            2434
microglia     2392
5P-ET         2158
BPC           1484
OPC           1447
5P-NP          957
pericyte       846
NGC            621
Name: count, dtype: int64

### Storing cell type information 

For use in the course, we write the cell type tables to disk. Feather is an efficient format for storing these tables and is natively supported by pandas:

In [15]:
ct_manual_df.to_feather("cell_types_microns_1078_manual.feather")

In [16]:
ct_auto_df.to_feather("cell_types_microns_1078_auto.feather")

## Query 2: Querying synapses and proofread neurons

### Proofread neurons

Proofreading is necessary to obtain accurate reconstructions of a cell. In the MICrONS dataset, the general rule is that dendrites of cells with a _single_ cell body are sufficiently proofread to trust synaptic connections onto the cell. Axons, on the other hand, require so much proofreading that only ~1,000 cells have axons that were proofread to various degrees such that their outputs can be used for analysis.

The table `proofreading_status_and_strategy` contains proofreading information about ~1,300 neurons. <a href=https://www.microns-explorer.org/manifests/mm3-proofreading>This website</a> provides the most detailed overview. In brief, axons annotated with any `strategy_axon` were cleaned of false mergers, but not all were fully extended. The most important distinction is that axons annotated with `axon_column_truncated` were only proofread within a certain volume, whereas the others were proofread without such bias.

In [34]:
proof_all_df = client.materialize.query_table("proofreading_status_and_strategy", desired_resolution=[1, 1, 1], split_positions=True)

Table Owner Notice on proofreading_status_and_strategy: NOTE: this table supercedes 'proofreading_status_public_release'. For more details, see: www.microns-explorer.org/manifests/mm3-proofreading.


In [35]:
proof_all_df["strategy_axon"].value_counts()

strategy_axon
axon_column_truncated      598
axon_partially_extended    341
none                       213
axon_interareal            146
axon_fully_extended         77
Name: count, dtype: int64

We can filter our query to only return rows that match a condition by adding a filter to our query:

In [36]:
proof_df = client.materialize.query_table("proofreading_status_and_strategy", filter_in_dict={"strategy_axon": ["axon_partially_extended", "axon_fully_extended", "axon_interareal", "axon_column_truncated"]}, desired_resolution=[1, 1, 1], split_positions=True)

Table Owner Notice on proofreading_status_and_strategy: NOTE: this table supercedes 'proofreading_status_public_release'. For more details, see: www.microns-explorer.org/manifests/mm3-proofreading.


In [37]:
proof_df["strategy_axon"].value_counts()

strategy_axon
axon_column_truncated      598
axon_partially_extended    341
axon_interareal            146
axon_fully_extended         77
Name: count, dtype: int64
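The server-side filter above should be equivalent to filtering the full table locally with pandas. The sketch below rebuilds a stand-in for `proof_all_df` from the counts printed earlier (only the `strategy_axon` column, for illustration) and applies the same selection with `isin`:

```python
import pandas as pd

# Counts copied from the value_counts() output of the unfiltered table.
counts = {
    "axon_column_truncated": 598,
    "axon_partially_extended": 341,
    "none": 213,
    "axon_interareal": 146,
    "axon_fully_extended": 77,
}

# Stand-in for proof_all_df with one row per neuron.
proof_all_df = pd.DataFrame(
    {"strategy_axon": [name for name, n in counts.items() for _ in range(n)]}
)

proofread_strategies = [
    "axon_partially_extended",
    "axon_fully_extended",
    "axon_interareal",
    "axon_column_truncated",
]

# Local equivalent of the filter_in_dict query.
proof_local_df = proof_all_df[proof_all_df["strategy_axon"].isin(proofread_strategies)]
print(len(proof_all_df), len(proof_local_df))
```

Filtering on the server is preferable for large tables because only the matching rows are transferred.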

Again, we store this file for future use:

In [38]:
proof_df.to_feather("proofread_axons_microns_1078.feather")

### Synapse query

The MICrONS dataset relies on automatically detected synapses for connectivity information. The consortium automatically detected and associated a total of 337 million synaptic clefts. The detections were evaluated by manually identifying synapses in 70 small subvolumes (n=8,611 synapses) distributed across the dataset, giving the automated detection an estimated precision of 96% and recall of 89% with a partner assignment accuracy of 98%.

We can query the synapse table directly. However, it is too large to query all at once. CAVE limits queries to 500,000 rows at a time and displays a warning when the limit is reached. Here, we demonstrate this with the limit set to 10:

In [39]:
synapse_table_name = client.info.get_datastack_info()["synapse_table"]
syn_df = client.materialize.query_table(synapse_table_name, limit=10, desired_resolution=[1, 1, 1], split_positions=True)
syn_df

201 - "Limited query to 10 rows


Unnamed: 0,id,created,superceded_id,valid,pre_pt_position_x,pre_pt_position_y,pre_pt_position_z,post_pt_position_x,post_pt_position_y,post_pt_position_z,ctr_pt_position_x,ctr_pt_position_y,ctr_pt_position_z,size,pre_pt_supervoxel_id,pre_pt_root_id,post_pt_supervoxel_id,post_pt_root_id
0,4456,2020-11-04 13:02:08.388988+00:00,,t,211448.0,409744.0,801440.0,211448.0,409744.0,801440.0,211612.0,410172.0,801400.0,2956,72063160986635724,864691135533713769,72063160986635724,864691135533713769
1,4503,2020-11-04 12:09:33.286834+00:00,,t,212456.0,408032.0,800360.0,212456.0,408032.0,800360.0,212168.0,408088.0,800400.0,344,72063092267156962,864691135087527094,72063092267156962,864691135087527094
2,4508,2020-11-04 13:02:13.024144+00:00,,t,212448.0,411696.0,801440.0,212448.0,411696.0,801440.0,212224.0,411800.0,801560.0,344,72063229706111827,864691135533713769,72063229706111827,864691135533713769
3,4568,2020-11-04 13:44:08.085705+00:00,,t,213392.0,415448.0,802920.0,213392.0,415448.0,802920.0,213096.0,415176.0,802880.0,13816,72133735889250131,864691134530418554,72133735889250131,864691134530418554
4,4581,2020-11-04 07:29:12.917622+00:00,,t,213552.0,417184.0,800800.0,213552.0,417184.0,800800.0,213240.0,417080.0,801080.0,10436,72133804608718799,864691134745062676,72133804608718799,864691134745062676
5,4582,2020-11-04 13:02:17.694701+00:00,,t,212880.0,409120.0,801440.0,212880.0,409120.0,801440.0,213016.0,408832.0,801520.0,1344,72063160986636743,864691135533713769,72063160986636743,864691135533713769
6,4588,2020-11-04 12:20:12.290593+00:00,,t,213200.0,421120.0,805520.0,213200.0,421120.0,805520.0,213064.0,421000.0,805600.0,7128,72133942047682150,864691134609767690,72133942047682150,864691134609767690
7,4590,2020-11-04 13:20:01.875310+00:00,,t,213504.0,406440.0,805160.0,213504.0,406440.0,805160.0,213336.0,406596.0,805200.0,6572,72133461011344162,864691135091400630,72133461011344162,864691135091400630
8,4606,2020-11-04 07:24:39.038223+00:00,,t,213384.0,413792.0,800800.0,213384.0,413792.0,800800.0,213256.0,413976.0,801040.0,2100,72133667169766499,864691134609872906,72133667169766499,864691134609872906
9,4611,2020-11-04 07:24:37.800341+00:00,,t,213336.0,415304.0,800960.0,213336.0,415304.0,800960.0,213192.0,415604.0,800960.0,492,72133735889243887,864691134609872906,72133735889243887,864691134609872906


Instead, we need to limit our query to a few neurons. We can query the graph spanned by the neurons with "clean" axons using the `filter_in_dict` parameter (takes ~3 mins):

In [40]:
%%time 

synapse_table_name = client.info.get_datastack_info()["synapse_table"]
syn_proof_only_df = client.materialize.query_table(synapse_table_name, desired_resolution=[1, 1, 1], split_positions=True,
                                                   filter_in_dict={"pre_pt_root_id": proof_df["pt_root_id"], 
                                                                   "post_pt_root_id": proof_df["pt_root_id"]})

# remove autapses
syn_proof_only_df = syn_proof_only_df[syn_proof_only_df["pre_pt_root_id"] != syn_proof_only_df["post_pt_root_id"]]
syn_proof_only_df

CPU times: user 97.2 ms, sys: 39 ms, total: 136 ms
Wall time: 3min 51s


Unnamed: 0,id,created,superceded_id,valid,pre_pt_position_x,pre_pt_position_y,pre_pt_position_z,post_pt_position_x,post_pt_position_y,post_pt_position_z,ctr_pt_position_x,ctr_pt_position_y,ctr_pt_position_z,size,pre_pt_supervoxel_id,pre_pt_root_id,post_pt_supervoxel_id,post_pt_root_id
0,144648847,2020-11-04 10:25:21.607615+00:00,,t,671704.0,459200.0,903440.0,671680.0,459072.0,903320.0,671704.0,459104.0,903400.0,5720,87827409285525264,864691135645645039,87827409285520659,864691136143786292
1,152094175,2020-11-04 07:20:36.943498+00:00,,t,687808.0,776304.0,791480.0,687984.0,776152.0,791440.0,688008.0,776296.0,791720.0,14020,88401010355440214,864691135865371262,88401010355435596,864691135561699041
2,137764071,2020-11-04 09:39:48.804128+00:00,,t,661616.0,539216.0,800080.0,662208.0,539400.0,800200.0,661920.0,539208.0,799960.0,21584,87478245288662989,864691135645645039,87548614032842358,864691135754259661
3,158842100,2020-11-04 10:11:44.081977+00:00,,t,717304.0,389656.0,837280.0,717592.0,389400.0,837680.0,717536.0,389448.0,837400.0,4388,89443553671426299,864691135988854016,89443553671436885,864691135954940424
4,169627147,2020-11-04 06:48:58.480065+00:00,,t,731568.0,853496.0,864160.0,731736.0,853464.0,864440.0,731472.0,853296.0,864320.0,1932,89951734335675875,864691135463863102,89951734335679095,864691137197468481
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
131196,173100472,2020-11-04 08:28:28.208230+00:00,,t,742496.0,561928.0,846480.0,743024.0,561776.0,846440.0,742712.0,561824.0,846520.0,6964,90293751104382021,864691135359413848,90293751104388154,864691135404792046
131197,161970989,2020-11-04 09:39:51.371123+00:00,,t,715208.0,727160.0,886280.0,714840.0,727184.0,885880.0,715076.0,726748.0,886040.0,2592,89384523841861858,864691136117421732,89314155097665442,864691135586346876
131198,146940592,2020-11-04 07:31:42.648641+00:00,,t,678016.0,719320.0,837000.0,678368.0,719160.0,836960.0,678216.0,719480.0,837000.0,20772,88047242623579607,864691135688058592,88117611367744936,864691135586346876
131199,368508637,2020-11-04 10:36:39.850466+00:00,,t,1197608.0,853320.0,991680.0,1197760.0,853888.0,991800.0,1197688.0,853600.0,991920.0,11364,105925439666802964,864691135273591825,105925439666815556,864691136137140093


In [41]:
syn_proof_only_df.to_feather("syn_proofread_axons_microns_1078.feather")
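The autapse filter used above can be checked on a few rows copied from the outputs in this section. In the 10-row example every synapse has identical pre- and postsynaptic root IDs, so the filter removes all of them; the third row below comes from the proofread-neuron query and is kept.

```python
import pandas as pd

# Rows copied from the synapse outputs above (only the columns needed here).
syn_df = pd.DataFrame(
    {
        "id": [4456, 4503, 144648847],
        "pre_pt_root_id": [864691135533713769, 864691135087527094, 864691135645645039],
        "post_pt_root_id": [864691135533713769, 864691135087527094, 864691136143786292],
    }
)

# An autapse is a synapse whose pre- and postsynaptic segments are the same.
is_autapse = syn_df["pre_pt_root_id"] == syn_df["post_pt_root_id"]
syn_no_autapse_df = syn_df[~is_autapse]
print(syn_no_autapse_df)
```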

### Extended synapse query

In [42]:
synapse_table_name = client.info.get_datastack_info()["synapse_table"]
syn_proof_out_df_blocks = []

for root_id_block in np.array_split(proof_df["pt_root_id"], 20):
    syn_proof_out_df_blocks.append(client.materialize.query_table(synapse_table_name, filter_in_dict={"pre_pt_root_id": root_id_block}, desired_resolution=[1, 1, 1], split_positions=True))

syn_proof_out_df = pd.concat(syn_proof_out_df_blocks)
    
# remove autapses
syn_proof_out_df = syn_proof_out_df[syn_proof_out_df["pre_pt_root_id"] != syn_proof_out_df["post_pt_root_id"]]
syn_proof_out_df



Unnamed: 0,id,created,superceded_id,valid,pre_pt_position_x,pre_pt_position_y,pre_pt_position_z,post_pt_position_x,post_pt_position_y,post_pt_position_z,ctr_pt_position_x,ctr_pt_position_y,ctr_pt_position_z,size,pre_pt_supervoxel_id,pre_pt_root_id,post_pt_supervoxel_id,post_pt_root_id
0,228711288,2020-11-04 06:48:58.343057+00:00,,t,880640.0,529376.0,656360.0,880568.0,529304.0,655920.0,880584.0,529360.0,656080.0,2984,95007425567978633,864691135946980644,95007425567972423,864691135182553602
1,117425064,2020-11-04 06:48:58.343057+00:00,,t,610792.0,486152.0,1087520.0,610392.0,486136.0,1087920.0,610504.0,486128.0,1087720.0,2988,85787609661534918,864691135855890478,85717240917368623,864691136385578623
2,205635390,2020-11-04 07:26:27.893733+00:00,,t,828048.0,549464.0,731760.0,827560.0,549520.0,731720.0,827780.0,549488.0,731720.0,2812,93248825640583083,864691136025099065,93248825640577496,864691135976624707
3,206366790,2020-11-04 06:48:59.403833+00:00,,t,829344.0,669624.0,933920.0,829768.0,670024.0,934080.0,829580.0,669948.0,933960.0,9560,93252880760716792,864691135759685966,93323249504888873,864691135082749175
4,144046168,2020-11-04 06:48:58.480065+00:00,,t,671480.0,417808.0,628240.0,671528.0,417808.0,628040.0,671552.0,417872.0,628200.0,1928,87826033956760003,864691135759685966,87826033956758154,864691133879805766
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33789,305793989,2020-11-04 14:29:23.588485+00:00,,t,1059800.0,517928.0,1072760.0,1060288.0,518168.0,1072560.0,1060064.0,517968.0,1072560.0,7456,101199464081118816,864691135346361759,101199464081116249,864691135937611780
33790,347009270,2020-11-04 10:11:28.504178+00:00,,t,1159592.0,729608.0,963480.0,1159616.0,730112.0,963160.0,1159600.0,729800.0,963440.0,9076,104654610316200293,864691135275843557,104654679035667446,864691136144008244
33791,280164511,2020-11-04 07:42:45.145523+00:00,,t,986328.0,771152.0,785160.0,986336.0,771144.0,785600.0,986348.0,771476.0,785480.0,6332,98674709566287540,864691135468780172,98674709566294046,864691135778611133
33792,352146653,2020-11-04 08:37:57.173301+00:00,,t,1161160.0,814568.0,978640.0,1161112.0,814832.0,978720.0,1161104.0,814728.0,978760.0,4648,104657496534628746,864691137198382657,104657496534629859,864691136144008244


In [43]:
syn_proof_out_df.to_feather("syn_proofread_axons_all_out_microns_1078.feather")
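The loop above splits the root IDs into 20 blocks so that each request returns fewer rows. The sketch below isolates that pattern with a stand-in query function so it can run offline. `query_in_blocks` and `fake_query` are hypothetical names; the IDs are converted to a NumPy array before splitting, a small deviation from the cell above that operates on the underlying values only.

```python
import numpy as np
import pandas as pd


def query_in_blocks(query_fn, root_ids, n_blocks):
    """Run query_fn on each block of IDs and concatenate the results."""
    blocks = []
    for id_block in np.array_split(np.asarray(root_ids), n_blocks):
        if len(id_block) == 0:
            continue  # skip empty blocks when n_blocks exceeds the number of IDs
        blocks.append(query_fn(id_block))
    return pd.concat(blocks, ignore_index=True)


def fake_query(id_block):
    """Stand-in for client.materialize.query_table: one row per requested ID."""
    return pd.DataFrame({"pre_pt_root_id": id_block, "size": 1})


# With a real client, query_fn could instead wrap the call used in this notebook:
# lambda ids: client.materialize.query_table(
#     synapse_table_name, filter_in_dict={"pre_pt_root_id": ids},
#     desired_resolution=[1, 1, 1], split_positions=True)
result_df = query_in_blocks(fake_query, range(10), n_blocks=3)
print(len(result_df))
```

The number of blocks is a trade-off: more blocks mean more requests, while fewer blocks risk hitting the row limit on a single request.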

In [44]:
synapse_table_name = client.info.get_datastack_info()["synapse_table"]
syn_proof_in_df_blocks = []

for root_id_block in np.array_split(proof_df["pt_root_id"], 20):
    syn_proof_in_df_blocks.append(client.materialize.query_table(synapse_table_name, filter_in_dict={"post_pt_root_id": root_id_block}, desired_resolution=[1, 1, 1], split_positions=True))

syn_proof_in_df = pd.concat(syn_proof_in_df_blocks)
    
# remove autapses
syn_proof_in_df = syn_proof_in_df[syn_proof_in_df["pre_pt_root_id"] != syn_proof_in_df["post_pt_root_id"]]
syn_proof_in_df



Unnamed: 0,id,created,superceded_id,valid,pre_pt_position_x,pre_pt_position_y,pre_pt_position_z,post_pt_position_x,post_pt_position_y,post_pt_position_z,ctr_pt_position_x,ctr_pt_position_y,ctr_pt_position_z,size,pre_pt_supervoxel_id,pre_pt_root_id,post_pt_supervoxel_id,post_pt_root_id
0,126794016,2020-11-04 06:48:58.343057+00:00,,t,633368.0,723776.0,888600.0,633248.0,723664.0,888480.0,633280.0,723800.0,888520.0,2976,86569705355370503,864691136247875344,86569636635857973,864691135214129208
1,361055445,2020-11-04 06:48:59.787324+00:00,,t,1193080.0,628056.0,1026040.0,1192688.0,628128.0,1026360.0,1192920.0,628232.0,1026120.0,2500,105777143170039650,864691135945856136,105777143170045657,864691135700409211
2,113373305,2020-11-04 06:48:58.480065+00:00,,t,602912.0,762792.0,798440.0,602776.0,762920.0,798440.0,602896.0,762880.0,798440.0,1900,85515479527576676,864691133933869432,85515479527571345,864691135617152361
3,456611088,2020-11-04 06:48:59.326085+00:00,,t,1410648.0,729000.0,850240.0,1410800.0,728960.0,850520.0,1410672.0,728960.0,850280.0,5140,113239596703504810,864691135479280180,113239596703516355,864691135572530981
4,203335378,2020-11-04 06:48:59.326085+00:00,,t,803040.0,764056.0,816560.0,802728.0,763688.0,816480.0,802816.0,763832.0,816640.0,5160,92411616523999913,864691136899330670,92341247779802107,864691135781981776
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207260,481463352,2020-11-04 13:20:01.875310+00:00,,t,1486040.0,617232.0,912400.0,1486032.0,617448.0,912480.0,1486088.0,617424.0,912560.0,256,115839460868228903,864691136379227093,115839529587652810,864691135867905302
207261,357622673,2020-11-04 07:53:54.127961+00:00,,t,1182368.0,692480.0,782840.0,1182176.0,692656.0,782640.0,1182096.0,692640.0,782680.0,2056,105427428947681865,864691136912667633,105427428947676212,864691135463909789
207262,344256750,2020-11-04 13:26:44.330700+00:00,,t,1155832.0,559136.0,925360.0,1156424.0,559424.0,925200.0,1156112.0,559168.0,925120.0,5564,104508168977140289,864691135463683646,104508168977136537,864691135346361759
207263,349979673,2020-11-04 09:42:22.641715+00:00,,t,1173616.0,628536.0,1011360.0,1173416.0,628704.0,1011480.0,1173424.0,628680.0,1011480.0,308,105073455661330461,864691136310627674,105073455661343088,864691135346361759


In [45]:
syn_proof_in_df.to_feather("syn_proofread_axons_all_in_microns_1078.feather")

In [46]:
syn_proof_df = pd.concat([syn_proof_out_df, syn_proof_in_df]).drop_duplicates("id", keep="first")
syn_proof_df.to_feather("syn_proofread_axons_all_microns_1078.feather")
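Synapses between two proofread neurons are returned by both the outgoing and the incoming query, which is why the cell above deduplicates on the synapse `id`. A small sketch with made-up IDs (illustrative values only), where neurons 10 and 11 stand in for the proofread set:

```python
import pandas as pd

# Outgoing synapses: presynaptic neuron is in the proofread set {10, 11}.
syn_out_df = pd.DataFrame(
    {"id": [1, 2, 3], "pre_pt_root_id": [10, 10, 11], "post_pt_root_id": [20, 11, 10]}
)

# Incoming synapses: postsynaptic neuron is in the proofread set {10, 11}.
syn_in_df = pd.DataFrame(
    {"id": [2, 3, 4], "pre_pt_root_id": [10, 11, 30], "post_pt_root_id": [11, 10, 10]}
)

# Synapses 2 and 3 connect two proofread neurons and therefore appear twice.
combined_df = pd.concat([syn_out_df, syn_in_df]).drop_duplicates("id", keep="first")
print(combined_df)
```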