# Select a subset of cells for proofreading
In this example, we select the functionally-coregistered excitatory cells without proofreading as potential proofreading targets. Includes use of:
* CAVE materialize.tables  
* pd.merge  
* pd.DataFrame.query
* np.random
* nglui statebuilders

In [22]:
import numpy as np
import pandas as pd 

from datetime import datetime
from caveclient import CAVEclient

First, we'll need to initialize a CAVE client, here for the "minnie65_phase3_v1" dataset. If you don't have access to this dataset, you can plug in your own dataset and neurons to follow along.

In [2]:
# Initialize a client for the "minnie65_phase3_v1" datastack.
client = CAVEclient(datastack_name='minnie65_phase3_v1')

# set preferred voxel resolution, for consistency
voxel_resolution = np.array([4,4,40])

# set materialization version, for consistency
materialization = 1007 # 3/26/2024

## Query CAVE tables

### Coregistered cells
The neurons in the dataset that have both functional recordings and EM reconstructions

In [3]:
coreg_table = client.materialize.tables.coregistration_manual_v3().query(
    select_columns={'nucleus_detection_v0': ['id','pt_root_id','pt_supervoxel_id','pt_position'], # from the reference table
                    'coregistration_manual_v3': ['session', 'scan_idx','unit_id','field']},  # functional information
    desired_resolution=voxel_resolution,
    materialization_version=materialization,
    split_positions=True, # here the XYZ positions are split for later filtering
    )

# Drop duplicates (due to multiple recordings of the same cell)
coreg_table.drop_duplicates(subset=['pt_root_id'], keep='first', inplace=True)

The `client.materialize.tables` interface is experimental and might experience breaking changes before the feature is stabilized.
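
To make the deduplication step concrete, here is a minimal sketch of `drop_duplicates(subset=['pt_root_id'], keep='first')` on a toy table. The values are made up for illustration and are not real dataset entries.

```python
import pandas as pd

# Toy stand-in for coreg_table: root id 101 was recorded in two scans.
# All values are illustrative, not taken from the dataset.
toy_coreg = pd.DataFrame({
    'pt_root_id': [101, 101, 102, 103],
    'session':    [4, 5, 4, 6],
    'scan_idx':   [7, 3, 7, 2],
})

# Count the rows that repeat a root id already seen in an earlier row
n_repeats = int(toy_coreg.duplicated(subset=['pt_root_id'], keep='first').sum())

# Keep only the first row listed for each root id
toy_unique = toy_coreg.drop_duplicates(subset=['pt_root_id'], keep='first')

print(n_repeats)                       # 1
print(toy_unique['session'].tolist())  # [4, 4, 6]
```

Which recording counts as "first" depends on the row order of the table, so sort beforehand if a particular session or scan should be preferred.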


### Cell types table
Here we use the soma-nucleus feature classifier, which predicts both the broad cell class and the cell sub-type. All functional cells should be excitatory, but we will remove the edge cases that are not.

In [4]:
ct_table = client.materialize.tables.aibs_metamodel_celltypes_v661().query(
    select_columns={'nucleus_detection_v0': ['id','pt_root_id','pt_position'], # from the reference table
                    'aibs_metamodel_celltypes_v661': ['classification_system', 'cell_type']}, # classifier information
    materialization_version=materialization,
    desired_resolution=voxel_resolution)

# remove root_id=0
ct_table = ct_table.query('pt_root_id!=0')

# drop duplicate segment ids (due to soma merges)
ct_table.drop_duplicates(subset=['pt_root_id'], keep='first', inplace=True)
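
With `keep='first'`, one of the nuclei sharing a merged segment is retained, chosen only by row order. If ambiguous segments should instead be excluded altogether, `keep=False` drops every row whose `pt_root_id` is duplicated. The sketch below compares the two on made-up values; it is an alternative to consider, not what this notebook does.

```python
import pandas as pd

# Toy stand-in for ct_table (illustrative values only).
# Nucleus 1 has no segment (root id 0); nuclei 3 and 4 share root id 202.
toy_ct = pd.DataFrame({
    'id':         [1, 2, 3, 4, 5],
    'pt_root_id': [0, 201, 202, 202, 203],
    'cell_type':  ['23P', '4P', '5P-ET', '4P', '23P'],
})

# Remove nuclei without a segment, as in the cell above
toy_ct = toy_ct.query('pt_root_id != 0')

# Option used in this notebook: keep one row per root id
first_kept = toy_ct.drop_duplicates(subset=['pt_root_id'], keep='first')

# Stricter alternative: drop all rows of any duplicated root id
none_kept = toy_ct.drop_duplicates(subset=['pt_root_id'], keep=False)

print(first_kept['id'].tolist())  # [2, 3, 5]
print(none_kept['id'].tolist())   # [2, 5]
```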



### Proofread cells
Cells that have been labeled as proofread. We want to exclude cells that already have complete axons from further proofreading. 

In [5]:
prf_table = client.materialize.tables.proofreading_status_public_release().query(
    select_columns=['pt_root_id', 'status_dendrite', 'status_axon',],
    desired_resolution=voxel_resolution,
    materialization_version=materialization,
    )

# Select only the axon extended cells (will be excluded from consideration)
prf_table.query("status_axon=='extended'", inplace=True)



## Combine and filter tables

### Keep coregistered cells that are also excitatory
Note that the removed cells could be either 1) misclassified excitatory cells (more likely) or 2) mis-registered functional cells. We make no judgement here and exclude all for now.

In [9]:
# inner join of coreg and cell types tables
coreg_ct_table = pd.merge(coreg_table, ct_table,
                        on = ['id','pt_root_id'],
                        how='inner' )

# remove cells not classified as excitatory
coreg_ct_table.query("classification_system=='excitatory_neuron'", inplace=True)
coreg_ct_table.drop(columns=['classification_system'], inplace=True)

coreg_ct_table.tail()

|  | id | pt_root_id | pt_supervoxel_id | pt_position_x | pt_position_y | pt_position_z | session | scan_idx | unit_id | field | cell_type | pt_position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11976 | 395416 | 864691135394864117 | 97825062761246370 | 240528 | 154032 | 25770 | 9 | 3 | 2456 | 2 | 4P | [240528, 154032, 25770] |
| 11977 | 485825 | 864691136966674894 | 105277895836000216 | 294512 | 107648 | 23083 | 5 | 6 | 1538 | 2 | 23P | [294512, 107648, 23083] |
| 11978 | 298963 | 864691135783565875 | 91354023978086624 | 193408 | 175408 | 21696 | 6 | 7 | 6479 | 6 | 4P | [193408, 175408, 21696] |
| 11979 | 302957 | 864691135697306778 | 90020659837046869 | 183456 | 202752 | 20953 | 4 | 7 | 7908 | 8 | 5P-ET | [183456, 202752, 20953] |
| 11980 | 298916 | 864691135741494891 | 90155899901322754 | 184608 | 161808 | 21806 | 9 | 3 | 8134 | 6 | 4P | [184608, 161808, 21806] |
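
The join-then-filter step can be sketched on toy tables as follows. The `validate='one_to_one'` argument is an optional safeguard that is not used in the cell above: it makes `pd.merge` raise an error if either table still contains duplicate keys. The `inhibitory_neuron` label and all ids are illustrative assumptions.

```python
import pandas as pd

# Toy stand-ins for coreg_table and ct_table (illustrative values only)
toy_coreg = pd.DataFrame({
    'id':         [1, 2, 3],
    'pt_root_id': [101, 102, 103],
    'unit_id':    [11, 12, 13],
})
toy_ct = pd.DataFrame({
    'id':         [1, 2, 4],
    'pt_root_id': [101, 102, 104],
    'classification_system': ['excitatory_neuron',
                              'inhibitory_neuron',   # assumed label
                              'excitatory_neuron'],
    'cell_type':  ['4P', '23P', '23P'],
})

# Inner join keeps only nuclei present in both tables (ids 1 and 2);
# validate raises MergeError if the key pairs are not unique on both sides
merged = pd.merge(toy_coreg, toy_ct,
                  on=['id', 'pt_root_id'],
                  how='inner',
                  validate='one_to_one')

# Keep only the cells classified as excitatory
excitatory = merged.query("classification_system == 'excitatory_neuron'")

print(len(merged), len(excitatory))  # 2 1
```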


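### Exclude cells that are already proofread
With `indicator=True`, the merge adds a `_merge` column that marks whether each `pt_root_id` appears in `both` dataframes, only in the left one (`left_only`), or only in the right one (`right_only`). We keep the `left_only` rows: coregistered excitatory cells without an extended axon. Note that the outer merge also creates rows for cells found only in `prf_table`, with missing values in every other column, so pandas converts the integer columns to float, as the output shows.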

In [12]:
# merge with indicator option, to keep cells that are in coreg_ct_table but not prf_table
coreg_ct_no_prf = pd.merge(coreg_ct_table, prf_table['pt_root_id'],
                           on=['pt_root_id'],
                           how='outer',
                           indicator=True).query('_merge=="left_only"')

coreg_ct_no_prf.drop(columns=['_merge'], inplace=True)

coreg_ct_no_prf.tail()

|  | id | pt_root_id | pt_supervoxel_id | pt_position_x | pt_position_y | pt_position_z | session | scan_idx | unit_id | field | cell_type | pt_position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11904 | 395416.0 | 864691135394864117 | 9.782506e+16 | 240528.0 | 154032.0 | 25770.0 | 9.0 | 3.0 | 2456.0 | 2.0 | 4P | [240528, 154032, 25770] |
| 11905 | 485825.0 | 864691136966674894 | 1.052779e+17 | 294512.0 | 107648.0 | 23083.0 | 5.0 | 6.0 | 1538.0 | 2.0 | 23P | [294512, 107648, 23083] |
| 11906 | 298963.0 | 864691135783565875 | 9.135402e+16 | 193408.0 | 175408.0 | 21696.0 | 6.0 | 7.0 | 6479.0 | 6.0 | 4P | [193408, 175408, 21696] |
| 11907 | 302957.0 | 864691135697306778 | 9.002066e+16 | 183456.0 | 202752.0 | 20953.0 | 4.0 | 7.0 | 7908.0 | 8.0 | 5P-ET | [183456, 202752, 20953] |
| 11908 | 298916.0 | 864691135741494891 | 9.01559e+16 | 184608.0 | 161808.0 | 21806.0 | 9.0 | 3.0 | 8134.0 | 6.0 | 4P | [184608, 161808, 21806] |
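
The float columns in this output are a side effect of `how='outer'`: rows that exist only in the right table carry missing values, which forces integer columns such as `pt_supervoxel_id` to float64. As a hedged alternative (not what the cell above does), a `how='left'` merge selects the same `left_only` rows while leaving the integer dtypes intact. The sketch below uses toy tables; the root id 999 is made up.

```python
import pandas as pd

# Toy stand-ins for coreg_ct_table and prf_table (illustrative values)
toy_cells = pd.DataFrame({
    'pt_root_id':       [101, 102, 103],
    'pt_supervoxel_id': [97825062761246370,
                         105277895836000216,
                         91354023978086624],
})
toy_prf = pd.DataFrame({'pt_root_id': [102, 999]})  # 999 is not in toy_cells

# Outer merge: the right-only row (999) introduces NaN, so ints become floats
outer = pd.merge(toy_cells, toy_prf, on='pt_root_id',
                 how='outer', indicator=True)
outer_only = outer.query('_merge == "left_only"').drop(columns='_merge')

# Left merge: no right-only rows, so no NaN and no dtype change
left = pd.merge(toy_cells, toy_prf, on='pt_root_id',
                how='left', indicator=True)
left_only = left.query('_merge == "left_only"').drop(columns='_merge')

print(outer_only['pt_supervoxel_id'].dtype)  # float64
print(left_only['pt_supervoxel_id'].dtype)   # int64
print(left_only['pt_root_id'].tolist())      # [101, 103]
```

Supervoxel ids are too large to be represented exactly as float64, so preserving the integer dtype matters if the column is exported or used for lookups later.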


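### Select cells with soma centroids >100 microns from boundaries
Now we use the split position columns we queried earlier. Positions are expressed in voxels at the dataset resolution (4x4x40 nm/voxel), so the 100 micron margin must be converted to voxels separately for xy and z. The extent of the volume is approximated here by the minimum and maximum soma positions among the remaining cells. Keeping soma centroids at least 100 microns inside those limits should leave their dendritic arbors largely intact.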

In [13]:
x_range = [coreg_ct_no_prf.pt_position_x.min(), coreg_ct_no_prf.pt_position_x.max()]
y_range = [coreg_ct_no_prf.pt_position_y.min(), coreg_ct_no_prf.pt_position_y.max()]
z_range = [coreg_ct_no_prf.pt_position_z.min(), coreg_ct_no_prf.pt_position_z.max()]

edge_buffer_xy = 100*1000/voxel_resolution[0] # microns * 1000 / 4 nm resolution
edge_buffer_z = 100*1000/voxel_resolution[2] # microns * 1000 / 40 nm resolution

x_buffer = x_range + np.array([edge_buffer_xy, -edge_buffer_xy])
y_buffer = y_range + np.array([edge_buffer_xy, -edge_buffer_xy])
z_buffer = z_range + np.array([edge_buffer_z, -edge_buffer_z])

In [14]:
# Filter the cell table on soma position
coreg_ct_no_prf.query('(pt_position_x>{}) & (pt_position_x<{})'.format(*x_buffer), inplace=True)
coreg_ct_no_prf.query('(pt_position_z>{}) & (pt_position_z<{})'.format(*z_buffer), inplace=True)

# Note: it was not necessary to filter by Y (depth) in this search
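
As a worked example of the margin arithmetic: 100 microns is 100,000 nm, which is 25,000 voxels at 4 nm and 2,500 voxels at 40 nm. The sketch below reproduces that conversion and applies the same min/max filter to a handful of made-up x-coordinates.

```python
import numpy as np

voxel_resolution = np.array([4, 4, 40])  # nm per voxel, as in the notebook
buffer_um = 100                          # margin in microns

# 1 micron = 1000 nm, so divide the margin in nm by the voxel size in nm
buffer_vox = buffer_um * 1000 / voxel_resolution
print(buffer_vox.tolist())  # [25000.0, 25000.0, 2500.0]

# Made-up soma x-coordinates (in voxels) to illustrate the filter
x = np.array([60000, 90000, 200000, 330000, 350000])
x_lo = x.min() + buffer_vox[0]   # 60000 + 25000 = 85000
x_hi = x.max() - buffer_vox[0]   # 350000 - 25000 = 325000

# Keep positions strictly inside the buffered range
keep = (x > x_lo) & (x < x_hi)
print(x[keep].tolist())     # [90000, 200000]
```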

### Subselect 100 cells
Randomly subsample the cells with a fixed seed, and keep 100 for further consideration.

In [17]:
# specify the rng seed for reproduction
rng = np.random.default_rng(seed=20240326)

# Random subsample
coreg_ct_no_prf['shuffle'] = rng.permutation(len(coreg_ct_no_prf))
coreg_ct_no_prf.sort_values('shuffle',ascending=True, inplace=True)

# Take top 100
selected_candidates = coreg_ct_no_prf.iloc[:100].copy()
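
Assigning each row a position in a seeded random permutation and keeping the lowest positions amounts to sampling without replacement, and it gives the same selection on every run. A minimal sketch that checks this on a toy table (the `subsample` helper and the root ids are hypothetical):

```python
import numpy as np
import pandas as pd

def subsample(df, n, seed):
    """Return n rows chosen by a seeded random permutation (hypothetical helper)."""
    rng = np.random.default_rng(seed=seed)
    order = rng.permutation(len(df))          # one unique rank per row
    return df.assign(shuffle=order).sort_values('shuffle').iloc[:n]

# Toy table with 20 made-up root ids
toy = pd.DataFrame({'pt_root_id': np.arange(1000, 1020)})

first_run = subsample(toy, 5, seed=20240326)
second_run = subsample(toy, 5, seed=20240326)

# The same seed reproduces the same five cells in the same order
print(first_run['pt_root_id'].tolist() == second_run['pt_root_id'].tolist())  # True
```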

## Create screening link (neuvue)
We want to review the candidates in Neuroglancer. The following makes a URL for the seung-lab branch of Neuroglancer, specifying some of the annotation tools and viewer settings that are not the datastack default.

### Statebuilders

In [18]:
from nglui.statebuilder import StateBuilder
from nglui.statebuilder import ImageLayerConfig, SegmentationLayerConfig, AnnotationLayerConfig, PointMapper, LineMapper
from nglui.statebuilder import helpers

## Generate neuvue

In [20]:
# Build the neuroglancer state
img_source, seg_source = helpers.from_client(client)

img_layer = ImageLayerConfig(
    img_source.source,
)
seg_layer = SegmentationLayerConfig(
        seg_source.source,
        alpha_3d=0.5,
    )

# Add soma locations
pt = PointMapper(
    point_column='pt_position',
    description_column='cell_type',
    linked_segmentation_column='pt_root_id',
)
anno_soma = AnnotationLayerConfig(
    name='soma',
    mapping_rules=pt,
    color='#FFFFFF',
    linked_segmentation_layer='seg',
)

# Generate state
sb = StateBuilder(layers=[img_layer, seg_layer, anno_soma])
temp_state = sb.render_state(selected_candidates, return_as = 'dict')

# Correct resolution
temp_state['navigation']['pose']['position']['voxelSize'] = [4,4,40]

# Manually add JSON state server
temp_state['jsonStateServer'] = "https://global.daf-apis.com/nglstate/api/v1/post"

# Manually add the annotation tags
temp_state['layers'][2]['annotationTags'] = [
    {'id': 1, 'label': 'remove'},
] 

# Generate the state with intermediate additions
new_sb = StateBuilder(base_state=temp_state)
url = helpers.make_url_robust(selected_candidates, new_sb, client, shorten='always')
url

'https://neuroglancer.neuvue.io/?json_url=https://global.daf-apis.com//nglstate/api/v1/4506696414134272'
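
The cell above addresses the annotation layer by its list index (`temp_state['layers'][2]`), which depends on the order of the layers passed to `StateBuilder`. A possible alternative is to look the layer up by name. The sketch below works on a plain dictionary and assumes each layer entry in the rendered state stores its name under a `'name'` key; `add_annotation_tags` is a hypothetical helper, not part of nglui.

```python
def add_annotation_tags(state, layer_name, labels):
    """Attach annotation tags to the layer called `layer_name` (hypothetical helper)."""
    for layer in state['layers']:
        # Assumption: each layer dict stores its name under the 'name' key
        if layer.get('name') == layer_name:
            layer['annotationTags'] = [
                {'id': i, 'label': label}
                for i, label in enumerate(labels, start=1)
            ]
            return state
    raise KeyError(f"no layer named {layer_name!r}")

# Toy state with three layers, standing in for the rendered state dict
toy_state = {'layers': [{'name': 'img'}, {'name': 'seg'}, {'name': 'soma'}]}
add_annotation_tags(toy_state, 'soma', ['remove'])

print(toy_state['layers'][2]['annotationTags'])  # [{'id': 1, 'label': 'remove'}]
```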

## Export selected table
Save the generated candidate-cell table in pickle format, and alternatively as a .csv.

In [21]:
# save candidate cells in two formats
selected_candidates.to_pickle('coreg_100_cells_240326.pkl')

selected_candidates_csv = selected_candidates[['id','pt_root_id','pt_position','pt_supervoxel_id','cell_type']]
selected_candidates_csv.to_csv('coreg_100_cells_240326.csv', index=False)
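
The two formats behave differently on reload: a pickle restores the Python objects as they were, while a CSV stores `pt_position` as text. The sketch below shows one way to read such a CSV back, using an in-memory buffer instead of a file. It assumes the position cells are Python lists; NumPy arrays are written without commas and would need a different parser.

```python
import ast
import io

import pandas as pd

# One-row toy table mirroring the exported columns
toy = pd.DataFrame({
    'id': [395416],
    'pt_root_id': [864691135394864117],
    'pt_position': [[240528, 154032, 25770]],  # assumed to be a Python list
    'cell_type': ['4P'],
})

# Write to an in-memory buffer rather than to disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Parse the position text back into a list while reading
loaded = pd.read_csv(buffer, converters={'pt_position': ast.literal_eval})

print(loaded['pt_root_id'].iloc[0])   # 864691135394864117
print(loaded['pt_position'].iloc[0])  # [240528, 154032, 25770]
```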