# The FrameworkClient

The FrameworkClient is a client-side library that allows easy interaction with the services within CAVE (Connectome Annotation Versioning Engine, also known as the Dynamic Annotation Framework), e.g. the annotations and the state server. The GitHub repository is public:
https://github.com/seung-lab/AnnotationFrameworkClient

The library can be installed directly from the GitHub repository or from the prebuilt versions using pip:
```
pip install annotationframeworkclient
```


## Tutorials

This tutorial mainly covers interactions with the materialized annotation tables. More information and more detailed explanations of the client's other functionality can be found in the tutorial linked below. Please be advised that, depending on your permission level, you may not be able to execute all queries in that tutorial with the preset parameters, as it was written with defaults for IARPA's MICrONS project:
https://github.com/seung-lab/AnnotationFrameworkClient/blob/master/FrameworkClientExamples.ipynb


## Authentication & Authorization

If this is your first time interacting with any part of CAVE, chances are you need to set up your local credentials for your FlyWire account first. To do so, please follow the section "Setting up your credentials" at the beginning of the tutorial above.

You will need access to FlyWire's production dataset to retrieve annotations. Otherwise you will see

```HTTPError: 403 Client Error: FORBIDDEN for url```

errors upon querying the materialization server.

## Initialize FrameworkClient

The FrameworkClient is instantiated with a datastack name. A datastack is a set of segmentation and annotation tables that lives within an aligned volume (the coordinate space). FlyWire's main datastack is `flywire_fafb_production`; its aligned volume is `fafb_seung_alignment_v0` (v14.1). For convenience, other defaults are also set at the datastack level.

In [1]:
import numpy as np
from annotationframeworkclient import FrameworkClient

In [2]:
datastack_name = "flywire_fafb_production"
client = FrameworkClient(datastack_name)

## Annotation tables

Annotations are represented by points in space and parameters (such as size, type). At specific timepoints, annotations are combined with the (proofread) segmentation to create a materialized version of the annotation table. The AnnotationEngine (`client.annotation`) owns the raw annotations and the Materialization Service (`client.materialize`) owns the materialized versions of these tables. 

To check which annotation tables are visible to you, run:

In [3]:
client.annotation.get_tables()

Every table has metadata associated with it, which includes information about the owner/creator, a description, and the schema that annotations in this table follow. Please review the metadata of any table before using it, as it might contain instructions and restrictions for its usage and for how to credit its creators. For instance, the metadata of the (v1) synapse table (`synapses_nt_v1`) includes an extensive description of all its columns, credits the people who created it, and contains instructions for citing this resource, among other things:

In [4]:
meta_data = client.annotation.get_table_metadata("synapses_nt_v1")
print(meta_data["description"])

The metadata contains information about the schema, which ultimately determines how annotations in a table are structured. All annotations in a table follow the same schema. The synapse table follows the `fly_nt_synapse` schema:

In [5]:
meta_data["schema_type"]

## Materialized annotation tables & Queries

```
materialization = annotation + segmentation snapshot
```

As the segmentation and annotations change over time, we need to create snapshots of a combined view of them (materialized versions). Materialized versions of the annotation tables are (automatically) generated at a certain frequency. In addition to that, we are planning to include an option to retrieve any timestamp since the latest materialization ("live") but that is not available at the moment. 

There are usually a number of materialized versions available at the same time:

In [6]:
client.materialize.get_versions()

Each version comes with metadata about the time when it was created and when it will be deleted (expired). Different tables have different lifetimes and some may be LTS (long-term support) versions. The exact frequency and lifetime of tables will depend on how the community is using these tables.

In [10]:
client.materialize.get_version_metadata(27)
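Since versions expire, it can be useful to check which ones are still available before pinning an analysis to a version number. The following is a minimal offline sketch on synthetic metadata; the helper `unexpired_versions` and the dictionary keys `version` and `expires_on` are assumptions for illustration, so check them against the metadata returned above.

```python
from datetime import datetime, timezone


def unexpired_versions(metadata_list, now):
    """Return version numbers whose expiry lies after `now`, in ascending order."""
    live = [m["version"] for m in metadata_list if m["expires_on"] > now]
    return sorted(live)


# Synthetic metadata; the keys "version" and "expires_on" are assumed names.
demo_meta = [
    {"version": 15, "expires_on": datetime(2021, 2, 1, tzinfo=timezone.utc)},
    {"version": 27, "expires_on": datetime(2021, 6, 1, tzinfo=timezone.utc)},
]
now = datetime(2021, 3, 1, tzinfo=timezone.utc)
print(unexpired_versions(demo_meta, now))  # [27]
```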

Each materialization version contains a set of annotation tables. At the moment all tables are included in a materialization but in the future we might not include all tables in every materialization:

In [12]:
client.materialize.get_tables(version=27)

### Queries

Here, we demonstrate some queries with the synapses from Buhmann et al.

Each table in this list is stored as a SQL table on the backend. The client allows users to query these tables conveniently through the frontend of the Materialization Service, without the need for SQL-specific language. The client formats the results as pandas dataframes. Queries are restricted to 200k rows so as not to overwhelm the server. Should a query result in more rows, only the first 200k are returned. For bulk downloads (e.g. for data preservation before a publication), please contact us.

To demonstrate this, the following query would pull the entire table but only returns 200k rows (it should take <2min). In the future, a warning will be raised in such cases, but at the moment the server silently cuts the query short.

In [14]:
%%time

syn_df = client.materialize.query_table("synapses_nt_v1", materialization_version=27)
len(syn_df)

Here, we set the materialization version specifically. If the materialization version is not specified, the query defaults to the most recent version.
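Because the server currently truncates results silently, it can help to check whether a result has hit the row cap and to split long id lists into smaller batches. The sketch below runs offline on synthetic data; `ROW_LIMIT`, `looks_truncated`, and `batched` are hypothetical helper names and not part of the client.

```python
import pandas as pd

ROW_LIMIT = 200_000  # row cap described above (assumed to be exactly 200k)


def looks_truncated(df, limit=ROW_LIMIT):
    """Return True if the result has hit the row cap and may be incomplete."""
    return len(df) >= limit


def batched(values, batch_size):
    """Yield consecutive slices of `values` with at most `batch_size` items."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


# Synthetic stand-in for a query result; no server access needed.
demo_df = pd.DataFrame({"id": range(5)})
print(looks_truncated(demo_df, limit=5))   # True: exactly at the cap
print(list(batched([1, 2, 3, 4, 5], 2)))   # [[1, 2], [3, 4], [5]]
```

Each batch of root ids could then be passed to a separate filtered query and the resulting dataframes combined with `pd.concat`.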

Let's take a brief look at the columns to illustrate how the materialization extends an annotation table:

In [15]:
syn_df.head()

Annotations consist of parameters and spatial points. Some or all of these spatial points are what we call "BoundSpatialPoints". These are linked to the segmentation during materialization. The synapse tables have two such points (`pre_pt`, `post_pt`). Per point there are three columns: `*_position`, `*_supervoxel_id`, `*_root_id`. Supervoxels are the small atomic segments, and root ids describe large components (neurons) consisting of many supervoxels. A root id always refers to the same version of a neuron and represents a snapshot in time in its own right. For a given annotation id (`id`), all but the `*_root_id` columns stay constant between materializations. 
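To illustrate which columns stay constant, the following sketch compares the same annotations in two materialized versions. It uses small synthetic dataframes instead of real query results; only the column names are taken from the synapse table.

```python
import pandas as pd

# Synthetic stand-ins for the same table at two materialization versions.
old = pd.DataFrame({
    "id": [1, 2, 3],
    "pre_pt_supervoxel_id": [10, 11, 12],
    "pre_pt_root_id": [100, 100, 200],
})
new = pd.DataFrame({
    "id": [1, 2, 3],
    "pre_pt_supervoxel_id": [10, 11, 12],
    "pre_pt_root_id": [100, 101, 200],  # annotation 2 now sits on an edited neuron
})

merged = old.merge(new, on="id", suffixes=("_old", "_new"))

# Supervoxel ids are constant for a given annotation id ...
same_sv = (merged["pre_pt_supervoxel_id_old"] == merged["pre_pt_supervoxel_id_new"]).all()

# ... while root ids change wherever the neuron was edited in between.
changed_ids = merged.loc[
    merged["pre_pt_root_id_old"] != merged["pre_pt_root_id_new"], "id"
].tolist()
print(same_sv, changed_ids)  # True [2]
```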

`query_table` has three parameters to define filters: `filter_in_dict`, `filter_out_dict`, and `filter_equal_dict`. More options will be added. These can be used to query synapses between any lists of neurons. For instance, to query the outgoing synapses of an AMMC-B1 neuron we included in the FlyWire paper
(see the section "Retrieving matching root ids" for how to come up with a specific root id):

In [17]:
%%time

syn_df = client.materialize.query_table("synapses_nt_v1", materialization_version=15,
                                        filter_in_dict={"pre_pt_root_id": [720575940627197566]})
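The three filter arguments can be pictured with a local pandas analogue. This sketch assumes that `filter_in_dict` keeps rows whose value is in the given list, `filter_out_dict` drops such rows, and `filter_equal_dict` matches a single value; `apply_filters` is a hypothetical helper for illustration, not how the server implements the filters.

```python
import pandas as pd


def apply_filters(df, filter_in_dict=None, filter_out_dict=None, filter_equal_dict=None):
    """Local pandas analogue of the three filter arguments (illustration only)."""
    mask = pd.Series(True, index=df.index)
    for column, values in (filter_in_dict or {}).items():
        mask &= df[column].isin(values)
    for column, values in (filter_out_dict or {}).items():
        mask &= ~df[column].isin(values)
    for column, value in (filter_equal_dict or {}).items():
        mask &= (df[column] == value)
    return df[mask]


# Synthetic rows; only the column names match the synapse table.
demo = pd.DataFrame({
    "pre_pt_root_id": [1, 1, 2, 3],
    "post_pt_root_id": [5, 0, 5, 6],
})
result = apply_filters(
    demo,
    filter_in_dict={"pre_pt_root_id": [1, 2]},
    filter_out_dict={"post_pt_root_id": [0]},
)
print(result["post_pt_root_id"].tolist())  # [5, 5]
```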

As described in the metadata above, we suggest filtering the synapse table using the `cleft_score` and `connection_score`. Tuning these will help to reduce the number of false positive synapses in the list. The best threshold(s) will depend on the specific neurons included in the analysis. Here we will just remove all synapses with a `cleft_score < 50`.

In [16]:
syn_df = syn_df[syn_df["cleft_score"] >= 50]
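Since the best threshold depends on the neurons under study, one simple way to explore it is to count how many synapses survive at several candidate values. The sketch below uses synthetic scores; the thresholds are arbitrary examples, not recommendations.

```python
import pandas as pd

# Synthetic scores; real values come from the `cleft_score` column.
demo = pd.DataFrame({"cleft_score": [10, 40, 50, 75, 120, 160]})

thresholds = [0, 50, 100, 150]
remaining = {t: int((demo["cleft_score"] >= t).sum()) for t in thresholds}
print(remaining)  # {0: 6, 50: 4, 100: 2, 150: 1}
```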

Some postsynaptic partners have a 0 id. Many of these are due to the synapse prediction covering a bigger space than the segmentation. Here, we remove these along with synapses onto itself as we are confident that this cell does not make autapses.

In [17]:
syn_df = syn_df[syn_df["pre_pt_root_id"] != syn_df["post_pt_root_id"]]
syn_df = syn_df[syn_df["post_pt_root_id"] != 0]

This synapse table comes with neurotransmitter predictions from the work of Eckstein et al. Please review the description in the metadata to understand the caveats of this data with regard to your analysis. Here, we just look at the mean of the probabilities over all outgoing synapses, which shows that this neuron's neurotransmitter is very likely acetylcholine.

In [18]:
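# Per-column mean; DataFrame.mean() is explicit about averaging each column,
# whereas np.mean on a DataFrame reduces to a single scalar in pandas >= 2.0.
syn_df[["gaba", "ach", "glut", "oct", "ser", "da"]].mean()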
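Two simple ways to summarize the per-synapse predictions are the mean probability per transmitter and a per-synapse majority vote. The sketch below shows both on synthetic probabilities; it illustrates the bookkeeping only and is not a recommended classification procedure.

```python
import pandas as pd

NT_COLUMNS = ["gaba", "ach", "glut", "oct", "ser", "da"]

# Synthetic per-synapse probabilities (each row sums to 1).
demo = pd.DataFrame(
    [
        [0.05, 0.80, 0.05, 0.04, 0.03, 0.03],
        [0.10, 0.70, 0.10, 0.04, 0.03, 0.03],
        [0.60, 0.20, 0.10, 0.04, 0.03, 0.03],
    ],
    columns=NT_COLUMNS,
)

mean_probs = demo[NT_COLUMNS].mean()                     # one mean per transmitter
top_by_mean = mean_probs.idxmax()                        # transmitter with the highest mean
votes = demo[NT_COLUMNS].idxmax(axis=1).value_counts()   # winner per synapse, counted
print(top_by_mean, votes.idxmax())  # ach ach
```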

Next, we take a brief look at the postsynaptic partners, sort them by number of synapses, and display the top 10:

In [19]:
u_post_root_ids, c_post_root_ids = np.unique(syn_df["post_pt_root_id"], return_counts=True)

sorting = np.argsort(c_post_root_ids)[::-1][:10]
list(zip(u_post_root_ids[sorting], c_post_root_ids[sorting]))

The main target is an AMMC-A1 (720575940613535430) which is a connection we described in Figure 6 in the FlyWire paper.
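The partner ranking above can also be written with pandas alone, since `value_counts` already sorts by descending count. A minimal sketch on synthetic root ids:

```python
import pandas as pd

# Synthetic postsynaptic root ids; real values come from `syn_df`.
demo = pd.DataFrame({"post_pt_root_id": [7, 7, 7, 8, 8, 9]})

top_partners = demo["post_pt_root_id"].value_counts().head(10)
print(top_partners.to_dict())  # {7: 3, 8: 2, 9: 1}
```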

We can further restrict the query by filtering the postsynaptic targets. For instance, this query will only return the synapses between these two root ids.

In [20]:
syn_df = client.materialize.query_table("synapses_nt_v1", materialization_version=15,
                                        filter_in_dict={"pre_pt_root_id": [720575940627197566],
                                                        "post_pt_root_id": [720575940613535430]})
syn_df = syn_df[syn_df["cleft_score"] >= 50]

syn_df
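When several pre- and postsynaptic root ids are queried at once, the resulting dataframe can be condensed into a table of synapse counts per neuron pair. A minimal sketch with `pd.crosstab` on synthetic ids:

```python
import pandas as pd

# Synthetic synapse list; each row is one synapse.
demo = pd.DataFrame({
    "pre_pt_root_id": [1, 1, 1, 2, 2],
    "post_pt_root_id": [3, 3, 4, 3, 4],
})

# Rows are presynaptic neurons, columns are postsynaptic neurons.
matrix = pd.crosstab(demo["pre_pt_root_id"], demo["post_pt_root_id"])
print(matrix.loc[1, 3])  # 2
```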

## Retrieving matching root ids

Neuroglancer shows the most recent version of the segmentation by default. Neurons that have been updated since a materialized version are not included in a table of that version. To reconcile this, users need to look up root ids for their data with a timestamp. 

We generally recommend storing annotations as points in space as these can be mapped to root ids easily (that's basically what materialization is). Soon, users will be able to create their own annotation tables and CAVE will provide fitting root ids automatically. Still, use cases will arise that require a manual materialization by the user:


### Programmatically

The client has an interface to the chunkedgraph (see Section 5 in [the related tutorial](https://github.com/seung-lab/AnnotationFrameworkClient/blob/master/FrameworkClientExamples.ipynb)) which allows users to query a root id for a given supervoxel id. Supervoxel ids can be retrieved from the segmentation using [cloudvolume](https://github.com/seung-lab/cloud-volume/).
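A manual lookup for several supervoxels at a fixed timestamp might be sketched as follows. This assumes the chunkedgraph client offers `get_root_id(supervoxel_id=..., timestamp=...)` as used in the related tutorial; please verify this against your client version. `root_ids_at` and `FakeChunkedGraph` are hypothetical names, and the fake class only exists so that the sketch runs without server access.

```python
import datetime


def root_ids_at(cg_client, supervoxel_ids, timestamp):
    """Look up one root id per supervoxel id at a fixed UTC timestamp.

    Assumes `cg_client.get_root_id(supervoxel_id=..., timestamp=...)`;
    verify against your client version.
    """
    return {
        sv_id: cg_client.get_root_id(supervoxel_id=sv_id, timestamp=timestamp)
        for sv_id in supervoxel_ids
    }


class FakeChunkedGraph:
    """Offline stand-in so the sketch runs without server access."""

    def get_root_id(self, supervoxel_id, timestamp=None):
        return supervoxel_id * 10  # made-up mapping, not real data


ts = datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc)
lookup = root_ids_at(FakeChunkedGraph(), [1, 2], ts)
print(lookup)  # {1: 10, 2: 20}
```

With a real client, `client.chunkedgraph` would take the place of `FakeChunkedGraph()`.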


### Neuroglancer

The segmentation layer has an option under the tab "graph" to lock a layer to a specific timestamp. Root ids are then looked up with this specific timestamp (proofreading is not possible in this mode). Be aware that this mode does not prevent the pasting of root ids from different timestamps into the layer, as that circumvents the lookup to the server.

### Timestamps

Timestamps are _always_ UTC. 

Please be aware that the package or browser you are using might format timestamps in your local timezone. The timestamp is the same for all annotation tables within a materialization:

In [21]:
client.materialize.get_version_metadata(27)
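If a tool displays a timestamp in local time, it can be converted to UTC with the standard library before comparing it with a materialization timestamp. The sketch below uses a made-up local time in a hypothetical UTC+2 timezone.

```python
from datetime import datetime, timedelta, timezone

# A timestamp as a tool in UTC+2 might display it (hypothetical value).
local = datetime(2021, 3, 1, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# Convert to UTC before comparing with materialization timestamps.
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2021-03-01T12:30:00+00:00
```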

## Creating neuroglancer links programmatically

We are building infrastructure into neuroglancer to display this information there while browsing neurons. Until this is ready, the most convenient way to visualize this information in neuroglancer is to programmatically create neuroglancer states and upload them to the state server. The links can then be distributed.

[NeuroglancerAnnotationUI (nglui)](https://github.com/seung-lab/NeuroglancerAnnotationUI) makes programmatic creation of neuroglancer states convenient. The [statebuilder examples](https://github.com/seung-lab/NeuroglancerAnnotationUI/blob/master/examples/statebuilder_examples.ipynb) show how one can go directly from dataframes such as the one above to neuroglancer states. The [related tutorial on this client](https://github.com/seung-lab/AnnotationFrameworkClient/blob/master/FrameworkClientExamples.ipynb) shows under "4. JSON Service" how this client can be used to upload states to the server and to create neuroglancer links.


## Further references


More examples for the usage of CAVE can be found in a related project:

https://github.com/AllenInstitute/MicronsBinder

A rough overview of the structure of our backend services can be found here:

https://github.com/seung-lab/AnnotationPipelineOverview

## Credit

CAVE is developed at Princeton University and the Allen Institute for Brain Science within the IARPA MICrONS project and the FlyWire project. The main contributors to the design and backend development are Derrick Brittain, Forrest Collman, Sven Dorkenwald, Chris Jordan, and Casey Schneider-Mizell.

A citable publication is in the works. Please contact us if you are interested in using CAVE on another dataset. 