# Using Blue Brain Nexus in a data pipeline

## Prerequisites

This notebook assumes you've created a project within the [demo organization](https://sandbox.bluebrainnexus.io/web/demo/) of the sandbox deployment of Blue Brain Nexus.

## Overview

You'll work through the following steps:

1. Configure the Blue Brain Nexus environment you will work in
2. Download mouse and human neuron morphology data from the Allen Cell Types Database through the allenSDK
3. Explore the data structure of the Allen Cell Types Database
4. Store the downloaded neuron morphology reconstruction files in Blue Brain Nexus
5. Map the metadata to the Blue Brain Knowledge Graph Schema
6. Generate provenance entities with metadata for neuron morphologies and store them in Blue Brain Nexus
7. Select morphologies of interest as a dataset
8. Register this dataset back into Blue Brain Nexus
9. Retrieve the dataset from Blue Brain Nexus and use it to run Topological Morphology Descriptor (TMD) analysis
10. Register the analysis plot back into Nexus
11. Capture the provenance of the analysis plot

![Using Blue Brain Nexus in a data pipeline](https://docs.google.com/uc?id=1dv2Cc3ZQgk-khPkAPy9-Dbkg5AqNhqgd)

## Step 1: Configure the Blue Brain Nexus environment you will work in

In [None]:
!pip install -U nexus-sdk
!pip install allensdk
!pip install rdflib
!pip install SPARQLWrapper
!git clone https://github.com/BlueBrain/TMD
!pip install ./TMD

In [None]:
import os
import nexussdk as nexus
import getpass

from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi
from allensdk.core.cell_types_cache import ReporterStatus as RS

import utils as ut

In [None]:
%load_ext autoreload

In [None]:
%autoreload 1

In [None]:
%aimport utils
%aimport sparqlendpointhelper
%aimport dataset

We will be working in the **sandbox** environment of Blue Brain Nexus:

In [None]:
DEPLOYMENT = "https://sandbox.bluebrainnexus.io/v1"

Provide your **token** below. You can obtain it after logging in to [Nexus Web](https://sandbox.bluebrainnexus.io/web) by clicking *Copy token* in the top left corner.

In [None]:
TOKEN = getpass.getpass()
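If you rerun the notebook often, you may prefer not to paste the token every time. The sketch below reads it from an environment variable and falls back to the same hidden prompt when the variable is unset. The variable name `NEXUS_TOKEN` is a hypothetical choice for this sketch; neither Nexus nor the SDK reads it.

```python
import getpass
import os


def read_token(env_var: str = "NEXUS_TOKEN") -> str:
    """Return the Nexus token from `env_var`, or prompt for it if unset.

    NEXUS_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    # Same hidden prompt as the cell above.
    return getpass.getpass("Nexus token: ")
```

With the variable exported before starting Jupyter, `TOKEN = read_token()` replaces the prompt.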

Configure the Nexus Python SDK to use this deployment and your token:

In [None]:
nexus.config.set_environment(DEPLOYMENT)

In [None]:
nexus.config.set_token(TOKEN)

We will be working in the **demo** organization of Blue Brain Nexus:

In [None]:
ORGANIZATION = "demo"

Provide your **project label** below. It should be the project you created in the demo organization (see Prerequisites):

In [None]:
PROJECTLABEL = "<YOUR PROJECT>"

## Step 2: Download mouse and human neuron morphologies from the Allen Cell Types DB

We will be working with human and mouse neuron morphology data from the [Allen Cell Types Database](https://celltypes.brain-map.org/). The [AllenSDK](https://allensdk.readthedocs.io/en/latest/) can be used to download the data.

Set up the cell types cache for the Allen Cell Types Database: "The CellTypesCache class provides a Python interface for downloading data in the Allen Cell Types Database into well known locations so that you don’t have to think about file names and directories."

In [None]:
ctc = CellTypesCache(manifest_file="allen_cell_types_db/manifest.json")

#### Download neuron morphologies from human tissue

Get all cells from the Allen Cell Types Database that come from human tissue and have a reconstruction:

In [None]:
human_cells = ctc.get_cells(species=[CellTypesApi.HUMAN], require_reconstruction = True)

In [None]:
print("Total of human cells with reconstruction: %d" % len(human_cells))
print("---")
print("Metadata of an example cell (human):")
ut.pretty_print(human_cells[0])

We will download the first twenty of these neuron morphology reconstructions:

In [None]:
human_cellIDs = [c["id"] for c in human_cells][0:20]

In [None]:
human_reconstruction = [ctc.get_reconstruction(i) for i in human_cellIDs]

#### Download neuron morphologies from mouse tissue

Get all cells from the Allen Cell Types Database that come from mouse tissue and have a reconstruction:

In [None]:
mouse_cells = ctc.get_cells(species=[CellTypesApi.MOUSE], require_reconstruction = True)

In [None]:
print("Total of mouse cells with reconstruction: %d" % len(mouse_cells))
print("---")
print("Metadata of an example cell (mouse):")
ut.pretty_print(mouse_cells[0])

We will download the first twenty of these neuron morphology reconstructions:

In [None]:
mouse_cellIDs = [c["id"] for c in mouse_cells][0:20]

In [None]:
mouse_reconstruction = [ctc.get_reconstruction(i) for i in mouse_cellIDs]

## Step 3: Explore the data structure of the Allen Cell Types Database

#### The **cells.json** metadata file 

This file contains the metadata of every cell currently available through the Allen Cell Types Database, stored as one object per cell.

In [None]:
allen_cell_types_meta = ut.get_json("allen_cell_types_db/cells.json")
print("Metadata from the cells.json file of an example cell (human):")
ut.pretty_print(allen_cell_types_meta[0])
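Because `cells.json` holds one object per cell, each carrying the `specimen__id` key that the Step 6 loops filter on, you can build a lookup table once instead of scanning the whole list for every cell. A minimal sketch; the function name is ours and the sample records are made up, keeping only the key used here:

```python
from typing import Any, Dict, Iterable


def index_by_specimen(cells: Iterable[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Map each cell's `specimen__id` to its metadata object."""
    index: Dict[int, Dict[str, Any]] = {}
    for cell in cells:
        specimen_id = cell["specimen__id"]
        if specimen_id in index:
            raise ValueError(f"duplicate specimen__id: {specimen_id}")
        index[specimen_id] = cell
    return index


# Made-up records standing in for allen_cell_types_meta.
sample = [{"specimen__id": 101, "note": "a"}, {"specimen__id": 202, "note": "b"}]
by_id = index_by_specimen(sample)
print(by_id[202]["note"])  # b
```

With the real file, `meta_by_id = index_by_specimen(allen_cell_types_meta)` lets `meta_by_id[human_cellID]` stand in for the `list(filter(...))[0]` expression, and a missing ID raises `KeyError` instead of `IndexError`.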

#### The metadata exposed through the **allenSDK**

In [None]:
print("Metadata exposed through the allenSDK of an example cell (human):")
ut.pretty_print(human_cells[0])

#### The **folder structure** of downloaded data

When you download data from the Allen Cell Types Database through the allenSDK, one folder is created per neuron morphology. The folder is named `specimen_<allenID>`, where `<allenID>` is the cell's ID, and it contains the neuron morphology reconstruction in a file named `reconstruction.swc`.
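The upload loops in Step 4 rely on this naming convention. A small sketch that builds the expected path for a cell ID with `pathlib` and lists cells whose file is not on disk yet; the function names are ours, not part of the allenSDK, and the default base directory is the one given to the manifest above:

```python
from pathlib import Path
from typing import Iterable, List


def reconstruction_path(cell_id: int, base_dir: str = "allen_cell_types_db") -> Path:
    """Expected location: <base_dir>/specimen_<allenID>/reconstruction.swc"""
    return Path(base_dir) / f"specimen_{cell_id}" / "reconstruction.swc"


def missing_reconstructions(cell_ids: Iterable[int], base_dir: str = "allen_cell_types_db") -> List[int]:
    """Return the cell IDs whose reconstruction file does not exist yet."""
    return [cid for cid in cell_ids if not reconstruction_path(cid, base_dir).is_file()]
```

After Step 2, `missing_reconstructions(human_cellIDs + mouse_cellIDs)` should return an empty list.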

## Step 4: Store the downloaded neuron morphology reconstruction files in Blue Brain Nexus

#### Integrate the downloaded neuron morphology reconstruction files into Blue Brain Nexus

In [None]:
morph_files_meta = {}

Store the downloaded human neuron morphology reconstructions in Blue Brain Nexus:

In [None]:
for cellID in human_cellIDs:
    file_path = f"./allen_cell_types_db/specimen_{cellID}/reconstruction.swc"
    response = nexus.files.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, filepath=file_path)
    morph_files_meta[cellID] = {
        "file_name": response["_filename"],
        "content_value": response["_bytes"],
        "file_id": response["@id"],
        "digest_value": response["_digest"]["_value"]}

Store the downloaded mouse neuron morphology reconstructions in Blue Brain Nexus:

In [None]:
for cellID in mouse_cellIDs:
    file_path = f"./allen_cell_types_db/specimen_{cellID}/reconstruction.swc"
    response = nexus.files.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, filepath=file_path)
    morph_files_meta[cellID] = {
        "file_name": response["_filename"],
        "content_value": response["_bytes"],
        "file_id": response["@id"],
        "digest_value": response["_digest"]["_value"]}

In [None]:
print("Check out the Blue Brain Nexus metadata of the stored files:")
ut.pretty_print(morph_files_meta)
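Both upload loops, and the image upload in Step 10, copy the same four fields out of the file-creation response. A sketch of a helper that does it in one place; the response keys are exactly the ones the cells above read, while the helper name and the sample response are made up for illustration (real responses contain more fields):

```python
from typing import Any, Dict


def file_meta_from_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the file metadata that later steps need from a Nexus file response."""
    return {
        "file_name": response["_filename"],
        "content_value": response["_bytes"],
        "file_id": response["@id"],
        "digest_value": response["_digest"]["_value"],
    }


# Made-up response shaped like the keys read in the cells above.
fake_response = {
    "@id": "https://example.org/files/abc",
    "_filename": "reconstruction.swc",
    "_bytes": 2048,
    "_digest": {"_value": "deadbeef"},
}
print(file_meta_from_response(fake_response)["file_name"])  # reconstruction.swc
```

Inside each loop, `morph_files_meta[cellID] = file_meta_from_response(response)` would then replace the inline dictionary.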

## Step 5: Map the metadata to the Blue Brain Knowledge Graph Schema

![Provenance](https://docs.google.com/uc?id=1Hoz3wK3vNkLxdhKNZXK53NOE6qKuBh7o)
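To make the mapping concrete, here is a hypothetical sketch that pulls a few fields out of one `cells.json` record. The Allen keys `donor__species`, `structure__layer` and `tag__apical` are assumptions about the file's layout (only `specimen__id` is used elsewhere in this notebook), so compare them with the record printed in Step 3. The output keys are illustrative and are not what the `utils` helpers produce; the layer prefix mirrors the `"layer 5"` label queried in Step 7.

```python
from typing import Any, Dict


def sketch_morphology_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping of a few Allen fields to Knowledge Graph style fields."""
    return {
        "allen_specimen_id": record["specimen__id"],
        "species": record.get("donor__species"),  # assumed Allen key
        # Assumes the Allen layer is a bare value such as "5"; Step 7 matches "layer 5".
        "layer_label": f"layer {record['structure__layer']}",
        "apicalDendrite": record.get("tag__apical"),  # assumed Allen key
    }


# Made-up record, not real Allen data.
example = {"specimen__id": 1, "donor__species": "Homo Sapiens",
           "structure__layer": "5", "tag__apical": "intact"}
print(sketch_morphology_fields(example)["layer_label"])  # layer 5
```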

## Step 6: Generate provenance entities with metadata for neuron morphologies and store them in Blue Brain Nexus

This is the GRID identifier of the Allen Institute for Brain Science, which will be used to assign contribution:

In [None]:
ALLEN_GRID = "https://www.grid.ac/institutes/grid.417881.3"

Generate the provenance entities for **Subject**, **PatchedCell** and **NeuronMorphology** for the human neuron morphology reconstructions and store them in Blue Brain Nexus:

In [None]:
for human_cellID in human_cellIDs:
    morph_meta = list(filter(lambda cell: cell['specimen__id'] == human_cellID, allen_cell_types_meta))[0]
    morph_file_meta = morph_files_meta[human_cellID]

    try:
        subject = ut.subject(morph_meta)
        nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=subject)
        ut.pretty_print(subject)
    except nexus.HTTPError as e:
        nexus.tools.pretty_print(e.response.json())

    try:
        patchedcell = ut.patchedcell(morph_meta, ALLEN_GRID)
        nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=patchedcell)
        ut.pretty_print(patchedcell)
    except nexus.HTTPError as e:
        nexus.tools.pretty_print(e.response.json())

    try:
        neuronmorphology = ut.neuronmorphology(morph_meta, ALLEN_GRID, morph_file_meta)
        nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=neuronmorphology)
        ut.pretty_print(neuronmorphology)
    except nexus.HTTPError as e:
        nexus.tools.pretty_print(e.response.json())

Generate the provenance entities for **Subject**, **PatchedCell** and **NeuronMorphology** for the mouse neuron morphology reconstructions and store them in Blue Brain Nexus:

In [None]:
for mouse_cellID in mouse_cellIDs:
    morph_meta = list(filter(lambda cell: cell['specimen__id'] == mouse_cellID, allen_cell_types_meta))[0]
    morph_file_meta = morph_files_meta[mouse_cellID]

    try:
        subject = ut.subject(morph_meta)
        nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=subject)
        ut.pretty_print(subject)
    except nexus.HTTPError as e:
        nexus.tools.pretty_print(e.response.json())

    try:    
        patchedcell = ut.patchedcell(morph_meta, ALLEN_GRID)
        nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=patchedcell)
        ut.pretty_print(patchedcell)
    except nexus.HTTPError as e:
        nexus.tools.pretty_print(e.response.json())

    try:
        neuronmorphology = ut.neuronmorphology(morph_meta, ALLEN_GRID, morph_file_meta)
        nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=neuronmorphology)
        ut.pretty_print(neuronmorphology)
    except nexus.HTTPError as e:
        nexus.tools.pretty_print(e.response.json())
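The human and mouse loops differ only in the list they iterate over, and each wraps three identical create steps in its own `try`/`except` so that one rejected entity does not stop the others. A sketch of that pattern as a single function; `create` stands in for a call such as `nexus.resources.create` and `error_type` for `nexus.HTTPError`, so the block runs without the SDK (the function name is ours):

```python
from typing import Any, Callable, Dict, List, Tuple


def create_entities(
    entities: Dict[str, Dict[str, Any]],
    create: Callable[[Dict[str, Any]], Any],
    error_type: type = Exception,
) -> List[Tuple[str, Exception]]:
    """Create each entity in turn and collect failures instead of stopping."""
    failures: List[Tuple[str, Exception]] = []
    for name, payload in entities.items():
        try:
            create(payload)
        except error_type as err:  # e.g. nexus.HTTPError in the notebook
            failures.append((name, err))
    return failures
```

In the notebook, `create` could be `lambda data: nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=data)` and `entities` could be `{"subject": ut.subject(morph_meta), "patchedcell": ut.patchedcell(morph_meta, ALLEN_GRID), ...}`; the returned list then holds any errors to inspect after the loop.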

Check out [Nexus Web](https://sandbox.bluebrainnexus.io/web/demo/) to view and navigate your created resources.

## Step 7: Select morphologies of interest as a dataset

### Select the morphologies

In [None]:
import json
import os
from sparqlendpointhelper import SparqlViewHelper
from dataset import Dataset, ComplexHandler

In [None]:
sparqlview_endpoint = DEPLOYMENT+"/views/"+ORGANIZATION+"/"+PROJECTLABEL+"/graph/sparql"
sparqlviewhelper = SparqlViewHelper(sparqlview_endpoint,DEPLOYMENT, ORGANIZATION, PROJECTLABEL, TOKEN)
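The cell above assembles the SPARQL endpoint by string concatenation. A sketch of the same URL built with `urllib.parse.quote`, so that a stray space or slash in a label is escaped rather than silently producing a different path; the `graph` view segment is copied from the cell above and the function name is ours:

```python
from urllib.parse import quote


def sparql_endpoint(deployment: str, org: str, project: str, view: str = "graph") -> str:
    """Build <deployment>/views/<org>/<project>/<view>/sparql with escaped segments."""
    segments = [quote(part, safe="") for part in (org, project, view)]
    return f"{deployment.rstrip('/')}/views/{'/'.join(segments)}/sparql"
```

For example, `sparql_endpoint(DEPLOYMENT, ORGANIZATION, PROJECTLABEL)` yields the same string as `sparqlview_endpoint` for ordinary labels.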

In [None]:
_type = "nsg:NeuronMorphology"
_layer_label = "\"layer 5\""
apicalDendrite = "\"intact\""

In [None]:
dataset_query = """
SELECT *
WHERE
{
    BIND (%s as ?type).
    ?id a nsg:NeuronMorphology.
    ?id nsg:brainLocation / nsg:layer / rdfs:label %s.
    ?id nsg:apicalDendrite %s.
    ?id nxv:rev ?rev.
    ?id schema:distribution/schema:contentUrl ?contentUrl.
    ?id schema:name ?name
}
LIMIT 100
""" % (_type, _layer_label, apicalDendrite)

result_df = sparqlviewhelper.query_sparql(dataset_query,result_format = "DATAFRAME")
display(result_df.head(100))
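The query injects its filter values with `%` formatting, which is why `_layer_label` and `apicalDendrite` carry their own double quotes. A sketch of a helper (our name) that produces a double-quoted SPARQL string literal, escaping backslashes, double quotes and line breaks as the SPARQL 1.1 grammar requires, so values can be passed in bare:

```python
def sparql_string(value: str) -> str:
    """Quote `value` as a SPARQL double-quoted string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


query_filter = "?id nsg:brainLocation / nsg:layer / rdfs:label %s." % sparql_string("layer 5")
print(query_filter)
```

With it, `_layer_label = sparql_string("layer 5")` and `apicalDendrite = sparql_string("intact")` give the same query as above.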

### Build the dataset 

In [None]:
PERSON_ID = "<YOUR PERSON ID IN BLUE BRAIN NEXUS>"
# Identifier to assign to the new dataset resource
DATASET_ID = "<YOUR DATASET ID>"

In [None]:
dataset = Dataset(identifier=DATASET_ID,name="Selected morphologies for TMD", description="Awesome morphologies")

print(dataset.name)
dataset.addContributor(PERSON_ID, "Scientist")

for index, row in result_df.iterrows():
    dataset.addPart(identifier=row["id"], _type=row["type"], contentUrl=row["contentUrl"],name=row["name"], rev = row["rev"])

dataset_str = json.dumps(dataset, default=ComplexHandler)
dataset_json =  json.loads(dataset_str)
ut.pretty_print(dataset_json)
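`json.dumps(dataset, default=ComplexHandler)` works because `json` calls the `default` function for every object it cannot serialize natively. `ComplexHandler` lives in the local `dataset` module, whose code isn't shown here; a typical hook of this kind looks like the sketch below, where `Part` and `Collection` are made-up stand-ins for the notebook's classes:

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Part:
    """Made-up stand-in for a dataset part."""
    identifier: str
    name: str


@dataclass
class Collection:
    """Made-up stand-in for a dataset holding parts."""
    name: str
    parts: List[Part] = field(default_factory=list)


def object_handler(obj: Any) -> Any:
    """`default=` hook: turn unknown objects into dicts of their attributes."""
    if hasattr(obj, "__dict__"):
        return vars(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


coll = Collection("Selected morphologies", [Part("https://example.org/m1", "m1")])
print(json.dumps(coll, default=object_handler))
```

Round-tripping through `json.loads`, as the notebook does, then yields plain dictionaries and lists suitable as a resource payload.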

## Step 8: Register this dataset back into Blue Brain Nexus and tag it

### Create the dataset in Nexus

In [None]:
dataset_resource = ut.create_resource(nexus=nexus,json_payload=dataset_json, org=ORGANIZATION, project=PROJECTLABEL)
ut.pretty_print(dataset_resource)
print("The dataset is identified by %s" % (dataset_resource["@id"]))

### Tag the dataset to get an immutable identifier

In [None]:
TAG_VALUE = "morpho_v0.1.0"

Tag the current revision of the dataset, then derive its immutable identifier and the address it can be fetched from:

In [None]:
response = ut.tag_resource(nexus=nexus, json_payload=dataset_resource, tag_value=TAG_VALUE, rev_to_tag=dataset_resource["_rev"])
ut.pretty_print(response)

dataset_identifier = dataset_resource["@id"]
dataset_immuatable_id = dataset_resource["@id"] + "?tag=" + TAG_VALUE
dataset_access_address = dataset_resource["_self"] + "?tag=" + TAG_VALUE
print("The dataset identifier is %s" % (dataset_identifier))
print("The dataset now has an immutable identifier %s" % (dataset_immuatable_id))
print("The dataset is now accessible through %s" % (dataset_access_address))

### Fetch the dataset using its tag

In [None]:
response = ut.fetch_resource(nexus,dataset_identifier, org=ORGANIZATION, project=PROJECTLABEL, tag=TAG_VALUE)
ut.pretty_print(response)
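The immutable identifier and access address are formed by appending `?tag=` and the tag value to a URL. A sketch using `urllib.parse` (function name ours) that URL-encodes the tag, keeps query parameters already present and replaces an existing `tag`; whether Nexus accepts every such combination is not shown in this notebook, so treat it as a convenience:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tag(url: str, tag: str) -> str:
    """Return `url` with its `tag` query parameter set to `tag`."""
    parts = urlsplit(url)
    # Keep every existing parameter except an old tag.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "tag"]
    query.append(("tag", tag))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_tag(dataset_resource["_self"], TAG_VALUE)` builds the same access address as above.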

## Step 9: Download the dataset from Blue Brain Nexus and use it to run Topological Morphology Descriptor (TMD) analysis

### Get the content urls

In [None]:
contenturls_df = sparqlviewhelper.get_dataset_contenturls(dataset_identifier, result_format=sparqlendpointhelper.DATAFRAME)
display(contenturls_df.head())

### Download

In [None]:
# Pair each part's content URL with the name from the same row and drop duplicate rows
# (sets have no order, so URLs and names must come from the same row to stay matched).
entries = sorted({(row["partcontentUrl"], row["name"] + ".swc") for _, row in contenturls_df.iterrows()})
print("Number of download links: %s" % (len(entries)))
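Names and URLs have to be paired row by row: a Python `set` has no guaranteed order, so zipping two independently built sets can attach a name to another morphology's URL. A self-contained illustration with made-up rows (plain dicts standing in for the DataFrame rows; the function name is ours):

```python
from typing import Dict, List, Tuple


def pair_urls_with_names(rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each row's part URL with that row's name, dropping duplicate rows."""
    seen = set()
    entries = []
    for row in rows:
        entry = (row["partcontentUrl"], row["name"] + ".swc")
        if entry not in seen:  # keep the first occurrence, preserve row order
            seen.add(entry)
            entries.append(entry)
    return entries


rows = [
    {"partcontentUrl": "https://example.org/files/a", "name": "cell_a"},
    {"partcontentUrl": "https://example.org/files/b", "name": "cell_b"},
    {"partcontentUrl": "https://example.org/files/a", "name": "cell_a"},
]
print(pair_urls_with_names(rows))
```

With pandas, `pair_urls_with_names(contenturls_df.to_dict("records"))` applies the same pairing to the real table.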

In [None]:
os.makedirs("./downloaded", exist_ok=True)

In [None]:
download_dir = "./downloaded"
report = ut.download_from_nexus(downloadurls_to_name= entries, download_dir=download_dir, token=TOKEN)
print(report)
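`ut.download_from_nexus` is a local helper whose code isn't shown here. Requests to Nexus are authenticated with the same token, sent as an `Authorization: Bearer` header. A standard-library sketch of downloading one file that way (function names ours; no retries or reporting, and depending on the deployment you may also need an `Accept` header):

```python
import shutil
import urllib.request
from pathlib import Path


def build_request(url: str, token: str) -> urllib.request.Request:
    """GET request carrying the Nexus token as a bearer credential."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download_file(url: str, token: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the file at `url` to `dest`, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as response:
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest
```

Usage: `for url, name in entries: download_file(url, TOKEN, Path(download_dir) / name)`.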

In [None]:
datasetID = dataset_immuatable_id

### Run analysis

Extract persistence diagrams from the neuronal trees of the downloaded morphologies.

In [None]:
import tmd
from tmd.view import view, plot

Load the population of downloaded neuron morphology reconstructions:

In [None]:
pop = tmd.io.load_population("./downloaded")

Compute a persistence diagram for the first apical dendrite of each neuron:

In [None]:
phs_list = [tmd.methods.get_persistence_diagram(n.apical[0]) for n in pop.neurons]
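`get_persistence_diagram` encodes a tree as a set of bars. As a rough illustration of the idea behind the TMD, and not the library's implementation (which works on the loaded morphology objects and supports several distance functions), the sketch below processes a toy tree given as a child list plus a per-node value such as radial distance from the soma. At each branch point the branch reaching the largest value survives and every other branch ends a bar there; the root closes the last bar.

```python
from typing import Dict, List, Sequence, Tuple


def persistence_barcode(
    children: Dict[int, Sequence[int]], value: Dict[int, float], root: int
) -> List[Tuple[float, float]]:
    """Toy TMD-style barcode of a rooted tree.

    children: node -> child nodes (leaves may be absent from the dict)
    value:    node -> scalar such as radial distance from the root
    Each bar is (start, end): the largest leaf value of a branch that dies,
    and the value at the branch point where it ends.
    """
    bars: List[Tuple[float, float]] = []

    def surviving_value(node: int) -> float:
        kids = children.get(node, ())
        if not kids:
            return value[node]  # a leaf starts a bar at its own value
        # Largest value first: that branch continues towards the root.
        reached = sorted((surviving_value(k) for k in kids), reverse=True)
        for lost in reached[1:]:
            bars.append((lost, value[node]))  # the other branches end here
        return reached[0]

    # Recursive for readability; very deep trees would need an iterative walk.
    bars.append((surviving_value(root), value[root]))
    return sorted(bars)
```

The tree yields one bar per leaf, so a neuron with many terminal branches produces a correspondingly long barcode.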

Collapse the diagrams of the selected dataset into one and generate the analysis plots:

In [None]:
plot.diagram(tmd.analysis.collapse(phs_list))

In [None]:
plot.barcode(tmd.analysis.collapse(phs_list))

In [None]:
plot.persistence_image(tmd.analysis.collapse(phs_list), output_path="./", output_name="persistence_image")
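Each plot is drawn from `tmd.analysis.collapse(phs_list)`, which merges the per-neuron diagrams into one list of bars so that the plots describe the selected population as a whole rather than a single neuron. A plain-Python sketch of such a merge, assuming each diagram is a list of `[start, end]` pairs; we believe TMD's function amounts to this concatenation, but check the TMD source before relying on it:

```python
from typing import List, Sequence


def collapse_diagrams(diagrams: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Concatenate per-neuron persistence diagrams into one population diagram."""
    return [list(bar) for diagram in diagrams for bar in diagram]
```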

## Step 10: Register the analysis plot back into Nexus

In [None]:
image_file_meta = {}

In [None]:
file_path = "./persistence_image.png"

In [None]:
response = nexus.files.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, filepath=file_path)
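Nexus records a digest of every uploaded file, so you can confirm that what was stored matches the file on disk. A sketch, assuming the digest is a hex-encoded SHA-256; check the algorithm recorded in the response's `_digest` before relying on it. The function names are ours, and the same check works for the morphology files uploaded in Step 4:

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex-encoded SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare a local file with a stored hex digest (case-insensitive)."""
    return sha256_hex(path) == expected_hex.lower()
```

Here, `digest_matches(file_path, response["_digest"]["_value"])` should return `True`.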

In [None]:
image_file_meta = {
    "file_name": response["_filename"],
    "content_value": response["_bytes"],
    "file_id": response["@id"],
    "digest_value": response["_digest"]["_value"]}

## Step 11: Capture the provenance of the analysis plot

In [None]:
analysis = ut.analysis(PERSON_ID, datasetID, image_file_meta)
nexus.resources.create(org_label=ORGANIZATION, project_label=PROJECTLABEL, data=analysis)
ut.pretty_print(analysis)
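`ut.analysis` builds the provenance payload, and its code isn't shown here. The relations it needs to capture map naturally onto W3C PROV-O terms: an activity that *used* the tagged dataset, *generated* the uploaded image and *was associated with* you. A hypothetical sketch of such a record; the `prov:` terms come from PROV-O, but the overall shape (and the function name) is illustrative and is not the Blue Brain Knowledge Graph schema:

```python
from typing import Any, Dict


def sketch_analysis_activity(person_id: str, dataset_id: str, image_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical PROV-O style record linking agent, input dataset and output image."""
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#"},
        "@type": "prov:Activity",
        "prov:wasAssociatedWith": {"@id": person_id},
        "prov:used": {"@id": dataset_id},  # the tagged, immutable dataset identifier
        "prov:generated": {"@id": image_meta["file_id"]},
    }
```

It takes the same inputs as `ut.analysis`: `sketch_analysis_activity(PERSON_ID, datasetID, image_file_meta)`.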

Check out [Nexus Web](https://sandbox.bluebrainnexus.io/web/demo/) to view and navigate your created resources.