<a href="figures/Nexus_logo_800px.png" target="_blank"><img src="figures/Nexus_logo_800px.png" 
width="270" border="10" /></a>

# Blue Brain Nexus - A knowledge graph for data-driven science

##  1 - General introduction


### 1-1 Challenges of data management in neuroscience

Neuroscience - like many other scientific fields - produces a vast amount of data. It is important to enable good data management to support scientific discovery. Some of the challenges of data management include:

* Heterogeneity of data (e.g. morphology reconstructions, electrophysiology recordings, whole-brain imaging, simulations, validations)
* Varying size of datasets (small to large)
* Data is often stored in distributed silos (lab servers, personal computers, Dropbox, Google Drive)
* Data provenance is often not easily accessible (metadata kept in different spreadsheets or handwritten lab books, or not captured at all)

&rightarrow; **Discovery of similar and related data across silos is needed but not enabled!**


###  1-2 The FAIR Data Principles

The FAIR Guiding Principles were defined to help the scientific community implement good data management. The acronym stands for: **Findability, Accessibility, Interoperability, and Reusability**. The principles are intended to enhance the reusability of scientific data, with specific emphasis on usability of data by both machines and humans.

* FAIR Data Principles: https://www.nature.com/articles/sdata201618

###  1-3 What is Nexus and how does it address the outlined challenges in data management?

#### Nexus is

* A data repository
* A metadata catalog
* A semantic search engine

It treats **provenance** as a first-class citizen, supports **ontologies** and is **agnostic of the domain** of application. It was also built to handle **large amounts of data**.














The Nexus KnowledgeGraph operates on **4 types** of resources: **Organizations, Domains, Schemas** and **Instances**, nested as described in the diagram below. For the use case described in this notebook, we will look at data stored under the **bbp** organization, in the domains **experiment, electrophysiology** and **morphology**.

<a href="figures/nexus-kg-resources.png" target="_blank"><img src="figures/nexus-kg-resources.png" 
width="700" border="10" /></a>
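
The nesting of the four resource types is mirrored in the URL paths used by the requests later in this notebook. The following is a minimal sketch of that layout; the helper names are hypothetical (they are not part of any Nexus client library), and the base URL and path segments are simply copied from the examples in this notebook.

```python
# Hypothetical helpers that mirror the nesting
# Organization > Domain > Schema > Instance in the URL layout.
BASE = "https://bbp-nexus.epfl.ch/dev/v0"  # base URL as used in this notebook


def organization_url(org, base=BASE):
    return "{}/organizations/{}".format(base, org)


def domain_url(org, domain, base=BASE):
    return "{}/domains/{}/{}".format(base, org, domain)


def schema_url(org, domain, name, version, base=BASE):
    return "{}/schemas/{}/{}/{}/{}".format(base, org, domain, name, version)


def instances_url(org, domain, name, version, base=BASE):
    # Instances validated against a schema live under the matching /data path.
    return "{}/data/{}/{}/{}/{}".format(base, org, domain, name, version)


print(organization_url("bbp"))
print(domain_url("bbp", "morphology"))
print(schema_url("bbp", "morphology", "reconstructedcell", "v0.1.0"))
print(instances_url("bbp", "morphology", "reconstructedcell", "v0.1.0"))
```

Each level repeats the identifiers of the levels above it, so a schema or instance URL always states the organization and domain it belongs to.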


The Nexus KnowledgeGraph exposes a RESTful interface over HTTP(S), with **JSON-LD** as the transport format. All resources in the system follow the same lifecycle (see diagram below). Changes to the data (creation, updates, state changes) are recorded in the system as **revisions**.


<a href="figures/nexus-kg-resource-lifecycle.png" target="_blank"><img src="figures/nexus-kg-resource-lifecycle.png" 
width="700" border="10" /></a>
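
To make the idea of revisions concrete, here is a toy in-memory model of a revisioned resource. It is an illustration only, not how Nexus is implemented: the class and exception names are hypothetical, and rejecting an update that names a stale revision is an assumption suggested by the `?rev=` parameter used in the update requests later in this notebook.

```python
class RevisionConflict(Exception):
    """Raised when a change targets a revision that is not the latest one."""


class ToyResource:
    """In-memory stand-in for a revisioned resource (illustration only)."""

    def __init__(self, payload):
        # Creating the resource records revision 1.
        self.history = [dict(payload)]
        self.deprecated = False

    @property
    def rev(self):
        # The current revision number equals the number of recorded states.
        return len(self.history)

    def _check(self, rev):
        # Assumption: a change must name the revision it is based on.
        if rev != self.rev:
            raise RevisionConflict(
                "expected rev {}, got {}".format(self.rev, rev))

    def update(self, payload, rev):
        self._check(rev)
        self.history.append(dict(payload))
        return self.rev

    def deprecate(self, rev):
        # A state change is recorded as a revision as well.
        self._check(rev)
        self.deprecated = True
        self.history.append(dict(self.history[-1]))
        return self.rev


cell = ToyResource({"name": "C060600A2_idB2"})      # rev 1
cell.update({"name": "C060600A2_idA2"}, rev=1)      # rev 2
print(cell.rev, cell.history[0], cell.history[-1])
```

Earlier states stay in `history`, which is the point of recording changes as revisions rather than overwriting the resource.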


#### KnowledgeGraph

* Concept of entities (metadata and raw data)
* Stores the relations between entities (e.g. provenance)
* Makes that data searchable

#### Provenance



## Resources

* Nexus on Github: https://github.com/BlueBrain/nexus
* Nexus in the media:
    * https://www.technologynetworks.com/informatics/articles/neuroinformatics-and-the-blue-brain-project-part-1-295850
    * https://www.technologynetworks.com/informatics/articles/building-the-blue-brain-nexus-295851
    * https://actu.epfl.ch/news/blue-brain-nexus-an-open-source-tool-for-data-driv/
* Nexus landing page: http://nexus.apps.bbp.epfl.ch/dev/home/ (navigate to ontoDocs and explorer)


## Objectives of the practical

* Learn about challenges in scientific data management
* Learn about Blue Brain Nexus
* Use Nexus to
    * Define entities
    * Relate them through provenance
    * Create entities to grow the knowledge graph
    * Attach raw data to entities and manage the revisions
    * Traverse the knowledge graph to find relevant data

* TODO: Show graphics with different steps of lifecycle

## Use case

Brian is a computational neuroscientist working with single cell models of the somatosensory cortex. His model is based on experimental data collected by Maria, an experimental neuroscientist using in vitro whole-cell patch clamp recording techniques for her research. To refine his model, Brian wants to include one more cell from the dataset Maria had collected into his model. He wants to use both the morphology reconstruction of the cell as well as the recorded traces. Both Maria and Brian happen to be at the same scientific conference and Maria remembers one particular cell that would fit Brian's needs. She quickly scribbles down the ID she had given that cell: C060600A2-MT-C1. Using the provided ID, Brian wants to retrieve all the recorded traces from that cell as well as the morphology reconstruction to include in his model.    

**For a specific patched cell, retrieve the corresponding reconstructed cell along with all the traces that were recorded from that patched cell.**


## Define your entities, relate them through provenance, share your domain

The image below shows the (provenance) **graph** for **data types** of in vitro single cell electrophysiology and morphology data. It includes **entities** of the data as well as the **activities** that generated the entities, the **agents** associated with the activities and the **protocols** involved in their generation. Bold arrowheads highlight entities of relevance to the use case outlined above.

<a href="figures/full-provenance-template.png" target="_blank"><img src="figures/full-provenance-template.png" 
width="1000" border="10" /></a>

* TODO: include prov pattern for electrophysiology and morphology

## Create and link entities to grow your knowledge graph

## Attach raw data to your entities and manage data revisions

## Traverse the knowledge graph to find relevant data and discover






## Nexus API interactions
----------------

----------------

### Introduction 


### (1) Create an organization and a domain

New organizations and new domains on Nexus can be created using a PUT request.

In [None]:
!pip install requests pyyaml pygments

In [2]:
import json
import requests
import yaml
from pygments import highlight
from pygments.lexers import JsonLdLexer
from pygments.formatters import TerminalFormatter

In [3]:
def pprint(string):
    json_obj = json.loads(string)
    json_str = json.dumps(json_obj, indent=2)
    lexer = JsonLdLexer()
    print(highlight(json_str, lexer, TerminalFormatter()))

Below, the organization and the domain can be specified and then created:

In [1]:
organization = input()
domain = input()

a
b


In [None]:
url_org = 'https://bbp-nexus.epfl.ch/dev/v0/organizations/{}'.format(organization)
description = {"description": "A description of the organization"}

# PUT creates the organization; any status code of 400 or above signals a failure.
response = requests.put(url_org, json=description)
# safe_load also parses the JSON response; PyYAML 6 and later reject yaml.load without a Loader.
response_text = yaml.safe_load(response.text)
if response.status_code >= 400:
    print("The organization could not be created because", response_text.get('code'))
else:
    print(response_text.get("@id"))

In [None]:
url_domain = 'https://bbp-nexus.epfl.ch/dev/v0/domains/{}/{}'.format(organization, domain)
description = {"description": "A description of the domain"}

# PUT creates the domain inside the organization chosen above.
response = requests.put(url_domain, json=description)
# safe_load also parses the JSON response; PyYAML 6 and later reject yaml.load without a Loader.
response_text = yaml.safe_load(response.text)
# Use >= 400 so that a plain 400 (bad request) is reported as a failure too.
if response.status_code >= 400:
    print("The domain could not be created because", response_text.get('code'))
else:
    print(response_text.get("@id"))

### (2) List existing schemas and load a schema into Nexus under a specified organization and domain

A listing of all the existing schemas in Nexus can be retrieved from the following endpoint:

https://bbp-nexus.epfl.ch/dev/v0/schemas

A new schema can be created using a PUT request. Following its creation, a schema needs to be published using a PATCH request before being able to validate instances against it:

In [None]:
url = 'https://bbp-nexus.epfl.ch/dev/v0/schemas/{}/{}/patchedcell/v0.1.0'.format(organization, domain)
filename = 'patchedcell_schema.json'
url_publish = "{}/config?rev=1".format(url)
published = {"published": True}

with open(filename) as json_file:
    schema = json.load(json_file)
# PUT creates the schema.
response = requests.put(url, json=schema)
# Use >= 400 so that a plain 400 (bad request) is reported as a failure too.
if response.status_code >= 400:
    response_text = yaml.safe_load(response.text)
    print("The schema could not be created because", response_text.get("code"))
else:
    # PATCH publishes revision 1 so that instances can be validated against the schema.
    response = requests.patch(url_publish, json=published)
    response_text = yaml.safe_load(response.text)
    print(response_text.get("@id"))


### (3) Post the reconstructed cell C060600A2_idA2
----------------

Metadata for the reconstructed cell C060600A2_idA2 (serialized as <a href="https://www.youtube.com/watch?v=vioCbTo3C-4" target="_blank">JSON-LD</a>) will be posted to the Blue Brain Nexus platform for validation against the **schema for a reconstructed cell** (https://bbp-nexus.epfl.ch/dev/v0/schemas/bbp/morphology/reconstructedcell/v0.1.0).

In [None]:
url = 'https://bbp-nexus.epfl.ch/dev/v0/data/bbp/morphology/reconstructedcell/v0.1.0'
filename = 'C060600A2_idA2.json'
with open(filename) as json_file:
    C060600A2_idA2 = json.load(json_file)
response = requests.post(url, json=C060600A2_idA2)
# The response carries the identifier of the newly created instance.
jsonld = yaml.safe_load(response.text)
nexus_id = jsonld.get("@id")
print('The reconstructed cell has the following identifier on Nexus', nexus_id)

### (4) Attach the morphology binary to cell C060600A2_idA2
----------------

The morphology for the reconstructed cell C060600A2_idA2 will be attached to the metadata just posted to Nexus.

In [None]:
filename = 'C060600A2_idA2.ASC'

url = nexus_id + "/attachment?rev=1"
# Open the binary in a with block so the file handle is closed after the upload.
with open(filename, 'rb') as morphology_file:
    response = requests.put(url, files={'file': morphology_file})
print(nexus_id)

### (5) Correct the name of cell C060600A2_idA2
----------------

In the metadata posted above, we spotted a typo in the name: instead of C060600A2_idB2, it should be C060600A2_idA2. To correct this typo, we can update the metadata, resulting in a **new revision** of that instance:

In [None]:
# Fetch the current state of the instance, including its current revision.
C060600A2_idA2 = yaml.safe_load(requests.get(nexus_id).text)
C060600A2_idA2.update({'name': 'C060600A2_idA2'})
# The update must name the revision it is based on.
url = "{}?rev={}".format(nexus_id, C060600A2_idA2["nxv:rev"])
# Remove these fields from the payload before sending it back.
for key in ("nxv:rev", "nxv:deprecated", "links", "distribution"):
    C060600A2_idA2.pop(key, None)
response = requests.put(url, json=C060600A2_idA2)
print(nexus_id)

### (6) USE CASE

----------------

The following sections demonstrate the use case outlined above. The use case includes the retrieval of all traces recorded from the patched cell with the provider ID C060600A2-MT-C1 as well as the corresponding reconstructed cell.

##### For the patched cell with the provider ID C060600A2-MT-C1, retrieve all corresponding traces

<a href="figures/get_traces.png" target="_blank"><img src="figures/get_traces.png" 
width="480" border="10" /></a>

The following filter retrieves all traces recorded from the specified patched cell. It selects every instance of type Trace that is connected, by traversing the graph along the path `prov:wasGeneratedBy / prov:used / nsg:providerId`, to the patched cell with the provider ID C060600A2-MT-C1.

In [5]:
pprint("""{"@context": {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#","nsg" : "https://bbp-nexus.epfl.ch/vocabs/bbp/neurosciencegraph/core/v0.1.0/","prov": "http://www.w3.org/ns/prov#"
  },"filter": {"op": "and","value": [{"path": "rdf:type","op": "eq","value": "nsg:Trace"}, {"path": "prov:wasGeneratedBy / prov:used / nsg:providerId","op": "eq","value": "C060600A2-MT-C1"}]}}""")

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "nsg": "https://bbp-nexus.epfl.ch/vocabs/bbp/neurosciencegraph/core/v0.1.0/",
    "prov": "http://www.w3.org/ns/prov#"
  },
  "filter": {
    "op": "and",
    "value": [
      {
        "path": "rdf:type",
        "op": "eq",
        "value": "nsg:Trace"
      },
      {
        "path": "prov:wasGeneratedBy / prov:used / nsg:providerId",
        "op": "eq",
        "value": "C060600A2-MT-C1"
      }
    ]
  }
}



https://bbp-nexus.epfl.ch/dev/v0/data/bbp/electrophysiology?filter=%7B%0A%20%20%22%40context%22%3A%20%7B%0A%20%20%20%20%22rdf%22%3A%20%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%2C%0A%20%20%20%20%22nsg%22%20%3A%20%22https%3A%2F%2Fbbp-nexus.epfl.ch%2Fvocabs%2Fbbp%2Fneurosciencegraph%2Fcore%2Fv0.1.0%2F%22%2C%0A%20%20%20%20%22prov%22%3A%20%22http%3A%2F%2Fwww.w3.org%2Fns%2Fprov%23%22%2C%0A%20%20%20%20%22schema%22%3A%20%22http%3A%2F%2Fschema.org%2F%22%0A%20%20%7D%2C%0A%20%20%22filter%22%3A%0A%20%20%7B%0A%20%20%20%20%22op%22%3A%20%22and%22%2C%0A%20%20%20%20%22value%22%3A%20%5B%0A%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%22op%22%3A%20%22eq%22%2C%0A%20%20%20%20%20%20%20%20%22path%22%3A%20%22rdf%3Atype%22%2C%0A%20%20%20%20%20%20%20%20%22value%22%3A%20%22nsg%3ATrace%22%0A%20%20%20%20%20%20%7D%2C%0A%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%22op%22%3A%20%22eq%22%2C%0A%20%20%20%20%20%20%20%20%22path%22%3A%20%22prov%3AwasGeneratedBy%20%2F%20prov%3Aused%20%2F%20nsg%3AproviderId%22%2C%0A%20%20%20%20%20%20%20%20%22value%22%3A%20%22C060600A2-MT-C1%22%0A%0A%20%20%20%20%20%7D%0A%20%20%20%20%5D%0A%20%20%7D%0A%7D
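
The URL above is the filter, serialized as JSON and percent-encoded into the `filter` query parameter. A minimal sketch of how such a URL can be built with the Python standard library is shown below; the variable names are illustrative, and the exact whitespace of the encoded filter differs from the URL above without changing its meaning.

```python
import json
from urllib.parse import quote, unquote

# The same filter as above, written as a Python dictionary.
filter_query = {
    "@context": {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "nsg": "https://bbp-nexus.epfl.ch/vocabs/bbp/neurosciencegraph/core/v0.1.0/",
        "prov": "http://www.w3.org/ns/prov#",
    },
    "filter": {
        "op": "and",
        "value": [
            {"path": "rdf:type", "op": "eq", "value": "nsg:Trace"},
            {
                "path": "prov:wasGeneratedBy / prov:used / nsg:providerId",
                "op": "eq",
                "value": "C060600A2-MT-C1",
            },
        ],
    },
}

base = "https://bbp-nexus.epfl.ch/dev/v0/data/bbp/electrophysiology"
# safe="" percent-encodes every reserved character, including "/" and ":".
encoded = quote(json.dumps(filter_query), safe="")
url = "{}?filter={}".format(base, encoded)
print(url)

# Round trip: decoding the parameter gives back the original filter.
decoded = json.loads(unquote(encoded))
print(decoded == filter_query)
```

With `requests`, passing `params={"filter": json.dumps(filter_query)}` to `requests.get` performs a comparable encoding, so the URL does not have to be assembled by hand.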

##### For the patched cell with the provider ID C060600A2-MT-C1, retrieve the corresponding morphology

<a href="figures/get_reconstructedcell.png" target="_blank"><img src="figures/get_reconstructedcell.png" 
width="480" border="10" /></a>

The following filter needs to be applied to retrieve the reconstructed cell for the specified patched cell:

In [1]:
pprint("""{"@context": {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#","nsg" : "https://bbp-nexus.epfl.ch/vocabs/bbp/neurosciencegraph/core/v0.1.0/","prov": "http://www.w3.org/ns/prov#"
  },"filter": {"op": "and","value": [{"path": "rdf:type","op": "eq","value": "nsg:ReconstructedCell"}, {"path": "^prov:generated / prov:used / prov:wasRevisionOf / nsg:providerId","op": "eq","value": "C060600A2-MT-C1"}]}}""")


https://bbp-nexus.epfl.ch/dev/v0/data/bbp?filter=%7B%0A%20%20%22%40context%22%3A%20%7B%0A%20%20%20%20%22rdf%22%3A%20%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%2C%0A%20%20%20%20%22nsg%22%20%3A%20%22https%3A%2F%2Fbbp-nexus.epfl.ch%2Fvocabs%2Fbbp%2Fneurosciencegraph%2Fcore%2Fv0.1.0%2F%22%2C%0A%20%20%20%20%22prov%22%3A%20%22http%3A%2F%2Fwww.w3.org%2Fns%2Fprov%23%22%0A%20%20%7D%2C%0A%20%20%22filter%22%3A%0A%20%20%7B%0A%20%20%20%20%22op%22%3A%20%22and%22%2C%0A%20%20%20%20%22value%22%3A%20%5B%0A%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%22op%22%3A%20%22eq%22%2C%0A%20%20%20%20%20%20%20%20%22path%22%3A%20%22rdf%3Atype%22%2C%0A%20%20%20%20%20%20%20%20%22value%22%3A%20%22nsg%3AReconstructedCell%22%0A%20%20%20%20%20%20%7D%2C%0A%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%22op%22%3A%20%22eq%22%2C%0A%20%20%20%20%20%20%20%20%22path%22%3A%20%22%5Eprov%3Agenerated%2Fprov%3Aused%2Fprov%3AwasRevisionOf%20%2F%20nsg%3AproviderId%22%2C%0A%20%20%20%20%20%20%20%20%22value%22%3A%20%22C060600A2-MT-C1%22%0A%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%5D%0A%20%20%7D%0A%7D%0A
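
The `path` values in both filters describe a walk through the provenance graph. The sketch below illustrates such a walk on a tiny in-memory set of triples. It assumes SPARQL-style property path semantics (`/` chains properties, `^` follows a property in the inverse direction), which is an assumption about how these filters are interpreted, and it is not how Nexus evaluates them. All node names, including the intermediate `cell:labeled` entity, are hypothetical.

```python
# Toy graph as (subject, predicate, object) triples; node names are made up.
TRIPLES = [
    ("cell:patched", "nsg:providerId", "C060600A2-MT-C1"),
    ("cell:other", "nsg:providerId", "OTHER-ID"),
    # Traces recorded from patched cells.
    ("trace:1", "rdf:type", "nsg:Trace"),
    ("trace:1", "prov:wasGeneratedBy", "activity:recording"),
    ("activity:recording", "prov:used", "cell:patched"),
    ("trace:2", "rdf:type", "nsg:Trace"),
    ("trace:2", "prov:wasGeneratedBy", "activity:other-recording"),
    ("activity:other-recording", "prov:used", "cell:other"),
    # Reconstruction derived from a revision of the patched cell.
    ("cell:reconstructed", "rdf:type", "nsg:ReconstructedCell"),
    ("activity:reconstruction", "prov:generated", "cell:reconstructed"),
    ("activity:reconstruction", "prov:used", "cell:labeled"),
    ("cell:labeled", "prov:wasRevisionOf", "cell:patched"),
]


def follow(triples, start, path):
    """Return everything reachable from `start` along a '/'-separated path."""
    frontier = {start}
    for step in (part.strip() for part in path.split("/")):
        if step.startswith("^"):
            # Inverse step: move from object to subject.
            predicate = step[1:]
            frontier = {s for (s, p, o) in triples
                        if p == predicate and o in frontier}
        else:
            # Forward step: move from subject to object.
            frontier = {o for (s, p, o) in triples
                        if p == step and s in frontier}
    return frontier


def matches(triples, rdf_type, path, value):
    """Instances of `rdf_type` whose path ends at `value` (the 'and' filter)."""
    subjects = {s for (s, p, o) in triples if p == "rdf:type" and o == rdf_type}
    return sorted(s for s in subjects if value in follow(triples, s, path))


traces = matches(TRIPLES, "nsg:Trace",
                 "prov:wasGeneratedBy / prov:used / nsg:providerId",
                 "C060600A2-MT-C1")
cells = matches(TRIPLES, "nsg:ReconstructedCell",
                "^prov:generated / prov:used / prov:wasRevisionOf / nsg:providerId",
                "C060600A2-MT-C1")
print(traces)
print(cells)
```

In this toy graph only `trace:1` and `cell:reconstructed` are selected, because they are the only typed instances whose path ends at the provider ID C060600A2-MT-C1.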