<a href="figures/Nexus_logo_800px.png" target="_blank"><img src="figures/Nexus_logo_800px.png" 
width="150" border="10" /></a>

# Blue Brain Nexus - A knowledge graph for data-driven science - Part 1

##  1 - General introduction


### 1-1 Challenges of data management in neuroscience

Neuroscience - like many other scientific fields - produces a vast amount of data. It is important to enable good data management to support scientific discovery. Some of the challenges of data management include:

* Heterogeneity of data (e.g. morphology reconstructions, electrophysiology recordings, whole-brain imaging, simulations, validations)
* Varying size of datasets (small to large)
* Data is often stored in distributed silos (lab servers, personal computers, Dropbox, Google Drive)
* Data provenance is often not easily accessible (metadata kept in different spreadsheets or hand-written labbooks, or not captured at all)

&rightarrow; **Discovery of similar and related data across silos is hard!**


###  1-2 The FAIR Data Principles

The FAIR Guiding Principles were defined to help the scientific community implement good data management. The acronym stands for: **Findability, Accessibility, Interoperability, and Reusability**. The principles are intended to enhance the reusability of scientific data, with specific emphasis on the usability of data by both machines and humans.

* FAIR Data Principles: https://www.nature.com/articles/sdata201618

###  1-3 What is Nexus?

#### Nexus is...

* A data repository
* A metadata catalog
* A semantic search engine

At the core of Blue Brain Nexus lies a knowledge graph [(What is a knowledge graph?)](https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f). The Nexus KnowledgeGraph operates on **4 types** of resources: **Organizations, Domains, Schemas** and **Instances**, nested as described in the diagram below:

<a href="figures/nexus-kg-resources.png" target="_blank"><img src="figures/nexus-kg-resources.png" 
width="500" border="10" /></a>


The Nexus KnowledgeGraph exposes a RESTful interface over HTTP(S). The generally adopted transport format is **JSON-LD**. All resources in the system generally follow the very same lifecycle (see diagram below). Changes to the data (creation, updates, state changes) are recorded into the system as **revisions**.

<a href="figures/nexus-kg-resource-lifecycle.png" target="_blank"><img src="figures/nexus-kg-resource-lifecycle.png" 
width="500" border="10" /></a>
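
As a rough illustration of the revision idea, the sketch below models a resource as an append-only list of revisions. It is a toy in-memory model, not the Nexus implementation: the class name `RevisionedResource`, the `rev` and `deprecated` fields, and the rule that a caller must state the revision it last saw are all assumptions made for this example.

```python
import copy


class RevisionedResource:
    """Toy resource whose every change is stored as a new revision."""

    def __init__(self, payload):
        # Creation is recorded as revision 1.
        self.revisions = [
            {"rev": 1, "deprecated": False, "payload": copy.deepcopy(payload)}
        ]

    @property
    def current(self):
        """The most recent revision."""
        return self.revisions[-1]

    def _record(self, expected_rev, payload, deprecated):
        latest = self.current
        # Assumed rule: a change must be based on the latest revision.
        if expected_rev != latest["rev"]:
            raise ValueError(
                f"stale revision: expected {latest['rev']}, got {expected_rev}"
            )
        # Assumed rule: a deprecated resource accepts no further changes.
        if latest["deprecated"]:
            raise ValueError("resource is deprecated")
        self.revisions.append(
            {
                "rev": latest["rev"] + 1,
                "deprecated": deprecated,
                "payload": copy.deepcopy(payload),
            }
        )
        return self.current["rev"]

    def update(self, expected_rev, payload):
        """Record a data change as a new revision."""
        return self._record(expected_rev, payload, deprecated=False)

    def deprecate(self, expected_rev):
        """Record a state change (deprecation) as a new revision."""
        return self._record(expected_rev, self.current["payload"], deprecated=True)


resource = RevisionedResource({"name": "example"})
resource.update(1, {"name": "example", "note": "updated"})
resource.deprecate(2)
print([r["rev"] for r in resource.revisions])  # [1, 2, 3]
```

Nothing is ever overwritten in this model: the earlier payloads stay available in `resource.revisions`, which is the property that makes a revision history useful for tracing how a piece of metadata changed over time.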

#### Nexus resources

* Nexus on Github: https://github.com/BlueBrain/nexus
* Nexus API documentation: https://bbp-nexus.epfl.ch/dev/docs/kg/index.html


## 2 - Learning objectives of the practical

### Part 1 (45')

**From a concrete use case, learn to**

* Identify and define entities
* Relate them through simple provenance
* Extend the provided schemas for the identified entities
* Prepare payloads (data) for individual entities


### Part 2 (45')

**Using Blue Brain Nexus and Pyxus, learn how to**

* Create an organisation on Nexus
* Create a domain on Nexus
* Secure the organisation
* Create and publish schemas for the identified entities
* Create entities on Nexus using the prepared payloads
* Attach a file to an entity and manage the entity's revisions
* Filter for relevant data


## 3 - Use case

Amy is a computational neuroscientist working with single cell models of the somatosensory cortex. Her model uses experimental data collected by Brian, an experimental neuroscientist who uses *in vitro* patch-clamp recording techniques for his research, producing both electrophysiology and morphology data. To make the data easier for Amy to access, Brian wants to use Blue Brain Nexus to integrate it (both files and their respective metadata). This is an example page taken from **Brian's labbook** containing the relevant metadata of the experiment he wants to make available to Amy through Blue Brain Nexus:

<a href="figures/labbook.png" target="_blank"><img src="figures/labbook.png" 
width="1000" border="10" /></a>


## 4 - Exercises Part 1

### 4-0 From data in the lab to data in Nexus

The figure below shows some essential steps for creating a data model (left column), what this would look like for a concrete example (middle column) and how the data model is represented in Nexus:

<a href="figures/in-silico-neuroscience-course-graphic.png" target="_blank"><img src="figures/in-silico-neuroscience-course-graphic.png" 
width="1000" border="10" /></a>

### 4-1 Identify and define the entities presented in the use case

* Use Brian's labbook entry to identify three entities of relevance in his experiment
* Identify the metadata fields associated with the three entities 
* Relate the three entities using the provenance relation **wasDerivedFrom** (http://www.w3.org/ns/prov#wasDerivedFrom)
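
To make the provenance relation concrete, here is a minimal sketch of how three entities could be chained with `prov:wasDerivedFrom` in JSON-LD, built as plain Python dictionaries. The `ex:` namespace, the identifiers `ex:entity-a` to `ex:entity-c`, the type names and the helper functions are all hypothetical placeholders; only the PROV namespace comes from the exercise above.

```python
import json

# Prefixes used in this sketch; "ex" is a made-up namespace for illustration.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "ex": "http://example.org/",
}


def make_entity(entity_id, entity_type, derived_from=None):
    """Build a minimal JSON-LD document, optionally linked to its source."""
    doc = {"@context": CONTEXT, "@id": entity_id, "@type": entity_type}
    if derived_from is not None:
        # An {"@id": ...} object marks the value as a link, not a plain string.
        doc["prov:wasDerivedFrom"] = {"@id": derived_from}
    return doc


def expand(term, context):
    """Expand a compact IRI such as 'prov:wasDerivedFrom' to a full IRI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


def lineage(entity_id, docs_by_id):
    """Follow wasDerivedFrom links from an entity back to its origin."""
    chain = [entity_id]
    while "prov:wasDerivedFrom" in docs_by_id[chain[-1]]:
        chain.append(docs_by_id[chain[-1]]["prov:wasDerivedFrom"]["@id"])
    return chain


docs = [
    make_entity("ex:entity-a", "ex:TypeA"),
    make_entity("ex:entity-b", "ex:TypeB", derived_from="ex:entity-a"),
    make_entity("ex:entity-c", "ex:TypeC", derived_from="ex:entity-b"),
]
docs_by_id = {doc["@id"]: doc for doc in docs}

print(json.dumps(docs[1], indent=2))
print(expand("prov:wasDerivedFrom", CONTEXT))
print(lineage("ex:entity-c", docs_by_id))
```

The `lineage` helper shows why a single, consistently used relation is enough for simple provenance: starting from the most derived entity, the links can be followed back to the original source.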

### 4-2 Inspect and extend the provided SHACL schemas

If you want to store a data instance of, e.g., a subject such as a rat in Nexus, it is important to know exactly how that instance should be organised: which metadata properties it should come with, which of those are optional (e.g. the weight of the subject, because this information is not always available) and which are required (e.g. the species the subject belongs to). This is where schemas come into play: schemas define the properties an entity has and help constrain them further. In Nexus, schemas are written in the **Shapes Constraint Language** (SHACL): https://www.w3.org/TR/shacl/ and stored in JSON-LD format.
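
The required/optional distinction can be sketched as a small SHACL node shape in JSON-LD, built here as a Python dictionary. This is an illustrative shape, not one of the schemas provided for the exercise: the `ex:` namespace, the property names `ex:species` and `ex:weight`, the chosen datatypes and the helper functions are assumptions. The SHACL terms themselves (`sh:NodeShape`, `sh:targetClass`, `sh:property`, `sh:path`, `sh:datatype`, `sh:minCount`, `sh:maxCount`) come from the W3C specification.

```python
import json

CONTEXT = {
    "sh": "http://www.w3.org/ns/shacl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",  # made-up namespace for illustration
}


def property_shape(path, datatype, required):
    """Describe one property; sh:minCount 1 is what makes it required."""
    shape = {
        "sh:path": {"@id": path},
        "sh:datatype": {"@id": datatype},
        "sh:maxCount": 1,
    }
    if required:
        shape["sh:minCount"] = 1
    return shape


subject_shape = {
    "@context": CONTEXT,
    "@id": "ex:SubjectShape",
    "@type": "sh:NodeShape",
    "sh:targetClass": {"@id": "ex:Subject"},
    "sh:property": [
        # Required: every subject must state its species.
        property_shape("ex:species", "xsd:string", required=True),
        # Optional: the weight may be absent.
        property_shape("ex:weight", "xsd:decimal", required=False),
    ],
}


def required_paths(shape):
    """Return the paths of all properties with a minimum count of at least 1."""
    return [
        prop["sh:path"]["@id"]
        for prop in shape["sh:property"]
        if prop.get("sh:minCount", 0) >= 1
    ]


print(json.dumps(subject_shape, indent=2))
print(required_paths(subject_shape))  # ['ex:species']
```

In SHACL a property is optional by default; it only becomes required once a minimum count is set, which is why the optional property simply omits `sh:minCount`.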

**Use the online JSON editor (accessible through the links below) to** 
* inspect the provided schemas for **Subject, Neuron and Dataset**
* familiarise yourself with the structure of the schemas and the properties provided 
* extend the Subject schema with a property "strain"

[Subject schema](https://jsoneditoronline.org/?id=c6745518a8e5414d70b74059f09f6fcb)

[Neuron schema](https://jsoneditoronline.org/?id=c6745518a8e5414d70b74059f0b059b3)

[Dataset schema](https://jsoneditoronline.org/?id=ab204158e246b767c827f7b505d555a9)


### 4-3 Prepare the payloads (data) of the individual entities

Now that you have familiarised yourself with the SHACL schemas for the three entity types Subject, Neuron and Dataset, you can create data examples using the online JSON editor with the data taken from Brian's labbook:

**Use the online JSON editor to complete the data examples for Subject, Neuron and Dataset**

[Subject metadata](https://jsoneditoronline.org/?id=74d090a43654d2fb720c231395351736)

[Neuron metadata](https://jsoneditoronline.org/?id=31db2a690512ec8fd7afa9111662493b)

[Dataset metadata](https://jsoneditoronline.org/?id=a679385b81c01fe4d9f1ed5915aeffff)
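
If you prefer to prepare a payload in Python rather than in the online editor, the same "complete the template" step might look like the sketch below. The template, its field names and the filled-in values are hypothetical placeholders and do not reproduce the linked data examples or Brian's labbook.

```python
import json

# Hypothetical template: field names are placeholders, not the tutorial's.
template = {
    "@context": {"ex": "http://example.org/"},
    "@type": "ex:Subject",
    "ex:name": "",
    "ex:species": "",
    "ex:weight": "",
}


def fill(template, values):
    """Return a copy of the template with the given fields filled in."""
    unknown = set(values) - set(template)
    if unknown:
        raise KeyError(f"fields not in template: {sorted(unknown)}")
    return {**template, **values}


def unfilled(doc):
    """List the fields that are still empty strings."""
    return [key for key, value in doc.items() if value == ""]


# Made-up values, for illustration only.
payload = fill(template, {"ex:name": "subject-001", "ex:species": "PLACEHOLDER"})

print(unfilled(payload))  # ['ex:weight']
print(json.dumps(payload, indent=2))
```

Listing the fields that are still empty is a quick way to see what remains to be copied over from the labbook before the payload is tested against its schema.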

### 4-4 Test your data against the schemas using the SHACL playground

Copy-paste the schema with its respective data example into the SHACL playground and look at the validation report: http://shacl.org/playground/ (the SHACL schema should be pasted into the 'Shapes graph' window and the data example into the 'Data graph' window). See how the validation report changes if you make changes to your data graph!
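
To get a feel for what the validation report is telling you, the sketch below re-implements one small part of SHACL in plain Python: the `sh:minCount` and `sh:maxCount` checks. It is a deliberately simplified stand-in, not a SHACL validator, and it ignores datatypes, target classes and everything else the playground checks. The shape, the `ex:` property names and the data values are hypothetical.

```python
def as_list(value):
    """Normalise a JSON-LD value to a list of values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def validate(data, shape):
    """Return a list of violation messages; an empty list means no violations."""
    report = []
    for prop in shape["sh:property"]:
        path = prop["sh:path"]["@id"]
        count = len(as_list(data.get(path)))
        min_count = prop.get("sh:minCount", 0)
        max_count = prop.get("sh:maxCount")
        if count < min_count:
            report.append(f"{path}: fewer than {min_count} value(s), found {count}")
        if max_count is not None and count > max_count:
            report.append(f"{path}: more than {max_count} value(s), found {count}")
    return report


# Hypothetical shape: species is required, weight is optional.
shape = {
    "sh:property": [
        {"sh:path": {"@id": "ex:species"}, "sh:minCount": 1, "sh:maxCount": 1},
        {"sh:path": {"@id": "ex:weight"}, "sh:maxCount": 1},
    ]
}

data = {"@type": "ex:Subject", "ex:species": "PLACEHOLDER"}
print(validate(data, shape))  # []

# Removing the required property changes the report.
broken = {key: value for key, value in data.items() if key != "ex:species"}
print(validate(broken, shape))  # ['ex:species: fewer than 1 value(s), found 0']
```

The same experiment works in the playground: delete a required property from the data graph, or give a property two values where the shape allows only one, and compare the reports.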