<a href=https://docs.google.com/uc?id=1xbyIyDFmryHaZ4HhVNDBeoMCZGyXCf10 target="_blank"><img src=https://docs.google.com/uc?id=1xbyIyDFmryHaZ4HhVNDBeoMCZGyXCf10 
width="150" border="10" /></a>

# Blue Brain Nexus - A knowledge graph for data-driven science - Part 1

##  1 - General introduction


### 1-1 Challenges of data management in neuroscience

Neuroscience - like many other scientific fields - produces a vast amount of data. It is important to enable good data management to support scientific discovery. Some of the challenges of data management include:

* Heterogeneity of data (e.g. morphology reconstructions, electrophysiology recordings, whole-brain imaging, simulations, validations)
* Varying size of datasets (small to large)
* Data is often stored in distributed silos (lab servers, personal computers, Dropbox, Google Drive)
* Data provenance is often not easily accessible (metadata kept in different spreadsheets or hand-written labbooks, or not captured at all)

&rightarrow; **Discovery of similar and related data across silos is hard!**


###  1-2 The FAIR Data Principles

The FAIR Guiding Principles were defined to help the scientific community implement good data management. The acronym stands for **Findability, Accessibility, Interoperability, and Reusability**. The principles are intended to enhance the reusability of scientific data, with specific emphasis on making data usable by both machines and humans.

[FAIR Data Principles](https://www.nature.com/articles/sdata201618)

###  1-3 What is Nexus?

#### Nexus is...

* A data repository
* A metadata catalog
* A semantic search engine

At the core of Blue Brain Nexus lies a knowledge graph [(What is a knowledge graph?)](https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f). The Nexus KnowledgeGraph operates on **4 types** of resources: **Organizations, Domains, Schemas** and **Instances**, nested as described in the diagram below:

<a href=http://nexus.apps.bbp.epfl.ch/dev/docs/kg/assets/api-reference/resources.png target="_blank"><img src=http://nexus.apps.bbp.epfl.ch/dev/docs/kg/assets/api-reference/resources.png 
width="500" border="10" /></a>
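The nesting of resources is also visible in the identifiers used by the schemas provided later in this tutorial. The following minimal sketch splits such an identifier into its parts. It assumes that the four path segments after `schemas` correspond to organization, domain, schema name and version; `split_schema_id` is a hypothetical helper written for this illustration, not part of Nexus or Pyxus.

```python
from urllib.parse import urlparse


def split_schema_id(schema_id):
    """Split a schema identifier into its nested parts.

    Assumption: the four path segments after 'schemas' are
    organization / domain / schema name / version.
    """
    segments = urlparse(schema_id).path.strip("/").split("/")
    start = segments.index("schemas") + 1
    organization, domain, name, version = segments[start:start + 4]
    return {
        "organization": organization,
        "domain": domain,
        "schema": name,
        "version": version,
    }


# Identifier taken from the Subject schema used in the exercises.
parts = split_schema_id(
    "https://bbp-nexus.epfl.ch/staging/v0/schemas/"
    "neurosciencegraph/experiment/subject/v0.1.0"
)
print(parts)
```

Instances would sit one level below the schema they were validated against, which is why they do not appear in a schema identifier.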


The Nexus KnowledgeGraph exposes a RESTful interface over HTTP(S). The transport format is generally **JSON-LD**. All resources in the system follow the same lifecycle (see diagram below). Changes to the data (creation, updates, state changes) are recorded in the system as **revisions**.

<a href=http://nexus.apps.bbp.epfl.ch/dev/docs/kg/assets/api-reference/resource-lifecycle.png target="_blank"><img src=http://nexus.apps.bbp.epfl.ch/dev/docs/kg/assets/api-reference/resource-lifecycle.png 
width="500" border="10" /></a>

#### Nexus resources

[Nexus on Github](https://github.com/BlueBrain/nexus)

[Nexus API documentation](https://bbp-nexus.epfl.ch/dev/docs/kg/index.html)


## 2 - Learning objectives of the practical

### Part 1 (45')

**From a concrete use case learn to**

* Identify and define entities
* Relate them through simple provenance
* Extend the provided schemas for the identified entities
* Prepare payloads (data) for individual entities


### Part 2 (45')

**Using Blue Brain Nexus and Pyxus, learn how to**

* Create an organization on Nexus
* Create a domain on Nexus
* Secure your organization
* Create and publish schemas of identified entities
* Create entities on Nexus using prepared payloads
* Attach files to entities and manage the revisions of the entity
* Filter for relevant data


## 3 - Use case

Amy is a computational neuroscientist working with single cell models of the somatosensory cortex. Her model uses experimental data collected by Brian, an experimental neuroscientist who uses *in vitro* patch-clamp recording techniques in his research and produces both electrophysiology and morphology data. To give Amy easier access to the data, Brian wants to use Blue Brain Nexus to integrate his data (both the files and their metadata). This is an example page taken from **Brian's labbook** containing the relevant metadata of the experiment he wants to make available to Amy through Blue Brain Nexus:

<a href=https://docs.google.com/uc?id=1w5BIpx6q4PEmDQ6ozEC2o9GI8hYPFFQd target="_blank"><img src=https://docs.google.com/uc?id=1w5BIpx6q4PEmDQ6ozEC2o9GI8hYPFFQd 
width="1000" border="10" /></a>


## 4 - Exercises Part 1

### 4-0 From data in the lab to data in Nexus

The figure below shows some essential steps for creating a data model (left column), what these look like for a concrete example (middle column) and how the data model is represented in Nexus (right column):

<a href=https://docs.google.com/uc?id=1q7RTwMWlsTaZdWkTwEZbwYBPcHV5MRWX target="_blank"><img src=https://docs.google.com/uc?id=1q7RTwMWlsTaZdWkTwEZbwYBPcHV5MRWX 
width="1000" border="10" /></a>

### 4-1 Identify and define the entities presented in the use case

* Use Brian's labbook entry to identify three entities of relevance in his experiment
* Identify the metadata fields associated with the three entities
* Relate the three entities using the provenance relation [wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom)
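A provenance chain of this kind can be sketched with plain Python dictionaries. The example below is only an illustration: the identifiers (`subject-1`, `neuron-1`, `dataset-1`) are made up, and the direction of the links follows the schemas provided in section 4-2 (a neuron is derived from a subject, a dataset from a neuron). `provenance_chain` is a hypothetical helper and assumes the links contain no cycles.

```python
# Made-up identifiers, for illustration only; real ones are assigned by Nexus.
entities = {
    "subject-1": {"@type": "nsg:Subject"},
    "neuron-1": {"@type": "nsg:Neuron", "wasDerivedFrom": "subject-1"},
    "dataset-1": {"@type": "nsg:Dataset", "wasDerivedFrom": "neuron-1"},
}


def provenance_chain(entity_id, entities):
    """Follow wasDerivedFrom links from an entity back to its origin."""
    chain = [entity_id]
    # Keep walking while the most recently visited entity names a source.
    while "wasDerivedFrom" in entities[chain[-1]]:
        chain.append(entities[chain[-1]]["wasDerivedFrom"])
    return chain


chain = provenance_chain("dataset-1", entities)
print(" -> ".join(chain))
```

Starting from a dataset, the chain leads back through the recorded neuron to the subject it was taken from, which is the kind of question provenance metadata is meant to answer.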

### 4-2 Inspect and extend the provided SHACL schemas

To describe a data instance in Nexus, for example a subject such as a rat, you need to know exactly how that instance should be organised: which metadata properties it should come with, which of those are optional (e.g. the weight of the subject, because this information is not always available) and which are required (e.g. the species the subject belongs to). This is where schemas come into play: a schema defines the properties an entity has and constrains their values. In Nexus, schemas are written in the **Shapes Constraint Language** ([SHACL](https://www.w3.org/TR/shacl/)) and stored in JSON-LD format.
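Whether a property is required or optional can be read off its `minCount` constraint: a `minCount` of at least 1 makes it required, and a missing `minCount` leaves it optional. The sketch below applies this rule to a trimmed copy of the property list of the Subject schema provided in this section (only `path`, `minCount` and `maxCount` are kept). `split_by_requirement` is a hypothetical helper, not a SHACL tool.

```python
# Trimmed copy of the Subject shape's properties (other keys omitted).
subject_properties = [
    {"path": "schema:name", "maxCount": 1},
    {"path": "nsg:species", "minCount": 1, "maxCount": 1},
    {"path": "nsg:sex", "maxCount": 1},
    {"path": "nsg:age", "maxCount": 1},
    {"path": "nsg:weight", "maxCount": 1},
]


def split_by_requirement(properties):
    """Return (required, optional) property paths based on minCount."""
    required = [p["path"] for p in properties if p.get("minCount", 0) >= 1]
    optional = [p["path"] for p in properties if p.get("minCount", 0) < 1]
    return required, optional


required, optional = split_by_requirement(subject_properties)
print("required:", required)
print("optional:", optional)
```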

**Use the [JSON editor online](https://jsoneditoronline.org/) to** 
* inspect the provided schemas for **Subject, Neuron and Dataset**
* familiarise yourself with the structure of the schemas and the properties provided 
* extend the Subject schema with a property "strain" (the property should be required with a maximum count of 1) 

<font color='red'>NOTE</font>: To inspect and extend the schemas, copy the provided schemas from below and paste them into a new editor window in the [JSON editor online](https://jsoneditoronline.org/). To open a new editor window, click the "New" tab in the tool bar on top. You can save each of the schemas individually by clicking on "Save" --> "Save online". By clicking the "<" icon in the middle, you can format the schema on the left to increase its readability.
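If you prefer to script the extension rather than edit the JSON by hand, one possible approach is sketched below. It works on a trimmed stand-in for the Subject schema to stay short. The path `nsg:strain`, the datatype `xsd:string` and the description text are assumptions modelled on the existing species property; the exercise itself only fixes the cardinality (required, at most one value).

```python
import json

# Trimmed stand-in for the Subject schema; the full schema has more properties.
schema = {
    "@type": "nxv:Schema",
    "shapes": [
        {
            "@type": "sh:NodeShape",
            "targetClass": "nsg:Subject",
            "property": [
                {
                    "path": "nsg:species",
                    "name": "Species",
                    "datatype": "xsd:string",
                    "minCount": 1,
                    "maxCount": 1,
                },
            ],
        }
    ],
}

# Assumed path, datatype and description, mirroring the species property.
strain_property = {
    "path": "nsg:strain",
    "name": "Strain",
    "description": "The strain of the subject",
    "datatype": "xsd:string",
    "minCount": 1,  # required
    "maxCount": 1,  # at most one value
}

# Locate the shape that targets nsg:Subject and append the new property.
subject_shape = next(
    shape for shape in schema["shapes"]
    if shape.get("targetClass") == "nsg:Subject"
)
subject_shape["property"].append(strain_property)

print(json.dumps(schema, indent=2))
```

The printed JSON can be pasted back into the editor to compare it with a manually extended schema.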

------


**Subject schema**

{
  "@context": [
    "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/schema/v0.1.0"
  ],
  "@type": "nxv:Schema",
  "shapes": [
    {
      "@id": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/subject/v0.1.0/shapes/SubjectShape",
      "@type": "sh:NodeShape",
      "label": "Subject shape definition",
      "comment": "Subject used in experiment.",
      "targetClass": "nsg:Subject",
      "nodeKind": "sh:BlankNodeOrIRI",
      "property": [
        {
          "path": "schema:name",
          "name": "Name",
          "description": "The name of the subject",
          "datatype": "xsd:string",
          "maxCount": 1
        },
        {
          "path": "nsg:species",
          "name": "Species",
          "description": "The species of the subject",
          "datatype": "xsd:string",
          "minCount": 1,
          "maxCount": 1
        },
        {
          "path": "nsg:sex",
          "name": "Sex",
          "description": "The sex of the subject",
          "datatype": "xsd:string",
          "maxCount": 1
        },
        {
          "path": "nsg:age",
          "name": "Age",
          "description": "The age of the subject.",
          "node": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/subject/v0.1.0/AgeShape",
          "maxCount": 1
        },
        {
          "path": "nsg:weight",
          "name": "Weight",
          "description": "The weight of the subject.",
          "node": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/subject/v0.1.0/QuantitativeValueShape",
          "maxCount": 1
        }
      ]
    },
    {
      "@id": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/subject/v0.1.0/AgeShape",
      "@type": "sh:NodeShape",
      "property": [
        {
          "path": "nsg:period",
          "name": "Period",
          "in": [
            "Pre-natal",
            "Post-natal"
          ],
          "minCount": 1,
          "maxCount": 1
        },
        {
          "path": "schema:value",
          "name": "Age value",
          "node": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/subject/v0.1.0/QuantitativeValueShape",
          "minCount": 1,
          "maxCount": 1
        }
      ]
    },
    {
      "@id": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/subject/v0.1.0/QuantitativeValueShape",
      "@type": "sh:NodeShape",
      "property": [
        {
          "path": "schema:value",
          "name": "Value",
          "datatype": "xsd:string",
          "maxCount": 1,
          "minCount": 1
        },
        {
          "path": "schema:unitText",
          "name": "Unit",
          "datatype": "xsd:string",
          "maxCount": 1,
          "minCount": 1
        }
      ]
    }
  ]
}

**Neuron schema**

{
  "@context": [
    "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/schema/v0.1.0"
  ],
  "@type": "nxv:Schema",
  "shapes": [
    {
      "@id": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/experiment/neuron/v0.1.0/shapes/NeuronShape",
      "@type": "sh:NodeShape",
      "label": "Neuron shape definition",
      "comment": "Neuron recorded in experiment.",
      "targetClass": "nsg:Neuron",
      "nodeKind": "sh:BlankNodeOrIRI",
      "property": [
        {
          "path": "schema:name",
          "name": "Name",
          "description": "The name of the neuron",
          "datatype": "xsd:string",
          "maxCount": 1
        },
        {
          "path": "nsg:mType",
          "name": "Morphological type",
          "description": "The morphological type of the neuron",
          "datatype": "xsd:string",
          "minCount": 1,
          "maxCount": 1
        },
        {
          "path": "nsg:eType",
          "name": "Electrical type",
          "description": "The electrical type of the neuron",
          "datatype": "xsd:string",
          "minCount": 1,
          "maxCount": 1
        },
        {
          "path": "prov:wasDerivedFrom",
          "name": "Was derived from",
          "description": "The neuron was derived from a subject.",
          "class": "nsg:Subject",
          "minCount": 1,
          "maxCount": 1
        }
      ]
    }
  ]
}

**Dataset schema**

{
  "@context": [
    "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/schema/v0.1.0"
  ],
  "@type": "nxv:Schema",
  "shapes": [
    {
      "@id": "https://bbp-nexus.epfl.ch/staging/v0/schemas/neurosciencegraph/core/dataset/v0.1.0/shapes/DatasetShape",
      "@type": "sh:NodeShape",
      "targetClass": "nsg:Dataset",
      "nodeKind": "sh:BlankNodeOrIRI",
      "property": [
        {
          "path": "schema:name",
          "name": "Name",
          "description": "The dataset name.",
          "datatype": "xsd:string",
          "maxCount": 1
        },
        {
          "path": "schema:description",
          "name": "Description",
          "description": "The dataset description.",
          "datatype": "xsd:string",
          "maxCount": 1
        },
        {
          "path": "prov:wasDerivedFrom",
          "name": "Was derived from",
          "description": "The dataset was derived from a neuron.",
          "class": "nsg:Neuron",
          "minCount": 1,
          "maxCount": 1
        }
      ]
    }
  ]
}

### 4-3 Prepare the payloads (data) of the individual entities

Now that you have familiarised yourself with the SHACL schemas for the three entity types Subject, Neuron and Dataset, you can create data examples using the online JSON editor with the data taken from Brian's labbook:

**Use the online JSON editor to complete the data examples for Subject, Neuron and Dataset (one dataset for morphology and one dataset for electrophysiology)**: 

<font color='red'>NOTE</font>: To prepare the data, copy the provided data templates from below and paste them into a new editor window in the [JSON editor online](https://jsoneditoronline.org/). To open a new editor window, click the "New" tab in the tool bar on top. You can save each of the data individually by clicking on "Save" --> "Save online". By clicking the "<" icon in the middle, you can format the data on the left to increase its readability.

<font color='red'>NOTE</font>: The value of the "@id" key of the "wasDerivedFrom" field cannot be filled in yet. Leave it empty or keep the placeholder from the templates below; the real identifier only becomes available once the respective data has been put into Nexus in Part 2 of this course.

------


**Subject metadata**

{
  "@context": [
    "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v0.1.2"
  ],
  "@type": [
    "nsg:Subject"
  ],
  "name": "",
  "species": "",
  "strain": "",
  "sex": "",
  "age": {
    "period": "",
    "value": {
      "unitText": "",
      "value": ""
    }
  },
  "weight": {
    "unitText": "",
    "value": ""
  }
}

**Neuron**

{
  "@context": [
    "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v0.1.2"
  ],
  "@type": [
    "nsg:Neuron"
  ],
  "name": "",
  "mType": "",
  "eType": "",
  "wasDerivedFrom": {
    "@id": "https://nexus-id-placeholder.ch",
    "@type": ""
  }
}

**Dataset**

{
  "@context": [
    "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v0.1.2"
  ],
  "@type": [
    "nsg:Dataset"
  ],
  "name": "",
  "description": "",
  "wasDerivedFrom": {
    "@id": "https://nexus-id-placeholder.ch",
    "@type": ""
  }
}
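The templates above can also be completed programmatically. The sketch below fills the Neuron template and later replaces the placeholder link. All values (`example-neuron`, `example-mtype`, `example-etype`, the `example.org` identifier) are made-up stand-ins, not data from Brian's labbook, and `fill_template` and `link_to_parent` are hypothetical helpers. The `nsg:Subject` type of the link follows the `class` constraint in the Neuron schema.

```python
import copy
import json

neuron_template = {
    "@context": [
        "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v0.1.2"
    ],
    "@type": ["nsg:Neuron"],
    "name": "",
    "mType": "",
    "eType": "",
    "wasDerivedFrom": {"@id": "https://nexus-id-placeholder.ch", "@type": ""},
}


def fill_template(template, values):
    """Return a copy of the template with the given fields filled in."""
    payload = copy.deepcopy(template)
    for key, value in values.items():
        if key not in payload:
            raise KeyError(f"{key!r} is not a field of the template")
        payload[key] = value
    return payload


def link_to_parent(payload, parent_id, parent_type):
    """Return a copy of the payload whose wasDerivedFrom points to the parent."""
    linked = copy.deepcopy(payload)
    linked["wasDerivedFrom"] = {"@id": parent_id, "@type": parent_type}
    return linked


# Made-up values; replace them with the entries from the labbook.
neuron = fill_template(
    neuron_template,
    {"name": "example-neuron", "mType": "example-mtype", "eType": "example-etype"},
)

# Only possible once the subject exists in Nexus (Part 2); made-up identifier.
neuron = link_to_parent(
    neuron, "https://example.org/subject-id-from-part-2", "nsg:Subject"
)
print(json.dumps(neuron, indent=2))
```

Working on copies keeps the empty template reusable, for instance for the two datasets (morphology and electrophysiology) that share one Dataset template.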

### 4-4 Test your data against the schemas using the SHACL playground

Copy-paste the schema with its respective data example into the [SHACL playground](http://shacl.org/playground/) and look at the validation report (the SHACL schema should be pasted into the 'Shapes graph' window and the data example into the 'Data graph' window). See how the validation report changes if you make changes to your data graph!
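To get a feel for how a validation report reacts to changes in the data, the toy check below mimics only the `minCount` and `maxCount` constraints. It is not a SHACL implementation: it ignores datatypes, nested shapes and classes, and it looks up the compact JSON keys of the data templates rather than resolving property paths. The constraint values are taken from the Subject schema; the payload values are made up.

```python
def check_counts(payload, constraints):
    """Return a list of violation messages; an empty list means no violations."""
    report = []
    for key, rule in constraints.items():
        value = payload.get(key)
        # A missing key counts as zero values, a list as one value per item.
        if value is None:
            count = 0
        elif isinstance(value, list):
            count = len(value)
        else:
            count = 1
        if count < rule.get("minCount", 0):
            report.append(
                f"{key}: expected at least {rule['minCount']} value(s), found {count}"
            )
        if "maxCount" in rule and count > rule["maxCount"]:
            report.append(
                f"{key}: expected at most {rule['maxCount']} value(s), found {count}"
            )
    return report


# Cardinalities from the Subject schema, keyed by the compact JSON keys.
constraints = {
    "name": {"maxCount": 1},
    "species": {"minCount": 1, "maxCount": 1},
    "sex": {"maxCount": 1},
}

valid = {"name": "example-subject", "species": "example-species"}
missing_species = {"name": "example-subject"}
two_names = {"name": ["name-a", "name-b"], "species": "example-species"}

for label, payload in [
    ("valid", valid),
    ("missing species", missing_species),
    ("two names", two_names),
]:
    print(label, "->", check_counts(payload, constraints) or "conforms")
```

Removing a required field or duplicating a single-valued one each adds one entry to the report, which is the same kind of change you should observe in the playground.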