## Building a knowledge graph step by step

The goal of this notebook is to build an example knowledge graph step by step.


## Prerequisites

This notebook assumes you've created a project within the sandbox deployment of Nexus. If not, follow the Blue Brain Nexus [Quick Start tutorial](https://bluebrain.github.io/nexus-bbp-domains/docs/bbptutorial/getting-started/quick-start/index.html).

## Overview
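This notebook models a small research domain (people, the organizations they are affiliated with, and the articles they author) and uses schema.org terms and JSON-LD to describe the entities.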

You'll work through the following steps:

1. Create and configure a Blue Brain Nexus client
2. Create a Person entity
3. Create a ScholarlyArticle entity and link it to the Person entity as author
4. Update the Person entity to add its affiliation to an Organization entity (EPFL)
5. Explore and navigate the created knowledge graph
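Once all five steps are done, the graph should look roughly like the set of (subject, predicate, object) triples below. This is only a plain-Python picture to fix ideas: the person and organization identifiers are hypothetical placeholders, the predicates are shortened, and only the article DOI and title come from Step 3.

```python
# Plain-Python picture of the target knowledge graph as (subject, predicate, object) triples.
# PERSON and EPFL are hypothetical placeholder identifiers; the article comes from Step 3.
PERSON = "http://example.org/person/jdoe"        # hypothetical identifier
EPFL = "http://example.org/organization/epfl"    # hypothetical identifier
ARTICLE = "https://doi.org/10.1016/j.cell.2015.09.029"

graph_triples = [
    (PERSON, "type", "Person"),
    (PERSON, "familyName", "Doe"),
    (PERSON, "affiliation", EPFL),
    (EPFL, "type", "Organization"),
    (EPFL, "acronym", "EPFL"),
    (ARTICLE, "type", "ScholarlyArticle"),
    (ARTICLE, "name", "Reconstruction and Simulation of Neocortical Microcircuitry"),
    (ARTICLE, "author", PERSON),
]

# Links between entities are the triples whose object is itself a subject in the graph
subjects = {s for s, _, _ in graph_triples}
links = [(s, p, o) for s, p, o in graph_triples if o in subjects]
for s, p, o in links:
    print(f"{s} --{p}--> {o}")
```

Attribute values (names, acronyms, types) are leaves; the `affiliation` and `author` triples are the edges that make this a graph rather than three isolated records.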

## Step 1: Create and configure a Nexus client

In [None]:
#install the Blue Brain Nexus python SDK
!pip install -U nexus-sdk

In [55]:
#Set a token to authenticate to Nexus
import getpass
token = getpass.getpass()


In [56]:
#Configure a nexus client
nexus_environment = "https://sandbox.bluebrainnexus.io/v1"
org ="demo"
project ="testdemo"

import nexussdk as nexus
nexus.config.set_environment(nexus_environment)
nexus.config.set_token(token)

vocab = "%s/vocabs/%s/%s/"%(nexus_environment, org, project)
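The configuration cell builds the project vocabulary IRI from the environment, organization and project, and the SPARQL helper cell further down builds the endpoint of the project's `graph` view the same way. A small sketch of both derivations, using the same string patterns as the notebook (the helper name `nexus_urls` is hypothetical):

```python
# Hypothetical helper reproducing the URL patterns used in this notebook.
def nexus_urls(environment: str, org: str, project: str) -> dict:
    return {
        # Base IRI used as @vocab for terms not defined in the remote context
        "vocab": f"{environment}/vocabs/{org}/{project}/",
        # SPARQL endpoint of the project's "graph" view
        "sparql": f"{environment}/views/{org}/{project}/graph/sparql",
    }

urls = nexus_urls("https://sandbox.bluebrainnexus.io/v1", "demo", "testdemo")
print(urls["vocab"])   # → https://sandbox.bluebrainnexus.io/v1/vocabs/demo/testdemo/
print(urls["sparql"])  # → https://sandbox.bluebrainnexus.io/v1/views/demo/testdemo/graph/sparql
```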



In [57]:
from pygments.lexers import JsonLdLexer
from pygments import highlight
from pygments.formatters import TerminalFormatter, TerminalTrueColorFormatter
import json

In [58]:
from sparqlendpointhelper import SparqlViewHelper

%load_ext autoreload
%autoreload 1
%aimport sparqlendpointhelper
%aimport utils

sparqlview_endpoint = nexus_environment+"/views/"+org+"/"+project+"/graph/sparql"
sparqlviewhelper = SparqlViewHelper(sparqlview_endpoint,nexus_environment, org, project, token)



In [59]:
%reload_ext autoreload

### Import the Blue Brain Knowledge Graph schema 

In [69]:
#Import the BBP Knowledge Graph schema

project_to_resolve_to = "neurosciencegraph/datamodels"

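# Cross-project resolver: resources (schemas, contexts) not found in this project
# are looked up in neurosciencegraph/datamodels, on behalf of GitHub-authenticated users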
cross_project_resolver = {
  "@type": [
    "CrossProject"
  ],
  "projects": [
    project_to_resolve_to
  ],
  "identities": [
    {
    "@type": "Authenticated",
    "realm": "github"
    }
  ],
  "priority": 50
}


response = utils.create_resolver(nexus,cross_project_resolver,org,project)
utils.pretty_print(response)

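# Context resource: reuses the neuroshapes context and maps any term it does not define
# to the project vocabulary through @vocab
bbp_context = {
    "@context":[
        "https://neuroshapes.org",
        {
            "@vocab": vocab
        }
    ],
    "@id":"https://bbp.neuroshapes.org"
}


response = utils.create_resource(nexus,bbp_context,org,project)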
utils.pretty_print(response)




{
    "@context": "https://bluebrain.github.io/nexus/contexts/resource.json",
    "@id": "https://sandbox.bluebrainnexus.io/v1/resources/demo/testdemo/_/36273830-5f37-4b29-a335-29bea43ae430",
    "@type": [
        "CrossProject",
        "Resolver"
    ],
    "_self": "https://sandbox.bluebrainnexus.io/v1/resolvers/demo/testdemo/36273830-5f37-4b29-a335-29bea43ae430",
    "_constrainedBy": "https://bluebrain.github.io/nexus/schemas/resolver.json",
    "_project": "https://sandbox.bluebrainnexus.io/v1/projects/demo/testdemo",
    "_rev": 1,
    "_deprecated": false,
    "_createdAt": "2019-06-05T07:46:22.903Z",
    "_createdBy": "https://

## Step 2: Create a Person entity

Let's define an entity of type Person as follows:

* A person is an entity of type Person (@type value)
* A person has an identifier (@id value)
* A person has a family name (familyName value), a given name (givenName value) and a job title (jobTitle value)
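The definition above can be turned into a small payload builder that refuses empty values before anything is sent to Nexus. This is a hypothetical helper, not part of the Nexus SDK or of the notebook's `utils` module; the ORCID-style identifier in the example is a placeholder.

```python
# Hypothetical helper building the Person payload described above
# and checking that the identifier and the three properties are non-empty.
PERSON_FIELDS = ("familyName", "givenName", "jobTitle")

def make_person(person_id: str, family_name: str, given_name: str, job_title: str,
                context: str = "https://bbp.neuroshapes.org") -> dict:
    person = {
        "@context": context,
        "@id": person_id,
        "@type": "Person",
        "familyName": family_name,
        "givenName": given_name,
        "jobTitle": job_title,
    }
    missing = [key for key in ("@id",) + PERSON_FIELDS if not person[key]]
    if missing:
        raise ValueError(f"missing values for: {missing}")
    return person

# Placeholder ORCID-style identifier (hypothetical)
p = make_person("https://orcid.org/0000-0000-0000-0000", "Doe", "Jane", "Research scientist")
print(p["@type"], p["@id"])
```

The resulting dictionary could then be passed to `utils.create_resource` exactly like the hand-written `person` dictionary in the next cell.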


In [None]:
# Use your ORCID identifier if you have one, otherwise the URL of your GitHub profile (@id must be an IRI)
person ={
    "@context":"https://bbp.neuroshapes.org",
    "@id":"http://your/id/here",
    "@type":"Person",
    "familyName":"your family name here",
    "givenName":"your given name here",
    "jobTitle":"job title"
} 

response = utils.create_resource(nexus,person,org,project)
utils.pretty_print(response)
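The Person payload uses `https://bbp.neuroshapes.org` as its `@context`, the context resource created earlier: it reuses the neuroshapes context and sets `@vocab` to the project vocabulary. Below is a minimal sketch of the JSON-LD `@vocab` rule for property names, assuming simplified term handling (it is not a JSON-LD processor and ignores prefixes, nested contexts and type coercion). The `terms` mapping is a hypothetical stand-in for whatever the neuroshapes context actually defines.

```python
# Minimal sketch of the @vocab rule (not a full JSON-LD processor):
# keywords stay as-is, terms defined in the context map to their IRI,
# keys containing ":" are treated as IRIs and left untouched in this sketch,
# and any other term is expanded by prefixing it with @vocab.
def expand_key(key: str, terms: dict, vocab_iri: str) -> str:
    if key.startswith("@"):
        return key
    if key in terms:
        return terms[key]
    if ":" in key:
        return key
    return vocab_iri + key

demo_vocab = "https://sandbox.bluebrainnexus.io/v1/vocabs/demo/testdemo/"
# Hypothetical stand-in for a term defined by the remote neuroshapes context
terms = {"name": "http://schema.org/name"}

print(expand_key("name", terms, demo_vocab))
print(expand_key("myCustomField", terms, demo_vocab))
```

In practice this means a typo in a property name does not fail: it silently becomes a property in the project vocabulary.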


## Step 3: Create an article entity and link it to the Person entity as author

The knowledge graph now contains a single entity of type Person. Let's add a scholarly article (publication) authored by that person:

* A scholarly article is an entity of type ScholarlyArticle (@type value)
* A scholarly article has an identifier (@id value)
* A scholarly article has a name (name value)
* A scholarly article has a publishing date (datePublished value)


### Create a ScholarlyArticle entity

In [None]:
# Let's create an entity describing a publication with a DOI: https://doi.org/10.1016/j.cell.2015.09.029
scholarly_article ={
    "@context":"https://bbp.neuroshapes.org",
    "@type":"ScholarlyArticle",
    "@id":"https://doi.org/10.1016/j.cell.2015.09.029",
    "name":"Reconstruction and Simulation of Neocortical Microcircuitry",
    "datePublished":"2015-10"
} 

response = utils.create_resource(nexus,scholarly_article,org,project)
utils.pretty_print(response)

### Link the Person and ScholarlyArticle entities through authorship

In [None]:
# A reference to the Person identifier (value of @id) is enough to link with the article
# Note the revision value change (should be "_rev": 2) after an update
scholarly_article["author"]=person["@id"]
response = utils.update_resource(nexus,scholarly_article["@id"],scholarly_article,org,project)
utils.pretty_print(response)
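Whether a plain string in `author` is read as a link or as a text literal depends on how the context defines `author`. In JSON-LD, an object of the form `{"@id": ...}` is always interpreted as a reference to another node, so it is a safe alternative if the neuroshapes context turns out not to coerce `author` to `@id` (an assumption we have not checked). A sketch with a hypothetical `node_ref` helper and the placeholder person identifier:

```python
import json

# Hypothetical helper: wrap an identifier as an explicit JSON-LD node reference
def node_ref(identifier: str) -> dict:
    return {"@id": identifier}

article = {
    "@context": "https://bbp.neuroshapes.org",
    "@type": "ScholarlyArticle",
    "@id": "https://doi.org/10.1016/j.cell.2015.09.029",
    "name": "Reconstruction and Simulation of Neocortical Microcircuitry",
    "datePublished": "2015-10",
}
# Placeholder person identifier, as in Step 2
article["author"] = node_ref("http://your/id/here")
print(json.dumps(article["author"]))  # → {"@id": "http://your/id/here"}
```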

### Fetch the ScholarlyArticle by identifier to view the update

In [None]:
response = utils.fetch_resource(nexus,scholarly_article["@id"],org,project)
utils.pretty_print(response)

## Step 4: Update the Person entity to add its affiliation

Let's update the Person entity with affiliation information. We will use EPFL as the affiliation and link it to the Person entity through the "affiliation" property.

In this step, we'll:

* Search for the EPFL organization entity and retrieve its identifier
* Update the person entity affiliation to point to the EPFL entity

### Search for the EPFL organization entity and retrieve its identifier and name

The query below is a SPARQL query. Most SPARQL queries you'll see have the same anatomy:

* a SELECT clause that lets you select the variables you want to retrieve
* a WHERE clause defining a set of constraints that the variables must satisfy to be retrieved
* LIMIT and OFFSET clauses to enable pagination
* the constraints are usually graph patterns in the form of triples (?s for subject, ?p for property and ?o for object)
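To make that anatomy concrete, here is a hypothetical helper (not part of `sparqlendpointhelper`) that assembles a SELECT query from its parts. Like the notebook's own queries, it leaves the `vocab:` prefix undeclared, assuming the query endpoint or helper resolves it.

```python
# Hypothetical helper assembling a SPARQL SELECT query from the parts listed above:
# projected variables, triple patterns for the WHERE clause, and LIMIT/OFFSET for pagination.
def build_select(variables, patterns, limit=100, offset=0):
    projection = " ".join(variables) if variables else "*"
    # Triple patterns inside the WHERE block are separated by " ."
    where = " .\n    ".join(patterns)
    return (
        f"SELECT {projection}\n"
        f"WHERE {{\n    {where} .\n}}\n"
        f"LIMIT {limit}\nOFFSET {offset}"
    )

query = build_select(
    ["?institute", "?name"],
    ['?institute vocab:acronym "EPFL"',
     "?institute a vocab:Organization",
     "?institute vocab:name ?name"],
)
print(query)
```

An empty variable list falls back to `SELECT *`, which returns every variable bound in the WHERE clause.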

In [None]:
acronym = "\"EPFL\""


epfl_query = """
Select  ?institute  ?name ?grid_id
WHERE {
    ?institute vocab:acronym %s.
    ?institute a vocab:Organization.
    ?institute vocab:grid_id ?grid_id . 
    ?institute vocab:name ?name . 
}
LIMIT 100
""" % (acronym)

orgs_df = sparqlviewhelper.query_sparql(epfl_query,result_format = "DATAFRAME")
display(orgs_df.head(100))
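The cell above writes the quoted literal by hand (`"\"EPFL\""`) and splices it into the query with `%`. If the value ever comes from user input, escaping it first avoids broken or altered queries. A sketch of a hypothetical `sparql_literal` helper that applies the SPARQL string escapes for backslash, double quote, newline, carriage return and tab:

```python
# Hypothetical helper: turn a Python string into a double-quoted SPARQL string literal.
# The backslash is escaped first so the escapes added afterwards are not doubled.
def sparql_literal(value: str) -> str:
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r")
                    .replace("\t", "\\t"))
    return f'"{escaped}"'

print(sparql_literal("EPFL"))      # → "EPFL"
print(sparql_literal('Say "hi"'))  # → "Say \"hi\""
```

With it, the cell above could set `acronym = sparql_literal("EPFL")` and keep the rest of the query unchanged.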

### Update the person entity affiliation to point to the EPFL entity

In [None]:
# Note the revision value change after an update
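# orgs_df["institute"] is a pandas Series: take a single identifier rather than the whole column
if orgs_df.empty:
    raise ValueError("No EPFL organization found: check the SPARQL query above")
person["affiliation"]=orgs_df["institute"].iloc[0]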


response = utils.update_resource(nexus,person["@id"],person,org,project)
utils.pretty_print(response)

## Step 5: Explore and navigate the created knowledge graph using the SPARQL query language

Let's start with the simplest possible query: match any triple and return the first five.

In [None]:
select_all_query = """
SELECT ?s ?p ?o
WHERE
{
  ?s ?p ?o
}
OFFSET 0
LIMIT 5
"""

results_df = sparqlviewhelper.query_sparql(select_all_query,result_format = "DATAFRAME")

results_df.head()
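LIMIT and OFFSET can be combined to page through a large result set. Without an ORDER BY clause SPARQL does not guarantee the same solution order from one request to the next, so the sketch below (a hypothetical generator, not part of the notebook's helpers) adds one before paging.

```python
# Hypothetical generator: same query, growing OFFSET, stable ORDER BY.
def paged_queries(base_pattern: str, page_size: int, pages: int):
    for page in range(pages):
        yield (
            "SELECT ?s ?p ?o\n"
            f"WHERE {{ {base_pattern} }}\n"
            "ORDER BY ?s ?p ?o\n"
            f"LIMIT {page_size}\nOFFSET {page * page_size}"
        )

for q in paged_queries("?s ?p ?o", page_size=5, pages=3):
    print(q.splitlines()[-1])
```

Each generated string could be passed to `sparqlviewhelper.query_sparql(q, result_format = "DATAFRAME")` as in the cell above; stopping when a page comes back with fewer than `page_size` rows is a common way to detect the end.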


Several triple patterns can be combined in one graph pattern; they are separated by a period. As an example, let's retrieve EPFL (?institute) along with its name (?name) and identifier (?grid_id).

In [None]:
epfl_with_name = """

Select  ?institute ?name ?grid_id
WHERE {
    ?institute vocab:acronym "EPFL".
    ?institute a vocab:Organization.
    ?institute vocab:grid_id ?grid_id . 
    ?institute vocab:name ?name . 
}
LIMIT 100
"""

results_df = sparqlviewhelper.query_sparql(epfl_with_name,result_format = "DATAFRAME")
results_df.head()


This is a typical instance query: entities are filtered by their type(s), and then some of their properties are retrieved (here ?name and ?grid_id).

Let's retrieve everything that organizations link to (outgoing links).
The * character in the SELECT clause retrieves all variables: ?institute, ?p, ?o

In [None]:
org_with_properties = """
Select *
 WHERE  {
    ?institute a vocab:Organization.
    ?institute ?p ?o.
} LIMIT 20
"""

results_df = sparqlviewhelper.query_sparql(org_with_properties,result_format = "DATAFRAME")
results_df.head()
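To see how a basic graph pattern such as `?institute a vocab:Organization . ?institute ?p ?o` is matched, here is a toy in-memory evaluator: each triple pattern is matched against every triple, and the variable bindings are extended only when they agree. The data is hypothetical, and `a` is stored literally here, whereas in real SPARQL it is shorthand for `rdf:type`.

```python
# Toy in-memory evaluation of a basic graph pattern (illustration only, not a SPARQL engine).
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in extended and extended[term] != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None  # constant term does not match
    return extended

def evaluate(patterns, triples):
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Hypothetical toy data
toy_triples = [
    ("ex:epfl", "a", "vocab:Organization"),
    ("ex:epfl", "vocab:acronym", "EPFL"),
    ("ex:jdoe", "vocab:affiliation", "ex:epfl"),
]
outgoing = evaluate([("?institute", "a", "vocab:Organization"),
                     ("?institute", "?p", "?o")], toy_triples)
for row in outgoing:
    print(row["?p"], row["?o"])
```

Only the two triples whose subject is `ex:epfl` survive the join; the affiliation triple points *to* the organization, so it is not an outgoing link.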


As a little exercise, write a query retrieving the entities that link to organizations (incoming links). You can copy and paste the query above and modify it.

Hint: ?s ?p ?o can be read as: ?s is linked to ?o through the outgoing link ?p; in other words, ?o has an incoming link from ?s.

Do you get any results?

In [None]:
#Your query here
