# Querying

This notebook demonstrates Nexus Forge data [querying features](https://nexus-forge.readthedocs.io/en/latest/interaction.html#querying).

In [None]:
from kgforge.core import KnowledgeGraphForge

A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb).

In [None]:
forge = KnowledgeGraphForge("../../configurations/forge.yml")

## Imports

In [None]:
from kgforge.core import Resource
from kgforge.specializations.resources import Dataset

## Retrieval

### latest version

In [6]:
jane = Resource(type="Person", name="Jane Doe")

In [7]:
forge.register(jane)

<action> _register_one
<succeeded> True


In [8]:
resource = forge.retrieve(jane.id)

In [9]:
resource == jane

True

### specific version

In [10]:
jane = Resource(type="Person", name="Jane Doe")

In [11]:
forge.register(jane)

<action> _register_one
<succeeded> True


In [12]:
forge.tag(jane, "v1")

<action> _tag_one
<succeeded> True


In [13]:
jane.email = "jane.doe@epfl.ch"

In [14]:
forge.update(jane)

<action> _update_one
<succeeded> True


In [15]:
try:
    # DemoStore
    print(jane._store_metadata.version)
except AttributeError:
    # BlueBrainNexus
    print(jane._store_metadata._rev)

3


In [16]:
jane_v1 = forge.retrieve(jane.id, version=1)

In [17]:
jane_v1_tag = forge.retrieve(jane.id, version="v1")

In [18]:
jane_v1 == jane_v1_tag

True
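The revision-number and tag lookups above can be pictured with a tiny in-memory model. This is a hypothetical sketch: the `ToyStore` class and its methods are illustrative assumptions, not the kgforge or Blue Brain Nexus implementation. Each update appends a new snapshot, and a tag is a name that points at one revision number. To match the revision count of 3 printed above (register, tag, update), the sketch assumes that tagging also creates a revision with unchanged content.

```python
import copy


class ToyStore:
    """Hypothetical in-memory store illustrating revisions and tags (not kgforge's API)."""

    def __init__(self):
        self._revisions = {}  # resource id -> list of snapshots (index 0 is revision 1)
        self._tags = {}       # (resource id, tag name) -> revision number

    def register(self, rid, data):
        self._revisions[rid] = [copy.deepcopy(data)]

    def update(self, rid, data):
        self._revisions[rid].append(copy.deepcopy(data))

    def tag(self, rid, name):
        # Assumption: the tag points at the current revision, and tagging itself
        # adds a new revision whose content is unchanged.
        self._tags[(rid, name)] = len(self._revisions[rid])
        self._revisions[rid].append(copy.deepcopy(self._revisions[rid][-1]))

    def latest_revision(self, rid):
        return len(self._revisions[rid])

    def retrieve(self, rid, version=None):
        snapshots = self._revisions[rid]
        if version is None:
            return copy.deepcopy(snapshots[-1])  # latest version
        if isinstance(version, str):
            version = self._tags[(rid, version)]  # resolve a tag to a revision number
        return copy.deepcopy(snapshots[version - 1])


store = ToyStore()
store.register("jane", {"type": "Person", "name": "Jane Doe"})
store.tag("jane", "v1")
store.update("jane", {"type": "Person", "name": "Jane Doe", "email": "jane.doe@epfl.ch"})

print(store.latest_revision("jane"))  # 3
print(store.retrieve("jane", version=1) == store.retrieve("jane", version="v1"))  # True
print("email" in store.retrieve("jane"))  # True
```

Retrieving by tag is then just one extra indirection (tag name to revision number) before the same revision lookup, which is why `jane_v1 == jane_v1_tag` holds.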

### crossbucket retrieval
It is possible to retrieve resources stored in buckets other than the configured one, provided the configured store supports it.

In [19]:
resource = forge.retrieve(jane.id, cross_bucket=True) # cross_bucket defaults to False

### error handling

In [20]:
resource = forge.retrieve("123")

<action> retrieve
<error> RetrievalError: 404 Client Error: Not Found for url: https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/%3A%2F%2F123



In [21]:
resource is None

True
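As the two cells above show, a failed `retrieve` reports the error and returns `None` instead of raising, so calling code should check for `None` before using the result. The pattern is sketched below in plain Python; `retrieve_or_none`, `lookup`, and this `RetrievalError` class are hypothetical stand-ins, not kgforge objects.

```python
from typing import Callable, Optional


class RetrievalError(Exception):
    """Hypothetical error type raised by the toy lookup below."""


def retrieve_or_none(lookup: Callable[[str], dict], rid: str) -> Optional[dict]:
    """Call `lookup`; on failure, print a report and return None instead of raising."""
    try:
        return lookup(rid)
    except RetrievalError as error:
        print("<action> retrieve")
        print(f"<error> RetrievalError: {error}")
        return None


resources = {"https://example.org/jane": {"type": "Person", "name": "Jane Doe"}}


def lookup(rid: str) -> dict:
    # Hypothetical backend: unknown ids fail like a 404 would.
    if rid not in resources:
        raise RetrievalError(f"404 Not Found: {rid}")
    return resources[rid]


resource = retrieve_or_none(lookup, "123")
if resource is None:
    print("nothing retrieved, handle the missing resource here")
```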

## Searching

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel.

In [22]:
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [23]:
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [24]:
dataset = Dataset(forge, type="Dataset", contribution=[contribution_jane, contribution_john])
dataset.add_distribution("../../data/associations.tsv")

In [25]:
forge.register(dataset)

<action> _register_one
<succeeded> True


In [None]:
forge.as_json(dataset)

### Paths as filters

The `paths` method loads the data structure for the given type.

Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about modeling and types.

In [41]:
p = forge.paths("Dataset")

`p` supports autocompletion, and its attributes can be used to build search filters.

Note: There is a known issue for RdfModel which requires using `p.type.id` instead of `p.type`.
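An expression such as `p.type.id == "Person"` can only be passed to `forge.search` if attribute access on `p` accumulates a property path and `==` returns a description of a condition rather than a boolean. The following is a simplified, hypothetical sketch of that idea; `PathSketch` and `FilterSketch` are illustrative names, not the classes kgforge uses.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class FilterSketch:
    """A (path, operator, value) triple describing one search condition."""
    path: Tuple[str, ...]
    operator: str
    value: Any


class PathSketch:
    """Accumulates attribute names so that p.type.id becomes the path ('type', 'id')."""

    def __init__(self, parts=()):
        self._parts = tuple(parts)

    def __getattr__(self, name):
        # Called only for names not found normally; private names are not properties.
        if name.startswith("_"):
            raise AttributeError(name)
        return PathSketch(self._parts + (name,))

    def __eq__(self, value):
        # Return a condition object instead of a boolean.
        return FilterSketch(self._parts, "==", value)

    def __ne__(self, value):
        return FilterSketch(self._parts, "!=", value)


p_sketch = PathSketch()
condition = p_sketch.distribution.encodingFormat == "text/tab-separated-values"
print(condition)
```

Nested properties such as `p.distribution.encodingFormat` fall out naturally: each attribute access just extends the path by one segment.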

In [42]:
resources = forge.search(p.type.id=="Person", limit=3)

In [43]:
type(resources)

list

In [44]:
len(resources)

3

In [45]:
forge.as_dataframe(resources)

| | id | type | distribution.type | distribution.atLocation.type | distribution.atLocation.store.id | distribution.contentSize.unitCode | distribution.contentSize.value | distribution.contentUrl | distribution.digest.algorithm | distribution.digest.value | distribution.encodingFormat | distribution.name | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 506.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 9639abc864e91c645779f510ae5c06a1618941d569eb1a... | text/tab-separated-values | associations.tsv | Jane Doe |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 126.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 8eae22fcd5e3b14fd6df3d45b005bda759e78bf8267b63... | text/csv | persons.csv | John Smith |
| 2 | https://www.wikidata.org/wiki/Q937 | Person | | | | | | | | | | | Albert Einstein |


In [46]:
forge.as_json(resources[2])

{'id': 'https://www.wikidata.org/wiki/Q937',
 'type': 'Person',
 'name': 'Albert Einstein'}

In [47]:
forge.as_dataframe(resources, store_metadata=True)

| | id | type | distribution.type | distribution.atLocation.type | distribution.atLocation.store.id | distribution.contentSize.unitCode | distribution.contentSize.value | distribution.contentUrl | distribution.digest.algorithm | distribution.digest.value | ... | _createdBy | _deprecated | _incoming | _outgoing | _project | _rev | _schemaProject | _self | _updatedAt | _updatedBy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 506.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 9639abc864e91c645779f510ae5c06a1618941d569eb1a... | ... | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2021-08-17T11:00:14.662Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 126.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 8eae22fcd5e3b14fd6df3d45b005bda759e78bf8267b63... | ... | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2021-08-17T11:00:14.664Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |
| 2 | https://www.wikidata.org/wiki/Q937 | Person | | | | | | | | | ... | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2021-08-17T11:00:26.713Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |


### Nested property querying

Property autocompletion is available on a path `p` even for nested properties like `p.contribution`.

In [50]:
# Search for resources of type Person with attached files of content type text/tab-separated-values
resources = forge.search(p.type.id == "Person", p.distribution.encodingFormat == "text/tab-separated-values", limit=3)

In [51]:
len(resources)

3

In [52]:
forge.as_dataframe(resources)

| | id | type | distribution.type | distribution.atLocation.type | distribution.atLocation.store.id | distribution.contentSize.unitCode | distribution.contentSize.value | distribution.contentUrl | distribution.digest.algorithm | distribution.digest.value | distribution.encodingFormat | distribution.name | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 506 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 9639abc864e91c645779f510ae5c06a1618941d569eb1a... | text/tab-separated-values | associations.tsv | Jane Doe |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 506 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 9639abc864e91c645779f510ae5c06a1618941d569eb1a... | text/tab-separated-values | associations.tsv | Jane Doe |
| 2 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 506 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 9639abc864e91c645779f510ae5c06a1618941d569eb1a... | text/tab-separated-values | associations.tsv | Jane Doe |


### Dict as filters
A dictionary can be provided for filters:
* `{'type': {'id': 'Dataset'}}` is equivalent to `p.type.id == "Dataset"`
* only the `==` operator is supported
* nested dicts are supported
* the provided properties and values do not have to be defined in the forge model; results are returned whenever the store holds matching data.

This feature is not supported when using the DemoStore.
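The equivalence in the first bullet can be made concrete: a nested dict is flattened into one equality condition per leaf value, with nested keys joined into a property path. The `dict_to_conditions` helper below is a hypothetical illustration of that mapping, not the function kgforge uses internally.

```python
from typing import Any, Dict, List, Tuple


def dict_to_conditions(filters: Dict[str, Any],
                       prefix: Tuple[str, ...] = ()) -> List[Tuple[str, str, Any]]:
    """Flatten a nested filter dict into (dotted path, '==', value) conditions."""
    conditions = []
    for key, value in filters.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # A nested dict descends one level further into the property path.
            conditions.extend(dict_to_conditions(value, path))
        else:
            # Only equality can be expressed with a dict filter.
            conditions.append((".".join(path), "==", value))
    return conditions


print(dict_to_conditions({"type": {"id": "Dataset"}}))
# [('type.id', '==', 'Dataset')]
print(dict_to_conditions({"type": "Person", "name": "Jane Doe"}))
# [('type', '==', 'Person'), ('name', '==', 'Jane Doe')]
```

Because every leaf becomes an `==` condition, there is no place in a plain dict to express other operators, which matches the second bullet.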


In [60]:
# Search for resources of type Person named "Jane Doe"
filters = {"type": "Person", "name": "Jane Doe"}
resources = forge.search(filters, limit=3)

In [61]:
type(resources)

list

In [62]:
len(resources)

3

In [63]:
forge.as_dataframe(resources, store_metadata=True)

| | id | type | distribution.type | distribution.atLocation.type | distribution.atLocation.store.id | distribution.contentSize.unitCode | distribution.contentSize.value | distribution.contentUrl | distribution.digest.algorithm | distribution.digest.value | ... | _createdBy | _deprecated | _incoming | _outgoing | _project | _rev | _schemaProject | _self | _updatedAt | _updatedBy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 506.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 9639abc864e91c645779f510ae5c06a1618941d569eb1a... | ... | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2021-08-17T11:00:14.662Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | | | | | | | | | ... | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2021-08-17T12:28:14.266Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |
| 2 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 126.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 8eae22fcd5e3b14fd6df3d45b005bda759e78bf8267b63... | ... | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2021-08-17T12:28:24.873Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |


### Crossbucket search
It is possible to search for resources stored in buckets other than the configured one, provided the configured store supports it.

In [69]:
resources = forge.search(p.type.id == "Association", limit=3, cross_bucket=True)  # cross_bucket defaults to False

In [70]:
type(resources)

list

In [71]:
len(resources)

3

In [72]:
forge.as_dataframe(resources)

| | id | type | agent.type | agent.gender.id | agent.gender.type | agent.gender.label | agent.name | distribution.type | distribution.atLocation.type | distribution.atLocation.store.id | distribution.contentSize.unitCode | distribution.contentSize.value | distribution.contentUrl | distribution.digest.algorithm | distribution.digest.value | distribution.encodingFormat | distribution.name | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://kg.example.ch/associations/123 | Association | Person | http://purl.obolibrary.org/obo/PATO_0000383 | LabeledOntologyEntity | female | Marie Curie | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 46.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | e0fe65f725bf28fe2b88c7bafb51fb5ef1df0ab14c68a3... | text/plain | marie_curie.txt | Curie Association |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Association | Person | http://purl.obolibrary.org/obo/PATO_0000384 | LabeledOntologyEntity | male | Albert Einstein | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | bytes | 50.0 | https://sandbox.bluebrainnexus.io/v1/files/git... | SHA-256 | 91a5ce5c84dc5bead730a4b49d0698b4aaef4bc06ce164... | text/plain | albert_einstein.txt | Einstein Association |
| 2 | https://sandbox.bluebrainnexus.io/v1/resources... | Association | Person | | | | Jane Doe | | | | | | | | | | | |


In [None]:
# Furthermore, it is possible to filter by bucket when cross_bucket is set to True.
# Setting a bucket value when cross_bucket is False will trigger a not_supported exception.
bucket = "<str>"  # replace with the bucket to search in
resources = forge.search(p.type.id == "Person", limit=3, cross_bucket=True, bucket=bucket)
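The three bucket behaviours described above (configured bucket only, any bucket, or one chosen bucket) and the rule that a `bucket` filter requires `cross_bucket=True` can be modelled as argument checks over an in-memory list. Everything here is a hypothetical illustration: `search_sketch`, the record layout, and the bucket names are assumptions, and the sketch raises a plain `ValueError` where forge reports its own not-supported exception.

```python
from typing import Dict, List, Optional

CONFIGURED_BUCKET = "my-org/my-project"  # hypothetical configured bucket

RECORDS: List[Dict[str, str]] = [
    {"bucket": "my-org/my-project", "type": "Person", "name": "Jane Doe"},
    {"bucket": "other-org/other-project", "type": "Person", "name": "John Smith"},
    {"bucket": "other-org/archive", "type": "Person", "name": "Marie Curie"},
]


def search_sketch(type_: str, limit: int, cross_bucket: bool = False,
                  bucket: Optional[str] = None) -> List[Dict[str, str]]:
    """Toy search honouring the cross_bucket / bucket rules described above."""
    if bucket is not None and not cross_bucket:
        raise ValueError("filtering by bucket requires cross_bucket=True")
    if cross_bucket:
        # Any bucket is eligible, optionally narrowed down to a single one.
        eligible = [r for r in RECORDS if bucket is None or r["bucket"] == bucket]
    else:
        # Default: only the configured bucket is searched.
        eligible = [r for r in RECORDS if r["bucket"] == CONFIGURED_BUCKET]
    return [r for r in eligible if r["type"] == type_][:limit]


print([r["name"] for r in search_sketch("Person", limit=3)])
print([r["name"] for r in search_sketch("Person", limit=3, cross_bucket=True)])
```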

In [None]:
type(resources)

In [None]:
len(resources)

In [None]:
forge.as_dataframe(resources)

## Graph traversing

SPARQL is used as a query language.

A SPARQL query rewriting strategy lets users write simplified queries without prefix declarations, prefixed names, or long IRIs. With this strategy, users only need to provide type and property names. For a given entity type, these names can be found in its template.

Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about templates.

Note: DemoStore doesn't implement SPARQL operations yet. Please use another store for this section.

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel.

In [76]:
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [77]:
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [78]:
association = Resource(type="Dataset", contribution=[contribution_jane, contribution_john])

In [79]:
forge.register(association)

<action> _register_one
<succeeded> True


In [80]:
forge.template("Dataset")

{
    id: ""
    type:
    {
        id: ""
    }
    annotation:
    {
        id: ""
        type: Annotation
        hasBody:
        {
            id: ""
            type: AnnotationBody
            label: ""
            note: ""
        }
        hasTarget:
        {
            id: ""
            type: AnnotationTarget
        }
        note: ""
    }
    brainLocation:
    {
        id: ""
        type: BrainLocation
        atlasSpatialReferenceSystem:
        {
            id: ""
            type: AtlasSpatialReferenceSystem
        }
        brainRegion:
        {
            id: ""
            label: ""
        }
        coordinatesInBrainAtlas:
        {
            id: ""
            valueX: 0.0
            valueY: 0.0
            valueZ: 0.0
        }
        coordinatesInSlice:
        {
            spatialReferenceSystem:
            {
                id: ""
                type: SpatialReferenceSystem
            }
            valueX: 0.0
            valueY: 0.0
            ...

### Prefix and namespace free SPARQL query

When a forge RdfModel is configured, there is no need to provide prefixes and namespaces when writing a SPARQL query. Prefixes and namespaces are automatically inferred from the provided schemas and/or JSON-LD context, and the query is rewritten accordingly.
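A much simplified picture of this rewriting: bare type and property names are replaced by prefixed names looked up in a context, and `PREFIX` declarations for the namespaces actually used are prepended. The sketch below is hypothetical: `CONTEXT`, `PREFIXES`, and `rewrite` are illustrative, the mappings are copied from the rewritten query shown in the debug output later in this notebook, and the regex approach only handles simple triple patterns (no IRIs, literals, or comments). Forge derives its mappings from the configured schemas and JSON-LD context instead.

```python
import re

# Hypothetical context: bare term -> prefixed name.
CONTEXT = {
    "Dataset": "schema:Dataset",
    "contribution": "nsg:contribution",
    "agent": "prov:agent",
    "name": "schema:name",
}
# Hypothetical prefix -> namespace table.
PREFIXES = {
    "nsg": "https://neuroshapes.org/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
}


def rewrite(query):
    """Replace bare context terms with prefixed names and prepend PREFIX declarations."""
    used = set()

    def substitute(match):
        term = match.group(0)
        if term not in CONTEXT:
            return term  # keywords such as SELECT, WHERE or 'a' are left untouched
        prefixed = CONTEXT[term]
        used.add(prefixed.split(":", 1)[0])
        return prefixed

    # Bare words not preceded by '?', ':' or a word character (so ?name stays a variable).
    body = re.sub(r"(?<![?\w:])[A-Za-z_]\w*", substitute, query)
    header = "".join(f"PREFIX {p}: <{PREFIXES[p]}>\n" for p in sorted(used))
    return header + body


print(rewrite("""
SELECT ?id ?name
WHERE {
    ?id a Dataset ;
    contribution/agent ?contributor.
    ?contributor name ?name.
}
"""))
```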

In [81]:
query = """
    SELECT ?id ?name
    WHERE {
        ?id a Dataset ;
        contribution/agent ?contributor.
        ?contributor name ?name.
    }
"""

In [82]:
resources = forge.sparql(query, limit=3)

In [83]:
type(resources)

list

In [84]:
len(resources)

3

In [85]:
type(resources[0])

kgforge.core.resource.Resource

In [86]:
forge.as_dataframe(resources)

| | id | name |
|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | John Smith |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Jane Doe |
| 2 | https://sandbox.bluebrainnexus.io/v1/resources... | John Smith |


### rewritten query display
When a forge model is configured, passing `debug=True` displays the rewritten form of the prefix-free SPARQL query above as the cell output.

In [87]:
resources = forge.sparql(query, limit=3, debug=True)

Submitted query:
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX vann: <http://purl.org/vocab/vann/>
   PREFIX void: <http://rdfs.org/ns/void#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX : <https://neuroshapes.org/>
   
       SELECT ?id ?name
       WHERE {
           ?id a schema:Dataset ;
           nsg:contribution/prov:agent ?contributor.
           ?contributor sc

### SPARQL query

A regular SPARQL query, with explicit prefix declarations, can also be provided.

In [88]:
query = """
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
    PREFIX nsg: <https://neuroshapes.org/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <http://schema.org/>
    PREFIX sh: <http://www.w3.org/ns/shacl#>
    PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX vann: <http://purl.org/vocab/vann/>
    PREFIX void: <http://rdfs.org/ns/void#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX : <https://neuroshapes.org/>
    SELECT ?id ?name
    WHERE {
        ?id a schema:Dataset ;
        nsg:contribution/prov:agent ?contributor.
        ?contributor schema:name ?name.
    }
"""

In [89]:
resources = forge.sparql(query, limit=3)

In [90]:
type(resources)

list

In [91]:
len(resources)

3

In [92]:
type(resources[0])

kgforge.core.resource.Resource

In [93]:
forge.as_dataframe(resources)

| | id | name |
|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | John Smith |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Jane Doe |
| 2 | https://sandbox.bluebrainnexus.io/v1/resources... | John Smith |


## Downloading

Note: DemoStore doesn't implement file operations yet. Please use another store for this section.

In [94]:
jane = Resource(type="Person", name="Jane Doe")

In [95]:
! ls -p ../../data | grep -v '/$'

associations.tsv
my_data.xwz
my_data_derived.txt
persons.csv
tfidfvectorizer_model_schemaorg_linking


In [96]:
distribution = forge.attach("../../data")

In [97]:
association = Resource(type="Association", agent=jane, distribution=distribution)

In [98]:
forge.register(association)

<action> _register_one
<succeeded> True


In [99]:
# The argument overwrite: bool can be provided to decide whether to overwrite (True) existing files with the same name
# or to create new ones (False) with their names suffixed with a timestamp.
# A cross_bucket argument can be provided to download data from the configured bucket (cross_bucket=False, the default)
# or from a bucket other than the configured one (cross_bucket=True). The configured store must support crossing buckets for this to work.
forge.download(association, "distribution.contentUrl", "./downloaded/")
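The timestamp suffix mentioned in the comment above is visible in the `ls` listing that follows: a repeated download of `associations.tsv` produced `associations.tsv.20210823111849`, the original name followed by a `YYYYMMDDHHMMSS` timestamp. A minimal sketch of that naming rule, assuming the suffix is `.` plus the local time formatted as `%Y%m%d%H%M%S` (inferred from the listing, not taken from the forge source); `target_path` is a hypothetical helper.

```python
import os
import tempfile
from datetime import datetime
from typing import Optional


def target_path(directory: str, filename: str, overwrite: bool,
                now: Optional[datetime] = None) -> str:
    """Return where a downloaded file would be written under the assumed naming rule."""
    path = os.path.join(directory, filename)
    if overwrite or not os.path.exists(path):
        return path
    # Keep the existing file and suffix the new one with a timestamp (assumed format).
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return f"{path}.{stamp}"


with tempfile.TemporaryDirectory() as folder:
    first = target_path(folder, "associations.tsv", overwrite=False)
    open(first, "w").close()  # simulate the first download
    second = target_path(folder, "associations.tsv", overwrite=False,
                         now=datetime(2021, 8, 23, 11, 18, 49))
    print(os.path.basename(second))  # associations.tsv.20210823111849
```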

In [100]:
! ls -l ./downloaded/

total 464
-rw-r--r--  1 mfsy  staff     506 Aug 23 11:18 associations.tsv
-rw-r--r--  1 mfsy  staff     506 Aug 23 11:18 associations.tsv.20210823111849
-rw-r--r--  1 mfsy  staff     477 Aug 23 11:35 associations.tsv.20210823113551
-rw-r--r--  1 mfsy  staff      16 Aug 23 11:35 my_data.xwz
-rw-r--r--  1 mfsy  staff      24 Aug 23 11:35 my_data_derived.txt
-rw-r--r--  1 mfsy  staff      52 Aug 23 11:18 persons.csv
-rw-r--r--  1 mfsy  staff      52 Aug 23 11:35 persons.csv.20210823113551
-rw-r--r--  1 mfsy  staff  204848 Aug 23 11:35 tfidfvectorizer_model_schemaorg_linking


In [None]:
# ! rm -R ./downloaded/