# Querying

This notebook demonstrates how to retrieve, query and search data using the Forge.

In [None]:
from kgforge.core import KnowledgeGraphForge

In [None]:
forge = KnowledgeGraphForge("../../configurations/forge.yml")

## Imports

In [None]:
from kgforge.core import Resource

## Retrieval

In [None]:
jane = Resource(type="Person", name="Jane Doe")

In [None]:
forge.register(jane)

In [None]:
resource = forge.retrieve(jane.id)

In [None]:
resource == jane

### specific version

In [None]:
jane = Resource(type="Person", name="Jane Doe")

In [None]:
forge.register(jane)

In [None]:
forge.tag(jane, "v1")

In [None]:
jane.email = "jane.doe@epfl.ch"

In [None]:
forge.update(jane)

In [None]:
try:
    # DemoStore
    print(jane._store_metadata.version)
except AttributeError:
    # BlueBrainNexus
    print(jane._store_metadata._rev)

In [None]:
jane_v1 = forge.retrieve(jane.id, version=1)

In [None]:
jane_v1_tag = forge.retrieve(jane.id, version="v1")

In [None]:
jane_v1 == jane_v1_tag

### crossbucket retrieval
Resources stored in buckets other than the configured one can also be retrieved, provided the configured store supports it.

In [None]:
resource = forge.retrieve(jane.id, cross_bucket=True) # cross_bucket defaults to False

### error handling

In [None]:
resource = forge.retrieve("123")

In [None]:
resource is None

## Searching

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel.

In [None]:
jane = Resource(type="Person", name="Jane Doe")

In [None]:
john = Resource(type="Person", name="John Smith")

In [None]:
# association_jane = Resource(type="Association", agent=jane)
association_jane = Resource(type="Dataset", contribution=jane)

In [None]:
# association_john = Resource(type="Association", agent=john)
association_john = Resource(type="Dataset", contribution=john)

In [None]:
associations = [association_jane, association_john]

In [None]:
forge.register(associations)

The `paths` method loads the data structure for the given type.

Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about modeling and types.

In [None]:
# p = forge.paths("Association")
p = forge.paths("Dataset")

You have autocompletion on `p` and this can be used to build a search.

Note: There is a known issue for RdfModel which requires using `p.type.id` instead of `p.type`.
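
To see why an expression such as `p.type.id == "Dataset"` can be handed to a search function, it helps to picture a path object whose `==` returns a filter description instead of a boolean. The toy sketch below illustrates that idea only; `Path` and `Filter` here are simplified, hypothetical classes and not kgforge's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Filter:
    """One search condition: a property path, an operator, and a value."""
    path: Tuple[str, ...]
    operator: str
    value: Any


class Path:
    """Toy path object: attribute access extends the path, == builds a Filter."""

    def __init__(self, *parts):
        self._parts = parts

    def __getattr__(self, name):
        # Only called for attributes not found normally, e.g. p.type or p.type.id.
        if name.startswith("_"):
            raise AttributeError(name)
        return Path(*self._parts, name)

    def __eq__(self, value):
        # Return a Filter rather than a bool, so the comparison
        # can be passed to a search function as data.
        return Filter(self._parts, "eq", value)


p = Path()
condition = p.type.id == "Dataset"
print(condition)
```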

In [None]:
# resources = forge.search(p.type == "Association")

resources = forge.search(p.type.id == "Dataset", limit=3)

In [None]:
type(resources)

In [None]:
len(resources)

In [None]:
forge.as_dataframe(resources)

In [None]:
forge.as_dataframe(resources, store_metadata=True)

### crossbucket search
Resources stored in buckets other than the configured one can also be searched, provided the configured store supports it.

In [None]:
resources = forge.search(p.type.id == "Dataset", limit=10, cross_bucket=True)  # cross_bucket defaults to False

In [None]:
type(resources)

In [None]:
len(resources)

In [None]:
forge.as_dataframe(resources)

In [None]:
# When cross_bucket=True, results can also be filtered by bucket.
# Setting a bucket value while cross_bucket is False triggers a not_supported exception.
resources = forge.search(p.type.id == "Dataset", limit=3, cross_bucket=True, bucket=<str>)  # replace <str> with a bucket

In [None]:
type(resources)

In [None]:
len(resources)

In [None]:
forge.as_dataframe(resources)
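
The rule stated in the comment above (a `bucket` value is only accepted together with `cross_bucket=True`) can be checked before calling the store. The helper below is a minimal sketch under that assumption; `build_search_params` is a hypothetical function, `ValueError` is only a stand-in for the store's own not-supported error, and `"org/project"` is a made-up bucket name.

```python
def build_search_params(limit=None, cross_bucket=False, bucket=None):
    """Hypothetical helper: validate and collect keyword arguments for forge.search."""
    if bucket is not None and not cross_bucket:
        # Stand-in for the not-supported error mentioned above.
        raise ValueError("bucket can only be set when cross_bucket=True")
    params = {"cross_bucket": cross_bucket}
    if limit is not None:
        params["limit"] = limit
    if bucket is not None:
        params["bucket"] = bucket
    return params


# "org/project" is an illustrative bucket name, not taken from this notebook.
params = build_search_params(limit=3, cross_bucket=True, bucket="org/project")
print(params)
```

The resulting dictionary could then be unpacked into the call, for example `forge.search(p.type.id == "Dataset", **params)`.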

### nested field querying

Autocompletion works on `p` and also on nested properties such as `p.agent` (DemoModel) or `p.contribution` (RdfModel).

Note: There is a known issue for RdfModel which prevents searching on the name.

In [None]:
# resources = forge.search(p.type == "Association", p.agent.name == "John Smith")
resources = forge.search(p.type.id == "Dataset", p.contribution.type == "Person", limit=3)

In [None]:
len(resources)

In [None]:
forge.as_dataframe(resources)

## Graph traversing

SPARQL is used as a query language.

A SPARQL query rewriting strategy lets users write simplified queries without prefix declarations, prefixed names or long IRIs. With this strategy, the user only needs to provide type and property names. For a given entity, these names can be found in its template.

Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about templates.

Note: DemoStore doesn't implement SPARQL operations yet. Please use another store for this section.

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel.
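
The rewriting idea described above can be pictured as a lookup that replaces bare type and property names with full IRIs while leaving variables and SPARQL keywords untouched. The following is a toy sketch of that idea only, not kgforge's actual rewriting; `expand_names` is a hypothetical function and the `example.org` IRIs in `CONTEXT` are placeholders.

```python
import re

# Hypothetical context: short names mapped to placeholder IRIs.
CONTEXT = {
    "Dataset": "http://example.org/Dataset",
    "contribution": "http://example.org/contribution",
    "name": "http://example.org/name",
}

# Matches bare words; the lookbehind skips words that are part of ?variables.
_TOKEN = re.compile(r"(?<![?\w])[A-Za-z_]\w*")


def expand_names(query, context):
    """Replace each bare name found in `context` with its IRI in angle brackets."""

    def substitute(match):
        word = match.group(0)
        # Keywords such as SELECT or `a` are not in the context and stay as-is.
        return f"<{context[word]}>" if word in context else word

    return _TOKEN.sub(substitute, query)


simplified = """
    SELECT ?id ?name
    WHERE {
        ?id a Dataset ;
            contribution ?contributor .
        ?contributor name ?name .
    }
"""
print(expand_names(simplified, CONTEXT))
```

Note that the variable `?name` is preserved even though `name` is a known property, because the pattern ignores any word preceded by `?`.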

In [None]:
jane = Resource(type="Person", name="Jane Doe")

In [None]:
john = Resource(type="Person", name="John Smith")

In [None]:
# association = Resource(type="Association", agent=[jane, john])
association = Resource(type="Dataset", contribution=[jane, john])

In [None]:
forge.register(association)

In [None]:
# forge.template("Association")
forge.template("Dataset")

In [None]:
# query = """
#     SELECT ?id ?name
#     WHERE {
#         ?id a Association ;
#             agent ?agent .
#         ?agent name ?name .
#     }
# """
query = """
    SELECT ?id ?name
    WHERE {
        ?id a Dataset ;
            contribution ?contributor .
        ?contributor name ?name .
    }
"""

In [None]:
resources = forge.sparql(query, limit=3)

In [None]:
type(resources)

In [None]:
len(resources)

In [None]:
type(resources[0])

In [None]:
forge.as_dataframe(resources)

### rewritten query display

In [None]:
resources = forge.sparql(query, limit=3, debug=True)

## Downloading

Note: DemoStore doesn't implement file operations yet. Please use another store for this section.

In [None]:
jane = Resource(type="Person", name="Jane Doe")

In [None]:
! ls -p ../../data | egrep -v /$

In [None]:
distribution = forge.attach("../../data")

In [None]:
association = Resource(type="Association", agent=jane, distribution=distribution)

In [None]:
forge.register(association)

In [None]:
forge.download(association, "distribution.contentUrl", "./downloaded/")

In [None]:
! ls -l ./downloaded/

In [None]:
# ! rm -R ./downloaded/