# Searching and Downloading Data from the Blue Brain Knowledge Graph at a given tag or from a given view

## Initialize and configure

### Get an authentication token

For now, use the [Nexus web application](https://bbp.epfl.ch/nexus/web) to get a token. Simpler alternatives are being investigated.

- Step 1: On the web page that opens, click the login button in the right corner and follow the instructions.

![login-ui](./login-ui.png)

- Step 2: Once logged in, a token button appears in the right corner. Click it to copy the token.

![copy-token](./copy-token.png)


Once you have a token, paste it below when prompted.

In [None]:
import getpass

In [None]:
TOKEN = getpass.getpass()
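
A stale token is a common cause of authentication errors. The sketch below checks a token's expiry before it is used. It assumes the token is a JWT (three dot-separated base64url parts), which is typical for OpenID Connect access tokens but is not confirmed by this notebook. The helper names `token_expiry` and `is_expired` are hypothetical, and the signature is not verified.

```python
import base64
import json
import time


def token_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    Assumption: the token is a JWT. Only the payload is read; the
    signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("exp")


def is_expired(token, now=None):
    """Return True if the token has an `exp` claim that is in the past."""
    exp = token_expiry(token)
    now = time.time() if now is None else now
    return exp is not None and exp <= now
```

Calling `is_expired(TOKEN)` before creating the forge gives an early hint that a fresh token should be copied from the web application.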

### Configure a client (forge) to access the knowledge graph 

In [None]:
from kgforge.core import KnowledgeGraphForge

In [None]:
# Let's target the sscx dissemination project in Nexus. Different values for ORG and PROJECT can be set.
bucket = "bbp/lnmce"
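
The bucket string combines an organisation and a project as `ORG/PROJECT`. As a small sketch, a hypothetical helper `split_bucket` (not part of kgforge) can catch a malformed value before it reaches the forge:

```python
def split_bucket(bucket):
    """Split an "ORG/PROJECT" bucket string into (org, project).

    Hypothetical helper for illustration; raises ValueError when the
    string does not have exactly two non-empty parts.
    """
    parts = bucket.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'ORG/PROJECT', got {bucket!r}")
    return parts[0], parts[1]


org, project = split_bucket("bbp/lnmce")
print(org, project)  # bbp lnmce
```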

In [None]:
forge = KnowledgeGraphForge("prod-forge-nexus.yml", bucket=bucket, token=TOKEN)

## Search and Download

In [None]:
# List available data types from the BBP Knowledge Graph
forge.types()

### Data at a given tag
Tagged data are data with immutable identifiers. Such an identifier guarantees that you retrieve the data in the state they were in when the tag was created. A tag here is similar to a git tag.
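
To make the idea concrete, here is a toy in-memory model of a tag pinning one revision of a resource. It only illustrates the semantics described above; the class `VersionedResource` is hypothetical and is not how Nexus stores or resolves tags.

```python
class VersionedResource:
    """Toy resource: each update appends a revision; a tag pins one revision."""

    def __init__(self, initial):
        self._revisions = [dict(initial)]  # revision numbers start at 1
        self._tags = {}

    def update(self, **changes):
        """Create a new revision from the latest one plus the given changes."""
        latest = dict(self._revisions[-1])
        latest.update(changes)
        self._revisions.append(latest)

    def tag(self, name):
        """Pin the tag name to the current (latest) revision number."""
        self._tags[name] = len(self._revisions)

    def get(self, tag=None):
        """Return the latest revision, or the revision pinned by `tag`."""
        if tag is None:
            return dict(self._revisions[-1])
        return dict(self._revisions[self._tags[tag] - 1])


trace = VersionedResource({"name": "trace-1", "note": "draft"})
trace.tag("LNMCE2021")
trace.update(note="revised")

print(trace.get(tag="LNMCE2021")["note"])  # draft
print(trace.get()["note"])                 # revised
```

Later updates do not change what the tag returns, which is the guarantee that retrieval at a tag relies on.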

#### Set tag value

In [None]:
tag = "LNMCE2021"

#### Set filters

In [None]:
# Let's search for Electrophysiology Traces
_type = "Trace"
# classification_type = ":EType"
# eType = "bIR"
brainRegion = "primary somatosensory cortex"
encodingFormat = "application/nwb"
limit = 10

#### Run Query

In [None]:
# `path` must be defined before it is used in the filters; it also gives autocompletion on the properties
path = forge.paths("Dataset")
data = forge.search(path.type.id == _type,
                    # path.annotation.hasBody.type.id == classification_type,  # Known issue: use path.annotation.hasBody.type.id in case of error: AttributeError: 'PathWrapper' object has no attribute '_path'
                    # path.annotation.hasBody.label == eType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

#### Retrieve results at the set tag

In [None]:
results = []
for d in data:
    r = forge.retrieve(d.id, version=tag)
    if r:
        results.append(r)
print(f"{len(results)} data of type '{_type}' found at tag {tag}.")
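
The loop above silently drops resources that have no state at the tag. A possible variant, sketched below, also records which identifiers were not found. It assumes, as the loop does, that a falsy return value means "not available at this tag". `retrieve_at_tag` is a hypothetical helper that accepts any callable with the same shape as `forge.retrieve`, and the stub used to exercise it is made up for illustration.

```python
def retrieve_at_tag(ids, retrieve, tag):
    """Return (found, missing) for the given ids at `tag`.

    `retrieve` is any callable taking (id, version=...) and returning a
    resource or a falsy value, e.g. `forge.retrieve`.
    """
    found, missing = [], []
    for resource_id in ids:
        resource = retrieve(resource_id, version=tag)
        if resource:
            found.append(resource)
        else:
            missing.append(resource_id)
    return found, missing


# Stub standing in for forge.retrieve, with made-up identifiers.
_store = {"id-1": {"id": "id-1", "name": "trace A"}}


def fake_retrieve(resource_id, version=None):
    return _store.get(resource_id)


found, missing = retrieve_at_tag(["id-1", "id-2"], fake_retrieve, "LNMCE2021")
print(len(found), missing)  # 1 ['id-2']
```

With a real forge, the call would be `retrieve_at_tag([d.id for d in data], forge.retrieve, tag)`.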

#### Display the results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(results, keep=[
    "id", "name", "subject",
    "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
    "brainLocation.layer.id", "brainLocation.layer.label",
    "contribution",
    "distribution.name", "distribution.contentUrl", "distribution.encodingFormat",
])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])
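
The `keep` argument lists dotted property paths to retain. As a rough illustration of what keeping dotted paths means, the sketch below applies the same idea to a plain nested dictionary. `keep_paths` is a hypothetical helper and the record is made up; this is not kgforge's implementation of `reshape`.

```python
def keep_paths(record, paths):
    """Return a new nested dict containing only the given dotted paths."""
    out = {}
    for path in paths:
        keys = path.split(".")
        value = record
        try:
            for key in keys:
                value = value[key]
        except (KeyError, TypeError):
            continue  # path absent in this record: skip it
        target = out
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return out


record = {
    "id": "t1",
    "name": "trace",
    "brainLocation": {
        "brainRegion": {"id": "r1", "label": "primary somatosensory cortex"},
        "layer": {"label": "L5"},
    },
    "extra": 1,
}

print(keep_paths(record, ["id", "brainLocation.brainRegion.label", "subject"]))
# {'id': 't1', 'brainLocation': {'brainRegion': {'label': 'primary somatosensory cortex'}}}
```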

#### Download

In [None]:
dirpath = "./downloaded/"
forge.download(results, "distribution.contentUrl", dirpath)
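
After a download it can be useful to confirm what actually landed on disk. The sketch below lists every file under a directory with its size, using only the standard library. `list_downloads` is a hypothetical helper and makes no assumption about how kgforge names the downloaded files.

```python
from pathlib import Path


def list_downloads(dirpath):
    """Return sorted (relative path, size in bytes) pairs for files under dirpath."""
    root = Path(dirpath)
    if not root.is_dir():
        return []
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


for name, size in list_downloads("./downloaded/"):
    print(f"{size:>12} bytes  {name}")
```

An empty listing, or files of zero bytes, suggests that the download step should be re-run.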

### Data in a given view
A view exposes a subset of data for query and access in specialised indices (SPARQL, ElasticSearch).

In [None]:
# Here is an example of a view URL
view_url = "https://bluebrain.github.io/nexus/vocabulary/lnmce2020SparqlIndex"
bucket = "bbp/lnmce"

In [None]:
searchendpoints = {"sparql": {"endpoint": view_url}}
forge_view = KnowledgeGraphForge("prod-forge-nexus.yml", bucket=bucket, token=TOKEN, searchendpoints=searchendpoints)

#### Set filters

In [None]:
# Let's search for Electrophysiology Traces
_type = "Trace"
classification_type = ":EType"
eType = "bIR"
brainRegion = "primary somatosensory cortex"
encodingFormat = "application/nwb"
limit = 10

#### Run Query

In [None]:
# `path` must be defined before it is used in the filters; it also gives autocompletion on the properties
path = forge_view.paths("Dataset")
data = forge_view.search(path.type.id == _type,
                         # path.annotation.hasBody.type == classification_type,  # Known issue: use path.annotation.hasBody.type.id in case of error: AttributeError: 'PathWrapper' object has no attribute '_path'
                         path.annotation.hasBody.label == eType,
                         path.brainLocation.brainRegion.label == brainRegion,
                         path.distribution.encodingFormat == encodingFormat,
                         limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

#### Display the results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge_view.reshape(data, keep=[
    "id", "name", "subject",
    "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
    "brainLocation.layer.id", "brainLocation.layer.label",
    "contribution",
    "distribution.name", "distribution.contentUrl", "distribution.encodingFormat",
])

forge_view.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [None]:
dirpath = "./downloaded/"
forge_view.download(data, "distribution.contentUrl", dirpath)