# Querying external database sources of interest

* Enable users to integrate data from external databases of interest into the BBP KG
* Use the Nexus Forge interface and the BMO vocabulary as much as possible
* Benefit from out-of-the-box (meta)data transformations that make the data ready for BBP internal pipelines and applications
* Demo with Mouselight, NeuroElectro, UniProt

In [1]:
import os
import json

from kgforge.core import KnowledgeGraphForge
from kgforge.specializations.resources import Dataset

In [2]:
# import getpass
# TOKEN = getpass.getpass()

In [3]:
endpoint = "https://staging.nise.bbp.epfl.ch/nexus/v1"
BUCKET = "dke/kgforge"
forge = KnowledgeGraphForge("../../configurations/database-sources/prod-nexus-sources.yml", endpoint=endpoint, bucket=BUCKET)

# List of data sources

In [4]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
NeuroMorpho


In [5]:
sources = forge.db_sources()

In [6]:

data = {
    'origin': 'store',
    'source': 'DemoStore',
    'model': {
        'name': 'DemoModel',
        'origin': 'directory',
        'source': "../../../tests/data/demo-model/"
    }
}


In [7]:
from kgforge.specializations.databases import StoreDatabase
ds = StoreDatabase(forge, name="DemoDB", **data)

In [8]:
forge.add_db_source(ds)

In [9]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
NeuroMorpho
DemoDB


# Data source metadata

In [10]:
neuroelectro = sources['NeuroElectro']

## Get data mappings (which hold the transformation logic) per data type

* Data mappings transform the results obtained from external data sources so that they are ready for consumption by BBP tools
* They also perform automatic ontology linking
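
To illustrate what ontology linking means here, the following is a minimal sketch in plain Python. It does not use the Nexus Forge API: `ONTOLOGY_TERMS` and `link_term` are hypothetical names, and the lookup table holds a single entry chosen for the example. The idea is that a free-text label coming from an external source is replaced by a reference to an ontology term when a match is known, and kept as a plain label otherwise.

```python
# Hypothetical lookup table from normalized labels to ontology term ids.
ONTOLOGY_TERMS = {
    "neocortex": "http://purl.obolibrary.org/obo/UBERON_0001950",
}


def link_term(label):
    """Return the label with its ontology id when known, else the label alone."""
    linked = {"label": label}
    term_id = ONTOLOGY_TERMS.get(label.strip().lower())
    if term_id is not None:
        linked["id"] = term_id
    return linked


print(link_term("Neocortex"))
print(link_term("some unknown region"))
```

A real mapping would resolve labels against an ontology service rather than a hard-coded dictionary; the dictionary only keeps the sketch self-contained.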

In [11]:
forge.mappings(source="NeuroElectro")

Managed mappings for the data source per entity type and mapping type:
   - ElectrophysiologicalFeatureAnnotation:
        * DictionaryMapping
   - ParameterAnnotation:
        * DictionaryMapping
   - ParameterBody:
        * DictionaryMapping
   - ScholarlyArticle:
        * DictionaryMapping
   - SeriesBody:
        * DictionaryMapping


In [12]:
forge.mappings('UniProt')

Managed mappings for the data source per entity type and mapping type:
   - Gene:
        * DictionaryMapping
   - Protein:
        * DictionaryMapping


In [13]:
from kgforge.specializations.mappings import DictionaryMapping
mapping = forge.mapping(entity="ScholarlyArticle", source="NeuroElectro")

In [14]:
print(mapping)

{
    id: forge.format("identifier", "scholarlyarticles", x.id)
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: x.abstract
    author: x.authors_shaped
    datePublished: x.date_issued
    identifier: x.identifiers
    isPartOf:
    {
        type: Periodical
        issn: x.issn
        name: x.journal
        publisher: x.publisher
    }
    name: f"article_{x.id}"
    sameAs: x.full_text_link
    title: x.title
    url: x.full_text_link
}
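
In the printed mapping, `x` stands for the raw record returned by the source, and each right-hand side is an expression evaluated against it. The sketch below re-creates a few of those rules by hand in plain Python to show the effect. It is not how Nexus Forge evaluates mappings: `format_identifier` is a hypothetical stand-in for `forge.format("identifier", ...)`, the base URL is an assumption, and the raw record is invented.

```python
from types import SimpleNamespace

# Assumed base for generated identifiers (an assumption for this sketch).
BASE = "https://bbp.epfl.ch/neurosciencegraph/data"


def format_identifier(collection, local_id):
    """Hypothetical stand-in for forge.format("identifier", collection, local_id)."""
    return f"{BASE}/{collection}/{local_id}"


def map_article(x):
    """Hand-written equivalent of a few rules from the ScholarlyArticle mapping."""
    return {
        "id": format_identifier("scholarlyarticles", x.id),
        "type": ["Entity", "ScholarlyArticle"],
        "name": f"article_{x.id}",
        "title": x.title,
        "isPartOf": {"type": "Periodical", "name": x.journal},
    }


# Invented raw record; attribute access mirrors the x.<field> notation.
raw_article = SimpleNamespace(id=91941, title="An example title", journal="An example journal")
article = map_article(raw_article)
print(article["id"])
print(article["name"])
```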


In [15]:
forge.db_sources(type_='Gene', pretty=True)

<action> db_sources
<error> AttributeError: 'StoreDatabase' object has no attribute 'datatypes'
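
The error suggests that filtering by type reads a `datatypes` attribute that the newly added `StoreDatabase` does not expose. As a sketch only (this is not the Nexus Forge implementation, and both classes below are hypothetical stand-ins), a type filter can tolerate such sources by falling back to an empty list of types:

```python
class SourceWithTypes:
    """Hypothetical source that declares the entity types it can return."""

    def __init__(self, name, datatypes):
        self.name = name
        self.datatypes = datatypes


class SourceWithoutTypes:
    """Hypothetical source that exposes no 'datatypes' attribute."""

    def __init__(self, name):
        self.name = name


def sources_for_type(sources, type_):
    """Names of sources declaring type_; sources without 'datatypes' are skipped."""
    return [s.name for s in sources if type_ in getattr(s, "datatypes", [])]


demo_sources = [
    SourceWithTypes("UniProt", ["Gene", "Protein"]),
    SourceWithoutTypes("DemoDB"),
]
print(sources_for_type(demo_sources, "Gene"))
```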



# Search and access data from a data source

* Mappings are automatically applied to search results
* A search takes about a minute for now; work is ongoing to make it faster

In [16]:
filters = {"type":"ScholarlyArticle"}
# Options not used here: map=True, use_cache=True, download=True
resources = forge.search(filters, db_source="NeuroElectro", limit=2, debug=True)
# TODO: add a function for checking data source health (requires a health URL from the database)


Submitted query:
   PREFIX bmc: <https://bbp.epfl.ch/ontologies/core/bmc/>
   PREFIX bmo: <https://bbp.epfl.ch/ontologies/core/bmo/>
   PREFIX commonshapes: <https://neuroshapes.org/commons/>
   PREFIX datashapes: <https://neuroshapes.org/dash/>
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
   PREFIX oa: <http://www.w3.org/ns/oa#>
   PREFIX obo: <http://purl.obolibrary.org/obo/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFI

In [17]:
len(resources)

2

In [18]:
print(resources[0])

{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/scholarlyarticles/91941
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: Neurons in the medial septal/diagonal band complex (MS/DB) in vivo exhibit rhythmic burst-firing activity that is phase-locked with the hippocampal theta rhythm. The aim was to assess the morphology of local axon collaterals of electrophysiologically identified MS/DB neurons using intracellular recording and biocytin injection in vitro. Cells were classified according to previous criteria into slow-firing, fast-spiking, regular-spiking, and burst-firing neurons; previous work has suggested that the slow-firing neurons are cholinergic and that the other types are GABAergic. A novel finding was the existence of two types of burst-firing neuron. Type I burst-firing neurons had significantly longer duration after hyperpolarisation potentials when held at -60 mV, and at -75 mV, type I neurons exhibit

In [19]:
uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""

In [20]:
uresources = forge.sparql(query=uquery, db_source='UniProt', limit=10, debug=True)

Submitted query:
   
   PREFIX up: <http://purl.uniprot.org/core/>
   SELECT ?protein
   WHERE {
     ?protein a up:Protein ;
     up:reviewed true.
   }
     LIMIT 10
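
The debug output shows the `limit` argument ending up as a `LIMIT` clause appended to the submitted query. A minimal sketch of that idea in plain Python follows; `with_limit` is a hypothetical helper written for this example, not the Nexus Forge implementation.

```python
import re


def with_limit(query, limit):
    """Append a LIMIT clause unless none is requested or the query already ends with one."""
    if limit is None or re.search(r"\bLIMIT\s+\d+\s*$", query, flags=re.IGNORECASE):
        return query
    return f"{query.rstrip()}\nLIMIT {limit}"


query = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""
limited = with_limit(query, 10)
print(limited)
```

Checking for an existing clause avoids producing a query with two `LIMIT` clauses when the caller already wrote one.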



In [21]:
len(uresources)

10

In [22]:
uresources[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, _inner_sync=False, protein='http://purl.uniprot.org/uniprot/A0B137')

## Use Filters to search

In [23]:
from kgforge.core.wrappings.paths import Filter, FilterOperator

In [24]:

proteins = forge.search({'type': 'Protein', 'up:reviewed': True}, db_source='UniProt', limit=10, debug=True)

Submitted query:
   PREFIX up: <http://purl.uniprot.org/core/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX owl2xml: <http://www.w3.org/2006/12/owl2-xml#>
   PREFIX swrlb: <http://www.w3.org/2003/11/swrlb#>
   PREFIX protege: <http://protege.stanford.edu/plugins/owl/protege#>
   PREFIX swrl: <http://www.w3.org/2003/11/swrl#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX dc11: <http://purl.org/dc/terms/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?id WHERE {?id rdf:type up:Protein;
    up:reviewed ?v1 . 
    FILTER(?v1 = 'true'^^xsd:boolean)
   }  LIMIT 10
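
The submitted query shows how the filter dictionary is rewritten: the `type` entry becomes an `rdf:type` pattern, and every other entry becomes a triple pattern plus a `FILTER` on a fresh variable. The sketch below reproduces that shape in plain Python. It is an approximation for illustration, not the Nexus Forge rewriting code: `sparql_literal` and `filters_to_sparql` are hypothetical helpers, only booleans and strings are handled, and prefix declarations are left out.

```python
def sparql_literal(value):
    """Render a Python value as a SPARQL literal (booleans and strings only)."""
    if isinstance(value, bool):
        return f"'{str(value).lower()}'^^xsd:boolean"
    return '"' + str(value).replace('"', '\\"') + '"'


def filters_to_sparql(filters, type_prefix="up", limit=None):
    """Build a SELECT query from a flat dict of property/value filters."""
    filters = dict(filters)  # copy, so the caller's dict is left untouched
    patterns = [f"?id rdf:type {type_prefix}:{filters.pop('type')}"]
    constraints = []
    for i, (prop, value) in enumerate(filters.items(), start=1):
        patterns.append(f"{prop} ?v{i}")
        constraints.append(f"FILTER(?v{i} = {sparql_literal(value)})")
    body = " ;\n  ".join(patterns) + " .\n  " + "\n  ".join(constraints)
    query = f"SELECT ?id WHERE {{\n  {body}\n}}"
    return query if limit is None else f"{query} LIMIT {limit}"


built = filters_to_sparql({"type": "Protein", "up:reviewed": True}, limit=10)
print(built)
```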



In [25]:
proteins[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, id='http://purl.uniprot.org/uniprot/A0B137', _inner_sync=False)

# Map resources

In [26]:
uniprot = sources['UniProt']

In [27]:
complete_query = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?id ?gene ?label ?subject ?gene_label
WHERE {
  ?id a up:Protein ;
  up:reviewed true ;
  up:encodedBy ?gene ;
  up:recommendedName / up:fullName ?label ;
  up:organism / up:scientificName ?subject .
  ?gene skos:prefLabel ?gene_label . 
}
"""

In [28]:
raw_proteins = uniprot.sparql(complete_query)

In [29]:
new_resource = uniprot.map(raw_proteins[0], 'Protein')

In [30]:

print(json.dumps(forge.as_jsonld(new_resource), indent=4))

[
    {
        "@context": "https://bbp.neuroshapes.org",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/proteins/P0DJN9",
        "@type": [
            "Entity",
            "Protein"
        ],
        "encodedBy": {
            "@id": "http://purl.uniprot.org/uniprot/P0DJN9#gene-MD5A00DD99270221B359AB0AE338E423668",
            "label": "acsF"
        },
        "identifier": {
            "propertyID": "UniProtKB",
            "value": "P0DJN9"
        },
        "name": "Protein P0DJN9 from UniProtKB",
        "label": "Aerobic magnesium-protoporphyrin IX monomethyl ester [oxidative] cyclase",
        "subject": {
            "label": "Rubrivivax gelatinosus"
        }
    }
]


### The same result can be obtained from a dictionary and a DictionaryMapping instance

In [31]:
dict_resource = forge.as_json(raw_proteins[0])
mapping = DictionaryMapping.load("../../database-sources/UniProt/mappings/DictionaryMapping/Protein.hjson")
print(json.dumps(forge.as_jsonld(uniprot.map(dict_resource, mapping)), indent=4))

[
    {
        "@context": "https://bbp.neuroshapes.org",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/proteins/P0DJN9",
        "@type": [
            "Entity",
            "Protein"
        ],
        "encodedBy": {
            "@id": "http://purl.uniprot.org/uniprot/P0DJN9#gene-MD5A00DD99270221B359AB0AE338E423668",
            "label": "acsF"
        },
        "identifier": {
            "propertyID": "UniProtKB",
            "value": "P0DJN9"
        },
        "name": "Protein P0DJN9 from UniProtKB",
        "label": "Aerobic magnesium-protoporphyrin IX monomethyl ester [oxidative] cyclase",
        "subject": {
            "label": "Rubrivivax gelatinosus"
        }
    }
]


## Query the NeuroMorpho WebService

In [32]:
neuromorpho = sources['NeuroMorpho']

In [33]:
nmo_filters = {"species": "rat,mouse,human", "response_loc": ["_embedded", "neuronResources"]}

In [34]:
nmo_resources = forge.search(nmo_filters, db_source='NeuroMorpho')
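
The `response_loc` entry appears to give the path to the list of results inside the JSON response of the web service. Assuming that reading, the sketch below shows how such a path can be followed in plain Python. `extract_results` is a hypothetical helper and the response is invented for the example.

```python
from functools import reduce


def extract_results(response, response_loc):
    """Follow the keys in response_loc, one level at a time, into the response."""
    return reduce(lambda node, key: node[key], response_loc, response)


# Invented response, shaped after the path used in the filters above.
response = {
    "_embedded": {"neuronResources": [{"neuron_name": "cnic_001"}]},
    "page": {"number": 0},
}
records = extract_results(response, ["_embedded", "neuronResources"])
print(records)
```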



In [35]:
import uuid

In [36]:
new_morphology = neuromorpho.map(nmo_resources[0], 'NeuronMorphology')

In [37]:
print(json.dumps(forge.as_jsonld(new_morphology), indent=4))

[
    {
        "@context": "https://bbp.neuroshapes.org",
        "@type": [
            "Dataset",
            "NeuronMorphology"
        ],
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/neuronmorphologies/neuromorpho/000000001",
        "brainLocation": {
            "@type": "Class",
            "@id": "http://purl.obolibrary.org/obo/UBERON_0001950",
            "label": "neocortex",
            "subClassOf": "prov:Entity",
            "isDefinedBy": "http://bbp.epfl.ch/neurosciencegraph/ontologies/mtypes"
        },
        "contribution": {
            "@type": "Contribution",
            "agent": {
                "@type": "Organization",
                "label": "George Mason University"
            }
        },
        "identifier": 1,
        "archive": "Wearne_Hof",
        "name": "cnic_001",
        "generation": {
            "@type": "Generation",
            "activity": {
                "@type": "nsg:NeuronMorphologyReconstruction",
                "hadPro

In [38]:
format_file = neuromorpho.service.files_download['endpoint'] + "/{}/Source-Version/{}.swc"

In [39]:
example_url = format_file.format(new_morphology[0].archive.lower(), new_morphology[0].name)

In [40]:
file_path = f"./downloaded/{example_url.split('/')[-1]}"
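
The three cells above build a download URL from the archive name (lowercased) and the morphology name, then derive a local file path from the last segment of that URL. The same steps are sketched below as two small functions in plain Python. `FILES_ENDPOINT` is a placeholder, since the real value comes from the source configuration, and `swc_url` and `local_path` are hypothetical names.

```python
from pathlib import Path

# Placeholder base URL; the real value comes from the source configuration.
FILES_ENDPOINT = "https://example.org/files"


def swc_url(archive, name, endpoint=FILES_ENDPOINT):
    """Build the download URL following the pattern used in the cells above."""
    return f"{endpoint}/{archive.lower()}/Source-Version/{name}.swc"


def local_path(url, directory="./downloaded"):
    """Local file path named after the last segment of the URL."""
    return Path(directory) / url.rsplit("/", 1)[-1]


url = swc_url("Wearne_Hof", "cnic_001")
print(url)
print(local_path(url))
```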

In [41]:
neuromorpho._download_one(example_url, file_path)



In [42]:
neuromorpho.attach_file(new_morphology[0], file_path)

In [51]:
print(json.dumps(forge.as_jsonld(new_morphology[0]), indent=4))

{
    "@context": "https://bbp.neuroshapes.org",
    "@type": [
        "Dataset",
        "NeuronMorphology"
    ],
    "@id": "https://bbp.epfl.ch/neurosciencegraph/data/neuronmorphologies/neuromorpho/000000001",
    "brainLocation": {
        "@type": "Class",
        "@id": "http://purl.obolibrary.org/obo/UBERON_0001950",
        "label": "neocortex",
        "subClassOf": "prov:Entity",
        "isDefinedBy": "http://bbp.epfl.ch/neurosciencegraph/ontologies/mtypes"
    },
    "contribution": {
        "@type": "Contribution",
        "agent": {
            "@type": "Organization",
            "label": "George Mason University"
        }
    },
    "identifier": 1,
    "archive": "Wearne_Hof",
    "name": "cnic_001",
    "generation": {
        "@type": "Generation",
        "activity": {
            "@type": "nsg:NeuronMorphologyReconstruction",
            "hadProtocol": {}
        }
    },
    "subject": {
        "@type": "Subject",
        "species": {
            "label": "mo

In [49]:
forge.validate(new_morphology[0], type_="NeuronMorphology")

<action> _validate_one
<succeeded> False
<error> ReportableRuntimeError: Evaluation path too deep!
<NodeShape https://neuroshapes.org/dash/neuronmorphology/shapes/NeuronMorphologyShape>-><AndConstraintComponent on <NodeShape https://neuroshapes.org/dash/neuronmorphology/shapes/NeuronMorphologyShape>>-><NodeShape ub1bL20C14>-><NodeConstraintComponent on <NodeShape ub1bL20C14>>-><NodeShape https://neuroshapes.org/commons/minds/shapes/MINDSShape>-><AndConstraintComponent on <NodeShape https://neuroshapes.org/commons/minds/shapes/MINDSShape>>-><NodeShape ub2bL25C14>-><NodeConstraintComponent on <NodeShape ub2bL25C14>>-><NodeShape https://neuroshapes.org/commons/entity/shapes/EntityShape>-><PropertyConstraintComponent on <NodeShape https://neuroshapes.org/commons/entity/shapes/EntityShape>>-><PropertyShape Distribution>-><NodeConstraintComponent on <PropertyShape Distribution>>-><NodeShape https://neuroshapes.org/commons/distribution/shapes/DistributionShape>-><AndConstraintComponent on <No

In [50]:
forge.register(new_morphology[0], schema_id="datashapes:neuronmorphology")

<action> _register_one
<succeeded> False
<error> RegistrationError: 400 Client Error: Bad Request for url: https://staging.nise.bbp.epfl.ch/nexus/v1/resources/dke/kgforge/datashapes%3Aneuronmorphology
