# Querying external database sources of interest

* Enable users to integrate data from external databases of interest into the BBP KG
* Use the Nexus Forge interface and the BMO vocabulary as much as possible
* Benefit from out-of-the-box (meta)data transformations that make the data ready for BBP internal pipelines and applications
* Demo with Mouselight, NeuroElectro, UniProt

In [1]:
import json

from kgforge.core import KnowledgeGraphForge
from kgforge.specializations.resources import Dataset

In [32]:
import getpass
TOKEN = getpass.getpass()

In [33]:
endpoint = "https://staging.nise.bbp.epfl.ch/nexus/v1"
BUCKET = "neurosciencegraph/datamodels"
forge = KnowledgeGraphForge(
    "../../configurations/database-sources/prod-nexus-sources.yml",
    token=TOKEN,
    endpoint=endpoint,
    bucket=BUCKET,
)

# List of Data sources

In [3]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro


In [4]:
sources = forge.db_sources()

In [5]:

data = {
    'origin': 'store',
    'source': 'DemoStore',
    'model': {
        'name': 'DemoModel',
        'origin': 'directory',
        'source': "../../../tests/data/demo-model/"
    }
}


In [6]:
from kgforge.specializations.databases import StoreDatabase
ds = StoreDatabase(forge, name="DemoDB", **data)

In [7]:
forge.add_db_source(ds)

In [8]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
DemoDB


# Data source metadata

In [9]:
neuroelectro = sources['NeuroElectro']

## Get data mappings (which hold the transformation logic) per data type

* Data mappings are used to transform results obtained from the external data sources so that they are ready for consumption by BBP tools
* Data mappings also perform automatic ontology linking

In [10]:
forge.mappings(source="NeuroElectro")

Managed mappings for the data source per entity type and mapping type:
   - ElectrophysiologicalFeatureAnnotation:
        * DictionaryMapping
   - ParameterAnnotation:
        * DictionaryMapping
   - ParameterBody:
        * DictionaryMapping
   - ScholarlyArticle:
        * DictionaryMapping
   - SeriesBody:
        * DictionaryMapping


In [11]:
forge.mappings('UniProt')

Managed mappings for the data source per entity type and mapping type:
   - Gene:
        * DictionaryMapping
   - Protein:
        * DictionaryMapping


In [12]:
from kgforge.specializations.mappings import DictionaryMapping
mapping = forge.mapping(entity="ScholarlyArticle", source="NeuroElectro")

In [13]:
print(mapping)

{
    id: forge.format("identifier", "scholarlyarticles", x.id)
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: x.abstract
    author: x.authors_shaped
    datePublished: x.date_issued
    identifier: x.identifiers
    isPartOf:
    {
        type: Periodical
        issn: x.issn
        name: x.journal
        publisher: x.publisher
    }
    name: f"article_{x.id}"
    sameAs: x.full_text_link
    title: x.title
    url: x.full_text_link
}
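
To make the rule syntax above more concrete, here is a minimal sketch of how such rules could be evaluated against a raw record `x`. It is an illustration only and not how `DictionaryMapping` is implemented: `apply_rules`, the `rules` dictionary, and every value in `raw` except the id are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical raw record exposing some of the fields the mapping reads from `x`.
# Only the id comes from the search result shown in this notebook; the rest are placeholders.
raw = SimpleNamespace(
    id=35463,
    title="An example title",
    journal="An example journal",
    issn="0000-0000",
)

# Each rule is a callable that receives the raw record.
# A nested dict of rules produces a nested dict in the output.
rules = {
    "type": lambda x: ["Entity", "ScholarlyArticle"],
    "name": lambda x: f"article_{x.id}",
    "title": lambda x: x.title,
    "isPartOf": {
        "type": lambda x: "Periodical",
        "issn": lambda x: x.issn,
        "name": lambda x: x.journal,
    },
}


def apply_rules(rules, record):
    """Evaluate every rule against the record, descending into nested rule dicts."""
    return {
        key: apply_rules(rule, record) if isinstance(rule, dict) else rule(record)
        for key, rule in rules.items()
    }


mapped = apply_rules(rules, raw)
print(mapped["name"])
```

Keeping the rules as data, separate from the code that evaluates them, is what allows one mapping per entity type and per data source.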


In [14]:
forge.db_sources(type_='Gene', pretty=True)

<action> db_sources
<error> AttributeError: 'StoreDatabase' object has no attribute 'datatypes'



# Search and access data from a data source

* Mappings are automatically applied to search results
* A search currently takes about a minute; work is ongoing to make it faster

In [15]:
filters = {"type": "ScholarlyArticle"}
# map=True, use_cache=True, download=True
resources = forge.search(filters, db_source="NeuroElectro", limit=2, debug=True)
# TODO: add a function for checking data source health (requires a health URL from the database)


Submitted query:
   PREFIX bmc: <https://bbp.epfl.ch/ontologies/core/bmc/>
   PREFIX bmo: <https://bbp.epfl.ch/ontologies/core/bmo/>
   PREFIX commonshapes: <https://neuroshapes.org/commons/>
   PREFIX datashapes: <https://neuroshapes.org/dash/>
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
   PREFIX oa: <http://www.w3.org/ns/oa#>
   PREFIX obo: <http://purl.obolibrary.org/obo/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFI

In [16]:
len(resources)

2

In [17]:
print(resources[0])

{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/scholarlyarticles/35463
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: Rationally, an increased intrinsic excitability of dorsal horn neurons could be a factor contributing to alter the gain of the nociceptive system during central sensitization, however direct evidence is scarce. Here we have examined this hypothesis using current and voltage-clamp recordings from dorsal horn neurons in the spinal cord in vitro preparation obtained from mice pups of either sex. Cords were extracted from carrageenan-pretreated and control animals to allow for comparison. Dorsal horn neurons from treated animals showed significantly larger and faster synaptic responses. Synaptic changes started developing shortly after inflammation (1 h) and developed further after a longer-term inflammation (20 h). However, these neurons showed biphasic changes in membrane excitability with an incr

In [18]:
uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""

In [19]:
uresources = forge.sparql(query=uquery, db_source='UniProt', limit=10, debug=True)

Submitted query:
   
   PREFIX up: <http://purl.uniprot.org/core/>
   SELECT ?protein
   WHERE {
     ?protein a up:Protein ;
     up:reviewed true.
   }
     LIMIT 10
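
The debug output shows that the `limit` argument ends up as a `LIMIT` clause appended to the submitted query. A minimal sketch of that kind of rewrite follows. The helper `with_limit` is hypothetical and is not part of Nexus Forge; it only illustrates the idea on a plain string.

```python
import re

uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""


def with_limit(query: str, limit: int) -> str:
    """Append a LIMIT clause unless the query already ends with one."""
    if re.search(r"\blimit\s+\d+\s*$", query.strip(), flags=re.IGNORECASE):
        return query
    return f"{query.rstrip()}\nLIMIT {limit}"


limited = with_limit(uquery, 10)
print(limited)
```

Checking for an existing clause first keeps the helper safe to call twice on the same query.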



In [20]:
len(uresources)

10

In [21]:
uresources[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, _inner_sync=False, protein='http://purl.uniprot.org/uniprot/A0B137')

## Use Filters to search

In [22]:
from kgforge.core.wrappings.paths import Filter, FilterOperator

In [23]:

proteins = forge.search({'type': 'Protein', 'up:reviewed': True}, db_source='UniProt', limit=10, debug=True)

Submitted query:
   PREFIX up: <http://purl.uniprot.org/core/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX owl2xml: <http://www.w3.org/2006/12/owl2-xml#>
   PREFIX swrlb: <http://www.w3.org/2003/11/swrlb#>
   PREFIX protege: <http://protege.stanford.edu/plugins/owl/protege#>
   PREFIX swrl: <http://www.w3.org/2003/11/swrl#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX dc11: <http://purl.org/dc/terms/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?id WHERE {?id rdf:type up:Protein;
    up:reviewed ?v1 . 
    FILTER(?v1 = 'true'^^xsd:boolean)
   }  LIMIT 10
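
The submitted query above suggests how a filter dictionary is turned into SPARQL: the `type` entry becomes an `rdf:type` triple, and each remaining property gets a variable plus a `FILTER` comparison. The sketch below reproduces that shape under simplifying assumptions. `filters_to_sparql` and its `type_prefix` parameter are hypothetical, only boolean and plain string values are handled, string values are not escaped, and the `PREFIX` declarations still have to be prepended before the query can run.

```python
def filters_to_sparql(filters: dict, type_prefix: str, limit: int) -> str:
    """Build a SELECT query from a flat filter dict (illustrative sketch only)."""
    remaining = dict(filters)  # copy, so the caller's dict is left untouched
    patterns = [f"?id rdf:type {type_prefix}:{remaining.pop('type')}"]
    constraints = []
    for index, (prop, value) in enumerate(remaining.items(), start=1):
        var = f"?v{index}"
        patterns.append(f"{prop} {var}")
        if isinstance(value, bool):
            literal = f"'{str(value).lower()}'^^xsd:boolean"
        else:
            literal = f"'{value}'"
        constraints.append(f"FILTER({var} = {literal})")
    body = " ;\n  ".join(patterns) + " .\n  " + "\n  ".join(constraints)
    return f"SELECT ?id WHERE {{\n  {body}\n}} LIMIT {limit}"


query = filters_to_sparql({'type': 'Protein', 'up:reviewed': True}, type_prefix='up', limit=10)
print(query)
```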



In [24]:
proteins[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, id='http://purl.uniprot.org/uniprot/A0B137', _inner_sync=False)

# Map resources

In [25]:
uniprot = sources['UniProt']

In [26]:
complete_query = """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?id ?gene ?label ?subject ?gene_label
WHERE {
  ?id a up:Protein ;
  up:reviewed true ;
  up:encodedBy ?gene ;
  up:recommendedName / up:fullName ?label ;
  up:organism / up:scientificName ?subject .
  ?gene skos:prefLabel ?gene_label . 
}
"""

In [27]:
raw_proteins = uniprot.sparql(complete_query)

In [28]:
new_resource = uniprot.map(raw_proteins[0], 'Protein')

In [29]:

print(json.dumps(forge.as_jsonld(new_resource), indent=4))

[
    {
        "@context": "https://bbp.neuroshapes.org",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/proteins/P0DJN9",
        "@type": [
            "Entity",
            "Protein"
        ],
        "encodedBy": {
            "@id": "http://purl.uniprot.org/uniprot/P0DJN9#gene-MD5A00DD99270221B359AB0AE338E423668",
            "label": "acsF"
        },
        "identifier": {
            "propertyID": "UniProtKB",
            "value": "P0DJN9"
        },
        "name": "Protein P0DJN9 from UniProtKB",
        "label": "Aerobic magnesium-protoporphyrin IX monomethyl ester [oxidative] cyclase",
        "subject": {
            "label": "Rubrivivax gelatinosus"
        }
    }
]


### The same result can be obtained from a dictionary and a DictionaryMapping instance

In [31]:
dict_resource = forge.as_json(raw_proteins[0])
mapping = DictionaryMapping.load("../../database-sources/UniProt/mappings/DictionaryMapping/Protein.hjson")
print(json.dumps(forge.as_jsonld(uniprot.map(dict_resource, mapping)), indent=4))

[
    {
        "@context": "https://bbp.neuroshapes.org",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/proteins/P0DJN9",
        "@type": [
            "Entity",
            "Protein"
        ],
        "encodedBy": {
            "@id": "http://purl.uniprot.org/uniprot/P0DJN9#gene-MD5A00DD99270221B359AB0AE338E423668",
            "label": "acsF"
        },
        "identifier": {
            "propertyID": "UniProtKB",
            "value": "P0DJN9"
        },
        "name": "Protein P0DJN9 from UniProtKB",
        "label": "Aerobic magnesium-protoporphyrin IX monomethyl ester [oxidative] cyclase",
        "subject": {
            "label": "Rubrivivax gelatinosus"
        }
    }
]
