# Querying external database sources of interest

* Enable users to integrate data from external databases of interest into the BBP KG
* Use the Nexus Forge interface and the BMO vocabulary as much as possible
* Benefit from out-of-the-box (meta)data transformation that makes the data ready for BBP internal pipelines and applications
* Demo with MouseLight, NeuroElectro, and UniProt

In [1]:
import json

from kgforge.core import KnowledgeGraphForge
from kgforge.specializations.resources import Dataset

In [2]:
endpoint = "https://staging.nise.bbp.epfl.ch/nexus/v1"
BUCKET = "neurosciencegraph/datamodels"
forge = KnowledgeGraphForge("../../configurations/database-sources/prod-nexus-sources_progress.yml", endpoint=endpoint, bucket=BUCKET)

# List of Data sources

In [3]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
MouseLight


In [4]:
sources = forge.db_sources()

In [5]:

data = {
    'origin': 'store',
    'source': 'DemoStore',
    'model': {
        'name': 'DemoModel',
        'origin': 'directory',
        'source': "../../../tests/data/demo-model/"
    }
}


In [6]:
from kgforge.specializations.databases import StoreDatabase
ds = StoreDatabase(forge, name="DemoDB", **data)

In [7]:
forge.add_db_source(ds)

In [8]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
MouseLight
DemoDB


# Data source metadata

In [9]:
neuroelectro = sources['NeuroElectro']

## Get data mappings (which hold the transformation logic) per data type

* Data mappings transform the results obtained from an external data source so that they are ready for consumption by BBP tools
* They also perform automatic ontology linking
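
As a rough illustration of what a dictionary mapping does, the sketch below applies a set of rules to a raw record using plain Python. It does not use the Nexus Forge API: `apply_mapping`, `ARTICLE_RULES`, and the sample record are hypothetical and only mirror the shape of the mappings printed in this notebook (each target property is an expression over the raw record `x`).

```python
from types import SimpleNamespace

# Hypothetical rules: each target property is computed from the raw record `x`.
ARTICLE_RULES = {
    "type": lambda x: ["Entity", "ScholarlyArticle"],
    "name": lambda x: f"article_{x.id}",
    "title": lambda x: x.title,
    "datePublished": lambda x: x.date_issued,
}


def apply_mapping(rules, record):
    """Evaluate every rule against one raw record and return the mapped dict."""
    x = SimpleNamespace(**record)  # allows attribute access such as x.title
    return {target: rule(x) for target, rule in rules.items()}


# Made-up raw record standing in for one result returned by an external source.
raw = {"id": 1, "title": "An example title", "date_issued": "2020-01-01"}
mapped = apply_mapping(ARTICLE_RULES, raw)
print(mapped)
```

Keeping the rules as data, separate from the code that evaluates them, is what lets one mapping be defined per data type and per source.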

In [10]:
forge.mappings("NeuroElectro")

{'ElectrophysiologicalFeatureAnnotation': ['DictionaryMapping'],
 'ParameterAnnotation': ['DictionaryMapping'],
 'ParameterBody': ['DictionaryMapping'],
 'ScholarlyArticle': ['DictionaryMapping'],
 'SeriesBody': ['DictionaryMapping']}

In [11]:
forge.mappings('UniProt')

{'Gene': ['DictionaryMapping'], 'Protein': ['DictionaryMapping']}

In [12]:
from kgforge.specializations.mappings import DictionaryMapping
mapping = forge.mapping("ScholarlyArticle", "NeuroElectro")
direct_mapping = neuroelectro.mapping("ScholarlyArticle", type=DictionaryMapping)

In [13]:
print(mapping)

{
    id: forge.format("identifier", "scholarlyarticles", x.id)
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: x.abstract
    author: x.authors_shaped
    datePublished: x.date_issued
    identifier: x.identifiers
    isPartOf:
    {
        type: Periodical
        issn: x.issn
        name: x.journal
        publisher: x.publisher
    }
    name: f"article_{x.id}"
    sameAs: x.full_text_link
    title: x.title
    url: x.full_text_link
}


In [14]:
print(direct_mapping)

{
    id: forge.format("identifier", "scholarlyarticles", x.id)
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: x.abstract
    author: x.authors_shaped
    datePublished: x.date_issued
    identifier: x.identifiers
    isPartOf:
    {
        type: Periodical
        issn: x.issn
        name: x.journal
        publisher: x.publisher
    }
    name: f"article_{x.id}"
    sameAs: x.full_text_link
    title: x.title
    url: x.full_text_link
}


In [15]:
forge.db_sources(mappings='Gene', pretty=True)

Available Database sources:
UniProt


# Search and access data from a data source

* Mappings are automatically applied to search results
* A search currently takes about a minute; work on making it faster is ongoing

In [16]:
filters = {"type":"ScholarlyArticle"}
#map=True, use_cache=True, # download=True
resources = forge.search(filters, db_source="NeuroElectro", limit=2, debug=True)
# TODO: add a function for checking data source health => requires a health URL from the database


Submitted query:
   PREFIX bmc: <https://bbp.epfl.ch/ontologies/core/bmc/>
   PREFIX bmo: <https://bbp.epfl.ch/ontologies/core/bmo/>
   PREFIX commonshapes: <https://neuroshapes.org/commons/>
   PREFIX datashapes: <https://neuroshapes.org/dash/>
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
   PREFIX oa: <http://www.w3.org/ns/oa#>
   PREFIX obo: <http://purl.obolibrary.org/obo/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFI

In [17]:
len(resources)

2

In [18]:
print(resources[0])

{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/scholarlyarticles/72136
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: We investigated the effects of muscarinic acetylcholine receptor (mAChR) activation on GABAergic synaptic transmission in rat hippocampal neurons. Current-clamp recordings revealed that methacholine produced membrane depolarization and action potential firing. Methacholine augmented the bicuculline-sensitive and GABA(A) -mediated frequency of spontaneous inhibitory postsynaptic currents (sIPSCs); the action of methacholine had a slow onset and longer duration. The increase in methacholine-evoked sIPSCs was completely inhibited by atropine and was insensitive to glutamatergic receptor blockers. Interestingly, methacholine action was not inhibited by intracellular perfusion with GDP-β-S, suggesting that muscarinic effects on membrane excitability and sIPSC frequency are mainly presynaptic. McN-A-3

In [19]:
uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""

In [20]:
uresources = forge.sparql(query=uquery, db_source='UniProt', limit=10, debug=True)

Submitted query:
   
   PREFIX up: <http://purl.uniprot.org/core/>
   SELECT ?protein
   WHERE {
     ?protein a up:Protein ;
     up:reviewed true.
   }
     LIMIT 10



In [21]:
len(uresources)

10

In [22]:
uresources[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, _inner_sync=False, protein='http://purl.uniprot.org/uniprot/A0B137')
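
Each resource above carries one attribute per selected SPARQL variable (here `protein`). A minimal sketch of how such rows can be derived from a response in the standard SPARQL 1.1 JSON results format is given below. Whether Nexus Forge proceeds exactly this way is an assumption; `bindings_to_records` and the sample response are hypothetical.

```python
def bindings_to_records(response):
    """Flatten a SPARQL 1.1 JSON results document into one dict per row."""
    variables = response["head"]["vars"]
    return [
        # Unbound variables are simply absent from a row, so skip them.
        {var: row[var]["value"] for var in variables if var in row}
        for row in response["results"]["bindings"]
    ]


# Hand-written response with a single row, shaped like a SPARQL endpoint reply.
sample_response = {
    "head": {"vars": ["protein"]},
    "results": {
        "bindings": [
            {
                "protein": {
                    "type": "uri",
                    "value": "http://purl.uniprot.org/uniprot/A0B137",
                }
            }
        ]
    },
}

records = bindings_to_records(sample_response)
print(records)
```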

## Use Filters to search

In [23]:
from kgforge.core.wrappings.paths import Filter, FilterOperator

In [24]:

proteins = forge.search({'type': 'Protein', 'up:reviewed': True}, db_source='UniProt', limit=10, debug=True)

Submitted query:
   PREFIX up: <http://purl.uniprot.org/core/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX owl2xml: <http://www.w3.org/2006/12/owl2-xml#>
   PREFIX swrlb: <http://www.w3.org/2003/11/swrlb#>
   PREFIX protege: <http://protege.stanford.edu/plugins/owl/protege#>
   PREFIX swrl: <http://www.w3.org/2003/11/swrl#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX dc11: <http://purl.org/dc/terms/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?id WHERE {?id rdf:type up:Protein;
    up:reviewed ?v1 . 
    FILTER(?v1 = 'true'^^xsd:boolean)
   }  LIMIT 10
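
The submitted query shows how the filter dictionary is turned into triple patterns and `FILTER` clauses. The sketch below rebuilds a query of the same shape in plain Python. It is not the Nexus Forge implementation: `filters_to_sparql` and `to_literal` are hypothetical, the `up` prefix for the type is an assumption taken from the output above, and the `PREFIX` declarations are left out, so the result is a query body rather than a complete query.

```python
def to_literal(value):
    """Render a Python value as a SPARQL literal (booleans and strings only)."""
    if isinstance(value, bool):
        return f"'{str(value).lower()}'^^xsd:boolean"
    return f"'{value}'"


def filters_to_sparql(filters, type_prefix="up", limit=None):
    """Build a SELECT query from a flat filter dict that has a 'type' key."""
    remaining = dict(filters)  # copy, so the caller's dict is left untouched
    rdf_type = remaining.pop("type")
    patterns = [f"?id rdf:type {type_prefix}:{rdf_type}"]
    constraints = []
    for index, (prop, value) in enumerate(remaining.items(), start=1):
        patterns.append(f"{prop} ?v{index}")
        constraints.append(f"FILTER(?v{index} = {to_literal(value)})")
    body = " ;\n  ".join(patterns) + " .\n  " + "\n  ".join(constraints)
    query = f"SELECT ?id WHERE {{\n  {body}\n}}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query


query = filters_to_sparql({"type": "Protein", "up:reviewed": True}, limit=10)
print(query)
```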



In [25]:
proteins[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, id='http://purl.uniprot.org/uniprot/A0B137', _inner_sync=False)

# Map resources

In [26]:
uniprot = sources['UniProt']

In [27]:
complete_query = """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?id ?gene ?label ?subject ?gene_label
WHERE {
  ?id a up:Protein ;
  up:reviewed true ;
  up:encodedBy ?gene ;
  up:recommendedName / up:fullName ?label ;
  up:organism / up:scientificName ?subject .
  ?gene skos:prefLabel ?gene_label .
}
"""

In [28]:
raw_proteins = uniprot.sparql(complete_query)

In [30]:
new_resource = uniprot.map_resources(raw_proteins[0], 'Protein')

In [31]:

print(json.dumps(forge.as_jsonld(new_resource), indent=4))

[
    {
        "@context": "https://bbp.neuroshapes.org",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/proteins/P0DJN9",
        "@type": [
            "Entity",
            "Protein"
        ],
        "encodedBy": {
            "@id": "http://purl.uniprot.org/uniprot/P0DJN9#gene-MD5A00DD99270221B359AB0AE338E423668",
            "label": "acsF"
        },
        "identifier": {
            "propertyID": "UniProtKB",
            "value": "P0DJN9"
        },
        "name": "Protein P0DJN9 from UniProtKB",
        "label": "Aerobic magnesium-protoporphyrin IX monomethyl ester [oxidative] cyclase",
        "subject": {
            "label": "Rubrivivax gelatinosus"
        }
    }
]
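
To make the link between the selected variables (`?id`, `?gene`, `?label`, `?subject`, `?gene_label`) and the JSON-LD above more concrete, here is a plain-Python sketch that produces the same structure from one raw row. It is not the actual `Protein` mapping: `map_protein` is hypothetical, and the raw `id` value is an assumption inferred from the accession in the output.

```python
import json

# Base IRI taken from the "@id" in the output above.
PROTEIN_BASE = "https://bbp.epfl.ch/neurosciencegraph/data/proteins/"


def map_protein(row):
    """Turn one flat SPARQL row into a nested, JSON-LD-like dict."""
    # The accession is taken to be the last path segment of the UniProt IRI.
    accession = row["id"].rstrip("/").rsplit("/", 1)[-1]
    return {
        "@context": "https://bbp.neuroshapes.org",
        "@id": PROTEIN_BASE + accession,
        "@type": ["Entity", "Protein"],
        "encodedBy": {"@id": row["gene"], "label": row["gene_label"]},
        "identifier": {"propertyID": "UniProtKB", "value": accession},
        "name": f"Protein {accession} from UniProtKB",
        "label": row["label"],
        "subject": {"label": row["subject"]},
    }


raw_row = {
    "id": "http://purl.uniprot.org/uniprot/P0DJN9",  # assumed raw value
    "gene": "http://purl.uniprot.org/uniprot/P0DJN9#gene-MD5A00DD99270221B359AB0AE338E423668",
    "gene_label": "acsF",
    "label": "Aerobic magnesium-protoporphyrin IX monomethyl ester [oxidative] cyclase",
    "subject": "Rubrivivax gelatinosus",
}

protein = map_protein(raw_row)
print(json.dumps(protein, indent=4))
```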


# Save in BBP KG (Nexus)

## Access

### Set filters

In [32]:
_type = "NeuronMorphology"
filters = {"type": _type}

### Run Query

In [33]:
limit = 10  # You can limit the number of results, pass `None` to fetch all the results

data = forge.search(filters, db_source='MouseLight', limit=limit)

print(f"{len(data)} dataset(s) of type {_type} found")

10 dataset(s) of type NeuronMorphology found


### Display the results as pandas dataframe

In [34]:
property_to_display = [
    "id", "name", "subject",
    "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
    "brainLocation.layer.id", "brainLocation.layer.label",
    "contribution",
    "distribution.name", "distribution.contentUrl", "distribution.encodingFormat",
]
reshaped_data = forge.reshape(data, keep=property_to_display)

forge.as_dataframe(reshaped_data)

| | id | brainLocation.brainRegion.id | brainLocation.brainRegion.label | brainLocation.layer | contribution.type | contribution.agent.id | contribution.agent.type | distribution.contentUrl | distribution.encodingFormat | distribution.name | name | subject.id | subject.type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISp5 | 5 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Htr3a-Cre_NO152;Ai14-314467.03.02.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 1 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISp5 | 5 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Pvalb-IRES-Cre;Ai14-185362.03.01.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 2 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISli6a | 6a | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Chrna2-Cre_OE25;Ai14(BT)-280154.04.01.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 3 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | MTG | 5 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | H16.06.008.01.26.04 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 4 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISp5 | 5 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Rorb-IRES2-Cre-D;Ai14-168053.05.01.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 5 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISp6b | 6b | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Ctgf-2A-dgCre;Ai14(IVSCC)-230665.02.02.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 6 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure/33 | VISp6a | 6a | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Nos1-CreERT2;Sst-IRES-FlpO;Ai65-304714.02.01.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 7 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | MTG | 3 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | H16.06.010.01.03.04.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 8 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISp4 | 4 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Nr5a1-Cre;Ai14-187780.03.02.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 9 | https://bbp.epfl.ch/neurosciencegraph/data/neu... | http://api.brain-map.org/api/v2/data/Structure... | VISp2/3 | 2/3 | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | application/swc | reconstruction.swc | Slc17a6-IRES-Cre;Ai14-190263.04.01.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
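
The dotted column names in the table (for example `brainLocation.brainRegion.label`) come from flattening nested properties. The sketch below shows that flattening in plain Python, without pandas or Nexus Forge. `flatten` is a hypothetical helper, and the sample record keeps only a few values from the first row of the table.

```python
def flatten(record, parent=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in record.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested properties
        else:
            flat[path] = value
    return flat


# Nested record holding a few values from the first row of the table.
nested = {
    "name": "Htr3a-Cre_NO152;Ai14-314467.03.02.01",
    "brainLocation": {"brainRegion": {"label": "VISp5"}, "layer": "5"},
    "distribution": {
        "encodingFormat": "application/swc",
        "name": "reconstruction.swc",
    },
}

flat = flatten(nested)
print(sorted(flat))
```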
