# Querying external database sources of interest

* Enable users to integrate data from external databases of interest into the BBP KG
* Use the Nexus Forge interface and the BMO vocabulary as much as possible
* Benefit from out-of-the-box (meta)data transformations that make the data ready for BBP internal pipelines and applications
* Demo with MouseLight, NeuroElectro and UniProt

In [1]:
import json

from kgforge.core import KnowledgeGraphForge
from kgforge.specializations.resources import Dataset

In [2]:
endpoint = "https://staging.nise.bbp.epfl.ch/nexus/v1"
BUCKET = "neurosciencegraph/datamodels"
forge = KnowledgeGraphForge("../../configurations/database-sources/prod-nexus-sources_progress.yml", endpoint=endpoint, bucket=BUCKET)

# List of Data sources

In [3]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
MouseLight


In [4]:
sources = forge.db_sources()

In [5]:

data = {
    'origin': 'store',
    'source': 'DemoStore',
    'model': {
        'name': 'DemoModel',
        'origin': 'directory',
        'source': '../../../tests/data/demo-model/'
    }
}


In [6]:
from kgforge.specializations.databases import StoreDatabase
ds = StoreDatabase(forge, name="DemoDB", **data)

In [7]:
forge.add_db_source(ds)

In [8]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
MouseLight
DemoDB


# Data source metadata

In [9]:
neuroelectro = sources['NeuroElectro']

## Get data mappings (which hold the transformation logic) per data type

* Data mappings transform the results obtained from an external data source so that they are ready for consumption by BBP tools
* They also perform automatic ontology linking
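
To make the idea of a mapping concrete, here is a minimal pure-Python sketch, assuming a mapping can be viewed as a set of rules that each read the source record `x` and produce one target property. The `raw` record, the `rules` dictionary and the `apply_rules` helper are hypothetical illustrations; this is not the `DictionaryMapping` class nor how Nexus Forge evaluates mappings internally.

```python
# Hypothetical raw record, shaped like the fields a ScholarlyArticle mapping reads.
raw = {
    "id": 91941,
    "title": "Example title",
    "journal": "Example Journal",
    "full_text_link": "https://example.org/article/91941",
}

# One rule per target property; each rule is a function of the source record `x`.
rules = {
    "type": lambda x: ["Entity", "ScholarlyArticle"],
    "name": lambda x: f"article_{x['id']}",
    "title": lambda x: x["title"],
    "url": lambda x: x["full_text_link"],
    "isPartOf": lambda x: {"type": "Periodical", "name": x["journal"]},
}


def apply_rules(record, rules):
    """Evaluate every rule against the record and collect the target properties."""
    return {prop: rule(record) for prop, rule in rules.items()}


mapped = apply_rules(raw, rules)
print(mapped["name"])
print(mapped["isPartOf"])
```

Keeping the rules declarative, one entry per target property, is what makes it possible to print and inspect a mapping before applying it.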

In [10]:
forge.mappings("NeuroElectro")

{'ElectrophysiologicalFeatureAnnotation': ['DictionaryMapping'],
 'ParameterAnnotation': ['DictionaryMapping'],
 'ParameterBody': ['DictionaryMapping'],
 'ScholarlyArticle': ['DictionaryMapping'],
 'SeriesBody': ['DictionaryMapping']}

In [11]:
forge.mappings('UniProt')

{'Gene': ['DictionaryMapping'], 'Protein': ['DictionaryMapping']}

In [12]:
from kgforge.specializations.mappings import DictionaryMapping
mapping = forge.mapping("ScholarlyArticle", "NeuroElectro")
direct_mapping = neuroelectro.mapping("ScholarlyArticle", type=DictionaryMapping)

In [13]:
print(mapping)

{
    id: forge.format("identifier", "scholarlyarticles", x.id)
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: x.abstract
    author: x.authors_shaped
    datePublished: x.date_issued
    identifier: x.identifiers
    isPartOf:
    {
        type: Periodical
        issn: x.issn
        name: x.journal
        publisher: x.publisher
    }
    name: f"article_{x.id}"
    sameAs: x.full_text_link
    title: x.title
    url: x.full_text_link
}


In [14]:
print(direct_mapping)

{
    id: forge.format("identifier", "scholarlyarticles", x.id)
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: x.abstract
    author: x.authors_shaped
    datePublished: x.date_issued
    identifier: x.identifiers
    isPartOf:
    {
        type: Periodical
        issn: x.issn
        name: x.journal
        publisher: x.publisher
    }
    name: f"article_{x.id}"
    sameAs: x.full_text_link
    title: x.title
    url: x.full_text_link
}


In [15]:
forge.db_sources(mappings='Gene', pretty=True)

Available Database sources:
UniProt
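
Filtering the sources by mapped type can be pictured as a lookup over the per-source mapping listings shown earlier. The sketch below copies those two listings into a plain dictionary; `sources_with_mapping` is a hypothetical helper written for illustration, not part of Nexus Forge, and MouseLight is left out because its mappings are not listed in this notebook.

```python
# Types with a mapping, as returned by forge.mappings(...) earlier in this notebook.
available_mappings = {
    "NeuroElectro": [
        "ElectrophysiologicalFeatureAnnotation",
        "ParameterAnnotation",
        "ParameterBody",
        "ScholarlyArticle",
        "SeriesBody",
    ],
    "UniProt": ["Gene", "Protein"],
}


def sources_with_mapping(available, data_type):
    """Return the names of the sources that declare a mapping for `data_type`."""
    return sorted(name for name, types in available.items() if data_type in types)


print(sources_with_mapping(available_mappings, "Gene"))
print(sources_with_mapping(available_mappings, "ScholarlyArticle"))
```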


# Search and Access data from data source

* Mappings are automatically applied to search results
* A search currently takes about a minute; work is ongoing to make it faster

In [16]:
filters = {"type": "ScholarlyArticle"}
# map=True, use_cache=True, download=True
resources = forge.search(filters, db_source="NeuroElectro", limit=2, debug=True)
# TODO: add a function for checking data source health (requires a health URL from the database)


Submitted query:
   PREFIX bmc: <https://bbp.epfl.ch/ontologies/core/bmc/>
   PREFIX bmo: <https://bbp.epfl.ch/ontologies/core/bmo/>
   PREFIX commonshapes: <https://neuroshapes.org/commons/>
   PREFIX datashapes: <https://neuroshapes.org/dash/>
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
   PREFIX oa: <http://www.w3.org/ns/oa#>
   PREFIX obo: <http://purl.obolibrary.org/obo/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFI

In [17]:
len(resources)

2

In [18]:
print(resources[0])

{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/scholarlyarticles/91941
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: Neurons in the medial septal/diagonal band complex (MS/DB) in vivo exhibit rhythmic burst-firing activity that is phase-locked with the hippocampal theta rhythm. The aim was to assess the morphology of local axon collaterals of electrophysiologically identified MS/DB neurons using intracellular recording and biocytin injection in vitro. Cells were classified according to previous criteria into slow-firing, fast-spiking, regular-spiking, and burst-firing neurons; previous work has suggested that the slow-firing neurons are cholinergic and that the other types are GABAergic. A novel finding was the existence of two types of burst-firing neuron. Type I burst-firing neurons had significantly longer duration after hyperpolarisation potentials when held at -60 mV, and at -75 mV, type I neurons exhibit

In [19]:
uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""

In [20]:
uresources = forge.sparql(query=uquery, db_source='UniProt', limit=10, debug=True)

Submitted query:
   
   PREFIX up: <http://purl.uniprot.org/core/>
   SELECT ?protein
   WHERE {
     ?protein a up:Protein ;
     up:reviewed true.
   }
     LIMIT 10
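
The submitted query above shows the `limit` argument turned into a trailing `LIMIT` clause. A minimal sketch of that step follows, assuming the clause is only appended when the query does not already end with one. `with_limit` is a hypothetical helper, not a Nexus Forge function, and the regular expression only inspects the end of the query.

```python
import re


def with_limit(query, limit):
    """Append a LIMIT clause unless `limit` is None or the query already ends with one."""
    already_limited = re.search(r"\bLIMIT\s+\d+\s*$", query, flags=re.IGNORECASE)
    if limit is None or already_limited:
        return query
    return f"{query.rstrip()}\nLIMIT {limit}"


base_query = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
  up:reviewed true.
}
"""

limited_query = with_limit(base_query, 10)
print(limited_query)
```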



In [21]:
len(uresources)

10

In [22]:
uresources[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, _inner_sync=False, protein='http://purl.uniprot.org/uniprot/A0B137')

In [23]:
uniprot = sources['UniProt']

In [24]:
complete_query = """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?protein ?gene
WHERE {
  ?protein a up:Protein ;
    up:reviewed true ;
    up:encodedBy ?gene .
  ?gene skos:prefLabel ?gene_label .
}
"""

In [25]:
from kgforge.core.wrappings.paths import Filter, FilterOperator

In [26]:


proteins = forge.search({'type': 'Protein', 'up:reviewed': True}, db_source='UniProt', limit=10, debug=True)

Submitted query:
   PREFIX up: <http://purl.uniprot.org/core/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX owl2xml: <http://www.w3.org/2006/12/owl2-xml#>
   PREFIX swrlb: <http://www.w3.org/2003/11/swrlb#>
   PREFIX protege: <http://protege.stanford.edu/plugins/owl/protege#>
   PREFIX swrl: <http://www.w3.org/2003/11/swrl#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX dc11: <http://purl.org/dc/terms/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?id WHERE {?id rdf:type up:Protein;
    up:reviewed ?v1 . 
    FILTER(?v1 = 'true'^^xsd:boolean)
   }  LIMIT 10
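
The debug output above shows the filter dictionary rewritten as a SPARQL query: the `type` entry becomes an `rdf:type` pattern, and every other entry becomes a variable constrained by a `FILTER`. The following is a rough sketch of such a translation under simplifying assumptions: flat filters only, equality only, and no `PREFIX` declarations emitted. `to_literal` and `filters_to_sparql` are hypothetical names, and this is not Nexus Forge's actual query builder.

```python
def to_literal(value):
    """Render a Python value as a SPARQL literal (booleans, numbers, strings only)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return f"'{str(value).lower()}'^^xsd:boolean"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def filters_to_sparql(filters, type_prefix, limit=None):
    """Build a SELECT ?id query from a flat {property: value} dictionary."""
    if not filters:
        raise ValueError("at least one filter is required")
    patterns, constraints = [], []
    for key, value in filters.items():
        if key == "type":
            patterns.append(f"rdf:type {type_prefix}:{value}")
        else:
            var = f"?v{len(constraints) + 1}"
            patterns.append(f"{key} {var}")
            constraints.append(f"FILTER({var} = {to_literal(value)})")
    lines = ["SELECT ?id WHERE {", "  ?id " + " ;\n    ".join(patterns) + " ."]
    lines += [f"  {constraint}" for constraint in constraints]
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = filters_to_sparql({"type": "Protein", "up:reviewed": True}, type_prefix="up", limit=10)
print(query)
```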



In [27]:
proteins[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, id='http://purl.uniprot.org/uniprot/A0B137', _inner_sync=False)

In [29]:
genes = forge.search({'type': 'Gene'}, db_source='UniProt', limit=10)

In [30]:
genes[0]

Resource(_last_action=None, _validated=False, _synchronized=False, _store_metadata=None, id='http://purl.uniprot.org/uniprot/A1KV59#gene-MD5C1D876E2DCF48FF8A37D8833A1B756B4', _inner_sync=False)
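
The gene identifier returned above is a protein URI with a fragment appended. If the accession and the gene fragment are needed separately, they can be split with the standard library, as in the sketch below. `split_uniprot_gene_id` is a hypothetical helper, and treating the last path segment as the accession is an assumption based on the identifiers shown in this notebook.

```python
from urllib.parse import urlsplit


def split_uniprot_gene_id(uri):
    """Split a gene URI into (last path segment, fragment)."""
    parts = urlsplit(uri)
    accession = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return accession, parts.fragment


accession, gene_fragment = split_uniprot_gene_id(
    "http://purl.uniprot.org/uniprot/A1KV59#gene-MD5C1D876E2DCF48FF8A37D8833A1B756B4"
)
print(accession)
print(gene_fragment)
```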

# Save in BBP KG (Nexus)

## Access

### Set filters

In [31]:
_type = "NeuronMorphology"
filters = {"type": _type}

### Run Query

In [33]:
limit = 10  # You can limit the number of results, pass `None` to fetch all the results

data = forge.search(filters, db_source='MouseLight', limit=limit)

print(f"{len(data)} dataset(s) of type {_type} found")

10 dataset(s) of type NeuronMorphology found


### Display the results as pandas dataframe

In [34]:
# Properties to keep, as dotted paths into each resource
property_to_display = [
    "id", "name", "subject",
    "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
    "brainLocation.layer.id", "brainLocation.layer.label",
    "contribution",
    "distribution.name", "distribution.contentUrl", "distribution.encodingFormat",
]
reshaped_data = forge.reshape(data, keep=property_to_display)

forge.as_dataframe(reshaped_data)

Unnamed: 0,id,brainLocation.brainRegion.id,brainLocation.brainRegion.label,contribution.type,contribution.agent.id,contribution.agent.type,contribution.agent.label,distribution.contentUrl,distribution.encodingFormat,distribution.name,name,subject.type,subject.species.id,subject.species.label,subject.strain.label
0,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Primary motor area Layer 5,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1050.swc,AA1050,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
1,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Primary somatosensory area mouth layer 5,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1049.swc,AA1049,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
2,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Retrosplenial area ventral part layer 5,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1045.swc,AA1045,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
3,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Parafascicular nucleus,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1046.swc,AA1046,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
4,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Medial mammillary nucleus,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1048.swc,AA1048,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
5,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Primary somatosensory area mouth layer 5,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1051.swc,AA1051,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
6,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Entorhinal area lateral part,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1047.swc,AA1047,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
7,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Primary motor area Layer 6a,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1043.swc,AA1043,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
8,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Medial mammillary nucleus,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1041.swc,AA1041,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
9,https://bbp.epfl.ch/neurosciencegraph/data/neu...,http://api.brain-map.org/api/v2/data/Structure...,Medial mammillary nucleus,Contribution,https://www.grid.ac/institutes/grid.443970.d,Organization,Janelia Research Campus,https://staging.nise.bbp.epfl.ch/nexus/v1/file...,application/swc,AA1039.swc,AA1039,Subject,http://purl.obolibrary.org/obo/NCBITaxon_10090,Mus musculus,Sim1-Cre
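
The column names in the table above are dotted paths into nested resources. Keeping a parent property such as `subject` yields one column per nested leaf (`subject.type`, `subject.species.label`, and so on). The sketch below reproduces that behaviour on a plain dictionary. `flatten` and `keep_paths` are hypothetical helpers, not the implementation of `forge.reshape` or `forge.as_dataframe`, and the sample record reuses a few values from the first row while omitting the truncated identifiers.

```python
def flatten(record, parent=""):
    """Flatten nested dictionaries into a single level keyed by dotted paths."""
    flat = {}
    for key, value in record.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def keep_paths(flat, keep):
    """Keep the entries whose path equals, or is nested under, a path in `keep`."""
    return {
        path: value
        for path, value in flat.items()
        if any(path == k or path.startswith(k + ".") for k in keep)
    }


record = {
    "name": "AA1050",
    "brainLocation": {"brainRegion": {"label": "Primary motor area Layer 5"}},
    "subject": {"type": "Subject", "species": {"label": "Mus musculus"}},
    "distribution": {"name": "AA1050.swc", "encodingFormat": "application/swc"},
}

row = keep_paths(
    flatten(record),
    keep=["name", "subject", "brainLocation.brainRegion.label", "distribution.name"],
)
print(row)
```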
