# Querying external database sources of interest

* Enable users to integrate data from external databases of interest within BBP KG
* While using the Nexus Forge interface and BMO vocabulary as much as possible
* While benefiting from out of the box (meta)data transformation to make them ready for BBP internal pipelines and applications
* Demo with MouseLight, NeuroElectro, UniProt

In [1]:
import json

from kgforge.core import KnowledgeGraphForge
from kgforge.specializations.resources import Dataset

In [2]:
endpoint = "https://staging.nise.bbp.epfl.ch/nexus/v1"
BUCKET = "neurosciencegraph/datamodels"
forge = KnowledgeGraphForge("../../configurations/database-sources/prod-nexus-sources.yml", endpoint=endpoint, bucket=BUCKET)

# List of data sources

In [3]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
MouseLight


In [4]:
sources = forge.db_sources(pretty=False)

In [5]:

data = {
    "store": {
        "name": "DemoStore"
    },
    "model": {
        "name": "DemoModel",
        "origin": "directory",
        "source": "../../../tests/data/demo-model/"
    }
}


In [6]:
from kgforge.specializations.resources import DatabaseSource
ds = DatabaseSource(forge, name="DemoDB", from_forge=False, **data)

In [7]:
print(ds)

{
    type: Database
    _store:
    {
        context: null
        bucket: null
        endpoint: null
        file_mapping: null
        metadata_context: null
        model_context: null
        service:
        {
            archives: {}
            records: {}
            tags: {}
        }
        token: null
        versioned_id_template: null
    }
    model:
    {
        origin: directory
        source: ../../../tests/data/demo-model/
    }
    name: DemoDB
    store:
    {
        name: DemoStore
    }
}


In [8]:
forge.db_sources(pretty=True)

Available Database sources:
UniProt
NeuroElectro
MouseLight
DemoDB


# Data source metadata

In [9]:
mouselight = sources["MouseLight"]

## Name, description, URL, license, protocol (more can be added through configuration)

In [10]:
print(mouselight.name)
print(mouselight.protocol)
print(mouselight.license)

MouseLight
https://www.janelia.org/project-team/mouselight/resources
{'id': 'https://creativecommons.org/licenses/by-nc/4.0', 'label': 'CC BY-NC 4.0'}


## Get data mappings (which hold the transformation logic) per data type

* Data mappings are used to transform results obtained from the external data sources so that they are ready for consumption by BBP tools
* Perform automatic ontology linking
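
To make the first bullet concrete, here is a minimal sketch of the idea behind a dictionary mapping: each target property is paired with a rule that is evaluated against the source record `x`. This is not the Nexus Forge implementation. The `apply_rules` helper, the plain-dict record, and all sample values (`AA0001`, `Example region`, `123`, the coordinates) are assumptions made for illustration; only the rule expressions mirror those shown in the MouseLight mapping in this notebook.

```python
# Hypothetical record shaped like the payload the MouseLight rules refer to.
# All values are made up for illustration.
record = {
    "neurons": [
        {
            "idString": "AA0001",
            "allenLabel": "Example region",
            "soma": {"x": 1.0, "y": 2.0, "z": 3.0, "allenId": 123},
        }
    ]
}

# Each rule is a function of the source record `x`, keyed by a dotted target path.
rules = {
    "type": lambda x: ["Dataset", "NeuronMorphology"],
    "brainLocation.brainRegion.id": lambda x: (
        "http://api.brain-map.org/api/v2/data/Structure/"
        f"{x['neurons'][0]['soma']['allenId']}"
    ),
    "brainLocation.brainRegion.label": lambda x: x["neurons"][0]["allenLabel"],
    "brainLocation.coordinatesInBrainAtlas.valueX": lambda x: x["neurons"][0]["soma"]["x"],
}


def apply_rules(rules, x):
    """Evaluate every rule against `x` and nest the results by dotted path."""
    out = {}
    for path, rule in rules.items():
        *parents, leaf = path.split(".")
        node = out
        for key in parents:
            # Create intermediate dictionaries as needed.
            node = node.setdefault(key, {})
        node[leaf] = rule(x)
    return out


mapped = apply_rules(rules, record)
print(mapped["brainLocation"]["brainRegion"])
```

Keeping the rules as data, separate from the code that applies them, is what allows one mapping per entity type and per data source.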

In [11]:
forge.mappings("MouseLight", pretty=False)

{'NeuronMorphology': ['DictionaryMapping']}

In [12]:
forge.mappings('UniProt', pretty=True)

Managed mappings for the data source per entity type and mapping type:
   - NeuronElectrophysiologicalFeature:
        * DictionaryMapping


In [13]:
forge.mappings('NeuroElectro', pretty=True)

Managed mappings for the data source per entity type and mapping type:
   - ElectrophysiologicalFeatureAnnotation:
        * DictionaryMapping
   - ParameterAnnotation:
        * DictionaryMapping
   - ParameterBody:
        * DictionaryMapping
   - ScholarlyArticle:
        * DictionaryMapping
   - SeriesBody:
        * DictionaryMapping


In [14]:
from kgforge.specializations.mappings import DictionaryMapping
mapping = forge.mapping("NeuronMorphology", "MouseLight", type=DictionaryMapping)
direct_mapping = mouselight.mapping("NeuronMorphology", type=DictionaryMapping)

In [15]:
print(mapping)

{
    id: forge.format("identifier", "neuronmorphologies/mouselight", x.neurons[0]["idString"])
    type:
    [
        Dataset
        NeuronMorphology
    ]
    brainLocation:
    {
        type: BrainLocation
        brainRegion:
        {
            id: f"http://api.brain-map.org/api/v2/data/Structure/{x.neurons[0]['soma']['allenId']}"
            label: x.neurons[0]["allenLabel"]
        }
        coordinatesInBrainAtlas:
        {
            valueX: x.neurons[0]["soma"]["x"]
            valueY: x.neurons[0]["soma"]["y"]
            valueZ: x.neurons[0]["soma"]["z"]
        }
    }
    contribution:
    {
        type: Contribution
        agent:
        {
            id: https://www.grid.ac/institutes/grid.443970.d
            type: Organization
            label: Janelia Research Campus
        }
    }
    dateCreated: x.neurons[0]["sample"]["date"]
    description: x.neurons[0]["annotationSpace"]["description"]
    distribution: forge.attach(f"./mouselight/{x.neurons[0]['idSt

In [16]:
print(direct_mapping)

{
    id: forge.format("identifier", "neuronmorphologies/mouselight", x.neurons[0]["idString"])
    type:
    [
        Dataset
        NeuronMorphology
    ]
    brainLocation:
    {
        type: BrainLocation
        brainRegion:
        {
            id: f"http://api.brain-map.org/api/v2/data/Structure/{x.neurons[0]['soma']['allenId']}"
            label: x.neurons[0]["allenLabel"]
        }
        coordinatesInBrainAtlas:
        {
            valueX: x.neurons[0]["soma"]["x"]
            valueY: x.neurons[0]["soma"]["y"]
            valueZ: x.neurons[0]["soma"]["z"]
        }
    }
    contribution:
    {
        type: Contribution
        agent:
        {
            id: https://www.grid.ac/institutes/grid.443970.d
            type: Organization
            label: Janelia Research Campus
        }
    }
    dateCreated: x.neurons[0]["sample"]["date"]
    description: x.neurons[0]["annotationSpace"]["description"]
    distribution: forge.attach(f"./mouselight/{x.neurons[0]['idSt

In [17]:
forge.db_sources(with_datatype='NeuronMorphology', pretty=True)

Available Database sources:
MouseLight


In [18]:
ne = sources['NeuroElectro']

# Search and access data from a data source

* Mappings are automatically applied to search results
* A search takes about a minute for now; work is ongoing to make it faster

In [19]:
# Filters: type, source or target brain region
filters = {"type": "ScholarlyArticle"}
# Commented-out options: map=True, use_cache=True, download=True
resources = forge.search(filters, db_source="NeuroElectro", limit=2)
# TODO: add a function for checking data source health (requires a health URL from the database)


In [20]:
len(resources)

2

In [21]:
print(resources[0])

{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/scholarlyarticles/35463
    type:
    [
        Entity
        ScholarlyArticle
    ]
    abstract: Rationally, an increased intrinsic excitability of dorsal horn neurons could be a factor contributing to alter the gain of the nociceptive system during central sensitization, however direct evidence is scarce. Here we have examined this hypothesis using current and voltage-clamp recordings from dorsal horn neurons in the spinal cord in vitro preparation obtained from mice pups of either sex. Cords were extracted from carrageenan-pretreated and control animals to allow for comparison. Dorsal horn neurons from treated animals showed significantly larger and faster synaptic responses. Synaptic changes started developing shortly after inflammation (1 h) and developed further after a longer-term inflammation (20 h). However, these neurons showed biphasic changes in membrane excitability with an incr

In [22]:
uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein .
  ?protein up:reviewed true .
}
"""

In [23]:
uresources = forge.sparql(query=uquery, db_source='UniProt', limit=10, debug=True)

Submitted query:
   
   PREFIX up: <http://purl.uniprot.org/core/>
   SELECT ?protein
   WHERE {
     ?protein a up:Protein .
     ?protein up:reviewed true .
   }
     LIMIT 10
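
The submitted query printed above ends with a `LIMIT 10` clause that is not part of `uquery`. A minimal sketch of how such a clause could be appended is shown here. The `with_limit` helper is hypothetical and is not how Nexus Forge is known to do it; it only illustrates the rewriting visible in the debug output.

```python
import re


def with_limit(query: str, limit: int) -> str:
    """Append a LIMIT clause unless the query already ends with one."""
    if re.search(r"\bLIMIT\s+\d+\s*$", query, flags=re.IGNORECASE):
        return query
    return f"{query.rstrip()}\nLIMIT {limit}"


uquery = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein .
  ?protein up:reviewed true .
}
"""

limited = with_limit(uquery, 10)
print(limited)
```

Applying the helper a second time leaves the query unchanged, so a limit already present in the query is not overridden.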



In [24]:
forge.search({"type": 'Protein'}, db_source='UniProt')

<action> search
<error> ValueError: context model missing



# Save in BBP KG (Nexus)

In [25]:
# forge.register(resources)

## Access

### Set filters

In [26]:
_type = "NeuronMorphology"
filters = {"type": _type}

### Run Query

In [27]:
limit = 10  # You can limit the number of results, pass `None` to fetch all the results

data = forge.search(filters, limit=limit)

print(f"{len(data)} dataset(s) of type {_type} found")

0 dataset(s) of type NeuronMorphology found


### Display the results as pandas dataframe

In [28]:
# Properties to keep; dotted names refer to nested properties
property_to_display = [
    "id", "name", "subject",
    "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
    "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
    "distribution.name", "distribution.contentUrl", "distribution.encodingFormat",
]
reshaped_data = forge.reshape(data, keep=property_to_display)

forge.as_dataframe(reshaped_data)

### Download

In [29]:
dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

<action> download
<error> DownloadingError: path to follow 'distribution.contentUrl' was not found in any provided resource.
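
The error above is consistent with the search having returned 0 results: there is no resource in which `distribution.contentUrl` could be found. A minimal sketch of a guard that checks this before downloading is shown here. It works on plain dictionaries, under the assumption that results have first been converted (for example with `forge.as_json`). `has_path` is a hypothetical helper, not part of Nexus Forge, and the sample resource is made up.

```python
def has_path(obj, path):
    """Return True if the dotted `path` resolves to a non-null value in `obj`.

    Lists are traversed: the path matches if any element matches.
    """

    def walk(node, remaining):
        if not remaining:
            return node is not None
        if isinstance(node, list):
            return any(walk(item, remaining) for item in node)
        if isinstance(node, dict) and remaining[0] in node:
            return walk(node[remaining[0]], remaining[1:])
        return False

    return walk(obj, path.split("."))


# Stand-in for the empty search result obtained above.
results = []
downloadable = [r for r in results if has_path(r, "distribution.contentUrl")]
print(f"{len(downloadable)} resource(s) with a downloadable distribution")

# Made-up resource showing a positive match, including a list of distributions.
sample = {"distribution": [{"name": "file.swc", "contentUrl": "https://example.org/file.swc"}]}
print(has_path(sample, "distribution.contentUrl"))
```

Calling the download only when `downloadable` is non-empty avoids the exception when a search returns nothing.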

