# Query Cell Type Probabilities

# Context

This notebook was put together in the context of JIRA ticket [DKE-961](https://bbpteam.epfl.ch/project/issues/browse/DKE-961).

# Imports

In [1]:
import json
import rdflib
import getpass
import pandas as pd
from rdflib import RDF, RDFS, XSD, OWL, URIRef, BNode, SKOS
from bmo.ontologies import subontology_from_term
import pprint
from kgforge.core import KnowledgeGraphForge

# Setup

In [None]:
TOKEN = getpass.getpass()

The cell type ontology is stored in the `neurosciencegraph/datamodels` bucket in the knowledge graph. This is the `bucket` we target with the forge instance below.

In [3]:
forge = KnowledgeGraphForge("https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
                            token=TOKEN,
                            endpoint="https://staging.nise.bbp.epfl.ch/nexus/v1",
                            bucket="neurosciencegraph/datamodels")

# Ontologies

## Set brain region

During the meeting on `2022-05-30`, it was specified that a brain region will serve as the entry point when searching for cell types in the MMB context. Hence, this notebook starts by defining the brain region for which cell types and probabilities are retrieved. Since the most complete cell type information is available for the `Cerebral cortex`, it is set as the default below.

In [4]:
BRAIN_REGION = "Cerebral cortex"

Get the brain region id by searching for the label and taking the first result.

In [5]:
r = forge.search({"label": BRAIN_REGION})

In [6]:
brain_region = r[0].id

In [7]:
brain_region

'http://api.brain-map.org/api/v2/data/Structure/688'

# Queries

## Get all cell type combinations and probabilities for a given brain region

This query retrieves the cell type combinations and their probabilities for a given brain region (i.e. the `BRAIN_REGION` specified above). For demonstration purposes, the `limit` parameter on the query has been set to `1000`. It can be increased to get all available cell type combinations.
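The query in this section restricts matches with the property path `^hasPart* <{brain_region}>`, which reads as "the selected region itself, or anything reachable from it through zero or more `hasPart` steps". This is why rows labelled with a sub-region (for example `Isocortex`) can be returned when the entry point is `Cerebral cortex`. The sketch below mimics that traversal in plain Python; the `HAS_PART` dictionary is a simplified, hypothetical hierarchy used only for illustration, not the content of the ontology.

```python
# Simplified, hypothetical hierarchy: region -> direct parts (illustration only)
HAS_PART = {
    "Cerebral cortex": ["Cortical plate", "Cortical subplate"],
    "Cortical plate": ["Isocortex", "Olfactory areas"],
    "Isocortex": ["Primary somatosensory area"],
}


def region_and_descendants(root, has_part):
    """Return root plus every region reachable via zero or more hasPart steps."""
    seen = set()
    stack = [root]
    while stack:
        region = stack.pop()
        if region in seen:
            continue
        seen.add(region)
        # Follow the hasPart edges one level down
        stack.extend(has_part.get(region, []))
    return seen


matches = region_and_descendants("Cerebral cortex", HAS_PART)
print(sorted(matches))
```

The "zero or more" part matters: the starting region is included in the result, so probabilities attached directly to the entry-point region would also be matched.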

In [12]:
query = f"""

SELECT ?brain_region ?m_type ?e_type ?molecular_type ?probability

WHERE {{
        ?probability_id hasTarget / hasSource / hasSomaLocatedIn ?brain_region_id ;
            hasBody / value ?probability ;
            hasTarget ?m_type_target ;
            hasTarget ?e_type_target ;
            hasTarget ?molecular_type_target .
        ?brain_region_id label ?brain_region ;
            ^hasPart* <{brain_region}> .
        ?m_type_target hasSource / a MType ;
            hasSource / label ?m_type .
        ?e_type_target hasSource / a EType ;
            hasSource / label ?e_type .
        ?molecular_type_target hasSource / a NeuronMolecularType ;
            hasSource / label ?molecular_type .
}}
"""
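Because the query is a Python f-string, literal SPARQL braces have to be written as `{{` and `}}`, while `{brain_region}` is replaced by the value of the variable. A minimal sketch of that behaviour, using a shortened query and a hypothetical identifier (`http://example.org/region/1`) in place of the real one:

```python
# Hypothetical identifier standing in for the value of `brain_region`
brain_region = "http://example.org/region/1"

# Doubled braces become single literal braces; {brain_region} is interpolated
query = f"""
SELECT ?brain_region
WHERE {{
    ?brain_region_id label ?brain_region ;
        ^hasPart* <{brain_region}> .
}}
"""

print(query)
```

Printing the query before sending it is a quick way to check that the braces and the region identifier came out as intended.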

In [13]:
resources = forge.sparql(query, limit=1000)

In [14]:
df = forge.as_dataframe(resources)

In [15]:
df.head()

| | brain_region | e_type | m_type | molecular_type | probability |
|---|---|---|---|---|---|
| 0 | Isocortex | bSTUT | L6_MC | SST+ | 0.3070717 |
| 1 | Isocortex | dSTUT | L4_LBC | SST+ | 0.0 |
| 2 | Isocortex | dNAC | L6_SBC | SST+ | 0.24137931 |
| 3 | Isocortex | dNAC | L6_SBC | SST+ | 0.24137931 |
| 4 | Isocortex | dNAC | L6_SBC | SST+ | 0.24137931 |
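The preview above contains identical rows (indices 2 to 4). If one row per combination is wanted, they can be collapsed on the pandas side. The sketch below rebuilds the five preview rows by hand, as a stand-in for the real `df`, and drops exact duplicates. Whether the duplicates should instead be avoided in the query itself (for instance with `SELECT DISTINCT`) is an assumption that has not been checked against this dataset.

```python
import pandas as pd

# Stand-in for `df`: the five rows shown in the preview above
df = pd.DataFrame({
    "brain_region": ["Isocortex"] * 5,
    "e_type": ["bSTUT", "dSTUT", "dNAC", "dNAC", "dNAC"],
    "m_type": ["L6_MC", "L4_LBC", "L6_SBC", "L6_SBC", "L6_SBC"],
    "molecular_type": ["SST+"] * 5,
    "probability": [0.3070717, 0.0, 0.24137931, 0.24137931, 0.24137931],
})

# Keep the first occurrence of each fully identical row and renumber the index
unique_df = df.drop_duplicates().reset_index(drop=True)

print(unique_df)
```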
