# Create a cell types Excel table from the Cell Types Ontology

## Context

- See JIRA task [DKE-1041](https://bbpteam.epfl.ch/project/issues/browse/DKE-1041)
- Georges Khazen's [AnnotationMappingTable](https://docs.google.com/spreadsheets/d/1Ky0FA1XaJru9od9lze9d_7ZaFSi7OkqH/edit#gid=2119141304)
- The mapping is implemented below for `Neuron Type`

## Imports

In [None]:
import rdflib
import pandas as pd
from rdflib import RDF, RDFS, XSD, OWL, URIRef, BNode, SKOS

## Helper

In [None]:
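def get_query(cell_type_type_id: str) -> str:
    """Build the SPARQL query listing the leaf subclasses of `cell_type_type_id`.

    Braces that belong to SPARQL are doubled because the query is an f-string.
    """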
    query = f"""

       PREFIX bmc: <https://bbp.epfl.ch/ontologies/core/bmc/>
       PREFIX bmo: <https://bbp.epfl.ch/ontologies/core/bmo/>
       PREFIX commonshapes: <https://neuroshapes.org/commons/>
       PREFIX datashapes: <https://neuroshapes.org/dash/>
       PREFIX dc: <http://purl.org/dc/elements/1.1/>
       PREFIX dcat: <http://www.w3.org/ns/dcat#>
       PREFIX dcterms: <http://purl.org/dc/terms/>
       PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
       PREFIX nsg: <https://neuroshapes.org/>
       PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
       PREFIX oa: <http://www.w3.org/ns/oa#>
       PREFIX obo: <http://purl.obolibrary.org/obo/>
       PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX prov: <http://www.w3.org/ns/prov#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       PREFIX schema: <http://schema.org/>
       PREFIX sh: <http://www.w3.org/ns/shacl#>
       PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
       PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
       PREFIX vann: <http://purl.org/vocab/vann/>
       PREFIX void: <http://rdfs.org/ns/void#>
       PREFIX xml: <http://www.w3.org/XML/1998/namespace/>
       PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   
   
       SELECT 
       ?brain_region_label 
       ?brain_region_id 
       ?species_id
       ?species_label
       ?cell_type_label 
       ?cell_type_id 
       ?cell_type_source
       ?cell_type_type_label
       ?transmitter_type_label
       ?transmitter_type_id
   
       WHERE {{
               ?cell_type_id rdfs:subClassOf* <{cell_type_type_id}> ;
                       rdfs:label ?cell_type_label .
               <{cell_type_type_id}> rdfs:label ?cell_type_type_label .
               
               
            OPTIONAL {{ 
               ?cell_type_id rdfs:subClassOf* ?region_restriction .
               ?region_restriction a owl:Restriction ;
                   owl:onProperty bmo:canBeLocatedInBrainRegion ;
                   owl:someValuesFrom ?brain_region_id .
                }} .
                   
            OPTIONAL {{
               ?cell_type_id rdfs:subClassOf* ?transmitter_restriction .
               ?transmitter_restriction a owl:Restriction ;
                   owl:onProperty <https://bbp.epfl.ch/ontologies/core/mtypes/hasNeurotransmitterType> ;
                   owl:someValuesFrom ?transmitter_type_id .
               ?transmitter_type_id rdfs:label ?transmitter_type_label .
               }} .
               
            OPTIONAL {{
                ?cell_type_id rdfs:seeAlso ?cell_type_source 
            }} .
            
            OPTIONAL {{
            
                ?cell_type_id rdfs:subClassOf* ?species_restriction .
                ?species_restriction a owl:Restriction ;
                   owl:onProperty <https://neuroshapes.org/hasInstanceInSpecies> ;
                   owl:someValuesFrom ?species_id .
                ?species_id rdfs:label ?species_label .
            }} .
               
            FILTER NOT EXISTS {{ ?s rdfs:subClassOf ?cell_type_id }} .

       }}
         LIMIT 1000
    """
    
    return query

## Load Cell Types Ontology

`celltypes.ttl` was downloaded from WebProtégé.

In [None]:
cell_types_ontology = rdflib.Graph()
cell_types_ontology.parse("./celltypes.ttl")

## Load Brain Region Ontology

`brainregion.ttl` was downloaded from WebProtégé.

In [None]:
brainregion_ontology = rdflib.Graph()
brainregion_ontology.parse("./brainregion.ttl")

## Query

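The query is run once for each of the root classes `Neuron t-type`, `Neuron m-type` and `Neuron e-type`. Each run returns the classes under that root that have no subclasses of their own, capped at 1000 rows by the `LIMIT` in the query.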

In [None]:
cell_type_type_ids = [
    "https://bbp.epfl.ch/ontologies/core/celltypes/NeuronTranscriptomicType",
    "https://bbp.epfl.ch/ontologies/core/bmo/NeuronMorphologicalType",
    "https://bbp.epfl.ch/ontologies/core/bmo/NeuronElectricalType"
]

In [None]:
rows = []
for cell_type_type_id in cell_type_type_ids:
    query = get_query(cell_type_type_id)
    # Iterating a query result yields one row per solution
    rows.extend(cell_types_ontology.query(query))

## Save to Excel

In [None]:
# Column order must match the SELECT clause of the query
df = pd.DataFrame(rows, columns=[
    "brain_region_label",
    "brain_region_id",
    "species_id",
    "species_label",
    "cell_type_label",
    "cell_type_id",
    "cell_type_source",
    "cell_type_type_label",
    "transmitter_type_label",
    "transmitter_type_id",
])

In [None]:
df.head()

In [None]:
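# iterrows() yields copies of each row, so write back through df.at
for idx, row in df.iterrows():
    br_id = row.brain_region_id
    if pd.isna(br_id):  # no brain region bound for this cell type
        continue
    for _, _, label in brainregion_ontology.triples((URIRef(str(br_id)), RDFS.label, None)):
        df.at[idx, "brain_region_label"] = str(label)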

In [None]:
df.to_excel("./celltypes.xlsx")