# Cell Type Ontology

This notebook was put together following discussion on the ticket [MMB2022-32](https://bbpteam.epfl.ch/project/issues/browse/MMB2022-32).

# Imports

In [1]:
import json
import rdflib
import pandas as pd
from rdflib import RDF, RDFS, XSD, OWL, URIRef, BNode
from rdflib.paths import OneOrMore
from bmo_tools.ontologies import subontology_from_term

# Available Cell Types

## Cell Type Ontology from WebProtégé

This ontology file was downloaded from WebProtégé: https://webprotege.kcp.bbp.epfl.ch/#projects/968c9144-bca3-4436-bdb5-6529d46016b9/edit/Classes

In [2]:
cell_type_ontology = rdflib.Graph()
cell_type_ontology.parse("/Users/akkaufma/Desktop/cell-type-ontology-(workspace)-ontologies-owl-REVISION-HEAD/urn:webprotege:ontology:b307df0e-232d-4e20-9467-80e0733ecbec.owl")

<Graph identifier=N0c4f0a8675b747d7955ce8d4e82736e0 (<class 'rdflib.graph.Graph'>)>

In [3]:
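# Note: subjects() yields one subject per triple, so this counts triples (same as len(cell_type_ontology)),
# not distinct subjects; len(set(cell_type_ontology.subjects())) would count distinct subjects
len(list(cell_type_ontology.subjects()))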

3497

## ME-Type to T-Type compatibility mapping from Yann Roussel

This file was shared on this ticket: [MMB2022-32](https://bbpteam.epfl.ch/project/issues/browse/MMB2022-32)

In [4]:
with open("./me_type_to_t_type_compatibility.json") as f:
    met_mapping = json.load(f)

In [6]:
# met_mapping.values()

In [7]:
# Collect the unique T-type labels across all ME-type entries
t_types = list({t_type for t_type_list in met_mapping.values() for t_type in t_type_list})
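The cell above flattens the mapping's values, which suggests `met_mapping` maps each ME-type label to a list of compatible T-type labels. Assuming that shape, here is a minimal standard-library sketch that builds the inverse lookup (T-type to the ME-types compatible with it). `invert_mapping` is a hypothetical helper, and the toy mapping is invented for illustration; it is not taken from the shared JSON file.

```python
from collections import defaultdict


def invert_mapping(me_to_t):
    """Invert an ME-type -> [T-type, ...] mapping into T-type -> sorted [ME-type, ...]."""
    t_to_me = defaultdict(set)
    for me_type, t_type_list in me_to_t.items():
        for t_type in t_type_list:
            t_to_me[t_type].add(me_type)
    # Sort for deterministic output; return a plain dict so unknown keys raise KeyError
    return {t_type: sorted(me_types) for t_type, me_types in t_to_me.items()}


# Toy mapping (hypothetical values, for illustration only)
toy_mapping = {
    "L1_DAC_bNAC": ["Vip_3", "Pvalb_12"],
    "L1_DAC_cNAC": ["Vip_3"],
}
t_to_me = invert_mapping(toy_mapping)
unique_t_types = sorted(t_to_me)  # the same set of T-types the cell above collects
print(t_to_me["Vip_3"])  # ['L1_DAC_bNAC', 'L1_DAC_cNAC']
```

The keys of the inverted dict are exactly the unique T-types, so the inversion also gives `t_types` for free.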

## Cell Types from the Blue Brain Cell Atlas

The cell types were copied from the Blue Brain Cell Atlas (https://bbp.epfl.ch/nexus/cell-atlas/?all=1) and pasted into an Excel sheet.

In [8]:
cell_atlas_types = pd.read_excel("/Users/akkaufma/Desktop/Blue Brain Cell Atlas Cell Types.xlsx")

In [9]:
cell_atlas_types.head()

|   | Parent | Child |
|---|---|---|
| 0 | Excitatory | BPC |
| 1 | Excitatory | DCNe |
| 2 | Excitatory | GrC |
| 3 | Excitatory | IO |
| 4 | Excitatory | IPC |


## All labels from Cell Type Ontology

In [10]:
# Collect every rdfs:label as a plain string (language tags are dropped);
# a set makes the `in labels` checks below fast
labels = {str(o) for o in cell_type_ontology.objects(None, RDFS.label)}

# Check for labels missing in the Cell Type Ontology
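Each subsection below repeats the same loop: keep the candidates whose label is not in `labels`. A minimal helper for that pattern is sketched here, assuming `labels` is any container of label strings; `find_missing` is a hypothetical name, not part of `bmo_tools`. Unlike the loops below, it drops repeated candidates (such as the three `L1_SLAC` entries in the M-type check) while keeping first-seen order.

```python
def find_missing(candidates, labels):
    """Return candidates absent from `labels`, deduplicated, in first-seen order."""
    known = set(labels)  # set membership is O(1) on average
    seen = set()
    missing = []
    for candidate in candidates:
        if candidate not in known and candidate not in seen:
            seen.add(candidate)
            missing.append(candidate)
    return missing


# Toy example (hypothetical ontology labels)
ontology_labels = ["L1_DAC", "bNAC", "cNAC"]
print(find_missing(["L1_DLAC", "L1_SLAC", "L1_SLAC", "L1_DAC"], ontology_labels))
# ['L1_DLAC', 'L1_SLAC']
```

Note that deduplicating changes the counts reported below, which include repeats.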

## Check which Cell Atlas labels are not yet present

In [11]:
len(set(cell_atlas_types.Child))

60

In [12]:
missing_cell_atlas_types = list()
for el in list(cell_atlas_types.Child):
    if el not in labels:
        missing_cell_atlas_types.append(el)

In [13]:
len(missing_cell_atlas_types)

54

In [14]:
len(set(cell_atlas_types.Parent))

4

In [15]:
for el in list(set(cell_atlas_types.Parent)):
    if el not in labels:
        missing_cell_atlas_types.append(el)

In [16]:
len(missing_cell_atlas_types)

57

In [17]:
missing_cell_atlas_types[:5]

['BPC', 'DCNe', 'GrC', 'IO', 'IPC']

## Check which ME-types are not yet present

In [18]:
len(met_mapping.keys())

195

In [19]:
missing_me_types = list()
for el in met_mapping.keys():
    if el not in labels:
        missing_me_types.append(el)

In [20]:
len(missing_me_types)

177

In [21]:
missing_me_types[:5]

['L1_DAC_bNAC', 'L1_DAC_cNAC', 'L1_DLAC_cNAC', 'L1_HAC_bNAC', 'L1_HAC_cIR']

## Check which T-types are not yet present

In [22]:
len(t_types)

217

In [23]:
missing_t_types = list()
for el in t_types:
    if el not in labels:
        missing_t_types.append(el)

In [24]:
len(missing_t_types)

217

In [25]:
missing_t_types[:5]

['L2/3 IT Plch1_5', 'L4/5 IT_3', 'L2/3 IT Cxcl14_8', 'Pvalb_12', 'Vip_3']

## Check which M-types are not yet present

In [26]:
missing_m_types = list()
for el in met_mapping.keys():
    fragments = el.split("_")
    if len(fragments) == 3:
        m_type = f"{fragments[0]}_{fragments[1]}"
        if m_type not in labels:
            missing_m_types.append(m_type)

In [27]:
len(missing_m_types)

4

In [28]:
missing_m_types[:5]

['L1_DLAC', 'L1_SLAC', 'L1_SLAC', 'L1_SLAC']
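Both the M-type and E-type checks split an ME-type label on `_` and only handle labels with exactly three fragments, silently skipping any others. The sketch below is an alternative that splits off only the last fragment. It assumes the E-type is always the final `_`-separated part and never contains `_` itself. For three-fragment labels it gives the same M-type and E-type as the cells in this notebook. `split_me_type` is a hypothetical helper.

```python
def split_me_type(me_type):
    """Split an ME-type label into (m_type, e_type) at the last underscore.

    Assumes the E-type is the final '_'-separated fragment. Returns None
    when the label contains no underscore at all.
    """
    m_type, sep, e_type = me_type.rpartition("_")
    if not sep:
        return None
    return m_type, e_type


print(split_me_type("L1_DAC_bNAC"))   # ('L1_DAC', 'bNAC')
print(split_me_type("L1_DLAC_cNAC"))  # ('L1_DLAC', 'cNAC')
```

Labels with more than three fragments would keep everything before the last underscore as the M-type, which may or may not match how M-types are labelled in the ontology.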

## Check which E-types are not yet present

In [72]:
missing_e_types = list()
for el in met_mapping.keys():
    fragments = el.split("_")
    if len(fragments) == 3:
        e_type = fragments[2]
        if e_type not in labels:
            missing_e_types.append(e_type)

In [73]:
len(missing_e_types)

0

# Add a new class to the Cell Type Ontology

In [98]:
def add_term(label, parent_label=None):
    """Add `label` as a new owl:Class, optionally as a subclass of the class labelled `parent_label`."""
    # IRI derived from the label with spaces removed
    new_s = rdflib.URIRef(f"http://bbp.epfl.ch/neurosciencegraph/ontologies/celltypes/{label.replace(' ', '')}") # TODO set the base
    if parent_label:
        # Parent classes are matched by their English rdfs:label
        parents = list(cell_type_ontology.subjects(RDFS.label, rdflib.Literal(parent_label, lang="en")))
        if not parents:
            print(f"No class labelled '{parent_label}' found; '{label}' is added without a parent")
        for parent in parents:
            print(parent)
            cell_type_ontology.add((new_s, RDFS.subClassOf, parent))
    cell_type_ontology.add((new_s, RDFS.label, rdflib.Literal(label, lang="en")))
    cell_type_ontology.add((new_s, RDF.type, OWL.Class))
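`add_term` builds each IRI by stripping spaces from the label. That is enough for the M-type labels added here, but several of the missing T-type labels contain `/` (e.g. `L2/3 IT Plch1_5`), which would introduce an extra path segment into the IRI. The following is a minimal standard-library sketch of percent-encoding the local name. It assumes the same placeholder base that the TODO refers to, and `make_iri` is a hypothetical helper. Whether the ontology should use label-derived IRIs at all is a separate decision.

```python
from urllib.parse import quote

# Placeholder base copied from add_term (its TODO says the base still needs to be set)
BASE = "http://bbp.epfl.ch/neurosciencegraph/ontologies/celltypes/"


def make_iri(label, base=BASE):
    """Build an IRI string from a label: drop spaces, then percent-encode the rest."""
    local_name = label.replace(" ", "")
    # safe="" makes quote() also encode "/", which it leaves alone by default
    return base + quote(local_name, safe="")


print(make_iri("L1_DLAC"))          # ends with 'L1_DLAC'
print(make_iri("L2/3 IT Plch1_5"))  # ends with 'L2%2F3ITPlch1_5'
```

The resulting string could be wrapped in `rdflib.URIRef` exactly as `add_term` does now.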

In [99]:
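# missing_m_types repeats some labels (e.g. 'L1_SLAC'); re-adding identical triples is harmless
# in an rdflib Graph, but deduplicating avoids redundant calls
for m in sorted(set(missing_m_types)):
    add_term(m)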
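`add_term` attaches `rdfs:subClassOf` only if a class with the parent's English label already exists. So when adding the missing Cell Atlas types, which come as Parent/Child rows, any missing parents have to be added before their children. Below is a standard-library sketch of that ordering, using rows from the `head()` output above. `order_for_insertion` is a hypothetical helper, and treating all of these rows as missing is an assumption made for illustration.

```python
def order_for_insertion(rows, missing):
    """Order (label, parent_label) pairs so every missing parent precedes its children.

    rows: iterable of (parent, child) label pairs.
    missing: labels not yet in the ontology.
    Missing parents get no parent of their own (None), like add_term(m) above.
    """
    rows = list(rows)  # iterate twice, so materialize generators
    missing = set(missing)
    seen = set()
    pairs = []
    # First pass: missing parents, added without a parent class
    for parent, _ in rows:
        if parent in missing and parent not in seen:
            seen.add(parent)
            pairs.append((parent, None))
    # Second pass: missing children, linked to their parent by label
    for parent, child in rows:
        if child in missing and child not in seen:
            seen.add(child)
            pairs.append((child, parent))
    return pairs


rows = [("Excitatory", "BPC"), ("Excitatory", "DCNe"), ("Excitatory", "GrC")]
print(order_for_insertion(rows, {"Excitatory", "BPC", "DCNe", "GrC"}))
# [('Excitatory', None), ('BPC', 'Excitatory'), ('DCNe', 'Excitatory'), ('GrC', 'Excitatory')]
```

With the notebook's DataFrame, the rows could come from `zip(cell_atlas_types.Parent, cell_atlas_types.Child)`, and each resulting pair would be passed as `add_term(label, parent_label=parent)`.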

# Serialize the updated Cell Type Ontology

In [100]:
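# Explicit format: rdflib 6 defaults to Turtle, but older versions defaulted to RDF/XML
cell_type_ontology.serialize(destination="/Users/akkaufma/Desktop/cell-type-ontology.ttl", format="turtle")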

<Graph identifier=N0c4f0a8675b747d7955ce8d4e82736e0 (<class 'rdflib.graph.Graph'>)>

The updated Cell Type Ontology then needs to be merged back into WebProtégé by:
1. Navigating to the [Cell Type Ontology](https://webprotege.kcp.bbp.epfl.ch/#projects/968c9144-bca3-4436-bdb5-6529d46016b9/edit/Classes)
2. Clicking on `Project` in the top right corner
3. Clicking on `Apply External Edits`
4. Selecting the file that you have just serialized