# Ancestor and Descendant Mappings for Tissues and Cell Types

## Overview

The ontology-aware tissue and cell type filters in the Single Cell Data Portal each require two artifacts generated by this notebook:

#### 1. Ancestor Mappings
To facilitate result set filtering, datasets must be tagged with the set of ancestors for each tissue and cell type value associated with it. For example, if a dataset is tagged with the tissue `lung`, all ancestors of `lung` must be added to the dataset's `tissue_ancestors` value.

This notebook generates a dictionary of ancestors keyed by tissue or cell type ontology term ID. The dictionary is copied to the Single Cell Data Portal's `tissue_ontology_mappings` or `cell_type_ontology_mappings` constants (see [/utils/ontology_mappings/constants.py](https://github.com/chanzuckerberg/single-cell-data-portal/tree/main/backend/corpora/common/utils/ontology_mappings/constants.py)). When the `datasets/index` API endpoint is requested, the backend joins datasets with their ancestor mappings.
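
As a rough illustration of the join described above, the sketch below tags a dataset with the union of ancestors of its tissues. The dataset shape, the `tag_tissue_ancestors` helper and the ancestor list are assumptions made for this example and are not taken from the portal's backend code.

```python
# Hypothetical excerpt of the generated ancestor dictionary. The notebook's
# list_ancestors adds each term to its own ancestor set, so the key also
# appears among its values. The ancestor list itself is illustrative only.
tissue_ontology_mapping = {
    "UBERON:0002048": ["UBERON:0002048", "UBERON:0001004"],
}


def tag_tissue_ancestors(dataset, mapping):
    """Return a copy of the dataset with a `tissue_ancestors` value added."""
    ancestors = set()
    for tissue_id in dataset["tissue"]:
        # Terms missing from the mapping fall back to the term itself.
        ancestors.update(mapping.get(tissue_id, [tissue_id]))
    return {**dataset, "tissue_ancestors": sorted(ancestors)}


dataset = {"name": "example dataset", "tissue": ["UBERON:0002048"]}
tagged = tag_tissue_ancestors(dataset, tissue_ontology_mapping)
print(tagged["tissue_ancestors"])
```

A filter on any term in `tissue_ancestors` then matches the dataset, which is why ancestors are stored alongside the tissue itself.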

The ancestor mappings should be updated when:

1. The ontology version is updated, or,
2. A new tissue or cell type is added to the production corpus.

#### 2. Descendant Mappings
To facilitate in-filter, cross-panel restriction of filter values, a descendant hierarchy dictionary is required by the Single Cell Data Portal frontend. For example, if a user selects `hematopoietic system` in the tissue filter's `System` panel, the values in the tissue filter's `Organ` and `Tissue` panels must be restricted to descendants of `hematopoietic system`.

This notebook generates a dictionary of descendants keyed by tissue or cell type ontology term ID. The dictionary is copied to the Single Cell Data Portal's frontend constants `TISSUE_DESCENDANTS` or `CELL_TYPE_DESCENDANTS` (see [constants.ts](https://github.com/chanzuckerberg/single-cell-data-portal/blob/main/frontend/src/components/common/Filter/common/constants.ts)).
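
The cross-panel restriction can be sketched as follows. The frontend itself is TypeScript; this sketch is written in Python for consistency with the rest of the notebook. The `restrict_panel_values` helper and the descendant list under the system key are assumptions for illustration, not generated output or portal code.

```python
# Hypothetical excerpt of the generated descendant dictionary; the values
# listed under the system key are illustrative placeholders.
TISSUE_DESCENDANTS = {
    "UBERON:0002390": ["UBERON:0000178", "UBERON:0002371"],
}


def restrict_panel_values(panel_values, selected_ids, descendants_by_id):
    """Keep only panel values that descend from a selected higher-level term."""
    if not selected_ids:
        # Nothing selected in the higher-level panel: no restriction applies.
        return list(panel_values)
    allowed = set()
    for selected_id in selected_ids:
        allowed.update(descendants_by_id.get(selected_id, []))
    return [value for value in panel_values if value in allowed]


organ_panel = ["UBERON:0000178", "UBERON:0002048", "UBERON:0002371"]
restricted = restrict_panel_values(organ_panel, ["UBERON:0002390"], TISSUE_DESCENDANTS)
print(restricted)
```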

The descendant mappings should be updated when:

1. The ontology version is updated,
2. A new tissue or cell type is added to the production corpus, or,
3. The hand-curated systems, organs, cell classes or cell subclasses are updated.

## Notebook Implementation

### Tissues
This notebook extracts a subgraph of UBERON starting with a set of hand-curated systems. Specifically, this notebook: 

1. Loads the required ontology file, pinned for 2.0.0 schema.
2. Builds descendants of all systems and orphans (i.e. tissues in production that have no corresponding system), traversing both `is_a` and `part_of` relationships.
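
The traversal in step 2 can be sketched without an ontology library. In the minimal example below, two toy lookup tables stand in for the real `is_a` and `part_of` queries, and `build_subgraph` is a hypothetical helper rather than one of the notebook's functions.

```python
# Toy child lookups standing in for ontology queries (illustrative only):
# "retina" is_a "photoreceptor array", "photoreceptor array" part_of "eye".
IS_A_CHILDREN = {"photoreceptor array": ["retina"]}
PART_OF_CHILDREN = {"eye": ["photoreceptor array"]}


def build_subgraph(roots, child_lookups):
    """Collect parent -> children edges reachable from the given roots."""
    edges = {}

    def visit(term):
        # Skip terms that have already been expanded.
        if term in edges:
            return
        edges[term] = set()
        for lookup in child_lookups:
            for child in lookup.get(term, []):
                edges[term].add(child)
                visit(child)

    for root in roots:
        visit(root)
    return edges


both = build_subgraph(["eye"], [IS_A_CHILDREN, PART_OF_CHILDREN])
is_a_only = build_subgraph(["eye"], [IS_A_CHILDREN])
print(sorted(both))       # ['eye', 'photoreceptor array', 'retina']
print(sorted(is_a_only))  # ['eye']
```

Following only `is_a` from `eye` never reaches `retina` in this toy data, which illustrates why the tissue subgraph traverses both relationship types.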

From the subgraph, the two artifacts described above can then be generated:

1. Build an ancestor dictionary 
    - Maps every tissue in the production corpus to its ancestors.
    - Writes the dictionary to a JSON file, to be copied into `tissue_ontology_mapping` in the Single Cell Data Portal.

2. Build a descendant dictionary
    - Builds a dictionary mapping every tissue in the production corpus to its descendants. Descendants are limited to tissues lower in the tissue hierarchy than the key itself. For example, systems can have organ or tissue descendants, organs can have tissue descendants, and tissues can have no descendants.
    - Writes the dictionary to a JSON file, to be copied into `TISSUE_DESCENDANTS` in the Single Cell Data Portal.
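
The level restriction on descendants can be sketched as below. This is a simplified illustration under stated assumptions: `limit_to_lower_levels` is a hypothetical helper, the term names are placeholders, and organoid handling is omitted.

```python
def limit_to_lower_levels(hierarchy, all_descendants):
    """
    Restrict each term's descendants to terms from lower hierarchy levels.

    hierarchy: list of ID lists ordered from highest to lowest level.
    all_descendants: dict of term ID -> set of every descendant term ID.
    """
    limited = {}
    for level, terms in enumerate(hierarchy):
        # Every term that sits below the current level is acceptable.
        lower = {t for lower_terms in hierarchy[level + 1:] for t in lower_terms}
        for term in terms:
            kept = sorted(all_descendants.get(term, set()) & lower)
            if kept:
                limited[term] = kept
    return limited


hierarchy = [["system"], ["organ"], ["tissue"]]
all_descendants = {
    "system": {"organ", "tissue", "unlisted"},
    "organ": {"tissue"},
    "tissue": set(),
}
limited = limit_to_lower_levels(hierarchy, all_descendants)
print(limited)
```

Terms at the lowest level never appear as keys because no level exists below them, and descendants that are absent from every lower level (`unlisted` here) are dropped.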

#### Hand-Curation of Systems and Organs
Systems and organs were hand-curated in this [spreadsheet](https://docs.google.com/spreadsheets/d/18761SLamZUN9FLAMV_zmg0lutSSUkArCEs8GnprxtZE/edit#gid=717648045).

### Cell Types
This notebook extracts a subgraph of CL starting with a set of hand-curated cell classes. Specifically, this notebook: 

1. Loads the required ontology file, pinned for 2.0.0 schema.
2. Builds descendants of all cell classes and orphans (i.e. cell types in production that have no corresponding cell class), traversing only `is_a` relationships.

From the subgraph, the two artifacts described above can then be generated:

1. Build an ancestor dictionary 
    - Maps every cell type in the production corpus to its ancestors.
    - Writes the dictionary to a JSON file, to be copied into `cell_type_ontology_mapping` in the Single Cell Data Portal.

2. Build a descendant dictionary
    - Builds a dictionary mapping every cell type in the production corpus to its descendants. Descendants are limited to cell types lower in the cell type hierarchy than the key itself. For example, cell classes can have cell subclass or cell type descendants, cell subclasses can have cell type descendants, and cell types can have no descendants.
    - Writes the dictionary to a JSON file, to be copied into `CELL_TYPE_DESCENDANTS` in the Single Cell Data Portal.

#### Hand-Curation of Cell Classes and Cell Subclasses
Cell classes and cell subclasses were hand-curated in this [spreadsheet](https://docs.google.com/spreadsheets/d/1ebGc-LgZJhNsKinzQZ3rpzuh1e1reSH3Rcbn88mCOaU/edit#gid=1625183014).

In [None]:
!pip install owlready2
!apt install graphviz
!apt install libgraphviz-dev
!pip install pygraphviz

In [3]:
import json
import yaml
import pygraphviz as pgv
from owlready2 import World
# Load owl_info.yml to grab the latest ontology sources.
owl_info_yml = '../backend/ontology_files/owl_info.yml'
with open(owl_info_yml, "r") as owl_info_handle:
  owl_info = yaml.safe_load(owl_info_handle)

In [4]:
# Load CL, pinned for 3.0.0 schema.
cl_latest_key = owl_info['CL']['latest']
cl_ontology = owl_info['CL']['urls'][cl_latest_key]
cl_world = World()
cl_world.get_ontology(cl_ontology).load()

get_ontology("http://purl.obolibrary.org/obo/cl.owl#")

In [5]:
# Load UBERON, pinned for 3.0.0 schema.
uberon_latest_key = owl_info['UBERON']['latest']
uberon_ontology = owl_info['UBERON']['urls'][uberon_latest_key]
uberon_world = World()
uberon_world.get_ontology(uberon_ontology).load()

get_ontology("http://purl.obolibrary.org/obo/uberon.owl#")

#### Tissue Constants

In [6]:
# Hand-curated systems.
system_tissues = [
  "UBERON_0001017", "UBERON_0004535", "UBERON_0001009", "UBERON_0001007", "UBERON_0000922", "UBERON_0000949", "UBERON_0002330", "UBERON_0002390", "UBERON_0002405", "UBERON_0000383", "UBERON_0001016", "UBERON_0000010", "UBERON_0001008", "UBERON_0000990", "UBERON_0001004", "UBERON_0001032", "UBERON_0001434"
]

In [7]:
# Hand-curated organs.
organ_tissues = [
  "UBERON_0000992","UBERON_0000029", "UBERON_0002048", "UBERON_0002110", "UBERON_0001043", "UBERON_0003889", "UBERON_0018707", "UBERON_0000178", "UBERON_0002371", "UBERON_0000955", "UBERON_0000310", "UBERON_0000970", "UBERON_0000948", "UBERON_0000160", "UBERON_0002113", "UBERON_0002107", "UBERON_0000004", "UBERON_0001264", "UBERON_0001987", "UBERON_0002097", "UBERON_0002240", "UBERON_0002106", "UBERON_0000945", "UBERON_0002370", "UBERON_0002046", "UBERON_0001723", "UBERON_0000995", "UBERON_0001013"
]

In [8]:
# Production tissues as of 09/09/2022.
tissues = [
  "UBERON_8410026","UBERON_8410025","UBERON_0002436","UBERON_0009834","UBERON_0016538","UBERON_0002113","UBERON_0000362","CL_0002328 (cell culture)","UBERON_0001225","UBERON_0000059","UBERON_0035328","UBERON_0002116","UBERON_0000160","UBERON_0000178","UBERON_0000115","UBERON_0001005","UBERON_0000947","UBERON_0002048","UBERON_0002098","UBERON_0002084","UBERON_0002080","UBERON_0002094","UBERON_0002079","UBERON_0002078","UBERON_0001384","UBERON_0002370","UBERON_0002367","UBERON_0000057","UBERON_0002037","UBERON_0000006","UBERON_0002728","UBERON_0003126","UBERON_0005616","UBERON_0001155","UBERON_0002107","UBERON_0002097","UBERON_0001013","UBERON_0010032","UBERON_0018707","UBERON_0002371","UBERON_0002081","UBERON_0002082","UBERON_0001811","UBERON_0000964","UBERON_0001621","UBERON_0000016","UBERON_0001295","UBERON_0000017","UBERON_0000970","UBERON_0001542","UBERON_0001817","UBERON_0000029","UBERON_0001911","UBERON_0002378","UBERON_0008612","UBERON_0002385","UBERON_0001296","UBERON_0001831","UBERON_0010033","UBERON_0002382","UBERON_0003902","UBERON_0001773","UBERON_0001416","UBERON_0001868","UBERON_0002108","UBERON_0002106","UBERON_0002190","UBERON_0001736","UBERON_0001723","UBERON_0000995","UBERON_0002049","UBERON_0000955","UBERON_0002661","UBERON_0009958","UBERON_0000948","UBERON_0003661","UBERON_0001264","UBERON_0000956","UBERON_0001228","UBERON_0002266","UBERON_0005383","UBERON_0001885","UBERON_0003876","UBERON_0007628","UBERON_0001950","UBERON_0001882","UBERON_0002894","UBERON_0004167","UBERON_0006514","UBERON_0004725","UBERON_0000451","UBERON_0034751","UBERON_0008933","UBERON_0008934","UBERON_0001786","UBERON_0002822","UBERON_0013682","UBERON_0015143","UBERON_0001348","CL_0000115 (cell culture)","UBERON_0001103","UBERON_0003428","UBERON_0003968","UBERON_0002369","UBERON_0001637","UBERON_0001156","UBERON_0007106","UBERON_0002114",
  "CL_0002322 (cell culture)","UBERON_0001043","UBERON_0003889","UBERON_0002110","UBERON_0002115","UBERON_0001630","UBERON_0003688","UBERON_0000992","UBERON_0001987","UBERON_0000977","UBERON_0001052","UBERON_0002228","UBERON_0001159","UBERON_0002240","UBERON_0000945","UBERON_0001871","UBERON_0000473","UBERON_0002046","UBERON_0001157","UBERON_0012168","UBERON_0000056","UBERON_0000002","UBERON_0004339","UBERON_0001154","UBERON_0001624","UBERON_0009472","UBERON_0013706","UBERON_0000175","UBERON_0002429","UBERON_0012648","UBERON_8410010","UBERON_0016632","UBERON_0001117","UBERON_0009835","UBERON_0004026","UBERON_0002421","UBERON_0004024","UBERON_0001872","UBERON_0000995 (organoid)","UBERON_0004903","UBERON_0002185","UBERON_0004802","UBERON_0001153","UBERON_0000400","UBERON_0000030","UBERON_0002509","UBERON_0001134","UBERON_0007644","UBERON_0001836","UBERON_0001893","UBERON_0001707","UBERON_0018303","UBERON_0008946","UBERON_0000004","UBERON_0000966","UBERON_0039167","UBERON_0008953","UBERON_0013756","UBERON_0001294","UBERON_0001293","UBERON_0000074","UBERON_0002771","UBERON_0001890","UBERON_0016525","UBERON_0002811","UBERON_0002802","UBERON_0002808","UBERON_0002810","UBERON_0002803","UBERON_0002809","UBERON_0023852","CL_0000082 (cell culture)","UBERON_0001870","UBERON_0016540","UBERON_0016530","UBERON_0001976","UBERON_8410000","CL_0002327 (cell culture)","UBERON_0002420","UBERON_0002021","UBERON_0002807","UBERON_0007625","CL_0000010 (cell culture)","CL_0010003 (cell culture)","UBERON_0036288","UBERON_0000310","UBERON_0004648","UBERON_0001388","UBERON_0008954","UBERON_0000344","UBERON_0001511","UBERON_0000966 (organoid)","UBERON_0000014","UBERON_0003517","UBERON_0001224","UBERON_0002048 (organoid)"
]

In [9]:
# Production tissues with no corresponding hand-curated system; required so
# that they are explicitly added to the generated subgraph.
orphan_tissues = [
  "UBERON_0001013", # adipose tissue
  "UBERON_0009472", # axilla
  "UBERON_0018707", # bladder organ
  "UBERON_0000310", # breast
  "UBERON_0001348", # brown adipose
  "UBERON_0007106", # chorionic villus
  "UBERON_0000030", # lamina propria
  "UBERON_0015143", # mesenteric fat pad
  "UBERON_0000344", # mucosa
  "UBERON_0003688", # omentum
  "UBERON_0001264", # pancreas
  "UBERON_0000175", # pleural effusion
  "UBERON_0001836", # saliva
  "UBERON_0001416", # skin of abdomen
  "UBERON_0002097", # skin of body
  "UBERON_0001868", # skin of chest
  "UBERON_0001511", # skin of leg
  "UBERON_0002190", # subcutaneous adipose tissue
  "UBERON_0035328", # upper outer quadrant of breast
  "UBERON_0000014" # zone of skin
]

#### Cell Type Constants

In [10]:
# Hand-curated cell classes.
cell_classes = [
  "CL_0002494", "CL_0002320", "CL_0000473", "CL_0000066", "CL_0000988", "CL_0000187", "CL_0002319", "CL_0011115", "CL_0000151"
]

In [11]:
# Hand-curated cell subclasses.
cell_subclasses = [
    "CL_0000738", "CL_0000542", "CL_0000763", "CL_0000084", "CL_0002076", "CL_0002078", "CL_0000540", "CL_0011026", "CL_0000115", "CL_0008001", "CL_0000163", "CL_0000236", "CL_0000099", "CL_0000234", "CL_0000624", "CL_0000057", "CL_0000125", "CL_0000117", "CL_0000235", "CL_0000451", "CL_0000625", "CL_0000679", "CL_0000617", "CL_0000499", "CL_0000576", "CL_0000101", "CL_0000669", "CL_0000152", "CL_0000100"
]

In [12]:
# Production cell types as of 09/13/2022.
cell_types = [
  "CL_0000617","CL_0000127","CL_0000115","CL_0000679","CL_0000129","CL_0000003","CL_0000128","CL_0002453","CL_0000236","CL_0000084","CL_0000232","CL_0000576","CL_0002319","CL_0019018","CL_0000653","CL_1001432","CL_1001431","CL_1000849","CL_0000648","CL_1000692","CL_1001106","CL_1000838","CL_0000738","CL_0000235","CL_1000850","CL_0000650","CL_1000768","CL_1000547","CL_1000909","CL_1001111","CL_1000452","CL_0000669","CL_0000731","CL_0002341","CL_1000296","CL_0002340","CL_0000165","CL_0000151","CL_1000299","CL_1000487","CL_0002071","CL_0002253","CL_0000037","CL_0002250","CL_1000320","CL_0000646","CL_0000163","CL_0002138","CL_0002326","CL_0011026","CL_0000789","CL_1000342","CL_0009006","CL_0000057","CL_0000125","CL_1000326","CL_0001065","CL_0000097","CL_0000113","CL_1000343","CL_0000786","CL_1000278","CL_0009012","CL_0002504","CL_0002088","CL_1000279","CL_1000275","CL_0000624","CL_0000625","CL_0000860","CL_0000990","CL_0000798","CL_0000623","CL_0000875","CL_0000980","CL_0000784","CL_0000064","CL_2000059","CL_0000548","CL_0000066","CL_0000814","CL_0000763","CL_0000775","CL_0002544","CL_0002548","CL_0000135","CL_0000145","CL_0002503","CL_0002598","CL_0002145","CL_0000158","CL_0000451","CL_0002553","CL_0002393","CL_1001603","CL_1000223","CL_0000542","CL_0000782","CL_0002241","CL_0000815","CL_0002633","CL_0002591","CL_0002062","CL_0002063","CL_0002543","CL_0005006","CL_0000233","CL_0000808","CL_0002293","CL_0000893","CL_1001597","CL_1001430","CL_0002394","CL_0002399","CL_0001044","CL_0001050","CL_0000864","CL_0000708","CL_0000192","CL_0000696","CL_0000171","CL_0000173","CL_0002064","CL_0002079","CL_0005019","CL_0002410","CL_0000169","CL_0002627","CL_0002329","CL_0000138","CL_0000094","CL_0000312","CL_0000319","CL_0002600","CL_1000413","CL_0000584","CL_0000164","CL_0000764","CL_0001071","CL_0000677","CL_0000131","CL_0008015","CL_0009016","CL_0009017","CL_0019031","CL_0000186","CL_0000898","CL_0009011","CL_0000091","CL_1000398","CL_0000182","CL_0000766","CL_0002187",
  "CL_0000362","CL_0002337","CL_1000428","CL_0002306","CL_0000666","CL_1000454","CL_1001016","CL_0002144","CL_0000077","CL_0000938","CL_0000934","CL_0001043","CL_0001049","CL_0000904","CL_0000913","CL_0000776","CL_0000787","CL_0000940","CL_0000788","CL_0000895","CL_0000900","CL_0001058","CL_0000818","CL_0000099","CL_0000100","CL_0000765","CL_0000914","CL_0002009","CL_0000050","CL_0000547","CL_0000068","CL_0000632","CL_0000644","CL_2000043","CL_0000065","CL_1001474","CL_0002259","CL_0000540","CL_0000047","CL_0008030","CL_0008029","CL_0002629","CL_0000807","CL_0000453","CL_0002573","CL_0000583","CL_0000767","CL_1001319","CL_1001428","CL_0000081","CL_0002204","CL_0000746","CL_0010022","CL_0000188","CL_0002489","CL_0002350","CL_2000018","CL_0001066","CL_0002191","CL_0008001","CL_0000816","CL_0002420","CL_1000892","CL_1000497","CL_1001045","CL_1000839","CL_0002048","CL_0008019","CL_0000134","CL_0002570","CL_0002275","CL_0000391","CL_0000817","CL_0000559","CL_0000594","CL_0000499","CL_0002131","CL_1001107","CL_1000597","CL_1001318","CL_0011020","CL_1000500","CL_0000740","CL_0002129","CL_0000905","CL_1000271","CL_1000143","CL_0000556","CL_1000491","CL_0001057","CL_0017000","CL_0019001","CL_0000359","CL_0000663","CL_0000984","CL_0000982","CL_0002203","CL_0002618","CL_0000498","CL_0002046","CL_0000557","CL_0000838","CL_0000791","CL_0002563","CL_0000897","CL_0000909","CL_0000809","CL_0000813","CL_0005011","CL_0002201","CL_0005009","CL_0000939","CL_0000255","CL_4023011","CL_4023016","CL_1001602","CL_4023013","CL_4023008","CL_0000647","CL_4023012","CL_0002609","CL_4023018","CL_4023015","CL_4023017","CL_0000561","CL_0000745","CL_0000748","CL_0000573","CL_0000604","CL_0000750","CL_0000749","CL_0000636","CL_0002586","CL_0001054","CL_0002397","CL_0000794","CL_0000896","CL_0000906","CL_1000309","CL_0002539","CL_0002332","CL_0000595","CL_0000988","CL_1000493","CL_0000067","CL_0000071","CL_0000785","CL_0019022","CL_0019021","CL_0019029","CL_1000488","CL_0000863","CL_0019028","CL_0019026",
  "CL_0000492","CL_0000908","CL_0000894","CL_0002038","CL_0000049","CL_0000038","CL_0000823","CL_2000055","CL_0000841","CL_0002677","CL_0000921","CL_1000449","CL_0001034","CL_1000311","CL_0010008","CL_0000513","CL_0001082","CL_0000006","CL_0000514","CL_0011024","CL_0002422","CL_0000150","CL_0002396","CL_0000810","CL_0000811","CL_2000072","CL_0002139","CL_0000160","CL_0002325","CL_0000187","CL_0002098","CL_0019019","CL_0001203","CL_0001062","CL_0001063","CL_0000907","CL_2000001","CL_0002057","CL_0000800","CL_0000076","CL_0011107","CL_0000121","CL_0000622","CL_0000103","CL_0011101","CL_0000166","CL_0000575","CL_0002097","CL_2000041","CL_0007011","CL_0002632","CL_0008036","CL_0000397","CL_0000120","CL_0005026","CL_0011004","CL_0000126","CL_0000123","CL_0000555","CL_0000210","CL_0000122","CL_0011103","CL_0000525","CL_0000209","CL_0002488","CL_2000046","CL_0005025","CL_0000890","CL_1000348","CL_0002207","CL_0000442","CL_0000148","CL_0005012","CL_1000329","CL_0000972","CL_2000093","CL_0000082","CL_0001064","CL_1001568","CL_0001024","CL_0001077","CL_0000987","CL_0000985","CL_0000986","CL_0000545","CL_0000899","CL_0000546","CL_0001042","CL_0001056","CL_0001081","CL_0000839","CL_0000970","CL_1000312","CL_0002075","CL_0000861","CL_0002480","CL_0000313","CL_0019003","CL_0000915","CL_0002364","CL_0002425","CL_0002365","CL_0000820","CL_0000821","CL_0001074","CL_0002375","CL_0002092","CL_0000936","CL_0000771","CL_0001069","CL_0002377","CL_0000957","CL_1000272","CL_0000956","CL_0002355","CL_0000826","CL_0000836","CL_0002368","CL_0000558","CL_0000954","CL_0002028","CL_0001029","CL_0002154","CL_0000837","CL_0002151","CL_0000553","CL_0002193","CL_0002045","CL_0000055","CL_0000937","CL_1000450","CL_1001225","CL_1000494","CL_1000349","CL_0002208","CL_0002371","CL_0005022","CL_0002622","CL_1000305","CL_1000304","CL_0002623","CL_1000432","CL_0002320","CL_1001589","CL_1000334","CL_1000436","CL_0002149","CL_0000287","CL_0000190","CL_0000136","CL_4006000","CL_1001516","CL_0019032","CL_0002538",
  "CL_0002363","CL_0002518","CL_2000016","CL_0000185","CL_0002366","CL_0009009","CL_0002303","CL_0000681","CL_0002370","CL_0002585","CL_0009005","CL_1000331","CL_1000330","CL_0008011","CL_0000189","CL_1000495","CL_0000019","CL_0000034","CL_0000114","CL_0000388","CL_0002673","CL_0000844","CL_0000790","CL_0002132","CL_0000630","CL_0002601","CL_0002188","CL_1001096","CL_1000615","CL_1001099","CL_1000510","CL_1001109","CL_1001108","CL_1001285","CL_0002419","CL_0002010","CL_0000827","CL_4023041","CL_0008034","CL_0000010","CL_1001567","CL_0000485","CL_0011025","CL_0010003","CL_0001204","CL_0009038","CL_0000843","CL_0000751","CL_0000792","CL_0000802","CL_0000922","CL_1000447","CL_0000234","CL_0000079","CL_0000842","CL_0000322","CL_0000216","CL_2000095","CL_0002322","CL_1001433","CL_0009002","CL_0000155","CL_0000670","CL_0005010","CL_0002258","CL_0002603","CL_0000119","CL_0001031","CL_0002671","CL_0011108","CL_0000222","CL_0011012","CL_0000682","CL_0009010","CL_0000979","CL_0002117","CL_0000001","CL_0002236","CL_0000183","CL_0002457","CL_0002231","CL_1000448","CL_1000458","CL_0002324","CL_2000021","CL_0008002","CL_0002620","CL_0002521","CL_0002678","CL_0000501","CL_0000503","CL_4023051","CL_4023038","CL_4023036","CL_4023040","CL_0002605","CL_4023070","CL_0000432","CL_1000698","CL_1001131","CL_1001033","CL_0000695","CL_0010011","CL_0013000","CL_0008024","CL_0002067","CL_0000502"
]

In [13]:
# Production cell types with no corresponding hand-curated cell class; required
# so that they are explicitly added to the generated subgraph.
orphan_cell_types = [
    "CL_0000003", "CL_0009012", "CL_0000064", "CL_0000548", "CL_0000677", "CL_0000186", "CL_0009011", "CL_1001319", "CL_0000188", "CL_1000497", "CL_0008019", "CL_1000597", "CL_1000500", "CL_1000271", "CL_0000663", "CL_0000255", "CL_0001034", "CL_0001063", "CL_0011101", "CL_0008036", "CL_0000525", "CL_0002488", "CL_0000148", "CL_0001064", "CL_0002092", "CL_0002371", "CL_0009005", "CL_0000019", "CL_0000114", "CL_0000630", "CL_0008034", "CL_0000010", "CL_0009002", "CL_0000670", "CL_0000222", "CL_0009010", "CL_0000001", "CL_0000183", "CL_1000458", "CL_2000021"
]

#### Function Definitions

In [14]:
def build_descendants_graph(entity_name, graph):
  """
  Recursively build set of descendants (that is, is_a descendants) for the
  given entity and add to graph.
  """

  # Add node to graph, this covers the case where a top-level tissue has no
  # children.
  graph.add_node(entity_name)

  # List descendants via is_a relationship.
  subtypes = list_direct_descendants(entity_name)

  for subtype in subtypes:

    child_name = subtype.name

    # Check if child has been added to graph already.
    child_visited = graph.has_node(child_name)

    # Add valid child to graph as a descendant.
    graph.add_edge(entity_name, child_name)

    # Build graph for child if it hasn't already been visited.
    if not child_visited:
      build_descendants_graph(child_name, graph)

In [15]:
def build_descendants_and_parts_graph(entity_name, graph):
  """
  Recursively build set of descendants and parts (that is, include both is_a
  and part_of descendants) for the given entity and add to graph.
  """

  # Add node to graph, this covers the case where a top-level tissue has no
  # children.
  graph.add_node(entity_name)

  # List descendants via is_a and part_of relationships.
  subtypes_and_parts = list_direct_descendants_and_parts(entity_name)

  for subtype_or_part in subtypes_and_parts:

    # Each child should be a singleton array; detect, report and continue if
    # an invalid child is found (manual investigation of failure is required).
    child_len = len(subtype_or_part)
    if child_len == 0 or child_len > 1:
      print("Invalid child length", child_len, subtype_or_part)
      continue

    child = subtype_or_part[0]

    # Ignore axioms, only add true entities.
    if not is_axiom(child):
      child_name = child.name

      # Ignore disjoint.
      if child_name == "Nothing":
        continue

      # Check if child has been added to graph already.
      child_visited = graph.has_node(child_name)

      # Add valid child to graph as a descendant.
      graph.add_edge(entity_name, child_name)

      # Build graph for child if it hasn't already been visited.
      if not child_visited:
        build_descendants_and_parts_graph(child_name, graph)

In [16]:
def build_graph_for_cell_types(entity_names):
  """
  Extract a subgraph of CL for the given cell types.
  """
  graph = pgv.AGraph()
  for entity_name in entity_names:
      build_descendants_graph(entity_name, graph)
  return graph

In [17]:
def build_graph_for_tissues(entity_names):
  """
  Extract a subgraph of UBERON for the given tissues.
  """
  tissue_graph = pgv.AGraph()
  for entity_name in entity_names:
      build_descendants_and_parts_graph(entity_name, tissue_graph)
  return tissue_graph

In [18]:
def is_axiom(entity):
  """
  Returns true if the given entity is an axiom.
  For example, obo.UBERON_0001213 & obo.BFO_0000050.some(obo.NCBITaxon_9606)
  """
  return hasattr(entity, "Classes")

In [19]:
def is_cell_culture(entity_name):
  """
  Returns true if the given entity name contains (cell culture).
  """
  return "(cell culture)" in entity_name

In [20]:
def is_cell_culture_or_organoid(entity_name):
  """
  Returns true if the given entity name contains (cell culture) or (organoid).
  """
  return is_cell_culture(entity_name) or is_organoid(entity_name)

In [21]:
def is_organoid(entity_name):
  """
  Returns true if the given entity name contains "(organoid)".
  """
  return "(organoid)" in entity_name

In [22]:
def key_ancestors_by_entity(entity_names, graph):
  """
  Build a dictionary of ancestors keyed by entity for the given entities.
  """

  ancestors_by_entity = {}
  for entity_name in entity_names:
    # Collect the ancestors of the entity (list_ancestors also adds the
    # entity itself to the set).
    ancestors = set()
    list_ancestors(entity_name, graph, ancestors)

    sanitized_entity_name = reformat_ontology_term_id(entity_name, to_writable=True)
    sanitized_ancestors = [reformat_ontology_term_id(ancestor, to_writable=True) for ancestor in ancestors]

    ancestors_by_entity[sanitized_entity_name] = sanitized_ancestors

  return ancestors_by_entity

In [23]:
def key_organoids_by_ontology_term_id(entity_names):
  """
  Returns a dictionary of organoid ontology term IDs by stem ontology term ID.
  """
  
  organoids_by_ontology_term_id = {}
  for entity_name in entity_names:
    if is_organoid(entity_name):
      ontology_term_id = entity_name.replace(" (organoid)", "")
      organoids_by_ontology_term_id[ontology_term_id] = entity_name

  return organoids_by_ontology_term_id

In [24]:
def list_ancestors(entity_name, graph, ancestor_set):
  """
  From the given graph, recursively build up set of ancestors for the given
  entity.
  """

  ancestor_set.add(entity_name)

  # Ignore cell culture and organoids
  if is_cell_culture_or_organoid(entity_name):
    return ancestor_set

  try:
    ancestor_entities = graph.predecessors(entity_name)
  except KeyError:
    # Detect, report and continue if entity not found in graph. Manual
    # investigation of failure is required.
    print(f"{entity_name} not found")
    return ancestor_set

  for ancestor_entity in ancestor_entities:
    list_ancestors(ancestor_entity, graph, ancestor_set)

  return ancestor_set

In [25]:
def list_descendants(entity_name, graph, all_successors):
  """
  From the given graph, recursively build up the set of descendants for the
  given entity, adding them to the given set.
  """

  # Ignore cell culture and organoid tissues.
  if is_cell_culture(entity_name) or is_organoid(entity_name):
    return

  successors = []
  try:
    successors = graph.successors(entity_name)
  except KeyError:
    # Detect, report and continue if entity not found in graph. Manual
    # investigation of failure is required.
    print(f"{entity_name} not found")

  # Add descendants to the set.
  if len(successors):
    all_successors.update(successors)

  # Find descendants of children of entity.
  for successor in successors:
    list_descendants(successor, graph, all_successors)

In [26]:
def list_direct_descendants(entity_name):
  """
  Return the direct is_a descendants (subclasses) of the given CL entity.
  """

  entity = cl_world.search_one(iri = f'http://purl.obolibrary.org/obo/{entity_name}')
  if not entity:
    print(f"{entity_name} not found")
    return []

  return entity.subclasses()

In [27]:
def list_direct_descendants_and_parts(entity_name):
  """
  Determine the set of descendants and parts for the given entity.
  
  Tissue descendants must be traversed through both is_a and part_of
  relationships. For example, "retina" is_a "photoreceptor array" whereas
  "photoreceptor array" is part_of "eye". To build the full list of descendants
  for eye, both is_a and part_of relationships must be examined.

  WHERE
  --
  Looks for entities that are a subclass of the restriction (anonymous class) 
  where the definition of the restriction set is: has some members (part_of)
  of the given entity. See https://www.cs.vu.nl/~guus/public/owl-restrictions/.
  
  ?class rdfs:subClassOf <http://purl.obolibrary.org/obo/{entity}>
  --
  Looks for direct descendants (is_a).
  """
  
  query = """
    SELECT ?class 
    WHERE {{
      {{
        ?class rdfs:subClassOf ?restriction .
        ?restriction owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> .
        ?restriction owl:someValuesFrom <http://purl.obolibrary.org/obo/{entity}> .
      }}

    UNION {{
      ?class rdfs:subClassOf <http://purl.obolibrary.org/obo/{entity}>
      }}
    }}
    """.format(entity=entity_name)
  classes = uberon_world.sparql(query)
  return classes

In [28]:
def reformat_ontology_term_id(ontology_term_id: str, to_writable: bool = True):
    """
    Converts ontology term id string between two formats:
        - `to_writable == True`: from "UBERON_0002048" to "UBERON:0002048"
        - `to_writable == False`: from "UBERON:0002048" to "UBERON_0002048"
    """

    if to_writable:
        if ontology_term_id.count("_") != 1:
            raise ValueError(f"{ontology_term_id} is an invalid ontology term id, it must contain exactly one '_'")
        return ontology_term_id.replace("_", ":")
    else:
        if ontology_term_id.count(":") != 1:
            raise ValueError(f"{ontology_term_id} is an invalid ontology term id, it must contain exactly one ':'")
        return ontology_term_id.replace(":", "_")

In [29]:
def write_ancestors_by_entity(entities, graph, file_name):
  """
  Create dictionary of ancestors keyed by entity and write to file. The
  contents of the generated file are copied into ${entity}_ontology_mapping.py
  in the single-cell-data-portal repository and are used to key datasets with
  their corresponding entity ancestors.
  """
  ancestors_by_entity = key_ancestors_by_entity(entities, graph)
  with open(file_name, "w") as f:
    json.dump(ancestors_by_entity, f)

In [30]:
def write_descendants_by_entity(entity_hierarchy, graph, file_name):
  """
  Create dictionary of descendants keyed by entity for the given entity
  hierarchy (ordered from highest to lowest level) and write to file.
  """
  all_descendants = {}
  for idx, entity_set in enumerate(entity_hierarchy):

    # Create the set of descendants that can be included for this entity set.
    # For example, systems can include organs or tissues,
    # organs can only include tissues, tissues can't have descendants.
    accept_lists = entity_hierarchy[idx+1:]

    # Tissue or cell type for example will not have any descendants.
    if not accept_lists:
      continue

    accept_list = [i for sublist in accept_lists for i in sublist]
    organoids_by_ontology_term_id = key_organoids_by_ontology_term_id(accept_list)

    # List descendants of entity in this set.
    for entity_name in entity_set:
      descendants = set()
      list_descendants(entity_name, graph, descendants)

      # Determine the set of descendants that can be included.
      descendant_accept_list = []
      for descendant in descendants:

        # Include all entities in the accept list.
        if descendant in accept_list:
          descendant_accept_list.append(descendant)

        # Add organoid descendants, if any.
        if descendant in organoids_by_ontology_term_id:
          descendant_accept_list.append(organoids_by_ontology_term_id[descendant])

      # Add organoid entity, if any.
      if entity_name in organoids_by_ontology_term_id:
        descendant_accept_list.append(organoids_by_ontology_term_id[entity_name])

      if not descendant_accept_list:
        continue

      # Add descendants to dictionary.
      sanitized_entity_name = reformat_ontology_term_id(entity_name, to_writable=True)
      sanitized_descendants = [reformat_ontology_term_id(descendant, to_writable=True) for descendant in descendant_accept_list]
      all_descendants[sanitized_entity_name] = sanitized_descendants

  with open(file_name, "w") as f:
    json.dump(all_descendants, f)

#### Calculate Tissue Graph and Tissue Ancestor and Descendant Mappings

In [31]:
# Extract a subgraph from UBERON for the hand-curated systems and orphans,
# collapsing is_a and part_of relations.
tissue_graph = build_graph_for_tissues(system_tissues + orphan_tissues)

In [32]:
# Create ancestors file, the contents of which are to be copied to
# tissue_ontology_mapping.py and read by Single Cell Data Portal BE.
write_ancestors_by_entity(tissues, tissue_graph, "../backend/corpora/common/utils/ontology_mappings/fixtures/tissue_ontology_mapping.json");

In [33]:
# Create descendants file, the contents of which are to be copied to
# TISSUE_DESCENDANTS and read by Single Cell Data Portal FE.
tissue_hierarchy = [system_tissues, organ_tissues, tissues]
write_descendants_by_entity(tissue_hierarchy, tissue_graph, "tissue_descendants.json")

#### Calculate Cell Type Graph and Cell Type Ancestor and Descendant Mappings

In [34]:
# Extract a subgraph from CL for the hand-curated cell classes and orphans,
# including only is_a relationships.
cell_type_graph = build_graph_for_cell_types(cell_classes + orphan_cell_types)

In [35]:
# Create ancestors file, the contents of which will be loaded into
# cell_type_ontology_mapping and read by Single Cell Data Portal BE.
write_ancestors_by_entity(cell_types, cell_type_graph, "../backend/corpora/common/utils/ontology_mappings/fixtures/cell_type_ontology_mapping.json");

CL_0008030 not found
CL_0008029 not found
CL_0002609 not found
CL_0011107 not found


In [36]:
# Create descendants file, the contents of which are to be copied to
# CELL_TYPE_DESCENDANTS and read by Single Cell Data Portal FE.
cell_type_hierarchy = [cell_classes, cell_subclasses, cell_types]
write_descendants_by_entity(cell_type_hierarchy, cell_type_graph, "cell_type_descendants.json")