## Phenotypic Series queries

We will walk through the following steps:

1. Find all MONDO terms that correspond to an OMIM Phenotypic Series
2. Extend these to their MONDO leaf nodes
3. Query associations between these leaf nodes and genes


In [1]:
from oaklib.selector import get_adapter

In [2]:
handle = get_adapter("sqlite:obo:mondo")

In [14]:
# the first time you run this it may be slow while mondo sqlite downloads
mappings = list(handle.sssom_mappings_by_source("OMIMPS"))

In [4]:
len(mappings)

548

In [5]:
mappings[0]

Mapping(subject_id='MONDO:0000005', predicate_id='oio:hasDbXref', object_id='OMIMPS:203655', mapping_justification='semapv:UnspecifiedMatching', subject_label=None, subject_category=None, predicate_label=None, predicate_modifier=None, object_label=None, object_category=None, author_id=[], author_label=[], reviewer_id=[], reviewer_label=[], creator_id=[], creator_label=[], license=None, subject_type=None, subject_source='MONDO', subject_source_version=None, object_type=None, object_source='OMIMPS', object_source_version=None, mapping_provider=None, mapping_cardinality=None, mapping_tool=None, mapping_tool_version=None, mapping_date=None, confidence=None, subject_match_field=[], object_match_field=[], match_string=[], subject_preprocessing=[], object_preprocessing=[], semantic_similarity_score=None, semantic_similarity_measure=None, see_also=[], other=None, comment=None)
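Only two fields of each mapping matter for the rest of this walkthrough: `subject_id` (the MONDO grouping term) and `object_id` (the Phenotypic Series). The sketch below shows one way to index them. It uses a `namedtuple` stand-in rather than the real SSSOM `Mapping` class so that it runs without downloading MONDO; the `ps_to_mondo` name is illustrative only.

```python
from collections import namedtuple

# Stand-in for the SSSOM Mapping objects shown above (assumption: only the
# two fields used later in this notebook are modelled here).
Mapping = namedtuple("Mapping", ["subject_id", "object_id"])

example_mappings = [
    Mapping("MONDO:0000005", "OMIMPS:203655"),
    Mapping("MONDO:0000009", "OMIMPS:231200"),
]

# Index MONDO grouping IDs by Phenotypic Series ID.
ps_to_mondo = {}
for m in example_mappings:
    ps_to_mondo.setdefault(m.object_id, []).append(m.subject_id)

print(ps_to_mondo)
```

With the real `mappings` list, the same loop should work unchanged, since it only reads those two attributes.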

## Get descendants and their genes

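Next we will write a routine that, for each MONDO-to-Phenotypic-Series mapping:

1. gets the descendants (via `IS_A`) of the MONDO grouping term to reach the MONDO *leaf* terms
2. queries each of those terms for `RO:0004003` (has material basis in germline mutation in) links to genes

Each result is yielded as a tuple of (Phenotypic Series ID, MONDO grouping ID, MONDO leaf ID, gene).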

In [7]:
from oaklib.datamodels.vocabulary import IS_A
has_material_basis_in_germline_mutation_in = "RO:0004003"

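def ps_mapping_to_gene(m):
    """Yield (PS ID, MONDO grouping ID, MONDO leaf ID, gene) tuples for one mapping."""
    mondo_grouping_id = m.subject_id
    ps_id = m.object_id
    # descendants() returns every term below the grouping term, not only leaves;
    # terms without a gene link simply contribute no rows below
    leafs = list(handle.descendants(mondo_grouping_id, [IS_A]))
    for mondo_leaf, _p, gene in handle.relationships(leafs, [has_material_basis_in_germline_mutation_in]):
        yield ps_id, mondo_grouping_id, mondo_leaf, gene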
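
To see the shape of this traversal without a MONDO download, here is a minimal sketch on a made-up graph. Everything in it is hypothetical: the `EX:` identifiers, the `IS_A_EDGES` and `GENE_LINKS` dictionaries, and the `toy_*` helpers stand in for the ontology adapter and are not part of OAK. The toy descendant walk is reflexive (it includes the starting term), which is an assumption of this sketch.

```python
# Toy is-a edges (child -> parents) and gene links; all IDs are made up.
IS_A_EDGES = {
    "EX:leaf1": ["EX:group"],
    "EX:leaf2": ["EX:mid"],
    "EX:mid": ["EX:group"],
}
GENE_LINKS = {
    "EX:leaf1": ["EX:geneA"],
    "EX:leaf2": ["EX:geneB"],
}


def toy_descendants(term):
    """Yield the term itself and every term below it via is-a."""
    # Invert child -> parents into parent -> children.
    children = {}
    for child, parents in IS_A_EDGES.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    stack, seen = [term], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(children.get(node, []))


def toy_ps_to_gene(ps_id, grouping_id):
    """Yield (PS, grouping, descendant, gene) tuples, mirroring the routine above."""
    for term in toy_descendants(grouping_id):
        for gene in GENE_LINKS.get(term, []):
            yield ps_id, grouping_id, term, gene


rows = sorted(toy_ps_to_gene("EX:PS1", "EX:group"))
for row in rows:
    print(row)
```

Note that `EX:mid` and `EX:group` are visited but produce no rows, because they carry no gene link; only terms with a link appear in the output.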

In [8]:
assocs = []
for m in mappings:
    assocs.extend(list(ps_mapping_to_gene(m)))

In [9]:
print(len(assocs))

3476


In [10]:
import pandas as pd

In [11]:
df = pd.DataFrame(assocs, columns =['PS', 'MONDO_GROUP', 'MONDO_LEAF', 'GENE'])

In [12]:
df

| | PS | MONDO_GROUP | MONDO_LEAF | GENE |
|---|---|---|---|---|
| 0 | OMIMPS:203655 | MONDO:0000005 | MONDO:0008757 | <http://identifiers.org/hgnc/5172> |
| 1 | OMIMPS:231200 | MONDO:0000009 | MONDO:0007686 | <http://identifiers.org/hgnc/31928> |
| 2 | OMIMPS:231200 | MONDO:0000009 | MONDO:0007930 | <http://identifiers.org/hgnc/4439> |
| 3 | OMIMPS:231200 | MONDO:0000009 | MONDO:0008332 | <http://identifiers.org/hgnc/4439> |
| 4 | OMIMPS:231200 | MONDO:0000009 | MONDO:0008553 | <http://identifiers.org/hgnc/4238> |
| ... | ... | ... | ... | ... |
| 3471 | OMIMPS:608638 | MONDO:0100440 | MONDO:0010343 | <http://identifiers.org/hgnc/14287> |
| 3472 | OMIMPS:209880 | MONDO:0800031 | MONDO:0800026 | <http://identifiers.org/hgnc/9143> |
| 3473 | OMIMPS:610551 | MONDO:0800174 | MONDO:0013633 | <http://identifiers.org/hgnc/2330> |
| 3474 | OMIMPS:145600 | MONDO:0800188 | MONDO:0007783 | <http://identifiers.org/hgnc/10483> |
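
The same gene can appear under several leaf terms of one Phenotypic Series (rows 2 and 3 above share a gene), so a natural follow-up is counting distinct genes per series. The sketch below does this in plain Python on the first five rows shown above; the `genes_by_ps` and `counts` names are illustrative, and the full `assocs` list could be substituted for `rows`.

```python
from collections import defaultdict

# First five rows of the table above, as (PS, MONDO_GROUP, MONDO_LEAF, GENE).
rows = [
    ("OMIMPS:203655", "MONDO:0000005", "MONDO:0008757", "<http://identifiers.org/hgnc/5172>"),
    ("OMIMPS:231200", "MONDO:0000009", "MONDO:0007686", "<http://identifiers.org/hgnc/31928>"),
    ("OMIMPS:231200", "MONDO:0000009", "MONDO:0007930", "<http://identifiers.org/hgnc/4439>"),
    ("OMIMPS:231200", "MONDO:0000009", "MONDO:0008332", "<http://identifiers.org/hgnc/4439>"),
    ("OMIMPS:231200", "MONDO:0000009", "MONDO:0008553", "<http://identifiers.org/hgnc/4238>"),
]

# Collect the distinct genes seen for each Phenotypic Series.
genes_by_ps = defaultdict(set)
for ps, _group, _leaf, gene in rows:
    genes_by_ps[ps].add(gene)

counts = {ps: len(genes) for ps, genes in genes_by_ps.items()}
print(counts)
```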


In [13]:
# TODO: contract gene URIs to CURIEs
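
The GENE column holds bracketed identifiers.org URIs such as `<http://identifiers.org/hgnc/5172>`. A minimal sketch of shortening them to CURIEs follows. The `contract_gene_uri` helper and the choice of `HGNC:` as the target prefix are assumptions made for illustration; a prefix map or converter from the ontology tooling may be preferable in practice.

```python
# Assumption: every gene URI uses this identifiers.org HGNC namespace.
HGNC_URI_PREFIX = "http://identifiers.org/hgnc/"


def contract_gene_uri(gene):
    """Turn '<http://identifiers.org/hgnc/5172>' into 'HGNC:5172'.

    Values that do not match the expected namespace are returned unchanged.
    """
    uri = gene.strip("<>")
    if uri.startswith(HGNC_URI_PREFIX):
        return "HGNC:" + uri[len(HGNC_URI_PREFIX):]
    return gene


print(contract_gene_uri("<http://identifiers.org/hgnc/5172>"))
```

On the DataFrame this could be applied with `df["GENE"] = df["GENE"].map(contract_gene_uri)`.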

In [15]:
df.to_csv("output/ps-to-gene.tsv", sep="\t")