# Ontology mapping with the EBI ontology service APIs

This notebook describes a typical ontology mapping scenario in which we want to map some textual values to ontology terms. In this workflow we will use the EBI's Ontology Lookup Service (OLS) to look up ontology terms by label using the OLS REST API. Once we have a term, we will fetch additional metadata about it, including its synonyms and description. We will then get related terms, such as all the parent or ancestral terms. Finally, we will look for mappings to other ontology terms using the EBI's ontology mapping service, OxO.

To run this tutorial you will need access to a running instance of the Ontology Lookup Service and OxO. By default it runs against the public services run by EBI, but you can change the base URLs to use a different instance of each service.

In [9]:
OLS_BASE_URI = 'https://www.ebi.ac.uk/ols'
OXO_BASE_URI = 'https://www.ebi.ac.uk/spot/oxo'

# Press Enter to keep the default shown in brackets; the `or` must sit
# outside input() so that an empty answer falls back to the default.
OLS_BASE_URI = input("Your base OLS URL [{}]: ".format(OLS_BASE_URI)).strip() or OLS_BASE_URI
OXO_BASE_URI = input("Your base OXO URL [{}]: ".format(OXO_BASE_URI)).strip() or OXO_BASE_URI

print("Using OLS {}, OXO {}".format(OLS_BASE_URI, OXO_BASE_URI))





In this tutorial we will use some disease data from EBI's GWAS catalogue.

* Chronic obstructive pulmonary disease
* Crohn's disease
* Sarcoidosis
* Cystic fibrosis
* Idiopathic pulmonary fibrosis
* Lung adenocarcinoma
* Chronic bronchitis

In [2]:
input_terms = [
    'Chronic obstructive pulmonary disease',
    'Crohn\'s disease',
    'Sarcoidosis',
    'Cystic fibrosis',
    'Idiopathic pulmonary fibrosis',
    'Lung adenocarcinoma',
    'Chronic bronchitis']


## Querying the OLS API by label 

In this first step we will use the [OLS REST API](https://www.ebi.ac.uk/ols/docs/api) to query the [Experimental Factor Ontology](https://www.ebi.ac.uk/efo) for exact label matches to each of our input terms.

In [3]:
import requests
import json
import urllib.parse

OLS_SEARCH_API = OLS_BASE_URI + '/api/search'

term_id_map = {}

for lookup in input_terms:
        
    search_params = {
        'q' : lookup,
        'ontology' : 'efo',
        'exact' : True
    }
    response = requests.get(OLS_SEARCH_API, params=search_params)
    if response.ok:

        jData = json.loads(response.content)
        docs = jData['response']['docs']

        if not docs:
            # no exact label match for this input, so skip it
            print('OLS search result for \'{}\': no exact match found'.format(lookup))
            continue

        # get the first hit
        first_hit = docs[0]

        # store the mapping for use later
        term_id_map[lookup] = first_hit['obo_id']

        print('OLS search result for \'{}\': mapped to \'{}\' with id {}'.format(lookup, first_hit['label'], first_hit['obo_id']))


OLS search result for 'Chronic obstructive pulmonary disease': mapped to 'chronic obstructive pulmonary disease' with id EFO:0000341
OLS search result for 'Crohn's disease': mapped to 'Crohn's disease' with id EFO:0000384
OLS search result for 'Sarcoidosis': mapped to 'Sarcoidosis' with id Orphanet:797
OLS search result for 'Cystic fibrosis': mapped to 'Cystic fibrosis' with id Orphanet:586
OLS search result for 'Idiopathic pulmonary fibrosis': mapped to 'idiopathic pulmonary fibrosis' with id EFO:0000768
OLS search result for 'Lung adenocarcinoma': mapped to 'lung adenocarcinoma' with id EFO:0000571
OLS search result for 'Chronic bronchitis': mapped to 'chronic bronchitis' with id EFO:0006505
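
A search can also come back with no documents at all, in which case there is nothing to map. The following is a minimal sketch of a defensive lookup step, assuming the same `response` / `docs` layout used in the cell above. `pick_first_hit` is a hypothetical helper name, and the sample payloads are hand-written to mimic the shape of an OLS search response rather than captured from the service.

```python
def pick_first_hit(search_json):
    """Return (obo_id, label) for the first search document, or None if there are no hits."""
    docs = search_json.get('response', {}).get('docs', [])
    if not docs:
        return None
    first = docs[0]
    return first.get('obo_id'), first.get('label')


# Hand-written payloads (assumption: same shape as the OLS search response used above).
sample_hit = {'response': {'docs': [{'obo_id': 'EFO:0000384', 'label': "Crohn's disease"}]}}
sample_miss = {'response': {'docs': []}}

print(pick_first_hit(sample_hit))
print(pick_first_hit(sample_miss))
```

Returning `None` for a miss lets the caller decide whether to skip the input, retry without `exact`, or flag it for manual curation.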


## Get term information including synonyms and description

We can use the OLS API to fetch more information about each term, including the term's description and synonyms.

In [4]:
OLS_TERMS_API = OLS_BASE_URI + '/api/terms'

for term_id in term_id_map.values():
    

    lookup_params = {
        'id' : term_id,
        'ontology' : 'efo',
    }
    response = requests.get(OLS_TERMS_API, params=lookup_params)

    if response.ok:
        jData = json.loads(response.content)

        print ("Term: {} ({}),\nDescription:{}\nSynonyms:{}\n\n"
               .format(
                    term_id, 
                    jData['_embedded']['terms'][0]['label'],
                    jData['_embedded']['terms'][0]['description'][0], 
                    jData['_embedded']['terms'][0]['synonyms']))


Term: EFO:0000341 (chronic obstructive pulmonary disease),
Description:A chronic and progressive lung disorder characterized by the loss of elasticity of the bronchial tree and the air sacs, destruction of the air sacs wall, thickening of the bronchial wall, and mucous accumulation in the bronchial tree. The pathologic changes result in the disruption of the air flow in the bronchial airways. Signs and symptoms include shortness of breath, wheezing, productive cough, and chest tightness. The two main types of chronic obstructive pulmonary disease are chronic obstructive bronchitis and emphysema.
Synonyms:['DISEASE (COPD), CHRONIC OBSTRUCTIVE', 'Chronic obstructive lung disease, NEC', 'COAD - Chronic obstructive airways disease', 'COPD, chronic obstructive pulmonary disease', 'chronic obstructive lung disease [Ambiguous]', 'OBSTRUCTIVE PULMONARY DISEASE (COPD), CHRONIC', 'chronic obstructive airways disease NOS', 'COPD, CHRONIC OBSTRUCTIVE PULMONARY DISEASE', 'obstructive pulmonary dise

Term: EFO:0000384 (Crohn's disease),
Description:A gastrointestinal disorder characterized by chronic inflammation involving all layers of the intestinal wall, noncaseating granulomas affecting the intestinal wall and regional lymph nodes, and transmural fibrosis. Crohn disease most commonly involves the terminal ileum; the colon is the second most common site of involvement.
Synonyms:['Enteritis, Granulomatous', 'Colitis, Granulomatous', 'Ileocolitis', "pediatric Crohn's disease", 'CROHNS DIS', 'Enteritis, Regional', 'Crohn Disease', "Crohn's disease of colon", "Crohn's disease", 'Gastritis Associated with Crohn Disease', 'Crohn disease', "Gastritis Associated with Crohn's Disease", 'granulomatous colitis', 'regional enteritis', 'Ileitis, Regional', 'Crohns Disease', "Crohn's disease of large bowel", 'CROHN DIS', "Crohn's associated gastritis", 'Ileitis, Terminal']




Term: Orphanet:797 (Sarcoidosis),
Description:Sarcoidosis is a multisystemic disorder of unknown cause characterized by the formation of immune granulomas in involved organs.
Synonyms:["Boeck's sarcoid", 'Besnier-Boeck-Schaumann disease', 'Boeck sarcoid']




Term: Orphanet:586 (Cystic fibrosis),
Description:Cystic fibrosis (CF) is a genetic disorder characterized by the production of sweat with a high salt content and mucus secretions with an abnormal viscosity.
Synonyms:['Mucoviscidosis', 'CF']




Term: EFO:0000768 (idiopathic pulmonary fibrosis),
Description:Chronic and progressive fibrosis of the lung parenchyma of unknown cause.
Synonyms:['idiopathic pulmonary fibrosis, familial', 'usual interstitial pneumonia', 'fibrocystic pulmonary dysplasia', 'idiopathic pulmonary fibrosis', 'UIP', 'IPF', 'cryptogenic fibrosing alveolitis', 'CFA']


Term: EFO:0000571 (lung adenocarcinoma),
Description:A carcinoma that arises from the lung and is characterized by the presence of malignant glandular epithelial cells. There is a male predilection with a male to female ratio of 2:1. Usually lung adenocarcinoma is asymptomatic and is identified through screening studies or as an incidental radiologic finding. If clinical symptoms are present they include shortness of breath, cough, hemoptysis, chest pain, and fever. Tobacco smoke is a known risk factor.
Synonyms:['bronchogenic lung adenocarcinoma', 'nonsmall cell adenocarcinoma', 'Adenocarcinoma of Lung', 'adenocarcinoma of the lung', 'non-sma

Term: EFO:0006505 (chronic bronchitis),
Description:A type of chronic obstructive pulmonary disease characterized by chronic inflammation in the bronchial tree that results in edema, mucus production, obstruction, and reduced airflow to and from the lung alveoli. The most common cause is tobacco smoking. Signs and symptoms include coughing with excessive mucus production, and shortness of breath.
Synonyms:['chronic bronchitis', 'bronchitis, chronic']
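
Not every term is guaranteed to carry a description or synonyms, so indexing `['description'][0]` directly may fail for some terms. Below is a small sketch of a more tolerant formatter, assuming those fields can be missing or null. `summarise_term` is a hypothetical helper, and the sample term is hand-written (with the description deliberately left out) rather than fetched from OLS.

```python
def summarise_term(term):
    """Build a short text summary from one entry of `_embedded.terms`."""
    descriptions = term.get('description') or []
    synonyms = term.get('synonyms') or []
    return "Term: {} ({})\nDescription: {}\nSynonyms: {}".format(
        term.get('obo_id'),
        term.get('label'),
        descriptions[0] if descriptions else 'n/a',
        ', '.join(sorted(synonyms)) if synonyms else 'n/a')


# Hand-written sample; the missing description is an assumption for illustration.
sample_term = {
    'obo_id': 'Orphanet:586',
    'label': 'Cystic fibrosis',
    'description': None,
    'synonyms': ['Mucoviscidosis', 'CF'],
}

print(summarise_term(sample_term))
```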




## Getting related parent terms

You can use the OLS API to get all direct parent/child terms, or fetch all descendants/ancestors for a given term. In this scenario we will use the `_links` exposed via the API to guide us to the correct REST URL. Also note that in these examples the results are paged. We will use the `_links` again to help navigate the results.

In [5]:
OLS_TERMS_API = OLS_BASE_URI + '/api/terms'

for term_id in term_id_map.values():
    
    lookup_params = {
        'id' : term_id,
        'ontology' : 'efo',
    }
    response = requests.get(OLS_TERMS_API, params=lookup_params)

    if response.ok:
        jData = json.loads(response.content)

        # get the URL for direct parents and ancestors
        label = jData['_embedded']['terms'][0]['label']
        
        parents_url = jData['_embedded']['terms'][0]['_links']['parents']['href']
        ancestors_url = jData['_embedded']['terms'][0]['_links']['ancestors']['href']
   
        parents_response = requests.get(parents_url)
        
        print("{}({})".format(term_id, label))
        if parents_response.ok:
            
            jData = json.loads(parents_response.content)
            for parent in jData['_embedded']['terms']:
                print ("\t child of --> {} ({})".format(
                    parent['obo_id'],
                    parent['label']
                ))
                
            print("\n\n")
                            

        more_paged_results = True
        print("{}({})".format(term_id, label))
        while more_paged_results:
            ancestor_response = requests.get(ancestors_url)
            if not ancestor_response.ok:
                # stop paging on an HTTP error instead of retrying forever
                break

            jData = json.loads(ancestor_response.content)
            for parent in jData['_embedded']['terms']:
                print ("\t descendant of --> {} ({})".format(
                    parent['obo_id'],
                    parent['label']
                ))

            # follow the 'next' link until the last page is reached
            if 'next' in jData['_links']:
                ancestors_url = jData['_links']['next']['href']
            else:
                more_paged_results = False
                print("\n\n")


        

EFO:0000341(chronic obstructive pulmonary disease)
	 child of --> EFO:0009910 (chronic lung disease)
	 child of --> MONDO:0002567 (tracheal disease)



EFO:0000341(chronic obstructive pulmonary disease)


	 descendant of --> MONDO:0002567 (tracheal disease)
	 descendant of --> MONDO:0004867 (upper respiratory tract disease)
	 descendant of --> EFO:0000684 (respiratory system disease)
	 descendant of --> MONDO:0021199 (disease by anatomical system)
	 descendant of --> EFO:0000408 (disease)
	 descendant of --> None (disposition)
	 descendant of --> None (material property)
	 descendant of --> EFO:0000001 (experimental factor)
	 descendant of --> None (Thing)
	 descendant of --> EFO:0009433 (lower respiratory tract disease)
	 descendant of --> EFO:0009910 (chronic lung disease)
	 descendant of --> EFO:0003818 (lung disease)
	 descendant of --> MONDO:0000651 (thoracic disease)
	 descendant of --> MONDO:0024505 (disorder by anatomical region)
	 descendant of --> EFO:0009714 (chronic disease)





EFO:0000384(Crohn's disease)
	 child of --> EFO:0003767 (inflammatory bowel disease)



EFO:0000384(Crohn's disease)


	 descendant of --> EFO:0003767 (inflammatory bowel disease)
	 descendant of --> EFO:0005140 (autoimmune disease)
	 descendant of --> EFO:0000540 (immune system disease)
	 descendant of --> MONDO:0021199 (disease by anatomical system)
	 descendant of --> EFO:0000408 (disease)
	 descendant of --> None (disposition)
	 descendant of --> None (material property)
	 descendant of --> EFO:0000001 (experimental factor)
	 descendant of --> None (Thing)
	 descendant of --> EFO:0009431 (intestinal disease)
	 descendant of --> EFO:0000405 (digestive system disease)





Orphanet:797(Sarcoidosis)
	 child of --> Orphanet:377788 (disease)



Orphanet:797(Sarcoidosis)


	 descendant of --> Orphanet:377788 (disease)
	 descendant of --> Orphanet:C001 (clinical entity)
	 descendant of --> None (Thing)





Orphanet:586(Cystic fibrosis)
	 child of --> Orphanet:377788 (disease)



Orphanet:586(Cystic fibrosis)
	 descendant of --> Orphanet:377788 (disease)
	 descendant of --> Orphanet:C001 (clinical entity)
	 descendant of --> None (Thing)





EFO:0000768(idiopathic pulmonary fibrosis)
	 child of --> MONDO:0002429 (idiopathic interstitial pneumonia)
	 child of --> EFO:0009448 (pulmonary fibrosis)



EFO:0000768(idiopathic pulmonary fibrosis)


	 descendant of --> EFO:0009448 (pulmonary fibrosis)
	 descendant of --> EFO:0006890 (fibrosis)
	 descendant of --> EFO:0000616 (neoplasm)
	 descendant of --> MONDO:0023370 (neoplastic disease or syndrome)
	 descendant of --> MONDO:0045024 (cell proliferation disorder)
	 descendant of --> EFO:0000408 (disease)
	 descendant of --> None (disposition)
	 descendant of --> None (material property)
	 descendant of --> EFO:0000001 (experimental factor)
	 descendant of --> None (Thing)
	 descendant of --> EFO:0004244 (interstitial lung disease)
	 descendant of --> EFO:0003818 (lung disease)
	 descendant of --> MONDO:0000651 (thoracic disease)
	 descendant of --> MONDO:0024505 (disorder by anatomical region)
	 descendant of --> EFO:0009433 (lower respiratory tract disease)
	 descendant of --> EFO:0000684 (respiratory system disease)
	 descendant of --> MONDO:0021199 (disease by anatomical system)
	 descendant of --> MONDO:0015118 (rare pulmonary disease)
	 descendant of --> EFO:0003853 (respira

	 descendant of --> MONDO:0017027 (primary interstitial lung disease specific to adulthood)
	 descendant of --> MONDO:0017026 (interstitial lung disease specific to adulthood)
	 descendant of --> EFO:0003106 (pneumonia)
	 descendant of --> MONDO:0024355 (respiratory tract infectious disease)
	 descendant of --> EFO:0005741 (infectious disease)





EFO:0000571(lung adenocarcinoma)
	 child of --> EFO:0000228 (adenocarcinoma)
	 child of --> EFO:0003060 (non-small cell lung carcinoma)



EFO:0000571(lung adenocarcinoma)


	 descendant of --> EFO:0003060 (non-small cell lung carcinoma)
	 descendant of --> EFO:0001071 (lung carcinoma)
	 descendant of --> EFO:0000313 (carcinoma)
	 descendant of --> EFO:0006858 (epithelial neoplasm)
	 descendant of --> EFO:0000616 (neoplasm)
	 descendant of --> MONDO:0023370 (neoplastic disease or syndrome)
	 descendant of --> MONDO:0045024 (cell proliferation disorder)
	 descendant of --> EFO:0000408 (disease)
	 descendant of --> None (disposition)
	 descendant of --> None (material property)
	 descendant of --> EFO:0000001 (experimental factor)
	 descendant of --> None (Thing)
	 descendant of --> EFO:0000311 (cancer)
	 descendant of --> MONDO:0008903 (lung cancer)
	 descendant of --> MONDO:0000376 (respiratory system cancer)
	 descendant of --> EFO:0003853 (respiratory system neoplasm)
	 descendant of --> EFO:0000684 (respiratory system disease)
	 descendant of --> MONDO:0021199 (disease by anatomical system)
	 descendant of --> MONDO:0021117 (lung neoplasm)
	 descendant 

	 descendant of --> MONDO:0000651 (thoracic disease)
	 descendant of --> MONDO:0024505 (disorder by anatomical region)
	 descendant of --> EFO:0009433 (lower respiratory tract disease)
	 descendant of --> MONDO:0020641 (respiratory tract neoplasm)
	 descendant of --> MONDO:0021350 (neoplasm of thorax)
	 descendant of --> MONDO:0003274 (thoracic cancer)
	 descendant of --> EFO:0000228 (adenocarcinoma)
	 descendant of --> MONDO:0024276 (glandular cell neoplasm)





EFO:0006505(chronic bronchitis)
	 child of --> EFO:0000341 (chronic obstructive pulmonary disease)
	 child of --> EFO:0009661 (bronchitis)



EFO:0006505(chronic bronchitis)


	 descendant of --> EFO:0009661 (bronchitis)
	 descendant of --> EFO:1002018 (bronchial disease)
	 descendant of --> MONDO:0000651 (thoracic disease)
	 descendant of --> MONDO:0024505 (disorder by anatomical region)
	 descendant of --> EFO:0000408 (disease)
	 descendant of --> None (disposition)
	 descendant of --> None (material property)
	 descendant of --> EFO:0000001 (experimental factor)
	 descendant of --> None (Thing)
	 descendant of --> EFO:0009433 (lower respiratory tract disease)
	 descendant of --> EFO:0000684 (respiratory system disease)
	 descendant of --> MONDO:0021199 (disease by anatomical system)
	 descendant of --> MONDO:0021925 (tracheobronchitis)
	 descendant of --> EFO:0009903 (inflammatory disease)
	 descendant of --> EFO:0000341 (chronic obstructive pulmonary disease)
	 descendant of --> MONDO:0002567 (tracheal disease)
	 descendant of --> MONDO:0004867 (upper respiratory tract disease)
	 descendant of --> EFO:0009910 (chronic lung disease)
	 descendant of --> EF
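
The paging pattern used for the ancestors can be factored into a reusable generator. This is a sketch only, assuming each page exposes `_embedded.terms` and an optional `_links.next.href` as in the cell above. `iter_paged_terms` and `fetch_json` are hypothetical names, and the two pages in `fake_pages` are hand-written stand-ins so the example runs without contacting the service.

```python
import json
import urllib.request


def fetch_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_paged_terms(start_url, fetch=fetch_json):
    """Yield every term across all pages, following `_links.next.href`."""
    url = start_url
    while url:
        page = fetch(url)
        for term in page.get('_embedded', {}).get('terms', []):
            yield term
        # a missing 'next' link yields None, which ends the loop
        url = page.get('_links', {}).get('next', {}).get('href')


# Two hand-written pages standing in for a paged ancestors response.
fake_pages = {
    'page0': {'_embedded': {'terms': [{'obo_id': 'EFO:0003767', 'label': 'inflammatory bowel disease'}]},
              '_links': {'next': {'href': 'page1'}}},
    'page1': {'_embedded': {'terms': [{'obo_id': 'EFO:0000408', 'label': 'disease'}]},
              '_links': {}},
}

ancestors = list(iter_paged_terms('page0', fetch=fake_pages.__getitem__))
print([t['obo_id'] for t in ancestors])
```

Passing the fetcher as an argument keeps the paging logic separate from the HTTP call, which makes it straightforward to test offline or to swap in `requests`.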

## Checking for subsumptions

You can use the OLS API to test whether a term is a child of a particular higher-level category. We want to know if any of the terms above are a type of cancer. Cancer in EFO has the full URI `http://www.ebi.ac.uk/efo/EFO_0000311`.

In [6]:
EFO_CANCER_TERM = 'http://www.ebi.ac.uk/efo/EFO_0000311'

for term_name in term_id_map.keys():
    
    search_params = {
        'q' : term_name,
        'exact' : True,
        'childrenOf' : EFO_CANCER_TERM,
        'ontology' : 'efo',
    }
    response = requests.get(OLS_SEARCH_API, params=search_params)

    if response.ok:
        jData = json.loads(response.content)
        
        # if we get a result then it must be a child
        
        if len(jData['response']['docs']) > 0:
            print("{} is a type of cancer".format(jData['response']['docs'][0]['label']))
        else:
            print("{} is not a type of cancer".format(term_name))



Chronic obstructive pulmonary disease is not a type of cancer
Crohn's disease is not a type of cancer
Sarcoidosis is not a type of cancer
Cystic fibrosis is not a type of cancer
Idiopathic pulmonary fibrosis is not a type of cancer
lung adenocarcinoma is a type of cancer
Chronic bronchitis is not a type of cancer
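
If the ancestors of each term have already been collected, the same question can be answered locally without another search request. This is an alternative to the `childrenOf` query used above, not the method of that cell. `is_subsumed_by` is a hypothetical helper, and the ancestor ids are a subset copied from the ancestor listings printed earlier in this notebook.

```python
CANCER_ID = 'EFO:0000311'

# A few ancestor ids copied from the ancestor listings printed earlier.
ancestor_ids = {
    'lung adenocarcinoma': {'EFO:0003060', 'EFO:0001071', 'EFO:0000313', 'EFO:0000311', 'EFO:0000408'},
    "Crohn's disease": {'EFO:0003767', 'EFO:0005140', 'EFO:0000540', 'EFO:0000408'},
}


def is_subsumed_by(term_label, category_id, ancestors_by_label):
    """True if category_id appears among the term's recorded ancestors."""
    return category_id in ancestors_by_label.get(term_label, set())


for label in ancestor_ids:
    verdict = 'is' if is_subsumed_by(label, CANCER_ID, ancestor_ids) else 'is not'
    print("{} {} a type of cancer".format(label, verdict))
```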


## Finding mappings to other ontologies

You will sometimes need to map between ontologies, especially the disease ontologies. You can use the EBI's OxO service to lookup mappings from existing ontologies. Here we will use the [OxO REST API](https://www.ebi.ac.uk/spot/oxo/docs/api) to find mappings for the terms used above. 

You can submit multiple ids at once to OxO. We will restrict the mappings to a set of target disease ontologies: SNOMEDCT and ICD10CM. We also only want mappings from a trusted source, so we set EFO and MONDO as the allowed sources for mappings. We set the distance to 1 to get only direct mappings. In the next section we will look at distance 2 and how that changes the result. Also note that the results from OxO are paged.


In [7]:
    
input_data = {
    'ids' : term_id_map.values(),
    "mappingTarget": ["SNOMEDCT", "ICD10CM"],
    "mappingSource": ["EFO", "MONDO"],
    'distance' : 1
}

def get_mappings_from_oxo (input_data):
    oxo_search_url = OXO_BASE_URI + '/api/search'
    more_paged_results = True
    while more_paged_results:

        oxo_response = requests.post(oxo_search_url, data=input_data)

        if not oxo_response.ok:
            # stop paging on an HTTP error instead of retrying forever
            break

        jData = json.loads(oxo_response.content)

        for oxo_result in jData["_embedded"]["searchResults"]:
            print("{} ({})".format(oxo_result['queryId'], oxo_result['label']))

            for mappings in oxo_result['mappingResponseList']:
                print("\tmaps to {} ({}))".format(
                    mappings["curie"],
                    mappings["label"]
                ))

        # follow the 'next' link until the last page is reached
        if 'next' in jData['_links']:
            oxo_search_url = jData['_links']['next']['href']
        else:
            more_paged_results = False
                
get_mappings_from_oxo(input_data)

EFO:0000341 (chronic obstructive pulmonary disease)
	maps to ICD10CM:J44 ())
	maps to SNOMEDCT:13645005 (Chronic obstructive lung disease))
	maps to SNOMEDCT:413846005 (Chronic obstructive pulmonary disease finding))
	maps to ICD10CM:J44.9 (Chronic obstructive pulmonary disease, unspecified))
	maps to SNOMEDCT:84162001 (Cold))
EFO:0000384 (Crohn's disease)
	maps to ICD10CM:K50.1 ())
	maps to ICD10CM:K50 ())
	maps to SNOMEDCT:34000006 (Crohn's disease))
Orphanet:797 (Sarcoidosis)
	maps to ICD10CM:D86.9 (Sarcoidosis, unspecified))
	maps to ICD10CM:D86.2 (Sarcoidosis of lung with sarcoidosis of lymph nodes))
	maps to ICD10CM:D86.0 (Sarcoidosis of lung))
	maps to ICD10CM:D80-D89 ())
	maps to ICD10CM:D86 ())
	maps to ICD10CM:D86.3 (Sarcoidosis of skin))
	maps to ICD10CM:D86.1 (Sarcoidosis of lymph nodes))
	maps to ICD10CM:D86.8 ())
Orphanet:586 (Cystic fibrosis)
	maps to ICD10CM:E84.9 (Cystic fibrosis, unspecified))
	maps to ICD10CM:E84.0 (Cystic fibrosis with pulmonary manifestations))
	ma
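
Printing the mappings is fine for inspection, but for downstream use it may help to collect them into a structure keyed by query id and target ontology. The sketch below assumes the `_embedded.searchResults` / `mappingResponseList` layout used in the cell above. `group_mappings_by_prefix` is a hypothetical helper, and `sample_page` is hand-written from the Crohn's disease output shown above (the empty labels are an assumption).

```python
from collections import defaultdict


def group_mappings_by_prefix(oxo_json):
    """Map each query id to {target prefix: [curies]} for one page of OxO search results."""
    grouped = {}
    for result in oxo_json.get('_embedded', {}).get('searchResults', []):
        by_prefix = defaultdict(list)
        for mapping in result.get('mappingResponseList', []):
            # the prefix is the part of the CURIE before the first colon
            prefix = mapping['curie'].split(':', 1)[0]
            by_prefix[prefix].append(mapping['curie'])
        grouped[result['queryId']] = dict(by_prefix)
    return grouped


# Hand-written page mirroring the Crohn's disease mappings printed above.
sample_page = {'_embedded': {'searchResults': [{
    'queryId': 'EFO:0000384',
    'label': "Crohn's disease",
    'mappingResponseList': [
        {'curie': 'ICD10CM:K50.1', 'label': ''},
        {'curie': 'ICD10CM:K50', 'label': ''},
        {'curie': 'SNOMEDCT:34000006', 'label': "Crohn's disease"},
    ]}]}}

print(group_mappings_by_prefix(sample_page))
```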

Let's increase the distance and try again. Setting the distance to 2 will find indirect mappings and should increase the coverage for some terms. 

In [8]:
input_data = {
    'ids' : term_id_map.values(),
    "mappingTarget": ["SNOMEDCT", "ICD10CM"],
    "mappingSource": ["EFO", "MONDO"],
    'distance' : 2
}
get_mappings_from_oxo(input_data)

EFO:0000341 (chronic obstructive pulmonary disease)
	maps to ICD10CM:J44 ())
	maps to SNOMEDCT:13645005 (Chronic obstructive lung disease))
	maps to SNOMEDCT:196003006 (Chronic obstructive airways disease NOS))
	maps to SNOMEDCT:413846005 (Chronic obstructive pulmonary disease finding))
	maps to ICD10CM:J44.9 (Chronic obstructive pulmonary disease, unspecified))
	maps to SNOMEDCT:195948000 (Chronic obstructive lung disease))
	maps to SNOMEDCT:195935004 (Chronic obstructive lung disease))
	maps to SNOMEDCT:155569000 (Chronic obstructive lung disease))
	maps to SNOMEDCT:155617000 (Chronic obstructive airways disease NOS))
	maps to SNOMEDCT:155565006 (Chronic obstructive lung disease))
	maps to SNOMEDCT:155585005 (Chronic obstructive airways disease NOS))
	maps to SNOMEDCT:84162001 (Cold))
EFO:0000384 (Crohn's disease)
	maps to SNOMEDCT:196981009 (Regional enteritis of the large bowel))
	maps to ICD10CM:K50.1 ())
	maps to SNOMEDCT:266446008 (Crohn's disease of the large bowel NOS))
	maps 
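
To see what the larger distance actually contributes, the two result sets can be compared directly. The sketch below uses the chronic obstructive pulmonary disease mappings copied from the two outputs above. `added_mappings` is a hypothetical helper, not part of the OxO API.

```python
def added_mappings(direct, indirect):
    """Return the CURIEs present only in the larger-distance result, sorted."""
    return sorted(set(indirect) - set(direct))


# EFO:0000341 mappings copied from the distance 1 and distance 2 outputs above.
copd_distance_1 = [
    'ICD10CM:J44', 'SNOMEDCT:13645005', 'SNOMEDCT:413846005',
    'ICD10CM:J44.9', 'SNOMEDCT:84162001',
]
copd_distance_2 = [
    'ICD10CM:J44', 'SNOMEDCT:13645005', 'SNOMEDCT:196003006',
    'SNOMEDCT:413846005', 'ICD10CM:J44.9', 'SNOMEDCT:195948000',
    'SNOMEDCT:195935004', 'SNOMEDCT:155569000', 'SNOMEDCT:155617000',
    'SNOMEDCT:155565006', 'SNOMEDCT:155585005', 'SNOMEDCT:84162001',
]

new_at_distance_2 = added_mappings(copd_distance_1, copd_distance_2)
print(len(new_at_distance_2), new_at_distance_2)
```

For this term, every mapping gained at distance 2 is a SNOMEDCT code, while the ICD10CM mappings are unchanged.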

## Summary

We've shown some of the functionality of the EBI's Ontology Lookup Service and the OxO mapping API. For more information about the EBI's ontology services, see http://www.ebi.ac.uk/spot/ontology. You can also contact the team at ols-support@ebi.ac.uk.