# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 6 - Ontology Analyses**
Welcome to the sixth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we will write Python code that retrieves data available on the WormBase sites and performs simple analyses with it.

This tutorial covers how to obtain different kinds of ontology information for a given gene.
Let's get started!

As always, we start by importing all the required packages.

In [None]:
import requests
import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tissue_enrichment_analysis as tea
pd.set_option('display.max_columns', None) #for ensuring full view of the dataframe generated

Assign the WormBase ID of the gene whose ontology information you want to the variable in the cell below.

In [None]:
gene = 'WBGene00001648'

### Gene Ontology

Gene Ontology provides three main sets of information:
1) Biological Processes
2) Molecular Functions
3) Cellular Components

We will obtain these three sets for our gene of interest and extract the data into dataframes for easy inspection.

We first build the request URL for the gene ontology summary and fetch the response in JSON format.

In [None]:
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/gene_ontology_summary')

In [None]:
# raise_for_status() raises an HTTPError for 4xx/5xx responses and does nothing otherwise
request.raise_for_status()
decoded = request.json()
print(json.dumps(decoded, indent = 4))
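Every request in this notebook uses the same URL pattern, so the string concatenation can be wrapped in a small helper. The sketch below is only an illustration: `build_field_url` is a hypothetical name (not part of any WormBase client), and the base URL is simply the one used in the request cell above.

```python
from urllib.parse import quote

# Base URL copied from the request cell above
WORMBASE_REST = 'http://rest.wormbase.org/rest/field/gene'

def build_field_url(gene_id, field):
    """Return the REST URL for one field of one gene (hypothetical helper)."""
    # strip() and quote() guard against stray spaces or slashes in typed input
    gene_part = quote(gene_id.strip(), safe='')
    field_part = quote(field.strip(), safe='')
    return f'{WORMBASE_REST}/{gene_part}/{field_part}'

url = build_field_url('WBGene00001648', 'gene_ontology_summary')
print(url)
```

The returned string can be passed straight to `requests.get`, ideally together with a `timeout` argument so that a stalled connection does not hang the notebook.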

We put the gene ontology terms for biological processes, molecular functions, and cellular components into separate dataframes, each of which can be used to generate a CSV file. To include more information in a dataframe, edit the fields in the cells below.

##### Gene Ontology Biological Processes

In [None]:
GO_BP = pd.DataFrame(columns = ['ID', 'Label'])

In [None]:
biological_processes = request.json()['gene_ontology_summary']['data']['Biological_process']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
GO_BP = pd.DataFrame([{'ID' : term['term_description'][0]['id'],
                       'Label' : term['term_description'][0]['label']} for term in biological_processes],
                     columns = ['ID', 'Label'])

In [None]:
GO_BP

##### Gene Ontology Molecular Functions

In [None]:
GO_MF = pd.DataFrame(columns = ['ID', 'Label'])

In [None]:
molecular_functions = request.json()['gene_ontology_summary']['data']['Molecular_function']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
GO_MF = pd.DataFrame([{'ID' : term['term_description'][0]['id'],
                       'Label' : term['term_description'][0]['label']} for term in molecular_functions],
                     columns = ['ID', 'Label'])

In [None]:
GO_MF

##### Gene Ontology Cellular Components

In [None]:
GO_CC = pd.DataFrame(columns = ['ID', 'Label'])

In [None]:
cellular_components = request.json()['gene_ontology_summary']['data']['Cellular_component']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
GO_CC = pd.DataFrame([{'ID' : term['term_description'][0]['id'],
                       'Label' : term['term_description'][0]['label']} for term in cellular_components],
                     columns = ['ID', 'Label'])

In [None]:
GO_CC
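The three Gene Ontology sections repeat the same loop with a different key, so the extraction could be folded into one function. The following is a minimal sketch, assuming the payload has the shape used in the cells above. `go_terms` is a hypothetical helper, and the sample payload is made up and mirrors only the keys this notebook reads.

```python
def go_terms(payload, aspect):
    """Return [{'ID': ..., 'Label': ...}] for one GO aspect (hypothetical helper)."""
    # Fall back to an empty dict if the gene has no GO data at all
    data = payload['gene_ontology_summary']['data'] or {}
    rows = []
    for entry in data.get(aspect, []):
        term = entry['term_description'][0]
        rows.append({'ID': term['id'], 'Label': term['label']})
    return rows

# Made-up payload: placeholder IDs and labels, not real GO annotations
sample = {
    'gene_ontology_summary': {
        'data': {
            'Biological_process': [
                {'term_description': [{'id': 'GO:example1', 'label': 'example process'}]},
            ],
            'Molecular_function': [
                {'term_description': [{'id': 'GO:example2', 'label': 'example function'}]},
            ],
        }
    }
}

for aspect in ('Biological_process', 'Molecular_function', 'Cellular_component'):
    print(aspect, go_terms(sample, aspect))
```

Passing the returned list to `pd.DataFrame` gives the same ID/Label table as the cells above. An aspect with no annotations yields an empty list rather than a `KeyError`.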

### Anatomy Ontology

The anatomy ontology describes the tissues in which the queried gene is expressed.

We first build the request URL for the anatomy ontology and fetch the response in JSON format.

In [None]:
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/expressed_in')

In [None]:
# raise_for_status() raises an HTTPError for 4xx/5xx responses and does nothing otherwise
request.raise_for_status()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

We put the anatomy ontology terms into a dataframe, which can be used to generate a CSV file. To include more information in the dataframe, edit the fields in the cells below.

In [None]:
AO = pd.DataFrame(columns = ['ID', 'Label', 'Evidence'])

In [None]:
anatomy_terms = request.json()['expressed_in']['data']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
AO = pd.DataFrame([{'ID' : term['ontology_term']['id'],
                    'Label' : term['ontology_term']['label'],
                    'Evidence': term['details'][0]['evidence']['Description']} for term in anatomy_terms],
                  columns = ['ID', 'Label', 'Evidence'])

In [None]:
AO
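The evidence lookup above indexes into `details[0]`, which would fail for any entry that has no details. Whether such entries occur in practice is an assumption here. The sketch below shows one defensive way to write the same extraction. `anatomy_rows` is a hypothetical helper, and the sample payload is made up to mirror only the keys used above.

```python
def anatomy_rows(payload):
    """Return ID/Label/Evidence rows, using None where evidence is missing."""
    rows = []
    for entry in payload['expressed_in']['data'] or []:
        term = entry['ontology_term']
        # Substitute an empty record when 'details' is absent or empty
        details = entry.get('details') or [{}]
        evidence = (details[0].get('evidence') or {}).get('Description')
        rows.append({'ID': term['id'], 'Label': term['label'], 'Evidence': evidence})
    return rows

# Made-up payload: placeholder IDs and labels, not real anatomy terms
sample = {
    'expressed_in': {
        'data': [
            {'ontology_term': {'id': 'WBbt:example1', 'label': 'example tissue'},
             'details': [{'evidence': {'Description': 'example evidence text'}}]},
            {'ontology_term': {'id': 'WBbt:example2', 'label': 'another tissue'},
             'details': []},
        ]
    }
}

for row in anatomy_rows(sample):
    print(row)
```

Rows with no evidence keep their ID and label and carry `None` in the Evidence column, so they still appear in the resulting dataframe.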

### Disease Ontology

The disease ontology describes the human diseases to which the queried gene is predicted or shown to be related.

We first build the request URL for the human disease ontology and fetch the response in JSON format.

In [None]:
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/human_diseases')

In [None]:
# raise_for_status() raises an HTTPError for 4xx/5xx responses and does nothing otherwise
request.raise_for_status()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

We put the human disease ontology terms into a dataframe, which can be used to generate a CSV file. To include more information in the dataframe, edit the fields in the cells below.

In [None]:
DO = pd.DataFrame(columns = ['ID', 'Label', 'Inference Method'])

In [None]:
disease_terms = request.json()['human_diseases']['data']['potential_model']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
DO = pd.DataFrame([{'ID' : term['id'],
                    'Label' : term['label'],
                    'Inference Method': term['ev']} for term in disease_terms],
                  columns = ['ID', 'Label', 'Inference Method'])

In [None]:
DO
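The cell above reads only the `potential_model` list and assumes it exists. For a gene with no disease annotations the lookup may fail, and the response may hold other categories as well. Both points are assumptions, so check the printed JSON for your gene. As a hedged sketch, the hypothetical helper `disease_rows` below walks every list-valued category it finds and tolerates missing data. The sample payload is made up, and `hypothetical_category` is an invented key used only for illustration.

```python
def disease_rows(payload):
    """Flatten every list-valued category under 'data' into rows (hypothetical helper)."""
    data = payload['human_diseases']['data'] or {}
    rows = []
    for category, entries in data.items():
        if not isinstance(entries, list):
            continue  # skip values that are not lists of terms
        for entry in entries:
            if not isinstance(entry, dict):
                continue  # skip anything that is not a term record
            rows.append({'Category': category,
                         'ID': entry.get('id'),
                         'Label': entry.get('label'),
                         'Inference Method': entry.get('ev')})
    return rows

# Made-up payload: placeholder values, not real disease annotations
sample = {
    'human_diseases': {
        'data': {
            'potential_model': [
                {'id': 'DOID:example1', 'label': 'example disease', 'ev': 'example inference'},
            ],
            'hypothetical_category': [],
        }
    }
}

print(disease_rows(sample))
print(disease_rows({'human_diseases': {'data': None}}))
```

The extra Category column records which list each term came from, which keeps the rows distinguishable if more than one category is present.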

### Life Stage Ontology

The life stage ontology describes the life stages of the organism during which the queried gene is predicted or shown to be active.

We first build the request URL for the life stage ontology and fetch the response in JSON format.

In [None]:
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/fpkm_expression_summary_ls')

In [None]:
# raise_for_status() raises an HTTPError for 4xx/5xx responses and does nothing otherwise
request.raise_for_status()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

We put the life stage ontology terms into a dataframe, which can be used to generate a CSV file. To include more information in the dataframe, edit the fields in the cells below.

In [None]:
LSO = pd.DataFrame(columns = ['ID', 'Label'])

In [None]:
life_stages = request.json()['fpkm_expression_summary_ls']['data']['controls']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
LSO = pd.DataFrame([{'ID' : stage['life_stage']['id'],
                     'Label' : stage['life_stage']['label']} for stage in life_stages],
                   columns = ['ID', 'Label'])

In [None]:
LSO

### Phenotype Ontology

The phenotype ontology describes the phenotypes that the queried gene is predicted or shown to be responsible for.

We first build the request URL for the phenotype ontology and fetch the response in JSON format.

In [None]:
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/phenotype')

In [None]:
# raise_for_status() raises an HTTPError for 4xx/5xx responses and does nothing otherwise
request.raise_for_status()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

We put the phenotype ontology terms into a dataframe, which can be used to generate a CSV file. To include more information in the dataframe, edit the fields in the cells below.

In [None]:
PO = pd.DataFrame(columns = ['ID', 'Label', 'Taxonomy'])

In [None]:
phenotype = request.json()['phenotype']['data']
# DataFrame.append was removed in pandas 2.0, so collect the rows and build the dataframe once
PO = pd.DataFrame([{'ID' : entry['phenotype']['id'],
                    'Label' : entry['phenotype']['label'],
                    'Taxonomy': entry['phenotype']['taxonomy']} for entry in phenotype],
                  columns = ['ID', 'Label', 'Taxonomy'])

In [None]:
PO

### Gene Ontology Enrichment Analysis

We have obtained ontology information for the required gene. Now, we can perform gene ontology enrichment analysis using the tissue enrichment analysis package.
More details about this can be found in the next tutorial where multiple enrichment analyses (including ontology enrichment) will be performed.

Read a CSV file of gene names into a dataframe for the rest of the analysis.

The CSV file needs one gene per line, with a header on the first line. The analysis cell further down accesses the column as `genes.gene_name`, so the header must be `gene_name`.
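To make the expected file layout concrete, the sketch below parses a small in-memory example with the standard library. The header `gene_name` is inferred from the `genes.gene_name` access in the analysis cell further down. Apart from the gene used earlier in this notebook, the IDs are placeholders chosen only to show the layout, and the use of WormBase gene IDs as the identifier form is an assumption.

```python
import csv
import io

# Example file contents: a header line, then one gene per line
sample_csv = """gene_name
WBGene00001648
WBGene00000001
WBGene00000002
"""

reader = csv.DictReader(io.StringIO(sample_csv))
# Keep non-empty entries and trim stray whitespace around each ID
gene_list = [row['gene_name'].strip() for row in reader if row['gene_name']]
print(reader.fieldnames)
print(gene_list)
```

A file saved with these contents as `data/tea.csv` would load with the `pd.read_csv` call in the next cell and expose the genes as the `gene_name` column.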

In [None]:
genes = pd.read_csv('data/tea.csv')

Load the Gene Ontology dictionary to a variable.

In [None]:
GO_Enrichment = tea.fetch_dictionary('go')   

Now we analyse the gene list against the WormBase Gene Ontology dictionary.

We test the list of genes and store the results. For this, we need to set an alpha value to extract only the statistically significant results.

In [None]:
cutoff = 0.01

go_enrichment = tea.enrichment_analysis(genes.gene_name, GO_Enrichment, show=False, alpha=cutoff)
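When many terms are tested at once, a significance cutoff is usually applied to p-values that have been adjusted for multiple testing rather than to the raw values. The sketch below is a generic Benjamini-Hochberg adjustment written from scratch to show the idea. It is not TEA's code, TEA's own correction may differ in detail, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by those above it
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

p_values = [0.001, 0.008, 0.03, 0.5]  # made-up raw p-values for four terms
cutoff = 0.01

q_values = benjamini_hochberg(p_values)
raw_hits = [p < cutoff for p in p_values]
adjusted_hits = [q < cutoff for q in q_values]
print(q_values)
print(raw_hits)
print(adjusted_hits)
```

In this toy example two terms pass the cutoff on raw p-values but only one passes after adjustment, which is why a seemingly strict alpha can still return fewer results than expected.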

Now that we have performed the Gene Ontology Enrichment analysis, we can view the results in a dataframe!

In [None]:
print(f'The Gene Ontology results have {len(go_enrichment)} entries')
display(go_enrichment)

The results can also be used to generate plots. These will be explored in the next notebook.

This is the end of the sixth tutorial for WormBase data analysis! This tutorial dealt with obtaining several kinds of ontology information.

In the next tutorial, we will perform Enrichment analyses on WormBase data!

Acknowledgements:
- Tissue Enrichment Analysis GitHub Repository (https://github.com/dangeles/TissueEnrichmentAnalysis)
- TEA Publication - 'Tissue enrichment analysis for C. elegans genomics.'  Angeles-Albores, D., N. Lee, R.Y., Chan, J., Sternberg, P.W. BMC Bioinformatics 17, 366 (2016). https://doi.org/10.1186/s12859-016-1229-9
- TEA Tutorial (https://colab.research.google.com/github/Munfred/worm-tutorials/blob/main/tissue_enrichment_analysis.ipynb)