# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 6 - Ontology Analyses**
Welcome to the sixth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code to retrieve data available on the WormBase sites and perform simple analyses with it.

This tutorial deals with obtaining different kinds of ontology information for a given gene.
Let's get started!

As always, we start by importing all the required packages.

In [1]:
import requests, sys, json
import pandas as pd
pd.set_option('display.max_columns', None) #for ensuring full view of the dataframe generated

Assign the WormBase ID of the gene whose ontology information you want to retrieve to the variable in the cell below.

In [2]:
gene = 'WBGene00001648'

### Gene Ontology

Gene Ontology provides three main sets of information:
1) Biological Processes
2) Molecular Functions
3) Cellular Components

We will obtain all three for our gene of interest and extract the data into dataframes so that it is easy to read.

In [3]:
#We first generate the request URL for the gene ontology summary
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/gene_ontology_summary')
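
Every request in this notebook uses the same URL pattern, with only the final field name changing. The sketch below wraps that pattern in two small helpers. It is only an illustration: `field_url` and `fetch_field` are hypothetical names that are not part of any WormBase client, and the 30-second timeout is an arbitrary choice.

```python
import requests

# Same endpoint prefix that the cells in this notebook use
BASE_URL = 'http://rest.wormbase.org/rest/field/gene/'

def field_url(gene_id, field):
    """Build the REST URL for one field of one gene (hypothetical helper)."""
    return BASE_URL + gene_id + '/' + field

def fetch_field(gene_id, field, timeout=30):
    """Request one field and return the decoded JSON (hypothetical helper)."""
    response = requests.get(field_url(gene_id, field), timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response.json()

# Only the URL is built here; no network request is made by this line
print(field_url('WBGene00001648', 'gene_ontology_summary'))
```

With such a helper, a call like `fetch_field(gene, 'phenotype')` would stand in for a `requests.get` cell and its status check.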

In [4]:
#We can obtain the output of the request in the json format
request.raise_for_status() #raises an HTTPError if the request failed
decoded = request.json()
print(json.dumps(decoded, indent = 4))

{
    "class": "gene",
    "name": "WBGene00001648",
    "uri": "rest/field/gene/WBGene00001648/gene_ontology_summary",
    "gene_ontology_summary": {
        "data": {
            "Biological_process": [
                {
                    "extensions": null,
                    "term_id": {
                        "id": "GO:0050829",
                        "label": "GO:0050829",
                        "class": "go_term",
                        "taxonomy": "all"
                    },
                    "term_description": [
                        {
                            "id": "GO:0050829",
                            "label": "defense response to Gram-negative bacterium",
                            "class": "go_term",
                            "taxonomy": "all"
                        }
                    ]
                },
                {
                    "extensions": null,
                    "term_id": {
                        "id": "GO:0048681",
        

We output the gene ontology terms for biological processes, molecular functions, and cellular components as separate dataframes, which can be used to generate csv files. If you need more information in a dataframe, just edit the fields in the cells below!
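
Since the three categories share the same structure, the extraction can also be written once and reused. The following is a minimal sketch, assuming each entry carries a non-empty `term_description` list as in the JSON printed above. `go_terms_to_frame` is a hypothetical helper name, and `sample` is a hand-made stand-in for `request.json()['gene_ontology_summary']['data']`.

```python
import pandas as pd

def go_terms_to_frame(summary_data, category):
    """Return an ID/Label dataframe for one GO category (hypothetical helper)."""
    terms = summary_data.get(category) or []  # a missing or null category gives no rows
    rows = [{'ID': term['term_description'][0]['id'],
             'Label': term['term_description'][0]['label']}
            for term in terms]
    return pd.DataFrame(rows, columns=['ID', 'Label'])

# Hand-made sample that mirrors the structure of the JSON output above
sample = {
    'Biological_process': [
        {'term_description': [{'id': 'GO:0050829',
                               'label': 'defense response to Gram-negative bacterium'}]},
    ],
}

print(go_terms_to_frame(sample, 'Biological_process'))
print(go_terms_to_frame(sample, 'Cellular_component'))  # category absent here: empty dataframe
```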

In [5]:
#Gene Ontology Biological Processes
GO_BP = pd.DataFrame(columns = ['ID', 'Label'])

In [6]:
biological_processes = request.json()['gene_ontology_summary']['data']['Biological_process']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
GO_BP = pd.DataFrame([{'ID' : term['term_description'][0]['id'],
                       'Label' : term['term_description'][0]['label']}
                      for term in biological_processes], columns = ['ID', 'Label'])

In [7]:
GO_BP

|  | ID | Label |
| --- | --- | --- |
| 0 | GO:0050829 | defense response to Gram-negative bacterium |
| 1 | GO:0048681 | negative regulation of axon regeneration |
| 2 | GO:0032094 | response to food |
| 3 | GO:0006935 | chemotaxis |
| 4 | GO:0051301 | cell division |
| 5 | GO:0120169 | detection of cold stimulus involved in thermoc... |
| 6 | GO:0007188 | adenylate cyclase-modulating G protein-coupled... |
| 7 | GO:0050896 | response to stimulus |
| 8 | GO:0040012 | regulation of locomotion |
| 9 | GO:0007165 | signal transduction |


In [8]:
#Gene Ontology Molecular Functions
GO_MF = pd.DataFrame(columns = ['ID', 'Label'])

In [9]:
molecular_functions = request.json()['gene_ontology_summary']['data']['Molecular_function']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
GO_MF = pd.DataFrame([{'ID' : term['term_description'][0]['id'],
                       'Label' : term['term_description'][0]['label']}
                      for term in molecular_functions], columns = ['ID', 'Label'])

In [10]:
GO_MF

|  | ID | Label |
| --- | --- | --- |
| 0 | GO:0003924 | GTPase activity |
| 1 | GO:0005525 | GTP binding |
| 2 | GO:0019901 | protein kinase binding |
| 3 | GO:0016907 | G protein-coupled acetylcholine receptor activity |
| 4 | GO:0046872 | metal ion binding |
| 5 | GO:0043495 | protein-membrane adaptor activity |
| 6 | GO:0019001 | guanyl nucleotide binding |
| 7 | GO:0001664 | G protein-coupled receptor binding |
| 8 | GO:0031683 | G-protein beta/gamma-subunit complex binding |
| 9 | GO:0000166 | nucleotide binding |


In [11]:
#Gene Ontology Cellular Components
GO_CC = pd.DataFrame(columns = ['ID', 'Label'])

In [12]:
cellular_components = request.json()['gene_ontology_summary']['data']['Cellular_component']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
GO_CC = pd.DataFrame([{'ID' : term['term_description'][0]['id'],
                       'Label' : term['term_description'][0]['label']}
                      for term in cellular_components], columns = ['ID', 'Label'])

In [13]:
GO_CC

|  | ID | Label |
| --- | --- | --- |
| 0 | GO:0005834 | heterotrimeric G-protein complex |
| 1 | GO:0045202 | synapse |
| 2 | GO:0005938 | cell cortex |
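
Any of these dataframes can be written to a csv file with `DataFrame.to_csv`. The sketch below rebuilds the cellular component table by hand so that it runs on its own, and writes to an in-memory buffer rather than to disk. The file name in the final comment is a made-up example.

```python
import io
import pandas as pd

# Hand-copied version of the GO_CC output shown above
GO_CC_example = pd.DataFrame({
    'ID': ['GO:0005834', 'GO:0045202', 'GO:0005938'],
    'Label': ['heterotrimeric G-protein complex', 'synapse', 'cell cortex'],
})

# index=False leaves out the 0, 1, 2 row labels so that only ID and Label are written
buffer = io.StringIO()
GO_CC_example.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# To write a file instead (hypothetical file name):
# GO_CC_example.to_csv('GO_CC_WBGene00001648.csv', index=False)
```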


### Anatomy Ontology

The anatomy ontology describes the tissues and cells in which the queried gene is expressed.

In [14]:
#We first generate the request URL for the anatomy ontology
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/expressed_in')

In [15]:
#We can obtain the output of the request in the json format
request.raise_for_status() #raises an HTTPError if the request failed
decoded = request.json()
print(json.dumps(decoded, indent = 4))

{
    "class": "gene",
    "name": "WBGene00001648",
    "uri": "rest/field/gene/WBGene00001648/expressed_in",
    "expressed_in": {
        "data": [
            {
                "ontology_term": {
                    "id": "WBbt:0004697",
                    "label": "head mesodermal cell",
                    "class": "anatomy_term",
                    "taxonomy": "all"
                },
                "images": null,
                "details": [
                    {
                        "text": {
                            "id": "WBPaper00056826:hmc_biased",
                            "label": "WBPaper00056826:hmc_biased",
                            "class": "expression_cluster",
                            "taxonomy": "c_elegans"
                        },
                        "evidence": {
                            "Description": "Transcripts that showed significantly lower expression in somatic gonad precursor cells (SGP) vs. head mesodermal cells (hmc).",
      

We output the anatomy ontology in a dataframe which can be used to generate a csv file. If you need more information in the dataframe, just edit the fields in the cells below!
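
The evidence text sits several levels deep in each record, so a missing level would raise an error during extraction. The sketch below is one defensive way to read it. It assumes that `details` can be null (as `images` is in the output above) and that `Description` may be either a string or a list of strings. Both are assumptions about the response rather than documented behaviour. `first_evidence` is a hypothetical helper name, and the second and third sample records are made up for illustration.

```python
def first_evidence(term):
    """Return the first evidence description of an anatomy record, or None."""
    details = term.get('details') or []      # treat a null or missing value as empty
    if not details:
        return None
    evidence = details[0].get('evidence') or {}
    description = evidence.get('Description')
    if isinstance(description, list):        # join multiple descriptions into one string
        return '; '.join(str(item) for item in description)
    return description

sample_terms = [
    # Description copied from the JSON output above
    {'details': [{'evidence': {'Description': 'Transcripts that showed significantly lower expression in somatic gonad precursor cells (SGP) vs. head mesodermal cells (hmc).'}}]},
    # Made-up records covering the assumed edge cases
    {'details': [{'evidence': {'Description': ['first note', 'second note']}}]},
    {'details': None},
]

for term in sample_terms:
    print(first_evidence(term))
```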

In [16]:
#Anatomy Ontology
AO = pd.DataFrame(columns = ['ID', 'Label', 'Evidence'])

In [17]:
anatomy_terms = request.json()['expressed_in']['data']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
AO = pd.DataFrame([{'ID' : term['ontology_term']['id'],
                    'Label' : term['ontology_term']['label'],
                    'Evidence': term['details'][0]['evidence']['Description']}
                   for term in anatomy_terms], columns = ['ID', 'Label', 'Evidence'])

In [18]:
AO

|  | ID | Label | Evidence |
| --- | --- | --- | --- |
| 0 | WBbt:0004697 | head mesodermal cell | Transcripts that showed significantly lower ex... |
| 1 | WBbt:0003679 | neuron | [LacZ activity was shown throughout the nervou... |
| 2 | WBbt:0004985 | SIAVL | [Pharyngeal muscles and HSN stain only with KP... |
| 3 | WBbt:0005017 | RMGL | [Pharyngeal muscles and HSN stain only with KP... |
| 4 | WBbt:0004086 | PVM | [Pharyngeal muscles and HSN stain only with KP... |
| ... | ... | ... | ... |
| 58 | WBbt:0004983 | SIAVR | [Pharyngeal muscles and HSN stain only with KP... |
| 59 | WBbt:0003832 | AVM | [Pharyngeal muscles and HSN stain only with KP... |
| 60 | WBbt:0003681 | pharynx | [LacZ activity was shown throughout the nervou... |
| 61 | WBbt:0006823 | AVK | Transcripts significantly enriched in AVK neur... |


### Disease Ontology

The disease ontology lists the human diseases to which the queried gene is predicted or shown to be related.

In [19]:
#We first generate the request URL for the human disease ontology
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/human_diseases')

In [20]:
#We can obtain the output of the request in the json format
request.raise_for_status() #raises an HTTPError if the request failed
decoded = request.json()
print(json.dumps(decoded, indent = 4))

{
    "class": "gene",
    "name": "WBGene00001648",
    "uri": "rest/field/gene/WBGene00001648/human_diseases",
    "human_diseases": {
        "data": {
            "potential_model": [
                {
                    "ev": {
                        "Inferred_automatically": [
                            "Inferred by orthology to human genes with DO annotation (HGNC:4389)"
                        ]
                    },
                    "id": "DOID:0080450",
                    "label": "developmental and epileptic encephalopathy 17",
                    "class": "do_term",
                    "taxonomy": "all"
                }
            ],
            "gene": [
                "139311"
            ]
        },
        "description": "Diseases related to the gene"
    }
}


We output the human disease ontology in a dataframe which can be used to generate a csv file. If you need more information in the dataframe, just edit the fields in the cells below!
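
The `ev` field is a nested dictionary, which is awkward to read once it lands in a dataframe or csv cell. As a sketch, it could be flattened into plain text before the dataframe is built. `flatten_evidence` is a hypothetical helper name, and the separators are arbitrary choices.

```python
def flatten_evidence(ev):
    """Turn an 'ev' dictionary into one readable string (hypothetical helper)."""
    parts = []
    for method, notes in (ev or {}).items():
        if isinstance(notes, (list, tuple)):
            notes_text = '; '.join(str(note) for note in notes)
        else:
            notes_text = str(notes)
        parts.append(method + ': ' + notes_text)
    return ' | '.join(parts)

# The 'ev' value copied from the JSON output above
sample_ev = {
    'Inferred_automatically': [
        'Inferred by orthology to human genes with DO annotation (HGNC:4389)'
    ]
}

print(flatten_evidence(sample_ev))
```

Using `flatten_evidence(term['ev'])` for the `Inference Method` column would then store text instead of a dictionary.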

In [21]:
#Disease Ontology
DO = pd.DataFrame(columns = ['ID', 'Label', 'Inference Method'])

In [22]:
disease_terms = request.json()['human_diseases']['data']['potential_model']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
DO = pd.DataFrame([{'ID' : term['id'],
                    'Label' : term['label'],
                    'Inference Method': term['ev']}
                   for term in disease_terms], columns = ['ID', 'Label', 'Inference Method'])

In [23]:
DO

|  | ID | Label | Inference Method |
| --- | --- | --- | --- |
| 0 | DOID:0080450 | developmental and epileptic encephalopathy 17 | {'Inferred_automatically': ['Inferred by ortho... |


### Life Stage Ontology

The life stage ontology lists the life stages of the organism during which the queried gene is predicted or shown to be active.

In [24]:
#We first generate the request URL for the life stage ontology
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/fpkm_expression_summary_ls')

In [25]:
#We can obtain the output of the request in the json format
request.raise_for_status() #raises an HTTPError if the request failed
decoded = request.json()
print(json.dumps(decoded, indent = 4))

{
    "class": "gene",
    "name": "WBGene00001648",
    "uri": "rest/field/gene/WBGene00001648/fpkm_expression_summary_ls",
    "fpkm_expression_summary_ls": {
        "data": {
            "controls": [
                {
                    "control mean": {
                        "text": 208.9216,
                        "evidence": {
                            "comment": [
                                "This is the mean value of the N=63 RNASeq FPKM expression values for this gene in this life-stage (Mixed stages) of this species of all samples in the SRA that are wildtype controls that do not appear to have undergone any experimental conditions or treatment that would affect gene expression."
                            ]
                        }
                    },
                    "life_stage": {
                        "id": "WBls:0000002",
                        "label": "all stages Ce",
                        "class": "life_stage",
                        "taxono

We output the life stage ontology in a dataframe which can be used to generate a csv file. If you need more information in the dataframe, just edit the fields in the cells below!
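
As an example of adding a field, the mean FPKM value shown under `control mean` in the JSON above could be kept next to each life stage. This is a minimal sketch: `life_stage_rows` is a hypothetical helper, the column name `Control mean FPKM` is our own choice, and `.get` is used because we have not confirmed that every entry carries a `control mean`.

```python
import pandas as pd

def life_stage_rows(controls):
    """Collect ID, label and control mean for each life stage entry."""
    rows = []
    for entry in controls:
        stage = entry.get('life_stage') or {}
        mean = (entry.get('control mean') or {}).get('text')  # None when absent
        rows.append({'ID': stage.get('id'),
                     'Label': stage.get('label'),
                     'Control mean FPKM': mean})
    return rows

# First entry copied from the JSON output above (evidence comment omitted)
sample_controls = [
    {'control mean': {'text': 208.9216},
     'life_stage': {'id': 'WBls:0000002', 'label': 'all stages Ce'}},
]

LSO_example = pd.DataFrame(life_stage_rows(sample_controls),
                           columns=['ID', 'Label', 'Control mean FPKM'])
print(LSO_example)
```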

In [26]:
#Life Stage Ontology
LSO = pd.DataFrame(columns = ['ID', 'Label'])

In [27]:
life_stages = request.json()['fpkm_expression_summary_ls']['data']['controls']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
LSO = pd.DataFrame([{'ID' : entry['life_stage']['id'],
                     'Label' : entry['life_stage']['label']}
                    for entry in life_stages], columns = ['ID', 'Label'])

In [28]:
LSO

|  | ID | Label |
| --- | --- | --- |
| 0 | WBls:0000002 | all stages Ce |
| 1 | WBls:0000027 | L2 larva Ce |
| 2 | WBls:0000041 | adult Ce |
| 3 | WBls:0000035 | L3 larva Ce |
| 4 | WBls:0000003 | embryo Ce |
| 5 | WBls:0000032 | dauer larva Ce |
| 6 | WBls:0000038 | L4 larva Ce |
| 7 | WBls:0000024 | L1 larva Ce |
| 8 | total_over_all_stages | total over all stages |


### Phenotype Ontology

The phenotype ontology lists the phenotypes that the queried gene is predicted or shown to be responsible for.

In [29]:
#We first generate the request URL for the phenotype ontology
request = requests.get('http://rest.wormbase.org/rest/field/gene/'+gene+'/phenotype')

In [30]:
#We can obtain the output of the request in the json format
request.raise_for_status() #raises an HTTPError if the request failed
decoded = request.json()
print(json.dumps(decoded, indent = 4))

{
    "class": "gene",
    "name": "WBGene00001648",
    "uri": "rest/field/gene/WBGene00001648/phenotype",
    "phenotype": {
        "data": [
            {
                "entity": null,
                "phenotype": {
                    "id": "WBPhenotype:0000062",
                    "label": "lethal",
                    "class": "phenotype",
                    "taxonomy": "all"
                },
                "evidence": {
                    "Allele": {
                        "text": {
                            "id": "WBVar00089556",
                            "label": "n499",
                            "class": "variation",
                            "taxonomy": "c_elegans"
                        },
                        "evidence": {
                            "Person_evidence": [
                                {
                                    "class": "person",
                                    "id": "WBPerson261",
                                    "

We output the phenotype ontology in a dataframe which can be used to generate a csv file. If you need more information in the dataframe, just edit the fields in the cells below!

In [31]:
#Phenotype Ontology
PO = pd.DataFrame(columns = ['ID', 'Label', 'Taxonomy'])

In [32]:
phenotype = request.json()['phenotype']['data']
#DataFrame.append was removed in pandas 2.0, so we collect the rows first and build the dataframe once
PO = pd.DataFrame([{'ID' : entry['phenotype']['id'],
                    'Label' : entry['phenotype']['label'],
                    'Taxonomy': entry['phenotype']['taxonomy']}
                   for entry in phenotype], columns = ['ID', 'Label', 'Taxonomy'])

In [33]:
PO

|  | ID | Label | Taxonomy |
| --- | --- | --- | --- |
| 0 | WBPhenotype:0000062 | lethal | all |
| 1 | WBPhenotype:0002292 | body posture amplitude decreased | all |
| 2 | WBPhenotype:0001462 | sodium chloride chemotaxis variant | all |
| 3 | WBPhenotype:0002345 | forward locomotion variant | all |
| 4 | WBPhenotype:0002295 | body posture wavelength decreased | all |
| ... | ... | ... | ... |
| 111 | WBPhenotype:0000154 | reduced brood size | all |
| 112 | WBPhenotype:0002074 | centrosome dynamics variant | all |
| 113 | WBPhenotype:0002074 | centrosome dynamics variant | all |
| 114 | WBPhenotype:0001107 | spindle rotation defective early emb | all |
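
The output contains the same phenotype more than once (rows 112 and 113 both show WBPhenotype:0002074), presumably because a phenotype can be supported by more than one piece of evidence. That explanation is our assumption. If only the distinct terms are needed, the repeats can be dropped as in this sketch, which uses a small hand-copied excerpt of the table.

```python
import pandas as pd

# Hand-copied excerpt of the PO output above (rows 111 to 113)
PO_example = pd.DataFrame({
    'ID': ['WBPhenotype:0000154', 'WBPhenotype:0002074', 'WBPhenotype:0002074'],
    'Label': ['reduced brood size', 'centrosome dynamics variant',
              'centrosome dynamics variant'],
    'Taxonomy': ['all', 'all', 'all'],
})

# Keep the first occurrence of each phenotype ID and renumber the rows
PO_unique = PO_example.drop_duplicates(subset='ID').reset_index(drop=True)
print(len(PO_example), 'rows before,', len(PO_unique), 'rows after')
```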


This is the end of the sixth tutorial for WormBase data analysis! This tutorial dealt with obtaining several kinds of ontology information.

In the next tutorial, we will perform enrichment analyses on WormBase data!