# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 7 - Enrichment Analyses**
Welcome to the seventh Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code to retrieve data available on the WormBase sites and perform simple analyses with it.

This tutorial covers three kinds of enrichment analysis (tissue, gene ontology, and phenotype) for a given input gene list.
Let's get started!

For this tutorial we use the WormBase Tissue Enrichment Analysis (TEA) pip package, which we need to install and then import. We also import the other Python packages we need.

In [None]:
!pip install tissue_enrichment_analysis

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tissue_enrichment_analysis as tea
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}

Read a CSV file of gene names into a dataframe for the analyses that follow.

The CSV file must have a header line followed by one gene per line. The code in this notebook reads the column as `genes.gene_name`, so the header must be `gene_name`.

In [None]:
genes = pd.read_csv('data/tea.csv')
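The expected file layout can be sketched with a small in-memory example. This is a minimal sketch using only the standard library so that it runs without `data/tea.csv`; the helper name `read_gene_column` is hypothetical, and the gene IDs are placeholder WormBase-style IDs rather than the tutorial's data (whether TEA requires IDs of this form is an assumption here).

```python
import csv
import io

# Placeholder file contents: a header line, then one gene per line.
# These IDs are illustrative only, not the contents of data/tea.csv.
example_csv = "gene_name\nWBGene00000001\nWBGene00000002\nWBGene00000003\n"

def read_gene_column(text, column="gene_name"):
    """Return the non-empty values of `column`, raising if the header is missing."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"expected a header named {column!r}, got {reader.fieldnames}")
    values = []
    for row in reader:
        value = (row[column] or "").strip()
        if value:
            values.append(value)
    return values

gene_ids = read_gene_column(example_csv)
print(gene_ids)
```

A file without the `gene_name` header would fail this check early, rather than later when `genes.gene_name` is accessed.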

To analyse enrichment, we use the dictionaries that WormBase maintains and updates regularly. There are three dictionaries, one for each type of enrichment analysis.

In [None]:
#Tissue Enrichment
Tissue_Enrichment = tea.fetch_dictionary('tissue')

#Phenotype Enrichment
Phenotype_Enrichment = tea.fetch_dictionary('phenotype')

#Gene Ontology Enrichment
GO_Enrichment = tea.fetch_dictionary('go')

Now we analyse the gene list against the WormBase TEA dictionaries.

In [None]:
#Create a python dictionary to store the TEA dictionaries.
dictionaries = {'Tissue_Enrichment': Tissue_Enrichment, 'Phenotype_Enrichment': Phenotype_Enrichment, 'GO_Enrichment': GO_Enrichment}

In [None]:
#Test the list of genes against each dictionary and store the results.
#Set the alpha value to extract only the statistically significant results.
cutoff = 0.01

enrichments = {}
for analysis, dictionary in dictionaries.items():
    enrichments[analysis] = tea.enrichment_analysis(genes.gene_name, dictionary, show=False, alpha=cutoff)
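The notebook does not spell out the statistics behind `tea.enrichment_analysis`. To give a feel for what an `alpha` cutoff on Q-values means, the sketch below assumes a common approach for this kind of analysis: a hypergeometric test per term followed by Benjamini-Hochberg correction. The function names and all the numbers are made up for illustration, and TEA's actual implementation may differ.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n carry the term."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted Q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the Q-values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues

# Toy numbers (invented): 1000 annotated genes and a 20-gene input list.
background = 1000
list_size = 20
# term: (genes annotated with the term, genes from the list with the term)
terms = {"term_A": (50, 8), "term_B": (200, 5), "term_C": (10, 0)}

pvals = [hypergeom_sf(obs, background, annotated, list_size)
         for annotated, obs in terms.values()]
qvals = benjamini_hochberg(pvals)

alpha = 0.01
for name, p, q in zip(terms, pvals, qvals):
    label = "significant" if q < alpha else "not significant"
    print(name, f"p={p:.3g}", f"q={q:.3g}", label)
```

In this toy setup only `term_A`, which appears far more often in the list than its background frequency would predict, passes the cutoff.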

We have performed the enrichment analysis! Now we can obtain the results in the form of dataframes!

In [None]:
#Print a short summary and display each result dataframe.
for enrichment, result in enrichments.items():
    print(f'This is the {enrichment} result dataframe; it has {len(result)} entries')
    display(result)

It is also possible to visualize these enrichments in the form of plots that show the top n hits based on -log10 q values.

In [None]:
#how many top hits do you want?
n = 10

for enrichment in ['GO_Enrichment', 'Phenotype_Enrichment', 'Tissue_Enrichment']:
    #Work on a copy so the stored result dataframe is not modified.
    data_for_graph = enrichments[enrichment].copy()
    data_for_graph['minus_log10_Qvalue'] = -np.log10(data_for_graph['Q value'])

    #Keep the first n rows and reverse them so the first hit is drawn at the top of the chart.
    data_for_graph = data_for_graph.head(n).iloc[::-1]
    data_for_graph.plot(x='Term', y='minus_log10_Qvalue', kind="barh", legend=False, width=0.8, color='red',
                        figsize=(10, 5))
    #Label each bar with its -log10 Q-value and term.
    for i, (term, qval) in enumerate(zip(data_for_graph['Term'], data_for_graph["minus_log10_Qvalue"].round(1).astype(str))):
        plt.text(s=qval + '    ' + term, x=0, y=i, color="k", horizontalalignment='left', verticalalignment="center", size=10)
    #Style and show the figure once per analysis, titled with the current analysis name.
    plt.axis("off")
    plt.title(f'{enrichment}: -log10 Q-values of the top {n} hits', fontsize=14)
    plt.tight_layout()
    plt.show()
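The plots use -log10 of the Q-value so that smaller (more significant) Q-values give longer bars. A minimal sketch of that transformation, with invented Q-values and term names used purely for illustration:

```python
import math

# Invented Q-values for illustration only.
q_values = {"term_A": 1e-8, "term_B": 0.001, "term_C": 0.01}

# Smaller Q-values map to larger scores: 0.01 -> 2, 0.001 -> 3, and so on.
scores = {term: -math.log10(q) for term, q in q_values.items()}

# Rank terms from most to least significant.
top = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for term, score in top:
    print(f"{score:.1f}    {term}")
```

With the cutoff of 0.01 used earlier, every reported term would have a score above 2 on this scale.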

This is the end of the seventh tutorial for WormBase data analysis! This tutorial covered enrichment analyses for a user-provided gene list using WormBase data, along with figures that make the results easier to interpret.

Thanks to this tutorial for the help: https://colab.research.google.com/github/Munfred/worm-tutorials/blob/main/tissue_enrichment_analysis.ipynb

In the next tutorial, we will perform literature-based analyses on WormBase data!