# Use Case 5: Gene set enrichment analysis

<b>Import the standard data analysis libraries, along with gseapy, which lets us perform a gene set enrichment analysis</b>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import gseapy as gp

<b>Import the CPTAC data</b>

In [2]:
import CPTAC

Loading Clinical Data...
Loading Proteomics Data...
Loading Transcriptomics Data...
Loading CNA Data...
Loading Phosphoproteomics Data...
Loading Somatic Data...

<b>Retrieve the clinical and proteomics dataframes</b>

In [3]:
clinical = CPTAC.get_clinical()
proteomics = CPTAC.get_proteomics()

<b>For this example we will separate protein abundance by clinical MSI (microsatellite instability) status. The first step is to combine the MSI information with the proteomics dataframe using the <code>CPTAC.compare_clinical()</code> function</b>

In [4]:
msiProt = CPTAC.compare_clinical(clinical, proteomics, 'MSI')
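To make the merge step concrete, here is a minimal pandas sketch on made-up data. It assumes that the combined dataframe holds the chosen clinical column alongside the protein columns, aligned on a shared sample index. That is an assumption about the result's shape, not a description of how <code>CPTAC.compare_clinical()</code> is implemented. The sample IDs, values, and the <code>toy_</code> names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the clinical and proteomics dataframes (values are made up).
toy_clinical = pd.DataFrame(
    {"MSI": ["MSI-H", "MSS", "MSI-H"], "Age": [61, 70, 55]},
    index=["S001", "S002", "S003"],
)
toy_proteomics = pd.DataFrame(
    {"A1BG": [0.2, -0.1, 0.4], "A2M": [1.1, 0.3, -0.5]},
    index=["S001", "S002", "S003"],
)

# Assumed equivalent of the merge: keep one clinical column and align it
# with the protein columns on the shared sample index.
toy_merged = toy_clinical[["MSI"]].join(toy_proteomics, how="inner")
print(toy_merged)
```

With both kinds of data in one frame, a clinical attribute such as MSI can be used directly as a row filter on the protein values.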

<b>Separate the proteomics data into two groups: samples whose MSI status is MSI-H, and all others</b>

In [5]:
high = msiProt['MSI'] == "MSI-H"
other = msiProt['MSI'] != "MSI-H"
highProt = msiProt[high]
otherProt = msiProt[other]
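The next step calls for genes that are up-regulated in one partition, but the notebook does not yet define how to select them. The sketch below shows one simple possibility on made-up data: rank genes by the difference in mean abundance between the two groups and keep those above a threshold. The threshold of 1.0, the gene names, and the <code>toy_</code> variables are all hypothetical. A mean difference is a crude stand-in for a proper statistical test with multiple-testing correction, and it is not presented here as the authors' method.

```python
import pandas as pd

# Made-up abundance values for eight samples; the last sample has no MSI call.
toy = pd.DataFrame(
    {
        "MSI": ["MSI-H"] * 4 + ["MSS"] * 3 + [None],
        "GENE_A": [0.1, -0.2, 0.0, 0.1, 0.0, 0.1, -0.1, 0.2],
        "GENE_B": [2.1, 1.9, 2.0, 2.2, 0.1, -0.1, 0.0, 0.2],
        "GENE_C": [-1.0, -1.2, -0.9, -1.1, 0.0, 0.1, 0.2, -0.1],
    },
    index=[f"S{i:03d}" for i in range(1, 9)],
)

# Same partition as above. Note that a sample with a missing MSI value
# is not equal to "MSI-H", so it lands in the "other" group.
is_high = toy["MSI"] == "MSI-H"
toy_high = toy[is_high].drop(columns="MSI")
toy_other = toy[~is_high].drop(columns="MSI")

# Difference in per-gene mean abundance between the two groups.
mean_diff = toy_high.mean() - toy_other.mean()

# Hypothetical cutoff: call a gene "up-regulated" in MSI-H if the
# difference exceeds 1.0, and list the largest differences first.
toy_up = mean_diff[mean_diff > 1.0].sort_values(ascending=False).index.tolist()
print(toy_up)
```

A list built this way could be passed to <code>gp.enrichr()</code> in place of a placeholder gene list. If samples with a missing MSI value should not count as "other", drop them before partitioning.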

<b>Then use the genes that are up-regulated in these partitions to run a gene set enrichment analysis. The cell below uses the first 10 columns as a placeholder gene list</b>

In [6]:
# Placeholder: the first 10 columns stand in for the up-regulated genes
# until a real selection step is added.
gene_list = highProt.columns[:10].tolist()
# Query Enrichr against the KEGG 2016 library; reports are written to outdir.
enr = gp.enrichr(gene_list=gene_list, description='MSI partitions', gene_sets='KEGG_2016', outdir='test/enrichr_kegg', cutoff=0.5)
# Display the enrichment results as a dataframe
enr.res2d

| | Term | Overlap | P-value | Adjusted P-value | Old P-value | Old Adjusted P-value | Z-score | Combined Score | Genes | Gene_set |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Glycosphingolipid biosynthesis - globo series_... | 1/14 | 0.00698 | 0.042347 | 0.010649 | 0.062034 | -1.581753 | 7.853041 | A4GALT | KEGG_2016 |
| 1 | Tryptophan metabolism_Homo sapiens_hsa00380 | 1/40 | 0.019825 | 0.042839 | 0.028892 | 0.062034 | -1.848648 | 7.248167 | AADAT | KEGG_2016 |
| 2 | Valine, leucine and isoleucine degradation_Hom... | 1/48 | 0.023748 | 0.042839 | 0.03445 | 0.062034 | -1.82371 | 6.821165 | AACS | KEGG_2016 |
| 3 | Butanoate metabolism_Homo sapiens_hsa00650 | 1/28 | 0.013915 | 0.042839 | 0.020506 | 0.062034 | -1.582637 | 6.765411 | AACS | KEGG_2016 |
| 4 | 2-Oxocarboxylic acid metabolism_Homo sapiens_h... | 1/17 | 0.008469 | 0.042347 | 0.012768 | 0.062034 | -1.369591 | 6.534713 | AADAT | KEGG_2016 |
| 5 | Lysine degradation_Homo sapiens_hsa00310 | 1/52 | 0.025704 | 0.042839 | 0.03722 | 0.062034 | -1.724271 | 6.31277 | AADAT | KEGG_2016 |
| 6 | Biosynthesis of amino acids_Homo sapiens_hsa01230 | 1/74 | 0.036398 | 0.048517 | 0.052341 | 0.069688 | -1.578342 | 5.229428 | AADAT | KEGG_2016 |
| 7 | Complement and coagulation cascades_Homo sapie... | 1/79 | 0.038814 | 0.048517 | 0.05575 | 0.069688 | -1.595878 | 5.184974 | A2M | KEGG_2016 |
| 8 | RNA transport_Homo sapiens_hsa03013 | 1/172 | 0.082765 | 0.091961 | 0.117405 | 0.13045 | -1.673121 | 4.169001 | AAAS | KEGG_2016 |
| 9 | Metabolic pathways_Homo sapiens_hsa01100 | 2/1239 | 0.12402 | 0.12402 | 0.216188 | 0.216188 | -1.788092 | 3.732308 | AADAT;A4GALT | KEGG_2016 |
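To help read the Overlap and P-value columns, the sketch below computes a one-sided hypergeometric (Fisher exact) tail probability: the chance of seeing at least the observed overlap when a gene list of a given size is drawn at random from a background. The background size of 20,000 genes and the list size of 10 are assumptions made for illustration. Under them, the function gives values close to the first two P-values in the table, but this does not confirm the exact computation Enrichr performs. The function name <code>hypergeom_tail</code> is hypothetical.

```python
from math import comb

def hypergeom_tail(overlap, set_size, list_size, background):
    """P(X >= overlap) when list_size genes are drawn at random from
    background genes, of which set_size belong to the gene set."""
    total = comb(background, list_size)
    tail = 0
    for k in range(overlap, min(set_size, list_size) + 1):
        tail += comb(set_size, k) * comb(background - set_size, list_size - k)
    return tail / total

BACKGROUND = 20000  # assumed background size, not taken from the source
LIST_SIZE = 10      # length of the placeholder gene list

# Overlap 1/14 (Glycosphingolipid biosynthesis row)
p_globo = hypergeom_tail(1, 14, LIST_SIZE, BACKGROUND)
# Overlap 1/40 (Tryptophan metabolism row)
p_trp = hypergeom_tail(1, 40, LIST_SIZE, BACKGROUND)

print(round(p_globo, 5), round(p_trp, 6))
```

Because a single hit in a small gene set already yields a small P-value, results from a 10-gene placeholder list are best read as a demonstration of the workflow rather than as biological findings.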
