# MUC16 acetylproteomics cis comparison

## Step 1: Library Imports

Run this cell to import the necessary libraries.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats
import re
import sys 
sys.path.append('C:\\Users\\brittany henderson\\GitHub\\WhenMutationsMatter\\Brittany\\')
import functions as f

import cptac
import cptac.algorithms as al

## Step 2: Find the frequently mutated genes for Endometrial Cancer

Enter the type of cancer and the cutoff for mutation frequency that you would like to use.

In [2]:
en_object = cptac.Endometrial()
desired_cutoff = .1

endometrial_freq_mut = al.get_frequently_mutated(en_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(endometrial_freq_mut), '\n')
endometrial_freq_mut.loc[endometrial_freq_mut['Gene'] == 'MUC16']

                                    

Number of Frequently Mutated Genes: 232 



|     | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
| --- | --- | --- | --- | --- |
| 134 | MUC16 | 0.189474 | 0.178947 | 0.052632 |
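To make the cutoff concrete, here is a minimal sketch of how a per-gene mutation frequency could be computed and filtered. It is not the `al.get_frequently_mutated` implementation: the table layout, the column names `Sample` and `Gene`, the helper `mutation_frequency`, and the choice of `>=` for the cutoff are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format mutation table: one row per mutation call.
mutations = pd.DataFrame({
    "Sample": ["S1", "S1", "S2", "S3", "S3", "S4"],
    "Gene":   ["MUC16", "TP53", "MUC16", "TP53", "TP53", "PTEN"],
})
# All tumor samples, including one (S5) with no mutation calls.
all_samples = ["S1", "S2", "S3", "S4", "S5"]


def mutation_frequency(mutations, n_samples, cutoff):
    """Return genes mutated in at least `cutoff` fraction of samples (assumed rule)."""
    # nunique counts each sample once per gene, even with several mutations.
    freq = mutations.groupby("Gene")["Sample"].nunique() / n_samples
    return freq[freq >= cutoff].sort_values(ascending=False)


freq_mut = mutation_frequency(mutations, len(all_samples), cutoff=0.3)
print(freq_mut)
```

In this toy table, MUC16 and TP53 are each mutated in 2 of 5 samples (0.4) and pass a cutoff of 0.3, while PTEN (0.2) does not.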


## Step 3: Select MUC16, a frequently mutated gene

In [3]:
gene = 'MUC16'

## Step 4: Select acetylproteomics

In [4]:
omics = en_object.get_acetylproteomics()

## Step 5: cis comparison 

Determine whether the DNA mutation affects the omics measurement. This takes four steps in code; the first three are handled by the format_phospho_cis_comparison_data function.
1. Get a table with both the omics and mutation data for tumors.
2. Derive a binary column from the mutation data to separate the samples into mutated and wildtype.
3. Format the data frame for use in the T-test.
4. Send the data to the T-test.
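The four steps above can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the code in `functions` or `cptac.algorithms`: the column names `GENE_omics` and `GENE_Mutation`, the label `Wildtype_Tumor`, and the random values are all hypothetical, and a plain `scipy.stats.ttest_ind` stands in for `wrap_ttest`.

```python
import numpy as np
import pandas as pd
import scipy.stats

rng = np.random.default_rng(0)

# Step 1: hypothetical joined table, one row per tumor sample.
joined = pd.DataFrame({
    "GENE_omics": rng.normal(0.0, 1.0, size=20),
    "GENE_Mutation": ["Missense_Mutation"] * 8 + ["Wildtype_Tumor"] * 12,
})

# Step 2: binary column separating mutated from wildtype samples.
joined["binary_mutations"] = np.where(
    joined["GENE_Mutation"] == "Wildtype_Tumor", "Wildtype", "Mutated"
)

# Step 3: keep only the columns the test needs and drop missing values.
ttest_df = joined[["GENE_omics", "binary_mutations"]].dropna()

# Step 4: two-sample T-test between the two groups.
mutated = ttest_df.loc[ttest_df["binary_mutations"] == "Mutated", "GENE_omics"]
wildtype = ttest_df.loc[ttest_df["binary_mutations"] == "Wildtype", "GENE_omics"]
stat, pval = scipy.stats.ttest_ind(mutated, wildtype)
print("t statistic:", stat, "p-value:", pval)
```

With real data, the omics column would hold the measured values for the gene, and any multiple-testing correction would be applied on top of the raw p-values.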

In [6]:
# Create dataframe in order to do comparisons with wrap_ttest
omics_and_mut = en_object.join_omics_to_mutations(
    mutations_genes = gene, omics_df_name = 'acetylproteomics', omics_genes = gene)



No data for MUC16 acetylproteomics in the endometrial dataset. Not possible to do cis comparison.

# Repeat with the Colon dataset

Go through the same process, this time using the Colon dataset. For simplicity, we display only the MUC16 row of the frequently mutated data frame.

In [7]:
colon_object = cptac.Colon()
desired_cutoff = .1

colon_freq_mut = al.get_frequently_mutated(colon_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(colon_freq_mut), '\n')
colon_freq_mut.loc[colon_freq_mut['Gene'] == 'MUC16']

                                    

Number of Frequently Mutated Genes: 612 



|     | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
| --- | --- | --- | --- | --- |
| 321 | MUC16 | 0.402062 | 0.360825 | 0.072165 |


In [8]:
gene = 'MUC16'

In [15]:
# This cell raises an error because the colon dataset has no acetylproteomics dataframe
#omics_mutations = colon_object.join_omics_to_mutations(
#        mutations_genes = gene, omics_df_name = 'acetylproteomics', omics_genes = gene)

The acetylproteomics dataframe is not included in the colon dataset, so the cis comparison is not possible.

# Repeat with the Ovarian dataset

In [10]:
ov_object = cptac.Ovarian()
desired_cutoff = .1

ov_freq_mut = al.get_frequently_mutated(ov_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(ov_freq_mut), '\n')
ov_freq_mut.loc[ov_freq_mut['Gene'] == 'MUC16']

                                    

Number of Frequently Mutated Genes: 16 



|     | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
| --- | --- | --- | --- | --- |
| 6 | MUC16 | 0.144578 | 0.144578 | 0.012048 |


In [11]:
gene = 'MUC16'

ov_omics = ov_object.get_phosphoproteomics()

In [14]:
# Join on the ovarian object (the original cell used colon_object here)
omics_mutations = ov_object.join_omics_to_mutations(
        mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)



No data for MUC16 acetylproteomics in the ovarian dataset. Not possible to do cis comparison.