# MUC5B phosphoproteomics cis comparison

## Step 1: Library Imports

Run this cell to import the necessary libraries.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats
import re
import sys 
sys.path.append('C:\\Users\\brittany henderson\\GitHub\\WhenMutationsMatter\\Brittany\\')
import functions as f

import cptac
import cptac.algorithms as al

## Step 2: Find the frequently mutated genes for Endometrial Cancer

Enter the type of cancer and the cutoff for mutation frequency that you would like to use.

In [2]:
en_object = cptac.Endometrial()
desired_cutoff = .1

endometrial_freq_mut = al.get_frequently_mutated(en_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(endometrial_freq_mut), '\n')
endometrial_freq_mut.loc[endometrial_freq_mut['Gene'] == 'MUC5B']

                                    

Number of Frequently Mutated Genes: 232 



| | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
|---|---|---|---|---|
| 136 | MUC5B | 0.147368 | 0.136842 | 0.031579 |
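
To make the cutoff concrete, here is a minimal sketch of one way a mutation-frequency filter can work. It is not the cptac implementation: the function name `frequently_mutated`, the toy sample IDs, and the choice of `>=` for the comparison are all assumptions made for illustration.

```python
from collections import defaultdict

def frequently_mutated(mutations, n_samples, cutoff=0.1):
    """Return {gene: fraction of samples mutated} for genes at or above cutoff.

    `mutations` is an iterable of (sample_id, gene) pairs. A sample counts
    once per gene, no matter how many mutations it carries in that gene.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in mutations:
        samples_by_gene[gene].add(sample_id)
    fractions = {g: len(s) / n_samples for g, s in samples_by_gene.items()}
    # Assumption: genes exactly at the cutoff are kept (>=).
    return {g: frac for g, frac in fractions.items() if frac >= cutoff}

# Toy cohort of 10 samples with invented sample IDs.
toy_mutations = [
    ("S1", "MUC5B"), ("S1", "MUC5B"),  # two hits in one sample count once
    ("S2", "MUC5B"),
    ("S3", "TP53"),
]
result = frequently_mutated(toy_mutations, n_samples=10, cutoff=0.15)
print(result)
```

In this toy cohort MUC5B is mutated in 2 of 10 samples and passes the 0.15 cutoff, while TP53 (1 of 10) does not.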


## Step 3: Select MUC5B, a frequently mutated gene

In [3]:
gene = 'MUC5B'

## Step 4: Select phosphoproteomics

In [4]:
omics = en_object.get_phosphoproteomics()

## Step 5: cis comparison 

Determine whether the DNA mutation affects the omics measurement. This takes four steps in code. The first three steps are also packaged in the format_phospho_cis_comparison_data function; they are written out in full below.
1. Get a table with both the omics and mutation data for tumors.
2. Derive a binary column from the mutation data to split the samples into two groups.
3. Format the data frame for use in the t-test.
4. Send the data to the t-test.

In [5]:
# Step 1 - Create dataframe in order to do comparisons with wrap_ttest
omics_and_mut = en_object.join_omics_to_mutations(
    mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)

# Step 2 - Create the binary column needed to do the comparison
omics_and_mut['binary_mutations'] = omics_and_mut[gene+'_Mutation_Status'].apply(
    lambda x: 'Wildtype' if x == 'Wildtype_Tumor' else 'Mutated')

# Step 3 - Format
tumors = omics_and_mut.loc[omics_and_mut['Sample_Status'] == 'Tumor'] #drop Normal samples
columns_to_drop = [gene+"_Mutation", gene+"_Location", gene+"_Mutation_Status", "Sample_Status"]
binary_phospho = tumors.drop(columns_to_drop, axis = 1)
only_phospho = binary_phospho.drop('binary_mutations', axis = 1)
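
The following toy example shows why the tumor filter in Step 3 matters. The column names mirror the cell above, but the sample IDs and the status values other than 'Wildtype_Tumor' are assumptions for illustration, not values read from the dataset.

```python
import pandas as pd

# Toy stand-in for the joined table (invented samples and statuses).
toy = pd.DataFrame(
    {
        "MUC5B_Mutation_Status": ["Wildtype_Tumor", "Single_mutation",
                                  "Multiple_mutation", "Wildtype_Normal"],
        "Sample_Status": ["Tumor", "Tumor", "Tumor", "Normal"],
    },
    index=["S1", "S2", "S3", "S4"],
)

# Same rule as above: anything other than 'Wildtype_Tumor' becomes 'Mutated'.
toy["binary_mutations"] = toy["MUC5B_Mutation_Status"].apply(
    lambda x: "Wildtype" if x == "Wildtype_Tumor" else "Mutated")

# The normal sample S4 is labeled 'Mutated' by that rule, so normal
# samples have to be dropped before the two groups are compared.
toy_tumors = toy.loc[toy["Sample_Status"] == "Tumor"]
print(toy_tumors["binary_mutations"].value_counts())
```

Because the rule only recognizes one wildtype label, any other status falls into the 'Mutated' group; filtering on Sample_Status keeps the comparison restricted to tumors.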

In [6]:
# Step 4 - T-test comparing means of mutated vs wildtype effect on cis omics
print("Doing t-test comparison for mutation status\n")
omics_col_list = list(only_phospho.columns) 
sig_pval_mut_status = al.wrap_ttest(binary_phospho, 'binary_mutations', omics_col_list)
print(sig_pval_mut_status)

Doing t-test comparison for mutation status

No significant comparisons.
None
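
The internals of wrap_ttest are not shown in this notebook. As a rough sketch of what a per-column comparison can look like, the example below runs a two-sample t-test on each column and applies a Bonferroni-adjusted threshold. The function name `ttest_per_column`, the Bonferroni choice, the column names, and the values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats

def ttest_per_column(df, label_col, value_cols, alpha=0.05):
    """Two-sample t-test per column; keep p-values below alpha / n_tests."""
    threshold = alpha / len(value_cols)  # Bonferroni correction (assumed)
    mutated = df[df[label_col] == "Mutated"]
    wildtype = df[df[label_col] == "Wildtype"]
    significant = {}
    for col in value_cols:
        # nan_policy="omit" skips samples with a missing measurement.
        _, pval = scipy.stats.ttest_ind(
            mutated[col], wildtype[col], nan_policy="omit")
        if pval < threshold:
            significant[col] = float(pval)
    return significant

# Invented data: site_A differs clearly between groups, site_B does not.
toy = pd.DataFrame({
    "binary_mutations": ["Mutated"] * 4 + ["Wildtype"] * 4,
    "site_A": [5.0, 5.1, 4.9, 5.2, 1.0, 1.1, 0.9, 1.2],
    "site_B": [1.0, 2.0, 3.0, np.nan, 1.5, 2.5, 2.0, 3.0],
})
significant = ttest_per_column(toy, "binary_mutations", ["site_A", "site_B"])
print(significant)
```

With two columns tested, the per-test threshold here is 0.025. An empty result from this sketch would correspond to the "No significant comparisons." message above.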




Repeat the same steps to compare mutation types (missense vs. truncation). Use the function get_missence_truncation_comparison to build the mutation-type binary column and format the dataframe.

In [8]:
# Steps 1-3
formated_phospho_mut_type = f.get_missence_truncation_comparison(en_object, 'phosphoproteomics', gene)

In [9]:
# Step 4 - T-test comparing means of missense vs truncation effect on cis omics
print("Doing t-test comparison\n")
sig_pval_mut_type = al.wrap_ttest(formated_phospho_mut_type, 'binary_mutations', omics_col_list)
print(sig_pval_mut_type)

Doing t-test comparison

No significant comparisons.
None
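
The helper get_missence_truncation_comparison is defined outside this notebook, so its grouping rules are not visible here. The sketch below shows one plausible way to turn mutation types into a missense/truncation label. The type names are standard MAF variant classifications, but which types fall into each group and the rule that truncation takes precedence are assumptions, and `classify_mutations` is a hypothetical name.

```python
# Assumed grouping; the real helper may group mutation types differently.
MISSENSE_TYPES = {"Missense_Mutation"}
TRUNCATION_TYPES = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins"}

def classify_mutations(mutation_types):
    """Label a sample from the list of mutation types found in one gene.

    Returns 'Truncation' if any truncating mutation is present, 'Missense'
    if only missense mutations are present, and None otherwise.
    """
    found = set(mutation_types)
    if found & TRUNCATION_TYPES:
        return "Truncation"  # assumed precedence over missense
    if found & MISSENSE_TYPES:
        return "Missense"
    return None

print(classify_mutations(["Missense_Mutation"]))
print(classify_mutations(["Missense_Mutation", "Nonsense_Mutation"]))
print(classify_mutations(["Silent"]))
```

Samples that return None carry neither type and would be left out of a missense vs. truncation comparison.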


# Repeat with the Colon dataset

Go through the same process, this time using the Colon dataset.

In [10]:
colon_object = cptac.Colon()
desired_cutoff = .1

colon_freq_mut = al.get_frequently_mutated(colon_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(colon_freq_mut), '\n')
colon_freq_mut.loc[colon_freq_mut['Gene'] == 'MUC5B']

                                    

Number of Frequently Mutated Genes: 612 



| | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
|---|---|---|---|---|
| 326 | MUC5B | 0.278351 | 0.257732 | 0.051546 |


In [12]:
gene = 'MUC5B'

co_omics = colon_object.get_phosphoproteomics()

In [13]:
omics_mutations = colon_object.join_omics_to_mutations(
        mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)



The colon dataset has no MUC5B phosphoproteomics data, so a cis comparison is not possible.
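
One way to check for missing data before joining is to look for phosphosite columns that belong to the gene. The sketch below assumes that phosphosite columns are named with the gene symbol followed by a hyphen and the site (for example 'TP53-S15'); that naming scheme, the toy column list, and the function name `sites_for_gene` are assumptions for illustration.

```python
def sites_for_gene(columns, gene):
    """Return the phosphosite column names that belong to `gene`.

    Matching on gene + '-' avoids picking up other genes whose symbol
    merely starts with the same letters.
    """
    prefix = gene + "-"
    return [col for col in columns if col.startswith(prefix)]

# Invented column names standing in for a phosphoproteomics table.
toy_columns = ["MUC5AC-S10", "TP53-S15", "TP53-S392"]

muc5b_sites = sites_for_gene(toy_columns, "MUC5B")
tp53_sites = sites_for_gene(toy_columns, "TP53")
print(muc5b_sites, tp53_sites)
```

An empty list for the gene of interest means there is nothing to compare, which is the situation described above for MUC5B.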

# Repeat with the Ovarian dataset

In [15]:
ov_object = cptac.Ovarian()
desired_cutoff = .1

ov_freq_mut = al.get_frequently_mutated(ov_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(ov_freq_mut), '\n')
ov_freq_mut.loc[ov_freq_mut['Gene'] == 'MUC5B']

                                    

Number of Frequently Mutated Genes: 16 



| | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
|---|---|---|---|---|
| 9 | MUC5B | 0.108434 | 0.108434 | 0.0 |


In [16]:
gene = 'MUC5B'

ov_omics = ov_object.get_phosphoproteomics()

In [17]:
omics_mutations = ov_object.join_omics_to_mutations(
        mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)



No MUC5B phosphoproteomics data were found in the ovarian dataset, so a cis comparison is not possible.

# Analysis of Results

MUC5B phosphoproteomics data were available only in the endometrial dataset. There, neither the mutated vs. wildtype comparison nor the missense vs. truncation comparison was significant.