# KRAS phosphoproteomics cis comparison

## Step 1: Library Imports

Run this cell to import the necessary libraries.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats
import re
import sys
# Make the author's local `functions` helper module importable; adjust this path for your machine
sys.path.append('C:\\Users\\brittany henderson\\GitHub\\WhenMutationsMatter\\Brittany\\')
import functions as f

import cptac
import cptac.algorithms as al

## Step 2: Find the frequently mutated genes for Endometrial Cancer

Choose the cancer dataset and the mutation-frequency cutoff to use. The cutoff is a fraction of samples, so .2 corresponds to 20%.

In [17]:
en_object = cptac.Endometrial()
desired_cutoff = .2

endometrial_freq_mut = al.get_frequently_mutated(en_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(endometrial_freq_mut), '\n')
endometrial_freq_mut.loc[endometrial_freq_mut['Gene'] == 'KRAS']


Number of Frequently Mutated Genes: 10



| | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
|---|---|---|---|---|
| 4 | KRAS | 0.326316 | 0.326316 | 0.0 |
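
`get_frequently_mutated` reports, per gene, the fraction of samples that carry a mutation (`Unique_Samples_Mut`). A minimal pandas sketch of that idea follows. It assumes a long-format mutation table with one row per mutation and a strict `>` comparison against the cutoff; the real cptac implementation may differ (for example at the boundary, or in how it splits missense from truncation mutations). The toy data and the `frequently_mutated` helper are hypothetical.

```python
import pandas as pd

# Hypothetical long-format mutation table: one row per (sample, gene, mutation).
# Real cptac mutation tables have more columns; these names are assumptions.
mutations = pd.DataFrame({
    "Sample_ID": ["S1", "S1", "S2", "S3", "S3", "S4"],
    "Gene": ["KRAS", "PTEN", "PTEN", "KRAS", "PTEN", "TP53"],
    "Mutation": ["Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del",
                 "Missense_Mutation", "Missense_Mutation", "Missense_Mutation"],
})
n_tumors = 5  # total tumors sequenced, including a fifth sample with no mutations

def frequently_mutated(mut_df, n_samples, cutoff):
    """Return genes mutated in more than `cutoff` of samples, with their fraction."""
    # Count each sample at most once per gene, even if it carries several mutations.
    per_gene = mut_df.drop_duplicates(subset=["Sample_ID", "Gene"]).groupby("Gene").size()
    fraction = per_gene / n_samples
    # Strict ">" is an assumption about how the cutoff boundary is handled.
    return fraction[fraction > cutoff].sort_values(ascending=False)

print(frequently_mutated(mutations, n_tumors, cutoff=0.2))
```

Here TP53 is mutated in exactly 20% of the toy samples and is therefore excluded by the strict comparison.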


## Step 3: Select KRAS, a frequently mutated gene

In [4]:
gene = 'KRAS'

## Step 4: Select phosphoproteomics

In [5]:
omics = en_object.get_phosphoproteomics()
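
`get_phosphoproteomics()` presumably returns a samples-by-sites table. To look at KRAS alone, one option is to select the columns whose gene prefix matches exactly. This sketch assumes site columns are named `<GENE>-<site>`, consistent with the `KRAS-T124` column that appears in the joined table later on; the toy data (including the made-up `KRASX` gene) and the `sites_for_gene` helper are hypothetical.

```python
import re
import pandas as pd

# Hypothetical phosphoproteomics table: rows are samples, columns are "<GENE>-<site>".
phospho = pd.DataFrame(
    {"KRAS-T124": [0.09, None, -0.47],
     "KRASX-S12": [0.10, 0.20, 0.30],   # made-up gene whose name starts with "KRAS"
     "NRAS-S89": [0.5, 0.1, -0.2]},
    index=["S034", "S035", "S065"],
)

def sites_for_gene(df, gene):
    """Return only the columns whose gene prefix exactly matches `gene`."""
    # Anchor on "<gene>-" so that KRAS does not also match KRASX.
    pattern = re.compile(rf"^{re.escape(gene)}-")
    return df[[col for col in df.columns if pattern.match(col)]]

print(sites_for_gene(phospho, "KRAS"))
```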

## Step 5: cis comparison 

Determine whether the DNA mutation has an effect on the omics measurement. This takes four steps in code; the first three are also wrapped in the format_phospho_cis_comparison_data function.
1. Get a table with both the omics and mutation data for tumors.
2. Create a binary column from the mutation data to split the samples into mutated and wildtype groups.
3. Format the data frame for the t-test.
4. Send the data to the t-test.
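
Step 4 is never reached in this notebook, but a minimal sketch of what it involves may help. The code below runs an independent two-sample t-test (`scipy.stats.ttest_ind` with its default pooled variance) for each phosphosite and applies a Bonferroni correction across the sites tested. The assumption is that this is roughly what `wrap_ttest` does; the toy table, the `cis_ttests` helper, and the minimum of 3 samples per group are illustrative and not taken from cptac.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Hypothetical tumor table: a binary mutation label plus one column per phosphosite.
tumors = pd.DataFrame({
    "binary_mutations": ["Mutated"] * 4 + ["Wildtype"] * 4,
    "KRAS-T124": [0.9, 1.1, 1.0, 1.2, 0.1, -0.1, 0.0, 0.2],
    "KRAS-S181": [0.3, np.nan, 0.1, 0.2, 0.2, 0.1, np.nan, 0.3],
})

def cis_ttests(df, label_col, site_cols, alpha=0.05, min_per_group=3):
    """t-test each site (Mutated vs. Wildtype), Bonferroni-corrected across sites."""
    results = []
    for site in site_cols:
        mutated = df.loc[df[label_col] == "Mutated", site].dropna()
        wildtype = df.loc[df[label_col] == "Wildtype", site].dropna()
        if len(mutated) < min_per_group or len(wildtype) < min_per_group:
            continue  # too few samples in one group to test this site
        _, p = scipy.stats.ttest_ind(mutated, wildtype)
        results.append({"Site": site, "P_Value": p})
    out = pd.DataFrame(results, columns=["Site", "P_Value"])
    # Bonferroni: compare each p-value with alpha divided by the number of tests run.
    out["Significant"] = out["P_Value"] < alpha / max(len(out), 1)
    return out

print(cis_ttests(tumors, "binary_mutations", ["KRAS-T124", "KRAS-S181"]))
```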

In [11]:
# Step 1: join the KRAS phosphoproteomics and KRAS mutation data into one dataframe
# (the input format used for comparisons with wrap_ttest)
omics_and_mut = en_object.join_omics_to_mutations(
    mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)

# Step 2: create the binary column (Mutated vs. Wildtype) used to split the samples
omics_and_mut['binary_mutations'] = omics_and_mut[gene+'_Mutation_Status'].apply(
    lambda x: 'Wildtype' if x == 'Wildtype_Tumor' else 'Mutated')

# Step 3: keep only tumor samples (drop normals), then drop rows with missing values
tumors = omics_and_mut.loc[omics_and_mut['Sample_Status'] == 'Tumor']
tumors = tumors.dropna(axis = 0)
tumors

| Sample_ID | KRAS-T124_phosphoproteomics | KRAS_Mutation | KRAS_Location | KRAS_Mutation_Status | Sample_Status | binary_mutations |
|---|---|---|---|---|---|---|
| S034 | 0.0876 | [Wildtype_Tumor] | [No_mutation] | Wildtype_Tumor | Tumor | Wildtype |
| S040 | -0.058 | [Wildtype_Tumor] | [No_mutation] | Wildtype_Tumor | Tumor | Wildtype |
| S041 | -0.0056 | [Wildtype_Tumor] | [No_mutation] | Wildtype_Tumor | Tumor | Wildtype |
| S065 | -0.468 | [Wildtype_Tumor] | [No_mutation] | Wildtype_Tumor | Tumor | Wildtype |
| S069 | 0.0 | [Wildtype_Tumor] | [No_mutation] | Wildtype_Tumor | Tumor | Wildtype |
| S087 | -0.204 | [Wildtype_Tumor] | [No_mutation] | Wildtype_Tumor | Tumor | Wildtype |


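The KRAS phosphoproteomics data (a single site, KRAS-T124) only contains measurements for wildtype tumors. Once missing values are dropped, the mutated group is empty, so the cis comparison is not possible.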
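
A quick way to confirm this before attempting a t-test is to count the non-missing measurements in each group. The sketch below uses the column names from the table above, but the toy values and the `group_counts` helper are hypothetical, and the minimum group size of 3 is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical tumor table in which every mutated tumor lacks a KRAS-T124 value.
tumors = pd.DataFrame({
    "KRAS-T124_phosphoproteomics": [0.0876, -0.058, np.nan, np.nan],
    "binary_mutations": ["Wildtype", "Wildtype", "Mutated", "Mutated"],
})

def group_counts(df, value_col, label_col="binary_mutations"):
    """Count non-missing measurements per mutation group."""
    counts = df.dropna(subset=[value_col])[label_col].value_counts()
    # Ensure both groups appear, even when one has no measurements at all.
    return counts.reindex(["Mutated", "Wildtype"], fill_value=0)

counts = group_counts(tumors, "KRAS-T124_phosphoproteomics")
print(counts.to_dict())
can_compare = bool((counts >= 3).all())  # minimum group size of 3 is an assumption
print("cis comparison possible:", can_compare)
```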

# Repeat with the Colon dataset

Go through the same process, this time using the Colon dataset. 
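
Because the same steps are repeated for each dataset, they can be wrapped in one function. This sketch takes the table returned by `join_omics_to_mutations` and relies on the column names seen in the Endometrial output (`<gene>_Mutation_Status`, `Sample_Status`, `*_phosphoproteomics`). The `cis_comparison` helper, its `min_per_group` threshold, and the toy data (including the `Single_mutation` status value) are hypothetical. Unlike the notebook, it drops missing values per site rather than across every column.

```python
import numpy as np
import pandas as pd
import scipy.stats

def cis_comparison(joined, gene, min_per_group=3):
    """Run the Step 5 workflow on a joined omics + mutation table.

    Returns {site column: p-value}, with None where a group is too small to test.
    """
    # Keep tumors only, then label each as Mutated or Wildtype (same rule as Step 5).
    tumors = joined.loc[joined["Sample_Status"] == "Tumor"].copy()
    tumors["binary_mutations"] = tumors[gene + "_Mutation_Status"].apply(
        lambda x: "Wildtype" if x == "Wildtype_Tumor" else "Mutated")

    results = {}
    for col in [c for c in tumors.columns if c.endswith("_phosphoproteomics")]:
        valid = tumors.dropna(subset=[col])  # drop missing values for this site only
        mutated = valid.loc[valid["binary_mutations"] == "Mutated", col]
        wildtype = valid.loc[valid["binary_mutations"] == "Wildtype", col]
        if min(len(mutated), len(wildtype)) < min_per_group:
            results[col] = None  # comparison not possible for this site
        else:
            results[col] = scipy.stats.ttest_ind(mutated, wildtype).pvalue
    return results

# Hypothetical joined table: the mutated tumors have no KRAS-T124 measurement.
joined = pd.DataFrame({
    "KRAS-T124_phosphoproteomics": [0.0876, -0.058, -0.0056, np.nan, np.nan, 0.1],
    "KRAS_Mutation_Status": ["Wildtype_Tumor"] * 3 + ["Single_mutation"] * 2 + ["Wildtype_Normal"],
    "Sample_Status": ["Tumor"] * 5 + ["Normal"],
})
print(cis_comparison(joined, "KRAS"))
```

With a real dataset, the input would come from a call such as `colon_object.join_omics_to_mutations(mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)`.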

In [14]:
colon_object = cptac.Colon()
desired_cutoff = .2

colon_freq_mut = al.get_frequently_mutated(colon_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(colon_freq_mut), '\n')
colon_freq_mut.loc[colon_freq_mut['Gene'] == 'KRAS']


Number of Frequently Mutated Genes: 39



| | Gene | Unique_Samples_Mut | Missence_Mut | Truncation_Mut |
|---|---|---|---|---|
| 15 | KRAS | 0.360825 | 0.360825 | 0.0 |


In [15]:
gene = 'KRAS'

co_omics = colon_object.get_phosphoproteomics()

In [16]:
omics_mutations = colon_object.join_omics_to_mutations(
    mutations_genes = gene, omics_df_name = 'phosphoproteomics', omics_genes = gene)



It is not possible to do the phosphoproteomic cis comparison for the Colon dataset either.