# MUC17 proteomics cis comparison

"The protein encoded by this gene is a membrane-bound mucin that provides protection to gut epithelial cells" (https://www.proteinatlas.org/ENSG00000169876-MUC17/tissue).

## Step 1: Library Imports

Run this cell to import the necessary libraries.

In [27]:
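import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats
import sys

# Make the author's helper module (functions.py) importable; adjust this path to your own clone.
sys.path.append('C:\\Users\\brittany henderson\\GitHub\\WhenMutationsMatter\\Brittany\\')
import functions as f

# cptac provides the CPTAC datasets; cptac.algorithms provides get_frequently_mutated.
import cptac
import cptac.algorithms as al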

## Step 2: Find the frequently mutated genes for Endometrial Cancer

Choose the cancer dataset and the mutation-frequency cutoff you would like to use. The cutoff is a fraction of tumor samples, so `.1` corresponds to 10% of samples.

In [30]:
en_object = cptac.Endometrial()
desired_cutoff = .1

endometrial_freq_mut = al.get_frequently_mutated(en_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(endometrial_freq_mut), '\n', endometrial_freq_mut.head())

                                    

Number of Frequently Mutated Genes: 232 
         Unique_Samples_Mut  Missence_Mut  Truncation_Mut
Gene                                                    
ABCA12            0.147368      0.094737        0.073684
ABCA13            0.115789      0.105263        0.042105
ACVR2A            0.105263      0.010526        0.094737
ADGRG4            0.136842      0.126316        0.021053
ADGRV1            0.115789      0.094737        0.052632
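The `Unique_Samples_Mut` column appears to be the fraction of tumors carrying at least one mutation in the gene, and the cutoff is applied to that fraction. The sketch below reproduces the idea on a made-up mutation table. It is an assumption about how `al.get_frequently_mutated` works, not its actual source: the helper `frequently_mutated`, the toy data, and the inclusive `>=` comparison are all hypothetical, and the `Missence_Mut` and `Truncation_Mut` columns are not reproduced.

```python
import pandas as pd

# Made-up long-format mutation table: one row per mutation call (hypothetical data).
mutations = pd.DataFrame({
    "Sample_ID": ["S001", "S001", "S002", "S003", "S004", "S005", "S006", "S007", "S004"],
    "Gene":      ["MUC17", "MUC17", "MUC17", "MUC17", "PTEN", "PTEN", "PTEN", "PTEN", "ARID1A"],
})
n_tumors = 20  # assumed cohort size, including tumors with no mutations listed

def frequently_mutated(muts: pd.DataFrame, n_samples: int, cutoff: float) -> pd.Series:
    """Fraction of tumors with >= 1 mutation in each gene, kept if at or above cutoff."""
    # nunique counts a sample once per gene, even if it has several mutations in that gene.
    frac = muts.groupby("Gene")["Sample_ID"].nunique() / n_samples
    return frac[frac >= cutoff].sort_index()  # inclusive comparison is an assumption

freq = frequently_mutated(mutations, n_tumors, cutoff=0.1)
print(freq)
print("MUC17" in freq.index)
```

Here MUC17 (3 of 20 tumors) and PTEN (4 of 20) pass the 0.1 cutoff, while ARID1A (1 of 20) does not. The final line is the same kind of membership check that justifies calling MUC17 "frequently mutated" in Step 3.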


## Step 3: Select MUC17, a frequently mutated gene

In [31]:
gene = 'MUC17'

## Step 4: Select proteomics

In [32]:
omics = en_object.get_proteomics()
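Whether a gene was measured at all can be checked on the proteomics table before any join. A minimal sketch on a made-up table, assuming (as the warning text "did not match any columns in proteomics dataframe" suggests) that `get_proteomics` returns one row per sample and one column per gene; the helper `gene_measured` and the toy values are hypothetical, and if a given cptac version labels columns differently the check would need adapting.

```python
import pandas as pd

# Toy proteomics table: rows are samples, columns are gene names (made-up values).
proteomics = pd.DataFrame(
    {"PTEN": [-0.31, 0.12, 0.45], "ARID1A": [0.05, -0.22, 0.18]},
    index=pd.Index(["S001", "S002", "S003"], name="Sample_ID"),
)

def gene_measured(omics_df: pd.DataFrame, gene: str) -> bool:
    """True if `gene` is a column of the table with at least one non-missing value."""
    return gene in omics_df.columns and bool(omics_df[gene].notna().any())

print(gene_measured(proteomics, "PTEN"))
print(gene_measured(proteomics, "MUC17"))
```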

## Step 5: cis comparison 

Determine whether a DNA mutation in the gene has an effect on its own omics measurement (here, MUC17 protein abundance). The code does this in a few steps:
1. Get a table with both the omics and mutation data for tumors.
2. Create a binary column from the mutation data to separate the samples into mutated and wildtype groups.
3. Format the data frame to be used in the t-test.
4. Send the data to the t-test.

The format_cis_comparison_data function performs the first three steps. The cell below runs step 1 directly with join_omics_to_mutations.
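The four steps can be sketched with plain pandas and scipy on a made-up joined table. This is not the author's format_cis_comparison_data, whose source is not shown here: the function name `cis_ttest`, the `binary_mutations` column, the non-wildtype label `Single_mutation`, and the choice of scipy's default (equal-variance) t-test are assumptions. Only `Wildtype_Tumor` appears in the real output in this notebook.

```python
import numpy as np
import pandas as pd
import scipy.stats

def cis_ttest(joined: pd.DataFrame, omics_col: str, status_col: str):
    """Compare an omics measurement between mutated and wildtype tumors.

    Returns (t statistic, p value), or None if either group has no measurements.
    """
    df = joined[[omics_col, status_col]].copy()
    # Step 2: binary label; anything other than Wildtype_Tumor counts as mutated.
    df["binary_mutations"] = np.where(df[status_col] == "Wildtype_Tumor", "Wildtype", "Mutated")
    # Step 3: keep only samples that actually have a measurement.
    df = df.dropna(subset=[omics_col])
    mutated = df.loc[df["binary_mutations"] == "Mutated", omics_col]
    wildtype = df.loc[df["binary_mutations"] == "Wildtype", omics_col]
    if mutated.empty or wildtype.empty:
        return None
    # Step 4: two-sample t-test (scipy's default assumes equal variances).
    result = scipy.stats.ttest_ind(mutated, wildtype)
    return result.statistic, result.pvalue

# Step 1 stand-in: made-up joined table for a hypothetical gene "GENE".
toy = pd.DataFrame({
    "GENE_proteomics": [0.8, 1.1, 0.9, -0.4, -0.6, -0.2, -0.5],
    "GENE_Mutation_Status": ["Single_mutation"] * 3 + ["Wildtype_Tumor"] * 4,
})
print(cis_ttest(toy, "GENE_proteomics", "GENE_Mutation_Status"))

# The MUC17 situation: the proteomics column exists but is entirely NaN.
muc17_like = toy.assign(GENE_proteomics=np.nan)
print(cis_ttest(muc17_like, "GENE_proteomics", "GENE_Mutation_Status"))  # None
```

The second call shows why step 4 cannot run for MUC17: once samples without a measurement are dropped, neither group has any values left to compare.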

In [33]:
omics_and_mutations = en_object.join_omics_to_mutations(
        mutations_genes = gene, omics_df_name = 'proteomics', omics_genes = gene)

MUC17 did not match any columns in proteomics dataframe. MUC17_proteomics column inserted, but filled with NaN.


           MUC17_proteomics    MUC17_Mutation  MUC17_Location  MUC17_Mutation_Status  Sample_Status
Sample_ID
S001                    NaN  [Wildtype_Tumor]   [No_mutation]         Wildtype_Tumor          Tumor
S002                    NaN  [Wildtype_Tumor]   [No_mutation]         Wildtype_Tumor          Tumor
S003                    NaN  [Wildtype_Tumor]   [No_mutation]         Wildtype_Tumor          Tumor
S005                    NaN  [Wildtype_Tumor]   [No_mutation]         Wildtype_Tumor          Tumor
S006                    NaN  [Wildtype_Tumor]   [No_mutation]         Wildtype_Tumor          Tumor


# Repeat with the Colon dataset

Go through the same process, this time using the Colon dataset. For simplicity, only the first five genes of the frequently mutated data frame are printed.
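Because the same cells are repeated for each cancer type, the availability check can be wrapped in one helper. A minimal sketch, assuming `join_omics_to_mutations` accepts the keyword arguments used in this notebook and returns a `<gene>_proteomics` column as in the endometrial output; the helper name `proteomics_available` is hypothetical.

```python
import pandas as pd

def proteomics_available(dataset, gene: str) -> bool:
    """Join `gene` proteomics to its mutations and report whether any value is non-missing.

    `dataset` is assumed to be a cptac dataset object (e.g. cptac.Colon()) whose
    join_omics_to_mutations accepts the keyword arguments used in this notebook.
    """
    joined: pd.DataFrame = dataset.join_omics_to_mutations(
        mutations_genes=gene, omics_df_name="proteomics", omics_genes=gene)
    col = f"{gene}_proteomics"
    return col in joined.columns and bool(joined[col].notna().any())

# Usage (requires cptac and its downloaded data):
# import cptac
# datasets = {"Endometrial": cptac.Endometrial(), "Colon": cptac.Colon(), "Ovarian": cptac.Ovarian()}
# for name, ds in datasets.items():
#     print(name, proteomics_available(ds, "MUC17"))
```

Passing the dataset object in, rather than hard-coding one cancer type, keeps the loop over cancers to a single line and makes the helper testable with a stand-in object.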

In [34]:
colon_object = cptac.Colon()
desired_cutoff = .1

colon_freq_mut = al.get_frequently_mutated(colon_object, cutoff = desired_cutoff)
print('\n\nNumber of Frequently Mutated Genes:', len(colon_freq_mut), '\n', colon_freq_mut.head())

                                    

Number of Frequently Mutated Genes: 612 
         Unique_Samples_Mut  Missence_Mut  Truncation_Mut
Gene                                                    
ABCA13            0.195876      0.164948        0.103093
ABCA2             0.175258      0.164948        0.030928
ABCA4             0.144330      0.082474        0.061856
ABCB4             0.134021      0.061856        0.072165
ABCB6             0.103093      0.061856        0.041237


In [35]:
gene = 'MUC17'

co_omics = colon_object.get_proteomics()

In [36]:
omics_and_mutations = colon_object.join_omics_to_mutations(
        mutations_genes = gene, omics_df_name = 'proteomics', omics_genes = gene)

MUC17 did not match any columns in proteomics dataframe. MUC17_proteomics column inserted, but filled with NaN.


# Repeat with the Ovarian dataset


In [37]:
ovarian_object = cptac.Ovarian()
desired_cutoff = .1

ovarian_freq_mut = al.get_frequently_mutated(ovarian_object, cutoff = desired_cutoff)

                                    

In [19]:
gene = 'MUC17'

ov_omics = ovarian_object.get_proteomics()

In [38]:
omics_and_mutations = ovarian_object.join_omics_to_mutations(
        mutations_genes = gene, omics_df_name = 'proteomics', omics_genes = gene)

MUC17 did not match any columns in proteomics dataframe. MUC17_proteomics column inserted, but filled with NaN.


# Analysis of Results

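None of the three datasets (endometrial, colon, ovarian) contains proteomics data for MUC17: in each case join_omics_to_mutations reports that MUC17 did not match any proteomics column and fills MUC17_proteomics with NaN. A pan-cancer cis comparison is therefore not possible.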