# MUC4 Mutation Trans Effect on Acetylproteomics

This notebook analyzes the trans effect of MUC4 mutation on the acetylproteomics of MUC4's interacting proteins and of all other proteins in Endometrial cancer. (The Colon and Ovarian datasets don't have acetylproteomic data.)

### Library Imports

In [1]:
import pandas as pd
import numpy as np
import scipy.stats

import warnings
warnings.filterwarnings("ignore")

import cptac
import cptac.algorithms as al

en = cptac.Endometrial()
co = cptac.Colon()

                                    

### Specify Gene

In [2]:
gene = "MUC4"

### Investigate Proteomics, Phosphoproteomics, Acetylproteomics, or Transcriptomics

In [3]:
#omics = "proteomics"
#omics = "transcriptomics"
#omics = "phosphoproteomics"
omics = "acetylproteomics"

### Track all significant comparisons in Dataframe

In [4]:
all_significant_comparisons = pd.DataFrame(columns=['Cancer_Type', 'Gene', 'Comparison','Interacting_Protein','P_Value'])

In [5]:
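def add_to_all_significant_comparisons(df, cancer, interacting, all_sig_comp):
    # Copy so the caller's results dataframe is not modified in place
    expanded = df.copy()
    expanded['Gene'] = gene  # notebook-level gene variable
    expanded['Cancer_Type'] = cancer
    # True for the interacting-protein comparison, False for the all-proteins comparison
    expanded['Interacting_Protein'] = interacting

    updated_all_comparisons = pd.concat([all_sig_comp, expanded], sort=False)

    return updated_all_comparisons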

# Interacting Proteins: Acetylproteomics

### Generate interacting protein list

In [6]:
'''Use get interacting proteins method to generate list of interacting proteins'''
interacting_proteins = al.get_interacting_proteins(gene)

print("Interacting Proteins:")
for interacting_protein in interacting_proteins:
    print(interacting_protein)

Interacting Proteins:
MUC17
ST3GAL3
B3GNT5
B3GNT3
ST6GALNAC3
ST6GALNAC4
GALNT12
MUC21
MUC3A
GCNT3
MUC16
C1GALT1
MUC6
MUC7
MUC20
MUC15
GALNT11
MUC4
ST3GAL1
ST3GAL4
MUC5B
MUC12
GALNT6
MUC1
MUC13
MUC5AC
EZH2
SUZ12
EGFR
ERBB2
ERBB3
ERBB4
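
Several of these interacting proteins have no acetylation measurements, which is why the join below inserts NaN-filled columns for them. A minimal sketch for pruning the list before the join, assuming site-level column names of the form `GENE-SITE`; the helper `filter_to_measured` and the column names in the example are hypothetical. In the notebook the columns would come from the acetylproteomics dataframe (for example `en.get_acetylproteomics().columns`, if your cptac version provides that accessor).

```python
def filter_to_measured(genes, omics_columns):
    """Return the genes, in their original order, that have at least one column
    named exactly GENE or starting with 'GENE-'.
    Assumes site-level names such as 'EGFR-K716' (hypothetical naming)."""
    columns = [str(c) for c in omics_columns]
    return [g for g in genes
            if any(c == g or c.startswith(g + "-") for c in columns)]


# Hypothetical column names standing in for the acetylproteomics dataframe's columns
cols = ["EGFR-K716", "EZH2-K348", "EZH2-K735", "ACTB"]
measured = filter_to_measured(["MUC17", "EZH2", "EGFR", "ERBB2", "ACTB"], cols)
print(measured)  # ['EZH2', 'EGFR', 'ACTB']
```

The pruned list could then be passed as `omics_genes=` to `join_omics_to_mutations`. If the significance cutoff is divided by the number of columns tested, dropping unmeasured proteins also avoids tightening that cutoff for no benefit.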


## Endometrial

### Test for significant comparisons in any of interacting proteins

In [7]:
'''Create dataframe in order to do comparisons with wrap_ttest'''
protdf = en.join_omics_to_mutations(mutations_genes=[gene], omics_df_name=omics, omics_genes=interacting_proteins)
protdf = protdf.loc[protdf['Sample_Status'] == 'Tumor'].copy()

'''Create the binary valued column needed to do the comparison'''
# Any status other than 'Wildtype_Tumor' counts as mutated
protdf['Label'] = np.where(protdf[gene+"_Mutation_Status"] != 'Wildtype_Tumor', 'Mutated', 'Wildtype')

'''Format the dataframe correctly'''
protdf = protdf.drop(gene+"_Mutation",axis=1)
protdf = protdf.drop(gene+"_Location",axis=1)
protdf = protdf.drop(gene+"_Mutation_Status", axis=1)
protdf = protdf.drop("Sample_Status",axis=1)


'''Make list of columns to be compared using t-tests'''
col_list = list(protdf.columns)
col_list.remove('Label')

print("Doing t-test comparisons\n")

'''Call wrap_ttest, pass in formatted dataframe'''
wrap_results = al.wrap_ttest(protdf, 'Label', col_list)

'''Print results, if anything significant was found'''
if wrap_results is not None:
    print(wrap_results)
    print("\n\n")

    all_significant_comparisons = add_to_all_significant_comparisons(wrap_results, "Endometrial", True, all_significant_comparisons)


MUC17 did not match any columns in acetylproteomics dataframe. MUC17_acetylproteomics column inserted, but filled with NaN.
ST3GAL3 did not match any columns in acetylproteomics dataframe. ST3GAL3_acetylproteomics column inserted, but filled with NaN.
B3GNT5 did not match any columns in acetylproteomics dataframe. B3GNT5_acetylproteomics column inserted, but filled with NaN.
B3GNT3 did not match any columns in acetylproteomics dataframe. B3GNT3_acetylproteomics column inserted, but filled with NaN.
ST6GALNAC3 did not match any columns in acetylproteomics dataframe. ST6GALNAC3_acetylproteomics column inserted, but filled with NaN.
ST6GALNAC4 did not match any columns in acetylproteomics dataframe. ST6GALNAC4_acetylproteomics column inserted, but filled with NaN.
GALNT12 did not match any columns in acetylproteomics dataframe. GALNT12_acetylproteomics column inserted, but filled with NaN.
MUC21 did not match any columns in acetylproteomics dataframe. MUC21_acetylproteomics column inserte
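
`al.wrap_ttest` is called above with the label column and the list of columns to compare, and the notebook treats a `None` return as "nothing significant". Its internals are not shown here. The sketch below is a hypothetical stand-in that illustrates one common approach, assuming one two-sample t-test per column and a Bonferroni cutoff of `alpha / len(columns)`. The helper name `bonferroni_ttests` and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats


def bonferroni_ttests(df, label_col, columns, alpha=0.05):
    """Hypothetical wrap_ttest-style helper: run one two-sample t-test per column
    and keep columns whose p-value is below alpha / len(columns)."""
    groups = df[label_col].dropna().unique()
    if len(groups) != 2:
        raise ValueError("label column must contain exactly two groups")
    first = df[df[label_col] == groups[0]]
    second = df[df[label_col] == groups[1]]
    cutoff = alpha / len(columns)  # Bonferroni: divide alpha by the number of tests
    significant = []
    for col in columns:
        x = first[col].dropna()
        y = second[col].dropna()
        if len(x) < 2 or len(y) < 2:
            continue  # not enough non-missing values (e.g. an all-NaN column)
        p_value = scipy.stats.ttest_ind(x, y).pvalue
        if p_value < cutoff:
            significant.append({"Comparison": col, "P_Value": p_value})
    if not significant:
        return None  # mirrors the notebook's `wrap_results is not None` check
    return pd.DataFrame(significant, columns=["Comparison", "P_Value"])


# Toy data (random, not from cptac): one site with a real group difference,
# one without, and one with no measurements at all.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Label": ["Mutated"] * 10 + ["Wildtype"] * 10,
    "shifted_site": np.concatenate([rng.normal(3, 1, 10), rng.normal(0, 1, 10)]),
    "null_site": rng.normal(0, 1, 20),
    "empty_site": np.nan,
})
result = bonferroni_ttests(toy, "Label", ["shifted_site", "null_site", "empty_site"])
print(result)
```

Because the cutoff shrinks as the number of columns grows, a comparison across thousands of acetylation sites would need far smaller p-values to count as significant than a comparison across a few dozen interacting proteins.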

## Colon

Acetylproteomic data does not exist for the Colon dataset

## Ovarian

Acetylproteomic data does not exist for the Ovarian dataset

# All Proteins: Acetylproteomics

## Endometrial

In [8]:
try:
    print("\nGene: ", gene)

    '''Use all proteins: omitting omics_genes below joins every column of the acetylproteomics dataframe'''

    '''Create dataframe in order to do comparisons with wrap_ttest'''
    protdf = en.join_omics_to_mutations(mutations_genes=[gene], omics_df_name=omics)
    protdf = protdf.loc[protdf['Sample_Status'] == 'Tumor'].copy()

    '''Create the binary valued column needed to do the comparison'''
    # Any status other than 'Wildtype_Tumor' counts as mutated
    protdf['Label'] = np.where(protdf[gene+"_Mutation_Status"] != 'Wildtype_Tumor', 'Mutated', 'Wildtype')

    '''Format the dataframe correctly'''
    protdf = protdf.drop(gene+"_Mutation",axis=1)
    protdf = protdf.drop(gene+"_Location",axis=1)
    protdf = protdf.drop(gene+"_Mutation_Status", axis=1)
    protdf = protdf.drop("Sample_Status",axis=1)

    '''Make list of columns to be compared using t-tests'''
    col_list = list(protdf.columns)
    col_list.remove('Label')

    print("Doing t-test comparisons\n")
    
    '''Call wrap_ttest, pass in formatted dataframe'''
    wrap_results = al.wrap_ttest(protdf, 'Label', col_list)

    '''Print results, if anything significant was found'''
    if wrap_results is not None:
        print(wrap_results)
        print("\n\n")

        all_significant_comparisons = add_to_all_significant_comparisons(wrap_results, "Endometrial", False, all_significant_comparisons)


except Exception as e:
    print("Error in Comparison")
    print(e)


Gene:  MUC4
Doing t-test comparisons

No significant comparisons.
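
Both Endometrial cells repeat the same filtering, labelling and column-dropping steps. A minimal sketch of factoring them into one helper, assuming the joined dataframe carries the `Sample_Status`, `<gene>_Mutation`, `<gene>_Location` and `<gene>_Mutation_Status` columns used above. `prepare_for_ttest` is a hypothetical name, and the toy dataframe (made-up values) stands in for the output of `join_omics_to_mutations`.

```python
import numpy as np
import pandas as pd


def prepare_for_ttest(joined, gene):
    """Keep tumor samples, add a Mutated/Wildtype 'Label' column, drop the
    mutation bookkeeping columns, and return (dataframe, columns_to_test)."""
    df = joined.loc[joined["Sample_Status"] == "Tumor"].copy()
    status = gene + "_Mutation_Status"
    # Any status other than 'Wildtype_Tumor' counts as mutated
    df["Label"] = np.where(df[status] != "Wildtype_Tumor", "Mutated", "Wildtype")
    df = df.drop(columns=[gene + "_Mutation", gene + "_Location", status, "Sample_Status"])
    col_list = [c for c in df.columns if c != "Label"]
    return df, col_list


# Toy stand-in for join_omics_to_mutations output (made-up values)
joined = pd.DataFrame({
    "EGFR_acetylproteomics": [0.2, -0.1, 0.4],
    "Sample_Status": ["Tumor", "Tumor", "Normal"],
    "MUC4_Mutation": ["Missense_Mutation", "Wildtype_Tumor", "Wildtype_Normal"],
    "MUC4_Location": ["site_1", "No_mutation", "No_mutation"],
    "MUC4_Mutation_Status": ["Single_mutation", "Wildtype_Tumor", "Wildtype_Normal"],
})
protdf, col_list = prepare_for_ttest(joined, "MUC4")
print(protdf["Label"].tolist(), col_list)
```

Each cell would then reduce to the join, a call to `prepare_for_ttest`, and `al.wrap_ttest(protdf, 'Label', col_list)`.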


## Colon

The Colon dataset doesn't contain acetylproteomic data.

## Ovarian

The Ovarian dataset doesn't contain acetylproteomic data.

### Print all significant comparisons

In [9]:
if len(all_significant_comparisons) > 0:
    display(all_significant_comparisons)
    
else:
    print('No Significant Comparisons!')

No Significant Comparisons!


### Write significant comparisons (if any) to shared CSV file

In [10]:
existing_results = pd.read_csv(gene+'_Trans_Results.csv')

updated_results = pd.concat([existing_results, all_significant_comparisons], sort=False)

updated_results.to_csv(path_or_buf = gene + '_Trans_Results.csv', index=False)
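
Re-running this notebook appends the same comparisons to the shared CSV again, and `pd.read_csv` raises `FileNotFoundError` if the file has not been created yet. A minimal sketch that handles both, assuming a row is identified by its `Cancer_Type`, `Gene`, `Comparison` and `Interacting_Protein` values; `append_results` is a hypothetical helper and the example row is made up. Keying on those columns rather than on every column means a re-run replaces the earlier row (`keep="last"`) without relying on `P_Value` floats surviving a CSV round trip bit-for-bit.

```python
import os
import tempfile

import pandas as pd

KEY_COLUMNS = ["Cancer_Type", "Gene", "Comparison", "Interacting_Protein"]


def append_results(path, new_results):
    """Append new_results to the CSV at path (creating it if missing), keeping
    only the most recent row for each KEY_COLUMNS combination."""
    if os.path.exists(path):
        combined = pd.concat([pd.read_csv(path), new_results], ignore_index=True, sort=False)
    else:
        combined = new_results
    combined = combined.drop_duplicates(subset=KEY_COLUMNS, keep="last").reset_index(drop=True)
    combined.to_csv(path, index=False)
    return combined


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "MUC4_Trans_Results.csv")
    run = pd.DataFrame({"Cancer_Type": ["Endometrial"], "Gene": ["MUC4"],
                        "Comparison": ["EGFR_acetylproteomics"],
                        "Interacting_Protein": [True], "P_Value": [1e-5]})
    append_results(path, run)          # first run creates the file
    saved = append_results(path, run)  # simulated re-run of the notebook
    rows_on_disk = len(pd.read_csv(path))
    print(len(saved), rows_on_disk)    # 1 1
```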