# Use Case 4: Comparing Mutation Status with Protein Abundance

<b>Standard imports for playing with and plotting data frames.</b>

In [104]:
import pandas as pd
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

<b>Import CPTAC data</b>

In [2]:
import CPTAC

Loading Clinical Data...
Loading Proteomics Data...
Loading Transcriptomics Data...
Loading CNA Data...
Loading Phosphoproteomics Data...
Loading Somatic Data...

 ******PLEASE READ******


<b>To begin, retrieve the protein abundance, phosphoproteomics, and somatic mutation data. The somatic data is in binary format: 0 if the gene has no mutation in a sample, 1 if it does.</b>

In [224]:
somatic_mutations = CPTAC.get_somatic()
proteomics = CPTAC.get_proteomics()
phos = CPTAC.get_phosphoproteomics()
phos.head()

idx,AAAS-S495,AAK1-S18,AAK1-S20,AAK1-S21,AAK1-S624,AAK1-S637,AAK1-S670,AAK1-S678,AAK1-S682,AAK1-S731,...,ZZEF1-S1464,ZZEF1-S1488,ZZEF1-S1501,ZZEF1-S1518,ZZEF1-S2444,ZZEF1-T1477,ZZEF1-T1512,ZZEF1-T1521,ZZZ3-S397,ZZZ3-S426
S001,,-0.28,0.44,0.52,-0.7,-0.68,-0.49,-0.43,-0.93,,...,-0.28,-0.2,-0.11,-0.11,-0.08,-0.27,,0.24,0.42,-0.26
S002,,-0.53,-1.14,0.1,-0.66,-0.42,-0.43,-0.29,-0.39,-0.25,...,0.44,0.37,0.6,0.33,,1.13,1.0,0.34,-0.19,-0.55
S003,-0.29,0.42,-0.05,-0.06,-0.03,0.8,,-1.41,0.16,-0.09,...,1.05,0.61,,0.36,0.53,0.52,0.28,-0.27,,
S004,0.15,,-0.54,-0.17,0.21,-0.17,0.58,0.5,,-0.52,...,0.03,-0.18,,0.68,0.28,,0.85,-0.32,0.03,-0.75
S005,0.6,0.64,-0.05,-0.08,0.14,0.63,0.52,-0.76,-0.3,-0.01,...,-0.54,0.48,-0.7,0.26,-0.59,-0.66,-0.13,-0.21,-0.1,0.15


In [254]:
pcutoff = 0.05 / len(proteomics.columns)
pcutoff

5.216484089723526e-06
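<b>The cutoff above is a Bonferroni correction: the overall significance level (0.05) divided by the number of tests. The sketch below illustrates the same arithmetic for a list of 25 proteins. The helper names <code>bonferroni_cutoff</code> and <code>is_significant</code> are hypothetical and are not part of the CPTAC package.</b>

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def is_significant(pvalue, alpha, n_tests):
    """True if pvalue passes the Bonferroni-corrected threshold."""
    return pvalue <= bonferroni_cutoff(alpha, n_tests)


# 25 tests at an overall alpha of 0.05 (hypothetical helper, same arithmetic as above)
cyclin_cutoff = bonferroni_cutoff(0.05, 25)
print(cyclin_cutoff)

# A p-value of about 0.052 does not pass the corrected threshold
print(is_significant(0.052, 0.05, 25))
```

<b>Dividing by the number of tests keeps the chance of any false positive across the whole set at roughly 0.05, at the cost of a much stricter per-test threshold.</b>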

In [293]:
gene = 'FBXW7'
protein = 'MYC'

<b>Once you have chosen a gene of interest, use <code>CPTAC.compare_gene()</code> to compare its somatic mutation status with protein abundance (an array of genes can also be passed to this function). To check which gene you are comparing, enter <code>cross.name</code>. Recall that the somatic data is binary: 0 if the sample has no mutation in that gene, 1 if it does.</b>
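<b>As a rough illustration of the comparison described above, the sketch below builds a small synthetic data set (not CPTAC data), joins a binary mutation column to an abundance column on sample ID, splits the samples into mutated and wild-type groups, and runs a Wilcoxon rank-sum test. The names <code>GENE_X</code> and <code>PROT_Y</code> and all values are made up for illustration. It uses plain pandas and does not show how <code>CPTAC.compare_gene()</code> works internally.</b>

```python
import numpy as np
import pandas as pd
import scipy.stats

# Synthetic example data; sample IDs, gene and protein names are hypothetical
rng = np.random.default_rng(0)
samples = [f"S{i:03d}" for i in range(1, 41)]

# Binary mutation calls: 1 = mutated, 0 = wild type (first 10 samples mutated)
mutation = pd.Series([1] * 10 + [0] * 30, index=samples, name="GENE_X")

# Abundance values, shifted upward in the mutated samples
abundance = pd.Series(rng.normal(0.0, 1.0, 40), index=samples, name="PROT_Y")
abundance.iloc[:10] += 1.5
# Two missing measurements: one mutated sample and one wild-type sample
abundance.iloc[[3, 20]] = np.nan

# Join on sample ID and drop samples without an abundance measurement
cross = pd.concat([mutation, abundance], axis=1, join="inner").dropna()

# Split abundance by mutation status
mutated = cross.loc[cross["GENE_X"] == 1, "PROT_Y"]
wt = cross.loc[cross["GENE_X"] == 0, "PROT_Y"]

# Wilcoxon rank-sum test between the two groups
result = scipy.stats.ranksums(mutated, wt)
print(len(mutated), len(wt))
print(result.statistic, result.pvalue)
```

<b>The rank-sum test compares the ranks of the two groups rather than their means, so it does not assume the abundance values are normally distributed.</b>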

### Cyclins

In [240]:
protList = ['CCNB1', 'CCNB2', 'CCNB3', 'CCNC', 'CCND1', 'CCND2', 'CCND3', 'CCNE1', 'CCNE2', 'CCNF', 'CCNG1', 'CCNG2',
           'CCNH', 'CCNI', 'CCNI2', 'CCNK', 'CCNL1', 'CCNL2', 'CCNO', 'CCNT1', 'CCNT2', 'CCNY', 'CCNYL1', 'CCNYL2',
           'CCNYL3']

In [247]:
pcutoff = 0.05/len(protList)
pcutoff

0.002

In [283]:
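# Mutation status (0/1) for the gene of interest, one row per sample
genedf = somatic_mutations[gene].to_frame()
for protein in protList:
    # Skip cyclins that are not present in the proteomics data
    if protein in proteomics.columns:
        proteindf = proteomics[protein].to_frame()
        # Align on sample ID; rows missing either value are dropped
        cross = genedf.add(proteindf, fill_value=0).dropna(axis=0)
        mutated = cross.loc[cross[gene] == 1.0]
        wt = cross.loc[cross[gene] == 0.0]
        # Wilcoxon rank-sum test (not a t-test, despite the variable name)
        ttest = scipy.stats.ranksums(mutated[protein], wt[protein])
        # A threshold of 1 prints every result; use pcutoff to keep only significant ones
        if ttest[1] <= 1:
            print("Test for " + protein + ": ")
            print(ttest)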

Test for CCNB1: 
RanksumsResult(statistic=1.9424267848175283, pvalue=0.052085455357175334)
Test for CCNB2: 
RanksumsResult(statistic=1.5343332895066513, pvalue=0.12494768006193521)
Test for CCNC: 
RanksumsResult(statistic=1.225593326548004, pvalue=0.22035175040738875)
Test for CCND1: 
RanksumsResult(statistic=0.40198570354528895, pvalue=0.6876945477781223)
Test for CCNE1: 
RanksumsResult(statistic=0.8119691072749204, pvalue=0.4168093581566388)
Test for CCNH: 
RanksumsResult(statistic=0.9285966384693495, pvalue=0.3530981646150336)
Test for CCNK: 
RanksumsResult(statistic=1.3592501519623812, pvalue=0.17406733400309915)
Test for CCNL1: 
RanksumsResult(statistic=-0.11788003147861141, pvalue=0.9061627154914963)
Test for CCNL2: 
RanksumsResult(statistic=-1.0856057319303507, pvalue=0.2776534627631473)
Test for CCNT1: 
RanksumsResult(statistic=0.9644844312604355, pvalue=0.3348031101852522)
Test for CCNT2: 
RanksumsResult(statistic=0.06280363738440045, pvalue=0.9499228693755033)
Test for CCNY: 