# Make tables: create csv files for MAPK signaling proteins

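This notebook combines the per-cancer dfs of FDR-corrected p-values and differential expression for 3 cancers (LUAD, EC, CO).
Description of created csv files:
* phospho_MAPK - contains all data in wide format (one set of p-value and median columns per cancer),
* all_heatmap_MAPK - all data appended to make a long table for easy use with heatmap function,
* sig_pval_heatmap_MAPK - contains only genes significant in at least one cancer (long format),
* mult_sig_heatmap_MAPK - contains only genes significant in multiple cancers (long format).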

Imports

In [1]:
import pandas as pd
import numpy as np

import cptac
import cptac.utils as u
import plot_utils as p

In [2]:
print('cptac version:', cptac.version())

cptac version: 0.9.6


Read in the single-cancer dfs with p-values and differential expressions. Merge all dfs into one pancancer data frame.

In [3]:
l_merged = pd.read_csv('csv/LUAD_phospho_MAPK.csv')
e_merged = pd.read_csv('csv/EC_phospho_MAPK.csv')
c_merged = pd.read_csv('csv/CO_phospho_MAPK.csv')

# Make csv of all data

In [4]:
df1 = l_merged.merge(e_merged, on='Phospho',how='outer')
all_df = df1.merge(c_merged, on='Phospho',how='outer')
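The outer merge above can be illustrated on toy data. This is a minimal sketch, not the notebook's real tables: the site names and numbers are made up, and only the column naming pattern (`<cancer>_P_Value`, `<cancer>_Median`) is taken from the cells in this notebook.

```python
import pandas as pd

# Toy stand-ins for two per-cancer tables (hypothetical sites and values).
luad = pd.DataFrame({'Phospho': ['A_S1', 'B_T2'],
                     'LUAD_P_Value': [0.01, 0.20],
                     'LUAD_Median': [1.5, -0.3]})
ec = pd.DataFrame({'Phospho': ['B_T2', 'C_Y3'],
                   'EC_P_Value': [0.04, 0.60],
                   'EC_Median': [0.8, 0.1]})

# how='outer' keeps every site found in either table;
# a site missing from one cancer gets NaN in that cancer's columns.
wide = luad.merge(ec, on='Phospho', how='outer')
print(wide)
```

With `how='outer'`, a site measured in only one cancer still appears in the combined table, which is why the wide csv can contain empty cells.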

In [5]:
all_df.to_csv('csv/phospho_MAPK.csv', index=False)

# Make csv of data formatted to use with the HeatMap function 

In [6]:
# Create long df for heatmap

cancer = ['LUAD','EC','CO']
merged_dfs = [l_merged,e_merged,c_merged]

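# Tag each cancer's table with its name and give the cancer-specific
# columns shared names so the tables can be stacked.
long_dfs = []
for c, m in zip(cancer, merged_dfs):
    m2 = m.assign(Cancer = c)
    m2 = m2.rename(columns={c+'_P_Value': 'P_Value', c+'_Median': 'Medians'})
    long_dfs.append(m2)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the tables the same way.
all_long_df = pd.concat(long_dfs)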

print('Check total proteins:', len(all_long_df.Phospho.unique()))

Check total proteins: 2485
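The wide-to-long reshaping in the cell above can be sketched on toy data as follows. The tables, site names, and values here are hypothetical; the sketch only assumes the `<cancer>_P_Value` and `<cancer>_Median` column pattern used in this notebook, and it stacks the pieces with `pd.concat`.

```python
import pandas as pd

# Hypothetical per-cancer tables keyed by cancer name (made-up values).
toy_tables = {
    'LUAD': pd.DataFrame({'Phospho': ['A_S1', 'B_T2'],
                          'LUAD_P_Value': [0.01, 0.20],
                          'LUAD_Median': [1.5, -0.3]}),
    'EC': pd.DataFrame({'Phospho': ['B_T2', 'C_Y3'],
                        'EC_P_Value': [0.04, 0.60],
                        'EC_Median': [0.8, 0.1]}),
}

long_parts = []
for name, table in toy_tables.items():
    # Record which cancer each row came from, then drop the cancer prefix
    # so every piece has the same column names.
    part = table.assign(Cancer=name).rename(
        columns={name + '_P_Value': 'P_Value', name + '_Median': 'Medians'})
    long_parts.append(part)

# Stack the pieces row-wise into one long table.
toy_long = pd.concat(long_parts, ignore_index=True)
print(toy_long)
```

In the long table a site appears once per cancer in which it was measured, so the number of unique `Phospho` values can be smaller than the number of rows.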


In [7]:
#all_long_df.to_csv('csv/all_heatmap_phospho.csv', index=False)

# Make csv of only significant proteins (formatted for heatmap)

Read in list_sig_one_cancer_phospho_MAPK.csv. Convert it to a list of genes significant in at least 1 cancer. Keep only the rows of the df with all data whose gene is in that list.

In [8]:
sig = pd.read_csv('csv/list_sig_one_cancer_phospho_MAPK.csv')
list_sig = list(sig['0'])

In [9]:
# Keep genes with at least one sig ttest
bool_df = all_long_df['Phospho'].isin(list_sig)
sig_df = all_long_df[bool_df]
print('Check total sig genes:', len(sig_df.Phospho.unique()))
t = list(sig_df.Phospho)

Check total sig genes: 62
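The list-based filtering used above can be sketched on toy data. The long table and the csv text below are made up for illustration; the only detail taken from the notebook is that the list file has a single column whose header is `0`.

```python
import io
import pandas as pd

# Hypothetical long table in the same layout as all_long_df.
toy_long = pd.DataFrame({'Phospho': ['A_S1', 'B_T2', 'B_T2', 'C_Y3'],
                         'P_Value': [0.01, 0.20, 0.04, 0.60],
                         'Medians': [1.5, -0.3, 0.8, 0.1],
                         'Cancer': ['LUAD', 'LUAD', 'EC', 'EC']})

# In-memory stand-in for a one-column list file with header '0'.
csv_text = "0\nA_S1\nC_Y3\n"
toy_sig = pd.read_csv(io.StringIO(csv_text))
toy_list = list(toy_sig['0'])

# isin builds a boolean mask: True for rows whose site is in the list.
mask = toy_long['Phospho'].isin(toy_list)
toy_sig_df = toy_long[mask]
print(toy_sig_df)
```

Because the mask is computed per row, every cancer's row for a listed site is kept, including rows where that site was not significant in that particular cancer.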


In [10]:
#sig_df.to_csv('csv/sig_pval_heatmap_MAPK.csv', index=False)

# Make csv of proteins significant in multiple cancers (formatted for heatmap)

In [11]:
mult_sig = pd.read_csv('csv/list_sig_mult_cancers_phospho_MAPK.csv')
list_mult_sig = list(mult_sig['0'])

In [12]:
# Keep genes with a sig ttest in multiple cancers
bool_df = all_long_df['Phospho'].isin(list_mult_sig)
mult_df = all_long_df[bool_df]
print('Check total sig genes:', len(mult_df.Phospho.unique()))

Check total sig genes: 4


In [13]:
#mult_df.to_csv('csv/mult_sig_heatmap_MAPK.csv', index=False)