# Make Supplemental Tables and variations

This notebook combines the per-cancer data frames of FDR-corrected p-values and correlations for 8 cancers (GBM, HNSCC, LUAD, LSCC, BR, OV, CO, ccRCC). It makes the supplemental data table as well as the data frames used in downstream analysis.  
Description of created csv files:
* Supplemental_Table_2 - The supplemental table provided with the manuscript. It has the FDR-corrected p-values and correlations for all proteins.
* Supplemental_Table_EGFR_sig_only - A filtered version of Supplemental Table 2 that keeps only FDR-significant results (corrected p-value <= 0.05).
* all_prot_heatmap_EGFR - All data appended into a long table for easy use with the heatmap function.
* sig_prot_heatmap_EGFR - Only the significant results (corrected p-value <= 0.05), in long format for the heatmap.

In [1]:
import pandas as pd  # the only library used by the cells below



Read in the single-cancer data frames with FDR-corrected p-values and correlation values. These are merged into one pan-cancer data frame below.
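Every later cell assumes each per-cancer file has a `Comparison` column plus `Correlation_<code>` and `P_value_<code>` columns for its own cancer code. A quick check like the one below can catch a misnamed file early, before a wrong column name raises a `KeyError` in the filtering step or is silently skipped by `rename` in the heatmap loop. This is a sketch: `check_cancer_columns` is a hypothetical helper, and it runs on a two-row toy table rather than the real files.

```python
import pandas as pd

# Hypothetical helper: report which of the columns this notebook relies on
# (Comparison, Correlation_<code>, P_value_<code>) are missing from a table.
def check_cancer_columns(df: pd.DataFrame, cancer: str) -> list:
    """Return the expected column names that are missing from df."""
    expected = ["Comparison", f"Correlation_{cancer}", f"P_value_{cancer}"]
    return [col for col in expected if col not in df.columns]

# Toy table shaped like the GBM file (values copied from the notebook output).
toy_gbm = pd.DataFrame({
    "Comparison": ["EGFR", "PHLDA1"],
    "Correlation_GBM": [1.0, 0.816848],
    "P_value_GBM": [0.0, 3.507071e-21],
})

print(check_cancer_columns(toy_gbm, "GBM"))  # nothing missing
print(check_cancer_columns(toy_gbm, "OV"))   # both OV columns missing
```

In the notebook this could be run once per file, for example over pairs such as `("GBM", Gbm_df)`, before any merging.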

In [2]:
Gbm_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_GBM')
Hnscc_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_HNSCC')
Luad_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_LUAD')
Lscc_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_LSCC')
Brca_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_BR')
Ovarian_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_OV')
Colon_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_CO')
Kidney_df = pd.read_csv('csv_files/trans_effects_all_prot_fdr_corrected_ccRCC')


# Make Supplemental Table 2

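Merge all cancer data frames into one wide data frame (one row per protein, with one correlation column and one p-value column per cancer), then drop the EGFR self-comparison row.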
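The eight chained `pd.merge` calls can also be written as a fold over a list of tables. The sketch below uses `functools.reduce` on three small toy tables and drops the EGFR self-comparison by name. The non-EGFR values are copied from the cell [3] output; the EGFR rows for ccRCC and OV are assumed for illustration, and `merge_wide` is a hypothetical name. Row order is deliberately not relied on, because recent pandas versions sort the keys of an outer merge.

```python
from functools import reduce

import pandas as pd

# Toy per-cancer tables in the layout of the real csv files.
gbm = pd.DataFrame({"Comparison": ["EGFR", "PHLDA1", "CDH4"],
                    "Correlation_GBM": [1.0, 0.816848, 0.55918],
                    "P_value_GBM": [0.0, 3.507071e-21, 3.420388e-06]})
ccrcc = pd.DataFrame({"Comparison": ["EGFR", "PHLDA1", "CDH4"],
                      "Correlation_ccRCC": [1.0, 0.254436, 0.148407],
                      "P_value_ccRCC": [0.0, 0.060261, 0.51349]})
ov = pd.DataFrame({"Comparison": ["EGFR", "GRB2"],
                   "Correlation_OV": [1.0, -0.19009],
                   "P_value_OV": [0.0, 0.346111]})

def merge_wide(tables):
    """Outer-merge per-cancer tables on Comparison, keeping every protein."""
    wide = reduce(
        lambda left, right: pd.merge(left, right, on="Comparison", how="outer"),
        tables,
    )
    # Drop the EGFR-vs-EGFR row by name instead of by position.
    return wide[wide["Comparison"] != "EGFR"].reset_index(drop=True)

wide = merge_wide([gbm, ccrcc, ov])
print(wide)
```

GRB2 appears only in the OV toy table, so its GBM and ccRCC columns come out as NaN, just as proteins missing from a cancer do in Supplemental_Table_2.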

In [3]:
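# Outer merges keep every protein measured in at least one cancer;
# proteins missing from a cancer get NaN in that cancer's columns.
pancan = pd.merge(Gbm_df, Kidney_df, on="Comparison", how="outer")
pancan = pd.merge(pancan, Ovarian_df, on="Comparison", how="outer")
pancan = pd.merge(pancan, Luad_df, on="Comparison", how="outer")
pancan = pd.merge(pancan, Lscc_df, on="Comparison", how="outer")
pancan = pd.merge(pancan, Brca_df, on="Comparison", how="outer")
pancan = pd.merge(pancan, Colon_df, on="Comparison", how="outer")
pancan = pd.merge(pancan, Hnscc_df, on="Comparison", how="outer")
# Drop the EGFR-vs-EGFR self-comparison by name. Positional slicing
# (pancan[1:]) is fragile because newer pandas sorts outer-merge keys.
pancan = pancan[pancan["Comparison"] != "EGFR"]
pancan.to_csv('csv_files/Supplemental_Table_2.csv', index=False)
pancan.head()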

Unnamed: 0,Comparison,Correlation_GBM,P_value_GBM,Correlation_ccRCC,P_value_ccRCC,Correlation_OV,P_value_OV,Correlation_LUAD,P_value_LUAD,Correlation_LSCC,P_value_LSCC,Correlation_BR,P_value_BR,Correlation_CO,P_value_CO,Correlation_HNSCC,P_value_HNSCC
1,PHLDA1,0.816848,3.507071e-21,0.254436,0.060261,,,0.26011,0.07453,0.71342,2.644826e-14,0.364797,0.002164,0.386104,0.122847,0.664271,8.88864e-12
2,GRB2,-0.610889,6.72999e-08,-0.217427,0.120342,-0.19009,0.346111,-0.302439,0.020631,-0.198042,0.2437176,-0.177379,0.142733,0.15096,0.347409,-0.532341,3.320092e-06
3,SOCS2,0.56272,3.420388e-06,,,,,,,0.472624,0.01417921,,,,,0.020297,0.95573
4,CDH4,0.55918,3.420388e-06,0.148407,0.51349,,,,,,,,,,,,
5,DAB2,-0.556402,3.420388e-06,-0.076173,0.673774,0.076981,0.75051,-0.086403,0.597546,-0.072496,0.7501117,0.326055,0.003543,-0.147519,0.360266,-0.208437,0.149098


# Make Supplemental_Table_EGFR_sig_only

In [4]:
# Keep proteins whose FDR-corrected p-value is <= 0.05 in each cancer
Gbm_df_sig = Gbm_df.loc[Gbm_df["P_value_GBM"] <= 0.05]
Kidney_df_sig = Kidney_df.loc[Kidney_df["P_value_ccRCC"] <= 0.05]
Colon_df_sig = Colon_df.loc[Colon_df["P_value_CO"] <= 0.05]
Ovarian_df_sig = Ovarian_df.loc[Ovarian_df["P_value_OV"] <= 0.05]
Luad_df_sig = Luad_df.loc[Luad_df["P_value_LUAD"] <= 0.05]
Lscc_df_sig = Lscc_df.loc[Lscc_df["P_value_LSCC"] <= 0.05]
Brca_df_sig = Brca_df.loc[Brca_df["P_value_BR"] <= 0.05]
Hnscc_df_sig = Hnscc_df.loc[Hnscc_df["P_value_HNSCC"] <= 0.05]

In [5]:
# Outer-merge the significant-only tables; a cancer's columns are NaN
# wherever the protein is not significant in that cancer.
pancan = pd.merge(Gbm_df_sig, Kidney_df_sig, on="Comparison", how="outer")
pancan = pd.merge(pancan, Ovarian_df_sig, on="Comparison", how="outer")
pancan = pd.merge(pancan, Luad_df_sig, on="Comparison", how="outer")
pancan = pd.merge(pancan, Lscc_df_sig, on="Comparison", how="outer")
pancan = pd.merge(pancan, Brca_df_sig, on="Comparison", how="outer")
pancan = pd.merge(pancan, Colon_df_sig, on="Comparison", how="outer")
pancan = pd.merge(pancan, Hnscc_df_sig, on="Comparison", how="outer")
# Drop the EGFR self-comparison by name rather than by row position
pancan = pancan[pancan["Comparison"] != "EGFR"]
pancan.to_csv('csv_files/Supplemental_Table_EGFR_sig_only.csv', index=False)
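Filtering each cancer's table before the outer merge is one way to build the significant-only table. An alternative sketch starts from a table in the Supplemental_Table_2 layout, blanks out each cancer's correlation/p-value pair where the corrected p-value exceeds 0.05, and then drops proteins that are significant in no cancer. Assuming each protein appears once per cancer, this should keep the same proteins and values as the filter-then-merge approach, up to row order. `mask_nonsignificant` is a hypothetical helper, and the toy values (two cancers only) are copied from the cell [3] output.

```python
import numpy as np
import pandas as pd

# Toy wide table in the Supplemental_Table_2 layout (ccRCC and LUAD only).
wide = pd.DataFrame({
    "Comparison": ["PHLDA1", "GRB2", "DAB2"],
    "Correlation_ccRCC": [0.254436, -0.217427, -0.076173],
    "P_value_ccRCC": [0.060261, 0.120342, 0.673774],
    "Correlation_LUAD": [0.26011, -0.302439, -0.086403],
    "P_value_LUAD": [0.07453, 0.020631, 0.597546],
})

def mask_nonsignificant(wide, cancers, alpha=0.05):
    """Blank out correlation/p-value pairs with P_value > alpha, then drop
    proteins that are significant in no cancer."""
    out = wide.copy()
    for c in cancers:
        # NaN compares as False here, so missing values stay blank too.
        keep = out[f"P_value_{c}"] <= alpha
        out.loc[~keep, [f"Correlation_{c}", f"P_value_{c}"]] = np.nan
    p_cols = [f"P_value_{c}" for c in cancers]
    return out.dropna(subset=p_cols, how="all")

sig_wide = mask_nonsignificant(wide, ["ccRCC", "LUAD"])
print(sig_wide)
```

Only GRB2 survives here: it is significant in LUAD, while none of the three toy proteins is significant in ccRCC.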


# Make All Heatmap

In [6]:
# Create long df for heatmap: stack the per-cancer tables, renaming each
# cancer's columns to shared names and tagging rows with the cancer code.
cancer = ['GBM','HNSCC','LSCC','LUAD','BR','OV','ccRCC','CO']
merged_dfs = [Gbm_df,Hnscc_df,Lscc_df,Luad_df,Brca_df,Ovarian_df,Kidney_df,Colon_df]

long_parts = []
for c, m in zip(cancer, merged_dfs):
    m2 = m.assign(Cancer=c)
    m2 = m2.rename(columns={'P_value_' + c: 'P_Value',
                            'Correlation_' + c: 'Correlation'})
    long_parts.append(m2)

# DataFrame.append was removed in pandas 2.0; concatenate once instead.
all_long_df = pd.concat(long_parts)


In [7]:
all_long_df.to_csv('csv_files/all_prot_heatmap_EGFR.csv', index=False)
all_long_df

Unnamed: 0,Comparison,Correlation,P_Value,Cancer
0,EGFR,1.000000,0.000000e+00,GBM
1,PHLDA1,0.816848,3.507071e-21,GBM
2,GRB2,-0.610889,6.729990e-08,GBM
3,SOCS2,0.562720,3.420388e-06,GBM
4,CDH4,0.559180,3.420388e-06,GBM
...,...,...,...,...
7108,AK1,-0.000256,9.985768e-01,CO
7109,KRI1,-0.000217,9.986912e-01,CO
7110,MUL1,-0.000272,9.986912e-01,CO
7111,CADPS,0.000064,9.997745e-01,CO
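The loop in cell [6] builds the long table from the per-cancer data frames. The same long layout can be sketched from a wide table with `pd.wide_to_long`, which splits `Correlation_<code>` and `P_value_<code>` columns on the `_` separator; `suffix=r"\w+"` is needed because the cancer codes are letters, while the default suffix pattern only matches digits. Toy values are copied from the cell [3] output. Unlike `all_long_df`, a table built this way from Supplemental_Table_2 would lack the EGFR rows, and proteins absent from a cancer have to be dropped explicitly.

```python
import numpy as np
import pandas as pd

# Toy wide table in the Supplemental_Table_2 layout; SOCS2 has no ccRCC data.
wide = pd.DataFrame({
    "Comparison": ["SOCS2", "CDH4"],
    "Correlation_GBM": [0.56272, 0.55918],
    "P_value_GBM": [3.420388e-06, 3.420388e-06],
    "Correlation_ccRCC": [np.nan, 0.148407],
    "P_value_ccRCC": [np.nan, 0.51349],
})

long_df = (
    pd.wide_to_long(wide, stubnames=["Correlation", "P_value"],
                    i="Comparison", j="Cancer", sep="_", suffix=r"\w+")
    .reset_index()
    .rename(columns={"P_value": "P_Value"})
    # A protein missing from a cancer becomes an all-NaN row; drop it.
    .dropna(subset=["Correlation", "P_Value"], how="all")
)
print(long_df)
```

`wide_to_long` requires `Comparison` to be unique in the wide table, which holds after the outer merges.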


# Make Significant-Only Heatmap

In [8]:
only_sig_pvals = all_long_df.loc[(all_long_df["P_Value"] <= 0.05)]
only_sig_pvals.to_csv('csv_files/sig_prot_heatmap_EGFR.csv', index=False)
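A heatmap plotting function usually takes a protein-by-cancer matrix rather than this long table. The notebook's own heatmap function is not shown here, so the sketch below only assumes that common case: `DataFrame.pivot` turns the long significant-only table into a matrix of correlations (NaN where a protein is not significant in a cancer), which a function such as `seaborn.heatmap` can plot. The toy rows are copied from the outputs above.

```python
import pandas as pd

# Toy rows in the layout of sig_prot_heatmap_EGFR.csv (all P_Value <= 0.05).
sig_long = pd.DataFrame({
    "Comparison": ["PHLDA1", "GRB2", "PHLDA1", "GRB2"],
    "Correlation": [0.816848, -0.610889, 0.71342, -0.302439],
    "P_Value": [3.507071e-21, 6.72999e-08, 2.644826e-14, 0.020631],
    "Cancer": ["GBM", "GBM", "LSCC", "LUAD"],
})

# Proteins as rows, cancers as columns; missing combinations become NaN.
corr_matrix = sig_long.pivot(index="Comparison", columns="Cancer",
                             values="Correlation")
print(corr_matrix)
```

`pivot` raises a `ValueError` if a (Comparison, Cancer) pair occurs twice, which also serves as a check that the long table has no duplicated rows.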