Source code for "Individualized multi-omic pathway deviation scores using multiple factor analysis" (Rau et al., 2019)
This repository contains the following source code files used to analyze the TCGA breast and lung cancer multi-omic data in Rau et al. (2019) using padma.
The TCGA breast and lung cancer data were downloaded, formatted, and pre-processed as described in Rau et al. (2018); R scripts to perform these steps may be found in https://github.com/andreamrau/EDGE-in-TCGA, specifically in the 2_format_and_preprocess_TCGA.R scripts. In addition, the inferred AIMS subtypes for the TCGA breast cancer data found in the aims_subtypes.txt file may be obtained by running the AIMS_subtypes.R file in the same directory. In running each of these files in succession, the user obtains the two files LUAD_results.RData, which are both read in as input for the scripts included in this repo.
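Before running the scripts, it can help to check which objects an .RData input file actually contains. The following is a minimal sketch in base R, not code from this repository: it writes a small stand-in file so that it runs anywhere, and the object names `toy_counts` and `toy_labels` are purely illustrative. In practice, `rdata_path` would point to a real input file such as LUAD_results.RData.

```r
# Stand-in file so the sketch is self-contained; replace `rdata_path` with the
# path to a real input file (e.g. LUAD_results.RData) in practice.
rdata_path <- tempfile(fileext = ".RData")
toy_counts <- matrix(1:6, nrow = 2)  # illustrative object, not from the repo
toy_labels <- c("a", "b")            # illustrative object, not from the repo
save(toy_counts, toy_labels, file = rdata_path)

# Load into a separate environment so nothing in the workspace is overwritten.
inputs <- new.env()
loaded_names <- load(rdata_path, envir = inputs)

# load() returns the names of the restored objects.
print(loaded_names)

# Compact overview of each restored object.
str(as.list(inputs))
```

Loading into a dedicated environment rather than the global workspace avoids silently replacing objects that happen to share a name with those stored in the file.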
The remainder of the scripts and files are organized as follows:
BRCA_mutations.txt: counts of IntOGen driver gene mutations observed for each TCGA barcode.
intogen-BRCA_drivers-data.tsv: table of 184 mutational cancer drivers detected across multiple breast cancer projects.
MFA_BRCA.R: script running the padma approach on the batch-corrected TCGA breast cancer data. Loads script files from the 4_misc/ directory, looping over all MSigDB pathways and saving results into a named list.
pathology_report.txt: Histological grade measures for TCGA breast cancer individuals, obtained from http://legacy.dx.ai/tcga_breast. (NOTE: this link now appears to be broken!)
intogen-LUAD_drivers-data.tsv: table of 181 mutational cancer drivers detected across multiple lung cancer projects.
LUAD_mutations.txt: counts of IntOGen driver gene mutations observed for each TCGA barcode.
MFA_LUAD.R: script running the padma approach on the batch-corrected TCGA lung cancer data, looping over all MSigDB pathways and saving results into a named list. Loads script files from the
explore_single-omics.R: R script that calculates padma scores for RNA-seq data alone for a subset of BRCA pathways
healthy_validation.R: R script to perform the computational validation of padma scores using matched healthy and tumor multi-omic (RNA-seq, methylation, miRNA-seq) BRCA data
generalized_MFA_pathway_V3.R: main R script implementing the padma approach (pre-release of associated R package)
global_PCA.R: script performing the single-omic genome- and transcriptome-wide PCAs
paper_figures.R: R script reproducing all analysis figures from the main paper and supplementary materials
Plot_Function_0218_ar.R: R script containing some miscellaneous plot functions
simulations.R: R script to perform simulation study of padma
TCGA_batch_correction_v2.R: R script performing the per-omic batch correction for the BRCA and LUAD data obtained as described in Rau et al. (2018). The outputs of this script are the files LUAD_noBatch_v2.rds, which are input into the LUAD/MFA_LUAD.R files to run padma. These are omitted here due to space constraints.
hsa_MTI.xlsx: predicted miRNA-target interaction pairs in miRTarBase (version 7.0). To save space here, the spreadsheet has been pre-filtered to include only those pairs with the "Functional MTI" support type.
human_c2_v5p2.rdata: C2 curated gene sets from the Molecular Signatures Database (MSigDB), obtained from http://bioinf.wehi.edu.au/software/MSigDB. Corresponds to a named list of 4729 pathways containing Entrez IDs of member genes.
keggIDs_misc.txt: list of KEGG pathway IDs.
mmc1.xlsx: Table of standardized and curated clinical data included in the TCGA Pan-Cancer Clinical Resource (TCGA-CDR), including progression-free interval. This corresponds to Supplementary Table 1 of Liu et al. (2018).
msig_human.txt: Reformatted table of MSigDB pathways (human_c2_v5p2.rdata) providing gene symbols rather than Entrez IDs.
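
The MFA_BRCA.R and MFA_LUAD.R scripts are described above as looping over all MSigDB pathways and saving results into a named list, with the pathways themselves stored as a named list of Entrez IDs. The sketch below illustrates that general pattern in base R on toy data. It is not the code used in this repository: the pathway names, gene IDs, and the `score_pathway()` function are hypothetical placeholders, and the score computed here is a simple stand-in rather than the padma method.

```r
# Toy stand-in for the MSigDB C2 collection: a named list of Entrez ID vectors.
# Pathway names and gene IDs are illustrative, not taken from human_c2_v5p2.rdata.
pathways <- list(
  PATHWAY_A = c("1", "2", "3"),
  PATHWAY_B = c("2", "4"),
  PATHWAY_C = c("5", "6", "7", "8")
)

# Toy omics matrix: genes (Entrez IDs) in rows, individuals in columns.
set.seed(1)
expr <- matrix(rnorm(8 * 4), nrow = 8,
               dimnames = list(as.character(1:8), paste0("ind", 1:4)))

# Hypothetical placeholder for the per-pathway analysis; NOT the padma method.
# Returns, per individual, the mean absolute deviation from each gene's mean.
score_pathway <- function(gene_ids, mat) {
  sub <- mat[rownames(mat) %in% gene_ids, , drop = FALSE]
  centered <- sub - rowMeans(sub)  # subtract each gene's mean across individuals
  colMeans(abs(centered))
}

# Loop over all pathways, storing each result under its pathway name.
results <- vector("list", length(pathways))
names(results) <- names(pathways)
for (p in names(pathways)) {
  results[[p]] <- score_pathway(pathways[[p]], expr)
}

str(results)
```

Keying the results by pathway name means that the output for any single pathway can be retrieved directly, for example with `results[["PATHWAY_B"]]`, without tracking positions in the list.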