Cancer prognosis with hiPathia
This repository contains the data and R code for reproducing the numerical results of the following paper:
Yunlong Jiao, Marta Hidalgo, Cankut Cubuk, Alicia Amadoz, Jose Carbonell-Caballero, Jean-Philippe Vert, and Joaquin Dopazo. "Signaling Pathway Activities Improve Prognosis for Breast Cancer." bioRxiv preprint bioRxiv-132357, 2017.
See the notebook results/notebook.[md|html] for the detailed pipeline, code and results of the numerical experiments in this study.
The top-level structure of the repository is as follows:
- data/ - static RData files used to run the experiments, including:
  - other.genes.vals.pt[1|2].RData: gene-level profiles of breast tumors;
  - surv.grps.RData: the donors' vital status outcomes (this file and the gene-level profiles were all downloaded from release 20 of the TCGA-ICGC data portal and further processed as described in the paper);
  - path.vals.RData: pathway-level profiles, computed from gene expression with the pathway analysis tool hiPathia;
  - go.vals.RData: function-level profiles, computed from pathway activities with UniProt or GO annotations;
  - fpgs.RData: detailed information on the KEGG pathways modeled in the paper.
- results/ - results of the numerical experiments, including:
  - notebook.html: the project notebook, with the raw code in notebook.Rmd, a Markdown version in notebook.md, and figures saved in …;
  - results.scores.txt: evaluation scores of prediction performance;
  - results.othergenes.txt: the top features selected in each type of profile;
  - runPredict.R: the R script that makes predictions using the different profiles, accompanied by an example shell script, runPredict.sh, that submits parallelized tasks to an SGE cluster with parameters defined in …
- src/ - source code and general-purpose scripts, including:
  - func.R: general functions and classifiers used throughout the study.
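Each .RData file under data/ may hold one or more R objects whose names are not listed above. A minimal sketch for inspecting them without cluttering the global environment loads a file into a fresh environment and returns its contents as a named list. The helper load_rdata is hypothetical (it is not part of src/func.R), and the object names inside each file are not assumed.

```r
# Load all objects saved in an .RData file into a named list instead of
# the global environment, so objects from different files cannot clash.
load_rdata <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the names it created
  mget(obj_names, envir = env)
}

# Hypothetical usage from the repository root:
# path_vals <- load_rdata("data/path.vals.RData")
# names(path_vals)       # which objects the file contains
# str(path_vals[[1]])    # structure of the first object
```

Keeping each file in its own list makes it easy to compare, say, the pathway-level and function-level profiles side by side.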
To build the project notebook results/notebook.html locally, first make sure the following R packages are installed on your machine (or install them with the command given in the comment next to each):
```r
require(rmarkdown)    # install.packages('rmarkdown')
require(knitr)        # install.packages('knitr')
require(devtools)     # install.packages('devtools')
require(ggplot2)      # install.packages('ggplot2')
require(igraph)       # install.packages('igraph')
require(org.Hs.eg.db) # source('https://bioconductor.org/biocLite.R'); biocLite('org.Hs.eg.db')
require(GO.db)        # source('https://bioconductor.org/biocLite.R'); biocLite('GO.db')
```
then run the following in a shell; building the project notebook should take only a few seconds:
```shell
git clone git@github.com:YunlongJiao/hipathiaCancerPrognosis.git
cd hipathiaCancerPrognosis/results/
Rscript -e "rmarkdown::render('notebook.Rmd', output_format = 'all')"
```
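If rendering stops with a missing-package error, the sketch below reports which of the packages listed above are not installed. It is an illustrative check, not a script shipped with the repository; the helper is_installed is hypothetical, and the package lists are copied from the require() calls above.

```r
# Packages needed to build results/notebook.html, copied from the list above.
cran_pkgs <- c("rmarkdown", "knitr", "devtools", "ggplot2", "igraph")
bioc_pkgs <- c("org.Hs.eg.db", "GO.db")

# requireNamespace() checks availability without attaching the package.
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

missing_cran <- cran_pkgs[!is_installed(cran_pkgs)]
missing_bioc <- bioc_pkgs[!is_installed(bioc_pkgs)]

if (length(missing_cran) > 0) {
  message("Missing CRAN packages: ", paste(missing_cran, collapse = ", "))
}
if (length(missing_bioc) > 0) {
  message("Missing Bioconductor packages: ", paste(missing_bioc, collapse = ", "))
}
```

On recent R versions the biocLite.R installer has been retired by Bioconductor, so missing Bioconductor packages are installed with BiocManager::install() instead.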
Note that building the project from scratch requires running the predictions first: runPredict.R must be run to produce the three results files results/results.[scores|pathways|othergenes].txt before the project notebook can be built. See results/notebook.html for details.
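Before rendering, it can help to confirm that the prediction step has already produced the three results files. The following minimal sketch expands the bracket notation results.[scores|pathways|othergenes].txt and lists the files that do not exist yet; the helper names result_files and missing_results are hypothetical, and the default directory assumes the code runs from the repository root.

```r
# Expand results.[scores|pathways|othergenes].txt into three file paths.
result_files <- function(dir = "results") {
  file.path(dir, paste0("results.", c("scores", "pathways", "othergenes"), ".txt"))
}

# Return the expected results files that are not present yet.
missing_results <- function(dir = "results") {
  files <- result_files(dir)
  files[!file.exists(files)]
}

# From the repository root; use dir = "." when already inside results/.
absent <- missing_results()
if (length(absent) > 0) {
  message("Missing results files (run runPredict.R first): ",
          paste(absent, collapse = ", "))
}
```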
- hiPathia - Signaling pathway model
- Yunlong Jiao - main contributor
- Marta Hidalgo - data processing with hiPathia