WuOmicsLab/ReProMSig

ReProMSig

Description

The Reproducible Prognosis Molecular Signature (ReProMSig) platform (https://omics.bjcancer.org/prognosis/) helps users develop and validate multivariable prognostic/predictive biomarkers in a transparent and reproducible way. Its main features are:

  • It streamlines the development of a multivariable prediction model from molecular profiles and/or clinicopathological factors, as well as the evaluation of the model's prognostic and/or predictive value.

  • The full details of the modelling procedures and results can be exported as a signature report file (example report), which is structured according to the TRIPOD statement.

  • Registered users get long-term storage and management of their datasets and signatures.

  • Risk assessment for a single patient using a developed biomarker is provided for research purposes.

This repository hosts the source code of the ReProMSig analysis pipeline, which can be used for local analysis. The generated signature models can also be uploaded to the ReProMSig web server for public sharing.

Installation

System requirements: R >= 3.6.1 and Python.

  1. Run the installation script to install the required packages automatically before using the pipeline.
Rscript scripts/package.install.R

Note: if automatic installation fails for some packages, please install the failed packages manually, as shown below:

# Linux users: specify the specific version, e.g.
remotes::install_version("glmnet", version = "3.0-2", repos = "https://cran.us.r-project.org")

# Mac/Windows users: install the package in binary format (type = "binary"), e.g.
install.packages('glmnet', type='binary')
  2. Install the shyaml package, which is used to process the user-provided config files (YAML format).
pip install shyaml
  3. Install pandoc (v1.12.3 or higher) to convert the R Markdown document into the HTML report. If pandoc is already installed on your system, run pandoc -v to check its version.
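The pandoc version requirement in the last step can be checked programmatically. The following is a minimal Python sketch, not part of the pipeline: the function names are hypothetical, and it assumes that the first line of the `pandoc -v` output contains the dotted version number.

```python
import re
import subprocess

MIN_PANDOC = (1, 12, 3)  # minimum version stated in the installation notes


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def pandoc_is_recent_enough(version_output, minimum=MIN_PANDOC):
    """Compare the first line of `pandoc -v` output against the minimum."""
    first_line = (version_output.splitlines() or [""])[0]
    return parse_version(first_line) >= minimum


def installed_pandoc_output():
    """Return the output of `pandoc -v`, or None if pandoc cannot be run."""
    try:
        result = subprocess.run(
            ["pandoc", "-v"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout
```

A possible use is to call `installed_pandoc_output()` and, if it does not return `None`, pass the result to `pandoc_is_recent_enough()` before starting an analysis.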

Prepare data before running

The portal script repromsig.sh takes two config files in YAML format as input. Users also need to provide the clinicopathological and/or molecular profiles that will be used as training and validation cohort(s).

1) Config file for analysis (YAML format)

This YAML file specifies the data paths and analysis parameters. Please see ColoGuide_Stage_II_local/input/analysis.yaml for an example and config/analysis.default.setting.yaml for a complete list of configurations.

2) Config file for reporting (YAML format)

This YAML file contains the structured information needed to generate the reporting file of a developed signature, according to the TRIPOD guideline. Please see ColoGuide_Stage_II_local/input/reporting.yaml for an example.

3) Clinicopathological / Molecular profile files

Please visit the ReProMSig tutorial (section '1.1 Private datasets') for file format details.

  • Patient annotation: The patient annotation file consists of three groups of columns: fixed columns, endpoint columns and custom columns. The names of the fixed columns should be identical to those in the table template file. These clinicopathological parameters are used to select the samples suitable for analysis, as well as for prognosis model development. Please note that missing values in all columns should be provided as "NA". You can generate the formatted patient annotation file by modifying the template file.

  • Molecular profiles: The molecular profile file is a feature-by-sample matrix in TXT/CSV/Excel format. Feature IDs should be provided in the first column, under the column name "ID". The remaining columns hold the molecular features of each sample, with sample identifiers as column names. Molecular features can be gene mutation status, mRNA/non-coding RNA/protein quantification levels, methylation levels, etc.
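The two format rules stated above (a first column named "ID" in the molecular profile, and "NA" for missing values in the patient annotation) can be checked before running the pipeline. The sketch below is illustrative only. It assumes tab-delimited TXT input. The function names and the example column names (`sample`, `age`) are hypothetical and are not taken from the template file.

```python
import csv
import io


def check_molecular_profile(handle, delimiter="\t"):
    """Return a list of problems found in a feature-by-sample matrix."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    if not rows or not rows[0]:
        return ["file is empty or has no header"]
    header, body = rows[0], rows[1:]
    problems = []
    if header[0] != "ID":
        problems.append('first column must be named "ID", found %r' % header[0])
    # Every feature row needs one value per sample column.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                "line %d has %d fields, expected %d"
                % (line_no, len(row), len(header))
            )
    return problems


def blank_cells(handle, delimiter="\t"):
    """Return (line number, column name) pairs for cells that should be "NA"."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    found = []
    for row in reader:
        for name, value in row.items():
            # None keys/values come from rows of unexpected length; skip them.
            if name is None or value is None:
                continue
            if value.strip() == "":
                found.append((reader.line_num, name))
    return found


# Toy inputs; the column names below are placeholders.
profile = io.StringIO("ID\tS1\tS2\nTP53\t1\t0\nKRAS\t0\n")
annotation = io.StringIO("sample\tage\nS1\t63\nS2\t\n")
print(check_molecular_profile(profile))
print(blank_cells(annotation))
```

For CSV input, `delimiter=","` can be passed instead. Excel files would need to be exported to text first, since this sketch only uses the standard library.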

Usage

# The portal script is repromsig.sh, which takes the two aforementioned YAML files as input.
bash scripts/repromsig.sh [analysis.yaml] [reporting.yaml]

# Running an example project
bash scripts/repromsig.sh ColoGuide_Stage_II_local/input/analysis.yaml  ColoGuide_Stage_II_local/input/reporting.yaml
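When the pipeline is launched from another program, the same command can be assembled in Python. This is a minimal sketch with hypothetical function names. It assumes it is run from the repository root, and that repromsig.sh exits with a non-zero status on failure, which is not confirmed here.

```python
import subprocess
from pathlib import Path


def build_command(analysis_yaml, reporting_yaml, script="scripts/repromsig.sh"):
    """Return the argument list for the portal script after checking both configs exist."""
    for path in (analysis_yaml, reporting_yaml):
        if not Path(path).is_file():
            raise FileNotFoundError("config file not found: %s" % path)
    return ["bash", script, str(analysis_yaml), str(reporting_yaml)]


def run_pipeline(analysis_yaml, reporting_yaml):
    """Run the pipeline; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(build_command(analysis_yaml, reporting_yaml), check=True)
```

For the example project, `run_pipeline("ColoGuide_Stage_II_local/input/analysis.yaml", "ColoGuide_Stage_II_local/input/reporting.yaml")` would issue the same command as the shell line above.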

This analysis creates several output sub-folders:

  • rda dir: information extracted from the analysis.yaml file.

  • model dir: output files from signature modelling.

  • independence dir: results of the independence test.

  • performance dir: discrimination and calibration evaluation.

  • external_evaluate dir: inspection of survival differences between risk groups.

  • tripod dir: summary tables and figures for TRIPOD reporting.

  • upload dir: the RData file and the HTML report, which can be uploaded to the "My signature" module of the ReProMSig web server for sharing.
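After a run, a quick check can confirm that the output sub-folders described above were created. The sketch below assumes that the folders are named exactly rda, model, independence, performance, external_evaluate, tripod and upload, and that they sit directly under one project output directory. Both points are assumptions, and the function name is hypothetical.

```python
from pathlib import Path

# Sub-folder names taken from the description above; exact spelling is assumed.
EXPECTED_DIRS = [
    "rda",
    "model",
    "independence",
    "performance",
    "external_evaluate",
    "tripod",
    "upload",
]


def missing_output_dirs(project_dir):
    """Return the expected sub-folders that do not exist under `project_dir`."""
    root = Path(project_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means every expected sub-folder is present. Any names returned point to pipeline stages whose output is worth inspecting.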

Script details

repromsig.sh utilizes multiple scripts to perform data processing, extracting, modelling and reporting, as shown below:

| Script | Description |
| --- | --- |
| ymal.process.R | Process the data and extract the analysis parameters from the user-provided analysis.yaml. |
| model.analysis.R | Perform predictor selection, multivariable prediction model building, signature score calculation and patient risk group stratification. |
| independence.analysis.R | Perform univariate and multivariate Cox regression analyses to test whether the signature is an independent prognostic or predictive factor. |
| performance.analysis.R | Perform model evaluation, including time-dependent receiver operating characteristic (ROC), prediction error (PE) and calibration analysis. |
| KM.evaluate.analysis.R | Inspect the survival differences between risk groups by Kaplan-Meier analysis and log-rank test. |
| tripod.report.input.R | Generate the variables, tables and graphs for a signature that will be shown in the reporting file. |
| tripod.report.html.R | Export the HTML report by extracting data from the analysis output and the user-provided "tripod yaml file". |
| local.rdata.generate.R | Export the RData file that can be uploaded to the ReProMSig website for display and sharing. |

License

ReProMSig is free for academic users for non-commercial purposes. Commercial use of ReProMSig requires a license. If you use the ReProMSig package in your analysis, please cite it.

Contact information

Lihua Cao (lihuacao@bjcancer.org), Tingting Zhao (zhaott@bjmu.edu.cn), Jianmin Wu (wujm@bjmu.edu.cn).
