The code and data provided here reproduce the results used to generate Figure 1 and Supplementary Figures 1-4 of the manuscript "gwasurvivr: an R package for genome wide survival analysis".
The preprint is available on bioRxiv (https://www.biorxiv.org/content/early/2018/05/18/326033).
The package can be found on Bioconductor: https://www.bioconductor.org/packages/gwasurvivr
The package repository can be found at https://github.com/suchestoncampbelllab/gwasurvivr
To download the code and simulated data needed to reproduce the results, please clone this repository. Git Large File Storage (Git LFS) is needed to clone the genotype data. Please install it before cloning the repository.
git lfs clone https://github.com/suchestoncampbelllab/gwasurvivr_manuscript.git
To address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting fast genome wide survival analyses using VCF (output from the Michigan or Sanger imputation servers) and IMPUTE2 files. We benchmarked gwasurvivr against other GWAS software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools) and demonstrate improved scalability, including shorter runtimes for large sample sizes and larger numbers of SNPs.
Folders in the repository are organized by the performance tests of gwasurvivr and other GWAS software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools). Each folder contains code, results and data subdirectories.
Scripts used for software comparisons and to generate figures used in the manuscript are kept under code.
- benchmark_experiments: Experiments comparing computational time among software for 100K SNPs, including 3 non-SNP covariates, at sample sizes of 100, 1000 and 5000. Each analysis was run in triplicate to account for variability.
- diff_cov_benchmarks: Computational time was compared across software for different numbers of non-SNP covariates included in the model. Tests were run with 4, 8 and 12 covariates.
- figures: Figures included in the main paper and supplemental materials.
- full_gwas_experiments: Experiments to determine the computational time of the gwasurvivr::impute2CoxSurv() function on simulated genome-wide data including all 22 chromosomes.
- hapgen2: Directory containing the 1000 Genomes CEU data for chromosome 18 and the simulated results. Please refer to the /gwasurvivr_manuscript/hapgen2/code/generate* files for how the simulated data were generated. The data are compressed and should be unzipped to fully replicate the results without generating your own data.
- largeN_experiments: Tests of the computing time of gwasurvivr::impute2CoxSurv() for sample sizes of 15K, 20K and 25K.
- supplemental_data: Code for supplemental figures.
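
The benchmark folders time each analysis in triplicate. As a rough illustration of that idea only, the sketch below runs an arbitrary command three times and prints the wall-clock time of each run. The function name time_in_triplicate is hypothetical and is not part of this repository; the actual benchmarking scripts are the ones kept under each folder's code subdirectory, and they may measure time differently.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a repository script):
# run the given command three times and report elapsed wall-clock seconds per run.
time_in_triplicate() {
  for run in 1 2 3; do
    start=$(date +%s)            # seconds since the epoch before the run
    "$@" >/dev/null 2>&1         # run the command, discarding its own output
    end=$(date +%s)              # seconds since the epoch after the run
    echo "run ${run}: $((end - start)) s"
  done
}

# Placeholder usage: substitute the analysis command being benchmarked for `sleep 1`.
time_in_triplicate sleep 1
```

Because date +%s has one-second resolution, this approach only suits runs that take many seconds or longer; shorter commands would need a finer-grained timer.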