suchestoncampbelllab/gwasurvivr_manuscript

Introduction

The code and data provided here reproduce the results used to generate Figure 1 and Supplementary Figures 1-4 in the manuscript "gwasurvivr: an R package for genome wide survival analysis".

The preprint is available on bioRxiv (https://www.biorxiv.org/content/early/2018/05/18/326033).

The package can be found on Bioconductor: https://www.bioconductor.org/packages/gwasurvivr

The package repository can be found at https://github.com/suchestoncampbelllab/gwasurvivr

To download the code and simulated data needed to reproduce the results, clone this repository. Git Large File Storage (Git LFS) is required to clone the genotype data, so install it before cloning the repository.

git lfs clone https://github.com/suchestoncampbelllab/gwasurvivr_manuscript.git
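
The install-then-clone order described above can be sketched as a small shell helper. This is a minimal sketch, not part of the repository: the function name `clone_with_lfs` and the variable `REPO_URL` are hypothetical and used only for illustration. It assumes `git` is already installed and only checks that the Git LFS extension is available before cloning.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): refuse to clone
# until Git LFS is available, so the genotype data is fetched correctly.

REPO_URL="https://github.com/suchestoncampbelllab/gwasurvivr_manuscript.git"

clone_with_lfs() {
    # Fail early if the git-lfs extension is missing.
    if ! git lfs version >/dev/null 2>&1; then
        echo "Git LFS not found; install it before cloning." >&2
        return 1
    fi

    # Register the LFS filters in the current user's Git configuration.
    git lfs install

    # Same command as in this README. Newer Git LFS releases mark
    # `git lfs clone` as deprecated; a plain `git clone` also fetches
    # LFS-tracked files once the filters are registered.
    git lfs clone "$1"
}
```

After sourcing the snippet, the clone would be started with `clone_with_lfs "$REPO_URL"`. If Git LFS is missing, the function prints a message and returns a non-zero status instead of cloning.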

Abstract

To address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting fast genome wide survival analyses using VCF files (output from the Michigan or Sanger imputation servers) and IMPUTE2 files. We benchmarked gwasurvivr against other GWAS software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools) and demonstrated improved scalability, including shorter runtimes for large sample sizes and larger numbers of SNPs.

Repo Structure

Folders in the repository are organized by the performance tests of gwasurvivr and other GWAS software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools). Each folder contains code, results and data subdirectories. Scripts used for the software comparisons and to generate the figures in the manuscript are kept under code.

  • benchmark_experiments: Experiments comparing computational time among software for 100K SNPs with 3 non-SNP covariates, at sample sizes of 100, 1000 and 5000. Each analysis was run in triplicate to account for variability.

  • diff_cov_benchmarks: Comparison of computational time across software for different numbers of non-SNP covariates in the model. Tests were run with 4, 8 and 12 covariates.

  • figures: Figures included in the main paper and supplemental materials.

  • full_gwas_experiments: Experiments to determine the computational time of the gwasurvivr::impute2CoxSurv() function on simulated genome-wide data covering all 22 chromosomes.

  • hapgen2: Directory containing the 1000 Genomes CEU data for chromosome 18 and the simulated results. Please refer to the /gwasurvivr_manuscript/hapgen2/code/generate* files for how the simulated data were generated. The data are compressed and must be unzipped to fully replicate the results without generating your own data.

  • largeN_experiments: Tests of the computing time of gwasurvivr::impute2CoxSurv() for sample sizes of 15K, 20K and 25K.

  • supplemental_data: Code for supplemental figures.

About

Code and figures used for the gwasurvivr paper published in Bioinformatics.
