Code availability and reproducibility for Guillen et al. 2021

MHBailey/pdxo_2021_paper

pdxo_2021_paper

Code availability and reproducibility for Guillen et al. 2021

This project provides two Snakemake workflows that recreate the genomics and drug screening analyses for the manuscript "A breast cancer patient-derived xenograft and organoid platform for drug discovery and precision oncology" by Guillen et al. 2021. I'm one of the co-authors, and now that the paper has been published I'm providing the code I used to generate the base figures for these two sets of analyses. Many of these figures were touched up in Adobe Illustrator, but the base figures are provided here.

Each directory contains a Snakemake workflow (Snakefile) that executes the commands. Data for this project will not be stored in this GitHub repo. You will need to get permission from dbGaP, download the relevant data, and place the files here (I'll add links as I make the data available on dbGaP); after that the workflows should run. Some of the publicly available data that I formatted (for example, driver mutation data from TCGA files, or other gene lists) are provided here. (More to come on this front.)

STEP 0: Install Miniconda

Follow the installation instructions provided by Miniconda.
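Before moving on, it can help to confirm that conda is actually on your PATH. The snippet below is a minimal sketch; `check_conda` is a hypothetical helper name, not something shipped in this repo.

```shell
# Hypothetical helper (not part of this repo): report the conda version,
# or fail with a message if conda is not on PATH.
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        conda --version
    else
        echo "conda not found on PATH; install Miniconda first" >&2
        return 1
    fi
}

# Usage:
#   check_conda
```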

STEP 1: Packages to install

Because of some odd dependency conflicts with the DESeq2 libraries, I've set up three separate conda environments: stats, rnaseq2, and gr50. The YAML file for each is in the main directory of the repo.

STEP 1.1 Create the stats env

conda env create -f stats.yml

STEP 1.2 Create the rnaseq2 env

conda env create -f rnaseq2.yml

STEP 1.3 Create the gr50 env

conda env create -f gr50.yml
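The three commands in STEP 1.1 to 1.3 can also be run in one go. This is a sketch only: `create_envs` is a hypothetical helper name, and it assumes you are in the main directory of the repo (where the .yml files live) and that none of the environments exist yet.

```shell
# Hypothetical helper (not part of this repo): create the stats, rnaseq2,
# and gr50 environments from the YAML files in the current directory.
create_envs() {
    for env in stats rnaseq2 gr50; do
        if [ -f "${env}.yml" ]; then
            # Same command as in STEP 1.1-1.3; stop if conda reports an error.
            conda env create -f "${env}.yml" || return 1
        else
            echo "missing ${env}.yml; run this from the main directory of the repo" >&2
            return 1
        fi
    done
}

# Usage:
#   create_envs
```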

STEP 2: Building the genomics figures

This code is organized using Snakemake. Once the data has been acquired through dbGaP, each of the base figures (before tidying them up in Adobe Illustrator) can be generated one at a time or all together.

I was extra careful with the conda environments to get this off the ground, but I found that bioconda's DESeq2 didn't play well with other packages such as ComplexHeatmap. In practice this means you'll sometimes have to switch environments to get the analysis working correctly: all rules that require the rule normalize_RNA have to be run with source activate rnaseq2 and source activate stats.

STEP 2.1 (NOTE: make sure to use the correct environment with each snakemake command; I will clean this up over the next couple of weeks)

source activate stats
source activate rnaseq2

STEP 2.2 (use -np for a dry-run)

snakemake -p Figure_1a
snakemake -p Figure_1b
snakemake -p Figure_4b
snakemake -p Figure_4c
snakemake -p Figure_57b
snakemake -p Figure_66c
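If you want to build (or dry-run) all of the genomics targets together, the commands above can be wrapped in a loop. This is a minimal sketch: `build_genomics_figures` is a hypothetical helper name, and it assumes the correct environment is already active. It stops at the first target that fails, so you can switch environments and rerun.

```shell
# Hypothetical helper (not part of this repo): run snakemake for each
# genomics target listed in STEP 2.2, stopping at the first failure.
build_genomics_figures() {
    # First argument is the snakemake flag: "-np" for a dry run,
    # "-p" (the default) to actually build.
    flags="${1:--p}"
    for target in Figure_1a Figure_1b Figure_4b Figure_4c Figure_57b Figure_66c; do
        snakemake "$flags" "$target" || return 1
    done
}

# Usage:
#   build_genomics_figures -np   # dry run first
#   build_genomics_figures       # then build
```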

STEP 3: Building the screening figures

The drug screening data are provided as supplementary tables in the final publication. All you should have to do for this section is download the .xlsx files and run the workflow. Over the next few weeks I'll be cleaning up the code (removing redundant or extraneous analyses that didn't contribute to the paper).

Please note that the figures must be generated in the order listed below for the workflow to succeed.

STEP 3.1 (NOTE: make sure to use the correct environment with each snakemake command; I will clean this up over the next couple of weeks)

source activate gr50
cd DrugScreeningFigures
snakemake -p figure_67b
snakemake -p figure_61
snakemake -p figure_5b
conda deactivate
cd ../

STEP 3.2 (use -np for a dry-run)

source activate stats
cd DrugScreeningFigures
snakemake -p figure_5a
snakemake -p figure_58
snakemake -p figsupp67
conda deactivate
cd ../
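Because the order matters, STEP 3.1 and 3.2 can be wrapped in a helper that stops as soon as one target fails. This is a sketch under stated assumptions: `run_targets` is a hypothetical helper name, and instead of `source activate` it uses `conda run -n`, which runs a single command inside a named environment without activating it in the current shell.

```shell
# Hypothetical helper (not part of this repo): build the given targets,
# in the order given, inside the named conda environment.
# Uses "conda run -n ENV" in place of "source activate ENV".
run_targets() {
    env_name="$1"
    shift
    for target in "$@"; do
        # Stop at the first failure so later figures are never built
        # out of order.
        conda run -n "$env_name" snakemake -p "$target" || return 1
    done
}

# Usage (from the main directory of the repo):
#   cd DrugScreeningFigures
#   run_targets gr50 figure_67b figure_61 figure_5b &&
#       run_targets stats figure_5a figure_58 figsupp67
#   cd ../
```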
