Code availability and reproducibility for Guillen et al. 2021
This project provides two Snakemake workflows that recreate the genomics and drug screening analyses for the manuscript "A breast cancer patient-derived xenograft and organoid platform for drug discovery and precision oncology" by Guillen et al. 2021. I'm one of the co-authors, and I'm providing the code that I used to generate the base figures for these two sets of analyses. Many of these figures were "touched up" in Adobe Illustrator, but the base figures are provided here.
Each directory contains a Snakemake workflow (Snakefile) that executes the commands. Data for this project are not stored in this GitHub repo; you will need to obtain those files and place them here (I'll add links as I make them available on dbGaP). Some of the publicly available data that I formatted (for example, driver mutation data from TCGA files, or other gene lists) are provided here. Once you have permission from dbGaP and have pulled in the relevant data, the workflows should run. (More to come on this front.)
Install Miniconda by following its installation instructions.
Because of dependency conflicts involving the DESeq2 libraries, I've set up three separate conda environments: stats, rnaseq2, and gr50 (the .yml files for all three are in the main directory of the repo).
conda env create -f stats.yml
conda env create -f rnaseq2.yml
conda env create -f gr50.yml
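The three commands above can also be wrapped in a small loop that skips any environment whose .yml file is missing. This is only a sketch: the function name create_envs and the CONDA_BIN override are hypothetical, and it assumes the .yml files sit in the directory you pass in (the main directory of the repo, per the text above).

```shell
#!/usr/bin/env bash
# Sketch: create the stats, rnaseq2, and gr50 environments from their .yml files.
# CONDA_BIN is a hypothetical override (defaults to conda) so the loop can be
# previewed without conda, e.g. CONDA_BIN=echo.
create_envs() {
    local repo_dir="${1:-.}"
    local conda_bin="${CONDA_BIN:-conda}"
    local env yml
    for env in stats rnaseq2 gr50; do
        yml="${repo_dir}/${env}.yml"
        if [ ! -f "$yml" ]; then
            # Report the missing file on stderr and move on to the next environment.
            echo "missing ${yml}, skipping" >&2
            continue
        fi
        # Stop at the first environment that fails to build.
        "$conda_bin" env create -f "$yml" || return 1
    done
}
```

From the repo's main directory, `create_envs .` runs the same three commands; `CONDA_BIN=echo create_envs .` only prints what would be run.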
This code is organized using Snakemake. After the data are acquired through dbGaP, each of the base figures (before tidying them up in Adobe Illustrator) can be generated one at a time or all together.
I was extra careful with the conda environments to get this off the ground, but I found that bioconda's DESeq2 didn't play well with other packages like ComplexHeatmap. As a result, you will sometimes have to switch environments to get the analysis working correctly: all rules that require the rule normalize_RNA have to be run with source activate rnaseq2 and stats.
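One way to tell whether a given figure target pulls in normalize_RNA is to ask Snakemake for a dry run and look for the rule name in the job plan. The sketch below is an assumption about how you might check this, not part of the repo: the function name needs_normalize_rna and the SNAKEMAKE_BIN override are hypothetical, and newer Snakemake releases may require extra flags such as --cores.

```shell
#!/usr/bin/env bash
# Sketch: report whether a target's dry-run job plan mentions normalize_RNA.
# SNAKEMAKE_BIN is a hypothetical override (defaults to snakemake).
needs_normalize_rna() {
    local target="$1"
    local snakemake_bin="${SNAKEMAKE_BIN:-snakemake}"
    # -n requests a dry run: Snakemake lists the jobs it would run without executing them.
    # The function's exit status is grep's: 0 if the rule name appears, non-zero otherwise.
    "$snakemake_bin" -n "$target" 2>&1 | grep -q 'normalize_RNA'
}
```

For example, `needs_normalize_rna Figure_1a && echo "switch environments first"`. Note that a dry run only lists jobs that still need to run, so a target whose normalize_RNA outputs are already up to date will not be flagged.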
STEP 2.1 (NOTE: make sure to use the correct environment with each snakemake command; I will clean this up over the next couple of weeks)
source activate stats
source activate rnaseq2
snakemake -p Figure_1a
snakemake -p Figure_1b
snakemake -p Figure_4b
snakemake -p Figure_4c
snakemake -p Figure_57b
snakemake -p Figure_66c
The drug screening data are provided as supplementary tables in the final publication. All you should have to do for this section is download the .xlsx files and run the workflow. Over the next few weeks I'll be cleaning up the code (removing redundant or extraneous analysis that didn't contribute to the paper).
Please note that the order in which the figures are generated is critical: run the commands in the order listed below.
STEP 3.1 (NOTE: make sure to use the correct environment with each snakemake command; I will clean this up over the next couple of weeks)
source activate gr50
cd DrugScreeningFigures
snakemake -p figure_67b
snakemake -p figure_61
snakemake -p figure_5b
conda deactivate
cd ../
source activate stats
cd DrugScreeningFigures
snakemake -p figure_5a
snakemake -p figure_58
snakemake -p figsupp67
conda deactivate
cd ../
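Because the order and the environment both matter here, the sequence above could be captured in one function that stops at the first failure. This is a sketch under stated assumptions rather than the workflow's own method: it swaps `source activate` / `conda deactivate` for `conda run -n ENV`, it must be called from inside DrugScreeningFigures, and the function name and CONDA_BIN override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run the drug-screening targets in the order listed above,
# each inside the environment it is paired with in STEP 3.1.
# Uses `conda run -n ENV` instead of source activate (an assumption).
# CONDA_BIN is a hypothetical override (defaults to conda).
run_drug_screening_figures() {
    local conda_bin="${CONDA_BIN:-conda}"
    local pair env target
    for pair in \
        gr50:figure_67b gr50:figure_61 gr50:figure_5b \
        stats:figure_5a stats:figure_58 stats:figsupp67; do
        env="${pair%%:*}"      # text before the colon: environment name
        target="${pair#*:}"    # text after the colon: snakemake target
        "$conda_bin" run -n "$env" snakemake -p "$target" || {
            echo "failed at ${target} (env ${env})" >&2
            return 1
        }
    done
}
```

Running `CONDA_BIN=echo run_drug_screening_figures` prints the six commands without executing them, which is a cheap way to confirm the order before committing to a full run.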