BHKLAB-DataProcessing/GDSC-Pharmacoset_Snakemake

Current Implementation

PSet

Workflow Directory:

Directory Structure
workflow
├── config
│   └── pipeline.yaml
├── envs
│   ├── cnv.yaml
│   ├── fusion.yaml
│   ├── metadata.yaml
│   ├── microarray.Dockerfile
│   ├── microarray.yaml
│   ├── mutation.yaml
│   ├── PharmacoSet.yaml
│   ├── rnaseq.yaml
│   ├── snakemake.yaml
│   ├── test.yaml
│   └── treatmentResponse.yaml
├── profiles
│   ├── gcp
│   │   └── config.yaml
│   └── labserver
│       └── config.yaml
├── rules
│   ├── cnv.smk
│   ├── fusion.smk
│   ├── metadata.smk
│   ├── microarray.smk
│   ├── mutation.smk
│   ├── rnaseq.smk
│   └── treatmentResponse.smk
├── scripts
│   ├── build_PharmacoSet.R
│   ├── cnv
│   │   └── make_CNV_SE.R
│   ├── fusion
│   │   └── make_FUSION_SE.R
│   ├── metadata
│   │   ├── annotate_ChEMBL.R
│   │   ├── annotate_PubChemCIDS.R
│   │   ├── combine_annotated_treatmentData.R
│   │   ├── getCellosaurus
│   │   │   ├── getCellosaurusObject.R
│   │   │   └── mapCellosaurus.R
│   │   ├── map_treatments_to_PubChemCID.R
│   │   ├── preprocess_geneAnnotation.R
│   │   ├── preprocess_sampleMetadata.R
│   │   ├── preprocess_treatmentMetadata.R
│   │   └── utils.R
│   ├── microarray
│   │   └── make_MICROARRAY_SE.R
│   ├── mutation
│   │   └── make_MUTATION_SE.R
│   ├── rnaseq
│   │   └── make_RNASEQ_SE.R
│   ├── template.R
│   └── treatmentResponse
│       ├── build_treatmentResponseExperiment.R
│       └── fit_treatmentResponseExperiment.R
└── Snakefile

Setup

Ensure conda is installed; if it is not, install Miniconda. Then install mamba for faster package management:

conda install mamba -n base -c conda-forge

Create a conda environment with the required dependencies:

Note

This pipeline uses snakemake=7.32.4. Building the environment from the provided snakemake.yaml file is highly recommended. All other conda environments used by the pipeline are defined in the workflow/envs directory.

mamba env create -f GDSC-Pharmacoset_Snakemake/workflow/envs/snakemake.yaml

Activate the environment:

conda activate gdsc_pharmacoset
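
Because the pipeline is pinned to snakemake=7.32.4, it can help to confirm that the activated environment provides that version before running anything. The following is a minimal sketch, not part of the repository; the helper name check_snakemake_version is hypothetical, and it assumes `snakemake --version` prints only the version string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): compare a Snakemake
# version string against the version pinned by this pipeline.
EXPECTED_VERSION="7.32.4"

check_snakemake_version() {
    # $1: version string to check, e.g. the output of `snakemake --version`
    if [ "$1" = "$EXPECTED_VERSION" ]; then
        echo "ok"
    else
        echo "mismatch: expected $EXPECTED_VERSION, found $1"
        return 1
    fi
}

# Usage, inside the activated gdsc_pharmacoset environment:
#   check_snakemake_version "$(snakemake --version)"
```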

Workflow Execution

Note

The pipeline processes hundreds of .CEL microarray files, so building the DAG can take a few minutes.

Main Run

Note

The following command was tested on a Linux machine with 30 cores and 128 GB of RAM. The pipeline can run on a machine with fewer resources, but it will take longer to complete. To change the number of cores used, set the --cores flag to the desired number.

snakemake --profile workflow/profiles/labserver
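
On a smaller machine, the core count can be capped at what the host actually has. The sketch below is an illustration only: the helper pick_cores is hypothetical, and it assumes that a --cores value given on the command line takes precedence over any value set in the labserver profile.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): never request more
# cores than the machine provides.
pick_cores() {
    # $1: requested cores, $2: cores available on this machine
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Usage: request up to 30 cores, limited by the `nproc` count on this host.
#   snakemake --profile workflow/profiles/labserver --cores "$(pick_cores 30 "$(nproc)")"
```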

Dry Run

snakemake --profile workflow/profiles/labserver --dryrun

Create all Conda Environments

Tip

Creating the conda environments can take a long time during a full pipeline run. To create all of them up front without running the pipeline, use the --conda-create-envs-only flag.

snakemake --profile workflow/profiles/labserver --use-conda --conda-create-envs-only

Rulegraph

snakemake --profile workflow/profiles/labserver --rulegraph | dot -Tsvg > resources/rulegraph.svg

Rulegraph

Directed Acyclic Graph (DAG)

snakemake --profile workflow/profiles/labserver -F --dag | dot -Tsvg > resources/dag.svg

DAG

Filegraph

snakemake --profile workflow/profiles/labserver --filegraph | dot -Tsvg > resources/filegraph.svg

Filegraph
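
The three graph commands above differ only in their Snakemake flags and output file name, and each one needs Graphviz `dot` on the PATH and an existing resources directory. A minimal sketch that renders all three in one pass could look like the following; the function names graph_flags and render_graphs are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
PROFILE="workflow/profiles/labserver"
OUT_DIR="resources"

# Map a graph kind to the Snakemake flags used in the commands above.
graph_flags() {
    case "$1" in
        rulegraph) echo "--rulegraph" ;;
        dag)       echo "-F --dag" ;;
        filegraph) echo "--filegraph" ;;
        *)         echo "unknown graph kind: $1" >&2; return 1 ;;
    esac
}

# Render every graph kind to an SVG file in $OUT_DIR.
render_graphs() {
    command -v dot >/dev/null || { echo "Graphviz 'dot' not found" >&2; return 1; }
    mkdir -p "$OUT_DIR"
    for kind in rulegraph dag filegraph; do
        # The flags are left unquoted on purpose so "-F --dag" splits into two arguments.
        snakemake --profile "$PROFILE" $(graph_flags "$kind") | dot -Tsvg > "$OUT_DIR/$kind.svg"
    done
}

# Usage, from the repository root:
#   render_graphs
```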
