Snakemake pipeline for running PhyloWGS on NIH Biowulf Cluster

aleighbrown/pwgs_snakemake

This pipeline was used to produce and parse the trees for this paper:

https://www.sciencedirect.com/science/article/pii/S2666379120300070

If you use it in your work, please cite the paper above as well as the relevant software packages.

pwgs_snakemake

Snakemake pipeline for running PhyloWGS on the NIH Biowulf cluster. It is currently set up to run on copy number calls from FACETS (https://github.com/mskcc/facets), using a parser with an additional FACETS class. In principle it should work with any copy number caller if the appropriate changes are made.

It also runs on a single VCF file that contains both indels and SSMs, using a parser that converts it into input suitable for PhyloWGS. The original PhyloWGS software is here: https://github.com/morrislab/phylowgs

Features that might be useful include a tree parser, which PhyloWGS does not include in its software; an additional "StrelkaCombinedParser" class, which parses a VCF containing both indels and SSMs produced by combining the strelka.snv and strelka.indel files; and a parser class for the FACETS copy number caller.

You should be able to get everything set up by editing just the config file and samples.csv, but there are a couple of hard-coded paths that might also need to be edited.

samples.csv should look like this:

patient    sample   cnv_file_type      vcf_file                  vcf_file_type
patient_a  one_a    battenberg         /path/to/vcf/one_a.vcf    sanger
patient_a  two_a    battenberg-smchet  /path/to/vcf/two_a.vcf    mutect_pcawg
patient_a  three_a  titan              /path/to/vcf/three_a.vcf  mutect_smchet
patient_b  one_b    facets             /path/to/vcf/one_b.vcf    strelka_indelcombined
patient_b  two_b    facets             /path/to/vcf/two_b.vcf    vardict
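
As a quick sanity check before submitting, the sample sheet can be loaded and grouped by patient. The following is only a sketch and not part of the pipeline: load_samples is a hypothetical helper, and the comma delimiter is an assumption based on the .csv extension (pass a different delimiter if your file is tab separated).

```python
import csv
import io
from collections import defaultdict

# Column names taken from the example sample sheet above.
REQUIRED_COLUMNS = ["patient", "sample", "cnv_file_type", "vcf_file", "vcf_file_type"]


def load_samples(handle, delimiter=","):
    """Group sample rows by patient after checking the header.

    Hypothetical helper; the comma delimiter is an assumption.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("samples.csv is missing columns: " + ", ".join(missing))
    by_patient = defaultdict(list)
    for row in reader:
        by_patient[row["patient"]].append(row)
    return dict(by_patient)


# In-memory copy of two rows from the example, so the sketch runs without a file.
EXAMPLE = io.StringIO(
    "patient,sample,cnv_file_type,vcf_file,vcf_file_type\n"
    "patient_a,one_a,battenberg,/path/to/vcf/one_a.vcf,sanger\n"
    "patient_b,one_b,facets,/path/to/vcf/one_b.vcf,strelka_indelcombined\n"
)

samples = load_samples(EXAMPLE)
for patient, rows in samples.items():
    # Print each patient with the names of its samples.
    print(patient, [r["sample"] for r in rows])
```

For a real run, open the file instead, e.g. with open("samples.csv", newline="") as fh: samples = load_samples(fh).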

See the CNV parser and the original PhyloWGS parser README for the valid values of cnv_file_type and vcf_file_type. To add a file type to either parser, extend the relevant classes in parse_cnvs_facets_extension.py and create_phylowgs_inputs.py.

Submit on a Slurm cluster using the command: sbatch --cpus-per-task=20 --time=20:00:00 snakemake.sh

For an SGE cluster, you would need a slightly different configuration in snakemake.sh. See my "rna_seq_snakemake" repo for an example.
