pwgs_snakemake

This pipeline was used to produce and parse the trees for this paper:

https://www.sciencedirect.com/science/article/pii/S2666379120300070

If you use it in your work, please cite the paper above as well as the relevant software packages.

Snakemake pipeline for running PhyloWGS on the NIH Biowulf cluster. It is currently set up to run on copy number calls from FACETS (https://github.com/mskcc/facets), using a parser with an additional FACETS class. In principle it should work with any copy number caller if the appropriate changes are made.

It also runs on a single VCF file that contains both indels and SSMs, with a parser that converts it into input suitable for PhyloWGS. The original PhyloWGS software is here: https://github.com/morrislab/phylowgs

Features that might be useful include a tree parser, which PhyloWGS does not include in its software; an additional "StrelkaCombinedParser" class, which parses a VCF containing both indels and SSMs produced by combining the strelka.snv and strelka.indel files; and a parser class for the FACETS copy number caller.
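To illustrate what a combined SNV/indel parser has to handle, here is a minimal sketch of extracting read counts from Strelka tumour FORMAT fields. It is not the code of "StrelkaCombinedParser"; the function name `strelka_read_counts` and the dict-based input are hypothetical. It relies on the Strelka somatic VCF convention that SNV records carry per-base counts in AU/CU/GU/TU and indel records carry TAR/TIR, each as "tier1,tier2". Defining total reads as reference plus variant reads is an assumption made here for illustration.

```python
def strelka_read_counts(ref, alt, fmt):
    """Return (ref_reads, total_reads) for one Strelka tumour record.

    `fmt` maps FORMAT keys to raw string values, e.g. {"AU": "30,31"}.
    Hypothetical helper for illustration, not part of this repository.
    """
    def tier1(value):
        # Strelka reports "tier1,tier2"; keep the stricter tier1 count.
        return int(value.split(",")[0])

    if "TIR" in fmt:
        # Indel record: TAR supports the non-indel allele, TIR the indel.
        ref_reads = tier1(fmt["TAR"])
        var_reads = tier1(fmt["TIR"])
    else:
        # SNV record: per-base counts live in AU, CU, GU and TU.
        ref_reads = tier1(fmt[ref + "U"])
        var_reads = tier1(fmt[alt + "U"])
    # Assumption: total depth is taken as reference + variant reads.
    return ref_reads, ref_reads + var_reads


snv_counts = strelka_read_counts(
    "A", "G", {"AU": "30,31", "CU": "0,0", "GU": "10,12", "TU": "0,0"}
)
indel_counts = strelka_read_counts("AT", "A", {"TAR": "25,26", "TIR": "5,6"})
```

Branching on the presence of TIR rather than on allele length keeps the two record types apart even when they share one file.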

You should be able to get everything set up by editing just the config file and samples.csv, but there are a couple of instances of hard-coded paths that might need to be edited.

samples.csv should look like this:

patient	sample	cnv_file_type	vcf_file	vcf_file_type
patient_a	one_a	battenberg	/path/to/vcf/one_a.vcf	sanger
patient_a	two_a	battenberg-smchet	/path/to/vcf/two_a.vcf	mutect_pcawg
patient_a	three_a	titan	/path/to/vcf/three_a.vcf	mutect_smchet
patient_b	one_b	facets	/path/to/vcf/one_b.vcf	strelka_indelcombined
patient_b	two_b	facets	/path/to/vcf/two_b.vcf	vardict
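Before launching the pipeline it can help to check the sample sheet. The following is a minimal sketch, not part of the pipeline: `load_samples` and `REQUIRED_COLUMNS` are hypothetical names, and the tab delimiter is an assumption based on how the example above is displayed (pass `delimiter=","` if your file is comma-separated).

```python
import csv
import io
from collections import defaultdict

# Column names taken from the example sample sheet above.
REQUIRED_COLUMNS = ["patient", "sample", "cnv_file_type", "vcf_file", "vcf_file_type"]


def load_samples(handle, delimiter="\t"):
    """Read a sample sheet and group its rows by patient.

    Hypothetical helper; the delimiter default is an assumption.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samples file is missing columns: {missing}")
    by_patient = defaultdict(list)
    for row in reader:
        by_patient[row["patient"]].append(row)
    return dict(by_patient)


example = (
    "patient\tsample\tcnv_file_type\tvcf_file\tvcf_file_type\n"
    "patient_a\tone_a\tbattenberg\t/path/to/vcf/one_a.vcf\tsanger\n"
    "patient_b\tone_b\tfacets\t/path/to/vcf/one_b.vcf\tstrelka_indelcombined\n"
)
samples = load_samples(io.StringIO(example))
```

With a real file, open it and pass the handle, for example `load_samples(open("samples.csv", newline=""))`.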

See the CNV parser and the original PhyloWGS parser README for the valid values of cnv_file_type and vcf_file_type. To add an additional file type for the parsers, extend the relevant classes in parse_cnvs_facets_extension.py and create_phylowgs_inputs.py.

Submit on a Slurm cluster using the command:

sbatch --cpus-per-task=20 --time=20:00:00 snakemake.sh

For an SGE cluster, you would need a slightly different configuration in the snakemake.sh file. See my "rna_seq_snakemake" repo for an example.
