TCAG-WGS-CNV-workflow

This repository contains scripts involved in our workflow for detecting CNVs from WGS data using read depth-based methods.

If you use any of these scripts, please cite:

B. Trost, S. Walker, Z. Wang, B. Thiruvahindrapuram, J.R. MacDonald, W.W.L. Sung, S.L. Pereira, J. Whitney, A.J.S. Chan, G. Pellecchia, M.S. Reuter, S. Lok, R.K.C. Yuen, C.R. Marshall, D. Merico, and S.W. Scherer. A comprehensive workflow for read depth-based identification of copy-number variation from whole-genome sequence data. American Journal of Human Genetics 102(1):142-155, 2018.

The raw HuRef and NA12878 sequencing data used in the paper are available from:

https://www.ncbi.nlm.nih.gov/sra/PRJNA542535

The workflow in the above paper relies on two CNV-detection tools, CNVnator and ERDS, which must be obtained separately.

This README file lists, and explains the purpose of, each script. The scripts are divided into three categories:

  1. "main scripts", which are designed to be called directly;
  2. "accessory scripts", which are only meant to be called or used by the main scripts; and
  3. "commands", which are not meant to be run as scripts but instead contain commands (e.g., for running BWA, GATK, and the CNV-detection algorithms) that the user can copy and paste, substituting their own filenames for the placeholders, and then execute sequentially on their system or cluster.

For instructions on each script, as well as example usage, please refer to the comments at the beginning of each one.

Main scripts (designed to be called directly)

  • benchmark_overlap_counts.py: Output counts summarizing how CNVs in different categories overlap with the benchmark.
  • CNV_overlap.py: Find overlapping CNV calls from the CNV-detection algorithms, from different benchmark methods, or from both.
  • CNV_read_depth_checker.sh: Calculate the ratio between the read depth of a CNV and the read depth of the same-size surrounding regions.
  • compare_CNVs_to_benchmark.py: Compare CNVs output from the CNV-detection algorithms to a CNV benchmark.
  • compare_with_RLCR_definition.py: Compare CNV calls that have been converted to the common format with the RLCR definition. Requires the "intervaltree" Python module to be installed.
  • convert_CNV_calls_to_common_format.py: Convert CNV calls to common format.
  • IQR_samtools_depth.sh: Calculate the interquartile range (IQR) of read depth from a BAM file.
  • merge_Genome_STRiP.py: Use this script on a Genome STRiP file that has already been converted to the common format using convert_CNV_calls_to_common_format.py in order to merge overlapping calls.
  • process_cnvs.erds+.sh: Use this script to perform the CNVnator-ERDS merging that was used in Stage 3 of the study (the analysis of rare, genic CNVs in the Autism Speaks MSSNG cohort).
  • reproduce_results.sh: A script that runs all the other main scripts in an appropriate sequence.
  • split_HuRef_benchmark.sh: Split the file containing the HuRef benchmark CNVs into separate files, one for each benchmark technology.
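
To illustrate the kind of check that CNV_read_depth_checker.sh describes (the read depth of a CNV relative to the same-size regions on either side), here is a minimal Python sketch. It is not the repository's implementation: the function name `flank_depth_ratio`, the in-memory list of per-base depths, and the use of the mean rather than some other summary statistic are all assumptions made for illustration.

```python
from statistics import mean


def flank_depth_ratio(depth, start, end):
    """Ratio of mean depth inside a CNV to mean depth in its flanks.

    depth: per-base read depths for one chromosome (hypothetical in-memory list).
    start, end: 0-based, half-open coordinates of the CNV.
    Each flank has the same length as the CNV, truncated at the chromosome ends.
    """
    length = end - start
    if length <= 0:
        raise ValueError("CNV must have positive length")
    left = depth[max(0, start - length):start]
    right = depth[end:end + length]
    flank = left + right
    if not flank or mean(flank) == 0:
        raise ValueError("no usable flanking depth")
    return mean(depth[start:end]) / mean(flank)


# Toy example: a 10-bp region at half the surrounding depth.
toy_depth = [30] * 10 + [15] * 10 + [30] * 10
ratio = flank_depth_ratio(toy_depth, 10, 20)
print(ratio)
```

Under this simplified definition, a ratio well below 1 is what one would expect for a deletion and a ratio well above 1 for a duplication; the actual script's normalization may differ.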

Accessory scripts (NOT to be called directly)

  • add_features.py: Used by process_cnvs.erds+.sh.
  • Canvas.py: Custom library for converting Canvas output to common format.
  • cnMOPS.py: Custom library for converting cn.MOPS output to common format.
  • CNVnator.py: Custom library for converting CNVnator output to common format.
  • CNVworkflowlib.py: Custom library of python functions used by other python scripts.
  • ERDS.py: Custom library for converting ERDS output to common format.
  • format_cnvnator_results.py: Used by process_cnvs.erds+.sh.
  • format_erds_results.py: Used by process_cnvs.erds+.sh.
  • functions.py: Used by process_cnvs.erds+.sh.
  • Genome_STRiP.py: Custom library for converting Genome_STRiP output to common format.
  • get_normalized_depth.py: Used by CNV_read_depth_checker.sh to actually calculate the normalized depth of a CNV.
  • index_samtools_depth.py: Used by CNV_read_depth_checker.sh to index a "samtools depth" file for fast use of get_normalized_depth.py.
  • IQR_samtools_depth.py: Does most of the work involved in calculating IQR from a BAM file.
  • merge_cnvnator_results.py: Used by process_cnvs.erds+.sh.
  • merge_erds_results.py: Used by process_cnvs.erds+.sh.
  • myvcf.py: Custom library of python functions for dealing with VCF files.
  • RDXplorer.py: Custom library for converting RDXplorer output to common format.
  • SV.py: Custom library of python functions for representing SVs/CNVs.
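
As a rough illustration of what calculating an IQR from "samtools depth" output can involve, the sketch below builds a histogram of depth values and reads the quartiles off it, which avoids holding every position in memory. It assumes the standard three-column, tab-separated "samtools depth" output (chromosome, position, depth). The function names and the nearest-rank quartile definition are assumptions for illustration, and IQR_samtools_depth.py may compute the value differently.

```python
from collections import Counter


def depth_histogram(lines):
    """Count how many positions have each depth value."""
    hist = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        hist[int(fields[2])] += 1
    return hist


def percentile_from_histogram(hist, fraction):
    """Smallest depth d such that at least `fraction` of positions have depth <= d."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("no depth values")
    threshold = fraction * total
    running = 0
    for d in sorted(hist):
        running += hist[d]
        if running >= threshold:
            return d


def depth_iqr(lines):
    """Interquartile range of depth (nearest-rank quartiles; an assumed definition)."""
    hist = depth_histogram(lines)
    return percentile_from_histogram(hist, 0.75) - percentile_from_histogram(hist, 0.25)


# Toy input: eight positions with depths 1 through 8.
toy_lines = [f"chr1\t{pos}\t{pos}\n" for pos in range(1, 9)]
print(depth_iqr(toy_lines))
```

In practice the lines would come from a file or a pipe rather than a list, for example by iterating over an open file handle.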

Data files (used by scripts)

  • hg19_gap.bed: Used by process_cnvs.erds+.sh.

Commands (designed for the user to execute one by one)

  • COMMANDS.sh: A list of commands for running BWA and the CNV-detection algorithms.