a parallel R package for detecting copy-number alterations from short sequencing reads

The copyCat package for R detects somatic copy number aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. It achieves higher accuracy than many other packages, and runs faster by using multi-core architectures to parallelize the processing of these large data sets.

copyCat takes in paired samples (tumor and normal) and can use mutation frequency information from samtools to help correct for purity and ploidy. The package also includes a method for effectively increasing the resolution obtained from low-coverage experiments: it uses breakpoint information from paired-end sequencing to refine the positions of copy-number boundaries. Its primary input comes from running bam-window (https://github.com/genome-vendor/bam-window) on the tumor and normal BAM files.
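
To make the depth-of-coverage idea concrete, here is a minimal sketch in base R. It is not copyCat's implementation: the function name `log2_ratio` and the toy counts are hypothetical, and real data would also need GC and mapability correction. The sketch scales each sample's per-window read counts by its median and takes the log2 tumor/normal ratio, so copy-neutral windows sit near 0 and gains or losses move away from 0.

```r
# Hypothetical helper, not part of copyCat: per-window log2 tumor/normal ratio.
log2_ratio <- function(tumor, normal) {
  stopifnot(length(tumor) == length(normal))
  # Divide by each sample's median count so that overall differences in
  # sequencing depth between tumor and normal cancel out.
  log2((tumor / median(tumor)) / (normal / median(normal)))
}

# Toy read counts in six windows (made-up numbers)
normal <- c(100, 100, 100, 100, 100, 100)
tumor  <- c(200, 200, 200, 200, 400, 400)  # sequenced twice as deep; last two windows gained
ratios <- log2_ratio(tumor, normal)
print(ratios)  # 0 0 0 0 1 1
```

The median is used here only because it is robust to a minority of altered windows; other normalizations are equally plausible.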

Installation

#install devtools if you don't have it already
install.packages("devtools")
library(devtools)
install_github("chrisamiller/copycat")

Usage

library(copyCat)
#The most convenient way to run copyCat is through the functions in meta.R. 
#For a paired tumor/normal sample, this looks something like this:
runPairedSampleAnalysis(annotationDirectory="~/annotations/copyCat/hg19/",
                    outputDirectory="ccout",
                    normal="/path/to/normal_window_file",
                    tumor="/path/to/tumor_window_file",
                    inputType="bins",
                    maxCores=2,
                    binSize=0, #infer automatically from bam-window output
                    perLibrary=1, #correct each library independently
                    perReadLength=1, #correct each read-length independently
                    verbose=TRUE,
                    minWidth=3, #minimum number of consecutive windows needed to call CN
                    minMapability=0.6, #a good default
                    dumpBins=TRUE,
                    doGcCorrection=TRUE,
                    samtoolsFileFormat="unknown", #will infer automatically - mpileup 10col or VCF
                    purity=1,
                    normalSamtoolsFile="normal_mpileup",
                    tumorSamtoolsFile="tumor_mpileup")  #uses the VAFs of mpileup SNPs to infer copy-neutral regions
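
The `minWidth` argument above requires a minimum number of consecutive windows before a copy-number change is called. How copyCat applies it internally is not described here, so the following is only an illustrative sketch in base R, with a hypothetical helper `filter_short_runs` and made-up window states. It resets any run shorter than `min_width` to a neutral state.

```r
# Hypothetical helper, not part of copyCat: drop runs of identical window
# states that are shorter than `min_width` by resetting them to `neutral`.
filter_short_runs <- function(states, min_width = 3, neutral = 0) {
  runs <- rle(states)                    # run-length encode the states
  short <- runs$lengths < min_width      # runs too short to be trusted
  runs$values[short] <- neutral
  inverse.rle(runs)                      # expand back to one value per window
}

# Made-up per-window states: 0 = neutral, 1 = gain
states <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0)
filtered <- filter_short_runs(states, min_width = 3)
print(filtered)  # 0 0 0 0 0 0 0 1 1 1 1 0
```

The two-window gain is discarded, while the four-window gain survives.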

Annotations

copyCat requires mapability and GC-content information that depends on the read lengths of your data. (Annotations within +/- 10bp of your read length are accepted as reasonable approximations.) Annotation files that cover common read lengths on human build37 are hosted at: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/copyCat/
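
The +/- 10bp tolerance can be pictured with a small sketch in base R. This is an assumption about the matching rule, not copyCat's code: the helper `match_read_length` and the list of available read lengths are hypothetical. It picks the closest annotated read length and returns `NA` when nothing is within the tolerance.

```r
# Hypothetical helper, not part of copyCat: choose the annotated read length
# closest to the observed one, allowing a tolerance in bp.
match_read_length <- function(observed, available, tolerance = 10) {
  diffs <- abs(available - observed)
  best <- which.min(diffs)
  if (length(best) == 0 || diffs[best] > tolerance) {
    return(NA_integer_)  # no annotation is close enough
  }
  as.integer(available[best])
}

available <- c(36, 50, 75, 100)      # made-up set of annotated read lengths
match_read_length(101, available)    # 100
match_read_length(60, available)     # 50 (exactly 10bp away)
match_read_length(150, available)    # NA
```

If no annotation is close enough, the annotation files would need to be generated for that read length.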

Notes

  • The copyCat package is loosely based on readDepth, a tool by the same author.
  • It does support single-sample CN calling using the "runSingleSampleAnalysis" function.
  • It is not specific to the human genome. To create your own annotation files, use the above as a template, and fill in your own annotations for mapability (using self-alignments with your aligner of choice) and GC-content (for reads starting in each 100bp window).
  • A window size of 10k is generally a reasonable default that balances specificity and sensitivity. (Specific applications may demand larger or smaller sizes.)
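
For custom annotations, the GC-content note above can be sketched in base R. This is one reading of "GC-content for reads starting in each 100bp window", not the script used to build the hosted files: the function `gc_by_window` and its arguments are hypothetical, and the output format of real annotation files is not reproduced. For every possible read start it computes the GC fraction of a read of the given length, then averages over the starts that fall in each window.

```r
# Hypothetical sketch, not part of copyCat: mean GC fraction of reads of
# length `read_length` whose start position falls in each window of `dna`.
gc_by_window <- function(dna, read_length, window = 100) {
  bases <- strsplit(toupper(dna), "")[[1]]
  is_gc <- as.numeric(bases %in% c("G", "C"))
  n_starts <- length(bases) - read_length + 1
  stopifnot(n_starts >= 1)
  # cs[i + 1] holds the number of G/C bases in positions 1..i
  cs <- c(0, cumsum(is_gc))
  starts <- seq_len(n_starts)
  # GC fraction of the read covering positions start..(start + read_length - 1)
  read_gc <- (cs[starts + read_length] - cs[starts]) / read_length
  # Zero-based window index of each read start
  window_id <- (starts - 1) %/% window
  tapply(read_gc, window_id, mean)
}

# Tiny made-up sequence, 2bp "reads", 4bp windows
gc <- gc_by_window("GGGGAAAA", read_length = 2, window = 4)
print(gc)  # window 0: 0.875, window 1: 0
```

Reads that would run past the end of the sequence are skipped, so the last window may average over fewer starts.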