The copyCat package for R can detect somatic copy number aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. It achieves higher accuracy than many other packages, and runs faster by utilizing multi-core architectures to parallelize the processing of these large data sets.
copyCat takes in paired samples (tumor and normal) and can utilize mutation frequency information from samtools to help correct for purity and ploidy. This package also includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired-end sequencing to do positional refinement. Its primary input comes from running bam-window (https://github.com/genome-vendor/bam-window) on the tumor and normal bam files.
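As a rough illustration of the read-depth idea described above, the sketch below turns per-window read counts from a tumor/normal pair into log2 ratios and copy number estimates. It is not copyCat's implementation: the function name `estimate_copy_number`, the counts, and the assumptions (a pure tumor sample, a diploid normal, and most windows being copy-neutral) are all made up for the example, and it skips the GC, mapability, purity and ploidy corrections that copyCat performs.

```r
# Hypothetical helper, not part of copyCat: estimate copy number per window
# from paired read counts, assuming most windows are copy-neutral.
estimate_copy_number <- function(tumor_counts, normal_counts, normal_ploidy = 2) {
  stopifnot(length(tumor_counts) == length(normal_counts),
            all(normal_counts > 0))
  # Raw tumor/normal depth ratio for each window.
  ratio <- tumor_counts / normal_counts
  # Divide by the median ratio so differences in overall sequencing depth cancel.
  centered <- ratio / median(ratio)
  data.frame(log2_ratio = log2(centered),
             copy_number = normal_ploidy * centered)
}

# Made-up counts: the tumor was sequenced twice as deeply as the normal,
# and windows 5 and 6 carry a single-copy gain.
normal_counts <- c(100, 110, 90, 105, 95, 100, 102, 98)
tumor_counts  <- c(200, 220, 180, 210, 285, 300, 204, 196)
result <- estimate_copy_number(tumor_counts, normal_counts)
print(result)
```

Centering on the median is only one simple way to set the copy-neutral baseline; it breaks down when more than half of the genome is altered, which is one reason external information such as SNP allele frequencies is useful.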
    #install devtools if you don't have it already
    install.packages("devtools")
    library(devtools)
    install_github("chrisamiller/copycat")
    library(copyCat)

    #The most convenient way to run copyCat is through the functions in meta.R.
    #For a paired tumor/normal sample, this looks something like this:
    runPairedSampleAnalysis(annotationDirectory="~/annotations/copyCat/hg19/",
                            outputDirectory="ccout",
                            normal="/path/to/normal_window_file",
                            tumor="/path/to/tumor_window_file",
                            inputType="bins",
                            maxCores=2,
                            binSize=0, #infer automatically from bam-window output
                            perLibrary=1, #correct each library independently
                            perReadLength=1, #correct each read-length independently
                            verbose=TRUE,
                            minWidth=3, #minimum number of consecutive windows needed to call CN
                            minMapability=0.6, #a good default
                            dumpBins=TRUE,
                            doGcCorrection=TRUE,
                            samtoolsFileFormat="unknown", #will infer automatically - mpileup 10col or VCF
                            purity=1,
                            normalSamtoolsFile="normal_mpileup",
                            tumorSamtoolsFile="tumor_mpileup") #uses the VAFs of mpileup SNPs to infer copy-neutral regions
copyCat requires mapability and GC-content information that depends on the read lengths of your data. (It accepts +/- 10bp as a reasonable approximation.) Annotation files that cover common read lengths on human build 37 are hosted at: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/copyCat/
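To make the +/- 10bp tolerance concrete, here is a minimal sketch of matching an observed read length to the nearest available annotation set. The helper `match_read_length` and the list of available lengths are hypothetical and only illustrate the rule stated above; how copyCat performs this lookup internally is not shown here.

```r
# Hypothetical helper, not part of copyCat: choose the annotation read length
# closest to the observed one, and accept it only if it is within `tolerance` bp.
match_read_length <- function(observed, available, tolerance = 10) {
  diffs <- abs(available - observed)
  best <- which.min(diffs)
  if (diffs[best] > tolerance) {
    return(NA_integer_)  # no annotation set is close enough
  }
  available[best]
}

# Made-up set of read lengths for which annotation files exist.
available_lengths <- c(36L, 50L, 75L, 100L)

match_read_length(101L, available_lengths)  # within 10bp of the 100bp set
match_read_length(62L, available_lengths)   # more than 10bp from every set, so NA
```

If the result is `NA`, the data would need annotation files built for that read length.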
- The copyCat package is loosely based on readDepth, a tool by the same author.
- It does support single-sample CN calling using the "runSingleSampleAnalysis" function.
- It is not specific to the human genome. To create your own annotation files, use the above as a template, and fill in your own annotations for mapability (using self-alignments with your aligner of choice) and GC-content (for reads starting in each 100bp window).
- A window size of 10k is generally a reasonable default that balances specificity and sensitivity. (Specific applications may demand larger or smaller windows.)
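For the custom GC-content annotation mentioned in the notes above, the following base R sketch shows one way to compute the mean GC fraction of reads starting in each 100bp window of a sequence. The function `gc_by_window` is hypothetical, the output layout is not the format copyCat expects (use the hosted files as the template for that), and mapability is not covered because it requires an aligner.

```r
# Hypothetical helper, not part of copyCat: mean GC fraction of the reads of
# length `read_length` that start in each window of `window_size` bp.
gc_by_window <- function(sequence, read_length, window_size = 100) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  is_gc <- as.numeric(bases %in% c("G", "C"))
  # Number of positions at which a full-length read can start.
  n_starts <- length(bases) - read_length + 1
  stopifnot(n_starts >= 1)
  # GC fraction of the read starting at each position, via a cumulative sum.
  csum <- c(0, cumsum(is_gc))
  starts <- seq_len(n_starts)
  read_gc <- (csum[starts + read_length] - csum[starts]) / read_length
  # Average the per-read GC fractions over the read starts in each window.
  window_id <- (starts - 1) %/% window_size + 1
  tapply(read_gc, window_id, mean)
}

# Toy sequence: 100 G bases followed by 100 A bases, with 10bp reads.
toy_sequence <- paste0(strrep("G", 100), strrep("A", 100))
gc_windows <- gc_by_window(toy_sequence, read_length = 10)
print(gc_windows)
```

Reads that start near the end of the first window extend into the A-only region, so that window's mean falls slightly below 1. This is the reason GC content is computed per read start and not from the window's own bases alone.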