Martiantian/Somatic_cnv_detect_tool

Somatic_cnv_detect_tool

cfDNA sequencing CNV analysis pipeline

SCDT is a somatic CNV detection tool for detecting sub-chromosomal CNVs in cfDNA, using whole-genome sequencing (WGS) data or the off-target reads in targeted sequencing data. In addition to using control samples to correct genome-position-specific bias, SCDT performs two GC correction steps, which regress on the GC content of DNA fragments and of genome bins, respectively. After GC correction, the coefficients of variation of the copy ratios approach the theoretical lower bound, suggesting that almost all systematic error has been removed. Finally, CNVs are detected by a segmentation algorithm based on piecewise least-squares fitting, which outperformed other segmentation methods.

SCDT consists of Perl and R programs. No installation is necessary apart from the required software listed below.
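To give a feel for what a theoretical lower bound on the coefficient of variation can look like, the sketch below computes the Poisson sampling limit for per-bin read counts. This is an illustration only, not SCDT's own calculation: it assumes reads fall uniformly on the genome and that the control-derived expectation is noise-free. The function name `poisson_cv` and the genome length used in the call are hypothetical.

```shell
# Illustration only: Poisson lower bound on the CV of per-bin read counts.
# Assumptions (not taken from SCDT): uniform read placement, noise-free control.
poisson_cv() {
    # usage: poisson_cv TOTAL_READS BIN_LENGTH_BP GENOME_LENGTH_BP
    awk -v reads="$1" -v bin="$2" -v genome="$3" 'BEGIN {
        lambda = reads * bin / genome       # expected number of reads per bin
        printf "%.4f\n", 1 / sqrt(lambda)   # CV of a Poisson count is 1/sqrt(lambda)
    }'
}

# 20 million reads, 1 Mb bins, a genome of roughly 3.1 Gb (placeholder values)
poisson_cv 20000000 1000000 3100000000
```

Under these assumptions, more reads or longer bins lower the bound, and an observed CV close to it leaves little room for residual systematic bias.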

Contact email: qianzhaoyang@genomics.cn, wangxiaofeng@genomics.cn, shichang@genomics.cn

Download

You can use git to download the entire codebase:

git clone https://github.com/Martiantian/Somatic_cnv_detect_tool

Required software

samtools version 1.2 or above must be installed.
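One way to check the samtools requirement before running anything is sketched below. The helper `version_at_least` is hypothetical and compares only the major and minor components. The check assumes that `samtools --version` prints a first line of the form `samtools <version>`, as samtools 1.x releases do.

```shell
# Hypothetical helper: succeed if version HAVE is at least NEED (major.minor only).
version_at_least() {
    have=$1
    need=$2
    have_major=${have%%.*}
    need_major=${need%%.*}
    # Minor component: the text after the first dot, cut at the first non-digit.
    case $have in *.*) have_minor=${have#*.} ;; *) have_minor=0 ;; esac
    case $need in *.*) need_minor=${need#*.} ;; *) need_minor=0 ;; esac
    have_minor=${have_minor%%[!0-9]*}
    need_minor=${need_minor%%[!0-9]*}
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "${have_minor:-0}" -ge "${need_minor:-0}" ]; }
}

if command -v samtools >/dev/null 2>&1; then
    installed=$(samtools --version | head -n 1 | awk '{print $2}')
    if version_at_least "$installed" 1.2; then
        echo "samtools $installed is new enough"
    else
        echo "samtools $installed is older than 1.2"
    fi
else
    echo "samtools not found on PATH"
fi
```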

Before SCDT

Before running SCDT, two steps must be completed. First, the script scdt_cfg.pl under Somatic_cnv_detect_tool/ needs to be completed. Next, the files used in the two GC correction steps need to be generated with Somatic_cnv_detect_tool/Stac_ref_GC.pl.

$ perl Stac_ref_GC.pl [input ref_hg19] [input gc_file_type 1 or 2] [input stat_gc_length(bp)]

gc_file_type: 1 generates the file for the first GC correction step and 2 the file for the second. Both files must be generated before running SCDT.

Example:

perl Stac_ref_GC.pl ~/DataBase/ucsc.hg19.fasta 1 170

perl Stac_ref_GC.pl ~/DataBase/ucsc.hg19.fasta 2 1000000
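For readers unfamiliar with the quantity being tabulated, the sketch below shows what the GC content of a single sequence window means. It is an illustration only and is not taken from Stac_ref_GC.pl: the function name `gc_fraction` is hypothetical, and ambiguous bases such as N are simply counted in the window length, which the real script may handle differently.

```shell
# Illustration only: GC fraction of one sequence window (not from Stac_ref_GC.pl).
gc_fraction() {
    printf '%s\n' "$1" | awk '{
        n = length($0)             # window length, counting every character
        gc = gsub(/[GCgc]/, "")    # gsub returns how many G or C bases it replaced
        if (n > 0) printf "%.3f\n", gc / n
    }'
}

gc_fraction ACGTTGCA    # 4 of 8 bases are G or C
```

In the examples above, the window would be 170 bp for the fragment-level file and 1000000 bp for the bin-level file.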

Run SCDT

To get started, run 'perl somatic_cnv_detect_tool_180326.pl' to see the usage and options.

$ perl somatic_cnv_detect_tool_180326.pl [options]

Options:
-i|--input       (required) absolute path to the sample file list (sort.markdup.bam); col[0]=name, col[1]=absolute path, sep=\t
-o|--out         (required) output directory
-a|--alpha       (required) alpha confidence level
-d|--debin       (required) debin length for the first GC normalization (bp)
-c|--control     absolute path to the control list (sort.markdup.bam); col[0]=control, col[1]=absolute path, default F
-b|--bin         bin length (bp), default 1000000
-p|--pair        paired-end sequencing: T or F, default F
-m|--maxq        filter_bam_maxq, default 60
-l|--length      window length used to calculate the GC-content distribution (bp), default 170 for cfDNA data
-t|--threshold   threshold for detecting a CNV, default 0.01 (range 0~1)
-r|--readcount   read count used to calculate lambda for terminating the loop, default 20000000
--proportion     chimeric proportion required to terminate the loop, default 0.03
--cnvsize        required CNV size (more than n times the bin length) to terminate the loop, default 3
--off_target     off-target data: T or F, default F
--bed            target BED file (target.bed)
--merge_con      a merge_contrl.cnr.stac.file.list file, default F
--remove         remove temporary data after the detection process: T or F, default T

Example:

perl somatic_cnv_detect_tool_180326.pl -i ./bam.list -a 0.01 -d 100000 -o ./result
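The example above assumes that a sample list already exists. A minimal sketch for generating one is shown here, assuming the BAM files sit in one directory and that the sample name can be taken from the file name. The helper `make_bam_list` and the directory name are hypothetical; only the two-column, tab-separated layout and the sort.markdup.bam suffix come from the option description.

```shell
# Hypothetical helper: print "name<TAB>absolute path" for every
# *.sort.markdup.bam file in a directory, the layout described for -i/--input.
make_bam_list() {
    bam_dir=$(cd "$1" && pwd) || return 1   # resolve the directory to an absolute path
    for bam in "$bam_dir"/*.sort.markdup.bam; do
        [ -e "$bam" ] || continue           # skip the unexpanded pattern if nothing matches
        name=$(basename "$bam" .sort.markdup.bam)
        printf '%s\t%s\n' "$name" "$bam"
    done
}

# Usage (the directory is a placeholder):
# make_bam_list /path/to/bams > ./bam.list
```

The same layout applies to the control list passed with -c, where the first column holds the control name.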
