Description
HiCancer is a pipeline for phasing cancer genomes. It takes Hi-C paired-end reads and called SNPs as input and outputs chromosome-level haplotypes of the cancer genome. HiCancer filters out somatic SNPs and correctly phases LOH (loss of heterozygosity) regions. It also takes advantage of allelic copy number imbalance in aneuploid regions and of linkage disequilibrium information to improve completeness and accuracy by assembling fragmented haplotypes, adding lost SNPs back into the haplotypes (imputation), and correcting switch errors.
Dependency
- python The major part of HiCancer is written in Python, so Python must be installed. Python 2.7 (or above) is suggested.
- samtools https://github.com/samtools/samtools
- bwa http://bio-bwa.sourceforge.net/
- HapCUT2 https://github.com/vibansal/HapCUT2
- Beagle5 http://faculty.washington.edu/browning/beagle/beagle.html The genetic map files used as Beagle input must also be downloaded.
- reference genome http://hgdownload.soe.ucsc.edu/downloads.html#human
- 1000 Genomes Project VCF files https://www.internationalgenome.org/data
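Before running the pipeline, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch and is not part of HiCancer. The executable names (`samtools`, `bwa`, `HAPCUT2`, `extractHAIRS`, and `java` for running the Beagle jar) are assumptions about a typical installation and may differ on your system.

```python
import shutil

# Assumed executable names for the dependencies above; edit to match your setup.
# Beagle is distributed as a jar, so "java" stands in for it here.
ASSUMED_TOOLS = ["samtools", "bwa", "HAPCUT2", "extractHAIRS", "java"]


def find_missing_tools(tools):
    """Return the names in `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(ASSUMED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All assumed tools were found on PATH.")
```

A tool reported as missing may still be installed outside PATH, in which case its full path has to be supplied wherever HiCancer.sh expects it.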
Usage
The whole pipeline is contained in HiCancer.sh. Fill in the first part ("input files") of HiCancer.sh, which specifies all required input files and parameters, and then run the script.
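Once the "input files" part has been filled in, the script can be launched from a terminal with bash, or from Python as sketched below. This is only an illustration: the helper name `run_hicancer` is hypothetical, and it assumes that HiCancer.sh takes no command-line arguments because all settings live inside the script.

```python
import os
import subprocess


def run_hicancer(script_path="HiCancer.sh"):
    """Run the pipeline script with bash and return its exit code."""
    if not os.path.isfile(script_path):
        raise FileNotFoundError("Pipeline script not found: " + script_path)
    # Equivalent to typing `bash HiCancer.sh` in a terminal.
    return subprocess.run(["bash", script_path]).returncode
```

Calling `run_hicancer()` from the directory that contains HiCancer.sh starts the pipeline; a non-zero return value means the script exited with an error.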
Output
hapcutfile_filtered_beagleoutput_*.final gives the phased chromosome before the completing step. The result of the completing step is contained in the chr_phasing files.
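To locate both kinds of output after a run, a short sketch like the one below may be useful. The function name `collect_outputs` is hypothetical, and the pattern `*chr_phasing*` is an assumption: it presumes that the completed results carry "chr_phasing" somewhere in their file names, which should be checked against the files the pipeline actually writes.

```python
import glob
import os


def collect_outputs(result_dir="."):
    """Group HiCancer output files found in `result_dir` by pipeline stage."""
    pre_pattern = os.path.join(result_dir, "hapcutfile_filtered_beagleoutput_*.final")
    # Assumption: completed results have "chr_phasing" somewhere in the file name.
    final_pattern = os.path.join(result_dir, "*chr_phasing*")
    return {
        "before_completing": sorted(glob.glob(pre_pattern)),
        "after_completing": sorted(glob.glob(final_pattern)),
    }


for stage, paths in collect_outputs(".").items():
    print(stage, paths)
```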