HiC-bench: a Hi-C analysis pipeline that allows combinatorial parameter exploration and benchmarking
HiC-bench is a configurable computational pipeline that allows comprehensive and reproducible analysis of Hi-C sequencing data. It has the following characteristics:
It performs complete Hi-C analysis starting with the alignment of reads (fastq files) and ending with the annotation of specific interactions, their visualization and enrichment analysis.
It is the first Hi-C pipeline that integrates TAD calling using published methods and our own algorithm.
It performs calculation of boundary scores using our own methods and existing ones.
Every pipeline step is followed by summary statistics (when applicable) and visualization of the results. This allows quality control and facilitates troubleshooting.
It is fully expandable and customizable. Users can follow the included wrapper script template in order to add new tools.
It allows parameter exploration and comparison of different methods in a combinatorial fashion. This unique feature facilitates the design and execution of complex benchmark studies that may involve combinations of multiple parameter/tool choices in each step.
It has been built with reproducibility in mind. All parameter settings are automatically logged.
How to set up and run a pipeline
Clone the repository:
git clone --depth 1 https://github.com/NYU-BFX/hic-bench.git
Choose a pipeline directory:
Set up input fastq directories:
mkdir inputs/fastq
cd inputs/fastq
<create one directory per sample and populate with corresponding fastq files>
Create sample sheet in the "inputs" directory:
cd ../inputs
./code/create-sample-sheet.tcsh
Execute the pipeline from the main pipeline directory (hic-bench/pipelines/hicseq-standard):
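The input setup step above (one directory per sample under inputs/fastq) can be scripted. The following is a minimal sketch, not part of HiC-bench itself; the function name make_sample_dirs and the sample names in the commented usage line are hypothetical.

```shell
# make_sample_dirs FASTQ_ROOT SAMPLE...
# Create one directory per sample under FASTQ_ROOT (assumed layout: inputs/fastq/<sample>).
make_sample_dirs() {
    local root=$1
    shift
    for sample in "$@"; do
        mkdir -p "$root/$sample"
    done
}

# Hypothetical usage; sample names are placeholders:
# make_sample_dirs inputs/fastq sampleA-rep1 sampleA-rep2
```

After the directories exist, each sample's fastq files still need to be copied or symlinked into its directory before the sample sheet is created.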
The software listed here is required: bowtie2 aligner (1), Python (2.7 or later) (along with Numpy, Scipy and Matplotlib libraries), R (3.0.2), GenomicTools (2), various R packages (lattice, RColorBrewer, corrplot, reshape, gplots, preprocessCore, zoo, reshape2, plotrix, pastecs, boot, optparse, ggplot2, genlasso, igraph, Matrix, MASS, flsa, VennDiagram, futile.logger and plyr) and HiCPlotter (3). More details on the versions of the packages can be found in the Manual (sessionInfo()). In addition, the mirnylib Python library (https://bitbucket.org/mirnylab/mirnylib) must be installed for matrix balancing based on IC (4). The pipeline has been tested on a high-performance computing cluster based on Sun Grid Engine (SGE). The operating system used was Red Hat Linux (64-bit).
If you encounter problems, please contact Aristotelis Tsirigos (aristotelis.tsirigos at nyumc dot org) or Charalampos Lazaris (Charalampos.Lazaris at med dot nyu dot edu).
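Before running the pipeline, it may help to confirm that the required command-line tools are on the PATH. The sketch below is an illustration only, not part of HiC-bench; the function name check_tools is hypothetical, and the executable names (bowtie2, python, Rscript) are assumptions that may differ on your system. It does not check R packages or Python libraries.

```shell
# check_tools NAME...
# Print each executable that is not found on PATH, one per line.
# Returns non-zero if at least one is missing.
check_tools() {
    local missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions; adjust them to your installation.
check_tools bowtie2 python Rscript \
    || echo "Some required tools are missing (listed above)." >&2
```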
The MIT License (MIT)
Copyright (c) 2016, Charalampos Lazaris, Stephen Kelly, Panagiotis Ntziachristos, Iannis Aifantis and Aristotelis Tsirigos
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
We are grateful to the researchers who performed the Hi-C experiments analyzed in our study and made their data publicly available. Aristotelis Tsirigos was supported by a Research Scholar Grant, RSG-15-189-01-RMC, from the American Cancer Society. We would like to thank Dennis Shasha and Juliana Freire for inspiring discussions. We would also like to thank Kadir Caner Akdemir for useful discussions on the usage of HiCPlotter. We thank the NYUMC Genome Technology Center (GTC) for expert library preparation and sequencing. This shared resource is partially supported by the Cancer Center Support Grant, P30CA016087, at the Laura and Isaac Perlmutter Cancer Center. We are grateful to the NYUMC Applied Bioinformatics Center (ABC) for providing bioinformatics support and helping with the analysis and interpretation of the data. This work has used computing resources at the High Performance Computing Facility (HPCF) of the Center for Health Informatics and Bioinformatics at the NYU Langone Medical Center.
1. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Meth 2012, 9:357–359.
2. Tsirigos A, Haiminen N, Bilal E, Utro F: GenomicTools: a computational platform for developing high-throughput analytics in genomics. Bioinformatics 2012, 28:282–283.
3. Akdemir KC, Chin L: HiCPlotter integrates genomic data with interaction matrices. Genome Biol 2015, 16:1270–8.
4. Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, Dekker J, Mirny LA: Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Meth 2012, 9:999–1003.
Availability of "data" directory and precompiled binaries
The "data" directory containing genes, fragments (with different restriction enzymes), areas to be excluded from downstream analysis etc. can be downloaded from here: https://goo.gl/741FSS (md5sum: 0b7ab3c0e2c3d56491d8befddef6c2c3). We also provide precompiled binaries that the users can download from here: http://goo.gl/95suZd (md5sum: d90f3835dc5a12631a1a28c012281b48).
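The md5sum values quoted above can be used to confirm that a download is intact. The following is a minimal sketch assuming GNU md5sum is available; the function name verify_md5 and the archive file names in the commented usage lines are hypothetical, since the actual names of the downloaded files are not stated here.

```shell
# verify_md5 FILE EXPECTED
# Succeed only if the MD5 digest of FILE equals EXPECTED.
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(md5sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Hypothetical file names; the checksums are the ones quoted above.
# verify_md5 data.tar.gz 0b7ab3c0e2c3d56491d8befddef6c2c3 && echo "data archive OK"
# verify_md5 binaries.tar.gz d90f3835dc5a12631a1a28c012281b48 && echo "binaries OK"
```

If the check fails, the file is likely incomplete or corrupted and should be downloaded again.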