Aristotelis Tsirigos edited this page Dec 8, 2017 · 21 revisions

HiC-bench: a Hi-C analysis pipeline that allows combinatorial parameter exploration and benchmarking

HiC-bench is a configurable computational pipeline that allows comprehensive and reproducible analysis of Hi-C sequencing data. It has the following characteristics:

  • It performs complete Hi-C analysis starting with the alignment of reads (fastq files) and ending with the annotation of specific interactions, their visualization and enrichment analysis.

  • It is the first Hi-C pipeline that integrates TAD calling using published methods and our own algorithm.

  • It performs calculation of boundary scores using our own methods and existing ones.

  • Every pipeline step is followed by summary statistics (when applicable) and visualization of the results. This allows quality control and facilitates troubleshooting.

  • It is fully expandable and customizable. Users can follow the included wrapper script template in order to add new tools.

  • It allows parameter exploration and comparison of different methods in a combinatorial fashion. This unique feature facilitates the design and execution of complex benchmark studies that may involve combinations of multiple parameter/tool choices in each step.

  • It has been built with reproducibility in mind. All parameter settings are automatically logged.
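To make "combinatorial" concrete, the sketch below enumerates every combination of choices across three steps. It is illustrative only and is not how HiC-bench implements this feature: the function name, the step names and the example values (resolutions, TAD-caller labels) are hypothetical placeholders.

```shell
# Illustrative only: print one line per combination of per-step choices.
# Each argument is a space-separated list of (hypothetical) choices for one step.
list_combinations() {
  aligners="$1"
  resolutions="$2"
  callers="$3"
  for a in $aligners; do
    for r in $resolutions; do
      for c in $callers; do
        echo "align=$a/res=$r/tads=$c"
      done
    done
  done
}

# Example: 1 aligner x 2 resolutions x 2 callers gives 4 combinations.
# list_combinations "bowtie2" "40kb 100kb" "callerA callerB"
```

The number of combinations is the product of the number of choices per step, which is why a benchmark grows quickly as options are added.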

How to set up and run a pipeline

Clone the repository:

 git clone --depth 1 https://github.com/NYU-BFX/hic-bench.git

Choose a pipeline directory:

cd hic-bench/pipelines/hicseq-standard

Set up input fastq directories:

mkdir inputs/fastq
cd inputs/fastq
<create one directory per sample and populate with corresponding fastq files>
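The placeholder step above could be scripted roughly as follows. This is a minimal sketch, assuming it is run from inside inputs/fastq; the helper name add_sample, the sample name and the source paths are hypothetical, and any file-naming convention the pipeline expects is not covered here.

```shell
# Directory that holds one sub-directory per sample.
# Defaults to the current directory (assumed to be inputs/fastq).
FASTQ_ROOT="${FASTQ_ROOT:-.}"

# Hypothetical helper: create a sample directory and copy its fastq files into it.
# usage: add_sample <sample-name> <fastq-file>...
add_sample() {
  sample="$1"
  shift
  mkdir -p "$FASTQ_ROOT/$sample"
  cp "$@" "$FASTQ_ROOT/$sample/"
}

# Example (placeholder paths):
# add_sample sampleA /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz
```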

Create sample sheet in the "inputs" directory:

cd ..
./code/create-sample-sheet.tcsh

Execute the pipeline from the main pipeline directory (hic-bench/pipelines/hicseq-standard):

./run

Software requirements

The following software is required: the bowtie2 aligner (1); Python (2.7 or later) with the NumPy, SciPy and Matplotlib libraries; R (3.0.2); GenomicTools (2); HiCPlotter (3); and the R packages lattice, RColorBrewer, corrplot, reshape, gplots, preprocessCore, zoo, reshape2, plotrix, pastecs, boot, optparse, ggplot2, genlasso, igraph, Matrix, MASS, flsa, VennDiagram, futile.logger and plyr. Package versions are listed in the Manual (see the sessionInfo() output). In addition, the mirnylib Python library (https://bitbucket.org/mirnylab/mirnylib) is required for matrix balancing based on iterative correction (IC) (4). The pipeline has been tested on a high-performance computing cluster running Sun Grid Engine (SGE) under 64-bit Red Hat Linux.
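Before running the pipeline, it can help to confirm that the required executables are on the PATH. The following is a minimal sketch rather than part of HiC-bench: the function name check_tools is hypothetical, and the executable names in the usage line (bowtie2, python, R) are assumptions about how the tools are installed on your system.

```shell
# Hypothetical helper: report which of the given executables are on the PATH.
# Succeeds only if all of them are found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example:
# check_tools bowtie2 python R || echo "install the missing tools first"
```

The Python libraries can be checked in a similar spirit with python -c "import numpy, scipy, matplotlib", which fails with an ImportError if one of them is absent.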

Help

If you encounter problems, please contact Aristotelis Tsirigos (aristotelis.tsirigos at nyumc dot org) or Charalampos Lazaris (Charalampos.Lazaris at med dot nyu dot edu).

License

The MIT License (MIT)

Copyright (c) 2016, Charalampos Lazaris, Stephen Kelly, Panagiotis Ntziachristos, Iannis Aifantis and Aristotelis Tsirigos

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Acknowledgements

We are grateful to the researchers who performed the Hi-C experiments analyzed in our study and made their data publicly available. Aristotelis Tsirigos was supported by a Research Scholar Grant, RSG-15-189-01 - RMC from the American Cancer Society. We would like to thank Dennis Shasha and Juliana Freire for inspiring discussions. We would also like to thank Kadir Caner Akdemir for useful discussions on the usage of HiCPlotter. We thank the NYUMC Genome Technology Center (GTC) for expert library preparation and sequencing. This shared resource is partially supported by the Cancer Center Support Grant, P30CA016087, at the Laura and Isaac Perlmutter Cancer Center. We are grateful to the NYUMC Applied Bioinformatics Center (ABC) for providing bioinformatics support and helping with the analysis and interpretation of the data. This work has used computing resources at the High Performance Computing Facility (HPCF) of the Center for Health Informatics and Bioinformatics at the NYU Langone Medical Center.

References

  1. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Meth 2012, 9:357–359.
  2. Tsirigos A, Haiminen N, Bilal E, Utro F: GenomicTools: a computational platform for developing high-throughput analytics in genomics. Bioinformatics 2012, 28:282–283.
  3. Akdemir KC, Chin L: HiCPlotter integrates genomic data with interaction matrices. Genome Biol 2015, 16:1270–8.
  4. Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, Dekker J, Mirny LA: Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Meth 2012, 9:999–1003.

Availability

A pre-release is available on Zenodo: DOI

Availability of "data" directory and precompiled binaries

The "data" directory containing genes, fragments (with different restriction enzymes), areas to be excluded from downstream analysis etc. can be downloaded from here: https://goo.gl/741FSS (md5sum: 0b7ab3c0e2c3d56491d8befddef6c2c3). We also provide precompiled binaries that the users can download from here: http://goo.gl/95suZd (md5sum: d90f3835dc5a12631a1a28c012281b48).
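The md5sum values above can be used to check that a download is intact. Below is a minimal sketch, assuming GNU md5sum is available; the helper name verify_md5 and the archive file name in the usage line are hypothetical, since the names of the downloaded files are not stated here.

```shell
# Hypothetical helper: succeed only if the file's MD5 checksum matches the expected value.
# usage: verify_md5 <file> <expected-md5>
verify_md5() {
  actual=$(md5sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}

# Example (the file name is a placeholder for the downloaded "data" archive):
# verify_md5 data-archive 0b7ab3c0e2c3d56491d8befddef6c2c3 && echo "checksum OK"
```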
