Commit

merge of master into develop
fidelram committed Mar 16, 2018
2 parents a9e10df + d15a9f5 commit 4735b3f
Showing 78 changed files with 3,829 additions and 1,998 deletions.
3 changes: 3 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,3 @@
* [ ] Flake8 passes (`flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F403,E402,F999,F405,E712`)
* [ ] Local tests pass (`py.test hicexplorer --doctest-modules`)

5 changes: 2 additions & 3 deletions .travis.yml
@@ -48,9 +48,8 @@ before_install:

# Install packages
install:
- conda create --name hicexplorer_pytest --yes python=$TRAVIS_PYTHON_VERSION pytest
- source activate hicexplorer_pytest
- conda install --yes -c bioconda -c conda-forge -c defaults --file requirements.txt
- conda install --yes -c conda-forge -c bioconda python=$TRAVIS_PYTHON_VERSION --file requirements.txt
- conda install --yes -c conda-forge -c bioconda pytest
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then conda install --yes -c defaults -c conda-forge -c bioconda configparser ; fi
- python setup.py install

6 changes: 4 additions & 2 deletions README.rst
@@ -27,8 +27,10 @@ any type of genomic scores), long range contacts and the visualization of viewpo
Citation:
^^^^^^^^^

Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening, Kin Chung Lam, Bianca Habermann, Asifa Akhtar, Thomas Manke
**"High-resolution TADs reveal DNA sequences underlying genome organization in flies"**. bioRxiv 115063; doi: https://doi.org/10.1101/115063

Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening, Kin Chung Lam, Bianca Habermann, Asifa Akhtar, Thomas Manke.
**"High-resolution TADs reveal DNA sequences underlying genome organization in flies". Nature Communications**, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w


.. image:: ./docs/images/hicex2.png

8 changes: 8 additions & 0 deletions bin/hicAggregateContacts
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from hicexplorer.hicAggregateContacts import main

if __name__ == "__main__":
    main()

Empty file modified bin/hicTransform
100644 → 100755
Empty file.
2 changes: 1 addition & 1 deletion docs/content/HiCExport.rst
@@ -95,4 +95,4 @@ for differential Hi-C analysis.

- For more information on working with GInteraction and InteractionSet
objects in bioconductor check out `this
vingette <https://bioconductor.org/packages/devel/bioc/vignettes/InteractionSet/inst/doc/interactions.html>`__.
vignette <https://bioconductor.org/packages/devel/bioc/vignettes/InteractionSet/inst/doc/interactions.html>`__.
20 changes: 20 additions & 0 deletions docs/content/example_usage.rst
@@ -160,6 +160,26 @@ plot the counts using the `--log1p` option.
:align: center

Corrected Hi-C counts in log scale.


Quality control of Hi-C data and biological replicates comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

HiCExplorer integrates multiple tools that allow the evaluation of the quality of Hi-C libraries and matrices.

- hicQC on the log files produced by hicBuildMatrix and control of the pdf file produced.
The proportion of useful reads is important to assess the efficiency of the Hi-C protocol, which depends on the
proportion of dangling ends detected...
Proportion of inter chromosomal, short range and long range contacts are important for....

- hicPlotDistVsCounts to compare the distribution of corrected Hi-C counts in relation to the genomic
distance between multiple samples. If some differences are observed between biological replicates,
these can be investigated more precisely by computing log2ratio matrices.

- hicCompareMatrices log2ratio of matrices of biological replicates to identify where the potential changes
are located.

- hicPlotPCA bins correlation of two biological replicates.


TAD calling
196 changes: 142 additions & 54 deletions docs/content/list-of-tools.rst

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions docs/content/tools/hicAggregateContacts.rst
@@ -0,0 +1,35 @@
.. _hicAggregateContacts:

hicAggregateContacts
====================

.. contents::
   :local:

Background
^^^^^^^^^^

``hicAggregateContacts`` is a tool that plots aggregated Hi-C sub-matrices for a specified list of positions. Positions of interest can be, for example, binding sites of a specific protein determined by ChIP-seq, or genetic elements such as transcription start sites of active genes.

Description
^^^^^^^^^^^

.. argparse::
   :ref: hicexplorer.hicAggregateContacts.parse_arguments
   :prog: hicAggregateContacts

Usage example
^^^^^^^^^^^^^

Below, you can find an example of an aggregate Hi-C matrix obtained from *Drosophila melanogaster* Hi-C data. The interactions are plotted at binding sites of a protein that were determined by ChIP-seq. We plot sub-matrices of 30 bins (1.5 kb bin size, 45 kb in total). Each region specified in the BED file is centered in its sub-matrix, with half of the bins on either side. The considered range is 300-1000 kb. The range should be adjusted so that it only contains contacts larger than the TAD size, which reduces background interactions.

.. code-block:: bash

    $ hicAggregateContacts --matrix Dmel.h5 --BED ChIP-seq-peaks.bed \
    --outFileName Dmel_aggregate_Contacts --vMin 0.8 --vMax 2.2 \
    --range 300000:1000000 --numberOfBins 30 --chromosomes X \
    --avgType mean --transform obs/exp

.. image:: ../../images/hicAggregateContacts.png

This example was calculated using mean interactions of an observed vs expected transformed Hi-C matrix. Additional options for the matrix transformation are total-counts or z-score. Aggregate contacts can be plotted in 2D or 3D.
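
The aggregation idea can be illustrated with a minimal NumPy sketch. It is not the implementation used by ``hicAggregateContacts``: the function name ``aggregate_submatrices`` and its parameters are hypothetical, positions are given directly as bin indices, and only the mean (``--avgType mean``) is shown. It assumes that every pair of positions whose bin distance lies in the requested range contributes one square sub-matrix centered on that pair.

```python
import numpy as np


def aggregate_submatrices(matrix, centers, half_width, min_dist, max_dist):
    """Average square sub-matrices centered on pairs of bin indices.

    Hypothetical helper for illustration only. `centers` are bin indices,
    `min_dist` and `max_dist` are distances in bins (inclusive).
    """
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    size = 2 * half_width + 1
    total = np.zeros((size, size))
    count = 0
    centers = sorted(centers)
    for idx, i in enumerate(centers):
        for j in centers[idx + 1:]:
            # keep only pairs whose separation is inside the requested range
            if not (min_dist <= j - i <= max_dist):
                continue
            # skip windows that would run past the matrix border
            if i - half_width < 0 or j + half_width >= n:
                continue
            total += matrix[i - half_width:i + half_width + 1,
                            j - half_width:j + half_width + 1]
            count += 1
    # return an all-zero matrix when no pair qualified
    return total / count if count else total


# Toy 10x10 matrix with three positions of interest.
toy = np.arange(100, dtype=float).reshape(10, 10)
aggregate = aggregate_submatrices(toy, [2, 5, 8], half_width=1,
                                  min_dist=1, max_dist=3)
```

In this toy call only the pairs (2, 5) and (5, 8) fall inside the distance range, so the central value of ``aggregate`` is the mean of ``toy[2, 5]`` and ``toy[5, 8]``.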
45 changes: 43 additions & 2 deletions docs/content/tools/hicCompareMatrices.rst
@@ -1,9 +1,50 @@
.. _hicCompareMatrices:

hicCorrectMatrix
================
hicCompareMatrices
==================

.. contents::
   :local:

Background
^^^^^^^^^^

This tool is useful to compare two matrices in .h5 format by applying operations like difference, ratio or log2ratio after normalization. This can be used to determine the effect of a mutation compared to wild-type samples on contact enrichment, or to see TAD structure modifications near differentially expressed genes between two conditions when followed by :doc:`hicPlotMatrix`. It can also be used to compare two biological replicates.

Description
^^^^^^^^^^^

.. argparse::
   :ref: hicexplorer.hicCompareMatrices.parse_arguments
   :prog: hicCompareMatrices

Usage example
^^^^^^^^^^^^^

``hicCompareMatrices`` is usually performed on corrected matrices (:doc:`hicCorrectMatrix`) with bins merged (:doc:`hicMergeMatrixBins`), depending on the downstream analyses to perform. Here is an example of a log2ratio comparison between M1BP knockdown and GST cells in *Drosophila melanogaster* on corrected matrices with 50 bins merged (about 30 kb bins).

.. code:: bash

    hicCompareMatrices -m \
    M1BP_KD_merge_m50_corrected.h5 \
    GST_merge_rf_m50_corrected.h5 \
    --operation log2ratio -o m1bp_over_gst_log2_m50.h5

This code outputs a matrix containing the normalized log2ratio values of M1BP_KD_merge_m50_corrected.h5 over GST_merge_rf_m50_corrected.h5. We can then display this matrix using :doc:`hicPlotMatrix`.

.. code:: bash

    hicPlotMatrix -m \
    m1bp_over_gst_log2_m50.h5 \
    --clearMaskedBins \
    --region chr2L:12,000,000-19,000,000 \
    --vMin -4 --vMax 4 \
    -o m1bp_over_gst_log2_m50_matrix_plot.png

.. image:: ../../images/hicCompareMatrices_m1bp_over_gst_log2_m50_matrix_plot.png

In this plot we see that the cells with an M1BP knockdown display a negative log2ratio compared to the wild-type. Depletion of M1BP thus shows a dramatic effect on the distribution of Hi-C contacts, in which short range contacts decrease (Ramirez *et al.* 2017, High-resolution TADs reveal DNA sequences underlying genome organization in flies, https://doi.org/10.1038/s41467-017-02525-w).

Below you can find an example of a log2ratio plot between Hi-C matrices of two biological replicates. No differences are observable, which means that the replicates are well correlated.

.. image:: ../../images/hicCompareMatrices_QC_log2_m50_matrix_plot.png
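
To make the log2ratio operation concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code of ``hicCompareMatrices``: the function name ``log2_ratio`` is hypothetical, the normalization is assumed to be a rescaling of both matrices to the smaller total count, and the ``pseudocount`` that avoids division by zero is an assumption of this sketch.

```python
import numpy as np


def log2_ratio(matrix_a, matrix_b, pseudocount=1.0):
    """Return log2(a / b) after scaling both matrices to the same total.

    Hypothetical helper for illustration only; both inputs must have a
    non-zero sum.
    """
    a = np.asarray(matrix_a, dtype=float)
    b = np.asarray(matrix_b, dtype=float)
    # assumed normalization: bring both matrices to the smaller total count
    target = min(a.sum(), b.sum())
    a = a * (target / a.sum())
    b = b * (target / b.sum())
    # the pseudocount keeps empty bins from producing infinities
    return np.log2((a + pseudocount) / (b + pseudocount))


treated = np.array([[3.0, 1.0], [1.0, 3.0]])
control = np.array([[1.0, 3.0], [3.0, 1.0]])
ratio = log2_ratio(treated, control)
```

Positive values mark bins with more contacts in the first matrix and negative values mark bins with fewer. Two matrices that differ only in sequencing depth give a ratio of zero everywhere, which is the pattern expected for well-correlated replicates.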
3 changes: 3 additions & 0 deletions docs/content/tools/hicCorrectMatrix.rst
@@ -3,6 +3,9 @@
hicCorrectMatrix
================

.. contents::
   :local:

.. argparse::
   :ref: hicexplorer.hicCorrectMatrix.parse_arguments
   :prog: hicCorrectMatrix
40 changes: 40 additions & 0 deletions docs/content/tools/hicCorrelate.rst
@@ -3,7 +3,47 @@
hicCorrelate
============

.. contents::
   :local:

Background
^^^^^^^^^^

``hicCorrelate`` is a dedicated Quality Control tool that allows the correlation of multiple Hi-C matrices at once with either a heatmap or scatterplots output.

Description
^^^^^^^^^^^

.. argparse::
   :ref: hicexplorer.hicCorrelate.parse_arguments
   :prog: hicCorrelate

Usage example
^^^^^^^^^^^^^

Below, you can find a correlation example of uncorrected Hi-C matrices obtained from *Drosophila melanogaster* embryos, either wild-type or having one gene knocked-down by RNAi.

.. code-block:: bash

    $ hicCorrelate -m Dmel_wt_1.h5 Dmel_wt_2.h5 Dmel_kd_1.h5 Dmel_kd_2.h5 \
    --method=pearson --log1p \
    --labels Dmel_wt_1 Dmel_wt_2 Dmel_kd_1 Dmel_kd_2 \
    --range 5000:200000 \
    --outFileNameHeatmap Dmel_heatmap --outFileNameScatter Dmel_scatterplot \
    --plotFileFormat png

Heatmap
-------

.. image:: ../../images/Dmel_heatmap.png

This example shows a heatmap that was calculated using the Pearson correlation of uncorrected Hi-C matrices with a bin size of 6000 bp. The dendrogram indicates which samples are most similar to each other. You can see that the wild-type samples are separated from the knock-down samples. The second option we offer is calculating the Spearman correlation.
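
The computation behind one cell of such a heatmap can be sketched as follows. This is a simplified illustration, not the code of ``hicCorrelate``: the function name ``pearson_in_range`` is hypothetical, matrices are plain NumPy arrays instead of ``.h5`` files, and it is assumed that only upper-triangle bins whose genomic distance lies inside the range are compared, optionally after a ``log1p`` transform.

```python
import numpy as np


def pearson_in_range(matrix_a, matrix_b, bin_size, min_dist, max_dist,
                     log1p=True):
    """Pearson correlation of two contact matrices within a distance range.

    Hypothetical helper for illustration only. Distances are in base pairs
    and both bounds are inclusive.
    """
    a = np.asarray(matrix_a, dtype=float)
    b = np.asarray(matrix_b, dtype=float)
    # upper triangle only, because Hi-C matrices are symmetric
    rows, cols = np.triu_indices(a.shape[0])
    dist = (cols - rows) * bin_size
    keep = (dist >= min_dist) & (dist <= max_dist)
    x = a[rows[keep], cols[keep]]
    y = b[rows[keep], cols[keep]]
    if log1p:
        # log1p dampens the influence of a few very high counts
        x, y = np.log1p(x), np.log1p(y)
    return np.corrcoef(x, y)[0, 1]


rng = np.random.default_rng(0)
sample = rng.random((6, 6))
r_same = pearson_in_range(sample, sample, bin_size=6000,
                          min_dist=6000, max_dist=24000)
```

Repeating this for every pair of samples fills the square correlation matrix that is then clustered and drawn as the heatmap.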


Scatterplot
-----------

Additionally, pairwise scatterplots comparing interactions between each sample can be plotted.

.. image:: ../../images/Dmel_scatterplot.png
