editing and adding contact map
Michal Sakin committed Jan 27, 2021
1 parent a2d56a4 commit c9388b6
Showing 5 changed files with 76 additions and 28 deletions.
1 change: 1 addition & 0 deletions docs/source/data_sets.rst
@@ -11,6 +11,7 @@ To download one of the data sets, simply use the wget command:
wget https://s3.amazonaws.com/dovetail.pub/HiC/fastqs/OmniC_2M_R1.fastq
wget https://s3.amazonaws.com/dovetail.pub/HiC/fastqs/OmniC_2M_R2.fastq
For testing purposes, we recommend using the 2M reads data sets; for any other purpose, we recommend using the 800M reads data set.
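
As a minimal sketch, the two downloads above can be scripted so that the R1 and R2 URLs are built from a single library prefix. The helper name ``fastq_urls`` is hypothetical, and the sketch assumes that other libraries follow the same ``<library>_R1.fastq`` / ``<library>_R2.fastq`` naming as the 2M files shown above; check the table below for the actual links before relying on it.

.. code-block:: console

   # Base URL taken from the wget commands above.
   base_url="https://s3.amazonaws.com/dovetail.pub/HiC/fastqs"

   # fastq_urls (hypothetical helper): print the R1 and R2 URLs for a library prefix.
   fastq_urls() {
       for mate in R1 R2; do
           echo "${base_url}/${1}_${mate}.fastq"
       done
   }

   # Print the URLs first; once they look right, pass them to wget, e.g.:
   #   fastq_urls OmniC_2M | xargs -n 1 wget
   fastq_urls OmniC_2M

The function only prints the URLs, so nothing is downloaded until its output is passed to ``wget``.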

+---------------------+------------------------------------------------------------------------+
| Library | Link |

89 changes: 64 additions & 25 deletions docs/source/epigenetics.rst
@@ -1,6 +1,6 @@
.. _EPIGENETICS:

Epigenetics application with Omni-C
Epigenetics applications with Omni-C
====================================

In this section you will learn how to:
@@ -11,26 +11,78 @@ In this section you will learn how to:

- :ref:`Identify A/B compartments<IAC>`

- :ref:`Predict Isochores<PI>`

- :ref:`list CTCF sites in a reference genome<LCTCF>`


Additional Dependencies:
+++++++++++++++++++++++
------------------------

- `Juicer Tools <https://github.com/aidenlab/juicer>`_ - Download the Juicer Tools JAR file, place it in the same directory as this repository, and name it ``juicertools.jar``, e.g.:

.. code-block:: console

   wget https://s3.amazonaws.com/hicfiles.tc4ga.com/public/juicer/juicer_tools_1.22.01.jar
   mv juicer_tools_1.22.01.jar ./Omni-C/juicertools.jar

- Java - If not already installed, you can install Java as follows:

- :ref:`Cooler <https://cooler.readthedocs.io/en/latest/index.html>` - `pip3 install cooler`, or follow instructions on the :ref:`link <https://cooler.readthedocs.io/en/latest/index.html>`
- :ref:`hic2cool <https://github.com/4dn-dcic/hic2cool>` - `pip3 install hic2cool`, or follow instructions on the :ref:`link <https://github.com/4dn-dcic/hic2cool>`
- :ref:`Juicer Tools <https://github.com/aidenlab/juicer>` - `Download the JAR file for juicertools and place it in the same directory as this reposity and name it as `juicertools.jar` e.g. `wget https://s3.amazonaws.com/hicfiles.tc4ga.com/public/juicer/juicer_tools_1.19.02.jar`
- java - `sudo apt install default-jre`
.. code-block:: console

   sudo apt install default-jre

.. _GCM:

Generate contact maps
---------------------
Generating contact maps using Juicer tools
------------------------------------------

In preparation
`HiC <https://github.com/aidenlab/juicer/wiki/Pre>`_ is a widely used contact matrix format. Here we will show you how to use ``Juicer tools`` for generating a ``HiC`` contact map.

- ``HiC`` is a highly compressed binary representation of the contact matrix

- It provides rapid random access to the matrix of any genomic region

- It stores the contact matrix at 9 different resolutions

- It can be manipulated programmatically using the straw Python API


The :ref:`.pairs<GPB>` file that you generated in the :ref:`From fastq to final valid pairs bam file<FTB>` section can be used directly with ``Juicer tools`` to generate the ``HiC`` contact matrix:

.. csv-table::
   :file: tables/matrix_hic.csv
   :header-rows: 1
   :widths: 25 75
   :class: tight-table


.. admonition:: Tip no.1

   Please note that older versions of ``Juicer tools`` may not support generating a contact map directly from a ``.pairs`` file, so we recommend updating to a newer version. In our tests, the ``pre`` utility of version 1.22.01 supports converting ``.pairs`` to ``HiC``.

**Command:**

.. code-block:: console

   java -Xmx16000m -Djava.awt.headless=true -jar <path_to_juicer_tools.jar> pre --threads <no_of_threads> <mapped.pairs> <contact-map.hic> <ref.genome>

**Example:**

.. code-block:: console

   java -Xmx16000m -Djava.awt.headless=true -jar ./Omni-C/juicertools.jar pre --threads 16 mapped.pairs contact_map.hic GRCh38.p12.genome

.. admonition:: Tip no.2

   ``Juicer tools`` offers additional functions that are not discussed here, including matrix normalization and generating a matrix for specified regions of the genome only. To learn more about advanced options, please refer to the `Juicer Tools documentation <https://github.com/aidenlab/juicer/wiki/Pre>`_.
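
The ``<ref.genome>`` argument in the command above is a file listing each chromosome and its size. If you do not already have one, the following is a minimal sketch of deriving it from a FASTA index. It assumes that ``samtools faidx`` has already produced a ``.fai`` file for your reference; the ``demo.*`` file names and the offset columns in the stand-in index are illustrative only, so that the commands can be tried without a real genome.

.. code-block:: console

   # Stand-in for a real index; in practice run: samtools faidx <ref.fasta>
   # .fai columns: name, length, offset, bases per line, bytes per line.
   printf 'chr1\t248956422\t6\t60\t61\nchr2\t242193529\t253105708\t60\t61\n' > demo.fasta.fai

   # Keep only the first two columns (chromosome name and length).
   cut -f1,2 demo.fasta.fai > demo.genome
   cat demo.genome

The resulting two-column, tab-separated file can then be passed to ``pre`` in place of ``<ref.genome>``.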

The visualization tool ``Juicebox`` can be used to visualize the contact matrix. You can either `download <https://github.com/theaidenlab/juicebox/wiki/Download>`_ a local version of the tool to your computer or use a `web <https://www.aidenlab.org/juicebox/>`_ version of Juicebox. Load your ``.hic`` file to visualize the contact map and zoom in to areas of interest.

.. image:: /images/hic.png
   :width: 200pt
   :align: center

You can use the contact matrix to call :ref:`TADs<ITAD>`, identify :ref:`A/B compartments<IAC>`, or even find large structural variations.


.. _ITAD:
@@ -49,17 +101,4 @@ Identify A/B compartments
In preparation


.. _PI:

Predict Isochores
-----------------

In preparation


.. _LCTCF:

list CTCF sites in a reference genome
-------------------------------------

In preparation
6 changes: 3 additions & 3 deletions docs/source/fastq_to_bam.rst
@@ -31,8 +31,6 @@ From fastq to final valid pairs bam file
|clock| The full command above, with 2M read pairs on an Ubuntu 18.04 machine with 16 CPUs and 64GiB was completed in less than 5 minutes.
On the same machine type.

|clock| The full command above, with 2M read pairs on an Ubuntu 18.04 machine with 16 CPUs and 64GiB was completed in less than 5 minutes.
On the same machine type.

.. |clock| image:: /images/clock.jpg
:scale: 5 %
@@ -153,7 +151,7 @@ Removing PCR duplicates

``pairtools dedup`` detects molecules that could be formed via PCR duplication and tags them as “DD” pair type. These pairs should be excluded from downstream analysis. Use the ``pairtools dedup`` command with the ``--output-stats`` option to save the duplication statistics to a text file.

```pairtools dedup``` options:
``pairtools dedup`` options:

.. csv-table::
:file: tables/dedup.csv
@@ -175,6 +173,7 @@ Removing PCR duplicates
pairtools dedup --nproc-in 8 --nproc-out 8 --mark-dups --output-stats stats.txt --output dedup.pairsam sorted.pairsam
.. _GPB:

Generate .pairs and bam files
+++++++++++++++++++++++++++++
@@ -204,6 +203,7 @@ The ``pairtools split`` command is used to split the final ``.pairsam`` into two
pairtools split --nproc-in 8 --nproc-out 8 --output-pairs mapped.pairs --output-sam unsorted.bam dedup.pairsam
The ``.pairs`` file can be used for generating a :ref:`contact matrix <GCM>`.
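
Before building a contact matrix, it can help to sanity-check the ``.pairs`` file. The sketch below counts the pairs and how many of them connect two different chromosomes. It assumes the standard pairs layout, in which header lines start with ``#`` and columns 2 and 4 hold the two chromosome names; ``demo.pairs`` is a tiny stand-in file, not real output.

.. code-block:: console

   # Tiny stand-in for mapped.pairs; header lines start with '#'.
   printf '## pairs format v1.0\n#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type\nr1\tchr1\t100\tchr1\t5000\t+\t-\tUU\nr2\tchr1\t200\tchr2\t300\t+\t+\tUU\n' > demo.pairs

   # Count data lines (pairs), skipping the header.
   n_pairs=$(awk '!/^#/ {n++} END {print n+0}' demo.pairs)

   # Count pairs whose two ends fall on different chromosomes.
   n_trans=$(awk -F'\t' '!/^#/ && $2 != $4 {n++} END {print n+0}' demo.pairs)

   echo "pairs: ${n_pairs}, trans: ${n_trans}"

For a compressed file, the same ``awk`` commands can read from ``gzip -dc mapped.pairs.gz`` instead.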

Generating the final bam file
+++++++++++++++++++++++++++++
Binary file added docs/source/images/hic.png
8 changes: 8 additions & 0 deletions docs/source/tables/matrix_hic.csv
@@ -0,0 +1,8 @@
Parameter,Function
-Xmx16000m,"The -Xmx flag specifies the maximum memory allocation pool for the Java virtual machine. In our experience, 16000m works well when processing human data sets; you can increase the allocated memory if needed."
-Djava.awt.headless=true,"Runs Java in headless mode, which is used when the application does not interact with a user (if not specified, the default is -Djava.awt.headless=false)"
pre,The pre command allows users to create .hic files from their own data
--threads,Specifies the number of threads to use (an integer)
*.pairs or *.pairs.gz,input file for generating the contact matrix
*.genome,"genome file, listing the chromosomes and their sizes"
*.hic,"hic output file, containing the contact matrix"
