Commit

update docs
Michal Sakin committed Jan 24, 2021
1 parent 5e0a54a commit a2d56a4
Showing 6 changed files with 37 additions and 30 deletions.
8 changes: 4 additions & 4 deletions docs/source/before_you_begin.rst
@@ -28,7 +28,7 @@ Make sure that the following dependencies are installed:
- `preseq <http://smithlabresearch.org/software/preseq/>`_
- `samtools <https://github.com/samtools/samtools>`_


If you are facing any issues with the installation of any of the dependencies, please contact the maintainer of the relevant package.
python3 and pip3 are required. If you don't already have them installed, you will need sudo privileges to install them.

* Update and install python3 and pip3:
@@ -47,16 +47,16 @@ python3 and pip3 are required, if you don't already have them installed, you wil
sudo update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
If you are working on a new machine and don't have the dependencies, you can use the ``installDep.sh`` script in this repository for updating your instance and installing the dependencies and python3. This process will take approximitly 10' and requiries sudo privileges. The scrit was tested on Ubuntu 18.04 with the latest version as of 04/11/2020
If you are working on a new machine and don't have the dependencies, you can use the ``installDep.sh`` script in this repository for updating your instance and installing the dependencies and python3. This process will take approximately 10 minutes and requires sudo privileges. The script was tested on Ubuntu 18.04 with the latest version as of 04/11/2020

If you choose to run the provided installetion script you will first need to set the permission to the file:
If you choose to run the provided installation script you will first need to set the permission to the file:

.. code-block:: console
chmod +x ./Omni-C/installDep.sh
And then run the installetion script:
And then run the installation script:

.. code-block:: console
4 changes: 3 additions & 1 deletion docs/source/conf.py
@@ -173,5 +173,7 @@
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']


#css
def setup(app):
    app.add_stylesheet('css/custom.css')
# -- Extension configuration -------------------------------------------------
47 changes: 26 additions & 21 deletions docs/source/fastq_to_bam.rst
@@ -3,32 +3,33 @@
From fastq to final valid pairs bam file
========================================

.. _impatient:
.. _Impatient:

fastq to final valid pairs bam file - for the impatient
-------------------------------------------------------
.. admonition:: fastq to final valid pairs bam file - for the impatient!

If you just want to give it a shot and run all the alignment and filtering steps without going over all the details, we made a shorter version for you, with all the steps piped, outputing a final bam file with its index file and a dup stats file, otherwise move to the next section :ref:`fastq to final valid pairs bam file - step by step<step-by-step>`
If you just want to give it a shot and run all the alignment and filtering steps without going over all the details, we made a shorter version for you, with all the steps piped, outputting a final bam file with its index file and a dup stats file, otherwise move to the next section :ref:`fastq to final valid pairs bam file - step by step<step-by-step>`

**Command:**
**Command:**

.. code-block:: console
.. code-block:: console
bwa mem -5SP -T0 -t<cores> <ref.fa> <OmniC.R1.fastq.gz> <OmniC.R2.fastq.gz>| \
pairtools parse --min-mapq 40 --walks-policy 5unique \
--max-inter-align-gap 30 --nproc-in <cores> --nproc-out <cores> --chroms-path <ref.genome> | \
pairtools sort --tmpdir=<full_path/to/tmpdir> --nproc <cores>|pairtools dedup --nproc-in <cores> \
--nproc-out <cores> --mark-dups --output-stats <stats.txt>|pairtools split --nproc-in <cores> \
--nproc-out <cores> --output-pairs <mapped.pairs> --output-sam -|samtools view -bS -@<cores> | \
samtools sort -@<cores> -o <mapped.PT.bam>;samtools index <mapped.PT.bam>
bwa mem -5SP -T0 -t<cores> <ref.fa> <OmniC.R1.fastq.gz> <OmniC.R2.fastq.gz>| \
pairtools parse --min-mapq 40 --walks-policy 5unique \
--max-inter-align-gap 30 --nproc-in <cores> --nproc-out <cores> --chroms-path <ref.genome> | \
pairtools sort --tmpdir=<full_path/to/tmpdir> --nproc <cores>|pairtools dedup --nproc-in <cores> \
--nproc-out <cores> --mark-dups --output-stats <stats.txt>|pairtools split --nproc-in <cores> \
--nproc-out <cores> --output-pairs <mapped.pairs> --output-sam -|samtools view -bS -@<cores> | \
samtools sort -@<cores> -o <mapped.PT.bam>;samtools index <mapped.PT.bam>
**Example:**

**Example:**
.. code-block:: console
.. code-block:: console
bwa mem -5SP -T0 -t16 hg38.fasta OmniC_2M_R1.fastq OmniC_2M_R2.fastq| pairtools parse --min-mapq 40 --walks-policy 5unique --max-inter-align-gap 30 --nproc-in 8 --nproc-out 8 --chroms-path hg38.genome | pairtools sort --tmpdir=/home/ubuntu/ebs/temp/ --nproc 16|pairtools dedup --nproc-in 8 --nproc-out 8 --mark-dups --output-stats stats.txt|pairtools split --nproc-in 8 --nproc-out 8 --output-pairs mapped.pairs --output-sam -|samtools view -bS -@16 | samtools sort -@16 -o mapped.PT.bam;samtools index mapped.PT.bam
bwa mem -5SP -T0 -t16 hg38.fasta OmniC_2M_R1.fastq OmniC_2M_R2.fastq| pairtools parse --min-mapq 40 --walks-policy 5unique --max-inter-align-gap 30 --nproc-in 8 --nproc-out 8 --chroms-path hg38.genome | pairtools sort --tmpdir=/home/ubuntu/ebs/temp/ --nproc 16|pairtools dedup --nproc-in 8 --nproc-out 8 --mark-dups --output-stats stats.txt|pairtools split --nproc-in 8 --nproc-out 8 --output-pairs mapped.pairs --output-sam -|samtools view -bS -@16 | samtools sort -@16 -o mapped.PT.bam;samtools index mapped.PT.bam
|clock| The full command above, with 2M read pairs on an Ubuntu 18.04 machine with 16 CPUs and 64GiB was completed in less than 5 minutes.
On the same machine type.

|clock| The full command above, with 2M read pairs on an Ubuntu 18.04 machine with 16 CPUs and 64GiB was completed in less than 5 minutes.
On the same machine type.
@@ -81,7 +82,7 @@ Bwa mem will output a sam file that you can either pipe or save to a path using
Recording valid ligation events
+++++++++++++++++++++++++++++++

We use the ``parse`` module of the ``pairtools`` pipeline to find ligation junctions in Omni-C (and other proximity ligation) libraries. When a ligation event is identified in the alignment file the pairtools pipeline will record the outer-most (5’) aligned base pair and the strand of each one of the paired reads into `.pairsam` file (pairam fromat captures SAM entries together with the Hi-C pair information). In addition, it will also asign a pair type for each event. e.g. if both reads aligned uniquely to only one region in the genome, the type UU (Unique-Unique) will be assigned to the pair. The following steps are necessary to identify the high quality valid pairs over low quality events (e.g. due to low mapping quality):
We use the ``parse`` module of the ``pairtools`` pipeline to find ligation junctions in Omni-C (and other proximity ligation) libraries. When a ligation event is identified in the alignment file the pairtools pipeline will record the outer-most (5’) aligned base pair and the strand of each one of the paired reads into a ``.pairsam`` file (the pairsam format captures SAM entries together with the Hi-C pair information). In addition, it will also assign a pair type to each event, e.g. if both reads aligned uniquely to only one region in the genome, the type UU (Unique-Unique) will be assigned to the pair. The following steps are necessary to identify the high quality valid pairs over low quality events (e.g. due to low mapping quality):


``pairtools parse`` options:
@@ -101,7 +102,7 @@ We use the ``parse`` module of the ``pairtools`` pipeline to find ligation junct
.. code-block:: console
pairtools parse --min-mapq 40 --walks-policy 5unique --max-inter-align-gap 30 --nproc-in <cores>\
--nproc-out <cores> --chroms-path <ref.genome> <aligned.sam> > <parsed.pairsam>
--nproc-out <cores> --chroms-path <ref.genome> <aligned.sam> > <parsed.pairsam>
**Example:**
@@ -141,10 +142,14 @@ The parsed pairs are then sorted using `pairtools sort`
pairtools sort --nproc 16 --tmpdir=/home/ubuntu/ebs/temp/ parsed.pairsam > sorted.pairsam
.. admonition:: Important!

Please note that ``pairtools sort`` requires an absolute path for the temp directory. Paths such as ``~/ebs/temp/`` or ``./temp/`` will not work; use a full path such as ``/home/user/ebs/temp/`` instead.
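
As an illustration of the note above, a home-relative or working-directory-relative path can be expanded to an absolute one before it is passed to ``--tmpdir``. This is a minimal sketch using only the Python standard library; the helper name ``absolute_tmpdir`` is hypothetical and not part of this repository.

```python
import os


def absolute_tmpdir(path):
    """Expand '~' and relative components so the result is an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


if __name__ == "__main__":
    # Print the absolute form of each path style mentioned in the note.
    for candidate in ("~/ebs/temp/", "./temp/", "/home/user/ebs/temp/"):
        print(candidate, "->", absolute_tmpdir(candidate))
```

The printed value is what would be supplied after ``--tmpdir=``. The directory itself still has to exist and be writable.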

.. _DUPs:

Removig dups
++++++++++++
Removing PCR duplicates
+++++++++++++++++++++++

``pairtools dedup`` detects molecules that could be formed via PCR duplication and tags them as “DD” pair type. These pairs should be excluded from downstream analysis. Use the pairtools dedup command with the `--output-stats` option to save the dup stats into a text file.

@@ -174,7 +179,7 @@ Removig dups
Generate .pairs and bam files
+++++++++++++++++++++++++++++

The ``pairtools split`` command is used to split the final ``.pairsam`` into two files: ``.sam`` (or ``.bam``) and ``.pairs`` (``.pairsam`` has two extra columns containing the alignments from which the Micro-C pair was extracted, these two columns are not included in ``.pairs`` files)
The ``pairtools split`` command is used to split the final ``.pairsam`` into two files: ``.sam`` (or ``.bam``) and ``.pairs`` (``.pairsam`` has two extra columns containing the alignments from which the Omni-C pair was extracted, these two columns are not included in ``.pairs`` files)

``pairtools split`` options:

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -31,7 +31,7 @@ Overview

- This guide will take you step by step through how to QC your Omni-C library, how to interpret the QC results and how to generate :ref:`contact maps <GCM>`, study :ref:`chromatin structure<ITAD>`, use Omni-C data for analyzing and enhancing your assembly and more. If you don't yet have a sequenced Omni-C library and you want to get familiar with the data, you can download Omni-C sequenced libraries from our publicly available :ref:`data sets<DATASETS>`.

- To :ref:`QC<LQ>` your library, first you will need to align the reads to the reference, filtering and additional alignment processing will be done using `pairtools package <https://github.com/open2c/pairtools>`_ and the final output `bam` file will be used for QC-ing the library and downstream steps.
- The QC process starts with aligning the reads to a reference genome, then retaining high quality mapped reads. From there, the mapped data is used to generate a pairs file with pairtools, which categorizes pairs by read type and insert distance; this step also flags and removes PCR duplicates. Once pairs are categorized, counts of each class are summed and reported.

- If this is your first time following this tutorial, please check the :ref:`Before you begin page <BYB>` first.

4 changes: 2 additions & 2 deletions docs/source/library_qc.rst
@@ -4,7 +4,7 @@ Library QC
==========


At step :ref:`Removing dups<DUPs>` you used the flag `--output-stats`, genereting a stats file in addition to the pairsam output (e.g. --output-stats stats.txt). The stats file is an extensive output of pairs statistics as calulated by pairtools, including total reads, total mapped, total dups, total pairs for each pair of chromosomes etc'. Although you can use directly the pairtools stats file as is to get informed on the quality of the Omni-C library, we find it easier to focus on a few key metrics. We include in this repository the script `get_qc.py` that summarize the paired-tools stats file and present them in precentage values in addition to absolute values.
At step :ref:`Removing PCR duplicates<DUPs>` you used the flag ``--output-stats``, generating a stats file in addition to the pairsam output (e.g. ``--output-stats stats.txt``). The stats file is an extensive output of pairs statistics as calculated by pairtools, including total reads, total mapped, total dups, total pairs for each pair of chromosomes, etc. Although you can use the pairtools stats file directly to assess the quality of the Omni-C library, we find it easier to focus on a few key metrics. We include in this repository the script ``get_qc.py``, which summarizes the pairtools stats file and presents the values as percentages in addition to absolute values.
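
To make the idea of summarizing the stats file concrete, here is a minimal sketch. It assumes the stats file is tab-separated ``key<TAB>value`` text with keys such as ``total``, ``total_mapped``, ``total_dups`` and ``total_nodups``. It is not the implementation of ``get_qc.py``: the function names and the choice of denominators are assumptions made for illustration.

```python
def read_stats(path):
    """Parse a tab-separated key/value stats file into a dict of integers."""
    stats = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                continue  # skip lines that are not simple key/value pairs
            key, value = fields
            try:
                stats[key] = int(value)
            except ValueError:
                continue  # skip non-integer values
    return stats


def percent(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def summarize(stats):
    """Report a few key metrics as percentages (denominators are assumed)."""
    total = stats.get("total", 0)
    mapped = stats.get("total_mapped", 0)
    return {
        "mapped_pct_of_total": percent(mapped, total),
        "dups_pct_of_mapped": percent(stats.get("total_dups", 0), mapped),
        "nodups_pct_of_mapped": percent(stats.get("total_nodups", 0), mapped),
    }
```

Calling ``summarize(read_stats("stats.txt"))`` returns a small dictionary that can be printed next to the absolute counts.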

The images below explain how the values on the QC report are calculated:

@@ -16,7 +16,7 @@ The images below explains how the values on the QC report are calculated:

.. code-block:: console
python3 ./microc/get_qc.py -p <stats.txt>
python3 ./Omni-C/get_qc.py -p <stats.txt>
**Example:**
2 changes: 1 addition & 1 deletion docs/source/pre_alignment.rst
@@ -5,7 +5,7 @@ Pre-Alignment

For downstream steps you will need a genome file: a tab-delimited file with chromosome names and their respective sizes. If you don't already have a genome file, follow these steps:

1. generate an index file for your reference, a reference file with only the main chromosomes should be used (e.g. without alternative or unplaced chromosomes).
1. Generate an index file for your reference. A reference file with only the main chromosomes should be used (e.g. without alternative or unplaced chromosomes).

**Command:**

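
The commands for these steps are collapsed in this diff view. As an illustration of the genome file format described above (chromosome name and size, tab-delimited), the following sketch derives one from a FASTA index. It assumes the index uses the ``.fai`` layout produced by ``samtools faidx``, where the first two columns are sequence name and length. The function name and file names are hypothetical, and this is not the tutorial's own command.

```python
def fai_to_genome(fai_path, genome_path):
    """Copy the name and length columns of a FASTA index into a genome file."""
    count = 0
    with open(fai_path) as fai, open(genome_path, "w") as genome:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            genome.write(f"{fields[0]}\t{fields[1]}\n")
            count += 1
    return count  # number of sequences written
```

For example, ``fai_to_genome("hg38.fasta.fai", "hg38.genome")`` would write one line per sequence in the index.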
