Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

PyPI version Downloads Python 3.8 License: GPL v3 Twitter Follow

ToulligQC is dedicated to the QC analyses of Oxford Nanopore runs. This software is written in Python and developped by the GenomiqueENS core facility of the Institute of Biology of the Ecole Normale Superieure (IBENS).

Click on following image to see an report example. An online help is available to better understand graphics generated with ToulligQC when clicking on the icon.

Report preview

Authors / Support

Karine Dias, Bérengère Laffay, Lionel Ferrato-Berberian, Sophie Lemoine, Morgane Thomas-Chollier, Stéphane Le Crom and Laurent Jourdren.

Support is availlable on GitHub issue page and at toulligqc at bio.ens.psl.eu.

Table of Contents

1. Get ToulligQC

1.1 Local

This option is also suitable if you are interested in further developments of the package, but requires a little bit more hands-on. Install the dependencies required and clone the repository locally.

$ git clone https://github.com/GenomicParisCentre/toulligQC.git
# X.X here is the version of ToulligQC to install
$ git checkout vX.X
$ cd toulligqc && python3 setup.py build install
  • Requirements

ToulligQC is written with Python 3. To run ToulligQC without Docker, you need to install the following Python modules:

  • matplotlib
  • plotly
  • h5py
  • pandas
  • numpy
  • scipy
  • scikit-learn

1.2 Using a PyPi package

ToulligQC can be more easlily installed with a pip package availlable on the PyPi repository. The following command line will install the latest version of ToulligQC:

$ pip3 install toulligqc

1.3 Using Docker

ToulligQC and its dependencies are available through a Docker image. To install docker on your system, go to the Docker website (https://docs.docker.com/engine/installation/). Even if Docker can run on Windows or macOS virtual machines, we recommend to run ToulligQC on a Linux host.

  • Docker image recovery

An image of ToulligQC is hosted on the Docker hub on the genomicpariscentre repository (genomicpariscentre/toulligqc).

$ docker pull genomicpariscentre/toulligqc:latest

  • Launching Docker image with docker run

$ docker run -ti \
             -u $(id -u):$(id -g) \
             --rm \
             -v /path/to/basecaller/sequencing/summary/file:/path/to/basecaller/sequencing/summary/file \
             -v /path/to/basecaller/sequencing/telemetry/file:/path/to/basecaller/telemetry/summary/file \
             -v /path/to/result/directory:/path/to/result/directory \
             toulligqc:latest

2. Usage

ToulligQC is adapted to RNA-Seq along with DNA-Seq and it is compatible with 1D² runs. This QC tool supports only Guppy basecalling ouput files. It also needs a single FAST5 file (to catch the flowcell ID and the run date) if a telemetry file is not provided. Flow cells and kits version are retrieved using the telemetry file. ToulligQC can take barcoding samples by adding the barcode list as a command line option.

To do so, ToulligQC deals with different file formats: gz, tar.gz, bz2, tar.bz2 and .fast5 to retrieve a FAST5 information. This tool will produce a set of graphs, statistic file in plain text format and a HTML report.

To run ToulligQC you need the Guppy basecaller output files : sequencing_summary.txt and sequencing_telemetry.js. This can be compressed with gzip or bzip2. You can use your initial Fast5 ONT file too. ToulligQC can perform analyses on your data if the directory is organised as the following:

RUN_ID
├── sequencing_summary.txt
└── sequencing_telemetry.js

for 1D² analysis:

RUN_ID
├── sequencing_summary.txt
├── sequencing_telemetry.js
└── 1dsq_analysis
    └── sequencing_1dsq_summary.txt

For a barcoded run you can add the barcoding files generated by Guppy barcoding_summary_pass.txt and barcoding_summary_fail.txt to ToulligQC or a single file sequencing_summary_all.txt containing sequencing_summary and barcoding_summary information combined.

For the barcode list to use in the command line options, ToulligQC handle the following naming schemes: BCXX, RBXX, NBXX and barcodeXX where XX is the number of the barcode. The barcode naming schemes are case insensitive.

This is a directory for 1D² analysis with barcoding files:

  RUN_ID
  ├── sequencing_summary.txt
  ├── sequencing_telemetry.js
  └── 1dsq_analysis
      ├── barcoding_summary_pass.txt
      ├── barcoding_summary_fail.txt
      └── sequencing_1dsq_summary.txt

2.1 Command line

  • Options

General Options:

usage: ToulligQC V2.2.1 -a SEQUENCING_SUMMARY_SOURCE [-t TELEMETRY_SOURCE]
                        [-f FAST5_SOURCE] [-n REPORT_NAME]
                        [--output-directory OUTPUT] [-o HTML_REPORT_PATH]
                        [--data-report-path DATA_REPORT_PATH]
                        [--images-directory IMAGES_DIRECTORY]
                        [-d SEQUENCING_SUMMARY_1DSQR_SOURCE] [-b]
                        [-l BARCODES] [--quiet] [--force] [-h] [--version]

required arguments:
  -a SEQUENCING_SUMMARY_SOURCE, --sequencing-summary-source SEQUENCING_SUMMARY_SOURCE
                        Basecaller sequencing summary source, can be
                        compressed with gzip (.gz) or bzip2 (.bz2)
  -t TELEMETRY_SOURCE, --telemetry-source TELEMETRY_SOURCE
                        Basecaller telemetry file source, can be compressed
                        with gzip (.gz) or bzip2 (.bz2)
  -f FAST5_SOURCE, --fast5-source FAST5_SOURCE
                        Fast5 file source (necessary if no telemetry file),
                        can also be in a tar.gz/tar.bz2 archive or a directory

optional arguments:
  -n REPORT_NAME, --report-name REPORT_NAME
                        Report name
  --output-directory OUTPUT
                        Output directory
  -o HTML_REPORT_PATH, --html-report-path HTML_REPORT_PATH
                        Output HTML report
  --data-report-path DATA_REPORT_PATH
                        Output data report
  --images-directory IMAGES_DIRECTORY
                        Images directory
  -d SEQUENCING_SUMMARY_1DSQR_SOURCE, --sequencing-summary-1dsqr-source SEQUENCING_SUMMARY_1DSQR_SOURCE
                        Basecaller 1dsq summary source
  -b, --barcoding       Option for barcode usage
  -l BARCODES, --barcodes BARCODES
                        Coma separated barcode list (e.g.
                        BC05,RB09,NB01,barcode10)
  --quiet               Quiet mode
  --force               Force overwriting of existing files
  -h, --help            Show this help message and exit
  --version             show program's version number and exit

  • Examples

Example with optional arguments:

$ toulligqc --report-name FAF0256 \
            --telemetry-source /path/to/basecaller/output/sequencing_telemetry.js \
            --sequencing-summary-source /path/to/basecaller/output/sequencing_summary.txt \
            --sequencing-summary-1dsqr-source /path/to/basecaller/output/sequencing_1dsqr_summary.txt \ # (optional)
            --html-report-path /path/to/output/report.html

Example with optional arguments to deal with barcoded samples:

$ toulligqc --report-name FAF0256 \
            --barcoding \
            --telemetry-source /path/to/basecaller/output/sequencing_telemetry.js \
            --sequencing-summary-source /path/to/basecaller/output/sequencing_summary.txt \
            --sequencing-summary-source /path/to/basecaller/output/barcoding_summary_pass.txt \         # (optional)
            --sequencing-summary-source /path/to/basecaller/output/barcoding_summary_fail.txt \         # (optional)
            --sequencing-summary-1dsqr-source /path/to/basecaller/output/sequencing_1dsqr_summary.txt \ # (optional)
            --sequencing-summary-1dsqr-source /path/to/basecaller/output/barcoding_summary_pass.txt \   # (optional)
            --sequencing-summary-1dsqr-source /path/to/basecaller/output/barcoding_summary_fail.txt \   # (optional)
            --html-report-path /path/to/output/report.html \
            --data-report-path /path/to/output/report.data \                                            # (optional)
            --barcodes BC01,BC02,BC03

2.2 Sample data

We provide sample raw data that can be used to launch and evaluate our software. This demo data has been generated using a MinION MKIb with a R9.4.1 flowcell (FLO-MIN106) in 1D (SQK-LSK108) mode with barcoded samples (BC01, BC02, BC03, BC04, BC05 and BC07). Data acquisition was performed using MinKNOW 1.11.5 and basecalling/demultiplexing was completed using Guppy 3.2.4.

  • First download and uncompress sample data:
$ wget http://outils.genomique.biologie.ens.fr/leburon/downloads/toulligqc-example/toulligqc_demo_data.tar.bz2
$ tar -xzf toulligqc_demo_data.tar.bz2
$ cd toulligqc_demo_data
  • Then, you can launch the ToulligQC analysis of the demo data with the run-toulligqc-with-docker.sh script if you want to use a Docker container:
$ ./run-toulligqc-with-docker.sh
  • Or with run-toulligqc.sh script if ToulligQC is already installed on your system:
$ ./run-toulligqc.sh
  • Of course, you can also launch manually ToulligQC on the sample data with the following command line:
$ toulligqc \
    --report-name               'ToulligQC Demo Data' \
    --barcoding \
    --telemetry-source          sequencing_telemetry.js \
    --sequencing-summary-source sequencing_summary.txt \
    --sequencing-summary-source barcoding_summary_pass.txt \
    --sequencing-summary-source barcoding_summary_fail.txt \
    --barcodes                  BC01,BC02,BC03,BC04,BC05,BC07 \
    --output-directory          output

With this scripts or command line, ToulligQC will create an output directory with output HTML report. More information about this sample data and scripts can be found in the README file of the tar archive.

3.Output

If the options --output-directory or --html-report-path are not provided, ToulligQC generates all below files and images in the current directory. If no report-name is given, ToulligQC creates a default report name.

  • A HTML report with (the path of this file can be defined using --html-report-path command line option ):

    • useful information about the sequencing run given as input
    • a read count and a read length histograms about different read types
    • a graph checking that the sequencing was homogeneous during a run
    • a graph allowing to locate potential flowcell spatial biaises
    • graphs representing the PHRED score distribution and the density distribution across read types
    • a collection of graphs displaying length/speed/quality or number of sequences over sequencing time
    • a set of graphs providing quality, length information and read counts for each barcode

  • A report.data log file containing (the path of this file can be defined using --data-report-path command line option ):

    • information about ToulligQC execution
    • environment variables
    • full statistics are provided for complementary analyses if needed : the information by modules is retained in a key-value form, the prefix of a key being the report data file id of the module
    • the nucleotide rate per read

If you choose to use a directory output (default choice), the output will be organised like this :
RUN_ID
├── report.html
├── report.data
└── images
    └── plots.html
    └── plot.png

About

A post sequencing QC tool for Oxford Nanopore sequencers

Resources

License

Stars

Watchers

Forks

Packages

No packages published