FASTQ demux

A simple program to demultiplex Illumina FASTQ files based on barcodes in the FASTQ headers

Installation
Basic usage
- Barcode mismatches
- Sample sheet
Docker
Performance

Installation

git clone https://github.com/Molmed/fastq_demux.git
cd fastq_demux
pip install -r requirements.txt .

Alternatively, use a docker image (see below).

Basic usage

For detailed usage instructions, run the script with the --help switch:

fastq_demux --help

To demultiplex a FASTQ file or a pair of FASTQ files based on the barcodes present in the FASTQ headers, supply a file with forward reads (with --R1), reverse reads (with --R2, if paired-end) and a tab-separated sample sheet providing a barcode-to-sample mapping (with --samplesheet).

fastq_demux \
  --R1 tests/dual-index-short_Undetermined_S0_L001_R1_001.fastq.gz \
  --samplesheet tests/samplesheet.tsv

For each sample, a forward (R1) and, if the input is paired-end, a reverse (R2) FASTQ file will be written to the output folder. Reads that do not match any of the barcodes in the sample sheet are written to separate R1 (and R2) files for unknown barcodes. Summary statistics, such as the number of reads per barcode, are written to a JSON-formatted output file.
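To illustrate what "barcodes in the FASTQ headers" refers to, here is a minimal Python sketch that extracts the index sequence(s) from a header line. It assumes the common Illumina header layout, in which the comment after the first space ends with the index and dual indexes are joined with `+`. The function name `barcode_from_header` and the example header are made up for illustration and are not taken from the fastq_demux code.

```python
from typing import Tuple


def barcode_from_header(header: str) -> Tuple[str, ...]:
    """Return the index sequence(s) found at the end of a FASTQ header line.

    Assumes a header of the form
    '@<instrument>:...:<y> <read>:<filtered>:<control>:<index>'
    where a dual index is written as '<i7>+<i5>'.
    """
    # Everything after the first space is the comment field,
    # e.g. '1:N:0:GGGGGGGG+AGATCTCG'.
    _, _, comment = header.strip().partition(" ")
    if not comment:
        raise ValueError(f"no comment field in header: {header!r}")
    # The index is the last colon-separated element of the comment.
    index_field = comment.rsplit(":", 1)[-1]
    # A single index gives a 1-tuple, a dual index a 2-tuple.
    return tuple(index_field.split("+"))


# Hypothetical header, made up for illustration.
header = "@M00001:1:000000000-ABCDE:1:1101:15589:1332 1:N:0:GGGGGGGG+AGATCTCG"
print(barcode_from_header(header))  # ('GGGGGGGG', 'AGATCTCG')
```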

If the index sequences to be used for demultiplexing are available as regular reads in separate FASTQ files, you can specify these with the --I1 (for the i7-index) and, in the case of dual index, --I2 (for the i5-index) parameters. Any index information present in the FASTQ headers will in that case be ignored.

fastq_demux \
  --R1 tests/dual-index_Undetermined_S0_L001_R1_001.fastq.gz \
  --I1 tests/dual-index_Undetermined_S0_L001_I1_001.fastq.gz \
  --I2 tests/dual-index_Undetermined_S0_L001_I2_001.fastq.gz \
  --samplesheet tests/samplesheet.tsv

Barcode mismatches

The --mismatches parameter allows mismatches when comparing the read barcode to the sample sheet barcodes. For dual indexes, the specified number of mismatches is tolerated for each index independently. Note that if a read barcode cannot be unambiguously matched to a single sample sheet barcode, an exception is thrown and the number of allowed mismatches must be decreased.
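The matching rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual implementation: mismatches are counted as Hamming distance, barcodes are represented as tuples with one string per index, and the names `hamming` and `match_barcode` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Barcode = Tuple[str, ...]  # one string per index: (i7,) or (i7, i5)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(
    read_barcode: Barcode, sheet: Dict[Barcode, str], mismatches: int = 0
) -> Optional[str]:
    """Return the matching sample, or None if no barcode matches.

    Each index is allowed up to `mismatches` mismatches independently.
    Raises ValueError if more than one sample sheet barcode matches.
    """
    hits = [
        sample
        for barcode, sample in sheet.items()
        if len(barcode) == len(read_barcode)
        and all(hamming(r, s) <= mismatches for r, s in zip(read_barcode, barcode))
    ]
    if len(hits) > 1:
        raise ValueError(
            f"{read_barcode} matches {hits}; decrease the allowed mismatches"
        )
    return hits[0] if hits else None


# Barcode-to-sample mapping for a small dual-index example.
sheet = {
    ("GGGGGGGG", "AGATCTCG"): "Sample_1",
    ("GAAGATTT", "TTTACTCT"): "Sample_2",
    ("GAAGATTT", "AAAACGCC"): "Sample_3",
}
# One mismatch in the i7 index, exact i5 index.
print(match_barcode(("GGGGGGGA", "AGATCTCG"), sheet, mismatches=1))  # Sample_1
```

With three allowed mismatches, a read barcode such as `("GAAGATTT", "AAAACTCT")` is within range of both Sample_2 and Sample_3, so the sketch raises an error instead of picking one.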

Sample sheet

The sample sheet should have two or three columns for single or dual index reads, respectively. The columns are SampleID, P7-index and P5-index.

Here is an example for a sample sheet accompanying a dual-index FASTQ file:

Sample_1    GGGGGGGG    AGATCTCG
Sample_2    GAAGATTT    TTTACTCT
Sample_3    GAAGATTT    AAAACGCC
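As an illustration of the format, the following Python sketch reads a tab-separated sample sheet into a barcode-to-sample mapping. It assumes no header row and the column order given above. The function name `read_samplesheet` is hypothetical and not part of the fastq_demux API.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_samplesheet(handle: TextIO) -> Dict[Tuple[str, ...], str]:
    """Map each barcode, as a tuple of one or two indexes, to its sample ID."""
    mapping: Dict[Tuple[str, ...], str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines.
        if not row or not row[0].strip():
            continue
        sample_id, *indexes = (field.strip() for field in row)
        # Two columns for single index, three for dual index.
        if len(indexes) not in (1, 2):
            raise ValueError(f"expected 2 or 3 columns, got {len(row)}: {row}")
        key = tuple(indexes)
        if key in mapping:
            raise ValueError(f"barcode {key} is listed more than once")
        mapping[key] = sample_id
    return mapping


# The dual-index example from above, with tabs between the columns.
text = (
    "Sample_1\tGGGGGGGG\tAGATCTCG\n"
    "Sample_2\tGAAGATTT\tTTTACTCT\n"
    "Sample_3\tGAAGATTT\tAAAACGCC\n"
)
samples = read_samplesheet(io.StringIO(text))
print(samples[("GAAGATTT", "AAAACGCC")])  # Sample_3
```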

Docker

To build a docker image:

git clone https://github.com/Molmed/fastq_demux.git
cd fastq_demux
docker build -t fastq_demux:master .

Run the docker image without arguments to see usage:

docker run fastq_demux:master

Example usage with a data directory mounted into the container from the local filesystem:

docker run -v $(pwd)/tests:/data fastq_demux:master \
--R1 /data/dual-index-short_Undetermined_S0_L001_R1_001.fastq.gz \
--samplesheet /data/samplesheet.tsv

Performance

FASTQ demux uses Python's zlib module for gzip compression and decompression, which is quite slow, and most of the runtime is spent compressing data. If possible, it is therefore much quicker to work with uncompressed files and do the compression elsewhere. Uncompressed FASTQ files are accepted as input, and with the --no-gzip-compression command line switch, the output FASTQ files are written uncompressed.

As an example, below is a table with the CPU time required to demultiplex 10 samples from 5M dual-index read pairs, using combinations of compressed and uncompressed FASTQ as input/output:

| Input/output | CPU time |
| --- | --- |
| `fastq.gz/fastq.gz` | `14m56.664s` |
| `fastq.gz/fastq` | `1m5.816s` |
| `fastq/fastq.gz` | `14m26.243s` |
| `fastq/fastq` | `0m35.434s` |

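Compressing the output on the fly is clearly the expensive step: both runs that write gzipped output take more than 14 minutes of CPU time, while both runs that write uncompressed output finish in about a minute or less.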
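One way to do the compression elsewhere is to write uncompressed output with --no-gzip-compression and compress each file in a separate step afterwards. The Python sketch below shows such a step. The helper name `gzip_file` is hypothetical, and because it uses the same zlib-based compression it does not make compression itself faster. It only moves the work out of the demultiplexing run, so that files can for instance be compressed one per process or with a dedicated tool.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path, compresslevel: int = 6) -> Path:
    """Compress `src` to '<src>.gz' and return the path of the new file.

    The original file is left in place; remove it afterwards if desired.
    """
    dst = src.with_name(src.name + ".gz")
    # Stream the data in chunks so large FASTQ files are never held in memory.
    with open(src, "rb") as fin, gzip.open(
        dst, "wb", compresslevel=compresslevel
    ) as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

A call such as `gzip_file(Path("Sample_1_R1.fastq"))` would then produce `Sample_1_R1.fastq.gz` next to the input, where the file name is only an example and not necessarily the naming scheme used by fastq_demux.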
