alnstats

alnstats is a high-performance command-line tool that calculates yield and duplicate statistics from BAM, SAM, or CRAM alignment files. Its duplicate statistics are designed to be compatible with those produced by Picard's MarkDuplicates.

Features

Yield Statistics: Computes read counts, total bases, and clipped yield for both single-end (SE) and paired-end (PE) data.
Duplicate Statistics: Provides detailed metrics on duplication rates, optical duplicates, and estimated library size.
Aggregation: Supports aggregating statistics at the Sample or Library level, based on the Read Group information in the alignment header.
Format Support: Handles BAM, SAM, and CRAM files (CRAM requires a FASTA reference).
Performance: Reports real-time progress, including processing speed and interval duration.

Usage

alnstats [OPTIONS] --input <INPUT>

Parameters and Options

Option	Short	Long	Description
Input	`-i`	`--input`	Required. The alignment file (BAM/SAM/CRAM) to process.
FASTA	`-f`	`--fasta`	The reference FASTA file. Required for decoding CRAM files.
Metrics	`-m`	`--metrics`	Output file path for duplicate metrics (JSON format).
Yield	`-y`	`--yield`	Output file path for aggregate yield results (JSON format).
Tags	`-d`	`--duplicate-type-tag`	Tag names used for marking duplicate types (default: `dt`). Can be specified multiple times.
Aggregation	`-a`	`--aggregation`	Level of data aggregation: `sample` or `library` (default: `library`).
Verbosity	`-v`	`--verbose`	Increase logging verbosity (can be used multiple times).
Help	`-h`	`--help`	Print help information.
Version	`-V`	`--version`	Print version information.

Output Files

alnstats generates two main types of output in JSON format, depending on the options provided.

Duplicate Metrics (`--metrics`)

This file contains statistics about duplicate reads, aggregated by the chosen level (Sample or Library).

Field	Description
`UNPAIRED_READS_EXAMINED`	Number of mapped reads examined that are either unpaired or paired with an unmapped mate.
`READ_PAIRS_EXAMINED`	Number of mapped read pairs examined.
`SECONDARY_OR_SUPPLEMENTARY_RDS`	Number of reads marked as secondary or supplementary alignments (ignored for duplicate counting).
`UNMAPPED_READS`	Total number of unmapped reads encountered.
`UNPAIRED_READ_DUPLICATES`	Number of unpaired reads marked as duplicates.
`READ_PAIR_DUPLICATES`	Number of read pairs marked as duplicates.
`READ_PAIR_OPTICAL_DUPLICATES`	Number of read pairs marked as optical duplicates (based on the provided tags).
`PERCENT_DUPLICATION`	The percentage of reads that are marked as duplicates.
`ESTIMATED_LIBRARY_SIZE`	An estimate of the number of unique molecules in the library.
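
As an illustration of how the last two fields relate to the raw counts, the sketch below follows the formulas used by Picard's MarkDuplicates: the duplicate fraction counts each pair as two reads, and the library size is the root of the Lander-Waterman relation C/X = 1 - exp(-N/X), found by bisection. This is a minimal sketch that assumes alnstats mirrors Picard here; the function names are hypothetical and are not part of alnstats. In Picard, N is `READ_PAIRS_EXAMINED` minus `READ_PAIR_OPTICAL_DUPLICATES`, C is `READ_PAIRS_EXAMINED` minus `READ_PAIR_DUPLICATES`, and `PERCENT_DUPLICATION` is reported as a fraction between 0 and 1.

```rust
/// Duplicate fraction in the style of Picard: duplicate reads divided by
/// examined reads, with every pair counted as two reads.
/// (Hypothetical helper, not taken from the alnstats source.)
fn duplication_fraction(
    unpaired_examined: u64,
    pairs_examined: u64,
    unpaired_duplicates: u64,
    pair_duplicates: u64,
) -> f64 {
    let examined = unpaired_examined + 2 * pairs_examined;
    if examined == 0 {
        return 0.0;
    }
    (unpaired_duplicates + 2 * pair_duplicates) as f64 / examined as f64
}

/// Lander-Waterman relation rearranged to f(X) = C/X - 1 + exp(-N/X);
/// the library size is the X at which this is zero.
fn lander_waterman(x: f64, c: f64, n: f64) -> f64 {
    c / x - 1.0 + (-n / x).exp()
}

/// Estimate the number of unique molecules by bisection.
/// `read_pairs` is N (pairs examined, optical duplicates removed) and
/// `unique_read_pairs` is C (pairs examined minus duplicate pairs).
/// Returns None when no estimate is possible (no pairs or no duplicates).
fn estimate_library_size(read_pairs: u64, unique_read_pairs: u64) -> Option<u64> {
    if read_pairs == 0 || unique_read_pairs == 0 || unique_read_pairs >= read_pairs {
        return None;
    }
    let (c, n) = (unique_read_pairs as f64, read_pairs as f64);
    // Search for X = r * C with r in [lo, hi]; f is positive at r = 1.
    let (mut lo, mut hi) = (1.0_f64, 100.0_f64);
    while lander_waterman(hi * c, c, n) > 0.0 {
        hi *= 10.0; // widen the bracket until f changes sign
    }
    for _ in 0..40 {
        let mid = (lo + hi) / 2.0;
        let u = lander_waterman(mid * c, c, n);
        if u > 0.0 {
            lo = mid;
        } else if u < 0.0 {
            hi = mid;
        } else {
            break;
        }
    }
    Some((c * (lo + hi) / 2.0) as u64)
}

fn main() {
    // Made-up counts, for illustration only.
    let fraction = duplication_fraction(10, 100, 2, 20);
    println!("duplicate fraction: {fraction:.3}");
    match estimate_library_size(1000, 900) {
        Some(size) => println!("estimated library size: {size}"),
        None => println!("library size cannot be estimated"),
    }
}
```

The bisection runs a fixed 40 iterations, which narrows the bracket far below one molecule for realistic counts, so no explicit tolerance is needed.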

Yield Results (`--yield`)

This file contains yield statistics for the processed alignments.

Paired-End (PE) Yield

For paired-end data, metrics are provided for both `first_end` and `second_end`:

Field	Description
`n_reads`	Total number of mapped reads.
`max_length`	The maximum read length observed.
`clipped_yield`	Total number of bases that are aligned (excluding soft/hard clips).
`total_yield`	Total number of bases in the reads (including clips).
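
To make the difference between `clipped_yield` and `total_yield` concrete, the sketch below derives both numbers for a single read from its CIGAR string. It rests on assumptions that the text above does not confirm: that `total_yield` corresponds to the stored read sequence (so soft clips count, while hard-clipped bases, which are absent from the sequence, do not), and that `clipped_yield` counts the remaining `M`, `I`, `=`, and `X` bases. The function `cigar_yield` is hypothetical and is not part of alnstats.

```rust
/// Returns (total_yield, clipped_yield) for one CIGAR string, or None if the
/// string is malformed or is the "*" placeholder.
/// Assumption: soft clips count toward the total only; hard clips count
/// toward neither, because those bases are not stored in the record.
fn cigar_yield(cigar: &str) -> Option<(u64, u64)> {
    let (mut total, mut clipped) = (0u64, 0u64);
    let mut len = 0u64;
    let mut have_digits = false;
    for ch in cigar.chars() {
        if let Some(digit) = ch.to_digit(10) {
            // Accumulate the operation length, rejecting overflow.
            len = len.checked_mul(10)?.checked_add(u64::from(digit))?;
            have_digits = true;
            continue;
        }
        if !have_digits {
            return None; // an operation without a length
        }
        match ch {
            // Operations that consume read bases and are not clipped.
            'M' | 'I' | '=' | 'X' => {
                total += len;
                clipped += len;
            }
            // Soft-clipped bases are in the read but not in the alignment.
            'S' => total += len,
            // These consume no stored read bases.
            'D' | 'N' | 'H' | 'P' => {}
            _ => return None,
        }
        len = 0;
        have_digits = false;
    }
    if have_digits {
        return None; // trailing length without an operation
    }
    Some((total, clipped))
}

fn main() {
    for cigar in ["5S90M5S", "10H50M2I38M", "5S10M3D10M"] {
        match cigar_yield(cigar) {
            Some((total, clipped)) => {
                println!("{cigar}: total_yield={total} clipped_yield={clipped}")
            }
            None => println!("{cigar}: not a valid CIGAR string"),
        }
    }
}
```

Summing these per-read values over all reads in an aggregation group would give figures of the same kind as the fields in the table, subject to the assumptions stated above.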

Single-End (SE) Yield

For single-end data, the same fields (`n_reads`, `max_length`, `clipped_yield`, `total_yield`) are provided at the root of the aggregation key.

Requirements

Rust 1.89.0 or newer.
For CRAM processing, the corresponding reference FASTA file must be available.

Installation

To build alnstats from source:

cargo build --release

The resulting binary will be located at target/release/alnstats.
