Skip to content
/ ntHits Public

Identifying repeats in high-throughput sequencing data


Notifications You must be signed in to change notification settings


Repository files navigation



ntHits is a tool for efficiently counting and filtering k-mers based on their frequencies.


  • C++ compiler with c++17 and OpenMP support
  • Meson
  • btllib (>=1.7.2)
  • Catch2, only for running tests

ntHits uses argparse for command-line argument parsing which is built-in as a submodule (no further installation required).


NOTE: If you are installing btllib from its source, run its ./compile script and add the following environment variables:

export CPPFLAGS="-isystem /path/to/btllib/install/include $CPPFLAGS"
export LDFLAGS="-L/path/to/btllib/install/lib -lbtllib $LDFLAGS"

Download the latest release and run the following command in the project's root directory to create a buildsystem in the build folder:

meson setup build

Then, cd into the build folder and compile ntHits using:


This will generate two binary files in the build folder: nthits for generating the desired data structure containing the k-mers and if possible, their counts; and nthits-bfq for querying the output if it's a (counting) Bloom filter.


Usage: nthits --frequencies VAR [--min-count VAR] [--max-count VAR] [--kmer-length VAR] [-h] [--error-rate VAR] [--seeds VAR] [--threads VAR] [--solid] [--long-mode] --out-file VAR out_type files

Filters k-mers based on counts (cmin <= count <= cmax) in input files

Positional arguments:
  out_type              Output format: Bloom filter 'bf', counting Bloom filter ('cbf'), or table ('table') [required]
  files                 Input files [nargs: 0 or more] [required]

Optional arguments:
  -f, --frequencies     Frequency histogram file (e.g. from ntCard) [required]
  -cmin, --min-count    Minimum k-mer count (>=1), ignored if using --solid [default: 1]
  -cmax, --max-count    Maximum k-mer count (<=254) [default: 254]
  -k, --kmer-length     k-mer length, ignored if using spaced seeds (-s) [default: 64]
  -h, --num-hashes      Number of hashes to generate per k-mer/spaced seed [default: 3]
  -p, --error-rate      Target Bloom filter error rate [default: 0.0001]
  -s, --seeds           If specified, use spaced seeds (separate with commas, e.g. 10101,11011) 
  -t, --threads         Number of parallel threads [default: 4]
  --solid               Automatically tune 'cmin' to filter out erroneous k-mers 
  --long-mode           Optimize data reader for long sequences (>5kbp) 
  -v                    Level of details printed to stdout (-v: normal, -vv detailed) 
  -o, --out-file        Output file's name [required]

Copyright 2022 Canada's Michael Smith Genome Science Centre

If the output data structure is a Bloom filter (or CBF), it can be queried by either using the nthits-bfq tool, or using btllib's API.

ntHits Bloom Filter Query Tool

Usage: nthits-bfq [-h] [--cbf] [--seeds VAR] [--silent] bf_path

Query tool for ntHits' output Bloom filter

Positional arguments:
  bf_path       Input Bloom filter file [required]

Optional arguments:
  -h, --help    shows help message and exits 
  -v, --version prints version information and exits 
  --cbf         Treat input file as a counting Bloom filter and output k-mer counts 
  -s, --seeds   Spaced seed patterns separated with commas (e.g. 10101,11011) 
  --silent      Don't print logs to stdout 

Copyright 2022 Canada's Michael Smith Genome Science Centre

btllib's Bloom Filter API

C++ example:

#include <btllib/bloom_filter.hpp>
#include <btllib/counting_bloom_filter.hpp>
#include <string>

int main() {
  btllib::KmerBloomFilter bf(path_to_bloom_filter);
  // or btllib::KmerCountingBloomFilter8 
  std::string kmer = "AGCTATCAGTCGA";
  std::cout << bf.contains(kmer) << std::endl;
  return 0;

Python example:

import btllib

bf = btllib.KmerBloomFilter(path_to_bloom_filter)
# or btllib.KmerCountingBloomFilter8

If using spaced seeds, btllib's BloomFilter and CountingBloomFilter classes should be used instead. In this case, refer to btllib's docs and examples to query the Bloom filters using hashes generated from a SeedNtHash object.