Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?


Failed to load latest commit information.
Latest commit message
Commit time

Release Downloads Issues



ntHash is a recursive hash function for hashing all possible k-mers in a DNA/RNA sequence.

Installation & Usage

Download the repo (either from the releases section or close using git clone --recurse-submodules Generate a cmake buildsystem in an arbitrary directory (e.g. release), by running the following command in the project's root:

cmake -S . -B release

Then, build the project and its dependencies using:

cmake --build release --target all

The project's executable (./nthash), static library (libnthash.a), and tests (nthash_tests) will be generated alongside btllib in the release folder.

Tests can be run using ctest:

cd release && ctest


After building the project, use the output binary file to generate hash values from input data and store the results on disk:

Usage: nthash [options] files 

Positional arguments:
files           Input sequence files [required]

Optional arguments:
-v --version    prints version information and exits [default: false]
-k              k-mer size [required]
-o              Output file (for -f collect) or directory path [required]
-f              Output file organization (create files containing hashes for each 'file', 'record', or 'collect' all hashes into a single file [default: "file"]
-h              Number of hashes per k-mer [default: 1]
-s              Input spaced seed patterns separated by commas (e.g. 1110111,11011011). Performs k-mer hashing if no value provided.
--long          Optimize file reader for long sequences (>5kbp) [default: false]
--binary        Output hashes in binary files (otherwise plain text) [default: false]
--verbose       Print progress to stdout [default: false]

For example, given two input files 1.fa and 2.fa, two hash values for each 64-mer can be saved to a binary file called out.bin by running this command in release:

./nthash -k 64 -h 2 -o out.bin -f collect --binary --verbose 1.fa 2.fa

In the plain text format (tab separated values), the rows consist of the hashes of k-mers/seeds in the same order seen in the input sequences. For binary output, these values are dumped without any delimiters. E.g. in the case of out.bin generated above, the first and second hashes of the first k-mer are stored in the beginning of the file, followed by the first and second hashes of the second k-mer.

Static Library Usage

To use ntHash in a C++ project:

  • Link the code with libnthash.a (i.e. pass -L path/to/nthash/release -l nthash to the compiler).
  • Add the include directory (pass -I path/to/nthash/include to the compiler).
  • Repeat for btllib (add flags: -L path/to/nthash/release/btllib -l btllib -I path/to/nthash/vendor/btllib/include)
  • Import ntHash in the code using #include <nthash/nthash.hpp>.
  • Access ntHash classes from the nthash namespace.


Generally, the nthash::NtHash and nthash::SeedNtHash classes are used for hashing sequences:

nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5);  // 1 hash per 5-mer
while (nth.roll()) {
    // use nth.hashes() for canonical hashes
    //     nth.get_forward_hash() for forward strand hashes
    //     nth.get_reverse_hash() for reverse strand hashes
std::vector<std::string> seeds = {"10001", "11011"};
nthash::SeedNtHash nth("TGACTGATCGAGTCGTACTAG", seeds, 3, 5);
while (nth.roll()) {
    // nth.hashes()[0] = "T###T"'s first hash
    // nth.hashes()[1] = "T###T"'s second hash
    // nth.hashes()[2] = "T###T"'s third hash
    // nth.hashes()[3] = "TG#CT"'s first hash

Refer to docs for more information.



Parham Kazemi, Johnathan Wong, Vladimir Nikolić, Hamid Mohamadi, René L Warren, Inanç Birol, ntHash2: recursive spaced seed hashing for nucleotide sequences, Bioinformatics, 2022;, btac564,


Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397