Java implementation of the BLAKE2b cryptographic hash function


Blake2b is a Java implementation of the BLAKE2b cryptographic hash function created by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein. (See Blake2 project site for details and authoritative information about the BLAKE2b digest.)

This implementation was made with close attention to the reference C implementation by Samuel Neves. The quite excellent Go implementation by Dmitry Chestnykh was also reviewed and referenced for optimization inspiration.


I required a high performance SHA1 replacement with incremental hash capabilities for a pet project (a content addressable distributed temporal DB). BLAKE2b seemed to fit the bill and here we are.

"Cryptographic security" beyond collision resistance was not, and is not, a design concern, and I have zero relevant subject-matter expertise. For example, I have no clue how one would erase traces of the key used with the MAC feature of the algorithm from the JVM heap without resorting to JNI/JNA (?), and no such attempt has been made in this implementation. Fair warning; please do note it.
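For reference, the usual best-effort measure in pure Java -- which, per the above, this library does not perform -- is to overwrite the key bytes after use. The sketch below is illustrative only (the class and method names are mine, not the library's); the JVM may still hold unreachable stale copies of the key, which is exactly the limitation described above.

```java
import java.util.Arrays;

// Best-effort key zeroing (illustrative only; this library does not do this).
// The JVM may still retain stale copies of the key bytes that this cannot reach.
public final class KeyWipe {
    static void wipe(byte[] key) {
        Arrays.fill(key, (byte) 0); // overwrite the one reachable copy
    }
    public static void main(String[] args) {
        byte[] key = "a demo key".getBytes();
        // ... the key would be used with the MAC feature here ...
        wipe(key);
        System.out.println(key[0]); // prints 0
    }
}
```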


This implementation is provided with a suite of tests minimally covering the reference implementation KAT and Keyed-KAT, covering the basic digest and MAC features (respectively). These may be inspected in the src/test/ source tree of this repository.

Additionally, a C test program emitting a tree-based hash was used to confirm the Blake2b.Tree output. Augmenting the C test program to also cover salt and personalization is on the TODO list. Do note that these 2 features have not been tested yet, though confidence in them is fairly high.


Written by Joubin Mohammad Houshyar, 2014.

To the extent possible under law, the author has dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.


Blake2b provides full support for the algorithm per the reference C implementation on the JVM, as adapted for Java semantics.

The API very closely follows MessageDigest's relevant public methods, e.g. void update (byte[] input, int offset, int len), etc.

As of now, fully configurable Digest, MAC, and incremental hash (aka Tree) are provided.

(The parallel implementation of the algorithm is on the immediate TODO list.)

Installation and Use


First you need to clone this repo and install the artifact in your local .m2 repository. (Getting this project into a central Maven repository is on the TODO list.)

So, clone it in some happy place:

git clone

Go there:

cd blake2b

And build and (locally) install it:

mvn install


To include Blake2b as a dependency in your Maven project, add the following to the <dependencies> section of your project's pom.xml:


If you just want the jar, you can grab it from the project target/ after mvn install or mvn package:



The hashing API is defined by the top level Blake2b interface. This interface is implemented by the Blake2b.Engine and it is indirectly accessible via the semantic inner classes of Blake2b. These methods allow for both "streaming" and basic hashing of byte[] input data.

The API mimics the relevant subset of the standard Java MessageDigest class, and Blake2b.Param implements the (tag) interface AlgorithmParameterSpec as preparatory support for possible JCA compliance. That is not on my TODO list as of now, but care has been taken to make that path easy for you, should you wish to pursue it yourself.

*** A note regarding input constraint checks ***

Typically one prefers a maximally defensive and pedantic implementation of functions that assert constraints on input arguments. But given that Blake2b is intended for high-performance applications, the reference implementation (Blake2b.Engine) does not do so. So, pass in null and expect a NullPointerException, etc. Do note it.


update(..) is typically (*) used for sequential ("stream") updates of the digest.

The (3) variants of update:

  • void update (byte input) to update with a single byte
  • void update (byte[] input) to update with a byte[]
  • void update (byte[] input, int offset, int len) to update with a slice of a byte[]

Update methods treat all input args as immutable (const equivalent).

(*) Note that if your use case requires the final digest output to be written to a slice of a user-supplied byte[], you need to invoke update(..) first, even in the non-streaming case, since the digest(byte[] out, int off, int len) variant takes no input.
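The streaming pattern above can be illustrated with the JDK's own MessageDigest (SHA-256 as a stand-in here, so the snippet runs without this library): chunked updates produce the same hash as a single one-shot update, which is the contract Blake2b's update(..) variants mimic.

```java
import java.security.MessageDigest;
import java.util.Arrays;

public final class StreamingPattern {
    // hash the message in two slices, mirroring the update(..) variants above
    static byte[] streamed(byte[] msg, int split) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(msg, 0, split);                  // first slice
        md.update(msg, split, msg.length - split); // remaining slice
        return md.digest();
    }
    static byte[] oneShot(byte[] msg) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(msg);
    }
    public static void main(String[] args) throws Exception {
        byte[] msg = "the quick brown fox".getBytes("UTF-8");
        System.out.println(Arrays.equals(oneShot(msg), streamed(msg, 9))); // prints true
    }
}
```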


Given that final is a keyword in Java and finalize has specific semantics in the context of Java objects, digest(..) is the Java equivalent of a hash final(..) method.

The (3) variants of digest:

  • byte[] digest() to emit the final hash of the BLAKE2b state.
  • byte[] digest(byte[] input) to update the state with input and emit the final hash.
  • void digest(byte[] out, int off, int len) writes the final hash to the given byte[]

All digest(..) methods returning byte[] are guaranteed to return a non-null value.

Configuration and Usage

A Blake2b implementation can be used simply with default configuration parameters, or optionally, you may choose to set specific configuration parameters of the algorithm.

Default Settings (Blake2b.Param.Default)

Digest Length:   64 bytes 
Key:             none
Salt:            none
Personalization: none 
Depth:           1
Fanout:          1
Leaf Length:     0
Inner Length:    0
Node Depth:      0
Node Offset:     0

(Note that all Blake2b semantic factory zero-arg methods use the above default configuration parameters.)

Configuration Via Blake2b.Param

You can change the default configuration via the Blake2b.Param class.

To obtain an instance of the Param class, simply instantiate it:

// get an instance of the default configuration parameters

Blake2b.Param param = new Blake2b.Param(); 

Param exposes a fluent setter API allowing for chaining of setXxx methods.

For example:

// get an instance of the default configuration parameters
// and customize it with key, salt, and personalization,
// and output a 160bit hash (say for SHA1 replacement)

Blake2b.Param param = new Blake2b.Param().
   setDigestLength( 20 ). 
   setKey ( keyBytes ).
   setSalt ( saltBytes ).
   setPersonal ( personalizationBytes );

The constraints on the configuration parameter settings are detailed in Blake2b.Spec.

Note that Blake2b.Param pedantically asserts all constraints on input args.

General Hashing with Blake2b.Digest

Blake2b.Digest instantiates the reference implementation of Blake2b isolating you from the current/actual implementation (Blake2b.Engine).

// using default params
final Blake2b blake2b = Blake2b.Digest.newInstance();        


// using custom params
final Blake2b blake2b = Blake2b.Digest.newInstance( param ); 

Blake2b.Digest also exposes a convenience factory method for the common case of a customized digest length:

// Just config the output digest length - here 160bit output ala SHA1
final Blake2b blake2b = Blake2b.Digest.newInstance( 20 );  

Note that you can use the Blake2b instance obtained via Blake2b.Digest factory methods to perform any of the features of the BLAKE2b algorithm with appropriate configuration via Blake2b.Param, should you prefer.

Message Authentication Codes with Blake2b.Mac

MAC semantics are exposed by Blake2b.Mac class. This (convenience) class exposes a few factory methods.

final Blake2b mac = Blake2b.Mac.newInstance ( theKeyBytes );      // basic MAC

and the common case of MAC with custom digest length

final Blake2b mac = Blake2b.Mac.newInstance ( theKeyBytes, 20 );  // 160bit MAC

You can also use a Key (which must support encoding -- see the Key javadocs for details) instead of a raw byte[]:


final Key theKey = .. // not provided by Blake2b library

// 160bit MAC with Key
final Blake2b mac = Blake2b.Mac.newInstance ( theKey, 20 );  

Incremental Hashing with Blake2b.Tree

The Blake2b.Tree class provides a convenient semantic API for incremental hashing with Blake2b.

// incrementally hash a stream of bytes using Blake2b.
// here a tree of depth 2, fanout 3, leaf length of 4096, inner length of 64,
// and a final (tree) hash output of length 20

Blake2b.Tree tree = new Blake2b.Tree (2, 3, 4096, 64, 20);

// assume that we got our 3 chunks obtained elsewhere
// and these are byte[] of max "leaf length" bytes (here 4096).
// Also note that nodes will output a hash of length "inner length"

// hash the chunks

byte[] hash_00 = tree.getNode (0, 0).digest ( chunk00 ); 
byte[] hash_01 = tree.getNode (0, 1).digest ( chunk01 );
byte[] hash_02 = tree.getNode (0, 2).digest ( chunk02 ); // implicit 'last node'

// get the final tree hash output

final Blake2b digest = tree.getRoot(); // implicit node (1, 0)
digest.update (hash_00);
digest.update (hash_01);
digest.update (hash_02);
final byte[] hash = digest.digest();


Performance as of now is generally on par with the JRE's SHA1; however, SHA1 appears to be a bit faster in some cases. The relative performance of the two digests is also somewhat sensitive to the input data size. The JRE's MD5 is significantly faster than both.

The hotspot is, as expected, in the compress() function of Blake2b.Engine. Given the waterfall data-flow dependency of this critical section, there is little room for optimization by the JIT.

So, given that this function is nearly fully inlined (per the above JIT considerations), it would appear that the aggravating issue is the necessity of decoding 64bit (long) values from byte[]s. (In the reference C implementation, on platforms with native little-endian representation of integer values, this en/decoding is a simple matter of casting to/from void* from/to uint64_t*.)
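For illustration, this is the kind of per-word little-endian decoding Java must spell out explicitly where C can simply cast pointers (the class and method names here are mine, not the engine's actual code):

```java
public final class LEDecode {
    // decode a 64-bit little-endian value from 8 bytes at the given offset --
    // one of the 16 such decodes compress() performs per 128-byte block
    static long le64(byte[] b, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }
    public static void main(String[] args) {
        byte[] b = {0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, (byte) 0x80};
        System.out.println(Long.toHexString(le64(b, 0))); // prints 8000000000000001
    }
}
```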

Further possible optimizations

Unrolling the Rounds loop

For the current byte[] (object) based implementation, a final unrolling of the G rounds can be attempted. This is on the near-term TODO list.

Using Hardcoded Array Offsetting

compress() is currently handed an array with an offset for mapping of the compression buffer from update() and digest(). So mapping the compression input buffer to the m[] vector incurs the cost of 128 offset calculations (e.g. m[13] |= ((long) b[ offset + 105 ] & 0xFF ) << 8;). The alternative would be to incur the System.arraycopy equivalent of the 2 memcpy calls in the update function, pass a fixed offset to compress(), and use hardcoded offsets (e.g. m[13] |= ((long) b[ 105 ] & 0xFF ) << 8;).
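The two styles can be sketched as follows (illustrative code, not the engine's actual source); both decode the same word, and the difference is purely where the offset arithmetic happens -- per-byte in the computed-offset form, or once up front via the arraycopy:

```java
public final class OffsetStyles {
    // computed-offset form: each word is built from b[offset + ...]
    static long wordAt(byte[] b, int offset, int word) {
        long v = 0;
        int base = offset + (word << 3); // 8 bytes per 64-bit word
        for (int i = 7; i >= 0; i--) v = (v << 8) | (b[base + i] & 0xFFL);
        return v;
    }
    public static void main(String[] args) {
        byte[] block = new byte[128 + 5];            // user buffer, data at offset 5
        block[5 + 104] = 1;                          // first byte of word m[13]
        byte[] scratch = new byte[128];
        System.arraycopy(block, 5, scratch, 0, 128); // the memcpy-equivalent
        // hardcoded-offset form would now read scratch with a fixed base of 0
        System.out.println(wordAt(block, 5, 13) == wordAt(scratch, 0, 13)); // prints true
    }
}
```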

This was a design decision, breaking with the C reference implementation, to allow for direct mapping of the m[] vector of compress() from the user input byte[], further obviating the need for the 2 memcpys of the reference implementation. Reverting to a variant closely following the C, using a double-sized compression buffer, and re-evaluating the performance is on the near-term TODO list.

Using ByteBuffer

Using the native byte[] backing buffer of a direct, page-aligned ByteBuffer may possibly afford some efficiency gains, but the development platform available to me (a Mac Air) does not support this optional feature of ByteBuffer. You're welcome to try this, and I would be interested in feedback on your findings.


A variant Engine implementation using sun.misc.Unsafe is vaguely entertained for the future. The fact that offering this variant would (optimally) require support for pluggable engines (and thus possibly full JCA/JCE compliance) puts it off the near-term TODO list. Frankly, it does not seem that the issue is in the use of byte[] objects, but no claim to authoritative opinion is made in this regard. If you know better, do let me know!


To the Eternal Absolute, The One, ~!!! Ahura-Mazda !!!~ even ~!!! Al-Aziz-Al-Hakim !!!~, The Lord of Sentient Realms, The True in Love. The Friend.

Feb. 2014. bushwick. bk. nyc.