Blake2b is a Java implementation of the BLAKE2b cryptographic hash function created by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein. (See Blake2 project site for details and authoritative information about the BLAKE2b digest.)
This implementation was written with close attention to the reference C implementation by Samuel Neves. The quite excellent Go implementation by Dmitry Chestnykh was also reviewed and referenced for optimization inspiration.
I required a high-performance SHA1 replacement with incremental hash capabilities for a pet project (a content-addressable distributed temporal DB). BLAKE2b seemed to fit the bill, and here we are.
"Cryptographic security" beyond collision resistance was/is a none concern, and I have zero/0 relevant subject matter expertise. For example, I have no clue as to how one would erase traces of the key used with the
Mac feature of the algorithm from the JVM heap without resorting to
JNI/JNA (?) and no such attempt has been made in this implementation. Fair warning, please do note it.
This implementation is provided with a suite of tests minimally covering the reference implementation's KAT and Keyed-KAT, covering the basic digest and MAC features (respectively). These may be inspected in the src/test/ tree of this repository.
A C test program emitting a tree-based hash was used to confirm the Blake2b.Tree output. Augmenting the C test program to also test salt and personalization is on the TODO list; do note that these two features have not been tested yet, though confidence in them is fairly high.
Written by Joubin Mohammad Houshyar, 2014.
To the extent possible under law, the author has dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty. http://creativecommons.org/publicdomain/zero/1.0/
Blake2b provides full support for the algorithm per the reference C implementation, as adapted for the JVM. The API very closely follows the reference implementation's public methods, e.g. void update (byte[] input, int offset, int len), etc.
As of now, fully configurable Digest, MAC, and incremental hash (aka Tree) are provided.
(The parallel implementation of the algorithm is on the immediate TODO list.)
Installation and Use
First you need to clone this repo and install the product in your local
.m2 repository. (Getting this project to a central maven repository is on the TODO list.)
So, clone it in some happy place:
```
git clone https://github.com/alphazero/blake2b
```
And build and (locally) install it:
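The build and install step is the standard Maven invocation (assuming Maven is installed; mvn install is the goal referenced later in this README):

```
# build, run the tests, and install the artifact into your local .m2 repository
mvn clean install
```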
To use Blake2b as a dependency in your Maven project, add the following to the <dependencies> section of your project POM:

```xml
<dependency>
    <groupId>ove</groupId>
    <artifactId>ove.blake2b</artifactId>
    <version>alpha.0</version>
</dependency>
```
If you just want the jar, you can grab it from the project's target/ directory after running mvn install or mvn package.
The API is defined by the top-level Blake2b interface. This interface is implemented by Blake2b.Engine and is indirectly accessible via the semantic inner classes of Blake2b. These methods allow for both "streaming" and basic hashing of byte[] input data.
The API mimics the relevant subset of the standard Java MessageDigest class, and Blake2b.Param implements the (tag) interface AlgorithmParameterSpec, as preparatory support for possible JCA compliance. That is not on my TODO list as of now, but care has been taken to make that path easy for you, should you wish to pursue it yourself.
*** A note regarding input constraint checks ***
Typically one prefers a maximally defensive and pedantic implementation of functions that assert constraints on input arguments. But given that Blake2b is intended for high-performance applications, the reference implementation (Blake2b.Engine) does not do so. So, pass in null and expect a NullPointerException, etc. Do note it.
update(..) is typically (*) used for sequential ("stream") updates of the digest. The 3 variants of update(..) are:

- void update (byte input) to update with a single byte
- void update (byte[] input) to update with a byte array
- void update (byte[] input, int offset, int len) to update with a slice of a byte array

Update methods treat all input args as immutable (input buffers are never modified).
(*) Note that if your use case requires the (final) digest output of the algorithm to be copied to a slice of a (user-)supplied byte[], even in the non-streaming case, you need to invoke an explicit update(..) call before the final digest call. (final is a keyword in Java and finalize has specific semantics in the context of Java objects, so the finalization method is named digest.)
digest(..) is the Java equivalent of a hash finalization call. The 3 variants of digest(..) are:

- byte[] digest() to emit the final hash of the stream
- byte[] digest(byte[] input) to update the state with input and emit the final hash
- void digest(byte[] out, int off, int len) to write the final hash to the given slice of the user-supplied output array

digest(..) methods returning byte[] are guaranteed to return a non-null result.
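Putting the update and digest variants above together, a minimal streaming sketch might look like this (the Blake2b.Digest.newInstance() factory is described later in this README; the ove.crypto.digest package name is an assumption -- adjust per the installed jar):

```java
import ove.crypto.digest.Blake2b; // package name is an assumption

public class StreamingExample {
    public static void main(String[] args) {
        final Blake2b blake2b = Blake2b.Digest.newInstance(); // default 64-byte digest

        // "streaming" usage: feed the input in arbitrary chunks
        blake2b.update("hello, ".getBytes());
        blake2b.update("world".getBytes());
        final byte[] hash = blake2b.digest(); // finalize; guaranteed non-null

        // equivalent one-shot usage for the same input
        final byte[] oneShot = Blake2b.Digest.newInstance().digest("hello, world".getBytes());
        // hash and oneShot hold the same 64-byte value
    }
}
```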
Configuration and Usage
The Blake2b implementation can be used simply with default configuration parameters, or, optionally, you may choose to set specific configuration parameters of the algorithm.
Default Settings (Blake2b.Param):

- Digest Length: 64 bytes
- Key: none
- Salt: none
- Personalization: none
- Depth: 1
- Fanout: 1
- Leaf Length: 0
- Inner Length: 0
- Node Depth: 0
- Node Offset: 0

(Note that all Blake2b semantic factory zero-arg methods use the above default configuration parameters.)
You can change the default configuration via the Blake2b.Param class.
To obtain an instance of the Param class, simply instantiate it:
```java
// get an instance of the default configuration parameters
Blake2b.Param param = new Blake2b.Param();
```
Param exposes a fluent setter API allowing for chaining of the setXxx methods.
```java
// get an instance of the default configuration parameters
// and customize it with key, salt, and personalization,
// and output a 160-bit hash (say, for SHA1 replacement)
Blake2b.Param param = new Blake2b.Param().
    setDigestLength( 20 ).
    setKey ( keyBytes ).
    setSalt ( saltBytes ).
    setPersonal ( personalizationBytes );
```
The constraints on the configuration parameter settings are detailed in the BLAKE2 specification. Blake2b.Param pedantically asserts all constraints on input args.
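For example, BLAKE2b digest lengths are constrained to 1..64 bytes, so a request for a 65-byte digest should be rejected by Param (a sketch; the package name and the exact exception type thrown by this library are assumptions):

```java
import ove.crypto.digest.Blake2b; // package name is an assumption

public class ParamCheckExample {
    public static void main(String[] args) {
        try {
            // BLAKE2b digests are at most 64 bytes; Param asserts this constraint
            new Blake2b.Param().setDigestLength( 65 );
            System.out.println("unexpected: constraint not asserted");
        } catch (RuntimeException e) { // exact exception type is an assumption
            System.out.println("rejected as expected: " + e.getMessage());
        }
    }
}
```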
General Hashing with Blake2b.Digest
Blake2b.Digest instantiates the reference implementation of Blake2b, isolating you from the current/actual implementation class (Blake2b.Engine).
```java
// using default params
final Blake2b blake2b = Blake2b.Digest.newInstance();
```

```java
// using custom params
final Blake2b blake2b = Blake2b.Digest.newInstance( param );
```
Blake2b.Digest also exposes a convenience factory method for the common case of a customized digest length:
```java
// just config the output digest length - here 160-bit output a la SHA1
final Blake2b blake2b = Blake2b.Digest.newInstance( 20 );
```
Note that you can use the Blake2b instance obtained via the Blake2b.Digest factory methods to perform any of the features of the BLAKE2b algorithm with appropriate configuration via Blake2b.Param, should you prefer.
Message Authentication Codes with Blake2b.Mac

MAC semantics are exposed by the Blake2b.Mac class. This (convenience) class exposes a few factory methods:
```java
final Blake2b mac = Blake2b.Mac.newInstance ( theKeyBytes ); // basic MAC
```

and the common case of MAC with custom digest length:

```java
final Blake2b mac = Blake2b.Mac.newInstance ( theKeyBytes, 20 ); // 160-bit MAC
```
You can also use a java.security.Key (which must support encoding -- see the Key javadocs for details) instead of a raw byte[] key:

```java
import java.security.Key;
..
final Key theKey = .. // not provided by Blake2b library

// 160-bit MAC with a Key
final Blake2b mac = Blake2b.Mac.newInstance ( theKey, 20 );
```
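To verify a MAC computed this way, a reasonable sketch is to recompute the tag and compare it using the JDK's constant-time MessageDigest.isEqual (the ove.crypto.digest package name is an assumption):

```java
import java.security.MessageDigest;

import ove.crypto.digest.Blake2b; // package name is an assumption

public class MacVerifyExample {

    // recompute the 160-bit tag for the message and compare in constant time
    static boolean verify(byte[] key, byte[] message, byte[] expectedTag) {
        final Blake2b mac = Blake2b.Mac.newInstance( key, 20 );
        final byte[] actualTag = mac.digest( message );
        return MessageDigest.isEqual( actualTag, expectedTag );
    }

    public static void main(String[] args) {
        final byte[] key = "a-secret-key".getBytes();
        final byte[] msg = "the message".getBytes();
        final byte[] tag = Blake2b.Mac.newInstance( key, 20 ).digest( msg );
        System.out.println( verify( key, msg, tag ) );
    }
}
```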
Incremental Hashing with Blake2b.Tree

The Blake2b.Tree class provides a convenient semantic API for incremental hashing with BLAKE2b's tree mode.
```java
// incrementally hash a stream of bytes using Blake2b.
// here a tree of depth 2, fanout 3, leaf length of 4096, inner length of 64,
// and a final (tree) hash output of length 20
Blake2b.Tree tree = new Blake2b.Tree (2, 3, 4096, 64, 20);

// assume that we got our 3 chunks elsewhere;
// these are byte[] of at most "leaf length" bytes (here 4096).
// also note that nodes will output a hash of length "inner length"

// hash the chunks
byte[] hash_00 = tree.getNode (0, 0).digest ( chunk00 );
byte[] hash_01 = tree.getNode (0, 1).digest ( chunk01 );
byte[] hash_02 = tree.getNode (0, 2).digest ( chunk02 ); // implicit 'last node'

// get the final tree hash output
final Blake2b digest = tree.getRoot(); // implicit node (1, 0)
digest.update (hash_00);
digest.update (hash_01);
digest.update (hash_02);
final byte[] hash = digest.digest();
```
Performance as of now is generally on par with SHA1, though SHA1 appears to be a bit faster in some cases. The relative performance between these two digests is also somewhat sensitive to the input data size.
MD5 is significantly faster than both.
The hotspot is, as expected, in the compress() function of Blake2b.Engine. Given the waterfall data-flow dependency of this critical section, there is little room for optimization by the JIT.
So, given that this function is nearly fully inlined (per the above JIT considerations), it would appear that the aggravating issue is the necessity of decoding 64-bit (long) values from bytes. (In the reference C implementation, on platforms with native little-endian representation of integer values, this en/decoding is a simple matter of casting to/from the buffer.)
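Concretely, the JVM has to assemble each 64-bit word of the message block from 8 bytes by hand, along the lines of this sketch:

```java
public class LittleEndianDecode {

    // decode a little-endian 64-bit value from 8 bytes starting at 'off',
    // in the style of the compress() input mapping described above
    static long decodeLong(byte[] b, int off) {
        long m = 0L;
        for (int i = 7; i >= 0; i--) {
            m = (m << 8) | ((long) b[off + i] & 0xFF);
        }
        return m;
    }

    public static void main(String[] args) {
        // bytes 01..08 in little-endian order decode to 0x0807060504030201
        byte[] b = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
        System.out.println(Long.toHexString(decodeLong(b, 0))); // prints 807060504030201
    }
}
```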
Further possible optimizations
Unrolling the Rounds loop
For the current byte[] (object) based implementation, a final unrolling of the G rounds can be attempted. This is on the near-term TODO list.
Using Hardcoded Array Offsets
compress() is currently handed an array with an offset for mapping of the compression buffer from digest(). So mapping the compression input buffer to the m vector incurs the cost of 128 offset calculations (e.g. m |= ((long) b[ offset + 105 ] & 0xFF ) << 8;). The alternative would be to incur the System.arraycopy equivalent of the 2 memcpy calls in the reference update function, pass a fixed offset to compress(), and use hardcoded offsets (e.g. m |= ((long) b[ 105 ] & 0xFF ) << 8;).
This was a design decision, breaking with the C reference implementation, to allow for direct mapping of the m vector of compress from user input byte[]s, further obviating the need for the 2 memcpys of the reference implementation. Reverting to a variant closely following the C original, using a double-sized compression buffer, and re-evaluating the performance is on the near-term TODO list.
Using the native byte[] backing buffer of a directly allocated, page-aligned ByteBuffer may possibly afford a bit of efficiency, but the development platform available to me (a Mac Air) does not support this optional feature of ByteBuffer. You're welcome to try this, and I would be interested in feedback as to your findings.
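Relatedly, wrapping the input in a little-endian-ordered ByteBuffer lets the JDK do the word decoding, which may be worth benchmarking against the manual shift/or approach:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferDecode {

    // read the 16 little-endian 64-bit words of a 128-byte block via ByteBuffer
    static long[] decodeBlock(byte[] block, int off) {
        final ByteBuffer buf = ByteBuffer.wrap(block, off, 128).order(ByteOrder.LITTLE_ENDIAN);
        final long[] m = new long[16];
        for (int i = 0; i < 16; i++) {
            m[i] = buf.getLong(); // getLong advances the buffer position by 8
        }
        return m;
    }

    public static void main(String[] args) {
        final byte[] block = new byte[128];
        block[0] = 0x01; // word 0, least significant byte
        block[8] = 0x02; // word 1, least significant byte
        System.out.println(Long.toHexString(decodeBlock(block, 0)[0])); // prints 1
    }
}
```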
An Engine implementation using sun.misc.Unsafe is vaguely entertained for the future. The fact that offering this variant would (optimally) require support for pluggable engines (and thus possibly full JCA/JCE compliance) puts it off the near-term TODO list. Frankly, it does not seem that the issue lies in the use of byte[] objects, but no claim to an authoritative opinion is made in this regard. If you know better, do let me know!
To the Eternal Absolute, The One, ~!!! Ahura-Mazda !!!~ even ~!!! Al-Aziz-Al-Hakim !!!~, The Lord of Sentient Realms, The True in Love. The Friend.
Feb. 2014. bushwick. bk. nyc.