
@Cyan4973 released this Mar 5, 2020 · 30 commits to master since this release

xxHash v0.7.3 is a major evolution for xxh3 and xxh128, with a focus on speed and dispersion performance.

Speed improvements

v0.7.3 pays a lot of attention to small data, delivering generally better latency (an improvement of about 10%).

Inlining is now a first-class citizen, as it is generally key to the best performance on small inputs.
Among the visible changes:

  • XXH_INLINE_ALL can always be set before including xxhash.h, even if xxhash.h was previously included (for example transitively, as part of a prior *.h header file).
  • The algorithm implementation has been transferred into xxhash.h. It's no longer necessary to keep a copy of xxhash.c in the /include directory for inlining to work correctly.
    • Note: xxhash.c still exists, as it's useful to instantiate xxhash functions as public symbols accessible from a library or a *.o object file. It also remains compatible with existing projects.

Large data has also received a boost, which can reach +20% for very large samples (many MB and beyond).

Let's underline the remarkable optimization work of @easyaspi314, who hand-optimized several hot loops and instructions, and even added a new Z-vector target for s390x hardware.

No API modification

The API has remained completely stable between 0.7.2 and 0.7.3. Any programs linking with 0.7.2 should work as-is.
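Note that xxh3/xxh128 results are not comparable across these versions: a hash value produced by 0.7.2 should not be expected to match the one produced by 0.7.3 for the same input.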

New test tool

Testing a 64-bit hash algorithm for its collision rate has remained elusive for most. The sheer volume of data required to assess quality at this scale is too large for traditional test tools like SMHasher. As a general guide, it takes on the order of 4 billion (2^32) hashes before a single collision becomes likely; the birthday approximation puts the 50% mark near 5 billion. Accurately evaluating a collision ratio requires many more hashes than that to measure something meaningful.
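
The order of magnitude quoted above can be checked with the birthday approximation. The sketch below is not part of xxHash or of the test tool; expected_collisions is a hypothetical helper, and it assumes an ideal hash whose outputs are uniformly distributed. Among n hashes there are n(n-1)/2 pairs, each colliding with probability 2^-bits, and the probability of at least one collision is roughly 1 - exp(-E), where E is the expected number of colliding pairs. The 50% point therefore corresponds to E = ln 2, about 0.69.

```c
/* Expected number of colliding pairs among n hashes of `bits` bits,
   assuming an ideal, uniformly distributed hash (an assumption of this sketch). */
static double expected_collisions(double n, int bits)
{
    double e = n * (n - 1.0) / 2.0;   /* number of distinct pairs */
    int i;
    for (i = 0; i < bits; i++) {
        e *= 0.5;                     /* each pair collides with probability 2^-bits */
    }
    return e;
}
```

For a 64-bit hash, expected_collisions(4294967296.0, 64) comes out at about 0.5, and expected_collisions(5.06e9, 64) at about 0.69, which is why a meaningful measurement needs tens of billions of hashes rather than a few billion.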

A new open-source tool in tests/collisions offers this capability. It requires a lot of memory to run, with a minimum of 32 GB to measure anything significant. But provided that one has a system with enough capacity, it can accurately measure the collision ratio of any 64-bit hash algorithm.

Several algorithms were measured with this tool, and the results are currently consolidated on a wiki page of the project. More can be added in the future.

This new development round also introduced several improvements to the SMHasher test suite, uncovering new requirements for new scenarios. This helped improve the general dispersion qualities of xxh3 and xxh128.

Changelist

Here is a summarized list of changes for this version:

  • perf: improved speed for large inputs (~+20%)
  • perf: improved latency for small inputs (~+10%)
  • perf: s390x Vectorial code, by @easyaspi314
  • cli: Improved support for Unicode filenames on Windows, thanks to @easyaspi314 and @t-mat
  • api: xxhash.h can now be included in any order, multiple times, with and without XXH_STATIC_LINKING_ONLY or XXH_INLINE_ALL
  • build: xxHash's implementation has been transferred into xxhash.h. There is no more need to have xxhash.c in the /include directory for XXH_INLINE_ALL to work
  • install: created pkg-config file, by @bket
  • install: VCpkg installation instructions, by @LilyWangL
  • doc: Highly improved code documentation, by @easyaspi314
  • misc: New test tool in /tests/collisions: brute force collision tester for 64-bit hashes