BLAKE3-gpu

Parallelizing the BLAKE3 cryptographic hash function via its Merkle tree structure.
See the Presentation for a complete explanation.

Current best speedup ⚡ -> 4.5x at 1.07 GiB/s on an octacore

What

BLAKE3 is a fast cryptographic hash function with good scope for parallelism.
We try to extract as much of that parallelism as possible by using GPUs.
We also try to speed it up on the CPU with OpenMP and AVX2.
All of this is possible due to our new algorithm - Blaze3.
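
To give a feel for where the parallelism comes from, here is a minimal sketch of hashing over a Merkle tree. It is not Blaze3 and not real BLAKE3: `toy_leaf_hash` and `toy_parent_hash` are hypothetical stand-ins for the compression function, and `toy_tree_hash` is a name made up for this example. The only BLAKE3 detail assumed is the 1024-byte chunk size. Every chunk can be hashed on its own, and every pair on a tree level can be merged on its own, so both loops are candidates for an OpenMP `parallel for`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kChunkLen = 1024;  // BLAKE3 chunk size in bytes

// Hypothetical leaf hash (FNV-1a seeded with the chunk index), NOT BLAKE3.
std::uint64_t toy_leaf_hash(const unsigned char* data, std::size_t len,
                            std::uint64_t chunk_index) {
    std::uint64_t h = 14695981039346656037ULL ^ chunk_index;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical parent hash: order-sensitive mix of two child values.
std::uint64_t toy_parent_hash(std::uint64_t left, std::uint64_t right) {
    std::uint64_t h = left * 1099511628211ULL;
    h ^= right + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    return h;
}

std::uint64_t toy_tree_hash(const std::string& input) {
    // An empty input still produces one (empty) chunk.
    const std::size_t n =
        input.empty() ? 1 : (input.size() + kChunkLen - 1) / kChunkLen;
    const auto* bytes = reinterpret_cast<const unsigned char*>(input.data());
    std::vector<std::uint64_t> level(n);

    // Step 1: hash every chunk independently (the leaves of the tree).
    const long long leaves = static_cast<long long>(n);
    #pragma omp parallel for
    for (long long i = 0; i < leaves; ++i) {
        const std::size_t off = static_cast<std::size_t>(i) * kChunkLen;
        const std::size_t len = std::min(kChunkLen, input.size() - off);
        level[i] = toy_leaf_hash(bytes + off, len, static_cast<std::uint64_t>(i));
    }

    // Step 2: merge neighbours level by level until one root value is left.
    while (level.size() > 1) {
        assert(!level.empty());
        const long long pairs = static_cast<long long>(level.size() / 2);
        std::vector<std::uint64_t> next((level.size() + 1) / 2);
        #pragma omp parallel for
        for (long long i = 0; i < pairs; ++i) {
            next[i] = toy_parent_hash(level[2 * i], level[2 * i + 1]);
        }
        // An unpaired last node is carried up unchanged.
        if (level.size() % 2 == 1) next[pairs] = level.back();
        level.swap(next);
    }
    return level[0];
}
```

With GCC or Clang, building with `-fopenmp` turns the pragmas on. Without the flag the pragmas are ignored and the code runs serially with the same result, which is handy for checking the parallel build against the serial one.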

How

  • Rewrite the basic, reference implementation in C++
  • Rewrite it again, in CUDA C++
  • Make sure all the tests pass (Continuous process)
  • Optimize it, fix memory bandwidth issues if they exist (Continuous process)
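
The "make sure all the tests pass" step can be sketched as a differential test: run the rewrite and the reference on the same inputs and report the first length where they disagree. This is only an illustration, not the repo's actual test code. `HashFn`, `first_mismatch`, and `toy_reference_hash` are hypothetical names, and the hashers return a 64-bit value instead of a real 32-byte digest to keep things short. The input lengths are chosen around the 64-byte block and 1024-byte chunk boundaries, where off-by-one bugs tend to show up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hasher signature used only for this sketch.
using HashFn = std::function<std::uint64_t(const std::string&)>;

// Toy stand-in for a reference implementation (FNV-1a), NOT BLAKE3.
std::uint64_t toy_reference_hash(const std::string& input) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : input) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Returns the first tested input length at which the two implementations
// disagree, or -1 if they agree on every tested length.
long long first_mismatch(const HashFn& reference, const HashFn& candidate) {
    assert(reference && candidate);  // both hashers must be callable
    const std::vector<std::size_t> sizes = {
        0, 1, 63, 64, 65, 1023, 1024, 1025, 2048, 2049, 8192, 100000};
    for (std::size_t len : sizes) {
        // Deterministic, non-constant input so every run tests the same bytes.
        std::string input(len, '\0');
        for (std::size_t i = 0; i < len; ++i) {
            input[i] = static_cast<char>(i % 251);
        }
        if (reference(input) != candidate(input)) {
            return static_cast<long long>(len);
        }
    }
    return -1;
}
```

Calling `first_mismatch(toy_reference_hash, toy_reference_hash)` returns -1. A candidate that only goes wrong past the first chunk is first caught at length 1025, which is why the boundary lengths are in the list.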

Development

  • The basic directory has the reference implementations.
  • A full copy of the original reference implementation is in testing.
  • The BLAKE3 paper is also here for reference.
  • OpenMP work is in openmp. This version is maxed out for efficiency.
  • CUDA work is in cuda. This version uses dynamic parallelism.
  • Dark CUDA work happens in dark.