switch the BLAKE2 implementation to blake2b_simd/blake2s_simd #88

Closed
wants to merge 3 commits into from

Commits on Jun 15, 2020

  1. use blake2b_simd and blake2s_simd internally

    Replace the internal implementation of BLAKE2b and BLAKE2s with calls to
    the blake2b_simd and blake2s_simd crates. Those crates contain optimized
    implementations for SSE4.1 and AVX2, and they use runtime CPU feature
    detection to select the best implementation.
    
    Running the long-input benchmarks on an Intel i9-9880H with AVX2
    support, this change is a performance improvement of about 1.5x for
    BLAKE2b and 1.35x for BLAKE2s.
    
    This change deletes the undocumented `with_parameter_block` method, as
    the raw parameter block is not exposed by blake2b_simd or blake2s_simd.
    Callers who need BLAKE2 tree mode parameters can use the upstream crates
    directly. They provide a complete set of parameter methods.
    
    This change also deletes the `finalize_last_node` method. This method
    was arguably attached to the wrong types, `VarBlake2b` and `VarBlake2s`,
    where it would panic with a non-default output length. It's not very
    useful without the other tree parameters, so rather than moving it to
    the fixed-length `Blake2b` and `Blake2s` types where it belongs, we just
    delete it. This also simplifies the addition of BLAKE2bp and BLAKE2sp
    support in the following commit, as those algorithms use the last node
    flag internally and cannot expose it.
    oconnor663-zoom committed Jun 15, 2020
    5b06106
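
For readers unfamiliar with the "raw parameter block" mentioned in the first commit, the following is a minimal sketch of its layout for BLAKE2b in plain sequential mode, as I understand the BLAKE2 specification. The function names `blake2b_param_block` and `first_state_word` are hypothetical helpers written for illustration; they are not part of this crate, blake2b_simd, or blake2s_simd. Tree-mode fields (leaf length, node offset, node depth, inner length) are simply left at zero here.

```rust
/// Build a 64-byte BLAKE2b parameter block for sequential (non-tree) hashing.
/// Hypothetical illustration helper, not an API of any crate discussed here.
///
/// Layout assumed from the BLAKE2 specification:
///   byte 0       digest length
///   byte 1       key length
///   byte 2       fanout (1 in sequential mode)
///   byte 3       max depth (1 in sequential mode)
///   bytes 4..32  leaf length, node offset, node depth, inner length, reserved
///   bytes 32..48 salt
///   bytes 48..64 personalization
pub fn blake2b_param_block(
    digest_len: u8,
    key_len: u8,
    salt: &[u8; 16],
    personal: &[u8; 16],
) -> [u8; 64] {
    assert!((1..=64).contains(&digest_len), "digest length must be 1..=64");
    assert!(key_len <= 64, "key length must be at most 64");

    let mut block = [0u8; 64];
    block[0] = digest_len;
    block[1] = key_len;
    block[2] = 1; // fanout
    block[3] = 1; // max depth
    // Bytes 4..32 (the tree-mode fields) stay zero in sequential mode.
    block[32..48].copy_from_slice(salt);
    block[48..64].copy_from_slice(personal);
    block
}

/// The parameter block is XORed into the IV to form the initial state.
/// This returns only the first state word, to show the effect.
pub fn first_state_word(block: &[u8; 64]) -> u64 {
    const IV0: u64 = 0x6a09e667f3bcc908; // first BLAKE2b IV word
    let mut word = [0u8; 8];
    word.copy_from_slice(&block[0..8]);
    IV0 ^ u64::from_le_bytes(word)
}
```

With a 64-byte digest, no key, and all-zero salt and personalization, only the first four bytes of the block are nonzero. Parameter methods like the ones the upstream crates provide set these fields one at a time, so callers do not have to assemble the block by hand.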
  2. add support for BLAKE2bp and BLAKE2sp

    On an Intel i9-9880H with AVX2 support, both BLAKE2bp and BLAKE2sp are
    about 1.75x faster than BLAKE2b. Note that while these algorithms can be
    implemented with multi-threading, these implementations from
    blake2b_simd and blake2s_simd are single-threaded, using only SIMD
    parallelism.
    
    The blake2b_simd and blake2s_simd crates don't support salting or
    personalization for BLAKE2bp and BLAKE2sp, so the `with_params` methods
    are moved out into blake2b.rs and blake2s.rs.
    oconnor663-zoom committed Jun 15, 2020
    69eff42
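
To illustrate why BLAKE2bp and BLAKE2sp map well onto SIMD without threads, here is a rough sketch of how their input is distributed across leaves, based on my reading of the BLAKE2 specification: BLAKE2bp feeds 4 leaves with interleaved 128-byte blocks, and BLAKE2sp feeds 8 leaves with interleaved 64-byte blocks. The helper names below are hypothetical, and this is not how blake2b_simd or blake2s_simd are implemented internally (a real implementation would avoid copying and would go on to hash the leaves and a root node).

```rust
/// Distribute `input` across `lanes` leaves in round-robin blocks of
/// `block_len` bytes. Hypothetical helper for illustration only.
pub fn split_into_lanes(input: &[u8], lanes: usize, block_len: usize) -> Vec<Vec<u8>> {
    assert!(lanes > 0 && block_len > 0);
    let mut out: Vec<Vec<u8>> = vec![Vec::new(); lanes];
    // Block i goes to leaf (i mod lanes); the final block may be short.
    for (i, block) in input.chunks(block_len).enumerate() {
        out[i % lanes].extend_from_slice(block);
    }
    out
}

/// BLAKE2bp: 4 leaves, 128-byte blocks (assumed from the specification).
pub fn blake2bp_lanes(input: &[u8]) -> Vec<Vec<u8>> {
    split_into_lanes(input, 4, 128)
}

/// BLAKE2sp: 8 leaves, 64-byte blocks (assumed from the specification).
pub fn blake2sp_lanes(input: &[u8]) -> Vec<Vec<u8>> {
    split_into_lanes(input, 8, 64)
}
```

Because each leaf is an independent BLAKE2 instance, a SIMD implementation can compress one block from every leaf in a single pass over the vector registers. The last leaf and the root are the nodes that carry the last node flag, which is why that flag is managed internally by these algorithms.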
  3. remove the simd/simd_opt/simd_asm Cargo features for BLAKE2

    On x86 targets, SSE4.1 and AVX2 implementations are always compiled.
    With the `std` feature enabled, runtime CPU feature detection is used to
    select between them. With `std` disabled (e.g. --no-default-features),
    the only way to activate SIMD is to enable the target features at
    compile time, with something like
    
        export RUSTFLAGS="-C target-cpu=native"
    oconnor663-zoom committed Jun 15, 2020
    8ca8053
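
The difference between runtime detection (with `std`) and compile-time target features (without it) can be sketched roughly as follows. This is only an illustration of the two mechanisms using standard Rust facilities; the `Backend` enum and both function names are hypothetical, and the actual selection logic inside blake2b_simd and blake2s_simd may differ.

```rust
/// Which implementation would be chosen. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Portable,
    Sse41,
    Avx2,
}

/// Compile-time selection: depends only on the target features enabled when
/// the crate was built, e.g. via RUSTFLAGS="-C target-cpu=native".
/// This is the only mechanism available without `std`.
pub fn compile_time_backend() -> Backend {
    if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "avx2"
    )) {
        Backend::Avx2
    } else if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "sse4.1"
    )) {
        Backend::Sse41
    } else {
        Backend::Portable
    }
}

/// Runtime selection: queries the CPU the program is actually running on.
/// Requires `std`; on non-x86 targets it falls back to the portable code.
pub fn runtime_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse4.1") {
            return Backend::Sse41;
        }
    }
    Backend::Portable
}
```

In a default build, `compile_time_backend()` typically reports `Portable` even on an AVX2 machine, while `runtime_backend()` reports what the CPU supports. That gap is what the `RUSTFLAGS` setting above closes for builds without `std`.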