Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHA256 implementations based on Intel SHA Extensions #13386

Merged
merged 3 commits into from Jul 9, 2018

Conversation

Projects
None yet
@sipa
Copy link
Member

sipa commented Jun 3, 2018

Based on #13191.

This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.

In addition to #13191, two extra implementations are provided:

  • (a) A variable-length SHA256 implementation using SHA extensions.
  • (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.

Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:

  • Using generic C++ code (pre-#10821): 6.1ms
  • Using SSE4 (master, #10821): 4.6ms
  • Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
  • Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
  • Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms

Benchmarks for 32-byte SHA256 on the same system:

  • Using SSE4 (master, #10821): 190ns
  • Using SHA-NI (this PR): 53ns

Benchmarks for 1000000-byte SHA256 on the same system:

  • Using SSE4 (master, #10821): 2.5ms
  • Using SHA-NI (this PR): 0.51ms
src/crypto/sha256.cpp Outdated
}

#if defined(ENABLE_SSE41) && !defined(BUILD_BITCOIN_INTERNAL)
if (have_sse4) {

This comment has been minimized.

@kallewoof

kallewoof Jun 4, 2018

Member

What about

        ret = "sse4";
#if defined(ENABLE_SSE41) && !defined(BUILD_BITCOIN_INTERNAL)
        TransformD64_4way = sha256d64_sse41::Transform_4way;
        ret += ",sse41";
#endif
    }

?

This comment has been minimized.

@sipa

sipa Jun 4, 2018

Author Member

Done.

@sipa sipa force-pushed the sipa:201806_shani branch Jun 4, 2018

@fanquake fanquake added the Validation label Jun 4, 2018

@Kick1986

This comment has been minimized.

Copy link

Kick1986 commented Jun 4, 2018

@sipa

Great works as usual!

I just came cross this thread

https://github.com/armfazh/flo-shani-aesni/blob/master/README.md

I hope you will have time to look at what they did!

@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jun 4, 2018

Needs rebase

@promag

This comment has been minimized.

Copy link
Member

promag commented Jun 4, 2018

Concept ACK, nice numbers.

@sipa sipa force-pushed the sipa:201806_shani branch Jun 4, 2018

@sipa

This comment has been minimized.

Copy link
Member Author

sipa commented Jun 4, 2018

Rebased.

@Kick1986 Nice, I'll have a look.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jun 5, 2018

Note to reviewers: This pull request conflicts with the following ones:
  • #13442 (Convert the 1-way SSE4 SHA256 code from asm to intrinsics by sipa)
  • #13203 (Add POWER8 ASM for 4-way SHA256 by TheBlueMatt)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Empact

This comment has been minimized.

Copy link
Member

Empact commented Jun 5, 2018

For clang version, looks like they were added in 3.4, but never noted in the release notes.
Source: went from commit date[1] to release date[2] to file in release version[3].
Did not check for every intrinsic used.
[1] llvm-mirror/clang@b83f5a7#diff-c4f203a0f202bf56c364a657b0aecae9
[2] http://releases.llvm.org/
[3] https://github.com/llvm-mirror/clang/blob/release_34/lib/Headers/shaintrin.h


/* Load state */
s0 = _mm_loadu_si128((const __m128i*)s);
s1 = _mm_loadu_si128((const __m128i*)(s + 4));

This comment has been minimized.

@theuni

theuni Jun 5, 2018

Member

I think these could be _mm_load_si128 if s[] was alignas(16).

src/Makefile.am Outdated
@@ -32,6 +32,7 @@ LIBBITCOIN_UTIL=libbitcoin_util.a
LIBBITCOIN_CRYPTO=crypto/libbitcoin_crypto.a
LIBBITCOIN_CRYPTO_SSE41=crypto/libbitcoin_crypto_sse41.a
LIBBITCOIN_CRYPTO_AVX2=crypto/libbitcoin_crypto_avx2.a
LIBBITCOIN_CRYPTO_SHANI=crypto/libbitcoin_crypto_shani.a

This comment has been minimized.

@theuni

theuni Jun 5, 2018

Member

These are starting to get out of hand. I think we should take @TheBlueMatt's suggestion and treat LIBBITCOIN_CRYPTO as a collection of these helpers. That way we can just add $(LIBBITCOIN_CRYPTO) everywhere, and that will pull in the cpu-specific libs as well. Something like:

...
LIBBITCOIN_CRYPTO=crypto/libbitcoin_crypto.a
LIBBITCOIN_CRYPTO_AVX2=crypto/libbitcoin_crypto_avx2.a
LIBBITCOIN_CRYPTO_SHANI=crypto/libbitcoin_crypto_shani.a
LIBBITCOIN_CRYPTO+=$(LIBBITCOIN_CRYPTO_AVX2)
LIBBITCOIN_CRYPTO+=$(LIBBITCOIN_CRYPTO_SHANI)
...

Then the cpu-specific ones can be dropped from the LDADDs all over the place. I'm happy to do up a patch on top of this if you'd like.

This comment has been minimized.

@sipa

sipa Jun 6, 2018

Author Member

@theuni Actually, feel like PRing that as a separate PR, before this one goes in? Then @TheBlueMatt and I can both rebase ours on top of yours and not conflict with each other.

@theuni

This comment has been minimized.

Copy link
Member

theuni commented Jun 5, 2018

concept ACK.

I noticed while testing #13400 that I added a bug, but bitcoind started up fine anyway, due to missing sanity checks for the double/4way/8way hashes. Mind adding those?

src/crypto/sha256_shani.cpp Outdated

#include <stdint.h>
#if defined(_MSC_VER)
#include <immintrin.h>

This comment has been minimized.

@DesWurstes

DesWurstes Jun 7, 2018

Contributor

I believe including immintrin.h is enough for both platforms. x86intrin includes immintrin.h, and immintrin.h includes everything that is needed: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/immintrin.h

EDIT: Tested on Linux GCC and Clang, including immintrin.h is enough for all platforms.

This comment has been minimized.

@sipa

sipa Jun 11, 2018

Author Member

Thanks, fixed.

laanwj added a commit that referenced this pull request Jun 11, 2018

Merge #13408: crypto: cleanup sha256 build
f68049d crypto: cleanup sha256 build (Cory Fields)

Pull request description:

  Requested by @sipa in #13386.

  Rather than appending all possible cpu variants to all targets, create a convenience variable that encompasses all.

Tree-SHA512: 8e9ab2185515672b79bb7925afa4f3fbfe921bfcbe61456833d15457de4feba95290de17514344ce42ee81cc38b252476cd0c29432ac48c737c2225ed515a4bd

@sipa sipa force-pushed the sipa:201806_shani branch Jun 11, 2018

@sipa

This comment has been minimized.

Copy link
Member Author

sipa commented Jun 11, 2018

Rebased.

@DrahtBot DrahtBot removed the Needs rebase label Jun 11, 2018

@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jun 12, 2018

Needs rebase

@sipa sipa force-pushed the sipa:201806_shani branch Jun 12, 2018

@sipa

This comment has been minimized.

Copy link
Member Author

sipa commented Jun 12, 2018

Rebased.

@DrahtBot DrahtBot removed the Needs rebase label Jun 12, 2018

src/crypto/sha256_shani.cpp Outdated
#include <stdint.h>
#include <immintrin.h>

#include "crypto/common.h"

This comment has been minimized.

@theuni

theuni Jun 13, 2018

Member

Linter is yelling about include style here.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jun 24, 2018

Needs rebase

@sipa sipa force-pushed the sipa:201806_shani branch from ff1ef63 Jun 24, 2018

@sipa

This comment has been minimized.

Copy link
Member Author

sipa commented Jun 24, 2018

Rebased after #13471 merge.

Also split up the CPU feature detection logic change into its own commit.

I noticed while testing #13400 that I added a bug, but bitcoind started up fine anyway, due to missing sanity checks for the double/4way/8way hashes. Mind adding those?

That was addressed in #13438.

@DrahtBot DrahtBot removed the Needs rebase label Jun 24, 2018

@theuni

This comment has been minimized.

Copy link
Member

theuni commented Jun 25, 2018

~~~SelfTest() is now missing TransformD64_2way :(~~~

utACK otherwise.

@sipa sipa force-pushed the sipa:201806_shani branch to 66b2cf1 Jun 26, 2018

@theuni

This comment has been minimized.

Copy link
Member

theuni commented Jun 26, 2018

Thanks! utACK 66b2cf1.

@jb55

This comment has been minimized.

Copy link
Contributor

jb55 commented Jun 27, 2018

Not sure what compelled me to do this, and it's probably overkill... but...

Tested ACK 66b2cf1 with 100k rounds of quickcheck at various optimization levels, but only with the non-two way transform for now.

@DesWurstes

This comment has been minimized.

Copy link
Contributor

DesWurstes commented Jun 27, 2018

Just a nit from older pull requests: Now that it has a custom CPUID function

// We can't use cpuid.h's __get_cpuid as it does not support subleafs.
void inline cpuid(uint32_t leaf, uint32_t subleaf, uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d)
{
__asm__ ("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d) : "0"(leaf), "2"(subleaf));
}

including <cpuid.h> is not necessary now:

#if defined(__x86_64__) || defined(__amd64__)
#if defined(USE_ASM)
#include <cpuid.h>
namespace sha256_sse4
{

Thank you for your awesome contributions!

@gmaxwell

This comment has been minimized.

Copy link
Member

gmaxwell commented Jul 7, 2018

ACK

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Jul 9, 2018

utACK 66b2cf1
tested that build passes on FreeBSD+OpenBSD

@laanwj laanwj merged commit 66b2cf1 into bitcoin:master Jul 9, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jul 9, 2018

Merge #13386: SHA256 implementations based on Intel SHA Extensions
66b2cf1 Use immintrin.h everywhere for intrinsics (Pieter Wuille)
4c935e2 Add SHA256 implementation using using Intel SHA intrinsics (Pieter Wuille)
268400d [Refactor] CPU feature detection logic for SHA256 (Pieter Wuille)

Pull request description:

  Based on #13191.

  This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.

  In addition to #13191, two extra implementations are provided:
  * (a) A variable-length SHA256 implementation using SHA extensions.
  * (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.

  Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:
  * Using generic C++ code (pre-#10821): 6.1ms
  * Using SSE4 (master, #10821): 4.6ms
  * Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
  * Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
  * Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms

  Benchmarks for 32-byte SHA256 on the same system:
  * Using SSE4 (master, #10821): 190ns
  * Using SHA-NI (this PR): 53ns

  Benchmarks for 1000000-byte SHA256 on the same system:
  * Using SSE4 (master, #10821): 2.5ms
  * Using SHA-NI (this PR): 0.51ms

Tree-SHA512: 2b319e33b22579f815d91f9daf7994a5e1e799c4f73c13e15070dd54ba71f3f6438ccf77ae9cbd1ce76f972d9cbeb5f0edfea3d86f101bbc1055db70e42743b7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.