ARMv8 SHA2 Intrinsics #24115

Merged
merged 4 commits into from
Feb 14, 2022

prusnak (Contributor) commented Jan 20, 2022

This PR adds support for ARMv8 SHA2 Intrinsics.

Fixes #13401 and #17414

laanwj (Member) commented Jan 20, 2022

Concept ACK!

> detection when the feature can be used

On Linux (the only system we care about for ARM, I guess), the following would be the way to do detection:

#include <sys/auxv.h>
#include <asm/hwcap.h>
…
#ifdef __arm__
/* ARM 32 bit */
if (getauxval(AT_HWCAP2) & HWCAP2_SHA2) {
    have_arm_shani = true;
}
#endif
#ifdef __aarch64__
/* ARM 64 bit */
if (getauxval(AT_HWCAP) & HWCAP_SHA2) {
    have_arm_shani = true;
}
#endif

Note that the capability bit is in a different HWCAP word on 32-bit and 64-bit ARM (dunno if you even want to support 32-bit here).
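As a rough illustration of the detection described above, here is a minimal self-contained sketch. The function name `DetectArmSha2` is hypothetical (it is not taken from this PR), and the sketch covers only Linux; on any other platform or architecture it simply reports `false`. The `getauxval` calls and the `HWCAP_SHA2` / `HWCAP2_SHA2` bits are the ones named in the comment.

```cpp
// Pull in the Linux auxiliary-vector headers only where they exist.
#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

// Hypothetical helper (not a Bitcoin Core function): returns true when the
// Linux kernel reports the ARMv8 SHA2 extension for the running CPU.
// On non-Linux or non-ARM builds both branches compile away and it returns false.
bool DetectArmSha2()
{
    bool have_arm_shani = false;
#if defined(__linux__) && defined(__arm__) && defined(HWCAP2_SHA2)
    // 32-bit ARM: the SHA2 bit lives in the second capability word.
    if (getauxval(AT_HWCAP2) & HWCAP2_SHA2) {
        have_arm_shani = true;
    }
#endif
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SHA2)
    // 64-bit ARM: the SHA2 bit lives in the first capability word.
    if (getauxval(AT_HWCAP) & HWCAP_SHA2) {
        have_arm_shani = true;
    }
#endif
    return have_arm_shani;
}
```

A caller would use the result once at startup to pick between the intrinsics-based transform and the generic one; the check is cheap, so caching it is a convenience rather than a necessity.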

prusnak (Contributor, Author) commented Jan 20, 2022

> the following would be the way to do detection:

Added in f7dd1ef

prusnak force-pushed the armv8-shani branch 2 times, most recently from 3d77517 to f7dd1ef, on January 20, 2022 19:57

sipa (Member) commented Jan 20, 2022

On commit f7dd1ef

On a Linux aarch64 Cortex-A53 system with:

$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

which I presume means it has the necessary SHA2 extensions.

The GCC 9.3.0 compiler used supports the extensions (crypto/libbitcoin_crypto_arm_shani.a is being built):

checking for x86 SHA-NI intrinsics... no
checking whether C++ compiler accepts -march=armv8-a+crc+crypto... yes
checking whether C++ compiler accepts -march=armv8-a+crc+crypto... (cached) yes
checking for AArch64 CRC32 intrinsics... yes
checking for AArch64 SHA-NI intrinsics... yes

Still, the extension doesn't seem to be detected. debug.log says:

2022-01-20T20:37:05Z Using the 'standard' SHA256 implementation

prusnak (Contributor, Author) commented Jan 20, 2022

@sipa should be fixed in c0849fc

sipa (Member) commented Jan 20, 2022

On c0849fc:

2022-01-20T22:15:44Z Using the 'arm_shani(1way)' SHA256 implementation

@prusnak prusnak marked this pull request as ready for review January 20, 2022 22:18
sipa (Member) commented Jan 20, 2022

This PR (c0849fc):

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 2.56 | 390,886,099.70 | 0.9% | 0.03 | SHA256 |
| 10.27 | 97,329,584.86 | 0.0% | 0.01 | SHA256D64_1024 |
| 14.60 | 68,478,670.64 | 0.4% | 0.01 | SHA256_32b |

On master (e3ce019):

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 15.69 | 63,715,155.54 | 0.0% | 0.17 | SHA256 |
| 43.42 | 23,029,615.98 | 0.5% | 0.03 | SHA256D64_1024 |
| 41.67 | 23,995,549.21 | 0.1% | 0.01 | SHA256_32b |

@prusnak prusnak mentioned this pull request Jan 20, 2022
PastaPastaPasta (Contributor) commented Jan 21, 2022

> On Linux (the only system we care about for ARM, I guess)

M1 macs would like a word with you.

PastaPastaPasta (Contributor) commented

Speaking of M1, I was able to compile this locally on my M1 Pro (10-core), and ./configure detected that the SHA2 intrinsics could be used. See benchmarks below.

on c0849fc

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 0.47 | 2,148,996,364.97 | 0.3% | 0.01 | SHA256 |
| 1.48 | 676,354,727.08 | 0.3% | 0.01 | SHA256D64_1024 |
| 1.15 | 873,060,380.08 | 0.1% | 0.01 | SHA256_32b |

on master

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 3.10 | 322,550,263.01 | 1.3% | 0.03 | SHA256 |
| 9.26 | 107,941,088.30 | 0.7% | 0.01 | SHA256D64_1024 |
| 6.22 | 160,743,377.26 | 1.6% | 0.01 | SHA256_32b |

prusnak (Contributor, Author) commented Jan 21, 2022

> Speaking of M1, I was able to compile this locally on my M1 Pro (10-core), and ./configure detected that the SHA2 intrinsics could be used. See benchmarks below.

Yes, support for Apple Silicon is included in this PR.

hebasto (Member) commented Jan 21, 2022

Tested c0849fc on Mac mini (M1, 2020):

% time ./src/bitcoind -datadir=/Users/hebasto/SHANI -assumevalid=0 -stopatheight=719700 -prune=550
2022-01-21T07:44:20Z Bitcoin Core version v22.99.0-c0849fc4fd9a (release build)
2022-01-21T07:44:20Z Validating signatures for all blocks.
2022-01-21T07:44:20Z Setting nMinimumChainWork=00000000000000000000000000000000000000001fa4663bbbe19f82de910280
2022-01-21T07:44:20Z Prune configured to target 550 MiB on disk for block and undo files.
2022-01-21T07:44:20Z Using the 'arm_shani(1way)' SHA256 implementation
...
2022-01-21T22:38:17Z Shutdown: done
./src/bitcoind -datadir=/Users/hebasto/SHANI -assumevalid=0  -prune=550  149587.28s user 11456.17s system 300% cpu 14:53:56.52 total

UPDATE. The same for the master branch (e3ce019):

% time ./src/bitcoind -datadir=/Users/hebasto/MASTER -assumevalid=0 -stopatheight=719700 -prune=550
2022-01-21T22:49:25Z Bitcoin Core version v22.99.0-e3ce019667fb (release build)
2022-01-21T22:49:25Z Validating signatures for all blocks.
2022-01-21T22:49:25Z Setting nMinimumChainWork=00000000000000000000000000000000000000001fa4663bbbe19f82de910280
2022-01-21T22:49:25Z Prune configured to target 550 MiB on disk for block and undo files.
2022-01-21T22:49:25Z Using the 'standard' SHA256 implementation
...
2022-01-22T14:37:08Z Shutdown: done
./src/bitcoind -datadir=/Users/hebasto/MASTER -assumevalid=0  -prune=550  174110.07s user 11526.30s system 326% cpu 15:47:43.83 total

About 54 min (roughly 6%) faster IBD, going by the two wall-clock totals above.

sipa (Member) commented Jan 21, 2022

See https://github.com/sipa/bitcoin/commits/pr24115, which adds a 2-way 64-byte optimized variant. On my Cortex-A53 it's roughly a 2x speedup for the SHA256D64_1024 benchmark (relevant for Merkle root computation) compared to this PR. For more modern architectures I could imagine the gain is larger:

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 2.60 | 384,105,263.28 | 0.3% | 0.03 | SHA256 |
| 5.35 | 187,019,153.94 | 0.1% | 0.01 | SHA256D64_1024 |
| 14.61 | 68,437,280.69 | 0.0% | 0.01 | SHA256_32b |

For reference, master again:

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 15.69 | 63,715,155.54 | 0.0% | 0.17 | SHA256 |
| 43.42 | 23,029,615.98 | 0.5% | 0.03 | SHA256D64_1024 |
| 41.67 | 23,995,549.21 | 0.1% | 0.01 | SHA256_32b |

PastaPastaPasta (Contributor) commented

@sipa's branch on M1:

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 0.46 | 2,174,603,243.64 | 0.6% | 0.01 | SHA256 |
| 0.95 | 1,053,985,898.82 | 0.8% | 0.01 | SHA256D64_1024 |
| 1.15 | 871,857,965.44 | 0.4% | 0.01 | SHA256_32b |

previous results on c0849

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 0.47 | 2,148,996,364.97 | 0.3% | 0.01 | SHA256 |
| 1.48 | 676,354,727.08 | 0.3% | 0.01 | SHA256D64_1024 |
| 1.15 | 873,060,380.08 | 0.1% | 0.01 | SHA256_32b |

prusnak (Contributor, Author) commented Jan 22, 2022

I confirm the numbers on M1:

before @sipa's improvements (f06f46c):

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 0.45 | 2,198,307,083.71 | 0.3% | 0.01 | SHA256 |
| 1.49 | 671,303,457.11 | 0.3% | 0.01 | SHA256D64_1024 |
| 1.15 | 866,670,334.07 | 0.2% | 0.01 | SHA256_32b |

after @sipa's improvements (0e72995):

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 0.46 | 2,197,198,571.82 | 0.3% | 0.01 | SHA256 |
| 0.94 | 1,059,405,097.65 | 0.3% | 0.01 | SHA256D64_1024 |
| 1.17 | 857,291,152.82 | 0.5% | 0.01 | SHA256_32b |

prusnak (Contributor, Author) commented Jan 22, 2022

> I confirm the numbers on M1:

I merged @sipa's improvements into this branch => 0e72995 ❤️

PastaPastaPasta (Contributor) commented Jan 22, 2022

I'm not able to build this branch on M1 at the moment:

config.status: creating libbitcoinconsensus.pc
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating doc/man/Makefile
config.status: creating share/setup.nsi
config.status: creating share/qt/Info.plist
config.status: creating test/config.ini
config.status: creating contrib/devtools/split-debug.sh
config.status: creating src/config/bitcoin-config.h
config.status: src/config/bitcoin-config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
 cd . && /bin/sh /Users/pasta/workspace/bitcoin/build-aux/missing automake-1.16 --foreign
/bin/sh: /Users/pasta/workspace/bitcoin/build-aux/missing: No such file or directory
make: *** [Makefile.in] Error 1

I just checked out sipa's branch here: sipa@0e72995 and compilation worked trivially

sipa (Member) commented Jan 22, 2022

@prusnak @PastaPastaPasta Perhaps you want to also benchmark with the last two commits removed (so at "Optimization: precompute a few 3rd transform intermediaries"). Whether the last two help may be very architecture-dependent. For me they contribute a ~30% speedup, but maybe on M1 that is not the case.

prusnak (Contributor, Author) commented Jan 22, 2022

@sipa benchmark of 38ed75f Optimization: precompute a few 3rd transform intermediaries on M1:

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------:|-----:|------:|:----------|
| 0.46 | 2,163,137,327.85 | 0.0% | 0.01 | SHA256 |
| 1.28 | 780,225,941.01 | 0.3% | 0.01 | SHA256D64_1024 |
| 1.18 | 850,467,547.93 | 0.2% | 0.01 | SHA256_32b |

The improvement from using 0e72995 is there on M1 as well.

sipa (Member) commented Jan 22, 2022

Looks like the 2-way version is a clear win on M1 as well, thanks!

prusnak (Contributor, Author) commented Jan 28, 2022

@fanquake rebased on top of current master

hebasto (Member) commented Jan 28, 2022

Guix builds:

$ find guix-build-$(git rev-parse --short=12 HEAD)/output/ -type f -print0 | env LC_ALL=C sort -z | xargs -r0 sha256sum
4a309ef27036065f787330a50659c85323f78f7b5d3c69a79e3eca232f4d3e55  guix-build-aaa1d03d3ace/output/aarch64-linux-gnu/SHA256SUMS.part
cfedbd51f5bf65d57fe9200e22e56f3a38f44308eea565b201483811996d957b  guix-build-aaa1d03d3ace/output/aarch64-linux-gnu/bitcoin-aaa1d03d3ace-aarch64-linux-gnu-debug.tar.gz
85c9195cf594fbbf3ce6c6b76370f746dd4342c793b54b8b5bdb650a53761eaf  guix-build-aaa1d03d3ace/output/aarch64-linux-gnu/bitcoin-aaa1d03d3ace-aarch64-linux-gnu.tar.gz
280234b6a20bbcddb847e0bbaf27f35ac30d292ea5dd096f8c8096220f8ef29a  guix-build-aaa1d03d3ace/output/arm-linux-gnueabihf/SHA256SUMS.part
48c0fd0a6a4c1e86e942f6907804ed87527f329954e81e5bdc22beed83cbe6b3  guix-build-aaa1d03d3ace/output/arm-linux-gnueabihf/bitcoin-aaa1d03d3ace-arm-linux-gnueabihf-debug.tar.gz
c435b7a53606f33f9d23f28e6c98ce3d55f50f61fd67bf2b4ebe65261fe68aba  guix-build-aaa1d03d3ace/output/arm-linux-gnueabihf/bitcoin-aaa1d03d3ace-arm-linux-gnueabihf.tar.gz
e0d61af7b471ba5135a848be6d4d6cf0caa8042dab9d929f6578de17e47e40c2  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/SHA256SUMS.part
e9eda618bf90d5d1522005bd13bd296ae89aec9c3b0b2528b9c9f67e70178828  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/bitcoin-aaa1d03d3ace-arm64-apple-darwin.tar.gz
a6540dcea7c2a2562edd0f2232f511da97339d388cdca686e084d03e381d6fa2  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.dmg
6d970ae7c94b78c5f97e978718862e85c3375c48d3b8ade6fafce39e7c8e0f0e  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.tar.gz
cdbc9eb5281b14ecbecc2956a5a41709fda93762115ce3cee9516a68874676b8  guix-build-aaa1d03d3ace/output/dist-archive/bitcoin-aaa1d03d3ace.tar.gz
d063099449e40036d15ed5b023f602ba824edf303ecbe78bcd3f01feeabb535f  guix-build-aaa1d03d3ace/output/powerpc64-linux-gnu/SHA256SUMS.part
3e6da4026d466039cd361e48a492ea780021bc0efb8e67e948097871ce4226cc  guix-build-aaa1d03d3ace/output/powerpc64-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64-linux-gnu-debug.tar.gz
669291a4767509f053bdd8789f4631ff6d28cd4ad62156105bc98cdb7d3a295a  guix-build-aaa1d03d3ace/output/powerpc64-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64-linux-gnu.tar.gz
868cbe0f73cd786d91dd6d1990550e6c9c673ef3d874c545150b6e030951a24d  guix-build-aaa1d03d3ace/output/powerpc64le-linux-gnu/SHA256SUMS.part
90157bdc29262abd421535e4d6921e72aff7f3d43ca5634bc76598d8daf3a1ec  guix-build-aaa1d03d3ace/output/powerpc64le-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64le-linux-gnu-debug.tar.gz
8275665e19f85193de4e7e5ee7b451d6a4f8e414a33586e7066575812e878eda  guix-build-aaa1d03d3ace/output/powerpc64le-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64le-linux-gnu.tar.gz
2723b48d06e8217adb41bcc640417cba3470b234cce7815104e878866d775046  guix-build-aaa1d03d3ace/output/riscv64-linux-gnu/SHA256SUMS.part
f63822813587ec9e4bcc044c4b7918b1330d6b16be09f28bd95fedfd3dcdb147  guix-build-aaa1d03d3ace/output/riscv64-linux-gnu/bitcoin-aaa1d03d3ace-riscv64-linux-gnu-debug.tar.gz
3ec47d6968e2e430e3ff1629f07243f8589bd30406c0de916c2bbc6d5d88e0e8  guix-build-aaa1d03d3ace/output/riscv64-linux-gnu/bitcoin-aaa1d03d3ace-riscv64-linux-gnu.tar.gz
2120e8021edfe8170e318d82e44e405af68bb04c3fe3a3cd0407a17507cd74a0  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/SHA256SUMS.part
11b173cbeff7b20c717fd880446904eec2d33b30076348c0e9698e876f5be4a4  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.dmg
af86643730d769ddf6635e9e240b50bd5b94aa576cf7450b4289d8ec1fc883ed  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.tar.gz
3c3591954aaf6b0557b74baf156f892b9b90ec63c35f3495889431afe0a17f93  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/bitcoin-aaa1d03d3ace-osx64.tar.gz
553d227d7e32c72390e259a46f37e4fd98f5781bd4b6d7ed8d83c5a9ba04f65b  guix-build-aaa1d03d3ace/output/x86_64-linux-gnu/SHA256SUMS.part
479f3597f1577a33dad6634b97515f4ceac0ff163d64e7fa00b8685793d36231  guix-build-aaa1d03d3ace/output/x86_64-linux-gnu/bitcoin-aaa1d03d3ace-x86_64-linux-gnu-debug.tar.gz
c273a93458f2afa04c66d7e23346835dc6201f8e7b878ffbf661ae104c8746e7  guix-build-aaa1d03d3ace/output/x86_64-linux-gnu/bitcoin-aaa1d03d3ace-x86_64-linux-gnu.tar.gz

UPDATE: build artifacts are available in https://github.com/hebasto/artefacts/tree/master/pr24115/guix-build-aaa1d03d3ace/output

prusnak (Contributor, Author) commented Jan 29, 2022

Two questions:

Systems used:

  • RPI4 - Raspberry Pi 4 system (which does not support ARMv8 SHA2 NI), Ubuntu 21.10
  • ALTRA - Ampere Altra system (which does support ARMv8 SHA2 NI), Ubuntu 20.04.3 LTS

I performed the following tests:

  • build aaa1d03 on RPI4
    • configure output says checking for ARMv8 SHA-NI intrinsics... yes
    • resulting binary contains ARMv8 SHA-NI code
    • bitcoind starts
    • bitcoind output says Using the 'standard' SHA256 implementation
  • build aaa1d03 on ALTRA
    • configure output says checking for ARMv8 SHA-NI intrinsics... yes
    • resulting binary contains ARMv8 SHA-NI code
    • bitcoind starts
    • bitcoind output says Using the 'arm_shani(1way,2way)' SHA256 implementation

  • take binary built on ALTRA and run it on RPI4
    • bitcoind starts
    • bitcoind output says Using the 'standard' SHA256 implementation
  • take binary built on RPI4 and run it on ALTRA
    • bitcoind starts
    • bitcoind output says Using the 'arm_shani(1way,2way)' SHA256 implementation

I think this proves that the build mechanism and the runtime detection work as intended.

Sjors (Member) commented Jan 31, 2022

Concept ACK. I find it near-impossible to follow what sha256d64_arm_shani is doing, but that's mainly because our C++ TransformD64 is undocumented (introduced in #13191). In particular, I don't understand how the algorithm follows from a single SHA256. I assume it's an optimization. Otherwise they seem similar enough, with sha256d64_arm_shani splitting the input to take advantage of the 2-way instructions. And the tests pass :-)

sipa (Member) commented Jan 31, 2022

@Sjors That's quite possibly worth documenting in general (for all D64 code).

What these functions do:

  • Take a pointer to an input buffer of N*64 bytes and an output buffer of N*32 bytes (N=1 for 1-way, N=2 for 2-way, etc.).
  • Treat the input as the concatenation of N 64-byte inputs, compute SHA256(SHA256(input)) for each, and concatenate those outputs in the output buffer.

A bit about SHA256's structure. SHA256(bytes) is really the following algorithm:

  • Append padding to input (between 9 and 72 bytes); the result is always a multiple of 64 bytes.
  • Initialize the state (a 32-byte value, typically represented as 8 32-bit integers) to the initial state, a constant.
  • Then split the input into blocks of 64 bytes, and for each do state = Transform(state, block), where Transform is the SHA256 transformation function at a high level.
  • The hash is equal to the final state.
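The padding rule in the first step can be sketched in a few lines. The helper names `PaddingLength` and `Pad` below are hypothetical, chosen only for illustration. The layout (a single 0x80 byte, zero bytes, then the message length in bits as an 8-byte big-endian integer) is the standard SHA256 padding; the compile-time checks confirm the 9 to 72 byte range, and that a 64-byte input receives exactly 64 bytes of padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of padding bytes SHA256 appends to a message
// of `len` bytes so that the padded length is a multiple of 64.
constexpr std::size_t PaddingLength(std::size_t len)
{
    // 1 marker byte + 8 length bytes are mandatory; the rest is zero fill.
    return 9 + (64 - (len + 9) % 64) % 64;
}

// Hypothetical helper: the padding bytes themselves. They depend only on the
// message length, never on the message content.
std::vector<unsigned char> Pad(std::size_t len)
{
    std::vector<unsigned char> pad(PaddingLength(len), 0x00);
    pad[0] = 0x80;  // a single 1 bit followed by zero bits
    const std::uint64_t bits = static_cast<std::uint64_t>(len) * 8;
    for (int i = 0; i < 8; ++i) {
        // Message length in bits, big-endian, in the last 8 bytes.
        pad[pad.size() - 1 - i] = static_cast<unsigned char>(bits >> (8 * i));
    }
    return pad;
}

static_assert(PaddingLength(55) == 9, "shortest possible padding");
static_assert(PaddingLength(56) == 72, "longest possible padding");
static_assert(PaddingLength(64) == 64, "a full block gets a full block of padding");
static_assert(PaddingLength(32) == 32, "a 32-byte hash pads to one block");
```

Because the padding is a pure function of the length, the padding blocks for fixed-size inputs are constants.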

In case of SHA256(SHA256(64 bytes)), there are 3 Transforms being invoked:

  • The first operates on the 64 bytes of input, starting with initial state.
  • The second continues on the resulting state, processing 64 bytes of padding. That padding is a constant (it's just a function of the length of the input).
  • The third operates on the 32 bytes of output produced by the second transform, followed by 32 bytes of padding, which is again constant, and starting with a new initial state.

There are 3 types of optimizations we can do in this case:

  • Inline the 3 transforms into one function body, together with all the initializations. The intermediary conversion to bytes after the second transform and then back to integers for the third transform can be bypassed (serializing followed by deserializing is a no-op).
  • Observe that lots of intermediary values are now known at compile time, in particular many of the values occurring during the second transform (whose input is 100% fixed). I did this by simply writing the code up to this point, adding printf statements on these intermediaries, then turning the printed values into constants in the code and skipping their computation.
  • Take advantage of vectorization and/or instruction-level parallelism. In the case of x86 and ARM SHA instructions, we literally just duplicate every line of code (after doing the operations above), alternating between working on variables relating to a first or a second 64-byte input. This works because these instructions have a long pipeline, and there are sufficient registers available in hardware to store (most of) the data relating to two instances at once. This improves the throughput.
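To make the second point concrete: the message block of the second transform is fixed padding, so its whole message schedule is a constant. The sketch below is only an illustration and is not the method or code used in the branch (which was derived with printf statements, as described above). It evaluates the standard SHA256 schedule expansion at compile time using C++14 `constexpr`; the names `Schedule` and `SecondBlockSchedule` are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t Rotr(std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }
// Small sigma functions of the SHA256 message schedule.
constexpr std::uint32_t Sigma0(std::uint32_t x) { return Rotr(x, 7) ^ Rotr(x, 18) ^ (x >> 3); }
constexpr std::uint32_t Sigma1(std::uint32_t x) { return Rotr(x, 17) ^ Rotr(x, 19) ^ (x >> 10); }

// Hypothetical container for the 64 schedule words of one block.
struct Schedule {
    std::uint32_t w[64];
};

// Expands the fixed second block: the padding of a 64-byte message, i.e. the
// byte 0x80, zero bytes, and the bit length 512 in the final word.
constexpr Schedule SecondBlockSchedule()
{
    Schedule s{};         // all words start at zero
    s.w[0] = 0x80000000u; // padding marker byte 0x80 as a big-endian word
    s.w[15] = 512;        // message length in bits (64 bytes)
    for (int i = 16; i < 64; ++i) {
        s.w[i] = Sigma1(s.w[i - 2]) + s.w[i - 7] + Sigma0(s.w[i - 15]) + s.w[i - 16];
    }
    return s;
}

// Every word is available to the compiler, so none needs computing at run time.
constexpr Schedule kSecondBlock = SecondBlockSchedule();
static_assert(kSecondBlock.w[16] == 0x80000000u, "w[16] reduces to w[0]");
static_assert(kSecondBlock.w[17] == 0x01400000u, "w[17] reduces to Sigma1(512)");
static_assert(kSecondBlock.w[18] == 0x00205000u, "w[18] reduces to Sigma1(w[16])");
```

Only the schedule is fully constant here; the round state of the second transform still depends on the output of the first one, so the rounds themselves cannot be folded away.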

The individual commits in https://github.com/sipa/bitcoin/commits/pr24115 show the process.

Note that I don't think it's really required for verifying correctness to see these steps (otherwise I'd have argued for including them in this PR), but it may help understand how it came to be.

Sjors (Member) commented Feb 1, 2022

I think this is the step that confuses me:

> The second continues on the resulting state, processing 64 bytes of padding. That padding is a constant (it's just a function of the length of the input).

If the first transform is the equivalent of a single sha256(64 bytes) and the third is the equivalent of a second sha256() on the 32 byte result of the first, what is the second transform doing?

> I did this by simply writing the code up to this point, adding printf statements on these intermediaries, then turning the printed values into constants in the code and skipping their computation.

This is definitely worth documenting (can be another PR). Even nicer if we can generate the values in a Python script (for manual comparison, not code generation).

sipa (Member) commented Feb 1, 2022

@Sjors

There are two SHA256 invocations:

  • H1 = SHA256(input)
  • H2 = SHA256(H1)

Input is 64 bytes, which means it gets 64 bytes of padding (because the padding is always between 9 and 72 bytes long, and the result is always a multiple of 64).

For H2, SHA256(H1) just gets a 32-byte input, so it also only gets a 32-byte padding, and the result just needs one transform.

So we can write it this way:

  • H1 = Transform(Transform(Init(), input), Pad(64))
  • H2 = Transform(Init(), H1 + Pad(32))

The first transform is the inner one for H1, the second the outer one for H1. The third transform is the H2 one.
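The two formulas can be checked against a plain, unoptimized SHA256. The following is a reference sketch for illustration only, not code from this PR: `Init`, `Transform` and `Pad` mirror the notation used above, while `ToBytes`, `Sha256` and `DoubleSha256_64` are hypothetical helper names. It uses the standard round constants and initial state from FIPS 180-4.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using State = std::array<std::uint32_t, 8>;

// SHA256 round constants (FIPS 180-4).
static const std::uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

static std::uint32_t Rotr(std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// Init(): the constant SHA256 initial state.
State Init()
{
    return {{0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
             0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19}};
}

// Transform(state, block): one SHA256 compression over a 64-byte block.
State Transform(State s, const unsigned char* block)
{
    std::uint32_t w[64];
    for (int i = 0; i < 16; ++i) { // load 16 big-endian words
        w[i] = (std::uint32_t(block[4 * i]) << 24) | (std::uint32_t(block[4 * i + 1]) << 16) |
               (std::uint32_t(block[4 * i + 2]) << 8) | std::uint32_t(block[4 * i + 3]);
    }
    for (int i = 16; i < 64; ++i) { // expand the message schedule
        const std::uint32_t s0 = Rotr(w[i - 15], 7) ^ Rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        const std::uint32_t s1 = Rotr(w[i - 2], 17) ^ Rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    State v = s; // working variables a..h
    for (int i = 0; i < 64; ++i) {
        const std::uint32_t t1 = v[7] + (Rotr(v[4], 6) ^ Rotr(v[4], 11) ^ Rotr(v[4], 25)) +
                                 ((v[4] & v[5]) ^ (~v[4] & v[6])) + K[i] + w[i];
        const std::uint32_t t2 = (Rotr(v[0], 2) ^ Rotr(v[0], 13) ^ Rotr(v[0], 22)) +
                                 ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
        v = State{{t1 + t2, v[0], v[1], v[2], v[3] + t1, v[4], v[5], v[6]}};
    }
    for (int i = 0; i < 8; ++i) s[i] += v[i];
    return s;
}

// Pad(len): SHA256 padding for a message of `len` bytes.
std::vector<unsigned char> Pad(std::size_t len)
{
    std::vector<unsigned char> pad(9 + (64 - (len + 9) % 64) % 64, 0x00);
    pad[0] = 0x80;
    const std::uint64_t bits = static_cast<std::uint64_t>(len) * 8;
    for (int i = 0; i < 8; ++i) pad[pad.size() - 1 - i] = static_cast<unsigned char>(bits >> (8 * i));
    return pad;
}

// Serializes a state as 32 big-endian bytes (the hash output).
std::array<unsigned char, 32> ToBytes(const State& s)
{
    std::array<unsigned char, 32> out{};
    for (int i = 0; i < 32; ++i) out[i] = static_cast<unsigned char>(s[i / 4] >> (24 - 8 * (i % 4)));
    return out;
}

// Generic SHA256: pad the message, then run Transform over each 64-byte block.
State Sha256(const unsigned char* data, std::size_t len)
{
    std::vector<unsigned char> msg(data, data + len);
    const std::vector<unsigned char> pad = Pad(len);
    msg.insert(msg.end(), pad.begin(), pad.end());
    State s = Init();
    for (std::size_t i = 0; i < msg.size(); i += 64) s = Transform(s, msg.data() + i);
    return s;
}

// Double SHA256 of exactly 64 bytes, written as the three transforms above.
State DoubleSha256_64(const unsigned char* input)
{
    const State h1 = Transform(Transform(Init(), input), Pad(64).data()); // transforms 1 and 2
    const std::array<unsigned char, 32> h1_bytes = ToBytes(h1);
    std::vector<unsigned char> block(h1_bytes.begin(), h1_bytes.end());   // H1 + Pad(32)
    const std::vector<unsigned char> pad32 = Pad(32);
    block.insert(block.end(), pad32.begin(), pad32.end());
    return Transform(Init(), block.data());                               // transform 3
}
```

Comparing `DoubleSha256_64(x)` with `Sha256` applied twice for any 64-byte `x` confirms that exactly three transforms are involved; the optimized D64 variants compute the same function with the constant parts folded in.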

Sjors (Member) commented Feb 1, 2022

Ah that makes sense.

> Input is 64 bytes, which means it gets 64 bytes of padding

I naively assumed a 64 byte message wasn't padded, but it is: https://datatracker.ietf.org/doc/html/rfc6234#section-4.1

mutatrum (Contributor) commented Feb 1, 2022

IBD up to block 700000 on a Rock Pi 4a w/ NVMe SSD, assumevalid=0, dbcache=2000:

master (bd482b3): 68H52M
shani (4abca94): 65H29M

Improvement ~5%

sipa (Member) commented Feb 1, 2022

@Sjors

> I naively assumed a 64 byte message wasn't padded, but it is: https://datatracker.ietf.org/doc/html/rfc6234#section-4.1

Yes, it has to be. Otherwise you'd have a trivial 2nd preimage attack between hash(X) and hash(X || padding(len(X))), for non-multiple-of-64-bytes X.

DrahtBot (Contributor) commented Feb 3, 2022

Guix builds

| File | commit 133f73e (master) | commit 19a5c3f (master and this pull) |
|:-----|:------------------------|:--------------------------------------|
| SHA256SUMS.part | 4d29ebeb3309d60d... | b7844e9c678b97f6... |
| *-aarch64-linux-gnu-debug.tar.gz | c29ba0d0426063e8... | 89ff765de2630eb7... |
| *-aarch64-linux-gnu.tar.gz | e52c846b1841b3eb... | d6ea7915e264d3be... |
| *-arm-linux-gnueabihf-debug.tar.gz | 5d3f9731cf88da5b... | d254cb4dc971d678... |
| *-arm-linux-gnueabihf.tar.gz | 282b33b13dd1f8a0... | ea7a0cf5a93cb32a... |
| *-arm64-apple-darwin.tar.gz | 3f0dba0a6549c410... | 3400772805824d22... |
| *-osx-unsigned.dmg | 6692809452b6cb62... | 25da88b4e7f96778... |
| *-osx-unsigned.tar.gz | 4366f453672f800d... | 3ad587987969546b... |
| *-osx64.tar.gz | 5387d0cdc36be37a... | 198a4d9b96a87a5f... |
| *-powerpc64-linux-gnu-debug.tar.gz | 1ed57908059941ef... | e3a27aa697de6985... |
| *-powerpc64-linux-gnu.tar.gz | 21079e1bae0f459b... | 7c91ce0fa9a40f23... |
| *-powerpc64le-linux-gnu-debug.tar.gz | 6b441c519f2d3185... | 2bfee29cc7203662... |
| *-powerpc64le-linux-gnu.tar.gz | 132f95573d0fd005... | 48a553082d2d03b8... |
| *-riscv64-linux-gnu-debug.tar.gz | 3c5e1f8e3d9aa92a... | 64dc9f50e4993c2e... |
| *-riscv64-linux-gnu.tar.gz | b17efbca76425fc5... | 78bba78a4dd894cb... |
| *-x86_64-linux-gnu-debug.tar.gz | 9f61cd7fca8d6425... | edbfec791406e681... |
| *-x86_64-linux-gnu.tar.gz | 323dc8d7d0aa8290... | b38a112a72acb36a... |
| *.tar.gz | 5887839fbd29cd1c... | 58afa778369cde02... |
| guix_build.log | 53868781dafe6675... | 8cae536cac2f270c... |

guix_build.log.diff a14b825c6610d5c4...

DrahtBot (Contributor) commented Feb 11, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24322 ([kernel 1/n] Introduce initial libbitcoinkernel by dongcarl)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

laanwj (Member) commented Feb 14, 2022

Code review and lightly tested ACK aaa1d03
I have checked:

  • that the code gets compiled (bitcoind contains the instructions)
  • on an old ARM64 device without the instruction set, that it correctly doesn't enable the code
  • on a recent ARM64 device with the instruction set, that it uses and enables the code

@laanwj laanwj merged commit c23bf06 into bitcoin:master Feb 14, 2022
@prusnak prusnak deleted the armv8-shani branch February 14, 2022 22:17
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 15, 2022
MSG3 = vreinterpretq_u32_u8(vrev32q_u8(vld1q_u8(chunk + 48)));
chunk += 64;

// Original implemenation preloaded message and constant addition which was 1-3% slower.
Member

typo: implemenation ==> implementation

PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Mar 29, 2022
aaa1d03 Add optimized sha256d64_arm_shani::Transform_2way (Pieter Wuille)
fe06298 Implement sha256_arm_shani::Transform (Pavol Rusnak)
48a72fa Add sha256_arm_shani to build system (Pavol Rusnak)
c2b7934 Rename SHANI to X86_SHANI to allow future implementation of ARM_SHANI (Pavol Rusnak)

Pull request description:

  This PR adds support for ARMv8 SHA2 Intrinsics.

  Fixes bitcoin#13401 and bitcoin#17414

  * Integration part was done by me.
  * The original SHA2 NI code comes from https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-arm.c
  * Minor optimizations from https://github.com/rollmeister/bitcoin-armv8/blob/master/src/crypto/sha256.cpp are applied too.
  * The 2-way transform added by @sipa

ACKs for top commit:
  laanwj:
    Code review and lightly tested ACK aaa1d03

Tree-SHA512: 9689d6390c004269cb1ee79ed05430d7d35a6efef2554a2b6732f7258a11e7e959b3306c04b4e8637a9623fb4c12d1c1b3592da0ff0dc6d737932db302509669

# Conflicts:
#	configure.ac
#	src/Makefile.am
#	src/crypto/sha256.cpp
@hebasto hebasto mentioned this pull request Apr 1, 2022
laanwj added a commit that referenced this pull request May 11, 2022
…shani}

7fd0860 Bugfix: configure: Define defaults for enable_arm_{crc,shani} (Luke Dashjr)

Pull request description:

  Fix for #17398 and #24115

  Trivial, mostly for consistency (you'd have to *try* to break this)

ACKs for top commit:
  pk-b2:
    ACK 7fd0860
  seejee:
    ACK 7fd0860
  vincenzopalazzo:
    ACK 7fd0860

Tree-SHA512: 51c389787c369f431ca57071f03392438bff9fd41f128c63ce74ca30d2257213f8be225efcb5c1329ad80b714f44427d721215d4f848cc8e63060fa5bc8f1f2e
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 11, 2022
…m_{crc,shani}

7fd0860 Bugfix: configure: Define defaults for enable_arm_{crc,shani} (Luke Dashjr)

Pull request description:

  Fix for bitcoin#17398 and bitcoin#24115

  Trivial, mostly for consistency (you'd have to *try* to break this)

ACKs for top commit:
  pk-b2:
    ACK bitcoin@7fd0860
  seejee:
    ACK bitcoin@7fd0860
  vincenzopalazzo:
    ACK bitcoin@7fd0860

Tree-SHA512: 51c389787c369f431ca57071f03392438bff9fd41f128c63ce74ca30d2257213f8be225efcb5c1329ad80b714f44427d721215d4f848cc8e63060fa5bc8f1f2e
@bitcoin bitcoin locked and limited conversation to collaborators Feb 16, 2023