### Description of changes:
Graviton 4 uses Neoverse V2 cores, which we were not previously
detecting. As a result, slower implementations were selected on those
CPUs. The output was still correct; it just didn't take full advantage
of the CPU's capabilities.
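
For reviewers unfamiliar with how Arm cores are told apart, here is a minimal sketch of classifying a core from its `MIDR_EL1` value. It is an illustration only, not the code in this PR: the enum, macro, and function names are hypothetical. The part numbers (`0xD40` for Neoverse V1, `0xD4F` for Neoverse V2) and the implementer code `0x41` for Arm are the publicly documented values. How the MIDR value is obtained (for example, from the sysfs `midr_el1` file on Linux) is left out.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers used in this PR. */
enum cpu_kind { CPU_OTHER = 0, CPU_NEOVERSE_V1, CPU_NEOVERSE_V2 };

#define MIDR_IMPLEMENTER_ARM 0x41u   /* MIDR_EL1 bits [31:24] */
#define MIDR_PART_NEOVERSE_V1 0xD40u /* MIDR_EL1 bits [15:4] */
#define MIDR_PART_NEOVERSE_V2 0xD4Fu

/* Extract the implementer field from a raw MIDR_EL1 value. */
static uint32_t midr_implementer(uint64_t midr) {
  return (uint32_t)((midr >> 24) & 0xFFu);
}

/* Extract the primary part number field from a raw MIDR_EL1 value. */
static uint32_t midr_part(uint64_t midr) {
  return (uint32_t)((midr >> 4) & 0xFFFu);
}

/* Map a raw MIDR_EL1 value to one of the cores we care about. */
static enum cpu_kind classify_midr(uint64_t midr) {
  if (midr_implementer(midr) != MIDR_IMPLEMENTER_ARM) {
    return CPU_OTHER;
  }
  switch (midr_part(midr)) {
    case MIDR_PART_NEOVERSE_V1:
      return CPU_NEOVERSE_V1;
    case MIDR_PART_NEOVERSE_V2:
      return CPU_NEOVERSE_V2;
    default:
      return CPU_OTHER;
  }
}

/* Assumption: V1 and V2 currently select the same optimized code paths,
 * but keep distinct kinds so they can diverge later. */
static int uses_neoverse_v_paths(enum cpu_kind kind) {
  return kind == CPU_NEOVERSE_V1 || kind == CPU_NEOVERSE_V2;
}
```

Keeping separate kinds while sharing one predicate means a future core-specific tweak only touches the predicate or adds a new one, not the detection itself.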
### Call-outs:
I considered combining the ARM CPU capability flags for Neoverse V1 and
V2 cores into one flag, but a future use case might need core-specific
handling even though the two behave the same right now.
We also need to test Apple M2 and M3 CPUs, which will probably want the
same optimizations as M1. It might make sense to combine these into a
single Apple "M" ARM capability flag.
### Testing:
Built and ran locally. On a Graviton 4 instance:
|Algorithm|Before|After|
|---|---|---|
|RSA 2048 sign|929.6 ops/sec|1,397.5 ops/sec|
|ECDH P-384|3,541.2 ops/sec|3,744.8 ops/sec|
|ECDH P-521|1,885.1 ops/sec|2,406.7 ops/sec|
|AES 256 GCM 16 bytes|204.4 MB/s|203.9 MB/s|
|AES 256 GCM 16 KB|4,500.5 MB/s|6,019.0 MB/s|
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.