LibCrypto: Add x86 specific versions of SHA1 and SHA2 #24522
1e7e422
to
d1cc231
Compare
Yes, no problem. You have done a lot of work on this and helped me a lot; thank you for that. Now, towards AES... that should be easier than this.
// FIXME: Use __builtin_cpu_supports("sha") when compilers support it
constexpr u32 cpuid_sha_ebx = 1 << 29;
u32 eax, ebx, ecx, edx;
__cpuid_count(7, 0, eax, ebx, ecx, edx);
We can hide __cpuid_count in some AK::cpu_supports(Capability::SHA) which would also cache the result. (This will be similar to how __builtin_cpu_supports works)
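A minimal sketch of what such a cached helper could look like. The names cpu_supports, Capability and query_sha are hypothetical here (no such AK API exists yet, as noted in this thread); only __get_cpuid_count from <cpuid.h> is a real GCC/Clang helper, and it returns 0 when the requested leaf is not available.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#    include <cpuid.h>
#endif

// Hypothetical capability list; not an existing AK API.
enum class Capability {
    SHA,
};

// Runs the actual CPUID query. Assumption: SHA is reported in
// CPUID.(EAX=7,ECX=0):EBX bit 29, matching cpuid_sha_ebx in the diff.
static bool query_sha()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count() returns 0 if leaf 7 is not supported.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx & (1u << 29)) != 0;
#else
    return false;
#endif
}

bool cpu_supports(Capability capability)
{
    // Function-local static: evaluated once on first use, and the
    // initialization is thread-safe since C++11.
    static bool const has_sha = query_sha();

    switch (capability) {
    case Capability::SHA:
        return has_sha;
    }
    return false;
}
```

A call site would then read `if (cpu_supports(Capability::SHA))`, and the CPUID instruction is only executed on the first call.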
Well, Copilot said the same, but sadly we don't have that API yet.
It's never too late to introduce a new helper!
The problem is with Clang only; GCC 11.1 supports both the function attribute and the query.
Nope, we later figured out on Discord that neither Clang nor GCC did the correct thing. Compare assembly for use_f and use_transform_impl here.
Sorry, I was wrong. GCC has supported the __builtin_cpu_supports("sha") test since 11.1, but Clang does not, not even the latest version, 18. https://godbolt.org/z/cP7WGYqGq Neither compiler supports the automatic dispatch thing (Function Multiversioning).
Maybe add a test whether the CPUID has at least 7 leaves before reading from the 7th leaf. This is done by setting the eax register to zero and invoking the cpuid instruction, or, from the C language, by calling the __get_cpuid_max(0, NULL) function provided by the <cpuid.h> header. https://godbolt.org/z/v6q89rqhn
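A rough sketch of that check, written against the GCC/Clang <cpuid.h> header. The helper names max_basic_cpuid_leaf and cpu_has_sha_extensions are made up for illustration; the bit position is the cpuid_sha_ebx constant from the diff under review.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#    include <cpuid.h>
#endif

// Hypothetical helper: highest basic CPUID leaf the CPU reports
// (the value CPUID returns in eax when called with eax = 0).
static unsigned max_basic_cpuid_leaf()
{
#if defined(__x86_64__) || defined(__i386__)
    return __get_cpuid_max(0, nullptr);
#else
    return 0;
#endif
}

// Hypothetical helper: only reads leaf 7 after confirming it exists.
static bool cpu_has_sha_extensions()
{
#if defined(__x86_64__) || defined(__i386__)
    constexpr unsigned cpuid_sha_ebx = 1u << 29;

    // Without this guard, the output registers for an unsupported
    // leaf are not meaningful feature bits.
    if (max_basic_cpuid_leaf() < 7)
        return false;

    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & cpuid_sha_ebx) != 0;
#else
    return false;
#endif
}
```

On non-x86 targets both helpers simply report that the feature is absent, so callers fall back to the generic implementation.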
For now I'll elide that, as memset also avoids that, and I am pretty sure that CPUs without the Extensions leaf are missing other features we rely on.
f4a07c3
to
0d01559
Compare
Before (Clang):
After:
So a 5.3x speedup for SHA1 on my end.
Hope I'm not too annoying with asking to use C++ standard syntax for attributes wherever possible.
static_assert(IsSame<ElementOf<i8x4>, i8>);
static_assert(IsSame<ElementOf<f32x4>, float>);
Should we move these into tests?
IMO static_asserts should stay in the header; not sure if we have many of those in the test files, for AK/Concepts.h maybe?
There are countless instances of static_assert in Tests/AK, while the only thing that is being tested inline in a header in AK is explode_byte.
I'll leave them here for now.
(Side note: the Kernel likes inline static_asserts.)
How about (untested)
template<typename ElementType, size_t n, auto function>
void test(ReadonlySpan<u64> values, ReadonlySpan<u64> expected)
{
    VERIFY(values.size() == n);
    VERIFY(expected.size() == n);
    using Type = ElementType __attribute__((vector_size(n * sizeof(ElementType))));
    auto v = AK::SIMD::load_unaligned<Type>(&values[0]);
    v = function(v);
    ElementType result[n];
    AK::SIMD::store_unaligned(result, v);
    for (size_t i = 0; i < n; ++i)
        EXPECT_EQ(result[i], expected[i]);
}
and expressing all 1000 lines of tests with a nice and short array of structs?
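To illustrate the "array of structs" idea independently of AK, here is a small standalone sketch. It assumes a fixed i32x4 vector type instead of a templated vector_size, and uses memcpy in place of AK::SIMD::load_unaligned / store_unaligned; Case, negate, twice, run_case and count_failures are hypothetical names, not part of the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension: four 32-bit lanes.
using i32x4 = int32_t __attribute__((vector_size(16)));

// One table row: the operation under test, its input lanes and the
// lanes we expect back.
struct Case {
    i32x4 (*function)(i32x4);
    int32_t input[4];
    int32_t expected[4];
};

// Two toy operations standing in for the real SIMD helpers.
static i32x4 negate(i32x4 v) { return -v; }
static i32x4 twice(i32x4 v) { return v + v; }

static constexpr Case cases[] = {
    { negate, { 1, -2, 3, -4 }, { -1, 2, -3, 4 } },
    { twice, { 1, 2, 3, 4 }, { 2, 4, 6, 8 } },
};

static bool run_case(Case const& test_case)
{
    i32x4 v;
    std::memcpy(&v, test_case.input, sizeof(v)); // unaligned load
    v = test_case.function(v);
    int32_t result[4];
    std::memcpy(result, &v, sizeof(v)); // unaligned store
    return std::memcmp(result, test_case.expected, sizeof(result)) == 0;
}

static size_t count_failures()
{
    size_t failures = 0;
    for (auto const& test_case : cases) {
        if (!run_case(test_case))
            ++failures;
    }
    return failures;
}
```

Each new test then costs one table row rather than a new function body; in the real test file the comparison would go through EXPECT_EQ so that failures report the offending lane.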
Well, this won't work on GCC, but I will look into it.
We currently pretend that these return an ElfAddr, while in actuality they return a function pointer. UBSAN may check this and complain about it, so let's just disable it for that line.
1993c31
to
aa6d286
Compare
3ec6983
to
a0a7ff2
Compare
Otherwise we'd hit a VERIFY in AK::SIMD::shuffle() when that operand contains an out-of-range value; the spec tests indicate that a swizzle with an out-of-range index should return 0.
Co-Authored-By: Hendiadyoin1 <leon.a@serenityos.org>
Co-Authored-By: Hendiadyoin1 <leon.a@serenityos.org>
@gmta, do the first few SIMD commits look good to you? (I think this is fine for merging, but you know the SIMD bits much better.)
CC @MarekKnapek
I revived your PR; I hope you don't mind me giving myself co-author credit.
Changes since #23958:
sha target seems to be silently ignored at best by all compilers