New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wasm] Add intrinsics for clz and ctz instructions #77777
Conversation
Add intrinsics for leading and trailing zero count for wasm. Introduce new internal class WasmBase for non-SIMD wasm intrinsics. These are always used in AOT build, unlike SIMD intrinsics, which are disabled by default.
Example of the relevant code change: $ wa-diff -d -f SpanHelper.*Index bin-simd\Release\AppBundle\dotnet.wasm bin-simd-base\Release\AppBundle\dotnet.wasm
(func corlib_System_SpanHelpers_IndexOfValueType_byte__byte_int(param i32, i32, i32, i32) (result i32))
...
local.get $9
i8x16.bitmask [SIMD]
- local.tee $6
+ i32.ctz
- i32.eqz
- if
- local.get $4
- i32.const 32
- i32.add
- return
-
- i32.const 2131592
- i32.load align:2
- local.get $6
- i32.const 0
- local.get $6
- i32.sub
- i32.and
- i32.const 125613361
- i32.mul
- i32.const 27
- i32.shr.u
- i32.add
- local.tee $6
- i32.eqz
- br.if
- local.get $6
- i32.load8.u
local.get $4
i32.add
... |
/azp run runtime-wasm-perf |
Azure Pipelines successfully started running 1 pipeline(s). |
I measured the performance in SpanHelper, where it saves few us as it is used only once. Let see if microbenchmarks can show other areas where it might help more. The size saving is nice, I see cca 1k reduction in bench sample's |
Refactor the code to avoid adding ifdefs
namespace System.Runtime.Intrinsics.Wasm | ||
{ | ||
[Intrinsic] | ||
internal abstract class WasmBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just handle BitOperations.LeadingZeroCount
and BitOperations.TrailingZeroCount
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to stay close to the other platforms implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The others exist largely due to historical reasons and because on x86 they aren't part of the "baseline" instruction set. There will ultimately be better codegen and less work for the compiler if you recognize the BitOperations
APIs directly. This is because the compiler won't have to do inlining, dead code elimination for n
platforms, etc.
I'm planning on doing the same for RyuJIT and recognizing BitOperations
directly for the same reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you plan to do with the Log2 and RoundUpToPowerOf2 methods? Implement them as direct intrinsics too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Log2
yes. For RoundUpToPowerOf2
that's TBD, but potentially.
tzcnt and lzcnt are the most hot methods and the most impactful in terms of "compiler work" required to inline and optimize down to a single instruction today.
src/mono/mono/mini/intrinsics.c
Outdated
if (cfg->opt & MONO_OPT_SIMD) { | ||
#endif | ||
ins = mono_emit_simd_intrinsics (cfg, cmethod, fsig, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to move the if (cfg->opt check to mono_emit_simd_intrinsics ()).
The failing tests are unrelated and failing in main and in other PRs too. |
I will address the feedback in following PR. |
Add intrinsics for leading and trailing zero count for wasm.
Introduce new internal class
WasmBase
for non-SIMD wasm intrinsics. These are always used in AOT builds, unlike SIMD intrinsics, which are disabled by default.