Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Add intrinsics for clz and ctz instructions #77777

Merged
merged 8 commits into from Nov 8, 2022

Conversation

radekdoulik
Copy link
Member

Add intrinsics for leading and trailing zero count for wasm.

Introduce new internal class WasmBase for non-SIMD wasm intrinsics. These are always used in AOT builds, unlike SIMD intrinsics, which are disabled by default.

Add intrinsics for leading and trailing zero count for wasm.

Introduce new internal class WasmBase for non-SIMD wasm intrinsics.
These are always used in AOT build, unlike SIMD intrinsics, which are
disabled by default.
@radekdoulik
Copy link
Member Author

radekdoulik commented Nov 2, 2022

Example of the relevant code change:

$ wa-diff -d -f SpanHelper.*Index bin-simd\Release\AppBundle\dotnet.wasm bin-simd-base\Release\AppBundle\dotnet.wasm
(func corlib_System_SpanHelpers_IndexOfValueType_byte__byte_int(param i32, i32, i32, i32) (result i32))
...
       local.get $9
       i8x16.bitmask    [SIMD]
-      local.tee $6
+      i32.ctz
-      i32.eqz
-      if
-       local.get $4
-       i32.const 32
-       i32.add
-       return
-
-      i32.const 2131592
-      i32.load align:2
-      local.get $6
-      i32.const 0
-      local.get $6
-      i32.sub
-      i32.and
-      i32.const 125613361
-      i32.mul
-      i32.const 27
-      i32.shr.u
-      i32.add
-      local.tee $6
-      i32.eqz
-      br.if
-      local.get $6
-      i32.load8.u
       local.get $4
       i32.add
...

@radekdoulik
Copy link
Member Author

/azp run runtime-wasm-perf

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@radekdoulik
Copy link
Member Author

I measured the performance in SpanHelper, where it saves few us as it is used only once. Let see if microbenchmarks can show other areas where it might help more.

The size saving is nice, I see cca 1k reduction in bench sample's dotnet.wasm in default aot Release configuration and cca 3k for SIMD Release configuration.

Refactor the code to avoid adding ifdefs
namespace System.Runtime.Intrinsics.Wasm
{
[Intrinsic]
internal abstract class WasmBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just handle BitOperations.LeadingZeroCount and BitOperations.TrailingZeroCount directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to stay close to the other platforms implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others exist largely due to historical reasons and because on x86 they aren't part of the "baseline" instruction set. There will ultimately be better codegen and less work for the compiler if you recognize the BitOperations APIs directly. This is because the compiler won't have to do inlining, dead code elimination for n platforms, etc.

I'm planning on doing the same for RyuJIT and recognizing BitOperations directly for the same reasons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you plan to do with the Log2 and RoundUpToPowerOf2 methods? Implement them as direct intrinsics too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Log2 yes. For RoundUpToPowerOf2 that's TBD, but potentially.

tzcnt and lzcnt are the most hot methods and the most impactful in terms of "compiler work" required to inline and optimize down to a single instruction today.

if (cfg->opt & MONO_OPT_SIMD) {
#endif
ins = mono_emit_simd_intrinsics (cfg, cmethod, fsig, args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to move the if (cfg->opt check to mono_emit_simd_intrinsics ()).

@radekdoulik
Copy link
Member Author

The failing tests are unrelated and failing in main and in other PRs too.

@radekdoulik radekdoulik merged commit 134968a into dotnet:main Nov 8, 2022
@radekdoulik
Copy link
Member Author

I will address the feedback in following PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants