Improve the handling of popcnt and tzcnt on arm64 by tannergooding · Pull Request #128677 · dotnet/runtime

tannergooding · 2026-05-28T03:22:20Z

Previously popcnt wasn't handled at all and tzcnt was imported as two IR nodes which could negatively impact certain heuristics and pattern matching.

dotnet-policy-service · 2026-05-28T03:23:41Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

Improves ARM64 codegen for int.PopCount/long.PopCount and int.TrailingZeroCount/long.TrailingZeroCount by representing them as a single GT_INTRINSIC IR node instead of leaving popcnt unhandled and emitting tzcnt as two separate hwintrinsic nodes (ReverseElementBits + LeadingZeroCount). Lowering to actual instructions (cnt/addv for popcnt, rbit/clz for tzcnt) is now done directly in the codegen, which should preserve more information for heuristics and pattern matching.

Changes:

In the importer, emit a GenTreeIntrinsic(TYP_INT, ...) node for ARM64 PopCount (previously not handled) and for ARM64 TrailingZeroCount (previously decomposed into two hwintrinsics).
Add ARM64 LSRA build logic for NI_PRIMITIVE_PopCount (requests an internal SIMD temp) and NI_PRIMITIVE_TrailingZeroCount.
Add ARM64 codegen for the two intrinsics: cnt+addv over an 8-byte SIMD register for PopCount, and rbit+clz (sized via emitActualTypeSize(srcNode)) for TrailingZeroCount.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
src/coreclr/jit/importercalls.cpp	Build a single `NI_PRIMITIVE_PopCount`/`NI_PRIMITIVE_TrailingZeroCount` intrinsic node on ARM64; set `baseType = TYP_INT` so the existing widen-back cast handles `long` return types.
src/coreclr/jit/lsraarm64.cpp	Add LSRA cases: PopCount needs an internal float (SIMD) reg; TrailingZeroCount is a straight 1-src, 1-dst integer op.
src/coreclr/jit/codegenarmarch.cpp	Emit `cnt`/`addv` via a SIMD temp for PopCount and `rbit`/`clz` for TrailingZeroCount under `TARGET_ARM64`.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

tannergooding · 2026-05-28T17:02:02Z

CC. @dotnet/jit-contrib, @EgorBo for review. This ensures that tzcnt and popcnt are properly imported as intrinsic on Arm64, allowing other phases to handle those nodes and avoiding the inliner.

Diffs are generally positive, showing up to -7.4k bytes of codegen on Linux Arm64 and nearly as much on Windows Arm64. The overall PerfScore is an improvement as well.

There are a few regressions intermixed, these mostly look to be due to additional CSEs, although a couple are from extra casts inserted for small types.

tannergooding · 2026-05-28T17:09:19Z

There are a few regressions intermixed, these mostly look to be due to additional CSEs, although a couple are from extra casts inserted for small types.

The extra casts look easy enough to fix, seems to be because we aren't handling it in IntegralRange::ForNode like we did with ToScalar(). I'd prefer to do that in a follow up PR, however, since there's a few intrinsics to add there; not just tzcnt/popcnt

EgorBo · 2026-05-29T12:14:04Z

This feels like yet another case where I don't understand why the managed code in BitOperations

isn't inlined into exactly what you emit by hands here. My understanding that it happens because we don't do opts for AdvSimd.PopCount that we do for PRIMITIVE_PopCount later in JIT (meaning users code that calls AdvSimd.PopCount directly won't be optimized this much)

tannergooding · 2026-05-29T12:37:53Z

meaning users code that calls AdvSimd.PopCount directly won't be optimized this much

Correct, because it is a SIMD instruction and must be presumed to operate across all SIMD8/SIMD16 bytes. We have no way to trivially map that back, especially given the more complex IR and our limited ability to track ranges already.

This allows us to track it directly as an integer operation and ignore the fact that codegen actually involves SIMD operations due to a limitation in the Arm64 instruction set.

Copilot AI review requested due to automatic review settings May 28, 2026 03:22

Copilot started reviewing on behalf of tannergooding May 28, 2026 03:22 View session

github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 28, 2026

dotnet-policy-service Bot assigned tannergooding May 28, 2026

Copilot AI reviewed May 28, 2026

View reviewed changes

tannergooding mentioned this pull request May 28, 2026

Import lzcnt/tzcnt/popcnt as returning int so rngchk works #128658

Draft

tannergooding force-pushed the arm64-popcnt-tzcnt branch 2 times, most recently from 9607e31 to bf02815 Compare May 28, 2026 04:22

Copilot AI review requested due to automatic review settings May 28, 2026 04:22

Copilot started reviewing on behalf of tannergooding May 28, 2026 04:22 View session

Copilot AI reviewed May 28, 2026

View reviewed changes

tannergooding force-pushed the arm64-popcnt-tzcnt branch from bf02815 to 24d3c5f Compare May 28, 2026 04:45

Copilot AI review requested due to automatic review settings May 28, 2026 05:24

tannergooding force-pushed the arm64-popcnt-tzcnt branch from 24d3c5f to bc6488e Compare May 28, 2026 05:24

Copilot started reviewing on behalf of tannergooding May 28, 2026 05:24 View session

Copilot AI reviewed May 28, 2026

View reviewed changes

tannergooding added 2 commits May 28, 2026 04:58

Improve the handling of popcnt and tzcnt on arm64

bc6488e

Don't zero for the 64-bit popcount case as it fills the whole register

f6f2b58

tannergooding marked this pull request as ready for review May 28, 2026 16:56

Copilot AI review requested due to automatic review settings May 28, 2026 16:56

Copilot started reviewing on behalf of tannergooding May 28, 2026 16:56 View session

Copilot AI reviewed May 28, 2026

View reviewed changes

tannergooding mentioned this pull request May 28, 2026

Ensure that applicable intrinsics have their ranges tracked by vn/assertionprop/rngchk #128722

Open

tannergooding requested a review from EgorBo May 28, 2026 20:59

EgorBo approved these changes May 29, 2026

View reviewed changes

tannergooding enabled auto-merge (squash) May 29, 2026 13:08

tannergooding merged commit db51bb8 into dotnet:main May 29, 2026
141 of 146 checks passed

tannergooding deleted the arm64-popcnt-tzcnt branch May 29, 2026 15:56

dotnet-maestro Bot mentioned this pull request May 30, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#6925

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the handling of popcnt and tzcnt on arm64#128677

Improve the handling of popcnt and tzcnt on arm64#128677
tannergooding merged 2 commits into
dotnet:mainfrom
tannergooding:arm64-popcnt-tzcnt

tannergooding commented May 28, 2026

Uh oh!

dotnet-policy-service Bot commented May 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

tannergooding commented May 28, 2026

Uh oh!

tannergooding commented May 28, 2026 •

edited

Loading

Uh oh!

EgorBo commented May 29, 2026 •

edited

Loading

Uh oh!

tannergooding commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tannergooding commented May 28, 2026

Uh oh!

dotnet-policy-service Bot commented May 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

tannergooding commented May 28, 2026

Uh oh!

tannergooding commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EgorBo commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannergooding commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tannergooding commented May 28, 2026 •

edited

Loading

EgorBo commented May 29, 2026 •

edited

Loading