Implement ARM32 atomic intrinsics #97792

Open
wants to merge 15 commits into
base: main

MichalPetryka
Contributor

Implements Interlocked Exchange, Add and CompareExchange as "must expand" intrinsics on ARM32 for types equal to or smaller than pointer size.

Contributes to #86915.
Fixes #9982.
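
For readers less familiar with these APIs, the behavior of the three operations can be sketched in portable C++ with `std::atomic`. This only illustrates the expected return values and full-fence ordering; the function names are hypothetical, and this is not how the JIT implements the intrinsics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers that mirror the semantics of the Interlocked operations.

// Exchange: store `value` and return the previous contents.
inline int32_t interlocked_exchange(std::atomic<int32_t>& location, int32_t value)
{
    return location.exchange(value, std::memory_order_seq_cst);
}

// Add: Interlocked.Add returns the NEW value, while fetch_add returns the old one,
// so the addend is applied once more to the returned value (in unsigned arithmetic
// to keep wraparound well defined).
inline int32_t interlocked_add(std::atomic<int32_t>& location, int32_t value)
{
    uint32_t old = static_cast<uint32_t>(location.fetch_add(value, std::memory_order_seq_cst));
    return static_cast<int32_t>(old + static_cast<uint32_t>(value));
}

// CompareExchange: store `value` only if the location equals `comparand`;
// always return the value that was observed in the location.
inline int32_t interlocked_compare_exchange(std::atomic<int32_t>& location, int32_t value, int32_t comparand)
{
    int32_t observed = comparand;
    // On failure, `observed` is overwritten with the current contents.
    location.compare_exchange_strong(observed, value, std::memory_order_seq_cst);
    return observed;
}
```

Note that only `interlocked_add` returns the updated value; the other two return what was in the location before the operation.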

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 31, 2024
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 31, 2024
@ghost

ghost commented Jan 31, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author: MichalPetryka
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@filipnavara filipnavara self-requested a review January 31, 2024 21:54
@ryujit-bot

Diff results for #97792

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,238,105 contexts (829,328 MinOpts, 1,408,777 FullOpts).

MISSED contexts: base: 71,274 (3.08%), diff: 72,559 (3.14%)

Overall (-122,174 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,250,208 -1,344
benchmarks.run_pgo.linux.arm.checked.mch 63,745,630 -18,358
benchmarks.run_tiered.linux.arm.checked.mch 21,504,686 -1,306
coreclr_tests.run.linux.arm.checked.mch 321,631,208 -10,726
libraries.crossgen2.linux.arm.checked.mch 34,521,638 -22
libraries.pmi.linux.arm.checked.mch 49,769,404 -7,214
libraries_tests.run.linux.arm.Release.mch 243,597,272 -52,352
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,156,910 -29,380
realworld.run.linux.arm.checked.mch 13,589,492 -1,472
MinOpts (+532 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 11,199,966 -76
benchmarks.run_tiered.linux.arm.checked.mch 8,653,000 -26
coreclr_tests.run.linux.arm.checked.mch 212,477,588 +414
libraries_tests.run.linux.arm.Release.mch 120,969,132 +220
FullOpts (-122,706 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,861,006 -1,344
benchmarks.run_pgo.linux.arm.checked.mch 52,545,664 -18,282
benchmarks.run_tiered.linux.arm.checked.mch 12,851,686 -1,280
coreclr_tests.run.linux.arm.checked.mch 109,153,620 -11,140
libraries.crossgen2.linux.arm.checked.mch 34,520,408 -22
libraries.pmi.linux.arm.checked.mch 49,663,180 -7,214
libraries_tests.run.linux.arm.Release.mch 122,628,140 -52,572
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,086,174 -29,380
realworld.run.linux.arm.checked.mch 13,154,192 -1,472

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.41% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.08%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.21%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.41%
realworld.run.linux.arm.checked.mch -0.09%
MinOpts (-0.10% to -0.06%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.08%
benchmarks.run_pgo.linux.arm.checked.mch -0.10%
benchmarks.run_tiered.linux.arm.checked.mch -0.10%
coreclr_tests.run.linux.arm.checked.mch -0.06%
libraries.crossgen2.linux.arm.checked.mch -0.06%
libraries.pmi.linux.arm.checked.mch -0.07%
libraries_tests.run.linux.arm.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.08%
realworld.run.linux.arm.checked.mch -0.10%
FullOpts (-0.43% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.10%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.43%
realworld.run.linux.arm.checked.mch -0.09%

Details here


@ryujit-bot

Diff results for #97792

Throughput diffs

Throughput diffs for windows/arm64 ran on linux/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Details here


@ryujit-bot

Diff results for #97792

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,238,105 contexts (829,328 MinOpts, 1,408,777 FullOpts).

MISSED contexts: base: 71,274 (3.08%), diff: 72,559 (3.14%)

Overall (-141,104 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,250,208 -1,954
benchmarks.run_pgo.linux.arm.checked.mch 63,745,630 -20,060
benchmarks.run_tiered.linux.arm.checked.mch 21,504,686 -1,766
coreclr_tests.run.linux.arm.checked.mch 321,631,208 -13,896
libraries.crossgen2.linux.arm.checked.mch 34,521,638 -462
libraries.pmi.linux.arm.checked.mch 49,769,404 -8,650
libraries_tests.run.linux.arm.Release.mch 243,597,272 -61,452
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,156,910 -31,006
realworld.run.linux.arm.checked.mch 13,589,492 -1,858
MinOpts (+532 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 11,199,966 -76
benchmarks.run_tiered.linux.arm.checked.mch 8,653,000 -26
coreclr_tests.run.linux.arm.checked.mch 212,477,588 +414
libraries_tests.run.linux.arm.Release.mch 120,969,132 +220
FullOpts (-141,636 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,861,006 -1,954
benchmarks.run_pgo.linux.arm.checked.mch 52,545,664 -19,984
benchmarks.run_tiered.linux.arm.checked.mch 12,851,686 -1,740
coreclr_tests.run.linux.arm.checked.mch 109,153,620 -14,310
libraries.crossgen2.linux.arm.checked.mch 34,520,408 -462
libraries.pmi.linux.arm.checked.mch 49,663,180 -8,650
libraries_tests.run.linux.arm.Release.mch 122,628,140 -61,672
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,086,174 -31,006
realworld.run.linux.arm.checked.mch 13,154,192 -1,858

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.41% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.08%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.21%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.41%
realworld.run.linux.arm.checked.mch -0.10%
MinOpts (-0.10% to -0.06%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.08%
benchmarks.run_pgo.linux.arm.checked.mch -0.10%
benchmarks.run_tiered.linux.arm.checked.mch -0.10%
coreclr_tests.run.linux.arm.checked.mch -0.06%
libraries.crossgen2.linux.arm.checked.mch -0.06%
libraries.pmi.linux.arm.checked.mch -0.07%
libraries_tests.run.linux.arm.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.08%
realworld.run.linux.arm.checked.mch -0.10%
FullOpts (-0.42% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.10%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.42%
realworld.run.linux.arm.checked.mch -0.10%

Details here


@filipnavara
Member

filipnavara commented Feb 1, 2024

Just a heads up. The tests fail on device. I am looking into it with Michal to see if we can find the root cause.

@filipnavara
Member

filipnavara commented Feb 1, 2024

1d24a02 fixed most of the failures. There's still an incorrect sign extension of the comparand, where a zero extension is needed, for the 16-bit variant.
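
To illustrate why the kind of extension matters: a halfword load such as `ldrexh` zero-extends into a 32-bit register, so a register-width compare against a sign-extended comparand fails for any 16-bit value with the top bit set. A minimal sketch that models the register contents in C++ (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit register after widening a 16-bit value with zero extension.
constexpr uint32_t zeroExtend16(uint16_t v)
{
    return v;
}

// Model of a 32-bit register after widening a 16-bit value with sign extension:
// bit 15 is replicated into the upper 16 bits.
constexpr uint32_t signExtend16(uint16_t v)
{
    return (v & 0x8000u) ? (0xFFFF0000u | v) : static_cast<uint32_t>(v);
}

// A `cmp` on two registers compares all 32 bits.
constexpr bool registersEqual(uint32_t a, uint32_t b)
{
    return a == b;
}
```

With the same 16-bit pattern `0x8000` on both sides, `registersEqual(zeroExtend16(0x8000), signExtend16(0x8000))` is false, so the exchange would never happen; both operands have to be widened the same way.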

@ryujit-bot

Diff results for #97792

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.osx.arm64.checked.mch +0.01%

Details here


@ryujit-bot

Diff results for #97792

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,237,693 contexts (829,328 MinOpts, 1,408,365 FullOpts).

MISSED contexts: base: 71,275 (3.08%), diff: 72,560 (3.14%)

Overall (-71,018 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,244,634 -738
benchmarks.run_pgo.linux.arm.checked.mch 63,699,020 -14,426
benchmarks.run_tiered.linux.arm.checked.mch 21,499,760 -436
coreclr_tests.run.linux.arm.checked.mch 321,580,944 -6,798
libraries.crossgen2.linux.arm.checked.mch 34,518,950 +402
libraries.pmi.linux.arm.checked.mch 49,747,738 -2,996
libraries_tests.run.linux.arm.Release.mch 243,491,778 -42,144
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,144,284 -3,184
realworld.run.linux.arm.checked.mch 13,587,278 -698
MinOpts (+2,774 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 11,199,966 +400
benchmarks.run_tiered.linux.arm.checked.mch 8,653,000 +252
coreclr_tests.run.linux.arm.checked.mch 212,477,588 +782
libraries_tests.run.linux.arm.Release.mch 120,969,132 +1,340
FullOpts (-73,792 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,855,432 -738
benchmarks.run_pgo.linux.arm.checked.mch 52,499,054 -14,826
benchmarks.run_tiered.linux.arm.checked.mch 12,846,760 -688
coreclr_tests.run.linux.arm.checked.mch 109,103,356 -7,580
libraries.crossgen2.linux.arm.checked.mch 34,517,720 +402
libraries.pmi.linux.arm.checked.mch 49,641,514 -2,996
libraries_tests.run.linux.arm.Release.mch 122,522,646 -43,484
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,073,548 -3,184
realworld.run.linux.arm.checked.mch 13,151,978 -698

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.41% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.11%
coreclr_tests.run.linux.arm.checked.mch -0.08%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.22%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.41%
realworld.run.linux.arm.checked.mch -0.09%
MinOpts (-0.10% to -0.06%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.08%
benchmarks.run_pgo.linux.arm.checked.mch -0.10%
benchmarks.run_tiered.linux.arm.checked.mch -0.10%
coreclr_tests.run.linux.arm.checked.mch -0.06%
libraries.crossgen2.linux.arm.checked.mch -0.06%
libraries.pmi.linux.arm.checked.mch -0.07%
libraries_tests.run.linux.arm.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.08%
realworld.run.linux.arm.checked.mch -0.10%
FullOpts (-0.42% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.10%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.42%
realworld.run.linux.arm.checked.mch -0.09%

Details here


@ryujit-bot

Diff results for #97792

Throughput diffs

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


Throughput diffs for linux/arm ran on windows/x86

Overall (-0.41% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.08%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.22%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.41%
realworld.run.linux.arm.checked.mch -0.09%
MinOpts (-0.10% to -0.06%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.08%
benchmarks.run_pgo.linux.arm.checked.mch -0.10%
benchmarks.run_tiered.linux.arm.checked.mch -0.10%
coreclr_tests.run.linux.arm.checked.mch -0.06%
libraries.crossgen2.linux.arm.checked.mch -0.06%
libraries.pmi.linux.arm.checked.mch -0.07%
libraries_tests.run.linux.arm.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.08%
realworld.run.linux.arm.checked.mch -0.10%
FullOpts (-0.42% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.10%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.42%
realworld.run.linux.arm.checked.mch -0.09%

Details here


@ryujit-bot

Diff results for #97792

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,238,104 contexts (829,328 MinOpts, 1,408,776 FullOpts).

MISSED contexts: base: 71,275 (3.08%), diff: 72,560 (3.14%)

Overall (-70,778 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,250,208 -792
benchmarks.run_pgo.linux.arm.checked.mch 63,745,626 -15,392
benchmarks.run_tiered.linux.arm.checked.mch 21,504,686 -512
coreclr_tests.run.linux.arm.checked.mch 321,630,642 -7,306
libraries.crossgen2.linux.arm.checked.mch 34,521,638 +546
libraries.pmi.linux.arm.checked.mch 49,769,404 -2,946
libraries_tests.run.linux.arm.Release.mch 243,597,156 -40,580
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,156,910 -3,088
realworld.run.linux.arm.checked.mch 13,589,492 -708
MinOpts (+2,774 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 11,199,966 +400
benchmarks.run_tiered.linux.arm.checked.mch 8,653,000 +252
coreclr_tests.run.linux.arm.checked.mch 212,477,588 +782
libraries_tests.run.linux.arm.Release.mch 120,969,132 +1,340
FullOpts (-73,552 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,861,006 -792
benchmarks.run_pgo.linux.arm.checked.mch 52,545,660 -15,792
benchmarks.run_tiered.linux.arm.checked.mch 12,851,686 -764
coreclr_tests.run.linux.arm.checked.mch 109,153,054 -8,088
libraries.crossgen2.linux.arm.checked.mch 34,520,408 +546
libraries.pmi.linux.arm.checked.mch 49,663,180 -2,946
libraries_tests.run.linux.arm.Release.mch 122,628,024 -41,920
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,086,174 -3,088
realworld.run.linux.arm.checked.mch 13,154,192 -708

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.41% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.08%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.22%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.41%
realworld.run.linux.arm.checked.mch -0.09%
MinOpts (-0.10% to -0.06%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.08%
benchmarks.run_pgo.linux.arm.checked.mch -0.10%
benchmarks.run_tiered.linux.arm.checked.mch -0.10%
coreclr_tests.run.linux.arm.checked.mch -0.06%
libraries.crossgen2.linux.arm.checked.mch -0.06%
libraries.pmi.linux.arm.checked.mch -0.07%
libraries_tests.run.linux.arm.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.08%
realworld.run.linux.arm.checked.mch -0.10%
FullOpts (-0.42% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.13%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.12%
coreclr_tests.run.linux.arm.checked.mch -0.10%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.42%
realworld.run.linux.arm.checked.mch -0.09%

Details here


@filipnavara
Member

Passes the tests on Raspberry Pi 5 now.

Member

@filipnavara filipnavara left a comment

There's an unnecessary sign extension of the value parameter in CompareExchange (short):

0004e2c0 <System.Threading.Interlocked__CompareExchange_0>:
   4e2c0:       b508            push    {r3, lr}
   4e2c2:       b20b            sxth    r3, r1
   4e2c4:       b212            sxth    r2, r2
   4e2c6:       f3bf 8f5f       dmb     sy
   4e2ca:       e8d0 ef5f       ldrexh  lr, [r0]
   4e2ce:       fa0f fe8e       sxth.w  lr, lr
   4e2d2:       4596            cmp     lr, r2
   4e2d4:       d103            bne.n   4e2de <System.Threading.Interlocked__CompareExchange_0+0x1e>
   4e2d6:       e8c0 3f51       strexh  r1, r3, [r0]
   4e2da:       2900            cmp     r1, #0
   4e2dc:       d1f5            bne.n   4e2ca <System.Threading.Interlocked__CompareExchange_0+0xa>
   4e2de:       f3bf 8f5f       dmb     sy
   4e2e2:       4670            mov     r0, lr
   4e2e4:       bd08            pop     {r3, pc}

It's not really a correctness issue though. Otherwise, LGTM. Thanks!
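
The shape of the listing above (barrier, exclusive load, compare, exclusive store with retry, barrier) corresponds roughly to the following C++ sketch. It uses `compare_exchange_weak`, which may fail spuriously much like `strexh`, as a stand-in for the `ldrexh`/`strexh` pair. The function names are hypothetical, and the sequentially consistent ordering is an assumption chosen to match the two `dmb sy` barriers.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of CompareExchange(short): returns the value observed
// in `location`, and stores `value` only if that value equals `comparand`.
inline int16_t compare_exchange_short(std::atomic<int16_t>& location, int16_t value, int16_t comparand)
{
    int16_t observed = comparand;
    // Retry while the failure was spurious (the location still held `comparand`),
    // which plays the role of the branch back to the exclusive load.
    while (!location.compare_exchange_weak(observed, value, std::memory_order_seq_cst))
    {
        if (observed != comparand)
        {
            break; // genuine mismatch: leave without storing
        }
    }
    return observed;
}

// A halfword store writes only the low 16 bits of the source register, so the
// upper bits produced by an earlier sign extension never reach memory.
constexpr uint16_t storedHalfword(uint32_t reg)
{
    return static_cast<uint16_t>(reg & 0xFFFFu);
}
```

`storedHalfword(0xFFFF8000u)` and `storedHalfword(0x00008000u)` are equal, which is why the extra `sxth` on the value costs an instruction but does not change the result.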

@ryujit-bot

Diff results for #97792

Throughput diffs

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
realworld.run.windows.arm64.checked.mch +0.01%

Details here


@ryujit-bot

Diff results for #97792

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,268,941 contexts (836,977 MinOpts, 1,431,964 FullOpts).

MISSED contexts: base: 75,116 (3.20%), diff: 76,443 (3.26%)

Overall (-71,712 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 16,212,460 -822
benchmarks.run_pgo.linux.arm.checked.mch 70,700,700 -15,306
benchmarks.run_tiered.linux.arm.checked.mch 20,254,108 -534
coreclr_tests.run.linux.arm.checked.mch 325,222,616 -6,656
libraries.crossgen2.linux.arm.checked.mch 37,766,632 +510
libraries.pmi.linux.arm.checked.mch 50,459,272 -3,008
libraries_tests.run.linux.arm.Release.mch 238,899,034 -41,870
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,598,228 -3,338
realworld.run.linux.arm.checked.mch 13,497,114 -688
MinOpts (+2,798 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 12,646,282 +400
benchmarks.run_tiered.linux.arm.checked.mch 8,137,304 +252
coreclr_tests.run.linux.arm.checked.mch 212,484,140 +784
libraries_tests.run.linux.arm.Release.mch 121,970,640 +1,362
FullOpts (-74,510 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,774,288 -822
benchmarks.run_pgo.linux.arm.checked.mch 58,054,418 -15,706
benchmarks.run_tiered.linux.arm.checked.mch 12,116,804 -786
coreclr_tests.run.linux.arm.checked.mch 112,738,476 -7,440
libraries.crossgen2.linux.arm.checked.mch 37,765,402 +510
libraries.pmi.linux.arm.checked.mch 50,353,048 -3,008
libraries_tests.run.linux.arm.Release.mch 116,928,394 -43,232
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,568,466 -3,338
realworld.run.linux.arm.checked.mch 13,063,346 -688

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.42% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.14%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.13%
coreclr_tests.run.linux.arm.checked.mch -0.08%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.23%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.42%
realworld.run.linux.arm.checked.mch -0.10%
MinOpts (-0.10% to -0.06%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.08%
benchmarks.run_pgo.linux.arm.checked.mch -0.10%
benchmarks.run_tiered.linux.arm.checked.mch -0.10%
coreclr_tests.run.linux.arm.checked.mch -0.06%
libraries.crossgen2.linux.arm.checked.mch -0.06%
libraries.pmi.linux.arm.checked.mch -0.07%
libraries_tests.run.linux.arm.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.08%
realworld.run.linux.arm.checked.mch -0.10%
FullOpts (-0.44% to -0.04%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.14%
benchmarks.run_pgo.linux.arm.checked.mch -0.13%
benchmarks.run_tiered.linux.arm.checked.mch -0.13%
coreclr_tests.run.linux.arm.checked.mch -0.10%
libraries.crossgen2.linux.arm.checked.mch -0.04%
libraries.pmi.linux.arm.checked.mch -0.14%
libraries_tests.run.linux.arm.Release.mch -0.27%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch -0.44%
realworld.run.linux.arm.checked.mch -0.10%

Details here


@MichalPetryka
Contributor Author

This should be ready to merge now that #97902 has been merged.

@filipnavara
Member

I retested the latest version on Raspberry Pi. Interlocked_ro and Interlocked_r tests still pass.

@BruceForstall
Member

@dotnet/jit-contrib

@MichalPetryka
Contributor Author

I'd like to hold off on merging this until #99011 and #99130 land, so that ARM32 SPMI can show diffs for them: ARM32 is the only platform SPMI runs on that uses the paths changed there. This PR will also have small merge conflicts with the first of those PRs. It should be ready for review, however.

@JulieLeeMSFT
Member

Related to #99011.
@kunalspathak, PTAL.

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Apr 15, 2024
@JulieLeeMSFT
Member

Ping @kunalspathak again for a code review.

@@ -82,20 +82,29 @@ bool Lowering::IsContainableImmed(GenTree* parentNode, GenTree* childNode) const
{
case GT_ADD:
case GT_SUB:
#ifdef TARGET_ARM64
#ifdef TARGET_ARM
Member

For better readability, I would suggest the following:

#ifdef TARGET_ARM
            case GT_CMPXCHG:
                return emitter::emitIns_valid_imm_for_cmp(immVal, flags);
            case GT_XADD:
                return emitter::emitIns_valid_imm_for_add(immVal, flags);
#else
            case GT_CMPXCHG:
            case GT_XADD:
            case GT_LOCKADD:
            case GT_XORR:
            case GT_XAND:
                return comp->compOpportunisticallyDependsOn(InstructionSet_Atomics)
                           ? false
                           : emitter::emitIns_valid_imm_for_add(immVal, size);
#endif // TARGET_ARM
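
As background for the suggestion above: an immediate is "containable" when the instruction can encode it directly, so the operand does not need a register of its own. A toy sketch of that idea follows. The encodability rule used here (an unsigned 8-bit immediate) is an assumption made purely for illustration, not the rule implemented by `emitIns_valid_imm_for_add` or `emitIns_valid_imm_for_cmp`, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy operand: either a constant or a value that already lives in a register.
struct Operand
{
    bool    isConstant;
    int32_t value; // only meaningful when isConstant is true
};

// ASSUMPTION: the toy instruction accepts an unsigned 8-bit immediate only.
// Real encodings are more involved and differ per instruction.
constexpr bool fitsToyImmediate(int32_t imm)
{
    return imm >= 0 && imm <= 255;
}

// A constant is containable when the instruction can encode it directly.
constexpr bool isContainable(const Operand& op)
{
    return op.isConstant && fitsToyImmediate(op.value);
}

// Source registers needed by a node with an address and a data operand:
// the address always needs one, the data only when it is not contained.
constexpr int sourceCount(const Operand& data)
{
    return isContainable(data) ? 1 : 2;
}
```

For example, `sourceCount(Operand{true, 1})` is 1, while a constant that does not fit, or a non-constant operand, gives 2.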

srcCount = tree->gtGetOp2()->isContained() ? 1 : 2;

buildInternalIntRegisterDefForNode(tree);
if (tree->OperGet() != GT_XCHG)
Member

Let's make the check explicit, just to make sure we do not define the internal register for nodes that don't need it in the future.

Suggested change
- if (tree->OperGet() != GT_XCHG)
+ if (tree->OperGet() == GT_XADD)
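
The difference between the two forms only shows up once a third node kind reaches this code. A small sketch with a hypothetical enum (the names echo the JIT's `GT_*` opers but are illustrative only; whether `GT_CMPXCHG` actually reaches this path is not stated in the thread, so it stands in here for any oper added later):

```cpp
#include <cassert>

// Hypothetical node kinds for illustration.
enum class Oper
{
    Xchg,
    Xadd,
    Cmpxchg // stand-in for an oper that is routed through this code later
};

// Implicit form: every oper other than Xchg gets an internal register.
constexpr bool needsInternalRegImplicit(Oper op)
{
    return op != Oper::Xchg;
}

// Explicit form: only Xadd gets an internal register.
constexpr bool needsInternalRegExplicit(Oper op)
{
    return op == Oper::Xadd;
}
```

Both predicates agree for `Xchg` and `Xadd`, but the implicit form silently reserves a register for `Cmpxchg`, while the explicit form forces whoever adds a new oper to opt in.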


emitAttr dataSize = emitActualTypeSize(data);

regNumber tempReg = treeNode->ExtractTempReg(RBM_ALLINT);
Member

We now use a new data structure to extract the temp register (see #101647). Please update this accordingly.


gcInfo.gcMarkRegPtrVal(addrReg, addr->TypeGet());

// Emit code like this:
Member

I know this is done for other platforms as well, but I'm not sure why this cannot be done in lowering. Codegen should just emit the code instead of inserting loop-like code here.

// Arguments:
// treeNode - the GT_XADD/XCHG node
//
void CodeGen::genLockedInstructions(GenTreeOp* treeNode)
Member

The genLocked* methods added here look similar to the ones in arm64. Can the logic be shared, with just a single method in codegenarmarch.cpp?

Labels
arch-arm32 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RyuJIT/arm32] CQ: enable InterlockedCmpXchg32 intrinsic
8 participants