New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable RyuJIT Backend for ARM32 #8496

Closed
YongseopKim opened this Issue Dec 7, 2016 · 73 comments

Comments

@YongseopKim
Contributor

YongseopKim commented Dec 7, 2016

We( @YongseopKim; @chunseoklee ; @hseok-oh ; @hqueue ) are going to work for enabling RyuJIT backend for arm32.

Any information that we should know? (like that somebodies are working some feature of RyuJIT on arm32)

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas
Member

jkotas commented Dec 7, 2016

@RussKeldorph

This comment has been minimized.

Show comment
Hide comment
@RussKeldorph

RussKeldorph Dec 8, 2016

Member

Awesome! It should be much easier to do it now that x86 paved the 32-bit path. I'm not aware of anyone else working on RyuJIT/ARM32. dotnet/jit-contrib have certainly talked about it a great deal, so expect feedback and feel free to ask questions.

Member

RussKeldorph commented Dec 8, 2016

Awesome! It should be much easier to do it now that x86 paved the 32-bit path. I'm not aware of anyone else working on RyuJIT/ARM32. dotnet/jit-contrib have certainly talked about it a great deal, so expect feedback and feel free to ask questions.

@mskvortsov

This comment has been minimized.

Show comment
Hide comment
@mskvortsov

mskvortsov Dec 26, 2016

Contributor

@hqueue I'm working on CodeGen::genSetRegToCond right now.
And I've put my head into the !isRegPairType(tree->TypeGet()) issue already.

(BTW, broadly speaking, there are still quite a few overlapping areas in the bring up process, so at this stage we probably should not try too hard not to repeat each other's efforts).

Contributor

mskvortsov commented Dec 26, 2016

@hqueue I'm working on CodeGen::genSetRegToCond right now.
And I've put my head into the !isRegPairType(tree->TypeGet()) issue already.

(BTW, broadly speaking, there are still quite a few overlapping areas in the bring up process, so at this stage we probably should not try too hard not to repeat each other's efforts).

@hqueue

This comment has been minimized.

Show comment
Hide comment
@hqueue

hqueue Dec 27, 2016

Member

@mskvortsov Cool. For !isRegPairType(tree->TypeGet()) and assignedInterval != nullptr , I think they are all related to types which requires two registers , e.g. long and double type in all phases including LSRA.
BTW I will look into other assertions.

Member

hqueue commented Dec 27, 2016

@mskvortsov Cool. For !isRegPairType(tree->TypeGet()) and assignedInterval != nullptr , I think they are all related to types which requires two registers , e.g. long and double type in all phases including LSRA.
BTW I will look into other assertions.

@hqueue

This comment has been minimized.

Show comment
Hide comment
@hqueue

hqueue Dec 27, 2016

Member

The description is deprecated by Jan. 18, 2017. You can find test result from #8724.


Summarize current status with PR #8710 and #8724 (cited from #8724)

Status

Test result from 139 tests from JIT/CodeGenBringUpTests with COMPlus_JitDisasm=BringUpTest:* COMPlus_AltJit=BringUpTest:*

with PR #8710

=======================
Test Results
=======================
# CoreCLR Bin Dir :
# Tests Discovered : 139
# Passed : 61
# Failed : 78
# Skipped : 0
=======================

All 78 failures are Assert failure as follows.

  • 54 of 'NYI: float argument' (gone)
  • 7 of 'NYI: genSetRegToCond'
  • 6 of 'NYI: Lowering of long register argument'
  • 4 of 'NYI_ARM: TreeNodeInfoInit default case'
  • 2 of 'NYI_ARM: ARM IsContainableImmed for floating point type' (gone)
  • 2 of 'NYI_ARM: fgMorphMultiregStructArgs'
  • 1 of 'NYI: Cast'
  • 1 of 'NYI: ARM genCallFinally'
  • 1 of '!isRegPairType(tree->TypeGet())'

with PR #8710 and this PR #8724 (15 more tests are PASSED.)

=======================
Test Results
=======================
# CoreCLR Bin Dir : 
# Tests Discovered : 139
# Passed : 76
# Failed : 63
# Skipped : 0
=======================

All 63 failures are Assert failure as follows.

  • 29 of 'assignedInterval != nullptr' (NEW)
  • 7 of 'NYI: GT_DIV' (NEW)
  • 2 of '!"Unexpected instruction"' (NEW)
  • 2 of 'NYI_ARM: Lowering for cast from float' (NEW)
  • 1 of 'NYI_ARM: GT_LEA: index and cns are not nil' (NEW)
  • 7 of 'NYI: genSetRegToCond' (no change)
  • 6 of 'NYI: Lowering of long register argument' (no change)
  • 4 of 'NYI_ARM: TreeNodeInfoInit default case' (no change)
  • 2 of 'NYI_ARM: fgMorphMultiregStructArgs' (no change)
  • 1 of 'NYI: Cast' (no change)
  • 1 of 'NYI: ARM genCallFinally' (no change)
  • 1 of '!isRegPairType(tree->TypeGet())' (no change)
Member

hqueue commented Dec 27, 2016

The description is deprecated by Jan. 18, 2017. You can find test result from #8724.


Summarize current status with PR #8710 and #8724 (cited from #8724)

Status

Test result from 139 tests from JIT/CodeGenBringUpTests with COMPlus_JitDisasm=BringUpTest:* COMPlus_AltJit=BringUpTest:*

with PR #8710

=======================
Test Results
=======================
# CoreCLR Bin Dir :
# Tests Discovered : 139
# Passed : 61
# Failed : 78
# Skipped : 0
=======================

All 78 failures are Assert failure as follows.

  • 54 of 'NYI: float argument' (gone)
  • 7 of 'NYI: genSetRegToCond'
  • 6 of 'NYI: Lowering of long register argument'
  • 4 of 'NYI_ARM: TreeNodeInfoInit default case'
  • 2 of 'NYI_ARM: ARM IsContainableImmed for floating point type' (gone)
  • 2 of 'NYI_ARM: fgMorphMultiregStructArgs'
  • 1 of 'NYI: Cast'
  • 1 of 'NYI: ARM genCallFinally'
  • 1 of '!isRegPairType(tree->TypeGet())'

with PR #8710 and this PR #8724 (15 more tests are PASSED.)

=======================
Test Results
=======================
# CoreCLR Bin Dir : 
# Tests Discovered : 139
# Passed : 76
# Failed : 63
# Skipped : 0
=======================

All 63 failures are Assert failure as follows.

  • 29 of 'assignedInterval != nullptr' (NEW)
  • 7 of 'NYI: GT_DIV' (NEW)
  • 2 of '!"Unexpected instruction"' (NEW)
  • 2 of 'NYI_ARM: Lowering for cast from float' (NEW)
  • 1 of 'NYI_ARM: GT_LEA: index and cns are not nil' (NEW)
  • 7 of 'NYI: genSetRegToCond' (no change)
  • 6 of 'NYI: Lowering of long register argument' (no change)
  • 4 of 'NYI_ARM: TreeNodeInfoInit default case' (no change)
  • 2 of 'NYI_ARM: fgMorphMultiregStructArgs' (no change)
  • 1 of 'NYI: Cast' (no change)
  • 1 of 'NYI: ARM genCallFinally' (no change)
  • 1 of '!isRegPairType(tree->TypeGet())' (no change)
@mskvortsov

This comment has been minimized.

Show comment
Hide comment
@mskvortsov

mskvortsov Dec 27, 2016

Contributor

@hqueue #8731 (on top of your FP work) makes 6 more tests passed.

Contributor

mskvortsov commented Dec 27, 2016

@hqueue #8731 (on top of your FP work) makes 6 more tests passed.

@hqueue

This comment has been minimized.

Show comment
Hide comment
@hqueue

hqueue Dec 27, 2016

Member

@mskvortsov Cool! Now I'm not working on assertions right now, I think you can continue as you wish :) I will run tests tomorrow again.

Member

hqueue commented Dec 27, 2016

@mskvortsov Cool! Now I'm not working on assertions right now, I think you can continue as you wish :) I will run tests tomorrow again.

@hqueue

This comment has been minimized.

Show comment
Hide comment
@hqueue

hqueue Aug 25, 2017

Member

With 8e2d570, arm Release build passed all Pri2 tests, although there are assertions observed in Checked build. And remaining assertions will be addressed by opened issues(#13453, #13565) and PR(#13579).

Tested on Raspberry Pi2.
CoreCLR (Release): 8e2d570
CoreFX (Release): 06d361c
coreclr tests: built for x86 (Release build) (Pri2)

=======================
# CoreCLR Bin Dir  : 
# Tests Discovered : 11387
# Passed           : 10574
# Failed           : 0
# Skipped          : 813
=======================
Member

hqueue commented Aug 25, 2017

With 8e2d570, arm Release build passed all Pri2 tests, although there are assertions observed in Checked build. And remaining assertions will be addressed by opened issues(#13453, #13565) and PR(#13579).

Tested on Raspberry Pi2.
CoreCLR (Release): 8e2d570
CoreFX (Release): 06d361c
coreclr tests: built for x86 (Release build) (Pri2)

=======================
# CoreCLR Bin Dir  : 
# Tests Discovered : 11387
# Passed           : 10574
# Failed           : 0
# Skipped          : 813
=======================
@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Sep 19, 2017

Contributor

The coreclr CI system now supports testing RyuJIT/arm32 on Windows, as well as arm32 LEGACY_BACKEND on Windows. In the CI, RyuJIT/arm32 is now named "arm", and LEGACY_BACKEND arm32 JIT is named "armlb". There is one more job triggered with every change (for people on the ARM64 "white list"): "Windows_NT arm Cross Checked Build and Test".

With #14055, we can also trigger a number of JitStress/JitStressRegs/TailcallStress/MinOpts test runs for arm32 as well. See #14072 for an example. They haven't all been run yet, so they are probably not error-free. The tests\arm\Tests.lst file should be updated over time to ensure all the runs are clean, and failing tests marked with EXPECTED_FAIL and a tracking GitHub issue number. (These tests can only be triggered by the ARM64 "white list" of approved Microsoft developers.)

Contributor

BruceForstall commented Sep 19, 2017

The coreclr CI system now supports testing RyuJIT/arm32 on Windows, as well as arm32 LEGACY_BACKEND on Windows. In the CI, RyuJIT/arm32 is now named "arm", and LEGACY_BACKEND arm32 JIT is named "armlb". There is one more job triggered with every change (for people on the ARM64 "white list"): "Windows_NT arm Cross Checked Build and Test".

With #14055, we can also trigger a number of JitStress/JitStressRegs/TailcallStress/MinOpts test runs for arm32 as well. See #14072 for an example. They haven't all been run yet, so they are probably not error-free. The tests\arm\Tests.lst file should be updated over time to ensure all the runs are clean, and failing tests marked with EXPECTED_FAIL and a tracking GitHub issue number. (These tests can only be triggered by the ARM64 "white list" of approved Microsoft developers.)

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Sep 26, 2017

Contributor

The coreclr CI system now supports testing RyuJIT/arm32 on Windows via the altjit mechanism (compiling for ARM on x86 using protononjit.dll). Because it is cross-compilation, the generated ARM code doesn't run; it is thrown away (but the x86 generated code does run). This kind of testing can be useful for doing lots of stress mode compilations, looking for asserts. (Note that I added equivalent test modes for x64-hosted, arm64-targeting testing). One advantage is these modes can be triggered by anyone, not just those on the ARM64 hardware "white list", since they use standard x86/x64 VMs. Also, we have lots of x86/x64 hardware (or VMs), compared to the currently limited number and slow speed of ARM hardware, so these are faster, more reliable, and don't spend so much time waiting, so they shouldn't time out, for instance.

You can see an example of these runs here: #14182.

Note, however, that the runs currently are not clean: there are some failures in every run. I've started adding such asserts and issues to GitHub, summarized in the RyuJIT/arm32 project page: https://github.com/dotnet/coreclr/projects/4.

Contributor

BruceForstall commented Sep 26, 2017

The coreclr CI system now supports testing RyuJIT/arm32 on Windows via the altjit mechanism (compiling for ARM on x86 using protononjit.dll). Because it is cross-compilation, the generated ARM code doesn't run; it is thrown away (but the x86 generated code does run). This kind of testing can be useful for doing lots of stress mode compilations, looking for asserts. (Note that I added equivalent test modes for x64-hosted, arm64-targeting testing). One advantage is these modes can be triggered by anyone, not just those on the ARM64 hardware "white list", since they use standard x86/x64 VMs. Also, we have lots of x86/x64 hardware (or VMs), compared to the currently limited number and slow speed of ARM hardware, so these are faster, more reliable, and don't spend so much time waiting, so they shouldn't time out, for instance.

You can see an example of these runs here: #14182.

Note, however, that the runs currently are not clean: there are some failures in every run. I've started adding such asserts and issues to GitHub, summarized in the RyuJIT/arm32 project page: https://github.com/dotnet/coreclr/projects/4.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Oct 7, 2017

Contributor

I've opened GitHub issues corresponding to every assert found in the following Windows RyuJIT/arm32 altjit jobs:

Windows_NT x86_arm_altjit Checked forcerelocs
Windows_NT x86_arm_altjit Checked jitsse2only
Windows_NT x86_arm_altjit Checked jitstress1
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs0x1000
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs0x10
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs0x80
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs1
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs2
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs3
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs4
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs8
Windows_NT x86_arm_altjit Checked jitstress2
Windows_NT x86_arm_altjit Checked jitstressregs0x1000
Windows_NT x86_arm_altjit Checked jitstressregs0x10
Windows_NT x86_arm_altjit Checked jitstressregs0x80
Windows_NT x86_arm_altjit Checked jitstressregs1
Windows_NT x86_arm_altjit Checked jitstressregs2
Windows_NT x86_arm_altjit Checked jitstressregs3
Windows_NT x86_arm_altjit Checked jitstressregs4
Windows_NT x86_arm_altjit Checked jitstressregs8
Windows_NT x86_arm_altjit Checked minopts
Windows_NT x86_arm_altjit Checked tailcallstress
Windows_NT x86_arm_altjit Checked
Windows_NT x86_arm_altjit Debug
Windows_NT x86_arm_altjit Release

The "corefx" altjit jobs don't work yet.

You can find the issues in project page here or by looking at the arm32 CodeGen issues here

Contributor

BruceForstall commented Oct 7, 2017

I've opened GitHub issues corresponding to every assert found in the following Windows RyuJIT/arm32 altjit jobs:

Windows_NT x86_arm_altjit Checked forcerelocs
Windows_NT x86_arm_altjit Checked jitsse2only
Windows_NT x86_arm_altjit Checked jitstress1
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs0x1000
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs0x10
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs0x80
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs1
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs2
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs3
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs4
Windows_NT x86_arm_altjit Checked jitstress2_jitstressregs8
Windows_NT x86_arm_altjit Checked jitstress2
Windows_NT x86_arm_altjit Checked jitstressregs0x1000
Windows_NT x86_arm_altjit Checked jitstressregs0x10
Windows_NT x86_arm_altjit Checked jitstressregs0x80
Windows_NT x86_arm_altjit Checked jitstressregs1
Windows_NT x86_arm_altjit Checked jitstressregs2
Windows_NT x86_arm_altjit Checked jitstressregs3
Windows_NT x86_arm_altjit Checked jitstressregs4
Windows_NT x86_arm_altjit Checked jitstressregs8
Windows_NT x86_arm_altjit Checked minopts
Windows_NT x86_arm_altjit Checked tailcallstress
Windows_NT x86_arm_altjit Checked
Windows_NT x86_arm_altjit Debug
Windows_NT x86_arm_altjit Release

The "corefx" altjit jobs don't work yet.

You can find the issues in project page here or by looking at the arm32 CodeGen issues here

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Oct 17, 2017

Contributor

The corefx altjit jobs now work. I've opened 7 new issues found running these tests. The jobs are:

Windows_NT x86_arm_altjit Checked corefx_baseline
Windows_NT x86_arm_altjit Checked corefx_jitstress1
Windows_NT x86_arm_altjit Checked corefx_jitstress2
Windows_NT x86_arm_altjit Checked corefx_jitstressregs0x1000
Windows_NT x86_arm_altjit Checked corefx_jitstressregs0x10
Windows_NT x86_arm_altjit Checked corefx_jitstressregs0x80
Windows_NT x86_arm_altjit Checked corefx_jitstressregs1
Windows_NT x86_arm_altjit Checked corefx_jitstressregs2
Windows_NT x86_arm_altjit Checked corefx_jitstressregs3
Windows_NT x86_arm_altjit Checked corefx_jitstressregs4
Windows_NT x86_arm_altjit Checked corefx_jitstressregs8
Windows_NT x86_arm_altjit Checked corefx_minopts

and can be triggered with any dotnet/coreclr PR. None of the jobs pass right now; besides the asserts, there appear to be some runtime failures that only occur when the altjit is invoked. E.g., you can see all the test runs in this PR: #14334 (choose "View Details" near the bottom.)

Contributor

BruceForstall commented Oct 17, 2017

The corefx altjit jobs now work. I've opened 7 new issues found running these tests. The jobs are:

Windows_NT x86_arm_altjit Checked corefx_baseline
Windows_NT x86_arm_altjit Checked corefx_jitstress1
Windows_NT x86_arm_altjit Checked corefx_jitstress2
Windows_NT x86_arm_altjit Checked corefx_jitstressregs0x1000
Windows_NT x86_arm_altjit Checked corefx_jitstressregs0x10
Windows_NT x86_arm_altjit Checked corefx_jitstressregs0x80
Windows_NT x86_arm_altjit Checked corefx_jitstressregs1
Windows_NT x86_arm_altjit Checked corefx_jitstressregs2
Windows_NT x86_arm_altjit Checked corefx_jitstressregs3
Windows_NT x86_arm_altjit Checked corefx_jitstressregs4
Windows_NT x86_arm_altjit Checked corefx_jitstressregs8
Windows_NT x86_arm_altjit Checked corefx_minopts

and can be triggered with any dotnet/coreclr PR. None of the jobs pass right now; besides the asserts, there appear to be some runtime failures that only occur when the altjit is invoked. E.g., you can see all the test runs in this PR: #14334 (choose "View Details" near the bottom.)

@BruceForstall BruceForstall modified the milestones: Future, 2.1.0 Oct 26, 2017

@hqueue

This comment has been minimized.

Show comment
Hide comment
@hqueue

hqueue Nov 13, 2017

Member

@dotnet/arm32-contrib I'm just curious when are we going to enable RyuJIT as a default compiler for ARM32. Alghouth, it seems that there are several failures in ARM32 in stress mode tests and sometimes there is regressions now, I think it's time to discuss about criteria for enabling RyuJIT, for example pass all CI tests enabled above.
All RyuJIT/ARM32 issues are set to milestone 2.1.0, and I'm just curious about a goal all you have in mind. Please share your thought in mind :)

Member

hqueue commented Nov 13, 2017

@dotnet/arm32-contrib I'm just curious when are we going to enable RyuJIT as a default compiler for ARM32. Alghouth, it seems that there are several failures in ARM32 in stress mode tests and sometimes there is regressions now, I think it's time to discuss about criteria for enabling RyuJIT, for example pass all CI tests enabled above.
All RyuJIT/ARM32 issues are set to milestone 2.1.0, and I'm just curious about a goal all you have in mind. Please share your thought in mind :)

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Nov 13, 2017

Contributor

We've recently been talking about this internally. The answer is "soon". The remaining work is: (1) there are a couple remaining asserts, (2) there are some tests that fail in various stress and non-stress modes -- presumably some because of bad codegen, (3) there are some GC stress issues, (4) we haven't measured throughput or investigated generated code quality compared to legacy backend.

Probably only completing (1) and (2) are all we will wait for to make RyuJIT the default. For GC stress, even legacy backend has several issues. For CQ/TP, we will have to work on those before 2.1 ships.

Contributor

BruceForstall commented Nov 13, 2017

We've recently been talking about this internally. The answer is "soon". The remaining work is: (1) there are a couple remaining asserts, (2) there are some tests that fail in various stress and non-stress modes -- presumably some because of bad codegen, (3) there are some GC stress issues, (4) we haven't measured throughput or investigated generated code quality compared to legacy backend.

Probably only completing (1) and (2) are all we will wait for to make RyuJIT the default. For GC stress, even legacy backend has several issues. For CQ/TP, we will have to work on those before 2.1 ships.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Nov 13, 2017

Contributor

Note: with #14921, the CI will now use RyuJIT/arm32 to crossgen System.Private.CoreLib.dll in "arm" jobs ("armlb" jobs will continue to use legacy backend for everything).

Contributor

BruceForstall commented Nov 13, 2017

Note: with #14921, the CI will now use RyuJIT/arm32 to crossgen System.Private.CoreLib.dll in "arm" jobs ("armlb" jobs will continue to use legacy backend for everything).

@hqueue

This comment has been minimized.

Show comment
Hide comment
@hqueue

hqueue Nov 14, 2017

Member

We've recently been talking about this internally. The answer is "soon".

Glad to here this :)

The remaining work is: (1) there are a couple remaining asserts, (2) there are some tests that fail in various stress and non-stress modes -- presumably some because of bad codegen, (3) there are some GC stress issues,

Thank you again for reporting issues by enabling a lot of test with various stress combination.

(4) we haven't measured throughput or investigated generated code quality compared to legacy backend.

There is an live issue #12971 before to evaluate performance. But we neither have looked into throughput seriously yet.

Probably only completing (1) and (2) are all we will wait for to make RyuJIT the default. For GC stress, even legacy backend has several issues. For CQ/TP, we will have to work on those before 2.1 ships.

Looks good and feasible for near future. And we also think supporting GC without bug is a never-ending task in managed environments with GC, even in mature JVM.

Note: with #14921, the CI will now use RyuJIT/arm32 to crossgen System.Private.CoreLib.dll in "arm" jobs ("armlb" jobs will continue to use legacy backend for everything).

Thanks for the PR!

Member

hqueue commented Nov 14, 2017

We've recently been talking about this internally. The answer is "soon".

Glad to here this :)

The remaining work is: (1) there are a couple remaining asserts, (2) there are some tests that fail in various stress and non-stress modes -- presumably some because of bad codegen, (3) there are some GC stress issues,

Thank you again for reporting issues by enabling a lot of test with various stress combination.

(4) we haven't measured throughput or investigated generated code quality compared to legacy backend.

There is an live issue #12971 before to evaluate performance. But we neither have looked into throughput seriously yet.

Probably only completing (1) and (2) are all we will wait for to make RyuJIT the default. For GC stress, even legacy backend has several issues. For CQ/TP, we will have to work on those before 2.1 ships.

Looks good and feasible for near future. And we also think supporting GC without bug is a never-ending task in managed environments with GC, even in mature JVM.

Note: with #14921, the CI will now use RyuJIT/arm32 to crossgen System.Private.CoreLib.dll in "arm" jobs ("armlb" jobs will continue to use legacy backend for everything).

Thanks for the PR!

@alpencolt

This comment has been minimized.

Show comment
Hide comment
@alpencolt

alpencolt Nov 27, 2017

Member

@BruceForstall I've prepared list of remaining issues for RyuJIT. Could you check it and add more if I've missed something. For making RyuJIT as default compiler we are needed fix (1) and (4), am I right?

  1. Assertions:

    • [RyuJIT/armel] Assertion failed 'argValue->TypeGet() == TYP_STRUCT' #14955
    • Test failure: JIT_Directed.coverage_oldtests_lclflddiv_cs_d_lclflddiv_cs_d/_coverage_oldtests_lclflddiv_cs_d_lclflddiv_cs_d_cmd #14583
    • [ARM] Intermittent segfaults in JIT/Methodical/cctor/misc/threads1_cs_r #12421
    • [ARM32] CompareExchange working wrong with clang3.9 #15074
  2. Unimplemented features (and features in progress)

    • [RyuJIT/ARM32] Support fast tail call optimization #13828
  3. NYI_ARM (23) #13000:

./jit/lowerarmarch.cpp:       NYI_ARM("ARM IsContainableImmed for floating point type");
./jit/lowerarmarch.cpp:       NYI_ARM("initblk loop unrolling is currently not implemented.");
./jit/lowerarmarch.cpp:       NYI_ARM("Lowering for cast from float to small type"); // Not tested yet.
./jit/lsraarm.cpp:            NYI_ARM("LinearScan::TreeNodeInfoInit for GT_INTRINSIC");
./jit/lsraarm.cpp:            NYI_ARM("TreeNodeInfoInit default case");
./jit/codegenarm.cpp:         NYI_ARM("genCodeForInitBlkUnroll");
./jit/codegenarm.cpp:         NYI_ARM("st.lclFld contained operand");
./jit/codegenarm.cpp:         NYI_ARM("st.lclVar contained operand");
./jit/codegenarm.cpp:         NYI_ARM("NOGC_WRITE_BARRIERS");
./jit/lsraarmarch.cpp:        NYI_ARM("NOGC_WRITE_BARRIERS");
./jit/lsraarmarch.cpp:        NYI_ARM("float reg varargs");
./jit/lsraarmarch.cpp:        NYI_ARM("initblk loop unrolling is currently not implemented.");
./jit/codegenarmarch.cpp:     NYI_ARM("genRegCopy from 'int' to 'float'");
./jit/codegenarmarch.cpp:     NYI_ARM("GT_CMPXCHG");
./jit/codegenarmarch.cpp:     NYI_ARM("genIntrinsic for round - not implemented yet");
./jit/codegenarmarch.cpp:     NYI_ARM("genRegCopy from 'int' to 'float'");
./jit/codegenarmarch.cpp:     NYI_ARM("SIMD genStructReturn");
./jit/codegenlegacy.cpp:      NYI_ARM("LOCK instructions");
./jit/lower.cpp:              NYI_ARM("lower: struct argument by fast tail call");
./jit/emitarm.cpp:            NYI_ARM("JitDataOffset static fields");
./jit/emitarm.cpp:            NYI_ARM("Thread-Local-Storage static fields");
./jit/emitarm.cpp:            NYI_ARM("JitDataOffset static fields");
./jit/emitarm.cpp:            NYI_ARM("Thread-Local-Storage static fields");
  1. JitStress:
    https://github.com/dotnet/coreclr/issues?q=is%3Aissue+is%3Aopen+label%3AJitStress+label%3Aarch-arm32
  2. GCStress:
    https://github.com/dotnet/coreclr/issues?q=is%3Aissue+is%3Aopen+label%3Aarch-arm32+label%3AGCStress
  3. Performance:
    • Need to check benchmarks comparing to legacy backend
  4. Others:
    • Blocking issues for profiler APIs on ARM32/64 #14526
Member

alpencolt commented Nov 27, 2017

@BruceForstall I've prepared list of remaining issues for RyuJIT. Could you check it and add more if I've missed something. For making RyuJIT as default compiler we are needed fix (1) and (4), am I right?

  1. Assertions:

    • [RyuJIT/armel] Assertion failed 'argValue->TypeGet() == TYP_STRUCT' #14955
    • Test failure: JIT_Directed.coverage_oldtests_lclflddiv_cs_d_lclflddiv_cs_d/_coverage_oldtests_lclflddiv_cs_d_lclflddiv_cs_d_cmd #14583
    • [ARM] Intermittent segfaults in JIT/Methodical/cctor/misc/threads1_cs_r #12421
    • [ARM32] CompareExchange working wrong with clang3.9 #15074
  2. Unimplemented features (and features in progress)

    • [RyuJIT/ARM32] Support fast tail call optimization #13828
  3. NYI_ARM (23) #13000:

./jit/lowerarmarch.cpp:       NYI_ARM("ARM IsContainableImmed for floating point type");
./jit/lowerarmarch.cpp:       NYI_ARM("initblk loop unrolling is currently not implemented.");
./jit/lowerarmarch.cpp:       NYI_ARM("Lowering for cast from float to small type"); // Not tested yet.
./jit/lsraarm.cpp:            NYI_ARM("LinearScan::TreeNodeInfoInit for GT_INTRINSIC");
./jit/lsraarm.cpp:            NYI_ARM("TreeNodeInfoInit default case");
./jit/codegenarm.cpp:         NYI_ARM("genCodeForInitBlkUnroll");
./jit/codegenarm.cpp:         NYI_ARM("st.lclFld contained operand");
./jit/codegenarm.cpp:         NYI_ARM("st.lclVar contained operand");
./jit/codegenarm.cpp:         NYI_ARM("NOGC_WRITE_BARRIERS");
./jit/lsraarmarch.cpp:        NYI_ARM("NOGC_WRITE_BARRIERS");
./jit/lsraarmarch.cpp:        NYI_ARM("float reg varargs");
./jit/lsraarmarch.cpp:        NYI_ARM("initblk loop unrolling is currently not implemented.");
./jit/codegenarmarch.cpp:     NYI_ARM("genRegCopy from 'int' to 'float'");
./jit/codegenarmarch.cpp:     NYI_ARM("GT_CMPXCHG");
./jit/codegenarmarch.cpp:     NYI_ARM("genIntrinsic for round - not implemented yet");
./jit/codegenarmarch.cpp:     NYI_ARM("genRegCopy from 'int' to 'float'");
./jit/codegenarmarch.cpp:     NYI_ARM("SIMD genStructReturn");
./jit/codegenlegacy.cpp:      NYI_ARM("LOCK instructions");
./jit/lower.cpp:              NYI_ARM("lower: struct argument by fast tail call");
./jit/emitarm.cpp:            NYI_ARM("JitDataOffset static fields");
./jit/emitarm.cpp:            NYI_ARM("Thread-Local-Storage static fields");
./jit/emitarm.cpp:            NYI_ARM("JitDataOffset static fields");
./jit/emitarm.cpp:            NYI_ARM("Thread-Local-Storage static fields");
  1. JitStress:
    https://github.com/dotnet/coreclr/issues?q=is%3Aissue+is%3Aopen+label%3AJitStress+label%3Aarch-arm32
  2. GCStress:
    https://github.com/dotnet/coreclr/issues?q=is%3Aissue+is%3Aopen+label%3Aarch-arm32+label%3AGCStress
  3. Performance:
    • Need to check benchmarks comparing to legacy backend
  4. Others:
    • Blocking issues for profiler APIs on ARM32/64 #14526
@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Nov 27, 2017

Member

@BruceForstall can you advise what would be the best place for me to focus? I could work on jitstress or NYIs, or ...?

Member

CarolEidt commented Nov 27, 2017

@BruceForstall can you advise what would be the best place for me to focus? I could work on jitstress or NYIs, or ...?

@alpencolt

This comment has been minimized.

Show comment
Hide comment
@alpencolt

alpencolt Dec 7, 2017

Member

@BruceForstall have you had the time to look into #8496 (comment)? Is there any other big fields for work? Or it's difficult to estimate at the current moment?

Another thing which I'm interesting is it possible to generate Linux/ARM code like protononjit but on Windows host (off course 32-bit target on 32-bit host)? If it's not implemented yet would it be difficult to implement it?

Member

alpencolt commented Dec 7, 2017

@BruceForstall have you had the time to look into #8496 (comment)? Is there any other big fields for work? Or it's difficult to estimate at the current moment?

Another thing which I'm interesting is it possible to generate Linux/ARM code like protononjit but on Windows host (off course 32-bit target on 32-bit host)? If it's not implemented yet would it be difficult to implement it?

@CarolEidt

This comment has been minimized.

Show comment
Hide comment
@CarolEidt

CarolEidt Dec 7, 2017

Member

Another thing which I'm interesting is it possible to generate Linux/ARM code like protononjit but on Windows host (off course 32-bit target on 32-bit host)?

I believe that's what linuxnonjit.dll does.

Member

CarolEidt commented Dec 7, 2017

Another thing which I'm interesting is it possible to generate Linux/ARM code like protononjit but on Windows host (off course 32-bit target on 32-bit host)?

I believe that's what linuxnonjit.dll does.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Dec 7, 2017

Contributor

@alpencolt Sorry for the slow response. Thank you for collecting the list of issues.

I agree that 1 and 4 should be fixed before we switch RyuJIT/arm32 to be the default. I would also add that "non-stress" runs should also be clean: simple Debug/Checked/Release Pri-1 runs. There are currently 3 test failures in Release. However, I'm willing to be a bit flexible, as well, as there are benefits to doing the switch, and LEGACY_BACKEND is not exactly clean (perhaps I should do a full armlb test run and see how it compares to current RyuJIT results).

As for the issues you list in 1, I'm not sure any need to be blocking:

  • #14583: this is LEGACY_BACKEND
  • #14955: this is armel only, as far as I know. Should it be blocking?
  • #12421, #15074 -- I don't think these should be considered blocking. They don't seem to be RyuJIT specific.

So, to summarize what I think needs to be fixed before conversion:

  • #15150 -- [RyuJIT/arm32] Assert failure: pImportSection == pModule->GetImportSectionForRVA(rva)
  • #15153 -- [RyuJIT/arm32] Release test failures
  • #14860 -- [RyuJIT/arm32][JITMinOpts=1] Test runtime failures

I'm a little worried we haven't yet fully tested ReadyToRun / crossgen: #14916

I think we just disable tests failing in #15155 and #15156 if those aren't fixed when we convert.

Comments?

Contributor

BruceForstall commented Dec 7, 2017

@alpencolt Sorry for the slow response. Thank you for collecting the list of issues.

I agree that 1 and 4 should be fixed before we switch RyuJIT/arm32 to be the default. I would also add that "non-stress" runs should also be clean: simple Debug/Checked/Release Pri-1 runs. There are currently 3 test failures in Release. However, I'm willing to be a bit flexible, as well, as there are benefits to doing the switch, and LEGACY_BACKEND is not exactly clean (perhaps I should do a full armlb test run and see how it compares to current RyuJIT results).

As for the issues you list in 1, I'm not sure any need to be blocking:

  • #14583: this is LEGACY_BACKEND
  • #14955: this is armel only, as far as I know. Should it be blocking?
  • #12421, #15074 -- I don't think these should be considered blocking. They don't seem to be RyuJIT specific.

So, to summarize what I think needs to be fixed before conversion:

  • #15150 -- [RyuJIT/arm32] Assert failure: pImportSection == pModule->GetImportSectionForRVA(rva)
  • #15153 -- [RyuJIT/arm32] Release test failures
  • #14860 -- [RyuJIT/arm32][JITMinOpts=1] Test runtime failures

I'm a little worried we haven't yet fully tested ReadyToRun / crossgen: #14916

I think we just disable tests failing in #15155 and #15156 if those aren't fixed when we convert.

Comments?

@alpencolt

This comment has been minimized.

Show comment
Hide comment
@alpencolt

alpencolt Dec 8, 2017

Member

@CarolEidt great news, thank you!

@BruceForstall thank you for response. We will update our plans.

#14955: this is armel only, as far as I know. Should it be blocking?

Up to you. Since Samsung use armel this issue is critical for us.

We haven't tested crossgen a lot too but I've measured launch time of Xamarin applications on Tizen platform for Fragile mode. And RyuJIT is more than 10% slower than Legacy. So I don't know about stability but there is regression in performance unfortunately.

Member

alpencolt commented Dec 8, 2017

@CarolEidt great news, thank you!

@BruceForstall thank you for response. We will update our plans.

#14955: this is armel only, as far as I know. Should it be blocking?

Up to you. Since Samsung use armel this issue is critical for us.

We haven't tested crossgen a lot too but I've measured launch time of Xamarin applications on Tizen platform for Fragile mode. And RyuJIT is more than 10% slower than Legacy. So I don't know about stability but there is regression in performance unfortunately.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Dec 8, 2017

Contributor

Since Samsung use armel this issue is critical for us.

How much testing has Samsung done with RyuJIT and armel?

The CI job https://ci.dot.net/job/dotnet_coreclr/job/master/job/armel_cross_checked_tizen_prtest only runs 22 tests, and it is using LEGACY_BACKEND. When we switch to RyuJIT/arm32 as the default, it will use RyuJIT. Has Samsung tested these (and more) with RyuJIT/arm32?

Contributor

BruceForstall commented Dec 8, 2017

Since Samsung use armel this issue is critical for us.

How much testing has Samsung done with RyuJIT and armel?

The CI job https://ci.dot.net/job/dotnet_coreclr/job/master/job/armel_cross_checked_tizen_prtest only runs 22 tests, and it is using LEGACY_BACKEND. When we switch to RyuJIT/arm32 as the default, it will use RyuJIT. Has Samsung tested these (and more) with RyuJIT/arm32?

@alpencolt

This comment has been minimized.

Show comment
Hide comment
@alpencolt

alpencolt Dec 11, 2017

Member

@BruceForstall we do it but not on regular basis. There are about 2 failed tests from more than 10k called for both ABI. We will rerun them and I''ll provide more detailed results.

Member

alpencolt commented Dec 11, 2017

@BruceForstall we do it but not on regular basis. There are about 2 failed tests from more than 10k called for both ABI. We will rerun them and I''ll provide more detailed results.

@alpencolt

This comment has been minimized.

Show comment
Hide comment
@alpencolt

alpencolt Dec 13, 2017

Member

@BruceForstall sorry for delaying.
Test results for RyuJIT ARM/Linux (failed tests):

=======================
     Test Results
=======================
# CoreCLR Bin Dir  : 
# Tests Discovered : 11402
# Passed           : 10588
# Failed           : 8
# Skipped          : 806
=======================

Test results for RyuJIT ARMEL/Linux (failed tests):

=======================
     Test Results
=======================
# CoreCLR Bin Dir  : 
# Tests Discovered : 11402
# Passed           : 10574
# Failed           : 22
# Skipped          : 806
=======================

Checked versions have about 6 failed tests. There are several segmentation fault and I think some of them because wrong configuration, we will check them manually.

Member

alpencolt commented Dec 13, 2017

@BruceForstall sorry for delaying.
Test results for RyuJIT ARM/Linux (failed tests):

=======================
     Test Results
=======================
# CoreCLR Bin Dir  : 
# Tests Discovered : 11402
# Passed           : 10588
# Failed           : 8
# Skipped          : 806
=======================

Test results for RyuJIT ARMEL/Linux (failed tests):

=======================
     Test Results
=======================
# CoreCLR Bin Dir  : 
# Tests Discovered : 11402
# Passed           : 10574
# Failed           : 22
# Skipped          : 806
=======================

Checked versions have about 6 failed tests. There are several segmentation fault and I think some of them because wrong configuration, we will check them manually.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Dec 13, 2017

Contributor

@alpencolt Are those test results from a Checked or Release run? It sounds like maybe Release, since you say at the end that Checked has a different set of failures.

Can you open GitHub issues for the failures, once you have verified them?

Contributor

BruceForstall commented Dec 13, 2017

@alpencolt Are those test results from a Checked or Release run? It sounds like maybe Release, since you say at the end that Checked has a different set of failures.

Can you open GitHub issues for the failures, once you have verified them?

@alpencolt

This comment has been minimized.

Show comment
Hide comment
@alpencolt

alpencolt Dec 13, 2017

Member

@BruceForstall release sorry that forget to mention.
Armel tests were executed on chroot and now we've started to check them on real phone and many of this fails cannot be reproduced.
So our current infrastructure isn't ideal. I will share results when check carefully.

Member

alpencolt commented Dec 13, 2017

@BruceForstall release sorry that forget to mention.
Armel tests were executed on chroot and now we've started to check them on real phone and many of this fails cannot be reproduced.
So our current infrastructure isn't ideal. I will share results when check carefully.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Dec 13, 2017

Contributor

As of d135144, RyuJIT is now the default JIT for ARM32 in .NET Core!

Congratulations and Thank You to everyone who has contributed to this effort!

You can see from this issue that the RyuJIT/arm32 porting effort started almost exactly one year ago. Starting then, building on the working legacy arm32 JIT and leveraging x86/x64/arm64 RyuJIT work, arm32 has gone from nothing to a fully functioning, extensively tested compiler, working on multiple platforms, and even with multiple ABIs (namely, SOFTFP). It is tested in the CI with over 30 JIT stress modes on Windows ARM hardware, some minimal testing on Ubuntu and Tizen, and almost 40 JIT stress modes in x86_arm altjit mode.

We have more work to do to finish this port and prepare for shipping, including fixing more bugs, including getting GCStress clean, and testing and automating more test configurations. The work continues to be tracked here, and in generic queries for "arch-arm32" labelled bugs.

Note that all Ubuntu and Tizen testing in the CI is now using RyuJIT/arm32. For Windows, the "arm" jobs use RyuJIT/arm32 and the "armlb" jobs use the LEGACY_BACKEND JIT, now named legacyjit.dll.

Onwards!

Contributor

BruceForstall commented Dec 13, 2017

As of d135144, RyuJIT is now the default JIT for ARM32 in .NET Core!

Congratulations and Thank You to everyone who has contributed to this effort!

You can see from this issue that the RyuJIT/arm32 porting effort started almost exactly one year ago. Starting then, building on the working legacy arm32 JIT and leveraging x86/x64/arm64 RyuJIT work, arm32 has gone from nothing to a fully functioning, extensively tested compiler, working on multiple platforms, and even with multiple ABIs (namely, SOFTFP). It is tested in the CI with over 30 JIT stress modes on Windows ARM hardware, some minimal testing on Ubuntu and Tizen, and almost 40 JIT stress modes in x86_arm altjit mode.

We have more work to do to finish this port and prepare for shipping, including fixing more bugs, including getting GCStress clean, and testing and automating more test configurations. The work continues to be tracked here, and in generic queries for "arch-arm32" labelled bugs.

Note that all Ubuntu and Tizen testing in the CI is now using RyuJIT/arm32. For Windows, the "arm" jobs use RyuJIT/arm32 and the "armlb" jobs use the LEGACY_BACKEND JIT, now named legacyjit.dll.

Onwards!

@YongseopKim

This comment has been minimized.

Show comment
Hide comment
@YongseopKim

YongseopKim Dec 19, 2017

Contributor

@dotnet/arm32-contrib thanks for your collaborations. I'll close this because now RyuJIT is the default backend-engine. Happy hacking onwards!

Contributor

YongseopKim commented Dec 19, 2017

@dotnet/arm32-contrib thanks for your collaborations. I'll close this because now RyuJIT is the default backend-engine. Happy hacking onwards!

@BruceForstall BruceForstall moved this from In progress to Complete in RyuJIT/ARM32 Dec 19, 2017

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall Mar 19, 2018

Contributor

fyi, I am working on adding Ubuntu arm32 testing in the dotnet/coreclr CI system, on hardware, with #17018.

Contributor

BruceForstall commented Mar 19, 2018

fyi, I am working on adding Ubuntu arm32 testing in the dotnet/coreclr CI system, on hardware, with #17018.

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall May 30, 2018

Contributor

As of today, .NET Core 2.1 has been released, with ARM32 officially supported, using the RyuJIT/ARM32 compiler! You can see the announcement here.

Once again, thank you to everyone who contributed to this effort! It's been a great example of cross-company, around-the-world collaboration.

Based on the opening date of this issue, the RyuJIT/ARM32 port took about 18 months from start to finish. I mentioned some numbers above related to testing; here are some more numbers I collected recently for ARM32 testing done in our CI system:

  • 2 jobs on every PR (1 Windows, 1 Ubuntu)
  • 62 Windows arm on hardware test jobs in CI, 3 running daily, the rest running weekly.
  • 61 Ubuntu arm on hardware jobs in CI, 3 running daily, the rest running weekly.
  • 40 Windows x86_arm_altjit test jobs in CI, running daily

These include "normal" as well as many JIT stress and GC stress variations, using both the dotnet/coreclr and dotnet/corefx testbeds. In addition, there was testing on the Tizen emulator.

I'm looking forward to seeing what developers do with this work!

Contributor

BruceForstall commented May 30, 2018

As of today, .NET Core 2.1 has been released, with ARM32 officially supported, using the RyuJIT/ARM32 compiler! You can see the announcement here.

Once again, thank you to everyone who contributed to this effort! It's been a great example of cross-company, around-the-world collaboration.

Based on the opening date of this issue, the RyuJIT/ARM32 port took about 18 months from start to finish. I mentioned some numbers above related to testing; here are some more numbers I collected recently for ARM32 testing done in our CI system:

  • 2 jobs on every PR (1 Windows, 1 Ubuntu)
  • 62 Windows arm on hardware test jobs in CI, 3 running daily, the rest running weekly.
  • 61 Ubuntu arm on hardware jobs in CI, 3 running daily, the rest running weekly.
  • 40 Windows x86_arm_altjit test jobs in CI, running daily

These include "normal" as well as many JIT stress and GC stress variations, using both the dotnet/coreclr and dotnet/corefx testbeds. In addition, there was testing on the Tizen emulator.

I'm looking forward to seeing what developers do with this work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment