Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure: JIT/Methodical/xxobj/sizeof/_il_dbgsizeof_Target_32Bit/_il_dbgsizeof_Target_32Bit.sh #37470

Closed
v-haren opened this issue Jun 5, 2020 · 29 comments · Fixed by #38577
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs
Milestone

Comments

@v-haren
Copy link

v-haren commented Jun 5, 2020

failed in job: runtime-coreclr outerloop 20200604.7

failed tests:
JIT/Methodical/xxobj/sizeof/_il_dbgsizeof_Target_32Bit/_il_dbgsizeof_Target_32Bit.sh
JIT/Methodical/xxobj/sizeof/_il_relsizeof_Target_32Bit/_il_relsizeof_Target_32Bit.sh
JIT/Methodical/xxobj/sizeof/_il_relsizeof32_Target_32Bit/_il_relsizeof32_Target_32Bit.sh
JIT/Methodical/xxobj/sizeof/_il_dbgsizeof32_Target_32Bit/_il_dbgsizeof32_Target_32Bit.sh
JIT/Methodical/xxobj/sizeof/_il_relsizeof64_Target_32Bit/_il_relsizeof64_Target_32Bit.sh

Error message

Return code: 1
Raw output file: /root/helix/work/workitem/JIT/Methodical/Reports/JIT.Methodical/xxobj/sizeof/_il_dbgsizeof_Target_32Bit/_il_dbgsizeof_Target_32Bit.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun _il_dbgsizeof_Target_32Bit.dll ''
sizeof(RefComplexStruct) failed.
Expected: 100
Actual: 104
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/root/helix/work/correlation
> /root/helix/work/workitem/JIT/Methodical/xxobj/sizeof/_il_dbgsizeof_Target_32Bit/_il_dbgsizeof_Target_32Bit.sh
Expected: True
Actual: False


Stack trace
   at JIT_Methodical._xxobj_sizeof__il_dbgsizeof_Target_32Bit__il_dbgsizeof_Target_32Bit_._xxobj_sizeof__il_dbgsizeof_Target_32Bit__il_dbgsizeof_Target_32Bit_sh() in /__w/1/s/artifacts/tests/coreclr/Linux.arm.Checked/TestWrappers/JIT.Methodical/JIT.Methodical.XUnitWrapper.cs:line 201318
@v-haren v-haren added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 5, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jun 5, 2020
@BruceForstall BruceForstall added arch-arm32 os-linux Linux OS (any supported distro) and removed untriaged New issue has not been triaged by the area owner labels Jun 5, 2020
@BruceForstall BruceForstall added this to the 5.0 milestone Jun 5, 2020
@BruceForstall
Copy link
Member

BruceForstall commented Jun 5, 2020

CoreCLR Linux arm Checked @ (Ubuntu.1804.Arm32.Open)Ubuntu.1804.Armarch.Open

Many are also failing on Windows x86

@BruceForstall
Copy link
Member

@dotnet/jit-contrib

@BruceForstall BruceForstall added the blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs label Jun 13, 2020
@BruceForstall BruceForstall added this to To do by 6/12/2020 in .NET Core CodeGen Jun 13, 2020
@BruceForstall
Copy link
Member

The most important one (fails in most runs):

JIT\Methodical\xxobj\sizeof_il_dbgsizeof64_il_dbgsizeof64.cmd

@AndyAyersMS
Copy link
Member

Looking at _il_dbgsizeof64, there is just one novel method Main jitted and it is debug codegen. CI history shows failures starting around 6/1, so I build an x86 jit from just before then at 967d216 and diffed codegen for Main with current master; they're identical.

@AndyAyersMS
Copy link
Member

I don't have any lead on a fix here yet, and will probably work on some other bugs first.

If this is really blocking we might want to disable it (and related tests) for now.

@BruceForstall
Copy link
Member

Looking at _il_dbgsizeof64, there is just one novel method Main jitted and it is debug codegen. CI history shows failures starting around 6/1, so I build an x86 jit from just before then at 967d216 and diffed codegen for Main with current master; they're identical.

Did the 967d216 build pass the test, and the current one consistently fail the test, for you?

@AndyAyersMS
Copy link
Member

I can't get this to fail locally.

Am going to modify the runtime to assert if the return value is not an "expected" value and see if can use this to get some dumps from CI.

@AndyAyersMS
Copy link
Member

Looking at commits in that range I wonder if somehow #37274 is involved... among other things it renamed at least some of these tests (to include the Target... suffix) and from what I can tell the tests that got renamed did not fail under their old names.

@BruceForstall
Copy link
Member

fwiw, it's interesting that _il_dbgsizeof64.ilproj has:

<Compile Condition="'$(TargetArchitecture)' == 'x86'" Include="sizeof64.il" />
<Compile Condition="'$(TargetArchitecture)' != 'x86'" Include="64sizeof64.il" />

which seems probably wrong for arm32, since all the other proj files have something like:

<!-- There is a 32 and 64 version of this test to allow it to be compiled for all targets -->
<CLRTestTargetUnsupported Condition="'$(TargetBits)' != '32'">true</CLRTestTargetUnsupported>

@BruceForstall
Copy link
Member

Looking at commits in that range I wonder if somehow #37274 is involved...

Looks likely. (Although I don't know why you can't repro. Maybe you could download assets built by the CI and see if those repro locally)

Looks like _il_dbgsizeof64.ilproj didn't get the same conversion done that the others did, so its build is not properly specified.

@AndyAyersMS
Copy link
Member

Oddly enough _il_dbgsizeof64 only fails for x86, where it is "properly" specified.

@AndyAyersMS
Copy link
Member

Seeing if I can get some dumps from CI via #38309.

@AndyAyersMS
Copy link
Member

Looks like it worked; we have dumps.

@AndyAyersMS
Copy link
Member

Confirmed that in CI, _il_dbgsizeof64 is running the wrong test case for x86; when I build it locally I get the right one which is why I couldn't repro.

@AndyAyersMS
Copy link
Member

@jashook @trylek still sorting where things go off track here but I strongly suspect the failures above are from running the wrong versions of these tests on 32 bit platforms. This seems to be a consequence of #37274.

Either we're building the tests incorrectly or we're building both 32 and 64 bit versions and then running the wrong one...

@jashook
Copy link
Contributor

jashook commented Jun 24, 2020

/cc @sdmaclea

@trylek
Copy link
Member

trylek commented Jun 24, 2020

@AndyAyersMS - AFAIK @sdmaclea's work was about building all managed test components just once for all the architecture-specific test flavors, having Roslyn mark them as AnyCpu so that they are portable across execution architectures. From this viewpoint the _il_dbgsizeof64.ilproj project seems wrong, we shouldn't be using conditional <Compile clauses based on target architecture because by the time we're building these tests we don't yet know what architecture we'll be running on.

I'm inclined to suspect that this particular test got either overlooked in Steve's refactoring or added later / independently, not using the new pattern that Steve defined for architecture-specific tests - simply put, for a 32/64-bit specific test we should now have two different test projects both of which get compiled (possibly using different sources or compilation switches) and their execution scripts generated in CLRTest.Execute.targets will include runtime checks to skip those tests that don't support the actual runtime platform.

Having said that, the title of the issue refers to those tests that Steve did split so they shouldn't be affected by the weirdness that @BruceForstall pointed out for the _il_dbgsizeof64.ilproj script. I would hope that, with this in mind, we should be able to narrow the problem space. If we're running a *_Target_64bit test on ARM32, that would mean we're running the wrong test, but that doesn't seem to be the case. So I guess that either the information that the tests are actually built using Roslyn on an OSX x64 machine somehow leaks into the MSIL, rendering it non-portable across architectures (due to some remaining conditional compilation within the source code), or we're doing something wrong about the AnyCPU setting.

@jashook
Copy link
Contributor

jashook commented Jun 24, 2020

As @trylek has mentioned conditional compilation of managed tests using bitness is not supported as we will always build the managed components of the test on x64 OSX in CI.

Specifically below would not work in CI:

<Compile Condition="'$(TargetArchitecture)' == 'x86'" Include="sizeof64.il" />
<Compile Condition="'$(TargetArchitecture)' != 'x86'" Include="64sizeof64.il" />

@jashook
Copy link
Contributor

jashook commented Jun 24, 2020

Seems like this specific ilproj should just be deleted. There is already an ilproj for _il_dbgsizeof64_target_64bit and _il_dbgsizeof64_target_32bit

@AndyAyersMS
Copy link
Member

@trylek @jashook the x86 case seems to be in hand. Now trying to figure out what gets built and what gets run for the arm32 failures.

I have core dumps for these on arm32 from https://dev.azure.com/dnceng/public/_build/results?buildId=701494&view=ms.vss-test-web.build-test-results-tab&runId=21752388&resultId=107271&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab but haven't gotten to extracting the test IL just yet... my lldb/sos combo is not happy.

@AndyAyersMS
Copy link
Member

Ah, think I've perhaps finally figured this out.

Arm defines FEATURE_64BIT_ALIGNMENT and so the class sizes on arm don't match those on x86.

Before the changes in #37274 we were checking for arch=x86 when doing test version selection, and so on arm we were running the "64 bit" flavors of these tests. Now we're running the 32 bit flavors and they're failing.

That also explains why some other bit-size specific tests are ok and these particular tests are not.

I don't think these tests are all that valuable overall so instead of adding yet another category of exclusion/selection I'm tempted to just disable them for arm.

Thoughts?

@BruceForstall
Copy link
Member

So it looks like "x86" was right all along, and it wasn't a 32-bit/64-bit question.

@jashook @trylek @sdmaclea Is there an existing mechanism for determining a test run time whether a test should be run or not, based on architecture (or other criteria) (in the metadata; without adding that logic to the test)?

@AndyAyersMS
Copy link
Member

Seems a little odd that arm32 has the same class sizes as x64/arm64. Perhaps that was intentional, or perhaps a happy accident.

@BruceForstall
Copy link
Member

Seems a little odd that arm32 has the same class sizes as x64/arm64.

That's only because there are no pointers involved.

Actually, I'm thinking maybe these tests are valuable specifically for ARM, because ARM is the only platform that defines FEATURE_64BIT_ALIGNMENT and has that particular alignment requirements, so protecting against regression there makes sense.

@BruceForstall
Copy link
Member

Note that these tests were previously disabled for arm32 in issues.targets with the note "needs triage". After #37274, the issues.targets references were renamed for the new test names, however, the directories were written with incorrect casing. So presumably case-insensitive Windows has them disabled but case-sensitive Linux does not. Then #38186 disabled them for all platforms by adding another set of disabling text, this time with the correct casing (but note that the bad casing is still there).

@sdmaclea
Copy link
Contributor

This is the mechanism to control whether the script is generated for a given platform

<!-- This test runs on all platforms except x86 -->
<CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'x86'">true</CLRTestTargetUnsupported>

@sdmaclea
Copy link
Contributor

sdmaclea commented Jun 29, 2020

There was a class of tests which had 64-bit, 32-bit, and arm32-bit flavors of tests.

@AndyAyersMS
Copy link
Member

In these test cases, arm happens to have the same class sizes as arm64/x64.

So my proposal is to add another set of ILPROJs for arm, splitting the "target_32bit" cases into "target_x86" and "target_arm", and include the "64 bit" il sources for the latter. I won't rename the sources unless we feel it can add clarity.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Jun 29, 2020
On these tests the arm class layout happens to match the x64/arm64 layout and
is different than the x86 layout.

So, introduce arm specific test projects for these tests.

Also in the recent test build conversion one project was overlooked, so update
that project to work with this new scheme as well.

Closes dotnet#37470.
.NET Core CodeGen automation moved this from To do by 6/12/2020 to Done Jun 30, 2020
AndyAyersMS added a commit that referenced this issue Jun 30, 2020
On these tests the arm class layout happens to match the x64/arm64 layout and
is different than the x86 layout.

So, introduce arm specific test projects for these tests.

Also in the recent test build conversion one project was overlooked, so update
that project to work with this new scheme as well.

Remove all exclusions for these and older variants of these tests.

Closes #37470.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants