Test failure: Assert failure: Verify_TypeLayout 'System.Numerics.Vector`1' failed to verify type layout #60036

Closed
BruceForstall opened this issue Oct 5, 2021 · 8 comments · Fixed by #73406

Comments

@BruceForstall
Member

The runtime-coreclr outerloop pipeline is failing on Windows/Linux arm64 due to this assert.

https://dev.azure.com/dnceng/public/_build/results?buildId=1397053&view=ms.vss-test-web.build-test-results-tab

e.g.,

    Interop\PInvoke\Generics\GenericsTest\GenericsTest.cmd [FAIL]
      Could Not Find D:\h\w\B3A40985\w\A28E0905\e\Interop\PInvoke\Generics\GenericsTest\IL-CG2\composite-r2r.dll
      Could Not Find D:\h\w\B3A40985\w\A28E0905\e\Interop\PInvoke\Generics\GenericsTest\GenericsNative.dll.rsp
      Could Not Find D:\h\w\B3A40985\w\A28E0905\e\Interop\PInvoke\Generics\GenericsTest\GenericsTest.dll.rsp
      Could Not Find D:\h\w\B3A40985\w\A28E0905\e\Interop\PInvoke\Generics\GenericsTest\TestLibrary.dll.rsp
      
      Assert failure(PID 11376 [0x00002c70], Thread: 12804 [0x3204]): Verify_TypeLayout 'System.Numerics.Vector`1' failed to verify type layout
      
      CORECLR! LoadDynamicInfoEntry + 0xBF4 (0x00007ff9`1dac170c)
      CORECLR! Module::FixupNativeEntry + 0x6C (0x00007ff9`1da4ddec)
      CORECLR! Module::FixupDelayListAux<Module *,int (__cdecl Module::*)(CORCOMPILE_IMPORT_SECTION *,unsigned __int64,unsigned __int64 *,int)> + 0x200 (0x00007ff9`1db64c58)
      CORECLR! ReadyToRunInfo::GetEntryPoint + 0x2F4 (0x00007ff9`1db65b9c)
      CORECLR! MethodDesc::GetPrecompiledR2RCode + 0x38 (0x00007ff9`1db0fe78)
      CORECLR! MethodDesc::GetPrecompiledCode + 0x28 (0x00007ff9`1db0fc90)
      CORECLR! MethodDesc::PrepareILBasedCode + 0x230 (0x00007ff9`1db11b50)
      CORECLR! MethodDesc::PrepareCode + 0x54 (0x00007ff9`1db11914)
      CORECLR! CodeVersionManager::PublishVersionableCodeIfNecessary + 0x280 (0x00007ff9`1da70ee0)
      CORECLR! MethodDesc::DoPrestub + 0x764 (0x00007ff9`1db0de2c)
          File: D:\workspace\_work\1\s\src\coreclr\vm\jitinterface.cpp Line: 13647
          Image: D:\h\w\B3A40985\p\corerun.exe
      
      
      Return code:      1
      Raw output file:      D:\h\w\B3A40985\w\A28E0905\uploads\Reports\Interop.PInvoke\Generics\GenericsTest\GenericsTest.output.txt

@dotnet/crossgen-contrib

@BruceForstall BruceForstall added arch-arm64 area-crossgen2-coreclr blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs labels Oct 5, 2021
@BruceForstall BruceForstall added this to the 7.0.0 milestone Oct 5, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Oct 5, 2021
@BruceForstall
Member Author

@tannergooding Any chance #53450 could be causing this?

@trylek
Member

trylek commented Oct 5, 2021

I was actually looking at this bug earlier today. It does seem specific to arm64. I must admit I don't yet fully understand the extent of Tanner's change, but I suspect it most likely just exposes a previously unseen type layout corner case that we're still not getting right in Crossgen2. I have revived my work from earlier this year on cleaning up the CoreCLR type layout algorithms, and I'll investigate this issue as part of that task.

@trylek trylek self-assigned this Oct 5, 2021
@trylek trylek removed the untriaged New issue has not been triaged by the area owner label Oct 5, 2021
@trylek
Member

trylek commented Oct 5, 2021

As a short-term mitigation, I'll put up a PR that disables the test against this issue.

@tannergooding
Member

@BruceForstall, it's certainly possible.

However, the change was largely just a refactoring so that the existing simd.cpp and simdashwintrinsic.cpp logic could be shared between the System.Runtime.Intrinsics and System.Numerics types, so I'd lean towards what Tomas was suggesting: that there may have been an existing bug here.

Let me know if there's anything I can do to help with the investigation.

trylek added a commit that referenced this issue Oct 12, 2021
In my change from the summer I added provisions to the type layout
check to support printing the differences as an aid for investigation
of this class of failures; I however put the diff display after the
assertion so that it actually doesn't get hit as the assertion tears
down the process. This change fixes the ordering and should let us
review the particular mismatch that occurs in the arm64 runs
(#60036).

Thanks

Tomas

P.S. The change includes the instrumentation to unblock the
GenericsTest test that hits this issue to let me analyze the diff.
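
The ordering problem described in this commit message can be illustrated with a minimal Python sketch. This is not the CoreCLR code: `TypeLayout`, `describe_mismatch`, and `verify_type_layout` are hypothetical names, and the two fields are only a small subset of what a real layout check would compare. The point is simply that the diff is written to the log before the failing check runs, so it is still available after the failure.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TypeLayout:
    """Hypothetical stand-in for the layout facts recorded in a fixup."""
    size: int
    hfa_type: int


def describe_mismatch(name, expected, actual):
    """Return one line per field whose expected and actual values differ."""
    lines = []
    for f in fields(TypeLayout):
        exp, act = getattr(expected, f.name), getattr(actual, f.name)
        if exp != act:
            lines.append(f"Type {name}: expected {f.name} {exp:08x}, actual {act:08x}")
    return lines


def verify_type_layout(name, expected, actual, log):
    """Log the differences first, then fail, so the log survives the failure."""
    diff = describe_mismatch(name, expected, actual)
    for line in diff:
        log.append(line)  # recorded before the check below can abort
    if diff:
        raise AssertionError(f"Verify_TypeLayout '{name}' failed to verify type layout")


# Usage: the mismatch is in the log even though verification failed.
log = []
try:
    verify_type_layout("System.Numerics.Vector`1", TypeLayout(16, 4), TypeLayout(16, 0), log)
except AssertionError:
    pass
print(log)
```

If the logging loop were placed after the `raise`, it would never run, which mirrors the situation the commit message describes.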
@trylek
Member

trylek commented Oct 12, 2021

OK, I found out that the diagnostics I added for this purpose in the summer don't really work, because the call that explains the type mismatch in the log comes after the assertion failure, which tears down the process. Using an instrumented run that includes the fix I currently have in code review, and with the failing GenericsTest removed from issues.targets, I received the following additional info:

      Testing Vector<bool>
      Type System.Numerics.Vector`1: expected HFA type 00000004, actual 00000000
      Type System.Numerics.Vector`1: expected HFA type 00000004, actual 00000000
      Expected: 100
      Actual: -1073740286

@davidwrighton / @jakobbotsch, can you please comment on what the correct HFA value is? I believe that the left-hand side (expected, 4) is what Crossgen2 generates, whereas the right-hand side (actual, 0) is what the CoreCLR runtime generates when constructing the type.

This is what R2RDump shows for the fixup:

System.Numerics.Vector`1 Flags READYTORUN_LAYOUT_HFA, READYTORUN_LAYOUT_Alignment, READYTORUN_LAYOUT_Alignment_Native, READYTORUN_LAYOUT_GCLayout, READYTORUN_LAYOUT_GCLayout_Empty Size 16 HFAType 4 (VERIFY_TYPE_LAYOUT)
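
For readers who want to interpret the numbers in the log and the R2RDump line, here is a rough Python sketch. The flag names come from the R2RDump output above, but the bit values and the HFA element numbering are assumptions that, to the best of my understanding, mirror the runtime headers; they should be checked there before being relied on. `LayoutFlags`, `HfaElemType`, and `explain` are hypothetical names used only for illustration.

```python
from enum import IntEnum, IntFlag


class LayoutFlags(IntFlag):
    # Names taken from the R2RDump output; the bit values are assumed.
    HFA = 0x01
    ALIGNMENT = 0x02
    ALIGNMENT_NATIVE = 0x04
    GC_LAYOUT = 0x08
    GC_LAYOUT_EMPTY = 0x10


class HfaElemType(IntEnum):
    # Assumed numbering of the HFA element kinds.
    NONE = 0
    FLOAT = 1
    DOUBLE = 2
    VECTOR64 = 3
    VECTOR128 = 4


def describe_hfa(value):
    """Map a raw HFA value from the log to a readable name."""
    try:
        return HfaElemType(value).name
    except ValueError:
        return f"unknown({value})"


def explain(expected, actual):
    """Summarize a log line of the form 'expected HFA type X, actual Y'."""
    return (f"compiler recorded {describe_hfa(expected)}, "
            f"runtime computed {describe_hfa(actual)}")


# The values reported for Vector<bool> in the instrumented run.
print(explain(0x00000004, 0x00000000))
```

Under these assumptions, the fixup says the compiled image treats the 16-byte type as a 128-bit vector aggregate, while the runtime does not classify it as an HFA at all.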

Thanks

Tomas

@tannergooding
Member

Is this maybe a disconnect between Vector<T> when T is an integer type vs. when it is a floating-point type?

Perhaps there is some missing handling that's treating it like a "standard user-defined struct" rather than something which is either __m128 or __m256 (in which case, it is an HVA of 1, not an HFA of n).

  • This, unlike for things like Vector4, should be fine since Vector<T> will never be usable in interop (it's variable sized, among other considerations, so we block it and always have), so it's safe to declare it as an "ABI type" rather than some "user defined type".

@mangod9
Member

mangod9 commented Jul 19, 2022

This is marked as blocking-outerloop, perhaps because the test is disabled? Is this another type layout issue that needs to be looked at?

@trylek
Member

trylek commented Jul 19, 2022

I believe we should remove the "blocking outerloop" label when we disable a test. Technically all failing tests are disabled, because otherwise they would be blocking innerloop, outerloop, or other pipelines. In my opinion the blocking label is a "call to action" that may result in several things, and blocking the test in issues.targets is one of them. Having said that, it was most likely me who overlooked removing the label when I added the issues.targets entries, so I'm going to go ahead and remove it now.

@trylek trylek removed the blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs label Jul 19, 2022
@mangod9 mangod9 assigned davidwrighton and unassigned trylek Aug 4, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 8, 2022
davidwrighton added a commit that referenced this issue Aug 8, 2022
…does not match native runtime (#73406)

- We didn't check to make sure that the type of the Vector<T> was a primitive numeric as the runtime does
- Re-enable test

Fixes #60036
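
The mismatch and the fix described in this commit can be sketched in Python as follows. This is a simplified model, not the Crossgen2 or runtime code: the function names are hypothetical, the set of "primitive numeric" type arguments is an assumption, the HFA values 4 and 0 are taken from the log earlier in this thread, and the sketch only covers the 16-byte arm64 case reported here.

```python
HFA_NONE = 0       # "actual" value from the log
HFA_VECTOR128 = 4  # "expected" value from the log

# Assumed set of type arguments that count as primitive numeric.
# bool is deliberately absent: Vector<bool> is the case that failed.
PRIMITIVE_NUMERIC = frozenset({
    "sbyte", "byte", "short", "ushort", "int", "uint",
    "long", "ulong", "float", "double",
})


def runtime_hfa(type_arg):
    """Model of the runtime: only numeric Vector<T> is a vector aggregate."""
    return HFA_VECTOR128 if type_arg in PRIMITIVE_NUMERIC else HFA_NONE


def compiler_hfa_before_fix(type_arg):
    """Model of the bug: every Vector<T> was classified as a vector."""
    return HFA_VECTOR128


def compiler_hfa_after_fix(type_arg):
    """Model of the fix: apply the same primitive-numeric check as the runtime."""
    return HFA_VECTOR128 if type_arg in PRIMITIVE_NUMERIC else HFA_NONE


for t in ("float", "int", "bool"):
    before = compiler_hfa_before_fix(t) == runtime_hfa(t)
    after = compiler_hfa_after_fix(t) == runtime_hfa(t)
    print(f"Vector<{t}>: matches runtime before fix={before}, after fix={after}")
```

In this model only `Vector<bool>` disagrees before the fix, which is consistent with the "Testing Vector<bool>" line in the instrumented output.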
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 7, 2022