Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure: System.Buffers.Tests.ArrayBufferWriterTests_Char.Advance #42517

Closed
v-haren opened this issue Sep 21, 2020 · 27 comments · Fixed by #57282
Closed

Test failure: System.Buffers.Tests.ArrayBufferWriterTests_Char.Advance #42517

v-haren opened this issue Sep 21, 2020 · 27 comments · Fixed by #57282
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Milestone

Comments

@v-haren
Copy link

v-haren commented Sep 21, 2020

failed in job: runtime-coreclr libraries-jitstress 20200920.1

failed tests:
System.Buffers.Tests.ArrayBufferWriterTests_Char.Advance
System.Buffers.Tests.ArrayBufferWriterTests_Byte.Advance

net5.0-Linux-Release-arm-CoreCLR_checked-tailcallstress-(Ubuntu.1804.Arm32.Open)Ubuntu.1804.Armarch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440

Error message

Assert.True() Failure
Expected: True
Actual:   False


Stack trace
   at System.Buffers.Tests.ArrayBufferWriterTests`1.Advance() in /_/src/libraries/System.Memory/tests/ArrayBufferWriter/ArrayBufferWriterTests.T.cs:line 71

category:correctness
theme:value-numbering
skill-level:expert
cost:medium

@ghost
Copy link

ghost commented Sep 21, 2020

Tagging subscribers to this area: @tannergooding, @pgovind, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Sep 21, 2020
@tannergooding
Copy link
Member

This failed in multiple scenarios, not just arm. It also failed for x86, x64, and on Windows.

@tannergooding
Copy link
Member

Looks like this repro's with just TieredCompilation=0

@tannergooding
Copy link
Member

CC @dotnet/jit-contrib, this looks to be caused by inlining the ReadOnlyMemory<T>.Span getter method.

@JulieLeeMSFT JulieLeeMSFT added this to Needs Triage in .NET Core CodeGen via automation Sep 21, 2020
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 21, 2020
@JulieLeeMSFT JulieLeeMSFT added this to the 5.0.0 milestone Sep 21, 2020
@briansull
Copy link
Contributor

briansull commented Sep 22, 2020

I am still trying to get a repro for this on my dev machine:

            build.cmd clr.runtime+clr.corelib+clr.nativecorelib+clr.alljits+libs+libs.tests -arch x64 -c Checked -librariesConfiguration Release /p:ArchiveTests=true

            cd artifacts\bin\System.Memory.Tests\net5.0-Release
            set COMPlus_JitStress=1
            set COMPlus_TieredCompilation=0
            RunTests.cmd --runtime-path <PATH>\runtime\artifacts\bin\testhost\net5.0-Windows_NT-Release-x64

=== TEST EXECUTION SUMMARY ===
System.Memory.Tests Total: 45609, Errors: 0, Failed: 0, Skipped: 0, Time: 8.092s

<test name="System.Buffers.Tests.ArrayBufferWriterTests_Char.Advance" type="System.Buffers.Tests.ArrayBufferWriterTests_Char" method="Advance" time="0.0182829" result="Pass" />
<test name="System.Buffers.Tests.ArrayBufferWriterTests_Byte.Advance" type="System.Buffers.Tests.ArrayBufferWriterTests_Byte" method="Advance" time="0.0135947" result="Pass" />

@briansull
Copy link
Contributor

briansull commented Sep 22, 2020

First test failure was for
#42474 Use ReadOnlySpan for parsing FileWatcher responses

@briansull
Copy link
Contributor

@tannergooding Can you help me get a local repro for this issue?

@tannergooding
Copy link
Member

tannergooding commented Sep 22, 2020

I just do:

.\build.cmd -subset clr -configuration checked
.\build.cmd -subset libs+libs.tests -configuration release -rc checked
.\src\tests\build.cmd x64 checked generatelayoutonly
cd .\artifacts\bin\System.Memory.Tests\net5.0-Release
$env:COMPlus_TieredCompilation=0
..\..\..\..\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\CoreRun.exe xunit.console.dll System.Memory.Tests.dll -method System.Buffers.Tests.ArrayBufferWriterTests_Byte.Advance

@tannergooding
Copy link
Member

added in the .\src\tests\build.cmd x64 checked generatelayoutonly command which I had forgotten

@briansull
Copy link
Contributor

Thanks @tannergooding I have a repro!

@briansull
Copy link
Contributor

Looks like fgMorphBlkNode produces a different result for some trees:

image

@briansull
Copy link
Contributor

Testing this:
reverting commit 92b125f.

@briansull
Copy link
Contributor

Passes without the commit 92b125f

@briansull
Copy link
Contributor

@sandreenko _ This change cause the regression.
Allow more obj(addr(lcl_var) foldings. (#42343)

@sandreenko
Copy link
Contributor

Thanks @briansull for the analysis, the change was merged after 5.0 so we can change the milestone to 6.0.
Do you mind if I take the issue from here?

@sandreenko sandreenko modified the milestones: 5.0.0, 6.0.0 Sep 22, 2020
@sandreenko sandreenko added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug and removed area-System.Buffers blocking-release labels Sep 22, 2020
@briansull
Copy link
Contributor

Sure

@sandreenko sandreenko self-assigned this Sep 22, 2020
@JulieLeeMSFT
Copy link
Member

Thanks @briansull for root-causing this issue.

@JulieLeeMSFT JulieLeeMSFT moved this from Needs Triage to Backlog in .NET Core CodeGen Sep 23, 2020
sandreenko pushed a commit to sandreenko/runtime that referenced this issue Sep 24, 2020
@sandreenko
Copy link
Contributor

My current understanding is that the optimization uncovered a preexisting issue with VN and morph,
the morph is not clearing DONT_CSE in fgMorphBlockOperand while folding ADDR(OBJ(LCL_VAR)), so LCL_VAR keeps DONT_CSE even no under address node:

block assignment to morph:
               [003011] -A--G-------              *  ASG       struct (copy)
               [003010] D----+-N----              +--*  LCL_VAR   struct<System.ReadOnlyMemory`1[Byte], 16> V09 loc8         
               [003008] n---G+------              \--*  OBJ       struct<System.ReadOnlyMemory`1[Byte], 16>
               [003016] -----+------                 \--*  ADDR      byref 
               [003017] -----+-N----                    \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V235 tmp222      
 with no promoted structs this requires a CopyBlock.

Local V09 should not be enregistered because: written in a block op

Local V235 should not be enregistered because: written in a block op
Replacing block node [003008] with lclVar V235

fgMorphCopyBlock (after):
               [003011] -A--G-------              *  ASG       struct (copy)
               [003010] D----+-N----              +--*  LCL_VAR   struct<System.ReadOnlyMemory`1[Byte], 16> V09 loc8         
               [003017] -----+-N----              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V235 tmp222 

there are no reasons for 003017 to keep DONT_CSE but it does and we are getting lucky avoiding incorrect optimizations later (cc @echesakovMSFT another example of DONT_CSE on the right side of ASG).

but the bug is in VN logic (003014 was copy propogated in the bad file):

diff --git "a/D:\\Sergey\\logs\\temp\\good" "b/D:\\Sergey\\logs\\temp\\bad"
index 50e7950fd58..e25dbf094db 100644
--- "a/D:\\Sergey\\logs\\temp\\good"
+++ "b/D:\\Sergey\\logs\\temp\\bad"
@@ -1,48 +1,35 @@
-***** BB95, STMT00656(before)
-N003 (  7,  5) [003014] -A------R---              *  ASG       struct (copy)
-N002 (  3,  2) [003012] D------N----              +--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V235 tmp222      
-N001 (  3,  2) [002934] ------------              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V234 tmp221      u:3 (last use)
-
-N001 [002934]   LCL_VAR   V234 tmp221      u:3 (last use) => $617 {PhiDef($ea, $3, $5c0)}
-LHS V235 not in ssa at [003012], so no VN assigned
-N003 [003014]   ASG       => $VN.Void
-
-***** BB95, STMT00656(after)
-N003 (  7,  5) [003014] -A------R---              *  ASG       struct (copy) $VN.Void
-N002 (  3,  2) [003012] D------N----              +--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V235 tmp222      
-N001 (  3,  2) [002934] ------------              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V234 tmp221      u:3 (last use) $617
-
----------
-
 ***** BB95, STMT00058(before)
 N003 (  7,  5) [003011] -A--G---R---              *  ASG       struct (copy)
 N002 (  3,  2) [003010] D------N----              +--*  LCL_VAR   struct<System.ReadOnlyMemory`1[Byte], 16> V09 loc8         d:3
-N001 (  3,  2) [003017] -------N----              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V235 tmp222       (last use)
+N001 (  3,  2) [003008] ------------              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V234 tmp221      u:3 (last use)
 
-N001 [003017]   LCL_VAR   V235 tmp222       (last use) => $4bc {4bc}
-Tree [003011] assigned VN to local var V09/3: new uniq $4bf {4bf}
+N001 [003008]   LCL_VAR   V234 tmp221      u:3 (last use) => $617 {PhiDef($ea, $3, $5c0)}
+Tree [003011] assigned VN to local var V09/3: $617 {PhiDef($ea, $3, $5c0)}
 N003 [003011]   ASG       => $VN.Void
 
 ***** BB95, STMT00058(after)
 N003 (  7,  5) [003011] -A--G---R---              *  ASG       struct (copy) $VN.Void
 N002 (  3,  2) [003010] D------N----              +--*  LCL_VAR   struct<System.ReadOnlyMemory`1[Byte], 16> V09 loc8         d:3
-N001 (  3,  2) [003017] -------N----              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V235 tmp222       (last use) $4bc
+N001 (  3,  2) [003008] ------------              \--*  LCL_VAR   struct<System.Memory`1[Byte], 16> V234 tmp221      u:3 (last use) $617
 
 N003 (  3,  4) [003030] -A------R---              *  ASG       ref   
 N002 (  1,  1) [003029] D------N----              +--*  LCL_VAR   ref    V239 tmp226      d:2
 N001 (  3,  4) [003028] ------------              \--*  LCL_FLD   ref    V09 loc8         u:3[+0] Fseq[_object]
 
   VNApplySelectors:
-    VNForHandle(_object) is $193, fieldType is ref                                                                                                                                                                        
-    VNForMapSelect($4bf, $193):ref returns $8f8 {$4bf[$193]}
+    VNForHandle(_object) is $193, fieldType is ref
+      AX2: $193 != $18c ==> select([$36a]store($368, $18c, $9ce), $193) ==> select($368, $193).
+      AX2: $193 != $18b ==> select([$368]store($24b, $18b, $40), $193) ==> select($24b, $193).
+      AX2: $193 != $18a ==> select([$24b]store($1, $18a, $8f3), $193) ==> select($1, $193).
+    VNForMapSelect($617, $193):ref returns $VN.Null
   VNApplySelectors:
     VNForHandle(_object) is $193, fieldType is ref
-    VNForMapSelect($4bf, $193):ref returns $8f8 {$4bf[$193]}
-N001 [003028]   LCL_FLD   V09 loc8         u:3[+0] Fseq[_object] => $8f8 {$4bf[$193]}
-N002 [003029]   LCL_VAR   V239 tmp226      d:2 => $8f8 {$4bf[$193]}
-N003 [003030]   ASG       => $8f8 {$4bf[$193]}
+    VNForMapSelect($617, $193):ref returns $VN.Null
+N001 [003028]   LCL_FLD   V09 loc8         u:3[+0] Fseq[_object] => $VN.Null
+N002 [003029]   LCL_VAR   V239 tmp226      d:2 => $VN.Null
+N003 [003030]   ASG       => $VN.Null
 
 ***** BB95, STMT00660(after)
-N003 (  3,  4) [003030] -A------R---              *  ASG       ref    $8f8
-N002 (  1,  1) [003029] D------N----              +--*  LCL_VAR   ref    V239 tmp226      d:2 $8f8
-N001 (  3,  4) [003028] ------------              \--*  LCL_FLD   ref    V09 loc8         u:3[+0] Fseq[_object] $8f8
+N003 (  3,  4) [003030] -A------R---              *  ASG       ref    $VN.Null
+N002 (  1,  1) [003029] D------N----              +--*  LCL_VAR   ref    V239 tmp226      d:2 $VN.Null
+N001 (  3,  4) [003028] ------------              \--*  LCL_FLD   ref    V09 loc8         u:3[+0] Fseq[_object] $VN.Null

maybe we were lucky with LHS V235 not in SSA at [003012], so no VN assigned because we have a limited number of variables in SSA and that one did not fit in the past.

The main problem that I see here is VNForMapSelect($617, $193):ref returns $VN.Null but it will probably take longer to understand and fix the underlaying issue so I opened a PR to disable the test temporary.

sandreenko pushed a commit that referenced this issue Sep 25, 2020
* Disable ArrayBufferWriterTests.Advance with JitStress.

Issue #42517

* Try to disable broader.
@briansull
Copy link
Contributor

I would expect that it is conservative to have DONT_CSE set on any node.
It should just prevent that particular node from participating in the CSE logic,
so we would miss an opportunity to use a LclVar in place of the expression that has the DONT_CSE set.

@briansull
Copy link
Contributor

ValueNumbering of field writes and subsequent reads are pretty tricky to debug.

If you need any help or want to assign this to me that is fine as well.

@briansull
Copy link
Contributor

briansull commented Sep 25, 2020

The main problem that I see here is VNForMapSelect($617, $193):ref returns $VN.Null

That means that we saw the definition assignment that assigned a nullptr and that we didn't kill/replace the value when it was changed by a second assignment.

@sandreenko
Copy link
Contributor

I would expect that it is conservative to have DONT_CSE set on any node.
It should just prevent that particular node from participating in the CSE logic,
so we would miss an opportunity to use a LclVar in place of the expression that has the DONT_CSE set.

Yes, but now we have a logic that depends on it to identify if a lcl var is store a src or dst (#40871). In this case I think it does not matter because it only affects structs and they can't be small types but if we find a case where it is set on a primitive struct we will be able to create a test that hits assert assert((tree->gtFlags & GTF_VAR_FOLDED_IND) == 0) after global morph.

@sandreenko
Copy link
Contributor

sandreenko commented Sep 25, 2020

ValueNumbering of field writes and subsequent reads are pretty tricky to debug.
If you need any help or want to assign this to me that is fine as well.

Thanks for the offer I will probably need your help soon. I have found this code in importer:

if (varTypeIsStruct(JITtype2varType(referentType)) &&
(varDsc->GetStructHnd() != referentClassHandle))
{
// We are returning a byref to struct1; the method signature specifies return type as
// byref
// to struct2. struct1 and struct2 are different so we are "reinterpreting" the struct.
// This may happen in, for example, System.Runtime.CompilerServices.Unsafe.As<TFrom,
// TTo>.
// We need to mark the source struct variable as having overlapping fields because its
// fields may be accessed using field handles of a different type, which may confuse
// optimizations, in particular, value numbering.
JITDUMP("\nSetting lvOverlappingFields to true on V%02u because of struct "
"reinterpretation\n",
addrChild->AsLclVarCommon()->GetLclNum());
varDsc->lvOverlappingFields = true;

and we had this flag set for V235 but after the change it was copy propogated to V234, which did not have the flag, and VN was confused.
This code was added in dotnet/coreclr#24482 and specifically for these tests System.Buffers.Tests.ArrayBufferWriterTests_Byte.Advance and System.Buffers.Tests.ArrayBufferWriterTests_Char.Advance.

So there are several possibilities here:

  1. teach value numbering to work with such reinterpretations, note that they are not taking pointers, they are just struct copies that change all fileds (invalidate previous VNs);
  2. do no copy propogate when lclVars have different lvOverlappingFields;
  3. or set lvOverlappingFields when propogate a lclvar with this flag to a local var without.

Of course, the first looks like the most attractive way to solve it but I think I need to write a repro first, using the one from @erozenfeld.

Updated the link to the PR.

@sandreenko
Copy link
Contributor

@briansull I wrote a small repro for this issue: sandreenko@ec2ae56

JitDump for Test2:
out.txt

The issue is Setting lvOverlappingFields to true on V05 because of struct reinterpretation
when our tree looks like:

***** BB04
STMT00012 (IL 0x027...  ???)
               [000052] -A----------              *  ASG       struct (copy)
               [000050] D------N----              +--*  LCL_VAR   struct<GitHub_24159.Str1, 20> V05 tmp1         
               [000019] n-----------              \--*  OBJ       struct<GitHub_24159.Str1, 20>
               [000018] ------------                 \--*  ADDR      byref 
               [000016] -------N----                    \--*  LCL_VAR   struct<GitHub_24159.Str1, 20> V00 loc0 

and the comment says:

     // We need to mark the source struct variable as having overlapping fields because its 
     // fields may be accessed using field handles of a different type, which may confuse 
     // optimizations, in particular, value numbering. 

so in this case we should mark V00, not V05

it means we now have another option:
4. fix the code in importation to set the flag to source, add asserts that we do not try to copy propagating structs with different lvOverlappingFields values.

What resoltuion would you prefer for it? Would it be profitable to invest time to teach VN to work with such cases or do you think they are too rare? I think it is a better long-term solution but its cost is unclear for me.

@briansull
Copy link
Contributor

briansull commented Sep 26, 2020

I don't think it is easy to fix ValueNumbering to handle overlapping fields.
The maps start from the reverse order, so writes into i2's can't change the values held by j3's

@BruceForstall
Copy link
Member

I presume this is still failing in the CI (except the CI has been broken), but there has been no update recently. What's the story? Should the failing tests be disabled until this is fixed?

@sandreenko
Copy link
Contributor

Hi Bruce, the test was disabled in #42674. It does not fail in the CI currently because of that.
There are some details in my internal reports and in the proposed fix #42806 but the issue needs more analysis in the assertion-based copy propagation, I suspect we can have more issues there.

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 7, 2020
.NET Core CodeGen automation moved this from Backlog (General) to Done Aug 24, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Projects
Archived in project
7 participants