Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Arm Containment Cleanup #13579

Merged
merged 2 commits into from Aug 26, 2017
Merged

Conversation

CarolEidt
Copy link

  • Fix a couple of issues with TreeNodeInfoInit for struct arguments
  • Remove duplicate calls to LowerBlockStore from TreeNodeInfoInit for arm and arm64
  • Eliminate duplicative isMultiReg method from GenTree and fix condition for OperIsMultiRegOp()

@@ -3517,6 +3517,15 @@ static int ComputeOperandDstCount(GenTree* operand)
// pointers to argument setup stores.
return 0;
}
#ifdef _TARGET_ARMARCH_
else if (operand->OperIsPutArgStk())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to find why srcCount of PUTARG_STK is 0 instead of adding this case? I think setting of srcCount for PUTARG_STK has some problem on ARMARCH.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method, we are only interested in operands that either produce a register, or are contained. In the case of PUTARG_STK, it may have a srcCount of zero if its child is a GT_OBJ with a GT_LCL_VAR_ADDR child (see https://github.com/dotnet/coreclr/blob/master/src/jit/lower.cpp#L868)

Copy link

@hseok-oh hseok-oh Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct on lsra operation & codegen when we set srcCount for PUTARG_STK as 0 if its child is a GT_OBJ with a GT_LCL_VAR_ADDR child? In x64/x86, srcCount is set by using GetOperandSourceCount() (https://github.com/dotnet/coreclr/blob/master/src/jit/lsraxarch.cpp#L1635), but In arm32/arm64, srcCount is set by using only child's source (https://github.com/dotnet/coreclr/blob/master/src/jit/lsraarmarch.cpp#L670)

Copy link
Member

@hqueue hqueue Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt This code can address assertion #13565. However I also have a same question @hseok-oh had.

Here is an exmaple from #13565.
Before #13198, GT_PUTARG_STK had one srcCount and consume one register during LSRA.

// Before
N4533 (???,???) [043750] ----GO-------             *  PUTARG_STK [+0x08] stru|tmp550      [+4] $cbe
ct REG NA
+<TreeNodeInfo @ 4533 0=1 1i 0f src=[r0-r9 r12 lr] int=[r0-r9 r12 lr] dst=[r0-r9 r12 lr] I> 
..(suppressed)..
                 N4533.                    PUTARG_STK [+0x08]
                 consume=1 produce=0
...
                 N4555.                    CALL     
                 consume=4 produce=0  
...
                                                                             /--*  t22477 struct 
               Generating: N4533 (???,???) [043750] ----GO-------             *  PUTARG_STK [+0x08] struct REG NA
                                                                Byref regs: 0008 {r3} => 0000 {}
               IN0a2a:             ldr     r2, [r3]
               IN0a2b:             str     r2, [sp+0x08]        // [V1427 OutArgs+0x08]
               IN0a2c:             ldr     r2, [r3+4]
               IN0a2d:             str     r2, [sp+0x0c]        // [V1427 OutArgs+0x0c]
               IN0a2e:             ldr     r2, [r3+8]
               IN0a2f:             str     r2, [sp+0x10]        // [V1427 OutArgs+0x10]
               IN0a30:             ldr     r2, [r3+12]   
               IN0a31:             str     r2, [sp+0x14]        // [V1427 OutArgs+0x14]

Now, GT_PUTARG_STK has zero srcCount and raises assertion during LSRA.

// Now
N4533 (???,???) [043748] ----GO-------             *  PUTARG_STK [+0x08] struct REG NA
+<TreeNodeInfo @ 4533 0=0 1i 0f src=[r0-r9 r12 lr] int=[r0-r9 r12 lr] dst=[r0-r9 r12 lr] I>`
...
                 N4533.                    PUTARG_STK [+0x08]
                 consume=0 produce=0
// Is it expected behavior ?
// There is no consume for PUTARG_STK which is actually
// use one register (r3 in above example) as source
// and use one internal register (r2 in above example).
...
                 N4555.                    CALL
                 consume=4 produce=0
// An assertion raised while processing this CALL related to above PUTARG_STK
// And above fix can remove this assertion

...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the dump you provided, I can't tell what the child of the PUTARG_STK node is. Can you tell me which test this is from?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the child is a lclVar, there should be no source register; each word of the struct would be loaded from its stack location into the temp register (note the "1i" in the TreeNodeInfo which indicates that 1 internal register is used). Then it will be stored to the outgoing argument stack location.

Copy link
Member

@hqueue hqueue Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat late to this comment.
It seems that PUTARG_STK in above example has GT_ADD as a child.

               N003 (  1,  1) [022473] -------------    t22473 =    LCL_VAR   ref    V01 loc1         u:3 $2c0
               N004 (  1,  1) [025101] -------------    t25101 =    CNS_INT   int    56 field offset Fseq[m_cl_op2] $f50
                                                                 /--*  t22473 ref    
                                                                 +--*  t25101 int    
               N005 (  3,  3) [025102] -------------    t25102 = *  ADD       byref  $1044
                                                                 /--*  t25102 byref  
               N006 (  9,  7) [022477] ----GO-------    t22477 = *  OBJ(16)   struct <l:$896, c:$d04>
                                                                 /--*  t22477 struct 
                              [043750] ----GO-------             *  PUTARG_STK [+0x08] struct

What do you think of this case ?

@CarolEidt
Copy link
Author

@pgavlin @BruceForstall PTAL
cc @dotnet/jit-contrib
Tests are still running on my pi, but I've run a couple of tests by hand and verified that it addresses some instances of a couple of asserts. I'll update when the tests are complete.

@@ -1333,7 +1333,11 @@ struct GenTree
bool OperIsMultiRegOp() const
{
#if !defined(LEGACY_BACKEND) && defined(_TARGET_ARM_)
if (gtOper == GT_MUL_LONG || gtOper == GT_PUTARG_REG || gtOper == GT_COPY)
if (gtOper == GT_MUL_LONG ||
#ifdef SOFT_FP
Copy link
Member

@wateret wateret Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be #ifdef ARM_SOFTFP, looks like a typo.

And I think GT_COPY is MultiRegOp on armel only as well as GT_PUTARG_REG since these two nodes come together for double register argument passing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching the typo! I believe that GT_COPY should be enabled for both. At present, the register allocator doesn't insert copies of tree temps, but it may in future (e.g. to support conflicting register requirements, or to avoid a spill), in which case it will be needed if a GT_MUL_LONG is copied.

@@ -7030,7 +7030,7 @@ GenTreePtr Compiler::gtNewPutArgReg(var_types type, GenTreePtr arg, regNumber ar
assert(arg != nullptr);

GenTreePtr node = nullptr;
#if !defined(LEGACY_BACKEND) && defined(_TARGET_ARM_)
#if !defined(LEGACY_BACKEND) && defined(ARM_SOFTFP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change this way, I think we can remove the if statement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good.

@CarolEidt
Copy link
Author

Here are the test results from my pi:

19 total failures

11 Runtime Failures:

  • Exceptions.ForeignThread.ForeignThreadExceptions.ForeignThreadExceptions
  • GC.Scenarios.ServerModel.servermodel.servermodel
  • JIT.jit64.opt.cse.hugeexpr1.hugeexpr1
  • JIT.jit64.opt.cse.HugeField1.HugeField1
  • JIT.jit64.opt.rngchk.ArrayWithThread_o.ArrayWithThread_o
  • JIT.Methodical.doublearray.dblarray3_cs_d.dblarray3_cs_d
  • JIT.Methodical.doublearray.dblarray3_cs_do.dblarray3_cs_do
  • JIT.Methodical.doublearray.dblarray3_cs_r.dblarray3_cs_r
  • JIT.Methodical.doublearray.dblarray3_cs_ro.dblarray3_cs_ro
  • JIT.Methodical.ELEMENT_TYPE_IU._il_dbgsizeof._il_dbgsizeof
  • JIT.Methodical.ELEMENT_TYPE_IU._il_relsizeof._il_relsizeof

2 Assertion failed 'info->dstCount == 0' (lsraarm.cpp Line: 473)

  • JIT.Methodical.Boxing.boxunbox._il_dbgchain._il_dbgchain
  • JIT.Methodical.xxobj.ldobj._il_dbgldobj_I8._il_dbgldobj_I8

4 Assertion failed 'genActualType(op1->TypeGet()) == genActualType(op2->TypeGet()) || varTypeIsI(op1->TypeGet()) && varTypeIsI(op2->TypeGet()) || varTypeIsFloating(op1->gtType) && varTypeIsFloating(op2->gtType) : Possibly bad IL with CEE_ceq at offset 00FAh (op1=long op2=int stkDepth=0)' (importer.cpp Line: 11629):

  • JIT.Methodical.ELEMENT_TYPE_IU._il_dbgi_array_merge._il_dbgi_array_merge
  • JIT.Methodical.ELEMENT_TYPE_IU._il_dbgu_array_merge._il_dbgu_array_merge
  • JIT.Methodical.ELEMENT_TYPE_IU._il_reli_array_merge._il_reli_array_merge
  • JIT.Methodical.ELEMENT_TYPE_IU._il_relu_array_merge._il_relu_array_merge

2 Assertion failed '(op2 == idx) || (op2->gtEffectiveVal() == idx)' (gentree.cpp Line: 3846)

  • JIT.SIMD.VectorConvert_ro.VectorConvert_ro
  • JIT.SIMD.VectorConvert_r.VectorConvert_r

- Fix a couple of issues with TreeNodeInfoInit for struct arguments
- Remove duplicate calls to LowerBlockStore from TreeNodeInfoInit for arm and arm64
- Eliminate duplicative isMultiReg method from GenTree and fix condition for OperIsMultiRegOp()
@wateret
Copy link
Member

wateret commented Aug 25, 2017

@CarolEidt AFAIR the assertion from importer.cpp shows when we run with Windows.x64 TC build. These tests are fine with Windows.x86 build. And there may be several runtime failures due to TC build arch. (Anyway I'm not sure if throwing this assertion is right behaviour.)

@CarolEidt
Copy link
Author

@wateret thanks! All the JIT.Methodical.ELEMENT_TYPE_IU tests pass with the x86 test build - I should have known that I would need to download tests with the same bitness. I'm now re-running the other failing tests.

@CarolEidt
Copy link
Author

Anyway I'm not sure if throwing this assertion is right behavior

There are (unverifiable) tests that depend on the size of native int. I haven't verified that these tests fall into that category, but it would explain that assertion, which is complaining about type mismatches.

assert(curArgTabEntry);

assert(curArgTabEntry->regNum == REG_STK);
INDEBUG(assert(curArgTabEntry->regNum == REG_STK));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these asserts need to be surrounded by INDEBUG? assert is always DEBUG only.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Thanks, I'll remove the INDEBUG.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit

@CarolEidt
Copy link
Author

Here's an update after using the Windows/x86 test build instead of the x64 one (though I only re-ran the tests that had failed before, assuming there are no new failures):

12 total failures

8 Runtime Failures:

  • Exceptions.ForeignThread.ForeignThreadExceptions.ForeignThreadExceptions
  • GC.Scenarios.ServerModel.servermodel.servermodel
  • JIT.jit64.opt.cse.hugeexpr1.hugeexpr1
  • JIT.jit64.opt.cse.HugeField1.HugeField1
  • JIT.Methodical.doublearray.dblarray3_cs_d.dblarray3_cs_d
  • JIT.Methodical.doublearray.dblarray3_cs_do.dblarray3_cs_do
  • JIT.Methodical.doublearray.dblarray3_cs_r.dblarray3_cs_r
  • JIT.Methodical.doublearray.dblarray3_cs_ro.dblarray3_cs_ro

2 Assertion failed 'info->dstCount == 0' (lsraarm.cpp Line: 473)

  • JIT.Methodical.Boxing.boxunbox._il_dbgchain._il_dbgchain
  • JIT.Methodical.xxobj.ldobj._il_dbgldobj_I8._il_dbgldobj_I8

2 Assertion failed '(op2 == idx) || (op2->gtEffectiveVal() == idx)' (gentree.cpp Line: 3846)

  • JIT.SIMD.VectorConvert_ro.VectorConvert_ro
  • JIT.SIMD.VectorConvert_r.VectorConvert_r

@BruceForstall
Copy link
Member

I don't see any issue for the info->dstCount == 0 assert. Can you open one? The op2 == idx assert is #13422. I don't see issues for the runtime failures. (fyi, issue summary here: https://github.com/dotnet/coreclr/projects/4)

@CarolEidt
Copy link
Author

@dotnet-bot test Tizen armel Cross Debug Build

@CarolEidt
Copy link
Author

I don't see any issue for the info->dstCount == 0 assert. Can you open one?

#13603

I don't see issues for the runtime failures.

I opened #13604

@CarolEidt CarolEidt merged commit 18ee973 into dotnet:master Aug 26, 2017
@CarolEidt CarolEidt deleted the RemoveDupLowerBlockStore branch August 26, 2017 00:05
@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
7 participants