Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks to prevent silent bad codegen from VN optimizations. #49504

Closed
wants to merge 5 commits into from

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Mar 11, 2021

VN optimizations are known to be a source of silent bad codegen issues, a good part of them happen because it works with maps [CORINFO_CLASS_HANDLE, CORINFO_FIELD_HANDLE] and when it allows us to access a class/struct with a field handle not from this class. Jit does not hit any asserts when it happens, as a result, we can have this:

N003 (  7, 11) [000016] -A------R----             *  ASG       float  $41
N002 (  3,  2) [000015] D------N-----             +--*  LCL_VAR   float  V05 tmp4         d:2 $41
N001 (  3,  6) [000014] -------------             \--*  CNS_DBL   float  8.5000000000000000 $41
***
N003 (  7,  9) [000099] -A------R----             |  |  +--*  ASG       float  $41
N002 (  3,  4) [000097] U------N-----             |  |  |  +--*  LCL_FLD   float  V02 tmp1         ud:1->2[+0] Fseq[X] $180
N001 (  3,  2) [000098] -------N-----             |  |  |  \--*  LCL_VAR   float  V05 tmp4         u:2 
***
N003 (  7,  9) [000115] -A------R----             |  |  +--*  ASG       float  $40
N002 (  3,  2) [000113] D------N-----             |  |  |  +--*  LCL_VAR   float  V09 tmp8         d:2 $40
N001 (  3,  4) [000114] -------------             |  |  |  \--*  LCL_FLD   float  V02 tmp1         u:5[+0] Fseq[X] $40

The bug here is that 000114's VN is $40, when it should be $180. The issue is that we access V02 in 000097 and 000114 with different FieldHandles, both have names [X] but one is not from that struct type, in the past, it was a silent bad codegen, now it would be an assert when we try to set such FldSeq for a LclVar + dump now shows the actual handle values, so it is clearer from the dump what happens.

Summary: our new contract for FldSeq on LclVar is that CORINFO_FIELD_HANDLE must belong to LclVar's CORINFO_CLASS_HANDLE, otherwise it has to be NotAField(). And each FldSeq list should dereference a field that exists in the previous class handle.

And when we see similar issues but not with LclFld I expect we broader these checks to other nodes that have FldSeq.

There are small asm diffs:

x64 crossgen libs
0 (0.00% of base)
0 total methods with Code Size differences (0 improved, 0 regressed), 201134 unchanged.

x64 pmi libs
Total bytes of delta: 125 (0.00% of base)
8 total methods with Code Size differences (2 improved, 6 regressed), 268234 unchanged.

arm64 altjit windows crossgen libs
Total bytes of delta: 0 (0.00% of base)
0 total methods with Code Size differences (0 improved, 0 regressed), 200625 unchanged.

arm64 altjit  windows pmi libs
Total bytes of delta: 172 (0.00% of base)
8 total methods with Code Size differences (2 improved, 6 regressed), 268234 unchanged.

x64 unix altjit windows crossgen libs
Total bytes of delta: 11 (0.00% of base)
1 total method with Code Size differences (0 improved, 1 regressed), 200625 unchanged.

x64 unix altjit windows pmi libs
Total bytes of delta: 162 (0.00% of base)
13 total methods with Code Size differences (3 improved, 10 regressed), 268229 unchanged.

the diffs are falling into two categories:

  1. we inline a method that takes a struct and use the struct as another type,
    for example:
Importing BB14 (PC=000) of 'Microsoft.CodeAnalysis.CSharp.MemberResolutionResult`1[__Canon][System.__Canon]:get_Member():System.__Canon:this'
    [ 0]   0 (0x000) ldarg.0
    [ 1]   1 (0x001) ldfld 0A00058C
    [ 1]   6 (0x006) ret

    Inlinee Return expression (before normalization)  =>
               [000134] ------------              *  FIELD     ref    _member
               [000132] ------------              \--*  ADDR      byref 
               [000133] -------N----                 \--*  LCL_VAR   struct<Microsoft.CodeAnalysis.CSharp.MemberResolutionResult`1[[Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], 48> V08 loc4

our this has type

Microsoft.CodeAnalysis.CSharp.MemberResolutionResult`1[[Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol, Microsoft.CodeAnalysis.CSharp

when we access it with a token for

Microsoft.CodeAnalysis.CSharp.MemberResolutionResult`1[__Canon][System.__Canon]

in the past we were setting CORINFO_FIELD_HANDLE from Canon to LclVar with MethodSymbol type, it could have led to errors if we accessed it using CORINFO_FIELD_HANDLE from MethodSymbol at the same time. Like:

Init V08 with 0;
V08._member = 1 using MethodSymbol handle;
read V08._member using Canon handle <- here VN would say that V08._member  is equal 0 because it would not see the previous assignment.

I believe we have enough deoptimizations in other places to be protected from such behavior, in this case, V08 is marked as address exposed so VN does not do optimizations with it.

  1. getFieldClass returs an unexpected value for some fields, I hope @davidwrighton could help me to understand such cases better:
CORINFO_FIELD_HANDLE: field = 0x000002b59440e928
CORINFO_CLASS_HANDLE: class = 0x000002b59440f650

m_fieldHnd
0x000002b59440e928 {...}
comp->info.compCompHnd->getFieldClass(m_fieldHnd)
0x000002b59440f3a8 {...} <- it says m_fieldHnd belongs to 0x000002b59440f3a8 
clsHnd
0x000002b59440f650 {...}
comp->info.compCompHnd->getFieldInClass(clsHnd, 0)
0x000002b59440e928 {...} <- it says 0x000002b59440e928  contains 0x000002b59440e928 , but why then 0x000002b59440e928 does not belong to it?

comp->info.compCompHnd->getClassName(clsHnd)
0x000002b594416e50 "System.ValueTuple`3[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral,...
comp->info.compCompHnd->getClassName(0x000002b59440f3a8)
0x000002b594415020 "System.ValueTuple`3[__Canon,__Canon,Boolean]"

This change improves our consistency for LCL_FLD but leaves many things unchanged:
For example, when we work with raw addresses we set FldSeq without any checks, it allows us to have smaller diffs and raw addresses are not involved in the most dangerous optimizations but could be used for CSE, like:

ADD
  LCL_VAR byref V01 <- we spilled an address of some local var, don't know its struct handle, it is just a pointer for us.
  CNST_INT 8 FldSeq someField <- allow any field seq to be here.

and if we have NotAFld CSE can't match it with the same ADD nodes, but if we have a real FldSeq it could be done.

And there are some questions that need to be addressed before the merge:

  1. the check for if (IsPseudoField()) is very optimistic, hope @briansull can tell me how they are used and how the check should be improved.
  2. compilation time impact, it looks expensive to call comp->info.compCompHnd->getFieldClass so often but we need to measure it to know for sure, probably we can save CORINFO_CLASS_HANDLE into FieldSeqNode, and then we would not need it but use a little extra memory.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 11, 2021
Sergey added 2 commits March 15, 2021 01:58
There are cases when two fields have the same name but different handles.
@@ -7754,10 +7757,10 @@ void Compiler::fgValueNumberTree(GenTree* tree)
rhsVNPair.SetBoth(vnStore->VNForExpr(compCurBB, lclVarTree->TypeGet()));
}

lclDefSsaNum = GetSsaNumForLocalVarDef(lclVarTree);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was one failure on a big method before this change, the case looked like this:

ASG(LCL_VAR V01, 0) <- def
ASG(LCL_FLD V01+4, 1) with NotAField <- both def and use
Read LCL_FLD V01+8 with LclFld <- use

and SSA created such numbers:

ASG(LCL_VAR V01, 0) <- lclDefSsaNum 0
ASG(LCL_FLD V01+4, 1) with NotAField <- lclDefSsaNum 1, use ssa num 0
Read LCL_FLD V01+8 with LclFld <- use ssa num 1

and when we did fgValueNumberTree for Read LCL_FLD V01+8 we expected data ssa num 1 to be initialized but it was not because we used ssa num 0 for ASG(LCL_FLD V01+4, 1).

@davidwrighton
Copy link
Member

@sandreenko fields on generic types are handled specially (when using the runtime instead of crossgen2) The general rule is that instance fields are not actually created as unique FieldDesc, the runtime simply uses the fields on the matching canonical type.

@sandreenko
Copy link
Contributor Author

@sandreenko fields on generic types are handled specially (when using the runtime instead of crossgen2) The general rule is that instance fields are not actually created as unique FieldDesc, the runtime simply uses the fields on the matching canonical type.

then can we support the old verifying functionality of

// Return the field's type, if it is CORINFO_TYPE_VALUECLASS 'structType' is set
// the field's value class (if 'structType' == 0, then don't bother
// the structure info).
//
// 'memberParent' is typically only set when verifying. It should be the
// result of calling getMemberParent.
virtual CorInfoType getFieldType(
CORINFO_FIELD_HANDLE field,
CORINFO_CLASS_HANDLE *structType = NULL,
CORINFO_CLASS_HANDLE memberParent = NULL /* IN */
) = 0;

on all VMs and use it here, what would be the result if I call :

  1. getFieldType(canonFieldHandle, null, concreteStructHandle that contains canonFieldHandle)?
  2. getFieldType(fieldHandle, null, structHandle that does not contain fieldHandle)?

@sandreenko
Copy link
Contributor Author

PTAL @briansull @BruceForstall @dotnet/jit-contrib that will probably be a discussion for some time rather than a ready-to-be-merged PR.

@sandreenko
Copy link
Contributor Author

Additional thoughts:

  1. VN would be more reliable if it used offset, type instead of FldSeq and Jit would not need to spend time and memory passing FldSeq around. However, it is would be an XL change and there are better modern approaches to do the same. So it is not worth time investing so much into improving the current model.

  2. often there is a possibility to find a FieldHandle in the current class for the given offset and type but we currently don't, for example:

struct A // StructHandle1
{
  int fA; // FldHandle1
}

struct B // StructHandle2
{
  int fB; // FldHandle2
}

B b = new B();
A a = Unsafe.As<A, B>(ref b); 
-> a promoted, b not-promoted ->
LclVar fA = IND(ADDR(b)) [NotAField]

because FldHandle1 does not belong to StructHandle2 and we don't try to find an appropriate FldHandle for that class that we have. It would probably too expensive compilation-time-wise.
but ideally we would do:

LclVar fA = LclFld b +0 [FldHandle2].

@davidwrighton
Copy link
Member

The current behavior of specifying the extra "owner" parameter to getFieldType is that it will be used to produce a more exact type handle. So... if you have the following class.

class TestClass<T>
{
    T _field;
}

Assuming that you are working with a TestClass<string> and fieldHnd points at TestClass<__Canon>._field, and exactTypeHandle is TestClass<string> then.

getFieldType(fieldHnd, &typeOfField, NULL) will result in typeOfField referring to __Canon
getFieldType(fieldHnd, &typeOfField, exactTypeHandle) will result in typeOfField referring to string

And if wrongTypeHandle refers to List<object>
getFieldType(fieldHnd, &typeOfField, wrongTypeHandle) will result in typeOfField referring to object

And if wrongTypeHandle refers to DateTime
getFieldType(fieldHnd, &typeOfField, wrongTypeHandle) will result in either a crash, or undefined behavior.

So, if we want a function to verify that a given field is a member of a type, we'll need to add a new function.

@sandreenko sandreenko mentioned this pull request Apr 9, 2021
10 tasks
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Apr 12, 2021
@sandreenko
Copy link
Contributor Author

ping myself

@sandreenko
Copy link
Contributor Author

will finish later.

@sandreenko sandreenko closed this May 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants