Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Libraries PGO tests failing with System.BadImageFormatException #56655

Closed
AndyAyersMS opened this issue Jul 31, 2021 · 9 comments · Fixed by #56756
Closed

Libraries PGO tests failing with System.BadImageFormatException #56655

AndyAyersMS opened this issue Jul 31, 2021 · 9 comments · Fixed by #56756
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

Seems to mainly pop up in the DCS tests -- example.

Haven't tried to repro yet.

@AndyAyersMS AndyAyersMS self-assigned this Jul 31, 2021
@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 31, 2021
@AndyAyersMS AndyAyersMS added this to Needs Triage in .NET Core CodeGen via automation Jul 31, 2021
@AndyAyersMS AndyAyersMS moved this from Needs Triage to PGO in .NET Core CodeGen Jul 31, 2021
@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Jul 31, 2021
@AndyAyersMS
Copy link
Member Author

cc @EgorBo in case you get a chance to look before I do...

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Jul 31, 2021
@EgorBo
Copy link
Member

EgorBo commented Jul 31, 2021

cc @EgorBo in case you get a chance to look before I do...

OOF this week without a laptop, sorry

@AndyAyersMS
Copy link
Member Author

Via JitEnablePGORange I isolated it down to

MemberInfoCache`1[__Canon][System.__Canon]:PopulateProperties(Filter):System.Reflection.RuntimePropertyInfo[]:this

Not sure yet what in that method causes trouble.

@AndyAyersMS
Copy link
Member Author

Note this method is on the stack when the test fails, so debugging the failure seems prudent:

   DataContractSerializerTests.DCS_BasicPerSerializerRoundTripAndCompare_DataSet [FAIL]
      System.BadImageFormatException : An attempt was made to load a program with an incorrect format. (0x8007000B (COR_E_BADIMAGEFORMAT))
      Stack Trace:
           at System.Reflection.MetadataImport._GetPropertyProps(IntPtr scope, Int32 mdToken, Void** name, Int32& propertyAttributes, ConstArray& signature)
           at System.Reflection.RuntimePropertyInfo..ctor(Int32 tkProperty, RuntimeType declaredType, RuntimeTypeCache reflectedTypeCache, Boolean& isPrivate) in System.Private.CoreLib.dll:token 0x6005489+0x8e
           at System.RuntimeType.RuntimeTypeCache.MemberInfoCache`1.PopulateProperties(Filter filter) in System.Private.CoreLib.dll:token 0x6000743+0x0
           at System.RuntimeType.RuntimeTypeCache.MemberInfoCache`1.GetListByName(Char* pName, Int32 cNameLen, Byte* pUtf8Name, Int32 cUtf8Name, MemberListType listType, CacheType cacheType) in System.Private.CoreLib.dll:token 0x6000735+0x96
           at System.RuntimeType.RuntimeTypeCache.MemberInfoCache`1.Populate(String name, MemberListType listType, CacheType cacheType) in System.Private.CoreLib.dll:token 0x6000734+0x0
           at System.RuntimeType.GetPropertyCandidates(String name, BindingFlags bindingAttr, Type[] types, Boolean allowPrefixLookup) in System.Private.CoreLib.dll:token 0x6000684+0x7f
           at System.RuntimeType.GetProperties(BindingFlags bindingAttr) in System.Private.CoreLib.dll:token 0x600068a+0xb

@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Jul 31, 2021
@AndyAyersMS
Copy link
Member Author

Failure is not tied to PGO data directly -- just the inlining that PGO engenders. That is if I use log replay to do the exact inlines that PGO does, but erase the PGO data, the failure persists.

Trimming down the set of inlines, the key inline seems to be the last one done here:

**************** Inline Tree
Inlines into 06000743 [via ReplayPolicy] MemberInfoCache`1[__Canon][System.__Canon]:PopulateProperties(Filter):System.Reflection.RuntimePropertyInfo[]:this
  [1 IL=0001 TR=000001 06000746] [accepted by log replay] MemberInfoCache`1[__Canon][System.__Canon]:get_ReflectedType():System.RuntimeType:this
    [2 IL=0006 TR=000102 0600071C] [accepted by log replay] RuntimeTypeCache:GetRuntimeType():System.RuntimeType:this
  [3 IL=0012 TR=000005 06006435] [accepted by log replay] System.Diagnostics.Debug:Assert(bool)
    [4 IL=0011 TR=000116 06006438] [accepted by log replay] System.Diagnostics.Debug:Assert(bool,System.String,System.String)
      [0 IL=0005 TR=000126 0600643D] [FAILED: noinline per IL/cached result] System.Diagnostics.Debug:Fail(System.String,System.String)
  [5 IL=0018 TR=000008 06000746] [accepted by log replay] MemberInfoCache`1[__Canon][System.__Canon]:get_ReflectedType():System.RuntimeType:this
    [6 IL=0006 TR=000133 0600071C] [accepted by log replay] RuntimeTypeCache:GetRuntimeType():System.RuntimeType:this
  [7 IL=0031 TR=000015 06006435] [accepted by log replay] System.Diagnostics.Debug:Assert(bool)
    [8 IL=0011 TR=000147 06006438] [accepted by log replay] System.Diagnostics.Debug:Assert(bool,System.String,System.String)
      [0 IL=0005 TR=000157 0600643D] [FAILED: noinline per IL/cached result] System.Diagnostics.Debug:Fail(System.String,System.String)
  [0 IL=0045 TR=000022 0600059A] [FAILED: noinline per IL/cached result] System.RuntimeTypeHandle:IsInterface(System.RuntimeType):bool
  [0 IL=0055 TR=000048 060006FF] [FAILED: rejected by log replay] System.RuntimeType:GetBaseType():System.RuntimeType:this
  [0 IL=0062 TR=000043 060005BE] [FAILED: noinline per IL/cached result] System.RuntimeTypeHandle:IsGenericVariable(System.RuntimeType):bool
  [9 IL=0071 TR=000054 0600072F] [accepted by log replay] Filter:CaseSensitive():bool:this
  [10 IL=0078 TR=000096 06006CB1] [accepted by log replay] System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:.ctor():this
    [11 IL=0003 TR=000172 06006CB4] [accepted by log replay] System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:.ctor(int,System.Collections.Generic.IEqualityComparer`1[__Canon]):this
      [12 IL=0001 TR=000174 06000557] [accepted by log replay] System.Object:.ctor():this
      [13 IL=0080 TR=000196 06006E2F] [accepted by log replay] System.Collections.Generic.NonRandomizedStringEqualityComparer:GetStringComparer(System.Object):System.Collections.Generic.IEqualityComparer`1[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]
        [14 IL=0010 TR=000219 06001C57] [accepted by log replay] System.StringComparer:get_Ordinal():System.StringComparer
        [15 IL=0024 TR=000224 06001C58] [accepted by log replay] System.StringComparer:get_OrdinalIgnoreCase():System.StringComparer
  [0 IL=0088 TR=000067 06000593] [FAILED: noinline per IL/cached result] System.RuntimeTypeHandle:GetNumVirtuals(System.RuntimeType):int
  [16 IL=0106 TR=000080 06000744] [accepted by log replay] MemberInfoCache`1[__Canon][System.__Canon]:PopulateProperties(Filter,System.RuntimeType,System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Generic.List`1[[System.Reflection.RuntimePropertyInfo, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],System.Boolean[],byref):this

This method is large (> 500 bytes of IL) but gets large benefit boosts:

1 arguments are structs passed by value.  Multiplier increased to 2.
Inline candidate has arg that feeds range check.  Multiplier increased to 3.
Inline candidate has const arg that feeds a conditional.  Multiplier increased to 6.
Inline has 1 intrinsics.  Multiplier increased to 7.3.
Inline candidate has an arg that feeds a constant test.  Multiplier increased to 8.3.
Inline candidate callsite is in a loop.  Multiplier increased to 11.3.
Callsite has profile data: 3.00952.  Multiplier limited to 50.85.
Inline has 3 backward jumps (loops?).  Multiplier decreased to 35.595.
calleeNativeSizeEstimate=8625
callsiteNativeSizeEstimate=275
benefit multiplier=35.595
threshold=9788
Native estimate for function size is within threshold for inlining 862.5 <= 978.8 (multiplier = 35.595)

I may try to trim down the set of inlines further; suspect maybe just this inline is enough.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Aug 2, 2021

Ok, think I've found the bug.

scope.EnumNestedTypes(tkEnclosingType, out MetadataEnumResult tkNestedClasses);

The code above initializes a MetadatEnumResult via a qcall. Later we index into the struct via

public int this[int index]
{
get
{
Debug.Assert(0 <= index && index < Length);
if (largeResult != null)
return largeResult[index];
fixed (int* p = smallResult)
return p[index];
}
}
}

Note this looks at one part of the struct or another. So far so good.

But when we inline the caller that initializes the struct at a call site within a loop, the struct is not zeroed before each call, and the qcall does not clear out the part of the struct state it's not interested in. So we can end up looking at stale data from a previous call, which happens to be a token value for a property that does not make sense, and this causes the property props code to throw a bad format exception.

This is some fallout from skipping zero-init in SPC; any qcall that initializes a struct must now initialize the entire thing (or enough of it to avoid a problem like this).

@AndyAyersMS
Copy link
Member Author

This change fixes the issue:

diff --git a/src/coreclr/vm/managedmdimport.cpp b/src/coreclr/vm/managedmdimport.cpp
index 6bea537f1c8..ec2aef0cc03 100644
--- a/src/coreclr/vm/managedmdimport.cpp
+++ b/src/coreclr/vm/managedmdimport.cpp
@@ -163,6 +163,7 @@ static int * EnsureResultSize(MetadataEnumResult * pResult, ULONG length)
     else
     {
         ZeroMemory(pResult->smallResult, sizeof(pResult->smallResult));
+        pResult->largeResult = NULL;
         p = pResult->smallResult;
     }

@AndyAyersMS
Copy link
Member Author

cc @davidfowl @AaronRobinsonMSFT

@AaronRobinsonMSFT
Copy link
Member

This is some fallout from skipping zero-init in SPC; any qcall that initializes a struct must now initialize the entire thing (or enough of it to avoid a problem like this).

That is interesting and something the interop team should be aware of. Thanks for the analysis.

@jkoritzinsky @elinor-fung Might be worth auditing some of our QCalls since we tend to do a lot of initialization/reading for the older inertop paths.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Aug 3, 2021
…Enum

`MetadataEnumResult` has a fixed inline buffer for returning small results
and a pointer to allow it to return larger ones. The indexer for this
checks the pointer and if non-null assumes that's the current set of
values.

But if a `MetadataEnumResult` is re-used within a loop, values written to it
by `MetaDataImport::Enum` may bleed from one loop iteration to the next
if the iterations first get a large result and then a small one.

One case where this could happen was in libraries PGO tests, where PGO data
encouraged the jit to inline `MemberInfoCache<T>.PopulateProperties(Filter,...)`
into `MemberInfoCache<T>.PopulateProperties(Filter)`.

Note this also is a conseqeunce of skipping zero init locals; without that
the struct would have been zeroed each loop iteration.

Fixes dotnet#56655.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 3, 2021
.NET Core CodeGen automation moved this from PGO to Done Aug 3, 2021
AndyAyersMS added a commit that referenced this issue Aug 3, 2021
…Enum (#56756)

`MetadataEnumResult` has a fixed inline buffer for returning small results
and a pointer to allow it to return larger ones. The indexer for this
checks the pointer and if non-null assumes that's the current set of
values.

But if a `MetadataEnumResult` is re-used within a loop, values written to it
by `MetaDataImport::Enum` may bleed from one loop iteration to the next
if the iterations first get a large result and then a small one.

One case where this could happen was in libraries PGO tests, where PGO data
encouraged the jit to inline `MemberInfoCache<T>.PopulateProperties(Filter,...)`
into `MemberInfoCache<T>.PopulateProperties(Filter)`.

Note this also is a conseqeunce of skipping zero init locals; without that
the struct would have been zeroed each loop iteration.

Fixes #56655.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants