ARM: JIT crash #1397

Closed
kant2002 opened this issue Aug 8, 2021 · 22 comments
Labels
area-NativeAOT-coreclr .NET runtime optimized for ahead of time compilation

Comments

@kant2002
Contributor

kant2002 commented Aug 8, 2021

The crash happens during compilation of the samples, with an ARM build from 4424f85.

It occurs while publishing the HelloWorld sample, with no modifications except a local feed change.

  /home/pi/runtimelab/src/coreclr/jit/lclvars.cpp:2618
  Assertion failed 'varTypeIsStruct(varDsc)' in 'System.SpanHelpers:Fill(byref,int,int)' during 'Morph - Global' (IL size 1139)
@kant2002
Contributor Author

kant2002 commented Aug 8, 2021

Not sure, but maybe #1388 is gone. I may have changed the clang version on the new Raspberry Pi, or maybe something changed in the runtime repo.

@jkotas
Member

jkotas commented Aug 8, 2021

What is the T that it is crashing on? The same crash is likely going to repro on regular CoreCLR as well.

@jkotas jkotas added the area-NativeAOT-coreclr .NET runtime optimized for ahead of time compilation label Aug 8, 2021
@kant2002
Contributor Author

T is int. The code below is enough to trigger the crash.

var x = new int[10];
var s = new Span<int>(x);
s.Fill(0);

Also, #1388 is not gone: if I publish an empty console app, I still get a segmentation fault.

@kant2002
Contributor Author

I cannot reproduce this issue outside of NativeAOT. A regular dotnet run just works. What route should I take?

@jkotas
Member

jkotas commented Aug 12, 2021

I would capture JIT dumps of the method in NativeAOT and in non-Native (make sure to turn off tiered JIT and R2R, and use same CoreLib build flavor in both cases), and then compare the two JIT dumps to find out why things are different in NativeAOT.

@kant2002
Contributor Author

I think I am close to finding the root cause of the issue. Here is part of the JIT dump from NativeAOT.

Morphing BB41 of 'System.SpanHelpers:Fill(byref,int,int)'

fgMorphTree BB41, STMT00093 (before)
               [000358] S-CXG--------             *  CALL      void   System.Runtime.Intrinsics.Vector256.AsVector
               [000361] ------------- arg0        +--*  LCL_VAR_ADDR byref  V10 loc7         
               [000357] --CXG-------- arg1        \--*  OBJ       struct<System.Runtime.Intrinsics.Vector256`1[System.Byte], 32>
               [000721] -------------                \--*  ADDR      byref 
               [000722] -------N-----                   \--*  LCL_VAR   int   (AX) V09 loc6         
Initializing arg info for 358.CALL:
ArgTable for 358.CALL after fgInitArgInfo:
fgArgTabEntry[arg 0 361.LCL_VAR_ADDR int (By ref), 1 reg: r0, byteAlignment=4]
fgArgTabEntry[arg 1 357.OBJ struct (By value), 2 regs: r2 r3, numSlots=6, slotNum=0, byteSize=32, byteOffset=0, byteAlignment=8, isSplit, isStruct]

Morphing args for 358.CALL:
argSlots=10, preallocatedArgCount=6, nextSlotNum=6, nextSlotByteOffset=24, outgoingArgSpaceSize=24

Sorting the arguments:
Deferred argument ('r2'):
     (  9,  7) [000357] n---G--------             *  OBJ       struct<System.Runtime.Intrinsics.Vector256`1[System.Byte], 32>
     (  3,  3) [000721] -------------             \--*  ADDR      byref 
     (  3,  2) [000722] ----G--N-----                \--*  LCL_VAR   int   (AX) V09 loc6         
Replaced with placeholder node:
               [000986] -----------L-             *  ARGPLACE  struct => [clsHnd=0042E8F8]
Deferred argument ('r0'):
     (  3,  3) [000361] -------------             *  LCL_VAR_ADDR int    V10 loc7         
Replaced with placeholder node:
               [000987] -----------L-             *  ARGPLACE  int   

Shuffled argument table:    r2 r0 

Local V09 should not be enregistered because: it is a struct arg
D:\d\kant\GitHub\dotnet\runtimelab\src\coreclr\jit\lclvars.cpp:2618
Assertion failed 'varTypeIsStruct(varDsc)' in 'System.SpanHelpers:Fill(byref,int,int)' during 'Morph - Global' (IL size 1139)

It seems that this happens when this line is processed:

vector = Unsafe.As<T, Vector256<byte>>(ref tmp).AsVector();
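
To make the argument table in the dump easier to read, here is a rough sketch of how a 32-byte struct ends up split between registers and the stack on ARM32. This is a simplified model, not the JIT's real code: `ArgLayout` and `PlaceArg` are hypothetical names, and the model assumes four core argument registers (r0 to r3), 4-byte slots, and that an 8-byte-aligned argument starts at an even register.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified model of ARM32 argument placement (illustration only).
struct ArgLayout
{
    int firstReg;   // first core register used (0 means r0), or -1 if none
    int numRegs;    // core registers holding part of the argument
    int stackSlots; // 4-byte stack slots holding the remainder
};

// nextReg is the first free core register before this argument is placed;
// it is advanced past the registers the argument consumes.
ArgLayout PlaceArg(int& nextReg, int byteSize, int byteAlignment)
{
    const int slotSize = 4; // assumed slot size on ARM32
    const int maxRegs  = 4; // assumed argument registers r0..r3
    assert(byteSize > 0);

    int slots = (byteSize + slotSize - 1) / slotSize;

    // An 8-byte-aligned argument skips an odd register so it starts at r0 or r2.
    if (byteAlignment == 8 && (nextReg % 2) != 0)
    {
        nextReg++;
    }

    // Whatever does not fit in the remaining registers is split onto the stack.
    int regs = (nextReg < maxRegs) ? std::min(slots, maxRegs - nextReg) : 0;
    ArgLayout layout = {regs > 0 ? nextReg : -1, regs, slots - regs};
    nextReg += regs;
    return layout;
}
```

Starting with nextReg = 0, PlaceArg(nextReg, 4, 4) puts arg 0 in r0, and PlaceArg(nextReg, 32, 8) then skips r1, uses r2 and r3, and leaves 6 slots (24 bytes) on the stack. That agrees with "2 regs: r2 r3, numSlots=6, isSplit" and outgoingArgSpaceSize=24 in the dump, assuming numSlots there counts the stack part.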

@kant2002
Contributor Author

The stack trace for the issue follows.

clrjit_unix_arm_x64.dll!assertAbort(const char * why, const char * file, unsigned int line) Line 306
	at runtimelabsrc\coreclr\jit\error.cpp(306)
clrjit_unix_arm_x64.dll!Compiler::lvaSetVarDoNotEnregister(unsigned int varNum, Compiler::DoNotEnregisterReason reason) Line 2618
	at runtimelabsrc\coreclr\jit\lclvars.cpp(2618)
clrjit_unix_arm_x64.dll!Compiler::fgMorphMultiregStructArg(GenTree * arg, fgArgTabEntry * fgEntryPtr) Line 4503
	at runtimelabsrc\coreclr\jit\morph.cpp(4503)
clrjit_unix_arm_x64.dll!Compiler::fgMorphMultiregStructArgs(GenTreeCall * call) Line 4400
	at runtimelabsrc\coreclr\jit\morph.cpp(4400)
clrjit_unix_arm_x64.dll!Compiler::fgMorphArgs(GenTreeCall * call) Line 4294
	at runtimelabsrc\coreclr\jit\morph.cpp(4294)
clrjit_unix_arm_x64.dll!Compiler::fgMorphCall(GenTreeCall * call) Line 9321
	at runtimelabsrc\coreclr\jit\morph.cpp(9321)
clrjit_unix_arm_x64.dll!Compiler::fgMorphTree(GenTree * tree, Compiler::MorphAddrContext * mac) Line 14939
	at runtimelabsrc\coreclr\jit\morph.cpp(14939)
clrjit_unix_arm_x64.dll!Compiler::fgMorphStmts(BasicBlock * block, bool * lnot, bool * loadw) Line 15810
	at runtimelabsrc\coreclr\jit\morph.cpp(15810)
clrjit_unix_arm_x64.dll!Compiler::fgMorphBlocks() Line 16063
	at runtimelabsrc\coreclr\jit\morph.cpp(16063)
clrjit_unix_arm_x64.dll!Compiler::compCompile::__l2::<lambda>() Line 4856
	at runtimelabsrc\coreclr\jit\compiler.cpp(4856)
clrjit_unix_arm_x64.dll!ActionPhase<void <lambda>(void)>::DoPhase() Line 65
	at runtimelabsrc\coreclr\jit\phase.h(65)
clrjit_unix_arm_x64.dll!Phase::Run() Line 61
	at runtimelabsrc\coreclr\jit\phase.cpp(61)
clrjit_unix_arm_x64.dll!DoPhase<void <lambda>(void)>(Compiler * _compiler, Phases _phase, Compiler::compCompile::__l2::void <lambda>(void) _action) Line 79
	at runtimelabsrc\coreclr\jit\phase.h(79)
clrjit_unix_arm_x64.dll!Compiler::compCompile(void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 4911
	at runtimelabsrc\coreclr\jit\compiler.cpp(4911)
clrjit_unix_arm_x64.dll!Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 6411
	at runtimelabsrc\coreclr\jit\compiler.cpp(6411)
clrjit_unix_arm_x64.dll!`Compiler::compCompile'::`73'::__Body::Run(Compiler::compCompile::__l2::__JITParam * __JITpParam) Line 5684
	at runtimelabsrc\coreclr\jit\compiler.cpp(5684)
clrjit_unix_arm_x64.dll!Compiler::compCompile(CORINFO_MODULE_STRUCT_ * classPtr, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 5688
	at runtimelabsrc\coreclr\jit\compiler.cpp(5688)
clrjit_unix_arm_x64.dll!``jitNativeCode'::`8'::__Body::Run'::`6'::__Body::Run(jitNativeCode::__l8::__Body::Run::__l5::__JITParam * __JITpParam) Line 7054
	at runtimelabsrc\coreclr\jit\compiler.cpp(7054)
clrjit_unix_arm_x64.dll!`jitNativeCode'::`8'::__Body::Run(jitNativeCode::__l2::__JITParam * __JITpParam) Line 7057
	at runtimelabsrc\coreclr\jit\compiler.cpp(7057)
clrjit_unix_arm_x64.dll!jitNativeCode(CORINFO_METHOD_STRUCT_ * methodHnd, CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags, void * inlineInfoPtr) Line 7081
	at runtimelabsrc\coreclr\jit\compiler.cpp(7081)
clrjit_unix_arm_x64.dll!CILJit::compileMethod(ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, unsigned int flags, unsigned char * * entryAddress, unsigned int * nativeSizeOfCode) Line 276
	at runtimelabsrc\coreclr\jit\ee_il_dll.cpp(276)
jitinterface_x64.dll!JitCompileMethod(CorInfoExceptionClass * * ppException, ICorJitCompiler * pJit, void * thisHandle, void * * callbacks, CORINFO_METHOD_INFO * methodInfo, unsigned int flags, unsigned char * * entryAddress, unsigned int * nativeSizeOfCode) Line 36
	at runtimelabsrc\coreclr\tools\aot\jitinterface\jitwrapper.cpp(36)
clrjit_unix_arm_x64.dll!00007ffbefcebd38()
[Managed to Native Transition]
ILCompiler.RyuJit.dll!Internal.JitInterface.CorInfoImpl.CompileMethodInternal(ILCompiler.DependencyAnalysis.IMethodNode methodCodeNodeNeedingCode, Internal.IL.MethodIL methodIL) Line 298
	at runtimelabsrc\coreclr\tools\Common\JitInterface\CorInfoImpl.cs(298)
ILCompiler.RyuJit.dll!Internal.JitInterface.CorInfoImpl.CompileMethod(ILCompiler.DependencyAnalysis.MethodCodeNode methodCodeNodeNeedingCode, Internal.IL.MethodIL methodIL) Line 62
	at runtimelabsrc\coreclr\tools\aot\ILCompiler.RyuJit\JitInterface\CorInfoImpl.RyuJit.cs(62)
ILCompiler.RyuJit.dll!ILCompiler.RyuJitCompilation.CompileSingleMethod(Internal.JitInterface.CorInfoImpl corInfo, ILCompiler.DependencyAnalysis.MethodCodeNode methodCodeNodeNeedingCode) Line 173
	at runtimelabsrc\coreclr\tools\aot\ILCompiler.RyuJit\Compiler\RyuJitCompilation.cs(173)
ILCompiler.RyuJit.dll!ILCompiler.RyuJitCompilation.CompileMultiThreaded.AnonymousMethod__15_0(object m) Line 135
	at runtimelabsrc\coreclr\tools\aot\ILCompiler.RyuJit\Compiler\RyuJitCompilation.cs(135)

Based on what I see, I think this is related to FEATURE_MULTIREG_ARGS. My primary suspect right now is this block.

if ((size > 1) || (fgEntryPtr->IsHfaArg() && argx->TypeGet() == TYP_STRUCT))
{
    foundStructArg = true;
    if (varTypeIsStruct(argx) && !argx->OperIs(GT_FIELD_LIST))
    {
        if (fgEntryPtr->IsHfaRegArg())
        {
            var_types hfaType = fgEntryPtr->GetHfaType();
            unsigned structSize;
            if (argx->OperIs(GT_OBJ))
            {
                structSize = argx->AsObj()->GetLayout()->GetSize();
            }
            else if (varTypeIsSIMD(argx))
            {
                structSize = genTypeSize(argx);
            }
            else
            {
                assert(argx->OperIs(GT_LCL_VAR));
                structSize = lvaGetDesc(argx->AsLclVar()->GetLclNum())->lvExactSize;
            }
            assert(structSize > 0);
            if (structSize == genTypeSize(hfaType))
            {
                if (argx->OperIs(GT_OBJ))
                {
                    argx->SetOper(GT_IND);
                }
                argx->gtType = hfaType;
            }
        }
        GenTree* newArgx = fgMorphMultiregStructArg(argx, fgEntryPtr);

Specifically, line 4369 is skipped, and as a result the tree is not transformed. At least this is one possible scenario if the issue is close to the execution path. If for some reason the issue is in a different phase, I do not know where to look right now. I am temporarily without a Raspberry Pi, so I cannot get a JIT dump for a regular runtime application.
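
For readers following along, the failure can be pictured with a toy model. The dump shows V09 typed as int, while the argument is an OBJ of a 32-byte struct layered over its address; the stack trace shows fgMorphMultiregStructArg calling lvaSetVarDoNotEnregister, and the dump prints "it is a struct arg" right before the assert. The sketch below is only an illustration under those observations: `VarType`, `LocalVar`, and `TrySetDoNotEnregister` are hypothetical stand-ins, not the real RyuJIT types, and the function returns false where the real code would assert.

```cpp
#include <cassert>

// Hypothetical stand-ins for JIT concepts (illustration only, not RyuJIT code).
enum class VarType { Int, Struct };
enum class DoNotEnregisterReason { AddrExposed, IsStructArg };

struct LocalVar
{
    VarType type;
    bool    doNotEnregister;
};

bool IsStructType(VarType t)
{
    return t == VarType::Struct;
}

// Models the check: the "struct arg" reason is only valid for struct-typed locals.
// Returns false in the situation where the real JIT hits the assertion.
bool TrySetDoNotEnregister(LocalVar& var, DoNotEnregisterReason reason)
{
    if (reason == DoNotEnregisterReason::IsStructArg && !IsStructType(var.type))
    {
        return false; // corresponds to 'varTypeIsStruct(varDsc)' failing
    }
    var.doNotEnregister = true;
    return true;
}
```

In this model, a local like V09 (declared int, but read as a struct through OBJ(ADDR(LCL_VAR))) fails the check when the reason is IsStructArg, while a genuinely struct-typed local passes.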

@MichalStrehovsky
Member

Since this is around VectorXXX<T> and there's a bunch of IsHfaReg around, I wonder whether you need an equivalent of dotnet/runtime#35576.

Maybe all that's needed is to update ComputeValueTypeShapeCharacteristics in VectorFieldLayoutAlgorithm.cs to also report a HFA shape on ARM32 (and VectorOfTFieldLayoutAlgorithm.cs as a preventative measure, since that's the good old Vector<T>)

@kant2002
Contributor Author

If I update ComputeValueTypeShapeCharacteristics to report an HFA shape on ARM32, then the next failure is an assertion in HfaTypeFromElemKind in compiler.h when kind == CORINFO_HFA_ELEM_VECTOR128. That case is behind FEATURE_SIMD, which seems to be disabled on ARM32 because I do not hit that area. I suspect that simply enabling that feature would not fly without some extra deep dive. But I will play a bit with the changes in Anton's PR. Maybe I will see something interesting.

@kant2002
Contributor Author

Okay, I am trying to better understand what's going on.
There are two similar locations in the code; one of them triggers the bug and the other does not.

Problematic

vector = Unsafe.As<T, Vector256<byte>>(ref tmp).AsVector(); 

Okayish

Vector128<byte> vec128 = Unsafe.As<T, Vector128<byte>>(ref tmp);
if (Vector<byte>.Count == 16)
{
    vector = vec128.AsVector();
}

This translated to following IL

Problematic

IL_0206: ldloca.s 6
IL_0208: call !!1& Internal.Runtime.CompilerServices.Unsafe::As<!!T, valuetype System.Runtime.Intrinsics.Vector256`1<uint8>>(!!0&)
IL_020d: ldobj valuetype System.Runtime.Intrinsics.Vector256`1<uint8>
IL_0212: call valuetype System.Numerics.Vector`1<!!0> System.Runtime.Intrinsics.Vector256::AsVector<uint8>(valuetype System.Runtime.Intrinsics.Vector256`1<!!0>)
IL_0217: stloc.s 7

Okayish

IL_0185: ldloca.s 6
IL_0187: call !!1& Internal.Runtime.CompilerServices.Unsafe::As<!!T, valuetype System.Runtime.Intrinsics.Vector128`1<uint8>>(!!0&)
IL_018c: ldobj valuetype System.Runtime.Intrinsics.Vector128`1<uint8>
IL_0191: stloc.s 17
.. The if block with AsVector is omitted, since I think it is not important.

And this is imported as the following statements in basic blocks.

Problematic 3 statements

***** BB33
STMT00091 (IL 0x206...0x217)
               [000355] I-C-G--------             *  CALL      byref  Internal.Runtime.CompilerServices.Unsafe.As (exactContextHnd=0x400000000047A050)
               [000354] ------------- arg0        \--*  ADDR      byref 
               [000353] -------N-----                \--*  LCL_VAR   int    V09 loc6         

***** BB33
STMT00092 (IL   ???...  ???)
               [000358] I-CXG--------             *  CALL      void   System.Runtime.Intrinsics.Vector256.AsVector (exactContextHnd=0x40000000004363A0)
               [000361] ------------- arg0        +--*  ADDR      byref 
               [000360] -------N-----             |  \--*  LCL_VAR   struct<System.Numerics.Vector`1[System.Byte], 16> V10 loc7         
               [000357] --CXG-------- arg1        \--*  OBJ       struct<System.Runtime.Intrinsics.Vector256`1[System.Byte], 32>
               [000356] --C----------                \--*  RET_EXPR  byref (inl return expr [000355])

***** BB33
STMT00093 (IL   ???...  ???)
               [000359] --C----------             *  RET_EXPR  void  (inl return expr [000358])

Okayish 2 statements

***** BB25
STMT00123 (IL 0x185...0x191)
               [000474] I-C-G--------             *  CALL      byref  Internal.Runtime.CompilerServices.Unsafe.As (exactContextHnd=0x400000000047A048)
               [000473] ------------- arg0        \--*  ADDR      byref 
               [000472] -------N-----                \--*  LCL_VAR   int    V09 loc6         

***** BB25
STMT00124 (IL   ???...  ???)
               [000479] -ACXG--------             *  ASG       struct (copy)
               [000477] D------N-----             +--*  LCL_VAR   struct<System.Runtime.Intrinsics.Vector128`1[System.Byte], 16> V20 loc17        
               [000476] --CXG--------             \--*  OBJ       struct<System.Runtime.Intrinsics.Vector128`1[System.Byte], 16>
               [000475] --C----------                \--*  RET_EXPR  byref (inl return expr [000474])

After inlining and substitutions it comes down to

Problematic 3 statements

***** BB31
STMT00093 (IL   ???...  ???)
               [000358] S-CXG--------             *  CALL      void   System.Runtime.Intrinsics.Vector256.AsVector
               [000361] ------------- arg0        +--*  ADDR      byref 
               [000360] -------N-----             |  \--*  LCL_VAR   struct<System.Numerics.Vector`1[System.Byte], 16> V10 loc7         
               [000357] --CXG-------- arg1        \--*  OBJ       struct<System.Runtime.Intrinsics.Vector256`1[System.Byte], 32>
               [000721] -------------                \--*  ADDR      byref 
               [000722] -------N-----                   \--*  LCL_VAR   int    V09 loc6         

Okayish 2 statements

***** BB25
STMT00124 (IL   ???...  ???)
               [000479] -ACXG--------             *  ASG       struct (copy)
               [000477] D------N-----             +--*  LCL_VAR   struct<System.Runtime.Intrinsics.Vector128`1[System.Byte], 16> V20 loc17        
               [000476] --CXG--------             \--*  OBJ       struct<System.Runtime.Intrinsics.Vector128`1[System.Byte], 16>
               [000678] -------------                \--*  ADDR      byref 
               [000679] -------N-----                   \--*  LCL_VAR   int    V09 loc6         

I noticed this line in the log:

  [0 IL=0530 TR=000358 06003F64] [FAILED: unprofitable inline] System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[System.Byte]):System.Numerics.Vector`1[System.Byte]

So my suspicion is that Vector256<byte>.AsVector() gets inlined on other platforms, so the tree is somehow normalized (?).
Or maybe this tree pattern is not handled well. I will take a closer look at this.

@kant2002
Contributor Author

I think I managed to get a CoreCLR JIT dump. I made a Frankenstein build in the following way: I built NativeAOT with build.sh -rc Debug -lc Release and copied libclrjit.so to the .dotnet folder. Then I ran the app using dotnet run with export COMPlus_TieredCompilation=0.
I looked at the logs and found that Vector256<byte>.AsVector() is inlined.

Debug NativeAOT (not inlined, crash)

Invoking compiler for the inlinee method System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[System.Byte]):System.Numerics.Vector`1[System.Byte] :
IL to import:
IL_0000  00                nop         
IL_0001  28 47 0a 00 0a    call         0xA000A47
IL_0006  28 37 0a 00 0a    call         0xA000A37
IL_000b  fe 04             clt         
IL_000d  16                ldc.i4.0    
IL_000e  fe 01             ceq         
IL_0010  28 6a 2f 00 06    call         0x6002F6A
IL_0015  00                nop         
IL_0016  28 08 04 00 2b    call         0x2B000408
IL_001b  00                nop         
IL_001c  0f 00             ldarga.s     0x0
IL_001e  28 81 04 00 2b    call         0x2B000481
IL_0023  71 0a 02 00 1b    ldobj        0x1B00020A
IL_0028  0a                stloc.0     
IL_0029  2b 00             br.s         0 (IL_002b)
IL_002b  06                ldloc.0     
IL_002c  2a                ret         

INLINER impTokenLookupContextHandle for System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[System.Byte]):System.Numerics.Vector`1[System.Byte] is 0x40000000004363A0.
*************** In fgFindBasicBlocks() for System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[System.Byte]):System.Numerics.Vector`1[System.Byte]
weight= 65 : state   2 [ noshow ]
weight= 79 : state  40 [ call ]
weight= 79 : state  40 [ call ]
weight= 26 : state 171 [ clt ]
weight= 15 : state  23 [ ldc.i4.0 ]
weight= 20 : state 168 [ ceq ]
weight= 79 : state  40 [ call ]
weight= 65 : state   2 [ noshow ]
weight= 79 : state  40 [ call ]
weight= 65 : state   2 [ noshow ]
weight= 77 : state  16 [ ldarga.s ]
weight= 79 : state  40 [ call ]
weight= 29 : state 101 [ ldobj ]
weight=  6 : state  11 [ stloc.0 ]
weight= 44 : state  43 [ br.s ]
weight= 12 : state   7 [ ldloc.0 ]
weight= 19 : state  42 [ ret ]

Inline candidate callsite is boring.  Multiplier increased to 1.3.
calleeNativeSizeEstimate=838
callsiteNativeSizeEstimate=225
benefit multiplier=1.3
threshold=292
Native estimate for function size exceeds threshold for inlining 83.8 > 29.2 (multiplier = 1.3)

Release CoreCLR (inlined, no crash)

Invoking compiler for the inlinee method Vector256:AsVector(Vector256`1):Vector`1 :
IL to import:
IL_0000  28 75 03 00 2b    call         0x2B000375
IL_0005  0f 00             ldarga.s     0x0
IL_0007  28 d1 03 00 2b    call         0x2B0003D1
IL_000c  71 4c 03 00 1b    ldobj        0x1B00034C
IL_0011  2a                ret         

INLINER impTokenLookupContextHandle for Vector256:AsVector(Vector256`1):Vector`1 is 0x61BC50A4.
*************** In fgFindBasicBlocks() for Vector256:AsVector(Vector256`1):Vector`1
weight= 79 : state  40 [ call ]
weight= 77 : state  16 [ ldarga.s ]
weight= 79 : state  40 [ call ]
weight= 29 : state 101 [ ldobj ]
weight= 19 : state  42 [ ret ]

Inline candidate returns a struct by value.  Multiplier increased to 2.
Inline candidate callsite is boring.  Multiplier increased to 3.3.
calleeNativeSizeEstimate=283
callsiteNativeSizeEstimate=225
benefit multiplier=3.3
threshold=742
Native estimate for function size is within threshold for inlining 28.3 <= 74.2 (multiplier = 3.3)
Jump targets:
  none
New Basic Block BB191 [0165] created.
BB191 [000..012)
Basic block list for 'Vector256:AsVector(Vector256`1):Vector`1'

@kant2002
Contributor Author

I was trying to test with build.sh nativeaot+nativeaot.packages -rc Release -lc Release, but I am currently blocked by #1458.

@jkotas
Member

jkotas commented Aug 25, 2021

You should also build and copy over debug CoreCLR System.Private.CoreLib.dll. It will give you Debug CoreCLR config that should closely match Debug NativeAOT behavior.

@kant2002
Contributor Author

You should also build and copy over debug CoreCLR System.Private.CoreLib.dll. It will give you Debug CoreCLR config that should closely match Debug NativeAOT behavior.

I am trying to avoid that, since I have only 16 GB of flash, which means I would have to remove the local runtimelab artifacts, and it would be slow, so I avoid it as much as possible. Linux at home allows me to cross-compile, but I am usually limited in cognitive throughput there. I will definitely try that as a last resort.

FYI: Release NativeAOT build works and Hello World is running.

@jkotas
Member

jkotas commented Aug 26, 2021

Hopefully, both these steps are going to fit into your 16GB flash.

Release NativeAOT build works and Hello World is running.

Nice!

@kant2002
Contributor Author

I copied libclrjit.so, libcoreclr.so, and System.Private.CoreLib.dll from a recent pipeline run.

Checked CoreCLR

INLINER impTokenLookupContextHandle for System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[Byte]):System.Numerics.Vector`1[Byte] is 0x600B62CC.
*************** In fgFindBasicBlocks() for System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[Byte]):System.Numerics.Vector`1[Byte]
weight= 79 : state  40 [ call ]
weight= 79 : state  40 [ call ]
weight= 26 : state 171 [ clt ]
weight= 15 : state  23 [ ldc.i4.0 ]
weight= 20 : state 168 [ ceq ]
weight= 79 : state  40 [ call ]
weight= 79 : state  40 [ call ]
weight= 77 : state  16 [ ldarga.s ]
weight= 79 : state  40 [ call ]
weight= 29 : state 101 [ ldobj ]
weight= 19 : state  42 [ ret ]

Inline candidate returns a struct by value.  Multiplier increased to 2.
Inline candidate callsite is boring.  Multiplier increased to 3.3.
calleeNativeSizeEstimate=581
callsiteNativeSizeEstimate=225
benefit multiplier=3.3
threshold=742
Native estimate for function size is within threshold for inlining 58.1 <= 74.2 (multiplier = 3.3)
Jump targets:
  none
New Basic Block BB157 [0135] created.
BB157 [000..026)
Basic block list for 'System.Runtime.Intrinsics.Vector256:AsVector(System.Runtime.Intrinsics.Vector256`1[Byte]):System.Numerics.Vector`1[Byte]'

I see the following differences between CoreCLR and NativeAOT which may be related.
CoreCLR has

Inline candidate returns a struct by value.  Multiplier increased to 2.    ; This line was missing in NativeAOT
Inline candidate callsite is boring.  Multiplier increased to 3.3.              ; This has bigger number, does this related to prev line?

Debug NativeAOT

weight= 65 : state   2 [ noshow ] ;; Each nop increases the weight that much. Is this estimate correct?

Debug NativeAOT

threshold=292 ;;;; less than threshold=742 for CoreCLR; not clear yet why.
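
The numbers in the three dumps are consistent with a simple model: the callee size estimate is the sum of the per-opcode weights, and the threshold is the callsite estimate scaled by the multiplier and truncated. The sketch below is reconstructed only from the log lines in this thread, so treat it as an assumption about how the values relate rather than the JIT's actual implementation; `InlineDecision` and `Evaluate` are hypothetical names.

```cpp
#include <numeric>
#include <vector>

// Toy model reconstructed from the log output (illustration only).
struct InlineDecision
{
    int  calleeEstimate; // sum of the per-opcode weights
    int  threshold;      // callsite estimate * multiplier, truncated
    bool shouldInline;   // true when the callee estimate fits the threshold
};

InlineDecision Evaluate(const std::vector<int>& weights, int callsiteEstimate, double multiplier)
{
    int callee    = std::accumulate(weights.begin(), weights.end(), 0);
    int threshold = static_cast<int>(callsiteEstimate * multiplier);
    InlineDecision result = {callee, threshold, callee <= threshold};
    return result;
}
```

Feeding in the weights from the Debug NativeAOT dump with callsite estimate 225 and multiplier 1.3 gives 838 against a threshold of 292, so the inline is rejected. The Release CoreCLR weights with multiplier 3.3 give 283 against 742, and the Checked CoreCLR weights give 581 against 742, so both are inlined. The three noshow entries alone contribute 195 of the 838 in the Debug NativeAOT case.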

@kant2002
Contributor Author

I have answers to all my questions after looking at the source code, so I think I can summarize the problem.

If Unsafe.As<TFrom, TTo> converts a ref T to a ref to a struct, the struct is a multi-reg struct, and the converted value on the IL stack is then passed to a non-inlinable call, then we get the crash in debug builds.

This may not even be an issue per se, since the code is generated correctly, but it is definitely a correctness error in these specific circumstances.

I was trying to reproduce it with the following code:

using System;
using System.Runtime.Intrinsics;
using System.Runtime.CompilerServices;
using System.Runtime.Versioning;
using System.Numerics;

class Program
{
	public static void Main()
	{
		var x = new int[10];
		TestFunction(ref x[0]);
	}

	static Vector<byte> TestFunction<T>(ref T x)
	{
		return Unsafe.As<T, Vector256<byte>>(ref x).AsNonInlinable();
	}
}

public static class Test
{
	public static Vector<byte> AsNonInlinable(this Vector256<byte> data)
	{
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		Console.WriteLine("sss");
		return data.AsVector();
	}
}

namespace Internal.Runtime.CompilerServices
{
    public static class Unsafe
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        //[Intrinsic]
        public static ref TTo As<TFrom, TTo>(ref TFrom source)
        {
            throw new PlatformNotSupportedException();
        }
    }
}

But it seems Internal.Runtime.CompilerServices.Unsafe works only if placed in CoreLib, so maybe I should resort to IL tweaks for a repro on CoreCLR ARM.

@jkotas
Member

jkotas commented Aug 30, 2021

This has bigger number, does this related to prev line?

This difference is caused by CoreCLR using ExtendedDefaultPolicy inlining policy by default and NativeAOT using DefaultPolicy by default. You can try passing in -Ot that will turn on ExtendedDefaultPolicy for NativeAOT.

Internal.Runtime.CompilerServices.Unsafe work only if placed in CoreLib,

Public System.Runtime.CompilerServices.Unsafe should behave identically.

@kant2002
Contributor Author

You can try passing in -Ot that will turn on ExtendedDefaultPolicy for NativeAOT.

Same problem.

Public System.Runtime.CompilerServices.Unsafe should behave identically
Strange that I missed that. Anyway, I am still not able to reproduce the issue on CoreCLR.
I cannot re-create RET_EXPR byref (inl return expr [000xxx]) in the code which I am trying to write. Any ideas how the importer decides to create this node?

Also, what area of ILC may be related to the missing line Inline candidate returns a struct by value in the JIT dump for the AsVector(Vector256<T>) call?

@jkotas
Member

jkotas commented Aug 30, 2021

missing line Inline candidate returns a struct by value

My guess is that the ILC does not use ExtendedDefaultPolicy for some reason. I was hoping that -Ot will turn it on, but there is probably some other condition that prevents it from kicking in.

Do you see return new (compiler, CMK_Inlining) ExtendedDefaultPolicy(compiler, isPrejitRoot); line in the JIT getting executed when the offending method is compiled?

@kant2002
Contributor Author

Do you see return new (compiler, CMK_Inlining) ExtendedDefaultPolicy(compiler, isPrejitRoot); line in the JIT getting

This line is executed.

@kant2002
Contributor Author

kant2002 commented Sep 1, 2021

This is tracked as dotnet/runtime#58518 in Runtime repo.

@jkotas jkotas closed this as completed Sep 5, 2021