Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mono][Wasm][Interp] Assertion fail in Marshal.GetFunctionPointerForDelegate #101736

Open
Nemo-G opened this issue Apr 30, 2024 · 18 comments
Open
Labels
Milestone

Comments

@Nemo-G
Copy link

Nemo-G commented Apr 30, 2024

Description

I was testing a project using Marshal.GetFunctionPointerForDelegate in both Debug and Release mode. The code fragment causing problem is:

               // public delegate void ForwardingFunc(IntPtr systemPtr, IntPtr state);

#if DIRECT_CALL 
// Result in assertion failure in release mode
                defeatGc = (ulong)GCHandle.ToIntPtr(GCHandle.Alloc(managedFn));
                result = (ulong)Marshal.GetFunctionPointerForDelegate(managedFn);
#else
// Mashal with wrapper func
// Result in NULL orig_method in interp_create_method_pointer
               ForwardingFunc wrapper = (IntPtr system, IntPtr state) =>
                {
                    try
                    {
                        managedFn(system, state);
                    }
                    catch (Exception ex)
                    {
                        Debug.LogException(ex);
                    }
                };
                defeatGc = (ulong)GCHandle.ToIntPtr(GCHandle.Alloc(wrapper));
                result = (ulong)Marshal.GetFunctionPointerForDelegate(wrapper);
#endif

I first tested with wrapper func in Debug mode. It failed to find the native2managed function because orig_method is NULL in

MonoMethod *orig_method = info->d.native_to_managed.method;

(I can understand somehow because both m2n or n2m function in pinvoke-table.h was generated at compile time. Lambda function may not contain necessary info. But is it a bug? )

So I removed the wrapper and marshal managedFn directly. I got no problem in Debug mode. However, I met assertion failure in Release mode:

g_assert (td->locals [i].indirects >= 0);

(local_ref_count == 0 at the time)
(Normally a fatal error won't give me exact mono trace so that I changed the related runtime code by setting mono_error_set_xxx to continue the program. This helped me to get the managed stacktraces that caused the problem.)

And after that I did some tests by comparing what made the difference between Debug and Release. Eventually I located the interp optimization options because I found the code path in Debug and Release mode was totally different. With "-jiterp,-bblocks" disabled, everything seems fine.

As I'm not an expert in mono, I do not quite understand the optimization logic which makes the debugging full of struggle. Is there any documents regarding the interp optimization logic? Help is needed!

Reproduction Steps

Just by called the managed code(after mono runtime is initialized of course)

Expected behavior

No error

Actual behavior

Assertion failed and program was terminated

Regression?

I've only tested in .NET 8. Not sure if the behaviour is the same in .NET9

Known Workarounds

Disable part of interp_opts. The minimum combination is to disable INTERP_OPT_JITERPRETER and INTERP_OPT_BBLOCKS.
It works fine in debug mode which I checked the source code that no interp_opts is enabled.

Configuration

.NET8
Wasm
Tested in Chrome/Edge (v8)
I haven't tested against Webkit, but I guess this may not be related to browser.

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 30, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 30, 2024
@Nemo-G
Copy link
Author

Nemo-G commented Apr 30, 2024

Add some extra info: Compile dotnet runtime under single thread. Build without SIMD enabled

@jeffschwMSFT jeffschwMSFT added arch-wasm WebAssembly architecture area-Interop-mono labels May 1, 2024
@lewing lewing added area-Codegen-Interpreter-mono and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 2, 2024
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

@Nemo-G
Copy link
Author

Nemo-G commented May 7, 2024

I'm going to provide more info here. I located the method to marshal and turned on MONO_VERBOSE_METHOD.
And I hacked in interp_local_deadce like
image

Here's the IL diff containing jiterp&bblocks ON vs OFF.
https://www.diffchecker.com/4MXLTBTw/

It seems certain instructions was taken as dead but should be kept? And the mis-removal would cause problem in jiterp optimization?

In fact, with the hacking above. The program would work with only JITERP disabled

@BrzVlad I completely understand that you might be swamped. But would you take a look and provide some insights when you have time? Thanks in advance!

@BrzVlad
Copy link
Member

BrzVlad commented May 9, 2024

Thank you for taking the time to look into it. From the verbose log, it looks like the ldloca is generated with the incorrect source. The prefix1 instruction from inlined method is most likely a ldarga. All roads point to this being a known issue: #97650.

Are you sure you are using the latest net8 release ?

@Nemo-G
Copy link
Author

Nemo-G commented May 9, 2024

@BrzVlad Thanks for the inputs! We are indeed not working on the latest net8 but a fork with Unity specific changes. I will apply the patch and get you with the results later.

@Nemo-G
Copy link
Author

Nemo-G commented May 9, 2024

With the fix I can no longer see the BBLOCK optimization errors. Thanks again for the kind help! @BrzVlad

But still the program is not running correctly unless I disable INTERP_OPT_JITERPRETER. I got memory access OOB with callstack
image

I checked the method value in mono_runtime_invoke and found the related method is

    static class AutomaticWorldBootstrap
    {
        [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.BeforeSceneLoad)]
        static void Initialize()
        {
            DefaultWorldInitialization.Initialize("Default World", false);
        }
    }

// attr definition
    public enum RuntimeInitializeLoadType
    {
        AfterSceneLoad  = 0,
        BeforeSceneLoad,
        AfterAssembliesLoaded,
        BeforeSplashScreen,
        SubsystemRegistration
    }

    [Scripting.RequiredByNativeCode]
    [System.AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
    public class RuntimeInitializeOnLoadMethodAttribute : Scripting.PreserveAttribute
    {
        public RuntimeInitializeOnLoadMethodAttribute() { this.loadType = RuntimeInitializeLoadType.AfterSceneLoad; }
        public RuntimeInitializeOnLoadMethodAttribute(RuntimeInitializeLoadType loadType) { this.loadType = loadType; }

        public RuntimeInitializeLoadType loadType
        {
            get { return m_LoadType; } private set { m_LoadType = value; }
        }

        RuntimeInitializeLoadType m_LoadType;
    }

Although the stack evaluation failed in IsUserCattrProvider, I can't make sure if it is the consequence or the root cause. What info can I provide regarding the jiterpreter logic in order to help further debug?

@Nemo-G
Copy link
Author

Nemo-G commented May 9, 2024

d62ff0b This is the only commit my branch doesn't have but I think it is not related.

And here is the mono verbose and jiterp stats log before MOOB

MONO_WASM: // jitted 471 bytes; 0 traces (0.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 2, fused: 1; back-branches emitted: 0, failed: 1 (0.0%); 
MONO_WASM: // time: 0ms generating, 0ms compiling wasm.
MONO_WASM: // jitted 942 bytes; 1 traces (50.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 4, fused: 2; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 3ms generating, 1ms compiling wasm.
dotnet.native.8.0.0.pi5hm4ztm8.js:8 Input Manager initialize...

dotnet.native.8.0.0.pi5hm4ztm8.js:8 UnloadTime: 0.700000 ms

MONO_WASM: // jitted 1267 bytes; 2 traces (66.7%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 4, fused: 2; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 3ms generating, 3ms compiling wasm.
MONO_WASM: // jitted 1573 bytes; 3 traces (75.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 5, fused: 2; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 4ms generating, 4ms compiling wasm.
MONO_WASM: // jitted 1974 bytes; 4 traces (80.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 5, fused: 4; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 4ms generating, 5ms compiling wasm.
MONO_WASM: // jitted 2321 bytes; 5 traces (83.3%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 7, fused: 5; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 5ms generating, 7ms compiling wasm.
MONO_WASM: // jitted 2679 bytes; 6 traces (85.7%) (1 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 7, fused: 5; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 5ms generating, 8ms compiling wasm.
MONO_WASM: // jitted 3935 bytes; 7 traces (87.5%) (1 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 8, fused: 17; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 5ms generating, 9ms compiling wasm.
MONO_WASM: // jitted 4543 bytes; 8 traces (88.9%) (1 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 8, fused: 20; back-branches emitted: 0, failed: 2 (0.0%); 
MONO_WASM: // time: 6ms generating, 10ms compiling wasm.
MONO_WASM: // jitted 5000 bytes; 9 traces (90.0%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 10, fused: 21; back-branches emitted: 0, failed: 3 (0.0%); 
MONO_WASM: // time: 7ms generating, 11ms compiling wasm.
MONO_WASM: // jitted 5474 bytes; 10 traces (90.9%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 14, fused: 21; back-branches emitted: 0, failed: 3 (0.0%); 
MONO_WASM: // time: 7ms generating, 13ms compiling wasm.
MONO_WASM: // jitted 6292 bytes; 11 traces (91.7%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 14, fused: 26; back-branches emitted: 0, failed: 3 (0.0%); 
MONO_WASM: // time: 7ms generating, 14ms compiling wasm.
MONO_WASM: // jitted 6705 bytes; 12 traces (92.3%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 14, fused: 27; back-branches emitted: 1, failed: 3 (25.0%); 
MONO_WASM: // time: 8ms generating, 15ms compiling wasm.
MONO_WASM: // jitted 7195 bytes; 13 traces (92.9%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 15, fused: 28; back-branches emitted: 2, failed: 4 (33.3%); 
MONO_WASM: // time: 9ms generating, 16ms compiling wasm.
MONO_WASM: // jitted 7597 bytes; 14 traces (93.3%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 18, fused: 28; back-branches emitted: 2, failed: 4 (33.3%); 
MONO_WASM: // time: 9ms generating, 18ms compiling wasm.
MONO_WASM: // jitted 7977 bytes; 15 traces (93.8%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 21, fused: 28; back-branches emitted: 2, failed: 4 (33.3%); 
MONO_WASM: // time: 9ms generating, 19ms compiling wasm.
MONO_WASM: // jitted 9076 bytes; 16 traces (88.9%) (3 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 30, fused: 28; back-branches emitted: 2, failed: 4 (33.3%); 
MONO_WASM: // time: 10ms generating, 20ms compiling wasm.
MONO_WASM: // jitted 11112 bytes; 17 traces (89.5%) (3 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 52, fused: 28; back-branches emitted: 2, failed: 4 (33.3%); 
MONO_WASM: // time: 10ms generating, 21ms compiling wasm.
MONO_WASM: // jitted 11362 bytes; 18 traces (90.0%) (4 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 55, fused: 28; back-branches emitted: 2, failed: 4 (33.3%); 
MONO_WASM: // time: 11ms generating, 22ms compiling wasm.
MONO_WASM: // jitted 11985 bytes; 19 traces (90.5%) (4 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 60, fused: 29; back-branches emitted: 3, failed: 4 (42.9%); 
MONO_WASM: // time: 11ms generating, 24ms compiling wasm.
Method Unity.Entities.AutomaticWorldBootstrap:Initialize (), optimized 0, original code:
IL_0000: ldstr     "Default World"
IL_0005: ldc.i4.0  
IL_0006: call      0x0a0002a0
IL_000b: pop       
IL_000c: ret       

IL_0000 ldstr     , sp 0,                
IL_0005 ldc.i4.0  , sp 1, O  String      
IL_0006 call      , sp 2, I4             
IL_000b pop       , sp 1, O  World       
IL_000c ret       , sp 0,                
Unoptimized IR:
BB0:
IL_ffffffff: tier_enter_method [nil <- nil],
BB1:
IL_0000: ldstr          [0 <- nil], 0
IL_0005: ldc.i4.0       [1 <- nil],
IL_0006: call           [2 <- c:], Unity.Entities.DefaultWorldInitialization:Initialize (string,bool)
IL_000c: ret.void       [nil <- nil],
Runtime method: Unity.Entities.AutomaticWorldBootstrap:Initialize () 0x6984cc
Locals size 0
Calculated stack height: 2, stated height: 8
IR_0000: tier_enter_method [nil <- nil],
IR_0001: ldstr          [0 <- nil], 0
IR_0004: ldc.i4.0       [8 <- nil],
IR_0006: call           [0 <- 0], Unity.Entities.DefaultWorldInitialization:Initialize (string,bool)
IR_000a: ret.void       [nil <- nil],

SEQ POINT MAP FOR Initialize: 
MONO_WASM: // jitted 12518 bytes; 20 traces (90.9%) (4 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 60, fused: 30; back-branches emitted: 4, failed: 6 (40.0%); 
MONO_WASM: // time: 11ms generating, 25ms compiling wasm.
dotnet.native.8.0.0.pi5hm4ztm8.js:8 [Test] LoadingSystem.OnCreate invoked

dotnet.native.8.0.0.pi5hm4ztm8.js:8 RuntimeError: memory access out of bounds

@BrzVlad
Copy link
Member

BrzVlad commented May 10, 2024

There have been some serious changes in this area in .net9. It would be worthwhile to try it out. Maybe @kg can provide some help with diagnosing this latest failure

@kg
Copy link
Contributor

kg commented May 10, 2024

As a starting point, you could use MONO_VERBOSE_METHOD to dump IsUserCattrProvider. The jiterpreter should automatically engage its diagnostics for the method and we can examine what it's doing to see if it makes sense.

@Nemo-G
Copy link
Author

Nemo-G commented May 11, 2024

@BrzVlad Thanks for the suggestion. In fact we started our project investigation with net9 alpha. The only reason we decided to use net8 is that it is officially released at the time. When do you think that net9 can be considered stable? We will definitely come back to integrate with net9 by then.

localhost-1715442485731.log
@kg Here's the log with IsUserCattrProvider enabled.

@kg
Copy link
Contributor

kg commented May 11, 2024

Is there a reason why it says dotnet.runtime.8.0.0.90q4mwuy84.js? Are you not able to use the latest version of 8.0?
Will still investigate, but you may be missing bug fixes.

@kg
Copy link
Contributor

kg commented May 11, 2024

Also, are you running any third party WASM code in your environment? This crash appears to indicate memory corruption, specifically the reserved low memory region in the WASM heap that's supposed to be all zeroes. If the low memory region is corrupted when the jiterpreter is enabled, that can turn NullReferenceExceptions into 'memory access out of bounds' errors. I refer to this area as the "zero page".

image

If the zero page is being corrupted, you can disable zero page optimizations by turning off the jiterpreter-zero-page-optimization option. See https://github.com/dotnet/runtime/blob/main/docs/design/mono/jiterpreter.md#configuring-the-jiterpreter for information on how to configure the jiterpreter.

You could also disable null check elimination, but in this case I don't think that's related.

If I found the correct implementation of IsUserCattrProvider, this would potentially occur if the obj passed into it is not a Type. The first isinst will produce null, and the second isinst will be type-checking null (which is safe with zero page optimizations as long as the heap isn't corrupted).

You could also examine the bottom of WASM memory in the devtools debugger when this crash happens. It should be all zeroes.

@Nemo-G
Copy link
Author

Nemo-G commented May 11, 2024

@kg Yes there are third party natives, I'm working on Unity :) My team started to add a lot Unity specific changes on top of NET8 to make it work on WebGL platform without AOT (to keep a lower memory on mobile devices). In fact this issue comes from one of our clients. As we need a stable version and not sure if any recurssion issue may be introduced when applying the latest change of NET8, we just let the net8 stay on a specific version unless there are problems. And only the wasm changes are necessary for us.

I will try the jiterp options first and try to merge the latest NET8 later.

@Nemo-G
Copy link
Author

Nemo-G commented May 11, 2024

Confirmed that there is no MOOB when jiterpreter-zero-page-optimization is turned off.

@kg
Copy link
Contributor

kg commented May 11, 2024

Confirmed that there is no MOOB when jiterpreter-zero-page-optimization is turned off.

For the interim it should be safe to just run with that option turned off, and the performance penalty is pretty small (something like 1-5% for jiterpreter traces that do specific operations).

@Nemo-G
Copy link
Author

Nemo-G commented May 11, 2024

For the interim it should be safe to just run with that option turned off, and the performance penalty is pretty small (something like 1-5% for jiterpreter traces that do specific operations).

Will try that temporarily for this specific project. Thanks for the help!
But from your analysis, it seems there is an NPE which corrupts the 'zeroPage'. Instead of disabling the zero page opts, is there an 'official way' to solve this problem? I mean in what directions that I can try to analyze further? If there are other solutions, I definitely do not want to disable zero page opts by default for every clients.

@kg
Copy link
Contributor

kg commented May 11, 2024

I do not know of any good way to detect zero page corruption other than frequently verifying the zero page after invoking third party code. The jiterpreter does some basic checks of the zero page in order to identify this kind of corruption, and it will shut off the optimizations if it detects it. But it can't always detect it.

Also, I believe we pass --low-memory-unused to wasm-opt when optimizing the runtime, so the runtime itself may end up containing similar zero page optimizations in its C. I'm not sure when or how wasm-opt exploits this property. I think clang may also have some optimizations of this sort in it. So you may potentially experience related crashes in other code.

@Nemo-G
Copy link
Author

Nemo-G commented May 11, 2024

You are correct. I just checked the build binlog and found that -O3 --low-memory-unused --zero-filled-memory... were passed to wasm-opt.

As for the 3rd party codes, except for Unity engine native modules, 3rd party native libs are allowed to be involved in the build process. If this crash can be triggered by any 3rd party native(wasm), it seems disabling the zero page optimization would be the only way to ensure the game won't crash.

@lewing lewing added this to the Future milestone May 13, 2024
@lewing lewing removed the untriaged New issue has not been triaged by the area owner label May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants