Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[browser][MT] enable jiterp #101252

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

pavelsavara
Copy link
Member

Testing ...

Fixes #100273

@pavelsavara pavelsavara added arch-wasm WebAssembly architecture area-Codegen-Interpreter-mono os-browser Browser variant of arch-wasm labels Apr 18, 2024
@pavelsavara pavelsavara added this to the 9.0.0 milestone Apr 18, 2024
@pavelsavara pavelsavara requested a review from kg April 18, 2024 17:00
@pavelsavara pavelsavara self-assigned this Apr 18, 2024
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

@pavelsavara
Copy link
Member Author

There is GC deadlock with Finalizer on this PR.
Is there something that jiterp could be trying to lock, that is already locked on another thread that is suspended ?
@kg

@kg
Copy link
Contributor

kg commented Apr 19, 2024

There is GC deadlock with Finalizer on this PR. Is there something that jiterp could be trying to lock, that is already locked on another thread that is suspended ? @kg

It's possible jiterp's C helpers might be missing GC state transitions. We could do an audit of all of them.

@pavelsavara
Copy link
Member Author

What's the design ?
I guess the generated code should be in GC unsafe mode.

But I don't know if the interp is running in

  • GC unsafe
  • GC safe
  • or mixed

And if jiterp should share the same rules ?

@kg
Copy link
Contributor

kg commented Apr 19, 2024

What's the design ? I guess the generated code should be in GC unsafe mode.

But I don't know if the interp is running in

* GC unsafe

* GC safe

* or mixed

And if jiterp should share the same rules ?

jiterp traces are called directly by the interp without a transition so they inherit whatever state it's in, it's possible some of the helpers jiterp calls out to (written in c) need transitions and are missing them. it's also possible they contain transitions and shouldn't. i never put any thought into it at the time.

i mentioned on discord also that we should probably disable jiterp on the finalizer thread. the way jiterp works right now, each thread needs to JIT its own copy of the trace it's trying to run, and there's little upside to doing that on the finalizer thread to run a finalizer. it's possible that the process of compiling traces could deadlock if it happens on the finalizer thread, i think.

@pavelsavara pavelsavara added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 19, 2024
@pavelsavara
Copy link
Member Author

So, Chrome is not guilty for the GC deadlock issue we have on MT with jiterp.
It happens on Firefox too.

See Log

@pavelsavara
Copy link
Member Author

pavelsavara commented May 15, 2024

Re-confirmed. Jiterp is causing GC deadlocks

This is firefox log

[11:54:22] info: [PASS] System.Collections.Concurrent.Tests.ConcurrentDictionary_Generic_Tests_enum_enum.ICollection_Generic_Remove_EveryValue
[11:54:27] info: [0x000fb4ac--UI- 11:54:27.555] MONO_WASM: Dumping web worker info as seen by UI thread, it could be stale:
[11:54:27] info: 000 | 0x000fb4ac--UI-: isRunning: true isAttached: true isEventLoop:false reuseCount:  0 - UI Thread
[11:54:27] info: 001 | 0x017da270-dpty: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 -
[11:54:27] info: 002 | 0x034c86b8--IO-: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 -
[11:54:27] info: 003 | 0x039d0970-norm: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - Finalizer
[11:54:27] info: 004 | 0x03ef7ee0-pool: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET TP Worker
[11:54:27] info: 005 | 0x0421a218-gate: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET TP Gate
[11:54:27] info: 007 | 0x05ad0030-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET Long Running Task
[11:54:27] info: 008 | 0x05cd02e0-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  2 - .NET Long Running Task
[11:54:27] info: 006 | 0x060f0040-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  2 - .NET Long Running Task
[11:54:27] info: 010 | 0x062f02f0-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET Long Running Task
[11:54:27] info: 009 | 0x06600040-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET Long Running Task
[11:54:27] info: 012 | 0x06910040-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET Long Running Task
[11:54:27] info: 011 | 0x06c20040-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET Long Running Task
[11:54:27] info: 014 | 0x06f30040-long: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET Long Running Task
[11:54:27] info: 013 | 0x075d8680-pool: isRunning: true isAttached: true isEventLoop:false reuseCount:  1 - .NET TP Worker
[11:54:27] info: 016 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  5 - unregistered:(.NET Long Running Task)
[11:54:27] info: 028 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  2 - unregistered:(.NET Long Running Task)
[11:54:27] info: 021 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  4 - unregistered:(.NET Long Running Task)
[11:54:27] info: 023 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  3 - unregistered:(.NET Long Running Task)
[11:54:27] info: 017 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  4 - unregistered:(.NET Long Running Task)
[11:54:27] info: 024 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  3 - unregistered:(.NET Long Running Task)
[11:54:27] info: 029 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  2 - unregistered:(.NET Long Running Task)
[11:54:27] info: 020 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  3 - unregistered:(.NET Long Running Task)
[11:54:27] info: 019 | --08780000-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  4 - unregistered:(.NET Long Running Task)
[11:54:27] info: 025 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  2 - unregistered:(.NET Long Running Task)
[11:54:27] info: 022 | --0acd0000-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  3 - unregistered:(.NET Long Running Task)
[11:54:27] info: 030 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  2 - unregistered:(.NET Long Running Task)
[11:54:27] info: 018 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  4 - unregistered:(.NET Long Running Task)
[11:54:27] info: 026 | --0a5b0000-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  3 - unregistered:(.NET Long Running Task)
[11:54:27] info: 015 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  5 - unregistered:(.NET Long Running Task)
[11:54:27] info: 027 | --07c09c28-emsc: isRunning:false isAttached:false isEventLoop:false reuseCount:  3 - unregistered:(.NET Long Running Task)
[11:54:27] info: STATE CUE CARD: (? means a positive number, usually 1 or 2, * means any number)
[11:54:27] info: 	0x0	- starting (GOOD, unless the thread is running managed code)
[11:54:27] info: 	0x1	- detached (GOOD, unless the thread is running managed code)
[11:54:27] info: 	0x2	- running (BAD, unless it's the gc thread)
[11:54:27] info: 	0x?03	- async suspended (GOOD)
[11:54:27] info: 	0x?04	- self suspended (GOOD)
[11:54:27] info: 	0x?05	- async suspend requested (BAD)
[11:54:27] info: 	0x6	- blocking (BAD, unless there's no suspend initiator)
[11:54:27] info: 	0x?07	- blocking async suspended (GOOD)
[11:54:27] info: 	0x?08	- blocking self suspended (GOOD)
[11:54:27] info: 	0x?09	- blocking suspend requested (GOOD in coop; BAD in hybrid)
[11:54:27] info: --thread 0x3431f10 id 0xfb4ac [0] state 109
[11:54:27] info: --thread 0x1cdb318 id 0x17da270 [0] state 109
[11:54:27] info: --thread 0x3441a08 id 0x34c86b8 [0] state 109
[11:54:27] info: --thread 0x3433038 id 0x39d0970 [0] state 6
[11:54:27] info: --thread 0x3edfbc8 id 0x3ef7ee0 [0] state 109
[11:54:27] info: --thread 0x3ede268 id 0x421a218 [0] state 108
[11:54:27] info: --thread 0x4d6ee28 id 0x5ad0030 [0] state 109
[11:54:27] info: --thread 0x5ed3ad8 id 0x5cd02e0 [0] state 109
[11:54:27] info: --thread 0x50e4010 id 0x60f0040 [0] state 105
[11:54:27] info: --thread 0x50e4150 id 0x62f02f0 [0] state 109
[11:54:27] info: --thread 0x5035400 id 0x6600040 [0] state 2  GC INITIATOR
[11:54:27] info: --thread 0x58c6bd0 id 0x6910040 [0] state 109
[11:54:27] info: --thread 0x5ed6068 id 0x6c20040 [0] state 108
[11:54:27] info: --thread 0x5087050 id 0x6f30040 [0] state 109
[11:54:27] info: --thread 0x783bda0 id 0x75d8680 [0] state 109
[11:54:27] info: WAITING for 1 threads, got 0 suspended
[11:54:27] fail: [0x06600040-long 11:54:27.613] [MONO] suspend_thread suspend took 5000 ms, which is more than the allowed 5000 ms

the thread 0x60f0040 is not playing along.

@kg
Copy link
Contributor

kg commented May 15, 2024

I think what we need to do is inspect all the jiterp C helpers and figure out which ones need GC transitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-wasm WebAssembly architecture area-Codegen-Interpreter-mono NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) os-browser Browser variant of arch-wasm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[browser][MT] enable jiterp
2 participants