Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node-rpc-ext: Remove rayon (Fix deadlock) #1174

Merged
merged 1 commit into from
Feb 27, 2023
Merged

node-rpc-ext: Remove rayon (Fix deadlock) #1174

merged 1 commit into from
Feb 27, 2023

Conversation

kvinwang
Copy link
Collaborator

@kvinwang kvinwang commented Feb 27, 2023

We have got a deadlock in the khala-node while getting storage changes.
According to the gdb stack frame dump, we found the deadlock is caused by the following flow:

[time 0] Thread A: locked(rayon default thread pool) in `pha_getStorageChanges`.
[time 1] Thread B: locked(sp wasm runtime cache) in block executor.
[time 2] Thread B: acquiring (rayon default thread pool) in `fn runtime_version`.
[time 3] Thread A: acquiring (sp wasm runtime cache) `fn runtime_version`.

Ideally, this can be solved by replacing the rayon thread pool used in pha_getStorageChanges with a separate pool rather than the default one.
However, due to the thread pool API of rayon:

thread_pool.install(|| {
        let result = headers
            .into_par_iter()
            .for_each(|| { /*do the work*/ })
            .collect();
})

The codes running in the inner closure inherit the installed thread_pool automatically.
So it would still deadlock in this single thread like:

[time 0] Thread A: locked(rayon custom thread pool) in `pha_getStorageChanges`.
[time 1] Thread A: locked(sp wasm runtime cache) in block executor.
[time 2] Thread A: acquiring (rayon custom thread pool) in `fn runtime_version`.  < deadlock here

Turns out this is an aged issue laying there for a long time.

I've also tried to replace the rayon with a different version:

rayon = { version = "1", git = "https://github.com/rayon-rs/rayon.git" }

But it failed to compile due to that they depend on the external lib.

So, let's just remove the rayon at the moment.

@h4x3rotab @jasl

Copy link
Collaborator

@jasl jasl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jasl jasl merged commit 3442308 into master Feb 27, 2023
@jasl jasl deleted the rayon branch February 27, 2023 07:11
Copy link
Contributor

@h4x3rotab h4x3rotab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jasl added a commit to Phala-Network/khala-parachain that referenced this pull request Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants