Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lib-pthread issues #2410

Merged
merged 2 commits into from
Aug 1, 2023
Merged

Conversation

wenyongh
Copy link
Contributor

  • Avoid destroying module instance repeatedly in pthread_exit_wrapper and
    wasm_thread_cluster_exit.
  • Wait enough time in pthread_join_wrapper for target thread to exit and
    destroy its resources.

@Zzzabiyaka
Copy link
Contributor

Zzzabiyaka commented Jul 31, 2023

Hi Wenyong,

recently we observed a situation when test was running 4 threads but thread manager reported up to 5-6 of them because after pthread_join it still was in thread_manager list for some short time (that's how I got it). So we added retries on the test side.

Is the second change fixing it?

@wenyongh
Copy link
Contributor Author

wenyongh commented Aug 1, 2023

Hi Wenyong,

recently we observed a situation when test was running 4 threads but thread manager reported up to 5-6 of them because after pthread_join it still was in thread_manager list for some short time (that's how I got it). So we added retries on the test side.

Is the second change fixing it?

Yes, in lib-pthread mode (not wasi-threads mode), I found that when thread A is exiting in pthread_join_wrapper, it may change its node's status to THREAD_EXIT before calling wasm_cluster_exit_thread to actually exit, and if thread B is joining it and detects the change of thread A node's status, thread B will just return successfully. But at this time thread A may hasn't actually exited, this may cause some unexpected behavior, for example, if thread B is main thread, it may think that all other thread have exited, and it may destroy all resources (module instance, module, runtime, etc.) and exit, but the resources may be still needed by thread A. I just let thread B wait some time after joining to fix it.

@wenyongh
Copy link
Contributor Author

wenyongh commented Aug 1, 2023

@Zzzabiyaka Maybe you can check whether this PR fixes your test issue and don't add the retries?

@loganek
Copy link
Collaborator

loganek commented Aug 1, 2023

I don't think the issue mentioned by @Zzzabiyaka can actually be fixed. Both pthread_join and pthread_create are implemented in the userspace, and they have no information about the status of the underlying threads in a thread cluster. There's no guarantee that at the time when pthread_join() completes, the underlying thread actually finishes (it has a few more things to do in the host before it really ends). So I think the recommendation for users should be to:

  • either implement retries (just like we did in tests)
  • increase number of allowed threads and accept the fact that there will be some short periods of time where there will be more than expected threads running.

The thing is different for lib-pthread, because it's fully implemented in host, therefore we can confidently wait for the native thread to complete.

they are actually destroyed to avoid unexpected behavior. */
os_mutex_lock(&exec_env->wait_lock);
os_cond_reltimedwait(&exec_env->wait_cond, &exec_env->wait_lock, 1000);
os_mutex_unlock(&exec_env->wait_lock);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this still racy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exec_env->wait_lock is used singly or together with exec_env->wait_cond, it won't be occupied by a thread for a long time: when a thread locks exec_env->wait_lock, it will unlock it after short time operations, or call os_cond_wait/reltimedwait, note that the lock will be unlocked firstly inside os_cond_wait/reltimedwait. My understanding is that here the current thread should be able to acquire the lock.

@wenyongh wenyongh merged commit cb6d850 into bytecodealliance:main Aug 1, 2023
368 checks passed
@wenyongh wenyongh deleted the fix_lib_pthread branch August 1, 2023 11:15
Zzzabiyaka pushed a commit to Zzzabiyaka/wasm-micro-runtime that referenced this pull request Aug 1, 2023
- Avoid destroying module instance repeatedly in pthread_exit_wrapper and
  wasm_thread_cluster_exit.
- Wait enough time in pthread_join_wrapper for target thread to exit and
  destroy its resources.
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
- Avoid destroying module instance repeatedly in pthread_exit_wrapper and
  wasm_thread_cluster_exit.
- Wait enough time in pthread_join_wrapper for target thread to exit and
  destroy its resources.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants