Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hard to shutdown, shows stack trace #4

Closed
4 tasks
ethanfrey opened this issue Jan 2, 2020 · 18 comments
Closed
4 tasks

Hard to shutdown, shows stack trace #4

ethanfrey opened this issue Jan 2, 2020 · 18 comments
Labels
production Needed to be production ready
Milestone

Comments

@ethanfrey
Copy link
Member

Summary of Bug

If you run wasmd, it works fine, but fails to shut down on Ctrl+C. If I hit Ctrl+C a second time, it will shut down with a big stack trace.

Version

0.5.2

Steps to Reproduce

See above. I assume this is an issue with shutting down the rust lib and tendermint signal handling.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@workshub
Copy link

workshub bot commented Jan 13, 2020

This issue is now published on WorksHub. If you would like to work on this issue you can
start work on the WorksHub Issue Details page.

@ethanfrey
Copy link
Member Author

I came across this while debuging some weird test failures in the 0.6 update.

When I hit a nil pointer panic in go, killing the process, I got a giant rust stacktrace. I was tracing down weird ffi stuff, til I finally realized it was really just a go bug. In that debugging, I found the "culprit":

https://github.com/wasmerio/wasmer/blob/master/lib/runtime-core/src/fault.rs

Notably signal_trap_handler, which is set by ensure_sighandler.

This is called in two places in wasmer code, notably in the singlepass backend, which we use. (Also in run_tiering which I have no idea if we invoke.

This probably interacts with the tendermint-install signal handlers in not so great ways...

@workshub
Copy link

workshub bot commented Feb 6, 2020

@MDM-General started working on this issue via WorksHub.

@ethanfrey ethanfrey added this to Backlog in Blockchain Development Mar 10, 2020
@ethanfrey ethanfrey moved this from Backlog to Improved Ops in Blockchain Development Mar 10, 2020
@workshub
Copy link

workshub bot commented May 21, 2020

A user started working on this issue via WorksHub.

@yun-yeo
Copy link

yun-yeo commented Jun 15, 2020

Seems the SIGINT is not passed to cosmos TrapSignal function due to Wasmer signal handler.

This can disturb Tendermint & Cosmos side clean shutdown. We must solve this problem before being used in production.

@yun-yeo
Copy link

yun-yeo commented Jun 15, 2020

Maybe related with this golang/go#21905 (comment)

@ethanfrey ethanfrey added the production Needed to be production ready label Jun 15, 2020
@ethanfrey ethanfrey moved this from Improved Ops to 1.0 Release in Blockchain Development Jun 17, 2020
@hanjukim
Copy link

hanjukim commented Jul 6, 2020

Related issue: wasmerio/wasmer#842

@ethanfrey
Copy link
Member Author

Thank you @hanjukim

That is a good description and exactly what I had thought was happening (but they did investigate more in depth, with different backends having different signal schemes). It does give a good place to start investigating

@hanjukim
Copy link

hanjukim commented Jul 6, 2020

From approach above, I have successfully called terra signal handler from wasmer-runtime-core. I will push my branch soon for reference. 🚀

@ethanfrey
Copy link
Member Author

Is this a PR on wasmer? That would be great.

I was thinking to somehow cache the signal handlers in go before calling rust, then reset them when returning from rust, but that is a bit more ugly than just improving wasmer.

@hanjukim
Copy link

hanjukim commented Jul 6, 2020

wasmerio/wasmer@master...hanjukim:interrupt-signal-propagation

Should I push this PR to wasmer? I am not sure that they will accept this sometime soon. I have added propagation codes for SIGINT only.

@ethanfrey
Copy link
Member Author

It will have to get into Wasmer sometime.

It looks like a nice start, but would definitely need some more tests before wasmer accepts it. Let me know if this works for you well in production, I get the general idea. Some comments:

wasmerio/wasmer@master...hanjukim:interrupt-signal-propagation#diff-8f3014265045b59f7b99dc8c469aedb0R515 used to panic if unsuccessful (with .unwrap()). We should likely keep that behavior.

SIGINT_SYS_HANDLER = Some(sigaction(SIGINT, &sa_interrupt).unwrap());

would probably be the closest to current behavior

@ethanfrey
Copy link
Member Author

Also. where did the logic in call_signal_handler come from (I didn't see it in the diff) - I assume the match sig_action.handler() statement mirrors some existing code.

Finally, does singlepass register the signal handler one time on first initialization / execution, or every time we execute? If the later, and we run a few contracts, the original (go) handlers will never be the last handler. I am guessing this is a one-time global thing (static INSTALL_SIGHANDLER: Once = Once::new();), but it would be good to test.

@ethanfrey
Copy link
Member Author

Great start with the code. These are just some comments to make it a bit more robust and likely to end up in wasmer.

@hanjukim
Copy link

hanjukim commented Jul 8, 2020

Also. where did the logic in call_signal_handler come from (I didn't see it in the diff) - I assume the match sig_action.handler() statement mirrors some existing code.

Finally, does singlepass register the signal handler one time on first initialization / execution, or every time we execute? If the later, and we run a few contracts, the original (go) handlers will never be the last handler. I am guessing this is a one-time global thing (static INSTALL_SIGHANDLER: Once = Once::new();), but it would be good to test.

I extracted it from wasmerio/wasmer@master...fluencelabs:clif_jni_hardering
I will test it with multiple contracts for to be sure.

@hanjukim
Copy link

hanjukim commented Jul 8, 2020

Also. where did the logic in call_signal_handler come from (I didn't see it in the diff) - I assume the match sig_action.handler() statement mirrors some existing code.

Finally, does singlepass register the signal handler one time on first initialization / execution, or every time we execute? If the later, and we run a few contracts, the original (go) handlers will never be the last handler. I am guessing this is a one-time global thing (static INSTALL_SIGHANDLER: Once = Once::new();), but it would be good to test.

signal handler is registered one time on first initialization / execution.

@hanjukim
Copy link

hanjukim commented Jul 8, 2020

I uploaded a PR to wasmer project: wasmerio/wasmer#1496

@ethanfrey
Copy link
Member Author

This has been fixed with wasmer 1.0.0, used by cosmwasm-vm 0.13.

Fix is on master as of #358 and will be on the v0.14.0 release. Took a year, but this looks a lot nicer now.

Blockchain Development automation moved this from 1.0 Release to Done Jan 7, 2021
giansalex pushed a commit to disperze/wasmd that referenced this issue Dec 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
production Needed to be production ready
Projects
No open projects
Development

No branches or pull requests

4 participants