[WIP] Properly flush log messages on unexpected program crashes #18

Closed
wants to merge 0 commits into from

Lancern
Collaborator

@Lancern Lancern commented Dec 20, 2022

Currently, buffered log messages are flushed only if the program exits normally or is terminated by a Rust panic. If the program terminates for any other reason (e.g. memory corruption), log messages might be lost. Worse, the lost messages may contain exactly the information needed to identify the root cause.
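To illustrate why a missed flush loses data, here is a minimal sketch that uses the standard library's `BufWriter` as a stand-in for a buffered sink. It is not spdlog-rs code, and the helper name `buffered_visibility` is made up for this example. Bytes written to the buffer reach the underlying writer only on `flush`, so a process that dies before that point never emits them.

```rust
use std::io::{BufWriter, Write};

/// Writes `message` through a `BufWriter` and reports how many bytes the
/// underlying writer holds before and after an explicit flush.
/// Hypothetical helper for illustration only.
fn buffered_visibility(message: &[u8]) -> (usize, usize) {
    // A Vec<u8> plays the role of the log file.
    let mut writer = BufWriter::new(Vec::new());
    writer.write_all(message).unwrap();

    // A short message sits in the in-memory buffer at this point. If the
    // process were killed here, the "file" would not contain it.
    let before = writer.get_ref().len();

    writer.flush().unwrap();
    let after = writer.get_ref().len();

    (before, after)
}
```

For a message shorter than the buffer capacity, the first count is zero and the second equals the message length.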

This PR tries to fix this issue. It contains three parts:

  • A new integration test that verifies log messages are properly flushed in the event of program termination.
  • Unix signal handlers that handle the signals that lead to program termination on Unix-like systems. Since a program on Unix can terminate only through normal termination or the delivery of a signal, this should ensure that all program termination scenarios on Unix-like systems are properly handled.
  • Windows-specific handlers that handle unexpected program termination on Windows.

@SpriteOvO
Owner

The CI test failures have been fixed in the main branch. Please rebase this PR, thanks!

@SpriteOvO
Owner

For signals, it is documented that not all libc functions are async-signal-safe.

With the current PR implementation, when a signal is delivered, control enters the hook_unix_signals function, which eventually calls the flush function of each sink in default_logger. Those sinks may come from spdlog-rs or from a user's custom implementation, and they will almost certainly call signal-unsafe functions; at the very least, it is hard to guarantee that they don't.

If I understand it correctly, I am concerned that this is undefined behavior (UB).

@Lancern
Collaborator Author

Lancern commented Jul 12, 2023

> Those sinks may come from spdlog-rs or from a user's custom implementation, and they will almost certainly call signal-unsafe functions; at the very least, it is hard to guarantee that they don't.

In general, it is impossible to guarantee that we don't perform any signal-unsafe operations during signal handling, especially when arbitrary third-party code can be involved. What we can do is give users a best-effort chance to save as many of their logs as possible.

@SpriteOvO
Owner

SpriteOvO commented Jul 12, 2023

First of all, we probably should not register for all signals. Some signals are not fatal (the control flow will return) and may be triggered frequently.

I just found that gabime/spdlog#1607 (comment) provides a solution for safely flushing sinks. We should also replace the atomic with a condition variable if we can ensure that it is signal-safe.
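As a rough sketch of the general pattern (not a reproduction of the linked comment, and not spdlog-rs API), the handler can do nothing but store to an atomic flag while an ordinary thread polls the flag and performs the actual flush outside signal context. The names `FLUSH_REQUESTED`, `on_fatal_signal`, and `spawn_flusher` are hypothetical. Registering the handler (e.g. with `sigaction`) is deliberately left out to keep the example std-only. Note that POSIX does not list `pthread_cond_signal` as async-signal-safe, which is why the sketch polls rather than waits on a condition variable.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Set by the signal handler, polled by the flusher thread.
static FLUSH_REQUESTED: AtomicBool = AtomicBool::new(false);

/// Hypothetical handler body. A lock-free atomic store is the only work done
/// in signal context; no locks, allocation, or I/O happen here.
extern "C" fn on_fatal_signal(_signum: i32) {
    FLUSH_REQUESTED.store(true, Ordering::SeqCst);
}

/// Spawns an ordinary thread that waits for the flag and then runs `flush`.
/// Because `flush` runs outside the handler, it may use signal-unsafe code.
fn spawn_flusher<F>(flush: F) -> JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        // Polling interval is an arbitrary choice for this sketch.
        while !FLUSH_REQUESTED.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(10));
        }
        flush();
    })
}
```

For synchronous faults such as SIGSEGV, simply returning from the handler typically re-executes the faulting instruction, so a real handler would also have to wait for the flusher to finish and then terminate the process. That part is not shown here.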

@Lancern
Collaborator Author

Lancern commented Jul 12, 2023

> We should also replace the atomic with a condition variable if we can ensure that it is signal-safe.

This is indeed the standard approach for executing non-signal-safe code upon the arrival of a signal. However, for signals like SIGSEGV, I'm not sure whether it will work as expected. Maybe we can give it a try.

@Lancern
Collaborator Author

Lancern commented Jul 12, 2023

> First of all, we probably should not register for all signals.

Agreed. We need to decide which signals are of interest to us.
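One possible starting point, sketched here as an assumption rather than a decision made in this thread, is to hook only signals whose default action terminates the process and to skip those that are ignored by default. The `Signal` and `DefaultAction` enums and the `should_hook` function are hypothetical names for illustration. The default actions follow the POSIX defaults for these signals.

```rust
/// A small, non-exhaustive set of signals, named rather than numbered
/// because signal numbers differ between platforms.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Signal {
    Segv,
    Bus,
    Ill,
    Fpe,
    Abrt,
    Term,
    Int,
    Hup,
    Chld,
    Urg,
}

#[derive(Debug, PartialEq, Eq)]
enum DefaultAction {
    /// Abnormal termination, usually with a core dump.
    CoreDump,
    /// Termination without a core dump.
    Terminate,
    /// The signal is discarded; control flow continues.
    Ignore,
}

/// Default disposition of each signal as specified by POSIX.
fn default_action(sig: Signal) -> DefaultAction {
    match sig {
        Signal::Segv | Signal::Bus | Signal::Ill | Signal::Fpe | Signal::Abrt => {
            DefaultAction::CoreDump
        }
        Signal::Term | Signal::Int | Signal::Hup => DefaultAction::Terminate,
        Signal::Chld | Signal::Urg => DefaultAction::Ignore,
    }
}

/// Hypothetical policy: hook a signal only if it would end the process.
fn should_hook(sig: Signal) -> bool {
    default_action(sig) != DefaultAction::Ignore
}
```

Under this policy SIGSEGV and SIGTERM would be hooked, while SIGCHLD, which is non-fatal and can fire frequently, would be left alone.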
