Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Akka does not log the error when ActorSystem shuts down due to failed assert #17010

Closed
RichardBradley opened this issue Mar 5, 2015 · 15 comments
Assignees
Labels
1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted bug
Milestone

Comments

@RichardBradley
Copy link
Contributor

We have found that Errors (assertion failures, out of memory errors etc.) can cause Akka 2.3.x to exit without logging any failure info.

Original repro: certain types of TCP connections seem to crash our Akka HTTP 1.0-M3 app -- that is being tracked at #17015 .
This issue tracks the more worrying problem that Akka exits without logging any failure info in that case, making this extremely difficult to track down.

After much struggle, I believe that the issue is that the entire message queue of the Slf4jLogger actor is discarded on ActorSystem shutdown.
In our case, this includes the enqueued "ERROR" log describing the System Guardian actor failure.

I have uploaded a repro project at https://gist.github.com/RichardBradley/f7b1b7c3358779e3559d
If you run this with "sbt run", you will see that "myApp.log" has lots of log messages, but that it does not include all log messages that were enqueued, and importantly it does not include the details of the Actor assertion failure which was escalated and caused ActorSystem shutdown.

What can I do to ensure that errors which are fatal to the ActorSystem get logged, if I am using SLF4J / Logback?

With respect, why does Akka have its own logging system based on Actors in the first place? Perhaps Akka should be calling SLF4J directly, using an "AsyncAppender" to give non-blocking behaviour? That would solve this issue (the appender's queue is drained on JVM exit by Logback), and perhaps many others.

Thanks,

Rich

@RichardBradley
Copy link
Contributor Author

I’ve pushed a second demo that shows the start of a possible fix: https://gist.github.com/RichardBradley/fc0cc3d312d985e65794 .

If we strip out the Actor subscribers to LoggingBus and replace them with code that directly hands off all logging messages to SLF4J as soon as they arrive, then the problem goes away. To avoid blocking I/O on actor threads, we need to add an AsyncAppender to the Logback config XML.

You can run the same demo from above, and see that Logback will drain the AsyncAppender queue on application shutdown. It's also quite a bit faster than the Akka Actor based logging version.

What do you think? Akka could then get rid of a whole load of complexity (like the “startDefaultLoggers” methods in Logging.scala, the custom log-level definitions which duplicate those in SLF4J, etc.). I suppose it would be a massively breaking change.

(I'll do something like this as a workaround for now in my app.)

@bantonsson bantonsson added 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted bug labels Mar 6, 2015
@bantonsson
Copy link
Member

Thanks for reporting this issue.

The main reason that akka has its own message bus based logging system is simply to not block when logging. Logging is commonly a very large contention point in a highly concurrent system.

@RichardBradley
Copy link
Contributor Author

I had thought that this issue was entirely related to the message queue of the Slf4jLogger actor, but actually the minimised example I've just uploaded to #17015 crashes without logging the failed assert when using Akka's default logging setup (no SLF4J at all!). Perhaps there is more than one issue at play here.

@patriknw
Copy link
Member

I suggest that we create a special mailbox for the akka logger that drains the message queue to System.out (delegating to StdOutLogger) when it is shutdown instead of sending them to deadLetters.

@mehmeteking
Copy link

Do we consider log messages diverted to StdOutLogger "not lost", or is this a workaround?

@patriknw
Copy link
Member

perhaps there is a better way, e.g. making sure that loggers are stopped last and when other actors have been terminated send loggers a special stop message to drain the mailbox before completing the shutdown

@RichardBradley
Copy link
Contributor Author

I think the best fix would be to remove the Actor-based logging and replace it with direct calls to Logback / SLF4J, with Logback configured to be non-blocking. This would solve this problem and reduce Akka complexity (there's no benefit to having Akka declare its own set of LogLevel enums when SLF4J already does so). See the comments on #16745.

I think https://xkcd.com/927/ applies -- Akka shouldn't have it's own logging API; it should use SLF4J.

IMO all the other proposed solutions here increase complexity, rather than reduce it.

@patriknw
Copy link
Member

As background, the original driving factor, apart from performance considerations, was that akka-actor is supposed to have zero dependencies, apart from scala-library and typesafe-config.

@RichardBradley RichardBradley changed the title Akka drops error logs on ActorSystem shutdown due to failed assert when using SLF4J Akka does not log the error when ActorSystem shuts down due to failed assert Mar 26, 2015
@RichardBradley
Copy link
Contributor Author

A small update: this issue isn't restricted to the new Akka HTTP 1.0-M releases -- we are having the same problem in our Spray 1.3 app using Akka 2.3.3.

An out of memory error can cause the app to exit without logging any error info (which makes diagnosing the fault rather infuriating).

@patriknw
Copy link
Member

You can't expect anything when there is out of memory. At the best the JVM will exit, but it might not even be able to do that. You should run the JVM with flags -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError="kill -9 %p".

The logging API concerns will not be handled by this ticket.

What I have done is to try to drain log messages that are queued when the actor system is shutdown.

patriknw added a commit that referenced this issue May 15, 2015
* by using a special mailbox that emits the remaining log messages
  to the StandardOutLogger
@drewhk
Copy link
Member

drewhk commented May 15, 2015

On the other hand we should log AssertionErrors. Silently killing the actor system in those cases makes it very hard to debug

@patriknw
Copy link
Member

We print/log AssertionErrors, see the uncaughtExceptionHandler in ActorSystem. The improvement of the mailbox would also emit those log messages for the case when JvmExitOnFatalError is false.

patriknw added a commit that referenced this issue Jun 17, 2015
* by using a special mailbox that emits the remaining log messages
  to the StandardOutLogger
patriknw added a commit that referenced this issue Jun 17, 2015
@patriknw patriknw added this to the 2.4-M2 milestone Jun 17, 2015
@patriknw patriknw self-assigned this Jun 17, 2015
@RichardBradley
Copy link
Contributor Author

I think the best fix would be to remove the Actor-based logging and replace it with direct calls to Logback / SLF4J, with Logback configured to be non-blocking.

Is there any appetite for this at all, or is it just a non-starter for BC reasons?

I could look into a PR. It's likely to be a very wide-ranging change, of course. I think the end result for Akka would be cleaner, simpler and much more robust. Shall I raise a separate issue or start a thread on the dev list or just forget about it?

@patriknw
Copy link
Member

It cannot be done in 2.4 for compatibility reasons. I don't think we are ready for that change yet.

ktoso pushed a commit to ktoso/akka that referenced this issue Jan 11, 2016
…iknw

=act akka#17010 Drain log messages on system shutdown
@RichardBradley
Copy link
Contributor Author

RichardBradley commented Oct 4, 2016

What I have done is to try to drain log messages that are queued when the actor system is shutdown.

Thanks. This is a lot better than the previous version, but it still feels to me like a bug / undesirable edge case that any logs pending at the time of an ActorSystem shutdown (e.g. fatal exception) will go to stdout rather than to the configured log file.

I'll keep my workaround in place on my apps, I think. I've configured Akka so that the Actor-based logging is just an UnregisteredActorRef that immediately hands the message to Logback (configured with an async queue to avoid blocking issues) (note that their async queue has a JVM shutdown hook to ensure a drain on exit).

I think the best fix would be to remove the Actor-based logging and replace it with direct calls to Logback / SLF4J, with Logback configured to be non-blocking.

Is there any appetite for this at all, or is it just a non-starter for BC reasons?

It cannot be done in 2.4 for compatibility reasons. I don't think we are ready for that change yet.

I still think this would be both an improvement and a big simplification for Akka. I've raised akka/akka-meta#31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted bug
Projects
None yet
Development

No branches or pull requests

5 participants