Revisit logging error stack traces in cases of fatal errors #23107

Closed
jrudolph opened this Issue Jun 6, 2017 · 1 comment

Comments

Projects
None yet
3 participants
Member

jrudolph commented Jun 6, 2017

Right now the error and stack trace is only logged on stderr if akka.jvm-exit-on-fatal-error is enabled. It's not clear if that was intended or an oversight.

Context: https://groups.google.com/d/topic/akka-user/R9utH4VIeXU/discussion

@raboof:

Is the logging really logged to stderr here though? Looking at the code:

        if (settings.JvmExitOnFatalError) {
          try {
            markerLogging.error(LogMarker.Security, cause, "Uncaught error from thread [{}] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled", thread.getName)
            import System.err
            err.print("Uncaught error from thread [")
            err.print(thread.getName)
            err.print("] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[")
            err.print(name)
            err.println("]")
            cause.printStackTrace(System.err)
            System.err.flush()
          } finally {
            System.exit(-1)
          }
        } else {
          markerLogging.error(LogMarker.Security, cause, "Uncaught fatal error from thread [{}] shutting down ActorSystem [{}]", thread.getName, name)
          terminate()
        }

'Uncaught fatal error' is in the 'else' block and I don't see a "cause.printStackTrace" there.

Do we have a reason for ignoring the 'cause' in MarkerLoggingAdapter?

@jrudolph:

I think the general idea is to do as little as possible on fatal errors because simple code is more likely to "make it through" (e.g. to prevent logging code provoking another OOM). This is a pretty vague rationale but fatal errors could have lots of causes so we can only guess and try to be conservative.

That said, I don't think there's a good reason to structure the code like this. I think a better solution might be to log the error to stderr first (in all cases) and then also log to the logger afterwards. Using the logger is more likely to run into extra problems as it might need the network still to work etc.

raboof self-assigned this Jun 6, 2017

@jrudolph jrudolph added 3 - in progress and removed 1 - triaged labels Jun 6, 2017

@jrudolph jrudolph added a commit to jrudolph/akka that referenced this issue Jun 6, 2017

@jrudolph jrudolph =act #23107 streamline error reporting on fatal errors
Fixes #23107
6a5f8c7

@raboof raboof added a commit that referenced this issue Jun 6, 2017

@raboof raboof Log fatal errors to stderr, add exceptions to 'marked' errors (#23107)
After logging to stderr it seems reasonable to include the exception
in the logging though the logging subsystem
af9ce16

@raboof raboof added a commit that referenced this issue Jun 7, 2017

@raboof raboof Log fatal errors to stderr, add exceptions to 'marked' errors (#23109)
* Log fatal errors to stderr, add exceptions to 'marked' errors (#23107)

After logging to stderr it seems reasonable to include the exception
in the logging though the logging subsystem

* Add cause message to logging on fatal error

* Flush twice for extra safety

printStackTrace does some allocations, there might be rare cases where
we can at least salvage the cause.
b80ee5e
Member

raboof commented Jun 7, 2017

Created #23117. Included the other system termination logging changes too because adding the stacktrace to the logging might make it heavier, so logging to stderr first to be on the safe side.

patriknw added this to the 2.4.19 milestone Jun 9, 2017

patriknw closed this Jun 9, 2017

@patriknw patriknw added a commit that referenced this issue Jun 12, 2017

@raboof @patriknw raboof + patriknw Log fatal errors to stderr, add exceptions to 'marked' errors (#23109)
* Log fatal errors to stderr, add exceptions to 'marked' errors (#23107)

After logging to stderr it seems reasonable to include the exception
in the logging though the logging subsystem

* Add cause message to logging on fatal error

* Flush twice for extra safety

printStackTrace does some allocations, there might be rare cases where
we can at least salvage the cause.
56a8580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment