Describe the problem
In a recent incident, nodes were exiting while outputting logs due to this function being called:
cockroach/pkg/util/log/exit_override.go
Line 81 in 049c30a
    func (l *loggerT) exitLocked(err error, code exit.Code) {
Because SRE doesn't work much with the DB code, it wasn't obvious to them why the nodes were exiting, and it took some time to realize that the exits were caused by errors coming from the logging sink.
SRE mentioned that stack traces are one of the things they look for when nodes crash, and requested that we include a stack trace when exiting here, to make it more apparent that nodes are exiting due to log sink issues.
Jira issue: CRDB-53952