release-26.1: log: fix infinite loop bug from malformed log json decoding #163353
When the statusServer.LogFile() API encounters malformed JSON in JSON-formatted logs, it enters an infinite loop that eventually crashes the node due to it going OOM. To fix this, the API will now stop reading from the log file and return all previously parsed logs. This means that properly formatted logs after a malformed line will not be returned.

Note: This bug was introduced in 24.1 via cockroachdb#113722, which added logic to handle malformed logs. This bugfix essentially reverts to the previous version's logic when handling JSON-formatted logs.

Epic: None

Release note (bug fix): Fixes a bug where generating a debug zip causes a node to OOM. This OOM happens when a log file contains a malformed log and is using json (or json-compact) formatting.
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.
dhartunian
left a comment
Release justification: this behavior is not gated or opt-in because it's an improvement to error handling. Adding configuration to it would complicate the implementation further and make it harder to verify. This change adds a new code branch that is only triggered in the case of a malformed JSON decode and exits early.
blathers backport release-26.1.0-rc
Backport 1/1 commits from #163224 on behalf of @kyle-a-wong.
When the statusServer.LogFile() API encounters malformed json in json
formatted logs, it results in an infinite loop that eventually crashes
a node due to it going OOM. To fix this, the API will now stop reading
from the log file and return all previously parsed logs. This means that
properly formatted logs after a malformed line will not be returned.
Epic: None
Release note (bug fix): Fixes a bug where generating a debug zip causes
a node to OOM. This OOM happens when a log file contains a malformed
log and is using json (or json-compact) formatting.
Release justification: Fixes a bug that can cause an infinite loop and eventual OOM. The change is low impact and only affects the code path that causes the OOM.