Fix KubernetesPodOperator emitting orphan timestamps for empty container writes#67652
Merged
kaxil merged 1 commit intoMay 28, 2026
Merged
Conversation
2 tasks
ashb
approved these changes
May 28, 2026
ed18cfb to
db1539d
Compare
…ner writes
When a container running under KPO writes an empty line, kubelet streams
it back (with ``timestamps=True``) as ``"<rfc3339-ts> \n"`` -- a timestamp
followed by a separator space and an empty message. ``parse_log_line``
called ``line.strip().partition(" ")`` which removed the trailing
separator space before partitioning, so the function returned
``timestamp=None`` and the caller treated the line as a continuation of
the previous buffered log record. The bare RFC3339 string was then
appended onto the previous message and emitted as a multi-line log
where only the first line carried the Airflow ``[ts] {pod_manager.py:N}
INFO -`` prefix, leaving unprefixed timestamp rows interleaved in task
logs.
Downstream that breaks
``airflow.utils.log.file_task_handler._parse_timestamp`` (which feeds
the line to ``pendulum.parse`` after stripping ``[]``): malformed
fragments from these orphan rows can raise
``ValueError: month must be in 1..12`` and fail the task entirely.
The fix:
* ``parse_log_line`` no longer pre-strips the line; it ``rstrip("\n")``
only and partitions on the original separator, so empty container
writes are recognised as ``(timestamp, "")`` rather than as
continuations. It also catches ``ValueError`` (not just
``ParserError``) so a malformed timestamp can never escape.
* The sync and async log consumer loops skip emit for empty messages
-- the resume marker still advances in the sync path, but no noisy
``[base] `` row is written.
Regressed in apache#33675 (cncf-kubernetes 7.5.0, Aug 2023) which replaced
the original ``line.find(" ")`` split with the strip+partition pattern
under the banner of a refactor. The pre-refactor implementation
correctly handled ``<ts> \n`` because ``find(" ")`` matched the
separator space directly. Reported in apache#36571 against 7.12.0 / 7.13.0,
still reproducible on the current main.
db1539d to
20e8d90
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a container running under
KubernetesPodOperatorwrites an empty line, kubelet streams it back (withtimestamps=True) as"<rfc3339-ts> \n"-- a timestamp followed by a separator space and an empty message.parse_log_lineinpod_manager.pycalledline.strip().partition(" "), which removed the trailing separator space before partitioning, so the function returnedtimestamp=Noneand the caller treated the line as a continuation of the previous buffered log record. The bare RFC3339 string was then appended onto the previous message and emitted as a multi-line log where only the first line carried the Airflow[ts] {pod_manager.py:N} INFO -prefix, leaving unprefixed timestamp rows interleaved in task logs:Downstream that breaks
airflow.utils.log.file_task_handler._parse_timestamp, which feeds the line topendulum.parseafter stripping[]: malformed fragments from these orphan rows can raiseValueError: month must be in 1..12and fail the task entirely.Closes #36571.
Root cause and history
Regressed in #33675 (merged 2023-08-24, shipped in cncf-kubernetes 7.5.0) which replaced the original
line.find(\" \")split with aline.strip().partition(\" \")pattern under the banner of a refactor:The pre-refactor implementation correctly handled
<ts> \nbecausefind(\" \")matched the separator space directly and the message-side.rstrip()produced an empty string. The new code strips the separator off before partitioning, so the function loses its only signal that the line is well-formed.This matches the regression window the original reporter described in #36571: the bug appeared after upgrading cncf-kubernetes from 7.4.2 (pre-refactor) to 7.12.0+ (post-refactor) and is still reproducible on current
main(10.17.x).Fix
parse_log_lineno longer pre-strips the line; itrstrip(\"\\n\")only and partitions on the original separator, so empty container writes are recognised as(timestamp, \"\")rather than as continuations. If the partition yields no separator the whole line is tried as a bare timestamp (some kubelet versions emit<ts>\\nwith no trailing space), and parse failures fall through to the original return-the-raw-line path. It also catchesValueError, not justParserError, so a malformed timestamp can never escape into Airflow's downstream parsers.PodManager.fetch_container_logs.consume_logs) and async (AsyncPodManager.fetch_container_logs_before_current_sec) log consumer loops skip emit for empty messages -- the resume marker still advances in the sync path so reconnect-since-time stays correct, but no noisy[base]row is written.Tests
test_parse_log_line_handles_empty_container_writescovers<ts> \\n,<ts>\\n, and<ts>(no newline). Verified RED onmain, GREEN with the fix.test_empty_container_lines_do_not_pollute_previous_messagedrivesfetch_container_logswith the exact log sequence from the issue and asserts no orphan timestamps land incaplog. Also RED onmain, GREEN with the fix.Gotchas
\\n) is no longer surfaced as a[base]row. That output carries no information for the task log reader and was previously the trigger for downstream pendulum failures, so dropping it is a net improvement; if a future use case needs to count blank container lines, that's separable work.