Skip to content

Fix KubernetesPodOperator emitting orphan timestamps for empty container writes#67652

Merged
kaxil merged 1 commit into
apache:mainfrom
astronomer:fix/kpo-empty-log-line-pollution-36571
May 28, 2026
Merged

Fix KubernetesPodOperator emitting orphan timestamps for empty container writes#67652
kaxil merged 1 commit into
apache:mainfrom
astronomer:fix/kpo-empty-log-line-pollution-36571

Conversation

@kaxil
Copy link
Copy Markdown
Member

@kaxil kaxil commented May 28, 2026

When a container running under KubernetesPodOperator writes an empty line, kubelet streams it back (with timestamps=True) as "<rfc3339-ts> \n" -- a timestamp followed by a separator space and an empty message. parse_log_line in pod_manager.py called line.strip().partition(" "), which removed the trailing separator space before partitioning, so the function returned timestamp=None and the caller treated the line as a continuation of the previous buffered log record. The bare RFC3339 string was then appended onto the previous message and emitted as a multi-line log where only the first line carried the Airflow [ts] {pod_manager.py:N} INFO - prefix, leaving unprefixed timestamp rows interleaved in task logs:

[2026-05-28T13:07:50.160+0000] {pod_manager.py:520} INFO - [base] first test line
2026-05-28T13:07:57.030578889Z 
2026-05-28T13:07:57.030581518Z
2026-05-28T13:07:57.030642740Z 
[2026-05-28T13:07:57.034+0000] {pod_manager.py:520} INFO - [base] last test line

Downstream that breaks airflow.utils.log.file_task_handler._parse_timestamp, which feeds the line to pendulum.parse after stripping []: malformed fragments from these orphan rows can raise ValueError: month must be in 1..12 and fail the task entirely.

Closes #36571.

Root cause and history

Regressed in #33675 (merged 2023-08-24, shipped in cncf-kubernetes 7.5.0) which replaced the original line.find(\" \") split with a line.strip().partition(\" \") pattern under the banner of a refactor:

- split_at = line.find(\" \")
- if split_at == -1:
-     ...
- timestamp = line[:split_at]
- message = line[split_at + 1 :].rstrip()
+ timestamp, sep, message = line.strip().partition(\" \")
+ if not sep:
+     ...

The pre-refactor implementation correctly handled <ts> \n because find(\" \") matched the separator space directly and the message-side .rstrip() produced an empty string. The new code strips the separator off before partitioning, so the function loses its only signal that the line is well-formed.

This matches the regression window the original reporter described in #36571: the bug appeared after upgrading cncf-kubernetes from 7.4.2 (pre-refactor) to 7.12.0+ (post-refactor) and is still reproducible on current main (10.17.x).

Fix

  • parse_log_line no longer pre-strips the line; it rstrip(\"\\n\") only and partitions on the original separator, so empty container writes are recognised as (timestamp, \"\") rather than as continuations. If the partition yields no separator the whole line is tried as a bare timestamp (some kubelet versions emit <ts>\\n with no trailing space), and parse failures fall through to the original return-the-raw-line path. It also catches ValueError, not just ParserError, so a malformed timestamp can never escape into Airflow's downstream parsers.
  • The sync (PodManager.fetch_container_logs.consume_logs) and async (AsyncPodManager.fetch_container_logs_before_current_sec) log consumer loops skip emit for empty messages -- the resume marker still advances in the sync path so reconnect-since-time stays correct, but no noisy [base] row is written.

Tests

  • Parametrized test_parse_log_line_handles_empty_container_writes covers <ts> \\n, <ts>\\n, and <ts> (no newline). Verified RED on main, GREEN with the fix.
  • End-to-end test_empty_container_lines_do_not_pollute_previous_message drives fetch_container_logs with the exact log sequence from the issue and asserts no orphan timestamps land in caplog. Also RED on main, GREEN with the fix.

Gotchas

  • Truly empty container output (just \\n) is no longer surfaced as a [base] row. That output carries no information for the task log reader and was previously the trigger for downstream pendulum failures, so dropping it is a net improvement; if a future use case needs to count blank container lines, that's separable work.

@kaxil kaxil force-pushed the fix/kpo-empty-log-line-pollution-36571 branch from ed18cfb to db1539d Compare May 28, 2026 18:00
…ner writes

When a container running under KPO writes an empty line, kubelet streams
it back (with ``timestamps=True``) as ``"<rfc3339-ts> \n"`` -- a timestamp
followed by a separator space and an empty message. ``parse_log_line``
called ``line.strip().partition(" ")`` which removed the trailing
separator space before partitioning, so the function returned
``timestamp=None`` and the caller treated the line as a continuation of
the previous buffered log record. The bare RFC3339 string was then
appended onto the previous message and emitted as a multi-line log
where only the first line carried the Airflow ``[ts] {pod_manager.py:N}
INFO -`` prefix, leaving unprefixed timestamp rows interleaved in task
logs.

Downstream that breaks
``airflow.utils.log.file_task_handler._parse_timestamp`` (which feeds
the line to ``pendulum.parse`` after stripping ``[]``): malformed
fragments from these orphan rows can raise
``ValueError: month must be in 1..12`` and fail the task entirely.

The fix:

* ``parse_log_line`` no longer pre-strips the line; it ``rstrip("\n")``
  only and partitions on the original separator, so empty container
  writes are recognised as ``(timestamp, "")`` rather than as
  continuations. It also catches ``ValueError`` (not just
  ``ParserError``) so a malformed timestamp can never escape.
* The sync and async log consumer loops skip emit for empty messages
  -- the resume marker still advances in the sync path, but no noisy
  ``[base] `` row is written.

Regressed in apache#33675 (cncf-kubernetes 7.5.0, Aug 2023) which replaced
the original ``line.find(" ")`` split with the strip+partition pattern
under the banner of a refactor. The pre-refactor implementation
correctly handled ``<ts> \n`` because ``find(" ")`` matched the
separator space directly. Reported in apache#36571 against 7.12.0 / 7.13.0,
still reproducible on the current main.
@kaxil kaxil force-pushed the fix/kpo-empty-log-line-pollution-36571 branch from db1539d to 20e8d90 Compare May 28, 2026 18:03
@kaxil kaxil merged commit fd95035 into apache:main May 28, 2026
113 checks passed
@kaxil kaxil deleted the fix/kpo-empty-log-line-pollution-36571 branch May 28, 2026 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

KubernetesPodOperator prints empty log lines without respecting the log format

2 participants