Fix duplicate log reads when resuming from log_pos (#63531)

gopidesupavan merged 1 commit into apache:main
Conversation
jason810496 left a comment:
Nice catch! It's a long-standing bug introduced in 3.0, thanks a lot!
Pull request overview
Fixes `FileTaskHandler._read()` so resumed log reads correctly honor `metadata["log_pos"]`, preventing duplicate log content when tailing/resuming from a prior position.
Changes:
- Reassigns the `islice(...)` result to actually advance the output stream when `log_pos` is provided.
- Adds a unit test that verifies resumed reads skip previously-returned log lines and return only new content.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/utils/log/file_task_handler.py | Fixes resume/tailing behavior by correctly slicing the stream based on metadata["log_pos"]. |
| airflow-core/tests/unit/utils/test_log_handlers.py | Adds regression coverage ensuring resumed reads don’t return duplicate earlier log lines. |
CI test failures look unrelated but worth verifying.
Yeah, looks like tests are failing for the lowest-dep build; likely pydantic==2.12.0 causing issues: https://github.com/apache/airflow/actions/runs/23058551095/job/66982917534?pr=63531#step:8:1938
The PR for the test failures is here: #63570
(cherry picked from commit 00dc420)
Co-authored-by: GPK <gopidesupavan@gmail.com>
Backport successfully created: v3-1-test

Note: when merging PRs targeted for Airflow 3.X, in case of doubt please ask in the #release-management Slack channel.
Fix `FileTaskHandler._read()` so resumed log reads actually honor `metadata["log_pos"]`.
Previously we called `islice(...)` without reassigning the returned iterator, so no log lines were skipped and resumed reads could return duplicate content from the beginning of the stream.
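A minimal sketch of the bug pattern and the fix (this is illustrative, not the actual handler code; `read_log` and its signature are hypothetical): `itertools.islice` returns a new iterator rather than mutating its argument, so discarding its result skips nothing.

```python
from itertools import islice


def read_log(stream, log_pos=0):
    """Return log lines, skipping the first log_pos lines already served.

    Illustrative stand-in for the slicing done in FileTaskHandler._read().
    """
    lines = iter(stream)
    if log_pos:
        # Buggy pattern (pre-fix): the islice result was discarded,
        #     islice(lines, log_pos, None)
        # so `lines` still started at the beginning and resumed reads
        # duplicated earlier content.
        # Fix: reassign, since islice returns a new iterator.
        lines = islice(lines, log_pos, None)
    return list(lines)
```

With `log_pos=2` and a three-line stream, only the third line is returned; without the reassignment, all three lines would come back again.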