Fix duplicated logs and memory issue with S3 log handler by jvstein · Pull Request #67144 · apache/airflow

jvstein · 2026-05-19T00:57:34Z

In our Kubernetes-based Celery executor deployment, we ran into a runaway memory issue with a sensor that used
mode="reschedule" and was repeatedly scheduled onto the same worker. In this environment we have
a dedicated set of workers for sensors, so the task was rescheduled to the same worker
every time the poke was executed. As a result, the local log file already existed and was appended to on each poke, and
the S3 log file was appended to each time as well.

Over time, this caused a large memory spike as the supervisor process loaded the logs from S3, attempted
to append a copy of the logs again, uploaded the result, and then repeated the cycle on the next poke. The memory usage eventually crashed
the worker with an OOM error.
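
To illustrate the growth pattern described above, here is a minimal in-memory sketch. It is not the handler's actual code: `simulate_pokes` and its list-based "remote" and "local" logs are hypothetical stand-ins for the S3 object and the worker's local log file, and it assumes each upload writes the existing remote log plus the entire local file.

```python
def simulate_pokes(pokes: int, truncate_after_upload: bool) -> list[str]:
    """Model a reschedule-mode sensor that always lands on the same worker."""
    remote: list[str] = []  # hypothetical stand-in for the S3 log object
    local: list[str] = []   # hypothetical stand-in for the local log file

    for poke in range(1, pokes + 1):
        local.append(f"poke {poke}")  # each poke appends to the local file
        remote = remote + local       # upload = existing remote log + local log
        if truncate_after_upload:
            local = []                # mirrors local_loc.write_text("")
    return remote


without_fix = simulate_pokes(4, truncate_after_upload=False)
with_fix = simulate_pokes(4, truncate_after_upload=True)
print(len(without_fix), len(with_fix))  # prints "10 4"
```

Under these assumptions the remote log grows as n(n+1)/2 lines after n pokes when the local file is never truncated, which is consistent with the memory spike reported here.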

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)

Generated-by: Claude Code following the guidelines

ferruzzi

Approved with a non-blocking thought

ferruzzi · 2026-05-21T18:24:48Z

+            elif has_uploaded:
+                local_loc.write_text("")


Non-blocking idea: This is definitely an improvement in the happy case, but if you want to take it further in another PR, consider what happens if the upload to S3 fails for some reason. I haven't thought this through all the way, so I may be wrong. I think has_uploaded will remain false, so the local copy doesn't get truncated, and then on the next pass we still write with duplication. If that's the case, you could tail the S3 log before uploading to check for duplicates and trim those before writing?

Just an idea. Thanks for fixing this.
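
One possible reading of the "tail and trim" idea, as a rough sketch only: compare the end of the remote log with the start of the local log and drop the overlapping lines before appending. `trim_overlap` is a hypothetical helper, not part of the Airflow provider, and it assumes both logs are available as lists of lines.

```python
def trim_overlap(remote_tail: list[str], local: list[str]) -> list[str]:
    """Return the local lines that are not already at the end of the remote log."""
    max_overlap = min(len(remote_tail), len(local))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if remote_tail[-size:] == local[:size]:
            return local[size:]
    return local  # no overlap found, append everything


print(trim_overlap(["poke 1", "poke 2"], ["poke 1", "poke 2", "poke 3"]))
# prints ['poke 3']
```

A caveat with this approach: a log that legitimately repeats the same lines at the boundary would lose them, so it trades one failure mode for another.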

I think that if an upload fails on a given worker, we still want to append logs on subsequent runs that land on the same worker. For example, if the task lands on different workers between pokes, we would still want to re-attempt the upload of the local log files, even if they arrive out-of-order in the final file.

In any scenario where the log upload fails and never re-executes on a worker, we lose logs, but that's a reality with the current code anyway and I'm not sure what the right solution is.

Maybe the worse problem is around a transient read failure of the log file from S3, which could trample the log file entirely.

I'm going to resolve this convo and merge, we're just talking. Maybe I'm looking at it wrong. I think right now if the S3 upload fails, then the local log doesn't get cleaned up. Which means if there is a hiccup and it works the next pass, it will upload duplicate logs. But that's what's already happening right now, so it's not a big deal.

> Maybe I'm looking at it wrong. I think right now if the S3 upload fails, then the local log doesn't get cleaned up. Which means if there is a hiccup and it works the next pass, it will upload duplicate logs.

I think in that case, that means the lines in the local log file are not present at all in the S3 file. So when it gets appended, there is no duplication. But maybe I'm the one thinking about it wrong.
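
The argument above can be checked with a small simulation. This is a sketch under stated assumptions, not the handler's real logic: `run` is a hypothetical function, the lists stand in for the S3 object and the local file, and a failed upload is assumed to write nothing to S3.

```python
def run(upload_results: list[bool]) -> list[str]:
    """Simulate pokes on one worker; each bool says whether that upload succeeds."""
    remote: list[str] = []  # hypothetical stand-in for the S3 log object
    local: list[str] = []   # hypothetical stand-in for the local log file

    for poke, upload_ok in enumerate(upload_results, start=1):
        local.append(f"poke {poke}")
        has_uploaded = False
        if upload_ok:
            remote = remote + local  # append the local log to the remote log
            has_uploaded = True
        if has_uploaded:
            local = []               # mirrors local_loc.write_text("")
    return remote


print(run([True, False, True]))
# prints ['poke 1', 'poke 2', 'poke 3']
```

In this model the lines kept locally after a failed upload never reached the remote log, so the later successful append adds them exactly once.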

jvstein requested a review from o-nikolas as a code owner May 19, 2026 00:57

boring-cyborg Bot added area:logging area:providers provider:amazon AWS/Amazon - related issues labels May 19, 2026

Fix duplicated logs and memory issue with S3 log handler

3c53d5c

jvstein force-pushed the fix_duplicate_s3_logs_with_memory_usage branch from b22fbc9 to 3c53d5c Compare May 19, 2026 00:58

eladkal requested review from ferruzzi and vincbeck May 21, 2026 17:17

vincbeck approved these changes May 21, 2026

ferruzzi approved these changes May 21, 2026

ferruzzi merged commit c3bf97d into apache:main May 21, 2026
108 checks passed

ferruzzi merged 1 commit into apache:main from jvstein:fix_duplicate_s3_logs_with_memory_usage


3 participants
