Skip to content

[v3-2-test] Fix triggerer file handle leak when remote log upload fails (#66675)#66684

Merged
potiuk merged 1 commit into
v3-2-testfrom
backport-a6798ab-v3-2-test
May 11, 2026
Merged

[v3-2-test] Fix triggerer file handle leak when remote log upload fails (#66675)#66684
potiuk merged 1 commit into
v3-2-testfrom
backport-a6798ab-v3-2-test

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

When a trigger finishes, the supervisor uploads its log to the remote log
store and then closes the local file descriptor. If upload_to_remote()
raised (e.g., S3/GCS throttling, transient network error), close() was
never called and the underlying BufferedWriter — plus its 8 KiB buffer and
the open fd — leaked for every failed upload.

Wrap the cleanup in try/except/finally so the fd is always closed, and log
the upload failure instead of letting it propagate into handle_requests.

Surfaced in discussion #65985.
(cherry picked from commit a6798ab)

Co-authored-by: Jarek Potiuk jarek@potiuk.com

…ls (#66675)

When a trigger finishes, the supervisor uploads its log to the remote log
store and then closes the local file descriptor. If `upload_to_remote()`
raised (e.g., S3/GCS throttling, transient network error), `close()` was
never called and the underlying BufferedWriter — plus its 8 KiB buffer and
the open fd — leaked for every failed upload.

Wrap the cleanup in try/except/finally so the fd is always closed, and log
the upload failure instead of letting it propagate into `handle_requests`.

Surfaced in discussion #65985.
(cherry picked from commit a6798ab)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@potiuk potiuk marked this pull request as ready for review May 11, 2026 04:03
@potiuk potiuk merged commit 348c182 into v3-2-test May 11, 2026
29 checks passed
@potiuk potiuk deleted the backport-a6798ab-v3-2-test branch May 11, 2026 04:10
potiuk added a commit that referenced this pull request May 12, 2026
…D test (#66743)

v3-2-test is currently red because the test
test_trigger_logger_fd_closed_when_upload_to_remote_raises (backported
via #66684 from #66675) consumes a `jobless_supervisor` fixture that
was never backported to this branch.

Root cause: the fixture was added on main by #66006 ("Make
TriggerRunnerSupervisor.job optional"), which is a feature change and
was correctly skipped from release-branch backports. The subsequent
fix backport (#66684) brought only the test definition, so the test
errored at setup with "fixture 'jobless_supervisor' not found" across
every DB/Python matrix cell.

Cherry-picking #66006 wholesale isn't viable: 4 conflict regions in
triggerer_job_runner.py totalling ~150 lines (v3-2-test has diverged
since 2026-04-30), and it would drag a feature into a release branch.

Instead, add the fixture inline with the same name and shape as main's
but adapted for v3-2-test's still-required `job: Job` constraint —
use mocker.Mock(spec=Job) instead of job=None. The test only
exercises logger_cache, running_triggers, and _handle_request, none
of which touch the .job attribute, so the mock is sufficient.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant