[BUGFIX] Ingester: close TSDB on compaction failure to prevent resource leak #7560

Open
sandy2008 wants to merge 2 commits into cortexproject:master from sandy2008:fix-tsdb-compact-resource-leak

sandy2008 (Contributor) commented May 27, 2026

What this PR does

createTSDB opens a TSDB via tsdb.Open() and then runs db.Compact() to flush any head data into blocks after WAL replay. If Compact() returned an error, the function returned immediately without calling db.Close(), leaking the head, WAL, and mmap file handles held by the open database.

Symptom: each failed compaction permanently leaked one set of TSDB file descriptors and memory-mapped regions. Under repeated startup compaction failures for different tenants, the ingester could exhaust file descriptors (ulimit -n) and eventually crash or refuse to open new TSDBs.

Fix: call db.Close() before returning the compaction error. If Close itself fails, log a warning so the secondary failure is observable rather than silent. This follows the existing close-error logging pattern used in closeAllTSDB (line 3032).

Which issue(s) this PR fixes

Fixes #7557

Checklist

  • CHANGELOG.md updated — [BUGFIX] Ingester: entry under master / unreleased.
  • Documentation updated — not applicable, no flags or config changed (make doc not required).
  • Tests verified — no new test added (the error path requires injecting a Compact failure which is non-trivial without production code changes); existing TSDB tests confirm no regression.

Test plan

  • go build -tags "netgo slicelabels" ./pkg/ingester/... — clean
  • go vet -tags "netgo slicelabels" ./pkg/ingester/... — clean
  • goimports -l -local github.com/cortexproject/cortex ./pkg/ingester/ingester.go — clean
  • go test -timeout 300s -tags "netgo slicelabels" -count=1 ./pkg/ingester/... — PASS (27s)
  • go test -tags "netgo slicelabels" -count=1 -run "TestHeadCompactionOnStartup" ./pkg/ingester/... — PASS (exercises the success path of the same code)
  • go test -tags "netgo slicelabels" -count=1 -run "TestIngester_OpenExistingTSDBOnStartup" ./pkg/ingester/... — PASS

…eak (cortexproject#7557)

When `db.Compact()` fails after a successful `tsdb.Open()` in
`createTSDB`, the opened TSDB was not closed, leaking head, WAL,
and mmap file handles. Add `db.Close()` with warning-level logging
on the compaction error path.

Closes cortexproject#7557

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
sandy2008 changed the title from "fix(ingester): Close TSDB on compaction failure in createTSDB" to "[BUGFIX] Ingester: close TSDB on compaction failure to prevent resource leak" May 27, 2026
Signed-off-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
friedrichg (Member) left a comment

Thanks!

dosubot (bot) added the lgtm label ("This PR has been approved by a maintainer") May 28, 2026
Labels

component/ingester, lgtm (This PR has been approved by a maintainer), size/XS, type/bug

Development

Successfully merging this pull request may close these issues.

Resource leak: createTSDB does not close TSDB when db.Compact() fails