Skip to content

fix: add ScopeGuard to wait async tasks before early return in OrphanFilesCleanerImpl::Clean()#307

Merged
zjw1111 merged 3 commits into
alibaba:mainfrom
zjw1111:fix/orphan-cleaner-segfault
May 28, 2026
Merged

fix: add ScopeGuard to wait async tasks before early return in OrphanFilesCleanerImpl::Clean()#307
zjw1111 merged 3 commits into
alibaba:mainfrom
zjw1111:fix/orphan-cleaner-segfault

Conversation

@zjw1111
Copy link
Copy Markdown
Collaborator

@zjw1111 zjw1111 commented May 27, 2026

Purpose

Linked issue: N/A

Fix intermittent segmentation fault in OrphanFilesCleanerImpl::Clean(). When GetUsedFiles() returns an early error (e.g. "do not support clean index manifest"), the function returns immediately while thread pool tasks submitted via Via(executor_.get(), ...) are still running. These tasks capture this and access members like fs_, causing use-after-free when the OrphanFilesCleanerImpl object is destroyed shortly after.

Changes:

  1. Added a ScopeGuard in Clean() that calls CollectAll(file_statuses_futures) to ensure all submitted async tasks complete before the function returns on early error exit.
  2. Added future.valid() check in CollectAll() (in future.h) to safely skip already-consumed futures, so the guard is a no-op on the normal path where CollectAll is explicitly called later. This is consistent with the existing Wait() function which already has the same valid() check.
  3. Added a TODO comment in Via() noting that the exception capture logic will be removed in the future since paimon-cpp uses Status/Result for error handling throughout.

Tests

Existing test OrphanFilesCleanerTest.TestTableWithIndexManifest covers this path — it was the one experiencing the intermittent segfault.

API and Format

No.

Documentation

No.

Generative AI tooling

Generated-by: Aone Copilot (Claude 4.6 Opus)

Copilot AI review requested due to automatic review settings May 27, 2026 05:16
@zjw1111 zjw1111 force-pushed the fix/orphan-cleaner-segfault branch from 8386f5d to 36e42de Compare May 27, 2026 05:19
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zjw1111 zjw1111 force-pushed the fix/orphan-cleaner-segfault branch 2 times, most recently from 977a477 to ee6d26d Compare May 27, 2026 07:31
@zjw1111 zjw1111 requested a review from Copilot May 27, 2026 07:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Collaborator

@lxy-9602 lxy-9602 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/paimon/core/operation/orphan_files_cleaner_impl.cpp
Comment thread src/paimon/common/executor/future.h Outdated
@zjw1111 zjw1111 force-pushed the fix/orphan-cleaner-segfault branch from 032b8b0 to 7e5aae1 Compare May 28, 2026 07:26
@zjw1111
Copy link
Copy Markdown
Collaborator Author

zjw1111 commented May 28, 2026

@Eyizoha, the exception handling logic (try/catch + set_exception) in Via() is a legacy pattern. Since paimon-cpp consistently uses Status/Result for error handling, we plan to remove this exception capture logic from Via() in the future. Another reason for this is that calling functions that may throw exceptions inside ScopeGuard can result in unexpected crashes.

zjw1111 added 2 commits May 28, 2026 15:58
When GetUsedFiles() returns an error (e.g. index manifest not supported),
Clean() would return immediately while thread pool tasks submitted via
Via(executor_.get(), ...) were still running. These tasks capture 'this'
and access members like fs_, causing use-after-free when the
OrphanFilesCleanerImpl is destroyed shortly after.

Add a ScopeGuard that calls CollectAll(file_statuses_futures) to ensure
all submitted async tasks complete before the function returns, preventing
the intermittent segmentation fault.
@zjw1111 zjw1111 force-pushed the fix/orphan-cleaner-segfault branch from 6b345a4 to 0e54c34 Compare May 28, 2026 07:59
Copy link
Copy Markdown
Collaborator

@lucasfang lucasfang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zjw1111 zjw1111 merged commit 433487e into alibaba:main May 28, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants