
Fix multi-controller race conditions in ResponseStoreCleaner cursor cleanup #18113

Merged
yashmayya merged 1 commit into apache:master from yashmayya:fix-response-store-cleaner-race-conditions
Apr 7, 2026

Conversation

Contributor

@yashmayya yashmayya commented Apr 6, 2026

Summary

  • Leader coordination: Only the lead controller runs ResponseStoreCleaner by gating processTables() on isLeaderForTable(TASK_NAME), preventing all controllers from racing to delete the same expired responses on each broker.
  • Graceful concurrent deletion on broker: AbstractResponseStore.deleteResponse() now catches exceptions from readResponse() when files vanish between the exists() check and the read (TOCTOU race). FsResponseStore.deleteResponseImpl() catches exceptions from pinotFS.delete() and treats already-gone directories as success instead of throwing.
  • No batch abort on individual failures: ResponseStoreCleaner.deleteExpiredResponses() logs individual DELETE failures as warnings instead of throwing a RuntimeException, so one failed DELETE no longer aborts the entire broker's cleanup batch.
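The "already gone is success" behavior described for FsResponseStore.deleteResponseImpl() can be illustrated with a minimal, self-contained sketch that uses plain java.nio.file rather than Pinot's PinotFS API (the deleteResponseDir method name is illustrative, not the actual Pinot code):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class IdempotentDelete {
  // Deletes a response directory, treating "already gone" as success so that
  // concurrent cleaners racing on the same path do not surface spurious errors.
  static boolean deleteResponseDir(Path dir) {
    try {
      if (!Files.exists(dir)) {
        return true; // another cleaner already removed it
      }
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
        for (Path entry : entries) {
          Files.deleteIfExists(entry); // no-op if a concurrent delete won the race
        }
      }
      Files.deleteIfExists(dir);
      return true;
    } catch (NoSuchFileException e) {
      return true; // directory vanished mid-delete: still a success
    } catch (IOException e) {
      return false; // a real failure (permissions, transient FS error)
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("response-store");
    Files.createFile(dir.resolve("result.json"));
    System.out.println(deleteResponseDir(dir)); // true
    System.out.println(deleteResponseDir(dir)); // true: second delete is a no-op
  }
}
```

Because both the "deleted it" and "it was already gone" outcomes return success, two controllers racing on the same directory both see a clean result instead of one of them failing.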

Root cause

When multiple controllers run the ResponseStoreCleaner concurrently (all controllers run it because processTables() ignores the table leadership list), they race to delete the same expired cursor responses on each broker. The broker's deleteResponse() has a TOCTOU race between exists(), readResponse(), and deleteResponseImpl(): when one controller deletes a cursor's files, the others hit FileNotFoundException / IOException, and the broker returns HTTP 500 instead of 404. The controller's deleteExpiredResponses() then throws on any single 500, aborting the rest of the cleanup batch for that broker.
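The exists()-then-read window can be reproduced with a self-contained sketch using plain java.nio.file as a stand-in for the broker's response store (readResponseOrGone is a hypothetical helper for illustration, not Pinot code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ToctouDemo {
  // Broker-side read that tolerates the exists()-then-read race: a response
  // that vanishes between the check and the read reports "gone" (which a
  // handler can map to HTTP 404) instead of erroring (HTTP 500).
  static String readResponseOrGone(Path response) throws IOException {
    if (!Files.exists(response)) {
      return "gone";
    }
    try {
      return Files.readString(response);
    } catch (NoSuchFileException e) {
      return "gone"; // deleted concurrently after the exists() check
    }
  }

  public static void main(String[] args) throws IOException {
    Path response = Files.createTempFile("cursor-response", ".json");
    Files.writeString(response, "{}");
    System.out.println(readResponseOrGone(response)); // {}
    Files.delete(response); // a concurrent cleaner wins the race
    System.out.println(readResponseOrGone(response)); // gone
  }
}
```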

Test plan

  • Existing ResponseStoreCleanerTest tests pass (including testPartialBrokerFailureDoesNotBlockOthers and testCleanupTreats404AsSuccess)
  • Verify in a multi-controller environment that only the lead controller runs the cleaner
  • Verify that concurrent DELETE requests to the broker no longer cause HTTP 500s

🤖 Generated with Claude Code

@yashmayya yashmayya added the bug Something is not working as expected label Apr 6, 2026

codecov-commenter commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 31.25% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.98%. Comparing base (736cbf7) to head (4455238).
⚠️ Report is 28 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...he/pinot/common/cursors/AbstractResponseStore.java     14.28%   6 Missing ⚠️
...g/apache/pinot/broker/cursors/FsResponseStore.java      0.00%   5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18113      +/-   ##
============================================
+ Coverage     63.75%   63.98%   +0.22%     
- Complexity     1573     1617      +44     
============================================
  Files          3167     3181      +14     
  Lines        191658   193753    +2095     
  Branches      29469    29917     +448     
============================================
+ Hits         122198   123970    +1772     
- Misses        59851    59981     +130     
- Partials       9609     9802     +193     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2            0.00% <ø> (ø)
java-11                63.94% <31.25%> (+0.20%) ⬆️
java-21                55.81% <0.00%> (-0.28%) ⬇️
temurin                63.98% <31.25%> (+0.22%) ⬆️
unittests              63.98% <31.25%> (+0.22%) ⬆️
unittests1             55.84% <0.00%> (-0.27%) ⬇️
unittests2             34.35% <31.25%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.


@yashmayya yashmayya requested review from Copilot, gortiz and xiangfu0 April 7, 2026 15:41

Copilot AI left a comment


Pull request overview

This PR aims to eliminate multi-controller races during cursor response-store cleanup by ensuring only one controller performs cleanup work and by making broker-side deletion idempotent under concurrent deletes.

Changes:

  • Gate ResponseStoreCleaner execution to a single controller using lead-controller coordination.
  • Make broker deleteResponse() resilient to TOCTOU races between exists() / metadata read / delete.
  • Prevent cleanup batches from aborting due to individual DELETE failures by logging warnings instead of throwing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • pinot-controller/src/main/java/org/apache/pinot/controller/cursors/ResponseStoreCleaner.java: Adds a lead-controller gate and changes per-broker delete failure handling to avoid aborting cleanup.
  • pinot-common/src/main/java/org/apache/pinot/common/cursors/AbstractResponseStore.java: Handles concurrent disappearance of response metadata during deletion to avoid failing deletes.
  • pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java: Treats already-deleted response directories as successful deletes to make deletion idempotent.

Comment on lines +116 to +121
  @Override
  protected void processTables(List<String> tableNamesWithType, Properties periodicTaskProperties) {
    // Make it so that only one controller is responsible for cleanup.
    if (!_leadControllerManager.isLeaderForTable(TASK_NAME)) {
      return;
    }

Copilot AI Apr 7, 2026


ControllerPeriodicTask.runTask() only invokes processTables() when this controller is leader for at least one entry returned by getTablesToProcess(). With the new isLeaderForTable(TASK_NAME) gate inside processTables(), it’s possible that the controller that is leader for TASK_NAME is not leader for any actual table (e.g., few tables / many controllers), causing processTables() to never be called on that controller and the cleaner to never run. Consider overriding getTablesToProcess() to return a singleton list containing TASK_NAME (or switching this task to BasePeriodicTask) so exactly one controller runs cleanup independent of table distribution, and then remove/avoid the additional early-return gate.
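Roughly what that suggestion could look like, sketched against stand-in types rather than Pinot's real ControllerPeriodicTask / LeadControllerManager classes (names and shapes here are illustrative assumptions):

```java
import java.util.Collections;
import java.util.List;

public class CleanerGateSketch {
  static final String TASK_NAME = "responseStoreCleaner";

  // Stand-in for Pinot's LeadControllerManager: maps a key to a leader decision.
  interface LeaderResolver {
    boolean isLeaderFor(String key);
  }

  // Stand-in for overriding getTablesToProcess(): runTask() only invokes
  // processTables() for entries this controller leads, so returning the task
  // name itself (instead of real table names) makes exactly one controller
  // run the cleanup, independent of how tables are distributed.
  static List<String> tablesToProcess(LeaderResolver resolver) {
    return resolver.isLeaderFor(TASK_NAME)
        ? Collections.singletonList(TASK_NAME)
        : Collections.emptyList();
  }

  public static void main(String[] args) {
    LeaderResolver leader = key -> true;    // this controller leads TASK_NAME
    LeaderResolver follower = key -> false; // some other controller leads it
    System.out.println(tablesToProcess(leader));   // [responseStoreCleaner]
    System.out.println(tablesToProcess(follower)); // []
  }
}
```

With this shape the leadership check happens once, when building the work list, so no additional early-return gate inside processTables() is needed.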

yashmayya (Contributor, Author) replied:

Good catch, this should be fixed.

Comment on lines +240 to +245
    // readResponse() (TOCTOU race), so handle that gracefully.
    long bytesWritten = 0;
    try {
      bytesWritten = readResponse(requestId).getBytesWritten();
    } catch (Exception e) {
      LOGGER.debug("Could not read response metadata for requestId={} (may have been deleted concurrently)",

Copilot AI Apr 7, 2026


The intent here is to handle the TOCTOU “deleted between exists() and readResponse()” race, but catching a generic Exception can also hide non-race failures (e.g., deserialization bugs, permission issues, transient FS errors), and it will silently skip metrics adjustments in those cases. Consider narrowing the catch to the expected “not found”/IO exceptions (or rethrow unexpected exceptions) so real read failures aren’t masked.

Suggested change

-    // readResponse() (TOCTOU race), so handle that gracefully.
-    long bytesWritten = 0;
-    try {
-      bytesWritten = readResponse(requestId).getBytesWritten();
-    } catch (Exception e) {
-      LOGGER.debug("Could not read response metadata for requestId={} (may have been deleted concurrently)",
+    // readResponse() (TOCTOU race), so only suppress read failures when the response no longer exists.
+    long bytesWritten = 0;
+    try {
+      bytesWritten = readResponse(requestId).getBytesWritten();
+    } catch (Exception e) {
+      if (exists(requestId)) {
+        throw e;
+      }
+      LOGGER.debug("Could not read response metadata for requestId={} because it was deleted concurrently",
…leanup

When multiple controllers run the ResponseStoreCleaner concurrently, they
race to delete the same expired cursor responses on each broker. This causes
TOCTOU races in the broker's delete path (files disappear between exists()
and read/delete), resulting in HTTP 500s that cascade into aborting the
entire cleanup batch for that broker.

Three fixes:
1. Only the lead controller runs the cleaner by gating processTables() on
   isLeaderForTable(TASK_NAME), eliminating the multi-controller race.
2. Handle concurrent deletion gracefully on the broker side:
   - AbstractResponseStore.deleteResponse() catches exceptions from
     readResponse() when files vanish between exists() and read.
   - FsResponseStore.deleteResponseImpl() catches exceptions from
     pinotFS.delete() and treats already-gone directories as success.
3. ResponseStoreCleaner.deleteExpiredResponses() logs individual delete
   failures as warnings instead of throwing, so one failed DELETE no longer
   aborts the entire broker batch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yashmayya yashmayya force-pushed the fix-response-store-cleaner-race-conditions branch from 3e90ba3 to 4455238 Compare April 7, 2026 16:21
@yashmayya yashmayya merged commit e38f8b9 into apache:master Apr 7, 2026
15 of 16 checks passed
yashmayya added a commit to yashmayya/pinot that referenced this pull request Apr 7, 2026

4 participants