Fix flaky PinotTaskManagerDistributedLockingTest cron race#18252
Merged
Conversation
The shared createSingleTestTable helper used a cron schedule of "0 */10 * ? * * *" that fires every 10 minutes on the :00 second boundary. With PINOT_TASK_MANAGER_SCHEDULER_ENABLED=true (set by every test in this class), the Quartz scheduler would occasionally fire during the test window, triggering an extra scheduleTasks -> generateTasks call and breaking assertions that count exact task generations. Change the schedule to "0 0 0 1 1 ? 2099" so the cron never fires during test runs, and document why so the schedule is not "corrected" later. Removes the nondeterminism at the root rather than papering over it with retries or relaxed assertions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4444cba to
2206c4b
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18252 +/- ##
============================================
- Coverage 63.48% 63.46% -0.03%
Complexity 1627 1627
============================================
Files 3244 3244
Lines 197342 197342
Branches 30529 30529
============================================
- Hits 125285 125241 -44
- Misses 62014 62065 +51
+ Partials 10043 10036 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes flakiness in PinotTaskManagerDistributedLockingTest by preventing the Quartz cron scheduler from firing during test execution, which previously could cause unexpected extra generateTasks invocations and break exact-count assertions.
Changes:
- Update the test table task cron schedule to a far-future one-time trigger (
2099) so it won’t run during tests. - Add Javadoc explaining why the schedule is intentionally “effectively never” to avoid reintroducing the race.
yashmayya
approved these changes
Apr 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes flaky
PinotTaskManagerDistributedLockingTest.testForceReleaseLockDuringTaskExecution(and siblings) that intermittently observed an extragenerateTasksinvocation:Root cause
The shared
createSingleTestTablehelper configured every test table with cron"0 */10 * ? * * *"— fires every 10 minutes at the:00second boundary. Each test in this class enablesPINOT_TASK_MANAGER_SCHEDULER_ENABLED=true, which starts the Quartz scheduler and registers that cron. If the test window happened to straddle a:00/:10/:20/...wall-clock boundary, Quartz would fireCronJobScheduleJob→PinotTaskManager.scheduleTasks(...)→PinotTaskGenerator.generateTasks(...), bumpingControllableTaskGenerator._taskGenerationCountbeyond what the test's explicitcreateTask/scheduleTaskscalls produced.The tests assert exact counts (e.g.
assertEquals(slowGenerator.getTaskGenerationCount(), 2, ...)), so even a single spurious cron firing breaks them.Fix
Change the helper's cron to
"0 0 0 1 1 ? 2099"— a valid Quartz expression that fires once at midnight on Jan 1, 2099, i.e. effectively never during a test run. Added a Javadoc comment oncreateSingleTestTableexplaining why the schedule is deliberately far-future so a future reader doesn't "correct" it and reintroduce the flakiness.This removes the nondeterminism at the source rather than masking it with retries or loosened assertions. No product code changes.
Test plan
./mvnw -pl pinot-controller -am -Dtest='PinotTaskManagerDistributedLockingTest#testForceReleaseLockDuringTaskExecution' test— passes./mvnw -pl pinot-controller -am -Dtest='PinotTaskManagerDistributedLockingTest' test— all 6 tests pass./mvnw spotless:apply checkstyle:check license:check -pl pinot-controller— clean🤖 Generated with Claude Code