Fix flaky PinotTaskManagerDistributedLockingTest cron race#18252

Merged
xiangfu0 merged 1 commit into apache:master from xiangfu0:claude/amazing-wescoff-138778
Apr 18, 2026

@xiangfu0
Contributor

Summary

Fixes flaky PinotTaskManagerDistributedLockingTest.testForceReleaseLockDuringTaskExecution (and siblings) that intermittently observed an extra generateTasks invocation:

java.lang.AssertionError: Both createTask calls should have generated tasks
expected [2] but found [3]

Root cause

The shared createSingleTestTable helper configured every test table with the cron "0 */10 * ? * * *", which fires every 10 minutes at the :00 second boundary. Each test in this class enables PINOT_TASK_MANAGER_SCHEDULER_ENABLED=true, which starts the Quartz scheduler and registers that cron. If the test window happened to straddle a :00/:10/:20/... wall-clock minute boundary, Quartz would fire CronJobScheduleJob -> PinotTaskManager.scheduleTasks(...) -> PinotTaskGenerator.generateTasks(...), bumping ControllableTaskGenerator._taskGenerationCount beyond what the test's explicit createTask / scheduleTasks calls produced.

The tests assert exact counts (e.g. assertEquals(slowGenerator.getTaskGenerationCount(), 2, ...)), so even a single spurious cron firing breaks them.
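
To make the race window concrete, here is a minimal sketch that checks whether a test window overlaps one of the every-10-minutes firing instants. It is not part of the PR: CronBoundaryCheck and straddlesTenMinuteBoundary are hypothetical names, the check uses only java.time rather than Quartz, and it assumes the timestamps are wall-clock time in the scheduler's time zone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration; not code from the PR or from Quartz.
final class CronBoundaryCheck {

  // Returns true if [start, start + length] contains an instant whose second is 0
  // and whose minute is a multiple of 10, i.e. a firing time of "0 */10 * ? * * *".
  static boolean straddlesTenMinuteBoundary(LocalDateTime start, Duration length) {
    // Round down to the most recent :00/:10/:20/... minute boundary.
    LocalDateTime floor = start.truncatedTo(ChronoUnit.MINUTES)
        .withMinute(start.getMinute() / 10 * 10);
    // The first boundary at or after the window start.
    LocalDateTime next = floor.equals(start) ? start : floor.plusMinutes(10);
    return !next.isAfter(start.plus(length));
  }

  public static void main(String[] args) {
    // A 5-second test window starting at 12:09:58 crosses 12:10:00.
    LocalDateTime unlucky = LocalDateTime.of(2026, 4, 18, 12, 9, 58);
    // The same window starting at 12:03:00 does not reach any boundary.
    LocalDateTime lucky = LocalDateTime.of(2026, 4, 18, 12, 3, 0);
    System.out.println(straddlesTenMinuteBoundary(unlucky, Duration.ofSeconds(5)));
    System.out.println(straddlesTenMinuteBoundary(lucky, Duration.ofSeconds(5)));
  }
}
```

If test start times are assumed to be uniformly distributed, a window of w seconds overlaps a boundary with probability of roughly w/600, so a window of a few seconds would be hit in only a small fraction of runs. That would be consistent with an intermittent rather than a steady failure.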

Fix

Change the helper's cron to "0 0 0 1 1 ? 2099", a valid Quartz expression that fires once at midnight on Jan 1, 2099, i.e. effectively never during a test run. Also add a Javadoc comment on createSingleTestTable explaining why the schedule is deliberately far-future, so that a future reader doesn't "correct" it and reintroduce the flakiness.

This removes the nondeterminism at the source rather than masking it with retries or loosened assertions. No product code changes.
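
As a rough illustration of why the new schedule cannot fire during a test run, the sketch below reads the seven fields of a fully literal expression (seconds, minutes, hours, day-of-month, month, day-of-week, year) and compares the single resulting instant with a test window. FarFutureCron, singleFireTime and canFireWithin are hypothetical names; this is a stdlib-only approximation that handles nothing beyond literal numbers plus "?" for day-of-week, and it is not how Quartz or the test itself evaluates the cron.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper for illustration; not a general Quartz cron parser.
final class FarFutureCron {

  // The schedule the PR switches the test helper to.
  static final String FAR_FUTURE_CRON = "0 0 0 1 1 ? 2099";

  // Parses "sec min hour dayOfMonth month ? year" where every field except
  // day-of-week is a literal number, and returns the one matching instant.
  static LocalDateTime singleFireTime(String cron) {
    String[] f = cron.trim().split("\\s+");
    if (f.length != 7 || !"?".equals(f[5])) {
      throw new IllegalArgumentException("Unsupported expression: " + cron);
    }
    return LocalDateTime.of(
        Integer.parseInt(f[6]),  // year
        Integer.parseInt(f[4]),  // month
        Integer.parseInt(f[3]),  // day of month
        Integer.parseInt(f[2]),  // hour
        Integer.parseInt(f[1]),  // minute
        Integer.parseInt(f[0])); // second
  }

  // True only if the single fire time falls inside [start, start + window].
  static boolean canFireWithin(String cron, LocalDateTime start, Duration window) {
    LocalDateTime fire = singleFireTime(cron);
    return !fire.isBefore(start) && !fire.isAfter(start.plus(window));
  }

  public static void main(String[] args) {
    LocalDateTime testStart = LocalDateTime.of(2026, 4, 18, 8, 54, 0);
    System.out.println(singleFireTime(FAR_FUTURE_CRON));
    System.out.println(canFireWithin(FAR_FUTURE_CRON, testStart, Duration.ofHours(1)));
  }
}
```

The trade-off of a far-future literal is that the table still carries a syntactically valid schedule, so the scheduler-enabled code path is exercised, while the trigger itself stays out of the way of the exact-count assertions.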

Test plan

  • ./mvnw -pl pinot-controller -am -Dtest='PinotTaskManagerDistributedLockingTest#testForceReleaseLockDuringTaskExecution' test — passes
  • ./mvnw -pl pinot-controller -am -Dtest='PinotTaskManagerDistributedLockingTest' test — all 6 tests pass
  • ./mvnw spotless:apply checkstyle:check license:check -pl pinot-controller — clean

🤖 Generated with Claude Code

The shared createSingleTestTable helper used a cron schedule of
"0 */10 * ? * * *" that fires every 10 minutes on the :00 second
boundary. With PINOT_TASK_MANAGER_SCHEDULER_ENABLED=true (set by every
test in this class), the Quartz scheduler would occasionally fire during
the test window, triggering an extra scheduleTasks -> generateTasks
call and breaking assertions that count exact task generations.

Change the schedule to "0 0 0 1 1 ? 2099" so the cron never fires
during test runs, and document why so the schedule is not "corrected"
later. Removes the nondeterminism at the root rather than papering
over it with retries or relaxed assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 force-pushed the claude/amazing-wescoff-138778 branch from 4444cba to 2206c4b Compare April 18, 2026 08:54
@codecov-commenter

codecov-commenter commented Apr 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.46%. Comparing base (ade9f6c) to head (2206c4b).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18252      +/-   ##
============================================
- Coverage     63.48%   63.46%   -0.03%     
  Complexity     1627     1627              
============================================
  Files          3244     3244              
  Lines        197342   197342              
  Branches      30529    30529              
============================================
- Hits         125285   125241      -44     
- Misses        62014    62065      +51     
+ Partials      10043    10036       -7     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.43% <ø> (-0.03%) ⬇️ |
| java-21 | 63.44% <ø> (+<0.01%) ⬆️ |
| temurin | 63.46% <ø> (-0.03%) ⬇️ |
| unittests | 63.46% <ø> (-0.03%) ⬇️ |
| unittests1 | 55.44% <ø> (-0.02%) ⬇️ |
| unittests2 | 34.97% <ø> (-0.01%) ⬇️ |

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes flakiness in PinotTaskManagerDistributedLockingTest by preventing the Quartz cron scheduler from firing during test execution, which previously could cause unexpected extra generateTasks invocations and break exact-count assertions.

Changes:

  • Update the test table task cron schedule to a far-future one-time trigger (2099) so it won’t run during tests.
  • Add Javadoc explaining why the schedule is intentionally “effectively never” to avoid reintroducing the race.

@xiangfu0 xiangfu0 merged commit 85147cc into apache:master Apr 18, 2026
20 checks passed
@xiangfu0 xiangfu0 deleted the claude/amazing-wescoff-138778 branch April 18, 2026 23:47
4 participants