Add support for cron based scheduling in controller#18256
Add support for cron based scheduling in controller#18256pri1712 wants to merge 23 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces cron-based scheduling for controller periodic tasks so they can run at wall-clock times independent of controller start time, while retaining the existing fixed-delay scheduler as a fallback.
Changes:
- Add an optional cron expression to
PeriodicTask/BasePeriodicTaskand propagate it through controller periodic task constructors/config. - Update
PeriodicTaskSchedulerto schedule tasks via Quartz when a cron expression is present, otherwise use the existing fixed-delay execution. - Add unit tests covering the cron path and legacy fixed-delay path.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java |
Adds a new config key for ResponseStoreCleaner cron scheduling. |
pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTask.java |
Adds getCronExpression() default method to the PeriodicTask API. |
pinot-core/src/main/java/org/apache/pinot/core/periodictask/BasePeriodicTask.java |
Stores cron expression and exposes it via overridden getter. |
pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java |
Integrates Quartz-based scheduling when cron is provided, otherwise fixed-delay. |
pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskCronJob.java |
Adds a Quartz Job wrapper to execute PeriodicTask.run(). |
pinot-core/src/test/java/org/apache/pinot/core/periodictask/PeriodicTaskSchedulerTest.java |
Adds tests for cron scheduling and fixed-delay fallback. |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTask.java |
Extends constructor to accept cron expression and passes it to BasePeriodicTask. |
pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTaskTest.java |
Updates test instantiation to include cron expression parameter. |
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeOffsetAutoResetManager.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/validation/OfflineSegmentValidationManager.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/validation/BrokerResourceValidationManager.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/relocation/SegmentRelocator.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceChecker.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java |
Updates constructor call to match new ControllerPeriodicTask signature (passes null cron). |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/SegmentStatusChecker.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/helix/RealtimeConsumerMonitor.java |
Wires cron expression from ControllerConf into periodic task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/cursors/ResponseStoreCleaner.java |
Wires cron expression from config into task construction. |
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java |
Adds cronExpression config keys + getters/setters for multiple controller periodic tasks. |
Comments suppressed due to low confidence (1)
pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java:75
- start() warns when the scheduler is already started but continues executing, which can double-schedule tasks and (for Quartz) can lead to ObjectAlreadyExistsException / duplicate triggers. Consider returning early when _executorService != null.
public synchronized void start() {
if (_executorService != null) {
LOGGER.warn("Periodic task scheduler already started");
}
| @@ -69,28 +79,59 @@ public synchronized void start() { | |||
| Collection<PeriodicTask> periodicTasks = _periodicTasks.values(); | |||
| LOGGER.info("Starting periodic task scheduler with tasks: {}", periodicTasks); | |||
| _executorService = Executors.newScheduledThreadPool(_periodicTasks.size()); | |||
There was a problem hiding this comment.
_executorService is sized to the total number of tasks even when some/all tasks are scheduled via Quartz. This can create unused threads. Consider sizing the thread pool based on the number of fixed-delay tasks (still keeping enough capacity for scheduleNow()).
| _executorService = Executors.newScheduledThreadPool(_periodicTasks.size()); | |
| int numFixedDelayTasks = (int) periodicTasks.stream().map(PeriodicTask::getCronExpression) | |
| .filter(cronExpression -> cronExpression == null || cronExpression.trim().isEmpty()).count(); | |
| // Keep at least one thread so scheduleNow() can still execute tasks even if all configured tasks use Quartz. | |
| _executorService = Executors.newScheduledThreadPool(Math.max(1, numFixedDelayTasks)); |
There was a problem hiding this comment.
this may cause starvation issues if the quartz scheduler is unable to schedule a cron job for any reason, and we have to fall back to the default fixed delay schedule
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18256 +/- ##
============================================
+ Coverage 63.43% 63.45% +0.01%
Complexity 1683 1683
============================================
Files 3253 3255 +2
Lines 198841 198933 +92
Branches 30795 30799 +4
============================================
+ Hits 126136 126231 +95
- Misses 62625 62627 +2
+ Partials 10080 10075 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@xiangfu0 is cron schedule support for |
yashmayya
left a comment
There was a problem hiding this comment.
is cron schedule support for ResponseStoreCleaner task still needed
Nope, that can be skipped given that it is no longer implementing this controller periodic job pattern.
…sors/ResponseStoreCleaner.java
|
@yashmayya have made the needed changes |
|
hi, just bumping this @xiangfu0 @Jackie-Jiang @yashmayya , would be great if you could review it whenever you have time! i have addressed previous comments |
xiangfu0
left a comment
There was a problem hiding this comment.
Finding 1
- Severity: MAJOR
- Rule: Config contract must not silently degrade
- Where:
pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java-start()cron scheduling path - Issue: When a task has a non-empty cron expression, any Quartz init/schedule failure, including an invalid cron expression, is caught and the task is silently scheduled with the legacy fixed-delay interval.
- Risk: This makes a controller config look accepted while Pinot runs the task on a different cadence than the operator requested. For retention, validation, rebalance, relocation, and cleanup tasks, that can create surprising wall-clock behavior and makes misconfiguration hard to detect.
- Suggested fix: Treat a configured cron expression as authoritative: validate it before scheduling and fail fast or leave the task unscheduled with an explicit config error/metric. Only use fixed-delay scheduling when the cron config is blank/unset.
Finding 2
- Severity: MAJOR
- Rule: New config keys must be wired or removed
- Where:
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java-OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSIONandTENANT_REBALANCE_CHECKER_CRON_EXPRESSION - Issue: These two cron config keys/getters are added, but the corresponding runtime paths never use them.
OfflineSegmentValidationManagerstill only readsgetOfflineSegmentIntervalCheckerFrequencyInSeconds()for the segment-level interval checker, andTenantRebalanceCheckerstill calls the 3-argBasePeriodicTaskconstructor withoutgetTenantRebalanceCheckerCronExpression(). - Risk: Operators can set
controller.offline.segment.interval.checker.cronExpressionorcontroller.tenant.rebalance.checker.cronExpressionand Pinot will silently ignore them. That is a config compatibility/support issue because the PR exposes knobs that do not affect scheduling. - Suggested fix: Either wire these configs into the owning tasks with tests proving the cron path is used, or remove the keys/getters from this PR until the corresponding scheduler semantics are implemented.
Finding 3
- Severity: CRITICAL
- Rule: CI must be green before merge
- Where: GitHub checks for PR #18256
- Issue: The latest check run has two failing compatibility jobs:
Pinot Compatibility Regression Testing against master on compatibility-verifier/sample-test-suiteandPinot Multi-Stage Query Engine Compatibility Regression Testing against master on compatibility-verifier/multi-stage-query-engine-test-suite. - Risk: Even if these failures are unrelated/flaky, the PR is currently red on compatibility coverage and should not merge until the failures are explained, fixed, or cleanly rerun.
- Suggested fix: Investigate or rerun the failing jobs and update the PR once compatibility checks are green.
|
also please rebase to the latest master branch |
|
@xiangfu0 finding 2: have also rebased the PR with latest commits on master, hopefully should fix CI issues if any. |
|
Reviewed latest head Finding 1
Finding 2
|
This PR aims to fix : Issue #14051
Summary:
Introduces cron based scheduling capabilities for Controller Periodic Tasks.
Motivation:
It allows users to schedule cluster wide jobs at wall clock times without adding an extra dependency on controller start-up time.
Changes:
BasePeriodicTaskto accept a Cron expression as one of the parameters. This changing meant a change was needed in the classes that were inherited from this.PeriodicTaskScheduler) to use the cron expression for a task, if provided otherwise fall back to the default fixed time delay task execution.PeriodicTaskCronJobto run the cron jobs using the quartz scheduler (was already being used in Pinot Minions)Testing:
Added unit tests in
PeriodicTaskSchedulerTestfor both the paths, i.e. for when a cron expression is used to schedule jobs and for when the fixed delay is used as the fallback in case cron expression is not supplied.