Add support for cron based scheduling in controller#18256

Open
pri1712 wants to merge 23 commits into apache:master from pri1712:feature/cron-tasks-controller

Conversation

@pri1712 pri1712 commented Apr 19, 2026

This PR aims to fix issue #14051.

Summary:

Introduces cron based scheduling capabilities for Controller Periodic Tasks.

Motivation:

It allows users to schedule cluster wide jobs at wall clock times without adding an extra dependency on controller start-up time.

Changes:

  1. Config layer: Added configs to allow users to specify cron expressions.
  2. Modified BasePeriodicTask to accept a cron expression as one of its parameters. This required corresponding changes in the classes that inherit from it.
  3. Modified the class responsible for task execution (PeriodicTaskScheduler) to use the cron expression for a task if one is provided, and otherwise fall back to the default fixed-delay execution.
  4. Implemented PeriodicTaskCronJob to run the cron jobs using the Quartz scheduler (already used in Pinot Minions).
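
A minimal sketch of how changes 2 and 3 could fit together, for readers skimming the thread. It is not the code in this PR: `SketchPeriodicTask`, `SketchBasePeriodicTask`, and `SketchScheduleMode` are hypothetical, simplified stand-ins for PeriodicTask, BasePeriodicTask, and the scheduler's branch, and the real constructors take more context than shown here.

```java
// Hypothetical stand-in for the PeriodicTask API (names are assumptions, not Pinot's).
interface SketchPeriodicTask extends Runnable {
  String getTaskName();

  long getIntervalInSeconds();

  // A default method keeps existing implementations compiling:
  // a task has no cron schedule unless it opts in.
  default String getCronExpression() {
    return null;
  }
}

// Hypothetical stand-in for BasePeriodicTask: stores the cron expression and exposes it.
abstract class SketchBasePeriodicTask implements SketchPeriodicTask {
  private final String _taskName;
  private final long _intervalInSeconds;
  private final String _cronExpression; // null or blank means "use fixed delay"

  protected SketchBasePeriodicTask(String taskName, long intervalInSeconds, String cronExpression) {
    _taskName = taskName;
    _intervalInSeconds = intervalInSeconds;
    _cronExpression = cronExpression;
  }

  @Override
  public String getTaskName() {
    return _taskName;
  }

  @Override
  public long getIntervalInSeconds() {
    return _intervalInSeconds;
  }

  @Override
  public String getCronExpression() {
    return _cronExpression;
  }
}

// Models only the scheduler's decision, not the actual Quartz or executor calls.
final class SketchScheduleMode {
  private SketchScheduleMode() {
  }

  // Returns "CRON" when a non-blank expression is configured, otherwise "FIXED_DELAY".
  static String describe(SketchPeriodicTask task) {
    String cron = task.getCronExpression();
    return (cron == null || cron.trim().isEmpty()) ? "FIXED_DELAY" : "CRON";
  }
}
```

With this shape, a task that never overrides `getCronExpression()` keeps its fixed-delay behavior, which is presumably why the interface change can be additive.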

Testing:
Added unit tests in PeriodicTaskSchedulerTest for both paths: scheduling via a cron expression, and the fixed-delay fallback used when no cron expression is supplied.

Copilot AI left a comment

Pull request overview

This PR introduces cron-based scheduling for controller periodic tasks so they can run at wall-clock times independent of controller start time, while retaining the existing fixed-delay scheduler as a fallback.

Changes:

  • Add an optional cron expression to PeriodicTask/BasePeriodicTask and propagate it through controller periodic task constructors/config.
  • Update PeriodicTaskScheduler to schedule tasks via Quartz when a cron expression is present, otherwise use the existing fixed-delay execution.
  • Add unit tests covering the cron path and legacy fixed-delay path.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Adds a new config key for ResponseStoreCleaner cron scheduling. |
| pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTask.java | Adds getCronExpression() default method to the PeriodicTask API. |
| pinot-core/src/main/java/org/apache/pinot/core/periodictask/BasePeriodicTask.java | Stores cron expression and exposes it via overridden getter. |
| pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java | Integrates Quartz-based scheduling when cron is provided, otherwise fixed-delay. |
| pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskCronJob.java | Adds a Quartz Job wrapper to execute PeriodicTask.run(). |
| pinot-core/src/test/java/org/apache/pinot/core/periodictask/PeriodicTaskSchedulerTest.java | Adds tests for cron scheduling and fixed-delay fallback. |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTask.java | Extends constructor to accept cron expression and passes it to BasePeriodicTask. |
| pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTaskTest.java | Updates test instantiation to include cron expression parameter. |
| pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeOffsetAutoResetManager.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/validation/OfflineSegmentValidationManager.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/validation/BrokerResourceValidationManager.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/relocation/SegmentRelocator.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceChecker.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java | Updates constructor call to match new ControllerPeriodicTask signature (passes null cron). |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/SegmentStatusChecker.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/helix/RealtimeConsumerMonitor.java | Wires cron expression from ControllerConf into periodic task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/cursors/ResponseStoreCleaner.java | Wires cron expression from config into task construction. |
| pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java | Adds cronExpression config keys + getters/setters for multiple controller periodic tasks. |
Comments suppressed due to low confidence (1)

pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java:75

  • start() warns when the scheduler is already started but continues executing, which can double-schedule tasks and (for Quartz) can lead to ObjectAlreadyExistsException / duplicate triggers. Consider returning early when _executorService != null.
  public synchronized void start() {
    if (_executorService != null) {
      LOGGER.warn("Periodic task scheduler already started");
    }
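
The early return suggested in this comment could look roughly like the sketch below. `GuardedScheduler` is a hypothetical, stripped-down class that models only the start/stop guard; the boolean return value and the start counter are assumptions added to make the behavior observable, and are not part of PeriodicTaskScheduler.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical, stripped-down scheduler: only the start/stop guard is modeled.
class GuardedScheduler {
  private ScheduledExecutorService _executorService;
  private int _startCount;

  // Returns true if this call actually started the scheduler.
  public synchronized boolean start() {
    if (_executorService != null) {
      System.err.println("Periodic task scheduler already started");
      return false; // Early return: never schedule the same tasks twice.
    }
    _executorService = Executors.newScheduledThreadPool(1);
    _startCount++;
    return true;
  }

  // Shuts the executor down and clears the field so start() can be called again.
  public synchronized void stop() {
    if (_executorService != null) {
      _executorService.shutdownNow();
      _executorService = null;
    }
  }

  public synchronized int getStartCount() {
    return _startCount;
  }
}
```

A second `start()` without an intervening `stop()` becomes a no-op here, which is the property that would avoid duplicate triggers on the Quartz side.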

Comment thread pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java Outdated
@@ -69,28 +79,59 @@ public synchronized void start() {
Collection<PeriodicTask> periodicTasks = _periodicTasks.values();
LOGGER.info("Starting periodic task scheduler with tasks: {}", periodicTasks);
_executorService = Executors.newScheduledThreadPool(_periodicTasks.size());

Copilot AI Apr 20, 2026

_executorService is sized to the total number of tasks even when some/all tasks are scheduled via Quartz. This can create unused threads. Consider sizing the thread pool based on the number of fixed-delay tasks (still keeping enough capacity for scheduleNow()).

Suggested change
- _executorService = Executors.newScheduledThreadPool(_periodicTasks.size());
+ int numFixedDelayTasks = (int) periodicTasks.stream().map(PeriodicTask::getCronExpression)
+     .filter(cronExpression -> cronExpression == null || cronExpression.trim().isEmpty()).count();
+ // Keep at least one thread so scheduleNow() can still execute tasks even if all configured tasks use Quartz.
+ _executorService = Executors.newScheduledThreadPool(Math.max(1, numFixedDelayTasks));

@pri1712 pri1712 Apr 20, 2026

This may cause starvation if the Quartz scheduler is unable to schedule a cron job for any reason and we have to fall back to the default fixed-delay schedule.

codecov-commenter commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 67.28972% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.45%. Comparing base (5b0b38c) to head (eedaee8).
⚠️ Report is 15 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...va/org/apache/pinot/controller/ControllerConf.java | 36.36% | 21 Missing ⚠️ |
| ...pinot/core/periodictask/PeriodicTaskScheduler.java | 75.00% | 8 Missing and 2 partials ⚠️ |
| ...e/pinot/core/periodictask/PeriodicTaskCronJob.java | 76.92% | 2 Missing and 1 partial ⚠️ |
| ...g/apache/pinot/core/periodictask/PeriodicTask.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18256      +/-   ##
============================================
+ Coverage     63.43%   63.45%   +0.01%     
  Complexity     1683     1683              
============================================
  Files          3253     3255       +2     
  Lines        198841   198933      +92     
  Branches      30795    30799       +4     
============================================
+ Hits         126136   126231      +95     
- Misses        62625    62627       +2     
+ Partials      10080    10075       -5     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 63.45% <67.28%> (+0.01%) ⬆️ |
| temurin | 63.45% <67.28%> (+0.01%) ⬆️ |
| unittests | 63.45% <67.28%> (+0.01%) ⬆️ |
| unittests1 | 55.39% <76.27%> (+0.03%) ⬆️ |
| unittests2 | 35.00% <40.18%> (-0.01%) ⬇️ |

pri1712 commented Apr 22, 2026

@xiangfu0 is cron schedule support for the ResponseStoreCleaner task still needed, given that it has now been moved to the Broker? It was earlier a Controller task.

@yashmayya yashmayya left a comment

is cron schedule support for ResponseStoreCleaner task still needed

Nope, that can be skipped given that it is no longer implementing this controller periodic job pattern.

pri1712 commented Apr 23, 2026

@yashmayya have made the needed changes

@pri1712 pri1712 requested a review from yashmayya April 23, 2026 07:34

pri1712 commented Apr 26, 2026

Hi, just bumping this @xiangfu0 @Jackie-Jiang @yashmayya. It would be great if you could review it whenever you have time! I have addressed the previous comments.

@xiangfu0 xiangfu0 left a comment

Finding 1

  • Severity: MAJOR
  • Rule: Config contract must not silently degrade
  • Where: pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java - start() cron scheduling path
  • Issue: When a task has a non-empty cron expression, any Quartz init/schedule failure, including an invalid cron expression, is caught and the task is silently scheduled with the legacy fixed-delay interval.
  • Risk: This makes a controller config look accepted while Pinot runs the task on a different cadence than the operator requested. For retention, validation, rebalance, relocation, and cleanup tasks, that can create surprising wall-clock behavior and makes misconfiguration hard to detect.
  • Suggested fix: Treat a configured cron expression as authoritative: validate it before scheduling and fail fast or leave the task unscheduled with an explicit config error/metric. Only use fixed-delay scheduling when the cron config is blank/unset.
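
One way to read the suggested fix, sketched under stated assumptions: `ScheduleResolver` and its `Mode` enum are hypothetical names, and `looksLikeQuartzCron` is only a placeholder that counts fields. A real implementation would more likely pass Quartz's `CronExpression.isValidExpression` as the validator; it is left out here so the sketch has no dependencies.

```java
import java.util.function.Predicate;

// Hypothetical helper: decides the scheduling mode for one task from its config.
final class ScheduleResolver {
  enum Mode { CRON, FIXED_DELAY }

  private final Predicate<String> _cronValidator;

  ScheduleResolver(Predicate<String> cronValidator) {
    _cronValidator = cronValidator;
  }

  // Blank or unset cron -> fixed delay. Non-blank cron -> must validate,
  // otherwise the config is rejected instead of silently degrading.
  Mode resolve(String taskName, String cronExpression) {
    if (cronExpression == null || cronExpression.trim().isEmpty()) {
      return Mode.FIXED_DELAY;
    }
    if (!_cronValidator.test(cronExpression.trim())) {
      throw new IllegalArgumentException(
          "Invalid cron expression '" + cronExpression + "' for periodic task " + taskName);
    }
    return Mode.CRON;
  }

  // Placeholder validator (assumption): accepts anything with 6 or 7
  // whitespace-separated fields. It does not check the field contents.
  static boolean looksLikeQuartzCron(String expression) {
    int fields = expression.trim().split("\\s+").length;
    return fields == 6 || fields == 7;
  }
}
```

The point of the shape is that fixed-delay scheduling is reachable only through the blank/unset branch, never through an exception handler.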

Finding 2

  • Severity: MAJOR
  • Rule: New config keys must be wired or removed
  • Where: pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java - OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION and TENANT_REBALANCE_CHECKER_CRON_EXPRESSION
  • Issue: These two cron config keys/getters are added, but the corresponding runtime paths never use them. OfflineSegmentValidationManager still only reads getOfflineSegmentIntervalCheckerFrequencyInSeconds() for the segment-level interval checker, and TenantRebalanceChecker still calls the 3-arg BasePeriodicTask constructor without getTenantRebalanceCheckerCronExpression().
  • Risk: Operators can set controller.offline.segment.interval.checker.cronExpression or controller.tenant.rebalance.checker.cronExpression and Pinot will silently ignore them. That is a config compatibility/support issue because the PR exposes knobs that do not affect scheduling.
  • Suggested fix: Either wire these configs into the owning tasks with tests proving the cron path is used, or remove the keys/getters from this PR until the corresponding scheduler semantics are implemented.

Finding 3

  • Severity: CRITICAL
  • Rule: CI must be green before merge
  • Where: GitHub checks for PR #18256
  • Issue: The latest check run has two failing compatibility jobs: Pinot Compatibility Regression Testing against master on compatibility-verifier/sample-test-suite and Pinot Multi-Stage Query Engine Compatibility Regression Testing against master on compatibility-verifier/multi-stage-query-engine-test-suite.
  • Risk: Even if these failures are unrelated/flaky, the PR is currently red on compatibility coverage and should not merge until the failures are explained, fixed, or cleanly rerun.
  • Suggested fix: Investigate or rerun the failing jobs and update the PR once compatibility checks are green.

@xiangfu0

Also, please rebase onto the latest master branch.

pri1712 commented Apr 29, 2026

@xiangfu0
Finding 1: Addressed the comments. Cron expressions now fail fast and an error is logged, rather than silently falling back to the default method.

Finding 2: OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION was already being used in OfflineSegmentValidationManager. Fixed the issue where TENANT_REBALANCE_CHECKER_CRON_EXPRESSION was silently being discarded.

I have also rebased the PR onto the latest commits on master, which should hopefully fix the CI issues, if any.

@pri1712 pri1712 requested a review from xiangfu0 April 29, 2026 05:52
@xiangfu0

Reviewed latest head eedaee89c6fe250cdf8a80f43668cb44bfe9c428. The tenant checker wiring is fixed, but two cron config/lifecycle issues remain.

Finding 1

  • Severity: MAJOR
  • Rule: Configured cron tasks must either schedule successfully or fail startup
  • Where: pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java - start() cron path
  • Issue: Quartz init failure is only logged, then the scheduler continues with _quartzScheduler unset. Cron job scheduling/parsing failures are also caught and only logged after periodicTask.start() has already run. In both cases the controller continues startup with the configured cron task not scheduled and no fixed-delay fallback.
  • Risk: A controller can start with retention, validation, rebalance, relocation, or cleanup tasks disabled simply because a cron expression or Quartz setup failed. This is a controller lifecycle/config-contract issue; blast radius is controller periodic maintenance tasks, not pinot-spi or wire protocol.
  • Suggested fix: Validate every non-empty cron expression and create/start Quartz before starting tasks. If Quartz init or any cron scheduling fails, throw from start() or reject the config earlier, and clean up the executor/started tasks. Add a regression test for invalid cron expression proving startup fails and the task is not left started.
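
A rough sketch of the ordering this fix asks for: validate first, then start, and undo partial work on failure. `LifecycleTask` and `FailFastStarter` are hypothetical names, and the validator is passed in as a `Predicate<String>` because wiring in Quartz is outside the scope of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical task abstraction; not the Pinot PeriodicTask interface.
interface LifecycleTask {
  String name();

  String cronExpression(); // null or blank means fixed delay

  void start();

  void stop();
}

final class FailFastStarter {
  private FailFastStarter() {
  }

  // Phase 1 validates every non-blank cron expression before any side effect.
  // Phase 2 starts tasks; if a start fails, the tasks already started are
  // stopped in reverse order and the failure is rethrown to the caller.
  static void startAll(List<? extends LifecycleTask> tasks, Predicate<String> cronValidator) {
    for (LifecycleTask task : tasks) {
      String cron = task.cronExpression();
      if (cron != null && !cron.trim().isEmpty() && !cronValidator.test(cron.trim())) {
        throw new IllegalStateException(
            "Invalid cron expression '" + cron + "' for task " + task.name());
      }
    }
    List<LifecycleTask> started = new ArrayList<>();
    try {
      for (LifecycleTask task : tasks) {
        task.start();
        started.add(task);
      }
    } catch (RuntimeException e) {
      for (int i = started.size() - 1; i >= 0; i--) {
        started.get(i).stop();
      }
      throw e;
    }
  }
}
```

A regression test along the lines requested above would pass one invalid expression and assert both that `startAll` throws and that no task is left started.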

Finding 2

  • Severity: MAJOR
  • Rule: New config keys must be wired or removed
  • Where: pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java - OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION; OfflineSegmentValidationManager constructor and shouldRunSegmentValidation()
  • Issue: controller.offline.segment.interval.checker.cronExpression still has no runtime effect. The manager uses getOfflineSegmentValidationCronExpression() for the whole periodic task, and the interval-checker path still only reads getOfflineSegmentIntervalCheckerFrequencyInSeconds() to decide when segment-level validation should run.
  • Risk: Operators can set the new interval-checker cron key and Pinot will silently ignore it. This exposes a config contract that does not control scheduling, and it is especially confusing because the similarly named controller.offline.segment.validation.cronExpression does work for the outer task cadence.
  • Suggested fix: Remove OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION and its getter/setter from this PR unless the inner segment-level validation cadence is actually made cron-aware. If implementing it now, add a focused test that sets the interval-checker cron key and proves segment-level validation follows that schedule.

Labels

feature New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow to use cron to schedule the controller periodical tasks

5 participants