Skip to content

[GOBBLIN-2264] Key deadline reminders by dagAction only so DELETE events can unschedule them#4194

Merged
Blazer-007 merged 1 commit into
apache:masterfrom
pratapaditya04:GOBBLIN-2264
May 19, 2026
Merged

[GOBBLIN-2264] Key deadline reminders by dagAction only so DELETE events can unschedule them#4194
Blazer-007 merged 1 commit into
apache:masterfrom
pratapaditya04:GOBBLIN-2264

Conversation

@pratapaditya04
Copy link
Copy Markdown
Contributor

@pratapaditya04 pratapaditya04 commented May 18, 2026

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Problem. Deadline reminders for ENFORCE_JOB_START_DEADLINE / ENFORCE_FLOW_FINISH_DEADLINE could not be cancelled by DagManagementDagActionStoreChangeMonitor on a DELETE event:

  • The Quartz JobKey for every reminder embedded the lease event time (see createDagActionReminderKey on master), but the DagActionStoreChangeEvent payload for a DELETE row carries no event time. The DELETE branch in the change monitor was therefore commented out with a TODO ("skip deadline removal for now and let them fire"), leaving deadline reminders to fire on already-resolved actions.
  • Separately, DagProcUtils.sendEnforceJobStartDeadlineDagAction / sendEnforceFlowFinishDeadlineDagAction recover from SQLIntegrityConstraintViolationException by deleting the existing row and reinserting it. If the resulting DELETE + INSERT change events arrived out-of-order on the consumer, an unschedule attempt could miss the still-live old reminder and the new INSERT would then collide on the Quartz key, throwing ObjectAlreadyExistsException and losing the new deadline.

Change.

  • DagActionReminderScheduler.createDagActionReminderKey(LeaseParams, boolean isDeadlineReminder): deadline reminders are now keyed only by (flowGroup, flowName, flowExecutionId, jobName, dagActionType) — event time is intentionally omitted. The DagActionStore primary key already guarantees uniqueness over that tuple for ENFORCE_*_DEADLINE, so at most one deadline reminder can ever be in flight for that tuple. Retry reminders keep the event-time suffix (multi-KILL / multi-RESUME still need to coexist by event time).
  • New overload unscheduleReminderJob(DagActionStore.DagAction) that the DELETE handler can invoke without needing the original event time. The existing (LeaseParams, boolean) overload is preserved.
  • scheduleReminder performs a replace-on-conflict for deadline reminders: if a JobKey already exists at schedule time, the pre-existing job is deleted before the new one is scheduled — defends against the duplicate-insert + out-of-order race described above.
  • DagManagementDagActionStoreChangeMonitor DELETE branch is activated for ENFORCE_*_DEADLINE. Retry reminders are intentionally not unscheduled here because their key embeds event time; orphaned retries fire harmlessly — when the reminder fires, ReminderJob.execute calls DagManagement.addDagAction which routes through MysqlMultiActiveLeaseArbiter.tryAcquireLease; with the row already gone, the arbiter completes the lifecycle without re-triggering downstream work.

Backwards compatibility.

  • Quartz scheduler is constructed with StdSchedulerFactory(properties) setting only instanceName and threadCount; the in-tree conf/*/quartz.properties all use org.quartz.simpl.RAMJobStore. No reminders are persisted across restart, so the JobKey format change has zero migration risk.
  • The static createDagActionReminderKey(LeaseParams) signature gained a required boolean parameter. All in-repo callers (createJobKey, createTriggerKey, and DagActionReminderSchedulerTest) have been updated. No usages outside gobblin-service/ and the gobblin-kafka-09 test.
  • unscheduleReminderJob(LeaseParams, boolean) is preserved (still used internally for retry-reminder lifecycle).

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

  • DagActionReminderSchedulerTest.testCreateDagActionReminderKeyForDeadlineOmitsEventTime — asserts the deadline key is (flowGroup, flowName, flowExecutionId, jobName, dagActionType) only and that two LeaseParams for the same dagAction with different event times produce the same deadline key.

  • DagActionReminderSchedulerTest.testScheduleDeadlineReminderReplacesExistingEntry — exercises the duplicate-insert + out-of-order path: schedules a deadline reminder, then schedules a second reminder with a different event time for the same dagAction, and asserts no exception is thrown and the reminder is still present after the replace.

  • DagActionReminderSchedulerTest.testUnscheduleDeadlineReminderByDagAction — covers the new unscheduleReminderJob(DagAction) overload and asserts idempotency (a second unschedule of an already-removed reminder must not throw).

  • DagActionReminderSchedulerTest.testCreateDagActionReminderKey updated to call the new (LeaseParams, boolean) signature for retry-reminder keys.

  • DagManagementDagActionStoreChangeMonitorTest.testProcessMessageWithDelete — re-enables the previously TODO-disabled verify(...) and adapts it to the new unscheduleReminderJob(DagAction) overload, now that the DELETE branch is no longer commented out.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@pratapaditya04 pratapaditya04 changed the title GOBBLIN-2264: Key deadline reminders by dagAction only so DELETE events can unschedule them [GOBBLIN-2264] Key deadline reminders by dagAction only so DELETE events can unschedule them May 18, 2026
…nts can unschedule them

Deadline reminder JobKeys previously embedded the lease event time, but the
DagActionStore DELETE event payload omits event time, so the change monitor
could not cancel a still-pending deadline reminder for a dagAction whose row
had already been resolved.

Re-key deadline reminders by (flowGroup, flowName, flowExecutionId, jobName,
dagActionType) only and add an unscheduleReminderJob(DagAction) overload that
the DELETE handler can invoke. Replace any pre-existing deadline reminder on
schedule to avoid ObjectAlreadyExistsException when the delete-then-reinsert
duplicate-insert path in DagProcUtils.sendEnforce*DeadlineDagAction races with
an out-of-order change event.

Retry reminders still embed event time and are intentionally not unscheduled
on DELETE; their JobKey is not derivable from the DELETE payload, and orphaned
retries fire harmlessly because the lease arbiter sees the row is gone.

Re-enable the previously TODO-disabled DELETE assertion in
DagManagementDagActionStoreChangeMonitorTest to cover the activated code path.
@Blazer-007 Blazer-007 merged commit d62931e into apache:master May 19, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants