[DSIP-104][Alert] Support Absolute Time SLA Monitoring (Start/End Time) #17836

### Search before asking

- [x] I have searched the [DSIP](https://github.com/apache/dolphinscheduler/issues/14102) list and found no similar DSIP.


### Motivation

Apache DolphinScheduler currently provides "Timeout Alarms" based on **relative duration** (e.g., alerting if a task runs longer than 30 minutes). However, production SLAs are typically defined by **absolute wall-clock time**.

**Problem Statement:**

* **Business Deadline:** Many pipelines must complete by a specific time (e.g., 08:00 AM) to meet downstream business reports.
* **Delayed Start:** Critical tasks must start by a certain time (e.g., 02:00 AM). If they are stuck in the queue or delayed by upstream dependencies, the system should alert before the "end-time" is even reached.
* **Observability Gap:** There is currently no persistent record of SLA violations, making it difficult to generate SLA compliance reports (e.g., "What percentage of tasks finished by 09:00 AM last month?").

Introducing absolute time SLA monitoring and a dedicated violation record table will provide better governance and auditability for critical data pipelines.


### Design Detail

**1. Metadata Configuration:**
Add the following fields to `t_ds_workflow_definition` and `t_ds_task_definition`:

* `expected_start_time`: Absolute wall-clock time by which the instance must start (e.g., `02:00`).
* `expected_end_time`: Absolute wall-clock time by which the instance must finish (e.g., `08:00`).

**2. SLA Record Table:**
Create a new table **`t_ds_sla_violation`** to persist every breach event.
Suggested schema:

* `id`: Primary Key.
* `workflow_definition_code`: The code of the workflow.
* `instance_id`: ID of the workflow/task instance (if created).
* `violation_type`: Enum (`START_TIME_BREACH`, `END_TIME_BREACH`).
* `expected_time`: The configured SLA time.
* `actual_time`: The time when the violation was detected.
* `creation_time`: Audit timestamp.
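
As a rough illustration of the suggested schema, one row could be modelled in Java as below. This is only a sketch: the class and enum names (`SlaViolation`, `SlaViolationType`) and the `delayMinutes()` helper are hypothetical and are not existing DolphinScheduler classes. The database-generated `id` is left out, and `instanceId` is nullable to cover the "if created" case.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical enum mirroring the proposed violation_type column.
enum SlaViolationType { START_TIME_BREACH, END_TIME_BREACH }

// Hypothetical in-memory shape of one t_ds_sla_violation row (id omitted, assumed DB-generated).
final class SlaViolation {
    final long workflowDefinitionCode;
    final Integer instanceId;            // null when no instance had been created yet
    final SlaViolationType violationType;
    final LocalDateTime expectedTime;    // the configured SLA time
    final LocalDateTime actualTime;      // when the violation was detected
    final LocalDateTime creationTime;    // audit timestamp

    SlaViolation(long workflowDefinitionCode, Integer instanceId, SlaViolationType violationType,
                 LocalDateTime expectedTime, LocalDateTime actualTime, LocalDateTime creationTime) {
        this.workflowDefinitionCode = workflowDefinitionCode;
        this.instanceId = instanceId;
        this.violationType = violationType;
        this.expectedTime = expectedTime;
        this.actualTime = actualTime;
        this.creationTime = creationTime;
    }

    // Whole minutes between the configured deadline and the moment of detection.
    long delayMinutes() {
        return Duration.between(expectedTime, actualTime).toMinutes();
    }
}
```

Keeping both `expectedTime` and `actualTime` on the record is what would let a compliance report compute how late each breach was detected.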

**3. Monitoring Logic (SLA Monitor Thread):**
The Master Server will run a background thread that periodically:

* **Scans Definitions:** Identifies workflows/tasks with active SLA configurations.
* **Evaluation:**
  * **Start-Time:** Breached if `Current Time > expected_start_time` AND (no instance exists OR the instance is still `SUBMITTED`/`WAITING`).
  * **End-Time:** Breached if `Current Time > expected_end_time` AND the instance status is not `SUCCESS`.
* **Action:**
  * Trigger an `SLA_ALARM` via the Alert Server.
  * Insert a record into `t_ds_sla_violation` for persistence and UI display.
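
The two evaluation rules above could be sketched as pure functions, as shown below. The names `SlaEvaluator` and `SketchInstanceState` are hypothetical, and the state enum is a simplified stand-in rather than DolphinScheduler's real execution status enum. The sketch also assumes the deadlines have already been resolved to a concrete date, and that a missing instance counts as an end-time breach, which the proposal does not state explicitly.

```java
import java.time.LocalDateTime;

// Simplified, hypothetical instance states used only for this sketch.
enum SketchInstanceState { SUBMITTED, WAITING, RUNNING, SUCCESS, FAILURE }

final class SlaEvaluator {

    // Start-time rule: the deadline has passed and the instance is missing (null)
    // or has not progressed beyond SUBMITTED/WAITING.
    static boolean isStartBreached(LocalDateTime now, LocalDateTime expectedStart,
                                   SketchInstanceState state) {
        if (expectedStart == null || !now.isAfter(expectedStart)) {
            return false; // no SLA configured, or deadline not reached yet
        }
        return state == null
                || state == SketchInstanceState.SUBMITTED
                || state == SketchInstanceState.WAITING;
    }

    // End-time rule: the deadline has passed and the instance has not succeeded.
    // Assumption: a missing instance (null state) is also treated as a breach.
    static boolean isEndBreached(LocalDateTime now, LocalDateTime expectedEnd,
                                 SketchInstanceState state) {
        if (expectedEnd == null || !now.isAfter(expectedEnd)) {
            return false; // no SLA configured, or deadline not reached yet
        }
        return state != SketchInstanceState.SUCCESS;
    }
}
```

Returning `false` when the expected time is `null` matches the compatibility goal that definitions without SLA fields skip the check.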

### Compatibility, Deprecation, and Migration Plan

* **Compatibility:** Fully backward compatible. Workflows without these fields defined will skip the SLA check.
* **Database Migration:**
  * Add `expected_start_time` and `expected_end_time` columns to `t_ds_workflow_definition` and `t_ds_task_definition`.
  * Add new DDL for the `t_ds_sla_violation` table.

### Test Plan

* **Functional Testing:**
  * Verify that if a task remains in the `DELAY` or `SERIAL_WAIT` state past its `expected_start_time`, a violation record is created and an alert is sent.
  * Verify that if a task is still `RUNNING` past its `expected_end_time`, the system detects the breach.
* **Persistence Testing:**
  * Check if the `t_ds_sla_violation` table correctly records the `instance_id` (if applicable) and the type of breach.
* **Edge Case Testing:**
  * **Cross-day monitoring:** Test a workflow with a start time of 23:30 and an end time of 01:30 (next day).
  * **Frequency:** Ensure the monitor thread doesn't create duplicate violation records for the same instance in a single cycle.
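
The two edge cases could be handled roughly as follows. This is a sketch under stated assumptions, not a proposed implementation: `SlaDeadlines` and `ViolationDeduplicator` are hypothetical names, the cross-day rule assumes that an end time not later than the start time means "next day", and the duplicate check is an in-memory set keyed by instance ID and violation type (a real monitor would more likely rely on a unique constraint or a lookup in `t_ds_sla_violation`).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that turns HH:mm SLA times into concrete deadlines for one run date.
final class SlaDeadlines {

    static LocalDateTime startDeadline(LocalDate runDate, LocalTime expectedStart) {
        return runDate.atTime(expectedStart);
    }

    // Assumption: if the end time is not after the start time, the window crosses midnight.
    static LocalDateTime endDeadline(LocalDate runDate, LocalTime expectedStart, LocalTime expectedEnd) {
        LocalDateTime end = runDate.atTime(expectedEnd);
        return expectedEnd.isAfter(expectedStart) ? end : end.plusDays(1);
    }
}

// Hypothetical in-memory guard against recording the same breach twice.
final class ViolationDeduplicator {
    private final Set<String> seen = new HashSet<>();

    // Returns true only the first time a given (instanceId, violationType) pair is reported.
    boolean markIfNew(long instanceId, String violationType) {
        return seen.add(instanceId + ":" + violationType);
    }
}
```

With a start time of 23:30 and an end time of 01:30, `endDeadline` resolves to 01:30 on the day after the run date, which is the behaviour the cross-day test is meant to confirm.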


### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

