Skip to content

Regularly execute archive_dag_instances function #818

@kevinwallimann

Description

@kevinwallimann

Background

In #721, the archive_dag_instances SQL function was added. It moves the oldest dag instances (and related objects) to an archive table, so that querying the dag_instance doesn't slow down over time. By default, the last 6 months are kept in the dag_instance table and are queriable by end users.

This query can take a long time (minutes - 10s of minutes), depending on the number of objects to be removed. However, executing it more frequently should reduce the execution time of a single query to seconds and not impact the database.

Currently, the archive_dag_instances function needs to be invoked manually. This is obviously not ideal. Therefore, a scheduled job should invoke the function in a regular interval (e.g. every Sunday).

Feature

  • Create a cron-job (e.g. with the Quartz library) that calls the archive_dag_instances function.

  • The cron-job should take retention days as a parameter and use it for the second parameter of the archive_dag_instances function, which is i_to_ts. i_to_ts should be set to now() - '${retentionDays} days'::interval

  • The output of the archive_dag_instances function should be captured and sent to the configurable notification recipients.

  • The total execution time of the query should be reported as well. It doesn't have to be exact, it can also include the roundtrip time to the database.

  • New application parameters to add

    • housekeeping.archiveDags.schedule (default 0 0 4 ? * SUN *)
    • housekeeping.archiveDags.retentionDays (default 180)
    • housekeeping.archiveDags.notificationRecipients (default [])
  • Only one of the scheduler instances should execute the function! This could be achieved using an optimistic locking technique with a auxiliary database table or other techniques. While skipping an archival is not catastrophic, the archive function should not be called concurrently.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions