-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature][Task instance] Automatic cleaning of database task instance table #16145
Comments
I think you need to give a more detailed design. Configuring on system level / workflow definition level both have pros and cons. The question is too general for reviewers to answer without a detailed design. |
BTW, you need to submit a DISP for any potential modifications on DS core. |
I recommend using mysql |
good idea, please design a proposal first |
Plan One: Configure Workflow: # black list
delete from t_ds_task_instance
where (project_code in ($project_code)
or process_instance_id in (select id
from t_ds_process_instance
where process_definition_code in ($process_definition_code)))
and start_time < NOW() - INTERVAL 3 MONTH;
# white list
delete from t_ds_task_instance
where (project_code not in ($project_code) or
process_instance_id not in (select id from t_ds_process_instance
where process_definition_code in ($process_definition_code)))
and start_time < NOW() - INTERVAL 3 MONTH; Choose either a blacklist or a whitelist, retain data from the past three months, modify as needed.fill in the corresponding parameters for $project_code and $process_definition_code, and remove the corresponding conditions if they are not needed, separated by English commas. Plan Two: Integration at the System Level: |
How about this plan? After the workflow development is completed, it will be added to the initialization script of DS's database, along with detailed documentation, so that all users can easily use it with simple configuration. |
I used a python script to connect to the metadatabase to delete the data in the t_ds_task_instance table, which will cause the scheduler of the workflow to be invalid and the workflow cannot be offline. Have you done a test ?my version:3.0.1, |
Simply deleting these tables is not enough. There are already workflow definitions and workflow instances api for deleting. We can create a sample workflow that contains the data that needs to be deleted according to certain filtering criteria and call the deletion API to delete the data. I think this is a good practice for both user intrusion and iterative updates. |
I understand, but there are some data operations involved, such as the need to query all workflow_definitions_id for the whitelist, and DS does not provide a relevant interface. Should we add a specific interface or directly query using SQL? Both options seem a bit troublesome. Which method is better? |
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs. |
Why do you think python script is better? DolphinScheduler is a java project. Using a unified language to implement functionality is a basic requirement. |
I have the same problem. I wonder if I can start a scheduled task in the code to perform the cleanup.
|
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs. |
This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future. |
Search before asking
Description
After long-term use of DolphinScheduler, the task instance table accumulates a large amount of data, which can lead to slow queries. It would be beneficial to have a feature that automatically cleans up logs, with a precise whitelist/blacklist mechanism at the project/ProcessDefinition level. After simple configuration, it could automatically clean up periodically (every few months).
I would like to contribute this feature. I have a question: would it be more appropriate for this feature to be integrated as a system-level function of DS or to be configured as ProcessDefinition?
Use case
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: