Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed Up delete process from Web UI #12826

Closed
wants to merge 1 commit into from
Closed

Conversation

darrkz
Copy link
Contributor

@darrkz darrkz commented Nov 9, 2022

Purpose of the pull request

Speed Up delete process from Web UI

Brief change log

add an index on table t_ds_task_instance

create index idx_task_instance_process_instance_flag on t_ds_task_instance (process_instance_id, flag);

I use postgresql as my dolphinscheduler meta data base

Because I have some process run in each 5 minutes , so lead much process instance and task instance in the database , the most top instance of task node may reach 2000 and the total t_ds_task_instance table hold nearly 1000000 records in it.

I had try delete process or process instance from the web ui. But I found it too slow and lead my postgresql server cpu usage too high, and cost nearly about one hours I foud the sql is running long time is :

select                                                                                                             

                                                                                                                   

        id, name, task_type, process_instance_id, task_code, task_definition_version, state, submit_time,          

        start_time, end_time, host, execute_path, log_path, alert_flag, retry_times, pid, app_link,                

        flag, retry_interval, max_retry_times, task_instance_priority, worker_group,environment_code , executor_id,

        first_submit_time, delay_time, task_params, var_pool, dry_run                                              

                                                                                                                   

        from t_ds_task_instance                                                                                    

        WHERE process_instance_id = $1                                                                             

        and flag = $2                                                                                              

        order by start_time desc                                                                                   


then I try to add the index , and try delete the related process definition or process instance , it Just finsh in few seconds !!!

Verify this pull request

then I try to add the index , and try delete the related process definition or process instance , it Just finsh in few seconds !!!

I use postgresql as my dolphinscheduler meta data base

Because I have some process run in each 5 minutes , so lead much process instance and task instance in the database , the most top instance of  task node may reach 2000 and the total t_ds_task_instance table hold nearly 1000000 records in it.

I had try delete process or process instance from the web ui.
But I found it too slow and lead my postgresql server cpu usage too high, and cost nearly about one hours
I foud the sql is running long time is :
```
select                                                                                                             

                                                                                                                   

        id, name, task_type, process_instance_id, task_code, task_definition_version, state, submit_time,          

        start_time, end_time, host, execute_path, log_path, alert_flag, retry_times, pid, app_link,                

        flag, retry_interval, max_retry_times, task_instance_priority, worker_group,environment_code , executor_id,

        first_submit_time, delay_time, task_params, var_pool, dry_run                                              

                                                                                                                   

        from t_ds_task_instance                                                                                    

        WHERE process_instance_id = $1                                                                             

        and flag = $2                                                                                              

        order by start_time desc                                                                                   


```

then I try to add the index  , and try delete the related process definition or process instance , it Just finsh in few seconds !!!
@mergeable
Copy link

mergeable bot commented Nov 9, 2022

⚠️ This PR do not change database DDL synchronize.

@codecov-commenter
Copy link

codecov-commenter commented Nov 9, 2022

Codecov Report

Merging #12826 (68f541d) into dev (04c6b5e) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##                dev   #12826      +/-   ##
============================================
- Coverage     39.13%   39.11%   -0.02%     
+ Complexity     4212     4211       -1     
============================================
  Files          1046     1046              
  Lines         39765    39765              
  Branches       4576     4576              
============================================
- Hits          15562    15555       -7     
- Misses        22433    22440       +7     
  Partials       1770     1770              
Impacted Files Coverage Δ
...erver/master/processor/queue/TaskEventService.java 69.64% <0.00%> (-10.72%) ⬇️
...r/plugin/task/sqoop/parameter/SqoopParameters.java 51.19% <0.00%> (-1.20%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -796,6 +797,7 @@ CREATE TABLE t_ds_task_instance (
) ;

create index idx_task_instance_code_version on t_ds_task_instance (task_code, task_definition_version);
create index idx_task_instance_process_instance_flag on t_ds_task_instance (process_instance_id, flag);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only need to change postgresql, or did we miss adding to h2 and mysql?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also add to specific update directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just use postgresql

and I found there are some foreign key in mysql
I think it could be removed to improve

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just use postgresql

and I found there are some foreign key in mysql I think it could be removed to improve

But as a PR, you should also add your change to other type of database, such as mysql and H2 to keep synchronous.
Also, you should add the index add in https://github.com/apache/dolphinscheduler/blob/489e7fe4e2d980031c91eac2a4823b9f1713659a/dolphinscheduler-dao/src/main/resources/sql/upgrade/3.2.0_schema/postgresql/dolphinscheduler_ddl.sql and https://github.com/apache/dolphinscheduler/blob/489e7fe4e2d980031c91eac2a4823b9f1713659a/dolphinscheduler-dao/src/main/resources/sql/upgrade/3.2.0_schema/mysql/dolphinscheduler_ddl.sql

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darkz1984 you did a good job,

can you add these changes to mysql and H2 to keep the db file sync, I think it's not difficult for you ^_^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add an index for a big t_ds_task_instance with much rows online when dolphinscheduler is running in PostgreSQL is very easy and just cost few milliseconds.
But this action in MySQL is very danger, It may lead table lock and long time cost, My be an production accident.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darkz1984 you did a good job,

can you add these changes to mysql and H2 to keep the db file sync, I think it's not difficult for you ^_^

I had try mysql :
#13080

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13081
the post upgrade ddl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think H2 database has no index, so, no need to add index for it.

@SbloodyS SbloodyS added the improvement make more easy to user or prompt friendly label Nov 9, 2022
@SbloodyS SbloodyS added this to the 3.1.2 milestone Nov 9, 2022
@sonarcloud
Copy link

sonarcloud bot commented Nov 9, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@davidzollo davidzollo added the Waiting for user feedback Waiting for feedback from issue/PR author label Nov 27, 2022
darrkz added a commit to darrkz/dolphinscheduler that referenced this pull request Dec 2, 2022
Speed Up delete process from Web UI apache#12826
darrkz added a commit to darrkz/dolphinscheduler that referenced this pull request Dec 2, 2022
@darrkz darrkz requested review from davidzollo and zhongjiajie and removed request for davidzollo and zhongjiajie December 2, 2022 12:44
@ruanwenjun
Copy link
Member

Every workflow instance has a distinct id, you will rerun the workflow instance in every 5 minutes?

In general, the index process_instance_id is enough.

@darrkz
Copy link
Contributor Author

darrkz commented Dec 3, 2022

Every workflow instance has a distinct id, you will rerun the workflow instance in every 5 minutes?

In general, the index process_instance_id is enough.

But delete process instance will delete the task instance casde, and the task instance may be with a huge number of rows in it. In my system, it reach 10,000 of rows one day, and total reach 4,000,000 rows. So it need to add this index to speed up delete the task instance with this index

@ruanwenjun
Copy link
Member

Every workflow instance has a distinct id, you will rerun the workflow instance in every 5 minutes?
In general, the index process_instance_id is enough.

But delete process instance will delete the task instance casde, and the task instance may be with a huge number of rows in it. In my system, it reach 10,000 of rows one day, and total reach 4,000,000 rows. So it need to add this index to speed up delete the task instance with this index

We delete the task instance by process_instance_id, I am not clear what is your problem.

@EricGao888
Copy link
Member

@darkz1984 I don't doubt that creating an index on the combination of (process_instance_id, flag) for table t_ds_task_instance could significantly improve the querying efficiency of the sql sentence you post in this issue, which you used to manually deleted the stuff from db. But I don't think this index could improve any of the queries in TaskInstanceMapper.xml. It is not meaningful for the upstream code.

https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-dao/src/main/resources/org/apache/dolphinscheduler/dao/mapper/TaskInstanceMapper.xml

davidzollo pushed a commit that referenced this pull request Dec 3, 2022
@darrkz
Copy link
Contributor Author

darrkz commented Dec 4, 2022

@EricGao888 @ruanwenjun
I detect this problem in my dolphinscheduler, and delete the process instance too slow, may be more than 10 minutes.
And i follow the database process of dolphinscheduler, see this slow database query.
Then I add this index, the action end in some seconds.

I have some process in dolphinscheduler run each 5 minutes, so some process definition has much process instance and mo task instance.
About 20,000+ task instance of the process definition. So the delete of process definition too slow.
if the process in dolphinscheduler just run every day, delete of the process definition is fast.

@ruanwenjun
Copy link
Member

ruanwenjun commented Dec 4, 2022

@darkz1984 There already exist process_instance_id in t_ds_task_instance, you don't need to add again.

The problem is we will query the validate task instance when delete, we don't need to query the validate task instance, we need to directly delete by process_instance_id.

@ruanwenjun ruanwenjun closed this Dec 4, 2022
@ruanwenjun
Copy link
Member

ruanwenjun commented Dec 4, 2022

@darkz1984 @EricGao888 I close this PR since the index(process_instance_id, flag) is not needed, and I find the logic in our code has a bug, when we delete the task instance by workflow instance id, we will query the validate task by tag, and then delete the query result, this will cause the invalidate task instance will never be deleted.

I submit #13091 to fix this.

@EricGao888
Copy link
Member

@darkz1984 @EricGao888 I close this PR since the index(process_instance_id, flag) is not needed, and I find the logic in our code has a bug, when we delete the task instance by workflow instance id, we will query the validate task by tag, and then delete the query result, this will cause the invalidate task instance will never be deleted.

I submit #13091 to fix this.

Sounds good to me. Thanks : )

@darrkz
Copy link
Contributor Author

darrkz commented Dec 4, 2022

@darkz1984 @EricGao888 I close this PR since the index(process_instance_id, flag) is not needed, and I find the logic in our code has a bug, when we delete the task instance by workflow instance id, we will query the validate task by tag, and then delete the query result, this will cause the invalidate task instance will never be deleted.

I submit #13091 to fix this.

OK, good news

@zhuangchong zhuangchong modified the milestones: 3.1.2, 3.2.0 Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend improvement make more easy to user or prompt friendly priority:low Waiting for user feedback Waiting for feedback from issue/PR author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants