[Improvement]: Optimizer has a performance bottleneck #975

Closed
2 of 3 tasks
nicochen opened this issue Jan 3, 2023 · 6 comments

Comments

@nicochen
Contributor

nicochen commented Jan 3, 2023

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

When there are many optimize tasks, throughput stays limited even if the optimizer consumes them with high parallelism. The main reason is that each optimize run puts only 10 tasks into the optimize group at a time.

How should we improve?

Make this limit configurable, or adjust it adaptively at runtime.
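
To illustrate the first option, here is a minimal sketch of reading the limit from configuration with a fallback to the current value of 10. It is not the actual AMS code: the class name, the property key, and the fallback behaviour on invalid input are all assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical sketch, not the real AMS implementation.
final class PoolTaskLimitConfig {
    // Hypothetical property key; a real AMS config key may be named differently.
    static final String KEY = "optimize.queue.max-pool-task-count";
    // Fallback mirrors the hard-coded limit of 10 described in this issue.
    static final int DEFAULT_LIMIT = 10;

    // Returns the configured limit, or the default when the value is
    // missing, blank, non-numeric, or not positive.
    static int resolveLimit(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_LIMIT;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_LIMIT;
        } catch (NumberFormatException e) {
            return DEFAULT_LIMIT;
        }
    }
}
```

Falling back to the default on bad input keeps the queue working when the config file is wrong; failing fast at startup would be an equally reasonable choice.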

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

@zhoujinsong
Contributor

zhoujinsong commented Jan 4, 2023

@nicochen I found that the value of OptimizeQueueService#MAX_POOL_TASK_CNT was set to 50 in v0.4.0.
But I still agree that we should support setting this value in AMS config files.

@majin1102 @wangtaohz What do you think?

@wangtaohz
Contributor

wangtaohz commented Jan 4, 2023

I found that the value of OptimizeQueueService#MAX_POOL_TASK_CNT was set to 50 in v0.4.0.
But I still agree that we should support setting this value in AMS config files.

We put at most 10 or 50 tasks into the optimize group at a time so that a single table cannot take up the whole quota: the optimizers execute 10 or 50 tasks of one table and then turn to the tasks of other tables.
However, a table's tasks can't be committed until all of them have been executed and set to prepared, which is why a table with many tasks takes a long time to finish optimizing.

I think we should change this process and execute all tasks of a table at once, so that we can commit them as soon as possible. Then we may not need a config like OptimizeQueueService#MAX_POOL_TASK_CNT at all.

Also, executing all tasks of a table at once is consistent with the idea of issue #946.
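
A toy model can make the effect on commit latency concrete. The sketch below is a simplification under stated assumptions, not the real scheduler: a single executor, each task costing one time unit, tables taking turns in round-robin order, and a table committing only when its last task finishes. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Toy model of per-table batching; all names and rules here are assumptions.
final class CommitTimeModel {
    // Returns, per table, the time at which its last task finishes
    // (the earliest moment the table could commit in this model).
    // Tables with no tasks report 0.
    static long[] commitTimes(int[] tasksPerTable, int cap) {
        if (cap <= 0) {
            throw new IllegalArgumentException("cap must be positive");
        }
        int n = tasksPerTable.length;
        int[] remaining = Arrays.copyOf(tasksPerTable, n);
        long[] commitAt = new long[n];
        int unfinished = 0;
        for (int r : remaining) {
            if (r > 0) unfinished++;
        }
        long clock = 0;
        while (unfinished > 0) {
            // One round-robin pass: each table runs at most `cap` tasks.
            for (int i = 0; i < n; i++) {
                if (remaining[i] <= 0) continue;
                int batch = Math.min(cap, remaining[i]);
                clock += batch;
                remaining[i] -= batch;
                if (remaining[i] == 0) {
                    commitAt[i] = clock;
                    unfinished--;
                }
            }
        }
        return commitAt;
    }

    public static void main(String[] args) {
        int[] tables = {100, 100};
        // Capped at 10 tasks per turn: prints [190, 200]
        System.out.println(Arrays.toString(commitTimes(tables, 10)));
        // Whole table at once: prints [100, 200]
        System.out.println(Arrays.toString(commitTimes(tables, Integer.MAX_VALUE)));
    }
}
```

In this model the total work is the same either way, but with the cap the first table cannot commit until time 190, while running it to completion lets it commit at time 100 without delaying the second table.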

@zhoujinsong
Contributor

zhoujinsong commented Jan 4, 2023

@wangtaohz
Yes. If we are worried about one table taking too many optimizing resources at a time, we should limit the number of tasks at planning time rather than limit the tasks put into the optimizing group queue.

Of course, we may not need to limit the number of planned tasks for now.
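
For reference, limiting at planning time could look roughly like the sketch below. It is only an illustration under assumptions: the planner groups candidate files into fixed-size tasks and stops once it reaches a maximum task count, leaving the remaining files for a later plan. The class name, the parameters, and the fixed-size grouping are hypothetical and not taken from the AMS code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner sketch; not the real optimize planner.
final class CappedPlanner {
    // Groups files into tasks of up to `filesPerTask` files each and stops
    // after `maxTasks` tasks. Files beyond the cap are simply not planned
    // in this run. Every returned task can then be queued at once.
    static List<List<String>> plan(List<String> files, int filesPerTask, int maxTasks) {
        if (filesPerTask <= 0 || maxTasks <= 0) {
            throw new IllegalArgumentException("filesPerTask and maxTasks must be positive");
        }
        List<List<String>> tasks = new ArrayList<>();
        for (int start = 0;
             start < files.size() && tasks.size() < maxTasks;
             start += filesPerTask) {
            int end = Math.min(start + filesPerTask, files.size());
            tasks.add(new ArrayList<>(files.subList(start, end)));
        }
        return tasks;
    }
}
```

With the cap applied here, everything that was planned belongs to one commit, so the queue no longer needs its own per-table limit.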

@shidayang
Contributor

I agree with removing OptimizeQueueService#MAX_POOL_TASK_CNT. Prioritizing the optimization of one table at a time is the most efficient approach.

@wangtaohz
Contributor

Yes. If we are worried about one table taking too many optimizing resources at a time, we should limit the number of tasks at planning time rather than limit the tasks put into the optimizing group queue.
Of course, we may not need to limit the number of planned tasks for now.

I agree with this.

@zhoujinsong
Contributor

Completed by #993.
