-
Notifications
You must be signed in to change notification settings - Fork 3.7k
branch-3.0: [opt](metrics) Remove IntervalHistogramStat #47459 #47606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
branch-3.0: [opt](metrics) Remove IntervalHistogramStat #47459 #47606
Conversation
### What problem does this PR solve? Use prometheus to calculate average value is better. Related PR: apache#43144 For example, we use `task_execution_time_ns_avg_in_last_1000_times` which is equal to `SUM(cost 0, ... cost 999) / 1000` to represent average execution time, it has two problems: 1. Update of its data source `_task_execution_time_ns_statistic` acquires lock. 2. Result of `task_execution_time_ns_avg_in_last_1000_times` is not zero if we just finished a set of tasks and no more tasks to run. For example, we have a continuous straight line after all tasks have finished for a while. <img width="416" alt="image" src="https://github.com/user-attachments/assets/e874a077-1e74-4700-9dd8-4cf9625bc8f8" /> The problem can be fixed by: 1. Using `task_execution_time_ns_total` an atomic counter to store total sum of execution time of each iteration. 2. With the help of `irate` function of prometheus, we can have an equivalent substitution like `irate(doris_be_task_execution_time_ns_total[$__rate_interval])/doris_be_thread_pool_active_threads` <img width="349" alt="image" src="https://github.com/user-attachments/assets/2b0b41bd-8709-4cab-84c1-cf11c4cc3ac9" /> After all tasks finished, the curve will be zero, this is more reasonable.
|
run buildall |
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
TPC-H: Total hot run time: 40829 ms |
TPC-DS: Total hot run time: 197589 ms |
ClickBench: Total hot run time: 32.67 s |
|
We're closing this PR because it hasn't been updated in a while. |
cherry pick from #47459