[opt](memory) Calculate workload group weighted memory limit #38494
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy made some suggestions
TPC-H: Total hot run time: 41713 ms
TPC-DS: Total hot run time: 170306 ms
ClickBench: Total hot run time: 29.76 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
TPC-H: Total hot run time: 41766 ms
TPC-DS: Total hot run time: 170811 ms
ClickBench: Total hot run time: 30.13 s
mrhhsg
left a comment
LGTM
When a workload group is constructed, its mem_limit is set to
(process_memory_limit * group_limit_percent). This assumes that the
memory available to workload groups is equal to process_memory_limit.

In practice, process_memory_usage is larger than
all_workload_groups_mem_usage, because process_memory_usage also
includes public_memory such as the page cache, allocator cache, and
segment cache. The memory actually available to workload groups is
therefore (process_memory_limit - public_memory).

This PR excludes public_memory when calculating the workload group
mem_limit: a ratio, weighted_memory_limit_ratio, is calculated and
multiplied by the mem_limit from construction.

If all_workload_groups_mem_usage is greater than process_memory_usage,
the memory statistics of the workload groups are inaccurate. The reason
is that the memory tracked for query/load/etc. is virtual memory, and
virtual memory is not necessarily in use yet. In this case,
weighted_memory_limit_ratio is equal to 1 and the workload group
mem_limit stays at (process_memory_limit * group_limit_percent). This
may cause query spill to occur earlier. There is no good solution at
present, because we cannot predict when this virtual memory will
actually be used.

Example log output:
```
Process Memory Summary: process memory used 10.14 GB(= 10.35 GB[vm/rss] - 217.76 MB[tc/jemalloc_cache] + 0[reserved] + 0B[waiting_refresh]), sys available memory 101.36 GB(= 101.36 GB[proc/available] - 0[reserved] - 0B[waiting_refresh]), all workload groups memory usage: 2.38 KB, weighted_memory_limit_ratio: 0.9700175477872811
I20240729 20:18:47.099756 2520164 workload_group_manager.cpp:253]
Workload Group normal: mem limit: 169.12 GB, mem used: 2.38 KB, weighted mem limit: 164.05 GB, used ratio: 1.3807033383519938e-08, query count: 1, query spill threshold: 164.05 GB
Query Memory Summary:
MemTracker Label=Query#Id=67f945a17fb44483-ab4c0b1ff52acb3e, Parent Label=-, Used=2.38 KB, SpillThreshold=164.05 GB, Peak=2.38 KB
```
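The calculation described above can be sketched roughly as follows. This is an illustration derived from the description, not the code in this PR: the function names `weighted_memory_limit_ratio` and `weighted_mem_limit`, their signatures, and the use of plain byte counts are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the actual Doris implementation).
// Returns the factor by which each workload group's mem_limit is scaled.
// All arguments are byte counts.
double weighted_memory_limit_ratio(int64_t process_memory_limit,
                                   int64_t process_memory_usage,
                                   int64_t all_workload_groups_mem_usage) {
    assert(process_memory_limit > 0);
    // Workload group statistics exceed process usage: the tracked (virtual)
    // memory is not reliable, so fall back to the unweighted limit.
    if (all_workload_groups_mem_usage >= process_memory_usage) {
        return 1.0;
    }
    // Memory used by the process but not attributed to any workload group,
    // e.g. page cache, allocator cache, segment cache.
    const int64_t public_memory =
            process_memory_usage - all_workload_groups_mem_usage;
    // (process_memory_limit - public_memory) / process_memory_limit
    return 1.0 - static_cast<double>(public_memory) /
                         static_cast<double>(process_memory_limit);
}

// Hypothetical helper: applies the ratio to a group's configured mem_limit.
int64_t weighted_mem_limit(int64_t mem_limit, double ratio) {
    return static_cast<int64_t>(static_cast<double>(mem_limit) * ratio);
}
```

In the log above, 169.12 GB multiplied by the reported ratio of about 0.9700 gives roughly 164.05 GB, which matches the reported weighted mem limit and query spill threshold.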