-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[enhancement](filecache) fine-grained cache space observation #57783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
|
run buildall |
| ) | ||
| .put("file_cache_info", | ||
| new SchemaTable(SystemIdGenerator.getNextId(), "file_cache_info", TableType.SCHEMA, | ||
| builder().column("HASH", ScalarType.createStringType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name 'Hash' is ambiguous.
| .column("TYPE", ScalarType.createStringType()) | ||
| .column("REMOTE_PATH", ScalarType.createStringType()) | ||
| .column("CACHE_PATH", ScalarType.createStringType()) | ||
| .column("BE_ID", ScalarType.createType(PrimitiveType.BIGINT)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a table which contains BE_ID and COMPUTE_GROUP_NAME?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but there should be, or at least something alternative, to form 'show backends' results
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 34483 ms |
TPC-DS: Total hot run time: 188140 ms |
ClickBench: Total hot run time: 28.22 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
Pls create a pr to documentation. |
|
|
||
| // Get all cache instances for inspection | ||
| const std::vector<std::unique_ptr<BlockFileCache>>& get_caches() const { return _caches; } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_all_caches
Sure. This pr is a basic function introduction. User manual will be updated later on. |
| std::string hash_str = key.hash.to_string(); | ||
|
|
||
| // Add to cache entries | ||
| cache_entries.emplace_back(hash_str, key.tablet_id, value.size, value.type, cache_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there may be memory issue
too large to storage in memory?
we can compact hash_str as 2 int64 instead of hex string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need to support predicate push down ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use an integer as index for cache_path to save memory, and replace index with real path when fill column_values
| } | ||
|
|
||
| Status SchemaFileCacheInfoScanner::_fill_block_impl(vectorized::Block* block) { | ||
| SCOPED_TIMER(_fill_block_timer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this timer used?
| } | ||
|
|
||
| *eos = true; | ||
| return _fill_block_impl(block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there may be some thing wrong, if we return a block contains all result
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| } | ||
|
|
||
| // Collect all cache entries from all file cache instances | ||
| std::vector<std::tuple<std::string, int64_t, int64_t, int, std::string>> cache_entries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to define a struct instead of tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 use scoped struct in side function definition
|
PR approved by at least one committer and no changes requested. |
1 similar comment
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
1 similar comment
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)