-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local cache for remote filesystem #33717
Local cache for remote filesystem #33717
Conversation
If the read request is from merge operation, then it will read a lot of data and write to local cache but they will be deleted after merge finished. Is there a better way to avoid this? |
7eaf6a5
to
41f4f33
Compare
Thanks, you are right, will make a solution to avoid this. |
41f4f33
to
31d2749
Compare
31d2749
to
a566099
Compare
I compiled using this branch, but I made an error at startup. |
It is not ready yet. Moreover I did not push the latest version in this PR. |
b5e12e1
to
36a41ac
Compare
fa0780a
to
710bba8
Compare
@kant777 LRU is obviously not the best caching algorithm, but it is pointless to implement anything else until you have reproducible comparison mechanism to evaluate different algorithms on realistic workload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reading, but currently I see no big issues. Discussed a lot in TG chat. Let's continue.
|
||
bool LRUFileCache::tryReserve(const Key & key_, size_t offset_, size_t size, [[maybe_unused]] std::lock_guard<std::mutex> & cache_lock) | ||
{ | ||
auto removed_size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check corner case when we are trying to reserve more data than the whole cache size?
1589782
to
8a7b4fa
Compare
8a7b4fa
to
aef99de
Compare
569dfe8
to
20eb192
Compare
c717854
to
aab3de7
Compare
e7f7298
to
bd68d1d
Compare
c70f7f5
to
c86955d
Compare
Not related to changes:
Those which failed - failed because of a timeout in stateless tests flakey check run. |
VFS cache is intended to be composable. So, you define caching layer on top of another VFS. So you can have client-side encrypted data on s3 and choose between caching encrypted and unencrypted data. Now I see that s3 cache configuration is duplicating s3 configuration 😿 I will have to say that configuration is preliminary and the feature is in preview. |
if (allow_non_strict_checking) | ||
return "None"; | ||
|
||
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cannot use cache without query id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be this a LOGICAL_ERROR
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thanks, I will change in those places.
Initially I started making it REMOTE_FS_OBJECT_CACHE_ERROR instead of LOGICAL_ERROR, because I wanted to check certain errors for EXPECT_THROW in unit tests, but then it would not work in debug build unit tests, so I started to not use logical error...
std::lock_guard segment_lock(mutex); | ||
|
||
if (downloader_id.empty()) | ||
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, why not LOGICAL_ERROR
?
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader"); | ||
|
||
if (getCallerId() != downloader_id) | ||
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Downloader can be reset only by downloader"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, and in few other places in this file for similar exceptions.
|
||
namespace DB | ||
{ | ||
|
||
namespace ErrorCodes | ||
{ | ||
extern const int BAD_ARGUMENTS; | ||
{extern const int BAD_ARGUMENTS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add local cache for disk s3. Closes #28961.
Detailed description / Documentation draft:
.