Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local cache for remote filesystem #33717

Merged
merged 69 commits into from
Mar 11, 2022

Conversation

kssenii
Copy link
Member

@kssenii kssenii commented Jan 17, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add local cache for disk s3. Closes #28961.

Detailed description / Documentation draft:
.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jan 17, 2022
@yiguolei
Copy link
Contributor

If the read request is from merge operation, then it will read a lot of data and write to local cache but they will be deleted after merge finished. Is there a better way to avoid this?

@kssenii
Copy link
Member Author

kssenii commented Jan 18, 2022

@yiguolei

If the read request is from merge operation, then it will read a lot of data and write to local cache but they will be deleted after merge finished. Is there a better way to avoid this?

Thanks, you are right, will make a solution to avoid this.

@xiedeyantu
Copy link
Contributor

xiedeyantu commented Jan 21, 2022

I compiled using this branch, but I made an error at startup.
Application: DB::Exception: Cannot read all data. Bytes read: 3. Bytes expected: 4.

@kssenii
Copy link
Member Author

kssenii commented Jan 21, 2022

It is not ready yet. Moreover I did not push the latest version in this PR.
And it is in draft state now, you can test it when it is not a draft.

@kant777
Copy link

kant777 commented Jan 24, 2022

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Jan 24, 2022

@kant777 LRU is obviously not the best caching algorithm, but it is pointless to implement anything else until you have reproducible comparison mechanism to evaluate different algorithms on realistic workload.

Copy link
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading, but currently I see no big issues. Discussed a lot in TG chat. Let's continue.

src/Disks/IO/CacheableReadBufferFromRemoteFS.h Outdated Show resolved Hide resolved
src/Disks/IO/CacheableReadBufferFromRemoteFS.h Outdated Show resolved Hide resolved
src/Common/FileCache.h Outdated Show resolved Hide resolved
src/Common/FileCacheFactory.cpp Show resolved Hide resolved
src/Disks/IO/ReadBufferFromRemoteFSGather.cpp Outdated Show resolved Hide resolved
src/Common/FileCache.cpp Outdated Show resolved Hide resolved
src/Common/FileCache.cpp Outdated Show resolved Hide resolved

bool LRUFileCache::tryReserve(const Key & key_, size_t offset_, size_t size, [[maybe_unused]] std::lock_guard<std::mutex> & cache_lock)
{
auto removed_size = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check corner case when we are trying to reserve more data than the whole cache size?

src/Disks/IO/CacheableReadBufferFromRemoteFS.cpp Outdated Show resolved Hide resolved
src/Common/FileSegment.cpp Outdated Show resolved Hide resolved
@kssenii kssenii force-pushed the local-cache-for-remote-fs branch 2 times, most recently from 1589782 to 8a7b4fa Compare January 24, 2022 22:34
@kssenii
Copy link
Member Author

kssenii commented Mar 11, 2022

Stress test (thread, actions) — Sanitizer assert (in stderr.log)

Not related to changes:

WARNING: ThreadSanitizer: heap-use-after-free (pid=607)
  Atomic write of size 8 at 0x7b48006c01f0 by thread T243:
    #0 __tsan_atomic64_fetch_add <null> (clickhouse+0xad0a815)
    #1 long std::__1::__cxx_atomic_fetch_add<long>(std::__1::__cxx_atomic_base_impl<long>*, long, std::__1::memory_order) obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:1050:12 (clickhouse+0xad9d56a)
    #2 std::__1::__atomic_base<long, true>::fetch_add(long, std::__1::memory_order) obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:1719:17 (clickhouse+0xad9d56a)
    #3 MemoryTracker::allocImpl(long, bool, MemoryTracker*) obj-x86_64-linux-gnu/../src/Common/MemoryTracker.cpp:118:35 (clickhouse+0xad9d56a)
    #4 MemoryTracker::allocImpl(long, bool, MemoryTracker*) obj-x86_64-linux-gnu/../src/Common/MemoryTracker.cpp:239:22 (clickhouse+0xad9d940)

Stateless tests flaky check (address, actions) — Timeout, fail: 27, passed: 390

Those which failed - failed because of a timeout in stateless tests flakey check run.

@kssenii kssenii merged commit 818459b into ClickHouse:master Mar 11, 2022
@alexey-milovidov
Copy link
Member

VFS cache is intended to be composable. So, you define caching layer on top of another VFS. So you can have client-side encrypted data on s3 and choose between caching encrypted and unencrypted data. Now I see that s3 cache configuration is duplicating s3 configuration 😿

I will have to say that configuration is preliminary and the feature is in preview.

if (allow_non_strict_checking)
return "None";

throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cannot use cache without query id");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be this a LOGICAL_ERROR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks, I will change in those places.
Initially I started making it REMOTE_FS_OBJECT_CACHE_ERROR instead of LOGICAL_ERROR, because I wanted to check certain errors for EXPECT_THROW in unit tests, but then it would not work in debug build unit tests, so I started to not use logical error...

std::lock_guard segment_lock(mutex);

if (downloader_id.empty())
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, why not LOGICAL_ERROR?

throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "There is no downloader");

if (getCallerId() != downloader_id)
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Downloader can be reset only by downloader");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, and in few other places in this file for similar exceptions.


namespace DB
{

namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
{extern const int BAD_ARGUMENTS;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Local cache for remote filesystem (RFC)
9 participants