FileCache::loadMetadata is extremely slow on large caches #52037
Comments
I agree that it makes sense to add more parallelism.
Yes, it would be better done asynchronously.
Which ops do you mean?
I assume we can only parallelise loading the metadata, but nothing else. Any suggestions?
I haven't looked at the code in detail, but I meant mostly to verify whether all the checks for is_directory, is_empty, etc. are needed (most likely they are). Since they are sequential disk operations they aren't fast, so removing them (or combining them if possible) might help quite a bit when you need to process millions of folders.
One idea, which I'd need to verify with local testing, is to find a better way to do the is_directory + is_empty checks on Linux (a sketch of what that could look like follows below).
As a side note, we've been running some tests adding IOPS to the disk and it's clear the disk is not the bottleneck, so parallelizing the whole process makes sense up to a point.
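For illustration only, here is a minimal Linux-specific sketch (my own, not from this thread) of how the two checks could be combined into a single readdir() pass: d_type identifies directories without an extra stat(), and emptiness is decided by the first real entry seen. The helper names are hypothetical, and filesystems that report DT_UNKNOWN would still need a stat() fallback. Whether this actually beats std::filesystem would need measuring, which is the point of the snippet below.

#include <cstddef>
#include <cstring>
#include <dirent.h>
#include <string>

// Hypothetical helper: true if `path` is a directory containing nothing
// besides "." and "..". One opendir()/readdir() pass, no separate stat().
static bool dir_is_empty(const std::string &path)
{
    DIR *d = opendir(path.c_str());
    if (!d)
        return false;
    bool empty = true;
    while (struct dirent *e = readdir(d))
    {
        if (std::strcmp(e->d_name, ".") != 0 && std::strcmp(e->d_name, "..") != 0)
        {
            empty = false;
            break;
        }
    }
    closedir(d);
    return empty;
}

// Counts empty subdirectories one level below `root`; recursion is
// omitted for brevity. Checking e->d_type skips files without stat().
static std::size_t count_empty_subdirs(const std::string &root)
{
    std::size_t empties = 0;
    DIR *d = opendir(root.c_str());
    if (!d)
        return 0;
    while (struct dirent *e = readdir(d))
    {
        if (e->d_type != DT_DIR)
            continue;
        if (std::strcmp(e->d_name, ".") == 0 || std::strcmp(e->d_name, "..") == 0)
            continue;
        if (dir_is_empty(root + "/" + e->d_name))
            ++empties;
    }
    closedir(d);
    return empties;
}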
Checking the overhead of the implementation:

#include <chrono>
#include <filesystem>
#include <iostream>
#include <string>

using namespace std::filesystem;
using namespace std::chrono;

// Walk the tree recursively and count directories with no entries,
// mirroring what FileCache::loadMetadata does per key prefix.
size_t empties_fs(const std::string &path)
{
    size_t empties = 0;
    for (auto key_prefix_it = recursive_directory_iterator{path};
         key_prefix_it != recursive_directory_iterator();
         ++key_prefix_it)
    {
        const auto key_prefix_directory = key_prefix_it->path();
        if (!is_directory(key_prefix_directory)) // stat() per entry
            continue;
        if (is_empty(key_prefix_directory)) // opendir()/readdir() per directory
            empties++;
    }
    return empties;
}

int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;
    auto start = high_resolution_clock::now();
    size_t empties_1 = empties_fs(argv[1]);
    auto stop = high_resolution_clock::now();
    std::cout << "C++ " << duration_cast<milliseconds>(stop - start).count()
              << " milliseconds. " << empties_1 << " empty folders\n";
    return 0;
}
Using find (bash > C++ 😄 ):
So find + wc -l (since find prints the names rather than a count) takes half the time of the C++ snippet. That is, there is a 2x win just from discarding std::filesystem, on top of whatever parallelization is added to the process.
I've been looking at this a little bit more to try to address the core of the problem, which is a really big metadata folder that is not in the kernel page cache. I see several different approaches, but to confirm it's solvable, at least on my local disk, I've done the following:
# sync; echo 3 > /proc/sys/vm/drop_caches
# /home/raul/issues/empty .
C++ 83290 milliseconds. 0 empty folders
The total runtime is only the executable, which takes ~83 seconds to iterate over 1000 * 1000 folders.
The sum of the "warmup prefetch time" and the run is around 5.9 seconds (it's missing a wait for the processes), which is 14x better. I still haven't come up with a good way to implement this parallel warmup in C++ in a sane way (a rough sketch of one possible shape is below), but at least it seems that it should help quite a bit. Edit: simplified the bash script so it waits for all folders to be cached.
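Since the thread left the C++ side open, here is a rough, untested sketch (my own, not the bash script from the edit above) of the shape such a parallel warmup could take: split the top-level key-prefix directories across a fixed number of threads and have each thread walk its share, purely to pull the directory metadata into the kernel cache. The thread count and the use of std::filesystem here are arbitrary choices.

#include <cstddef>
#include <filesystem>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Warm the kernel dentry/inode caches by walking top-level
// subdirectories in parallel; the results are discarded, the point is
// only to issue the metadata reads concurrently.
void parallel_warmup(const std::string &root, unsigned num_threads = 16)
{
    std::vector<fs::path> prefixes;
    for (const auto &entry : fs::directory_iterator(root))
        if (entry.is_directory())
            prefixes.push_back(entry.path());

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&prefixes, t, num_threads] {
            // Each thread takes every num_threads-th prefix.
            for (std::size_t i = t; i < prefixes.size(); i += num_threads)
                for (auto it = fs::recursive_directory_iterator(prefixes[i]);
                     it != fs::recursive_directory_iterator(); ++it)
                {
                    // Touching the entry is enough to cache its metadata.
                }
        });
    for (auto &w : workers)
        w.join();
}

In any case, per the comments below this was ultimately addressed in ClickHouse itself (#52943), so the sketch is only meant to illustrate the warmup idea.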
It should be much better after #52943 using the new setting.
Thanks @Algunenano, I've created a PR to add the setting to the documentation: #55860
When booting a server with a large cache disk attached, it can spend as much as 30 minutes in FileCache::loadMetadata, during which the server is unresponsive:
CH version: 23.5.3.24
Config:
Disk:
When looking at the number of directories in the cache, there are a lot:
When looking at what the server is doing, we can see it's just iterating over all the dirs:
fstat confirms the same:
The times are much better if the dirs happen to be in the kernel cache, but the problem is more obvious if you drop the cache before starting ClickHouse.
Some thoughts: