Problems using HDFS storage_policy for MergeTree with more than 300GB #43112
Comments
It should not write many small files if you inserted all the data in one insert; please check.
Related: #40968
With S3, objects are sharded into subfolders; with HDFS they are not.
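For illustration, here is a rough sketch of the "shard objects into subfolders" idea mentioned in the previous comment. The helper name and shard count are made up and this is not ClickHouse's real S3/HDFS key layout; it only shows how a subfolder can be derived from the object name so files spread over many directories instead of piling up in one:

```cpp
// Rough sketch: derive a small fixed-size shard index from the object name
// so files spread across many directories instead of a single one.
// Names and the shard count are illustrative only.
#include <functional>
#include <string>

std::string shardedPath(const std::string & root, const std::string & objectName)
{
    constexpr size_t num_shards = 1024;  // keeps each directory well under the per-directory item limit
    const size_t shard = std::hash<std::string>{}(objectName) % num_shards;
    return root + "/" + std::to_string(shard) + "/" + objectName;
}

// Usage: shardedPath("/clickhouse/hdfs_disk", "part_0001.bin")
//        -> "/clickhouse/hdfs_disk/<shard>/part_0001.bin"
```

With such a scheme each directory only has to hold roughly total_files / num_shards entries, which is what keeps the per-directory item limit from being hit.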
And that's not easy; we would need to pre-create folders in HDFS before putting a file.
We don't support HDFS.
Alex Sapin suggested using an optimistic approach: if create_hdfs_object fails with a "folder does not exist" error, create the folder and retry create_hdfs_object.
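A minimal sketch of that optimistic create-and-retry pattern, written against the libhdfs C API for illustration (the function name is hypothetical; this is not ClickHouse's actual HDFS disk code):

```cpp
// Optimistic pattern: try to create the file first; only if that fails,
// create the parent directory (like `mkdir -p`) and retry once.
// libhdfs C API used for illustration only.
#include <hdfs/hdfs.h>   // libhdfs / libhdfs3 header; install path may differ
#include <fcntl.h>
#include <stdexcept>
#include <string>

hdfsFile createFileOptimistic(hdfsFS fs, const std::string & path)
{
    // First attempt: assume the parent folder already exists.
    hdfsFile file = hdfsOpenFile(fs, path.c_str(), O_WRONLY,
                                 /*bufferSize*/ 0, /*replication*/ 0, /*blocksize*/ 0);
    if (file)
        return file;

    // Most likely cause of failure: the parent folder is missing.
    // Create it (hdfsCreateDirectory creates missing parents) and retry once.
    const std::string parent = path.substr(0, path.find_last_of('/'));
    if (hdfsCreateDirectory(fs, parent.c_str()) != 0)
        throw std::runtime_error("cannot create HDFS directory " + parent);

    file = hdfsOpenFile(fs, path.c_str(), O_WRONLY, 0, 0, 0);
    if (!file)
        throw std::runtime_error("cannot create HDFS file " + path);
    return file;
}
```

The point of the optimistic order is that the common case (the folder already exists) costs a single attempt, and the extra directory creation is only paid for the first file in a new folder.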
Hello!
We use storage_policy=hdfs for MergeTree tables. Each file is about 2KB, so once we save about 300GB we get an exception about the file limit in each folder.
See dfs.namenode.fs-limits.max-directory-items
https://github.com/naver/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java#L285
load_balancing=round_robin does not work in our case, because there is no size limit for a folder, so the move_factor parameter will not work. We created 100 folders and 100 volumes for them, with one policy covering all of them, but ClickHouse saves all data to one folder.
min_bytes_for_wide_part has no effect for loading; each file is about 2KB anyway.