Use case
ClickHouse currently supports multiple disk types for data (disk, s3, hdfs, blob) plus some special wrappers (cache/encryption), each declared as a `<disk>` element in the server configuration.
The local disk type works with basically any POSIX-compatible file system.
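For context, this is roughly how disks of different flavors are declared under `<storage_configuration>` (a sketch, not a complete config; the endpoint, paths, and disk names here are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- plain local POSIX disk -->
            <local_disk>
                <path>/var/lib/clickhouse/data2/</path>
            </local_disk>
            <!-- object storage -->
            <s3_disk>
                <type>s3</type>
                <endpoint>https://my-bucket.s3.amazonaws.com/clickhouse/</endpoint>
            </s3_disk>
            <!-- special wrapper disk layered on top of another disk -->
            <cached_s3>
                <type>cache</type>
                <disk>s3_disk</disk>
                <path>/var/lib/clickhouse/s3_cache/</path>
                <max_size>10Gi</max_size>
            </cached_s3>
        </disks>
    </storage_configuration>
</clickhouse>
```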
ClickHouse was initially designed around how ext4 handles file / directory renames.
But not all file systems are equal: despite all of them claiming POSIX compatibility, the performance of certain operations can differ a lot.
For example, XFS is known to have performance issues with some of these operations.
Another problem is that "POSIX-compatible disk" hides an entire family of file systems and backing storage built on completely different ideas and physical principles:
from locally attached disks (NVMe, SSD, HDD, software/hardware RAID arrays) to network storage based on SAN/NAS (NFS/Lustre/Gluster/Ceph) and cloud-provided block devices (EBS/PV/Azure Block Storage).
All of them provide different latency and performance guarantees for certain operations, sometimes far from those of a locally attached ext4 file system.
It's especially noticeable during mutations, when a lot of parts are renamed or hard links are created. (This is one of the main reasons why lightweight mutations are not as lightweight as they were meant to be; the second one is writing new parts to ZooKeeper.)
Such problems also exist in non-POSIX storage options like S3, HDFS, and blob storage, but there they are more extreme: you can't rename a file without re-uploading it (S3), and there are no hard links at all (all of them).
Current approach
All files related to a single part reside in a single directory.
All parts of a table reside in the table directory. (For the Atomic database engine, that directory is a unique UUID instead of DB_NAME/TABLE_NAME as it was in Ordinary, so RENAME TABLE needs no FS renames and EXCHANGE TABLE is atomic, YAY.)
ATTACH PART[ITION] / ALTER TABLE UPDATE use hard links in order to create a copy of data.
This also complicates things for people who want to make backups with snapshotting tools, because ClickHouse quite often renames/moves files.
The proposal is simple:
Reduce the set of possible file manipulations to a minimum, basically CREATE, READ, REMOVE (the same set as in object storage).
This will require a layout similar to what we already have in object storage:
some shared bin where all files get randomly assigned names. (ClickHouse can probably reuse the implementation from the s3 disk.)