Update storing-data.md #60024
This is an automated comment for commit 69a6631 with a description of existing statuses. It is updated for the latest CI run.
Force-pushed from ba77897 to 9bcd4da.
I didn't get to read through everything, but I tried to target the areas where I know the docs were previously lacking. Hopefully this is helpful and can lead to some great docs!
docs/en/operations/storing-data.md
### Using Plain Storage {#s3-storage}

There is a disk type `s3_plain`, which provides a write-once storage. Unlike `s3` disk type, it stores data as is, e.g. instead of randomly-generated blob names, it uses normal file names as clickhouse stores files on local disk. So this disk type allows to keeper a static version of the table and can also be used to create backups on it.
Shouldn't this talk about the `plain` metadata type now, since this setting no longer exists?
Also, because it's write-once, does this mean it never merges? Or when it merges it never deletes old parts?
> So this disk type allows to keeper a static version of the table and can also be used to create backups on it. Configuration parameters are the same as for s3 disk type.

Doesn't mean much to me tbh. Does this mean it basically just puts the exact ClickHouse path on S3? If so, why are we doing the random blobs locally then, so metadata is then stored locally? Maybe that confusion is a result of this not being updated to reflect the `plain` metadata_type.

It would be useful to show this in context as well. For example, when I do this:
- what does the S3 file structure look like?
- how does this affect merges (as previously mentioned)?
- should this ever be a working-set table, or do I need to make this a materialized view target? If this is a working table, what are the performance implications?
- what do you mean by keeping a static version and creating a backup? Are you saying that because merges never delete, it's a backup? This is confusing without deep ClickHouse context.
> Shouldn't this talk about the plain metadata type now, since this setting no longer exists?

Yes. This reminded me that some code was missing to allow usage of plain metadata for other object storage types, which was addressed in #60396. That is why I did not continue with this documentation PR until now: the PR with the fix needed to be merged first.
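For context, here is a minimal sketch of the two configuration forms being discussed: the dedicated `s3_plain` disk type, and the form where the object storage type and metadata type are given separately. It assumes the syntax from the ClickHouse storage docs as I recall it. The disk names and the endpoint are placeholders, not values from this PR, and whether the first form is still accepted depends on the server version.

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- Dedicated disk type. Disk name and endpoint are placeholders. -->
            <s3_plain_legacy>
                <type>s3_plain</type>
                <endpoint>https://s3.example.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain_legacy>
            <!-- Assumed equivalent: object storage type and metadata type set separately. -->
            <s3_plain_new>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain</metadata_type>
                <endpoint>https://s3.example.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain_new>
        </disks>
    </storage_configuration>
</clickhouse>
```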
> Also, because it's write-once, does this mean it never merges? Or when it merges it never deletes old parts?

Yes, merges are disabled. Inserts are not allowed either (an exception is thrown on an attempt to insert data). Added this to the doc.

> Doesn't mean much to me tbh.

Added some more explanation to the doc.

> Does this mean it basically just puts the exact clickhouse path on S3?

Yes.

> If so, why are we doing the random blobs locally then, so metadata is then stored locally?

For the `s3` disk type we store data in random blobs because, unlike `s3_plain`, it is not "write once": it has inserts and merges, so the requirements for ordinary `s3` are higher. The limitations of object storage (no rename, move, or hardlink operations, etc.) do not allow the same usability as a local filesystem, so we cannot handle it the same way.

> what does the S3 file structure look like?

Just randomly generated strings with a 3-digit prefix, e.g. `/prefix_from_disk_config/blob_random_3_digit_prefix/blob_random_name`.

There is also a feature that allows changing this blob path representation to a more performant one (#57663); as far as I can see, it is already documented.

> should this ever be a working-set table, or do I need to make this a materialized view target?

This is a normal read-only table; you can do whatever you want with it. The initial use case for the `s3_plain` disk was to create backups on it (I added some info about this to the doc). Backups to any disk type other than plain are not allowed.

> If this is a working table, what are the performance implications?

None, apart from data parts not being merged.

> what do you mean by keep a static version and create a backup?

Added an explanation to the doc.
@danthegoodman1 I will merge this PR for now. If you have more comments, please write them and I will address them in the next PR.
There's no explanation of what each metadata type does. I think it would be useful to briefly explain each one and to include an example config for each that might actually be used.
I see that they are explained in other contexts in other places (e.g. https://github.com/ClickHouse/ClickHouse/blob/09e630e02be9ccd19681b34f33e24cea849ca9fd/docs/en/operations/storing-data.md#using-static-web-storage-read-only-web-storage), but having them in one spot, where the answer is easy to find, would make this far more accessible for users.
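As a rough illustration of what such a side-by-side example could look like, here is a sketch of the `local` and `web` metadata types. It assumes the `object_storage` syntax from the ClickHouse storage docs as I recall it. The disk names and endpoints are placeholders, and the comments give my understanding of each type rather than the wording of the docs. The `plain` type, the write-once one discussed earlier in this thread, would follow the same pattern with `<metadata_type>plain</metadata_type>`.

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- local (assumed default): metadata files on local disk map table files to blobs in S3. -->
            <s3_local_metadata>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.example.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_local_metadata>
            <!-- web: read-only data served over HTTP; assumed valid only with the web object storage type. -->
            <web_static>
                <type>object_storage</type>
                <object_storage_type>web</object_storage_type>
                <metadata_type>web</metadata_type>
                <endpoint>https://example.com/static_table_data/</endpoint>
            </web_static>
        </disks>
    </storage_configuration>
</clickhouse>
```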
Changelog category (leave one):
Updated documentation to include changes from #58357.