RFC: UUID 2.0 data types #63179
Comments
There is also a problem with the "awful" hash function for UUID in the uniq aggregate function: #34425 (comment)
The only concern is that ClickHouse is a DWH database: users may want to store UUIDs from different sources in the same column, and they may not know which version their UUIDs are.
I think new ClickHouse installations should use the "new" variant.
Original proposal:
@rschu1ze for some reason your proposal is entirely different from the original. Closing this because it does not make sense to have different data types for UUIDv4 and UUIDv7.
ClickHouse supports UUIDs through a UUID data type and various utility functions to generate and convert UUIDs.
It has been noted that UUIDs in ClickHouse have no intuitive sort order; instead, they are sorted by their right half. This makes UUIDs unsuitable, even dangerous, as sorting keys, primary index keys, or partition keys. The reason for this behavior is historical: they are internally represented as a UInt128 (2 x 64 bit) composite integer (code), with the halves in big-endian order (code).
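The ordering problem can be illustrated with a small Python model. This is only a sketch of the behavior described above, not ClickHouse's actual comparison code: the helper names (`make_uuid7_like`, `right_half_first_key`, `bytewise_key`) are hypothetical, and the "right half first" key is an assumption that stands in for "sorted by their right half".

```python
import uuid


def make_uuid7_like(unix_ms: int, tail: int) -> uuid.UUID:
    """Build a version-7-shaped UUID (hypothetical helper for illustration).

    Layout: 48-bit millisecond timestamp, 4-bit version, 12 zero bits,
    2-bit variant, then 62 bits that stand in for random data.
    """
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                          # version nibble = 7
    value |= 0b10 << 62                         # RFC variant bits
    value |= tail & ((1 << 62) - 1)             # low 62 bits: filler
    return uuid.UUID(int=value)


def right_half_first_key(u: uuid.UUID) -> tuple[int, int]:
    """Assumed model of the legacy order: compare the right 64 bits first."""
    high = u.int >> 64
    low = u.int & ((1 << 64) - 1)
    return (low, high)


def bytewise_key(u: uuid.UUID) -> bytes:
    """Order by the 16 bytes from left to right, as a plain byte string."""
    return u.bytes


# Five IDs with increasing timestamps but decreasing right halves.
ids = [make_uuid7_like(1_700_000_000_000 + i, tail=1000 - i) for i in range(5)]

by_right_half = sorted(ids, key=right_half_first_key)
by_bytes = sorted(ids, key=bytewise_key)
```

Under this model, `by_bytes` keeps the creation order because the timestamp sits in the leftmost bytes, while `by_right_half` follows the filler bits and ignores the timestamp unless the right halves tie.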
The current UUID type also has the disadvantage that it treats all UUID versions equally (v1-v5 are standardized, v6-v8 are being standardized). This was okay in the past when ClickHouse only supported UUID version 4, but it makes things difficult when we support version 7. More specifically, UUIDv7ToDateTime cannot assume that the input is really in version 7 format.
These problems can only be addressed with a new UUID implementation:
- With UUID4 and UUID7, the new implementation will have separate UUID data types for each UUID version. This removes the existing type ambiguities.
- FixedString(16), i.e. a consecutive 16-byte field without a notion of "halves".
- The existing UUID type will continue to be supported in order to not break existing use cases. A new (server or session?) setting enable_new_uuid_types (or something like that) is introduced which controls whether generateUUIDv4 and generateUUIDv7 return data in the new or old type. We similarly need to check for every UUID-related function how the new setting affects its behavior.