chriscomeau79 changed the title from "ULID implementation with FixedString(16) instead of FixedString(26), to/from UInt128" to "ULID implementation with FixedString(16)/UInt128 instead of FixedString(26)" on May 25, 2024
A few more notes - I think there was a deleted comment but I'll leave the reply:
The FixedString(26) version of the ULID still compresses reasonably well, since the extra bits are all zeroes. This optimization to get it down from 26 bytes to 16 is more about uncompressed memory usage and other benefits like fitting in 128 bits for memory alignment, as well as being able to go to/from UInt128 directly.
Put another way: it would be nice to be able to take any UInt128 and get a ULID string from it, which should be possible.
The ULID generation spec, with its 48-bit timestamp and 80 bits of randomness, happens to be a good way to generate well-behaved UInt128 equivalents which compress well. That's what I was trying to get at with the example, which shows how python-ulid can accept the max UInt128 and return a valid ULID string. ClickHouse could do the same thing with functions along the lines of UInt128ToULIDString and ULIDStringToUInt128.
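To make the round trip concrete, here is a minimal Python sketch of the conversion using only the standard library. The function names `uint128_to_ulid_string` and `ulid_string_to_uint128` are hypothetical, chosen to mirror the proposed ClickHouse names; they are not existing ClickHouse or python-ulid APIs. It assumes the Crockford base-32 alphabet from the ULID spec and big-endian ordering.

```python
# Crockford base-32 alphabet used by the ULID spec (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
UINT128_MAX = (1 << 128) - 1


def uint128_to_ulid_string(value: int) -> str:
    """Encode any 128-bit unsigned integer as a 26-character ULID string."""
    if not 0 <= value <= UINT128_MAX:
        raise ValueError("value does not fit in 128 bits")
    chars = []
    for _ in range(26):
        # Take 5 bits at a time, least significant group first.
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_string_to_uint128(text: str) -> int:
    """Decode a 26-character ULID string back to a 128-bit unsigned integer."""
    if len(text) != 26:
        raise ValueError("a ULID string must be 26 characters long")
    value = 0
    for ch in text.upper():
        # str.index raises ValueError for characters outside the alphabet.
        value = (value << 5) | CROCKFORD.index(ch)
    if value > UINT128_MAX:
        raise ValueError("ULID string overflows 128 bits")
    return value


if __name__ == "__main__":
    text = uint128_to_ulid_string(UINT128_MAX)
    print(text)
    # The same value as 16 raw bytes, i.e. what a FixedString(16) would hold.
    print(ulid_string_to_uint128(text).to_bytes(16, "big").hex())
```

The 16-byte big-endian form sorts the same way as the string form, which is what keeps the locality benefit when storing the binary value.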
The process of generating ULIDs would still behave the same as in the spec, so we get those benefits with locality and compression. The internal representation would just be those 128 bits with the behaviour built in to display as a ULID string.
If it's useful, could generalize this to do the same thing with UInt256 and strings that are twice as long but otherwise follow the ULID convention. It's a nice way to deal with such long numbers. Hypothetically the same style of generation would work too, with 48 bits timestamp + 208 bits random, but it's hard to think of a scenario where that would be needed. Could call that a ULID256.
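As a rough illustration of the generation scheme, the sketch below packs a 48-bit millisecond timestamp above a block of random bits. The function name `generate_ulid_int` and the `random_bits` parameter are hypothetical, introduced only for this example; with `random_bits=80` the result is a 128-bit value, and with `random_bits=208` it is the 256-bit variant described above. It does not implement the spec's monotonic mode for values generated within the same millisecond.

```python
import secrets
import time


def generate_ulid_int(timestamp_ms=None, random_bits=80):
    """Return an integer laid out as: 48-bit timestamp | random_bits of randomness."""
    if timestamp_ms is None:
        # Milliseconds since the Unix epoch, as in the ULID spec.
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    # The timestamp occupies the most significant bits, so later values sort later.
    return (timestamp_ms << random_bits) | secrets.randbits(random_bits)


if __name__ == "__main__":
    value = generate_ulid_int()
    # 16 raw bytes, suitable for a 128-bit column.
    print(value.to_bytes(16, "big").hex())
```

Because the timestamp sits in the high bits, values generated close together share a long common prefix, which is where the locality and compression benefits come from.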
Use case
Optimization: storing ULIDs in 128 bits instead of the 208 used with FixedString(26).
Describe the solution you'd like
Where generateULID() currently returns a FixedString(26), there could be an equivalent that uses FixedString(16).
https://clickhouse.com/docs/en/sql-reference/functions/ulid-functions
Some functions similar to:
https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#uuidnumtostring
https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#uuidtonum
toUInt128 for ULIDs similar to this: #51765
Additional context
The ULID spec has a note on potential overflow when parsing 26 base-32 characters, which can encode 130 bits. We can't hit that overflow problem if the stored value is only 128 bits.
https://github.com/ulid/spec?tab=readme-ov-file#overflow-errors-when-parsing-base32-strings
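The overflow condition can be checked from the first character alone: 26 characters at 5 bits each give 130 bits, so the leading character may only contribute 3 bits and must be `7` or lower. The helper below is a hypothetical sketch of that check (the name `fits_in_uint128` is an assumption for this example); it validates only the length and the first character, not the rest of the string.

```python
# Crockford base-32 alphabet used by the ULID spec.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def fits_in_uint128(ulid_text: str) -> bool:
    """Return True if a 26-character ULID string decodes to at most 128 bits."""
    if len(ulid_text) != 26:
        raise ValueError("a ULID string must be 26 characters long")
    # The first character carries the top 5 of 130 bits; only the low 3 may be set.
    # str.index raises ValueError if the character is not in the alphabet.
    return CROCKFORD.index(ulid_text[0].upper()) <= 7


if __name__ == "__main__":
    print(fits_in_uint128("7ZZZZZZZZZZZZZZZZZZZZZZZZZ"))
    print(fits_in_uint128("80000000000000000000000000"))
```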
There should be lots of useful example code in python-ulid