Replace uuid4 and uuid1_macless with UUIDT #1726

Twixes · 2020-09-28T00:45:51Z

Changes

This replaces uuid4 and uuid1_macless with UUIDT, which consists of:

6 bytes - Unix time milliseconds unsigned integer (will fail in 10 895 CE)
2 bytes - autoincremented series unsigned integer (per millisecond, rolls over to 0 after reaching 65 535 UUIDs in one ms)
8 bytes - securely random gibberish

UUIDT doesn't adhere to any particular UUID version spec, but it is superior as a primary key:
to incremented integers (as they can reveal sensitive business information about usage volumes and patterns),
to UUID v4 (as the complete randomness of v4 can make its indexing performance suboptimal),
and to UUID v1 (as despite being time-based it can't be used practically for sorting by generation time).

Order can be messed up if system clock is changed or if more than 65 536 IDs are generated per millisecond (that's over 5 trillion events per day), but it should be largely safe to assume that these are sorted.

Loosely based on Segment's KSUID and on Twitter's snowflake ID.

paolodamico

For user-facing cases (i.e. the ID used to interact with the API; e.g. /events/{id}/, I would very strongly advocate to use Stripe's approach of generating context-aware IDs (e.g. evt_OM46NaUQoOhMEF is a lot more informative than aaa7a0d8-f464-4c59-ae12-e00c6aff2aad). These IDs can be very helpful as they very clearly indicate the type of object they identify, whereas a UUID (or any variation) can belong to any object, and can be painful to utilize. The part after the prefix can be a completely random string or something along the lines of what you're building. As a side note, this ID need not be the primary key of the database.

As for the indexing performance, I believe we should not be sorting by primary keys (I think that for the most part we already don't do this in production code), but rather by creation date, which means that having sequential or pseudo-sequential PKs is not necessary. [Discussion with @fuziontech]

fuziontech

lgtm. Nicely done basing this off of KSUID

Twixes · 2020-09-29T13:40:55Z

This might definitely be nice for existing IDs. @paolodamico UUIDs are quite well optimized (exactly 128 bits), so that's good about them, I do agree however that going forward we may consider something a bit verbose for user-facing application.

Twixes added 3 commits September 28, 2020 02:00

Replace uuid1_macless with UUIDT

152d1f0

Add machine_component

a1205cb

Update comment

785d022

timgl temporarily deployed to posthog-uuidt-9zeqkwcehgvzuuoh September 28, 2020 00:47 Inactive

Twixes requested a review from fuziontech September 28, 2020 00:50

Remove machine_component in favor of larger time_component

d70c9d3

Twixes temporarily deployed to posthog-uuidt-9zeqkwcehgvzuuoh September 28, 2020 00:55 Inactive

paolodamico reviewed Sep 28, 2020

View reviewed changes

fuziontech approved these changes Sep 29, 2020

View reviewed changes

Update UUIDT docstring

11dfbfb

timgl temporarily deployed to posthog-uuidt-1368rsix23shtlog September 29, 2020 13:31 Inactive

Twixes merged commit 13e32aa into master Sep 29, 2020

Twixes deleted the uuidt branch September 29, 2020 13:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace uuid4 and uuid1_macless with UUIDT #1726

Replace uuid4 and uuid1_macless with UUIDT #1726

Twixes commented Sep 28, 2020 •

edited

Loading

paolodamico left a comment

fuziontech left a comment

Twixes commented Sep 29, 2020

Replace uuid4 and uuid1_macless with UUIDT #1726

Replace uuid4 and uuid1_macless with UUIDT #1726

Conversation

Twixes commented Sep 28, 2020 • edited Loading

Changes

paolodamico left a comment

Choose a reason for hiding this comment

fuziontech left a comment

Choose a reason for hiding this comment

Twixes commented Sep 29, 2020

Twixes commented Sep 28, 2020 •

edited

Loading