Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace uuid4 and uuid1_macless with UUIDT #1726

Merged
merged 5 commits into from
Sep 29, 2020
Merged

Replace uuid4 and uuid1_macless with UUIDT #1726

merged 5 commits into from
Sep 29, 2020

Conversation

Twixes
Copy link
Collaborator

@Twixes Twixes commented Sep 28, 2020

Changes

This replaces uuid4 and uuid1_macless with UUIDT, which consists of:

  • 6 bytes - Unix time milliseconds unsigned integer (will fail in 10 895 CE)
  • 2 bytes - autoincremented series unsigned integer (per millisecond, rolls over to 0 after reaching 65 535 UUIDs in one ms)
  • 8 bytes - securely random gibberish

UUIDT doesn't adhere to any particular UUID version spec, but it is superior as a primary key:
to incremented integers (as they can reveal sensitive business information about usage volumes and patterns),
to UUID v4 (as the complete randomness of v4 can make its indexing performance suboptimal),
and to UUID v1 (as despite being time-based it can't be used practically for sorting by generation time).

Order can be messed up if system clock is changed or if more than 65 536 IDs are generated per millisecond (that's over 5 trillion events per day), but it should be largely safe to assume that these are sorted.

Loosely based on Segment's KSUID and on Twitter's snowflake ID.

@timgl timgl temporarily deployed to posthog-uuidt-9zeqkwcehgvzuuoh September 28, 2020 00:47 Inactive
@Twixes Twixes temporarily deployed to posthog-uuidt-9zeqkwcehgvzuuoh September 28, 2020 00:55 Inactive
Copy link
Contributor

@paolodamico paolodamico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For user-facing cases (i.e. the ID used to interact with the API; e.g. /events/{id}/, I would very strongly advocate to use Stripe's approach of generating context-aware IDs (e.g. evt_OM46NaUQoOhMEF is a lot more informative than aaa7a0d8-f464-4c59-ae12-e00c6aff2aad). These IDs can be very helpful as they very clearly indicate the type of object they identify, whereas a UUID (or any variation) can belong to any object, and can be painful to utilize. The part after the prefix can be a completely random string or something along the lines of what you're building. As a side note, this ID need not be the primary key of the database.

As for the indexing performance, I believe we should not be sorting by primary keys (I think that for the most part we already don't do this in production code), but rather by creation date, which means that having sequential or pseudo-sequential PKs is not necessary. [Discussion with @fuziontech]

Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Nicely done basing this off of KSUID

@timgl timgl temporarily deployed to posthog-uuidt-1368rsix23shtlog September 29, 2020 13:31 Inactive
@Twixes Twixes merged commit 13e32aa into master Sep 29, 2020
@Twixes Twixes deleted the uuidt branch September 29, 2020 13:39
@Twixes
Copy link
Collaborator Author

Twixes commented Sep 29, 2020

This might definitely be nice for existing IDs. @paolodamico UUIDs are quite well optimized (exactly 128 bits), so that's good about them, I do agree however that going forward we may consider something a bit verbose for user-facing application.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants