Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB schema: optimize for tail view, keep track of insertion order for entities (with index, ideally via primary key) #789

Closed
jgehrcke opened this issue Mar 2, 2023 · 6 comments
Assignees
Labels
DB schema ticket related to database schema

Comments

@jgehrcke
Copy link
Member

jgehrcke commented Mar 2, 2023

Generalization of #786.

Requirement for #622.

We want to have a solution for "give the last N inserted entities, fast" for all the things (including benchmark results).

Currently that's not easily possible because of a UUID4 type primary key.

I think we should cheaply keep track of insertion order via primary key. Could use simple auto-increment integer (non-fancy default in many systems, would have served us well here, yeah) or we could do something like UUID7 (might be a little too new, maybe no established tooling support as of today).

I think this task/goal/ticket should be tracked independently of tracking insertion time (#790).

@jgehrcke
Copy link
Member Author

jgehrcke commented Mar 6, 2023

We want to have a solution for "give the last N inserted entities, fast" for all the things (including benchmark results).

Adding a thought from #786 to here:

I keep centering around the idea that most users would like to have a tail view, i.e. insertion order should play a central role: #789 ... Maybe in the future, a central UI concept for snappy DB queries/snappy UI population should involve the idea of showing tabular data (everywhere in the UI) for the last D days (could default to e.g. 10). That relates to #790.

@jgehrcke jgehrcke added the DB schema ticket related to database schema label Mar 10, 2023
@jgehrcke jgehrcke changed the title DB schema: keep track of insertion order for entities (with index, ideally via primary key) DB schema: optimize for tail view, keep track of insertion order for entities (with index, ideally via primary key) Mar 15, 2023
@jgehrcke
Copy link
Member Author

Btw, I liked https://supabase.com/blog/choosing-a-postgres-primary-key quite a bit when I started to read about about this back then in March.

@jgehrcke jgehrcke self-assigned this Apr 19, 2023
@jgehrcke
Copy link
Member Author

Let's just go ahead with using UUID type 7 for the primary key before we stall on this for even longer.

Interesting perspective from https://www.percona.com/blog/store-uuid-optimized-way/

The size of the UUID table is almost 50% bigger than Ordered UUID table and 30% bigger than the table with BIGINT as PRIMARY KEY. Comparing the Ordered UUID table BIGINT table, the time is taken to insert rows and the size are almost the same. But they may vary slightly based on the index structure.

I am glad they discuss monotonicity to a great extent in the RFC draft: https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-02.html#name-monotonicity-and-counters.

For CPython someone rather brutally dropped a minimal pull request: python/cpython#29824 -- but there isn't even an issue on the tracker discussing the feature/enhancement to the stdlib.

The new standard is still in are not complete yet. https://pypi.org/project/uuid7/ looks good, though, https://github.com/stevesimmons/uuid7 has a decent README.

The latest public progress report by the working group pushing towards standardization of the new UUID types can be found in uuid6/uuid6-ietf-draft#122.

The current version of the RFC that is supposed to replace RFC 4122 is from 2023-02-16 and can be found at https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-02.html.

The work is happening in this repo: https://github.com/ietf-wg-uuidrev/rfc4122bis

@jgehrcke
Copy link
Member Author

https://uuid.ramsey.dev/en/stable/rfc4122/version7.html is a lovely rendered version of the draft work.

@austin3dickey
Copy link
Member

As I mentioned in #871 (comment), I think most of the work here is done, and we just need to do a manual cleanup of "old" data. Will close this ticket once that's done.

@austin3dickey
Copy link
Member

After consulting with various stakeholders, there's nontrivial pushback against the idea of archiving or migrating old data, since deleting or modifying the primary key will break old URL links to benchmark results.

I think we can still move forward with the requirement that any code that wants to take advantage of the UUID7 primary key will have to limit queries to data created after the date #1138 was deployed. (The latest date for the instances I manage is 2023-06-03.) So I'll call this ticket complete with that caveat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB schema ticket related to database schema
Projects
None yet
Development

No branches or pull requests

2 participants