Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swap auto-increment IDs for ObjectID #7494

Closed
ErisDS opened this issue Oct 5, 2016 · 5 comments
Closed

Swap auto-increment IDs for ObjectID #7494

ErisDS opened this issue Oct 5, 2016 · 5 comments
Assignees

Comments

@ErisDS
Copy link
Member

ErisDS commented Oct 5, 2016

Auto-increment IDs are a feature of most databases. They're an easy way to create an id which is unique for each row in a relational database, so easy, it's not often thought about.

However, auto-increment IDs have several undesirable features:

  • Any replication system with multiple masters needs to have unique IDs, otherwise replication breaks.
  • Auto-increment IDs leak information, although this is not always a problem.
  • Auto-increment IDs don't globally identify resources.
  • This blog post covers several more small reasons why auto increments aren't great.

When Ghost was originally built, I already identified the need to be able to identify resources globally, and so all models in Ghost have both an auto-increment id and a uuid.

Far from solving the problem, this feels like the worst of both worlds:

  • We still have auto-increment ids and all the downsides they bring
  • We have an additional uuid using uuid.v4() that we rarely leverage and which is only "probably" unique, rather than being a guaranteed globally unique id.

The proposal for Ghost 1.0.0 is to move towards using a single ID for each resource, which is guaranteed to be globally unique. After some research into node modules which provide unique id's (there are many and this one was particularly promising), and also looking at what other platforms use, we settled on the idea of using ObjectID - the ID format that is used by MongoDB, via this npm module: https://github.com/williamkapke/bson-objectid.

Note, there are some small downsides as this could have a performance impact. One interesting potential performance issue is that innoDB uses auto-increment ids to optimise inserts. However, as ObjectID is still serial I don't believe this will be an issue.

Using ObjectId can also increase field and index sizes, I believe the impact for us will be minimal. At the moment, we do a poor job of adding indexes to Ghost, so even the fact that we intend to add an optimise them is going to be an improvement over the existing system. Further, we're reducing from having two potential IDs to need indexes on, to having one. There is plenty of content showing that the performance impact is minimal or a worthwhile tradeoff.

Finally, the one place where objectID might not be nice, is for post previews, we may need to revisit the implementation of these, ref #5097.

@ErisDS ErisDS added the data label Oct 5, 2016
@ErisDS ErisDS added this to the 1.0.0-alpha.5 milestone Oct 5, 2016
@kirrg001
Copy link
Contributor

kirrg001 commented Oct 5, 2016

Already started with the implementation, but waiting for bookshelf/bookshelf#1383 right now 👍

@Mickael-van-der-Beek
Copy link

Mickael-van-der-Beek commented Oct 5, 2016

@ErisDS Shouldn't UUIDv4 be unique enough for this use case?

From reading the UUIDv4 Wikipedia page section about collisions:

https://en.wikipedia.org/wiki/Universally_unique_identifier#Random%5FUUID%5Fprobability%5Fof%5Fduplicates

Values are:

n = 68,719,476,736 (2^36), chance = 0.0000000000000004 (4×10^−16)
n = 2,199,023,255,552 (2^41), chance = 0.0000000000005 (5×10^−13)
n = 70,368,744,177,664 (2^46), chance = 0.0000000005 (5×10^−10)

As a global identifier this should be a pretty safe bet.

To limit collisions even more you can set the uuid field as the primary key or set a unique index and thus limit same-table collisions even though global collisions would still be possible in theory.

@Mickael-van-der-Beek
Copy link

Mickael-van-der-Beek commented Oct 5, 2016

It's been a while since I've used MongoDB but by re-reading the documentation it would seem like ObjectIds are not much more random or unique than UUIDv4s.

I haven't done the collision chance calculation but my gut feeling tells me it can't guarantee a lower collision rate then UUIDv4.

The only real advantage I remember from working with ObjectIds is that since the time is the prefix you have a natural ordering by insert date.

@ErisDS
Copy link
Member Author

ErisDS commented Oct 5, 2016

I haven't done the collision chance calculation but my gut feeling tells me it can't guarantee a lower collision rate then UUIDv4.

The idea is to at least guarantee no collision within a single system, like Ghost(Pro) where we can ensure that pids are unique etc.

If we really need to improve this further, we could always move to our own version of ObjectID that uses a proper machine fingerprint, but it seems to me if the current implementation is good enough for MongoDB, it's probably good enough for us.

The only real advantage I remember from working with ObjectIds is that since the time is the prefix you have a natural ordering by insert date.

This is a pretty big deal, especially as we're going to be focusing our efforts on optimising for MySQL (and InnoDB). This is the main reason for not using uuid.

@kirrg001 kirrg001 modified the milestones: 1.0.0-alpha.6, 1.0.0-alpha.5 Oct 13, 2016
@ErisDS ErisDS removed this from the 1.0.0-alpha.6 milestone Oct 24, 2016
kirrg001 added a commit to kirrg001/Ghost that referenced this issue Nov 4, 2016
refs TryGhost#7494

Our fixtures system is smart. You can define a user and a separated relation.
Right now we always expect static id's for everything: roles, permissions, users.
That's why we simply do for adding the owner: roles: [4] --> role id 4
So you can use separated relations or you can use a direct relation.

This does not work when generating ObjectId's. Ok it could work if we decide to define for ALL static resources a static ID, but i think that is not needed at all.

So this PR removes the default creation of an author role when user.roles is undefined or empty, to be able to create the owner without a static role id and instead use the fixture relation feature.

I looked in this git history, the default author role was invented without explaining. See TryGhost@5c6d45f
ErisDS pushed a commit that referenced this issue Nov 9, 2016
refs #7494, refs #7495 

This PR is an extracted clean up feature of #7495.
We are using everywhere static id checks (userId === 0 or userId === 1).
This PR moves the static values into the Base model.
This makes it 1. way more readable and 2. we can change the id's in a central place.

I changed the most important occurrences - no tests are touched (yet!).

The background is: when changing from auto increment id (number) to ObjectId's (string) we still need to support id 1 and 0, because Ghost relies on these two static id's.
I would like to support using both: 0/1 as string and 0/1 as number.

1 === owner/internal
0 === external

Another important change:
User Model does not longer define the contextUser method, because i couldn't find a reason?
I looked in Git history, see 6e48275
ErisDS pushed a commit that referenced this issue Nov 9, 2016
refs #7494,  refs #7495 

I saw tests adding permissions and roles twice. (see screenshots)
That happened because the setup in the test was mis-used and there is no restriction for static resources to create duplicates.
With this PR i suggest to make name unique.
kevinansfield pushed a commit to TryGhost/Admin that referenced this issue Nov 14, 2016
kevinansfield added a commit to kevinansfield/Ghost-Admin that referenced this issue Nov 17, 2016
refs TryGhost/Ghost#7494
- remove `uuid` attrs from all models except Post
- remove uuids from mirage factories and fixtures
- add a workaround for tags where the selectize-based tags input on the PSM relies on a unique identifier for each tag which doesn't get sent back to the server when saving (fixes the broken tags input caused by uuid removal in TryGhost/Ghost#7495)
kevinansfield added a commit to kevinansfield/Ghost-Admin that referenced this issue Nov 17, 2016
refs TryGhost/Ghost#7494
- remove `uuid` attrs from all models except Post
- remove uuids from mirage factories and fixtures
- add a workaround for tags where the selectize-based tags input on the PSM relies on a unique identifier for each tag which doesn't get sent back to the server when saving (fixes the broken tags input caused by uuid removal in TryGhost/Ghost#7495)
acburdine pushed a commit to TryGhost/Admin that referenced this issue Nov 17, 2016
refs TryGhost/Ghost#7494
- remove `uuid` attrs from all models except Post
- remove uuids from mirage factories and fixtures
- add a workaround for tags where the selectize-based tags input on the PSM relies on a unique identifier for each tag which doesn't get sent back to the server when saving (fixes the broken tags input caused by uuid removal in TryGhost/Ghost#7495)
geekhuyang pushed a commit to geekhuyang/Ghost that referenced this issue Nov 20, 2016
refs TryGhost#7494, refs TryGhost#7495 

This PR is an extracted clean up feature of TryGhost#7495.
We are using everywhere static id checks (userId === 0 or userId === 1).
This PR moves the static values into the Base model.
This makes it 1. way more readable and 2. we can change the id's in a central place.

I changed the most important occurrences - no tests are touched (yet!).

The background is: when changing from auto increment id (number) to ObjectId's (string) we still need to support id 1 and 0, because Ghost relies on these two static id's.
I would like to support using both: 0/1 as string and 0/1 as number.

1 === owner/internal
0 === external

Another important change:
User Model does not longer define the contextUser method, because i couldn't find a reason?
I looked in Git history, see TryGhost@6e48275
geekhuyang pushed a commit to geekhuyang/Ghost that referenced this issue Nov 20, 2016
refs TryGhost#7494,  refs TryGhost#7495 

I saw tests adding permissions and roles twice. (see screenshots)
That happened because the setup in the test was mis-used and there is no restriction for static resources to create duplicates.
With this PR i suggest to make name unique.
@kirrg001
Copy link
Contributor

closed via #7495

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants