Entity Ids currently have a risk of collision #117

Closed
cart opened this issue Aug 11, 2020 · 29 comments
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior

Comments

@cart
Member

cart commented Aug 11, 2020

As some folks have pointed out on the Discord, Bevy ECS uses rand::random::<u32>() to generate entity ids, which gives entities a decent chance of colliding once entity counts reach the tens of thousands.
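
For context, the birthday bound makes that risk concrete: drawing n random ids from m possibilities gives P(collision) ≈ 1 − e^(−n² / 2m), so with m = 2^32, 50,000 ids already collide with probability ≈ 25%, and 100,000 ids with ≈ 69%. A minimal sketch demonstrating this empirically (assumes the rand crate; the expected first collision is around √(π/2 · 2^32) ≈ 82,000 ids):

    // Count how many random u32 ids can be drawn before the first duplicate.
    use std::collections::HashSet;

    fn main() {
        let mut seen = HashSet::new();
        let mut count: u64 = 0;
        loop {
            count += 1;
            let id: u32 = rand::random();
            if !seen.insert(id) {
                println!("first u32 collision after {} ids", count);
                break;
            }
        }
    }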

This needs to be fixed somehow. Per our discord discussion:

  1. Atomically incrementing ints
    • fast
    • produces predictable ids
    • centralized
  2. Random Uuids (u128)
    • slower
    • decentralized
    • astronomically small chance of collision

I'm not worried at all about random uuid collision, so at this point I think the conversation should be about the performance cost of using them.

Someone should run benchmarks for each implementation.

@cart cart added the C-Bug An unexpected or incorrect behavior label Aug 11, 2020
@kabergstrom

I think this discussion extends to the use of entity IDs in persisted, serialized files (scenes). Currently, the IDs generated by rand::random::<u32>() are persisted into files, meaning the same ID space is used for in-memory entities as well as persisted entities. There is a design choice here:
A. Separate the ID space for in-memory entity IDs and persisted entity IDs
B. Keep the ID space the same for both cases.

B with atomically incremented ints as IDs would render these persisted IDs meaningless, since the same IDs will be generated every run, meaning collisions would be everywhere.
A would allow an optimized in-memory representation (atomically incremented or generational) while using UUIDs for persisted files.

Note that entity IDs that are persisted have to be unique across all runs for your entire team, for the duration of your development cycle.

@Ratysz
Contributor

Ratysz commented Aug 11, 2020

Some additional points, for completeness' sake:

From the ecs channel, by @qthree:

The main problem is that you can add UUIDs as a Component if you already have generational IDs, but you can't really speed up a hashmap lookup (to the level of indexing into an array) if you're using UUIDs as your "main key".

@kabergstrom

kabergstrom commented Aug 11, 2020

For reference:

bevy_hecs & legion use hashmap lookups to find entity locations:
https://github.com/bevyengine/bevy/blob/master/crates/bevy_ecs/hecs/src/entities.rs#L39
https://github.com/TomGillen/legion/blob/master/src/internals/entity.rs#L99
legion uses an atomically incrementing ID with batch optimizations to reduce contention

hecs uses a generational ID, and hence uses an array for entity location lookups:
https://github.com/Ralith/hecs/blob/master/src/entities.rs#L56
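
For reference, a minimal sketch of the generational-index pattern hecs uses (illustrative names, not hecs's actual types): the id doubles as an array index, and the generation guards against stale handles after a slot is reused.

    // A generational id: `index` doubles as an array slot, `generation`
    // detects handles left over from a previous occupant of the slot.
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    struct Entity {
        index: u32,
        generation: u32,
    }

    struct Slot<T> {
        generation: u32,
        value: Option<T>,
    }

    struct Arena<T> {
        slots: Vec<Slot<T>>,
        free: Vec<u32>, // vacated slots, ready for reuse
    }

    impl<T> Arena<T> {
        fn spawn(&mut self, value: T) -> Entity {
            if let Some(index) = self.free.pop() {
                let slot = &mut self.slots[index as usize];
                slot.value = Some(value);
                Entity { index, generation: slot.generation }
            } else {
                self.slots.push(Slot { generation: 0, value: Some(value) });
                Entity { index: self.slots.len() as u32 - 1, generation: 0 }
            }
        }

        fn despawn(&mut self, e: Entity) {
            if let Some(slot) = self.slots.get_mut(e.index as usize) {
                if slot.generation == e.generation && slot.value.take().is_some() {
                    slot.generation += 1; // invalidate outstanding handles
                    self.free.push(e.index);
                }
            }
        }

        fn get(&self, e: Entity) -> Option<&T> {
            // Plain array indexing, no hashing.
            let slot = self.slots.get(e.index as usize)?;
            if slot.generation == e.generation { slot.value.as_ref() } else { None }
        }
    }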

@aclysma
Contributor

aclysma commented Aug 11, 2020

I know that having a single ID space everywhere is very attractive, but I think it's somewhat inescapable that there will be more than one. Persisting things to disk really wants UUIDs, and network syncing wants something substantially smaller. So I would caution against trying to pick something in the middle (e.g. 32 bits) that isn't ideal for either situation.

If you conclude that the same ID used everywhere is likely not going to work out, then I think UUIDs are a pretty clear choice for anything that's persisted. The cost of generating the UUID should be dominated by the cost of whatever other operation is happening that necessitated generating a UUID in the first place (like saving data to disk.)

@kazimuth

FYI, the hecs fork readme currently states that IDs are always UUIDs; that should be updated whenever a choice gets made here.

@cart
Member Author

cart commented Aug 11, 2020

I'm definitely aware of how other projects do it. I am personally attached to the simplicity of a "globally unique entity id". It reduces the internal machinery involved in the ECS implementation (a relatively small win) and, more importantly, it simplifies the conceptual model for users: "create an id anywhere and it is unique".

Note that entity IDs that are persisted have to be unique across all runs for your entire team, for the duration of your development cycle.

I agree that this is the situation. Based on my current understanding of guids, this would not be a problem.

If someone has proof that random UUIDs have a real collision risk (aka: a program somewhere is likely to hit a collision over the lifetime of the bevy project ... let's be optimistic and say 1000 years 😄) then please let me know.

Otherwise, let's focus on the performance + code complexity discussion.

edit: clarified that this is "random uuids"

@Cretezy

Cretezy commented Aug 11, 2020

What is the performance difference with using u64 over u32? This would be a simple solution to increase the collision space.

@kazimuth

kazimuth commented Aug 11, 2020

If you look at the ahash readme, it can hash a 128-bit int in 0.61 nanoseconds. For comparison, fxhash can hash a 64-bit int in... 0.62 nanoseconds. As I understand it, hashing is the main operation applied to IDs. So, if we use ahash, there's not much cost there.

Larger IDs aren't going to be a problem for overall memory usage; that'll be dominated by game assets. Larger IDs could increase register pressure slightly in hot loops. However, in an ECS that groups components into archetypes (like legion or bevy_ecs), you don't even need to look at IDs during iteration, so slightly larger IDs won't be a problem there. You only need to hash IDs for lookup-by-id, which is the rarer case -- and even then, we've got ahash.

So I suspect that for most workloads uuid ids won't penalize performance. Since they provide a much easier mental model for users, I think they're the clear choice.
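
A minimal sketch of that lookup path (assumes the ahash crate; illustrative types): the entity-location map is the only place the 128-bit id gets hashed, while archetype iteration never touches it.

    use ahash::AHashMap; // assumes the ahash crate

    // Where an entity's components live, as in an archetype-based ECS.
    struct Location {
        archetype: u32,
        index: u32,
    }

    // Lookup-by-id is the only operation that hashes the 128-bit id.
    fn find(locations: &AHashMap<u128, Location>, id: u128) -> Option<&Location> {
        locations.get(&id)
    }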

(Unfortunately I don't have time to write actual benchmarks right now, but I could next week.)

@cart
Member Author

cart commented Aug 11, 2020

The u32 collision issue is real enough that I'll fix it today if no one else will, probably just by bumping to u64 or u128. We can continue to discuss the "best" solution.

I'll run ecs_bench comparisons to see how much perf is affected.

@OvermindDL1

I performed a few benchmark tests, possibly wrong, but the code is at https://github.com/OvermindDL1/entity_bench/ if anyone wants to fix it.

This needs to be fixed somehow. Per our discord discussion:

  1. Atomically incrementing ints
  2. Random Uuids (u128)

These do indeed hold, but you should add that u128 UUIDs will be larger than the associated incrementing ints.

Also, as for 'decentralized': they are only decentralized when generated; actually putting them into the system still requires world synchronization, so that has to happen in either case.

Someone should run benchmarks for each implementation.

From my quick benchmark on my 13-year-old desktop CPU, getting an incrementing ID using my ECS's code takes (for u32, u64, and u128; it supports user types as well, which is great for making a type-strong id):

create/create-linear-u32   time:   [8.1782 ns 8.4789 ns 8.8042 ns]
create/create-linear-u64   time:   [7.5045 ns 7.5696 ns 7.6326 ns]
create/create-linear-u128  time:   [7.8254 ns 7.8976 ns 7.9624 ns]

Creating a u128 via rand takes:

[26.515 ns 26.636 ns 26.777 ns]

And that's not even including the time it takes to insert into the hashmap on the world; inserting a rand-generated u128 into a std hashmap and an ahash hashmap takes:

create/create-u128-rand-hashmap   time:   [67.235 ns 67.585 ns 67.937 ns]
create/create-u128-rand-ahash     time:   [46.223 ns 46.509 ns 46.785 ns]

B with atomically incremented ints as IDs would render these persisted IDs meaningless, since the same IDs will be generated every run, meaning collisions would be everywhere.

Except for two things. First, you 'should' still test for collisions or weird errors can suddenly pop up; with u128 it is less likely, but still bound to happen to the rare user. Second, and the bigger thing: you can load entities multiple times (great for templates), so you have to change their IDs anyway.

You should just write out entities as-is, but create new IDs on load and reconnect references as appropriate.
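
A minimal sketch of that remap-on-load idea (illustrative types, not Bevy's API): allocate a fresh id for every entity in the file, record old → new in a table, then rewrite persisted entity references through the table. Loading the same file twice then yields two independent instances.

    use std::collections::HashMap;

    // Illustrative persisted form: each entity carries the id it was saved
    // with, plus references to other saved entities' ids.
    struct SavedEntity {
        old_id: u64,
        refs: Vec<u64>,
    }

    fn load(saved: &[SavedEntity], next_id: &mut u64) -> Vec<(u64, Vec<u64>)> {
        // Pass 1: give every saved entity a fresh runtime id.
        let mut remap: HashMap<u64, u64> = HashMap::new();
        for e in saved {
            let new_id = *next_id;
            *next_id += 1;
            remap.insert(e.old_id, new_id);
        }
        // Pass 2: rewrite references through the remap table.
        saved
            .iter()
            .map(|e| {
                let new_refs: Vec<u64> = e.refs.iter().map(|r| remap[r]).collect();
                (remap[&e.old_id], new_refs)
            })
            .collect()
    }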

Note that entity IDs that are persisted have to be unique across all runs for your entire team, for the duration of your development cycle.

Only if you don't remap on load, which you should do anyway because you can load things multiple times (instance a house and all its stuff into multiple locations for example).

If someone has proof that random UUIDs have a real collision risk (aka: a program somewhere is likely to hit a collision over the lifetime of the bevy project ... let's be optimistic and say 1000 years 😄) then please let me know.

Well, with u32 I couldn't get a successful benchmark to run at all due to collisions. With u128, statistically you'd have a 50% chance of collision once 2^64 entities have been allocated, which, given enough entities, players, and days, is bound to happen eventually, and it's a bad user experience for a program to suddenly crash with a lookup error. This may be a small chance of happening, but it is still possible in a reasonable time frame (and that chance can be made so, so much worse in certain situations, especially with poor-quality / fast PRNGs or certain game setups), and just waiting for a ticking time bomb in the code makes one very wary. The engine should be as stable as possible, and random IDs are not as stable as possible.

What is the performance difference with using u64 over u32? This would be a simple solution to increase the collision space.

With just u64 you have a 50% chance of collision after only about 4 billion entities are created, which even a modest set of players is bound to reach over a few days. You could make a loop run for a few minutes and likely get a collision.

As I understand it, hashing is the main operation applied to IDs. So, if we use ahash, there's not much cost there.

There's no need to hash incremental, reused IDs, so there's no hashing cost at all.

The u32 collision issue is real enough that I'll fix it today if no one else will, probably just by bumping to u64 or u128. We can continue to discuss the "best" solution.

Yeah, I hit it multiple times today with my tests...

@zicklag
Member

zicklag commented Aug 12, 2020

This may be a small chance of happening, but it is still possible in a reasonable time frame (and that chance can be made so, so much worse in certain situations, especially with poor-quality / fast PRNGs or certain game setups), and just waiting for a ticking time bomb in the code makes one very wary.

This is definitely a concern from a usability perspective. If that is a problem then I think it outweighs any cognitive advantage of "create an id anywhere and it is unique" (though I agree with the beauty of that model, if it could ever be relied upon).

@cart
Member Author

cart commented Aug 12, 2020

In order to reach that collision likelihood, in a shared game world where each player generates 1,000,000 unique entities per day (which is high), we would need:

18,446,744,073,709,551,616 (2^64, the 50% collision point) /
7,800,000,000 (world population) /
1,000,000 (entities per day) /
365 days
≈
6.5 years of every person in the world playing the same shared game with no resets.

That's a lot of entities and a lot of time

@OvermindDL1

That's still only the 50% point; even a small fraction of a single percent will still cause it to happen occasionally, given the number of people using it in a variety of ways over time.

@aclysma
Contributor

aclysma commented Aug 12, 2020

128-bit UUIDs are the gold standard for "no one will ever get the same one, ever". They are relied upon in distributed databases for far more important things than games. According to Wikipedia, "the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion" (https://en.wikipedia.org/wiki/Universally_unique_identifier). You are about 1000 times more likely than that to be struck by lightning this year (https://www.weather.gov/safety/lightning-odds).

I also think micro-benchmarking the cost of generating them ignores the fact that whatever triggers creating the UUID in the first place will almost certainly dominate the cost of generating it. Even if spawning an entity were nothing more than a 128-byte memcpy, a profiler would show that before it showed UUID generation. Games don't spawn/despawn thousands of entities per frame during gameplay anyway.

you can load entities multiple times (great for templates) so you have to change their ID's anyway.

I think this is a really good point

If I were designing an engine I would do:

  • 128-bit UUIDs for anything saved to disk
  • 64-bit counter for in-memory IDs
  • 16-bit IDs for syncing dynamic entities in a real-time simulation over a network

I think 128-bit for in-memory IDs can work, but between @OvermindDL1's comment RE: spawning multiple copies of the same entity in different places in the world, and that it's likely there will be more than one ID space anyways, I would accept having a UUID/in-memory ID mapping table and enjoy the smaller IDs.

@fopsdev

fopsdev commented Aug 12, 2020

Hmm... not sure if it's of help, but what about defining a namespace for each project and an atomic 32-bit increasing number?
OK, thinking more on this subject, the problem will be with distributed systems, I guess.
Then why not use a 64-bit number with ranges for each player?
Let's say we have 2^64 / 2^40 (entities per player) = 2^24 (possible players).
(Just throwing in some random ideas; I'm still learning all of this 🐥 😄)
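
A minimal sketch of that range-partitioning idea (a hypothetical layout, not anything Bevy implements): pack a 24-bit player id and a 40-bit per-player counter into a single u64, so allocation needs no coordination between players.

    use std::sync::atomic::{AtomicU64, Ordering};

    // Hypothetical layout: high 24 bits = player id, low 40 bits = counter.
    const COUNTER_BITS: u32 = 40;
    const COUNTER_MASK: u64 = (1u64 << COUNTER_BITS) - 1;

    struct PlayerIdAllocator {
        player: u64, // must be < 2^24
        next: AtomicU64,
    }

    impl PlayerIdAllocator {
        fn alloc(&self) -> u64 {
            let n = self.next.fetch_add(1, Ordering::Relaxed);
            assert!(n <= COUNTER_MASK, "player exhausted their 2^40 id range");
            (self.player << COUNTER_BITS) | n
        }
    }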

@zicklag
Member

zicklag commented Aug 12, 2020

Games don't spawn/despawn thousands of entities per frame during gameplay anyway.

What about particle effects, though? If each particle has to have a transform, don't those need their own entities? I'm not sure if there are different strategies for handling particles or not.

@aclysma
Contributor

aclysma commented Aug 12, 2020

I wouldn’t put particles in an ECS. It would be common to leave that data in a GPU buffer, since it generally doesn’t affect gameplay. It’s also an example of how, if you are doing an extreme amount of something (as in sand games), you probably want a non-general, highly optimized solution rather than a general-purpose one.

@karroffel karroffel added the A-ECS Entities, components, systems, and events label Aug 12, 2020
@OvermindDL1

Spawning templates of things is common, like say a projectile for an FPS, units in an RTS, etc...

In addition, I hit another issue with using random entity IDs: some of my map-generation iteration time is dominated by hashmap lookups to fetch an entity's draw information so I can render visible entities and map information. If IDs were packed near 0, as in my code above, that cost vanishes since it's just an array index. Hashmap lookups in tight loops are a significant performance hit.

@RLHerbert

Forgive my newness to the topic, but would a family generator style a la https://skypjack.github.io/2019-02-14-ecs-baf-part-1#unique-identifiers make sense within the Bevy context?

@zicklag
Member

zicklag commented Aug 16, 2020

@RLHerbert no problem. :)

Someone correct me if I am wrong, but the identifiers referred to in that post are component identifiers, not entity identifiers as discussed in this thread. Component identifiers are actually being discussed in #32.

The technique discussed in that post also won't work very well for component identifiers in our case, because those IDs are only unique within a single run of the program, and we need components to be as unique as possible even across different dynamically loaded plugins, etc.

@cart
Member Author

cart commented Aug 31, 2020

Small update here: moving to u128 (#117) does regress world creation performance enough that I'm now more willing to consider alternatives (see that PR for numbers). I still consider that performance "totally fine" for basically any context, but for some people being as fast as possible is important, and I want to take that into account.

Ideally, if we move back to incrementing indices, we do that in a way that does not affect the UX of:

  • serialization
  • save games
  • networking

We still need stable ids in those contexts and they should be "free" if possible. I haven't yet made up my mind about this and I'd love everyone else's thoughts. Short term, I'll be merging the PR to resolve the collision issues.

@CleanCut
Member

I'm sure you meant to link to this pull request: #393 😊

@jpetkau

jpetkau commented Sep 8, 2020

In favor of 128-bit guids

Several posts here have talked about the possibility of u128 collisions (you only need to create ~2^64 items to get a 50% chance of collision). It's actually not nearly that bad, because it doesn't matter how many items you create; it matters how many items you store in a common pool where collisions will be a problem.

(I.e. if you're creating 10000 new IDs per second for some particle system, they're probably not very long-lived, so the fact that one of those IDs gets re-used a year later in the same game doesn't actually hurt anything).

Against 128-bit guids (or any other random scheme)

Determinism is good! Any pure random-number based scheme inherently breaks determinism, which is very bad (for reproducibility, debugging, deterministic multiplayer, lots of stuff.)

You can get determinism back by seeding the RNG if you need it, but then you can't use a thread-local RNG. (Which is probably fine actually, a mutex around entity creation isn't a big deal since you need to insert it into global data structures anyway.)

@cart
Member Author

cart commented Sep 9, 2020

@jpetkau I agree with your points, but I would argue that determinism isn't always required or desirable.

Ex: The amount of time it takes to render a frame isn't deterministic. Requiring it to be deterministic is basically impossible and doesn't buy us much. Game code shouldn't rely on total_time_to_render being the same every frame. I'd argue that this is also true for entity ids. They are "opaque" and game code shouldn't rely on them being the same every time. Doing so creates more problems than it solves.

In this case randomizing encourages good behavior, because it makes code that relies on entity id stability fail immediately. Go employs the same approach to hashes / iteration order of hashmaps (or at least, they used to). They explicitly make iteration order nondeterministic to prevent people from relying on it.

@kabergstrom

kabergstrom commented Sep 9, 2020

legion 0.3 released recently with what's IMO the most solid serialization support of any Rust ECS today: https://docs.rs/legion/0.3.0/legion/serialize/index.html

It has many months of effort behind it, with aclysma and me working to make prefabs work with it, and Tom improving the ergonomics of the APIs.

@cart
Member Author

cart commented Sep 9, 2020

@kabergstrom yeah i totally agree that legion's new serialization is nice. We already do something similar on the registry side (although we automatically generate friendly names), but I am now heavily considering an entity id approach similar to legions.

This would likely consist of the following changes:

  • Revert to the original hecs "entity id as index" approach.
  • Add an internal mapping from "runtime" entity ids to "stable" entity ids
  • Update serialization to use the mapping instead of direct entity ids
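
A minimal sketch of how that mapping might look (illustrative types, not Bevy's actual implementation): runtime ids stay small indices, and a stable 128-bit id is minted lazily, only when an entity actually crosses a serialization boundary.

    use std::collections::HashMap;

    // Runtime id: a plain index, cheap to use as an array slot.
    #[derive(Clone, Copy, PartialEq, Eq, Hash)]
    struct Entity(u32);

    // Stable id: only minted for entities that get serialized.
    #[derive(Clone, Copy, PartialEq, Eq, Hash)]
    struct StableId(u128);

    #[derive(Default)]
    struct StableIdMap {
        to_stable: HashMap<Entity, StableId>,
        from_stable: HashMap<StableId, Entity>,
    }

    impl StableIdMap {
        fn stable_id(&mut self, e: Entity) -> StableId {
            if let Some(&id) = self.to_stable.get(&e) {
                return id;
            }
            let id = StableId(rand::random::<u128>()); // assumes the rand crate
            self.to_stable.insert(e, id);
            self.from_stable.insert(id, e);
            id
        }

        fn entity(&self, id: StableId) -> Option<Entity> {
            self.from_stable.get(&id).copied()
        }
    }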

I'll probably tackle this soon so we can move on to other things.

@kabergstrom

Also have a look at clone_from and its related Merger trait: https://docs.rs/legion/0.3.0/legion/struct.World.html#method.clone_from

It allows rewriting entity IDs when cloning, including all references, by hooking Entity's Clone impl to rewrite references as they are cloned. It's what allows prefabs to be cloned while maintaining coherent internal references that point to the newly spawned entities.

@jpetkau

jpetkau commented Sep 29, 2020

[Sorry, this turned into a ramble that may not be super relevant to the issue any more.]

@jpetkau I agree with your points, but I would argue that determinism isn't always required or desirable.
Ex: The amount of time it takes to render a frame isn't deterministic...

Yes, "always required" is a much stronger point than I intended to make. I would say it's always valuable, though, just sometimes the value isn't worth the effort to get there.

Basically there are two use-cases I had in mind:

  1. Deterministic simulation, e.g. for multiplayer strategy games. This is certainly a niche application, but it seems like other aspects of Bevy are already making efforts to support this. (And nphysics and rapier physics will support determinism, which is awesome!)

Here you have a large chunk of the game logic (but not 100%, e.g. rendering as you noted) which is meant to be exactly deterministic. For a complex game, this is doable, but really hard. (I've shipped this kind of game a couple of times.) You absolutely need deterministic pieces to build out of. And you need to do things like compute a hash over all the deterministic state, so you can immediately detect bugs and narrow down their causes.

It already takes a lot of care and discipline to make this work. Having components actively work against you, by making entity IDs or iteration order nondeterministic, makes it pretty much impossible.

  2. Determinism for reproducibility in general.

This is more of a general comment on the value of being able to reproduce an exact run of some code.

  • Nondeterminism in integration tests = flaky tests = people not trusting test failures = people ignoring test failures = you don't have integration tests.

  • In general, a deterministic function is a function (in the mathematical sense). A nondeterministic function isn't. That means if you have the inputs, you can recompute the outputs. Useful for debugging, memoization, offline analysis, etc.

As you point out, nondeterminism as in hash tables (for fuzzing or to avoid certain kinds of DOS attacks) is also valuable. But I think it's even better on top of a deterministic seed. E.g. tests may be nondeterministic as long as they're passing, but as soon as they find a failure, they record the seed so that failure can then be reproduced 100%. But doing that still requires the ability to run deterministically when desired.
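
A minimal sketch of that record-the-seed pattern (assumes the rand crate; names are illustrative): the test picks a fresh random seed per run for coverage, but prints it on failure so the exact run can be replayed.

    use rand::rngs::StdRng;
    use rand::{Rng, SeedableRng};

    // Stand-in for some property being tested against random inputs.
    fn run_simulation(rng: &mut StdRng) -> bool {
        rng.gen_range(0..1000) != 0
    }

    #[test]
    fn fuzz_with_replayable_seed() {
        // Replay a recorded failure via TEST_SEED; otherwise pick a fresh seed.
        let seed: u64 = std::env::var("TEST_SEED")
            .ok()
            .and_then(|s| s.parse().ok())
            .unwrap_or_else(rand::random);
        let mut rng = StdRng::seed_from_u64(seed);
        assert!(run_simulation(&mut rng), "failed; replay with TEST_SEED={}", seed);
    }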

Also re hash tables--having non-guaranteed iteration order is very often a PITA. There's a reason for the gradual move to "iteration is in order of insertion" hash tables--they're just more useful. Here's an example from some past work:

  • I had some code that loaded some json, made some change, and saved it out again.
  • The json library wrote fields in nondeterministic order, because the json spec says that doesn't matter, and the library author was insistent that no code should ever rely on it.

The result: code that should have been trivial to test ("input A, B, C should be written out unchanged; input D, E, F should be written out like this") was hard to test because the output order was shuffled. If you used the tool on a file, you couldn't just look at a diff to see the results, because the file was shuffled! Logically it was the same, but practically it was useless.

Even in the case of rendering (where e.g. frame times are nondeterministic), there's value in making as much of a game as possible not dependent on that, and in setting up the code so that if you were to record and replay the frame times (and player input and other nondeterministic parts), you could reconstruct the rest of the run exactly.

Of course it's not worth the effort if the libraries you're building on are fighting you every step of the way, which is why I argue for more determinism in libraries.

@cart
Member Author

cart commented Sep 29, 2020

@jpetkau yup, those are all good points! Fortunately, in this case we now use atomically incrementing indices, so we are "more deterministic" here. As soon as we introduce parallelism, that goes out the window though.
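
For reference, a minimal sketch of an atomically incrementing allocator like the one described (illustrative, not Bevy's exact code):

    use std::sync::atomic::{AtomicU32, Ordering};

    // Ids are unique within a run; fetch_add makes allocation thread-safe,
    // though allocation order across threads is not deterministic.
    static NEXT_ENTITY_ID: AtomicU32 = AtomicU32::new(0);

    fn allocate_entity_id() -> u32 {
        NEXT_ENTITY_ID.fetch_add(1, Ordering::Relaxed)
    }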

I'm closing this issue as it is now "fully resolved". But feel free to continue discussing related topics here if you all want to 😄
