Entity Ids currently have a risk of collision #117

Closed
cart opened this issue Aug 11, 2020 · 29 comments
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior

Comments

@cart
Member

cart commented Aug 11, 2020

As some folks have pointed out on the Discord, Bevy ECS uses rand::random::<u32>() to generate entity ids, which gives entities a decent chance of colliding once entity counts reach the tens of thousands.
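
For context, the birthday bound makes that risk concrete: drawing n random ids from m possibilities gives P(collision) ≈ 1 − e^(−n² / 2m), so with m = 2^32, 50,000 ids already collide with probability ≈ 25%, and 100,000 ids with ≈ 69%. A minimal sketch demonstrating this empirically (assumes the rand crate; the expected first collision is around √(π/2 · 2^32) ≈ 82,000 ids):

    // Count how many random u32 ids can be drawn before the first duplicate.
    use std::collections::HashSet;

    fn main() {
        let mut seen = HashSet::new();
        let mut count: u64 = 0;
        loop {
            count += 1;
            let id: u32 = rand::random();
            if !seen.insert(id) {
                println!("first u32 collision after {} ids", count);
                break;
            }
        }
    }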

This needs to be fixed somehow. Per our discord discussion:

  1. Atomically incrementing ints
    • fast
    • produces predictable ids
    • centralized
  2. Random Uuids (u128)
    • slower
    • decentralized
    • astronomically small chance of collision

I'm not worried at all about random uuid collision, so at this point I think the conversation should be about the performance cost of using them.

Someone should run benchmarks for each implementation.

@cart cart added the C-Bug An unexpected or incorrect behavior label Aug 11, 2020
@kabergstrom

I think this discussion extends to the use of entity IDs in persisted, serialized files (scenes). Currently, the IDs generated by rand::random::<u32>() are persisted into files, meaning the same ID space is used for in-memory entities as well as persisted entities. There is a design choice here:
A. Separate the ID space for in-memory entity IDs and persisted entity IDs
B. Keep the ID space the same for both cases.

B with atomically incremented ints as IDs would render these persisted IDs meaningless, since the same IDs will be generated every run, meaning collisions would be everywhere.
A would allow an optimized in-memory representation (atomically incremented or generational) while using UUIDs for persisted files.

Note that entity IDs that are persisted have to be unique across all runs for your entire team, for the duration of your development cycle.

@Ratysz
Contributor

Ratysz commented Aug 11, 2020

Some additional points, for completeness' sake:

From the ecs channel, by @qthree:

The main problem is that you can add UUIDs as a Component if you already have generational IDs, but you can't really speed up a hashmap lookup (to the level of indexing into an array) if you're using UUIDs as your "main key".

@kabergstrom

kabergstrom commented Aug 11, 2020

For reference:

bevy_hecs & legion use hashmap lookups to find entity locations:
https://github.com/bevyengine/bevy/blob/master/crates/bevy_ecs/hecs/src/entities.rs#L39
https://github.com/TomGillen/legion/blob/master/src/internals/entity.rs#L99
legion uses an atomically incrementing ID with batch optimizations to reduce contention

hecs uses a generational ID, and hence uses an array for entity location lookups:
https://github.com/Ralith/hecs/blob/master/src/entities.rs#L56
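
For reference, a minimal sketch of the generational-index pattern hecs uses (illustrative names, not hecs's actual types): the id doubles as an array index, and the generation guards against stale handles after a slot is reused.

    // A generational id: `index` doubles as an array slot, `generation`
    // detects handles left over from a previous occupant of the slot.
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    struct Entity {
        index: u32,
        generation: u32,
    }

    struct Slot<T> {
        generation: u32,
        value: Option<T>,
    }

    struct Arena<T> {
        slots: Vec<Slot<T>>,
        free: Vec<u32>, // vacated slots, ready for reuse
    }

    impl<T> Arena<T> {
        fn spawn(&mut self, value: T) -> Entity {
            if let Some(index) = self.free.pop() {
                let slot = &mut self.slots[index as usize];
                slot.value = Some(value);
                Entity { index, generation: slot.generation }
            } else {
                self.slots.push(Slot { generation: 0, value: Some(value) });
                Entity { index: self.slots.len() as u32 - 1, generation: 0 }
            }
        }

        fn despawn(&mut self, e: Entity) {
            if let Some(slot) = self.slots.get_mut(e.index as usize) {
                if slot.generation == e.generation && slot.value.take().is_some() {
                    slot.generation += 1; // invalidate outstanding handles
                    self.free.push(e.index);
                }
            }
        }

        fn get(&self, e: Entity) -> Option<&T> {
            // Plain array indexing, no hashing.
            let slot = self.slots.get(e.index as usize)?;
            if slot.generation == e.generation { slot.value.as_ref() } else { None }
        }
    }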

@aclysma
Contributor

aclysma commented Aug 11, 2020

I know that having a single ID space everywhere is very attractive, but I think it's somewhat inescapable that there will be more than one. Persisting things to disk really wants UUIDs, and network syncing wants something substantially smaller. So I would caution against trying to pick something in the middle (e.g. 32 bits) that isn't ideal for either situation.

If you conclude that the same ID used everywhere is likely not going to work out, then I think UUIDs are a pretty clear choice for anything that's persisted. The cost of generating the UUID should be dominated by the cost of whatever other operation is happening that necessitated generating a UUID in the first place (like saving data to disk.)

@kazimuth

FYI, the hecs fork readme currently states that IDs are always UUIDs; that should be updated whenever a choice gets made here.

@cart
Member Author

cart commented Aug 11, 2020

I'm definitely aware of how other projects do it. I am personally attached to the simplicity of a "globally unique entity id". It reduces the internal machinery involved in the ECS implementation (a relatively small win) and, more importantly, it simplifies the conceptual model for users: "create an id anywhere and it is unique".

Note that entity IDs that are persisted have to be unique across all runs for your entire team, for the duration of your development cycle.

I agree that this is the situation. Based on my current understanding of guids, this would not be a problem.

If someone has proof that random UUIDs have a real collision risk (aka: a program somewhere is likely to hit a collision over the lifetime of the bevy project ... let's be optimistic and say 1000 years 😄) then please let me know.

Otherwise, let's focus on the performance + code complexity discussion.

edit: clarified that this is "random uuids"

@Cretezy

Cretezy commented Aug 11, 2020

What is the performance difference with using u64 over u32? This would be a simple solution to increase the collision space.

@kazimuth

kazimuth commented Aug 11, 2020

If you look at the ahash readme, it can hash a 128-bit int in 0.61 nanoseconds. For comparison, fxhash can hash a 64-bit int in... 0.62 nanoseconds. As I understand it, hashing is the main operation applied to IDs. So, if we use ahash, there's not much cost there.

Larger IDs aren't going to be a problem for overall memory usage; that'll be dominated by game assets. Larger IDs could increase register pressure slightly in hot loops. However, in an ECS that groups components into archetypes (like legion or bevy_ecs), you don't even need to look at IDs during iteration, so slightly larger IDs won't be a problem there. You only need to hash IDs for lookup-by-id, which is the rarer case -- and even then, we've got ahash.

So I suspect that for most workloads uuid ids won't penalize performance. Since they provide a much easier mental model for users, I think they're the clear choice.
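
A minimal sketch of that lookup path (assumes the ahash crate; illustrative types): the entity-location map is the only place the 128-bit id gets hashed, while archetype iteration never touches it.

    use ahash::AHashMap; // assumes the ahash crate

    // Where an entity's components live, as in an archetype-based ECS.
    struct Location {
        archetype: u32,
        index: u32,
    }

    // Lookup-by-id is the only operation that hashes the 128-bit id.
    fn find(locations: &AHashMap<u128, Location>, id: u128) -> Option<&Location> {
        locations.get(&id)
    }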

(Unfortunately I don't have time to write actual benchmarks right now, but I could next week.)

@cart
Member Author

cart commented Aug 11, 2020

The u32 collision issue is real enough that I'll fix it today if no one else will, probably just by bumping to u64 or u128. We can continue to discuss the "best" solution.

I'll run ecs_bench comparisons to see how much perf is affected.

@OvermindDL1

I performed a few benchmark tests, possibly wrong, but the code is at https://github.com/OvermindDL1/entity_bench/ if anyone wants to fix it.

This needs to be fixed somehow. Per our discord discussion:

  1. Atomically incrementing ints
  2. Random Uuids (u128)

These do indeed hold, but you should add that u128 UUIDs will be larger than the associated incrementing ints.

Also, as for 'decentralized': they are only decentralized when generated; actually putting them into the system still requires world synchronization, so that has to happen in either case.

Someone should run benchmarks for each implementation.

From my quick benchmark on my 13-year-old desktop CPU, getting an incrementing ID using my ECS's code takes (for u32, u64, and u128; it supports user types as well, which is great for making a type-strong id):

create/create-linear-u32   time:   [8.1782 ns 8.4789 ns 8.8042 ns]
create/create-linear-u64   time:   [7.5045 ns 7.5696 ns 7.6326 ns]
create/create-linear-u128  time:   [7.8254 ns 7.8976 ns 7.9624 ns]

Creating a u128 via rand takes:

[26.515 ns 26.636 ns 26.777 ns]

And that's not even including the time it takes to insert into the hashmap on the world; inserting a rand-generated u128 into a std hashmap and an ahash hashmap takes:

create/create-u128-rand-hashmap   time:   [67.235 ns 67.585 ns 67.937 ns]
create/create-u128-rand-ahash     time:   [46.223 ns 46.509 ns 46.785 ns]

B with atomically incremented ints as IDs would render these persisted IDs meaningless, since the same IDs will be generated every run, meaning collisions would be everywhere.

Except for two things. First, you 'should' still test for collisions or weird errors can suddenly pop up; with u128 it is less likely, but still bound to happen to the rare user. Second, and the bigger thing: you can load entities multiple times (great for templates), so you have to change their IDs anyway.

You should just write out entities as-is, but create new IDs on load and reconnect references as appropriate.
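
A minimal sketch of that remap-on-load idea (illustrative types, not Bevy's API): allocate a fresh id for every entity in the file, record old → new in a table, then rewrite persisted entity references through the table. Loading the same file twice then yields two independent instances.

    use std::collections::HashMap;

    // Illustrative persisted form: each entity carries the id it was saved
    // with, plus references to other saved entities' ids.
    struct SavedEntity {
        old_id: u64,
        refs: Vec<u64>,
    }

    fn load(saved: &[SavedEntity], next_id: &mut u64) -> Vec<(u64, Vec<u64>)> {
        // Pass 1: give every saved entity a fresh runtime id.
        let mut remap: HashMap<u64, u64> = HashMap::new();
        for e in saved {
            let new_id = *next_id;
            *next_id += 1;
            remap.insert(e.old_id, new_id);
        }
        // Pass 2: rewrite references through the remap table.
        saved
            .iter()
            .map(|e| {
                let new_refs: Vec<u64> = e.refs.iter().map(|r| remap[r]).collect();
                (remap[&e.old_id], new_refs)
            })
            .collect()
    }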

Note that entity IDs that are persisted have to be unique across all runs for your entire team, for the duration of your development cycle.

Only if you don't remap on load, which you should do anyway because you can load things multiple times (instance a house and all its stuff into multiple locations for example).

If someone has proof that random UUIDs have a real collision risk (aka: a program somewhere is likely to hit a collision over the lifetime of the bevy project ... let's be optimistic and say 1000 years 😄) then please let me know.

Well, with u32 I couldn't get a successful benchmark to run at all due to collisions. With u128, statistically you'd have a 50% chance of collision once 2^64 entities have been allocated, which, given enough entities, players, and days, is bound to happen eventually, and it's a bad user experience for a program to suddenly crash with a lookup error. This may be a small chance of happening, but it is still possible in a reasonable time frame (and that chance can be made so, so much worse in certain situations, especially with poor-quality / fast PRNGs or certain game setups), and just waiting for a ticking time bomb in the code makes one very wary. The engine should be as stable as possible, and random IDs are not as stable as possible.

What is the performance difference with using u64 over u32? This would be a simple solution to increase the collision space.

With just u64 you have a 50% chance of collision after only about 4 billion entities are created, which even a modest set of players is bound to reach over a few days. You could make a loop run for a few minutes and likely get a collision.

As I understand it, hashing is the main operation applied to IDs. So, if we use ahash, there's not much cost there.

There's no need to hash incremental, reused IDs, so there's no hashing cost at all.

The u32 collision issue is real enough that I'll fix it today if no one else will, probably just by bumping to u64 or u128. We can continue to discuss the "best" solution.

Yeah, I hit it multiple times today with my tests...

@zicklag
Member

zicklag commented Aug 12, 2020

This may be a small chance of happening, but it is still possible in a reasonable time frame (and that chance can be made so, so much worse in certain situations, especially with poor-quality / fast PRNGs or certain game setups), and just waiting for a ticking time bomb in the code makes one very wary.

This is definitely a concern from a usability perspective. If that is a problem then I think it outweighs any cognitive advantage of "create an id anywhere and it is unique" (though I agree with the beauty of that model, if it could ever be relied upon).

@cart
Member Author

cart commented Aug 12, 2020

In order to reach that collision likelihood, in a shared game world where each player generates 1,000,000 unique entities per day (which is high), we would need:

18,446,744,073,709,551,616 (2^64, the 50% collision point) /
7,800,000,000 (world population) /
1,000,000 (entities per day) /
365 days
≈
6.5 years of every person in the world playing the same shared game with no resets.

That's a lot of entities and a lot of time

@OvermindDL1

That's still only the 50% point; even a small fraction of a single percent will still cause it to happen occasionally, given the number of people using it in a variety of ways over time.

@aclysma
Contributor

aclysma commented Aug 12, 2020

128-bit UUIDs are the gold standard for "no one will ever get the same one, ever". They are relied upon in distributed databases for far more important things than games. According to Wikipedia, "the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion" (https://en.wikipedia.org/wiki/Universally_unique_identifier). You are about 1000 times more likely than that to be struck by lightning this year (https://www.weather.gov/safety/lightning-odds).

I also think micro-benchmarking the cost of generating them ignores the fact that whatever triggers creating the UUID in the first place will almost certainly dominate the cost of generating it. Even if spawning an entity were nothing more than a 128-byte memcpy, a profiler would show that before it showed UUID generation. Games don't spawn/despawn thousands of entities per frame during gameplay anyway.

you can load entities multiple times (great for templates) so you have to change their ID's anyway.

I think this is a really good point

If I were designing an engine I would do:

  • 128-bit UUIDs for anything saved to disk
  • 64-bit counter for in-memory IDs
  • 16-bit IDs for syncing dynamic entities in a real-time simulation over a network

I think 128-bit for in-memory IDs can work, but between @OvermindDL1's comment RE: spawning multiple copies of the same entity in different places in the world, and that it's likely there will be more than one ID space anyways, I would accept having a UUID/in-memory ID mapping table and enjoy the smaller IDs.

@fopsdev

fopsdev commented Aug 12, 2020

Hmm... not sure if it's of help, but what about defining a namespace for each project and an atomic 32-bit increasing number?
OK, thinking more on this subject, the problem will be with distributed systems, I guess.
Then why not use a 64-bit number with ranges for each player?
Let's say we have 2^64 / 2^40 (entities per player) = 2^24 (possible players).
(Just throwing in some random ideas; I'm still learning all of this 🐥 😄)
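
A minimal sketch of that range-partitioning idea (a hypothetical layout, not anything Bevy implements): pack a 24-bit player id and a 40-bit per-player counter into a single u64, so allocation needs no coordination between players.

    use std::sync::atomic::{AtomicU64, Ordering};

    // Hypothetical layout: high 24 bits = player id, low 40 bits = counter.
    const COUNTER_BITS: u32 = 40;
    const COUNTER_MASK: u64 = (1u64 << COUNTER_BITS) - 1;

    struct PlayerIdAllocator {
        player: u64, // must be < 2^24
        next: AtomicU64,
    }

    impl PlayerIdAllocator {
        fn alloc(&self) -> u64 {
            let n = self.next.fetch_add(1, Ordering::Relaxed);
            assert!(n <= COUNTER_MASK, "player exhausted their 2^40 id range");
            (self.player << COUNTER_BITS) | n
        }
    }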

@zicklag
Member

zicklag commented Aug 12, 2020

Games don't spawn/despawn thousands of entities per frame during gameplay anyway.

What about particle effects, though? If each particle has to have a transform, don't those need their own entities? I'm not sure if there are different strategies for handling particles or not.

@aclysma
Contributor

aclysma commented Aug 12, 2020

I wouldn’t put particles in an ECS. It would be common to leave that data in a GPU buffer, since it generally doesn’t affect gameplay. It’s also an example of how, if you are doing an extreme amount of something (as in sand games), you probably want a non-general, highly optimized solution rather than a general-purpose one.

@karroffel karroffel added the A-ECS Entities, components, systems, and events label Aug 12, 2020
@OvermindDL1

Spawning templates of things is common, like say a projectile for an FPS, units in an RTS, etc...

In addition, I hit another issue with using random entity IDs: some of my map-generation iteration time is dominated by hashmap lookups to fetch an entity's draw information so I can render visible entities and map information. If IDs were packed near 0, as in my code above, that cost vanishes since it's just an array index. Hashmap lookups in tight loops are a significant performance hit.

@RLHerbert

Forgive my newness to the topic, but would a family generator style a la https://skypjack.github.io/2019-02-14-ecs-baf-part-1#unique-identifiers make sense within the Bevy context?

@zicklag
Member

zicklag commented Aug 16, 2020

@RLHerbert no problem. :)

Someone correct me if I am wrong, but the identifiers referred to in that post are component identifiers, not entity identifiers as discussed in this thread. Component identifiers are actually being discussed in #32.

The technique discussed in that post also won't work very well for component identifiers in our case, because those IDs are only unique within a single run of the program, and we need components to be as unique as possible even across different dynamically loaded plugins, etc.

@cart
Member Author

cart commented Aug 31, 2020

Small update here: moving to u128 (#117) does regress world creation performance enough that I'm now more willing to consider alternatives (see that PR for numbers). I still consider that performance "totally fine" for basically any context, but for some people being as fast as possible is important, and I want to take that into account.

Ideally, if we move back to incrementing indices, we do that in a way that does not affect the UX of:

  • serialization
  • save games
  • networking

We still need stable ids in those contexts and they should be "free" if possible. I haven't yet made up my mind about this and I'd love everyone else's thoughts. Short term, I'll be merging the PR to resolve the collision issues.

@CleanCut
Member

I'm sure you meant to link to this pull request: #393 😊

@jpetkau

jpetkau commented Sep 8, 2020

In favor of 128-bit guids

Several posts here have talked about the possibility of u128 collisions (you only need to create ~2^64 items to get a 50% chance of collision). It's actually not nearly that bad, because it doesn't matter how many items you create; it matters how many items you store in a common pool where collisions will be a problem.

(I.e. if you're creating 10000 new IDs per second for some particle system, they're probably not very long-lived, so the fact that one of those IDs gets re-used a year later in the same game doesn't actually hurt anything).

Against 128-bit guids (or any other random scheme)

Determinism is good! Any pure random-number based scheme inherently breaks determinism, which is very bad (for reproducibility, debugging, deterministic multiplayer, lots of stuff.)

You can get determinism back by seeding the RNG if you need it, but then you can't use a thread-local RNG. (Which is probably fine actually, a mutex around entity creation isn't a big deal since you need to insert it into global data structures anyway.)

@cart
Member Author

cart commented Sep 9, 2020

@jpetkau I agree with your points, but I would argue that determinism isn't always required or desirable.

Ex: The amount of time it takes to render a frame isn't deterministic. Requiring it to be deterministic is basically impossible and doesn't buy us much. Game code shouldn't rely on total_time_to_render being the same every frame. I'd argue that this is also true for entity ids. They are "opaque" and game code shouldn't rely on them being the same every time. Doing so creates more problems than it solves.

In this case randomizing encourages good behavior, because it makes code that relies on entity id stability fail immediately. Go employs the same approach to hashes / iteration order of hashmaps (or at least, they used to). They explicitly make iteration order nondeterministic to prevent people from relying on it.

@kabergstrom

kabergstrom commented Sep 9, 2020

legion 0.3 released recently with what's IMO the most solid serialization support of any Rust ECS today: https://docs.rs/legion/0.3.0/legion/serialize/index.html

It has many months of effort behind it, with aclysma and me working to make prefabs work with it, and Tom improving the ergonomics of the APIs.

@cart
Member Author

cart commented Sep 9, 2020

@kabergstrom yeah i totally agree that legion's new serialization is nice. We already do something similar on the registry side (although we automatically generate friendly names), but I am now heavily considering an entity id approach similar to legions.

This would likely consist of the following changes:

  • Revert to the original hecs "entity id as index" approach.
  • Add an internal mapping from "runtime" entity ids to "stable" entity ids
  • Update serialization to use the mapping instead of direct entity ids
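
A minimal sketch of how that mapping might look (illustrative types, not Bevy's actual implementation): runtime ids stay small indices, and a stable 128-bit id is minted lazily, only when an entity actually crosses a serialization boundary.

    use std::collections::HashMap;

    // Runtime id: a plain index, cheap to use as an array slot.
    #[derive(Clone, Copy, PartialEq, Eq, Hash)]
    struct Entity(u32);

    // Stable id: only minted for entities that get serialized.
    #[derive(Clone, Copy, PartialEq, Eq, Hash)]
    struct StableId(u128);

    #[derive(Default)]
    struct StableIdMap {
        to_stable: HashMap<Entity, StableId>,
        from_stable: HashMap<StableId, Entity>,
    }

    impl StableIdMap {
        fn stable_id(&mut self, e: Entity) -> StableId {
            if let Some(&id) = self.to_stable.get(&e) {
                return id;
            }
            let id = StableId(rand::random::<u128>()); // assumes the rand crate
            self.to_stable.insert(e, id);
            self.from_stable.insert(id, e);
            id
        }

        fn entity(&self, id: StableId) -> Option<Entity> {
            self.from_stable.get(&id).copied()
        }
    }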

I'll probably tackle this soon so we can move on to other things.

@kabergstrom

Also have a look at clone_from and its related Merger trait: https://docs.rs/legion/0.3.0/legion/struct.World.html#method.clone_from

It allows rewriting entity IDs when cloning, including all references, by hooking Entity's Clone impl to rewrite references as they are cloned. It's what allows prefabs to be cloned while maintaining coherent internal references that point to the newly spawned entities.

@jpetkau

jpetkau commented Sep 29, 2020

[Sorry, this turned into a ramble that may not be super relevant to the issue any more.]

@jpetkau I agree with your points, but I would argue that determinism isn't always required or desirable.
Ex: The amount of time it takes to render a frame isn't deterministic...

Yes, "always required" is a much stronger point than I intended to make. I would say it's always valuable, though, just sometimes the value isn't worth the effort to get there.

Basically there are two use-cases I had in mind:

  1. Deterministic simulation, e.g. for multiplayer strategy games. This is certainly a niche application, but it seems like other aspects of Bevy are already making efforts to support this. (And nphysics and rapier physics will support determinism, which is awesome!)

Here you have a large chunk of the game logic (but not 100%, e.g. rendering as you noted) which is meant to be exactly deterministic. For a complex game, this is doable, but really hard. (I've shipped this kind of game a couple of times.) You absolutely need deterministic pieces to build out of. And you need to do things like compute a hash over all the deterministic state, so you can immediately detect bugs and narrow down their causes.

It already takes a lot of care and discipline to make this work. Having components actively work against you, by making entity IDs or iteration order nondeterministic, makes it pretty much impossible.

  2. Determinism for reproducibility in general.

This is more of a general comment on the value of being able to reproduce an exact run of some code.

  • Nondeterminism in integration tests = flaky tests = people not trusting test failures = people ignoring test failures = you don't have integration tests.

  • In general, a deterministic function is a function (in the mathematical sense). A nondeterministic function isn't. That means if you have the inputs, you can recompute the outputs. Useful for debugging, memoization, offline analysis, etc.

As you point out, nondeterminism as in hash tables (for fuzzing or to avoid certain kinds of DOS attacks) is also valuable. But I think it's even better on top of a deterministic seed. E.g. tests may be nondeterministic as long as they're passing, but as soon as they find a failure, they record the seed so that failure can then be reproduced 100%. But doing that still requires the ability to run deterministically when desired.
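
A minimal sketch of that record-the-seed pattern (assumes the rand crate; names are illustrative): the test picks a fresh random seed per run for coverage, but prints it on failure so the exact run can be replayed.

    use rand::rngs::StdRng;
    use rand::{Rng, SeedableRng};

    // Stand-in for some property being tested against random inputs.
    fn run_simulation(rng: &mut StdRng) -> bool {
        rng.gen_range(0..1000) != 0
    }

    #[test]
    fn fuzz_with_replayable_seed() {
        // Replay a recorded failure via TEST_SEED; otherwise pick a fresh seed.
        let seed: u64 = std::env::var("TEST_SEED")
            .ok()
            .and_then(|s| s.parse().ok())
            .unwrap_or_else(rand::random);
        let mut rng = StdRng::seed_from_u64(seed);
        assert!(run_simulation(&mut rng), "failed; replay with TEST_SEED={}", seed);
    }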

Also re hash tables--having non-guaranteed iteration order is very often a PITA. There's a reason for the gradual move to "iteration is in order of insertion" hash tables--they're just more useful. Here's an example from some past work:

  • I had some code that loaded some json, made some change, and saved it out again.
  • The json library wrote fields in nondeterministic order, because the json spec says that doesn't matter, and the library author was insistent that no code should ever rely on it.

The result: code that should have been trivial to test ("input A, B, C should be written out unchanged; input D, E, F should be written out like this") was hard to test because the output order was shuffled. If you used the tool on a file, you couldn't just look at a diff to see the results, because the file was shuffled! Logically it was the same, but practically it was useless.

Even in the case of rendering (where e.g. frame times are nondeterministic), there's value in making as much of a game as possible not dependent on that, and in setting up the code so that if you were to record and replay the frame times (and player input and other nondeterministic parts), you could reconstruct the rest of the run exactly.

Of course it's not worth the effort if the libraries you're building on are fighting you every step of the way, which is why I argue for more determinism in libraries.

@cart
Member Author

cart commented Sep 29, 2020

@jpetkau yup, those are all good points! Fortunately, in this case we now use atomically incrementing indices, so we are "more deterministic" here. As soon as we introduce parallelism, that goes out the window though.
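
For reference, a minimal sketch of an atomically incrementing allocator like the one described (illustrative, not Bevy's exact code):

    use std::sync::atomic::{AtomicU32, Ordering};

    // Ids are unique within a run; fetch_add makes allocation thread-safe,
    // though allocation order across threads is not deterministic.
    static NEXT_ENTITY_ID: AtomicU32 = AtomicU32::new(0);

    fn allocate_entity_id() -> u32 {
        NEXT_ENTITY_ID.fetch_add(1, Ordering::Relaxed)
    }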

I'm closing this issue as it is now "fully resolved". But feel free to continue discussing related topics here if you all want to 😄
