Implement thread-safe ECS #10
You may be interested in my (very early stages, very experimental) similar project here: I've done quite a bit of thinking about parallelizing processors (as I call them) over the past few months, and I've arrived at the following conclusions: Processors will often have a hierarchy of priority, where certain processors must be run before others. From this I came up with the following design:
I am still in the middle of implementing the parallel iteration, as well as the MPSC queue, but the intent should be clear from the following example:

// The particle component this uses.
struct Particle {
pos: Position,
effect: Effect,
frames_left: u32,
}
impl Component for Particle {}
struct ParticleProcessor {
graphics: Graphics,
}
impl Processor for ParticleProcessor {
fn process<'a, 'b, C: Components + 'a>(&'a mut self, world: WorldHandle<'a, 'b, C>) {
enum Action {
// draw the effect and decrement the frame count
Decrement(Entity),
// destroy the entity.
Destroy(Entity)
}
// read phase is here -- no writes will occur while this is processing.
let actions = world.all_with::<Particle>().for_each_async(|e| {
// entities here are guaranteed to have the particle component.
let part = world.get_component::<Particle>(e).unwrap();
if part.frames_left != 0 {
Action::Decrement(e)
} else {
Action::Destroy(e)
}
});
// write phase is here. get exclusive access to the world.
let mut write = world.write();
for action in actions {
match action {
Action::Decrement(e) => {
let mut part = write.get_mut_component::<Particle>(e).unwrap();
part.frames_left -= 1;
draw_particle(&mut self.graphics, part.pos, part.effect);
}
Action::Destroy(e) => {
write.destroy_entity(e);
}
}
}
}
}
From this example we can see that there are a few different degrees of parallelism to processors. Although the example here is somewhat contrived, the main idea is that as much work as possible should be done in the parallel read phase, with only the minimum done under exclusive access:

let mut my_world = ...;
my_world.process(|ctxt| {
// all processors executed in this group will execute purely asynchronously, except for
// contention during their write phases.
ctxt.process_group(|group| {
group.process(&mut A);
group.process(&mut B);
group.process(&mut C);
});
// execution returns here only once all grouped processors have completely finished,
// guaranteeing an order of events.
// these processors may use some parallelism internally,
// but are not able to be sent to other threads to execute.
ctxt.process_sync(&mut D);
ctxt.process_sync(&mut E);
// E is only run once D is fully complete.
});
All this being said, it seems like your initial reaction has similar elements to mine, but it focuses on a priority of events rather than a priority of processors.
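For reference, the fork-join behaviour of process_group / process_sync described above can be sketched with std's scoped threads. Everything here is illustrative stand-in code (the names, numbers, and run_group function are made up), not the proposed API:

```rust
use std::thread;

// Three "processors" run fork-join style, as in process_group above;
// each writes into its own slot, so no synchronization is needed.
fn run_group() -> Vec<i32> {
    let mut outputs = vec![0; 3];
    // fork: spawn one scoped thread per processor
    thread::scope(|s| {
        for (i, out) in outputs.iter_mut().enumerate() {
            s.spawn(move || {
                *out = (i as i32 + 1) * 10; // processor-specific work
            });
        }
    }); // join: execution continues only once every processor has finished
    outputs
}

fn main() {
    let results = run_group();
    // a process_sync step after the group sees all results, in order
    assert_eq!(results, vec![10, 20, 30]);
    println!("{:?}", results);
}
```

The scope guarantees the "execution returns here only once all grouped processors have completely finished" property from the comment above.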
Great information! Seems like our approaches are indeed pretty similar, for the most part. For further comparison, here is an academic paper called A Concurrent Component-Based Entity Architecture for Game Development (2015). It arrives at a similar conclusion to ours:
If possible, I prefer a lockless approach to parallelism, taking advantage of the Rust ownership system to catch data races at compile time, rather than at runtime with a mutex or similar locking data structure. However, I am open to using locks if concrete benchmarks prove better performance.
@ebkalderon Thanks for the paper. I haven't seen it yet, but it seems promising. I am wary of the idea of using N "past" states and a "future" state -- this necessarily means copying the game state N times per frame to different points in memory. Of course, there are situations like networking where copying game state, or part of it, is necessary.

One issue I have with the paper is the scheduling aspect of processors. The author chooses to assign each processor purely to one thread, using a task-parallel model rather than a data-parallel one. A large portion of the scheduling section is concessions that processor time is unpredictable, and that there is likely to be a decent amount of wasted CPU time due to that unpredictability. I favor a work-stealing approach, where parallelizable processors simply decide which data they need and how to process it, and all the threads work as fast as they can in order to do that. There is no excess time spent on scheduling or waiting for overloaded threads to finish.

I am also not completely convinced that purely lock-free is even viable here: using your model, at some point the threads must contend for access to the MPSC queue which produces component updates. It might be possible to implement such a queue in a lock-free way, but consider that the costs of locking an unlocked mutex are vastly overestimated (often not much more than an atomic compare-and-swap).

The key to any multithreaded ECS is determinism: the same state S0 passed through the same processors should always lead to an identical output state S1. My initial approach was similar to yours, where every processor produced some amount of changes to the state, to be applied later. I toyed with a concept of "commutative" and "non-commutative" changes which could be made to component data -- these are strangely similar to Rust's borrowing system:
To break these rules would lead to a runtime error. The implementation of this design led to a few glaring issues:
One data model I toyed with for complete concurrency was giving every component type its own data buffer, rather than storing all an entity's component data together in a block. If you protect each of these buffers with a mutex of its own, processors which do not access the same component data can interleave with zero contention. The caveat is that the entity manager is a resource which many processors will contend for access to, so it became the bottleneck.

I am curious what your mental model for the types of updates is currently. How does the user specify which component types the world manages and what order updates should be applied in? What type do updates have? Where are they stored? How are they iterated over to be applied?
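A minimal sketch of that per-component-type buffer idea, with one mutex per buffer so that processors touching disjoint component types never contend. All names and component types here are made up for illustration:

```rust
use std::sync::Mutex;
use std::thread;

// Each component type gets its own buffer behind its own lock.
struct World {
    positions: Mutex<Vec<(f32, f32)>>, // one buffer per component type
    healths: Mutex<Vec<u32>>,
}

fn run(world: &World) {
    thread::scope(|s| {
        // physics-like processor: only locks `positions`
        s.spawn(|| {
            let mut pos = world.positions.lock().unwrap();
            for p in pos.iter_mut() {
                p.0 += 1.0;
            }
        });
        // damage-like processor: only locks `healths` -- zero contention
        s.spawn(|| {
            let mut hp = world.healths.lock().unwrap();
            for h in hp.iter_mut() {
                *h -= 1;
            }
        });
    });
}

fn main() {
    let world = World {
        positions: Mutex::new(vec![(0.0, 0.0); 2]),
        healths: Mutex::new(vec![100; 2]),
    };
    run(&world);
    assert_eq!(*world.healths.lock().unwrap(), vec![99, 99]);
    println!("ok");
}
```

The entity-manager bottleneck mentioned above is not modeled here; this only shows the zero-contention property for disjoint component access.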
Thanks for the response! Just to clarify, there is no copying of the world being done. There is only the existing world state and the future world state being computed by the processors. The reason why I contend that my proposal could be made lock-free is because it is task-parallel (AFAIK, a performance disadvantage) rather than data-parallel, with discrete parallel-read and serial-write stages, where the processors' writing priority is determined by the order in which they were passed into the Application through a builder pattern. That said, I appreciate you explaining your position. The data-parallel design you described seems sound and probably more efficient (concrete benchmarks would have to be made to compare both to be sure). I welcome more feedback on both approaches, or other alternatives, from other participants before the final decision is made.
Avoid trying to handle process scheduling. If any kind of scheduling is required by the final design, provide the tools to allow the user of the library to do it. They'll be the ones that are able to properly benchmark different schedules in order to get the best performance for their particular game.
@HeroesGrave nice to see you here!
Some thoughts from my attempts to parallelize a Flocking simulation:
Isn't there a way to implement systems which do not affect (write to) the same components?
@Oflor Honestly, that is going to be difficult. Most notably: the physics processor, the particle processor, and the animation processor all modify entities' transform data. One could break down the components into ever-more granular and specific pieces, but this introduces a host of unnecessary complexities (having multiple transform components, syncing duplicate data between them, etc). I personally prefer to establish and enforce strict order dependencies between processors, either similar to @mangecoeur's or @rphmeier's approaches, or according to my original proposal.
The user-facing API design of the ECS has been drafted on the ECS Design page on the wiki. Please leave your feedback here or on Gitter! Edit: Updated link.
Just had a look at the proposed usage. One thing I'm not keen on is that the proposed way of adding Processors doesn't make it sufficiently explicit that they will run in the order they are added, so that something like this:
.with(Input::new())
.with(Physics::new())
will not necessarily have the same result as this:
.with(Physics::new())
.with(Input::new())
Since there was some emphasis on making everything predictable and deterministic, I think it's important that this is really clear. Some alternative suggestions:
processors = vec!{Physics::new(), Input::new()}
world.add_processors(processors)
processors = processors_group!{Physics::new(), Input::new()}
world.add_processors(processors)
Also a question: how are Processors associated with the Components that they need to process?
Since we want to run our ECS multi-threaded, will this order be used for the sequential writes to the world, with reads always happening simultaneously? Or do we want to control which systems are run in parallel?
SimulationBuilder::new()
.append(Rendering::new())
.append_parallel(Physics::new(), Ui::new())
.finalize()
I think processors simply get a reference to the world and query the components they need to access. Proposal for the API of the world:

impl World {
/// Creates a new empty world.
pub fn new() -> World;
/// Creates a new entity in the world and returns a handle to it.
pub fn create_entity(&mut self) -> Entity;
/// Destroys a given entity and removes its components.
pub fn destroy_entity(&mut self, entity: Entity);
/// Attaches a component to an entity.
pub fn insert_component<T: Any>(&mut self, entity: Entity, comp: T);
/// Remove a component from an entity.
pub fn remove_component<T: Any>(&mut self, entity: Entity);
/// Get component by entity.
pub fn component<T: Any>(&self, entity: Entity) -> Option<&T>;
pub fn component_mut<T: Any>(&mut self, entity: Entity) -> Option<&mut T>;
/// Iterate over components by type.
pub fn components1<A: Any>(&self) -> impl Iterator<Item = (Entity, &A)>;
pub fn components2<A: Any, B: Any>(&self) -> impl Iterator<Item = (Entity, &A, &B)>;
pub fn components3<A: Any, B: Any, C: Any>(&self) -> impl Iterator<Item = (Entity, &A, &B, &C)>;
// some more...
pub fn components1_mut<A: Any>(&mut self) -> impl Iterator<Item = (Entity, &mut A)>;
pub fn components2_mut<A: Any, B: Any>(&mut self) -> impl Iterator<Item = (Entity, &mut A, &mut B)>;
pub fn components3_mut<A: Any, B: Any, C: Any>(&mut self) -> impl Iterator<Item = (Entity, &mut A, &mut B, &mut C)>;
// some more...
}
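To make the proposed surface concrete, here is a toy implementation of a small subset (a HashMap-of-HashMaps store behind TypeId keys). This is purely illustrative -- not a serious storage design, just enough to show how the API might be used:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

type Entity = u32;

// Minimal toy World: component type -> (entity -> boxed component).
struct World {
    next: Entity,
    data: HashMap<TypeId, HashMap<Entity, Box<dyn Any>>>,
}

impl World {
    /// Creates a new empty world.
    fn new() -> World {
        World { next: 0, data: HashMap::new() }
    }
    /// Creates a new entity in the world and returns a handle to it.
    fn create_entity(&mut self) -> Entity {
        let e = self.next;
        self.next += 1;
        e
    }
    /// Attaches a component to an entity.
    fn insert_component<T: Any>(&mut self, entity: Entity, comp: T) {
        self.data
            .entry(TypeId::of::<T>())
            .or_default()
            .insert(entity, Box::new(comp));
    }
    /// Get component by entity.
    fn component<T: Any>(&self, entity: Entity) -> Option<&T> {
        self.data
            .get(&TypeId::of::<T>())?
            .get(&entity)?
            .downcast_ref::<T>()
    }
}

struct Pos(f32);

fn main() {
    let mut world = World::new();
    let e = world.create_entity();
    world.insert_component(e, Pos(1.5));
    assert_eq!(world.component::<Pos>(e).unwrap().0, 1.5);
    println!("ok");
}
```

A real implementation would use contiguous per-type buffers rather than boxed components, but the user-facing calls would look the same.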
(Sorry if I missed something in this thread; I haven't studied it all the way through.)

ctxt.process_group(|group| {
group.process(&mut A);
group.process(&mut B);
group.process(&mut C);
});
and:
.append_parallel(Physics::new(), Ui::new())
Both of these approaches enforce a fork-join model at the API level, which imposes a tight upper bound on the performance you can squeeze out of a parallel system, since it chooses a rather tiny subset of all possible execution graphs. Optimally, each system would just specify the dependencies it needs in order to start, and then some sort of scheduler would figure out when to start it. We did some experiments with @csherratt on fibe-rs; you could use its example for inspiration:
let ha = task(move |_| {print!("Hello, ")}).start(&mut front);
task(move |_| {println!("World!")}).after(ha.signal()).start(&mut front);
If you don't explicitly expose which components are being read, and which are modified per system, you don't provide the scheduler an opportunity to figure out the proper timeline for their execution. One could specify the systems directly as dependencies, or build the dependency graph judging by which components are getting modified and read. The latter would work well if you could actually enforce the fact that these and only these components are what each system receives for processing.
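A rough sketch of deriving dependency edges from declared read/write sets, as suggested above. The component types are plain strings here purely for illustration (a real implementation would use TypeIds or compile-time type lists), and the system names are made up:

```rust
use std::collections::HashSet;

// A system declares which component types it reads and which it writes.
struct SystemDecl {
    name: &'static str,
    reads: HashSet<&'static str>,
    writes: HashSet<&'static str>,
}

// Two systems must be ordered if one writes a component type that the
// other reads or writes; otherwise they may run in parallel.
fn conflicts(a: &SystemDecl, b: &SystemDecl) -> bool {
    a.writes.iter().any(|c| b.reads.contains(c) || b.writes.contains(c))
        || b.writes.iter().any(|c| a.reads.contains(c))
}

fn main() {
    let physics = SystemDecl {
        name: "physics",
        reads: ["velocity"].into_iter().collect(),
        writes: ["position"].into_iter().collect(),
    };
    let render = SystemDecl {
        name: "render",
        reads: ["position"].into_iter().collect(),
        writes: HashSet::new(),
    };
    let audio = SystemDecl {
        name: "audio",
        reads: ["listener"].into_iter().collect(),
        writes: HashSet::new(),
    };
    // physics writes `position`, render reads it: must be ordered
    assert!(conflicts(&physics, &render));
    // audio shares nothing with physics: free to run in parallel
    assert!(!conflicts(&physics, &audio));
    println!("{} / {} / {}", physics.name, render.name, audio.name);
}
```

A scheduler would build a DAG from these edges (breaking ties by declaration order) and topologically sort it to get the execution timeline.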
Parallelizing over systems might be overkill if we already parallelize over entities. There are going to be more entities than systems (if not, we should be able to run it in a single thread anyway). Specifying accessed components requires dealing with trait objects and TypeIds (I think), and that is already less efficient than compile-time system managers.
Wouldn't this require that Components should be defined by implementing a trait? That would make it difficult to create game objects by loading their definitions from declarative data files (we would be forced to define all entities in Rust source code).
I think this is important - even if it does mean messing with TypeIds, it opens up a lot of doors for speeding up execution, which should offset the extra complexity. Another thought - it may be possible to perform these checks at compile time using compiler plugins. Declarative game object definitions could be read in at compile time (instead of at runtime) to allow these checks to be made. It may even be possible to build re-usable machinery that can either perform the checks at runtime or at compile time, so that while you prototype you can edit entities dynamically without waiting for a re-compile every time, but when you move to release you could bake everything in. Just an idea...
I haven't considered that. At first sight, parallel entity processing would involve a lot of synchronization overhead. Like, you'd need some kind of dynamic sync before accessing each entity, as opposed to just starting a system. Another major problem would be cache contention - if components are close to each other in memory, and one of them gets modified by one thread, access to all other components in that cache line will be delayed significantly for other threads... In other words, I'm not convinced it's worth it. Parallelizing systems just seems like an easier task, and it gives rather good performance - that's what Bungie and Naughty Dog are doing, IIRC.
I'm pretty sure it should be possible to specify at compile time too. @mangecoeur proposed one way, via compiler plugins, but I'd also like to explore non-plugin approaches.
// Create the entity eid with the components Foo{}/Bar{}
let eid = ENTITY.with(Foo{}).with(Bar{}).create(&mut world);
// Update the component Baz
eid.with(Baz{}).update(&mut world);
The advantage of a builder API like I have shown above is that it does not lock the world until the final write call:
// do some update
fn update(eid: Entity) -> EntityWith<(Foo,)> { eid.with(Foo{}) }
// in parallel update all entities and write the results into `world`
// not sure if this would actually work this elegantly (haven't got to trying it yet)
entities.parallel_map(update).write(&mut world);
Accessing the data could be done with queries on the world object.
// select every entity from world that has a Foo attached to it
for (eid, foo) in world.select::<(Foo,)>() {
}
// select every entity from world where Foo & Bar are attached to it
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>() {
}
// select every entity from world where Foo & Bar are attached and
// it is a member of the entity set
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>().only(&entity_set) {
}
The biggest problem with these queries (or any iterator for that matter) is that you can't actually modify the world while iterating over it. The solution I have always reverted to is making it cheap to clone the world. Alternatively, systems could work in parallel with past copies, and write their updates to a changelist. This can be annoying, as you can't see the writes you have made until they are committed to the world.
@csherratt We were initially going to go with the change list model, but eventually decided against it because of suspected latency. However, we haven't done any concrete profiling to confirm this. Have you tested this and/or the locking method before? If so, what were your conclusions?
@csherratt's approach sounds really similar to the basic "queries" I've been imagining while experimenting. I can try and work a little on the implementation, to see how feasible they really are. If user-specified storage of component data is allowed (e.g. positional data structures), they can also compose with custom filters associated with that storage to get the entities desired with maximum efficiency. The main issue with cloning the world is that it can get extremely expensive when you've got a lot of entities, and a lot of data to go with them. Even if it were as simple as a raw memcpy, the cost would still scale with the amount of data.

@kvark I do think you're right that having an implicitly fork-join execution model limits the space of execution graphs drastically. However, it also works simply to minimize contention over components, and the subset of execution graphs which it does allow are generally efficient. The alternate approach is what you said above: have a signal-based approach where processors wait to be "kickstarted" by their dependencies. This does mean that each processor needs to be aware of not only its dependencies, but also its successors. Can we easily encode this information without much boilerplate?
You could define a system based on its inputs and what it modifies. The scheduler knowing this is useful, as @kvark has said. So, where does this lead? If a system is defined by what its input is, the system is pending on those inputs being stable before it can process. If a system modifies one or more components, it is both waiting on those components to be stable and producing a new version of them. The ordering follows the order in software in the case of a conflict.

let mut world = World{};
loop {
// run is a magical function that in practice would probably take more boilerplate than this.
// I think it is possible to write, but have not attempted to yet.
world = run(world /* world is moved here */, system_a);
world = run(world, system_b);
world = run(world, system_c);
world = run(world, system_d);
}
@rphmeier about the copying, I said it would be copy-on-write, meaning that when you do a write, only the data you actually touch is copied. I'm not a fan of locking. It's not because the locking itself is expensive. There are two problems:
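For illustration, the copy-on-write idea -- cloning the world cheaply, and copying only the chunk that is actually written -- can be sketched with Arc::make_mut. The chunked layout here is a made-up stand-in, not the actual proposal:

```rust
use std::sync::Arc;

type Chunk = Arc<Vec<i32>>;

// Cloning the "world" only bumps refcounts; a write copies just the
// chunk it touches (Arc::make_mut clones on shared ownership).
fn next_state(old: &(Chunk, Chunk)) -> (Chunk, Chunk) {
    let mut new = old.clone(); // cheap: no component data copied yet
    Arc::make_mut(&mut new.0)[0] = 99; // copies only chunk 0
    new
}

fn main() {
    let old = (Arc::new(vec![1, 2, 3]), Arc::new(vec![4, 5, 6]));
    let new = next_state(&old);
    assert_eq!(old.0[0], 1); // previous state untouched
    assert_eq!(new.0[0], 99); // new state sees the write
    assert!(Arc::ptr_eq(&old.1, &new.1)); // untouched chunk still shared
    println!("ok");
}
```

Systems reading the old state keep valid references for the whole cycle, while writers build the new state chunk by chunk.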
@csherratt Are you thinking about the futures approach? I also realized that the scheduling problem can be very much simplified. Sorry if this is obvious for someone, but I feel it is important to mention explicitly - we don't have to implement a full-featured scheduler for the ECS systems. Instead, we follow the order in which the user specifies the systems and the components they modify. Each system either waits or executes in parallel, depending on whether or not the input components are locked for writing, and the output components are still being read. This is very simple for the user: they don't have to think about it, but they can still fine-tune the execution order according to their specific requirements that the scheduler wouldn't know about anyway ;)

Going back to @csherratt's proposal: given that the scheduling problem is addressed by relaxing the order, the copy-on-write data model is the main remaining question.
Yes, I was thinking of something like that. @kvark there is no problem with passing in the world. The one tricky area I foresee is the enumeration of entities that are being created. It might be good enough to just use a mutex-protected object. We could do the same thing as the futures, of course.
@kvark Actors are about getting as parallel and distributed as possible. The principles for achieving maximum parallelisation are foundational to Erlang and OTP: "Any sufficiently complicated concurrent program in another language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang." (Robert Virding) Basically, following those principles is a way to get code to naturally run on as many CPUs as you have available. Some of them are already compatible with Rust's principles, like immutability by default.
@doppioslash so if we try to categorize the actors approach into the ECS wiki, will it go as "message passing" solution for the component updates? Or somewhere else? |
@kvark I think "message passing" is correct. |
@kvark @doppioslash Correct, ECS and Actors are two different paradigms; however, they can be used in tandem - they are not incompatible. Specifically, what I would do is have an Actor system for all Entities, but have the Entities themselves manage their own components. By keeping the components internal, you can have the actors process things themselves, fully async, and you can pack the components with the entity definition in memory as well. This is what Erlang does for an Actor, and it allows the Erlang garbage collector to be blazing fast: it either collects within a single Actor (very rare, as an Actor holds everything on its 'stack', so as functions are popped or TCO is used, things simply become unreachable even though they still exist; if the GC runs it is generally only because the Actor grew a very deep stack it is no longer using, so some of the tail memory is freed - you can force this via the hibernate command to fully compact the Actor), or the GC 'runs' when an Actor dies by just freeing the entire Actor memory (technically holding on to it to allocate another Actor within).

The Erlang VM (BEAM) is blazing fast for Actors due to its design (which I know quite well down to the JIT/HiPE level, so I can describe anything if you need), and I've yet to see another language implement it as fast or as safe. However, the BEAM is fully dynamic, i.e. not typed, which harms its efficiency (about on par with Lua for math - not great, but not bad, just not 'C' speed - although its interaction points of PORTS, NIFs, and others make up for that where needed). Still, the overall programming design is a great thing to model, and based on what I have been learning about Rust, I think an Actor model could actually be implemented in it properly, unlike in Java and C++ where the holes utterly cause it to fail in the corner cases.
'Within' the Actor you would allocate the component data on its 'stack'; it could have a cache on that same Actor 'stack' of which systems need to be run within it. Sadly this means that the function calls are inverted if the Actor itself runs them, however you could invert it internally so that the systems run on the internal Actor data. If the Actors remain tightly packed (say an on-creation component allocation internally with externally linked dynamic components, or rebuilt on the fly, whichever makes more sense) then you could still get most of the cache efficiency as well, though honestly I would say do not even bother with this pre-optimization; the point of Actors is to run comparatively rarely, only responding to messages.

Basically, say you have an Actor: it does not know where it is, it does not know its position. The Actor model is modeled after real life; if you are floating in space you have no clue where you are, not even on Earth - you have to use reference points, like the light reflected from the planetary surface of Earth, to see your location. Same with Actors: they do not hold their position themselves, they would query (send a message to) the 'World' that they are in (an octree perhaps, or however it wants to partition them internally; maybe it queries the physics engine for where it itself is). They only hold state data that is internal to them. Their Position is not internal data, just like it is not for you; it is all relative, and you need to figure out your location based on prior knowledge, looking at your surroundings, etc.

The nice thing about this design is, say you have a game server: your Actors could, for example, be running across a whole bank of servers.
You do have to design the game a little differently: you have no such thing as lockstep (which I personally think is an anti-design anyway), and messages take non-trivial time (usually microseconds, practically immeasurable, but the message will be received when the receiver is next scheduled - might be immediate, might be a second away if it is on a busy server across the world). It is very easy to program for once you learn it, and it has great advantages. So for this, 'message passing' is probably the most accurate category, but still not fully accurate, as the internal ECS system could still be any of the others, running 'internally to' an Actor, while the Actors communicate via messages. I just got home after a long day, so I hope the above is understandable; I think I need a nap. ^.^
I do want to make clear, imagine an Actor being implemented underneath via something kind of like this (in pseudo-C++):

struct ActorData {
ActorPid id;
AtomicQueue messageStack;
GrowableStack stack;
};
Erlang has some (a lot of) extra fields for optimization purposes, but that is the basics of it. The ActorPid would be, say, a struct of two 64-bit integers: the 'node' id and the 'self' id, the node being the ID of the machine it is running on and the self id being the actual ID of the actor. In Erlang they work kind of like the Handles I mentioned previously; the index IDs are reused, but there is a use counter so a stale handle cannot re-access a slot. If you send a message to an Actor pid, it appends the message onto the end of the AtomicQueue and is done with it. Every Erlang function takes an internal hidden pointer to its position in its stack, which is just untyped memory; the variables in Erlang are untyped, but the memory itself is typed.

So the scheduler is running (one per CPU core) and it sees that the Actor that received the message has a non-empty message queue, so it then calls the frozen function. If an Actor terminates uncleanly (crashes, divides by zero, whatever), the terminate will not be called, but the Actor will be outright killed (all others keep going) and the Supervisor it was created under will see the reason why it died and usually recreate it if you have it do so. If too many die too fast, the Supervisor will (if your settings tell it to) kill itself too, also killing all Actors under its supervision, so that 'its' Supervisor can recreate it, and it can recreate the others with fresh, good state from the DB or whatever. That tree of Supervisors going down to basic GenServers (which have other things built on them like GenEvents, GenFSMs, etc...) defines the Supervision Tree of the entire Application (of which there can be multiple running Applications within an entire project).
When a Supervisor creates an Actor it can create it locally on the same Node (by default), on another Node, ephemerally, or in a few other ways, but it knows about them, watches them, and sets up Monitors with the Schedulers so it knows when they die (or when an entire Node dies, as with hardware failure), etc... Basically, a simple Elixir Actor without OTP would be:

def getOtherThing do
receive do
{:wait, anything} -> # wait for anything and return it!
anything
after 5000 -> # Nothing received after 5 seconds, how sad... I could leave this out to wait forever...
:nothing
end
end
def myLoop(num) do
receive do
{:get, toPid} -> # Someone requested what my number is, lets tell them!
toPid ! {:theNum, num} # Sending them a message of a tuple with the atom :theNum and my number
myLoop(num) # Tail call, no stack growth
{:set, newNum} -> # Someone wants to set my number, sure!
myLoop(newNum) # Tail call, no stack growth
{:set_get_old, toPid, newNum} -> # Someone wants to set my number and get the old one sent back to them, sure!
toPid ! {:theNum, num}
myLoop(newNum) # Tail call, no stack growth
:wait_for_another_thing -> # Ooo, someone says they will send me something else, then ignore it
getOtherThing() # This has an embedded receive that will return whatever the next message receive of a :wait type
myLoop(num) # Tail call, no stack growth
:stop -> # Aww, someone wants me to die... Oh well... :-(
# just return, do nothing... you have to return 'something' in Elixir, so I'll just do the atom :stop
:stop # Tail, exiting, this Actor will die
after 1000 ->
IO.puts "Nothing happened for a second, I'm bored..." # The IO system can even redirect output if you want to another Node, say you connected a shell into the system to introspect it
myLoop(num) # Tail Call, no stack growth
end
end
You could use it like:

boop = spawn(fn -> myLoop(5) end) # Spawn a new Actor with the initial arg of 5, returns the PID
boop ! {:get, self()} # Sends a message to boop, boop should respond fairly immediately, can dump all pending messages on self() in the shell by doing:
flush() # returns the number that boop sent back to us
boop ! {:set, 10} # Have boop use the number 10 now!
boop ! :erghblergh # boop will now have this message stuck in its message inbox since there is never a receive that handles any message, should have added a case to receive to handle any message and log and dump it, OTP does for you!
boop ! :stop # boop will now die when it processes this message a few microseconds later
boop ! :anything # This goes nowhere; you get no response, no confirmation. If I had set up a Monitor on boop then I would have got a Monitor :down message when boop died
With OTP, the above as a GenServer would be more like this:

defmodule Boop do
use GenServer # this brings in a set of methods and such that you can override below, technically optional if you define them all, but good for documentation purposes anyway
def start_link(startNum) do # The 'start_link' function can be named whatever, this is just normal OTP style
GenServer.start_link(__MODULE__, startNum) # This calls the start_link on GenServer, what it does is that whatever process called this method will get an automatic monitor
# setup for the process that GenServer is also creating (and returning the pid of), it will use this module 'Boop' as the callback module
# When the GenServer code of this Actor (it has its own receive loop) receives a message then it will call this callback module
# The callback functions are resolved at JIT/Actor creation time so no indirect calls are done that would waste time
end
def init(startNum) do # The argument is the second argument given to GenServer.start_link, can be whatever you want
{:ok, startNum} # Return a tuple of {:ok, state} to start your GenServer actor, or return an {:error, msg} to die, few other options too
end
def handle_call(:get, _from, num) do
{:reply, num, num} # return the number to the caller (second tuple entry) and as the state again (third)
end
# ... more handle_call clauses ...
def handle_cast({:set, newNum}, _num) do
{:noreply, newNum} # change the state to the new number
end
# ... more handle_cast clauses ...
def handle_cast(:stop, _num) do
{:stop, :normal, terminationStateIfAny} # Stop this GenServer with a :normal reason (can give anything else for a failure message) and pass in any data as the state for a terminate function if it exists
end
end And more (though for simple Actor Data containers like the above Elixir has a GenServer sub-type called an Agent, the above is similar to how an Agent is implemented). The raw BEAM 'receive' stuff would not be so easy in Rust as I've not seen a way that Rust can break up a single function into multiple callable parts, but the OTP style is doable by Rust perfectly, which is good as raw receive should almost never be used. And an example of a BEAM webserver, take Phoenix, they've got it to over 2 million simultaneous websocket connections, and if they sent a message to those connections (a wikipedia article they used due to its size) took 2-4 seconds on a single machine. BEAM is, for example, what WhatsApp uses to host their messaging system on a single server for all users. This is not really a show-off of BEAM itself, as BEAM still has plenty more optimizations that can be done to it, but is rather showing off what the OTP style is capable of. Even if they did not have a single 24 core machine with 96 gigs of ram (which is still 40% unloaded at their most busy times) they could have scaled it out onto more computers, spool up more Phoenix nodes and create a Mesh where everything can communicate out transparently. BEAM scales not only up on more cores but also out onto more machines (great in case of hardware failure!), not because the VM is so good (although it is very well made), but because of the OTP programming principles. If you want arbitrary scaling abilities among an arbitrary amount of connections with an arbitrary amount of running data containers (like a Game Entity) that can each be running different code, the OTP style is really the way to go. Think of it this way, each Actor in an Actor system is like its own little running system process (just a lot more lightweight), you can program using OOP or ECS or whatever 'inside' of it, but all communication between Actors is done via messaging. 
Actors are just a higher-level system view; you can still use ECS within them, or whatever is appropriate for the problem domain. Also, sorry for another long post. I get a bit rambly when tired. If you have specific questions, I am much shorter at answering those. ^.^
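As a rough illustration of how the OTP style maps onto Rust, here is a minimal actor sketch using a standard-library channel. The `Msg` enum and `spawn_counter` function are hypothetical names for illustration only, not from any library discussed in this thread. State lives inside the actor's thread, and the only way to touch it is by sending a message, much like a GenServer's handle_call/handle_cast clauses.

```rust
use std::sync::mpsc;
use std::thread;

// Messages the actor understands, like a GenServer's callback clauses.
enum Msg {
    // "call": increment and reply with the new value over the provided channel.
    Increment(mpsc::Sender<u64>),
    // "cast": fire-and-forget reset.
    Reset,
    // Stop the actor loop, like {:stop, :normal, state}.
    Stop,
}

// Spawn an actor that owns its state; the only way to touch it is by message.
fn spawn_counter() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut count: u64 = 0;
        // The loop ends when Stop arrives or all senders are dropped.
        for msg in rx {
            match msg {
                Msg::Increment(reply) => {
                    count += 1;
                    let _ = reply.send(count);
                }
                Msg::Reset => count = 0,
                Msg::Stop => break,
            }
        }
    });
    tx
}
```

A real OTP-style layer would add supervision and links on top of this loop; the sketch only shows the message-owned-state core.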
I've been messing around with an approach using clone-on-write and simple DAGs for scheduling.

First of all, data is clone-on-write: component state is not mutated. This allows all components from the previous cycle to remain visible while the components for the next cycle are created. Second, each system outputs one component (or intermediate calculation; see below).

The main problem with this approach, of course, is the number of frames it may take for a change that depends on multiple components to propagate. The way I've solved this is by allowing systems to depend on certain components or intermediate calculations already being processed, and to use those up-to-date values. This way, all necessary changes can be propagated in one frame, and systems can also reuse intermediate calculations (I suspect this will be quite rare, but it's a useful feature nonetheless). The signature for a system that requires three other systems to be completed therefore looks like this (note that `in` is a reserved keyword in Rust, so the parameter is named `input`):

fn process(&mut self, input: (&ComponentList<A>, &ComponentList<B>, &ComponentList<C>), previous_cycle: &Components) -> ComponentList<Out>;

There is then a macro that builds up a sort-of DAG representing the systems that need to be completed before a certain system can be run, which is then executed using a scoped thread pool.

It may even be possible to use critical path analysis to optimise the scheduling of systems at runtime based on some simple profiling, but that's not something I have the time to look into right now.
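The dependency-ordered execution described above can be sketched without the macro machinery: group systems into "waves" so that everything in a wave depends only on earlier waves, then run each wave's systems in parallel (e.g. with a scoped thread pool). `SystemNode` and `schedule` are hypothetical names for illustration, not HeroesGrave's actual API.

```rust
use std::collections::HashSet;

// A system in this sketch is just a name plus the names it depends on.
struct SystemNode {
    name: &'static str,
    deps: Vec<&'static str>,
}

// Group systems into waves: every system in a wave depends only on systems
// from earlier waves, so the systems within one wave can run in parallel.
fn schedule(systems: &[SystemNode]) -> Vec<Vec<&'static str>> {
    let mut done: HashSet<&'static str> = HashSet::new();
    let mut waves = Vec::new();
    while done.len() < systems.len() {
        // A system is ready when all of its dependencies have completed.
        let wave: Vec<&'static str> = systems
            .iter()
            .filter(|s| !done.contains(s.name) && s.deps.iter().all(|d| done.contains(d)))
            .map(|s| s.name)
            .collect();
        // If nothing is ready but work remains, the graph has a cycle.
        assert!(!wave.is_empty(), "dependency cycle detected");
        for name in &wave {
            done.insert(*name);
        }
        waves.push(wave);
    }
    waves
}
```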
@HeroesGrave very interesting, thanks for sharing! Is this something you got working, or are you just toying around? I'd like to peek into the source ;) Also, @csherratt mentioned that the cost of copy-on-write appears to be high: cloning the world on each iteration can be expensive.
@kvark there is always overhead in doing copy-on-write. The worst part is the extra code generated to handle the aliased cases, but I didn't find it to be significantly higher than a non-CoW data structure.

The nice thing about copy-on-write is that it makes it really easy to get around the borrow checker. It's essentially free for readers to clone the data structures, so if you want to do an update and need a copy of some of the old data, you just call clone and don't have to worry about who else might still be reading it.

The big disadvantage of CoW is that you limit yourself in which data structures you can select. Trees map to copy-on-write extremely well, so BTrees and radix trees work well with it. HashMaps and VecMaps do not, since they require cloning the entire data structure (not just the parts that changed).
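As a small illustration of the reader-friendly side of CoW (though not of the tree-based structures mentioned above), Rust's standard library already supports the pattern via `Arc::make_mut`, which deep-clones only when the data is actually shared; the `Snapshot` type here is a made-up example.

```rust
use std::sync::Arc;

// World state as an immutable snapshot behind an Arc. Readers clone the
// Arc (a cheap pointer copy); writers use make_mut, which only deep-clones
// when some reader still holds the old snapshot.
#[derive(Clone)]
struct Snapshot {
    positions: Vec<(f32, f32)>,
}

fn advance(state: &mut Arc<Snapshot>) {
    // Clones the Snapshot if it is shared, otherwise mutates in place.
    let inner = Arc::make_mut(state);
    for p in &mut inner.positions {
        p.0 += 1.0;
    }
}
```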
@kvark: It's pretty much working, but I want to clean up the code a little bit more and test some more complex workloads. Also, due to the lack of HKTs, it's not possible to be generic across different component storage types, which is a bit of a pain. Might throw everything in an enum as a workaround. |
I would like to give my opinion on this topic, since I have already worked on implementing a similar design. In my implementation, entities are only composed of four things:

Most of the time the entity index is used for accessing everything; the handle (index + version) is used less often. Components are just pure data; they need to implement a trait.

In this implementation, I always have two states: the past state and the future one. The past state is read-only; all processes read this state and compute modifications accordingly. Every process writes its modifications through a write queue.

The idea is that I can have a job system where a bunch of jobs are kicked to worker threads. I can also use this job system to batch updates of components; each job takes care of one component.

Finally, the process execution graph will be at best only one level deep: no dependencies, only a sync/update point. There is an incomplete draft of the implementation available here. Note that it is incomplete and does not necessarily match what is described here.
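A minimal sketch of the read-past/write-future scheme described above, assuming a command-buffer-style write queue that is applied at the sync point; all the type names here (`State`, `Command`, `CommandBuffer`) are hypothetical, not from the linked draft.

```rust
// The past state is read-only during a cycle; processes record their
// writes into buffers, which are folded in at a single sync point.
#[derive(Clone)]
struct State {
    healths: Vec<i32>,
}

enum Command {
    SetHealth { entity: usize, value: i32 },
}

#[derive(Default)]
struct CommandBuffer {
    commands: Vec<Command>,
}

impl CommandBuffer {
    // A process reads the past state and records its modification here.
    fn set_health(&mut self, entity: usize, value: i32) {
        self.commands.push(Command::SetHealth { entity, value });
    }
}

// Sync point: fold every process's buffer into a copy of the past state,
// producing the future state.
fn apply(past: &State, buffers: Vec<CommandBuffer>) -> State {
    let mut next = past.clone();
    for buf in buffers {
        for cmd in buf.commands {
            match cmd {
                Command::SetHealth { entity, value } => next.healths[entity] = value,
            }
        }
    }
    next
}
```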
I've always questioned the point of the bitset that defines which components are set; I've just let the component containers handle that themselves. If something wants to know whether an entity has some components, it can just query them. If something wants to know all the entities that have certain components, it makes an Aspect, which gets kept up to date. If something wants all entities and all components, it can make its own Aspect handler that listens for everything and holds that data however it wants. The reason I hate bitsets is that I use a LOT of components; many hundreds can even be a low number for a fairly small game, and carrying around that data of what is used and what is not is not really useful.
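A sketch of the Aspect idea described above, assuming the world notifies interested parties whenever an entity's component set changes. This `Aspect` is a made-up minimal version (using component names as a stand-in for type IDs), not OvermindDL1's actual implementation.

```rust
use std::collections::HashSet;

// An Aspect caches the set of entities matching a component filter and is
// kept up to date by change notifications, so systems never rescan the
// whole world to find their entities.
struct Aspect {
    required: Vec<&'static str>, // component names, standing in for type IDs
    matching: HashSet<u32>,
}

impl Aspect {
    fn new(required: Vec<&'static str>) -> Self {
        Aspect { required, matching: HashSet::new() }
    }

    // Called whenever an entity's component set changes (add or remove).
    fn on_components_changed(&mut self, entity: u32, components: &HashSet<&'static str>) {
        if self.required.iter().all(|c| components.contains(c)) {
            self.matching.insert(entity);
        } else {
            self.matching.remove(&entity);
        }
    }
}
```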
@OvermindDL1 I agree with you on that point. It is only useful when you don't have many entities to iterate and you don't cache them for future updates. However, if a process wants to filter entities conditionally, it allows a simple bitmask check of whether the entity matches the filter, but that case might be rare or nonexistent.
@Liotitch @OvermindDL1
@kvark In every setup I've used so far, I have many cases where multiple systems work on the same aspects, systems that have no aspects, and systems that have multiple aspects. It is a good style to follow.
@OvermindDL1 could you drop in some examples?
@kvark Sure, from a small project:

*) Multiple systems with the same Aspects
*) Systems with no Aspects
*) Systems with multiple different kinds of Aspects
Heads up everyone - I was able to implement an ECS design that fulfils my requirements: thread-safe parallel processing is there, with not even a char of unsafe code. Still got to figure out the entity/component recycling though, so I will follow the thread.
@kvark As I often do, I like to run an ECS through a few of my stress tests (which my own system handles with ease) to see if I've found a replacement, so I can stop dealing with my own; so it is time for parsec. ^.^

I used your example.rs as the base and created 1 million entities (not unheard of in some of my mini games; I often have ten thousand active entities with bursts of up to a million that can be rapidly created and destroyed, hence why I had to learn how to optimize an ECS), with 99% having a parsec::VecStorage component and 2% having a parsec::HashMapStorage component, with 1% of overlap. First issue found: these lines take a substantial amount of time (almost a full second):

scheduler.run0w1r(|b: &CompBool| { // The rare component
    println!("Entity {}", b.0);
});
scheduler.wait();

An equivalent Aspect in my system takes <1ms (even in a debug build). As I am still learning Rust, I wondered whether it was caused by println buffering output, so I removed that line (making the system a complete no-op, still running in debug mode, though I was worried it might be optimized out regardless, as it often would be in C++) and got the same results, so output buffering was not the issue.

At this point I delved into the code, and it looks like every single entity is iterated over for every single system callback even if not wanted (unwanted entities are just skipped before the callback, but they are still iterated over), which does not seem scalable in any sense. What is the plan to work around that?
@OvermindDL1 thank you for having a look at parsec!

You might be interested in the existing project called ecs_bench (by @lschmierer), which aims to profile different ECS implementations in Rust. Its first test case is very similar to yours, where only a fraction of entities actually satisfy the system's requirements for running.

What you found is a known issue: the scheduler iterates over all entities for each system. One way to address this, if you know that some of the components are rarer than others, is to use a custom storage for them. Another, more radical, way is to use tables by @csherratt. He's already got a prototype implemented on top of it.

Please feel free to hop on our gitter and/or create issues in our project (or contribute!), to avoid going off-topic in this thread.
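The storage trade-off mentioned above can be sketched as follows. The `VecStorage`/`HashMapStorage` types here are simplified stand-ins for parsec's actual storages, and the counting functions exist only to make the iteration cost visible: a dense per-entity array must walk every slot, while a map-based storage only walks the entities that actually have the component.

```rust
use std::collections::HashMap;

struct Rare(u32);

// Dense storage: one slot per entity, None when the component is absent.
// Lookup by entity id is O(1), but iteration touches every slot.
struct VecStorage {
    slots: Vec<Option<Rare>>,
}

// Sparse storage: only entities that have the component appear at all,
// so iterating a rare component is cheap.
struct HashMapStorage {
    slots: HashMap<u32, Rare>,
}

// Iterating the dense storage still walks every slot, live or not.
fn visited_dense(s: &VecStorage) -> usize {
    s.slots.iter().count()
}

// Iterating the sparse storage walks only the stored components.
fn visited_sparse(s: &HashMapStorage) -> usize {
    s.slots.iter().count()
}
```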
@kvark Fascinating links and information, thanks. Yeah, my basic test involves a 2D screen with a set of tiny floating squares (a movement system floats them around, bouncing off screen edges); when two intersect, a collision system destroys both and emits (via an event) a shower of new entities for each (basically particles), and a spawn system respawns the squares shortly afterwards, within about 10ms. Generally there are 1000 squares, or 10k for a stress test; each collision creates about a hundred more entities that zoom off until they fade out and are removed, and when 10k exist the collisions are... consistently constant, to put it mildly. I can only achieve real-time speeds in my system thanks to my aspect cache (and shader trickery on the rendering side). I should probably get a gitter account. Done. ^.^
@OvermindDL1 that sounds like a very good test case for an ECS. I'm currently porting yasteroids, and it's going to take some time, but I'd love to try out your scenario with parsec!
Closed by #44
Just wanted to note that this was a super fascinating discussion. Lots of information here that's useful for someone figuring out what patterns there are, and how to handle them. Thank you all for spending your time here ❤️
With game state management mostly stable, the next step is to implement an entity-component-system framework in the engine. This framework must follow the pure ECS model, in which entities are just IDs and components are POD structs.
Though the ECS model is extremely flexible and easy to data-drive, it has its shortcomings.
In a useful Reddit post, tomaka describes an alternative model similar to MVC with some parallelism. I propose a combination of the two designs resulting in a thread-safe ECS.
(Quoted from the other announcements section in TWIA Issue 1)
Clearly, the World struct containing our entities and components is analogous to the model in MVC. But systems typically require mutable state even for strictly read-only operations, so they don’t fit into either views or controllers, nor are they easy to parallelize.
However, there are two possible ways to remedy this issue.
Experiments are ongoing on the ecs branch.
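The pure ECS model the issue calls for (entities as plain IDs, components as POD structs, systems as functions over component storage) can be sketched minimally as follows; all names here are illustrative, not the engine's actual API.

```rust
// An entity is just an id; no data or behaviour lives on it.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Entity(u32);

// Components are plain-old-data structs.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Position { x: f32, y: f32 }

#[derive(Clone, Copy, PartialEq, Debug)]
struct Velocity { dx: f32, dy: f32 }

// The world owns component storage: here, one Vec per component type,
// indexed by entity id, with None for entities lacking the component.
struct World {
    positions: Vec<Option<Position>>,
    velocities: Vec<Option<Velocity>>,
}

impl World {
    // A "system" is a plain function over component data: integrate
    // velocity into position for every entity that has both.
    fn run_movement(&mut self) {
        for (pos, vel) in self.positions.iter_mut().zip(&self.velocities) {
            if let (Some(p), Some(v)) = (pos.as_mut(), vel.as_ref()) {
                p.x += v.dx;
                p.y += v.dy;
            }
        }
    }
}
```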