
Optimize scheduler #54

Open
dabreegster opened this issue Apr 30, 2020 · 11 comments
Labels
good first issue Good for newcomers

Comments

@dabreegster
Collaborator

Profiling reveals https://github.com/dabreegster/abstreet/blob/master/sim/src/scheduler.rs as a hot spot, which makes sense. Anybody have ideas to speed it up? It's a priority queue with updateable priorities. There are lots of alternatives to std::collections::BinaryHeap, but I think I had trouble finding one that was deterministic (given the same inputs, simulations must run exactly the same).

I squeezed running a huge scenario from 71s to 63s using 7a0b9cd. I bet there's more possible.
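For context on the determinism requirement, here's a minimal sketch (with `u64`/`u32` stand-ins, not the project's actual `Time` and `CommandType`) of how a `BinaryHeap` stays deterministic: order on a composite key so that timestamp ties break on a total ordering of the items, never on insertion order or hashing.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical stand-ins for the project's Time and CommandType.
type Time = u64;
type CommandType = u32;

// Pop all entries in deterministic order: earliest Time first,
// ties broken by CommandType rather than by insertion order.
fn drain_in_order(items: Vec<(Time, CommandType)>) -> Vec<(Time, CommandType)> {
    // BinaryHeap is a max-heap; Reverse turns it into a min-heap.
    let mut heap: BinaryHeap<Reverse<(Time, CommandType)>> =
        items.into_iter().map(Reverse).collect();
    let mut out = Vec::new();
    while let Some(Reverse(entry)) = heap.pop() {
        out.push(entry);
    }
    out
}

fn main() {
    let order = drain_in_order(vec![(5, 2), (3, 7), (5, 1)]);
    // (3, 7) pops first (earliest time); the two time-5 entries
    // come out in CommandType order regardless of push order.
    assert_eq!(order, vec![(3, 7), (5, 1), (5, 2)]);
}
```

Any replacement queue would need to preserve this property: given the same pushes, the same pop order, on every run.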

@dabreegster dabreegster added the good first issue Good for newcomers label Apr 30, 2020
@Stupremee
Copy link

How did you profile the code? I would like to try some things and see the effect.

@dabreegster
Collaborator Author

cargo run --release -- --prebake in the game crate. This runs the weekday scenario for the montlake and lakeslice maps without the GUI, so all the work is spent simulating. I've also used cpuprofiler before; there are a few commands in docs/misc_dev_tricks. But right now, I think profiling pretty much just points to stuff in scheduler as the main hot spot.

The slight complication is that the simulation is still sensitive: for example, if a new priority queue slightly reorders commands with the same timestamp, then hypothetically gridlock could occur on these currently stable maps. If that happens, performance grinds to a halt, so it's obvious. :) On my machine, prebaking takes about 5 minutes total, so it's fairly obvious when a new problem is introduced.

dabreegster added a commit that referenced this issue Oct 19, 2020
affect deterministic simulation, but yields a crazy speedup. #54

- 8 hours of downtown: 122s to 102s!!!
- prebake: 181s to 160s
@dabreegster
Collaborator Author

@vks, based on your other repos (thanks for all your work on rand, by the way!), I'm wondering if you happen to be interested in / knowledgeable about core data structures in Rust like this? Doing something more clever here would probably be the single greatest perf boost we could get by changing one thing.

@vks
Contributor

vks commented May 15, 2021

I'm not really an expert on data structures, but I would be interested in having a look at performance! So thanks for mentioning me.

@vks
Contributor

vks commented May 16, 2021

This crate looks promising: priority-queue. It is built on an ordered hash map (indexmap). We could try swapping it in and see whether it improves performance.
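The priority-queue crate supports changing an item's priority in place. For comparison, here's a std-only sketch of the same "updatable priorities" idea via lazy deletion (hypothetical `UpdatableQueue` type, with `u32`/`u64` standing in for the real item and priority types): updates push a fresh heap entry, and stale entries are skipped at pop time.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Minimal updatable min-priority queue, std-only sketch.
struct UpdatableQueue {
    heap: BinaryHeap<Reverse<(u64, u32)>>, // (priority, item)
    current: HashMap<u32, u64>,            // item -> live priority
}

impl UpdatableQueue {
    fn new() -> Self {
        UpdatableQueue { heap: BinaryHeap::new(), current: HashMap::new() }
    }

    // Insert or update: record the live priority and push a new heap
    // entry. Older entries for the same item become stale.
    fn push(&mut self, item: u32, priority: u64) {
        self.current.insert(item, priority);
        self.heap.push(Reverse((priority, item)));
    }

    // Pop the live entry with the smallest priority, discarding any
    // stale entries left behind by earlier updates.
    fn pop(&mut self) -> Option<(u32, u64)> {
        while let Some(Reverse((priority, item))) = self.heap.pop() {
            if self.current.get(&item) == Some(&priority) {
                self.current.remove(&item);
                return Some((item, priority));
            }
            // else: stale entry from an earlier push; skip it.
        }
        None
    }
}

fn main() {
    let mut queue = UpdatableQueue::new();
    queue.push(1, 10);
    queue.push(2, 5);
    // Reschedule item 1 earlier; the old (10, 1) entry becomes stale.
    queue.push(1, 3);
    assert_eq!(queue.pop(), Some((1, 3)));
    assert_eq!(queue.pop(), Some((2, 5)));
    assert_eq!(queue.pop(), None);
}
```

This trades extra heap entries for avoiding decrease-key entirely, and stays deterministic because ties still resolve through the total `(priority, item)` ordering.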

@dabreegster
Collaborator Author

This looks like it might work. I'm not sure if the Item struct should be the I type, or if we could remove time from that struct and just use the CommandType. The priority is mainly determined by Time, but the tie-breaking for equal times needs to be deterministic. The docs for peek don't say what happens when multiple items share the greatest priority, but given the backing indexmap, it should probably work.

It would be nice if we could do away with having both a priority queue and a separate CommandType -> Command lookup. The queue's item must implement Hash and Eq. I tried just now, and it's actually possible to make Command derive Eq, but not Hash; at the end of the day, there's geom::Duration and geom::Speed buried somewhere inside, which just wrap an f64. And maybe more importantly, the Command struct can get quite big, so hashing it by looking at all of its fields recursively is probably bad for perf.

That's why the whole CommandType indirection was started. The code using the scheduler promises to only ever have one Command per CommandType at a time in the priority queue. So we could probably ditch the queued_commands mapping and use the entire Command directly. Instead of deriving Hash and Eq, we would make them delegate to the CommandType, computed lazily inside the hashing function.
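A hedged sketch of that last idea, with toy stand-ins for Command, CommandType, and the f64-wrapping geom types (not the real definitions): Hash and Eq delegate to a lazily-computed CommandType, so the f64-backed fields never need hashing or comparing.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical mini version of an f64-wrapping geom type.
#[derive(Clone, Debug)]
struct Duration(f64); // f64 blocks deriving Hash and Eq

// Hypothetical mini version of Command.
#[derive(Clone, Debug)]
enum Command {
    SpawnCar { id: u32, delay: Duration },
    UpdateIntersection(u32),
}

// The cheap, hashable projection of a Command.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum CommandType {
    SpawnCar(u32),
    UpdateIntersection(u32),
}

impl Command {
    // Computed lazily, only when hashing or comparing.
    fn to_type(&self) -> CommandType {
        match self {
            Command::SpawnCar { id, .. } => CommandType::SpawnCar(*id),
            Command::UpdateIntersection(i) => CommandType::UpdateIntersection(*i),
        }
    }
}

// Hash and Eq delegate to the CommandType, so the big struct's
// fields are never traversed and the f64s never get hashed.
impl Hash for Command {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.to_type().hash(state);
    }
}
impl PartialEq for Command {
    fn eq(&self, other: &Self) -> bool {
        self.to_type() == other.to_type()
    }
}
impl Eq for Command {}

fn main() {
    let a = Command::SpawnCar { id: 3, delay: Duration(1.5) };
    let b = Command::SpawnCar { id: 3, delay: Duration(9.0) };
    // Same CommandType, so equal and hash-identical despite different f64s.
    assert_eq!(a, b);

    let mut map = HashMap::new();
    map.insert(a, "queued");
    assert_eq!(map.get(&b), Some(&"queued"));
}
```

This only stays sound under the invariant stated above: at most one Command per CommandType in the queue at a time, since two distinct Commands with the same CommandType would compare equal.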

@vks
Contributor

vks commented May 19, 2021

The priority is mainly determined by Time, but the tie-breaking for equal times needs to be deterministic.

It is deterministic, and independent of the hasher: garro95/priority-queue#29

@dabreegster
Collaborator Author

@vks, I might actually try tackling this soon. I just wanted to make sure I'm not repeating something you've started or were really keen to do.

@vks
Contributor

vks commented May 27, 2021

Thanks for the heads up! I don't have plans to work on this soon, so please feel free to go ahead.

dabreegster added a commit that referenced this issue May 28, 2021
implementation gets simpler, but the net performance doesn't improve. #54
dabreegster added a commit that referenced this issue May 28, 2021
@dabreegster
Collaborator Author

Well that was anti-climactic. I tried two different approaches for using priority_queue. https://github.com/a-b-street/abstreet/tree/pq_v2 just swaps out BinaryHeap for PriorityQueue, but otherwise keeps the CmdType -> Command mapping. https://github.com/a-b-street/abstreet/tree/pq_v1 removes that mapping as well, and instead implements Hash on Command by lazily calculating the CommandType and hashing that.

I measured total time to run the small montlake simulation, repeating each run 3 times and avoiding other heavy activity while the test ran. There's a fair bit of noise between repeated runs, but consistently, all of the variations using the new priority queue were slower than the current use of BinaryHeap.

So, failed experiment, sadly. :( But it could be eventually worth trying out other priority queue implementations.

@gaetanww

gaetanww commented Dec 21, 2023

Do we know which priority queue operation is costing the most in this case?
I'm not too familiar with the design of this scheduler, but I understood it requires updating all the priorities regularly. Is that correct?
With a binary heap, updating one priority is O(log n), so updating all n priorities is O(n log n).
We might have better luck with a data structure that has better priority-update performance, like a Fibonacci heap? There may be other data structures I don't know about.
