
Optimize scheduler #54

Open
dabreegster opened this issue Apr 30, 2020 · 11 comments
Labels
good first issue Good for newcomers

Comments

@dabreegster
Collaborator

Profiling reveals https://github.com/dabreegster/abstreet/blob/master/sim/src/scheduler.rs as a hot spot, which makes sense. Anybody have ideas to speed it up? It's a priority queue with updateable priorities. There are lots of alternatives to std::collections::BinaryHeap, but I think I had trouble finding one that was deterministic (given the same inputs, simulations must run exactly the same).

I squeezed running a huge scenario from 71s to 63s using 7a0b9cd. I bet there's more possible.
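For context on the determinism requirement, here's a minimal sketch (with `u64`/`u32` stand-ins, not the project's actual `Time` and `CommandType`) of how a `BinaryHeap` stays deterministic: order on a composite key so that timestamp ties break on a total ordering of the items, never on insertion order or hashing.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical stand-ins for the project's Time and CommandType.
type Time = u64;
type CommandType = u32;

// Pop all entries in deterministic order: earliest Time first,
// ties broken by CommandType rather than by insertion order.
fn drain_in_order(items: Vec<(Time, CommandType)>) -> Vec<(Time, CommandType)> {
    // BinaryHeap is a max-heap; Reverse turns it into a min-heap.
    let mut heap: BinaryHeap<Reverse<(Time, CommandType)>> =
        items.into_iter().map(Reverse).collect();
    let mut out = Vec::new();
    while let Some(Reverse(entry)) = heap.pop() {
        out.push(entry);
    }
    out
}

fn main() {
    let order = drain_in_order(vec![(5, 2), (3, 7), (5, 1)]);
    // (3, 7) pops first (earliest time); the two time-5 entries
    // come out in CommandType order regardless of push order.
    assert_eq!(order, vec![(3, 7), (5, 1), (5, 2)]);
}
```

Any replacement queue would need to preserve this property: given the same pushes, the same pop order, on every run.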

@dabreegster dabreegster added the good first issue Good for newcomers label Apr 30, 2020
@Stupremee
Copy link

How did you profile the code? I would like to try some things and see the effect.

@dabreegster
Collaborator Author

cargo run --release -- --prebake in the game crate. This runs the weekday scenario for the montlake and lakeslice maps without the GUI, so all the work is spent simulating. I've also used cpuprofiler before; there are a few commands in docs/misc_dev_tricks. But right now, I think profiling pretty much just points to stuff in scheduler as the main hot spot.

The slight complication is that the simulation is still sensitive: for example, if a new priority queue slightly reorders commands with the same timestamp, then hypothetically gridlock could occur on these currently stable maps. If that happens, performance grinds to a halt, so it's obvious. :) On my machine, prebaking takes about 5 minutes total, so it's fairly obvious when a new problem is introduced.

dabreegster added a commit that referenced this issue Oct 19, 2020
affect deterministic simulation, but yields a crazy speedup. #54

- 8 hours of downtown: 122s to 102s!!!
- prebake: 181s to 160s
@dabreegster
Collaborator Author

@vks, based on your other repos (thanks for all your work on rand, by the way!), I'm wondering if you happen to be interested in / knowledgeable about core data structures in Rust like this? Doing something more clever here would probably be the single greatest perf boost we could get by changing one thing.

@vks
Contributor

vks commented May 15, 2021

I'm not really an expert on data structures, but I would be interested in having a look at performance! So thanks for mentioning me.

@vks
Contributor

vks commented May 16, 2021

This crate looks promising: priority-queue. It is built on an ordered hash map (indexmap). We could try swapping it in and see whether it improves performance.
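The priority-queue crate supports changing an item's priority in place. For comparison, here's a std-only sketch of the same "updatable priorities" idea via lazy deletion (hypothetical `UpdatableQueue` type, with `u32`/`u64` standing in for the real item and priority types): updates push a fresh heap entry, and stale entries are skipped at pop time.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Minimal updatable min-priority queue, std-only sketch.
struct UpdatableQueue {
    heap: BinaryHeap<Reverse<(u64, u32)>>, // (priority, item)
    current: HashMap<u32, u64>,            // item -> live priority
}

impl UpdatableQueue {
    fn new() -> Self {
        UpdatableQueue { heap: BinaryHeap::new(), current: HashMap::new() }
    }

    // Insert or update: record the live priority and push a new heap
    // entry. Older entries for the same item become stale.
    fn push(&mut self, item: u32, priority: u64) {
        self.current.insert(item, priority);
        self.heap.push(Reverse((priority, item)));
    }

    // Pop the live entry with the smallest priority, discarding any
    // stale entries left behind by earlier updates.
    fn pop(&mut self) -> Option<(u32, u64)> {
        while let Some(Reverse((priority, item))) = self.heap.pop() {
            if self.current.get(&item) == Some(&priority) {
                self.current.remove(&item);
                return Some((item, priority));
            }
            // else: stale entry from an earlier push; skip it.
        }
        None
    }
}

fn main() {
    let mut queue = UpdatableQueue::new();
    queue.push(1, 10);
    queue.push(2, 5);
    // Reschedule item 1 earlier; the old (10, 1) entry becomes stale.
    queue.push(1, 3);
    assert_eq!(queue.pop(), Some((1, 3)));
    assert_eq!(queue.pop(), Some((2, 5)));
    assert_eq!(queue.pop(), None);
}
```

This trades extra heap entries for avoiding decrease-key entirely, and stays deterministic because ties still resolve through the total `(priority, item)` ordering.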

@dabreegster
Collaborator Author

This looks like it might work. I'm not sure if the Item struct should be the I type, or if we could remove time from that struct and just use the CommandType. The priority is mainly determined by Time, but the tie-breaking for equal times needs to be deterministic. The docs for peek don't say what happens when multiple items share the greatest priority, but given the backing indexmap, it should probably work.

It would be nice if we could do away with having both a priority queue and a separate CommandType -> Command lookup. The queue's item must implement Hash and Eq. I tried just now, and it's actually possible to make Command derive Eq, but not Hash; at the end of the day, there's geom::Duration and geom::Speed buried somewhere inside, which just wrap an f64. And maybe more importantly, the Command struct can get quite big, so hashing it by looking at all of its fields recursively is probably bad for perf.

That's why the whole CommandType indirection was started. The code using the scheduler promises to only ever have one Command per CommandType at a time in the priority queue. So we could probably ditch the queued_commands mapping and use the entire Command directly. Instead of deriving Hash and Eq, we would make them delegate to the CommandType, computed lazily inside the hashing function.
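A hedged sketch of that last idea, with toy stand-ins for Command, CommandType, and the f64-wrapping geom types (not the real definitions): Hash and Eq delegate to a lazily-computed CommandType, so the f64-backed fields never need hashing or comparing.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical mini version of an f64-wrapping geom type.
#[derive(Clone, Debug)]
struct Duration(f64); // f64 blocks deriving Hash and Eq

// Hypothetical mini version of Command.
#[derive(Clone, Debug)]
enum Command {
    SpawnCar { id: u32, delay: Duration },
    UpdateIntersection(u32),
}

// The cheap, hashable projection of a Command.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum CommandType {
    SpawnCar(u32),
    UpdateIntersection(u32),
}

impl Command {
    // Computed lazily, only when hashing or comparing.
    fn to_type(&self) -> CommandType {
        match self {
            Command::SpawnCar { id, .. } => CommandType::SpawnCar(*id),
            Command::UpdateIntersection(i) => CommandType::UpdateIntersection(*i),
        }
    }
}

// Hash and Eq delegate to the CommandType, so the big struct's
// fields are never traversed and the f64s never get hashed.
impl Hash for Command {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.to_type().hash(state);
    }
}
impl PartialEq for Command {
    fn eq(&self, other: &Self) -> bool {
        self.to_type() == other.to_type()
    }
}
impl Eq for Command {}

fn main() {
    let a = Command::SpawnCar { id: 3, delay: Duration(1.5) };
    let b = Command::SpawnCar { id: 3, delay: Duration(9.0) };
    // Same CommandType, so equal and hash-identical despite different f64s.
    assert_eq!(a, b);

    let mut map = HashMap::new();
    map.insert(a, "queued");
    assert_eq!(map.get(&b), Some(&"queued"));
}
```

This only stays sound under the invariant stated above: at most one Command per CommandType in the queue at a time, since two distinct Commands with the same CommandType would compare equal.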

@vks
Contributor

vks commented May 19, 2021

The priority is mainly determined by Time, but the tie-breaking for equal times needs to be deterministic.

It is deterministic, and independent of the hasher: garro95/priority-queue#29

@dabreegster
Collaborator Author

@vks, I might actually try tackling this soon. I just wanted to make sure I'm not repeating something you've started or were really keen to do.

@vks
Contributor

vks commented May 27, 2021

Thanks for the heads up! I don't have plans to work on this soon, so please feel free to go ahead.

dabreegster added a commit that referenced this issue May 28, 2021
implementation gets simpler, but the net performance doesn't improve. #54
dabreegster added a commit that referenced this issue May 28, 2021
@dabreegster
Collaborator Author

Well that was anti-climactic. I tried two different approaches for using priority_queue. https://github.com/a-b-street/abstreet/tree/pq_v2 just swaps out BinaryHeap for PriorityQueue, but otherwise keeps the CmdType -> Command mapping. https://github.com/a-b-street/abstreet/tree/pq_v1 removes that mapping as well, and instead implements Hash on Command by lazily calculating the CommandType and hashing that.

I measured total time to run the small montlake simulation, repeating each run 3 times and avoiding other heavy activity while the test ran. There's a fair bit of noise between repeated runs, but consistently, all of the variations using the new priority queue were slower than the current use of BinaryHeap.

So, failed experiment, sadly. :( But it could be eventually worth trying out other priority queue implementations.

@gaetanww

gaetanww commented Dec 21, 2023

Do we know which priority queue operation is costing the most in this case?
I'm not too familiar with the design of this scheduler, but I understood it requires updating all the priorities regularly. Is that correct?
With a binary heap, updating one priority is O(log n), so updating all n priorities is O(n log n).
We might have better luck with a data structure that has better priority-update performance, like a Fibonacci heap? There may be other data structures I don't know about.
