Long running OR commands should report their runtime #5003
OpenROAD-flow-scripts is a flow that uses OpenROAD. This flow is based on invoking OpenROAD many times, reading in the previous .odb file and writing out an updated one for each stage, and reporting times and memory usage. If I understand correctly, you describe a flow where a single .tcl script is used and the OpenROAD binary is invoked once. OpenROAD-flow-scripts used to have a "single .tcl script" version of the flow, but it was abandoned because it wasn't being used or maintained and there were no tests for it. There are many flows other than OpenROAD-flow-scripts built on top of the OpenROAD tool; to name a few: Hammer, Zero ASIC SiliconCompiler, bazel_hdl. None of these flows, to my knowledge, is run by a single .tcl script; the OpenROAD binary is invoked multiple times. For these flows, OpenROAD is not the only tool whose memory usage and running time have to be tracked. Hence, today's solution, where the measurement of memory usage and running times is outside the scope of the OpenROAD binary, seems to work well. There are some notable exceptions: memory consumption and running time are tracked and printed for each step within detailed routing. So in short, absent a more specific description of the requested changes, I think today's solution is pretty good. It has evolved to the way it is for good reasons.
I have log files from years ago of running the ispd routing testcases with one OpenROAD tcl script doing both global and detailed route, with no need to save and reload .odb databases in between. Having multiple flows each implement their own runtime logging sounds to me like an argument for putting it into the tool itself.
I think it is a nice-to-have, though not a high priority. To be clear, this is command-level timing rather than flow-step-level timing, so it is a bit more fine grained. In general OR doesn't have a unified approach to commands, so adding this will likely happen when there is some motivating need to make commands more uniform. Dependencies like sta further complicate the implementation. Detailed routing does already report its runtime, so that should be present in your old logs.
It's definitely not "high priority", but very much "nice to have". Making it easy to compare runs between versions/machines/implementations by diff'ing/grep'ing/scripting is useful. There is a fine balance between "show some progress/info" and "keep the log file short and concise".
We could either report whenever a command takes more than X time, or just decide which commands are interesting enough to always report. The former makes the output somewhat non-deterministic around the threshold; the latter may report small times on small designs. I favor the latter for determinism, in which case we should make a list.
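For illustration, here is a minimal sketch of what deterministic, always-on per-command timing could look like in C++. The `CommandTimer` name and the plain fprintf output are placeholders for this discussion, not OpenROAD's existing logging API.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical sketch only: the class name and the plain fprintf are
// placeholders, not OpenROAD's existing logging infrastructure.
class CommandTimer
{
 public:
  explicit CommandTimer(std::string name)
      : name_(std::move(name)), start_(std::chrono::steady_clock::now())
  {
  }

  ~CommandTimer()
  {
    const double seconds = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start_).count();
    // Always report for a fixed list of commands; no runtime threshold,
    // so the set of messages stays deterministic across machines.
    std::fprintf(stderr, "[INFO] %s runtime: %.2f s\n", name_.c_str(), seconds);
  }

 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

// Usage inside a long running command (hypothetical):
// void globalRoute() {
//   CommandTimer timer("global_route");
//   ...  // existing work; runtime is reported when timer leaves scope
// }
```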
A few comments:
- In my use case this info would have been useful to me, so it sounds like our preferences differ, and there is the question of the resources necessary to implement this request.
- If long running commands (placement, resize, global route, antenna, detailed route) would report time (in C++), this would in no way prevent other flows from ignoring this info and logging it some other way, as today. Best of both worlds...
Would reporting for those commands suffice? I suspect those are the big-ticket items and it wouldn't add too much verbosity.
repair_timing specifically can take surprisingly long, so having extra detail there would be useful. Also, report timing can be very slow. CTS is a good example: I have seen cases where CTS takes 10 minutes and report/fix timing takes 24 hours.
Adding time to logs makes them non-hermetic, so this should be an opt-in thing.
This is already the case for detailed routing. How do you deal with that (in bazel_hdl, presumably)?
repair_timing does give detail with -verbose. This would just be adding an overall time message.
Here's an example of why runtime and memory consumption reporting from openroad is useful. Now I know not to try to detail_route the 3M-stdcell ispd24 mempool_group testcase again. read_def, on the other hand, is fast.
Not a great example, as the command didn't complete, so you wouldn't get any reporting at the end. Reporting usage during progress is not what was requested here.
I think what DRT needs is some automatic adjustment to the number of threads before it starts running, in case it detects that the system memory is too low. I know of some proprietary tools which do this based on system load (% CPU), but because DRT memory scales pretty linearly with threads, it should hopefully be easy to do here.
@rovinski Interesting... I think it would be a good idea to make this improvement to detailed routing, even if it would not affect the maximum memory usage in all cases. For example, in megaboom the maximum memory usage for global and detailed routing is about the same. Note that megaboom runs with various things disabled, which will underestimate memory usage, though by how much I don't know. Features are disabled mainly to rein in running times to something somewhat manageable.
It sounds quite fragile. The amount of free memory may change quite dynamically (particularly on a multi-user system). It also depends on how expensive swapping is (and how large the working set is). It's also hard to know exactly how much memory drt will use in advance.
I was thinking more conservatively: (threads * memory/thread) < total system memory, not even including free memory. Beyond that, you are guaranteed to dip into swap and risk significant slowdown. In an even more conservative case, at least a warning could be issued. Is it hard to know how much memory it uses? When I was experimenting a while ago, it seemed pretty deterministic based on the technology, design size, and number of threads.
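As a rough sketch of that conservative policy (thread count capped so the estimated total footprint stays below total physical memory), assuming the per-thread memory estimate is supplied from some model or prior measurement; OpenROAD does not expose such an estimate today:

```cpp
#include <unistd.h>

#include <algorithm>
#include <cstdint>
#include <thread>

// Sketch only: est_bytes_per_thread is an assumed input; deriving a reliable
// estimate for detailed routing is the hard part discussed above.
int clampThreadsToMemory(int requested_threads, std::uint64_t est_bytes_per_thread)
{
  // Total physical memory, not free memory, per the conservative policy above.
  const std::uint64_t total_bytes =
      static_cast<std::uint64_t>(sysconf(_SC_PHYS_PAGES))
      * static_cast<std::uint64_t>(sysconf(_SC_PAGE_SIZE));

  const std::uint64_t fitting = est_bytes_per_thread
      ? total_bytes / est_bytes_per_thread
      : static_cast<std::uint64_t>(requested_threads);

  const int max_threads = static_cast<int>(std::min<std::uint64_t>(
      fitting, std::thread::hardware_concurrency()));

  // Never go below one thread; a warning could be issued here instead of
  // (or in addition to) silently throttling.
  return std::max(1, std::min(requested_threads, max_threads));
}
```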
If I overestimate the usage and unnecessarily throttle the run, I get a bug report. If I underestimate the memory and the run is slowed down or killed, I get a bug report. I have to hit some goldilocks window every time. If you feel you can do that, then please submit a PR (and plan to support it).
The user can control this via NUM_CORES now, right? What about memory usage for the rest of the flow?
But that's currently the case anyway 😆. If introduced, hopefully it would be less.
No such promise is made today and therefore nothing needs "fixing".
I think today there is a well-defined behavior where the user has to make an annoying choice that has deterministic effects: NUM_CORES. From a bug/feature-request point of view, I think this is the least bad baseline. I can see how any change to this baseline (the default behavior) practically (from a maintainer's point of view) has to be only better and never worse. In bazel-orfs, I can easily specify NUM_CORES per stage (floorplan, place, route, etc.); the same is possible in ORFS, but it requires writing a script that runs one make command per stage (not a big deal for a flow that is memory constrained, which probably also runs for a very long time). Perhaps the final summary report could include "marginal memory cost/core"? The marginal memory cost per core refers to the additional memory consumption incurred by adding one more processing core. That would require some extra logging by e.g. detailed routing, based upon knowledge of the memory consumption. The user can then choose to modify NUM_CORES for the problematic stage.
Come to think of it: if the marginal memory cost per CPU can't be estimated and reported, then it isn't possible to implement a policy that selects the "best" number of CPUs as constrained by available memory. If the marginal speedup per core can be estimated (assuming it is roughly linear, for simplification) and the marginal memory cost can be estimated and reported, then it is possible for the user to decide on a NUM_CORES tradeoff manually. Currently a maximum-speed policy is in place (use all cores), and a minimum memory usage could be estimated in a column. If the peak memory usage isn't reduced (grt doesn't use parallelism, does it?), then the user would know that nothing is gained as such by reducing parallelism.
In fact, it would be easy to run an experiment: run with NUM_CORES set to 1 through the number of cores and plot run times and memory usage for each stage... That should tell a lot...
This is all predicated on the idea that there is some fixed cost per thread and that it remains constant across iterations. I'm not clear either is true, but I am open to seeing your results. This issue is getting taken over by an unrelated request; if people want to pursue it, then it should be a separate issue.
Hello, I would like to work on this issue; however, from the conversation it's not clear what the final request for this issue is. To summarize what I understood: for the commands discussed above, runtime and memory usage should be logged from within the commands themselves, and the rest of the conversation is considered a separate request. Could you please confirm if this is the case?
I think that would be a good start. People can request more commands if they see long runtimes.
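To make the summarized request concrete, here is a hedged sketch of an end-of-command summary covering both runtime and peak memory, using getrusage. The message format and the idea of calling this at the end of each selected command are assumptions for illustration, not existing OpenROAD behavior; note that ru_maxrss is the peak RSS of the whole process, not of a single command.

```cpp
#include <sys/resource.h>

#include <chrono>
#include <cstdio>

// Sketch only: intended to be called at the end of a long running command.
void reportCommandUsage(const char* command,
                        std::chrono::steady_clock::time_point start)
{
  const double seconds = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - start).count();

  struct rusage usage;
  getrusage(RUSAGE_SELF, &usage);
  // On Linux, ru_maxrss is reported in kilobytes.
  const double peak_gb = usage.ru_maxrss / (1024.0 * 1024.0);

  std::fprintf(stderr, "[INFO] %s: runtime %.1f s, peak memory %.2f GB\n",
               command, seconds, peak_gb);
}
```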
I'll be pretty forceful here and say this should be an option you need to enable, and not default behavior.
Why? drt already does this; how are extra commands different?
Hi all, I have started working on this one.
Description
Every “long running” command should report how long it took to run.
The goal would be to be able to easily compare log files from different runs.
E.g.
vimdiff dir[12]/log
or
grep runtime dir2/log > dir2/runtime
diff dir[12]/runtime
Suggested Solution
Currently there seems to be a way to get ORFS to report runtime and memory usage.
But this relies on running the openroad executable for each and every step.
And it's not "built into" openroad; it relies on external scripting.
It should be possible to have one OR tcl script to run "the whole flow" and still get this information in the log.
Additional Context
No response