Add TimerOutputs instrumentation to ground-state algorithms #431
Note: I will remove the AI clutter here tomorrow. I didn't feel like writing the PR notes, but this is kind of bad.

I would keep this controllable via the verbosity interface.

Fair point, probably just showing is enough. Any comments on the multithreading? I am not sure I can do a lot better without a lot of engineering. I don't even know where I would start in order to get something that is consistent.

Maybe we can just update some of the names of the steps to indicate they are multithreaded. Getting some sort of average runtime per thread looks to be rather difficult. I didn't find an easy way in the TimerOutputs.jl readme.

No, TimerOutputs doesn't actually provide that much multithreading functionality. I think the claim is that the top sections are accurate, and the multithreaded parts below aren't. Any suggestions on names?

Maybe just

For now I will just merge this as is. Since this shouldn't be breaking, I can always alter it later if people are unhappy, and I have limited time and want to keep things moving forward a bit.
This adds optional per-section timing to VUMPS, DMRG / DMRG2, IDMRG / IDMRG2, and GradientGrassmann, using TimerOutputs.jl. Each algorithm allocates its own TimerOutput, sub-sections mirror the natural phases (eigsolves, orth steps, env transfers, gauge solver, ...), and the summary is emitted via @infov 4 so it is gated by the existing verbosity setting.

The section structures end up looking like:

A representative VUMPS run with verbosity=4 on a 3-site Ising chain:

Note that inner sections can show a %tot that exceeds 100% of their parent's (AC_eigsolve's 67% vs localupdate's 64.8%). That's because tforeach runs sites concurrently, so per-call wall-time summed across threads exceeds the parent's wall-clock.

Control is via the existing verbosity:

Wherever a timed section sits inside a tforeach / @spawn (VUMPS site loop, env compute, multiline rows), each task gets its own TimerOutput and we merge! after the synchronization point. The merge target is computed dynamically from timeroutput.timer_stack, so nested wrappers like fg → envs → left_envs work without hardcoding tree paths.
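The per-task pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: fake_site_update, open_path, and timed_site_loop! are hypothetical names, the sleep stands in for a real eigsolve, and reading the name and timer_stack fields relies on TimerOutputs internals rather than a documented API. It also reproduces the effect noted above, where the summed time of the inner section exceeds the wall-clock of its parent.

```julia
using TimerOutputs

# Hypothetical stand-in for a per-site update. `sleep` yields, so the tasks
# overlap even when Julia is started with a single thread.
fake_site_update(i) = (sleep(0.1); i)

# Labels of the currently open sections, read from the internal `timer_stack`
# field (an assumption about TimerOutputs internals, not a documented API).
open_path(to::TimerOutput) = String[t.name for t in to.timer_stack]

function timed_site_loop!(to::TimerOutput, nsites::Integer)
    @timeit to "localupdate" begin
        target = open_path(to)                      # path to the open section
        locals = [TimerOutput() for _ in 1:nsites]  # one private timer per task
        @sync for i in 1:nsites
            Threads.@spawn begin
                @timeit locals[i] "AC_eigsolve" fake_site_update(i)
            end
        end
        # All tasks have finished here, so only this task touches `to`.
        for l in locals
            merge!(to, l; tree_point = target)
        end
    end
    return to
end

timed_site_loop!(TimerOutput(), 4)  # warm-up so compilation does not skew timings

to = TimerOutput()
timed_site_loop!(to, 4)

outer = to["localupdate"]
inner = outer["AC_eigsolve"]
println("inner calls: ", TimerOutputs.ncalls(inner))
println("outer time:  ", TimerOutputs.time(outer) / 1e9, " s (wall-clock)")
println("inner time:  ", TimerOutputs.time(inner) / 1e9, " s (summed over tasks)")
```

Because the merge target is taken from the open-section stack, calling timed_site_loop! from inside another @timeit block on the same timer should attach the per-task sections under that nested path without any change to the function.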