Redesign DotRunner as a single global queue with per-format batching#12037
Redesign DotRunner as a single global queue with per-format batching#12037vtjnash wants to merge 20 commits into
Conversation
Replace the per-.dot-file DotRunner/ThreadPool architecture with a single DotRunner that holds a global queue of all jobs across all source files. Running dot is now batched as one invocation per output format using -O (auto-naming) instead of -o: `dot -Tpng -O file1.dot file2.dot ...` Remove ThreadPool and DOT_NUM_THREADS usage from DotManager. The cmapx map output extension changes from .map to .cmapx to match -O naming. The DOT_NUM_THREADS could be restored by splitting up `system` into a loop for `spawn` and a loop for `waitpid` each taking a fixed proportion of the graphs, but that isn't currently expected to provide significant difference in performance. The DOT_MULTI_TARGETS setting is now ignored (old default OFF, now assumed ON) since the corresponding feature in graphviz is now over 20 years old (1.8.10). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Replaces: doxygen#12028
Group jobs by format+directory, cd into each directory before invoking dot, and pass only the file basename. Cleanup and .md5 writes happen after all formats are generated, using the saved absolute paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rather than saving/restoring the process cwd around Portable::system calls, pass the desired working directory directly. On Unix (fork path) the child calls chdir() before exec; on Solaris (vfork) and Windows console paths it prepends cd to the shell command; on Windows non-console it passes the directory to CreateProcessW lpCurrentDirectory. DotRunner::run() now passes the dot file directory to Portable::system and uses absolute paths for post-run checks, removing the Dir::setCurrent save/restore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chdir() is async-signal-safe so it can safely be called in a vfork child. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also replace exit() with _exit() after execve to avoid flushing stdio buffers in the fork child. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…imits Error messages now include the working directory (chdir target) alongside the command and arguments. Dot invocations are split into batches so the command line stays under 32767 characters on Windows and 1MB on Linux/other Unix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MAX_ARG_STRLEN is the per-argument length limit enforced by the Linux kernel. Fall back to 131072 on systems that don't define it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
graphviz -O appends the format extension to the full input filename (including .dot) on some versions, producing graph.dot.svg instead of graph.svg. Rename the output if needed so the rest of doxygen finds the expected path. Applied after both the batch run and the PDF re-run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ehavior dot -O always appends the format suffix to the full input filename (including .dot), per the graphviz docs. Remove the conditional FileInfo::exists() guard since the rename is unconditionally required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dot -O appends the format suffix to the full input filename, so update all output path computations to use job->dotFile + "." + format directly: - dotgraph.cpp: imgName() includes .dot. before the format extension - dotgraph.h: absMapName() uses .dot.cmapx - dotrunner.cpp: post-processing uses job->dotFile + "." + format; rename blocks removed - dot.cpp: writeDotGraphFromFile and writeDotImageMapFromFile use Portable::system with explicit -o since the input and output paths are unrelated (user-provided file vs. generated output location) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dot -O writes file.dot.pdf/eps; passing absDotName() as figureName means figureName+".pdf"/".eps" resolves to the correct file without any rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Also an interesting idea. Seems there is still an issue with generating the PDF version of the manual. How much is this slower or faster than the other approach for llvm on a clean dir? I find the use of the maximum argument length a bit fragile/artificial. Maybe we could convince the GraphViz developers to add a |
dot -O produces file.dot.pdf; LaTeX appends its own extension list (.pdf
etc.) to the includegraphics argument, so {baseName.dot} resolves to
baseName.dot.pdf correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Fixed the remaining name issues with generating the pdf. Basically the same speed. The tradeoff here isn't speed, but the ability to do dynamic load balancing with multiple processes, since this version only accepts a static list. (This does about 6 execv, instead of 15k). But it is unclear to me that dynamic (or static) load balancing is even needed after this PR. Adding The maximum argument length is an annoying limitation of the unix kernels (apparently added to linux to avoid performance issues with their allocator), but was otherwise fairly trivial to implement, and is also related to logic that would used if this did support multiple threads again. The limit is per-argument, so it could be higher if this called directly into dot instead of using sh, but that |
|
Doesn't look quite right when running Doxygen on its own code. Running Running So that is more than 9x slower! (on an 8 core system running macOS) Looks like it only does one batch command with multiple dot files and then starts running them one at a time. |
|
Looks like it is accidentally making a (crude) estimate of the quality of |
|
I still see some blocking issues. Most images do not show up (they link to |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bering bug Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
The duplicate thing sounds quite odd. It looks like this is intentional in doxygen for a couple years (ab35a71) because the patcher makes an svg inside an svg inside an svg. Looking into this further, I also seem to have files where doxygen erased the whole file after dot generated it: It looks like these issues were caused by the primary graph id field being populated as "page0,1_graph0" instead of the expected "graph0" for files after the first one. This breaks the simple text replacement code. Filed bug as https://gitlab.com/graphviz/graphviz/-/work_items/2827, though made a workaround fix here to not deal with waiting for a release and version checks |
|
I think performance should be further improved by combining worker threads and batch operation. The user should be enable to specify the number of worker threads (via the existing Also, specifying multiple targets seem to be possible using multiple Example will generate |
Seems reasonable, though note the PR already seems to give a 50% performance enhancement over using all possible threads, which seemed worthwhile on its own already–before complicating review further with bringing back threads?
Interesting ideas, though I don't think it is worth it. The additional savings from a couple more command line arguments is negligible. The additional complexity to batch jobs by their total set of extensions seems unpleasant (esp. in C++) and not justified for the bugs it may cause. This PR already does the rename to ensure existing links work. |
One of the Graphviz maintainers here. I’m surprised you’re hitting these OS limits, but I’m supportive of adding something like this to Graphviz if it helps you. |
@Smattr Thanks for your interest in helping out. Let me give some context and explain what I think would be the best way to improve Graphviz for use with Doxygen. For large projects, when you enable all dot related options, Doxygen can produce a huge amount of dot files (the number can easily run in the tens of thousands). Currently, for each image dot is invoked. To speed things up Doxygen can already use multiple worker threads to launch multiple instances of dot in parallel. What this PR has demonstrated is that the overhead of launch dot many times is very significant. If dot is instructed to produce a whole set of files in one go it is much faster (like 10x faster!). Currently, the only way to get Ideally, I would like to be able to pass a file, where each line represents an invocation of the "/tmp/output/html/test.dot_da268e64a51e9a0e393f9cefd6bb4b81.dot" -Tpng -o "/tmp/output/html/dot_test.dot_da268e64a51e9a0e393f9cefd6bb4b81.png"
"/tmp/output/html/test.dot_da268e64a51e9a0e393f9cefd6bb4b81.dot" -Tcmapx -o "/tmp/output/html/dot_test.dot_da268e64a51e9a0e393f9cefd6bb4b81.map"
"/tmp/output/latex/test.dot_da268e64a51e9a0e393f9cefd6bb4b81.dot" -Tpdf -o "/tmp/output/latex/dot_test.dot_da268e64a51e9a0e393f9cefd6bb4b81.pdf"When running e.g. dot "/tmp/output/html/test.dot_da268e64a51e9a0e393f9cefd6bb4b81.dot" -Tpng -o "/tmp/output/html/dot_test.dot_da268e64a51e9a0e393f9cefd6bb4b81.png"
dot "/tmp/output/html/test.dot_da268e64a51e9a0e393f9cefd6bb4b81.dot" -Tcmapx -o "/tmp/output/html/dot_test.dot_da268e64a51e9a0e393f9cefd6bb4b81.map"
dot "/tmp/output/latex/test.dot_da268e64a51e9a0e393f9cefd6bb4b81.dot" -Tpdf -o "/tmp/output/latex/dot_test.dot_da268e64a51e9a0e393f9cefd6bb4b81.pdf"If you could help with such option, then I can use it in Doxygen, and it will bring a huge speed-up without having to force things to fit with the limitations of the |
|
@Smattr I've created a simple proof concept patch that adds the new option. It diffs against Graphviz main (added the |
My optimal request above was that this should read from stdin, not a file. Currently it only expects sequential |
|
I think it is easy to combine both requests, by allowing e.g. |
|
I forgot one other detail, which is that the streaming mode needs a way to indicate when each is finished and dot is ready to handle the next job. A simple byte marker (NUL) between outputs would suffice. |
|
Seems this feature request has already expanded in scope :)
You mean major version? We would usually only bump minor version for an additive feature like this.
I would favour the format being NUL-byte separated so the Graphviz parsing logic can be simplified. The caller (Doxygen) can also more easily produce such a thing with a programmatic equivalent of
This will unfortunately be significantly more complicated. Graphviz uses a lot of globals, both in obvious places and in less obvious places. Tracking these down and resetting/updating them in between graph runs is going to be a heavy lift. It was just never designed with this execution model in mind. Taking a batch file that is simply a list of further argv elements is not too hard. This more ambitious “multi-job” list is going to be trickier.
TBH I don’t know why Graphviz doesn’t do this already. How does anyone delimit multiple outputs on stdout today? Though admittedly this kind of separation logic would only work for text-based output formats. |
|
BTW why is this a Graphviz-specific issue? I realise for Doxygen’s concrete case it is. But these Windows limits must plague others too. My naive thought is that one could implement a generic |
|
Windows doesn't have execve, it only has CreateProcess. This limit is mandated in the posix standard, since the memory needs to be copied out of the old process before replacing it with the new one, it doesn't like to do that too much during execve. If all that mattered was posix performance, I'd say just use posix_spawn and eat the extra bit of overhead. Unfortunately, CreateProcess, despite being much more limited than posix_spawn, is also very much slower due to virus scanners and other similar design decisions at Microsoft. |
No only the minor version, just something we can check against to tell if the feature works for the installed version of dot or not (it can take a long time before distros update their tool versions).
That's fine for Doxygen's purpose, but if an end user wants to create such a file for some reason it may be a bit more involved to create in an editor and then use e.g.
With the proof of concept I made, it just loops at the highest level. I hope this is sufficient to (re)set the globals used. Seemed to work fine.
I prefer a simple solution over no solution of course, but ideally it should support multiple jobs. Otherwise, we still need to launch dot for each set of options and use the
I think this is a different operating model, where inputs can be streamed into dot's stdin via some format (with support for options and dot files) and output is then also streamed to stdout in a structured way (with file names, possible error messages, and output binary files). I would consider this a different request. At least not needed for what I had in mind. This could also be created as a new/separate "dot-server" tool that uses graphviz as a library. I'd like to avoid shipping this kind of tool with Doxygen though, because it introduces a build time Graphviz package dependency, and the licenses are not compatible. |
Sure, it’ll probably work for some narrow set of cases. Though even the history of this PR tripped over a scenario where this doesn’t/didn’t work. My point is that this is not a well-used/-tested path, so there are likely a bunch of known and unknown bugs. Having said that, my vague gestures towards boogie men aren’t particularly satisfying. I don’t want to stymie progress here. If you’re willing to accept we’re driving off road here and might hit a few trees, we can use Doxygen as a pilot to drive this Graphviz feature’s development. I suggest this should involve some seat belts on your side in the form of (a) a way to run batch and single runs, then diff the results and (b) a way for users to opt-out of the batch mode if/when bugs are hit. And just to set expectations, this is all just my personal stance. I have not polled the other Graphviz maintainers for their opinion on this topic. If we really want to do this, the next step would be filing a Graphviz issue to get buy-in from the others. |
When rendering multiple jobs, one after the other, paging would be reinitialized but this logic omitted initializing the current page number. As far as I can tell, this is simply because `init_job_pagination` never anticipated that the current page number could be non-zero when it was called. This is described as a regression in 2.40.0 because the visible side effects of this were inadvertently introduced by 6fec15b. But the change introducing this was actually fixing a bug, so it is debatable whether this bug should really be considered as being introduced then. Github: doxygen/doxygen#12037 (comment) Gitlab: fixes #2827 Reported-by: Jameson Nash
|
@Smattr I think it is a good idea to go ahead and file an issue for this request, so we know if there is buy-in. I'd expect that similar issues will already be present when using the existing In any case, I will make sure there is a fallback for the users in case there are unforeseen issues. |
|
Done: https://gitlab.com/graphviz/graphviz/-/work_items/2831 Also for reference for anyone writing version gates, the fix to the page count issue went into Graphviz commit b620cbbb8e40dacebc19ce494115f64e17b87804 which, if all goes to plan, should land in Graphviz 14.1.5. |
|
Does this new batch mode have to be implemented as a new flag to |
That is certainly also an option as far as I'm concerned. Knowing a little bit how things are structured in GraphViz, I can imagine this makes sense. For Doxygen, one can already configure the path to the dot executable, so if this is pointing to differently named executable, it can be used as a way to know that batch mode can/should be used. |
…ygen Previously this was likely 1.9.8, with the Ubuntu 24.04 worker. Now this is 1.17.0-dev. Fixes 3 significant issues for LLVM: - `dot` execution performance is very slow (cuts this half hour step down to mere seconds). doxygen/doxygen#12037 - multi-thread performance is very slow (worse than single threading), and now uses all cores for ncpu times speedup (when using version with fix, autodetected by cmake). doxygen/doxygen#12027 - file links for IR.cpp and similar files were wrong doxygen/doxygen#11944 Assisted-by: Claude Code
|
I’m having a little trouble rearticulating the motivation for this Graphviz batch mode. In the era of pervasive multicore machines, it seems preferable to me to parallelise individual |
|
In the traditional unix design with fork, the spawning processes cost is roughly inversely proportional to core counts (since the work to spawn a process increases with the thread count, while the cost of doing that work increases with core count). So doing this work from multiple threads gets slower as more threads get added. One-shotting all the graphs in one dot process on one core is actually measurably a lot faster than using all cores because of the overhead of each dot processes, compounded by the overhead of each graphviz thread |
|
Sorry, I think I’m being dense and still not quite getting this. That reply unfortunately left me even more confused. Specifically:
|
|
Ah, sorry, I was answering the wrong question. When forking, the processor must enumerate all the other threads to copy their memory. This involves taking a lock, and lock performance scales negatively with the number of threads, and locks are more costly with more cores because of the coherency protocol. The problem here is actually unix, because fork is so slow it greatly dominates the time of generating all the plots. But if it was only unix at fault here, this could use posix_spawn and do much better. But Windows is also slow here, so the idea of batch mode is to fix all platforms. It doesn't and shouldn't, but that is how it was implemented in doxygen presently. That could be fixed too. I've proposed that could be done as a followup, but thought it wouldn't be necessary to do in this PR. Lastly, to your other point of waiting for unbalanced work in batch mode: my proposed enhancement for load-balancing was originally to add a streaming mode to dot/doxygen (ala #12028), not a batch mode. But the work being done by doxygen appears to be a very large amount of very small work typically, so statically-scheduled batch mode (and even single-thread mode) is pretty close to equivalent. |
|
Ah OK. So the problem here really is Windows, because it sounds like using Since you understand the requirements better than me/us and we have not encountered other Graphviz users with this use case, do you want to develop this batcher/streamer within Doxygen? We could then look at upstreaming it into Graphviz. Re comments on #12028:
Yes, this is a long standing problem. We have https://gitlab.com/graphviz/graphviz/-/work_items/2558 but we’re literally years away from resolving that.
From the perspective of the Graphviz maintainers, I don’t think we have any qualms with this. AT&T are the copyright holders, so it would be them if anyone taking issue with this. I don’t think they have any active interest in Graphviz these days. (with the proviso IANAL) |
|
@vtjnash I've merged this PR (see 08e95ca). I've added threading support back in and made the batch size a configuration option (DOT_BATCH_SIZE). @Smattr For now I limit the path size to a safe value. Somehow Github doesn't recognize it as being merged. |
|
Might be that GitHub doesn't want to close as it still has some conflicts? As it is merged I think this PR can be closed. |
|
Awesome, thanks! |
Replace the per-.dot-file DotRunner/ThreadPool architecture with a single DotRunner that holds a global queue of all jobs across all source files. Running dot is now batched as one invocation per output format using -O (auto-naming) instead of
-o:dot -Tpng -O file1.dot file2.dot ...The DOT_MULTI_TARGETS setting is now ignored (old default OFF, now assumed ON) since the corresponding feature in graphviz is now over 20 years old (1.8.10).
Alternative to #12028. For llvm, this generates all plots in about 20 seconds, which seemed not worth the additional effort to make it threaded (though it could still do so if other projects are making substantially more complicated or more numerous graphs than llvm).
As before, a disclaimer that this was written almost entirely by Claude (though you can potentially guess at all my requested fixes to the initial prompt from the list of commits). I'm happy to clean this up more if this looks generally acceptable (and I recommend squashing on merge).