include_server: Add opt-in features to work around boost headers#559
Open
akarle wants to merge 5 commits intodistcc:masterfrom
Open
include_server: Add opt-in features to work around boost headers#559akarle wants to merge 5 commits intodistcc:masterfrom
akarle wants to merge 5 commits intodistcc:masterfrom
Conversation
While I can imagine many use cases for this, the current need is that we would like to set a limit on the resources used by the include_server (via prlimit) to prevent it from pushing a machine into swap. We can do this in the make wrapper passed to pump, but only if the PID is exported. Signed-off-by: Alex Karle <akarle@mathworks.com>
Files in system directories (i.e. /usr/lib) are not sent over the network (instead added to the build area via a symlink forest) as an optimization. This patch takes that optimization one step further -- files identified as under a system directory are not only not sent, but they are not parsed for their own #include's and #define's. While this can create problems if a macro defined in these headers is used in an #include in the source to compile, it can also significantly speed up compilation for system headers that are known slow to parse (such as boost). A future patch implements a -DMACRO=VAL switch that allows the user to list any necessary macros skipped that are critical for #include parsing to work around this problem. Signed-off-by: Alex Karle <akarle@mathworks.com>
System directories are by default derived from the compiler, and are used to determine which headers do not need to be sent to the remote build hosts (and which areas do not need parsing, should -S be given). This patch adds a switch to the include_server to allow the user to define additional system directories. An example use case is if all build hosts have access to the same NFS share -- this can be considered a "system directory", even though the compiler doesn't view it as such, we can be sure that all machines have access to the area (and thus don't need to receive those files via the normal transfer mechanism). Signed-off-by: Alex Karle <akarle@mathworks.com>
When the include_server is told to skip parsing system headers (-S) it can greatly speed up the parsing, but it can also cause missed dependencies if a system header defines a macro used in an #include line. The canonical example of this is boost headers--they are so complex that the include_server hangs parsing them; however, `BOOST_PP_CAT` and `BOOST_PP_STRINGIZE` may be used in include lines: ``` #include BOOST_PP_CAT(ROOT,include/foo.h) ``` This patch adds a workaround to this issue by allowing the user to define system macros used in include lines: ``` INCLUDE_SERVER_ARGS="-DBOOST_PP_CAT(a,b)=a##b" pump ... ``` In practice, the number of macros defined in system headers _that are also used in #include lines_ should be rather small. An alternate take on this same problem was Alastair Rankine's OVERRIDE_MACROS pattern [1], which not only allows users to provide a definition for a macro that was skipped, but also changes the parser to avoid attempting to parse overridden macros. We've only found parsing issues with boost, so the hope is that the ability to override a macro by name is not needed, only the ability to define skipped macros, which allows for this simpler -D implementation. [1]: https://github.com/randomphrase/distcc/tree/include-server-fix Signed-off-by: Alex Karle <akarle@mathworks.com>
As the comment in basics.py says, the USER_TIME_QUOTA is critical to fine tune for ones environment--too little and the cache is dropped frequently (causing churn and more timouts). Too much and the include server can waste cycles getting stuck. This patch allows users to customize it (while keeping the defaults of 3.8 and 4.0 for timeout and check respectively). Note on the big scary FIXME comment--I audited the code base and found that the timer is either manually cancelled (in the case of subprocess.Popen) or the cause for concern (as in f.read()) is legacy. Specifically, as of Python3.5 (2015) and PEP475, most IO routines in python will resume after the signal handlers signals transparently (and we shouldn't be concerned about having to handle EINTR). I kept the comment because I think it's still probably cleaner to just avoid SIGALRM altogether, and _maybe_ there's some strange edge case where IO gets in a bad state if ALRM is signalled during it, but I'm reasonably optimistic that this should be a concern of the past. [PEP475]: https://peps.python.org/pep-0475/ Signed-off-by: Alex Karle <akarle@mathworks.com>
akarle
added a commit
to akarle/distcc
that referenced
this pull request
Nov 24, 2025
NOTE: this assumes distcc#451, distcc#459, distcc#566, and distcc#559 all merge and testing goes well. Signed-off-by: Alex Karle <akarle@mathworks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch series is all dedicated to one common problem: the include_server hangs parsing boost headers due to their complexity.
There was a good bit of work done on this by Alastair (@randomphrase) back in 2014, with the concept of a "macro override" that could be provided to avoid parsing a macro on a per-name basis (and instead using a user provided definition).
The solution that we found that works well in our environment is slightly more straightforward: avoiding parsing on a system/nonsystem basis. More concretely:
#includelines (this should be small in practice)This is all net-new code with zero changes to the default behavior, so it should be safe as-is. It's been tested pretty extensively in-house and proved effective at solving our boost woes.
There's a more detailed description of the rationale behind each feature in each commit message.
CC: @fergushenderson -- you may be interested in this, as I believe you reviewed Alastair's patch in 2014.