include_server: Add opt-in features to work around boost headers by akarle · Pull Request #559 · distcc/distcc

akarle · 2025-10-14T14:52:11Z

This patch series is all dedicated to one common problem: the include_server hangs parsing boost headers due to their complexity.

There was a good bit of work done on this by Alastair (@randomphrase) back in 2014, with the concept of a "macro override" that could be provided to avoid parsing a macro on a per-name basis (and instead using a user provided definition).

The solution that we found that works well in our environment is slightly more straightforward: avoiding parsing on a system/nonsystem basis. More concretely:

A new option allows skipping the parsing of system headers (a huge speedup in itself)
Another option allows backfilling any missing definitions that are needed in #include lines (this should be small in practice)
A third option allows specifying non-compiler-default system directories, for greater control over the impact of (1)
A fourth option allows configuring the parsing timeout, in cases where complex headers are expected and 3.8s isn't enough
Last but not lease, the INCLUDE_SERVER_PID is exported by pump so that we can set a resource limit to avoid it going off the rails

This is all net-new code with zero changes to the default behavior, so it should be safe as-is. It's been tested pretty extensively in-house and proved effective at solving our boost woes.

There's a more detailed description of the rationale behind each feature in each commit message.

CC: @fergushenderson -- you may be interested in this, as I believe you reviewed Alastair's patch in 2014.

While I can imagine many use cases for this, the current need is that we would like to set a limit on the resources used by the include_server (via prlimit) to prevent it from pushing a machine into swap. We can do this in the make wrapper passed to pump, but only if the PID is exported. Signed-off-by: Alex Karle <akarle@mathworks.com>

Files in system directories (i.e. /usr/lib) are not sent over the network (instead added to the build area via a symlink forest) as an optimization. This patch takes that optimization one step further -- files identified as under a system directory are not only not sent, but they are not parsed for their own #include's and #define's. While this can create problems if a macro defined in these headers is used in an #include in the source to compile, it can also significantly speed up compilation for system headers that are known slow to parse (such as boost). A future patch implements a -DMACRO=VAL switch that allows the user to list any necessary macros skipped that are critical for #include parsing to work around this problem. Signed-off-by: Alex Karle <akarle@mathworks.com>

System directories are by default derived from the compiler, and are used to determine which headers do not need to be sent to the remote build hosts (and which areas do not need parsing, should -S be given). This patch adds a switch to the include_server to allow the user to define additional system directories. An example use case is if all build hosts have access to the same NFS share -- this can be considered a "system directory", even though the compiler doesn't view it as such, we can be sure that all machines have access to the area (and thus don't need to receive those files via the normal transfer mechanism). Signed-off-by: Alex Karle <akarle@mathworks.com>

When the include_server is told to skip parsing system headers (-S) it can greatly speed up the parsing, but it can also cause missed dependencies if a system header defines a macro used in an #include line. The canonical example of this is boost headers--they are so complex that the include_server hangs parsing them; however, `BOOST_PP_CAT` and `BOOST_PP_STRINGIZE` may be used in include lines: ``` #include BOOST_PP_CAT(ROOT,include/foo.h) ``` This patch adds a workaround to this issue by allowing the user to define system macros used in include lines: ``` INCLUDE_SERVER_ARGS="-DBOOST_PP_CAT(a,b)=a##b" pump ... ``` In practice, the number of macros defined in system headers _that are also used in #include lines_ should be rather small. An alternate take on this same problem was Alastair Rankine's OVERRIDE_MACROS pattern [1], which not only allows users to provide a definition for a macro that was skipped, but also changes the parser to avoid attempting to parse overridden macros. We've only found parsing issues with boost, so the hope is that the ability to override a macro by name is not needed, only the ability to define skipped macros, which allows for this simpler -D implementation. [1]: https://github.com/randomphrase/distcc/tree/include-server-fix Signed-off-by: Alex Karle <akarle@mathworks.com>

As the comment in basics.py says, the USER_TIME_QUOTA is critical to fine tune for ones environment--too little and the cache is dropped frequently (causing churn and more timouts). Too much and the include server can waste cycles getting stuck. This patch allows users to customize it (while keeping the defaults of 3.8 and 4.0 for timeout and check respectively). Note on the big scary FIXME comment--I audited the code base and found that the timer is either manually cancelled (in the case of subprocess.Popen) or the cause for concern (as in f.read()) is legacy. Specifically, as of Python3.5 (2015) and PEP475, most IO routines in python will resume after the signal handlers signals transparently (and we shouldn't be concerned about having to handle EINTR). I kept the comment because I think it's still probably cleaner to just avoid SIGALRM altogether, and _maybe_ there's some strange edge case where IO gets in a bad state if ALRM is signalled during it, but I'm reasonably optimistic that this should be a concern of the past. [PEP475]: https://peps.python.org/pep-0475/ Signed-off-by: Alex Karle <akarle@mathworks.com>

NOTE: this assumes distcc#451, distcc#459, distcc#566, and distcc#559 all merge and testing goes well. Signed-off-by: Alex Karle <akarle@mathworks.com>

akarle added 5 commits October 14, 2025 09:46

akarle added a commit to akarle/distcc that referenced this pull request Nov 24, 2025

doc: Update NEWS for new v3.5 release

638988e

NOTE: this assumes distcc#451, distcc#459, distcc#566, and distcc#559 all merge and testing goes well. Signed-off-by: Alex Karle <akarle@mathworks.com>

akarle mentioned this pull request Nov 24, 2025

doc: Update NEWS for new v3.5 release #569

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

include_server: Add opt-in features to work around boost headers#559

include_server: Add opt-in features to work around boost headers#559
akarle wants to merge 5 commits intodistcc:masterfrom
akarle:ak/include-server-features

akarle commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

akarle commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant