
Experimental: Gradual conversion into Rust #505

Draft: wants to merge 71 commits into master
Conversation

@sourcefrog (Contributor) commented Mar 5, 2024

This PR is still exploratory and experimental. I might merge it before everything is converted, but only after enough has been done to validate the approach.

Basically, the idea is to gradually convert distcc into Rust, as a more or less function-for-function or even line-for-line conversion, using a bindgen FFI between them until eventually all the C is removed.
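A minimal sketch of what that FFI seam looks like, under the assumption stated above and nothing more: the extern "C" declaration below is hand-written, but bindgen generates exactly this kind of binding from C headers automatically. libc's strlen stands in for an arbitrary distcc C function here.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hand-written binding; in the real conversion, bindgen would emit
// declarations like this from distcc's headers. strlen is a stand-in.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// A thin safe wrapper so the rest of the Rust code never touches raw pointers.
fn c_strlen(s: &CStr) -> usize {
    // SAFETY: a &CStr is always a valid NUL-terminated string.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let name = CStr::from_bytes_with_nul(b"distcc\0").unwrap();
    println!("{}", c_strlen(name)); // prints 6
}
```

As C functions get replaced one by one, callers switch from the extern block to the native Rust implementation, and the binding is deleted.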

The approach is inspired by the Fish shell conversion, amongst others: https://github.com/fish-shell/fish-shell/blob/master/doc_internal/fish-riir-plan.md

Why do this?

Why not?

Old or niche architectures

We might lose support for some old or obscure platforms, which would be a bit of a shame because people do report using distcc to make development on old machines more pleasant; for example, people post about using distcc on PowerPC. However, Rust does seem to have some support for these targets: https://doc.rust-lang.org/beta/rustc/platform-support.html. In the worst case, since I don't plan to change the protocol, people could still use old clients.

Anyhow, a maintained distcc on newer platforms is probably better than an unmaintained version everywhere.

Porting Rust (and particularly LLVM) to an older platform is probably a lot of work. On the other hand, Linux and other foundational software seem likely to require Rust on those platforms over time anyway.

At the same time, Rust is much less likely than C to show unintended cross-platform variation, so platform-specific bugs like #476 should be rarer.

Harder to contribute?

In principle there might be people who know C and want to contribute a fix and who don't know Rust. I think it's not that hard to work out how to make small changes, even though there is a learning curve to whole-program design in Rust. But also, if I'm going to pick this up, I'd rather optimize for my own productivity.

How?

In contrast to the Fish approach, what I am going to try here is to convert from the outside in: replace autoconf/make with Cargo, have Cargo build all the C code with a small Rust shim that invokes the C main, and then gradually replace other parts. So far it seems to be working.

Following the path of Fish I'm going to try to first convert everything to Rust without making it especially idiomatic, and then perhaps refactor later.

I plan to add some unit tests for the C code as I go, calling it from Rust through the bindings, and use that to check that the Rust code has the same behavior. There is some risk that the intended behavior of the C code is ambiguous and so I might misinterpret it in the same way in both the tests and the Rust replacement. But again, this is probably better than it just remaining untested.
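A sketch of that differential-testing pattern, assuming nothing about distcc's actual symbols: libc's toupper stands in for a C function under test, and the check asserts that a candidate Rust replacement agrees with it byte for byte (the real tests would bind distcc's own functions through bindgen).

```rust
use std::os::raw::c_int;

// libc's toupper stands in for a distcc C function under test.
extern "C" {
    fn toupper(c: c_int) -> c_int;
}

// The candidate Rust replacement for the C function.
fn rust_toupper(c: u8) -> u8 {
    c.to_ascii_uppercase()
}

// In the real crate this would be a #[test]; shown as a plain function
// so the comparison loop is easy to read. Exhaustively compares the
// C and Rust behavior over all ASCII bytes (C locale assumed).
fn check_parity() {
    for c in 0u8..=127 {
        let from_c = unsafe { toupper(c as c_int) } as u8;
        assert_eq!(from_c, rust_toupper(c), "mismatch at byte {c}");
    }
}

fn main() {
    check_parity();
    println!("C and Rust agree on all 128 ASCII bytes");
}
```

Once both sides pass the same test, the C function can be deleted and the test retargeted at the Rust version alone.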

I don't intend to make breaking changes to the protocol.

I'm not going to merge this until I'm pretty sure it will work well, but I might merge it before all the C code is converted, if it's going very well and I'm sure it can complete.

I'll set up GitHub actions to, at least, run the cargo tests.

Preconditions

I'll make a new release first before landing any of this (#503).

Pending PRs

I'll review and, where appropriate, merge all pending bugfix PRs, so the work in them is not lost in the rewrite. A lot of open PRs have built up.

PRs that add new features, or that are hard to assess, might actually be better redone in Rust.

TODO

  • Build and test in CI
  • Try testing the C code altered in #500 (fix for distcc#497, "Segmentation fault" when running distcc with undef…)
  • Convert some small leaf-node files
  • Build distccd, etc.
  • trace.c relies on passing va_args that are later formatted through vsprintf, which might be difficult in Rust. Maybe the callbacks should instead take a char*...
  • Run the existing make test tests against the chimeric binary
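For the trace.c item above, one hedged option (the names here are invented for illustration, not distcc's actual API) is to do all formatting on the Rust side with format_args! and hand callbacks a finished string, sidestepping va_list entirely:

```rust
use std::fmt::Arguments;

// Hypothetical callback type: instead of (format, va_list) as in trace.c,
// the callback receives an already-formatted message.
type TraceFn = fn(level: u8, msg: &str);

// Formatting happens exactly once, here, so no callback ever has to
// re-interpret varargs the way vsnprintf does in C.
fn trace(cb: TraceFn, level: u8, args: Arguments) {
    cb(level, &args.to_string());
}

fn main() {
    let to_stderr: TraceFn = |level, msg| eprintln!("distcc[{level}] {msg}");
    trace(to_stderr, 2, format_args!("connecting to {}:{}", "host1", 3632));
}
```

Across the FFI boundary the same idea would mean passing the formatted buffer to C callbacks as a plain char*, as suggested in the bullet.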

@sourcefrog sourcefrog changed the title Gradual conversion into Rust Experimental: Gradual conversion into Rust Mar 5, 2024
@asheplyakov (Contributor) commented Mar 5, 2024

I think the main problem is the lack of unit tests. Conversion to Rust (or any other language) won't magically improve this. Actually it's the other way around: there will be more bugs due to rewriting, and these bugs won't be noticed since there are no tests.

For instance, the root cause of #476 seems to be incorrect "cross rewriting" (see #476 (comment)). Sure, there will be fewer (or even no) segfaults in the Rust version, but

distccd[66787] (dcc_execvp) ERROR: failed to exec powerpc-unknown-linux-gnu-gcc: Permission denied

is not going to change.

P.S.
Shameless plug: there's a Python implementation (both server and client) over here. It can be used on "weird" systems where Rust is not available (or with "weird" compilers not supported by distcc). However, it supports only protocol version 1 (no compression, no server-side preprocessing).

@sourcefrog (Contributor, Author)

That's good to know about pdistcc! I'd be happy to have you add a link from the readme or wherever.

You're quite right about the lack of unit tests; for that matter, it also lacks realistic automated integration tests. 2003 was a different world, at least with regard to my attitude to automated testing (and arguably for the industry as a whole): I would just build and run it on a large tree and that was enough.

Also, this tree is showing the typical issues of open source projects: people want to add features and don't find it easy to add tests, and maintainers including me have historically accepted the features without the tests, making it more complex.

Rust won't magically fix the lack of tests, but I do have some tentative hope that it supports better testing. In my experience it's just easier to write unit tests in Rust, because you have more facilities readily available to construct test data, make assertions, build and inspect strings, and so on.

It's even somewhat easier to write tests for C functions in Rust, as in c_unit_tests.rs here. Most of the test is safe Rust.

So, if I proceed with this, or at least for the experiment, I would probably write unit tests for things as I go. At least, I'm interested to see how that goes.

There's still some risk that the intended behavior of the code is unclear or there are behaviors that are not easily tested well at the function level and so trying to write a unit test from looking at the existing code would miss something. If I misunderstand it I might carry the same misunderstanding into both the rewrite and the test.

It would also be really good to have a script that sets up server VMs and runs the client against them as a realistic integration test and somewhat of a benchmark. This is complicated by there being a lot of options that change the behavior, but at least we could test the most central path.

Having an integration test is orthogonal to Rust: it would make the C more maintainable or increase confidence in a reimplementation. However I think it's only worth building an integration test if there's a feasible path to making the code more clean and maintainable.

In #476 (comment), EPERM on PowerPC, I'm not sure what's causing it but the association with malloc errors makes me think at least part of the problem is a memory corruption bug. Did you have a different idea?

@sourcefrog (Contributor, Author)

Actually I see there are now a decent number of tests run by make check, so as long as they keep running it might give some assurance that large changes are not breaking anything.

@asheplyakov (Contributor)

In #476 (comment), EPERM on PowerPC, I'm not sure what's causing it but the association with malloc errors makes me think at least part of the problem is a memory corruption bug. Did you have a different idea?

I think there are several problems in #476:

  1. heap corruption in dcc_rewrite_fqn, which has been fixed by commit 879b71d
  2. incorrect assumption about the native compiler triplet, also fixed by now (by commit 850db9e)

@sourcefrog (Contributor, Author)

Ah ok, thank you very much for fixing them, and that sounds like a good prompt to make a new release soon.

But also, part of it is a memory corruption bug and part of it, arguably, is a problem from the weird old autoconf machinery. So, although this specific bug is now fixed, I think both of the causes are the kind of thing that would tend to go better in Rust.

@wtarreau commented Apr 6, 2024

Why not do that in a new project (e.g. by forking and renaming this one)? That would at least allow the current one to continue to be maintained in best effort mode. Rust still has many problems, starting from the fact that nobody has the same version (if at all) on their systems, and making the current project subject to such constraints would hurt it more than it would help :-( It would also ease fixing bugs in the current project, because if there were just a new branch, things would diverge quickly to the point of no longer being backportable.

@wtarreau commented Apr 6, 2024

Also, for the server side it's particularly convenient to have very few dependencies: it makes it easy to build distccd and turn any mostly-idle system into a distcc server. I'm sure many of us have headless machines for this that were built using methods far from perfection.

@sourcefrog (Contributor, Author)

I'll emphasize that I haven't decided to even finish this, let alone actually make new releases from the Rust implementation. It's just an experiment to see what this pattern looks like, and also to just get familiar with this code again.

Why not do that in a new project (e.g. by forking and renaming this one)? That would at least allow the current one to continue to be maintained in best effort mode.

So, perhaps yes, a new Rust implementation could be released as a different package name, to make it easier for any downstream packagers/distributors to ship both.

Equally well if the trunk becomes Rust (which is not yet decided), people can continue to run, or package, previous versions.

But, you're quite right, changes would not be easily ported between the two so one would tend to die off.

But also, let's be realistic what "best effort maintenance" means: the previous maintainers made no releases for years, and nobody stepped up to replace them, and when I search my heart I just don't find much enthusiasm for putting my personal time into writing or reviewing code doing string manipulation in C or working on autoconf.

(However, independently of this PR, I will make a new C release, and review and if possible fix&merge the accumulated PRs.)

Rust still has many problems, starting from the fact that nobody has the same version (if at all) on their systems, and making the current project subject to such constraints would hurt it more than it would help :-(

I don't want to rehearse all the pros and cons of Rust, which certainly exist, but can you say more about why it would matter that people have the same version? This kind of program should build on any reasonably recent Rust toolchain, and clients and servers would not need to be built with the same rustc.

(As a detail, this branch requires Rust 1.77 or later, because that version added the c"hello" syntax for C string literals, but I would not expect to bump the minimum version much after that.)
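For reference, the 1.77 feature in question: a c"…" literal produces a compile-time-checked, NUL-terminated &CStr, which is convenient when calling into the remaining C code.

```rust
use std::ffi::CStr;

fn main() {
    // A C string literal: &'static CStr, NUL-terminated, checked at compile time.
    let greeting: &CStr = c"hello";
    // Before 1.77, the same value needed a runtime-checked conversion:
    let old_way = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    assert_eq!(greeting, old_way);
    println!("{}", greeting.to_str().unwrap());
}
```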

I can imagine there might be corporate environments where people really want to build the current distcc using only the toolchain from a 5-year-old RedHat without installing any new tools, and this change would be hard for them, but, I'm just not sure I should optimize for that. Or, as mentioned above, people on retrocomputing platforms, but perhaps they can use the C or Python client.

Also, for the server side it's particularly convenient to have very few dependencies: it makes it easy to build distccd and turn any mostly-idle system into a distcc server. I'm sure many of us have headless machines for this that were built using methods far from perfection.

I think this actually works much better in Rust: it builds binaries that are self-contained (quasi statically linked) aside from libc. It's easy to copy the server binary onto a machine, and the server need not have any Rust tooling installed. Cross builds should also be much easier.
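As a sketch of what the cross-build story could look like (the linker name below is illustrative and depends on your cross-toolchain; check rustc's platform-support page for each target's tier):

```toml
# .cargo/config.toml on the build host. sparcv9-sun-solaris is a real
# rustc target name; the linker binary name is an assumption here.
[target.sparcv9-sun-solaris]
linker = "sparcv9-sun-solaris2.11-gcc"
```

With that in place, something like `rustup target add sparcv9-sun-solaris` followed by `cargo build --release --target sparcv9-sun-solaris` would produce a binary to copy onto the Sparc machine, which needs no Rust tooling of its own.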

@sourcefrog (Contributor, Author)

PS, looking at the code with 20 years more experience and with the lens of working in Rust, it is remarkable how informal or loose it is about memory management, e.g. functions that sometimes return a pointer into a newly allocated buffer or sometimes a pointer into an existing string. Probably in most cases the net result is a harmless memory leak that will be collected when the short-lived process exits. But, still, it reinforces my convictions about the likelihood of lurking bugs and the desirability of stronger compiler checks.
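That C pattern, where a function sometimes returns a freshly allocated buffer and sometimes a pointer into its input, maps directly onto Rust's Cow, which makes the ownership visible in the type. The function below is a made-up illustration, not distcc code:

```rust
use std::borrow::Cow;

// Hypothetical example: ensure a compiler name carries a target prefix.
// In C this might return either its argument or strdup'd memory, leaving
// the caller to guess what to free; Cow encodes the answer in the type.
fn with_prefix<'a>(name: &'a str, prefix: &str) -> Cow<'a, str> {
    if name.starts_with(prefix) {
        Cow::Borrowed(name) // no allocation: borrows the input
    } else {
        Cow::Owned(format!("{prefix}{name}")) // new allocation, owned
    }
}

fn main() {
    println!("{}", with_prefix("gcc", "sparc-sun-solaris2.11-"));
    println!("{}", with_prefix("sparc-sun-solaris2.11-gcc", "sparc-sun-solaris2.11-"));
}
```

Either way, the compiler tracks when the value is dropped, so the "harmless leak until process exit" pattern simply can't arise.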

@wtarreau

I get some of your points and am not commenting on possible old code quality. We've all had our era working on other systems with different constraints etc, and our own experience has improved our standards over time. But I mean, users of distcc use it exclusively to build C projects, and usually you install something like distcc and ccache when you expect to build a lot. I.e. you're either a distro packager or a developer, and in both cases you're fluent with the language (at least to fix build issues). This is also how some fixes and improvements came (I still have some tiny patches on various machines to propose BTW). Here, facing a totally foreign language could be a significant showstopper. Also, the language is not yet available everywhere, let alone in a recent version. A few examples of the systems I'm using distcc/distccd on:

$ rustc --version
rustc 1.58.1
$ slackpkg search rust|grep -w rust
  uninstalled              patches                      rust-1.46.0-x86_64-1_slack14.2            
$ apt search ^rust
rust-all/focal-security,focal-security,focal-updates,focal-updates 1.70.0+dfsg0ubuntu1~bpo2-0ubuntu0.20.04.1 all
$ apt search ^rust
 rust-all/unstable 1.63.0+dfsg1-1 all
$ pkg search ^rust-
rust-1.66.0                    Language with a focus on memory safety and concurrency
$ sudo pkgutil -a | grep -ci rust
0

The last machine is interesting, BTW: as one of those benefitting the most from distcc, it's a dual-CPU Solaris Sparc that divides its build time by 15 thanks to distcc. One of the big benefits of distcc is precisely that it lets developers keep occasionally using somewhat outdated systems for compatibility tests and bug reproduction, while keeping the experience reasonably comfortable.

Conversely, if one needed a comparable tool to build Rust code (and I think it would deserve one, considering Rust is much slower to compile than C due to the extra checks), I would find it totally natural for that tool to be written in the same language, for the reasons above (availability of the build environments and of knowledge among its users).

Maybe it would actually make a lot of sense to start a "distrust" project to provide distributed builds for Rust, keeping in mind the points needed for C so that it might ultimately cover both. It would then add value instead of removing it where the project is already super useful.

@sourcefrog (Contributor, Author)

Thanks for continuing to post your good thoughts, and listening to me kicking it around: it's helpful to me in weighing it up and even just working out what to do with distcc.

Rust on minor platforms

I definitely realize that people will want to run distcc on old or minor platforms, and I appreciate the apparent strangeness and likely friction of making C builds depend on Rust.

Supporting old platforms even constrains the C implementation somewhat, although in a different way: the code has lots of ifdefs, autoconf checks, and vendored libraries. I would cut many of them if I knew I only cared about modern Unix, but to support your Sparc Solaris I want to leave them, even though it leaves things a bit cluttered. (I wonder if anyone still runs a build farm for open source developers where I could get a shell to test or debug things. I miss my Sun, such a classy machine!)

It seems that Linux and Solaris on Sparc are supported by Rust, but only as cross targets, not as hosts. It would be a bit more friction for people running Sparcs if they can't build natively, but almost certainly all of them have some other machine where they can do the cross build. And given the relative CPU speeds, a native build would be pretty slow even if it were supported.

On the other hand I think I'd be 100x more confident that a cross built tool would work reliably in Rust than in C, due to tighter typing of ints and pointers, no undefined behavior, no per-platform variation in the standard library, etc.

We could ship cross-built binaries from CI and as noted the quasi-static binaries should be broadly usable on that kernel/arch. But some people would of course have reasons to build from source.

And I guess there is some practical limit, where a binary will fail to run on some old distro/kernel due to missing syscalls or libc changes? I don't know how far back you can take them and expect them to work.

Distro availability

Also, the language is not yet available everywhere, let alone in a recent version. A few examples of the systems I'm using distcc/distccd on:

Yes, that's a thing, and would have some cost: even on active ARM/x86 distros, the toolchain can lag upstream Rust by quite a bit, and some packages expect a recent compiler. I have read some of the history of people trying to work out how to make Rust's opinionated package manager work with Debian. I think a lot of developers follow the general advice to install from rustup.rs rather than their distro package and that works fine, but some people may not want to or be able to install from upstream. In any case it would be an impediment to getting a Rust-based distcc into a distro, who would very reasonably insist on using their packaged toolchain.

I guess an option would be to say the tested minimum supported Rust version should be conservative, perhaps the version from Debian stable (currently 1.63, June 2022) or even oldstable (currently 1.48, November 2020). That wouldn't fix every situation but might let you build on anything released in the last few years.

distrust

Yes, in some ways a "distrust" would be an obvious place to work if one is excited about both Rust and distributed compilation. Someone previously filed a feature request for it, #482. But I wouldn't recommend starting from distcc: aside from the code being old, it's very C-oriented; you could take the idea but not the code.

However, it seems that sccache already exists and can do this, although I personally haven't had a compelling case to use it yet.

https://github.com/mozilla/sccache/blob/main/docs/DistributedQuickstart.md

(Aside from there being an existing solution, I personally already have a Rust toolchain project, cargo-mutants, and I don't need another.)

Prioritizing distcc in general

Another big question that I don't know the answer to is: where does distcc stand these days vs icecream or other options? People are still using and interested in distcc, which is nice, but if it turns out that something else offers a superset of its features then perhaps investing time in this isn't a good choice.

@wtarreau

Thanks Martin for the discussion, it's fruitful. I'm not going to respond to all points / comments (and others should definitely participate, I don't want to flood the discussion with only my own view/experience).

Regarding shipping binaries for odd platforms, that could definitely work. Nobody uses these platforms for pleasure; even if it has its fun, it's essentially to reproduce issues and test compatibility/portability, so we're generally not very demanding about the origin of the tools. On Solaris there's OpenCSW, which distributes binaries, and I think most people are fine with that. I picked gcc-5.5 from there, for example. I recompiled mine later to change the hard-coded linker (18+ hours), but even the first distcc I used on this machine (3.1) came from there.

The concern about being able to adjust or fix the utility at the source level is important, but that will essentially be done on a more up-to-date platform where the issue or limitation can be reproduced. I'm more concerned by my own inability to contribute Rust code, and the feeling of dealing with sort-of-closed source in that it will become totally opaque to me. But if the quality, features and performance are on par with the C version, that's often acceptable. After all, most people using binaries from their distros (or even those on Windows) just use the software as shipped and never expect to see or tweak the source (as sad as that may sound to many of us).

Also, regarding a hypothetical "distrust": it would definitely have to be redone from scratch. It wouldn't make sense, IMHO, to constrain oneself to follow the build flow of a C program for another language.

Finally, regarding distcc vs icecc etc.: I tried others and found them just too complicated. I was once told "no, it's not complicated, look, you type the command and it does everything by itself automagically". Except when it cannot, for a myriad of good reasons ranging from lack of read/write disk space, permissions, and the ability to use distinct platforms as destinations, to I don't remember what else. It just worked on none of the machines I tried at the time.

I don't even use the pump mode in distcc. I know it should be way more performant, but the beauty of the default mode is that you can just pick a bunch of nolibc compilers from kernel.org for your host and your target(s), install all of them in the same place with symlinks from a common bin/ directory matching your local names, and that's it, it works. I even build my Solaris binaries using the Linux compiler, because there's no OS dependency in building a .o from a .c once preprocessing is done; it's bare metal at that point.

And for me the gains (15-20x) are quite sufficient, especially when I factor in the extremely simple and flexible deployment. Mind you, I'm mixing x86, ARM64 and ARMv7 machines in the same clusters in a totally transparent way, and I wouldn't be surprised if other users also value this simplicity and flexibility. There's basically no way it can fail, and it takes just 10 minutes to show a coworker how to use your build farm from their laptop with zero config. So I definitely intend to stick to that approach ;-)

Hoping this helps you get a clearer picture of some of your users' usage!

3 participants