
Experimental: Gradual conversion into Rust #505

Draft: wants to merge 71 commits into master
Conversation

@sourcefrog (Contributor) commented Mar 5, 2024

This PR is still exploratory and experimental. I might merge it before everything is converted, but only after enough has been done to validate the approach.

Basically, the idea is to gradually convert distcc into Rust, as a more or less function-for-function or even line-for-line conversion, using a bindgen FFI between them until eventually all the C is removed.
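A minimal sketch of what that FFI seam looks like, under the assumption stated above and nothing more: the extern "C" declaration below is hand-written, but bindgen generates exactly this kind of binding from C headers automatically. libc's strlen stands in for an arbitrary distcc C function here.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hand-written binding; in the real conversion, bindgen would emit
// declarations like this from distcc's headers. strlen is a stand-in.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// A thin safe wrapper so the rest of the Rust code never touches raw pointers.
fn c_strlen(s: &CStr) -> usize {
    // SAFETY: a &CStr is always a valid NUL-terminated string.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let name = CStr::from_bytes_with_nul(b"distcc\0").unwrap();
    println!("{}", c_strlen(name)); // prints 6
}
```

As C functions get replaced one by one, callers switch from the extern block to the native Rust implementation, and the binding is deleted.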

The approach is inspired by the Fish shell conversion, amongst others: https://github.com/fish-shell/fish-shell/blob/master/doc_internal/fish-riir-plan.md

Why do this?

Why not?

Old or niche architectures

We might lose support for some old or obscure platforms, which would be a bit of a shame because people do report using distcc to make development on old machines more pleasant; for example, people post about using distcc on PowerPC. However, Rust does seem to have some support for these targets: https://doc.rust-lang.org/beta/rustc/platform-support.html. In the worst case, since I don't plan to change the protocol, people could still use old clients.

Anyhow, a maintained distcc on newer platforms is probably better than an unmaintained version everywhere.

Porting Rust (and particularly LLVM) to an older platform is probably a lot of work. On the other hand, Linux and other foundational software seem likely to require Rust on those platforms over time anyway.

At the same time, Rust is much less likely than C to show unintended cross-platform variation, so platform-specific bugs like #476 should be rarer.

Harder to contribute?

In principle there might be people who know C and want to contribute a fix and who don't know Rust. I think it's not that hard to work out how to make small changes, even though there is a learning curve to whole-program design in Rust. But also, if I'm going to pick this up, I'd rather optimize for my own productivity.

How?

In contrast to the Fish approach, what I am going to try here is to convert from the outside in: replace autoconf/make with Cargo, have Cargo build all the C code with a small Rust shim that invokes the C main, and then gradually replace other parts. So far it seems to be working.

Following the path of Fish I'm going to try to first convert everything to Rust without making it especially idiomatic, and then perhaps refactor later.

I plan to add some unit tests for the C code as I go, calling it from Rust through the bindings, and use that to check that the Rust code has the same behavior. There is some risk that the intended behavior of the C code is ambiguous and so I might misinterpret it in the same way in both the tests and the Rust replacement. But again, this is probably better than it just remaining untested.
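A sketch of that differential-testing pattern, assuming nothing about distcc's actual symbols: libc's toupper stands in for a C function under test, and the check asserts that a candidate Rust replacement agrees with it byte for byte (the real tests would bind distcc's own functions through bindgen).

```rust
use std::os::raw::c_int;

// libc's toupper stands in for a distcc C function under test.
extern "C" {
    fn toupper(c: c_int) -> c_int;
}

// The candidate Rust replacement for the C function.
fn rust_toupper(c: u8) -> u8 {
    c.to_ascii_uppercase()
}

// In the real crate this would be a #[test]; shown as a plain function
// so the comparison loop is easy to read. Exhaustively compares the
// C and Rust behavior over all ASCII bytes (C locale assumed).
fn check_parity() {
    for c in 0u8..=127 {
        let from_c = unsafe { toupper(c as c_int) } as u8;
        assert_eq!(from_c, rust_toupper(c), "mismatch at byte {c}");
    }
}

fn main() {
    check_parity();
    println!("C and Rust agree on all 128 ASCII bytes");
}
```

Once both sides pass the same test, the C function can be deleted and the test retargeted at the Rust version alone.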

I don't intend to make breaking changes to the protocol.

I'm not going to merge this until I'm pretty sure it will work well, but I might merge it before all the C code is converted, if it's going very well and I'm sure it can complete.

I'll set up GitHub actions to, at least, run the cargo tests.

Preconditions

I'll make a new release first before landing any of this (#503).

Pending PRs

I'll review and, where appropriate, merge all pending bugfix PRs, so the work in them is not lost in the rewrite. A lot of open PRs have built up.

PRs that add new features, or that are hard to assess, might actually be better redone in Rust.

TODO

  • Build and test in CI
  • Try testing the C code altered in #500 (fix for distcc#497, "Segmentation fault" when running distcc with undef…)
  • Convert some small leaf-node files
  • Build distccd, etc.
  • trace.c relies on passing va_args that are later formatted through vsprintf, which might be difficult in Rust. Maybe the callbacks should instead take a char*...
  • Run the existing make test tests against the chimeric binary
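For the trace.c item above, one hedged option (the names here are invented for illustration, not distcc's actual API) is to do all formatting on the Rust side with format_args! and hand callbacks a finished string, sidestepping va_list entirely:

```rust
use std::fmt::Arguments;

// Hypothetical callback type: instead of (format, va_list) as in trace.c,
// the callback receives an already-formatted message.
type TraceFn = fn(level: u8, msg: &str);

// Formatting happens exactly once, here, so no callback ever has to
// re-interpret varargs the way vsnprintf does in C.
fn trace(cb: TraceFn, level: u8, args: Arguments) {
    cb(level, &args.to_string());
}

fn main() {
    let to_stderr: TraceFn = |level, msg| eprintln!("distcc[{level}] {msg}");
    trace(to_stderr, 2, format_args!("connecting to {}:{}", "host1", 3632));
}
```

Across the FFI boundary the same idea would mean passing the formatted buffer to C callbacks as a plain char*, as suggested in the bullet.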

@sourcefrog sourcefrog changed the title Gradual conversion into Rust Experimental: Gradual conversion into Rust Mar 5, 2024
@asheplyakov (Contributor) commented Mar 5, 2024

I think the main problem is the lack of unit tests. Conversion to Rust (or any other language) won't magically improve this. Actually it's the other way around: there will be more bugs due to rewriting, and these bugs won't be noticed since there are no tests.

For instance, the root cause of #476 seems to be incorrect "cross rewriting" (see #476 (comment)). Sure, there will be fewer (or even no) segfaults in the Rust version, but

distccd[66787] (dcc_execvp) ERROR: failed to exec powerpc-unknown-linux-gnu-gcc: Permission denied

is not going to change.

P.S.
Shameless plug: there's a Python implementation (both server and client) over here. It can be used on "weird" systems where Rust is not available (or with "weird" compilers not supported by distcc). However, it supports only protocol version 1 (no compression, no server-side preprocessing).

@sourcefrog (Contributor, Author)

That's good to know about pdistcc! I'd be happy to have you add a link from the readme or wherever.

You're quite right about the lack of unit tests; for that matter, it also lacks realistic automated integration tests. 2003 was a different world, at least with regard to my attitude to automated testing (and arguably for the industry as a whole): I would just build and run it on a large tree and that was enough.

Also, this tree is showing the typical issues of open source projects: people want to add features and don't find it easy to add tests, and maintainers including me have historically accepted the features without the tests, making it more complex.

Rust won't magically fix the lack of tests, but I do have some tentative hope that it supports better testing. In my experience it's just easier to write unit tests in Rust, because you have more facilities readily available to construct test data, make assertions, build and inspect strings, and so on.

It's even somewhat easier to write tests for C functions in Rust, as in c_unit_tests.rs here. Most of the test is safe Rust.

So, if I proceed with this, or at least for the experiment, I would probably write unit tests for things as I go. At least, I'm interested to see how that goes.

There's still some risk that the intended behavior of the code is unclear or there are behaviors that are not easily tested well at the function level and so trying to write a unit test from looking at the existing code would miss something. If I misunderstand it I might carry the same misunderstanding into both the rewrite and the test.

It would also be really good to have a script that sets up server VMs and runs the client against them as a realistic integration test and somewhat of a benchmark. This is complicated by there being a lot of options that change the behavior, but at least we could test the most central path.

Having an integration test is orthogonal to Rust: it would make the C more maintainable or increase confidence in a reimplementation. However I think it's only worth building an integration test if there's a feasible path to making the code more clean and maintainable.

In #476 (comment), EPERM on PowerPC, I'm not sure what's causing it but the association with malloc errors makes me think at least part of the problem is a memory corruption bug. Did you have a different idea?

@sourcefrog (Contributor, Author)

Actually I see there are now a decent number of tests run by make check, so as long as they keep running it might give some assurance that large changes are not breaking anything.

@asheplyakov (Contributor)

In #476 (comment), EPERM on PowerPC, I'm not sure what's causing it but the association with malloc errors makes me think at least part of the problem is a memory corruption bug. Did you have a different idea?

I think there are several problems in #476:

  1. heap corruption in dcc_rewrite_fqn, which has been fixed by commit 879b71d
  2. incorrect assumption about the native compiler triplet, also fixed by now (by commit 850db9e)

@sourcefrog (Contributor, Author)

Ah ok, thank you very much for fixing them, and that sounds like a good prompt to make a new release soon.

But also, part of it is a memory corruption bug and part of it, arguably, is a problem from the weird old autoconf machinery. So, although this specific bug is now fixed, I think both of the causes are the kind of thing that would tend to go better in Rust.

@wtarreau commented Apr 6, 2024

Why not do that in a new project (e.g. by forking and renaming this one)? That would at least allow the current one to continue to be maintained in best effort mode. Rust still has many problems, starting from the fact that nobody has the same version (if at all) on their systems, and making the current project subject to such constraints would hurt it more than it would help :-( It would also ease fixing bugs in the current project, because if there were just a new branch, things would diverge quickly to the point of no longer being backportable.

@wtarreau commented Apr 6, 2024

Also, for the server side it's particularly convenient to have very few dependencies: it makes it easy to build distccd and turn any mostly-idle system into a distcc server. I'm sure many of us have headless machines for this that were built using methods far from perfection.

@sourcefrog (Contributor, Author)

I'll emphasize that I haven't decided to even finish this, let alone actually make new releases from the Rust implementation. It's just an experiment to see what this pattern looks like, and also to just get familiar with this code again.

Why not do that in a new project (e.g. by forking and renaming this one)? That would at least allow the current one to continue to be maintained in best effort mode.

So, perhaps yes, a new Rust implementation could be released as a different package name, to make it easier for any downstream packagers/distributors to ship both.

Equally well if the trunk becomes Rust (which is not yet decided), people can continue to run, or package, previous versions.

But, you're quite right, changes would not be easily ported between the two so one would tend to die off.

But also, let's be realistic what "best effort maintenance" means: the previous maintainers made no releases for years, and nobody stepped up to replace them, and when I search my heart I just don't find much enthusiasm for putting my personal time into writing or reviewing code doing string manipulation in C or working on autoconf.

(However, independently of this PR, I will make a new C release, and review and if possible fix&merge the accumulated PRs.)

Rust still has many problems, starting from the fact that nobody has the same version (if at all) on their systems, and making the current project subject to such constraints would hurt it more than it would help :-(

I don't want to rehearse all the pros and cons of Rust, which certainly exist, but can you say more about why it would matter that people have the same version? This kind of program should build on any reasonably recent Rust toolchain, and clients and servers would not need to be built with the same rustc.

(As a detail, this branch requires Rust 1.77 or later, because that version added the c"hello" syntax for C string literals, but I would not expect to bump the minimum version much after that.)
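For reference, the 1.77 feature in question: a c"…" literal produces a compile-time-checked, NUL-terminated &CStr, which is convenient when calling into the remaining C code.

```rust
use std::ffi::CStr;

fn main() {
    // A C string literal: &'static CStr, NUL-terminated, checked at compile time.
    let greeting: &CStr = c"hello";
    // Before 1.77, the same value needed a runtime-checked conversion:
    let old_way = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    assert_eq!(greeting, old_way);
    println!("{}", greeting.to_str().unwrap());
}
```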

I can imagine there might be corporate environments where people really want to build the current distcc using only the toolchain from a 5-year-old RedHat without installing any new tools, and this change would be hard for them, but, I'm just not sure I should optimize for that. Or, as mentioned above, people on retrocomputing platforms, but perhaps they can use the C or Python client.

Also, for the server side it's particularly convenient to have very few dependencies: it makes it easy to build distccd and turn any mostly-idle system into a distcc server. I'm sure many of us have headless machines for this that were built using methods far from perfection.

I think this actually works much better in Rust: it builds binaries that are self-contained (quasi statically linked) aside from libc. It's easy to copy the server binary onto a machine, and the server need not have any Rust tooling installed. Cross builds should also be much easier.
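As a sketch of what the cross-build story could look like (the linker name below is illustrative and depends on your cross-toolchain; check rustc's platform-support page for each target's tier):

```toml
# .cargo/config.toml on the build host. sparcv9-sun-solaris is a real
# rustc target name; the linker binary name is an assumption here.
[target.sparcv9-sun-solaris]
linker = "sparcv9-sun-solaris2.11-gcc"
```

With that in place, something like `rustup target add sparcv9-sun-solaris` followed by `cargo build --release --target sparcv9-sun-solaris` would produce a binary to copy onto the Sparc machine, which needs no Rust tooling of its own.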

@sourcefrog (Contributor, Author)

PS, looking at the code with 20 years more experience and with the lens of working in Rust, it is remarkable how informal or loose it is about memory management, e.g. functions that sometimes return a pointer into a newly allocated buffer or sometimes a pointer into an existing string. Probably in most cases the net result is a harmless memory leak that will be collected when the short-lived process exits. But, still, it reinforces my convictions about the likelihood of lurking bugs and the desirability of stronger compiler checks.
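That C pattern, where a function sometimes returns a freshly allocated buffer and sometimes a pointer into its input, maps directly onto Rust's Cow, which makes the ownership visible in the type. The function below is a made-up illustration, not distcc code:

```rust
use std::borrow::Cow;

// Hypothetical example: ensure a compiler name carries a target prefix.
// In C this might return either its argument or strdup'd memory, leaving
// the caller to guess what to free; Cow encodes the answer in the type.
fn with_prefix<'a>(name: &'a str, prefix: &str) -> Cow<'a, str> {
    if name.starts_with(prefix) {
        Cow::Borrowed(name) // no allocation: borrows the input
    } else {
        Cow::Owned(format!("{prefix}{name}")) // new allocation, owned
    }
}

fn main() {
    println!("{}", with_prefix("gcc", "sparc-sun-solaris2.11-"));
    println!("{}", with_prefix("sparc-sun-solaris2.11-gcc", "sparc-sun-solaris2.11-"));
}
```

Either way, the compiler tracks when the value is dropped, so the "harmless leak until process exit" pattern simply can't arise.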

@wtarreau

I get some of your points and am not commenting on possible old code quality. We've all had our era working on other systems with different constraints etc, and our own experience has improved our standards over time. But I mean, users of distcc use it exclusively to build C projects, and usually you install something like distcc and ccache when you expect to build a lot. I.e. you're either a distro packager or a developer, and in both cases you're fluent with the language (at least to fix build issues). This is also how some fixes and improvements came (I still have some tiny patches on various machines to propose BTW). Here, facing a totally foreign language could be a significant showstopper. Also, the language is not yet available everywhere, let alone in a recent version. A few examples of the systems I'm using distcc/distccd on:

$ rustc --version
rustc 1.58.1
$ slackpkg search rust|grep -w rust
  uninstalled              patches                      rust-1.46.0-x86_64-1_slack14.2            
$ apt search ^rust
rust-all/focal-security,focal-security,focal-updates,focal-updates 1.70.0+dfsg0ubuntu1~bpo2-0ubuntu0.20.04.1 all
$ apt search ^rust
 rust-all/unstable 1.63.0+dfsg1-1 all
$ pkg search ^rust-
rust-1.66.0                    Language with a focus on memory safety and concurrency
$ sudo pkgutil -a | grep -ci rust
0

The last machine is interesting, BTW: as one of those benefitting the most from distcc, it's a dual-CPU Solaris Sparc that divides its build time by 15 thanks to distcc. One of the big benefits of distcc is precisely that it lets developers keep occasionally using somewhat outdated systems for compatibility tests and bug reproduction, while keeping the experience reasonably comfortable.

Conversely, if one needed a comparable tool to build Rust code (and I think it would deserve one, considering Rust is much slower to compile than C due to the extra checks), I would find it totally natural for that tool to be written in the same language, for the reasons above (availability of the build environments and of knowledge among its users).

Maybe it would actually make a lot of sense to start a "distrust" project to provide distributed builds for Rust, keeping in mind the points needed for C so that it might ultimately cover both. It would then add value instead of removing it where the project is already super useful.

@sourcefrog (Contributor, Author)

Thanks for continuing to post your good thoughts, and listening to me kicking it around: it's helpful to me in weighing it up and even just working out what to do with distcc.

Rust on minor platforms

I definitely realize that people will want to run distcc on old or minor platforms, and I appreciate the apparent strangeness and likely friction of making C builds depend on Rust.

Supporting old platforms even constrains the C implementation somewhat, although in a different way: the code has lots of ifdefs, autoconf checks, and vendored libraries. I would cut many of them if I knew I only cared about modern Unix, but to support your Sparc Solaris I want to leave them, even though it leaves things a bit cluttered. (I wonder if anyone still runs a build farm for open source developers where I could get a shell to test or debug things. I miss my Sun, such a classy machine!)

It seems that Linux and Solaris on Sparc are supported by Rust, but only as cross targets, not as hosts. It would be a bit more friction for people running Sparcs if they can't build natively, but almost certainly all of them have some other machine where they can do the cross build. And given the relative CPU speeds, a native build would be pretty slow even if it were supported.

On the other hand I think I'd be 100x more confident that a cross built tool would work reliably in Rust than in C, due to tighter typing of ints and pointers, no undefined behavior, no per-platform variation in the standard library, etc.

We could ship cross-built binaries from CI and as noted the quasi-static binaries should be broadly usable on that kernel/arch. But some people would of course have reasons to build from source.

And I guess there is some practical limit, where a binary will fail to run on some old distro/kernel due to missing syscalls or libc changes? I don't know how far back you can take them and expect them to work.

Distro availability

Also, the language is not yet available everywhere, let alone in a recent version. A few examples of the systems I'm using distcc/distccd on:

Yes, that's a thing, and would have some cost: even on active ARM/x86 distros, the toolchain can lag upstream Rust by quite a bit, and some packages expect a recent compiler. I have read some of the history of people trying to work out how to make Rust's opinionated package manager work with Debian. I think a lot of developers follow the general advice to install from rustup.rs rather than their distro package and that works fine, but some people may not want to or be able to install from upstream. In any case it would be an impediment to getting a Rust-based distcc into a distro, who would very reasonably insist on using their packaged toolchain.

I guess an option would be to say the tested minimum supported Rust version should be conservative, perhaps the version from Debian stable (currently 1.63, June 2022) or even oldstable (currently 1.48, November 2020). That wouldn't fix every situation but might let you build on anything released in the last few years.

distrust

Yes, in some ways a "distrust" would be an obvious place to work if one is excited about both Rust and distributed compilation. Someone previously filed a feature request for it, #482. But I wouldn't recommend starting from distcc: aside from the code being old, it's very C-oriented; you could take the idea but not the code.

However, it seems that sccache already exists and can do this, although I personally haven't had a compelling case to use it yet.

https://github.com/mozilla/sccache/blob/main/docs/DistributedQuickstart.md

(Aside from there being an existing solution, I personally already have a Rust toolchain project, cargo-mutants, and I don't need another.)

Prioritizing distcc in general

Another big question that I don't know the answer to is: where does distcc stand these days vs icecream or other options? People are still using and interested in distcc, which is nice, but if it turns out that something else offers a superset of its features then perhaps investing time in this isn't a good choice.

@wtarreau

Thanks Martin for the discussion, it's fruitful. I'm not going to respond to all points / comments (and others should definitely participate, I don't want to flood the discussion with only my own view/experience).

Regarding shipping binaries for odd platforms, that could definitely work. Nobody uses these platforms for pleasure; even if it has its fun, it's essentially to reproduce issues and test compatibility/portability, so we're generally not very demanding about the origin of the tools. On Solaris there's OpenCSW, which distributes binaries, and I think most people are fine with that. I picked gcc-5.5 from there, for example. I recompiled mine later to change the hard-coded linker (18+ hours), but even the first distcc I used on this machine (3.1) came from there.

The concern about being able to adjust or fix the utility at the source level is important, but that will essentially be done on a more up-to-date platform where the issue or limitation can be reproduced. I'm more concerned by my own inability to contribute Rust code, and the feeling of dealing with sort-of-closed source in that it will become totally opaque to me. But if the quality, features and performance are on par with the C version, that's often acceptable. After all, most people using binaries from their distros (or even those on Windows) just use the software as shipped and never expect to see or tweak the source (as sad as that may sound to many of us).

Also, regarding a hypothetical "distrust": it would definitely have to be redone from scratch. It wouldn't make sense, IMHO, to constrain oneself to follow the build flow of a C program for another language.

Finally, regarding distcc vs icecc etc.: I tried others and found them just too complicated. I was once told "no, it's not complicated, look, you type the command and it does everything by itself automagically". Except when it cannot, for a myriad of good reasons ranging from lack of read/write disk space, permissions, and the ability to use distinct platforms as destinations, to I don't remember what else. It just worked on none of the machines I tried at the time.

I don't even use the pump mode in distcc. I know it should be way more performant, but the beauty of the default mode is that you can just pick a bunch of nolibc compilers from kernel.org for your host and your target(s), install all of them in the same place with symlinks from a common bin/ directory matching your local names, and that's it, it works. I even build my Solaris binaries using the Linux compiler, because there's no OS dependency in building a .o from a .c once preprocessing is done; it's bare metal at that point.

And for me the gains (15-20x) are quite sufficient, especially when I factor in the extremely simple and flexible deployment. Mind you, I'm mixing x86, ARM64 and ARMv7 machines in the same clusters in a totally transparent way, and I wouldn't be surprised if other users also value this simplicity and flexibility. There's basically no way it can fail, and it takes just 10 minutes to show a coworker how to use your build farm from their laptop with zero config. So I definitely intend to stick to that approach ;-)

Hoping this helps you get a clearer picture of some of your users' usage!

3 participants