BusyBox+hostid fix-up #540
That's interesting. We recently had a report of a similar issue for aarch64, so it might be time to update all our Arm binaries as well, although I can't see anything in the release notes about this. In your PR you've updated to v1.31.0; however, https://busybox.net/ marks v1.31.0 as unstable, and they're now up to v1.32.1 (stable). |
Ah, interesting - didn't expect it to affect aarch64 too. Yes, it seems the issue is most common when your hostid starts with 0. I believe they're still using the wrong underlying integer type for the ID, which is causing the issue. It's likely that a new compiler check in the toolchain they're using to build is avoiding the segfault for them (it's a format spec. bug), which likely explains why you're not seeing any explicit mention of a fix (it's not fixed). I'll look into pushing a fix upstream. Honestly, I just went with the most recent prebuilt version on their binaries page - it doesn't look like 1.32.1 is available there yet. I'll update this PR with at least a bump to the Arm BB version too. |
Retrieved binaries from the Arch distribution for arm64 and x86_64 at 1.32.1 (stable). |
Thanks for updating these. Interestingly, I've just tried the new version on my desktop and my hostid also started reporting all zeros. I've just tried building busybox from source and this seems to work correctly, even though both report the same version number.
vs
Would you be able to try running this version on your system to see if that allows it to display the hostid correctly? |
That build yields a segfault again on my system. |
Just built against static musl and the issue of I've found that both static and shared musl builds produce the all-zero hostid (* see below - this is expected).
Note the disparity between The Arch package Compiling natively on my machine with glibc and updated headers produces the correct hostid, as you observed on your own machine. The issue here is that the "all-zero hostid" bug may very well transpose to a segfault on later versions of the Linux kernel. I'll need to do more digging with this to find a robust solution. EDIT:
However, hopefully the following revelation will produce some giggles: https://github.com/occlum/musl/blob/master/src/misc/gethostid.c That's My recommendation is that devlib moves away from using the hostid altogether. glibc uses arbitrary bit-fiddling to invent an ID it thinks should be unique. Other implementations just return 0... |
@marcbonnici For the sake of completion, could you recompile your busybox build as-is but with the |
Right... thanks for looking into this, that would certainly explain that one then :D So it looks like we either go with musl and don't crash but always get zeros, or we stick without it and either get segfaults or manage to obtain a usable value.
|
Okay, here's the segfault:
The NSS facility requires dynamic library loading and matching glibc versions at runtime, so it's no surprise this is failing, even if compiled as static. There is an That being said, I need libcrypt.a and I don't really fancy waiting 1+ hours to build the static glibc libs on Arch (frustrating they decided not to bundle them by default...). Here is my proposal for the time being - use musl for max peace of mind (i.e. the binaries that are already in this PR) and emulate glibc |
glibc emulation added. This will need testing. |
Thanks for taking a stab at this; unfortunately, I don't think this approach is going to work very well. For example, if we take an Android device connected via USB, it is unlikely that we would be able to resolve the hostname. There is no guarantee that host and device are connected to the same network, especially since a lot of the devices I've tested with have their hostname set as
Please correct me if I've misunderstood, but by using musl unless Also out of curiosity how many / what sort of devices have you observed this issue on? |
The value returned by For example, you and I receive the same value on our test machines: I don't know the origins of why You're correct in saying that
This edge-case would require a check, and would likely end up in returning 0 again. You're also correct in saying that, if target and host don't share the same network, IP resolution will fail, and 0 will once again be returned. The only con to this solution is missing out on a mostly-useless value in the case that the host and target are not networked. Arguably, as the hostid is derived from the IP, it has even less meaning when the devices aren't networked together. I suppose this comes down to preference... but statically linking glibc should not be an option.
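To illustrate why the value carries so little information: as far as I'm aware, glibc's fallback (used when no /etc/hostid file exists) resolves the machine's own hostname to an IPv4 address and then swaps the upper and lower 16 bits of the raw address. The Python sketch below mimics that bit-fiddling. The function name and the little-endian default are assumptions made for illustration; this is not devlib or glibc code.

```python
import socket


def glibc_style_hostid(ipv4: str, byteorder: str = "little") -> int:
    """Approximate glibc's IP-derived hostid for a given IPv4 address.

    Hypothetical helper: it assumes the address has already been resolved
    and that the machine is little-endian unless told otherwise.
    """
    # inet_aton returns the 4 address bytes in network order, which is how
    # they sit in memory inside struct in_addr.
    packed = socket.inet_aton(ipv4)
    # Reading those bytes as a native integer gives the s_addr value.
    s_addr = int.from_bytes(packed, byteorder)
    # Swap the two 16-bit halves and truncate to 32 bits.
    return ((s_addr << 16) | (s_addr >> 16)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(format(glibc_style_hostid("127.0.0.1"), "08x"))  # prints 007f0100
```

Under these assumptions, every machine whose hostname resolves to the same loopback address ends up with the same ID, which would be consistent with unrelated machines reporting identical values.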
Mainly modern x86_64 Linux desktops. There's no reason the problems this PR intends to solve shouldn't occur on most other targets too, however. The issue you linked highlights this is an issue on arm too. |
It was actually only very recently added to devlib itself, as a result of a workaround for this issue. The value is just one of many items collected from WA, as we try to collect as much information about a target as possible to store as the target info, so as far as I'm aware it is not widely used.
As you say, this does not seem like a reliable metric, so I'm wary of adding this relatively complicated workaround for something that is unlikely to be widely used and which also changes the previously reported value, even if the new version is arguably more useful as it attempts to use the device's actual IP rather than apparently picking up the loopback device. Do you see a downside to leaving the existing implementation and simply ignoring the error if unable to execute the command? E.g. something like:
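A minimal sketch of that "ignore the error" idea, written as a standalone Python helper. The function name `read_hostid`, the default command, and the choice of 0 as the fallback value are all assumptions for illustration; this does not use devlib's actual target API.

```python
import subprocess


def read_hostid(command=("busybox", "hostid"), fallback=0, timeout=10):
    """Run a hostid command and return its value, or `fallback` on any failure.

    Hypothetical helper: devlib would run the command through its own
    target connection rather than through subprocess.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            check=True,  # a crash (e.g. SIGSEGV) or non-zero exit raises
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit status, death by signal, or timeout.
        return fallback
    try:
        # hostid prints the identifier as hexadecimal digits.
        return int(result.stdout.strip(), 16)
    except ValueError:
        # Output was not a hexadecimal number.
        return fallback
```

With this shape, a segfaulting applet degrades to the fallback value instead of aborting the rest of the target-info collection.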
Ok thanks, I haven't been able to recreate this on my devices so was just looking for some more data points. |
Yes, this works too. The only qualm I have with this is that it leaves a broken BusyBox in place, which is likely to present issues in the future, as glibc makes extensive use of |
I agree that having a reliable option would definitely be the best outcome here. On that topic, I noticed on the busybox site that there are three options for static linking: glibc, musl and uclibc. We've explored the first two, but I hadn't looked at uclibc yet, which apparently does have an implementation of gethostid [1]. Testing the prebuilt uclibc binary [2] worked (and provided consistent output) on all of my devices, so I was wondering whether you have tested this on yours and, if so, which behaviour you saw? [1] https://git.uclibc.org/uClibc/tree/libc/inet/hostid.c |
I think I spotted that there was no x86_64 prebuilt version and sidestepped uclibc-ng, only to totally forget about its existence. I'll set up an x86_64 build with uclibc-ng and post it here for you to test. |
Took twice as long as expected - forgot IPv6 support in libc! 🙄 Here: busybox-ulibc.zip |
That's great thank you, and yes that seems to do the trick on my devices as well. |
Looks good - I imagine the aarch64 side will need the same treatment. |
Cool thanks, I'll create another PR to update the remaining binaries. Would you prefer for me to include the x86_64 binary from your PR when I update the others directly or would you like to clean up this PR and we can merge? |
I'll roll back the other patches here and am currently doing an aarch64 build. When complete, I'll patch and then this can be squashed and merged. It would be nice to have a GitHub Actions-based CI/CD for these builds. I'm using crosstool-ng to generate these toolchains; I'll have a play and see what makes sense when I have time. |
Ready to merge. Requires test on an aarch64 device and another PR to bring these archs in line:
|
Great, thanks, will give them a test. Would you also be able to rebase your branch to drop the unnecessary commits and keep the history clean, and add a bit more detail to the commit messages about the updated binaries? |
The original BusyBox binaries were statically linked against glibc, which caused segmentation faults to occur on __nss_* calls. This switches the libc implementation to uClibc in both cases.
busybox ver bump to 1.32.1 for arm64 and x86_64
update x86_64-linux-uclibc busybox
update aarch64-linux-uclibc busybox
fe252fd to d83627d
Rewrote history on this PR for you, it's now a single commit with a message. Thanks. |
The version of busybox currently shipped with devlib for x86_64 produces a segfault in the hostid applet for certain IDs. Looking through the commit history, they seem to have made a bit of a mess with format specifiers.
I've bumped the version of BB to v1.31.0 which allows LocalLinuxTarget to progress past the hostid invocation. The behaviour of the applet is still incorrect, however, returning all zeroes for my system's hostid. Better an incorrect ID than a crash, however...