Buster builds broken on non-arm hosts #271

Open
XECDesign opened this issue Mar 19, 2019 · 47 comments · Fixed by hoobs-org/hoobs-build#2

Comments

@XECDesign
Member

Opening this as a heads-up to anyone relying on pi-gen.

https://bugs.launchpad.net/qemu/+bug/1805913

Unless this bug is fixed by the time buster goes live, images built through qemu-arm-static are going to be broken in slightly subtle ways. Luckily, qemu devs are pretty good and the issue is likely to be resolved before then.

pixbuf relies on the mime database, which silently fails to update and returns success. The result is that the desktop is rendered without any icons.

Something similar happens with SSL certificates, breaking rpi-update and anything else that wants to use https.

Those are the known ways images break, but any binary that uses readdir() is not going to work.

Internally, we've moved our builds to an arm build server to avoid going through qemu for now.

@hhromic
Contributor

hhromic commented Jun 24, 2019

@XECDesign thanks for the heads-up. And congratulations on releasing the new RPI4 with Buster!

Are you using this same pi-gen repository internally with your arm build servers? In other words is this repository still considered the official reference Raspbian image builder? Are you planning to update pi-gen to build buster images regardless of the problem with non-arm hosts? Thanks!

@XECDesign
Member Author

I've just pushed the commits that we were using internally, but couldn't make public yet.

Not sure how to approach non-arm host builds right now.

@hhromic
Contributor

hhromic commented Jun 24, 2019

Thanks a lot for that! Appreciated!
I guess for now we have to just wait for the qemu issue to get fixed upstream, unfortunately, i.e. patience.

@XECDesign
Member Author

It's looking more like a kernel issue, but discussions I've seen on the mailing list seem to have fizzled out a long time ago without any resolution. Maybe when Buster is more commonly used it will press the issue.

@hhromic
Contributor

hhromic commented Jun 24, 2019

Hopefully it gets more attention now that Buster is about to go live.
One final question, in case you know off the top of your head: does this problem only happen when the host is 64-bit, or does it happen on any non-ARM host? If the host is non-ARM but 32-bit, do you think it would work OK? I can test using a VM if you are not sure.

hhromic referenced this issue in RetroPie/RetroPie-Setup Jun 25, 2019
@XECDesign
Member Author

Sorry, not sure off the top of my head.

@hhromic
Contributor

hhromic commented Jun 25, 2019

No problem, I will quickly set up a VM with 32-bit Debian and check it out. If it works, we have a temporary solution until it's fixed. Will report back. Thanks again!

@hhromic
Contributor

hhromic commented Jun 27, 2019

Hi @XECDesign, I can now confirm that this issue is specific to hosts with 64-bit kernels, whether ARM or not. I made a test VM with vanilla Debian i386 (32-bit kernel) and the generated image works fine on real hardware (tested with an RPI 3B).

To verify this I built a control "broken" Buster image with a 64 bits Debian host using Docker and I did get SSL problems with curl (with GitHub and other websites too), for example:

$ curl -sSL https://github.com
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Then I built another Buster image using a 32 bits Debian host and the curl command above worked fine on the same hardware and same network almost at the same time.

Aside, I also noticed a minor issue with the Qemu version shipped with Debian Stretch, where the man-db package being installed for Buster in the image triggers many of these errors:

qemu: Unsupported syscall: 383

This is a manifestation of the following bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891109

Fortunately this is resolved in the current Qemu version shipped with Debian Buster, therefore building with a Debian Buster host will not show any errors. I will send a PR to update the Dockerfile for this.

In summary, I wanted to let you know that this bug is not affecting 32 bits build hosts, no matter if they are arm or not (at least for me). For now you can use a 32 bits build host and pi-gen will generate a working image.

Hope this is useful for future readers!

@XECDesign
Member Author

Thanks for looking into it. Much appreciated.

@ali1234

ali1234 commented Jun 27, 2019

On a similar note, if you run buster in a chroot on Ubuntu 18.04 you will need to upgrade proot to 5.1.0-1.13 (if you use it) and qemu-user to something newer than 2.11 (3.1 works). This is because buster uses renameat2() and new features of getauxval(). These versions are available in Ubuntu 19.04 but not in 18.04 LTS. In particular, bootstrapping will not work in proot without both of these upgrades, because the 'mv' command will not work, which completely borks the preinst and postinst scripts.
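
A minimal sketch of how the qemu-user version requirement above could be checked before starting a build. The function names are hypothetical, the banner format ("qemu-arm version X.Y.Z (...)") is an assumption about what the installed binary prints, and the comparison relies on GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of pi-gen.

# Extract the dotted version number from a banner line such as
# "qemu-arm version 3.1.0 (Debian 1:3.1+dfsg-8~deb10u1)".
qemu_version_from_banner() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Return 0 if version $1 is greater than or equal to version $2.
# Uses GNU sort -V: the smaller of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (assumes qemu-arm-static is on PATH):
#   banner="$(qemu-arm-static --version | head -n 1)"
#   version_ge "$(qemu_version_from_banner "$banner")" 3.1 \
#       || echo "qemu-user is older than 3.1" >&2
```

The threshold 3.1 is taken from the comment above ("3.1 works"); whether any version between 2.11 and 3.1 is sufficient is not stated in this thread.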

This bug is an example of the kind of issue you'll see with too old qemu versions:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923289

@Chaser

Chaser commented Jul 3, 2019

Hi all, we were able to overcome the issue of SSL certs only by rehashing the ssl certs with c_rehash

Specifically

# Patch Issue with ssl certs as per https://github.com/RPi-Distro/pi-gen/issues/271
on_chroot << EOF
echo Patching certs...
c_rehash /etc/ssl/certs
EOF

Credit to Keith Tate as well :)

As indicated below by @hhromic, it's not a complete solution.

@hhromic
Contributor

hhromic commented Jul 3, 2019

@Chaser that is a workaround for the ca-certificates package only, not a solution to the actual problem described in this issue. I wouldn't advise advertising it as a solution.

The problem affects any program using the readdir() syscall, not just ca-certificates and the effects are of varying nature. The SSL certificates issue is just one manifestation. Another (as the original post indicated) are icons not being correctly rendered in the Desktop.

It is unknown/unverified what other packages might be affected. Therefore the safest solution for now is to build using a 32-bit host (ARM or non-ARM) as indicated before.

@Chaser

Chaser commented Jul 3, 2019

Thanks @hhromic, I updated my comment to make clear it's not a solution.

Are there tests that should be done to confirm issues? I have just built an image on an EC2 ARM (64-bit) instance (a1.2xlarge) running Ubuntu 18.04 LTS. I would like to do some sanity checks on it.
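
One offline sanity check that follows from the c_rehash workaround above is to count the OpenSSL-style hash symlinks in the image's certificate directory. This is only a sketch: the function name and the mount point in the usage comment are hypothetical, and treating a count of zero as a sign of the bug is an assumption based on the workaround, not something verified in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: count symlinks named <8 hex digits>.<number>
# (for example "4a6481c9.0") in a certificate directory.

count_cert_hash_links() {
    local dir="$1"
    local count=0
    local entry name
    for entry in "$dir"/*; do
        name="${entry##*/}"
        # Only symlinks with a hash-style name are counted.
        if [ -L "$entry" ] && printf '%s\n' "$name" | grep -Eq '^[0-9a-f]{8}\.[0-9]+$'; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}

# Example use (the mount point is hypothetical):
#   count_cert_hash_links /mnt/rootfs/etc/ssl/certs
```

A non-zero count does not prove the image is healthy, since other packages may be affected in ways this check cannot see.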

@Chaser

Chaser commented Jul 4, 2019

@hhromic @XECDesign - did you attempt to use an i386 container and see the results? I have heard reports within our team that it worked.

@hhromic
Contributor

hhromic commented Jul 4, 2019

@Chaser I tested using a 32-bit Debian kernel inside a VM, as explained in my comment here: #271 (comment)
That is not the same as an i386 userland running inside a 64 bits Docker host (which has a 64 bits kernel), if that is what you mean. Nevertheless I didn't try that approach, but I don't think it would work as the problem is related to the kernel and Qemu.

If you try it and you can confirm it works like I described in my comment, then it would be nice to know. Thanks!

@Chaser

Chaser commented Jul 4, 2019

@hhromic - Clean execution of today's pi-gen mainline 1143530

As is, it pulls down qemu-user-static amd64:

Get:103 http://cdn-fastly.deb.debian.org/debian buster/main amd64 qemu-user-static amd64 1:3.1+dfsg-8~deb10u1 [21.1 MB]

Changing to FROM i386/debian:buster

Get:103 http://cdn-fastly.deb.debian.org/debian buster/main i386 qemu-user-static i386 1:3.1+dfsg-8~deb10u1 [22.5 MB]

The build completed successfully. Hopefully this helps.

(Attached screenshots: buster_default, buster_i386)

@hhromic
Contributor

hhromic commented Jul 5, 2019

@Chaser be aware that a successful build is not sufficient proof, because the absence of build errors doesn't imply that the system was built correctly. As explained in the original post, this is a silent bug: the build succeeds but the built image is broken.

To verify your built image, burn it to an SD Card and boot a real RPI device with it. Then perform the simple test I explained in my comment: #271 (comment)

$ curl -sSL https://github.com

If you get an SSL error message, then it didn't work. If you get HTML content, then it worked.
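
The manual test above can be wrapped in a small script to run on the booted device. This is a sketch with hypothetical function names; it relies on curl's exit code 60, which matches the "curl: (60) SSL certificate problem" message quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: map a curl exit code to a short verdict.

classify_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;              # request succeeded, content received
        60) echo "ssl-cert-error" ;;  # certificate verification failed
        *)  echo "other-failure" ;;   # network down, DNS failure, etc.
    esac
}

# Fetch a URL, discard the body, and print the verdict.
check_https() {
    local url="${1:-https://github.com}"
    local rc=0
    curl -sSL -o /dev/null "$url" || rc=$?
    classify_curl_exit "$rc"
}

# Example use on the booted device:
#   check_https https://github.com
```

An "other-failure" verdict says nothing about this bug either way; only "ssl-cert-error" corresponds to the symptom described here.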

@Chaser

Chaser commented Jul 5, 2019

@hhromic - understood, curl works as expected. HTML content received.

@hhromic
Contributor

hhromic commented Jul 5, 2019

@Chaser that is a very interesting result then, and would mean that actually just Qemu needs to be 32-bit, not the host kernel. That refinement is indeed way better than using a 32-bit VM.

Can you confirm you were using a 64 bits Docker host for this test?
I will also give it a try myself too to double-check. Appreciate the testing!

@Chaser

Chaser commented Jul 5, 2019

@hhromic - was using Codebuild docker image - https://github.com/aws/aws-codebuild-docker-images/blob/master/ubuntu/standard/2.0/Dockerfile

uname -a output:

Linux 046536f42b8e 4.14.123-86.109.amzn1.x86_64 #1 SMP Mon Jun 10 19:44:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@rkubes
Contributor

rkubes commented Jul 5, 2019

@hhromic I usually run my builds in a VM running Ubuntu 18.04. I don't use docker. I can confirm using the 64-bit 18.04 I will get the SSL error message.
I then changed nothing else other than installing the 32-bit qemu-user-static package and rebuilt the image. Once deployed, I was able to get HTML content without any SSL error messages.
All that to say, I also think having the 32-bit qemu is all that's needed. Not sure if there's any other tests that we can do to prove there are no other issues.

@hhromic
Contributor

hhromic commented Jul 6, 2019

@Chaser @rkubes thanks for your input, really appreciated!
I have now tested it myself too and can also confirm that it is indeed just Qemu that needs to be 32-bit, not the kernel of the host system.

I used the included Dockerfile from pi-gen and used the i386/debian:buster base image instead to bring 32 bits binaries (including Qemu) as @Chaser suggested. Worked fine on actual RPI.

"Not sure if there's any other tests that we can do to prove there are no other issues." (quoting @rkubes)

It is not clear at the moment how to test 100% reliably, however the SSL certificates test is a very good indicator as far as I can tell because it provides a tangible control case.

I will send a PR to update the included Dockerfile. @Chaser thanks a lot again for your input, I didn't know there were i386 images for Docker out there, I would have tested that for sure otherwise.

EDIT: @ryanteck might be interested. You don't need to set up a VM or a 32-bit kernel for your host build system, just make sure you are installing the i386 version of Qemu in multiarch.
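
Since the finding above is that the qemu binary itself must be 32-bit, a quick way to check which kind is installed is to read the ELF class byte of the binary. A minimal sketch with a hypothetical function name; it reads the EI_CLASS field (byte 4 of the ELF header), which is 1 for 32-bit and 2 for 64-bit objects.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a file is a 32-bit or 64-bit ELF binary.

elf_class() {
    local file="$1"
    local magic class
    # Bytes 0-3 hold the ELF magic number: 0x7f 'E' 'L' 'F'.
    magic="$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')"
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    class="$(head -c 5 "$file" | tail -c 1 | od -An -tu1 | tr -d ' \n')"
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Example use (assumes qemu-arm-static is on PATH):
#   elf_class "$(command -v qemu-arm-static)"
```

This only inspects the file it is given; it does not tell you which qemu binary the kernel will actually invoke for ARM executables.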

kdoren pushed a commit to kdoren/jambox-pi-gen that referenced this issue Dec 10, 2020
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271
@XECDesign
Member Author

XECDesign commented Feb 4, 2021

Just hit this issue while doing something else and the previous workaround didn't work. It looks like at least on some distributions it's no longer necessary to copy the qemu binary into the chroot. In my case, it was using the system's qemu binary rather than the 32bit one I was copying into the chroot.

The workaround I'm using right now is to override the qemu path binfmt uses with a local 32bit copy (edit as appropriate in your case):

if [ "$(dpkg --print-architecture)-$ARCH" = "amd64-armhf" ] && [ ! -e /proc/sys/fs/binfmt_misc/sbuild-arm ]; then
	echo ":sbuild-arm:M::\x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:$(realpath "qemu/qemu-arm-static"):OCF" > /proc/sys/fs/binfmt_misc/register
fi

Then to remove that override:

if [ -e /proc/sys/fs/binfmt_misc/sbuild-arm ]; then
	echo "-1" > /proc/sys/fs/binfmt_misc/sbuild-arm
fi

Edit:
It looks like this is the relevant change that causes the different behaviour: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815100

So, maybe that override with the F flag is a good approach for pi-gen to use in general, to avoid even having to copy the binary in the first place. Pi-gen would just need to check that multiarch support is enabled and that the dependencies are installed, then fetch the binary from somewhere else - debian or ubuntu repos.
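
To see which qemu binary a binfmt_misc entry will actually run, the entry's status file can be read. A minimal sketch with a hypothetical function name; it assumes the usual status-file layout in which one line starts with "interpreter" followed by the path, and it does not handle interpreter paths containing spaces.

```shell
#!/usr/bin/env bash
# Sketch only: print the interpreter path recorded in a binfmt_misc entry.

binfmt_interpreter() {
    local entry="$1"
    [ -r "$entry" ] || return 1
    # The status file contains a line like "interpreter /path/to/qemu-arm-static".
    awk '$1 == "interpreter" { print $2; exit }' "$entry"
}

# Example use with the entry name from the workaround above:
#   binfmt_interpreter /proc/sys/fs/binfmt_misc/sbuild-arm
```

If the printed path is the system's 64-bit qemu rather than the local 32-bit copy, the override described above is not in effect.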

@flexchar

flexchar commented Mar 20, 2021

@XECDesign Hey mate, could we add the #271 (comment) workaround to the README? I spent a whole day trying to solve the SSL issue and this comment solved it like a magic pill. It could help a lot of people!

bbinet added a commit to bbinet/salt-formula-ltsp that referenced this issue Jun 23, 2021
alexgg pushed a commit to balena-os/pi-gen that referenced this issue Jul 12, 2021
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271
@starbasessd

Any update on this with Debian 11 Bullseye?

@clarkchentw

I can build Bullseye without issue on a Bullseye AMD machine.

@starbasessd

32 or 64 bit?

@khancyr

khancyr commented Dec 23, 2021

I can build fine on an AMD machine too (pop_os 21.04 and 21.10), both armhf and arm64, for buster or bullseye.
I build server images thus

@XECDesign
Member Author

XECDesign commented Dec 24, 2021

Depending on the version of qemu and whether you're using docker, server images might not exhibit issues, but I still wouldn't recommend it until qemu is fixed.

There has been some good progress upstream. One of the issues has been fixed and another has a fix in the pipeline (https://gitlab.com/qemu-project/qemu/-/issues/633). Not sure how long it will be before there's an official release with both fixes.

EDIT: I should mention that the fixes only make it work with i686 qemu. amd64 still won't work, but that seems to be a glibc and/or kernel issue that might not be fixable. I'm not sure what the current state of that is.

SRaus pushed a commit to analogdevicesinc/adi-kuiper-gen that referenced this issue Feb 28, 2022
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271
SRaus pushed a commit to analogdevicesinc/adi-kuiper-gen that referenced this issue Mar 1, 2022
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271
SRaus pushed a commit to analogdevicesinc/adi-kuiper-gen that referenced this issue Mar 4, 2022
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271
scuciurean pushed a commit to analogdevicesinc/adi-kuiper-gen that referenced this issue Mar 23, 2022
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271

(cherry picked from commit dd96ca1)
@nicolas17

The qemu inside the Docker image seems to be irrelevant, you need qemu installed on the host (running a Docker build, I got errors about wrong architecture until I installed qemu-user-static on the host). And then I hit this bitness error (invalid SSL certificates etc).

I tried installing qemu-user-static:i386 on the host, but this makes GPG fail in the chroot, so apt can't verify signatures and it fails even earlier. Is there any valid workaround nowadays?

SRaus pushed a commit to analogdevicesinc/adi-kuiper-gen that referenced this issue Sep 16, 2022
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271

(cherry picked from commit dd96ca1)
@Josh798
Contributor

Josh798 commented Nov 28, 2022

I need some clarification - it was mentioned in this thread that any binary using the readdir syscall is not going to work. To that I say "of course", as this syscall is not implemented in arm64 (or in any architecture that I know of other than x86). I have to be missing something obvious. Why would a binary compiled for arm64 even try to use the readdir syscall? Could someone explain?

Was it meant that any binary using the readdir() library function will not work? I could buy that.

@XECDesign
Member Author

"I need some clarification - it was mentioned in this thread that any binary using the readdir syscall is not going to work." (quoting @Josh798)

It has been a while since I've looked at this so I can't give full details.

It's likely that you're thinking of this, while the issue is with this.

If the actual syscall is involved, a particular arm binary doesn't have to call it itself. It could be something qemu does, depending on the architecture it's built for and the paths it takes. Not sure.

Either way, the issue seems to be resolved in Bullseye, at least.

@cpascual

cpascual commented Jan 5, 2023

Just an update for someone getting here:

I confirm that I just built an arm64 bullseye lite (stage2) image using build.sh on the arm64 branch of this repo. My machine is an amd64 Debian bullseye host. After writing the image to a real RPi4 and running curl -sSL https://github.com, I got proper HTML (so I guess it is indeed fixed).

SRaus pushed a commit to analogdevicesinc/adi-kuiper-gen that referenced this issue May 30, 2023
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271

(cherry picked from commit dd96ca1)
wandering-andy pushed a commit to wandering-andy/pi-gen that referenced this issue Oct 15, 2023
* Autmagically use 1386/debian:buster when running on 64-bit host to prevent error RPi-Distro#271