coreos-installer not finishing install properly on single core/thread vm's #575

Open
sedlund opened this issue Jul 17, 2021 · 8 comments

Comments

@sedlund

sedlund commented Jul 17, 2021

Bug

coreos-installer not working most of the time on single core/thread VMs.

I have tried it on the same VM with one thread and got it to finish successfully a few times out of a hundred, by re-running the docker command below while running disk-I/O-heavy commands in a second connection to the VM as the installer was reading the image.

E.g. running find / or du, or looping while sleep .1; do echo w | fdisk /dev/sda; done. Otherwise it seemed to come down to luck.

I initially thought it was a lack of memory, because a 4GB VM on the same provider (Hetzner) worked with the method below, but that VM had 2 CPUs. So I tried a 2-CPU 2GB VM, and that worked too.

I then tried the same method on a 1-thread 2GB VM on Vultr running an old CoreOS (OG) version and hit the same udevadm settle problem.

Host Operating System Version

Debian 10 buster live on Hetzner with docker 18.01

also CoreOS (OG) stable (1185.5.0) ISO on Vultr

Target Operating System Version

Fedora CoreOS stable 34.20210626.3.1

coreos-installer Version

quay.io/coreos/coreos-installer:release
quay.io/coreos/coreos-installer:latest
quay.io/coreos/coreos-installer:0.9.2-alpha.0

Expected Behavior

Installation Successful

Actual Behavior

root@rescue ~ # docker run --privileged --rm -v /dev:/dev -v /run/udev:/run/udev -v /root:/data -v coreos:/mnt -w /data quay.io/coreos/coreos-installer:latest install /dev/sda -i config.ign -f /mnt/fedora-coreos-34.20210626.3.1-metal.x86_64.raw --insecure
Copying image from /mnt/fedora-coreos-34.20210626.3.1-metal.x86_64.raw
Couldn't read signature file: No such file or directory (os error 2)
Signature not found; skipping verification as requested
Read disk 95.0 MiB/1.4 GiB (6%)
Read disk 188.2 MiB/1.4 GiB (12%)
Read disk 287.5 MiB/1.4 GiB (19%)
Read disk 383.0 MiB/1.4 GiB (26%)
Read disk 480.0 MiB/1.4 GiB (32%)
Read disk 576.2 MiB/1.4 GiB (39%)
Read disk 676.8 MiB/1.4 GiB (46%)
Read disk 774.5 MiB/1.4 GiB (53%)
Read disk 867.5 MiB/1.4 GiB (59%)
Read disk 961.8 MiB/1.4 GiB (66%)
Read disk 1.0 GiB/1.4 GiB (72%)
Read disk 1.1 GiB/1.4 GiB (79%)
Read disk 1.2 GiB/1.4 GiB (87%)
Read disk 1.3 GiB/1.4 GiB (93%)
Read disk 1.4 GiB/1.4 GiB (100%)
Read disk 1.4 GiB/1.4 GiB (100%)
Failed to wait for daemon to reply: Broken pipe
 
Error: "udevadm" "settle" failed with exit status: 1
 
Resetting partition table
Failed to wait for daemon to reply: Broken pipe
Error: "udevadm" "settle" failed with exit status: 1
Jul 16 17:18:14 rescue kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.
Jul 16 17:18:14 rescue kernel: GPT:4913151 != 40001535
Jul 16 17:18:14 rescue kernel: GPT:Alternate GPT header not at the end of the disk.
Jul 16 17:18:14 rescue kernel: GPT:4913151 != 40001535
Jul 16 17:18:14 rescue kernel: GPT: Use GNU Parted to correct GPT errors.
Jul 16 17:18:14 rescue kernel:  sda: sda1 sda2 sda3 sda4
Jul 16 17:18:15 rescue kernel:  sda:
Jul 16 17:18:15 rescue kernel:  sda:
Jul 16 17:18:15 rescue kernel:  sda:
Jul 16 17:18:15 rescue systemd-udevd[12135]: sda: Failed to process device, ignoring: Resource temporarily unavailable
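
A side note on the kernel lines above: in `GPT:4913151 != 40001535`, the first number is the sector where the primary GPT header expects the backup header, and the second is the last sector of the disk. The arithmetic can be sketched as below, assuming 512-byte sectors (the log does not state the sector size); `lba_to_bytes` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a last-sector number (LBA) to a size in bytes.
# The sector size defaults to 512, which is an assumption, not taken from the log.
lba_to_bytes() {
    echo $(( ($1 + 1) * ${2:-512} ))
}

lba_to_bytes 4913151    # extent covered by the written image
lba_to_bytes 40001535   # full size of /dev/sda
```

Under that assumption the image covers about 2.3 GiB of a roughly 19.1 GiB disk. A mismatch like this is typically what the kernel reports after a smaller raw image has been written to a larger disk, so the GPT warning by itself is probably not the cause of the udevadm failure.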

Reproduction Steps

See the docker command above.

@sedlund
Author

sedlund commented Jul 17, 2021

I changed

runcmd!("udevadm", "settle")?;

to run udevadm trigger instead, and it worked on the first try, twice: the install finishes, the VM reboots, and my Ignition config is applied.

@sedlund
Author

sedlund commented Jul 22, 2021

Tested using a live Fedora 34 Workstation ISO, and it was able to install onto /dev/sdb on a 1-CPU VM.

@sedlund
Author

sedlund commented Jul 22, 2021

coreos-installer does not build on debian:buster (stable).

Using debian:testing as the builder container and debian:stable as the resulting runtime container (with udev matching the host) works:

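# Build stage: debian:buster could not build coreos-installer, debian:testing can
FROM debian:testing AS builder
ENV DEBIAN_FRONTEND=noninteractive
# Update and install in one layer so a stale cached package index is never reused
RUN apt-get update && \
    apt-get install -y cargo libssl-dev pkg-config
WORKDIR /build
COPY Cargo.* ./
COPY src src/
RUN cargo build --release

# Runtime stage: debian:stable so the container's udev matches the host's
FROM debian:stable
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y gpg kpartx util-linux udev && \
    apt-get clean
COPY --from=builder /build/target/release/coreos-installer /usr/sbin
ENTRYPOINT ["/usr/sbin/coreos-installer"]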

@sedlund
Author

sedlund commented Jul 22, 2021

Adding -v /sbin/udevadm:/sbin/udevadm to the docker run command seems to resolve the problem with the official coreos-installer container:

docker run --privileged --rm -v /dev:/dev -v /sbin/udevadm:/sbin/udevadm -v /run/udev:/run/udev -v $PWD:/data -w /data -v coreos:/mnt quay.io/coreos/coreos-installer:release install /dev/sda -i config.ign -f /mnt/fedora-coreos-34.20210626.3.1-metal.x86_64.raw --insecure
Copying image from /mnt/fedora-coreos-34.20210626.3.1-metal.x86_64.raw
Reading signature from /mnt/fedora-coreos-34.20210626.3.1-metal.x86_64.raw.sig
Couldn't read signature file: No such file or directory (os error 2)
Signature not found; skipping verification as requested
Read disk 225.8 MiB/2.3 GiB (9%)
Read disk 384.8 MiB/2.3 GiB (16%)
Read disk 840.2 MiB/2.3 GiB (35%)
Read disk 1.0 GiB/2.3 GiB (44%)
Read disk 1.2 GiB/2.3 GiB (50%)
Read disk 1.3 GiB/2.3 GiB (55%)
Read disk 1.4 GiB/2.3 GiB (60%)
Read disk 1.5 GiB/2.3 GiB (66%)
Read disk 1.7 GiB/2.3 GiB (71%)
Read disk 1.8 GiB/2.3 GiB (76%)
Read disk 1.9 GiB/2.3 GiB (81%)
Read disk 2.0 GiB/2.3 GiB (85%)
Read disk 2.1 GiB/2.3 GiB (90%)
Read disk 2.2 GiB/2.3 GiB (95%)
Read disk 2.3 GiB/2.3 GiB (100%)
Read disk 2.3 GiB/2.3 GiB (100%)
Writing Ignition config
Install complete.

@bgilbert
Contributor

Thanks for tracking this down. In summary, it looks like there's a mismatch between the newer udevadm in the coreos-installer container and an older version of udevd on the host. And the problem doesn't happen in all cases, only on a (these days) relatively rare hardware configuration.

I'm not sure there's a lot we can do about this. udevadm trigger is not generally recommended to run in production. systemd upstream presumably won't be interested in compatibility between substantially different versions of udevd and udevadm. We could ship a container based on an older Linux distro, but pinning to an old OS doesn't seem like the right approach. We can't unconditionally recommend bind-mounting udevadm into the container, because the host OS might be newer than the container, and a newer udevadm may not run with an older glibc than it was linked with.

For now, my best suggestion for anyone experiencing this issue is to use -v /usr/bin/udevadm:/usr/bin/udevadm to bind-mount the host's udevadm into the container.
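
The caveat above (bind-mount only when the host's udev is the older side) could be checked before adding the extra -v flag. What follows is a minimal sketch, not something coreos-installer provides: `udev_major` and `host_udev_is_older` are hypothetical helper names, and the version numbers in the comments are illustrative rather than measured in this thread.

```shell
#!/bin/sh
# Hypothetical helper: extract the leading integer from `udevadm --version`
# output, which may be a bare number or a number followed by extra text.
udev_major() {
    printf '%s\n' "$1" | sed 's/^[^0-9]*\([0-9][0-9]*\).*$/\1/'
}

# Hypothetical helper: succeed only when the host's udev is older than the
# container's, i.e. the case where bind-mounting the host's udevadm was
# reported to help. If the host is newer, the host binary may not run
# against the container's older glibc, so the check fails.
host_udev_is_older() {
    host=$(udev_major "$1")
    container=$(udev_major "$2")
    [ "$host" -lt "$container" ]
}

# Possible usage (not run here; the --entrypoint override assumes the image
# ships udevadm on its PATH):
#   host_ver=$(udevadm --version)
#   cont_ver=$(docker run --rm --entrypoint udevadm \
#       quay.io/coreos/coreos-installer:release --version)
#   host_udev_is_older "$host_ver" "$cont_ver" && echo "bind-mount host udevadm"
```

The comparison is deliberately one-directional: equal versions, or a newer host, both leave the container's own udevadm in place.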

@sedlund
Copy link
Author

sedlund commented Jul 27, 2021

I searched the systemd mailing list for udevadm trigger (and related terms), in production or otherwise, and didn't come up with anything about it being an improper use. For my edification, could you shed more light on that part?

@sedlund sedlund closed this as completed Jul 29, 2021
@sedlund sedlund reopened this Jul 29, 2021
@sedlund
Author

sedlund commented Jul 29, 2021

Sorry. Swiping through the issue, I accidentally closed it.

@cgwalters
Member

I think the best way to handle this type of stuff is to use e.g. systemd-run -Pq udevadm settle - which will work both on a host system and transparently run outside of the container on the host as long as the systemd socket is mounted in and the container is privileged anyways.
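
A rough sketch of how a wrapper script might choose between the two forms of the settle call. The socket path /run/systemd/private and the helper name `settle_command` are assumptions made for illustration; the thread does not say which socket is meant, and coreos-installer does not necessarily work this way.

```shell
#!/bin/sh
# Hypothetical helper: print the settle command to use.
# $1 is the path where the host's systemd socket would appear inside the
# container. The default, /run/systemd/private, is an assumption.
settle_command() {
    socket="${1:-/run/systemd/private}"
    if [ -e "$socket" ]; then
        # Socket is visible: ask the host's systemd to run the host's udevadm.
        echo "systemd-run -Pq udevadm settle"
    else
        # No socket: fall back to the udevadm shipped in the container.
        echo "udevadm settle"
    fi
}
```

A wrapper could then execute `$(settle_command)`. For the first branch to be reachable, the container would also need the socket bind-mounted, for example with something like -v /run/systemd:/run/systemd (again an assumption, not a tested recommendation).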
