Add support for `container-rebase` #428

jdoss · 2022-12-29T18:52:58Z

<walters> EDIT: Transferring this issue from rpm-ostree

Basically let's add something like:

variant: fcos
version: x
bootc:
  target: quay.io/example/customos:latest

One reason we should do this is that we need systemd unit ordering which correctly orders against ignition-firstboot-complete.target among others (see below).

Original issue follows:

Host system details

[root@appliance ~]# rpm-ostree status
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: inactive
Deployments:
● ostree-unverified-registry:registry.local:5000/appliance:devel
                   Digest: sha256:ce098ae1aeaff8663df6a8ae131f4ae7af70c810ae518f542fddc20ad20cbcad
                  Version: 37.20221211.3.0 (2022-12-29T18:37:01Z)

  fedora:fedora/x86_64/coreos/stable
                  Version: 37.20221211.3.0 (2022-12-26T13:53:28Z)
                   Commit: 93930f1bbe732751297fb7e5c4b7f3b79c563a803f3cf8c48115f84c541f86a7
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A

Expected vs actual behavior

When using the latest quay.io/fedora/fedora-coreos:stable image based off of 37.20221211.3.0, the symlinks for systemd units that are enabled within the layer are no longer present, so the layered systemd units do not load on reboot.

Using an older version of FCOS, 36.20221001.3.0, works as expected.

Here are the steps in my container layer build process that show the symlinks being created:

STEP 20/25: WORKDIR /usr/src/appliance
--> 07fdcd5afef
STEP 21/25: RUN tar xf app.tar && ./install.sh
Created symlink /etc/systemd/system/default.target.wants/pod-appliance.service → /etc/systemd/system/pod-appliance.service.
--> dba68fb0482
STEP 22/25: WORKDIR /
--> a1fdc9a8b0a
STEP 23/25: COPY units/appliance-config.service /etc/systemd/system/appliance-config.service
--> 428764c8876
STEP 24/25: RUN systemctl enable appliance-config.service && touch /etc/appliance/env/appliance-config.env   && sed -i 's/#AutomaticUpdatePolicy.*/AutomaticUpdatePolicy=stage/' /etc/rpm-ostreed.conf
Created symlink /etc/systemd/system/default.target.wants/appliance-config.service → /etc/systemd/system/appliance-config.service.
--> 3907077e435
STEP 25/25: RUN ostree container commit

But when I reboot into this container layer

Fedora CoreOS 37.20221211.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

[core@appliance ~]$ sudo su -
[root@appliance ~]# ls -lah /etc/systemd/system/default.target.wants/appliance-config.service
ls: cannot access '/etc/systemd/system/default.target.wants/appliance-config.service': No such file or directory

The symlink is not present. The unit file is however present on the file system from the layer:

[root@appliance ~]# ls -lah /etc/systemd/system/appliance-config.service 
-rw-r--r--. 1 root root 984 Dec 29 18:37 /etc/systemd/system/appliance-config.service
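For anyone trying to narrow this down, a small check like the following can confirm whether the enablement symlink survived the rebase. This is only a sketch: `unit_link_status` is a hypothetical helper name, and the optional root argument exists so the check can be pointed at a different filesystem tree.

```shell
# Hypothetical helper: report whether UNIT is linked into TARGET.wants
# under ROOT/etc/systemd/system (ROOT defaults to the running system).
# Usage: unit_link_status UNIT TARGET [ROOT]
unit_link_status() {
    unit=$1
    target=$2
    root=${3:-}
    link="$root/etc/systemd/system/$target.wants/$unit"
    if [ -L "$link" ]; then
        echo "enabled: $link"
    else
        echo "missing: $link"
        return 1
    fi
}
```

On the affected host, `unit_link_status appliance-config.service default.target` would be expected to print the `missing:` line and return non-zero.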

Expected:

Working systemd units after layering an image on FCOS and rebooting.

Steps to reproduce it

Use the latest quay.io/fedora/fedora-coreos:stable image based off of 37.20221211.3.0. Add and enable a systemd unit in your Containerfile, layer that onto FCOS, and observe that the systemd unit is not enabled after the reboot.


cgwalters · 2023-01-04T14:14:48Z

I'm not immediately reproducing this problem when booting latest stable and rebasing to the tailscale example. This may relate to in-place upgrades? Or it might relate somehow to https://fedoraproject.org/wiki/Changes/Preset_All_Systemd_Units_on_First_Boot

Can you try to narrow down the reproduction scenario a bit more?

One thing that jumps out to me as a little odd is you're getting default.target.wants instead of multi-user.target.wants. Are you changing the default target?

jdoss · 2023-01-04T15:34:20Z

I pushed up an example here https://github.com/quickvm/fcos-layer-paperless-ngx that reproduces the problem. It layers quay.io/quickvm/paperless-ngx:broken which has the latest FCOS stable and it doesn't start the systemd units. quay.io/quickvm/paperless-ngx:stable works.

The default.target comes from the podman-generated systemd units. I just used my paperless ngx script to set up the podman pod and service containers, and then I dumped the systemd units with podman generate systemd.

I did try switching to multi-user.target on the units but the result was the same.

cgwalters · 2023-01-06T16:58:50Z

There's a full 4GB layer in there which makes testing things here a bit annoying 😄 Is it really necessary to reproduce? (I'm looking at it, just hoping it's not...)

jdoss · 2023-01-06T17:55:35Z

Yeah, that can be annoying. You could edit the container file to not include the paperless tarballs, push that up to a registry, and use that. You can still reproduce without that stuff because it won't have the units enabled via symlink.

I use a local registry to speed up my development workflows. Start a local registry on your workstation:

podman run --replace -d --rm --name local-registry -p 5000:5000 docker.io/library/registry:2

And push the container to the local registry:

podman push localhost/paperless-ngx:busted <your workstation ip>:5000/paperless-ngx:busted

Add this to the butane:

  - path: /etc/containers/registries.conf.d/local.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        [[registry]]
        location = "<your workstation ip>:5000"
        insecure = true

Change the rebase exec start to use the local registry:

        ExecStart=rpm-ostree rebase --bypass-driver --experimental ostree-unverified-registry:<your workstation ip>:5000/paperless-ngx:busted

That will make local development pretty darn quick.
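Since the image reference has to reach `rpm-ostree rebase` as a single argument, with no whitespace between the `ostree-unverified-registry:` prefix and the registry host, it can help to build it in one place. A minimal sketch, where `rebase_ref` is a hypothetical helper and 192.0.2.10 is a documentation-only placeholder for the workstation IP:

```shell
# Hypothetical helper: build the image reference passed to `rpm-ostree rebase`.
# Usage: rebase_ref REGISTRY IMAGE TAG
rebase_ref() {
    printf 'ostree-unverified-registry:%s/%s:%s\n' "$1" "$2" "$3"
}

# 192.0.2.10 stands in for <your workstation ip>.
rebase_ref 192.0.2.10:5000 paperless-ngx busted
# prints: ostree-unverified-registry:192.0.2.10:5000/paperless-ngx:busted
```

The result can then be passed along as `rpm-ostree rebase --bypass-driver --experimental "$(rebase_ref 192.0.2.10:5000 paperless-ngx busted)"`.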

cgwalters · 2023-01-06T18:21:43Z

I rebased to the :broken image and I do see the systemd units started:

[root@cosa-devsh ~]# systemctl list-units|grep pngx
  pngx-gotenberg.service                                                                                                               loaded active running   Paperless Gotenberg Service
  pngx-pod.service                                                                                                                     loaded active running   Paperless pod service
  pngx-postgres.service                                                                                                                loaded active running   Paperless-ngx Postgresql Service
  pngx-redis.service                                                                                                                   loaded active running   Paperless-ngx Redis Service
  pngx-sftpgo.service                                                                                                                  loaded active running   Paperless-ngx SFTPgo Service
  pngx-tika.service                                                                                                                    loaded active running   Paperless Gotenberg Service
● pngx-webserver.service                                                                                                               loaded failed failed    Paperless-ngx Webserver Service
  machine-pngx-pod.slice                                                                                                               loaded active active    Slice /machine/pngx/pod
  machine-pngx.slice                                                                                                                   loaded active active    Slice /machine/pngx

I can think of three potential things that might be happening.

1. First, there's https://fedoraproject.org/wiki/Changes/Preset_All_Systemd_Units_on_First_Boot - this will wipe out units that don't have corresponding presets - but only on first boot. If you're rebasing from a "golden" FCOS image that shouldn't apply.
2. Except, if you're managing to do the rebase and reboot before coreos-ignition-firstboot-complete.service happens to complete, then you can have Ignition run on the next boot again too, which would likely have this effect. (Hmm...when we make sugar for rebasing via a systemd unit we should definitely ensure it's ordered after that.)
3. If you're rebasing to an image which has the units, and then trying to rebase back to an image which doesn't, those unit links will disappear. But this is expected behavior.
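The race described above (rebasing and rebooting before coreos-ignition-firstboot-complete.service finishes) could be guarded against with a check along these lines. This is a rough sketch under an assumption this thread does not confirm: that FCOS marks a pending Ignition run with a stamp file at /boot/ignition.firstboot, which that service removes. `firstboot_pending` and `safe_to_rebase` are hypothetical names.

```shell
# Assumption (not stated in this thread): a pending Ignition run is marked by
# the stamp file /boot/ignition.firstboot, removed once first boot completes.
# Usage: firstboot_pending [BOOT_DIR]
firstboot_pending() {
    boot_dir=${1:-/boot}
    [ -e "$boot_dir/ignition.firstboot" ]
}

# Hypothetical guard: return 0 only when the stamp file is gone, i.e. when a
# rebase followed by a reboot should not cause Ignition to run again.
# Usage: safe_to_rebase [BOOT_DIR]
safe_to_rebase() {
    if firstboot_pending "$@"; then
        echo "first boot not yet marked complete; defer the rebase" >&2
        return 1
    fi
    echo "first boot marked complete"
}
```

A rebase script could call `safe_to_rebase || exit 1` before invoking `rpm-ostree rebase`, although ordering the unit after the service, as discussed in this thread, addresses the same problem without polling.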

jdoss · 2023-01-06T18:31:25Z

If you launch FCOS from that butane it will reproduce after applying the layer from the systemd unit.

That unit has Before=first-boot-complete.target. Do we need to add After=coreos-ignition-firstboot-complete.service?

jdoss · 2023-03-01T20:13:32Z

I just ran into this issue after not seeing it happen for a while.

Servers without this issue:

[root@node1 ~]# journalctl -u coreos-ignition-firstboot-complete.service
Feb 07 04:41:03 node1 systemd[1]: Starting coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete...
Feb 07 04:41:03 node1 systemd[1]: Finished coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete.
-- Boot 13b6de94359247e79f0b99f0716fd282 --
Feb 07 05:25:13 node1 systemd[1]: coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete was skipped because of a failed condition check>
-- Boot 907edf8815f94cd9af2f14390d10430b --

[root@node0 ~]# journalctl -u coreos-ignition-firstboot-complete.service
Feb 20 23:39:49 localhost systemd[1]: Starting coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete...
Feb 20 23:39:49 localhost systemd[1]: Finished coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete.
Feb 20 23:51:01 linux systemd[1]: coreos-ignition-firstboot-complete.service: Deactivated successfully.
Feb 20 23:51:01 linux systemd[1]: Stopped coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete.
-- Boot 6cf393e50562452cab0aecba2a83129a --
Feb 20 23:52:21 node000.quickvm.com systemd[1]: coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete was skipped because of a failed condition >
-- Boot 1e8054fc91344df2b2ffc50181310c92 --
Feb 22 07:06:58 node000.quickvm.com systemd[1]: coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete was skipped because of a failed condition >
lines 1-8/8 (END)

Server with this issue:

# journalctl -u coreos-ignition-firstboot-complete.service
Feb 25 23:00:44 node000.quickvm.com systemd[1]: coreos-ignition-firstboot-complete.service - CoreOS Mark Ignition Boot Complete was skipped because of a failed condition >
lines 1-1/1 (END)
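The difference between the logs above is that the healthy servers have a "Finished" entry for the service and the affected one does not. A quick way to test for that entry, as a sketch only (`firstboot_complete_ran` is a hypothetical helper, and a rotated or vacuumed journal could give a false negative):

```shell
# Hypothetical helper: read journal text on stdin and succeed only if it
# contains a "Finished" line for coreos-ignition-firstboot-complete.service.
firstboot_complete_ran() {
    grep -q 'Finished coreos-ignition-firstboot-complete\.service'
}
```

It can be fed with `journalctl --no-pager -u coreos-ignition-firstboot-complete.service | firstboot_complete_ran`, where `--no-pager` also avoids the `lines 1-1/1 (END)` pager footer.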

Per your suggestion, I noted that coreos-ignition-firstboot-complete.service had not been run before the system layered the update and rebooted. I adjusted my systemd unit that does the rebase so it runs after coreos-ignition-firstboot-complete.service and I am not seeing the issue anymore.

[Unit]
Description=Rebase FCOS to Container Image
ConditionPathExists=!/var/lib/fcos-rebase.stamp
ConditionFirstBoot=true
After=network-online.target coreos-ignition-firstboot-complete.service coreos-update-ca-trust.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
Restart=on-failure
RestartSec=10s
ExecStart=rpm-ostree rebase --bypass-driver --experimental ostree-unverified-registry:{{ container_registry }}/{{ container_registry_org }}/{{ container_name }}:{{ container_tag }}
ExecStartPost=/bin/touch /var/lib/fcos-rebase.stamp
ExecStartPost=systemctl reboot

[Install]
WantedBy=basic.target
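To catch a rebase unit that is missing this ordering before it ships in a Butane config, a simple lint can look for the service name on an After= line. This is a minimal sketch: `has_firstboot_ordering` is a hypothetical helper, and it only inspects the one file it is given, not drop-ins or ordering implied by other units.

```shell
# Hypothetical lint: succeed if any After= line in UNIT_FILE names
# coreos-ignition-firstboot-complete.service.
# Usage: has_firstboot_ordering UNIT_FILE
has_firstboot_ordering() {
    grep -Eq '^After=(.*[[:space:]])?coreos-ignition-firstboot-complete\.service([[:space:]]|$)' "$1"
}
```

For example, `has_firstboot_ordering fcos-rebase.service || echo "missing ordering"`, where the file name is a placeholder for wherever the unit lives.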

cgwalters · 2023-03-01T21:46:35Z

Thanks, I think we'll get some butane sugar for this at some point soon which should avoid that particular footgun.

jdoss · 2023-03-02T03:07:23Z

No problem Colin and thank you for your help on this issue. Do you want to close this or leave it open?

cgwalters · 2023-03-02T13:50:59Z

I've transferred the issue to butane.

bgilbert · 2023-08-01T08:46:20Z

Generally we try not to hardcode complex systemd units in Butane, but instead ship them in the OS and have Butane sugar configure them.

It looks like the workflow here is to write an Ignition config that arranges a pivot during the first boot. Is this a workflow we want to support and encourage? A major design goal of Ignition is that it makes its changes before the system boots, so we're not trying to rearrange the boot while in the middle of booting.

cgwalters · 2023-08-01T15:02:03Z

Right, but this is all part of moving stuff out of ignition in the end, it relates to coreos/fedora-coreos-docs#540

So we can just move forward with documenting there.

jdoss mentioned this issue Dec 29, 2022

Create container repo tags for each FCOS release coreos/fedora-coreos-tracker#1367

Open

cgwalters transferred this issue from coreos/rpm-ostree Mar 2, 2023

cgwalters changed the title ~~Enabled systemd units not working in a layered 37.20221211.3.0 FCOS container~~ Add support for container-rebase Mar 2, 2023

bgilbert added the jira label Apr 13, 2023

cgwalters mentioned this issue May 2, 2023

Add a doc for container provisioning and updates coreos/fedora-coreos-docs#540

Open

cgwalters closed this as completed Aug 1, 2023
