
Gap discovered for installing QM in OSTREE image type, during setup #367

Closed · pbrilla-rh opened this issue Apr 17, 2024 · 25 comments · Fixed by #393
Labels: bug (Something isn't working), jira
@pbrilla-rh (Collaborator)

Currently QM does not come up, due to TZ handling during qm boot:

/usr/bin/podman run --name=qm --replace --rm --cgroups=split --tz=local --network=host --sdnotify=conmon -d --security-opt label=type:qm_t --security-opt label=filetype:qm_file_t --security-opt label=level:s0 --device=/dev/fuse --cap-add=all --read-only -v ${RWETCFS}:/etc -v ${RWVARFS}:/var --pids-limit=-1 --security-opt label=nested --security-opt unmask=all --rootfs ${ROOTFS} /sbin/init
Error: removing /etc/localtime: read-only file system 

--tz=local tries to change a read-only file under /etc/qm.

podman info  
Error: overlay: can't stat imageStore dir /usr/share/containers/storage: stat /usr/share/containers/storage: no such file or directory 

We need to apply changes similar to the QM setup:
https://github.com/containers/qm/blob/main/setup#L91-L105
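
For context, a rough sketch of the kind of preparation that part of setup performs; the paths are assumptions taken from the ROOTFS/RWETCFS/RWVARFS environment variables shown in qm.container further down, not a verbatim copy of the script:

# minimal sketch, assuming the paths from qm.container's Environment= lines;
# the real setup script does considerably more than this
ROOTFS=/usr/lib/qm/rootfs   # read-only rootfs handed to podman --rootfs
RWETCFS=/etc/qm             # writable /etc for the QM container
RWVARFS=/var/qm             # writable /var for the QM container

mkdir -p "${ROOTFS}" "${RWETCFS}" "${RWVARFS}"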

@dougsland (Collaborator)

@Yarboa this seems similar to what we talked about, right?

@dougsland (Collaborator)

> @Yarboa this seems similar to what we talked about, right?

Unping, just saw it's related.

@dougsland dougsland self-assigned this Apr 17, 2024
@dougsland dougsland added bug Something isn't working jira labels Apr 17, 2024
@dougsland (Collaborator)

Reproduced locally:

git clone https://gitlab.com/CentOS/automotive/sample-images && cd sample-images/osbuild-manifests
sample-images/osbuild-manifests# make cs9-qemu-qm-minimal-ostree.x86_64.qcow2

 ./runvm --nographics ./cs9-qemu-qm-minimal-ostree.x86_64.qcow2
[root@localhost ~]# journalctl -r -u qm
Apr 19 04:25:42 localhost.localdomain systemd[1]: Failed to start qm.service.
Apr 19 04:25:42 localhost.localdomain systemd[1]: qm.service: Failed with result 'exit-code'.
Apr 19 04:25:42 localhost.localdomain systemd[1]: qm.service: Main process exited, code=exited, status=126/n/a
Apr 19 04:25:42 localhost.localdomain qm[669]: Error: removing /etc/localtime: read-only file system

Let me start investigating this one.

@dougsland (Collaborator)

No action needed from @ericcurtin, but adding him to the loop as he is one of the maintainers of the ostree project. He might have some out-of-the-box ideas.

@ericcurtin (Contributor) commented Apr 20, 2024

The question I'd ask is: do we need to do this at all?

--tz=local

And is "/etc/localtime" here referring to the /etc directory inside the container or outside it? Poke around both just to be sure.
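
A quick way to poke around both sides; this sketch assumes the container is named qm and that the host-side writable /etc is the /etc/qm bind mount from qm.container:

# host side: where does the host's localtime point?
ls -l /etc/localtime
# host-side view of the container's writable /etc (RWETCFS=/etc/qm)
ls -l /etc/qm/localtime
# container side, if the container is up
podman exec qm ls -l /etc/localtime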

@ericcurtin (Contributor)

This is the TZ thing in podman code:

        // Make sure to remove any existing localtime file in the container to not create invalid links
        err = unix.Unlinkat(etcFd, "localtime", 0)
        if err != nil && !errors.Is(err, fs.ErrNotExist) {
                return "", fmt.Errorf("removing /etc/localtime: %w", err)
        }

But the thing is, this may be the behaviour we actually want in QM; it's supposed to be restricted.

@ericcurtin (Contributor) commented Apr 20, 2024

Similarly:

                                st, err := os.Stat(store)
                                if err != nil {
                                        return nil, fmt.Errorf("overlay: can't stat imageStore dir %s: %w", store, err)
                                }

I suspect in QM the container store is in an alternate place? Maybe we should open an issue with https://github.com/containers/storage/issues (the above code is from there) to see if we can configure that non-standard place.
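
For reference, a minimal sketch of where that path is typically configured; the keys below are assumptions based on containers-storage.conf conventions, not something taken from the QM setup itself:

# /etc/containers/storage.conf (sketch; adjust paths to the QM layout)
[storage]
driver = "overlay"
graphroot = "/var/lib/containers/storage"

[storage.options]
# read-only additional image stores; a path like
# /usr/share/containers/storage would be listed here
additionalimagestores = [ "/usr/share/containers/storage" ]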

It is a bug regardless, because even without that imageStore directory it should still be printing things like:

version:
  APIVersion: 4.9.4
  Built: 1711445992
  BuiltTime: Tue Mar 26 09:39:52 2024
  GitCommit: ""
  GoVersion: go1.21.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4

Instead of just panicking and showing nothing else, it could at least show the information it knows about without that directory.

@ericcurtin (Contributor) commented Apr 20, 2024

And are both these things seen only in OSTree images, or in regular ones also?

@ericcurtin (Contributor) commented Apr 20, 2024

This shouldn't be run on a running OSTree system, I suspect that is the root cause of everything:

https://github.com/containers/qm/blob/main/setup

This script has to be run during osbuild, in some osbuild stage, or as part of the rpm install stage.

I suspect this script was being run on a "regular" system as a quick-and-dirty way to showcase the design, with the intent to integrate it into the composition of the image at build time later.

@dougsland (Collaborator)

> This shouldn't be run on a running OSTree system, I suspect that is the root cause of everything:
> https://github.com/containers/qm/blob/main/setup
> This script has to be run during osbuild, in some osbuild stage, or as part of the rpm install stage.

I was wondering the same. It should run in osbuild-auto, I guess? :)

> I suspect this script was being run on a "regular" system as a quick-and-dirty way to showcase the design, with the intent to integrate it into the composition of the image at build time later.

Agreed.

@ericcurtin (Contributor)

> > This shouldn't be run on a running OSTree system, I suspect that is the root cause of everything:
> > https://github.com/containers/qm/blob/main/setup
> > This script has to be run during osbuild, in some osbuild stage, or as part of the rpm install stage.
>
> I was wondering the same. It should run in osbuild-auto, I guess? :)

That sounds reasonable.

> > I suspect this script was being run on a "regular" system as a quick-and-dirty way to showcase the design, with the intent to integrate it into the composition of the image at build time later.
>
> Agreed.

@Yarboa (Collaborator) commented Apr 21, 2024

@ericcurtin @dougsland But during the ostree build we cannot run the setup script. We could use prepared config files and copy them in through the osbuild pipeline, couldn't we?

AFAIR, you cannot run the script during the build.

@dougsland (Collaborator)

Hi @ericcurtin,

Forgot to answer this one: just ostree.

> And are both these things seen only in OSTree images, or in regular ones also?

I was wondering about something like the stage below, please correct me if I am wrong.

org.osbuild-auto.qm.setup:

#!/usr/bin/python3
"""
Executes the QM setup script as an osbuild stage.
"""


import subprocess
import sys

import osbuild.api


def main(tree, options):
    # Run the setup script shipped by the qm package.
    cmd = [
        "/usr/share/qm/setup"
    ]

    subprocess.run(
        cmd,
        check=True)

    return 0


if __name__ == '__main__':
    args = osbuild.api.arguments()
    r = main(args["tree"], args["options"])
    sys.exit(r)
cp org.osbuild-auto.qm.setup /usr/lib/osbuild/stages/org.osbuild-auto.qm.setup
make cs9-qemu-qm-minimal-ostree.x86_64.qcow2
[root@localhost ~]# systemctl status qm -l
● qm.service
     Loaded: loaded (/etc/containers/systemd/qm.container; generated)
     Active: activating (auto-restart) (Result: exit-code) since Sun 2024-04-21 13:33:03 UTC; 7s ago
    Process: 533 ExecStart=/usr/bin/podman run --name=qm --cidfile=/run/qm.cid --replace --rm --cgroups=split --tz=local --network=host --sdnotify=conmon -d --security-opt label=type:qm_t --security-opt label=filetype:qm_file_t --security-opt label=level:s0 --device=/dev/fuse --cap-add=all --read-only -v ${RWETCFS}:/etc -v ${RWVARFS}:/var --pids-limit=-1 --security-opt label=nested --security-opt unmask=all --rootfs ${ROOTFS} /sbin/init (code=exited, status=126)
    Process: 613 ExecStopPost=/usr/bin/podman rm -v -f -i --cidfile=/run/qm.cid (code=exited, status=0/SUCCESS)
   Main PID: 533 (code=exited, status=126)
        CPU: 524ms
[root@localhost ~]# podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
[root@localhost ~]#

Of course, there is no way to run setup manually for further tests on an ostree-based distro, as there is no dnf tool, and that is what we want to solve in the first place.

@dougsland (Collaborator)

> @ericcurtin @dougsland But during the ostree build we cannot run the setup script. We could use prepared config files and copy them in through the osbuild pipeline, couldn't we?
>
> AFAIR, you cannot run the script during the build.

I believe we can run setup during the osbuild-auto stage. At least, that's what I am trying right now.

@ericcurtin (Contributor) commented Apr 21, 2024

This looks like you are on the right track @dougsland. There are points during osbuild where the system is completely malleable for this kind of thing; then it gets hardened and made read-only. It might take some time to work out the kinks, but you should do this during osbuild.

This is somewhat similar to what you were doing for initoverlayfs-install, @dougsland.

But just keep in mind how the end user will use this. I'm not close enough to qm to know how end users will use it.

@pypingou (Member) commented Apr 21, 2024 via email

@dougsland (Collaborator)

> > > @ericcurtin @dougsland But during the ostree build we cannot run the setup script. We could use prepared config files and copy them in through the osbuild pipeline, couldn't we?
> > >
> > > AFAIR, you cannot run the script during the build.
> >
> > I believe we can run setup during the osbuild-auto stage. At least, that's what I am trying right now.
>
> No, currently we set up qm manually in osbuild, you can't run a "random" script in osbuild.

Yes, I am aware of qm.ipp.yml, I was just looking to see whether setup could do something that's missing. Thanks for the heads up. Going to execute more tests.

@dougsland (Collaborator) commented Apr 22, 2024

> This is the TZ thing in podman code:
>
>         // Make sure to remove any existing localtime file in the container to not create invalid links
>         err = unix.Unlinkat(etcFd, "localtime", 0)
>         if err != nil && !errors.Is(err, fs.ErrNotExist) {
>                 return "", fmt.Errorf("removing /etc/localtime: %w", err)
>         }
>
> But the thing is, this may be the behaviour we actually want in QM; it's supposed to be restricted.

For testing only, I removed https://github.com/containers/qm/blob/main/qm.container#L41 and qm is up again. I am wondering if this is something podman should WARN about instead of ERROR on. Also, Timezone=local has been in qm.container forever, but most of the tests have been on regular images only.

@rhatdan do you have a suggestion?

@rhatdan (Member) commented Apr 22, 2024

You should not run the setup script on a read-only /usr, even if we did something with /etc/localtime within the QM environment. Some of the scripts could try to modify /usr within the QM.

@Yarboa (Collaborator) commented Apr 25, 2024

> Apr 19 04:25:42 localhost.localdomain qm[669]: Error: removing /etc/localtime: read-only file system

@ericcurtin No, ostree only.

@Yarboa (Collaborator) commented Apr 25, 2024

> You should not run the setup script on a read-only /usr, even if we did something with /etc/localtime within the QM environment. Some of the scripts could try to modify /usr within the QM.

@rhatdan In the ostree image, /var and /etc have rw permissions. When podman tries to set the QM TZ, which file is it: /etc/qm/localtime or /etc/localtime?

I can think of a few workarounds for that, such as containers.conf or a volume mount (see the sketch below).
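
A minimal sketch of the volume-mount idea, assuming a quadlet drop-in for qm.container; the drop-in path and the bind mount are untested assumptions, not a confirmed fix:

# hypothetical drop-in, e.g. /etc/containers/systemd/qm.container.d/tz.conf
[Container]
# Bind the host's localtime into the container read-only, so the
# container gets the host zone without podman rewriting /etc/localtime
# inside the read-only rootfs (Timezone=local would still need removing).
Volume=/etc/localtime:/etc/localtime:ro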

@ericcurtin (Contributor) commented Apr 25, 2024

> You should not run the setup script on a read-only /usr, even if we did something with /etc/localtime within the QM environment. Some of the scripts could try to modify /usr within the QM.

> @rhatdan In the ostree image, /var and /etc have rw permissions. When podman tries to set the QM TZ, which file is it: /etc/qm/localtime or /etc/localtime?
>
> I can think of a few workarounds for that, such as containers.conf or a volume mount.

/etc is only sort of read-writable: anything that gets written there won't persist after a reboot.

Maybe we just have to rewrite the script in an osbuild stage format so it's built as part of the OSTree image.

/var is read-writable but is not atomically updatable like /usr.

@Yarboa (Collaborator) commented Apr 25, 2024

Interesting. When the quadlet runs without Timezone=local and uses the TZ env variable instead, things work; the container time matches the host:

[root@localhost qm]# date
Thu Apr 25 17:44:05 UTC 2024
[root@localhost qm]# podman exec qm bash -c date
Thu Apr 25 17:44:11 UTC 2024

This one gives the following:

podman exec qm bash -c "export TZ=jst; date"

I changed the quadlet file to use the following (to pick up the host TZ we could use #Environment=TZ=$(date +%Z); I played with this one also):

Environment=TZ=jst

systemctl daemon-reload
[root@localhost qm]# systemctl restart qm
podman exec qm bash -c "date"

Thu Apr 25 18:01:49 jst 2024

I will propose marking the TZ in qm.container as ToBeFixed.

@dougsland (Collaborator)

> Interesting. When the quadlet runs without Timezone=local and uses the TZ env variable instead, things work; the container time matches the host:
>
> [root@localhost qm]# date
> Thu Apr 25 17:44:05 UTC 2024
> [root@localhost qm]# podman exec qm bash -c date
> Thu Apr 25 17:44:11 UTC 2024
>
> This one gives the following:
>
> podman exec qm bash -c "export TZ=jst; date"
>
> I changed the quadlet file to use the following (to pick up the host TZ we could use #Environment=TZ=$(date +%Z); I played with this one also):
>
> Environment=TZ=jst
>
> systemctl daemon-reload
> [root@localhost qm]# systemctl restart qm
> podman exec qm bash -c "date"
>
> Thu Apr 25 18:01:49 jst 2024
>
> I will propose marking the TZ in qm.container as ToBeFixed.

Hi Yariv,

Agreed, this could be a good workaround until we figure out what happened with Timezone=local.
It seems Timezone=local is just not working well; it's not following the timezone of the host machine, which is what it is expected to do.

On the other hand, setting an empty TZ environment variable like Environment=TZ (the system interprets this as defaulting to Coordinated Universal Time, UTC) or setting values like jst or America/New_York (Environment=TZ=America/New_York) makes everything work again (autosd ostree and regular).
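
To make that concrete, a quick sketch of how TZ plays out inside the container (commands mirror the ones used above; the zone names are just examples):

# empty/unset TZ: defaults to UTC inside the container
podman exec qm bash -c 'date'

# explicit zone via the environment, no /etc/localtime rewrite needed
podman exec qm bash -c 'TZ=America/New_York date'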

Output from the test:

[root@donald osbuild-manifests]# ./runvm --nographics ./cs9-qemu-qm-minimal-ostree.x86_64.qcow2
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
System BootOrder not found.  Initializing defaults.
Creating boot entry "Boot0007" with label "CentOS Linux" for file "\EFI\centos\shimx64.efi"

  Booting `Automotive Stream Distribution 9 (ostree:0)'

Automotive Stream Distribution 9
Kernel 5.14.0-438.391.el9iv.x86_64 on an x86_64

localhost login: root
Password:
Last failed login: Thu Apr 25 23:02:54 UTC 2024 on ttyS0
There was 1 failed login attempt since the last successful login.
[root@localhost ~]# systemctl status qm
● qm.service
     Loaded: loaded (/usr/share/containers/systemd/qm.container; generated)
     Active: active (running) since Thu 2024-04-25 23:02:44 UTC; 13s ago
   Main PID: 377 (conmon)
      Tasks: 7 (limit: 7765)
     Memory: 92.5M (swap max: 0B)
        CPU: 1.159s
     CGroup: /QM.slice/qm.service
             ├─libpod-payload-91335f31eadd767dfed908867584d9e4973234305dea662cb32f31d33c0ddc18
             │ ├─init.scope
             │ │ └─379 /sbin/init
             │ └─system.slice
             │   ├─bluechi-agent.service
             │   │ └─423 /usr/libexec/bluechi-agent
             │   ├─dbus-broker.service
             │   │ ├─426 /usr/bin/dbus-broker-launch --scope system --audit
             │   │ └─428 dbus-broker --log 4 --controller 9 --machine-id 91335f31eadd767dfed908867584d9e4 --max-bytes 536870912 --max-fds 4096 --max-matches 16384 --audit
[root@localhost ~]# podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED         STATUS         PORTS       NAMES
91335f31eadd              /sbin/init  16 seconds ago  Up 17 seconds              qm
[root@localhost ~]# cat /usr/share/containers/systemd/qm.container
[Install]
WantedBy=default.target

[Service]
# It's recommended to use systemd drop-in to override the
# systemd settings. See QM manpage for an example.
CPUWeight=50
Delegate=true
IOWeight=50
ManagedOOMSwap=kill
MemorySwapMax=0
# Containers within the qm by default set OOMScoreAdj to 750
OOMScoreAdjust=500
Restart=always
Slice=QM.slice
Environment=ROOTFS=/usr/lib/qm/rootfs
Environment=RWETCFS=/etc/qm
Environment=RWVARFS=/var/qm
LimitNOFILE=65536
TasksMax=50%

[Container]
AddCapability=all

# Commenting out DropCapability will allow FFI tools to surpass their defaults.
DropCapability=sys_resource

AddDevice=-/dev/kvm
AddDevice=-/dev/fuse
ContainerName=qm
Exec=/sbin/init
Network=host
PodmanArgs=--pids-limit=-1 --security-opt seccomp=/usr/share/qm/seccomp.json --security-opt label=nested --security-opt unmask=all
ReadOnly=true
Rootfs=${ROOTFS}

SecurityLabelNested=true
SecurityLabelFileType=qm_file_t
SecurityLabelLevel=s0
SecurityLabelType=qm_t
#Timezone=local
Environment=TZ
Volume=${RWETCFS}:/etc
Volume=${RWVARFS}:/var

Anyway, as we discussed (in parallel), we need to explore in a separate issue why Timezone=local is failing with podman/quadlet, to avoid blocking the release. The first question I would ask is: does downgrading/upgrading the podman version make this issue disappear?
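
A trivial sketch for pinning down the version in play before bisecting (assumes an rpm-based image like the cs9 one above):

# record the podman build shipped in the image before swapping versions
podman --version
rpm -q podman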

dougsland added a commit that referenced this issue Apr 26, 2024
In recent tests with an ostree + quadlet environment the QM service was not able
to work correctly: podman failed to remove /etc/localtime, affecting the QM
service. This patch is a workaround for Timezone=local which uses
Environment=TZ to set the default timezone to UTC until we work out the root cause.

More information: #367

Fixes: #367

Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com>
@dougsland (Collaborator)

As we have the workaround, further investigation will continue here: #394
