client: Don't invoke `systemctl start` if unit is already active #3523

cgwalters · 2022-03-16T17:46:54Z

Adding onto the pile of hacks here unfortunately. Basically
RHEL8 systemd seems to count explicit `systemctl start` invocations
against the restart limit. We hit this in tests in openshift/machine-config-operator
which are invoking `rpm-ostree status` and `rpm-ostree kargs` frequently.

cgwalters · 2022-03-16T17:47:15Z

(Only compile-tested locally; my RHEL8 devenv bitrotted and I'm looking at resurrecting it.)

cgwalters · 2022-03-16T17:48:38Z

xref openshift/machine-config-operator#3019

jlebon

Aside: the original patch for this was partly added because the daemon was taking too long to start and racing with the D-Bus timeout, but I'm 73% sure that now with #3406, this will no longer be an issue. The error-reporting part still applies though.

jlebon · 2022-03-16T19:20:33Z

rust/src/client.rs

+    let activeres = Command::new("systemctl")
+        .args(&["is-active", "rpm-ostreed"])
+        .output()?;
+    if !activeres.status.success() {


is-active returns nonzero if the service is not active. But even then, I think it'd be better to not make this a hard error.

So maybe let's drop this if-statement entirely and key off purely on its stdout regardless of the exit code?

Argh, I am so used to my custom fish prompt that clearly shows exit status of last command that I get tripped up in bash cases when it isn't present.
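
A minimal sketch of the approach suggested in this thread: decide based on what `systemctl is-active` prints and ignore its exit code. The helper names `stdout_reports_active` and `unit_is_active` are hypothetical and not taken from the patch. Treating a failure to spawn `systemctl` as "not active" instead of a hard error is also an assumption about the desired behavior.

```rust
use std::process::Command;

/// Returns true only if `systemctl is-active` printed exactly "active".
/// Other states such as "inactive", "failed" or "activating" count as not active.
fn stdout_reports_active(stdout: &[u8]) -> bool {
    String::from_utf8_lossy(stdout).trim() == "active"
}

/// Hypothetical helper: ask systemd whether `unit` is active.
/// The exit status is deliberately ignored, since `is-active` exits nonzero
/// for a unit that is simply not running. A failure to run `systemctl` at
/// all is also treated as "not active" (an assumption), so the caller can
/// fall back to `systemctl start`.
fn unit_is_active(unit: &str) -> bool {
    match Command::new("systemctl").args(&["is-active", unit]).output() {
        Ok(res) => stdout_reports_active(&res.stdout),
        Err(_) => false,
    }
}
```

A caller would then invoke `systemctl start` only when `unit_is_active("rpm-ostreed")` returns false, which avoids counting against the start limit when the daemon is already running.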

cgwalters · 2022-03-16T19:58:37Z

Aside: #2945 was partly added because the daemon was taking too long to start and racing with the D-Bus timeout, but I'm 73% sure that now with #3406, this will no longer be an issue.

I don't think the GPG key parsing was ever a big problem on RHEL:

$ podman run --rm -ti registry.ci.openshift.org/rhcos-devel/rhel-coreos:4.11 ls -al /etc/pki/rpm-gpg/
total 16
drwxr-xr-x. 1 root root  148 Jan  1  1970 .
drwxr-xr-x. 1 root root  146 Jan  1  1970 ..
-rw-r--r--. 2 root root 1923 Jan  1  1970 ISV-Container-signing-key
-rw-r--r--. 2 root root 1669 Jan  1  1970 RPM-GPG-KEY-redhat-beta
-rw-r--r--. 2 root root 5135 Jan  1  1970 RPM-GPG-KEY-redhat-release
$

The error-reporting part still applies though.

Yeah.

cgwalters · 2022-03-17T21:32:16Z

Hmm, Jenkins CI seems to have gotten a bit worse recently. Clicking the restart button one more time, but if that fails, I'm going to override and we'll need to dig into that.

jlebon · 2022-03-18T14:02:05Z

Hmm, doesn't seem like a flake. It looks like something going wrong with the vmcheck test harness itself, but no obvious error messages.

jlebon · 2022-03-18T14:06:41Z

Can't reproduce locally so far. Opened #3528. Feel free to push there too to debug this.

jlebon · 2022-03-18T20:40:19Z

Restarted CI. SSH bug should be fixed now!

The main motivation here is to work around coreos/rpm-ostree#3523 (which is itself a workaround for a RHEL8 systemd bug). Basically, this e2e is invoking `rpm-ostree kargs` in a pretty tight loop, which triggers that bug. To read the kernel command line, we can just read `/proc/cmdline` instead. (Now, this is the *actual* cmdline instead of just rpm-ostree's view of it, but it should be fine.)
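
As a rough illustration of reading the kernel command line from `/proc/cmdline` rather than shelling out to `rpm-ostree kargs`, here is a small sketch. The function names `parse_kargs`, `has_karg` and `read_proc_cmdline` are hypothetical. The parsing is deliberately simplified: it splits on whitespace only and does not handle quoted values that contain spaces.

```rust
use std::fs;
use std::io;

/// Split a kernel command line into its arguments.
/// Simplification: splits on whitespace only, so quoted values that
/// contain spaces are not handled.
fn parse_kargs(cmdline: &str) -> Vec<&str> {
    cmdline.split_whitespace().collect()
}

/// Returns true if `key` appears either as a bare flag or as `key=value`.
fn has_karg(cmdline: &str, key: &str) -> bool {
    parse_kargs(cmdline).iter().any(|arg| {
        *arg == key || arg.strip_prefix(key).map_or(false, |rest| rest.starts_with('='))
    })
}

/// Read the command line of the running kernel. As noted above, this is
/// the actual booted cmdline, not rpm-ostree's view of it.
fn read_proc_cmdline() -> io::Result<String> {
    fs::read_to_string("/proc/cmdline")
}
```

A test could call `read_proc_cmdline()` once and pass the result to `has_karg`, with no daemon start involved.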

cgwalters mentioned this pull request Mar 16, 2022

server/api_test: Adjust expected error message for Go 1.18 openshift/machine-config-operator#3019

Merged

cgwalters force-pushed the rhel8-startlimit branch from c21cc41 to b21cbd9 Compare March 16, 2022 17:49

jlebon reviewed Mar 16, 2022

cgwalters force-pushed the rhel8-startlimit branch from b21cbd9 to 0556152 Compare March 16, 2022 20:05

jlebon approved these changes Mar 16, 2022

jlebon enabled auto-merge March 16, 2022 21:12

jlebon merged commit 10d3f24 into coreos:main Mar 18, 2022

cgwalters mentioned this pull request Mar 22, 2022

e2e: Use /proc/cmdline instead of rpm-ostree kargs openshift/machine-config-operator#3034

Merged

cgwalters mentioned this pull request Jul 11, 2022

daemon: Drop workarounds for rpm-ostree bugs openshift/machine-config-operator#3239

Merged

cgwalters mentioned this pull request Jul 19, 2022

Support delegation of privilege using LoadCredential=, add socket activation #3850

Open
