Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: Don't invoke systemctl start if unit is already active #3523

Merged
merged 1 commit into from Mar 18, 2022

Conversation

cgwalters
Copy link
Member

Adding onto the pile of hacks here unfortunately. Basically
RHEL8 systemd seems to count explicit systemctl start invocations
against the restart limit. We hit this in tests in openshift/machine-config-operator
which are invoking rpm-ostree status and rpm-ostree kargs frequently.

@cgwalters
Copy link
Member Author

(Only compile tested locally, my rhel8 devenv bitrotted, looking at resurrecting it)

@cgwalters
Copy link
Member Author

xref openshift/machine-config-operator#3019

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: the original patch for this was partly added because the daemon was taking too long to start and racing with the D-Bus timeout, but I'm 73% sure that now with #3406, this will no longer be an issue. The error-reporting part still applies though.

let activeres = Command::new("systemctl")
.args(&["is-active", "rpm-ostreed"])
.output()?;
if !activeres.status.success() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is-active returns nonzero if the service is not active. But even then, I think it'd be better to not make this a hard error.

So maybe let's drop this if-statement entirely and key off purely on its stdout regardless of the exit code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I am so used to my custom fish prompt that clearly shows exit status of last command that I get tripped up in bash cases when it isn't present.

@cgwalters
Copy link
Member Author

Aside: the #2945 was partly added because the daemon was taking too long to start and racing with the D-Bus timeout, but I'm 73% sure that now with #3406, this will no longer be an issue.

I don't think the GPG key parsing was ever a big problem on RHEL:

$ podman run --rm -ti registry.ci.openshift.org/rhcos-devel/rhel-coreos:4.11 ls -al /etc/pki/rpm-gpg/
total 16
drwxr-xr-x. 1 root root  148 Jan  1  1970 .
drwxr-xr-x. 1 root root  146 Jan  1  1970 ..
-rw-r--r--. 2 root root 1923 Jan  1  1970 ISV-Container-signing-key
-rw-r--r--. 2 root root 1669 Jan  1  1970 RPM-GPG-KEY-redhat-beta
-rw-r--r--. 2 root root 5135 Jan  1  1970 RPM-GPG-KEY-redhat-release
$

The error-reporting part still applies though.

Yeah.

Adding onto the pile of hacks here unfortunately.  Basically
RHEL8 systemd seems to count explicit `systemctl start` invocations
against the restart limit.  We hit this in tests in openshift/machine-config-operator
which are invoking `rpm-ostree status` and `rpm-ostree kargs` frequently.
@jlebon jlebon enabled auto-merge March 16, 2022 21:12
@cgwalters
Copy link
Member Author

Hmm, jenkins CI seems to have gotten a bit worse recently. Clicking the restart button one more time but if that fails, going to override and we'll need to dig into that.

@jlebon
Copy link
Member

jlebon commented Mar 18, 2022

Hmm, doesn't seem like a flake. It looks like something going wrong with the vmcheck test harness itself, but no obvious error messages.

@jlebon
Copy link
Member

jlebon commented Mar 18, 2022

Can't reproduce locally so far. Opened #3528. Feel free to push there too to debug this.

@jlebon
Copy link
Member

jlebon commented Mar 18, 2022

Restarted CI. SSH bug should be fixed now!

@jlebon jlebon merged commit 10d3f24 into coreos:main Mar 18, 2022
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Mar 22, 2022
The main motivation here is to work around
coreos/rpm-ostree#3523
(Which is itself a workaround for a RHEL8 systemd bug)

Basically this e2e is invoking `rpm-ostree kargs` in a pretty
tight loop which triggers that bug.

To read the kernel command line, we can just read `/proc/cmdline`
instead.  (Now, this is the *actual* cmdline instead of just rpm-ostree's
view of it, but it should be fine)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants