Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v5.1] Backports for v5.1.2 #23240

Merged
merged 18 commits into from
Jul 10, 2024
Merged

Conversation

mheon
Copy link
Member

@mheon mheon commented Jul 9, 2024

All the stuff from Bug Week that backported cleanly, plus a c/image bump at request of @mtrmac

Does this PR introduce a user-facing change?

NONE

Copy link
Contributor

openshift-ci bot commented Jul 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2024
@github-actions github-actions bot added machine kind/api-change Change to remote API; merits scrutiny labels Jul 9, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need #23013 to fix validate ci error.

Also while you are at it might want to take #23066 as well

.cirrus.yml Outdated
@@ -814,7 +814,7 @@ podman_machine_mac_task:
clone_script: # artifacts from osx_alt_build_task
- mkdir -p $CIRRUS_WORKING_DIR
- cd $CIRRUS_WORKING_DIR
- $ARTCURL/OSX%20Cross/repo/repo.tbz
- $ARTCURL/Build%20for%20MacOS%20amd64%2Barm64/repo/repo.tbz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be correct unless you also backport the task rename but this is overkill so just drop that here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mean you could have just dropped 831baea but I well manually reverting works too.

@Luap99
Copy link
Member

Luap99 commented Jul 9, 2024

Ah #23017 might be needed as well to get the WSL test passing

@Luap99 Luap99 changed the title Backports for v5.1.2 [v5.1] Backports for v5.1.2 Jul 10, 2024
@Luap99
Copy link
Member

Luap99 commented Jul 10, 2024

You can rebase now which should hopefully then have all CI problems fixed.

Is there are reason you did not add the version bump (tag) commit into this PR?

@mheon
Copy link
Member Author

mheon commented Jul 10, 2024

Wanted to merge this morning, not yesterday. I'm respinning now and including the version bump.

Luap99 and others added 15 commits July 10, 2024 08:06
Do not return 200 status code before we know if there will be an error.
Delay writing the status code until we send the first response. That way
we can set an error code inside the loop when we get a error on the
first try, i.e. because an invalid descriptor was used.

Fixes containers#22986

Signed-off-by: Paul Holzinger <pholzing@redhat.com>

<MH: Fixed conflict in tests>

Signed-off-by: Matt Heon <mheon@redhat.com>
The pod was set after we checked the namespace and the namespace code
only checked the --pod flag but didn't consider --pod-id-file option.
As such fix the check to first set the pod option on the spec then use
that for the namespace. Also make sure we always use an empty default
otherwise it would be impossible in the backend to know if a user
requested a specific userns or not, i.e. even in case of a set
PODMAN_USERNS env a container should still get the userns from the pod
and not use the var in this case. Therefore unset it from the default
cli value.

There are more issues here around --pod-id-file and cli validation that
does not consider the option as conflicting with --userns like --pod
does but I decided to fix the bug at hand and don't try to fix the
entire mess which most likely would take days.

Fixes containers#22931

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The restore code path never called completeNetworkSetup() and this means
that hosts/resolv.conf files were not populated. This fix is simply to
call this function. There is a big catch here. Technically this is
suposed to be called after the container is created but before it is
started. There is no such thing for restore, the container runs right
away. This means that if we do the call afterwards there is a short
interval where the file is still empty. Thus I decided to call it
before which makes it not working with PostConfigureNetNS (userns) but
as this does not work anyway today so  I don't see it as problem.

Fixes containers#22901

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we execute ps(1) in the container and the container uses a userns
with a different id mapping the user id field will be wrong.

To fix this we must join the userns in such case.

Fixes containers#22293

Signed-off-by: Paul Holzinger <pholzing@redhat.com>

<MH: Fixed conflict in tests>

Signed-off-by: Matt Heon <mheon@redhat.com>
The current timeout was not long enough. Systemd default is 90s so we
should wait for at least that long. Also it really doesn't make sense to
throw an error we saying we failed waiting for stop. We should hard
terminate the VM in case a graceful shutdown did not happen.

Fixes containers#22515

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Two tests are skipped for a long time because they flaked to much,
nobody cares about them and there are only debugging endpoints mostly so
it is not critical either.

The "of 2 seconds" tests isn't useful either. It waits up to 30s for the
exit so it doesn't actually verify a proper timeout. Additionally we
have similar checks in the system tests "podman system service -
CORS enabled in logs" so I consider this safe to remove.

Fixes containers#12624

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The task got renamed but didn't fix the URL for the machine test task
only the artifacts task url was fixed.

Fixes 439fe90 ("Minor: Rename the OSX Cross task")

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
One problem on FCOS is that the root directory is immutable, as such in
order to mount arbitrary paths from the host we must make it mutable
again and create these dir on boot in order to be able to mount there.

The current logic was racy as it used one unit for each path and they
all did chattr -i /; mkdir -p $path; chattr -i / and systemd can run
these units in parallel. That means it was possible for another unit to
make / immutable before the unit could do the mkdir. I pointed this out
on the original PR[1] but we never followed up on it...

Now this here changes several things. First have one unit that does the
chattr -i / (immutable-root-off.service), it is hooked into
remote-fs-pre.target which means it is executed before the network
mounts (virtiofs) are done.

Then we have another unit that does chattr +i /
(immutable-root-on.service) which turn the immutable root back on after
remote-fs.target which means all mount are done at this point.

Additionally the automount unit is removed because it does not add any
value for us and it was borken anyway as it used the virtiofs tag as
path so systemd just ignored it.

[1] containers#20612 (comment)

Fixes containers#22569

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Matt Heon <mheon@redhat.com>
Signed-off-by: Matt Heon <mheon@redhat.com>
No idea why we need them, it passes without them so I just remove them.
Currently CI is broken as this install is failing on rawhide for some
reason. I don't know what changed there but this is working and unblocks
CI so I like to get this in.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a user specifies a invalid connection in CONTAINER_CONNECTION then
podman should return a proper error saying so. Currently it ignored the
error and in rootFlags() just exited early with defining any flags. This
caused a panic then when trying to use the flags later.

In order to address this first store the connection error in the
PodmanConfig struct and not abort right away during flag setup. This is
important as the user might have specified a flag with a valid remote
connection. As such we check all flags and only when none were given we
return the connection error.

Also while at it I noticed that the default connection reported via
podman --help was wrong as it only used the old containers.conf field
for it and did not consider the podman-connections.json default.

New regression tests have been added to make sure it behaves correctly.

This fixes the problem reported in the PR containers#22997.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Matt Heon <mheon@redhat.com>
First of some commands ignored cmd.Wait() error which means it was
impossible to notice any command errors. And others only returned
the wait error as it which when a command fails is just
`exit status <code>` which is not helpful at all.

This commit should add proper error wrapping with stderr to get useful
strings back hopefully.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We do a soft stop via systemd to allow graceful shutdown behavior.
Hoewever for unknown reason we are hitting such a case in CI right now.
Regardless of the CI issue we should always to the hard terminate in
such case so only log the timeout as warning.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99
Copy link
Member

Luap99 commented Jul 10, 2024

Ah sorry I just looked at the epel failures and that reminded me of 6db8ff7

I am not sure why our fedora builds work but without this patch some distros may not be able to build podman.

As this file uses open it needs to include fcntl.h.
This should fix the build error seen on epel9[1], not sure why it works
on the other platforms.

[1] https://download.copr.fedorainfracloud.org/results/packit/containers-podman-23113/epel-9-aarch64/07672197-podman/builder-live.log.gz

Fixes 65ed965 ("podman top: join the container userns")

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Matt Heon <mheon@redhat.com>
Signed-off-by: Matt Heon <mheon@redhat.com>
@mheon
Copy link
Member Author

mheon commented Jul 10, 2024

Added

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@baude
Copy link
Member

baude commented Jul 10, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 10, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 6114aaa into containers:v5.1 Jul 10, 2024
84 of 87 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. machine release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants