
Cirrus: Install docker from package cache #3292

Merged
2 commits merged into containers:main from the use_cached_packages branch on Jun 29, 2021

Conversation

@cevich (Member) commented Jun 9, 2021

What type of PR is this?

/kind flake

What this PR does / why we need it:

Installing packages at runtime (from an external source) is problematic
for many reasons. Specifically, in the case of buildah/docker
conformance testing, it means the current "latest" packages are
always installed. This is a problem as new release branches are
created, because it opens a window for test-environment changes
to happen after the buildah/test code has been stabilized.

Fix this by using new/special VM images which cache the required docker
packages. At runtime, the required packages can then be installed from
this cache instead of reaching out to an external repository. Since the images
used by tests on release branches never change, this also serves to
stabilize the package versions for that specific environment.
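
For illustration only, a minimal sketch of what the runtime install step could look like with such a cache, assuming a Debian/Ubuntu-style VM. The cache path and package layout here are assumptions; the real ones are defined by the image-build work referenced below (containers/automation_images#75):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: install docker from packages baked into the VM image
# instead of fetching "latest" from an external repository at test time.
set -euo pipefail

# Assumed location of the cache created during the VM image build.
PKG_CACHE="/var/cache/docker-packages"

if compgen -G "${PKG_CACHE}/*.deb" > /dev/null; then
    # Only packages frozen into the image get installed, so the test
    # environment for a release branch cannot drift after branching.
    dpkg -i "${PKG_CACHE}"/*.deb
else
    echo "FATAL: expected docker package cache at ${PKG_CACHE} not found" >&2
    exit 1
fi
```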

How to verify it

The automated 'conformance' testing will pass in this PR

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Ref: containers/automation_images#75

Does this PR introduce a user-facing change?

None

@cevich force-pushed the use_cached_packages branch 2 times, most recently from 92cfa3a to 44110dd on June 10, 2021 14:51
@cevich changed the title from "[WIP] [CI:DOCS] Cirrus: Install docker from package cache" to "[WIP] Cirrus: Install docker from package cache" on Jun 10, 2021
@cevich force-pushed the use_cached_packages branch 2 times, most recently from fc811e0 to f68037a on June 11, 2021 14:22
@cevich (Member, Author) commented Jun 11, 2021

Uh oh, I think I broke it. @edsantiago, every platform (a guess - I only looked at two or three) is failing the `bud with --runtime and --runtime-flag` test in the same way. I don't think (?) it has anything to do with my script changes (here), but it may have something to do with the brand-new/updated VM images. Do you have/know/see any clues that may indicate what I broke?

@edsantiago (Collaborator) commented:

Sorry, I took a quick look at this last week but decided it was nothing I could have an opinion on.

Diving into it now, this is almost certainly a problem with the new VM update. I can reproduce this on my laptop, with buildah@master. My laptop updated runc 1.0.0-377.rc93 -> -378.rc95 on 2021-05-29.

My uneducated guess is that the test developer chose a string ("nsexec started") that looked reliable, but is actually not; and runc changed so that it no longer spits out that string; and someone will have to fix the buildah test.

@cevich (Member, Author) commented Jun 16, 2021

@edsantiago thanks for taking a look, that helps. I haven't looked at the test recently... is there anything you can think of so the test doesn't need to rely on (perhaps frequently changing) debug-level messages? If not, I'll try fixing the test with the message you suggest and see what happens.

@edsantiago (Collaborator) commented:

I would bring in a runc expert, because I have no idea what guarantees runc makes (or does not make) about its debug messages, or what problems we (buildah gating tests) will run into if the new test runs with an old runc (e.g. on RHEL). My naïve first reaction is to simply search for the presence of level=debug, which (to me) guarantees that the runtime flag was passed along to something. Whether that something is runc or not, the test cannot determine, but I don't know how important that is; if it is important, the test could check for "level=debug.*msg=.*/runc".
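
For illustration, a minimal bats-style sketch of the looser check being suggested here, using plain bats `run`. The test name, flags, and context path are placeholders, not buildah's actual test code:

```bash
@test "bud with --runtime-flag passes a debug flag through to the runtime" {
  # Placeholder invocation; the real test builds a fixture context.
  run buildah bud --runtime-flag=--debug --tag=testimage ./bud-context
  [ "$status" -eq 0 ]

  # Loose check: some debug-level output appeared, i.e. the flag was
  # passed along to *something*.
  [[ "$output" =~ level=debug ]]

  # Stricter, optional check: the debug message mentions runc itself.
  [[ "$output" =~ level=debug.*msg=.*/runc ]]
}
```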

@cevich (Member, Author) commented Jun 16, 2021

@giuseppe what's your opinion (see the test failure due to the new runc)? Can I update the test to just check for level=debug, or is something more extensive needed in the system tests?

@cevich (Member, Author) commented Jun 16, 2021

Added a proposed fix (in case it's acceptable to @giuseppe) and updated the VM images yet again, to help confirm that the podman 3.2.1 addition to the Ubuntu images is functional (cc @lsm5).

@edsantiago (Collaborator) commented:

LGTM

@cevich marked this pull request as ready for review on June 16, 2021 21:06
@edsantiago (Collaborator) commented:

Looks like seccomp again:

# time="2021-06-16T21:35:29Z" level=error msg="container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bdflush: permission denied" func=main.fatal file="/usr/src/packages/BUILD/src/github.com/opencontainers/runc/utils.go:57"
# time="2021-06-16T21:35:29Z" level=debug msg="container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bdflush: permission denied" func=main.fatal file="/usr/src/packages/BUILD/src/github.com/opencontainers/runc/utils.go:59"

Log

@cevich (Member, Author) commented Jun 17, 2021

Sigh, yep, and rebasing didn't help (it was worth a try).

[+0731s] # time="2021-06-17T13:53:08Z" level=error msg="container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bdflush: permission denied" func=main.fatal file="/usr/src/packages/BUILD/src/github.com/opencontainers/runc/utils.go:57"
[+0731s] # time="2021-06-17T13:53:08Z" level=debug msg="container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bdflush: permission denied" func=main.fatal file="/usr/src/packages/BUILD/src/github.com/opencontainers/runc/utils.go:59"
[+0731s] # : exit status 1

@cevich (Member, Author) commented Jun 17, 2021

Opened a test PR in podman to see if these several-days-newer images reproduce similar failures: containers/podman#10709

@cevich (Member, Author) commented Jun 17, 2021

Yay, a result! This problem is reproducing in podman testing on Ubuntu 20.10 (CGv1 + runc): https://cirrus-ci.com/task/6152440987779072?logs=main#L272

@openshift-ci bot (Contributor) commented Jun 28, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich


Installing packages at runtime (from an external source) is problematic
for many reasons.  Specifically in the case of buildah/docker
conformance testing, it means the current "latest" packages are
always installed.  This is a problem as new release branches are
created, because it presents an opportunity for test-environment changes
to happen after buildah/test code is stabilized.

Fix this by using new/special VM images which cache the required docker
packages.  At runtime then, the required packages may be installed from
this cache instead of reaching out to the repository.  Since images used
by tests on release branches never change, this will also serve to
stabilize the package versions for that specific environment.

Signed-off-by: Chris Evich <cevich@redhat.com>

This test was observed failing in upstream CI on all platforms due to
the contents of debug messages changing after updating runc.  Since the
system tests need to function on multiple platforms with inconsistent
runc versions (depending on testing context), match a more general
output message on success.  Specifically, the test really only cares
that debugging output appears and that runc is always used.

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich changed the title from "[WIP] Cirrus: Install docker from package cache" to "Cirrus: Install docker from package cache" on Jun 29, 2021
@cevich (Member, Author) commented Jun 29, 2021

Okay, assuming tests pass, this should be ready to go. The real benefits will only show up over time in new release branches; on 'main' it may bring only a minor improvement to stability. Once this merges, I will investigate the possibility of doing something similar for already-released branches (possible in theory; practice is another matter).

@rhatdan (Member) commented Jun 29, 2021

/lgtm

@openshift-ci bot added the lgtm label on Jun 29, 2021
@openshift-merge-robot merged commit ecfcde2 into containers:main on Jun 29, 2021
@github-actions bot locked this pull request as resolved and limited the conversation to collaborators on Sep 18, 2023