Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Migrate Ubuntu to Debian CI VM Images #1535

Merged
merged 2 commits into from
Apr 4, 2023

Conversation

cevich
Copy link
Member

@cevich cevich commented Mar 15, 2023

@cevich
Copy link
Member Author

cevich commented Mar 16, 2023

Just when I thought I was done building CI VM images, it seems Debian is missing XFS utils. 🤔

@cevich
Copy link
Member Author

cevich commented Mar 21, 2023

@rhatdan & @nalind PTAL when you have a moment. I disabled the ZFS tests due to Debian image-build complications. If you think that's something we really need, I can fuss with the images more.

@rhatdan
Copy link
Member

rhatdan commented Mar 21, 2023

LGTM
I will let @nalind comment on the ZFS

@cevich cevich requested a review from nalind March 22, 2023 16:19
@TomSweeneyRedHat
Copy link
Member

LGTM and would like a @nalind head nod

@nalind
Copy link
Member

nalind commented Mar 28, 2023

Yes, I would like to continue testing zfs. We ship the code, people try to use it, we only recently did the work to start testing it in CI, and I don't want to sacrifice that improvement in order to make a different one.

The additional setup steps seem to look more or less like:

sed -i -r 's/^(deb.*)/\1 contrib/g' /etc/apt/sources.list
apt-get update
apt-get install zfsutils zstd

That drags in zfs-dkms, which builds the kernel module.

@cevich
Copy link
Member Author

cevich commented Mar 28, 2023

Thanks for researching the needed steps nalin. I'll incorporate those into the VM Image build scripts, and crank out a set of new images for this.

cevich added a commit to cevich/automation_images that referenced this pull request Mar 28, 2023
Ref: containers/storage#1535

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich cevich marked this pull request as draft March 28, 2023 18:47
cevich added a commit to cevich/automation_images that referenced this pull request Mar 28, 2023
Ref: containers/storage#1535

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/automation_images that referenced this pull request Mar 29, 2023
Ref: containers/storage#1535

Signed-off-by: Chris Evich <cevich@redhat.com>
cevich added a commit to cevich/automation_images that referenced this pull request Mar 29, 2023
Ref: containers/storage#1535

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich
Copy link
Member Author

cevich commented Mar 29, 2023

@nalind any idea why the module isn't loading? It's definitely being installed (log). Is /lib/modules/6.1.0-7-amd64/updates/dkms/ the correct path for modprobe to find it?

Ref: Debian-12 zfs test log

@nalind
Copy link
Member

nalind commented Mar 29, 2023

Is there something odd going on with the handling of the kernel in this VM image? The log and dkms status says that the zfs module was built for 6.1.0-7-amd64, but when I bring it up, uname -r says it's running 6.1.0-7-cloud-amd64.

@cevich
Copy link
Member Author

cevich commented Mar 30, 2023

Oh! Good catch! I wonder if that's some special kernel google placed.

    debian: Module build for kernel 6.1.0-7-cloud-amd64 was skipped since the
    debian: kernel headers for this kernel do not seem to be installed.

Maybe installing the cloud kernel headers is what's needed? I can give that a try.

Edit: I found https://packages.debian.org/sid/linux-headers-cloud-amd64

cevich added a commit to cevich/automation_images that referenced this pull request Mar 30, 2023
Ref: containers/storage#1535

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich
Copy link
Member Author

cevich commented Mar 30, 2023

@nalind Thanks for help with the kernel module. I think I got that fixed now and the ZFS tests look to be running much better. There's just a single test failure. Any idea what might be the trouble?

[+1006s] --- FAIL: TestOverlayEcho (0.24s)
[+1006s]     graphtest_unix.go:508: unlinkat /tmp.PSLnHvVlHo/tmp/storage-graphtest-336150358/overlay/3ee87b84f68a559f8458e08dc743e3015e027264d5bf40f13c332a4254062ee7/merged/subdir1/subdir2/subdir3/subdir4/subdir5/subdir6/subdir7/subdir8/subdir9: invalid argument
[+1006s] FAIL
[+1006s] FAIL	github.com/containers/storage/drivers/overlay	11.888s

@nalind
Copy link
Member

nalind commented Mar 30, 2023

Reproducing it, journalctl shows me overlayfs: upper fs does not support RENAME_WHITEOUT. being logged at about the same time.
We stopped rejecting overlay-on-zfs outright, but #1432 (comment) notes that the support that overlay needs from zfs is not present in the current 2.1.9 release, and is instead slated to be in a future 2.2.x release. We need to start exercising this functionality as part of checking if an underlying filesystem is suitable for use with overlay. Opened #1545 to do so.

@nalind
Copy link
Member

nalind commented Mar 31, 2023

#1545 was merged, so I think rebasing will be enough to get that test passing.

@cevich
Copy link
Member Author

cevich commented Mar 31, 2023

Thanks @nalind

@cevich cevich marked this pull request as ready for review March 31, 2023 15:04
@cevich
Copy link
Member Author

cevich commented Mar 31, 2023

@nalind it looks like all the tests are passing, but the final cleanup has some fart:

[+1151s] cannot destroy 'tmp.mNwHJCa6kq/tmp': dataset is busy

@nalind
Copy link
Member

nalind commented Mar 31, 2023

Cherry-picking the tip of #1548 should sort that out.

@cevich
Copy link
Member Author

cevich commented Mar 31, 2023

Thanks again Nalin.

@cevich
Copy link
Member Author

cevich commented Apr 3, 2023

@nalind I rebased to bring in #1548 but the tests are still complaining: [+1158s] cannot destroy 'tmp.U8HgWAHExr/tmp': dataset is busy 😢 Is this a real failure or maybe something we can just ignore because it's CI?

@nalind
Copy link
Member

nalind commented Apr 3, 2023

There's a leftover sleep 1000s from the device-mapper driver's TestDevmapperMountLeaks() test holding the zfs dataset mount open in a different moiunt namespace. I guess that process can be killed, which cleans up the namespace and its mounts, pretty quickly, by adding a

kill $(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')

or more politely, by unmounting the dataset from the namespace, by adding

umount -l /proc/$(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')/root/"$zpool/tmp"

to the cleanup logic. The latter still takes a couple of seconds before zfs will accept that the dataset is no longer busy.

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich
Copy link
Member Author

cevich commented Apr 4, 2023

@nalind the VFS task is throwing the same error. Did I put the fix in the wrong place?

@nalind
Copy link
Member

nalind commented Apr 4, 2023

Yeah, the teardown() function is by individual bats tests, which is how we're running integration tests.
The lingering process is spawned when go test is in drivers/devmapper, and we're running the unit tests after the integration tests.
I'd suggest adding the extra bit of cleanup around line 73 of contrib/cirrus/build_and_test.sh.

@cevich
Copy link
Member Author

cevich commented Apr 4, 2023

Oooohhh, it's the EXIT trap that's generating the error, now I get it. Thanks!

Fix for post-testing error:

`cannot destroy 'tmp.uwIwWzoNp4/tmp': dataset is busy`

Thanks to @nalind for the fix.

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich
Copy link
Member Author

cevich commented Apr 4, 2023

Yay \o/

All green now 😄

@rhatdan
Copy link
Member

rhatdan commented Apr 4, 2023

LGTM

@rhatdan rhatdan merged commit 9904fa3 into containers:main Apr 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants