machine: Volume ops test: statfs /private/tmp/ci/ginkgoNNN: no such file or directory #22569

Open
edsantiago opened this issue May 1, 2024 · 6 comments
Labels
flakes Flakes from Continuous Integration machine macos MacOS (OSX) related

Comments

@edsantiago
Collaborator

Not quite as frequent or annoying as #22551, but still causing wasted runs:

run basic podman commands
  Volume ops
....
  Trying to pull quay.io/libpod/alpine_nginx:latest...
  ...
  Writing manifest to image destination
  WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
  Error: statfs /private/tmp/ci/ginkgo630638030: no such file or directory
x x x x x x
machine-mac(5) podman(5) darwin(5) rootless(5) host(5) sqlite(5)
@edsantiago edsantiago added flakes Flakes from Continuous Integration macos MacOS (OSX) related machine labels May 1, 2024
@edsantiago
Collaborator Author

@cevich is there any chance whatsoever that https://github.com/containers/podman/blob/c9644ebccf14309a77769cba00833cd139509e4a/contrib/cirrus/mac_cleanup.sh is getting invoked in the middle of a running CI job? I just can't understand this bug and am grasping at straws.

@Luap99
Member

Luap99 commented May 3, 2024

Unlikely. There is an extra level of indirection here, given that the dir is mounted in the machine VM. So maybe the machine mount failed silently?

@cevich
Member

cevich commented May 3, 2024

is there any chance whatsoever that

What Paul said. And the Macs are single-task/single-user. Is it possible the running VM really is x86_64 via some emulation, and/or is the pull command specifying --arch or --platform (just double-checking)?

Sorry, no hack/get_ci_vm.sh support here; that's just way too complex to do safely in this environment. But in case it helps and is supported (I never checked), the re-run in terminal may be an option (with cleanup temporarily disabled).

Otherwise, there is a way to isolate (for a few hours) one of the Macs and dedicate it to servicing a single PR. In that PR, the end-of-task cleanup could be disabled, so that a human may ssh in and check out the state of things. This is all manual, and a bit of a chore to pull off, but it's technically possible.

@Luap99
Member

Luap99 commented May 15, 2024

@edsantiago Not sure if you are testing machine in your no-flake-retry testing PR, but if you are, could you give this a go:

diff --git a/pkg/machine/apple/apple.go b/pkg/machine/apple/apple.go
index 93201407e..04db7638b 100644
--- a/pkg/machine/apple/apple.go
+++ b/pkg/machine/apple/apple.go
@@ -124,7 +124,7 @@ func GenerateSystemDFilesForVirtiofsMounts(mounts []machine.VirtIoFs) ([]ignitio
        mountPrep.Add("Service", "Type", "oneshot")
        mountPrep.Add("Service", "ExecStartPre", "chattr -i /")
        mountPrep.Add("Service", "ExecStart", "mkdir -p '%f'")
-       mountPrep.Add("Service", "ExecStopPost", "chattr +i /")
+       // mountPrep.Add("Service", "ExecStopPost", "chattr +i /")
 
        mountPrep.Add("Install", "WantedBy", "remote-fs.target")
        mountPrepFile, err := mountPrep.ToString()

@edsantiago
Collaborator Author

Oops, no, I long ago disabled machine tests in #17831. I will look into reenabling this one.

FWIW here's the current flake list. I don't think there's any useful info in this list, i.e., I haven't seen any logs that look different or provide interesting new data, but am posting anyway.

x x x x x x
machine-mac(7) podman(7) darwin(7) rootless(7) host(7) sqlite(7)

@Luap99
Member

Luap99 commented May 15, 2024

The alternative is that I instrument the tests to do some checks. Basically, they would have to ssh into the machine VM and run systemctl status on all the mount units. I think the race here is the most likely cause.

One interesting data point would be the new machine init with volume test. If it never fails, then I am sure this is a race caused by chattr -i and chattr +i running in parallel in different units. The reason: that test mounts only one path, so there cannot be a race, whereas the default volumes span several paths, hence the chance for the race.
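To make the suspected race concrete, here is a toy model of it. It is a sketch of the hypothesis only and has not been verified against real systemd scheduling. It assumes each mount-prep unit runs "chattr -i /", then "mkdir -p", then "chattr +i /", that / starts out immutable, and that the target directory does not exist yet. The type and function names are hypothetical.

```go
package main

// step is one command run by a hypothetical mount-prep unit.
type step struct {
	unit string // name of the unit running the command
	cmd  string // "unlock" (chattr -i /), "mkdir", or "lock" (chattr +i /)
}

// replay applies the steps in order to one shared immutable flag on /
// and returns the units whose mkdir ran while / was immutable.
func replay(order []step) []string {
	immutable := true // assumption: / is immutable before any unit runs
	var failed []string
	for _, s := range order {
		switch s.cmd {
		case "unlock":
			immutable = false
		case "lock":
			immutable = true
		case "mkdir":
			if immutable {
				failed = append(failed, s.unit)
			}
		}
	}
	return failed
}

// serialOrder runs unit A to completion before unit B starts.
func serialOrder() []step {
	return []step{
		{"A", "unlock"}, {"A", "mkdir"}, {"A", "lock"},
		{"B", "unlock"}, {"B", "mkdir"}, {"B", "lock"},
	}
}

// racyOrder interleaves the units: A re-locks / after B has unlocked it
// but before B has created its directory.
func racyOrder() []step {
	return []step{
		{"A", "unlock"}, {"B", "unlock"},
		{"A", "mkdir"}, {"A", "lock"},
		{"B", "mkdir"}, {"B", "lock"},
	}
}
```

In this model `replay(serialOrder())` reports no failures, while `replay(racyOrder())` reports unit B. With a single mount path there is only one unit, so no interleaving exists, which matches the reasoning about the machine init with volume test.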

edsantiago added a commit to edsantiago/libpod that referenced this issue May 15, 2024
…bugging containers#22569

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/libpod that referenced this issue May 23, 2024
…bugging containers#22569

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/libpod that referenced this issue Jun 3, 2024
…bugging containers#22569

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/libpod that referenced this issue Jun 3, 2024
…bugging containers#22569

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/libpod that referenced this issue Jun 5, 2024
…bugging containers#22569

Signed-off-by: Ed Santiago <santiago@redhat.com>

3 participants