[macos] cannot start a VM created with podman machine init after a reboot #10824

Closed
yvanarnaud opened this issue Jun 30, 2021 · 18 comments · Fixed by #10850
Labels: In Progress, kind/bug, locked - please file new issue/PR

Comments

@yvanarnaud

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

On macOS, after a reboot, a VM created with podman machine init can no longer be started.

Steps to reproduce the issue:

  1. create a VM with podman machine init : working as expected

  2. start it with podman machine start : working as expected

  3. reboot

  4. start VM again with podman machine start : VM does not start

Describe the results you received:

Waiting for VM ...
qemu-system-x86_64: -qmp unix://var/folders/78/y7tgcc410y34zywtm83dgjt80000gn/T/podman/qmp_podman-machine-default.sock,server=on,wait=off: Failed to bind socket to //var/folders/78/y7tgcc410y34zywtm83dgjt80000gn/T/podman/qmp_podman-machine-default.sock: No such file or directory
Error: dial unix /var/folders/78/y7tgcc410y34zywtm83dgjt80000gn/T/podman/podman-machine-default_ready.sock: connect: no such file or directory

Describe the results you expected:

The VM should start.

Additional information you deem important (e.g. issue happens only occasionally):

The same behaviour occurs when creating a VM with another name.

Output of podman version:

Without a running VM, podman version does not work, but podman --version does:

podman version 3.2.2

After initialising a new VM:

Client:
Version:      3.2.2
API Version:  3.2.2
Go Version:   go1.16.5
Built:        Fri Jun 25 20:21:29 2021
OS/Arch:      darwin/amd64

Server:
Version:      3.2.1
API Version:  3.2.1
Go Version:   go1.16.3
Built:        Mon Jun 14 21:12:29 2021
OS/Arch:      linux/amd64

Output of podman info --debug:
It does not work without a started VM. After starting a new one:

host:
  arch: amd64
  buildahVersion: 1.21.0
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.27-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: '
  cpus: 1
  distribution:
    distribution: fedora
    version: "34"
  eventLogger: journald
  hostname: localhost
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.12.12-300.fc34.x86_64
  linkmode: dynamic
  memFree: 1656451072
  memTotal: 2072817664
  ociRuntime:
    name: crun
    package: crun-0.20.1-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.20.1
      commit: 0d42f1109fd73548f44b01b3e84d04a279e99d2e
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.9-1.fc34.x86_64
    version: |-
      slirp4netns version 1.1.8+dev
      commit: 6dc0186e020232ae1a6fcc1f7afbc3ea02fd3876
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 0
  swapTotal: 0
  uptime: 1m 6.94s
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 3.2.1
  Built: 1623697949
  BuiltTime: Mon Jun 14 19:12:29 2021
  GitCommit: ""
  GoVersion: go1.16.3
  OsArch: linux/amd64
  Version: 3.2.1

Package info (e.g. output of rpm -q podman or apt list podman):
Installed on macOS Big Sur with brew

brew info podman
podman: stable 3.2.2 (bottled)
Tool for managing OCI containers and pods
https://podman.io/
/usr/local/Cellar/podman/3.2.2 (167 files, 29.8MB) *
  Poured from bottle on 2021-06-28 at 17:14:59
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0
==> Dependencies
Build: go ✔, go-md2man ✘
==> Caveats
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
==> Analytics
install: 4,694 (30 days), 11,050 (90 days), 30,721 (365 days)
install-on-request: 4,691 (30 days), 11,046 (90 days), 30,280 (365 days)
build-error: 0 (30 days)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

macOS Big Sur 11.4

After logging out and in again, I can start the VM as expected, so the problem seems to be related to the reboot.

I also tried stopping the VM before rebooting, but it didn't help.

@openshift-ci bot added the kind/bug label Jun 30, 2021
@mheon
Member

mheon commented Jun 30, 2021

@baude @ashley-cui PTAL

@ashley-cui
Member

@yvanarnaud, if you do a podman machine stop before rebooting, does the error still occur?

@yvanarnaud
Author

@ashley-cui, I tried that but the error still occurs.

@yvanarnaud
Author

A workaround is to create the missing podman directory:

mkdir /var/folders/78/y7tgcc410y34zywtm83dgjt80000gn/T/podman
podman machine start
Waiting for VM ...

Still, I don't know why and when this directory is removed.
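
The workaround above can be written without hard-coding the per-user temp path. This is only a minimal sketch, assuming the directory podman expects is always the podman/ subdirectory of $TMPDIR (as the error message suggests); the function name ensure_podman_tmpdir is hypothetical and not part of podman:

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): make sure the "podman"
# directory exists under the given temp dir before starting the VM.
# Assumption: podman places its sockets in "$TMPDIR/podman".
ensure_podman_tmpdir() {
    base="${1:-${TMPDIR:-/tmp}}"
    # Strip a trailing slash so the result has no double slash.
    base="${base%/}"
    # Create the directory if needed and print its path on success.
    mkdir -p "$base/podman" && printf '%s\n' "$base/podman"
}

# Usage (left commented out so that sourcing this file has no side effects):
# ensure_podman_tmpdir && podman machine start
```

Because mkdir -p succeeds when the directory already exists, the helper can be run before every podman machine start, not only after a reboot.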

@ashley-cui
Member

Hmm okay thanks for the info!

@baude
Member

baude commented Jun 30, 2021

That location is akin to a tmpdir iirc. I wonder if macos changes the tempdir after reboot? Should be easy enough to replicate and see.

@yvanarnaud
Author

@baude, you are right:

env|grep TMPDIR
TMPDIR=/var/folders/78/y7tgcc410y34zywtm83dgjt80000gn/T/

It is the same folder after each reboot.

@baude
Member

baude commented Jul 2, 2021

@ashley-cui are you able to replicate this? or want me to run it down ?

@ashley-cui
Member

@baude haven't touched it, if you want to grab it feel free

@baude added the In Progress label Jul 2, 2021
@baude self-assigned this Jul 2, 2021
@baude
Member

baude commented Jul 2, 2021

Alright, what's going on here is that the tmpdir is cleaned on each reboot (as it should be), and podman machine start does not create the TMPDIR/podman directory if it does not exist. PR soon...

baude pushed a commit to baude/podman that referenced this issue Jul 2, 2021
If the tempdir for the OS does not have a podman/, machine start will fail.  An example would be after a reboot.  We now create the podman dir if it does not exist.

Fixes containers#10824

Signed-off-by: baude <baude@baudes-Mac-mini.localdomain>
rugk pushed a commit to rugk/podman that referenced this issue Jul 9, 2021
If the tempdir for the OS does not have a podman/, machine start will fail.  An example would be after a reboot.  We now create the podman dir if it does not exist.

Fixes containers#10824

[NO TESTS NEEDED]

Signed-off-by: baude <baude@baudes-Mac-mini.localdomain>
Signed-off-by: Brent Baude <bbaude@redhat.com>
@Nilegfx

Nilegfx commented May 19, 2022

In my case the temp dir still exists, but somehow the qmp_podman-machine-default.sock is missing. Any advice?

@baude
Member

baude commented May 19, 2022

@Nilegfx please open a new issue ... provide as much information as possible and follow the template.

@carlosgorges

After some time debugging, I found the cause of this problem.

It is caused by a qemu 7.0.0 startup latency (3-5 s) that occurs on the first qemu execution after the Mac boots.

Podman has a bug: it does not expect that the creation of the socket files, done by the qemu call, can be delayed by a few seconds. When podman tries to access the socket files, qemu has not created them yet, which produces the error "Error: dial unix /podman/podman-machine-default_ready.sock: connect: connection refused".

To avoid this problem, execute qemu once, even with invalid options (just to initialize it), before calling "podman machine start".

    echo "* Podman VM for macOS is stopped, starting..."

    # Workaround: run qemu once before "podman machine start" to avoid the socket error.
    # INVALID_OPTION is deliberate: qemu only needs to be launched, not to keep running.
    /usr/local/bin/qemu-system-x86_64 -machine q35,accel=hvf:tcg -cpu host -display none INVALID_OPTION > /dev/null 2>&1

    podman machine start podman-machine-default
    ECODE=$?
    if [ "$ECODE" -ne 0 ]; then
        echo "* Error starting podman linux vm machine: $ECODE"
        exit "$ECODE"
    fi

I hope this helps.
Carlos Eduardo Gorges.
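
To illustrate the kind of tolerance described in this comment, here is a minimal polling sketch that waits for a socket file to appear instead of failing on the first attempt. It is not podman's implementation; the function name wait_for_path and the one-second polling interval are assumptions made for the example:

```shell
#!/bin/sh
# Hypothetical helper: poll until a path exists or the attempts run out.
# $1 = path to wait for
# $2 = maximum number of one-second attempts (default 10)
wait_for_path() {
    path="$1"
    attempts="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -e is true for any file type, including unix sockets.
        [ -e "$path" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    # One last check after the final sleep; its status is the return value.
    [ -e "$path" ]
}
```

For example, wait_for_path "${TMPDIR%/}/podman/podman-machine-default_ready.sock" 10 would wait up to roughly ten seconds for the ready socket named in the original error message.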

@kspendli

kspendli commented Oct 3, 2022

I faced the same kind of issue. These steps fixed it:

  1. Remove podman: brew uninstall podman

  2. Remove the containers files from the following directories:
     rm -rf ~/.config/containers/
     rm -rf ~/.local/share/containers

  3. Remove the podman-related files under ~/.ssh: rm ~/.ssh/podman*

  4. Reinstall podman using brew: brew install podman

  5. Init the podman machine and start it.

@SAIYOGANAND

> I faced the same kind of issue.
>
> Remove podman: brew uninstall podman
> Remove the containers files from the following directories:
> rm -rf ~/.config/containers/
> rm -rf ~/.local/share/containers
>
> Remove the podman-related files under ~/.ssh: rm ~/.ssh/podman*
> Reinstall podman using brew: brew install podman
> Init the podman machine and start it.

@kspendli thank you so much.

@ssbarnea
Collaborator

Apparently the same problem still occurs with podman 4.0.3 on macOS. All I did was cache the container image with GHA in order to reduce its setup time, but podman fails to start on a VM that was only a few minutes old.

+ podman machine start
Starting machine "podman-machine-default"
Error: dial unix /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/podman/qmp_podman-machine-default.sock: connect: no such file or directory
task: Failed to run task "setup": exit status 125
Error: Process completed with exit code 1.

@ssbarnea
Collaborator

@carlosgorges It is not that simple, as the code should be multi-arch compatible. I ended up using:

"qemu-system-${MACHTYPE}" -machine q35,accel=hvf:tcg -cpu host -display none INVALID_OPTION || true
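
One caveat with the line above: in bash, MACHTYPE holds a full GNU-style triplet (cpu-company-system) rather than a bare architecture, so the binary name may not resolve there, and || true would hide that. A sketch of an alternative that maps uname -m output to the qemu binary name follows; the function name qemu_binary_for_arch is hypothetical, and the mapping deliberately covers only two architectures:

```shell
#!/bin/sh
# Hypothetical helper: map a machine architecture (as printed by `uname -m`)
# to the matching qemu system emulator name.
qemu_binary_for_arch() {
    case "$1" in
        x86_64|amd64)  echo "qemu-system-x86_64" ;;
        arm64|aarch64) echo "qemu-system-aarch64" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Usage (commented out; the -machine/-cpu flags from the comment above are
# x86-oriented and may need adjusting on other architectures):
# "$(qemu_binary_for_arch "$(uname -m)")" -display none INVALID_OPTION || true
```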

@AlekseiKromski

> I faced the same kind of issue.
>
> Remove podman: brew uninstall podman
> Remove the containers files from the following directories:
> rm -rf ~/.config/containers/
> rm -rf ~/.local/share/containers
>
> Remove the podman-related files under ~/.ssh: rm ~/.ssh/podman*
> Reinstall podman using brew: brew install podman
> Init the podman machine and start it.

Thank you! That helped me. 🚀
Podman: 4.5.1
macOS: Ventura 13.4.1

@github-actions bot added the locked - please file new issue/PR label Oct 12, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2023