Docker Machine is not running properly after update to macOS Catalina #83

Closed
mediaessenz opened this issue Nov 6, 2019 · 22 comments

Comments

@mediaessenz

mediaessenz commented Nov 6, 2019

After first creating a new machine (in my case via "dinghy", a reverse proxy solution), everything seems to work normally.
But after stopping the machine and trying to start it again, it no longer works.
The startup ends with this error message:

Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded
Traceback (most recent call last):
	9: from /usr/local/bin/_dinghy_command:12:in `<main>'
	8: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/base.rb:440:in `start'
	7: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor.rb:359:in `dispatch'
	6: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/invocation.rb:126:in `invoke_command'
	5: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/command.rb:27:in `run'
	4: from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:93:in `up'
	3: from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:271:in `start_services'
	2: from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:25:in `up'
	1: from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:126:in `system'
/usr/local/Cellar/dinghy/4.6.5/cli/dinghy/system.rb:18:in `system': Failure calling `docker-machine start dinghy` (System::Failure)

After starting the machine with the debug option, I get the following output ten times before the error above comes up again:

(dinghy) Calling .GetSSHHostname
(dinghy) DBG | executing: /usr/local/bin/prlctl list dinghy --output status --no-header
(dinghy) DBG | executing: /usr/local/bin/prlctl list -i dinghy
(dinghy) DBG | Found lease: 10.211.55.32 for MAC: 001C4208D8F8, expiring at 1571651690, leased for 1800 s.
(dinghy) DBG |
(dinghy) DBG | Found IP lease: 10.211.55.32 for MAC address 001C4208D8F8
(dinghy) DBG |
(dinghy) Calling .GetSSHPort
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.32 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/local/bin/ssh <nil>}
About to run SSH command:
if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
SSH cmd err, output: <nil>: Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::22                   :::*                    LISTEN

Watching the boot sequence in the miniature window of the Parallels control panel, I see several errors and warnings during the first boot:

...
unable to write 'random state'
...
unable to write 'random state'
...
Device "eth1" does not exists.
...
unable to write 'random state'
...
unable to write 'random state'

Despite these messages, the machine works as expected until I stop and start it again.
When I do that, I see only one warning during boot in the Parallels window:

warning: unable to find partition with the swap label (boot2dockerswap) or TYPE=swap (so Docker will likely complain about swap)
- this could also mean TCL already mounted it! (see 'free' or '/proc/swaps')

I have two Macs (both already updated to macOS Catalina) with exactly the same problem (since updating the system).

I also posted an issue at the dinghy repo, but the author thinks the problem lies in docker-machine or this Parallels driver:
codekitchen/dinghy#290

@KatSick

KatSick commented Nov 7, 2019

Any updates on this? I have a similar problem. Before the latest Catalina updates everything worked, but now on some commands like docker-compose up I see: [1] 26388 abort docker-compose up and that's all.

@mediaessenz
Author

@KatSick Maybe this helps: Homebrew/homebrew-core#45687 (comment)

@romankulikov
Collaborator

> @KatSick Maybe this helps: Homebrew/homebrew-core#45687 (comment)

@mediaessenz, does this workaround help for you?

@mediaessenz
Author

> > @KatSick Maybe this helps: Homebrew/homebrew-core#45687 (comment)
>
> @mediaessenz, does this workaround help for you?

Yes

@mediaessenz
Author

An update:
I got a brand new iMac yesterday and started setting up the system to my needs.
Because of the problems with restarting a docker machine that I described here, I decided not to use a Time Machine backup of my old Mac.
I installed only some basic stuff (browser, iTerm, Parallels Desktop 15 and brew) before using brew to install docker, docker-compose, docker-machine and docker-machine-parallels.
After this I created a new docker-machine and still have the same problem as before.
The machine comes up the first time without problems and ends in the same error described above after stopping it and trying to start it again.
The messages shown in the Parallels window, which I described above, are also the same.

@mediaessenz
Author

Am I really the only one on this planet who has problems using Docker Machine together with Parallels Desktop on macOS Catalina?

@mediaessenz
Author

After comparing the debug output of the first (working) start (create command) and the second (not working) start (start command), I found this difference, which may help to identify the problem:

  1. Start:
...
Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.8 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
sudo /usr/bin/sethostname dinghy && echo "dinghy" | sudo tee /var/lib/boot2docker/etc/hostname
SSH cmd err, output: <nil>: Setting hostname to dinghy Done.
dinghy

(dinghy) Calling .GetSSHHostname
(dinghy) DBG | executing: /usr/local/bin/prlctl list dinghy --output status --no-header
(dinghy) DBG | executing: /usr/local/bin/prlctl list -i dinghy
(dinghy) DBG | Found lease: 10.211.55.8 for MAC: 001C42CE0C5E, expiring at 1574071941, leased for 1800 s.
(dinghy) DBG |
(dinghy) DBG | Found IP lease: 10.211.55.8 for MAC address 001C42CE0C5E
(dinghy) DBG |
(dinghy) Calling .GetSSHPort
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.8 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
SSH cmd err, output: <nil>: Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::2376                 :::*                    LISTEN
tcp        0      0 :::22                   :::*                    LISTEN
...
  2. Start:
...
Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.6 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
SSH cmd err, output: <nil>: Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::22                   :::*                    LISTEN

(dinghy) Calling .GetSSHHostname
(dinghy) DBG | executing: /usr/local/bin/prlctl list dinghy --output status --no-header
(dinghy) DBG | executing: /usr/local/bin/prlctl list -i dinghy
(dinghy) DBG | Found lease: 10.211.55.6 for MAC: 001C4295E032, expiring at 1574070104, leased for 1800 s.
(dinghy) DBG |
(dinghy) DBG | Found IP lease: 10.211.55.6 for MAC address 001C4295E032
(dinghy) DBG |
(dinghy) Calling .GetSSHPort
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.6 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
SSH cmd err, output: <nil>: Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::22                   :::*                    LISTEN
...
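
The difference between the two runs comes down to whether anything is listening on TCP port 2376 in the `netstat -tln` output: the working start shows `:::2376`, the failing one shows only port 22. A minimal Python sketch of that check follows. The function name `listening_ports` is made up for illustration and is not part of docker-machine; the sample strings are the two outputs quoted above. It only understands the `netstat` layout shown here, not `ss -tln` output.

```python
def listening_ports(netstat_output):
    """Return the set of TCP ports in LISTEN state from `netstat -tln` output."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 :::2376 :::* LISTEN
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[-1] == "LISTEN":
            local_address = fields[3]
            # The port follows the last colon (works for 0.0.0.0:22 and :::22).
            ports.add(int(local_address.rsplit(":", 1)[1]))
    return ports


# Output of the first (working) start, as quoted in this comment.
WORKING = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::2376                 :::*                    LISTEN
tcp        0      0 :::22                   :::*                    LISTEN
"""

# Output of the second (failing) start, as quoted in this comment.
FAILING = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::22                   :::*                    LISTEN
"""

DOCKER_PORT = 2376  # the port that is listening only in the working run

print("working run has Docker port:", DOCKER_PORT in listening_ports(WORKING))
print("failing run has Docker port:", DOCKER_PORT in listening_ports(FAILING))
```

In the failing case sshd is up but nothing is bound to 2376, which matches the "Unable to verify the Docker daemon is listening" error.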

@romankulikov
Collaborator

Well, it looks like the issue is broken IP address reporting from the virtual machine. I'm investigating it.

@mediaessenz
Author

Any news about this?

@romankulikov
Collaborator

It looks like this is a duplicate of #docker/machine#3595. For example, switching to an older version of boot2docker works for me:

$ dinghy create --provider parallels --boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v18.06.1-ce/boot2docker.iso

@romankulikov
Collaborator

> It looks like this is a duplicate of #docker/machine#3595. For example, switching to an older version of boot2docker works for me:
>
> $ dinghy create --provider parallels --boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v18.06.1-ce/boot2docker.iso

Well, after more investigation I tend to attribute this issue to the lack of entropy at virtual machine startup. It is described in #boot2docker/boot2docker#1322 (comment). Looking at /var/lib/boot2docker/log/docker.log inside the virtual machine, this message is printed when dockerd starts:

crypto/rand: blocked for 60 seconds waiting to read random data from the kernel

dockerd hangs at system startup, which results in the docker-machine create command failing (and in subsequent certificate issues).

The issue reproduces on recent boot2docker versions, such as the current 19.03.5, which are based on Linux kernel 4.14. Version 18.06.1 works because it is based on Linux kernel 4.9, where the entropy pool state during VM boot is better.
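
To confirm this symptom on a given machine, one can look for the `crypto/rand` message in a copy of the dockerd log. The sketch below is a hypothetical helper, not part of docker-machine or boot2docker. It assumes the log text has already been copied out of the VM (for example via `docker-machine ssh`), and the function name and the filler line in the sample are made up for illustration.

```python
ENTROPY_MARKER = "crypto/rand: blocked for"


def entropy_stalls(log_text):
    """Return the log lines in which dockerd reports waiting on the kernel's random pool."""
    return [line.strip() for line in log_text.splitlines() if ENTROPY_MARKER in line]


# Sample text: the first line is the message quoted above, the second is invented filler.
sample_log = """\
crypto/rand: blocked for 60 seconds waiting to read random data from the kernel
some unrelated log line
"""

stalls = entropy_stalls(sample_log)
if stalls:
    print("dockerd was blocked waiting for entropy:")
    for line in stalls:
        print("  " + line)
else:
    print("no entropy stall found in this log")
```

A log without that message does not rule out other causes; it only means this particular stall was not recorded.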

@mediaessenz
Author

@romankulikov You are my HERO!!!
After switching to the older image my problems are gone!
Thank You very much for the energy you put into this issue!

@iby
Contributor

iby commented Feb 29, 2020

@romankulikov Sorry in advance if the question is dumb. I understand this is an issue with the boot2docker image, but isn't this a showstopper? Aren't other drivers affected by this, and if not, is there a way the Parallels driver can adapt? If I understand correctly, the referenced PR had something to do with fixing this, but the latest version (on Mojave) still shows the same behaviour (can create, cannot restart).

@romankulikov
Collaborator

I've tried Parallels Desktop 15.1.2 and VirtualBox 6.1.4 with Boot2Docker v19.03.5; both work OK for me at the moment, at least for creating and starting the "dinghy" machine. @ianbytchek, can you please share your setup so that I can reproduce the problem?

As for the lack of entropy when starting the guest OS: on the one hand, it does look like a showstopper. On the other hand, it doesn't look like an easy thing to fix. On the third hand, the issue is already addressed in modern Linux kernels:
https://lwn.net/Articles/808575/
https://git.kernel.org/linus/50ee7529ec4500c88f8664560770a7a1b65db72b

Not sure how to move forward.

@romankulikov
Collaborator

Well, it looks like boot2docker starting from 19.03.5 has a backported patch from Linux kernel 5.4 with entropy fixes. So my picture of the problem is broken, and I need to know whether anyone can reproduce the issue with a recent boot2docker image.

@mediaessenz
Author

Unfortunately, at least for me, the problem still exists with the latest boot2docker image (19.03.5)

@josefglatz

@romankulikov Tried it with boot2docker ISO image version 19.03.5 and still get the same error that @mediaessenz already mentioned after a dinghy stop && dinghy up:

Starting the dinghy VM...

Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded
/usr/local/Cellar/dinghy/4.6.5/cli/dinghy/system.rb:18:in `system': Failure calling `docker-machine start dinghy` (System::Failure)
	from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:126:in `system'
	from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:25:in `up'
	from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:271:in `start_services'
	from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:93:in `up'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/command.rb:27:in `run'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/invocation.rb:126:in `invoke_command'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor.rb:359:in `dispatch'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/base.rb:440:in `start'
	from /usr/local/bin/_dinghy_command:12:in `<main>'

@legal90
Collaborator

legal90 commented Jun 5, 2020

> Well, it looks like boot2docker starting from 19.03.5 has a backported patch from Linux kernel 5.4 with entropy fixes.

boot2docker 19.03.5 was released from the earlier state, before that fix. That's why the issue still persists.

And, unfortunately, it seems there will be no more releases 😭 : boot2docker/boot2docker#1408
Together with docker/machine#4537, it looks like the sunset of the entire Docker Machine project.

@romankulikov
Collaborator

> boot2docker 19.03.5 was released from the earlier state, before that fix. That's why the issue still persists.

Yeah :-(

@legal90, how should we proceed with this issue? From my point of view, it should be fixed only on the guest OS (i.e. boot2docker) side. Is forking boot2docker an option?

@legal90
Collaborator

legal90 commented Jun 24, 2020

@romankulikov Building and releasing a custom boot2docker.iso might be an option, but in that case all users would have to specify the custom URL to it using the --parallels-boot2docker-url flag. Let's see whether the community continues any fork.
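
For reference, pointing docker-machine at a custom ISO might look like the sketch below. It only assembles the argument list and prints it. The helper name, the machine name and the URL are placeholders, and `--driver parallels` assumes the docker-machine-parallels driver is installed.

```python
import shlex


def create_command(machine_name, iso_url):
    """Build a `docker-machine create` argument list that uses a custom boot2docker ISO."""
    return [
        "docker-machine", "create",
        "--driver", "parallels",
        "--parallels-boot2docker-url", iso_url,
        machine_name,
    ]


# Placeholder values: substitute the real machine name and ISO location.
cmd = create_command("dinghy", "https://example.com/boot2docker.iso")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Every user of the custom image would need to pass this flag on create, which is the inconvenience mentioned above.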

I asked here if there is any chance for the patch to be released: boot2docker/boot2docker#1403 (comment)

@legal90
Collaborator

legal90 commented Jun 30, 2020

v19.03.12, the final release of boot2docker, was published today: https://github.com/boot2docker/boot2docker/releases/tag/v19.03.12

It includes the fix boot2docker/boot2docker#1403, so this issue should be solved there. I checked it on a test VM by running docker-machine restart several times, and it works as expected: no more "Maximum number of retries (10) exceeded" errors.
@mediaessenz, please verify it in your setup.

@mediaessenz
Author

YES, IT WORKS !

Thanks a lot to all involved people!

@legal90 closed this as completed Jul 10, 2020