This repository has been archived by the owner on Sep 26, 2021. It is now read-only.

Certificates do not install properly, always regenerate #1954

Closed
carolynvs opened this issue Oct 8, 2015 · 115 comments

@carolynvs

I am using Docker Toolbox 1.8.2c with a local build of docker-machine using PR #1951. That PR fixes the ssh problems but now the generation/validation of certificates is broken. I do not know if the problem is due to the PR or is present on master.

After creating a machine, any attempt to use the certificates (e.g. running env) causes docker-machine to detect that the certs are invalid and regenerate them. The certs are never regenerated and copied successfully, so all attempts to connect to the machine and use docker fail. I attempted debugging a bit and the certificate validation is failing in cert.go, line 205: _, err = tls.DialWithDialer(dialer, "tcp", addr, tlsConfig).

See https://gist.github.com/carolynvs/d98baf90172d386561e1 for the full output from calling docker-machine create default --driver virtualbox on Windows 10.

The machine can't ever get its certificates installed properly:

$ docker-machine env default
Invalid certs detected; regenerating for 192.168.99.100:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="C:\Users\caro8994\.docker\machine\certs"
export DOCKER_MACHINE_NAME="default"
# Run this command to configure your shell:
# eval "$(C:\Program Files\Docker Toolbox\docker-machine.exe env default)"

caro8994@CAROLYNVANS87E4 MINGW64 ~
$ docker-machine env default
Invalid certs detected; regenerating for 192.168.99.100:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="C:\Users\caro8994\.docker\machine\certs"
export DOCKER_MACHINE_NAME="default"
# Run this command to configure your shell:
# eval "$(C:\Program Files\Docker Toolbox\docker-machine.exe env default)"

Here is the output from running docker-machine -D env default https://gist.github.com/carolynvs/778e4533a26fd612732d.

Here is the output from running docker-machine -D regenerate-certs default https://gist.github.com/carolynvs/ad82eb5fb9d7c42a3ed0
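
The validation step mentioned above, a TLS dial against the daemon, can be reproduced outside docker-machine. The following is only a minimal sketch and not the code in cert.go: the helper name validateCerts, the 20-second timeout, and the command-line layout are assumptions for illustration. Only the tls.DialWithDialer call itself comes from the report.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// validateCerts is a hypothetical helper (not docker-machine's own code).
// It loads the CA and the client key pair, then attempts a TLS handshake
// with the daemon at addr, the same kind of check that fails in the report.
func validateCerts(addr, caPath, certPath, keyPath string) error {
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return fmt.Errorf("reading CA: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return errors.New("no CA certificate found in " + caPath)
	}
	clientCert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return fmt.Errorf("loading client key pair: %w", err)
	}
	cfg := &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{clientCert},
	}
	// The timeout value is an assumption, chosen only for this sketch.
	dialer := &net.Dialer{Timeout: 20 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, cfg)
	if err != nil {
		return fmt.Errorf("TLS handshake with %s: %w", addr, err)
	}
	return conn.Close()
}

func main() {
	if len(os.Args) != 5 {
		fmt.Fprintln(os.Stderr, "usage: checkcerts ADDR CA.pem CERT.pem KEY.pem")
		return
	}
	if err := validateCerts(os.Args[1], os.Args[2], os.Args[3], os.Args[4]); err != nil {
		fmt.Println("invalid certs:", err)
		return
	}
	fmt.Println("certs OK")
}
```

Run against 192.168.99.100:2376 with the files from the machine directory, the returned error should show whether the handshake fails on the network, on the chain of trust, or on the validity dates.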

@carolynvs carolynvs mentioned this issue Oct 8, 2015
1 task
@nathanleclaire
Contributor

Thanks for the detailed summary. I've seen issues like this before as well and I'll look into it.

@nathanleclaire
Contributor

Are you on the latest VirtualBox? i.e. 5.0.6?

@carolynvs
Author

I was using 5.0.4, which ships with the latest version of Docker Toolbox (1.8.2c). I just removed that version, installed 5.0.6, and I am experiencing the same behavior.

@nathanleclaire
Contributor

OK thanks.

@nathanleclaire
Contributor

@carolynvs If you remove the host only network that you have (can do this in VirtualBox GUI) and try again, does it work?

@carolynvs
Author

I deleted the machine, removed the adapter and tried again with the same result.

@nathanleclaire
Contributor

OK thanks. Very peculiar behavior. I might make a test build which dumps more information about the certs and suggest that you try that if you're agreeable.

@carolynvs
Author

Of course! I'm happy to help out however I can.

If you want to just make a branch and point me to it, I can build it myself (:heart: containerized builds!). That way you don't need to throw multiple builds over the wall if this takes more than one attempt.

@bentruyman

Another thing to possibly consider while fixing this: some folks like myself actually write out the contents of docker-machine env to a file, which I then source for each new terminal session (as it's a little faster than running docker-machine env). If the output of this command contains anything that cannot be eval'd, it's obviously going to cause problems.

So lines like the following will cause issues:

Invalid certs detected; regenerating for 192.168.99.100:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...

I experienced this issue on 0.5.0-dev, but haven't experienced it since downgrading to 0.4.1.
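
As a stopgap for the sourcing problem described above, the captured output can be filtered before it is written to a file. This is a rough sketch, assuming the log messages arrive on the same stream as the export lines (the thread does not confirm which stream they use); the function name shellLines is made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// shellLines keeps only the lines a shell can safely source:
// "export ..." assignments and "#" comments. Everything else,
// such as "Invalid certs detected; ...", is dropped.
func shellLines(output string) string {
	var kept strings.Builder
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		// Trim a trailing carriage return in case the output has CRLF endings.
		line := strings.TrimRight(sc.Text(), "\r")
		if strings.HasPrefix(line, "export ") || strings.HasPrefix(line, "#") {
			kept.WriteString(line)
			kept.WriteByte('\n')
		}
	}
	return kept.String()
}

func main() {
	// Reads the raw `docker-machine env` output from stdin and
	// prints only the sourceable lines.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		return
	}
	fmt.Print(shellLines(string(data)))
}
```

Piping the output of docker-machine env through this filter before saving it would keep the status messages out of the sourced file. It does not address why the certs are being regenerated.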

@blaggacao
Contributor

I experienced exactly the same behavior today on the release candidate.

@dgageot dgageot self-assigned this Oct 19, 2015
@dgageot
Member

dgageot commented Oct 19, 2015

Hi @carolynvs @blaggacao, thanks a lot for your feedback.

I'm trying to reproduce/fix this bug. Could you try this PR (#2006) that I created to help investigate the bug?

@hairyhenderson
Contributor

Looks like I'm seeing this too. I'm using the latest master build on OS X using the digitalocean driver, so this definitely isn't anything to do with the environment. I think the area/windows and area/driver-virtualbox tags are irrelevant here :)

@dgageot
Member

dgageot commented Oct 19, 2015

Hi @hairyhenderson, can you try building PR #2006 and tell me the output for docker-machine -D env default?

@hairyhenderson
Contributor

@dgageot - will do when I get a chance.

I'm also thinking a bit more about this and realizing that I've been doing a local build (i.e. make build on OS X, without using a container). One of the areas where go build has behaved differently in the past is around certificates (esp. root CA certs), so this might be related to that... I dunno.

But I'll rebuild with #2006 and try it out. Thanks!

@dgageot
Member

dgageot commented Oct 19, 2015

@hairyhenderson That's a good point. I'll run my tests with a cross-compiled docker-machine

@carolynvs
Author

@dgageot Here is the failed output https://gist.github.com/carolynvs/e2473d21c3376f1ebec2 from docker-machine -D env default for a brand new machine.

I built #2006 and copied docker-machine.exe and docker-machine-driver-virtualbox.exe to the Docker Toolbox installation directory. I am using Docker Toolbox 1.8.2c on Windows 10.

@blaggacao
Contributor

I'm not proficient enough to know how to build it; maybe I will have a look at it this evening, if I can figure it out.

@dgageot
Member

dgageot commented Oct 19, 2015

@carolynvs Thanks a lot. I still don't understand what's going on but your logs will help me.

@dgageot
Member

dgageot commented Oct 19, 2015

@carolynvs Can you provide the output of:

VBoxManage list hostonlyifs
VBoxManage list dhcpservers

@carolynvs
Author

C:\Program Files\Oracle\VirtualBox>VBoxManage list hostonlyifs
Name:            VirtualBox Host-Only Ethernet Adapter
GUID:            3729f60a-d9c3-4daa-96ca-7ce7bae4ddcc
DHCP:            Disabled
IPAddress:       192.168.56.1
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:9d6d:4449:fce1:e1cb
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter

Name:            VirtualBox Host-Only Ethernet Adapter #2
GUID:            99076a32-c9e5-4930-895a-a35ee45c2542
DHCP:            Disabled
IPAddress:       192.168.99.1
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:118b:39e1:36b9:a336
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter #2


C:\Program Files\Oracle\VirtualBox>VBoxManage list dhcpservers
NetworkName:    HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter
IP:             192.168.56.100
NetworkMask:    255.255.255.0
lowerIPAddress: 192.168.56.101
upperIPAddress: 192.168.56.254
Enabled:        Yes

NetworkName:    HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter #2
IP:             192.168.99.6
NetworkMask:    255.255.255.0
lowerIPAddress: 192.168.99.100
upperIPAddress: 192.168.99.254
Enabled:        Yes

I have found that I still occasionally get double host only adapters. I just deleted them both and created a new machine. The certs are still regenerating when I run docker-machine env default.

Here is the output of the VBoxManage commands the second time around (with only 1 host adapter).

C:\Program Files\Oracle\VirtualBox>VBoxManage list hostonlyifs
Name:            VirtualBox Host-Only Ethernet Adapter
GUID:            2883b47a-862d-454e-9db7-42c3789585eb
DHCP:            Disabled
IPAddress:       192.168.99.1
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:90ff:fd25:e5f0:8c92
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter


C:\Program Files\Oracle\VirtualBox>VBoxManage list dhcpservers
NetworkName:    HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter
IP:             192.168.99.6
NetworkMask:    255.255.255.0
lowerIPAddress: 192.168.99.100
upperIPAddress: 192.168.99.254
Enabled:        Yes

@dgageot
Member

dgageot commented Oct 19, 2015

@carolynvs I have no idea so far.
I pushed a couple more commits on the PR to print more information and try things.
If you have time to update the output you get, that'd be just great.

ping @nathanleclaire @dmp42 any idea?

@carolynvs
Author

Here's the new output: https://gist.github.com/carolynvs/84cd140bcbf9b696e20f.

Let me know if there's another way to go about debugging the connection problem. I'm not quite sure what docker-machine is detecting that is causing it to regenerate the certs but am happy to poke around in /var/lib/boot2docker on the host or compare certs between windows and the host, etc if I knew what to look for.

@nathanleclaire nathanleclaire removed this from the 0.5.2 milestone Dec 7, 2015
@dlgoodchild

I now have this issue after upgrading from version 1.8 to 1.9.1 using the docker toolbox on MacOSX 10.10.5

Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": dial tcp 192.168.99.100:2376: getsockopt: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

command failed; 1

@spieden

spieden commented Dec 30, 2015

This is happening to me periodically too. Docker v1.9.1

@alambike

Same problem here with the azure driver. Every time I create a new azure machine it fails with the error:

Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "testcargo2-prefapp-in.cloudapp.net:2376": tls: DialWithDialer timed out
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'

After running docker-machine regenerate-certs, the cert validation works OK.

In docker-machine v0.5.5 there is no problem, and the creation of a docker host works ok:

Running pre-create checks...
Creating machine...
(testcargo3-prefapp-in) Creating Azure machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning with ubuntu(upstart)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect Docker to this machine, run: docker-machine env 

@nathanleclaire
Contributor

@alambike You're hitting this issue with 0.6.0?

@alambike

Yep, from 0.5.5 onwards. I have tested this with 0.5.6 and 0.6.0.

@helinwang

Same for me on 0.6.0 with the aws driver (constantly) on Mac OS X 10.10.5. Not happening with the virtualbox driver.

@helinwang

Fixed after changing --engine-opt="cluster-advertise=eth1:2376" to --engine-opt="cluster-advertise=eth0:2376" using docker-machine 0.6.0 (docker-machine 0.5.4 still fails).

@sfc-gh-eraigosa
Contributor

I think I'm battling the same issue on my machine. I'm using Ubuntu 14.04
docker-machine version 0.5.5, build 02c4254
Running host on RHEL 7.1
Server Version: 1.10.2-cs1-rc3

I tried everything suggested regarding the time on the machines; here is the output I get from curl:

curl -v --cacert ~/.docker/machine/certs/ca.pem --cert ~/.docker/machine/machines/$NODE_NAME/cert.pem --key ~/.docker/machine/machines/$NODE_NAME/key.pem https://$(docker-machine ip $NODE_NAME):2376/version

* Hostname was NOT found in DNS cache
*   Trying 16.85.3.140...
* Connected to 16.85.3.140 (16.85.3.140) port 2376 (#0)
* successfully set certificate verify locations:
*   CAfile: /home/eraigosa/.docker/machine/certs/ca.pem
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Request CERT (13):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS handshake, CERT verify (15):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS alert, Server hello (2):
* error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
* Closing connection 0
curl: (35) error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
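
The "bad certificate" alert in the trace above is sent by the server, which suggests the daemon rejected the client certificate. One way to narrow that down offline is to check whether cert.pem chains to ca.pem and to look at its validity window. The sketch below is illustrative only: checkClientCert is a hypothetical helper, and it verifies against the local clock, whereas the daemon uses its own clock.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// checkClientCert is a hypothetical helper: it parses the first certificate
// in certPEM, prints its validity window, and verifies it against the CA(s)
// in caPEM for client authentication, using the local clock.
func checkClientCert(caPEM, certPEM []byte) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return errors.New("CA file contains no usable certificate")
	}
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return errors.New("no CERTIFICATE block found in client cert file")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parsing client cert: %w", err)
	}
	fmt.Printf("client cert valid from %s to %s\n", cert.NotBefore, cert.NotAfter)
	_, err = cert.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: checkclient CA.pem CERT.pem")
		return
	}
	caPEM, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("reading CA:", err)
		return
	}
	certPEM, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Println("reading client cert:", err)
		return
	}
	if err := checkClientCert(caPEM, certPEM); err != nil {
		fmt.Println("client cert rejected locally:", err)
		return
	}
	fmt.Println("client cert chains to the CA and is valid on this clock")
}
```

If this check passes locally while the daemon still rejects the certificate, comparing the printed validity window with the date reported on the Docker host is a reasonable next step.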

@carolynvs
Author

@nathanleclaire I have found the culprit! prltoolsd from boot2docker is constantly setting my date/timezone incorrectly.

$ date
<the current local time with the timezone set to UTC>

$ date -s '<the correct time in UTC>'
<prints the correct time>

$ date
<the date/time is now broken again>

$ /usr/local/etc/init.d/prltoolsd stop

$ date -s '<the correct time in UTC>'
<prints the correct time>

$ date
<prints the correct time and stays put>

After stopping prltoolsd and resetting the date, all my docker-machine commands work as expected and my certificates do not regenerate.

I still don't know why the timezone is set to UTC and the time to localtime after making a new machine, so this is just a workaround, not a fix.
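
To check whether a machine is affected by the same kind of clock drift, the VM's epoch time can be compared with the local one, which sidesteps any timezone confusion. The following is a sketch under assumptions: it relies on docker-machine ssh NAME running a command on the host, as used elsewhere in this thread, and on date +%s being supported by the date binary inside the VM. The helper name clockSkewSeconds is invented for this example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// clockSkewSeconds parses the output of `date +%s` taken on the VM and
// returns (VM time - local time) in seconds. A negative value means the
// VM clock is behind the local clock.
func clockSkewSeconds(remoteOut string, localUnix int64) (int64, error) {
	remote, err := strconv.ParseInt(strings.TrimSpace(remoteOut), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("unexpected date output %q: %w", remoteOut, err)
	}
	return remote - localUnix, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: skew MACHINE_NAME")
		return
	}
	// Assumption: the VM's date supports the +%s format.
	out, err := exec.Command("docker-machine", "ssh", os.Args[1], "date", "+%s").Output()
	if err != nil {
		fmt.Println("could not read the VM clock:", err)
		return
	}
	skew, err := clockSkewSeconds(string(out), time.Now().Unix())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("VM clock is %+d seconds relative to this host\n", skew)
}
```

A skew of several hours in either direction would be consistent with the symptom described here. A skew of a second or two is expected from the round trip alone.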

@nathanleclaire
Contributor

Nice @carolynvs ! We'll work on seeing if we can fix this in boot2docker.

@tianon @legal90 FYI ^^

@legal90
Contributor

legal90 commented Mar 12, 2016

@carolynvs Wow 😨 . It looks really weird, because the prltoolsd process shouldn't start on any virtualization system other than Parallels Desktop. The daemon will start only if /usr/bin/prlvmcheck returns a 0 exit code, which means that we are in a Parallels VM.

Have you reproduced this issue on a VirtualBox VM? What Boot2Docker version are you using?

P.S. Also, if we assume that prltoolsd is the only reason, then the Docker Machine version should not matter. However, other comments above (link) indicate that the issue appears only in Machine 0.5.5+.

@carolynvs
Author

@legal90 That makes more sense. My environment is a bit wonky, but it did use to work just fine:

  1. I'm on a Mac running Parallels.
  2. Inside Parallels I run Windows, then the latest docker toolbox installation. I do this because I write documentation and tutorials for Docker, and need to target Mac, Linux and Windows users.

This explains why prltoolsd is attempting to manage my docker host clock. It must be picking up on being nested inside Parallels. Does that also explain why the system clock is set to local time but thinks it is UTC?

That is the root problem that caused me to open this bug. I create a new docker machine at 10 AM CST (-6). The system clock (date) on the new machine thinks that it is 10 AM UTC, so the timestamps on the certificates are "in the future". hwclock reports the correct time.

Looking at the boot2docker Dockerfile, I noticed that it is setting /etc/timezone to UTC and should have set /etc/localtime to UTC as well.

see https://github.com/boot2docker/boot2docker/blob/master/Dockerfile#L311

RUN echo 'UTC' > $ROOTFS/etc/timezone \
    && cp -L /usr/share/zoneinfo/UTC $ROOTFS/etc/localtime

But on my docker machine host, the tzdata package is not installed, so /usr/share/zoneinfo doesn't exist and neither does /etc/localtime. I have built my own boot2docker from the latest Dockerfile just to verify that I'm not using an old iso. I wonder if missing the /etc/localtime file is contributing to the incorrect time problem?
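
The "certificates in the future" effect described above can be demonstrated without any VM. The sketch below is an illustration rather than docker-machine's code: it issues a throwaway self-signed certificate that becomes valid now, then verifies it with a clock that is a given number of hours behind, mimicking a host whose clock reads local time as UTC. The function name verifyWithClockBehind and the certificate fields are assumptions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// verifyWithClockBehind creates a self-signed certificate whose NotBefore
// is the current time, then verifies it as if the verifier's clock were
// `hours` hours behind. It returns the verification error, if any.
func verifyWithClockBehind(hours int) error {
	issuedAt := time.Now()
	verifierClock := issuedAt.Add(-time.Duration(hours) * time.Hour)

	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "clock-skew-demo"},
		NotBefore:             issuedAt,
		NotAfter:              issuedAt.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(cert)
	// CurrentTime stands in for the (possibly wrong) clock of the verifier.
	_, err = cert.Verify(x509.VerifyOptions{
		Roots:       roots,
		CurrentTime: verifierClock,
		KeyUsages:   []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

func main() {
	for _, h := range []int{0, 6} {
		err := verifyWithClockBehind(h)
		fmt.Printf("verifier clock %d hours behind: err = %v\n", h, err)
	}
}
```

With the verifier six hours behind, as in the 10 AM CST example, verification fails because the certificate is not yet valid from the verifier's point of view. With no skew, it succeeds.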

@legal90
Contributor

legal90 commented Mar 14, 2016

@carolynvs Ah, now I got it.

> This explains why prltoolsd is attempting to manage my docker host clock. It must be picking up on being nested inside Parallels.

Yeah, that's the root of the issue. prltoolsd runs in a VirtualBox VM nested inside a Parallels VM. I've reproduced this and reported it to the responsible people at Parallels. I'll let you know as soon as it's fixed.

> Does that also explain why the system clock is set to local time but thinks it is UTC?

Well, it's hard to commit but it is a known issue of Parallels Desktop (and its guest tools). It was originally reported here: Parallels/vagrant-parallels#186.
It was worked around in PD 11 by an additional option for the prlctl utility, but it doesn't help in your rare case, because you are actually running a VirtualBox VM on Windows.

I'm sorry, but the only solution I can suggest at the moment is to prevent prltoolsd from running in your VM at boot. If you use a custom Boot2Docker ISO build, you can remove the Parallels-related lines from the Dockerfile and rebuild the ISO. Or comment out this line: https://github.com/boot2docker/boot2docker/blob/master/rootfs/rootfs/bootscript.sh#L101

@carolynvs
Author

Thanks for the extra info about how prltoolsd works! I'll do as you suggest and make a custom iso for my setup. 🍺

I would close this issue, as this fixes my problem, but I'll leave that up to you since other people seem to be hitting it (though probably for different reasons!).

@nathanleclaire
Contributor

I think we can treat it as effectively resolved; we can re-open if any new issues are discovered.

Thanks everyone for your contributions in reporting and triaging this epically long issue!

@mtrtm

mtrtm commented Mar 22, 2016

I am using DockerToolbox 1.10.3 on Windows. It was working fine until I restarted, and I am now having this same issue. I am also not that familiar with Docker, so can someone tell me what the fix is?

@nathanleclaire
Contributor

@mtrtm Does docker-machine regenerate-certs -f not work?

@mtrtm

mtrtm commented Mar 25, 2016

Yes, docker-machine regenerate-certs -f does work. It also seems to do it every time I start up Docker Quickstart Terminal.

@virtualmackem

+1
I'm using docker mainly on a Red Hat server and everything works just fine. I'm not an expert but I know what I'm doing. On Windows with VirtualBox, however, every time the docker VM restarts I need to run regenerate-certs. I'm on Toolbox 1.11.1.

@leandropincini

+1

Macbook late 2009
2,26 GHz Intel Core 2 Duo
Mac OS Sierra 10.12
Docker Toolbox 1.2.1
VirtualBox 5.0.26

$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
vbox-test - virtualbox Running tcp://192.168.99.100:2376 Unknown Unable to query docker version: Get https://192.168.99.100:2376/v1.15/version: x509: certificate has expired or is not yet valid

$ docker-machine env vbox-test
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": x509: certificate has expired or is not yet valid
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

$ docker-machine regenerate-certs vbox-test
Regenerate TLS machine certs? Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...

$ docker-machine env vbox-test
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": x509: certificate has expired or is not yet valid
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

@hholst80

hholst80 commented Oct 30, 2016

I had this on the default install of the Docker Toolbox (installed on Windows 10 Home) downloaded 2016-10-30. The error went away after running:

docker-machine regenerate-certs

@paddor

paddor commented Mar 4, 2018

Having this issue on macOS. docker-machine env complains:

$ docker-machine env docker1
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": x509: certificate has expired or is not yet valid
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.

Regenerating the certificates (even with -f) does not help. docker-machine ssh docker1 date shows the correct date and time.

Any ideas?

@wkruse

wkruse commented Aug 18, 2018

@paddor Regenerating the certificates incl. client certificates (docker-machine regenerate-certs -f --client-certs) fixed it for me.
