This repository has been archived by the owner on Sep 26, 2021. It is now read-only.

Docker daemon does not start properly on DigitalOcean #2366

Closed
lucj opened this issue Nov 20, 2015 · 15 comments

Comments

@lucj

lucj commented Nov 20, 2015

I use the following script to create a Consul KV store and a Swarm (one master + one node):

# KV Store
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
consul
eval "$(docker-machine env consul)"
docker run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap

sleep 5

# Swarm master
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm --swarm-image="swarm" --swarm-master \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
demo0

sleep 5

# Swarm node
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm --swarm-image="swarm:1.0.0-rc2" \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
demo1
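The fixed `sleep 5` after starting Consul is only a guess at how long the server needs to come up. Below is a minimal sketch that polls instead. It assumes `curl` is installed on the client and that Consul's HTTP API is reachable on port 8500 of the `consul` machine. `retry` and `consul_has_leader` are hypothetical helper names and are not part of Docker Machine.

```shell
#!/usr/bin/env bash
# retry MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_TRIES attempts have failed.
retry() {
  local max=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= max; i++)); do
    "$@" && return 0
    # Don't sleep after the final failed attempt.
    (( i < max )) && sleep "$delay"
  done
  return 1
}

# consul_has_leader HOST: succeeds once Consul at HOST reports a Raft leader.
# GET /v1/status/leader returns "" (an empty JSON string) while no leader is elected.
consul_has_leader() {
  local leader
  leader=$(curl -sf "http://$1:8500/v1/status/leader") || return 1
  [ -n "$leader" ] && [ "$leader" != '""' ]
}
```

With these helpers, the first `sleep 5` could become `retry 30 2 consul_has_leader "$(docker-machine ip consul)"`, which waits up to roughly a minute and stops as soon as the single `-bootstrap` server has elected itself leader.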

Output while creating the demo1 Swarm node (creating the demo0 Swarm master fails with the same error):

Creating machine...
(demo1) OUT | Creating SSH key...
(demo1) OUT | Creating Digital Ocean droplet...
(demo1) OUT | Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (5) exceeded

I can SSH into the demo0 and demo1 nodes, but I get the following error when trying to switch to the context of those Docker hosts:

$> eval $(docker-machine env demo0)
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "46.101.72.202:2376": dial tcp 46.101.72.202:2376: getsockopt: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

I tried several times and got the same results each time.

My docker-machine version:

$ docker-machine -v
docker-machine version 0.5.1 (7e8e38e)
@lucj
Author

lucj commented Nov 20, 2015

When I SSH into the host, the Docker daemon is not running:

$> dm ssh demo0
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-57-generic x86_64)
 * Documentation:  https://help.ubuntu.com/

  System information as of Fri Nov 20 13:40:59 EST 2015

  System load:  0.59              Processes:              71
  Usage of /:   5.3% of 29.40GB   Users logged in:        0
  Memory usage: 6%                IP address for eth0:    178.62.106.249
  Swap usage:   0%                IP address for docker0: 172.17.0.1

  Graph this data and manage this system at:
    https://landscape.canonical.com/

root@demo0:~# docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

@iammerrick

I don't think this is Swarm-specific; I'm seeing similar issues with DigitalOcean.

@nathanleclaire
Contributor

Can you please paste the content of the Docker logs at /var/log/upstart/docker.log (I'm assuming Ubuntu 14.04) on the servers where the daemon did not boot properly?
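One way to grab that log without an interactive session, keeping only the error-level lines, is sketched below. It assumes the Ubuntu 14.04 upstart log path mentioned above and a machine named demo0. `docker_log_errors` is a hypothetical helper. The Docker 1.9 log lines can contain ANSI color codes, so the `sed` step strips those before matching the `ERRO`/`FATA` level prefixes.

```shell
#!/usr/bin/env bash
# docker_log_errors: reads Docker daemon log lines on stdin, strips ANSI
# color codes (e.g. ESC[31mFATAESC[0m -> FATA) and prints only ERRO/FATA lines.
docker_log_errors() {
  local esc
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g" | grep -E '^(ERRO|FATA)'
}
```

Usage: `docker-machine ssh demo0 "tail -n 200 /var/log/upstart/docker.log" | docker_log_errors`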

@nathanleclaire changed the title from "Certificates errors after creating a Swarm on DigitalOcean with Machine" to "Docker daemon does not start properly on DigitalOcean" on Nov 20, 2015
@iammerrick

For me the Docker daemon boots fine; I just can't get the certs to validate:

INFO[0001] Firewalld running: false
INFO[0001] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address
WARN[0002] Your kernel does not support swap memory limit.
INFO[0002] Loading containers: start.
....................................................................
INFO[0003] Loading containers: done.
INFO[0003] Daemon has completed initialization
INFO[0003] Docker daemon                                 commit=a34a1d5 execdriver=native-0.2 graphdriver=aufs version=1.9.1
2015/11/20 18:28:10 http: TLS handshake error from 50.207.240.162:9758: read tcp 50.207.240.162:9758: connection reset by peer
2015/11/20 18:28:51 http: TLS handshake error from 50.207.240.162:50338: read tcp 50.207.240.162:50338: connection reset by peer
INFO[0148] GET /v1.21/version
INFO[0151] Processing signal 'terminated'
INFO[0161] Container ea77fbcd1ccbce0bf3bcc326b3a58f55f4a9a7297d4dc948647320b4cee5cc4a failed to exit within 10 seconds of SIGTERM - using the force
INFO[0161] Container 3648f05fa1187013f0dba705d3c2ed5f86e5a3aa2f7a6172120dc0778e699b59 failed to exit within 10 seconds of SIGTERM - using the force
INFO[0000] API listen on [::]:2376
INFO[0000] API listen on /var/run/docker.sock
INFO[0000] Firewalld running: false
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address
WARN[0000] Your kernel does not support swap memory limit.
INFO[0000] Loading containers: start.
....................................................................
INFO[0002] Loading containers: done.
INFO[0002] Daemon has completed initialization
INFO[0002] Docker daemon                                 commit=a34a1d5 execdriver=native-0.2 graphdriver=aufs version=1.9.1
2015/11/20 18:30:42 http: TLS handshake error from 50.207.240.162:21829: read tcp 50.207.240.162:21829: connection reset by peer

@iammerrick

On the client:

λ ~/ docker-machine env text-rocket-production
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "104.236.196.215:2376": EOF
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

λ ~/ docker-machine regenerate-certs text-rocket-production
Regenerate TLS machine certs?  Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
λ ~/ docker-machine env text-rocket-production
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "104.236.196.215:2376": EOF
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

@lucj
Author

lucj commented Nov 20, 2015

I got these logs on demo0:

Waiting for /var/run/docker.sock
INFO[0000] API listen on /var/run/docker.sock
WARN[0000] Running modprobe bridge br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found.
insmod /lib/modules/3.13.0-57-generic/kernel/net/llc/llc.ko
insmod /lib/modules/3.13.0-57-generic/kernel/net/802/stp.ko
insmod /lib/modules/3.13.0-57-generic/kernel/net/bridge/bridge.ko
, error: exit status 1
INFO[0000] Firewalld running: false
/var/run/docker.sock is up
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address
WARN[0000] Your kernel does not support swap memory limit.
INFO[0000] Loading containers: start.

INFO[0000] Loading containers: done.
INFO[0000] Daemon has completed initialization
INFO[0000] Docker daemon                                 commit=a34a1d5 execdriver=native-0.2 graphdriver=aufs version=1.9.1
INFO[0000] GET /v1.21/version
INFO[0001] GET /v1.21/version
INFO[0003] Processing signal 'terminated'
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
INFO[0000] API listen on [::]:2376
INFO[0000] API listen on /var/run/docker.sock
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
INFO[0000] API listen on [::]:2376
INFO[0000] API listen on /var/run/docker.sock
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
INFO[0000] API listen on [::]:2376
INFO[0000] API listen on /var/run/docker.sock
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))
INFO[0000] API listen on [::]:2376
INFO[0000] API listen on /var/run/docker.sock
FATA[0000] Error starting daemon: discovery advertise parsing failed (no available advertise IP address in interface (eth1:2376))

demo0 is a Swarm master created with the following command:

# Swarm master
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm --swarm-image="swarm" --swarm-master \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
demo0

@nathanleclaire
Contributor

@lucj Just a guess, but you probably need to advertise on eth0 instead of eth1. eth1 is the interface used on boot2docker / VirtualBox, but I think eth0 is what you want on DigitalOcean.

# Swarm master
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm --swarm-image="swarm" --swarm-master \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
demo0

Note that the cluster-advertise option is different (eth0 instead of eth1).

@nathanleclaire
Contributor

@iammerrick Your issue seems different. Could I get you to open a new issue with the --debug logs from create please?
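Below is a sketch that saves the debug output to a file while still showing it on screen, assuming bash. `create_with_debug` and the `DOCKER_MACHINE` override variable are hypothetical conveniences. `--debug` is Docker Machine's global debug flag, so it goes before the `create` subcommand.

```shell
#!/usr/bin/env bash
# create_with_debug NAME [CREATE_FLAGS...]
# Runs "docker-machine --debug create CREATE_FLAGS NAME" and tees all output
# (stdout and stderr) into create-NAME.log.
# DOCKER_MACHINE can override the binary (hypothetical; defaults to docker-machine).
create_with_debug() {
  local name=$1
  shift
  "${DOCKER_MACHINE:-docker-machine}" --debug create "$@" "$name" 2>&1 | tee "create-${name}.log"
  # Return docker-machine's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

For example, `create_with_debug demo0 --driver digitalocean --digitalocean-access-token "${TOKEN}"` leaves the full log in `create-demo0.log`, ready to attach to a new issue.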

@lucj
Author

lucj commented Nov 21, 2015

@nathanleclaire thanks, Swarm creation is fine now. BTW, should eth0 be used on all cloud providers?

@lucj
Author

lucj commented Nov 22, 2015

@nathanleclaire

Because I have another issue when deploying a Compose file on the Swarm (docker-archive/classicswarm#1440), I tested creating the Swarm on DigitalOcean using Ubuntu 14.10, 15.04, or 15.10 (to make sure the kernel is 3.16+). Until now I had only tested with the default Ubuntu 14.04 (kernel 3.13).
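To confirm a droplet's kernel before building the Swarm, here is a minimal sketch. It assumes `sort -V` (version sort, as in GNU coreutils) is available on the client. `version_ge` is a hypothetical helper. It compares `uname -r` output, such as the 3.13.0-57-generic and 3.19.0-31-generic kernels seen in this thread, against 3.16.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if version A >= version B.
# Sorts both strings by version and checks that B comes first (or they are equal).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `kernel=$(docker-machine ssh demo0 uname -r | tr -d '\r')`, then `version_ge "$kernel" 3.16 && echo "kernel $kernel is 3.16+"`. The `tr` strips a possible trailing carriage return from the SSH session output.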

When I use Machine to create any of the Consul, Swarm master, or Swarm node hosts, I get the same error message (whereas I no longer get the error when using 14.04):

Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (5) exceeded

Docker Engine is running on each host, but I get this error when trying to connect:

$ eval $(docker-machine env --swarm demo0)
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "xxx.xxx.xxx.xxx:3376": dial tcp xxx.xxx.xxx.xxx:3376: getsockopt: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

Logging in through SSH works, though:

luc ~ [] $ docker-machine ssh demo
Welcome to Ubuntu 15.04 (GNU/Linux 3.19.0-31-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
New release '15.10' available.
Run 'do-release-upgrade' to upgrade to it.

root@demo0:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
root@demo0:~#

I create my Swarm with:

# KV Store

docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-image ubuntu-15-04-x64 \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
consul
eval "$(docker-machine env consul)"
docker run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap

# Swarm master
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-image ubuntu-15-04-x64 \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm \
--swarm-master \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
demo0

# Swarm node
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-image ubuntu-15-04-x64 \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
demo1

@nathanleclaire
Contributor

> @nathanleclaire thanks, Swarm creation is fine now. BTW, should eth0 be used on all cloud providers?

It will likely work on most, but it's not guaranteed: cloud providers each set up their networking in slightly different ways. Sometimes you might want to use eth1 or another interface, for instance if it's a private networking interface.

Your second issue may be because support for Ubuntu >= 15.04 hasn't shipped in an official release yet, though it has been merged into master. Support for it will be released in 0.5.2, in about a week's time.

@lucj
Copy link
Author

lucj commented Nov 23, 2015

@nathanleclaire thanks for your answers.
No problem regarding the second issue: I'm using Debian 8 with its 3.16 kernel, and that fixed it. I'll come back to Ubuntu in a week then :)
Docker Machine is really a great piece of software, thanks!

@nathanleclaire
Contributor

Thanks @lucj ! Glad you enjoy.

@MRezaNasirloo

@nathanleclaire advertising on eth0 fixed the problem.

@noeljackson

Same. Use eth0, NOT eth1. You can tell which interface to use by checking ifconfig on the host.
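`ifconfig` works for a quick look. The sketch below is a scriptable check of the condition behind the `no available advertise IP address in interface (eth1:2376)` error. It reads `ip -4 -o addr show` output, which prints one line per IPv4 address with the interface name in the second field. `iface_has_ipv4` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# iface_has_ipv4 IFACE: reads "ip -4 -o addr show" output on stdin and
# succeeds if IFACE has at least one IPv4 ("inet") address.
iface_has_ipv4() {
  awk -v ifc="$1" '$2 == ifc && $3 == "inet" { found = 1 } END { exit !found }'
}
```

Usage: `docker-machine ssh demo0 "ip -4 -o addr show" | iface_has_ipv4 eth1 || echo "eth1 has no IPv4 address; do not advertise on it"`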

6 participants