Discovery by file doesn't retry checking an inactive node #1331

Closed
riuvshin opened this Issue Oct 23, 2015 · 3 comments

Contributor

riuvshin commented Oct 23, 2015

Hi! I ran into an issue where swarm doesn't try to reconnect to a node after the first failure:

time="2015-10-23T14:41:27+01:00" level=info msg="Listening for HTTP" addr="0.0.0.0:2375" proto=tcp 
time="2015-10-23T14:41:27+01:00" level=error msg="Get http://node:2375/v1.15/info: dial tcp 192.168.56.15:2375: getsockopt: no route to host. Are you trying to connect to a TLS-enabled daemon without TLS?" 

The first attempt failed because the node was not ready; it was provisioned after swarm started. But once the node became available, swarm did not try to join it.

I can reach it with curl:

curl node:2375/info
{"Containers":0,"Debug":0,"DockerRootDir":"/var/lib/docker","Driver":"devicemapper","DriverStatus":[["Pool Name","docker-253:1-50940367-pool"],["Pool Blocksize","65.54 kB"],["Backing Filesystem","xfs"],["Data file","/dev/loop0"],["Metadata file","/dev/loop1"],["Data Space Used","307.2 MB"],["Data Space Total","107.4 GB"],["Data Space Available","48.34 GB"],["Metadata Space Used","733.2 kB"],["Metadata Space Total","2.147 GB"],["Metadata Space Available","2.147 GB"],["Udev Sync Supported","true"],["Data loop file","/var/lib/docker/devicemapper/devicemapper/data"],["Metadata loop file","/var/lib/docker/devicemapper/devicemapper/metadata"],["Library Version","1.02.93-RHEL7 (2015-01-28)"]],"ExecutionDriver":"native-0.2","ID":"BHO6:BKFG:OSZ2:JCFF:MLHF:5NSL:HCYQ:SVK5:YZD2:O3MN:C4TL:EAG6","IPv4Forwarding":1,"Images":0,"IndexServerAddress":"https://index.docker.io/v1/","InitPath":"/usr/libexec/docker/dockerinit","InitSha1":"836be3a369bfc6bd4cbd3ade1eedbafcc1ea05d0","KernelVersion":"3.10.0-229.14.1.el7.x86_64","Labels":null,"MemTotal":3976208384,"MemoryLimit":1,"NCPU":2,"NEventsListener":0,"NFd":13,"NGoroutines":19,"Name":"node","OperatingSystem":"CentOS Linux 7 (Core)","RegistryConfig":{"IndexConfigs":{"data.dev.com:5000":{"Mirrors":[],"Name":"data.dev.com:5000","Official":false,"Secure":false},"docker.io":{"Mirrors":null,"Name":"docker.io","Official":true,"Secure":true}},"InsecureRegistryCIDRs":["127.0.0.0/8"]},"SwapLimit":1,"SystemTime":"2015-10-23T14:53:33.349169436+01:00"}

So I need to restart swarm to get it working. But if I manually stop Docker on the node, swarm keeps trying to reconnect to it:

time="2015-10-23T14:59:47+01:00" level=error msg="Flagging engine as dead. Updated state failed 3 times: Get http://node:2375/v1.15/containers/json?all=1&size=0: dial tcp 192.168.56.15:2375: getsockopt: connection refused" id="BHO6:BKFG:OSZ2:JCFF:MLHF:5NSL:HCYQ:SVK5:YZD2:O3MN:C4TL:EAG6" name=node
time="2015-10-23T15:00:30+01:00" level=error msg="Flagging engine as dead. Updated state failed 4 times: Get http://node:2375/v1.15/containers/json?all=1&size=0: dial tcp 192.168.56.15:2375: getsockopt: connection refused" id="BHO6:BKFG:OSZ2:JCFF:MLHF:5NSL:HCYQ:SVK5:YZD2:O3MN:C4TL:EAG6" name=node

And finally, if I start Docker on the node again:

time="2015-10-23T15:01:48+01:00" level=info msg="Engine came back to life after %d retries. Hooray!5" id="BHO6:BKFG:OSZ2:JCFF:MLHF:5NSL:HCYQ:SVK5:YZD2:O3MN:C4TL:EAG6" name=runner1.dev.com 

The node comes back online. Btw, there is a bug in the log message 😃

Engine came back to life after %d retries. Hooray!5
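The literal `%d` followed by a trailing `5` is consistent with a format string being handed to a print-style call instead of a printf-style one. A minimal sketch with the standard `fmt` package reproduces the same output; the helper names are hypothetical, and this is an assumption about the cause rather than a copy of swarm's logging code.

```go
package main

import "fmt"

// sprintStyle concatenates its operands the way fmt.Sprint does,
// so any %d in msg is left untouched and the number is appended.
func sprintStyle(msg string, retries int) string {
	return fmt.Sprint(msg, retries)
}

// sprintfStyle treats the first argument as a format string,
// so %d is replaced by the retry count.
func sprintfStyle(format string, retries int) string {
	return fmt.Sprintf(format, retries)
}

func main() {
	msg := "Engine came back to life after %d retries. Hooray!"
	fmt.Println(sprintStyle(msg, 5))  // Engine came back to life after %d retries. Hooray!5
	fmt.Println(sprintfStyle(msg, 5)) // Engine came back to life after 5 retries. Hooray!
}
```

If that is indeed the cause, switching the call to its formatted variant would be the fix.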

ENV info
swarm: 1.0.0-rc1
docker: 1.6.0
OS: CentOS 7.1
Discovery type: file

Discovery file contents:

node:2375

swarm started as:

swarm manage -H 0.0.0.0:2375 file:///usr/local/swarm/node_list
Member

MHBauer commented Oct 23, 2015

This sounds like #1185, but with file discovery instead of token discovery.

Contributor

riuvshin commented Oct 27, 2015

@MHBauer thank you for the pointer! I'll track #1185 as well, since it's a blocker for me.

@vieux added this to the 1.1.0 milestone Nov 16, 2015

Contributor

aluzzardi commented Dec 3, 2015

This issue is now tracked by the node management proposal (#1486).

@aluzzardi closed this Dec 3, 2015
