"Calico node failed to start" when scaling out docker cluster #301

Open
aledsage opened this issue May 4, 2016 · 0 comments
aledsage commented May 4, 2016

Using clocker 1.2.0-SNAPSHOT (at commit 7c9346c, while testing a couple of unrelated fixes for issues #288 and #290)...

I successfully deployed a 2 host clocker+calico cluster in BlueBox. I then deployed many entities that created containers (using Brooklyn's MachineEntity) to cause the cluster to auto-scale.

It created a third host, but this hung on startup (waiting forever for post-start to finish). It is blocked on this call: SdnAgent agent = Entities.attributeSupplierWhenReady(this, SdnAgent.SDN_AGENT).get();

Looking at the CalicoNode for that host, its service.state is "ON_FIRE" and its service.isUp is "false". Looking in the debug log (grep -E "OKsRTXuY|10.101.1.162"), I see the following error:

2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Pulling Docker image calico/node:v0.19.0
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Running Docker container with the following command:
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] docker run -d --restart=always --net=host --privileged --name=calico-node -e HOSTNAME=brooklyn-o6o7oy-aled-clocker-bl-fgdo-docker-host-hhfw-bb3 -e IP=10.101.1.162 -e IP6= -e CALICO_NETWORKING=true -e AS= -e NO_DEFAULT_POOLS= -e ETCD_AUTHORITY=10.101.1.162:2379 -e ETCD_SCHEME=http -v /var/log/calico:/var/log/calico -v /lib/modules:/lib/modules -v /var/run/calico:/var/run/calico calico/node:v0.19.0
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Calico node is running with id: 06dc7cbec5c7241fbdf0dec2cecce312908f7ce90224e90844b5a494765b6b1c
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Waiting for successful startup
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Traceback (most recent call last):
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "startup.py", line 295, in <module>
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     main()
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "startup.py", line 285, in main
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     _ensure_host_tunnel_addr(ipv4_pools, ipip_pools)
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "startup.py", line 55, in _ensure_host_tunnel_addr
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     _assign_host_tunnel_addr(ipip_pools)
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "startup.py", line 74, in _assign_host_tunnel_addr
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     host=hostname
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 128, in wrapped
2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     return fn(*args, **kwargs)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "/usr/lib/python2.7/site-packages/pycalico/ipam.py", line 618, in auto_assign_ips
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     pool[0], host)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [brooklyn-execmanager-FnS0lXyr-1063]: launching CalicoNodeImpl{id=OKsRTXuY}, on machine SshMachineLocation[10.101.1.162:aled@10.101.1.162/10.101.1.162:22(id=hKrZGyax)], completed: return status 0
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "/usr/lib/python2.7/site-packages/pycalico/ipam.py", line 723, in _auto_assign
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     ipam_config)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]   File "/usr/lib/python2.7/site-packages/pycalico/ipam.py", line 189, in _new_affine_block
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout]     "wrong attributes" % pool)
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] pycalico.datastore_errors.PoolNotFound: Requested pool 50.0.3.0/24 is not configured or haswrong attributes
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Calico node failed to start
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Pulling Docker image calico/node-libnetwork:v0.8.0
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Calico libnetwork driver is running with id: dc8372dbd5e8e821dfc102f1d6e89c1384592870cd0766316d365bbae496ae1d
2016-05-04 22:03:08,222 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] Executed /tmp/brooklyn-20160504-220243832-D1kk-launching_CalicoNodeImpl_id_OK.sh, result 0
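The traceback above is hard to read because every line carries the Brooklyn log prefix and a line from another thread is interleaved in the middle of it. As a minimal sketch (not part of Clocker or Brooklyn), the following Python helper recovers just the output one entity wrote to stdout. The names LINE_RE and entity_output are hypothetical, and the regular expression assumes the prefix format seen in this excerpt; other Brooklyn versions or log configurations may differ.

```python
import re

# Assumed prefix format, taken from the log excerpt in this issue, e.g.
# "2016-05-04 22:03:08,221 DEBUG brooklyn.SSH [Thread-24165]: [OKsRTXuY@10.101.1.162:stdout] <text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+) "
    r"\[(?P<thread>[^\]]+)\]: "
    r"\[(?P<entity>[^@\]]+)@(?P<host>[^:\]]+):(?P<stream>stdout|stderr)\] ?"
    r"(?P<text>.*)$"
)


def entity_output(log_lines, entity_id, stream="stdout"):
    """Return the raw text one entity wrote to the given stream.

    Lines that do not carry the "[entity@host:stream]" tag (for example the
    interleaved "launching CalicoNodeImpl{...}" line from the exec manager
    thread) do not match the pattern and are skipped.
    """
    out = []
    for line in log_lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("entity") == entity_id and m.group("stream") == stream:
            # Only one separator space is consumed, so traceback
            # indentation is preserved.
            out.append(m.group("text"))
    return out
```

Calling entity_output(open("brooklyn.debug.log"), "OKsRTXuY") and joining the result with newlines should give the plain Python traceback ending in PoolNotFound; the log file name here is only an example.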

It then goes on to repeatedly fail the check-running for CalicoNodeImpl{id=OKsRTXuY}.

2016-05-04 22:05:14,762 DEBUG brooklyn.SSH [brooklyn-execmanager-FnS0lXyr-1348]: check-running CalicoNodeImpl{id=OKsRTXuY}, on machine SshMachineLocation[10.101.1.162:aled@10.101.1.162/10.101.1.162:22(id=hKrZGyax)], completed: return status 1
2016-05-04 22:05:14,762 DEBUG brooklyn.SSH [Thread-33364]: [OKsRTXuY@10.101.1.162:stdout] calico-node container not running
2016-05-04 22:05:14,762 DEBUG brooklyn.SSH [Thread-33364]: [OKsRTXuY@10.101.1.162:stdout] Executed /tmp/brooklyn-20160504-220514011-ZJFA-check-running_CalicoNodeImpl_i.sh, result 1
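Note that in the first log excerpt the launch script reported "result 0" even though it had printed "Calico node failed to start", so the failure only surfaced later through check-running. A minimal Python sketch of a stricter success test follows. It is purely illustrative: the names FAILURE_MARKER and launch_really_succeeded are hypothetical, and this is not how Clocker decides whether the launch succeeded.

```python
# Marker string copied from the launch output in this issue.
FAILURE_MARKER = "Calico node failed to start"


def launch_really_succeeded(exit_status, stdout_lines):
    """Treat the launch as failed if it exited non-zero OR printed the marker.

    The exit status alone is not enough here: the launch script in the log
    above returned 0 despite the PoolNotFound traceback.
    """
    if exit_status != 0:
        return False
    return not any(FAILURE_MARKER in line for line in stdout_lines)
```

With the output from this issue, launch_really_succeeded(0, stdout_lines) would return False, which would let the failure be reported at launch time instead of leaving post-start waiting on SdnAgent.SDN_AGENT.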