"RUNNING HANDLER [handlers : wait for weave to listen]" timeout #687

Closed

NinoFloris opened this issue Apr 2, 2016 · 9 comments

@NinoFloris

I have tried to get this great starting point for a cluster online, but I get stuck with Weave not coming up correctly.

This is the console output leading up to the timeout of the Ansible step:
https://gist.github.com/NinoFloris/89609002e852ce87fb3470320202e215
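
For context, the handler that times out is essentially waiting for Weave's router to start listening. A rough shell equivalent of that check, assuming it polls Weave's default router port (6783/tcp), would be:

```sh
# Minimal sketch of what "wait for weave to listen" amounts to
# (assumption: it polls Weave's default router port, 6783/tcp):
until nc -z localhost 6783; do
  sleep 2
done
```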

I don't exactly know how to debug this. SSHing into a host and running docker ps -a gives this result:

CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS               NAMES
b8e1aa9af01c        weaveworks/weaveexec:1.4.4   "/bin/false"        2 minutes ago       Created                                 weavevolumes-1.4.4

So at least Docker is running fine, but systemctl shows weave and weaveproxy as dead:

weave.service                                                   loaded    inactive dead      Weave Network Router
weaveproxy.service                                              loaded    inactive dead      Weave proxy for Docker API
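
Not from the thread itself, but a few standard follow-up checks on the affected host would show why the units are dead:

```sh
systemctl status weave.service weaveproxy.service     # unit state plus recent log lines
journalctl -u weave.service -u weaveproxy.service --no-pager
weave status                                          # router/peer state, if the weave CLI is on the PATH
```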

Manually starting weaveproxy.service shows weave as 'activating', not 'active' (is this because docker attach keeps the service attached in the foreground?).

After starting weaveproxy.service, my docker ps looks like this:

CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
37b410f490d7        weaveworks/weaveexec:1.4.4   "/home/weave/sigproxy"   20 seconds ago      Up 19 seconds                           amazing_stallman
2406bda553a0        weaveworks/weave:1.4.4       "/home/weave/weaver -"   21 seconds ago      Up 20 seconds                           weave
885176cfe20e        weaveworks/weaveexec:1.4.4   "/home/weave/weavepro"   21 seconds ago      Up 21 seconds                           weaveproxy
b8e1aa9af01c        weaveworks/weaveexec:1.4.4   "/bin/false"             7 minutes ago       Created                                 weavevolumes-1.4.4

I would love some pointers on where to look, because this is quite new terrain for me. The project is great, though; I had almost started working on something exactly like this.

EDIT: This is on DigitalOcean

@wallies
Contributor

wallies commented Apr 3, 2016

@NinoFloris if you export ANSIBLE_LOG="-vvvv" before running Ansible again, it should give you good debug output.
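
For example (the apollo-launch.sh entry point and its ansible_playbook_run subcommand appear later in this thread):

```sh
export ANSIBLE_LOG="-vvvv"   # extra verbosity for the Ansible run
./bootstrap/apollo-launch.sh ansible_playbook_run
```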

@NinoFloris
Author

I don't really see anything out of the ordinary here...

https://gist.github.com/NinoFloris/5485091483ca132120fce3169ecdac62

What I would really like to know is whether there was a recent successful deployment by Capgemini on DigitalOcean. As the docs state it is supported at project level, it should just work, right?

@wallies
Contributor

wallies commented Apr 3, 2016

@NinoFloris I just tested the master branch on DigitalOcean and it worked for me. I did notice that the default master_instance_type and slave_instance_type of 512mb isn't enough, and I have seen memory issues in some of the software used in the project. I usually use at least 2GB for masters and 4GB for slaves.

@NinoFloris
Author

Ahhh wow, I would have never thought of that angle. I will try this tomorrow. Many thanks!

Would a contribution of better docs be something I could help with once I get a cluster up and running and have a better feel for all the components involved?

@tayzlor
Member

tayzlor commented Apr 4, 2016

Yes please that would be awesome!

@NinoFloris
Author

It's strange, and I don't like that I cannot solve this problem on my own, but I still cannot get past this Ansible step.

My exact setup:

  • TF_VAR for region (ams2) but I have also tried the default, lon1.
  • TF_VAR for master and slave instance types, as you suggested 2gb and 4gb.
  • APOLLO_PROVIDER set to digitalocean.
  • TF_VAR_do_token set to a valid token.

I have had runs with TF_VAR_public_key_file set to a correct path, and runs where I did not export it, as it does not seem to do much.
I see the droplets come up in the DO web interface with the correct region and size.
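
As a shell sketch of that environment (placeholder values; the region and instance-type variable names are assumptions based on the TF_VAR_<name> convention, not confirmed against Apollo's docs):

```sh
export APOLLO_PROVIDER="digitalocean"
export TF_VAR_do_token="<valid DigitalOcean API token>"
export TF_VAR_region="ams2"                     # assumed name; also tried the default, lon1
export TF_VAR_master_instance_type="2gb"        # assumed name
export TF_VAR_slave_instance_type="4gb"         # assumed name
export TF_VAR_public_key_file="$HOME/.ssh/id_rsa.pub"
```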

I have had ANSIBLE_LOG="-vvvv" set, but this did not reveal any insights into this issue.

My Terraform version is v0.6.14, Ansible is 2.0.1.0, and I run this on a normal OS X 10.11 install.
Looking at ref master in my .git directory, the Apollo version I use is commit hash 5ec6257, i.e. the current HEAD (5 April 2016).

Are there any variables that I have not set correctly? This is all that I could find in the docs and code that really required input.

@wallies
Contributor

wallies commented Apr 4, 2016

@NinoFloris you are correct, I started getting the same error. It only happens on the first run, though. Seems to be a repeat of #662. When I run ./bootstrap/apollo-launch.sh ansible_playbook_run a second time, everything comes up fine.
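
A retry wrapper around that workaround could look like this (assuming apollo-launch.sh exits non-zero when the playbook run fails):

```sh
# Sketch: re-run the playbook step until it succeeds; in wallies'
# case a single re-run was enough.
until ./bootstrap/apollo-launch.sh ansible_playbook_run; do
  echo "Playbook run failed; retrying..."
  sleep 10
done
```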

@tayzlor tayzlor added the bug label Apr 5, 2016
@tayzlor
Member

tayzlor commented Apr 5, 2016

Labelling as a bug - I think we need to look deeper into this and provide a fix.

@dllewellyn
Contributor

Not sure if this is helpful info, but I'm getting the same problem running the Vagrant build (using the master branch).
