This repository has been archived by the owner on Apr 12, 2021. It is now read-only.

nova-cloud-controller/0: hook failed #487

Closed
qgriffith-zz opened this issue Oct 11, 2016 · 9 comments

Comments

@qgriffith-zz

Ubuntu 16.04, using MAAS version 2.0.0+bzr5189-0ubuntu1 (16.04.1), Juju 2.0-rc3-0ubuntu116.04.1juju1, and conjure-up 2.0.1-0201610061938ubuntu16.04.

I get the following in my log when running `conjure-up openstack` with the MAAS option:

conjure-up/_unspecified_spell: [ERROR] conjure-up/_unspecified_spell: Failure in deploy done: Deployment errors:
nova-cloud-controller/0: hook failed: "cloud-compute-relation-changed" for nova-compute:cloud-compute
conjure-up/_unspecified_spell: [ERROR] conjure-up/_unspecified_spell: Showing dialog for exception:
Traceback (most recent call last):
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/conjureup/controllers/deploystatus/common.py", line 40, in wait_for_applications
    raise Exception(result['message'])
Exception: Deployment errors:
nova-cloud-controller/0: hook failed: "cloud-compute-relation-changed" for nova-compute:cloud-compute
@mikemccracken
Contributor

@qgriffith Thanks for trying this out; sorry it didn't work smoothly. This is an error from the Juju charm that deploys nova on the control node. To debug this further, you'll need to look at the Juju debug output directly.

Looking at `juju status` should show you the 'cloud-compute-relation-changed' error on the nova-cloud-controller application, and it may show other errors that could be good clues, such as errors from any of the machines.

`juju debug-log --replay -i unit-nova-cloud-controller-0` will show you all the debug log output from the failing unit. Something there might be useful. Feel free to paste the results here if it's not obvious what's going on, and we can try to help.

Good luck

@qgriffith-zz
Author

Thank you for your super quick reply and helpful tips. It appears to be something with DNS, maybe:

unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed Traceback (most recent call last):
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1102, in <module>
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     main()
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1096, in main
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     hooks.execute(sys.argv)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 715, in execute
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     self._hooks[hook_name]()
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 618, in compute_changed
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     ssh_compute_add(key, rid=rid, unit=unit)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 746, in ssh_compute_add
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     hn = get_hostname(private_address)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 465, in get_hostname
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     result = ns_query(rev)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 427, in ns_query
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     answers = dns.resolver.query(address, rtype)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 981, in query
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise_on_no_answer, source_port)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 910, in query
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise NXDOMAIN
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed dns.resolver.NXDOMAIN
unit-nova-cloud-controller-0: 14:11:24 ERROR juju.worker.uniter.operation hook "cloud-compute-relation-changed" failed: exit status 1
unit-nova-cloud-controller-0: 14:11:24 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
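The traceback shows the charm's `ssh_compute_add` failing on a reverse DNS lookup of the compute node's private address (`get_hostname` calls `ns_query`, which raises NXDOMAIN). A minimal way to reproduce that check outside the hook is sketched below; it uses the Python stdlib rather than the charm's dnspython call, and `reverse_lookup` is a helper name introduced here, not the charm's:

```python
import socket

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None when reverse DNS fails.

    Rough stand-in for the charm's get_hostname(): the charm queries the
    in-addr.arpa name via dnspython and hits NXDOMAIN; the stdlib call
    below exercises the same resolver configuration.
    """
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None
```

If this returns None when run on the nova-cloud-controller unit against a compute node's private address, the hook will keep failing until reverse DNS for that subnet works.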

@qgriffith-zz
Author

There was also this error at the start, but the deploy seems to continue past it:

unit-nova-cloud-controller-0: 14:03:01 ERROR juju.worker.dependency "metric-collect" manifold worker returned unexpected error: failed to read charm from: /var/lib/juju/agents/unit-nova-cloud-controller-0/charm: stat /var/lib/juju/agents/unit-nova-cloud-controller-0/charm: no such file or directory

@mikemccracken
Contributor

This doesn't ring any bells immediately; we're looking into it.

@qgriffith-zz
Author

I do notice that on the blades MAAS manages, DNS is set up like this:

nameserver
search maas

In the previous version of MAAS, DNS was set up to point to the MAAS server and then forward to the real DNS server. I don't know if that is the issue, but it's something I am looking into.

@adam-stokes
Contributor

Not sure if this helps, but this is my network setup for MAAS:

[screenshot: maas-network]

@qgriffith-zz
Author

I think I have fixed this by removing the DNS server from the subnet MAAS handles. This defaults all MAAS-managed nodes to using MAAS as the DNS server, which then forwards out to my real DNS server.

@laralar

laralar commented Jul 5, 2018

Report

Thank you for trying conjure-up! Before reporting a bug please make sure you've gone through this checklist:

Please provide the output of the following commands

which juju
juju version

which conjure-up
conjure-up --version

which lxc
/snap/bin/lxc config show
/snap/bin/lxc version

cat /etc/lsb-release

Please attach a tarball of ~/.cache/conjure-up:

tar cvzf conjure-up.tar.gz ~/.cache/conjure-up

Sosreport

Please attach a sosreport:

sudo apt install sosreport
sosreport

The resulting output file can be attached to this issue.

What Spell was Selected?

What provider (aws, maas, localhost, etc)?

MAAS Users

Which version of MAAS?

Commands ran

Please outline what commands were run to install and execute conjure-up:

Additional Information

Hello. I am experiencing a similar issue to the one reported in #487:

aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/2 ping node16
ping: node16: Temporary failure in name resolution
Connection to 10.3.5.44 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/1 ping node16
ping: node16: Temporary failure in name resolution
Connection to 10.3.5.45 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/0 ping node16
ping: node16: Temporary failure in name resolution
Connection to 10.3.5.43 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/0 ping node16.aibl.lan
PING node16.aibl.lan (10.3.4.16) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=1 ttl=64 time=0.273 ms
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=2 ttl=64 time=0.176 ms
^C
--- node16.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.176/0.224/0.273/0.050 ms
Connection to 10.3.5.43 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/1 ping node16.aibl.lan
PING node16.aibl.lan (10.3.4.16) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=1 ttl=64 time=0.228 ms
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=2 ttl=64 time=0.140 ms
^C
--- node16.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.140/0.184/0.228/0.044 ms
Connection to 10.3.5.45 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/2 ping node16.aibl.lan
PING node16.aibl.lan (10.3.4.16) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=1 ttl=64 time=0.181 ms
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=2 ttl=64 time=0.206 ms
^C
--- node16.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.181/0.193/0.206/0.018 ms
Connection to 10.3.5.44 closed.
aibladmin@os-client:~/.cache/conjure-up$
nova-cloud-controller/0*  error     idle       2/lxd/2  10.3.5.44       8774/tcp,8778/tcp  hook failed: "cloud-compute-relation-changed"
unit-nova-cloud-controller-0: 13:48:05 INFO unit.nova-cloud-controller/0.juju-log cloud-compute:27: Generating template context for cell v2 share-db
unit-nova-cloud-controller-0: 13:48:05 INFO unit.nova-cloud-controller/0.juju-log cloud-compute:27: Missing required data: novacell0_password novaapi_password nova_password
unit-nova-cloud-controller-0: 13:48:05 DEBUG unit.nova-cloud-controller/0.juju-log cloud-compute:27: OpenStack release, database, or rabbitmq not ready for Cells V2
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed Traceback (most recent call last):
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1183, in <module>
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     main()
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1176, in main
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     hooks.execute(sys.argv)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     self._hooks[hook_name]()
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 671, in compute_changed
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     ssh_compute_add(key, rid=rid, unit=unit)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 1005, in ssh_compute_add
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     if ns_query(short):
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 478, in ns_query
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     answers = dns.resolver.query(address, rtype)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 1132, in query
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise_on_no_answer, source_port)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 947, in query
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise NoNameservers(request=request, errors=errors)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query node16. IN A: Server 127.0.0.53 UDP port 53 answered SERVFAIL
unit-nova-cloud-controller-0: 13:48:06 ERROR juju.worker.uniter.operation hook "cloud-compute-relation-changed" failed: exit status 1
unit-nova-cloud-controller-0: 13:48:06 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook

I have no issues resolving the names outside the juju containers; it seems the dns-search is failing.

The problem that I see is that if I remove the DNS from the MAAS subnet, DHCP will not assign the MAAS DNS and will
Any clues?

Outside the juju containers, I can resolve the name with or without the domain suffix without any issue.
I can also resolve juju-3c505c-2-lxd-2 with or without the domain name, but not from inside the container.

What I see is that I have assigned static IP addresses to the nodes (e.g. node16), and if I ssh to that node, I can't resolve the node name.

The dns-search is missing from the netplan configuration
ubuntu@node17:~$ ping node16
ping: node16: Temporary failure in name resolution
ubuntu@node17:~$

Also, it doesn't seem to be a MAAS/DNS-related issue; it seems that the juju containers don't have the dns-search property?
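One quick way to confirm that from inside a container is to check whether the resolver configuration carries any search domains at all. A minimal sketch (`search_domains` is a helper name introduced here; the default path is the conventional resolv.conf location):

```python
def search_domains(path="/etc/resolv.conf"):
    """Return the search/domain entries from a resolv.conf-style file."""
    domains = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Both "search" and the older "domain" directive set the
            # suffixes appended to short names.
            if parts and parts[0] in ("search", "domain"):
                domains.extend(parts[1:])
    return domains
```

An affected container would return an empty list here, while a host that can resolve short names like node16 would list the site domain.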

ubuntu@juju-41f038-3-lxd-2:~$ ping juju-41f038-1-lxd-0
ping: juju-41f038-1-lxd-0: Temporary failure in name resolution
ubuntu@juju-41f038-3-lxd-2:~$ ping juju-41f038-1-lxd-1
ping: juju-41f038-1-lxd-1: Temporary failure in name resolution
ubuntu@juju-41f038-3-lxd-2:~$ ping juju-41f038-1-lxd-1.aibl.lan
PING juju-41f038-1-lxd-1.aibl.lan (10.3.5.55) 56(84) bytes of data.
64 bytes from juju-41f038-1-lxd-1.aibl.lan (10.3.5.55): icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from juju-41f038-1-lxd-1.aibl.lan (10.3.5.55): icmp_seq=2 ttl=64 time=0.148 ms
^C
--- juju-41f038-1-lxd-1.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1006ms
rtt min/avg/max/mdev = 0.148/0.187/0.227/0.041 ms
ubuntu@juju-41f038-3-lxd-2:~$
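The pattern above (short name fails, FQDN works) is exactly what a missing search domain produces: the resolver never tries appending a domain to the short name. A minimal stdlib check of the same behaviour (`resolves` is a helper name introduced here):

```python
import socket

def resolves(name):
    """True if name resolves to at least one address with the current resolver."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False

# Inside an affected container, the short container name fails while its
# fully-qualified form succeeds, matching the ping output above.
```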

@tux-box

tux-box commented Feb 23, 2019

I have this same problem. When I checked resolv.conf, I found "options edns0", while ifconfig shows eth0.
Not sure if this is related or not.
