[UNTESTED] fix selection of cinder-volume node #1225
PR SUSE-Cloud#1195 introduced breakage in Jenkins openstack-mkcloud, because it can cause values of `cinder_volume` such as "d52-54-77-77-77-03.ve1.cloud.suse.de\nd52-54-77-77-77-04.ve1.cloud.suse.de", i.e. multiple hostnames delimited by "\n" rather than comma-separated, as is correctly done with the other barclamps above.

So instead we assign to an array, breaking on whitespace, so that the first host is correctly picked.

Here's an example failure:

    W, [2016-09-08T20:36:56.103076 #8898:0x00000005f37410] WARN -- Could not recover Chef Crowbar Node on load d52-54-77-77-77-03.ve1.cloud.suse.de d52-54-77-77-77-04.ve1.cloud.suse.de: #<URI::InvalidURIError: bad URI(is not URI?): http://localhost:4000/nodes/d52-54-77-77-77-03.ve1.cloud.suse.de d52-54-77-77-77-04.ve1.cloud.suse.de>
    I, [2016-09-08T20:36:56.103681 #8898:0x00000005f37410] INFO -- Completed 500 Internal Server Error in 85ms (ActiveRecord: 8.0ms)
    F, [2016-09-08T20:36:56.104726 #8898:0x00000005f37410] FATAL -- NoMethodError (undefined method `[]' for nil:NilClass):
      app/models/service_object.rb:744:in `block (2 levels) in violates_exclude_platform_constraint?'
      app/models/service_object.rb:743:in `each'
      app/models/service_object.rb:743:in `any?'
      app/models/service_object.rb:743:in `block in violates_exclude_platform_constraint?'
      app/models/service_object.rb:739:in `each'
      app/models/service_object.rb:739:in `violates_exclude_platform_constraint?'
      app/models/service_object.rb:805:in `block in validate_proposal_constraints'
      app/models/service_object.rb:778:in `each'
      app/models/service_object.rb:778:in `validate_proposal_constraints'
      app/models/service_object.rb:669:in `validate_proposal_after_save'
      app/models/cinder_service.rb:204:in `validate_proposal_after_save'
      app/models/service_object.rb:547:in `save_proposal!'
      app/models/service_object.rb:871:in `_proposal_update'
      app/models/service_object.rb:526:in `proposal_edit'
      app/controllers/barclamp_controller.rb:563:in `proposal_update'

The problem occurs in this line in `ServiceObject#violates_exclude_platform_constraint?`:

    node = NodeObject.find_node_by_name(element)

`Chef::Node.load` raises the `URI::InvalidURIError` exception, which gets caught and turned into a warning; `nil` is then returned for the node, causing the `NoMethodError` soon after.
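
For readers who have not seen the diff, the array approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual change in this PR: the variable names `node_list` and `first_node` are hypothetical, and the input is hard-coded from the example value quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical input: two hostnames separated by a newline,
# mirroring the problematic cinder_volume value quoted above.
node_list=$'d52-54-77-77-77-03.ve1.cloud.suse.de\nd52-54-77-77-77-04.ve1.cloud.suse.de'

# The expansion is deliberately left unquoted so that bash word-splits it
# on IFS (space, tab, newline); each hostname becomes one array element.
# Unquoted expansion also performs globbing, which is harmless here only
# because hostnames contain no glob characters.
cinder_volume=( $node_list )

# Element 0 is the first host, which is the one that should be picked.
first_node="${cinder_volume[0]}"
echo "$first_node"
```

In bash, a bare `$cinder_volume` on an array also expands to element 0, so later code that treats the variable as a single hostname would see only the first host.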
+1 |
Could this be the reason for this mkcloud failure? https://ci.suse.de/job/openstack-mkcloud/33567/console |
@rsalevsky Yes, I think so, good catch! |
@rsalevsky It's definitely the reason for that failure. |
+1 |
+1 |
Someone (maybe me) needs to verify that this really does the right thing before merging. |
Without this fix, HA deployment for GM5+up and GM6+up always fails with:

    qa_crowbarsetup.sh: line 3032: [: too many arguments

With the fix, the mkcloud run completes successfully, including the tempest smoke test run. A side effect is that it exposes other small problems like:

    Starting proposal nova(default) at: Tue Sep 13 10:12:43 UTC 2016
    Starting proposal ceilometer(default) at: Tue Sep 13 10:20:33 UTC 2016
@ellisab Thanks for the useful info! Can you share the logs from an The gating check for this PR passed, but it ran without |
I would say
is not a small problem, it's a potentially bad bug. |
The logs from @ellisab didn't have debug enabled, so although it very much looks like this PR is working, I can't verify 100%. However the gating run I triggered is providing the required info. |
Gate failed but I think I might have messed up the build parameters. |
@aspiers Was there some progress? It really creates blockers for C5 and C6. |
@rsalevsky @rhafer The gate appears to be failing because |
@aspiers Thanks. I am currently trying to come up with a more complete fix for this. (AFAICS the same or similar issues are there for ceilometer and nova). |
I've created #1255 which is rather similar to this, but also tries to avoid similar issues for ceilometer and nova. The
error was btw caused by missing quotation in the |
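
As a generic illustration of how missing quotation can break a `[` test once a variable holds more than one hostname, here is a minimal sketch. The expression on line 3032 of `qa_crowbarsetup.sh` is not shown in this thread, so the variable `nodes` and the tests below are assumptions made for demonstration only.

```bash
#!/usr/bin/env bash
# Hypothetical value holding two newline-separated hostnames.
nodes=$'node-a\nnode-b'

# Unquoted: the value is word-split into two arguments, so `[` receives
# more operands than the expression allows and exits with an error status.
unquoted_status=0
[ $nodes != "" ] 2>/dev/null || unquoted_status=$?

# Quoted: the whole value is passed as a single argument and the test works.
quoted_status=0
[ "$nodes" != "" ] || quoted_status=$?

echo "unquoted status: $unquoted_status, quoted status: $quoted_status"
```

Quoting keeps the test syntactically valid, but it does not choose a single host; that still requires splitting the value, as with the array assignment in this PR.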
It's returning no node because there were no unclustered SLE12-SP2 nodes left in the deployment. It was a 4 node setup using HA and ceph. As ceph currently needs two SP1 nodes there's nothing left to deploy cinder-volume, ceilometer-agent and nova-compute to. So the fix would be to either use at least 5 nodes or deploy with want_ceph=0. |