-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fail with useful error on unmatched interface (bsc#990745) #575
Conversation
map_if_ref(if_remap, if_ref) | ||
r = map_if_ref(if_remap, if_ref) | ||
if r.nil? | ||
raise "Unable to find interface for #{node.fqdn} matching #{if_ref} out of #{if_remap}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. 105/100
In particular, I need someone to confirm that there aren't any scenarios where it's OK to have an unmatched interface pattern. Because if there are, obviously this will break them. |
1b13bda
to
058a7f5
Compare
I might be wrong, but I think it's actually okay to have this be |
058a7f5
to
58c7863
Compare
# intf_to_if_map precalculated by #build_node_map if you are | ||
# calling this in a loop and want to avoid it being calculated | ||
# on each call. | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/TrailingWhitespace: Trailing whitespace detected. (https://github.com/bbatsov/ruby-style-guide#no-trailing-whitespace)
@vuntz I think you're probably right. I've drastically improved the PR. |
b5b6dd2
to
d0192ac
Compare
Not convinced that gating failure was related to the changes - forced a rebuild. |
lgtm +1 |
@aspiers Yes, I have this on my radar. Just didn't find the time yet to give it the attention it deserves :( |
d0192ac
to
65bdf9f
Compare
desired.downto(0,&filter) | ||
when "?" | ||
(desired..speeds.length).each(&filter) | ||
desired.downto(0,&filter) unless found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/SpaceAfterComma: Space missing after comma. (https://github.com/bbatsov/ruby-style-guide#spaces-operators)
65bdf9f
to
12da87d
Compare
@rhafer OK thanks :) Rebased, slightly improved one of the comments, and fixed some broken indentation. |
r = map_if_ref(if_remap, if_ref) | ||
if r.nil? | ||
Chef::Log.warn "Unable to find interface for #{node.fqdn} " \ | ||
"matching #{if_ref} out of #{if_remap}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @vuntz already mentioned it's perfectly fine to have this nil
here for some of the conduits. As long as that conduit is not used on the node (i.e. not network using that conduit is allocated/enabled on that node). I such a case this will cause a lot of unnecessary warnings (build_node_map
is called quite frequently) in the logs. I fear that it will just confuse people.
Wouldn't be better to throw a more useful error message for cases where the conduit is actually used on a node (e.g. here: https://github.com/crowbar/crowbar-core/blob/master/chef/cookbooks/network/recipes/default.rb#L161)
Or even when trying to allocate/enable a network already (i.e. in the rails app)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I such a case this will cause a lot of unnecessary warnings (build_node_map is called quite frequently) in the logs. I fear that it will just confuse people.
Yeah OK, fair point.
Wouldn't be better to throw a more useful error message for cases where the conduit is actually used on a node
Sure, that's why this PR also adds this error.
Or even when trying to allocate/enable a network already (i.e. in the rails app)
You mean, somewhere like in NetworkService#allocate_ip_by_type
? That seems to get all its network info from the databag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhafer Would it be good enough to change this Chef::Log.warn
into a .debug
? Also, I'm still not sure exactly where else you are suggesting that errors should be added if the interface is nil
? (See question above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be good enough to change this Chef::Log.warn into a .debug?
Yeah, I guess that's fine.
You mean, somewhere like in NetworkService#allocate_ip_by_type?
Yes. And NetworkService#enable_interface
. The advantage of doing proper checks there would be that we could actually fail with an error much earlier than during the chef-client run. (Note: I am not say that you should be the one implementing that, it just an idea that I had when reviewing your code)
That seems to get all its network info from the databag.
Yeah, it would need to be enhanced to call into NodeObject#build_node_map
. Which reminds me that might want to add the debug message there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it's now .debug
.
Based on our follow-on IRC discussion about this, I've created Add network interface validation to NetworkService#build_net_info (Trello).
12da87d
to
8afe7e2
Compare
+1 @aspiers I guess you want to remove the "wip" and "do not merge labels"? |
8afe7e2
to
72888c1
Compare
Rebased to fix conflict. |
It's amazing that this critical part of Crowbar's core code has remained cryptic for so long. Let's make a start at documenting it so that future generations don't suffer the same pain.
This isn't technically a bug, because in Ruby: a, b, c = [nil, nil] will result in c being assigned nil. However it's more correct to be consistent in the number of elements the method returns.
If we fail to find an interface matching a pattern such as "+10g1" for a given node, give some useful debugging. We don't make it a warning, because there are cases where we expect this to happen, e.g. with rarely used networks such as 'os_sdn'. This helps more quickly identify incorrectly configured network.json files, and was experienced in a scenario where the Crowbar nodes were a mix of physical machines and VMs with slow virtio interfaces. https://bugzilla.suse.com/show_bug.cgi?id=990745
Catch the case where we failed to find an admin interface for a node, and bail with a more helpful error message. This was experienced in a scenario where the Crowbar nodes were a mix of physical machines and VMs with slow virtio interfaces, but the network.json was incorrectly configured and the admin network conduit failed to match the slow virtio interfaces. https://bugzilla.suse.com/show_bug.cgi?id=990745
"rescue nil" is not only a really bad habit, but it's completely unnecessary when looking up a value in a Hash.
72888c1
to
15bc4b2
Compare
Failed gating build has been deleted, so rebased to trigger another one. |
Bah, @vuntz completely rewrote this so rebasing is going to be ugly... One nice thing about gerrit is that it warns you when reviews conflict with each other. #githubsucks |
@vuntz I had a quick attempt to rebase and gave up as everything looks different. Let's fix this together this week. |
if interface.nil? | ||
raise "Failed to find admin node interface for node #{node.fqdn}! " \ | ||
"Check your network barclamp proposal." | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this: no objection about this part, as it's no regression, but I think it's not enough. We do want to avoid a crash in chef-client here, as having a rogue node could result, thanks to this, in a DoS on part of the deployment infrastructure. I'll submit a patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #803
@vuntz @rhafer Please can you either rebase this or help me rebase it? I put a lot of effort into adding useful comments to make cryptic code readable, but since you rewrote things completely by moving a lot of stuff to |
Obsoleted by #803 |
This is a forward-port of the first commit of crowbar#575 (893bfbb) in an attempt to make this code even clearer. crowbar#575 never got merged because it took too long to review it, and in the meantime all of this code got significantly rewritten during extraction of the conduit resolver code into a library to be shared between Chef and Crowbar. The other (fix) part of crowbar#575 was superseded by crowbar#803.
Looks like I'll have to redo this myself. Resubmitted as #1082. |
If we fail to find an admin interface matching a pattern such as
+10g1
for a given node, don't proceed blindly, but instead bail with a helpful error message. This was experienced in a scenario where the Crowbar nodes were a mix of physical machines and VMs with slowvirtio
interfaces, but thenetwork.json
was incorrectly configured and the admin network conduit failed to match the slowvirtio
interfaces.Also add a warning whenever a conduit cannot be matched against available interfaces, to help more quickly identify incorrectly configured
network.json
files, and de-obfuscate the code with comments and better variable names.https://bugzilla.suse.com/show_bug.cgi?id=990745