Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crashes with Service of type ExternalName #211

Merged
merged 6 commits into from
Dec 19, 2017

Conversation

maxlaverse
Copy link
Contributor

Service of type ClusterIP have a selector field but those of type ExternalName don't.
Trying to deploy such s service with kubernetes-deploy leads to a crash:

[FATAL][2017-11-24 16:53:19 +0100]	No actions taken
.../lib/kubernetes-deploy/kubernetes_resource/service.rb:62:in `selector': undefined method `map' for nil:NilClass (NoMethodError)
	from .../lib/kubernetes-deploy/kubernetes_resource/service.rb:72:in `fetch_related_replica_count'
	from .../lib/kubernetes-deploy/kubernetes_resource/service.rb:15:in `sync'
	from .../lib/kubernetes-deploy/concurrency.rb:13:in `each'

This PR modifies the behavior of the kubernetes-deploy service resources to succeed automatically by not searching to look at inexistent corresponding pods.

@maxlaverse maxlaverse force-pushed the fix_externalname_service branch 2 times, most recently from 6872379 to bf03510 Compare November 24, 2017 16:35
Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I have one change request. Can you please also pick up the latest changes from our master? I'm having trouble running CI because you're missing this commit.

@@ -21,6 +21,8 @@ def status
end

def deploy_succeeded?
return true if external_name?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be return exists? if external_name?--if the object somehow wasn't created, it won't be working.

@maxlaverse maxlaverse force-pushed the fix_externalname_service branch 4 times, most recently from 47f63c4 to bf91833 Compare November 30, 2017 09:46
@peiranliushop
Copy link
Contributor

We can address the "undefined method" exception for selector generically.
In stead of targeting a special case of services without selectors, the ExternalName service, we can check if selector exists before mapping. If so, set it to "". In effect, kubectl will ignore option selector when its value is an empty string.

The change in service.rb can be focused on a single method as such. I am new to Ruby, the code may have space for improvement.

class Service < KubernetesResource
...
    def selector
      # If service is without selectors, set selector to empty string which is the default value for
      # rubectl option "--selector". 
      if @definition["spec"].key?("selector")
        @selector ||= @definition["spec"]["selector"].map { |k, v| "#{k}=#{v}" }.join(",")
      else
        @selector ||= ""
      end
      @selector
    end

@KnVerey
Copy link
Contributor

KnVerey commented Dec 1, 2017

@peiranliushop's solution would still require a change in deploy_succeeded? as well so that it doesn't incorrectly return false, but I agree that fixing selector is more elegant than early returns in the _count methods. The method body could be: @definition["spec"].fetch("selector", []).map { |k, v| "#{k}=#{v}" }.join(",").

@maxlaverse maxlaverse force-pushed the fix_externalname_service branch 2 times, most recently from e248930 to a95e367 Compare December 3, 2017 11:14
@maxlaverse
Copy link
Contributor Author

Changed

@KnVerey
Copy link
Contributor

KnVerey commented Dec 13, 2017

I was really surprised to see CI pass since you didn't leave a change in deploy_succeeded? to prevent it from incorrectly returning false, per my comment. The fact that it did in fact pass points out to me that our suggestion isn't valid after all, because an empty selector selects everything, not nothing. This means that whether the service will succeed in this PR depends on whether there are other, completely unrelated pods running. In the end, I'm now thinking we need something like:

    def deploy_succeeded?
      return exists? unless requires_endpoints?
      selects_some_pods?
    end
    # ...
    def requires_endpoints?
      return false if external_name_svc?
      return true if @related_deployment_replicas.blank?
      @related_deployment_replicas > 0
    end

  # The version of `def selector` you have now

    def fetch_related_pod_count
      return 0 unless selector.present?
     # ...

    def fetch_related_replica_count
      return 0 unless selector.present?
     # ...

Plus a new status string for external name services, since talking about pods doesn't really make sense for them.

Do you agree? Sorry for the confusion.

@maxlaverse maxlaverse force-pushed the fix_externalname_service branch 3 times, most recently from 265eb3d to 1f528bf Compare December 17, 2017 19:19
@maxlaverse
Copy link
Contributor Author

Hi @KnVerey
I hadn't thought about a custom status and that's a good idea. It looks better with it.
I rebased and tested this PR by deploying normal services and headless services. Everything went fine.

I'm must confess I have read this class a few time and I forget everytime how it works :/

Why would we declare that endpoints are required if @related_deployment_replicas is blank ?
The three reasons for a blank @related_deployment_replicas are:

  • sync not called (isn't it always called?)
  • the command to get the deployment failed
  • no deployments were found (or more than 1)

@KnVerey
Copy link
Contributor

KnVerey commented Dec 17, 2017

Why would we declare that endpoints are required if @related_deployment_replicas is blank ?

You're exactly right about the reasons it might be blank. All three cases involved unexpected results, so what we're talking about here is really what the default should be. My reasoning is the most common use cases for services do require endpoints, making that the saner default. Also, if we are missing data for some reason (e.g. the deployment command failed), the logic overall should make us wait to try again on the next sync rather than potentially succeeding/failing prematurely.

@@ -11,7 +11,7 @@ def test_full_hello_cloud_set_deploy_succeeds
"Deploying ConfigMap/hello-cloud-configmap-data (timeout: 30s)",
"Hello from the command runner!", # unmanaged pod logs
"Result: SUCCESS",
"Successfully deployed 14 resources"
"Successfully deployed 15 resources"
], in_order: true)

assert_logs_match_all([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add an assertion for the logs about your new fixture here? (ie. %r{Service/redis-external\s+Doesn't require any endpoint})

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I think this likely also fixes #167, but we'd need to add another fixture to be sure. If you'd rather just get this in, I can look into that after we merge it.

@@ -44,28 +47,39 @@ def exposes_zero_replica_deployment?
@related_deployment_replicas == 0
end

def requires_endpoints?
return false if external_name_svc? || @related_deployment_replicas.blank?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you disagree with my reasoning that this should be

return false if external_name_svc?
return true if @related_deployment_replicas.blank? # problem counting replicas - by default, assume endpoints are required

@KnVerey
Copy link
Contributor

KnVerey commented Dec 18, 2017

If @peiranliushop has no objections, I'll merge and release this tomorrow morning.

@@ -11,16 +11,19 @@ def sync
end

def status
if @num_pods_selected.blank?
if !requires_endpoints?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that an endpoint is not required when the service is in type ExternalName or deployment replica is 0?
The if statement here is not quite straightforward, some comments will help to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that an endpoint is not required when the service is in type ExternalName or deployment replica is 0?

Yes, that's exactly it, and that already seems pretty clear to me by reading this method's name and definition. Here, making the status say "Doesn't require any endpoints" when !requires_endpoints? also seems straightforward. I wouldn't add any more comments.

"Failed to count related pods"
elsif selects_some_pods?
"Selects at least 1 pod"
else
"Selects 0 pods"
"Unknown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elsif condition selects_some_pods? applies to when pods are found.
The else condition should cover the case when no pods are found for the service selector while replica is > 0. Besides that, are there are cases that will fall into the else clause? Message "unknown" is not clear to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems like we'll only arrive here if we expected to select some pods but saw <= 0. If that's true, Selects 0 pods might still be a better message, but otoh I don't hate Unknown since this is a catch-all fallback case. I think you have a point, but I wouldn't block merging the PR on this if the author feels differently.

Copy link
Contributor Author

@maxlaverse maxlaverse Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the string with Unknown to be consistent with the other services, but I hadn't realized we could have this message for good reasons.

end
end

def deploy_succeeded?
return exists? unless requires_endpoints?
Copy link
Contributor

@peiranliushop peiranliushop Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The service must be found no matter if the service is external or not. Line 26 says that if the service is external then it is deployed successfully. It may not be true."
Just looked at the code again. The logic is correct. Please ignore my above comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service must be found no matter if the service is external or not.

Do you mean L28 should be exists? && (exposes_zero_replica_deployment? || selects_some_pods?)? It's true that the pod selection doesn't inherently prove the server-side service resource was created, though that problem would predate this PR. Come to think of it, this could now just be exists? && selects_some_pods? since exposes_zero_replica_deployment? is already covered by the requires_endpoints? check.

Line 26 says that if the service is external then it is deployed successfully. It may not be true.

What do you mean? If the service is external, afaik there's no good way to check whether it is in fact available, so server-side existence is the best indicator we have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ignore my comment about line 26. I was confused. The code is right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@KnVerey
Copy link
Contributor

KnVerey commented Dec 20, 2017

@maxlaverse FYI this fix is included in v0.14.1, which we just released.

@maxlaverse maxlaverse deleted the fix_externalname_service branch December 20, 2017 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants