
container ssa: use readiness prob #14578

Merged
merged 1 commit into from Apr 17, 2017

Conversation

enoodle

@enoodle enoodle commented Mar 30, 2017

Instead of accessing the /healthz endpoint, we can use Kubernetes's
readinessProbe to do this for us, and use the standard API instead of the
pod proxy, which is less documented. This also simplifies the code.

This will hopefully solve two long outstanding BZs:
https://bugzilla.redhat.com/show_bug.cgi?id=1384629
https://bugzilla.redhat.com/show_bug.cgi?id=1371803

@enoodle
Author

enoodle commented Mar 30, 2017

@simon3z @moolitayer @ilackarms please review

@enoodle enoodle force-pushed the container_ssa_use_readiness_probe branch 2 times, most recently from 5f1a606 to a7f6f34 Compare March 30, 2017 14:19
@cben
Contributor

cben commented Mar 30, 2017

Sweet 👍

After the pod is ready, we'll still use the proxy URL to analyze the image, right? In one of the BZs the conclusion was a misconfigured proxy (?); would this help there?

case response
when Net::HTTPOK
  begin
    ready = kubernetes_client.get_pod(options[:pod_name], options[:pod_namespace])[:status][:containerStatuses][0][:ready]


How are containerStatuses sorted by default? Will [0] always be the most recent status? Will [0] always exist (array length >= 1)?

Author


It's per container. Each container has one status, so if a pod had two containers this list would have length 2. Because we only have one container, we can safely use [0].
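To make the shape concrete, here is an illustrative pod status hash: containerStatuses holds one entry per container in the pod spec, not a history of statuses. The field names mirror the Kubernetes v1 Pod API; the container names and values are made up for this sketch.

```ruby
# Illustrative only: a pod status shaped like the v1 Pod API response,
# with one containerStatuses entry per container in the spec.
two_container_pod = {
  :status => {
    :containerStatuses => [
      {:name => "image-inspector", :ready => true},
      {:name => "sidecar",         :ready => false}
    ]
  }
}

statuses = two_container_pod[:status][:containerStatuses]
statuses.length     # one entry per container, so 2 here
statuses[0][:ready] # in a single-container pod, [0] is the only entry
```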


Sounds good. I wasn't sure if it was a historical list of statuses; this makes more sense.

Contributor


We should think to improve this using the kubernetes watch API (on the pod).

Author


@simon3z As far as I understand, the watch API will hang on the HTTP response waiting for new updates. This means that the worker executing the job will hang until the image-inspector scan is complete, so parallel scans will be less "parallel" and scaling will suffer.
If there is a better way of using the watch API we should consider it, but from what I have seen using curl and Kubeclient [1], we have to hang and wait on the connection.

[1]https://github.com/abonas/kubeclient/blob/master/lib/kubeclient/watch_stream.rb#L14
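The blocking behavior described above can be sketched like this. A real implementation would call Kubeclient's watch API (e.g. watch_pods), which holds the HTTP connection open; here a plain Enumerator stands in for the event stream so the sketch is self-contained, and the event shape is an assumption modeled on watch responses.

```ruby
# The worker stays inside this loop until a "ready" event arrives on the
# stream, which is exactly the hang being discussed above.
def wait_until_ready(events)
  events.each do |event|
    ready = event.dig(:object, :status, :containerStatuses, 0, :ready)
    return true if ready # blocked until this event shows up
  end
  false # stream ended without the pod becoming ready
end

# Stand-in for a watch stream: two MODIFIED events, ready on the second.
stream = [
  {:type => "MODIFIED", :object => {:status => {:containerStatuses => [{:ready => false}]}}},
  {:type => "MODIFIED", :object => {:status => {:containerStatuses => [{:ready => true}]}}}
].each
```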

Contributor


@enoodle shouldn't we protect against possible nils here: [:status][:containerStatuses][0][:ready] ?

Author


@simon3z discussed this with Mooli: #14578 (comment)
The ready field is always there. I will add a check for containerStatuses and log it before trying again (assuming it is a temporary glitch from OpenShift) instead of failing with a cryptic "method missing" error. I think this state is not possible for a running pod, but it might not be documented; I couldn't find it.
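The guard described above could look roughly like this. The method names (check_pod_ready, log_info) are made up for this sketch, not the merged code; the point is to log and retry on a missing containerStatuses array instead of crashing with NoMethodError on nil.

```ruby
# Stand-in for the real logger.
def log_info(msg)
  warn(msg)
end

# Hypothetical guard: returns :retry when containerStatuses is absent,
# otherwise reports the single container's readiness.
def check_pod_ready(pod)
  statuses = pod.dig(:status, :containerStatuses)
  if statuses.nil? || statuses.empty?
    # Unexpected for a running pod: log and let the caller schedule a
    # retry instead of blowing up on nil.
    log_info("containerStatuses missing for pod, will retry")
    return :retry
  end
  statuses[0][:ready] ? :ready : :not_ready
end
```

Hash#dig keeps the lookup nil-safe at every level, so a half-populated status hash falls into the retry branch rather than raising.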

@enoodle
Author

enoodle commented Apr 2, 2017

@miq-bot add_label bug

@miq-bot miq-bot added the bug label Apr 2, 2017
rescue SocketError, KubeException => e
  msg = "unknown access error to pod #{pod_full_name}: #{e.message}"
  _log.info(msg)
  queue_signal(:abort_job, msg, "error")


Won't this keep queuing pod_wait if there is an error? (line 99)
Maybe you need:

return queue_signal(:abort_job, msg, "error")

Also do you mind adding brackets around e.message:

msg = "unknown access error to pod #{pod_full_name}: [#{e.message}]"

It makes nils really obvious
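Both suggestions together could be sketched as below. The class, pod name, and the stubbed KubeException are stand-ins so the sketch runs without kubeclient or the surrounding job code; only the rescue clause's shape is the point.

```ruby
require "socket" # SocketError

# Stub so this sketch runs without the kubeclient gem; the real constant
# comes from Kubeclient.
KubeException = Class.new(StandardError) unless defined?(KubeException)

class PodWaitSketch
  attr_reader :signals

  def initialize
    @signals = []
  end

  def pod_full_name
    "management-infra/manageiq-img-scan-1234" # made-up name
  end

  # Records the signal instead of queuing it, so we can inspect it.
  def queue_signal(*args)
    @signals << args
    args
  end

  def pod_wait
    raise KubeException, "connection refused" # simulate the API call failing
  rescue SocketError, KubeException => e
    # Brackets around e.message make a nil or empty message obvious in logs.
    msg = "unknown access error to pod #{pod_full_name}: [#{e.message}]"
    # Returning here stops the method from falling through and queuing
    # another pod_wait after the abort.
    return queue_signal(:abort_job, msg, "error")
  end
end
```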

Author


👍 Good catch, thanks!

Author


@moolitayer If there was an error then ready won't be set, but I will add the "return" for clarity


Yes, ready will not be set, and you would schedule another check...

@enoodle enoodle force-pushed the container_ssa_use_readiness_probe branch from c32a94e to 6bc2d2d Compare April 2, 2017 13:12
@enoodle
Author

enoodle commented Apr 2, 2017

@cben

After the pod is ready, we'll still use the proxy URL to analyze the image, right? In one of the BZs the conclusion was a misconfigured proxy (?); would this help there?

Yes, I asked, and it seems to help with at least one of the mentioned BZs. The problem was that a "transparent" proxy was changing error messages, but now we do not rely on error messages in the happy flow. If an error message from Kubernetes/OpenShift is altered, we will catch the exception and quit with the proxy's message, which is the best outcome IMO when we can't get the real message.

@moolitayer

@enoodle should we expect the usual ephemeral "missing method X for nil" error that we get when k8s returns hashes that are missing elements? (What happens when you GET a pod right after creation; will it have containerStatuses?)

@enoodle
Author

enoodle commented Apr 3, 2017

@moolitayer Generally we should always have a containerStatus per container [1]; if not, there is a problem (even if the pod is still being created). The ready flag is always present [2], so that too is reasonably safe.

[1]https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2504
[2]https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L1848

@moolitayer

@enoodle BTW I think there is some code in openshift-ansible we would be able to throw away in versions that have this change. I'm referring to the code adding the pod/proxy * resource to the management-infra-admin SA.

@simon3z
Contributor

simon3z commented Apr 11, 2017

@ilackarms can you give this a try and review?
There is a pending comment, but overall LGTM.

@ilackarms

I tested this locally and could not load the Compute > Containers > Providers tab of the MIQ dashboard. Not sure if this is only my problem; maybe someone else can reproduce? Screenshot attached.
[screenshot from 2017-04-11 16-49-13]

@enoodle
Author

enoodle commented Apr 12, 2017

@ilackarms This is an unrelated error due to some changes made by the UI team (IIUC). Make sure you have #14563.

@ilackarms

@enoodle problem solved; LGTM

@enoodle enoodle force-pushed the container_ssa_use_readiness_probe branch from 6bc2d2d to 4f5edf9 Compare April 12, 2017 14:40
@miq-bot
Member

miq-bot commented Apr 12, 2017

Checked commit enoodle@4f5edf9 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
2 files checked, 0 offenses detected
Everything looks good. 👍


@simon3z simon3z left a comment


LGTM 👍
@miq-bot assign roliveri

@roliveri roliveri self-assigned this Apr 17, 2017
@roliveri roliveri merged commit 08c3090 into ManageIQ:master Apr 17, 2017
@simaishi
Contributor

@enoodle @roliveri should this be fine/yes?

@roliveri
Member

Sorry, not sure.

@simon3z should this be fine/yes?

@simon3z
Contributor

simon3z commented Apr 24, 2017

@simaishi @roliveri I would like to wait some more time to make sure everything is OK before backporting this to fine. I targeted the BZ to 5.8.1.

@chessbyte chessbyte added this to the Sprint 59 Ending Apr 24, 2017 milestone Jun 7, 2017
@simaishi
Contributor

@simon3z I'm backporting PRs for 5.8.1 now, please confirm this should now be fine/yes

@simon3z
Contributor

simon3z commented Jun 14, 2017

@miq-bot add_label fine/yes
cc @enoodle @simaishi

simaishi pushed a commit that referenced this pull request Jun 14, 2017
@simaishi
Contributor

Fine backport details:

$ git log -1
commit 4769a2390063743d227386fc963cc9fd1b5e77fa
Author: Richard Oliveri <oliveri.richard.github@gmail.com>
Date:   Mon Apr 17 11:12:11 2017 -0400

    Merge pull request #14578 from enoodle/container_ssa_use_readiness_probe
    
    container ssa: use readiness prob
    (cherry picked from commit 08c3090302d4c7fa747781dc3699e502194adb62)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1461558
