Readiness Probe #355

OperationalDev · 2022-10-17T09:02:59Z

Challenge

When restarting authorino deployments (with 3 replicas and a pdb), for a few brief seconds, requests are denied with a 403.

Solution

Have a healthz endpoint that can be used as a readiness probe and that reports ready only once all of the auth configs have been loaded.

guicassolato · 2022-10-17T09:21:41Z

@OperationalDev, are you using the Status block of the AuthConfig CRs to check for readiness? Conditions Available and Ready will tell you respectively when at least one host name listed in the AuthConfig has been indexed or all of them. The Summary field provides you with more details about which host names have been indexed (amongst other stuff).

(Not saying we couldn't use your suggestion to enhance the health check endpoint, BTW!)

OperationalDev · 2022-10-17T09:29:36Z

@guicassolato I am not using them. Can you give me some guidance as to how I might use them in this scenario to ensure 1 pod is always available?

guicassolato · 2022-10-17T10:00:48Z

@OperationalDev, it's a per-CR check, so it won't give you the readiness state of the entire deployment, nor of any arbitrary pod for that matter. Rather, it gives you the state of a particular CR in the index from the perspective of the leader replica, i.e. according to the one pod that won the election amongst all 3 replicas to become the leader Authorino replica and that is therefore responsible for updating the status sub-resource of the AuthConfigs. Unfortunately, we haven't yet implemented any synchronization between pods for the status update; that's why the value in the status sub-resource reflects only what the leader knows. On the other hand, it is relatively safe to assume that what the leader knows is either identical to what the other replicas know or no more than a couple of milliseconds ahead of or behind it.

Checking the status sub-resource of a particular AuthConfig is straightforward. For example, given the AuthConfig from this example applied to the default namespace of the cluster, .status.summary.ready tells you if Authorino is ready to receive authz requests for the AuthConfig:

kubectl get authconfig/talker-api-protection -o jsonpath='{.status.summary.ready}'

Output:

true

In the example above, the AuthConfig lists only one host name, i.e. talker-api-authorino.127.0.0.1.nip.io. In this case, being available (i.e. "at least one host name of the AuthConfig linked in the index") equals being ready (i.e. "all host names of the AuthConfig linked in the index"). In other cases, where the AuthConfig lists more than one host name, you can run into situations where the AuthConfig is available but not ready, because host name collisions are avoided and some of its host names may not get linked. Because of that, you may want to check as well:

kubectl get authconfig/talker-api-protection -o jsonpath='{.status.conditions}'

Output:

[{"lastTransitionTime":"2022-10-17T09:37:36Z","reason":"HostsLinked","status":"True","type":"Available"},{"lastTransitionTime":"2022-10-17T09:37:36Z","reason":"Reconciled","status":"True","type":"Ready"}]

...and/or

kubectl get authconfig/talker-api-protection -o jsonpath='{.status.summary.hostsReady}'

Output:

["talker-api-authorino.127.0.0.1.nip.io"]
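To illustrate the available/ready distinction described above, here is a minimal Python sketch that interprets the JSON printed by the two kubectl commands. The function names and the `a.example.com` / `b.example.com` host names are hypothetical and only for illustration; the condition types and the "at least one host" versus "all hosts" rule are taken from the explanation above. It is not how Authorino itself computes the status.

```python
import json


def condition_is_true(conditions, condition_type):
    """Return True if the condition of the given type has status "True"."""
    for cond in conditions:
        if cond.get("type") == condition_type:
            return cond.get("status") == "True"
    # A missing condition is treated as not met (an assumption of this sketch).
    return False


def host_readiness(listed_hosts, hosts_ready):
    """Return (available, ready) for an AuthConfig.

    available: at least one listed host name is linked in the index.
    ready:     all listed host names are linked in the index.
    """
    listed = set(listed_hosts)
    linked = listed & set(hosts_ready)
    available = len(linked) >= 1
    ready = bool(listed) and linked == listed
    return available, ready


# Output of `-o jsonpath='{.status.conditions}'` from the example above.
conditions = json.loads(
    '[{"lastTransitionTime":"2022-10-17T09:37:36Z","reason":"HostsLinked",'
    '"status":"True","type":"Available"},'
    '{"lastTransitionTime":"2022-10-17T09:37:36Z","reason":"Reconciled",'
    '"status":"True","type":"Ready"}]'
)

print(condition_is_true(conditions, "Available"))
print(condition_is_true(conditions, "Ready"))

# Hypothetical AuthConfig listing two hosts, only one of them linked.
print(host_readiness(["a.example.com", "b.example.com"], ["a.example.com"]))
```

Keep in mind that, as noted earlier in the thread, these values reflect only what the leader replica knows.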

OperationalDev · 2022-10-18T11:41:22Z

@guicassolato It's not clear to me how I would use the status of the authconfigs to know which of the authorino pods are ready to serve requests?

guicassolato · 2022-10-18T11:56:12Z

@OperationalDev, sorry if I wasn't clear before. The status block in the authconfigs won't tell you which authorino pods are ready. Instead, it can only tell you whether the authconfig is ready in a particular authorino pod, specifically the leader one. My point from before is that the difference between being ready in the leader pod and being ready in any other pod should be no more than a couple of milliseconds.

This is not ideal. I know! But hopefully it's enough to mitigate those 403s a little bit.

OperationalDev · 2022-10-19T01:46:13Z

Ok ok, sorry I misunderstood, thank you for clarifying.

alexsnaps · 2022-10-27T13:46:07Z

While not completely overlapping, it might be nice to consider this in the context of this issue: Kuadrant/kuadrant-operator#96

Implements health and readiness probe endpoints for the controllers, reporting particularly the aggregated state of the AuthConfigs.

New endpoints:
- `/healthz`: Health probe (ping)
- `/readyz`: Aggregated readiness probe (only AuthConfig reconciler currently reporting)
- `/readyz/authconfigs`: Aggregated status of the AuthConfigs

The default binding network address is `:8081`. It can be changed using the newly introduced flag (command-line arg) `--health-probe-addr`.

The endpoints return either `200` ("ok") or `500` when 1+ probes fail.

The query string parameters `verbose=true` and `exclude=authconfigs` are supported, respectively, to provide more verbose responses and to exclude a particular probe ("authconfigs" in the example provided).

Closes #355
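As a rough illustration of how a client might query the readiness endpoint described above, here is a minimal Python sketch. The helper names `probe_url` and `check` and the `http://localhost:8081` base URL are assumptions for illustration only; the `/readyz` path, the `verbose` and `exclude` query parameters, and the 200/500 status codes come from the description above.

```python
import urllib.error
import urllib.parse
import urllib.request


def probe_url(base, path="/readyz", verbose=False, exclude=None):
    """Build a probe URL such as http://localhost:8081/readyz?verbose=true."""
    params = {}
    if verbose:
        params["verbose"] = "true"
    if exclude:
        params["exclude"] = exclude
    query = urllib.parse.urlencode(params)
    url = base.rstrip("/") + path
    return url + "?" + query if query else url


def check(url, timeout=2.0):
    """Return (ok, body): ok is True only for an HTTP 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, resp.read().decode()
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 500 when 1+ probes fail) land here.
        return False, err.read().decode()
    except (urllib.error.URLError, OSError):
        # The probe address could not be reached at all.
        return False, ""


if __name__ == "__main__":
    # Assumes the probe address :8081 is reachable on localhost,
    # for example through a port-forward to one Authorino pod.
    ok, body = check(probe_url("http://localhost:8081", verbose=True))
    print("ready" if ok else "not ready", body)
```

Unlike the status block of the AuthConfig CRs, a check like this runs against one specific pod, which is what a per-pod readiness probe needs.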

OperationalDev · 2022-12-13T07:38:03Z

Just wanted to say thanks for the quick turn around time on this, working as expected.

alexsnaps added the kind/enhancement (New feature or request) and target/current labels Oct 27, 2022

alexsnaps mentioned this issue Nov 8, 2022

Spec the status reporting behaviour Kuadrant/kuadrant-operator#96

Closed

4 tasks

guicassolato mentioned this issue Nov 18, 2022

Readiness probe #365

Merged

guicassolato closed this as completed in #365 Nov 25, 2022

guicassolato mentioned this issue Dec 14, 2022

Configure Authorino using command-line flags Kuadrant/authorino-operator#103

Merged

3 tasks
