Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for API server health checks #21185

Merged
merged 2 commits into from Jul 27, 2020

Conversation

johscheuer
Copy link
Contributor

This PR adds some documentation to the different "health" endpoints of the Kubernetes API server.

Initial implementation PR: kubernetes/kubernetes#78458

Fixes: #20888

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 25, 2020
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels May 25, 2020
@netlify
Copy link

netlify bot commented May 25, 2020

Deploy preview for kubernetes-io-master-staging ready!

Built with commit f1588c0

https://deploy-preview-21185--kubernetes-io-master-staging.netlify.app

@sftim
Copy link
Contributor

sftim commented May 25, 2020

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 25, 2020
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @johscheuer

Here's some feedback. The main thing you absolutely do need to change is a couple of hyperlinks, you should make these relative links.

I've also commented on whether or not these health check endpoints are aimed at human or machine clients. My guess, without looking at any k/kubernetes source code, is that the HTTP response code can be relied on but the rest of the response is not meant for machine parsing. It'd be good to clarify that.

As well as that there's some style feedback, most of which is there in case it's useful. I've highlighted all the feedback which is purely suggestions and that it's 100% OK for you to stick with your original wording.

I hope this is useful; please feel free to ask questions.

@logicalhan did you want to add anything here?

content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
content/en/docs/reference/using-api/health-checks.md Outdated Show resolved Hide resolved
@johscheuer
Copy link
Contributor Author

Thanks for your time and your comments. I will go over them in the next days.

@jimangel
Copy link
Member

@johscheuer do you still have time to update your PR based on Tim's review?

@johscheuer
Copy link
Contributor Author

Thanks for the reminder. I will finish this today/tomorrow.

@logicalhan
Copy link
Member

logicalhan commented Jun 11, 2020

@logicalhan did you want to add anything here?

It looks like a reasonable starting point to me :)

I've also commented on whether or not these health check endpoints are aimed at human or machine clients. My guess, without looking at any k/kubernetes source code, is that the HTTP response code can be relied on but the rest of the response is not meant for machine parsing. It'd be good to clarify that.

Yeah that's about right. 200 means it's healthy/live/ready. This is going to be consumed mostly by the kubelet (or whatever thing you have that monitors your apiserver) which is going to decide based on the response code. The verbose flag is specifically intended to aid human debugging.

It is also probably worth documenting that individual health checks also expose http endpoints which are automatically exposed when the individual health check is registered to the corresponding health endpoint. In other words for each health check you see when you curl apiserver/livez:

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok

You are also provided apiserver/livez/<healthcheck-name> (which is the bit after the square brackets but before the ok). This should not be considered something you can depend on however and is exposed for human debugging. This is true for all health endpoints (i.e. livez,readyz,healthz).

For illustration, let's say apiserver is exposed on port 8080, then you can issue the following command:
curl localhost:8080/healthz/etcd

@johscheuer
Copy link
Contributor Author

@sftim i reworked the site and integrated your feedback. Could you review it again?

@sftim
Copy link
Contributor

sftim commented Jun 13, 2020

I'm afraid we removed capture shortcodes.

@johscheuer can you rebase and revise to match?

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor things

@sftim
Copy link
Contributor

sftim commented Jun 15, 2020

/hold
following #21185 (comment)

This does need a rebase.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 15, 2020
Each individual health check exposes an http endpoint and could can be checked individually.
The schema for the individual health checks is `/livez/<healthcheck-name>` where `livez` and `readyz` and be used to indicate if you want to check thee liveness or the readiness of the API server.
The `<healthcheck-name>` path can be discovered using the `verbose` flag from above and take the path between `[+]` and `ok`.
These individual health checks should not be consumed by machines but can be helpful for a human operator to debug a system:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explicitly call out that this is not a supported API (i.e. use these endpoints at your own discretion and we explicitly do not support forward API compatibility on these endpoints)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a feature state or similar to go with that @logicalhan? If it's viable to call them “alpha” then that's easy to do, the website even has a shortcode for that.

(If it's not tracked as a feature state, I feel that could be an issue against k/kubernetes so that at least the lack of tracking is visible).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not really a feature state.. these endpoints are actually pre-versioned, legacy endpoints which do not have the traditional API versioning guarantees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is closer: alpha (don't rely on these, a future release could remove them without warning) or GA (you'll get plenty of official notice about deprecation)?

It sounds like the former. Are they perhaps insta-deprecated? I'd like to have some way of relating the stability to https://kubernetes.io/docs/reference/using-api/deprecation-policy/ - even if that means amending the deprecation policy page!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alpha seems reasonable.

@sftim
Copy link
Contributor

sftim commented Jun 21, 2020

Rebased
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 21, 2020
@sftim
Copy link
Contributor

sftim commented Jun 21, 2020

@kubernetes/sig-api-machinery-pr-reviews do you have any more feedback on #21185 (comment) and the following comments?

@johscheuer
Copy link
Contributor Author

Is there anything I can help with?

@sftim
Copy link
Contributor

sftim commented Jun 29, 2020

@johscheuer do you perhaps have a suggestion for someone who'd be a good choice for technical reviewer? If so, we can ask them to take a look.

@sttts
Copy link
Contributor

sttts commented Jun 29, 2020

do you perhaps have a suggestion for someone who'd be a good choice for technical reviewer?

@logicalhan has certainly most insight into this.

@sftim
Copy link
Contributor

sftim commented Jun 29, 2020

@johscheuer please revise in line with #21185 (comment)

@zparnold
Copy link
Member

zparnold commented Jul 1, 2020

@johscheuer when you get a chance can you please look at this? ☝️

@johscheuer
Copy link
Contributor Author

I make the requested changes in the next few days!

@johscheuer
Copy link
Contributor Author

Sorry for the delay I added the feature state information.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 21, 2020
@jimangel
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 27, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jimangel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2020
@k8s-ci-robot k8s-ci-robot merged commit 5c27482 into kubernetes:master Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document API server /livez and /readyz
7 participants