-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce "cilium-health", a new tool for investigating cluster connectivity issues. #2052
Conversation
pkg/health/server/server.go
Outdated
if err := pinger.Run(); err != nil { | ||
log.WithError(err).Info("Failed to run ping") | ||
return nil, err | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary)
pkg/health/server/server.go
Outdated
} | ||
} | ||
|
||
// FetchStatusResponse() updates the cluster with the latest set of nodes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported method Server.FetchStatusResponse should be of the form "FetchStatusResponse ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far!
7194ec9
to
cb3b955
Compare
pkg/health/client/client.go
Outdated
"github.com/go-openapi/strfmt" | ||
) | ||
|
||
type Client struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported type Client should have comment or be unexported
pkg/client/client.go
Outdated
@@ -101,3 +103,54 @@ func Hint(err error) error { | |||
} | |||
return fmt.Errorf("%s", e) | |||
} | |||
|
|||
func FormatStatusResponse(w io.Writer, sr *models.StatusResponse) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported function FormatStatusResponse should have comment or be unexported
cfd9d94
to
8858645
Compare
api/v1/health/openapi.yaml
Outdated
"$ref": "../openapi.yaml#/definitions/Error" | ||
"/status/probe": | ||
put: | ||
summary: Get connectivity status of the Cilium cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The summary should already indicate that it is synchronous
Run synchronous connectivity probe to determine status of Cilium cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, will fix.
properties: | ||
cilium: | ||
description: Status of Cilium daemon | ||
"$ref": "../openapi.yaml#/definitions/StatusResponse" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have a StatusResponse
in both yaml files? Can we rename one of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, renamed the health status response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall - the size made it hard to fully grok some parts, but that is the nature of bootstrapping new components :)
Some small nits. You don't have to request another review from me.
pkg/health/server/prober.go
Outdated
} | ||
} | ||
|
||
// Run sends a single probes out to all of the other cilium nodes to gather |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... a single probe...
cilium --> Cilium
pkg/launcher/launcher.go
Outdated
launcher.setStdout(stdout) | ||
} | ||
|
||
// Restart stops the daemon whilauncher will trigger a rerun. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix typo "whilauncher"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking it over. I tried to split it up into reasonable separate patches for better review, but it is still quite large. I'll fix this up.
302d36b
to
a7b512c
Compare
// HealthStatusResponse Connectivity status to other daemons | ||
// swagger:model HealthStatusResponse | ||
|
||
type HealthStatusResponse struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported type HealthStatusResponse should have comment or be unexported
This should live in pkg/node. Signed-off-by: Joe Stringer <joe@covalent.io>
This makes it reusable from cilium-health client. Signed-off-by: Joe Stringer <joe@covalent.io>
Export a list of known cluster nodes, including names, IPs and CIDR allocations in the health status requested on a local node. Signed-off-by: Joe Stringer <joe@covalent.io>
This is only used by some commands right now, so leave it hidden (since it's a bit inconsistent). When each command implements json-formatting the responses, we can make this not hidden. Signed-off-by: Joe Stringer <joe@covalent.io>
Previously, the client package created the client by passing the entire "unix://foo" or "http://foo" host string into the hostname field of the client, however when creating a client using an "http://" host string this would result in HTTP requests attempting to reach "http://http://hostname". Signed-off-by: Joe Stringer <joe@covalent.io>
goprocinfo provides a wrapper around /proc on linux systems, which will be used to collect system load in an upcoming commit. Signed-off-by: Joe Stringer <joe@covalent.io>
GET /status: Returns the connectivity status to all other cilium-health instances using interval-based probing. GET /status/probe: Runs a synchronous probe to all other cilium-health instances and returns the connectivity status. GET /healthz: Returns current health: * uptime * system load * local agent status See also Issue cilium#1947 Signed-off-by: Joe Stringer <joe@covalent.io>
Signed-off-by: Joe Stringer <joe@covalent.io>
cilium-health is a new client and daemon which allows users to probe the status of the local node, and also of the connectivity between the local node and remote nodes in the cluster. Signed-off-by: Joe Stringer <joe@covalent.io>
Also add the transitive dependencies on golang.org ICMP/IPv4/IPv6. Signed-off-by: Joe Stringer <joe@covalent.io>
Signed-off-by: Joe Stringer <joe@covalent.io>
Add support to the cilium-health client and server to host a new /hello API call over TCP socket 4240, and be able to make requests to it. Pinging this port on all known cluster nodes is done periodically, just like how regular ICMP ping polling is done. Signed-off-by: Joe Stringer <joe@covalent.io>
Signed-off-by: Joe Stringer <joe@covalent.io>
A generic launcher will be useful for an upcoming commit where the Cilium daemon also launches the cilium-health daemon. Signed-off-by: Joe Stringer <joe@covalent.io>
Signed-off-by: Joe Stringer <joe@covalent.io>
The new '--admin' option restricts which protocols that the /healthz and /status resources are served over. If specified as 'unix', it will only serve these resources over the unix socket (if not otherwise disabled by --passive); if specified as 'any' (default), these resources will also be served over the HTTP sockets. Signed-off-by: Joe Stringer <joe@covalent.io>
a7b512c
to
92bb2ba
Compare
This series extends the Cilium API to provide access to known nodes in the cluster and their IPs via /healthz (The same as #2139), and adds a new program to the tree:
cilium-health
.cilium-health
can be run as a daemon, which will periodically probe other nodes in the cluster for connectivity, starting with basic ping (ICMP) access and HTTP probes to other nodes. By default, whencilium
is run, it will launchcilium-health
locally. The results of the probes will be cached and exposed over a REST API.cilium-health
can also be run as a commandline client for the daemon, to fetch the connectivity status and format it. Currently,cilium-health
exposes:GET /hello
GET /healthz
GET /status
PUT /status/probe
Sample output
Commandline options
Next Tasks:
cilium status
to print the new fields exposed over the Cilium API--passive
option for only responding to probes--admin
option to toggle serving all APIs / unix socketThe scope of this PR is primarily to establish groundwork of determining connectivity to other nodes. Endpoint connectivity and policy connectivity is likely to be addressed in a followup PR.
For more future tasks, see Issue #1947.