registry/health: adding healthcheck package #230

Merged
merged 1 commit into from
Mar 21, 2015

Conversation

diogomonica
Contributor

Summary

Package health provides a generic health-checking framework. The health package works expvar-style: importing the package adds a /debug/health endpoint to the debug server that returns the current status of the application. If there are no errors, /debug/health returns an HTTP 200 status together with an empty JSON reply, {}. If any checks have errors, the JSON reply includes all the failed checks and the response has an HTTP 500 status.

A Check can be run either synchronously or asynchronously. We recommend registering most checks as asynchronous checks, so that a call to the /debug/health endpoint always returns immediately. This pattern is particularly useful for checks that verify upstream connectivity or database status, since they might take a long time to return or time out.
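
The endpoint behaviour described above can be illustrated with the standard library alone. This is a minimal sketch, not the package's implementation: registry, checkAll, statusHandler and probe are hypothetical names, and the 500 status simply follows the summary above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// registry is a hypothetical stand-in for the package's table of checks.
type registry struct {
	mu     sync.RWMutex
	checks map[string]func() error
}

func newRegistry() *registry {
	return &registry{checks: make(map[string]func() error)}
}

// register adds or replaces a named check; it writes, so it takes the write lock.
func (r *registry) register(name string, check func() error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.checks[name] = check
}

// checkAll runs every check and returns only the failures, keyed by name.
func (r *registry) checkAll() map[string]string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	failed := make(map[string]string)
	for name, check := range r.checks {
		if err := check(); err != nil {
			failed[name] = err.Error()
		}
	}
	return failed
}

// statusHandler replies 200 with {} when healthy, or 500 with the failed checks.
func (r *registry) statusHandler(w http.ResponseWriter, req *http.Request) {
	failed := r.checkAll()
	w.Header().Set("Content-Type", "application/json")
	if len(failed) > 0 {
		w.WriteHeader(http.StatusInternalServerError)
	}
	json.NewEncoder(w).Encode(failed)
}

// probe calls the handler in-process and returns the status code and body.
func probe(r *registry) (int, string) {
	rec := httptest.NewRecorder()
	r.statusHandler(rec, httptest.NewRequest("GET", "/debug/health", nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

// alwaysFails is a sample check that never passes.
func alwaysFails() error { return errors.New("boom") }

func main() {
	r := newRegistry()
	fmt.Println(probe(r)) // healthy: no checks registered
	r.register("always_fails", alwaysFails)
	fmt.Println(probe(r)) // unhealthy: one failing check
}
```

Note that this sketch runs every check synchronously inside the request, which is exactly the situation the recommendation about asynchronous checks is meant to avoid for slow checks.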

Installing

To install health, just import it in your application:

import "github.com/docker/distribution/health"

You can also optionally import health/api, which adds two convenience endpoints: /debug/health/down and /debug/health/up. These endpoints add "manual" checks that allow the service to be taken out of, or brought back into, rotation quickly.

import _ "github.com/docker/distribution/health/api"

# curl localhost:5001/debug/health
{}
# curl -X POST localhost:5001/debug/health/down
# curl localhost:5001/debug/health
{"manual_http_status":"Manual Check"}
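
One way such manual endpoints could work is sketched below. It is an illustration under assumptions, not the health/api code: manualStatus, downHandler, upHandler and call are hypothetical names, and answering 404 to non-POST requests is inferred from the handler test quoted later in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// manualStatus is the state flipped by the two endpoints; nil means "in rotation".
type manualStatus struct {
	mu  sync.Mutex
	err error
}

func (m *manualStatus) set(err error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.err = err
}

// Check reports the manual status so it can be registered like any other check.
func (m *manualStatus) Check() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.err
}

// downHandler marks the service unhealthy on POST and answers 404 otherwise.
func (m *manualStatus) downHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.NotFound(w, r)
		return
	}
	m.set(errors.New("Manual Check"))
}

// upHandler clears the manual failure on POST and answers 404 otherwise.
func (m *manualStatus) upHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.NotFound(w, r)
		return
	}
	m.set(nil)
}

// call invokes a handler in-process and returns the response status code.
func call(h http.HandlerFunc, method, path string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	m := &manualStatus{}
	fmt.Println(call(m.downHandler, "POST", "/debug/health/down"), m.Check())
	fmt.Println(call(m.upHandler, "POST", "/debug/health/up"), m.Check())
}
```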

After importing these packages to your main application, you can start registering checks.

Registering Checks

The recommended way of registering checks is using a periodic Check. PeriodicChecks run on a certain schedule and asynchronously update the status of the check. This allows CheckStatus() to return without blocking on an expensive check.

A trivial example of a check that runs every 5 seconds and marks our server as unhealthy whenever the current minute is even could be added as follows:

 func currentMinuteEvenCheck() error {
   m := time.Now().Minute()
   if m%2 == 0 {
     return errors.New("Current minute is even!")
   }
   return nil
 }

 health.RegisterPeriodicFunc("minute_even", currentMinuteEvenCheck, time.Second*5)
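
To make the "asynchronously update the status" idea concrete, here is a rough sketch of how a periodic check could be built from a ticker and a cached result. The names periodicCheck, newPeriodicCheck and waitForFailure are hypothetical and the package's own implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// periodicCheck caches the most recent result of a check function.
type periodicCheck struct {
	mu   sync.Mutex
	last error
}

// newPeriodicCheck runs check once per period in a background goroutine
// until stop is closed, storing each result for later reads.
func newPeriodicCheck(check func() error, period time.Duration, stop <-chan struct{}) *periodicCheck {
	p := &periodicCheck{}
	go func() {
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				err := check() // the slow part happens here, off the request path
				p.mu.Lock()
				p.last = err
				p.mu.Unlock()
			case <-stop:
				return
			}
		}
	}()
	return p
}

// Check returns the cached result without running the underlying check.
func (p *periodicCheck) Check() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.last
}

// alwaysDown is a sample check that always fails.
func alwaysDown() error { return errors.New("down") }

// waitForFailure polls the cached status until it is non-nil or the timeout expires.
func waitForFailure(p *periodicCheck, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if err := p.Check(); err != nil {
			return err
		}
		time.Sleep(time.Millisecond)
	}
	return nil
}

func main() {
	stop := make(chan struct{})
	defer close(stop)
	p := newPeriodicCheck(alwaysDown, 10*time.Millisecond, stop)
	fmt.Println(waitForFailure(p, time.Second))
}
```

In this sketch the status is nil (healthy) until the first tick has run, which is a design choice of the illustration rather than a documented property of the package.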

Alternatively, you can also make use of RegisterPeriodicThresholdFunc to implement the exact same check, but add a threshold of failures after which the check will be unhealthy. This is particularly useful for flaky Checks, ensuring some stability of the service when handling them.

health.RegisterPeriodicThresholdFunc("minute_even", currentMinuteEvenCheck, time.Second*5, 4)
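
The threshold behaviour can be sketched as a counter of consecutive failures. The code below is an illustration only: thresholdCheck and scripted are hypothetical names, and it assumes the check turns unhealthy once the number of failures in a row reaches the threshold and recovers on the first success.

```go
package main

import (
	"errors"
	"fmt"
)

// thresholdCheck reports failure only after `threshold` consecutive failures.
// It is not safe for concurrent use; a real implementation would guard the
// counter with a mutex.
type thresholdCheck struct {
	check     func() error
	threshold int
	failures  int
}

// Check runs the wrapped check once. A success resets the failure count;
// an error is surfaced only once the count reaches the threshold.
func (t *thresholdCheck) Check() error {
	err := t.check()
	if err == nil {
		t.failures = 0
		return nil
	}
	t.failures++
	if t.failures >= t.threshold {
		return err
	}
	return nil
}

var errFlaky = errors.New("upstream unreachable")

// scripted returns a check that replays the given results in order, then succeeds.
func scripted(results ...error) func() error {
	i := 0
	return func() error {
		if i >= len(results) {
			return nil
		}
		err := results[i]
		i++
		return err
	}
}

func main() {
	// One isolated failure is absorbed; two in a row trip the check.
	c := &thresholdCheck{check: scripted(errFlaky, nil, errFlaky, errFlaky), threshold: 2}
	for i := 0; i < 4; i++ {
		fmt.Println(c.Check())
	}
}
```

With a threshold of 2, only the fourth call in main reports an error, because the success on the second call resets the count.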

The lowest-level way to interact with the health package is to call Register directly. Register takes an arbitrary name and a value that implements Checker and runs your check. If your method returns a nil error, the check is considered healthy; otherwise the health check endpoint /debug/health starts returning a 500 and lists the specific check that failed.

Assuming you wish to register a function with the signature currentMinuteEvenCheck() error, you could do so as follows:

health.Register("even_minute", health.CheckFunc(currentMinuteEvenCheck))

CheckFunc is a convenience type that implements Checker.
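
The adapter pattern behind a type like CheckFunc is the same one net/http uses for http.HandlerFunc. The following sketch shows the idea; the definitions here are written for illustration and describe, healthy and unhealthy are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Checker is anything that can report health as an error (nil means healthy).
type Checker interface {
	Check() error
}

// CheckFunc adapts a plain func() error to the Checker interface.
type CheckFunc func() error

// Check calls the underlying function.
func (f CheckFunc) Check() error { return f() }

var errDown = errors.New("down")

func healthy() error   { return nil }
func unhealthy() error { return errDown }

// describe accepts any Checker and renders its current state.
func describe(c Checker) string {
	if err := c.Check(); err != nil {
		return "unhealthy: " + err.Error()
	}
	return "healthy"
}

func main() {
	// The conversion CheckFunc(fn) is all that is needed to satisfy Checker.
	fmt.Println(describe(CheckFunc(healthy)))
	fmt.Println(describe(CheckFunc(unhealthy)))
}
```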

Another way of registering a check could be by using an anonymous function and the convenience method RegisterFunc. An example that makes the status endpoint always return an error:

 health.RegisterFunc("my_check", func() error {
   return errors.New("This is an error!")
 })

Examples

You could also use the health checker mechanism to ensure your application only comes up if certain conditions are met, or to allow the developer to take the service out of rotation immediately. An example that checks database connectivity and immediately takes the server out of rotation on err:

 updater := health.NewStatusUpdater()
 health.RegisterFunc("database_check", func() error {
   return updater.Check()
 })

 conn, err := Connect(...) // database call here
 if err != nil {
   updater.Update(errors.New("Error connecting to the database: " + err.Error()))
 }
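
A status updater of this kind is essentially a mutex-guarded error value that the application pushes into and the health endpoint reads from. The sketch below assumes that shape; statusUpdater and errDBDown are hypothetical names and not the package's types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// statusUpdater holds the latest status pushed by the application.
type statusUpdater struct {
	mu     sync.Mutex
	status error
}

// Update records a new status; nil marks the check healthy again.
func (u *statusUpdater) Update(status error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.status = status
}

// Check returns the most recently recorded status without doing any work.
func (u *statusUpdater) Check() error {
	u.mu.Lock()
	defer u.mu.Unlock()
	return u.status
}

var errDBDown = errors.New("error connecting to the database")

func main() {
	u := &statusUpdater{}
	fmt.Println(u.Check()) // healthy until told otherwise
	u.Update(errDBDown)
	fmt.Println(u.Check()) // out of rotation immediately
	u.Update(nil)
	fmt.Println(u.Check()) // back in rotation
}
```

Because Check only reads a stored value, the effect of Update is visible on the very next request to the health endpoint.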

You can also use the predefined Checkers that come included with the health package. First, import the checks:

import "github.com/docker/distribution/health/checks"

After that you can make use of any of the provided checks. An example of using a FileChecker to take the application out of rotation if a certain file exists can be done as follows:

health.Register("fileChecker", health.PeriodicChecker(checks.FileChecker("/tmp/disable"), time.Second*5))

After registering the check, it is trivial to take an application out of rotation from the console:

# curl localhost:5001/debug/health
{}
# touch /tmp/disable
# curl localhost:5001/debug/health
{"fileChecker":"file exists"}
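
A file-based check like this needs little more than os.Stat. The sketch below is an assumption about how such a check could look, with fileCheck and touchTemp as hypothetical names; the "file exists" message mirrors the output shown above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// fileCheck returns a check that fails while the file at path exists.
// Any stat error (missing file, no permission) is treated as "not present".
func fileCheck(path string) func() error {
	return func() error {
		if _, err := os.Stat(path); err == nil {
			return errors.New("file exists")
		}
		return nil
	}
}

// touchTemp creates an empty file in a fresh temporary directory and returns its path.
func touchTemp() (string, error) {
	dir, err := os.MkdirTemp("", "health")
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, "disable")
	return path, os.WriteFile(path, nil, 0o600)
}

func main() {
	path, err := touchTemp()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	check := fileCheck(path)
	fmt.Println(check()) // the marker file is present
	os.RemoveAll(filepath.Dir(path))
	fmt.Println(check()) // the marker file is gone
}
```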

You could also test connectivity to a downstream service by using an HTTPChecker, marking the check unhealthy only after a minimum of two failures in a row:

health.Register("httpChecker", health.PeriodicThresholdChecker(checks.HTTPChecker("https://www.google.pt"), time.Second*5, 2))
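
An HTTP check along these lines can be sketched as a HEAD request followed by a status comparison. This is an illustration, not the package's HTTPChecker: httpCheck and stubServer are hypothetical names, and both the 2 second timeout and the "anything other than 200 is a failure" rule are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// httpCheck returns a check that sends a HEAD request to url and fails on
// transport errors or on any status other than 200 OK.
func httpCheck(url string) func() error {
	client := &http.Client{Timeout: 2 * time.Second} // assumed timeout
	return func() error {
		resp, err := client.Head(url)
		if err != nil {
			return fmt.Errorf("error while checking %s: %w", url, err)
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unexpected status from %s: %d", url, resp.StatusCode)
		}
		return nil
	}
}

// stubServer starts a local test server that always answers with the given status.
func stubServer(status int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
}

func main() {
	up := stubServer(http.StatusOK)
	defer up.Close()
	down := stubServer(http.StatusServiceUnavailable)
	defer down.Close()

	fmt.Println(httpCheck(up.URL)())
	fmt.Println(httpCheck(down.URL)())
}
```

Wrapping such a check in a threshold, as in the registration above, keeps a single slow or dropped request from taking the service out of rotation.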

@diogomonica
Contributor Author

// Represents the possible server states based on the currently recorded
// healthchecks.
const (
StatusOK = "StatusOK"
Collaborator

These string values can just be "ok", "warning" and "error".

@stevvooe
Collaborator

stevvooe commented Mar 3, 2015

Let's move this package into distribution/health. I want to avoid having registry-specific functionality in this package.

@stevvooe stevvooe changed the title from "Adding healthcheck" to "registry/health: adding healthcheck package" Mar 3, 2015
)

// Status represents a named status check and it's current status.
type Status struct {
Collaborator

Was this structure going to have a "meta" field, as proposed by #133? Or, did we find that information should be a part of expvar?

@diogomonica diogomonica force-pushed the adding-healthcheck branch 2 times, most recently from ea601f3 to af88da8 Compare March 14, 2015 01:57
@diogomonica
Contributor Author

@stevvooe @NathanMcCauley would love another review.

@dmp42
Contributor

dmp42 commented Mar 14, 2015

cc @icecrime @aluzzardi @mavenugo @endophage because this kind of stuff has a broader interest.

@diogomonica diogomonica force-pushed the adding-healthcheck branch 2 times, most recently from 8e00f96 to b9cd974 Compare March 14, 2015 06:12
})
}

// HTTPChecker does a HEAD request and verifies if the HTTTP status

nit: s/HTTTP/HTTP/

@icecrime

Got a few comments, but overall I think it's cool 👍


DownHandler(recorder, req)

assert.Equal(t, recorder.Code, 404, "Code should be 404")
Collaborator

if record.Code != 404 {
// report error
}

@stevvooe
Collaborator

LGTM!

Nice work on this one.

// overwrites to a specific check status.
func Register(name string, check Checker) {
mutex.RLock()
defer mutex.RUnlock()
Contributor

You have to use mutex.Lock & mutex.Unlock as there is a write access 5 lines below.

Contributor Author

Good find. Fixed.

@stevvooe stevvooe modified the milestones: Registry/2.0.0-beta, Registry/2.0.0-rc Mar 18, 2015
StatusHandler(recorder, req)

if recorder.Code != 503 {
t.Errorf("Did not get a 500.")
Contributor

503 =)

@endophage

LGTM. Might be nice to be able to configure the checkers but that can be future work.

...
health:
  checkers:
    filechecker: [/path/one/, /path/two/]
    httpchecker: [http://foo.com, http://bar.com]
...

@diogomonica
Contributor Author

@endophage white-belt changes?

@endophage

Sounds good to me

Added a expvar style handler for the debug http server to allow health checks (/debug/health).

Signed-off-by: Diogo Monica <diogo@docker.com>
stevvooe added a commit that referenced this pull request Mar 21, 2015
registry/health: adding healthcheck package
@stevvooe stevvooe merged commit 15a7926 into distribution:master Mar 21, 2015
@stevvooe stevvooe modified the milestones: Registry/2.0.0-beta, Registry/2.0 Mar 31, 2015
@aluzzardi

ping @vieux @abronan

Is this something we could use?

@stevvooe
Collaborator

stevvooe commented Apr 1, 2015

@aluzzardi Please! Let us know if you need help!

9 participants