Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include failing/timed out check name in health/readiness check response #638

Merged
merged 1 commit into from
May 8, 2020

Conversation

longshorej
Copy link
Contributor

@longshorej longshorej commented Feb 5, 2020

Include failing check name in health/readiness check messages/responses

This modifies the readiness/health check internals to ensure that when a check fails, for any reason, that the check's class name is included in the message to aid in debugging. Additionally, timeouts are now tracked on a per-check basis so that it's clear which check timed out.

I haven't updated the tests yet -- if I could get some validation/assurance that such a chance would be accepted, then I'd be happy to get the tests passing.

Ready for a review.

Example failure:

$ curl  192.168.1.20:8558/ready

Health Check Failed: Check [com.myco.GrpcReadinessCheck] failed: UNAVAILABLE: io exception

@octonato
Copy link
Member

octonato commented Feb 6, 2020

I like the idea and I don't see a reason for not improving the message.

@longshorej longshorej force-pushed the health-name branch 2 times, most recently from a0b17d9 to 528ad78 Compare February 6, 2020 17:50
@longshorej longshorej changed the title WIP: Include failing check name in health/readiness check response Include failing/timed out check name in health/readiness check response Feb 6, 2020
@longshorej
Copy link
Contributor Author

Cool, thanks @renatocaval. This is ready for a review.

@@ -33,14 +33,14 @@ abstract class HealthChecks {
/**
* Returns Future(true) if the system is ready to receive user traffic
*/
def ready(): Future[Boolean]
def ready(): Future[Either[String, Unit]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the public API in a non-compatible way.

We'd have to add new methods and delegate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that should be simple enough, I'll update when I have some time.

Copy link
Member

@patriknw patriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we solve this with additional or improved logging instead of changing API?

@@ -30,16 +29,16 @@ final class HealthChecks(system: ExtendedActorSystem, settings: HealthCheckSetti
/**
* Returns CompletionStage(true) if the system is ready to receive user traffic
*/
def ready(): CompletionStage[java.lang.Boolean] =
delegate.ready().map(Boolean.box)(ExecutionContexts.sameThreadExecutionContext).toJava
def ready(): CompletionStage[Either[String, Unit]] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't use Either in javadsl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - ideas on what a good alternative would be for Java (if this PR goes forward)? Maybe Optional<String> (plus delegated methods requested by @chbatey)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a custom type is more idiomatic in Java

@longshorej
Copy link
Contributor Author

Can we solve this with additional or improved logging instead of changing API?

I'll look into it, but atm I'm not sure how unfortunately, as I think the majority of the utility is from having the messages directly in the HTTP response. Consider it may be a bit more difficult to obtain host logs than it is to hit an endpoint.

@chbatey
Copy link
Member

chbatey commented Mar 3, 2020

I agree it will be very useful to have this in the error message so you can see it from the place doing the pinging of the health checks. If we can do it in an API compatible way I am +1 to doing this

@longshorej
Copy link
Contributor Author

Okay, I've pushed an update to ensure the API remains compatible, let me know what y'all think. Thanks!

This modifies the readiness/health check internals to ensure that when a
check fails, for any reason, that the check's class name is included in
the message to aid in debugging. Additionally, timeouts are now tracked
on a per-check basis so that it's clear which check timed out.
/**
* Returns CompletionStage(true) if the system is ready to receive user traffic
*/
def ready(): CompletionStage[java.lang.Boolean] =
delegate.ready().map(Boolean.box)(ExecutionContexts.sameThreadExecutionContext).toJava
readyResult().thenApply(((r: CheckResult) => r.isSuccess).asJava)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally had .thenApply(_.iSuccess).asJava but that failed to compile on 2.11:

[error] /Users/jlongshore/work/tubi/akka-management/management/src/main/scala/akka/management/javadsl/HealthChecks.scala:38:29: missing parameter type for expanded function ((x$2) => x$2.isSuccess)
[error]     readyResult().thenApply(_.isSuccess)
[error]                             ^
[error] /Users/jlongshore/work/tubi/akka-management/management/src/main/scala/akka/management/javadsl/HealthChecks.scala:52:29: missing parameter type for expanded function ((x$4) => x$4.isSuccess)
[error]     aliveResult().thenApply(_.isSuccess)
[error]                             ^
[error] two errors found

@longshorej
Copy link
Contributor Author

Hi @patriknw and @chbatey - do you think you'll have some time to look this over in the coming weeks?

Copy link
Member

@chbatey chbatey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for updating and sorry for the slow reviews

Copy link
Member

@ennru ennru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ennru ennru merged commit f0da124 into akka:master May 8, 2020
@ennru ennru added this to the 1.0.7 milestone May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants