Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Having healthz endpoint with appHealthCheck false #7355

Closed
shivamkm07 opened this issue Jan 9, 2024 · 12 comments · Fixed by #7394
Closed

Having healthz endpoint with appHealthCheck false #7355

shivamkm07 opened this issue Jan 9, 2024 · 12 comments · Fixed by #7394
Assignees
Labels
kind/bug Something isn't working P0
Milestone

Comments

@shivamkm07
Copy link
Contributor

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

What version of Dapr?

edge: output of git describe --dirty

Expected Behavior

If app health check is disabled, it shouldn't be required for applications to implement healthz endpoint.

Actual Behavior

Even with app health disabled, If app doesn't implement healthz, actors(and so Dapr workflows) don't work. This happens because placement detects app as unhealthy and disconnects with log Disconnecting from placement service by the unhealthy app.

Steps to Reproduce the Problem

  1. Run tests/integration/suite/actors/healthz/deactivateOnPlacementFail test with healthz endpoint commented out:
    r.Get("/healthz", func(w http.ResponseWriter, r *http.Request) {
  2. It would fail even when app health check is false

Release Note

RELEASE NOTE:

@shivamkm07 shivamkm07 added the kind/bug Something isn't working label Jan 9, 2024
@ItalyPaleAle
Copy link
Contributor

Just some additional context. Actors health checks were implemented long time before app health checks. There are also some oddities, such as the fact that if there's no app channel, or the app channel is gRPC, then actors work without health checks too (this can happen for apps that use workflow).

Also worth pointing out that all Actors SDKs today implement /healthz automatically.

@mukundansundar
Copy link
Contributor

Also worth pointing out that all Actors SDKs today implement /healthz automatically.

Currently this situation is observed, when workflows are used, and no actors are used. So it seems that workflow SDKs potentially do not implement the /healthz endpoint ....

@ItalyPaleAle
Copy link
Contributor

That's correct, Workflow SDKs do not implement /healthz.

If the app doesn't have an app-channel, or if the app-channel's protocol is gRPC, then Workflow will work without /healthz

However if the app has a HTTP app channel and uses Workflow, but no Actors, then in that case /healthz needs to be implemented

It's important to note that this is something that's part of Dapr Actors and it's completely unrelated from app health checks. They just happen to shrae the same endpoint.

@shivamkm07
Copy link
Contributor Author

That's correct, Workflow SDKs do not implement /healthz.

If the app doesn't have an app-channel, or if the app-channel's protocol is gRPC, then Workflow will work without /healthz

However if the app has a HTTP app channel and uses Workflow, but no Actors, then in that case /healthz needs to be implemented

It's important to note that this is something that's part of Dapr Actors and it's completely unrelated from app health checks. They just happen to shrae the same endpoint.

So is it a breaking change then? There can be users who are using workflow without using actors and http app channel right? For actors though actors SDK implement it, should we be checking app Health if healthCheck is disabled?

@JoshVanL
Copy link
Contributor

I can confirm this is a breaking change as this is causing the version skew integration tests to fail.
https://github.com/dapr/dapr/pull/7368/files

@JoshVanL JoshVanL added this to the v1.13 milestone Jan 16, 2024
@JoshVanL JoshVanL added the P0 label Jan 16, 2024
@ItalyPaleAle
Copy link
Contributor

One possibility to fix this without breaking changes is to add a CLI flag like --disable-actor-healthchecks (false by default) which allows actors to work without a /healthz endpoint when the app channel is configured as HTTP.

@yaron2
Copy link
Member

yaron2 commented Jan 16, 2024

One possibility to fix this without breaking changes is to add a CLI flag like --disable-actor-healthchecks (false by default) which allows actors to work without a /healthz endpoint when the app channel is configured as HTTP.

Will this mean that users who just want to use workflows and pubsub/service invocation for example and nothing else need to configure this flag? If so, that's a less than ideal user experience because it would require them to take a special action that is easy to miss

@ItalyPaleAle
Copy link
Contributor

Will this mean that users who just want to use workflows and pubsub/service invocation for example and nothing else need to configure this flag?

Yes. The flag will be needed when (and only when) all these 3 are true:

  • App uses workflow
  • There's an app channel configured
  • App channel uses HTTP

So that does include your case of "workflow + pubsub / service invocation" (as long as it's HTTP for app channel).


Another option could be to disable actor healthchecks if the app doesn't explicitly list actors in /dapr/config. This means the app could still be an actor host, but only for internal actors (which as of today includes workflow only)?

@yaron2
Copy link
Member

yaron2 commented Jan 16, 2024

Another option could be to disable actor healthchecks if the app doesn't explicitly list actors in /dapr/config. This means the app could still be an actor host, but only for internal actors (which as of today includes workflow only)?

That's a much better option

@ItalyPaleAle
Copy link
Contributor

Another option could be to disable actor healthchecks if the app doesn't explicitly list actors in /dapr/config. This means the app could still be an actor host, but only for internal actors (which as of today includes workflow only)?

That's a much better option

Ok. It should be doable.

If we're going to merge #7022 however let's get that in first please!

@ItalyPaleAle
Copy link
Contributor

Another option could be to disable actor healthchecks if the app doesn't explicitly list actors in /dapr/config. This means the app could still be an actor host, but only for internal actors (which as of today includes workflow only)?

Looks like this behavior was already, incidentally, implemented in master with some recent commit. So we should be all good already.

@shivamkm07 could you verify with the master builds if everything works?

Josh is adding ITs but some user validation would help too

@shivamkm07
Copy link
Contributor Author

Another option could be to disable actor healthchecks if the app doesn't explicitly list actors in /dapr/config. This means the app could still be an actor host, but only for internal actors (which as of today includes workflow only)?

Looks like this behavior was already, incidentally, implemented in master with some recent commit. So we should be all good already.

@shivamkm07 could you verify with the master builds if everything works?

Josh is adding ITs but some user validation would help too

I tested this locally and verified now workflow runs fine without requiring app to implement healthz endpoint.
Since it's not a breaking change now, created this PR #7405 to remove the patch that was added to fix for this scenario. version skew e2e is passing in the PR for workflows (for dapr-sidecar-master mode, for which the patch was added).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working P0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants