Implement /readyz and /livez endpoints #3949

Closed
3 tasks done
LesnyRumcajs opened this issue Feb 12, 2024 · 0 comments · Fixed by #4156
Labels
Ready (Issue is ready for work and anyone can freely assign it to themselves), RPC

Comments

@LesnyRumcajs
Member

LesnyRumcajs commented Feb 12, 2024

Issue summary

To make operating Forest easier, there should be an endpoint that clearly reports whether Forest is healthy, alive, and ready to serve RPC requests.

In the Kubernetes world (subjectively, a good source of best practices for managing services), there is the notion of liveness and readiness probes. There is also a health probe, which is deprecated, but we could still provide it.

The goal is for the orchestrator, be it Kubernetes, Docker Compose, or any custom contraption, to know whether the service is up or down (and should be restarted) and whether it is ready to serve requests (for load-balancing purposes).

Task summary

  • Implement the /livez endpoint. It should return 200 if the node is live. The endpoint should accept a ?verbose query parameter, which makes it list the checks that were made. Sample checks:
    • Prometheus server is up,
    • connected to at least 1 other peer,
    • other tasks in the main loop have started.
  • Implement the /readyz endpoint. It should return 200 if the node is ready to serve RPC requests (note that the checks may differ for an offline node). The endpoint should accept a ?verbose query parameter, which makes it list the checks that were made. Sample checks:
    • RPC server is up and responding,
    • (online node only) the node is not too far behind the expected time, e.g., genesis_time + block_interval * epoch + epsilon >= current_timestamp, where epoch is the node's current head epoch and epsilon is an arbitrarily chosen grace period, for example, six block intervals. This check may need to be disabled for devnets.
    • (online node only) the node is in follow mode.
  • Implement forest-cli node info node-ready|node-live subcommands to wrap the above.
  • Add these checks to some of the existing tests.
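The online-node lag check can be written as a small pure function. This is a minimal sketch under stated assumptions, not Forest's implementation: the function name `is_epoch_up_to_date` and its parameters are hypothetical, all timestamps are Unix seconds, and the grace period is given as a number of epochs (six, as suggested above) that is converted to seconds. It tests `genesis_timestamp + block_delay_secs * head_epoch + epsilon >= now`, i.e. that the expected timestamp of the node's head epoch is no more than `epsilon` seconds in the past.

```rust
/// Hypothetical readiness helper: true if the node's head epoch is recent enough.
/// All timestamps are Unix seconds; the tolerance is expressed in epochs.
fn is_epoch_up_to_date(
    genesis_timestamp: u64,
    block_delay_secs: u64,
    head_epoch: u64,
    tolerance_epochs: u64,
    now: u64,
) -> bool {
    // Wall-clock time at which the head epoch was expected to be produced.
    let head_timestamp =
        genesis_timestamp.saturating_add(block_delay_secs.saturating_mul(head_epoch));
    // Grace period (epsilon) converted from epochs to seconds.
    let epsilon = block_delay_secs.saturating_mul(tolerance_epochs);
    // Ready if the head is at most `epsilon` seconds behind the current time.
    head_timestamp.saturating_add(epsilon) >= now
}

fn main() {
    // Hypothetical values: 30-second blocks (the Filecoin mainnet block time),
    // genesis at t = 1_000_000, wall clock corresponding to epoch 1000.
    let genesis = 1_000_000;
    let block_delay = 30;
    let now = genesis + block_delay * 1_000;

    // Five epochs behind: within the six-epoch grace period.
    assert!(is_epoch_up_to_date(genesis, block_delay, 995, 6, now));
    // Ten epochs behind: the node should not report itself as ready.
    assert!(!is_epoch_up_to_date(genesis, block_delay, 990, 6, now));
    println!("epoch checks passed");
}
```

Expressing the grace period in epochs rather than seconds keeps it meaningful on networks with a different block time; the saturating arithmetic only guards against overflow on nonsensical inputs.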

Feel free to come up with more checks. They don't necessarily have to be implemented (at least, not all of them); these may come as follow-up tasks.
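Keeping the set of checks open-ended, together with the ?verbose behavior described for both endpoints, could be organized as a list of named checks evaluated per endpoint. The sketch below is a minimal, std-only illustration under assumptions: the `Check` type, the `evaluate` and `handle` functions, the check names, and the choice of 503 for a failing probe are all hypothetical, and the verbose `[+]name ok` / `[-]name failed` lines are only loosely modeled on the Kubernetes API server's output, not taken from Forest.

```rust
/// A named health check. In a real node each closure would probe a subsystem;
/// the checks used in `main` are hypothetical stand-ins.
struct Check {
    name: &'static str,
    run: Box<dyn Fn() -> bool>,
}

/// Runs every check and returns (status code, body):
/// 200 if all checks pass, 503 otherwise. In verbose mode the body lists
/// each check; otherwise it is a single "ok" or "failed" line.
fn evaluate(checks: &[Check], verbose: bool) -> (u16, String) {
    let mut all_ok = true;
    let mut body = String::new();
    for check in checks {
        let ok = (check.run)();
        all_ok &= ok;
        if verbose {
            let marker = if ok { "[+]" } else { "[-]" };
            let outcome = if ok { "ok" } else { "failed" };
            body.push_str(&format!("{marker}{} {outcome}\n", check.name));
        }
    }
    if !verbose {
        body.push_str(if all_ok { "ok\n" } else { "failed\n" });
    }
    let status = if all_ok { 200 } else { 503 };
    (status, body)
}

/// Dispatches a request target such as "/livez?verbose" to the matching check set.
/// The mere presence of a `verbose` query parameter enables verbose output.
fn handle(target: &str, live: &[Check], ready: &[Check]) -> (u16, String) {
    let (route, query) = target.split_once('?').unwrap_or((target, ""));
    let verbose = query
        .split('&')
        .any(|p| p == "verbose" || p.starts_with("verbose="));
    match route {
        "/livez" => evaluate(live, verbose),
        "/readyz" => evaluate(ready, verbose),
        _ => (404, "not found\n".to_string()),
    }
}

fn main() {
    // Hypothetical liveness checks mirroring the sample list in the issue.
    let live = vec![
        Check { name: "prometheus_up", run: Box::new(|| true) },
        Check { name: "has_peers", run: Box::new(|| true) },
    ];
    // Hypothetical readiness checks; the lag check is hard-coded to fail here.
    let ready = vec![
        Check { name: "rpc_up", run: Box::new(|| true) },
        Check { name: "epoch_up_to_date", run: Box::new(|| false) },
    ];
    for target in ["/livez", "/readyz?verbose"] {
        let (status, body) = handle(target, &live, &ready);
        println!("{target} -> {status}\n{body}");
    }
}
```

Kubernetes HTTP probes treat any status code from 200 up to (but not including) 400 as success, so 503 is one reasonable failure code. With this layout, adding a follow-up check is a one-line change to the relevant list.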

Other information and links
https://kubernetes.io/docs/reference/using-api/health-checks/#api-endpoints-for-health

LesnyRumcajs added the RPC and Ready labels Feb 12, 2024
ruseinov self-assigned this and then unassigned themselves Feb 12, 2024
LesnyRumcajs self-assigned this Mar 14, 2024

2 participants