Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healthcheck service #24817

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Healthcheck service #24817

wants to merge 14 commits into from

Conversation

vinzscam
Copy link
Member

@vinzscam vinzscam commented May 17, 2024

Hey, I just made a Pull Request!

This PR adds a new healthcheck service which adds a bunch of endpoints to the root HTTP Router

Closes #24548

✔️ Checklist

  • A changeset describing the change and affected packages. (more info)
  • Added or updated documentation
  • Tests for new functionality and regression tests for bug fixes
  • Screenshots attached (for UI changes)
  • All your commits have a Signed-off-by line in the message. (more info)

@vinzscam vinzscam requested review from a team as code owners May 17, 2024 09:42
@vinzscam vinzscam requested review from Rugvip and camilaibs May 17, 2024 09:42
@backstage-goalie
Copy link
Contributor

backstage-goalie bot commented May 17, 2024

Changed Packages

Package Name Package Path Changeset Bump Current Version
@backstage/backend-app-api packages/backend-app-api patch v0.7.6-next.2
@backstage/backend-defaults packages/backend-defaults patch v0.3.0-next.2
@backstage/backend-plugin-api packages/backend-plugin-api patch v0.6.19-next.2
@backstage/backend-test-utils packages/backend-test-utils patch v0.4.0-next.2

Copy link
Member

@Rugvip Rugvip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🎉

Couple of initial high level bits

const router = Router();
let handler = () => Promise.resolve({ status: 'ok' });

router.get('/healthcheck', async (_request: Request, response: Response) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking we should have this separated from the main routes, probably something like /.backstage/health, but potentially grouped a bit

export interface HttpRouterHealthCheckConfig {
handler: () => Promise<any>;
}

/**
* @public
*/
export interface HttpRouterService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about keeping health checks at the root level instead? Something like /.backstage/health/v1/readiness for checking whether the entire instance is up, and /.backstage/health/v1/<plugin-id>/readiness for individual plugins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the other way around, /.backstage/health/v1/readiness/<plugin-id>

// @public (undocumented)
export interface HttpRouterService {
// (undocumented)
addAuthPolicy(policy: HttpRouterServiceAuthPolicy): void;
// (undocumented)
healthCheckConfig(healthCheckOptions: HttpRouterHealthCheckConfig): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this should live on the http router tbh. Or framed a bit differently: if we can make this a separate service we probably should. I'm thinking this should probably be additive as well so that it's possible to add health checks from modules etc.

// @public (undocumented)
export interface HttpRouterHealthCheckConfig {
// (undocumented)
handler: () => Promise<any>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to consider a liveness + readiness split. It can help avoid situations where instances are shut down because they timed out at startup, while still being able to be pretty aggressive about timeouts when it comes to whether the service is functional at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have any suggestion for a possible root level readiness implementation?

Copy link
Contributor

@camilaibs camilaibs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 🎉, left minor comments. Could we also deprecate the legacy handler from the backend-common package as well?

@vinzscam vinzscam changed the title Healthcheck endpoint Healthcheck service May 17, 2024
@vinzscam vinzscam force-pushed the healthcheck-router branch 3 times, most recently from 3ede1f0 to 74a6836 Compare May 24, 2024 11:54
Copy link
Contributor

@camilaibs camilaibs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, left minor comments 🙌🏻

@@ -58,6 +59,7 @@ export const defaultServiceFactories = [
userInfoServiceFactory(),
urlReaderServiceFactory(),
eventsServiceFactory(),
healthServiceFactory(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have also to provide a mock implementation here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is not used anymore, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this file can be deleted, right?

'@backstage/backend-app-api': patch
---

Added a new health service which adds new endpoints for health checks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice add this service to the core services docs:
https://backstage.io/docs/backend-system/core-services/index

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jun 3, 2024
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
@github-actions github-actions bot added the area:catalog Related to the Catalog Project Area label Jun 3, 2024
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
@github-actions github-actions bot removed the area:catalog Related to the Catalog Project Area label Jun 3, 2024
Signed-off-by: Vincenzo Scamporlino <vincenzos@spotify.com>
Copy link
Member

@Rugvip Rugvip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 👍

Couple of nits

* The service reference for the plugin scoped {@link HealthService}.
*/
export const health = createServiceRef<
import('./HealthService').HealthService
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's prolly best to keep this as void instead, since now you can pass anything instead. We can still add methods in the future even if it starts out as void.

Comment on lines +34 to +43
router.get('/v1/readiness', async (_request: Request, response: Response) => {
if (!isRunning) {
throw new Error('Backend has not started yet');
}
response.json({ status: 'ok' });
});

router.get('/v1/liveness', async (_request: Request, response: Response) => {
response.json({ status: 'ok' });
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a quick description to each of these with what their purpose is?


router.get('/v1/readiness', async (_request: Request, response: Response) => {
if (!isRunning) {
throw new Error('Backend has not started yet');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking this should be a 503 ServiceUnavailableError. Perhaps we shouldn't throw an error in this case either, so that we can have a more consistent response? Instead setting an explicit status code and returning whatever the opposite of status: 'ok' is

Comment on lines +82 to +86
* The service reference for the plugin scoped {@link HealthService}.
*/
export const health = createServiceRef<
import('./HealthService').HealthService
>({ id: 'core.health', scope: 'root' });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The service reference for the plugin scoped {@link HealthService}.
*/
export const health = createServiceRef<
import('./HealthService').HealthService
>({ id: 'core.health', scope: 'root' });
* The service reference for the root scoped {@link RootHealthService}.
*/
export const rootHealth = createServiceRef<
import('./RootHealthService').RootHealthService
>({ id: 'core.rootHealth', scope: 'root' });

Every single other root scoped thing is named like this - we should probably follow suit here. Make sure to change to that naming pattern everywhere else too.

I'd also be interested in what the future might be in terms of per-plugin liveness/health. That's sort of where the most interesting developments might be in the future. This root scoped one could at most do simple lifecycle matching like it does today, or possibly offer some form of rollup of the states of all installed plugins (not even sure that's a good idea though). I mean, how often do we really desire to look at the liveness of effectively a machine that hosts many plugins? What does that even mean?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a replacement for the status check router in the new system and deprecated it from the commons
4 participants