feat: graceful shutdown by lindesvard · Pull Request #205 · Openpanel-dev/openpanel

lindesvard · 2025-09-30T20:46:06Z

Summary by CodeRabbit

New Features
- Added Kubernetes-style health endpoints: /healthz/live (always reports liveness) and /healthz/ready (per-dependency readiness). Kept /healthcheck for compatibility.
Refactor
- Simplified health behavior to lightweight, real-time dependency probes with clearer ready/live responses and proper error codes.
Chores
- Centralized graceful shutdown for safer deployments and signal handling.
- Reduced log noise by excluding /healthz requests from request logging.

coderabbitai · 2025-09-30T20:46:17Z

Walkthrough

Refactors health checks into Kubernetes-style liveness and readiness endpoints, adds a centralized graceful shutdown utility and wiring, updates index to use new endpoints and shutdown handlers, and extends request-logging to ignore the new health path.

Changes

Cohort / File(s)	Summary of changes
Health endpoints refactor `apps/api/src/controllers/healthcheck.controller.ts`	Replaces legacy timing/queue checks with `liveness` and `readiness` handlers. `readiness` short-circuits when shutting down, performs lightweight checks (Redis, DB, query cache), and returns per-dependency flags; errors return 503. Removed `healthcheckQueue` and legacy timing logic.
Request logging filter `apps/api/src/hooks/request-logging.hook.ts`	Adds `/healthz` to the ignore list so health endpoints are excluded from request logging.
API wiring and signals `apps/api/src/index.ts`	Exposes `/healthz/live` and `/healthz/ready` (keeps `/healthcheck` for backward compatibility). Replaces manual signal handlers with centralized `shutdown` usage and wires SIGTERM/SIGINT/uncaughtException/unhandledRejection to graceful shutdown.
Graceful shutdown utility `apps/api/src/utils/graceful-shutdown.ts`	New multi-step shutdown: `setShuttingDown` / `isShuttingDown` and `shutdown(fastify, signal, exitCode?)`. Marks shutting down, waits grace period, closes Fastify, disconnects DB/ClickHouse, closes Bull queues, closes Redis connections, logs per-step outcomes, then exits.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant F as Fastify
  participant HC as HealthController
  participant R as Redis
  participant D as DB
  participant Q as QueryCache

  Note over C,F: Liveness
  C->>F: GET /healthz/live
  F->>HC: liveness()
  HC-->>F: { live: true }
  F-->>C: 200 OK

  rect rgba(230,245,255,0.5)
  Note over C,F: Readiness (short-circuits if shutting down)
  C->>F: GET /healthz/ready
  F->>HC: readiness()
  HC->>HC: check isShuttingDown()
  alt Shutting down
    HC-->>F: { ready: false, reason: "shutting_down" }
    F-->>C: 503 Service Unavailable
  else Not shutting down
    par Redis check
      HC->>R: ping/info
      R-->>HC: ok/fail
    and DB check
      HC->>D: simple query/connect
      D-->>HC: ok/fail
    and Query cache check
      HC->>Q: ping/get
      Q-->>HC: ok/fail
    end
    alt Any failure
      HC-->>F: { ready: false, deps: {...} }
      F-->>C: 503 Service Unavailable
    else All pass
      HC-->>F: { ready: true, deps: {...} }
      F-->>C: 200 OK
    end
  end
  end

sequenceDiagram
  autonumber
  participant OS as OS/Node Runtime
  participant F as Fastify
  participant GS as GracefulShutdown
  participant DB as Database
  participant CH as ClickHouse
  participant BQ as BullQueues
  participant RC as RedisConns

  OS-->>F: SIGTERM/SIGINT or fatal error
  F->>GS: shutdown(fastify, signal)
  GS->>GS: setShuttingDown(true)
  Note over GS: Wait SHUTDOWN_GRACE_PERIOD_MS (~5s default)
  GS->>F: fastify.close()
  F-->>GS: closed/err (logged)
  par Close backends
    GS->>DB: disconnect()
    DB-->>GS: ok/err
    GS->>CH: disconnect()
    CH-->>GS: ok/err
    GS->>BQ: close queues (events/sessions/cron/...)
    BQ-->>GS: ok/err
    GS->>RC: close cache/pub/sub/queue if ready
    RC-->>GS: ok/err
  end
  GS-->>OS: process.exit(exitCode)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

Thump-thump, I check each burrow bright,
Live flag waving in the moonlit night.
Ready I am—unless storms call,
Then I tuck in, close every hall.
Quiet logs, soft shutdown song, carrots safe and strong. 🥕🫶

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “feat: graceful shutdown” succinctly captures the introduction of a coordinated shutdown sequence triggered by process signals and follows conventional commit style, clearly reflecting the core objective of this pull request without extraneous detail.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch graceful-healthchecks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

apps/api/src/utils/graceful-shutdown.ts (3)
18-26: Shutdown flag lacks thread-safety guarantees.

JavaScript is single-threaded for execution, but the boolean flag shuttingDown could be subject to race conditions if multiple shutdown signals arrive simultaneously (e.g., SIGTERM followed quickly by SIGINT). While unlikely to cause major issues in practice due to the event loop model, consider using an atomic flag or checking if shutdown is already in progress at the start of the shutdown() function.

Add an early-return guard in the shutdown function:
 export async function shutdown(fastify: FastifyInstance, signal: string) {
+  if (shuttingDown) {
+    logger.warn('Shutdown already in progress, ignoring signal', { signal });
+    return;
+  }
   logger.info('Starting graceful shutdown', { signal });
   setShuttingDown(true);
34-35: Consider making the load balancer grace period configurable.

The hard-coded 5-second wait is reasonable and aligns with typical Kubernetes preStop hook delays, but consider making it configurable via an environment variable for flexibility across different deployment environments.

Example:
-  // Step 2: Wait for load balancer to stop sending traffic (matches preStop sleep)
-  await new Promise((resolve) => setTimeout(resolve, 5000));
+  // Step 2: Wait for load balancer to stop sending traffic (matches preStop sleep)
+  const gracePeriod = Number(process.env.SHUTDOWN_GRACE_PERIOD_MS || '5000');
+  await new Promise((resolve) => setTimeout(resolve, gracePeriod));
96-97: Consider using non-zero exit code for error-triggered shutdowns.

The shutdown function always exits with code 0, even when triggered by uncaughtException or unhandledRejection. This may confuse orchestrators (Kubernetes, Docker) that rely on exit codes to determine if a restart is needed.

Pass the exit code or distinguish error signals:
-export async function shutdown(fastify: FastifyInstance, signal: string) {
+export async function shutdown(fastify: FastifyInstance, signal: string, exitCode = 0) {
   logger.info('Starting graceful shutdown', { signal });
   
   // ... shutdown steps ...
   
   logger.info('Graceful shutdown completed');
-  process.exit(0);
+  process.exit(exitCode);
 }
Then in index.ts:
-      process.on('uncaughtException', async (error) => {
+      process.on('uncaughtException', (error) => {
         logger.error('Uncaught exception', error);
-        await shutdown(fastify, 'uncaughtException');
+        shutdown(fastify, 'uncaughtException', 1);
       });
apps/api/src/controllers/healthcheck.controller.ts (1)
36-60: LGTM! Readiness probe correctly integrates shutdown awareness.

The readiness endpoint properly:

Short-circuits to 503 when the shutdown flag is set

Performs lightweight dependency checks to determine readiness

Returns detailed status information for debugging

One optional improvement: Consider adding timeouts to the dependency checks to prevent the readiness probe from hanging if a dependency becomes slow during shutdown or failure scenarios.

Example timeout wrapper:
const withTimeout = <T>(promise: Promise<T>, ms: number): Promise<T> => {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) => 
      setTimeout(() => reject(new Error('Timeout')), ms)
    ),
  ]);
};

// Then use:
const redisRes = await withTimeout(getRedisCache().ping(), 1000);

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5092b6a and b00ce5d.

📒 Files selected for processing (4)

apps/api/src/controllers/healthcheck.controller.ts (1 hunks)
apps/api/src/hooks/request-logging.hook.ts (1 hunks)
apps/api/src/index.ts (4 hunks)
apps/api/src/utils/graceful-shutdown.ts (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

apps/api/src/utils/graceful-shutdown.ts (4)

packages/db/src/clickhouse/client.ts (2)

error (40-45)

ch (130-140)

packages/db/src/prisma-client.ts (1)

db (270-270)

packages/queue/src/queues.ts (5)

eventsQueue (94-104)

sessionsQueue (106-111)

cronQueue (116-121)

miscQueue (123-128)

notificationQueue (137-145)

packages/redis/redis.ts (4)

getRedisCache (64-70)

getRedisPub (82-88)

getRedisSub (73-79)

getRedisQueue (91-106)

apps/api/src/index.ts (3)

apps/api/src/controllers/healthcheck.controller.ts (3)

healthcheck (7-29)

liveness (32-34)

readiness (37-60)

apps/api/src/utils/logger.ts (1)

logger (3-3)

apps/api/src/utils/graceful-shutdown.ts (1)

shutdown (29-98)

apps/api/src/controllers/healthcheck.controller.ts (4)

packages/redis/redis.ts (1)

getRedisCache (64-70)

packages/db/src/prisma-client.ts (1)

db (270-270)

packages/db/src/clickhouse/client.ts (1)

chQuery (180-185)

apps/api/src/utils/graceful-shutdown.ts (1)

isShuttingDown (24-26)

🔇 Additional comments (10)

apps/api/src/hooks/request-logging.hook.ts (1)

4-4: LGTM! Correctly excludes new health endpoints from logging.

The addition of '/healthz' to the ignore list appropriately reduces log noise from frequent Kubernetes health probes while maintaining backward compatibility with the existing '/healthcheck' path.

apps/api/src/index.ts (3)

24-25: LGTM! Health endpoint imports aligned with Kubernetes patterns.

The import of liveness and readiness functions supports the new Kubernetes-style health check endpoints.

44-44: LGTM! Centralized shutdown utility imported.

177-181: LGTM! Health endpoints properly registered with backward compatibility.

The implementation correctly:

Maintains the existing /healthcheck endpoint for backward compatibility

Adds Kubernetes-standard /healthz/live for liveness probes

Adds /healthz/ready for readiness probes

apps/api/src/utils/graceful-shutdown.ts (3)

37-43: LGTM! Fastify server closure properly handled.

The fastify.close() call drains in-flight requests gracefully with appropriate error handling.

75-94: LGTM! Redis connection closure properly handled.

The code correctly:

Checks connection status before calling quit()

Uses quit() instead of disconnect() to gracefully wait for pending commands

Handles errors per step

The status check for 'ready' is appropriate—connections in other states (e.g., 'connecting', 'disconnecting', 'end') should not be quit again.

61-73: queue.close() in API only tears down Redis connections, not jobs
In the API process you’re only enqueuing work—actual job execution and graceful shutdown (waiting on active jobs) occurs in the worker process via worker.close(). Calling queue.close() here simply closes the client connections; there are no in-flight jobs to await in this service.

Likely an incorrect or invalid review comment.

apps/api/src/controllers/healthcheck.controller.ts (3)

1-1: LGTM! Import of shutdown flag for readiness check.

6-29: LGTM! Backward-compatible healthcheck with proper dependency checks.

The healthcheck function correctly validates all dependencies (Redis, DB, ClickHouse) and returns appropriate status codes with detailed responses.

31-34: LGTM! Liveness probe correctly implemented.

The liveness endpoint properly follows Kubernetes conventions by only checking if the process is alive, without performing dependency checks.

vercel · 2025-10-02T08:57:31Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
openpanel-public	Ready	Preview	Comment	Oct 2, 2025 8:58am

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

apps/api/src/index.ts (1)
220-229: Process event handlers don't await async functions.

While the handlers are now declared async and use await, Node.js process event handlers (SIGTERM, SIGINT, uncaughtException, unhandledRejection) are inherently synchronous and won't wait for async operations to complete. The async/await wrapper here doesn't provide the synchronization guarantee you might expect.

However, since your shutdown() utility calls process.exit() at the end of its execution, the process will remain alive until shutdown completes. The current implementation works, but the async/await wrapper is misleading about how it achieves graceful shutdown.

Consider simplifying to make the control flow clearer:
-      process.on('SIGTERM', async () => await shutdown(fastify, 'SIGTERM', 0));
-      process.on('SIGINT', async () => await shutdown(fastify, 'SIGINT', 0));
-      process.on('uncaughtException', async (error) => {
+      process.on('SIGTERM', () => shutdown(fastify, 'SIGTERM', 0));
+      process.on('SIGINT', () => shutdown(fastify, 'SIGINT', 0));
+      process.on('uncaughtException', (error) => {
         logger.error('Uncaught exception', error);
-        await shutdown(fastify, 'uncaughtException', 1);
+        shutdown(fastify, 'uncaughtException', 1);
       });
-      process.on('unhandledRejection', async (reason, promise) => {
+      process.on('unhandledRejection', (reason, promise) => {
         logger.error('Unhandled rejection', { reason, promise });
-        await shutdown(fastify, 'unhandledRejection', 1);
+        shutdown(fastify, 'unhandledRejection', 1);
       });
This makes it clear that the handlers fire-and-forget the shutdown promise, which is safe because shutdown() manages the process lifecycle internally via process.exit().

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b00ce5d and ce536ca.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (2)

apps/api/src/index.ts (4 hunks)
apps/api/src/utils/graceful-shutdown.ts (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

apps/api/src/utils/graceful-shutdown.ts

🧰 Additional context used

🧬 Code graph analysis (1)

apps/api/src/index.ts (2)

apps/api/src/controllers/healthcheck.controller.ts (3)

healthcheck (7-29)

liveness (32-34)

readiness (37-60)

apps/api/src/utils/graceful-shutdown.ts (1)

shutdown (29-108)

🔇 Additional comments (2)

apps/api/src/index.ts (2)

24-25: LGTM! Clean imports for health endpoints and shutdown utility.

The new imports for Kubernetes-style health endpoints (liveness, readiness) and the centralized shutdown utility align well with the PR's graceful shutdown objectives.

Also applies to: 44-44

177-181: LGTM! Proper Kubernetes health endpoint implementation.

The implementation correctly:

Maintains backward compatibility with the existing /healthcheck endpoint

Follows Kubernetes conventions with /healthz prefix

Separates liveness (always responds) from readiness (checks dependencies and shutdown state)

feat: graceful shutdown

b00ce5d

coderabbitai Bot reviewed Sep 30, 2025

View reviewed changes

Comment thread apps/api/src/index.ts

lindesvard added 2 commits October 2, 2025 10:42

comments by coderabbit

938730e

fix

ce536ca

vercel Bot deployed to Preview – openpanel-public October 2, 2025 08:58 View deployment

coderabbitai Bot reviewed Oct 2, 2025

View reviewed changes

lindesvard merged commit ca4a880 into main Oct 2, 2025
9 checks passed

lindesvard deleted the graceful-healthchecks branch November 7, 2025 15:18

coderabbitai Bot mentioned this pull request Apr 8, 2026

Feature/mcp #331

Merged

coderabbitai Bot mentioned this pull request Apr 23, 2026

fix: improve healthcheck #349

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: graceful shutdown#205

feat: graceful shutdown#205
lindesvard merged 3 commits intomainfrom
graceful-healthchecks

lindesvard commented Sep 30, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Sep 30, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

vercel Bot commented Oct 2, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lindesvard commented Sep 30, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vercel Bot commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lindesvard commented Sep 30, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Sep 30, 2025 •

edited

Loading

vercel Bot commented Oct 2, 2025 •

edited

Loading