Conversation
WalkthroughRefactors health checks into Kubernetes-style liveness and readiness endpoints, adds a centralized graceful shutdown utility and wiring, updates index to use new endpoints and shutdown handlers, and extends request-logging to ignore the new health path. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant F as Fastify
participant HC as HealthController
participant R as Redis
participant D as DB
participant Q as QueryCache
Note over C,F: Liveness
C->>F: GET /healthz/live
F->>HC: liveness()
HC-->>F: { live: true }
F-->>C: 200 OK
rect rgba(230,245,255,0.5)
Note over C,F: Readiness (short-circuits if shutting down)
C->>F: GET /healthz/ready
F->>HC: readiness()
HC->>HC: check isShuttingDown()
alt Shutting down
HC-->>F: { ready: false, reason: "shutting_down" }
F-->>C: 503 Service Unavailable
else Not shutting down
par Redis check
HC->>R: ping/info
R-->>HC: ok/fail
and DB check
HC->>D: simple query/connect
D-->>HC: ok/fail
and Query cache check
HC->>Q: ping/get
Q-->>HC: ok/fail
end
alt Any failure
HC-->>F: { ready: false, deps: {...} }
F-->>C: 503 Service Unavailable
else All pass
HC-->>F: { ready: true, deps: {...} }
F-->>C: 200 OK
end
end
end
sequenceDiagram
autonumber
participant OS as OS/Node Runtime
participant F as Fastify
participant GS as GracefulShutdown
participant DB as Database
participant CH as ClickHouse
participant BQ as BullQueues
participant RC as RedisConns
OS-->>F: SIGTERM/SIGINT or fatal error
F->>GS: shutdown(fastify, signal)
GS->>GS: setShuttingDown(true)
Note over GS: Wait SHUTDOWN_GRACE_PERIOD_MS (~5s default)
GS->>F: fastify.close()
F-->>GS: closed/err (logged)
par Close backends
GS->>DB: disconnect()
DB-->>GS: ok/err
GS->>CH: disconnect()
CH-->>GS: ok/err
GS->>BQ: close queues (events/sessions/cron/...)
BQ-->>GS: ok/err
GS->>RC: close cache/pub/sub/queue if ready
RC-->>GS: ok/err
end
GS-->>OS: process.exit(exitCode)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
apps/api/src/utils/graceful-shutdown.ts (3)
18-26: Shutdown flag lacks thread-safety guarantees.JavaScript is single-threaded for execution, but the boolean flag
shuttingDowncould be subject to race conditions if multiple shutdown signals arrive simultaneously (e.g., SIGTERM followed quickly by SIGINT). While unlikely to cause major issues in practice due to the event loop model, consider using an atomic flag or checking if shutdown is already in progress at the start of theshutdown()function.Add an early-return guard in the shutdown function:
export async function shutdown(fastify: FastifyInstance, signal: string) { + if (shuttingDown) { + logger.warn('Shutdown already in progress, ignoring signal', { signal }); + return; + } logger.info('Starting graceful shutdown', { signal }); setShuttingDown(true);
34-35: Consider making the load balancer grace period configurable.The hard-coded 5-second wait is reasonable and aligns with typical Kubernetes
preStophook delays, but consider making it configurable via an environment variable for flexibility across different deployment environments.Example:
- // Step 2: Wait for load balancer to stop sending traffic (matches preStop sleep) - await new Promise((resolve) => setTimeout(resolve, 5000)); + // Step 2: Wait for load balancer to stop sending traffic (matches preStop sleep) + const gracePeriod = Number(process.env.SHUTDOWN_GRACE_PERIOD_MS || '5000'); + await new Promise((resolve) => setTimeout(resolve, gracePeriod));
96-97: Consider using non-zero exit code for error-triggered shutdowns.The shutdown function always exits with code
0, even when triggered byuncaughtExceptionorunhandledRejection. This may confuse orchestrators (Kubernetes, Docker) that rely on exit codes to determine if a restart is needed.Pass the exit code or distinguish error signals:
-export async function shutdown(fastify: FastifyInstance, signal: string) { +export async function shutdown(fastify: FastifyInstance, signal: string, exitCode = 0) { logger.info('Starting graceful shutdown', { signal }); // ... shutdown steps ... logger.info('Graceful shutdown completed'); - process.exit(0); + process.exit(exitCode); }Then in
index.ts:- process.on('uncaughtException', async (error) => { + process.on('uncaughtException', (error) => { logger.error('Uncaught exception', error); - await shutdown(fastify, 'uncaughtException'); + shutdown(fastify, 'uncaughtException', 1); });apps/api/src/controllers/healthcheck.controller.ts (1)
36-60: LGTM! Readiness probe correctly integrates shutdown awareness.The readiness endpoint properly:
- Short-circuits to 503 when the shutdown flag is set
- Performs lightweight dependency checks to determine readiness
- Returns detailed status information for debugging
One optional improvement: Consider adding timeouts to the dependency checks to prevent the readiness probe from hanging if a dependency becomes slow during shutdown or failure scenarios.
Example timeout wrapper:
const withTimeout = <T>(promise: Promise<T>, ms: number): Promise<T> => { return Promise.race([ promise, new Promise<T>((_, reject) => setTimeout(() => reject(new Error('Timeout')), ms) ), ]); }; // Then use: const redisRes = await withTimeout(getRedisCache().ping(), 1000);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/api/src/controllers/healthcheck.controller.ts(1 hunks)apps/api/src/hooks/request-logging.hook.ts(1 hunks)apps/api/src/index.ts(4 hunks)apps/api/src/utils/graceful-shutdown.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
apps/api/src/utils/graceful-shutdown.ts (4)
packages/db/src/clickhouse/client.ts (2)
error(40-45)ch(130-140)packages/db/src/prisma-client.ts (1)
db(270-270)packages/queue/src/queues.ts (5)
eventsQueue(94-104)sessionsQueue(106-111)cronQueue(116-121)miscQueue(123-128)notificationQueue(137-145)packages/redis/redis.ts (4)
getRedisCache(64-70)getRedisPub(82-88)getRedisSub(73-79)getRedisQueue(91-106)
apps/api/src/index.ts (3)
apps/api/src/controllers/healthcheck.controller.ts (3)
healthcheck(7-29)liveness(32-34)readiness(37-60)apps/api/src/utils/logger.ts (1)
logger(3-3)apps/api/src/utils/graceful-shutdown.ts (1)
shutdown(29-98)
apps/api/src/controllers/healthcheck.controller.ts (4)
packages/redis/redis.ts (1)
getRedisCache(64-70)packages/db/src/prisma-client.ts (1)
db(270-270)packages/db/src/clickhouse/client.ts (1)
chQuery(180-185)apps/api/src/utils/graceful-shutdown.ts (1)
isShuttingDown(24-26)
🔇 Additional comments (10)
apps/api/src/hooks/request-logging.hook.ts (1)
4-4: LGTM! Correctly excludes new health endpoints from logging.The addition of '/healthz' to the ignore list appropriately reduces log noise from frequent Kubernetes health probes while maintaining backward compatibility with the existing '/healthcheck' path.
apps/api/src/index.ts (3)
24-25: LGTM! Health endpoint imports aligned with Kubernetes patterns.The import of liveness and readiness functions supports the new Kubernetes-style health check endpoints.
44-44: LGTM! Centralized shutdown utility imported.
177-181: LGTM! Health endpoints properly registered with backward compatibility.The implementation correctly:
- Maintains the existing
/healthcheckendpoint for backward compatibility- Adds Kubernetes-standard
/healthz/livefor liveness probes- Adds
/healthz/readyfor readiness probesapps/api/src/utils/graceful-shutdown.ts (3)
37-43: LGTM! Fastify server closure properly handled.The
fastify.close()call drains in-flight requests gracefully with appropriate error handling.
75-94: LGTM! Redis connection closure properly handled.The code correctly:
- Checks connection status before calling
quit()- Uses
quit()instead ofdisconnect()to gracefully wait for pending commands- Handles errors per step
The status check for
'ready'is appropriate—connections in other states (e.g.,'connecting','disconnecting','end') should not be quit again.
61-73: queue.close() in API only tears down Redis connections, not jobs
In the API process you’re only enqueuing work—actual job execution and graceful shutdown (waiting on active jobs) occurs in the worker process viaworker.close(). Callingqueue.close()here simply closes the client connections; there are no in-flight jobs to await in this service.Likely an incorrect or invalid review comment.
apps/api/src/controllers/healthcheck.controller.ts (3)
1-1: LGTM! Import of shutdown flag for readiness check.
6-29: LGTM! Backward-compatible healthcheck with proper dependency checks.The healthcheck function correctly validates all dependencies (Redis, DB, ClickHouse) and returns appropriate status codes with detailed responses.
31-34: LGTM! Liveness probe correctly implemented.The liveness endpoint properly follows Kubernetes conventions by only checking if the process is alive, without performing dependency checks.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
apps/api/src/index.ts (1)
220-229: Process event handlers don't await async functions.While the handlers are now declared
asyncand useawait, Node.js process event handlers (SIGTERM,SIGINT,uncaughtException,unhandledRejection) are inherently synchronous and won't wait for async operations to complete. Theasync/awaitwrapper here doesn't provide the synchronization guarantee you might expect.However, since your
shutdown()utility callsprocess.exit()at the end of its execution, the process will remain alive until shutdown completes. The current implementation works, but theasync/awaitwrapper is misleading about how it achieves graceful shutdown.Consider simplifying to make the control flow clearer:
- process.on('SIGTERM', async () => await shutdown(fastify, 'SIGTERM', 0)); - process.on('SIGINT', async () => await shutdown(fastify, 'SIGINT', 0)); - process.on('uncaughtException', async (error) => { + process.on('SIGTERM', () => shutdown(fastify, 'SIGTERM', 0)); + process.on('SIGINT', () => shutdown(fastify, 'SIGINT', 0)); + process.on('uncaughtException', (error) => { logger.error('Uncaught exception', error); - await shutdown(fastify, 'uncaughtException', 1); + shutdown(fastify, 'uncaughtException', 1); }); - process.on('unhandledRejection', async (reason, promise) => { + process.on('unhandledRejection', (reason, promise) => { logger.error('Unhandled rejection', { reason, promise }); - await shutdown(fastify, 'unhandledRejection', 1); + shutdown(fastify, 'unhandledRejection', 1); });This makes it clear that the handlers fire-and-forget the shutdown promise, which is safe because
shutdown()manages the process lifecycle internally viaprocess.exit().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (2)
apps/api/src/index.ts(4 hunks)apps/api/src/utils/graceful-shutdown.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/api/src/utils/graceful-shutdown.ts
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/src/index.ts (2)
apps/api/src/controllers/healthcheck.controller.ts (3)
healthcheck(7-29)liveness(32-34)readiness(37-60)apps/api/src/utils/graceful-shutdown.ts (1)
shutdown(29-108)
🔇 Additional comments (2)
apps/api/src/index.ts (2)
24-25: LGTM! Clean imports for health endpoints and shutdown utility.The new imports for Kubernetes-style health endpoints (
liveness,readiness) and the centralizedshutdownutility align well with the PR's graceful shutdown objectives.Also applies to: 44-44
177-181: LGTM! Proper Kubernetes health endpoint implementation.The implementation correctly:
- Maintains backward compatibility with the existing
/healthcheckendpoint- Follows Kubernetes conventions with
/healthzprefix- Separates liveness (always responds) from readiness (checks dependencies and shutdown state)
Summary by CodeRabbit