Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publish redis-worker and add graceful shutdown manager #1810

Merged
merged 27 commits into from
Mar 21, 2025

Conversation

nicktrn
Copy link
Collaborator

@nicktrn nicktrn commented Mar 21, 2025

This PR makes the required changes to publish redis-worker. As it depends on internal packages we're using tsup to bundle them.

Also adds a ShutdownManager to help with challenges around multiple components adding their own graceful shutdown handlers. This ensures amongst other things that the process actually does exit as expect once all handlers have finished executing.

Other important changes:

  • Fix Cursor terminal links
  • Add redis worker and HTTP server metrics
  • Update checkpoint/restore

Summary by CodeRabbit

  • New Features

    • Introduced configurable worker shutdown timeouts and enhanced Kubernetes deployment settings via new environment variables.
    • Expanded workload API support by including additional metadata headers (deployment version and project reference).
    • Integrated metrics collection with request duration tracking and a dedicated metrics endpoint for improved observability.
    • Added new package @trigger.dev/redis-worker for Redis worker functionality.
    • Enhanced HttpServer with metrics collection capabilities and a new /metrics endpoint.
    • Introduced ShutdownManager for graceful shutdown handling in response to signals.
  • Bug Fixes

    • Enhanced error handling in the SimpleQueue test suite to ensure items are successfully dequeued.
  • Refactor

    • Streamlined error handling and logging for development and runtime processes.
    • Migrated from an internal Redis worker package to a public one, ensuring a more maintainable dependency structure.

Copy link

changeset-bot bot commented Mar 21, 2025

⚠️ No Changeset found

Latest commit: 921edcb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Mar 21, 2025

Walkthrough

This pull request implements extensive configuration and dependency updates along with several code and schema refinements. It replaces internal redis-worker references with a public package, adds new environment variables for Kubernetes and worker shutdown timeouts, and improves error handling, metrics collection, header extraction, checkpoint API schemas, and shutdown management. Additional adjustments include enhanced test guidelines, updated build configurations for a new redis-worker package, and script improvements for robust cleanup during pre-release publishing.

Changes

File(s) Change Summary
.changeset/config.json Replaced "proxy" with "supervisor" in the ignore list.
cursor/rules/webapp.mdc, ai/references/repo.md Renamed package reference from @internal/redis-worker to @trigger.dev/redis-worker; updated package descriptions accordingly.
apps/webapp/package.json, internal-packages/run-engine/package.json, apps/webapp/app/v3/commonWorker.server.ts, apps/webapp/app/v3/legacyRunEngineWorker.server.ts Updated dependency references to use @trigger.dev/redis-worker instead of the internal version; added shutdownTimeoutMs configuration to worker setups.
apps/supervisor/src/env.ts, apps/webapp/app/env.server.ts Introduced new environment variables for Kubernetes settings (e.g., KUBERNETES_FORCE_ENABLED, KUBERNETES_NAMESPACE, storage limits) and worker shutdown timeouts.
apps/supervisor/src/index.ts, apps/supervisor/src/workloadManager/kubernetes.ts, apps/supervisor/src/workloadServer/index.ts Enhanced environment detection (using isKubernetesEnvironment), restructured message handling, added new header extraction methods, and introduced an orchestrator property.
packages/core/src/**/* (multiple files) Revised checkpoint schemas (for suspend/restore), updated HTTP client methods (submitSuspendCompletion), integrated metrics collection in httpServer.ts, added new utilities (e.g., tryCatch), and updated type imports.
internal-packages/redis-worker/* Removed internal redis-worker package files and their TypeScript configs.
packages/redis-worker/* Introduced a new public redis-worker package with updated build (tsconfig, tsup) configurations, enhanced shutdown handling, and improved test error checks.
scripts/publish-prerelease.sh Improved error handling and cleanup with new git_reset and abort functions; added a trap for interrupt signals.
patches/supports-hyperlinks@2.3.0.patch Modified hyperlink support to return true if CURSOR_TRACE_ID is present.
packages/cli-v3/src/entryPoints/managed-run-controller.ts Enhanced WorkloadHttpClient configuration by adding deploymentVersion and projectRef properties.

Sequence Diagram(s)

Loading
sequenceDiagram
    participant S as ShutdownManager
    participant H as Registered Handlers
    participant P as Process
    Note over S, P: Shutdown initiated (SIGTERM/SIGINT)
    S->>H: Trigger all registered shutdown handlers concurrently
    H-->>S: Return handler statuses
    S->>P: Exit process with appropriate signal code
sequenceDiagram
    participant C as Client Request
    participant WS as WorkloadServer
    participant CP as CheckpointClient
    Note over C,WS: Handling suspend run request with extended headers
    C->>WS: Send request with headers (runnerId, deploymentVersion, projectRef)
    WS->>WS: Extract header values using new helper methods
    WS->>CP: Call restoreRun with checkpoint body (includes orchestrator info)
    CP-->>WS: Return response
    WS-->>C: Respond with suspension status

Possibly related PRs

Suggested reviewers

  • ericallam
  • matt-aitken

Poem

I'm a rabbit on a coding spree,
Hopping through changes wild and free.
From configs to headers, updates galore,
New dependencies and metrics to explore.
With each line I nibble bugs away,
In this digital burrow, I joyfully play!
🐰 Happy coding!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
packages/core/src/v3/serverOnly/shutdownManager.test.ts (1)

4-9: Question usage of private members in tests
It's a standard approach to use type assertions for accessing private members in a white-box test scenario. However, consider whether you can test the same functionality through the public interface alone to maintain better encapsulation.

packages/redis-worker/package.json (2)

14-17: Module type consistency.
You are shipping both CJS and ESM builds while specifying "type": "module". This can cause confusion in some build tools or environments. Consider removing "type": "module" or using separate packages for each module type if you encounter unexpected import behavior.


33-42: Dev dependencies referencing internal packages.
Including @internal/* packages in devDependencies is usually fine since consumers won't install these, but confirm that any tooling (e.g., test containers, tracing) doesn’t block usage for external contributors.

packages/core/src/v3/serverOnly/shutdownManager.ts (2)

4-7: Limited signal handling.
Restricting shutdown signals to “SIGTERM” and “SIGINT” prevents unexpected issues. If your environment sends other signals (like “SIGHUP”), consider extending support to ensure all relevant signals trigger graceful shutdowns.


39-80: Graceful shutdown strategy.
Running all handlers concurrently with Promise.allSettled is a solid approach to ensure no single slow handler prevents others from completing. Logging all failures is helpful for debugging, but consider whether partial failures should block the process from exiting.

apps/supervisor/src/workloadServer/index.ts (1)

97-98: Header retrieval logic is fine.
Note that Node.js lowercases header keys, so ensure WORKLOAD_HEADERS keys match that case. Otherwise, consider toLowerCase().

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between adca199 and aa3e57d.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (51)
  • .changeset/config.json (1 hunks)
  • .cursor/rules/webapp.mdc (1 hunks)
  • ai/references/repo.md (1 hunks)
  • ai/references/tests.md (1 hunks)
  • apps/supervisor/package.json (1 hunks)
  • apps/supervisor/src/env.ts (1 hunks)
  • apps/supervisor/src/index.ts (5 hunks)
  • apps/supervisor/src/workloadManager/kubernetes.ts (2 hunks)
  • apps/supervisor/src/workloadServer/index.ts (4 hunks)
  • apps/webapp/app/env.server.ts (3 hunks)
  • apps/webapp/app/v3/commonWorker.server.ts (2 hunks)
  • apps/webapp/app/v3/legacyRunEngineWorker.server.ts (2 hunks)
  • apps/webapp/app/v3/runEngine.server.ts (1 hunks)
  • apps/webapp/package.json (1 hunks)
  • internal-packages/redis-worker/package.json (0 hunks)
  • internal-packages/redis-worker/tsconfig.json (0 hunks)
  • internal-packages/redis-worker/tsconfig.src.json (0 hunks)
  • internal-packages/redis-worker/tsconfig.test.json (0 hunks)
  • internal-packages/redis/src/index.ts (1 hunks)
  • internal-packages/run-engine/package.json (2 hunks)
  • internal-packages/run-engine/src/engine/index.ts (2 hunks)
  • internal-packages/run-engine/src/engine/types.ts (2 hunks)
  • internal-packages/tracing/src/index.ts (1 hunks)
  • package.json (1 hunks)
  • packages/cli-v3/src/entryPoints/managed-run-controller.ts (1 hunks)
  • packages/core/package.json (1 hunks)
  • packages/core/src/utils.ts (1 hunks)
  • packages/core/src/v3/isomorphic/friendlyId.ts (1 hunks)
  • packages/core/src/v3/runEngineWorker/consts.ts (1 hunks)
  • packages/core/src/v3/runEngineWorker/supervisor/http.ts (2 hunks)
  • packages/core/src/v3/runEngineWorker/supervisor/schemas.ts (1 hunks)
  • packages/core/src/v3/runEngineWorker/workload/types.ts (1 hunks)
  • packages/core/src/v3/runEngineWorker/workload/util.ts (1 hunks)
  • packages/core/src/v3/schemas/checkpoints.ts (3 hunks)
  • packages/core/src/v3/schemas/runEngine.ts (1 hunks)
  • packages/core/src/v3/serverOnly/checkpointClient.ts (3 hunks)
  • packages/core/src/v3/serverOnly/httpServer.ts (4 hunks)
  • packages/core/src/v3/serverOnly/index.ts (1 hunks)
  • packages/core/src/v3/serverOnly/k8s.ts (1 hunks)
  • packages/core/src/v3/serverOnly/shutdownManager.test.ts (1 hunks)
  • packages/core/src/v3/serverOnly/shutdownManager.ts (1 hunks)
  • packages/core/src/v3/serverOnly/singleton.ts (1 hunks)
  • packages/redis-worker/package.json (1 hunks)
  • packages/redis-worker/src/queue.test.ts (5 hunks)
  • packages/redis-worker/src/worker.ts (13 hunks)
  • packages/redis-worker/tsconfig.json (1 hunks)
  • packages/redis-worker/tsconfig.src.json (1 hunks)
  • packages/redis-worker/tsconfig.test.json (1 hunks)
  • packages/redis-worker/tsup.config.ts (1 hunks)
  • patches/supports-hyperlinks@2.3.0.patch (1 hunks)
  • scripts/publish-prerelease.sh (3 hunks)
💤 Files with no reviewable changes (4)
  • internal-packages/redis-worker/tsconfig.src.json
  • internal-packages/redis-worker/tsconfig.json
  • internal-packages/redis-worker/tsconfig.test.json
  • internal-packages/redis-worker/package.json
🧰 Additional context used
🧬 Code Definitions (12)
apps/webapp/app/v3/runEngine.server.ts (1)
apps/webapp/app/env.server.ts (1) (1)
  • env (678-678)
packages/core/src/v3/serverOnly/k8s.ts (1)
apps/supervisor/src/env.ts (1) (1)
  • env (56-56)
packages/core/src/v3/runEngineWorker/workload/util.ts (1)
packages/core/src/v3/runEngineWorker/consts.ts (1) (1)
  • WORKLOAD_HEADERS (8-13)
apps/webapp/app/v3/legacyRunEngineWorker.server.ts (1)
apps/webapp/app/env.server.ts (1) (1)
  • env (678-678)
apps/webapp/app/v3/commonWorker.server.ts (1)
apps/webapp/app/env.server.ts (1) (1)
  • env (678-678)
packages/core/src/v3/serverOnly/shutdownManager.test.ts (1)
packages/core/src/v3/serverOnly/shutdownManager.ts (1) (1)
  • shutdownManager (89-89)
apps/supervisor/src/workloadManager/kubernetes.ts (1)
apps/supervisor/src/env.ts (1) (1)
  • env (56-56)
packages/core/src/v3/runEngineWorker/supervisor/http.ts (2)
packages/core/src/v3/runEngineWorker/supervisor/schemas.ts (4) (4)
  • WorkerApiSuspendRunRequestBody (31-40)
  • WorkerApiSuspendRunRequestBody (41-41)
  • WorkerApiSuspendRunResponseBody (43-45)
  • WorkerApiSuspendRunResponseBody (46-46)
packages/core/src/v3/apiClient/core.ts (1) (1)
  • wrapZodFetch (706-746)
packages/core/src/v3/schemas/checkpoints.ts (1)
packages/core/src/v3/schemas/runEngine.ts (2) (2)
  • DequeuedMessage (225-260)
  • DequeuedMessage (261-261)
packages/core/src/v3/serverOnly/shutdownManager.ts (1)
packages/core/src/v3/serverOnly/singleton.ts (1) (1)
  • singleton (1-8)
apps/supervisor/src/index.ts (2)
packages/core/src/v3/serverOnly/k8s.ts (1) (1)
  • isKubernetesEnvironment (3-16)
apps/supervisor/src/env.ts (1) (1)
  • env (56-56)
packages/cli-v3/src/entryPoints/managed-run-controller.ts (1)
apps/supervisor/src/env.ts (1) (1)
  • env (56-56)
🪛 Biome (1.9.4)
packages/core/src/v3/serverOnly/httpServer.ts

[error] 210-210: Unsafe usage of 'return'.

'return' in 'finally' overwrites the control flow statements inside 'try' and 'catch'.

(lint/correctness/noUnsafeFinally)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: typecheck / typecheck
🔇 Additional comments (130)
packages/core/src/v3/isomorphic/friendlyId.ts (1)

24-26: Optimization to prevent duplicate ID prefixing

This is a good addition that prevents creating redundant prefixes when an ID already has the correct entityName prefix format. This change ensures that if an ID is processed multiple times through toFriendlyId, you won't end up with malformed IDs like entityName_entityName_id.

internal-packages/tracing/src/index.ts (1)

1-2: TypeScript type imports properly marked

Good improvement to explicitly mark imports that are only used as types with the type keyword. This follows TypeScript best practices and can help with bundle optimization since type imports are erased during compilation.

.changeset/config.json (1)

15-15: Package ignore list updated for versioning

The change replaces "proxy" with "supervisor" in the ignore list, which aligns with the PR's goal of restructuring the worker management system. This will affect how these packages are versioned and included in changelogs when using Changesets.

packages/redis-worker/tsconfig.json (1)

1-11: New TypeScript configuration for redis-worker package

This configuration follows the project's established patterns for TypeScript packages by extending a base configuration and referencing separate configs for source and test code. This is consistent with the PR's goal of publishing the redis-worker as a public package.

package.json (1)

77-78:

✅ Verification successful

Added new patched dependency for supports-hyperlinks.

This change adds a patched version of supports-hyperlinks@2.3.0 to the patchedDependencies section. This appears to be part of the transition from using the internal Redis worker to a public @trigger.dev/redis-worker package.


🏁 Script executed:

#!/bin/bash
# Verify that the patch file exists
if [ -f "patches/supports-hyperlinks@2.3.0.patch" ]; then
  echo "✅ Patch file exists"
else
  echo "❌ Patch file does not exist"
fi

Length of output: 101


Verified Patched Dependency: Supports-Hyperlinks@2.3.0
The patch file at patches/supports-hyperlinks@2.3.0.patch has been confirmed to exist, indicating that the new patched dependency added in the package.json (lines 77-78) is correctly set up. This change supports the transition from the internal Redis worker to the public @trigger.dev/redis-worker package.

  • File: package.json (Lines 77-78)
  • Patch Confirmed: patches/supports-hyperlinks@2.3.0.patch exists

Approved code changes.

packages/core/package.json (2)

192-192: Added Prometheus client for metrics collection.

Adding prom-client is a good addition to support the metrics collection features mentioned in the PR objectives for both the Redis worker and HTTP server.


195-195: Added std-env for environment variable management.

The std-env package provides a standardized way to handle environment variables across different environments, which will help with the graceful shutdown management and environment-specific configurations.

packages/redis-worker/tsconfig.test.json (1)

1-11: New TypeScript test configuration for redis-worker package.

This configuration correctly sets up the TypeScript environment for testing the redis-worker package. It properly extends the base configuration, includes test files, references the source configuration, and sets appropriate compiler options for testing with Vitest.

This is part of the broader effort to publish the redis-worker as a public package as mentioned in the PR objectives. Having a proper test configuration ensures the package is well-tested before publication.

apps/webapp/app/v3/runEngine.server.ts (1)

20-20: Added shutdown timeout for graceful worker termination.

This change adds the shutdownTimeoutMs configuration parameter to the RunEngine initialization, using the environment variable RUN_ENGINE_WORKER_SHUTDOWN_TIMEOUT_MS. This directly supports the PR objective of implementing a graceful shutdown manager to ensure the process exits properly after all shutdown handlers have completed.

The parameter is correctly integrated with the existing configuration pattern, maintaining consistency with the codebase's approach to configuration management.

.cursor/rules/webapp.mdc (1)

12-12: Package reference updated correctly

The reference to the Redis worker has been updated from @internal/redis-worker to @trigger.dev/redis-worker, which aligns with the PR objective of publishing the redis-worker as a public package.

patches/supports-hyperlinks@2.3.0.patch (1)

9-13: Good fix for Cursor terminal hyperlinks

This patch correctly adds support for detecting when the application is running in a Cursor terminal environment by checking for the CURSOR_TRACE_ID environment variable. This addresses the PR objective of "Fixes to Cursor terminal links."

ai/references/tests.md (1)

28-32: Clear guidance on test file placement

This addition provides clear guidance on where to place test files, following the common industry practice of co-locating tests with their implementation files. This makes it easier for developers to find and maintain tests.

apps/supervisor/package.json (1)

9-9: Improved dev script with helpful error handling

The updated dev script now includes error handling that reminds users to run nvm use if the command fails. This is a good developer experience improvement that will help prevent confusion when the Node.js version isn't set correctly.

packages/core/src/v3/serverOnly/index.ts (1)

4-6: Good addition of new exports

These exports make modules from "singleton.js", "shutdownManager.js", and "k8s.js" available to importers, which aligns with the PR objective of implementing a ShutdownManager to handle graceful shutdowns across components.

packages/core/src/v3/runEngineWorker/workload/types.ts (1)

4-6: Expanded type definition with deployment tracking properties

The addition of deploymentId, deploymentVersion, and projectRef properties enhances the WorkloadClientCommonOptions type with better deployment tracking capabilities.

apps/webapp/package.json (1)

56-56: Package dependency updated from internal to public

Replacing @internal/redis-worker with @trigger.dev/redis-worker aligns with the PR objective of publishing the redis-worker as a public package.

packages/core/src/v3/serverOnly/k8s.ts (1)

1-16: Well-implemented Kubernetes environment detection

The isKubernetesEnvironment function provides a clean way to detect if the application is running in a Kubernetes environment by checking common environment variables. The optional override parameter adds flexibility for testing or forced configurations.

The implementation uses the efficient .some(Boolean) pattern to check if any of the environment variables are present.

internal-packages/redis/src/index.ts (1)

1-1: Clean TypeScript import enhancement.

The import statement now explicitly identifies RedisOptions as a type with the type keyword, which improves code clarity and can help with tree-shaking.

packages/core/src/v3/serverOnly/singleton.ts (1)

1-8: Well-implemented singleton pattern.

This utility function provides a clean way to manage singleton instances across the application. It's type-safe and properly initializes the global storage object.

A few observations:

  • Uses globalThis to store singletons, which works across different JavaScript environments
  • Properly type-parameterized to ensure type safety
  • Initializes the singleton container only once
  • Returns the cached instance when available, preventing duplicate initialization
packages/redis-worker/tsconfig.src.json (1)

1-10: Good TypeScript configuration for a publishable package.

This configuration is appropriate for a package that's being prepared for publication, with proper source mapping and composite project support.

The configuration:

  • Extends the base configuration
  • Properly includes source files
  • Enables source maps for debugging
  • Sets up composite project support for incremental builds
  • Includes custom conditions for module resolution
internal-packages/run-engine/src/engine/index.ts (2)

2-2: Package import updated to use published redis-worker.

The import has been changed from the internal package to the publicly published package, aligning with the PR objective of publishing the redis-worker.


123-123: Added shutdown timeout configuration.

The shutdownTimeoutMs property has been added to the Worker configuration, supporting the new graceful shutdown manager feature. This allows configuring how long the worker should wait for in-progress jobs to complete before stopping.

This change aligns with the PR objective of implementing a ShutdownManager to ensure proper process termination after all shutdown handlers complete.

internal-packages/run-engine/src/engine/types.ts (2)

2-2: Change from internal to published package

The import for Worker and WorkerConcurrencyOptions now comes from the published @trigger.dev/redis-worker package instead of the internal one, which aligns with the PR objective of publishing the redis-worker.


16-16: Added shutdown timeout configuration

The new shutdownTimeoutMs property provides control over worker shutdown behavior, supporting the graceful shutdown manager implementation mentioned in the PR objectives.

packages/core/src/v3/runEngineWorker/consts.ts (1)

11-12: Enhanced workload context headers

The addition of DEPLOYMENT_VERSION and PROJECT_REF headers will improve workload tracking and metrics collection capabilities, allowing for better identification of requests by deployment version and project.

packages/core/src/utils.ts (1)

5-12: Useful error handling utility

The new tryCatch function provides a clean pattern for Promise error handling, eliminating nested try/catch blocks and simplifying error management in async code.

Example usage:

const [error, result] = await tryCatch(someAsyncOperation());
if (error) {
  // Handle error
} else {
  // Use result
}
packages/cli-v3/src/entryPoints/managed-run-controller.ts (1)

154-155: Add deployment metadata to HTTP client

These additions enhance the WorkloadHttpClient with deployment version and project reference information, which corresponds to the newly added workload headers in the constants file. This provides important context for metrics collection and request identification.

packages/core/src/v3/runEngineWorker/workload/util.ts (1)

11-12: Headers extended for enriched workload context

The addition of deployment version and project reference headers enhances the contextual information passed with each workload. This change aligns with the PR objectives of improving metrics collection and is consistent with the header constants defined in consts.ts.

packages/core/src/v3/runEngineWorker/supervisor/schemas.ts (1)

146-152: New schema for suspend completion responses

This schema provides a standardized structure for suspend completion responses, with success/error status reporting. The schema correctly follows the established pattern of other API schemas in this file and properly exports both the Zod schema and inferred TypeScript type.

internal-packages/run-engine/package.json (1)

18-18: Redis worker dependency updated to public package

This change aligns with the PR objective of publishing the redis-worker by replacing the internal package reference with the public one. This promotes better code reuse and standardization across the project.

apps/webapp/app/v3/commonWorker.server.ts (2)

1-1: Import updated to use published redis-worker package

Updated import path to use the newly published @trigger.dev/redis-worker package instead of the internal one, consistent with the PR's objective of standardizing on the public package.


83-83: Graceful shutdown timeout added to worker configuration

Added shutdownTimeoutMs configuration which implements the PR objective of improving graceful shutdown management. This ensures the worker will properly complete its tasks before shutting down, with the timeout value controlled via environment variables.

apps/webapp/app/v3/legacyRunEngineWorker.server.ts (2)

1-1: Import updated to use published package

The import statement has been updated to use the published package @trigger.dev/redis-worker instead of the internal package. This change aligns with the PR objective of publishing the redis-worker package externally.


70-70:

✅ Verification successful

Added shutdown timeout configuration

Adding the shutdownTimeoutMs property improves graceful shutdown handling by allowing configuration of how long the worker should wait before forcefully terminating. This is part of the new ShutdownManager implementation mentioned in the PR objectives.


🏁 Script executed:

#!/bin/bash
# Check for the environment variable definition
rg "LEGACY_RUN_ENGINE_WORKER_SHUTDOWN_TIMEOUT_MS" --type ts

Length of output: 302


Approved: Shutdown Timeout Configuration Validated

The shutdown timeout property is correctly wired. The environment variable LEGACY_RUN_ENGINE_WORKER_SHUTDOWN_TIMEOUT_MS is set with a default value in apps/webapp/app/env.server.ts (default: 60_000) and properly referenced in apps/webapp/app/v3/legacyRunEngineWorker.server.ts. This confirms that the graceful shutdown handling via the new ShutdownManager implementation meets the PR objectives.

  • apps/webapp/app/v3/legacyRunEngineWorker.server.ts: Uses shutdownTimeoutMs: env.LEGACY_RUN_ENGINE_WORKER_SHUTDOWN_TIMEOUT_MS
  • apps/webapp/app/env.server.ts: Defines LEGACY_RUN_ENGINE_WORKER_SHUTDOWN_TIMEOUT_MS with an appropriate default
scripts/publish-prerelease.sh (5)

38-47: Enhanced error handling with cleanup functions

The addition of git_reset() and abort() functions improves code maintainability by centralizing the cleanup logic. This ensures consistent cleanup when the script is interrupted or aborted, preventing Git repository states from being left in an inconsistent state.


48-48: Added trap for interrupt signals

The trap command ensures that the abort function is called when the script receives an interrupt signal (CTRL+C), providing proper cleanup even when the script execution is forcibly terminated.


62-62: Added pause for manual changes

This pause gives the user an opportunity to make manual adjustments if needed before continuing with the build and publish process, adding flexibility to the workflow.


75-75: Using abort function for consistency

Replaced direct call to git reset --hard HEAD with the abort() function, which ensures consistent cleanup behavior throughout the script.


80-80: Using git_reset function for consistency

Replaced direct call to git reset --hard HEAD with the git_reset() function, maintaining consistency in how Git reset operations are performed throughout the script.

apps/supervisor/src/env.ts (1)

50-53:

✅ Verification successful

Added Kubernetes configuration environment variables

These new environment variables enhance Kubernetes configuration options:

  • KUBERNETES_FORCE_ENABLED: Allows forcing Kubernetes mode
  • KUBERNETES_NAMESPACE: Configures the Kubernetes namespace (default: "default")
  • EPHEMERAL_STORAGE_SIZE_LIMIT and EPHEMERAL_STORAGE_SIZE_REQUEST: Define storage resources

These additions provide better flexibility and control for Kubernetes deployments, with sensible defaults.


🏁 Script executed:

#!/bin/bash
# Check where these environment variables are used in the codebase
rg "env\.KUBERNETES_NAMESPACE|env\.EPHEMERAL_STORAGE_SIZE" --type ts

Length of output: 506


Attention: Kubernetes Environment Variables Added and Verified

The addition in apps/supervisor/src/env.ts (lines 50-53) correctly introduces the following environment variables using sensible defaults:

  • KUBERNETES_FORCE_ENABLED: Enables forced Kubernetes mode.
  • KUBERNETES_NAMESPACE: Sets the Kubernetes namespace (default is "default").
  • EPHEMERAL_STORAGE_SIZE_LIMIT & EPHEMERAL_STORAGE_SIZE_REQUEST: Define storage resource limits and requests.

Verification using rg confirms these variables are referenced appropriately in the codebase:

  • apps/kubernetes-provider/src/index.ts: Utilizes process.env.KUBERNETES_NAMESPACE.
  • apps/supervisor/src/workloadManager/kubernetes.ts: References both EPHEMERAL_STORAGE_SIZE_LIMIT and EPHEMERAL_STORAGE_SIZE_REQUEST along with the namespace.

No issues were found with their usage, and the integration provides enhanced control over Kubernetes deployments.

packages/redis-worker/tsup.config.ts (1)

1-28: Well-structured tsup configuration for redis-worker package

This configuration:

  • Builds from the src/index.ts entry point
  • Outputs both CommonJS and ESM formats for maximum compatibility
  • Generates TypeScript declaration files
  • Enables source maps for better debugging
  • Applies tree-shaking to reduce bundle size
  • Ensures internal packages and ESM-only packages are always bundled
  • Adds a banner for ESM format to support CommonJS require calls

The configuration follows best practices for JavaScript/TypeScript package publishing, making the package widely compatible across different environments.

ai/references/repo.md (1)

17-17: Documentation updated to reflect the new redis-worker package

The addition of the @trigger.dev/redis-worker package documentation aligns with the PR objective of publishing the redis-worker. This change correctly documents the new public package that replaces the internal implementation.

packages/redis-worker/src/queue.test.ts (5)

33-36: Improved error handling in tests

Adding null checks for dequeued items ensures that the test fails explicitly when an item isn't dequeued properly, rather than causing potential downstream errors that would be harder to debug.


55-58: Consistent error handling pattern

This null check follows the same pattern established earlier, ensuring consistent error handling throughout the test file.


266-268: Robust error checking for multiple dequeued items

This check ensures both items in the array exist before proceeding with acknowledgment operations, preventing potential runtime errors.


287-289: Consistent error handling for last item

Following the established pattern, this check ensures the last dequeued item exists before acknowledging it.


337-339: Error handling for redriven items

This check verifies that items redriven from the Dead Letter Queue are properly dequeued, maintaining the error handling consistency throughout the tests.

packages/core/src/v3/schemas/runEngine.ts (2)

215-221: New checkpoint schema with improved type safety

The new DequeueMessageCheckpoint schema provides better type safety by using the CheckpointType enum instead of a string for the type property, which helps prevent potential type errors.


224-260: Updated DequeuedMessage schema with optional checkpoint

The updated DequeuedMessage schema now references the new DequeueMessageCheckpoint type for the checkpoint field and marks it as optional. This change aligns with the PR's objective of updating checkpoint functionality.

apps/webapp/app/env.server.ts (3)

427-427: Added configuration for Run Engine worker shutdown timeout

This new environment variable supports the ShutdownManager implementation by providing a configurable timeout for graceful shutdowns of the Run Engine worker processes.


595-595: Added configuration for Legacy Run Engine worker shutdown timeout

This environment variable ensures consistent shutdown behavior for legacy run engine workers, maintaining compatibility while implementing the new ShutdownManager.


638-638: Added configuration for Common worker shutdown timeout

This environment variable completes the set of configurations needed for the ShutdownManager to handle graceful shutdowns across all worker types, ensuring that processes exit as expected after shutdown handlers complete.

packages/core/src/v3/serverOnly/shutdownManager.test.ts (11)

1-2: Good import statements
No issues found, the imports from Vitest and the local shutdownManager are used properly.


11-20: Proper test setup
Disabling concurrency ensures that global mocking of process.exit doesn't cause concurrency conflicts. Good approach to reset the manager's state in beforeEach.


22-30: Comprehensive test coverage for registration
The test ensures that the handler is correctly stored. Great job verifying both the handler and the associated signals.


32-40: Duplicate registration validation
Throwing an error for duplicate handler names is an excellent safeguard. The test fully confirms the behavior.


41-47: Proper custom signal registration
Verifies that the specified signal is honored. This helps ensure that only the intended signals trigger the handler.


49-61: Correct multi-handler invocation
Ensures that all relevant handlers are invoked and that the process exit code is correct for SIGTERM. Excellent coverage.


63-75: Specific signal filtering
Correctly ensures that only those handlers registered for SIGTERM are invoked. This is an important separation of concerns.


77-89: Error handling
Good approach verifying that a single failing handler does not block the overall shutdown sequence.


91-100: Idempotent shutdown
Nicely ensures that repeated calls to shutdown do not run the sequence multiple times, preventing duplicates or race conditions.


102-115: Correct exit code assertion
Verifies the correct numeric offset for SIGINT and SIGTERM. This ensures consistent behavior across signals.


117-175: Ensuring asynchronous handlers complete
This test verifies that all asynchronous handlers are awaited and the process only exits afterward. Strong coverage of concurrency aspects.

packages/core/src/v3/runEngineWorker/supervisor/http.ts (2)

18-19: New imports for run suspension
Importing WorkerApiSuspendRunRequestBody and WorkerApiSuspendRunResponseBody sets the foundation for the new suspend completion API calls.


225-248: submitSuspendCompletion method
This addition is consistent with other fetch-based methods, leveraging wrapZodFetch for typed responses. It nicely handles both the request body and optional runnerId.

packages/core/src/v3/schemas/checkpoints.ts (3)

1-1: Import from runEngine
Includes DequeuedMessage for usage in the new CheckpointServiceRestoreRequestBody definition.


11-15: New properties in CheckpointServiceSuspendRequestBody
The addition of runId, snapshotId, runnerId, projectRef, and deploymentVersion clarifies the context of the suspended checkpoint, aligning with the new design requirements.


34-34: Refined restore request body
Replacing the prior discriminated union with DequeuedMessage ensures the required checkpoint field is included. It's more direct and simplifies the schema.

packages/redis-worker/package.json (4)

1-5: Confirm version and package identification.
The package name, scope, and version look accurate for a publicly published package. However, please verify that 3.3.17 aligns with your release strategy to avoid confusion in the monorepo.


6-8: Publish config check.
Using "access": "public" is correct for a publicly published scope. Ensure you have the correct npm permissions under the @trigger.dev scope before publishing.


25-32: Check workspace dependencies.
Referencing @trigger.dev/core with "workspace:*" is likely correct within the monorepo, but ensure the workspace resolution is configured and pinned to a consistent version at publish time.


46-55: Exports structure looks good.
Declaring multiple entry points (CommonJS, ESM, and types) in package.json using the exports field is a solid approach for modern Node.js packaging. This is consistent and should provide a smooth developer experience.

apps/supervisor/src/workloadManager/kubernetes.ts (2)

19-19: Confirm fallback for Kubernetes namespace.
Replacing the default "default" namespace with env.KUBERNETES_NAMESPACE is sensible. However, ensure that KUBERNETES_NAMESPACE is always set or that you provide a fallback in case it’s undefined.


203-213: Environment-based ephemeral storage settings.
Switching from hardcoded storage values to environment variables increases flexibility. Double-check that your environment variables EPHEMERAL_STORAGE_SIZE_REQUEST and EPHEMERAL_STORAGE_SIZE_LIMIT are always properly set and validated during deployment.

packages/core/src/v3/serverOnly/shutdownManager.ts (2)

8-22: Constructor attaches signal listeners.
Automatically binding to “SIGTERM” and “SIGINT” inside the constructor ensures consistent behavior across the app. Confirm that no other conflicting shutdown logic is present in the codebase.


89-89: Singleton export is clean.
Exporting a single shutdownManager instance avoids multiple competing signal handlers throughout the code, which fosters consistency.

apps/supervisor/src/index.ts (5)

16-20: Imports look consistent and purposeful.
These imports from @trigger.dev/core/v3/serverOnly correctly align the HTTP server and environment detection utilities with the new orchestrator logic.


32-32: Ensure the override parameter is properly typed.
Passing env.KUBERNETES_FORCE_ENABLED as the override parameter is correct, but double-check that it’s strictly a boolean to avoid unwanted type coercion.

Would you like to validate that KUBERNETES_FORCE_ENABLED is indeed a boolean in apps/supervisor/src/env.ts?


101-101: Clear orchestration setting.
Using "KUBERNETES" or "DOCKER" based on environment is straightforward. This seamlessly integrates with the new checkpoint logic.


135-137: Concise destructuring of message data.
Separating checkpoint from the rest of the message clarifies the restore path. No issues noted.


149-152: Correct usage of spread for checkpoint restore body.
Merging ...rest and checkpoint into a single object is consistent with the new checkpoint client structure. No concerns.

apps/supervisor/src/workloadServer/index.ts (7)

107-109: Runner ID extraction.
This reuses the new helper neatly. Implementation looks consistent.


111-113: Deployment ID accessor.
Using the same approach to fetch the header is clear and maintains uniformity.


115-117: Deployment version accessor.
Again, consistency is maintained with the shared method.


119-121: Project reference accessor.
Simple adaptation of the standard header fetch pattern. No issues.


232-233: Extracting new headers in suspend route.
Capturing deploymentVersion and projectRef ensures sufficient context for checkpoint requests.


235-235: Validate presence of headers.
Checking all required header values protects against invalid requests. Well done.


263-268: Expanded checkpoint suspend body.
Including runnerId, deploymentVersion, and projectRef aligns with the new checkpoint schema. Looks correct.

packages/core/src/v3/serverOnly/checkpointClient.ts (10)

7-7: Pulling in new types.
Importing CheckpointType aligns with the updated orchestrator logic. No issues here.


13-13: Added orchestrator property.
Defining orchestrator: CheckpointType clarifies which environment engine is controlling checkpoints.


19-19: Constructor refactor.
Accepting a single CheckpointClientOptions parameter is cleaner and more scalable for future config.


24-24: body parameter included for clarity.
Collecting request data in body fosters consistency with the new checkpoint schema.


28-29: Omitted 'type' for dynamic set.
Using Omit<CheckpointServiceSuspendRequestBodyInput, "type"> is a neat approach to let orchestrator logic fill in.


30-45: Suspend run request.
Forming the URL and merging type with body is logically sound. No security or concurrency concerns identified.


51-51: Breakout of extra property references.
Listing body, nicely for future expansions of function signature.


91-91: body shaped for restore.
Matches the new schema for restoring runs from checkpoint.


95-95: Adopt new typed body for restore.
Ensures consistent data for the checkpoint restore process.


97-109: Refined restore request.
Using fetch with the updated URL and JSON body is aligned with the new structure. Checking status code first is good.

packages/core/src/v3/serverOnly/httpServer.ts (11)

5-6: Adding Prometheus client utilities.
Importing Registry, Histogram, and Counter plus a tryCatch helper sets up improved observability and robust error handling.


53-53: Metrics options introduced.
The optional metrics config (with register, expose, collect) is flexible for toggling telemetry.

Also applies to: 56-60


64-66: Static metric definitions.
httpRequestDuration and httpRequestTotal are well-named to track request performance.


67-68: Reference to metrics registry.
Storing metricsRegister locally preserves modular design.


77-80: Consumer-provided metrics config.
Capturing collectMetrics and exposeMetrics ensures the server can behave differently across environments.


81-90: Lazy initialization of metrics.
Only defining histograms/counters once avoids redundant metric definitions. This is a valid approach.


91-101: Defensive approach for reusing counters.
Gracefully skipping re-initialization if the counters exist. This prevents accidental re-registration.


102-117: Optional /metrics route exposure.
Serving metrics via /metrics is standard practice, and conditionally exposing it is wise.


121-121: Capturing request time at the start.
Assigning startTime from process.hrtime() is standard for measuring request duration.


190-200: Error handling with tryCatch simplification.
Using [error] = await tryCatch(...) clearly segregates exceptions from normal returns, simplifying error checks.

Also applies to: 202-202


250-266: Intuitive metrics collection method.
Recording request duration (Histogram) and total requests (Counter) is a reliable approach. The label set is thorough.

packages/redis-worker/src/worker.ts (21)

11-11: No issue with this new import.
It correctly references the shutdown manager for coordinated shutdown handling.


12-12: Prometheus client import looks good.
Ensures that we can collect and export various metrics.


49-49: Optional shutdown timeout parameter is a welcome addition.
It makes the worker’s shutdown behavior more flexible for different operational needs.


52-52: Optional metrics definition is clear and unobtrusive.
Allows consumers to opt into Prometheus-based metrics when desired.


72-81: Good approach centralizing metric fields in a private object.
This design keeps metric references organized and simplifies checks.


88-88: Storing shutdown timeout in a private field is fine.
Straightforward usage later in the shutdown implementation.


97-97: Defaulting to 60 seconds is sensible.
This ensures a predictable fallback if shutdownTimeoutMs is not provided.


118-122: Short-circuiting metric setup if no registry is provided makes sense.
Prevents unnecessary processing when metrics are disabled.


124-131: Well-defined enqueue duration histogram.
Naming, labels, and bucket ranges look appropriate for measuring enqueue operations.


132-138: Dequeue duration histogram is consistently set up.
Using the same bucket strategy promotes uniform metrics tracking.


140-147: Job duration histogram is well-tuned.
Including larger buckets is practical for capturing longer job executions.


149-155: Acknowledgment duration histogram is straightforward and consistent.
No concerns here.


157-163: Redrive duration histogram is similarly structured.
Retains a uniform approach to metric collection.


165-171: Reschedule duration histogram setup is sound.
Mirrors the rest of the histogram definitions effectively.


224-228: Visibility timeout validation is helpful.
Graceful error handling ensures each job type defines a proper timeout.


232-245: Histogram-wrapped enqueue call is well-implemented.
Encapsulating the queue operation in withHistogram gives clear insights into enqueue performance.


311-318: Robust dequeue measurement.
Wrapping the critical I/O call in withHistogram helps diagnose queueing performance.


363-372: Wrapping job execution in the jobDuration histogram is excellent.
Provides precise measurement of each job’s execution time.


462-477: Implementation of the withHistogram helper is correct.
Starting the timer before awaiting the promise, then stopping in a finally block, ensures accurate timings.


503-506: Redrive histogram usage is consistent.
Captures redrive performance, helping track Dead Letter Queue recovery times.


514-514: Registration of the worker with the shutdown manager is well-placed.
Ensures that the worker is integrated into the global shutdown lifecycle.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
packages/core/src/v3/serverOnly/httpServer.ts (1)

212-212: Potentially unreachable code

This default response is unlikely to be reached because:

  1. Success cases return from the route handler
  2. Error cases return in the catch block
  3. The code is outside the try/catch/finally structure

Consider removing this line or adding a comment explaining when it would be reached.

-      return reply.empty(501);
+      // This should never be reached as all request paths should return a response
+      // If we get here, it means there's a bug in the request handling logic
+      logger.error("No response was sent, this should never happen");
+      return reply.empty(500);
packages/core/src/v3/serverOnly/shutdownManager.ts (1)

8-91: Nice implementation of the ShutdownManager class.

The class provides a robust solution for graceful shutdowns with good error handling and detailed logging. Using Promise.allSettled ensures all handlers can attempt to run even if some fail.

I have one suggestion - consider adding a timeout option for individual handlers to prevent one slow handler from blocking others:

 register(
   name: string,
   handler: ShutdownHandler,
+  options?: { timeoutMs?: number },
   signals: ShutdownSignal[] = ["SIGTERM", "SIGINT"]
 ) {
   if (this.handlers.has(name)) {
     throw new Error(`Shutdown handler "${name}" already registered`);
   }
-  this.handlers.set(name, { handler, signals });
+  this.handlers.set(name, { handler, signals, timeoutMs: options?.timeoutMs });
 }

Then in the shutdown method, you could implement a timeout for each handler:

// Apply timeout if specified
const handlerWithTimeout = async (name: string, handler: ShutdownHandler, timeoutMs?: number) => {
  if (!timeoutMs) return handler(signal);
  
  return Promise.race([
    handler(signal),
    new Promise((_, reject) => 
      setTimeout(() => reject(new Error(`Handler "${name}" timed out after ${timeoutMs}ms`)), timeoutMs)
    )
  ]);
};
packages/redis-worker/src/worker.ts (2)

232-245: Consider adding error metrics.

While you're tracking duration metrics, consider also tracking error counts for each operation type to improve observability of failure rates.

 constructor(private options: WorkerOptions<TCatalog>) {
   // Existing code...
   
   if (!this.metrics.register) {
     return;
   }
   
   // Existing histograms...
   
+  this.metrics.errors = new Counter({
+    name: "redis_worker_errors_total",
+    help: "The total number of errors encountered",
+    labelNames: ["worker_name", "operation", "job_type"],
+    registers: [this.metrics.register],
+  });
 }

 // Then in withHistogram:
 private async withHistogram<T>(
   histogram: Histogram<string> | undefined,
   promise: Promise<T>,
   labels?: Record<string, string | number>,
+  operationType?: string
 ): Promise<T> {
   if (!histogram || !this.metrics.register) {
     return promise;
   }

   const end = histogram.startTimer({ worker_name: this.options.name, ...labels });
   try {
     return await promise;
   } catch (error) {
+    if (this.metrics.errors && operationType) {
+      this.metrics.errors.inc({
+        worker_name: this.options.name,
+        operation: operationType,
+        job_type: labels?.job_type || "unknown"
+      });
+    }
+    throw error;
   } finally {
     end();
   }
 }

527-535: Consider adding a force termination mechanism after timeout.

Currently, if the shutdown timeout is reached, an error is logged but worker loops might continue running. Consider adding a mechanism to force termination of lingering worker loops after the timeout.

 await Promise.race([
   Promise.all(this.workerLoops),
   Worker.delay(this.shutdownTimeoutMs).then(() => {
     this.logger.error("Worker shutdown timed out", {
       signal,
       shutdownTimeoutMs: this.shutdownTimeoutMs,
     });
+    // Force termination of any remaining work
+    this.workerLoops.forEach((_, index) => {
+      this.logger.warn(`Forcefully terminating worker loop ${index}`);
+      // Set a flag or use another mechanism to force worker loops to exit
+      this.forceTerminateWorkers = true;
+    });
   }),
 ]);

This would require adding checks in the worker loop to exit immediately if forceTerminateWorkers is true.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa3e57d and 821b918.

📒 Files selected for processing (3)
  • packages/core/src/v3/serverOnly/httpServer.ts (4 hunks)
  • packages/core/src/v3/serverOnly/shutdownManager.ts (1 hunks)
  • packages/redis-worker/src/worker.ts (13 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
packages/core/src/v3/serverOnly/shutdownManager.ts (1)
packages/core/src/v3/serverOnly/singleton.ts (1) (1)
  • singleton (1-8)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: typecheck / typecheck
🔇 Additional comments (18)
packages/core/src/v3/serverOnly/httpServer.ts (7)

5-6: Good integration of Prometheus metrics

Added necessary imports for metrics collection and the tryCatch utility for improved error handling.


56-60: Well-designed metrics configuration option

The metrics configuration object provides good flexibility with three optional flags:

  • register: For providing the metrics registry
  • expose: Controls whether to expose the /metrics endpoint
  • collect: Controls whether metrics are collected

This approach allows fine-grained control over metrics functionality.


64-67: Good use of static properties for metrics

Using static properties for metrics ensures they're shared across all HttpServer instances, which is appropriate for application-wide metrics collection.


77-117: Well-implemented metrics initialization

The metrics initialization logic is robust:

  1. Only initializes when a registry is provided and collection is enabled
  2. Prevents re-initialization of metrics
  3. Properly configures histogram and counter with appropriate labels
  4. Conditionally exposes the /metrics endpoint with proper error handling

190-204: Good error handling refactoring

The tryCatch utility simplifies error handling logic while maintaining proper error reporting.


208-210: Proper metrics collection in finally block

Using a finally block ensures metrics are collected regardless of request outcome, which is essential for accurate monitoring.


250-267: Well-implemented metrics collection method

The collectMetrics method:

  1. Properly handles the case when metrics are disabled
  2. Correctly calculates request duration using high-resolution timing
  3. Extracts relevant information from request and response
  4. Uses consistent labeling for metrics
packages/core/src/v3/serverOnly/shutdownManager.ts (1)

93-93: Good use of the singleton pattern.

Using the singleton pattern ensures there's only one instance of the ShutdownManager throughout the application, which is essential for consistent shutdown management.

packages/redis-worker/src/worker.ts (10)

11-12: Importing the new shutdownManager is a good approach.

This properly integrates with the new centralized shutdown management system.


49-54: Good addition of the shutdownTimeoutMs option.

This allows for configurable shutdown timeouts, which is an important part of graceful shutdown.


72-80: Well-structured metrics object.

The metrics object is well-organized with all related metrics grouped together.


97-98: Good default timeout value.

Setting a reasonable default of 60 seconds for shutdown timeout is a good practice.


118-171: Comprehensive metrics instrumentation.

You've added appropriate metrics for all key operations with good naming conventions and useful label dimensions. The buckets are well-chosen for different operation types.


462-477: Excellent reusable helper method for metrics.

The withHistogram method is a clean abstraction that avoids code duplication for metrics tracking across different operations.


514-514: Worker registration with shutdownManager is correct.

The worker is now properly registered with a unique name that includes the worker's name, which prevents conflicts in multi-worker scenarios.


517-541: Good implementation of graceful shutdown with timeout.

The shutdown method now properly handles signals and includes a timeout mechanism using Promise.race to prevent hanging. The logging is also comprehensive.


544-544: Fixed unregister name mismatch.

This fix properly addresses the previous issue where the registration and unregistration used different names.


224-228: Good validation for missing visibility timeout.

This check ensures that jobs always have a visibility timeout, preventing potential issues during processing.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/redis-worker/src/worker.test.ts (1)

11-13: Use of private API in test cleanup

The afterEach hook uses type casting to access what appears to be a private method (_reset) on the shutdownManager. While this works for tests, consider whether this API could be formalized:

  1. If _reset is meant to be used in tests only, consider exposing a public test-specific method like resetForTesting() instead of using type casting
  2. If this reset functionality might be needed elsewhere, consider making it part of the public API

Type casting to bypass the type system is a code smell that often indicates API design issues.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 821b918 and 59afe71.

📒 Files selected for processing (1)
  • packages/redis-worker/src/worker.test.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / 🧪 Unit Tests
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
packages/redis-worker/src/worker.test.ts (2)

4-4: Import changes related to shutdown management

The two new imports (afterEach from vitest and shutdownManager from @trigger.dev/core/v3/serverOnly) support the implementation of reset functionality between tests. This aligns with the PR's goal of implementing graceful shutdown management.

Also applies to: 8-8


11-13: Important test cleanup for shutdown management

The addition of this afterEach hook is critical for test isolation. Without it, the state of the shutdownManager would persist between tests, potentially causing unpredictable behavior in the test suite as registered shutdown handlers from one test could affect others.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/core/src/v3/serverOnly/shutdownManager.ts (1)

50-93: Consider adding a timeout mechanism for graceful shutdown

The shutdown implementation correctly handles multiple aspects of graceful shutdown including parallel execution of handlers and error handling. However, if a handler takes too long or hangs, it might delay the process exit indefinitely.

Consider adding a timeout mechanism that would force exit after a configurable duration:

 async shutdown(signal: ShutdownSignal) {
   if (!this.isEnabled()) return;

   if (this.isShuttingDown) return;
   this.isShuttingDown = true;

   this.logger.info(`Received ${signal}. Starting graceful shutdown...`);

   // Get handlers that are registered for this signal
   const handlersToRun = Array.from(this.handlers.entries()).filter(([_, { signals }]) =>
     signals.includes(signal)
   );

+  // Set a timeout to force exit if handlers take too long
+  const forceExitTimeout = setTimeout(() => {
+    this.logger.warn(`Forcing exit after timeout during ${signal} shutdown`);
+    process.exit(128 + this.signalNumbers[signal]);
+  }, 30000); // 30 seconds timeout, could be made configurable

   try {
     const results = await Promise.allSettled(
       handlersToRun.map(async ([name, { handler }]) => {
         try {
           this.logger.info(`Running shutdown handler: ${name}`);
           await handler(signal);
           this.logger.info(`Shutdown handler completed: ${name}`);
         } catch (error) {
           this.logger.error(`Shutdown handler failed: ${name}`, { error });
           throw error;
         }
       })
     );

     // Log any failures
     results.forEach((result, index) => {
       if (result.status === "rejected") {
         const handlerEntry = handlersToRun[index];
         if (handlerEntry) {
           const [name] = handlerEntry;
           this.logger.error(`Shutdown handler "${name}" failed:`, { reason: result.reason });
         }
       }
     });
   } catch (error) {
     this.logger.error("Error during shutdown:", { error });
   } finally {
+    clearTimeout(forceExitTimeout);
     // Exit with the correct signal number
     process.exit(128 + this.signalNumbers[signal]);
   }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59afe71 and 921edcb.

📒 Files selected for processing (2)
  • packages/core/src/v3/serverOnly/shutdownManager.test.ts (1 hunks)
  • packages/core/src/v3/serverOnly/shutdownManager.ts (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
packages/core/src/v3/serverOnly/shutdownManager.test.ts (1)
packages/core/src/v3/serverOnly/shutdownManager.ts (1) (1)
  • ShutdownManager (9-110)
packages/core/src/v3/serverOnly/shutdownManager.ts (1)
packages/core/src/v3/serverOnly/singleton.ts (1) (1)
  • singleton (1-8)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / 🧪 Unit Tests
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (11)
packages/core/src/v3/serverOnly/shutdownManager.ts (5)

5-7: Good type safety with restricted signal types

The intentional limitation to only SIGTERM and SIGINT signals is a good practice that prevents unexpected issues with signal handling. This restriction is properly enforced through TypeScript's Extract utility type.


9-25: Well-structured ShutdownManager initialization

The class has a well-designed constructor that:

  1. Properly initializes internal state
  2. Sets up signal handlers conditionally based on test environment
  3. Provides signal number mapping for correct exit codes

The disableForTesting parameter makes the class testable without interfering with the actual process signals.


27-38: Clean handler registration with validation

The register method has good validation that prevents duplicate handler registration and clearly defines the default signals. The early return when disabled is an efficient approach.


95-101: Effective environment detection for testing

The isEnabled method correctly determines whether the shutdown manager should be active based on the test environment, which prevents interference with tests while allowing tests to explicitly enable it when needed.


112-112: Well-implemented singleton pattern

Using the singleton pattern ensures that only one instance of the ShutdownManager exists across the application, which is crucial for proper signal handling and coordination of shutdown procedures.

packages/core/src/v3/serverOnly/shutdownManager.test.ts (6)

4-11: Well-structured test setup with proper isolation

The test setup properly:

  1. Disables concurrent execution for signal handling tests
  2. Mocks process.exit to prevent actual process termination
  3. Clears all mocks before each test to maintain isolation

This ensures reliable and deterministic test execution without side effects.


12-43: Comprehensive handler registration tests

These tests thoroughly verify the registration functionality:

  1. Successful registration with default signals
  2. Error handling for duplicate registrations
  3. Custom signal registration

The tests properly validate both the presence of handlers and their configured signals.


45-75: Thorough shutdown handler execution tests

These tests effectively verify that:

  1. All registered handlers for a signal are called with the correct signal parameter
  2. Only handlers registered for the specific signal are executed
  3. The process exits with the correct exit code

This ensures the core functionality of the shutdown manager works as expected.


77-91: Good error handling verification

This test correctly verifies that the shutdown process continues even if some handlers fail, which is crucial for robust graceful shutdown behavior. It confirms both successful and failing handlers are attempted, and the process still exits properly.


93-122: Proper verification of shutdown idempotence and exit codes

These tests effectively verify:

  1. The shutdown sequence is only executed once even when called multiple times
  2. The process exits with the correct signal-specific exit code

This ensures reliable behavior when multiple shutdown signals are received.


124-182: Excellent asynchronous handler completion verification

This test thoroughly verifies that:

  1. All handlers complete execution before the process exits
  2. Handlers run in parallel but the exit only happens after all complete
  3. The sequence of operations is tracked and verified

The test implementation is impressive, with good use of mock implementation overrides and restoration. It's an excellent example of validating asynchronous behavior in a deterministic way.

@nicktrn nicktrn merged commit 49a3f72 into main Mar 21, 2025
11 of 13 checks passed
@nicktrn nicktrn deleted the feat/redis-worker-package branch March 21, 2025 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants