Skip to content

Conversation

@adityachoudhari26
Copy link
Contributor

@adityachoudhari26 adityachoudhari26 commented Sep 18, 2025

Summary by CodeRabbit

  • New Features

    • Added observability for the event-queue service: tracing and structured logging.
    • Enabled automatic instrumentation for HTTP, database, cache, and logging to improve diagnostics.
    • Production-only initialization to avoid development overhead.
  • Chores

    • Container startup updated to preload instrumentation at runtime.
    • Added required observability dependencies.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds OpenTelemetry to the event-queue service: new instrumentation modules and dependencies, Docker CMD preloads instrumentation via Node's -r option, and the ResourceCreated handler is wrapped to create a tracing span.

Changes

Cohort / File(s) Summary
Container startup
apps/event-queue/Dockerfile
CMD changed to preload instrumentation using node -r ./instrumentation-node.js index.js (exec form preserved with -r flag). WORKDIR remains apps/event-queue/dist/.
Dependencies
apps/event-queue/package.json
Adds OpenTelemetry packages: @opentelemetry/api, @opentelemetry/auto-instrumentations-node, @opentelemetry/exporter-logs-otlp-http, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/resources, @opentelemetry/sdk-logs, @opentelemetry/sdk-node, @opentelemetry/sdk-trace-base, @opentelemetry/semantic-conventions.
Event handler tracing
apps/event-queue/src/events/resources.ts
newResource handler wrapped with withSpan("new-resource", ...); span sets resource.id attribute and retains existing workspace load and dispatch logic.
Instrumentation runtime (Node)
apps/event-queue/src/instrumentation-node.ts
New file: configures NodeSDK for service ctrlplane/event-queue, OTLP HTTP trace/log exporters, Batch processors, AlwaysOnSampler, and auto-instrumentations (http, pg with options, ioredis, winston; disables fs/net/dns). Starts SDK and handles SIGTERM shutdown.
Instrumentation gate
apps/event-queue/src/instrumentation.ts
New file: exports async register() that conditionally imports ./instrumentation-node.js in production (dynamic import) to defer instrumentation loading.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Docker
  participant Node as Node runtime
  participant Instr as instrumentation-node.js
  participant App as index.js

  Docker->>Node: Start container (CMD: node -r ./instrumentation-node.js index.js)
  activate Node
  Node->>Instr: Preload instrumentation module (-r)
  activate Instr
  Instr->>Instr: Configure NodeSDK (service resource, exporters, processors, sampler)
  Instr->>Instr: Enable auto-instrumentations (http, pg, ioredis, winston)
  Instr->>Instr: sdk.start()
  deactivate Instr
  Node->>App: Execute index.js
  deactivate Node
Loading
sequenceDiagram
  autonumber
  participant Bus as Event Bus
  participant Handler as newResource (withSpan)
  participant Span as OTel Span
  participant WS as WorkspaceStore
  participant Pipe as OperationPipeline

  Bus->>Handler: Event.ResourceCreated
  activate Handler
  Handler->>Span: Start span "new-resource"
  note right of Span: set attribute `resource.id`
  Handler->>WS: Load workspace
  alt Workspace missing
    Handler-->>Span: End span
    Handler-->>Bus: Return (no-op)
  else Workspace found
    Handler->>Pipe: Dispatch update
    Handler-->>Span: End span
  end
  deactivate Handler
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • jsbroks

Poem

I thump and trace through code and queue,
OTLP carrots, bright and new.
A tiny span around each seed,
I hop, I log, I do the deed. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "chore: add traces to event queue" is concise, follows conventional commit style, and accurately summarizes the primary change—adding OpenTelemetry tracing and related instrumentation/dependencies to the event-queue service (including new instrumentation files and a Docker CMD update). It is clear enough for a teammate scanning PR history to understand the main intent.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e5e2098 and 2a09b70.

📒 Files selected for processing (2)
  • apps/event-queue/Dockerfile (1 hunks)
  • apps/event-queue/src/instrumentation-node.ts (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
apps/event-queue/src/instrumentation-node.ts (1)

43-45: Use span attributes instead of log record fields for service name

Setting resource.service.name directly on the log record is unconventional. The service name should be propagated through the resource context automatically.

Consider removing the logHook or using proper span attributes:

      "@opentelemetry/instrumentation-winston": {
        enabled: true,
-        logHook: (span, record) => {
-          record["resource.service.name"] = "ctrlplane/event-worker";
-        },
      },
apps/event-queue/src/events/resources.ts (2)

22-24: Consider using a shared tracer instance

Creating a separate tracer for each handler may be unnecessary. Consider using a shared tracer for all resource event handlers.

-const newResourceTracer = trace.getTracer("new-resource");
-const withSpan = makeWithSpan(newResourceTracer);
+const resourceTracer = trace.getTracer("event-queue.resources");
+const withSpan = makeWithSpan(resourceTracer);

25-34: Add tracing to other resource handlers for consistency

Only newResource has tracing while other handlers (updatedResource, deletedResource, etc.) don't. Consider adding spans to all handlers for comprehensive observability.

Would you like me to generate the code to add tracing to all resource event handlers for consistency?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between aeda0e6 and e5e2098.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • apps/event-queue/Dockerfile (1 hunks)
  • apps/event-queue/package.json (1 hunks)
  • apps/event-queue/src/events/resources.ts (2 hunks)
  • apps/event-queue/src/instrumentation-node.ts (1 hunks)
  • apps/event-queue/src/instrumentation.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use TypeScript with explicit types (prefer interfaces for public APIs)
Import styles: Use named imports, group imports by source (std lib > external > internal)
Consistent type imports: import type { Type } from "module"
Prefer async/await over raw promises
Handle errors explicitly (use try/catch and typed error responses)

Files:

  • apps/event-queue/src/instrumentation-node.ts
  • apps/event-queue/src/instrumentation.ts
  • apps/event-queue/src/events/resources.ts

⚙️ CodeRabbit configuration file

**/*.{ts,tsx}: Note on Error Handling:
Avoid strict enforcement of try/catch blocks. Code may use early returns, Promise chains (.then().catch()), or other patterns for error handling. These are acceptable as long as they maintain clarity and predictability.

Files:

  • apps/event-queue/src/instrumentation-node.ts
  • apps/event-queue/src/instrumentation.ts
  • apps/event-queue/src/events/resources.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (CLAUDE.md)

Formatting: Prettier is used with @ctrlplane/prettier-config

Files:

  • apps/event-queue/src/instrumentation-node.ts
  • apps/event-queue/src/instrumentation.ts
  • apps/event-queue/package.json
  • apps/event-queue/src/events/resources.ts
🧬 Code graph analysis (1)
apps/event-queue/src/events/resources.ts (3)
packages/logger/src/index.ts (2)
  • trace (7-7)
  • makeWithSpan (74-95)
apps/event-queue/src/events/index.ts (1)
  • Handler (71-73)
apps/event-queue/src/workspace/workspace.ts (1)
  • WorkspaceManager (101-121)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build (linux/amd64)
  • GitHub Check: build (linux/amd64)
  • GitHub Check: Format
  • GitHub Check: Typecheck
  • GitHub Check: Lint
  • GitHub Check: build (linux/amd64)
  • GitHub Check: build (linux/amd64)
  • GitHub Check: build (linux/amd64)
🔇 Additional comments (3)
apps/event-queue/src/instrumentation.ts (1)

1-4: Verify the runtime import path resolves correctly

Build artifacts not found at apps/event-queue/dist and tsconfig.json compilerOptions.module is null — cannot confirm the imported file exists after build.

  • Ensure compiled output contains apps/event-queue/dist/instrumentation-node.js (or update the import to the actual emitted filename/extension).
  • Verify tsconfig.json outDir/module/emit settings so the emitted module format/extension (.js/.cjs/.mjs) matches the runtime import.

Location: apps/event-queue/src/instrumentation.ts:1-4

apps/event-queue/src/instrumentation-node.ts (1)

17-17: Remove unnecessary as any casts on BatchSpanProcessor

Drop the as any cast and use the concrete typed instance; run the TypeScript typechecker and fix any real type mismatches instead of suppressing them.

Affected files:

  • apps/event-worker/src/instrumentation-node.ts
  • apps/event-queue/src/instrumentation-node.ts
  • apps/webservice/src/instrumentation-node.ts

Example replacement:
spanProcessors: [new BatchSpanProcessor(new OTLPTraceExporter())],

apps/event-queue/package.json (1)

22-30: Align OpenTelemetry package versions for compatibility

apps/event-queue/package.json (lines 22–30) mixes @opentelemetry majors (api@^1.9.0, auto-instrumentations@^0.52.1, sdk/exporters@^0.54.2, resources/semantic-conventions@^1.27.0). Align to a single compatible stream — e.g., upgrade SDK/exporter packages to the stable 2.0.x line and update auto-instrumentations to a matching release (keep @opentelemetry/api on the compatible 1.9.x series if required). Pin matching major versions across all @opentelemetry/* packages and run telemetry integration tests.

},
}),
],
sampler: new AlwaysOnSampler(),
Copy link
Contributor

@coderabbitai coderabbitai bot Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider configurable sampling strategy for production

Using AlwaysOnSampler in production can generate excessive trace data and incur high costs. Consider making the sampling strategy configurable.

Apply this diff to make sampling configurable:

+import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";
+
+const samplingRatio = parseFloat(process.env.OTEL_SAMPLING_RATIO || "0.1");
+const sampler = samplingRatio >= 1 
+  ? new AlwaysOnSampler()
+  : new TraceIdRatioBasedSampler(samplingRatio);
+
 const sdk = new NodeSDK({
   // ... other config
-  sampler: new AlwaysOnSampler(),
+  sampler,
 });

Would you like me to create an issue to track adding sampling configuration?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sampler: new AlwaysOnSampler(),
import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";
const samplingRatio = parseFloat(process.env.OTEL_SAMPLING_RATIO || "0.1");
const sampler = samplingRatio >= 1
? new AlwaysOnSampler()
: new TraceIdRatioBasedSampler(samplingRatio);
const sdk = new NodeSDK({
// ... other config
sampler,
});
🤖 Prompt for AI Agents
In apps/event-queue/src/instrumentation-node.ts around line 49, the code
unconditionally uses sampler: new AlwaysOnSampler(), which can produce excessive
traces in production; change this to select the sampler from configuration (env
var or shared config) instead. Import the needed samplers (AlwaysOnSampler,
TraceIdRatioBasedSampler, ParentBasedSampler) and read something like PROCESS
env TRACE_SAMPLER (e.g., "always"|"ratio"|"parent") and TRACE_SAMPLE_RATE (a
float 0.0-1.0) or equivalent config; build the sampler with a small switch/if
(default to TraceIdRatioBasedSampler with a sensible default in production and
AlwaysOnSampler in local/dev) and pass that sampler into the tracer/provider.
Validate and coerce the sample-rate value, and document the two env vars near
the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not worried about cost atm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityachoudhari26, understood! Keeping AlwaysOnSampler for now makes sense when you want full observability and cost isn't a concern. You can always optimize sampling later when needed.


✏️ Learnings added
Learnt from: adityachoudhari26
PR: ctrlplanedev/ctrlplane#666
File: apps/event-queue/src/instrumentation-node.ts:49-49
Timestamp: 2025-09-18T17:40:04.183Z
Learning: The team prefers using AlwaysOnSampler for OpenTelemetry tracing in the event-queue service as they are not currently concerned about tracing costs and want full observability.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@adityachoudhari26 adityachoudhari26 merged commit 4ceda6c into main Sep 18, 2025
4 of 10 checks passed
@adityachoudhari26 adityachoudhari26 deleted the event-queue-traces branch September 18, 2025 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants