-
Notifications
You must be signed in to change notification settings - Fork 11
chore: add traces to event queue #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds OpenTelemetry to the event-queue service: new instrumentation modules and dependencies, Docker CMD preloads instrumentation via Node's Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Docker
participant Node as Node runtime
participant Instr as instrumentation-node.js
participant App as index.js
Docker->>Node: Start container (CMD: node -r ./instrumentation-node.js index.js)
activate Node
Node->>Instr: Preload instrumentation module (-r)
activate Instr
Instr->>Instr: Configure NodeSDK (service resource, exporters, processors, sampler)
Instr->>Instr: Enable auto-instrumentations (http, pg, ioredis, winston)
Instr->>Instr: sdk.start()
deactivate Instr
Node->>App: Execute index.js
deactivate Node
sequenceDiagram
autonumber
participant Bus as Event Bus
participant Handler as newResource (withSpan)
participant Span as OTel Span
participant WS as WorkspaceStore
participant Pipe as OperationPipeline
Bus->>Handler: Event.ResourceCreated
activate Handler
Handler->>Span: Start span "new-resource"
note right of Span: set attribute `resource.id`
Handler->>WS: Load workspace
alt Workspace missing
Handler-->>Span: End span
Handler-->>Bus: Return (no-op)
else Workspace found
Handler->>Pipe: Dispatch update
Handler-->>Span: End span
end
deactivate Handler
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
apps/event-queue/src/instrumentation-node.ts (1)
43-45: Use span attributes instead of log record fields for service nameSetting
resource.service.namedirectly on the log record is unconventional. The service name should be propagated through the resource context automatically.Consider removing the logHook or using proper span attributes:
"@opentelemetry/instrumentation-winston": { enabled: true, - logHook: (span, record) => { - record["resource.service.name"] = "ctrlplane/event-worker"; - }, },apps/event-queue/src/events/resources.ts (2)
22-24: Consider using a shared tracer instanceCreating a separate tracer for each handler may be unnecessary. Consider using a shared tracer for all resource event handlers.
-const newResourceTracer = trace.getTracer("new-resource"); -const withSpan = makeWithSpan(newResourceTracer); +const resourceTracer = trace.getTracer("event-queue.resources"); +const withSpan = makeWithSpan(resourceTracer);
25-34: Add tracing to other resource handlers for consistencyOnly
newResourcehas tracing while other handlers (updatedResource,deletedResource, etc.) don't. Consider adding spans to all handlers for comprehensive observability.Would you like me to generate the code to add tracing to all resource event handlers for consistency?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (5)
apps/event-queue/Dockerfile(1 hunks)apps/event-queue/package.json(1 hunks)apps/event-queue/src/events/resources.ts(2 hunks)apps/event-queue/src/instrumentation-node.ts(1 hunks)apps/event-queue/src/instrumentation.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use TypeScript with explicit types (prefer interfaces for public APIs)
Import styles: Use named imports, group imports by source (std lib > external > internal)
Consistent type imports:import type { Type } from "module"
Prefer async/await over raw promises
Handle errors explicitly (use try/catch and typed error responses)
Files:
apps/event-queue/src/instrumentation-node.tsapps/event-queue/src/instrumentation.tsapps/event-queue/src/events/resources.ts
⚙️ CodeRabbit configuration file
**/*.{ts,tsx}: Note on Error Handling:
Avoid strict enforcement of try/catch blocks. Code may use early returns, Promise chains (.then().catch()), or other patterns for error handling. These are acceptable as long as they maintain clarity and predictability.
Files:
apps/event-queue/src/instrumentation-node.tsapps/event-queue/src/instrumentation.tsapps/event-queue/src/events/resources.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}
📄 CodeRabbit inference engine (CLAUDE.md)
Formatting: Prettier is used with
@ctrlplane/prettier-config
Files:
apps/event-queue/src/instrumentation-node.tsapps/event-queue/src/instrumentation.tsapps/event-queue/package.jsonapps/event-queue/src/events/resources.ts
🧬 Code graph analysis (1)
apps/event-queue/src/events/resources.ts (3)
packages/logger/src/index.ts (2)
trace(7-7)makeWithSpan(74-95)apps/event-queue/src/events/index.ts (1)
Handler(71-73)apps/event-queue/src/workspace/workspace.ts (1)
WorkspaceManager(101-121)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build (linux/amd64)
- GitHub Check: build (linux/amd64)
- GitHub Check: Format
- GitHub Check: Typecheck
- GitHub Check: Lint
- GitHub Check: build (linux/amd64)
- GitHub Check: build (linux/amd64)
- GitHub Check: build (linux/amd64)
🔇 Additional comments (3)
apps/event-queue/src/instrumentation.ts (1)
1-4: Verify the runtime import path resolves correctlyBuild artifacts not found at apps/event-queue/dist and tsconfig.json compilerOptions.module is null — cannot confirm the imported file exists after build.
- Ensure compiled output contains apps/event-queue/dist/instrumentation-node.js (or update the import to the actual emitted filename/extension).
- Verify tsconfig.json outDir/module/emit settings so the emitted module format/extension (.js/.cjs/.mjs) matches the runtime import.
Location: apps/event-queue/src/instrumentation.ts:1-4
apps/event-queue/src/instrumentation-node.ts (1)
17-17: Remove unnecessaryas anycasts on BatchSpanProcessorDrop the
as anycast and use the concrete typed instance; run the TypeScript typechecker and fix any real type mismatches instead of suppressing them.Affected files:
- apps/event-worker/src/instrumentation-node.ts
- apps/event-queue/src/instrumentation-node.ts
- apps/webservice/src/instrumentation-node.ts
Example replacement:
spanProcessors: [new BatchSpanProcessor(new OTLPTraceExporter())],apps/event-queue/package.json (1)
22-30: Align OpenTelemetry package versions for compatibilityapps/event-queue/package.json (lines 22–30) mixes @opentelemetry majors (api@^1.9.0, auto-instrumentations@^0.52.1, sdk/exporters@^0.54.2, resources/semantic-conventions@^1.27.0). Align to a single compatible stream — e.g., upgrade SDK/exporter packages to the stable 2.0.x line and update auto-instrumentations to a matching release (keep @opentelemetry/api on the compatible 1.9.x series if required). Pin matching major versions across all @opentelemetry/* packages and run telemetry integration tests.
| }, | ||
| }), | ||
| ], | ||
| sampler: new AlwaysOnSampler(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider configurable sampling strategy for production
Using AlwaysOnSampler in production can generate excessive trace data and incur high costs. Consider making the sampling strategy configurable.
Apply this diff to make sampling configurable:
+import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";
+
+const samplingRatio = parseFloat(process.env.OTEL_SAMPLING_RATIO || "0.1");
+const sampler = samplingRatio >= 1
+ ? new AlwaysOnSampler()
+ : new TraceIdRatioBasedSampler(samplingRatio);
+
const sdk = new NodeSDK({
// ... other config
- sampler: new AlwaysOnSampler(),
+ sampler,
});Would you like me to create an issue to track adding sampling configuration?
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| sampler: new AlwaysOnSampler(), | |
| import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base"; | |
| const samplingRatio = parseFloat(process.env.OTEL_SAMPLING_RATIO || "0.1"); | |
| const sampler = samplingRatio >= 1 | |
| ? new AlwaysOnSampler() | |
| : new TraceIdRatioBasedSampler(samplingRatio); | |
| const sdk = new NodeSDK({ | |
| // ... other config | |
| sampler, | |
| }); |
🤖 Prompt for AI Agents
In apps/event-queue/src/instrumentation-node.ts around line 49, the code
unconditionally uses sampler: new AlwaysOnSampler(), which can produce excessive
traces in production; change this to select the sampler from configuration (env
var or shared config) instead. Import the needed samplers (AlwaysOnSampler,
TraceIdRatioBasedSampler, ParentBasedSampler) and read something like PROCESS
env TRACE_SAMPLER (e.g., "always"|"ratio"|"parent") and TRACE_SAMPLE_RATE (a
float 0.0-1.0) or equivalent config; build the sampler with a small switch/if
(default to TraceIdRatioBasedSampler with a sensible default in production and
AlwaysOnSampler in local/dev) and pass that sampler into the tracer/provider.
Validate and coerce the sample-rate value, and document the two env vars near
the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not worried about cost atm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adityachoudhari26, understood! Keeping AlwaysOnSampler for now makes sense when you want full observability and cost isn't a concern. You can always optimize sampling later when needed.
✏️ Learnings added
Learnt from: adityachoudhari26
PR: ctrlplanedev/ctrlplane#666
File: apps/event-queue/src/instrumentation-node.ts:49-49
Timestamp: 2025-09-18T17:40:04.183Z
Learning: The team prefers using AlwaysOnSampler for OpenTelemetry tracing in the event-queue service as they are not currently concerned about tracing costs and want full observability.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Summary by CodeRabbit
New Features
Chores