Skip to content

Conversation

@tomusdrw
Copy link
Contributor

This PR exposes simple telemetry using OpenTelemetry APIs. I'm exposing some JIP-3 events, but please note this is not JIP-3 implementation (TART), because that will need a custom protocol.

However it's possible to run multiple nodes and have them push to prometheus instance. See instructions for testing it locally.

Because we run multiple worker threads, it's not trivial to expose metrics and have prometheus pull them - see explanation in telemetry package.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 22, 2025

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Integrated comprehensive OpenTelemetry support for system-wide observability and real-time performance monitoring, providing detailed operational metrics across networking connections, block generation, block importing, synchronization status, and node operations to enable better troubleshooting and diagnostics.
  • Documentation

    • Added telemetry configuration and setup documentation with Prometheus integration guide for local metrics collection and visualization.
  • Chores

    • Removed debug console logging for cleaner output.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds OpenTelemetry telemetry across the repository: a new core telemetry package with initialize/close APIs; telemetry initialization in CLIs and worker bootstrap files; new metrics modules and re-exports for networking, jam node, block generator, and importer; propagation of nodeName through worker configs and transferables; a now() utility and refactored measure(); removal of several debug logs; and workspace/package.json updates adding telemetry and OpenTelemetry dependencies.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • packages/core/telemetry/index.ts (SDK composition, env toggles, exporter setup, startup/shutdown)
    • packages/core/networking/setup.ts and packages/core/networking/metrics.ts (many instrumentation points in QUIC flow)
    • Worker config/API changes (packages/workers/api-node/config.ts and packages/workers/api/config.ts) — transferable shape and nodeName propagation
    • Instrumentation in importer and block-generator (metrics creation and usage)

Possibly related PRs

Suggested reviewers

  • skoszuta
  • DrEverr

Poem

🐰 I hopped where metrics quietly grow,
Counters tally each connection's flow,
Slots and blocks I softly note,
Telemetry trails each tiny mote,
Carrots crunch — observability aglow! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Simple telemetry' directly and clearly summarizes the main objective of the PR, which is to expose simple telemetry using OpenTelemetry APIs across the codebase.
Description check ✅ Passed The description comprehensively explains the telemetry implementation, noting it exposes JIP-3 events via OpenTelemetry APIs, clarifies it is not a full JIP-3 implementation, and describes the multi-node Prometheus integration approach.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch td-opentelemetry

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
packages/workers/block-generator/package.json (1)

7-7: Dependency additions look correct.

The workspace dependency @typeberry/telemetry correctly uses "*" per coding guidelines. The OpenTelemetry API version should be verified as noted in other package.json files.

Also applies to: 18-18

🧹 Nitpick comments (3)
bin/jam/README.md (1)

140-140: Fix bare URL markdown linting warning.

The URL http://localhost:9090/api/v1/otlp should be formatted as inline code to resolve the markdown linting warning and improve consistency with the table format.

Apply this diff:

-| `OTEL_EXPORTER_OTLP_ENDPOINT` | URL to push metrics to | http://localhost:9090/api/v1/otlp |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | URL to push metrics to | `http://localhost:9090/api/v1/otlp` |
packages/workers/api-node/port.ts (1)

88-90: Avoid using as type assertions.

While the comment explains why the cast is safe, the coding guidelines specify that as conversions should not be used. Consider creating a type guard function or using a tryAs pattern to validate the cast at runtime.

As per coding guidelines

Apply this diff to use a more explicit type assertion approach:

+function asTransferable(buffer: ArrayBuffer): Transferable {
+  return buffer as unknown as Transferable;
+}
+
 postMessage<T>(event: string, codec: Codec<T>, msg: Envelope<T>): void {
   const encoded = Encoder.encodeObject(codec, msg.data, this.spec);
   const message: Message = {
     eventName: event,
     responseId: msg.responseId,
     data: encoded.raw,
   };
-  // casting to transferable is safe here, since we know that encoder
-  // always returns owned uint8arrays.
-  this.port.postMessage(message, [encoded.raw.buffer as unknown as Transferable]);
+  // Encoder always returns owned uint8arrays, so the buffer is transferable
+  this.port.postMessage(message, [asTransferable(encoded.raw.buffer)]);
 }
packages/core/telemetry/index.ts (1)

74-76: Consider making the export interval configurable.

The 3-second export interval (line 76) is quite aggressive and could generate significant network traffic, especially with multiple workers. Consider making this configurable via an environment variable (e.g., OTEL_METRIC_EXPORT_INTERVAL_MILLIS) with 3000 as a default for local development but allowing operators to adjust it for production.

Apply this diff:

+const exportIntervalMillis = Number(env.OTEL_METRIC_EXPORT_INTERVAL_MILLIS) || 3_000;
+
 const otlpReader = new PeriodicExportingMetricReader({
   exporter: otlpExporter,
-  exportIntervalMillis: 3_000,
+  exportIntervalMillis,
 });
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96c9168 and ae1bbc6.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (39)
  • bin/jam/README.md (1 hunks)
  • bin/jam/index.ts (2 hunks)
  • bin/jam/package.json (1 hunks)
  • bin/jam/telemetry.ts (1 hunks)
  • package.json (1 hunks)
  • packages/core/networking/index.ts (1 hunks)
  • packages/core/networking/metrics.ts (1 hunks)
  • packages/core/networking/package.json (1 hunks)
  • packages/core/networking/setup.ts (5 hunks)
  • packages/core/telemetry/index.ts (1 hunks)
  • packages/core/telemetry/package.json (1 hunks)
  • packages/core/utils/compatibility.ts (1 hunks)
  • packages/core/utils/debug.ts (2 hunks)
  • packages/core/utils/dev.ts (1 hunks)
  • packages/core/utils/env.ts (1 hunks)
  • packages/core/utils/index.ts (1 hunks)
  • packages/jam/node/export.ts (1 hunks)
  • packages/jam/node/index.ts (1 hunks)
  • packages/jam/node/main-importer.ts (3 hunks)
  • packages/jam/node/main.ts (4 hunks)
  • packages/jam/node/metrics.ts (1 hunks)
  • packages/jam/node/package.json (1 hunks)
  • packages/jam/node/workers.ts (0 hunks)
  • packages/workers/api-node/config.ts (6 hunks)
  • packages/workers/api-node/port.ts (1 hunks)
  • packages/workers/api/channel.ts (0 hunks)
  • packages/workers/api/config.ts (2 hunks)
  • packages/workers/block-generator/bootstrap-main.ts (1 hunks)
  • packages/workers/block-generator/generator.ts (4 hunks)
  • packages/workers/block-generator/index.ts (1 hunks)
  • packages/workers/block-generator/metrics.ts (1 hunks)
  • packages/workers/block-generator/package.json (2 hunks)
  • packages/workers/importer/bootstrap-main.ts (1 hunks)
  • packages/workers/importer/importer.ts (6 hunks)
  • packages/workers/importer/index.ts (1 hunks)
  • packages/workers/importer/metrics.ts (1 hunks)
  • packages/workers/importer/package.json (2 hunks)
  • packages/workers/jam-network/bootstrap-main.ts (1 hunks)
  • packages/workers/jam-network/package.json (1 hunks)
💤 Files with no reviewable changes (2)
  • packages/jam/node/workers.ts
  • packages/workers/api/channel.ts
🧰 Additional context used
📓 Path-based instructions (3)
packages/core/**/*.ts

⚙️ CodeRabbit configuration file

packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined under packages/jam/**)

Files:

  • packages/core/utils/index.ts
  • packages/core/telemetry/index.ts
  • packages/core/networking/metrics.ts
  • packages/core/networking/setup.ts
  • packages/core/utils/debug.ts
  • packages/core/utils/env.ts
  • packages/core/utils/dev.ts
  • packages/core/networking/index.ts
  • packages/core/utils/compatibility.ts
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: rules from ./CODESTYLE.md should be adhered to.

**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in *.test.ts files. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.

**/*.ts: as conversions must not be used. Suggest using tryAs conversion methods.

**/*.ts: Classes with static Codec field must have private constructor and static create method.

**/*.ts: Casting a bigint (or U64) using Number(x) must have an explanation comment why
it is safe.

**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.

Files:

  • packages/core/utils/index.ts
  • packages/jam/node/export.ts
  • packages/jam/node/main.ts
  • packages/workers/api/config.ts
  • packages/jam/node/metrics.ts
  • bin/jam/index.ts
  • packages/workers/block-generator/index.ts
  • packages/workers/block-generator/bootstrap-main.ts
  • packages/core/telemetry/index.ts
  • packages/core/networking/metrics.ts
  • bin/jam/telemetry.ts
  • packages/workers/block-generator/metrics.ts
  • packages/jam/node/index.ts
  • packages/workers/jam-network/bootstrap-main.ts
  • packages/workers/block-generator/generator.ts
  • packages/workers/api-node/port.ts
  • packages/workers/importer/metrics.ts
  • packages/workers/importer/index.ts
  • packages/jam/node/main-importer.ts
  • packages/core/networking/setup.ts
  • packages/core/utils/debug.ts
  • packages/core/utils/env.ts
  • packages/core/utils/dev.ts
  • packages/core/networking/index.ts
  • packages/workers/api-node/config.ts
  • packages/workers/importer/bootstrap-main.ts
  • packages/workers/importer/importer.ts
  • packages/core/utils/compatibility.ts
packages/**/package.json

⚙️ CodeRabbit configuration file

packages/**/package.json: Whenever a new package is added, the packages/README.md should be updated accordingly.

packages/**/package.json: Dependencies coming from the workspace (typically the ones starting with @typeberry/)
must use * as version.

Files:

  • packages/workers/importer/package.json
  • packages/jam/node/package.json
  • packages/core/telemetry/package.json
  • packages/workers/jam-network/package.json
  • packages/workers/block-generator/package.json
  • packages/core/networking/package.json
🧠 Learnings (4)
📚 Learning: 2025-06-18T20:35:13.536Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 442
File: packages/core/pvm-debugger-adapter/index.ts:22-40
Timestamp: 2025-06-18T20:35:13.536Z
Learning: The `typeberry/utils` package has browser compatibility issues due to Node.js-specific code like `measure` function using `process.hrtime()` and `testUtils` importing `node:assert`, causing white screens in browser environments.

Applied to files:

  • bin/jam/package.json
  • packages/jam/node/main.ts
  • packages/core/telemetry/package.json
  • packages/core/utils/debug.ts
  • packages/workers/jam-network/package.json
📚 Learning: 2025-09-15T14:37:04.214Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 625
File: benchmarks/package.json:8-15
Timestamp: 2025-09-15T14:37:04.214Z
Learning: In the FluffyLabs/typeberry monorepo, the workspace:* protocol does not work for internal typeberry dependencies. The working approach is to use "*" as the version specifier for internal monorepo packages instead of "workspace:*".

Applied to files:

  • bin/jam/package.json
  • packages/workers/importer/package.json
  • package.json
  • packages/jam/node/package.json
  • packages/core/telemetry/package.json
  • packages/workers/jam-network/package.json
  • packages/workers/block-generator/package.json
📚 Learning: 2025-06-18T20:35:13.536Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 442
File: packages/core/pvm-debugger-adapter/index.ts:22-40
Timestamp: 2025-06-18T20:35:13.536Z
Learning: The `measure` function in `typeberry/utils/debug.ts` attempts environment detection by checking `process === undefined` but still causes bundling issues because bundlers see the `process.hrtime` reference in the Node.js branch.

Applied to files:

  • packages/jam/node/main.ts
  • bin/jam/index.ts
  • packages/core/utils/debug.ts
  • packages/core/utils/env.ts
  • packages/core/utils/dev.ts
📚 Learning: 2025-05-26T21:31:58.688Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 399
File: .github/workflows/vectors-jamduna.yml:78-78
Timestamp: 2025-05-26T21:31:58.688Z
Learning: In the typeberry project, the npm start script in the test-runner workspace is designed to accept test suite names as arguments (e.g., "jamduna", "w3f"). The command `npm start -w typeberry/test-runner jamduna` is the correct way to run the jamduna test suite, not `npm run jamduna`. This is an architectural design choice where the start script acts as a unified entry point for different test suites.

Applied to files:

  • bin/jam/index.ts
🧬 Code graph analysis (12)
packages/jam/node/metrics.ts (3)
packages/core/networking/metrics.ts (1)
  • createMetrics (10-112)
packages/workers/block-generator/metrics.ts (1)
  • createMetrics (10-52)
packages/workers/importer/metrics.ts (1)
  • createMetrics (10-93)
bin/jam/index.ts (1)
packages/core/telemetry/index.ts (1)
  • initializeTelemetry (18-28)
packages/workers/block-generator/bootstrap-main.ts (2)
packages/workers/api-node/protocol.ts (1)
  • initWorker (106-163)
packages/core/telemetry/index.ts (1)
  • initializeTelemetry (18-28)
packages/core/telemetry/index.ts (2)
packages/core/logger/index.ts (1)
  • Logger (20-120)
packages/core/utils/env.ts (1)
  • env (1-1)
packages/core/networking/metrics.ts (3)
packages/jam/node/metrics.ts (1)
  • createMetrics (10-46)
packages/workers/block-generator/metrics.ts (1)
  • createMetrics (10-52)
packages/workers/importer/metrics.ts (1)
  • createMetrics (10-93)
packages/workers/block-generator/metrics.ts (3)
packages/core/networking/metrics.ts (1)
  • createMetrics (10-112)
packages/jam/node/metrics.ts (1)
  • createMetrics (10-46)
packages/workers/importer/metrics.ts (1)
  • createMetrics (10-93)
packages/workers/jam-network/bootstrap-main.ts (2)
packages/workers/api-node/protocol.ts (1)
  • initWorker (106-163)
packages/core/telemetry/index.ts (1)
  • initializeTelemetry (18-28)
packages/workers/block-generator/generator.ts (2)
packages/jam/block/header.ts (1)
  • Header (128-194)
packages/core/utils/debug.ts (1)
  • now (13-25)
packages/workers/importer/metrics.ts (3)
packages/core/networking/metrics.ts (1)
  • createMetrics (10-112)
packages/jam/node/metrics.ts (1)
  • createMetrics (10-46)
packages/workers/block-generator/metrics.ts (1)
  • createMetrics (10-52)
packages/core/networking/setup.ts (5)
packages/core/networking/peer-verification.ts (1)
  • peerVerification (4-21)
packages/core/networking/quic-utils.ts (1)
  • addEventListener (6-19)
packages/core/networking/certificate.ts (1)
  • PeerInfo (33-36)
packages/core/networking/quic-peer.ts (1)
  • QuicPeer (13-59)
packages/core/utils/debug.ts (1)
  • now (13-25)
packages/workers/importer/bootstrap-main.ts (2)
packages/workers/api-node/protocol.ts (1)
  • initWorker (106-163)
packages/core/telemetry/index.ts (1)
  • initializeTelemetry (18-28)
packages/workers/importer/importer.ts (2)
packages/core/utils/debug.ts (1)
  • now (13-25)
packages/core/utils/result.ts (1)
  • resultToString (73-78)
🪛 markdownlint-cli2 (0.18.1)
bin/jam/README.md

140-140: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: e2e (22.x)
🔇 Additional comments (42)
packages/workers/jam-network/package.json (1)

17-17: LGTM!

Workspace dependency correctly uses "*" as the version specifier per coding guidelines.

packages/workers/api/config.ts (2)

6-7: LGTM!

The addition of the nodeName field to the WorkerConfig interface is clean and well-documented.


24-36: LGTM!

The DirectWorkerConfig.new factory method correctly accepts and propagates the nodeName parameter through to the constructor. The implementation is consistent with the interface definition.

packages/core/utils/debug.ts (2)

5-25: LGTM! Improved cross-platform timing with proper bundler handling.

The new now() utility correctly handles both browser and Node.js environments. The deferred process.hrtime reference (lines 16-24) addresses the bundler issues noted in previous learnings. The Number() cast on line 19 has an adequate safety explanation per coding guidelines—dividing hrtime nanoseconds by 1M yields milliseconds well below Number.MAX_SAFE_INTEGER.

Based on learnings.


134-137: LGTM! Clean refactor using the new timing utility.

The measure() function now correctly uses the unified now() utility, simplifying the implementation while maintaining the same functionality.

packages/core/networking/package.json (1)

16-16: OpenTelemetry API version is current and secure.

Version 1.9.0 is the latest stable @opentelemetry/api on npm, and there are no reported/direct vulnerabilities for @opentelemetry/api v1.9.0 in public vulnerability databases.

packages/jam/node/package.json (1)

7-7: Version 1.9.0 is the latest stable release and has no known security vulnerabilities.

The npm package @opentelemetry/api version 1.9.0 has no publicly reported CVEs, and security scans show no direct vulnerabilities.

packages/jam/node/index.ts (1)

6-6: LGTM!

The metrics module export aligns with the telemetry integration and follows the existing barrel pattern.

packages/workers/jam-network/bootstrap-main.ts (1)

1-11: LGTM!

The telemetry initialization follows the consistent pattern used across other workers and correctly initializes OpenTelemetry before the main worker logic executes.

packages/jam/node/export.ts (1)

45-45: LGTM!

The nodeName propagation aligns with the telemetry configuration requirements across the codebase.

bin/jam/package.json (1)

17-17: LGTM!

The dependency correctly uses the "*" version specifier for the internal monorepo package.

packages/core/networking/index.ts (1)

3-3: LGTM!

The metrics module export follows the existing barrel pattern and makes the networking metrics API publicly accessible.

packages/workers/block-generator/index.ts (1)

2-2: LGTM!

The metrics module export follows the existing barrel pattern and aligns with the telemetry instrumentation for block authoring.

packages/workers/block-generator/bootstrap-main.ts (1)

2-12: LGTM!

The telemetry initialization follows the consistent pattern used across other workers and correctly initializes OpenTelemetry before the main worker logic executes.

bin/jam/index.ts (1)

84-89: LGTM! Telemetry initialization is properly positioned.

The telemetry is initialized early in the startup sequence (after config preparation but before command handling), ensuring metrics are available throughout the node's lifecycle. The parameters correctly match the expected TelemetryConfig interface.

packages/workers/importer/index.ts (1)

2-2: LGTM! Clean barrel export.

The metrics module is properly exposed through the package's public API surface.

packages/jam/node/main-importer.ts (1)

24-24: LGTM! Parameter propagation is straightforward.

The nodeName is correctly extracted and propagated to both worker configuration paths, enabling telemetry and worker identification.

Also applies to: 37-37, 46-46

packages/workers/api-node/port.ts (1)

2-2: LGTM! Type-only imports improve tree-shaking.

Converting to type-only imports for MessagePort and Transferable is a good practice that helps avoid unnecessary runtime dependencies.

packages/jam/node/main.ts (2)

35-36: LGTM! Metrics initialization is well-positioned.

Creating the metrics instance early in the main function ensures it's available before any operations that need instrumentation.


75-79: LGTM! Metrics wrapper preserves async behavior correctly.

The wrapper properly records the block change metric before delegating to the original handler, maintaining the async flow without blocking or race conditions.

packages/core/utils/dev.ts (1)

1-1: LGTM! Centralized environment access improves consistency.

Using the shared env module ensures consistent environment handling across the codebase. Please see the verification request in packages/core/utils/compatibility.ts to confirm the env module implementation is correct.

Based on learnings

Also applies to: 21-21

packages/workers/importer/package.json (1)

7-7: No issues identified.

The opentelemetry-js repository shows 0 published security advisories, and version 1.9.0 is confirmed as the latest stable release. The dependency is correctly specified.

packages/core/utils/compatibility.ts (1)

1-2: env module correctly handles cross-platform compatibility.

The centralized env module implementation uses a safe pattern: it returns an empty object in browser environments (where process is undefined) and returns process.env in Node.js. This ensures safe cross-platform compatibility without introducing bundling issues—the typeof guard prevents runtime errors, and process.env is designed to be handled by bundlers during compilation.

packages/core/utils/env.ts (1)

1-1: LGTM! Browser-safe environment detection.

This centralized approach avoids the bundler issues previously encountered with direct process.env references scattered throughout the codebase.

Based on learnings

packages/workers/block-generator/generator.ts (4)

12-13: LGTM! Clean metrics integration.

The imports correctly bring in the timing utility and metrics module for block authoring instrumentation.


19-19: LGTM! Metrics field properly typed.

Using ReturnType<typeof metrics.createMetrics> ensures type safety without duplicating the metrics interface.


74-75: LGTM! Proper timing capture and event recording.

The start time is captured immediately before the block authoring logic, and the JIP-3 authoring event is recorded with the appropriate slot number.


132-133: LGTM! Duration tracking completed correctly.

The block authoring duration is calculated and recorded along with the completion event, providing useful timing metrics for observability.

packages/workers/block-generator/metrics.ts (1)

1-52: LGTM! Consistent metrics implementation.

The metrics module follows the established pattern from packages/core/networking/metrics.ts and packages/workers/importer/metrics.ts, providing a clean factory API for recording block authoring events. The JIP-3 event numbers (40, 41, 42) are properly documented.

bin/jam/telemetry.ts (1)

1-3: LGTM! Clean barrel export.

The re-export pattern provides a clean interface for telemetry functionality without duplicating implementation.

packages/workers/importer/bootstrap-main.ts (1)

1-11: LGTM! Proper telemetry initialization sequence.

The telemetry is correctly initialized after the configuration is obtained (line 6) and before the main worker logic starts (line 12), ensuring metrics are captured from the start of the import workflow.

packages/workers/importer/importer.ts (4)

10-11: LGTM! Proper imports for timing and metrics.

The imports correctly bring in the timing utility (now) and the metrics module for instrumentation.


91-95: LGTM! Comprehensive import lifecycle tracking.

The block import is properly instrumented with start event recording and timing capture for duration measurement.


119-127: LGTM! Verification timing and failure tracking.

The verification phase is properly instrumented with both timing metrics and failure path recording.


152-160: Hardcoded gas cost should be replaced with actual execution cost.

Line 160 records block execution with a gas cost of 0. This appears to be a placeholder value. The execution result (res.ok on line 162) likely contains actual gas/cost information that should be extracted and recorded.

Please verify if the execution result contains cost information and update the metrics call:

#!/bin/bash
# Search for cost/gas information in the execution result type
echo "=== Searching for gas/cost in transition/chain-stf.ts ==="
rg -n -A5 -B5 'export.*transition|interface.*Result' --type=ts packages/core/transition/chain-stf.ts | head -50

echo -e "\n=== Checking StateUpdate type definition ==="
rg -n 'export.*StateUpdate|interface.*StateUpdate' --type=ts packages/
⛔ Skipped due to learnings
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 390
File: packages/jam/jam-host-calls/fetch.ts:220-226
Timestamp: 2025-05-16T16:23:08.685Z
Learning: In the typeberry codebase, host call (HC) implementations define a gas cost but don't need to explicitly consume gas in their execute() methods. Gas accounting is handled outside of the host call implementation.
packages/core/telemetry/index.ts (2)

18-28: LGTM! Clean configuration wrapper.

The initializeTelemetry function provides a simplified interface that translates high-level configuration into the detailed telemetry setup, reading environment variables with sensible defaults.


59-67: Excellent documentation of multi-worker challenges.

The comment clearly explains why pull-based Prometheus metrics aren't used and acknowledges the architectural constraint of multiple worker threads, with a forward-looking note about TART/JIP-3.

packages/jam/node/metrics.ts (1)

42-44: LGTM!

Correct conversion of the boolean synced attribute to string, consistent with OpenTelemetry best practices.

packages/core/networking/setup.ts (1)

193-204: LGTM! Duration tracking correctly implemented.

The connection duration measurement using now() is properly implemented. The start time is captured when the peer is created, and the duration is calculated on disconnect and passed to the metrics.

packages/workers/importer/metrics.ts (1)

1-93: LGTM! Well-structured metrics module.

The importer metrics module follows the established pattern consistently:

  • Slot values are correctly converted to strings (line 71)
  • Boolean success flag is converted to string (line 67)
  • Histograms and counters are properly instrumented
  • JIP-3 event numbering is documented
packages/core/networking/metrics.ts (1)

1-112: LGTM! Comprehensive networking metrics module.

The networking metrics module is well-structured:

  • Clear JIP-3 event numbering (20-28) with documentation
  • Connection duration histogram properly tracks time by side (in/out)
  • All metric instruments have appropriate descriptions and units
  • API methods correctly update counters and histograms with relevant attributes
packages/workers/api-node/config.ts (1)

17-29: LGTM! Clean addition of nodeName field.

The nodeName field has been consistently added to both LmdbWorkerConfig and InMemWorkerConfig:

  • Static factory methods properly accept and propagate the field
  • Constructors updated to accept and store as public readonly
  • TransferableConfig type and serialization methods correctly handle the field
  • All code paths (new, fromTransferable, intoTransferable) are consistent

Also applies to: 39-53, 66-82, 90-115

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/jam/index.ts (1)

11-99: Signal handlers must call process.exit() after shutdown, and normal command paths should also clean up telemetry.

The review comment correctly identifies that signal handlers prevent Node's default exit behavior. Once process.on("SIGINT"/"SIGTERM") is attached, Node won't auto-exit on these signals—the handlers must explicitly call process.exit() or the process continues running indefinitely.

Additionally, the code paths for FuzzTarget (line 106), Import (line 111), Export (line 115), and the default node start (line 117) all return without calling telemetry shutdown. While the signal handlers are the critical issue, these normal paths should also clean up telemetry before returning to ensure metrics are properly flushed on graceful termination.

The suggested fix in the review (wrapping shutdown + exit in a helper) is correct and aligns with the pattern already used in bin/rpc/main.ts:77-79.

♻️ Duplicate comments (2)
packages/core/networking/setup.ts (1)

136-188: Avoid double-counting outbound failures and reconsider sentinel peer ID

Two related concerns in the outbound dial path:

  1. Failure metrics can be emitted twice for a single attempt
  • In the peerDetails.info === null and peer_id_mismatch branches you call networkMetrics.recordConnectOutFailed(...) and then throw.
  • The outer catch also calls networkMetrics.recordConnectOutFailed(String(error)), so those cases will produce two failure events per failed dial.

This will skew telemetry. A simple guard would ensure at most one failure metric per attempt, e.g.:

       async function doDial() {
         const peerAddress = `${peer.host}:${peer.port}`;
         const peerDetails = peerVerification();
+        let failureRecorded = false;

         try {
@@
-          if (peerDetails.info === null) {
-            networkMetrics.recordConnectOutFailed("no_peer_info");
+          if (peerDetails.info === null) {
+            failureRecorded = true;
+            networkMetrics.recordConnectOutFailed("no_peer_info");
             await client.destroy({ isApp: true, errorCode: CloseReason.PeerIdMismatch });
             throw new Error("Client connected, but there is no peer details!");
           }
@@
-          if (options.verifyName !== undefined && options.verifyName !== peerDetails.info.id) {
-            networkMetrics.recordConnectOutFailed("peer_id_mismatch");
+          if (options.verifyName !== undefined && options.verifyName !== peerDetails.info.id) {
+            failureRecorded = true;
+            networkMetrics.recordConnectOutFailed("peer_id_mismatch");
             await client.destroy({ isApp: true, errorCode: CloseReason.PeerIdMismatch });
@@
-        } catch (error) {
-          networkMetrics.recordConnectOutFailed(String(error));
+        } catch (error) {
+          if (!failureRecorded) {
+            networkMetrics.recordConnectOutFailed(String(error));
+          }
           throw error;
         }
  1. Using "unknown" as a synthetic peer ID is still a magic sentinel

recordConnectingOut(peerDetails.info?.id ?? "unknown", peerAddress) improves over the previous "pending" literal but still injects a fake peer ID into metrics. This is essentially the same concern as the earlier review comment: ideally the metric API would accept an optional peer ID (or a dedicated “peer_unknown” attribute) rather than overloading an ID label with a synthetic value.

If you can adjust the metrics signature, something like the following would avoid magic IDs:

// in metrics.ts
recordConnectingOut(peerAddress: string, peerId?: string) { /* ... */ }

// here
networkMetrics.recordConnectingOut(peerAddress, peerDetails.info?.id);

That keeps the cardinality model clear and lets the exporter handle “unknown” explicitly instead of via a magic string.

packages/jam/node/metrics.ts (1)

33-44: Standardize slot attribute type (string) to match other metrics.

Here slot is recorded as a number:

bestBlockChangedCounter.add(1, { slot, header_hash: headerHash });
finalizedBlockChangedCounter.add(1, { slot, header_hash: headerHash });

Other metrics modules in this PR use slot.toString() for the same attribute, and mixing numeric vs. string types for a given attribute key can cause surprises in backends and make cross-metric queries harder.

Recommend coercing to string for consistency:

    recordBestBlockChanged(slot: number, headerHash: string): void {
-      bestBlockChangedCounter.add(1, { slot, header_hash: headerHash });
+      bestBlockChangedCounter.add(1, { slot: slot.toString(), header_hash: headerHash });
    },

    recordFinalizedBlockChanged(slot: number, headerHash: string): void {
-      finalizedBlockChangedCounter.add(1, { slot, header_hash: headerHash });
+      finalizedBlockChangedCounter.add(1, { slot: slot.toString(), header_hash: headerHash });
    },
🧹 Nitpick comments (3)
packages/core/networking/setup.ts (1)

193-201: Fix reason fallback when disconnect events have no detail

String(ev.detail) ?? "normal" will never produce "normal" because String(...) is never null/undefined. If ev.detail is missing, you’ll emit "undefined" instead of "normal".

Consider:

-        const reason = String(ev.detail) ?? "normal";
+        const reason = String(ev.detail ?? "normal");

This preserves detailed reasons when present and falls back cleanly to "normal" when not.

packages/core/telemetry/index.ts (1)

39-104: Handle NodeSDK.start() as an async operation to avoid unhandled rejections.

NodeSDK.start() returns a Promise, but it’s called like a sync method inside a try block:

try {
  sdk.start();
  logger.info`OpenTelemetry initialized for service: ${serviceName}`;
} catch (error) {
  logger.error`Error initializing OpenTelemetry: ${error}`;
}

This won’t catch async startup errors and can produce unhandled promise rejections if the SDK fails to start.

You can keep initializeTelemetryFull synchronous while still handling errors by chaining on the Promise:

-  try {
-    sdk.start();
-    logger.info`OpenTelemetry initialized for service: ${serviceName}`;
-  } catch (error) {
-    logger.error`Error initializing OpenTelemetry: ${error}`;
-  }
+  sdk
+    .start()
+    .then(() => {
+      logger.info`OpenTelemetry initialized for service: ${serviceName}`;
+    })
+    .catch((error) => {
+      logger.error`Error initializing OpenTelemetry: ${error}`;
+    });

This way you still return sdk immediately, but startup success/failure is properly logged and async errors are not dropped.

packages/workers/importer/metrics.ts (1)

65-92: Stringify slot attribute for consistency with other metrics.

In the importer metrics you currently emit slot as a number:

recordBlockImportingStarted(slot: number): void {
  blockImportingCounter.add(1, { slot });
}

Other modules in this PR use slot.toString() for the same attribute key. Mixing numeric and string types for slot across instruments/backends can complicate querying and, in some systems, lead to dropped or split series.

Consider normalizing to string here as well:

    recordBlockImportingStarted(slot: number): void {
-      blockImportingCounter.add(1, { slot });
+      blockImportingCounter.add(1, { slot: slot.toString() });
    },
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae1bbc6 and ca81567.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (8)
  • bin/jam/README.md (1 hunks)
  • bin/jam/index.ts (2 hunks)
  • packages/core/networking/setup.ts (5 hunks)
  • packages/core/telemetry/index.ts (1 hunks)
  • packages/core/utils/debug.ts (2 hunks)
  • packages/jam/node/metrics.ts (1 hunks)
  • packages/workers/block-generator/metrics.ts (1 hunks)
  • packages/workers/importer/metrics.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/core/utils/debug.ts
  • packages/workers/block-generator/metrics.ts
🧰 Additional context used
📓 Path-based instructions (2)
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: rules from ./CODESTYLE.md should be adhered to.

**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in *.test.ts files. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.

**/*.ts: as conversions must not be used. Suggest using tryAs conversion methods.

**/*.ts: Classes with static Codec field must have private constructor and static create method.

**/*.ts: Casting a bigint (or U64) using Number(x) must have an explanation comment why
it is safe.

**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.

Files:

  • bin/jam/index.ts
  • packages/core/telemetry/index.ts
  • packages/jam/node/metrics.ts
  • packages/core/networking/setup.ts
  • packages/workers/importer/metrics.ts
packages/core/**/*.ts

⚙️ CodeRabbit configuration file

packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined under packages/jam/**)

Files:

  • packages/core/telemetry/index.ts
  • packages/core/networking/setup.ts
🧠 Learnings (1)
📚 Learning: 2025-05-26T21:31:58.688Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 399
File: .github/workflows/vectors-jamduna.yml:78-78
Timestamp: 2025-05-26T21:31:58.688Z
Learning: In the typeberry project, the npm start script in the test-runner workspace is designed to accept test suite names as arguments (e.g., "jamduna", "w3f"). The command `npm start -w typeberry/test-runner jamduna` is the correct way to run the jamduna test suite, not `npm run jamduna`. This is an architectural design choice where the start script acts as a unified entry point for different test suites.

Applied to files:

  • bin/jam/index.ts
🧬 Code graph analysis (2)
bin/jam/index.ts (1)
packages/core/telemetry/index.ts (2)
  • initializeTelemetry (18-28)
  • shutdownTelemetry (106-115)
packages/core/telemetry/index.ts (2)
packages/core/logger/index.ts (1)
  • Logger (20-120)
packages/core/utils/env.ts (1)
  • env (1-1)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: run (22.x)
  • GitHub Check: state_transition (22.x)
  • GitHub Check: e2e (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: test (22.x)
  • GitHub Check: jam-e2e (22.x)
  • GitHub Check: benchmarks (22.x)
🔇 Additional comments (2)
packages/core/networking/setup.ts (2)

6-47: Metrics initialization per QuicNetwork instance looks good

Creating a single networkMetrics via metrics.createMetrics() at the start of Quic.setup is a clean integration point and keeps instrumentation localized to this networking instance.


98-129: Inbound connection metrics coverage is consistent

The sequence recordConnectingIn → conditional recordConnectInFailed / recordConnectionRefusedrecordConnectedIn on success cleanly models the inbound connection lifecycle and matches the existing control flow (self-connection, duplicate, missing lastConnectedPeer.info).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
bin/jam/README.md (1)

135-163: Documentation successfully clarifies OTLP push-based telemetry.

The OpenTelemetry section correctly describes:

  • Environment variables with clear defaults (OTEL_ENABLED=true, OTLP endpoint)
  • Push-based metrics flow (jam pushes to Prometheus OTLP endpoint, not the reverse)
  • Docker setup for local testing with Prometheus

The documentation now properly addresses the past concern about pull-based metrics endpoints—there's no reference to the old 9464/metrics pattern. The setup is clearly push-based via OTLP.

Consider clarifying that jam does not expose a local /metrics endpoint by default (metrics are pushed to the configured OTLP endpoint). This could help users avoid confusion if they search for metrics on the jam process itself.

Consider adding a clarifying note like:

 To inspect metrics pushed over OTLP, start a Prometheus container with the OTLP receiver enabled:
+
+(Note: jam pushes metrics to the configured OTLP endpoint and does not expose a local `/metrics` endpoint.)

This prevents users from looking for metrics in the wrong place.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca81567 and 0166efd.

📒 Files selected for processing (2)
  • bin/jam/README.md (1 hunks)
  • packages/README.md (1 hunks)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: benchmarks (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: test (22.x)
  • GitHub Check: e2e (22.x)
🔇 Additional comments (2)
packages/README.md (1)

32-32: Documentation entry for new telemetry package is well-placed and formatted correctly.

The addition of the telemetry package to the core packages list is appropriately positioned (alphabetically between shuffling and trie), follows the established format, and provides a concise description that aligns with the package's purpose of OpenTelemetry initialization utilities.

bin/jam/README.md (1)

135-163: The Prometheus OTLP receiver flag is correct.

The flag --web.enable-otlp-receiver is the correct way to enable OTLP HTTP receiver in current Prometheus versions. The documentation in lines 157-160 is accurate.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/jam/node/main-importer.ts (1)

24-54: LGTM! NodeName properly propagated to worker configs.

The nodeName is correctly extracted from config and passed to both InMemWorkerConfig and LmdbWorkerConfig constructors, enabling per-node telemetry identification.

Minor style note: The nodeName variable is extracted on line 24 but config.nodeName is still used directly on line 29. For consistency, consider moving the extraction before line 27 and using the nodeName variable throughout, or just use config.nodeName directly everywhere.

Optional refactor for consistency:

 const chainSpec = getChainSpec(config.node.flavor);
 const blake2b = await Blake2b.createHasher();
+const nodeName = config.nodeName;
 const omitSealVerification = false;

 const { dbPath, genesisHeaderHash } = getDatabasePath(
   blake2b,
-  config.nodeName,
+  nodeName,
   config.node.chainSpec.genesisHeader,
   withRelPath(config.node.databaseBasePath ?? "<in-memory>"),
 );
-const nodeName = config.nodeName;
packages/workers/importer/importer.ts (1)

163-168: Consider adding metrics for state update failures.

Lines 165-168 handle state update errors but don't record specific metrics like verification and execution failures do. While the overall import failure is captured at line 107, adding granular metrics here would improve observability for debugging state-related issues.

Example:

 if (updateResult.isError) {
   logger.error`🧱 Unable to update state: ${resultToString(updateResult)}`;
+  this.metrics.recordStateUpdateFailed?.(resultToString(updateResult));
   return importerError(ImporterErrorKind.Update, updateResult);
 }

Note: This assumes the metrics API supports state update tracking.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0166efd and 017d3c7.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (6)
  • bin/jam/package.json (1 hunks)
  • packages/jam/node/main-importer.ts (3 hunks)
  • packages/workers/api-node/config.ts (6 hunks)
  • packages/workers/api-node/port.ts (0 hunks)
  • packages/workers/api/config.ts (2 hunks)
  • packages/workers/importer/importer.ts (6 hunks)
💤 Files with no reviewable changes (1)
  • packages/workers/api-node/port.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/jam/package.json
🧰 Additional context used
📓 Path-based instructions (1)
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: rules from ./CODESTYLE.md should be adhered to.

**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in *.test.ts files. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.

**/*.ts: as conversions must not be used. Suggest using tryAs conversion methods.

**/*.ts: Classes with static Codec field must have private constructor and static create method.

**/*.ts: Casting a bigint (or U64) using Number(x) must have an explanation comment why
it is safe.

**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.

Files:

  • packages/workers/importer/importer.ts
  • packages/jam/node/main-importer.ts
  • packages/workers/api/config.ts
  • packages/workers/api-node/config.ts
🧬 Code graph analysis (1)
packages/workers/importer/importer.ts (2)
packages/core/utils/debug.ts (1)
  • now (13-13)
packages/core/utils/result.ts (1)
  • resultToString (73-78)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (22.x)
  • GitHub Check: benchmarks (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: build (22.x)
  • GitHub Check: jam-e2e (22.x)
  • GitHub Check: e2e (22.x)
🔇 Additional comments (8)
packages/workers/api/config.ts (1)

6-7: LGTM! Clean addition of nodeName property.

The nodeName property is consistently added to the WorkerConfig interface and DirectWorkerConfig implementation. The factory method and constructor properly handle the new parameter, following the existing patterns in the codebase.

Also applies to: 24-40

packages/workers/api-node/config.ts (2)

17-29: LGTM! Comprehensive nodeName integration for LmdbWorkerConfig.

The nodeName parameter is properly threaded through the static factory method, constructor, and serialization methods (intoTransferable/fromTransferable). The TransferableConfig type is correctly updated to include nodeName for cross-thread communication.

Also applies to: 39-48, 66-78


90-102: LGTM! Clean nodeName integration for InMemWorkerConfig.

The nodeName parameter is correctly added to the static factory method and constructor, following the same pattern as LmdbWorkerConfig.

Also applies to: 107-112

packages/workers/importer/importer.ts (5)

10-11: LGTM! Clean imports for telemetry instrumentation.

The addition of now for high-resolution timing and the metrics module are appropriate for the telemetry functionality being added.


37-37: LGTM! Proper metrics initialization.

The metrics field is correctly declared as readonly and initialized early in the constructor.

Also applies to: 47-47


90-96: LGTM! Comprehensive block import timing.

The timing instrumentation correctly captures both successful and failed import attempts with accurate duration measurements.

Also applies to: 101-101, 107-107


119-127: LGTM! Proper verification metrics.

Both success and failure paths of block verification are correctly instrumented with timing and error details.


152-160: Clarify whether the hardcoded 0 is intentional or a placeholder.

The metrics API parameter is cost (not code), representing block execution gas. However, the STF result type does not include gas/cost data, making it impossible to extract this value from res.ok. Line 160 passing hardcoded 0 is the only feasible option given the current STF API.

Consider clarifying intent:

  • If execution cost is always 0 because STF is a pure state transition, add a comment explaining this.
  • If this is a placeholder for future implementation, document it as such or use a named constant like NO_GAS_COST = 0.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e898807 and f2be9de.

📒 Files selected for processing (2)
  • bin/jam/index.ts (2 hunks)
  • packages/core/telemetry/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/jam/index.ts
🧰 Additional context used
📓 Path-based instructions (2)
packages/core/**/*.ts

⚙️ CodeRabbit configuration file

packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined under packages/jam/**)

Files:

  • packages/core/telemetry/index.ts
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: rules from ./CODESTYLE.md should be adhered to.

**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in *.test.ts files. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.

**/*.ts: as conversions must not be used. Suggest using tryAs conversion methods.

**/*.ts: Classes with static Codec field must have private constructor and static create method.

**/*.ts: Casting a bigint (or U64) using Number(x) must have an explanation comment why
it is safe.

**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.

Files:

  • packages/core/telemetry/index.ts
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: e2e (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: benchmarks (22.x)
  • GitHub Check: state_transition (22.x)
  • GitHub Check: test (22.x)
  • GitHub Check: jam-e2e (22.x)
🔇 Additional comments (7)
packages/core/telemetry/index.ts (7)

1-11: LGTM!

The imports are appropriate for OpenTelemetry integration and comply with the coding guideline that core packages must not import JAM-related packages.


13-17: LGTM!

The interface design is clean and provides the necessary configuration for telemetry initialization.


24-24: Verify the opt-out telemetry design.

Telemetry is enabled by default unless OTEL_ENABLED is explicitly set to "false". This opt-out approach means telemetry will be active in any environment where the variable is undefined or set to any value other than "false".

For production deployments, opt-in telemetry is typically preferred to avoid unexpected metric collection and network traffic. Please confirm whether this default-enabled design is intentional for both development and production environments.


32-40: LGTM!

The internal configuration interface appropriately extends the public API with implementation details.


79-79: Verify the 3-second export interval is appropriate.

The exportIntervalMillis is set to 3 seconds, which is quite aggressive. Standard practice for production environments is typically 30-60 seconds to balance observability with network overhead and server load.

While this may be appropriate for local testing (as mentioned in the PR objectives), please confirm whether this is the intended interval for production deployments or if it should be made configurable.


62-70: Good documentation of multi-threaded metrics challenges.

The comment clearly explains why pull-based Prometheus metrics aren't implemented yet and acknowledges the cross-worker aggregation challenge. This provides valuable context for future work.


110-119: LGTM!

The shutdown function properly handles null SDKs, includes appropriate error handling, and uses suitable log levels for success and error cases.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/core/telemetry/index.ts (1)

117-127: Startup failure handling may leave SDK in invalid state.

The try-catch block logs startup errors but still returns the SDK even if sdk.start() throws an exception. This could leave the SDK in an invalid or partially initialized state.

Past review comments suggested returning null when startup fails. While marked as addressed in earlier commits, the current code does not return null on startup failure.

Consider applying this fix:

 try {
   sdk.start();
   if (config.isMain) {
     logger.info`📳 OTLP metrics will be exported to ${otlpEndpoint}`;
   }
 } catch (error) {
   logger.error`🔴 Error initializing OpenTelemetry: ${error}`;
+  return null;
 }

 return sdk;
🧹 Nitpick comments (1)
packages/workers/importer/bootstrap-main.ts (1)

1-1: LGTM! Consider consistent naming across workers.

The telemetry initialization is correct. However, note that this file uses sdk as the variable name, while packages/workers/jam-network/bootstrap-main.ts uses tele. Consider using consistent naming across workers for better readability.

Apply this diff for consistency with other workers:

 // Initialize OpenTelemetry for this worker
-const sdk = Telemetry.initialize({
+const tele = Telemetry.initialize({
   nodeName: config.nodeName,
   worker: "importer",
 });
 await main(config, comms);
-await sdk?.close();
+await tele?.close();

Also applies to: 7-11

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2be9de and f07e93a.

📒 Files selected for processing (8)
  • bin/jam/index.ts (4 hunks)
  • bin/rpc/test/e2e-setup.ts (1 hunks)
  • bin/tci/index.ts (1 hunks)
  • packages/core/telemetry/index.ts (1 hunks)
  • packages/jam/node/main.ts (5 hunks)
  • packages/workers/block-generator/bootstrap-main.ts (1 hunks)
  • packages/workers/importer/bootstrap-main.ts (1 hunks)
  • packages/workers/jam-network/bootstrap-main.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/workers/block-generator/bootstrap-main.ts
🧰 Additional context used
📓 Path-based instructions (2)
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: rules from ./CODESTYLE.md should be adhered to.

**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in *.test.ts files. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.

**/*.ts: as conversions must not be used. Suggest using tryAs conversion methods.

**/*.ts: Classes with static Codec field must have private constructor and static create method.

**/*.ts: Casting a bigint (or U64) using Number(x) must have an explanation comment why
it is safe.

**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.

Files:

  • packages/workers/importer/bootstrap-main.ts
  • packages/jam/node/main.ts
  • packages/workers/jam-network/bootstrap-main.ts
  • bin/tci/index.ts
  • bin/rpc/test/e2e-setup.ts
  • bin/jam/index.ts
  • packages/core/telemetry/index.ts
packages/core/**/*.ts

⚙️ CodeRabbit configuration file

packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined under packages/jam/**)

Files:

  • packages/core/telemetry/index.ts
🧠 Learnings (2)
📚 Learning: 2025-05-26T21:31:58.688Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 399
File: .github/workflows/vectors-jamduna.yml:78-78
Timestamp: 2025-05-26T21:31:58.688Z
Learning: In the typeberry project, the npm start script in the test-runner workspace is designed to accept test suite names as arguments (e.g., "jamduna", "w3f"). The command `npm start -w typeberry/test-runner jamduna` is the correct way to run the jamduna test suite, not `npm run jamduna`. This is an architectural design choice where the start script acts as a unified entry point for different test suites.

Applied to files:

  • packages/jam/node/main.ts
  • bin/jam/index.ts
📚 Learning: 2025-06-18T20:35:13.536Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 442
File: packages/core/pvm-debugger-adapter/index.ts:22-40
Timestamp: 2025-06-18T20:35:13.536Z
Learning: The `measure` function in `typeberry/utils/debug.ts` attempts environment detection by checking `process === undefined` but still causes bundling issues because bundlers see the `process.hrtime` reference in the Node.js branch.

Applied to files:

  • bin/jam/index.ts
🧬 Code graph analysis (3)
packages/workers/importer/bootstrap-main.ts (2)
packages/core/telemetry/index.ts (1)
  • Telemetry (19-49)
packages/jam/node/main.ts (1)
  • main (29-141)
packages/jam/node/main.ts (3)
packages/jam/node/jam-config.ts (1)
  • JamConfig (21-58)
packages/core/telemetry/index.ts (1)
  • Telemetry (19-49)
packages/jam/node/common.ts (1)
  • logger (19-19)
bin/jam/index.ts (1)
packages/core/telemetry/index.ts (1)
  • Telemetry (19-49)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: run (22.x)
  • GitHub Check: e2e (22.x)
  • GitHub Check: test (22.x)
  • GitHub Check: benchmarks (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: run (22.x)
  • GitHub Check: jam-e2e (22.x)
🔇 Additional comments (19)
bin/rpc/test/e2e-setup.ts (1)

16-16: LGTM!

Passing null for telemetry in the test setup is appropriate, as telemetry is not needed for this end-to-end test scenario.

bin/tci/index.ts (1)

31-31: LGTM!

The third argument (null) is correctly added to align with the updated main signature. For the TCI interface, passing null for telemetry is appropriate.

packages/jam/node/main.ts (5)

11-11: LGTM!

The function signature update to accept telemetry: Telemetry | null is well-designed:

  • Uses type-only import for Telemetry, minimizing runtime dependencies
  • Nullable parameter allows graceful degradation when telemetry is disabled
  • Aligns with the telemetry integration pattern across the codebase

Also applies to: 17-17, 29-33


58-58: LGTM!

Adding nodeName to baseConfig enables workers to initialize their own telemetry instances with the correct node name identifier.


80-84: LGTM!

The metrics recording is well-integrated into the best header announcement flow:

  • Slot is properly materialized from the timeSlotIndex
  • Block hash is converted to string for metric recording
  • Metrics are recorded before notifying listeners, ensuring observability

134-135: LGTM!

Telemetry shutdown is properly integrated into the cleanup sequence with optional chaining to handle null telemetry gracefully.


40-40: Metrics initialization is safe and properly handles disabled telemetry.

The verification confirms that metrics.createMetrics() safely handles the case when OpenTelemetry is not initialized. When the OpenTelemetry JS SDK (MeterProvider) is not installed/initialized, metrics.getMeter() returns a no-op Meter — a Meter whose instruments do nothing without errors. This is the standard OpenTelemetry API behavior and allows instrumentation to call getMeter() safely regardless of whether telemetry is enabled. The code in packages/jam/node/metrics.ts correctly relies on this pattern, and no changes are required.

packages/workers/jam-network/bootstrap-main.ts (2)

1-1: LGTM!

Telemetry initialization follows the correct pattern:

  • Initializes before the main worker logic
  • Uses the appropriate worker type identifier ("network")
  • Stores the result for cleanup

Also applies to: 7-11


13-13: LGTM!

Telemetry cleanup is properly positioned after the main worker logic completes, with optional chaining to handle disabled telemetry.

packages/workers/importer/bootstrap-main.ts (1)

13-15: LGTM!

Telemetry cleanup is properly sequenced before the forceful exit. The process.exit(0) ensures the worker terminates cleanly after telemetry shutdown.

bin/jam/index.ts (4)

11-11: LGTM!

Telemetry initialization in the main entry point is correctly configured:

  • isMain: true distinguishes the main thread from worker threads
  • Positioned appropriately before node initialization
  • Clear comment indicates intent

Also applies to: 84-90


101-109: LGTM!

Telemetry is correctly propagated to the import path, enabling observability during block import operations.


118-118: LGTM!

Telemetry is correctly propagated to the main node execution path.


93-96: Fuzz target path does not use telemetry.

The fuzz target path calls mainFuzz without telemetry integration. This is likely intentional since fuzzing is a development/testing activity that doesn't require production observability.

packages/core/telemetry/index.ts (5)

13-17: LGTM!

The TelemetryConfig interface is well-designed with appropriate fields for distinguishing main thread vs. worker threads and identifying the node and worker type.


19-49: LGTM!

The Telemetry class follows good design patterns:

  • Factory method (initialize) encapsulates SDK setup complexity
  • Private constructor with public factory method
  • Gracefully handles disabled telemetry by returning null
  • Async close() method properly handles shutdown and logs errors

51-66: LGTM!

The internal TelemetryConfigFull interface and the early return when telemetry is disabled are correctly implemented.


68-101: LGTM! Consider documenting the export interval choice.

The resource and exporter setup is well-structured:

  • Resource attributes correctly identify the service, version, and worker type
  • The comment at lines 81-89 clearly explains the multi-worker challenge with Prometheus pull-based metrics
  • Push-based OTLP exporter is an appropriate solution

The 3-second export interval (line 98) is reasonable for development/testing but might need tuning for production workloads.


1-9: All OpenTelemetry dependencies referenced in the imports are properly declared in packages/core/telemetry/package.json with valid version specifiers. No issues detected.

@github-actions
Copy link

View all
File Benchmark Ops
bytes/hex-from.ts[0] parse hex using Number with NaN checking 136146.61 ±0.41% 83.94% slower
bytes/hex-from.ts[1] parse hex from char codes 847851.62 ±0.65% fastest ✅
bytes/hex-from.ts[2] parse hex from string nibbles 543102.28 ±0.55% 35.94% slower
math/count-bits-u32.ts[0] standard method 85478225.52 ±1.5% 65.93% slower
math/count-bits-u32.ts[1] magic 250882030.94 ±6.36% fastest ✅
math/count-bits-u64.ts[0] standard method 8340072.73 ±1.93% 91% slower
math/count-bits-u64.ts[1] magic 92646752.47 ±1.25% fastest ✅
math/mul_overflow.ts[0] multiply and bring back to u32 247206758.96 ±7.57% 7.2% slower
math/mul_overflow.ts[1] multiply and take modulus 266392576.44 ±5.9% fastest ✅
math/switch.ts[0] switch 255266131.96 ±5.76% 8.07% slower
math/switch.ts[1] if 277678994.48 ±4.66% fastest ✅
collections/map-set.ts[0] 2 gets + conditional set 119992.35 ±0.3% fastest ✅
collections/map-set.ts[1] 1 get 1 set 60656.47 ±0.35% 49.45% slower
codec/bigint.compare.ts[0] compare custom 257259870.71 ±6.01% fastest ✅
codec/bigint.compare.ts[1] compare bigint 252840166.97 ±5.81% 1.72% slower
codec/bigint.decode.ts[0] decode custom 205000219.05 ±5.45% fastest ✅
codec/bigint.decode.ts[1] decode bigint 106868841 ±2.69% 47.87% slower
bytes/hex-to.ts[0] number toString + padding 319426.62 ±0.89% fastest ✅
bytes/hex-to.ts[1] manual 16333.13 ±0.14% 94.89% slower
hash/index.ts[0] hash with numeric representation 190.42 ±0.5% 27.87% slower
hash/index.ts[1] hash with string representation 121.06 ±0.12% 54.14% slower
hash/index.ts[2] hash with symbol representation 183.66 ±0.49% 30.43% slower
hash/index.ts[3] hash with uint8 representation 164.26 ±0.34% 37.78% slower
hash/index.ts[4] hash with packed representation 263.99 ±0.65% fastest ✅
hash/index.ts[5] hash with bigint representation 193.13 ±0.09% 26.84% slower
hash/index.ts[6] hash with uint32 representation 203.03 ±0.03% 23.09% slower
logger/index.ts[0] console.log with string concat 9062997.29 ±35.21% fastest ✅
logger/index.ts[1] console.log with args 633160.98 ±89.03% 93.01% slower
math/add_one_overflow.ts[0] add and take modulus 227578509.11 ±31.64% 19.22% slower
math/add_one_overflow.ts[1] condition before calculation 281725746.15 ±4.16% fastest ✅
codec/decoding.ts[0] manual decode 76205.81 ±30.78% 99.96% slower
codec/decoding.ts[1] int32array decode 210363464.81 ±4.2% fastest ✅
codec/decoding.ts[2] dataview decode 208025393.86 ±3.62% 1.11% slower
codec/encoding.ts[0] manual encode 2181817.42 ±0.25% 30.05% slower
codec/encoding.ts[1] int32array encode 2775245.25 ±1.07% 11.03% slower
codec/encoding.ts[2] dataview encode 3119135.48 ±0.27% fastest ✅
codec/view_vs_object.ts[0] Get the first field from Decoded 495510.26 ±0.21% 0.1% slower
codec/view_vs_object.ts[1] Get the first field from View 102014.58 ±0.18% 79.43% slower
codec/view_vs_object.ts[2] Get the first field as view from View 101667.24 ±0.15% 79.5% slower
codec/view_vs_object.ts[3] Get two fields from Decoded 492249.17 ±0.15% 0.76% slower
codec/view_vs_object.ts[4] Get two fields from View 81005.43 ±0.14% 83.67% slower
codec/view_vs_object.ts[5] Get two fields from materialized from View 169900.81 ±0.18% 65.75% slower
codec/view_vs_object.ts[6] Get two fields as views from View 80819.3 ±0.14% 83.71% slower
codec/view_vs_object.ts[7] Get only third field from Decoded 496030.07 ±0.24% fastest ✅
codec/view_vs_object.ts[8] Get only third field from View 100267.36 ±0.21% 79.79% slower
codec/view_vs_object.ts[9] Get only third field as view from View 100045.05 ±0.21% 79.83% slower
codec/view_vs_collection.ts[0] Get first element from Decoded 30243.42 ±0.3% 16.34% slower
codec/view_vs_collection.ts[1] Get first element from View 36150.88 ±90.2% fastest ✅
codec/view_vs_collection.ts[2] Get 50th element from Decoded 23510.17 ±5.54% 34.97% slower
codec/view_vs_collection.ts[3] Get 50th element from View 34592.8 ±0.32% 4.31% slower
codec/view_vs_collection.ts[4] Get last element from Decoded 30328.67 ±0.32% 16.11% slower
codec/view_vs_collection.ts[5] Get last element from View 23646.16 ±0.32% 34.59% slower
collections/hash-dict-vs-blob-dict_set.ts[0] StringHashDictionary 415.54 ±88.19% 39.32% slower
collections/hash-dict-vs-blob-dict_set.ts[1] BlobDictionary(1) 180.64 ±86.77% 73.62% slower
collections/hash-dict-vs-blob-dict_set.ts[2] BlobDictionary(2) 618.52 ±111.09% 9.68% slower
collections/hash-dict-vs-blob-dict_set.ts[3] BlobDictionary(3) 684.82 ±108.78% fastest ✅
collections/hash-dict-vs-blob-dict_set.ts[4] BlobDictionary(4) 135.1 ±88.41% 80.27% slower
collections/hash-dict-vs-blob-dict_set.ts[5] BlobDictionary(5) 567.5 ±113.25% 17.13% slower
collections/hash-dict-vs-blob-dict_delete.ts[0] StringHashDictionary 325.53 ±129.31% 2.28% slower
collections/hash-dict-vs-blob-dict_delete.ts[1] BlobDictionary(1) 319.87 ±124.73% 3.98% slower
collections/hash-dict-vs-blob-dict_delete.ts[2] BlobDictionary(2) 326.99 ±130.61% 1.84% slower
collections/hash-dict-vs-blob-dict_delete.ts[3] BlobDictionary(3) 292.92 ±128.59% 12.07% slower
collections/hash-dict-vs-blob-dict_delete.ts[4] BlobDictionary(4) 333.13 ±129.09% fastest ✅
collections/hash-dict-vs-blob-dict_delete.ts[5] BlobDictionary(5) 318.84 ±130.83% 4.29% slower
collections/hash-dict-vs-blob-dict_get.ts[0] StringHashDictionary 3620.36 ±0.32% 0.59% slower
collections/hash-dict-vs-blob-dict_get.ts[1] BlobDictionary(1) 3641.83 ±0.34% fastest ✅
collections/hash-dict-vs-blob-dict_get.ts[2] BlobDictionary(2) 3624.3 ±0.36% 0.48% slower
collections/hash-dict-vs-blob-dict_get.ts[3] BlobDictionary(3) 3629.37 ±0.31% 0.34% slower
collections/hash-dict-vs-blob-dict_get.ts[4] BlobDictionary(4) 3640.03 ±0.24% 0.05% slower
collections/hash-dict-vs-blob-dict_get.ts[5] BlobDictionary(5) 3627.02 ±0.26% 0.41% slower
collections/map_vs_sorted.ts[0] Map 305970.35 ±0.12% fastest ✅
collections/map_vs_sorted.ts[1] Map-array 103645.77 ±0.04% 66.13% slower
collections/map_vs_sorted.ts[2] Array 60188.9 ±2.95% 80.33% slower
collections/map_vs_sorted.ts[3] SortedArray 197670.83 ±0.23% 35.4% slower
bytes/bytes-to-number.ts[0] Conversion with bitops 8775.03 ±4.14% 1.56% slower
bytes/bytes-to-number.ts[1] Conversion without bitops 8913.66 ±4.83% fastest ✅
bytes/compare.ts[0] Comparing Uint32 bytes 20505.38 ±0.74% 7.31% slower
bytes/compare.ts[1] Comparing raw bytes 22123.02 ±0.2% fastest ✅
hash/blake2b.ts[0] our hasher 2.22 ±0.15% fastest ✅
hash/blake2b.ts[1] blake2b js 0.05 ±0.35% 97.75% slower
crypto/ed25519.ts[0] native crypto 7.96 ±0.84% 73.96% slower
crypto/ed25519.ts[1] wasm lib 10.88 ±0.03% 64.41% slower
crypto/ed25519.ts[2] wasm lib batch 30.57 ±0.55% fastest ✅

Benchmarks summary: 83/83 OK ✅

@DrEverr DrEverr merged commit 8a21145 into main Nov 24, 2025
16 checks passed
@DrEverr DrEverr deleted the td-opentelemetry branch November 24, 2025 21:08
This was referenced Dec 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants