-
Notifications
You must be signed in to change notification settings - Fork 0
Simple telemetry #799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple telemetry #799
Conversation
📝 WalkthroughSummary by CodeRabbitRelease Notes
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds OpenTelemetry telemetry across the repository: a new core telemetry package with initialize/close APIs; telemetry initialization in CLIs and worker bootstrap files; new metrics modules and re-exports for networking, jam node, block generator, and importer; propagation of nodeName through worker configs and transferables; a now() utility and refactored measure(); removal of several debug logs; and workspace/package.json updates adding telemetry and OpenTelemetry dependencies. Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
♻️ Duplicate comments (1)
packages/workers/block-generator/package.json (1)
7-7: Dependency additions look correct.The workspace dependency
@typeberry/telemetrycorrectly uses"*"per coding guidelines. The OpenTelemetry API version should be verified as noted in other package.json files.Also applies to: 18-18
🧹 Nitpick comments (3)
bin/jam/README.md (1)
140-140: Fix bare URL markdown linting warning.The URL
http://localhost:9090/api/v1/otlpshould be formatted as inline code to resolve the markdown linting warning and improve consistency with the table format.Apply this diff:
-| `OTEL_EXPORTER_OTLP_ENDPOINT` | URL to push metrics to | http://localhost:9090/api/v1/otlp | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | URL to push metrics to | `http://localhost:9090/api/v1/otlp` |packages/workers/api-node/port.ts (1)
88-90: Avoid usingastype assertions.While the comment explains why the cast is safe, the coding guidelines specify that
asconversions should not be used. Consider creating a type guard function or using atryAspattern to validate the cast at runtime.As per coding guidelines
Apply this diff to use a more explicit type assertion approach:
+function asTransferable(buffer: ArrayBuffer): Transferable { + return buffer as unknown as Transferable; +} + postMessage<T>(event: string, codec: Codec<T>, msg: Envelope<T>): void { const encoded = Encoder.encodeObject(codec, msg.data, this.spec); const message: Message = { eventName: event, responseId: msg.responseId, data: encoded.raw, }; - // casting to transferable is safe here, since we know that encoder - // always returns owned uint8arrays. - this.port.postMessage(message, [encoded.raw.buffer as unknown as Transferable]); + // Encoder always returns owned uint8arrays, so the buffer is transferable + this.port.postMessage(message, [asTransferable(encoded.raw.buffer)]); }packages/core/telemetry/index.ts (1)
74-76: Consider making the export interval configurable.The 3-second export interval (line 76) is quite aggressive and could generate significant network traffic, especially with multiple workers. Consider making this configurable via an environment variable (e.g.,
OTEL_METRIC_EXPORT_INTERVAL_MILLIS) with 3000 as a default for local development but allowing operators to adjust it for production.Apply this diff:
+const exportIntervalMillis = Number(env.OTEL_METRIC_EXPORT_INTERVAL_MILLIS) || 3_000; + const otlpReader = new PeriodicExportingMetricReader({ exporter: otlpExporter, - exportIntervalMillis: 3_000, + exportIntervalMillis, });
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (39)
bin/jam/README.md(1 hunks)bin/jam/index.ts(2 hunks)bin/jam/package.json(1 hunks)bin/jam/telemetry.ts(1 hunks)package.json(1 hunks)packages/core/networking/index.ts(1 hunks)packages/core/networking/metrics.ts(1 hunks)packages/core/networking/package.json(1 hunks)packages/core/networking/setup.ts(5 hunks)packages/core/telemetry/index.ts(1 hunks)packages/core/telemetry/package.json(1 hunks)packages/core/utils/compatibility.ts(1 hunks)packages/core/utils/debug.ts(2 hunks)packages/core/utils/dev.ts(1 hunks)packages/core/utils/env.ts(1 hunks)packages/core/utils/index.ts(1 hunks)packages/jam/node/export.ts(1 hunks)packages/jam/node/index.ts(1 hunks)packages/jam/node/main-importer.ts(3 hunks)packages/jam/node/main.ts(4 hunks)packages/jam/node/metrics.ts(1 hunks)packages/jam/node/package.json(1 hunks)packages/jam/node/workers.ts(0 hunks)packages/workers/api-node/config.ts(6 hunks)packages/workers/api-node/port.ts(1 hunks)packages/workers/api/channel.ts(0 hunks)packages/workers/api/config.ts(2 hunks)packages/workers/block-generator/bootstrap-main.ts(1 hunks)packages/workers/block-generator/generator.ts(4 hunks)packages/workers/block-generator/index.ts(1 hunks)packages/workers/block-generator/metrics.ts(1 hunks)packages/workers/block-generator/package.json(2 hunks)packages/workers/importer/bootstrap-main.ts(1 hunks)packages/workers/importer/importer.ts(6 hunks)packages/workers/importer/index.ts(1 hunks)packages/workers/importer/metrics.ts(1 hunks)packages/workers/importer/package.json(2 hunks)packages/workers/jam-network/bootstrap-main.ts(1 hunks)packages/workers/jam-network/package.json(1 hunks)
💤 Files with no reviewable changes (2)
- packages/jam/node/workers.ts
- packages/workers/api/channel.ts
🧰 Additional context used
📓 Path-based instructions (3)
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined underpackages/jam/**)
Files:
packages/core/utils/index.tspackages/core/telemetry/index.tspackages/core/networking/metrics.tspackages/core/networking/setup.tspackages/core/utils/debug.tspackages/core/utils/env.tspackages/core/utils/dev.tspackages/core/networking/index.tspackages/core/utils/compatibility.ts
**/*.ts
⚙️ CodeRabbit configuration file
**/*.ts: rules from./CODESTYLE.mdshould be adhered to.
**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in*.test.tsfiles. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.
**/*.ts:asconversions must not be used. Suggest usingtryAsconversion methods.
**/*.ts: Classes withstatic Codecfield must have private constructor and staticcreatemethod.
**/*.ts: Casting abigint(orU64) usingNumber(x)must have an explanation comment why
it is safe.
**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
packages/core/utils/index.tspackages/jam/node/export.tspackages/jam/node/main.tspackages/workers/api/config.tspackages/jam/node/metrics.tsbin/jam/index.tspackages/workers/block-generator/index.tspackages/workers/block-generator/bootstrap-main.tspackages/core/telemetry/index.tspackages/core/networking/metrics.tsbin/jam/telemetry.tspackages/workers/block-generator/metrics.tspackages/jam/node/index.tspackages/workers/jam-network/bootstrap-main.tspackages/workers/block-generator/generator.tspackages/workers/api-node/port.tspackages/workers/importer/metrics.tspackages/workers/importer/index.tspackages/jam/node/main-importer.tspackages/core/networking/setup.tspackages/core/utils/debug.tspackages/core/utils/env.tspackages/core/utils/dev.tspackages/core/networking/index.tspackages/workers/api-node/config.tspackages/workers/importer/bootstrap-main.tspackages/workers/importer/importer.tspackages/core/utils/compatibility.ts
packages/**/package.json
⚙️ CodeRabbit configuration file
packages/**/package.json: Whenever a new package is added, thepackages/README.mdshould be updated accordingly.
packages/**/package.json: Dependencies coming from the workspace (typically the ones starting with@typeberry/)
must use*as version.
Files:
packages/workers/importer/package.jsonpackages/jam/node/package.jsonpackages/core/telemetry/package.jsonpackages/workers/jam-network/package.jsonpackages/workers/block-generator/package.jsonpackages/core/networking/package.json
🧠 Learnings (4)
📚 Learning: 2025-06-18T20:35:13.536Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 442
File: packages/core/pvm-debugger-adapter/index.ts:22-40
Timestamp: 2025-06-18T20:35:13.536Z
Learning: The `typeberry/utils` package has browser compatibility issues due to Node.js-specific code like `measure` function using `process.hrtime()` and `testUtils` importing `node:assert`, causing white screens in browser environments.
Applied to files:
bin/jam/package.jsonpackages/jam/node/main.tspackages/core/telemetry/package.jsonpackages/core/utils/debug.tspackages/workers/jam-network/package.json
📚 Learning: 2025-09-15T14:37:04.214Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 625
File: benchmarks/package.json:8-15
Timestamp: 2025-09-15T14:37:04.214Z
Learning: In the FluffyLabs/typeberry monorepo, the workspace:* protocol does not work for internal typeberry dependencies. The working approach is to use "*" as the version specifier for internal monorepo packages instead of "workspace:*".
Applied to files:
bin/jam/package.jsonpackages/workers/importer/package.jsonpackage.jsonpackages/jam/node/package.jsonpackages/core/telemetry/package.jsonpackages/workers/jam-network/package.jsonpackages/workers/block-generator/package.json
📚 Learning: 2025-06-18T20:35:13.536Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 442
File: packages/core/pvm-debugger-adapter/index.ts:22-40
Timestamp: 2025-06-18T20:35:13.536Z
Learning: The `measure` function in `typeberry/utils/debug.ts` attempts environment detection by checking `process === undefined` but still causes bundling issues because bundlers see the `process.hrtime` reference in the Node.js branch.
Applied to files:
packages/jam/node/main.tsbin/jam/index.tspackages/core/utils/debug.tspackages/core/utils/env.tspackages/core/utils/dev.ts
📚 Learning: 2025-05-26T21:31:58.688Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 399
File: .github/workflows/vectors-jamduna.yml:78-78
Timestamp: 2025-05-26T21:31:58.688Z
Learning: In the typeberry project, the npm start script in the test-runner workspace is designed to accept test suite names as arguments (e.g., "jamduna", "w3f"). The command `npm start -w typeberry/test-runner jamduna` is the correct way to run the jamduna test suite, not `npm run jamduna`. This is an architectural design choice where the start script acts as a unified entry point for different test suites.
Applied to files:
bin/jam/index.ts
🧬 Code graph analysis (12)
packages/jam/node/metrics.ts (3)
packages/core/networking/metrics.ts (1)
createMetrics(10-112)packages/workers/block-generator/metrics.ts (1)
createMetrics(10-52)packages/workers/importer/metrics.ts (1)
createMetrics(10-93)
bin/jam/index.ts (1)
packages/core/telemetry/index.ts (1)
initializeTelemetry(18-28)
packages/workers/block-generator/bootstrap-main.ts (2)
packages/workers/api-node/protocol.ts (1)
initWorker(106-163)packages/core/telemetry/index.ts (1)
initializeTelemetry(18-28)
packages/core/telemetry/index.ts (2)
packages/core/logger/index.ts (1)
Logger(20-120)packages/core/utils/env.ts (1)
env(1-1)
packages/core/networking/metrics.ts (3)
packages/jam/node/metrics.ts (1)
createMetrics(10-46)packages/workers/block-generator/metrics.ts (1)
createMetrics(10-52)packages/workers/importer/metrics.ts (1)
createMetrics(10-93)
packages/workers/block-generator/metrics.ts (3)
packages/core/networking/metrics.ts (1)
createMetrics(10-112)packages/jam/node/metrics.ts (1)
createMetrics(10-46)packages/workers/importer/metrics.ts (1)
createMetrics(10-93)
packages/workers/jam-network/bootstrap-main.ts (2)
packages/workers/api-node/protocol.ts (1)
initWorker(106-163)packages/core/telemetry/index.ts (1)
initializeTelemetry(18-28)
packages/workers/block-generator/generator.ts (2)
packages/jam/block/header.ts (1)
Header(128-194)packages/core/utils/debug.ts (1)
now(13-25)
packages/workers/importer/metrics.ts (3)
packages/core/networking/metrics.ts (1)
createMetrics(10-112)packages/jam/node/metrics.ts (1)
createMetrics(10-46)packages/workers/block-generator/metrics.ts (1)
createMetrics(10-52)
packages/core/networking/setup.ts (5)
packages/core/networking/peer-verification.ts (1)
peerVerification(4-21)packages/core/networking/quic-utils.ts (1)
addEventListener(6-19)packages/core/networking/certificate.ts (1)
PeerInfo(33-36)packages/core/networking/quic-peer.ts (1)
QuicPeer(13-59)packages/core/utils/debug.ts (1)
now(13-25)
packages/workers/importer/bootstrap-main.ts (2)
packages/workers/api-node/protocol.ts (1)
initWorker(106-163)packages/core/telemetry/index.ts (1)
initializeTelemetry(18-28)
packages/workers/importer/importer.ts (2)
packages/core/utils/debug.ts (1)
now(13-25)packages/core/utils/result.ts (1)
resultToString(73-78)
🪛 markdownlint-cli2 (0.18.1)
bin/jam/README.md
140-140: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: test (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: e2e (22.x)
🔇 Additional comments (42)
packages/workers/jam-network/package.json (1)
17-17: LGTM!Workspace dependency correctly uses
"*"as the version specifier per coding guidelines.packages/workers/api/config.ts (2)
6-7: LGTM!The addition of the
nodeNamefield to theWorkerConfiginterface is clean and well-documented.
24-36: LGTM!The
DirectWorkerConfig.newfactory method correctly accepts and propagates thenodeNameparameter through to the constructor. The implementation is consistent with the interface definition.packages/core/utils/debug.ts (2)
5-25: LGTM! Improved cross-platform timing with proper bundler handling.The new
now()utility correctly handles both browser and Node.js environments. The deferredprocess.hrtimereference (lines 16-24) addresses the bundler issues noted in previous learnings. The Number() cast on line 19 has an adequate safety explanation per coding guidelines—dividing hrtime nanoseconds by 1M yields milliseconds well belowNumber.MAX_SAFE_INTEGER.Based on learnings.
134-137: LGTM! Clean refactor using the new timing utility.The
measure()function now correctly uses the unifiednow()utility, simplifying the implementation while maintaining the same functionality.packages/core/networking/package.json (1)
16-16: OpenTelemetry API version is current and secure.Version 1.9.0 is the latest stable @opentelemetry/api on npm, and there are no reported/direct vulnerabilities for @opentelemetry/api v1.9.0 in public vulnerability databases.
packages/jam/node/package.json (1)
7-7: Version 1.9.0 is the latest stable release and has no known security vulnerabilities.The npm package @opentelemetry/api version 1.9.0 has no publicly reported CVEs, and security scans show no direct vulnerabilities.
packages/jam/node/index.ts (1)
6-6: LGTM!The metrics module export aligns with the telemetry integration and follows the existing barrel pattern.
packages/workers/jam-network/bootstrap-main.ts (1)
1-11: LGTM!The telemetry initialization follows the consistent pattern used across other workers and correctly initializes OpenTelemetry before the main worker logic executes.
packages/jam/node/export.ts (1)
45-45: LGTM!The
nodeNamepropagation aligns with the telemetry configuration requirements across the codebase.bin/jam/package.json (1)
17-17: LGTM!The dependency correctly uses the
"*"version specifier for the internal monorepo package.packages/core/networking/index.ts (1)
3-3: LGTM!The metrics module export follows the existing barrel pattern and makes the networking metrics API publicly accessible.
packages/workers/block-generator/index.ts (1)
2-2: LGTM!The metrics module export follows the existing barrel pattern and aligns with the telemetry instrumentation for block authoring.
packages/workers/block-generator/bootstrap-main.ts (1)
2-12: LGTM!The telemetry initialization follows the consistent pattern used across other workers and correctly initializes OpenTelemetry before the main worker logic executes.
bin/jam/index.ts (1)
84-89: LGTM! Telemetry initialization is properly positioned.The telemetry is initialized early in the startup sequence (after config preparation but before command handling), ensuring metrics are available throughout the node's lifecycle. The parameters correctly match the expected
TelemetryConfiginterface.packages/workers/importer/index.ts (1)
2-2: LGTM! Clean barrel export.The metrics module is properly exposed through the package's public API surface.
packages/jam/node/main-importer.ts (1)
24-24: LGTM! Parameter propagation is straightforward.The
nodeNameis correctly extracted and propagated to both worker configuration paths, enabling telemetry and worker identification.Also applies to: 37-37, 46-46
packages/workers/api-node/port.ts (1)
2-2: LGTM! Type-only imports improve tree-shaking.Converting to type-only imports for
MessagePortandTransferableis a good practice that helps avoid unnecessary runtime dependencies.packages/jam/node/main.ts (2)
35-36: LGTM! Metrics initialization is well-positioned.Creating the metrics instance early in the main function ensures it's available before any operations that need instrumentation.
75-79: LGTM! Metrics wrapper preserves async behavior correctly.The wrapper properly records the block change metric before delegating to the original handler, maintaining the async flow without blocking or race conditions.
packages/core/utils/dev.ts (1)
1-1: LGTM! Centralized environment access improves consistency.Using the shared
envmodule ensures consistent environment handling across the codebase. Please see the verification request inpackages/core/utils/compatibility.tsto confirm the env module implementation is correct.Based on learnings
Also applies to: 21-21
packages/workers/importer/package.json (1)
7-7: No issues identified.The opentelemetry-js repository shows 0 published security advisories, and version 1.9.0 is confirmed as the latest stable release. The dependency is correctly specified.
packages/core/utils/compatibility.ts (1)
1-2: env module correctly handles cross-platform compatibility.The centralized env module implementation uses a safe pattern: it returns an empty object in browser environments (where
processis undefined) and returnsprocess.envin Node.js. This ensures safe cross-platform compatibility without introducing bundling issues—thetypeofguard prevents runtime errors, andprocess.envis designed to be handled by bundlers during compilation.packages/core/utils/env.ts (1)
1-1: LGTM! Browser-safe environment detection.This centralized approach avoids the bundler issues previously encountered with direct
process.envreferences scattered throughout the codebase.Based on learnings
packages/workers/block-generator/generator.ts (4)
12-13: LGTM! Clean metrics integration.The imports correctly bring in the timing utility and metrics module for block authoring instrumentation.
19-19: LGTM! Metrics field properly typed.Using
ReturnType<typeof metrics.createMetrics>ensures type safety without duplicating the metrics interface.
74-75: LGTM! Proper timing capture and event recording.The start time is captured immediately before the block authoring logic, and the JIP-3 authoring event is recorded with the appropriate slot number.
132-133: LGTM! Duration tracking completed correctly.The block authoring duration is calculated and recorded along with the completion event, providing useful timing metrics for observability.
packages/workers/block-generator/metrics.ts (1)
1-52: LGTM! Consistent metrics implementation.The metrics module follows the established pattern from
packages/core/networking/metrics.tsandpackages/workers/importer/metrics.ts, providing a clean factory API for recording block authoring events. The JIP-3 event numbers (40, 41, 42) are properly documented.bin/jam/telemetry.ts (1)
1-3: LGTM! Clean barrel export.The re-export pattern provides a clean interface for telemetry functionality without duplicating implementation.
packages/workers/importer/bootstrap-main.ts (1)
1-11: LGTM! Proper telemetry initialization sequence.The telemetry is correctly initialized after the configuration is obtained (line 6) and before the main worker logic starts (line 12), ensuring metrics are captured from the start of the import workflow.
packages/workers/importer/importer.ts (4)
10-11: LGTM! Proper imports for timing and metrics.The imports correctly bring in the timing utility (
now) and the metrics module for instrumentation.
91-95: LGTM! Comprehensive import lifecycle tracking.The block import is properly instrumented with start event recording and timing capture for duration measurement.
119-127: LGTM! Verification timing and failure tracking.The verification phase is properly instrumented with both timing metrics and failure path recording.
152-160: Hardcoded gas cost should be replaced with actual execution cost.Line 160 records block execution with a gas cost of
0. This appears to be a placeholder value. The execution result (res.okon line 162) likely contains actual gas/cost information that should be extracted and recorded.Please verify if the execution result contains cost information and update the metrics call:
#!/bin/bash # Search for cost/gas information in the execution result type echo "=== Searching for gas/cost in transition/chain-stf.ts ===" rg -n -A5 -B5 'export.*transition|interface.*Result' --type=ts packages/core/transition/chain-stf.ts | head -50 echo -e "\n=== Checking StateUpdate type definition ===" rg -n 'export.*StateUpdate|interface.*StateUpdate' --type=ts packages/⛔ Skipped due to learnings
Learnt from: tomusdrw Repo: FluffyLabs/typeberry PR: 390 File: packages/jam/jam-host-calls/fetch.ts:220-226 Timestamp: 2025-05-16T16:23:08.685Z Learning: In the typeberry codebase, host call (HC) implementations define a gas cost but don't need to explicitly consume gas in their execute() methods. Gas accounting is handled outside of the host call implementation.packages/core/telemetry/index.ts (2)
18-28: LGTM! Clean configuration wrapper.The
initializeTelemetryfunction provides a simplified interface that translates high-level configuration into the detailed telemetry setup, reading environment variables with sensible defaults.
59-67: Excellent documentation of multi-worker challenges.The comment clearly explains why pull-based Prometheus metrics aren't used and acknowledges the architectural constraint of multiple worker threads, with a forward-looking note about TART/JIP-3.
packages/jam/node/metrics.ts (1)
42-44: LGTM!Correct conversion of the boolean
syncedattribute to string, consistent with OpenTelemetry best practices.packages/core/networking/setup.ts (1)
193-204: LGTM! Duration tracking correctly implemented.The connection duration measurement using
now()is properly implemented. The start time is captured when the peer is created, and the duration is calculated on disconnect and passed to the metrics.packages/workers/importer/metrics.ts (1)
1-93: LGTM! Well-structured metrics module.The importer metrics module follows the established pattern consistently:
- Slot values are correctly converted to strings (line 71)
- Boolean success flag is converted to string (line 67)
- Histograms and counters are properly instrumented
- JIP-3 event numbering is documented
packages/core/networking/metrics.ts (1)
1-112: LGTM! Comprehensive networking metrics module.The networking metrics module is well-structured:
- Clear JIP-3 event numbering (20-28) with documentation
- Connection duration histogram properly tracks time by side (in/out)
- All metric instruments have appropriate descriptions and units
- API methods correctly update counters and histograms with relevant attributes
packages/workers/api-node/config.ts (1)
17-29: LGTM! Clean addition of nodeName field.The
nodeNamefield has been consistently added to bothLmdbWorkerConfigandInMemWorkerConfig:
- Static factory methods properly accept and propagate the field
- Constructors updated to accept and store as public readonly
- TransferableConfig type and serialization methods correctly handle the field
- All code paths (new, fromTransferable, intoTransferable) are consistent
Also applies to: 39-53, 66-82, 90-115
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/jam/index.ts (1)
11-99: Signal handlers must callprocess.exit()after shutdown, and normal command paths should also clean up telemetry.The review comment correctly identifies that signal handlers prevent Node's default exit behavior. Once
process.on("SIGINT"/"SIGTERM")is attached, Node won't auto-exit on these signals—the handlers must explicitly callprocess.exit()or the process continues running indefinitely.Additionally, the code paths for FuzzTarget (line 106), Import (line 111), Export (line 115), and the default node start (line 117) all return without calling telemetry shutdown. While the signal handlers are the critical issue, these normal paths should also clean up telemetry before returning to ensure metrics are properly flushed on graceful termination.
The suggested fix in the review (wrapping shutdown + exit in a helper) is correct and aligns with the pattern already used in
bin/rpc/main.ts:77-79.
♻️ Duplicate comments (2)
packages/core/networking/setup.ts (1)
136-188: Avoid double-counting outbound failures and reconsider sentinel peer IDTwo related concerns in the outbound dial path:
- Failure metrics can be emitted twice for a single attempt
- In the
peerDetails.info === nullandpeer_id_mismatchbranches you callnetworkMetrics.recordConnectOutFailed(...)and then throw.- The outer
catchalso callsnetworkMetrics.recordConnectOutFailed(String(error)), so those cases will produce two failure events per failed dial.This will skew telemetry. A simple guard would ensure at most one failure metric per attempt, e.g.:
async function doDial() { const peerAddress = `${peer.host}:${peer.port}`; const peerDetails = peerVerification(); + let failureRecorded = false; try { @@ - if (peerDetails.info === null) { - networkMetrics.recordConnectOutFailed("no_peer_info"); + if (peerDetails.info === null) { + failureRecorded = true; + networkMetrics.recordConnectOutFailed("no_peer_info"); await client.destroy({ isApp: true, errorCode: CloseReason.PeerIdMismatch }); throw new Error("Client connected, but there is no peer details!"); } @@ - if (options.verifyName !== undefined && options.verifyName !== peerDetails.info.id) { - networkMetrics.recordConnectOutFailed("peer_id_mismatch"); + if (options.verifyName !== undefined && options.verifyName !== peerDetails.info.id) { + failureRecorded = true; + networkMetrics.recordConnectOutFailed("peer_id_mismatch"); await client.destroy({ isApp: true, errorCode: CloseReason.PeerIdMismatch }); @@ - } catch (error) { - networkMetrics.recordConnectOutFailed(String(error)); + } catch (error) { + if (!failureRecorded) { + networkMetrics.recordConnectOutFailed(String(error)); + } throw error; }
- Using
"unknown"as a synthetic peer ID is still a magic sentinel
recordConnectingOut(peerDetails.info?.id ?? "unknown", peerAddress)improves over the previous"pending"literal but still injects a fake peer ID into metrics. This is essentially the same concern as the earlier review comment: ideally the metric API would accept an optional peer ID (or a dedicated “peer_unknown” attribute) rather than overloading an ID label with a synthetic value.If you can adjust the metrics signature, something like the following would avoid magic IDs:
// in metrics.ts recordConnectingOut(peerAddress: string, peerId?: string) { /* ... */ } // here networkMetrics.recordConnectingOut(peerAddress, peerDetails.info?.id);That keeps the cardinality model clear and lets the exporter handle “unknown” explicitly instead of via a magic string.
packages/jam/node/metrics.ts (1)
33-44: Standardizeslotattribute type (string) to match other metrics.Here
slotis recorded as a number:bestBlockChangedCounter.add(1, { slot, header_hash: headerHash }); finalizedBlockChangedCounter.add(1, { slot, header_hash: headerHash });Other metrics modules in this PR use
slot.toString()for the same attribute, and mixing numeric vs. string types for a given attribute key can cause surprises in backends and make cross-metric queries harder.Recommend coercing to string for consistency:
recordBestBlockChanged(slot: number, headerHash: string): void { - bestBlockChangedCounter.add(1, { slot, header_hash: headerHash }); + bestBlockChangedCounter.add(1, { slot: slot.toString(), header_hash: headerHash }); }, recordFinalizedBlockChanged(slot: number, headerHash: string): void { - finalizedBlockChangedCounter.add(1, { slot, header_hash: headerHash }); + finalizedBlockChangedCounter.add(1, { slot: slot.toString(), header_hash: headerHash }); },
🧹 Nitpick comments (3)
packages/core/networking/setup.ts (1)
193-201: Fixreasonfallback when disconnect events have no detail
String(ev.detail) ?? "normal"will never produce"normal"becauseString(...)is nevernull/undefined. Ifev.detailis missing, you’ll emit"undefined"instead of"normal".Consider:
- const reason = String(ev.detail) ?? "normal"; + const reason = String(ev.detail ?? "normal");This preserves detailed reasons when present and falls back cleanly to
"normal"when not.packages/core/telemetry/index.ts (1)
39-104: HandleNodeSDK.start()as an async operation to avoid unhandled rejections.
NodeSDK.start()returns a Promise, but it’s called like a sync method inside atryblock:try { sdk.start(); logger.info`OpenTelemetry initialized for service: ${serviceName}`; } catch (error) { logger.error`Error initializing OpenTelemetry: ${error}`; }This won’t catch async startup errors and can produce unhandled promise rejections if the SDK fails to start.
You can keep
initializeTelemetryFullsynchronous while still handling errors by chaining on the Promise:- try { - sdk.start(); - logger.info`OpenTelemetry initialized for service: ${serviceName}`; - } catch (error) { - logger.error`Error initializing OpenTelemetry: ${error}`; - } + sdk + .start() + .then(() => { + logger.info`OpenTelemetry initialized for service: ${serviceName}`; + }) + .catch((error) => { + logger.error`Error initializing OpenTelemetry: ${error}`; + });This way you still return
sdkimmediately, but startup success/failure is properly logged and async errors are not dropped.packages/workers/importer/metrics.ts (1)
65-92: Stringifyslotattribute for consistency with other metrics.In the importer metrics you currently emit
slotas a number:recordBlockImportingStarted(slot: number): void { blockImportingCounter.add(1, { slot }); }Other modules in this PR use
slot.toString()for the same attribute key. Mixing numeric and string types forslotacross instruments/backends can complicate querying and, in some systems, lead to dropped or split series.Consider normalizing to string here as well:
recordBlockImportingStarted(slot: number): void { - blockImportingCounter.add(1, { slot }); + blockImportingCounter.add(1, { slot: slot.toString() }); },
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (8)
bin/jam/README.md(1 hunks)bin/jam/index.ts(2 hunks)packages/core/networking/setup.ts(5 hunks)packages/core/telemetry/index.ts(1 hunks)packages/core/utils/debug.ts(2 hunks)packages/jam/node/metrics.ts(1 hunks)packages/workers/block-generator/metrics.ts(1 hunks)packages/workers/importer/metrics.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/core/utils/debug.ts
- packages/workers/block-generator/metrics.ts
🧰 Additional context used
📓 Path-based instructions (2)
**/*.ts
⚙️ CodeRabbit configuration file
**/*.ts: rules from./CODESTYLE.mdshould be adhered to.
**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in*.test.tsfiles. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.
**/*.ts:asconversions must not be used. Suggest usingtryAsconversion methods.
**/*.ts: Classes withstatic Codecfield must have private constructor and staticcreatemethod.
**/*.ts: Casting abigint(orU64) usingNumber(x)must have an explanation comment why
it is safe.
**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
bin/jam/index.tspackages/core/telemetry/index.tspackages/jam/node/metrics.tspackages/core/networking/setup.tspackages/workers/importer/metrics.ts
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined underpackages/jam/**)
Files:
packages/core/telemetry/index.tspackages/core/networking/setup.ts
🧠 Learnings (1)
📚 Learning: 2025-05-26T21:31:58.688Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 399
File: .github/workflows/vectors-jamduna.yml:78-78
Timestamp: 2025-05-26T21:31:58.688Z
Learning: In the typeberry project, the npm start script in the test-runner workspace is designed to accept test suite names as arguments (e.g., "jamduna", "w3f"). The command `npm start -w typeberry/test-runner jamduna` is the correct way to run the jamduna test suite, not `npm run jamduna`. This is an architectural design choice where the start script acts as a unified entry point for different test suites.
Applied to files:
bin/jam/index.ts
🧬 Code graph analysis (2)
bin/jam/index.ts (1)
packages/core/telemetry/index.ts (2)
initializeTelemetry(18-28)shutdownTelemetry(106-115)
packages/core/telemetry/index.ts (2)
packages/core/logger/index.ts (1)
Logger(20-120)packages/core/utils/env.ts (1)
env(1-1)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: run (22.x)
- GitHub Check: state_transition (22.x)
- GitHub Check: e2e (22.x)
- GitHub Check: run (22.x)
- GitHub Check: test (22.x)
- GitHub Check: jam-e2e (22.x)
- GitHub Check: benchmarks (22.x)
🔇 Additional comments (2)
packages/core/networking/setup.ts (2)
6-47: Metrics initialization perQuicNetworkinstance looks goodCreating a single
networkMetricsviametrics.createMetrics()at the start ofQuic.setupis a clean integration point and keeps instrumentation localized to this networking instance.
98-129: Inbound connection metrics coverage is consistentThe sequence
recordConnectingIn→ conditionalrecordConnectInFailed/recordConnectionRefused→recordConnectedInon success cleanly models the inbound connection lifecycle and matches the existing control flow (self-connection, duplicate, missinglastConnectedPeer.info).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
bin/jam/README.md (1)
135-163: Documentation successfully clarifies OTLP push-based telemetry.The OpenTelemetry section correctly describes:
- Environment variables with clear defaults (OTEL_ENABLED=true, OTLP endpoint)
- Push-based metrics flow (jam pushes to Prometheus OTLP endpoint, not the reverse)
- Docker setup for local testing with Prometheus
The documentation now properly addresses the past concern about pull-based metrics endpoints—there's no reference to the old 9464/metrics pattern. The setup is clearly push-based via OTLP.
Consider clarifying that jam does not expose a local
/metricsendpoint by default (metrics are pushed to the configured OTLP endpoint). This could help users avoid confusion if they search for metrics on the jam process itself.Consider adding a clarifying note like:
To inspect metrics pushed over OTLP, start a Prometheus container with the OTLP receiver enabled: + +(Note: jam pushes metrics to the configured OTLP endpoint and does not expose a local `/metrics` endpoint.)This prevents users from looking for metrics in the wrong place.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bin/jam/README.md(1 hunks)packages/README.md(1 hunks)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: benchmarks (22.x)
- GitHub Check: run (22.x)
- GitHub Check: test (22.x)
- GitHub Check: e2e (22.x)
🔇 Additional comments (2)
packages/README.md (1)
32-32: Documentation entry for new telemetry package is well-placed and formatted correctly.The addition of the telemetry package to the core packages list is appropriately positioned (alphabetically between shuffling and trie), follows the established format, and provides a concise description that aligns with the package's purpose of OpenTelemetry initialization utilities.
bin/jam/README.md (1)
135-163: The Prometheus OTLP receiver flag is correct.The flag
--web.enable-otlp-receiveris the correct way to enable OTLP HTTP receiver in current Prometheus versions. The documentation in lines 157-160 is accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/jam/node/main-importer.ts (1)
24-54: LGTM! NodeName properly propagated to worker configs.The nodeName is correctly extracted from config and passed to both InMemWorkerConfig and LmdbWorkerConfig constructors, enabling per-node telemetry identification.
Minor style note: The nodeName variable is extracted on line 24 but config.nodeName is still used directly on line 29. For consistency, consider moving the extraction before line 27 and using the nodeName variable throughout, or just use config.nodeName directly everywhere.
Optional refactor for consistency:
const chainSpec = getChainSpec(config.node.flavor); const blake2b = await Blake2b.createHasher(); +const nodeName = config.nodeName; const omitSealVerification = false; const { dbPath, genesisHeaderHash } = getDatabasePath( blake2b, - config.nodeName, + nodeName, config.node.chainSpec.genesisHeader, withRelPath(config.node.databaseBasePath ?? "<in-memory>"), ); -const nodeName = config.nodeName;packages/workers/importer/importer.ts (1)
163-168: Consider adding metrics for state update failures.Lines 165-168 handle state update errors but don't record specific metrics like verification and execution failures do. While the overall import failure is captured at line 107, adding granular metrics here would improve observability for debugging state-related issues.
Example:
if (updateResult.isError) { logger.error`🧱 Unable to update state: ${resultToString(updateResult)}`; + this.metrics.recordStateUpdateFailed?.(resultToString(updateResult)); return importerError(ImporterErrorKind.Update, updateResult); }Note: This assumes the metrics API supports state update tracking.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (6)
bin/jam/package.json(1 hunks)packages/jam/node/main-importer.ts(3 hunks)packages/workers/api-node/config.ts(6 hunks)packages/workers/api-node/port.ts(0 hunks)packages/workers/api/config.ts(2 hunks)packages/workers/importer/importer.ts(6 hunks)
💤 Files with no reviewable changes (1)
- packages/workers/api-node/port.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- bin/jam/package.json
🧰 Additional context used
📓 Path-based instructions (1)
**/*.ts
⚙️ CodeRabbit configuration file
**/*.ts: rules from./CODESTYLE.mdshould be adhered to.
**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in*.test.tsfiles. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.
**/*.ts:asconversions must not be used. Suggest usingtryAsconversion methods.
**/*.ts: Classes withstatic Codecfield must have private constructor and staticcreatemethod.
**/*.ts: Casting abigint(orU64) usingNumber(x)must have an explanation comment why
it is safe.
**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
packages/workers/importer/importer.tspackages/jam/node/main-importer.tspackages/workers/api/config.tspackages/workers/api-node/config.ts
🧬 Code graph analysis (1)
packages/workers/importer/importer.ts (2)
packages/core/utils/debug.ts (1)
now(13-13)packages/core/utils/result.ts (1)
resultToString(73-78)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test (22.x)
- GitHub Check: benchmarks (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: build (22.x)
- GitHub Check: jam-e2e (22.x)
- GitHub Check: e2e (22.x)
🔇 Additional comments (8)
packages/workers/api/config.ts (1)
6-7: LGTM! Clean addition of nodeName property.The nodeName property is consistently added to the WorkerConfig interface and DirectWorkerConfig implementation. The factory method and constructor properly handle the new parameter, following the existing patterns in the codebase.
Also applies to: 24-40
packages/workers/api-node/config.ts (2)
17-29: LGTM! Comprehensive nodeName integration for LmdbWorkerConfig.The nodeName parameter is properly threaded through the static factory method, constructor, and serialization methods (intoTransferable/fromTransferable). The TransferableConfig type is correctly updated to include nodeName for cross-thread communication.
Also applies to: 39-48, 66-78
90-102: LGTM! Clean nodeName integration for InMemWorkerConfig.The nodeName parameter is correctly added to the static factory method and constructor, following the same pattern as LmdbWorkerConfig.
Also applies to: 107-112
packages/workers/importer/importer.ts (5)
10-11: LGTM! Clean imports for telemetry instrumentation.The addition of
nowfor high-resolution timing and the metrics module are appropriate for the telemetry functionality being added.
37-37: LGTM! Proper metrics initialization.The metrics field is correctly declared as readonly and initialized early in the constructor.
Also applies to: 47-47
90-96: LGTM! Comprehensive block import timing.The timing instrumentation correctly captures both successful and failed import attempts with accurate duration measurements.
Also applies to: 101-101, 107-107
119-127: LGTM! Proper verification metrics.Both success and failure paths of block verification are correctly instrumented with timing and error details.
152-160: Clarify whether the hardcoded0is intentional or a placeholder.The metrics API parameter is
cost(not code), representing block execution gas. However, the STF result type does not include gas/cost data, making it impossible to extract this value fromres.ok. Line 160 passing hardcoded0is the only feasible option given the current STF API.Consider clarifying intent:
- If execution cost is always 0 because STF is a pure state transition, add a comment explaining this.
- If this is a placeholder for future implementation, document it as such or use a named constant like
NO_GAS_COST = 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bin/jam/index.ts(2 hunks)packages/core/telemetry/index.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- bin/jam/index.ts
🧰 Additional context used
📓 Path-based instructions (2)
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined underpackages/jam/**)
Files:
packages/core/telemetry/index.ts
**/*.ts
⚙️ CodeRabbit configuration file
**/*.ts: rules from./CODESTYLE.mdshould be adhered to.
**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in*.test.tsfiles. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.
**/*.ts:asconversions must not be used. Suggest usingtryAsconversion methods.
**/*.ts: Classes withstatic Codecfield must have private constructor and staticcreatemethod.
**/*.ts: Casting abigint(orU64) usingNumber(x)must have an explanation comment why
it is safe.
**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
packages/core/telemetry/index.ts
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: e2e (22.x)
- GitHub Check: run (22.x)
- GitHub Check: benchmarks (22.x)
- GitHub Check: state_transition (22.x)
- GitHub Check: test (22.x)
- GitHub Check: jam-e2e (22.x)
🔇 Additional comments (7)
packages/core/telemetry/index.ts (7)
1-11: LGTM!The imports are appropriate for OpenTelemetry integration and comply with the coding guideline that core packages must not import JAM-related packages.
13-17: LGTM!The interface design is clean and provides the necessary configuration for telemetry initialization.
24-24: Verify the opt-out telemetry design.Telemetry is enabled by default unless
OTEL_ENABLEDis explicitly set to"false". This opt-out approach means telemetry will be active in any environment where the variable is undefined or set to any value other than"false".For production deployments, opt-in telemetry is typically preferred to avoid unexpected metric collection and network traffic. Please confirm whether this default-enabled design is intentional for both development and production environments.
32-40: LGTM!The internal configuration interface appropriately extends the public API with implementation details.
79-79: Verify the 3-second export interval is appropriate.The
exportIntervalMillisis set to 3 seconds, which is quite aggressive. Standard practice for production environments is typically 30-60 seconds to balance observability with network overhead and server load.While this may be appropriate for local testing (as mentioned in the PR objectives), please confirm whether this is the intended interval for production deployments or if it should be made configurable.
62-70: Good documentation of multi-threaded metrics challenges.The comment clearly explains why pull-based Prometheus metrics aren't implemented yet and acknowledges the cross-worker aggregation challenge. This provides valuable context for future work.
110-119: LGTM!The shutdown function properly handles null SDKs, includes appropriate error handling, and uses suitable log levels for success and error cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/core/telemetry/index.ts (1)
117-127: Startup failure handling may leave SDK in invalid state.The
try-catchblock logs startup errors but still returns the SDK even ifsdk.start()throws an exception. This could leave the SDK in an invalid or partially initialized state.Past review comments suggested returning
nullwhen startup fails. While marked as addressed in earlier commits, the current code does not returnnullon startup failure.Consider applying this fix:
try { sdk.start(); if (config.isMain) { logger.info`📳 OTLP metrics will be exported to ${otlpEndpoint}`; } } catch (error) { logger.error`🔴 Error initializing OpenTelemetry: ${error}`; + return null; } return sdk;
🧹 Nitpick comments (1)
packages/workers/importer/bootstrap-main.ts (1)
1-1: LGTM! Consider consistent naming across workers.The telemetry initialization is correct. However, note that this file uses
sdkas the variable name, whilepackages/workers/jam-network/bootstrap-main.tsusestele. Consider using consistent naming across workers for better readability.Apply this diff for consistency with other workers:
// Initialize OpenTelemetry for this worker -const sdk = Telemetry.initialize({ +const tele = Telemetry.initialize({ nodeName: config.nodeName, worker: "importer", }); await main(config, comms); -await sdk?.close(); +await tele?.close();Also applies to: 7-11
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
bin/jam/index.ts(4 hunks)bin/rpc/test/e2e-setup.ts(1 hunks)bin/tci/index.ts(1 hunks)packages/core/telemetry/index.ts(1 hunks)packages/jam/node/main.ts(5 hunks)packages/workers/block-generator/bootstrap-main.ts(1 hunks)packages/workers/importer/bootstrap-main.ts(1 hunks)packages/workers/jam-network/bootstrap-main.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/workers/block-generator/bootstrap-main.ts
🧰 Additional context used
📓 Path-based instructions (2)
**/*.ts
⚙️ CodeRabbit configuration file
**/*.ts: rules from./CODESTYLE.mdshould be adhered to.
**/*.ts: Any function whose documentation mention it must not be used in production code,
can be safely used in*.test.tsfiles. Other usage should be carefuly reviewed
and the comment must explain why it's safe to use there.
**/*.ts:asconversions must not be used. Suggest usingtryAsconversion methods.
**/*.ts: Classes withstatic Codecfield must have private constructor and staticcreatemethod.
**/*.ts: Casting abigint(orU64) usingNumber(x)must have an explanation comment why
it is safe.
**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
packages/workers/importer/bootstrap-main.tspackages/jam/node/main.tspackages/workers/jam-network/bootstrap-main.tsbin/tci/index.tsbin/rpc/test/e2e-setup.tsbin/jam/index.tspackages/core/telemetry/index.ts
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined underpackages/jam/**)
Files:
packages/core/telemetry/index.ts
🧠 Learnings (2)
📚 Learning: 2025-05-26T21:31:58.688Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 399
File: .github/workflows/vectors-jamduna.yml:78-78
Timestamp: 2025-05-26T21:31:58.688Z
Learning: In the typeberry project, the npm start script in the test-runner workspace is designed to accept test suite names as arguments (e.g., "jamduna", "w3f"). The command `npm start -w typeberry/test-runner jamduna` is the correct way to run the jamduna test suite, not `npm run jamduna`. This is an architectural design choice where the start script acts as a unified entry point for different test suites.
Applied to files:
packages/jam/node/main.tsbin/jam/index.ts
📚 Learning: 2025-06-18T20:35:13.536Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 442
File: packages/core/pvm-debugger-adapter/index.ts:22-40
Timestamp: 2025-06-18T20:35:13.536Z
Learning: The `measure` function in `typeberry/utils/debug.ts` attempts environment detection by checking `process === undefined` but still causes bundling issues because bundlers see the `process.hrtime` reference in the Node.js branch.
Applied to files:
bin/jam/index.ts
🧬 Code graph analysis (3)
packages/workers/importer/bootstrap-main.ts (2)
packages/core/telemetry/index.ts (1)
Telemetry(19-49)packages/jam/node/main.ts (1)
main(29-141)
packages/jam/node/main.ts (3)
packages/jam/node/jam-config.ts (1)
JamConfig(21-58)packages/core/telemetry/index.ts (1)
Telemetry(19-49)packages/jam/node/common.ts (1)
logger(19-19)
bin/jam/index.ts (1)
packages/core/telemetry/index.ts (1)
Telemetry(19-49)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: run (22.x)
- GitHub Check: e2e (22.x)
- GitHub Check: test (22.x)
- GitHub Check: benchmarks (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: jam-e2e (22.x)
🔇 Additional comments (19)
bin/rpc/test/e2e-setup.ts (1)
16-16: LGTM!Passing
nullfor telemetry in the test setup is appropriate, as telemetry is not needed for this end-to-end test scenario.bin/tci/index.ts (1)
31-31: LGTM!The third argument (
null) is correctly added to align with the updatedmainsignature. For the TCI interface, passingnullfor telemetry is appropriate.packages/jam/node/main.ts (5)
11-11: LGTM!The function signature update to accept
telemetry: Telemetry | nullis well-designed:
- Uses type-only import for Telemetry, minimizing runtime dependencies
- Nullable parameter allows graceful degradation when telemetry is disabled
- Aligns with the telemetry integration pattern across the codebase
Also applies to: 17-17, 29-33
58-58: LGTM!Adding
nodeNametobaseConfigenables workers to initialize their own telemetry instances with the correct node name identifier.
80-84: LGTM!The metrics recording is well-integrated into the best header announcement flow:
- Slot is properly materialized from the timeSlotIndex
- Block hash is converted to string for metric recording
- Metrics are recorded before notifying listeners, ensuring observability
134-135: LGTM!Telemetry shutdown is properly integrated into the cleanup sequence with optional chaining to handle
nulltelemetry gracefully.
40-40: Metrics initialization is safe and properly handles disabled telemetry.The verification confirms that
metrics.createMetrics()safely handles the case when OpenTelemetry is not initialized. When the OpenTelemetry JS SDK (MeterProvider) is not installed/initialized, metrics.getMeter() returns a no-op Meter — a Meter whose instruments do nothing without errors. This is the standard OpenTelemetry API behavior and allows instrumentation to callgetMeter()safely regardless of whether telemetry is enabled. The code inpackages/jam/node/metrics.tscorrectly relies on this pattern, and no changes are required.packages/workers/jam-network/bootstrap-main.ts (2)
1-1: LGTM!Telemetry initialization follows the correct pattern:
- Initializes before the main worker logic
- Uses the appropriate worker type identifier ("network")
- Stores the result for cleanup
Also applies to: 7-11
13-13: LGTM!Telemetry cleanup is properly positioned after the main worker logic completes, with optional chaining to handle disabled telemetry.
packages/workers/importer/bootstrap-main.ts (1)
13-15: LGTM!Telemetry cleanup is properly sequenced before the forceful exit. The
process.exit(0)ensures the worker terminates cleanly after telemetry shutdown.bin/jam/index.ts (4)
11-11: LGTM!Telemetry initialization in the main entry point is correctly configured:
isMain: truedistinguishes the main thread from worker threads- Positioned appropriately before node initialization
- Clear comment indicates intent
Also applies to: 84-90
101-109: LGTM!Telemetry is correctly propagated to the import path, enabling observability during block import operations.
118-118: LGTM!Telemetry is correctly propagated to the main node execution path.
93-96: Fuzz target path does not use telemetry.The fuzz target path calls
mainFuzzwithout telemetry integration. This is likely intentional since fuzzing is a development/testing activity that doesn't require production observability.packages/core/telemetry/index.ts (5)
13-17: LGTM!The
TelemetryConfiginterface is well-designed with appropriate fields for distinguishing main thread vs. worker threads and identifying the node and worker type.
19-49: LGTM!The
Telemetryclass follows good design patterns:
- Factory method (
initialize) encapsulates SDK setup complexity- Private constructor with public factory method
- Gracefully handles disabled telemetry by returning
null- Async
close()method properly handles shutdown and logs errors
51-66: LGTM!The internal
TelemetryConfigFullinterface and the early return when telemetry is disabled are correctly implemented.
68-101: LGTM! Consider documenting the export interval choice.The resource and exporter setup is well-structured:
- Resource attributes correctly identify the service, version, and worker type
- The comment at lines 81-89 clearly explains the multi-worker challenge with Prometheus pull-based metrics
- Push-based OTLP exporter is an appropriate solution
The 3-second export interval (line 98) is reasonable for development/testing but might need tuning for production workloads.
1-9: All OpenTelemetry dependencies referenced in the imports are properly declared inpackages/core/telemetry/package.jsonwith valid version specifiers. No issues detected.
View all
Benchmarks summary: 83/83 OK ✅ |
This PR exposes simple telemetry using OpenTelemetry APIs. I'm exposing some JIP-3 events, but please note this is not JIP-3 implementation (TART), because that will need a custom protocol.
However it's possible to run multiple nodes and have them push to prometheus instance. See instructions for testing it locally.
Because we run multiple worker threads, it's not trivial to expose metrics and have prometheus pull them - see explanation in
telemetrypackage.