feat(metrics): improve observability interoperability#98
Conversation
yordis
commented
Apr 24, 2026
- Operators should be able to export runtime metrics into standard OpenTelemetry pipelines without maintaining a separate extension surface.
- The server should remain quiet by default while still respecting established OTEL environment variable conventions when metrics export is intentionally enabled.
- Keeping this built in reduces deployment friction for observability-first environments while preserving the existing Prometheus path.
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
PR SummaryMedium Risk Overview Refactors Adds the Reviewer note (env vars): this PR relies on standard Reviewed by Cursor Bugbot for commit bde34e6. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 51 minutes and 45 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThe pull request adds OpenTelemetry Protocol (OTLP) metrics exporter support to the codebase. It introduces the OTLP exporter NuGet package, adds configuration infrastructure for enabling/disabling OTLP exports, refactors the metrics setup in ClusterVNodeStartup to conditionally register the OTLP exporter based on configuration and environment variables, and includes tests validating the configuration binding. Changes
Sequence DiagramsequenceDiagram
participant App as Application Startup
participant Config as IConfiguration
participant Metrics as ClusterVNodeStartup
participant OTEL as OpenTelemetry SDK
participant Exporter as OTLP Exporter
App->>Config: Load configuration + environment variables
Config-->>Metrics: Return MetricsConfiguration with Otlp.Enabled
Metrics->>Metrics: Check Otlp.Enabled & OTEL_METRICS_EXPORTER
alt OTLP Enabled
Metrics->>Metrics: Validate OTEL_EXPORTER_OTLP_ENDPOINT (absolute URI)
Metrics->>Metrics: Parse OTEL_EXPORTER_OTLP_TIMEOUT (positive int)
Metrics->>Metrics: Map OTEL_EXPORTER_OTLP_PROTOCOL to OtlpExportProtocol
Metrics->>OTEL: ConfigureMetrics(includeOtlp: true)
OTEL->>OTEL: AddMetrics with Prometheus + Histogram views
OTEL->>OTEL: AddOtlpExporter with validated settings
OTEL->>Exporter: Initialize with endpoint, headers, timeout, protocol
Exporter-->>OTEL: Ready to export metrics
else OTLP Disabled
Metrics->>OTEL: ConfigureMetrics(includeOtlp: false)
OTEL->>OTEL: AddMetrics (Prometheus only, no OTLP)
end
OTEL-->>Metrics: Metrics provider configured
Metrics-->>App: Startup complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (6)
src/EventStore.Core/ClusterVNodeStartup.cs (5)
327-330:ExpectedScrapeIntervalSecondsis now doing double duty — worth noting.This field was historically the Prometheus scrape cadence hint; it's now also the OTLP push
ExportIntervalMillisecondsfallback. The semantic is close enough, but the name no longer reflects the full scope. A doc comment on the property inMetricsConfiguration(or a release note) clarifying that it also drives the OTLP periodic export interval whenOTEL_METRIC_EXPORT_INTERVALis unset would prevent confusion.Also confirm the unit correctness:
ExpectedScrapeIntervalSeconds * 1000insidechecked(...)guards overflow, but there's no lower-bound guard. If an operator configuresExpectedScrapeIntervalSeconds = 0, the> 0check skips assignment and the SDK's default interval applies — that's fine. But negative values (possible via config binding) would also be skipped by the> 0check, silently masking misconfiguration. Minor.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/EventStore.Core/ClusterVNodeStartup.cs` around lines 327 - 330, The MetricsConfiguration.ExpectedScrapeIntervalSeconds name and semantics are ambiguous because it now also controls OTLP push fallback; add a doc comment on the MetricsConfiguration.ExpectedScrapeIntervalSeconds property stating it is the Prometheus scrape cadence hint and is used as the OTLP PeriodicExportingMetricReader export interval when OTEL_METRIC_EXPORT_INTERVAL/ExportIntervalMilliseconds is not set; also validate its value during configuration binding or in the constructor (reject/log and default on negative values) so negatives aren't silently ignored, and ensure callers that set readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds (the assignment using OtelMetricExportInterval and ExpectedScrapeIntervalSeconds * 1000) are documented to be milliseconds to avoid unit confusion.
350-358: Endpoint validation: consider also rejecting non-http(s) schemes.
Uri.TryCreate(..., UriKind.Absolute, ...)will happily accept schemes likefile://orftp://. The OTLP spec only defineshttp/httpsas valid schemes for HTTP transport (and gRPC acceptshttp/httpstoo in the .NET implementation). An early, explicit check here produces a better error than a late failure deep in the exporter.🛡️ Suggested tightening
- if (!Uri.TryCreate(endpoint, UriKind.Absolute, out var endpointUri)) + if (!Uri.TryCreate(endpoint, UriKind.Absolute, out var endpointUri) || + (endpointUri.Scheme != Uri.UriSchemeHttp && endpointUri.Scheme != Uri.UriSchemeHttps)) throw new InvalidConfigurationException( - $"{OtelExporterOtlpMetricsEndpoint} must be an absolute URI."); + $"{OtelExporterOtlpMetricsEndpoint} must be an absolute http:// or https:// URI.");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/EventStore.Core/ClusterVNodeStartup.cs` around lines 350 - 358, The code validates that OtelExporterOtlpMetricsEndpoint is an absolute URI but doesn't reject non-HTTP schemes; update the block that creates endpointUri (the Uri.TryCreate call and subsequent assignment to exporterOptions.Endpoint) to also verify endpointUri.Scheme is "http" or "https" (case-insensitive) and throw an InvalidConfigurationException with a clear message referencing OtelExporterOtlpMetricsEndpoint if it is not; keep the absolute-URI check, perform the scheme check immediately after Uri.TryCreate, and only set exporterOptions.Endpoint when both checks pass.
334-339: Nit: const placement.The
OtelMetricsExporter,OtelMetricExportInterval,OtelExporterOtlpMetrics*constants are declared after the methods that use them. C# allows this, but by convention constants/fields are grouped at the top of the type (or at least above the methods that reference them). Consider moving them alongside the other field declarations around line 64 for discoverability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/EventStore.Core/ClusterVNodeStartup.cs` around lines 334 - 339, Move the Otel-related constant declarations (OtelMetricsExporter, OtelMetricExportInterval, OtelExporterOtlpMetricsEndpoint, OtelExporterOtlpMetricsHeaders, OtelExporterOtlpMetricsTimeout, OtelExporterOtlpMetricsProtocol) from below the methods into the top of the ClusterVNodeStartup class where other fields/constants are declared so they appear before any methods that reference them; update their placement so they are grouped with the existing field declarations for better discoverability and maintain original const modifiers and names.
374-384: Protocol mapping: OK for the SDK, but error message could be more actionable.
OtlpExportProtocolin OpenTelemetry .NET 1.15.3 only exposesGrpcandHttpProtobuf, so rejecting anything else is correct. However, the OTel spec also listshttp/jsonas a valid value, and users coming from other OTel SDKs may try it. Consider making the exception message reference the supported values and the unsupported-but-spec-valid one explicitly, so the failure mode is obvious:💬 Suggested message
- _ => throw new InvalidConfigurationException( - $"{OtelExporterOtlpMetricsProtocol} must be 'grpc' or 'http/protobuf'.") + _ => throw new InvalidConfigurationException( + $"{OtelExporterOtlpMetricsProtocol} must be 'grpc' or 'http/protobuf' " + + $"('http/json' is not supported by the OpenTelemetry .NET SDK).")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/EventStore.Core/ClusterVNodeStartup.cs` around lines 374 - 384, Update the thrown InvalidConfigurationException in the protocol parsing block so the message clearly lists the supported values for OtlpExportProtocol (Grpc, HttpProtobuf) and also mentions that "http/json" is a valid OTLP spec value but not supported by this SDK; locate the Environment.GetEnvironmentVariable(OtelExporterOtlpMetricsProtocol) handling and change the InvalidConfigurationException raised when setting exporterOptions.Protocol to include OtelExporterOtlpMetricsProtocol, the allowed values ("grpc", "http/protobuf") and note "http/json" as spec-valid but unsupported to make the failure actionable.
323-331: Add clarifying comment about manual signal-specific environment variable handling.The code correctly applies
OTEL_EXPORTER_OTLP_METRICS_*environment variables manually becauseMeterProviderBuilder.AddOtlpExporterdoes not honor these signal-specific variables—only the genericOTEL_EXPORTER_OTLP_*variables are auto-applied. Without documentation, a future maintainer might mistakenly remove this call thinking it duplicates SDK behavior.Suggested comment
meterOptions.AddOtlpExporter((exporterOptions, readerOptions) => { + // NOTE: MeterProviderBuilder.AddOtlpExporter only honors the generic + // OTEL_EXPORTER_OTLP_* vars (via OtlpExporterOptions <- IConfiguration). + // Signal-specific OTEL_EXPORTER_OTLP_METRICS_* vars are not auto-applied, + // so we apply them manually below to comply with the OTel spec + // (per-signal options take precedence). ApplyMetricSpecificOtlpEnvironmentVariables(exporterOptions);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/EventStore.Core/ClusterVNodeStartup.cs` around lines 323 - 331, Add a clarifying comment above the call to meterOptions.AddOtlpExporter explaining that ApplyMetricSpecificOtlpEnvironmentVariables is intentionally invoked to manually apply OTEL_EXPORTER_OTLP_METRICS_* environment variables because MeterProviderBuilder.AddOtlpExporter only auto-applies the generic OTEL_EXPORTER_OTLP_* variables and does not honor signal-specific metric variables; mention that this prevents accidental removal and note that the subsequent conditional sets readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds from OtelMetricExportInterval/metricsConfiguration.ExpectedScrapeIntervalSeconds when the specific env var is not set. Reference the ApplyMetricSpecificOtlpEnvironmentVariables call, meterOptions.AddOtlpExporter, OtelMetricExportInterval, metricsConfiguration.ExpectedScrapeIntervalSeconds, and readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds in the comment.src/EventStore.Core.XUnit.Tests/Metrics/OtlpMetricsConfigurationTests.cs (1)
9-35: Consider adding coverage for the OTLP env‑var parsing paths.These tests only cover
Otlp.Enabledbinding. The OTLP wiring added inClusterVNodeStartup.csintroduces several failure modes that currently have no regression safety net:
OTEL_METRICS_EXPORTERcontainingotlp(comma-separated, case-insensitive) enabling the exporter even whenOtlp.Enabled=false.OTEL_EXPORTER_OTLP_METRICS_ENDPOINTrejecting relative URIs.OTEL_EXPORTER_OTLP_METRICS_TIMEOUTrejecting non-positive values.OTEL_EXPORTER_OTLP_METRICS_PROTOCOLmapping toOtlpExportProtocoland throwing on unknown values.Consider extracting
OtlpMetricExporterSelected/ApplyMetricSpecificOtlpEnvironmentVariablesinto aninternal statichelper (or a small class) so they can be unit‑tested directly without spinning up the full startup pipeline.Want me to sketch the helper extraction and accompanying tests?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/EventStore.Core.XUnit.Tests/Metrics/OtlpMetricsConfigurationTests.cs` around lines 9 - 35, Tests only validate binding of Otlp.Enabled but not the environment-variable parsing introduced in ClusterVNodeStartup.cs; extract the logic in OtlpMetricExporterSelected / ApplyMetricSpecificOtlpEnvironmentVariables into an internal static helper (or small internal class) so it can be called from unit tests, then add tests that cover: OTEL_METRICS_EXPORTER parsing (comma-separated, case-insensitive, detecting "otlp" to enable exporter even if Otlp.Enabled is false), OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (rejects relative URIs), OTEL_EXPORTER_OTLP_METRICS_TIMEOUT (rejects non‑positive values), and OTEL_EXPORTER_OTLP_METRICS_PROTOCOL (maps to OtlpExportProtocol and throws on unknown values); keep existing MetricsConfiguration.Get behavior and use the new helper from ClusterVNodeStartup.cs to preserve runtime behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/EventStore.Core/ClusterVNodeStartup.cs`:
- Around line 319-346: Document in the release notes that our code intentionally
diverges from the OTel spec by not treating OTEL_METRICS_EXPORTER as defaulting
to "otlp" when unset (reference OTEL_METRICS_EXPORTER and the behavior in
OtlpMetricExporterSelected()), and add a short migration note for operators;
also update the OtlpMetricExporterSelected() implementation to be more
future-proof by matching exporters that start with "otlp" (case-insensitive) so
values like "otlp/stdout" are recognized as selecting OTLP.
---
Nitpick comments:
In `@src/EventStore.Core.XUnit.Tests/Metrics/OtlpMetricsConfigurationTests.cs`:
- Around line 9-35: Tests only validate binding of Otlp.Enabled but not the
environment-variable parsing introduced in ClusterVNodeStartup.cs; extract the
logic in OtlpMetricExporterSelected /
ApplyMetricSpecificOtlpEnvironmentVariables into an internal static helper (or
small internal class) so it can be called from unit tests, then add tests that
cover: OTEL_METRICS_EXPORTER parsing (comma-separated, case-insensitive,
detecting "otlp" to enable exporter even if Otlp.Enabled is false),
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (rejects relative URIs),
OTEL_EXPORTER_OTLP_METRICS_TIMEOUT (rejects non‑positive values), and
OTEL_EXPORTER_OTLP_METRICS_PROTOCOL (maps to OtlpExportProtocol and throws on
unknown values); keep existing MetricsConfiguration.Get behavior and use the new
helper from ClusterVNodeStartup.cs to preserve runtime behavior.
In `@src/EventStore.Core/ClusterVNodeStartup.cs`:
- Around line 327-330: The MetricsConfiguration.ExpectedScrapeIntervalSeconds
name and semantics are ambiguous because it now also controls OTLP push
fallback; add a doc comment on the
MetricsConfiguration.ExpectedScrapeIntervalSeconds property stating it is the
Prometheus scrape cadence hint and is used as the OTLP
PeriodicExportingMetricReader export interval when
OTEL_METRIC_EXPORT_INTERVAL/ExportIntervalMilliseconds is not set; also validate
its value during configuration binding or in the constructor (reject/log and
default on negative values) so negatives aren't silently ignored, and ensure
callers that set
readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds
(the assignment using OtelMetricExportInterval and ExpectedScrapeIntervalSeconds
* 1000) are documented to be milliseconds to avoid unit confusion.
- Around line 350-358: The code validates that OtelExporterOtlpMetricsEndpoint
is an absolute URI but doesn't reject non-HTTP schemes; update the block that
creates endpointUri (the Uri.TryCreate call and subsequent assignment to
exporterOptions.Endpoint) to also verify endpointUri.Scheme is "http" or "https"
(case-insensitive) and throw an InvalidConfigurationException with a clear
message referencing OtelExporterOtlpMetricsEndpoint if it is not; keep the
absolute-URI check, perform the scheme check immediately after Uri.TryCreate,
and only set exporterOptions.Endpoint when both checks pass.
- Around line 334-339: Move the Otel-related constant declarations
(OtelMetricsExporter, OtelMetricExportInterval, OtelExporterOtlpMetricsEndpoint,
OtelExporterOtlpMetricsHeaders, OtelExporterOtlpMetricsTimeout,
OtelExporterOtlpMetricsProtocol) from below the methods into the top of the
ClusterVNodeStartup class where other fields/constants are declared so they
appear before any methods that reference them; update their placement so they
are grouped with the existing field declarations for better discoverability and
maintain original const modifiers and names.
- Around line 374-384: Update the thrown InvalidConfigurationException in the
protocol parsing block so the message clearly lists the supported values for
OtlpExportProtocol (Grpc, HttpProtobuf) and also mentions that "http/json" is a
valid OTLP spec value but not supported by this SDK; locate the
Environment.GetEnvironmentVariable(OtelExporterOtlpMetricsProtocol) handling and
change the InvalidConfigurationException raised when setting
exporterOptions.Protocol to include OtelExporterOtlpMetricsProtocol, the allowed
values ("grpc", "http/protobuf") and note "http/json" as spec-valid but
unsupported to make the failure actionable.
- Around line 323-331: Add a clarifying comment above the call to
meterOptions.AddOtlpExporter explaining that
ApplyMetricSpecificOtlpEnvironmentVariables is intentionally invoked to manually
apply OTEL_EXPORTER_OTLP_METRICS_* environment variables because
MeterProviderBuilder.AddOtlpExporter only auto-applies the generic
OTEL_EXPORTER_OTLP_* variables and does not honor signal-specific metric
variables; mention that this prevents accidental removal and note that the
subsequent conditional sets
readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds
from OtelMetricExportInterval/metricsConfiguration.ExpectedScrapeIntervalSeconds
when the specific env var is not set. Reference the
ApplyMetricSpecificOtlpEnvironmentVariables call, meterOptions.AddOtlpExporter,
OtelMetricExportInterval, metricsConfiguration.ExpectedScrapeIntervalSeconds,
and
readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds in
the comment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 17932e32-89c0-4bda-8aeb-1f31210ba1fa
📒 Files selected for processing (6)
src/Directory.Packages.propssrc/EventStore.ClusterNode/metricsconfig.jsonsrc/EventStore.Common/Configuration/MetricsConfiguration.cssrc/EventStore.Core.XUnit.Tests/Metrics/OtlpMetricsConfigurationTests.cssrc/EventStore.Core/ClusterVNodeStartup.cssrc/EventStore.Core/EventStore.Core.csproj
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit bde34e6. Configure here.
