Cross-release consistency of telemetry names/values #166

donbourne · 2024-03-18T02:03:07Z

Users of MP Telemetry will expect to use the technology across servers running apps that depend on different
Java EE / Jakarta EE levels. Monitoring tools that interface with MP Telemetry often provide dashboards that
render context and signal data from spans, metrics and logs, and those dashboards rely on the exact names of metrics or attributes.

Consistency of telemetry names/values

Problem: if MP Telemetry v1.1 exposes a metric A and MP Telemetry v2.0 renames that metric to B, and if each MP Telemetry version only works with a subset of currently prevalent EE apps, then monitoring tools need two different dashboards: one for apps running with MP Telemetry v1.1 and one for apps running with MP Telemetry v2.0. This problem is foreseeable because OpenTelemetry gradually modifies metric names on their way to becoming part of the stable semantic conventions.
To ensure all apps can be monitored using the same dashboard regardless of EE version, there are a few solution options:
- option 1a
  - Have MP Telemetry include a configurable instrumentation version to be used for choosing which metric/span/attribute names/values to use. The instrumentation version would ideally be respected by all spans/metrics emitted by runtimes and MP Telemetry-enabled components. Newer releases of MP Telemetry would be required to be able to use metric/span/attribute names/values used by all older releases (backward compatibility).
  - Pro: no configuration change required to the OTel Collector
  - Pro: app owners can configure their new MP Telemetry compatible runtime to use the same metric/span/attribute names/values as older releases of MP Telemetry. That means they can use the dashboards that were compatible with the older releases.
  - Con: app owners can't use dashboards that are compatible with the new release of MP Telemetry until all apps have been upgraded to an EE level supported by the new MP Telemetry.
- option 1b
  - same as option 1a, plus older releases of MP Telemetry would be required to be able to use metric/span/attribute names/values defined by all newer releases (forward compatibility).
  - Pro: no configuration change required to the OTel Collector
  - Pro: app owners can configure their new MP Telemetry compatible runtime to use the same metric/span/attribute names/values as older releases of MP Telemetry. That means they can use the dashboards that were compatible with the older releases.
  - Pro: app owners can configure their old MP Telemetry compatible runtime to use the same metric/span/attribute names/values as new releases of MP Telemetry. That means they can use the dashboards that are compatible with new releases.
- option 2
  - Publish OTel Collector configurations that would enable deployers of the collector to rename incompatible metric/span names/labels/values using https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor (stability alpha). Each new MP Telemetry release would publish a configuration to switch all names/values to those used in all older releases. Each old MP Telemetry release would publish a configuration to switch all names/values to those used in all newer releases.
  - Pro: app owners can configure their new MP Telemetry compatible runtime to use the same metric/span/attribute names/values as older releases of MP Telemetry. That means they can use the dashboards that were compatible with the older releases.
  - Pro: app owners can configure their old MP Telemetry compatible runtime to use the same metric/span/attribute names/values as new releases of MP Telemetry. That means they can use the dashboards that are compatible with new releases.
  - Con: app owners have to coordinate changes to OTel Collector configuration with the deployers that control the OTel Collector setup. May not be welcome in environments where the OTel Collector is shared.
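To make options 1a/1b more concrete, here is a minimal Java sketch of a runtime-side switch that picks attribute names according to a configured naming convention. Everything about its shape is an assumption for illustration: the class `TelemetryNames`, the `Convention` enum and the system property `example.telemetry.naming` are hypothetical and are not defined by MP Telemetry. The two attribute pairs are the legacy and stable HTTP names from the OpenTelemetry semantic conventions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: rename attribute keys according to a configured naming convention. */
final class TelemetryNames {

    /** Hypothetical naming generations; not part of any MP Telemetry API. */
    enum Convention { LEGACY, STABLE }

    // Legacy -> stable HTTP attribute names (OpenTelemetry semantic conventions).
    private static final Map<String, String> LEGACY_TO_STABLE = Map.of(
            "http.method", "http.request.method",
            "http.status_code", "http.response.status_code");

    // Reverse table, needed when an older naming scheme is requested.
    private static final Map<String, String> STABLE_TO_LEGACY = invert(LEGACY_TO_STABLE);

    private static Map<String, String> invert(Map<String, String> source) {
        Map<String, String> inverted = new HashMap<>();
        source.forEach((from, to) -> inverted.put(to, from));
        return inverted;
    }

    /** Parses the configured value; null or unknown values fall back to STABLE. */
    static Convention parse(String configured) {
        return "legacy".equalsIgnoreCase(configured) ? Convention.LEGACY : Convention.STABLE;
    }

    /** Returns a copy of the attributes with keys renamed for the target convention. */
    static Map<String, Object> rename(Map<String, Object> attributes, Convention target) {
        Map<String, String> table =
                target == Convention.STABLE ? LEGACY_TO_STABLE : STABLE_TO_LEGACY;
        Map<String, Object> renamed = new LinkedHashMap<>();
        // Keys without an entry in the table are passed through unchanged.
        attributes.forEach((key, value) -> renamed.put(table.getOrDefault(key, key), value));
        return renamed;
    }

    public static void main(String[] args) {
        // "example.telemetry.naming" is a made-up property name used only in this sketch.
        Convention target = parse(System.getProperty("example.telemetry.naming"));
        System.out.println(rename(Map.of("http.method", "GET"), target));
    }
}
```

Under option 1a only a new release would need the reverse table for older names. Under option 1b the old releases would also need the forward table, which is the part that requires amending already released versions.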

I believe the best options are 1b or 2. Looking for opinions on which is better.

pdudits · 2024-03-18T16:59:10Z

First, let me point out that we are in fact introducing an incompatibility in this release due to the move to stable HTTP attributes in traces. OpenTelemetry did suggest a common property to optionally emit duplicate attributes, but we chose not to support it. I understand that this issue is seeking a long-term, MP-specific solution.

I agree that 1a would promote not upgrading to newer versions and is therefore undesirable.

Option 1b as you described it would require defining some translation format that could be published as an amendment to an already released spec. While this is achievable, we risk defining that translation format to be either too simple for our future needs or too complicated for what we will actually do.
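As a rough illustration of the "too simple" end of that spectrum, the sketch below treats the translation format as a flat `old.name=new.name` properties file and applies it to a map of attributes. The class `NameMappingFile` and the file layout are assumptions made up for this example; nothing like it is defined by the spec. A format this small can rename keys, but it has no way to express value changes or structural changes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: a flat "old=new" mapping file applied to attribute keys. */
final class NameMappingFile {

    /** Parses properties-style text such as "http.method=http.request.method". */
    static Map<String, String> parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String oldName : properties.stringPropertyNames()) {
            mapping.put(oldName, properties.getProperty(oldName));
        }
        return mapping;
    }

    /** Returns a copy of the attributes with mapped keys renamed and all other keys kept. */
    static Map<String, Object> apply(Map<String, String> mapping, Map<String, Object> attributes) {
        Map<String, Object> translated = new LinkedHashMap<>();
        attributes.forEach((key, value) ->
                translated.put(mapping.getOrDefault(key, key), value));
        return translated;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = parse("http.method=http.request.method\n");
        System.out.println(apply(mapping, Map.of("http.method", "GET")));
    }
}
```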

Option 2 seems the most practical one, even though it is a bit out of place as it doesn't specify any Java API or runtime behavior. I can imagine such a configuration file being a non-normative appendix of the spec. It is more practical than 1b, because the Collector already exists and is designed to cover the practical needs of signal transformations.

I wouldn't worry about users not being able to change their existing Collector configuration, as they should be able to deploy an additional intermediate Collector that would just do the translation.

donbourne · 2024-03-19T15:21:25Z

From the discussion on Mar 18th in the MP Telemetry call:

Option 1b too hard?

- Would use a name mapping file to avoid need for new releases of old releases
- Span name changes are one thing - structure changes
- Implemented a switch in Payara already to select between 1.1 and 1.0 span names

Option 2 sounds most natural

- Can deploy an intermediate collector if they don’t have access to central collector
- Doesn’t feel like MP if we rely on changes to OTel config to achieve this
- Could list the OTel transformation in an appendix to the spec

Option 1a

- Has result of keeping people on oldest version and never updating
- When semantic versioning moves forward then old apps can’t adopt them
- At some point you need to build tolerant dashboards that support both old and new metrics
- Major releases with breaking changes cause tension

Conclusion: Most people seem to favor option 2 over options 1a/1b. We can try to use the changes to span names that we already have in this release as the first ones to handle this way.

donbourne changed the title ~~Cross release stability~~ Cross-release consistency of telemetry names/values Mar 18, 2024
