Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a way to specify a server otel config independently of app otel config #169

Open
donbourne opened this issue Mar 27, 2024 · 47 comments · May be fixed by #180
Open

Need a way to specify a server otel config independently of app otel config #169

donbourne opened this issue Mar 27, 2024 · 47 comments · May be fixed by #180

Comments

@donbourne
Copy link
Member

For servers that have the ability to run more than one app we currently have the ability to specify the otel config for each app separately using microprofile-config.properties files.

In MP Telemetry 2.0, when we add logs and metrics, we'll need the server to be a first class provider of otel data as well. For example, logs emitted during server startup do not relate to any app, and many of the metrics (eg. JVM metrics) do not relate to any app.

As the server starts it needs to create its own otel instance so that metrics and logs have somewhere to go.

The problem is, MP Config doesn't let us specify different config attribute values for the server and for apps using the same config property names.

A few solution options to consider:

  1. specify new server-specific config attribute names that apply only to the server level. eg. mp.server.otel.disabled, mp.server.otel.traces.exporter, etc.
  2. change how mp config works to allow for the same config property name to have different values for each app and the server (perhaps with a microprofile-config.properties file at server level?)
  3. make users use server-visible attributes for everything other than otel.service.name
    • wouldn't be able to handle sending data from different apps to different otel endpoints or with different span propagation protocols

note that for (1) we can't use config attributes that start with otel per https://opentelemetry.io/docs/specs/semconv/general/attribute-naming/#otel-namespace

@donbourne
Copy link
Member Author

It seems to me that option 1 may be the only viable option. Option 2 would add a lot of complexity to MP Config, so may not be a good idea. Option 3 is rather limited in that it requires server-level attributes to not overlap with app-level attributes (since the server-level attributes in environment variables would take precedence over the app-level attributes in microprofile-config.properties).

@donbourne
Copy link
Member Author

Brought up the question of what the right approach is for handling logs/traces/metrics in case where you have multiple apps and server producing telemetry.


from April 4, 2024 OpenTelemetry Java + Instrumentation SIG minutes:

[don] question on best usage of openTelemetry instances and resource attributes for app servers running multiple apps
I have multiple apps running on a server, each with their own OpenTelemetry instance, each with their own service.name. The app server itself generates logs and metrics that are unrelated to the apps. If I create another OpenTelemetry instance for the app server, I could send those logs and metrics as well. I want the apps and server to appear as related resources in monitoring tools. Is having a separate OpenTelemetry instance for each app and for the app server the best approach? Is there a best practice for how to set up resource attributes so that monitoring tools know to associate the apps on a server with the app server they are running on?
https://github.com/open-telemetry/community/blob/main/projects/resources-and-entities.md#problem-1-commingling-of-entities

app1:
service.name=app1
host.name=somehost
process.pid=123
app2:
service.name=app2
host.name=somehost
process.pid=123
server:
service.name=appServerX ?
host.name=somehost
process.pid=123

Comments from the call suggest that having separate OTel sdk instances is the right approach for this, and that the resource attributes mentioned above look as people would expect.

Comments also indicated that the monitoring tools very likely don't handle this situation in an elegant way, nor with any consistency. They also said this is an issue that is coming up from others as well.

related to this, there is a new OTel SIG which started last week - see https://github.com/open-telemetry/community/blob/main/projects/resources-and-entities.md#problem-1-commingling-of-entities
see Specification: Entities at https://github.com/open-telemetry/community#specification-sigs

@donbourne
Copy link
Member Author

donbourne commented Apr 7, 2024

The above comment has 2 somewhat contradictory points:

  • using separate OTel sdk instances for each of multiple apps and the app server they run on is reasonable
  • monitoring tools don't elegantly handle showing the relationship between the apps and the server when the data from them comes from different OTel sdk instances.

Here are a few options to consider:

Option A
Let user choose between using a SINGLE OTel sdk instance for the whole server (mp.server.otel.single_instance = True) and SEPARATE OTel sdk instances for servers vs. apps (mp.server.otel.single_instance = False). Users might decide based on whether their monitoring tools do a good job of correlating telemetry from separate OTel instances and whether they plan to run more than one app on the server. Provide a way for any configuration attributes related to OTel to be specified at server level (eg. mp.server.otel.*).

Option B
ALWAYS USE SEPARATE OTel sdk instances for servers vs. apps for app servers that can run multiple apps. This avoids monitoring tools and end users having inconsistent presentation in monitoring tools caused by sometimes having all the data together and sometimes not (as in Option A). The downside of this option is that it may require considerable time for OTel spec and monitoring tool changes to occur to facilitate correlation of telemetry across apps and their associated server.

Option C
ALWAYS USE ONE OTel sdk instance for the whole server, regardless of whether the server is running one or multiple apps. The downside of this is it would make it harder for people viewing data in monitoring tools to tell which logs/traces/metrics come from which apps.

@donbourne
Copy link
Member Author

I'm leaning towards Option A. I expect most of the time the best answer for users will be to use one OTel instance for the whole server, so we might want to consider making mp.server.otel.single_instance = True the default. Having the ability to use one instance per app could be used by customers that run multiple apps on the same server and that have monitoring tools that can handle that.

@Emily-Jiang
Copy link
Member

I'm leaning towards Option A. I expect most of the time the best answer for users will be to use one OTel instance for the whole server, so we might want to consider making mp.server.otel.single_instance = True the default. Having the ability to use one instance per app could be used by customers that run multiple apps on the same server and that have monitoring tools that can handle that.

+1. It is a best practice for the deployment of having one app per server. However, this spec needs to define the behavior of what behaviour is for multiple apps per server when defining single_instance to be true or false.

@pdudits
Copy link
Contributor

pdudits commented Apr 15, 2024

I was briefly considering other options such as "inheriting" server span processors or having server export metrics to all application sdks. But in the end that either hits the closed-for-extension design of OpenTelemetry SDK or lead to duplicate measurements.

Until OpenTelemetry has a solution for "less global" Resource definition, I think I'm leaning towards variant of Option A, but I think we can make it work without explicit properties:

  • When a runtime supports running multiple applications it MAY use a global OpenTelemetry instance. All global metrics, such as Java Virtual Machine statistics will be exported via this global instance
  • The global instance is auto-configured by OTEL system properties and environment variables, or by vendor specific means.
  • If application only specifies otel.disabled=false, then runtime MAY use its global instance as source of signal providers.
  • To help distinguish applications in produced signals implementations SHOULD set Instrumentation Scope Name to the application name on injected Tracers and Meter instances.
  • If any other config properties prefix with otel are used, the runtime SHALL create separate SDK instance, where application will be identified by service.name resource attribute unless overriden in config properties (otel.service.name / otel.resource.attributes).

I will not be able to join the call today, so we may discuss it further next week.

@donbourne
Copy link
Member Author

@pdudits ,

The global instance is auto-configured by OTEL system properties and environment variables, or by vendor specific means.

I think we would want people to still use MP Config to configure this, regardless of whether they are using a global instance or app-specific instances ("vendor specific means" sounds inconsistent with that goal). If we do want everything to be configurable by MP Config, then that motivates needing mp.otel.* configuration attributes, since MP Config doesn't provide a way to distinguish between server-wide and app-specific config attributes.

Are you thinking that we could have mp.otel.* MP config attributes but that we don't need to introduce a new mp.server.otel.single_instance attribute (since that could be inferred by the absence of any otel.* config attributes?). I believe that could work, and each app that needs its own service name could trigger a new OTel sdk instance to be created by specifying one or more otel.* properties, whereas the server OTel sdk instance is always configured using the mp.otel.* attributes. That said, would we expect all runtimes to support mp.otel.* attributes, or only app servers that support running multiple apps? I think I prefer the latter.

@tjquinno
Copy link

IIUC Patrik has made a careful distinction that I was also wondering about.

There seem to be two goals:

  1. We want to be able to distinguish among the signals about the server and the signals about each of the applications. (Patrik's 4th bullet point)
  2. We want to be able to (optionally) differently configure the OTel behavior governing how signals are emitted for the server and for each of the applications. (Patrik's 5th bullet point)

What Patrik has describes seems reasonable for multi-app implementations.

We also have two categories of MP implementations to consider: those which do support multiple apps and those which do not.

Is goal 1 something we would want for single-application implementations (as well as multi-app implementations)? That is, allow the emitted signals to indicate whether they are from the server or the application?

Is goal 2 also valid for single-app implementations? That is, are we mandating that all implementations would permit configuring the server OTel SDK differently from the application OTel SDK?

Or, is one or both of these goals optional parts of the spec which multi-app vendors would satisfy but single-app vendors need not?

@pdudits
Copy link
Contributor

pdudits commented Apr 16, 2024

If we do want everything to be configurable by MP Config, then that motivates needing mp.otel.* configuration attributes, since MP Config doesn't provide a way to distinguish between server-wide and app-specific config attributes.

The reason I wouldn't want to standardize server-wide Config attributes is because then we need to specify what happens when two or more applications deployed into the runtime define these. Also -- at least in case of Payara -- runtime starts sooner than an application-level properties is discovered, requiring re-initialization of server SDK.

Relying on global values instead gives better defaults -- heterogenous environments would define OTEL endpoints via environment variables anyway (I suppose), naturally contributing to both global and any overriden SDKs.

@Emily-Jiang
Copy link
Member

@pdudits ,

The global instance is auto-configured by OTEL system properties and environment variables, or by vendor specific means.

I think we would want people to still use MP Config to configure this, regardless of whether they are using a global instance or app-specific instances ("vendor specific means" sounds inconsistent with that goal). If we do want everything to be configurable by MP Config, then that motivates needing mp.otel.* configuration attributes, since MP Config doesn't provide a way to distinguish between server-wide and app-specific config attributes.

It would be a wrong design if MP Config differentiates the server and app, All configuration are the same.

@donbourne
Copy link
Member Author

donbourne commented Apr 16, 2024

The reason I wouldn't want to standardize server-wide Config attributes is because then we need to specify what happens when two or more applications deployed into the runtime define these. Also -- at least in case of Payara -- runtime starts sooner than an application-level properties is discovered, requiring re-initialization of server SDK.

I was thinking we'd only use mp.server.otel.* for the server-level config, and we'd tell users that setting those in a microprofile-config.properties file is not supported (ie. we will ignore those attributes when initializing the otel instance for an app).


To help distinguish applications in produced signals implementations SHOULD set Instrumentation Scope Name to the application name on injected Tracers and Meter instances.

I don't believe the Instrumentation Scope Name is the right place to specify the application name. The instrumentation scope name may be used by open source components or instrumentation libraries when they get Tracers / Meters / Loggers, and those open source components and instrumentation libraries may be used with more than one app running in the same server. I was thinking the service.name resource attribute was a better place for the app name.


In order for the above to work, I think we'd end up with something like this...

App server that can run multiple apps at same time

  • if you want the whole server to use one otel instance:
    • use mp.server.otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify settings
    • do not use otel.* attributes
  • if you want the server to use one otel instance and your apps to use separate otel instances:
    • use mp.server.otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify server-level settings
    • use otel.* attributes in microprofile-config.properties file to specify app-level settings

Runtime that can only run one app

  • use otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify settings
  • do not use mp.server.otel.* attributes

@donbourne
Copy link
Member Author

@Emily-Jiang , in your comment... #169 (comment) were you objecting to the suggestion of having some attributes configured with MP Config and some through other means, or were you objecting to having some attributes (otel.*) that are intended for app config and some attributes (mp.server.otel.*) that are intended for server config?

@pdudits
Copy link
Contributor

pdudits commented Apr 16, 2024

I don't believe the Instrumentation Scope Name is the right place to specify the application name. I was thinking the service.name resource attribute was a better place for the app name.

I agree, but this issue started with premise that using different resource attributes within single SDK is an known unsolved design issue with Otel Sdk. I am therefore searching for solution that would allow using single SDK instance on multi-app server for better monitoring consistency and ease of configuration, while still being able to distuguish traces and metrics of individual applications.

So first option is to use separate Sdk instances with different resource attributes, ultimately leading to separation of server and application metrics among different Otel resources and more runtime overhead, but emitting signals with perfect structure.

Instrumentation scope our is our second best option. Spec already discourages use of arbitrary scopes, so we may as well use it for our purposes. All signals belong to single resource and instrumentation scopes may be used to slice across applications.

Third option is to standardize signal attribute such as mp.app to carry application name.

I was thinking we'd only use mp.server.otel.* for the server-level config, and we'd tell users that setting those in a microprofile-config.properties file is not supported

I'd argue that using unprefixed properties (also) for global instance gives more consistency with single-use case, especially if we consider multi-app use to be the less relevant one.

App server that can run multiple apps at same time

  • if you want the whole server to use one otel instance

    • use otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify settings
    • do not specify anything beyond otel.disabled=false in microprofile-config.properties
  • if you want the server to use one otel instance and your apps to use separate otel instances:

    • use otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify server-level settings
    • use additional otel.* attributes in microprofile-config.properties file to override app-level settings
    • due to way MP config is designed, this result in merging global and app-level attributes. I don't know if it is useful to prevent this. In such case we could define property mp.otel.standalone to prevent use of global mechanisms in app configuration.
  • Runtime that can only run one app

    • use otel.* attributes to specify settings

If global config mechanisms are used for configuring Otel endpoints -- and I would expect them to be used much more than app configurations -- then all three cases are configured the same way. And spec only needs to describe behavior of multi-app runtimes with minimal configuration versus a more complex one.

@pdudits
Copy link
Contributor

pdudits commented Apr 16, 2024

One restriction of using shared SDK instance that didn't get mentioned yet is that any autoconfiguration service providers that application would ship with will not be enabled.
Therefore the requirement for using shared instance would be "only otel.disabled=false in config and no service providers registered for AutoConfigurationCustomizerProvider or ResourceProvider". (Other SPIs require some additonal otel property to be defined to in order to be effective).

@donbourne
Copy link
Member Author

Thanks for clarifying on the instrumentation scope. I think I now get what you're trying to do, avoiding having multiple openTelemetry instances to avoid the problem correlating data between them on the monitoring tool side. That said, I'm not sure how we would get instrumentation scope to always be the app name, since instrumentation libraries won't use that and we can only encourage customers to use it for their own meters / tracers, and since it would need to match the value we use when we get the loggers on the apps behalf.

use additional otel.* attributes in microprofile-config.properties file to override app-level settings

The property file config source has a lower ordinal than the env var config source. So the env var values override the property file values and that means you couldn't have the same property specified by 2 different apps (in property files) and at the global level for the server (since the global value would override the 2 app values).

@pdudits
Copy link
Contributor

pdudits commented Apr 18, 2024

The property file config source has a lower ordinal than the env var config source. So the env var values override the property file values and that means you couldn't have the same property specified by 2 different apps (in property files) and at the global level for the server (since the global value would override the 2 app values).

Ah, right. So that would be the usecase for mp.otel.standalone=true -- but then it would violate the principle @Emily-Jiang is referring to -- only considering property file for an application goes against equality of config sources.

Another way to avoid prefix would then be to use different property file for app, such as META-INF/microprofile-telemetry.properties if app wants to use standalone configuration. This probably goes too far, but I'm trying really hard to have scanarios "Whole server uses single SDK" / "Single-app runtime" configured the same way.

That said, I'm not sure how we would get instrumentation scope to always be the app name, since instrumentation libraries won't use that and we can only encourage customers to use it for their own meters / tracers, and since it would need to match the value we use when we get the loggers on the apps behalf.

That's what we would do for injected tracers and meters (and any logging integration we might do). As far as I'm aware, the only instrumentation libraries out there are agents, and I remember us discussing that they will have hard time making them behave according to spec anyway. But also I'm suggesting that as optional feature anyway.

@donbourne
Copy link
Member Author

Another way to avoid prefix would then be to use different property file for app, such as META-INF/microprofile-telemetry.properties if app wants to use standalone configuration. This probably goes too far, but I'm trying really hard to have scanarios "Whole server uses single SDK" / "Single-app runtime" configured the same way.

I like the goal of having consistent config style for any server that is running just one app. How about this...?

App server running one app / Runtime that can only run one app

  • use otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify settings

App server running multiple apps

  • use otel.* attributes with a global mechanism (anything other than microprofile-config.properties file) to specify server-level settings
  • use mp.app.otel.* attributes in microprofile-config.properties file to specify app-level settings
  • can also just ignore the fact you have multiple apps and configure the server as mentioned above for 1 app case

@pdudits
Copy link
Contributor

pdudits commented Apr 22, 2024

Yes, this proposal sounds good.

@donbourne
Copy link
Member Author

donbourne commented Apr 22, 2024

update.... the approach below won't work as the attributes from microprofile-config.properties file will be hidden by any env variables set with the same name.

new approach

In the MP Telemetry community call today @Emily-Jiang raised that we need to keep otel.* attributes being valid for app-level configuration as we had in MP Telemetry 1.1. Aiming to also have consistent config style for any server that is running just one app.

during server-level otel-instance initialization:

  • if otel.sdk.disabled=false is set:
    • the otel.* attributes are used to create server-level otel instance
    • server will record the otel.* attribute names/values used
  • else
    • noop otel sdk is used to create server-level otel instance

during app-level otel-instance initialization:

  • if otel.sdk.disabled=false is set:
    • if all otel.* attributes match those recorded at server-level:
      • app will share otel instance from server-level
    • else:
      • the otel.* attributes are used to create app-level otel instance
  • else:
    • noop otel sdk is used to create app-level otel instance

@donbourne
Copy link
Member Author

donbourne commented Apr 23, 2024

I think we have contradictory requirements. we want the server to use otel.* parameters because that's the most common case, but that means the apps can't use otel.* parameters since env vars take precedence over microprofile-config.properties files. The other requirement is that apps be able to continue to use otel.* parameters.

So if the most important requirement is backward compatibility (apps can set otel.* parameters) then we need a different prefix for the server-level parameters.

If the most important requirement is that servers running single apps can use otel.* then we need a different prefix for the app-level parameters.

@Emily-Jiang
Copy link
Member

The bottom line of the design is that the property should not change behaviors in single app vs. multiple app scenarios.

@donbourne
Copy link
Member Author

IMO, it's more important that we avoid breaking backward compatibility with the previous release, so I'd like to propose that we stick with otel.* properties to configure app-level otel instances and introduce mp.server.otel.* properties to configure server-level otel instances. Configuring server and apps to use different otel instances may not work well with many monitoring tools, so the recommended config for MP Telemetry 2.0 would be to only use mp.server.otel.* properties (one config for whole server).

mp.server.otel.*

  • used at server startup time to create otel instance for server.

otel.*

  • used at app startup time to create otel instance for app.
  • if no otel.* properties are present, app will use server's otel instance.

Comments?

@pdudits
Copy link
Contributor

pdudits commented Apr 29, 2024

My concern with with #169 (comment) is the usability for servers running a single app. We discussed this on today's call and I understand that this is the revised proposal

Optional mp.server.otel Configuration:

  • The mp.server.otel configuration is optional. If not specified, the server will default to using global otel.* properties for the server instance configuration, with mp.server.otel taking precedence when present.

Handling Global MP Config Sources:

  • Enabled Server Telemetry (mp.server.otel.disabled=false present): The runtime initializes the central SDK using properties prefixed with mp.server.otel.
  • Disabled Server Telemetry (mp.server.otel.disabled=true present): No runtime-level signals are emitted.
  • Absent Server Telemetry Setting:
    • Enabled Global Telemetry (otel.disabled=false present): The runtime initializes the global SDK with otel.* properties, overriding application-specific settings.
    • Disabled Global Telemetry (otel.disabled=true present): No server or application-level metrics are emitted, superseding any application-level microprofile-config.properties settings.

Application-Level Config Sources:

  • Enabled Application Telemetry (otel.disabled=false): The runtime provides an SDK configured with otel.* properties.
  • Disabled Application Telemetry (otel.disabled=true): No application-level signals are emitted.

The terms "initialize" and "provide" are used here to allow for the possibility of a shared central SDK with applications, though this detail is not required in the specification. I'm also trying to not use word server in the description, because the spec usually says "runtime".

Therefore if user has no concerns with multiple applications or require different settings they can happily use just OTEL_ env variables, if they do they will use MP_SERVER_OTEL_ ones to tell what should the... should we use the word server here? ... runtime do about the global metrics and logs.

@tjquinno
Copy link

@pdudits Nicely done. This (to me) clearly captures the discussion from the meeting.

Let's not lose sight of the fact that, while important to some MP implementations and user communities, the multi-app support is an optional component of the spec. We need to make sure that the use case the spec mandates--single-app behavior without requiring an explicit deployment step--is simple and clear to users.

In Patrik's description above I believe it is.

Also, thanks to Patrik for noting that all this employs normal MP config mechanisms. Users in situations with multiple apps or with a single app that has an app-level META-INF/microprofile-config.properties file might indeed want to use env. vars or system properties to specify the runtime-level settings, but we should not mandate that the runtime-level settings be specified only those ways. For the non-optional requirements in the spec, using a single microprofile-config.properties file containing both otel.* and mp.server.otel.* settings makes perfect sense.

@tjquinno
Copy link

tjquinno commented May 6, 2024

Separate but related question...

In MP in general, what principles guide whether something becomes an optional part of a spec vs. an implementation suggestion if vendors have chosen a particular approach?

In MP metrics, for example, there are both. There are suggestions if a vendor chose to implement using Micrometer, but there are no TCK tests for those suggestions. On the other hand, the REST request metrics are optional; a vendor need not implement them but the behavior is specified and there are optional TCK tests to verify that behavior.

In which category does this multi-app OTel config behavior properly belong?

@donbourne
Copy link
Member Author

@pdudits , I think this makes sense. I looked at your comment from perspective of what properties we would need to consider at server startup time and at app startup time.

One nuance that isn't called out is that we normally want there to only be one SDK instance created at the server level and shared with the app in the case where there is only 1 app. For that, during app startup, I think we need to check if there exists a server SDK instance, and if so, to share that if mp.server.otel.disabled is unset and otel.disabled=false.

I've actually gone through all of the cases below to try to determine how things should be handled at server and app startup time with respect to creating SDK instances. This may seem like overkill, but it helps uncover edge cases like the one I mentioned in the previous paragraph, and hopefully eliminates any ambiguity.


Server init time (can't see microprofile-config.properties config source)

At server init time we need to consider the 9 possible combos resulting from the permutations of:

    mp.server.otel.disabled=true|false|unset & otel.disabled=true|false|unset

runtime initializes the central SDK using noop. effect is that no runtime-level signals are emitted for the following combinations

    mp.server.otel.disabled=true & otel.disabled=true
    mp.server.otel.disabled=true & otel.disabled=false
    mp.server.otel.disabled=true & otel.disabled=<unset>
    mp.server.otel.disabled=<unset> & otel.disabled=true
    mp.server.otel.disabled=<unset> & otel.disabled=<unset>

    = (mp.server.otel.disabled=true & otel.disabled=<any>) | (mp.server.otel.disabled=<unset> & otel.disabled=true|<unset>)
    = mp.server.otel.disabled=true | (mp.server.otel.disabled=<unset> & otel.disabled=true|<unset>)

runtime initializes the central SDK using properties prefixed with mp.server.otel for the following combinations

    mp.server.otel.disabled=false & otel.disabled=true
    mp.server.otel.disabled=false & otel.disabled=false
    mp.server.otel.disabled=false & otel.disabled=<unset>

    = mp.server.otel.disabled=false & otel.disabled=<any>
    = mp.server.otel.disabled=false

runtime initializes the central SDK using properties prefixed with otel for the following combinations

    mp.server.otel.disabled=<unset> & otel.disabled=false

    = mp.server.otel.disabled=<unset> & otel.disabled=false

App init time (can see all config sources)

note, server_sdk_enabled is not meant to be a config property -- it is just a detection we need to do to see if there was a non-noop SDK created at server init time.

At app init time we need to consider the 18 possible combos resulting from the permuations of:

    server_sdk_enabled=true|false & mp.server.otel.disabled=true|false|unset & otel.disabled=true|false|unset

app shares central SDK with server for this app's signals for the following combinations

    server_sdk_enabled=true & mp.server.otel.disabled=true & otel.disabled=true
    server_sdk_enabled=true & mp.server.otel.disabled=true & otel.disabled=<unset>
    server_sdk_enabled=true & mp.server.otel.disabled=<unset> & otel.disabled=true
    server_sdk_enabled=true & mp.server.otel.disabled=<unset> & otel.disabled=<unset>
    server_sdk_enabled=false & mp.server.otel.disabled=true & otel.disabled=true
    server_sdk_enabled=false & mp.server.otel.disabled=true & otel.disabled=<unset>
    server_sdk_enabled=false & mp.server.otel.disabled=<unset> & otel.disabled=true
    server_sdk_enabled=false & mp.server.otel.disabled=<unset> & otel.disabled=<unset>
    server_sdk_enabled=true & mp.server.otel.disabled=<unset> & otel.disabled=false  ***

    = (server_sdk_enabled=<any> & (mp.server.otel.disabled=true|<unset> & otel.disabled=true|<unset>))
      | (server_sdk_enabled=true & (mp.server.otel.disabled=<unset> & otel.disabled=false))
    = (mp.server.otel.disabled=true|<unset> & otel.disabled=true|<unset>)
      | (server_sdk_enabled=true & (mp.server.otel.disabled=<unset> & otel.disabled=false))

runtime provides an SDK for this app's signals using properties prefixed with otel for the following combinations

    server_sdk_enabled=true & mp.server.otel.disabled=true & otel.disabled=false
    server_sdk_enabled=true & mp.server.otel.disabled=false & otel.disabled=false
    server_sdk_enabled=false & mp.server.otel.disabled=true & otel.disabled=false
    server_sdk_enabled=false & mp.server.otel.disabled=false & otel.disabled=false
    server_sdk_enabled=false & mp.server.otel.disabled=<unset> & otel.disabled=false
    = (server_sdk_enabled=<any> & (mp.server.otel.disabled=<any> & otel.disabled=false)) 
      & !(server_sdk_enabled=true & (mp.server.otel.disabled=<unset> & otel.disabled=false))
    = otel.disabled=false
      & !(server_sdk_enabled=true & (mp.server.otel.disabled=<unset> & otel.disabled=false))
    = otel.disabled=false & (!server_sdk_enabled=true | !(mp.server.otel.disabled=<unset> & otel.disabled=false)))
    = otel.disabled=false & (!server_sdk_enabled=true | !mp.server.otel.disabled=<unset> | !otel.disabled=false)
    = (otel.disabled=false & !server_sdk_enabled=true) | (otel.disabled=false & !mp.server.otel.disabled=<unset>) | (otel.disabled=false & !otel.disabled=false)
    = (otel.disabled=false & !server_sdk_enabled=true) | (otel.disabled=false & !mp.server.otel.disabled=<unset>)
    = otel.disabled=false & (server_sdk_enabled=false | !mp.server.otel.disabled=<unset>)

runtime provides a noop SDK for this app's signals for the following combinations

    server_sdk_enabled=true & mp.server.otel.disabled=false & otel.disabled=true
    server_sdk_enabled=true & mp.server.otel.disabled=false & otel.disabled=<unset>
    server_sdk_enabled=false & mp.server.otel.disabled=false & otel.disabled=true
    server_sdk_enabled=false & mp.server.otel.disabled=false & otel.disabled=<unset>
    = (server_sdk_enabled=<any> & (mp.server.otel.disabled=false & otel.disabled=true|<unset>))
    = mp.server.otel.disabled=false & otel.disabled=true|<unset>

Does this all make sense and seem accurate?

@Emily-Jiang
Copy link
Member

Thank you @pdudits @donbourne for the writeup! I have the following comments:

  1. The property mp.server.otel.* needs to be defined either environment variable or system properties. If it is defined in the microprofile-config.properties, they will be ignored.
  2. The property mp.server.otel* is NOT optional. It should work exactly same in multi-app vs. single app scenarion. Otherwise, the apps are not portable and might behave differently.

@donbourne
Copy link
Member Author

@Emily-Jiang ,

(1) -- I agree, any properties that will be paid attention to during server init (mp.server.otel.* or otel.*) have to be defined using env vars or system properties.

(2) -- I agree that mp.server.otel.* config properties should work exactly same in multi-app as single-app if the user configures them. I think the "optional-ness" is just that typically user of a single-app runtime wouldn't set any mp.server.otel.* properties. I'm not sure if that's how @pdudits meant it though.

@yasmin-aumeeruddy
Copy link
Contributor

I think this is clear and would be implementable. Thank you both for the write up!

@pdudits
Copy link
Contributor

pdudits commented May 13, 2024

One nuance that isn't called out is that we normally want there to only be one SDK instance created at the server level and shared with the app in the case where there is only 1 app. For that, during app startup, I think we need to check if there exists a server SDK instance, and if so, to share that if mp.server.otel.disabled is unset and otel.disabled=false.

There are more conditions when the server SDK cannot be shared -- if application provides any AutoConfigurationCustomizerProvider or ResourceProvider via service loading facilities, or other otel properties beyond otel.disabled (i.e. custom exporters). In such case the app has requirements that server SDK would cannot satisfy. I'd therefore formulate the requirement as "The Runtime MAY provide its global SDK to an application, when application's Telemetry configuration is equivalent to global one. Typically this happens when there is only single application deployed and it doesn't configure OpenTelemetry beyong specifying otel.disabled=false".

The property mp.server.otel* is NOT optional. It should work exactly same in multi-app vs. single app scenarion. Otherwise, the apps are not portable and might behave differently.

What I meant by optional was exactly what @donbourne says -- it is not required to specify the property for telemetry to work. But on the other hand the applications would behave exactly the same regardless of whether runtime would support mp.server.otel. What server setting controls is JVM metrics and logs.

@donbourne
Copy link
Member Author

donbourne commented May 14, 2024

Attempting to summarize the configuration story, from end user perspective:

  1. specify otel.* properties via env vars to configure a single OTel instance to be shared between runtime and all apps
  2. specify otel.* properties via microprofile-config.properties to configure a single app's OTel instance
  3. for servers that can run more than one app, specify mp.runtime.otel.* properties via env vars to configure an OTel instance for exclusive use by the runtime
  4. for servers that can run more than one app, combine 2/3 to configure separate OTel instances for each app and an OTel instance for exclusive use by the runtime.

(1) is the recommended configuration approach.
(2) is not recommended as it only collects app traces/metrics/logs (provided for backward compatibility)
(3) is not recommended as it only collects server traces/metrics/logs
(4) can be used when separate apps must have separate OTel configs, but is not the recommended approach as monitoring tools may make it difficult to correlate data between server and app OTel instances.

@tjquinno
Copy link

Nice, Don. Clear and concise.

I do have a couple concerns and a proposal that builds on Don's. And thanks for your patience; this note is not so concise!

Concern 1

When using runtimes that support one app, users should be able to employ any config source--microprofile-config.properties, env. vars, system properties, and custom config sources--to set the otel.* values which control the single OTel instance.

In such environments there is, as described by the MP Config spec, a single configuration applied to the entire JVM which contains both the runtime and the app. The system derives the single config from the available config sources as prescribed by MP Config. Users are accustomed to configuring the environment as a whole, regardless of which config sources they employ.

If the MP Telemetry spec were to specify special treatment to a particular config source--env vars--for a certain group of settings then that would be quite a break from the conceptual model from MP config which users are accustomed to.

Concern 2

In the multi-app server case, should the MP Telemetry spec really restrict the source of truth for runtime-level behavior in multi-app servers to just env vars?

What about the other config sources: system properties in particular but perhaps also custom config sources or even a microprofile-config.properties file visible during runtime start-up before any apps are in the picture?

It does seem a bit odd, given that we have MP Config, for MP Telemetry to propose a different regime for specifying config for this one portion of its behavior.

Having said that, I absolutely get that servers that support multiple apps might apply various specific config sources at different times--some at runtime start-up, some at application deployment and start-up. But ideally a MicroProfile spec will not dictate those choices/restrictions.

The proposal

Whatever the exact behavior is for a given multi-app server, presumably the server's documentation explains that behavior so users of that implementation know what to expect and how to use config appropriately.

Given that, I think the MP Telemetry spec could reasonably state something like this:

Assign telemetry-related settings using any supported MicroProfile Config config sources.

MicroProfile Telemetry suggests configuring a single OTel instance, shared between the runtime and all apps. To do so, specify otel.* settings in config sources available during server start-up.

Users with servers that support multiple applications should keep in mind how their chosen server deals with runtime-level vs. application-level configuration.

  • Be sure to configure the shared OTel instance in config sources available during server start-up.
  • To configure a single app's OTel instance individually: specify otel.* settings via config sources available during the application deployment and start-up.
  • To configure an OTel instance for exclusive use by the runtime: specify mp.runtime.otel.* settings in config sources available during server start-up.

(or words to that effect which might have to change if my assumptions about multi-app server config behavior are wrong)

The net effect might very well be the same as what Don's concise list describes.

But this way the MP Telemetry spec itself does not impose any restrictions, non-standard config behavior, or implementation requirements on servers. Instead, it relies on MP-standard technology (config) plus whatever documented behavior individual servers have chosen to adopt.

@donbourne
Copy link
Member Author

Previously, when I was referring to microprofile-config.properties and environment variables, I meant those as shorthand for any MP config source that is only visible to applications vs. any MP config source that is visible to both the server and the apps. I didn't realize that there are runtimes that can see microprofile-config.properties files during server start up (since those files are packaged in the .war).

@tjquinno , I like your suggested wording. I'd tweak a couple of things...

Assign telemetry-related settings using any supported MicroProfile Config config sources.

MicroProfile Telemetry suggests configuring a single OTel instance, shared between the runtime and all apps. To do so, specify otel.* settings in config sources available during both server and app start-up.

Users with servers that support multiple applications should keep in mind how their chosen server deals with runtime-level vs. application-level configuration.

  • Be sure to configure the shared OTel instance in config sources available during server start-up.
  • To configure a single app's OTel instance individually: specify otel.* settings via config sources only available during the application start-up.
  • To configure an OTel instance for exclusive use by the runtime: specify mp.runtime.otel.* settings in config sources available during server start-up.

@Emily-Jiang
Copy link
Member

Since we specify the mp.runtime.otel.* as environment variable or system properties, should we use MP_RUNTIME_OTEL_* to ensure they can be easily specified as environment variables as some OS does not allow dot for sure? The recommended approach is to use _ and uppercase. In this case, it is neater differentiation with app level properties.

@yasmin-aumeeruddy
Copy link
Contributor

I think we should state system property: mp.runtime.otel.* and environment variable: MP_RUNTIME_OTEL_* like otel do in their docs .

Would it be an issue if a user inconsistently sets the environment variable and system property? Does the system property take priority over the environment variables?

@Emily-Jiang
Copy link
Member

Emily-Jiang commented May 22, 2024

I think we should state system property: mp.runtime.otel.* and environment variable: MP_RUNTIME_OTEL_* like otel do in their docs .

Would it be an issue if a user inconsistently sets the environment variable and system property? Does the system property take priority over the environment variables?

Can you find out from OTEL config to see which one takes priority? My previous comment means there is no priority concerns by just using one config property. Personally, I think introducing two properties causes confusion. I would just stick to one as you can set MP_RUNTIME_OTEL_* as a system property. However, if most of you want to specify multiple properties, I am okay with that. The reason for the other properties using otel.*, as they can be easily specified as OTE_ because MP Config can find them.

@yasmin-aumeeruddy
Copy link
Contributor

yasmin-aumeeruddy commented May 22, 2024

The system properties have higher priority than envrionment variables. See docs for configuring-the-agent and specifying resource information via an environment variable. This is the same beahviour from MicroProfile Config.

@Emily-Jiang
Copy link
Member

Emily-Jiang commented May 22, 2024

The system properties have higher priority than envrionment variables. See docs for configuring-the-agent and specifying resource information via an environment variable. This is the same beahviour from MicroProfile Config.

Okay, thanks @yasmin-aumeeruddy ! Thought further your suggestions and took back what I said with one property name. Let's go with what you suggested specifying mp.runtime.otel.* in system properties and MP_RUNTIME_OTEL_* to be consistent with OTel config. However, we would not cater for the misconfiguration. We should just look at the mp.runtime.otel.* in system properties and MP_RUNTIME_OTEL_* in environment variables. Adopt the OTel priority rules on taking system properties over environment variables.

@tjquinno
Copy link

Re: Don's follow-up

Your edits to my suggested wording look good to me. Thanks for the upgrade.


Re: Yasmin and Emily's discussion of env. vars and properties

If the spec describes retrieving these settings using MP Config (as I suggested and as in Don's edit) then does the MP Telemetry spec really need to go into the details of environment variable and property formatting and ordering? The Telemetry spec could refer the reader to the MP Config spec which covers those aspects in detail.

(It's indeed fortunate that OTel and MP Config order env. vars and properties the same way. Less confusion for our users.)

@Emily-Jiang
Copy link
Member

Re: Don's follow-up

Your edits to my suggested wording look good to me. Thanks for the upgrade.

Re: Yasmin and Emily's discussion of env. vars and properties

If the spec describes retrieving these settings using MP Config (as I suggested and as in Don's edit) then does the MP Telemetry spec really need to go into the details of environment variable and property formatting and ordering? The Telemetry spec could refer the reader to the MP Config spec which covers those aspects in detail.

(It's indeed fortunate that OTel and MP Config order env. vars and properties the same way. Less confusion for our users.)

@tjquinno these properties won't be discovered by MP Config as these would be used before the app has started. MP Config is not available to help out.

@tjquinno
Copy link

@Emily-Jiang

these properties won't be discovered by MP Config as these would be used before the app has started. MP Config is not available to help out.

What you say is true only of servers that support multiple applications and only because of implementation choices made as those servers were written. Those choices might well be very sensible ones, but they are choices nonetheless.

For example, Helidon MP took a different approach. The Helidon-provided runtime code and the developer-provided app code are bundled as a single executable server JAR (plus dependencies). Helidon MP, and Helidon MP users, have embraced MP Config as the way of providing settings which control the behavior of the server as a whole, at the runtime level and at the application level.

Both approaches--and potentially many others--are valid and comply with the MicroProfile specs.

That said, no MP spec should include restrictions or requirements based on any single implementation approach.

In particular, the MP Telemetry spec should not mandate that runtime-level settings can only be made using environment variable or system properties because that interferes with the Helidon approach (and perhaps others).

Neither should the spec require that runtime settings be made in microprofile-config.properties because that interferes with the Liberty approach (and perhaps others).

If we need to revise the phrasing of the text Don and I suggested I'm sure we can do that. But we need to find a way of describing the required behavior without tying that behavior to how a particular implementation approach works. Specifically, whatever the MP Telemetry spec text turns out to be, it needs to allow Helidon users to use MP config for all settings and Liberty users to use env. vars and system properties for runtime settings.

@Emily-Jiang
Copy link
Member

@Emily-Jiang

these properties won't be discovered by MP Config as these would be used before the app has started. MP Config is not available to help out.

What you say is true only of servers that support multiple applications and only because of implementation choices made as those servers were written. Those choices might well be very sensible ones, but they are choices nonetheless.

For example, Helidon MP took a different approach. The Helidon-provided runtime code and the developer-provided app code are bundled as a single executable server JAR (plus dependencies). Helidon MP, and Helidon MP users, have embraced MP Config as the way of providing settings which control the behavior of the server as a whole, at the runtime level and at the application level.

Both approaches--and potentially many others--are valid and comply with the MicroProfile specs.

That said, no MP spec should include restrictions or requirements based on any single implementation approach.

In particular, the MP Telemetry spec should not mandate that runtime-level settings can only be made using environment variable or system properties because that interferes with the Helidon approach (and perhaps others).

Neither should the spec require that runtime settings be made in microprofile-config.properties because that interferes with the Liberty approach (and perhaps others).

If we need to revise the phrasing of the text Don and I suggested I'm sure we can do that. But we need to find a way of describing the required behavior without tying that behavior to how a particular implementation approach works. Specifically, whatever the MP Telemetry spec text turns out to be, it needs to allow Helidon users to use MP config for all settings and Liberty users to use env. vars and system properties for runtime settings.

I guess I typed too fast. I was talking about the implementation details. These properties are environment variables or system properties and they would be discovered by apps when apps are up for sure and they can be used by apps. What I meant is that when pushing out server metrics and logs at that time apps are not up, the server internals would have to query the variable values via System rather than MP Config.

@donbourne
Copy link
Member Author

@pdudits , on your earlier comment...

There are more conditions when the server SDK cannot be shared -- if application provides any AutoConfigurationCustomizerProvider or ResourceProvider via service loading facilities, or other otel properties beyond otel.disabled (i.e. custom exporters). In such case the app has requirements that server SDK would cannot satisfy. I'd therefore formulate the requirement as "The Runtime MAY provide its global SDK to an application, when application's Telemetry configuration is equivalent to global one. Typically this happens when there is only single application deployed and it doesn't configure OpenTelemetry beyong specifying otel.disabled=false".

... to generalize, I'd say we need to avoid sharing an otel sdk instance any time the properties used to create the server instance don't match the properties visible when requesting an app instance. That implies that impls will need to store the properties used to create the server instance for purpose of later comparison.

@donbourne
Copy link
Member Author

for people using above rules for their impl, we've noticed that these rows can't occur...

app shares central SDK with server for this app's signals for the following combinations

    server_sdk_enabled=true & mp.server.otel.disabled=true & otel.disabled=true
    server_sdk_enabled=true & mp.server.otel.disabled=true & otel.disabled=<unset>
    server_sdk_enabled=true & mp.server.otel.disabled=<unset> & otel.disabled=true
    server_sdk_enabled=true & mp.server.otel.disabled=<unset> & otel.disabled=<unset>

@Emily-Jiang
Copy link
Member

runtime level configs: mp.runtime.otel.sdk.disabled, MP_RUNTIME_OTEL_SDK_DISABLED.

@donbourne
Copy link
Member Author

I noticed a way to simplify a little bit -- whenever an app's otel settings are configured the same as the runtime's, the app will share the runtime's otel sdk instance rather than creating a new one. The configuration logic then becomes:

when runtime starts:

if any mp.runtime.otel.sdk.* props are visible
    runtime uses new instance (noop or real) created using the mp props 
else if any otel.sdk.* props are visible
    runtime uses new instance (noop or real) created using the otel props 
else 
    runtime uses new noop instance

when app starts:

if any otel.sdk.* props are visible
    if there is already a runtime instance which was defined with same props:
        app uses runtime instance
    else
        app uses new instance (noop or real) created using otel props
else
    app uses new noop instance

I'm planning to open a PR for this issue shortly to update the spec.

@donbourne
Copy link
Member Author

I've been trying to resolve a lot of issues with this configuration and have come to a new approach that I believe is far simpler and easier to implement, and I believe addresses the issues raised so far in this issue.

Here's what I'm proposing...


In order to enable any aspects of integration, the configuration otel.sdk.disabled=false MUST be specified.

If the otel.sdk.disabled=false configuration setting is visible to the runtime at initialization time then the runtime must provide a SINGLE OpenTelemetry SDK instance which MUST be used by the runtime and all applications.

If the otel.sdk.disabled=false configuration setting is only visible to the application(s) at initialization time, then runtime telemetry MUST be disabled and the application(s) MUST be configured using the visible otel.* properties.

At runtime initization time runtimes may use the OTEL_SDK_DISABLED environment variable or otel.sdk.disabled Java system property.

At application initialization time runtime MUST use configuration sources available via MicroProfile Config for configuration.


A few things to notice:

  • it's far simpler than the earlier proposals.
  • it completely does away with mp.runtime.otel.* concept. That had effect of splitting runtime and app telemetry, which isn't good.
  • by requiring the runtime instance to be the ONLY instance we do away with any need to compare settings between instances (which falls apart when you consider the possibility of services being loaded into different OTel instances by separate service loaders).

please let me know what you think. PR on its way :-)

@donbourne donbourne linked a pull request Jun 5, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants