Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More "intelligent" sampler option(s) #56

Closed
stnor opened this issue May 14, 2021 · 8 comments
Closed

More "intelligent" sampler option(s) #56

stnor opened this issue May 14, 2021 · 8 comments

Comments

@stnor
Copy link

stnor commented May 14, 2021

Is your feature request related to a problem? Please describe.
We've just migrated from AppD (javaagent) to X-Ray using the OTEL agent, v1.1. We're currently using traceidratio sampling and a 1/200 ratio.

This causes high frequency service calls to be over-represented in the sample set and low frequency services to be under-represented, or simply not present at all.

Describe the solution you'd like
I'd like a more sophisticated sampler that will collect data from all spans every time period, regardless of traffic.

See jaegertracing/jaeger#365 for problem description and possible inspiration.

It seems like a common and basic need for instrumentation? Is anything planned upstream?

I should also add that we're running a monolitic application, so we can't configure each service with a different sample rate, as they are running in the same JVM / javaagent. But even in a micro service architecture, this problem would exist.
We do use Spring AOP to create custom spans for the Spring @Service business services.

I'd rather not have to roll my own OTEL java agent to address this issue... Are there any ways to address this from user-space?

@stnor
Copy link
Author

stnor commented May 15, 2021

Ok, I found some nice skeletons in the Jaeger extensions upstream (in the java-sdk). I built my own agent by adding my own classes to your project but I cannot get the SPI TracerProviderConfiguration.configureSampler() to pick up my SampleProvider.

spiSamplers is length 0.

What am I missing?

public class HelloSamplerProvider implements ConfigurableSamplerProvider {

    /**
     * Returns a {@link Sampler} that can be registered to OpenTelemetry by providing the property
     * value specified by {@link #getName()}.
     */
    @Override
    public Sampler createSampler(ConfigProperties config) {
        return new RateLimitingSampler(1);
    }

    /**
     * Returns the name of this sampler, which can be specified with the {@code otel.traces.sampler}
     * property to enable it. The name returned should NOT be the same as any other exporter name. If
     * the name does conflict with another exporter name, the resulting behavior is undefined and it
     * is explicitly unspecified which exporter will actually be used.
     */
    public String getName() {
        return "hello";
    }
}
$ jar tvf aws-opentelemetry-agent-1.2.0-SNAPSHOT.jar |grep HelloSampler
   913 Sun May 16 01:25:18 CEST 2021 inst/software/amazon/opentelemetry/javaagent/providers/HelloSamplerProvider.classdata

@stnor
Copy link
Author

stnor commented May 16, 2021

I will continue trying rolling my own agent for now, as the current one doesnt work for us... Is the plan to support X-Ray sampling rules any time soon? Ping @anuraaga

@stnor
Copy link
Author

stnor commented May 16, 2021

OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
***class software.amazon.opentelemetry.javaagent.bootstrap.HelloSamplerProvider <----- printing the classname from agentmain
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[opentelemetry.auto.trace 2021-05-16 10:33:07:973 +0200] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 1.2.0-aws-SNAPSHOT
ERROR io.opentelemetry.javaagent.OpenTelemetryAgent
java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at io.opentelemetry.javaagent.OpenTelemetryAgent.agentmain(OpenTelemetryAgent.java:64)
	at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.agentmain(AwsAgentBootstrap.java:29)
	at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.premain(AwsAgentBootstrap.java:24)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
	at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at io.opentelemetry.javaagent.bootstrap.AgentInitializer.startAgent(AgentInitializer.java:44)
	at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:30)
	... 13 more
Caused by: io.opentelemetry.sdk.autoconfigure.ConfigurationException: Unrecognized value for otel.traces.sampler: hello

What am I missing?

Code at https://github.com/stnor/aws-otel-java-instrumentation/tree/main/otelagent/src/main/java/software/amazon/opentelemetry/javaagent/bootstrap

@anuraaga
Copy link
Contributor

Hi @stnor

For the issue of the agent, I didn't confirm myself but believe the problem is putting the code in the otelagent folder, you should put it in the awsagentprovider folder. otelagent is just the initialization of the agent, which creates a separate ClassLoader with the actual SDK, and that includes what's in awsagentprovider. That is controlled by this line

https://github.com/stnor/aws-otel-java-instrumentation/blob/main/otelagent/build.gradle.kts#L58

As for sampling in general, its indeed an area OTel still needs to spec out much better. Out of curiosity, can you clarify your priority of these aspects

  1. Rate limiting, leaky bucket sampler to allow full sampling for lower QPS services, as opposed to random sampling which is always random
  2. Per-operation sampling where different HTTP paths for example have different sampling properties
  3. Remote sampling configuration, where you can update properties on a server and have that automatically reflected on running servers without restart

We will want to get some issues filed in opentelemetry-specification based on what can help and the priorities. In the meantime, I think you having full control by writing a Sampler is a good idea and hopefully my suggestion unblocks you. Having some real code based on what works well for you will help frame discussions in the spec even better.

@anuraaga
Copy link
Contributor

Ah also just noticed the repo you linked seems to be missing the META-INF/services file to register the sampler provider as an SPI implementation.

@stnor
Copy link
Author

stnor commented May 17, 2021

Hi @anuraaga

Thanks for the feedback, and for taking the time to look at this. I will revisit the code later today.

As for my priorities; Remote sampling configuration is not important at all. The ability to change configuration in runtime isn't important for us ever.

Per operation-configuration is my top prio as things stand, but I think that wouldn't be needed (I think) if the sampler would try to meet a budget with a lower and upper bound of traces per time unit per operation automatically. But I am not a statistician.

AppDynamics keeps track of typical invocation response times per operation and samples (more) based on deviations, which is very useful, https://docs.appdynamics.com/display/PRO21/Diagnostic+Sessions

@stnor
Copy link
Author

stnor commented May 17, 2021

Got the sampler registration to work now. Thanks. Did a reset, and created a new branch for this work in my fork. https://github.com/stnor/aws-otel-java-instrumentation/tree/nomp-sampler

@stnor
Copy link
Author

stnor commented May 17, 2021

I will try this simple hack to start with. When testing my code, I realised I should be using a parent based sampler.
https://github.com/stnor/aws-otel-java-instrumentation/blob/nomp-sampler/awsagentprovider/src/main/java/se/nomp/instrumentation/otel/providers/NompParentBasedSampler.java

@stnor stnor closed this as completed May 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants