Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAMEL-18661: Make span current and clear scope properly for async processing #8713

Merged
merged 11 commits into from
Dec 16, 2022

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Nov 11, 2022

Fixesr https://issues.apache.org/jira/browse/CAMEL-18661

!Looking for input on the overall approach!

This change enables current span propagation to underlying libraries and end-user-code (for OpenTelemetry):
for all spans created by Camel, it calls into span.makeCurrent()

However, OpenTelemetry (and other tracing tools) rely on ThreadLocal to propagate context. They do it carefully for Executors, Reactor, etc, so it works in async scenarios too.

Instrumentations have to close scope returned by span.makeCurrent() on the same thread where it was created to avoid leaking context (by leaving current span on current thread).

This is easy to guarantee with sync operations in Camel - ExchangeStarted/Sending and ExchangeCompleted/Sent events (where spans start and end) are called on the same thread.

However, for async operations, they are called on different threads.

This change also adds a new event - ExchangeAsyncStarted (naming needs more work), that notifies that Processor.process call has ended, i.e. async operation has started. Tracers would end the scope (but not span) when this event is sent.

Alternative approach:

add a dependency on camel-tracing to camel-base-engine and call into ActiveSpanManager directly:

try(AutoCloseable scope = ActiveSpanManager.makeSpanCurrent(exchange)) {
    sync = processor.process(exchange, async);
}

This would be easier to maintain and would more or less guarantee that scope is disposed on the same thread, however, dependency on tracing might not be desirable.

Traces with this change

(tick/testme spans come from Camel and were previously not correlated with ServiceBus spans coming from Azure SDK)

image

@github-actions
Copy link
Contributor

🌟 Thank you for your contribution to the Apache Camel project! 🌟

⚠️ Please note that the changes on this PR may be tested automatically.

If necessary Apache Camel Committers may access logs and test results in the job summaries!

@lmolkova
Copy link
Contributor Author

General question: is there a good place/way to wrap callback executions (user or 3rd party library) with something like

try(AutoCloseable scope = ActiveSpanManager.makeSpanCurrent(exchange)) {
    sync = processor.process(exchange, async);
}

@github-actions
Copy link
Contributor

Components tested:

Total Tested Failed ❌ Passed ✅
2 2 2 1

@davsclaus
Copy link
Contributor

Sorry for not giving this PR attention - however we have been buys (as usual).

Observability with open-telemtry and micrometer is important and thanks for helping with this.

@oscerd
Copy link
Contributor

oscerd commented Nov 26, 2022

I'm sorry, too. It was on my to-do list to review this and I forgot about it

@lmolkova
Copy link
Contributor Author

@davsclaus @oscerd thank you! Once you folks have time to take a look, please let me know what approach you'd like me to pursue nd I'm happy to work on it further.

@lmolkova
Copy link
Contributor Author

lmolkova commented Dec 6, 2022

@davsclaus @oscerd friendly ping to take a look. I'd be happy to continue working on it based on your initial feedback.

Copy link
Contributor

@oscerd oscerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found one typo. For placing the camel-tracing dependency in the camel-core-engine, I'm against polluting the core with a tracing component. This will probably create a circular dependency in the maven reactor. So we should find a different way.

@davsclaus
Copy link
Contributor

add a dependency on camel-tracing to camel-base-engine and call into ActiveSpanManager directly:

This is not possible, camel-base should be 100% independent on camel itself only.

@davsclaus
Copy link
Contributor

davsclaus commented Dec 8, 2022

Its actually the opposite

var res = producer.process(exchange, ac);

if the returned value == true then it was a sync call and its done

if the returned value == false, then its async call, and currently processing by another thread (done when the callback is triggered)

@lmolkova lmolkova marked this pull request as ready for review December 10, 2022 21:42
@lmolkova
Copy link
Contributor Author

@davsclaus @oscerd thank you for the feedback. I've implemented the current approach with the extra event and added tests. Can you please take another look?

@lmolkova lmolkova changed the title Prototype: make span current and clear scope Make span current and clear scope properly for async processing Dec 10, 2022
@lmolkova lmolkova force-pushed the otel-current-span branch 2 times, most recently from 40ddebf to afa146b Compare December 10, 2022 21:52
@bvahdat
Copy link
Member

bvahdat commented Dec 11, 2022

Hi @lmolkova

Thanks for your contribution and support to look into this issue. Looking into build.log, this is the reason why it has failed:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.10.1:testCompile (default-testCompile) on project camel-tracing: Compilation failure: Compilation failure: 
[ERROR] /home/runner/work/camel/camel/components/camel-tracing/src/test/java/org/apache/camel/tracing/ActiveSpanManagerTest.java:[45,23] error: cannot find symbol
[ERROR]   symbol:   method isCurrent()
[ERROR]   location: variable span of type SpanAdapter
[ERROR] /home/runner/work/camel/camel/components/camel-tracing/src/test/java/org/apache/camel/tracing/ActiveSpanManagerTest.java:[49,24] error: cannot find symbol
[ERROR]   symbol:   method isCurrent()
[ERROR]   location: variable span of type SpanAdapter
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.10.1:testCompile (default-testCompile) on project camel-tracing: Compilation failure
	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:375)
	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:351)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:215)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:171)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:163)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.mvndaemon.mvnd.builder.SmartBuilderImpl.buildProject(SmartBuilderImpl.java:178)
	at org.mvndaemon.mvnd.builder.SmartBuilderImpl$ProjectBuildTask.run(SmartBuilderImpl.java:198)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
	at org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:1310)
	at org.apache.maven.plugin.compiler.TestCompilerMojo.execute(TestCompilerMojo.java:183)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:370)
	... 12 common frames omitted

Please note that you don't have access to the pipeline's generated build.log but I think you should be able to reproduce it locally.

@bvahdat bvahdat changed the title Make span current and clear scope properly for async processing CAMEL-18661: Make span current and clear scope properly for async processing Dec 12, 2022
@davsclaus
Copy link
Contributor

Using the event notifier to emit a "async processing" event is tricky as there are potentially more places this can happens in the core routing engine. However they all bubble up to the base-engine in the CamelInternalProcessor, especially for spans that would call external systems via "SendProcessor".

@lmolkova
Copy link
Contributor Author

@bvahdat thank you! it should be fixed now.

@davsclaus I removed notifications from everywhere except CamelInternalProcessor and yes, everything seems to be working.

Do you have in mind any scenarios I can run to validate to make sure I didn't miss something? I tried to cover what I can think of in CurrentSpanTests.

@lmolkova lmolkova force-pushed the otel-current-span branch 2 times, most recently from dc45e19 to 8989daf Compare December 12, 2022 19:53
@davsclaus
Copy link
Contributor

@lmolkova have you tested your latest code with the AWS servicebus to see if the diagram/span looks correct

@lmolkova
Copy link
Contributor Author

@davsclaus, yes, I did (with Azure ServiceBus ) and yes, it works fine and looks the same as in the description.

@github-actions
Copy link
Contributor

Components tested:

Total Tested Failed ❌ Passed ✅
9 9 1 9

@github-actions
Copy link
Contributor

Components tested:

Total Tested Failed ❌ Passed ✅
9 9 1 9

Copy link
Contributor

@davsclaus davsclaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to fix tests

@davsclaus
Copy link
Contributor

davsclaus commented Dec 15, 2022

It happens when its the JDK fork-join pool that are in use

pool-17-thread-4 -> Valid context: false
pool-17-thread-8 -> Valid context: false
pool-17-thread-2 -> Valid context: false
pool-17-thread-9 -> Valid context: false
pool-17-thread-1 -> Valid context: false
Camel (camel-8) thread #11 - Delay -> Valid context: false
ForkJoinPool.commonPool-worker-5 -> Valid context: true

Which is a JDK pool you use via

private static final Executor DELAYED = CompletableFuture.delayedExecutor(100L, TimeUnit.MILLISECONDS);

@lmolkova
Copy link
Contributor Author

thanks a lot, @davsclaus ! I was able to reproduce testContextDoesNotLeak locally with Java 11 (does not repro with 17 I used before) and working on fixing it. I'll try my best to fix it today.

@davsclaus
Copy link
Contributor

Thanks for looking into this. Yeah I only tried with JDK11, as I use that primary due to Camel v3 is still Java 11 based.

Could be that the JDK ForJoin pool behaves differently or something?

@bvahdat
Copy link
Member

bvahdat commented Dec 15, 2022

@lmolkova by any chance, could you please rebase on top of the main branch when you are done?

@davsclaus
Copy link
Contributor

Yeah in the mean time we have upgraded: upgrade opentelemetry to the version 1.21.0

@lmolkova
Copy link
Contributor Author

@bvahdat sure, I rebased!

I believe (again) that I fixed tests. There were two issues:

  1. We still need to clean the context in SendProcessor: since we have multiple nested spans, we need to clean scope after them. I think I found all the new event-related tests.
  2. Test was not correct: sometimes aggregation can be scheduled on the same thread on which outer span is still active - it's ok and if we had OTel executor instrumentation enabled, it'd populate this context on other threads too. Added check and description to the test. This behavior was different on Java 11 and Java 17 and affected Java 11 much more. It's stable now on both versions.

@github-actions
Copy link
Contributor

Components tested:

Total Tested Failed ❌ Passed ✅
3 3 1 3

@github-actions
Copy link
Contributor

Components tested:

Total Tested Failed ❌ Passed ✅
3 3 0 4

@davsclaus davsclaus merged commit 002c9ca into apache:main Dec 16, 2022
@bvahdat bvahdat mentioned this pull request Dec 16, 2022
@trask
Copy link

trask commented Dec 16, 2022

thanks @lmolkova and camel team!!

@lmolkova
Copy link
Contributor Author

Thank you all!

@oscerd
Copy link
Contributor

oscerd commented Dec 16, 2022

Opentelemetry support is important. We would like to improve it in the future. This is a really good work. Thanks to everyone involved.

jamesnetherton pushed a commit to jamesnetherton/camel that referenced this pull request Apr 24, 2023
…cessing (apache#8713)

* Prototype: make span current and clear scope

* cleanup

* tests and fixes

* cleanup and tests

* cleanup

* fix build

* Remove unnecessary notifications

* fix tests

* oops

* Fix tests on Java 11

* checkstyle
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants