Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traces not being emitted for custom instrumentation #433

Closed
nhoughto opened this issue Aug 8, 2018 · 9 comments
Closed

Traces not being emitted for custom instrumentation #433

nhoughto opened this issue Aug 8, 2018 · 9 comments
Labels
tag: needs investigation Issues needing investigations type: bug

Comments

@nhoughto
Copy link

nhoughto commented Aug 8, 2018

Trying to get our APM going with some custom and agent woven instrumentation using a GlobalTracer, running in Java 8 on undertow servlet container. Our custom code is very similar to the example code:

...
        Tracer tracer = GlobalTracer.get();

        Scope scope = tracer.buildSpan("operation-name").startActive(true);
        try {
            scope.span().setTag(DDTags.SERVICE_NAME, "my-new-service");

            // The code you're tracing
            Thread.sleep(1000);

        // If you don't call close(), the span data will NOT make it to Datadog!
        } finally {
            scope.close();
        }
...

But these custom events never appear in the APM UI, or the logs via LoggingWriter. Digging into your tracing code, it appears write() is never being called for these traces in the expireReference method.

  private void expireReference() {
    final int count = pendingReferenceCount.decrementAndGet();
    if (count == 0) {
      write();
    }
    log.debug("traceId: {} -- Expired reference. count = {}", traceId, count);
  }

because the pendingReferenceCount is always greater than 1 for custom traces, but it is 0 for agent woven instrumentation. Its not clear to me what this counter is aiming to achieve or how / when it gets incremented. Is this style of tracing (custom and auto agent) supported? Are there any gotchas? From the docs it looks like I only have to make sure to call close() on the span and things should work, as per the code close() is definitely being called, but ends up in the expireReferences method in PendingTrace.java and never gets written out.

I can see the counter gets incremented in registerContinuation and registerSpan, and we are running in a servlet container so continuations are possible, I'm not sure what to try next.

@nhoughto
Copy link
Author

nhoughto commented Aug 8, 2018

An update to this, it appears if I disabled agent woven instrumentation via -Ddd.integrations.enabled=false, my custom instrumentation starts working as expected (can see it in the logging / UI).

When debugging the PendingTrace objects that aren't being written when both custom and agent tracing is enabled, I can see a number of spans in one trace, some custom and some agent derived (which is correct, JDBC queries underneath a custom API request span). So they appear to be interacting, maybe that is the problem?

@tylerbenson
Copy link
Contributor

Nice job digging into the details.

My guess is we have some fragile instrumentation that starts a span in one method, assumes it is the active span and rather than maintain reference, retrieves the current active span and finishes it. This would probably imply that the span you're creating is being finished twice.

It would be helpful to review the set of classes that are modified when you do have the instrumentation enabled. Set the log level to debug and search for lines like this in your log output:

[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ScheduledThreadPoolExecutor -- null

You can also disable all the instrumentation as you did, but then selectively enable individual ones that you want to use until it stops working.

@nhoughto
Copy link
Author

nhoughto commented Aug 9, 2018

Ok i'll try that, also really hard to debug the agent as it appears to rewrite the class package from datadog.opentracing to datadog.trace.agent.ot or similar, so IntelliJ has no source during debug, had to manually download the source and repackage all the classes 😢, maybe something to simplify if you want issue debugging to be easier for people =) maybe add both to the agent src jar

@nhoughto
Copy link
Author

nhoughto commented Aug 9, 2018

Ok got them

[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ScheduledThreadPoolExecutor -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ThreadPoolExecutor -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.AbstractExecutorService -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.Executors$FinalizableDelegatedExecutorService -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.Executors$DelegatedExecutorService -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.FutureTask -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class javax.management.NotificationBroadcasterSupport$1 -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class io.undertow.servlet.handlers.DefaultServlet -- sun.misc.Launcher$AppClassLoader@7e72c127
[Camel Thread #0 - LRUCacheFactory] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ForkJoinPool -- null
[Camel Thread #0 - LRUCacheFactory] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ForkJoinTask -- null
[Camel Thread #0 - LRUCacheFactory] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ForkJoinTask$RunnableExecuteAction -- null
[Camel Thread #0 - LRUCacheFactory] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class java.util.concurrent.ForkJoinTask$AdaptedCallable -- null
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.apache.camel.component.jetty.CamelContinuationServlet -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.apache.cxf.transport.servlet.CXFServlet -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.apache.cxf.transport.servlet.CXFNonSpringServlet -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.apache.cxf.transport.servlet.AbstractHTTPServlet -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.apache.cxf.transport.servlet.servicelist.ServiceListGeneratorServlet -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.postgresql.jdbc.PgConnection -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.postgresql.jdbc.PgPreparedStatement -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class org.postgresql.jdbc.PgStatement -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class com.zaxxer.hikari.pool.ProxyConnection -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class com.zaxxer.hikari.pool.HikariProxyConnection -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class com.zaxxer.hikari.pool.HikariProxyPreparedStatement -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class com.zaxxer.hikari.pool.ProxyPreparedStatement -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class com.zaxxer.hikari.pool.ProxyStatement -- sun.misc.Launcher$AppClassLoader@7e72c127
[main] DEBUG datadog.trace.agent.tooling.AgentInstaller$LoggingListener - Transformed class com.zaxxer.hikari.pool.HikariProxyStatement -- sun.misc.Launcher$AppClassLoader@7e72c127

@tylerbenson
Copy link
Contributor

@nhoughto great, thanks. If you want to build the agent locally, there's a gradle flag to disable the shadow package rename.

Were you able to narrow down at all which integration might be causing the problems? Here are some to try:

-Ddd.integration.java_concurrent.enabled=false
-Ddd.integration.servlet.enabled=false
-Ddd.integration.jdbc.enabled=false

(Those seem to be the only integrations that are triggering. Maybe start by re-enabling everything and disable those three and see if it still works.)

One last piece of info... Could you provide a stacktrace for the point at which you call scope.close()?

Also, any details you could provide as to what might be "underneath" or called by // The code you're tracing.

Thanks!

@nhoughto
Copy link
Author

So the custom tracing works until the JDBC integration is added (the disabling of it is removed 🤔 ).
So its some interaction between JDBC tracing and custom instrumentation, which makes sense i suppose since I can see JDBC spans as part of the PendingTrace object when checking that counter I reference earlier in the issue. Bit of a surprise though as i'd expect JDBC instrumentation to be quite easy since it is only synchronous, no funny business with callbacks..

close() call stack is :

java.lang.Exception: test
	at com.bsycorp.common.camel.ServiceProcessor.lambda$process$0(ServiceProcessor.java:134)
	at com.bsycorp.common.camel.ServiceProcessor$$Lambda$166.000000005809D3C0.done(Unknown Source)
	at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:262)
	at org.apache.camel.impl.MDCUnitOfWork$MDCCallback.done(MDCUnitOfWork.java:242)
	at com.bsycorp.common.unitofwork.MessageHeaderAsyncCallback.done(MessageHeaderAsyncCallback.java:53)
	at org.apache.camel.processor.Pipeline$1.done(Pipeline.java:166)
	at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:262)
	at org.apache.camel.impl.MDCUnitOfWork$MDCCallback.done(MDCUnitOfWork.java:242)
	at com.bsycorp.common.unitofwork.MessageHeaderAsyncCallback.done(MessageHeaderAsyncCallback.java:53)
	at org.apache.camel.processor.RedeliveryErrorHandler$2.done(RedeliveryErrorHandler.java:560)
	at org.apache.camel.processor.SendProcessor$1.done(SendProcessor.java:160)
	at com.bsycorp.common.camel.ServiceProducer.lambda$process$0(ServiceProducer.java:89)
	at com.bsycorp.common.camel.ServiceProducer$$Lambda$163.000000005809C4E0.done(Unknown Source)
	at org.apache.camel.component.directvm.DirectVmProducer.lambda$process$2(DirectVmProducer.java:84)
	at org.apache.camel.component.directvm.DirectVmProducer$$Lambda$164.000000005809CAB0.done(Unknown Source)
	at org.apache.camel.component.directvm.DirectVmProcessor$1.done(DirectVmProcessor.java:70)
	at com.bsycorp.common.camel.ServiceProcessor.lambda$process$0(ServiceProcessor.java:139)
	at com.bsycorp.common.camel.ServiceProcessor$$Lambda$166.000000005809D3C0.done(Unknown Source)
	at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:262)
	at org.apache.camel.impl.MDCUnitOfWork$MDCCallback.done(MDCUnitOfWork.java:242)
	at com.bsycorp.common.unitofwork.MessageHeaderAsyncCallback.done(MessageHeaderAsyncCallback.java:53)
	at org.apache.camel.processor.Pipeline$1.done(Pipeline.java:166)
	at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:262)
	at org.apache.camel.impl.MDCUnitOfWork$MDCCallback.done(MDCUnitOfWork.java:242)
	at com.bsycorp.common.unitofwork.MessageHeaderAsyncCallback.done(MessageHeaderAsyncCallback.java:53)
	at org.apache.camel.processor.RedeliveryErrorHandler$2.done(RedeliveryErrorHandler.java:560)
	at org.apache.camel.processor.Enricher$1.done(Enricher.java:237)
	at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:262)
	at org.apache.camel.impl.MDCUnitOfWork$MDCCallback.done(MDCUnitOfWork.java:242)
	at com.bsycorp.common.unitofwork.MessageHeaderAsyncCallback.done(MessageHeaderAsyncCallback.java:53)
	at org.apache.camel.processor.Pipeline$1.done(Pipeline.java:166)
	at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:262)
	at org.apache.camel.impl.MDCUnitOfWork$MDCCallback.done(MDCUnitOfWork.java:242)
	at com.bsycorp.common.unitofwork.MessageHeaderAsyncCallback.done(MessageHeaderAsyncCallback.java:53)
	at org.apache.camel.processor.RedeliveryErrorHandler$2.done(RedeliveryErrorHandler.java:560)
	at org.apache.camel.processor.SendProcessor$1.done(SendProcessor.java:160)
	at com.bsycorp.common.camel.ServiceProducer.lambda$process$2(ServiceProducer.java:113)
	at com.bsycorp.common.camel.ServiceProducer$$Lambda$169.00000000580AA050.done(Unknown Source)
	at org.apache.camel.component.ahc.AhcProducer$AhcAsyncHandler.onCompleted(AhcProducer.java:109)
	at org.apache.camel.component.ahc.AhcProducer$AhcAsyncHandler.onCompleted(AhcProducer.java:67)
	at org.asynchttpclient.netty.NettyResponseFuture.loadContent(NettyResponseFuture.java:223)
	at org.asynchttpclient.netty.NettyResponseFuture.done(NettyResponseFuture.java:258)
	at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.finishUpdate(AsyncHttpClientHandler.java:239)
	at org.asynchttpclient.netty.handler.HttpHandler.handleChunk(HttpHandler.java:113)
	at org.asynchttpclient.netty.handler.HttpHandler.handleRead(HttpHandler.java:142)
	at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.channelRead(AsyncHttpClientHandler.java:76)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:146)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:812)

@tylerbenson tylerbenson added type: bug tag: needs investigation Issues needing investigations labels Aug 11, 2018
@tylerbenson
Copy link
Contributor

@nhoughto I can't reproduce this and I can't figure out what is causing the problem. Perhaps you could provide a small sample app that demonstrates the issue?

Thanks!

@nhoughto
Copy link
Author

nhoughto commented Sep 5, 2018

Repro test could be hard, will need to unpick it from the rest of the app 😢

I'll put it on the list and see how hard it is, we are blocked by #429 anyway atm.

@randomanderson
Copy link
Contributor

Closing as there has been no activity on this issue in over a year. Please create a new issue if the problem still persists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag: needs investigation Issues needing investigations type: bug
Projects
None yet
Development

No branches or pull requests

3 participants