
Netty 4.0 integration doesn't seem to work #410

Closed
htmldoug opened this issue Jul 27, 2018 · 13 comments

@htmldoug
Contributor

Summary

I've followed https://docs.datadoghq.com/tracing/setup/java/, but the Netty 4.0 Client instrumentation doesn't seem to work with Play 2.5 or even just AHC 2.0 by itself.

Expected result

I expect x-datadog-trace-id, x-datadog-parent-id headers attached to the outbound HTTP request, and a child span named netty.client.request.

I get neither.
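For reference, here is a minimal sketch of what the expected header injection amounts to. It assumes the client span's own span id is what gets sent as x-datadog-parent-id, and that both ids are encoded as unsigned decimal strings. SpanContext and inject are hypothetical names for illustration, not dd-trace-java APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderInjectionSketch {
    // Hypothetical minimal span context; not a dd-trace-java type.
    static final class SpanContext {
        final long traceId;
        final long spanId;

        SpanContext(long traceId, long spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    // Writes the client span's identity onto the outbound request headers.
    // The receiving service would read them and parent its server span under spanId.
    static void inject(SpanContext clientSpan, Map<String, String> headers) {
        headers.put("x-datadog-trace-id", Long.toUnsignedString(clientSpan.traceId));
        headers.put("x-datadog-parent-id", Long.toUnsignedString(clientSpan.spanId));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        inject(new SpanContext(1234L, 5678L), headers);
        System.out.println(headers); // {x-datadog-trace-id=1234, x-datadog-parent-id=5678}
    }
}
```

If neither header shows up on the wire, the downstream service has nothing to link its span to, which matches the missing child span reported here.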

Repro

I've replicated Netty40ClientTest in a new project: https://github.com/htmldoug/datadog-netty4-failing.

You can repro with sbt run.

Screenshots

[Screenshot: no child span]

[Screenshot: Wireshark capture]

Relates to #352. cc: @tylerbenson @realark

@tylerbenson
Contributor

@htmldoug Thanks for the report. We will look into it.

@tylerbenson
Contributor

Hey @htmldoug,

I think I might have some ideas as to what is broken. I was using the netty-all dependency for version analysis, which was too broad, produced some incorrect analysis, and caused the instrumentation not to apply. There were also some issues with class-loading order that we had just started looking at before your report came in.

You can see my PR here: #411

I ran it against your sample app (thanks for providing that btw), and see the additional trace being reported, but without the relationship between the spans being captured. (Each span has a parent id of 0.) There might be some additional work required to figure out why that's not linking up properly. I tried converting it to an active span (spanBuilder.startActive(true)) and it still didn't work.

I added the following setting so I could see the spans printed to the logs:
"-Ddd.writer.type=LoggingWriter"

[error] [AsyncHttpClient-2-1] INFO datadog.trace.agent.common.writer.LoggingWriter - write(trace): [{"type":"http","error":0,"meta":{"http.status_code":"200","component":"netty-client","span.kind":"client","http.url":"http://www.example.com/","peer.hostname":"www.example.com","peer.port":"80","thread.name":"AsyncHttpClient-2-1","http.method":"GET","thread.id":"12","span.type":"http"},"metrics":{},"duration":247206435,"resource":"GET /","trace_id":3772310731995224786,"span_id":2025072365345260028,"name":"netty.client.request","service":"datadog_test_app","parent_id":0,"start":1532673208862226325}]
[error] [main] INFO datadog.trace.agent.common.writer.LoggingWriter - write(trace): [{"type":null,"error":0,"meta":{"thread.name":"main","thread.id":"1"},"metrics":{},"duration":737819770,"resource":"parent","trace_id":3079532397405656187,"span_id":4957654225470335532,"name":"parent","service":"datadog_test_app","parent_id":0,"start":1532673208440954595}]

@htmldoug
Contributor Author

@tylerbenson thanks so much for the quick response, PR, and -Ddd.writer.type=LoggingWriter protip!

Does the PR CI publish a snapshot somewhere, or is there an easy way to build one locally? I'm less familiar with Gradle and couldn't find build instructions in the repo. My naive attempt with ./gradlew publishToMavenLocal didn't work.

@tylerbenson
Contributor

It was funny: your report came in right as we were investigating this ourselves. We were trying to figure out why the Netty instrumentation wasn't working with Spring WebFlux, so we already had a few hours' head start. 😉

@htmldoug you can try the jar built by CI: https://7229-89221572-gh.circle-artifacts.com/0/home/circleci/dd-trace-java/libs/dd-java-agent-0.12.0-SNAPSHOT.jar
For PRs, we save the jar as a CI artifact; once a PR is merged to master, a snapshot is published to the snapshot repo.

@htmldoug
Contributor Author

I've updated https://github.com/htmldoug/datadog-netty4-failing, pulling in 0.12.0-SNAPSHOT and removing netty-all (you're right, it's not relevant to AHC). Also added logback.

"I ran it against your sample app (thanks for providing that btw), and see the additional trace being reported, but without the relationship between the spans being captured."

Can't replicate. I still only see the single trace in LoggingWriter against 4c88e1a: https://gist.github.com/htmldoug/6663c069568a92fa7bcc5bf4c0db6be5.

@tylerbenson
Contributor

@htmldoug it appears your project is pulling in the snapshot from the snapshot repository. We don't push branches to the snapshot repo; only master is pushed. My fix hasn't been merged to master, so it is only available via the CI artifact link I provided.

In other news, @mar-kolya has a more generic fix for a related bug in #417, which will probably be merged instead of #411 (the PR that the build I linked to came from). Once #417 is merged to master, you can try your sample app as is.

@htmldoug
Contributor Author

Derp. I mistakenly thought it was merged. I'll sit tight.

@tylerbenson
Contributor

@htmldoug that PR has been merged to master. Can you try your app again with 0.13.0-SNAPSHOT?
Thanks!

@htmldoug
Contributor Author

htmldoug commented Aug 7, 2018

Looking much better with 0.13.0-SNAPSHOT. The next issue is that it isn't linking up all of the spans across different services. I need to investigate further before I can tell whether that's related to this issue.

For example, these should all have Origins:
[Screenshot from 2018-08-07, 12:38 AM]

@tylerbenson
Contributor

I think the first problem is that you're using span.start() (https://github.com/htmldoug/datadog-netty4-failing/blob/master/src/main/scala/com/rallyhealth/datadog/DatadogTestApp.scala#L18),
which creates a detached span, so no span created underneath it will report it as the parent. When I tested earlier, I changed that and still didn't see the spans connected, but I wasn't able to dig into it at that point. Might be a problem with our scala propagation instrumentation.
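The difference between a detached and an active span can be illustrated with a toy tracer. This is a hypothetical model, not the OpenTracing or dd-trace-java API: its start() and startActive() only mimic the behavior described above, and the active span is tracked per thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

public class ActiveSpanSketch {
    // Hypothetical toy span; not the dd-trace-java or OpenTracing type.
    static final class Span {
        final String name;
        final long spanId;
        final long parentId; // 0 means "no parent", like parent_id in the LoggingWriter output

        Span(String name, long spanId, long parentId) {
            this.name = name;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    // Hypothetical toy tracer that tracks the active span per thread.
    static final class Tracer {
        private final ThreadLocal<Deque<Span>> active = ThreadLocal.withInitial(ArrayDeque::new);
        private final AtomicLong ids = new AtomicLong(1);

        // Analogous to start(): inherits the active parent but is NOT itself
        // activated, so spans created afterwards cannot see it.
        Span start(String name) {
            Span parent = active.get().peek();
            return new Span(name, ids.getAndIncrement(), parent == null ? 0 : parent.spanId);
        }

        // Analogous to startActive(): also pushes the span, making it the
        // parent of spans subsequently started on this thread.
        Span startActive(String name) {
            Span span = start(name);
            active.get().push(span);
            return span;
        }

        // Deactivates the span (a real tracer would also finish and report it).
        void close(Span span) {
            active.get().remove(span);
        }
    }

    public static void main(String[] args) {
        Tracer tracer = new Tracer();

        tracer.start("parent"); // detached: never becomes active
        Span orphan = tracer.start("netty.client.request");
        System.out.println(orphan.parentId); // 0

        Span parent = tracer.startActive("parent");
        Span child = tracer.start("netty.client.request");
        System.out.println(child.parentId == parent.spanId); // true
        tracer.close(parent);
    }
}
```

Because the toy keeps the active span in a ThreadLocal, a span started on another thread (such as AsyncHttpClient-2-1 in the logs earlier in this thread) would still get parent id 0 even with startActive, unless something carries the context across threads.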

@htmldoug
Contributor Author

htmldoug commented Aug 7, 2018

Thanks for digging into the example app. I'm content that it works well enough.

The screenshot above is from our real apps, and that's where I'm going to focus the rest of my testing, at least until I can narrow the problem down to simpler repro steps.

I'm investigating this today and will keep you posted.

"Might be a problem with our scala propagation instrumentation."

Having worked on a couple implementations of this myself, it's definitely quite tricky.

@htmldoug
Contributor Author

htmldoug commented Aug 8, 2018

After investigating further, I think you're right that the scala propagation isn't making it across some internal async boundaries in our play 2.5 app. I'd expect to see a purple netty.request and play.request for the csedge-web play 2.5 service making an http request to the user-web play 2.5 service.

[Screenshot from 2018-08-07, 8:46 PM]

Would you like for me to update https://github.com/htmldoug/datadog-netty4-failing to a play 2.5 app that reproduces the failure?
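Carrying context across an async boundary generally means capturing the caller's active span when a task is scheduled and restoring it on whichever thread runs the task. The sketch below shows that pattern with a hypothetical ThreadLocal holding the active span id; ACTIVE_SPAN_ID, propagating and parentSeenOnPool are illustrative names, and this is not how dd-trace-java's Scala or Play instrumentation is actually implemented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ContextPropagationSketch {
    // Hypothetical per-thread "active span id"; 0 means no active span.
    static final ThreadLocal<Long> ACTIVE_SPAN_ID = ThreadLocal.withInitial(() -> 0L);

    // Captures the caller's active span id now, and restores it around the task
    // on whichever thread eventually runs it, so spans created there find their parent.
    static Runnable propagating(Runnable task) {
        long captured = ACTIVE_SPAN_ID.get();
        return () -> {
            long previous = ACTIVE_SPAN_ID.get();
            ACTIVE_SPAN_ID.set(captured);
            try {
                task.run();
            } finally {
                ACTIVE_SPAN_ID.set(previous); // don't leak context into the pool thread
            }
        };
    }

    // Returns the parent id a span would see if it were created on a pool thread.
    static long parentSeenOnPool(ExecutorService pool, boolean propagate) throws Exception {
        long[] seen = new long[1];
        Runnable task = () -> seen[0] = ACTIVE_SPAN_ID.get();
        Future<?> f = pool.submit(propagate ? propagating(task) : task);
        f.get(); // waits for the task; also makes seen[0] visible to this thread
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ACTIVE_SPAN_ID.set(42L); // pretend a "parent" span is active on the main thread
        System.out.println(parentSeenOnPool(pool, false)); // 0
        System.out.println(parentSeenOnPool(pool, true));  // 42
        pool.shutdown();
    }
}
```

Any async hop that skips a wrapper like this would drop the parent, which is one plausible way spans end up unlinked between internal Play steps.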

@htmldoug
Contributor Author

htmldoug commented Aug 8, 2018

I've logged that as a new issue, #432. I think it's more of a play-integration issue than a netty issue. I'm pretty satisfied with the netty part. Thanks!

@htmldoug htmldoug closed this as completed Aug 8, 2018
@tylerbenson tylerbenson added this to the 0.13.0 milestone Aug 23, 2018