Determine agent URL version on first upload call #1236
This should remove http request from critical path during app load
@@ -128,6 +116,10 @@ Response sendTraces(final List<List<DDSpan>> traces) {

  Response sendSerializedTraces(
      final int representativeCount, final Integer sizeInBytes, final List<byte[]> traces) {
    if (tracesUrl == null) {
You would need to make tracesUrl volatile or, better yet, do the if (tracesUrl == null) check inside the synchronized detectEndpoint() method, which would then be called unconditionally from here. Otherwise you are risking data races.
Good catch!
Looks like I can just add another null check inside the synchronized block. With that, once we have detected the URL there should be no contention.
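For reference, the combination discussed here (a volatile field, an unlocked fast-path check, and a second null check under the lock) could look roughly like the sketch below. Only tracesUrl and detectEndpoint() come from the diff; the class name, the detectionCalls counter, probeAgent(), and the URL string are hypothetical stand-ins, and the real probe is an HTTP request rather than a constant.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for DDAgentApi, reduced to the lazy URL detection.
class LazyEndpointApi {
  // volatile: a write made under the lock is visible to the unlocked read below
  private volatile String tracesUrl;

  // Illustration only: counts how often the simulated detection runs.
  final AtomicInteger detectionCalls = new AtomicInteger();

  String sendSerializedTraces() {
    if (tracesUrl == null) { // fast path: no lock once the URL is known
      detectEndpoint();
    }
    return tracesUrl; // the real method would send traces to this URL
  }

  private synchronized void detectEndpoint() {
    if (tracesUrl == null) { // second check: only one thread performs detection
      detectionCalls.incrementAndGet();
      tracesUrl = probeAgent();
    }
  }

  // Placeholder for the real network probe; the URL is an assumption.
  private String probeAgent() {
    return "http://agent.example/v4/traces";
  }
}
```

After the first successful detection, callers only pay for a volatile read and never enter the synchronized method.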
In this case, this would work without much coordination, since each thread would eventually make the null -> non-null transition. This is how String.hashCode works.
Using volatile alone would still leave a race, but the intent would be a bit clearer. On x86, it will compile to basically the same thing anyway.
In general, I'd like us to avoid adding synchronized blocks, since there's often a better way to achieve the same thing. In this case, I don't think we really need synchronized. There is a chance that two threads will make a network call, but reducing coordination overhead in the long run is probably a better choice overall.
But the real test is to measure the start-up and then we'll need to monitor throughput in perf env.
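To make the lock-free alternative concrete, here is a minimal sketch of the racy single-check pattern that String.hashCode uses, applied to a lazily detected URL. Everything except the tracesUrl name is hypothetical (the class, the counter, probeAgent(), and the URL string). The sketch assumes the detection is idempotent and that the cached value is an immutable object such as a String; for other types the safety of publishing through a plain field would need to be checked separately.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: lazy detection with no lock and no volatile.
class RacyLazyEndpoint {
  // Deliberately a plain field. Threads may briefly see null and repeat
  // the detection, but every detection yields an equivalent value.
  private String tracesUrl;

  // Illustration only: counts how often the simulated detection runs.
  final AtomicInteger detectionCalls = new AtomicInteger();

  String tracesUrl() {
    String url = tracesUrl; // read the field exactly once into a local
    if (url == null) {
      url = probeAgent(); // may run in more than one thread
      tracesUrl = url;    // last writer wins; all candidates are equivalent
    }
    return url; // return the local, never re-read the field
  }

  // Placeholder for the real network probe; the URL is an assumption.
  private String probeAgent() {
    detectionCalls.incrementAndGet();
    return "http://agent.example/v4/traces";
  }
}
```

The trade-off is the one described above: two threads may each make the network call during start-up, in exchange for no coordination on the steady-state path.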
dd-trace-ot/src/main/java/datadog/trace/common/writer/ddagent/DDAgentApi.java
@@ -298,6 +290,16 @@ private static HttpUrl getUrl(final String host, final int port, final String en
    }
  }

  private synchronized void detectEndpoint() {
Maybe not for this PR, but a better overall approach might be to eliminate the endpoint sniffing as separate step.
We could simply wait until the first send. If the first send fails with a 404, then we fall back to v3. Then we remember whichever one succeeded.
This would require a fair amount of changes to the tests.
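A rough sketch of that idea, assuming a pluggable transport that maps an endpoint path to an HTTP status code. The FallbackSender class, the transport function, and both path constants are hypothetical; only the 404-then-v3 behavior comes from the comment above.

```java
import java.util.function.ToIntFunction;

// Hypothetical sketch: no separate sniffing step, the first send decides.
class FallbackSender {
  static final String PREFERRED = "/v4/traces"; // assumed newer endpoint path
  static final String FALLBACK = "/v3/traces";  // assumed older endpoint path

  private final ToIntFunction<String> transport; // endpoint path -> HTTP status
  private volatile String endpoint;              // remembered after first success

  FallbackSender(ToIntFunction<String> transport) {
    this.transport = transport;
  }

  int send() {
    String known = endpoint;
    if (known != null) {
      return transport.applyAsInt(known); // steady state: one request
    }
    String used = PREFERRED;
    int status = transport.applyAsInt(used);
    if (status == 404) { // the agent does not know the preferred endpoint
      used = FALLBACK;
      status = transport.applyAsInt(used);
    }
    if (status >= 200 && status < 300) {
      endpoint = used; // remember whichever endpoint succeeded
    }
    return status;
  }

  String rememberedEndpoint() {
    return endpoint;
  }
}
```

With an agent that only understands the older path, the first send costs two requests and every later send costs one. If neither attempt succeeds, nothing is remembered and the next send tries the preferred endpoint again.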
I thought about that, but it feels like it may not be worth the effort overall.
Do you happen to have any numbers showing whether this improved start-up? From looking at our flame graphs, I think part of the problem is the construction of the OkHttpClient in the constructor of DDAgentApi. We might need to make that lazy as well. That said, if this is showing gains, we should go ahead and land this and then try the lazy construction of the OkHttpClient in a separate PR.
Please update the description with the perf difference with this change.
I did a bit of data gathering with this change on a local integration branch, both with the agent up and with it down. On Spring PetClinic, I observed about a 3% improvement on start-up (0.2 s of the 7 s overhead).