Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine agent URL version on first upload call #1236

Merged
merged 8 commits into from
Feb 21, 2020

Conversation

mar-kolya
Copy link
Contributor

This should remove http request from critical path during app load

This should remove http request from critical path during app load
@mar-kolya mar-kolya requested a review from a team as a code owner February 20, 2020 12:24
@@ -128,6 +116,10 @@ Response sendTraces(final List<List<DDSpan>> traces) {

Response sendSerializedTraces(
final int representativeCount, final Integer sizeInBytes, final List<byte[]> traces) {
if (tracesUrl == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need to make tracesUrl volatile or better yet do the if (tracesUrl == null) check inside the synchronized detectEndpoint() method which would then be called unconditionally from here.
Otherwise you are risking data races.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
Looks like I can just add another null check under synchronized block. With that after we have detected url there should be no contention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, this would work without much coordination, since each thread would eventually make the null -> non-null transition. This is how string hashCode works.

volatile alone will also have a race, but be a bit more clear. On x86, it will compile to basically the same thing anyway.

In general, I'd like us to avoid adding synchronized blocks, since there's often a better way to achieve the same thing. In this case, I don't think we really need synchronized. There is a chance that two threads will make a network call, but reducing coordination overhead in the long run is probably a better choice overall.

But the real test is to measure the start-up and then we'll need to monitor throughput in perf env.

@@ -298,6 +290,16 @@ private static HttpUrl getUrl(final String host, final int port, final String en
}
}

private synchronized void detectEndpoint() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not for this PR, but a better overall approach might be to eliminate the endpoint sniffing as separate step.

We could simply wait until the first send. If the first send fails with a 404, then we fallback to v3. Then we remember whichever one succeeded.

This would require a fair amount of changes to the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but it feels like it may not be worth the effort overall.

@dougqh
Copy link
Contributor

dougqh commented Feb 20, 2020

Do you happen to have any numbers showing if this improved start-up?

From looking at our flame graphs, I think part of the problem is the construction of the OkHttpClient in the constructor of DDAgentApi. We might need to make that lazy as well.

Although, if this is showing gains, we show go ahead and land this and then try the lazy construction of the OkHttpClient in a separate PR.

@tylerbenson tylerbenson added the tag: performance Performance related changes label Feb 21, 2020
Copy link
Contributor

@tylerbenson tylerbenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the description with the perf difference with this change.

@dougqh
Copy link
Contributor

dougqh commented Feb 21, 2020

Please update the description with the perf difference with this change.

I did a bit of data gathering with this change on a local integration branch -- both with the agent up & down. On Spring PetClinic, I observed about a 3% (0.2sec of 7sec overhead) improvement on start-up.

@mar-kolya mar-kolya merged commit e92d326 into master Feb 21, 2020
@mar-kolya mar-kolya deleted the mar-kolya/determine-agent-url-on-first-call branch February 21, 2020 12:04
@randomanderson randomanderson added this to the 0.44.0 milestone Feb 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag: performance Performance related changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants