-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DSM support to SQS v1 #6259
Conversation
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 6 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.27.0-SNAPSHOT~c62a1128b2, baseline=1.26.0-SNAPSHOT~ad2dbbf866
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1049514
Total [baseline] (9.325 s) : 0, 9324835
Agent [candidate] (1.048 s) : 0, 1048197
Total [candidate] (9.375 s) : 0, 9375162
section appsec
Agent [baseline] (1.145 s) : 0, 1145450
Total [baseline] (9.443 s) : 0, 9442777
Agent [candidate] (1.144 s) : 0, 1144342
Total [candidate] (9.433 s) : 0, 9432557
section iast
Agent [baseline] (1.171 s) : 0, 1171449
Total [baseline] (9.565 s) : 0, 9565266
Agent [candidate] (1.167 s) : 0, 1167183
Total [candidate] (9.562 s) : 0, 9561646
section profiling
Agent [baseline] (1.247 s) : 0, 1246741
Total [baseline] (9.614 s) : 0, 9614204
Agent [candidate] (1.247 s) : 0, 1246553
Total [candidate] (9.641 s) : 0, 9640906
gantt
title petclinic - break down per module: candidate=1.27.0-SNAPSHOT~c62a1128b2, baseline=1.26.0-SNAPSHOT~ad2dbbf866
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (648.811 ms) : 0, 648811
BytebuddyAgent [candidate] (649.297 ms) : 0, 649297
GlobalTracer [baseline] (306.883 ms) : 0, 306883
GlobalTracer [candidate] (306.139 ms) : 0, 306139
AppSec [baseline] (51.551 ms) : 0, 51551
AppSec [candidate] (50.63 ms) : 0, 50630
Remote Config [baseline] (681.638 µs) : 0, 682
Remote Config [candidate] (680.382 µs) : 0, 680
Telemetry [baseline] (7.166 ms) : 0, 7166
Telemetry [candidate] (7.283 ms) : 0, 7283
section appsec
BytebuddyAgent [baseline] (648.31 ms) : 0, 648310
BytebuddyAgent [candidate] (648.072 ms) : 0, 648072
GlobalTracer [baseline] (306.129 ms) : 0, 306129
GlobalTracer [candidate] (305.641 ms) : 0, 305641
AppSec [baseline] (149.171 ms) : 0, 149171
AppSec [candidate] (148.864 ms) : 0, 148864
Remote Config [baseline] (644.079 µs) : 0, 644
Remote Config [candidate] (643.768 µs) : 0, 644
Telemetry [baseline] (6.848 ms) : 0, 6848
Telemetry [candidate] (6.866 ms) : 0, 6866
section iast
BytebuddyAgent [baseline] (773.667 ms) : 0, 773667
BytebuddyAgent [candidate] (769.618 ms) : 0, 769618
GlobalTracer [baseline] (286.374 ms) : 0, 286374
GlobalTracer [candidate] (285.628 ms) : 0, 285628
AppSec [baseline] (49.054 ms) : 0, 49054
AppSec [candidate] (49.371 ms) : 0, 49371
Remote Config [baseline] (628.101 µs) : 0, 628
Remote Config [candidate] (616.38 µs) : 0, 616
Telemetry [baseline] (6.552 ms) : 0, 6552
Telemetry [candidate] (7.245 ms) : 0, 7245
IAST [baseline] (20.652 ms) : 0, 20652
IAST [candidate] (20.42 ms) : 0, 20420
section profiling
BytebuddyAgent [baseline] (659.894 ms) : 0, 659894
BytebuddyAgent [candidate] (658.943 ms) : 0, 658943
GlobalTracer [baseline] (378.163 ms) : 0, 378163
GlobalTracer [candidate] (378.325 ms) : 0, 378325
AppSec [baseline] (51.512 ms) : 0, 51512
AppSec [candidate] (51.055 ms) : 0, 51055
Remote Config [baseline] (683.072 µs) : 0, 683
Remote Config [candidate] (666.294 µs) : 0, 666
Telemetry [baseline] (7.441 ms) : 0, 7441
Telemetry [candidate] (7.44 ms) : 0, 7440
ProfilingAgent [baseline] (94.854 ms) : 0, 94854
ProfilingAgent [candidate] (95.805 ms) : 0, 95805
Profiling [baseline] (94.879 ms) : 0, 94879
Profiling [candidate] (95.831 ms) : 0, 95831
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.27.0-SNAPSHOT~c62a1128b2, baseline=1.26.0-SNAPSHOT~ad2dbbf866
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.044 s) : 0, 1043957
Total [baseline] (8.704 s) : 0, 8704049
Agent [candidate] (1.063 s) : 0, 1062718
Total [candidate] (8.766 s) : 0, 8765991
section iast
Agent [baseline] (1.162 s) : 0, 1161528
Total [baseline] (9.278 s) : 0, 9278466
Agent [candidate] (1.174 s) : 0, 1174421
Total [candidate] (9.271 s) : 0, 9271204
section iast_TELEMETRY_OFF
Agent [baseline] (1.172 s) : 0, 1171859
Total [baseline] (9.299 s) : 0, 9299443
Agent [candidate] (1.159 s) : 0, 1159165
Total [candidate] (9.276 s) : 0, 9275973
gantt
title insecure-bank - break down per module: candidate=1.27.0-SNAPSHOT~c62a1128b2, baseline=1.26.0-SNAPSHOT~ad2dbbf866
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (645.824 ms) : 0, 645824
BytebuddyAgent [candidate] (658.119 ms) : 0, 658119
GlobalTracer [baseline] (305.288 ms) : 0, 305288
GlobalTracer [candidate] (310.235 ms) : 0, 310235
AppSec [baseline] (50.911 ms) : 0, 50911
AppSec [candidate] (51.64 ms) : 0, 51640
Remote Config [baseline] (670.126 µs) : 0, 670
Remote Config [candidate] (690.815 µs) : 0, 691
Telemetry [baseline] (7.111 ms) : 0, 7111
Telemetry [candidate] (7.369 ms) : 0, 7369
section iast
BytebuddyAgent [baseline] (766.56 ms) : 0, 766560
BytebuddyAgent [candidate] (775.069 ms) : 0, 775069
GlobalTracer [baseline] (283.941 ms) : 0, 283941
GlobalTracer [candidate] (286.795 ms) : 0, 286795
AppSec [baseline] (48.464 ms) : 0, 48464
AppSec [candidate] (49.524 ms) : 0, 49524
Remote Config [baseline] (633.142 µs) : 0, 633
Remote Config [candidate] (610.747 µs) : 0, 611
Telemetry [baseline] (7.983 ms) : 0, 7983
Telemetry [candidate] (6.513 ms) : 0, 6513
IAST [baseline] (19.626 ms) : 0, 19626
IAST [candidate] (21.383 ms) : 0, 21383
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (769.283 ms) : 0, 769283
BytebuddyAgent [candidate] (761.535 ms) : 0, 761535
GlobalTracer [baseline] (290.384 ms) : 0, 290384
GlobalTracer [candidate] (285.396 ms) : 0, 285396
AppSec [baseline] (50.018 ms) : 0, 50018
AppSec [candidate] (49.51 ms) : 0, 49510
Remote Config [baseline] (632.144 µs) : 0, 632
Remote Config [candidate] (631.283 µs) : 0, 631
Telemetry [baseline] (7.524 ms) : 0, 7524
Telemetry [candidate] (7.535 ms) : 0, 7535
IAST [baseline] (19.425 ms) : 0, 19425
IAST [candidate] (20.251 ms) : 0, 20251
LoadParameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 7 metrics, 14 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.27.0-SNAPSHOT~c62a1128b2, baseline=1.26.0-SNAPSHOT~ad2dbbf866
dateFormat X
axisFormat %s
section baseline
no_agent (1.375 ms) : 1355, 1394
. : milestone, 1375,
appsec (1.736 ms) : 1711, 1761
. : milestone, 1736,
iast (1.494 ms) : 1470, 1518
. : milestone, 1494,
profiling (1.574 ms) : 1548, 1600
. : milestone, 1574,
tracing (1.473 ms) : 1448, 1497
. : milestone, 1473,
section candidate
no_agent (1.353 ms) : 1334, 1372
. : milestone, 1353,
appsec (1.753 ms) : 1727, 1779
. : milestone, 1753,
iast (1.5 ms) : 1476, 1525
. : milestone, 1500,
profiling (1.506 ms) : 1481, 1532
. : milestone, 1506,
tracing (1.503 ms) : 1478, 1527
. : milestone, 1503,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.27.0-SNAPSHOT~c62a1128b2, baseline=1.26.0-SNAPSHOT~ad2dbbf866
dateFormat X
axisFormat %s
section baseline
no_agent (361.614 µs) : 341, 382
. : milestone, 362,
iast (478.651 µs) : 458, 499
. : milestone, 479,
iast_FULL (539.805 µs) : 519, 560
. : milestone, 540,
iast_INACTIVE (435.269 µs) : 415, 456
. : milestone, 435,
iast_TELEMETRY_OFF (464.078 µs) : 444, 485
. : milestone, 464,
tracing (437.337 µs) : 417, 458
. : milestone, 437,
section candidate
no_agent (363.725 µs) : 344, 384
. : milestone, 364,
iast (474.983 µs) : 454, 496
. : milestone, 475,
iast_FULL (534.147 µs) : 514, 555
. : milestone, 534,
iast_INACTIVE (445.371 µs) : 425, 466
. : milestone, 445,
iast_TELEMETRY_OFF (464.369 µs) : 443, 485
. : milestone, 464,
tracing (442.981 µs) : 422, 464
. : milestone, 443,
|
...ava-sdk-1.11.0/src/main/java/datadog/trace/instrumentation/aws/v0/TracingRequestHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this by mixing the V2 instrumentation and what was done in the aws sdk v1 instrumentation
.../aws-java-sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/SqsInterceptor.java
Show resolved
Hide resolved
...ava-sdk-1.11.0/src/main/java/datadog/trace/instrumentation/aws/v0/TracingRequestHandler.java
Outdated
Show resolved
Hide resolved
assert messages[0].attributes['AWSTraceHeader'] =~ | ||
/Root=1-[0-9a-f]{8}-00000000${sendSpan.traceId.toHexStringPadded(16)};Parent=${DDSpanId.toHexStringPadded(sendSpan.spanId)};Sampled=1/ | ||
|
||
cleanup: | ||
client.shutdown() | ||
} | ||
|
||
@IgnoreIf({ instance.isDataStreamsEnabled() }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, it's a mistake on my part when synchronizing this with the V2 tests, will remove that line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it was not a mistake from me ! It's disabled in V2 as well !
dd-trace-java/dd-java-agent/instrumentation/aws-java-sqs-2.0/src/test/groovy/SqsClientTest.groovy
Lines 189 to 190 in 2b28420
@IgnoreIf({instance.isDataStreamsEnabled()}) | |
def "trace details propagated via embedded SQS message attribute (string)"() { |
I suppose the logic is that whether DSM is enabled or not has no impact on the expecations of the test, so we ignore it to not run the same test twice.
One could also argue that we could run it to make sure we didn't change the behavior with DSM. But in any case, I think it should be coherent between v1 and v2 if tests are ignored or not.
...sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/SqsClientInstrumentation.java
Outdated
Show resolved
Hide resolved
...ava-sdk-1.11.0/src/main/java/datadog/trace/instrumentation/aws/v0/TracingRequestHandler.java
Outdated
Show resolved
Hide resolved
.../aws-java-sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/SqsInterceptor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments that should be addressed on this PR. I think that the implementation can be also simplified a bit
...ava-sdk-1.11.0/src/main/java/datadog/trace/instrumentation/aws/v0/TracingRequestHandler.java
Outdated
Show resolved
Hide resolved
...aws-java-sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/TracingIterator.java
Outdated
Show resolved
Hide resolved
String queueUrl = smRequest.getQueueUrl(); | ||
if (queueUrl == null) return request; | ||
|
||
AgentSpan span = startSpan("aws.sqs.send"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hardcoded the span name because I don't want to pull dependencies just to access the cache that is normally used to get the name
Line 60 in 62d6d15
span = startSpan(AwsNameCache.spanName(request)); |
and since we know what kind of request we're dealing with here, it'd always be the same value anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a span is started here? For the sdk v2 is not happening andt the reason is that the tracing for the send should already done by the aws sdk instrumentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the sdk v2 instrumentation, the spans are always started early, in beforeExecution
, which is a method that doesn't exist in v1, but takes place even before beforeMarshalling
. This allows access to the span in all methods basically.
Line 49 in 275ed3a
final AgentSpan span = startSpan(DECORATE.spanName(executionAttributes)); |
(it is activated only later)
In the sdk v1 instrumentation, the span is both created and started in the same method, in beforeRequest
, which means it's not available in beforeMarshalling
, where I need it. I could create the span in the beforeMarshalling
method of the sdk instrumentation, but we already discussed here that it'd be better to create it right where I need it, in the SQS instrumentation, which saves one use of the context, and also avoids potential leaking span issues.
If it can help an easy implementation can be found in #6281 . It has to be completed for extracting the context on receive side but it will be easy to do |
I don't understand, it's more or less what I did initially, but then we discussed the fact that it'd be better to use the handler context that is available from 1.11.106 to store the span in this discussion. I do note the improvements you added:
Are you suggesting that after trying it yourself, you think using a context store is a better idea than relying on the post-1.11.106 handler context ? |
In terms of orchestration there is no big difference since today we are opening the span too late (beforeRequest) and there is no way to modify it like happens in the v2. So at this point using a contextStore or using the request attributes does not make difference since it won't avoid creating the span beforeHand (that seems needed). Probably just use a contextStore that allows supporting more versions?. But it's my own feeling. The important thing I would have care about is to try to duplicate as less as possible the logic around the span decoration and its naming (since there are a lot of cases the can change depending on the naming version). Also try to kick in this advice only if datastream is enabled will be a plus |
if (span == null) { | ||
// also try getting the span from the context store, if the error happened early | ||
span = requestSpanStore.remove(request.getOriginalRequest()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this after a discussion with java folks to try to avoid leaking the spans that are created in beforeMarshalling
...0/src/main/java/datadog/trace/instrumentation/aws/v0/HandlerChainFactoryInstrumentation.java
Outdated
Show resolved
Hide resolved
...0/src/main/java/datadog/trace/instrumentation/aws/v0/HandlerChainFactoryInstrumentation.java
Outdated
Show resolved
Hide resolved
...sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/SqsClientInstrumentation.java
Outdated
Show resolved
Hide resolved
...sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/SqsClientInstrumentation.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks almost OK to me. I left some final comments to address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me now. You might also ask a double review form the apm java core team (or dsm folks since it's dsm related)
@PerfectSlayer or @mcculls could you give this a look ? I don't think a DSM review is necessary, because I really copied what was done in the V2 implem, so there was no functional question, only technical ones. |
public void set( | ||
final Map<String, MessageAttributeValue> carrier, final String key, final String value) { | ||
if (carrier.size() < 10 && !carrier.containsKey(DATADOG_KEY)) { | ||
String jsonPathway = String.format("{\"%s\": \"%s\"}", key, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t you fear some key
or value
value might break the JSON? Like some"thing
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point, but also this line of code was taken from
Line 18 in 5afa55c
String jsonPathway = String.format("{\"%s\": \"%s\"}", key, value); |
so... idk, maybe I can fix both at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the dotnet implementation, it's implemented that way as well.
https://github.com/DataDog/dd-trace-dotnet/blob/e8735bc26f63534356e7f40b650c5e91a43b111a/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/SQS/ContextPropagation.cs#L84
I'd say we keep it like this, and it may be worth it to fix it if we get an escalation for it.
Also, AWS is usually pretty restrictive on what it accepts for names, so we have that on our side.
What Does This Do
Add the pathway context injection necessary for DSM support to SQS when used with the v1 java sdk.
DSM is only going to be available for producers in v >= 1.11.106 because that's the first version to introduce a handler context with
AmazonWebServiceRequest
, and this is very helpful to pass the span frombeforeMarshalling
tobeforeRequest
. Not being able to use that increases the complexity a lot, and this version is old enough that it shouldn't be a big ask from customers.Consequently, I added a new instrumentation for DSM support specifically, that's going to overlap with the existing APM instrumentation for all V1.X versions of the client.
Motivation
DSM support had already been added to the SQS v2 client, adding support to v1 will allow more clients to benefit from it.
Additional Notes
Jira ticket: AIT-8676