New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add frame tags and send snapshots on addThrowable #6824
Conversation
Collect snapshot generated by an exception probe frame in ThrowableState class that is generated by exception instance. Those exception instances are tracked by a WeakIdentityHashMap in ExceptionProbeManager. Key is directly the exception instance. The Weak nature of the hash map allow to remove the entry upon the exception is no longer reachable. When unwinding, once the exception reaches the frame level where it is added to the span (DDSpan::addThrowable) we process all the collected snapshots and and add tags with frame index for each snapshot id. The frame index is computed based on the generated mapping between original exception (the one added to the span) and the inner most exception which was used to generated exception probes and snapshots. An instrumentation bug is also fixed: casting when calling CapturedValue.of is not necessary as the method is accepting any objects. On the other side, some return types could not be used as cast target because not visible from the current frame. inherited field with a type not visible from the inherited class.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 12 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.087 s) : 0, 1087341
Total [baseline] (9.256 s) : 0, 9255674
Agent [candidate] (1.082 s) : 0, 1081921
Total [candidate] (9.191 s) : 0, 9190927
section appsec
Agent [baseline] (1.208 s) : 0, 1208172
Total [baseline] (9.298 s) : 0, 9298387
Agent [candidate] (1.203 s) : 0, 1202632
Total [candidate] (9.308 s) : 0, 9308005
section iast
Agent [baseline] (1.223 s) : 0, 1223473
Total [baseline] (9.409 s) : 0, 9408774
Agent [candidate] (1.207 s) : 0, 1206827
Total [candidate] (9.339 s) : 0, 9338683
section profiling
Agent [baseline] (1.275 s) : 0, 1275026
Total [baseline] (9.374 s) : 0, 9373784
Agent [candidate] (1.277 s) : 0, 1276510
Total [candidate] (9.359 s) : 0, 9358831
gantt
title petclinic - break down per module: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (701.522 ms) : 0, 701522
BytebuddyAgent [candidate] (697.619 ms) : 0, 697619
GlobalTracer [baseline] (293.364 ms) : 0, 293364
GlobalTracer [candidate] (292.231 ms) : 0, 292231
AppSec [baseline] (49.464 ms) : 0, 49464
AppSec [candidate] (49.231 ms) : 0, 49231
Remote Config [baseline] (1.098 ms) : 0, 1098
Remote Config [candidate] (1.104 ms) : 0, 1104
Telemetry [baseline] (7.439 ms) : 0, 7439
Telemetry [candidate] (7.448 ms) : 0, 7448
section appsec
BytebuddyAgent [baseline] (701.316 ms) : 0, 701316
BytebuddyAgent [candidate] (697.238 ms) : 0, 697238
GlobalTracer [baseline] (293.375 ms) : 0, 293375
GlobalTracer [candidate] (292.299 ms) : 0, 292299
AppSec [baseline] (153.615 ms) : 0, 153615
AppSec [candidate] (153.508 ms) : 0, 153508
Remote Config [baseline] (608.078 µs) : 0, 608
Remote Config [candidate] (615.376 µs) : 0, 615
Telemetry [baseline] (6.839 ms) : 0, 6839
Telemetry [candidate] (6.858 ms) : 0, 6858
IAST [baseline] (17.825 ms) : 0, 17825
IAST [candidate] (17.872 ms) : 0, 17872
section iast
BytebuddyAgent [baseline] (814.099 ms) : 0, 814099
BytebuddyAgent [candidate] (802.158 ms) : 0, 802158
GlobalTracer [baseline] (292.001 ms) : 0, 292001
GlobalTracer [candidate] (289.35 ms) : 0, 289350
AppSec [baseline] (52.368 ms) : 0, 52368
AppSec [candidate] (50.946 ms) : 0, 50946
Remote Config [baseline] (601.911 µs) : 0, 602
Remote Config [candidate] (567.112 µs) : 0, 567
Telemetry [baseline] (6.777 ms) : 0, 6777
Telemetry [candidate] (6.601 ms) : 0, 6601
IAST [baseline] (22.768 ms) : 0, 22768
IAST [candidate] (22.907 ms) : 0, 22907
section profiling
BytebuddyAgent [baseline] (690.365 ms) : 0, 690365
BytebuddyAgent [candidate] (690.52 ms) : 0, 690520
GlobalTracer [baseline] (375.647 ms) : 0, 375647
GlobalTracer [candidate] (377.21 ms) : 0, 377210
AppSec [baseline] (50.23 ms) : 0, 50230
AppSec [candidate] (50.028 ms) : 0, 50028
Remote Config [baseline] (883.227 µs) : 0, 883
Remote Config [candidate] (869.611 µs) : 0, 870
Telemetry [baseline] (7.229 ms) : 0, 7229
Telemetry [candidate] (7.312 ms) : 0, 7312
ProfilingAgent [baseline] (94.444 ms) : 0, 94444
ProfilingAgent [candidate] (94.47 ms) : 0, 94470
Profiling [baseline] (94.467 ms) : 0, 94467
Profiling [candidate] (94.493 ms) : 0, 94493
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.085 s) : 0, 1084669
Total [baseline] (8.584 s) : 0, 8583645
Agent [candidate] (1.083 s) : 0, 1083161
Total [candidate] (8.611 s) : 0, 8611168
section iast
Agent [baseline] (1.219 s) : 0, 1219045
Total [baseline] (9.091 s) : 0, 9090614
Agent [candidate] (1.203 s) : 0, 1203406
Total [candidate] (9.044 s) : 0, 9044135
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.208 s) : 0, 1207786
Total [baseline] (9.024 s) : 0, 9023829
Agent [candidate] (1.214 s) : 0, 1214006
Total [candidate] (9.064 s) : 0, 9063592
section iast_TELEMETRY_OFF
Agent [baseline] (1.199 s) : 0, 1198964
Total [baseline] (9.029 s) : 0, 9029111
Agent [candidate] (1.217 s) : 0, 1216899
Total [candidate] (9.13 s) : 0, 9130150
gantt
title insecure-bank - break down per module: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (701.552 ms) : 0, 701552
BytebuddyAgent [candidate] (698.15 ms) : 0, 698150
GlobalTracer [baseline] (291.06 ms) : 0, 291060
GlobalTracer [candidate] (292.777 ms) : 0, 292777
AppSec [baseline] (48.972 ms) : 0, 48972
AppSec [candidate] (49.352 ms) : 0, 49352
Remote Config [baseline] (1.062 ms) : 0, 1062
Remote Config [candidate] (1.109 ms) : 0, 1109
Telemetry [baseline] (7.467 ms) : 0, 7467
Telemetry [candidate] (7.457 ms) : 0, 7457
section iast
BytebuddyAgent [baseline] (811.014 ms) : 0, 811014
BytebuddyAgent [candidate] (799.697 ms) : 0, 799697
GlobalTracer [baseline] (291.401 ms) : 0, 291401
GlobalTracer [candidate] (288.52 ms) : 0, 288520
AppSec [baseline] (53.225 ms) : 0, 53225
AppSec [candidate] (49.441 ms) : 0, 49441
Remote Config [baseline] (561.143 µs) : 0, 561
Remote Config [candidate] (563.859 µs) : 0, 564
Telemetry [baseline] (6.614 ms) : 0, 6614
Telemetry [candidate] (6.567 ms) : 0, 6567
IAST [baseline] (21.393 ms) : 0, 21393
IAST [candidate] (24.336 ms) : 0, 24336
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (802.705 ms) : 0, 802705
BytebuddyAgent [candidate] (806.986 ms) : 0, 806986
GlobalTracer [baseline] (289.434 ms) : 0, 289434
GlobalTracer [candidate] (290.888 ms) : 0, 290888
AppSec [baseline] (51.717 ms) : 0, 51717
AppSec [candidate] (51.3 ms) : 0, 51300
Remote Config [baseline] (577.424 µs) : 0, 577
Remote Config [candidate] (579.5 µs) : 0, 580
Telemetry [baseline] (7.461 ms) : 0, 7461
Telemetry [candidate] (7.447 ms) : 0, 7447
IAST [baseline] (21.455 ms) : 0, 21455
IAST [candidate] (22.216 ms) : 0, 22216
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (793.52 ms) : 0, 793520
BytebuddyAgent [candidate] (805.793 ms) : 0, 805793
GlobalTracer [baseline] (289.603 ms) : 0, 289603
GlobalTracer [candidate] (293.423 ms) : 0, 293423
AppSec [baseline] (48.99 ms) : 0, 48990
AppSec [candidate] (49.585 ms) : 0, 49585
Remote Config [baseline] (580.65 µs) : 0, 581
Remote Config [candidate] (591.163 µs) : 0, 591
Telemetry [baseline] (9.543 ms) : 0, 9543
Telemetry [candidate] (6.618 ms) : 0, 6618
IAST [baseline] (22.417 ms) : 0, 22417
IAST [candidate] (26.086 ms) : 0, 26086
LoadRequest duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
dateFormat X
axisFormat %s
section baseline
no_agent (1.351 ms) : 1332, 1370
. : milestone, 1351,
appsec (1.782 ms) : 1759, 1805
. : milestone, 1782,
iast (1.507 ms) : 1484, 1531
. : milestone, 1507,
profiling (1.592 ms) : 1567, 1616
. : milestone, 1592,
tracing (1.483 ms) : 1460, 1507
. : milestone, 1483,
section candidate
no_agent (1.344 ms) : 1324, 1363
. : milestone, 1344,
appsec (1.781 ms) : 1758, 1805
. : milestone, 1781,
iast (1.526 ms) : 1503, 1549
. : milestone, 1526,
profiling (1.525 ms) : 1501, 1549
. : milestone, 1525,
tracing (1.514 ms) : 1491, 1537
. : milestone, 1514,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
dateFormat X
axisFormat %s
section baseline
no_agent (369.514 µs) : 348, 391
. : milestone, 370,
iast (470.262 µs) : 450, 491
. : milestone, 470,
iast_FULL (540.268 µs) : 520, 561
. : milestone, 540,
iast_GLOBAL (496.742 µs) : 476, 518
. : milestone, 497,
iast_HARDCODED_SECRET_DISABLED (480.152 µs) : 459, 501
. : milestone, 480,
iast_INACTIVE (449.327 µs) : 428, 471
. : milestone, 449,
iast_TELEMETRY_OFF (471.035 µs) : 450, 492
. : milestone, 471,
tracing (451.131 µs) : 430, 472
. : milestone, 451,
section candidate
no_agent (369.928 µs) : 350, 390
. : milestone, 370,
iast (476.406 µs) : 456, 497
. : milestone, 476,
iast_FULL (537.262 µs) : 517, 558
. : milestone, 537,
iast_GLOBAL (499.388 µs) : 478, 521
. : milestone, 499,
iast_HARDCODED_SECRET_DISABLED (476.997 µs) : 456, 498
. : milestone, 477,
iast_INACTIVE (443.77 µs) : 424, 464
. : milestone, 444,
iast_TELEMETRY_OFF (470.575 µs) : 450, 492
. : milestone, 471,
tracing (440.828 µs) : 420, 461
. : milestone, 441,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress. thank you!
...ugger/debugger-bootstrap/src/main/java/datadog/trace/bootstrap/debugger/DebuggerContext.java
Show resolved
Hide resolved
return; | ||
} | ||
span.setTag(ERROR_DEBUG_INFO_CAPTURED, true); | ||
if (span.getTag(DD_DEBUG_ERROR_EXCEPTION_ID) != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how this can happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 instrumentations at different level are triggered:
- one at framework level: Spring
- one at http server level: tomcat
so 2 calls to addThrowable are made for the same span.
...nt/agent-debugger/src/main/java/com/datadog/debugger/exception/DefaultExceptionDebugger.java
Outdated
Show resolved
Hide resolved
...nt/agent-debugger/src/main/java/com/datadog/debugger/exception/DefaultExceptionDebugger.java
Outdated
Show resolved
Hide resolved
StackTraceElement[] innerTrace = innerMostException.getStackTrace(); | ||
int currentIdx = 0; | ||
List<Snapshot> snapshots = state.getSnapshots(); | ||
for (int i = 0; i < snapshots.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this deal with recursion functions? if we have all snapshots for all frames this would work, but if some of the frames are missing this can go out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not, this approach have limitations, but it's good enough for a first approach. it will be refined later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WE need to revisit later this naive heuristic using stack depth to perform assignment and use the class and method name to ensure we don't mess up
@@ -772,6 +772,11 @@ private ASMHelper.Type tryRetrieveField( | |||
ldc(insnList, fieldName); | |||
// stack: [target_object, string] | |||
emitReflectiveCall(insnList, returnType, OBJECT_TYPE); | |||
int sort = returnType.getMainType().getSort(); | |||
if (sort == org.objectweb.asm.Type.OBJECT || sort == org.objectweb.asm.Type.ARRAY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give me some background on what this does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The api for reflective call always returns Object
(see Field::get) so If we want to continue to retrieve sub fields we need to cast it to the underlying object.
This was moved from emitReflectiveCall
method to here because for capturing values we don't need to cast it
dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/probe/ExceptionProbe.java
Outdated
Show resolved
Hide resolved
int currentIdx = 0; | ||
List<Snapshot> snapshots = state.getSnapshots(); | ||
for (int i = 0; i < snapshots.size(); i++) { | ||
Snapshot snapshot = snapshots.get(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because each snapshot have a reference to throwable at that time. we can reduce the search into that section of frames. aka, assume we have throwable T1 -> T2 -> T3 (inner most). then all snapshots that captured T2 should be located on those frames.
elements.add(trace[i]); | ||
} | ||
if (trace.length - 1 - m > 0) { | ||
elements.add(null); // add a stack frame representing number of common frames ("... n more") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that those frames will not have a matching frame index. then we would not even commit those snapshots right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the relation. we are not using this function to decide if we need to collect snapshots or not but for mapping back frame and eventually snapshots collected to original stacktrace as seen by the span
What Does This Do
Collect snapshot generated by an exception probe frame in ThrowableState class that is generated by exception instance. Those exception instances are tracked by a WeakIdentityHashMap in ExceptionProbeManager. Key is directly the exception instance. The Weak nature of the hash map allow to remove the entry upon the exception is no longer reachable.
When unwinding, once the exception reaches the frame level where it is added to the span (DDSpan::addThrowable) we process all the collected snapshots and and add tags with frame index for each snapshot id. The frame index is computed based on the generated mapping between original exception (the one added to the span) and the inner most exception which was used to generated exception probes and snapshots.
An instrumentation bug is also fixed:
casting when calling CapturedValue.of is not necessary as the method is accepting any objects. On the other side, some return types could not be used as cast target because not visible from the current frame. inherited field with a type not visible from the inherited class.
Motivation
Exception debugging feature
Additional Notes
Jira ticket: DEBUG-2068