Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add frame tags and send snapshots on addThrowable #6824

Merged
merged 2 commits into from Mar 19, 2024

Conversation

jpbempel
Copy link
Member

@jpbempel jpbempel commented Mar 18, 2024

What Does This Do

Collect snapshot generated by an exception probe frame in ThrowableState class that is generated by exception instance. Those exception instances are tracked by a WeakIdentityHashMap in ExceptionProbeManager. Key is directly the exception instance. The Weak nature of the hash map allow to remove the entry upon the exception is no longer reachable.

When unwinding, once the exception reaches the frame level where it is added to the span (DDSpan::addThrowable) we process all the collected snapshots and and add tags with frame index for each snapshot id. The frame index is computed based on the generated mapping between original exception (the one added to the span) and the inner most exception which was used to generated exception probes and snapshots.

An instrumentation bug is also fixed:
casting when calling CapturedValue.of is not necessary as the method is accepting any objects. On the other side, some return types could not be used as cast target because not visible from the current frame. inherited field with a type not visible from the inherited class.

Motivation

Exception debugging feature

Additional Notes

Jira ticket: DEBUG-2068

Collect snapshot generated by an exception probe frame in
ThrowableState class that is generated by exception instance. Those
exception instances are tracked by a WeakIdentityHashMap in
ExceptionProbeManager. Key is directly the exception instance. The
Weak nature of the hash map allow to remove the entry upon the
exception is no longer reachable.

When unwinding, once the exception reaches the frame level where it
is added to the span (DDSpan::addThrowable) we process all the
collected snapshots and and add tags with frame index for each
snapshot id. The frame index is computed based on the generated
mapping between original exception (the one added to the span) and
the inner most exception which was used to generated exception probes
and snapshots.

An instrumentation bug is also fixed:
casting when calling CapturedValue.of is not necessary as the method
is accepting any objects. On the other side, some return types could
not be used as cast target because not visible from the current frame.
inherited field with a type not visible from the inherited class.
@jpbempel jpbempel requested review from a team as code owners March 18, 2024 14:01
@jpbempel jpbempel requested review from cimi, dougqh and am312 and removed request for a team March 18, 2024 14:01
@jpbempel jpbempel added the comp: debugger Dynamic Instrumentation label Mar 18, 2024
@pr-commenter
Copy link

pr-commenter bot commented Mar 18, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master jpbempel/exception-id
git_commit_date 1710833793 1710858590
git_commit_sha f97f632 8c50d27
release_version 1.32.0-SNAPSHOT~f97f63248a 1.32.0-SNAPSHOT~8c50d27dd4
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1710879140 1710879140
ci_job_id 463588103 463588103
ci_pipeline_id 30383506 30383506
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 12 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.087 s) : 0, 1087341
Total [baseline] (9.256 s) : 0, 9255674
Agent [candidate] (1.082 s) : 0, 1081921
Total [candidate] (9.191 s) : 0, 9190927
section appsec
Agent [baseline] (1.208 s) : 0, 1208172
Total [baseline] (9.298 s) : 0, 9298387
Agent [candidate] (1.203 s) : 0, 1202632
Total [candidate] (9.308 s) : 0, 9308005
section iast
Agent [baseline] (1.223 s) : 0, 1223473
Total [baseline] (9.409 s) : 0, 9408774
Agent [candidate] (1.207 s) : 0, 1206827
Total [candidate] (9.339 s) : 0, 9338683
section profiling
Agent [baseline] (1.275 s) : 0, 1275026
Total [baseline] (9.374 s) : 0, 9373784
Agent [candidate] (1.277 s) : 0, 1276510
Total [candidate] (9.359 s) : 0, 9358831
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.087 s -
Agent appsec 1.208 s 120.831 ms (11.1%)
Agent iast 1.223 s 136.133 ms (12.5%)
Agent profiling 1.275 s 187.686 ms (17.3%)
Total tracing 9.256 s -
Total appsec 9.298 s 42.713 ms (0.5%)
Total iast 9.409 s 153.1 ms (1.7%)
Total profiling 9.374 s 118.11 ms (1.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.082 s -
Agent appsec 1.203 s 120.711 ms (11.2%)
Agent iast 1.207 s 124.907 ms (11.5%)
Agent profiling 1.277 s 194.589 ms (18.0%)
Total tracing 9.191 s -
Total appsec 9.308 s 117.077 ms (1.3%)
Total iast 9.339 s 147.755 ms (1.6%)
Total profiling 9.359 s 167.903 ms (1.8%)
gantt
    title petclinic - break down per module: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (701.522 ms) : 0, 701522
BytebuddyAgent [candidate] (697.619 ms) : 0, 697619
GlobalTracer [baseline] (293.364 ms) : 0, 293364
GlobalTracer [candidate] (292.231 ms) : 0, 292231
AppSec [baseline] (49.464 ms) : 0, 49464
AppSec [candidate] (49.231 ms) : 0, 49231
Remote Config [baseline] (1.098 ms) : 0, 1098
Remote Config [candidate] (1.104 ms) : 0, 1104
Telemetry [baseline] (7.439 ms) : 0, 7439
Telemetry [candidate] (7.448 ms) : 0, 7448
section appsec
BytebuddyAgent [baseline] (701.316 ms) : 0, 701316
BytebuddyAgent [candidate] (697.238 ms) : 0, 697238
GlobalTracer [baseline] (293.375 ms) : 0, 293375
GlobalTracer [candidate] (292.299 ms) : 0, 292299
AppSec [baseline] (153.615 ms) : 0, 153615
AppSec [candidate] (153.508 ms) : 0, 153508
Remote Config [baseline] (608.078 µs) : 0, 608
Remote Config [candidate] (615.376 µs) : 0, 615
Telemetry [baseline] (6.839 ms) : 0, 6839
Telemetry [candidate] (6.858 ms) : 0, 6858
IAST [baseline] (17.825 ms) : 0, 17825
IAST [candidate] (17.872 ms) : 0, 17872
section iast
BytebuddyAgent [baseline] (814.099 ms) : 0, 814099
BytebuddyAgent [candidate] (802.158 ms) : 0, 802158
GlobalTracer [baseline] (292.001 ms) : 0, 292001
GlobalTracer [candidate] (289.35 ms) : 0, 289350
AppSec [baseline] (52.368 ms) : 0, 52368
AppSec [candidate] (50.946 ms) : 0, 50946
Remote Config [baseline] (601.911 µs) : 0, 602
Remote Config [candidate] (567.112 µs) : 0, 567
Telemetry [baseline] (6.777 ms) : 0, 6777
Telemetry [candidate] (6.601 ms) : 0, 6601
IAST [baseline] (22.768 ms) : 0, 22768
IAST [candidate] (22.907 ms) : 0, 22907
section profiling
BytebuddyAgent [baseline] (690.365 ms) : 0, 690365
BytebuddyAgent [candidate] (690.52 ms) : 0, 690520
GlobalTracer [baseline] (375.647 ms) : 0, 375647
GlobalTracer [candidate] (377.21 ms) : 0, 377210
AppSec [baseline] (50.23 ms) : 0, 50230
AppSec [candidate] (50.028 ms) : 0, 50028
Remote Config [baseline] (883.227 µs) : 0, 883
Remote Config [candidate] (869.611 µs) : 0, 870
Telemetry [baseline] (7.229 ms) : 0, 7229
Telemetry [candidate] (7.312 ms) : 0, 7312
ProfilingAgent [baseline] (94.444 ms) : 0, 94444
ProfilingAgent [candidate] (94.47 ms) : 0, 94470
Profiling [baseline] (94.467 ms) : 0, 94467
Profiling [candidate] (94.493 ms) : 0, 94493
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.085 s) : 0, 1084669
Total [baseline] (8.584 s) : 0, 8583645
Agent [candidate] (1.083 s) : 0, 1083161
Total [candidate] (8.611 s) : 0, 8611168
section iast
Agent [baseline] (1.219 s) : 0, 1219045
Total [baseline] (9.091 s) : 0, 9090614
Agent [candidate] (1.203 s) : 0, 1203406
Total [candidate] (9.044 s) : 0, 9044135
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.208 s) : 0, 1207786
Total [baseline] (9.024 s) : 0, 9023829
Agent [candidate] (1.214 s) : 0, 1214006
Total [candidate] (9.064 s) : 0, 9063592
section iast_TELEMETRY_OFF
Agent [baseline] (1.199 s) : 0, 1198964
Total [baseline] (9.029 s) : 0, 9029111
Agent [candidate] (1.217 s) : 0, 1216899
Total [candidate] (9.13 s) : 0, 9130150
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.085 s -
Agent iast 1.219 s 134.376 ms (12.4%)
Agent iast_HARDCODED_SECRET_DISABLED 1.208 s 123.117 ms (11.4%)
Agent iast_TELEMETRY_OFF 1.199 s 114.295 ms (10.5%)
Total tracing 8.584 s -
Total iast 9.091 s 506.969 ms (5.9%)
Total iast_HARDCODED_SECRET_DISABLED 9.024 s 440.185 ms (5.1%)
Total iast_TELEMETRY_OFF 9.029 s 445.466 ms (5.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.083 s -
Agent iast 1.203 s 120.245 ms (11.1%)
Agent iast_HARDCODED_SECRET_DISABLED 1.214 s 130.845 ms (12.1%)
Agent iast_TELEMETRY_OFF 1.217 s 133.738 ms (12.3%)
Total tracing 8.611 s -
Total iast 9.044 s 432.968 ms (5.0%)
Total iast_HARDCODED_SECRET_DISABLED 9.064 s 452.424 ms (5.3%)
Total iast_TELEMETRY_OFF 9.13 s 518.982 ms (6.0%)
gantt
    title insecure-bank - break down per module: candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (701.552 ms) : 0, 701552
BytebuddyAgent [candidate] (698.15 ms) : 0, 698150
GlobalTracer [baseline] (291.06 ms) : 0, 291060
GlobalTracer [candidate] (292.777 ms) : 0, 292777
AppSec [baseline] (48.972 ms) : 0, 48972
AppSec [candidate] (49.352 ms) : 0, 49352
Remote Config [baseline] (1.062 ms) : 0, 1062
Remote Config [candidate] (1.109 ms) : 0, 1109
Telemetry [baseline] (7.467 ms) : 0, 7467
Telemetry [candidate] (7.457 ms) : 0, 7457
section iast
BytebuddyAgent [baseline] (811.014 ms) : 0, 811014
BytebuddyAgent [candidate] (799.697 ms) : 0, 799697
GlobalTracer [baseline] (291.401 ms) : 0, 291401
GlobalTracer [candidate] (288.52 ms) : 0, 288520
AppSec [baseline] (53.225 ms) : 0, 53225
AppSec [candidate] (49.441 ms) : 0, 49441
Remote Config [baseline] (561.143 µs) : 0, 561
Remote Config [candidate] (563.859 µs) : 0, 564
Telemetry [baseline] (6.614 ms) : 0, 6614
Telemetry [candidate] (6.567 ms) : 0, 6567
IAST [baseline] (21.393 ms) : 0, 21393
IAST [candidate] (24.336 ms) : 0, 24336
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (802.705 ms) : 0, 802705
BytebuddyAgent [candidate] (806.986 ms) : 0, 806986
GlobalTracer [baseline] (289.434 ms) : 0, 289434
GlobalTracer [candidate] (290.888 ms) : 0, 290888
AppSec [baseline] (51.717 ms) : 0, 51717
AppSec [candidate] (51.3 ms) : 0, 51300
Remote Config [baseline] (577.424 µs) : 0, 577
Remote Config [candidate] (579.5 µs) : 0, 580
Telemetry [baseline] (7.461 ms) : 0, 7461
Telemetry [candidate] (7.447 ms) : 0, 7447
IAST [baseline] (21.455 ms) : 0, 21455
IAST [candidate] (22.216 ms) : 0, 22216
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (793.52 ms) : 0, 793520
BytebuddyAgent [candidate] (805.793 ms) : 0, 805793
GlobalTracer [baseline] (289.603 ms) : 0, 289603
GlobalTracer [candidate] (293.423 ms) : 0, 293423
AppSec [baseline] (48.99 ms) : 0, 48990
AppSec [candidate] (49.585 ms) : 0, 49585
Remote Config [baseline] (580.65 µs) : 0, 581
Remote Config [candidate] (591.163 µs) : 0, 591
Telemetry [baseline] (9.543 ms) : 0, 9543
Telemetry [candidate] (6.618 ms) : 0, 6618
IAST [baseline] (22.417 ms) : 0, 22417
IAST [candidate] (26.086 ms) : 0, 26086

Load

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.351 ms) : 1332, 1370
.   : milestone, 1351,
appsec (1.782 ms) : 1759, 1805
.   : milestone, 1782,
iast (1.507 ms) : 1484, 1531
.   : milestone, 1507,
profiling (1.592 ms) : 1567, 1616
.   : milestone, 1592,
tracing (1.483 ms) : 1460, 1507
.   : milestone, 1483,
section candidate
no_agent (1.344 ms) : 1324, 1363
.   : milestone, 1344,
appsec (1.781 ms) : 1758, 1805
.   : milestone, 1781,
iast (1.526 ms) : 1503, 1549
.   : milestone, 1526,
profiling (1.525 ms) : 1501, 1549
.   : milestone, 1525,
tracing (1.514 ms) : 1491, 1537
.   : milestone, 1514,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.351 ms [1.332 ms, 1.37 ms] -
appsec 1.782 ms [1.759 ms, 1.805 ms] 431.178 µs (31.9%)
iast 1.507 ms [1.484 ms, 1.531 ms] 156.241 µs (11.6%)
profiling 1.592 ms [1.567 ms, 1.616 ms] 240.556 µs (17.8%)
tracing 1.483 ms [1.46 ms, 1.507 ms] 132.104 µs (9.8%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.344 ms [1.324 ms, 1.363 ms] -
appsec 1.781 ms [1.758 ms, 1.805 ms] 437.568 µs (32.6%)
iast 1.526 ms [1.503 ms, 1.549 ms] 182.567 µs (13.6%)
profiling 1.525 ms [1.501 ms, 1.549 ms] 181.682 µs (13.5%)
tracing 1.514 ms [1.491 ms, 1.537 ms] 170.628 µs (12.7%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.32.0-SNAPSHOT~8c50d27dd4, baseline=1.32.0-SNAPSHOT~f97f63248a
    dateFormat X
    axisFormat %s
section baseline
no_agent (369.514 µs) : 348, 391
.   : milestone, 370,
iast (470.262 µs) : 450, 491
.   : milestone, 470,
iast_FULL (540.268 µs) : 520, 561
.   : milestone, 540,
iast_GLOBAL (496.742 µs) : 476, 518
.   : milestone, 497,
iast_HARDCODED_SECRET_DISABLED (480.152 µs) : 459, 501
.   : milestone, 480,
iast_INACTIVE (449.327 µs) : 428, 471
.   : milestone, 449,
iast_TELEMETRY_OFF (471.035 µs) : 450, 492
.   : milestone, 471,
tracing (451.131 µs) : 430, 472
.   : milestone, 451,
section candidate
no_agent (369.928 µs) : 350, 390
.   : milestone, 370,
iast (476.406 µs) : 456, 497
.   : milestone, 476,
iast_FULL (537.262 µs) : 517, 558
.   : milestone, 537,
iast_GLOBAL (499.388 µs) : 478, 521
.   : milestone, 499,
iast_HARDCODED_SECRET_DISABLED (476.997 µs) : 456, 498
.   : milestone, 477,
iast_INACTIVE (443.77 µs) : 424, 464
.   : milestone, 444,
iast_TELEMETRY_OFF (470.575 µs) : 450, 492
.   : milestone, 471,
tracing (440.828 µs) : 420, 461
.   : milestone, 441,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 369.514 µs [348.363 µs, 390.664 µs] -
iast 470.262 µs [449.731 µs, 490.793 µs] 100.748 µs (27.3%)
iast_FULL 540.268 µs [519.528 µs, 561.009 µs] 170.755 µs (46.2%)
iast_GLOBAL 496.742 µs [475.662 µs, 517.822 µs] 127.229 µs (34.4%)
iast_HARDCODED_SECRET_DISABLED 480.152 µs [459.286 µs, 501.018 µs] 110.638 µs (29.9%)
iast_INACTIVE 449.327 µs [428.047 µs, 470.607 µs] 79.813 µs (21.6%)
iast_TELEMETRY_OFF 471.035 µs [450.114 µs, 491.956 µs] 101.522 µs (27.5%)
tracing 451.131 µs [430.093 µs, 472.169 µs] 81.617 µs (22.1%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 369.928 µs [349.571 µs, 390.285 µs] -
iast 476.406 µs [455.822 µs, 496.99 µs] 106.478 µs (28.8%)
iast_FULL 537.262 µs [516.652 µs, 557.873 µs] 167.334 µs (45.2%)
iast_GLOBAL 499.388 µs [477.555 µs, 521.221 µs] 129.46 µs (35.0%)
iast_HARDCODED_SECRET_DISABLED 476.997 µs [456.14 µs, 497.854 µs] 107.068 µs (28.9%)
iast_INACTIVE 443.77 µs [423.672 µs, 463.867 µs] 73.841 µs (20.0%)
iast_TELEMETRY_OFF 470.575 µs [449.587 µs, 491.564 µs] 100.647 µs (27.2%)
tracing 440.828 µs [420.282 µs, 461.374 µs] 70.9 µs (19.2%)

Copy link
Contributor

@shatzi shatzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress. thank you!

return;
}
span.setTag(ERROR_DEBUG_INFO_CAPTURED, true);
if (span.getTag(DD_DEBUG_ERROR_EXCEPTION_ID) != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this can happen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 instrumentations at different level are triggered:

  • one at framework level: Spring
  • one at http server level: tomcat
    so 2 calls to addThrowable are made for the same span.

StackTraceElement[] innerTrace = innerMostException.getStackTrace();
int currentIdx = 0;
List<Snapshot> snapshots = state.getSnapshots();
for (int i = 0; i < snapshots.size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this deal with recursion functions? if we have all snapshots for all frames this would work, but if some of the frames are missing this can go out of sync.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, this approach have limitations, but it's good enough for a first approach. it will be refined later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WE need to revisit later this naive heuristic using stack depth to perform assignment and use the class and method name to ensure we don't mess up

@@ -772,6 +772,11 @@ private ASMHelper.Type tryRetrieveField(
ldc(insnList, fieldName);
// stack: [target_object, string]
emitReflectiveCall(insnList, returnType, OBJECT_TYPE);
int sort = returnType.getMainType().getSort();
if (sort == org.objectweb.asm.Type.OBJECT || sort == org.objectweb.asm.Type.ARRAY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give me some background on what this does?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The api for reflective call always returns Object (see Field::get) so If we want to continue to retrieve sub fields we need to cast it to the underlying object.
This was moved from emitReflectiveCall method to here because for capturing values we don't need to cast it

int currentIdx = 0;
List<Snapshot> snapshots = state.getSnapshots();
for (int i = 0; i < snapshots.size(); i++) {
Snapshot snapshot = snapshots.get(i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because each snapshot have a reference to throwable at that time. we can reduce the search into that section of frames. aka, assume we have throwable T1 -> T2 -> T3 (inner most). then all snapshots that captured T2 should be located on those frames.

elements.add(trace[i]);
}
if (trace.length - 1 - m > 0) {
elements.add(null); // add a stack frame representing number of common frames ("... n more")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that those frames will not have a matching frame index. then we would not even commit those snapshots right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the relation. we are not using this function to decide if we need to collect snapshots or not but for mapping back frame and eventually snapshots collected to original stacktrace as seen by the span

@jpbempel jpbempel merged commit 97283c4 into master Mar 19, 2024
80 checks passed
@jpbempel jpbempel deleted the jpbempel/exception-id branch March 19, 2024 20:15
@github-actions github-actions bot added this to the 1.32.0 milestone Mar 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: debugger Dynamic Instrumentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants