Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instrumentation of exception stacktraces #6609

Merged
merged 2 commits into from
Feb 6, 2024

Conversation

jpbempel
Copy link
Member

@jpbempel jpbempel commented Feb 2, 2024

What Does This Do

When instrumenting exception on first occurence, we get the stacktrace of the exception and for each frame we will generate a special method probe (ExceptionProbe) based on LogProbe doing a snapshot at Exit. to place correctly the probes we are basing on StackTrace element that gives class, method and line number.
for overloaded methods (with same name) the only way to disambiguate which method to instrument is to use the line number. We introduce then the ClassFileLines class to generalize the LineNumber information extracted from the parsing of a classfile by the ASM library and be reused for line instrumentation and method matching. previous LineMap class is then replaced by it.
Created exception probe are kept by ExceptionProbeManager and must be merged with the Configuration coming from Remote Config. Therefore the ConfigurationUpdater class must now take into account that and also be able to be triggered not only by the Remote Config thread but also the thread coming from the exception captured added to the span. Applying configuration require now to be atomic between Configuration object, new transformer and retransformation of the classes impacted.

Motivation

Exception debugging feature

Additional Notes

Jira ticket: DEBUG-2068

When instrumenting exception on first occurence, we get the stacktrace
of the exception and for each frame we will generate a special method
probe (ExceptionProbe) based on LogProbe doing a snapshot at Exit.
to place correctly the probes we are basing on StackTrace element that
gives class, method and line number.
for overloaded methods (with same name) the only way to disambiguate
which method to instrument is to use the line number. We introduce
then the ClassFileLines class to generalize the LineNumber information
extracted from the parsing of a classfile by the ASM library and be
reused for line instrumentation and method matching. previous LineMap
class is then replaced by it.
Created exception probe are kept by ExceptionProbeManager and must be
merged with the Configuration coming from Remote Config. Therefore
the ConfigurationUpdater class must now take into account that and
also be able to be triggered not only by the Remote Config thread
but also the thread coming from the exception captured added to the
span. Applying configuration require now to be atomic between
Configuration object, new transformer and retransformation of the
classes impacted.
@jpbempel jpbempel requested a review from a team as a code owner February 2, 2024 14:35
@jpbempel jpbempel requested review from ojung and removed request for a team February 2, 2024 14:35
@pr-commenter
Copy link

pr-commenter bot commented Feb 2, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master jpbempel/instrument-stacktrace
git_commit_date 1707125641 1707125961
git_commit_sha c34fdea adeb199
release_version 1.29.0-SNAPSHOT~c34fdeaac9 1.29.0-SNAPSHOT~adeb199dd6
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1707128851 1707128851
ci_job_id 425614750 425614750
ci_pipeline_id 27681263 27681263
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 8 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.29.0-SNAPSHOT~adeb199dd6, baseline=1.29.0-SNAPSHOT~c34fdeaac9

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1056992
Total [baseline] (9.318 s) : 0, 9318123
Agent [candidate] (1.072 s) : 0, 1071640
Total [candidate] (9.359 s) : 0, 9359328
section appsec
Agent [baseline] (1.159 s) : 0, 1158717
Total [baseline] (9.47 s) : 0, 9469639
Agent [candidate] (1.155 s) : 0, 1155182
Total [candidate] (9.449 s) : 0, 9449081
section iast
Agent [baseline] (1.18 s) : 0, 1180364
Total [baseline] (9.678 s) : 0, 9678489
Agent [candidate] (1.184 s) : 0, 1183607
Total [candidate] (9.647 s) : 0, 9647140
section profiling
Agent [baseline] (1.277 s) : 0, 1277080
Total [baseline] (9.531 s) : 0, 9531009
Agent [candidate] (1.277 s) : 0, 1277316
Total [candidate] (9.565 s) : 0, 9565381
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.057 s -
Agent appsec 1.159 s 101.724 ms (9.6%)
Agent iast 1.18 s 123.372 ms (11.7%)
Agent profiling 1.277 s 220.087 ms (20.8%)
Total tracing 9.318 s -
Total appsec 9.47 s 151.517 ms (1.6%)
Total iast 9.678 s 360.366 ms (3.9%)
Total profiling 9.531 s 212.886 ms (2.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.072 s -
Agent appsec 1.155 s 83.541 ms (7.8%)
Agent iast 1.184 s 111.967 ms (10.4%)
Agent profiling 1.277 s 205.675 ms (19.2%)
Total tracing 9.359 s -
Total appsec 9.449 s 89.753 ms (1.0%)
Total iast 9.647 s 287.812 ms (3.1%)
Total profiling 9.565 s 206.053 ms (2.2%)
gantt
    title petclinic - break down per module: candidate=1.29.0-SNAPSHOT~adeb199dd6, baseline=1.29.0-SNAPSHOT~c34fdeaac9

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (666.891 ms) : 0, 666891
BytebuddyAgent [candidate] (675.905 ms) : 0, 675905
GlobalTracer [baseline] (295.836 ms) : 0, 295836
GlobalTracer [candidate] (300.152 ms) : 0, 300152
AppSec [baseline] (51.732 ms) : 0, 51732
AppSec [candidate] (52.638 ms) : 0, 52638
Remote Config [baseline] (686.264 µs) : 0, 686
Remote Config [candidate] (702.684 µs) : 0, 703
Telemetry [baseline] (7.519 ms) : 0, 7519
Telemetry [candidate] (7.544 ms) : 0, 7544
section appsec
BytebuddyAgent [baseline] (670.621 ms) : 0, 670621
BytebuddyAgent [candidate] (666.886 ms) : 0, 666886
GlobalTracer [baseline] (296.435 ms) : 0, 296435
GlobalTracer [candidate] (296.487 ms) : 0, 296487
AppSec [baseline] (149.83 ms) : 0, 149830
AppSec [candidate] (150.237 ms) : 0, 150237
Remote Config [baseline] (651.866 µs) : 0, 652
Remote Config [candidate] (651.329 µs) : 0, 651
Telemetry [baseline] (6.747 ms) : 0, 6747
Telemetry [candidate] (6.709 ms) : 0, 6709
section iast
BytebuddyAgent [baseline] (777.482 ms) : 0, 777482
BytebuddyAgent [candidate] (777.953 ms) : 0, 777953
GlobalTracer [baseline] (287.67 ms) : 0, 287670
GlobalTracer [candidate] (288.499 ms) : 0, 288499
AppSec [baseline] (53.862 ms) : 0, 53862
AppSec [candidate] (53.135 ms) : 0, 53135
IAST [baseline] (20.053 ms) : 0, 20053
IAST [candidate] (22.504 ms) : 0, 22504
Remote Config [baseline] (603.272 µs) : 0, 603
Remote Config [candidate] (619.723 µs) : 0, 620
Telemetry [baseline] (6.507 ms) : 0, 6507
Telemetry [candidate] (6.627 ms) : 0, 6627
section profiling
BytebuddyAgent [baseline] (665.485 ms) : 0, 665485
BytebuddyAgent [candidate] (667.285 ms) : 0, 667285
GlobalTracer [baseline] (381.532 ms) : 0, 381532
GlobalTracer [candidate] (381.27 ms) : 0, 381270
AppSec [baseline] (52.132 ms) : 0, 52132
AppSec [candidate] (52.106 ms) : 0, 52106
Remote Config [baseline] (662.376 µs) : 0, 662
Remote Config [candidate] (658.178 µs) : 0, 658
Telemetry [baseline] (8.174 ms) : 0, 8174
Telemetry [candidate] (7.399 ms) : 0, 7399
ProfilingAgent [baseline] (114.538 ms) : 0, 114538
ProfilingAgent [candidate] (114.034 ms) : 0, 114034
Profiling [baseline] (114.563 ms) : 0, 114563
Profiling [candidate] (114.058 ms) : 0, 114058
Loading

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2024-02-05T10:03:35 2024-02-05T10:22:28
git_branch master jpbempel/instrument-stacktrace
git_commit_date 1707125641 1707125961
git_commit_sha c34fdea adeb199
release_version 1.29.0-SNAPSHOT~c34fdeaac9 1.29.0-SNAPSHOT~adeb199dd6
start_time 2024-02-05T10:03:22 2024-02-05T10:22:15
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1707128851 1707128851
ci_job_id 425614750 425614750
ci_pipeline_id 27681263 27681263
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant iast iast

Summary

Found 0 performance improvements and 1 performance regressions! Performance is the same for 9 metrics, 16 unstable metrics.

scenario Δ mean http_req_duration Δ mean throughput candidate mean http_req_duration candidate mean throughput baseline mean http_req_duration baseline mean throughput
scenario:load:petclinic:profiling worse
[+55.542µs; +110.437µs] or [+3.704%; +7.364%]
unstable
[-677.857op/s; +449.937op/s] or [-22.030%; +14.623%]
1.583ms 2962.963op/s 1.500ms 3076.923op/s
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.29.0-SNAPSHOT~adeb199dd6, baseline=1.29.0-SNAPSHOT~c34fdeaac9
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.355 ms) : 1336, 1374
.   : milestone, 1355,
appsec (1.767 ms) : 1741, 1793
.   : milestone, 1767,
iast (1.521 ms) : 1497, 1545
.   : milestone, 1521,
profiling (1.5 ms) : 1475, 1525
.   : milestone, 1500,
tracing (1.489 ms) : 1465, 1514
.   : milestone, 1489,
section candidate
no_agent (1.347 ms) : 1328, 1366
.   : milestone, 1347,
appsec (1.751 ms) : 1726, 1776
.   : milestone, 1751,
iast (1.519 ms) : 1495, 1544
.   : milestone, 1519,
profiling (1.583 ms) : 1557, 1609
.   : milestone, 1583,
tracing (1.516 ms) : 1490, 1542
.   : milestone, 1516,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.355 ms [1.336 ms, 1.374 ms] -
appsec 1.767 ms [1.741 ms, 1.793 ms] 411.984 µs (30.4%)
iast 1.521 ms [1.497 ms, 1.545 ms] 165.909 µs (12.2%)
profiling 1.5 ms [1.475 ms, 1.525 ms] 144.595 µs (10.7%)
tracing 1.489 ms [1.465 ms, 1.514 ms] 134.325 µs (9.9%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.347 ms [1.328 ms, 1.366 ms] -
appsec 1.751 ms [1.726 ms, 1.776 ms] 403.968 µs (30.0%)
iast 1.519 ms [1.495 ms, 1.544 ms] 172.704 µs (12.8%)
profiling 1.583 ms [1.557 ms, 1.609 ms] 236.013 µs (17.5%)
tracing 1.516 ms [1.49 ms, 1.542 ms] 169.287 µs (12.6%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.29.0-SNAPSHOT~adeb199dd6, baseline=1.29.0-SNAPSHOT~c34fdeaac9
    dateFormat X
    axisFormat %s
section baseline
no_agent (371.467 µs) : 350, 393
.   : milestone, 371,
iast (470.831 µs) : 450, 492
.   : milestone, 471,
iast_FULL (537.052 µs) : 516, 558
.   : milestone, 537,
iast_GLOBAL (501.516 µs) : 481, 522
.   : milestone, 502,
iast_HARDCODED_SECRET_DISABLED (466.863 µs) : 447, 487
.   : milestone, 467,
iast_INACTIVE (451.116 µs) : 429, 473
.   : milestone, 451,
iast_TELEMETRY_OFF (470.591 µs) : 450, 491
.   : milestone, 471,
tracing (442.822 µs) : 422, 464
.   : milestone, 443,
section candidate
no_agent (367.605 µs) : 347, 388
.   : milestone, 368,
iast (475.848 µs) : 455, 497
.   : milestone, 476,
iast_FULL (528.997 µs) : 508, 550
.   : milestone, 529,
iast_GLOBAL (510.289 µs) : 488, 533
.   : milestone, 510,
iast_HARDCODED_SECRET_DISABLED (469.045 µs) : 449, 489
.   : milestone, 469,
iast_INACTIVE (438.336 µs) : 418, 459
.   : milestone, 438,
iast_TELEMETRY_OFF (475.011 µs) : 454, 496
.   : milestone, 475,
tracing (434.421 µs) : 414, 455
.   : milestone, 434,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 371.467 µs [349.579 µs, 393.355 µs] -
iast 470.831 µs [450.036 µs, 491.626 µs] 99.364 µs (26.7%)
iast_FULL 537.052 µs [516.075 µs, 558.028 µs] 165.584 µs (44.6%)
iast_GLOBAL 501.516 µs [480.554 µs, 522.479 µs] 130.049 µs (35.0%)
iast_HARDCODED_SECRET_DISABLED 466.863 µs [446.525 µs, 487.2 µs] 95.395 µs (25.7%)
iast_INACTIVE 451.116 µs [429.08 µs, 473.151 µs] 79.648 µs (21.4%)
iast_TELEMETRY_OFF 470.591 µs [449.742 µs, 491.44 µs] 99.124 µs (26.7%)
tracing 442.822 µs [421.856 µs, 463.787 µs] 71.355 µs (19.2%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 367.605 µs [347.275 µs, 387.935 µs] -
iast 475.848 µs [454.963 µs, 496.733 µs] 108.243 µs (29.4%)
iast_FULL 528.997 µs [508.294 µs, 549.699 µs] 161.391 µs (43.9%)
iast_GLOBAL 510.289 µs [487.871 µs, 532.708 µs] 142.684 µs (38.8%)
iast_HARDCODED_SECRET_DISABLED 469.045 µs [448.706 µs, 489.384 µs] 101.44 µs (27.6%)
iast_INACTIVE 438.336 µs [418.03 µs, 458.641 µs] 70.731 µs (19.2%)
iast_TELEMETRY_OFF 475.011 µs [453.989 µs, 496.034 µs] 107.406 µs (29.2%)
tracing 434.421 µs [414.139 µs, 454.703 µs] 66.816 µs (18.2%)

Copy link
Contributor

@shatzi shatzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving on comment until we fix the hot-path issue.

this.instrumentation = instrumentation;
this.transformerSupplier = transformerSupplier;
this.serviceName = TagsHelper.sanitize(config.getServiceName());
this.sink = sink;
this.finder = finder;
this.exceptionProbeManager = exceptionProbeManager;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we make configuration updater know about source of the configuration.
I would say lets try to keep this extendable more easily.

One way to think about it is having

 accept(source: string, configuration) {
   configPerSource[source] = configruation;
   newConfiguration = buildConfigurationFromAllSourcesAndApplyFilters();
   applyNewConfiguration(newConfiguration);
}

Then we simply can add more sources like span-origin, local-config etc...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the idea, but let's have another source before make an abstraction out of it.

@@ -27,6 +31,7 @@ public void handleException(Throwable t) {
// TODO trigger send snapshots already captured
} else {
exceptionProbeManager.instrument(fingerprint, t.getStackTrace());
configurationUpdater.reapplyCurrentConfig();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like we talked. I think that because "reapplyCurrentConfig" is blocking it should not be called inside the hot path of span.attachError.

Maybe this can stay here is configurationUpdater would reapply the configuration async.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep will do later

if (changes.hasProbeRelatedChanges()) {
LOGGER.info("Applying new probe configuration, changes: {}", changes);
handleProbesChanges(changes);
configurationLock.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one things we can get benefits from making is async is "batching".
assume we got multiple exception triggered and we calling applyConfiguration 3 time each time with few more probes.
This code can check if we are currently applying configuration, and if we do, just mark that the latestConfig needed to be applied. this could happen multiple times if retransformation taking along time, and every time it happen we basically skipping applying a old configuration that will no longer relevant or if lock is not ordered we could be in a case where we apply a configuration that is not the latest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes with async and a queue we could conflate to discard intermediate configs

@jpbempel jpbempel added comp: debugger Dynamic Instrumentation type: enhancement labels Feb 5, 2024
Copy link
Contributor

@shatzi shatzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@jpbempel jpbempel merged commit 07d0733 into master Feb 6, 2024
79 checks passed
@jpbempel jpbempel deleted the jpbempel/instrument-stacktrace branch February 6, 2024 06:36
@github-actions github-actions bot added this to the 1.30.0 milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants