Enforce new ingestion reason for spark traces #6310

Merged
paul-laffon-dd merged 3 commits into master from paul.laffon/spark-spans-sampling
Dec 11, 2023

Contributor

@paul-laffon-dd paul-laffon-dd commented Dec 4, 2023

What Does This Do

Enforces a new ingestion reason for Spark traces.

Motivation

It is critical to keep all Spark traces, because customers closely monitor job runs. The new ingestion reason will allow ingested bytes to be tracked for billing. An analysis of ingested bytes can be found here

Additional Notes

Added the method `AgentSpan setSamplingPriority(final int newPriority, int samplingMechanism)` to the `AgentSpan` interface so that it can be called from an instrumentation.
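
As a rough illustration of how an instrumentation might use the two-argument setter, here is a minimal, self-contained sketch. The `AgentSpan` interface below is a stripped-down stand-in rather than the real one, `RecordingSpan` is hypothetical, and both the `USER_KEEP = 2` value and the body of `setDataJobsSamplingPriority` are assumptions: the PR only shows the helper's name and `DATA_JOBS = 10`.

```java
/** Illustrative stand-ins only; these are not the real dd-trace-java classes. */
final class SamplingSketch {
  // Assumed priority value for "user keep"; the PR text only names USER_KEEP.
  static final int USER_KEEP = 2;
  // Mechanism value taken from the SamplingMechanism diff in this PR.
  static final int DATA_JOBS = 10;

  /** Stripped-down stand-in exposing only the method this PR adds. */
  interface AgentSpan {
    AgentSpan setSamplingPriority(int newPriority, int samplingMechanism);
  }

  /** Hypothetical span that simply records the values it was given. */
  static final class RecordingSpan implements AgentSpan {
    int priority = Integer.MIN_VALUE; // "unset" marker used by this sketch
    int mechanism = -1;               // "unset" marker used by this sketch

    @Override
    public AgentSpan setSamplingPriority(int newPriority, int samplingMechanism) {
      this.priority = newPriority;
      this.mechanism = samplingMechanism;
      return this;
    }
  }

  /** Assumed shape of the helper: force-keep the span and tag the mechanism. */
  static AgentSpan setDataJobsSamplingPriority(AgentSpan span) {
    return span.setSamplingPriority(USER_KEEP, DATA_JOBS);
  }

  private SamplingSketch() {}
}
```

Passing the mechanism alongside the priority is what lets the backend attribute the kept trace to a distinct ingestion reason, rather than to a generic manual keep.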


pr-commenter bot commented Dec 4, 2023

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | paul.laffon/spark-spans-sampling |
| git_commit_date | 1702298139 | 1702300388 |
| git_commit_sha | 00358aa | d49aa5f |
| release_version | 1.26.0-SNAPSHOT~00358aaa1a | 1.25.0-SNAPSHOT~d49aa5febf |
See matching parameters
| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1702303072 | 1702303072 |
| ci_job_id | 386462116 | 386462116 |
| ci_pipeline_id | 24783057 | 24783057 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent | Agent |
| parent | None | None |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 9 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.25.0-SNAPSHOT~d49aa5febf, baseline=1.26.0-SNAPSHOT~00358aaa1a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.048 s) : 0, 1047564
Total [baseline] (8.769 s) : 0, 8769452
Agent [candidate] (1.043 s) : 0, 1042871
Total [candidate] (8.7 s) : 0, 8700152
section iast
Agent [baseline] (1.159 s) : 0, 1158729
Total [baseline] (9.236 s) : 0, 9235529
Agent [candidate] (1.168 s) : 0, 1168485
Total [candidate] (9.259 s) : 0, 9259340
section iast_TELEMETRY_OFF
Agent [baseline] (1.153 s) : 0, 1152941
Total [baseline] (9.226 s) : 0, 9225770
Agent [candidate] (1.151 s) : 0, 1150789
Total [candidate] (9.287 s) : 0, 9287172
  • baseline results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.048 s | - |
| Agent | iast | 1.159 s | 111.165 ms (10.6%) |
| Agent | iast_TELEMETRY_OFF | 1.153 s | 105.377 ms (10.1%) |
| Total | tracing | 8.769 s | - |
| Total | iast | 9.236 s | 466.077 ms (5.3%) |
| Total | iast_TELEMETRY_OFF | 9.226 s | 456.318 ms (5.2%) |
  • candidate results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.043 s | - |
| Agent | iast | 1.168 s | 125.614 ms (12.0%) |
| Agent | iast_TELEMETRY_OFF | 1.151 s | 107.918 ms (10.3%) |
| Total | tracing | 8.7 s | - |
| Total | iast | 9.259 s | 559.188 ms (6.4%) |
| Total | iast_TELEMETRY_OFF | 9.287 s | 587.02 ms (6.7%) |
gantt
    title insecure-bank - break down per module: candidate=1.25.0-SNAPSHOT~d49aa5febf, baseline=1.26.0-SNAPSHOT~00358aaa1a

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (648.91 ms) : 0, 648910
BytebuddyAgent [candidate] (645.617 ms) : 0, 645617
GlobalTracer [baseline] (307.309 ms) : 0, 307309
GlobalTracer [candidate] (306.749 ms) : 0, 306749
AppSec [baseline] (48.887 ms) : 0, 48887
AppSec [candidate] (48.53 ms) : 0, 48530
Remote Config [baseline] (673.088 µs) : 0, 673
Remote Config [candidate] (664.934 µs) : 0, 665
Telemetry [baseline] (7.118 ms) : 0, 7118
Telemetry [candidate] (7.046 ms) : 0, 7046
section iast
BytebuddyAgent [baseline] (764.871 ms) : 0, 764871
BytebuddyAgent [candidate] (771.7 ms) : 0, 771700
GlobalTracer [baseline] (284.29 ms) : 0, 284290
GlobalTracer [candidate] (287.312 ms) : 0, 287312
AppSec [baseline] (46.373 ms) : 0, 46373
AppSec [candidate] (46.643 ms) : 0, 46643
IAST [baseline] (19.541 ms) : 0, 19541
IAST [candidate] (20.979 ms) : 0, 20979
Remote Config [baseline] (624.158 µs) : 0, 624
Remote Config [candidate] (602.082 µs) : 0, 602
Telemetry [baseline] (8.774 ms) : 0, 8774
Telemetry [candidate] (6.636 ms) : 0, 6636
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (759.226 ms) : 0, 759226
BytebuddyAgent [candidate] (758.058 ms) : 0, 758058
GlobalTracer [baseline] (285.891 ms) : 0, 285891
GlobalTracer [candidate] (285.528 ms) : 0, 285528
AppSec [baseline] (47.918 ms) : 0, 47918
AppSec [candidate] (46.276 ms) : 0, 46276
IAST [baseline] (18.579 ms) : 0, 18579
IAST [candidate] (17.395 ms) : 0, 17395
Remote Config [baseline] (599.66 µs) : 0, 600
Remote Config [candidate] (2.146 ms) : 0, 2146
Telemetry [baseline] (6.524 ms) : 0, 6524
Telemetry [candidate] (7.23 ms) : 0, 7230
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.25.0-SNAPSHOT~d49aa5febf, baseline=1.26.0-SNAPSHOT~00358aaa1a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.044 s) : 0, 1043800
Total [baseline] (9.332 s) : 0, 9331804
Agent [candidate] (1.055 s) : 0, 1054909
Total [candidate] (9.345 s) : 0, 9345464
section appsec
Agent [baseline] (1.137 s) : 0, 1136776
Total [baseline] (9.417 s) : 0, 9416949
Agent [candidate] (1.135 s) : 0, 1135456
Total [candidate] (9.396 s) : 0, 9396371
section iast
Agent [baseline] (1.162 s) : 0, 1161601
Total [baseline] (9.578 s) : 0, 9578485
Agent [candidate] (1.161 s) : 0, 1160540
Total [candidate] (9.632 s) : 0, 9631804
section profiling
Agent [baseline] (1.232 s) : 0, 1231785
Total [baseline] (9.625 s) : 0, 9624652
Agent [candidate] (1.233 s) : 0, 1233454
Total [candidate] (9.623 s) : 0, 9622534
  • baseline results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.044 s | - |
| Agent | appsec | 1.137 s | 92.976 ms (8.9%) |
| Agent | iast | 1.162 s | 117.801 ms (11.3%) |
| Agent | profiling | 1.232 s | 187.985 ms (18.0%) |
| Total | tracing | 9.332 s | - |
| Total | appsec | 9.417 s | 85.145 ms (0.9%) |
| Total | iast | 9.578 s | 246.681 ms (2.6%) |
| Total | profiling | 9.625 s | 292.847 ms (3.1%) |
  • candidate results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.055 s | - |
| Agent | appsec | 1.135 s | 80.546 ms (7.6%) |
| Agent | iast | 1.161 s | 105.631 ms (10.0%) |
| Agent | profiling | 1.233 s | 178.544 ms (16.9%) |
| Total | tracing | 9.345 s | - |
| Total | appsec | 9.396 s | 50.907 ms (0.5%) |
| Total | iast | 9.632 s | 286.34 ms (3.1%) |
| Total | profiling | 9.623 s | 277.07 ms (3.0%) |
gantt
    title petclinic - break down per module: candidate=1.25.0-SNAPSHOT~d49aa5febf, baseline=1.26.0-SNAPSHOT~00358aaa1a

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (645.645 ms) : 0, 645645
BytebuddyAgent [candidate] (653.29 ms) : 0, 653290
GlobalTracer [baseline] (307.008 ms) : 0, 307008
GlobalTracer [candidate] (310.017 ms) : 0, 310017
AppSec [baseline] (49.017 ms) : 0, 49017
AppSec [candidate] (49.066 ms) : 0, 49066
Remote Config [baseline] (672.815 µs) : 0, 673
Remote Config [candidate] (678.225 µs) : 0, 678
Telemetry [baseline] (7.127 ms) : 0, 7127
Telemetry [candidate] (7.148 ms) : 0, 7148
section appsec
BytebuddyAgent [baseline] (649.753 ms) : 0, 649753
BytebuddyAgent [candidate] (647.665 ms) : 0, 647665
GlobalTracer [baseline] (308.323 ms) : 0, 308323
GlobalTracer [candidate] (307.725 ms) : 0, 307725
AppSec [baseline] (136.819 ms) : 0, 136819
AppSec [candidate] (136.955 ms) : 0, 136955
Remote Config [baseline] (644.132 µs) : 0, 644
Remote Config [candidate] (645.273 µs) : 0, 645
Telemetry [baseline] (6.818 ms) : 0, 6818
Telemetry [candidate] (8.094 ms) : 0, 8094
section iast
BytebuddyAgent [baseline] (767.231 ms) : 0, 767231
BytebuddyAgent [candidate] (765.943 ms) : 0, 765943
GlobalTracer [baseline] (284.744 ms) : 0, 284744
GlobalTracer [candidate] (284.877 ms) : 0, 284877
AppSec [baseline] (46.278 ms) : 0, 46278
AppSec [candidate] (46.721 ms) : 0, 46721
Remote Config [baseline] (612.558 µs) : 0, 613
Remote Config [candidate] (633.018 µs) : 0, 633
Telemetry [baseline] (6.577 ms) : 0, 6577
Telemetry [candidate] (7.97 ms) : 0, 7970
IAST [baseline] (21.764 ms) : 0, 21764
IAST [candidate] (20.09 ms) : 0, 20090
section profiling
BytebuddyAgent [baseline] (655.614 ms) : 0, 655614
BytebuddyAgent [candidate] (656.626 ms) : 0, 656626
GlobalTracer [baseline] (377.421 ms) : 0, 377421
GlobalTracer [candidate] (377.248 ms) : 0, 377248
AppSec [baseline] (48.54 ms) : 0, 48540
AppSec [candidate] (48.396 ms) : 0, 48396
Remote Config [baseline] (674.801 µs) : 0, 675
Remote Config [candidate] (688.872 µs) : 0, 689
Telemetry [baseline] (7.355 ms) : 0, 7355
Telemetry [candidate] (7.384 ms) : 0, 7384
ProfilingAgent [baseline] (88.144 ms) : 0, 88144
ProfilingAgent [candidate] (89.022 ms) : 0, 89022
Profiling [baseline] (88.169 ms) : 0, 88169
Profiling [candidate] (89.047 ms) : 0, 89047

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| end_time | 2023-12-11T13:37:05 | 2023-12-11T13:53:39 |
| git_branch | master | paul.laffon/spark-spans-sampling |
| git_commit_date | 1702298139 | 1702300388 |
| git_commit_sha | 00358aa | d49aa5f |
| release_version | 1.26.0-SNAPSHOT~00358aaa1a | 1.25.0-SNAPSHOT~d49aa5febf |
| start_time | 2023-12-11T13:36:52 | 2023-12-11T13:53:26 |
See matching parameters
| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1702303072 | 1702303072 |
| ci_job_id | 386462116 | 386462116 |
| ci_pipeline_id | 24783057 | 24783057 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 13 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.25.0-SNAPSHOT~d49aa5febf, baseline=1.26.0-SNAPSHOT~00358aaa1a
    dateFormat X
    axisFormat %s
section baseline
no_agent (366.156 µs) : 345, 387
.   : milestone, 366,
iast (468.718 µs) : 448, 489
.   : milestone, 469,
iast_FULL (541.311 µs) : 521, 562
.   : milestone, 541,
iast_INACTIVE (450.941 µs) : 430, 472
.   : milestone, 451,
iast_TELEMETRY_OFF (467.967 µs) : 447, 489
.   : milestone, 468,
tracing (439.36 µs) : 419, 460
.   : milestone, 439,
section candidate
no_agent (368.685 µs) : 348, 389
.   : milestone, 369,
iast (471.404 µs) : 451, 492
.   : milestone, 471,
iast_FULL (537.305 µs) : 517, 558
.   : milestone, 537,
iast_INACTIVE (442.738 µs) : 422, 463
.   : milestone, 443,
iast_TELEMETRY_OFF (467.134 µs) : 446, 488
.   : milestone, 467,
tracing (446.845 µs) : 427, 467
.   : milestone, 447,
  • baseline results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 366.156 µs [345.072 µs, 387.24 µs] | - |
| iast | 468.718 µs [448.372 µs, 489.063 µs] | 102.562 µs (28.0%) |
| iast_FULL | 541.311 µs [520.95 µs, 561.673 µs] | 175.156 µs (47.8%) |
| iast_INACTIVE | 450.941 µs [429.924 µs, 471.958 µs] | 84.785 µs (23.2%) |
| iast_TELEMETRY_OFF | 467.967 µs [446.928 µs, 489.006 µs] | 101.811 µs (27.8%) |
| tracing | 439.36 µs [418.543 µs, 460.177 µs] | 73.204 µs (20.0%) |
  • candidate results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 368.685 µs [348.328 µs, 389.041 µs] | - |
| iast | 471.404 µs [450.896 µs, 491.912 µs] | 102.719 µs (27.9%) |
| iast_FULL | 537.305 µs [516.857 µs, 557.753 µs] | 168.62 µs (45.7%) |
| iast_INACTIVE | 442.738 µs [422.279 µs, 463.197 µs] | 74.053 µs (20.1%) |
| iast_TELEMETRY_OFF | 467.134 µs [446.209 µs, 488.059 µs] | 98.45 µs (26.7%) |
| tracing | 446.845 µs [426.529 µs, 467.161 µs] | 78.161 µs (21.2%) |
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.25.0-SNAPSHOT~d49aa5febf, baseline=1.26.0-SNAPSHOT~00358aaa1a
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.362 ms) : 1343, 1382
.   : milestone, 1362,
appsec (1.764 ms) : 1739, 1789
.   : milestone, 1764,
iast (1.531 ms) : 1507, 1555
.   : milestone, 1531,
profiling (1.537 ms) : 1511, 1564
.   : milestone, 1537,
tracing (1.496 ms) : 1472, 1520
.   : milestone, 1496,
section candidate
no_agent (1.357 ms) : 1338, 1376
.   : milestone, 1357,
appsec (1.773 ms) : 1748, 1797
.   : milestone, 1773,
iast (1.529 ms) : 1504, 1553
.   : milestone, 1529,
profiling (1.54 ms) : 1513, 1567
.   : milestone, 1540,
tracing (1.51 ms) : 1486, 1535
.   : milestone, 1510,
  • baseline results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.362 ms [1.343 ms, 1.382 ms] | - |
| appsec | 1.764 ms [1.739 ms, 1.789 ms] | 401.727 µs (29.5%) |
| iast | 1.531 ms [1.507 ms, 1.555 ms] | 168.186 µs (12.3%) |
| profiling | 1.537 ms [1.511 ms, 1.564 ms] | 174.896 µs (12.8%) |
| tracing | 1.496 ms [1.472 ms, 1.52 ms] | 133.782 µs (9.8%) |
  • candidate results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.357 ms [1.338 ms, 1.376 ms] | - |
| appsec | 1.773 ms [1.748 ms, 1.797 ms] | 415.829 µs (30.6%) |
| iast | 1.529 ms [1.504 ms, 1.553 ms] | 171.968 µs (12.7%) |
| profiling | 1.54 ms [1.513 ms, 1.567 ms] | 183.336 µs (13.5%) |
| tracing | 1.51 ms [1.486 ms, 1.535 ms] | 153.534 µs (11.3%) |

@@ -135,6 +137,7 @@ private void initApplicationSpanIfNotInitialized() {
captureApplicationParameters(builder);

applicationSpan = builder.start();
setDataJobsSamplingPriority(applicationSpan);
Contributor

Setting the priority on the root span at creation should be enough to propagate the priority and mechanism everywhere.

Contributor Author

The root span depends on the setup: it can be the application, batch, SQL, or job span. I added the call to all of these spans so that it is harder to forget.

@paul-laffon-dd paul-laffon-dd marked this pull request as ready for review December 6, 2023 14:27
@paul-laffon-dd paul-laffon-dd requested a review from a team as a code owner December 6, 2023 14:27
@paul-laffon-dd paul-laffon-dd added the inst: apache spark Apache Spark instrumentation label Dec 6, 2023
@@ -21,6 +21,8 @@ public class SamplingMechanism {
public static final byte REMOTE_USER_RATE = 6;
/** Span Sampling Rate (single span sampled on account of a span sampling rule) */
public static final byte SPAN_SAMPLING_RATE = 8;
/** Data Jobs */
public static final byte DATA_JOBS = 10;
Contributor

Since this mechanism only seems to be used with USER_KEEP, maybe the validation code further down in this file should be updated to allow only that?

Contributor Author

Yes, good catch. Added DATA_JOBS in validateWithSamplingPriority.
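
The restriction discussed in this thread could look roughly like the sketch below. It is not the actual code from this PR: the class name is hypothetical, the method signature and the priority values (`SAMPLER_KEEP = 1`, `USER_KEEP = 2`) are assumptions, and only the `DATA_JOBS` rule is modelled. The rules for every other mechanism are deliberately left out.

```java
/** Illustrative sketch only; the real SamplingMechanism validation is not reproduced here. */
final class MechanismValidationSketch {
  static final int SAMPLER_KEEP = 1; // assumed priority value
  static final int USER_KEEP = 2;    // assumed priority value
  static final byte DATA_JOBS = 10;  // mechanism value from the diff above

  /**
   * Returns true when the mechanism/priority pair is acceptable.
   * Only the DATA_JOBS rule is modelled: it must be paired with USER_KEEP.
   */
  static boolean validateWithSamplingPriority(int mechanism, int priority) {
    if (mechanism == DATA_JOBS) {
      return priority == USER_KEEP;
    }
    // Other mechanisms are out of scope for this sketch and are accepted as-is.
    return true;
  }

  private MechanismValidationSketch() {}
}
```

With a check like this, an accidental `DATA_JOBS` mechanism combined with a sampler-driven priority would be rejected instead of silently producing a mislabelled ingestion reason.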

@paul-laffon-dd paul-laffon-dd merged commit 163ba8a into master Dec 11, 2023
72 checks passed
@paul-laffon-dd paul-laffon-dd deleted the paul.laffon/spark-spans-sampling branch December 11, 2023 16:08
@github-actions github-actions bot added this to the 1.26.0 milestone Dec 11, 2023
Labels
inst: apache spark Apache Spark instrumentation
3 participants