create SpanLink from spark.streaming_batch span to databricks.task.execution span for spark instrumentation if applicable. #6816

@yiliangzhou yiliangzhou commented Mar 15, 2024

What Does This Do

Adds SpanLink support to the Spark instrumentation so that a spark.streaming_batch span can be correlated with its parent databricks.task.execution span, which lives in a different trace.

Motivation

spark.streaming_batch spans are not created as children of the parent databricks.task.execution span, because doing so would produce traces with too many spans.

However, because this causal relationship is not tracked inside a single trace, it is cumbersome to navigate from a spark.streaming_batch span to its parent span to understand the larger context. Users currently have to narrow down the parent trace manually by filtering on Databricks Workflows tags that are common to both the spark.streaming_batch span and the databricks.task.execution span, such as the databricks_task_run_id tag.

This PR improves the user experience by adding SpanLink support for spark.streaming_batch spans.
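To illustrate the idea, here is a minimal sketch in Python of two spans that live in separate traces and are connected by a span link rather than a parent/child relationship. It is only a toy in-memory model: `Span`, `SpanLink`, `start_root_span`, `resolve_link` and `find_by_tag` are hypothetical names made up for this example, not the dd-trace-java API or the code in this PR, and the tag value `"1234"` is invented.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

# Hypothetical in-memory model of spans; NOT the dd-trace-java API.
_next_id = count(1)


@dataclass(frozen=True)
class SpanLink:
    """Pointer to a span that may live in a different trace."""
    trace_id: int
    span_id: int


@dataclass
class Span:
    name: str
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None
    tags: Dict[str, str] = field(default_factory=dict)
    links: List[SpanLink] = field(default_factory=list)


def start_root_span(name: str, tags: Dict[str, str]) -> Span:
    """Start a span with no parent, i.e. the root of a brand-new trace."""
    return Span(name=name, trace_id=next(_next_id), span_id=next(_next_id), tags=dict(tags))


def resolve_link(spans: List[Span], link: SpanLink) -> Optional[Span]:
    """Follow a link directly to its target span, if that span is known."""
    for span in spans:
        if (span.trace_id, span.span_id) == (link.trace_id, link.span_id):
            return span
    return None


def find_by_tag(spans: List[Span], name: str, key: str, value: str) -> List[Span]:
    """The manual alternative: filter spans by name and a shared tag."""
    return [s for s in spans if s.name == name and s.tags.get(key) == value]


shared_tags = {"databricks_task_run_id": "1234"}  # made-up value

# Each span is the root of its own trace, which keeps both traces small.
task = start_root_span("databricks.task.execution", shared_tags)
batch = start_root_span("spark.streaming_batch", shared_tags)

# The batch span records a link to the task span instead of becoming its child.
batch.links.append(SpanLink(task.trace_id, task.span_id))

all_spans = [task, batch]
linked_parent = resolve_link(all_spans, batch.links[0])
filtered = find_by_tag(all_spans, "databricks.task.execution", "databricks_task_run_id", "1234")
```

In this sketch both lookups reach the same task span, but `resolve_link` does so from the batch span alone, whereas `find_by_tag` requires the user to know which tag to filter on.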

Additional Notes

Check this for more information on SpanLinks in the Trace View.

Jira ticket: DJM-65

…sk span with the help of DatabricksParentContext when running spark on databricks.
@yiliangzhou yiliangzhou added the inst: apache spark Apache Spark instrumentation label Mar 15, 2024
@yiliangzhou yiliangzhou self-assigned this Mar 15, 2024

@paul-laffon-dd paul-laffon-dd left a comment

The code looks good. I added a few comments on whether we can simplify the tests.


pr-commenter bot commented Mar 15, 2024

Benchmarks

Startup

Parameters

| | Baseline | Candidate |
| --- | --- | --- |
| baseline_or_candidate | baseline | candidate |
| git_branch | master | liangzhou.yi/spark-streaming-span-link-to-databricks-workflow-job |
| git_commit_date | 1710435433 | 1710515230 |
| git_commit_sha | 6d2f2ad | b073919 |
| release_version | 1.32.0-SNAPSHOT~6d2f2adf41 | 1.32.0-SNAPSHOT~b073919c6c |
See matching parameters
| | Baseline | Candidate |
| --- | --- | --- |
| application | insecure-bank | insecure-bank |
| ci_job_date | 1710518415 | 1710518415 |
| ci_job_id | 461041511 | 461041511 |
| ci_pipeline_id | 30191756 | 30191756 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent | Agent |
| parent | None | None |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 50 metrics, 13 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.32.0-SNAPSHOT~b073919c6c, baseline=1.32.0-SNAPSHOT~6d2f2adf41

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.078 s) : 0, 1077721
Total [baseline] (9.197 s) : 0, 9197105
Agent [candidate] (1.088 s) : 0, 1087727
Total [candidate] (9.179 s) : 0, 9179247
section appsec
Agent [baseline] (1.198 s) : 0, 1197720
Total [baseline] (9.409 s) : 0, 9409314
Agent [candidate] (1.202 s) : 0, 1202073
Total [candidate] (9.291 s) : 0, 9291333
section iast
Agent [baseline] (1.204 s) : 0, 1204475
Total [baseline] (9.32 s) : 0, 9319733
Agent [candidate] (1.205 s) : 0, 1205282
Total [candidate] (9.333 s) : 0, 9333239
section profiling
Agent [baseline] (1.273 s) : 0, 1273144
Total [baseline] (9.405 s) : 0, 9405190
Agent [candidate] (1.274 s) : 0, 1273690
Total [candidate] (9.384 s) : 0, 9384454
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.078 s -
Agent appsec 1.198 s 119.999 ms (11.1%)
Agent iast 1.204 s 126.754 ms (11.8%)
Agent profiling 1.273 s 195.423 ms (18.1%)
Total tracing 9.197 s -
Total appsec 9.409 s 212.21 ms (2.3%)
Total iast 9.32 s 122.628 ms (1.3%)
Total profiling 9.405 s 208.085 ms (2.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.088 s -
Agent appsec 1.202 s 114.345 ms (10.5%)
Agent iast 1.205 s 117.554 ms (10.8%)
Agent profiling 1.274 s 185.963 ms (17.1%)
Total tracing 9.179 s -
Total appsec 9.291 s 112.086 ms (1.2%)
Total iast 9.333 s 153.992 ms (1.7%)
Total profiling 9.384 s 205.207 ms (2.2%)
gantt
    title petclinic - break down per module: candidate=1.32.0-SNAPSHOT~b073919c6c, baseline=1.32.0-SNAPSHOT~6d2f2adf41

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (695.204 ms) : 0, 695204
BytebuddyAgent [candidate] (701.354 ms) : 0, 701354
GlobalTracer [baseline] (290.976 ms) : 0, 290976
GlobalTracer [candidate] (294.151 ms) : 0, 294151
AppSec [baseline] (48.566 ms) : 0, 48566
AppSec [candidate] (49.026 ms) : 0, 49026
Remote Config [baseline] (719.931 µs) : 0, 720
Remote Config [candidate] (739.239 µs) : 0, 739
Telemetry [baseline] (7.896 ms) : 0, 7896
Telemetry [candidate] (7.85 ms) : 0, 7850
section appsec
BytebuddyAgent [baseline] (694.192 ms) : 0, 694192
BytebuddyAgent [candidate] (696.76 ms) : 0, 696760
GlobalTracer [baseline] (290.329 ms) : 0, 290329
GlobalTracer [candidate] (292.199 ms) : 0, 292199
AppSec [baseline] (153.412 ms) : 0, 153412
AppSec [candidate] (153.388 ms) : 0, 153388
IAST [baseline] (18.097 ms) : 0, 18097
IAST [candidate] (17.936 ms) : 0, 17936
Remote Config [baseline] (614.184 µs) : 0, 614
Remote Config [candidate] (609.987 µs) : 0, 610
Telemetry [baseline] (6.937 ms) : 0, 6937
Telemetry [candidate] (6.884 ms) : 0, 6884
section iast
BytebuddyAgent [baseline] (800.806 ms) : 0, 800806
BytebuddyAgent [candidate] (800.636 ms) : 0, 800636
GlobalTracer [baseline] (288.422 ms) : 0, 288422
GlobalTracer [candidate] (289.168 ms) : 0, 289168
AppSec [baseline] (50.364 ms) : 0, 50364
AppSec [candidate] (49.524 ms) : 0, 49524
IAST [baseline] (23.279 ms) : 0, 23279
IAST [candidate] (22.965 ms) : 0, 22965
Remote Config [baseline] (613.014 µs) : 0, 613
Remote Config [candidate] (587.872 µs) : 0, 588
Telemetry [baseline] (6.662 ms) : 0, 6662
Telemetry [candidate] (8.089 ms) : 0, 8089
section profiling
BytebuddyAgent [baseline] (689.123 ms) : 0, 689123
BytebuddyAgent [candidate] (688.834 ms) : 0, 688834
GlobalTracer [baseline] (375.306 ms) : 0, 375306
GlobalTracer [candidate] (376.429 ms) : 0, 376429
AppSec [baseline] (49.676 ms) : 0, 49676
AppSec [candidate] (49.736 ms) : 0, 49736
Remote Config [baseline] (822.136 µs) : 0, 822
Remote Config [candidate] (802.695 µs) : 0, 803
Telemetry [baseline] (7.396 ms) : 0, 7396
Telemetry [candidate] (7.437 ms) : 0, 7437
ProfilingAgent [baseline] (94.695 ms) : 0, 94695
ProfilingAgent [candidate] (94.342 ms) : 0, 94342
Profiling [baseline] (94.719 ms) : 0, 94719
Profiling [candidate] (94.365 ms) : 0, 94365
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.32.0-SNAPSHOT~b073919c6c, baseline=1.32.0-SNAPSHOT~6d2f2adf41

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.078 s) : 0, 1078375
Total [baseline] (8.538 s) : 0, 8537816
Agent [candidate] (1.094 s) : 0, 1093917
Total [candidate] (8.557 s) : 0, 8556585
section iast
Agent [baseline] (1.201 s) : 0, 1200827
Total [baseline] (9.033 s) : 0, 9033235
Agent [candidate] (1.207 s) : 0, 1207276
Total [candidate] (9.06 s) : 0, 9059594
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.211 s) : 0, 1210527
Total [baseline] (9.022 s) : 0, 9021553
Agent [candidate] (1.207 s) : 0, 1207112
Total [candidate] (9.007 s) : 0, 9006844
section iast_TELEMETRY_OFF
Agent [baseline] (1.197 s) : 0, 1196744
Total [baseline] (9.027 s) : 0, 9026559
Agent [candidate] (1.199 s) : 0, 1198560
Total [candidate] (9.06 s) : 0, 9059653
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.078 s -
Agent iast 1.201 s 122.452 ms (11.4%)
Agent iast_HARDCODED_SECRET_DISABLED 1.211 s 132.152 ms (12.3%)
Agent iast_TELEMETRY_OFF 1.197 s 118.369 ms (11.0%)
Total tracing 8.538 s -
Total iast 9.033 s 495.419 ms (5.8%)
Total iast_HARDCODED_SECRET_DISABLED 9.022 s 483.737 ms (5.7%)
Total iast_TELEMETRY_OFF 9.027 s 488.743 ms (5.7%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.094 s -
Agent iast 1.207 s 113.359 ms (10.4%)
Agent iast_HARDCODED_SECRET_DISABLED 1.207 s 113.195 ms (10.3%)
Agent iast_TELEMETRY_OFF 1.199 s 104.643 ms (9.6%)
Total tracing 8.557 s -
Total iast 9.06 s 503.01 ms (5.9%)
Total iast_HARDCODED_SECRET_DISABLED 9.007 s 450.259 ms (5.3%)
Total iast_TELEMETRY_OFF 9.06 s 503.068 ms (5.9%)
gantt
    title insecure-bank - break down per module: candidate=1.32.0-SNAPSHOT~b073919c6c, baseline=1.32.0-SNAPSHOT~6d2f2adf41

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (696.252 ms) : 0, 696252
BytebuddyAgent [candidate] (706.296 ms) : 0, 706296
GlobalTracer [baseline] (290.738 ms) : 0, 290738
GlobalTracer [candidate] (295.176 ms) : 0, 295176
AppSec [baseline] (48.656 ms) : 0, 48656
AppSec [candidate] (49.155 ms) : 0, 49155
Remote Config [baseline] (736.539 µs) : 0, 737
Remote Config [candidate] (745.394 µs) : 0, 745
Telemetry [baseline] (7.701 ms) : 0, 7701
Telemetry [candidate] (7.743 ms) : 0, 7743
section iast
BytebuddyAgent [baseline] (797.86 ms) : 0, 797860
BytebuddyAgent [candidate] (802.347 ms) : 0, 802347
GlobalTracer [baseline] (287.775 ms) : 0, 287775
GlobalTracer [candidate] (289.554 ms) : 0, 289554
AppSec [baseline] (49.906 ms) : 0, 49906
AppSec [candidate] (51.692 ms) : 0, 51692
Remote Config [baseline] (615.688 µs) : 0, 616
Remote Config [candidate] (610.018 µs) : 0, 610
Telemetry [baseline] (7.401 ms) : 0, 7401
Telemetry [candidate] (6.597 ms) : 0, 6597
IAST [baseline] (22.977 ms) : 0, 22977
IAST [candidate] (22.059 ms) : 0, 22059
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (804.782 ms) : 0, 804782
BytebuddyAgent [candidate] (801.378 ms) : 0, 801378
GlobalTracer [baseline] (289.796 ms) : 0, 289796
GlobalTracer [candidate] (290.014 ms) : 0, 290014
AppSec [baseline] (51.927 ms) : 0, 51927
AppSec [candidate] (50.365 ms) : 0, 50365
Remote Config [baseline] (604.476 µs) : 0, 604
Remote Config [candidate] (610.581 µs) : 0, 611
Telemetry [baseline] (6.642 ms) : 0, 6642
Telemetry [candidate] (6.616 ms) : 0, 6616
IAST [baseline] (22.233 ms) : 0, 22233
IAST [candidate] (23.799 ms) : 0, 23799
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (792.804 ms) : 0, 792804
BytebuddyAgent [candidate] (793.906 ms) : 0, 793906
GlobalTracer [baseline] (288.811 ms) : 0, 288811
GlobalTracer [candidate] (290.011 ms) : 0, 290011
AppSec [baseline] (49.192 ms) : 0, 49192
AppSec [candidate] (49.405 ms) : 0, 49405
Remote Config [baseline] (572.921 µs) : 0, 573
Remote Config [candidate] (573.494 µs) : 0, 573
Telemetry [baseline] (6.49 ms) : 0, 6490
Telemetry [candidate] (6.479 ms) : 0, 6479
IAST [baseline] (24.662 ms) : 0, 24662
IAST [candidate] (23.878 ms) : 0, 23878

Load

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.32.0-SNAPSHOT~b073919c6c, baseline=1.32.0-SNAPSHOT~6d2f2adf41
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.349 ms) : 1330, 1368
.   : milestone, 1349,
appsec (1.777 ms) : 1754, 1800
.   : milestone, 1777,
iast (1.54 ms) : 1516, 1563
.   : milestone, 1540,
profiling (1.525 ms) : 1501, 1549
.   : milestone, 1525,
tracing (1.493 ms) : 1469, 1516
.   : milestone, 1493,
section candidate
no_agent (1.344 ms) : 1325, 1363
.   : milestone, 1344,
appsec (1.754 ms) : 1729, 1779
.   : milestone, 1754,
iast (1.534 ms) : 1511, 1557
.   : milestone, 1534,
profiling (1.588 ms) : 1563, 1612
.   : milestone, 1588,
tracing (1.495 ms) : 1472, 1519
.   : milestone, 1495,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.349 ms [1.33 ms, 1.368 ms] -
appsec 1.777 ms [1.754 ms, 1.8 ms] 427.961 µs (31.7%)
iast 1.54 ms [1.516 ms, 1.563 ms] 190.928 µs (14.2%)
profiling 1.525 ms [1.501 ms, 1.549 ms] 176.159 µs (13.1%)
tracing 1.493 ms [1.469 ms, 1.516 ms] 143.866 µs (10.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.344 ms [1.325 ms, 1.363 ms] -
appsec 1.754 ms [1.729 ms, 1.779 ms] 410.149 µs (30.5%)
iast 1.534 ms [1.511 ms, 1.557 ms] 189.881 µs (14.1%)
profiling 1.588 ms [1.563 ms, 1.612 ms] 244.137 µs (18.2%)
tracing 1.495 ms [1.472 ms, 1.519 ms] 151.228 µs (11.3%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.32.0-SNAPSHOT~b073919c6c, baseline=1.32.0-SNAPSHOT~6d2f2adf41
    dateFormat X
    axisFormat %s
section baseline
no_agent (359.419 µs) : 339, 380
.   : milestone, 359,
iast (466.535 µs) : 446, 487
.   : milestone, 467,
iast_FULL (540.485 µs) : 520, 561
.   : milestone, 540,
iast_GLOBAL (494.156 µs) : 473, 515
.   : milestone, 494,
iast_HARDCODED_SECRET_DISABLED (466.666 µs) : 447, 487
.   : milestone, 467,
iast_INACTIVE (450.379 µs) : 430, 471
.   : milestone, 450,
iast_TELEMETRY_OFF (464.111 µs) : 444, 484
.   : milestone, 464,
tracing (441.942 µs) : 422, 462
.   : milestone, 442,
section candidate
no_agent (365.098 µs) : 345, 385
.   : milestone, 365,
iast (480.85 µs) : 461, 501
.   : milestone, 481,
iast_FULL (536.432 µs) : 516, 557
.   : milestone, 536,
iast_GLOBAL (493.903 µs) : 473, 515
.   : milestone, 494,
iast_HARDCODED_SECRET_DISABLED (473.853 µs) : 453, 494
.   : milestone, 474,
iast_INACTIVE (441.745 µs) : 421, 462
.   : milestone, 442,
iast_TELEMETRY_OFF (467.328 µs) : 447, 488
.   : milestone, 467,
tracing (440.162 µs) : 420, 460
.   : milestone, 440,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 359.419 µs [339.206 µs, 379.632 µs] -
iast 466.535 µs [445.908 µs, 487.161 µs] 107.115 µs (29.8%)
iast_FULL 540.485 µs [519.739 µs, 561.231 µs] 181.066 µs (50.4%)
iast_GLOBAL 494.156 µs [473.169 µs, 515.143 µs] 134.737 µs (37.5%)
iast_HARDCODED_SECRET_DISABLED 466.666 µs [446.705 µs, 486.627 µs] 107.247 µs (29.8%)
iast_INACTIVE 450.379 µs [429.671 µs, 471.088 µs] 90.96 µs (25.3%)
iast_TELEMETRY_OFF 464.111 µs [443.848 µs, 484.375 µs] 104.692 µs (29.1%)
tracing 441.942 µs [421.663 µs, 462.221 µs] 82.523 µs (23.0%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 365.098 µs [345.071 µs, 385.125 µs] -
iast 480.85 µs [460.566 µs, 501.134 µs] 115.752 µs (31.7%)
iast_FULL 536.432 µs [516.116 µs, 556.747 µs] 171.334 µs (46.9%)
iast_GLOBAL 493.903 µs [472.74 µs, 515.065 µs] 128.804 µs (35.3%)
iast_HARDCODED_SECRET_DISABLED 473.853 µs [453.283 µs, 494.424 µs] 108.755 µs (29.8%)
iast_INACTIVE 441.745 µs [421.495 µs, 461.996 µs] 76.647 µs (21.0%)
iast_TELEMETRY_OFF 467.328 µs [446.974 µs, 487.682 µs] 102.23 µs (28.0%)
tracing 440.162 µs [419.86 µs, 460.463 µs] 75.063 µs (20.6%)

@yiliangzhou yiliangzhou marked this pull request as ready for review March 15, 2024 18:53
@yiliangzhou yiliangzhou requested a review from a team as a code owner March 15, 2024 18:53
@yiliangzhou yiliangzhou merged commit a3553ef into master Mar 18, 2024
80 checks passed
@yiliangzhou yiliangzhou deleted the liangzhou.yi/spark-streaming-span-link-to-databricks-workflow-job branch March 18, 2024 15:08
@github-actions github-actions bot added this to the 1.32.0 milestone Mar 18, 2024