Capture relationships between spark SQL stages #6459

Merged: paul-laffon-dd merged 2 commits into master from paul.laffon/spark-sql-node-id on Jan 25, 2024

Conversation

paul-laffon-dd (Contributor) commented Jan 9, 2024

What Does This Do

Relationships are captured with two new fields:

  • _dd.spark.sql_parent_stage_ids, the list of stageIds that the current stage depends on
  • nodeId, added to each node of the existing JSON plan in _dd.spark.sql_plan. It makes it possible to identify which SQL nodes are used by multiple stages
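
As a rough illustration of how a consumer might use the first field, the sketch below turns per-stage tag values into parent-to-child links. It assumes the tag value is the bracketed, comma-separated string that Arrays.toString produces (for example "[0, 1]"); the class name StageLinks, both helper methods, and the sample stage ids are hypothetical and not part of this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StageLinks {
  // Parses a tag value such as "[0, 1]" into a list of stage ids.
  // The bracketed, comma-separated format is an assumption of this sketch.
  static List<Integer> parseParentStageIds(String tagValue) {
    List<Integer> ids = new ArrayList<>();
    String inner = tagValue.trim();
    if (inner.startsWith("[") && inner.endsWith("]")) {
      inner = inner.substring(1, inner.length() - 1);
    }
    for (String part : inner.split(",")) {
      String s = part.trim();
      if (!s.isEmpty()) {
        ids.add(Integer.parseInt(s));
      }
    }
    return ids;
  }

  // Builds "parent -> child" edges from a map of stage id to its tag value.
  static List<String> edges(Map<Integer, String> tagByStage) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<Integer, String> e : tagByStage.entrySet()) {
      for (int parent : parseParentStageIds(e.getValue())) {
        out.add(parent + " -> " + e.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical stages: 0 and 1 have no parents, 2 depends on both.
    Map<Integer, String> tagByStage = new LinkedHashMap<>();
    tagByStage.put(0, "[]");
    tagByStage.put(1, "[]");
    tagByStage.put(2, "[0, 1]");
    System.out.println(edges(tagByStage));
  }
}
```

The edge list is the minimum needed to draw stages linked together; a real consumer would presumably read the values from span tags rather than from a hand-built map.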

Motivation

Capture relationships between Spark SQL stages so that they can be displayed linked together.

Additional Notes

pr-commenter (bot) commented Jan 9, 2024

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | paul.laffon/spark-sql-node-id |
| git_commit_date | 1706178321 | 1706182127 |
| git_commit_sha | cd33d47 | 140b86d |
| release_version | 1.29.0-SNAPSHOT~cd33d47cbf | 1.28.0-SNAPSHOT~140b86d5e0 |
See matching parameters
| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1706184804 | 1706184804 |
| ci_job_id | 418365583 | 418365583 |
| ci_pipeline_id | 27170106 | 27170106 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent | Agent |
| parent | None | None |
| variant | iast | iast |

Summary

Found 6 performance improvements and 4 performance regressions! Performance is the same for 37 metrics, 7 unstable metrics.

| scenario | Δ mean execution_time | candidate mean execution_time | baseline mean execution_time |
|---|---|---|---|
| scenario:startup:insecure-bank:tracing:GlobalTracer | worse [+10.539ms; +12.547ms] or [+3.556%; +4.233%] | 307.919ms | 296.376ms |
| scenario:startup:insecure-bank:tracing:Remote Config | better [-50.088µs; -25.647µs] or [-7.214%; -3.694%] | 656.438µs | 694.306µs |
| scenario:startup:insecure-bank:tracing:Telemetry | better [-473.456µs; -190.940µs] or [-6.266%; -2.527%] | 7.224ms | 7.556ms |
| scenario:startup:petclinic:appsec:GlobalTracer | worse [+11.102ms; +18.661ms] or [+3.752%; +6.307%] | 310.771ms | 295.889ms |
| scenario:startup:petclinic:appsec:Remote Config | better [-57.117µs; -25.049µs] or [-8.281%; -3.632%] | 648.687µs | 689.770µs |
| scenario:startup:petclinic:profiling:Remote Config | worse [+287.673µs; +327.076µs] or [+42.189%; +47.968%] | 989.236µs | 681.862µs |
| scenario:startup:petclinic:profiling:Telemetry | better [-514.995µs; -211.626µs] or [-6.780%; -2.786%] | 7.233ms | 7.596ms |
| scenario:startup:petclinic:tracing:GlobalTracer | worse [+6.485ms; +17.144ms] or [+2.176%; +5.754%] | 309.783ms | 297.969ms |
| scenario:startup:petclinic:tracing:Remote Config | better [-61.709µs; -21.316µs] or [-8.854%; -3.058%] | 655.474µs | 696.987µs |
| scenario:startup:petclinic:tracing:Telemetry | better [-471.137µs; -154.340µs] or [-6.219%; -2.037%] | 7.263ms | 7.576ms |
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.28.0-SNAPSHOT~140b86d5e0, baseline=1.29.0-SNAPSHOT~cd33d47cbf

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.056 s) : 0, 1055996
Total [baseline] (8.745 s) : 0, 8745378
Agent [candidate] (1.052 s) : 0, 1051972
Total [candidate] (8.761 s) : 0, 8761221
section iast
Agent [baseline] (1.194 s) : 0, 1193865
Total [baseline] (9.369 s) : 0, 9368573
Agent [candidate] (1.183 s) : 0, 1183086
Total [candidate] (9.341 s) : 0, 9341029
section iast_TELEMETRY_OFF
Agent [baseline] (1.17 s) : 0, 1169517
Total [baseline] (9.322 s) : 0, 9322200
Agent [candidate] (1.194 s) : 0, 1193984
Total [candidate] (9.36 s) : 0, 9360080
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.056 s -
Agent iast 1.194 s 137.869 ms (13.1%)
Agent iast_TELEMETRY_OFF 1.17 s 113.521 ms (10.8%)
Total tracing 8.745 s -
Total iast 9.369 s 623.195 ms (7.1%)
Total iast_TELEMETRY_OFF 9.322 s 576.823 ms (6.6%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.052 s -
Agent iast 1.183 s 131.115 ms (12.5%)
Agent iast_TELEMETRY_OFF 1.194 s 142.012 ms (13.5%)
Total tracing 8.761 s -
Total iast 9.341 s 579.808 ms (6.6%)
Total iast_TELEMETRY_OFF 9.36 s 598.859 ms (6.8%)
gantt
    title insecure-bank - break down per module: candidate=1.28.0-SNAPSHOT~140b86d5e0, baseline=1.29.0-SNAPSHOT~cd33d47cbf

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (665.295 ms) : 0, 665295
BytebuddyAgent [candidate] (651.072 ms) : 0, 651072
GlobalTracer [baseline] (296.376 ms) : 0, 296376
GlobalTracer [candidate] (307.919 ms) : 0, 307919
AppSec [baseline] (51.915 ms) : 0, 51915
AppSec [candidate] (50.87 ms) : 0, 50870
Remote Config [baseline] (694.306 µs) : 0, 694
Remote Config [candidate] (656.438 µs) : 0, 656
Telemetry [baseline] (7.556 ms) : 0, 7556
Telemetry [candidate] (7.224 ms) : 0, 7224
section iast
BytebuddyAgent [baseline] (787.513 ms) : 0, 787513
BytebuddyAgent [candidate] (777.581 ms) : 0, 777581
GlobalTracer [baseline] (289.457 ms) : 0, 289457
GlobalTracer [candidate] (289.905 ms) : 0, 289905
AppSec [baseline] (53.915 ms) : 0, 53915
AppSec [candidate] (53.756 ms) : 0, 53756
IAST [baseline] (20.21 ms) : 0, 20210
IAST [candidate] (20.001 ms) : 0, 20001
Remote Config [baseline] (596.873 µs) : 0, 597
Remote Config [candidate] (588.699 µs) : 0, 589
Telemetry [baseline] (7.418 ms) : 0, 7418
Telemetry [candidate] (6.604 ms) : 0, 6604
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (766.729 ms) : 0, 766729
BytebuddyAgent [candidate] (783.387 ms) : 0, 783387
GlobalTracer [baseline] (286.355 ms) : 0, 286355
GlobalTracer [candidate] (294.054 ms) : 0, 294054
AppSec [baseline] (53.504 ms) : 0, 53504
AppSec [candidate] (50.33 ms) : 0, 50330
IAST [baseline] (21.743 ms) : 0, 21743
IAST [candidate] (23.8 ms) : 0, 23800
Remote Config [baseline] (609.487 µs) : 0, 609
Remote Config [candidate] (586.497 µs) : 0, 586
Telemetry [baseline] (6.351 ms) : 0, 6351
Telemetry [candidate] (6.598 ms) : 0, 6598
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.28.0-SNAPSHOT~140b86d5e0, baseline=1.29.0-SNAPSHOT~cd33d47cbf

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.067 s) : 0, 1067051
Total [baseline] (9.468 s) : 0, 9468096
Agent [candidate] (1.058 s) : 0, 1057531
Total [candidate] (9.426 s) : 0, 9425532
section appsec
Agent [baseline] (1.153 s) : 0, 1153272
Total [baseline] (9.465 s) : 0, 9465290
Agent [candidate] (1.154 s) : 0, 1154053
Total [candidate] (9.5 s) : 0, 9500300
section iast
Agent [baseline] (1.177 s) : 0, 1176593
Total [baseline] (9.68 s) : 0, 9680030
Agent [candidate] (1.183 s) : 0, 1183233
Total [candidate] (9.59 s) : 0, 9590316
section profiling
Agent [baseline] (1.289 s) : 0, 1288953
Total [baseline] (9.651 s) : 0, 9651276
Agent [candidate] (1.273 s) : 0, 1273183
Total [candidate] (9.606 s) : 0, 9606115
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.067 s -
Agent appsec 1.153 s 86.221 ms (8.1%)
Agent iast 1.177 s 109.542 ms (10.3%)
Agent profiling 1.289 s 221.902 ms (20.8%)
Total tracing 9.468 s -
Total appsec 9.465 s -2.806 ms (-0.0%)
Total iast 9.68 s 211.933 ms (2.2%)
Total profiling 9.651 s 183.18 ms (1.9%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.058 s -
Agent appsec 1.154 s 96.522 ms (9.1%)
Agent iast 1.183 s 125.702 ms (11.9%)
Agent profiling 1.273 s 215.652 ms (20.4%)
Total tracing 9.426 s -
Total appsec 9.5 s 74.768 ms (0.8%)
Total iast 9.59 s 164.784 ms (1.7%)
Total profiling 9.606 s 180.583 ms (1.9%)
gantt
    title petclinic - break down per module: candidate=1.28.0-SNAPSHOT~140b86d5e0, baseline=1.29.0-SNAPSHOT~cd33d47cbf

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (673.842 ms) : 0, 673842
BytebuddyAgent [candidate] (654.481 ms) : 0, 654481
GlobalTracer [baseline] (297.969 ms) : 0, 297969
GlobalTracer [candidate] (309.783 ms) : 0, 309783
AppSec [baseline] (52.208 ms) : 0, 52208
AppSec [candidate] (50.853 ms) : 0, 50853
Remote Config [baseline] (696.987 µs) : 0, 697
Remote Config [candidate] (655.474 µs) : 0, 655
Telemetry [baseline] (7.576 ms) : 0, 7576
Telemetry [candidate] (7.263 ms) : 0, 7263
section appsec
BytebuddyAgent [baseline] (665.558 ms) : 0, 665558
BytebuddyAgent [candidate] (652.033 ms) : 0, 652033
GlobalTracer [baseline] (295.889 ms) : 0, 295889
GlobalTracer [candidate] (310.771 ms) : 0, 310771
AppSec [baseline] (150.178 ms) : 0, 150178
AppSec [candidate] (149.313 ms) : 0, 149313
Remote Config [baseline] (689.77 µs) : 0, 690
Remote Config [candidate] (648.687 µs) : 0, 649
Telemetry [baseline] (6.774 ms) : 0, 6774
Telemetry [candidate] (6.949 ms) : 0, 6949
section iast
BytebuddyAgent [baseline] (773.966 ms) : 0, 773966
BytebuddyAgent [candidate] (778.642 ms) : 0, 778642
GlobalTracer [baseline] (285.815 ms) : 0, 285815
GlobalTracer [candidate] (288.775 ms) : 0, 288775
AppSec [baseline] (52.025 ms) : 0, 52025
AppSec [candidate] (55.355 ms) : 0, 55355
Remote Config [baseline] (597.503 µs) : 0, 598
Remote Config [candidate] (570.912 µs) : 0, 571
Telemetry [baseline] (6.542 ms) : 0, 6542
Telemetry [candidate] (7.268 ms) : 0, 7268
IAST [baseline] (23.517 ms) : 0, 23517
IAST [candidate] (17.945 ms) : 0, 17945
section profiling
BytebuddyAgent [baseline] (669.767 ms) : 0, 669767
BytebuddyAgent [candidate] (661.955 ms) : 0, 661955
GlobalTracer [baseline] (379.413 ms) : 0, 379413
GlobalTracer [candidate] (375.064 ms) : 0, 375064
AppSec [baseline] (52.668 ms) : 0, 52668
AppSec [candidate] (51.291 ms) : 0, 51291
Remote Config [baseline] (681.862 µs) : 0, 682
Remote Config [candidate] (989.236 µs) : 0, 989
Telemetry [baseline] (7.596 ms) : 0, 7596
Telemetry [candidate] (7.233 ms) : 0, 7233
ProfilingAgent [baseline] (123.871 ms) : 0, 123871
ProfilingAgent [candidate] (122.415 ms) : 0, 122415
Profiling [baseline] (123.896 ms) : 0, 123896
Profiling [candidate] (122.442 ms) : 0, 122442

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| end_time | 2024-01-25T11:52:37 | 2024-01-25T12:09:09 |
| git_branch | master | paul.laffon/spark-sql-node-id |
| git_commit_date | 1706178321 | 1706182127 |
| git_commit_sha | cd33d47 | 140b86d |
| release_version | 1.29.0-SNAPSHOT~cd33d47cbf | 1.28.0-SNAPSHOT~140b86d5e0 |
| start_time | 2024-01-25T11:52:23 | 2024-01-25T12:08:56 |
See matching parameters
| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1706184804 | 1706184804 |
| ci_job_id | 418365583 | 418365583 |
| ci_pipeline_id | 27170106 | 27170106 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 13 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.28.0-SNAPSHOT~140b86d5e0, baseline=1.29.0-SNAPSHOT~cd33d47cbf
    dateFormat X
    axisFormat %s
section baseline
no_agent (376.912 µs) : 356, 398
.   : milestone, 377,
iast (484.879 µs) : 464, 506
.   : milestone, 485,
iast_FULL (545.614 µs) : 525, 566
.   : milestone, 546,
iast_INACTIVE (464.966 µs) : 443, 487
.   : milestone, 465,
iast_TELEMETRY_OFF (477.998 µs) : 457, 499
.   : milestone, 478,
tracing (451.865 µs) : 430, 473
.   : milestone, 452,
section candidate
no_agent (371.255 µs) : 352, 391
.   : milestone, 371,
iast (480.311 µs) : 460, 501
.   : milestone, 480,
iast_FULL (553.288 µs) : 533, 574
.   : milestone, 553,
iast_INACTIVE (449.363 µs) : 429, 470
.   : milestone, 449,
iast_TELEMETRY_OFF (474.923 µs) : 454, 495
.   : milestone, 475,
tracing (444.677 µs) : 423, 466
.   : milestone, 445,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 376.912 µs [356.164 µs, 397.661 µs] -
iast 484.879 µs [463.85 µs, 505.908 µs] 107.967 µs (28.6%)
iast_FULL 545.614 µs [524.986 µs, 566.242 µs] 168.702 µs (44.8%)
iast_INACTIVE 464.966 µs [443.03 µs, 486.902 µs] 88.054 µs (23.4%)
iast_TELEMETRY_OFF 477.998 µs [457.225 µs, 498.77 µs] 101.085 µs (26.8%)
tracing 451.865 µs [430.308 µs, 473.421 µs] 74.952 µs (19.9%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 371.255 µs [351.627 µs, 390.883 µs] -
iast 480.311 µs [459.914 µs, 500.708 µs] 109.056 µs (29.4%)
iast_FULL 553.288 µs [532.612 µs, 573.963 µs] 182.033 µs (49.0%)
iast_INACTIVE 449.363 µs [428.721 µs, 470.005 µs] 78.108 µs (21.0%)
iast_TELEMETRY_OFF 474.923 µs [454.441 µs, 495.406 µs] 103.668 µs (27.9%)
tracing 444.677 µs [423.427 µs, 465.927 µs] 73.422 µs (19.8%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.28.0-SNAPSHOT~140b86d5e0, baseline=1.29.0-SNAPSHOT~cd33d47cbf
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.365 ms) : 1346, 1384
.   : milestone, 1365,
appsec (1.766 ms) : 1741, 1791
.   : milestone, 1766,
iast (1.545 ms) : 1520, 1569
.   : milestone, 1545,
profiling (1.558 ms) : 1532, 1584
.   : milestone, 1558,
tracing (1.506 ms) : 1481, 1532
.   : milestone, 1506,
section candidate
no_agent (1.372 ms) : 1353, 1391
.   : milestone, 1372,
appsec (1.773 ms) : 1748, 1799
.   : milestone, 1773,
iast (1.544 ms) : 1520, 1569
.   : milestone, 1544,
profiling (1.534 ms) : 1509, 1560
.   : milestone, 1534,
tracing (1.484 ms) : 1460, 1509
.   : milestone, 1484,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.365 ms [1.346 ms, 1.384 ms] -
appsec 1.766 ms [1.741 ms, 1.791 ms] 400.87 µs (29.4%)
iast 1.545 ms [1.52 ms, 1.569 ms] 179.735 µs (13.2%)
profiling 1.558 ms [1.532 ms, 1.584 ms] 192.832 µs (14.1%)
tracing 1.506 ms [1.481 ms, 1.532 ms] 141.303 µs (10.4%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.372 ms [1.353 ms, 1.391 ms] -
appsec 1.773 ms [1.748 ms, 1.799 ms] 401.325 µs (29.2%)
iast 1.544 ms [1.52 ms, 1.569 ms] 172.375 µs (12.6%)
profiling 1.534 ms [1.509 ms, 1.56 ms] 162.339 µs (11.8%)
tracing 1.484 ms [1.46 ms, 1.509 ms] 112.219 µs (8.2%)

@paul-laffon-dd paul-laffon-dd added the inst: apache spark Apache Spark instrumentation label Jan 9, 2024
@paul-laffon-dd paul-laffon-dd marked this pull request as ready for review January 9, 2024 15:08
@paul-laffon-dd paul-laffon-dd requested a review from a team as a code owner January 9, 2024 15:08
- computeStageInfoForStage(plan, accumulators, stageId, false);
+ computeStageInfoForStage(plan, accumulators, stageId, parentStageIds, false);

span.setTag("_dd.spark.sql_parent_stage_ids", Arrays.toString(parentStageIds.toArray()));
Contributor

parentStageIds.toString() should also work: the general toString behavior of Java collection classes is to output square brackets around a comma-separated list of elements, which is the same format that Arrays.toString produces

Contributor Author

You are right, changing it to call .toString() directly

Contributor

@mcculls mcculls left a comment

LGTM - you should be able to drop the toArray -> Arrays.toString step and just use toString on the set of ids
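
The equivalence pointed out in the review can be checked with a small standalone sketch. It assumes parentStageIds is a java.util.Set of integer stage ids, which the PR excerpt does not show; the class name ParentStageIdsFormat and the sample ids are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ParentStageIdsFormat {
  // Formats the ids the way the reviewed line did: copy to an array, then Arrays.toString.
  static String viaArrays(Set<Integer> ids) {
    return Arrays.toString(ids.toArray());
  }

  // Formats the ids with the collection's own toString, as suggested in the review.
  static String viaToString(Set<Integer> ids) {
    return ids.toString();
  }

  public static void main(String[] args) {
    // Hypothetical parent stage ids; LinkedHashSet keeps insertion order so output is stable.
    Set<Integer> parentStageIds = new LinkedHashSet<>(Arrays.asList(3, 5, 8));
    System.out.println(viaArrays(parentStageIds));
    System.out.println(viaToString(parentStageIds));
    System.out.println(viaArrays(parentStageIds).equals(viaToString(parentStageIds)));
  }
}
```

Both forms iterate the set in the same order, so they agree for any standard collection of ids; calling toString directly simply skips the intermediate array copy.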

@paul-laffon-dd paul-laffon-dd merged commit e2c012d into master Jan 25, 2024
73 checks passed
@paul-laffon-dd paul-laffon-dd deleted the paul.laffon/spark-sql-node-id branch January 25, 2024 12:51
@github-actions github-actions bot added this to the 1.29.0 milestone Jan 25, 2024
Labels
inst: apache spark Apache Spark instrumentation
2 participants