Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument Google protobuf #6865

Merged
merged 21 commits into from
Apr 26, 2024

Conversation

piochelepiotr
Copy link
Contributor

What Does This Do

Add protobuf instrumentation for serialize / deserialize operations.
If data streams monitoring is enabled, capture schemas of messages.

Motivation

Additional Notes

It impacts the GRPC instrumentation

since it adds spans for serialize / deserialize

Jira ticket: [PROJ-IDENT]

@pr-commenter
Copy link

pr-commenter bot commented Apr 1, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master piotr-wolski/add-protobuf-instrumentation-2
git_commit_date 1714049499 1714076754
git_commit_sha ae1c4c9 a2735d4
release_version 1.34.0-SNAPSHOT~ae1c4c9475 1.33.0-SNAPSHOT~a2735d4bea
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1714080173 1714080173
ci_job_id 497652818 497652818
ci_pipeline_id 32969118 32969118
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 1 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 16 unstable metrics.

scenario Δ mean execution_time candidate mean execution_time baseline mean execution_time
scenario:startup:petclinic:appsec:IAST better
[-667.517µs; -400.323µs] or [-3.464%; -2.077%]
18.736ms 19.270ms
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.083 s) : 0, 1083467
Total [baseline] (8.563 s) : 0, 8562724
Agent [candidate] (1.079 s) : 0, 1079449
Total [candidate] (8.568 s) : 0, 8568070
section iast
Agent [baseline] (1.207 s) : 0, 1207241
Total [baseline] (9.072 s) : 0, 9072296
Agent [candidate] (1.2 s) : 0, 1199540
Total [candidate] (9.075 s) : 0, 9074504
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.209 s) : 0, 1209035
Total [baseline] (9.062 s) : 0, 9062063
Agent [candidate] (1.203 s) : 0, 1203485
Total [candidate] (9.036 s) : 0, 9036482
section iast_TELEMETRY_OFF
Agent [baseline] (1.195 s) : 0, 1195209
Total [baseline] (9.02 s) : 0, 9019868
Agent [candidate] (1.21 s) : 0, 1209606
Total [candidate] (9.032 s) : 0, 9032204
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.083 s -
Agent iast 1.207 s 123.774 ms (11.4%)
Agent iast_HARDCODED_SECRET_DISABLED 1.209 s 125.568 ms (11.6%)
Agent iast_TELEMETRY_OFF 1.195 s 111.741 ms (10.3%)
Total tracing 8.563 s -
Total iast 9.072 s 509.572 ms (6.0%)
Total iast_HARDCODED_SECRET_DISABLED 9.062 s 499.339 ms (5.8%)
Total iast_TELEMETRY_OFF 9.02 s 457.144 ms (5.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.079 s -
Agent iast 1.2 s 120.091 ms (11.1%)
Agent iast_HARDCODED_SECRET_DISABLED 1.203 s 124.036 ms (11.5%)
Agent iast_TELEMETRY_OFF 1.21 s 130.156 ms (12.1%)
Total tracing 8.568 s -
Total iast 9.075 s 506.435 ms (5.9%)
Total iast_HARDCODED_SECRET_DISABLED 9.036 s 468.413 ms (5.5%)
Total iast_TELEMETRY_OFF 9.032 s 464.134 ms (5.4%)
gantt
    title insecure-bank - break down per module: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (678.598 ms) : 0, 678598
BytebuddyAgent [candidate] (675.76 ms) : 0, 675760
GlobalTracer [baseline] (311.991 ms) : 0, 311991
GlobalTracer [candidate] (311.28 ms) : 0, 311280
AppSec [baseline] (49.902 ms) : 0, 49902
AppSec [candidate] (49.675 ms) : 0, 49675
Remote Config [baseline] (680.469 µs) : 0, 680
Remote Config [candidate] (663.906 µs) : 0, 664
Telemetry [baseline] (7.63 ms) : 0, 7630
Telemetry [candidate] (7.679 ms) : 0, 7679
section iast
BytebuddyAgent [baseline] (798.143 ms) : 0, 798143
BytebuddyAgent [candidate] (795.199 ms) : 0, 795199
GlobalTracer [baseline] (290.877 ms) : 0, 290877
GlobalTracer [candidate] (289.161 ms) : 0, 289161
AppSec [baseline] (51.12 ms) : 0, 51120
AppSec [candidate] (49.714 ms) : 0, 49714
Remote Config [baseline] (598.567 µs) : 0, 599
Remote Config [candidate] (602.396 µs) : 0, 602
Telemetry [baseline] (7.505 ms) : 0, 7505
Telemetry [candidate] (8.211 ms) : 0, 8211
IAST [baseline] (24.54 ms) : 0, 24540
IAST [candidate] (22.252 ms) : 0, 22252
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (801.07 ms) : 0, 801070
BytebuddyAgent [candidate] (796.904 ms) : 0, 796904
GlobalTracer [baseline] (290.242 ms) : 0, 290242
GlobalTracer [candidate] (290.383 ms) : 0, 290383
AppSec [baseline] (48.861 ms) : 0, 48861
AppSec [candidate] (50.8 ms) : 0, 50800
Remote Config [baseline] (601.595 µs) : 0, 602
Remote Config [candidate] (594.75 µs) : 0, 595
Telemetry [baseline] (9.048 ms) : 0, 9048
Telemetry [candidate] (7.412 ms) : 0, 7412
IAST [baseline] (24.689 ms) : 0, 24689
IAST [candidate] (22.842 ms) : 0, 22842
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (791.953 ms) : 0, 791953
BytebuddyAgent [candidate] (800.679 ms) : 0, 800679
GlobalTracer [baseline] (287.83 ms) : 0, 287830
GlobalTracer [candidate] (291.877 ms) : 0, 291877
AppSec [baseline] (51.305 ms) : 0, 51305
AppSec [candidate] (50.175 ms) : 0, 50175
Remote Config [baseline] (580.553 µs) : 0, 581
Remote Config [candidate] (589.47 µs) : 0, 589
Telemetry [baseline] (6.594 ms) : 0, 6594
Telemetry [candidate] (8.87 ms) : 0, 8870
IAST [baseline] (22.607 ms) : 0, 22607
IAST [candidate] (22.806 ms) : 0, 22806
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.089 s) : 0, 1089264
Total [baseline] (10.468 s) : 0, 10467543
Agent [candidate] (1.083 s) : 0, 1083056
Total [candidate] (10.384 s) : 0, 10384124
section appsec
Agent [baseline] (1.198 s) : 0, 1197836
Total [baseline] (10.445 s) : 0, 10444923
Agent [candidate] (1.2 s) : 0, 1200436
Total [candidate] (10.467 s) : 0, 10467159
section iast
Agent [baseline] (1.209 s) : 0, 1209382
Total [baseline] (10.747 s) : 0, 10747104
Agent [candidate] (1.207 s) : 0, 1206501
Total [candidate] (10.828 s) : 0, 10827530
section profiling
Agent [baseline] (1.266 s) : 0, 1266154
Total [baseline] (10.656 s) : 0, 10655967
Agent [candidate] (1.288 s) : 0, 1288114
Total [candidate] (10.662 s) : 0, 10661545
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.089 s -
Agent appsec 1.198 s 108.571 ms (10.0%)
Agent iast 1.209 s 120.118 ms (11.0%)
Agent profiling 1.266 s 176.89 ms (16.2%)
Total tracing 10.468 s -
Total appsec 10.445 s -22.62 ms (-0.2%)
Total iast 10.747 s 279.56 ms (2.7%)
Total profiling 10.656 s 188.424 ms (1.8%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.083 s -
Agent appsec 1.2 s 117.38 ms (10.8%)
Agent iast 1.207 s 123.446 ms (11.4%)
Agent profiling 1.288 s 205.058 ms (18.9%)
Total tracing 10.384 s -
Total appsec 10.467 s 83.035 ms (0.8%)
Total iast 10.828 s 443.406 ms (4.3%)
Total profiling 10.662 s 277.421 ms (2.7%)
gantt
    title petclinic - break down per module: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.652 ms) : 0, 682652
BytebuddyAgent [candidate] (679.311 ms) : 0, 679311
GlobalTracer [baseline] (313.421 ms) : 0, 313421
GlobalTracer [candidate] (311.394 ms) : 0, 311394
AppSec [baseline] (50.04 ms) : 0, 50040
AppSec [candidate] (49.494 ms) : 0, 49494
Remote Config [baseline] (665.771 µs) : 0, 666
Remote Config [candidate] (652.991 µs) : 0, 653
Telemetry [baseline] (7.697 ms) : 0, 7697
Telemetry [candidate] (7.567 ms) : 0, 7567
section appsec
BytebuddyAgent [baseline] (694.802 ms) : 0, 694802
BytebuddyAgent [candidate] (695.775 ms) : 0, 695775
GlobalTracer [baseline] (291.115 ms) : 0, 291115
GlobalTracer [candidate] (291.968 ms) : 0, 291968
AppSec [baseline] (149.145 ms) : 0, 149145
AppSec [candidate] (149.686 ms) : 0, 149686
Remote Config [baseline] (609.168 µs) : 0, 609
Remote Config [candidate] (595.657 µs) : 0, 596
Telemetry [baseline] (8.666 ms) : 0, 8666
Telemetry [candidate] (9.326 ms) : 0, 9326
IAST [baseline] (19.27 ms) : 0, 19270
IAST [candidate] (18.736 ms) : 0, 18736
section iast
BytebuddyAgent [baseline] (801.325 ms) : 0, 801325
BytebuddyAgent [candidate] (797.712 ms) : 0, 797712
GlobalTracer [baseline] (290.753 ms) : 0, 290753
GlobalTracer [candidate] (291.878 ms) : 0, 291878
AppSec [baseline] (49.617 ms) : 0, 49617
AppSec [candidate] (50.564 ms) : 0, 50564
Remote Config [baseline] (572.541 µs) : 0, 573
Remote Config [candidate] (591.987 µs) : 0, 592
Telemetry [baseline] (8.087 ms) : 0, 8087
Telemetry [candidate] (8.287 ms) : 0, 8287
IAST [baseline] (24.393 ms) : 0, 24393
IAST [candidate] (23.154 ms) : 0, 23154
section profiling
BytebuddyAgent [baseline] (676.279 ms) : 0, 676279
BytebuddyAgent [candidate] (687.788 ms) : 0, 687788
GlobalTracer [baseline] (379.962 ms) : 0, 379962
GlobalTracer [candidate] (386.831 ms) : 0, 386831
AppSec [baseline] (50.014 ms) : 0, 50014
AppSec [candidate] (51.103 ms) : 0, 51103
Remote Config [baseline] (703.811 µs) : 0, 704
Remote Config [candidate] (717.664 µs) : 0, 718
Telemetry [baseline] (7.429 ms) : 0, 7429
Telemetry [candidate] (7.552 ms) : 0, 7552
ProfilingAgent [baseline] (95.455 ms) : 0, 95455
ProfilingAgent [candidate] (97.05 ms) : 0, 97050
Profiling [baseline] (95.478 ms) : 0, 95478
Profiling [candidate] (97.074 ms) : 0, 97074

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2024-04-25T20:55:07 2024-04-25T21:17:06
git_branch master piotr-wolski/add-protobuf-instrumentation-2
git_commit_date 1714049499 1714076754
git_commit_sha ae1c4c9 a2735d4
release_version 1.34.0-SNAPSHOT~ae1c4c9475 1.33.0-SNAPSHOT~a2735d4bea
start_time 2024-04-25T20:54:53 2024-04-25T21:16:53
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1714080173 1714080173
ci_job_id 497652818 497652818
ci_pipeline_id 32969118 32969118
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 13 metrics, 15 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
    dateFormat X
    axisFormat %s
section baseline
no_agent (375.289 µs) : 354, 396
.   : milestone, 375,
iast (488.846 µs) : 468, 510
.   : milestone, 489,
iast_FULL (547.607 µs) : 527, 569
.   : milestone, 548,
iast_GLOBAL (511.151 µs) : 488, 534
.   : milestone, 511,
iast_HARDCODED_SECRET_DISABLED (477.961 µs) : 456, 500
.   : milestone, 478,
iast_INACTIVE (455.838 µs) : 435, 477
.   : milestone, 456,
iast_TELEMETRY_OFF (479.616 µs) : 459, 500
.   : milestone, 480,
tracing (453.058 µs) : 433, 474
.   : milestone, 453,
section candidate
no_agent (377.385 µs) : 357, 398
.   : milestone, 377,
iast (483.211 µs) : 462, 505
.   : milestone, 483,
iast_FULL (545.65 µs) : 525, 567
.   : milestone, 546,
iast_GLOBAL (514.792 µs) : 492, 538
.   : milestone, 515,
iast_HARDCODED_SECRET_DISABLED (484.614 µs) : 463, 507
.   : milestone, 485,
iast_INACTIVE (454.983 µs) : 434, 476
.   : milestone, 455,
iast_TELEMETRY_OFF (480.685 µs) : 459, 502
.   : milestone, 481,
tracing (454.05 µs) : 433, 475
.   : milestone, 454,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 375.289 µs [354.384 µs, 396.194 µs] -
iast 488.846 µs [467.965 µs, 509.728 µs] 113.558 µs (30.3%)
iast_FULL 547.607 µs [526.629 µs, 568.586 µs] 172.319 µs (45.9%)
iast_GLOBAL 511.151 µs [488.256 µs, 534.046 µs] 135.862 µs (36.2%)
iast_HARDCODED_SECRET_DISABLED 477.961 µs [456.398 µs, 499.525 µs] 102.672 µs (27.4%)
iast_INACTIVE 455.838 µs [434.545 µs, 477.13 µs] 80.549 µs (21.5%)
iast_TELEMETRY_OFF 479.616 µs [459.066 µs, 500.167 µs] 104.328 µs (27.8%)
tracing 453.058 µs [432.607 µs, 473.509 µs] 77.769 µs (20.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 377.385 µs [356.88 µs, 397.891 µs] -
iast 483.211 µs [461.681 µs, 504.74 µs] 105.825 µs (28.0%)
iast_FULL 545.65 µs [524.75 µs, 566.551 µs] 168.265 µs (44.6%)
iast_GLOBAL 514.792 µs [491.727 µs, 537.857 µs] 137.407 µs (36.4%)
iast_HARDCODED_SECRET_DISABLED 484.614 µs [462.645 µs, 506.582 µs] 107.228 µs (28.4%)
iast_INACTIVE 454.983 µs [434.374 µs, 475.592 µs] 77.598 µs (20.6%)
iast_TELEMETRY_OFF 480.685 µs [459.27 µs, 502.1 µs] 103.299 µs (27.4%)
tracing 454.05 µs [433.081 µs, 475.019 µs] 76.665 µs (20.3%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.34 ms) : 1321, 1359
.   : milestone, 1340,
appsec (1.73 ms) : 1705, 1755
.   : milestone, 1730,
appsec_no_iast (1.727 ms) : 1702, 1753
.   : milestone, 1727,
iast (1.49 ms) : 1467, 1513
.   : milestone, 1490,
profiling (1.518 ms) : 1492, 1544
.   : milestone, 1518,
tracing (1.5 ms) : 1476, 1524
.   : milestone, 1500,
section candidate
no_agent (1.358 ms) : 1339, 1377
.   : milestone, 1358,
appsec (1.754 ms) : 1729, 1778
.   : milestone, 1754,
appsec_no_iast (1.742 ms) : 1718, 1766
.   : milestone, 1742,
iast (1.534 ms) : 1511, 1557
.   : milestone, 1534,
profiling (1.498 ms) : 1473, 1523
.   : milestone, 1498,
tracing (1.506 ms) : 1481, 1530
.   : milestone, 1506,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.34 ms [1.321 ms, 1.359 ms] -
appsec 1.73 ms [1.705 ms, 1.755 ms] 390.717 µs (29.2%)
appsec_no_iast 1.727 ms [1.702 ms, 1.753 ms] 387.645 µs (28.9%)
iast 1.49 ms [1.467 ms, 1.513 ms] 150.266 µs (11.2%)
profiling 1.518 ms [1.492 ms, 1.544 ms] 178.51 µs (13.3%)
tracing 1.5 ms [1.476 ms, 1.524 ms] 160.466 µs (12.0%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.358 ms [1.339 ms, 1.377 ms] -
appsec 1.754 ms [1.729 ms, 1.778 ms] 395.739 µs (29.1%)
appsec_no_iast 1.742 ms [1.718 ms, 1.766 ms] 384.072 µs (28.3%)
iast 1.534 ms [1.511 ms, 1.557 ms] 175.584 µs (12.9%)
profiling 1.498 ms [1.473 ms, 1.523 ms] 139.457 µs (10.3%)
tracing 1.506 ms [1.481 ms, 1.53 ms] 147.494 µs (10.9%)

Dacapo

@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-protobuf-instrumentation-2 branch from abcbd6c to 36f81bd Compare April 1, 2024 20:04
@piochelepiotr piochelepiotr changed the title Piotr wolski/add protobuf instrumentation 2 Instrument Google protobuf Apr 1, 2024
@piochelepiotr piochelepiotr marked this pull request as ready for review April 1, 2024 20:10
@piochelepiotr piochelepiotr requested a review from a team as a code owner April 1, 2024 20:10
AgentSpan span = scope.span();
DESERIALIZER_DECORATOR.onError(span, throwable);
if (message instanceof AbstractMessage) {
DESERIALIZER_DECORATOR.attachSchemaOnSpan((AbstractMessage) message, span);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like work that would probably be better done in a background thread.
There is a mechanism for doing that, but I think it could use some improvement.

For now, I'm fine with this as is - as long as it is not on by default.

span.setTag(DDTags.SCHEMA_TYPE, PROTOBUF);
span.setTag(DDTags.SCHEMA_NAME, descriptor.getName());
span.setTag(DDTags.SCHEMA_OPERATION, operation);
Integer prio = span.forceSamplingDecision();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There probably needs to a product specific code for why the decision was forced.
You should check with APM ingestion.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, if a customer pays for the ingestion of a span, we want to be able to tell why it was ingested (it’s also pretty useful for troubleshooting). The tracer set a specific attribute called _dd.p.dm depending on the sampling mechanism (see all potential values that exist).

Questions:

  • When you override the sampling decision, do you just want the span or the whole trace?
  • Do you expect a lot of spans coming in? Once per tracer process maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked a bit more the code, it doesn’t look like forceSamplingPriority is going to increase sampling rate, is that correct?
It looks like it’s going to force the sampling decision to be taken earlier.
What I can do, is reverse the two samplings.
Today, the code extracts only 1 schema every 30 seconds, so I can do that sampling first. That way, the change would only affect the sampling decision of 1 trace every 30 seconds, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched the two sampling calls.
Now the forceSamplingDecision is only rarely called (a few times every 30 seconds, 1 time in the best case depending on race conditions).
Also, forceSamplingDecision doesn't force the trace to be sampled. It forces the sampling decision to be made.
So this should not increase sampling rate.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes sense, I had not understood that initially. If you don't influence sampling I don't think we need to do anything at intake. For accomplishing what you want I am not knowledgeable enough with the tracer itself.

public static String extractSchema(Descriptor descriptor) {
List<String> schemas = new ArrayList<>();
extractDescriptorSchema(descriptor, schemas);
return "{"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not properly escaped

}

private static void extractDescriptorSchema(Descriptor descriptor, List<String> schemas) {
StringBuilder schema = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not properly escaped - you should probably use a JSON API instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's ok to add the dependency to JSON API?
It would make the code easier indeed. But add a dependency

Copy link
Contributor

@dougqh dougqh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns that JSON is not done using JSON API and is not properly escaped.
I also have concerns with the altering of the sampling decision.
When altering the sampling decision, I believe we should be adding information about which product made that decision. We should check that with APM ingestion.

span.setTag(DDTags.SCHEMA_TYPE, PROTOBUF);
span.setTag(DDTags.SCHEMA_NAME, descriptor.getName());
span.setTag(DDTags.SCHEMA_OPERATION, operation);
Integer prio = span.forceSamplingDecision();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, if a customer pays for the ingestion of a span, we want to be able to tell why it was ingested (it’s also pretty useful for troubleshooting). The tracer set a specific attribute called _dd.p.dm depending on the sampling mechanism (see all potential values that exist).

Questions:

  • When you override the sampling decision, do you just want the span or the whole trace?
  • Do you expect a lot of spans coming in? Once per tracer process maybe?

Comment on lines 89 to 90
extractSchema(field.getMessageType(), builder);
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will end up in a StackOverflowError in case there are cycles in the schema, see the example:

message DummyMessage {
    int32 id = 1;
    string name = 2;
    DummyMessage nested = 3;
}

A cache of already processed types will help here and the SchemaBuilder should be able to handle cycles. Optionally, we might want to include a limit in the recursion depth and also number of elements visited to deal with problematic/big schemas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Will update the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code & added limits. Thanks for the comment!

@pr-commenter
Copy link

pr-commenter bot commented Apr 5, 2024

Kafka / producer-benchmark

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master piotr-wolski/add-protobuf-instrumentation-2
git_commit_date 1712235284 1714076754
git_commit_sha 734e3c5 a2735d4
See matching parameters
Baseline Candidate
ci_job_date 1714077994 1714077994
ci_job_id 497652819 497652819
ci_pipeline_id 32969118 32969118
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion 11.0.21 11.0.21
jmhVersion 1.36 1.36
jvm /usr/lib/jvm/java-11-openjdk-amd64/bin/java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName OpenJDK 64-Bit Server VM OpenJDK 64-Bit Server VM
vmVersion 11.0.21+9-post-Ubuntu-0ubuntu122.04 11.0.21+9-post-Ubuntu-0ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean throughput
scenario:not-instrumented/KafkaProduceBenchmark.benchProduce same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaProduceBenchmark.benchProduce same
scenario:only-tracing-dsm-enabled-benchmarks/KafkaProduceBenchmark.benchProduce unsure
[-4697.552op/s; -489.341op/s] or [-3.829%; -0.399%]

@pr-commenter
Copy link

pr-commenter bot commented Apr 5, 2024

Kafka / consumer-benchmark

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master piotr-wolski/add-protobuf-instrumentation-2
git_commit_date 1712235284 1714076754
git_commit_sha 734e3c5 a2735d4
See matching parameters
Baseline Candidate
ci_job_date 1714078031 1714078031
ci_job_id 497652820 497652820
ci_pipeline_id 32969118 32969118
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion 11.0.21 11.0.21
jmhVersion 1.36 1.36
jvm /usr/lib/jvm/java-11-openjdk-amd64/bin/java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName OpenJDK 64-Bit Server VM OpenJDK 64-Bit Server VM
vmVersion 11.0.21+9-post-Ubuntu-0ubuntu122.04 11.0.21+9-post-Ubuntu-0ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean throughput
scenario:not-instrumented/KafkaConsumerBenchmark.benchConsume same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaConsumerBenchmark.benchConsume same
scenario:only-tracing-dsm-enabled-benchmarks/KafkaConsumerBenchmark.benchConsume same

Copy link

@dumontg dumontg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for APM Trace Intake. Since this change doesn't change the sampling whatsoever, there is no need to add new ingestion reasons.

group = "com.google.protobuf"
module = "protobuf-java"
versions = "[3.0.0,)"
assertInverse = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason not to test that the instrumentation is disabled on versions < 3?


static final String instrumentationName = "protobuf";
static final String TARGET_TYPE = "com.google.protobuf.AbstractMessage";
static final String SERIALIZE = "serialize";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may we use a more specific operation name? something like protobuf.serialize is probably more inline with the naming we are used to. I had a check and I did not find any otel prio art to get aligned to. Given this, you can also use Utf8ByteString here to cache utf8 conversions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the name, for some reason I can't explain, it doesn't work if I switch to Utf8ByteString 🤷

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have tests failing is probably because they are comparing String with CharSequence that's not the same thing.
So

String s = "test"
CharSequence cs = UTF8ByteString.create("test")

s.equals(cs) -> false
s.contentEquals(cs) -> true
s.equals(cs.toString()) -> true

new org.apache.kafka.clients.producer.KafkaProducer<>(properties);

Duration duration = Duration.newBuilder().setSeconds(10).build();
System.out.println("duration is " + duration.getSeconds());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a logger should be used to print out this

@@ -44,4 +44,6 @@ public class InternalSpanTypes {
UTF8BytesString.create(DDSpanTypes.TEST_SESSION_END);
public static final UTF8BytesString VULNERABILITY =
UTF8BytesString.create(DDSpanTypes.VULNERABILITY);
public static final UTF8BytesString SERIALIZE = UTF8BytesString.create(DDSpanTypes.SERIALIZE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I would have been used protobuf as span type since serialize/deserialize are rather operations

@@ -777,6 +777,7 @@ private int appendLeaf(int dataIndex, String key, int keyIndex, int value) {

/** Generates Java source for a trie described as a series of "{number} {class-name}" lines. */
public static class JavaGenerator {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the newline should be removed since that file should not be part of the changeset

@@ -38,6 +38,7 @@ compileJava.dependsOn 'generateClassNameTries'
packageSources.dependsOn 'generateClassNameTries'
sourcesJar.dependsOn 'generateClassNameTries'


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this file from the changeset


static final String instrumentationName = "protobuf";
static final String TARGET_TYPE = "com.google.protobuf.AbstractParser";
static final String DESERIALIZE = "deserialize";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have same naming concern than "serialize"

if (weight == 0) {
return;
}
String schema = SchemaExtractor.extractSchemas(descriptor);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to me a quite expensive operation. Can we cache this? The same for the schemaId being a hash depending on the schema name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cached based on the schema name, but I'm worried the schema changes, and the schema name doesn't 🤔
Do you have a better idea on how to cache?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a schema be dynamically changed at runtime and what is the use case?

import net.bytebuddy.matcher.ElementMatcher;

@AutoService(InstrumenterModule.class)
public final class AbstractMessageInstrumentation extends InstrumenterModule.Tracing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerning the instrumentation, do we want to systematically trace all the ser/de operations or having this couple of instrumentations enabled by default only when datastreams is enabled? I'm wondering if we're adding too many details for the non dsm users

@piochelepiotr piochelepiotr merged commit 2a8627d into master Apr 26, 2024
79 checks passed
@piochelepiotr piochelepiotr deleted the piotr-wolski/add-protobuf-instrumentation-2 branch April 26, 2024 21:51
@github-actions github-actions bot added this to the 1.34.0 milestone Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants