Instrument Google protobuf #6865

piochelepiotr · 2024-04-01T19:31:06Z

What Does This Do

Add protobuf instrumentation for serialize / deserialize operations.
If data streams monitoring is enabled, capture schemas of messages.

Motivation

Additional Notes

It impacts the gRPC instrumentation, since it adds spans for serialize / deserialize.

Jira ticket: [PROJ-IDENT]

pr-commenter · 2024-04-01T19:53:45Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	piotr-wolski/add-protobuf-instrumentation-2
git_commit_date	1714049499	1714076754
git_commit_sha	`ae1c4c9`	`a2735d4`
release_version	1.34.0-SNAPSHOT~ae1c4c9475	1.33.0-SNAPSHOT~a2735d4bea

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1714080173	1714080173
ci_job_id	497652818	497652818
ci_pipeline_id	32969118	32969118
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module	Agent	Agent
parent	None	None
variant	iast	iast

Summary

Found 1 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 16 unstable metrics.

scenario	Δ mean execution_time	candidate mean execution_time	baseline mean execution_time
scenario:startup:petclinic:appsec:IAST	better [-667.517µs; -400.323µs] or [-3.464%; -2.077%]	18.736ms	19.270ms

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.083 s) : 0, 1083467
Total [baseline] (8.563 s) : 0, 8562724
Agent [candidate] (1.079 s) : 0, 1079449
Total [candidate] (8.568 s) : 0, 8568070
section iast
Agent [baseline] (1.207 s) : 0, 1207241
Total [baseline] (9.072 s) : 0, 9072296
Agent [candidate] (1.2 s) : 0, 1199540
Total [candidate] (9.075 s) : 0, 9074504
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.209 s) : 0, 1209035
Total [baseline] (9.062 s) : 0, 9062063
Agent [candidate] (1.203 s) : 0, 1203485
Total [candidate] (9.036 s) : 0, 9036482
section iast_TELEMETRY_OFF
Agent [baseline] (1.195 s) : 0, 1195209
Total [baseline] (9.02 s) : 0, 9019868
Agent [candidate] (1.21 s) : 0, 1209606
Total [candidate] (9.032 s) : 0, 9032204

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.083 s	-
Agent	iast	1.207 s	123.774 ms (11.4%)
Agent	iast_HARDCODED_SECRET_DISABLED	1.209 s	125.568 ms (11.6%)
Agent	iast_TELEMETRY_OFF	1.195 s	111.741 ms (10.3%)
Total	tracing	8.563 s	-
Total	iast	9.072 s	509.572 ms (6.0%)
Total	iast_HARDCODED_SECRET_DISABLED	9.062 s	499.339 ms (5.8%)
Total	iast_TELEMETRY_OFF	9.02 s	457.144 ms (5.3%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.079 s	-
Agent	iast	1.2 s	120.091 ms (11.1%)
Agent	iast_HARDCODED_SECRET_DISABLED	1.203 s	124.036 ms (11.5%)
Agent	iast_TELEMETRY_OFF	1.21 s	130.156 ms (12.1%)
Total	tracing	8.568 s	-
Total	iast	9.075 s	506.435 ms (5.9%)
Total	iast_HARDCODED_SECRET_DISABLED	9.036 s	468.413 ms (5.5%)
Total	iast_TELEMETRY_OFF	9.032 s	464.134 ms (5.4%)

gantt
    title insecure-bank - break down per module: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (678.598 ms) : 0, 678598
BytebuddyAgent [candidate] (675.76 ms) : 0, 675760
GlobalTracer [baseline] (311.991 ms) : 0, 311991
GlobalTracer [candidate] (311.28 ms) : 0, 311280
AppSec [baseline] (49.902 ms) : 0, 49902
AppSec [candidate] (49.675 ms) : 0, 49675
Remote Config [baseline] (680.469 µs) : 0, 680
Remote Config [candidate] (663.906 µs) : 0, 664
Telemetry [baseline] (7.63 ms) : 0, 7630
Telemetry [candidate] (7.679 ms) : 0, 7679
section iast
BytebuddyAgent [baseline] (798.143 ms) : 0, 798143
BytebuddyAgent [candidate] (795.199 ms) : 0, 795199
GlobalTracer [baseline] (290.877 ms) : 0, 290877
GlobalTracer [candidate] (289.161 ms) : 0, 289161
AppSec [baseline] (51.12 ms) : 0, 51120
AppSec [candidate] (49.714 ms) : 0, 49714
Remote Config [baseline] (598.567 µs) : 0, 599
Remote Config [candidate] (602.396 µs) : 0, 602
Telemetry [baseline] (7.505 ms) : 0, 7505
Telemetry [candidate] (8.211 ms) : 0, 8211
IAST [baseline] (24.54 ms) : 0, 24540
IAST [candidate] (22.252 ms) : 0, 22252
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (801.07 ms) : 0, 801070
BytebuddyAgent [candidate] (796.904 ms) : 0, 796904
GlobalTracer [baseline] (290.242 ms) : 0, 290242
GlobalTracer [candidate] (290.383 ms) : 0, 290383
AppSec [baseline] (48.861 ms) : 0, 48861
AppSec [candidate] (50.8 ms) : 0, 50800
Remote Config [baseline] (601.595 µs) : 0, 602
Remote Config [candidate] (594.75 µs) : 0, 595
Telemetry [baseline] (9.048 ms) : 0, 9048
Telemetry [candidate] (7.412 ms) : 0, 7412
IAST [baseline] (24.689 ms) : 0, 24689
IAST [candidate] (22.842 ms) : 0, 22842
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (791.953 ms) : 0, 791953
BytebuddyAgent [candidate] (800.679 ms) : 0, 800679
GlobalTracer [baseline] (287.83 ms) : 0, 287830
GlobalTracer [candidate] (291.877 ms) : 0, 291877
AppSec [baseline] (51.305 ms) : 0, 51305
AppSec [candidate] (50.175 ms) : 0, 50175
Remote Config [baseline] (580.553 µs) : 0, 581
Remote Config [candidate] (589.47 µs) : 0, 589
Telemetry [baseline] (6.594 ms) : 0, 6594
Telemetry [candidate] (8.87 ms) : 0, 8870
IAST [baseline] (22.607 ms) : 0, 22607
IAST [candidate] (22.806 ms) : 0, 22806

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.089 s) : 0, 1089264
Total [baseline] (10.468 s) : 0, 10467543
Agent [candidate] (1.083 s) : 0, 1083056
Total [candidate] (10.384 s) : 0, 10384124
section appsec
Agent [baseline] (1.198 s) : 0, 1197836
Total [baseline] (10.445 s) : 0, 10444923
Agent [candidate] (1.2 s) : 0, 1200436
Total [candidate] (10.467 s) : 0, 10467159
section iast
Agent [baseline] (1.209 s) : 0, 1209382
Total [baseline] (10.747 s) : 0, 10747104
Agent [candidate] (1.207 s) : 0, 1206501
Total [candidate] (10.828 s) : 0, 10827530
section profiling
Agent [baseline] (1.266 s) : 0, 1266154
Total [baseline] (10.656 s) : 0, 10655967
Agent [candidate] (1.288 s) : 0, 1288114
Total [candidate] (10.662 s) : 0, 10661545

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.089 s	-
Agent	appsec	1.198 s	108.571 ms (10.0%)
Agent	iast	1.209 s	120.118 ms (11.0%)
Agent	profiling	1.266 s	176.89 ms (16.2%)
Total	tracing	10.468 s	-
Total	appsec	10.445 s	-22.62 ms (-0.2%)
Total	iast	10.747 s	279.56 ms (2.7%)
Total	profiling	10.656 s	188.424 ms (1.8%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.083 s	-
Agent	appsec	1.2 s	117.38 ms (10.8%)
Agent	iast	1.207 s	123.446 ms (11.4%)
Agent	profiling	1.288 s	205.058 ms (18.9%)
Total	tracing	10.384 s	-
Total	appsec	10.467 s	83.035 ms (0.8%)
Total	iast	10.828 s	443.406 ms (4.3%)
Total	profiling	10.662 s	277.421 ms (2.7%)

gantt
    title petclinic - break down per module: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.652 ms) : 0, 682652
BytebuddyAgent [candidate] (679.311 ms) : 0, 679311
GlobalTracer [baseline] (313.421 ms) : 0, 313421
GlobalTracer [candidate] (311.394 ms) : 0, 311394
AppSec [baseline] (50.04 ms) : 0, 50040
AppSec [candidate] (49.494 ms) : 0, 49494
Remote Config [baseline] (665.771 µs) : 0, 666
Remote Config [candidate] (652.991 µs) : 0, 653
Telemetry [baseline] (7.697 ms) : 0, 7697
Telemetry [candidate] (7.567 ms) : 0, 7567
section appsec
BytebuddyAgent [baseline] (694.802 ms) : 0, 694802
BytebuddyAgent [candidate] (695.775 ms) : 0, 695775
GlobalTracer [baseline] (291.115 ms) : 0, 291115
GlobalTracer [candidate] (291.968 ms) : 0, 291968
AppSec [baseline] (149.145 ms) : 0, 149145
AppSec [candidate] (149.686 ms) : 0, 149686
Remote Config [baseline] (609.168 µs) : 0, 609
Remote Config [candidate] (595.657 µs) : 0, 596
Telemetry [baseline] (8.666 ms) : 0, 8666
Telemetry [candidate] (9.326 ms) : 0, 9326
IAST [baseline] (19.27 ms) : 0, 19270
IAST [candidate] (18.736 ms) : 0, 18736
section iast
BytebuddyAgent [baseline] (801.325 ms) : 0, 801325
BytebuddyAgent [candidate] (797.712 ms) : 0, 797712
GlobalTracer [baseline] (290.753 ms) : 0, 290753
GlobalTracer [candidate] (291.878 ms) : 0, 291878
AppSec [baseline] (49.617 ms) : 0, 49617
AppSec [candidate] (50.564 ms) : 0, 50564
Remote Config [baseline] (572.541 µs) : 0, 573
Remote Config [candidate] (591.987 µs) : 0, 592
Telemetry [baseline] (8.087 ms) : 0, 8087
Telemetry [candidate] (8.287 ms) : 0, 8287
IAST [baseline] (24.393 ms) : 0, 24393
IAST [candidate] (23.154 ms) : 0, 23154
section profiling
BytebuddyAgent [baseline] (676.279 ms) : 0, 676279
BytebuddyAgent [candidate] (687.788 ms) : 0, 687788
GlobalTracer [baseline] (379.962 ms) : 0, 379962
GlobalTracer [candidate] (386.831 ms) : 0, 386831
AppSec [baseline] (50.014 ms) : 0, 50014
AppSec [candidate] (51.103 ms) : 0, 51103
Remote Config [baseline] (703.811 µs) : 0, 704
Remote Config [candidate] (717.664 µs) : 0, 718
Telemetry [baseline] (7.429 ms) : 0, 7429
Telemetry [candidate] (7.552 ms) : 0, 7552
ProfilingAgent [baseline] (95.455 ms) : 0, 95455
ProfilingAgent [candidate] (97.05 ms) : 0, 97050
Profiling [baseline] (95.478 ms) : 0, 95478
Profiling [candidate] (97.074 ms) : 0, 97074

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
end_time	2024-04-25T20:55:07	2024-04-25T21:17:06
git_branch	master	piotr-wolski/add-protobuf-instrumentation-2
git_commit_date	1714049499	1714076754
git_commit_sha	`ae1c4c9`	`a2735d4`
release_version	1.34.0-SNAPSHOT~ae1c4c9475	1.33.0-SNAPSHOT~a2735d4bea
start_time	2024-04-25T20:54:53	2024-04-25T21:16:53

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1714080173	1714080173
ci_job_id	497652818	497652818
ci_pipeline_id	32969118	32969118
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant	iast	iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 13 metrics, 15 unstable metrics.

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
    dateFormat X
    axisFormat %s
section baseline
no_agent (375.289 µs) : 354, 396
.   : milestone, 375,
iast (488.846 µs) : 468, 510
.   : milestone, 489,
iast_FULL (547.607 µs) : 527, 569
.   : milestone, 548,
iast_GLOBAL (511.151 µs) : 488, 534
.   : milestone, 511,
iast_HARDCODED_SECRET_DISABLED (477.961 µs) : 456, 500
.   : milestone, 478,
iast_INACTIVE (455.838 µs) : 435, 477
.   : milestone, 456,
iast_TELEMETRY_OFF (479.616 µs) : 459, 500
.   : milestone, 480,
tracing (453.058 µs) : 433, 474
.   : milestone, 453,
section candidate
no_agent (377.385 µs) : 357, 398
.   : milestone, 377,
iast (483.211 µs) : 462, 505
.   : milestone, 483,
iast_FULL (545.65 µs) : 525, 567
.   : milestone, 546,
iast_GLOBAL (514.792 µs) : 492, 538
.   : milestone, 515,
iast_HARDCODED_SECRET_DISABLED (484.614 µs) : 463, 507
.   : milestone, 485,
iast_INACTIVE (454.983 µs) : 434, 476
.   : milestone, 455,
iast_TELEMETRY_OFF (480.685 µs) : 459, 502
.   : milestone, 481,
tracing (454.05 µs) : 433, 475
.   : milestone, 454,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	375.289 µs [354.384 µs, 396.194 µs]	-
iast	488.846 µs [467.965 µs, 509.728 µs]	113.558 µs (30.3%)
iast_FULL	547.607 µs [526.629 µs, 568.586 µs]	172.319 µs (45.9%)
iast_GLOBAL	511.151 µs [488.256 µs, 534.046 µs]	135.862 µs (36.2%)
iast_HARDCODED_SECRET_DISABLED	477.961 µs [456.398 µs, 499.525 µs]	102.672 µs (27.4%)
iast_INACTIVE	455.838 µs [434.545 µs, 477.13 µs]	80.549 µs (21.5%)
iast_TELEMETRY_OFF	479.616 µs [459.066 µs, 500.167 µs]	104.328 µs (27.8%)
tracing	453.058 µs [432.607 µs, 473.509 µs]	77.769 µs (20.7%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	377.385 µs [356.88 µs, 397.891 µs]	-
iast	483.211 µs [461.681 µs, 504.74 µs]	105.825 µs (28.0%)
iast_FULL	545.65 µs [524.75 µs, 566.551 µs]	168.265 µs (44.6%)
iast_GLOBAL	514.792 µs [491.727 µs, 537.857 µs]	137.407 µs (36.4%)
iast_HARDCODED_SECRET_DISABLED	484.614 µs [462.645 µs, 506.582 µs]	107.228 µs (28.4%)
iast_INACTIVE	454.983 µs [434.374 µs, 475.592 µs]	77.598 µs (20.6%)
iast_TELEMETRY_OFF	480.685 µs [459.27 µs, 502.1 µs]	103.299 µs (27.4%)
tracing	454.05 µs [433.081 µs, 475.019 µs]	76.665 µs (20.3%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.34 ms) : 1321, 1359
.   : milestone, 1340,
appsec (1.73 ms) : 1705, 1755
.   : milestone, 1730,
appsec_no_iast (1.727 ms) : 1702, 1753
.   : milestone, 1727,
iast (1.49 ms) : 1467, 1513
.   : milestone, 1490,
profiling (1.518 ms) : 1492, 1544
.   : milestone, 1518,
tracing (1.5 ms) : 1476, 1524
.   : milestone, 1500,
section candidate
no_agent (1.358 ms) : 1339, 1377
.   : milestone, 1358,
appsec (1.754 ms) : 1729, 1778
.   : milestone, 1754,
appsec_no_iast (1.742 ms) : 1718, 1766
.   : milestone, 1742,
iast (1.534 ms) : 1511, 1557
.   : milestone, 1534,
profiling (1.498 ms) : 1473, 1523
.   : milestone, 1498,
tracing (1.506 ms) : 1481, 1530
.   : milestone, 1506,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.34 ms [1.321 ms, 1.359 ms]	-
appsec	1.73 ms [1.705 ms, 1.755 ms]	390.717 µs (29.2%)
appsec_no_iast	1.727 ms [1.702 ms, 1.753 ms]	387.645 µs (28.9%)
iast	1.49 ms [1.467 ms, 1.513 ms]	150.266 µs (11.2%)
profiling	1.518 ms [1.492 ms, 1.544 ms]	178.51 µs (13.3%)
tracing	1.5 ms [1.476 ms, 1.524 ms]	160.466 µs (12.0%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.358 ms [1.339 ms, 1.377 ms]	-
appsec	1.754 ms [1.729 ms, 1.778 ms]	395.739 µs (29.1%)
appsec_no_iast	1.742 ms [1.718 ms, 1.766 ms]	384.072 µs (28.3%)
iast	1.534 ms [1.511 ms, 1.557 ms]	175.584 µs (12.9%)
profiling	1.498 ms [1.473 ms, 1.523 ms]	139.457 µs (10.3%)
tracing	1.506 ms [1.481 ms, 1.53 ms]	147.494 µs (10.9%)

Dacapo

dougqh · 2024-04-02T14:39:03Z

...src/main/java/datadog/trace/instrumentation/protobuf_java/AbstractParserInstrumentation.java

+      AgentSpan span = scope.span();
+      DESERIALIZER_DECORATOR.onError(span, throwable);
+      if (message instanceof AbstractMessage) {
+        DESERIALIZER_DECORATOR.attachSchemaOnSpan((AbstractMessage) message, span);


This seems like work that would probably be better done in a background thread.
There is a mechanism for doing that, but I think it could use some improvement.

For now, I'm fine with this as is - as long as it is not on by default.

dougqh · 2024-04-02T14:40:40Z

...tion/protobuf-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/Decorator.java

+    span.setTag(DDTags.SCHEMA_TYPE, PROTOBUF);
+    span.setTag(DDTags.SCHEMA_NAME, descriptor.getName());
+    span.setTag(DDTags.SCHEMA_OPERATION, operation);
+    Integer prio = span.forceSamplingDecision();


There probably needs to be a product-specific code for why the decision was forced.
You should check with APM ingestion.

Indeed, if a customer pays for the ingestion of a span, we want to be able to tell why it was ingested (it’s also pretty useful for troubleshooting). The tracer sets a specific attribute called _dd.p.dm depending on the sampling mechanism (see all potential values that exist).

Questions:

When you override the sampling decision, do you just want the span or the whole trace?

Do you expect a lot of spans coming in? Once per tracer process maybe?

I looked at the code a bit more, and it doesn’t look like forceSamplingDecision is going to increase the sampling rate. Is that correct?
It looks like it only forces the sampling decision to be made earlier.
What I can do is reverse the order of the two sampling steps.
Today, the code extracts only 1 schema every 30 seconds, so I can do that sampling first. That way, the change would only affect the sampling decision of 1 trace every 30 seconds. What do you think?

I switched the two sampling calls.
Now forceSamplingDecision is called only rarely (a few times every 30 seconds; once in the best case, depending on race conditions).
Also, forceSamplingDecision doesn't force the trace to be sampled. It only forces the sampling decision to be made.
So this should not increase the sampling rate.

Ah, that makes sense; I had not understood that initially. If you don't influence sampling, I don't think we need to do anything at intake. As for how to accomplish what you want, I am not knowledgeable enough about the tracer itself.
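
The ordering agreed on in this thread (apply the cheap "one schema per 30 seconds" check first, and only touch the sampling decision for the rare span that passes it) can be sketched as below. This is a minimal illustration, not the PR's code: `IntervalSampler` and `trySample` are hypothetical names, and the compare-and-set is an assumption of this sketch that lets exactly one caller through per interval, whereas the thread notes that the actual code may let a few through because of races.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical time-based sampler: lets at most one caller through per interval. */
final class IntervalSampler {
  private final long intervalMillis;
  // Earliest timestamp at which the next sample may be taken.
  private final AtomicLong nextAllowedMillis = new AtomicLong(Long.MIN_VALUE);

  IntervalSampler(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  /** Returns true for at most one caller per interval, even under contention. */
  boolean trySample(long nowMillis) {
    long next = nextAllowedMillis.get();
    if (nowMillis < next) {
      return false; // cheap path, taken for almost every message
    }
    // Only the thread that wins the compare-and-set samples in this interval.
    return nextAllowedMillis.compareAndSet(next, nowMillis + intervalMillis);
  }
}
```

A caller would check `trySample(System.currentTimeMillis())` first and only then force the sampling decision and extract the schema, so the expensive path runs for at most one message per interval.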

dougqh · 2024-04-02T14:42:24Z

...-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/OpenAPIFormatExtractor.java

+  public static String extractSchema(Descriptor descriptor) {
+    List<String> schemas = new ArrayList<>();
+    extractDescriptorSchema(descriptor, schemas);
+    return "{"


This is not properly escaped

dougqh · 2024-04-02T14:42:49Z

...-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/OpenAPIFormatExtractor.java

+  }
+
+  private static void extractDescriptorSchema(Descriptor descriptor, List<String> schemas) {
+    StringBuilder schema = new StringBuilder();


This is not properly escaped - you should probably use a JSON API instead

Do you think it's OK to add a dependency on a JSON API?
It would indeed make the code simpler, but it adds a dependency.
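
To make the escaping concern concrete, here is a small sketch of what goes wrong when JSON is built by string concatenation, and what a hand-written escaper has to cover. `JsonEscapeDemo` and its methods are hypothetical and are not the code under review; a JSON library performs this escaping for you, which is the reviewer's point.

```java
final class JsonEscapeDemo {
  /** Escapes the characters that must be escaped inside a JSON string literal. */
  static String escape(String s) {
    StringBuilder out = new StringBuilder(s.length() + 2);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': out.append("\\\""); break;
        case '\\': out.append("\\\\"); break;
        case '\n': out.append("\\n"); break;
        case '\r': out.append("\\r"); break;
        case '\t': out.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Remaining control characters use the four-hex-digit escape form.
            out.append(String.format("\\u%04x", (int) c));
          } else {
            out.append(c);
          }
      }
    }
    return out.toString();
  }

  /** Naive concatenation: a quote in the value produces invalid JSON. */
  static String naive(String name) {
    return "{\"name\":\"" + name + "\"}";
  }

  /** Same document, with the value escaped first. */
  static String escaped(String name) {
    return "{\"name\":\"" + escape(name) + "\"}";
  }
}
```

For the input `a"b`, `naive` yields `{"name":"a"b"}`, which no JSON parser accepts, while `escaped` yields `{"name":"a\"b"}`.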

dougqh

I have concerns that JSON is not done using JSON API and is not properly escaped.
I also have concerns with the altering of the sampling decision.
When altering the sampling decision, I believe we should be adding information about which product made that decision. We should check that with APM ingestion.

dumontg · 2024-04-04T16:20:42Z

...tion/protobuf-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/Decorator.java

+    span.setTag(DDTags.SCHEMA_TYPE, PROTOBUF);
+    span.setTag(DDTags.SCHEMA_NAME, descriptor.getName());
+    span.setTag(DDTags.SCHEMA_OPERATION, operation);
+    Integer prio = span.forceSamplingDecision();


Indeed, if a customer pays for the ingestion of a span, we want to be able to tell why it was ingested (it’s also pretty useful for troubleshooting). The tracer sets a specific attribute called _dd.p.dm depending on the sampling mechanism (see all potential values that exist).

Questions:

When you override the sampling decision, do you just want the span or the whole trace?

Do you expect a lot of spans coming in? Once per tracer process maybe?

manuel-alvarez-alvarez · 2024-04-05T08:35:40Z

...rotobuf-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/SchemaExtractor.java

+        extractSchema(field.getMessageType(), builder);
+        break;


This code will end up in a StackOverflowError if there are cycles in the schema. For example:

message DummyMessage {
  int32 id = 1;
  string name = 2;
  DummyMessage nested = 3;
}

A cache of already processed types will help here, and the SchemaBuilder should be able to handle cycles. Optionally, we might want to include a limit on the recursion depth and on the number of elements visited, to deal with problematic or very large schemas.

Good catch! Will update the code

I updated the code & added limits. Thanks for the comment!
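
The fix discussed here (remember which types were already visited, and cap the recursion depth) can be sketched without the protobuf library. Everything below is hypothetical: `Type` is a stand-in for a message descriptor, and `MAX_DEPTH` is an assumed limit, not the value chosen in the PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CycleSafeWalk {
  /** Hypothetical stand-in for a message descriptor: a name plus message-typed fields. */
  static final class Type {
    final String name;
    final List<Type> fields = new ArrayList<>();

    Type(String name) {
      this.name = name;
    }
  }

  static final int MAX_DEPTH = 10; // assumed limit, for illustration only

  /** Returns the names of all types reachable from root, each listed once. */
  static List<String> collect(Type root) {
    List<String> out = new ArrayList<>();
    walk(root, 0, new HashSet<>(), out);
    return out;
  }

  private static void walk(Type t, int depth, Set<String> seen, List<String> out) {
    // Stop on deep nesting, or when this type was already processed (a cycle or a repeat).
    if (depth > MAX_DEPTH || !seen.add(t.name)) {
      return;
    }
    out.add(t.name);
    for (Type field : t.fields) {
      walk(field, depth + 1, seen, out);
    }
  }
}
```

With a self-referencing type like `DummyMessage` above, the walk visits the type once and returns instead of recursing until the stack overflows.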

pr-commenter · 2024-04-05T18:19:02Z

Kafka / producer-benchmark

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	piotr-wolski/add-protobuf-instrumentation-2
git_commit_date	1712235284	1714076754
git_commit_sha	`734e3c5`	`a2735d4`

See matching parameters

	Baseline	Candidate
ci_job_date	1714077994	1714077994
ci_job_id	497652819	497652819
ci_pipeline_id	32969118	32969118
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion	11.0.21	11.0.21
jmhVersion	1.36	1.36
jvm	/usr/lib/jvm/java-11-openjdk-amd64/bin/java	/usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName	OpenJDK 64-Bit Server VM	OpenJDK 64-Bit Server VM
vmVersion	11.0.21+9-post-Ubuntu-0ubuntu122.04	11.0.21+9-post-Ubuntu-0ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results

scenario	Δ mean throughput
scenario:not-instrumented/KafkaProduceBenchmark.benchProduce	same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaProduceBenchmark.benchProduce	same
scenario:only-tracing-dsm-enabled-benchmarks/KafkaProduceBenchmark.benchProduce	unsure [-4697.552op/s; -489.341op/s] or [-3.829%; -0.399%]

pr-commenter · 2024-04-05T18:32:04Z

Kafka / consumer-benchmark

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	piotr-wolski/add-protobuf-instrumentation-2
git_commit_date	1712235284	1714076754
git_commit_sha	`734e3c5`	`a2735d4`

See matching parameters

	Baseline	Candidate
ci_job_date	1714078031	1714078031
ci_job_id	497652820	497652820
ci_pipeline_id	32969118	32969118
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion	11.0.21	11.0.21
jmhVersion	1.36	1.36
jvm	/usr/lib/jvm/java-11-openjdk-amd64/bin/java	/usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName	OpenJDK 64-Bit Server VM	OpenJDK 64-Bit Server VM
vmVersion	11.0.21+9-post-Ubuntu-0ubuntu122.04	11.0.21+9-post-Ubuntu-0ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results

scenario	Δ mean throughput
scenario:not-instrumented/KafkaConsumerBenchmark.benchConsume	same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaConsumerBenchmark.benchConsume	same
scenario:only-tracing-dsm-enabled-benchmarks/KafkaConsumerBenchmark.benchConsume	same

dumontg

Approving for APM Trace Intake. Since this change doesn't change the sampling whatsoever, there is no need to add new ingestion reasons.

amarziali · 2024-04-10T12:47:33Z

dd-java-agent/instrumentation/protobuf-3.0.0/build.gradle

+    group = "com.google.protobuf"
+    module = "protobuf-java"
+    versions = "[3.0.0,)"
+    assertInverse = false


any particular reason not to test that the instrumentation is disabled on versions < 3?

amarziali · 2024-04-10T12:52:31Z

...rc/main/java/datadog/trace/instrumentation/protobuf_java/AbstractMessageInstrumentation.java

+
+  static final String instrumentationName = "protobuf";
+  static final String TARGET_TYPE = "com.google.protobuf.AbstractMessage";
+  static final String SERIALIZE = "serialize";


Could we use a more specific operation name? Something like protobuf.serialize is probably more in line with the naming we are used to. I checked and did not find any OTel prior art to align with. Given this, you can also use UTF8BytesString here to cache UTF-8 conversions.

Updated the name. For some reason I can't explain, it doesn't work if I switch to UTF8BytesString 🤷

If you have tests failing, it's probably because they are comparing a String with a CharSequence, which is not the same thing:

String s = "test";
CharSequence cs = UTF8BytesString.create("test");
s.equals(cs) -> false
s.contentEquals(cs) -> true
s.equals(cs.toString()) -> true

amarziali · 2024-04-10T13:02:25Z

...gistry/src/main/java/datadog/smoketest/datastreamskafka/KafkaProducerWithSchemaRegistry.java

        new org.apache.kafka.clients.producer.KafkaProducer<>(properties);

+    Duration duration = Duration.newBuilder().setSeconds(10).build();
+    System.out.println("duration is " + duration.getSeconds());


A logger should be used to print this out.

amarziali · 2024-04-10T13:05:13Z

internal-api/src/main/java/datadog/trace/bootstrap/instrumentation/api/InternalSpanTypes.java

@@ -44,4 +44,6 @@ public class InternalSpanTypes {
      UTF8BytesString.create(DDSpanTypes.TEST_SESSION_END);
  public static final UTF8BytesString VULNERABILITY =
      UTF8BytesString.create(DDSpanTypes.VULNERABILITY);
+  public static final UTF8BytesString SERIALIZE = UTF8BytesString.create(DDSpanTypes.SERIALIZE);


I would perhaps have used protobuf as the span type, since serialize/deserialize are operations rather than types.

amarziali · 2024-04-10T13:05:54Z

internal-api/src/main/java/datadog/trace/util/ClassNameTrie.java

@@ -777,6 +777,7 @@ private int appendLeaf(int dataIndex, String key, int keyIndex, int value) {

  /** Generates Java source for a trie described as a series of "{number} {class-name}" lines. */
  public static class JavaGenerator {
+


the newline should be removed since that file should not be part of the changeset

amarziali · 2024-04-10T13:08:45Z

dd-java-agent/agent-tooling/build.gradle

@@ -38,6 +38,7 @@ compileJava.dependsOn 'generateClassNameTries'
 packageSources.dependsOn 'generateClassNameTries'
 sourcesJar.dependsOn 'generateClassNameTries'

+


nit: remove this file from the changeset

amarziali · 2024-04-10T13:09:46Z

...src/main/java/datadog/trace/instrumentation/protobuf_java/AbstractParserInstrumentation.java

+
+  static final String instrumentationName = "protobuf";
+  static final String TARGET_TYPE = "com.google.protobuf.AbstractParser";
+  static final String DESERIALIZE = "deserialize";


I have the same naming concern as for "serialize".

amarziali · 2024-04-10T13:13:15Z

...tion/protobuf-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/Decorator.java

+    if (weight == 0) {
+      return;
+    }
+    String schema = SchemaExtractor.extractSchemas(descriptor);


This seems to me to be quite an expensive operation. Can we cache it? The same goes for the schemaId, which is a hash that depends on the schema name.

I cached based on the schema name, but I'm worried about the case where the schema changes and the schema name doesn't 🤔
Do you have a better idea for how to cache?

Can a schema be dynamically changed at runtime and what is the use case?
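
The caching idea in this thread can be sketched as a map from schema name to the extracted schema string, so the expensive extraction runs once per name. `SchemaCache`, `schemaFor`, and the extractor function are hypothetical names for illustration. The sketch also has the limitation raised above: if a schema changes while its name stays the same, the stale entry is served. It is unbounded as well, which a real cache would likely need to address.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class SchemaCache {
  private final ConcurrentHashMap<String, String> byName = new ConcurrentHashMap<>();
  private final Function<String, String> extractor;
  // Counts how often the expensive extraction actually ran (for illustration).
  final AtomicInteger extractions = new AtomicInteger();

  SchemaCache(Function<String, String> extractor) {
    this.extractor = extractor;
  }

  /** Returns the cached schema for the name, extracting it on first use only. */
  String schemaFor(String schemaName) {
    return byName.computeIfAbsent(
        schemaName,
        name -> {
          extractions.incrementAndGet();
          return extractor.apply(name);
        });
  }
}
```

Keying by a fully qualified name rather than a short name would reduce collisions between unrelated messages that share a simple name, though that choice is an assumption of this note.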

amarziali · 2024-04-11T11:38:36Z

...rc/main/java/datadog/trace/instrumentation/protobuf_java/AbstractMessageInstrumentation.java

+import net.bytebuddy.matcher.ElementMatcher;
+
+@AutoService(InstrumenterModule.class)
+public final class AbstractMessageInstrumentation extends InstrumenterModule.Tracing


Concerning the instrumentation, do we want to systematically trace all the ser/de operations, or to have this pair of instrumentations enabled by default only when Data Streams is enabled? I'm wondering if we're adding too much detail for non-DSM users.

piochelepiotr added 5 commits April 1, 2024 16:04

dd-java-agent: Add Protobuf instrumentation

feb37fd

instrument random method in application

e87e613

instrument writeTo & parseFrom

87b09c5

Use decorator

b556540

lint

36f81bd

piochelepiotr force-pushed the piotr-wolski/add-protobuf-instrumentation-2 branch from abcbd6c to 36f81bd Compare April 1, 2024 20:04

piochelepiotr changed the title ~~Piotr wolski/add protobuf instrumentation 2~~ Instrument Google protobuf Apr 1, 2024

piochelepiotr marked this pull request as ready for review April 1, 2024 20:10

piochelepiotr requested a review from a team as a code owner April 1, 2024 20:10

piochelepiotr requested review from dougqh and nayeem-kamal April 1, 2024 20:10

dougqh reviewed Apr 2, 2024

dougqh requested changes Apr 2, 2024

piochelepiotr added 2 commits April 3, 2024 15:12

rename to instrumenterModule

18a0bb2

Use moshi for building schemas

a353f60

dumontg reviewed Apr 4, 2024

piochelepiotr added 2 commits April 4, 2024 14:28

fix tests

b49db0f

add test coverage

f4c80e5

piochelepiotr requested a review from a team as a code owner April 4, 2024 20:18

piochelepiotr requested review from manuel-alvarez-alvarez and jandro996 April 4, 2024 20:18

piochelepiotr added 2 commits April 4, 2024 17:05

update grpc tests

04649f4

update grpc tests

a0ecb32

piochelepiotr requested a review from dougqh April 5, 2024 04:00

manuel-alvarez-alvarez reviewed Apr 5, 2024

protect against stack overflow

6a7b5cb

Merge branch 'master' into piotr-wolski/add-protobuf-instrumentation-2

b0dc306

dumontg approved these changes Apr 9, 2024

amarziali reviewed Apr 10, 2024

piochelepiotr added 5 commits April 10, 2024 18:25

[data streams] Cache schema computation results

a963e0e

fix cache

8ab6626

allow older protobuf versions

a800a1f

update test

53ae1e5

update coverage

8adb457

amarziali reviewed Apr 11, 2024

dougqh approved these changes Apr 23, 2024

piochelepiotr added 3 commits April 24, 2024 21:50

don't create new spans

ed1d06f

force keep test traces

bc55c1c

updater to full name

a2735d4

piochelepiotr merged commit 2a8627d into master Apr 26, 2024
79 checks passed

piochelepiotr deleted the piotr-wolski/add-protobuf-instrumentation-2 branch April 26, 2024 21:51

github-actions bot added this to the 1.34.0 milestone Apr 26, 2024

piochelepiotr mentioned this pull request Apr 30, 2024

Add Kafka poll span when DSM is enabled #6969

Merged
