-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instrument Google protobuf #6865
Instrument Google protobuf #6865
Conversation
BenchmarksStartupParameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 16 unstable metrics.
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.083 s) : 0, 1083467
Total [baseline] (8.563 s) : 0, 8562724
Agent [candidate] (1.079 s) : 0, 1079449
Total [candidate] (8.568 s) : 0, 8568070
section iast
Agent [baseline] (1.207 s) : 0, 1207241
Total [baseline] (9.072 s) : 0, 9072296
Agent [candidate] (1.2 s) : 0, 1199540
Total [candidate] (9.075 s) : 0, 9074504
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.209 s) : 0, 1209035
Total [baseline] (9.062 s) : 0, 9062063
Agent [candidate] (1.203 s) : 0, 1203485
Total [candidate] (9.036 s) : 0, 9036482
section iast_TELEMETRY_OFF
Agent [baseline] (1.195 s) : 0, 1195209
Total [baseline] (9.02 s) : 0, 9019868
Agent [candidate] (1.21 s) : 0, 1209606
Total [candidate] (9.032 s) : 0, 9032204
gantt
title insecure-bank - break down per module: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (678.598 ms) : 0, 678598
BytebuddyAgent [candidate] (675.76 ms) : 0, 675760
GlobalTracer [baseline] (311.991 ms) : 0, 311991
GlobalTracer [candidate] (311.28 ms) : 0, 311280
AppSec [baseline] (49.902 ms) : 0, 49902
AppSec [candidate] (49.675 ms) : 0, 49675
Remote Config [baseline] (680.469 µs) : 0, 680
Remote Config [candidate] (663.906 µs) : 0, 664
Telemetry [baseline] (7.63 ms) : 0, 7630
Telemetry [candidate] (7.679 ms) : 0, 7679
section iast
BytebuddyAgent [baseline] (798.143 ms) : 0, 798143
BytebuddyAgent [candidate] (795.199 ms) : 0, 795199
GlobalTracer [baseline] (290.877 ms) : 0, 290877
GlobalTracer [candidate] (289.161 ms) : 0, 289161
AppSec [baseline] (51.12 ms) : 0, 51120
AppSec [candidate] (49.714 ms) : 0, 49714
Remote Config [baseline] (598.567 µs) : 0, 599
Remote Config [candidate] (602.396 µs) : 0, 602
Telemetry [baseline] (7.505 ms) : 0, 7505
Telemetry [candidate] (8.211 ms) : 0, 8211
IAST [baseline] (24.54 ms) : 0, 24540
IAST [candidate] (22.252 ms) : 0, 22252
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (801.07 ms) : 0, 801070
BytebuddyAgent [candidate] (796.904 ms) : 0, 796904
GlobalTracer [baseline] (290.242 ms) : 0, 290242
GlobalTracer [candidate] (290.383 ms) : 0, 290383
AppSec [baseline] (48.861 ms) : 0, 48861
AppSec [candidate] (50.8 ms) : 0, 50800
Remote Config [baseline] (601.595 µs) : 0, 602
Remote Config [candidate] (594.75 µs) : 0, 595
Telemetry [baseline] (9.048 ms) : 0, 9048
Telemetry [candidate] (7.412 ms) : 0, 7412
IAST [baseline] (24.689 ms) : 0, 24689
IAST [candidate] (22.842 ms) : 0, 22842
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (791.953 ms) : 0, 791953
BytebuddyAgent [candidate] (800.679 ms) : 0, 800679
GlobalTracer [baseline] (287.83 ms) : 0, 287830
GlobalTracer [candidate] (291.877 ms) : 0, 291877
AppSec [baseline] (51.305 ms) : 0, 51305
AppSec [candidate] (50.175 ms) : 0, 50175
Remote Config [baseline] (580.553 µs) : 0, 581
Remote Config [candidate] (589.47 µs) : 0, 589
Telemetry [baseline] (6.594 ms) : 0, 6594
Telemetry [candidate] (8.87 ms) : 0, 8870
IAST [baseline] (22.607 ms) : 0, 22607
IAST [candidate] (22.806 ms) : 0, 22806
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.089 s) : 0, 1089264
Total [baseline] (10.468 s) : 0, 10467543
Agent [candidate] (1.083 s) : 0, 1083056
Total [candidate] (10.384 s) : 0, 10384124
section appsec
Agent [baseline] (1.198 s) : 0, 1197836
Total [baseline] (10.445 s) : 0, 10444923
Agent [candidate] (1.2 s) : 0, 1200436
Total [candidate] (10.467 s) : 0, 10467159
section iast
Agent [baseline] (1.209 s) : 0, 1209382
Total [baseline] (10.747 s) : 0, 10747104
Agent [candidate] (1.207 s) : 0, 1206501
Total [candidate] (10.828 s) : 0, 10827530
section profiling
Agent [baseline] (1.266 s) : 0, 1266154
Total [baseline] (10.656 s) : 0, 10655967
Agent [candidate] (1.288 s) : 0, 1288114
Total [candidate] (10.662 s) : 0, 10661545
gantt
title petclinic - break down per module: candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (682.652 ms) : 0, 682652
BytebuddyAgent [candidate] (679.311 ms) : 0, 679311
GlobalTracer [baseline] (313.421 ms) : 0, 313421
GlobalTracer [candidate] (311.394 ms) : 0, 311394
AppSec [baseline] (50.04 ms) : 0, 50040
AppSec [candidate] (49.494 ms) : 0, 49494
Remote Config [baseline] (665.771 µs) : 0, 666
Remote Config [candidate] (652.991 µs) : 0, 653
Telemetry [baseline] (7.697 ms) : 0, 7697
Telemetry [candidate] (7.567 ms) : 0, 7567
section appsec
BytebuddyAgent [baseline] (694.802 ms) : 0, 694802
BytebuddyAgent [candidate] (695.775 ms) : 0, 695775
GlobalTracer [baseline] (291.115 ms) : 0, 291115
GlobalTracer [candidate] (291.968 ms) : 0, 291968
AppSec [baseline] (149.145 ms) : 0, 149145
AppSec [candidate] (149.686 ms) : 0, 149686
Remote Config [baseline] (609.168 µs) : 0, 609
Remote Config [candidate] (595.657 µs) : 0, 596
Telemetry [baseline] (8.666 ms) : 0, 8666
Telemetry [candidate] (9.326 ms) : 0, 9326
IAST [baseline] (19.27 ms) : 0, 19270
IAST [candidate] (18.736 ms) : 0, 18736
section iast
BytebuddyAgent [baseline] (801.325 ms) : 0, 801325
BytebuddyAgent [candidate] (797.712 ms) : 0, 797712
GlobalTracer [baseline] (290.753 ms) : 0, 290753
GlobalTracer [candidate] (291.878 ms) : 0, 291878
AppSec [baseline] (49.617 ms) : 0, 49617
AppSec [candidate] (50.564 ms) : 0, 50564
Remote Config [baseline] (572.541 µs) : 0, 573
Remote Config [candidate] (591.987 µs) : 0, 592
Telemetry [baseline] (8.087 ms) : 0, 8087
Telemetry [candidate] (8.287 ms) : 0, 8287
IAST [baseline] (24.393 ms) : 0, 24393
IAST [candidate] (23.154 ms) : 0, 23154
section profiling
BytebuddyAgent [baseline] (676.279 ms) : 0, 676279
BytebuddyAgent [candidate] (687.788 ms) : 0, 687788
GlobalTracer [baseline] (379.962 ms) : 0, 379962
GlobalTracer [candidate] (386.831 ms) : 0, 386831
AppSec [baseline] (50.014 ms) : 0, 50014
AppSec [candidate] (51.103 ms) : 0, 51103
Remote Config [baseline] (703.811 µs) : 0, 704
Remote Config [candidate] (717.664 µs) : 0, 718
Telemetry [baseline] (7.429 ms) : 0, 7429
Telemetry [candidate] (7.552 ms) : 0, 7552
ProfilingAgent [baseline] (95.455 ms) : 0, 95455
ProfilingAgent [candidate] (97.05 ms) : 0, 97050
Profiling [baseline] (95.478 ms) : 0, 95478
Profiling [candidate] (97.074 ms) : 0, 97074
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 13 metrics, 15 unstable metrics. Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
dateFormat X
axisFormat %s
section baseline
no_agent (375.289 µs) : 354, 396
. : milestone, 375,
iast (488.846 µs) : 468, 510
. : milestone, 489,
iast_FULL (547.607 µs) : 527, 569
. : milestone, 548,
iast_GLOBAL (511.151 µs) : 488, 534
. : milestone, 511,
iast_HARDCODED_SECRET_DISABLED (477.961 µs) : 456, 500
. : milestone, 478,
iast_INACTIVE (455.838 µs) : 435, 477
. : milestone, 456,
iast_TELEMETRY_OFF (479.616 µs) : 459, 500
. : milestone, 480,
tracing (453.058 µs) : 433, 474
. : milestone, 453,
section candidate
no_agent (377.385 µs) : 357, 398
. : milestone, 377,
iast (483.211 µs) : 462, 505
. : milestone, 483,
iast_FULL (545.65 µs) : 525, 567
. : milestone, 546,
iast_GLOBAL (514.792 µs) : 492, 538
. : milestone, 515,
iast_HARDCODED_SECRET_DISABLED (484.614 µs) : 463, 507
. : milestone, 485,
iast_INACTIVE (454.983 µs) : 434, 476
. : milestone, 455,
iast_TELEMETRY_OFF (480.685 µs) : 459, 502
. : milestone, 481,
tracing (454.05 µs) : 433, 475
. : milestone, 454,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~a2735d4bea, baseline=1.34.0-SNAPSHOT~ae1c4c9475
dateFormat X
axisFormat %s
section baseline
no_agent (1.34 ms) : 1321, 1359
. : milestone, 1340,
appsec (1.73 ms) : 1705, 1755
. : milestone, 1730,
appsec_no_iast (1.727 ms) : 1702, 1753
. : milestone, 1727,
iast (1.49 ms) : 1467, 1513
. : milestone, 1490,
profiling (1.518 ms) : 1492, 1544
. : milestone, 1518,
tracing (1.5 ms) : 1476, 1524
. : milestone, 1500,
section candidate
no_agent (1.358 ms) : 1339, 1377
. : milestone, 1358,
appsec (1.754 ms) : 1729, 1778
. : milestone, 1754,
appsec_no_iast (1.742 ms) : 1718, 1766
. : milestone, 1742,
iast (1.534 ms) : 1511, 1557
. : milestone, 1534,
profiling (1.498 ms) : 1473, 1523
. : milestone, 1498,
tracing (1.506 ms) : 1481, 1530
. : milestone, 1506,
Dacapo |
abcbd6c
to
36f81bd
Compare
AgentSpan span = scope.span(); | ||
DESERIALIZER_DECORATOR.onError(span, throwable); | ||
if (message instanceof AbstractMessage) { | ||
DESERIALIZER_DECORATOR.attachSchemaOnSpan((AbstractMessage) message, span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like work that would probably be better done in a background thread.
There is a mechanism for doing that, but I think it could use some improvement.
For now, I'm fine with this as is - as long as it is not on by default.
span.setTag(DDTags.SCHEMA_TYPE, PROTOBUF); | ||
span.setTag(DDTags.SCHEMA_NAME, descriptor.getName()); | ||
span.setTag(DDTags.SCHEMA_OPERATION, operation); | ||
Integer prio = span.forceSamplingDecision(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There probably needs to a product specific code for why the decision was forced.
You should check with APM ingestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, if a customer pays for the ingestion of a span, we want to be able to tell why it was ingested (it’s also pretty useful for troubleshooting). The tracer set a specific attribute called _dd.p.dm
depending on the sampling mechanism (see all potential values that exist).
Questions:
- When you override the sampling decision, do you just want the span or the whole trace?
- Do you expect a lot of spans coming in? Once per tracer process maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked a bit more the code, it doesn’t look like forceSamplingPriority is going to increase sampling rate, is that correct?
It looks like it’s going to force the sampling decision to be taken earlier.
What I can do, is reverse the two samplings.
Today, the code extracts only 1 schema every 30 seconds, so I can do that sampling first. That way, the change would only affect the sampling decision of 1 trace every 30 seconds, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched the two sampling calls.
Now the forceSamplingDecision is only rarely called (a few times every 30 seconds, 1 time in the best case depending on race conditions).
Also, forceSamplingDecision doesn't force the trace to be sampled. It forces the sampling decision to be made.
So this should not increase sampling rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes sense, I had not understood that initially. If you don't influence sampling I don't think we need to do anything at intake. For accomplishing what you want I am not knowledgeable enough with the tracer itself.
public static String extractSchema(Descriptor descriptor) { | ||
List<String> schemas = new ArrayList<>(); | ||
extractDescriptorSchema(descriptor, schemas); | ||
return "{" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not properly escaped
} | ||
|
||
private static void extractDescriptorSchema(Descriptor descriptor, List<String> schemas) { | ||
StringBuilder schema = new StringBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not properly escaped - you should probably use a JSON API instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's ok to add the dependency to JSON API?
It would make the code easier indeed. But add a dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns that JSON is not done using JSON API and is not properly escaped.
I also have concerns with the altering of the sampling decision.
When altering the sampling decision, I believe we should be adding information about which product made that decision. We should check that with APM ingestion.
span.setTag(DDTags.SCHEMA_TYPE, PROTOBUF); | ||
span.setTag(DDTags.SCHEMA_NAME, descriptor.getName()); | ||
span.setTag(DDTags.SCHEMA_OPERATION, operation); | ||
Integer prio = span.forceSamplingDecision(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, if a customer pays for the ingestion of a span, we want to be able to tell why it was ingested (it’s also pretty useful for troubleshooting). The tracer set a specific attribute called _dd.p.dm
depending on the sampling mechanism (see all potential values that exist).
Questions:
- When you override the sampling decision, do you just want the span or the whole trace?
- Do you expect a lot of spans coming in? Once per tracer process maybe?
...tion/protobuf-3.0.0/src/main/java/datadog/trace/instrumentation/protobuf_java/Decorator.java
Outdated
Show resolved
Hide resolved
extractSchema(field.getMessageType(), builder); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code will end up in a StackOverflowError
in case there are cycles in the schema, see the example:
message DummyMessage {
int32 id = 1;
string name = 2;
DummyMessage nested = 3;
}
A cache of already processed types will help here and the SchemaBuilder
should be able to handle cycles. Optionally, we might want to include a limit in the recursion depth and also number of elements visited to deal with problematic/big schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will update the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code & added limits. Thanks for the comment!
Kafka / producer-benchmarkParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics. See unchanged results
|
Kafka / consumer-benchmarkParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics. See unchanged results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving for APM Trace Intake. Since this change doesn't change the sampling whatsoever, there is no need to add new ingestion reasons.
group = "com.google.protobuf" | ||
module = "protobuf-java" | ||
versions = "[3.0.0,)" | ||
assertInverse = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason not to test that the instrumentation is disabled on versions < 3?
|
||
static final String instrumentationName = "protobuf"; | ||
static final String TARGET_TYPE = "com.google.protobuf.AbstractMessage"; | ||
static final String SERIALIZE = "serialize"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may we use a more specific operation name? something like protobuf.serialize
is probably more inline with the naming we are used to. I had a check and I did not find any otel prio art to get aligned to. Given this, you can also use Utf8ByteString here to cache utf8 conversions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the name, for some reason I can't explain, it doesn't work if I switch to Utf8ByteString 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have tests failing is probably because they are comparing String with CharSequence that's not the same thing.
So
String s = "test"
CharSequence cs = UTF8ByteString.create("test")
s.equals(cs) -> false
s.contentEquals(cs) -> true
s.equals(cs.toString()) -> true
new org.apache.kafka.clients.producer.KafkaProducer<>(properties); | ||
|
||
Duration duration = Duration.newBuilder().setSeconds(10).build(); | ||
System.out.println("duration is " + duration.getSeconds()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a logger should be used to print out this
@@ -44,4 +44,6 @@ public class InternalSpanTypes { | |||
UTF8BytesString.create(DDSpanTypes.TEST_SESSION_END); | |||
public static final UTF8BytesString VULNERABILITY = | |||
UTF8BytesString.create(DDSpanTypes.VULNERABILITY); | |||
public static final UTF8BytesString SERIALIZE = UTF8BytesString.create(DDSpanTypes.SERIALIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I would have been used protobuf
as span type since serialize/deserialize are rather operations
@@ -777,6 +777,7 @@ private int appendLeaf(int dataIndex, String key, int keyIndex, int value) { | |||
|
|||
/** Generates Java source for a trie described as a series of "{number} {class-name}" lines. */ | |||
public static class JavaGenerator { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the newline should be removed since that file should not be part of the changeset
@@ -38,6 +38,7 @@ compileJava.dependsOn 'generateClassNameTries' | |||
packageSources.dependsOn 'generateClassNameTries' | |||
sourcesJar.dependsOn 'generateClassNameTries' | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove this file from the changeset
|
||
static final String instrumentationName = "protobuf"; | ||
static final String TARGET_TYPE = "com.google.protobuf.AbstractParser"; | ||
static final String DESERIALIZE = "deserialize"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have same naming concern than "serialize"
if (weight == 0) { | ||
return; | ||
} | ||
String schema = SchemaExtractor.extractSchemas(descriptor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to me a quite expensive operation. Can we cache this? The same for the schemaId being a hash depending on the schema name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cached based on the schema name, but I'm worried the schema changes, and the schema name doesn't 🤔
Do you have a better idea on how to cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a schema be dynamically changed at runtime and what is the use case?
import net.bytebuddy.matcher.ElementMatcher; | ||
|
||
@AutoService(InstrumenterModule.class) | ||
public final class AbstractMessageInstrumentation extends InstrumenterModule.Tracing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning the instrumentation, do we want to systematically trace all the ser/de operations or having this couple of instrumentations enabled by default only when datastreams is enabled? I'm wondering if we're adding too many details for the non dsm users
What Does This Do
Add protobuf instrumentation for serialize / deserialize operations.
If data streams monitoring is enabled, capture schemas of messages.
Motivation
Additional Notes
It impacts the GRPC instrumentation
since it adds spans for serialize / deserialize
Jira ticket: [PROJ-IDENT]