-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes to trace agent service rates feedback - e.g. MAX TPS support #6628
Conversation
This change set fixes the MAX TPS support in the tracer. Previously, the tracer treated service name and env in a case sensitive fashion. Additionally, the tracer didn't properly handle the fallback case of an empty service and empty env. While the empty service and empty env case was parsed properly, getSampler would return a static DEFAULT sampler that always used a rate 1.0. This wasn't obvious from the tests because the tests also use a rate of 1.0.
@@ -65,24 +65,37 @@ private <T extends CoreSpan<T>> String getSpanEnv(final T span) { | |||
public void onResponse( | |||
final String endpoint, final Map<String, Map<String, Number>> responseJson) { | |||
final Map<String, Number> newServiceRates = responseJson.get("rate_by_service"); | |||
if (null != newServiceRates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the code a fair amount.
The biggest change is introducing a fallbackSampler field rather than storing the fallbackSampler in the Map.
|
||
public static EnvAndService fromString(String key) { | ||
return CACHE.computeIfAbsent(key, PARSE); | ||
} | ||
|
||
private final String env; | ||
private final String service; | ||
private final String lowerEnv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed so that it is clear that the values have been transformed
expect: | ||
serviceSampler.serviceRates.getSampler(RateByServiceTraceSampler.EnvAndService.DEFAULT).sampleRate == expectedRate | ||
serviceSampler.serviceRates.getSampler(RateByServiceTraceSampler.EnvAndService.FALLBACK).sampleRate == expectedRate | ||
serviceSampler.serviceRates.getSampler("foo", "bar").sampleRate == expectedRate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key check that was missing previously -- that would have caught the broken fallback handling
...race-core/src/test/groovy/datadog/trace/common/sampling/RateByServiceTraceSamplerTest.groovy
Show resolved
Hide resolved
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 10 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.30.0-SNAPSHOT~c30a5741e1, baseline=1.30.0-SNAPSHOT~a1e2d778c0
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1060124
Total [baseline] (9.305 s) : 0, 9304599
Agent [candidate] (1.066 s) : 0, 1066094
Total [candidate] (9.367 s) : 0, 9367071
section appsec
Agent [baseline] (1.157 s) : 0, 1157240
Total [baseline] (9.479 s) : 0, 9478851
Agent [candidate] (1.163 s) : 0, 1163245
Total [candidate] (9.436 s) : 0, 9436145
section iast
Agent [baseline] (1.181 s) : 0, 1181012
Total [baseline] (9.631 s) : 0, 9630736
Agent [candidate] (1.185 s) : 0, 1184834
Total [candidate] (9.814 s) : 0, 9813813
section profiling
Agent [baseline] (1.272 s) : 0, 1272324
Total [baseline] (9.502 s) : 0, 9502011
Agent [candidate] (1.277 s) : 0, 1277097
Total [candidate] (9.622 s) : 0, 9622403
gantt
title petclinic - break down per module: candidate=1.30.0-SNAPSHOT~c30a5741e1, baseline=1.30.0-SNAPSHOT~a1e2d778c0
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (667.756 ms) : 0, 667756
BytebuddyAgent [candidate] (671.304 ms) : 0, 671304
GlobalTracer [baseline] (297.844 ms) : 0, 297844
GlobalTracer [candidate] (300.007 ms) : 0, 300007
AppSec [baseline] (52.008 ms) : 0, 52008
AppSec [candidate] (52.04 ms) : 0, 52040
Remote Config [baseline] (693.88 µs) : 0, 694
Remote Config [candidate] (694.203 µs) : 0, 694
Telemetry [baseline] (7.574 ms) : 0, 7574
Telemetry [candidate] (7.581 ms) : 0, 7581
section appsec
BytebuddyAgent [baseline] (668.145 ms) : 0, 668145
BytebuddyAgent [candidate] (671.489 ms) : 0, 671489
GlobalTracer [baseline] (297.73 ms) : 0, 297730
GlobalTracer [candidate] (299.283 ms) : 0, 299283
AppSec [baseline] (149.729 ms) : 0, 149729
AppSec [candidate] (150.483 ms) : 0, 150483
Remote Config [baseline] (646.605 µs) : 0, 647
Remote Config [candidate] (653.047 µs) : 0, 653
Telemetry [baseline] (6.763 ms) : 0, 6763
Telemetry [candidate] (6.818 ms) : 0, 6818
section iast
BytebuddyAgent [baseline] (776.512 ms) : 0, 776512
BytebuddyAgent [candidate] (779.007 ms) : 0, 779007
GlobalTracer [baseline] (288.038 ms) : 0, 288038
GlobalTracer [candidate] (289.666 ms) : 0, 289666
AppSec [baseline] (52.564 ms) : 0, 52564
AppSec [candidate] (52.545 ms) : 0, 52545
Remote Config [baseline] (1.349 ms) : 0, 1349
Remote Config [candidate] (630.063 µs) : 0, 630
Telemetry [baseline] (6.521 ms) : 0, 6521
Telemetry [candidate] (7.382 ms) : 0, 7382
IAST [baseline] (21.919 ms) : 0, 21919
IAST [candidate] (21.374 ms) : 0, 21374
section profiling
BytebuddyAgent [baseline] (662.664 ms) : 0, 662664
BytebuddyAgent [candidate] (664.817 ms) : 0, 664817
GlobalTracer [baseline] (380.923 ms) : 0, 380923
GlobalTracer [candidate] (382.208 ms) : 0, 382208
AppSec [baseline] (51.823 ms) : 0, 51823
AppSec [candidate] (52.365 ms) : 0, 52365
Remote Config [baseline] (656.791 µs) : 0, 657
Remote Config [candidate] (669.484 µs) : 0, 669
Telemetry [baseline] (8.06 ms) : 0, 8060
Telemetry [candidate] (8.747 ms) : 0, 8747
ProfilingAgent [baseline] (113.823 ms) : 0, 113823
ProfilingAgent [candidate] (113.831 ms) : 0, 113831
Profiling [baseline] (113.847 ms) : 0, 113847
Profiling [candidate] (113.855 ms) : 0, 113855
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 16 unstable metrics. Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.30.0-SNAPSHOT~c30a5741e1, baseline=1.30.0-SNAPSHOT~a1e2d778c0
dateFormat X
axisFormat %s
section baseline
no_agent (1.356 ms) : 1337, 1375
. : milestone, 1356,
appsec (1.753 ms) : 1728, 1779
. : milestone, 1753,
iast (1.517 ms) : 1492, 1541
. : milestone, 1517,
profiling (1.506 ms) : 1481, 1531
. : milestone, 1506,
tracing (1.505 ms) : 1480, 1530
. : milestone, 1505,
section candidate
no_agent (1.358 ms) : 1339, 1377
. : milestone, 1358,
appsec (1.768 ms) : 1743, 1793
. : milestone, 1768,
iast (1.505 ms) : 1480, 1529
. : milestone, 1505,
profiling (1.524 ms) : 1497, 1551
. : milestone, 1524,
tracing (1.51 ms) : 1485, 1535
. : milestone, 1510,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.30.0-SNAPSHOT~c30a5741e1, baseline=1.30.0-SNAPSHOT~a1e2d778c0
dateFormat X
axisFormat %s
section baseline
no_agent (362.985 µs) : 343, 383
. : milestone, 363,
iast (467.725 µs) : 447, 488
. : milestone, 468,
iast_FULL (534.042 µs) : 513, 555
. : milestone, 534,
iast_GLOBAL (500.878 µs) : 480, 522
. : milestone, 501,
iast_HARDCODED_SECRET_DISABLED (466.864 µs) : 446, 487
. : milestone, 467,
iast_INACTIVE (444.555 µs) : 423, 466
. : milestone, 445,
iast_TELEMETRY_OFF (471.142 µs) : 450, 492
. : milestone, 471,
tracing (436.859 µs) : 416, 458
. : milestone, 437,
section candidate
no_agent (367.081 µs) : 346, 388
. : milestone, 367,
iast (473.225 µs) : 452, 494
. : milestone, 473,
iast_FULL (530.173 µs) : 510, 551
. : milestone, 530,
iast_GLOBAL (500.35 µs) : 479, 522
. : milestone, 500,
iast_HARDCODED_SECRET_DISABLED (469.793 µs) : 449, 491
. : milestone, 470,
iast_INACTIVE (438.99 µs) : 418, 460
. : milestone, 439,
iast_TELEMETRY_OFF (465.933 µs) : 445, 487
. : milestone, 466,
tracing (436.1 µs) : 416, 456
. : milestone, 436,
|
if (serviceRates == null) { | ||
return DEFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines returning DEFAULT are the bugs in the original implementation.
Checking case insensitivity Checking partial match of service & partial match of env Checking use of fallback
|
||
where: | ||
service | env | expectedRate | ||
"foo" | "bar" | 0.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New tests to verify case insensitivity
And check that partial matches uses the fallback
def tracer = tracerBuilder().writer(new ListWriter()).build() | ||
|
||
when: | ||
def response = '{"rate_by_service": {"service:spock,env:test":1.0}}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add one more rule ,"service:SPOCK,env:Test":0.0
to make sure we verify case-insensitivity otherwise this test will pass even if it's case-sensitive b/o the default case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another commit with that test in addition to some others that check case insensitive equivalent service/env pairs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left one proposal for the test to make sure we are testing case-insensitivity and not the default case
Adding tests that cover case insensitive equivalent service/env pairs Not defined in spec, but in Java tracer implemented as first one wins Also added tests to cover partial collisions (e.g. just service or just env) -- both same case and differing case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't had the time to submit my review before Yury approved it but I wanted to share I appreciated the inline comments about the refactoring part, thanks for taking time to add them 👍
This change set fixes the MAX TPS support in the tracer.
What Does This Do
Previously, the tracer treated service name and env in a case sensitive fashion. Additionally, the tracer didn't properly handle the fallback case of an empty service and empty env.
While the empty service and empty env case was parsed properly, getSampler would return a static DEFAULT sampler that always used a rate 1.0. This wasn't obvious from the tests because the tests also use a rate of 1.0.
Motivation
Bring Java tracer into compliance with trace agent feedback specification
Jira ticket: APMS-11270