Introduce PII redaction based on keywords #6048

jpbempel · 2023-10-13T13:32:44Z

What Does This Do

Redacts values based on a list of predefined words name are normalized in lower case and by removing special characters: @, $, - and _
Redaction happening at:

expression language for reference, index and getmember operations
serialization for snapshot or template values including maps
capture of the values from instrumentation For conditions (log or span decoration probes) evaluation of a redacted values triggers an evaluation exception that will be reported as evaluation error into a snapshot.

Motivation

Prevent leakage of sensitive information like passwords, api keys, tokens, ...

Additional Notes

Jira ticket: DEBUG-1838

pr-commenter · 2023-10-13T14:14:21Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
commit	1.22.0-SNAPSHOT~f282e8c0d6	1.22.0-SNAPSHOT~b21f9edd6a
config	baseline	candidate

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
module	Agent	Agent
parent	None	None
variant	iast	iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 54 cases.

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.22.0-SNAPSHOT~b21f9edd6a, baseline=1.22.0-SNAPSHOT~f282e8c0d6

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.034 s) : 0, 1034154
Total [baseline] (8.711 s) : 0, 8711392
Agent [candidate] (1.012 s) : 0, 1012257
Total [candidate] (8.679 s) : 0, 8678852
section iast
Agent [baseline] (1.139 s) : 0, 1139448
Total [baseline] (9.245 s) : 0, 9244804
Agent [candidate] (1.139 s) : 0, 1138682
Total [candidate] (9.204 s) : 0, 9203720
section iast_TELEMETRY_OFF
Agent [baseline] (1.131 s) : 0, 1130957
Total [baseline] (9.202 s) : 0, 9202479
Agent [candidate] (1.134 s) : 0, 1134338
Total [candidate] (9.234 s) : 0, 9234350

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.034 s	-
Agent	iast	1.139 s	105.294 ms (10.2%)
Agent	iast_TELEMETRY_OFF	1.131 s	96.803 ms (9.4%)
Total	tracing	8.711 s	-
Total	iast	9.245 s	533.412 ms (6.1%)
Total	iast_TELEMETRY_OFF	9.202 s	491.087 ms (5.6%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.012 s	-
Agent	iast	1.139 s	126.425 ms (12.5%)
Agent	iast_TELEMETRY_OFF	1.134 s	122.081 ms (12.1%)
Total	tracing	8.679 s	-
Total	iast	9.204 s	524.867 ms (6.0%)
Total	iast_TELEMETRY_OFF	9.234 s	555.498 ms (6.4%)

gantt
    title insecure-bank - break down per module: candidate=1.22.0-SNAPSHOT~b21f9edd6a, baseline=1.22.0-SNAPSHOT~f282e8c0d6

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (644.631 ms) : 0, 644631
BytebuddyAgent [candidate] (629.915 ms) : 0, 629915
GlobalTracer [baseline] (297.94 ms) : 0, 297940
GlobalTracer [candidate] (292.506 ms) : 0, 292506
AppSec [baseline] (49.772 ms) : 0, 49772
AppSec [candidate] (48.935 ms) : 0, 48935
Remote Config [baseline] (677.316 µs) : 0, 677
Remote Config [candidate] (651.408 µs) : 0, 651
Telemetry [baseline] (6.131 ms) : 0, 6131
Telemetry [candidate] (5.993 ms) : 0, 5993
section iast
BytebuddyAgent [baseline] (760.323 ms) : 0, 760323
BytebuddyAgent [candidate] (759.841 ms) : 0, 759841
GlobalTracer [baseline] (271.931 ms) : 0, 271931
GlobalTracer [candidate] (272.675 ms) : 0, 272675
AppSec [baseline] (46.028 ms) : 0, 46028
AppSec [candidate] (46.407 ms) : 0, 46407
Remote Config [baseline] (569.881 µs) : 0, 570
Remote Config [candidate] (567.723 µs) : 0, 568
Telemetry [baseline] (6.843 ms) : 0, 6843
Telemetry [candidate] (6.831 ms) : 0, 6831
IAST [baseline] (19.378 ms) : 0, 19378
IAST [candidate] (17.981 ms) : 0, 17981
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (753.513 ms) : 0, 753513
BytebuddyAgent [candidate] (757.835 ms) : 0, 757835
GlobalTracer [baseline] (271.716 ms) : 0, 271716
GlobalTracer [candidate] (272.629 ms) : 0, 272629
AppSec [baseline] (46.386 ms) : 0, 46386
AppSec [candidate] (46.244 ms) : 0, 46244
Remote Config [baseline] (565.87 µs) : 0, 566
Remote Config [candidate] (568.256 µs) : 0, 568
Telemetry [baseline] (8.323 ms) : 0, 8323
Telemetry [candidate] (6.301 ms) : 0, 6301
IAST [baseline] (16.269 ms) : 0, 16269
IAST [candidate] (16.213 ms) : 0, 16213

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.22.0-SNAPSHOT~b21f9edd6a, baseline=1.22.0-SNAPSHOT~f282e8c0d6

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.014 s) : 0, 1014284
Total [baseline] (9.25 s) : 0, 9250293
Agent [candidate] (1.014 s) : 0, 1014297
Total [candidate] (9.269 s) : 0, 9268889
section appsec
Agent [baseline] (1.102 s) : 0, 1101630
Total [baseline] (9.288 s) : 0, 9287723
Agent [candidate] (1.101 s) : 0, 1101354
Total [candidate] (9.25 s) : 0, 9249989
section iast
Agent [baseline] (1.141 s) : 0, 1140964
Total [baseline] (9.442 s) : 0, 9441568
Agent [candidate] (1.146 s) : 0, 1145838
Total [candidate] (9.451 s) : 0, 9450783
section profiling
Agent [baseline] (1.192 s) : 0, 1192311
Total [baseline] (9.451 s) : 0, 9450780
Agent [candidate] (1.195 s) : 0, 1195062
Total [candidate] (9.474 s) : 0, 9473973

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.014 s	-
Agent	appsec	1.102 s	87.345 ms (8.6%)
Agent	iast	1.141 s	126.68 ms (12.5%)
Agent	profiling	1.192 s	178.027 ms (17.6%)
Total	tracing	9.25 s	-
Total	appsec	9.288 s	37.43 ms (0.4%)
Total	iast	9.442 s	191.275 ms (2.1%)
Total	profiling	9.451 s	200.487 ms (2.2%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.014 s	-
Agent	appsec	1.101 s	87.058 ms (8.6%)
Agent	iast	1.146 s	131.542 ms (13.0%)
Agent	profiling	1.195 s	180.765 ms (17.8%)
Total	tracing	9.269 s	-
Total	appsec	9.25 s	-18.9 ms (-0.2%)
Total	iast	9.451 s	181.894 ms (2.0%)
Total	profiling	9.474 s	205.085 ms (2.2%)

gantt
    title petclinic - break down per module: candidate=1.22.0-SNAPSHOT~b21f9edd6a, baseline=1.22.0-SNAPSHOT~f282e8c0d6

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (631.769 ms) : 0, 631769
BytebuddyAgent [candidate] (631.275 ms) : 0, 631275
GlobalTracer [baseline] (292.27 ms) : 0, 292270
GlobalTracer [candidate] (293.026 ms) : 0, 293026
AppSec [baseline] (49.236 ms) : 0, 49236
AppSec [candidate] (48.947 ms) : 0, 48947
Remote Config [baseline] (652.874 µs) : 0, 653
Remote Config [candidate] (653.975 µs) : 0, 654
Telemetry [baseline] (5.988 ms) : 0, 5988
Telemetry [candidate] (6.039 ms) : 0, 6039
section appsec
BytebuddyAgent [baseline] (631.888 ms) : 0, 631888
BytebuddyAgent [candidate] (630.736 ms) : 0, 630736
GlobalTracer [baseline] (291.364 ms) : 0, 291364
GlobalTracer [candidate] (292.568 ms) : 0, 292568
AppSec [baseline] (137.842 ms) : 0, 137842
AppSec [candidate] (137.512 ms) : 0, 137512
Remote Config [baseline] (636.752 µs) : 0, 637
Remote Config [candidate] (654.877 µs) : 0, 655
Telemetry [baseline] (5.669 ms) : 0, 5669
Telemetry [candidate] (5.69 ms) : 0, 5690
section iast
BytebuddyAgent [baseline] (761.743 ms) : 0, 761743
BytebuddyAgent [candidate] (764.495 ms) : 0, 764495
GlobalTracer [baseline] (271.98 ms) : 0, 271980
GlobalTracer [candidate] (274.301 ms) : 0, 274301
AppSec [baseline] (46.51 ms) : 0, 46510
AppSec [candidate] (46.319 ms) : 0, 46319
Remote Config [baseline] (588.495 µs) : 0, 588
Remote Config [candidate] (574.282 µs) : 0, 574
Telemetry [baseline] (6.269 ms) : 0, 6269
Telemetry [candidate] (7.552 ms) : 0, 7552
IAST [baseline] (19.491 ms) : 0, 19491
IAST [candidate] (18.156 ms) : 0, 18156
section profiling
BytebuddyAgent [baseline] (646.347 ms) : 0, 646347
BytebuddyAgent [candidate] (647.779 ms) : 0, 647779
GlobalTracer [baseline] (355.752 ms) : 0, 355752
GlobalTracer [candidate] (356.994 ms) : 0, 356994
AppSec [baseline] (49.218 ms) : 0, 49218
AppSec [candidate] (49.522 ms) : 0, 49522
Remote Config [baseline] (649.25 µs) : 0, 649
Remote Config [candidate] (667.206 µs) : 0, 667
Telemetry [baseline] (6.127 ms) : 0, 6127
Telemetry [candidate] (6.188 ms) : 0, 6188
ProfilingAgent [baseline] (80.69 ms) : 0, 80690
ProfilingAgent [candidate] (80.373 ms) : 0, 80373
Profiling [baseline] (80.714 ms) : 0, 80714
Profiling [candidate] (80.397 ms) : 0, 80397

Load

Parameters

	Baseline	Candidate
commit	1.22.0-SNAPSHOT~f282e8c0d6	1.22.0-SNAPSHOT~b21f9edd6a
config	baseline	candidate
end_time	2023-10-17T06:03:44	2023-10-17T06:19:57
start_time	2023-10-17T06:03:31	2023-10-17T06:19:44

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
variant	iast	iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 22 cases.

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.22.0-SNAPSHOT~b21f9edd6a, baseline=1.22.0-SNAPSHOT~f282e8c0d6
    dateFormat X
    axisFormat %s
section baseline
no_agent (354.621 µs) : 335, 374
.   : milestone, 355,
iast (454.677 µs) : 434, 475
.   : milestone, 455,
iast_FULL (508.407 µs) : 488, 529
.   : milestone, 508,
iast_INACTIVE (421.017 µs) : 401, 441
.   : milestone, 421,
iast_TELEMETRY_OFF (455.326 µs) : 434, 477
.   : milestone, 455,
tracing (419.451 µs) : 399, 440
.   : milestone, 419,
section candidate
no_agent (358.961 µs) : 339, 379
.   : milestone, 359,
iast (449.985 µs) : 429, 471
.   : milestone, 450,
iast_FULL (513.144 µs) : 493, 534
.   : milestone, 513,
iast_INACTIVE (426.345 µs) : 406, 447
.   : milestone, 426,
iast_TELEMETRY_OFF (448.866 µs) : 428, 470
.   : milestone, 449,
tracing (430.248 µs) : 410, 451
.   : milestone, 430,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	354.621 µs [334.767 µs, 374.474 µs]	-
iast	454.677 µs [434.191 µs, 475.162 µs]	100.056 µs (28.2%)
iast_FULL	508.407 µs [487.777 µs, 529.036 µs]	153.786 µs (43.4%)
iast_INACTIVE	421.017 µs [400.651 µs, 441.384 µs]	66.397 µs (18.7%)
iast_TELEMETRY_OFF	455.326 µs [434.065 µs, 476.586 µs]	100.705 µs (28.4%)
tracing	419.451 µs [399.261 µs, 439.641 µs]	64.83 µs (18.3%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	358.961 µs [339.315 µs, 378.608 µs]	-
iast	449.985 µs [429.383 µs, 470.586 µs]	91.023 µs (25.4%)
iast_FULL	513.144 µs [492.719 µs, 533.568 µs]	154.182 µs (43.0%)
iast_INACTIVE	426.345 µs [405.501 µs, 447.189 µs]	67.384 µs (18.8%)
iast_TELEMETRY_OFF	448.866 µs [428.022 µs, 469.709 µs]	89.904 µs (25.0%)
tracing	430.248 µs [409.516 µs, 450.979 µs]	71.286 µs (19.9%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.22.0-SNAPSHOT~b21f9edd6a, baseline=1.22.0-SNAPSHOT~f282e8c0d6
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.322 ms) : 1302, 1341
.   : milestone, 1322,
appsec (1.692 ms) : 1668, 1716
.   : milestone, 1692,
iast (1.442 ms) : 1417, 1466
.   : milestone, 1442,
profiling (1.459 ms) : 1434, 1485
.   : milestone, 1459,
tracing (1.443 ms) : 1418, 1468
.   : milestone, 1443,
section candidate
no_agent (1.336 ms) : 1317, 1354
.   : milestone, 1336,
appsec (1.69 ms) : 1666, 1714
.   : milestone, 1690,
iast (1.45 ms) : 1426, 1474
.   : milestone, 1450,
profiling (1.473 ms) : 1446, 1499
.   : milestone, 1473,
tracing (1.419 ms) : 1395, 1443
.   : milestone, 1419,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.322 ms [1.302 ms, 1.341 ms]	-
appsec	1.692 ms [1.668 ms, 1.716 ms]	370.11 µs (28.0%)
iast	1.442 ms [1.417 ms, 1.466 ms]	120.192 µs (9.1%)
profiling	1.459 ms [1.434 ms, 1.485 ms]	137.708 µs (10.4%)
tracing	1.443 ms [1.418 ms, 1.468 ms]	121.337 µs (9.2%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.336 ms [1.317 ms, 1.354 ms]	-
appsec	1.69 ms [1.666 ms, 1.714 ms]	354.651 µs (26.6%)
iast	1.45 ms [1.426 ms, 1.474 ms]	114.529 µs (8.6%)
profiling	1.473 ms [1.446 ms, 1.499 ms]	137.071 µs (10.3%)
tracing	1.419 ms [1.395 ms, 1.443 ms]	83.724 µs (6.3%)

Redacts values based on a list of predefined words name are normalized in lower case and by removing special characters: `@`, `$`, `-` and `_` Redaction happening at: - expression language for reference, index and getmember operations - serialization for snapshot or template values including maps - capture of the values from instrumentation For conditions (log or span decoration probes) evaluation of a redacted values triggers an evaluation exception that will be reported as evaluation error into a snapshot.

shatzi

Overall I like the implementation. I think we can simplify the expression evaluation logic by simple throw an exception and avoid all the redacted callback.

...bugger/debugger-el/src/main/java/com/datadog/debugger/el/expressions/ValueRefExpression.java

...a-agent/agent-debugger/debugger-el/src/main/java/com/datadog/debugger/el/ProbeCondition.java

shatzi

I think we have a bug with REDACTED_VALUE comparison.
Few comments about refactoring.

...ugger/debugger-el/src/main/java/com/datadog/debugger/el/expressions/GetMemberExpression.java

shatzi · 2023-10-16T19:48:11Z

...bugger/debugger-el/src/main/java/com/datadog/debugger/el/expressions/ValueRefExpression.java

@@ -22,7 +22,9 @@ public Value<?> evaluate(ValueReferenceResolver valueRefResolver) {
    try {
      Object symbol = valueRefResolver.lookup(symbolName);
      if (symbol == Redaction.REDACTED_VALUE) {
-        valueRefResolver.redacted(PrettyPrintVisitor.print(this));
+        String expr = PrettyPrintVisitor.print(this);
+        throw new EvaluationException(


the expr value would be overwritten by the catch expression as symbolName

true I need to fix the catch handler

shatzi · 2023-10-16T19:49:05Z

...-debugger/debugger-el/src/main/java/com/datadog/debugger/el/expressions/IndexExpression.java

-          valueRefResolver.redacted(PrettyPrintVisitor.print(this));
-          result = Value.of(Redaction.REDACTED_VALUE);
+          String expr = PrettyPrintVisitor.print(this);
+          throw new EvaluationException(


the catch exception would call PrettyPrintVisitor the second time.

...-debugger/debugger-el/src/main/java/com/datadog/debugger/el/expressions/IndexExpression.java

jpbempel requested a review from a team as a code owner October 13, 2023 13:32

jpbempel requested review from ojung and removed request for a team October 13, 2023 13:32

jpbempel force-pushed the jpbempel/key-based-redaction branch from 16aeb1d to 81a6d13 Compare October 13, 2023 15:20

jpbempel added comp: debugger Dynamic Instrumentation type: enhancement labels Oct 13, 2023

shatzi reviewed Oct 16, 2023

View reviewed changes

...bugger/debugger-el/src/main/java/com/datadog/debugger/el/expressions/ValueRefExpression.java Outdated Show resolved Hide resolved

...a-agent/agent-debugger/debugger-el/src/main/java/com/datadog/debugger/el/ProbeCondition.java Outdated Show resolved Hide resolved

jpbempel added 2 commits October 16, 2023 09:55

Throw exception on expression language in any case

cc8439d

fix bug identifier normalization

a58b8a2

shatzi reviewed Oct 16, 2023

View reviewed changes

make REDACTED_VALUE unique

3f1ccd4

shatzi approved these changes Oct 16, 2023

View reviewed changes

fix spotbugs

b21f9ed

jpbempel merged commit 3f59083 into master Oct 17, 2023
67 of 69 checks passed

jpbempel deleted the jpbempel/key-based-redaction branch October 17, 2023 06:39

github-actions bot added this to the 1.22.0 milestone Oct 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce PII redaction based on keywords #6048

Introduce PII redaction based on keywords #6048

jpbempel commented Oct 13, 2023 •

edited by jira bot

Loading

pr-commenter bot commented Oct 13, 2023 •

edited

Loading

shatzi left a comment

shatzi left a comment

shatzi Oct 16, 2023

jpbempel Oct 16, 2023

shatzi Oct 16, 2023

Introduce PII redaction based on keywords #6048

Introduce PII redaction based on keywords #6048

Conversation

jpbempel commented Oct 13, 2023 • edited by jira bot Loading

What Does This Do

Motivation

Additional Notes

pr-commenter bot commented Oct 13, 2023 • edited Loading

Benchmarks

Startup

Parameters

Summary

Load

Parameters

Summary

shatzi left a comment

Choose a reason for hiding this comment

shatzi left a comment

Choose a reason for hiding this comment

shatzi Oct 16, 2023

Choose a reason for hiding this comment

jpbempel Oct 16, 2023

Choose a reason for hiding this comment

shatzi Oct 16, 2023

Choose a reason for hiding this comment

jpbempel commented Oct 13, 2023 •

edited by jira bot

Loading

pr-commenter bot commented Oct 13, 2023 •

edited

Loading