chore(appsec): skip user block when auto mode resolves to disabled#17580
Conversation
When `block_request_if_user_blocked` is called with `mode="auto"` (which happens via `set_user_for_asm` during authenticated requests), the disabled-mode guard was evaluated against the *incoming* parameter value before the AUTO→configured-mode resolution occurred. This meant that when `DD_APPSEC_AUTO_USER_INSTRUMENTATION_MODE=disabled` was configured, the check at the guard read `"auto" == "disabled"` → False, and the function proceeded to call `should_block_user`, which invoked the WAF with the user ID. The WAF returning `keep=True` for that call resulted in the trace being force-kept via `_asm_manual_keep` even though no automated user event was generated. Fix: resolve the AUTO mode to the configured mode first, then apply the disabled guard. This ensures that a configured disabled state is respected regardless of whether the caller passes `"auto"` or `"disabled"` directly. JJ-Change-Id: qvyyqx
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b554208d30
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Performance SLOsComparing candidate romain.marcadier/appsec-block-user-disabled-mode (cedaeca) with baseline main (a9cc850) 📈 Performance Regressions (2 suites)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 530.233µs (SLO: <700.000µs 📉 -24.3%) vs baseline: 📈 +27.4% Memory: ✅ 43.838MB (SLO: <46.000MB -4.7%) vs baseline: +5.1% ✅ ospathbasename_noaspectTime: ✅ 434.075µs (SLO: <700.000µs 📉 -38.0%) vs baseline: +1.1% Memory: ✅ 43.819MB (SLO: <46.000MB -4.7%) vs baseline: +4.9% ✅ ospathjoin_aspectTime: ✅ 630.910µs (SLO: <700.000µs -9.9%) vs baseline: +0.3% Memory: ✅ 43.916MB (SLO: <46.000MB -4.5%) vs baseline: +5.1% ✅ ospathjoin_noaspectTime: ✅ 639.208µs (SLO: <700.000µs -8.7%) vs baseline: +0.5% Memory: ✅ 43.817MB (SLO: <46.000MB -4.7%) vs baseline: +5.0% ✅ ospathnormcase_aspectTime: ✅ 349.971µs (SLO: <700.000µs 📉 -50.0%) vs baseline: +0.4% Memory: ✅ 43.891MB (SLO: <46.000MB -4.6%) vs baseline: +5.0% ✅ ospathnormcase_noaspectTime: ✅ 358.089µs (SLO: <700.000µs 📉 -48.8%) vs baseline: -0.4% Memory: ✅ 43.874MB (SLO: <46.000MB -4.6%) vs baseline: +5.2% ✅ ospathsplit_aspectTime: ✅ 490.195µs (SLO: <700.000µs 📉 -30.0%) vs baseline: -0.3% Memory: ✅ 43.777MB (SLO: <46.000MB -4.8%) vs baseline: +4.7% ✅ ospathsplit_noaspectTime: ✅ 501.710µs (SLO: <700.000µs 📉 -28.3%) vs baseline: +0.3% Memory: ✅ 43.869MB (SLO: <46.000MB -4.6%) vs baseline: +4.6% ✅ ospathsplitdrive_aspectTime: ✅ 376.158µs (SLO: <700.000µs 📉 -46.3%) vs baseline: +1.2% Memory: ✅ 43.859MB (SLO: <46.000MB -4.7%) vs baseline: +5.4% ✅ ospathsplitdrive_noaspectTime: ✅ 72.936µs (SLO: <700.000µs 📉 -89.6%) vs baseline: +0.3% Memory: ✅ 43.940MB (SLO: <46.000MB -4.5%) vs baseline: +5.4% ✅ ospathsplitext_aspectTime: ✅ 468.664µs (SLO: <700.000µs 📉 -33.0%) vs baseline: +2.5% Memory: ✅ 43.882MB (SLO: <46.000MB -4.6%) vs baseline: +5.0% ✅ ospathsplitext_noaspectTime: ✅ 472.807µs (SLO: <700.000µs 📉 -32.5%) vs baseline: +1.2% Memory: ✅ 43.838MB (SLO: <46.000MB -4.7%) vs baseline: +5.1% 📈 iastaspectssplit - 12/12✅ rsplit_aspectTime: ✅ 164.378µs (SLO: <250.000µs 📉 -34.2%) vs baseline: 📈 +10.9% Memory: ✅ 43.844MB (SLO: <46.000MB -4.7%) vs baseline: +4.8% ✅ rsplit_noaspectTime: ✅ 160.924µs (SLO: <250.000µs 📉 -35.6%) vs baseline: +3.0% Memory: ✅ 43.868MB (SLO: <46.000MB -4.6%) vs baseline: +5.3% ✅ split_aspectTime: ✅ 153.591µs (SLO: <250.000µs 📉 -38.6%) vs baseline: +4.8% Memory: ✅ 43.854MB (SLO: <46.000MB -4.7%) vs baseline: +5.3% ✅ split_noaspectTime: ✅ 153.641µs (SLO: <250.000µs 📉 -38.5%) vs baseline: +2.6% Memory: ✅ 43.744MB (SLO: <46.000MB -4.9%) vs baseline: +4.7% ✅ splitlines_aspectTime: ✅ 148.880µs (SLO: <250.000µs 📉 -40.4%) vs baseline: +3.3% Memory: ✅ 43.849MB (SLO: <46.000MB -4.7%) vs baseline: +4.9% ✅ splitlines_noaspectTime: ✅ 152.229µs (SLO: <250.000µs 📉 -39.1%) vs baseline: +0.1% Memory: ✅ 43.925MB (SLO: <46.000MB -4.5%) vs baseline: +5.8% ✅ All Tests Passing (1 suite)✅ iastpropagation - 8/8✅ no-propagationTime: ✅ 48.731µs (SLO: <60.000µs 📉 -18.8%) vs baseline: +0.6% Memory: ✅ 40.875MB (SLO: <42.000MB -2.7%) vs baseline: +4.9% ✅ propagation_enabledTime: ✅ 136.190µs (SLO: <190.000µs 📉 -28.3%) vs baseline: +0.2% Memory: ✅ 40.934MB (SLO: <42.000MB -2.5%) vs baseline: +5.1% ✅ propagation_enabled_100Time: ✅ 1.561ms (SLO: <2.300ms 📉 -32.1%) vs baseline: -1.4% Memory: ✅ 40.875MB (SLO: <42.000MB -2.7%) vs baseline: +4.9% ✅ propagation_enabled_1000Time: ✅ 29.079ms (SLO: <34.550ms 📉 -15.8%) vs baseline: -0.7% Memory: ✅ 40.934MB (SLO: <42.000MB -2.5%) vs baseline: +5.2% ℹ️ Scenarios Missing SLO Configuration (20 scenarios)The following scenarios exist in candidate data but have no SLO thresholds configured:
|
Codeowners resolved as |
When `block_request_if_user_blocked` is called in `auto` mode and the configured user instrumentation mode resolves to `disabled`, the previous code fell through a combined guard that logged a warning saying "ASM must be enabled" — which is misleading because ASM may well be enabled. Separate the two early-exit conditions: log the warning only when ASM is actually disabled, and emit a `debug`-level message for the expected case where user instrumentation mode is explicitly configured as `disabled`. This avoids log noise on the hot path for a correctly-configured system. JJ-Change-Id: ttpsrz
…17580) ## Summary `block_request_if_user_blocked` has a guard-ordering bug: it checks `mode == DISABLED` **before** resolving `AUTO` to the configured mode. When called with `mode="auto"` (via `set_user_for_asm`) and `DD_APPSEC_AUTO_USER_INSTRUMENTATION_MODE=disabled`, the guard reads `"auto" == "disabled"` → False and proceeds, calling `should_block_user` unnecessarily. That WAF call returns `keep=True` and causes the tracer to force-keep the trace via `_asm_manual_keep` — even though no automated user event was generated. Fix: resolve `AUTO` to the configured mode first, then apply the disabled guard. ```python # Before (buggy) if not asm_config._asm_enabled or mode == LOGIN_EVENTS_MODE.DISABLED: return if mode == LOGIN_EVENTS_MODE.AUTO: mode = asm_config._user_event_mode # too late — guard already passed # After if mode == LOGIN_EVENTS_MODE.AUTO: mode = asm_config._user_event_mode # resolve first if not asm_config._asm_enabled or mode == LOGIN_EVENTS_MODE.DISABLED: return # now correctly catches disabled ``` ## How the bug was found While writing end-to-end tests for the `APPSEC_AUTO_EVENTS_TRACKING=disabled` scenario in [system-tests](DataDog/system-tests#6750), login-success traces were unexpectedly force-kept (`_sampling_priority_v1=2`, `_dd.p.dm=-5`) even though no user event tags were emitted. Debug tracing confirmed `_asm_manual_keep` was called from `_processor.py:399` via `should_block_user` → `call_waf_callback(usr.id=...)`, bypassing the disabled guard due to the ordering bug described above. ## Release notes > **Note:** Release notes still needed before this can be merged. Co-authored-by: romain.marcadier <romain.marcadier@datadoghq.com>
…17580) ## Summary `block_request_if_user_blocked` has a guard-ordering bug: it checks `mode == DISABLED` **before** resolving `AUTO` to the configured mode. When called with `mode="auto"` (via `set_user_for_asm`) and `DD_APPSEC_AUTO_USER_INSTRUMENTATION_MODE=disabled`, the guard reads `"auto" == "disabled"` → False and proceeds, calling `should_block_user` unnecessarily. That WAF call returns `keep=True` and causes the tracer to force-keep the trace via `_asm_manual_keep` — even though no automated user event was generated. Fix: resolve `AUTO` to the configured mode first, then apply the disabled guard. ```python # Before (buggy) if not asm_config._asm_enabled or mode == LOGIN_EVENTS_MODE.DISABLED: return if mode == LOGIN_EVENTS_MODE.AUTO: mode = asm_config._user_event_mode # too late — guard already passed # After if mode == LOGIN_EVENTS_MODE.AUTO: mode = asm_config._user_event_mode # resolve first if not asm_config._asm_enabled or mode == LOGIN_EVENTS_MODE.DISABLED: return # now correctly catches disabled ``` ## How the bug was found While writing end-to-end tests for the `APPSEC_AUTO_EVENTS_TRACKING=disabled` scenario in [system-tests](DataDog/system-tests#6750), login-success traces were unexpectedly force-kept (`_sampling_priority_v1=2`, `_dd.p.dm=-5`) even though no user event tags were emitted. Debug tracing confirmed `_asm_manual_keep` was called from `_processor.py:399` via `should_block_user` → `call_waf_callback(usr.id=...)`, bypassing the disabled guard due to the ordering bug described above. ## Release notes > **Note:** Release notes still needed before this can be merged. Co-authored-by: romain.marcadier <romain.marcadier@datadoghq.com>
Summary
block_request_if_user_blockedhas a guard-ordering bug: it checksmode == DISABLEDbefore resolvingAUTOto the configured mode.When called with
mode="auto"(viaset_user_for_asm) andDD_APPSEC_AUTO_USER_INSTRUMENTATION_MODE=disabled, the guard reads"auto" == "disabled"→ False and proceeds, callingshould_block_userunnecessarily. That WAF call returns
keep=Trueand causes the tracerto force-keep the trace via
_asm_manual_keep— even though noautomated user event was generated.
Fix: resolve
AUTOto the configured mode first, then apply thedisabled guard.
How the bug was found
While writing end-to-end tests for the
APPSEC_AUTO_EVENTS_TRACKING=disabledscenario in system-tests,
login-success traces were unexpectedly force-kept (
_sampling_priority_v1=2,_dd.p.dm=-5) even though no user event tags were emitted. Debugtracing confirmed
_asm_manual_keepwas called from_processor.py:399via
should_block_user→call_waf_callback(usr.id=...), bypassing thedisabled guard due to the ordering bug described above.
Release notes