Conversation
…e test-unit Rust-optional
The plugin tests were running every scenario twice — once with
use_rust=False (Python fallback) and once with use_rust=True (Rust
backend) — via @pytest.mark.parametrize. Since the Rust extension is
the only production implementation, the Python-path variants test
internal fallback code that users never hit directly and that is not a
supported product surface.
Changes:
- Remove all @pytest.mark.parametrize("use_rust", ...) decorators; each
test now calls _scan_container(payload, cfg) and lets the plugin
auto-select the backend (Rust when available, Python fallback otherwise).
- Remove TestRustPythonParity class (Rust/Python output parity is only
meaningful while two maintained implementations exist).
- Strip explicit use_rust=False from non-parametric helpers.
- Rename TestNewFeaturesRustParity → TestNewFeatures.
- Rename test_max_findings_per_value_cap_python_path → test_max_findings_per_value_cap.
- Guard make test-unit to emit a skip message instead of a hard error
when cargo is not on PATH, so pytest can still run in environments
without Rust tooling installed.
Closes #63
Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
…e only implementation The Python scanning implementation (shannon_entropy, printable_ratio, decode_candidate, scan_text, scan_container, etc.) was a complete duplicate of the Rust engine kept as a silent ImportError fallback. Since the Rust extension is the sole production path and has full feature parity, the fallback is dead weight. Changes: - encoded_exfil_detection.py: delete all Python detection functions and constants; replace try/except import with a direct hard import of ExfilDetectorEngine and py_scan_container; simplify plugin __init__ to always construct the Rust engine; keep _scan_container and _scan_text as thin Rust-backed wrappers for external callers - __init__.py: remove backward-compat py_scan_container re-export - test_integration.py: remove 12 tests that exercised deleted Python helpers (_shannon_entropy, _normalize_padding, _decode_candidate, _has_egress_context, _printable_ratio, _evaluate_candidate); all remaining 84 tests pass via the Rust engine Closes #63 Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
|
Addressed reviewer feedback: Python fallback removed — plugin is now Rust-only. The reviewer asked whether the plugin still had a fallback path. It did: Commit 5227896 removes it:
Net: −464 lines of Python, zero loss of detection capability. |
Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
|
Thanks for the cleanup here. I found a few items worth fixing before merge:
|
Three issues raised by reviewer lucarlig: 1. Stubs diverge from runtime API — remove py_scan_container from __init__.pyi and from the hardcoded top-level stub in stub_gen.rs. The symbol is no longer re-exported by __init__.py so type checkers were accepting an import that fails at runtime. 2. Allowlist regex validation used Python re.compile() semantics, which accepts lookaround and backreferences that Rust regex rejects. Replace the misleading Python check with a non-empty-string guard and wrap ExfilDetectorEngine construction in a try/except that raises a clear ValueError naming allowlist_patterns and the unsupported features, so the engine fails closed with an actionable message. 3. test-unit silently skipped when cargo was absent, letting test-all and check-all go green without testing the only scanner implementation. Make test-unit fail loudly if cargo is missing. Add test-unit-local as an explicit opt-in target that preserves the skip-with-notice behaviour for environments without Rust toolchain. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
|
Thanks for the thorough review. All three items are addressed in commit 4172fd9. 1. Stubs diverge from runtime API Removed 2. Allowlist regex validation uses Python semantics Replaced the 3. Rust tests can silently skip
|
Three issues raised by reviewer lucarlig: 1. Stubs diverge from runtime API — remove py_scan_container from __init__.pyi and from the hardcoded top-level stub in stub_gen.rs. The symbol is no longer re-exported by __init__.py so type checkers were accepting an import that fails at runtime. 2. Allowlist regex validation used Python re.compile() semantics, which accepts lookaround and backreferences that Rust regex rejects. Replace the misleading Python check with a non-empty-string guard and wrap ExfilDetectorEngine construction in a try/except that raises a clear ValueError naming allowlist_patterns and the unsupported features, so the engine fails closed with an actionable message. 3. test-unit silently skipped when cargo was absent, letting test-all and check-all go green without testing the only scanner implementation. Make test-unit fail loudly if cargo is missing. Add test-unit-local as an explicit opt-in target that preserves the skip-with-notice behaviour for environments without Rust toolchain. Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
4172fd9 to
831bb09
Compare
|
Hi Suresh, thanks for the updates. The earlier stub/runtime drift and A few concerns still look worth handling before merge:
|
…back 1. Restore py_scan_container top-level re-export — removing it in a patch release (0.2.0→0.2.1) is a breaking change; lazy re-export added to __init__.py/__init__.pyi/stub_gen.rs so callers continue to work. 2. Guard Rust extension import at module level — replaced hard top-level import with a try/except that captures ImportError; _scan_container and EncodedExfilDetectorPlugin.__init__ now raise an actionable ImportError (rather than failing silently during plugin discovery). 3. Remove **_kwargs compatibility shim — _scan_container and _scan_text no longer accept use_rust= or other stale kwargs; callers using unsupported kwargs now get a clear TypeError instead of silent no-op. 4. Allowlist regex tests and docs — added test_python_valid_rust_invalid_allowlist_regex_rejected_at_init that passes (?<=foo)bar (lookbehind: valid Python, rejected by Rust's regex crate) and asserts ValueError matches "allowlist_patterns"; updated test_invalid_allowlist_regex_rejected_at_init to assert the same. README corrected: regex errors surface at engine initialization time, not at configuration time. Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
|
Thanks for the follow-up review. All four items are addressed in commit 8cd7638. 1. Patch release removes a top-level API Restored 2. Rust extension import now happens at module import time The top-level 3. Compatibility kwargs are silently ignored
4. Allowlist regex coverage/docs
|
Summary
Closes #63
This PR completes the Rust-only migration for
encoded_exfil_detectionby removing all dual-path testing and the Python fallback implementation.Changes
Remove parametric
use_rusttesting@pytest.mark.parametrize("use_rust", [False, True])from all test classes intest_integration.py— Rust is the production backend, no separate Python path warrants testing.TestRustPythonParityclass entirely.TestNewFeaturesRustParity→TestNewFeaturesandtest_max_findings_per_value_cap_python_path→test_max_findings_per_value_cap.Makefiletest-unitto skip gracefully (yellow notice) whencargois absent instead of hard-failing.Remove Python fallback implementation
encoded_exfil_detection.py: deleted the entire Python fallback (~400 lines:_PATTERNS,_shannon_entropy,_printable_ratio,_normalize_padding,_decode_candidate,_contains_sensitive_keywords,_has_egress_context,_apply_redactions,_evaluate_candidate,_scan_text,_scan_container).try/except ImportErrorwith a direct hard import ofExfilDetectorEngineandpy_scan_containerfrom the Rust extension — fails loudly if the extension is not built._scan_containerand_scan_textkept as thin one-line Rust-backed wrappers for external callers.__init__.py: removedpy_scan_containerbackward-compat re-export.Version bump
Cargo.toml,plugin-manifest.yaml,Cargo.lockbumped from0.2.0→0.2.1.Test plan
make test-integrationpasses — 84 passed, 2 xfailedmake test-unitskips cleanly whencargois absent