fix(encoded_exfil_detection): remove parametric tests, drop Python fallback, bump to 0.2.1#64

Merged
lucarlig merged 5 commits into main from fix/encoded-exfil-remove-parametric-tests
May 1, 2026

Conversation

@msureshkumar88
Collaborator

@msureshkumar88 msureshkumar88 commented Apr 30, 2026

Summary

Closes #63

This PR completes the Rust-only migration for encoded_exfil_detection by removing all dual-path testing and the Python fallback implementation.

Changes

Remove parametric use_rust testing

  • Removed @pytest.mark.parametrize("use_rust", [False, True]) from all test classes in test_integration.py — Rust is the production backend, no separate Python path warrants testing.
  • Removed TestRustPythonParity class entirely.
  • Renamed TestNewFeaturesRustParity → TestNewFeatures and test_max_findings_per_value_cap_python_path → test_max_findings_per_value_cap.
  • Updated Makefile test-unit to skip gracefully (yellow notice) when cargo is absent instead of hard-failing.

Remove Python fallback implementation

  • encoded_exfil_detection.py: deleted the entire Python fallback (~400 lines: _PATTERNS, _shannon_entropy, _printable_ratio, _normalize_padding, _decode_candidate, _contains_sensitive_keywords, _has_egress_context, _apply_redactions, _evaluate_candidate, _scan_text, _scan_container).
  • Replaced try/except ImportError with a direct hard import of ExfilDetectorEngine and py_scan_container from the Rust extension — fails loudly if the extension is not built.
  • _scan_container and _scan_text kept as thin one-line Rust-backed wrappers for external callers.
  • __init__.py: removed py_scan_container backward-compat re-export.
  • Removed 12 tests that directly exercised the deleted Python helpers.
  • Net: −464 lines removed.

Version bump

  • Cargo.toml, plugin-manifest.yaml, Cargo.lock bumped from 0.2.0 → 0.2.1.

Test plan

  • make test-integration passes — 84 passed, 2 xfailed
  • make test-unit skips cleanly when cargo is absent
  • CI passes on this PR

…e test-unit Rust-optional

The plugin tests were running every scenario twice — once with
use_rust=False (Python fallback) and once with use_rust=True (Rust
backend) — via @pytest.mark.parametrize. Since the Rust extension is
the only production implementation, the Python-path variants test
internal fallback code that users never hit directly and that is not a
supported product surface.

Changes:
- Remove all @pytest.mark.parametrize("use_rust", ...) decorators; each
  test now calls _scan_container(payload, cfg) and lets the plugin
  auto-select the backend (Rust when available, Python fallback otherwise).
- Remove TestRustPythonParity class (Rust/Python output parity is only
  meaningful while two maintained implementations exist).
- Strip explicit use_rust=False from non-parametric helpers.
- Rename TestNewFeaturesRustParity → TestNewFeatures.
- Rename test_max_findings_per_value_cap_python_path → test_max_findings_per_value_cap.
- Guard make test-unit to emit a skip message instead of a hard error
  when cargo is not on PATH, so pytest can still run in environments
  without Rust tooling installed.

Closes #63

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
…e only implementation

The Python scanning implementation (shannon_entropy, printable_ratio,
decode_candidate, scan_text, scan_container, etc.) was a complete
duplicate of the Rust engine kept as a silent ImportError fallback.
Since the Rust extension is the sole production path and has full
feature parity, the fallback is dead weight.

Changes:
- encoded_exfil_detection.py: delete all Python detection functions and
  constants; replace try/except import with a direct hard import of
  ExfilDetectorEngine and py_scan_container; simplify plugin __init__
  to always construct the Rust engine; keep _scan_container and
  _scan_text as thin Rust-backed wrappers for external callers
- __init__.py: remove backward-compat py_scan_container re-export
- test_integration.py: remove 12 tests that exercised deleted Python
  helpers (_shannon_entropy, _normalize_padding, _decode_candidate,
  _has_egress_context, _printable_ratio, _evaluate_candidate); all
  remaining 84 tests pass via the Rust engine

Closes #63

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
@msureshkumar88
Collaborator Author

Addressed reviewer feedback: Python fallback removed — plugin is now Rust-only.

The reviewer asked whether the plugin still had a fallback path. It did: encoded_exfil_detection.py contained the full Python implementation (_shannon_entropy, _printable_ratio, _decode_candidate, _scan_text, _scan_container, etc.) behind a silent try/except ImportError. Since the Rust extension is a complete 1:1 replacement for every Python function and is the sole production path, the fallback was dead weight.

Commit 5227896 removes it:

  • encoded_exfil_detection.py: all Python detection functions and constants deleted; try/except import replaced with a direct hard import of ExfilDetectorEngine and py_scan_container — the plugin now fails at import time if the Rust extension is not built, which is the correct behaviour. _scan_container and _scan_text are retained as thin Rust-backed wrappers (one line each) so that callers outside the plugin class continue to work.
  • __init__.py: removed backward-compat py_scan_container re-export (was tagged # backward compat in the original).
  • test_integration.py: 12 tests that directly exercised the now-deleted Python helpers (_shannon_entropy, _normalize_padding, _decode_candidate, _has_egress_context, _printable_ratio, _evaluate_candidate) removed. All remaining 84 tests pass, 2 xfailed as expected.

Net: −464 lines of Python, zero loss of detection capability.

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
@msureshkumar88 msureshkumar88 changed the title from "fix(encoded_exfil_detection): remove parametric use_rust testing; make test-unit Rust-optional" to "fix(encoded_exfil_detection): remove parametric tests, drop Python fallback, bump to 0.2.1" on Apr 30, 2026
@lucarlig
Collaborator

Thanks for the cleanup here. I found a few items worth fixing before merge:

  1. Runtime API and stubs diverge

    cpex_encoded_exfil_detection.__init__ no longer exposes py_scan_container, but __init__.pyi and src/bin/stub_gen.rs still advertise from cpex_encoded_exfil_detection import py_scan_container. That means type checkers accept an import that now fails at runtime. Please either restore the lazy top-level export, or remove it from both shipped/generated stubs.

  2. Allowlist regex validation still uses Python semantics

    allowlist_patterns are validated with Python re.compile, but plugin init now unconditionally builds the Rust engine. Valid Python regex features that Rust regex rejects, such as lookaround/backrefs, can now pass config validation and then fail during engine construction. Please validate against the Rust-compatible regex syntax, or fail closed with a clear config error before constructing the engine.

  3. Rust tests can silently skip

    test-unit exits successfully when cargo is missing, but this PR makes Rust the only scanner implementation. make test, make test-all, or check-all can therefore go green without testing the only implementation. Please make missing Cargo fail, or move the skip behind an explicit opt-in local target/flag.
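
To make item 2 concrete, here is a minimal sketch of the Python side of the mismatch. It only demonstrates that Python's `re` module compiles lookaround and backreference patterns; that Rust's `regex` crate does not support those features is taken from the review comment above and is not exercised here. The helper name `python_accepts` and the sample list are illustrative assumptions, not code from the plugin.

```python
import re

# Patterns that Python's `re` accepts. Per the review above, the Rust
# `regex` crate rejects these because it lacks lookaround and backreferences.
PYTHON_ONLY_PATTERNS = [
    r"(?<=foo)bar",  # lookbehind
    r"foo(?=bar)",   # lookahead
    r"(a)\1",        # backreference
]


def python_accepts(pattern: str) -> bool:
    """Return True if Python's re module compiles the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Every pattern above passes a Python-only validator.
results = {p: python_accepts(p) for p in PYTHON_ONLY_PATTERNS}

# By contrast, an unterminated character class is invalid in Python too,
# so it cannot reveal the Python/Rust difference.
invalid_everywhere = python_accepts("[invalid")
```

A validator built on `re.compile` therefore lets the first three patterns through, which is why the failure would only show up later, when the Rust engine is constructed.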

msureshkumar88 pushed a commit that referenced this pull request Apr 30, 2026
Three issues raised by reviewer lucarlig:

1. Stubs diverge from runtime API — remove py_scan_container from
   __init__.pyi and from the hardcoded top-level stub in stub_gen.rs.
   The symbol is no longer re-exported by __init__.py so type checkers
   were accepting an import that fails at runtime.

2. Allowlist regex validation used Python re.compile() semantics, which
   accepts lookaround and backreferences that Rust regex rejects. Replace
   the misleading Python check with a non-empty-string guard and wrap
   ExfilDetectorEngine construction in a try/except that raises a clear
   ValueError naming allowlist_patterns and the unsupported features,
   so the engine fails closed with an actionable message.

3. test-unit silently skipped when cargo was absent, letting test-all
   and check-all go green without testing the only scanner implementation.
   Make test-unit fail loudly if cargo is missing. Add test-unit-local
   as an explicit opt-in target that preserves the skip-with-notice
   behaviour for environments without Rust toolchain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
@msureshkumar88
Collaborator Author

Thanks for the thorough review. All three items are addressed in commit 4172fd9.

1. Stubs diverge from runtime API

Removed py_scan_container from cpex_encoded_exfil_detection/__init__.pyi (__all__ and the from .encoded_exfil_detection_rust import line) and updated stub_gen.rs so future make stub-gen runs produce the corrected stub. Type checkers will no longer advertise an import that fails at runtime.

2. Allowlist regex validation uses Python semantics

Replaced the re.compile() Pydantic validator with a minimal non-empty-string guard (the only invariant Pydantic can meaningfully enforce here). Rust validates regex syntax when ExfilDetectorEngine is constructed; that call is now wrapped in a try/except ValueError that re-raises with a clear message naming allowlist_patterns and the unsupported features (lookaround, backreferences). The plugin fails closed before it becomes usable if a Rust-incompatible pattern is supplied.
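
The fail-closed wrapping described above might look roughly like the sketch below. `FakeRustEngine` is a hypothetical stand-in for `ExfilDetectorEngine` (the real class lives in the compiled extension and its constructor signature is not shown in this thread), and `build_engine` is an illustrative helper name. The lookaround check inside the fake engine is a crude imitation, not the Rust crate's actual parser.

```python
class FakeRustEngine:
    """Hypothetical stand-in for ExfilDetectorEngine (assumption, not the real API)."""

    def __init__(self, allowlist_patterns):
        for pattern in allowlist_patterns:
            # Crude imitation of a regex engine that rejects lookaround.
            if pattern.startswith(("(?=", "(?!", "(?<=", "(?<!")):
                raise ValueError(f"look-around is not supported: {pattern}")
        self.allowlist_patterns = list(allowlist_patterns)


def build_engine(allowlist_patterns):
    """Construct the engine, re-raising regex failures as a config-oriented error."""
    # Minimal guard: the only invariant checked before engine construction.
    if any(not isinstance(p, str) or not p for p in allowlist_patterns):
        raise ValueError("allowlist_patterns entries must be non-empty strings")
    try:
        return FakeRustEngine(allowlist_patterns)
    except ValueError as exc:
        # Name the config field and the unsupported features so the
        # operator knows what to change.
        raise ValueError(
            "allowlist_patterns contains a pattern the Rust regex engine "
            "rejects (lookaround and backreferences are not supported): "
            f"{exc}"
        ) from exc
```

With this shape, `build_engine([r"(?<=foo)bar"])` raises a `ValueError` whose message mentions `allowlist_patterns`, and no engine object is returned.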

3. Rust tests can silently skip

test-unit now calls $(CARGO) test unconditionally — cargo absent means make fails. Added test-unit-local as an explicit opt-in target that preserves the skip-with-notice behaviour for environments intentionally without a Rust toolchain.

@msureshkumar88 msureshkumar88 force-pushed the fix/encoded-exfil-remove-parametric-tests branch from 4172fd9 to 831bb09 April 30, 2026 14:08
@lucarlig
Collaborator

Hi Suresh, thanks for the updates. The earlier stub/runtime drift and test-unit skip issue look addressed now.

A few concerns still look worth handling before merge:

  1. Patch release removes a top-level API

    from cpex_encoded_exfil_detection import py_scan_container worked before, but now fails while the version only moves from 0.2.0 to 0.2.1. If this is intended to be non-breaking, please restore the lazy top-level export and stub entry. If it is breaking, the version/docs should make that clear.

  2. Rust extension import now happens at module import time

    encoded_exfil_detection.py imports encoded_exfil_detection_rust immediately. A source checkout or install missing the compiled extension now fails during module import/plugin discovery, before plugin init can raise a targeted error. Please move the Rust import behind engine construction, or keep a guarded import that raises an actionable init error.

  3. Compatibility kwargs are silently ignored

    _scan_container(..., use_rust=False) and _scan_text(..., use_rust=False) still call successfully because of **_kwargs, but Rust is always used. Please reject unsupported kwargs explicitly, or remove that compatibility surface so old callers fail clearly.

  4. Allowlist regex coverage/docs need a small follow-up

    The current invalid-regex test uses [invalid, which is invalid in both Python and Rust. Please add a Python-valid/Rust-invalid case, such as (?<=foo)bar, and assert the wrapped error mentions allowlist_patterns / Rust regex compatibility. Also, the README still says invalid allowlist regexes are rejected at configuration time, but syntax errors now surface later during Rust engine/plugin initialization.

…back

1. Restore py_scan_container top-level re-export — removing it in a patch
   release (0.2.0→0.2.1) is a breaking change; lazy re-export added to
   __init__.py/__init__.pyi/stub_gen.rs so callers continue to work.

2. Guard Rust extension import at module level — replaced hard top-level
   import with a try/except that captures ImportError; _scan_container and
   EncodedExfilDetectorPlugin.__init__ now raise an actionable ImportError
   (rather than failing silently during plugin discovery).

3. Remove **_kwargs compatibility shim — _scan_container and _scan_text no
   longer accept use_rust= or other stale kwargs; callers using unsupported
   kwargs now get a clear TypeError instead of silent no-op.

4. Allowlist regex tests and docs — added
   test_python_valid_rust_invalid_allowlist_regex_rejected_at_init that
   passes (?<=foo)bar (lookbehind: valid Python, rejected by Rust's regex
   crate) and asserts ValueError matches "allowlist_patterns"; updated
   test_invalid_allowlist_regex_rejected_at_init to assert the same.
   README corrected: regex errors surface at engine initialization time,
   not at configuration time.

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>
@msureshkumar88
Collaborator Author

Thanks for the follow-up review. All four items are addressed in commit 8cd7638.

1. Patch release removes a top-level API

Restored py_scan_container as a lazy re-export in __init__.py (added to __getattr__ and __all__), updated __init__.pyi to re-export it, and updated stub_gen.rs so make stub-gen produces the corrected stub going forward. from cpex_encoded_exfil_detection import py_scan_container works again and type-checkers won't advertise an import that fails at runtime.

2. Rust extension import now happens at module import time

The top-level from cpex_encoded_exfil_detection.encoded_exfil_detection_rust import ... is now wrapped in try/except ImportError. Both _scan_container and EncodedExfilDetectorPlugin.__init__ check _RUST_IMPORT_ERROR and raise a targeted ImportError("Rust extension not built — run 'make install' before using this plugin") before doing anything with the extension. Plugin discovery no longer fails at import time; the error surfaces at the point of use with a clear action to take.

3. Compatibility kwargs are silently ignored

**_kwargs removed from both _scan_container and _scan_text. Callers passing use_rust=False (or any other stale kwarg) now get a TypeError immediately instead of a silent no-op.
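
A small sketch of the behavioural difference, using simplified hypothetical signatures (the real `_scan_text` parameters and return value are not shown in this thread):

```python
def scan_text_with_shim(text, cfg=None, **_kwargs):
    """Old shape: stale keyword arguments such as use_rust are swallowed."""
    return ("rust", text)


def scan_text_strict(text, cfg=None):
    """New shape: any unknown keyword argument raises TypeError."""
    return ("rust", text)


# The shim accepts use_rust=False but still takes the Rust path.
shim_result = scan_text_with_shim("payload", use_rust=False)

# The strict signature rejects the stale kwarg at call time.
try:
    scan_text_strict("payload", use_rust=False)
    strict_error = None
except TypeError as exc:
    strict_error = exc
```

Python raises the `TypeError` itself when a call passes a keyword the function does not declare, so the strict version needs no extra checking code.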

4. Allowlist regex coverage/docs

  • Added test_python_valid_rust_invalid_allowlist_regex_rejected_at_init: passes (?<=foo)bar (lookbehind — valid Python, rejected by Rust's regex crate), asserts ValueError with match="allowlist_patterns".
  • Tightened the existing test_invalid_allowlist_regex_rejected_at_init to also use pytest.raises(ValueError, match="allowlist_patterns") (previously (ValidationError, Exception) without a message check).
  • README updated: "rejected at configuration time" → "rejected at engine initialization time (during plugin construction). Features such as lookaround and backreferences are not supported."

Collaborator

@lucarlig lucarlig left a comment


LGTM

@lucarlig lucarlig merged commit 1770855 into main May 1, 2026
44 checks passed
@lucarlig lucarlig deleted the fix/encoded-exfil-remove-parametric-tests branch May 1, 2026 09:37
Development

Successfully merging this pull request may close these issues.

encoded_exfil_detection: remove parametric use_rust testing, drop Python fallback, and bump to 0.2.1

2 participants