
agent-sec-core: add prompt guard#253

Merged
edonyzpc merged 7 commits into alibaba:release/agent-sec-core/v0.3 from haosanzi:main
Apr 22, 2026

Conversation

@haosanzi
Collaborator

@haosanzi haosanzi commented Apr 20, 2026

Description

Related Issue

closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional change)
  • Performance improvement
  • CI/CD or build changes

Scope

  • cosh (copilot-shell)
  • sec-core (agent-sec-core)
  • skill (os-skills)
  • sight (agentsight)
  • Multiple / Project-wide

Checklist

  • I have read the Contributing Guide
  • My code follows the project's code style
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the documentation accordingly
  • For cosh: Lint passes, type check passes, and tests pass
  • For sec-core (Rust): cargo clippy -- -D warnings and cargo fmt --check pass
  • For sec-core (Python): Ruff format and pytest pass
  • For skill: Skill directory structure is valid and shell scripts pass syntax check
  • For sight: cargo clippy -- -D warnings and cargo fmt --check pass
  • Lock files are up to date (package-lock.json / Cargo.lock)

Testing

Additional Notes


@haosanzi haosanzi added the component:sec-core src/agent-sec-core/ label Apr 20, 2026
@haosanzi haosanzi added this to the sec-core/v0.3 milestone Apr 20, 2026
Collaborator

@edonyzpc edonyzpc left a comment


👋 Review Summary

Nice work on this prompt scanner module! The multi-layer architecture (L1 regex → L2 ML classifier → future L3 semantic) is well thought out — separating fast deterministic rules from ML inference is a pragmatic approach that balances latency and accuracy. The code quality is generally solid: clean abstractions, good docstrings, and comprehensive unit tests.

I've identified one blocking bug and a few items worth addressing.

🛡️ Key Risks & Issues

1. classify_batch will crash at runtime due to tensor shape mismatch in _probs_to_result (Critical)

In prompt_guard.py, _probs_to_result assumes its input is always a 2-D tensor (shape (1, num_labels)), because probs[0] is used to extract the first row. This works for classify() (which passes the output of _get_probabilities — always (1, num_labels)). However, classify_batch calls _probs_to_result(probs_tensor[i]) where probs_tensor[i] is already a 1-D tensor of shape (num_labels,). In this case, probs[0] yields a 0-dim scalar tensor, and probs[0].tolist() returns a single float — not a list. Downstream code (prob_map = {label: prob_list[i] ...}) will then raise TypeError: 'float' object is not subscriptable.

See inline comment for a suggested fix.
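A minimal sketch of a shape-tolerant fix, using plain nested lists in place of torch tensors (the helper name and two-label set here are illustrative, not the PR's actual code); with real tensors the equivalent check would be `if probs.dim() == 2: probs = probs[0]`:

```python
# Sketch: accept either a (1, num_labels) batch or a flat (num_labels,) row.
LABELS = ["BENIGN", "INJECTION"]  # illustrative label set

def probs_to_result(probs):
    """Normalize the input to a single 1-D probability row before mapping."""
    # If the first element is itself a sequence, we were handed a 2-D batch:
    # take its first row. Otherwise the input is already a single row.
    if probs and isinstance(probs[0], (list, tuple)):
        probs = probs[0]
    prob_list = list(probs)
    return {label: prob_list[i] for i, label in enumerate(LABELS)}

# Both call sites now work:
print(probs_to_result([[0.9, 0.1]]))  # classify(): 2-D input
print(probs_to_result([0.9, 0.1]))    # classify_batch(): 1-D row
```

Both calls return `{'BENIGN': 0.9, 'INJECTION': 0.1}` instead of the second raising `TypeError`.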

2. ModelManager.load_model lacks thread-safety for scan_batch concurrency (Medium)

scan_batch dispatches work via ThreadPoolExecutor. If the ML model isn't loaded yet when multiple threads hit MLClassifier.detect() simultaneously, all of them pass the if model_name in self._loaded_models check and enter _do_load, potentially triggering redundant downloads or concurrent writes to _loaded_models. Under CPython's GIL this won't corrupt the dict, but it could waste resources. Consider adding a threading.Lock around the check-then-load pattern in load_model, or using functools.lru_cache.
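One way to close this race is double-checked locking, sketched below (class and method names mirror the review's description; the loader body is a placeholder for the real download):

```python
import threading

class ModelManager:
    """Sketch of a thread-safe check-then-load pattern (names are assumptions)."""

    def __init__(self):
        self._loaded_models = {}
        self._lock = threading.Lock()

    def load_model(self, model_name):
        # Fast path: once cached, no lock is taken at all.
        model = self._loaded_models.get(model_name)
        if model is not None:
            return model
        with self._lock:
            # Double-check: another thread may have loaded it while we waited.
            model = self._loaded_models.get(model_name)
            if model is None:
                model = self._do_load(model_name)
                self._loaded_models[model_name] = model
            return model

    def _do_load(self, model_name):
        return f"weights:{model_name}"  # placeholder for the real download
```

Even with many threads hitting a cold cache, `_do_load` runs exactly once per model.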

🧪 Verification Advice

  • The classify_batch bug should be easy to reproduce with a 2-item batch in test_ml_classifier.py (mock the model forward pass to return a (2, 2) tensor, then verify both results are correct ClassifierResult objects).
  • For the thread-safety concern, a stress test calling scan_batch with ~20 items on a cold-start (model not pre-loaded) would surface the redundant-load behavior.

💡 Thoughts & Suggestions

  • Dead code in model_manager.py: The already_cached variable (lines ~168-172) is computed (involving filesystem I/O) but never referenced afterward. It looks like residual scaffolding for progress-bar suppression logic that wasn't completed. Also, import os is duplicated — it's already imported at module level (line 18) and again inside _do_load.

  • PR description: The issue reference (closes #) is empty — consider linking the tracking issue before merge.

  • Overall architecture is clean. The separation of concerns (preprocessor → detectors → scoring → result) is easy to follow and extend.

@haosanzi haosanzi force-pushed the main branch 2 times, most recently from 106579d to 5d97a78 (April 21, 2026 02:38)
Comment thread src/agent-sec-core/agent-sec-cli/src/agent_sec_cli/prompt_scanner/result.py Outdated
```python
return Verdict.PASS
if risk_score < 0.8:
    return Verdict.WARN
return Verdict.DENY
```
Collaborator


[NIT] The current version uses the observer pattern, so DENY needs special handling in the hook implementation.

Comment thread src/agent-sec-core/docs/PROMPT_SCANNER.md
…itecture

Add prompt_scanner module with multi-layer prompt injection and jailbreak detection capabilities:

- Add scan-prompt CLI subcommand with JSON format output for scan results
- Implement three-layer detection: L1 rule engine, L2 ML classifier, L3 semantic detection
- Support three detection modes: fast (L1), standard (L1+L2), strict (L1+L2+L3)
- Add prompt preprocessing module (Unicode normalization, encoding detection, etc. as stubs)
- Implement risk scoring system and Verdict determination logic
- Register scan-prompt to the main CLI application

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
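The mode-to-layer mapping described in this commit can be sketched as follows (the detector names and enum are assumptions for illustration, not the module's actual identifiers):

```python
from enum import Enum

class Mode(Enum):
    FAST = "fast"          # L1 only: deterministic regex rules
    STANDARD = "standard"  # L1 + L2: rules plus ML classifier
    STRICT = "strict"      # L1 + L2 + L3: adds semantic detection

# Hypothetical layer registry; each mode enables a superset of the previous one.
LAYERS = {
    Mode.FAST: ["rule_engine"],
    Mode.STANDARD: ["rule_engine", "ml_classifier"],
    Mode.STRICT: ["rule_engine", "ml_classifier", "semantic"],
}

def detectors_for(mode: Mode) -> list[str]:
    """Return the detection layers to run for the given scan mode."""
    return LAYERS[mode]
```

This keeps the latency/accuracy trade-off explicit: callers pick a mode, not individual detectors.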
…ions

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
- Implement NFKC unicode normalization for homoglyph and fullwidth char unification
- Add whitespace normalization with zero-width/invisible char stripping
- Implement encoding detection and decoding (Base64, ROT13, URL-encoding, hex)
- Add lightweight heuristic language detection (CJK, Arabic, Cyrillic, Devanagari, Latin)
- Add comprehensive unit tests for all preprocessing stages

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
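The first two normalization steps in this commit can be illustrated with Python's standard library (the zero-width character set shown is a plausible subset, not the module's actual list):

```python
import unicodedata

# Illustrative subset of invisible characters an attacker can hide inside text.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def normalize_prompt(text: str) -> str:
    """NFKC-fold fullwidth/homoglyph variants, then strip invisible chars."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)

# Fullwidth letters fold to ASCII and the zero-width space disappears:
print(normalize_prompt("ｉｇｎｏｒｅ\u200b previous instructions"))
# ignore previous instructions
```

Without this pass, a regex rule looking for "ignore previous instructions" would miss the obfuscated variant entirely.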
- Complete MLClassifier detection layer with Prompt Guard 2 integration
- Implement ModelManager with ModelScope download and caching support
- Implement PromptGuardClassifier with single and batch inference
- Add ml optional dependency group (torch, transformers, modelscope)
- Add unit tests in test_ml_classifier.py

Technical details:
- Use Meta Llama Prompt Guard 2 (86M) as default model
- Support BENIGN/INJECTION/JAILBREAK three-class detection
- Implement text preprocessing to avoid tokenizer boundary issues
- Auto-detect best compute device (CUDA > MPS > CPU)
- Download models via ModelScope mirror for China network compatibility

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
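The device-priority rule (CUDA > MPS > CPU) reduces to a small pure function; the real implementation would presumably feed it the results of `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, but the selection logic itself needs no torch:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best compute device in priority order CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

Keeping the decision as a pure function also makes it trivial to unit-test without GPU hardware.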
…umentation

- Model pre-download and caching via ModelScope to eliminate cold-start latency
- Structured audit logs with scan and threat event types
- Human-readable text output format with visual indicators
- Full Python API documentation with examples
- Graceful degradation when optional detectors (ML/semantic) are unavailable

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
…ly style guide

- Add double-checked locking to ModelManager.load_model() for thread safety
- Fix _probs_to_result() to handle both 1-D and 2-D tensor shapes
- Remove stale local cache-dir check in _do_load()
- Replace `os.path.expanduser` with `Path.expanduser()` in ModelManager
- Replace bare `dict` with `dict[str, Any]` in MLClassifier and RuleEngine
- Pass `str(cache_dir)` to snapshot_download for explicit type compat
- Optional heavy deps declared in [project.optional-dependencies] use lazy
  imports guarded by is_available() with `# noqa: PLC0415` annotations.

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
Collaborator


It looks like the security_middleware invoke is never called?

Collaborator Author


I'll add that in a follow-up PR. This PR is already too large.

- Fix thread-unsafe MLClassifier singleton init by adding _manager_lock
- Fix scan_batch to run serially in STANDARD/STRICT mode
- Fix json.dumps to use ensure_ascii=False in CLI error output
- Fix RuleEngine to include severity in ThreatDetail output
- Promote ML deps to core; remove optional [ml] extras

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
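The `ensure_ascii=False` fix in this commit matters for CJK output; a quick comparison of the two behaviors of the standard-library `json.dumps`:

```python
import json

# Default (ensure_ascii=True): non-ASCII text is \u-escaped, which makes
# CLI error output unreadable for CJK users:
print(json.dumps({"verdict": "拒绝"}))                      # {"verdict": "\u62d2\u7edd"}

# With ensure_ascii=False the original characters survive:
print(json.dumps({"verdict": "拒绝"}, ensure_ascii=False))  # {"verdict": "拒绝"}
```

The change only affects presentation; both forms parse back to the same object with `json.loads`.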
Collaborator

@edonyzpc edonyzpc left a comment


LGTM

@edonyzpc edonyzpc merged commit 4f27f5c into alibaba:release/agent-sec-core/v0.3 Apr 22, 2026
9 checks passed

Labels

component:sec-core src/agent-sec-core/


3 participants