agent-sec-core: add prompt guard #253
edonyzpc merged 7 commits into alibaba:release/agent-sec-core/v0.3 from
Conversation
edonyzpc
left a comment
👋 Review Summary
Nice work on this prompt scanner module! The multi-layer architecture (L1 regex → L2 ML classifier → future L3 semantic) is well thought out — separating fast deterministic rules from ML inference is a pragmatic approach that balances latency and accuracy. The code quality is generally solid: clean abstractions, good docstrings, and comprehensive unit tests.
I've identified one blocking bug and a few items worth addressing.
🛡️ Key Risks & Issues
1. classify_batch will crash at runtime due to tensor shape mismatch in _probs_to_result (Critical)
In prompt_guard.py, _probs_to_result assumes its input is always a 2-D tensor (shape (1, num_labels)), because probs[0] is used to extract the first row. This works for classify() (which passes the output of _get_probabilities — always (1, num_labels)). However, classify_batch calls _probs_to_result(probs_tensor[i]) where probs_tensor[i] is already a 1-D tensor of shape (num_labels,). In this case, probs[0] yields a 0-dim scalar tensor, and probs[0].tolist() returns a single float — not a list. Downstream code (prob_map = {label: prob_list[i] ...}) will then raise TypeError: 'float' object is not subscriptable.
See inline comment for a suggested fix.
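A shape-agnostic version of the helper could look like the following sketch. Here numpy arrays stand in for torch tensors (the dimensionality behavior is the same), and the label list and function name are illustrative, not the PR's actual API:

```python
import numpy as np

LABELS = ["BENIGN", "INJECTION", "JAILBREAK"]  # hypothetical label order

def probs_to_result(probs):
    """Accept either a (1, num_labels) 2-D array or a (num_labels,) 1-D row."""
    arr = np.asarray(probs)
    if arr.ndim == 2:
        arr = arr[0]          # single-item forward pass: take the first row
    prob_list = arr.tolist()  # now always a flat list of floats
    return {label: prob_list[i] for i, label in enumerate(LABELS)}

# Both call paths now yield the same mapping:
single = probs_to_result(np.array([[0.7, 0.2, 0.1]]))  # classify() shape
row = probs_to_result(np.array([0.7, 0.2, 0.1]))       # classify_batch() row shape
```

Normalizing the dimensionality at the top of the helper keeps both call sites unchanged.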
2. ModelManager.load_model lacks thread-safety for scan_batch concurrency (Medium)
scan_batch dispatches work via ThreadPoolExecutor. If the ML model isn't loaded yet when multiple threads hit MLClassifier.detect() simultaneously, all of them pass the if model_name in self._loaded_models check and enter _do_load, potentially triggering redundant downloads or concurrent writes to _loaded_models. Under CPython's GIL this won't corrupt the dict, but it could waste resources. Consider adding a threading.Lock around the check-then-load pattern in load_model, or using functools.lru_cache.
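A minimal double-checked-locking sketch is below; the class and method names mirror the ones discussed above, but the load body and the `load_count` instrumentation are stand-ins for illustration:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ModelManager:
    """Sketch: serialize the check-then-load without locking the fast path."""

    def __init__(self):
        self._loaded_models = {}
        self._lock = threading.Lock()
        self.load_count = 0  # instrumentation for this sketch only

    def _do_load(self, model_name):
        self.load_count += 1            # only ever called while _lock is held
        return f"<model {model_name}>"  # stand-in for download + init

    def load_model(self, model_name):
        if model_name in self._loaded_models:       # fast path, lock-free
            return self._loaded_models[model_name]
        with self._lock:
            # Re-check: another thread may have loaded it while we waited.
            if model_name not in self._loaded_models:
                self._loaded_models[model_name] = self._do_load(model_name)
            return self._loaded_models[model_name]

mgr = ModelManager()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: mgr.load_model("prompt-guard-2"), range(20)))
```

Even with 20 concurrent callers on a cold start, only one thread performs the expensive load; the rest either hit the fast path or wait on the lock and see the cached entry on re-check.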
🧪 Verification Advice
- The `classify_batch` bug should be easy to reproduce with a 2-item batch in `test_ml_classifier.py` (mock the model forward pass to return a `(2, 2)` tensor, then verify both results are correct `ClassifierResult` objects).
- For the thread-safety concern, a stress test calling `scan_batch` with ~20 items on a cold start (model not pre-loaded) would surface the redundant-load behavior.
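The core of the batch reproduction can be demonstrated without torch at all; numpy rows index the same way, so the faulty `probs[0].tolist()` step is easy to observe:

```python
import numpy as np

# Mocked forward pass for a 2-item batch: probabilities with shape (2, 2).
probs_tensor = np.array([[0.9, 0.1], [0.2, 0.8]])

row = probs_tensor[0]  # shape (2,): what classify_batch passes to the helper
scalar = row[0]        # 0-dim: what probs[0] yields inside the helper

# tolist() on a 0-dim value returns a bare float, not a list,
# so any later prob_list[i] indexing raises TypeError.
print(type(scalar.tolist()))
```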
💡 Thoughts & Suggestions
- Dead code in `model_manager.py`: The `already_cached` variable (lines ~168-172) is computed (involving filesystem I/O) but never referenced afterward. It looks like residual scaffolding for progress-bar suppression logic that wasn't completed. Also, `import os` is duplicated: it's already imported at module level (line 18) and again inside `_do_load`.
- PR description: The issue reference (`closes #`) is empty; consider linking the tracking issue before merge.
- Overall architecture is clean. The separation of concerns (preprocessor → detectors → scoring → result) is easy to follow and extend.
Force-pushed 106579d to 5d97a78
```python
    return Verdict.PASS
if risk_score < 0.8:
    return Verdict.WARN
return Verdict.DENY
```
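Read as a whole, the snippet is a three-band threshold mapping. A self-contained sketch follows; the lower PASS boundary (`PASS_THRESHOLD`) is hypothetical, since the diff only shows the 0.8 cut:

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    DENY = "deny"

PASS_THRESHOLD = 0.5  # hypothetical; the diff only shows the 0.8 boundary

def to_verdict(risk_score: float) -> Verdict:
    # Low risk passes, mid risk warns, high risk is denied.
    if risk_score < PASS_THRESHOLD:
        return Verdict.PASS
    if risk_score < 0.8:
        return Verdict.WARN
    return Verdict.DENY
```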
[NIT] The current version uses the observer pattern, so DENY needs special handling in the hook implementation.
…itecture

Add prompt_scanner module with multi-layer prompt injection and jailbreak detection capabilities:
- Add scan-prompt CLI subcommand with JSON format output for scan results
- Implement three-layer detection: L1 rule engine, L2 ML classifier, L3 semantic detection
- Support three detection modes: fast (L1), standard (L1+L2), strict (L1+L2+L3)
- Add prompt preprocessing module (Unicode normalization, encoding detection, etc. as stubs)
- Implement risk scoring system and Verdict determination logic
- Register scan-prompt to the main CLI application

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
…ions

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
- Implement NFKC unicode normalization for homoglyph and fullwidth char unification
- Add whitespace normalization with zero-width/invisible char stripping
- Implement encoding detection and decoding (Base64, ROT13, URL-encoding, hex)
- Add lightweight heuristic language detection (CJK, Arabic, Cyrillic, Devanagari, Latin)
- Add comprehensive unit tests for all preprocessing stages

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
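As a rough illustration of the NFKC and zero-width-stripping stages the commit describes (the function name and character set below are illustrative, not the module's actual API):

```python
import unicodedata

ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"  # zero-width space/joiners, BOM

def normalize_prompt(text: str) -> str:
    # NFKC folds fullwidth and compatibility forms into canonical chars,
    # so "ｉｇｎｏｒｅ" becomes "ignore".
    text = unicodedata.normalize("NFKC", text)
    # Strip invisible characters attackers use to split trigger words.
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)

print(normalize_prompt("ｉｇｎｏｒｅ\u200b ｐｒｅｖｉｏｕｓ"))  # -> "ignore previous"
```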
- Complete MLClassifier detection layer with Prompt Guard 2 integration
- Implement ModelManager with ModelScope download and caching support
- Implement PromptGuardClassifier with single and batch inference
- Add ml optional dependency group (torch, transformers, modelscope)
- Add unit tests in test_ml_classifier.py

Technical details:
- Use Meta Llama Prompt Guard 2 (86M) as default model
- Support BENIGN/INJECTION/JAILBREAK three-class detection
- Implement text preprocessing to avoid tokenizer boundary issues
- Auto-detect best compute device (CUDA > MPS > CPU)
- Download models via ModelScope mirror for China network compatibility

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
…umentation

- Model pre-download and caching via ModelScope to eliminate cold-start latency
- Structured audit logs with scan and threat event types
- Human-readable text output format with visual indicators
- Full Python API documentation with examples
- Graceful degradation when optional detectors (ML/semantic) are unavailable

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
…ly style guide

- Add double-checked locking to ModelManager.load_model() for thread safety
- Fix _probs_to_result() to handle both 1-D and 2-D tensor shapes
- Remove stale local cache-dir check in _do_load()
- Replace `os.path.expanduser` with `Path.expanduser()` in ModelManager
- Replace bare `dict` with `dict[str, Any]` in MLClassifier and RuleEngine
- Pass `str(cache_dir)` to snapshot_download for explicit type compat
- Optional heavy deps declared in [project.optional-dependencies] use lazy imports guarded by is_available() with `# noqa: PLC0415` annotations

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
It seems security_middleware invoke isn't being called?
I'll add that in a follow-up PR; this one is already too large.
- Fix thread-unsafe MLClassifier singleton init by adding _manager_lock
- Fix scan_batch to run serially in STANDARD/STRICT mode
- Fix json.dumps to use ensure_ascii=False in CLI error output
- Fix RuleEngine to include severity in ThreatDetail output
- Promote ML deps to core; remove optional [ml] extras

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
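The `ensure_ascii=False` fix matters for CJK error messages: by default `json.dumps` escapes every non-ASCII character, which makes CLI output unreadable. A quick illustration (the message content is an arbitrary example):

```python
import json

msg = {"error": "检测到提示注入"}  # "prompt injection detected"

print(json.dumps(msg))                      # default: non-ASCII escaped as \uXXXX
print(json.dumps(msg, ensure_ascii=False))  # readable: {"error": "检测到提示注入"}
```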
4f27f5c into alibaba:release/agent-sec-core/v0.3
Description
Related Issue
closes #
Type of Change
Scope
- cosh (copilot-shell)
- sec-core (agent-sec-core)
- skill (os-skills)
- sight (agentsight)

Checklist

- cosh: Lint passes, type check passes, and tests pass
- sec-core (Rust): cargo clippy -- -D warnings and cargo fmt --check pass
- sec-core (Python): Ruff format and pytest pass
- skill: Skill directory structure is valid and shell scripts pass syntax check
- sight: cargo clippy -- -D warnings and cargo fmt --check pass
- (package-lock.json/Cargo.lock)

Testing
Additional Notes