
agent-sec-core: add prompt guard#253

Merged
edonyzpc merged 7 commits into alibaba:release/agent-sec-core/v0.3 from haosanzi:main
Apr 22, 2026

Conversation

@haosanzi
Collaborator

@haosanzi haosanzi commented Apr 20, 2026

Description

Related Issue

closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional change)
  • Performance improvement
  • CI/CD or build changes

Scope

  • cosh (copilot-shell)
  • sec-core (agent-sec-core)
  • skill (os-skills)
  • sight (agentsight)
  • Multiple / Project-wide

Checklist

  • I have read the Contributing Guide
  • My code follows the project's code style
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the documentation accordingly
  • For cosh: Lint passes, type check passes, and tests pass
  • For sec-core (Rust): cargo clippy -- -D warnings and cargo fmt --check pass
  • For sec-core (Python): Ruff format and pytest pass
  • For skill: Skill directory structure is valid and shell scripts pass syntax check
  • For sight: cargo clippy -- -D warnings and cargo fmt --check pass
  • Lock files are up to date (package-lock.json / Cargo.lock)

Testing

Additional Notes


@haosanzi haosanzi added the component:sec-core src/agent-sec-core/ label Apr 20, 2026
@haosanzi haosanzi added this to the sec-core/v0.3 milestone Apr 20, 2026
Collaborator

@edonyzpc edonyzpc left a comment


👋 Review Summary

Nice work on this prompt scanner module! The multi-layer architecture (L1 regex → L2 ML classifier → future L3 semantic) is well thought out — separating fast deterministic rules from ML inference is a pragmatic approach that balances latency and accuracy. The code quality is generally solid: clean abstractions, good docstrings, and comprehensive unit tests.

I've identified one blocking bug and a few items worth addressing.

🛡️ Key Risks & Issues

1. classify_batch will crash at runtime due to tensor shape mismatch in _probs_to_result (Critical)

In prompt_guard.py, _probs_to_result assumes its input is always a 2-D tensor (shape (1, num_labels)), because probs[0] is used to extract the first row. This works for classify() (which passes the output of _get_probabilities — always (1, num_labels)). However, classify_batch calls _probs_to_result(probs_tensor[i]) where probs_tensor[i] is already a 1-D tensor of shape (num_labels,). In this case, probs[0] yields a 0-dim scalar tensor, and probs[0].tolist() returns a single float — not a list. Downstream code (prob_map = {label: prob_list[i] ...}) will then raise TypeError: 'float' object is not subscriptable.

See inline comment for a suggested fix.
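A minimal sketch of a shape-tolerant fix, using plain nested lists in place of torch tensors (the helper name and two-label set here are illustrative, not the PR's actual code); with real tensors the equivalent check would be `if probs.dim() == 2: probs = probs[0]`:

```python
# Sketch: accept either a (1, num_labels) batch or a flat (num_labels,) row.
LABELS = ["BENIGN", "INJECTION"]  # illustrative label set

def probs_to_result(probs):
    """Normalize the input to a single 1-D probability row before mapping."""
    # If the first element is itself a sequence, we were handed a 2-D batch:
    # take its first row. Otherwise the input is already a single row.
    if probs and isinstance(probs[0], (list, tuple)):
        probs = probs[0]
    prob_list = list(probs)
    return {label: prob_list[i] for i, label in enumerate(LABELS)}

# Both call sites now work:
print(probs_to_result([[0.9, 0.1]]))  # classify(): 2-D input
print(probs_to_result([0.9, 0.1]))    # classify_batch(): 1-D row
```

Both calls return `{'BENIGN': 0.9, 'INJECTION': 0.1}` instead of the second raising `TypeError`.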

2. ModelManager.load_model lacks thread-safety for scan_batch concurrency (Medium)

scan_batch dispatches work via ThreadPoolExecutor. If the ML model isn't loaded yet when multiple threads hit MLClassifier.detect() simultaneously, all of them pass the if model_name in self._loaded_models check and enter _do_load, potentially triggering redundant downloads or concurrent writes to _loaded_models. Under CPython's GIL this won't corrupt the dict, but it could waste resources. Consider adding a threading.Lock around the check-then-load pattern in load_model, or using functools.lru_cache.
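One way to close this race is double-checked locking, sketched below (class and method names mirror the review's description; the loader body is a placeholder for the real download):

```python
import threading

class ModelManager:
    """Sketch of a thread-safe check-then-load pattern (names are assumptions)."""

    def __init__(self):
        self._loaded_models = {}
        self._lock = threading.Lock()

    def load_model(self, model_name):
        # Fast path: once cached, no lock is taken at all.
        model = self._loaded_models.get(model_name)
        if model is not None:
            return model
        with self._lock:
            # Double-check: another thread may have loaded it while we waited.
            model = self._loaded_models.get(model_name)
            if model is None:
                model = self._do_load(model_name)
                self._loaded_models[model_name] = model
            return model

    def _do_load(self, model_name):
        return f"weights:{model_name}"  # placeholder for the real download
```

Even with many threads hitting a cold cache, `_do_load` runs exactly once per model.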

🧪 Verification Advice

  • The classify_batch bug should be easy to reproduce with a 2-item batch in test_ml_classifier.py (mock the model forward pass to return a (2, 2) tensor, then verify both results are correct ClassifierResult objects).
  • For the thread-safety concern, a stress test calling scan_batch with ~20 items on a cold-start (model not pre-loaded) would surface the redundant-load behavior.

💡 Thoughts & Suggestions

  • Dead code in model_manager.py: The already_cached variable (lines ~168-172) is computed (involving filesystem I/O) but never referenced afterward. It looks like residual scaffolding for progress-bar suppression logic that wasn't completed. Also, import os is duplicated — it's already imported at module level (line 18) and again inside _do_load.

  • PR description: The issue reference (closes #) is empty — consider linking the tracking issue before merge.

  • Overall architecture is clean. The separation of concerns (preprocessor → detectors → scoring → result) is easy to follow and extend.

@haosanzi haosanzi force-pushed the main branch 2 times, most recently from 106579d to 5d97a78 (April 21, 2026 02:38)
Comment thread src/agent-sec-core/agent-sec-cli/src/agent_sec_cli/prompt_scanner/result.py Outdated
```python
return Verdict.PASS
if risk_score < 0.8:
    return Verdict.WARN
return Verdict.DENY
```
Collaborator


[NIT] The current version uses the observer pattern, so DENY needs special handling in the hook implementation.

Comment thread src/agent-sec-core/docs/PROMPT_SCANNER.md
…itecture

Add prompt_scanner module with multi-layer prompt injection and jailbreak detection capabilities:

- Add scan-prompt CLI subcommand with JSON format output for scan results
- Implement three-layer detection: L1 rule engine, L2 ML classifier, L3 semantic detection
- Support three detection modes: fast (L1), standard (L1+L2), strict (L1+L2+L3)
- Add prompt preprocessing module (Unicode normalization, encoding detection, etc. as stubs)
- Implement risk scoring system and Verdict determination logic
- Register scan-prompt to the main CLI application

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
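The mode-to-layer mapping described in this commit can be sketched as follows (the detector names and enum are assumptions for illustration, not the module's actual identifiers):

```python
from enum import Enum

class Mode(Enum):
    FAST = "fast"          # L1 only: deterministic regex rules
    STANDARD = "standard"  # L1 + L2: rules plus ML classifier
    STRICT = "strict"      # L1 + L2 + L3: adds semantic detection

# Hypothetical layer registry; each mode enables a superset of the previous one.
LAYERS = {
    Mode.FAST: ["rule_engine"],
    Mode.STANDARD: ["rule_engine", "ml_classifier"],
    Mode.STRICT: ["rule_engine", "ml_classifier", "semantic"],
}

def detectors_for(mode: Mode) -> list[str]:
    """Return the detection layers to run for the given scan mode."""
    return LAYERS[mode]
```

This keeps the latency/accuracy trade-off explicit: callers pick a mode, not individual detectors.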
…ions

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
- Implement NFKC unicode normalization for homoglyph and fullwidth char unification
- Add whitespace normalization with zero-width/invisible char stripping
- Implement encoding detection and decoding (Base64, ROT13, URL-encoding, hex)
- Add lightweight heuristic language detection (CJK, Arabic, Cyrillic, Devanagari, Latin)
- Add comprehensive unit tests for all preprocessing stages

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
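The first two normalization steps in this commit can be illustrated with Python's standard library (the zero-width character set shown is a plausible subset, not the module's actual list):

```python
import unicodedata

# Illustrative subset of invisible characters an attacker can hide inside text.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def normalize_prompt(text: str) -> str:
    """NFKC-fold fullwidth/homoglyph variants, then strip invisible chars."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)

# Fullwidth letters fold to ASCII and the zero-width space disappears:
print(normalize_prompt("ｉｇｎｏｒｅ\u200b previous instructions"))
# ignore previous instructions
```

Without this pass, a regex rule looking for "ignore previous instructions" would miss the obfuscated variant entirely.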
- Complete MLClassifier detection layer with Prompt Guard 2 integration
- Implement ModelManager with ModelScope download and caching support
- Implement PromptGuardClassifier with single and batch inference
- Add ml optional dependency group (torch, transformers, modelscope)
- Add unit tests in test_ml_classifier.py

Technical details:
- Use Meta Llama Prompt Guard 2 (86M) as default model
- Support BENIGN/INJECTION/JAILBREAK three-class detection
- Implement text preprocessing to avoid tokenizer boundary issues
- Auto-detect best compute device (CUDA > MPS > CPU)
- Download models via ModelScope mirror for China network compatibility

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
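The device-priority rule (CUDA > MPS > CPU) reduces to a small pure function; the real implementation would presumably feed it the results of `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, but the selection logic itself needs no torch:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best compute device in priority order CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

Keeping the decision as a pure function also makes it trivial to unit-test without GPU hardware.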
…umentation

- Model pre-download and caching via ModelScope to eliminate cold-start latency
- Structured audit logs with scan and threat event types
- Human-readable text output format with visual indicators
- Full Python API documentation with examples
- Graceful degradation when optional detectors (ML/semantic) are unavailable

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
…ly style guide

- Add double-checked locking to ModelManager.load_model() for thread safety
- Fix _probs_to_result() to handle both 1-D and 2-D tensor shapes
- Remove stale local cache-dir check in _do_load()
- Replace `os.path.expanduser` with `Path.expanduser()` in ModelManager
- Replace bare `dict` with `dict[str, Any]` in MLClassifier and RuleEngine
- Pass `str(cache_dir)` to snapshot_download for explicit type compat
- Optional heavy deps declared in [project.optional-dependencies] use lazy
  imports guarded by is_available() with `# noqa: PLC0415` annotations.

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
Collaborator


It looks like the security_middleware invoke is never called?

Collaborator Author


I'll add that in a follow-up PR. This PR is already too large.

- Fix thread-unsafe MLClassifier singleton init by adding _manager_lock
- Fix scan_batch to run serially in STANDARD/STRICT mode
- Fix json.dumps to use ensure_ascii=False in CLI error output
- Fix RuleEngine to include severity in ThreatDetail output
- Promote ML deps to core; remove optional [ml] extras

Signed-off-by: Shirong Hao <shirong@linux.alibaba.com>
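The `ensure_ascii=False` fix in this commit matters for CJK output; a quick comparison of the two behaviors of the standard-library `json.dumps`:

```python
import json

# Default (ensure_ascii=True): non-ASCII text is \u-escaped, which makes
# CLI error output unreadable for CJK users:
print(json.dumps({"verdict": "拒绝"}))                      # {"verdict": "\u62d2\u7edd"}

# With ensure_ascii=False the original characters survive:
print(json.dumps({"verdict": "拒绝"}, ensure_ascii=False))  # {"verdict": "拒绝"}
```

The change only affects presentation; both forms parse back to the same object with `json.loads`.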
Collaborator

@edonyzpc edonyzpc left a comment


LGTM

@edonyzpc edonyzpc merged commit 4f27f5c into alibaba:release/agent-sec-core/v0.3 Apr 22, 2026
9 checks passed

Labels

component:sec-core src/agent-sec-core/


3 participants