Skip to content

WP-B: Security Hardening - Prompt Injection Defense & Audit Logging#43

Merged
Steffen025 merged 6 commits intodevfrom
feature/wp-b-security-hardening
Mar 8, 2026
Merged

WP-B: Security Hardening - Prompt Injection Defense & Audit Logging#43
Steffen025 merged 6 commits intodevfrom
feature/wp-b-security-hardening

Conversation

@Steffen025
Copy link
Owner

@Steffen025 Steffen025 commented Mar 6, 2026

Summary

Implements comprehensive security hardening for PAI-OpenCode v3.0 (WP-B).

Changes

New Files

  • .opencode/plugins/lib/injection-patterns.ts — 6 categories of prompt injection patterns (~30 total)
  • .opencode/plugins/lib/sanitizer.ts — Input normalization pipeline
  • docs/architecture/adr/ADR-011-security-hardening.md — Architectural decision record

Modified Files

  • .opencode/plugins/adapters/types.ts — Added 6 new DANGEROUS_PATTERNS
  • .opencode/plugins/handlers/security-validator.ts — Full refactor with audit logging

Security Improvements

Before WP-B After WP-B
5 inline injection patterns 30 categorized patterns
No input sanitization 4-step normalization pipeline
Only checks args.content Scans all 9 text fields
No audit trail JSONL audit log
15 dangerous patterns 21 dangerous patterns

Testing

  • All files pass Biome checks
  • No console.log in new files
  • Type-safe exports for all new modules

Related

Summary by CodeRabbit

  • New Features

    • Erweiterte Erkennung moderner Angriffsvektoren (obfuskierte RCE, Befehls‑Substitution, Umgebungs‑Exfiltration, SSH/Schlüssel‑Lecks) mit kategorisierten Mustern.
    • Umfangreiche Eingabesanitierung (Base64‑Decodierung, Unicode‑Normalisierung, Obfuskationsauflösung, HTML‑Strip) und mehrfeld‑Scanning aller relevanten Texteingaben.
    • Redigierte Befehlsvorschauen und asynchrones Audit‑Logging (append‑only JSONL).
  • Bug Fixes / Verhalten

    • Fehlertolerantes (fail‑open) Verhalten: Fehler protokolliert, Abläufe bleiben erlaubt.
  • Documentation

    • Neue Architektur‑ADR und Sicherheits‑Hardening‑Plan ergänzt.

Plan documents:
- Current state analysis (what security-validator.ts already covers)
- 4 blocks: injection-patterns.ts, sanitizer.ts, types.ts extensions, ADR-011
- 6 injection categories: instruction override, role hijacking, system prompt
  extraction, safety bypass, context separators, MCP tool injection
- Sanitizer pipeline: base64 decode, unicode normalize, spacing collapse, HTML strip
- Security audit log: JSONL at MEMORY/STATE/security-audit.jsonl
- New DANGEROUS_PATTERNS: base64+exec, env exfiltration, Python/Node RCE
- File manifest with exact line count estimates
- ISC preview criteria for verification
Implements comprehensive security improvements:

- lib/injection-patterns.ts: 6 categories of injection patterns (~30 total)
  * instruction_override, role_hijacking, system_prompt_extraction
  * safety_bypass, context_separator, mcp_tool_injection

- lib/sanitizer.ts: Input normalization pipeline
  * decodeBase64Payloads, normalizeUnicode, collapseObfuscatedSpacing
  * stripHtmlTags, sanitizeForSecurityCheck

- adapters/types.ts: 6 new DANGEROUS_PATTERNS
  * base64+exec, env exfiltration, Python/Node RCE, SSH recon

- handlers/security-validator.ts: Full refactor
  * Uses new libraries, multi-field scanning, security audit logging
  * Logs to MEMORY/STATE/security-audit.jsonl

- ADR-011: Documents security hardening decisions

Closes WP-B (PR #B)
@coderabbitai
Copy link

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Fügt ein modulares Security‑Hardening hinzu: neue Sanitizer‑Pipeline, katalogisierte Injection‑Pattern‑Bibliothek, erweitertes Multi‑Feld‑Scanning in der Security‑Validator‑Logik, asynchrones JSONL‑Audit‑Logging und Erweiterung von DANGEROUS_PATTERNS um moderne Regex‑Vektoren.

Changes

Cohort / File(s) Summary
Injection Patterns Library
.opencode/plugins/lib/injection-patterns.ts
Neues Modul mit kategorisierten, vorcompilierten RegExp‑Arrays (instruction_override, role_hijacking, system_prompt_extraction, safety_bypass, context_separator, mcp_tool_injection, pii_credential_leak), Export von ALL_INJECTION_PATTERNS, Typen (InjectionCategory, InjectionMatch) und detectInjections()‑Funktion; liefert eine Match‑Struktur mit Positionen und redigiertem Text.
Input Sanitization Pipeline
.opencode/plugins/lib/sanitizer.ts
Neues Sanitizer‑Modul: decodeBase64Payloads, normalizeUnicode, collapseObfuscatedSpacing, stripHtmlTags, sanitizeForSecurityCheck und exportierte INJECTION_SCAN_FIELDS; orchestriert Vorverarbeitung vor Pattern‑Matching und entobfuskierende Transforms.
Security Validator Enhancement
.opencode/plugins/handlers/security-validator.ts
Umfassende Überarbeitung: Multi‑Field‑Scanning über INJECTION_SCAN_FIELDS, Nutzung von sanitizeForSecurityCheck und detectInjections, asynchrones JSONL‑Audit‑Logging (MEMORY/STATE/security-audit.jsonl), SecurityAuditEntry Interface, logSecurityEvent() Helper; redigiert Secrets in Previews und implementiert Fail‑Open‑Verhalten mit Audit‑Einträgen bei allen Entscheidungspunkten.
Dangerous Patterns Extension
.opencode/plugins/adapters/types.ts
Erweiterung von DANGEROUS_PATTERNS um mehrere neue Regex‑Muster für obfuskierte RCE (Base64, command substitution/pipe), env‑Exfiltration, Python/Node one‑liner RCE und SSH/Identity‑Exposure.
Docs: ADR & Epic
docs/architecture/adr/ADR-011-security-hardening.md, docs/architecture/adr/README.md, docs/epic/WP-B-SECURITY-HARDENING-PLAN.md
Neue ADR und Epic‑Dokumentation beschreibt Architektur, Sanitization‑Pipeline, Pattern‑Kategorien, Audit‑Format (JSONL), Fail‑Open‑Rationale und Verifikationspläne; ADR‑Index ergänzt.
Runtime artifact (logging)
MEMORY/STATE/security-audit.jsonl (usage)
Neuer Audit‑Log‑Pfad wird asynchron im JSONL‑Format angehängt; Logeinträge mit SecurityAuditEntry‑Feldsatz (timestamp, tool, action, reason, pattern?, category?, commandPreview?).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Validator as SecurityValidator
    participant Sanitizer as Sanitizer
    participant Detector as InjectionDetector
    participant Logger as AuditLogger

    Client->>Validator: validateSecurity(args)
    Validator->>Validator: extractCommand(args)

    alt no command extracted
        Validator->>Logger: logSecurityEvent({action:"allowed", reason:"no_command"})
        Logger-->>Validator: ack
        Validator-->>Client: allow
    else command present
        Validator->>Sanitizer: sanitizeForSecurityCheck(field)
        Sanitizer->>Sanitizer: decodeBase64Payloads -> normalizeUnicode
        Sanitizer->>Sanitizer: collapseObfuscatedSpacing -> stripHtmlTags
        Sanitizer-->>Validator: sanitizedContent

        Validator->>Detector: detectInjections(sanitizedContent)
        Detector-->>Validator: [InjectionMatch...]

        alt injection detected
            Validator->>Logger: logSecurityEvent({action:"blocked", category, pattern, commandPreview})
            Logger-->>Validator: ack
            Validator-->>Client: block
        else requires confirmation
            Validator->>Logger: logSecurityEvent({action:"confirmed", pattern, commandPreview})
            Logger-->>Validator: ack
            Validator-->>Client: request_confirmation
        else all clear
            Validator->>Logger: logSecurityEvent({action:"allowed", reason:"all_checks_passed", commandPreview})
            Logger-->>Validator: ack
            Validator-->>Client: allow
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇
Ich hopp' durchs Bytefeld, such' nach dunklem Licht,
entwirr' versteckte Zeilen, bring' Transparenz ans Licht.
Ich dekodier', ich glätt', ich schütz' vor falschem Tun,
schreib' leise in das Log — so kann man’s später ruh'n.
Hopp! Ein sichrer Sprung, die Warnung bleibt im Nu.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Der PR-Titel bezieht sich auf die Hauptänderungen des Changeset: Security Hardening mit Prompt Injection Defense und Audit Logging, was genau den Inhalten der neuen Module, Pattern-Library und Logging-Funktionalität entspricht.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/wp-b-security-hardening

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.opencode/plugins/handlers/security-validator.ts (1)

177-192: ⚠️ Potential issue | 🔴 Critical

Injection-Checks werden für Inputs ohne extrahierbaren Command komplett übersprungen.

In Line 179 erfolgt allow vor checkAllFieldsForInjection(...). Damit greift der neue Multi-Field-Scan nur für Fälle mit extractCommand(...)-Treffer.

🛠️ Vorschlag (Reihenfolge korrigieren)
-		const command = extractCommand(input);
-
-		if (!command) {
-			fileLog(`No command extracted from input`, "warn");
-			// No command to validate - allow by default
-			await logSecurityEvent({
-				timestamp: new Date().toISOString(),
-				tool: input.tool,
-				action: "allowed",
-				reason: "No command extracted",
-			});
-			return {
-				action: "allow",
-				reason: "No command to validate",
-			};
-		}
+		const injectionResult = input.args
+			? checkAllFieldsForInjection(input.args)
+			: null;
+		if (injectionResult) {
+			// existing block-path for injection detection
+			// (moved here so it runs regardless of command extraction)
+		}
+
+		const command = extractCommand(input);
+		if (!command) {
+			fileLog(`No command extracted from input`, "warn");
+			void logSecurityEvent({
+				timestamp: new Date().toISOString(),
+				tool: input.tool,
+				action: "allowed",
+				reason: "No command extracted",
+			});
+			return {
+				action: "allow",
+				reason: "No command to validate",
+			};
+		}

Also applies to: 216-246

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/handlers/security-validator.ts around lines 177 - 192, The
early-return path that allows inputs when extractCommand(input) returns falsy
skips the injection scanner; move or add invocation of
checkAllFieldsForInjection(input, ...) before the allow return in the
extractCommand branch so inputs without an extracted command still undergo
injection checks, and if checkAllFieldsForInjection reports an injection, call
logSecurityEvent(...) and return { action: "deny", reason: "injection detected"
} instead of allowing; apply the same fix to the analogous block that currently
allows before scanning (the block around the other early-return mentioned) so
both code paths use checkAllFieldsForInjection and logSecurityEvent
consistently.
🧹 Nitpick comments (3)
.opencode/plugins/lib/sanitizer.ts (1)

80-82: Obfuskierte Wörter werden nur am Anfang des Strings erkannt.

Durch den ^-Anker in Line 80 wird z. B. please i g n o r e ... nicht normalisiert. Für Security-Checks sollte die Erkennung im gesamten String greifen.

♻️ Vorschlag
-	if (/^(\w\s){4,}/.test(content.trim())) {
+	if (/(?:\b\w\s){4,}\w\b/.test(content)) {
 		return content.replace(/(\w)\s(?=\w)/g, "$1");
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/sanitizer.ts around lines 80 - 82, The detection regex
in sanitizer.ts currently uses the start-anchor (^ in the condition
/^(\\w\\s){4,}/) so obfuscated words are only detected at the string start;
change the condition to test the entire string (e.g. remove the ^ so it becomes
/(\w\s){4,}/ and run it against the full content rather than content.trim()),
leaving the replacement (/(\w)\s(?=\w)/g in the return call) intact so
spaced-out words anywhere in the string get normalized.
docs/architecture/adr/ADR-011-security-hardening.md (1)

127-137: Für die Architektur fehlt ein Mermaid-Detaildiagramm im einklappbaren Abschnitt.

Die ADR ist inhaltlich gut, aber für die gewünschte Doku-Form wäre ein kurzes Flow-Diagramm (Sanitizer -> Detection -> Decision -> Audit) hilfreich.

📈 Beispielstruktur
<details>
<summary>Mermaid: Security-Flow (Detail)</summary>

```mermaid
flowchart LR
  A[Tool Input] --> B[Sanitizer Pipeline]
  B --> C[detectInjections + Command Checks]
  C --> D{Decision}
  D -->|block/confirm/allow| E[security-audit.jsonl]
```
As per coding guidelines: `docs/**`: Mermaid diagrams in collapsible sections for detail.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/adr/ADR-011-security-hardening.md` around lines 127 - 137,
Füge in der ADR-011-security-hardening.md unter der Implementation- oder Files
Modified-Sektion einen einklappbaren Details-Block mit einem Mermaid-Flowchart
namens "Mermaid: Security-Flow (Detail)" ein; der Block soll einen flowchart LR
zeigen: Tool Input -> Sanitizer Pipeline -> detectInjections + Command Checks ->
Entscheidung (block/confirm/allow) -> security-audit.jsonl, und muss als
```mermaid fenced code block innerhalb eines <details><summary>Mermaid:
Security-Flow (Detail)</summary> ... </details> eingefügt werden, damit die
Dokumentationsrichtlinie docs/** erfüllt wird.
docs/epic/WP-B-SECURITY-HARDENING-PLAN.md (1)

55-73: ASCII-Übersicht ist gut — ergänzend fehlt Mermaid-Detailansicht.

Für den Security-Flow wäre zusätzlich ein einklappbares Mermaid-Diagramm sinnvoll, damit Architekturpfade schneller erfassbar sind.

As per coding guidelines: docs/**: ASCII diagrams for quick overview and Mermaid diagrams in collapsible sections for detail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/epic/WP-B-SECURITY-HARDENING-PLAN.md` around lines 55 - 73, Add a
collapsible Mermaid diagram immediately after the "Scope: 3 Themenblöcke" ASCII
overview that visualizes the Security-Flow (showing interactions between
security-validator.ts, lib/injection-patterns.ts, lib/sanitizer.ts,
adapters/types.ts/DANGEROUS_PATTERNS, MCP Tool-Description checks, and
ADR-011-security-hardening.md); wrap the Mermaid block in a collapsible section
per docs/** guidelines, include clear directional arrows and node labels for
normalization → pattern-matching → validation → audit logging, and keep the
ASCII overview intact above the collapsible Mermaid detail.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/plugins/handlers/security-validator.ts:
- Around line 182-187: The call to logSecurityEvent is being awaited on the hot
path (e.g., await logSecurityEvent(...)), causing blocking I/O; change to
fire-and-forget by removing await and invoking logSecurityEvent(...) without
blocking the decision path, and ensure you attach a .catch(...) handler to the
returned promise to log failures (so no unhandled rejections). Update every
occurrence of await logSecurityEvent(...) in this file (and other handlers) to
this non-blocking pattern so the security decision does not wait for filesystem
I/O.
- Around line 204-207: The audit entry currently stores raw command text in the
commandPreview field (constructed from command.slice(0, 100)), which may persist
secrets; update the code that builds the object (the block referencing
dangerousMatch and commandPreview) to first redact sensitive values from command
(e.g., via an existing redactSecrets/redactCommand utility or by adding a small
sanitizer that masks API keys/tokens/credentials with regexes) and then set
commandPreview to the redacted result (e.g., redactedCommand.slice(0,100));
ensure the sanitizer is applied wherever commandPreview is set and add/adjust
tests to confirm secrets are masked in security-audit.jsonl.

In @.opencode/plugins/lib/injection-patterns.ts:
- Line 25: Das aktuelle Regex
/you\s+are\s+now\s+(a\s+|an\s+)?(?!Claude|an\s+AI)/i ist zu generisch und
matched z.B. "you are now ready"; ändere die Pattern-Logik in
injection-patterns.ts so dass ein echtes Rollen-Token verlangt wird (z.B. ein
explizites Role-Label oder ein in Anführungszeichen/Kapitalisiert angegebenes
Rollennamen), und ersetze das bestehende Pattern durch eines, das nach "you are
now (a|an)?", gefolgt von einem klar abgegrenzten Rollen-Token wie "role:" oder
einem in quotes/CapitalizedWord stehenden Rollennamen sucht; aktualisiere die
Einträge, die den alten Regex verwenden, auf das neue strengere Pattern
(verwende das vorhandene Literal
/you\s+are\s+now\s+(a\s+|an\s+)?(?!Claude|an\s+AI)/i als Referenz).

In @.opencode/plugins/lib/sanitizer.ts:
- Around line 53-63: The homoglyph mapping in the lookalikes object uses a Latin
"h" where a Cyrillic "х" is intended, so update the lookalikes record (variable
name: lookalikes in sanitizer.ts) to replace the Latin small letter h with the
Cyrillic small letter kha (U+0445) so the Cyrillic→Latin normalization works
correctly; verify other entries use correct Unicode characters and run tests to
ensure the "х"→"x" mapping is applied during sanitization in the
sanitize/normalize function that consumes lookalikes.
- Around line 18-21: The unicode-lookalike mapping in the sanitizer contains an
incorrect key `h: "x"`; locate the mapping object in
.opencode/plugins/lib/sanitizer.ts (the object used for Cyrillic→ASCII
replacements) and replace the Latin letter key `h` with the Cyrillic small
letter kh `х` (U+0445) so the entry becomes the Cyrillic-to-ASCII mapping (х ->
"x"); alternatively remove the entry if unintended—update the mapping where `h:
"x"` currently appears.

In `@docs/architecture/adr/ADR-011-security-hardening.md`:
- Around line 120-122: Update the "Regex DoS" bullet under
ADR-011-security-hardening to accurately describe the actual strategy: note that
while bounded quantifiers like `{0,50}` are preferred, several implemented
pattern sets still contain `.*` sequences as a deliberate trade-off; explicitly
list or reference the affected pattern sets, state the conditions under which
`.*` is used, and add the mitigations applied for those cases (e.g., input
length limits, anchored patterns, complexity review, and
regex-timeout/sanitization measures) so the documentation matches the real
implementation.
- Around line 1-7: This ADR (ADR-011: Security Hardening — Prompt Injection
Defense & Audit Logging) is missing the required YAML frontmatter for
Obsidian/Dataview; add a top YAML block delimited by '---' containing keys like
id: ADR-011, title, status, date (ISO), depends_on (or depends), type: ADR, and
tags/topics (e.g., security, architecture) so Dataview/Obsidian workflows can
index it; ensure the frontmatter appears at the very top of
ADR-011-security-hardening.md and that keys match existing ADRs' conventions
(use the same key names as other ADR files).

In `@docs/epic/WP-B-SECURITY-HARDENING-PLAN.md`:
- Around line 6-7: Aktualisiere alle Platzhalter-PR-Referenzen im Dokument:
ersetze jede Instanz von "target-pr: PR `#B`" und "Estimierter PR: PR `#B`" durch
die tatsächliche Referenz "PR `#43`" und sorge dafür, dass die Einträge konsistent
sind (prüfe besonders die Stellen, die im Kommentar genannt wurden sowie die
weiteren Vorkommen im Dokument). Verwende die bestehenden Schlüsselwörter
"target-pr" und "Estimierter PR" beim Austausch, passe ggf. das
Datums-/Metadatenformat an das vorhandene Schema an und führe einen schnellen
grep/Find-Check durch, um keine weiteren verwaisten Platzhalter übrig zu lassen.

---

Outside diff comments:
In @.opencode/plugins/handlers/security-validator.ts:
- Around line 177-192: The early-return path that allows inputs when
extractCommand(input) returns falsy skips the injection scanner; move or add
invocation of checkAllFieldsForInjection(input, ...) before the allow return in
the extractCommand branch so inputs without an extracted command still undergo
injection checks, and if checkAllFieldsForInjection reports an injection, call
logSecurityEvent(...) and return { action: "deny", reason: "injection detected"
} instead of allowing; apply the same fix to the analogous block that currently
allows before scanning (the block around the other early-return mentioned) so
both code paths use checkAllFieldsForInjection and logSecurityEvent
consistently.

---

Nitpick comments:
In @.opencode/plugins/lib/sanitizer.ts:
- Around line 80-82: The detection regex in sanitizer.ts currently uses the
start-anchor (^ in the condition /^(\\w\\s){4,}/) so obfuscated words are only
detected at the string start; change the condition to test the entire string
(e.g. remove the ^ so it becomes /(\w\s){4,}/ and run it against the full
content rather than content.trim()), leaving the replacement (/(\w)\s(?=\w)/g in
the return call) intact so spaced-out words anywhere in the string get
normalized.

In `@docs/architecture/adr/ADR-011-security-hardening.md`:
- Around line 127-137: Füge in der ADR-011-security-hardening.md unter der
Implementation- oder Files Modified-Sektion einen einklappbaren Details-Block
mit einem Mermaid-Flowchart namens "Mermaid: Security-Flow (Detail)" ein; der
Block soll einen flowchart LR zeigen: Tool Input -> Sanitizer Pipeline ->
detectInjections + Command Checks -> Entscheidung (block/confirm/allow) ->
security-audit.jsonl, und muss als ```mermaid fenced code block innerhalb eines
<details><summary>Mermaid: Security-Flow (Detail)</summary> ... </details>
eingefügt werden, damit die Dokumentationsrichtlinie docs/** erfüllt wird.

In `@docs/epic/WP-B-SECURITY-HARDENING-PLAN.md`:
- Around line 55-73: Add a collapsible Mermaid diagram immediately after the
"Scope: 3 Themenblöcke" ASCII overview that visualizes the Security-Flow
(showing interactions between security-validator.ts, lib/injection-patterns.ts,
lib/sanitizer.ts, adapters/types.ts/DANGEROUS_PATTERNS, MCP Tool-Description
checks, and ADR-011-security-hardening.md); wrap the Mermaid block in a
collapsible section per docs/** guidelines, include clear directional arrows and
node labels for normalization → pattern-matching → validation → audit logging,
and keep the ASCII overview intact above the collapsible Mermaid detail.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bfa91f7e-9ed2-4acb-966e-8f6255344a3a

📥 Commits

Reviewing files that changed from the base of the PR and between 5758038 and 3ad2e50.

📒 Files selected for processing (7)
  • .opencode/plugins/adapters/types.ts
  • .opencode/plugins/handlers/security-validator.ts
  • .opencode/plugins/lib/injection-patterns.ts
  • .opencode/plugins/lib/sanitizer.ts
  • docs/architecture/adr/ADR-011-security-hardening.md
  • docs/architecture/adr/README.md
  • docs/epic/WP-B-SECURITY-HARDENING-PLAN.md

Comment on lines +1 to +7
# ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging

**Status:** ✅ Implemented (WP-B)
**Date:** 2026-03-06
**Depends on:** ADR-006 (Security Validation Preservation)

---
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Wichtiger ADR-Eintrag ohne YAML-Frontmatter.

Für diese ADR als zentrale Architekturentscheidung fehlt das Frontmatter für Obsidian/Dataview-Workflows.

📝 Vorschlag
+---
+title: ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging
+status: accepted
+date: 2026-03-06
+depends_on: ADR-006
+category: security
+---
+
 # ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging
As per coding guidelines: `**/*.md`: Frontmatter for important documents.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging
**Status:** ✅ Implemented (WP-B)
**Date:** 2026-03-06
**Depends on:** ADR-006 (Security Validation Preservation)
---
---
title: ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging
status: accepted
date: 2026-03-06
depends_on: ADR-006
category: security
---
# ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging
**Status:** ✅ Implemented (WP-B)
**Date:** 2026-03-06
**Depends on:** ADR-006 (Security Validation Preservation)
---
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/adr/ADR-011-security-hardening.md` around lines 1 - 7, This
ADR (ADR-011: Security Hardening — Prompt Injection Defense & Audit Logging) is
missing the required YAML frontmatter for Obsidian/Dataview; add a top YAML
block delimited by '---' containing keys like id: ADR-011, title, status, date
(ISO), depends_on (or depends), type: ADR, and tags/topics (e.g., security,
architecture) so Dataview/Obsidian workflows can index it; ensure the
frontmatter appears at the very top of ADR-011-security-hardening.md and that
keys match existing ADRs' conventions (use the same key names as other ADR
files).

Comment on lines +120 to +122
- **Regex DoS:** Complex patterns on long input could be slow
- Mitigation: Patterns use bounded quantifiers (`{0,50}` not `*`)
- **False positives:** Aggressive patterns might block legitimate content
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Risikomitigation beschreibt die Regex-Strategie aktuell ungenau.

Line 121 sagt, es würden bounded Quantifier statt * genutzt; in den implementierten Pattern-Sets kommen aber mehrfach .*-Sequenzen vor. Das sollte als Trade-off präzise dokumentiert werden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/adr/ADR-011-security-hardening.md` around lines 120 - 122,
Update the "Regex DoS" bullet under ADR-011-security-hardening to accurately
describe the actual strategy: note that while bounded quantifiers like `{0,50}`
are preferred, several implemented pattern sets still contain `.*` sequences as
a deliberate trade-off; explicitly list or reference the affected pattern sets,
state the conditions under which `.*` is used, and add the mitigations applied
for those cases (e.g., input length limits, anchored patterns, complexity
review, and regex-timeout/sanitization measures) so the documentation matches
the real implementation.

Comment on lines +6 to +7
target-pr: PR #B
date: 2026-03-06
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

PR-Referenzen sind noch Platzhalter und inzwischen inkonsistent.

target-pr: PR #B und `Estimierter PR: PR `#B sollten auf die reale PR-Referenz aktualisiert werden (hier: PR #43), damit Plan- und Audit-Doku nachvollziehbar bleiben.

🧾 Vorschlag
-target-pr: PR `#B`
+target-pr: PR `#43`
@@
-*Estimierter PR: PR `#B`*
+*PR: PR `#43`*

Also applies to: 435-437

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/epic/WP-B-SECURITY-HARDENING-PLAN.md` around lines 6 - 7, Aktualisiere
alle Platzhalter-PR-Referenzen im Dokument: ersetze jede Instanz von "target-pr:
PR `#B`" und "Estimierter PR: PR `#B`" durch die tatsächliche Referenz "PR `#43`" und
sorge dafür, dass die Einträge konsistent sind (prüfe besonders die Stellen, die
im Kommentar genannt wurden sowie die weiteren Vorkommen im Dokument). Verwende
die bestehenden Schlüsselwörter "target-pr" und "Estimierter PR" beim Austausch,
passe ggf. das Datums-/Metadatenformat an das vorhandene Schema an und führe
einen schnellen grep/Find-Check durch, um keine weiteren verwaisten Platzhalter
übrig zu lassen.

7 patterns for API keys and private keys:
Anthropic, OpenAI, GitHub PAT, AWS Access Key,
PEM Private Key, Groq, HuggingFace

Pattern sources: jcfischer/pai-content-filter (MIT) + own additions
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
.opencode/plugins/lib/injection-patterns.ts (1)

25-27: ⚠️ Potential issue | 🟠 Major

Rollen-Hijacking-Regexe blockieren weiter harmlose Prompts.

Line 25 und Line 27 erzwingen nach you are now bzw. pretend to be kein echtes Rollen-Token. Dadurch matchen auch Sätze wie you are now ready oder pretend to be careful, was legitime Inhalte unnötig als Injection markiert.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/injection-patterns.ts around lines 25 - 27, Die beiden
Regex-Muster /you\s+are\s+now\s+(a\s+|an\s+)?(?!Claude|an\s+AI)/i und
/pretend\s+(you\s+are|to\s+be)\s+(a\s+|an\s+)?(?!(helpful|assistant))/i sind zu
permissiv und matchen harmlose Phrasen; enger die Negativ-Lookahead-Logik, z.B.
erfordern, dass nach „you are now“ bzw. „pretend to be“ ein echtes Rollen-Token
kommt (z.B. durch Lookahead auf ein Substantiv/Musterauswahl oder
Großbuchstaben/Minimallänge), oder ersetzen die current negative list by a
positive whitelist of role nouns (assistant|hacker|developer|doctor|Claude|...);
update die beiden Literale so, dass sie nur dann matchen, wenn unmittelbar ein
Rollenwort (nicht ein kurzes Adjektiv wie "ready" oder "careful") folgt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/plugins/lib/injection-patterns.ts:
- Around line 109-112: The InjectionMatch interface currently stores full
secrets in matchedText (e.g., API keys) which leaks PII; change matchedText to
hold a redacted preview or metadata only (for example: masked value showing only
last 4 chars, length, and match position) and update all places that construct
InjectionMatch (references: InjectionMatch and matchedText and the match
creation sites around lines 134-136) to populate the new redacted field instead
of the raw match; ensure existing consumers/loggers use the redacted
preview/metadata for audit logging and never serialize the full secret.
- Line 70: Die Regex in the injection patterns uses .{0,50} which doesn't match
newlines, so a single line break between "this tool" and "send|exfiltrate|leak"
bypasses detection; update the pattern (the literal regex
/when\s+(you\s+)?(use|call|invoke)\s+this\s+tool.{0,50}(send|exfiltrate|leak)/i)
to allow multiline matching by replacing .{0,50} with a construct that matches
newlines as well (e.g., [\s\S]{0,50} or enable DOTALL with (?s)), keeping the
rest of the pattern and flags intact so the rule still limits the max span
between phrases.
- Line 84: The PEM private-key regex
'/-----BEGIN\s+(?:RSA\s+)?PRIVATE\s+KEY-----/' in
plugins/lib/injection-patterns.ts misses other key types (e.g. EC, OPENSSH);
replace the '(?:RSA\s+)?' fragment with a more general token like
'(?:[A-Z0-9]+\s+)?' (or '\w+' if preferred) and add the case-insensitive flag so
the pattern becomes e.g. '/-----BEGIN\s+(?:[A-Z0-9]+\s+)?PRIVATE\s+KEY-----/i'
to reliably match "BEGIN EC PRIVATE KEY", "BEGIN OPENSSH PRIVATE KEY", etc.

---

Duplicate comments:
In @.opencode/plugins/lib/injection-patterns.ts:
- Around line 25-27: Die beiden Regex-Muster
/you\s+are\s+now\s+(a\s+|an\s+)?(?!Claude|an\s+AI)/i und
/pretend\s+(you\s+are|to\s+be)\s+(a\s+|an\s+)?(?!(helpful|assistant))/i sind zu
permissiv und matchen harmlose Phrasen; enger die Negativ-Lookahead-Logik, z.B.
erfordern, dass nach „you are now“ bzw. „pretend to be“ ein echtes Rollen-Token
kommt (z.B. durch Lookahead auf ein Substantiv/Musterauswahl oder
Großbuchstaben/Minimallänge), oder ersetzen die current negative list by a
positive whitelist of role nouns (assistant|hacker|developer|doctor|Claude|...);
update die beiden Literale so, dass sie nur dann matchen, wenn unmittelbar ein
Rollenwort (nicht ein kurzes Adjektiv wie "ready" oder "careful") folgt.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb4a230f-97db-4b7f-9177-153426eaf897

📥 Commits

Reviewing files that changed from the base of the PR and between 3ad2e50 and 85d4e9b.

📒 Files selected for processing (1)
  • .opencode/plugins/lib/injection-patterns.ts

Critical fixes:
- security-validator.ts: Fire-and-forget audit logging (non-blocking)
- security-validator.ts: Secrets redaction for commandPreview (API keys, PEM keys)
- security-validator.ts: Early injection check (even when no command extracted)
- injection-patterns.ts: Strengthen ROLE_HIJACKING patterns (require role tokens)
- injection-patterns.ts: MCP pattern multiline-capable ([\s\S] instead of .)
- injection-patterns.ts: PEM regex extended for EC, OPENSSH, etc.
- sanitizer.ts: Fix Cyrillic kha (U+0445) instead of Latin h

All Biome checks passing.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
.opencode/plugins/handlers/security-validator.ts (1)

232-245: ⚠️ Potential issue | 🟠 Major

Die Fallback-Preview umgeht die Secret-Redaction.

Im Injection-Pfad ohne extrahierten command wird commandPreview aus args[field] unverändert in security-audit.jsonl geschrieben. Wenn genau dieses Feld ein Token, eine Mailadresse oder andere PII enthält, persistiert der Audit-Log wieder Rohdaten — die Redaction greift hier nur im command-Zweig. Führt den Fallback durch dieselbe Redaction oder lasst die Preview in diesem Pfad ganz weg. As per coding guidelines, ADR-011 requires a 100-char commandPreview with no PII.

🔧 Mögliche Korrektur
 		if (injectionResult) {
 			const firstMatch = injectionResult.matches[0];
+			const previewSource = command
+				? command
+				: `${injectionResult.field}:${String(input.args?.[injectionResult.field] ?? "")}`;
 			fileLog(
 				`BLOCKED: Prompt injection detected in field '${injectionResult.field}'`,
 				"error",
 			);
@@
 			logSecurityEvent({
 				timestamp: new Date().toISOString(),
 				tool: input.tool,
 				action: "blocked",
 				reason: `Prompt injection in ${injectionResult.field}`,
 				category: firstMatch.category,
 				pattern: firstMatch.pattern.toString(),
-				commandPreview: command
-					? redactSecrets(command).slice(0, 100)
-					: `${injectionResult.field}:${input.args?.[injectionResult.field]}`.slice(
-							0,
-							100,
-						),
+				commandPreview: redactSecrets(previewSource).slice(0, 100),
 			});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/handlers/security-validator.ts around lines 232 - 245, Der
Fallback-Pfad für commandPreview schreibt unveränderte args[field] ins Audit-Log
und umgeht damit redactSecrets; fix: when building commandPreview in the
logSecurityEvent call, pass the fallback string through redactSecrets before
slicing (i.e. compute fallback =
`${injectionResult.field}:${input.args?.[injectionResult.field]}` then
commandPreview = redactSecrets(fallback).slice(0,100)), or alternatively omit
the preview when command is missing; update the logic around logSecurityEvent /
commandPreview so redactSecrets is always applied and slice(0,100) runs after
redaction to satisfy ADR-011 and avoid persisting PII.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/plugins/lib/sanitizer.ts:
- Around line 40-68: normalizeUnicode currently leaves combining marks after
NFKD, letting characters like "o + U+0301" remain non-ASCII; update
normalizeUnicode to remove Unicode combining marks immediately after calling
content.normalize("NFKD") (e.g. strip characters in the Unicode Mark category or
the U+0300–U+036F range) before constructing/applying the lookalikes mapping and
nonAsciiRegex mapping so composed diacritics cannot bypass ASCII-only checks in
the subsequent replacement logic.
- Around line 78-83: Die Funktion collapseObfuscatedSpacing verwendet einen
Anchored-Guard (^(\w\s){4,}) und eine globale Ersetzung, die Wortgrenzen
zerstört; ändere die Logik so dass du statt eines Präfix-Guards global nach
ganzen Sequenzen von Einzelbuchstaben (mehrere aufeinanderfolgende "letter space
letter" Token, mindestens z.B. 4 Paare) suchst und nur die Leerzeichen innerhalb
dieser gefundenen Sequenzen entfernst; konkret: in collapseObfuscatedSpacing
zuerst alle Matches für solche letter-by-letter-Sequenzen (gebunden an
Wortgrenzen) finden und für jedes Match nur die internen Leerzeichen
zusammenziehen, statt die ursprünglichen Regexes ^(\w\s){4,} und (\w)\s(?=\w)
direkt auf den gesamten Content anzuwenden.

---

Duplicate comments:
In @.opencode/plugins/handlers/security-validator.ts:
- Around line 232-245: Der Fallback-Pfad für commandPreview schreibt
unveränderte args[field] ins Audit-Log und umgeht damit redactSecrets; fix: when
building commandPreview in the logSecurityEvent call, pass the fallback string
through redactSecrets before slicing (i.e. compute fallback =
`${injectionResult.field}:${input.args?.[injectionResult.field]}` then
commandPreview = redactSecrets(fallback).slice(0,100)), or alternatively omit
the preview when command is missing; update the logic around logSecurityEvent /
commandPreview so redactSecrets is always applied and slice(0,100) runs after
redaction to satisfy ADR-011 and avoid persisting PII.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f100ff67-dc87-4946-be28-72a4459b2a8a

📥 Commits

Reviewing files that changed from the base of the PR and between 85d4e9b and 3c89bae.

📒 Files selected for processing (3)
  • .opencode/plugins/handlers/security-validator.ts
  • .opencode/plugins/lib/injection-patterns.ts
  • .opencode/plugins/lib/sanitizer.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • .opencode/plugins/lib/injection-patterns.ts

Comment on lines +40 to +68
export function normalizeUnicode(content: string): string {
// Use NFKD normalization to decompose characters
const normalized = content.normalize("NFKD");

// Build regex from char codes to avoid control character lint warning
// Match any character outside ASCII range (0-127)
const nonAsciiRegex = new RegExp(
`[^${String.fromCharCode(0)}-${String.fromCharCode(127)}]`,
"g",
);

return normalized.replace(nonAsciiRegex, (char) => {
// Map common Cyrillic/Greek lookalikes to ASCII
const lookalikes: Record<string, string> = {
а: "a", // Cyrillic а (U+0430)
е: "e", // Cyrillic е (U+0435)
і: "i", // Cyrillic і (U+0456)
о: "o", // Cyrillic о (U+043E)
р: "p", // Cyrillic р (U+0440)
с: "c", // Cyrillic с (U+0441)
ѕ: "s", // Cyrillic ѕ (U+0455)
у: "y", // Cyrillic у (U+0443)
х: "x", // Cyrillic х (U+0445) - kha, NOT Latin h
А: "A", // Cyrillic А (U+0410)
В: "B", // Cyrillic В (U+0412)
Е: "E", // Cyrillic Е (U+0415)
};
return lookalikes[char] ?? char;
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

normalizeUnicode lässt Combining Marks stehen und verfehlt damit einfache Unicode-Bypässe.

normalize("NFKD") zerlegt z. B. ó zu o + U+0301, aber der Callback gibt unbekannte Non-ASCII-Zeichen unverändert zurück. Dadurch bleibt ignóre previous instructions nach der Pipeline weiter nicht-ASCII und ASCII-basierte Injection-Regexe sehen den Trigger nicht. Entfernt Combining Marks direkt nach NFKD, bevor die Lookalike-Mappings laufen.

🔧 Mögliche Korrektur
 export function normalizeUnicode(content: string): string {
 	// Use NFKD normalization to decompose characters
-	const normalized = content.normalize("NFKD");
+	const normalized = content.normalize("NFKD").replace(/\p{M}+/gu, "");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function normalizeUnicode(content: string): string {
// Use NFKD normalization to decompose characters
const normalized = content.normalize("NFKD");
// Build regex from char codes to avoid control character lint warning
// Match any character outside ASCII range (0-127)
const nonAsciiRegex = new RegExp(
`[^${String.fromCharCode(0)}-${String.fromCharCode(127)}]`,
"g",
);
return normalized.replace(nonAsciiRegex, (char) => {
// Map common Cyrillic/Greek lookalikes to ASCII
const lookalikes: Record<string, string> = {
а: "a", // Cyrillic а (U+0430)
е: "e", // Cyrillic е (U+0435)
і: "i", // Cyrillic і (U+0456)
о: "o", // Cyrillic о (U+043E)
р: "p", // Cyrillic р (U+0440)
с: "c", // Cyrillic с (U+0441)
ѕ: "s", // Cyrillic ѕ (U+0455)
у: "y", // Cyrillic у (U+0443)
х: "x", // Cyrillic х (U+0445) - kha, NOT Latin h
А: "A", // Cyrillic А (U+0410)
В: "B", // Cyrillic В (U+0412)
Е: "E", // Cyrillic Е (U+0415)
};
return lookalikes[char] ?? char;
});
export function normalizeUnicode(content: string): string {
// Use NFKD normalization to decompose characters
const normalized = content.normalize("NFKD").replace(/\p{M}+/gu, "");
// Build regex from char codes to avoid control character lint warning
// Match any character outside ASCII range (0-127)
const nonAsciiRegex = new RegExp(
`[^${String.fromCharCode(0)}-${String.fromCharCode(127)}]`,
"g",
);
return normalized.replace(nonAsciiRegex, (char) => {
// Map common Cyrillic/Greek lookalikes to ASCII
const lookalikes: Record<string, string> = {
а: "a", // Cyrillic а (U+0430)
е: "e", // Cyrillic е (U+0435)
і: "i", // Cyrillic і (U+0456)
о: "o", // Cyrillic о (U+043E)
р: "p", // Cyrillic р (U+0440)
с: "c", // Cyrillic с (U+0441)
ѕ: "s", // Cyrillic ѕ (U+0455)
у: "y", // Cyrillic у (U+0443)
х: "x", // Cyrillic х (U+0445) - kha, NOT Latin h
А: "A", // Cyrillic А (U+0410)
В: "B", // Cyrillic В (U+0412)
Е: "E", // Cyrillic Е (U+0415)
};
return lookalikes[char] ?? char;
});
}
🧰 Tools
🪛 ast-grep (0.41.0)

[warning] 45-48: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
[^${String.fromCharCode(0)}-${String.fromCharCode(127)}],
"g",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/sanitizer.ts around lines 40 - 68, normalizeUnicode
currently leaves combining marks after NFKD, letting characters like "o +
U+0301" remain non-ASCII; update normalizeUnicode to remove Unicode combining
marks immediately after calling content.normalize("NFKD") (e.g. strip characters
in the Unicode Mark category or the U+0300–U+036F range) before
constructing/applying the lookalikes mapping and nonAsciiRegex mapping so
composed diacritics cannot bypass ASCII-only checks in the subsequent
replacement logic.

Comment on lines +78 to +83
export function collapseObfuscatedSpacing(content: string): string {
// Detect letter-space-letter pattern (obfuscated words)
if (/^(\w\s){4,}/.test(content.trim())) {
return content.replace(/(\w)\s(?=\w)/g, "$1");
}
return content;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

collapseObfuscatedSpacing erkennt nur Präfixe und kann Wortgrenzen zerstören.

Der Guard ^(\w\s){4,} greift nur, wenn die Obfuskation direkt am Anfang steht. Wenn er greift, entfernt (\w)\s(?=\w) außerdem jede einzelne Leerstelle zwischen Wortzeichen, sodass i g n o r e all previous zu ignoreallprevious wird. Matcht besser echte Letter-by-letter-Sequenzen global und kollabiert nur innerhalb dieser Sequenzen.

🔧 Mögliche Korrektur
 export function collapseObfuscatedSpacing(content: string): string {
-	// Detect letter-space-letter pattern (obfuscated words)
-	if (/^(\w\s){4,}/.test(content.trim())) {
-		return content.replace(/(\w)\s(?=\w)/g, "$1");
-	}
-	return content;
+	return content.replace(
+		/(^|[^\w])((?:\w\s){4,}\w)(?=$|[^\w])/g,
+		(_, prefix, obfuscated) => prefix + obfuscated.replace(/\s+/g, ""),
+	);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/sanitizer.ts around lines 78 - 83, Die Funktion
collapseObfuscatedSpacing verwendet einen Anchored-Guard (^(\w\s){4,}) und eine
globale Ersetzung, die Wortgrenzen zerstört; ändere die Logik so dass du statt
eines Präfix-Guards global nach ganzen Sequenzen von Einzelbuchstaben (mehrere
aufeinanderfolgende "letter space letter" Token, mindestens z.B. 4 Paare) suchst
und nur die Leerzeichen innerhalb dieser gefundenen Sequenzen entfernst;
konkret: in collapseObfuscatedSpacing zuerst alle Matches für solche
letter-by-letter-Sequenzen (gebunden an Wortgrenzen) finden und für jedes Match
nur die internen Leerzeichen zusammenziehen, statt die ursprünglichen Regexes
^(\w\s){4,} und (\w)\s(?=\w) direkt auf den gesamten Content anzuwenden.

- Add redactMatchedText() helper that masks API keys/tokens
- For PII: show sk12...[REDACTED:51]...xYzQ format
- For non-PII: truncate at 50 chars
- Add matchPosition and originalLength metadata
- Prevents full secrets from being stored in logs

Addresses CodeRabbit finding: InjectionMatch stores full secrets
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
.opencode/plugins/lib/injection-patterns.ts (1)

25-33: ⚠️ Potential issue | 🟠 Major

Role-Hijacking-Regeln sind noch zu breit.

Line 27 ist trotz der Verschärfung weiter anfällig für False Positives: Durch das i-Flag matched ["']?[A-Z][a-z]+["']? auch normale Wörter wie ready, und developer|expert|specialist in Lines 27-31 blockiert sehr wahrscheinlich legitime Prompts. Für einen Block-Trigger würde ich hier nur explizit markierte Rollen (role: / quoted labels) oder klar schädliche Rollen zulassen.

🎯 Vorschlag
-	/you\s+are\s+now\s+(?:a\s+|an\s+)?(?:["']?[A-Z][a-z]+["']?|role:\s*\w+|hacker|attacker|malware|developer|expert|specialist)/i,
+	/you\s+are\s+now\s+(?:a\s+|an\s+)?(?:role:\s*\w+|["'][^"'\n]{2,30}["']|hacker|attacker|malware|jailbreak)\b/i,
...
-	/pretend\s+(?:you\s+are|to\s+be)\s+(?:a\s+|an\s+)?(?:["']?[A-Z][a-z]+["']?|hacker|attacker|expert|developer)/i,
+	/pretend\s+(?:you\s+are|to\s+be)\s+(?:a\s+|an\s+)?(?:["'][^"'\n]{2,30}["']|hacker|attacker|malware|jailbreak)\b/i,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/injection-patterns.ts around lines 25 - 33, The
ROLE_HIJACKING_PATTERNS array is too broad and causes false positives (the
/["']?[A-Z][a-z]+["']?/ part and generic tokens like
developer|expert|specialist); update ROLE_HIJACKING_PATTERNS to only match
explicitly marked role labels (e.g., quoted strings or the "role:" prefix) and
clearly malicious roles (hacker, attacker, malware, jailbroken, evil, malicious)
— remove the ambiguous capitalized-word alternative and drop generic benign
tokens like developer/expert/specialist from the patterns used in functions or
modules that reference ROLE_HIJACKING_PATTERNS so only explicit markers or
clearly harmful role names trigger a block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/plugins/lib/injection-patterns.ts:
- Around line 75-77: The second regex in MCP_TOOL_INJECTION_PATTERNS is too
broad and can match across very large inputs; replace the current
/tool\s+description.*ignore.*instructions/is pattern with a bounded-window
variant that limits how far apart the keywords can be (e.g., use non-greedy or
[\s\S]{0,50} between "tool description", "ignore" and "instructions") so it only
flags proximate occurrences; update the MCP_TOOL_INJECTION_PATTERNS array entry
accordingly to keep the same flags (i and s) and behavior but avoid large-span
false positives.

---

Duplicate comments:
In @.opencode/plugins/lib/injection-patterns.ts:
- Around line 25-33: The ROLE_HIJACKING_PATTERNS array is too broad and causes
false positives (the /["']?[A-Z][a-z]+["']?/ part and generic tokens like
developer|expert|specialist); update ROLE_HIJACKING_PATTERNS to only match
explicitly marked role labels (e.g., quoted strings or the "role:" prefix) and
clearly malicious roles (hacker, attacker, malware, jailbroken, evil, malicious)
— remove the ambiguous capitalized-word alternative and drop generic benign
tokens like developer/expert/specialist from the patterns used in functions or
modules that reference ROLE_HIJACKING_PATTERNS so only explicit markers or
clearly harmful role names trigger a block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 85078add-d274-49b7-ae17-c0cd43e38cea

📥 Commits

Reviewing files that changed from the base of the PR and between 3c89bae and c08ac5f.

📒 Files selected for processing (1)
  • .opencode/plugins/lib/injection-patterns.ts

…ction

- MCP Pattern (line 77): Replace unbounded .* with [\s\S]{0,50}
  Prevents large-span false positives across entire document

- ROLE_HIJACKING: Remove ambiguous patterns
  - Remove quoted-capitalized-word pattern that matched any Capitalized word
  - Remove developer/expert/specialist (generic benign tokens)
  - Keep only: explicit role: prefix, quoted strings, or clearly malicious roles
  - Now requires explicit markers or harmful role names only

Reduces false positives while maintaining security coverage.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.opencode/plugins/lib/injection-patterns.ts (1)

116-121: Pattern-Serialisierung ist bereits korrekt implementiert — Refactoring bleibt optional.

Der Code konvertiert RegExp zu String (.toString()) bereits an allen Serialisierungspunkten (Zeilen 238, 280, 300, 332) vor logSecurityEvent(). Das SecurityAuditEntry-Interface erwartet pattern?: string, nicht RegExp. Die JSONL-Audit-Logging funktioniert damit ohne Verlust.

Die Refactoring-Empfehlung (Änderung von pattern: RegExp zu pattern: string in InjectionMatch) bleibt architektonisch sinnvoll für Design-Klarheit, ist aber keine kritische Fehlerbehebung — das System serialisiert aktuell korrekt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/injection-patterns.ts around lines 116 - 121, The
InjectionMatch interface currently types pattern as RegExp but the audit
pipeline (SecurityAuditEntry and calls to logSecurityEvent) already serializes
regexes to strings; change InjectionMatch.pattern from RegExp to string for
design clarity and to match SecurityAuditEntry.pattern?: string, then update any
code that constructs InjectionMatch instances (places that set pattern) to call
.toString() or otherwise pass a string, and adjust consumers that assume a
RegExp to parse/recreate the RegExp when needed (e.g., callers of
logSecurityEvent or any match-processing functions that relied on a RegExp).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/plugins/lib/injection-patterns.ts:
- Line 88: The existing regex /sk-(?!ant-)[a-zA-Z0-9]{32,}/ in
injection-patterns.ts misses documented OpenAI key formats that include
additional hyphenated prefixes (e.g., sk-proj-..., sk-admin-...); update that
pattern to allow hyphens in the token by expanding the character class (for
example change [a-zA-Z0-9] to include '-' so it becomes [a-zA-Z0-9-]) while
keeping the negative lookahead (sk-(?!ant-)) and the length requirement,
ensuring keys with internal hyphens are detected.

---

Nitpick comments:
In @.opencode/plugins/lib/injection-patterns.ts:
- Around line 116-121: The InjectionMatch interface currently types pattern as
RegExp but the audit pipeline (SecurityAuditEntry and calls to logSecurityEvent)
already serializes regexes to strings; change InjectionMatch.pattern from RegExp
to string for design clarity and to match SecurityAuditEntry.pattern?: string,
then update any code that constructs InjectionMatch instances (places that set
pattern) to call .toString() or otherwise pass a string, and adjust consumers
that assume a RegExp to parse/recreate the RegExp when needed (e.g., callers of
logSecurityEvent or any match-processing functions that relied on a RegExp).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7bb87f33-9bd8-463d-bac6-ee07171b8575

📥 Commits

Reviewing files that changed from the base of the PR and between c08ac5f and 91d4073.

📒 Files selected for processing (1)
  • .opencode/plugins/lib/injection-patterns.ts

// Pattern-Quellen: jcfischer/pai-content-filter (MIT) + eigene Ergänzungen
export const PII_PATTERNS = [
/sk-ant-[A-Za-z0-9\-_]{20,}/, // Anthropic API Key
/sk-(?!ant-)[a-zA-Z0-9]{32,}/, // OpenAI API Key
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Welche Formate/Präfixe haben aktuelle OpenAI API Keys laut offizieller OpenAI-Dokumentation oder offiziellen Beispielen? Bitte speziell prüfen, ob nach sk-zusätzliche mit- getrennte Segmente vorkommen.

💡 Result:

Laut offiziellen OpenAI-Seiten/Beispielen kommen aktuell folgende Präfixe/Formate für OpenAI API Keys vor (relevante Ausschnitte in den Beispielen sind meist redaktiert):

  • sk-... (klassisches/„standard“ Secret-Key-Präfix; in Fehlermeldungsbeispielen so angezeigt). → keine weiteren, durch - getrennten „Namenssegmente“ direkt nach sk- erkennbar. [1]
  • sk-proj-... (Project-bezogene API Keys; in offiziellen Cookbook-Beispielen als OPENAI_API_KEY gesetzt). → ja, hier folgt nach sk- explizit ein weiteres Segment proj, getrennt durch -. [2]
  • sk-admin-... (Admin API Keys; im offiziellen API-Reference-Beispiel ist sogar ein vollständiger Beispiel-value gezeigt). → ja, hier folgt nach sk- explizit admin, getrennt durch -. [3]

Damit: Ja, es gibt offiziell dokumentierte/illustrierte API-Key-Präfixe, bei denen nach sk- weitere mit - getrennte Segmente vorkommen (z. B. sk-proj-…, sk-admin-…). [2][3]

Quellen: [1] [2] [3]


OpenAI-API-Key-Regex erfasst offizielle Keyformate mit Bindestrich nicht.

Die Regex auf Zeile 88 erlaubt nach sk- nur alphanumerische Zeichen und übersieht damit dokumentierte OpenAI-Formate wie sk-proj-... (Projekt-Keys) und sk-admin-... (Admin-Keys). Diese werden von der Leak-Erkennung nicht erfasst.

🎯 Vorschlag
-	/sk-(?!ant-)[a-zA-Z0-9]{32,}/, // OpenAI API Key
+	/sk-(?!ant-)[A-Za-z0-9-]{32,}/, // OpenAI API Key
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/sk-(?!ant-)[a-zA-Z0-9]{32,}/, // OpenAI API Key
/sk-(?!ant-)[A-Za-z0-9-]{32,}/, // OpenAI API Key
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.opencode/plugins/lib/injection-patterns.ts at line 88, The existing regex
/sk-(?!ant-)[a-zA-Z0-9]{32,}/ in injection-patterns.ts misses documented OpenAI
key formats that include additional hyphenated prefixes (e.g., sk-proj-...,
sk-admin-...); update that pattern to allow hyphens in the token by expanding
the character class (for example change [a-zA-Z0-9] to include '-' so it becomes
[a-zA-Z0-9-]) while keeping the negative lookahead (sk-(?!ant-)) and the length
requirement, ensuring keys with internal hyphens are detected.

@Steffen025 Steffen025 merged commit 1062aa9 into dev Mar 8, 2026
3 checks passed
@Steffen025 Steffen025 deleted the feature/wp-b-security-hardening branch March 8, 2026 20:59
Steffen025 added a commit that referenced this pull request Mar 8, 2026
File was superseded by actual implementation in WP-B PR #43.
No longer needed as security features are documented in ADR-011
and implemented in plugins/handlers/ directory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant