feat: policy recommendation plumbing — denial aggregation, transport, approval pipeline, and mechanistic recommendations #204

@johntmyers

Summary

Build the end-to-end infrastructure for sandbox-initiated policy recommendations: denial aggregation in the sandbox, gRPC transport to the gateway, persistence, approval workflow, policy merge, and CLI/TUI for human-in-the-loop review. Includes a deterministic (no-LLM) chunk generator so the full pipeline is testable without inference configured.

This is the plumbing half of #153. The LLM-powered PolicyAdvisor agent harness is a follow-up issue that plugs into this infrastructure.

Architecture

Follows Option D from the design doc — sandbox aggregates denials locally, generates mechanistic recommendations, submits to gateway for persistence + approval.

SANDBOX                              GATEWAY                        USER
┌──────────────────────┐     ┌──────────────────────┐     ┌──────────────────┐
│                      │     │                      │     │                  │
│  proxy.rs deny event │     │                      │     │                  │
│       │              │     │                      │     │                  │
│       v              │     │                      │     │                  │
│  DenialAggregator    │     │                      │     │                  │
│    group by          │     │                      │     │                  │
│    (host,port,binary)│     │                      │     │                  │
│    dedup + cooldown  │     │                      │     │                  │
│    L7 event ingestion│     │                      │     │                  │
│       │              │     │                      │     │                  │
│       v              │     │                      │     │                  │
│  MechanisticMapper   │     │                      │     │                  │
│    host:port → rule  │     │                      │     │                  │
│    Stage 1: L7 audit │     │                      │     │                  │
│    Stage 2: refine   │     │                      │     │                  │
│       │              │     │                      │     │                  │
│       │ SubmitPolicy │     │                      │     │                  │
│       │ Analysis RPC │     │                      │     │                  │
│       └──────────────┼────>│  Validate + persist  │     │                  │
│                      │     │  DraftPolicyUpdate   │────>│  CLI: draft list │
│                      │     │       │              │     │  TUI: draft panel│
│                      │     │       v              │     │       │          │
│                      │     │  Approve/Reject RPCs │<────│  approve/reject  │
│                      │     │       │              │     │                  │
│                      │     │  Merge into active   │     │                  │
│  Policy poll ←───────┼─────│  policy              │     │                  │
│  OPA reload          │     │                      │     │                  │
└──────────────────────┘     └──────────────────────┘     └──────────────────┘

Scope

Proto definitions (all messages + RPCs)

New messages in proto/:

  • L7RequestSample — observed HTTP method+path from L7 inspection
  • DenialSummary — structured denial data from sandbox aggregator (host, port, binary, counts, L7 samples, cmdlines, denial_stage)
  • PolicyChunk — proposed rule with rationale, status, stage, supersession
  • DraftPolicyUpdate — new SandboxStreamEvent variant for live notifications
  • Request/response pairs for all RPCs below

New RPCs on Navigator service:

  • SubmitPolicyAnalysis — sandbox → gateway: atomic submission of denial summaries + proposed chunks
  • GetDraftPolicy — CLI/TUI → gateway: query draft chunks with optional status filter
  • ApproveDraftChunk / RejectDraftChunk — single-chunk approval/rejection
  • ApproveAllDraftChunks — bulk approval (skips security-flagged unless forced)
  • EditDraftChunk — modify a pending chunk in-place (e.g., narrow allowed_ips)
  • UndoDraftChunk — reverse last approval
  • GetDraftHistory — audit trail of all decisions

See design doc Section 10 for complete proto definitions.

Gateway persistence layer

New tables:

  • draft_policy_chunks — stores proposed chunks with status lifecycle (pending → approved/rejected/superseded)
  • denial_summaries — stores aggregated denial data, upserted by (sandbox_id, host, port, binary)

Indexes for efficient querying by sandbox + status.

Extend the Store trait (crates/navigator-server/src/persistence/mod.rs) with methods for CRUD on both tables.
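
A minimal sketch of the chunk status lifecycle, assuming the transitions implied elsewhere in this issue (UndoDraftChunk reverts approved to pending; Stage 2 supersession marks an approved Stage 1 chunk as superseded). The `ChunkStatus` enum and `can_transition_to` are hypothetical names, not the actual schema or Store API.

```rust
/// Hypothetical status enum mirroring the lifecycle described for
/// draft_policy_chunks. Names are illustrative, not the real schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkStatus {
    Pending,
    Approved,
    Rejected,
    Superseded,
}

impl ChunkStatus {
    /// Returns true if moving from `self` to `next` is allowed.
    /// Assumed transitions:
    ///   pending  -> approved | rejected | superseded
    ///   approved -> pending    (UndoDraftChunk)
    ///   approved -> superseded (Stage 2 replaces Stage 1)
    pub fn can_transition_to(self, next: ChunkStatus) -> bool {
        use ChunkStatus::*;
        matches!(
            (self, next),
            (Pending, Approved)
                | (Pending, Rejected)
                | (Pending, Superseded)
                | (Approved, Pending)
                | (Approved, Superseded)
        )
    }
}
```

A persistence method that updates status could call `can_transition_to` before writing, so an invalid move such as rejected to approved fails at the storage boundary.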

Gateway gRPC handlers

In crates/navigator-server/src/grpc.rs:

  • SubmitPolicyAnalysis handler: validate trust boundary (reject loopback/link-local hosts, rate limit per sandbox, format checks), DNS resolution for resolved_ips + is_private_ip annotation, persist summaries + chunks, increment draft version, publish DraftPolicyUpdate to SandboxWatchBus
  • GetDraftPolicy handler: query chunks by sandbox, optional status filter
  • ApproveDraftChunk handler: load current active policy, merge chunk's proposed_rule into network_policies map, persist via existing UpdateSandboxPolicy internals (deterministic_hash, put_policy_revision, supersede_older, notify watch_bus), update chunk status → approved, update denial_summaries status → resolved
  • RejectDraftChunk handler: update chunk status → rejected, store rejection reason
  • ApproveAllDraftChunks handler: iterate pending chunks, skip those with security_notes unless include_security_flagged=true, merge all into active policy
  • EditDraftChunk handler: replace proposed_rule on a pending chunk
  • UndoDraftChunk handler: remove merged rule from active policy, revert chunk to pending
  • GetDraftHistory handler: return chronological decision log
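
The bulk-approval selection in ApproveAllDraftChunks can be sketched as a filter. The `Chunk` struct and its fields below are simplified stand-ins for the real PolicyChunk message, and treating a non-empty `security_notes` as "security-flagged" is an assumption.

```rust
/// Simplified, hypothetical view of a draft chunk; the real PolicyChunk
/// message carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub id: String,
    pub pending: bool,
    pub security_notes: Option<String>,
}

/// Assumption: a chunk counts as security-flagged when security_notes
/// is present and not blank.
fn is_security_flagged(chunk: &Chunk) -> bool {
    chunk
        .security_notes
        .as_deref()
        .map_or(false, |notes| !notes.trim().is_empty())
}

/// Select the chunks a bulk approval would merge: only pending chunks,
/// skipping security-flagged ones unless `include_security_flagged` is set.
pub fn select_for_bulk_approval(chunks: &[Chunk], include_security_flagged: bool) -> Vec<&Chunk> {
    chunks
        .iter()
        .filter(|c| c.pending)
        .filter(|c| include_security_flagged || !is_security_flagged(c))
        .collect()
}
```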

Gateway validation (trust boundary)

Per R7 from the design doc:

  • Reject chunks with loopback (127.0.0.0/8) or link-local (169.254.0.0/16) hosts — use is_internal_ip() from proxy.rs:956-974 as reference
  • Rate limit: max 10 outstanding pending chunks per sandbox (max_outstanding)
  • Format validation: rule names, endpoint fields, binary paths
  • DNS re-verification: gateway resolves hostnames independently (sandbox DNS is untrusted), annotates resolved_ips and is_private_ip on denial summaries
  • Pre-merge conflict detection: check if proposed rule overlaps existing network_policies entries (same host:port:binary)
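
A rough sketch of the host and rate-limit checks using only the standard library. This is not the existing is_internal_ip() from proxy.rs; `is_rejected_ip`, `validate_submission`, and `Rejection` are hypothetical names. The IPv6 handling (::1 and fe80::/10) and the choice to reject once the pending count has reached the limit are assumptions beyond what the issue states.

```rust
use std::net::IpAddr;

/// max_outstanding from the issue text: 10 pending chunks per sandbox.
pub const MAX_OUTSTANDING: usize = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    InternalHost,
    TooManyPending,
}

/// True for loopback (127.0.0.0/8) and link-local (169.254.0.0/16).
/// The IPv6 equivalents (::1, fe80::/10) are an added assumption.
pub fn is_rejected_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_link_local(),
        IpAddr::V6(v6) => v6.is_loopback() || (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}

/// Validate one submission given the IPs the gateway resolved itself
/// and the sandbox's current count of pending chunks.
pub fn validate_submission(
    gateway_resolved_ips: &[IpAddr],
    pending_for_sandbox: usize,
) -> Result<(), Rejection> {
    // Assumption: a sandbox already at the limit cannot add another chunk.
    if pending_for_sandbox >= MAX_OUTSTANDING {
        return Err(Rejection::TooManyPending);
    }
    if gateway_resolved_ips.iter().any(|ip| is_rejected_ip(*ip)) {
        return Err(Rejection::InternalHost);
    }
    Ok(())
}
```

The function takes addresses resolved by the gateway rather than anything reported by the sandbox, in line with the DNS re-verification bullet above.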

Sandbox DenialAggregator

New module in crates/navigator-sandbox/src/:

  • Groups denial events by primary key (host, port, binary)
  • Dedup window (default 60s) with threshold (default 3) before emission
  • Cooldown (default 5m) between emissions for same key
  • Count tracking: window_count (resets), suppressed_count (cooldown drops), total_count (cumulative, never resets)
  • persistent_threshold (default 10): emit regardless of windowing for slow-drip patterns
  • Memory bounds: max_keys=1000, overflow counter for flood protection
  • Stale-flush: periodic sweep (30s) emits entries older than 5m with any activity
  • L7 event ingestion: collect (method, path, decision) from L7_REQUEST tracing events (relay.rs:123-133) into l7_request_samples map, capped at 50 distinct pairs per entry
  • Cmdline sanitization: redact Authorization, X-Vault-Token, Cookie, X-Api-Key headers, passwords, query string tokens before storage
  • Two event sources: L4 CONNECT deny from proxy.rs, L7 audit/deny from relay.rs

Design doc Section 9d has the full state machine diagram.
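
To make the windowing rules concrete, here is a simplified sketch of the threshold, cooldown, and counter logic, using the defaults listed above. It is one possible reading, not the design doc's state machine: stale-flush, L7 sample collection, and cmdline sanitization are left out, and all type and method names are hypothetical. The caller passes `now` explicitly so the behaviour can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// (host, port, binary) grouping key.
pub type DenialKey = (String, u16, String);

struct Entry {
    window_start: Instant,
    window_count: u32,     // resets with each window or emission
    suppressed_count: u64, // events dropped during cooldown
    total_count: u64,      // cumulative, never resets
    last_emitted: Option<Instant>,
}

pub struct DenialAggregator {
    window: Duration,
    threshold: u32,
    cooldown: Duration,
    persistent_threshold: u64,
    max_keys: usize,
    overflow: u64,
    entries: HashMap<DenialKey, Entry>,
}

impl DenialAggregator {
    /// Defaults taken from the issue text.
    pub fn with_defaults() -> Self {
        Self {
            window: Duration::from_secs(60),
            threshold: 3,
            cooldown: Duration::from_secs(5 * 60),
            persistent_threshold: 10,
            max_keys: 1000,
            overflow: 0,
            entries: HashMap::new(),
        }
    }

    /// Record one denial; returns true when a summary should be emitted.
    pub fn record(&mut self, key: DenialKey, now: Instant) -> bool {
        // Flood protection: refuse new keys once the map is full.
        if !self.entries.contains_key(&key) && self.entries.len() >= self.max_keys {
            self.overflow += 1;
            return false;
        }
        let e = self.entries.entry(key).or_insert(Entry {
            window_start: now,
            window_count: 0,
            suppressed_count: 0,
            total_count: 0,
            last_emitted: None,
        });
        e.total_count += 1;
        // Cooldown: count the event but do not emit again yet.
        if let Some(t) = e.last_emitted {
            if now.duration_since(t) < self.cooldown {
                e.suppressed_count += 1;
                return false;
            }
        }
        // Start a fresh dedup window once the old one has expired.
        if now.duration_since(e.window_start) >= self.window {
            e.window_start = now;
            e.window_count = 0;
        }
        e.window_count += 1;
        // Emit on the window threshold, or on the cumulative threshold
        // so slow-drip patterns still surface (assumed interpretation).
        if e.window_count >= self.threshold || e.total_count >= self.persistent_threshold {
            e.last_emitted = Some(now);
            e.window_count = 0;
            return true;
        }
        false
    }

    /// (window_count, suppressed_count, total_count) for a key, if tracked.
    pub fn counts(&self, key: &DenialKey) -> Option<(u32, u64, u64)> {
        self.entries
            .get(key)
            .map(|e| (e.window_count, e.suppressed_count, e.total_count))
    }

    pub fn overflow(&self) -> u64 {
        self.overflow
    }
}
```

With the defaults, three denials for the same key inside 60 seconds trigger one emission, and further denials in the next five minutes only increment `suppressed_count` and `total_count`.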

Sandbox mechanistic chunk generator

A deterministic mapper (no LLM) that converts denial summaries into proposed PolicyChunks:

Stage 1 (L4 denial → initial recommendation):

  • For HTTP-capable ports (80, 443, 8080, 8200, 9200): recommend rule with protocol: rest, tls: terminate (443), enforcement: audit, access: full — this unblocks traffic while enabling L7 visibility
  • For other ports: recommend plain L4 allow rule
  • Rule name: auto_{host}_{port} (sanitized)
  • Rationale: "Denied {count} connections to {host}:{port} from {binary}"
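
A sketch of the Stage 1 mapping under stated assumptions: the `ProposedRule` and `Stage1Chunk` types are simplified stand-ins for the real PolicyChunk and NetworkPolicyRule messages, "tls: terminate (443)" is read as "terminate TLS only on port 443", and the sanitization rule (lowercase, non-alphanumerics replaced with `_`) is a guess at what "sanitized" means.

```rust
/// Ports treated as HTTP-capable in Stage 1 (list from the issue text).
const HTTP_CAPABLE_PORTS: [u16; 5] = [80, 443, 8080, 8200, 9200];

/// Hypothetical, simplified rule shape.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProposedRule {
    /// protocol: rest, enforcement: audit, access: full
    L7Audit { tls_terminate: bool },
    /// Plain L4 allow rule.
    L4Allow,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Stage1Chunk {
    pub rule_name: String,
    pub rule: ProposedRule,
    pub rationale: String,
}

/// Assumed sanitization: lowercase ASCII alphanumerics, everything else `_`.
fn sanitize(raw: &str) -> String {
    raw.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '_' })
        .collect()
}

/// Map one L4 denial summary to an initial recommendation.
pub fn stage1_chunk(host: &str, port: u16, binary: &str, count: u64) -> Stage1Chunk {
    let rule = if HTTP_CAPABLE_PORTS.contains(&port) {
        // Assumption: TLS termination applies to 443 only.
        ProposedRule::L7Audit { tls_terminate: port == 443 }
    } else {
        ProposedRule::L4Allow
    };
    Stage1Chunk {
        rule_name: format!("auto_{}_{}", sanitize(host), port),
        rule,
        rationale: format!("Denied {count} connections to {host}:{port} from {binary}"),
    }
}
```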

Stage 2 (L7 audit data → refined recommendation):

  • When an approved Stage 1 chunk has accumulated l7_request_samples:
    • All GET/HEAD/OPTIONS → access: read-only
    • Includes POST but no DELETE → access: read-write
    • Includes DELETE → access: full
  • Set enforcement: enforce, supersedes_chunk_id pointing to the Stage 1 chunk
  • Rationale: "Observed {n} HTTP requests ({method_summary}). Recommending {access} access."
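
The Stage 2 access inference can be sketched as a small classifier over the observed methods. The issue does not say how PUT or PATCH should be treated; this sketch assumes any method outside GET/HEAD/OPTIONS, short of DELETE, maps to read-write. `Access`, `infer_access`, and `stage2_rationale` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    ReadOnly,
    ReadWrite,
    Full,
}

impl Access {
    pub fn as_str(self) -> &'static str {
        match self {
            Access::ReadOnly => "read-only",
            Access::ReadWrite => "read-write",
            Access::Full => "full",
        }
    }
}

/// Infer the narrowest access level that covers the observed methods.
/// Returns None when no L7 samples have been collected yet.
pub fn infer_access(methods: &[&str]) -> Option<Access> {
    if methods.is_empty() {
        return None;
    }
    let upper: Vec<String> = methods.iter().map(|m| m.to_ascii_uppercase()).collect();
    if upper.iter().any(|m| m == "DELETE") {
        return Some(Access::Full);
    }
    let all_reads = upper
        .iter()
        .all(|m| matches!(m.as_str(), "GET" | "HEAD" | "OPTIONS"));
    // Assumption: POST, PUT, PATCH and other non-read methods all map
    // to read-write; the issue only names POST explicitly.
    Some(if all_reads { Access::ReadOnly } else { Access::ReadWrite })
}

/// Rationale string in the format given above.
pub fn stage2_rationale(n: usize, method_summary: &str, access: Access) -> String {
    format!(
        "Observed {} HTTP requests ({}). Recommending {} access.",
        n,
        method_summary,
        access.as_str()
    )
}
```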

This is intentionally simple — the LLM PolicyAdvisor (follow-up issue) replaces it with intelligent grouping, security analysis, and richer rationale.

Sandbox gRPC client

Extend CachedNavigatorClient (crates/navigator-sandbox/src/grpc_client.rs) with:

  • submit_policy_analysis() method: sends SubmitPolicyAnalysisRequest with denial summaries + proposed chunks
  • Error handling: log and retry on transient failures, drop on permanent failures
  • Include analysis_mode: "mechanistic" field
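
The error-handling policy above can be sketched independently of the gRPC stack. The version below is synchronous and takes a closure in place of the real RPC call; `SubmitError`, `Outcome`, and `submit_with_retry` are hypothetical names, and how real gRPC status codes map onto transient versus permanent is left open. A production version would presumably be async and back off between attempts.

```rust
/// Hypothetical error classification for a submission attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum SubmitError {
    Transient(String),
    Permanent(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Sent,
    DroppedPermanent,
    RetriesExhausted,
}

/// Try `send` up to `max_attempts` times: log and retry on transient
/// failures, drop immediately on permanent ones.
pub fn submit_with_retry<F>(max_attempts: u32, mut send: F) -> Outcome
where
    F: FnMut() -> Result<(), SubmitError>,
{
    for attempt in 1..=max_attempts {
        match send() {
            Ok(()) => return Outcome::Sent,
            Err(SubmitError::Transient(msg)) => {
                eprintln!("submit attempt {attempt}/{max_attempts} failed (transient): {msg}");
            }
            Err(SubmitError::Permanent(msg)) => {
                eprintln!("submit failed (permanent), dropping: {msg}");
                return Outcome::DroppedPermanent;
            }
        }
    }
    Outcome::RetriesExhausted
}
```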

Approval merge flow

When a chunk is approved:

  1. Load current active SandboxPolicy (latest loaded version)
  2. Insert chunk's proposed_rule into network_policies map under chunk.rule_name
  3. Validate merged policy (static fields unchanged, no duplicate endpoint coverage)
  4. Persist via existing UpdateSandboxPolicy internals
  5. Update chunk status → approved, denial_summaries status → resolved
  6. Sandbox picks up new policy on next 30s poll cycle, reloads OPA

For chunk supersession (Stage 2 replacing Stage 1):

  1. Approve the Stage 2 chunk (same merge flow)
  2. The Stage 2 rule replaces the Stage 1 rule (same rule_name)
  3. Mark the original Stage 1 chunk as superseded
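
The merge and supersession steps above can be sketched over a plain map. `Rule` is a simplified stand-in for NetworkPolicyRule, and "duplicate endpoint coverage" is interpreted here as another rule name already covering the same host:port:binary. The static-field check, persistence, and status updates are out of scope for the sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified rule; the real NetworkPolicyRule has more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub host: String,
    pub port: u16,
    pub binary: String,
    pub access: String,
}

impl Rule {
    fn endpoint(&self) -> (&str, u16, &str) {
        (self.host.as_str(), self.port, self.binary.as_str())
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum MergeError {
    DuplicateCoverage { existing_rule: String },
}

/// Return a copy of `active` with `proposed` inserted under `rule_name`.
/// Inserting under an existing name replaces that rule, which is how a
/// Stage 2 chunk supersedes its Stage 1 rule.
pub fn merge_chunk(
    active: &BTreeMap<String, Rule>,
    rule_name: &str,
    proposed: Rule,
) -> Result<BTreeMap<String, Rule>, MergeError> {
    // Reject if a differently named rule already covers this endpoint.
    for (name, existing) in active {
        if name.as_str() != rule_name && existing.endpoint() == proposed.endpoint() {
            return Err(MergeError::DuplicateCoverage { existing_rule: name.clone() });
        }
    }
    let mut merged = active.clone();
    merged.insert(rule_name.to_string(), proposed);
    Ok(merged)
}
```

Returning a new map rather than mutating in place keeps the active policy untouched if validation fails partway through.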

DraftPolicyUpdate streaming event

New SandboxStreamEvent variant sent via WatchSandbox stream:

  • draft_version, new_chunks count, total_pending count, brief summary
  • CLI logs --tail renders: "Draft policy updated: N new chunks. Run 'openshell sandbox draft <name>' to review."
  • TUI shows notification badge on Draft tab
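
For illustration, the event payload and the CLI log line could look roughly like this. The field types and the `render_log_line` helper are assumptions; the real message is defined in the proto.

```rust
/// Hypothetical Rust-side view of the DraftPolicyUpdate event.
/// Field names follow the issue text; the types are assumed.
#[derive(Debug, Clone)]
pub struct DraftPolicyUpdate {
    pub draft_version: u64,
    pub new_chunks: u32,
    pub total_pending: u32,
    pub summary: String,
}

/// Render the line shown by `logs --tail`, using the wording given above.
pub fn render_log_line(update: &DraftPolicyUpdate, sandbox_name: &str) -> String {
    format!(
        "Draft policy updated: {} new chunks. Run 'openshell sandbox draft {}' to review.",
        update.new_chunks, sandbox_name
    )
}
```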

CLI commands

New subcommands under openshell sandbox draft:

Command                                                                  Description
sandbox draft <name>                                                     View the full living draft policy
sandbox draft <name> --chunks                                            View individual chunks with rationale
sandbox draft <name> --chunk <id>                                        View a specific chunk in detail
sandbox draft approve <name> <chunk_id>                                  Approve a specific chunk
sandbox draft approve <name> --all                                       Approve all pending (skips security-flagged)
sandbox draft approve <name> --all --force                               Include security-flagged chunks
sandbox draft reject <name> <chunk_id>                                   Reject a specific chunk
sandbox draft reject <name> <chunk_id> --reason "..."                    Reject with reason
sandbox draft apply <name>                                               Approve all + merge into active policy
sandbox draft clear <name>                                               Clear all pending chunks
sandbox draft edit <name> <chunk_id> [--allowed-ips ...] [--access ...]  Modify a pending chunk
sandbox draft undo <name> <chunk_id>                                     Reverse last approval
sandbox draft history <name>                                             Show decision history

See design doc Section 8a for detailed CLI flow examples.

TUI draft panel

New view mode in sandbox detail screen (keybinding cycle: pldp):

  • List pending/approved/rejected chunks with rationale summaries
  • Keybindings: a approve, r reject, A approve-all, Enter detail popup
  • Chunk detail popup: full proposed YAML, rationale, denial event summary
  • Live updates via DraftPolicyUpdate stream event
  • Status bar notification when new chunks arrive

See design doc Section 8b for TUI mockups.

Codebase references

Area                   File                                            Lines / symbol
Proxy deny path        crates/navigator-sandbox/src/proxy.rs           237-299
L7 relay events        crates/navigator-sandbox/src/l7/relay.rs        123-133
Log push               crates/navigator-sandbox/src/log_push.rs        44, 93
Policy poll loop       crates/navigator-sandbox/src/lib.rs             ~543-647
gRPC client            crates/navigator-sandbox/src/grpc_client.rs
OPA evaluate           crates/navigator-sandbox/src/opa.rs             227-237
Gateway gRPC           crates/navigator-server/src/grpc.rs             1223+
Persistence Store      crates/navigator-server/src/persistence/mod.rs
Tracing bus            crates/navigator-server/src/tracing_bus.rs
SSRF check             crates/navigator-sandbox/src/proxy.rs           956-974
Proto: sandbox events  proto/navigator.proto                           SandboxStreamEvent
Proto: policy rules    proto/sandbox.proto                             NetworkPolicyRule
CLI commands           crates/navigator-cli/src/main.rs                SandboxCommands
CLI handlers           crates/navigator-cli/src/run.rs
TUI app                crates/navigator-tui/src/app.rs                 Focus enum
Policy schema ref      architecture/security-policy.md

Design document

Full design: https://gitlab-master.nvidia.com/-/snippets/12930
Local copy: architecture/plans/issue-153-policy-recommendations/00-deep-analysis.md

Effort estimate

~15-18 days (Phases 1, 2, 4, 5 from the design doc, minus LLM-specific parts)
