Plan validation: OPA/Rego policy rules for deployment plan diffs #1073

@jsbroks

Summary

Add a plan validation system that evaluates OPA/Rego policies against deployment plan diffs (the current and proposed state produced by job agents like ArgoCD and Terraform Cloud). Validation results are surfaced in the deployment plan API and in GitHub check runs.

Note: a deploymentPlan is a manually-triggered dry-run (POST /v1/workspaces/{w}/deployments/{d}/plan) — it does not create a release or dispatch jobs. Validation results are observational. There is no in-engine deploy gate. The real gate, when a plan is associated with a PR, is the GitHub check failing → branch protection blocks merge → no version registered.

Policies use the deny rule convention (matching Conftest). Deny rules produce plain string messages (deny contains msg if { ... }). Rego v1 syntax only (import rego.v1, contains ... if, some x in ...).

This enables teams to enforce rules like:

  • Kubernetes manifests must have resource limits
  • Terraform plans cannot contain destructive changes
  • Replica counts cannot drop by more than 50%
  • Production changes require specific labels
  • Rollbacks are blocked unless explicitly approved

Design

Database Schema

Two new tables (Drizzle schema, no migration yet):

policy_rule_plan_validation_opa — stores OPA/Rego rules attached to a policy:

| Column | Type | Notes |
| --- | --- | --- |
| id | UUID | PK |
| policy_id | UUID | FK → policy(id) ON DELETE CASCADE |
| name | text | Human-readable rule name |
| description | text | Optional |
| rego | text | Rego v1 source code |
| severity | text | "error" or "warning" (open: see comment; may move into Rego output instead) |
| created_at | timestamptz | |

deployment_plan_target_result_validation — stores evaluation outcomes per rule per plan result:

| Column | Type | Notes |
| --- | --- | --- |
| id | UUID | PK |
| result_id | UUID | FK → deployment_plan_target_result(id) ON DELETE CASCADE |
| rule_id | UUID | References the OPA rule |
| passed | boolean | |
| violations | jsonb | ["message 1", "message 2"] (array of strings) |
| evaluated_at | timestamptz | |

Unique index on (result_id, rule_id) for upsert semantics.

OPA Evaluator (pkg/planvalidation)

Policies are evaluated in-process using github.com/open-policy-agent/opa/v1, with Rego v1 syntax only. The evaluator already exists in the repo.

Policies must define a deny rule set following the Conftest convention. Deny rules produce plain string messages. The evaluator:

  1. Parses the Rego module to extract the package declaration
  2. Queries data.<package>.deny
  3. Collects all string values from the deny set
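Steps 1 to 3 can be illustrated without the OPA dependency. The sketch below is a simplified stand-in, not the evaluator itself: it finds the package with a regular expression (the real evaluator presumably relies on OPA's own parser), builds the `data.<package>.deny` query string, and filters a result set down to its string members. The names `denyQuery` and `collectMessages` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// packageRe matches a simple Rego package declaration on its own line.
// Simplification: it ignores bracketed path segments and trailing comments.
var packageRe = regexp.MustCompile(
	`(?m)^[ \t]*package[ \t]+([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)[ \t]*$`)

// denyQuery returns the query "data.<package>.deny" for a Rego module.
func denyQuery(module string) (string, error) {
	m := packageRe.FindStringSubmatch(module)
	if m == nil {
		return "", fmt.Errorf("no package declaration found")
	}
	return "data." + m[1] + ".deny", nil
}

// collectMessages keeps only the string members of an evaluated deny set
// and drops any other value types.
func collectMessages(denySet []any) []string {
	msgs := make([]string, 0, len(denySet))
	for _, v := range denySet {
		if s, ok := v.(string); ok {
			msgs = append(msgs, s)
		}
	}
	return msgs
}

func main() {
	q, err := denyQuery("package kubernetes.validation\n\nimport rego.v1\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
	fmt.Println(collectMessages([]any{"missing limits", 42, "no labels"}))
}
```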

OPA Input Schema

{
  "current": "<raw string from job agent — current state>",
  "proposed": "<raw string from job agent — proposed state>",
  "agentType": "argo-cd | terraform-cloud | ...",
  "hasChanges": true,
  "environment": { "name": "production", "id": "..." },
  "resource": { "name": "web-app", "id": "...", ... },
  "deployment": { "name": "my-deploy", "id": "...", ... },
  "proposedVersion": { "id": "...", "tag": "v2.0.0", "name": "v2.0.0", "metadata": {...}, "config": {...}, "status": "...", "createdAt": "..." },
  "currentVersion": { "id": "...", "tag": "v1.0.0", "name": "v1.0.0", "metadata": {...}, "config": {...}, "status": "...", "createdAt": "..." }
}
  • current / proposed are raw strings — Rego policies parse them with yaml.unmarshal, json.unmarshal, split, etc.
  • currentVersion is loaded from the release currently deployed to the target (deployment_plan_target.current_release_id → release → deployment_version)
  • currentVersion is null for first-time deployments
  • Any package name is supported (package kubernetes.validation, package terraform.security, etc.)

Example Policies

Kubernetes resource limits:

package kubernetes.validation

import rego.v1

proposed_docs contains doc if {
    some raw in split(input.proposed, "\n---\n")
    doc := yaml.unmarshal(raw)
}

deny contains msg if {
    some m in proposed_docs
    m.kind == "Deployment"
    some c in m.spec.template.spec.containers
    not c.resources.limits
    msg := sprintf("Container %q missing resource limits", [c.name])
}

LoadBalancer label validation:

package kubernetes.validation

import rego.v1

docs := [doc | some raw in split(input.proposed, "\n---\n"); doc := yaml.unmarshal(raw)]

deny contains msg if {
    some doc in docs
    doc.kind == "Service"
    doc.spec.type == "LoadBalancer"
    labels := object.get(doc.metadata, "labels", {})
    not labels["lb.coreweave.com/address-pool"]
    not labels["service.beta.kubernetes.io/coreweave-load-balancer-type"]
    msg := sprintf("Service '%s' must define a load balancer label", [doc.metadata.name])
}

Terraform destructive change blocking:

package ctrlplane.plan_validation

import rego.v1

plan := json.unmarshal(input.proposed)

deny contains msg if {
    some rc in plan.resource_changes
    some action in rc.change.actions
    action == "delete"
    msg := sprintf("Destructive change to %s blocked", [rc.address])
}

Rollback detection via version comparison:

package ctrlplane.plan_validation

import rego.v1

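# Tags are compared as semantic versions; a plain string comparison
# would rank "v1.9.0" above "v1.10.0". Non-semver tags are skipped.
deny contains msg if {
    current := trim_prefix(input.currentVersion.tag, "v")
    proposed := trim_prefix(input.proposedVersion.tag, "v")
    semver.is_valid(current)
    semver.is_valid(proposed)
    semver.compare(current, proposed) > 0
    msg := sprintf("Rollback detected: %s -> %s", [input.currentVersion.tag, input.proposedVersion.tag])
}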

Version metadata gating:

package ctrlplane.plan_validation

import rego.v1

deny contains msg if {
    input.proposedVersion.metadata["requires-approval"] == "true"
    input.environment.name == "production"
    msg := "version requires manual approval for production"
}

Validation Flow

Validation runs in a separate controller, not inline in deploymentplanresult. This decouples expensive OPA evaluation from plan completion and makes re-evaluation on rule edits a natural enqueue.

Plan result completes (deploymentplanresult controller)
  → enqueue planvalidation work item (scope = plan target result id)

planvalidation controller
  → load policy_rule_plan_validation_opa rules for the workspace
  → for each rule: evaluate OPA in-process
  → upsert into deployment_plan_target_result_validation
  → trigger GH check re-render

policy_rule_plan_validation_opa rule create/edit
  → enqueue planvalidation work items per existing plan result that should re-validate

GitHub Check Integration

The check body is re-rendered from DB state (current/proposed diff plus validation rows) after each update. Because output.text is replaced wholesale on each PATCH, re-rendering from the same DB state always yields the same body, so the renderer is idempotent.

  • Passed rules show as ✅ with rule name
  • Failed rules show as ❌ with rule name, severity, and denial messages
  • Error-severity failures change the check conclusion to failure
  • Warning-severity failures appear in the output but don't fail the check
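The severity rules above reduce to a small pure function over the stored validation rows. The following is a minimal sketch under stated assumptions: `RuleResult`, `conclusion` and `render` are hypothetical names, "success" as the conclusion when no error-severity rule fails is assumed, and rows are sorted by rule name so that the same rows always render to the same text.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RuleResult joins a rule with its stored validation outcome.
type RuleResult struct {
	Name       string
	Severity   string // "error" or "warning"
	Passed     bool
	Violations []string
}

// conclusion returns "failure" only when an error-severity rule failed;
// warning-severity failures do not affect it.
func conclusion(results []RuleResult) string {
	for _, r := range results {
		if !r.Passed && r.Severity == "error" {
			return "failure"
		}
	}
	return "success" // assumed default when nothing blocks the check
}

// render produces the validation section of the check body.
// Sorting a copy makes the output independent of row order.
func render(results []RuleResult) string {
	sorted := append([]RuleResult(nil), results...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	var b strings.Builder
	for _, r := range sorted {
		if r.Passed {
			fmt.Fprintf(&b, "✅ %s\n", r.Name)
			continue
		}
		fmt.Fprintf(&b, "❌ %s (%s)\n", r.Name, r.Severity)
		for _, v := range r.Violations {
			fmt.Fprintf(&b, "  - %s\n", v)
		}
	}
	return b.String()
}

func main() {
	results := []RuleResult{
		{Name: "require-resource-limits", Severity: "error", Violations: []string{`Container "web" missing resource limits`}},
		{Name: "require-labels", Severity: "warning", Passed: true},
	}
	fmt.Print(render(results))
	fmt.Println("conclusion:", conclusion(results))
}
```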

API Endpoints

Policy CRUD: planValidation is a new rule type in the existing policy API:

{
  "name": "Security Policy",
  "rules": {
    "planValidation": [
      {
        "name": "require-resource-limits",
        "rego": "package ...\nimport rego.v1\n...",
        "severity": "error"
      }
    ]
  }
}

Validation results read endpoint:

GET /v1/workspaces/{workspaceId}/deployments/{deploymentId}/plan/{planId}/targets/{targetId}/validations

Files Changed

| File | Description |
| --- | --- |
| packages/db/src/schema/policy.ts | policyRulePlanValidationOpa table + relations |
| packages/db/src/schema/deployment-plan.ts | deploymentPlanTargetResultValidation table + relations |
| apps/workspace-engine/pkg/planvalidation/evaluate.go | OPA/Rego v1 evaluator (already exists) |
| apps/workspace-engine/svc/controllers/planvalidation/ | New controller: leases work items, runs OPA, upserts results |
| apps/workspace-engine/svc/controllers/deploymentplanresult/controller.go | Enqueue planvalidation work item on plan completion |
| apps/workspace-engine/svc/controllers/deploymentplanresult/github_check.go | Validation sections in GH check body |
| apps/api/src/routes/v1/workspaces/policies.ts | CRUD for plan validation rules |
| apps/api/src/routes/v1/workspaces/deployments.ts | Validation results read endpoint |
| apps/api/openapi/paths/deployments.jsonnet + schemas/ | API surface |

Future Work

  • Webhook-based validation — add policy_rule_plan_validation_webhook table with url, secret (HMAC signing), severity, timeoutSeconds. Validator dispatches to HTTP POST endpoint alongside OPA evaluation.
  • Per-document mode — split multi-document YAML and evaluate each document independently (like Conftest --combine vs default)
  • Terraform provider: ctrlplane_policy_rule_plan_validation resource for managing rules as IaC
  • UI — policy rule editor with Rego syntax highlighting and a validation results viewer on deployment plan pages
  • Database migration — the Drizzle schema is defined but no migration has been generated yet
