Orbyx AI SPM - AI Security Posture Management

AI security posture management (AI-SPM) is a comprehensive approach to maintaining the security and integrity of artificial intelligence (AI) and machine learning (ML) systems. It involves continuous monitoring, assessment, and improvement of the security posture of AI models, data, and infrastructure. AI-SPM includes identifying and addressing vulnerabilities, misconfigurations, and potential risks associated with AI adoption, as well as ensuring compliance with relevant privacy and security regulations.

This opensource project dedicated to implementing Enterprise level AI-SPM. By doing so organizations can proactively protect their AI systems from threats, minimize data exposure, and maintain the trustworthiness of their AI applications (agents, mpc servers, models and more). Your organization is putting everything it’s got into AI applications—are you prepared to secure them?
Before you answer, think about these specific questions:
Can you identify all the shadow AI (including AI models, agents and associated resources) that's in your environment?
Are you effectively securing AI data to prevent data poisoning, bias and compliance breaches?
Do you know how to prioritize critical AI risks with context?
Are you confident that you can detect and respond quickly to suspicious activity in AI pipelines?
If you answered “not sure,” or “no” to even one of those questions, then you should take a closer look in to this project. It’s the way to see the current state of your AI ecosystem security.

Discover your AI models , agents, and associated resources security. Identify risks across AI application supply chains/piplines and agents - that can lead to data exfiltration and misuse of resources. Implement proper governance controls around AI usage.

OrbiX AI SPM

ℹ️ Project Information

👤 Author: Dany Shapiro
- https://www.linkedin.com/in/danyshapiro/
📦 Version: 1.0.0
📄 License: Apache-2.0
📂 Repository: https://github.com/dshapi/AI-SPM

Quick how to deploy 101

Get Orbyx AI SPM running locally in a few simple steps. Prerequisites:

Mac OS:

brew install mkcert istioctl
mkcert -install

Ubuntu / Debian:

sudo apt-get update
sudo apt-get install -y libnss3-tools  # mkcert needs this to trust the CA in browsers (SSL suport)

Fedora / RHEL

sudo dnf install -y nss-tools

All Linux

curl -fsSLo /tmp/mkcert "https://github.com/FiloSottile/mkcert/releases/latest/download/mkcert-v1.4.4-linux-amd64"
sudo install -m 0755 /tmp/mkcert /usr/local/bin/mkcert

curl -fsSL https://istio.io/downloadIstio | sh -
sudo install -m 0755 istio-*/bin/istioctl /usr/local/bin/istioctl

mkcert -install

If you're on arm64 Linux, swap linux-amd64 → linux-arm64 in the mkcert URL.

Step 1 — Install K8S (Kind) cluster

clone the repo.

Bring-up (clean cluster)

Run from /Users/danyshapiro/PycharmProjects/AISPM. Each step is idempotent.

export KUBECONFIG=$HOME/.kube/kind-aispm.yaml

./deploy/scripts/kind-cluster.sh init           # cluster + registry + metrics-server
./deploy/scripts/kind-storage.sh up             # MinIO + flink bucket
./deploy/scripts/kind-databases-ha.sh up        # CNPG + Bitnami Redis Sentinel

# Push AISPM service images to the local registry the kind nodes pull from:
docker compose build
docker images --format '{{.Repository}}' | grep '^aispm-' | sort -u | while read img; do
  docker tag "${img}:latest" "localhost:5001/${img}:latest"
  docker push "localhost:5001/${img}:latest"
done

# Alias for chart templates that hardcode `local-path`:
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
EOF

SKIP_FALCO=1 SKIP_KYVERNO=1 \
  VALUES_EXTRA=deploy/helm/aispm/values.dev-multinode.yaml \
  ./deploy/scripts/bootstrap-cluster.sh

End-to-end on a fresh machine: about 20 minutes. Subsequent runs that only re-deploy the AISPM chart take about 5 minutes.

Once the bootstrap completes, navigate to:

What	URL
Chat UI	https://aispm.local:30443
Admin Portal	https://aispm.local:30443/admin/

Click Sign In on either page — a demo JWT is minted automatically, no account needed.

That's it! You're up and running.

Platform at a Glance


Microservices	16
OPA Policies	6
Kafka Topics	12+
Admin User Interface	1 ( Admin portal )
Supported Models	Anthropic / OpenAI-compatible endpoint / 3rd party model imprort
Compliance Framework	NIST AI RMF (GOVERN / MAP / MEASURE / MANAGE)

Admin Portal - Overview

A Real-time AI security posture across every agent, model, and data source — inventory, runtime, policies, and threat response unified.

Admin Portal - Dashboard

An AI Security Posture Management control plane providing real-time visibility, risk detection, and policy enforcement across agents, models, and context flows.

Admin Portal - Inventory

Check out the Demo

Security & Access Control

Authentication & Authorization

Feature	Description	Component
RS256 JWT Auth	Every API request validated against platform-generated RSA key pair. Tokens are short-lived and audience-scoped.	CPM API
Role-Based Access	Roles (`spm:admin`, `spm:auditor`, `user`) enforced on all SPM endpoints via OPA policy evaluation per request.	OPA / CPM API
Dev Token Endpoint	`/dev-token` generates 24-hour demo JWTs signed by the platform's own private key — no external IdP needed for development.	CPM API
Per-User Rate Limiting	Sliding window in Redis: 60 req/min with burst allowance of 10. Returns `429` with retry headers.	CPM API / Redis
Tenant Isolation	All events, topics, and audit records are scoped by `tenant_id`. Multi-tenant from day one.	All services

Prompt Security

Feature	Description	Component
Guard Model Screening	Every prompt passes through Llama Guard 3 (8B) before reaching the LLM. Blocks harmful content with category labels.	Guard Model
Prompt Injection Detection	Memory service scans writes for injection patterns: `ignore previous instructions`, `act as if`, `override instructions` etc.	Memory Service
OPA Prompt Policy	Rego policy evaluates posture score, intent drift, guard verdict, and auth context. Decisions: `allow` / `escalate` / `block`.	OPA
Posture-Based Blocking	Requests with risk score ≥ 0.70 are auto-blocked. 0.30–0.70 escalated. Below 0.30 allowed.	Policy Decider
Intent Drift Detection	Jaccard similarity tracks deviation from session baseline. High drift triggers escalation.	Flink CEP

Output Security

Feature	Description	Component
Secret Scanning	Regex detects API keys (`sk-`, `ghp_`, `AKIA*`), Bearer tokens, passwords in LLM responses.	CPM API
PII Detection	Detects email addresses, US SSNs, and phone numbers in responses. Triggers redaction or block via OPA output policy.	CPM API
Output Redaction	Matched secrets and PII replaced with `[REDACTED-SECRET]` / `[REDACTED-PII]` before reaching the user.	CPM API
OPA Output Policy	Second-pass policy evaluation on LLM output. Considers `contains_secret`, `contains_pii`, and LLM verdict.	OPA
Output Guard LLM	Optional second-pass LLM semantic scan for subtle policy violations not caught by regex.	Output Guard

LLM Integration & Gateway

Model Management

Feature	Description	Component
Model Registry	Full lifecycle: register → approve → freeze → retire. Tracked with provider, version, risk tier, and approver.	SPM API / DB
Model Gate	CPM API checks SPM approval status before every LLM call. Unapproved models return `403`. Fail-closed by design.	CPM API / OPA
Risk Tier Classification	Models classified as `low` / `medium` / `high` risk. Influences OPA policy thresholds and compliance evidence requirements.	SPM API
Multi-Model Support	Swap between Claude Haiku, Sonnet, Opus via `ANTHROPIC_MODEL` env var. Architecture supports any OpenAI-compatible endpoint.	CPM API
Model Freeze	Freeze controller suspends a model from serving traffic in real time via Kafka `freeze_control` topic.	Freeze Controller

Agent Posture Drift

Feature	Description	Component
Baseline vs Current Comparison	Compares approved agent posture snapshots with current runtime state to detect model, tool, identity, runtime, RAG, and guardrail drift.	`platform_shared.posture_drift`
Risk-Based Reapproval	Classifies drift as low, medium, high, or critical and recommends accept, re-review, rollback, or disable actions.	SPM API / Agent Control Plane
Evidence Hashing	Produces stable evidence hashes for audit records and compliance reporting.	Audit / Compliance

Agentic Tools

Feature	Description	Component
Web Search	Claude autonomously searches the web via Tavily API when prompted about current events or real-time data.	CPM API / Tavily
Web Fetch	Claude fetches and reads any URL provided by the user. HTML cleaned with BeautifulSoup before injection into context.	CPM API
Tool Authorization	OPA `tool_policy.rego` evaluates every tool call against posture score, intent, and auth context before execution.	OPA / Executor
Tool Execution Pipeline	Tool requests flow: `tool_request` → OPA auth → Executor → `tool_result`. Side-effect tools require approval.	Executor / Agent
Approval Workflow	Write/send/delete tools emit to `approval_request` topic and await `approval_result` before executing.	Executor

Conversation Memory

Feature	Description	Component
Cross-Session Memory	Conversation history stored in Redis with 30-day TTL. Claude receives last 20 turns as context on every request.	CPM API / Redis
Integrity Verification	Every memory write generates a SHA-256 hash. Reads verify the hash — `integrity_ok=False` triggers a security alert.	Memory Service
Namespace Scoping	Three namespaces: `session` (1h TTL), `longterm` (30d TTL), `system` (24h TTL). OPA policy controls access per namespace.	Memory Service
Injection Protection	Memory writes scanned for prompt injection patterns before storage. Malicious writes are rejected and audited.	Memory Service
Soft Delete	Memory deletes create tombstones rather than hard deleting. Audit trail preserved for forensics.	Memory Service

Observability & Compliance

Prometheus Metrics

Metric	Description
`spm_model_risk_score`	Per-model gauge updated on every posture event. Labels: `model_id`, `tenant_id`.
`spm_enforcement_actions_total`	Counter tracking `block` / `escalate` / `allow` decisions. Labels: `action`, `tenant_id`.
`spm_snapshot_lag_seconds`	Seconds since last posture snapshot write. Updated every 15s by background thread.
`spm_compliance_coverage_pct`	NIST AI RMF coverage % per function. Labels: `function` (GOVERN, MAP, MEASURE, MANAGE, OVERALL).

Grafana Dashboards

Engineering Dashboard

Model Risk Score over time (time-series)
Enforcement Actions total (stat)
Snapshot Lag (gauge with thresholds)
Model Lifecycle Status (table — name, version, status, risk tier, approver)
Web Tool Calls — every search/fetch with user, session, exact query (table)
Tool Type Breakdown — Search vs Fetch split (donut chart)
Blocked Requests — guard blocks, output blocks, model gate blocks with reason (table)

Compliance Dashboard

NIST AI RMF Coverage per function (gauge panels)
Overall Coverage % (stat)
Compliance Gap Table (table — control, status, evidence)

Audit & Compliance

Feature	Description	Component
Tamper-Evident Audit Log	All events written to Kafka audit topic and mirrored to `audit_export` table in PostgreSQL. `ON CONFLICT DO NOTHING` ensures idempotency.	SPM Aggregator / DB
NIST AI RMF Alignment	Compliance evidence mapped to GOVERN, MAP, MEASURE, MANAGE functions. Coverage % computed per function.	SPM API / DB
MITRE ATLAS TTP Mapping	CEP maps behavioural patterns to ATLAS TTPs (e.g. `AML.T0048`, `AML.T0051.000`). Attached to security alerts.	Flink CEP
Compliance Evidence	Attach evaluation results, test reports, and approval notes to each model as structured evidence records.	SPM API
Startup Audit Record	Platform startup writes an audit record per tenant. Baseline timestamp for forensic investigation.	Startup Orchestrator

Behavioural Analytics

Feature	Description	Component
Burst Detection	Tracks request volume in a 2-minute window. >5 events triggers burst alert with ATLAS TTP code.	Flink CEP
Sustained Volume Detection	1-hour rolling window detects sustained high-volume usage (>15 events).	Flink CEP
Critical Combo Detection	Specific signal combinations (e.g. exfiltration + high posture + PII) trigger immediate critical escalation.	Flink CEP
Session Signal Accumulation	Signals accumulate across a session. Repeated suspicious signals compound the risk score.	Flink CEP
Posture Snapshot History	Risk scores snapshotted every 5 minutes per model per tenant. Rolling average over configurable N snapshots.	SPM Aggregator

Infrastructure & Event Pipeline

Kafka Event Bus

Topic	Publisher	Consumer
`{tenant}.raw`	CPM API	Processor
`{tenant}.posture_enriched`	Processor	Policy Decider, Flink CEP, SPM Aggregator
`{tenant}.decision`	Policy Decider	Agent
`{tenant}.tool_request`	Agent / Tool Parser	Executor
`{tenant}.tool_result`	Executor	Agent
`{tenant}.audit`	All services	SPM Aggregator → `audit_export`
`{tenant}.memory_request`	Agent	Memory Service
`{tenant}.memory_result`	Memory Service	Agent
`{tenant}.approval_request`	Executor	(human reviewer)
`{tenant}.freeze_control`	Freeze Controller	All consumers

Platform Services

Service	Role
Startup Orchestrator	Validates OPA policies, waits for Kafka, creates topics, registers models, smoke-tests all policies on boot.
Processor	Enriches raw events with posture scoring, intent analysis, CEP signals. Publishes `PostureEnrichedEvent`.
Policy Decider	Evaluates OPA prompt policy on enriched events. Publishes `DecisionEvent`.
Agent Orchestrator	Plans tool execution and memory access based on OPA intent manifest.
Executor	Runs authorised tools. Implements tool registry with approval flow for side-effect operations.
Tool Parser	Extracts and validates structured tool calls from LLM output before forwarding to executor.
Memory Service	Scoped key-value store in Redis with integrity hashing, injection protection, and soft delete.
Output Guard	Optional second-pass LLM semantic scan of responses for subtle policy violations.
Retrieval Gateway	RAG-ready retrieval service. Scores document chunks for trust before injecting into LLM context.
Freeze Controller	Real-time model suspension via Kafka. Freeze propagates to all consumers within milliseconds.
Policy Simulator	Dry-run any policy change before deployment. Returns allow/block/escalate without touching live traffic.
SPM Aggregator	Consumes posture and audit events, writes to PostgreSQL, updates Prometheus metrics.
SPM API	REST API for model registry, compliance evidence, approval workflow, and audit export.
Guard Model	Llama Guard 3 (8B) inference service. Screens every prompt for harmful content categories.

UI & Developer Experience

Feature	Description
Orbyx Admin Portal	An AI Security Posture Management control plane providing real-time visibility, risk detection, and policy enforcement across agents, models, and context flows..
Orbyx Chat UI	React + Vite chat interface with landing state, simulated streaming, model selector, and New Chat button.
Tool Use Badges	Web search and fetch tool calls rendered as blue pill badges above the response text.
Security Footer	Persistent footer: "All messages are screened by the Orbyx security layer" — visible on every message.
Mock Fallback	UI falls back to mock responses when API is unreachable. Graceful degradation for demos.
Cross-Session Memory UI	Claude remembers previous conversations across sessions — no user action required.
Model Selector	Switch between Claude Haiku / Sonnet / Opus from the chat header or landing page.

#TODO: add more screenshorts from the admin portal

Roadmap

Features not yet implemented — candidates for the next sprint:

Orbyx AI SPM v3.0 · April 2026

Installation

Prerequisites

Tool	Minimum version	Notes
Docker	24+	The kind cluster runs as Docker containers; any Docker daemon works (Docker Desktop, OrbStack, Colima, native dockerd on Linux).
Docker Compose	v2.20+	Used for `docker compose build` to produce service images that get pushed to the kind-side registry.
Git	any	To clone the repo
Make	any	`brew install make` (macOS) / `apt install make` (Linux)
4 GB free RAM	—	Kafka + all services
2 GB free disk	—	Images + volumes

All images are published for linux/arm64. The compose file already sets the correct platform tags.

Clone & Configure

git clone https://github.com/your-org/orbyx-aispm.git
cd orbyx-aispm

Copy the example environment file:

cp .env.example .env

Do not commit your .env file — it is already in .gitignore.

API Keys

Open .env in any editor and fill in the two required secrets:

Anthropic (required for Claude responses)

ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxx
ANTHROPIC_MODEL=claude-sonnet-4-6      # or claude-haiku-4-5-20251001 / claude-opus-4-6

Get a key at console.anthropic.com.

Tavily (required for web search tool)

TAVILY_API_KEY=tvly-xxxxxxxxxxxx

Get a free key at app.tavily.com. Without this key, the web search tool will silently skip search calls and Claude will answer from its training data only.

Groq (required for Threat Hunting Agent + optional guard acceleration)

GROQ_API_KEY=gsk_xxxxxxxxxxxx
HUNT_MODEL=llama-3.3-70b-versatile   # model used by the threat-hunting agent

Get a free key at console.groq.com.

Groq is used in two places:

Threat Hunting Agent — GROQ_API_KEY is required. Without it the threat-hunting-agent service will refuse to start.
Guard Model (Llama Guard) — optional. Without a key the guard falls back to a built-in regex classifier (still functional, just less accurate).

First Boot

make up

This single command will:

Build all Docker images from source
Start the full infrastructure stack (Kafka, Redis, PostgreSQL, OPA, Prometheus, Grafana)
Run the startup orchestrator, which automatically:
- Generates RSA key-pair into ./keys/ (used for JWT signing)
- Creates Kafka topics and ACLs per tenant
- Seeds OPA with the default policy bundle
- Registers the default AI model in the SPM registry
Start all platform services (API, Guard Model, CEP, SPM, UI, etc.)

The orchestrator exits when provisioning is complete. Expect the first build to take 3–5 minutes depending on your internet speed. Subsequent starts are near-instant.

You'll see this when it's ready:

✓ Platform started.
  Admin chat:        http://localhost:3001/
  Admin:             http://localhost:3001/admin
  API:               http://localhost:8080
  Guard Model:       http://localhost:8200
  Freeze Controller: http://localhost:8090
  Policy Simulator:  http://localhost:8091
  OPA:               http://localhost:8181

Verify the Platform

Check that all services are healthy:

make status

Expected output shows all containers as Up or healthy. The API and Guard Model health endpoints will return JSON {"status": "ok"}.

Alternatively:

docker compose ps

Access the UI & Dashboards

Service	URL	Credentials
Orbyx Chat UI	http://localhost:3000	Auto-login via JWT (click "Sign In")
Grafana	http://localhost:3001	`admin` / `admin` (change on first login)
Prometheus	http://localhost:9090	No auth
OPA	http://localhost:8181	No auth
SPM API	http://localhost:8092	JWT Bearer token required
Policy Simulator	http://localhost:8091	JWT Bearer token required

Grafana Dashboards

Three dashboards are pre-provisioned and load automatically:

AI SPM Overview — posture scores, enforcement actions, risk trends
Engineering — tool calls, blocked requests, model performance, CEP events
Compliance — NIST AI RMF control coverage, audit trail

Run the Smoke Test

Send a real request through the full pipeline and verify end-to-end:

make smoke-test

This will:

Mint a demo JWT
Send "What meetings do I have today?" → expects a Claude response
Send a prompt injection attempt → expects HTTP 400 (blocked)

A passing run ends with:

✓ Smoke test PASSED

Stopping & Cleaning Up

Stop all services (keeps data):

docker compose down
# with auth overlay:
docker compose -f compose.yml -f compose.auth.yml down

Start local Docker Compose stack:

docker compose up -d

Bootstrap or re-deploy the full K8s/Helm stack:

bash deploy/scripts/bootstrap-cluster.sh

Stop and wipe all data (volumes, generated keys):

make clean

⚠️ make clean deletes the RSA keys in ./keys/ and the Keycloak realm volume (keycloak-data). New keys are auto-generated on next boot (invalidates existing JWTs). You will also need to redo the first-time Keycloak setup.

Troubleshooting

Services fail to start / `make up` exits early

Check the orchestrator logs:

docker compose logs startup-orchestrator

Common causes: Kafka not ready in time. Re-run make up — it is idempotent.

`cpm-startup-orchestrator` not found

Use the service name (not the container name) with docker compose:

docker compose restart startup-orchestrator   # ✓ correct
docker compose restart cpm-startup-orchestrator  # ✗ wrong

Chat UI shows `[object Object]` error

This indicates a model gate rejection. Check that LLM_MODEL_ID in .env is blank:

LLM_MODEL_ID=

Then restart the API:

docker compose up -d --build api

`404 model not found` from Anthropic

The model name in your .env is outdated. Update to a current model:

ANTHROPIC_MODEL=claude-sonnet-4-6

Current valid model IDs:

Label	Model ID
Claude Haiku	`claude-haiku-4-5-20251001`
Claude Sonnet	`claude-sonnet-4-6`
Claude Opus	`claude-opus-4-6`

Grafana panels show "No data"

Panels populate after the first real request is processed. Run make smoke-test to generate events, then refresh the dashboard.

Port conflict

If any port (3000, 3001, 8080, etc.) is already in use, edit compose.yml and change the host-side port mapping:

ports:
  - "3100:3000"  # change 3000 → 3100 (host:container)

Threat hunting collectors report `column "session_id" does not exist`

The audit_export table is missing the session_id column added in migration 002. Run the migration once while the stack is up:

# Option A — via Alembic
docker compose exec spm-api alembic upgrade head

# Option B — direct SQL
docker compose exec spm-db psql -U spm_rw -d spm -c "
  ALTER TABLE audit_export ADD COLUMN IF NOT EXISTS session_id VARCHAR(64);
  CREATE INDEX IF NOT EXISTS idx_audit_export_session_id ON audit_export (session_id);
"

`threat-hunting-agent` fails to start — `GROQ_API_KEY` missing

The threat-hunting-agent requires a Groq API key. Set it in .env:

GROQ_API_KEY=gsk_xxxxxxxxxxxx

Get a free key at console.groq.com, then rebuild the service:

docker compose up -d --build threat-hunting-agent

Rebuilding a single service after code changes

docker compose up -d --build api                  # rebuild API only
docker compose up -d --build ui                   # rebuild UI only
docker compose up -d --build spm-aggregator       # rebuild SPM aggregator
docker compose up -d --build threat-hunting-agent # rebuild threat hunting agent

Environment Reference

The following variables can be tuned in .env. All have sane defaults and only the API keys need to be set for a working installation.

Variable	Default	Description
`ANTHROPIC_API_KEY`	(required)	Anthropic API key
`ANTHROPIC_MODEL`	`claude-sonnet-4-6`	Claude model to use
`TAVILY_API_KEY`	(optional)	Tavily key for web search tool
`GROQ_API_KEY`	(required for threat hunting)	Groq key — powers the Threat Hunting Agent LLM and optionally accelerates Llama Guard 3
`HUNT_MODEL`	`llama-3.3-70b-versatile`	Groq model used by the threat-hunting agent
`HUNT_BATCH_WINDOW_SEC`	`30`	Kafka batch window for the threat-hunting agent (seconds)
`THREATHUNTING_AI_INTERVAL_SEC`	`300`	Proactive threat scan interval (seconds)
`TENANTS`	`t1`	Comma-separated tenant IDs
`RATE_LIMIT_RPM`	`60`	Max requests per minute per user
`GUARD_MODEL_ENABLED`	`true`	Enable/disable content guard
`POSTURE_BLOCK_THRESHOLD`	`0.70`	Risk score at which requests are blocked
`CEP_SHORT_WINDOW_SEC`	`120`	Burst detection window (seconds)
`CEP_LONG_WINDOW_SEC`	`3600`	Sustained volume window (seconds)
`MEMORY_LONGTERM_TTL_SEC`	`2592000`	Cross-session memory TTL (30 days)
`SPM_SNAPSHOT_INTERVAL_SEC`	`300`	Posture snapshot interval (5 min)
`GRAFANA_ADMIN_PASSWORD`	`admin`	Grafana admin password
`REDIS_PASSWORD`	(blank)	Redis password (blank = no auth)
`SPM_DB_PASSWORD`	`spmpass`	PostgreSQL password for SPM DB
`LLM_MODEL_ID`	(blank)	SPM model registry ID (leave blank to bypass gate)

Quick-Reference Commands

make up              # Start everything
make down            # Stop everything
make status          # Health check
make logs            # Tail all logs
make logs-api        # Tail API logs only
make smoke-test      # End-to-end test
make token           # Mint a demo user JWT
make admin-token     # Mint an admin JWT
make freeze          # Freeze demo user (requires admin token)
make unfreeze        # Unfreeze demo user
make clean           # Wipe all data and keys

Usage

Prerequisites

Tool	Minimum version	Notes
Docker	24+	The kind cluster runs as Docker containers; any Docker daemon works (Docker Desktop, OrbStack, Colima, native dockerd on Linux).
Docker Compose	v2.20+	Used for `docker compose build` to produce service images that get pushed to the kind-side registry.
Git	any	To clone the repo
Make	any	`brew install make` (macOS) / `apt install make` (Linux)
4 GB free RAM	—	Kafka + all services
2 GB free disk	—	Images + volumes

Apple Silicon (M1/M2/M3): All images are published for linux/arm64. The compose file already sets the correct platform tags.

Clone & Configure

git clone https://github.com/your-org/orbyx-aispm.git
cd orbyx-aispm

Copy the example environment file:

cp .env.example .env

Do not commit your .env file — it is already in .gitignore.

API Keys

Open .env in any editor and fill in the two required secrets:

Anthropic (required for Claude responses)

ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxx
ANTHROPIC_MODEL=claude-sonnet-4-6      # or claude-haiku-4-5-20251001 / claude-opus-4-6

Get a key at console.anthropic.com.

Tavily (required for web search tool)

TAVILY_API_KEY=tvly-xxxxxxxxxxxx

Get a free key at app.tavily.com. Without this key, the web search tool will silently skip search calls and Claude will answer from its training data only.

Groq (required for Threat Hunting Agent + optional guard acceleration)

GROQ_API_KEY=gsk_xxxxxxxxxxxx
HUNT_MODEL=llama-3.3-70b-versatile   # model used by the threat-hunting agent

Get a free key at console.groq.com.

Groq is used in two places:

Threat Hunting Agent — GROQ_API_KEY is required. Without it the threat-hunting-agent service will refuse to start.
Guard Model (Llama Guard) — optional. Without a key the guard falls back to a built-in regex classifier (still functional, just less accurate).

First Boot

make up

This single command will:

Build all Docker images from source
Start the full infrastructure stack (Kafka, Redis, PostgreSQL, OPA, Prometheus, Grafana)
Run the startup orchestrator, which automatically:
- Generates RSA key-pair into ./keys/ (used for JWT signing)
- Creates Kafka topics and ACLs per tenant
- Seeds OPA with the default policy bundle
- Registers the default AI model in the SPM registry
Start all platform services (API, Guard Model, CEP, SPM, UI, etc.)

The orchestrator exits when provisioning is complete. Expect the first build to take 3–5 minutes depending on your internet speed. Subsequent starts are near-instant.

You'll see this when it's ready:

✓ Platform started.
  Chat:              http://localhost:3001 
  Admin portal:      http://localhost:3001/admin
  API:               http://localhost:8080
  Guard Model:       http://localhost:8200
  Freeze Controller: http://localhost:8090
  Policy Simulator:  http://localhost:8091
  OPA:               http://localhost:8181

Verify the Platform

Check that all services are healthy:

make status

Expected output shows all containers as Up or healthy. The API and Guard Model health endpoints will return JSON {"status": "ok"}.

Alternatively:

docker compose ps

Access the UI & Dashboards

Service	URL	Credentials
Orbyx Chat UI	http://localhost:3000	Auto-login via JWT (click "Sign In")
Grafana	http://localhost:3001	`admin` / `admin` (change on first login)
Prometheus	http://localhost:9090	No auth
OPA	http://localhost:8181	No auth
SPM API	http://localhost:8092	JWT Bearer token required
Policy Simulator	http://localhost:8091	JWT Bearer token required

Grafana Dashboards

Three dashboards are pre-provisioned and load automatically:

AI SPM Overview — posture scores, enforcement actions, risk trends
Engineering — tool calls, blocked requests, model performance, CEP events
Compliance — NIST AI RMF control coverage, audit trail

Run the Smoke Test

Send a real request through the full pipeline and verify end-to-end:

make smoke-test

This will:

Mint a demo JWT
Send "What meetings do I have today?" → expects a Claude response
Send a prompt injection attempt → expects HTTP 400 (blocked)

A passing run ends with:

✓ Smoke test PASSED

Stopping & Cleaning Up

Stop all services (keeps data):

docker compose down
# with auth overlay:
docker compose -f compose.yml -f compose.auth.yml down

Start local Docker Compose stack:

docker compose up -d

Bootstrap or re-deploy the full K8s/Helm stack:

bash deploy/scripts/bootstrap-cluster.sh

Stop and wipe all data (volumes, generated keys):

make clean

⚠️ make clean deletes the RSA keys in ./keys/ and the Keycloak realm volume (keycloak-data). New keys are auto-generated on next boot (invalidates existing JWTs). You will also need to redo the first-time Keycloak setup.

Troubleshooting

Services fail to start / `make up` exits early

Check the orchestrator logs:

docker compose logs startup-orchestrator

Common causes: Kafka not ready in time. Re-run make up — it is idempotent.

`cpm-startup-orchestrator` not found

Use the service name (not the container name) with docker compose:

docker compose restart startup-orchestrator   # ✓ correct
docker compose restart cpm-startup-orchestrator  # ✗ wrong

Chat UI shows `[object Object]` error

This indicates a model gate rejection. Check that LLM_MODEL_ID in .env is blank:

LLM_MODEL_ID=

Then restart the API:

docker compose up -d --build api

`404 model not found` from Anthropic

The model name in your .env is outdated. Update to a current model:

ANTHROPIC_MODEL=claude-sonnet-4-6

Current valid model IDs:

Label	Model ID
Claude Haiku	`claude-haiku-4-5-20251001`
Claude Sonnet	`claude-sonnet-4-6`
Claude Opus	`claude-opus-4-6`

Grafana panels show "No data"

Panels populate after the first real request is processed. Run make smoke-test to generate events, then refresh the dashboard.

Port conflict

If any port (3000, 3001, 8080, etc.) is already in use, edit compose.yml and change the host-side port mapping:

ports:
  - "3100:3000"  # change 3000 → 3100 (host:container)

Threat hunting collectors report `column "session_id" does not exist`

The audit_export table is missing the session_id column added in migration 002. Run the migration once while the stack is up:

# Option A — via Alembic
docker compose exec spm-api alembic upgrade head

# Option B — direct SQL
docker compose exec spm-db psql -U spm_rw -d spm -c "
  ALTER TABLE audit_export ADD COLUMN IF NOT EXISTS session_id VARCHAR(64);
  CREATE INDEX IF NOT EXISTS idx_audit_export_session_id ON audit_export (session_id);
"

`threat-hunting-agent` fails to start — `GROQ_API_KEY` missing

The threat-hunting-agent requires a Groq API key. Set it in .env:

GROQ_API_KEY=gsk_xxxxxxxxxxxx

Get a free key at console.groq.com, then rebuild the service:

docker compose up -d --build threat-hunting-agent

Rebuilding a single service after code changes

docker compose up -d --build api                  # rebuild API only
docker compose up -d --build ui                   # rebuild UI only
docker compose up -d --build spm-aggregator       # rebuild SPM aggregator
docker compose up -d --build threat-hunting-agent # rebuild threat hunting agent

Environment Reference

The following variables can be tuned in .env. All have sane defaults and only the API keys need to be set for a working installation.

Variable	Default	Description
`ANTHROPIC_API_KEY`	(required)	Anthropic API key
`ANTHROPIC_MODEL`	`claude-sonnet-4-6`	Claude model to use
`TAVILY_API_KEY`	(optional)	Tavily key for web search tool
`GROQ_API_KEY`	(required for threat hunting)	Groq key — powers the Threat Hunting Agent LLM and optionally accelerates Llama Guard 3
`HUNT_MODEL`	`llama-3.3-70b-versatile`	Groq model used by the threat-hunting agent
`HUNT_BATCH_WINDOW_SEC`	`30`	Kafka batch window for the threat-hunting agent (seconds)
`THREATHUNTING_AI_INTERVAL_SEC`	`300`	Proactive threat scan interval (seconds)
`TENANTS`	`t1`	Comma-separated tenant IDs
`RATE_LIMIT_RPM`	`60`	Max requests per minute per user
`GUARD_MODEL_ENABLED`	`true`	Enable/disable content guard
`POSTURE_BLOCK_THRESHOLD`	`0.70`	Risk score at which requests are blocked
`CEP_SHORT_WINDOW_SEC`	`120`	Burst detection window (seconds)
`CEP_LONG_WINDOW_SEC`	`3600`	Sustained volume window (seconds)
`MEMORY_LONGTERM_TTL_SEC`	`2592000`	Cross-session memory TTL (30 days)
`SPM_SNAPSHOT_INTERVAL_SEC`	`300`	Posture snapshot interval (5 min)
`GRAFANA_ADMIN_PASSWORD`	`admin`	Grafana admin password
`REDIS_PASSWORD`	(blank)	Redis password (blank = no auth)
`SPM_DB_PASSWORD`	`spmpass`	PostgreSQL password for SPM DB
`LLM_MODEL_ID`	(blank)	SPM model registry ID (leave blank to bypass gate)

Quick-Reference Commands

make up              # Start everything
make down            # Stop everything
make status          # Health check
make logs            # Tail all logs
make logs-api        # Tail API logs only
make smoke-test      # End-to-end test
make token           # Mint a demo user JWT
make admin-token     # Mint an admin JWT
make freeze          # Freeze demo user (requires admin token)
make unfreeze        # Unfreeze demo user
make clean           # Wipe all data and keys

Chat Interface

Open http://localhost:3000 in your browser.

Click Sign In — a demo JWT is minted automatically.
Type a message in the input box and press Enter or click Send.
Claude will respond. If a web search or web fetch was used, you'll see a badge above the reply:

🔍 Searched: "latest AI news" 🌐 Fetched: https://example.com
Use the model selector (top-right) to switch between Haiku, Sonnet, and Opus.

Conversation Memory

Claude remembers your previous messages across sessions for 30 days. You can refer back to earlier conversations naturally — no need to repeat context.

Blocked Requests

Some prompts are automatically blocked by the platform:

Block type	Example trigger	HTTP code
Prompt injection	"Ignore previous instructions…"	`400`
High posture score	Repeated suspicious patterns	`400`
Model gate	Unapproved model ID in request	`403`
Output guard	Sensitive data in LLM response	`400`

When a request is blocked the UI shows a red error message explaining why.

Admin Actions

Mint tokens and manage users from the terminal:

# Mint a regular user token
make token

# Mint an admin token
make admin-token

# Freeze a user (blocks all their requests)
make freeze

# Unfreeze a user
make unfreeze

Grafana Dashboards

Open http://localhost:3001 → login with admin / admin.

Dashboard	What to look at
AI SPM Overview	Real-time posture scores, enforcement actions, risk trends per tenant
Engineering	Tool call counts, blocked requests with reasons, CEP events, model latency
Compliance	NIST AI RMF control coverage, 30-day audit trail

Dashboards auto-refresh every 30 seconds. Use the time-range picker (top-right) to zoom into a specific window.

SPM API (REST)

Base URL: http://localhost:8092

All endpoints require a Bearer token. Use make admin-token or make spm-token-auditor to get one.

# List registered AI models
TOKEN=$(make admin-token -s)
curl -H "Authorization: Bearer $TOKEN" http://localhost:8092/models

# Register a new model
curl -X POST http://localhost:8092/models \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"my-model","version":"1.0","provider":"openai","risk_tier":"limited"}'

# NIST AI RMF compliance report
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8092/compliance/nist-airm/report

Policy Simulator

Test policy changes against sample events before rolling them out:

make simulate

Or call the API directly at http://localhost:8091/simulate with a JSON payload of candidate policy + sample events. The response shows which events would be allowed, escalated, or blocked under the new policy.

Logs

make logs              # all services
make logs-api          # API only
make logs-spm-api      # SPM API only
docker compose logs -f guard-model   # any service by name

Common Workflows

Investigate a blocked request

Open Grafana → Engineering dashboard → Blocked Requests table
Note the reason and session_id
Search logs: make logs-api | grep <session_id>

Check a user's posture score

TOKEN=$(make admin-token -s)
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8092/posture?tenant_id=t1&user_id=user-demo-1"

Simulate a compliance report

make spm-compliance

Returns a JSON report mapping NIST AI RMF controls to pass/fail/partial status based on current platform configuration.

Tech Stack

Orbyx AI SPM — Tech Stack

A full reference of every technology, library, and external service used in the platform.

Architecture Overview

Control Plane & Data Path

Infrastructure

Component	Technology	Version	Role
Container runtime	Docker + Docker Compose	Compose v2	Runs all services locally
Message broker	Apache Kafka (Confluent)	7.6.1	Event streaming backbone — audit, posture, CEP events
Cache / memory	Redis	7 (Alpine)	Session memory, long-term conversation history, rate limiting
Database	PostgreSQL	16 (Alpine)	SPM audit log, posture snapshots, model registry
Policy engine	Open Policy Agent (OPA)	0.70.0	Rego-based request policy evaluation
Metrics	Prometheus	v2.55.1	Scrapes all service `/metrics` endpoints
Dashboards	Grafana	11.4.0	Pre-provisioned AI SPM, Engineering, and Compliance dashboards

Backend Services

All backend services are written in Python 3.11 and served with FastAPI + Uvicorn.

Service	Port	Description
`api`	8080	Main gateway — auth, guard, LLM proxy, rate limiting
`guard-model`	8200	Content moderation (Llama Guard 3 via Groq or regex fallback)
`freeze-controller`	8090	Admin freeze/unfreeze of users and tenants
`policy-simulator`	8091	Dry-run policy evaluation against sample events
`spm-api`	8092	Model registry, posture API, compliance reports
`spm-aggregator`	—	Kafka consumer → Postgres writer, Prometheus metrics
`processor`	—	Kafka consumer — enriches raw events with posture scores
`memory-service`	—	Manages session, long-term, and system memory in Redis
`output-guard`	—	Second-pass LLM scan on Claude's responses
`policy-decider`	—	Evaluates OPA decisions and emits enforcement events
`retrieval-gateway`	—	Context retrieval for RAG (tool results, calendar, etc.)
`tool-parser`	—	Parses and validates tool call requests
`executor`	—	Executes approved tool calls
`agent`	—	Orchestrates multi-step agentic workflows
`agent-orchestrator`	8094	Session lifecycle, risk scoring, threat finding storage, case management
`threat-hunting-agent`	—	Autonomous AI threat hunter — 9 proactive scans + LangChain/Groq LLM
`startup-orchestrator`	—	One-shot init container: keys, Kafka topics, OPA seed

Core Python Libraries

Library	Version	Used for
FastAPI	0.115.x	REST API framework
Uvicorn	0.30–0.32	ASGI server
Pydantic	2.9	Request/response validation
anthropic	0.40.0	Claude API client (tool use, streaming)
kafka-python-ng	2.2.3	Kafka producer/consumer
redis	5.1–5.2	Redis client
PyJWT + cryptography	2.9–2.10 / 43.0	RS256 JWT signing and verification
httpx	0.27.2	Async HTTP client (tool fetch, inter-service calls)
requests	2.32	Sync HTTP client
SQLAlchemy (asyncio)	2.0.36	Async ORM for SPM database
asyncpg	0.30.0	Async PostgreSQL driver
psycopg2-binary	2.9.9	Sync PostgreSQL driver
groq	0.11.0	Groq client for Llama Guard 3 inference
tavily-python	0.5.0	Web search tool (Tavily API)
beautifulsoup4 + lxml	4.12.3 / 5.3.0	HTML parsing for web fetch tool
prometheus-client	0.21.1	Exposes `/metrics` endpoint
prometheus-fastapi-instrumentator	7.0.0	Auto-instruments FastAPI with Prometheus
weasyprint	62.3	PDF report generation (compliance exports)

Frontend

Technology	Version	Role
React	18.3	UI framework
Vite	5.4	Build tool and dev server
react-markdown	9.0	Renders Claude's markdown responses
remark-gfm	4.0	GitHub-flavored markdown (tables, strikethrough, etc.)

The UI is a single-page app served by an Nginx container (ui) on port 3000. No external CSS framework — fully custom design with CSS variables for theming.

External APIs

Service	Purpose	Required
Anthropic Claude	LLM backend (Haiku / Sonnet / Opus)	✅ Yes
Tavily	Real-time web search tool for Claude	⚠️ Optional
Groq	Threat Hunting Agent LLM (Llama 3.3 70B) + Llama Guard 3 content moderation	✅ Required (threat hunting) / ⚠️ Optional (guard)

Security & Auth

Component	Technology	Notes
Authentication	RS256 JWT	Key-pair auto-generated at startup into `./keys/`
Authorization	OPA + Rego	Policy-as-code, evaluated per request
Content moderation	Llama Guard 3 (Groq)	Falls back to regex classifier if no Groq key
Output scanning	Second-pass LLM guard	Checks Claude responses for sensitive data leakage
Rate limiting	In-process Redis counter	Configurable RPM per user
Prompt injection detection	Guard model + CEP patterns	Pattern-matched and ML-scored

Observability

Layer	Technology	Details
Metrics	Prometheus	Scraped from all services every 15 s
Dashboards	Grafana	3 pre-provisioned dashboards, auto-loaded via provisioning config
Audit log	PostgreSQL (`audit_export` table)	Every request written as JSONB with full event payload
Structured logs	Python `logging` → stdout	Collected by Docker, viewable via `make logs`
Posture snapshots	PostgreSQL + Prometheus	5-min bucketed risk scores per tenant

Data Flow

User prompt
    │
    ▼
JWT Auth → Rate Limit → Guard Model (content check)
    │
    ▼
OPA Policy Evaluation
    │
    ▼
Memory Load (Redis — last 20 turns, 30-day TTL)
    │
    ▼
Claude API (tool loop — up to 3 rounds)
    │  ├── web_search  →  Tavily
    │  └── web_fetch   →  httpx + BeautifulSoup
    ▼
Output Guard (second-pass LLM scan)
    │
    ▼
Audit Event → Kafka → SPM Aggregator → PostgreSQL + Prometheus
    │
    ▼
Response → User

Threat Hunting Agent

The threat-hunting-agent is an autonomous AI security service that continuously scans the platform for threats — independent of user-triggered requests. It runs a LangChain agent backed by Groq + Llama 3.3 70B Versatile and fires on two triggers:

Kafka consumer — reacts to session events in near real-time
Scheduler — runs a full proactive scan cycle every 5 minutes (configurable via THREATHUNTING_AI_INTERVAL_SEC)

Proactive Scans

Every scan cycle runs all 9 detectors in parallel. Each detector queries live data (Postgres, Redis, /proc) and produces structured findings that the agent analyses with the LLM before posting to the orchestrator.

Scan	What it detects
`exposed_credentials`	API keys, tokens, and passwords stored in Redis under unexpected namespaces
`sensitive_data_exposure`	PII patterns, DB connection strings in Redis (broader sweep)
`unused_open_ports`	Internal service ports reachable that should not be (misconfigured or rogue services)
`unexpected_listen_ports`	Ports in LISTEN state in `/proc/net/tcp` not on the allowed service list
`overprivileged_tools`	AI models in the registry with unacceptable risk tier still set to `active`
`runtime_anomaly_detection`	High-frequency actors, enforcement block clusters (3+/session/hour), session storms (5+/actor/10 min)
`prompt_secret_exfiltration`	API keys and bearer tokens inside prompt/response text in the audit log
`data_leakage_detection`	SSNs, credit card numbers, email addresses in agent response text
`tool_misuse_detection`	High tool-call frequency (>20/actor/hour), rapid chaining (>5 calls/session/min), high block ratios

Findings & Cases

When a scan finds an anomaly the LLM produces a structured threat finding (severity, hypothesis, evidence, recommended actions) which is:

Posted to the agent-orchestrator via POST /api/v1/threat-findings
Deduplicated by batch hash — the same pattern won't flood the findings tab
Automatically prioritised (risk score, recency, occurrence count)
Escalated to a Case when should_open_case=true and priority score ≥ 0.40

Configuration

GROQ_API_KEY=gsk_xxxxxxxxxxxx           # required — service won't start without it
HUNT_MODEL=llama-3.3-70b-versatile      # LLM model (any Groq-hosted model)
HUNT_BATCH_WINDOW_SEC=30                # Kafka batch window
THREATHUNTING_AI_INTERVAL_SEC=300       # Proactive scan interval (seconds)

Database Migration

The threat hunting collectors query the session_id column on the audit_export table. If you are upgrading an existing installation run the migration before starting the agent:

# Option A — via Alembic (recommended)
docker compose exec spm-api alembic upgrade head

# Option B — direct SQL (if Alembic is unavailable)
docker compose exec spm-db psql -U spm_rw -d spm -c "
  ALTER TABLE audit_export ADD COLUMN IF NOT EXISTS session_id VARCHAR(64);
  CREATE INDEX IF NOT EXISTS idx_audit_export_session_id ON audit_export (session_id);
"

Kafka Topics

Topic	Producers	Consumers
`{tenant}.raw_events`	API gateway	Processor, CEP
`{tenant}.posture_events`	Processor	SPM Aggregator, Policy Decider
`{tenant}.enforcement_actions`	Policy Decider, Freeze Controller	SPM Aggregator
`{tenant}.audit_export`	API gateway, SPM services	SPM Aggregator → Postgres

Language & Runtime Summary

Layer	Language	Runtime
All backend services	Python 3.11	CPython
Frontend	JavaScript (ESM)	Node 20 (build only), Nginx (serve)
Policy	Rego	OPA 0.70
Infrastructure config	YAML / Dockerfile	Docker Compose v2
Database migrations	SQL	PostgreSQL 16
Build automation	Make	GNU Make

Contributing

Contributing to Orbyx AI SPM

Thanks for your interest in contributing! Here's everything you need to get started.

Getting Started

Fork the repository and clone your fork
Follow INSTALL.md to get the platform running locally
Create a feature branch: git checkout -b feat/your-feature-name

Development Workflow

Making changes

Most services are hot-reloaded in development. After editing Python files, rebuild only the affected service:

docker compose up -d --build api          # API changes
docker compose up -d --build spm-api      # SPM API changes
docker compose up -d --build ui           # Frontend changes

Running tests

make test              # unit tests (no Docker needed)
make smoke-test        # end-to-end test against running platform

Tests live in tests/. Please add or update tests for any new behaviour.

Checking logs

make logs              # all services
make logs-api          # single service

Pull Request Guidelines

One concern per PR — keep changes focused and reviewable
Write a clear description — what changed and why
Include tests — new features and bug fixes should have test coverage
Pass CI — all tests must be green before review
Update docs — if you change behaviour, update the relevant .md file

Branch naming:

Type	Pattern
Feature	`feat/short-description`
Bug fix	`fix/short-description`
Docs	`docs/short-description`
Refactor	`refactor/short-description`

Project Structure

services/          # Backend microservices (Python / FastAPI)
ui/                # Frontend (React + Vite)
platform_shared/   # Shared Python modules (JWT, Kafka, models)
spm/               # SPM policy and compliance definitions
opa/               # OPA Rego policies
grafana/           # Dashboard JSON and provisioning config
prometheus/        # Scrape config
tests/             # Unit and integration tests
scripts/           # Dev utilities (JWT minting, etc.)

Reporting Issues

Please open a GitHub Issue and include:

A clear description of the problem
Steps to reproduce
Relevant logs (make logs-api output)
Your environment (OS, Docker version, chip architecture)

Code Style

Python — follow PEP 8; use type hints where practical
JavaScript — standard ESM; no external linting config required
Commits — use Conventional Commits (feat:, fix:, docs:, etc.)

Local SSO with Keycloak + Traefik

The auth overlay adds a full OIDC login flow in front of the admin portal, running entirely on localhost. No real domain or TLS certificate required.

What it adds

Component	Role
Traefik v3	Reverse proxy. Routes `aispm.local` → admin UI via the ForwardAuth middleware. Uses a static file provider — no Docker socket required.
Keycloak 24	OIDC identity provider running in dev mode. Realm config is persisted to `./DataVolumes/keycloak/` (host bind mount) so it survives restarts and `docker compose down -v`.
traefik-forward-auth	Sits in front of every protected route. Inspects every request via `X-Forwarded-Uri` — including `/_oauth` callbacks — and sets a signed session cookie on `aispm.local`.

One-time host setup

Add these entries to /etc/hosts on your Mac:

sudo sh -c 'echo "127.0.0.1  keycloak.local auth.local aispm.local" >> /etc/hosts'

Start / Stop

# Start the full stack with auth overlay:
docker compose -f compose.yml -f compose.auth.yml up -d

# Stop (data preserved in ./DataVolumes/):
docker compose -f compose.yml -f compose.auth.yml down

First-time Keycloak setup

Only required once — Keycloak persists the realm to ./DataVolumes/keycloak/h2/.

Start the stack (see above) then open http://keycloak.local:8180/admin/ (admin / admin)
Top-left dropdown → Create realm → name: aispm → Create
Realm Settings → General tab → Require SSL → set to none → Save
- Required for local-dev HTTP. If left at the default (external), Keycloak rejects every non-localhost request with the page "We are sorry... HTTPS required" and traefik-forward-auth's token exchange silently fails.
- Repeat the same toggle on the master realm if you want to log into the admin UI via keycloak.local (master defaults to external too).
Clients → Create client
- Client ID: traefik-forward-auth
- Turn ON Client authentication → Next
- Valid redirect URIs: http://aispm.local/_oauth
- Web origins: http://aispm.local → Save
Credentials tab → copy Client secret → paste into .env.auth:
```
PROVIDERS_OIDC_CLIENT_SECRET=<paste here>
```
Realm roles → Create role → name: spm:admin → Save. Repeat for spm:auditor.
- The spm-api enforces these via require_admin / require_auditor. Without spm:admin in the JWT roles claim, every integration write endpoint returns 403.
Users → Create user → set username and email → Create
Credentials tab → set password → turn OFF Temporary → Save password
Role mapping tab → Assign role → tick spm:admin → Assign
Restart forward-auth: docker compose -f compose.auth.yml up -d --force-recreate traefik-forward-auth

Scriptable equivalent (kcadm)

If you'd rather skip the UI, the same setup via kcadm — useful when the master realm's "HTTPS required" gate is locking you out of the admin console:

KC=/opt/keycloak/bin/kcadm.sh

docker compose exec keycloak $KC config credentials \
  --server http://localhost:8080 --realm master --user admin --password admin

# Disable SSL gate on both realms (master = admin console, aispm = the app)
docker compose exec keycloak $KC update realms/master -s sslRequired=NONE
docker compose exec keycloak $KC update realms/aispm  -s sslRequired=NONE

# Create the two realm roles spm-api expects
docker compose exec keycloak $KC create roles -r aispm -s name=spm:admin
docker compose exec keycloak $KC create roles -r aispm -s name=spm:auditor

# Create user, set password, assign admin role
docker compose exec keycloak $KC create users -r aispm \
  -s username=dany -s enabled=true -s email=dany@example.com
docker compose exec keycloak $KC set-password -r aispm \
  --username dany --new-password dany
docker compose exec keycloak $KC add-roles -r aispm \
  --uusername dany --rolename spm:admin

Access

URL	What
http://aispm.local/admin	Admin portal (SSO protected — redirects to Keycloak login)
http://keycloak.local:8180/admin/	Keycloak admin console (master realm, `admin`/`admin`)
http://localhost:9091/dashboard/	Traefik routing dashboard

Configuration files

File	Purpose
`compose.auth.yml`	Compose overlay — adds Traefik, Keycloak, and traefik-forward-auth. Keycloak data is bind-mounted from `./DataVolumes/keycloak/`.
`auth/traefik.yml`	Traefik static config (file provider, entrypoints, dashboard on `:9091`)
`auth/traefik-dynamic.yml`	Route + middleware definitions. Important: there is a single `aispm` router covering every path — including `/_oauth`. The SSO middleware itself recognizes the OIDC callback via `X-Forwarded-Uri` and short-circuits it. Do not add a separate router that routes `/_oauth` directly to the forward-auth backend service (e.g. `aispm-oauth: service: auth-svc`) — that strips the `X-Forwarded-*` headers and forward-auth then can't tell it's a callback, falling into an infinite redirect loop.
`.env.auth`	OIDC client ID, client secret, and cookie signing secret

Troubleshooting auth

Symptom	Likely cause	Fix
Browser endlessly bounces between `aispm.local/_oauth?code=...` and Keycloak	`auth/traefik-dynamic.yml` has a router that sends `/_oauth` directly to forward-auth as a backend, skipping the middleware (drops `X-Forwarded-Uri`).	Remove the `/_oauth` router; let the catch-all `aispm` router with the `sso` middleware handle every path.
Keycloak page: "We are sorry… HTTPS required"	Realm `sslRequired` is `external` (default).	`kcadm update realms/<realm> -s sslRequired=NONE` for both `master` and `aispm`, OR access the admin console via `http://localhost:8180/` (localhost bypasses the gate).
Login succeeds but `/_oauth` returns "Cookie not found"	Stale `_forward_auth_csrf` cookies from a previous failed run.	Close all incognito windows (don't just open a new tab) and start a fresh session.
spm-api returns 403 "spm:admin role required" after login	The JWT user has no `spm:admin` realm role.	In Keycloak: realm `aispm` → Users → user → Role mapping → assign `spm:admin`.
`LOG_LEVEL=debug` doesn't take effect	Compose only reloads env on recreate.	`docker compose -f compose.auth.yml up -d --force-recreate traefik-forward-auth`

Persistent data — bind-mounted volumes

Postgres, Keycloak, Redis, Grafana, and the agent-orchestrator all bind-mount their state under ./DataVolumes/ instead of using Docker named volumes. Layout:

DataVolumes/
├── spm-db/                ← Postgres data dir (UID 999 inside container)
├── keycloak/h2/           ← Keycloak embedded H2 DB (realms, users, secrets)
├── redis/                 ← Redis AOF / dump
├── grafana/               ← Grafana SQLite + dashboards
└── agent-orchestrator/    ← Orchestrator SQLite session log

The directories are tracked via .gitkeep; their contents are gitignored (see .gitignore). To reset any one of them, stop the relevant service, rm -rf the directory contents, and restart.

Database migrations (alembic)

Migrations live in spm/alembic/versions/. The CI workflow runs alembic upgrade head automatically, but local containers do not — if you pull new migrations, run them by hand:

cd spm
SPM_DB_URL="postgresql://spm_rw:spmpass@localhost:5432/spm" alembic upgrade head

If the DB's alembic_version row points at a revision that no longer exists in spm/alembic/versions/ (can happen after switching branches or restoring a snapshot), alembic upgrade head errors with Can't locate revision identified by 'NNN'. Reset the bookmark to the latest revision actually present, then re-run:

cd spm
# Replace 003 with whatever is the highest revision file present
SPM_DB_URL="..." alembic stamp --purge 003
SPM_DB_URL="..." alembic upgrade head

The repo's migrations are written to be idempotent (ADD COLUMN IF NOT EXISTS, CREATE INDEX IF NOT EXISTS), so re-running a stamped revision is safe.

Name		Name	Last commit message	Last commit date
Latest commit History 512 Commits
.beads		.beads
.github/workflows		.github/workflows
.idea		.idea
Example agents		Example agents
agent_runtime		agent_runtime
auth		auth
deploy		deploy
docs		docs
flink		flink
grafana		grafana
keys		keys
opa/policies		opa/policies
platform_shared		platform_shared
prometheus		prometheus
scripts		scripts
services		services
spm		spm
tests		tests
ui		ui
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
compose.auth.yml		compose.auth.yml
compose.yml		compose.yml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
sbom.json		sbom.json

Folders and files

Latest commit

History

Repository files navigation

Orbyx AI SPM - AI Security Posture Management

OrbiX AI SPM

📋 Table of Contents

ℹ️ Project Information

Quick how to deploy 101

Mac OS:

Ubuntu / Debian:

Fedora / RHEL

All Linux

Step 1 — Install K8S (Kind) cluster

Bring-up (clean cluster)

Platform at a Glance

Admin Portal - Overview

A Real-time AI security posture across every agent, model, and data source — inventory, runtime, policies, and threat response unified.

Admin Portal - Dashboard

An AI Security Posture Management control plane providing real-time visibility, risk detection, and policy enforcement across agents, models, and context flows.

Admin Portal - Inventory

Check out the Demo

Security & Access Control

Authentication & Authorization

Prompt Security

Output Security

LLM Integration & Gateway

Model Management

Agent Posture Drift

Agentic Tools

Conversation Memory

Observability & Compliance

Prometheus Metrics

Grafana Dashboards

Audit & Compliance

Behavioural Analytics

Infrastructure & Event Pipeline

Kafka Event Bus

Platform Services

UI & Developer Experience

#TODO: add more screenshorts from the admin portal

Roadmap

Installation

Table of Contents

Prerequisites

Clone & Configure

API Keys

Anthropic (required for Claude responses)

Tavily (required for web search tool)

Groq (required for Threat Hunting Agent + optional guard acceleration)

First Boot

Verify the Platform

Access the UI & Dashboards

Grafana Dashboards

Run the Smoke Test

Stopping & Cleaning Up

Troubleshooting

Services fail to start / make up exits early

cpm-startup-orchestrator not found

Chat UI shows [object Object] error

404 model not found from Anthropic

Grafana panels show "No data"

Port conflict

Threat hunting collectors report column "session_id" does not exist

threat-hunting-agent fails to start — GROQ_API_KEY missing

Rebuilding a single service after code changes

Environment Reference

Quick-Reference Commands

Usage

Table of Contents

Prerequisites

Clone & Configure

API Keys

Anthropic (required for Claude responses)

Tavily (required for web search tool)

Groq (required for Threat Hunting Agent + optional guard acceleration)

First Boot

Verify the Platform

Access the UI & Dashboards

Grafana Dashboards

Services fail to start / `make up` exits early

`cpm-startup-orchestrator` not found

Chat UI shows `[object Object]` error

`404 model not found` from Anthropic

Threat hunting collectors report `column "session_id" does not exist`

`threat-hunting-agent` fails to start — `GROQ_API_KEY` missing

Services fail to start / `make up` exits early

`cpm-startup-orchestrator` not found

Chat UI shows `[object Object]` error

`404 model not found` from Anthropic

Threat hunting collectors report `column "session_id" does not exist`

`threat-hunting-agent` fails to start — `GROQ_API_KEY` missing