Autonomous Kubernetes security platform combining eBPF-native threat detection, policy enforcement, and an AI reasoning agent for real-time threat response and remediation.
Live TCP flows between Cilium components captured at the kernel level via eBPF
| Node | Role | IP | Status |
|---|---|---|---|
| k3s-master | Control plane | 192.168.139.42 | ✅ Ready |
| k3s-worker1 | Worker | 192.168.139.77 | ✅ Ready |
| k3s-worker2 | Worker | 192.168.139.45 | ✅ Ready |
Cilium: v1.15.0 — eBPF mode, kube-proxy replacement enabled
Hubble: Relay + UI enabled — live network flow observability active
- Driver: modern_ebpf (CO-RE, no kernel headers required)
- Status: ✅ Running on all 3 nodes (DaemonSet)
- Output: JSON via HTTP webhook → Argus agent
- Test: `cat /etc/shadow` in a container → detected in <1ms, tagged MITRE T1555
- Custom rules: shell in prod, outbound connections, /etc writes, curl/wget, privilege escalation
Why Falco for Argus: Argus needs a low-level event stream it can reason over with an AI agent. Falco sits at the syscall layer — below the application, below the container runtime — so it catches things that application-level logging misses entirely: unexpected shell spawns, file reads on sensitive paths, outbound connections from workloads that should be silent. The structured JSON output feeds directly into the agent webhook, giving it a machine-readable event with MITRE ATT&CK tags already attached. No log scraping, no parsing — just a clean signal the agent can act on.
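For reference, a Falco alert arriving at the webhook looks roughly like this. Field names follow Falco's standard JSON output format; the specific pod, namespace, and tag values shown here are illustrative:

```json
{
  "output": "Warning Sensitive file opened for reading by non-trusted program (file=/etc/shadow ...)",
  "priority": "Warning",
  "rule": "Read sensitive file untrusted",
  "time": "2026-04-01T12:00:00.000000000Z",
  "hostname": "k3s-worker1",
  "output_fields": {
    "fd.name": "/etc/shadow",
    "k8s.ns.name": "prod",
    "k8s.pod.name": "demo-pod"
  },
  "tags": ["filesystem", "mitre_credential_access", "T1555"]
}
```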
- Status: ✅ Running (v1.17.1)
- Namespace: kyverno
- Mode: Enforce (blocks non-compliant workloads at admission)
- Policies:
  - `disallow-root-containers` — rejects pods without runAsNonRoot=true
  - `require-resource-limits` — rejects pods missing CPU/memory limits
  - `approved-registries-only` — rejects images outside approved registries
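As a sketch, the first policy can be expressed with standard Kyverno `ClusterPolicy` syntax. This version checks only the pod-level `securityContext` (the real policy may also validate per-container settings); the policy name matches the list above, everything else is an assumption:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-root-containers
spec:
  validationFailureAction: Enforce   # block at admission, don't just audit
  rules:
    - name: require-run-as-non-root
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Pods must set runAsNonRoot=true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```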
- Status: ✅ Applied
- Model: Default deny ingress, explicit allow
- Rules:
  - `prod` and `staging` namespaces: deny all ingress by default
  - `monitoring` namespace: allowed to scrape metrics from prod/staging
  - `argus-system` namespace: allowed to reach prod/staging for agent remediation
- Cross-namespace traffic: blocked and visible as dropped flows in Hubble UI
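The default-deny rule can be written as a plain Kubernetes `NetworkPolicy`, which Cilium enforces; a minimal sketch for the `prod` namespace (an empty `podSelector` selects every pod, and listing `Ingress` with no rules denies all inbound traffic):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```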
Issue #13: Falco webhook receiver ✅
- FastAPI endpoint `POST /falco/webhook` receiving Falco JSON alerts
- Pydantic validation — invalid payloads return 422
- Priority normalization (Critical/Warning/Error/Notice)
- Deduplication engine — same rule+pod+namespace suppressed for 5 minutes
- Background task processing — 202 returned immediately, Falco never blocks
- 18 unit tests covering validation, dedup, field mapping
Issue #14: Context enricher ✅
- Parallel queries to Kubernetes API, Loki, Hubble, and Kyverno with asyncio.gather
- 5-second timeout with graceful degradation — partial results returned on failure
- Pod metadata, recent logs, network flows, and policy violations collected per alert
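The fan-out pattern above — parallel queries, per-source timeout, partial results on failure — looks roughly like this. The `sources` mapping stands in for the real Kubernetes/Loki/Hubble/Kyverno clients; names are illustrative:

```python
import asyncio


async def _call(name, coro, timeout):
    # Any failure (timeout, connection error) degrades to None for that source.
    try:
        return name, await asyncio.wait_for(coro, timeout)
    except Exception:
        return name, None


async def enrich(alert: dict, sources: dict, timeout: float = 5.0) -> dict:
    """Query all context sources in parallel; return whatever succeeded."""
    pairs = await asyncio.gather(
        *(_call(name, fetch(alert), timeout) for name, fetch in sources.items())
    )
    context = {name: result for name, result in pairs if result is not None}
    return {
        "alert": alert,
        "context": context,
        "degraded": len(context) < len(sources),  # some source failed/timed out
    }
```

A source that hangs past the timeout simply drops out of `context` and flips `degraded` to True, so the reasoning layer always gets a response within the time budget.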
Issue #15: Claude reasoning layer ✅
- Structured JSON decisions: severity, confidence, recommended action, blast radius
- Model routing — claude-opus-4-6 for Critical/Error, claude-sonnet-4-6 for others
- Prompt caching on system prompt reduces token cost by ~90% on repeat calls
- Retry with exponential backoff on rate limit and connection errors
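The routing and retry behavior can be sketched as below. The model IDs mirror the ones listed above; the backoff loop is a generic pattern, not the exact Argus code:

```python
import random
import time


def pick_model(priority: str) -> str:
    """Route expensive reasoning to Opus only for high-severity alerts."""
    if priority in {"Critical", "Error"}:
        return "claude-opus-4-6"
    return "claude-sonnet-4-6"


def call_with_backoff(fn, retries: int = 4, base: float = 1.0):
    """Retry fn() with exponential backoff plus jitter on transient errors."""
    for attempt in range(retries):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == retries - 1:
                raise  # budget exhausted: surface the error
            time.sleep(base * (2 ** attempt) + random.random() * 0.1)
```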
Issue #16: Action router ✅
- LOG, NOTIFY, ISOLATE, KILL, and HUMAN_REQUIRED actions
- ISOLATE creates a CiliumNetworkPolicy deny-all for the offending pod
- KILL requires confidence >= 0.85, falls back to ISOLATE below threshold
- HUMAN_REQUIRED queues actions for manual approval via REST API
- All decisions shipped to Loki as structured audit log entries
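The confidence gate on destructive actions amounts to a small guard: KILL only above the threshold, otherwise downgrade to ISOLATE. A minimal sketch, with names and structure assumed:

```python
# Threshold from the behavior described above: KILL requires confidence >= 0.85.
KILL_CONFIDENCE_THRESHOLD = 0.85


def route_action(action: str, confidence: float) -> str:
    """Downgrade KILL to ISOLATE when the model isn't confident enough."""
    if action == "KILL" and confidence < KILL_CONFIDENCE_THRESHOLD:
        return "ISOLATE"  # quarantine instead of destroying the workload
    return action
```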
Issue #17: Containerize and deploy ✅
- Multi-stage Docker build, non-root user, minimal runtime image
- Kubernetes manifests: Deployment, Service, RBAC, Secret template
- deploy.sh builds image locally, loads into all 3 k3s nodes via SSH, applies manifests
- Liveness and readiness probes on /health
- `make deploy-agent` target wired up
Issue #17: Network policies configured
- argus-agent allowed egress to Anthropic API (world entity)
- argus-agent allowed egress to monitoring/kube-system/prod/staging namespaces
- Loki, Hubble, K8s API all reachable from agent pod
- Status: ✅ Running (kube-prometheus-stack)
- Retention: 7 days
- Targets: alertmanager, apiserver, coredns, node-exporter (3 nodes), kube-state-metrics, kubelet
- Status: ✅ Running
- URL: kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
- Login: admin / argus-admin
- Dashboards: 25 default Kubernetes dashboards loaded
- Status: ✅ Running (loki-stack)
- Retention: 72 hours (disk-constrained on 20GiB VMs)
- Promtail: DaemonSet collecting logs from all pods on all 3 nodes
- Falco pipeline: JSON events parsed and labeled by rule, priority, hostname
- Query: `{app="falco"}` returns structured Falco alerts in Grafana Explore
- Datasource: http://loki.monitoring.svc.cluster.local:3100
- Argus / Security Overview — live Falco event stream, critical alert count
- Argus / Cluster Health — node CPU/memory, pod restarts
- Argus / Policy Violations — Kyverno admission denials
- Argus / Network Flows — Cilium eBPF dropped flows, flow rate by verdict
- Provisioning: ConfigMap with grafana_dashboard=1 label — survives pod restarts
Critical Falco detections (T1620 fileless execution) ingested and queryable via LogQL
| Layer | Tool | Why |
|---|---|---|
| Local cluster | k3s on OrbStack VMs | Lightweight, ARM-native, production-like topology |
| CNI | Cilium + Hubble | eBPF-native, replaces kube-proxy, L7 network observability |
| Runtime security | Falco | CNCF-graduated, syscall-level threat detection, structured JSON events |
| Admission control | Kyverno | K8s-native policy-as-code, rejects non-compliant workloads pre-deployment |
| mTLS | Linkerd | Automatic zero-trust service encryption, lightweight vs Istio |
| Metrics | Prometheus | Industry standard scraping and alerting |
| Logs | Loki + Promtail | Lightweight log aggregation, native Grafana integration |
| Dashboards | Grafana | Unified observability — metrics, logs, security events |
| AI Agent | Python + Claude API | Enriches Falco alerts with cluster context, reasons about threat severity, routes remediation actions |
| UI | React + Tailwind | Real-time incident feed, human approval queue, agent chat |
Built by Kaushikkumaran — April 2026
Original architecture, AI agent design, and UI concept. All design decisions documented in docs/decisions/.
| Module | Description | Status |
|---|---|---|
| 1 — Cluster Foundation | OrbStack VMs, k3s, Cilium, Hubble | ✅ Complete |
| 2 — Security Layers | Falco, Kyverno, CiliumNetworkPolicy | ✅ Complete |
| 3 — Observability Stack | Prometheus, Grafana, Loki | ✅ Complete |
| 4 — AI Agent Engine | Falco webhook → context enrichment → Claude → action router | ✅ Complete |
| 5 — Command & Control UI | React + Tailwind cyber console | In Progress |
[fill in after agent is built]
- macOS (Apple Silicon M-series)
- OrbStack installed (`brew install orbstack`)
- CLI tools: `brew install kubectl helm k3sup cilium-cli hubble k9s`
`make cluster-up` provisions 3 OrbStack VMs, installs k3s, deploys Cilium with eBPF kube-proxy replacement, enables Hubble, and creates all namespaces.
`make cluster-status`

`cilium hubble ui`

See docs/decisions/ for rationale behind every tool choice.
