ajjs1ajjs/Infrastructure

Infrastructure Auto-Documentation Platform

Self-hosted, agentless platform for auto-discovery, live inventory, topology mapping, change tracking, and alerting for Windows / Linux / Docker / VM / SNMP infrastructure.

Backend: Go · Frontend: Next.js · UI: Tailwind v4 · DB: SQLite/Postgres · License: MIT

A modern, lightweight, self-hosted alternative to enterprise CMDB tools. Discovers your entire fleet via SSH, WinRM, Hyper-V and SNMP — no agents required on target hosts. Includes real-time change tracking, alerting to Telegram/Discord/Email, encrypted credentials, and JWT-based RBAC.


✨ Features

🔍 Agentless Discovery

  • Subnet Auto-Discovery (CIDR) — sweep networks to auto-add live hosts
  • Linux & Windows over SSH (PowerShell over OpenSSH)
  • Windows over WinRM (automatic SSH → WinRM fallback)
  • Multi-Hypervisor VM Inventory — parent/child VM discovery for Hyper-V, VMware ESXi, Proxmox VE, XCP-ng, and Nutanix AHV
  • Cloud Inventory — automatic background resource scans for AWS, GCP, and Azure
  • Kubernetes Collector — discovers cluster-wide Nodes, Pods, Services, and Ingresses
  • SNMP v2c / v3 for network gear (switches, routers, printers)
  • Auto-classification of VMs vs physical servers
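As a rough illustration of the subnet sweep, the sketch below expands a CIDR block into the addresses a scanner would probe. It is not taken from the project's scanner; the function name `hostsInCIDR` and the /16 size guard are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// hostsInCIDR expands an IPv4 prefix into the addresses a sweep would probe.
// The network and broadcast addresses are skipped for prefixes shorter than /31.
// Hypothetical helper: the name and the /16 guard are assumptions of this sketch.
func hostsInCIDR(cidr string) ([]netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // normalise e.g. 192.168.1.7/30 to 192.168.1.4/30
	if !prefix.Addr().Is4() || prefix.Bits() < 16 {
		return nil, fmt.Errorf("sketch only handles IPv4 prefixes of /16 or longer, got %s", cidr)
	}

	var addrs []netip.Addr
	for a := prefix.Addr(); a.IsValid() && prefix.Contains(a); a = a.Next() {
		addrs = append(addrs, a)
	}
	if prefix.Bits() < 31 {
		addrs = addrs[1 : len(addrs)-1] // drop network and broadcast addresses
	}
	return addrs, nil
}

func main() {
	hosts, err := hostsInCIDR("192.168.1.0/30")
	if err != nil {
		panic(err)
	}
	for _, h := range hosts {
		fmt.Println(h)
	}
}
```

A real sweep would then probe each address (for example with a TCP dial to port 22 or 5985) before adding it as a host.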

📊 Live Inventory

  • Hostname, IP, OS, CPU, RAM, uptime
  • Disks, services, listening ports, Docker containers
  • Installed software, OS updates/patches
  • Local users, admin privileges
  • SSL/TLS certificates with expiry tracking
  • Network routes, system log alerts

🕸️ Topology Mapping

  • Automatic discovery of TCP connections between hosts
  • VM ↔ Hypervisor parent/child relationships
  • React Flow powered interactive graph (drag, zoom, pan, MiniMap)
  • Type/search filtering and click-through host details

🛡️ Compliance, Self-Healing & Security

  • Vulnerability Scanning (CVE): Integrates with OSV.dev to detect known vulnerabilities in installed software.
  • Desired State Policy Engine for both standalone hosts and Kubernetes clusters
  • Security audits: checks for direct root SSH logins and privileged K8s containers
  • Drift detection: flags service, software package, listening port, and user privilege changes on hosts
  • Kubernetes Audits: checks for workload replica drift, missing expected workloads, and orphaned resources
  • Automatic self-healing: runs alignment scripts over SSH/WinRM and issues PATCH/DELETE API calls to the Kubernetes cluster to remediate compliance drift

🔐 Security & Multi-Tenancy (MSP)

  • Single Sign-On (SSO) — seamless Google OAuth2 login
  • Multi-Tenancy Workspaces — isolated tenant groups for MSP operations
  • Workspace Switcher — administrative switcher with automatic JWT token re-generation and workspace-scoped data reloading
  • AES-256-GCM encrypted credential storage (passwords & private keys)
  • JWT authentication (HS256, 24 h token)
  • RBAC: admin, operator, viewer roles
  • Master key from ENCRYPTION_KEY env or auto-generated .autodoc.key (0600)
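The credential encryption described above can be sketched with Go's standard library. This is a minimal example, not the project's `crypto` package: the function names `seal` and `open`, and the storage layout (base64 of nonce followed by ciphertext), are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte master key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns base64(nonce || ciphertext || tag).
// The layout is an assumption of this sketch.
func seal(key []byte, plaintext string) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil)
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// open reverses seal and fails if the key is wrong or the data was tampered with.
func open(key []byte, encoded string) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext shorter than nonce")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	key := make([]byte, 32) // stands in for the decoded ENCRYPTION_KEY
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	enc, err := seal(key, "s3cr3t-password")
	if err != nil {
		panic(err)
	}
	dec, err := open(key, enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(dec == "s3cr3t-password")
}
```

Because GCM is authenticated, decrypting with a different key returns an error rather than garbage, which is why losing `.autodoc.key` makes stored credentials unrecoverable.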

🔄 Change Tracking

  • Snapshot-based diff engine: detects new/removed services, packages, ports, disks, users
  • Severity classification (info / warning / critical)
  • Live activity feed with WebSocket push
  • Per-type statistics and historical snapshots
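A snapshot diff of this kind boils down to a set comparison between two scans. The sketch below shows the idea for one item type (for example service names); the `Change` type and `diffSnapshots` function are hypothetical and not the project's `diff.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// Change records one difference between two snapshots.
// Hypothetical type for this sketch.
type Change struct {
	Kind string // "added" or "removed"
	Item string
}

// diffSnapshots compares the previous and current item lists of a host
// and reports what appeared and what disappeared, in a stable order.
func diffSnapshots(prev, curr []string) []Change {
	prevSet := make(map[string]bool, len(prev))
	for _, p := range prev {
		prevSet[p] = true
	}
	currSet := make(map[string]bool, len(curr))
	for _, c := range curr {
		currSet[c] = true
	}

	var changes []Change
	for item := range currSet {
		if !prevSet[item] {
			changes = append(changes, Change{Kind: "added", Item: item})
		}
	}
	for item := range prevSet {
		if !currSet[item] {
			changes = append(changes, Change{Kind: "removed", Item: item})
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Slice(changes, func(i, j int) bool {
		if changes[i].Kind != changes[j].Kind {
			return changes[i].Kind < changes[j].Kind
		}
		return changes[i].Item < changes[j].Item
	})
	return changes
}

func main() {
	prev := []string{"nginx", "sshd"}
	curr := []string{"sshd", "docker"}
	for _, c := range diffSnapshots(prev, curr) {
		fmt.Printf("%s %s\n", c.Kind, c.Item)
	}
}
```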

🚨 Alerting & Notifications

  • Telegram, Discord, Email (SMTP), and Generic Webhook channels
  • Mobile Push Notifications via Firebase Cloud Messaging (FCM) / APNS fallback
  • Voice TTS Calls: phone-call alerts via the Twilio Text-to-Speech bridge
  • DSL-style conditions: disk_usage > 90, cert_days_left < 14, host_offline, etc.
  • Per-rule severity, channel routing, and cooldown (anti-spam)
  • Alert event log with delivery status
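To make the condition syntax concrete, here is a small evaluator for rules such as `disk_usage > 90` and `host_offline`. The project's actual grammar is not documented here, so the parsing rules (whitespace-separated tokens, a bare metric name meaning "non-zero") and the `evaluate` function are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// evaluate checks one condition against the latest metrics of a host.
// Supported forms in this sketch:
//   "<metric> <op> <number>"  with op in >, >=, <, <=, ==
//   "<metric>"                true when the metric is non-zero
func evaluate(cond string, metrics map[string]float64) (bool, error) {
	fields := strings.Fields(cond)
	switch len(fields) {
	case 1:
		return metrics[fields[0]] != 0, nil
	case 3:
		value, ok := metrics[fields[0]]
		if !ok {
			return false, fmt.Errorf("unknown metric %q", fields[0])
		}
		threshold, err := strconv.ParseFloat(fields[2], 64)
		if err != nil {
			return false, fmt.Errorf("bad threshold %q: %w", fields[2], err)
		}
		switch fields[1] {
		case ">":
			return value > threshold, nil
		case ">=":
			return value >= threshold, nil
		case "<":
			return value < threshold, nil
		case "<=":
			return value <= threshold, nil
		case "==":
			return value == threshold, nil
		}
		return false, fmt.Errorf("unsupported operator %q", fields[1])
	}
	return false, fmt.Errorf("cannot parse condition %q", cond)
}

func main() {
	metrics := map[string]float64{"disk_usage": 93.5, "cert_days_left": 30, "host_offline": 0}
	for _, cond := range []string{"disk_usage > 90", "cert_days_left < 14", "host_offline"} {
		fired, err := evaluate(cond, metrics)
		fmt.Println(cond, fired, err)
	}
}
```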

⚡ Real-time & AI Assistant

  • AI Assistant — chat assistant with real-time Server-Sent Events (SSE) streaming responses
  • WebSocket hub pushes scan:start, scan:complete, host:status, host:change, alert:fired, hypervisor:vm events
  • Auto-reconnect with exponential back-off in the UI
  • Falls back to 30 s polling if WS is unavailable

📝 Auto Documentation

  • One-click Markdown report generation (GET /api/docs/generate)
  • Beautiful tables grouped per host
  • Hypervisor VM summary, disks, ports, services, software, users, certs, alerts
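The report is plain Markdown, so generating a per-host table is mostly string building. The following sketch shows one way to do it; the `HostRow` type, the column set, and `renderHostTable` are assumptions and do not reflect the real output of `/api/docs/generate`.

```go
package main

import (
	"fmt"
	"strings"
)

// HostRow is a hypothetical subset of the inventory fields shown in a report.
type HostRow struct {
	Hostname string
	IP       string
	OS       string
}

// cell escapes pipe characters so a value cannot break the table layout.
func cell(s string) string {
	return strings.ReplaceAll(s, "|", "\\|")
}

// renderHostTable renders the rows as a Markdown table.
func renderHostTable(rows []HostRow) string {
	var b strings.Builder
	b.WriteString("| Hostname | IP | OS |\n")
	b.WriteString("| --- | --- | --- |\n")
	for _, r := range rows {
		fmt.Fprintf(&b, "| %s | %s | %s |\n", cell(r.Hostname), cell(r.IP), cell(r.OS))
	}
	return b.String()
}

func main() {
	fmt.Print(renderHostTable([]HostRow{
		{Hostname: "web-01", IP: "10.0.0.5", OS: "Ubuntu 24.04"},
		{Hostname: "dc-01", IP: "10.0.0.10", OS: "Windows Server 2022"},
	}))
}
```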

⚙️ Operations, IaC & Database

  • Infrastructure-as-Code (IaC) Export — 1-click download of Ansible Inventories & Terraform States
  • Data Visualization — sleek analytics dashboards with live Recharts integration
  • TimescaleDB Hypertable — auto-configures metrics_histories into a partitioned hypertable when running on PostgreSQL
  • One-click installers for Linux (install.sh) and Windows (install.ps1)
  • Periodic background scanning (30 min default)
  • Demo mode (IP=demo and Name=demo) for offline simulation showcase
  • REST API for all entities
  • SQLite (default, zero-config) or PostgreSQL (Docker)
  • Single static binary serving embedded Next.js export

🚀 Quick Start

Option 1 — Use the installer (recommended)

Linux / Ubuntu:

curl -fsSL https://raw.githubusercontent.com/ajjs1ajjs/Infrastructure/main/scripts/install.sh -o install.sh
chmod +x install.sh
sudo ./install.sh

Windows (PowerShell as Administrator):

iwr https://raw.githubusercontent.com/ajjs1ajjs/Infrastructure/main/scripts/install.ps1 -OutFile install.ps1
.\install.ps1

Then open http://<server-ip>:8080 in your browser. Default credentials: admin / admin — change them immediately in production.

Option 2 — Run from source

# 1. Start PostgreSQL (optional, SQLite is used by default)
cd docker && docker compose up -d && cd ..

# 2. Backend
cd backend
go mod download
go run .
# → listening on :8070

# 3. Frontend (separate terminal)
cd frontend
npm install
npm run build      # produces static export in ./out
# serve ./out with any static server, or let the Go backend serve it

Option 3 — Demo Mode (no real hosts required)

  1. Run the platform
  2. Open Settings and add a host with IP demo
  3. Click Scan — realistic mock infrastructure is generated (2 servers, 1 hypervisor, ~40 inventory rows)

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       Next.js 16 (UI)                           │
│  Dashboard · Hosts · Topology · Activity · Alerts · Settings    │
│   ▲                                                             │
│   │  REST /api/*  (JWT Bearer)                                  │
│   │  WS   /ws?token=…  (events: scan, change, alert)            │
└───┼─────────────────────────────────────────────────────────────┘
    │
┌───▼─────────────────────────────────────────────────────────────┐
│                   Go backend (Gin + GORM)                       │
│                                                                 │
│  auth (JWT+RBAC)  │  crypto (AES-256-GCM)  │  ws (gorilla)      │
│  scanner (SSH/WinRM/Hyper-V/SNMP)  │  diff (snapshot engine)    │
│  alerts (Telegram/Discord/Email/Webhook)  │  docs (Markdown)    │
└──┬───────────────┬────────────────┬────────────────┬────────────┘
   │               │                │                │
   ▼               ▼                ▼                ▼
┌──────┐       ┌──────────┐    ┌──────────┐    ┌──────────┐
│ SSH  │       │  WinRM   │    │  SNMP    │    │ SQLite / │
│      │       │  (PS)    │    │  v2c/v3  │    │ Postgres │
└──┬───┘       └─────┬────┘    └─────┬────┘    └──────────┘
   ▼                 ▼               ▼
[Linux/Windows    [Windows VMs]  [Network gear
  hosts]                          & appliances]

Protocol strategy:

  1. Primary: SSH (OpenSSH on Windows / native on Linux)
  2. Fallback: WinRM (HTTP 5985 / HTTPS 5986)
  3. Hyper-V: PowerShell over WinRM → Get-VM / Get-VMNetworkAdapter
  4. Network gear: SNMP v2c / v3 via gosnmp
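The SSH-first, WinRM-second order can be expressed as a chain of collectors tried in sequence. This is a sketch of the pattern only: the `Collector` interface, `stubCollector`, and `collectWithFallback` are hypothetical names, and the stubs stand in for real SSH and WinRM clients.

```go
package main

import (
	"errors"
	"fmt"
)

// Collector is a hypothetical abstraction over one discovery protocol.
type Collector interface {
	Name() string
	Collect(host string) (string, error)
}

// stubCollector fakes a protocol client so the sketch runs without a network.
type stubCollector struct {
	name string
	out  string
	err  error
}

func (s stubCollector) Name() string                        { return s.name }
func (s stubCollector) Collect(host string) (string, error) { return s.out, s.err }

var errUnreachable = errors.New("connection refused")

// collectWithFallback tries each collector in order and returns the first
// success together with the protocol that produced it.
func collectWithFallback(host string, chain ...Collector) (protocol, result string, err error) {
	if len(chain) == 0 {
		return "", "", errors.New("no collectors configured")
	}
	var failures []error
	for _, c := range chain {
		out, cerr := c.Collect(host)
		if cerr == nil {
			return c.Name(), out, nil
		}
		failures = append(failures, fmt.Errorf("%s: %w", c.Name(), cerr))
	}
	return "", "", errors.Join(failures...)
}

func main() {
	ssh := stubCollector{name: "ssh", err: errUnreachable}
	winrm := stubCollector{name: "winrm", out: "Windows Server 2022"}
	protocol, result, err := collectWithFallback("10.0.0.10", ssh, winrm)
	fmt.Println(protocol, result, err)
}
```

Joining the per-protocol errors keeps the reason for every failed attempt available when no protocol succeeds.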

📂 Project Structure

.
├── backend/                       # Go 1.25
│   ├── api/                       # REST handlers (Gin)
│   │   ├── handlers.go            # Hosts / credentials / hypervisors / docs
│   │   ├── auth_handlers.go       # JWT login / me / users
│   │   ├── alert_handlers.go      # Channels / rules / events
│   │   ├── change_handlers.go     # Change history / snapshots / stats
│   │   └── network_handlers.go    # Network gear deep-dive
│   ├── auth/auth.go               # JWT + RBAC middleware
│   ├── crypto/crypto.go           # AES-256-GCM service
│   ├── ws/ws.go                   # WebSocket hub (gorilla/websocket)
│   ├── alerts/alerts.go           # Rule engine + dispatchers
│   ├── db/db.go                   # GORM bootstrap
│   ├── models/models.go           # 22 entities
│   ├── scanner/                   # Discovery engines
│   │   ├── scheduler.go           # 30-min background loop + diff/alerts
│   │   ├── ssh_collector.go       # Linux + Windows-over-SSH
│   │   ├── winrm_collector.go
│   │   ├── hyperv_collector.go
│   │   ├── snmp_collector.go      # Network gear via gosnmp
│   │   ├── snmp_deep.go           # Deep 7-MIB walk (LLDP/CDP/MAC/VLAN/ports)
│   │   ├── diff.go                # Snapshot diff engine
│   │   ├── demo.go                # Mock data
│   │   └── scanner_test.go
│   ├── main.go                    # Entry point, seeds admin, starts alert loop
│   ├── go.mod
│   └── go.sum
├── frontend/                      # Next.js 16 + React 19 + Tailwind v4
│   └── src/
│       ├── app/
│       │   ├── layout.tsx         # Sidebar + topbar shell
│       │   ├── page.tsx           # Dashboard
│       │   ├── hosts/             # Host inventory detail
│       │   ├── topology/          # React Flow graph
│       │   ├── network/           # Network gear deep-dive
│       │   ├── changes/           # Activity feed
│       │   ├── alerts/            # Channel/rule/event manager
│       │   ├── settings/          # Credentials & hypervisors
│       │   ├── login/             # JWT login page
│       │   └── TopBar.tsx         # Auth-aware header
│       └── lib/
│           ├── api.ts             # fetch wrapper with JWT
│           └── ws.ts              # useWebSocket hook
├── docker/docker-compose.yml      # PostgreSQL 15
├── scripts/
│   ├── install.sh                 # Linux one-click installer
│   └── install.ps1                # Windows one-click installer
├── implementation_plan.md         # Architectural decisions (UA)
├── Infrastructure.pdf             # Product vision & roadmap
├── LICENSE                        # MIT
└── README.md                      # ← you are here

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Go 1.25, Gin, GORM, gorilla/websocket, gosnmp, golang-jwt, bcrypt, robfig/cron |
| Frontend | Next.js 16, React 19, Tailwind CSS v4, React Flow (@xyflow/react), lucide |
| Database | SQLite (default) or PostgreSQL 15 (Docker) |
| Protocols | SSH, WinRM, Hyper-V, SNMP v2c/v3, Prometheus HTTP, Veeam, Bacula, restic |
| Security | AES-256-GCM credentials, JWT (HS256), bcrypt, RBAC |
| Packaging | Single static binary serving embedded Next.js export |

⚙️ Configuration

Backend reads from environment variables (with sensible defaults):

| Variable | Default | Description |
| --- | --- | --- |
| PORT | 8070 | HTTP listen port |
| DB_TYPE | sqlite | sqlite or postgres |
| DB_PATH | autodoc.db | SQLite file path |
| DB_HOST | localhost | PostgreSQL host |
| DB_PORT | 5432 | PostgreSQL port |
| DB_USER | autodoc_user | PostgreSQL user |
| DB_PASSWORD | autodoc_pass | PostgreSQL password |
| DB_NAME | autodoc | PostgreSQL database name |
| GIN_MODE | debug | Set to release in production |
| JWT_SECRET | dev fallback | Set in production! HS256 secret |
| ENCRYPTION_KEY | auto-generated .autodoc.key | base64-encoded 32-byte AES key |
| OLLAMA_URL | http://localhost:11434 | AI Assistant base URL (any OpenAI-compatible) |
| OLLAMA_MODEL | llama3.2 | Default chat model |
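Reading these variables with defaults is a small amount of Go. The sketch below covers a few of them using the defaults listed above; the `Config` struct and `loadConfig` function are hypothetical and may not match how the backend actually loads its settings.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds a hypothetical subset of the backend settings.
type Config struct {
	Port    string
	DBType  string
	DBPath  string
	GinMode string
}

// loadConfig reads settings through lookup (normally os.LookupEnv) and
// falls back to the documented defaults when a variable is unset or empty.
func loadConfig(lookup func(string) (string, bool)) Config {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	return Config{
		Port:    get("PORT", "8070"),
		DBType:  get("DB_TYPE", "sqlite"),
		DBPath:  get("DB_PATH", "autodoc.db"),
		GinMode: get("GIN_MODE", "debug"),
	}
}

func main() {
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("listening on :%s using %s\n", cfg.Port, cfg.DBType)
}
```

Passing the lookup function as a parameter keeps the loader easy to test without mutating the process environment.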

🔌 REST API (abridged)

# Auth
POST   /api/auth/login                 # { username, password } → { token }
GET    /api/auth/me                    # current user

# Credentials
GET    /api/credentials
POST   /api/credentials
DELETE /api/credentials/:id

# Hypervisors
GET    /api/hypervisors
POST   /api/hypervisors
DELETE /api/hypervisors/:id
POST   /api/hypervisors/:id/scan

# Hosts
GET    /api/hosts
POST   /api/hosts
GET    /api/hosts/:id
DELETE /api/hosts/:id
POST   /api/hosts/:id/scan

# Topology & docs
GET    /api/topology
GET    /api/docs/generate              # Markdown report (download)

# Changes
GET    /api/changes?host_id=&severity=&limit=
GET    /api/snapshots?host_id=
GET    /api/changes/stats              # grouped counts

# Alerts
GET    /api/alerts/channels
POST   /api/alerts/channels
DELETE /api/alerts/channels/:id
GET    /api/alerts/rules
POST   /api/alerts/rules
DELETE /api/alerts/rules/:id
GET    /api/alerts/events?limit=50

# Analytics
GET    /api/analytics/summary
GET    /api/analytics/software/publishers
GET    /api/analytics/software/names
GET    /api/analytics/os
GET    /api/analytics/host-status
GET    /api/analytics/top-ports
GET    /api/analytics/cert-expiry
GET    /api/analytics/disk-leaderboard

# Reports
GET    /api/reports/schedules
POST   /api/reports/schedules
PUT    /api/reports/schedules/:id
DELETE /api/reports/schedules/:id
POST   /api/reports/schedules/:id/run
GET    /api/reports/runs

# Backups
GET    /api/backups/jobs
POST   /api/backups/jobs
DELETE /api/backups/jobs/:id
POST   /api/backups/jobs/:id/scan

# External monitoring
GET    /api/monitoring/sources
POST   /api/monitoring/sources
DELETE /api/monitoring/sources/:id
GET    /api/monitoring/sources/:id/series
GET    /api/monitoring/host/:id

# AI (Ollama-compatible local LLM)
GET    /api/ai/status                   # daemon reachable? + available models
POST   /api/ai/ask                      # { question } → { answer }
POST   /api/ai/rca                      # { host_id, alert } → { answer }

# Network gear (SNMP deep data)
GET    /api/network/devices             # list network hosts
GET    /api/network/devices/:id         # full detail: neighbors, MACs, VLANs, ports, summary
POST   /api/network/devices/:id/scan    # [admin] trigger re-scan
GET    /api/network/topology            # all LLDP/CDP edges across devices

# Real-time
WS     /ws?token=<jwt>                 # event stream
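A client talks to these endpoints by sending the token from `/api/auth/login` as a Bearer header, or as the `token` query parameter for the WebSocket. The sketch below only builds the request and the WebSocket URL without sending anything; `newAPIRequest` and `wsURL` are hypothetical helper names, and the token value is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newAPIRequest builds an authenticated request for a REST endpoint.
func newAPIRequest(method, baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, strings.TrimRight(baseURL, "/")+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

// wsURL derives the WebSocket endpoint (/ws?token=...) from the HTTP base URL.
func wsURL(baseURL, token string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "http":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss"
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	u.Path = "/ws"
	u.RawQuery = url.Values{"token": {token}}.Encode()
	return u.String(), nil
}

func main() {
	token := "example-token" // placeholder; a real token comes from POST /api/auth/login
	req, err := newAPIRequest(http.MethodGet, "http://localhost:8070", "/api/hosts", token)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())

	ws, err := wsURL("http://localhost:8070", token)
	if err != nil {
		panic(err)
	}
	fmt.Println(ws)
}
```

To actually send the request, pass it to `http.DefaultClient.Do` and decode the JSON body.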

🧪 Testing

cd backend
go test ./...

The included test suite covers parsing of df, ss, and docker ps outputs.
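For a sense of what such a parser involves, the sketch below turns `df -P` output into structured rows. It is not the project's parser: the `Disk` type, the `parseDF` function, and the assumption that the collector runs `df -P` (POSIX column layout) are all illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Disk is a hypothetical row of parsed df output.
type Disk struct {
	Filesystem string
	SizeKB     int64
	UsedKB     int64
	UsePercent int
	Mount      string
}

// parseDF parses the output of `df -P`, whose columns are:
// Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
func parseDF(out string) ([]Disk, error) {
	var disks []Disk
	lines := strings.Split(strings.TrimSpace(out), "\n")
	for i, line := range lines {
		if i == 0 {
			continue // header row
		}
		f := strings.Fields(line)
		if len(f) < 6 {
			continue // skip blank or truncated lines
		}
		size, err := strconv.ParseInt(f[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("line %d: bad size %q", i+1, f[1])
		}
		used, err := strconv.ParseInt(f[2], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("line %d: bad used value %q", i+1, f[2])
		}
		pct, err := strconv.Atoi(strings.TrimSuffix(f[4], "%"))
		if err != nil {
			return nil, fmt.Errorf("line %d: bad capacity %q", i+1, f[4])
		}
		disks = append(disks, Disk{
			Filesystem: f[0],
			SizeKB:     size,
			UsedKB:     used,
			UsePercent: pct,
			Mount:      strings.Join(f[5:], " "), // mount points may contain spaces
		})
	}
	return disks, nil
}

func main() {
	sample := "Filesystem 1024-blocks Used Available Capacity Mounted on\n" +
		"/dev/sda1 41152736 8226304 31043232 21% /\n"
	disks, err := parseDF(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", disks)
}
```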


🛣️ Roadmap

| Version | Theme | Status | Released |
| --- | --- | --- | --- |
| v1 | MVP: inventory & docs | ✅ Shipped | f786d4f |
| v2 | Topology, alerts, security | ✅ Shipped | 76f0aca |
| v3 | Monitoring, backups, AI | ✅ Shipped | 0c62628 |
| v4 | Network gear deep-dive | ✅ Shipped | b5f185b |
| v5+ | Cloud, K8s, multi-tenant | 💭 Exploring | |

🔒 Security notes

  • The default admin/admin user is seeded on first run for convenience. Change it immediately in any non-toy deployment.
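  • JWT_SECRET falls back to a development value, and ENCRYPTION_KEY falls back to an auto-generated key file. In production, set JWT_SECRET to a random value of at least 32 bytes and ENCRYPTION_KEY to a base64-encoded 32-byte key.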
  • The encryption key file .autodoc.key is created with 0600 permissions and stored next to the binary. Mount it on a persistent, restricted volume.
  • All API endpoints (except POST /api/auth/login and the WS handshake) require a valid JWT.

🤝 Contributing

PRs and issues are welcome. For major changes please open an issue first to discuss the design.

📄 License

MIT — see LICENSE.


Autonomous Infrastructure Engine Development Plan

Phase 0: Baseline

Key focus areas:

  • Discovery: Kubernetes (Pods, Services, Ingress) and Hypervisors.
  • Self-Healing: Automatic remediation of Pods and cluster drift.
  • Security: Monitoring for the presence of privileged containers.
  • Deep Investigation: Detailed analysis of host-level changes and events.
  • Cleanup: Automated removal of "orphaned" resources (Drift Removal).
  • RBAC: Fine-grained access control (Admin/Operator/Viewer).

Phase 1: Predictive Analytics (The "Oracle" Module)

Goal: Forecasting resource exhaustion ("Disk will be full soon") and identifying trends ("Usage growth rate is increasing").

Key Features:

  1. Data Ingestion (Time-Series):
    • Collecting metrics (CPU, RAM, Disk, Network) from all discovered hosts (MetricsHistory).
    • Periodic worker for aggregating and storing historical data.
  2. Trend Analysis Engine:
    • Go-based regression analysis to estimate the growth rate.
    • Predicting "Time-to-Exhaustion" (e.g., "Disk will be full in 48h").
  3. Proactive Alerting:
    • Calculating a "Risk Score" for every host/service.
    • WebSocket push for real-time alerts: "Warning: Disk on Node X will be full in 48h".
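The "Time-to-Exhaustion" idea can be illustrated with an ordinary least-squares line fitted to recent usage samples. This is only a sketch of one possible approach: the plan does not specify the regression model, and the `Sample` type, `fitLine`, and `hoursToExhaustion` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Sample is one metric reading: hours since some origin and disk usage in percent.
type Sample struct {
	Hour    float64
	UsedPct float64
}

// fitLine returns the least-squares slope (percent per hour) and intercept.
func fitLine(samples []Sample) (slope, intercept float64, err error) {
	n := float64(len(samples))
	var sumX, sumY, sumXY, sumXX float64
	for _, s := range samples {
		sumX += s.Hour
		sumY += s.UsedPct
		sumXY += s.Hour * s.UsedPct
		sumXX += s.Hour * s.Hour
	}
	denom := n*sumXX - sumX*sumX
	if len(samples) < 2 || denom == 0 {
		return 0, 0, errors.New("need at least two samples at distinct times")
	}
	slope = (n*sumXY - sumX*sumY) / denom
	intercept = (sumY - slope*sumX) / n
	return slope, intercept, nil
}

// hoursToExhaustion estimates how many hours after the latest sample the
// fitted line reaches limit. ok is false when usage is flat or shrinking.
func hoursToExhaustion(samples []Sample, limit float64) (hours float64, ok bool, err error) {
	slope, intercept, err := fitLine(samples)
	if err != nil {
		return 0, false, err
	}
	if slope <= 0 {
		return 0, false, nil
	}
	last := samples[0].Hour
	for _, s := range samples {
		if s.Hour > last {
			last = s.Hour
		}
	}
	remaining := (limit-intercept)/slope - last
	if remaining < 0 {
		remaining = 0 // the trend line is already past the limit
	}
	return remaining, true, nil
}

func main() {
	samples := []Sample{{0, 50}, {1, 51}, {2, 52}}
	hours, ok, err := hoursToExhaustion(samples, 100)
	if err != nil {
		panic(err)
	}
	if ok {
		fmt.Printf("Disk will be full in %.0fh\n", hours)
	}
}
```

With usage growing one percentage point per hour from 50%, the fitted line reaches 100% at hour 50, which is 48 hours after the last sample at hour 2.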

Phase 2: Integrity & Compliance (The "Guardian" Module)

Goal: Ensuring infrastructure state matches desired configuration (GitOps-driven).

Key Features:

  1. IaC Drift Detection:
    • Comparing Live State (from K8sCollector/Hyper-V) against Desired State (JSON/YAML/Helm/Terraform).
    • Detecting changes in ReplicasCount, ConfigMaps, or labels.
  2. Compliance Auditing:
    • Checking infrastructure against security benchmarks (NIST, CIS).
    • Generating a "Compliance Score" per cluster/host.
  3. Auto-Remediation (Advanced):
    • Triggering "Rollback" or "Re-apply" actions when drift is detected.
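Comparing live state against desired state reduces, in its simplest form, to comparing two maps of workload name to replica count. The sketch below classifies the differences; the `Drift` type, the kind labels, and `detectDrift` are hypothetical and ignore ConfigMaps and labels.

```go
package main

import (
	"fmt"
	"sort"
)

// Drift describes one mismatch between desired and live state.
// Hypothetical type for this sketch.
type Drift struct {
	Workload string
	Kind     string // "replica_drift", "missing", or "orphaned"
	Desired  int
	Live     int
}

// detectDrift compares desired replica counts with what is running.
func detectDrift(desired, live map[string]int) []Drift {
	var drifts []Drift
	for name, want := range desired {
		have, ok := live[name]
		switch {
		case !ok:
			drifts = append(drifts, Drift{Workload: name, Kind: "missing", Desired: want})
		case have != want:
			drifts = append(drifts, Drift{Workload: name, Kind: "replica_drift", Desired: want, Live: have})
		}
	}
	for name, have := range live {
		if _, ok := desired[name]; !ok {
			drifts = append(drifts, Drift{Workload: name, Kind: "orphaned", Live: have})
		}
	}
	// Sort by workload name so reports are stable between runs.
	sort.Slice(drifts, func(i, j int) bool { return drifts[i].Workload < drifts[j].Workload })
	return drifts
}

func main() {
	desired := map[string]int{"api": 3, "web": 2, "worker": 1}
	live := map[string]int{"api": 3, "web": 1, "cron": 1}
	for _, d := range detectDrift(desired, live) {
		fmt.Printf("%-7s %-14s desired=%d live=%d\n", d.Workload, d.Kind, d.Desired, d.Live)
	}
}
```

Each drift kind maps naturally to a remediation: re-apply for `missing`, scale for `replica_drift`, and delete for `orphaned`.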

Phase 3: Global Orchestration (The "Navigator" Module)

Goal: Managing distributed infrastructure across multiple clouds and regions.

Key Features:

  1. Multi-Cloud Integration:
    • Unified provider support for AWS (EKS), GCP (GKE), and Azure (AKS).
  2. Cross-Cluster Workload Management:
    • Global Load Balancing and workload migration between clusters.
  3. Unified Dashboard:
    • Single pane of glass for all clusters, regions, and clouds.

Technologies (Preview)

  • Analytics: Go-based Regression Models.
  • Database: PostgreSQL + TimescaleDB (for high-scale metrics storage).
  • Config/State: Git (GitOps pattern) + YAML/JSON.
  • Cloud SDKs: AWS SDK for Go, Google Cloud SDK.

How to start?

Start with Phase 1 (Predictive Analytics), integrating with the existing K8sCollector to build the foundation for "Trend Analyzer".

Projected Focus: "Trend Analyzer"