Merged
60 commits
dcd50a9
feat(docker): add .dockerignore and update Dockerfile for improved bu…
teransarathchandra Mar 5, 2026
7d405b6
feat: Minor Changes in Dockerfile
teransarathchandra Mar 6, 2026
23199c9
feat(drills): enhance MigrateServiceAction with schedulerName handlin…
teransarathchandra Mar 6, 2026
ba8243d
feat: Fix Deployment Flow
teransarathchandra Mar 6, 2026
076e90e
feat: Minor Changes
teransarathchandra Mar 6, 2026
bf67fd2
feat(predictive): add PredictiveCurrentActionHandler and evaluator fo…
teransarathchandra Mar 6, 2026
b0facb1
feat(simulation): add SharedHostResources support for resource dedupl…
teransarathchandra Mar 6, 2026
2eb45f7
feat: Minor Changes in Docker and sqlite.
teransarathchandra Mar 7, 2026
461a7b6
feat: Minor Improvements
teransarathchandra Mar 7, 2026
6605dc9
feat: Fix Minor Issues in Drills
teransarathchandra Mar 7, 2026
be7f4f3
feat: Add run metadata fields for scenarioId, validationStatus, rollb…
teransarathchandra Mar 7, 2026
1eeee12
feat: migration is generated and applies cleanly in local/dev environ…
teransarathchandra Mar 7, 2026
013d86e
feat: Existing run records remain readable after migration
teransarathchandra Mar 7, 2026
f376dd1
feat: Typecheck passes
teransarathchandra Mar 7, 2026
e2e2088
feat: Scenario Catalog API returns scenarios in stable order.
teransarathchandra Mar 7, 2026
1781d13
feat: Response is marked non-cacheable (Cache-Control: no-store)
teransarathchandra Mar 7, 2026
5944510
feat: Typecheck passes
teransarathchandra Mar 7, 2026
4812f29
feat: Add endpoint returning snapshot fields for VM state, backend me…
teransarathchandra Mar 7, 2026
14edea3
feat: Endpoint includes snapshot timestamp and source timestamps for …
teransarathchandra Mar 7, 2026
3ebd64f
feat: Endpoint bypasses cached responses
teransarathchandra Mar 7, 2026
5c45508
feat: Typecheck passes
teransarathchandra Mar 7, 2026
2afb0aa
feat: Add comparison output with per-layer status for VM, API, UI met…
teransarathchandra Mar 7, 2026
7c1d7f4
feat: Comparison output includes explicit field-level mismatches (met…
teransarathchandra Mar 7, 2026
077d455
feat: Comparison output includes overall scenario verdict and reason …
teransarathchandra Mar 7, 2026
a185a75
feat: Typecheck passes
teransarathchandra Mar 7, 2026
6734175
feat: Run state records rollback verification timestamp and operator/…
teransarathchandra Mar 7, 2026
9916e5f
feat: Attempting to start next scenario without rollback returns a cl…
teransarathchandra Mar 7, 2026
f676629
feat: Typecheck passes
teransarathchandra Mar 7, 2026
5cffae4
feat: Enhance recovery source inference to prioritize persisted rollb…
teransarathchandra Mar 7, 2026
13d99d3
feat: Add expected outcome metadata for scenarios and enhance tests f…
teransarathchandra Mar 7, 2026
03df370
feat: Add tests for drill execution and rollback transition gate enfo…
teransarathchandra Mar 7, 2026
d4676bf
feat: Typecheck passes
teransarathchandra Mar 7, 2026
6191d68
feat: Add namespace handling and configuration for API requests
teransarathchandra Mar 7, 2026
7f39b58
feat: define versioned simulation scenario request contract (US-001)
teransarathchandra Mar 8, 2026
71d8479
feat: US-002 define canonical simulation response contract
teransarathchandra Mar 8, 2026
b2575ea
feat: define evidence labels, modes, and confidence rubric (US-003)
teransarathchandra Mar 8, 2026
7e3bf0d
feat: US-004 build immutable snapshot composer
teransarathchandra Mar 8, 2026
bdd2352
feat: implement tiered evidence resolver (US-005)
teransarathchandra Mar 8, 2026
051f922
feat: implement deterministic simulation execution core (US-006)
teransarathchandra Mar 8, 2026
084469c
feat: implement Failure / Service Shutdown scenario model (US-007)
teransarathchandra Mar 8, 2026
283310c
feat: implement Scaling up/down scenario model (US-008)
teransarathchandra Mar 8, 2026
6d0c374
feat: implement Traffic Spike / targeted load scenario model (US-009)
teransarathchandra Mar 8, 2026
5ba4eb3
feat: implement chatty-service co-location/migration scenario model (…
teransarathchandra Mar 8, 2026
60c0de8
feat: implement Network Cut / degradation scenario model (US-011)
teransarathchandra Mar 8, 2026
9e22677
feat: Add weak-scenario defer/remove guardrails
teransarathchandra Mar 8, 2026
d337374
feat: Add recommendation traceability fields
teransarathchandra Mar 8, 2026
c7646e0
feat: US-020 validate Failure/Service Shutdown scenario on real VM to…
teransarathchandra Mar 8, 2026
92f7643
feat: validate scaling up/down scenario on real VMs (US-021)
teransarathchandra Mar 8, 2026
87f897d
feat: US-022 validate Traffic Spike scenario on real VM topology
teransarathchandra Mar 8, 2026
3a303a8
feat: US-023 validate chatty-service co-location/migration scenario o…
teransarathchandra Mar 8, 2026
74595ca
feat: validate network cut/degradation scenario on real VMs (US-024)
teransarathchandra Mar 8, 2026
c12be23
feat: US-025 end-to-end degraded-mode and traceability validation
teransarathchandra Mar 8, 2026
8c7f9f5
feat: add /simulations/run endpoint for simulation execution
teransarathchandra Mar 8, 2026
94881c7
feat: ensure InfluxDB database exists during client initialization
teransarathchandra Mar 9, 2026
26dfb57
feat: add endpoint to verify drill rollback and handle missing drill …
teransarathchandra Mar 9, 2026
9a83457
feat: enhance simulation handler to log decisions and include decisio…
teransarathchandra Mar 9, 2026
9ea212d
Add simulation tests and enhance AddSimulationRequest structure
teransarathchandra Mar 9, 2026
d489bb9
feat: Minor Changes
teransarathchandra Mar 9, 2026
aa1e4bb
feat: Minor Changes
teransarathchandra Mar 9, 2026
fcd3d46
Minor Changes
teransarathchandra Mar 10, 2026
34 changes: 34 additions & 0 deletions .dockerignore
@@ -0,0 +1,34 @@
# Secrets & config
.env
.env.*
!.env.example
.drills-kubeconfig

# Built binaries
bin/
analysis-engine
!analysis-engine/
*.exe
*.dll
*.so
*.dylib

# SQLite runtime data
data/*.db
data/*.db-wal
data/*.db-shm

# VCS & IDE
.git/
.gitignore
.vscode/
.idea/
.DS_Store

# Dev artifacts
Makefile
README.md
*.pem
*.test
*.out
docs/docs.go
16 changes: 15 additions & 1 deletion .env.example
@@ -1,6 +1,8 @@
# Graph Engine Service API
SERVICE_GRAPH_ENGINE_URL=http://localhost:3000
# GRAPH_ENGINE_BASE_URL=http://localhost:3000 # alternative name, takes precedence if set
GRAPH_API_TIMEOUT_MS=20000
OVERVIEW_NAMESPACE=default

# Simulation Parameters
DEFAULT_LATENCY_METRIC=p95
@@ -11,6 +13,10 @@ MIN_LATENCY_FACTOR=0.6
TIMEOUT_MS=20000
MAX_PATHS_RETURNED=10

# Set to true when cluster nodes share the same physical host (e.g. minikube docker driver).
# When false (default), each node is treated as having dedicated resources (AKS, VMs, etc.).
SHARED_HOST_RESOURCES=false

# Server Configuration
PORT=7000

@@ -22,9 +28,16 @@ INFLUX_HOST=http://localhost:8181
INFLUX_TOKEN=my-token
INFLUX_DATABASE=telemetry

# Rate Limiting
RATE_LIMIT_WINDOW_MS=60000
RATE_LIMIT_MAX=60

# SQLite Configuration (for decision logging)
SQLITE_DB_PATH=./data/decisions.db

# Telemetry Configuration
TELEMETRY_ENABLED=true

# Telemetry Worker Configuration
TELEMETRY_WORKER_ENABLED=true
# Poll interval: 10000ms = 10 seconds (faster updates for development)
@@ -35,7 +48,7 @@ TELEMETRY_POLL_INTERVAL_MS=10000
# and the PollWorker is disabled. Set to false to keep legacy polling behaviour.
WEBHOOK_ENABLED=true
# Shared secret for HMAC signature verification (must match service-graph-engine WEBHOOK_SECRET)
WEBHOOK_SECRET=be1c37b54c4fc71a3d2203836013e736f67966fa46eb534019ffbe1127239d40
WEBHOOK_SECRET=change-me-to-a-random-hex-string
# Shared secret used when forwarding graph webhooks to dashboard.
# If empty, WEBHOOK_SECRET is used for forwarding as a fallback.
WEBHOOK_FORWARD_SECRET=
@@ -56,6 +69,7 @@ WEBHOOK_ACCEPT_LEGACY_SIGNATURE=true
# Drill Director / Kubernetes execution (optional for local drill runs)
# If not set, the drill engine will try in-cluster config first, then default kubeconfig loading rules.
# DRILLS_KUBECONFIG_PATH=/absolute/path/to/kubeconfig
# DRILLS_KUBECONFIG=/absolute/path/to/kubeconfig # alternative name
# DRILLS_KUBE_CONTEXT=your-context
# DRILLS_KUBE_API_SERVER=https://your-cluster-api-server
# DRILLS_LOADGEN_DEPLOYMENT=loadgenerator
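The comments in this file describe two lookup rules: `GRAPH_ENGINE_BASE_URL` takes precedence over `SERVICE_GRAPH_ENGINE_URL` when set, and `WEBHOOK_FORWARD_SECRET` falls back to `WEBHOOK_SECRET` when empty. A minimal sketch of that "first non-empty wins" resolution follows. The helper names (`firstNonEmpty`, `graphEngineURL`, `forwardSecret`) are hypothetical and not taken from the PR, and the default URL is simply the value shown in `.env.example`; the actual logic in `pkg/config` may differ.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmpty returns the value of the first name for which lookup yields a
// non-empty string, or fallback when none is set.
// Hypothetical helper for illustration; not part of the PR.
func firstNonEmpty(lookup func(string) string, fallback string, names ...string) string {
	for _, name := range names {
		if v := lookup(name); v != "" {
			return v
		}
	}
	return fallback
}

// graphEngineURL prefers GRAPH_ENGINE_BASE_URL over SERVICE_GRAPH_ENGINE_URL,
// as the .env.example comment describes. The default is an assumption.
func graphEngineURL(lookup func(string) string) string {
	return firstNonEmpty(lookup, "http://localhost:3000",
		"GRAPH_ENGINE_BASE_URL", "SERVICE_GRAPH_ENGINE_URL")
}

// forwardSecret uses WEBHOOK_FORWARD_SECRET and falls back to WEBHOOK_SECRET
// when the forward secret is empty.
func forwardSecret(lookup func(string) string) string {
	return firstNonEmpty(lookup, "", "WEBHOOK_FORWARD_SECRET", "WEBHOOK_SECRET")
}

func main() {
	// os.Getenv has the required func(string) string shape.
	fmt.Println("graph engine URL:", graphEngineURL(os.Getenv))
	fmt.Println("forward secret set:", forwardSecret(os.Getenv) != "")
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the precedence rules testable with a plain map.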
51 changes: 12 additions & 39 deletions Dockerfile
@@ -1,54 +1,27 @@
# Build stage
FROM golang:1.22-alpine AS builder
# ---------- Build stage ----------
FROM golang:1.25 AS builder

WORKDIR /app

# Copy go mod and sum files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY cmd/ ./cmd/
COPY pkg/ ./pkg/
COPY . .

# Build the application
# CGO_ENABLED=1 is needed for go-sqlite3, which requires gcc.
# So we need to install build-base in alpine.
RUN apk add --no-cache build-base
RUN CGO_ENABLED=1 GOOS=linux go build -o predictive-analysis-engine ./cmd/server
RUN CGO_ENABLED=0 GOOS=linux \
go build -ldflags="-s -w" \
-o analysis-engine ./cmd/analysis-engine

# Production stage
FROM alpine:3.19

WORKDIR /app

# Create non-root user (matching Node Dockerfile)
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser

# Install runtime dependencies (sqlite libs if dynamic, but also wget for healthcheck)
# ca-certificates for HTTPS
RUN apk add --no-cache ca-certificates wget sqlite-libs
# ---------- Runtime stage ----------
FROM gcr.io/distroless/base-debian12

# Copy binary from builder
COPY --from=builder /app/predictive-analysis-engine .

# Create data directory for SQLite
RUN mkdir -p /app/data && \
chown -R appuser:appgroup /app/data

# Set ownership
RUN chown -R appuser:appgroup /app
WORKDIR /app

# Switch to non-root user
USER appuser
COPY --from=builder /app/analysis-engine /app/analysis-engine

# Expose port (default 5000)
EXPOSE 5000

# Health check (Parity with Node: wget -qO- http://localhost:${PORT:-5000}/health || exit 1)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD wget -qO- http://localhost:${PORT:-5000}/health || exit 1
USER nonroot:nonroot

# Start server
CMD ["./predictive-analysis-engine"]
CMD ["/app/analysis-engine"]
4 changes: 2 additions & 2 deletions Makefile
@@ -5,7 +5,7 @@ DOCKER_IMAGE=predictive-analysis-engine-go
PORT=5000

build:
go build -o $(BINARY_NAME) ./cmd/server
go build -o $(BINARY_NAME) ./cmd/analysis-engine

run: build
./$(BINARY_NAME)
@@ -27,7 +27,7 @@ docker-run:


swagger:
go run github.com/swaggo/swag/v2/cmd/swag@latest init -g cmd/server/main.go --output docs --v3.1
go run github.com/swaggo/swag/v2/cmd/swag@latest init -g cmd/analysis-engine/main.go --output docs --v3.1

swagger-check: swagger
if [ -n "$$(git status --porcelain docs)" ]; then \
46 changes: 29 additions & 17 deletions cmd/analysis-engine/main.go
@@ -2,6 +2,7 @@ package main

import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
@@ -19,9 +20,9 @@ import (
"predictive-analysis-engine/pkg/clients/telemetry"
"predictive-analysis-engine/pkg/config"
"predictive-analysis-engine/pkg/drills"
"predictive-analysis-engine/pkg/predictive"
"predictive-analysis-engine/pkg/simulation"
"predictive-analysis-engine/pkg/storage"
"predictive-analysis-engine/pkg/worker"
)

// @title Predictive Analysis Engine API
@@ -49,6 +50,7 @@ func main() {
if err != nil {
log.Fatalf("Failed to load config: %v", err)
}
config.Init(cfg)

log.Printf("Predictive Analysis Engine started on port %d", cfg.Server.Port)
log.Printf("Graph Engine URL: %s", cfg.GraphAPI.BaseURL)
@@ -65,7 +67,7 @@ func main() {

simService := simulation.NewService(cfg, graphClient, store)

apiHandler := api.NewHandler(cfg, graphClient, simService)
apiHandler := api.NewHandler(cfg, graphClient, simService, store)
decisionsHandler := &api.DecisionsHandler{Store: store}
telemetryHandler := &api.TelemetryHandler{Client: telemetryClient, Cfg: cfg}

@@ -82,7 +84,7 @@ func main() {
UsersEnvName: cfg.Drills.TargetedLoadUsersEnv,
},
})
drillsHandler := &api.DrillsHandler{Engine: drillEngine, Store: store}
drillsHandler := &api.DrillsHandler{Engine: drillEngine, Store: store, GraphClient: graphClient}

r := chi.NewRouter()

@@ -106,27 +108,41 @@ func main() {
r.Post("/simulate/add", apiHandler.SimulateAddHandler)
r.Get("/simulate/context", apiHandler.SimulateContextHandler)
r.Get("/simulations/capabilities", apiHandler.SimulationCapabilitiesHandler)
r.Post("/simulations/run", apiHandler.SimulationsRunHandler)
r.Get("/demo/snapshots", apiHandler.DemoSnapshotsHandler)
r.Get("/dependency-graph/snapshot", apiHandler.DependencyGraphHandler)
r.Get("/predictive/actions/current", apiHandler.PredictiveCurrentActionHandler)

decisionsHandler.RegisterRoutes(r)
drillsHandler.RegisterRoutes(r)
r.Mount("/telemetry", telemetryHandler.Routes())

// Webhook endpoint: receives graph updates from service-graph-engine
webhookHandler := api.NewWebhookHandler(cfg, telemetryClient, store)
// and triggers predictive analysis on each update
predEvaluator := predictive.NewEvaluator(graphClient)
webhookHandler := api.NewWebhookHandler(cfg, telemetryClient, store, predEvaluator)
r.Post("/webhook/graph-update", webhookHandler.HandleGraphUpdate)
r.Get("/webhook/status", webhookHandler.HandleWebhookStatus)
apiHandler.WebhookHandler = webhookHandler

// Only start PollWorker if webhook mode is disabled (fallback)
var pollWorker *worker.PollWorker
if !cfg.Webhook.Enabled {
log.Println("Webhook mode disabled - starting PollWorker for backward compatibility")
pollWorker = worker.NewPollWorker(cfg, graphClient, telemetryClient)
pollWorker.Start()
} else {
log.Println("Webhook mode enabled - PollWorker disabled (data pushed via POST /webhook/graph-update)")
}
// Runtime config reload endpoint
r.Post("/admin/reload-config", func(w http.ResponseWriter, r *http.Request) {
var body struct {
Env map[string]string `json:"env"`
}
_ = json.NewDecoder(r.Body).Decode(&body)
if err := config.ReloadWithOverrides("/etc/runtime-config/runtime.env", body.Env); err != nil {
log.Printf("[CONFIG] Reload failed: %v", err)
w.WriteHeader(http.StatusInternalServerError)
w.Write([]byte(fmt.Sprintf(`{"status":"error","message":"%s"}`, err.Error())))
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"status":"reloaded"}`))
})

log.Println("Webhook mode active - analysis triggered via POST /webhook/graph-update")

addr := fmt.Sprintf(":%d", cfg.Server.Port)
srv := &http.Server{
@@ -153,10 +169,6 @@ func main() {
log.Printf("Server forced to shutdown: %v", err)
}

if pollWorker != nil {
pollWorker.Stop()
}

telemetryClient.Close()

log.Println("Server exited")
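The `/admin/reload-config` handler added in this diff discards the JSON decode error, builds its error body with `fmt.Sprintf` (so an error message containing a double quote would produce invalid JSON), and sets `Content-Type` only on the success path. Below is a sketch of one possible hardened variant using only the standard library. `reloadFunc` is a hypothetical stand-in for `config.ReloadWithOverrides`, with its signature inferred from the call site above; `writeJSON`, `failWith`, and `probe` are illustrative helpers, not code from the PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// reloadFunc is a hypothetical stand-in for config.ReloadWithOverrides;
// the signature is inferred from the call site in the diff.
type reloadFunc func(path string, overrides map[string]string) error

// writeJSON sets Content-Type before writing the status line and lets
// encoding/json take care of escaping.
func writeJSON(w http.ResponseWriter, status int, payload map[string]string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(payload)
}

// reloadConfigHandler returns a handler that applies optional env overrides.
func reloadConfigHandler(path string, reload reloadFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var body struct {
			Env map[string]string `json:"env"`
		}
		// An empty body means "no overrides"; malformed JSON is rejected.
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil && !errors.Is(err, io.EOF) {
			writeJSON(w, http.StatusBadRequest,
				map[string]string{"status": "error", "message": "invalid JSON body"})
			return
		}
		if err := reload(path, body.Env); err != nil {
			writeJSON(w, http.StatusInternalServerError,
				map[string]string{"status": "error", "message": err.Error()})
			return
		}
		writeJSON(w, http.StatusOK, map[string]string{"status": "reloaded"})
	}
}

// failWith builds a reloadFunc that always fails with msg (illustration only).
func failWith(msg string) reloadFunc {
	return func(string, map[string]string) error { return errors.New(msg) }
}

// probe drives the handler in-process and returns the status code and body.
func probe(reload reloadFunc, body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/admin/reload-config", strings.NewReader(body))
	rec := httptest.NewRecorder()
	reloadConfigHandler("/etc/runtime-config/runtime.env", reload).ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, out := probe(failWith(`bad "quote"`), `{"env":{"PORT":"7000"}}`)
	fmt.Println(code, out)
}
```

Taking the reload function as a parameter is a choice made for this sketch so the handler can be exercised with `httptest` without touching the filesystem; in `main.go` the real `config.ReloadWithOverrides` could be passed in directly if its signature matches.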
12 changes: 10 additions & 2 deletions go.mod
@@ -7,17 +7,19 @@ require (
github.com/google/uuid v1.6.0
github.com/influxdata/influxdb-client-go/v2 v2.14.0
github.com/joho/godotenv v1.5.1
github.com/mattn/go-sqlite3 v1.14.33
github.com/swaggo/http-swagger/v2 v2.0.2
github.com/swaggo/swag/v2 v2.0.0-rc5
k8s.io/api v0.35.1
k8s.io/apimachinery v0.35.1
k8s.io/client-go v0.35.1
modernc.org/sqlite v1.46.1
)

require (
github.com/KyleBanks/depth v1.2.1 // indirect
github.com/apapsch/go-jsonmerge/v2 v2.0.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/emicklei/go-restful/v3 v3.12.2 // indirect
github.com/fxamacker/cbor/v2 v2.9.0 // indirect
github.com/go-logr/logr v1.4.3 // indirect
@@ -37,17 +39,21 @@ require (
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/ncruces/go-strftime v1.0.0 // indirect
github.com/oapi-codegen/runtime v1.0.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/spf13/pflag v1.0.9 // indirect
github.com/sv-tools/openapi v0.4.0 // indirect
github.com/swaggo/files/v2 v2.0.2 // indirect
github.com/swaggo/swag v1.16.6 // indirect
github.com/x448/float16 v0.8.4 // indirect
go.yaml.in/yaml/v2 v2.4.3 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
golang.org/x/mod v0.32.0 // indirect
golang.org/x/net v0.49.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
@@ -61,10 +67,12 @@ require (
gopkg.in/evanphx/json-patch.v4 v4.13.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/api v0.35.1 // indirect
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/kube-openapi v0.0.0-20250910181357-589584f1c912 // indirect
k8s.io/utils v0.0.0-20251002143259-bc988d571ff4 // indirect
modernc.org/libc v1.67.6 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
sigs.k8s.io/structured-merge-diff/v6 v6.3.0 // indirect