feat(simulation): introduce deterministic simulation framework, scenario models, and validation pipeline v2.0.0 by teransarathchandra · Pull Request #5 · GraphFoundry/analysis-engine

teransarathchandra · 2026-03-13T16:10:31Z

This PR introduces the core simulation framework and supporting infrastructure used for scenario execution, validation, and operational testing of VM environments.

…ication in SimulateAddService feat(predictive): enhance Evaluator with additional latency metrics and network pressure detection test(predictive): add tests for sustained traffic and latency spike scenarios in EvaluateFromSamples

Add SimulationRequest schema with all 5 locked scenario types, snapshot reference fields, stable validation error codes, and 22 tests covering valid/invalid cases and determinism. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add SimulationResponse schema with required fields (version, scenarioType, snapshotTimestamp, evidenceSources, evidenceMode, confidenceLevel, assumptions, impactedServices, impactedPaths, beforeAfterValues, recommendation), degraded-mode label fields, and ValidateSimulationResponse with stable error codes and 28 tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add EvidenceSourceLabel type with four constants: live_service_graph, live_k8s_runtime, historical_influxdb, deterministic_fallback
- Add ResolveEvidenceMode following mandatory tier order: live graph -> live runtime -> Influx history -> deterministic fallback
- Add DetermineConfidenceLevel rubric: FULL->HIGH, PARTIAL->MEDIUM, DEGRADED/FALLBACK->LOW (no random weighting)
- Add ResolveEvidenceSources, EvidenceModeToTierDescription helpers
- Add evidence_test.go with 18 tests covering all paths and determinism

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
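The tier order and confidence rubric described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the four label values and the FULL/PARTIAL/DEGRADED/FALLBACK to HIGH/MEDIUM/LOW rubric come from the commit message, while the `availability` input type, the lowercase function names, and especially the rule that maps source availability to an evidence mode are assumptions made for the example.

```go
package main

import "fmt"

// EvidenceSourceLabel values are taken from the commit message.
type EvidenceSourceLabel string

const (
	LiveServiceGraph      EvidenceSourceLabel = "live_service_graph"
	LiveK8sRuntime        EvidenceSourceLabel = "live_k8s_runtime"
	HistoricalInfluxDB    EvidenceSourceLabel = "historical_influxdb"
	DeterministicFallback EvidenceSourceLabel = "deterministic_fallback"
)

// availability is a hypothetical input shape: which tiers returned usable data.
type availability struct {
	Graph, Runtime, Influx bool
}

// resolveSources walks the tiers in the mandatory order
// (graph -> runtime -> Influx) and falls back only when nothing is available.
func resolveSources(a availability) []EvidenceSourceLabel {
	var out []EvidenceSourceLabel
	if a.Graph {
		out = append(out, LiveServiceGraph)
	}
	if a.Runtime {
		out = append(out, LiveK8sRuntime)
	}
	if a.Influx {
		out = append(out, HistoricalInfluxDB)
	}
	if len(out) == 0 {
		out = append(out, DeterministicFallback)
	}
	return out
}

// resolveMode maps availability to an evidence mode.
// ASSUMPTION: this mapping is illustrative; the PR does not spell it out.
func resolveMode(a availability) string {
	switch {
	case a.Graph && a.Runtime && a.Influx:
		return "FULL"
	case a.Graph && a.Runtime:
		return "PARTIAL"
	case a.Graph || a.Runtime || a.Influx:
		return "DEGRADED"
	default:
		return "FALLBACK"
	}
}

// confidenceLevel applies the rubric from the commit message with no
// randomness. Treating unknown modes as LOW is an assumption.
func confidenceLevel(mode string) string {
	switch mode {
	case "FULL":
		return "HIGH"
	case "PARTIAL":
		return "MEDIUM"
	default:
		return "LOW"
	}
}

func main() {
	a := availability{Graph: true, Runtime: true, Influx: false}
	mode := resolveMode(a)
	fmt.Println(resolveSources(a), mode, confidenceLevel(mode))
}
```

Because every branch is a pure function of its inputs, repeated calls with the same availability always yield the same sources, mode, and confidence level, which is the property the determinism tests check.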

Add snapshot.go with ComposeSnapshot/ComposeSnapshotAt that capture service graph and live Kubernetes runtime truth into a SHA-256-hashed, immutable SimulationSnapshot. All slices are deep-copied and canonically sorted before hashing so identical inputs always produce the same hash regardless of input order or call time. Add snapshot_test.go with 14 tests covering determinism, immutability, order-independence, pointer deep-copy, and hash format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
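A rough sketch of the copy, sort, then hash approach described above. The `snapshot` and `edge` types, their fields, and the `composeSnapshot` signature are hypothetical stand-ins; the real SimulationSnapshot carries more data and its exact hashing payload is not shown in this PR text.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// edge and snapshot are hypothetical, simplified shapes for illustration.
type edge struct {
	From string `json:"from"`
	To   string `json:"to"`
}

type snapshot struct {
	Services []string
	Edges    []edge
	Hash     string // hex-encoded SHA-256 of the canonical payload
}

// composeSnapshot copies its inputs, sorts them canonically, and hashes the
// sorted form, so input order and later caller mutations cannot change the hash.
func composeSnapshot(services []string, edges []edge) (snapshot, error) {
	s := append([]string(nil), services...) // copy so callers cannot mutate the snapshot
	e := append([]edge(nil), edges...)
	sort.Strings(s)
	sort.Slice(e, func(i, j int) bool {
		if e[i].From != e[j].From {
			return e[i].From < e[j].From
		}
		return e[i].To < e[j].To
	})
	payload, err := json.Marshal(struct {
		Services []string `json:"services"`
		Edges    []edge   `json:"edges"`
	}{s, e})
	if err != nil {
		return snapshot{}, err
	}
	sum := sha256.Sum256(payload)
	return snapshot{Services: s, Edges: e, Hash: hex.EncodeToString(sum[:])}, nil
}

func main() {
	a, _ := composeSnapshot([]string{"cart", "api"}, []edge{{"api", "cart"}})
	b, _ := composeSnapshot([]string{"api", "cart"}, []edge{{"api", "cart"}})
	fmt.Println(a.Hash == b.Hash, len(a.Hash))
}
```

Note that no timestamp enters the hashed payload in this sketch; that is one simple way to keep the hash independent of call time.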

Add EvidenceResolverInput, InfluxCheckResult, EvidenceResolverResult, ResolveEvidenceTiers, and ResolveEvidenceTiersFromSnapshot. Resolver follows mandatory tier order (graph->runtime->Influx->fallback), degrades gracefully on Influx unavailability/sparse/error, and never blocks simulation. 22 tests covering all degraded modes, determinism, and snapshot-based resolution. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add execution_core.go with BuildExecutionContext, BuildBaseResponse, NormalizeResponse, CanonicalizeResponse, and stable sort helpers (SortImpactedServices, SortImpactedPaths, SortBeforeAfterValues, SortAssumptions). Add execution_core_test.go with 20 tests verifying determinism, stable sorting, and byte-equal canonical JSON for identical inputs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
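The byte-equal canonical JSON property can be illustrated with a small sketch. The `response` and `beforeAfter` types below are simplified, hypothetical versions of the real schema, and `canonicalize` is not the PR's CanonicalizeResponse; it only shows the general idea of sorting every slice by a stable key before marshalling.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// beforeAfter and response are hypothetical, trimmed-down shapes.
type beforeAfter struct {
	Metric string  `json:"metric"`
	Before float64 `json:"before"`
	After  float64 `json:"after"`
}

type response struct {
	ImpactedServices  []string      `json:"impactedServices"`
	BeforeAfterValues []beforeAfter `json:"beforeAfterValues"`
}

// canonicalize sorts copies of every slice by a stable key and marshals the
// result. encoding/json emits struct fields in declaration order, so equal
// inputs produce identical bytes.
func canonicalize(r response) ([]byte, error) {
	c := response{
		ImpactedServices:  append([]string{}, r.ImpactedServices...),
		BeforeAfterValues: append([]beforeAfter{}, r.BeforeAfterValues...),
	}
	sort.Strings(c.ImpactedServices)
	sort.SliceStable(c.BeforeAfterValues, func(i, j int) bool {
		return c.BeforeAfterValues[i].Metric < c.BeforeAfterValues[j].Metric
	})
	return json.Marshal(c)
}

func main() {
	a, _ := canonicalize(response{
		ImpactedServices:  []string{"cart", "api"},
		BeforeAfterValues: []beforeAfter{{"rps", 200, 140}, {"error_rate", 0.01, 0.307}},
	})
	b, _ := canonicalize(response{
		ImpactedServices:  []string{"api", "cart"},
		BeforeAfterValues: []beforeAfter{{"error_rate", 0.01, 0.307}, {"rps", 200, 140}},
	})
	fmt.Println(bytes.Equal(a, b))
}
```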

Add RunFailureShutdownScenario using the immutable SimulationSnapshot to compute blast radius, impacted services/paths, deterministic before/after estimates, declared assumptions, and recommendation tied to evidence fields. Returns DEFERRED when target is absent from snapshot rather than guessing values. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
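One plausible way to compute a blast radius deterministically is a breadth-first walk over reverse dependencies, sketched below. The edge direction (caller to callee), the `result` type, and the choice to count only transitive callers as impacted are assumptions for illustration; the PR text does not describe the actual traversal. The DEFERRED behaviour for a target missing from the snapshot follows the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a hypothetical caller -> callee dependency.
type edge struct{ From, To string }

// result is a hypothetical, trimmed-down scenario outcome.
type result struct {
	Status   string   // "OK" or "DEFERRED"
	Impacted []string // transitive callers of the target, sorted
}

// failureShutdown returns the services that transitively call target.
// If target is not in the snapshot it returns DEFERRED with no values,
// rather than guessing.
func failureShutdown(services []string, edges []edge, target string) result {
	known := make(map[string]bool, len(services))
	for _, s := range services {
		known[s] = true
	}
	if !known[target] {
		return result{Status: "DEFERRED"}
	}
	callers := make(map[string][]string)
	for _, e := range edges {
		callers[e.To] = append(callers[e.To], e.From)
	}
	seen := map[string]bool{target: true}
	queue := []string{target}
	var impacted []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range callers[cur] {
			if !seen[c] {
				seen[c] = true
				impacted = append(impacted, c)
				queue = append(queue, c)
			}
		}
	}
	sort.Strings(impacted) // sorted output keeps results order-independent
	return result{Status: "OK", Impacted: impacted}
}

func main() {
	services := []string{"frontend", "checkout", "payments", "db"}
	edges := []edge{{"frontend", "checkout"}, {"checkout", "payments"}, {"payments", "db"}}
	fmt.Println(failureShutdown(services, edges, "payments"))
	fmt.Println(failureShutdown(services, edges, "ghost"))
}
```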

Adds RunNetworkCutScenario with deterministic full-cut and partial-degradation models. Each matched snapshot edge produces before/after BAVs for RPS, error rate, and latency (P95). Missing links return DEFERRED; partial link match proceeds with a note. Includes 15 tests covering DEFERRED, full cut, partial degradation, multi-link, determinism, evidence fields, and field-ref format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add scaling_vm_validation_test.go with 5 reproducible VM test cases covering scale-up (5→10 pods, approve_scale_up), caution scale-down (5→3 pods, caution_scale_down), determinism, degraded-mode without Influx, and a structured pass/fail validation report artifact. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Added traffic_spike_vm_validation_test.go with 5 test functions covering:
- Moderate spike (2×): full outcome assertions (roles, path sigs, incoming_rps, latency_p95_ms BAVs, monitor_and_prepare_rate_limits recommendation)
- High-severity spike (4×): pre_emptive_scale_up_required recommendation and BAVs
- Determinism: byte-equivalent canonical JSON across repeated identical runs
- Degraded-mode without Influx: OK status with non-none degraded mode label
- Structured validation report logged to test output as evidence artifact

All 5 tests pass; go build ./... and go test ./pkg/simulation/... both clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add network_cut_vm_validation_test.go with 4 reproducible VM test cases: full cut (RPS→0, error→1.0, latency omitted, failover recommendation), 30% partial degradation (RPS 200→140, error 0.307, latency 45→58.5ms, traffic-shaping recommendation), determinism check, and degraded-mode without InfluxDB. All criteria pass; `go test ./pkg/simulation/...` clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
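The commit message quotes the before/after figures but not the formulas behind them. One simple model that reproduces those figures is sketched below, under the loud assumption that the baseline error rate was 0.01 (it is not stated in the PR) and that a loss fraction scales RPS down, scales P95 latency up, and fails the lost share of otherwise-successful requests. The `linkMetrics` type and `degrade` function are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// linkMetrics is a hypothetical per-edge metric set.
type linkMetrics struct {
	RPS            float64
	ErrorRate      float64
	LatencyP95Ms   float64
	LatencyOmitted bool // set for a full cut, where latency is not reported
}

// degrade applies a loss fraction in [0, 1] to a link.
// ASSUMED model, chosen only because it matches the quoted numbers:
//   rps'     = rps * (1 - loss)
//   error'   = error + loss * (1 - error)
//   latency' = latency * (1 + loss)
// A loss of 1 or more is treated as a full cut.
func degrade(before linkMetrics, loss float64) linkMetrics {
	if loss >= 1 {
		return linkMetrics{RPS: 0, ErrorRate: 1.0, LatencyOmitted: true}
	}
	return linkMetrics{
		RPS:          before.RPS * (1 - loss),
		ErrorRate:    before.ErrorRate + loss*(1-before.ErrorRate),
		LatencyP95Ms: before.LatencyP95Ms * (1 + loss),
	}
}

// approxEqual compares floats with a small tolerance, since exact
// floating-point equality is fragile.
func approxEqual(a, b float64) bool {
	return math.Abs(a-b) < 1e-9
}

func main() {
	// Baseline error rate 0.01 is an assumption, not a value from the PR.
	before := linkMetrics{RPS: 200, ErrorRate: 0.01, LatencyP95Ms: 45}
	after := degrade(before, 0.30)
	fmt.Println(approxEqual(after.RPS, 140),
		approxEqual(after.ErrorRate, 0.307),
		approxEqual(after.LatencyP95Ms, 58.5))
}
```

Under this model a 30% partial degradation gives 200 × 0.7 = 140 RPS, 0.01 + 0.3 × 0.99 = 0.307 error rate, and 45 × 1.3 = 58.5 ms latency, matching the values in the test description; the real implementation may derive them differently.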

- Add e2e_degraded_traceability_test.go with 7 test functions covering all 3 ACs
- AC-1: verify DegradedMode label and EvidenceMode returned for empty/sparse InfluxDB
- AC-2: log 27-field UI→contract traceability checklist; assert all 23 required fields populated
- AC-3: confirm unknown scenarios rejected, fallback-only evidence deferred with no guessed values, EnforceDeferredConstraints strips synthetic output
- All 5 supported scenarios verified runnable in degraded mode without blocking
- go test ./pkg/simulation/... and go build ./... pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Introduced a new test file `add_test.go` with comprehensive tests for the SimulateAddService function, covering various scenarios including node feasibility, dependency risks, and shared host resource configurations.
- Enhanced the AddSimulationRequest struct to include TargetNodeName and improved the AddSimulationResult struct to provide more detailed results, including selected node information and aggregate resources.
- Added new types for dependency analysis and risk analysis to better encapsulate the results of service simulations.

teransarathchandra added 30 commits March 5, 2026 13:59

feat(docker): add .dockerignore and update Dockerfile for improved bu…

dcd50a9

…ild process feat(env): enhance .env.example with new rate limiting and telemetry configurations fix(makefile): correct build and swagger generation commands to use the proper main package

feat: Minor Changes in Dockerfile

7d405b6

feat(drills): enhance MigrateServiceAction with schedulerName handlin…

23199c9

…g and deployment readiness checks

feat: Fix Deployment Flow

ba8243d

feat: Minor Changes

076e90e

feat(predictive): add PredictiveCurrentActionHandler and evaluator fo…

bf67fd2

…r anomaly detection

feat: Minor Changes in Docker and sqlite.

2eb45f7

feat: Minor Improvements

461a7b6

feat: Fix Minor Issues in Drills

6605dc9

feat: Add run metadata fields for scenarioId, validationStatus, rollb…

be7f4f3

…ackVerifiedAt, and bannerVerified

feat: migration is generated and applies cleanly in local/dev environ…

1eeee12

…ment

feat: Existing run records remain readable after migration

013d86e

feat: Typecheck passes

f376dd1

feat: Scenario Catalog API returns scenarios in stable order.

e2e2088

feat: Response is marked non-cacheable (Cache-Control: no-store)

1781d13

feat: Typecheck passes

5944510

feat: Add endpoint returning snapshot fields for VM state, backend me…

4812f29

…trics, dashboard metrics source values, and graph summary for a given run

feat: Endpoint includes snapshot timestamp and source timestamps for …

14edea3

…each layer when available

feat: Endpoint bypasses cached responses

3ebd64f

feat: Typecheck passes

5c45508

feat: Add comparison output with per-layer status for VM, API, UI met…

2afb0aa

…rics, and graph

feat: Comparison output includes explicit field-level mismatches (met…

7c1d7f4

…ric name, expected value, actual value)

feat: Comparison output includes overall scenario verdict and reason …

077d455

…when failed

feat: Typecheck passes

a185a75

feat: Run state records rollback verification timestamp and operator/…

6734175

…system confirmation source

feat: Attempting to start next scenario without rollback returns a cl…

9916e5f

…ear API error

feat: Typecheck passes

f676629

feat: Enhance recovery source inference to prioritize persisted rollb…

5cffae4

…ack verification source

feat: Add expected outcome metadata for scenarios and enhance tests f…

13d99d3

…or validation

teransarathchandra and others added 28 commits March 7, 2026 21:28

feat: Add namespace handling and configuration for API requests

6191d68

feat: implement Scaling up/down scenario model (US-008)

283310c

feat: implement Traffic Spike / targeted load scenario model (US-009)

6d0c374

feat: implement chatty-service co-location/migration scenario model (…

5ba4eb3

…US-010) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: Add weak-scenario defer/remove guardrails

9e22677

feat: Add recommendation traceability fields

d337374

feat: US-020 validate Failure/Service Shutdown scenario on real VM to…

c7646e0

…pology

feat: US-023 validate chatty-service co-location/migration scenario o…

3a303a8

…n real VMs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add /simulations/run endpoint for simulation execution

8c7f9f5

feat: ensure InfluxDB database exists during client initialization

94881c7

feat: add endpoint to verify drill rollback and handle missing drill …

26dfb57

…run cases

feat: enhance simulation handler to log decisions and include decisio…

9a83457

…n store

feat: Minor Changes

d489bb9

feat: Minor Changes

aa1e4bb

Minor Changes

fcd3d46

teransarathchandra requested a review from DulangaMW March 13, 2026 16:10

prashan-s merged commit a70a491 into main Mar 14, 2026

prashan-s merged 60 commits into main from development
