
feat(simulation): introduce deterministic simulation framework, scenario models, and validation pipeline v2.0.0 #5

Merged: prashan-s merged 60 commits into main from development on Mar 14, 2026.


@teransarathchandra (Member)

This PR introduces the core simulation framework and supporting infrastructure used for scenario execution, validation, and operational testing of VM environments.

…ild process

feat(env): enhance .env.example with new rate limiting and telemetry configurations
fix(makefile): correct build and swagger generation commands to use the proper main package
…ication in SimulateAddService

feat(predictive): enhance Evaluator with additional latency metrics and network pressure detection
test(predictive): add tests for sustained traffic and latency spike scenarios in EvaluateFromSamples
…trics, dashboard metrics source values, and graph summary for a given run
teransarathchandra and others added 28 commits March 7, 2026 21:28
Add SimulationRequest schema with all 5 locked scenario types, snapshot
reference fields, stable validation error codes, and 22 tests covering
valid/invalid cases and determinism.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
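A minimal sketch of the validation behaviour described in this commit, assuming a request with only two fields: an unknown scenario type and a missing snapshot reference each map to a stable error code, and the codes are sorted so repeated runs return identical output. `ValidateRequest`, the error-code strings, and the scenario names in `main` are hypothetical; the PR's actual schema is not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical error codes; the PR's real code strings are not shown.
const (
	ErrUnknownScenarioType = "UNKNOWN_SCENARIO_TYPE"
	ErrMissingSnapshotRef  = "MISSING_SNAPSHOT_REF"
)

// SimulationRequest is an illustrative two-field subset of the schema.
type SimulationRequest struct {
	ScenarioType string
	SnapshotRef  string
}

// ValidateRequest collects every violation and sorts the codes, so the same
// invalid input always produces the same error list.
func ValidateRequest(req SimulationRequest, allowed map[string]bool) []string {
	var codes []string
	if !allowed[req.ScenarioType] {
		codes = append(codes, ErrUnknownScenarioType)
	}
	if req.SnapshotRef == "" {
		codes = append(codes, ErrMissingSnapshotRef)
	}
	sort.Strings(codes)
	return codes
}

func main() {
	// Placeholder names; the five locked scenario types are not all listed here.
	allowed := map[string]bool{"failure_shutdown": true, "network_cut": true}
	fmt.Println(ValidateRequest(SimulationRequest{ScenarioType: "unknown"}, allowed))
}
```

Collecting all violations before returning, rather than stopping at the first, keeps the error list complete and its order independent of which check runs first.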
Add SimulationResponse schema with required fields (version, scenarioType,
snapshotTimestamp, evidenceSources, evidenceMode, confidenceLevel, assumptions,
impactedServices, impactedPaths, beforeAfterValues, recommendation), degraded-mode
label fields, and ValidateSimulationResponse with stable error codes and 28 tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
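One way to check the required response fields listed above, sketched against a decoded JSON object rather than the PR's Go struct. The field names come from the commit message; `MissingRequiredFields` is a hypothetical helper, and treating an explicit `null` as missing is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredFields are the response fields listed in the commit message.
var requiredFields = []string{
	"version", "scenarioType", "snapshotTimestamp", "evidenceSources",
	"evidenceMode", "confidenceLevel", "assumptions", "impactedServices",
	"impactedPaths", "beforeAfterValues", "recommendation",
}

// MissingRequiredFields returns the required keys that are absent or null in a
// decoded JSON object, sorted so the error list is stable across runs.
func MissingRequiredFields(resp map[string]any) []string {
	var missing []string
	for _, f := range requiredFields {
		if v, ok := resp[f]; !ok || v == nil {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	resp := map[string]any{"version": "2.0.0", "scenarioType": "network_cut", "recommendation": nil}
	fmt.Println(MissingRequiredFields(resp))
}
```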
- Add EvidenceSourceLabel type with four constants: live_service_graph,
  live_k8s_runtime, historical_influxdb, deterministic_fallback
- Add ResolveEvidenceMode following mandatory tier order: live graph ->
  live runtime -> Influx history -> deterministic fallback
- Add DetermineConfidenceLevel rubric: FULL->HIGH, PARTIAL->MEDIUM,
  DEGRADED/FALLBACK->LOW (no random weighting)
- Add ResolveEvidenceSources, EvidenceModeToTierDescription helpers
- Add evidence_test.go with 18 tests covering all paths and determinism

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
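The tier order and confidence rubric can be sketched as follows. The four source labels and the FULL->HIGH, PARTIAL->MEDIUM, DEGRADED/FALLBACK->LOW rubric come from the commit; the `Availability` type and the count-based rule in `ResolveMode` are assumptions for illustration, since the commit does not say which tier combinations produce each mode.

```go
package main

import "fmt"

// EvidenceSourceLabel values match the four constants named in the commit.
type EvidenceSourceLabel string

const (
	LiveServiceGraph      EvidenceSourceLabel = "live_service_graph"
	LiveK8sRuntime        EvidenceSourceLabel = "live_k8s_runtime"
	HistoricalInfluxDB    EvidenceSourceLabel = "historical_influxdb"
	DeterministicFallback EvidenceSourceLabel = "deterministic_fallback"
)

// Availability is a hypothetical input describing which tiers responded.
type Availability struct {
	Graph, Runtime, Influx bool
}

// ResolveSources checks tiers in the fixed order graph -> runtime -> Influx and
// falls back to the deterministic tier only when nothing else is available.
func ResolveSources(a Availability) []EvidenceSourceLabel {
	var out []EvidenceSourceLabel
	if a.Graph {
		out = append(out, LiveServiceGraph)
	}
	if a.Runtime {
		out = append(out, LiveK8sRuntime)
	}
	if a.Influx {
		out = append(out, HistoricalInfluxDB)
	}
	if len(out) == 0 {
		out = append(out, DeterministicFallback)
	}
	return out
}

// ResolveMode uses an assumed count-based mapping from available tiers to mode.
func ResolveMode(a Availability) string {
	n := 0
	for _, ok := range []bool{a.Graph, a.Runtime, a.Influx} {
		if ok {
			n++
		}
	}
	switch n {
	case 3:
		return "FULL"
	case 2:
		return "PARTIAL"
	case 1:
		return "DEGRADED"
	default:
		return "FALLBACK"
	}
}

// ConfidenceFor applies the fixed rubric with no weighting or randomness.
func ConfidenceFor(mode string) string {
	switch mode {
	case "FULL":
		return "HIGH"
	case "PARTIAL":
		return "MEDIUM"
	default: // DEGRADED and FALLBACK
		return "LOW"
	}
}

func main() {
	a := Availability{Graph: true, Runtime: true}
	mode := ResolveMode(a)
	fmt.Println(ResolveSources(a), mode, ConfidenceFor(mode))
}
```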
Add snapshot.go with ComposeSnapshot/ComposeSnapshotAt that capture service
graph and live Kubernetes runtime truth into a SHA-256-hashed, immutable
SimulationSnapshot. All slices are deep-copied and canonically sorted before
hashing so identical inputs always produce the same hash regardless of input
order or call time. Add snapshot_test.go with 14 tests covering determinism,
immutability, order-independence, pointer deep-copy, and hash format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
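A minimal sketch of order-independent snapshot hashing, assuming a snapshot made of service names and value-typed edges: the inputs are copied, sorted into a canonical order, JSON-encoded, and hashed with the standard library's SHA-256. `Edge`, `Snapshot`, and `ComposeSnapshotSketch` are hypothetical stand-ins for the PR's `ComposeSnapshot`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// Edge has only value fields, so copying a []Edge is already a deep copy.
type Edge struct {
	From string  `json:"from"`
	To   string  `json:"to"`
	RPS  float64 `json:"rps"`
}

// Snapshot is a hypothetical, simplified stand-in for SimulationSnapshot.
type Snapshot struct {
	Services []string `json:"services"`
	Edges    []Edge   `json:"edges"`
	Hash     string   `json:"-"`
}

// ComposeSnapshotSketch copies and canonically sorts its inputs, then hashes
// the JSON encoding. No timestamp is hashed, so equal content gives an equal hash.
func ComposeSnapshotSketch(services []string, edges []Edge) (Snapshot, error) {
	svc := append([]string(nil), services...)
	eds := append([]Edge(nil), edges...)
	sort.Strings(svc)
	sort.Slice(eds, func(i, j int) bool {
		if eds[i].From != eds[j].From {
			return eds[i].From < eds[j].From
		}
		if eds[i].To != eds[j].To {
			return eds[i].To < eds[j].To
		}
		return eds[i].RPS < eds[j].RPS
	})
	snap := Snapshot{Services: svc, Edges: eds}
	payload, err := json.Marshal(snap)
	if err != nil {
		return Snapshot{}, err
	}
	sum := sha256.Sum256(payload)
	snap.Hash = hex.EncodeToString(sum[:])
	return snap, nil
}

func main() {
	a := Edge{From: "gateway", To: "orders", RPS: 120}
	b := Edge{From: "orders", To: "payments", RPS: 80}
	s1, _ := ComposeSnapshotSketch([]string{"payments", "gateway", "orders"}, []Edge{b, a})
	s2, _ := ComposeSnapshotSketch([]string{"orders", "payments", "gateway"}, []Edge{a, b})
	fmt.Println(s1.Hash == s2.Hash)
}
```

A snapshot type that holds pointers would need each pointee copied as well; copying the slice alone would leave the snapshot sharing memory with the caller.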
Add EvidenceResolverInput, InfluxCheckResult, EvidenceResolverResult,
ResolveEvidenceTiers, and ResolveEvidenceTiersFromSnapshot. The resolver
follows the mandatory tier order (graph->runtime->Influx->fallback), degrades
gracefully when Influx is unavailable, returns sparse data, or errors, and
never blocks simulation. Adds 22 tests covering all degraded modes,
determinism, and snapshot-based resolution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add execution_core.go with BuildExecutionContext, BuildBaseResponse,
NormalizeResponse, CanonicalizeResponse, and stable sort helpers
(SortImpactedServices, SortImpactedPaths, SortBeforeAfterValues,
SortAssumptions). Add execution_core_test.go with 20 tests verifying
determinism, stable sorting, and byte-equal canonical JSON for
identical inputs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
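A sketch of canonicalization, assuming the response holds only slices of plain structs and strings: sorting copies of the slices with a total order is enough for `encoding/json` to produce byte-identical output, because it emits struct fields in declaration order. `Response`, `ImpactedService`, and `Canonicalize` are illustrative names, not the PR's `CanonicalizeResponse`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

type ImpactedService struct {
	Name string `json:"name"`
	Role string `json:"role"`
}

type Response struct {
	ImpactedServices []ImpactedService `json:"impactedServices"`
	Assumptions      []string          `json:"assumptions"`
}

// Canonicalize sorts copies of the slices and encodes the result. Copying via
// append(nil, ...) also turns empty slices into nil, so nil and empty inputs
// encode identically.
func Canonicalize(r Response) ([]byte, error) {
	c := Response{
		ImpactedServices: append([]ImpactedService(nil), r.ImpactedServices...),
		Assumptions:      append([]string(nil), r.Assumptions...),
	}
	sort.Slice(c.ImpactedServices, func(i, j int) bool {
		a, b := c.ImpactedServices[i], c.ImpactedServices[j]
		if a.Name != b.Name {
			return a.Name < b.Name
		}
		return a.Role < b.Role
	})
	sort.Strings(c.Assumptions)
	return json.Marshal(c)
}

func main() {
	r1 := Response{
		ImpactedServices: []ImpactedService{{"payments", "downstream"}, {"orders", "target"}},
		Assumptions:      []string{"steady traffic", "no autoscaling"},
	}
	r2 := Response{
		ImpactedServices: []ImpactedService{{"orders", "target"}, {"payments", "downstream"}},
		Assumptions:      []string{"no autoscaling", "steady traffic"},
	}
	b1, _ := Canonicalize(r1)
	b2, _ := Canonicalize(r2)
	fmt.Println(bytes.Equal(b1, b2))
}
```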
Add RunFailureShutdownScenario using the immutable SimulationSnapshot to
compute blast radius, impacted services/paths, deterministic before/after
estimates, declared assumptions, and recommendation tied to evidence fields.
Returns DEFERRED when the target is absent from the snapshot rather than guessing values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
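Blast-radius computation over a snapshot can be sketched as a breadth-first search over reversed dependency edges: every service that transitively calls the stopped target is impacted. This assumes edges point from caller to callee; `BlastRadius` and its types are hypothetical, while the DEFERRED-on-missing-target behaviour follows the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// Edge records that Caller depends on Callee.
type Edge struct{ Caller, Callee string }

// BlastResult holds the status and the sorted impacted services.
type BlastResult struct {
	Status   string
	Impacted []string
}

// BlastRadius returns every service that transitively calls target. If target
// is not in the snapshot, it returns DEFERRED with no values.
func BlastRadius(services []string, edges []Edge, target string) BlastResult {
	known := false
	for _, s := range services {
		if s == target {
			known = true
			break
		}
	}
	if !known {
		return BlastResult{Status: "DEFERRED"}
	}

	// Index callers by callee so the walk goes upstream from the target.
	callers := map[string][]string{}
	for _, e := range edges {
		callers[e.Callee] = append(callers[e.Callee], e.Caller)
	}

	seen := map[string]bool{target: true}
	queue := []string{target}
	var impacted []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range callers[cur] {
			if !seen[c] {
				seen[c] = true
				impacted = append(impacted, c)
				queue = append(queue, c)
			}
		}
	}
	sort.Strings(impacted)
	return BlastResult{Status: "OK", Impacted: impacted}
}

func main() {
	services := []string{"gateway", "orders", "payments", "ledger"}
	edges := []Edge{{"gateway", "orders"}, {"orders", "payments"}, {"payments", "ledger"}}
	fmt.Println(BlastRadius(services, edges, "payments"))
	fmt.Println(BlastRadius(services, edges, "inventory").Status)
}
```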
…US-010)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds RunNetworkCutScenario with deterministic full-cut and partial-degradation
models. Each matched snapshot edge produces before/after values (BAVs) for RPS,
error rate, and P95 latency. Missing links return DEFERRED; a partial link match
proceeds with a note. Includes 15 tests covering DEFERRED, full cut, partial
degradation, multi-link, determinism, evidence fields, and field-ref format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
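The link-matching step can be sketched as follows, assuming requested links are looked up as exact (from, to) pairs in the snapshot: if none match, the result is DEFERRED with no values; if only some match, the simulation proceeds and a note records the gap. `MatchLinks`, `Link`, and the note wording are hypothetical, and reading "missing links return DEFERRED" as "no requested link matched" is an interpretation.

```go
package main

import "fmt"

// Link is a directed edge between two services.
type Link struct{ From, To string }

// MatchResult reports which requested links exist in the snapshot.
type MatchResult struct {
	Status  string
	Matched []Link
	Note    string
}

// MatchLinks looks up each requested link in the snapshot's edge set.
// No match at all yields DEFERRED; a partial match proceeds with a note.
func MatchLinks(snapshotEdges map[Link]bool, requested []Link) MatchResult {
	var matched []Link
	missing := 0
	for _, l := range requested {
		if snapshotEdges[l] {
			matched = append(matched, l)
		} else {
			missing++
		}
	}
	if len(matched) == 0 {
		return MatchResult{Status: "DEFERRED"}
	}
	res := MatchResult{Status: "OK", Matched: matched}
	if missing > 0 {
		res.Note = fmt.Sprintf("%d of %d requested links not found in snapshot", missing, len(requested))
	}
	return res
}

func main() {
	snapshot := map[Link]bool{{"gateway", "orders"}: true, {"orders", "payments"}: true}
	fmt.Println(MatchLinks(snapshot, []Link{{"gateway", "orders"}, {"orders", "ledger"}}))
	fmt.Println(MatchLinks(snapshot, []Link{{"cart", "ledger"}}).Status)
}
```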
Add scaling_vm_validation_test.go with 5 reproducible VM test cases
covering scale-up (5→10 pods, approve_scale_up), caution scale-down
(5→3 pods, caution_scale_down), determinism, degraded-mode without
Influx, and a structured pass/fail validation report artifact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
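A deliberately simple sketch of the scale-up/scale-down decision, assuming total traffic is spread evenly across pods. Only the recommendation strings `approve_scale_up` and `caution_scale_down` come from the commit; `EstimateScaling`, the per-pod RPS model, and the `no_change` value are hypothetical.

```go
package main

import "fmt"

// ScalingEstimate compares per-pod load before and after a replica change.
type ScalingEstimate struct {
	PerPodRPSBefore float64
	PerPodRPSAfter  float64
	Recommendation  string
}

// EstimateScaling assumes total traffic is spread evenly across pods.
func EstimateScaling(totalRPS float64, current, target int) (ScalingEstimate, error) {
	if current <= 0 || target <= 0 {
		return ScalingEstimate{}, fmt.Errorf("replica counts must be positive, got %d and %d", current, target)
	}
	est := ScalingEstimate{
		PerPodRPSBefore: totalRPS / float64(current),
		PerPodRPSAfter:  totalRPS / float64(target),
	}
	switch {
	case target > current:
		est.Recommendation = "approve_scale_up"
	case target < current:
		est.Recommendation = "caution_scale_down"
	default:
		est.Recommendation = "no_change" // hypothetical value
	}
	return est, nil
}

func main() {
	up, _ := EstimateScaling(500, 5, 10)
	down, _ := EstimateScaling(500, 5, 3)
	fmt.Printf("%+v\n%+v\n", up, down)
}
```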
Added traffic_spike_vm_validation_test.go with 5 test functions covering:
- Moderate spike (2×): full outcome assertions (roles, path sigs, incoming_rps,
  latency_p95_ms BAVs, monitor_and_prepare_rate_limits recommendation)
- High-severity spike (4×): pre_emptive_scale_up_required recommendation and BAVs
- Determinism: byte-equivalent canonical JSON across repeated identical runs
- Degraded-mode without Influx: OK status with non-none degraded mode label
- Structured validation report logged to test output as evidence artifact

All 5 tests pass; go build ./... and go test ./pkg/simulation/... both clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
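The two severity levels can be sketched as a threshold on the spike multiplier. The recommendation strings and the `incoming_rps` field name come from the commit; the 3× cut-off between them is a hypothetical value chosen only so that 2× and 4× fall on different sides, and `SpikeOutcome` is an illustrative name.

```go
package main

import "fmt"

// BAV is a before/after value for one metric.
type BAV struct {
	Field  string
	Before float64
	After  float64
}

// highSeverityMultiplier is a hypothetical cut-off between the two outcomes.
const highSeverityMultiplier = 3.0

// SpikeOutcome scales incoming RPS by the spike multiplier and picks a
// recommendation by severity.
func SpikeOutcome(baseRPS, multiplier float64) (BAV, string) {
	bav := BAV{Field: "incoming_rps", Before: baseRPS, After: baseRPS * multiplier}
	if multiplier >= highSeverityMultiplier {
		return bav, "pre_emptive_scale_up_required"
	}
	return bav, "monitor_and_prepare_rate_limits"
}

func main() {
	for _, m := range []float64{2, 4} {
		bav, rec := SpikeOutcome(100, m)
		fmt.Printf("%.0fx: %+v %s\n", m, bav, rec)
	}
}
```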
…n real VMs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add network_cut_vm_validation_test.go with 4 reproducible VM test cases:
full cut (RPS→0, error→1.0, latency omitted, failover recommendation),
30% partial degradation (RPS 200→140, error 0.307, latency 45→58.5ms,
traffic-shaping recommendation), determinism check, and degraded-mode
without InfluxDB. All criteria pass; `go test ./pkg/simulation/...` clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
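The reported numbers are consistent with a simple linear model for a degradation fraction d: RPS scales by (1 - d), P95 latency by (1 + d), and the error rate moves from its baseline e toward 1 as e + d(1 - e). With d = 0.3 this gives 200 -> 140 RPS and 45 -> 58.5 ms, and with an assumed baseline of e = 0.01 (not stated in the PR) it gives 0.307; at d = 1 it reproduces the full cut (RPS 0, error 1.0, latency omitted). These formulas are inferred from the outputs rather than taken from the code, and `DegradeLink` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// LinkEstimate is the after-state of one link.
type LinkEstimate struct {
	RPS          float64
	ErrorRate    float64
	LatencyP95Ms *float64 // nil when the link is fully cut
}

// DegradeLink applies degradation fraction d in [0, 1]. The formulas are
// inferred from the reported test numbers, not read from the PR's code.
func DegradeLink(rps, errRate, latencyMs, d float64) (LinkEstimate, error) {
	if d < 0 || d > 1 {
		return LinkEstimate{}, fmt.Errorf("degradation %v outside [0, 1]", d)
	}
	est := LinkEstimate{
		RPS:       rps * (1 - d),
		ErrorRate: errRate + d*(1-errRate),
	}
	if d < 1 {
		lat := latencyMs * (1 + d)
		est.LatencyP95Ms = &lat
	}
	return est, nil
}

// approxEqual compares floats with a small absolute tolerance.
func approxEqual(a, b float64) bool { return math.Abs(a-b) < 1e-9 }

func main() {
	partial, _ := DegradeLink(200, 0.01, 45, 0.3) // 0.01 baseline is an assumption
	full, _ := DegradeLink(200, 0.01, 45, 1)
	fmt.Println(partial.RPS, partial.ErrorRate, *partial.LatencyP95Ms)
	fmt.Println(full.RPS, full.ErrorRate, full.LatencyP95Ms == nil)
	fmt.Println(approxEqual(partial.ErrorRate, 0.307))
}
```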
- Add e2e_degraded_traceability_test.go with 7 test functions covering all 3 ACs
- AC-1: verify DegradedMode label and EvidenceMode returned for empty/sparse InfluxDB
- AC-2: log 27-field UI→contract traceability checklist; assert all 23 required fields populated
- AC-3: confirm unknown scenarios rejected, fallback-only evidence deferred with no guessed values, EnforceDeferredConstraints strips synthetic output
- All 5 supported scenarios verified runnable in degraded mode without blocking
- go test ./pkg/simulation/... and go build ./... pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
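A sketch of the deferred-output rule checked by AC-3, assuming a result is deferred either explicitly or when its only evidence source is `deterministic_fallback`. In both cases the before/after values and the recommendation are cleared instead of being filled with synthetic numbers. `enforceDeferred` and `SimResult` are hypothetical stand-ins for the PR's `EnforceDeferredConstraints`.

```go
package main

import "fmt"

// SimResult is an illustrative subset of a simulation response.
type SimResult struct {
	Status            string
	EvidenceSources   []string
	BeforeAfterValues map[string][2]float64
	Recommendation    string
}

// enforceDeferred clears values that would otherwise be synthetic. r is a
// copy, so only this copy's map field is reset; the caller's map is untouched.
func enforceDeferred(r SimResult) SimResult {
	fallbackOnly := len(r.EvidenceSources) == 1 && r.EvidenceSources[0] == "deterministic_fallback"
	if r.Status == "DEFERRED" || fallbackOnly {
		r.Status = "DEFERRED"
		r.BeforeAfterValues = nil
		r.Recommendation = ""
	}
	return r
}

func main() {
	in := SimResult{
		Status:            "OK",
		EvidenceSources:   []string{"deterministic_fallback"},
		BeforeAfterValues: map[string][2]float64{"rps": {200, 0}},
		Recommendation:    "failover",
	}
	out := enforceDeferred(in)
	fmt.Println(out.Status, out.BeforeAfterValues == nil, len(in.BeforeAfterValues))
}
```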
- Introduced a new test file `add_test.go` with comprehensive tests for the SimulateAddService function, covering various scenarios including node feasibility, dependency risks, and shared host resource configurations.
- Enhanced the AddSimulationRequest struct to include TargetNodeName and improved the AddSimulationResult struct to provide more detailed results, including selected node information and aggregate resources.
- Added new types for dependency analysis and risk analysis to better encapsulate the results of service simulations.
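Node feasibility for an added service might look like the sketch below, assuming each node reports free CPU (millicores) and memory (MiB). An optional target node restricts the search, as the new `TargetNodeName` field suggests; the "most free CPU, then name" selection rule, `Node`, `AddRequest`, and `selectNode` are hypothetical and not taken from `SimulateAddService`.

```go
package main

import "fmt"

// Node is a hypothetical view of a VM's free capacity.
type Node struct {
	Name          string
	FreeCPUMilli  int64
	FreeMemoryMiB int64
}

// AddRequest is a hypothetical resource request for the new service.
type AddRequest struct {
	CPUMilli       int64
	MemoryMiB      int64
	TargetNodeName string // optional; empty means any node
}

// selectNode returns the feasible node with the most free CPU, breaking ties
// by name so the choice does not depend on input order.
func selectNode(nodes []Node, req AddRequest) (Node, bool) {
	var best Node
	found := false
	for _, n := range nodes {
		if req.TargetNodeName != "" && n.Name != req.TargetNodeName {
			continue
		}
		if n.FreeCPUMilli < req.CPUMilli || n.FreeMemoryMiB < req.MemoryMiB {
			continue
		}
		if !found || n.FreeCPUMilli > best.FreeCPUMilli ||
			(n.FreeCPUMilli == best.FreeCPUMilli && n.Name < best.Name) {
			best, found = n, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{{"vm-a", 500, 1024}, {"vm-b", 2000, 4096}, {"vm-c", 2000, 256}}
	n, ok := selectNode(nodes, AddRequest{CPUMilli: 400, MemoryMiB: 512})
	fmt.Println(n.Name, ok)
}
```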
prashan-s merged commit a70a491 into main on Mar 14, 2026
