@lanycrost lanycrost commented Oct 16, 2025

harden reconcile context, validation, and storage guards

  • Reconcile context/timeout:

    • Apply controller-configured deadlines across StoryRun/StepRun/RealtimeEngram.
    • Ensure status patches and external calls use the passed ctx; no background contexts.
  • Conditions/status/metrics:

    • Standardize condition reasons via pkg/conditions; consistent status patch helpers.
    • Record reconcile duration metrics; emit user-facing Events where actionable.
  • StoryRun/DAG:

    • Guard oversized spec.inputs (prevent etcd bloat); rely on watches for progress.
    • RBACManager to provision per-run SA/Role/RoleBinding with least privilege.
    • DAG avoids embedding step outputs in parent; resolves from StepRuns on demand.
    • Streaming improvements (per-story/per-run handling) and safe finalization.
  • StepRun:

    • Idempotent finalizer add/remove; background cleanup for owned Jobs.
    • Fallback status updates (Succeeded/Failed) with exit code classification.
    • Wire SDK envs (BUBU_*) and optional S3 storage envs from resolved config.
  • RealtimeEngram:

    • Finalizer lifecycle; service/workload reconcile; template resolution with Ready.
  • Indexing:

    • Add field indexers (templateRef, story refs, engram refs) for efficient watches.
  • Webhooks (validating):

    • StoryRun: require storyRef; inputs shape/size check; JSON schema validation.
    • StepRun: require storyRunRef/stepId; input/status size guard; total size guard.
    • Add shared validation helpers.
  • Config:

    • OperatorConfigManager/Resolver extensions (timeouts, security, retries, gRPC).
    • Safer defaults for inline size and timeouts.
  • CI/Lint:

    • Update release-please workflow/config and golangci settings.
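The StoryRun webhook bullet above mentions an "inputs shape/size check". A minimal sketch of what such a guard might look like follows. The function name `validateInputs` and the 1024-byte limit are assumptions made for illustration; the PR does not state the actual helper names or default limits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxInlineBytes is a hypothetical limit chosen for this sketch;
// the PR does not state the operator's real default.
const maxInlineBytes = 1024

// validateInputs rejects payloads that exceed limit bytes or that are
// not a JSON object. An empty payload is treated as "no inputs".
func validateInputs(raw []byte, limit int) error {
	if len(raw) == 0 {
		return nil
	}
	// Size check first, so oversized payloads are never parsed.
	if len(raw) > limit {
		return fmt.Errorf("spec.inputs is %d bytes, exceeding the %d-byte inline limit", len(raw), limit)
	}
	// Shape check: the payload must decode into a JSON object.
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return fmt.Errorf("spec.inputs must be a JSON object: %w", err)
	}
	// A literal JSON null decodes without error but leaves the map nil.
	if obj == nil {
		return fmt.Errorf("spec.inputs must be a JSON object, got null")
	}
	return nil
}

func main() {
	fmt.Println(validateInputs([]byte(`{"topic":"news"}`), maxInlineBytes))
	fmt.Println(validateInputs([]byte(`[1, 2, 3]`), maxInlineBytes))
}
```

Checking the size before parsing keeps the cost of rejecting an oversized object proportional to a length comparison rather than a full decode.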

Pre-flight:

  • go vet ./... OK
  • go build ./... OK
  • go test ./... -race OK

@lanycrost lanycrost changed the title feat(runs,webhooks,core): harden reconcile context, validation, and s… harden reconcile context, validation, and storage guards Oct 16, 2025
@lanycrost lanycrost merged commit 379dda0 into main Oct 16, 2025
12 checks passed
@lanycrost lanycrost deleted the feat/hardening-reconcile-webhooks-runs branch October 16, 2025 07:44