fix: respect failure limits and fix completion hooks #725

adityachoudhari26 · 2025-12-02T22:39:21Z

Summary by CodeRabbit

Bug Fixes
- Verification completion notifications now fire only once per verification, preventing duplicates.

Changes
- Failure threshold logic adjusted: a metric is now considered failed only when its failed-measurement count exceeds the failure limit, rather than when it reaches it.
- Per-metric handling updated: metrics with insufficient measurements remain running rather than being marked incomplete, and the final aggregation of metric outcomes is removed, so a run concludes as passed once the loop finishes.

Tests
- Tests updated to add one extra measurement in affected cases and to reflect the failure limit removed from the Cancelled-path health-check metric in a test helper.

coderabbitai · 2025-12-02T22:39:31Z

Walkthrough

Refines per-metric failure evaluation and completion behavior, prevents duplicate OnVerificationComplete invocations via per-verification tracking, adjusts FailureLimit usage in health-check setup/tests, and extends certain test loops to append one extra measurement.

Changes

Cohort / File(s)	Summary
Failure threshold & status logic `apps/workspace-engine/pkg/oapi/oapi.go`	Changed failure check from `failedCount >= failureLimit` to `failedCount > failureLimit`; removed aggregation paths that tracked all-completed/any-failed and altered handling of per-metric incomplete measurements to return `ReleaseVerificationStatusRunning` immediately.
Health-check FailureLimit updates `apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go`, `apps/workspace-engine/pkg/workspace/store/release_targets_test.go`	Added a local `failureLimit` and set `FailureLimit: &failureLimit` for the ArgoCD health metric; removed the `FailureLimit` field in the Cancelled-path health-check metric within a test helper.
Completion hook lifecycle (scheduler) `apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go`	Introduced `completionHookFired map[string]bool` on the scheduler to ensure `OnVerificationComplete` is invoked at most once per verification; initialize in constructor and clear on `StopVerification`; guard hook invocation in `runMeasurement`.
Test measurement boundary changes `apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go`	Adjusted test loops from `i < FailureLimit` to `i <= FailureLimit`, causing one additional `VerificationMeasurement` to be appended in each affected test case.
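
The `FailureLimit: &failureLimit` change in the table above follows from a Go rule: you cannot take the address of an integer literal, so an optional `*int` field needs a local variable to point at. A minimal sketch, assuming a hypothetical `healthMetric` type and a hypothetical accessor that treats an unset limit as zero (the real default in `GetFailureLimit` is not shown in this PR, and the value `2` is arbitrary):

```go
package main

import "fmt"

// healthMetric is a hypothetical stand-in; only the optional field matters here.
type healthMetric struct {
	Name         string
	FailureLimit *int
}

// failureLimitOrZero is an assumed accessor: an unset limit means zero
// tolerated failures. The real GetFailureLimit may behave differently.
func (m healthMetric) failureLimitOrZero() int {
	if m.FailureLimit == nil {
		return 0
	}
	return *m.FailureLimit
}

func main() {
	// &2 does not compile, so the value goes through a local variable first.
	failureLimit := 2
	withLimit := healthMetric{Name: "health", FailureLimit: &failureLimit}
	withoutLimit := healthMetric{Name: "health"}

	fmt.Println(withLimit.failureLimitOrZero(), withoutLimit.failureLimitOrZero())
}
```

Leaving the field out, as the test helper now does, simply falls back to whatever default the accessor applies.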

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Inspect `apps/workspace-engine/pkg/oapi/oapi.go` for semantic correctness of the new threshold and the removal of aggregation; verify all return paths and status values.
- Review concurrency and lifecycle around `completionHookFired` in `scheduler.go` for possible races, and ensure cleanup on `StopVerification`.
- Confirm that the tests in `manager_test.go` and `release_targets_test.go` reflect the intended assertions for the extra measurement and the updated `FailureLimit` usage.
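
The at-most-once guard described for `scheduler.go` can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `hookScheduler`, `fireCompletionOnce`, and `stopVerification` are hypothetical names, and the mutex is one possible way to address the race concern raised above.

```go
package main

import (
	"fmt"
	"sync"
)

// hookScheduler is a hypothetical, stripped-down stand-in for the scheduler.
type hookScheduler struct {
	mu                  sync.Mutex
	completionHookFired map[string]bool
	onComplete          func(verificationID string)
}

func newHookScheduler(onComplete func(string)) *hookScheduler {
	return &hookScheduler{
		completionHookFired: make(map[string]bool),
		onComplete:          onComplete,
	}
}

// fireCompletionOnce runs the hook at most once per verification ID and
// reports whether this call was the one that fired it.
func (s *hookScheduler) fireCompletionOnce(id string) bool {
	s.mu.Lock()
	if s.completionHookFired[id] {
		s.mu.Unlock()
		return false
	}
	s.completionHookFired[id] = true
	s.mu.Unlock()

	// The hook runs outside the lock so a slow hook cannot block other IDs.
	s.onComplete(id)
	return true
}

// stopVerification clears the flag so the map does not grow without bound.
func (s *hookScheduler) stopVerification(id string) {
	s.mu.Lock()
	delete(s.completionHookFired, id)
	s.mu.Unlock()
}

func main() {
	calls := 0
	s := newHookScheduler(func(id string) { calls++ })
	for i := 0; i < 3; i++ {
		s.fireCompletionOnce("verification-1")
	}
	fmt.Println("hook calls:", calls) // prints "hook calls: 1"
}
```

Setting the flag before calling the hook means a second measurement that completes concurrently sees the flag and skips the call.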

Poem

🐇 I tapped the counter, hopped once more,

Hooks sleep now when they've run before,
Tests took an extra jaunty leap,
Metrics wake and patterns keep,
A tiny tweak — the garden's neat.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly reflects the two main changes: fixing how failure limits are respected in the verification logic and fixing completion hooks to fire only once.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a9fd4d9 and 524adac.

📒 Files selected for processing (1)

apps/workspace-engine/pkg/oapi/oapi.go (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

apps/workspace-engine/pkg/oapi/oapi.go

🧠 Learnings (1)

📓 Common learnings

Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.

Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.

🧬 Code graph analysis (1)

apps/workspace-engine/pkg/oapi/oapi.go (2)

apps/workspace-engine/pkg/oapi/oapi.gen.go (2)

ReleaseVerificationStatusFailed (118-118)

ReleaseVerificationStatusRunning (120-120)

apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/measurements.go (1)

Measurements (8-8)

🔇 Additional comments (1)

apps/workspace-engine/pkg/oapi/oapi.go (1)

166-168: LGTM on the failure limit fix.

Changing from `>=` to `>` correctly implements failure-limit semantics: with `failureLimit=1`, one failure is now tolerated and the metric fails on the second. This aligns with the test updates that add an extra measurement beyond the limit.
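
To make the boundary concrete, here is a small sketch comparing the two operators. The helper names `reachesLimit` and `exceedsLimit` are hypothetical and exist only for this illustration:

```go
package main

import "fmt"

// reachesLimit mirrors the previous comparison: fail as soon as the
// failure count reaches the limit.
func reachesLimit(failed, failureLimit int) bool {
	return failed >= failureLimit
}

// exceedsLimit mirrors the new comparison: the limit is the number of
// failures tolerated before the metric fails.
func exceedsLimit(failed, failureLimit int) bool {
	return failed > failureLimit
}

func main() {
	const failureLimit = 1
	for failed := 0; failed <= 2; failed++ {
		fmt.Printf("failed=%d old(>=)=%t new(>)=%t\n",
			failed, reachesLimit(failed, failureLimit), exceedsLimit(failed, failureLimit))
	}
}
```

With a limit of 1, only the row for one failure differs between the two columns, which is why the affected tests need `FailureLimit + 1` failing measurements to reach the failed state.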

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

apps/workspace-engine/pkg/oapi/oapi.go (1)

156-191: Dead code: anyFailed is never set to true.

The anyFailed variable is declared on line 157 but is never assigned true anywhere in the function. The check on lines 186-188 is therefore unreachable. If this behavior is intentional (i.e., verification passes as long as no metric exceeds its failure limit), remove the dead code for clarity.

 func (rv *ReleaseVerification) Status() ReleaseVerificationStatus {
 	if len(rv.Metrics) == 0 {
 		return ReleaseVerificationStatusRunning
 	}

 	allCompleted := true
-	anyFailed := false

 	for _, metric := range rv.Metrics {
 		// Check if this metric has hit its failure limit
 		failureLimit := metric.GetFailureLimit()
 		failedCount := 0
 		for _, m := range metric.Measurements {
 			if !m.Passed {
 				failedCount++
 			}
 		}

 		if failedCount > failureLimit {
 			return ReleaseVerificationStatusFailed
 		}

 		// Check if metric is complete
 		if len(metric.Measurements) < metric.Count {
 			allCompleted = false
 			continue
 		}
 	}

 	// If any metric is incomplete, still running
 	if !allCompleted {
 		return ReleaseVerificationStatusRunning
 	}

-	// All metrics complete
-	if anyFailed {
-		return ReleaseVerificationStatusFailed
-	}
-
 	return ReleaseVerificationStatusPassed
 }
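
With `anyFailed` removed, the control flow in the diff above reduces to three outcomes. A minimal runnable sketch, using simplified local stand-ins (`metric`, `measurement`, and the string-valued `status` are hypothetical, not the generated `oapi` types):

```go
package main

import "fmt"

type status string

// Placeholder values; the real constants live in oapi.gen.go.
const (
	statusRunning status = "running"
	statusPassed  status = "passed"
	statusFailed  status = "failed"
)

type measurement struct{ Passed bool }

type metric struct {
	Count        int // measurements required for completion
	FailureLimit int // failures tolerated before the metric fails
	Measurements []measurement
}

// verificationStatus follows the shape of Status() after the suggested cleanup.
func verificationStatus(metrics []metric) status {
	if len(metrics) == 0 {
		return statusRunning
	}

	allCompleted := true
	for _, m := range metrics {
		failed := 0
		for _, ms := range m.Measurements {
			if !ms.Passed {
				failed++
			}
		}
		// Exceeding the limit fails the whole verification immediately.
		if failed > m.FailureLimit {
			return statusFailed
		}
		if len(m.Measurements) < m.Count {
			allCompleted = false
		}
	}

	if !allCompleted {
		return statusRunning
	}
	return statusPassed
}

func main() {
	incomplete := []metric{{Count: 3, FailureLimit: 1, Measurements: []measurement{{true}}}}
	tolerated := []metric{{Count: 2, FailureLimit: 1, Measurements: []measurement{{false}, {true}}}}
	exceeded := []metric{{Count: 3, FailureLimit: 1, Measurements: []measurement{{false}, {false}}}}

	fmt.Println(verificationStatus(incomplete), verificationStatus(tolerated), verificationStatus(exceeded))
}
```

Because the failure check returns early, every metric that survives the loop is within its limit, so there is nothing left for a separate `anyFailed` flag to record.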

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 85fd31a and 52c0cc7.

📒 Files selected for processing (5)

apps/workspace-engine/pkg/oapi/oapi.go (1 hunks)
apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go (1 hunks)
apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go (2 hunks)
apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go (3 hunks)
apps/workspace-engine/pkg/workspace/store/release_targets_test.go (0 hunks)

💤 Files with no reviewable changes (1)

apps/workspace-engine/pkg/workspace/store/release_targets_test.go

🧰 Additional context used

📓 Path-based instructions (2)

apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go
apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go
apps/workspace-engine/pkg/oapi/oapi.go

apps/workspace-engine/**/*_test.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

Follow the existing test structure used in *_test.go files

Files:

apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go

🧠 Learnings (6)

📓 Common learnings
