adityachoudhari26 (Contributor) commented Dec 2, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Verification completion notifications now fire only once per verification, preventing duplicates.
  • Changes

    • Failure threshold logic adjusted: a metric is now considered failed only when its failed-measurement count exceeds the failure limit, rather than when it reaches the limit.
    • Per-metric handling updated: a verification with a metric that has too few measurements is reported as running rather than incomplete; the final aggregation of metric outcomes was removed, so a verification is reported as passed once the loop over its metrics finishes.
  • Tests

    • Tests updated to add one extra measurement in affected cases and to drop the failure limit from one test-helper path.

coderabbitai bot (Contributor) commented Dec 2, 2025

Walkthrough

Refines per-metric failure evaluation and completion behavior, prevents duplicate OnVerificationComplete invocations via per-verification tracking, adjusts FailureLimit usage in health-check setup/tests, and extends certain test loops to append one extra measurement.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Failure threshold & status logic (apps/workspace-engine/pkg/oapi/oapi.go) | Changed failure check from failedCount >= failureLimit to failedCount > failureLimit; removed aggregation paths that tracked all-completed/any-failed and altered handling of per-metric incomplete measurements to return ReleaseVerificationStatusRunning immediately. |
| Health-check FailureLimit updates (apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go, apps/workspace-engine/pkg/workspace/store/release_targets_test.go) | Added a local failureLimit and set FailureLimit: &failureLimit for the ArgoCD health metric; removed the FailureLimit field in the Cancelled-path health-check metric within a test helper. |
| Completion hook lifecycle (scheduler) (apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go) | Introduced completionHookFired map[string]bool on the scheduler to ensure OnVerificationComplete is invoked at most once per verification; initialized in the constructor and cleared on StopVerification; hook invocation guarded in runMeasurement. |
| Test measurement boundary changes (apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go) | Adjusted test loops from i < FailureLimit to i <= FailureLimit, causing one additional VerificationMeasurement to be appended in each affected test case. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect apps/workspace-engine/pkg/oapi/oapi.go for semantic correctness of the new threshold and removal of aggregation—verify all return paths and status values.
  • Review concurrency and lifecycle around completionHookFired in scheduler.go for possible races and ensure cleanup on StopVerification.
  • Confirm tests in manager_test.go and release_targets_test.go reflect intended assertions for the extra measurement and updated FailureLimit usage.

Poem

🐇 I tapped the counter, hopped once more,

Hooks sleep now when they've run before,
Tests took an extra jaunty leap,
Metrics wake and patterns keep,
A tiny tweak — the garden's neat.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly reflects the two main changes: fixing how failure limits are respected in the verification logic and fixing completion hooks to fire only once. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a9fd4d9 and 524adac.

📒 Files selected for processing (1)
  • apps/workspace-engine/pkg/oapi/oapi.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

  • apps/workspace-engine/pkg/oapi/oapi.go
🧠 Learnings (1)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
🧬 Code graph analysis (1)
apps/workspace-engine/pkg/oapi/oapi.go (2)
apps/workspace-engine/pkg/oapi/oapi.gen.go (2)
  • ReleaseVerificationStatusFailed (118-118)
  • ReleaseVerificationStatusRunning (120-120)
apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/measurements.go (1)
  • Measurements (8-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: tests
  • GitHub Check: build (linux/amd64)
  • GitHub Check: Typecheck
  • GitHub Check: Lint
  • GitHub Check: workspace-engine-tests
🔇 Additional comments (1)
apps/workspace-engine/pkg/oapi/oapi.go (1)

166-168: LGTM on the failure limit fix.

Changing from >= to > correctly implements failure limit semantics—failureLimit=1 now allows 1 failure before failing on the 2nd. This aligns with the test updates that add an extra measurement beyond the limit.
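The boundary described above can be illustrated with a minimal sketch. The helper name exceedsFailureLimit is hypothetical and is not taken from oapi.go; only the comparison itself comes from the review.

```go
package main

import "fmt"

// exceedsFailureLimit is a hypothetical helper that isolates the comparison
// discussed in the review: failures must go past the limit, not merely reach it.
func exceedsFailureLimit(failedCount, failureLimit int) bool {
	return failedCount > failureLimit
}

func main() {
	const failureLimit = 1
	// Walk through 0, 1 and 2 failures to show where the status flips.
	for failed := 0; failed <= 2; failed++ {
		fmt.Printf("failed=%d limit=%d exceeded=%t\n",
			failed, failureLimit, exceedsFailureLimit(failed, failureLimit))
	}
}
```

With a limit of 1, the first failure is tolerated and the second one crosses the threshold, which matches the extra measurement added in the tests.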


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
apps/workspace-engine/pkg/oapi/oapi.go (1)

156-191: Dead code: anyFailed is never set to true.

The anyFailed variable is declared on line 157 but is never assigned true anywhere in the function. The check on lines 186-188 is therefore unreachable. If this behavior is intentional (i.e., verification passes as long as no metric exceeds its failure limit), remove the dead code for clarity.

 func (rv *ReleaseVerification) Status() ReleaseVerificationStatus {
 	if len(rv.Metrics) == 0 {
 		return ReleaseVerificationStatusRunning
 	}

 	allCompleted := true
-	anyFailed := false

 	for _, metric := range rv.Metrics {
 		// Check if this metric has hit its failure limit
 		failureLimit := metric.GetFailureLimit()
 		failedCount := 0
 		for _, m := range metric.Measurements {
 			if !m.Passed {
 				failedCount++
 			}
 		}

 		if failedCount > failureLimit {
 			return ReleaseVerificationStatusFailed
 		}

 		// Check if metric is complete
 		if len(metric.Measurements) < metric.Count {
 			allCompleted = false
 			continue
 		}
 	}

 	// If any metric is incomplete, still running
 	if !allCompleted {
 		return ReleaseVerificationStatusRunning
 	}

-	// All metrics complete
-	if anyFailed {
-		return ReleaseVerificationStatusFailed
-	}
-
 	return ReleaseVerificationStatusPassed
 }
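For readers who want to try the suggested shape of Status() in isolation, here is a self-contained sketch. The types Status, Measurement and Metric are simplified stand-ins assumed for illustration, not the generated oapi types, and the logic follows the diff suggested in this comment rather than the merged code.

```go
package main

import "fmt"

// Status, Measurement and Metric are hypothetical stand-ins for the
// generated oapi types; field names are assumptions for this sketch.
type Status string

const (
	StatusRunning Status = "running"
	StatusPassed  Status = "passed"
	StatusFailed  Status = "failed"
)

type Measurement struct{ Passed bool }

type Metric struct {
	Count        int
	FailureLimit int
	Measurements []Measurement
}

// verificationStatus follows the suggested diff: fail as soon as any metric
// exceeds its failure limit, stay running while any metric is incomplete,
// and pass otherwise. No anyFailed flag is needed.
func verificationStatus(metrics []Metric) Status {
	if len(metrics) == 0 {
		return StatusRunning
	}
	allCompleted := true
	for _, metric := range metrics {
		failed := 0
		for _, m := range metric.Measurements {
			if !m.Passed {
				failed++
			}
		}
		if failed > metric.FailureLimit {
			return StatusFailed
		}
		if len(metric.Measurements) < metric.Count {
			allCompleted = false
		}
	}
	if !allCompleted {
		return StatusRunning
	}
	return StatusPassed
}

func main() {
	incomplete := []Metric{{Count: 3, FailureLimit: 1, Measurements: []Measurement{{Passed: true}}}}
	overLimit := []Metric{{Count: 3, FailureLimit: 1, Measurements: []Measurement{{Passed: false}, {Passed: false}}}}
	done := []Metric{{Count: 2, FailureLimit: 1, Measurements: []Measurement{{Passed: true}, {Passed: false}}}}
	fmt.Println(verificationStatus(incomplete), verificationStatus(overLimit), verificationStatus(done))
}
```

Because the failure check returns early, a failed metric can never reach the end of the function, which is why the anyFailed branch is unreachable.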
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 85fd31a and 52c0cc7.

📒 Files selected for processing (5)
  • apps/workspace-engine/pkg/oapi/oapi.go (1 hunks)
  • apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go (1 hunks)
  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go (2 hunks)
  • apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go (3 hunks)
  • apps/workspace-engine/pkg/workspace/store/release_targets_test.go (0 hunks)
💤 Files with no reviewable changes (1)
  • apps/workspace-engine/pkg/workspace/store/release_targets_test.go
🧰 Additional context used
📓 Path-based instructions (2)
apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
  • apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go
  • apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go
  • apps/workspace-engine/pkg/oapi/oapi.go
apps/workspace-engine/**/*_test.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

Follow the existing test structure used in *_test.go files

Files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
🧠 Learnings (6)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Include edge cases in tests (empty values, special characters, unicode) for condition types

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Test validation and matching logic separately for condition types

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
🧬 Code graph analysis (3)
apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
  • VerificationMetricSpec (756-772)
apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go (2)
apps/workspace-engine/pkg/workspace/store/store.go (1)
  • Store (42-68)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
  • VerificationHooks (8-23)
apps/workspace-engine/pkg/oapi/oapi.go (2)
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
  • ReleaseVerificationStatusFailed (118-118)
apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/measurements.go (1)
  • Measurements (8-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Format
  • GitHub Check: Typecheck
  • GitHub Check: Lint
  • GitHub Check: tests
  • GitHub Check: build (linux/amd64)
  • GitHub Check: workspace-engine-tests
🔇 Additional comments (7)
apps/workspace-engine/pkg/workspace/jobdispatch/argocd.go (1)

338-348: LGTM - Sensible default failure limit for ArgoCD health verification.

Adding failureLimit = 5 gives ArgoCD application health checks reasonable tolerance for transient failures. Note that with Count: 5 and FailureLimit: 5, all five measurements can fail without the verification being marked failed, because the status logic uses failedCount > failureLimit and 5 > 5 is false.
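The arithmetic behind that observation can be checked with a small sketch. The metricSpec type and the canEverFail helper are hypothetical; the pointer-typed FailureLimit mirrors the FailureLimit: &failureLimit usage described in the walkthrough, and treating a nil limit as 0 is an assumption based on the default mentioned elsewhere in this review.

```go
package main

import "fmt"

// metricSpec is a hypothetical, trimmed-down stand-in for the metric spec;
// FailureLimit is a pointer so that it can be left unset.
type metricSpec struct {
	Count        int
	FailureLimit *int
}

// failureLimit returns the configured limit, assuming an unset limit means 0.
func (s metricSpec) failureLimit() int {
	if s.FailureLimit == nil {
		return 0
	}
	return *s.FailureLimit
}

// canEverFail reports whether failedCount > failureLimit is reachable when
// at most Count measurements are taken (one possible failure each).
func canEverFail(s metricSpec) bool {
	return s.Count > s.failureLimit()
}

func main() {
	failureLimit := 5
	argoHealth := metricSpec{Count: 5, FailureLimit: &failureLimit}
	fmt.Println("Count=5, FailureLimit=5 can fail:", canEverFail(argoHealth))

	tighter := 4
	fmt.Println("Count=5, FailureLimit=4 can fail:", canEverFail(metricSpec{Count: 5, FailureLimit: &tighter}))
}
```

If the intent is for the health check to fail after repeated unhealthy results, a limit below Count would be needed under this reading; whether that is desired is a question for the author.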

apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go (2)

382-390: Test correctly updated to match new failure threshold logic.

The loop now iterates <= FailureLimit to produce FailureLimit + 1 failed measurements, which correctly triggers the Failed status under the updated failedCount > failureLimit comparison in oapi.go.


433-441: Consistent update for mixed states test.

Same adjustment as above, ensuring the failed verification has enough failures to trigger the Failed status with the new threshold logic.
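The loop-bound change in both tests can be sketched as follows. The measurement type and the failedMeasurements helper are hypothetical simplifications of the test setup, not code from manager_test.go.

```go
package main

import "fmt"

// measurement is a hypothetical stand-in for VerificationMeasurement.
type measurement struct{ Passed bool }

// failedMeasurements builds failureLimit+1 failed measurements by looping
// with an inclusive upper bound, as the updated tests do.
func failedMeasurements(failureLimit int) []measurement {
	var ms []measurement
	for i := 0; i <= failureLimit; i++ {
		ms = append(ms, measurement{Passed: false})
	}
	return ms
}

func main() {
	limit := 2
	ms := failedMeasurements(limit)
	// One more failure than the limit is what makes failedCount > failureLimit true.
	fmt.Printf("limit=%d failures=%d exceeds=%t\n", limit, len(ms), len(ms) > limit)
}
```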

apps/workspace-engine/pkg/oapi/oapi.go (1)

169-171: Verify the off-by-one boundary for failure limit.

The change from >= to > means that with FailureLimit: 0 (the default, described as meaning "no limit"), the check reduces to failedCount > 0, so the verification fails on the first failed measurement. This may not be the intended behavior for "no limit".

Consider whether FailureLimit: 0 should truly mean "no limit" (never fail due to failure count) or "fail on first failure". If "no limit" is intended, this check should skip entirely when failureLimit == 0:

-		if failedCount > failureLimit {
+		if failureLimit > 0 && failedCount > failureLimit {
 			return ReleaseVerificationStatusFailed
 		}
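To make the difference between the two variants concrete, here is a small comparison. Both function names are hypothetical; the first mirrors the merged comparison and the second mirrors the guarded check suggested in this comment.

```go
package main

import "fmt"

// failsStrict mirrors the merged check: a limit of 0 fails on the first failure.
func failsStrict(failedCount, failureLimit int) bool {
	return failedCount > failureLimit
}

// failsUnlessUnlimited mirrors the suggested guard: a limit of 0 is treated
// as "no limit", so the failure count alone never fails the metric.
func failsUnlessUnlimited(failedCount, failureLimit int) bool {
	return failureLimit > 0 && failedCount > failureLimit
}

func main() {
	cases := []struct{ failed, limit int }{{1, 0}, {10, 0}, {2, 2}, {3, 2}}
	for _, c := range cases {
		fmt.Printf("failed=%d limit=%d strict=%t guarded=%t\n",
			c.failed, c.limit, failsStrict(c.failed, c.limit), failsUnlessUnlimited(c.failed, c.limit))
	}
}
```

The two variants only disagree when the limit is 0, so the choice comes down to which meaning of 0 the project wants.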
apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go (3)

22-24: Good addition of per-verification completion tracking.

The completionHookFired map properly ensures OnVerificationComplete is invoked at most once per verification lifecycle, preventing duplicate notifications when multiple metrics complete concurrently.


93-93: Correct cleanup of completion hook state.

Removing the completionHookFired entry when stopping a verification ensures that if the verification is restarted, the completion hook can fire again appropriately.


313-322: Thread-safe idempotent completion hook invocation.

The check and set of completionHookFired[verificationID] is protected by the mutex acquired at line 240, ensuring the hook fires at most once per verification even with concurrent metric completions.
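The pattern these three comments describe can be sketched as a mutex-guarded map. The scheduler type, markComplete and stop below are hypothetical simplifications, not the code in scheduler.go; in particular, this sketch calls the hook after releasing the lock, which is a design choice of the example and may differ from the real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// scheduler is a hypothetical, reduced version of the verification scheduler
// that keeps only the state needed for once-per-verification hook delivery.
type scheduler struct {
	mu                  sync.Mutex
	completionHookFired map[string]bool
	onComplete          func(verificationID string)
}

func newScheduler(onComplete func(string)) *scheduler {
	return &scheduler{
		completionHookFired: make(map[string]bool),
		onComplete:          onComplete,
	}
}

// markComplete checks and sets the flag under the mutex so that only the
// first caller for a given verification ID goes on to invoke the hook.
func (s *scheduler) markComplete(verificationID string) {
	s.mu.Lock()
	if s.completionHookFired[verificationID] {
		s.mu.Unlock()
		return
	}
	s.completionHookFired[verificationID] = true
	s.mu.Unlock()
	s.onComplete(verificationID)
}

// stop clears the flag so a restarted verification can fire the hook again.
func (s *scheduler) stop(verificationID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.completionHookFired, verificationID)
}

func main() {
	var callsMu sync.Mutex
	calls := 0
	s := newScheduler(func(string) {
		callsMu.Lock()
		calls++
		callsMu.Unlock()
	})

	// Simulate several metrics completing concurrently for one verification.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.markComplete("verification-1")
		}()
	}
	wg.Wait()
	fmt.Println("hook calls:", calls)
}
```

A map keyed by verification ID is used here instead of sync.Once because the flag has to be resettable when a verification is stopped and later restarted.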

@jsbroks jsbroks merged commit d793f37 into main Dec 3, 2025
8 checks passed
@jsbroks jsbroks deleted the fix-failure-limits branch December 3, 2025 05:12