chore: make default argo verification more lenient#1021
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 56 minutes and 58 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughUpdated ArgoCD application health verification metrics by modifying success and sample count thresholds and introducing a new failure threshold parameter wired into the verification metric specification. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adjusts the default ArgoCD application verification metric to be more tolerant of transient failures by increasing the number of samples and adding a failure threshold, affecting how the workspace-engine evaluates ArgoCD application health during job verification.
Changes:
- Increase the default verification
Countfrom 10 to 15 (15 minutes at 60s intervals). - Increase
SuccessThresholdfrom 1 to 2 consecutive successes. - Introduce
FailureThreshold(10) to allow continued retries despite some failures.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| IntervalSeconds: 60, | ||
| Count: 10, | ||
| Count: 15, |
There was a problem hiding this comment.
Count was changed from 10 to 15 here, but there’s an existing unit test that asserts specs[0].Count == 10 (apps/workspace-engine/pkg/jobagents/argo/argoapp_test.go:279-290). Please update that test expectation (and any other assertions) so CI doesn’t fail.
| successThreshold := 2 | ||
| failureThreshold := 10 | ||
| failureCondition := "result.statusCode != 200 || result.json.status.health.status == 'Degraded' || result.json.status.health.status == 'Missing'" | ||
| spec := oapi.VerificationMetricSpec{ | ||
| Name: "argocd-application-health", | ||
| IntervalSeconds: 60, | ||
| Count: 10, | ||
| Count: 15, | ||
| SuccessThreshold: &successThreshold, | ||
| FailureThreshold: &failureThreshold, | ||
| SuccessCondition: "result.statusCode == 200 && result.json.status.sync.status == 'Synced' && result.json.status.health.status == 'Healthy'", | ||
| FailureCondition: &failureCondition, |
There was a problem hiding this comment.
With Count: 15 and FailureThreshold: 10, the verification engine will consider the metric successful once 15 measurements are taken as long as total failed measurements are <= 10 (even if the success threshold was never met). Please confirm this is the intended leniency level; if the goal is to require reaching the Healthy/Synced success condition, the thresholds/conditions may need adjusting so the metric can’t pass while remaining perpetually inconclusive or mostly failing.
Summary by CodeRabbit
Release Notes