Commit 8fa3e47

fix(health): remove CronJob progressing/suspended status (#24430)
Signed-off-by: Alexandre Gaudreault <alexandre_gaudreault@intuit.com>
1 parent ae16c00 commit 8fa3e47

3 files changed: 48 additions & 27 deletions

docs/operator-manual/upgrading/3.1-3.2.md

Lines changed: 14 additions & 9 deletions
@@ -40,20 +40,25 @@ The `ManifestRequest` and `RepoServerAppDetailsQuery` messages are used by the f

 ## CronJob Health

-This realease introduce the addition of CronJob's health, a longtime omitted heath status for a native Kubernetes resource.
-The health of a CronJob is based on whether or not Jobs are currently running, and if the last completed Job was successful.
+After the upgrade, Application's status may transition to `Degraded` depending on the CronJob health.

-After the upgrade, Application's status may transition to `Degraded`, `Progressing` or `Suspended` depending on the CronJob health.
+!!! note "CronJob with running jobs"

-If the CronJob is permanently `Suspended`, then the aggregated health of the Application will now be `Suspended` instead of `Healthy`.
-If the CronJob is permanently running jobs, then the aggregated health of the Application will now be `Progressing` instead of `Healthy`.
+If the CronJob is `Degraded` and a new job is scheduled, the health will change to `Healthy` until the active job completes.
+This may cause your application to go from `Degraded` to `Healthy` to `Degraded` again. The CronJob status does not contain enough
+information to infer the health of the last completed job if there are active jobs.

-If you do not want your CronJob to affect the Application's aggregated Health, you can configure the annotation
-`argocd.argoproj.io/ignore-healthcheck: "true"` on the CronJob resource.
+If the CronJob constantly has active jobs, the health will be constantly `Healthy` even if the last job failed.
+
+!!! note "CronJob with suspended state"

-The health can also be configured globally using the `resource.customizations.health.batch_CronJob` configuration to change the default behaviour.
+If the CronJob is in a suspended state, the CronJob status will remain Healthy. You can override this behaviour by configuring the
+health check using the `resource.customizations.health.batch_CronJob` key in the argocd-cm ConfigMap.

-## Breaking Changes
+If you decide to do so and the CronJob is `Suspended`, then the aggregated health of the Application will now be `Suspended` instead of `Healthy`.
+
+If you do not want your CronJob to affect the Application's aggregated Health, you can configure the annotation
+`argocd.argoproj.io/ignore-healthcheck: "true"` on the CronJob resource.


 ## Sanitized project API response
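
The override described in the "CronJob with suspended state" note could look roughly like the sketch below. It is an illustration only, not the project's shipped check. In Argo CD the script stored under `resource.customizations.health.batch_CronJob` is evaluated against an `obj` supplied by the controller and ends with a top-level `return hs`. Here the logic is wrapped in a hypothetical function, `cronjob_health_with_suspend`, and fed stub tables so it can run standalone. Only the suspended branch is shown; the fallback message is a placeholder, and a real override would also need the remaining checks from `health.lua`.

```lua
-- Hypothetical wrapper: inside argocd-cm the body would read the global `obj`
-- and finish with a top-level `return hs` instead of being a function.
function cronjob_health_with_suspend(obj)
    local hs = {}
    if obj.spec ~= nil and obj.spec.suspend == true then
        -- Report Suspended instead of the default Healthy
        hs.status = "Suspended"
        hs.message = "CronJob is Suspended"
        return hs
    end
    -- Placeholder for the remaining checks (schedule and success times)
    hs.status = "Healthy"
    hs.message = "CronJob is not suspended"
    return hs
end

-- Stub objects standing in for what Argo CD would pass as `obj`
local suspended = { spec = { suspend = true } }
local scheduled = { spec = { suspend = false } }

print(cronjob_health_with_suspend(suspended).status)
print(cronjob_health_with_suspend(scheduled).status)
```

With such an override in place, a suspended CronJob would again be reported as `Suspended` in the Application's aggregated health, which is the behaviour the note warns about.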

resource_customizations/batch/CronJob/health.lua

Lines changed: 28 additions & 12 deletions
@@ -1,36 +1,52 @@
 hs = {}

 if obj.spec.suspend == true then
-    hs.status = "Suspended"
+    -- Set to Healthy insted of Suspended until bug is resolved
+    -- See https://github.com/argoproj/argo-cd/issues/24428
+    hs.status = "Healthy"
     hs.message = "CronJob is Suspended"
     return hs
 end

 if obj.status ~= nil then
-    if obj.status.active ~= nil and table.getn(obj.status.active) > 0 then
-        -- We could be Progressing very often, depending on the Cron schedule, which would bubble up
-        -- to the Application health. If this is undesired, the annotation `argocd.argoproj.io/ignore-healthcheck: "true"`
-        -- can be added on the CronJob.
-        hs.status = "Progressing"
-        hs.message = string.format("Waiting for %d Jobs to complete", table.getn(obj.status.active))
-        return hs
-    end
+    if obj.status.lastScheduleTime ~= nil then
+
+        -- Job is running its first execution and has not yet reported any success
+        if obj.status.lastSuccessfulTime == nil then
+            -- Set to healthy even if it may be degraded, because we dont know
+            -- if it was not yet executed or if it never succeeded
+            hs.status = "Healthy"
+            hs.message = "The Cronjob never completed succesfully. It may not be healthy"
+            return hs
+        end
+
+
+        -- Job is progressing, so lastScheduleTime will always be grater than lastSuccessfulTime
+        -- Set to healthy since we do not know if it is Degraded
+        -- See https://github.com/argoproj/argo-cd/issues/24429
+        if obj.status.active ~= nil and table.getn(obj.status.active) > 0 then
+            hs.status = "Healthy"
+            hs.message = "The job is running. Its last execution may not have been successful"
+            return hs
+        end

         -- If the CronJob has no active jobs and the lastSuccessfulTime < lastScheduleTime
         -- then we know it failed the last execution
-    if obj.status.lastScheduleTime ~= nil then
-        -- No issue comparing time as text
-        if obj.status.lastSuccessfulTime == nil or obj.status.lastSuccessfulTime < obj.status.lastScheduleTime then
+        if obj.status.lastSuccessfulTime ~= nil and obj.status.lastSuccessfulTime < obj.status.lastScheduleTime then
             hs.status = "Degraded"
             hs.message = "CronJob has not completed its last execution successfully"
             return hs
         end
+
         hs.message = "CronJob has completed its last execution successfully"
+        hs.status = "Healthy"
+        return hs
     end

     -- There is no way to know if as CronJob missed its execution based on status
     -- so we assume Healthy even if a cronJob is not getting scheduled.
     -- https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#job-creation
+    hs.message = "CronJob has not been scheduled yet"
     hs.status = "Healthy"
     return hs
 end
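
The decision order after this change can be exercised outside Argo CD with a small harness. The sketch below is not the shipped file. It wraps the same checks in a hypothetical function `cronjob_health`, uses the `#` length operator in place of `table.getn` (which stock Lua 5.2 and later do not provide), and runs against made-up status tables whose timestamps are assumptions chosen for illustration. What the real script does when `obj.status` is nil is not visible in the diff, so the final `return hs` is also an assumption.

```lua
-- Hypothetical standalone wrapper around the updated health logic
function cronjob_health(obj)
    local hs = {}
    if obj.spec.suspend == true then
        hs.status = "Healthy"
        hs.message = "CronJob is Suspended"
        return hs
    end
    if obj.status ~= nil then
        local st = obj.status
        if st.lastScheduleTime ~= nil then
            -- Scheduled at least once but no success recorded yet
            if st.lastSuccessfulTime == nil then
                hs.status = "Healthy"
                hs.message = "The Cronjob never completed succesfully. It may not be healthy"
                return hs
            end
            -- Active jobs hide the outcome of the last completed job
            if st.active ~= nil and #st.active > 0 then
                hs.status = "Healthy"
                hs.message = "The job is running. Its last execution may not have been successful"
                return hs
            end
            -- Timestamps in the same UTC format compare correctly as text
            if st.lastSuccessfulTime < st.lastScheduleTime then
                hs.status = "Degraded"
                hs.message = "CronJob has not completed its last execution successfully"
                return hs
            end
            hs.status = "Healthy"
            hs.message = "CronJob has completed its last execution successfully"
            return hs
        end
        hs.status = "Healthy"
        hs.message = "CronJob has not been scheduled yet"
        return hs
    end
    -- Assumption: behaviour for a nil status is not shown in the diff
    return hs
end

-- Made-up inputs loosely mirroring the cases in health_test.yaml
local cases = {
    { name = "never-scheduled", obj = { spec = {}, status = {} } },
    { name = "degraded", obj = { spec = {}, status = {
        lastScheduleTime = "2024-01-02T00:00:00Z",
        lastSuccessfulTime = "2024-01-01T00:00:00Z" } } },
    { name = "active", obj = { spec = {}, status = {
        active = { { name = "job-1" } },
        lastScheduleTime = "2024-01-02T00:00:00Z",
        lastSuccessfulTime = "2024-01-01T00:00:00Z" } } },
}

for _, c in ipairs(cases) do
    local hs = cronjob_health(c.obj)
    print(c.name, hs.status, hs.message)
end
```

The `degraded` and `active` inputs carry identical timestamps and differ only in the `active` list, which shows the `Degraded` to `Healthy` flip described in the upgrade notes.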

resource_customizations/batch/CronJob/health_test.yaml

Lines changed: 6 additions & 6 deletions
@@ -5,21 +5,21 @@ tests:
   inputPath: testdata/healthy.yaml
 - healthStatus:
     status: Healthy
-    message: ''
+    message: CronJob has not been scheduled yet
   inputPath: testdata/never-scheduled.yaml
 - healthStatus:
     status: Degraded
     message: CronJob has not completed its last execution successfully
   inputPath: testdata/degraded.yaml
 - healthStatus:
-    status: Degraded
-    message: CronJob has not completed its last execution successfully
+    status: Healthy
+    message: The Cronjob never completed succesfully. It may not be healthy
   inputPath: testdata/never-succeeded.yaml
 - healthStatus:
-    status: Progressing
-    message: Waiting for 1 Jobs to complete
+    status: Healthy
+    message: The job is running. Its last execution may not have been successful
   inputPath: testdata/active.yaml
 - healthStatus:
-    status: Suspended
+    status: Healthy
     message: CronJob is Suspended
   inputPath: testdata/suspended.yaml
