Commit 19fb2b3

devantler and claude committed
fix(apps): startupProbe initialDelaySeconds=20 to clear settle window
The original probe (periodSeconds: 2, failureThreshold: 30, no initialDelay) silenced cold-start liveness/readiness *restarts* but not the underlying "Unhealthy" Warning events: kubelet emits the same event for startup, liveness, and readiness probe failures, and the 2s period generates 5-7 failures during the ~13s cold start instead of the chart-default 1-3 (periodSeconds: 10).

Merge-queue deploy of #1636 failed the check-event-warnings action, which records a marker post-reconcile and fails if any Warning event has lastTimestamp within a 90s settle window. The rollout these patches force created new pods during that window; their startup probes fired every 2s during cold start; their events landed past the marker.

Set initialDelaySeconds: 20 (past the observed ~13s cold start) and periodSeconds: 5 so the first probe lands on a serving container. Zero failure events on a normal rollout; failureThreshold: 12 leaves 60s of grace if a container is unusually slow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
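The probe arithmetic in the message can be checked with a small model. This is only a sketch under simplifying assumptions: probes are taken to fire at exactly `initialDelaySeconds + k * periodSeconds`, each probe fails until the container has been up for the cold-start time, and kubelet jitter and probe timeouts are ignored. The names `StartupProbe`, `failed_probes`, and `max_startup_window_s` are hypothetical helpers, not part of any Kubernetes API, and the `failure_threshold` of the period-10 case is a placeholder.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupProbe:
    """Subset of the startupProbe fields touched by this commit."""
    initial_delay_s: int
    period_s: int
    failure_threshold: int


def failed_probes(probe: StartupProbe, cold_start_s: float) -> int:
    """Count probes that fire before the container starts serving.

    Assumes probes fire at initial_delay_s + k * period_s for k = 0, 1, ...
    (no jitter) and fail whenever they fire before cold_start_s.
    """
    if cold_start_s <= probe.initial_delay_s:
        return 0
    return math.ceil((cold_start_s - probe.initial_delay_s) / probe.period_s)


def max_startup_window_s(probe: StartupProbe) -> int:
    """Approximate time before kubelet gives up on a container that never starts."""
    return probe.initial_delay_s + probe.failure_threshold * probe.period_s


# Values from the diff; COLD_START_S is the ~13s figure from the commit message.
COLD_START_S = 13
OLD = StartupProbe(initial_delay_s=0, period_s=2, failure_threshold=30)
NEW = StartupProbe(initial_delay_s=20, period_s=5, failure_threshold=12)
# Hypothetical probe with the chart-default 10s period; the threshold is a placeholder.
PERIOD_10 = StartupProbe(initial_delay_s=0, period_s=10, failure_threshold=3)

for name, probe in [("old", OLD), ("period-10", PERIOD_10), ("new", NEW)]:
    print(name, failed_probes(probe, COLD_START_S), max_startup_window_s(probe))
```

Under this model the old settings yield 7 failures (the top of the 5-7 range quoted above), a 10s period yields 2, and the new settings yield 0. The total window grows from 60s to 80s because the 20s initial delay is added in front of the 60s of grace.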
1 parent d7295fb commit 19fb2b3

3 files changed

Lines changed: 26 additions & 9 deletions

k8s/bases/apps/actual-budget/helm-release.yaml

Lines changed: 8 additions & 3 deletions
@@ -52,15 +52,20 @@ spec:
               # Chart hardcodes startupProbe absence; values override the
               # liveness/readiness blocks below but not startupProbe. Gate
               # liveness/readiness on the container actually serving HTTP.
+              # initialDelaySeconds skips past the ~10s cold-start window so
+              # the first probe lands on a serving container — zero failure
+              # events during the merge-queue's 90s steady-state Warning
+              # check.
               - op: add
                 path: /spec/template/spec/containers/0/startupProbe
                 value:
                   httpGet:
                     path: /
                     port: http
-                  periodSeconds: 2
-                  timeoutSeconds: 2
-                  failureThreshold: 30 # 60s max startup window
+                  initialDelaySeconds: 20
+                  periodSeconds: 5
+                  timeoutSeconds: 3
+                  failureThreshold: 12 # 60s grace beyond initial delay
   # https://github.com/community-charts/helm-charts/blob/main/charts/actualbudget/values.yaml
   values:
     replicaCount: ${actual_budget_replicas:=1}

k8s/bases/apps/headlamp/helm-release.yaml

Lines changed: 8 additions & 3 deletions
@@ -77,15 +77,20 @@ spec:
               # 1-3 Unhealthy probe warnings while the Go binary initialises.
               # Add a startupProbe so liveness/readiness are gated until the
               # main container is actually serving.
+              # initialDelaySeconds skips past the cold-start window so the
+              # first probe lands on a serving container — zero failure
+              # events during the merge-queue's 90s steady-state Warning
+              # check.
               - op: add
                 path: /spec/template/spec/containers/0/startupProbe
                 value:
                   httpGet:
                     path: /
                     port: http
-                  periodSeconds: 2
-                  timeoutSeconds: 2
-                  failureThreshold: 30 # 60s max startup window
+                  initialDelaySeconds: 20
+                  periodSeconds: 5
+                  timeoutSeconds: 3
+                  failureThreshold: 12 # 60s grace beyond initial delay
           - target:
               kind: Deployment
               name: headlamp
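Each of these patches is a JSON Patch `add` operation (RFC 6902) applied to the rendered Deployment. A minimal sketch of what that operation does to the manifest follows; `apply_add` is a hypothetical helper that handles only the `add` op, and the container name and existing `livenessProbe` in the sample manifest are placeholders rather than values taken from the charts.

```python
import copy


def apply_add(document, path, value):
    """Apply one RFC 6902 "add" operation to a deep copy of `document`.

    Only non-empty JSON Pointer paths are supported; replacing the whole
    document (empty path) is out of scope for this sketch.
    """
    if not path.startswith("/"):
        raise ValueError("path must be a non-empty JSON Pointer")
    # Unescape JSON Pointer tokens: ~1 means "/", ~0 means "~".
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]
    result = copy.deepcopy(document)
    node = result
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    last = tokens[-1]
    if isinstance(node, list):
        # "-" appends; a numeric token inserts before that index.
        node.insert(len(node) if last == "-" else int(last), value)
    else:
        # On objects, "add" creates the member or replaces an existing one.
        node[last] = value
    return result


# Placeholder manifest: only the shape matters here.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "livenessProbe": {"httpGet": {"path": "/", "port": "http"}}}
    ]}}}
}

startup_probe = {
    "httpGet": {"path": "/", "port": "http"},
    "initialDelaySeconds": 20,
    "periodSeconds": 5,
    "timeoutSeconds": 3,
    "failureThreshold": 12,
}

patched = apply_add(
    deployment, "/spec/template/spec/containers/0/startupProbe", startup_probe
)
print(patched["spec"]["template"]["spec"]["containers"][0]["startupProbe"])
```

Because `add` on an object member also replaces an existing value, the same patch would keep working if a chart later began rendering its own `startupProbe`.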

k8s/bases/apps/homepage/helm-release.yaml

Lines changed: 10 additions & 3 deletions
@@ -58,15 +58,22 @@ spec:
               # rollout produces 3 Unhealthy probe warnings per pod and leaves
               # only ~17s of headroom before the liveness restart fires. Add
               # a startupProbe to gate liveness/readiness during initial boot.
+              #
+              # initialDelaySeconds is past the observed ~13s cold start so
+              # the first probe lands on a serving container — zero failure
+              # events during a normal rollout (the merge-queue's 90s
+              # steady-state Warning check would otherwise count probe
+              # failures fired in that window).
               - op: add
                 path: /spec/template/spec/containers/0/startupProbe
                 value:
                   httpGet:
                     path: /
                     port: http
-                  periodSeconds: 2
-                  timeoutSeconds: 2
-                  failureThreshold: 30 # 60s max startup window
+                  initialDelaySeconds: 20
+                  periodSeconds: 5
+                  timeoutSeconds: 3
+                  failureThreshold: 12 # 60s grace beyond initial delay
   # ICONS:
   # https://github.com/walkxcode/dashboard-icons
   # https://simpleicons.org
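The 90s steady-state Warning check that these comments refer to is described in the commit message only in outline: a marker is recorded after reconcile, and the check fails if any Warning event has a `lastTimestamp` inside the settle window. A rough sketch of that logic, assuming the window runs from the marker to marker + 90s, is shown here. The real check-event-warnings action is not part of this diff, so `Event`, `warnings_in_window`, and the sample timestamps are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SETTLE_WINDOW = timedelta(seconds=90)


@dataclass(frozen=True)
class Event:
    """The few Kubernetes Event fields the check is described as using."""
    type: str  # "Normal" or "Warning"
    reason: str  # e.g. "Unhealthy" for probe failures
    last_timestamp: datetime


def warnings_in_window(events, marker, window=SETTLE_WINDOW):
    """Return Warning events whose last_timestamp falls in [marker, marker + window]."""
    end = marker + window
    return [
        e for e in events
        if e.type == "Warning" and marker <= e.last_timestamp <= end
    ]


# Hypothetical timeline: the marker is recorded, then a rollout creates new pods.
MARKER = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
events = [
    Event("Warning", "BackOff", MARKER - timedelta(seconds=30)),   # before the marker
    Event("Normal", "Pulled", MARKER + timedelta(seconds=3)),      # not a Warning
    Event("Warning", "Unhealthy", MARKER + timedelta(seconds=6)),  # startup probe failure
]

offending = warnings_in_window(events, MARKER)
print("FAIL" if offending else "OK", [e.reason for e in offending])
```

In this timeline only the `Unhealthy` event counts against the check, which is the situation the old 2s probe produced during cold start. With the first probe delayed past cold start, no such event is expected on a normal rollout.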
