Fix startupProbe ignoring its own timing configuration #976

Merged: nightfury1204 merged 1 commit into master from fix/startup-probe-template-variables on Mar 31, 2026
@beastawakens (Collaborator) commented Mar 26, 2026

What is the feature/update/fix?

Fix: startupProbe Ignoring Its Own Timing Configuration

The startupProbe in the Kubernetes service deployment template was reading its timing parameters from $.Service.Liveness.* instead of $.Service.StartupProbe.*. This meant any timing values set directly on startupProbe in convox.yml were silently ignored — the liveness probe values were always used instead.

This bug was introduced in PR #848 (v3.19.7) when the startupProbe feature was first added. The template was wired to the wrong struct from the start.

What was affected:

| convox.yml field | Kubernetes field | Before (Broken) | After (Fixed) |
| --- | --- | --- | --- |
| startupProbe.grace | initialDelaySeconds | Read from liveness.grace | Read from startupProbe.grace |
| startupProbe.interval | periodSeconds | Read from liveness.interval | Read from startupProbe.interval |
| startupProbe.timeout | timeoutSeconds | Read from liveness.timeout | Read from startupProbe.timeout |
| startupProbe.successThreshold | successThreshold | Read from liveness.successThreshold | Read from startupProbe.successThreshold |
| startupProbe.failureThreshold | failureThreshold | Read from liveness.failureThreshold | Read from startupProbe.failureThreshold |

Default inheritance: When startupProbe timing fields are not explicitly set (value is 0), they now correctly fall back to the corresponding liveness probe values. This preserves backward compatibility for existing configs that only set a startupProbe.path without custom timing.
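
The inheritance rule can be sketched in Go as follows. This is an illustration of the behavior described above, not the code added to ApplyDefaults(): the `Probe` type and the `inherit` and `applyStartupDefaults` helpers are hypothetical names.

```go
package main

import "fmt"

// Probe is a hypothetical stand-in holding only the timing fields.
type Probe struct {
	Grace            int
	Interval         int
	Timeout          int
	SuccessThreshold int
	FailureThreshold int
}

// inherit returns v unless it is 0 (treated as "not set"),
// in which case it returns fallback.
func inherit(v, fallback int) int {
	if v == 0 {
		return fallback
	}
	return v
}

// applyStartupDefaults fills unset startupProbe timing fields
// from the corresponding liveness values.
func applyStartupDefaults(startup, liveness Probe) Probe {
	startup.Grace = inherit(startup.Grace, liveness.Grace)
	startup.Interval = inherit(startup.Interval, liveness.Interval)
	startup.Timeout = inherit(startup.Timeout, liveness.Timeout)
	startup.SuccessThreshold = inherit(startup.SuccessThreshold, liveness.SuccessThreshold)
	startup.FailureThreshold = inherit(startup.FailureThreshold, liveness.FailureThreshold)
	return startup
}

func main() {
	liveness := Probe{Grace: 10, Interval: 5, Timeout: 5, SuccessThreshold: 1, FailureThreshold: 3}

	// Partial override: only grace and failureThreshold are explicit.
	partial := Probe{Grace: 60, FailureThreshold: 10}
	fmt.Printf("%+v\n", applyStartupDefaults(partial, liveness))
}
```

One consequence of treating 0 as "not set" is that, under this scheme, an explicit 0 cannot be distinguished from an omitted field, so it also inherits the liveness value.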

Test coverage: TestManifestStartupProbe covers 3 scenarios:

  1. Explicit values — startupProbe with all timing fields set uses its own values independently from liveness
  2. Path-only inheritance — startupProbe with only a path inherits all timing from liveness (backward compat)
  3. Partial overrides with tcpSocketPort — some fields explicit (grace, failureThreshold), others inherited from liveness; uses TCP socket check instead of HTTP

Why is this important?

The startupProbe is critical for services with slow initialization — database migrations, large model loading, cache warming, JVM startup, or GPU model initialization. When the timing values silently fell back to liveness probe values, services could experience unexpected restarts during startup because the startupProbe's failureThreshold or grace period didn't provide enough time for initialization to complete.

Example of the broken behavior:

Given this convox.yml:

services:
  web:
    build: .
    port: 8080
    liveness:
      path: /live
      grace: 10
      interval: 5
      timeout: 5
      failureThreshold: 3
    startupProbe:
      path: /startup
      grace: 60
      interval: 30
      timeout: 10
      failureThreshold: 10

Before (broken): the Kubernetes startupProbe gets initialDelaySeconds: 10, periodSeconds: 5, timeoutSeconds: 5, failureThreshold: 3 (the liveness values). The service has only ~25 seconds to start (10s grace + 3 failures × 5s interval).

After (fixed): the Kubernetes startupProbe gets initialDelaySeconds: 60, periodSeconds: 30, timeoutSeconds: 10, failureThreshold: 10 (the startupProbe values). The service has ~360 seconds to start (60s grace + 10 failures × 30s interval).
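
The arithmetic behind those two windows can be checked with a short Go sketch. The `startupWindow` helper is a hypothetical name, and the formula is the same approximation used above (grace plus failureThreshold times interval), ignoring per-check timeouts.

```go
package main

import "fmt"

// startupWindow approximates how many seconds a container has to start
// before the startup probe gives up: the initial grace period plus
// failureThreshold checks spaced interval seconds apart.
func startupWindow(grace, interval, failureThreshold int) int {
	return grace + failureThreshold*interval
}

func main() {
	fmt.Println(startupWindow(10, 5, 3))   // liveness values (broken): 25
	fmt.Println(startupWindow(60, 30, 10)) // startupProbe values (fixed): 360
}
```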

Benefits:

  • startupProbe Timing Works as Documented — Services that explicitly configure startupProbe timing in convox.yml will now use those values instead of silently falling back to liveness values
  • Safer Slow-Start Services — Services that need generous startup windows now actually get that window instead of being constrained by the tighter liveness probe timing
  • TCP Socket Check Support — startupProbe also supports tcpSocketPort for non-HTTP services, and this fix applies to TCP checks as well
  • No Impact on Default Behavior — Services that don't explicitly configure startupProbe timing continue to inherit from the liveness probe, which was the intended default behavior

Supported startupProbe fields in convox.yml:

| Field | Type | Description |
| --- | --- | --- |
| path | string | HTTP GET endpoint for health check |
| tcpSocketPort | string | TCP port for socket check (alternative to path) |
| grace | int | Seconds to wait before first check (initialDelaySeconds) |
| interval | int | Seconds between checks (periodSeconds) |
| timeout | int | Seconds before check times out (timeoutSeconds) |
| successThreshold | int | Consecutive successes to be considered healthy |
| failureThreshold | int | Consecutive failures before restart |

Does it have a breaking change?

Potentially — if your service relies on the broken behavior where startupProbe timing was always inherited from liveness regardless of explicit values, the fix will change the effective probe timing.

However, this is a bug fix: the documented and intended behavior was always for startupProbe to use its own timing values when explicitly configured. If your convox.yml specifies startupProbe timing values, those values will now actually be applied.

Services that only set a startupProbe.path without custom timing values are completely unaffected — they will continue to inherit timing from the liveness probe.


Requirements

This fix requires version 3.24.1 or later for the rack.

Update the Rack: Run convox rack update 3.24.1 -r rackName to update to this version.

Note that your rack must already be on at least version 3.23.0 before performing this update.

If you're unfamiliar with v3 rack versioning, we recommend reviewing the documentation on Updating a Rack before applying any updates.

The startupProbe template was reading all timing parameters (grace,
interval, timeout, successThreshold, failureThreshold) from the
Liveness struct instead of the StartupProbe struct. This meant any
values set directly on startupProbe in convox.yml were silently
ignored, and the liveness defaults were used instead.

This also adds default handling for StartupProbe in ApplyDefaults(),
inheriting from liveness values when not explicitly set, preserving
backward compatibility for configs that relied on the inheritance
behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@beastawakens beastawakens requested a deployment to UpgradeDowngradeTests March 26, 2026 17:11 — with GitHub Actions Waiting
@nightfury1204 nightfury1204 merged commit 5c7cc1b into master Mar 31, 2026
13 of 22 checks passed
@nightfury1204 nightfury1204 deleted the fix/startup-probe-template-variables branch March 31, 2026 09:39
@nightfury1204 nightfury1204 mentioned this pull request Mar 31, 2026