Fix JRuby timeout to respect CF platform limit (180s max)#1141

Closed
ivanovac wants to merge 1 commit into master from fix-jruby-timeout-platform-limit

Conversation

@ivanovac
Contributor

Problem

PR #1140 was merged with a change that increases the JRuby test app timeout to 300 seconds in manifest.yml. However, the CF platform deployment has a hard limit of 180 seconds configured via cc.maximum_health_check_timeout.

This causes deployment failures in CI:

For application 'switchblade-jojpj-2dr': health_check_timeout Maximum exceeded: max 180s
FAILED

Root Cause

The Cloud Foundry platform has this configuration, which we cannot override:

cc:
  maximum_health_check_timeout: 180

Any attempt to set timeout: 300 in the manifest is rejected by the Cloud Controller.
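
As an illustration of the constraint, here is a minimal sketch of a pre-deploy check that compares a manifest timeout against the platform maximum. The function names and the constant are hypothetical and are not part of the buildpack or the Cloud Controller; the message text simply mirrors the CI error quoted above.

```python
# Hypothetical pre-deploy check; not Cloud Controller code.
# The constant mirrors cc.maximum_health_check_timeout from this PR.
PLATFORM_MAX_HEALTH_CHECK_TIMEOUT = 180


def check_manifest_timeout(requested_seconds, maximum=PLATFORM_MAX_HEALTH_CHECK_TIMEOUT):
    """Return None if the timeout is acceptable, else a message describing the violation."""
    if requested_seconds > maximum:
        # Same wording as the CI failure quoted in this PR.
        return f"health_check_timeout Maximum exceeded: max {maximum}s"
    return None


def clamp_manifest_timeout(requested_seconds, maximum=PLATFORM_MAX_HEALTH_CHECK_TIMEOUT):
    """Return the largest timeout the platform would accept for this request."""
    return min(requested_seconds, maximum)
```

With these assumptions, a request for 300 is flagged and clamps to 180, while 180 itself passes.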

Solution

This PR reverts only the manifest.yml change from PR #1140, setting the timeout back to 180 seconds.

What this PR keeps from #1140:

  • ✅ Test polling timeout increase (3min → 5min) - This is correct and remains
  • ✅ Updated comments in test code
  • ✅ The goal of reducing flaky failures

What this PR reverts from #1140:

  • ❌ Manifest timeout increase (180s → 300s) - This violates platform constraint

Changes

- timeout: 300
+ timeout: 180

Only 1 file, 1 line changed.

Why The Original Goal Still Works

The test polling timeout (5 minutes) is separate from the CF health check timeout (180s):

  1. CF Health Check (180s max):

    • Cloud Foundry waits up to 180s for the app to pass its health check
    • App becomes "running" when health check passes
    • This is a platform constraint we must respect
  2. Test Polling Timeout (5 minutes, increased in #1140, "Increase JRuby test polling timeout for cflinuxfs5 stability"):

    • Test waits up to 5 minutes for HTTP response from endpoint
    • Starts counting after app is marked as "running"
    • This allows JRuby warmup time after health check passes
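
The separation between the two timeouts can be sketched as a polling loop that only starts once the app is "running". This is a hedged illustration, not the test suite's actual code: `poll_until_ready`, `probe`, and the 5-second interval are assumptions made for the example, and the clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time


def poll_until_ready(probe, timeout_seconds=300, interval_seconds=5,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_seconds elapse.

    Returns True if probe succeeded within the window, False otherwise.
    The window is measured from the moment this function is called,
    i.e. after the platform health check has already passed.
    """
    deadline = clock() + timeout_seconds
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

In a real test, `probe` would issue the HTTP request against the app endpoint and return whether it got the expected response.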

Timeline:

  • 0-180s: App starting, health check running
  • By 180s: Health check passes, app marked "running"
  • After that: JRuby warming up, test polling (up to 5 minutes)
  • ~210-240s: JRuby ready, test succeeds

The 5-minute test timeout gives sufficient time for:

  • Health check (up to 180s) + JRuby warmup (30-60s) = Total 210-240s
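
The arithmetic above can be checked with a short sketch. The constants restate the figures from this PR; the helper names are hypothetical, and the second helper assumes, as described in item 2, that the polling clock starts only after the app is marked "running".

```python
HEALTH_CHECK_MAX_S = 180    # platform limit (cc.maximum_health_check_timeout)
TEST_POLL_TIMEOUT_S = 300   # 5-minute test polling timeout kept from #1140
WARMUP_RANGE_S = (30, 60)   # JRuby warmup estimate stated in this PR


def total_ready_time(health_check_s, warmup_s):
    """Seconds from push until JRuby answers, in the worst health-check case."""
    return health_check_s + warmup_s


def fits_polling_window(warmup_s, poll_timeout_s=TEST_POLL_TIMEOUT_S):
    """Assumes polling starts once the app is 'running', so only warmup counts."""
    return warmup_s <= poll_timeout_s


totals = [total_ready_time(HEALTH_CHECK_MAX_S, w) for w in WARMUP_RANGE_S]
print(totals)  # [210, 240]
```

Under that assumption, both warmup estimates fit comfortably inside the polling window.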

Testing

This fix allows the test to:

  • ✅ Deploy with valid 180s timeout (respects platform limit)
  • ✅ Pass health check within 180s
  • ✅ Have up to 5 minutes of polling after the health check passes for JRuby warmup
  • ✅ Reduce flaky failures without violating constraints

Related Issues

Impact

Priority

Critical - Current master is broken for environments with a 180s health check timeout limit.

PR #1140 increased the timeout to 300s, but the CF platform has a
configured maximum of 180s (cc.maximum_health_check_timeout: 180).
This causes deployment failures:

  'health_check_timeout Maximum exceeded: max 180s'

The test polling timeout increase (3min -> 5min) from PR #1140 is
correct and remains in place. That gives the test sufficient time
to wait for JRuby warmup after the health check passes.

This fix reverts only the manifest.yml timeout back to 180s while
keeping the test timeout at 5 minutes, which achieves the goal of
reducing flaky failures without violating platform constraints.

Fixes: #1140 (partial revert of manifest.yml change)
@ivanovac
Contributor Author

Closing this PR - reverting to 180s doesn't solve the problem; it just brings back the original flaky test issue. We need a different approach that respects the platform's 180s limit while still addressing test flakiness.

@ivanovac ivanovac closed this Apr 21, 2026