fix(cd): poll for GH profile mismatch instead of fixed 12s sleep #77
Merged
The fixed-sleep wipe-detection in #75 raced on the first real deploy: cold-boot JVM took longer than 12s to reach checkProfilesConsistency, so the grep against `docker logs --tail 80` ran before the "Profile 'bike' does not match" line had been written. CD reported success while prod GH crash-looped, requiring a manual cache wipe. Replace the fixed sleep with a 30×3s poll that exits early on either terminal signal: a hash mismatch in the logs, or `/info` answering on port 8989 (i.e. GH booted clean). Total ceiling 90s, common-path latency unchanged because successful boots break out as soon as `/info` responds.
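The poll described above could look roughly like the sketch below. It is not the actual workflow step: the container name `graphhopper`, the `localhost` URL, the helper names `fetch_logs` / `probe_info` / `poll_gh_boot`, and the return codes are all assumptions made for illustration. Only the `docker logs --tail 80` grep, the `/info` probe on port 8989, and the 30 × 3s budget come from the description.

```shell
# Hypothetical defaults; the real workflow may use different names and values.
GH_CONTAINER="${GH_CONTAINER:-graphhopper}"
GH_INFO_URL="${GH_INFO_URL:-http://localhost:8989/info}"
POLL_ATTEMPTS="${POLL_ATTEMPTS:-30}"
POLL_INTERVAL="${POLL_INTERVAL:-3}"

# Print the most recent container log lines (stdout and stderr merged).
fetch_logs() {
  docker logs --tail 80 "$GH_CONTAINER" 2>&1
}

# Succeed only if /info answers without an HTTP error status.
probe_info() {
  curl --silent --fail --max-time 2 --output /dev/null "$GH_INFO_URL"
}

# Return codes (assumed convention):
#   0  = /info answered, GH booted clean
#   10 = profile mismatch seen in the logs
#   1  = neither signal appeared within the attempt budget
poll_gh_boot() {
  attempt=0
  while [ "$attempt" -lt "$POLL_ATTEMPTS" ]; do
    # Check for the mismatch first so a crash-looping boot is never
    # mistaken for a clean one.
    if fetch_logs | grep -q "does not match"; then
      return 10
    fi
    if probe_info; then
      return 0
    fi
    sleep "$POLL_INTERVAL"
    attempt=$((attempt + 1))
  done
  return 1
}
```

A caller would branch on the return code: 10 would trigger the existing wipe + reimport path and 0 would leave the cache alone. How a timeout (1) should be handled is not stated in the description, so it is left open here.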
Summary
Replace the post-deploy fixed-sleep wipe detection with a 30×3s poll that exits early on either terminal signal (hash mismatch in logs, or `/info` answering on port 8989). Fixes the race that bit prod on #75's first deploy.

What happened
CD ran the new GH image build/push + `docker compose pull && up -d` cleanly. The 12s sleep then expired before the cold-boot JVM had reached `GraphHopper.checkProfilesConsistency`, so the `grep "does not match"` against `docker logs --tail 80` saw nothing and the conditional wipe never fired. CD reported success, then prod GH crash-looped (`Profile 'bike' does not match. Stored: 1861193009 / Configured: -828139490`) until I wiped the cache by hand.

Fix
Poll up to 90s (30 × 3s) and break out on the first of:

- `Profile '...' does not match` appears → wipe + reimport (existing path).
- `/info` returns 200 → GH booted clean → leave cache alone.

Common-path latency unchanged: clean boots exit the loop as soon as `/info` answers (typically ~6–10s warm, ~15–25s cold). Mismatched boots get caught reliably regardless of how slow the JVM is.

Test plan
- `/api/route` returned a valid LineString (4044m / 861s for the central-Berlin smoke pair).
- `/info` succeed, and skip the wipe path.