-
Notifications
You must be signed in to change notification settings - Fork 3k
Fix: Preserve Gateway's independent upstream health check state when receiving config updates from Admin #6274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…from admin When admin publishes configuration updates with upstreams marked as status=false, the gateway should preserve their unhealthy state and continue health checking instead of completely removing them. This allows the gateway's independent health check to recover upstreams when they become healthy. Changes: - UpstreamCacheManager: Refactored submit() method to preserve unhealthy state for both status=true and status=false upstreams - Added processOfflineUpstreams() to handle status=false upstreams with health check enabled, keeping them in unhealthy map for monitoring - Added processValidUpstreams() to check if valid upstreams were previously unhealthy and preserve that status - UpstreamCheckTask: Made removeFromMap() public to support state preservation Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive tests to verify the fix for preserving unhealthy upstream state when receiving config updates from admin. UpstreamCacheManagerTest: - testSubmitWithStatusFalsePreservesUnhealthyState: Verify that upstreams with status=false that were previously unhealthy remain in unhealthy map - testSubmitWithNewOfflineUpstreamAddedToUnhealthy: Verify new upstreams with status=false are added to unhealthy map for monitoring - testSubmitPreservesUnhealthyForValidUpstream: Verify valid upstreams that were previously unhealthy remain in unhealthy map - testSubmitWithHealthCheckDisabledAndStatusFalse: Verify upstreams with healthCheckEnabled=false are removed, not added to unhealthy map UpstreamCheckTaskTest: - testPutToMap: Test adding upstreams to healthy map - testPutToMapUnhealthy: Test adding upstreams to unhealthy map - testRemoveFromMap: Test removing upstreams from healthy map - testRemoveFromMapUnhealthy: Test removing upstreams from unhealthy map - testMoveUpstreamBetweenMaps: Test moving upstreams between maps Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assertTrue(!healthCheckTask.getUnhealthyUpstream().containsKey(selectorId) | ||
| || healthCheckTask.getUnhealthyUpstream().get(selectorId).isEmpty()); |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion logic uses double negation which reduces code readability. Instead of checking !healthCheckTask.getUnhealthyUpstream().containsKey(selectorId) || healthCheckTask.getUnhealthyUpstream().get(selectorId).isEmpty(), consider using a more straightforward assertion that directly validates the absence of upstreams in the unhealthy map.
| assertTrue(!healthCheckTask.getHealthyUpstream().containsKey(selectorId) | ||
| || healthCheckTask.getHealthyUpstream().get(selectorId).isEmpty()); |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion logic uses double negation which reduces code readability. Instead of checking !healthCheckTask.getHealthyUpstream().containsKey(selectorId) || healthCheckTask.getHealthyUpstream().get(selectorId).isEmpty(), consider using a more straightforward assertion that directly validates the absence of upstreams in the healthy map.
| assertTrue(!healthCheckTask.getUnhealthyUpstream().containsKey(selectorId) | ||
| || healthCheckTask.getUnhealthyUpstream().get(selectorId).isEmpty()); |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion logic uses double negation which reduces code readability. Instead of checking !healthCheckTask.getUnhealthyUpstream().containsKey(selectorId) || healthCheckTask.getUnhealthyUpstream().get(selectorId).isEmpty(), consider using a more straightforward assertion that directly validates the absence of upstreams in the unhealthy map.
...u-loadbalancer/src/test/java/org/apache/shenyu/loadbalancer/cache/UpstreamCheckTaskTest.java
Outdated
Show resolved
Hide resolved
…er/cache/UpstreamCheckTaskTest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Problem Description
When the Admin side health check detects an upstream as unhealthy and publishes a configuration update with
status=false, the Gateway side completely removes that upstream from bothhealthyUpstreamandunhealthyUpstreammaps. This causes the Gateway to lose track of the upstream, preventing its independent health check from recovering the upstream when it becomes healthy again.Error Manifestation
divide upstream configuration error
CANNOT_FIND_HEALTHY_UPSTREAM_URL
Root Cause
ShenYu has two independent health check systems:
The issue occurs when:
status=false)DivideUpstreamDataHandlerUpstreamCacheManager.submit()processesstatus=falseupstreamstriggerRemoveOne()which removes the upstream from BOTH healthy and unhealthy mapsSolution
Design Principle
Gateway's health check state should be independent of Admin's configuration updates
Core Logic Change
Before:
status=false → triggerRemoveOne() → completely removed from both maps
After:
status=false AND healthCheckEnabled=true → preserve in unhealthy map → continue health checking
status=false AND healthCheckEnabled=false → remove (no monitoring needed)
Changes
Refactored
submit()methodConcurrentModificationExceptionby creating ArrayList copy before iterationNew method:
processOfflineUpstreams()Handles upstreams with
status=false:New method:
processValidUpstreams()Handles upstreams with
status=true:unhealthyUpstreammapNew method:
getCurrentUnhealthyMap()Helper method to get current unhealthy upstreams for state preservation
Made
putToMap()andremoveFromMap()publicprivatebut are needed byUpstreamCacheManagerTesting
Added 9 comprehensive tests to verify the fix:
UpstreamCacheManagerTest (4 new tests)
testSubmitWithStatusFalsePreservesUnhealthyState: Verifies upstreams withstatus=falsethat were previously unhealthy remain in unhealthy maptestSubmitWithNewOfflineUpstreamAddedToUnhealthy: Verifies new upstreams withstatus=falseare added to unhealthy map for monitoringtestSubmitPreservesUnhealthyForValidUpstream: Verifies valid upstreams (status=true) that were previously unhealthy remain in unhealthy maptestSubmitWithHealthCheckDisabledAndStatusFalse: Verifies upstreams withhealthCheckEnabled=falseare removed, not added to unhealthy mapUpstreamCheckTaskTest (5 new tests)
testPutToMap: Tests adding upstreams to healthy maptestPutToMapUnhealthy: Tests adding upstreams to unhealthy maptestRemoveFromMap: Tests removing upstreams from healthy maptestRemoveFromMapUnhealthy: Tests removing upstreams from unhealthy maptestMoveUpstreamBetweenMaps: Tests moving upstreams between healthy and unhealthy mapsTest Results
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0
BUILD SUCCESS
Impact
Before Fix
CANNOT_FIND_HEALTHY_UPSTREAM_URLerrorsAfter Fix
Commits