egressgateway: fix initial reconciliation #18325
LGTM.
(Not a big fan of using Sleep, but I can't think of a better approach at the moment.)
    if manager.k8sCacheSyncedChecker.K8sCacheIsSynced() {
        break
    }
non-blocking comment: If a bug prevents the cache from ever syncing, we would get stuck in an infinite loop. Should we add an info message here? We can print it only every N iterations if logging on every iteration would be too verbose.
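For illustration, the "print it after N iterations" idea could look roughly like the sketch below. Only the `K8sCacheIsSynced()` method name comes from the diff; the `cacheSyncedChecker` interface name, `countdownChecker`, `waitForSync`, and the poll interval are hypothetical and are not taken from the Cilium code.

```go
package main

import (
	"fmt"
	"time"
)

// cacheSyncedChecker mirrors the method used in the diff; the interface
// name itself is an assumption made for this sketch.
type cacheSyncedChecker interface {
	K8sCacheIsSynced() bool
}

// countdownChecker is a hypothetical stand-in for the real checker: it
// reports "not synced" for the first `remaining` calls, then "synced".
type countdownChecker struct{ remaining int }

func (c *countdownChecker) K8sCacheIsSynced() bool {
	if c.remaining > 0 {
		c.remaining--
		return false
	}
	return true
}

// waitForSync polls the checker, sleeping `interval` between polls, and
// calls logf once every `logEvery` unsuccessful polls so that a cache that
// never syncs leaves a trace in the logs without flooding them.
// It returns the number of unsuccessful polls.
func waitForSync(c cacheSyncedChecker, interval time.Duration, logEvery int, logf func(string, ...any)) int {
	attempts := 0
	for !c.K8sCacheIsSynced() {
		attempts++
		if attempts%logEvery == 0 {
			logf("still waiting for k8s cache to sync after %d polls", attempts)
		}
		time.Sleep(interval)
	}
	return attempts
}

func main() {
	logf := func(format string, args ...any) { fmt.Printf(format+"\n", args...) }
	n := waitForSync(&countdownChecker{remaining: 7}, time.Millisecond, 3, logf)
	fmt.Println("synced after", n, "unsuccessful polls")
}
```

With `remaining: 7` and `logEvery: 3`, the message is printed after the 3rd and 6th unsuccessful polls, and the loop exits on the 8th check.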
Looks like we are already logging this info in the watchers package:
cilium/pkg/k8s/watchers/watcher.go
Lines 391 to 398 in 2581084
    go func() {
        select {
        case <-cachesSynced:
            log.Info("All pre-existing resources have been received; continuing")
        case <-time.After(option.Config.K8sSyncTimeout):
            log.Fatal("Timed out waiting for pre-existing resources to be received; exiting")
        }
    }()
so there is probably no need to log it here as well
Agree 👍 the alternative would be to use a
f04be2b to 91879e2
When a new egress gateway manager is created, it will wait for the k8s cache to be fully synced before running the first reconciliation.

Currently the logic is based on the WaitUntilK8sCacheIsSynced method of the Daemon object, which waits for the k8sCachesSynced channel to be closed (which indicates that the cache has indeed been synced).

The issue with this approach is that the Daemon object is passed to the NewEgressGatewayManager method _before_ its k8sCachesSynced channel is properly initialized. This in turn causes the WaitUntilK8sCacheIsSynced method to never return.

Since NewEgressGatewayManager must be called before that channel is initialized, we need to switch to a polling approach, where the k8sCachesSynced channel is checked periodically.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
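A minimal sketch of one way this failure mode can arise in Go, and of why polling avoids it. The `daemon` struct, `waitOn`, and the timeout value are hypothetical stand-ins, not the actual Cilium code; the sketch assumes the waiter ends up receiving from the channel value as it was before initialization (that is, nil). In Go, a receive from a nil channel blocks forever, even if the field that held it is later assigned a real channel and that channel is closed.

```go
package main

import (
	"fmt"
	"time"
)

// daemon is a hypothetical, stripped-down stand-in for the Daemon object.
type daemon struct {
	k8sCachesSynced chan struct{} // nil until it is initialized
}

// K8sCacheIsSynced performs a non-blocking check. It reads the field on
// every call, so it sees the channel once it has been initialized; while
// the channel is nil or still open, the default branch reports false.
func (d *daemon) K8sCacheIsSynced() bool {
	select {
	case <-d.k8sCachesSynced:
		return true
	default:
		return false
	}
}

// waitOn blocks on ch and reports whether it was closed before the timeout.
// The timeout exists only so that this demo terminates.
func waitOn(ch <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-ch:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	d := &daemon{}
	early := d.k8sCachesSynced // captured before initialization: nil

	// Later, the channel is created and closed once the cache is synced.
	d.k8sCachesSynced = make(chan struct{})
	close(d.k8sCachesSynced)

	// The early waiter is stuck on the nil channel and only gives up
	// because of the demo timeout.
	fmt.Println(waitOn(early, 20*time.Millisecond)) // prints "false"

	// Polling re-reads the field each time, so it observes the closed channel.
	for !d.K8sCacheIsSynced() {
		time.Sleep(time.Millisecond)
	}
	fmt.Println(d.K8sCacheIsSynced()) // prints "true"
}
```

The polling variant trades a small delay (at most one sleep interval) for not depending on when the channel is initialized relative to the manager's construction.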
91879e2 to f9140cd
/test
l4lb is failing consistently (will be fixed with #18370); marking as ready to merge.