New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up Kubernetes watcher synchronization #17145
Tidy up Kubernetes watcher synchronization #17145
Conversation
close(cachesSynced) | ||
}() | ||
|
||
go func() { | ||
select { | ||
case <-cachesSynced: | ||
log.Info("All pre-existing resources related to policy have been received; continuing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These log messages mentioning this is "related to policy" have been wrong for some time now since we wait for just about every type of resource, so I just dropped those bits from the messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one small comment
f5b24a8
to
439d7d7
Compare
439d7d7
to
14d8fa2
Compare
test-me-please Job 'Cilium-PR-K8s-1.16-net-next' hit: #17060 (85.43% similarity) |
func (k *K8sWatcher) EnableK8sWatcher(ctx context.Context) error { | ||
// EnableK8sWatcher watches for policy, services and endpoint changes on the | ||
// Kubernetes api server defined in the receiver's daemon k8sClient. | ||
func (k *K8sWatcher) EnableK8sWatcher(ctx context.Context, resources []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on whether it might make sense to pass a map of handler functions instead of string slice here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crossed my mind while reading this code, it might make it easier to read as opposed to long switch/case block, just thought we could discuss that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I briefly thought about it, it does sound tidier. Only thing is that the function prototypes are different for the different init functions. I wasn't sure if it would make sense to force them all to conform to the same signature just for this cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Hitting #16938. |
117f3ea
to
5b2a4c7
Compare
/test |
I've also triaged ci-eks failures as #16938 / cilium/cilium-cli#367 . |
Adding @jibi for review as janitor. @nebril , your review request is just for a one-liner variable -> function rename in the test directory so I think that's fine to bypass as long as we have maybe one extra sanity check from janitor. EDIT: Actually, I realize that the only reason janitor is pulled in is because of the equivalent change in |
Refactor this logic into a dedicated function to simplify later commits. No functional changes. Signed-off-by: Joe Stringer <joe@cilium.io>
Previously, the list of resources to be initialized would be prepared in k.EnableK8sWatcher(), but the logic to handle the wait for cache sync could select a different set of resources. This PR tidies these up into smaller functions divided by the functionality they provide (core CNI or feature additions), and consistently enables these using the pkg/option configurations. In future we can potentially even remove unused CRDs from the cluster and Cilium will not complain that they are unavailable; this commit doesn't try to go quite that far yet. Signed-off-by: Joe Stringer <joe@cilium.io>
Shift the declaration of the CRD resources used by the agent & other applications into pkg/k8s/synced, then declare a mapping from that canonical list of resources to the corresponding groups inside pkg/k8s/watchers. This makes both pieces of code synchronized. If for instance CiliumEndpoint CRDs are disabled, they are globally disabled everywhere rather than just in part of the agent code. Signed-off-by: Joe Stringer <joe@cilium.io>
Some resources may be dynamically disabled by configuration options in the agent / operator. It doesn't make sense to register these resources into Kubernetes if they are disabled in the agent, so don't register them in that case. Signed-off-by: Joe Stringer <joe@cilium.io>
5b2a4c7
to
6c6ad94
Compare
/test EDIT: CI was broken due to VirtualBox 6.1 upgrade. Need to re-run. |
/test |
This reverts commits b5c9b55 through 5a2c816. Rationale: it seems this broke clustermesh functionality, as both external workloads and multicluster testing workflows are consistently failing since merge. The two workflows also did not pass in the original PR, with the same errors. - External workloads: standalone Cilium agent is unable to come up while trying to reach clustermesh apiserver. - Multicluster: cross-cluster traffic not being able to reach the other cluster during connectivity test. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Opened #17675 for reverting the changes in this PR as I believe we broke clustermesh functionality (both external workloads and multicluster consistently failing since merge). |
This reverts commits b5c9b55 through 5a2c816. Rationale: it seems this broke clustermesh functionality, as both external workloads and multicluster testing workflows are consistently failing since merge. The two workflows also did not pass in the original PR, with the same errors. - External workloads: standalone Cilium agent is unable to come up while trying to reach clustermesh apiserver. - Multicluster: cross-cluster traffic not being able to reach the other cluster during connectivity test. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
This reverts commits b5c9b55 through 5a2c816. Rationale: it seems this broke clustermesh functionality, as both external workloads and multicluster testing workflows are consistently failing since merge. The two workflows also did not pass in the original PR, with the same errors. - External workloads: standalone Cilium agent is unable to come up while trying to reach clustermesh apiserver. - Multicluster: cross-cluster traffic not being able to reach the other cluster during connectivity test. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
This reverts commits b5c9b55 through 5a2c816. Rationale: it seems this broke clustermesh functionality, as both external workloads and multicluster testing workflows are consistently failing since merge. The two workflows also did not pass in the original PR, with the same errors. - External workloads: standalone Cilium agent is unable to come up while trying to reach clustermesh apiserver. - Multicluster: cross-cluster traffic not being able to reach the other cluster during connectivity test. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Cilium should no longer register or wait to sync k8s resources that are disabled such as LRP.
See commits for more details.