Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agent: add support for watching kvstoremesh prefixes #26154

Merged
merged 5 commits into from Jun 15, 2023

Conversation

giorio94
Copy link
Member

This PR introduces a set of minor modifications to enable Cilium agents watching the kvstore prefixes used by kvstoremesh. Please refer to the individual commit descriptions for additional details.

@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. labels Jun 13, 2023
@giorio94 giorio94 requested review from a team as code owners June 13, 2023 07:19
@giorio94
Copy link
Member Author

giorio94 commented Jun 13, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing with bpf_host

Failure Output

FAIL: Connectivity test between nodes failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/660/

The failure was #25964

@giorio94
Copy link
Member Author

Rebased onto main to pick the fixes for conformance-ginkgo

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

giorio94 commented Jun 14, 2023

/test-1.26-net-next

Hit #26210 The PR needed to be rebased as #23165 got merged.

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

giorio94 commented Jun 14, 2023

/ci-aks

Hit #26075

@giorio94
Copy link
Member Author

/ci-external-workloads

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joestringer joestringer added the kind/enhancement This would improve or streamline existing functionality. label Jun 14, 2023
@giorio94
Copy link
Member Author

/ci-ginkgo

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clustermesh: enable watching the "cached" ipcache prefix

Thanks for the detailed commit messages. Changes lgtm for my codeowner (ipcache), just a high-level question regarding upgrade/downgrade - will the ipcache entries will be stored at both the legacy and the new path (when kvstoremesh is enabled)?

@aditighag aditighag requested a review from a team June 14, 2023 22:38
@giorio94
Copy link
Member Author

Just a high-level question regarding upgrade/downgrade - will the ipcache entries will be stored at both the legacy and the new path (when kvstoremesh is enabled)?

The clustermesh-apiserver still uses the old prefixes to store the information concerning the local cluster, while kvstoremesh uses the new ones to cache the information retrieved from remote clusters. This is transparent from the agents POW, because they select which ones to watch based on the Cached capability part of the cluster config (defaulting to the old behavior if unspecified). When performing an upgrade enabling kvstoremesh, the agents will switch from watching the remote kvstores to watching the local one (because their cilium-clustermesh config is changed by Helm), hence using the new prefixes, while during a downgrade (or if kvstoremesh is disabled), the opposite would happen.

This commit adapts the nodes and services watchers to use the newly
introduced "cache" prefixes in case the corresponding capability is set.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the ipcache entries are stored under the kvstore key
`cilium/state/ip/v1/default/<ip>` regardless of the configured
cluster name. Yet, this is problematic if the same kvstore hosts
information concerning multiple clusters, because it is impossible
to watch only the entries referring to a single one (and there would
be conflicts in case of overlapping PodCIDRs).

This commit adapts the ipcache watcher logic to use the newly introduced
"cached" prefix (i.e., `cilium/cache/ip/v1/<cluster-name>) when
retrieving the entries from remote clusters, in case the corresponding
capability is set (i.e., it has been created by kvstoremesh). This
prevents backward compatibility issues.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the identities entries are stored under the kvstore prefix
`cilium/state/identities/v1/id/<identity>` regardless of the configured
cluster name. Yet, this is problematic if the same kvstore hosts
information concerning multiple clusters, because it is impossible to
watch only the entries referring to a single one.

This commit adapts the identities watcher logic to use the newly
introduced "cached" prefix (`cilium/state/identities/v1/<cluster>/id>`
when retrieving the identities from remote clusters, in case the
corresponding capability is set (i.e., it has been created by
kvstoremesh). This prevents backward compatibility issues.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit adds a test to ensure that remoteCluster.Run() appropriately
sets up the kvstore watchers based on the remote cluster config settings.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
When kvstoremesh is enabled, the agent connects to the local kvstore,
rather to remote ones. Hence, it targets the corresponding service.
Yet, since agents run in host network, service resolution requires that
the DNSPolicy is set to ClusterFirstWithHostNet, introducing a
dependency on CoreDNS. To prevent this requirement, let's configure a
custom dialer responsible for service resolution based on the service
cached information.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Copy link
Member Author

Rebased onto main to pick the changes already introduced in #26083

@giorio94
Copy link
Member Author

/test

Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 15, 2023
@joestringer joestringer merged commit 9f5a82a into cilium:main Jun 15, 2023
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants