Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GC for per-cluster CT/SNAT maps #24576

Conversation

YutaroHayakawa
Copy link
Member

Please see individual commits for more details. This is disabled by default, so doesn't affect to existing code.

Implement GC for per-cluster CT/SNAT maps

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 27, 2023
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/per-cluster-map-gc branch from 8f2c665 to 8e51b54 Compare March 27, 2023 09:25
@YutaroHayakawa YutaroHayakawa added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels Mar 27, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 27, 2023
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa YutaroHayakawa marked this pull request as ready for review March 27, 2023 13:34
@YutaroHayakawa YutaroHayakawa requested a review from a team as a code owner March 27, 2023 13:34
} else {
ret = append(ret, im)
}
if im, err := gm.any6.getClusterMap(clusterID); err != nil || im == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns an empty list on err != nil and on im == nil, however, only the case of err != nil is handled in the defer block above. That means, if we fail at any point because of im == nil, we don't clean up the maps that we put into ret.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was not required for these GC changes. I messed up the commit while I moved these commits from my dev branch. Let me delete this.

@@ -221,6 +223,51 @@ func (gm *perClusterCTMaps) DeleteClusterCTMaps(clusterID uint32) error {
return nil
}

func (gm *perClusterCTMaps) GetClusterCTMaps(clusterID uint32) []*Map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't seem to be used anywhere. What is it for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Let me delete this.

} else {
natMap = nil
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this block removed? It's kept in goGC6, and it seems needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it is needed, thanks! I'll revive it.

@YutaroHayakawa YutaroHayakawa marked this pull request as draft March 27, 2023 15:39
Introduce a new field clusterID into ctmap.Map struct. This indicates
which cluster the CT maps are associated.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Implement CT/NAT GC for per-cluster maps. When we perform GC for global
maps, we'll GC per-cluster maps together.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/per-cluster-map-gc branch from 8e51b54 to 5562ce2 Compare March 27, 2023 23:57
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa YutaroHayakawa marked this pull request as ready for review March 28, 2023 09:04
@YutaroHayakawa
Copy link
Member Author

ConformanceGKE: #22368

@YutaroHayakawa
Copy link
Member Author

/ci-gke

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented Mar 29, 2023

/test-1.25-4.19

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) Tests NodePort inside cluster (kube-proxy) with IPSec and externalTrafficPolicy=Local

Failure Output

FAIL: Request from k8s1 to service http://[fd04::11]:31900 failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

@YutaroHayakawa
Copy link
Member Author

k8s-1.25-kernel-4.19: #24625

@YutaroHayakawa
Copy link
Member Author

/test-1.25-4.19

@YutaroHayakawa
Copy link
Member Author

Actually, k8s-1.25-kernel-4.19 was a known flake and was solved by this PR (#24557). I think it's not worth rebasing as it is not related to this change and tests are succeeding in other test cases.

@YutaroHayakawa YutaroHayakawa added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 29, 2023
@borkmann borkmann merged commit 816f3ca into cilium:master Mar 30, 2023
42 checks passed
@@ -213,6 +213,10 @@ type Map struct {
// define maps to the macro used in the datapath portion for the map
// name, for example 'CT_MAP4'.
define string

// This field indicates which cluster this ctmap is. Zero for global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We should probably start revising our usage of "global" to mean "the local cluster", since global sounds confusingly like all clusters. I know the naming is grandfathered in since it was referring to "globally across the node" instead of "per-endpoint CT" so the naming is not super easy here, but there's lots of opportunity to further confuse readers by continuing to use the global phrasing going forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants