Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few cleanups for per-cluster CT/SNAT maps #25712

Merged

Conversation

YutaroHayakawa
Copy link
Member

Please see the commit messages for more details.

A few cleanups for per-cluster CT/SNAT maps

@YutaroHayakawa YutaroHayakawa added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels May 26, 2023
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa YutaroHayakawa marked this pull request as ready for review May 26, 2023 22:50
@YutaroHayakawa YutaroHayakawa requested review from a team as code owners May 26, 2023 22:50
@YutaroHayakawa YutaroHayakawa requested review from jschwinger233, squeed and giorio94 and removed request for squeed and jschwinger233 May 26, 2023 22:50
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm. Just a minor nit inline

pkg/maps/nat/per_cluster_nat.go Outdated Show resolved Hide resolved
@YutaroHayakawa
Copy link
Member Author

Let me ask for your review as a CLI team @squeed as you were originally assigned 🙏

1. Don't create per-cluster inner maps in the delete functions
2. Fix map access with wrong address family in DeleteClusterCTMaps
3. Clarify deliberately ignored deleteClusterNATMap error in the
   PerClusterNATMap.cleanup().
4. Fix various bugs that we miss the last element in per-cluster CT/SNAT
   outer maps when we iterate over them. Update test cases to use last
   element to prevent such bugs in the future.
5. Don't Unpin() on cleanup logic of GetAllClusterCTMaps().
6. Don't return error when deleteClusterXXMap failed with no exist.
7. Add ClusterID logfields.
8. Log an error on cleanup failure of per-cluster NAT maps.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Currently, we use hard-coded outer map name suffix (e.g. "tcp4" for CT
maps, "v4_external" for NAT maps, etc) and inner map name suffix (e.g.
"1") in some places including test code. That means we are assuming such
naming scheme in multiple places and it is easy to make naming mistakes.
To prevent the issue, remove all hard-coded outer map suffix and replace
them with variables. Also, remove all hard-coded inner map suffix and
use function to generate inner map names.

This commit also removes the hard-coded outer and inner map names of NAT
map. This makes easy to change the map names in the tests.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/cleanup-per-cluster-maps branch from 9da6ddd to 02b94e7 Compare May 30, 2023 01:40
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa
Copy link
Member Author

k8s-1.26-kernel-net-next: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/273/

Suite-k8s-1.26.K8sUpdates and K8sDatapathBandwidthTest failures are due to the Docker rate limit

@YutaroHayakawa
Copy link
Member Author

@YutaroHayakawa
Copy link
Member Author

/test-1.26-net-next

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 31, 2023

ConformanceIngress: New flake since nothing on this PR should affect to the L7. Filed an issue: #25776

@YutaroHayakawa
Copy link
Member Author

I'm pretty sure the flake of the K8sDatapathBGPTests comes from the image rate limit issue as well.

  • BGP test uses FRR from DockerHub (
    image: docker.io/frrouting/frr:v7.5.1
    )
  • BGP test waits for the FRR to up by getting PodIP (

    cilium/test/k8s/bgp.go

    Lines 70 to 77 in 57f6f04

    Eventually(func() string {
    frrPod, err := kubectl.GetPodsIPs(helpers.KubeSystemNamespace, "app=frr")
    if _, ok := frrPod["frr"]; err != nil || !ok {
    return ""
    }
    routerIP = frrPod["frr"]
    return routerIP
    }, 30*time.Second, 1*time.Second).Should(Not(BeEmpty()), "BGP router is not ready")
    )

PodIP is assigned regardless of the Pod state (even if it is ImagePullBackOff, IP address is assigned). So, this condition is met even if FRR itself is not ready. This should be fixed in this PR (#25777).

@YutaroHayakawa
Copy link
Member Author

All green. Ready to merge.

@YutaroHayakawa YutaroHayakawa added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@julianwiedmann julianwiedmann merged commit 667f6b4 into cilium:main May 31, 2023
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants