
docs: Fix egress gateway getting started guide #15984

Merged

Conversation

gandro
Member

@gandro gandro commented May 3, 2021

This commit fixes various issues discovered while testing the egress
gateway getting started guide:

  - Use the correct Helm value (`egressGateway.enabled`) to enable the
    feature.
  - Use the correct field names in the example `CiliumEgressNATPolicy`.
  - Set the Kubernetes namespace in `helm install` as we do in other
    `helm install` invocations.
  - Ensure that the `egress-ip-assign` deployment is always co-located
    with the example workload via pod affinity.
  - The examples use `curl` to access the external service, so use `curl`
    in the access log as well.

With these fixes applied, I was able to successfully validate this guide.


Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
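
As a rough illustration of the first three fixes in the list, the corrected installation step presumably looks something like the following. This is a sketch only: the repository setup, chart version, and any additional values come from the guide itself, not from this PR, and the last two `--set` values are assumptions based on the feature's usual requirements.

```
helm install cilium cilium/cilium \
    --namespace kube-system \
    --set egressGateway.enabled=true \
    --set bpf.masquerade=true \
    --set kubeProxyReplacement=strict
```

Here `--namespace kube-system` corresponds to the "set the Kubernetes namespace" fix, and `egressGateway.enabled=true` is the corrected Helm value mentioned above.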
@gandro gandro added area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.10 labels May 3, 2021
@gandro gandro requested a review from a team as a code owner May 3, 2021 15:18
@gandro gandro requested a review from qmonnet May 3, 2021 15:18
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.0-rc2 May 3, 2021
@pchaigno pchaigno self-requested a review May 3, 2021 21:26
Member

@pchaigno pchaigno left a comment


🚀

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 4, 2021
@ti-mo ti-mo merged commit f9ef751 into cilium:master May 5, 2021
@brb brb mentioned this pull request May 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.0-rc2 May 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.0-rc2 May 13, 2021
@julianwiedmann
Member

/me casts resurrect spell

Hey @gandro, do you still remember why this part was needed:

> Ensure that the `egress-ip-assign` deployment is always co-located
> with the example workload via pod affinity.

Discussing with @jibi, having the workload on the same node as the Egress Gateway shouldn't be necessary at all.

@gandro
Member Author

gandro commented Apr 1, 2022

> /me casts resurrect spell
>
> Hey @gandro, do you still remember why this part was needed:
>
> > Ensure that the `egress-ip-assign` deployment is always co-located
> > with the example workload via pod affinity.
>
> Discussing with @jibi, having the workload on the same node as the Egress Gateway shouldn't be necessary at all.

It's been a while, so I might be wrong here - but from what I can reconstruct this was not about the egress gateway itself, but the node configuration. My understanding is that the node where egress gateway is running needs the egress IP to be set, and this change made it such that we're setting the IP on the gateway node, and not just a random node.

Looking at the change, I guess the affinity should not be based on the workload, but on the gateway itself, though I'm not sure how to achieve that. The docs here also refer to the OSS version of egress gateway, which might behave differently from what Cilium Enterprise offers; I honestly lack the knowledge here. @jibi would know better.

@pchaigno
Member

pchaigno commented Apr 1, 2022

Assigning the IP to the node makes it the gateway node.

But maybe what we'd want here is the opposite of the current action: to ensure that the egress gateway is not the node on which the client pod is running. That way users can see the full effect of the egress gateway feature (i.e., redirect to egress gateway + SNAT instead of just SNAT currently).
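
For illustration, the behaviour suggested above could presumably be expressed with a pod anti-affinity rule along these lines. This is a sketch only: `app: test-app` is a placeholder for whatever label the guide's client workload actually uses, and the container image and command are stand-ins.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: egress-ip-assign
spec:
  affinity:
    podAntiAffinity:
      # Never schedule this pod on the node running the example client pod,
      # so the client's traffic has to be redirected to the gateway node.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: test-app        # placeholder: label of the example client pod
          topologyKey: kubernetes.io/hostname
  containers:
    - name: assign-egress-ip
      image: busybox               # placeholder image
      command: ["sleep", "infinity"]
```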

@julianwiedmann
Member

For reference, the test-VM instead suggests using a nodeSelector for pinning the Egress Gateway to a specific node. Which feels reasonable, so that the Gateway doesn't potentially bounce around in the cluster during a re-deployment.

@jibi
Member

jibi commented Apr 1, 2022

> My understanding is that the node where egress gateway is running needs the egress IP to be set

correct

> and this change made it such that we're setting the IP on the gateway node, and not just a random node.

we are setting the IP on the node where the workload is running, which works I guess, but at the same time it was confusing us a bit (the comment seems to imply that we specifically need the node with the workload, while we actually don't)

> But maybe what we'd want here is the opposite of the current action: to ensure that the egress gateway is not the node on which the client pod is running. That way users can see the full effect of the egress gateway feature (i.e., redirect to egress gateway + SNAT instead of just SNAT currently).

agree with this 👍

> For reference, the test-VM instead suggests using a nodeSelector for pinning the Egress Gateway to a specific node. Which feels reasonable, so that the Gateway doesn't potentially bounce around in the cluster during a re-deployment.

yep, we should probably change the example egress-ip-assign pod's manifest to use a node selector, but I'd say to wait: I have a PR almost ready with some other egress gateway changes (which will also allow using a node selector to specify the egress gateway node in the CiliumEgressNATPolicy object), and those changes will require updating the docs anyway

@pchaigno
Member

pchaigno commented Apr 1, 2022

> yep, we should probably change the example egress-ip-assign pod's manifest to use a node selector

Won't this be hard to achieve in an arbitrary customer environment? We don't know the names and labels of nodes. It makes sense for the CI where we control all this, but not sure it does for a user guide.

@jibi
Member

jibi commented Apr 1, 2022

I mean, we should just use the same example selector (example-label: test or something similar) both in the policy:

apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-test
spec:
#[..]
  egressGateway:
    nodeSelector:
      matchLabels:
        example-label: test

and then in the egress-ip-assign pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: egress-ip-assign
spec:
#[..]
  nodeSelector:
    example-label: test

(also not sure why we are currently suggesting to use a Deployment for that 🤔)

@pchaigno
Member

pchaigno commented Apr 1, 2022

That requires the user to add the appropriate label on the node. One more step that isn't really necessary in this context IMO.

Why do we care which exact node the egress gateway ends up on? I understand for tests (reduce randomness) but I don't see the point for a user guide. An antiaffinity rule seems enough to me.

@jibi
Member

jibi commented Apr 1, 2022

> An antiaffinity rule seems enough to me.

with the current state of the feature: yes, but once we add support for nodeSelectors in the egressgw policies you'll have to assign the egress IP to the same node that is selected by the policy (perhaps we can document/add examples for both cases)
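
Once that nodeSelector support exists, the two manifests sketched earlier in this thread would presumably have to agree on the label, and the node itself would have to carry it first, e.g. with `kubectl label node <node-name> example-label=test` (where `<node-name>` is whichever node should act as the gateway). A combined sketch, assuming the field names from the earlier snippets:

```yaml
# Sketch: policy and IP-assignment pod selecting the same labelled node.
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-test
spec:
  egressGateway:
    nodeSelector:
      matchLabels:
        example-label: test
---
apiVersion: v1
kind: Pod
metadata:
  name: egress-ip-assign
spec:
  nodeSelector:
    example-label: test
```

If the two selectors diverge, the egress IP would end up assigned on a node other than the one the policy routes through, which is exactly the confusion discussed above.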
