
[aws-eks] attach cluster security group to self managed nodes to allow free communication between all node groups #10884

Closed
akerbergen opened this issue Oct 15, 2020 · 10 comments · Fixed by #12042
Assignees
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2

Comments

@akerbergen

Self managed node groups created with the cluster.addAutoScalingGroupCapacity function do not get a shared security group that allows all traffic between the different node groups. In addition, traffic from managed node groups to self managed node groups is only allowed on TCP ports 1025-65535. This blocks DNS traffic on port 53 between node groups, which becomes an issue if core-dns pods run on only a subset of your self managed node groups: groups without core-dns pods can't do DNS lookups.

The above issue does not exist with eksctl, which creates a shared security group used by all self managed and managed node groups and allows all traffic between them.
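As a narrowly scoped workaround, the blocked DNS port could in principle be opened explicitly between the two groups. The sketch below is illustrative only: the construct names (`coreDnsAsg`, `otherAsg`) and the CDK v1 module layout are assumptions, not taken from this issue.

```typescript
import * as ec2 from '@aws-cdk/aws-ec2';
import * as autoscaling from '@aws-cdk/aws-autoscaling';

// Assumed to exist elsewhere in the stack (illustrative names):
declare const coreDnsAsg: autoscaling.AutoScalingGroup; // hosts core-dns pods
declare const otherAsg: autoscaling.AutoScalingGroup;   // no core-dns pods

// Open DNS (TCP and UDP 53) from the group without core-dns to the group
// hosting it. This only fixes DNS lookups, not general pod-to-pod traffic.
otherAsg.connections.allowTo(coreDnsAsg, ec2.Port.tcp(53), 'CoreDNS over TCP');
otherAsg.connections.allowTo(coreDnsAsg, ec2.Port.udp(53), 'CoreDNS over UDP');
```

Attaching the cluster's shared security group to the self managed groups, as eksctl does, is the more complete fix.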

Reproduction Steps

Create an EKS cluster with the aws-eks module's cluster construct. Add two self managed node groups with the addAutoScalingGroupCapacity function. Add a managed node group with the addNodegroupCapacity function.

Configure core-dns to run on only one of the self managed node groups.

Start a pod on the other self managed node group that tries to do an nslookup on any domain, e.g. kubernetes or github.com. It fails because it can't connect to the cluster IP of the kube-dns service.

Repeat the above step for a managed node group.

Note: DNS lookups work across managed node groups if there is a core-dns pod running in any of them, as they all use the same security group, which allows all traffic internally.
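The test pod from the reproduction steps can be expressed in CDK itself. This is a hedged sketch: `cluster.addManifest` is assumed to be available in the CDK version in use, and the `node-group` label/selector is a placeholder for however the node groups are actually labeled.

```typescript
import * as eks from '@aws-cdk/aws-eks';

declare const cluster: eks.Cluster; // assumed to exist elsewhere in the stack

// Schedule a dnsutils pod onto the self managed group WITHOUT core-dns,
// then run: kubectl exec dns-test -- nslookup kubernetes.default
cluster.addManifest('DnsTestPod', {
  apiVersion: 'v1',
  kind: 'Pod',
  metadata: { name: 'dns-test' },
  spec: {
    nodeSelector: { 'node-group': 'self-managed-2' }, // placeholder label
    containers: [{
      name: 'dnsutils',
      image: 'registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3',
      command: ['sleep', '3600'],
    }],
  },
});
```

With the default security groups described above, the nslookup should time out; with a shared security group it should resolve.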

What did you expect to happen?

DNS lookups should be possible anywhere in the cluster, regardless of where the core-dns pods are running.

What actually happened?

DNS lookups were not possible because traffic was blocked by the node security groups.

Environment

  • CLI Version : aws-cli/2.0.44
  • Framework Version: cdk 1.66.0 (build 459488d)
  • Node.js Version: v12.18.4
  • OS : Darwin/19.6.0
  • Language (Version): 3.9.7

Other

Working scenarios for DNS lookups (pod doing the lookup on the left, core-dns pod on the right):
Self managed node group 1 -> Self managed node group 1
Managed node group 1 -> Managed node group 1
Managed node group 1 -> Managed node group 2

Failing scenarios for DNS lookups:
Self managed node group 1 -> Self managed node group 2
Managed node group 1 -> Self managed node group 1


This is a 🐛 Bug Report.

@akerbergen akerbergen added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 15, 2020
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Oct 15, 2020
@melodyyangaws

melodyyangaws commented Nov 4, 2020

+1
I have a "Spark on EKS" use case. It needs executors running on a spot (self-managed) node group to communicate with the driver on a managed node group. At the moment we have to manually change both security groups to enable inter-node-group communication for Spark. Just opening port 53 doesn't solve the problem 100%.
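For a driver/executor pattern like this, opening individual ports is brittle; the broader workaround is to allow all traffic between the spot nodes and the cluster security group that the managed nodes carry. A hedged sketch, where `spotAsg` and the construct scope/IDs are illustrative assumptions:

```typescript
import * as ec2 from '@aws-cdk/aws-ec2';
import * as eks from '@aws-cdk/aws-eks';
import * as autoscaling from '@aws-cdk/aws-autoscaling';
import * as cdk from '@aws-cdk/core';

// Assumed to exist elsewhere in the stack (illustrative names):
declare const stack: cdk.Stack;
declare const cluster: eks.Cluster;
declare const spotAsg: autoscaling.AutoScalingGroup; // self managed spot nodes

// Reference the EKS-managed cluster security group by ID, then allow all
// traffic in both directions between it and the spot ASG nodes.
const sharedSg = ec2.SecurityGroup.fromSecurityGroupId(
  stack, 'SharedSg', cluster.clusterSecurityGroupId,
);
spotAsg.connections.allowFrom(sharedSg, ec2.Port.allTraffic(), 'from managed nodes');
spotAsg.connections.allowTo(sharedSg, ec2.Port.allTraffic(), 'to managed nodes');
```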

@iliapolo
Contributor

@akerbergen thanks for reporting this.

You can add the shared security group of the cluster to the self managed group like so:

```ts
const cluster = new eks.Cluster(...)
const selfManagedGroup = cluster.addAutoScalingGroupCapacity(...)

selfManagedGroup.addSecurityGroup(ec2.SecurityGroup.fromSecurityGroupId(this, 'SharedSecurityGroup', cluster.clusterSecurityGroupId))
```

This seems like a good default to apply, so I'm marking this as a feature request.

I would appreciate a response on whether this resolved your issue or something else is needed.

Thanks

@iliapolo iliapolo added feature-request A feature should be added or improved. p2 effort/small Small work item – less than a day of effort and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 22, 2020
@iliapolo iliapolo changed the title [aws-eks] DNS lookups across self managed node groups [aws-eks] attach cluster security group to self managed nodes to allow free communication between all node groups Nov 22, 2020
@dirknilius
Contributor

dirknilius commented Nov 23, 2020

@iliapolo

> You can add the shared security group of the cluster to the self managed group like so: [...]

I already tried this approach. The node communication worked, including DNS. But somehow, when I create a Kubernetes Service of type LoadBalancer, the EXTERNAL-IP gets stuck in <pending> forever. This gets fixed just by removing that one line, .addSecurityGroup(...). Any idea what is going on?

@iliapolo
Contributor

iliapolo commented Nov 23, 2020

@dirknilius Not sure...Do you maybe have a minimal reproduction test case in code that I can run?

@dirknilius
Contributor

> Not sure... Do you maybe have a minimal reproduction test case in code that I can run?

@iliapolo I could extract the essentials from my stack and create a demo that you can deploy. I will post the link when I'm done.

@dirknilius
Contributor

Hmm, the minimal setup worked. I can definitely trigger this issue in my stack by adding and removing the security group reference. I also tried to delete and recreate the stack, with the same result. But maybe something else is interfering here. I will come back later if I find anything out.

@iliapolo
Contributor

@dirknilius Awesome, thanks

@allamand

When using the AWS Load Balancer Controller, I can see in the logs that it refuses to create the load balancer because my ENI has 2 security groups attached to it. By manually removing the first SG (keeping only the cluster SG), the load balancer is created correctly. Maybe you have the same problem?

Also, I think that if by default we created the node groups with only the cluster SG, that would solve both problems.

@iliapolo iliapolo added this to the [GA] @aws-cdk/aws-eks milestone Dec 1, 2020
@mergify mergify bot closed this as completed in #12042 Dec 15, 2020
mergify bot pushed a commit that referenced this issue Dec 15, 2020
Attaching the EKS managed cluster security group to self managed nodes to allow free traffic flow between managed and self-managed nodes.

Closes #10884

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

flochaz pushed a commit to flochaz/aws-cdk that referenced this issue Jan 5, 2021
@Zhenye-Na

> SharedSecurityGroup

Hi @iliapolo @dirknilius,

Sorry for bumping this issue, but I am experiencing a similar problem, so I would really appreciate your input here.

I came up with a similar idea of a SharedSecurityGroup. Do you mind sharing more about the configuration of this SharedSecurityGroup?

  • Is this SharedSecurityGroup only for communication across self managed node groups?
  • Or is it for EKS managed node groups to communicate with self managed nodes?

Thank you!
