Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS EC2 Instance tag filter #19181

Merged
merged 1 commit into from Apr 8, 2022
Merged

Conversation

prune998
Copy link
Contributor

@prune998 prune998 commented Mar 18, 2022

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Thanks for contributing!

Fixes: #18239

@prune998 prune998 requested a review from a team March 18, 2022 01:33
@prune998 prune998 requested review from a team as code owners March 18, 2022 01:33
@maintainer-s-little-helper
Copy link

Commit 96133ab6cdcacf9eb4b9a91e00240502b91cb5a7 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 18, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 18, 2022
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the subnet tag filter not enough to do this job? AFAIU, interfaces are attached to subnets, so filtering out certain subnets allows you to filter out interfaces as well.

@prune998
Copy link
Contributor Author

Why is the subnet tag filter not enough to do this job? AFAIU, interfaces are attached to subnets, so filtering out certain subnets allows you to filter out interfaces as well.

The theory is that you want Cilium to sync AWS resources that are used by the cluster.
While filtering by subnets may help in some situations, it can also be problematic:

The real good way to filter resources is by targeting things that are in play in the cluster: instances / nodegroups.
The actual subnet filtering is actually a dirty hack, badly named and documented, racing and confusing with the CNI filter.

Ideally I would PR to remove the subnet filter, but that would be a breaking change. This PR is adding an instance filter that should be able to replace the subnet filter in most if not all of the possible scenarios.

I'm also wondering about the other part of the Resync function, as we're maintaining all the VPCs, subnets and securityGroups of the whole account... while there's almost no chances that they will be used/at play in one specific cluster.
While some people have one VPC, few subnets and one cluster per AWS account, some have TONS of stuff in a single account that may have an impact on Cilium-Operator performance.

Personally, in my DEV AWS account, I have currently 8 clusters * 5 nodes * 3 private subnets for ec2 instance IPs * 3 pod subnets for pod IPs... For this setup to work, I have to either don't use a filter, or filter on both instances and pods subnets -> each Cilium-Operator will maintain all the ENIs that existe in the account, instead of the ones from the cluster it's running in.

Note that same issue seems to exist with Alibaba, Azure and all.

I'm opened to any suggestion here. And if you think the subnet-filter hack is enough (once well documented) then I'll close this PR :)

@twpayne twpayne added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Mar 22, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 22, 2022
@maintainer-s-little-helper
Copy link

Commit ef1fe5b831034982f3f0c6989ff0376041c8a3f2 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 25, 2022
@prune998 prune998 changed the title WIP: AWS EC2 Instance tag filter AWS EC2 Instance tag filter Mar 25, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 25, 2022
@prune998
Copy link
Contributor Author

latest commit msg with some infos:

This PR is adding the basic code to add a new command line option of the Cilium-Operator to filter AWS EC2 instances (and available ENI/subnets) based on EC2 instance tags

Technically, if an instance tag filter is provided, instead of searching ALL the ENIs that exist in the AWS account, we:

  • search all instances matching the tags
  • grab all the ENIS of the selected instances

We have to search twice, once to filter instances and ENIS (giving us ec2_types.InstanceNetworkInterface) and sadly, another one to grab the ec2_types.NetworkInterface that is needed down the line.

This could also be refactored... We could filter both on instance tags and subnets in the same function (even if it seems useless to do so).

@prune998
Copy link
Contributor Author

Usage:
your EKS cluster where you're deploying Cilium is made on EC2 Instances. Those Instances have some tags set at creation time (for unmanaged instances, set some tags if needed !)
ex:
Instances___EC2_Management_Console_🔊

When starting Cilium, you can use the new --instance-tags-filter option to filter the instances by tags:

--instance-tags-filter=eks:cluster-name=prune-feature-cluster-us-east-1

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks very solid to me, thanks a lot for taking care of this! 🚀

I have a few minor nits that need to be addressed, and an overall question about what the interaction with this and the old subnet tag filter should be.

operator/option/config.go Outdated Show resolved Hide resolved
pkg/aws/ec2/ec2.go Show resolved Hide resolved
pkg/aws/ec2/ec2.go Outdated Show resolved Hide resolved
pkg/aws/ec2/ec2.go Outdated Show resolved Hide resolved
pkg/aws/ec2/ec2.go Show resolved Hide resolved
@gandro gandro added the area/eni Impacts ENI based IPAM. label Mar 28, 2022
pkg/aws/ec2/ec2.go Outdated Show resolved Hide resolved
pkg/aws/ec2/ec2.go Outdated Show resolved Hide resolved
pkg/aws/ec2/ec2.go Outdated Show resolved Hide resolved
pkg/aws/ec2/ec2.go Show resolved Hide resolved
@prune998 prune998 requested a review from a team March 29, 2022 12:12
@prune998 prune998 requested review from a team as code owners March 29, 2022 12:12
@prune998
Copy link
Contributor Author

prune998 commented Apr 5, 2022

/test

1 similar comment
@gandro
Copy link
Member

gandro commented Apr 5, 2022

/test

pchaigno pushed a commit to pchaigno/cilium that referenced this pull request Apr 5, 2022
[ upstream commit 212d6e7 ]

The `eni.subnetTagsFilter` option is notoriously hard to use correctly.
If it is used with tags that don't match the subnet of the pre-attached
ENI, Cilium agent will never become ready (cilium#18239).

This PR removes it from the ENI documentation (which most users will use
as a reference configuration) such that no one enables this option
without being aware of its requirements. This PR also adds additional
context the Helm value. We might deprecate and remove the option in the
future as well (cilium#19181).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno pushed a commit to pchaigno/cilium that referenced this pull request Apr 5, 2022
[ upstream commit 212d6e7 ]

The `eni.subnetTagsFilter` option is notoriously hard to use correctly.
If it is used with tags that don't match the subnet of the pre-attached
ENI, Cilium agent will never become ready (cilium#18239).

This PR removes it from the ENI documentation (which most users will use
as a reference configuration) such that no one enables this option
without being aware of its requirements. This PR also adds additional
context the Helm value. We might deprecate and remove the option in the
future as well (cilium#19181).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@prune998
Copy link
Contributor Author

prune998 commented Apr 5, 2022

I just corrected to docs wordlist, but i'm clueless about the other tests failing... how to get this PR merged ?

@gandro
Copy link
Member

gandro commented Apr 5, 2022

I just corrected to docs wordlist, but i'm clueless about the other tests failing... how to get this PR merged ?

Thanks! So to get the PR merged, we want to make sure CI runs completely and doesn't uncover any issues. Our CI tends to be unreliable at times, so not every failure is necessarily related to the PR. I'll trigger a CI run (the /test is only available to Cilium core committers) and once it finishes, we can triage any failing tests.

As a side note: Please avoid pushing to the branch before we have had a change to look at any failing tests. Re-pushing invalidates the CI results.

@gandro
Copy link
Member

gandro commented Apr 5, 2022

/test

Job 'Cilium-PR-K8s-GKE' hit: #19014 (97.44% similarity)

next-next hit #19264

pchaigno pushed a commit that referenced this pull request Apr 6, 2022
[ upstream commit 212d6e7 ]

The `eni.subnetTagsFilter` option is notoriously hard to use correctly.
If it is used with tags that don't match the subnet of the pre-attached
ENI, Cilium agent will never become ready (#18239).

This PR removes it from the ENI documentation (which most users will use
as a reference configuration) such that no one enables this option
without being aware of its requirements. This PR also adds additional
context the Helm value. We might deprecate and remove the option in the
future as well (#19181).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno pushed a commit that referenced this pull request Apr 6, 2022
[ upstream commit 212d6e7 ]

The `eni.subnetTagsFilter` option is notoriously hard to use correctly.
If it is used with tags that don't match the subnet of the pre-attached
ENI, Cilium agent will never become ready (#18239).

This PR removes it from the ENI documentation (which most users will use
as a reference configuration) such that no one enables this option
without being aware of its requirements. This PR also adds additional
context the Helm value. We might deprecate and remove the option in the
future as well (#19181).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@prune998
Copy link
Contributor Author

prune998 commented Apr 7, 2022

@gandro any news ?

This PR is adding a new command line option `instance-tags-filter`  to the Cilium-Operator to filter maintained AWS EC2 instances (and available ENIs) based on EC2 instance tags

Technically, if an instance tag filter is provided, instead of searching ALL the ENIs that exist in the AWS account, we:

- skip the `instance-subnet-filter` 
- search all instances matching the tags
- grab all the ENIS of the selected instances

The code will search for matching instances and will grab all instance ENIS (giving us `ec2_types.InstanceNetworkInterface`) 
 and will then search ENIs to get `ec2_types.NetworkInterface` that is needed down the line.

Signed-off-by: Sebastien Prune THOMAS <prune@lecentre.net>
@sayboras
Copy link
Member

sayboras commented Apr 8, 2022

/test

@sayboras
Copy link
Member

sayboras commented Apr 8, 2022

I have re-triggered the tests, normally it will take 2-3 hours for full CI. Will check back once it's completed.

@pchaigno
Copy link
Member

pchaigno commented Apr 8, 2022

GKE failed with known flake #17307 but shouldn't be affected by these changes anyway. Other tests are passing and reviews are in. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 8, 2022
@nbusseneau
Copy link
Member

@prune998 Thanks for the contribution!

@nbusseneau nbusseneau merged commit 36ea7d2 into cilium:master Apr 8, 2022
lht added a commit to lht/cilium that referenced this pull request Jul 29, 2022
… used

PR cilium#19181 introduced option `instance-tags-filter` for filtering
instances that need to be kept in sync. The implementation uses
AWS EC2 API DescribeInstances. Without the permission, the operator
would fail with message:

level=warning msg="Unable to synchronize EC2 interface list"
error="operation error EC2: DescribeInstances, https response error
StatusCode: 403, RequestID: <snap>, api error UnauthorizedOperation: You
are not authorized to perform this operation." subsys=eni

This patches documents the necessary permission the operator needs to be
granted when using option `instance-tags-filter`.

Signed-off-by: Haitao Li <lihaitao@gmail.com>
aanm pushed a commit that referenced this pull request Jul 29, 2022
… used

PR #19181 introduced option `instance-tags-filter` for filtering
instances that need to be kept in sync. The implementation uses
AWS EC2 API DescribeInstances. Without the permission, the operator
would fail with message:

level=warning msg="Unable to synchronize EC2 interface list"
error="operation error EC2: DescribeInstances, https response error
StatusCode: 403, RequestID: <snap>, api error UnauthorizedOperation: You
are not authorized to perform this operation." subsys=eni

This patches documents the necessary permission the operator needs to be
granted when using option `instance-tags-filter`.

Signed-off-by: Haitao Li <lihaitao@gmail.com>
dezmodue pushed a commit to dezmodue/cilium that referenced this pull request Aug 10, 2022
… used

PR cilium#19181 introduced option `instance-tags-filter` for filtering
instances that need to be kept in sync. The implementation uses
AWS EC2 API DescribeInstances. Without the permission, the operator
would fail with message:

level=warning msg="Unable to synchronize EC2 interface list"
error="operation error EC2: DescribeInstances, https response error
StatusCode: 403, RequestID: <snap>, api error UnauthorizedOperation: You
are not authorized to perform this operation." subsys=eni

This patches documents the necessary permission the operator needs to be
granted when using option `instance-tags-filter`.

Signed-off-by: Haitao Li <lihaitao@gmail.com>
nbusseneau pushed a commit to nbusseneau/cilium that referenced this pull request Aug 10, 2022
… used

[ upstream commit c7ebfee ]

PR cilium#19181 introduced option `instance-tags-filter` for filtering
instances that need to be kept in sync. The implementation uses
AWS EC2 API DescribeInstances. Without the permission, the operator
would fail with message:

level=warning msg="Unable to synchronize EC2 interface list"
error="operation error EC2: DescribeInstances, https response error
StatusCode: 403, RequestID: <snap>, api error UnauthorizedOperation: You
are not authorized to perform this operation." subsys=eni

This patches documents the necessary permission the operator needs to be
granted when using option `instance-tags-filter`.

Signed-off-by: Haitao Li <lihaitao@gmail.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
tklauser pushed a commit that referenced this pull request Aug 11, 2022
… used

[ upstream commit c7ebfee ]

PR #19181 introduced option `instance-tags-filter` for filtering
instances that need to be kept in sync. The implementation uses
AWS EC2 API DescribeInstances. Without the permission, the operator
would fail with message:

level=warning msg="Unable to synchronize EC2 interface list"
error="operation error EC2: DescribeInstances, https response error
StatusCode: 403, RequestID: <snap>, api error UnauthorizedOperation: You
are not authorized to perform this operation." subsys=eni

This patches documents the necessary permission the operator needs to be
granted when using option `instance-tags-filter`.

Signed-off-by: Haitao Li <lihaitao@gmail.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
ldelossa pushed a commit to ldelossa/cilium that referenced this pull request Sep 8, 2022
… used

[ upstream commit c7ebfee ]

PR cilium#19181 introduced option `instance-tags-filter` for filtering
instances that need to be kept in sync. The implementation uses
AWS EC2 API DescribeInstances. Without the permission, the operator
would fail with message:

level=warning msg="Unable to synchronize EC2 interface list"
error="operation error EC2: DescribeInstances, https response error
StatusCode: 403, RequestID: <snap>, api error UnauthorizedOperation: You
are not authorized to perform this operation." subsys=eni

This patches documents the necessary permission the operator needs to be
granted when using option `instance-tags-filter`.

Signed-off-by: Haitao Li <lihaitao@gmail.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/eni Impacts ENI based IPAM. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cilium 1.11.0 does not start on EKS with pod subnet-filter
9 participants