Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle InvalidParameterValue as well for PD fallback #31016

Merged
merged 2 commits into from
Mar 1, 2024

Conversation

hemanthmalla
Copy link
Member

#30536 prematurely concluded that AWS now uses InsufficientCidrBlocks to indicate the subnet is out of prefixes. Looks like AWS still uses InvalidParameterValue and There aren't sufficient free Ipv4 addresses or prefixes to indicate subnet is at capacity. However, when subnet is at capacity potentially due to fragmentation, InsufficientCidrBlocks is returned. In either case, it's worth trying to fallback since /32 IPs might still be available compared to /28. Details from the support ticket we opened with AWS for clarification

I would like to inform you that I have received an update from our team :

  • Client.InvalidParameterValue : There aren't sufficient free Ipv4 addresses or prefixes

    The exception 'Client.InvalidParameterValue : There aren't sufficient free Ipv4 addresses or prefixes' occurs when the number of requested IP addresses exceeds the number of available IP addresses in a subnet. This exception can occur during CreateNetworkInterface, AttachNetworkInterface, and AssignPrivateIpAddresses (please refer to Common client error codes section).

      [+] [https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateNetworkInterface.html ](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateNetworkInterface.html)
      [+] [https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AttachNetworkInterface.html ](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AttachNetworkInterface.html)
      [+] [https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html ](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html)
      [+] [https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#CommonErrors ](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#CommonErrors)
    
  • Client.InsufficientCidrBlocks : The specified subnet does not have enough free cidr blocks to satisfy the request

    The exception 'Client.InsufficientCidrBlocks : The specified subnet does not have enough free cidr blocks to satisfy the request' occurs when subnet does not have any contiguous /28 blocks available as available in our public docs.

      [+] [https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html ](https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html)
    

    Even if your subnet has available IP addresses, if the subnet does not have any contiguous /28 blocks available, you will see the following error.

    This can happen due to fragmentation of existing secondary IP addresses spread out across a subnet. To resolve this error, either create a new subnet and launch Pods there, or use an Amazon EC2 subnet CIDR reservation to reserve space within a subnet for use with prefix assignment. For more information, see Subnet CIDR reservations in the Amazon VPC User Guide.


Depending on subnet IP utilization during API call, the response for API call can be reaching either of the two scenarios above.

Additional References :-

  [+] https://repost.aws/knowledge-center/vpc-insufficient-ip-errors

@hemanthmalla hemanthmalla requested a review from a team as a code owner February 27, 2024 22:03
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 27, 2024
@hemanthmalla
Copy link
Member Author

/test

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, I note that I mentioned in the previous PR a sort lack of testing in this area. Would it be worth to write some unit tests (even some small mocks) just to validate the behavior here?

@christarazi christarazi added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/eni Impacts ENI based IPAM. sig/ipam IP address management, including cloud IPAM area/ipam Impacts IP address management functionality. labels Feb 28, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 28, 2024
cilium#30536 prematurely concluded that
AWS now uses InsufficientCidrBlocks to indicate the subnet is out of
prefixes. Looks like AWS still uses InvalidParameterValue and "There
aren't sufficient free Ipv4 addresses or prefixes" to indicate subnet is
at capacity. In addition to this InsufficientCidrBlocks is returned
when subnet is at capacity potentially due to fragmentation. In either
case, it's worth trying to fallback since /32 IPs might still be
available compared to /28. See PR for details from AWS support ticket.

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
@hemanthmalla
Copy link
Member Author

@christarazi updated the prefix delegation unit test to cover the fallback case. This wouldn't really catch scenarios we're hitting here (new error codes), but the fallback feature itself should now be tested based on mocked API implementation.

➜   go test ./pkg/aws/eni -test.v -test.run Test/ENISuite/TestNodeManagerPrefixDelegation
=== RUN   Test
=== RUN   Test/ENISuite
=== RUN   Test/ENISuite/TestNodeManagerPrefixDelegation
level=info msg="Synchronized ENI information" numInstances=1 numSecurityGroups=2 numSubnets=1 numVPCs=1 subsys=eni
level=info msg="Discovered new CiliumNode custom resource" name=node1 subsys=ipam
level=warning msg="Unable to compute pending pods, will not surge-allocate" error="pod store uninitialized" instanceID=i-testNodeManagerDefaultAllocation-0 name=node1 subsys=ipam
level=info msg="Resolving IP deficit of node" available=16 availableForAllocation=128 emptyInterfaceSlots=2 instanceID=i-testNodeManagerDefaultAllocation-0 maxIPsToAllocate=4 name=node1 neededIPs=4 remainingInterfaces=3 selectedInterface=5a76a247-3699-4c74-aca6-b074c7151dde selectedPoolID=s-1 subsys=ipam used=12
level=info msg="Synchronized ENI information for the corresponding instance" instance=i-testNodeManagerDefaultAllocation-0 numSecurityGroups=2 numSubnets=1 numVPCs=1 subsys=eni
level=warning msg="Unable to compute pending pods, will not surge-allocate" error="pod store uninitialized" instanceID=i-testNodeManagerDefaultAllocation-0 name=node1 subsys=ipam
level=info msg="Resolving IP deficit of node" available=32 availableForAllocation=112 emptyInterfaceSlots=2 instanceID=i-testNodeManagerDefaultAllocation-0 maxIPsToAllocate=1 name=node1 neededIPs=1 remainingInterfaces=3 selectedInterface=5a76a247-3699-4c74-aca6-b074c7151dde selectedPoolID=s-1 subsys=ipam used=25
level=warning msg="Subnet might be out of prefixes, Cilium will not allocate prefixes on this node anymore" node=node1 subsys=eni
level=info msg="Synchronized ENI information for the corresponding instance" instance=i-testNodeManagerDefaultAllocation-0 numSecurityGroups=2 numSubnets=1 numVPCs=1 subsys=eni
--- PASS: Test (0.02s)
    --- PASS: Test/ENISuite (0.02s)
        --- PASS: Test/ENISuite/TestNodeManagerPrefixDelegation (0.02s)
=== RUN   TestENIIPAMCapacityAccounting
--- PASS: TestENIIPAMCapacityAccounting (0.00s)
PASS
ok  	github.com/cilium/cilium/pkg/aws/eni	0.071s

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
@hemanthmalla
Copy link
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 29, 2024
@ldelossa ldelossa added this pull request to the merge queue Mar 1, 2024
Merged via the queue into cilium:main with commit 0007e35 Mar 1, 2024
62 checks passed
@hemanthmalla hemanthmalla added affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch labels Mar 7, 2024
@jibi jibi added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.8 Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.13 Mar 13, 2024
@jrajahalme jrajahalme removed this from Needs backport from main in 1.15.2 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.14 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.14.9 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.14.8 Mar 13, 2024
@gandro gandro mentioned this pull request Mar 19, 2024
21 tasks
@gandro gandro added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 19, 2024
@gandro gandro mentioned this pull request Mar 19, 2024
10 tasks
@gandro gandro added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Mar 19, 2024
@gandro gandro mentioned this pull request Mar 19, 2024
8 tasks
@gandro gandro added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 19, 2024
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Mar 20, 2024
@thorn3r thorn3r moved this from Needs backport from main to Backport done to v1.13 in 1.13.14 Mar 26, 2024
@jrajahalme jrajahalme moved this from Needs backport from main to Backport done to v1.14 in 1.14.9 Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch area/eni Impacts ENI based IPAM. area/ipam Impacts IP address management functionality. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/ipam IP address management, including cloud IPAM
Projects
No open projects
1.13.14
Backport done to v1.13
1.14.9
Backport done to v1.14
Status: Released
Status: Released
Status: Released
Development

Successfully merging this pull request may close these issues.

None yet

5 participants