Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid caching dynamic VPC CIDR info #1113

Merged
merged 1 commit into from
Sep 23, 2020
Merged

Conversation

anguslees
Copy link
Contributor

@anguslees anguslees commented Aug 3, 2020

We expect the list of VPC CIDRs to change. Rather than cache the VPC CIDRs and then asynchronously update that cache, just don't cache :) and fetch the latest VPC CIDRs on demand.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@anguslees
Copy link
Contributor Author

anguslees commented Aug 3, 2020

Note to reviewer:

@anguslees anguslees force-pushed the no-cache-md branch 2 times, most recently from a9e2b5c to 8cec203 Compare August 3, 2020 23:40
Copy link
Contributor

@mogren mogren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1110 and #1111 have been merged, a rebase will make this change easier to see.

I agree that caching IMDS is probably not worth it. We don't check metadata all that often anyway.

@@ -22,6 +24,7 @@ import (
// EC2Metadata wraps the methods from the amazon-sdk-go's ec2metadata package
type EC2Metadata interface {
GetMetadata(path string) (string, error)
GetMetadataWithContext(ctx context.Context, path string) (string, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... We already set the default config to have WithMaxRetries(10)...

return eni, 0, nil
}
// 0 is reserved for primary ENI, the rest of them has to +1 to avoid collision at 0
return eni, int(deviceNum + 1), nil
Copy link
Contributor Author

@anguslees anguslees Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I removed the +1 thing here, since we're already assuming that 0 == primary (and the previous code did too).

I think (with high confidence) that that's ok. I think (with much less confidence) that's it's also ok across upgrade. Upgrade testing/failures should look here for the cause - especially upgrade tests involving multiple interfaces...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to do something similar in #1071, but was not 100% sure that the primary ENI is always at index 0: https://github.com/aws/amazon-vpc-cni-k8s/pull/1071/files#diff-b55f09f925e40eba84053fdae2096d9cR648

return strconv.Atoi(data)
}

// GetDeviceNumber returns the unique device number associated with an interface. The primary interface is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the primary interface always 0? I have kind of assumed it is, but is that documented somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not explicitly afaics :( Primary ENI is assumed to be always "eth0" in all our scripts/docs/tutorials and the metadata docs for device-number say "eg eth3 is device-number 3" (*). Primary ENI is also the first/only interface attached at boot, so assuming indices go up then the first is also always 0.

(*) .. which is obviously nonsense in the general case, because you can't know how the OS is going to do device enumeration - but I believe this is the fantasy that the original Xen API was based around.

I can remove this comment if you want. The rest of our code already assumes 0==primary (eg here and here) and this hasn't caused us any problems - yet.

Mostly I wanted to be more consistent across our codebase instead of using a mix of .Primary and .DeviceIndex==0, and here seemed a reasonable place to declare this assumption. I can certainly soften it to say "vpc-cni assumes that primary interface is always 0"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly different:

		// TODO: Verify that the first IP is always the primary
		primary := isFirst

This is checking with IP should not be used by pods. I've checked in the aws/amazon-ec2-net-util repo where they also use index 0 to be the "attchment IP". We should definitely see if we can get rid of this limitation though, and just reserve the "node IP".

The aws.Int64Value(ec2res.Attachment.DeviceIndex) == 0 check to add that log line was to help customers who don't have delete_on_termination set for the primary ENI. #1023

Copy link
Contributor Author

@anguslees anguslees Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh yes, you're right, one of my examples was primary IP not primary ENI. Anything that calls this getENIDeviceNumber() function (ie: anything that calls GetAttachedENIs()) is another example of code that always sees primaryENI as DeviceNumber==0.)

I'm happy to remove this, and instead make the code track primary ENI separately, and allow any DeviceNumber (including 0) for any ENI. What I don't want to do is have a function called EC2InstanceMetadataCache.getENIDeviceNumber() that is hard-coded to return 0 for the primaryENI and a value that is subtly not the real IMDS device-number for everything else :P

Shall I put in a PR ahead of this one that tracks "primary-ness" explicitly and calculates our legacy "table number" without calling it device-number?

pkg/awsutils/awsutils.go Outdated Show resolved Hide resolved
return eni, 0, nil
}
// 0 is reserved for primary ENI, the rest of them has to +1 to avoid collision at 0
return eni, int(deviceNum + 1), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to do something similar in #1071, but was not 100% sure that the primary ENI is always at index 0: https://github.com/aws/amazon-vpc-cni-k8s/pull/1071/files#diff-b55f09f925e40eba84053fdae2096d9cR648

@anguslees
Copy link
Contributor Author

anguslees commented Sep 22, 2020

Rebased. With the additional changes, I think this is too big now. I'm going to split out the first commit into a new PR..

Edit: -> #1225 Will rebase this PR on that PR if/when it's merged.

@anguslees anguslees marked this pull request as draft September 22, 2020 07:24
Rather than cache and then update the cache - just don't cache :)
@anguslees anguslees changed the title Avoid caching dynamic metadata Avoid caching dynamic VPC CIDR info Sep 23, 2020
@anguslees
Copy link
Contributor Author

anguslees commented Sep 23, 2020

Drastically reduced the scope of this PR, and reworded the PR description accordingly. You can ignore most of the previous discussion.

@anguslees anguslees marked this pull request as ready for review September 23, 2020 02:50
Copy link
Contributor

@mogren mogren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One less go-routine... Nice!

Triggered another e2e test: https://github.com/aws/amazon-vpc-cni-k8s/runs/1152850334?check_suite_focus=true

Comment on lines +1419 to +1433
func (cache *EC2InstanceMetadataCache) GetVPCIPv4CIDRs() ([]string, error) {
ctx := context.TODO()

ipnets, err := cache.imds.GetVPCIPv4CIDRBlocks(ctx, cache.primaryENImac)
if err != nil {
return nil, err
}

// TODO: keep as net.IPNet and remove this round-trip to/from string
asStrs := make([]string, len(ipnets))
for i, ipnet := range ipnets {
asStrs[i] = ipnet.String()
}

return asStrs, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change. Just always fetch the CIDR blocks from the metadata, instead of having a loop to update a cache.

Copy link
Contributor

@jayanthvn jayanthvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, as @mogren mentioned one less thread :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants