Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for more kubelet and container runtime configuration #1346

Closed
wants to merge 1 commit into from
Closed

Add support for more kubelet and container runtime configuration #1346

wants to merge 1 commit into from

Conversation

alex-berger
Copy link
Contributor

@alex-berger alex-berger commented Feb 15, 2022

Add support for more kubelet and container runtime configuration parameters

1. Issue, if available:

None

2. Description of changes:

This PR adds support for commonly used kubelet configuration parameters and also introduces the first
container runtime configuration parameter (registry mirrors).

3. How was this change tested?

Only by running the unit tests via make dev.

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify
Copy link

netlify bot commented Feb 15, 2022

✔️ Deploy Preview for karpenter-docs-prod ready!

🔨 Explore the source changes: 61856fb

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/621358737b017b0007cbc491

😎 Browse the preview: https://deploy-preview-1346--karpenter-docs-prod.netlify.app

)

// ContainerRuntimeConfiguration defines args to be used when configuring the Container Runtime.
// Note, not all Container Runtime implementations will support all of these settings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this struct from? Is this a generic interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not found any existing API (struct) for those parameters. However, most Container Runtimes (CRI implementations) accept some configuration (usually via config files). This was inspired by bottlerocket which will then create the corresponding config files:

As Container Runtime configuration is not cloud provider specific, I thought that it would make sense to place it alongside KubeletConfiguration in the Constraints struct.

Copy link
Contributor

@ellistarn ellistarn Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not comfortable defining this abstraction in Karpenter without deep understanding of where this is going. I worry that this will become an unmaintainable bag of parameters. It's a different story for kubelet configuration, since it's an OSS standard we're taking a subset of.

Are you blocked on this? What's the use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am blocked on this. We need to be able to define the registry mirrors, to make sure all container images are pulled through our own (pull through caching & security policy enforcing) registries.

We can also move it into AWS provider specific part (the AWS struct). However, as written before, this kind of settings are quite common among Container Runtimes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is registry mirrors all you needed? There's a lot that's being added here and I want to make sure we only add fields as they're necessary and prevent a bloated spec.

Keep in mind that eventually we do intend on inlining UserData in the Provisioner as well which could be used to support any kind of customization (OS level or kubelet / container-runtime level).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that eventually we do intend on inlining UserData in the Provisioner as well which could be used to support any kind of customization (OS level or kubelet / container-runtime level).

@suket22 With this PR I just followed the path already paved (in the current code base), which does not support setting a verbatim UserData field and instead uses input from KubeletConfiguration and AWS structs to assemble the UserData. If you intend to replace all that logic with a simple userData field (e.g. in Constraints or AWS structs) that's fine and I am with you (and I will gladly refactor things with this PR). Just let me know which directions you guys want to go.

Is registry mirrors all you needed? There's a lot that's being added here and I want to make sure we only add fields as they're necessary and prevent a bloated spec.

While I understand your concerns, you should keep in mind that if you abstract over a configurable system (like for example AML2, bottlerocket, the kubelet, containerd, ...), there is a reason why those system have so many configuration parameters and that your abstractions must have means to pass on such configuration parameters to the underlying systems. Actually, this is a strong hint that Approach 3 - Provide a more generic way to specify launch template contents in the Provider might be the right way to avoid having to change karpenter with each user (specific) need.

In our specific case we need to fine tune the kubelet and containerd to make sure we are compliant with several different high availability and security compliance requirements. For example we must be able to set eventRecordQPS (to unlimited) for security compliance reasons (see CIS Kubernetes Benchmarks control 4.2.9).

@suket22
Copy link
Contributor

suket22 commented Feb 17, 2022

We should also update the documentation for this. If you could list out all the kubeletConfiguration fields here https://karpenter.sh/v0.6.3/provisioner/#speckubeletconfiguration and add containerRuntimeConfiguration that would be great!

Copy link
Contributor

@suket22 suket22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly lgtm.

  1. Could we add tests for this code? The UserData logic should definitely be added to IMO. @spring1843 recently added support for coveralls reports btw, but not going to block this PR on those results.

  2. Did you get a chance to try your change out as well and verify if that worked as expected? The existing CI and unit tests aren't exhaustive so a manual test / smoke test would be great since the blast radius is quite high - if we mess up UserData generation, nobody's nodes join the cluster.

  3. We can probably do this outside of this PR but we should add validation to the AWS provider that fails if a launch template is specified in addition to either kubeletConfiguration / containerRuntimeConfig since we can't honor both.

Comment on lines 158 to 159
eventRecordQPS: 5
eventBurst: 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spacing is incorrect for these - it should be on the same level as clusterDNS

// KubeletConfiguration defines args to be used when configuring kubelet on provisioned nodes.
// They are a subset of the upstream types, recognizing not all options may be supported.
// Wherever possible, the types and names should reflect the upstream kubelet types.
// Wherever possible, the types and names should reflect the upstream kubelet types from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wherever possible,

This concerns me. This spec should be an explicit, word for word copy paste of snippits from that struct.

EvictionHard map[string]string `json:"evictionHard,omitempty"`
}

const notNegative = "cannot be negative"
Copy link
Contributor

@ellistarn ellistarn Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used once, cleaner to inline it where it's used.

const notNegative = "cannot be negative"

func (k *KubeletConfiguration) validate() (errs *apis.FieldError) {
errs = addErrIfNegative(errs, k.EventRecordQPS, "eventRecordQPS")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errs.Also will collapse nil errors. This would be a bit more idiomatic, I think:

return errs.Also(
  validateGreaterThan(k.EventBurst, 0, "eventRecordQPS"),
  validateGreaterThan(k. RegistryPullQPS, 0, "requistryPullQPS")
)

// Note that not all providers may use all addresses.
//+optional
ClusterDNS []string `json:"clusterDNS,omitempty"`
// EventRecordQPS is the maximum event creations per second. If 0,
// there is no limit enforced.
EventRecordQPS *int32 `json:"eventRecordQPS,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need all of these parameters for your use case? Can you provide a sample of how you'd use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please see #1346 (comment) (above)

//+optional
KubeletConfiguration KubeletConfiguration `json:"kubeletConfiguration,omitempty"`
// ContainerRuntimeConfiguration are options passed to the container runtime when provisioning nodes
//+optional
ContainerRuntimeConfiguration ContainerRuntimeConfiguration `json:"containerRuntimeConfiguration,omitempty"`
Copy link
Contributor

@ellistarn ellistarn Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not comfortable including this custom spec into our API surface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please see #1346 (comment) (above)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you on this request. Still -- this is API surface and we need to get it right.

Can we at least remove from this PR so that we can move forward with the kubeletconfiguration changes which are less contentious? We can address the remaining gap in a followon.

@alex-berger
Copy link
Contributor Author

3. We can probably do this outside of this PR but we should add validation to the AWS provider that fails if a launch template is specified in addition to either kubeletConfiguration / containerRuntimeConfig since we can't honor both.

@suket22 I guess a provided launch template does not conflict with kubeletConfiguration / containerRuntimeConfig as long as its userData field is empty. Maybe I am wrong, but in that case the validation logic should only fail if all of the following conditions are met:

  • a launch template is provided
  • its userData field is not empty
  • either of kubeletConfiguration / containerRuntimeConfig is not empty

@alex-berger
Copy link
Contributor Author

2. Did you get a chance to try your change out as well and verify if that worked as expected? The existing CI and unit tests aren't exhaustive so a manual test / smoke test would be great since the blast radius is quite high - if we mess up UserData generation, nobody's nodes join the cluster.

@suket22 No, I have no reasonable (local) test infrastructure / setup for this yet. Would be great if we could add this to CI (instead of each part-time contributor wasting time to figure out how to test this locally).

@ellistarn
Copy link
Contributor

@suket22 No, I have no reasonable (local) test infrastructure / setup for this yet. Would be great if we could add this to CI (instead of each part-time contributor wasting time to figure out how to test this locally).

We're working on e2e CI that should help here. FWIW, I think it's pretty critical that you test something manually when it's developed. Otherwise it can be easy to miss edge cases, performance problems, etc.

@ellistarn
Copy link
Contributor

Hey @alex-berger are you still working on this? Anything I can do to unblock?

@alex-berger
Copy link
Contributor Author

Hi @ellistarn, the discussions and comments on this PR got me thinking and I started to dig a bit deeper to figure out how we could achieve:

  • better support different specific OS distributions (Bottlerocket, AmazonLinux 2/2022/..., Ubuntu),
  • support fully generic OS choice (something like AMI & UserData).
  • support for pre-existing LaunchTemplates.
  • without making things to generic or introducing bit rot.
  • still simple and easy (intuitive) to use and fully typed & validated.
  • powerful enough to have all features needed for serious day two operations of clusters, which have strict availability and security requirements (basically, that is what I do if am not writing open source code ☺)

Currently, I am preparing a (WIP) PR, which I would like to share with you (soon) to gather feedback and to see whether Karpenter would benefit from such changes. I would like to stall this PR here (for now) and wait for the feedback on the new PR. If the new PR has any chance to be accepted it might obsolete this one (otherwise not).

As soon as the above mentioned new PR is ready, I will share a link here.

@bwagner5
Copy link
Contributor

We're starting to add more of these options in this PR: #1720 Thanks for brining up these issues!

@bwagner5 bwagner5 closed this Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants