
feat: make vm memory overhead configurable #1953

Merged

Merged 7 commits on Jun 17, 2022

Conversation

bwagner5
Contributor

Fixes #

Description

  • Custom AMIs can require a different VM memory overhead because of other software or agents running on the machine. This PR makes the VM memory overhead parameter configurable via a CLI arg or env var.
  • I've also added a script that generates a table of the env vars / CLI flags and their usage descriptions for configuring the Karpenter controller.

How was this change tested?

  • Ran make docgen and observed that the overhead calculation is the same as before.
  • Varied the VM overhead parameter across 0, 1, and 0.075, observing correct generated memory values in the docs.

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

VM Memory Overhead is now configurable via an environment variable to the Karpenter controller (see the options here: https://karpenter.sh/preview/tasks/configuration/)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@bwagner5 bwagner5 requested a review from a team as a code owner June 17, 2022 16:49
@bwagner5 bwagner5 requested a review from suket22 June 17, 2022 16:49
netlify bot commented Jun 17, 2022

Deploy Preview for karpenter-docs-prod canceled.

🔨 Latest commit: a743baa
🔍 Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/62accad7204da000076436f1

@@ -31,7 +33,9 @@ func main() {

os.Setenv("AWS_SDK_LOAD_CONFIG", "true")
os.Setenv("AWS_REGION", "us-east-1")
ctx := context.Background()
os.Setenv("CLUSTER_NAME", "docs-gen")
os.Setenv("CLUSTER_ENDPOINT", "https://docs-gen.aws")
Contributor

:O What's this?

Contributor Author

Just dummy params to allow flag parsing to take place. This let me easily plug in other params that could change the generated instance-type resources, like max-pods and VM overhead. I figured I'd leave in how I did it so that if we need or want to change some of those parameters, it's a simple os.Setenv(...).

@@ -33,6 +33,20 @@ func WithDefaultInt(key string, def int) int {
return i
}

// WithDefaultFloat64 returns the float64 value of the supplied environment variable or, if not present,
// the supplied default value. If the float64 conversion fails, returns the default
func WithDefaultFloat64(key string, def float64) float64 {
Contributor

Is there a generics way to do this?

Contributor Author
@bwagner5 Jun 17, 2022

I don't think so. The float64 func uses strconv.ParseFloat, the Bool one uses strconv.ParseBool, and the Int one uses strconv.Atoi. I think the only way you could do it is to use a generic return type and accept a default interface{} as the parameter so that you could do a type switch:

func WithDefault[V float64 | int | string | bool](key string, def interface{}) V {
    val, ok := os.LookupEnv(key)
    if !ok {
        return def.(V)
    }
    var p interface{}
    var err error
    switch def.(type) {
    case float64:
        p, err = strconv.ParseFloat(val, 64)
    case int:
        p, err = strconv.Atoi(val)
    case bool:
        p, err = strconv.ParseBool(val)
    default:
        panic("not supported")
    }
    if err != nil {
        return def.(V)
    }
    return p.(V)
}

I think they're better as separate funcs :)

f.IntVar(&opts.HealthProbePort, "health-probe-port", env.WithDefaultInt("HEALTH_PROBE_PORT", 8081), "The port the health probe endpoint binds to for reporting controller health")
f.IntVar(&opts.KubeClientQPS, "kube-client-qps", env.WithDefaultInt("KUBE_CLIENT_QPS", 200), "The smoothed rate of qps to kube-apiserver")
f.IntVar(&opts.KubeClientBurst, "kube-client-burst", env.WithDefaultInt("KUBE_CLIENT_BURST", 300), "The maximum allowed burst of queries to the kube-apiserver")
f.Float64Var(&opts.VMMemoryOverhead, "vm-memory-overhead", env.WithDefaultFloat64("VM_MEMORY_OVERHEAD", 0.075), "The VM memory overhead as a percent that will be subtracted from the total memory for all instance types")
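For context on what the vm-memory-overhead fraction does: per the flag description, the percentage is subtracted from each instance type's total memory. A minimal illustration (the helper name and MiB figures here are hypothetical, not Karpenter's actual code):

```go
package main

import "fmt"

// usableMemoryMiB is a hypothetical helper showing how a fractional VM
// memory overhead is applied: the fraction is subtracted from the
// instance type's total memory.
func usableMemoryMiB(totalMiB, vmMemoryOverhead float64) float64 {
	return totalMiB * (1 - vmMemoryOverhead)
}

func main() {
	// An 8 GiB (8192 MiB) instance with the 0.075 default keeps 92.5%:
	fmt.Printf("%.1f\n", usableMemoryMiB(8192, 0.075)) // 7577.6
	// The 0 and 1 extremes exercised in testing: full memory vs. none.
	fmt.Println(usableMemoryMiB(8192, 0), usableMemoryMiB(8192, 1))
}
```

Setting the fraction too low risks overcommitting memory; too high leaves nodes less densely packed, which is the trade-off discussed in the review comments.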
Contributor

I remember you doing some testing on this - did we want to tune 0.075 any further or is that still the best approximation we've got by default?

Have we also given up trying to do this per instance type?

Contributor Author

Testing found 0.075 to be very close to the best parameter for AL2, but this at least gives users control if they want to tune it themselves, which could also make sense if their instance types are constrained.

I gave up on per-instance-type overheads; there's too much variability, and we could be wrong as new instance types are released. The percentage is a conservative, safe value.

Contributor

As a customer, how do I know what value to be setting here?

I'd probably see that 0.075 is too high and therefore my nodes aren't as densely packed as they would be with just node groups + the k8s scheduler, but how do I tune this to, say, 0.072? Is it just guesswork?

If I choose a really bad number, the worst that will happen is that Karpenter may launch an incorrect number of nodes to satisfy my pending pods, but it should eventually converge. Is my understanding correct?

3 participants