New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: make vm memory overhead configurable #1953
Conversation
✅ Deploy Preview for karpenter-docs-prod canceled.
|
@@ -31,7 +33,9 @@ func main() { | |||
|
|||
os.Setenv("AWS_SDK_LOAD_CONFIG", "true") | |||
os.Setenv("AWS_REGION", "us-east-1") | |||
ctx := context.Background() | |||
os.Setenv("CLUSTER_NAME", "docs-gen") | |||
os.Setenv("CLUSTER_ENDPOINT", "https://docs-gen.aws") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:O What's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just dummy params to allow for flag parsing to take place. This allowed me to plug in other params that could change the generated instance types resources like max-pods and vm overhead easily. Figured I'd leave in how I did it so that if we need to or want to change some of those parameters, it's a simple os.Setenv(...
@@ -33,6 +33,20 @@ func WithDefaultInt(key string, def int) int { | |||
return i | |||
} | |||
|
|||
// WithDefaultFloat64 returns the float64 value of the supplied environment variable or, if not present, | |||
// the supplied default value. If the float64 conversion fails, returns the default | |||
func WithDefaultFloat64(key string, def float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a generics way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. The float64 func uses strconv.ParseFloat
, the Bool one uses strconv.ParseBool and the Int one uses strconv.Atoi
. I think the only way you could do it is to use a generic return type and accept a default interface{} as the parameter so that you could do a type switch:
func WithDefault[V float64 | int | string | bool](key string, def interface{}) V {
var err error
var p V
val, ok := os.LookupEnv(key)
if !ok {
return def
}
switch def.(type) {
case float64:
p, err = strconv.ParseFloat(val, 64)
case int:
p, err = strconv.Atoi(val)
case bool:
p, err = strconv.ParseBool(val)
default:
panic("not supported")
}
if err != nil {
return def.(V)
}
return p
}
I think they're better as separate funcs :)
f.IntVar(&opts.HealthProbePort, "health-probe-port", env.WithDefaultInt("HEALTH_PROBE_PORT", 8081), "The port the health probe endpoint binds to for reporting controller health") | ||
f.IntVar(&opts.KubeClientQPS, "kube-client-qps", env.WithDefaultInt("KUBE_CLIENT_QPS", 200), "The smoothed rate of qps to kube-apiserver") | ||
f.IntVar(&opts.KubeClientBurst, "kube-client-burst", env.WithDefaultInt("KUBE_CLIENT_BURST", 300), "The maximum allowed burst of queries to the kube-apiserver") | ||
f.Float64Var(&opts.VMMemoryOverhead, "vm-memory-overhead", env.WithDefaultFloat64("VM_MEMORY_OVERHEAD", 0.075), "The VM memory overhead as a percent that will be subtracted from the total memory for all instance types") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you doing some testing on this - did we want to tune 0.075 any further or is that still the best approximation we've got by default?
Have we also given up trying to do this per instance type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing found 0.075
to be very close to the best param for AL2, but this will at least give users control if they want to tune it themselves which could make sense if the instance types are constrained too.
I gave up on per instance type, there's too much variability and we could be wrong as new ones are released, the percentage is a conservative, safe value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a customer, how do I know what value to be setting here?
I'd probably see that 0.075 is too high and therefore my nodes aren't as densely packed as they are with just nodegroups + k8s scheduler, but how do I tune this to say 0.072 etc? Is it just guesswork?
If I choose a really bad number, the worst that will happen is that Karpenter may launch an incorrect number of nodes to satisfy my pending pods but it should eventually converge. Is my understanding correct?
Fixes #
Description
How was this change tested?
make docgen
and observed the overhead calculation is the same as before.Does this change impact docs?
Release Note
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.