Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make vm memory overhead configurable #1953

Merged
merged 7 commits into from
Jun 17, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ codegen: ## Generate code. Must be run if changes are made to ./pkg/apis/...
docgen: ## Generate docs
go run hack/docs/metrics_gen_docs.go pkg/ website/content/en/preview/tasks/metrics.md
go run hack/docs/instancetypes_gen_docs.go website/content/en/preview/AWS/instance-types.md
go run hack/docs/configuration_gen_docs.go website/content/en/preview/tasks/configuration.md

release: ## Generate release manifests and publish a versioned container image.
$(WITH_GOFLAGS) ./hack/release.sh
Expand Down
2 changes: 1 addition & 1 deletion cmd/controller/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ import (

var (
scheme = runtime.NewScheme()
opts = options.MustParse()
opts = options.New().MustParse()
component = "controller"
)

Expand Down
20 changes: 14 additions & 6 deletions cmd/webhook/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ package main

import (
"context"
"flag"
"fmt"

"k8s.io/client-go/kubernetes"
Expand All @@ -35,13 +36,20 @@ import (
"github.com/aws/karpenter/pkg/apis"
"github.com/aws/karpenter/pkg/cloudprovider"
"github.com/aws/karpenter/pkg/cloudprovider/registry"
"github.com/aws/karpenter/pkg/utils/injection"
"github.com/aws/karpenter/pkg/utils/options"
"github.com/aws/karpenter/pkg/utils/env"
)

var (
opts = options.MustParse()
)
type WebhookOpts struct {
KarpenterService string
WebhookPort int
}

var opts = WebhookOpts{}

func init() {
flag.StringVar(&opts.KarpenterService, "karpenter-service", env.WithDefaultString("KARPENTER_SERVICE", ""), "The Karpenter Service name for the dynamic webhook certificate")
flag.IntVar(&opts.WebhookPort, "port", env.WithDefaultInt("PORT", 8443), "The port the webhook endpoint binds to for validation and mutation of resources")
}

func main() {
config := knativeinjection.ParseAndGetRESTConfigOrDie()
Expand Down Expand Up @@ -94,5 +102,5 @@ func newConfigValidationController(ctx context.Context, cmw configmap.Watcher) *
}

func InjectContext(ctx context.Context) context.Context {
return injection.WithOptions(ctx, opts)
return ctx
}
52 changes: 52 additions & 0 deletions hack/docs/configuration_gen_docs.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
package main

import (
"flag"
"fmt"
"log"
"os"
"strings"

"github.com/aws/karpenter/pkg/utils/options"
)

func main() {
if len(os.Args) != 2 {
log.Printf("Usage: %s path/to/markdown.md", os.Args[0])
os.Exit(1)
}
outputFileName := os.Args[1]
mdFile, err := os.ReadFile(outputFileName)
if err != nil {
log.Printf("Can't read %s file: %v", os.Args[1], err)
os.Exit(2)
}

genStart := "[comment]: <> (the content below is generated from hack/docs/configuration_gen_docs.go)"
genEnd := "[comment]: <> (end docs generated content from hack/docs/configuration_gen_docs.go)"
startDocSections := strings.Split(string(mdFile), genStart)
if len(startDocSections) != 2 {
log.Fatalf("expected one generated comment block start but got %d", len(startDocSections)-1)
}
endDocSections := strings.Split(string(mdFile), genEnd)
if len(endDocSections) != 2 {
log.Fatalf("expected one generated comment block end but got %d", len(endDocSections)-1)
}
topDoc := fmt.Sprintf("%s%s\n\n", startDocSections[0], genStart)
bottomDoc := fmt.Sprintf("\n%s%s", genEnd, endDocSections[1])

opts := options.New()

envVarsBlock := "| Environment Variable | CLI Flag | Description |\n"
envVarsBlock += "|--|--|--|\n"
opts.VisitAll(func(f *flag.Flag) {
envVarsBlock += fmt.Sprintf("| %s | %s | %s |\n", strings.ReplaceAll(strings.ToUpper(f.Name), "-", "_"), "\\-\\-"+f.Name, f.Usage)
})

log.Println("writing output to", outputFileName)
f, err := os.Create(outputFileName)
if err != nil {
log.Fatalf("unable to open %s to write generated output: %v", outputFileName, err)
}
f.WriteString(topDoc + envVarsBlock + bottomDoc)
}
6 changes: 5 additions & 1 deletion hack/docs/instancetypes_gen_docs.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ import (
"github.com/aws/karpenter/pkg/cloudprovider"
"github.com/aws/karpenter/pkg/cloudprovider/aws"
"github.com/aws/karpenter/pkg/cloudprovider/aws/apis/v1alpha1"
"github.com/aws/karpenter/pkg/utils/injection"
"github.com/aws/karpenter/pkg/utils/options"
"github.com/aws/karpenter/pkg/utils/resources"
"github.com/samber/lo"
v1 "k8s.io/api/core/v1"
Expand All @@ -31,7 +33,9 @@ func main() {

os.Setenv("AWS_SDK_LOAD_CONFIG", "true")
os.Setenv("AWS_REGION", "us-east-1")
ctx := context.Background()
os.Setenv("CLUSTER_NAME", "docs-gen")
os.Setenv("CLUSTER_ENDPOINT", "https://docs-gen.aws")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:O What's this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just dummy params to allow for flag parsing to take place. This allowed me to plug in other params that could change the generated instance types resources like max-pods and vm overhead easily. Figured I'd leave in how I did it so that if we need to or want to change some of those parameters, it's a simple os.Setenv(...

ctx := injection.WithOptions(context.Background(), options.New().MustParse())

cp := aws.NewCloudProvider(ctx, cloudprovider.Options{
ClientSet: nil,
Expand Down
17 changes: 8 additions & 9 deletions pkg/cloudprovider/aws/instancetype.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ package aws

import (
"fmt"
"math"
"strings"

"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/aws/vpc"
Expand All @@ -35,9 +36,6 @@ import (
"github.com/aws/karpenter/pkg/utils/sets"
)

// EC2VMAvailableMemoryFactor assumes the EC2 VM will consume <7.25% of the memory of a given machine
const EC2VMAvailableMemoryFactor = .925

type InstanceType struct {
*ec2.InstanceTypeInfo
offerings []cloudprovider.Offering
Expand Down Expand Up @@ -176,9 +174,7 @@ func (i *InstanceType) cpu() resource.Quantity {

func (i *InstanceType) memory() resource.Quantity {
return *resources.Quantity(
fmt.Sprintf("%dMi", int32(
float64(*i.MemoryInfo.SizeInMiB)*EC2VMAvailableMemoryFactor,
)),
fmt.Sprintf("%dMi", *i.MemoryInfo.SizeInMiB),
)
}

Expand Down Expand Up @@ -246,14 +242,17 @@ func (i *InstanceType) awsNeurons() resource.Quantity {
return *resources.Quantity(fmt.Sprint(count))
}

func (i *InstanceType) computeOverhead() v1.ResourceList {
func (i *InstanceType) computeOverhead(vmMemOverhead float64) v1.ResourceList {
memory := i.memory()
overhead := v1.ResourceList{
v1.ResourceCPU: *resource.NewMilliQuantity(
100, // system-reserved
resource.DecimalSI),
v1.ResourceMemory: resource.MustParse(fmt.Sprintf("%dMi",
// kube-reserved
((11*i.eniLimitedPods())+255)+
// vm-overhead
(int64(math.Ceil(float64(memory.Value())*vmMemOverhead/1024/1024)))+
// kube-reserved
((11*i.eniLimitedPods())+255)+
// system-reserved
100+
// eviction threshold https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/pkg/kubelet/apis/config/v1beta1/defaults_linux.go#L23
Expand Down
2 changes: 1 addition & 1 deletion pkg/cloudprovider/aws/instancetypes.go
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ func (p *InstanceTypeProvider) newInstanceType(ctx context.Context, info *ec2.In
}
// Precompute to minimize memory/compute overhead
instanceType.resources = instanceType.computeResources(injection.GetOptions(ctx).AWSEnablePodENI)
instanceType.overhead = instanceType.computeOverhead()
instanceType.overhead = instanceType.computeOverhead(injection.GetOptions(ctx).VMMemoryOverhead)
instanceType.requirements = instanceType.computeRequirements()
if !injection.GetOptions(ctx).AWSENILimitedPodDensity {
instanceType.maxPods = ptr.Int32(110)
Expand Down
14 changes: 14 additions & 0 deletions pkg/utils/env/env.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,20 @@ func WithDefaultInt(key string, def int) int {
return i
}

// WithDefaultFloat64 returns the float64 value of the supplied environment variable or, if not present,
// the supplied default value. If the float64 conversion fails, returns the default
func WithDefaultFloat64(key string, def float64) float64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a generics way to do this?

Copy link
Contributor Author

@bwagner5 bwagner5 Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The float64 func uses strconv.ParseFloat, the Bool one uses strconv.ParseBool and the Int one uses strconv.Atoi. I think the only way you could do it is to use a generic return type and accept a default interface{} as the parameter so that you could do a type switch:

func WithDefault[V float64 | int | string | bool](key string, def interface{}) V {
    var err error
    var p V
    val, ok := os.LookupEnv(key)
    if !ok {
        return def
    }
    switch def.(type) {
        case float64:
           p, err = strconv.ParseFloat(val, 64)
        case int:
           p, err = strconv.Atoi(val)
        case bool:
           p, err = strconv.ParseBool(val)
       default:
           panic("not supported")
    }
    if err != nil {
        return def.(V)
    }
    return p
}

I think they're better as separate funcs :)

val, ok := os.LookupEnv(key)
if !ok {
return def
}
f, err := strconv.ParseFloat(val, 64)
if err != nil {
return def
}
return f
}

// WithDefaultString returns the string value of the supplied environment variable or, if not present,
// the supplied default value.
func WithDefaultString(key string, def string) string {
Expand Down
63 changes: 40 additions & 23 deletions pkg/utils/options/options.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,11 @@ limitations under the License.
package options

import (
"errors"
"flag"
"fmt"
"net/url"
"os"

"go.uber.org/multierr"

Expand All @@ -31,43 +33,58 @@ const (
ResourceName AWSNodeNameConvention = "resource-name"
)

func MustParse() Options {
opts := Options{}
flag.StringVar(&opts.ClusterName, "cluster-name", env.WithDefaultString("CLUSTER_NAME", ""), "The kubernetes cluster name for resource discovery")
flag.StringVar(&opts.ClusterEndpoint, "cluster-endpoint", env.WithDefaultString("CLUSTER_ENDPOINT", ""), "The external kubernetes cluster endpoint for new nodes to connect with")
flag.StringVar(&opts.KarpenterService, "karpenter-service", env.WithDefaultString("KARPENTER_SERVICE", ""), "The Karpenter Service name for the dynamic webhook certificate")
flag.IntVar(&opts.MetricsPort, "metrics-port", env.WithDefaultInt("METRICS_PORT", 8080), "The port the metric endpoint binds to for operating metrics about the controller itself")
flag.IntVar(&opts.HealthProbePort, "health-probe-port", env.WithDefaultInt("HEALTH_PROBE_PORT", 8081), "The port the health probe endpoint binds to for reporting controller health")
flag.IntVar(&opts.WebhookPort, "port", 8443, "The port the webhook endpoint binds to for validation and mutation of resources")
flag.IntVar(&opts.KubeClientQPS, "kube-client-qps", env.WithDefaultInt("KUBE_CLIENT_QPS", 200), "The smoothed rate of qps to kube-apiserver")
flag.IntVar(&opts.KubeClientBurst, "kube-client-burst", env.WithDefaultInt("KUBE_CLIENT_BURST", 300), "The maximum allowed burst of queries to the kube-apiserver")
flag.StringVar(&opts.AWSNodeNameConvention, "aws-node-name-convention", env.WithDefaultString("AWS_NODE_NAME_CONVENTION", string(IPName)), "The node naming convention used by the AWS cloud provider. DEPRECATION WARNING: this field may be deprecated at any time")
flag.BoolVar(&opts.AWSENILimitedPodDensity, "aws-eni-limited-pod-density", env.WithDefaultBool("AWS_ENI_LIMITED_POD_DENSITY", true), "Indicates whether new nodes should use ENI-based pod density")
flag.StringVar(&opts.AWSDefaultInstanceProfile, "aws-default-instance-profile", env.WithDefaultString("AWS_DEFAULT_INSTANCE_PROFILE", ""), "The default instance profile to use when provisioning nodes in AWS")
flag.BoolVar(&opts.AWSEnablePodENI, "aws-enable-pod-eni", env.WithDefaultBool("AWS_ENABLE_POD_ENI", false), "If true then instances that support pod ENI will report a vpc.amazonaws.com/pod-eni resource")
flag.Parse()
if err := opts.Validate(); err != nil {
panic(err)
}
return opts
}

// Options for running this binary
type Options struct {
*flag.FlagSet
ClusterName string
ClusterEndpoint string
KarpenterService string
MetricsPort int
HealthProbePort int
WebhookPort int
KubeClientQPS int
KubeClientBurst int
VMMemoryOverhead float64
AWSNodeNameConvention string
AWSENILimitedPodDensity bool
AWSDefaultInstanceProfile string
AWSEnablePodENI bool
}

// New creates an Options struct and registers CLI flags and environment variables to fill-in the Options struct fields
func New() Options {
opts := Options{}
f := flag.NewFlagSet("karpenter", flag.ContinueOnError)
opts.FlagSet = f
f.StringVar(&opts.ClusterName, "cluster-name", env.WithDefaultString("CLUSTER_NAME", ""), "The kubernetes cluster name for resource discovery")
f.StringVar(&opts.ClusterEndpoint, "cluster-endpoint", env.WithDefaultString("CLUSTER_ENDPOINT", ""), "The external kubernetes cluster endpoint for new nodes to connect with")
f.IntVar(&opts.MetricsPort, "metrics-port", env.WithDefaultInt("METRICS_PORT", 8080), "The port the metric endpoint binds to for operating metrics about the controller itself")
f.IntVar(&opts.HealthProbePort, "health-probe-port", env.WithDefaultInt("HEALTH_PROBE_PORT", 8081), "The port the health probe endpoint binds to for reporting controller health")
f.IntVar(&opts.KubeClientQPS, "kube-client-qps", env.WithDefaultInt("KUBE_CLIENT_QPS", 200), "The smoothed rate of qps to kube-apiserver")
f.IntVar(&opts.KubeClientBurst, "kube-client-burst", env.WithDefaultInt("KUBE_CLIENT_BURST", 300), "The maximum allowed burst of queries to the kube-apiserver")
f.Float64Var(&opts.VMMemoryOverhead, "vm-memory-overhead", env.WithDefaultFloat64("VM_MEMORY_OVERHEAD", 0.075), "The VM memory overhead as a percent that will be subtracted from the total memory for all instance types")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you doing some testing on this - did we want to tune 0.075 any further or is that still the best approximation we've got by default?

Have we also given up trying to do this per instance type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing found 0.075 to be very close to the best param for AL2, but this will at least give users control if they want to tune it themselves which could make sense if the instance types are constrained too.

I gave up on per instance type, there's too much variability and we could be wrong as new ones are released, the percentage is a conservative, safe value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a customer, how do I know what value to be setting here?

I'd probably see that 0.075 is too high and therefore my nodes aren't as densely packed as they are with just nodegroups + k8s scheduler, but how do I tune this to say 0.072 etc? Is it just guesswork?

If I choose a really bad number, the worst that will happen is that Karpenter may launch an incorrect number of nodes to satisfy my pending pods but it should eventually converge. Is my understanding correct?

f.StringVar(&opts.AWSNodeNameConvention, "aws-node-name-convention", env.WithDefaultString("AWS_NODE_NAME_CONVENTION", string(IPName)), "The node naming convention used by the AWS cloud provider. DEPRECATION WARNING: this field may be deprecated at any time")
f.BoolVar(&opts.AWSENILimitedPodDensity, "aws-eni-limited-pod-density", env.WithDefaultBool("AWS_ENI_LIMITED_POD_DENSITY", true), "Indicates whether new nodes should use ENI-based pod density")
f.StringVar(&opts.AWSDefaultInstanceProfile, "aws-default-instance-profile", env.WithDefaultString("AWS_DEFAULT_INSTANCE_PROFILE", ""), "The default instance profile to use when provisioning nodes in AWS")
f.BoolVar(&opts.AWSEnablePodENI, "aws-enable-pod-eni", env.WithDefaultBool("AWS_ENABLE_POD_ENI", false), "If true then instances that support pod ENI will report a vpc.amazonaws.com/pod-eni resource")
return opts
}

// MustParse reads the user passed flags, environment variables, and default values.
// Options are valided and panics if an error is returned
func (o Options) MustParse() Options {
err := o.Parse(os.Args[1:])

if errors.Is(err, flag.ErrHelp) {
os.Exit(0)
}
if err != nil {
panic(err)
}
if err := o.Validate(); err != nil {
panic(err)
}
return o
}

func (o Options) Validate() (err error) {
err = multierr.Append(err, o.validateEndpoint())
if o.ClusterName == "" {
Expand Down
30 changes: 27 additions & 3 deletions website/content/en/preview/tasks/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,30 @@ description: >
Configure Karpenter
---

There are two main configuration mechanisms that can be used to configure Karpenter: Environment Variables / CLI parameters to the controller and webhook binaries and the `karpenter-global-settings` config-map.

## Environment Variables / CLI Flags

[comment]: <> (the content below is generated from hack/docs/configuration_gen_docs.go)

| Environment Variable | CLI Flag | Description |
|--|--|--|
| AWS_DEFAULT_INSTANCE_PROFILE | \-\-aws-default-instance-profile | The default instance profile to use when provisioning nodes in AWS |
| AWS_ENABLE_POD_ENI | \-\-aws-enable-pod-eni | If true then instances that support pod ENI will report a vpc.amazonaws.com/pod-eni resource |
| AWS_ENI_LIMITED_POD_DENSITY | \-\-aws-eni-limited-pod-density | Indicates whether new nodes should use ENI-based pod density |
| AWS_NODE_NAME_CONVENTION | \-\-aws-node-name-convention | The node naming convention used by the AWS cloud provider. DEPRECATION WARNING: this field may be deprecated at any time |
| CLUSTER_ENDPOINT | \-\-cluster-endpoint | The external kubernetes cluster endpoint for new nodes to connect with |
| CLUSTER_NAME | \-\-cluster-name | The kubernetes cluster name for resource discovery |
| HEALTH_PROBE_PORT | \-\-health-probe-port | The port the health probe endpoint binds to for reporting controller health |
| KUBE_CLIENT_BURST | \-\-kube-client-burst | The maximum allowed burst of queries to the kube-apiserver |
| KUBE_CLIENT_QPS | \-\-kube-client-qps | The smoothed rate of qps to kube-apiserver |
| METRICS_PORT | \-\-metrics-port | The port the metric endpoint binds to for operating metrics about the controller itself |
| VM_MEMORY_OVERHEAD | \-\-vm-memory-overhead | The VM memory overhead as a percent that will be subtracted from the total memory for all instance types |

[comment]: <> (end docs generated content from hack/docs/configuration_gen_docs.go)

## ConfigMap

Karpenter installs a default configuration via its Helm chart that should work for most. Additional configuration can be performed by editing the `karpenter-global-settings` configmap within the namespace that Karpenter was installed in.

```yaml
Expand All @@ -21,19 +45,19 @@ data:
batchIdleDuration: 1s
```

## Batching Parameters
### Batching Parameters

The batching parameters control how Karpenter batches an incoming stream of pending pods. Reducing these values may trade off a slightly faster time from pending pod to node launch, in exchange for launching smaller nodes. Increasing the values can do the inverse. Karpenter provides reasonable defaults for these values, but if you have specific knowledge about your workloads you can tweak these parameters to match the expected rate of incoming pods.

For a standard deployment scale-up, the pods arrive at the QPS setting of the `kube-controller-manager`, and the default values are typically fine. These settings are intended for use cases where other systems may create large numbers of pods over a period of many seconds or minutes and there is a desire to batch them together.

### `batchIdleDuration`
#### `batchIdleDuration`

The `batchIdleDuration` is the period of time that a new pending pod extends the current batching window. This can be increased to handle scenarios where pods arrive slower than one second part, but it would be preferable if they were batched together onto a single larger node.

This value is expressed as a string value like `10s`, `1m` or `2h45m`. The valid time units are `ns`, `us` (or `µs`), `ms`, `s`, `m`, `h`.

### `batchMaxDuration`
#### `batchMaxDuration`

The `batchMaxDuration` is the maximum period of time a batching window can be extended to. Increasing this value will allow the maximum batch window size to increase to collect more pending pods into a single batch at the expense of a longer delay from when the first pending pod was created.

Expand Down