
perform min instance bin-packing to short circuit unneeded packings #962

Merged — 4 commits into aws:main on Dec 13, 2021

Conversation

bwagner5 (Contributor) commented Dec 9, 2021

1. Issue, if available:
N/A

2. Description of changes:

  • This PR changes the bin-packing logic to first find the maximum packing possible and then iterate through instance types until it finds the smallest instance type that supports that maximum packing. This algorithm also simplifies the packing logic a bit.
  • This PR also adds GPU resource validation to the packable instance type filtering logic. Previously, GPU instance types were excluded only when the pods did not request a GPU; if a pod did request a GPU, the instance types without GPUs were excluded at packing time, which increased the packing iterations substantially.
  • A general performance improvement was made by moving the PackablesFor(.. call outside of the remainingPods loop and instead DeepCopying the packables on each iteration to short-circuit validation that has already occurred.
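The min-packing idea above can be sketched roughly as follows. This is a toy model with hypothetical types and helpers, not Karpenter's actual implementation: `podsFit` stands in for the real packing routine, and each pod is assumed to need one vCPU.

```go
package main

import "fmt"

// packable is a hypothetical stand-in for a Karpenter Packable:
// an instance type that pods can be packed onto.
type packable struct {
	name  string
	vcpus int
}

// podsFit is a toy packing function: each pod consumes one vCPU.
func podsFit(p packable, pods int) int {
	if pods < p.vcpus {
		return pods
	}
	return p.vcpus
}

// minPack packs the largest type first to learn the maximum possible
// packing, then walks smaller types in ascending order; the first type
// that supports the same packing wins, short-circuiting the rest.
func minPack(types []packable, pods int) packable {
	largest := types[len(types)-1]
	max := podsFit(largest, pods)
	for _, t := range types[:len(types)-1] {
		if podsFit(t, pods) == max {
			return t
		}
	}
	return largest
}

func main() {
	// Types are assumed sorted from smallest to largest.
	types := []packable{{"m5.large", 2}, {"m5.xlarge", 4}, {"m5.2xlarge", 8}}
	best := minPack(types, 3)
	fmt.Println(best.name) // prints: m5.xlarge (fits all 3 pods; the 2xlarge is not needed)
}
```

In the best case the smallest type already supports the maximum packing, so only two packings run, which matches the large speedup reported below.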

Some Numbers:

Worst case (meaning min-packing didn't optimize anything):

  • Before this change a 5,000 pod packing took 1,260ms
  • After this change a 5,000 pod packing took 426ms

Best Case (min-packing engaged on smallest instance-type):

  • Before this change a 5,000 pod packing took 1,658ms
  • After this change a 5,000 pod packing took 99ms

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify bot commented Dec 9, 2021

✔️ Deploy Preview for karpenter-docs-prod ready!

🔨 Explore the source changes: 86bfe14

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61b7cea8cf53290007527b32

😎 Browse the preview: https://deploy-preview-962--karpenter-docs-prod.netlify.app

}

type InstanceType struct {
InstanceTypeOptions
options InstanceTypeOptions
Contributor:

Too fancy?

bwagner5 (Author):

This was necessary because the funcs defined in the Interface now match the struct fields of InstanceTypeOptions, since they are now exported.

ellistarn (Contributor) commented Dec 13, 2021:

Hmm -- I actually kind of did this intentionally to encourage not making them public. Do you need to modify instance types in the test? Would it be simpler to define them all in this package so we can keep them private?

bwagner5 (Author):

Making them public allows for more isolated tests with cloud provider specific data. For example, in my testing of the binpacking pkg, I loaded all the real AWS InstanceTypes into fake instance types and ran that through the packing logic for lightweight, real-world testing.

I'm more in favor of keeping them public since it makes loading arbitrary instance types easier.

@@ -39,10 +41,36 @@ type Result struct {
unpacked []*v1.Pod
}

type Packables []*Packable
Contributor:

I find this sort implementation pattern to be vastly inferior to https://pkg.go.dev/sort#Slice. You can separate out the comparator as necessary.

bwagner5 (Author):

ah you're right, I'll refactor those sorts.

Contributor:

Did you forget to push?

bwagner5 (Author):

no? I might have done an intermediate push when you looked, should be okay now.

return nil
}
}
needsGpu := p.needsResource(pods, resources.NvidiaGPU)
Contributor:

Can you collapse this into

if p.requiresResource(pods, resources.NvidiaGPU) == p.InstanceType.NvidiaGPUs().IsZero() {
  return fmt.Errorf("%s is only used if required", resources.NvidiaGPU)
}
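The suggested collapse works because "pods need a GPU" and "the instance has zero GPUs" must disagree for a valid packing; when the two booleans are equal, there is a mismatch. A self-contained illustration with plain booleans and ints (the real code uses resource quantities):

```go
package main

import "fmt"

// validateGPU mirrors the suggested collapse: the pods' GPU requirement
// and the instance's GPU count must agree. Reject when pods need a GPU
// and the instance has none, or pods need none and the instance has some.
func validateGPU(podsNeedGPU bool, instanceGPUs int) error {
	if podsNeedGPU == (instanceGPUs == 0) {
		return fmt.Errorf("nvidia.com/gpu is only used if required")
	}
	return nil
}

func main() {
	fmt.Println(validateGPU(true, 0))  // error: GPU needed but none present
	fmt.Println(validateGPU(false, 4)) // error: GPUs present but not needed
	fmt.Println(validateGPU(true, 4))  // <nil>: requirement and capacity agree
}
```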

Contributor:

Maybe this error message:

return errors.New("pod requirements do not match Nvidia GPUs")

"not required" could be understood as "is allowed" which is confusing to get in an error message.

ellistarn (Contributor) commented Dec 13, 2021:

To be clear my suggestion was to not explicitly reference anything in the string and instead use the resource type. This will allow us to abstract in the future.

// Pods is a set of pods that may schedule to the node; used for binpacking.
Pods []*v1.Pod
}

type RuntimeConstraints struct {
Contributor:

Thoughts on calling this object scheduling.Constraints for parity with v1alpha5.Constraints. I don't think that "Runtime" differentiates more meaningfully than the local context.

Contributor:

Also -- I'm not understanding why this object is useful.

pkg/test/pods.go — outdated comment thread, resolved
type SortablePods []*v1.Pod
type ByResourcesRequested struct{ SortablePods }

func (pods SortablePods) Len() int {
ellistarn (Contributor) commented Dec 9, 2021:

Thoughts on switching to the sort.Slice(pods, pod.CompareResources) approach mentioned above?

}
return packables[a].AMDGPUs().Cmp(*packables[b].AMDGPUs()) == -1 ||
packables[a].NvidiaGPUs().Cmp(*packables[b].NvidiaGPUs()) == -1 ||
packables[a].AWSNeurons().Cmp(*packables[b].AWSNeurons()) == -1
Contributor:

Do the number of AWSNeurons, AMDGPUs, and NvidiaGPUs on each instance increase with instance size at the same rate? If not, wouldn't it be possible for this to have non-deterministic behavior?

bwagner5 (Author):

They increase, though not necessarily at the same rate per GPU; they scale with vCPUs and memory. I'm not sure how that would introduce non-deterministic behavior, though?

For example:

$ ec2-instance-selector  -o table-wide --allow-list g4ad
Instance Type  VCPUs   Mem (GiB)  Hypervisor  Current Gen  Hibernation Support  CPU Arch  Network Performance  ENIs    GPUs    GPU Mem (GiB)  GPU Info             On-Demand Price/Hr
-------------  -----   ---------  ----------  -----------  -------------------  --------  -------------------  ----    ----    -------------  --------             ------------------
g4ad.xlarge    4       16         nitro       true         false                x86_64    Up to 10 Gigabit     2       1       8              AMD Radeon Pro V520  -No Price Filter Specified-
g4ad.2xlarge   8       32         nitro       true         false                x86_64    Up to 10 Gigabit     2       1       8              AMD Radeon Pro V520  -No Price Filter Specified-
g4ad.4xlarge   16      64         nitro       true         false                x86_64    Up to 10 Gigabit     3       1       8              AMD Radeon Pro V520  -No Price Filter Specified-
g4ad.8xlarge   32      128        nitro       true         false                x86_64    15 Gigabit           4       2       16             AMD Radeon Pro V520  -No Price Filter Specified-
g4ad.16xlarge  64      256        nitro       true         false                x86_64    25 Gigabit           8       4       32             AMD Radeon Pro V520  -No Price Filter Specified-
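The reviewer's concern can be made concrete: a comparator that chains independent resource comparisons with `||` is not a strict weak ordering, which is what Go's sort requires, so the final order can depend on the input order. An illustrative sketch using plain ints in place of resource.Quantity:

```go
package main

import "fmt"

// pack is a toy stand-in for a packable's GPU resources.
type pack struct{ amdGPUs, nvidiaGPUs int }

// less mirrors the chained comparator: a is "smaller" than b if any
// single resource is smaller, regardless of the others.
func less(a, b pack) bool {
	return a.amdGPUs < b.amdGPUs || a.nvidiaGPUs < b.nvidiaGPUs
}

func main() {
	a := pack{amdGPUs: 1, nvidiaGPUs: 0}
	b := pack{amdGPUs: 0, nvidiaGPUs: 1}
	// Both directions report "less", which violates the strict weak
	// ordering sort.Slice expects, so the sorted order is unspecified.
	fmt.Println(less(a, b), less(b, a)) // prints: true true
}
```

In practice this only matters if two instance types can disagree on which resource is larger; if all GPU counts grow together with instance size, the chained comparator happens to behave.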

if needsGpu && p.InstanceType.AWSNeurons().IsZero() {
return errors.New("aws neuron is required")
} else if !needsGpu && !p.InstanceType.AWSNeurons().IsZero() {
return errors.New("aws neuron is not required")
Contributor:

"not required" could be understood as "is allowed"

@bwagner5 bwagner5 merged commit 1faf28a into aws:main Dec 13, 2021
@bwagner5 bwagner5 deleted the min-packing branch December 13, 2021 23:21