[WIP] Autoscaling Controller #385

Merged: 72 commits merged into develop from controller on Oct 21, 2017
Conversation

typhoonzero (Collaborator) commented Oct 12, 2017:

Fix #384

This is still under development; I created this PR to show the differences.

typhoonzero changed the title from "Autoscaling Controller" to "[WIP] Autoscaling Controller" on Oct 12, 2017.
typhoonzero (Collaborator, Author) commented:
Changes made:

  • Moved API definitions to go/api. Use v1.ResourceRequirements from client-go as the Resource definition.
  • Added jobparser.go to parse the TrainingJob resource into the corresponding Job and ReplicaSet.
  • Moved the autoscaler to go/controller/autoscaler. Using v1.ResourceRequirements is precise but annoying; this should be refined.

TODO:

  • Complete jobparser.
  • Sync the status of the TrainingJob third-party resource.

@@ -147,7 +147,7 @@ func (c *K8sCluster) SyncResource() error {
}
for resname, q := range item.Status.Allocatable {
if resname == v1.ResourceCPU {
totalCPU += float64(q.Value()) + float64(q.MilliValue())/1000.0
helinwang (Collaborator) commented Oct 14, 2017:

Go's constants are very special:

Go is a statically typed language that does not permit operations that mix numeric types. You can't add a float64 to an int, or even an int32 to an int. Yet it is legal to write 1e6*time.Second or math.Exp(1) or even 1<<('\t'+2.0). In Go, constants, unlike variables, behave pretty much like regular numbers. This post explains why that is and what it means.

So it's more idiomatic to simply use 1000 here; don't worry about accidentally messing anything up. The Go compiler is very strict about types (it doesn't allow type promotion), so in the case of using constants, as far as I know, if anything could go wrong, it won't compile.

Go's constants are very carefully designed: safe and intuitive.

Please see: https://blog.golang.org/constants
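To illustrate the point, a tiny standalone sketch with made-up values (not code from this PR): the untyped constant 1000 simply takes on float64 in the expression, so neither 1000.0 nor an explicit conversion is needed, and a genuine type mismatch would fail to compile.

package main

import "fmt"

func main() {
	var milli int64 = 2500 // e.g. what q.MilliValue() might return
	// 1000 is an untyped constant, so it adapts to float64 in this expression;
	// writing 1000.0 or float64(1000) would be redundant.
	cores := float64(milli) / 1000
	fmt.Println(cores) // 2.5
}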

limits[podLimitName] = value
}
for _, container := range pod.Spec.Containers {
AddResourceList(&reqs, container.Resources.Requests)
typhoonzero (Collaborator, Author) commented:

I tried to reduce the lines by using AddResourceList.

Removed getting resources from InitContainers, because currently the Paddle job trainer does not seem to use init containers.

helinwang (Collaborator) commented Oct 19, 2017:

@typhoonzero According to ChenXi's article, we are going to use the controller in a general cluster, so there could be pods other than Paddle pods.

allLimits[key] = a
}
// get non-terminated pods from all namespaces all nodes.
// FIXME(typhoonzero): scanning all pods is not an efficient way.
typhoonzero (Collaborator, Author) commented:

We need to get pod resource requests from all the nodes. I simplified this by getting all the pods in one API call.

typhoonzero (Collaborator, Author) added:

If there are too many pods, this may take a lot of memory though.

helinwang (Collaborator) commented Oct 19, 2017:

Awesome!
Btw, I think the memory limit is fine, 1MB can fit thousands of "medium-sized" data structures.
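For context, a minimal sketch of fetching all non-terminated pods across all namespaces in a single List call. The function and package names are hypothetical, and the import paths and List signature assume a client-go release of that era (no context argument), so the actual PR code may differ.

package controller

import (
	"fmt"

	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listNonTerminatedPods returns every pod that can still hold node resources.
// Succeeded and Failed pods are filtered out server-side with a field selector,
// so only one API round-trip is needed instead of one per node.
func listNonTerminatedPods(client kubernetes.Interface) (*v1.PodList, error) {
	selector := fmt.Sprintf("status.phase!=%s,status.phase!=%s",
		v1.PodSucceeded, v1.PodFailed)
	return client.CoreV1().Pods(metav1.NamespaceAll).
		List(metav1.ListOptions{FieldSelector: selector})
}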

CPURequestKilo float64
CPULimitKilo float64
CPUTotalKilo float64
CPURequestMilli int64
typhoonzero (Collaborator, Author) commented:

A K8s CPU request can be as small as 0.1 core, which is 100m (100 milli-CPU), so I changed kilo to milli.
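As a reference for the milli convention, a small sketch using apimachinery's resource.Quantity (import path assumed from the client-go ecosystem of that time):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	q := resource.MustParse("100m") // the smallest request mentioned above: 0.1 core
	fmt.Println(q.MilliValue())     // 100 -> storing requests as int64 milli-CPU keeps precision
	fmt.Println(q.Value())          // 1   -> Value() rounds up to whole units, so 0.1 core would be lost
}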

@@ -27,7 +27,8 @@ def wait_pods_running(label_selector, desired):
print "label selector: %s, desired: %s" % (label_selector, desired)
while True:
count = count_pods_by_phase(label_selector, 'Running')
if count == int(desired):
# NOTE: pods may be scaled.
typhoonzero (Collaborator, Author) commented:

Pods may be scaled.


func TestScaleDryRunNoMoreCPU(t *testing.T) {
r := ClusterResource{
CPULimitMilli: 1000,
typhoonzero (Collaborator, Author) commented:

1000 milli("1000m") means 1 cpu core, "100m" means 0.1 cpu core.

@@ -17,7 +17,7 @@ spec:
containers:
- name: trainer
image: paddlepaddle/paddlecloud-job
imagePullPolicy: Always
imagePullPolicy: Never
helinwang (Collaborator) commented:

Curious why it's "Never"?

typhoonzero (Collaborator, Author) replied:

Oh, that's for my local runs; will update.

helinwang (Collaborator) commented Oct 19, 2017:

I think if we need to add a scheduler, we should first define an interface for the scheduler, write a naive implementation of that interface, and defer the decision about which kind of scheduler to implement for as long as possible (that way we have more information to make the correct decision; see the sketch after this comment). Currently the bottleneck is not the scheduler, given that the functionality is still incomplete:

  1. The controller cannot submit jobs.
  2. The controller does not register the third-party resource by itself.
  3. There are still bugs to iron out.

I could be wrong and we can discuss this more, but I think we should not spend our time on implementing the CFS right now.

Btw, the check-in contains https://github.com/sakeven/RbTree/blob/master/rbtree.go; maybe the best way is to use it (or a fork) as a dependency.
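A rough sketch of the "interface first, naive implementation now" idea (all type and function names here are hypothetical, not from this PR):

package scheduler

// TrainingJob stands in for the third-party-resource type discussed in this
// PR; only the fields a scheduler might care about are sketched.
type TrainingJob struct {
	Name   string
	Weight int
}

// Scheduler decides how much of the free CPU (in milli) each job receives.
// A CFS-style scheduler can be swapped in later by providing another type
// that satisfies this interface.
type Scheduler interface {
	Assign(jobs []TrainingJob, freeCPUMilli int64) map[string]int64
}

// NaiveScheduler ignores weights and splits the free CPU evenly; it is just
// enough to unblock the rest of the controller work.
type NaiveScheduler struct{}

func (NaiveScheduler) Assign(jobs []TrainingJob, freeCPUMilli int64) map[string]int64 {
	out := make(map[string]int64, len(jobs))
	if len(jobs) == 0 {
		return out
	}
	share := freeCPUMilli / int64(len(jobs))
	for _, j := range jobs {
		out[j.Name] = share
	}
	return out
}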

func AddResourceList(a *v1.ResourceList, b v1.ResourceList) {
// AddResourceList add another v1.ResourceList to first's inner
// quantity. v1.ResourceList is equal to map[string]Quantity
func AddResourceList(a v1.ResourceList, b v1.ResourceList) {
helinwang (Collaborator) commented:

a is of type map, so it doesn't need to be passed as a pointer: https://github.com/golang/go/wiki/CodeReviewComments#copying

I understand that from the C programming language's perspective, AddResourceList(&reqs, container.Resources.Requests) is clearer than AddResourceList(reqs, container.Resources.Requests) about the fact that reqs is modified. I think that's fine in Go, since reqs is a map.

typhoonzero (Collaborator, Author) replied:

Thanks!
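For readers following the thread, a sketch of what the map-taking version could look like, assuming the quantities should simply accumulate (package name and details are illustrative, not necessarily the merged code):

package controller

import "k8s.io/api/core/v1"

// AddResourceList adds every quantity in b into a. Because a is a map
// (v1.ResourceList is map[v1.ResourceName]resource.Quantity), the mutation is
// visible to the caller without passing a pointer.
func AddResourceList(a v1.ResourceList, b v1.ResourceList) {
	for name, quantity := range b {
		if existing, ok := a[name]; ok {
			existing.Add(quantity) // Quantity.Add mutates the local copy...
			a[name] = existing     // ...so write it back into the map.
		} else {
			a[name] = quantity.DeepCopy()
		}
	}
}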

func getPodsTotalRequestsAndLimits(podList *v1.PodList) (reqs v1.ResourceList, limits v1.ResourceList, err error) {
reqs, limits = v1.ResourceList{}, v1.ResourceList{}
for _, pod := range podList.Items {
for _, container := range pod.Spec.Containers {
AddResourceList(reqs, container.Resources.Requests)
AddResourceList(limits, container.Resources.Limits)
}

for _, container := range pod.Spec.InitContainers {
helinwang (Collaborator) commented Oct 19, 2017:

From ChenXi's article, we are going to use the controller in a general cluster, so even though PaddlePaddle does not use init containers, other people's pods could.

typhoonzero (Collaborator, Author) replied:

You are right! Thanks!
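A short sketch of how the init-container branch above might continue if init-container resources are simply summed like regular containers (Kubernetes' own accounting actually takes the max over init containers for the effective request, so the merged code may differ):

		for _, container := range pod.Spec.InitContainers {
			AddResourceList(reqs, container.Resources.Requests)
			AddResourceList(limits, container.Resources.Limits)
		}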

typhoonzero (Collaborator, Author) commented:
@helinwang, regarding the suggestion above to define a scheduler interface, write a naive implementation, and defer the scheduler decision:

I intended to push some code that may be useful in the future. Currently we scale by scanning the pods one by one, without considering the priority or fairness of the jobs. Because of this, some "important" jobs may never get scaled. I'll add a design doc later to make this clearer.

I think I'll keep the choice of CFS here because we intend to let all jobs share resources equally (possibly with per-job weights). I'll leave the interface here and add a design doc.
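To make "share resources equally, with optional job weights" concrete, a toy calculation (hypothetical names, not part of this PR):

package scheduler

// weightedShares splits freeCPUMilli among jobs in proportion to their
// weights; equal weights degenerate to an equal split. This is the fairness
// property a CFS-style scheduler would aim to preserve over time.
func weightedShares(weights map[string]int64, freeCPUMilli int64) map[string]int64 {
	var total int64
	for _, w := range weights {
		total += w
	}
	out := make(map[string]int64, len(weights))
	if total == 0 {
		return out
	}
	for name, w := range weights {
		out[name] = freeCPUMilli * w / total
	}
	return out
}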

helinwang (Collaborator) commented:

@typhoonzero I see, thanks! Maybe we can first deploy our current implementation on the cluster, and work on the scheduler after we get more experience with real usage.

helinwang (Collaborator) left a review comment:

LGTM!

typhoonzero merged commit d5816c5 into develop on Oct 21, 2017.
xiaolao deleted the controller branch on April 15, 2022.