
Configuring scheduler via json configuration file #4674

Merged
6 commits merged into kubernetes:master from the abhgupta-dev branch on Mar 2, 2015

Conversation

abhgupta

This PR introduces the capability to configure the scheduler via a JSON file. The API is versioned. This is the implementation for issue #4303.

@googlebot

Thanks for your pull request.

It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits.

Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name.

@abhgupta
Author

@bgrant0607 @davidopp @mikedanese Please review

@vmarmol
Contributor

vmarmol commented Feb 20, 2015

Assigning to @bgrant0607, feel free to re-assign.

@abhgupta
Author

@smarterclayton PTAL - this incorporates the feedback that you provided.

@vmarmol
Contributor

vmarmol commented Feb 20, 2015

@abhgupta can you rebase? It looks like this doesn't merge cleanly anymore.

abhgupta force-pushed the abhgupta-dev branch 2 times, most recently from 21f4c71 to 2ddb069 on February 20, 2015
@abhgupta
Author

I have some configuration examples written up that I'll submit once this PR goes through some feedback/review. Sample JSON configuration:

{
    "predicates" : [
        {"name" : "RegionAffinity", "argument" : {"serviceAffinity" : {"labels" : ["region"]}}},
        {"name" : "RequireRegion", "argument" : {"labelsPresence" : {"labels" : ["region"], "presence" : true}}},
        {"name" : "PodFitsPorts"},
        {"name" : "MatchNodeSelector}
    ],
    "priorities" : [
        {"name" : "ZoneSpread", "weight" : 2, "argument" : {"serviceAntiAffinity" : {"label" : "zone"}}},
        {"name" : "RackSpread", "weight" : 2, "argument" : {"serviceAntiAffinity" : {"label" : "rack"}}},
        {"name" : "ServiceSpreadingPriority", "weight" : 1}
    ]
}
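
To make the shape of this policy file concrete, here is a minimal Go sketch of decoding it. The Policy, PredicatePolicy, and PriorityPolicy types and their field names below are illustrative assumptions that mirror the JSON above, not the scheduler's actual API from this PR.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Hypothetical types mirroring the JSON shape above; the real types
// introduced by this PR may differ.
type PredicatePolicy struct {
	Name     string          `json:"name"`
	Argument json.RawMessage `json:"argument,omitempty"`
}

type PriorityPolicy struct {
	Name     string          `json:"name"`
	Weight   int             `json:"weight,omitempty"`
	Argument json.RawMessage `json:"argument,omitempty"`
}

type Policy struct {
	Predicates []PredicatePolicy `json:"predicates"`
	Priorities []PriorityPolicy  `json:"priorities"`
}

func main() {
	// Read and decode a policy file like the sample above.
	data, err := os.ReadFile("scheduler-policy.json")
	if err != nil {
		fmt.Fprintf(os.Stderr, "unable to read policy config: %v\n", err)
		os.Exit(1)
	}
	var policy Policy
	if err := json.Unmarshal(data, &policy); err != nil {
		fmt.Fprintf(os.Stderr, "invalid policy config: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d predicates and %d priorities\n",
		len(policy.Predicates), len(policy.Priorities))
}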

@@ -37,9 +37,9 @@ func affinityPredicates() util.StringSet {
 	"PodFitsResources",
 	"NoDiskConflict",
 	// Ensures that all pods within the same service are hosted on minions within the same region as defined by the "region" label
-	factory.RegisterFitPredicate("ServiceAffinity", algorithm.NewServiceAffinityPredicate(factory.PodLister, factory.ServiceLister, factory.MinionLister, []string{"region"})),
+	factory.RegisterFitPredicate("RegionAffinity", algorithm.NewServiceAffinityPredicate(factory.PodLister, factory.ServiceLister, factory.MinionLister, []string{"region"})),
Member

What do "Region" and "Zone" mean in this context? We do not advocate running a single Kubernetes cluster across multiple availability zones.

Author

The current solution is flexible and uses labels to define topological levels. The levels do not themselves have any meaning within the system, and the user is free to ascribe any meaning to them. The labels could be regions/zones/racks or building/server-room/rack.

Given that cross-region deployments are not advocated, I am leaning towards simply removing the Affinity provider completely and just providing configuration examples in the documentation to educate users on the possible use cases.

Member

I'm supportive of spreading by arbitrary labels. A common use case on bare metal is spreading across racks (power and network domain) and bus bars (power domain), for example.

I would just prefer not to use region and zone as examples, nor as defaults.

Another possibility is that we could support a few generic topology levels (level1, level2, level3, ..., level6) by default.
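
For illustration, spreading across such generic levels could be expressed in the same policy format as the sample earlier in this thread. The predicate/priority names and the level1/level2 label keys here are placeholders an admin would choose, not defaults shipped with the scheduler:

{
    "predicates" : [
        {"name" : "Level1Affinity", "argument" : {"serviceAffinity" : {"labels" : ["level1"]}}}
    ],
    "priorities" : [
        {"name" : "Level2Spread", "weight" : 1, "argument" : {"serviceAntiAffinity" : {"label" : "level2"}}}
    ]
}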

Author

Since generic nested topological levels have been implemented, I would be wary of taking a step back and restricting the number of nested levels.

And I agree, I will remove the affinity provider for now and just have affinity/anti-affinity at the different levels be generically defined in documentation.

Member

I didn't mean to imply that we should restrict the number of levels.

Member

@abhgupta That sounds reasonable, please ping me when you've made the change you described.

Member

@danmcp: Do you need anything here?

Author

@bgrant0607 Since you mentioned @danmcp I would just like to get your quick input on something. If it turns out to be a more involved question, I'll create a separate issue.

Dan had asked me why we used services instead of replication controllers for identifying peer pods for spreading. The argument is that services might not be needed and could be avoided in some cases, whereas a replication controller will most likely always be there. Technically, you could have a set of individual pods placed behind a service, or pods from multiple replication controllers placed behind a service - but what is the most convenient/practical approach?

Thoughts?

Contributor

@bgrant0607 I would only like to clarify the statements around not running Kubernetes across zones. I believe the statement is that you should run Kubernetes across a small enough area that you can afford to lose (at least the master tier for some time period) and that has low enough network latency for master <-> node and node <-> node communication for your applications. For some deployments that might mean the cluster can't cross zones. For others, crossing zones would be desired for higher availability of applications without having to maintain multiple Kubernetes clusters. And the more highly available Kubernetes masters are, the more likely crossing zones would be popular for smaller deployments.

@bgrant0607
Member

@abhgupta How do you expect to distribute the configuration file to the scheduler?

@davidopp Would you mind reviewing? Feel free to ping me directly via email regarding any specific part you'd like me to review.

bgrant0607 assigned davidopp and unassigned bgrant0607 on Feb 25, 2015
@abhgupta
Author

@bgrant0607 @davidopp The configuration file is something that the admin will place and make accessible to the scheduler. Typically, I would assume this would be handled by configuration management tools like puppet/chef/ansible/salt/etc.

Either way, the configuration file is optional; when it is absent, the DefaultProvider comes into play, ensuring that this does not become a gating factor for simple use cases.
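
As a rough sketch of that optional-file behavior (the --policy-config-file flag name and the surrounding wiring here are assumptions for illustration, not necessarily what this PR adds):

package main

import (
	"flag"
	"fmt"
	"os"
)

// Hypothetical flag name; the actual scheduler flag may differ.
var policyConfigFile = flag.String("policy-config-file", "", "path to a JSON scheduler policy file")

func main() {
	flag.Parse()
	if *policyConfigFile == "" {
		// No policy file supplied: fall back to the built-in DefaultProvider,
		// so simple deployments need no configuration at all.
		fmt.Println("using DefaultProvider predicates/priorities")
		return
	}
	// A config-management tool (puppet/chef/ansible/salt) is expected to have
	// placed this file where the scheduler can read it.
	if _, err := os.Stat(*policyConfigFile); err != nil {
		fmt.Fprintf(os.Stderr, "policy config not accessible: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("loading scheduler policy from %s\n", *policyConfigFile)
	// ...decode the JSON policy and register the named predicates/priorities...
}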

@davidopp
Member

Sorry for the delay, taking a look now.

if err != nil {
return nil, fmt.Errorf("Unable to read policy config: %v", err)
}
//err = json.Unmarshal(configData, &policy)
Member

Please remove before submitting.

@davidopp
Member

At a high level I think this PR is great; my main concern is that config files are almost certainly not the way we want to go long-term for component configs. We want something more like #1627, where we use config objects in the store. Unfortunately that effort has fallen victim to a bit of analysis paralysis and I don't see it happening before 1.0 (though I will ping that issue). It would not be hard to adapt what you've done here to use a config source other than a file, but I'm worried about people who might start using the file-based config if/when we switch to a different config mechanism.

@abhgupta
Author

@bgrant0607 @davidopp Have made the other changes based on review/feedback. Ideally I would like to get this PR merged soon since the current enhancements to predicates/priorities are not consumable without this PR. Getting this PR in would allow some much needed testing for the non-default scheduler predicates/priorities.

Are we blocking this PR in its current form, or do we agree that we can merge it now and convert to secrets later?

@bgrant0607
Member

I think we can convert to the new config mechanism later, yes. If we implement the secret-like approach, the data will be populated in a volume, so this mechanism might not even require changes.

@bgrant0607
Member

@abhgupta @danmcp: On why we use services instead of replication controllers for identifying peer pods for spreading: this is discussed in #1965 and #2312. Replication controllers are associated with deployments. We expect there to be multiple per service: canaries, rolling updates, multiple release tracks, etc. What people want is spreading of the whole service. Custom spreading is discussed in #4301 and #367.

@bgrant0607
Member

@danmcp: Yes, I agree with your description.

@davidopp
Member

LGTM

davidopp added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Feb 27, 2015
abhgupta force-pushed the abhgupta-dev branch 2 times, most recently from eba9270 to add2868 on February 27, 2015
@abhgupta
Author

@davidopp PTAL - I have added test cases that accept a JSON config for the scheduler. This is now ready for a merge.

@davidopp
Member

davidopp commented Mar 1, 2015

LGTM

@abhgupta
Author

abhgupta commented Mar 2, 2015

@davidopp I modified the test case to remove the dependency on the algorithm provider. Importing the "defaults" algorithm provider was creating an import loop and all I needed for the test case was to test the combination of configurable and pre-defined predicates/priorities.
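
A sketch of the kind of test described here, assuming only a hypothetical mirror of the policy schema (so no algorithm provider import is needed); it checks that configurable and pre-defined predicates/priorities decode together. The type and test names are illustrative, not the PR's actual test code.

package policy

import (
	"encoding/json"
	"testing"
)

// Hypothetical mirror of the policy schema; not the PR's actual types.
type testPolicy struct {
	Predicates []struct {
		Name     string          `json:"name"`
		Argument json.RawMessage `json:"argument,omitempty"`
	} `json:"predicates"`
	Priorities []struct {
		Name   string `json:"name"`
		Weight int    `json:"weight,omitempty"`
	} `json:"priorities"`
}

func TestPolicyDecodesConfigurableAndPredefined(t *testing.T) {
	config := `{
		"predicates" : [
			{"name" : "RequireRegion", "argument" : {"labelsPresence" : {"labels" : ["region"], "presence" : true}}},
			{"name" : "PodFitsPorts"}
		],
		"priorities" : [
			{"name" : "ServiceSpreadingPriority", "weight" : 1}
		]
	}`
	var p testPolicy
	if err := json.Unmarshal([]byte(config), &p); err != nil {
		t.Fatalf("unexpected error decoding policy: %v", err)
	}
	if len(p.Predicates) != 2 || len(p.Priorities) != 1 {
		t.Fatalf("decoded %d predicates and %d priorities, want 2 and 1",
			len(p.Predicates), len(p.Priorities))
	}
	// The configurable predicate carries an argument; the pre-defined one does not.
	if p.Predicates[0].Argument == nil || p.Predicates[1].Argument != nil {
		t.Fatalf("unexpected argument presence: %+v", p.Predicates)
	}
}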

davidopp added a commit that referenced this pull request Mar 2, 2015
Configuring scheduler via json configuration file
davidopp merged commit 32523f8 into kubernetes:master on Mar 2, 2015