Separate operator workload nodegroup #577
Merged
29 commits
c74baf2 NodeGroup spot instances (vishalbollu)
f4fd69c Update cluster-autoscaler.yaml (deliahu)
abfe18f Update autoscaler to version 1.16 (vishalbollu)
a6a096b Merge branch 'spot-instances' into separate-operator-workload-nodegroup (vishalbollu)
fdc8201 Calculate allocatable resources more accurately (vishalbollu)
6f12aa9 Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
b0e0fa6 Separate nodegroups (vishalbollu)
12ee06f Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
3268382 Add desired instances (vishalbollu)
8d4ea32 Minor cleanup (vishalbollu)
e952392 Remove debug statements (vishalbollu)
607c545 Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
351e68b Remove more debugging helpers (vishalbollu)
58e4933 Reset go.mod (vishalbollu)
c56ca3e Remove more echo statements (vishalbollu)
cdf862e Remove unnecessary boto3 dependency (vishalbollu)
1f18d52 Address some PR comments and fix linting (vishalbollu)
f90f921 Remove InternalClusterConfig (deliahu)
2703944 Address more PR comments (vishalbollu)
bd24c1c Separate internal cluster config (deliahu)
a8c16f4 Change cortex internal cluster path for dev to be in the dev directory (vishalbollu)
fad20f4 Update config.md docs (vishalbollu)
96005c1 Change config map key name (vishalbollu)
5848fbe Remove outdated comment and minor refactor (vishalbollu)
d37914c Fix formatting (deliahu)
acf0058 Update api_workload.go (deliahu)
19fe4ff Update memory_capacity.go (deliahu)
3f9a62f Update metrics-server.yaml (deliahu)
1c40e7e Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
@@ -0,0 +1,82 @@
# Copyright 2019 Cortex Labs, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: $CORTEX_CLUSTER_NAME
  region: $CORTEX_REGION
  version: "1.14"

nodeGroups:
  - name: ng-cortex-operator
    instanceType: t3.medium
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    ami: auto
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: 'true'
    kubeletExtraConfig:
      kubeReserved:
        cpu: 150m
        memory: 300Mi
        ephemeral-storage: 1Gi
      kubeReservedCgroup: /kube-reserved
      systemReserved:
        cpu: 150m
        memory: 300Mi
        ephemeral-storage: 1Gi
      evictionHard:
        memory.available: 200Mi
        nodefs.available: 5%

  - name: ng-cortex-worker
    instanceType: $CORTEX_INSTANCE_TYPE
    minSize: $CORTEX_MIN_INSTANCES
    maxSize: $CORTEX_MAX_INSTANCES
    desiredCapacity: $CORTEX_DESIRED_INSTANCES
    ami: auto
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: 'true'
      k8s.io/cluster-autoscaler/node-template/label/nvidia.com/gpu: 'true'
      k8s.io/cluster-autoscaler/node-template/taint/dedicated: nvidia.com/gpu=true
      k8s.io/cluster-autoscaler/node-template/label/workload: 'true'
    labels:
      lifecycle: Ec2Spot
      workload: "true"
      nvidia.com/gpu: 'true'
    taints:
      nvidia.com/gpu: "true:NoSchedule"
      workload: "true:NoSchedule"
    kubeletExtraConfig:
      kubeReserved:
        cpu: 150m
        memory: 300Mi
        ephemeral-storage: 1Gi
      kubeReservedCgroup: /kube-reserved
      systemReserved:
        cpu: 150m
        memory: 300Mi
        ephemeral-storage: 1Gi
      evictionHard:
        memory.available: 200Mi
        nodefs.available: 5%
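
Reviewer note: the `kubeletExtraConfig` reservations above determine how much of each node is left for pods. Kubernetes computes allocatable resources as capacity minus `kubeReserved`, minus `systemReserved`, minus the hard eviction threshold (memory only). The sketch below is not part of this PR; it only illustrates that arithmetic for the values in the config. The helper names (`parse_millicpu`, `parse_mi`, `allocatable`) are hypothetical, and the capacity passed in is the nominal instance size, so the result should be read as an upper bound (the memory the kubelet actually reports is somewhat lower).

```python
# Illustrative only: estimates allocatable CPU and memory from the
# reservations in the nodegroup config above. All names are hypothetical.

KUBE_RESERVED = {"cpu": "150m", "memory": "300Mi"}
SYSTEM_RESERVED = {"cpu": "150m", "memory": "300Mi"}
EVICTION_HARD_MEMORY = "200Mi"


def parse_millicpu(quantity):
    """Convert a CPU quantity such as "150m" or "2" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_mi(quantity):
    """Convert a memory quantity given in Mi or Gi into mebibytes."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(float(quantity[:-2]))
    raise ValueError("unsupported memory quantity: " + quantity)


def allocatable(capacity_cpu, capacity_mem):
    """Return (millicores, mebibytes) left for pods on a node."""
    cpu = (
        parse_millicpu(capacity_cpu)
        - parse_millicpu(KUBE_RESERVED["cpu"])
        - parse_millicpu(SYSTEM_RESERVED["cpu"])
    )
    mem = (
        parse_mi(capacity_mem)
        - parse_mi(KUBE_RESERVED["memory"])
        - parse_mi(SYSTEM_RESERVED["memory"])
        - parse_mi(EVICTION_HARD_MEMORY)
    )
    return cpu, mem


# Nominal t3.medium size: 2 vCPU, 4 GiB.
print(allocatable("2", "4Gi"))
```

With these reservations, 300m of CPU and 800Mi of memory are withheld from every node regardless of instance type, which matters most on small instances such as the operator's `t3.medium`.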
@@ -0,0 +1,91 @@
# Copyright 2019 Cortex Labs, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import requests
import sys
import re
import os
import pathlib
import json
import yaml

PRICING_ENDPOINT_TEMPLATE = (
    "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/{}/index.json"
)


def download_metadata(cluster_config):
    response = requests.get(PRICING_ENDPOINT_TEMPLATE.format(cluster_config["region"]))
    offers = response.json()

    instance_mapping = {}

    for product_id, product in offers["products"].items():
        if product.get("attributes") is None:
            continue
        if product["attributes"].get("servicecode") != "AmazonEC2":
            continue
        if product["attributes"].get("tenancy") != "Shared":
            continue
        if product["attributes"].get("operatingSystem") != "Linux":
            continue
        if product["attributes"].get("capacitystatus") != "Used":
            continue
        if product["attributes"].get("operation") != "RunInstances":
            continue
        price_dimensions = list(offers["terms"]["OnDemand"][product["sku"]].values())[0][
            "priceDimensions"
        ]

        price = list(price_dimensions.values())[0]["pricePerUnit"]["USD"]

        instance_type = product["attributes"]["instanceType"]
        metadata = {
            "sku": product["sku"],
            "instance_type": instance_type,
            "cpu": int(product["attributes"]["vcpu"]),
            "mem": int(
                float(re.sub("[^0-9\\.]", "", product["attributes"]["memory"].split(" ")[0])) * 1024
            ),
            "price": float(price),
        }
        if product["attributes"].get("gpu") is not None:
            metadata["gpu"] = product["attributes"]["gpu"]
        instance_mapping[instance_type] = metadata

    return instance_mapping


def set_ec2_metadata(cluster_config_path, internal_cluster_config_path):
    with open(cluster_config_path, "r") as f:
        cluster_config = yaml.safe_load(f)
    instance_mapping = download_metadata(cluster_config)
    instance_metadata = instance_mapping[cluster_config["instance_type"]]

    internal_cluster_config = {
        "instance_mem": str(instance_metadata["mem"]) + "Mi",
        "instance_cpu": str(instance_metadata["cpu"]),
        "instance_gpu": int(instance_metadata.get("gpu", 0)),
    }

    with open(internal_cluster_config_path, "w") as f:
        yaml.dump(internal_cluster_config, f)


def main():
    set_ec2_metadata(sys.argv[1], sys.argv[2])


if __name__ == "__main__":
    main()
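
Reviewer note: the densest expression in `download_metadata` is the `"mem"` conversion, which takes the `memory` attribute from the pricing offer, keeps the first space-separated token, strips everything except digits and dots, and multiplies by 1024 to get mebibytes. The standalone sketch below isolates that step so it can be checked without downloading the offer file. The function name `parse_memory_mib` is hypothetical, and the sample strings are assumed examples of the attribute's format rather than values taken from this PR.

```python
import re


def parse_memory_mib(memory_attr):
    """Convert a memory attribute such as "16 GiB" into whole mebibytes.

    Mirrors the expression used for "mem" in download_metadata: take the
    first space-separated token, drop any character that is not a digit
    or a dot (for example a thousands separator), then scale GiB to MiB.
    """
    gib_text = memory_attr.split(" ")[0]
    gib = float(re.sub(r"[^0-9.]", "", gib_text))
    return int(gib * 1024)


# Assumed sample inputs, including a fractional size and a thousands separator.
for sample in ["16 GiB", "3.75 GiB", "1,952 GiB"]:
    print(sample, "->", str(parse_memory_mib(sample)) + "Mi")
```

A value with no digits at all would leave an empty string after the substitution, and `float("")` raises `ValueError`, so the script as written would stop on such an entry rather than skip it.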