Commit
* wip to add AWS
* add support for cluster deletion
* finish up scaling and example, a few bug fixes

The steps here do a full creation of the cluster, which includes the VPC, subnets (private and public), and a security group, associating (and creating if needed) a PEM key, getting the endpoint and certificate to make a kubeconfig YAML file to authenticate, and then another stack to create the workers pool. It is interesting that AWS first creates you an "empty" cluster, meaning just a control plane, and then you need to create the workers as a separate request and apply a ConfigMap in kube-system so the control plane can see the workers! This is so much more complex than GKE, and now that I have everything working to go up, I have to go backwards and figure out how to delete everything before looking into scaling... hahahahahahah aahhhhh! :)

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
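The ConfigMap referred to here is EKS's `aws-auth` mapping in the `kube-system` namespace, which grants the worker nodes' IAM role access to the control plane. A minimal sketch (the role ARN is a placeholder you would replace with the node group's instance role):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/my-node-instance-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```

Without this mapping, worker instances come up in the VPC but never register with the cluster, which is why a freshly created EKS control plane looks "empty" until it is applied.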
Showing 24 changed files with 1,057 additions and 85 deletions.
```diff
@@ -12,3 +12,6 @@ dist/
 __pycache__
 *.img
 /.eggs
+*auth-config.yaml
+*kubeconfig.yaml
+*kubeconfig-*.yaml
```
# AWS Examples

## Create and Delete a Cluster

This example shows creating and deleting a cluster. You should also be able to run
it if the cluster already exists. First, make sure your AWS credentials
are exported:

```bash
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxx
export AWS_SESSION_TOKEN=xxxxxxxxxxxxxxxxxxxxxx
```

And then run the script (using the defaults, min size 1, max size 3):

```bash
$ python create-delete-cluster.py --min-node-count 1 --max-node-count 3 --machine-type m5.large
```

## Test Scale

Here are some example runs for testing the time it takes to scale a cluster up.
We also time separate components of scaling, like creating the worker pool and
the VPC. We use small max sizes here since it's just a demo! First install the AWS extras:

```bash
$ pip install -e .[aws]
```

```bash
# Test scale up in increments of 1 (up to 3) for m5.large (the default), just one iteration!
$ python test-scale.py --increment 1 small-cluster --max-node-count 3 --min-node-count 0 --start-iter 0 --end-iter 1

# Slightly more reasonable experiment
$ python test-scale.py --increment 1 test-cluster --max-node-count 32 --min-node-count 0 --start-iter 0 --end-iter 10

# Test scale down in increments of 2 (5 down to 1)
$ python test-scale.py --increment 2 test-cluster --down --max-node-count 5
```
`create-delete-cluster.py` (new file, 74 lines):
```python
#!/usr/bin/env python3

import argparse
import sys
import time

from kubescaler.scaler import EKSCluster


def get_parser():
    parser = argparse.ArgumentParser(
        description="K8s Cluster Creator / Destroyer!",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    parser.add_argument(
        "cluster_name", nargs="?", help="Cluster name suffix", default="flux-cluster"
    )
    parser.add_argument(
        "--experiment", help="Experiment name (defaults to script name)", default=None
    )
    parser.add_argument("--node-count", help="starting node count", type=int, default=2)
    parser.add_argument(
        "--max-node-count", help="maximum node count", type=int, default=3
    )
    parser.add_argument(
        "--min-node-count",
        help="minimum node count",
        type=int,
        default=1,
    )
    parser.add_argument("--machine-type", help="AWS machine type", default="m5.large")
    return parser


def main():
    """
    Demonstrate creating and deleting a cluster. If the cluster exists,
    we should be able to retrieve it and not create a second one.
    """
    parser = get_parser()

    # If an error occurs while parsing the arguments, the interpreter will exit with value 2
    args, _ = parser.parse_known_args()

    # Pull cluster name out of argument
    cluster_name = args.cluster_name

    # Derive the experiment name, either named or from script
    experiment_name = args.experiment
    if not experiment_name:
        experiment_name = sys.argv[0].replace(".py", "")
    time.sleep(2)

    # Update cluster name to include experiment name
    cluster_name = f"{experiment_name}-{cluster_name}"
    print(f"📛️ Cluster name is {cluster_name}")

    print(
        f"⭐️ Creating the cluster sized {args.min_node_count} to {args.max_node_count}..."
    )
    cli = EKSCluster(
        name=cluster_name,
        node_count=args.node_count,
        max_nodes=args.max_node_count,
        min_nodes=args.min_node_count,
        machine_type=args.machine_type,
    )
    cli.create_cluster()
    print("⭐️ Deleting the cluster...")
    cli.delete_cluster()


if __name__ == "__main__":
    main()
```
`test-scale.py` (new file, 179 lines):
```python
#!/usr/bin/env python3

import argparse
import json
import os
import sys
import time

from kubescaler.scaler import EKSCluster
from kubescaler.utils import read_json

# Save data here
here = os.path.dirname(os.path.abspath(__file__))

# Create data output directory
data = os.path.join(here, "data")


def get_parser():
    parser = argparse.ArgumentParser(
        description="K8s Scaling Experiment Runner",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    parser.add_argument(
        "cluster_name", nargs="?", help="Cluster name suffix", default="flux-cluster"
    )
    parser.add_argument(
        "--outdir",
        help="output directory for results",
        default=data,
    )
    parser.add_argument(
        "--experiment", help="Experiment name (defaults to script name)", default=None
    )
    parser.add_argument(
        "--start-iter", help="start at this iteration", type=int, default=0
    )
    parser.add_argument(
        "--end-iter", help="end at this iteration", type=int, default=3, dest="iters"
    )
    parser.add_argument(
        "--max-node-count", help="maximum node count", type=int, default=3
    )
    parser.add_argument(
        "--min-node-count", help="minimum node count", type=int, default=0
    )
    parser.add_argument(
        "--start-node-count",
        help="start at this many nodes and go up",
        type=int,
        default=1,
    )
    parser.add_argument("--machine-type", help="AWS machine type", default="m5.large")
    parser.add_argument(
        "--increment", help="Increment by this value", type=int, default=1
    )
    parser.add_argument(
        "--down", action="store_true", help="Test scaling down", default=False
    )
    return parser


def main():
    """
    This experiment will test scaling a cluster, three times, each
    time going from 2 nodes to 32. We want to understand if scaling is
    impacted by cluster size.
    """
    parser = get_parser()

    # If an error occurs while parsing the arguments, the interpreter will exit with value 2
    args, _ = parser.parse_known_args()

    # Pull cluster name out of argument
    cluster_name = args.cluster_name

    # Derive the experiment name, either named or from script
    experiment_name = args.experiment
    if not experiment_name:
        experiment_name = sys.argv[0].replace(".py", "")
    time.sleep(2)

    # Shared tags for logging and output
    if args.down:
        direction = "decrease"
        tag = "down"
    else:
        direction = "increase"
        tag = "up"

    # Update experiment name to include tag and increment
    experiment_name = f"{experiment_name}-{tag}-{args.increment}"
    print(f"📛️ Experiment name is {experiment_name}")

    # Prepare an output directory, named by cluster
    outdir = os.path.join(args.outdir, experiment_name, cluster_name)
    if not os.path.exists(outdir):
        print(f"📁️ Creating output directory {outdir}")
        os.makedirs(outdir)

    # Define stopping conditions for two directions
    def less_than_max(node_count):
        return node_count <= args.max_node_count

    def greater_than_zero(node_count):
        return node_count > 0

    # Update cluster name to include experiment name
    cluster_name = f"{experiment_name}-{cluster_name}"
    print(f"📛️ Cluster name is {cluster_name}")

    # Create 10 clusters, each going up to 32 nodes
    for iter in range(args.start_iter, args.iters):
        results_file = os.path.join(outdir, f"scaling-{iter}.json")

        # Start at the max if we are going down, otherwise the starting count
        node_count = args.max_node_count if args.down else args.start_node_count
        print(
            f"⭐️ Creating the initial cluster, iteration {iter} with size {node_count}..."
        )
        cli = EKSCluster(
            name=cluster_name,
            node_count=node_count,
            machine_type=args.machine_type,
            min_nodes=args.min_node_count,
            max_nodes=args.max_node_count,
        )
        # Load a result if we have it
        if os.path.exists(results_file):
            result = read_json(results_file)
            cli.times = result["times"]

        # Create the cluster (this times it)
        res = cli.create_cluster()
        print(f"📦️ The cluster has {cli.node_count} nodes!")

        # Flip between functions to decide to keep going based on:
        # > 0 (we are decreasing from the max node count)
        # <= max nodes (we are going up from a min node count)
        keep_going = less_than_max
        if args.down:
            keep_going = greater_than_zero

        # Continue scaling until we reach stopping condition
        while keep_going(node_count):
            old_size = node_count

            # Are we going down or up?
            if args.down:
                node_count -= args.increment
            else:
                node_count += args.increment

            print(
                f"⚖️ Iteration {iter}: scaling to {direction} by {args.increment}, from {old_size} to {node_count}"
            )

            # Scale the cluster - we should do similar logic for the GKE client (one function)
            start = time.time()
            res = cli.scale(node_count)
            end = time.time()
            seconds = round(end - start, 3)
            cli.times[f"scale_{tag}_{old_size}_to_{node_count}"] = seconds
            print(
                f"📦️ Scaling from {old_size} to {node_count} took {seconds} seconds, and the cluster now has {res.initial_node_count} nodes!"
            )

            # Save the times as we go
            print(json.dumps(cli.data, indent=4))
            cli.save(results_file)

        # Delete the cluster and clean up
        cli.delete_cluster()
        print(json.dumps(cli.data, indent=4))
        cli.save(results_file)


if __name__ == "__main__":
    main()
```
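Each run leaves `scaling-N.json` files whose top-level `"times"` mapping (labels like `scale_up_1_to_2` mapped to seconds) is what the script reads back on restart. Assuming that shape, results could be averaged across iterations with a small helper (illustrative only, not part of kubescaler):

```python
import glob
import json
import statistics


def summarize_times(pattern):
    """Average each timed step across all matching result files."""
    durations = {}
    for path in glob.glob(pattern, recursive=True):
        with open(path) as fh:
            times = json.load(fh).get("times", {})
        # Group each label's timings across iterations
        for label, seconds in times.items():
            durations.setdefault(label, []).append(seconds)
    return {label: statistics.mean(vals) for label, vals in durations.items()}


# e.g. summarize_times("data/**/scaling-*.json")
```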
Two files renamed without changes.
```diff
@@ -1,2 +1 @@
-from kubescaler.scaler import GKECluster
 from kubescaler.version import __version__
```