Switching Google Cloud Engine Nodepool should not break system availability
Let's say your GCE nodes are corrupted or need patching. You know how to create a new nodepool, but you wonder whether switching from one to the other might impact your system's availability. It should not, but let's try, shall we?
Make sure you have deployed a Kubernetes cluster and OpenFaaS as per the main README of this repository.
Make sure you have the right Kubernetes credentials so that you can connect to your cluster. Basically, if kubectl works, you should be fine, as the Chaos Toolkit uses the same credentials.
In addition, you will also need credentials to connect to your GCE project via a service account file. Make sure to first create one and edit the configuration sections accordingly.
WARNING: This experiment is fairly powerful as it creates a new nodepool on your cluster. As this is a demo, make sure you have the right environment to toy with it first. The existing nodepool is not deleted, whereas the new one is deleted automatically at the end of the experiment. If that cleanup did not happen, you can delete the new nodepool yourself as follows:
$ gcloud container node-pools delete yet-other-pool --cluster CLUSTER_ID
We rely on Vegeta to inject load into the system. Please download the command-line tool and make it available on your PATH.
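Before running anything, it can help to confirm that the command-line tools mentioned so far are actually reachable. The snippet below is a minimal sketch, not part of the Chaos Toolkit or of this repository: `missing_tools` is a hypothetical helper, and the tool names `kubectl` and `vegeta` are assumed to be the binary names on your system.

```python
import shutil


def missing_tools(tools=("kubectl", "vegeta")):
    """Return the tools from the given list that cannot be found on PATH.

    Hypothetical helper for illustration; the default names are assumptions
    about how the binaries are called on your machine.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty list means both tools were found. Note that this only checks that the binaries exist, not that your credentials are valid.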
(chaostk) $ pip install -U chaostoolkit chaostoolkit-kubernetes chaostoolkit-google-cloud
Also, to generate reports, you will need to install the chaostoolkit-reporting plugin.
Run the experiment as follows:
(chaostk) $ cd repo/toplevel/directory
(chaostk) $ chaos run experiments/switching-gce-nodepool/experiment.json
Here is a sample of this experiment being executed:
At the same time, let's have a view of our system via Weave Cloud.
Notice how the new node joins the cluster and how OpenFaaS reacts by distributing the load accordingly once the nodes on the existing nodepool have been cordoned.
Note also how we uncordon those nodes and delete the new nodepool in the rollbacks.
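To make the cordon and uncordon steps more concrete, here is a rough sketch of how such activities could be declared. It is not a copy of this repository's experiment.json. It assumes the `cordon_node` and `uncordon_node` actions from the `chaosk8s.node.actions` module of chaostoolkit-kubernetes, and it assumes nodes are selected through the `cloud.google.com/gke-nodepool` label that GKE sets on each node. The activity names and the `default-pool` nodepool name are placeholders.

```python
import json

# Assumption: GKE labels each node with the name of its nodepool under this key.
NODEPOOL_LABEL = "cloud.google.com/gke-nodepool"


def node_activity(name, func, nodepool):
    """Build a Chaos Toolkit action that calls a chaosk8s node function
    on all nodes carrying the label of the given nodepool."""
    return {
        "type": "action",
        "name": name,
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.actions",  # assumed module path
            "func": func,
            "arguments": {"label_selector": f"{NODEPOOL_LABEL}={nodepool}"},
        },
    }


# "default-pool" is a placeholder for the name of your existing nodepool.
cordon = node_activity("cordon-existing-nodepool", "cordon_node", "default-pool")
uncordon = node_activity("uncordon-existing-nodepool", "uncordon_node", "default-pool")

if __name__ == "__main__":
    # The cordon goes in the method, its inverse in the rollbacks.
    print(json.dumps({"method": [cordon], "rollbacks": [uncordon]}, indent=2))
```

Putting the uncordon in the rollbacks mirrors what is described above: whatever the method changes on the existing nodepool is reverted once the experiment ends.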
You can create a report of the results as follows:
(chaostk) $ chaos report --export-format=pdf journal.json report.pdf
You can find an example of such a report here.
We notice a few 502 responses, indicating that some users could be impacted during the operation. However, this could also be an artifact of the experiment.
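To judge whether those 502 responses matter, it helps to express them as a share of all requests. The sketch below assumes you have a Vegeta report in JSON form with a `status_codes` object mapping each status code to a request count. The `error_ratio` helper is hypothetical, and the sample numbers are made up for illustration: they are not results from this experiment.

```python
import json


def error_ratio(report, status="502"):
    """Return the fraction of requests that received the given status code.

    Assumes `report` has a "status_codes" mapping of code -> request count.
    """
    codes = report.get("status_codes", {})
    total = sum(codes.values())
    return codes.get(status, 0) / total if total else 0.0


# Made-up numbers for illustration only.
sample = json.loads('{"status_codes": {"200": 990, "502": 10}}')

if __name__ == "__main__":
    print(f"{error_ratio(sample):.1%} of requests returned 502")
```

A very small ratio that does not reproduce across runs would point towards an experiment artifact, while a stable one suggests real user impact.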