
Switching a Google Compute Engine nodepool should not break system availability


Let's say your GCE nodes are corrupted or need patching. You know how to create a new nodepool, but you wonder whether switching from one to the other might impact your system availability. It should not, but let's try, shall we?



Make sure you have deployed a Kubernetes cluster and OpenFaaS as per the main README of this repository.

Make sure you have the right Kubernetes credentials so that you can connect to your cluster. Basically, if kubectl works, you should be fine as the Chaos Toolkit uses the same credentials.

In addition, you will also need credentials to connect to your GCE project via a service account file. Make sure to create one first, then edit the secrets and configuration sections of the experiment accordingly.
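As a quick sanity check before editing the experiment, the sketch below looks for the fields that a Google Cloud service account key file normally carries. The function names and the sample path are hypothetical, and the check says nothing about whether the account has the permissions the experiment needs.

```python
import json
from pathlib import Path

# Fields normally present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def find_key_problems(key):
    """Return a list of problems found in a parsed key file (empty if none)."""
    if not isinstance(key, dict):
        return ["key file must contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append('field "type" should be "service_account"')
    return problems


def check_service_account_file(path):
    """Load the key file at `path` and report its problems."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} does not exist"]
    try:
        key = json.loads(file.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    return find_key_problems(key)


if __name__ == "__main__":
    # Hypothetical path: point it at your own service account file.
    for problem in check_service_account_file("service-account.json"):
        print(problem)
```

An empty result only means the file looks like a service account key; the experiment itself remains the real test of the credentials.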

WARNING: This experiment is fairly powerful as it creates a new nodepool on your cluster. As this is a demo, make sure you have a suitable environment to try it in first. The existing nodepool is not deleted, while the new one is deleted automatically at the end of the experiment. If that does not happen, you can delete it manually as follows:

$ gcloud container node-pools list --cluster CLUSTER_ID
$ gcloud container node-pools delete --cluster CLUSTER_ID yet-other-pool


We rely on Vegeta to inject load into the system. Please download its command-line tool and make it available on your PATH.

Chaos Toolkit

You need to have the Chaos Toolkit installed on your local machine as well as the Kubernetes and GCE dependencies:

(chaostk) $ pip install -U chaostoolkit chaostoolkit-kubernetes chaostoolkit-google-cloud

Also, to generate reports, you will need to install the chaostoolkit-reporting plugin.
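Before running the experiment, it can help to confirm that the command-line tools mentioned in this README are actually reachable. This is a minimal sketch using only the Python standard library; the function name `missing_tools` is made up for illustration, and finding a tool on the PATH does not prove it is configured correctly.

```python
import shutil

# Command-line tools mentioned in this README.
REQUIRED_TOOLS = ("kubectl", "gcloud", "vegeta", "chaos")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run it from the same virtual environment in which you installed the Chaos Toolkit, so that the `chaos` command is looked up in the right place.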


Run the experiment as follows:

(chaostk) $ cd repo/toplevel/directory
(chaostk) $ chaos run experiments/switching-gce-nodepool/experiment.json

Here is a sample of this experiment being executed:

[Image: Chaos Toolkit Experiment Run]

At the same time, let's have a view of our system via Weave Cloud.

[Image: System View via Weave Scope]

Notice how the new node joins the cluster and how OpenFaaS reacts by distributing the load accordingly once the nodes on the existing nodepool have been cordoned.

Note also how we uncordon those nodes and delete the new nodepool in the rollbacks.


You can create a report of the results as follows:

(chaostk) $ chaos report --export-format=pdf journal.json report.pdf

You can find an example of such a report here.

We notice a few 502 responses, indicating that some users could be impacted during the operation. However, this could also be an artifact of the experiment.
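To put a number on "a few", one option is to compute the share of each status code from the counts Vegeta reports. The sketch below assumes you have those counts as a mapping from status code to number of responses, which is the shape of the `status_codes` entry in Vegeta's JSON report. The function name and the sample numbers are made up for illustration and are not taken from an actual run.

```python
def status_code_shares(status_codes):
    """Map each status code to its share of all responses (0.0 to 1.0)."""
    total = sum(status_codes.values())
    if total == 0:
        return {}
    return {code: count / total for code, count in status_codes.items()}


if __name__ == "__main__":
    # Illustrative counts only, not results from a real run.
    sample = {"200": 997, "502": 3}
    for code, share in sorted(status_code_shares(sample).items()):
        print(f"{code}: {share:.2%}")
```

Comparing this share against a run where no nodepool switch takes place would help tell real user impact apart from an experiment artifact.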
