Switching Google Cloud Engine Nodepool should not break system availability

Learnings

Let's say your GCE nodes are corrupted or need patching. You know how to create a new nodepool, but you wonder whether switching from one to the other might impact your system's availability. It should not, but let's try, shall we?

Requirements

Environment

Make sure you have deployed a Kubernetes cluster and OpenFaaS as per the main README of this repository.

Make sure you have the right Kubernetes credentials so that you can connect to your cluster. Basically, if kubectl works, you should be fine as the Chaos Toolkit uses the same credentials.

In addition, you will also need credentials to connect to your GCE project via a service account file. Make sure to create one first, then edit the secrets and configuration sections of the experiment accordingly.

WARNING: This experiment is fairly powerful as it creates a new nodepool on your cluster. As this is a demo, make sure you have the right environment to toy with it first. The existing nodepool is not deleted, while the new one is deleted automatically at the end of the experiment. If that cleanup does not happen, you can delete the new nodepool as follows:

$ gcloud container node-pools delete yet-other-pool --cluster CLUSTER_ID

Vegeta

We rely on Vegeta to inject load into the system. Please download the command-line tool and make it available on your PATH.
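To confirm that Vegeta is usable before running the experiment, a short smoke test along these lines may help. It is only a sketch: the `vegeta_target` helper, the `TARGET_URL` variable and the URL itself are assumptions made for illustration and are not taken from the experiment, so replace the URL with a function exposed by your own OpenFaaS gateway.

```shell
# Build a one-line Vegeta target for a GET request (hypothetical helper).
vegeta_target() {
    printf 'GET %s\n' "$1"
}

# Hypothetical URL: replace it with a function served by your OpenFaaS gateway.
TARGET_URL="${TARGET_URL:-http://localhost:8080/function/echo}"

# Only send load when vegeta is installed, so the snippet is safe to paste.
if command -v vegeta >/dev/null 2>&1; then
    # 2 requests per second for 5 seconds, then a summary of latencies and status codes.
    vegeta_target "$TARGET_URL" | vegeta attack -duration=5s -rate=2 | vegeta report
else
    echo "vegeta not found on PATH" >&2
fi
```

The rate and duration are kept deliberately low here; the load profile used by the experiment itself is defined in experiment.json.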

Chaos Toolkit

You need to have the Chaos Toolkit installed on your local machine as well as the Kubernetes and GCE dependencies:

(chaostk) $ pip install -U chaostoolkit chaostoolkit-kubernetes chaostoolkit-google-cloud

Also, to generate reports, you will need to install the chaostoolkit-reporting plugin.
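Before going further, it can be handy to check that every command-line tool mentioned in the requirements above is reachable. The following is a minimal sketch; the `missing_tools` function is a hypothetical helper, and it only checks that the commands exist on your PATH, not that your credentials are valid.

```shell
# Print the name of each given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this README relies on: kubectl, gcloud, vegeta and the chaos CLI.
missing=$(missing_tools kubectl gcloud vegeta chaos)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on a single line.
    echo "Missing from PATH:" $missing
else
    echo "All required commands found"
fi
```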

Usage

Run the experiment as follows:

(chaostk) $ cd repo/toplevel/directory
(chaostk) $ chaos run experiments/switching-gce-nodepool/experiment.json

Here is a sample of this experiment being executed:

Chaos Toolkit Experiment Run

At the same time, let's have a view of our system via Weave Cloud.

System View via Weave Scope

Notice how the new node joins the cluster and how OpenFaaS reacts by distributing the load accordingly once the nodes on the existing nodepool have been cordoned.

Note also how we uncordon those nodes and delete the new nodepool in the rollbacks.

Reporting

You can create a report of the results as follows:

(chaostk) $ chaos report --export-format=pdf journal.json report.pdf

You can find an example of such a report in the report.pdf file in this directory.

We notice a few 502 responses, indicating that some users could be impacted during the operation. However, this could also be an artifact of the experiment.
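To put a number on those 502 responses, one option is to count status codes in Vegeta's raw results. The sketch below assumes you have exported the results with `vegeta encode --to json` into a file named results.json (a hypothetical name, not something the experiment is known to produce), and that each line is a compact JSON object containing a `"code":<status>` field. The `count_status` function is a hypothetical helper.

```shell
# Count results with a given HTTP status code.
# $1: status code, $2: file with one JSON result per line.
count_status() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c "\"code\":$1[,}]" "$2" || true
}

# results.json is a hypothetical file name; adjust it to your own export.
if [ -f results.json ]; then
    echo "502 responses: $(count_status 502 results.json)"
    echo "200 responses: $(count_status 200 results.json)"
fi
```

Comparing the two counts gives a rough idea of how many requests failed while the nodes were being switched, though it cannot tell you whether the failures come from the system or from the experiment setup.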
