kubectl-val implements a simplified Kubernetes model as an object-oriented state machine and searches for any scenario that may lead to a "failure". A failure is currently defined as a Service having no associated running pods; other failure definitions are possible and are work in progress.
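As an illustration only (not kubectl-val's actual internals), the current failure condition, a Service whose label selector matches no running Pod, can be sketched like this. All names and data structures below are hypothetical:

```python
# Hypothetical sketch of the failure condition kubectl-val searches for:
# a Service whose label selector matches no Running pod.
# Structures here are illustrative, not kubectl-val internals.

def selector_matches(selector: dict, labels: dict) -> bool:
    """A pod matches when every selector key/value pair appears in its labels."""
    return all(labels.get(k) == v for k, v in selector.items())

def failed_services(services: list, pods: list) -> list:
    """Return names of Services with no Running pod matching their selector."""
    return [
        svc["name"]
        for svc in services
        if not any(
            pod["phase"] == "Running"
            and selector_matches(svc["selector"], pod["labels"])
            for pod in pods
        )
    ]

services = [{"name": "redis-master", "selector": {"app": "redis", "role": "master"}}]
pods = [{"labels": {"app": "redis", "role": "master"}, "phase": "Pending"}]
print(failed_services(services, pods))  # the only candidate pod is not Running
```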
kubectl-val is written in modern Python and requires Python 3.7+. If your default interpreter is older, you may have to specify the interpreter for the script manually.
The easiest way is to use our self-updating binary release. You can download the latest release here, or execute these commands:
wget https://github.com/criticalhop/kubectl-val/releases/latest/download/kubectl-val
chmod +x ./kubectl-val
sudo ln -s $(pwd)/kubectl-val /usr/local/bin/kubectl-val
If you downloaded the binary manually, remember to mark it executable:
chmod +x ./kubectl-val
Alternatively, install via pip:

$ pip install kubectl-val
kubectl-val comes as a simple kubectl plugin, so a working kubectl is required if you want to access a real cluster. If you do not have kubectl, you can use it as a standalone shell command: kubectl-val instead of kubectl val ...
Checking that creating a resource won't break anything
To try it against sample "broken" Kubernetes configurations, use the -d option to supply a folder containing a collection of Kubernetes resources stored from kubectl get <...> -o=yaml > <...>.yaml, then try to create a new resource with:
$ git clone https://github.com/criticalhop/kubectl-val
$ cd kubectl-val/examples/daemonset-eviction
$ kubectl val -d cluster-dump/ -f daemonset_create.yaml
You can find a bigger cluster example in the tests/ folder.
Checking a Kubernetes configuration for correctness
kubectl val without -f runs a check of the current configuration and should (hopefully) find no issues, as the configuration is already running:
$ kubectl val -d cluster-dump/
Checking live cluster
Before checking the cluster, first "dump" all current resources into a "cluster dump" folder:
mkdir my-cluster-dump
cd my-cluster-dump
kubectl get nodes --all-namespaces -o=yaml > nodes.yaml
kubectl get pods --all-namespaces -o=yaml > pods.yaml
kubectl get services --all-namespaces -o=yaml > services.yaml
kubectl get priorityclass --all-namespaces -o=yaml > priority.yaml
Once you have the dump folder, you can continue with the check described above.
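The dump step is repetitive, so it is easy to script. The sketch below merely builds the same command strings shown above (it does not run them); the resource kinds listed are the ones from this section, and you can extend the dictionary as needed:

```python
# Build the "kubectl get ... -o=yaml" dump commands shown above.
# The kinds below mirror the manual steps; extend as needed.
KINDS = {
    "nodes": "nodes.yaml",
    "pods": "pods.yaml",
    "services": "services.yaml",
    "priorityclass": "priority.yaml",
}

def dump_commands(kinds: dict) -> list:
    """Return one dump command string per resource kind."""
    return [
        f"kubectl get {kind} --all-namespaces -o=yaml > {out}"
        for kind, out in kinds.items()
    ]

for cmd in dump_commands(KINDS):
    print(cmd)
```

Run the printed commands inside your dump folder with your shell, or via subprocess if you prefer to automate the whole step.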
Excluding a Scenario
There may be more than one issue with the configuration, and kubectl-val only detects one scenario at a time. To move on to the next scenario, you can exclude certain resources from the search with the -e Service:<service-name> option:
kubectl-val -d ... -f ... -e Service:redis-master
If kubectl-val detects a scenario with kind Service and name redis-master, this command excludes it from the search, and you will either get the next scenario, if any, or a clean cluster state as a result.
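Conceptually, -e acts as a filter over the candidate scenarios. The sketch below is a hypothetical illustration of that filtering semantics, not kubectl-val's actual implementation:

```python
# Illustrative sketch of how "-e Kind:name" exclusions filter scenarios.
# Data shapes are hypothetical, not kubectl-val internals.

def is_excluded(scenario: dict, exclusions: set) -> bool:
    """A scenario is excluded when its failing resource is listed as Kind:name."""
    return f'{scenario["kind"]}:{scenario["name"]}' in exclusions

scenarios = [
    {"kind": "Service", "name": "redis-master"},
    {"kind": "Service", "name": "frontend"},
]
exclusions = {"Service:redis-master"}  # from: -e Service:redis-master

remaining = [s for s in scenarios if not is_excluded(s, exclusions)]
print(remaining)  # only the "frontend" scenario is left to report
```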
To search for a failure scenario, kubectl-val builds a model representation of the current cluster state, which it reads from the files created by kubectl get -o=yaml. The constructed model is sent to a PDDL planner, and the resulting solution is interpreted as a failure scenario and printed to the console as YAML-encoded scenario steps.
Scenario output can later be used by the pipeline operator to aid with decision making, e.g. whether to stop the deployment, log the event to a dashboard, etc.
kubectl-val also calculates the probability of the scenario by multiplying the probabilities associated with each step.
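That multiplication is straightforward; the sketch below shows the idea, with made-up step names and probability values (the real per-step probabilities come from kubectl-val itself):

```python
# Scenario probability = product of the probabilities of its steps.
# Step names and values below are invented for illustration.
from functools import reduce
from operator import mul

steps = [
    {"action": "node_fails", "probability": 0.01},
    {"action": "pod_evicted", "probability": 0.5},
    {"action": "scheduler_cannot_place_pod", "probability": 0.8},
]

scenario_probability = reduce(mul, (step["probability"] for step in steps), 1.0)
print(scenario_probability)  # approximately 0.004
```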
kubectl-val depends on a configured PDDL AI-planning solver, poodlesolver, running as an HTTP service. By default it uses a cloud solver hosted by CriticalHop. poodlesolver comes with the poodle Python library and installs automatically when kubectl-val is installed via pip install. To run a local solver, please refer to the poodle documentation.
Build from source
git clone https://github.com/criticalhop/kubectl-val
cd kubectl-val
poetry install
Specifying solver location
By default, kubectl-val uses a hosted solver. You can learn how to run your own local solver by checking the poodle repository.
The goal for the project is to create an intent-driven, self-healing Kubernetes configuration system that will abstract the cluster manager from error-prone manual tweaking.
kubectl-val is a developer preview and currently supports a subset of resource/limits validation and partial label match validation.
We invite you to follow @criticalhop on Twitter and to chat with the team in #kubectl-val on freenode. If you have any questions or suggestions, feel free to open a GitHub issue or contact firstname.lastname@example.org directly.