
k8eraid

(KAY-ter-ade)


What does this do?

The point of k8eraid is to provide a relatively simple, unified method for reporting on Kubernetes resource issues.

How is this different from metrics-based alerters (e.g., Prometheus with Alertmanager)?

k8eraid integrates directly with the Kubernetes API to look up the actual state of resources and how they have changed. Metrics-based systems are good at reporting a snapshot of status against a specification (X pods running right now); k8eraid can report the current status and also detect recent changes by examining the timestamps in API metadata, such as deletion times and pod readiness times.

The configuration is simple enough that monitoring and alerting rules are easy to expand or remove as you see fit. k8eraid is also very lightweight, using about 100KB of memory at runtime to look up the status of 100 individual resources, which makes it a prime candidate for per-namespace deployments in shared environments, letting each team manage its own alerting.

How does it work?

  • Create a JSON file specifying what resources you want to monitor, and how you want to alert.
  • Make the config into a configmap.
  • Make an appropriate role with permissions for k8eraid to read all resources in the Apps and Core APIs (see the example commands after this list).
  • Deploy k8eraid to your cluster along with the configmap and role.
  • Get annoyed at the fact that you now get alerts when things go wrong in your cluster.
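
For example, the configmap and role steps might look like the kubectl commands below. The names used here (the k8eraid-config configmap, the k8eraid-read role, the monitoring namespace, and the k8eraid service account) are illustrative assumptions, so adjust them to your environment.

# create the configmap from your JSON config (names are examples)
kubectl create configmap k8eraid-config --from-file=config.json --namespace monitoring

# grant read access to the Core and Apps API resources k8eraid inspects
kubectl create clusterrole k8eraid-read --verb=get,list,watch --resource=pods,nodes,deployments.apps,daemonsets.apps
kubectl create clusterrolebinding k8eraid-read --clusterrole=k8eraid-read --serviceaccount=monitoring:k8eraid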

Which Kubernetes versions are supported?

| Kubernetes version | Works |
| ------------------ | ----- |
| 1.6.X              | ✓     |
| 1.7.X              | ✓     |
| 1.9.X              | ✓     |
| 1.10.X             | ✓     |

What does k8eraid monitor?

| Resource    | Statuses                                                                 |
| ----------- | ------------------------------------------------------------------------ |
| Pods        | Minimum pod count, Pod restarts, Failed scheduling, Stuck terminating    |
| Deployments | Minimum replica count                                                    |
| Daemonsets  | Minimum replica count, Failed scheduling                                 |
| Nodes       | Out of disk, Memory pressure, Disk pressure, Node readiness, Node count  |

k8eraid can perform these checks not only against single named resources; you can also specify "global" rules using "*". Additionally, global rules can use filters based on resource labels!

What alert methods does k8eraid support?

| Alert type  | Options                                                                 |
| ----------- | ----------------------------------------------------------------------- |
| stderr      | (none)                                                                  |
| smtp        | Mail server, Port, Password ENV var, Subject, From address, To address |
| pagerdutyV2 | Service key ENV var, Proxy server, Subject                              |
| webhook     | Server, Proxy server, Subject                                           |

Get it from DockerHub:

# get from DockerHub
docker pull bloomberg/k8eraid

Awesome! So how does configuration work?

There are five types of objects in a config: "deployments", "pods", "daemonsets", "nodes", and "alerters". Each of these objects contains one or more desired rule definitions (a sketch of a full config file follows this list). There are a few important rules to remember when writing your configuration; most of them stem from the way the Kubernetes client behaves differently for list and get operations.

  • The config is self-reloading. You do not need to redeploy k8eraid when you update the configmap.
  • If using a wildcard for a POD, you MUST specify a valid filterLabel.
  • If specifying a name for any target resource, you MUST specify a valid filterNamespace.
  • If your pendingThreshold is too short for a POD rule, you may get alerts for normal pod startups.
  • For DEPLOYMENT and DAEMONSET type resources, "filter" can be either a literal namespace string or a key/value pair string for a metadata label.
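
As a rough sketch, a full config file groups your rule definitions under the five object types named above. The exact top-level layout shown here (each object type holding an array of definitions, the wildcard pod rule, and the 60-second threshold) is an assumption for illustration only; the per-resource examples below show the actual rule fields.

{
	"pods": [
		{
			"name": "*",
			"filterNamespace": "",
			"filterLabel": "monitor=true",
			"alerter": "stderr",
			"reportStatus": {
				"minPods": 1,
				"podRestarts": true,
				"failedScheduling": true,
				"pendingThreshold": 60
			}
		}
	],
	"deployments": [],
	"daemonsets": [],
	"nodes": [],
	"alerters": []
}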

Pod configuration examples

  • Check for pod restarts and failed scheduling of the pod named "foobarbaz-pod" in the "default" namespace, but only if the pod has existed in Kubernetes for at least 120 seconds. Send alerts to stderr.
{
	"name": "foobarbaz-pod",
	"filterNamespace": "default",
	"filterLabel": "",
	"alerter": "stderr",
	"reportStatus": {
		"minPods": 1,
		"podRestarts": true,
		"failedScheduling": true,
		"pendingThreshold": 120
	}
}
  • Check for failed scheduling or stuck termination of ALL pods in any namespace with the metadata label "monitor=true", but only if the pod has existed in Kubernetes for at least 10 seconds. Send email using the "example-email" alerter.
{
	"name": "*",
	"filterNamespace": "",
	"filterLabel": "monitor=true",
	"alerter": "example-email",
	"reportStatus": {
		"minPods": 1,
		"podRestarts": false,
		"failedScheduling": true,
		"stuckTerminating": true,
		"pendingThreshold": 10
	}
}

Deployment configuration examples

  • Check to make sure the "foobar-deployment" deployment in the "default" namespace has at least 3 ready pods, but only if "foobar-deployment" has existed for at least 10 seconds. Use the "example-pagerduty" alerter to send a PagerDuty alert.
{
	"name": "foobar-deployment",
	"filter": "default",
	"alerter": "example-pagerduty",
	"reportStatus": {
		"minReplicas": 3,
		"pendingThreshold": 10
	}
}
  • Check to make sure all deployments older than 30 seconds have at least 1 ready pod. Send alerts to stderr.
{
	"name": "*",
	"filter": "",
	"alerter": "stderr",
	"reportStatus": {
		"minReplicas": 1,
		"pendingThreshold": 30
	}
}

Daemonset configuration examples

  • Check to see if the daemonset "daemon-of-glory" has the expected number of replicas deployed and check for failed scheduling, assuming the daemonset is at least 10 seconds old. Send alerts to stderr.
{
	"name": "daemon-of-glory",
	"filter": "",
	"alerter": "stderr",
	"reportStatus": {
		"checkReplicas": true,
		"failedScheduling": true,
		"pendingThreshold": 10
	}
}

Node configuration examples

  • Examine all nodes with the label "monitor=true" that are at least 5 minutes old. Check to make sure there are at least 10 nodes in the cluster, and watch for OutOfDisk, MemoryPressure, DiskPressure, and Readiness issues. Send alerts to stderr.
{
	"name": "*",
	"filter": "monitor=true",
	"alerter": "stderr",
	"reportStatus": {
		"minNodes": 10,
		"outOfDisk": true,
		"memoryPressure": true,
		"diskPressure": true,
		"readiness": true,
		"pendingThreshold": 300
	}
}

Alerter configuration

  • stderr is a default, built-in alerter name that always writes alerts to standard error where the application is running. No special configuration is needed.

  • All other alert types can be configured in multiple different ways, each under a unique name, allowing you to change alert behavior per rule as desired.

  • Example smtp alerter named "example-email"; this will email me@example.com when triggered:

{
	"name": "example-email",
	"toAddress": "me@example.com",
	"fromAddress": "kubernetes@example.com",
	"mailServer": "smtp.example.com",
	"port": 25,
	"subject": "Observed issue with Kubernetes cluster"
}
  • Example smtp alerter named "gmail-email"; this will email me@example.com from foo@gmail.com when triggered, authenticating with the password saved in the GMAIL_PW ENV var:
{
	"name": "gmail-email",
	"toAddress": "me@example.com",
	"fromAddress": "foo@gmail.com",
	"mailServer": "smtp.gmail.com",
	"port": 587,
	"passwordEnvVar": "GMAIL_PW",
	"subject": "Observed issue with Kubernetes cluster"
}
  • Example PagerDuty alerter named "example-pagerduty"; this will trigger a PagerDuty alert using the value of the injected PD_KEY ENV variable as the service key, with http://proxy.example.com:80 as an HTTP proxy:
{
	"name": "example-pagerduty",
	"serviceKeyEnvVar": "PD_KEY",
	"proxyServer": "http://proxy.example.com:80",
	"subject": "Observed issue with Kubernetes cluster"
}

Contributing

Got features or bugfixes? Please feel free to contribute with code or issues!

Required tools

Useful editor integrations

Building binary with valid Golang environment

make build

Building binary with valid Docker installation

make buildcontainer

Testing code with a valid Docker installation

make testcontainer

Building a Docker container

make container
