Missing Container Metrics - metrics cadvisor won't give you

STATUS: stable, maintained

cadvisor is great, but it is missing a few important metrics that every serious DevOps person wants to know about. This is a secondary process that exports the missing Prometheus metrics:

OOM-kill
number of container restarts
last exit code

This was motivated by hunting down OOM kills in a large Kubernetes cluster. A container can keep running even after an OOM kill, for example when only a sub-process was affected. Without this metric, it becomes much more difficult to find the root cause of the issue.

True story: after this was deployed, a recurring OOM kill in Fluentd was quickly discovered on one of the nodes. It turned out that the resource limits were set too low and this particular node was logging a lot more. Logs were not being forwarded because the Fluentd worker process kept being OOM-killed and then restarted by the main process. A fix was deployed 10 minutes later.

Deployment

Kubernetes

> daemon-set.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: missing-container-metrics
  namespace: kube-system
  labels:
    k8s-app: missing-container-metrics
spec:
  selector:
    matchLabels:
      name: missing-container-metrics
  template:
    metadata:
      labels:
        name: missing-container-metrics
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '3001'
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: missing-container-metrics
        image: dmilhdef/missing-container-metrics:v0.14.0
        resources:
          limits:
            memory: 20Mi
          requests:
            memory: 20Mi
        volumeMounts:
        - name: dockersock
          mountPath: /var/run/docker.sock
      terminationGracePeriodSeconds: 30
      volumes:
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock

Docker

$ docker run -d -p 3001:3001 -v /var/run/docker.sock:/var/run/docker.sock dmilhdef/missing-container-metrics:v0.14.0

Usage

Exposes metrics about Docker containers, derived from Docker events. The metrics and their labels are listed below.

Exposed Metrics

Each of these metrics is published with the labels from the next section.

`container_restarts` (counter)

Number of restarts of the container.

`container_ooms` (counter)

Number of OOM kills for the container. This covers an OOM kill of any process in the container's cgroup.

`container_last_exit_code` (gauge)

Last exit code of the container.

Labels

`docker_container_id`

Full id of the Docker container.

`container_short_id`

First 6 bytes of the Docker container id.
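
To illustrate how the two id labels relate: a full Docker container id is a 64-character hex string, so its first 6 bytes correspond to the first 12 hex characters, the same length `docker ps` shows. The following is a minimal sketch assuming that interpretation. `shortID` is a hypothetical helper, not a function from this project.

```go
package main

import "fmt"

// shortID returns the first 12 hex characters (6 bytes) of a container id.
// Ids that are already that short are returned unchanged.
func shortID(fullID string) string {
	const hexChars = 12 // 6 bytes, two hex characters each
	if len(fullID) <= hexChars {
		return fullID
	}
	return fullID[:hexChars]
}

func main() {
	// Made-up id, used only for illustration.
	full := "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	fmt.Println(shortID(full)) // prints 0123456789ab
}
```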

`container_id`

Container id in the same format used by the Kubernetes pod metrics, prefixed with docker://. This enables easy joins in Prometheus with the kube_pod_container_info metric.

`name`

Name of the container.

`image_id`

Image id in the same format used by the Kubernetes pod metrics, prefixed with docker-pullable://. This enables easy joins in Prometheus with the kube_pod_container_info metric.

`pod`

If the io.kubernetes.pod.name label is set on the container, its value will be set as the pod label of the metric.

`namespace`

If the io.kubernetes.pod.namespace label is set on the container, its value will be set as the namespace label of the metric.

This label, together with pod, is useful in Kubernetes deployments to determine the namespace and pod the container belongs to, without having to join with the kube_pod_container_info metric to get those values.
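
The label rules in this section can be sketched in Go as follows. This illustrates the rules under stated assumptions and is not the project's actual code: the `containerInfo` type and the `buildLabels` function are hypothetical. In this sketch a missing container label yields an empty label value, which Prometheus treats the same as an absent label.

```go
package main

import "fmt"

// containerInfo holds the container fields the labels are derived from.
// Hypothetical type, for illustration only.
type containerInfo struct {
	ID      string            // full Docker container id
	Name    string            // container name
	ImageID string            // image id
	Labels  map[string]string // Docker labels set on the container
}

// buildLabels maps container metadata to the metric labels described above.
func buildLabels(c containerInfo) map[string]string {
	short := c.ID
	if len(short) > 12 {
		short = short[:12] // first 6 bytes, assuming a hex-encoded id
	}
	return map[string]string{
		"docker_container_id": c.ID,
		"container_short_id":  short,
		"container_id":        "docker://" + c.ID,
		"name":                c.Name,
		"image_id":            "docker-pullable://" + c.ImageID,
		// Reading a missing key (or a nil map) yields "", so these stay
		// empty for containers that are not managed by Kubernetes.
		"pod":       c.Labels["io.kubernetes.pod.name"],
		"namespace": c.Labels["io.kubernetes.pod.namespace"],
	}
}

func main() {
	// Made-up container, used only for illustration.
	labels := buildLabels(containerInfo{
		ID:      "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
		Name:    "fluentd",
		ImageID: "example-image-id",
		Labels: map[string]string{
			"io.kubernetes.pod.name":      "fluentd-abcde",
			"io.kubernetes.pod.namespace": "kube-system",
		},
	})
	fmt.Println(labels["container_id"])
	fmt.Println(labels["namespace"] + "/" + labels["pod"])
}
```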

Contributing

Contributions are welcome. Send your issues and PRs to this repo.

License

MIT - Copyright Dragan Milic and contributors
