Monitoring Docker Containers

Table of Contents

Docker CLI
Docker Remote API
Docker Events
- Use filters
Prometheus
cAdvisor
- Prometheus and cAdvisor
Monitor Java Applications
Grafana

This chapter will cover different ways to monitor a Docker container.

Docker CLI

docker container stats command displays a live stream of container(s) runtime metrics. The command supports CPU, memory usage, memory limit, and network IO metrics.

Start a container: docker container run --name web -d jboss/wildfly:10.1.0.Final
Check the container stats using docker container stats web. It shows the output as:
```
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
web                 0.14%               274.9MiB / 1.952GiB   13.75%              828B / 0B           96.7MB / 4.1kB      53
```
The output is continually updated. It shows:
1. Container name
2. Percent CPU utilization
3. Total memory usage vs amount available to the container
4. Percent memory utilization
5. Network activity
6. Disk activity
7. PIDS??

Check stats for multiple containers

Terminate single instance of the container

docker container stop web
docker container rm web

Create a service with multiple replicas of the container

docker swarm init
docker service create --name=web jboss/wildfly

Now check the stats again:

docker stats

to see the output:

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
1f9d5a1c08a8        0.24%               294.9MiB / 1.952GiB   14.75%              828B / 0B           96.5MB / 4.1kB      115

Note that the container id is shown instead of container’s name in this case.

Scale the service to 3 replicas:
```
docker service scale web=3
```

The stats are now shown for all three containers:

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
1f9d5a1c08a8        0.14%               287.8MiB / 1.952GiB   14.40%              1.03kB / 0B         96.5MB / 4.1kB      53
a3395efbb6ff        0.77%               263.4MiB / 1.952GiB   13.18%              578B / 0B           0B / 4.1kB          114
f484ff2f4c12        0.06%               294.8MiB / 1.952GiB   14.75%              578B / 0B           0B / 4.1kB          114

Display only container id and percent CPU utilization using the command docker container stats --format "{{.Container}}: {{.CPUPerc}}":
```
55198043b6aa: 0.10%
5b5dd33b675d: 0.11%
6e98a9597e6a: 0.10%
```

Format the output in a table. The results should include container name, percent CPU utilization and percent memory utilization. This can be achieved using the command:

docker container stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

It shows the output:

NAME                              CPU %               MEM USAGE / LIMIT
web.2.1ic0vevvvu2nwwyc6css58ref   0.10%               267.5MiB / 1.952GiB
web.3.pwcgr58s1xo28gwa1znlrn2s3   0.11%               274.7MiB / 1.952GiB
web.1.bg1dcwagl9tzz0azuegxzoys8   0.12%               274MiB / 1.952GiB

Display only the first result using the command docker container stats --no-stream:

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
55198043b6aa        0.12%               267.5MiB / 1.952GiB   13.39%              1.44kB / 0B         0B / 4.1kB          51
5b5dd33b675d        0.07%               274.7MiB / 1.952GiB   13.75%              1.51kB / 0B         0B / 4.1kB          51
6e98a9597e6a        0.26%               274.1MiB / 1.952GiB   13.71%              1.69kB / 0B         0B / 4.1kB          51

Docker Remote API

Docker Remote API provides a lot more details about the health of the container. It can be invoked using the following format:

curl --unix-socket /var/run/docker.sock http://localhost/containers/<name>/stats

On Docker for Mac, enabling remote HTTP API still requires a few steps. So this command uses the --unix-socket option to invoke the Remote API.

A specific invocation for a container can be done as:

curl --unix-socket /var/run/docker.sock http://localhost/containers/web.1.bg1dcwagl9tzz0azuegxzoys8/stats

Note, the container name is from the previous run of the service. It will show the output:

{"read":"2017-10-13T14:08:38.045217731Z","preread":"0001-01-01T00:00:00Z","pids_stats":{"current":51},"blkio_stats":{"io_service_bytes_recursive":[{"major":8,"minor":0,"op":"Read","value":0},{"major":8,"minor":0,"op":"Write","value":4096},{"major":8,"minor":0,"op":"Sync","value":0},{"major":8,"minor":0,"op":"Async","value":4096},{"major":8,"minor":0,"op":"Total","value":4096}],"io_serviced_recursive":[{"major":8,"minor":0,"op":"Read","value":0},{"major":8,"minor":0,"op":"Write","value":1},{"major":8,"minor":0,"op":"Sync","value":0},{"major":8,"minor":0,"op":"Async","value":1},{"major":8,"minor":0,"op":"Total","value":1}],"io_queue_recursive":[],"io_service_time_recursive":[],"io_wait_time_recursive":[],"io_merged_recursive":[],"io_time_recursive":[],"sectors_recursive":[]},"num_procs":0,"storage_stats":{},"cpu_stats":{"cpu_usage":{"total_usage":11130296115,"percpu_usage":[2687118654,3014514615,2971860160,2456802686],"usage_in_kernelmode":2700000000,"usage_in_usermode":7630000000},"system_cpu_usage":952826800000000,"online_cpus":4,"throttling_data":{"periods":0,"throttled_periods":0,"throttled_time":0}},"precpu_stats":{"cpu_usage":{"total_usage":0,"usage_in_kernelmode":0,"usage_in_usermode":0},"throttling_data":{"periods":0,"throttled_periods":0,"throttled_time":0}},"memory_stats":{"usage":288051200,"max_usage":297189376,"stats":{"active_anon":283893760,"active_file":0,"cache":135168,"dirty":16384,"hierarchical_memory_limit":9223372036854771712,"hierarchical_memsw_limit":9223372036854771712,"inactive_anon":0,"inactive_file":135168,"mapped_file":32768,"pgfault":83204,"pgmajfault":0,"pgpgin":78441,"pgpgout":9093,"rss":283914240,"rss_huge":0,"swap":0,"total_active_anon":283893760,"total_active_file":0,"total_cache":135168,"total_dirty":16384,"total_inactive_anon":0,"total_inactive_file":135168,"total_mapped_file":32768,"total_pgfault":83204,"total_pgmajfault":0,"total_pgpgin":78441,"total_pgpgout":9093,"total_rss":283914240,"total_rss_huge":0,"total_swap":0,"total_unevictable":0,"total_writeback":0,"unevictable":0,"writeback":0},"limit":2095874048},"name":"/web.1.bg1dcwagl9tzz0azuegxzoys8","id":"6e98a9597e6af085e73a4d211fff9a164aa012727a46525d4fbaa164b572e23f","networks":{"eth0":{"rx_bytes":1882,"rx_packets":37,"rx_errors":0,"rx_dropped":0,"tx_bytes":0,"tx_packets":0,"tx_errors":0,"tx_dropped":0}}}

As you can see, far more details about container’s health are shown here. These stats are refereshed every one second. The continuous refresh of metrics can be terminated using Ctrl + C.

Docker Events

docker system events provide real time events for the Docker host.

In one terminal (T1), type docker system events. The command does not show output and waits for any event worth reporting to occur. The list of events is listed at https://docs.docker.com/engine/reference/commandline/events/#/extended-description.
In a new terminal (T2), kill existing container using docker service scale web=2.

T1 shows the updated list of events as:

2017-10-13T07:12:00.223791013-07:00 service update r4i0x8ujnn2q8osj8dowgvw72 (name=web, replicas.new=2, replicas.old=3)
2017-10-13T07:12:00.332724880-07:00 container kill 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, signal=15, vendor=CentOS)
2017-10-13T07:12:00.613143701-07:00 container die 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, exitCode=0, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, vendor=CentOS)
2017-10-13T07:12:00.897831488-07:00 network disconnect 8f8e6ce771d6db6065f2472a7e83612ff6a657de3b6d08dab0617b8a596234fa (container=5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab, name=bridge, type=bridge)
2017-10-13T07:12:01.017523717-07:00 container stop 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, vendor=CentOS)
2017-10-13T07:12:01.023414108-07:00 container destroy 5b5dd33b675d3b6be3e6aaf0ecde928b3ac882b0a221ff71e57c86faae8181ab (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=pwcgr58s1xo28gwa1znlrn2s3, com.docker.swarm.task.name=web.3.pwcgr58s1xo28gwa1znlrn2s3, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.pwcgr58s1xo28gwa1znlrn2s3, vendor=CentOS)

The output shows a list of events, one in each line. The events shown here are container kill, container die, network disconnect, container stop, and container destroy. Date and timestamp for each event is displayed at the beginning of the line. Other event specific information is displayed as well.

In T2, scale the service back to 3 replicas: docker service scale web=3

The output in T1 is updated to show:

2017-10-13T07:13:47.161848609-07:00 service update r4i0x8ujnn2q8osj8dowgvw72 (name=web, replicas.new=3, replicas.old=2)
2017-10-13T07:13:47.429074382-07:00 container create 0574d1fd74bef2e6fc54174e1fbeda25efd7ed270dce1d6dbede4ead19c7c485 (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=xcmylcwlag5vot4tp3l5z6oam, com.docker.swarm.task.name=web.3.xcmylcwlag5vot4tp3l5z6oam, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.xcmylcwlag5vot4tp3l5z6oam, vendor=CentOS)
2017-10-13T07:13:47.445010259-07:00 network connect 8f8e6ce771d6db6065f2472a7e83612ff6a657de3b6d08dab0617b8a596234fa (container=0574d1fd74bef2e6fc54174e1fbeda25efd7ed270dce1d6dbede4ead19c7c485, name=bridge, type=bridge)
2017-10-13T07:13:47.778855117-07:00 container start 0574d1fd74bef2e6fc54174e1fbeda25efd7ed270dce1d6dbede4ead19c7c485 (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=xcmylcwlag5vot4tp3l5z6oam, com.docker.swarm.task.name=web.3.xcmylcwlag5vot4tp3l5z6oam, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.3.xcmylcwlag5vot4tp3l5z6oam, vendor=CentOS)

The list of events shown here are container create, network connect, and container start.

Use filters

The list of events can be restricted by filters specified using --filter or -f option. The currently supported filters are:

container (container=<name or id>)
daemon (daemon=<name or id>)
event (event=<event action>)
image (image=<tag or id>)
label (label=<key> or label=<key>=<value>)
network (network=<name or id>)
plugin (plugin=<name or id>)
type (type=<container or image or volume or network or daemon>)
volume (volume=<name or id>)

Let’s look at the list of running containers first using docker container ls, and then learn how to apply these filters.

Here is the list of running containers from the service:

CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
074447f26452        jboss/wildfly:latest   "/opt/jboss/wildfl..."   3 minutes ago       Up 3 minutes        8080/tcp            web.1.ytyv0gqi7dzxtetssrlsgvvbu
0574d1fd74be        jboss/wildfly:latest   "/opt/jboss/wildfl..."   8 minutes ago       Up 8 minutes        8080/tcp            web.3.xcmylcwlag5vot4tp3l5z6oam
55198043b6aa        jboss/wildfly:latest   "/opt/jboss/wildfl..."   25 minutes ago      Up 25 minutes       8080/tcp            web.2.1ic0vevvvu2nwwyc6css58ref

Let’s apply the filters.

Show events for a container by name
1. In T1, give the command to listen to a specific container as:
  docker system events -f container=web.1.ytyv0gqi7dzxtetssrlsgvvbu
  You may have to terminate previous run of docker system events using Ctrl + C to give this new command.
2. In T2, terminate the second replica of the service as docker container rm -f web.2.1ic0vevvvu2nwwyc6css58ref.
3. T1 does not show any events because its only listening for events from the first replica of the service.

Show events for an event

In T1, give the command docker system events -f event=create.
In T2, scale the service by one more replica:
```
docker service scale web=4
```

T1 shows the event for container creation

2017-10-13T07:24:22.971050949-07:00 container create 84e4604ffd983cfcc53ad619b4c11156518834fe23e4a0a8b299905b978a0022 (build-date=20170911, com.docker.swarm.node.id=wgujclh0492kkszpil81d3ugb, com.docker.swarm.service.id=r4i0x8ujnn2q8osj8dowgvw72, com.docker.swarm.service.name=web, com.docker.swarm.task=, com.docker.swarm.task.id=38unfmcsxmnvr844gysn28lwa, com.docker.swarm.task.name=web.4.38unfmcsxmnvr844gysn28lwa, image=jboss/wildfly:latest@sha256:d3af084d024753e4799809c10cd188f675a5b254a8e279b34709035b95d27dc7, license=GPLv2, name=web.4.38unfmcsxmnvr844gysn28lwa, vendor=CentOS)

This is accurate as a new container is created and the event is shown in T1 console.

In T2, scale the service back to 2 using the command docker servie scale web=2
T1 does not show any additional events because its only looking for create events
More samples are explained at https://docs.docker.com/engine/reference/commandline/events/#/filter-events-by-criteria.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets. Docker instance can be configured as Prometheus target.

Different targets to scrape are defined in the Prometheus configuration file. Targets may be statically configured via the static_configs parameter in the configuration fle or dynamically discovered using one of the supported service-discovery mechanisms (Consul, DNS, Etcd, etc.).

Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets. Since Prometheus also exposes data in the same manner about itself, it can also scrape and monitor its own health.

Docker metrics for Prometheus

Docker exposes Prometheus-compatible metrics on port 9323. This support is only available as an experimental feature.

For Docker for Mac, click on Docker icon in the status menu
Select Preferences…, Daemon, Advanced tab

Update daemon settings:

{
  "metrics-addr" : "0.0.0.0:9323",
  "experimental" : true
}

Click on Apply & Restart to restart the daemon
Show the complete list of metrics using curl http://localhost:9323/metrics
Show the list of engine metrics using curl http://localhost:9323/metrics | grep engine

Start Prometheus

In this section, we’ll start Prometheus and use it to scrape it’s own health.

Create a new directory prometheus and change to that directory

Create a text file prometheus.yml and use the following content

# A scrape configuration scraping a Node Exporter and the Prometheus server
# itself.
scrape_configs:
  # Scrape Prometheus itself every 5 seconds.
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

This configuration file scrapes data from the Prometheus container which will be started subsequently on port 9090.

Start a single-replica Prometheus service:

docker service create \
  --replicas 1 \
  --name metrics \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090/tcp \
  prom/prometheus

This will start the Prometheus container on port 9090.

Prometheus dashboard is at http://localhost:9090. Check the list of enabled targets at http://localhost:9090/targets (also accessible from Status → Targets menu).

It shows that the Prometheus endpoint is available for scraping.
Click on Graph and click on -insert metric at cursor- to see the list of metrics available:

These are all the metrics published by the Prometheus endpoint.
Choose http_request_total metrics, click on Execute
Switch from Console to Graph
Change the duration from 1h to 5m
Click on Add Graph, select a different metric, say http_requests_duration_microseconds, and click on Execute
Switch from Console to Graph and change the duration from 1h to 5m
Stop the container: docker service rm metrics

Multiple graphs can be added this way.

Node health

In this section, we’ll start Prometheus node exporter that will publish machine metrics. Then we’ll use Prometheus to scrape its health information about the node running Docker.

Start Node Exporter

All containers need to use the same overlay network so that they can communicate with each other. Let’s create an overlay network:
```
docker network create --driver overlay prom
```
Start Prometheus node exporter:
```
docker service create --name node \
 --mode global \
 --mount type=bind,source=/proc,target=/host/proc \
 --mount type=bind,source=/sys,target=/host/sys \
 --mount type=bind,source=/,target=/rootfs \
 --network prom \
 --publish 9100:9100 \
 prom/node-exporter:v0.15.0 \
  --path.procfs /host/proc \
  --path.sysfs /host/sys \
  --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
```
A few observations in this command:
1. This is started as a global service such that it is started on all nodes of the cluster.
2. As explained in prometheus/node_exporter#610, node exporter only works with host network on Mac OSX. This is not needed if you are running on Linux.
3. It uses the overlay network previously created.
4. It needs access to host’s filesystems such that the metrics about the node can be published.

Restart Prometheus

Update prometheus.yml to the following text:

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
        - 'localhost:9090'
  - job_name: 'node resources'
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat

  - job_name: 'node storage'
    scrape_interval: 1m
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - filefd
        - filesystem
        - xfs

A few observations:

DNS-based service discovery is used to discover the scraper for node-exporter. This is further explained at dns_sd_configs. A record-based queries are used to discover the service.
Two different jobs are created even though they are scraping from the same endpoint. This provides a more logical way to represent data.

Terminate previously running Prometheus service:
```
docker service rm metrics
```

Restart the Prometheus service, this time using the overlay network, as:

docker service create \
  --replicas 1 \
  --name metrics \
  --network prom \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090/tcp \
  prom/prometheus

Check metrics

Confirm that both the services have started:

ID                  NAME                MODE                REPLICAS            IMAGE                       PORTS
lzl41s2i66jd        metrics             replicated          1/1                 prom/prometheus:latest      *:9090->9090/tcp
dro3ncpyuchp        node                global              1/1                 prom/node-exporter:latest

Confirm that all the targets are configured correctly at Prometheus dashboard:
Now a lot more metrics, this time from the node, are also available:

Console output and graphs for all these metrics is now available:

Complete list of metrics is available at https://github.com/prometheus/node_exporter.

Query using PromQL (TODO)

Add some fun queries from https://prometheus.io/docs/querying/basics/.

cAdvisor

cAdvisor (Container Advisor) provides resource usage and performance characteristics running containers. Let’s take a look on how cAdvisor can be used to get these metrics from containers.

Run cAdvisor

docker container run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

Dashboard is available at http://localhost:8080
A high-level CPU and Memory utilization is shown. More details about CPU, memory, network and filesystem usage is shown in the same page. CPU usage looks like as shown:
All Docker containers are in /docker sub-container.

Click on any of the containers and see more details about the container.

cAdvisor samples once a second and has historical data for only one minute. The data generated from cAdvisor can be exported to InfluxDB. Optionally, you may use a Grafana front end to visualize the data as explained in How to setup Docker monitoring.

Prometheus and cAdvisor

cAdvisor also exposes container statistics as Prometheus metrics out of the box. By default, these metrics are served under the /metrics HTTP endpoint. Let’s take a look at how these container metrics can be observed using Prometheus.

Terminate previously running cAdvisor:
```
docker container rm -f cadvisor
```

Start a new cAdvisor service, using the prom overlay network created earlier:

docker service create \
  --name cadvisor \
  --network prom \
  --mode global \
  --mount type=bind,source=/,target=/rootfs \
  --mount type=bind,source=/var/run,target=/var/run \
  --mount type=bind,source=/sys,target=/sys \
  --mount type=bind,source=/var/lib/docker,target=/var/lib/docker \
  google/cadvisor:latest

Terminate the previously running Prometheus service:
```
docker service rm metrics
```

The update prometheus.yml configuration file is:

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
        - 'localhost:9090'

  - job_name: 'node resources'
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat

  - job_name: 'node storage'
    scrape_interval: 1m
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - filefd
        - filesystem
        - xfs

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

Start the new Prometheus service

docker service create \
  --replicas 1 \
  --name metrics \
  --network prom \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090/tcp \
  prom/prometheus

Confirm that all the targets are configured correctly at Prometheus dashboard:

Note, all four scrape endpoints are shown here.
In Graphs, now, a lot more metrics, this time from cAdvisor, are also available:

Console output and graphs for all these metrics is now available:

Complete list of metrics is available at https://github.com/google/cadvisor.

Here is a basic query written using PromQL worth trying:

sum by (container_label_com_docker_swarm_node_id) (
  irate(
    container_cpu_usage_seconds_total{
      container_label_com_docker_swarm_service_name="metrics"
      }[1m]
  )
)

This shows the average amount of CPU used per minute by the service metrics aggregated over multiple CPUs. The graph will look as shown:

Monitor Java Applications

This section will explain how an existing Java application can be updated to publish metrics and monitored by Prometheus.

Prometheus collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets.

As discussed earlier, Prometheus collects metrics from monitored targets by scraping from an HTTP endpoint on these targets. By default, these metrics are expected to be published at /metrics. Any existing Java application can be updated to publish Prometheus-style metrics at this endpoint.

An earlier chapter explained a simple Java EE application that talks to a MySQL database. This application also publishes Prometheus-style metrics for the underlying JVM at /metrics. It also publishes application-specific metrics such as total number of times GET / and GET /{id} is called.

The complete set of JVM metrics are explained at https://github.com/prometheus/client_java. Refer to https://github.com/arun-gupta/docker-javaee/tree/master/employees/src/main/java/org/javaee/samples/employees/metrics for more details on how these metrics are enabled.

Start Java application

Use the Compose file to deploy a simple the Java EE application. This will start WildFly Swarm application and MySQL database.
```
docker stack deploy --compose-file=docker-compose.yml webapp
```
This will create webapp_default overlay network, and start the webapp_web and webapp_db services.

Verify the network:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
u6ybdaqx5h5y        webapp_default      overlay             swarm

Other networks may be shown here as well.

Verify the services:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
ucztcpf1vw0a        webapp_db           replicated          1/1                 mysql:8                          *:3306->3306/tcp
jttfgvr09kre        webapp_web          replicated          1/1                 arungupta/docker-javaee:latest   *:8080->8080/tcp,*:9990->9990/tcp

Verify that the endpoint is accessible:

$ curl http://localhost:8080/resources/employees
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><collection><employee><id>1</id><name>Penny</name></employee><employee><id>2</id><name>Sheldon</name></employee><employee><id>3</id><name>Amy</name></employee><employee><id>4</id><name>Leonard</name></employee><employee><id>5</id><name>Bernadette</name></employee><employee><id>6</id><name>Raj</name></employee><employee><id>7</id><name>Howard</name></employee><employee><id>8</id><name>Priya</name></employee></collection>

Access the metrics published by the endpoint using curl http://localhost:8080/metrics to see the output:

# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_141-8u141-b15-1~deb9u1-b15",vendor="Oracle Corporation",} 1.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="PS Scavenge",} 25.0
jvm_gc_collection_seconds_sum{gc="PS Scavenge",} 0.386
jvm_gc_collection_seconds_count{gc="PS MarkSweep",} 6.0
jvm_gc_collection_seconds_sum{gc="PS MarkSweep",} 0.546
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 25.5
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.508056592419E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 499.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.244393984E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5.06601472E8
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 13096.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 13096.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 60.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 12.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 67.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 93.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.2072508E8
jvm_memory_bytes_used{area="nonheap",} 9.3550048E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 2.69484032E8
jvm_memory_bytes_committed{area="nonheap",} 1.0133504E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.66092032E8
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 1.4589888E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 6.9998048E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 8962112.0
jvm_memory_pool_bytes_used{pool="PS Eden Space",} 2.3732032E7
jvm_memory_pool_bytes_used{pool="PS Survivor Space",} 6073592.0
jvm_memory_pool_bytes_used{pool="PS Old Gen",} 9.0919456E7
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 1.47456E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 7.5800576E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.0788864E7
jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 9.2274688E7
jvm_memory_pool_bytes_committed{pool="PS Survivor Space",} 3.8797312E7
jvm_memory_pool_bytes_committed{pool="PS Old Gen",} 1.38412032E8
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 9.699328E7
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 3.8797312E7
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 3.49700096E8

It shows all the JVM metrics that are published by the Prometheus JVM Client. The metrics generated by the application are not shown yet. It requires for the application to be accessed first.

Let’s access the JVM metrics in Prometheus dashboard first, and then we’ll access the app to show app-specific metrics.

Start Prometheus service

Make sure to terminate any previously running Prometheus endpoints:
```
docker service rm metrics
```
Create a directory prometheus and change into that directory.
Create a text file prometheus.yml and add the following content:
```
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'webapp'
    dns_sd_configs:
      - names: ['tasks.webapp_web']
        type: 'A'
        port: 8080
```
This defines the configuration for the HTTP endpoint that publishes Prometheus-style metrics from the Java application.

Start Prometheus service:

docker service create \
  --replicas 1 \
  --network webapp_default \
  --name metrics \
  --mount type=bind,source=`pwd`/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
  --publish 9090:9090 \
  prom/prometheus

Note, this service is using the webapp_default overlay network that is created when the application stack was deployed.

Access Prometheus dashboard at http://localhost:9090
Check the configured targets at http://localhost:9090/targets:

It shows that the application metrics HTTP endpoint is configured as a Prometheus target.

View application metrics

On Prometheus dashboard, click on -insert metric at cursor- to see the list of metrics available:

JVM metrics shown earlier are displayed here as well.
Select jvm_memory_pool_bytes_used metric and click on Execute to view the metric.
Select Graph to view the graphical representation
Now access the application using curl http://localhost:8080/resources/employees a few times.
Refresh Prometheus dashboard and see the updated list of metrics:

Note, app* and requests* that are generated by the application.
Select requests_get_all metric and view the graph:
Access the application a few times using curl http://localhost:8080/resources/employees/5 and then watch the requests_get_one metric.

Grafana

Grafana is an open source metric analytics & visualization suite. It supports many different storage backends, called as Data Source. Prometheus can be added as Grafana data source. It even provides support for runnning Prometheus queries from the Grafana dashboard as well. More details can be found in Using Prometheus in Grafana.

Start Grafana

This section will explain how to start Grafana, use Prometheus as the data source, and view some container metrics.

Start Grafana:

docker container run \
  -d \
  -p 3000:3000 \
  --name=grafana \
  -e "GF_SECURITY_ADMIN_PASSWORD=secret" \
  grafana/grafana

Use the login name admin and password secret.

Read more details about different configuration options.

Access Grafana dashboard at http://localhost:3000. Use the login and password as credentials to see Grafana console.

Add Prometheus as data source

Click the Add data source button in the top header.
Specify the parameters as shown:
Click on Add to test and save the data source:

The green bar indicates that the data source was added successfully.

Create chart with Prometheus data source

Click on Create your first dashboard, save it and give it a name, say Docker and Java dashboard
Click on Graph, edit, under the Metrics tab, select your Prometheus data source.
Enter the following Prometheus query expressions in the query field. The graphs will referesh in a few seconds and will look like as shown:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ch10-monitoring.adoc

ch10-monitoring.adoc

Monitoring Docker Containers

Docker CLI

Docker Remote API

Docker Events

Use filters

Prometheus

Docker metrics for Prometheus

Start Prometheus

Node health

Start Node Exporter

Restart Prometheus

Check metrics

Query using PromQL (TODO)

cAdvisor

Prometheus and cAdvisor

Monitor Java Applications

Start Java application

Start Prometheus service

View application metrics

Grafana

Start Grafana

Add Prometheus as data source

Create chart with Prometheus data source

Files

ch10-monitoring.adoc

Latest commit

History

ch10-monitoring.adoc

File metadata and controls

Monitoring Docker Containers

Docker CLI

Docker Remote API

Docker Events

Use filters

Prometheus

Docker metrics for Prometheus

Start Prometheus

Node health

Start Node Exporter

Restart Prometheus

Check metrics

Query using PromQL (TODO)

cAdvisor

Prometheus and cAdvisor

Monitor Java Applications

Start Java application

Start Prometheus service

View application metrics

Grafana

Start Grafana

Add Prometheus as data source

Create chart with Prometheus data source