

Addon: expose /metrics endpoints for Prometheus #49

Open · wants to merge 27 commits into base: master

Conversation

solsson (Contributor) commented Jul 28, 2017

Good fit with https://github.com/Yolean/kubernetes-monitoring.

TODO recommend a Grafana dashboard json

@solsson solsson added the addon label Jul 28, 2017
solsson added a commit that referenced this pull request Jul 28, 2017
#49
but maybe with tests instead of talk
@solsson solsson force-pushed the addon-metrics branch 3 times, most recently from 7240ded to 5221e4d on July 31, 2017 07:30
solsson (Contributor, Author) commented Jul 31, 2017

Based on the observation that the test pod in https://github.com/Yolean/kubernetes-kafka/blob/addon-metrics/test/jmx-selftest.yml takes 40-100 MB of memory (in GKE, according to kubectl top), I've tried to fit all metrics containers within a 100 MB resource limit. The problem is that the JVM has to be restricted to stay within such a limit, or it will have spikes that cause pod restarts. I think that the current 64 MB for the app and 32 MB for "metaspace" avoids such restarts while keeping scrapes almost as performant as without resource limits.
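For reference, roughly what that looks like as a container spec (a minimal sketch based on the limits discussed here and the command shown in the next comment; the exact image tag, ports and manifests in the branch may differ):

- name: metrics
  image: solsson/kafka-prometheus-jmx-exporter
  command:
  - java
  - -Xmx64M                    # app heap, kept well below the pod limit
  - -XX:MaxMetaspaceSize=32m   # "metaspace" cap
  - -jar
  - jmx_prometheus_httpserver.jar
  - "5556"
  - example_configs/kafka-prometheus-monitoring.yml
  ports:
  - containerPort: 5556
  resources:
    limits:
      memory: 100Mi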

solsson (Contributor, Author) commented Jul 31, 2017

Poor results in GKE, getting pod restarts at least once every five minutes:

    Command:
      java
      -Xmx64M
      -XX:MaxMetaspaceSize=32m
      -jar
      jmx_prometheus_httpserver.jar
      5556
      example_configs/kafka-prometheus-monitoring.yml
    State:		Running
      Started:		Mon, 31 Jul 2017 21:36:37 +0200
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137

solsson added a commit that referenced this pull request Aug 5, 2017
at least for now, as it allows exec into the pods to investigate.
We've been having frequent restarts that are not due to OOMKilled (i.e. not #49).
Now failed probes will lead to unready pods, which we can monitor for using #60.
solsson referenced this pull request Aug 5, 2017
which might not matter because we no longer have a loadbalancing service.

These probes won't catch all failure modes,
but if they fail we're pretty sure the container is malfunctioning.

I found some sources recommending ./bin/kafka-topics.sh for probes,
but to me it looks risky to introduce a dependency on some other service for such things.
One such source is helm/charts#144

The zookeeper probe is from
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
An issue is that zookeeper's logs are quite verbose for every probe.
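One common pattern for this kind of check is an exec probe against zookeeper's ruok four-letter-word command; a minimal sketch (the exact command, port and timings here are assumptions, not necessarily what the commit uses):

readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - '[ "imok" = "$(echo ruok | nc -w 2 127.0.0.1 2181)" ]'   # expect "imok" back from zookeeper
  initialDelaySeconds: 10
  timeoutSeconds: 5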
yacut commented Sep 25, 2017

@solsson I had the same problem with kafka metrics.

  1. The metrics response is too big for a small server (especially if you have many topics/partitions and export all java.lang metrics). That was the reason for the java.lang.OutOfMemoryError: GC Overhead Limit Exceeded error. I have reduced the metrics for Kafka to:
lowercaseOutputName: true
jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi
rules:
  - pattern : kafka.server<type=ReplicaFetcherManager, name=MaxLag, clientId=(.+)><>Value
  - pattern : kafka.server<type=BrokerTopicMetrics, name=(.+), topic=(.+)><>OneMinuteRate
  - pattern : kafka.server<type=KafkaRequestHandlerPool, name=RequestHandlerAvgIdlePercent><>OneMinuteRate
  - pattern : kafka.server<type=Produce><>queue-size
  - pattern : kafka.server<type=ReplicaManager, name=(.+)><>(Value|OneMinuteRate)
  - pattern : kafka.server<type=controller-channel-metrics, broker-id=(.+)><>(.*)
  - pattern : kafka.server<type=socket-server-metrics, networkProcessor=(.+)><>(.*)
  - pattern : kafka.server<type=Fetch><>queue-size
  - pattern : kafka.server<type=SessionExpireListener, name=(.+)><>OneMinuteRate
  - pattern : java.lang<type=OperatingSystem><>SystemCpuLoad
  - pattern : java.lang<type=Memory><HeapMemoryUsage>used
  - pattern : java.lang<type=OperatingSystem><>FreePhysicalMemorySize
  2. The jmx exporter is very slow, no matter how much memory or CPU the server has. So the solution is to increase the timeouts for Prometheus and Kubernetes.

prometheus global config:

global:
  scrape_interval: 30s
  scrape_timeout: 30s
  evaluation_interval: 30s

k8s container liveness probe:

        livenessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 60
            timeoutSeconds: 60
            successThreshold: 1
            failureThreshold: 3

Enjoy :)

solsson (Contributor, Author) commented Sep 26, 2017

@yacut Thanks for the feedback. I noticed the default export just included everything, so I'll update the exporter config for brokers to your suggestion.

I'm surprised about the performance issue though. In the tests I ran, 3 seconds were sufficient, according to https://github.com/Yolean/kubernetes-kafka/blob/addon-metrics/test/metrics.yml#L80.

yacut commented Sep 26, 2017

@solsson I'm surprised too.

There are some issues about it:

I'm not an expert, but I guess the bigger the Kafka cluster (brokers/topics/partitions/message rate), the slower the responses. With our cluster size the responses take ~15-35 seconds 😟

I also saw that the jmx exporter responds very quickly if I stop the broker so that it no longer takes part in cluster replication but is still running for a bit.

solsson (Contributor, Author) commented Sep 26, 2017

Thanks for the background. This looks like a weakness in jmx_exporter. Before we dig deep here, it could be worth investigating whether there are other ways to get Prometheus-compliant metrics out of Kafka.

yacut commented Sep 26, 2017

There are not many exporters for Kafka: https://prometheus.io/docs/instrumenting/exporters/

For me the important metrics are:

  • broker life cycle (up/down/replica), which only the jmx exporter can do
  • message rate per topic, which also only the jmx exporter can do
  • consumer group lag, which two exporters can do
  • free disk space in the volume, which no exporter covers; only possible with the node exporter as of k8s v1.8 ;(
  • maybe also Java CPU and heap, again the jmx exporter

If you find another exporter, it would be great, but at the moment we have no choice...

yacut commented Oct 6, 2017

@solsson Performance improved from ~35-40 seconds to ~5-8 seconds per request by adding the settings ssl and whitelistObjectNames:

lowercaseOutputName: true
jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi
ssl: false
whitelistObjectNames: ["kafka.server:*","java.lang:*"]
rules:
  - pattern : kafka.server<type=ReplicaFetcherManager, name=MaxLag, clientId=(.+)><>Value
  - pattern : kafka.server<type=BrokerTopicMetrics, name=(.+), topic=(.+)><>OneMinuteRate
  - pattern : kafka.server<type=KafkaRequestHandlerPool, name=RequestHandlerAvgIdlePercent><>OneMinuteRate
  - pattern : kafka.server<type=Produce><>queue-size
  - pattern : kafka.server<type=ReplicaManager, name=(.+)><>(Value|OneMinuteRate)
  - pattern : kafka.server<type=controller-channel-metrics, broker-id=(.+)><>(.*)
  - pattern : kafka.server<type=socket-server-metrics, networkProcessor=(.+)><>(.*)
  - pattern : kafka.server<type=Fetch><>queue-size
  - pattern : kafka.server<type=SessionExpireListener, name=(.+)><>OneMinuteRate
  - pattern : java.lang<type=OperatingSystem><>SystemCpuLoad
  - pattern : java.lang<type=Memory><HeapMemoryUsage>used
  - pattern : java.lang<type=OperatingSystem><>FreePhysicalMemorySize

Prometheus scrape settings are back to normal:

global:
  scrape_interval: 15s
  scrape_timeout: 15s

solsson added a commit that referenced this pull request Oct 6, 2017
through ssl=false and whitelist.

Thanks to @yacut, see #49
solsson (Contributor, Author) commented Oct 6, 2017

@yacut great find. Does the branch metrics-improve-scrape-times correspond to your config? I get speedy scrapes with it, and it contains the metrics I've looked for, except jmx_scrape_duration_seconds.

Have you had a look at the scrape config for zookeeper? I failed completely to extract meaningful metrics in #61.

k8s container liveness probe:

I assume this is for the metrics container, but I don't understand port 8080. Do you think it's worth the extra jmx runs to have this kind of liveness probe, given performance is an issue already?

yacut commented Oct 8, 2017

@solsson Basically yes, but I don't think the (.+) pattern is good for jmx exporter performance. I use it only where necessary, e.g. for the topic labels:

  - pattern : kafka.server<type=ReplicaManager, name=(PartitionCount|UnderReplicatedPartitions)><>Value
  - pattern : kafka.server<type=BrokerTopicMetrics, name=(BytesInPerSec|BytesOutPerSec|MessagesInPerSec), topic=(.+)><>OneMinuteRate

I believe the k8s container liveness probe is important, because if the jmx exporter can't respond anymore then it's useless. A one-minute liveness probe period should not be a problem if you use the whitelist config and only the metrics that are important to you.

In my humble opinion, the following metrics are important for zookeeper:

  • Alive connections: shows the number of brokers that have joined the cluster
  • Packets Sent Rate: shows the zookeeper liveness rate and who is the leader right now
  • Quorum Size: shows the zookeeper quorum config and the member id

More info here: https://zookeeper.apache.org/doc/r3.1.2/zookeeperJMX.html
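If it helps as a starting point, here is a rough, untested sketch of an exporter config limited to zookeeper's MBean domain; the attribute names follow the metrics listed above and are assumptions, and the patterns just copy the style of the Kafka config:

lowercaseOutputName: true
ssl: false
whitelistObjectNames: ["org.apache.ZooKeeperService:*"]
rules:
  # attribute names are assumptions taken from the zookeeperJMX page linked above
  - pattern : org.apache.ZooKeeperService<name0=(.+)><>(NumAliveConnections|PacketsSent|PacketsReceived|OutstandingRequests|QuorumSize)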

@solsson solsson changed the base branch from kafka-011 to master October 22, 2017 18:22
solsson added a commit that referenced this pull request Oct 22, 2017
solsson (Contributor, Author) commented Oct 22, 2017

I believe the k8s container liveness probe is important, because if the jmx exporter can't respond anymore then it's useless.

Suggested a liveness probe in e4fadac
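Roughly along these lines (a sketch inferred from the event log further down, which shows an HTTP probe against port 5556; the path and timings are assumptions and e4fadac may differ):

livenessProbe:
  httpGet:
    path: /liveness   # path as seen in the later probe-failure events
    port: 5556
  initialDelaySeconds: 60   # timings are guesses
  periodSeconds: 60
  timeoutSeconds: 30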

solsson (Contributor, Author) commented Nov 1, 2017

Confluent's release post for 1.0.0 mentions changes to metrics. Most of it, according to the release notes, is in Connect. For Kafka I found https://issues.apache.org/jira/browse/KAFKA-5341.

solsson (Contributor, Author) commented Nov 3, 2017

This is a great addition, but at +100M-150M memory per pod (+800M at the default scale) I'm a bit hesitant to merge. Will test more in #84.

solsson (Contributor, Author) commented Nov 4, 2017

I had always started from jmx-exporter's sample yaml for kafka, but it's much more enlightening to do as in metrics-experiment -- export everything.

To inspect the result I'm using:

metrics_save() {
  pod=$1
  # forward the exporter port from the pod to localhost
  kubectl -n kafka port-forward $pod 5556:5556 &
  sleep 1
  # time the scrape and save the response for inspection
  time curl -o "tmp-metrics-$pod-$(date +%FT%H%M%S).txt" -f -s http://localhost:5556/metrics
  # stop the port-forward (the most recent background job)
  kill %%
}
metrics_save kafka-0
metrics_save pzoo-0

Sample full kafka /metrics at https://gist.github.com/solsson/efb929260fd663a9e15e0ac8557c5028, zoo at https://gist.github.com/solsson/15e2bdce7c23b2d1c7aea0ef895900cb

solsson (Contributor, Author) commented Nov 7, 2017

I've been testing kafka on a cluster with quite busy nodes, and I'm having more problems with the metrics containers than with Kafka itself. I'm currently exporting more metrics than the committed conf, but with ssl=false.

  Normal   Pulled                 10m (x2 over 13m)   kubelet, gke-eu-west-3-b1-default-pool-b345de87-whj6  Container image "solsson/kafka-prometheus-jmx-exporter@sha256:40a6ab24ccac0ed5acb8c02dccfbb1f5924fd97f46c0450e0245686c24138b53" already present on machine
  Normal   Created                10m (x2 over 13m)   kubelet, gke-eu-west-3-b1-default-pool-b345de87-whj6  Created container
  Normal   Killing                10m                 kubelet, gke-eu-west-3-b1-default-pool-b345de87-whj6  Killing container with id docker://metrics:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Started                10m (x2 over 13m)   kubelet, gke-eu-west-3-b1-default-pool-b345de87-whj6  Started container
  Warning  Unhealthy              3m (x10 over 12m)   kubelet, gke-eu-west-3-b1-default-pool-b345de87-whj6  Liveness probe failed: Get http://10.0.8.195:5556/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I've also raised the memory limit to 200M. I think we must find a liveness probe that doesn't cost an additional round of JMX probing.

Or drop the liveness probes, and have the monitoring system alert on stale metrics.
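The latter could be as simple as alerting on failed scrapes; a sketch in the Prometheus 2.x rule-file format (the job label, alert name and threshold are assumptions):

groups:
- name: kafka-metrics
  rules:
  - alert: MetricsScrapeFailing
    expr: up{job="metrics"} == 0   # job label is an assumption
    for: 10m
    annotations:
      summary: 'jmx exporter on {{ $labels.instance }} has not been scraped successfully for 10 minutes'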

solsson added a commit that referenced this pull request Nov 9, 2017
Already included in #49, but here we don't add any exporter container to the pod.

Can be utilized by kafka-manager (#83) - just tick the JMX box when adding a cluster -
to see bytes in/out rates.
This was referenced Nov 10, 2017
@solsson solsson mentioned this pull request Dec 22, 2017
solsson added a commit that referenced this pull request Jan 19, 2018