What started as a simple CLI/service to identify Kafka topics with no activity ended up being a fairly comprehensive way to monitor Kafka cluster activity.
It takes a configuration file as input, in which you list one or more clusters to monitor.
After a set period of time, it can produce a report (to local disk or AWS S3) listing the topics that haven't seen any activity. It also exposes metrics via a Prometheus endpoint.
kafka-overwatch -c config.local.yaml
Supports evaluating multiple Kafka clusters at once
Generates a report on topic usage based on topic watermark offsets (stored locally or in S3)
Generates a command script to re-create all the topics in case of DR (stored locally or in S3)
Exposes metrics via Prometheus
- Topics count
- Partitions count
- Number of new messages (measured with topic offsets)
AWS Secret integration for client config values
Schema Registry integration
- Scans schema registries; one registry can be mapped to many Kafka clusters
- Backs up the schemas, with a CLI to restore them to an existing or new registry.
- Multi-node awareness (split the load across multiple nodes)
- cfn-kafka-admin output format
- topic message metadata analysis (e.g. are messages compressed?)
- scripts to perform cleanup
- Recommendations generated from/based on models
- Conduktor Gateway vClusters auto-discovery
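The usage report and the "new messages" metric listed above are both described as being based on topic watermark offsets. As a rough illustration of that idea only (not the tool's actual implementation), the sketch below compares two snapshots of per-partition high watermarks and flags topics whose offsets did not move. The `Watermarks` type and both function names are hypothetical, and the snapshots are plain dictionaries rather than data fetched from a real cluster.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot type: (topic, partition) -> high watermark offset.
Watermarks = Dict[Tuple[str, int], int]


def new_messages_per_topic(previous: Watermarks, current: Watermarks) -> Dict[str, int]:
    """Sum, per topic, how far each partition's high watermark advanced."""
    totals: Dict[str, int] = {}
    for (topic, partition), high in current.items():
        # A partition absent from the previous snapshot has no baseline,
        # so it contributes 0 rather than its full offset.
        baseline = previous.get((topic, partition), high)
        # Clamp at 0 in case the topic was re-created and its offsets reset.
        totals[topic] = totals.get(topic, 0) + max(0, high - baseline)
    return totals


def idle_topics(previous: Watermarks, current: Watermarks) -> List[str]:
    """Return the sorted names of topics with no new messages between snapshots."""
    counts = new_messages_per_topic(previous, current)
    return sorted(topic for topic, count in counts.items() if count == 0)


before = {("orders", 0): 100, ("orders", 1): 250, ("audit", 0): 42}
after = {("orders", 0): 130, ("orders", 1): 250, ("audit", 0): 42}

print(new_messages_per_topic(before, after))  # {'orders': 30, 'audit': 0}
print(idle_topics(before, after))             # ['audit']
```

Activity is aggregated per topic, so a topic counts as active as soon as any one of its partitions advances.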
While more comprehensive documentation is yet to be written, please refer to kafka_overwatch/specs/config.json,
the schema used with jsonschema to validate the input configuration.
0 - all successful
1 - error during execution
2 - error importing configuration
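When the CLI is launched from another script or a scheduler, the exit codes above can be turned into readable messages. The following is a minimal sketch: the helper names `describe_exit_code` and `run_overwatch` are hypothetical, and it assumes `kafka-overwatch` is on the `PATH` and is invoked with the `-c` option shown earlier.

```python
import subprocess
import sys

# Meanings taken from the exit code list above.
EXIT_CODE_MEANINGS = {
    0: "all successful",
    1: "error during execution",
    2: "error importing configuration",
}


def describe_exit_code(code: int) -> str:
    """Translate a kafka-overwatch exit code into a human-readable message."""
    return EXIT_CODE_MEANINGS.get(code, f"unexpected exit code {code}")


def run_overwatch(config_path: str) -> int:
    """Run the CLI with the given config file and report how it ended.

    The call blocks until the kafka-overwatch process exits.
    """
    result = subprocess.run(["kafka-overwatch", "-c", config_path], check=False)
    print(describe_exit_code(result.returncode), file=sys.stderr)
    return result.returncode


# Example usage (not executed here, since it needs the CLI installed):
#   sys.exit(run_overwatch("config.local.yaml"))
```

Passing `check=False` keeps a non-zero exit code from raising an exception, so the caller can decide how to react to codes 1 and 2.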
Thanks to the Apache Kafka open source community for their continuous efforts in making the ecosystem great. Thanks to NASA for providing a public cluster to run tests against.
Inspired by kafka-idle-topics, yet completely rewritten to monitor topics continuously, similar to cruise-control.