
Schema-registry-statistics


Schema Registry Statistics Tool is a small utility that helps you identify how different schema versions are used within a topic.
It consumes messages from a topic while calculating the percentage of messages produced with each schema version.


Example output:

[sr-stats] 2022/12/28 10:02:12 Starting to consume from payments-topic
[sr-stats] 2022/12/28 10:02:12 Consumer up and running!...
[sr-stats] 2022/12/28 10:02:12 Use SIGINT to stop consuming.
[sr-stats] 2022/12/28 10:02:14 terminating: via signal
[sr-stats] 2022/12/28 10:02:14 Total messages consumed: 81
Schema ID 1 => 77%
Schema ID 3 => 23%

As you can see, in the payments-topic, 77% of the messages are produced using schema ID 1, while the remaining messages are produced using schema ID 3.

You can get the schema by ID:

curl -s http://<SCHEMA_REGISTRY_ADDR>/schemas/ids/1 | jq .
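The registry returns the schema itself as an escaped string under the schema key. With a hypothetical Payment Avro schema, the pretty-printed response would look roughly like:

{
  "schema": "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}"
}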

For further offset analysis, you can store the results in a JSON file:

{"1":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61],"3":[62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80]}

Flags

| Name | Description | Required | Type | Default |
|---|---|---|---|---|
| --bootstrap | The Kafka bootstrap servers. | V | string | "localhost:9092" |
| --topic | The topic name to consume. | V | string | "" |
| --version | The Kafka client version to be used. | | string | "2.1.1" |
| --group | The consumer group name. | | string | schema-stats |
| --user | The Kafka username for authentication. | | string | "" |
| --password | The Kafka authentication password. | | string | "" |
| --tls | Use TLS communication. | | bool | false |
| --cert | When TLS communication is enabled, the path to the CA certificate. | when --tls | string | "" |
| --store | Store results into a file. | | bool | false |
| --chart | Generate a pie chart from the results. | | bool | false |
| --path | If the --store flag is set, the path to store the file. | | string | "/tmp/results.json" |
| --oldest | Consume from the oldest offset. | | bool | true |
| --limit | Limit the consumer to X messages, if different from 0. | | int | 0 |
| --verbose | Raise the consumer log level. | | bool | false |

Usage

./schema-registry-statistics --bootstrap kafka1:9092 --group stat-consumer --topic payments-topic --store --path ~/results.json

This consumes from payments-topic on kafka1 and stores the results. The consumer runs until it receives SIGINT (CMD + C).
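For example, a bounded run that also renders a chart and authenticates over TLS (the values below are placeholders) might look like:

./schema-registry-statistics --bootstrap kafka1:9092 --topic payments-topic --user svc-stats --password <PASSWORD> --tls --cert /etc/ssl/ca.pem --limit 1000 --chart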

Generate Pie Chart

By using the --chart flag, you can generate an HTML page with a pie chart visualization:

[Image: Pie Chart Example]
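For reference, a page like that could be produced in Go with a charting library such as go-echarts. This is an assumption for illustration only, not necessarily how the tool renders its chart:

```go
package main

import (
	"os"

	"github.com/go-echarts/go-echarts/v2/charts"
	"github.com/go-echarts/go-echarts/v2/opts"
)

func main() {
	// Hypothetical per-schema message counts.
	counts := map[string]int{"1": 62, "3": 19}

	items := make([]opts.PieData, 0, len(counts))
	for id, n := range counts {
		items = append(items, opts.PieData{Name: "Schema ID " + id, Value: n})
	}

	pie := charts.NewPie()
	pie.SetGlobalOptions(charts.WithTitleOpts(opts.Title{Title: "Schema version usage"}))
	pie.AddSeries("schemas", items)

	f, err := os.Create("/tmp/chart.html")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pie.Render(f); err != nil {
		panic(err)
	}
}
```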

How does it work?

According to the Kafka wire format, each serialized message has only a couple of components:

| Bytes | Area | Description |
|---|---|---|
| 0 | Magic Byte | Confluent serialization format version number; currently always 0. |
| 1-4 | Schema ID | 4-byte schema ID as returned by the Schema Registry. |
| 5.. | Data | Serialized data in the specified schema format (Avro, Protobuf). |

The tool leverages this format: it reads the binary payload of each message, extracts the schema ID, and stores it.
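As a minimal sketch of that extraction, assuming a message value framed as described above (the helper below is illustrative, not the tool's actual code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// schemaID extracts the schema ID from a Confluent-framed message value.
func schemaID(value []byte) (uint32, error) {
	if len(value) < 5 {
		return 0, fmt.Errorf("value too short: %d bytes", len(value))
	}
	if value[0] != 0 {
		return 0, fmt.Errorf("unknown magic byte: %d", value[0])
	}
	// Bytes 1-4 hold the schema ID as a big-endian 32-bit integer.
	return binary.BigEndian.Uint32(value[1:5]), nil
}

func main() {
	// Hypothetical payload: magic byte 0, schema ID 3, then serialized data.
	payload := []byte{0, 0, 0, 0, 3, 0x10, 0x20}
	id, err := schemaID(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("Schema ID:", id) // prints: Schema ID: 3
}
```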

Local testing

You can use the docker-compose.yml file to create a local environment from scratch.
The /scripts directory contains two versions of the same schema and a simple Python Avro producer.

License

This project is licensed under the Apache License - see the LICENSE file for details.