

Kafka Puppet Module

A Puppet module for installing and managing Apache Kafka brokers.

This module is currently being maintained by The Wikimedia Foundation in Gerrit at operations/puppet/kafka and mirrored here on GitHub. It was originally developed for 0.7.2 at



Kafka (Clients)

# Install the kafka package.
class { 'kafka': }

This will install the Kafka package, which includes /usr/sbin/kafka, useful for running client commands (console-consumer, console-producer, etc.).

Kafka Broker Server

# Include Kafka Broker Server.
class { 'kafka::server':
    log_dirs         => ['/var/spool/kafka/a', '/var/spool/kafka/b'],
    # Each key is the $::fqdn of a broker host.
    brokers          => {
        '' => { 'id' => 1, 'port' => 12345 },
        '' => { 'id' => 2 },
    },
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => '/kafka/cluster_name',
}

log_dirs defaults to a single directory, ['/var/spool/kafka'], but you may specify multiple Kafka log data directories here. This is useful for spreading your topic partitions across multiple disks.

The brokers parameter is a Hash keyed by $::fqdn. Each value is another Hash that contains config settings for that Kafka host. id is required and must be unique for each Kafka Broker Server host. port is optional and defaults to 9092.

Each Kafka Broker Server's broker_id and port properties will be set by looking up the node's $::fqdn in the brokers Hash.
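As a rough illustration of that lookup, the sketch below indexes a brokers Hash by $::fqdn and falls back to port 9092 when an entry does not set a port. It is not the module's actual implementation: the hostnames and the $broker_config, $broker_id, and $broker_port variable names are hypothetical, and it assumes the current node's $::fqdn is present in the Hash.

```puppet
# Hypothetical sketch; not the module's actual implementation.
# The hostnames below are placeholders.
$brokers = {
  'kafka01.example.org' => { 'id' => 1, 'port' => 12345 },
  'kafka02.example.org' => { 'id' => 2 },
}

# Pick this node's entry out of the Hash by its fully qualified domain name.
# This assumes $::fqdn is one of the keys above.
$broker_config = $brokers[$::fqdn]

# id is required for every broker.
$broker_id = $broker_config['id']

# port is optional; fall back to 9092 when the entry does not set one.
$broker_port = $broker_config['port'] ? {
  undef   => 9092,
  default => $broker_config['port'],
}

notice("broker ${::fqdn}: id=${broker_id} port=${broker_port}")
```

Keeping every broker in one Hash means the same class declaration can be applied unchanged to all broker nodes, with each node selecting only its own entry.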

zookeeper_hosts is an array of Zookeeper host:port pairs. zookeeper_chroot is optional, and allows you to specify a Znode under which Kafka will store its metadata in Zookeeper. This is useful if you want to use a single Zookeeper cluster to manage multiple Kafka clusters. See below for information on how to create this Znode in Zookeeper.
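Kafka's zookeeper.connect setting expects the host:port pairs as one comma-separated list, with the chroot path appended once at the end. The sketch below shows how the two parameters could be combined into that string. It assumes the join function from puppetlabs-stdlib is available, and the $hosts_string and $zookeeper_connect variable names are illustrative rather than taken from this module.

```puppet
# Hypothetical sketch of building a zookeeper.connect value.
$zookeeper_hosts  = ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181']
$zookeeper_chroot = '/kafka/cluster_name'

# join() comes from puppetlabs-stdlib (assumed to be installed).
$hosts_string = join($zookeeper_hosts, ',')

# The chroot is appended once, after the last host:port pair.
$zookeeper_connect = "${hosts_string}${zookeeper_chroot}"

notice($zookeeper_connect)
```

With the values above, the resulting string is zk-node01:2181,zk-node02:2181,zk-node03:2181/kafka/cluster_name.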

Custom Zookeeper Chroot

If Kafka will share a Zookeeper cluster with other users, you might want to create a Znode in Zookeeper in which to store your Kafka cluster's data. You can set the zookeeper_chroot parameter on the kafka class to do this.

First, you'll need to create the znode manually. You can use the command-line client that ships with Zookeeper, or you can use the built-in kafka zookeeper-shell command:

$ kafka zookeeper-shell <zookeeper_host>:2181
Connecting to kraken-zookeeper
Welcome to ZooKeeper!
JLine support is enabled


WatchedEvent state:SyncConnected type:None path:null
[zk: kraken-zookeeper(CONNECTED) 0] create /my_kafka kafka
Created /my_kafka

You can use whatever chroot znode path you like. The second argument (data) is arbitrary. I used 'kafka' here.


class { 'kafka::server':
    brokers => {
        '' => { 'id' => 1, 'port' => 12345 },
        '' => { 'id' => 2 },
    },
    zookeeper_hosts => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    # set zookeeper_chroot on the kafka class.
    zookeeper_chroot => '/kafka/clusterA',
}

Kafka Mirror

Kafka MirrorMaker is usually used for replicating and aggregating Kafka clusters across data centers. You can consume from any number of source Kafka clusters and produce to a single destination Kafka cluster.

# Configure kafka-mirror to produce to Kafka Brokers which are
# part of our kafka aggregator cluster.
class { 'kafka::mirror':
    destination_brokers => {
        '' => { 'id' => 11 },
        '' => { 'id' => 12 },
    },
    topic_whitelist => 'webrequest.*',
}

# Configure kafka-mirror to consume from both clusterA and clusterB
kafka::mirror::consumer { 'clusterA':
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => ['/kafka/clusterA'],
}
kafka::mirror::consumer { 'clusterB':
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => ['/kafka/clusterB'],
}

jmxtrans monitoring

This module contains a class called kafka::server::jmxtrans. It contains a useful jmxtrans JSON config object that can be used to tell jmxtrans to send to any output writer (Ganglia, Graphite, etc.). To use this, you will need the puppet-jmxtrans module.

# Include this class on each of your Kafka Broker Servers.
class { '::kafka::server::jmxtrans':
    ganglia => '',
}

This will install jmxtrans and render JSON config files for sending JVM and Kafka Broker stats to Ganglia. See for a fully rendered jmxtrans Kafka JSON config file.
