## Publisher/Subscriber System
A messaging pattern in which the sender (publisher) of a piece of data (message) does not direct it to a specific receiver. Pub/sub systems often have a broker, a central point where messages are published, to facilitate this.  

Kafka is a pub/sub system. Data within Kafka is stored *durably*, in *order*, and can be read *deterministically*. In addition, the data can be *distributed* within the system to provide additional protection against failures, as well as significant opportunities for *scaling* performance.

## Terminology
**Message:**  unit of data within Kafka - similar to a row in database terms. It doesn't have any special format, it is just an array of bytes.  
**Key:** optional bit of metadata associated with a message. Like the message, the key is a byte array with no specific meaning to Kafka. Keys are used when messages are to be written to partitions in a more controlled manner. The simplest such scheme is to generate a consistent hash of the key and then select the partition number for that message by taking the hash modulo the total number of partitions in the topic. This ensures that messages with the same key are always written to the same partition.  
**Batch:** collection of messages being produced to the same topic and partition. Batching increases throughput at some cost in latency.  
**Topic:** messages are categorized into topics. Analogous to a table in database.  
**Partition:** topic is broken down into a number of partitions. As a topic typically has multiple partitions, there is no guarantee of message time-ordering across the entire topic, just within a single partition. Partitions are also the way that Kafka provides redundancy and scalability. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers.  
**Producers:** create new messages destined to a specific topic. By default producer doesn't care which partition the message is added to, however facilities exist to send messages to specific partition using keys.  
**Consumers:** read messages in the order they were produced. A consumer keeps track of which messages it has already consumed by tracking the offset of each message.  
**Offset:** another bit of metadata—an integer value that continually increases—that Kafka adds to each message as it is produced. Each message in a given partition has a unique offset, and Kafka can store offset for each partition.  
**Consumer group:** one or more consumers that work together to consume a topic. The group assures that each partition is only consumed by one member.  

<img src="./images/partitions_consumer_groups.png"/>
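
The message, offset, and consumer definitions above can be pictured with a toy model of a single partition. This is only a sketch: `PartitionLogSketch` is a hypothetical class, it keeps everything in memory, and it ignores durability and replication. It shows the offset as the position of a message in an append-only log and the consumer's position as "the next offset to read".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of one partition: an append-only list whose index is the offset. */
final class PartitionLogSketch {
    private final List<byte[]> messages = new ArrayList<>();

    /** Appends a message and returns the offset assigned to it. */
    long append(byte[] message) {
        messages.add(message);
        return messages.size() - 1L;
    }

    /** Returns up to maxMessages messages starting at the given offset, in order. */
    List<byte[]> read(long fromOffset, int maxMessages) {
        int start = (int) Math.min(fromOffset, messages.size());
        int end = Math.min(start + maxMessages, messages.size());
        return new ArrayList<>(messages.subList(start, end));
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        log.append("a".getBytes());
        log.append("b".getBytes());
        long next = 0; // the consumer's position: offset of the next message to read
        for (byte[] m : log.read(next, 10)) {
            System.out.println(next + ": " + new String(m));
            next++;
        }
    }
}
```

Because the consumer only remembers `next`, it can stop and later resume from the same place without the log tracking anything about it.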

**Broker:** a single Kafka server is called a broker. Broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk.  
**Cluster:** brokers operate as part of a cluster. One broker in the cluster acts as the controller, which is responsible for assigning partitions to brokers and monitoring for broker failures. A partition is owned by a single broker in the cluster, called the leader of the partition. A partition may be assigned to multiple brokers, in which case it is replicated, providing redundancy. All consumers and producers operating on that partition, however, must connect to the leader.  

<img src="./images/cluster_broker_replication.png" />

## Partition Considerations
A common starting point is to make the number of partitions for a topic equal to, or a multiple of, the number of brokers in the cluster, so that partitions can be spread evenly across brokers. There are several factors to consider while determining the correct number of partitions:
- Expected throughput on the topic. 100KB or 1GB per second?
- Maximum throughput expected from a single partition? At any time, we can have a maximum of one consumer reading from a partition. So in a way the throughput would be limited by the slowest consumer
- If we are sending messages to partitions based on keys, adding partition later can be very challenging.
- Consider number of partitions being placed on each broker in terms of diskspace and network bandwidth.

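The first two factors above can be turned into a rough lower bound on the partition count. The sketch below is only back-of-the-envelope arithmetic; `PartitionCountSketch`, `minPartitions`, and the numbers in `main` are hypothetical, and real sizing also has to account for the key-mapping and per-broker resource points in the list.

```java
final class PartitionCountSketch {
    /**
     * Smallest partition count that lets the slower side (producing or consuming)
     * keep up with the target throughput. All rates must use the same unit, e.g. MB/s.
     */
    static int minPartitions(double targetRate, double perPartitionProduceRate, double perConsumerRate) {
        double slowest = Math.min(perPartitionProduceRate, perConsumerRate);
        return (int) Math.ceil(targetRate / slowest);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1000 MB/s target, 100 MB/s per partition on the write side,
        // 50 MB/s per consumer on the read side.
        System.out.println(minPartitions(1000, 100, 50)); // prints 20
    }
}
```

Since at most one consumer in a group reads a partition, the consumer rate is usually the limiting term.
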
## Producer
Flow:  
<img src="./images/producer_flow.png" />

### Creating Producer
The producer object facilitates thread-safe publishing of messages to Kafka. To instantiate a producer object, we need the following mandatory properties:

In [None]:
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

Full list of producer configuration properties [here](https://kafka.apache.org/documentation.html#producerconfigs). However some important ones are listed below:

| Property                  | Description                                                                                                                                                                                                                                                                                                                                                                                                                         |
|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| acks                      | acks=0, producer will not wait for a reply from the broker before assuming the message was sent successfully. High throughput, but producer will not know about any failures. acks=1, producer will receive a success response from the broker the moment the leader replica received the message. acks=all, producer will receive a success response from the broker once all in-sync replicas received the message. High latency. |
| buffer.memory             | sets the amount of memory the producer will use to buffer messages waiting to be sent to brokers. If messages are sent by the application faster than they can be delivered to the server, the producer may run out of space and additional send() calls will either block or throw an exception, based on the `block.on.buffer.full` prop.                                                                                         |
| compression.type          | can be set to snappy, gzip, or lz4.                                                                                                                                                                                                                                                                                                                                                                                                 |
| retries                   | how many times the producer will retry sending the message before giving up and notifying the client of an issue. Producer will wait `retry.backoff.ms` amount of time before retrying.                                                                                                                                                                                                                                             |
| client.id                 | used by broker to identify client                                                                                                                                                                                                                                                                                                                                                                                                   |
| timeout.ms                | controls the time the broker will wait for in-sync replicas to acknowledge the message.                                                                                                                                                                                                                                                                                                                                             |
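| metadata.fetch.timeout.ms | how long the producer will wait for a reply from the server when requesting metadata, such as the current leaders of the partitions it is writing to. The corresponding timeout when sending data is `request.timeout.ms`. |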

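As an illustration of how the properties in the table combine, the following sketch builds a configuration that favors durability over latency. It uses only `java.util.Properties`; the class name, method name, `client.id` value, and the numeric values are assumptions chosen for the example, not recommendations.

```java
import java.util.Properties;

final class ProducerConfigSketch {
    /** Builds producer properties that wait for all in-sync replicas and retry a few times. */
    static Properties durableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                 // wait for all in-sync replicas
        props.put("retries", "3");                // illustrative value
        props.put("retry.backoff.ms", "200");     // illustrative value
        props.put("compression.type", "snappy");
        props.put("client.id", "customer-country-producer"); // hypothetical identifier
        return props;
    }
}
```

The returned object can be passed to the `KafkaProducer` constructor in the same way as `kafkaProps` above.
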
### Sending Message
A message can be sent by producer in the following ways:
1. Fire and forget
2. Synchronously
3. Asynchronously

In [None]:
// Fire and forget
ProducerRecord<String, String> record = 
    new ProducerRecord<>("CustomerCountry", "Precision Products", "France");
    // <Topic>,<Key>,<Value>

try {
    producer.send(record);  // Send returns Future<RecordMetadata>
} catch(Exception e) {
    logger.error(e);
}

Some errors can occur before the message is even sent to Kafka: `SerializationException` if the message fails to serialize, `BufferExhaustedException` or `TimeoutException` if the buffer is full, or `InterruptException` if the sending thread is interrupted.

In [None]:
// Synchronous send
try {
    producer.send(record).get();
} catch(Exception e) {
    logger.error(e);
}

Some errors are retriable, whereas others aren't. Kafka producer automatically retries in case of retriable exceptions. Examples of retriable exception would be connection error or a "no leader" error. Non retriable error would be something like message size being too large.
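
The distinction can be sketched as a small retry loop. The producer performs this kind of retry internally, so the code below is not something an application needs to write; it only illustrates the behavior. `RetriableFailure` and `FatalFailure` are hypothetical stand-ins defined here, not Kafka classes.

```java
import java.util.function.Supplier;

final class RetrySketch {
    /** Stand-in for a retriable failure such as a lost connection or a missing leader. */
    static class RetriableFailure extends RuntimeException {
        RetriableFailure(String message) { super(message); }
    }

    /** Stand-in for a failure that retrying cannot fix, such as an oversized message. */
    static class FatalFailure extends RuntimeException {
        FatalFailure(String message) { super(message); }
    }

    /** Runs the attempt, retrying up to `retries` times on retriable failures only. */
    static <T> T sendWithRetries(Supplier<T> attempt, int retries, long backoffMs) {
        int failures = 0;
        while (true) {
            try {
                return attempt.get();
            } catch (RetriableFailure e) {
                if (failures++ >= retries) {
                    throw e; // retries exhausted: report the failure to the caller
                }
                try {
                    Thread.sleep(backoffMs); // plays the role of retry.backoff.ms
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
            }
            // Any other exception, including FatalFailure, propagates immediately.
        }
    }
}
```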

In [None]:
// Asynchronous send
producer.send(record, (m, e)->{
    if(e != null){
        logger.error(e);
    }
});

// The second argument is a org.apache.kafka.clients.producer.Callback interface object
// which has one method public void onCompletion(RecordMetadata recordMetadata, Exception e)

### Partitions
Kafka messages are key-value pairs and while it is possible to create a `ProducerRecord` with just a topic and a value, with the key set to null by default, most applications produce records with keys. Keys serve two goals: they are additional information that gets stored with the message, and they are also used to decide which one of the topic partitions the message will be written to. All messages with the same key will go to the same partition.  

When the key is null and the default partitioner is used, the record will be sent to one of the available partitions of the topic at random. A round-robin algorithm will be used to balance the messages among the partitions.  

The mapping of keys to partitions is consistent only as long as the number of partitions in a topic does not change. However, the moment you add new partitions to the topic, this is no longer guaranteed.
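
A minimal sketch of this behavior, assuming `String.hashCode()` as a stand-in for the hash that Kafka's default partitioner applies to the serialized key bytes. `PartitionerSketch` is a hypothetical class and does not implement Kafka's `Partitioner` interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PartitionerSketch {
    private final AtomicInteger roundRobin = new AtomicInteger();

    /** Picks a partition: round-robin for null keys, hash modulo partition count otherwise. */
    int partitionFor(String key, int numPartitions) {
        if (key == null) {
            return Math.floorMod(roundRobin.getAndIncrement(), numPartitions);
        }
        // String.hashCode() is only a stand-in for the real hash of the key bytes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionerSketch p = new PartitionerSketch();
        // "a".hashCode() is 97, so the key maps to 97 % 4 = 1 with four partitions...
        System.out.println(p.partitionFor("a", 4)); // prints 1
        // ...but to 97 % 5 = 2 once a fifth partition is added.
        System.out.println(p.partitionFor("a", 5)); // prints 2
    }
}
```

The same key lands on a different partition after the partition count changes, which is why records written before and after the change may no longer share a partition.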

## Consumers
Applications that need to read data from Kafka use a `KafkaConsumer` to subscribe to Kafka topics and receive messages from these topics. Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic. Here is how consumer groups get mapped to partitions: 

<img src="./images/cg_scaling.png" />  

We can have multiple consumer groups consuming the same topic. Each will get messages in the topic independent of one another.  

<img src="./images/multi_cg.png" />

### Partition Rebalance
When we add a new consumer to the group, it starts consuming messages from partitions previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes; it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers. Reassignment of partitions to consumers also happens when the topics the consumer group is consuming are modified (e.g., if an administrator adds new partitions).  

During a rebalance, consumers can’t consume messages, so a rebalance is basically a short window of unavailability of the entire consumer group.  

The way consumers maintain membership in a consumer group and ownership of the partitions assigned to them is by sending heartbeats to a Kafka broker designated as the *group coordinator* (this broker can be different for different consumer groups).
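
The effect of a rebalance on partition ownership can be sketched with a simple round-robin assignment. This is a simplified illustration, not Kafka's actual assignment strategy; `RebalanceSketch` and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RebalanceSketch {
    /** Spreads partitions 0..numPartitions-1 over the consumers in round-robin order. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers share four partitions.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 4)); // {c1=[0, 3], c2=[1], c3=[2]}
        // c2 leaves the group: its partition is taken over by the remaining members.
        System.out.println(assign(Arrays.asList("c1", "c3"), 4)); // {c1=[0, 2], c3=[1, 3]}
    }
}
```

Every partition always has exactly one owner in the group, and a consumer beyond the partition count would receive an empty list and sit idle.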

### Creating Consumers

In [None]:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter"); // This is the consumer group (not mandatory)
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

### Subscribing to Topic

In [None]:
consumer.subscribe(Collections.singletonList("customerCountries"));

// Polling for messages - handles all details of coordination,
// partition rebalances, heartbeats, and data fetching
try {
    while (true) {
        // argument is timeout interval and controls how long poll() will block if data is not available
        // in the consumer buffer. Set to 0, poll() will return immediately
        ConsumerRecords<String, String> records = consumer.poll(100);
        
        // Poll returns multiple records. Each record contains the topic and partition the
        // record came from, the offset of the record within the partition, and of course the
        // key and the value of the record.
        for (ConsumerRecord<String, String> record : records) {
            log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
            
            int updatedCount = 1;
            if (custCountryMap.containsKey(record.value())) {
                updatedCount = custCountryMap.get(record.value()) + 1;
            }
            
            custCountryMap.put(record.value(), updatedCount);
            JSONObject json = new JSONObject(custCountryMap);
            System.out.println(json.toString(4));
        }
    }
} finally {
    // This will close the network connections and sockets.
    // Also retrigger rebalance immediately.
    consumer.close();
}

The poll loop does a lot more than just get data. The first time we call `poll()` with a new consumer, it is responsible for finding the *GroupCoordinator*, joining the consumer group, and receiving a partition assignment. If a rebalance is triggered, it will be handled inside the poll loop as well. And of course the heartbeats that keep consumers alive are sent from within the poll loop.  

One consumer per thread is the rule. To run multiple consumers in the same group in one application, we will need to run each in its own thread.

### Consumer Configurations
Full list available [here](https://kafka.apache.org/documentation.html#newconsumerconfigs). Some important ones are: 

| Property           | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| fetch.min.bytes    | minimum amount of data that it wants to receive from the broker when fetching records. If a broker receives a request for records from a consumer but the new records amount to fewer bytes, the broker will wait until more messages are available before sending the records back to the consumer.                                                                                                                                                                                                                       |
| fetch.max.wait.ms  | lets us control how long to wait. By default, Kafka will wait up to 500 ms.                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| session.timeout.ms | amount of time a consumer can be out of contact with the brokers while still being considered alive. Defaults to 3 seconds. Its value should be greater than `heartbeat.interval.ms`. |
| auto.offset.reset  | controls the behavior of the consumer when it starts reading a partition for which it doesn’t have a committed offset, or if the committed offset it has is invalid (the consumer was down for so long that the record with that offset was already aged out of the broker). `latest` (the default): lacking a valid offset, the consumer starts reading from the newest records. `earliest`: lacking a valid offset, the consumer reads all the data in the partition, starting from the very beginning. |
| max.poll.records   | maximum number of records that a single call to `poll()` will return.                                                                                                                                                                                                                                                                                                                                                                                                                                                      |

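The interplay between `fetch.min.bytes` and `fetch.max.wait.ms` can be read as a single condition: the broker answers a fetch as soon as either limit is reached. The predicate below is a sketch of that rule only; `FetchWaitSketch` and its parameter names are hypothetical and nothing here reflects the broker's actual implementation.

```java
final class FetchWaitSketch {
    /** True once a fetch would be answered: enough data is ready or the wait limit has passed. */
    static boolean shouldRespond(long availableBytes, long fetchMinBytes,
                                 long waitedMs, long fetchMaxWaitMs) {
        return availableBytes >= fetchMinBytes || waitedMs >= fetchMaxWaitMs;
    }
}
```

Raising `fetch.min.bytes` therefore trades latency for fewer, larger responses, with `fetch.max.wait.ms` capping the added delay.
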
### Commits and Offsets
Kafka differs from other messaging systems like JMS in that it does not track acknowledgments from consumers. Consumers are responsible for tracking their position in each partition. The action of updating the current position in the partition is called a **commit**. Consumers commit offsets by producing a message to a special topic named `__consumer_offsets`.  

Normally, the client library maintains the last read offset in memory, and when a client polls, the library hands the app the next messages from its own client-side buffer, which it keeps filling in the background (if needed) with fetch requests to the brokers. That mechanism is independent of committed offsets and works even if the app never commits an offset to the `__consumer_offsets` topic. Committing offsets is like a checkpoint for where you want the app to resume should it fail.

In this setting everything works well in case all consumers are up and working fine, however in case of a consumer crash, rebalance happens and the new consumer needs to know where to start consuming new message from. The consumer will read the latest committed offset of each partition and continue from there.  

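Depending on where the committed offset sits relative to the last message the client actually processed, this can lead to situations where:
- the committed offset is smaller than the offset of the last processed message, so all messages between the committed offset and the last processed offset will be processed twice.
- OR the committed offset is larger than the offset of the last message the client actually processed, so all messages between the last processed offset and the committed offset will be missed by the consumer group.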
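
The two cases can be made concrete with a small sketch. It assumes both values are expressed as "next offset to read" (last offset plus one); `CommitGapSketch` and `afterRebalance` are hypothetical names used only for illustration.

```java
final class CommitGapSketch {
    /**
     * Describes what the consumer taking over a partition will see.
     * committed: next offset to read according to the last commit.
     * processed: next offset the previous owner would have read (last processed offset + 1).
     */
    static String afterRebalance(long committed, long processed) {
        if (committed < processed) {
            return "offsets " + committed + ".." + (processed - 1) + " are processed twice";
        }
        if (committed > processed) {
            return "offsets " + processed + ".." + (committed - 1) + " are never processed";
        }
        return "no duplicates and no gaps";
    }

    public static void main(String[] args) {
        System.out.println(afterRebalance(5, 8)); // offsets 5..7 are processed twice
        System.out.println(afterRebalance(8, 5)); // offsets 5..7 are never processed
    }
}
```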

**Automatic Commits:** configure `enable.auto.commit=true`, then every five seconds (the default `auto.commit.interval.ms`) the consumer will commit the largest offset your client received from `poll()`. Just like everything else in the consumer, the automatic commits are driven by the poll loop. Whenever we poll, the consumer checks if it is time to commit, and if it is, it will commit the offsets it returned in the last poll. This configuration can however lead to duplicate processing of messages: if a consumer crashes between two commits, the records read since the last commit are delivered again after the rebalance.  

**Commit Current Offset:** By setting `enable.auto.commit=false`, offsets will only be committed when the application explicitly chooses to do so (using `commitSync()`).

In [None]:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
            record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    
    try {
        consumer.commitSync(); // commits the latest offset returned by the last poll()
    } catch (CommitFailedException e) {
        log.error("commit failed", e);
    }
}

**Commit Asynchronously:** to improve throughput of the application. The drawback is that while `commitSync()` will retry the commit until it either succeeds or encounters a nonretriable failure, `commitAsync()` will not retry.

In [None]:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
        record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    
    consumer.commitAsync();
}

// Commit async supports callback
consumer.commitAsync(new OffsetCommitCallback() {
    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception e) {
        if (e != null) {
            log.error("Commit failed for offsets {}", offsets, e);
        }
    }
});

A simple pattern to get commit order right for asynchronous retries is to use a monotonically increasing sequence number. Increase the sequence number every time we commit and add the sequence number at the time of the commit to the `commitAsync` callback. When we’re getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don’t retry because a newer commit was already sent.
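
The bookkeeping for that pattern might look like the sketch below. It covers only the sequence-number logic and makes no Kafka calls; `CommitSequenceSketch`, `nextCommit`, and `safeToRetry` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CommitSequenceSketch {
    private final AtomicInteger latest = new AtomicInteger();

    /** Call right before sending a commit; returns the number to capture for its callback. */
    int nextCommit() {
        return latest.incrementAndGet();
    }

    /** Call from a failed commit's callback: retry only if no newer commit was sent since. */
    boolean safeToRetry(int sequenceAtCommit) {
        return sequenceAtCommit == latest.get();
    }
}
```

In use, the value returned by `nextCommit()` would be stored in a local variable before calling `commitAsync`, and the callback would pass it to `safeToRetry` when it receives a non-null exception.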

**Combined:** A common pattern is to combine `commitAsync()` with `commitSync()` just before shutdown.

In [None]:
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
            record.topic(), record.partition(), record.offset(), record.key(), record.value());
        }
        consumer.commitAsync();
    }
} catch (Exception e) {
    log.error("Unexpected error", e);
} finally {
    try {
        consumer.commitSync();
    } finally {
        consumer.close();
    }
}

What if the number of records received in poll is high and we want to commit in between processing? The following code commits after every 1000 records processed using an alternate form of `commitAsync`:

In [None]:
private Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();
int count = 0;
// ...

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
            record.topic(), record.partition(), record.offset(), record.key(), record.value());
    
        currentOffsets.put(new TopicPartition(record.topic(), record.partition()), 
            new OffsetAndMetadata(record.offset()+1, "no metadata"));
        if (count % 1000 == 0)
            consumer.commitAsync(currentOffsets, null);
        count++;
    }
}

**Consuming Records with a Particular Offset:** If we want to start reading all messages from the beginning of the partition, or we want to skip all the way to the end of the partition and start consuming only new messages, there are APIs specifically for that: `seekToBeginning(Collection<TopicPartition> partitions)` and `seekToEnd(Collection<TopicPartition> partitions)`.  

However, consider the following case:

In [None]:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // store the offset of the next record to read, wrapped in OffsetAndMetadata
        currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1));
        processRecord(record);
        storeRecordInDB(record);
        // failure happens here, we are unable to commit latest offset
        consumer.commitAsync(currentOffsets, null);
    }
}

This could be avoided if there was a way to store both the record and the offset in one atomic action. Either both the record and the offset are committed, or neither of them are committed - using perhaps database transactions. The offset is now somehow stored in database instead of Kafka. We can use `seek` in this scenario.

In [None]:
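public class SaveOffsetsOnRebalance implements ConsumerRebalanceListener {
    private final KafkaConsumer<String, String> consumer;

    public SaveOffsetsOnRebalance(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // persist processed records and their offsets before losing ownership
        commitDBTransaction();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // resume each newly assigned partition from the offset stored in the database
        for (TopicPartition partition : partitions) {
            consumer.seek(partition, getOffsetFromDB(partition));
        }
    }
}

consumer.subscribe(topics, new SaveOffsetsOnRebalance(consumer));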
// we call poll() once to make sure we join a consumer group and get assigned partitions
consumer.poll(0);

for (TopicPartition partition: consumer.assignment()) 
    consumer.seek(partition, getOffsetFromDB(partition));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        processRecord(record);
        storeRecordInDB(record);
        storeOffsetInDB(record.topic(), record.partition(), record.offset());
    }
    // database records and offsets will be inserted to the database when we commit transaction
    commitDBTransaction();
}