## Overview
A producer writes messages to a Kafka topic. Different kinds of data can be written, each with its own tolerance for message duplication and loss. The producer is composed of several components and follows the message publishing flow shown below:  
<img src="images/producer_flow.png" />

A message in Kafka is represented by a `ProducerRecord`, which is defined as:

In [None]:
public class ProducerRecord<K, V> {
    private final String topic;  // required
    private final Integer partition;
    private final Headers headers;  // can be used to add context like trace id
    private final K key;
    private final V value;  // required
    private final Long timestamp;  // System.currentTimeMillis() if not specified

    // ...
}

The `key` and `value` components are serialized and sent over the network. If the `partition` is not specified, the partitioner selects a partition using the `key`.

Once the message is successfully written, we get back a `RecordMetadata` containing:

In [None]:
public final class RecordMetadata {
    private final long offset;
    private final long timestamp;  // record timestamp, or broker append time if the topic uses LogAppendTime

    private final TopicPartition topicPartition;
    // ...
}

### Serializer
A serializer is the producer-side code responsible for converting Java objects (or any other language-specific data structure) into a byte array (`byte[]`). This conversion is essential because Kafka is a byte-oriented system: it stores and transmits all data, both `key` and `value`, as raw arrays of bytes.

The Kafka client provides several serializers, all implementing the `Serializer<T>` interface:

In [None]:
public interface Serializer<T> extends Closeable {
    byte[] serialize(String topic, T data);
    // ...
}

Some implementations are:
- `StringSerializer`: used to serialize string keys and values. It converts a string to `byte[]` using the configured encoding (UTF-8 by default).
- `IntegerSerializer`: used to serialize integer keys and values as 4 bytes.
- `JsonPOJOSerializer`: used to serialize Java POJO keys and values as JSON `byte[]`. It uses a library like Jackson.
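
To make the object-to-bytes conversion concrete, here is a minimal stand-alone sketch of the byte layouts that the string and integer serializers are understood to produce. The class name `SerializerSketch` and its methods are illustrative assumptions, not part of the Kafka client; the real implementations live in `org.apache.kafka.common.serialization`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics the byte layout of StringSerializer and
// IntegerSerializer without depending on the kafka-clients library.
final class SerializerSketch {

    // A string becomes its encoded bytes (UTF-8 assumed here); null stays null.
    static byte[] serializeString(String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    // An integer becomes 4 bytes, most significant byte first (big-endian).
    static byte[] serializeInt(Integer data) {
        return data == null ? null : ByteBuffer.allocate(4).putInt(data).array();
    }

    public static void main(String[] args) {
        System.out.println(serializeString("France").length); // number of UTF-8 bytes
        System.out.println(serializeInt(258).length);         // always 4 for an int
    }
}
```

Whatever the serializer, the broker only ever sees the resulting `byte[]`; the consumer must apply the matching deserializer to recover the object.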

### Partitioner
The partitioner is responsible for determining which partition of a topic a specific `ProducerRecord` should be sent to. Partitioners distribute messages across partitions, acting as a load balancer. They also preserve message ordering where required, by sending messages with the same key to the same partition.

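The partitioner works as follows:
1. If the `ProducerRecord` explicitly specifies a `partition` number, the partitioner simply uses this value.
2. If the `partition` is not specified, but a `key` is present in the `ProducerRecord`, the partitioner hashes the serialized key (murmur2 in the default partitioner) and takes the result modulo the number of partitions. Records with the same key therefore map to the same partition as long as the partition count does not change.
3. If both the `partition` and the `key` are `null`, the partitioner's goal shifts to distributing the load as evenly and efficiently as possible. Older clients spread these messages across partitions round-robin; since Kafka 2.4 the default partitioner instead sticks to one partition until the current batch is complete and then moves to another (sticky partitioning).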

Thus, **it is the producer's responsibility to decide which partition to send data to**.
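
The three-step decision above can be sketched in a few lines. This is a simplified illustration, not the client's actual partitioner: the class `PartitionChooserSketch` is hypothetical, `Arrays.hashCode` stands in for the real key hash, and keyless records are spread with a plain round-robin counter, the simplest even-distribution strategy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the partition-selection order described above.
final class PartitionChooserSketch {
    private final AtomicInteger counter = new AtomicInteger();

    int choose(Integer partition, String key, int numPartitions) {
        // 1. An explicit partition always wins.
        if (partition != null) {
            return partition;
        }
        // 2. Otherwise hash the key bytes; the same key yields the same partition.
        //    Assumption: Arrays.hashCode is a stand-in for the real hash function.
        if (key != null) {
            int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
            return (hash & 0x7fffffff) % numPartitions; // mask keeps the value non-negative
        }
        // 3. No partition and no key: rotate through the partitions.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }
}
```

Because step 2 depends on `numPartitions`, adding partitions to a topic changes where existing keys land, which is worth keeping in mind when ordering per key matters.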

## Constructing Producer
The producer object provides thread-safe publishing of messages to Kafka. To instantiate a producer, we need the following mandatory properties:

In [None]:
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");  // list of host:port pairs of brokers
kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

The full list of producer configuration properties is available [here](https://kafka.apache.org/documentation.html#producerconfigs). Some important ones are listed below:

| Property                  | Description |
|---------------------------|-------------|
| acks                      | `acks=0`: the producer does not wait for a reply from the broker before assuming the message was sent successfully. High throughput, but the producer will not know about any failures. `acks=1`: the producer receives a success response from the broker as soon as the leader replica has received the message. `acks=all`: the producer receives a success response once all in-sync replicas have received the message. Safest, but highest latency. |
| buffer.memory             | sets the amount of memory the producer will use to buffer messages waiting to be sent to brokers. If messages are sent by the application faster than they can be delivered to the server, the producer may run out of space; additional `send()` calls then block for up to `max.block.ms` before throwing an exception (older clients controlled this with `block.on.buffer.full`). |
| compression.type          | can be set to snappy, gzip, lz4, or zstd. Messages are not compressed by default. |
| retries                   | how many times the producer will retry sending the message before giving up and notifying the client of an issue. The producer waits `retry.backoff.ms` before each retry. |
| client.id                 | used by the broker to identify the client, for example in logs and metrics. |
| timeout.ms                | legacy setting that controls how long the broker will wait for in-sync replicas to acknowledge the message. Newer clients use `request.timeout.ms` instead. |
| request.timeout.ms        | how long the producer will wait for a reply from the server when sending data. |
| metadata.fetch.timeout.ms | legacy setting for how long the producer will wait for a reply when requesting metadata, such as the current partition leaders. Newer clients use `max.block.ms` instead. |
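
As a rough illustration of how several of these properties might be combined, the sketch below builds a `Properties` object for a producer that favours durability over latency. The class name, the `client.id` value, and the specific numbers are assumptions chosen for the example, not recommendations.

```java
import java.util.Properties;

// Hypothetical helper: collects the mandatory properties plus a few optional ones.
final class ProducerConfigSketch {

    static Properties reliableProducerProps() {
        Properties props = new Properties();
        // Mandatory properties, as in the constructor example.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Optional tuning; the values below are illustrative assumptions.
        props.put("acks", "all");                  // wait for all in-sync replicas
        props.put("retries", "3");                 // retry transient failures three times
        props.put("retry.backoff.ms", "100");      // pause between retries
        props.put("compression.type", "snappy");   // compress batches before sending
        props.put("buffer.memory", "33554432");    // 32 MB send buffer
        props.put("client.id", "customer-country-producer"); // made-up identifier
        return props;
    }
}
```

The returned object can be passed to `new KafkaProducer<>(...)` in the same way as `kafkaProps` above.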

## Sending Message
There are multiple ways to send messages.

**Fire and forget:**

In [None]:
ProducerRecord<String, String> record = 
    new ProducerRecord<>("CustomerCountry", "Precision Products", "France");
    // <Topic>,<Key>,<Value>

try {
    producer.send(record);  // Send returns Future<RecordMetadata>
} catch(Exception e) {
    logger.error(e);
}

The `send()` method can throw an exception synchronously if the error is detected on the client side before the data is even placed on the network. Examples:
- Serialization error: the key or value cannot be converted into bytes using the given serializer. Throws `SerializationException`.
- Buffer exhaustion: the producer's internal buffer is full, and the `max.block.ms` timeout has been reached while waiting for space. Throws `BufferExhaustedException`.
- Invalid topic/partition: the topic name is invalid, or the partition specified is out of range.
- Thread interruption: the sending thread was interrupted while blocked. Throws `InterruptException`.

Once the message is placed in the producer's internal buffer, the `producer.send(record)` call returns immediately; the actual network send happens later on a background thread. The call therefore succeeds even if the broker later fails to write the message, so with this pattern one cannot be certain that the message was written to the topic.

**Synchronous Send:**

In [None]:
try {
    producer.send(record).get();
} catch(Exception e) {
    logger.error(e);
}

Using this pattern we can be sure that the message was written: `get()` blocks until the broker responds. If the write failed, `get()` throws an `ExecutionException` that wraps the underlying error.
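
The way a failed send surfaces through `get()` can be shown without a broker. In the sketch below a `CompletableFuture` stands in for the `Future<RecordMetadata>` returned by `send()`; the class `SyncSendSketch`, its methods, and the `Long` result type are assumptions made for illustration only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical stand-in for the blocking send pattern; no Kafka classes involved.
final class SyncSendSketch {

    // Simulates a send whose broker-side write failed with the given cause.
    static Future<Long> failedSend(Exception cause) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        future.completeExceptionally(cause);
        return future;
    }

    // Blocks on the future; the real error is available through getCause().
    static String awaitAndDescribe(Future<Long> future) {
        try {
            future.get();
            return "ok";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return "interrupted";
        }
    }
}
```

Inspecting `getCause()` rather than the `ExecutionException` itself is what lets the caller distinguish, say, an oversized message from a connection problem.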

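What if `acks` is set to 0? The broker sends no response, so `get()` returns as soon as the request is sent and the pattern is equivalent to fire and forget. Note that Kafka will not allow consumers to read records until they are written to all in-sync replicas. So `acks` mainly changes producer-side latency and durability guarantees; the end-to-end delay until a consumer sees the record is the same for any `acks` value.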

Some errors are retriable, whereas others are not. The Kafka producer automatically retries on retriable exceptions, such as a connection error or a "no leader" error. A non-retriable error would be something like the message size being too large.
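
The distinction between the two kinds of error can be illustrated with a small retry loop. This is only a sketch of the idea, not how the producer is implemented internally: `RetrySketch`, `RetriableError`, and `sendWithRetries` are hypothetical names, and the fixed backoff mirrors the role of `retry.backoff.ms`.

```java
import java.util.function.Supplier;

// Hypothetical sketch: retry transient failures, fail fast on everything else.
final class RetrySketch {

    // Marker for transient failures such as a connection error or "no leader".
    static class RetriableError extends RuntimeException {
        RetriableError(String message) { super(message); }
    }

    static <T> T sendWithRetries(Supplier<T> attempt, int retries, long backoffMs) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        RetriableError last = null;
        for (int i = 0; i <= retries; i++) {
            try {
                return attempt.get();
            } catch (RetriableError e) {
                last = e;                       // transient: remember and try again
                if (i < retries) {
                    sleep(backoffMs);
                }
            }
            // Any other exception is treated as non-retriable and propagates at once.
        }
        throw last;                             // retries exhausted
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }
}
```

With `retries` set to 3, an operation that fails twice with a `RetriableError` and then succeeds returns normally, while one that throws any other exception is attempted exactly once.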

**Asynchronous Send:**

In [None]:
producer.send(record, (m, e) -> {
    if(e != null){
        logger.error(e);
    }
});

// The second argument is an org.apache.kafka.clients.producer.Callback,
// whose single method is public void onCompletion(RecordMetadata recordMetadata, Exception e)

The callbacks execute in the producer's I/O thread. This ensures that when we send two messages to the same partition one after another, their callbacks will be executed in the same order that we sent them. It also means a callback should be fast, because a slow callback delays the sending of other messages.

The Kafka producer is designed to be highly efficient by always sending data directly to the *leader broker* for the specific partition it wants to write to. The producer client finds the correct leader broker through a process of metadata discovery and caching.

## Batching
The Kafka producer batches outgoing messages to improve throughput. Instead of sending every individual record immediately in its own network request, the producer accumulates multiple records into a single logical unit called a batch before transmitting them to the Kafka broker.

This brings the following improvements:
- fewer network calls
- fewer disk writes made by the broker
- compression works better

Internally, when the `send` method is called, the `ProducerRecord` is first kept in an in-memory buffer. Separate buffers are maintained for every unique topic-partition combination. A batch is marked as closed and subsequently sent to the broker when:
- the buffer size reaches `batch.size`, which is 16 KB by default
  <div style="display: inline-block">
      
  | Buffer Queue | Topic      | Partition ID | batch.size Limit |
  |--------------|------------|--------------|------------------|
  | Buffer A     | UserEvents | 0            | 16 KB            |
  | Buffer B     | UserEvents | 1            | 16 KB            |
  | Buffer C     | AuditLogs  | 0            | 16 KB            |
  | Buffer D     | AuditLogs  | 2            | 16 KB            |
  
- the producer has waited for `linger.ms` (5 ms by default since Kafka 4.0, 0 ms in earlier versions), even if the buffer is not full
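
The two closing conditions can be simulated with a toy accumulator. This is a deliberately simplified sketch and not the producer's real buffering code: `BatchAccumulatorSketch` is a hypothetical class, time is passed in explicitly instead of read from a clock, and a batch is closed as soon as its size reaches the limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one open batch per topic-partition, closed on size or linger.
final class BatchAccumulatorSketch {

    private static final class Batch {
        final long createdMs;
        int bytes;
        int records;
        Batch(long createdMs) { this.createdMs = createdMs; }
    }

    private final int batchSizeBytes;   // plays the role of batch.size
    private final long lingerMs;        // plays the role of linger.ms
    private final Map<String, Batch> open = new HashMap<>();
    final List<String> sent = new ArrayList<>(); // "topic-partition:recordCount" per closed batch

    BatchAccumulatorSketch(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    // Adds a record to its topic-partition buffer; closes the batch when it is full.
    void append(String topic, int partition, byte[] value, long nowMs) {
        String tp = topic + "-" + partition;
        Batch batch = open.computeIfAbsent(tp, k -> new Batch(nowMs));
        batch.bytes += value.length;
        batch.records++;
        if (batch.bytes >= batchSizeBytes) {
            close(tp);
        }
    }

    // Closes every batch that has waited at least lingerMs, full or not.
    void poll(long nowMs) {
        for (String tp : new ArrayList<>(open.keySet())) {
            if (nowMs - open.get(tp).createdMs >= lingerMs) {
                close(tp);
            }
        }
    }

    private void close(String tp) {
        Batch batch = open.remove(tp);
        sent.add(tp + ":" + batch.records);
    }
}
```

A larger `batchSizeBytes` or `lingerMs` lets more records share one request, trading a little latency for throughput, which is the same trade-off the real settings control.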