# Kafka Message Format

### 📦 Kafka Message Format – Explained in Detail

A **Kafka message** (also called a **record**) is the unit of data that a **producer** writes to a **topic** and a **consumer** reads back. Kafka stores and transports keys and values as raw **bytes**; each message also carries metadata such as its offset, timestamp, and headers.



## 🧱 Kafka Message Structure

Each Kafka message contains the following parts:

| Field         | Description                                                                     |
| ------------- | ------------------------------------------------------------------------------- |
| **Key**       | (Optional) Used for message routing to partitions.                              |
| **Value**     | The actual data (e.g., JSON, text, binary). This is what you typically process. |
| **Offset**    | Unique ID of the message **within a partition**.                                |
| **Partition** | Partition number where the message is stored.                                   |
| **Timestamp** | When the message was created (producer) or appended to the log (broker).        |
| **Headers**   | (Optional) Key-value pairs for metadata (like HTTP headers).                    |



### 📌 Example (JSON Message)

Here’s a simple message as seen by a **consumer** (after deserialization):

```json
{
  "sensor_id": "sensor_1",
  "temperature": 29.3,
  "timestamp": "2025-06-02T20:12:11"
}
```

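Conceptually, Kafka stores it together with its metadata. The key and value are held as bytes and the timestamp as milliseconds since the Unix epoch; they are shown decoded here: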

```text
Partition: 0
Offset: 12
Key: "sensor_1"
Value: {"sensor_id":"sensor_1","temperature":29.3,"timestamp":"2025-06-02T20:12:11"}
Timestamp: 2025-06-02 20:12:11
```



## 🔑 Field Breakdown

### 1. **Key (Optional)**

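* Helps Kafka **route** messages to specific **partitions**.
* With the default partitioner, messages with the same key go to the **same partition**, as long as the topic's partition count does not change.
* Messages without a key are spread across partitions.
* Useful in scenarios like sending all messages of a specific user/device to one partition, which also keeps them in order.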
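
The routing rule can be sketched in a few lines. This is an illustration only: Kafka's default partitioner hashes the key bytes with murmur2, while this sketch swaps in `zlib.crc32` from the standard library. `pick_partition` and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the illustration


def pick_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition: hash the key bytes, then take the modulo.

    zlib.crc32 is a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key) % num_partitions


# The same key maps to the same partition while the count is unchanged.
first = pick_partition(b"sensor_1")
second = pick_partition(b"sensor_1")
print(first == second)  # True: the mapping is deterministic
```

Because the result depends on the modulo, adding partitions to a topic can send an existing key to a different partition.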

### 2. **Value**

* The main content (your actual data).
* Can be in any format: text, JSON, XML, Avro, binary.
* Must be **serialized** into bytes before sending.
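
A minimal sketch of that serialization round trip, using JSON and UTF-8 as one possible format. The `reading` dict is made-up sample data.

```python
import json

reading = {"sensor_id": "sensor_1", "temperature": 29.3}

# Producer side: Python dict -> JSON text -> UTF-8 bytes
payload = json.dumps(reading).encode("utf-8")

# Consumer side: bytes -> JSON text -> Python dict
decoded = json.loads(payload.decode("utf-8"))

print(type(payload).__name__)  # bytes
print(decoded == reading)      # True
```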

### 3. **Offset**

* A unique identifier **per partition** (not global).
* Consumers use this to keep track of where they left off.

### 4. **Partition**

* Kafka divides a topic into partitions.
* Partitions allow **parallelism** and **scaling**.

### 5. **Timestamp**

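* Every message carries a timestamp, stored as milliseconds since the Unix epoch.
* It is either **producer-generated** (create time) or **broker-generated** (log append time), depending on the topic's `message.timestamp.type` setting.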
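
For a producer-generated timestamp, the value is an integer number of milliseconds since the Unix epoch. The sketch below converts the example reading's time, assuming it was recorded in UTC (the source does not say). In kafka-python such a value can be passed to `send()` through its `timestamp_ms` argument.

```python
from datetime import datetime, timezone

# Assumption: the example reading was taken in UTC.
event_time = datetime(2025, 6, 2, 20, 12, 11, tzinfo=timezone.utc)

# Kafka record timestamps are milliseconds since the Unix epoch.
timestamp_ms = int(event_time.timestamp() * 1000)

# Converting back recovers the original datetime.
restored = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
print(timestamp_ms, restored.isoformat())
```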

### 6. **Headers** (Optional)

* Metadata for the message.
* Format: a list of key-value pairs where keys are strings and values are bytes, e.g. `[("source", b"sensorA"), ("location", b"room1")]`
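
kafka-python expects headers as a list of `(str, bytes)` tuples rather than a dict. A small helper can do the conversion; `to_kafka_headers` is a hypothetical name, not part of any library.

```python
def to_kafka_headers(metadata: dict) -> list:
    """Convert a dict of metadata into a list of (str, bytes) header tuples."""
    return [(name, str(value).encode("utf-8")) for name, value in metadata.items()]


headers = to_kafka_headers({"source": "sensorA", "location": "room1"})
print(headers)  # [('source', b'sensorA'), ('location', b'room1')]
```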



## 📌 Real Example (Python Kafka Producer)

```python
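import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')  # assumed local broker
data = {"sensor_id": "sensor_1", "temperature": 29.3}  # sample reading
producer.send(
    topic='temperature-readings',
    key=b'sensor_1',  # key as bytes
    value=json.dumps(data).encode('utf-8'),  # value as serialized JSON
    headers=[('source', b'raspberry-pi')]  # header values must be bytes
)
producer.flush()  # block until buffered messages are sent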
```



## 🧠 Summary

| Field     | Purpose                             | Required? |
| --------- | ----------------------------------- | --------- |
| Key       | Routing to partition                | No        |
| Value     | Actual message content              | Yes (may be null) |
| Offset    | Unique ID in a partition (auto-set) | Auto      |
| Partition | Data sharding and parallelism       | Auto/Set  |
| Timestamp | Create time or log append time      | Auto      |
| Headers   | Extra metadata                      | No        |



# Offsets and Commit Mechanism

### 🧭 Kafka Offsets and Commit Mechanism – Explained Clearly



## 🔢 What is an **Offset** in Kafka?

An **offset** is a **unique identifier for each message** in a Kafka **partition**.

* It's a **sequential number**: `0, 1, 2, 3...`
* It **identifies the position** of a message within a **partition**, not across the entire topic.

🧠 Think of it as a **line number** in a file.
If a consumer reads offset `5`, it has processed the **6th message** in that partition.



## 🧪 Example:

| Partition | Message (Value) | Offset |
| --------- | --------------- | ------ |
| 0         | {"temp": 25}    | 0      |
| 0         | {"temp": 26}    | 1      |
| 0         | {"temp": 27}    | 2      |

If your consumer reads up to offset `1`, it has read the first two messages.



## 🔁 What is Offset Commit?

Kafka **does not delete messages when they are consumed**; they stay in the log until the topic's retention policy removes them.
Instead, each consumer group **keeps track of its own position** using the **commit mechanism**.

### ✅ Committing an Offset:

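* It means “📌 I've successfully processed everything up to here. Remember this position.”
* The committed value is the offset of the **next** message to read (last processed offset + 1).
* On restart, the consumer resumes **from that offset** (not re-reading old data).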
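
A broker-free sketch of commit and resume, with a Python list standing in for one partition. `consume` is a hypothetical helper; the committed value it returns is the position of the next message to read.

```python
# One partition modelled as a list; the list index plays the role of the offset.
partition = ['{"temp": 25}', '{"temp": 26}', '{"temp": 27}', '{"temp": 28}']


def consume(log, committed, max_messages):
    """Process up to max_messages starting at the committed position.

    Returns the processed values and the new position to commit,
    which is the offset of the next message to read.
    """
    end = min(committed + max_messages, len(log))
    processed = log[committed:end]
    return processed, end


# First run: nothing committed yet, so start at offset 0 and handle two messages.
first_run, committed = consume(partition, committed=0, max_messages=2)

# "Restart": a new run picks up at the committed position, not at offset 0.
second_run, committed = consume(partition, committed=committed, max_messages=2)

print(first_run)   # ['{"temp": 25}', '{"temp": 26}']
print(second_run)  # ['{"temp": 27}', '{"temp": 28}']
print(committed)   # 4
```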



## 🔧 Types of Offset Management

| Type                           | Description                                                                      |
| ------------------------------ | -------------------------------------------------------------------------------- |
| **Automatic Commit** (default) | The consumer client **periodically commits** offsets in the background (every 5 s by default). |
| **Manual Commit**              | You choose **when to commit**, typically after successful processing.            |



## 📦 Where Are Offsets Stored?

Kafka stores committed offsets in an internal topic:

```
__consumer_offsets
```

Each consumer group maintains its own offset per partition.
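
One way to picture this bookkeeping is a lookup keyed by group, topic, and partition. The dict below is only a mental model of `__consumer_offsets`, not its real storage format, and the group and topic names are made up.

```python
# Hypothetical in-memory stand-in for the __consumer_offsets topic.
# Key: (group id, topic, partition) -> value: next offset to read.
committed_offsets = {}


def commit(group, topic, partition, next_offset):
    committed_offsets[(group, topic, partition)] = next_offset


def position(group, topic, partition):
    # A group with no commit yet has no stored position.
    return committed_offsets.get((group, topic, partition))


commit("billing", "orders", 0, 42)
commit("analytics", "orders", 0, 7)

print(position("billing", "orders", 0))    # 42
print(position("analytics", "orders", 0))  # 7
print(position("billing", "orders", 1))    # None
```

Two groups reading the same partition never affect each other's position, which is why several applications can consume one topic independently.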



## 🛠️ Python Example – Auto vs Manual

### ✅ Auto Commit (default)

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'topic-name',
    group_id='my-group',
    bootstrap_servers='localhost:9092',
    enable_auto_commit=True,             # Auto commit is ON (the default)
    auto_commit_interval_ms=5000         # Commit every 5 seconds (the default)
)
```

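⚠️ Risk: if the app crashes between two auto-commits, messages processed since the last commit **are reprocessed** after restart. Conversely, if an offset is committed before its message has finished processing (for example when work is handed to another thread), a crash can **skip** that message.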
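
The reprocessing risk can be seen in a small broker-free simulation. As a simplification, the sketch commits after a fixed number of messages instead of a time interval; `run_until_crash` and its parameters are hypothetical.

```python
def run_until_crash(log, start, commit_every, crash_at):
    """Process messages from `start`, committing after every `commit_every`
    messages, and stop abruptly once offset `crash_at` has been processed.

    Returns (processed offsets, last committed position).
    """
    committed = start
    processed = []
    for offset in range(start, len(log)):
        processed.append(offset)
        if (offset + 1 - start) % commit_every == 0:
            committed = offset + 1   # position of the next message to read
        if offset == crash_at:
            break                    # simulated crash: no final commit
    return processed, committed


log = [f"msg-{i}" for i in range(10)]

# Commit after every 4 messages; crash right after processing offset 5.
processed, committed = run_until_crash(log, start=0, commit_every=4, crash_at=5)

# After a restart the consumer resumes at `committed`, so these offsets repeat.
reprocessed = [o for o in processed if o >= committed]

print(committed)    # 4
print(reprocessed)  # [4, 5]
```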



### ✅ Manual Commit

```python
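from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'topic-name',
    group_id='my-group',
    bootstrap_servers='localhost:9092',
    enable_auto_commit=False             # Turn off auto-commit
)

for message in consumer:
    print(message.value)  # stand-in for real processing
    # Only commit after successful processing.
    # commit() blocks until the broker confirms, so committing per message is slow.
    consumer.commit()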

This ensures **no message is marked as “done”** until your code explicitly says so.



## 🧠 Real-World Tip

| Scenario                 | Use...                              |
| ------------------------ | ----------------------------------- |
| Simple logging           | Auto commit                         |
| Critical data processing | Manual commit                       |
| Long batch jobs          | Manual commit                       |
| Exactly-once semantics   | Manual + transaction API (advanced) |



# Sync vs Async Producer

### 🚀 Kafka Sync vs Async Producer – Explained Clearly

When using a Kafka **Producer** in Python (or any language), you can send messages in two ways:



## 🔁 1. **Synchronous Producer**

### ✅ What is it?

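* The producer sends a message and **waits** for Kafka to **acknowledge** it before continuing.
* It's **blocking**: the next line runs only after Kafka confirms success/failure.
* In kafka-python, `send()` itself always returns a future right away; calling `.get()` on that future is what makes the send synchronous.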

### 📦 Example (Python):

```python
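from kafka import KafkaProducer
from kafka.errors import KafkaError
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

future = producer.send('test-topic', {"name": "sync message"})
try:
    result = future.get(timeout=10)  # blocks until acknowledged or timed out
    print("Message sent:", result.topic, result.partition, result.offset)
except KafkaError as exc:
    # Raised on delivery failure or when the timeout expires
    print("Failed to send message:", exc)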
```

### ✅ Pros:

* Delivery failures surface immediately, right where you send
* Easy to debug

### ❌ Cons:

* Slower (waits after each message)
* Not suitable for high-throughput systems



## ⚡ 2. **Asynchronous Producer**

### ✅ What is it?

* The producer sends messages and **doesn't wait** for the response.
* Uses **callback functions** to handle success or failure.
* It's **non-blocking**.

### 📦 Example (Python):

```python
from kafka import KafkaProducer
import json

def on_success(record_metadata):
    print("Message sent to:", record_metadata.topic, record_metadata.partition)

def on_error(excp):
    print("Failed to send message:", excp)

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

producer.send('test-topic', {"name": "async message"}).add_callback(on_success).add_errback(on_error)

# Block until all buffered messages are sent; without this,
# messages still in the buffer can be lost when the script exits
producer.flush()
```

### ✅ Pros:

* High performance
* Ideal for streaming large volumes of data

### ❌ Cons:

* More complex (callbacks)
* Need careful error handling
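
The callback pattern can be tried without a broker. The sketch below uses the standard library's `concurrent.futures` as a stand-in for the producer; `fake_send` and `on_done` are hypothetical, and stdlib futures use `add_done_callback` rather than kafka-python's `add_callback`/`add_errback` pair.

```python
from concurrent.futures import ThreadPoolExecutor

delivered, failed = [], []


def fake_send(topic, value):
    """Hypothetical stand-in for a broker round trip."""
    if value is None:
        raise ValueError("nothing to send")
    return {"topic": topic, "value": value}


def on_done(future):
    # Runs when the send finishes; route the outcome to the right list.
    error = future.exception()
    if error is None:
        delivered.append(future.result())
    else:
        failed.append(error)


executor = ThreadPoolExecutor(max_workers=1)
for value in ["a", None, "b"]:
    # submit() returns immediately; the caller never waits for one send.
    executor.submit(fake_send, "test-topic", value).add_done_callback(on_done)

# Comparable in spirit to flush(): wait for everything still in flight.
executor.shutdown(wait=True)

print(len(delivered), len(failed))  # 2 1
```

Note that the failed send does not raise in the sending loop. The error only shows up in the callback, which is why async producers need deliberate error handling.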



## ⚖️ Summary: Sync vs Async

| Feature        | Synchronous                   | Asynchronous                       |
| -------------- | ----------------------------- | ---------------------------------- |
| Blocking       | Yes                           | No                                 |
| Speed          | Slower                        | Faster                             |
| Error Handling | Straightforward (try/except)  | Via callbacks (success/error)      |
| Use Case       | Simpler, low-throughput tasks | High-throughput, real-time systems |



# Auto Commit vs Manual Commit (Consumer)

### 🔄 Kafka Auto Commit vs Manual Commit in Consumer – Explained Clearly

Kafka consumers **track what messages they've read** using **offsets**.
How and **when** these offsets are saved ("committed") to Kafka is controlled by:

> ✅ **Auto Commit** or 🔧 **Manual Commit**



## 🧭 What is "Commit" in Kafka?

Committing an offset means:

> 📌 “Hey Kafka, I have processed this message. If I restart, I’ll start from the next one.”



## ✅ Auto Commit

### 📋 Behavior:

* The consumer client **automatically commits offsets** in the background at a fixed interval (`auto_commit_interval_ms`, 5 seconds by default).

### 🔧 Config:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my-topic',
    group_id='my-group',
    bootstrap_servers='localhost:9092',
    enable_auto_commit=True,         # Turned ON (default)
    auto_commit_interval_ms=5000     # Commit every 5 seconds
)
```

### ✅ Pros:

* Easy to use
* Less code

### ❌ Cons:

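* Not always reliable: if the consumer crashes between commits, messages handled since the last commit are reprocessed.
* Offsets can be committed before processing has finished, so a crash may also skip messages.
* You **can’t guarantee exactly-once** processing.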



## 🔧 Manual Commit

### 📋 Behavior:

* You control **when** to commit the offset, typically after successful processing.

### 🔧 Config:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my-topic',
    group_id='my-group',
    bootstrap_servers='localhost:9092',
    enable_auto_commit=False
)

for message in consumer:
    # Process message
    print(message.value)

    # Commit after processing (blocks until the broker confirms)
    consumer.commit()

### ✅ Pros:

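* More control
* Reduces duplicate processing, since the commit can follow each successfully processed message or batch
* Gives **at-least-once** delivery when you commit after processing; **exactly-once** additionally needs idempotent processing or Kafka transactions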

### ❌ Cons:

* Slightly more complex
* You must remember to commit manually
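
Where you place the commit matters as much as doing it manually. The simulation below contrasts committing before and after processing when a crash hits mid-message; `run` and its arguments are hypothetical, and no broker is involved.

```python
def run(log, commit_first, crash_on):
    """Consume `log` from offset 0 and crash while handling offset `crash_on`.

    commit_first=True  -> commit, then process (at-most-once)
    commit_first=False -> process, then commit (at-least-once)
    Returns (offsets fully processed, committed position at crash time).
    """
    committed, done = 0, []
    for offset in range(len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_on:
                break              # crashed before processing finished
            done.append(offset)
        else:
            done.append(offset)
            if offset == crash_on:
                break              # crashed before the commit was sent
            committed = offset + 1
    return done, committed


log = ["a", "b", "c"]

done, committed = run(log, commit_first=True, crash_on=1)
print(done, committed)   # [0] 2 -> offset 1 is skipped after restart (lost)

done, committed = run(log, commit_first=False, crash_on=1)
print(done, committed)   # [0, 1] 1 -> offset 1 is processed again (duplicate)
```

Committing after processing trades possible duplicates for no loss, which is usually the safer default for critical data.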



## ⚖️ Comparison Table

| Feature              | Auto Commit                  | Manual Commit                   |
| -------------------- | ---------------------------- | ------------------------------- |
| Offset commit timing | Periodic, automatic          | Explicit, controlled by dev     |
| Risk of duplication  | Yes (if crash before commit) | Lower (still possible if a crash hits between processing and commit) |
| Setup effort         | Minimal                      | More coding required            |
| Best for             | Simple/logging consumers     | Critical, transactional logic   |



## 🧠 Real-world Rule of Thumb

| Use Case                        | Commit Type   |
| ------------------------------- | ------------- |
| Logging, analytics, dashboards  | Auto Commit   |
| Payments, Orders, Notifications | Manual Commit |
| ETL pipelines                   | Manual Commit |

