# Module 01: Introduction to Event Streaming

**Estimated Time:** 60 minutes

## Learning Objectives

By the end of this module, you will:
- Understand event-driven architecture principles
- Learn Apache Kafka architecture and core concepts
- Work with brokers, topics, and partitions
- Master producer and consumer patterns
- Understand consumer groups and parallel processing
- Build a multi-producer, multi-consumer system

---

## 1. Event-Driven Architecture (EDA)

### What is an Event?

An **event** is a record of something that happened in your system:
- **State change**: User updated profile, order was shipped
- **Action**: Button clicked, file uploaded, payment processed
- **Measurement**: Temperature recorded, CPU usage captured

### Anatomy of an Event

```json
{
  "event_id": "evt_123456",
  "event_type": "user.profile.updated",
  "timestamp": "2024-01-15T10:30:00Z",
  "user_id": "user_789",
  "data": {
    "field": "email",
    "old_value": "old@example.com",
    "new_value": "new@example.com"
  },
  "metadata": {
    "source": "web-app",
    "version": "1.0"
  }
}
```
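
As a rough sketch, an event with this shape could be built in Python as follows. The helper name `make_event` and its default arguments are hypothetical, chosen only for illustration; the field names follow the JSON example above.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(event_type, user_id, data, source="web-app", version="1.0"):
    """Build an event dict shaped like the JSON example above (hypothetical helper)."""
    return {
        "event_id": f"evt_{uuid.uuid4().hex}",  # unique per event
        "event_type": event_type,
        # UTC timestamp in ISO 8601 format
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "data": data,
        "metadata": {"source": source, "version": version},
    }


event = make_event(
    "user.profile.updated",
    "user_789",
    {"field": "email", "old_value": "old@example.com", "new_value": "new@example.com"},
)

# Events travel as bytes, so serialize before handing them to a producer
payload = json.dumps(event).encode("utf-8")
print(json.dumps(event, indent=2))
```

Generating the ID and timestamp inside the helper keeps every event uniquely identifiable and consistently timestamped, which matters later for deduplication and auditing.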

### Event-Driven vs Request-Response

**Traditional Request-Response:**
```
Client → [Request] → Server → [Process] → Response → Client
         (waiting...)                     (blocked)

- Synchronous
- Tight coupling
- Client waits for response
```

**Event-Driven:**
```
Producer → [Event] → Event Stream → [Consumer 1]
                                  → [Consumer 2]
                                  → [Consumer 3]

- Asynchronous
- Loose coupling
- Producer does not wait for consumers
```

### Benefits of Event-Driven Architecture

1. **Decoupling**: Producers don't know about consumers
2. **Scalability**: Add consumers independently
3. **Flexibility**: New consumers can be added without changing producers
4. **Resilience**: If a consumer fails, events are still stored
5. **Auditability**: Complete event history is preserved
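
The benefits above can be illustrated with a toy in-memory stream. This is a simplified model, not how Kafka is implemented; the class `InMemoryEventStream` and the subscriber names are assumptions made for the example.

```python
from collections import defaultdict


class InMemoryEventStream:
    """Toy event stream: keeps every event plus a read position per subscriber."""

    def __init__(self):
        self.log = []                      # append-only event history
        self.positions = defaultdict(int)  # subscriber name -> next index to read

    def publish(self, event):
        # The producer only appends; it knows nothing about subscribers
        self.log.append(event)

    def read_new(self, subscriber):
        # Each subscriber advances its own position independently
        start = self.positions[subscriber]
        self.positions[subscriber] = len(self.log)
        return self.log[start:]


stream = InMemoryEventStream()
stream.publish({"event_type": "OrderPlaced", "order_id": 1})
stream.publish({"event_type": "OrderShipped", "order_id": 1})

billing_events = stream.read_new("billing")   # sees both events
stream.publish({"event_type": "OrderPlaced", "order_id": 2})
audit_events = stream.read_new("audit")       # added later, still sees full history
billing_more = stream.read_new("billing")     # only the event it has not read yet

print(len(billing_events), len(audit_events), len(billing_more))
```

The `audit` subscriber was added without touching the publishing code, and it could still replay everything because the log keeps the full history.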

### Real-World Use Cases

| Industry | Use Case | Events |
|----------|----------|--------|
| E-commerce | Order processing | OrderPlaced, PaymentProcessed, OrderShipped |
| Finance | Fraud detection | TransactionInitiated, AccountAccessed, LargeWithdrawal |
| IoT | Smart home | TemperatureChanged, MotionDetected, DoorOpened |
| Social Media | Activity feed | PostCreated, CommentAdded, UserFollowed |
| Gaming | Player actions | PlayerMoved, ItemPurchased, LevelCompleted |

---

## 2. Apache Kafka Architecture

### What is Kafka?

Apache Kafka is a **distributed event streaming platform** that:
- Publishes and subscribes to streams of events
- Stores events durably and reliably
- Processes events in real-time

### Kafka Architecture Components

```
┌─────────────────────────────────────────────────────────┐
│                    Kafka Cluster                        │
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │  Broker 1   │  │  Broker 2   │  │  Broker 3   │    │
│  │             │  │             │  │             │    │
│  │ Topic: user │  │ Topic: user │  │ Topic: user │    │
│  │ Partition 0 │  │ Partition 1 │  │ Partition 2 │    │
│  └─────────────┘  └─────────────┘  └─────────────┘    │
│                                                         │
│  ┌──────────────────────────────────────────────┐     │
│  │           Zookeeper (Coordination)           │     │
│  └──────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────┘
     ↑                                           ↓
 Producers                                  Consumers
```

### Core Components

1. **Broker**: A Kafka server that stores and serves events
2. **Topic**: A category/feed of events (like a table in a database)
3. **Partition**: Subdivision of a topic for parallel processing
4. **Producer**: Application that writes events to topics
5. **Consumer**: Application that reads events from topics
6. **ZooKeeper**: Manages cluster coordination in older deployments (replaced by KRaft, Kafka's built-in consensus layer; Kafka 4.0 removed ZooKeeper mode)

### Topics and Partitions

**Topic Structure:**
```
Topic: "user-events" (3 partitions)

Partition 0:  [evt1] → [evt4] → [evt7] → [evt10] →
Partition 1:  [evt2] → [evt5] → [evt8] → [evt11] →
Partition 2:  [evt3] → [evt6] → [evt9] → [evt12] →
              ↑offset 0  offset 1  offset 2
```

**Key Concepts:**
- Events in a partition are **ordered**
- Each event has an **offset** (position in partition)
- Partitions enable **parallelism**
- Events with the same **key** go to the same partition
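
To make these concepts concrete, here is a small in-memory model of a partitioned topic. It is a sketch only: `PartitionedTopic` is a hypothetical class, and the CRC32-modulo rule stands in for whatever hash a real client uses.

```python
import zlib


class PartitionedTopic:
    """Toy topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        # Illustrative only; real clients choose their own hash function.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1  # position within the partition
        return partition, offset


topic = PartitionedTopic(num_partitions=3)
placements = [topic.append(f"user_{i % 4}", f"event-{i}") for i in range(12)]

for i, (partition, offset) in enumerate(placements):
    print(f"user_{i % 4} -> partition {partition}, offset {offset}")
```

Offsets are counted per partition, so ordering is only meaningful among events that share a partition, which is why the choice of key matters.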

In [None]:
# Let's explore Kafka cluster metadata
from confluent_kafka.admin import AdminClient, NewTopic
import json

# Connect to Kafka
admin_client = AdminClient({"bootstrap.servers": "localhost:9092"})

# Get cluster metadata
metadata = admin_client.list_topics(timeout=5)

print("[OK] Kafka Cluster Information\n")
print(f"Cluster ID: {metadata.cluster_id}")
print(f"Controller Broker: {metadata.controller_id}")
print(f"\nBrokers in cluster:")

for broker_id, broker_metadata in metadata.brokers.items():
    print(f"  - Broker {broker_id}: {broker_metadata.host}:{broker_metadata.port}")

print(f"\nExisting Topics ({len(metadata.topics)}):")
for topic_name in list(metadata.topics.keys())[:10]:  # Show first 10
    topic = metadata.topics[topic_name]
    print(f"  - {topic_name}: {len(topic.partitions)} partitions")

---

## 3. Creating and Managing Topics

In [None]:
# Create a new topic with specific configuration
from confluent_kafka import KafkaException
from confluent_kafka.admin import AdminClient, NewTopic

TOPIC_NAME = "user-events"
NUM_PARTITIONS = 3
REPLICATION_FACTOR = 1  # For local dev; use 3 in production

# Define new topic
new_topic = NewTopic(
    topic=TOPIC_NAME,
    num_partitions=NUM_PARTITIONS,
    replication_factor=REPLICATION_FACTOR,
    config={
        "retention.ms": "604800000",  # 7 days in milliseconds
        "compression.type": "gzip",
        "max.message.bytes": "1048576",  # 1 MB
    },
)

# Create topic
admin_client = AdminClient({"bootstrap.servers": "localhost:9092"})

try:
    # Returns a dict of futures
    futures = admin_client.create_topics([new_topic])

    # Wait for operation to complete
    for topic, future in futures.items():
        try:
            future.result()  # Block until topic is created
            print(f"[OK] Topic '{topic}' created successfully")
            print(f"     Partitions: {NUM_PARTITIONS}")
            print(f"     Replication: {REPLICATION_FACTOR}")
        except KafkaException as e:
            if "TOPIC_ALREADY_EXISTS" in str(e):
                print(f"[WARNING] Topic '{topic}' already exists")
            else:
                print(f"[FAIL] Failed to create topic '{topic}': {e}")
except Exception as e:
    print(f"[FAIL] Error creating topic: {e}")

In [None]:
# Inspect topic configuration
from confluent_kafka.admin import ConfigResource

# Get topic configuration
resource = ConfigResource("topic", TOPIC_NAME)
futures = admin_client.describe_configs([resource])

for res, future in futures.items():
    try:
        configs = future.result()
        print(f"[DATA] Configuration for topic '{res.name}':\n")

        # Show key configurations
        important_configs = [
            "retention.ms",
            "compression.type",
            "max.message.bytes",
            "cleanup.policy",
            "segment.ms",
        ]

        for config_name in important_configs:
            if config_name in configs:
                config = configs[config_name]
                print(f"  {config_name}: {config.value}")
    except Exception as e:
        print(f"[FAIL] Could not get config: {e}")

---

## 4. Producers: Writing Events to Kafka

### Producer Basics

**How Producers Work:**
```
Producer → [Serialization] → [Partitioner] → [Buffer] → Broker
              (to bytes)       (which partition?)  (batch)
```

**Key Producer Concepts:**
1. **Serialization**: Convert data to bytes (JSON, Avro, Protobuf)
2. **Partitioning**: Decide which partition receives the event
3. **Batching**: Group events for efficiency
4. **Acknowledgments**: Confirm delivery (acks=0, 1, all)
5. **Idempotence**: Prevent duplicates (enable.idempotence=true)
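
The serialization and batching steps can be sketched in plain Python. In a real producer the client library batches per partition internally; the functions `serialize` and `batch_by_size` and the 120-byte limit below are assumptions made purely to show the idea.

```python
import json


def serialize(event):
    """Turn an event dict into bytes (compact JSON encoded as UTF-8)."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def batch_by_size(messages, max_batch_bytes):
    """Group serialized messages into batches no larger than max_batch_bytes.

    A single message larger than the limit still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for msg in messages:
        if current and current_size + len(msg) > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(msg)
        current_size += len(msg)
    if current:
        batches.append(current)
    return batches


events = [{"event_id": f"evt_{i}", "user_id": "user_1"} for i in range(10)]
messages = [serialize(e) for e in events]
batches = batch_by_size(messages, max_batch_bytes=120)

for n, batch in enumerate(batches):
    print(f"batch {n}: {len(batch)} messages, {sum(len(m) for m in batch)} bytes")
```

Sending a few larger batches instead of many single messages reduces network round trips, which is the efficiency gain batching refers to.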

In [None]:
# Simple Producer Example
from confluent_kafka import Producer
from datetime import datetime
import json
import random

# Producer configuration
producer_config = {
    "bootstrap.servers": "localhost:9092",
    "client.id": "user-events-producer",
    "acks": "all",  # Wait for all in-sync replicas to acknowledge
    "retries": 3,
    "enable.idempotence": True,  # Prevent duplicates
}

producer = Producer(producer_config)


# Delivery callback
def delivery_report(err, msg):
    """Callback for message delivery reports"""
    if err:
        print(f"[FAIL] Delivery failed: {err}")
    else:
        print(
            f"[OK] Message delivered to {msg.topic()} [partition {msg.partition()}] at offset {msg.offset()}"
        )


# Simulate user events
user_actions = ["login", "logout", "profile_update", "purchase", "view_page"]
user_ids = [f"user_{i}" for i in range(1, 6)]

print("Producing user events...\n")

for i in range(10):
    event = {
        "event_id": f"evt_{i}",
        "event_type": f"user.{random.choice(user_actions)}",
        "user_id": random.choice(user_ids),
        "timestamp": datetime.now().isoformat(),
        "data": {
            "session_id": f"session_{random.randint(1000, 9999)}",
            "ip_address": f"192.168.1.{random.randint(1, 255)}",
        },
    }

    # Produce with key (ensures same user goes to same partition)
    producer.produce(
        topic=TOPIC_NAME,
        key=event["user_id"],  # Partition by user_id
        value=json.dumps(event),
        callback=delivery_report,
    )

    # Trigger delivery callbacks (non-blocking)
    producer.poll(0)

# Wait for all messages to be delivered
print("\nFlushing remaining messages...")
producer.flush()
print("[SUCCESS] All events produced!")

### Understanding Partitioning

**Partitioning Strategies:**

1. **Key-based (default)**:
   ```python
   # Events with same key → same partition
   producer.produce(topic, key='user_123', value=event)
   # Ensures ordering for each user
   ```

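2. **No key**:
   ```python
   # Keyless events are spread across partitions by the client
   producer.produce(topic, value=event)
   # Good for throughput, but related events may land in different partitions
   ```
   The spread is not strictly round-robin: by default, librdkafka (the library behind `confluent_kafka`) assigns keyless messages to randomly chosen partitions.

3. **Explicit partition**:
   ```python
   # With confluent_kafka, the usual way to apply your own logic is to
   # compute the partition yourself and pass it to produce()
   import zlib

   def choose_partition(key: str, num_partitions: int) -> int:
       # zlib.crc32 is stable across runs; Python's built-in hash() is not for strings
       return zlib.crc32(key.encode("utf-8")) % num_partitions

   producer.produce(topic, key="user_123", value=event,
                    partition=choose_partition("user_123", 3))
   ```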

**Why Partitioning Matters:**
- Enables **parallel processing**
- Provides **ordering guarantees** within a partition
- Allows **consumer groups** to scale

In [None]:
# Demonstrate partition distribution
from collections import defaultdict

# Track which partition each user went to
user_partitions = defaultdict(set)


def partition_tracker(err, msg):
    """Track partition assignments"""
    if not err:
        user_id = msg.key().decode("utf-8")
        partition = msg.partition()
        user_partitions[user_id].add(partition)


# Produce events with partition tracking
producer = Producer(producer_config)

for i in range(30):
    user_id = random.choice(user_ids)
    event = {"event_id": f"evt_{i}", "user_id": user_id, "action": random.choice(user_actions)}

    producer.produce(
        topic=TOPIC_NAME, key=user_id, value=json.dumps(event), callback=partition_tracker
    )
    producer.poll(0)

producer.flush()

# Show partition distribution
print("[DATA] Partition Distribution by User:\n")
for user_id, partitions in sorted(user_partitions.items()):
    print(f"  {user_id}: Partition(s) {sorted(partitions)}")

print("\n[OK] Notice: Each user consistently goes to the same partition!")
print("     This ensures ordering for events from the same user.")

---

## 5. Consumers: Reading Events from Kafka

### Consumer Basics

**How Consumers Work:**
```
Broker → [Fetch] → Consumer → [Deserialize] → [Process] → [Commit Offset]
          (poll)              (bytes to data)             (mark as read)
```

**Key Consumer Concepts:**
1. **Polling**: Fetch events from Kafka
2. **Offset**: Position in partition (which events have been read)
3. **Offset Commit**: Save progress (auto or manual)
4. **Consumer Group**: Multiple consumers working together
5. **Rebalancing**: Reassign partitions when consumers join/leave
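
The deserialize step in the flow above is a common place for consumers to crash on unexpected input. A defensive decoder might look like the sketch below; `decode_message` is a hypothetical helper that takes the raw key and value bytes a consumer would receive.

```python
import json


def decode_message(key_bytes, value_bytes):
    """Return (key, event). The event is None if the value is missing or not valid JSON."""
    key = key_bytes.decode("utf-8") if key_bytes is not None else None
    if value_bytes is None:
        return key, None
    try:
        return key, json.loads(value_bytes.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Malformed payload: report it as None instead of raising
        return key, None


print(decode_message(b"user_1", b'{"event_type": "user.login"}'))
print(decode_message(None, b"not json"))
```

Returning `None` for bad payloads lets the polling loop decide what to do (skip, log, or park the message) without stopping the consumer.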

In [None]:
# Simple Consumer Example
from confluent_kafka import Consumer, KafkaException

# Consumer configuration
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "user-events-consumer-group",
    "auto.offset.reset": "earliest",  # Start from the beginning if the group has no committed offset
    "enable.auto.commit": True,
    "auto.commit.interval.ms": 5000,
}

consumer = Consumer(consumer_config)

# Subscribe to topic
consumer.subscribe([TOPIC_NAME])

print(f"[OK] Consumer subscribed to '{TOPIC_NAME}'")
print(f"     Consumer Group: {consumer_config['group.id']}\n")
print("Reading events (will stop after 15 messages)...\n")

messages_read = 0
max_messages = 15

try:
    while messages_read < max_messages:
        # Poll for messages
        msg = consumer.poll(timeout=2.0)

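        if msg is None:
            # Nothing arrived within the timeout (this can also happen while the group is still joining)
            print("[WARNING] No message received within 2 seconds, stopping")
            break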

        if msg.error():
            print(f"[FAIL] Consumer error: {msg.error()}")
            continue

        # Successfully received a message
        key = msg.key().decode("utf-8") if msg.key() else None
        value = json.loads(msg.value().decode("utf-8"))

        print(f"[{messages_read + 1}] Event: {value['event_type']}")
        print(f"    User: {value['user_id']}")
        print(f"    Partition: {msg.partition()}, Offset: {msg.offset()}")
        print()

        messages_read += 1

finally:
    consumer.close()
    print(f"[SUCCESS] Read {messages_read} events")

---

## 6. Consumer Groups and Parallel Processing

### What is a Consumer Group?

A **consumer group** is a set of consumers that cooperate to consume events from a topic.

**Single Consumer Group:**
```
Topic: user-events (3 partitions)

Partition 0  ────→  Consumer A  ┐
Partition 1  ────→  Consumer B  ├─ Group: analytics
Partition 2  ────→  Consumer C  ┘

- Each partition assigned to ONE consumer
- Consumers share the workload
- Parallel processing
```

**Multiple Consumer Groups:**
```
Topic: user-events

Partition 0  ──┬──→  Consumer A (Group: analytics)
               └──→  Consumer X (Group: fraud-detection)

Partition 1  ──┬──→  Consumer B (Group: analytics)
               └──→  Consumer Y (Group: fraud-detection)

- Each group gets ALL events independently
- Different groups can process the same events
```

### Consumer Group Rules

1. **Each partition has exactly one consumer** within a group (one consumer may own several partitions)
2. **Multiple consumers per group**: useful up to the number of partitions; extra consumers sit idle
3. **Different groups are independent**
4. **Rebalancing happens** when consumers join/leave
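
These rules can be modeled with a simple round-robin assignment. This is a simplified illustration, not one of Kafka's actual assignment strategies; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

three = assign_partitions(partitions, ["A", "B", "C"])
two = assign_partitions(partitions, ["A", "B"])            # C left: rebalance
four = assign_partitions(partitions, ["A", "B", "C", "D"]) # more consumers than partitions

print(three)  # {'A': [0], 'B': [1], 'C': [2]}
print(two)    # {'A': [0, 2], 'B': [1]}
print(four)   # {'A': [0], 'B': [1], 'C': [2], 'D': []}
```

When consumer C leaves, its partition moves to a remaining consumer, and a fourth consumer gets nothing to do because there are only three partitions.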

In [None]:
# Demonstrate multiple consumers in the same group
import threading
import time


def consume_events(consumer_id, group_id, topic, max_messages=10):
    """Consumer function for threading"""
    config = {
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,
        "auto.offset.reset": "earliest",
        "enable.auto.commit": True,
    }

    consumer = Consumer(config)
    consumer.subscribe([topic])

    messages_read = 0
    partitions_read = set()

    print(f"[{consumer_id}] Started consuming from group '{group_id}'")

    try:
        while messages_read < max_messages:
            msg = consumer.poll(timeout=2.0)

            if msg is None:
                break

            if msg.error():
                continue

            partitions_read.add(msg.partition())

            value = json.loads(msg.value().decode("utf-8"))
            print(
                f"[{consumer_id}] Partition {msg.partition()}, Offset {msg.offset()}: {value['event_type']}"
            )

            messages_read += 1
            time.sleep(0.1)  # Simulate processing

    finally:
        consumer.close()
        print(
            f"[{consumer_id}] Finished. Read {messages_read} events from partitions {sorted(partitions_read)}"
        )


# Create 3 consumers in the same group
print("Starting 3 consumers in the same group...\n")

threads = []
for i in range(3):
    consumer_id = f"Consumer-{i+1}"
    thread = threading.Thread(
        target=consume_events, args=(consumer_id, "parallel-consumer-group", TOPIC_NAME, 10)
    )
    threads.append(thread)
    thread.start()

# Wait for all consumers to finish
for thread in threads:
    thread.join()

print("\n[SUCCESS] All consumers finished!")
print("\n[OK] Compare the partition lists above: at any moment, each partition")
print("     belongs to one consumer in the group. This is how Kafka parallelizes reads.")

---

## 7. Offset Management

### What are Offsets?

**Offsets** track which events have been read:
```
Partition 0: [evt0][evt1][evt2][evt3][evt4][evt5]
              ↑     ↑     ↑     ↑
            offset offset offset offset
              0      1      2      3

Consumer commits offset 3 → Next read starts at offset 3
```

### Offset Commit Strategies

1. **Auto-commit (default)**:
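   - The client commits offsets periodically in the background
   - Simple, but events can be lost or processed twice if the consumer fails

2. **Manual commit**:
   - You control when offsets are committed
   - Enables at-least-once delivery; exactly-once additionally needs transactions or idempotent processing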

3. **Manual commit after processing**:
   - Commit only after successful processing
   - Prevents data loss

In [None]:
# Manual offset commit example
consumer_config_manual = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "manual-commit-group",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,  # Disable auto-commit
}

consumer = Consumer(consumer_config_manual)
consumer.subscribe([TOPIC_NAME])

print("[OK] Consumer with manual offset commit\n")

messages_processed = 0
max_messages = 10

try:
    while messages_processed < max_messages:
        msg = consumer.poll(timeout=2.0)

        if msg is None:
            break

        if msg.error():
            continue

        # Process the message
        value = json.loads(msg.value().decode("utf-8"))
        print(f"[{messages_processed + 1}] Processing: {value['event_type']}")

        # Simulate processing that might fail
        try:
            # Your processing logic here
            time.sleep(0.05)

            # Only commit if processing succeeds. commit() is asynchronous by default,
            # so request a synchronous commit before reporting success.
            consumer.commit(message=msg, asynchronous=False)
            print(f"     [OK] Committed through offset {msg.offset()} on partition {msg.partition()}")

            messages_processed += 1

        except Exception as e:
            print(f"     [FAIL] Processing failed: {e}")
            print("     Offset not committed; the message will be re-delivered after a restart")
            break  # Stop without committing so this message is read again next time

finally:
    consumer.close()
    print(f"\n[SUCCESS] Processed {messages_processed} events with manual commits")

---

## 8. Mini-Project: Event-Driven User Activity System

Let's build a complete system with:
- Multiple producers generating different event types
- Multiple consumers in different groups
- Real-time analytics

In [None]:
# Create a topic for the mini-project
PROJECT_TOPIC = "user-activity-stream"

new_topic = NewTopic(topic=PROJECT_TOPIC, num_partitions=3, replication_factor=1)

try:
    futures = admin_client.create_topics([new_topic])
    for topic, future in futures.items():
        try:
            future.result()
            print(f"[OK] Created topic '{topic}' for mini-project")
        except KafkaException as e:
            if "TOPIC_ALREADY_EXISTS" in str(e):
                print(f"[OK] Topic '{topic}' already exists")
            else:
                print(f"[FAIL] Failed to create topic '{topic}': {e}")
except Exception as e:
    print(f"[FAIL] Error: {e}")

In [None]:
# Producer 1: Web application events
def web_event_producer(num_events=20):
    """Simulate web application events"""
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    actions = ["page_view", "button_click", "form_submit", "search"]
    pages = ["/home", "/products", "/cart", "/checkout", "/profile"]

    for i in range(num_events):
        user_id = f"user_{random.randint(1, 10)}"
        event = {
            "source": "web-app",
            "event_id": f"web_{i}",
            "event_type": random.choice(actions),
            "user_id": user_id,
            "timestamp": datetime.now().isoformat(),
            "data": {"page": random.choice(pages), "session_duration": random.randint(10, 300)},
        }

        producer.produce(topic=PROJECT_TOPIC, key=user_id, value=json.dumps(event))
        producer.poll(0)
        time.sleep(0.1)

    producer.flush()
    print(f"[WEB] Produced {num_events} web events")


# Producer 2: Mobile app events
def mobile_event_producer(num_events=20):
    """Simulate mobile app events"""
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    actions = ["app_open", "app_close", "notification_tap", "in_app_purchase"]

    for i in range(num_events):
        user_id = f"user_{random.randint(1, 10)}"
        event = {
            "source": "mobile-app",
            "event_id": f"mobile_{i}",
            "event_type": random.choice(actions),
            "user_id": user_id,
            "timestamp": datetime.now().isoformat(),
            "data": {"device": random.choice(["iOS", "Android"]), "app_version": "2.1.0"},
        }

        producer.produce(topic=PROJECT_TOPIC, key=user_id, value=json.dumps(event))
        producer.poll(0)
        time.sleep(0.1)

    producer.flush()
    print(f"[MOBILE] Produced {num_events} mobile events")


# Run both producers in parallel
print("Starting event producers...\n")

web_thread = threading.Thread(target=web_event_producer, args=(20,))
mobile_thread = threading.Thread(target=mobile_event_producer, args=(20,))

web_thread.start()
mobile_thread.start()

web_thread.join()
mobile_thread.join()

print("\n[SUCCESS] All producers finished!")

In [None]:
# Consumer 1: Analytics - Count events by type
def analytics_consumer(duration_seconds=10):
    """Analyze event patterns"""
    consumer = Consumer(
        {
            "bootstrap.servers": "localhost:9092",
            "group.id": "analytics-group",
            "auto.offset.reset": "earliest",
        }
    )

    consumer.subscribe([PROJECT_TOPIC])

    event_counts = defaultdict(int)
    source_counts = defaultdict(int)
    start_time = time.time()

    print("[ANALYTICS] Starting event analysis...")

    try:
        while time.time() - start_time < duration_seconds:
            msg = consumer.poll(timeout=1.0)

            if msg is None:
                continue

            if msg.error():
                continue

            event = json.loads(msg.value().decode("utf-8"))
            event_counts[event["event_type"]] += 1
            source_counts[event["source"]] += 1

    finally:
        consumer.close()

        print("\n[ANALYTICS] Event Analysis Results:")
        print("\nBy Event Type:")
        for event_type, count in sorted(event_counts.items(), key=lambda x: x[1], reverse=True):
            print(f"  {event_type}: {count}")

        print("\nBy Source:")
        for source, count in source_counts.items():
            print(f"  {source}: {count}")


# Consumer 2: Monitoring - Detect unusual activity
def monitoring_consumer(duration_seconds=10):
    """Monitor for unusual patterns"""
    consumer = Consumer(
        {
            "bootstrap.servers": "localhost:9092",
            "group.id": "monitoring-group",
            "auto.offset.reset": "earliest",
        }
    )

    consumer.subscribe([PROJECT_TOPIC])

    user_activity = defaultdict(list)
    start_time = time.time()

    print("[MONITORING] Starting activity monitoring...")

    try:
        while time.time() - start_time < duration_seconds:
            msg = consumer.poll(timeout=1.0)

            if msg is None:
                continue

            if msg.error():
                continue

            event = json.loads(msg.value().decode("utf-8"))
            user_id = event["user_id"]
            user_activity[user_id].append(event["event_type"])

            # Alert once, when a user reaches their 6th event (more than 5)
            if len(user_activity[user_id]) == 6:
                print(f"  [ALERT] High activity detected for {user_id}")

    finally:
        consumer.close()

        print("\n[MONITORING] Activity Report:")
        for user_id, events in sorted(user_activity.items(), key=lambda x: len(x[1]), reverse=True)[
            :5
        ]:
            print(f"  {user_id}: {len(events)} events")


# Run both consumers in parallel
print("\nStarting consumers...\n")

analytics_thread = threading.Thread(target=analytics_consumer, args=(10,))
monitoring_thread = threading.Thread(target=monitoring_consumer, args=(10,))

analytics_thread.start()
monitoring_thread.start()

analytics_thread.join()
monitoring_thread.join()

print("\n[SUCCESS] Mini-project complete!")

---

## 9. Key Takeaways

[OK] **Event-Driven Architecture**: Decoupled, scalable, flexible approach to system design

[OK] **Kafka Components**: Brokers, topics, partitions, producers, consumers

[OK] **Topics and Partitions**: Enable parallelism and ordering guarantees

[OK] **Producers**: Write events with keys for partitioning, support batching and idempotence

[OK] **Consumers**: Read events, manage offsets, work in groups for parallel processing

[OK] **Consumer Groups**: Enable scalability and independent processing

### Important Patterns

1. **Partitioning by Key**:
   - Same key → same partition → ordering guarantee
   - Use user_id, session_id, or correlation_id as keys

2. **Consumer Groups**:
   - One group per application
   - Multiple consumers per group for parallelism
   - Different groups for different use cases

3. **Offset Management**:
   - Auto-commit for simple use cases
   - Manual commit after processing for reliability
   - Store offsets externally for exactly-once processing

4. **Error Handling**:
   - Retry on transient errors
   - Dead letter queues for poison messages
   - Monitoring and alerting
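
The retry and dead-letter pattern can be sketched without a broker. In a Kafka system the dead letter queue is typically another topic; here a plain list stands in for it, and `process_with_retry` and both handlers are hypothetical names. A real implementation would usually also wait between attempts.

```python
def process_with_retry(event, handler, max_attempts=3, dead_letters=None):
    """Call handler(event); retry on failure and park the event after max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return handler(event)
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
    if dead_letters is not None:
        dead_letters.append(
            {"event": event, "error": str(last_error), "attempts": max_attempts}
        )
    return None


attempts = {"count": 0}


def flaky_handler(event):
    """Fails twice with a transient error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary outage")
    return f"handled {event['event_id']}"


def poison_handler(event):
    """Always fails, like a message that can never be processed."""
    raise ValueError("malformed payload")


dead_letters = []
ok = process_with_retry({"event_id": "evt_1"}, flaky_handler, dead_letters=dead_letters)
bad = process_with_retry({"event_id": "evt_2"}, poison_handler, dead_letters=dead_letters)

print(ok)            # handled evt_1
print(dead_letters)  # one entry, for evt_2
```

Parking the poison message lets the consumer move on to later events instead of blocking the partition on one bad record.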

### Production Best Practices

1. **Set replication factor ≥ 3** for fault tolerance
2. **Monitor consumer lag** to detect processing issues
3. **Use Schema Registry** for data governance (covered in Module 03)
4. **Enable compression** (gzip, snappy, lz4)
5. **Configure retention** based on your needs
6. **Use monitoring tools** (Kafka UI, Prometheus, Grafana)
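
Consumer lag, mentioned in item 2, is the gap between the end of each partition and the group's committed offset. The sketch below computes it from two plain dictionaries; `consumer_lag` is a hypothetical helper, and treating a partition with no commit as offset 0 is an assumption. With `confluent_kafka`, the inputs could come from `Consumer.get_watermark_offsets()` and `Consumer.committed()`.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset (never below zero).

    Partitions without a committed offset are treated as committed at 0 (assumption).
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


end_offsets = {0: 120, 1: 95, 2: 110}   # partition -> log end offset
committed_offsets = {0: 120, 1: 80}     # partition 2 has no commit yet

lag = consumer_lag(end_offsets, committed_offsets)
total_lag = sum(lag.values())

print(lag)        # {0: 0, 1: 15, 2: 110}
print(total_lag)  # 125
```

A lag that keeps growing suggests consumers cannot keep up with producers, which is usually the signal to add consumers or speed up processing.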

---

## 10. Practice Exercises

Try these exercises to reinforce your learning:

1. **Create a topic** with 5 partitions and produce 100 events
2. **Run 5 consumers** in the same group and observe partition assignment
3. **Implement manual offset commit** with error handling
4. **Create two consumer groups** reading from the same topic
5. **Monitor partition distribution** by analyzing consumer assignments

In [None]:
# Your practice code here

---

## 11. Next Steps

Congratulations on completing Module 01!

### What You've Learned

- [OK] Event-driven architecture fundamentals
- [OK] Apache Kafka core concepts
- [OK] Producer and consumer patterns
- [OK] Consumer groups and parallelism
- [OK] Offset management strategies

### Coming Up in Module 02: Kafka Deep Dive

You'll learn:
- Kafka internals: log structure, segments, indexes
- Replication and fault tolerance
- Performance tuning (batching, compression, linger.ms)
- Monitoring and metrics
- Troubleshooting common issues

### Resources

- [Kafka Documentation](https://kafka.apache.org/documentation/)
- [Confluent Kafka Python](https://docs.confluent.io/kafka-clients/python/current/overview.html)
- [Event-Driven Architecture](https://martinfowler.com/articles/201701-event-driven.html)
- [Kafka: The Definitive Guide](https://www.confluent.io/resources/kafka-the-definitive-guide/)

---

**Ready to dive deeper?** Open `02_kafka_deep_dive.ipynb` to continue!