# Exercise 16 - Kafka Producer in Python

## Objectives
- Build a robust Kafka producer
- Send structured JSON messages
- Use partitioning keys
- Simulate a real-time data stream

---

## 1. Producer Architecture

```
+------------------------------------------------------------------+
|                    PRODUCER KAFKA                                |
+------------------------------------------------------------------+
|                                                                  |
|   APPLICATION           PRODUCER              KAFKA              |
|                                                                  |
|  +-----------+      +-------------+      +----------------+      |
|  |           |      |             |      |                |      |
|  |   Data    |----->| Serializer  |----->|    Topic       |      |
|  |  (dict)   |      | (JSON)      |      |                |      |
|  +-----------+      +-------------+      | +------------+ |      |
|                            |             | | Partition 0| |      |
|                     +------v------+      | +------------+ |      |
|                     |             |      | | Partition 1| |      |
|                     | Partitioner |----->| +------------+ |      |
|                     | (by key)    |      | | Partition 2| |      |
|                     +-------------+      | +------------+ |      |
|                                          |                |      |
|                                          +----------------+      |
|                                                                  |
+------------------------------------------------------------------+

Key options:
- bootstrap_servers : list of brokers
- key_serializer    : key serialization
- value_serializer  : value serialization
- acks              : delivery guarantee (0, 1, all)
- retries           : number of retry attempts on error
```
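
The partitioner step in the diagram can be illustrated without a broker. The sketch below is a simplified model, not the client's real implementation: `choisir_partition` is a hypothetical helper, and it uses CRC32 as a stand-in hash, whereas kafka-python's default partitioner is based on murmur2. The partition numbers will therefore differ from those Kafka assigns, but the property that matters is the same: a given key always maps to the same partition.

```python
import zlib

def choisir_partition(cle: str, nb_partitions: int) -> int:
    """Map a key to a partition index with a deterministic hash (illustration only)."""
    # Stand-in hash: the real client does not use CRC32
    empreinte = zlib.crc32(cle.encode("utf-8"))
    return empreinte % nb_partitions

# The same key is always routed to the same partition
for cle in ["CUST-001", "CUST-002", "CUST-001"]:
    print(cle, "->", choisir_partition(cle, 3))
```

Because the mapping depends on the number of partitions, adding partitions to an existing topic changes where a key lands for new messages.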

## 2. Configuration

In [1]:
!pip install kafka-python -q

from kafka import KafkaProducer, KafkaAdminClient
from kafka.admin import NewTopic
import json
import time
import random
from datetime import datetime

KAFKA_BROKER = "broker:29092"

print("Configuration ready")

Configuration ready


In [2]:
# Create the required topics
def creer_topic_si_absent(nom, partitions=3):
    admin = None
    try:
        admin = KafkaAdminClient(bootstrap_servers=KAFKA_BROKER)
        if nom not in admin.list_topics():
            admin.create_topics([NewTopic(name=nom, num_partitions=partitions, replication_factor=1)])
            print(f"Topic '{nom}' created")
        else:
            print(f"Topic '{nom}' already exists")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Close the admin client even when one of the calls above fails
        if admin is not None:
            admin.close()

creer_topic_si_absent("commandes-json", 3)
creer_topic_si_absent("logs-application", 1)
creer_topic_si_absent("metrics", 3)

Topic 'commandes-json' created
Topic 'logs-application' created
Topic 'metrics' created


## 3. JSON Producer

In [3]:
# Create a producer with JSON serialization
producer = KafkaProducer(
    bootstrap_servers=KAFKA_BROKER,
    key_serializer=lambda k: k.encode('utf-8') if k else None,
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',  # Wait for acknowledgment from all in-sync replicas
    retries=3,   # Retry up to 3 times on error
)

print("JSON producer configured")

JSON producer configured
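
The two serializers can be checked on their own before any message reaches a broker. The sketch below rewrites the lambdas as named functions (`serialiser_cle`, `serialiser_valeur`, and `deserialiser_valeur` are hypothetical names, not part of kafka-python) and verifies that a value survives a round trip, which is what a consumer would rely on.

```python
import json

def serialiser_cle(cle):
    """Encode a string key as UTF-8 bytes; pass None through for keyless messages."""
    return cle.encode("utf-8") if cle else None

def serialiser_valeur(valeur):
    """Encode a JSON-compatible object as UTF-8 bytes."""
    return json.dumps(valeur).encode("utf-8")

def deserialiser_valeur(octets):
    """Inverse of serialiser_valeur, as a consumer might apply it."""
    return json.loads(octets.decode("utf-8"))

message = {"order_id": 1001, "customer_id": "CUST-123"}
octets = serialiser_valeur(message)
print(serialiser_cle("CUST-123"))            # bytes sent as the message key
print(octets)                                # bytes sent as the message value
print(deserialiser_valeur(octets) == message)
```

Objects that `json.dumps` cannot handle, such as a raw `datetime`, would raise a `TypeError` at send time, which is one reason the notebook stores timestamps as ISO strings.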


In [4]:
# Send a JSON message
commande = {
    "order_id": 1001,
    "customer_id": "CUST-123",
    "timestamp": datetime.now().isoformat(),
    "produits": [
        {"product_id": 1, "quantity": 2, "price": 29.99},
        {"product_id": 5, "quantity": 1, "price": 49.99}
    ],
    "total": 109.97
}

# Use customer_id as the key (same customer = same partition)
future = producer.send(
    topic="commandes-json",
    key=commande["customer_id"],
    value=commande
)

result = future.get(timeout=10)
print("Message sent:")
print(f"  Topic: {result.topic}")
print(f"  Partition: {result.partition}")
print(f"  Offset: {result.offset}")
print(f"  Key: {commande['customer_id']}")

Message sent:
  Topic: commandes-json
  Partition: 1
  Offset: 0
  Key: CUST-123


## 4. Random order generator

In [5]:
# Reference data
PRODUITS = [
    {"id": 1, "nom": "Laptop", "prix": 999.99},
    {"id": 2, "nom": "Souris", "prix": 29.99},
    {"id": 3, "nom": "Clavier", "prix": 79.99},
    {"id": 4, "nom": "Ecran", "prix": 349.99},
    {"id": 5, "nom": "Casque", "prix": 149.99},
    {"id": 6, "nom": "Webcam", "prix": 89.99},
    {"id": 7, "nom": "USB Hub", "prix": 39.99},
    {"id": 8, "nom": "SSD", "prix": 129.99},
]

CLIENTS = [f"CUST-{i:03d}" for i in range(1, 21)]  # 20 customers

def generer_commande():
    """Generate a random order"""
    client = random.choice(CLIENTS)
    nb_produits = random.randint(1, 4)
    
    produits_commande = []
    total = 0.0
    
    for _ in range(nb_produits):
        produit = random.choice(PRODUITS)
        quantite = random.randint(1, 3)
        sous_total = produit["prix"] * quantite
        
        produits_commande.append({
            "product_id": produit["id"],
            "product_name": produit["nom"],
            "quantity": quantite,
            "unit_price": produit["prix"],
            "subtotal": round(sous_total, 2)
        })
        total += sous_total
    
    return {
        "order_id": random.randint(10000, 99999),
        "customer_id": client,
        "timestamp": datetime.now().isoformat(),
        "items": produits_commande,
        "total": round(total, 2),
        "status": "created"
    }

In [6]:
# Test the generator
commande_test = generer_commande()
print(json.dumps(commande_test, indent=2))

{
  "order_id": 15066,
  "customer_id": "CUST-016",
  "timestamp": "2026-01-18T10:08:10.843924",
  "items": [
    {
      "product_id": 3,
      "product_name": "Clavier",
      "quantity": 1,
      "unit_price": 79.99,
      "subtotal": 79.99
    },
    {
      "product_id": 1,
      "product_name": "Laptop",
      "quantity": 2,
      "unit_price": 999.99,
      "subtotal": 1999.98
    },
    {
      "product_id": 7,
      "product_name": "USB Hub",
      "quantity": 1,
      "unit_price": 39.99,
      "subtotal": 39.99
    }
  ],
  "total": 2119.96,
  "status": "created"
}
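
Since consumers will trust the `total` field, it can be worth checking a generated order before sending it. The sketch below uses a hypothetical helper, `verifier_commande`, that compares the total against the sum of the item subtotals within a small tolerance (amounts are floats, so exact equality is not assumed). The sample data reproduces the order printed above, trimmed to the fields the check needs.

```python
def verifier_commande(commande, tolerance=0.005):
    """Return True when the order total matches the sum of its item subtotals."""
    somme = sum(item["subtotal"] for item in commande["items"])
    return abs(somme - commande["total"]) < tolerance

exemple = {
    "order_id": 15066,
    "items": [
        {"product_id": 3, "quantity": 1, "unit_price": 79.99, "subtotal": 79.99},
        {"product_id": 1, "quantity": 2, "unit_price": 999.99, "subtotal": 1999.98},
        {"product_id": 7, "quantity": 1, "unit_price": 39.99, "subtotal": 39.99},
    ],
    "total": 2119.96,
}
print(verifier_commande(exemple))  # True for the sample order above
```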


## 5. Batch sending

In [7]:
# Send a batch of orders
def envoyer_batch(producer, topic, nb_messages):
    """Send a batch of messages"""
    futures = []
    
    for _ in range(nb_messages):
        commande = generer_commande()
        future = producer.send(
            topic=topic,
            key=commande["customer_id"],
            value=commande
        )
        futures.append((commande["order_id"], future))
    
    # Block until all buffered messages have been sent
    producer.flush()
    
    # Check the results
    succes = 0
    erreurs = 0
    
    for order_id, future in futures:
        try:
            future.get(timeout=10)
            succes += 1
        except Exception as e:
            print(f"Error for order {order_id}: {e}")
            erreurs += 1
    
    return succes, erreurs

In [8]:
# Send 50 orders
print("Sending 50 orders...")
succes, erreurs = envoyer_batch(producer, "commandes-json", 50)

print("\nResult:")
print(f"  Success: {succes}")
print(f"  Errors : {erreurs}")

Sending 50 orders...

Result:
  Success: 50
  Errors : 0
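
The run above shows only the success path. To exercise the error counting without a broker, one option is a small stub that mimics the two producer methods the batch function relies on. Everything in this sketch is hypothetical test scaffolding (`FutureFactice`, `ProducerFactice`, `compter_resultats`), not part of kafka-python; it only assumes that `send()` returns an object whose `get()` either returns or raises.

```python
class FutureFactice:
    """Hypothetical stand-in for the future returned by producer.send()."""
    def __init__(self, erreur=None):
        self._erreur = erreur

    def get(self, timeout=None):
        if self._erreur is not None:
            raise self._erreur
        return "ok"

class ProducerFactice:
    """Hypothetical producer stub: fails every `echec_tous_les`-th send."""
    def __init__(self, echec_tous_les=0):
        self.echec_tous_les = echec_tous_les
        self.envois = []

    def send(self, topic, key=None, value=None):
        self.envois.append((topic, key, value))
        n = len(self.envois)
        if self.echec_tous_les and n % self.echec_tous_les == 0:
            return FutureFactice(RuntimeError(f"simulated failure on send {n}"))
        return FutureFactice()

    def flush(self):
        pass  # nothing is buffered in the stub

def compter_resultats(futures, timeout=10):
    """Count successes and failures, mirroring the result check in envoyer_batch."""
    succes = erreurs = 0
    for future in futures:
        try:
            future.get(timeout=timeout)
            succes += 1
        except Exception:
            erreurs += 1
    return succes, erreurs

stub = ProducerFactice(echec_tous_les=5)
futures = [
    stub.send("commandes-json", key=f"CUST-{i:03d}", value={"order_id": i})
    for i in range(1, 11)
]
stub.flush()
print(compter_resultats(futures))  # sends 5 and 10 fail: (8, 2)
```

The same stub could be passed to `envoyer_batch` in place of the real producer, since that function only calls `send()` and `flush()`.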


## 6. Real-time stream simulation

In [9]:
def simuler_flux(producer, topic, nb_messages, delai_ms=500):
    """
    Simulate a real-time data stream.
    
    Args:
        producer: KafkaProducer
        topic: Topic name
        nb_messages: Number of messages to send
        delai_ms: Delay between messages (milliseconds)
    """
    print(f"Simulating {nb_messages} messages...")
    print(f"Delay between messages: {delai_ms}ms")
    print("=" * 50)
    
    for i in range(nb_messages):
        commande = generer_commande()
        
        future = producer.send(
            topic=topic,
            key=commande["customer_id"],
            value=commande
        )
        
        # Block until the send is acknowledged so the partition can be displayed
        result = future.get(timeout=10)
        
        print(f"[{i+1}/{nb_messages}] Order {commande['order_id']} | "
              f"Client {commande['customer_id']} | "
              f"Total {commande['total']:,.2f} EUR | "
              f"Partition {result.partition}")
        
        time.sleep(delai_ms / 1000.0)
    
    print("=" * 50)
    print("Simulation finished")

In [10]:
# Run a simulation (10 messages, 300 ms apart)
simuler_flux(producer, "commandes-json", 10, delai_ms=300)

Simulating 10 messages...
Delay between messages: 300ms
[1/10] Order 90886 | Client CUST-016 | Total 999.99 EUR | Partition 0
[2/10] Order 83305 | Client CUST-019 | Total 59.98 EUR | Partition 1
[3/10] Order 65888 | Client CUST-006 | Total 1,059.97 EUR | Partition 1
[4/10] Order 95469 | Client CUST-002 | Total 39.99 EUR | Partition 0
[5/10] Order 93582 | Client CUST-008 | Total 2,199.91 EUR | Partition 2
[6/10] Order 42458 | Client CUST-020 | Total 609.93 EUR | Partition 0
[7/10] Order 11722 | Client CUST-008 | Total 3,269.93 EUR | Partition 2
[8/10] Order 73234 | Client CUST-010 | Total 499.94 EUR | Partition 2
[9/10] Order 17834 | Client CUST-009 | Total 929.96 EUR | Partition 1
[10/10] Order 62171 | Client CUST-003 | Total 459.94 EUR | Partition 1
Simulation finished
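
In the simulation above, the time between two messages is the configured delay plus however long the send and acknowledgment take, so the actual rate drifts below the target. If a steadier rate matters, one possible approach is to schedule each send against a monotonic clock. The sketch below is a broker-free illustration: `cadencer` is a hypothetical helper and `envoyer` stands for whatever callable performs the send.

```python
import time

def cadencer(nb_messages, delai_ms, envoyer):
    """Call `envoyer(i)` at a fixed rate, compensating for the time each call takes."""
    intervalle = delai_ms / 1000.0
    prochain = time.monotonic()
    for i in range(nb_messages):
        envoyer(i)
        prochain += intervalle
        # Sleep only for the time remaining until the next scheduled send
        attente = prochain - time.monotonic()
        if attente > 0:
            time.sleep(attente)

# Record when each simulated send happens, relative to the start
horodatages = []
debut = time.monotonic()
cadencer(5, 20, lambda i: horodatages.append(time.monotonic() - debut))
print([round(t, 2) for t in horodatages])
```

With a real producer, `envoyer` could wrap `producer.send(...)`; the pacing logic itself does not depend on Kafka.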


## 7. Producer with callbacks

In [11]:
# Callbacks to handle success and errors
def on_success(record_metadata):
    """Callback invoked on success"""
    print(f"[OK] Topic={record_metadata.topic}, "
          f"Partition={record_metadata.partition}, "
          f"Offset={record_metadata.offset}")

def on_error(exception):
    """Callback invoked on error"""
    print(f"[ERROR] {exception}")

In [12]:
# Use the callbacks
for i in range(5):
    commande = generer_commande()
    
    producer.send(
        topic="commandes-json",
        key=commande["customer_id"],
        value=commande
    ).add_callback(on_success).add_errback(on_error)

producer.flush()

[OK] Topic=commandes-json, Partition=2, Offset=23
[OK] Topic=commandes-json, Partition=1, Offset=18
[OK] Topic=commandes-json, Partition=0, Offset=20
[OK] Topic=commandes-json, Partition=0, Offset=21
[OK] Topic=commandes-json, Partition=0, Offset=22
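
Printing from callbacks is fine for a demo, but an application usually wants to tally outcomes. Callbacks are typically invoked from the producer's background thread rather than the thread that called `send()`, so the sketch below guards its counters with a lock. `SuiviEnvois` is a hypothetical helper, and the outcomes are simulated by calling the methods directly instead of going through a broker.

```python
import threading

class SuiviEnvois:
    """Hypothetical helper that tallies callback outcomes in a thread-safe way."""
    def __init__(self):
        self._verrou = threading.Lock()
        self.succes = 0
        self.erreurs = []

    def on_success(self, record_metadata):
        with self._verrou:
            self.succes += 1

    def on_error(self, exception):
        with self._verrou:
            self.erreurs.append(exception)

suivi = SuiviEnvois()
# Simulated outcomes; with a real producer these methods would be registered via
# .add_callback(suivi.on_success).add_errback(suivi.on_error)
suivi.on_success(object())
suivi.on_success(object())
suivi.on_error(TimeoutError("simulated timeout"))
print(suivi.succes, len(suivi.erreurs))  # 2 1
```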


## 8. Log producer

In [13]:
# Application log generator
NIVEAUX_LOG = ["INFO", "INFO", "INFO", "WARNING", "ERROR"]  # INFO listed three times so it is drawn more often
MODULES = ["auth", "api", "database", "cache", "payment"]
MESSAGES = {
    "auth": ["User login", "User logout", "Authentication failed", "Session expired"],
    "api": ["Request received", "Response sent", "Rate limit exceeded", "Invalid request"],
    "database": ["Query executed", "Connection opened", "Connection closed", "Query timeout"],
    "cache": ["Cache hit", "Cache miss", "Cache cleared", "Cache updated"],
    "payment": ["Payment initiated", "Payment completed", "Payment failed", "Refund processed"]
}

def generer_log():
    """Generate an application log entry"""
    module = random.choice(MODULES)
    niveau = random.choice(NIVEAUX_LOG)
    message = random.choice(MESSAGES[module])
    
    return {
        "timestamp": datetime.now().isoformat(),
        "level": niveau,
        "module": module,
        "message": message,
        "request_id": f"req-{random.randint(10000, 99999)}",
        "user_id": f"user-{random.randint(1, 100)}",
        "duration_ms": random.randint(1, 500)
    }

In [14]:
# Send logs
print("Sending logs...")
print("=" * 60)

for i in range(15):
    log = generer_log()
    
    # Use the level as the key (logs-application has a single partition,
    # so every key lands on partition 0 here)
    future = producer.send(
        topic="logs-application",
        key=log["level"],
        value=log
    )
    
    result = future.get(timeout=10)
    print(f"[{log['level']:7}] {log['module']:10} | {log['message']}")

producer.flush()
print("=" * 60)

Sending logs...
[ERROR  ] database   | Connection closed
[INFO   ] cache      | Cache hit
[INFO   ] auth       | Authentication failed
[INFO   ] auth       | User login
[ERROR  ] api        | Invalid request
[INFO   ] auth       | Session expired
[INFO   ] auth       | Session expired
[INFO   ] cache      | Cache hit
[ERROR  ] api        | Response sent
[ERROR  ] database   | Connection opened
[INFO   ] payment    | Refund processed
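
Repeating "INFO" in the list is one way to bias the draw. An alternative, sketched below under the assumption that the same 3:1:1 ratio is wanted, is `random.choices` with explicit weights, which makes the distribution easier to read and adjust. `NIVEAUX`, `POIDS`, and `tirer_niveau` are hypothetical names introduced for this example.

```python
import random
from collections import Counter

NIVEAUX = ["INFO", "WARNING", "ERROR"]
POIDS = [3, 1, 1]  # same 3:1:1 ratio as the NIVEAUX_LOG list

def tirer_niveau(rng=random):
    """Draw one log level according to POIDS."""
    return rng.choices(NIVEAUX, weights=POIDS, k=1)[0]

rng = random.Random(42)  # fixed seed so the run is repeatable
compte = Counter(tirer_niveau(rng) for _ in range(10_000))
print(compte.most_common())
```

Over many draws, roughly 60% of the levels should be INFO and about 20% each WARNING and ERROR.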


## 9. Metrics producer

In [15]:
def generer_metriques():
    """Generate system metrics"""
    return {
        "timestamp": datetime.now().isoformat(),
        "host": f"server-{random.randint(1, 5):02d}",
        "metrics": {
            "cpu_percent": round(random.uniform(10, 95), 2),
            "memory_percent": round(random.uniform(30, 90), 2),
            "disk_percent": round(random.uniform(20, 85), 2),
            "network_in_mbps": round(random.uniform(10, 500), 2),
            "network_out_mbps": round(random.uniform(5, 200), 2),
            "requests_per_sec": random.randint(100, 5000),
            "response_time_ms": round(random.uniform(5, 100), 2)
        }
    }

# Send metrics
print("Sending metrics...")
for i in range(10):
    metric = generer_metriques()
    
    # Key by host so that metrics from one server stay in the same partition
    producer.send(
        topic="metrics",
        key=metric["host"],
        value=metric
    )
    
    print(f"{metric['host']} | CPU: {metric['metrics']['cpu_percent']}% | "
          f"MEM: {metric['metrics']['memory_percent']}%")

producer.flush()

Sending metrics...
server-01 | CPU: 16.91% | MEM: 86.33%
server-05 | CPU: 62.19% | MEM: 81.09%
server-04 | CPU: 39.81% | MEM: 42.51%
server-04 | CPU: 93.19% | MEM: 39.51%
server-05 | CPU: 74.91% | MEM: 30.92%
server-01 | CPU: 59.84% | MEM: 78.65%
server-05 | CPU: 53.26% | MEM: 47.31%
server-02 | CPU: 40.06% | MEM: 39.78%
server-04 | CPU: 43.33% | MEM: 67.84%
server-02 | CPU: 19.16% | MEM: 73.88%


In [16]:
# Close the producer
producer.close()
print("Producer closed")

Producer closed


---

## Exercise

**Goal**: Build a custom producer

**Instructions**:
1. Create a topic named "exercice-events"
2. Write a user-event generator (click, view, purchase)
3. Send 20 events using a stream simulation

Your turn:

In [17]:
# TODO: Create the topic
from kafka import KafkaAdminClient
from kafka.admin import NewTopic
from kafka.errors import TopicAlreadyExistsError

# Configuration: use the broker address and port for YOUR environment
KAFKA_BROKER = "broker:29092"
TOPIC_NAME = "exercice-events"

def create_topic_safe():
    admin = KafkaAdminClient(bootstrap_servers=KAFKA_BROKER)
    try:
        # Create a topic with 2 partitions
        topic = NewTopic(name=TOPIC_NAME, num_partitions=2, replication_factor=1)
        admin.create_topics([topic])
        print(f"Topic '{TOPIC_NAME}' created successfully.")
    except TopicAlreadyExistsError:
        print(f"Topic '{TOPIC_NAME}' already exists.")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        admin.close()

create_topic_safe()

Topic 'exercice-events' created successfully.


In [18]:
# TODO: Create the event generator
import uuid

TYPES_EVENT = ["view_page", "click", "add_to_cart", "purchase", "login"]
PAGES = ["/home", "/product/A", "/product/B", "/cart", "/checkout"]

def generer_event():
    """
    Generate a simulated event as a JSON-serializable dict.
    """
    event_type = random.choice(TYPES_EVENT)
    # Simulate recurring users (user_1 to user_50)
    user_id = f"user_{random.randint(1, 50)}"
    
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now().isoformat(),
        "event_type": event_type,
        "user_id": user_id,
        "page": random.choice(PAGES),
        "device": random.choice(["mobile", "desktop", "tablet"]),
        # Set an amount only for purchases
        "amount": round(random.uniform(10.0, 500.0), 2) if event_type == "purchase" else None
    }
    return event

# Quick test to check the format
print("Sample generated event:")
print(generer_event())

Sample generated event:
{'event_id': '8da00891-e451-4f49-997b-f82656c64fd0', 'timestamp': '2026-01-18T10:10:48.160602', 'event_type': 'view_page', 'user_id': 'user_19', 'page': '/cart', 'device': 'desktop', 'amount': None}
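
The sample above is a Python dict printed as-is, so the missing amount shows as `None`; once serialized by `json.dumps` it becomes `null` on the wire. If consumers would rather not receive null fields, one option is to drop them before sending. The sketch below uses a hypothetical helper, `nettoyer_event`, on a trimmed-down event.

```python
import json

def nettoyer_event(event):
    """Return a copy of the event without keys whose value is None."""
    return {cle: valeur for cle, valeur in event.items() if valeur is not None}

event = {"event_type": "view_page", "user_id": "user_19", "amount": None}
print(json.dumps(event))                  # "amount" is serialized as null
print(json.dumps(nettoyer_event(event)))  # "amount" is omitted entirely
```

Either convention works as long as producers and consumers agree on it.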


In [19]:
# TODO: Send the events
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=KAFKA_BROKER,
    # KEY serialization (string -> bytes)
    key_serializer=lambda k: k.encode('utf-8') if k else None,
    # VALUE serialization (dict -> JSON -> bytes)
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

print(f"Starting to send to '{TOPIC_NAME}'...")

for i in range(20):
    event = generer_event()
    
    # Send with a key for partitioning
    future = producer.send(
        topic=TOPIC_NAME,
        key=event["user_id"],
        value=event
    )
    
    # Wait for the result to display the partition (for the demo only)
    record_metadata = future.get(timeout=10)
    
    print(f"[{i+1}/20] {event['user_id']} -> {event['event_type']:15} | Partition {record_metadata.partition}")
    
    time.sleep(0.5)  # Pause to simulate human-paced traffic

producer.flush()
producer.close()
print("Simulation finished.")

Starting to send to 'exercice-events'...
[1/20] user_16 -> add_to_cart     | Partition 1
[2/20] user_43 -> add_to_cart     | Partition 1
[3/20] user_9 -> add_to_cart     | Partition 0
[4/20] user_12 -> view_page       | Partition 0
[5/20] user_40 -> view_page       | Partition 0
[6/20] user_41 -> login           | Partition 1
[7/20] user_19 -> login           | Partition 1
[8/20] user_39 -> click           | Partition 1
[9/20] user_26 -> add_to_cart     | Partition 1
[10/20] user_10 -> add_to_cart     | Partition 1
[11/20] user_14 -> login           | Partition 0
[12/20] user_7 -> purchase        | Partition 0
[13/20] user_34 -> add_to_cart     | Partition 1
[14/20] user_47 -> view_page       | Partition 1
[15/20] user_9 -> click           | Partition 0
[16/20] user_44 -> add_to_cart     | Partition 0
[17/20] user_36 -> add_to_cart     | Partition 1
[18/20] user_10 -> click           | Partition 1
[19/20] user_18 -> add_to_cart     | Partition 1
[20/20] user_5 -> click           

---

## Summary

In this notebook, you learned how to:
- Configure a robust **Kafka producer**
- **Serialize to JSON**
- Use **partitioning keys**
- **Send in batches**
- **Simulate a real-time stream**
- Use **callbacks**

### Next step
In the next notebook, we will learn how to consume messages with Spark.