# Big Data Lab - Initializing Kafka Streaming for the TAN API

This notebook sets up the data flow from the TAN API to Kafka. We collect real-time data on public transport stops in Nantes and on waiting times at those stops.

## Importing the required libraries

In [1]:
import requests
from kafka import KafkaProducer
import json
import time
from datetime import datetime
import logging

# Logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

## Function that sends data to Kafka

This function fetches data from the TAN API and sends each record to a Kafka topic.

In [3]:
def send_tan_to_kafka(topic, api_url, fields=None):
    """
    Fetch data from the TAN API and send it to Kafka.
    
    Args:
        topic (str): Kafka topic name
        api_url (str): TAN API URL
        fields (dict): Mapping of fields to rename (optional)
    """
    # Avoid a mutable default argument shared between calls
    fields = fields or {}

    # Kafka configuration
    kafka_config = {
        'bootstrap_servers': 'kafka1:9092',
    }

    # Initialize the Kafka producer
    producer = KafkaProducer(
        bootstrap_servers=kafka_config['bootstrap_servers'],
        value_serializer=lambda v: json.dumps(v).encode('utf-8')
    )

    try:
        # Fetch the data from the TAN API
        response = requests.get(api_url, timeout=10)

        if response.status_code == 200:
            data = response.json()
            count = 0
            
            # Handle both response shapes (list or single object)
            if isinstance(data, list):
                for entry in data:
                    # Apply the field mapping, if any
                    for field in fields:
                        if field in entry:
                            entry[fields[field]] = entry.pop(field)
                    
                    # Add a timestamp for stream processing
                    entry['timestamp'] = datetime.now().isoformat()
                    
                    # Send the record to Kafka
                    producer.send(topic, value=entry)
                    print(f"Sent: {entry}")
                    count += 1
            else:
                # Add the timestamp (no field mapping is applied to a single object)
                data['timestamp'] = datetime.now().isoformat()
                producer.send(topic, value=data)
                print(f"Sent: {data}")
                count = 1

            # Make sure every buffered message is actually sent
            producer.flush()
            print(f"Sent {count} records.")
        else:
            print(f"Failed to fetch data: {response.status_code}, {response.text}")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        producer.close()
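
The renaming and timestamping steps inside the function can also be isolated as a pure helper, which makes them easy to check without a broker. The sketch below is illustrative only: `prepare_record` and its `now` parameter are hypothetical names that do not appear in the notebook, and the UTC timestamp is a suggested variation on the local `datetime.now()` used above.

```python
from datetime import datetime, timezone


def prepare_record(entry, fields=None, now=None):
    """Return a renamed, timestamped copy of `entry` without mutating it."""
    fields = fields or {}
    # Rename keys listed in `fields`; keep every other key unchanged.
    record = {fields.get(key, key): value for key, value in entry.items()}
    # Timezone-aware UTC timestamp; `now` is injectable to ease testing.
    moment = now if now is not None else datetime.now(timezone.utc)
    record["timestamp"] = moment.isoformat()
    return record


raw = {"codeLieu": "CRQU", "libelle": "Place du Cirque", "distance": "61 m"}
mapping = {"codeLieu": "stop_code", "libelle": "stop_name", "distance": "stop_distance"}
fixed = datetime(2025, 3, 24, 21, 31, 27, tzinfo=timezone.utc)
prepared = prepare_record(raw, mapping, now=fixed)
print(prepared)
```

Returning a copy leaves the original API payload untouched, which helps if the same response is reused for several topics.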

## Collecting nearby stops

We retrieve the stops near a given point (for example, Place du Commerce in Nantes).

In [4]:
# Configuration for bus/tram stops
latitude = "47.21661"  # Place du Commerce
longitude = "-1.556754"
api_url_stops = f"https://open.tan.fr/ewp/arrets.json/{latitude}/{longitude}"

# Optional field mapping to harmonize the names
fields_stops = {
    "codeLieu": "stop_code",
    "libelle": "stop_name",
    "distance": "stop_distance"
}

# Send the stop data to Kafka
send_tan_to_kafka("tan_stops", api_url_stops, fields_stops)

2025-03-24 21:31:27,457 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:27,459 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:27,461 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:27,571 - INFO - Broker version identified as 2.5.0
2025-03-24 21:31:27,572 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2025-03-24 21:31:27,760 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:27,763 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:27,764 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:90

Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '2'}, {'numLigne': '23'}, {'numLigne': '2B'}, {'numLigne': '3B'}, {'numLigne': 'C1'}, {'numLigne': 'C2'}, {'numLigne': 'C6'}, {'numLigne': 'NC'}, {'numLigne': 'NN'}, {'numLigne': 'NO'}], 'stop_code': 'CRQU', 'stop_name': 'Place du Cirque', 'stop_distance': '61 m', 'timestamp': '2025-03-24T21:31:27.705372'}
Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '12'}, {'numLigne': '23'}, {'numLigne': '2B'}, {'numLigne': '3B'}, {'numLigne': 'C1'}, {'numLigne': 'C2'}, {'numLigne': 'C6'}, {'numLigne': 'NO'}], 'stop_code': 'CMAR', 'stop_name': 'Cirque - Marais', 'stop_distance': '163 m', 'timestamp': '2025-03-24T21:31:28.867297'}
Sent: {'ligne': [{'numLigne': '3'}], 'stop_code': 'BRTA', 'stop_name': 'Bretagne', 'stop_distance': '164 m', 'timestamp': '2025-03-24T21:31:28.867497'}
Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '23'}, {'numLigne': '26'}, {'numLigne': '54'}, {'numLigne': 'C1'}, {'numLigne': 'C3'}, {'numLigne': 'C6'}], 'stop_code':
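
In the records above, the distance arrives as a string such as '61 m', which is awkward to sort or filter downstream. A minimal sketch of a converter follows; `parse_distance_m` is a hypothetical helper, and it only handles the "N m" form visible in this output. Any other format the API might return is treated as unknown and yields `None`.

```python
import re


def parse_distance_m(text):
    """Convert a distance string such as '61 m' to an integer number of metres."""
    match = re.fullmatch(r"\s*(\d+)\s*m\s*", text)
    # Unknown formats return None rather than a guessed value.
    return int(match.group(1)) if match else None


for sample in ["61 m", "163 m", ""]:
    print(repr(sample), "->", parse_distance_m(sample))
```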

## Collecting waiting times

Next, we retrieve the waiting times at a specific stop (for example, Place du Cirque, code CRQU).

In [5]:
# Configuration for waiting times
stop_code = "CRQU"  # Place du Cirque
api_url_wait = f"https://open.tan.fr/ewp/tempsattente.json/{stop_code}"

# No specific field mapping for this endpoint
# Send the waiting-time data to Kafka
send_tan_to_kafka("tan_wait_times", api_url_wait)

2025-03-24 21:31:32,512 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:32,514 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:32,516 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:32,633 - INFO - Broker version identified as 2.5.0
2025-03-24 21:31:32,634 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2025-03-24 21:31:33,125 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:33,127 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:33,128 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:90

Sent: {'sens': 1, 'terminus': 'Orvault Grand Val', 'infotrafic': False, 'temps': 'proche', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU1'}, 'timestamp': '2025-03-24T21:31:33.069897'}
Sent: {'sens': 2, 'terminus': 'Espace Diderot', 'infotrafic': False, 'temps': '9mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU2'}, 'timestamp': '2025-03-24T21:31:33.471074'}
Sent: {'sens': 1, 'terminus': 'Orvault Grand Val', 'infotrafic': False, 'temps': '16mn', 'dernierDepart': 'false', 'tempsReel': 'false', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU1'}, 'timestamp': '2025-03-24T21:31:33.471153'}
Sent: {'sens': 2, 'terminus': 'Gare de Pont Rousseau', 'infotrafic': False, 'temps': '24mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU2'}, 'timestamp': '20
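
The waiting-time records above mix several encodings: `temps` is a string ('proche', '9mn', or empty elsewhere in this notebook's output), and `tempsReel` is the string 'true' or 'false' while `infotrafic` is a real boolean. The sketch below normalizes them. Both helpers are hypothetical, and mapping 'proche' to 0 minutes is an assumption about its meaning (an imminent departure); any format not seen in this output returns `None`.

```python
import re


def parse_wait_minutes(temps):
    """Map a `temps` string to minutes: 'proche' -> 0, '9mn' -> 9, otherwise None."""
    if temps == "proche":
        return 0  # Assumption: 'proche' means the vehicle is about to arrive.
    match = re.fullmatch(r"(\d+)\s*mn", temps)
    return int(match.group(1)) if match else None


def parse_flag(value):
    """Normalize booleans delivered either as bool or as 'true'/'false' strings."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


record = {"temps": "9mn", "tempsReel": "true", "infotrafic": False}
print(parse_wait_minutes(record["temps"]), parse_flag(record["tempsReel"]), parse_flag(record["infotrafic"]))
```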

## Collecting the timetable for a specific stop

Finally, we retrieve the detailed timetable for a given stop, line, and direction.

In [6]:
# Configuration for timetables
stop_code = "COMM"  # Commerce
line = "1"          # Line 1
direction = "1"     # Direction 1
api_url_schedule = f"https://open.tan.fr/ewp/horairesarret.json/{stop_code}/{line}/{direction}"

# Send the timetable data to Kafka
send_tan_to_kafka("tan_schedules", api_url_schedule)

2025-03-24 21:31:36,343 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:36,344 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:36,346 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:36,454 - INFO - Broker version identified as 2.5.0
2025-03-24 21:31:36,455 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2025-03-24 21:31:36,596 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 


Failed to fetch data: 500, <!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8" />
    <meta name="robots" content="noindex,nofollow,noarchive" />
    <title>An Error Occurred: Internal Server Error</title>
    <style>body { background-color: #fff; color: #222; font: 16px/1.5 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif; margin: 0; }
.container { margin: 30px; max-width: 600px; }
h1 { color: #dc3545; font-size: 24px; }
h2 { font-size: 18px; }</style>
</head>
<body>
<div class="container">
    <h1>Oops! An Error Occurred</h1>
    <h2>The server returned a "500 Internal Server Error".</h2>

    <p>
        Something is broken. Please let us know what you were doing when this error occurred.
        We will fix it as soon as possible. Sorry for any inconvenience caused.
    </p>
</div>
</body>
</html>


## Continuous data collection

For real-time analysis, we can collect the data at regular intervals.

In [7]:
def continuous_data_collection(duration_minutes=5, interval_seconds=60):
    """
    Collect data continuously for a fixed duration.
    
    Args:
        duration_minutes (int): Collection duration in minutes (0 = run indefinitely)
        interval_seconds (int): Interval between collections in seconds
    """
    # Points of interest in Nantes
    locations = [
        ("47.21661", "-1.556754"),  # Place du Commerce
        ("47.2175", "-1.5419")      # Gare SNCF (railway station)
    ]
    
    # Important stops
    stop_codes = ["CRQU", "COMM", "PIRA1"]
    
    # Timetables of the main lines
    schedules = [
        ("COMM", "1", "1"),  # Commerce, line 1, direction 1
        ("CRQU", "2", "1")   # Place du Cirque, line 2, direction 1
    ]
    
    # Compute the number of iterations
    max_iterations = float('inf') if duration_minutes == 0 else (duration_minutes * 60) // interval_seconds
    iteration = 0
    
    try:
        start_time = time.time()
        while iteration < max_iterations:
            print(f"\n--- Iteration {iteration + 1} ---")
            
            # Collect nearby stops for each location
            # (fields_stops is the mapping defined in an earlier cell)
            for lat, lon in locations:
                api_url_stops = f"https://open.tan.fr/ewp/arrets.json/{lat}/{lon}"
                send_tan_to_kafka("tan_stops", api_url_stops, fields_stops)
            
            # Collect waiting times for each stop
            for stop in stop_codes:
                api_url_wait = f"https://open.tan.fr/ewp/tempsattente.json/{stop}"
                send_tan_to_kafka("tan_wait_times", api_url_wait)
            
            # Collect timetables for each configuration
            for stop, line, direction in schedules:
                api_url_schedule = f"https://open.tan.fr/ewp/horairesarret.json/{stop}/{line}/{direction}"
                send_tan_to_kafka("tan_schedules", api_url_schedule)
            
            # Wait before the next collection
            iteration += 1
            elapsed = time.time() - start_time
            print(f"Collection running for {elapsed:.1f} seconds. Waiting {interval_seconds} seconds...")
            time.sleep(interval_seconds)
            
        print(f"Collection finished after {duration_minutes} minutes")
    except KeyboardInterrupt:
        print("\nCollection interrupted by user")
    except Exception as e:
        print(f"\nError during collection: {e}")
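
The iteration count in the function above comes from an integer division, so the interval does not always divide the duration evenly. The sketch below reproduces that arithmetic in isolation; `planned_iterations` is a hypothetical name introduced only for this illustration.

```python
import math


def planned_iterations(duration_minutes, interval_seconds):
    """Same arithmetic as the loop bound above: 0 minutes means no upper limit."""
    if duration_minutes == 0:
        return math.inf
    return (duration_minutes * 60) // interval_seconds


print(planned_iterations(5, 60))   # 300 // 60
print(planned_iterations(5, 90))   # 300 // 90, the remainder is dropped
print(planned_iterations(1, 90))   # 60 // 90, the loop body never runs
print(planned_iterations(0, 60))   # unbounded
```

Note that an interval longer than the total duration yields zero iterations, and that the loop also sleeps after its last iteration, so the wall-clock time is roughly the iteration count multiplied by the interval plus the time spent collecting.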

Run this cell to start a continuous collection lasting 5 minutes, with a one-minute interval between collections.

In [8]:
# Continuous collection for 5 minutes (60-second interval)
continuous_data_collection(duration_minutes=5, interval_seconds=60)

2025-03-24 21:31:42,729 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:42,731 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:42,733 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:42,842 - INFO - Broker version identified as 2.5.0
2025-03-24 21:31:42,843 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2025-03-24 21:31:42,922 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:42,923 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:42,924 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:90


--- Iteration 1 ---
Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '2'}, {'numLigne': '23'}, {'numLigne': '2B'}, {'numLigne': '3B'}, {'numLigne': 'C1'}, {'numLigne': 'C2'}, {'numLigne': 'C6'}, {'numLigne': 'NC'}, {'numLigne': 'NN'}, {'numLigne': 'NO'}], 'stop_code': 'CRQU', 'stop_name': 'Place du Cirque', 'stop_distance': '61 m', 'timestamp': '2025-03-24T21:31:42.919389'}
Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '12'}, {'numLigne': '23'}, {'numLigne': '2B'}, {'numLigne': '3B'}, {'numLigne': 'C1'}, {'numLigne': 'C2'}, {'numLigne': 'C6'}, {'numLigne': 'NO'}], 'stop_code': 'CMAR', 'stop_name': 'Cirque - Marais', 'stop_distance': '163 m', 'timestamp': '2025-03-24T21:31:42.919680'}
Sent: {'ligne': [{'numLigne': '3'}], 'stop_code': 'BRTA', 'stop_name': 'Bretagne', 'stop_distance': '164 m', 'timestamp': '2025-03-24T21:31:42.919873'}
Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '23'}, {'numLigne': '26'}, {'numLigne': '54'}, {'numLigne': 'C1'}, {'numLigne': 'C3'}, {'numLigne':

2025-03-24 21:31:42,942 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:42,946 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:42,947 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:42,948 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:43,052 - INFO - Broker version identified as 2.5.0
2025-03-24 21:31:43,053 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup


Sent 10 records.


2025-03-24 21:31:43,156 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:43,158 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:43,158 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:43,182 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:43,186 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:43,186 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:43,188 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Co

Sent: {'ligne': [{'numLigne': '1'}, {'numLigne': '1B'}], 'stop_code': 'GSNO', 'stop_name': 'Gare Nord - Jardin des Plantes', 'stop_distance': '47 m', 'timestamp': '2025-03-24T21:31:43.154778'}
Sent: {'ligne': [{'numLigne': '5'}, {'numLigne': '54'}, {'numLigne': 'C2'}, {'numLigne': 'C3'}, {'numLigne': 'NA'}, {'numLigne': 'NS'}], 'stop_code': 'GSSU', 'stop_name': 'Gare Sud', 'stop_distance': '176 m', 'timestamp': '2025-03-24T21:31:43.155088'}
Sent: {'ligne': [{'numLigne': '11'}, {'numLigne': '12'}, {'numLigne': 'C1'}, {'numLigne': 'C6'}], 'stop_code': 'TBCH', 'stop_name': 'Trébuchet', 'stop_distance': '379 m', 'timestamp': '2025-03-24T21:31:43.155192'}
Sent: {'ligne': [{'numLigne': '54'}, {'numLigne': 'C2'}, {'numLigne': 'C3'}, {'numLigne': 'NA'}, {'numLigne': 'NS'}], 'stop_code': 'LUNI', 'stop_name': 'Lieu Unique', 'stop_distance': '379 m', 'timestamp': '2025-03-24T21:31:43.155279'}
Sent: {'ligne': [{'numLigne': '1'}, {'numLigne': '1B'}], 'stop_code': 'MNFA', 'stop_name': 'Manufacture',

2025-03-24 21:31:43,809 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:43,819 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:43,823 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:43,836 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:43,839 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:43,839 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:43,841 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Co

Sent: {'sens': 1, 'terminus': 'Orvault Grand Val', 'infotrafic': False, 'temps': 'proche', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU1'}, 'timestamp': '2025-03-24T21:31:43.776969'}
Sent: {'sens': 2, 'terminus': 'Espace Diderot', 'infotrafic': False, 'temps': '9mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU2'}, 'timestamp': '2025-03-24T21:31:43.779689'}
Sent: {'sens': 1, 'terminus': 'Orvault Grand Val', 'infotrafic': False, 'temps': '16mn', 'dernierDepart': 'false', 'tempsReel': 'false', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU1'}, 'timestamp': '2025-03-24T21:31:43.779891'}
Sent: {'sens': 2, 'terminus': 'Gare de Pont Rousseau', 'infotrafic': False, 'temps': '24mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '2', 'typeLigne': 1}, 'arret': {'codeArret': 'CRQU2'}, 'timestamp': '20

2025-03-24 21:31:44,493 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:44,494 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:44,495 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:44,507 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:44,510 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:44,511 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:44,512 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Co

Sent: {'sens': 2, 'terminus': 'Commerce', 'infotrafic': True, 'temps': '4mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '1', 'typeLigne': 1}, 'arret': {'codeArret': 'COMC1'}, 'timestamp': '2025-03-24T21:31:44.490824'}
Sent: {'sens': 1, 'terminus': 'Jamet', 'infotrafic': True, 'temps': '7mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': '1', 'typeLigne': 1}, 'arret': {'codeArret': 'COMB2'}, 'timestamp': '2025-03-24T21:31:44.491038'}
Sent: {'sens': 2, 'terminus': 'Commerce', 'infotrafic': True, 'temps': '11mn', 'dernierDepart': 'false', 'tempsReel': 'false', 'ligne': {'numLigne': '1', 'typeLigne': 1}, 'arret': {'codeArret': 'COMC1'}, 'timestamp': '2025-03-24T21:31:44.491103'}
Sent: {'sens': 1, 'terminus': 'François Mitterrand', 'infotrafic': True, 'temps': '17mn', 'dernierDepart': 'false', 'tempsReel': 'false', 'ligne': {'numLigne': '1', 'typeLigne': 1}, 'arret': {'codeArret': 'COMB2'}, 'timestamp': '2025-03-24T21:31:44.491156'}
Sent: {'

2025-03-24 21:31:44,988 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:44,989 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:44,990 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:45,013 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:45,020 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:45,021 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:45,023 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Co

Sent: {'sens': 1, 'terminus': 'Basse-Goulaine', 'infotrafic': False, 'temps': '19mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': 'C9', 'typeLigne': 3}, 'arret': {'codeArret': 'PIRA1'}, 'timestamp': '2025-03-24T21:31:44.984898'}
Sent: {'sens': 1, 'terminus': 'Chalonges', 'infotrafic': False, 'temps': '49mn', 'dernierDepart': 'false', 'tempsReel': 'true', 'ligne': {'numLigne': 'C9', 'typeLigne': 3}, 'arret': {'codeArret': 'PIRA1'}, 'timestamp': '2025-03-24T21:31:44.985469'}
Sent: {'sens': 1, 'terminus': 'La Herdrie', 'infotrafic': False, 'temps': '', 'dernierDepart': 'false', 'tempsReel': 'false', 'ligne': {'numLigne': '27', 'typeLigne': 3}, 'arret': {'codeArret': 'PIRA1'}, 'timestamp': '2025-03-24T21:31:44.985685'}
Sent: {'sens': 1, 'terminus': 'Chalonges', 'infotrafic': False, 'temps': '', 'dernierDepart': 'false', 'tempsReel': 'false', 'ligne': {'numLigne': '27', 'typeLigne': 3}, 'arret': {'codeArret': 'PIRA1'}, 'timestamp': '2025-03-24T21:31:44.985920'}
Sent

2025-03-24 21:31:45,257 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:31:45,261 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:31:45,262 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:31:45,264 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:31:45,370 - INFO - Broker version identified as 2.5.0
2025-03-24 21:31:45,371 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup


Failed to fetch data: 500, <!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8" />
    <meta name="robots" content="noindex,nofollow,noarchive" />
    <title>An Error Occurred: Internal Server Error</title>
    <style>body { background-color: #fff; color: #222; font: 16px/1.5 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif; margin: 0; }
.container { margin: 30px; max-width: 600px; }
h1 { color: #dc3545; font-size: 24px; }
h2 { font-size: 18px; }</style>
</head>
<body>
<div class="container">
    <h1>Oops! An Error Occurred</h1>
    <h2>The server returned a "500 Internal Server Error".</h2>

    <p>
        Something is broken. Please let us know what you were doing when this error occurred.
        We will fix it as soon as possible. Sorry for any inconvenience caused.
    </p>
</div>
</body>
</html>


2025-03-24 21:31:45,512 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 


Failed to fetch data: 500, <!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8" />
    <meta name="robots" content="noindex,nofollow,noarchive" />
    <title>An Error Occurred: Internal Server Error</title>
    <style>body { background-color: #fff; color: #222; font: 16px/1.5 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif; margin: 0; }
.container { margin: 30px; max-width: 600px; }
h1 { color: #dc3545; font-size: 24px; }
h2 { font-size: 18px; }</style>
</head>
<body>
<div class="container">
    <h1>Oops! An Error Occurred</h1>
    <h2>The server returned a "500 Internal Server Error".</h2>

    <p>
        Something is broken. Please let us know what you were doing when this error occurred.
        We will fix it as soon as possible. Sorry for any inconvenience caused.
    </p>
</div>
</body>
</html>
Collection running for 2.8 seconds. Waiting 60 seconds...

Collection interrupted by user
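
The log above shows a broker connection being opened and closed for almost every request, because `send_tan_to_kafka` creates and closes its own `KafkaProducer` on each call. One possible alternative is to create the producer once and pass it in. The sketch below illustrates the idea with a stand-in object so it runs without a broker; `send_records` and `RecordingProducer` are hypothetical names, and only the `send(topic, value=...)` and `flush()` methods of the real producer are assumed.

```python
def send_records(producer, topic, records):
    """Send every record through an existing producer, flush once, return the count."""
    count = 0
    for record in records:
        producer.send(topic, value=record)
        count += 1
    # A single flush per batch instead of one producer per call.
    producer.flush()
    return count


class RecordingProducer:
    """Minimal stand-in exposing the send/flush methods used above."""

    def __init__(self):
        self.sent = []
        self.flushes = 0

    def send(self, topic, value=None):
        self.sent.append((topic, value))

    def flush(self):
        self.flushes += 1


demo = RecordingProducer()
sent = send_records(demo, "tan_stops", [{"stop_code": "CRQU"}, {"stop_code": "CMAR"}])
print(sent, demo.flushes)
```

In the notebook, the same function could receive a real `KafkaProducer` built once before the collection loop and closed after it.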


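## Checking the data in Kafka

Let's check that the data reached Kafka using a simple consumer. Because the consumer below is created with auto_offset_reset="latest" and no consumer group, it only sees messages produced after it starts; records sent by earlier cells are not displayed.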

In [9]:
from kafka import KafkaConsumer
import json

def check_kafka_topic(topic, max_messages=5):
    """
    Check the messages in a Kafka topic.
    
    Args:
        topic (str): Name of the topic to check
        max_messages (int): Maximum number of messages to retrieve
    """
    consumer = None
    try:
        # Initialize the Kafka consumer
        consumer = KafkaConsumer(
            topic,
            bootstrap_servers="kafka1:9092",
            value_deserializer=lambda m: json.loads(m.decode('utf-8')),
            auto_offset_reset="latest",  # Only messages produced after the consumer starts
            consumer_timeout_ms=5000  # Stop iterating after 5 seconds without a message
        )
        
        print(f"Checking topic '{topic}' (max {max_messages} messages)...")
        count = 0
        for message in consumer:
            print(f"Message {count + 1}: {message.value}")
            count += 1
            if count >= max_messages:
                break
                
        if count == 0:
            print(f"No new messages in topic '{topic}'")
        else:
            print(f"Retrieved {count} messages from topic '{topic}'")
    except Exception as e:
        print(f"Error while reading the Kafka topic: {e}")
    finally:
        # Close the consumer even if reading fails
        if consumer is not None:
            consumer.close()

In [10]:
# Checking the topics
check_kafka_topic("tan_stops", max_messages=3)

2025-03-24 20:47:32,965 - INFO - <BrokerConnection client_id=kafka-python-2.1.3.dev, node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 20:47:32,985 - INFO - Broker version identified as 2.6
2025-03-24 20:47:32,986 - INFO - <BrokerConnection client_id=kafka-python-2.1.3.dev, node_id=bootstrap-0 host=kafka1:9092 <checking_api_versions_recv> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 20:47:32,991 - INFO - Updating subscribed topics to: ('tan_stops',)
2025-03-24 20:47:32,995 - INFO - Updated partition assignment: [TopicPartition(topic='tan_stops', partition=0), TopicPartition(topic='tan_stops', partition=1), TopicPartition(topic='tan_stops', partition=2)]
2025-03-24 20:47:32,997 - INFO - <BrokerConnection client_id=kafka-python-2.1.3.dev, node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 20:47:

Checking topic 'tan_stops' (max 3 messages)...


2025-03-24 20:47:37,996 - INFO - <BrokerConnection client_id=kafka-python-2.1.3.dev, node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 20:47:37,997 - INFO - Fetch to node 1 failed: Cancelled: <BrokerConnection client_id=kafka-python-2.1.3.dev, node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>


No new messages in topic 'tan_stops'


In [10]:
check_kafka_topic("tan_wait_times", max_messages=3)

2025-03-24 21:32:02,660 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:32:02,664 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:32:02,669 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:32:02,783 - INFO - Broker version identified as 2.5.0
2025-03-24 21:32:02,784 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2025-03-24 21:32:02,789 - INFO - Updating subscribed topics to: ('tan_wait_times',)
2025-03-24 21:32:02,794 - INFO - Updated partition assignment: [TopicPartition(topic='tan_wait_times', partition=0)]
2025-03-24 21:32:02,796 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:32:02,798 - 

Checking topic 'tan_wait_times' (max 3 messages)...


2025-03-24 21:32:07,800 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:32:07,803 - ERROR - Fetch to node 1 failed: Cancelled: <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>


No new messages in topic 'tan_wait_times'


In [11]:
check_kafka_topic("tan_schedules", max_messages=3)

2025-03-24 21:32:07,824 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:32:07,829 - INFO - Probing node bootstrap-0 broker version
2025-03-24 21:32:07,831 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:32:07,939 - INFO - Broker version identified as 2.5.0
2025-03-24 21:32:07,940 - INFO - Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2025-03-24 21:32:07,943 - INFO - Updating subscribed topics to: ('tan_schedules',)


Checking topic 'tan_schedules' (max 3 messages)...


2025-03-24 21:32:08,098 - INFO - Updated partition assignment: [TopicPartition(topic='tan_schedules', partition=0)]
2025-03-24 21:32:08,099 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: connecting to kafka1:9092 [('172.21.0.6', 9092) IPv4]
2025-03-24 21:32:08,100 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connecting> [IPv4 ('172.21.0.6', 9092)]>: Connection complete.
2025-03-24 21:32:08,101 - INFO - <BrokerConnection node_id=bootstrap-0 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:32:12,951 - INFO - <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>: Closing connection. 
2025-03-24 21:32:12,954 - ERROR - Fetch to node 1 failed: Cancelled: <BrokerConnection node_id=1 host=kafka1:9092 <connected> [IPv4 ('172.21.0.6', 9092)]>


No new messages in topic 'tan_schedules'
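
Each check above reports no new messages. For `tan_stops` and `tan_wait_times` this is the expected effect of `auto_offset_reset="latest"`: a consumer without committed offsets starts at the end of the log and waits for new records. To read what earlier cells already produced, one option is to start from the earliest offset. The sketch below only builds the keyword arguments so that it runs without a broker; `consumer_settings` is a hypothetical helper, while the keys are existing `KafkaConsumer` parameters in kafka-python.

```python
def consumer_settings(from_beginning=True, timeout_ms=5000):
    """Build KafkaConsumer keyword arguments for a one-off inspection of a topic."""
    return {
        "bootstrap_servers": "kafka1:9092",
        # 'earliest' replays the retained log; 'latest' waits for new records only.
        "auto_offset_reset": "earliest" if from_beginning else "latest",
        # Do not commit offsets: this is a read-only check.
        "enable_auto_commit": False,
        # Stop iterating after this many milliseconds without a message.
        "consumer_timeout_ms": timeout_ms,
    }


settings = consumer_settings()
print(settings["auto_offset_reset"])
```

These settings could then be passed as `KafkaConsumer(topic, value_deserializer=..., **consumer_settings())`. The `tan_schedules` topic would presumably still be empty, since every timetable request shown in this notebook returned an HTTP 500.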
