# Exercise: Building a Kafka Producer for Weather Data

In this exercise, you will create a Kafka producer script that retrieves weather data from the OpenWeatherMap API for a specified location at a fixed interval (every 10 minutes by default) and publishes it to a Kafka topic. This project will help you understand how to integrate external APIs with Kafka for real-time data streaming.

## Table of Contents

1. Prerequisites
2. Setting Up the Environment
3. Installing Required Packages
4. Obtaining an OpenWeatherMap API Key
5. Setting Up Kafka
6. Creating the Kafka Producer Script
7. Running the Kafka Producer
8. Verifying the Produced Messages
9. Conclusion

## 1. Prerequisites

Before you begin, ensure you have the following:

- **Python 3.7 or higher**: Installed on your machine. You can download it from Python's official website.

- **Kafka Cluster**: A running Kafka instance. You can set up a local Kafka environment using Confluent Platform or use a managed Kafka service.

## 2. Setting Up the Environment

It's good practice to create a virtual environment for your project to manage dependencies.

### 2.1 Create a Project Directory

`# Create a new directory for the project`

`mkdir kafka_weather_producer`

`cd kafka_weather_producer`

In [None]:
# Alternatively, using Jupyter Notebook cell magic
%mkdir kafka_weather_producer
%cd kafka_weather_producer

### 2.2 Initialize and Activate a Virtual Environment

If you're working in this notebook, you have presumably already created an environment in order to run the cell above. Either way, here are the commands for creating a virtual environment:

`# Create a virtual environment named 'venv'`

`python3 -m venv venv`

`# Activate the virtual environment`

`source venv/bin/activate`

*Note: On Windows, activate the virtual environment using `venv\Scripts\activate`.*

## 3. Installing Required Packages

Install the necessary Python packages using `pip`.

In [None]:
%pip install confluent_kafka requests

### 3.1 Verify Installation

In [None]:
%pip list

*Ensure that `confluent_kafka` and `requests` are listed.*

## 4. Obtaining an OpenWeatherMap API Key

**Sign Up**: Go to OpenWeatherMap Sign Up and create an account.

**API Key**: After verifying your email, navigate to the API keys section to retrieve your API key.

*Keep your API key secure and do not share it publicly.*
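
One way to keep the key out of the notebook itself is to read it from an environment variable. The sketch below assumes a variable named `OPENWEATHERMAP_API_KEY` (a name chosen here for illustration, not something OpenWeatherMap requires) and a hypothetical helper called `load_api_key`.

```python
import os


def load_api_key(env=None, var_name="OPENWEATHERMAP_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name; use whichever variable you exported.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

After exporting the variable in your shell, `load_api_key()` can supply the value that the configuration cell in Section 6.1 would otherwise hard-code.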

## 5. Setting Up Kafka

Unlike the traditional Kafka setup that relies on ZooKeeper, KRaft mode allows Kafka to handle its own metadata management, simplifying the architecture. In this section, we'll set up Kafka in KRaft mode using Docker Compose, ensuring that all required configurations, including a unique `CLUSTER_ID`, are properly addressed.

### 5.1 Generating a Unique `CLUSTER_ID`

A CLUSTER_ID is essential for Kafka in KRaft mode to uniquely identify the cluster. We'll generate a UUID (Universally Unique Identifier) to serve as the CLUSTER_ID.

In [None]:
import uuid

# Generate a UUID for CLUSTER_ID
cluster_id = str(uuid.uuid4())
print(f"Generated CLUSTER_ID: {cluster_id}")

*Note: The actual UUID will be different each time you run the code.*
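
Kafka's own `kafka-storage random-uuid` tool prints cluster IDs as 22-character URL-safe base64 strings rather than hyphenated UUIDs. If your broker rejects the hyphenated form, the same kind of random UUID can be encoded in that style. This is a minimal sketch; `generate_kraft_cluster_id` is a hypothetical helper name, not part of any Kafka library.

```python
import base64
import uuid


def generate_kraft_cluster_id():
    """Encode a random UUID as URL-safe base64 without '=' padding."""
    while True:
        raw = uuid.uuid4().bytes  # 16 random bytes
        encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
        # Skip IDs with a leading '-', which CLI tools can mistake for a flag
        if not encoded.startswith("-"):
            return encoded


cluster_id = generate_kraft_cluster_id()
print(f"Generated CLUSTER_ID: {cluster_id}")
```

The printed value can be pasted into the `CLUSTER_ID` entry of `docker-compose.yml` in the same way as the hyphenated UUID.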

### 5.2 Creating the Docker Compose File

We'll create a `docker-compose.yml` file that sets up Kafka in KRaft mode using the generated CLUSTER_ID. This configuration ensures that Kafka operates without ZooKeeper.

In [None]:
%%writefile docker-compose.yml
version: '3.8'

services:
  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: kafka_kraft
    ports:
      - "9092:9092"  # Kafka listener
      - "9093:9093"  # Controller listener
    environment:
      # Unique Cluster ID (Replace with your generated UUID)
      CLUSTER_ID: "549a81cb-fe1e-4453-ba95-8619625cfe10"
      
      # KRaft Mode Configuration
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka_kraft:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      
      # Topic Configuration
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      
      # Additional Configuration (Optional)
      KAFKA_LOG_DIRS: /var/lib/kafka/data
    volumes:
      - kafka_data:/var/lib/kafka/data

volumes:
  kafka_data:



**Important**:

Replace the `CLUSTER_ID` value shown in the file above ("549a81cb-fe1e-4453-ba95-8619625cfe10") with the CLUSTER_ID you generated in Step 5.1.

**Key Configuration Parameters**:

`CLUSTER_ID`: Your unique Kafka cluster identifier.

`KAFKA_NODE_ID`: Unique identifier for the Kafka broker within the cluster. Since we're setting up a single broker, this is set to 1.

`KAFKA_PROCESS_ROLES`: Defines the roles of the Kafka process. Here, it's set to both broker and controller.

`KAFKA_LISTENERS`: Specifies the endpoints for clients and controllers.

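`KAFKA_ADVERTISED_LISTENERS`: The address the broker tells clients to connect to. `localhost:9092` works for clients running on the Docker host, such as this notebook.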

`KAFKA_CONTROLLER_QUORUM_VOTERS`: Defines the controllers in the quorum. Format: <node.id>@<hostname>:<port>.

`KAFKA_CONTROLLER_LISTENER_NAMES`: Specifies which listener the controllers use.

`KAFKA_INTER_BROKER_LISTENER_NAME`: Defines which listener brokers use to communicate with each other.

`volumes`: Mounts a Docker volume to persist Kafka data.

### 5.3 Starting the Kafka Broker

With the `docker-compose.yml` configured, we'll start the Kafka broker in KRaft mode.

In [None]:
%%bash
docker compose up -d

### 5.4 Verifying Kafka is Running

In [None]:
%%bash
docker ps
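
Besides `docker ps`, a quick way to check from Python that the published port accepts connections is a plain TCP probe. This is only a sketch: `port_is_open` is a hypothetical helper, and a successful connection shows that something is listening on the port, not that Kafka has finished initializing (the logs in the next step confirm that).

```python
import socket


def port_is_open(host="localhost", port=9092, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False
```

Calling `port_is_open()` with the defaults probes `localhost:9092`, the listener published in `docker-compose.yml`.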

### 5.5 Checking Kafka Logs for Successful Initialization

To confirm that Kafka has initialized correctly in KRaft mode with the provided CLUSTER_ID, inspect the container logs.

In [None]:
%%bash
# You'll need to stop this cell manually once you confirm initialization is complete
docker logs -f kafka_kraft

### 5.6 Creating the Kafka Topic

We'll create a Kafka topic named weather to which our producer will send messages.

In [None]:
%%bash
docker exec kafka_kraft kafka-topics --create --topic weather --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

### 5.7 Verifying Topic Creation

Confirm that the weather topic has been successfully created.

In [None]:
%%bash
docker exec kafka_kraft kafka-topics --list --bootstrap-server localhost:9092

*You should see weather listed among the topics.*

## 6. Creating the Kafka Producer Script

We'll create a Python script named weather_producer.py that fetches weather data and sends it to a Kafka topic.

### 6.1 Define Configuration Variables

In [29]:
# weather_producer.py
import time
import json
import requests
from confluent_kafka import Producer

# Configuration Parameters

# OpenWeatherMap API Configuration
OPENWEATHERMAP_API_KEY = 'removed_for_security'         # Replace with your API key
CITY_NAME = 'London'                                    # Replace with your city
COUNTRY_CODE = 'UK'                                     # Replace with your country code
UNITS = 'metric'                                        # 'metric' or 'imperial'

# Kafka Configuration
KAFKA_BOOTSTRAP_SERVERS = 'localhost:9092'  # Replace if different
KAFKA_TOPIC = 'weather'                      # Kafka topic name

# OpenWeatherMap API Endpoint
OWM_ENDPOINT = 'https://api.openweathermap.org/data/2.5/weather'

# Fetch Interval (in seconds)
FETCH_INTERVAL = 600  # 600 seconds = 10 minutes
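
Section 8.1 mentions a limit of 1000 requests per day. A rough way to see how a chosen `FETCH_INTERVAL` compares with that figure is to count how many intervals fit into 24 hours. The helper name `requests_per_day` is hypothetical, and the estimate ignores the time each request itself takes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(fetch_interval_seconds):
    """Approximate number of API calls in 24 hours at one call per interval."""
    return SECONDS_PER_DAY // fetch_interval_seconds


print(requests_per_day(600))  # 144
print(requests_per_day(60))   # 1440
```

With the default of 600 seconds the producer stays well below 1000 calls per day, while a 60-second interval would exceed it.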

### 6.2 Helper Functions

In [30]:
def get_weather_data(api_key, city, country, units='metric'):
    """
    Fetches weather data from OpenWeatherMap API for the specified location.
    
    :param api_key: API key for OpenWeatherMap
    :param city: City name
    :param country: Country code (e.g., 'UK')
    :param units: Units of measurement ('metric' or 'imperial')
    :return: Dictionary containing weather data or None if failed
    """
    params = {
        'q': f'{city},{country}',
        'appid': api_key,
        'units': units
    }
    try:
        # A timeout keeps the producer loop from hanging on a stalled request
        response = requests.get(OWM_ENDPOINT, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
        
        # Extract desired weather stats
        weather = {
            'city': data.get('name'),
            'country': data.get('sys', {}).get('country'),
            'temperature': data.get('main', {}).get('temp'),
            'humidity': data.get('main', {}).get('humidity'),
            'weather_description': (data.get('weather') or [{}])[0].get('description'),
            'timestamp': data.get('dt')  # Unix timestamp
        }
        return weather
    except requests.exceptions.RequestException as e:
        print(f"[ERROR] Failed to fetch weather data: {e}")
        return None
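
The field extraction inside `get_weather_data` can be tried offline by separating it into its own function and feeding it a hand-written payload. In the sketch below, `extract_weather` is a hypothetical helper and the sample values are invented for illustration; they are not real API output.

```python
def extract_weather(data):
    """Pick the fields the producer publishes out of a decoded API response."""
    conditions = data.get('weather') or [{}]  # tolerate a missing or empty list
    main = data.get('main', {})
    return {
        'city': data.get('name'),
        'country': data.get('sys', {}).get('country'),
        'temperature': main.get('temp'),
        'humidity': main.get('humidity'),
        'weather_description': conditions[0].get('description'),
        'timestamp': data.get('dt'),  # Unix timestamp
    }


# Hand-written payload: the values are invented, not real API output
sample = {
    'name': 'London',
    'sys': {'country': 'GB'},
    'main': {'temp': 12.3, 'humidity': 81},
    'weather': [{'description': 'light rain'}],
    'dt': 1700000000,
}

print(extract_weather(sample))
print(extract_weather({}))  # every field falls back to None
```

Keeping the parsing separate from the HTTP call makes it straightforward to check how missing fields are handled before any messages reach Kafka.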


### 6.3 Delivery Callback Function

In [31]:
def delivery_callback(err, msg):
    """
    Callback function called once for each message produced to indicate delivery result.
    
    :param err: Error information, if any
    :param msg: The message produced
    """
    if err:
        print(f"[ERROR] Message delivery failed: {err}")
    else:
        print(f"[SUCCESS] Message delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

### 6.4 Main Producer Function

In [32]:
def main():
    # Kafka Producer Configuration
    config = {
        'bootstrap.servers': KAFKA_BOOTSTRAP_SERVERS,
        'acks': 'all'  # Wait for all in-sync replicas to acknowledge
    }

    # Create Producer instance
    producer = Producer(config)

    print(f"Starting Kafka producer for weather data: {CITY_NAME}, {COUNTRY_CODE}")
    print(f"Producing to topic '{KAFKA_TOPIC}' every {FETCH_INTERVAL} seconds.\n")

    try:
        while True:
            # Fetch weather data
            weather_data = get_weather_data(
                OPENWEATHERMAP_API_KEY,
                CITY_NAME,
                COUNTRY_CODE,
                UNITS
            )
            
            if weather_data:
                # Serialize weather data to JSON string
                weather_json = json.dumps(weather_data)
                
                # Use city name as the key, falling back to the configured city
                key = (weather_data.get('city') or CITY_NAME).encode('utf-8')
                
                # Produce message to Kafka
                producer.produce(
                    topic=KAFKA_TOPIC,
                    key=key,
                    value=weather_json,
                    callback=delivery_callback
                )
                
                # Trigger delivery report callbacks
                producer.poll(0)
            
            # Wait for the specified interval before next fetch
            time.sleep(FETCH_INTERVAL)

    except KeyboardInterrupt:
        print("\n[INFO] Producer interrupted by user. Flushing messages...")

    except Exception as e:
        print(f"[ERROR] An unexpected error occurred: {e}")

    finally:
        # Flush any remaining messages
        producer.flush()
        print("[INFO] Producer has been shut down.")


## 7. Running the Kafka Producer

If you're running this all within the notebook, edit the configuration cell in Section 6.1 with your `OPENWEATHERMAP_API_KEY`, `CITY_NAME` and `COUNTRY_CODE`, and ensure that `KAFKA_BOOTSTRAP_SERVERS` points to the right address for your Kafka broker. Run the cell below to start producing. If you'd like, you can copy all the above cells (as well as the `main` execution below) into a script and execute that instead.

In [None]:
if __name__ == '__main__':
    main()

## 8. Verifying the Produced Messages

Now that the producer is running, you can confirm that it is successfully sending messages to the Kafka topic by reading them back with the Kafka console consumer.

### 8.1 Start a Kafka Console Consumer

Run the command below in your terminal to see the messages published to the weather topic. OpenWeatherMap updates its weather data roughly every 10 minutes, so you'll need to wait at least that long to see new readings. You can change the interval between requests, but be sure not to exceed 1000 requests in a day, or you may be charged.

Bash command:
`docker exec -it kafka_kraft kafka-console-consumer --bootstrap-server localhost:9092 --topic weather --from-beginning`
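
The same check can also be done from Python with the `confluent_kafka` `Consumer`. The following is a minimal sketch rather than a required part of the exercise: the function names, the group name `weather-verifier`, and the `max_messages` limit are assumptions made for illustration. Setting `auto.offset.reset` to `earliest` plays a role similar to `--from-beginning` for a consumer group that has no committed offsets.

```python
import json


def decode_weather_message(value):
    """Turn the raw bytes of a Kafka message value back into a dict."""
    return json.loads(value.decode('utf-8'))


def consume_weather(bootstrap_servers='localhost:9092', topic='weather', max_messages=5):
    """Print up to max_messages records from the topic, then stop."""
    # Imported lazily so decode_weather_message works without the package
    from confluent_kafka import Consumer

    consumer = Consumer({
        'bootstrap.servers': bootstrap_servers,
        'group.id': 'weather-verifier',   # assumed group name
        'auto.offset.reset': 'earliest',  # start from the oldest record
    })
    consumer.subscribe([topic])
    seen = 0
    try:
        while seen < max_messages:
            msg = consumer.poll(1.0)  # wait up to one second for a record
            if msg is None:
                continue
            if msg.error():
                print(f"[ERROR] {msg.error()}")
                continue
            print(msg.key(), decode_weather_message(msg.value()))
            seen += 1
    finally:
        consumer.close()
```

Calling `consume_weather()` keeps polling until it has printed `max_messages` records, so interrupt it manually if the topic holds fewer than that.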


## 9. Conclusion

In this exercise, you've successfully:
- Set up a Python environment with necessary dependencies.

- Configured and started a Kafka broker in KRaft mode using Docker.

- Created a Kafka producer script that fetches weather data from the OpenWeatherMap API at a fixed interval (every 10 minutes by default).

- Published the fetched data to a Kafka topic.

- Verified the data flow using a Kafka console consumer.

This setup forms the foundation for building real-time data pipelines and streaming applications using Kafka without the complexity of ZooKeeper. You can further enhance this project by:

- Adding Multiple Locations: Modify the script to fetch and produce data for multiple cities.

- Implementing Error Handling: Enhance the script to handle potential failures gracefully.

- Integrating with Other Systems: Use Kafka consumers to process and analyze the weather data in real-time.


Feel free to experiment and expand upon this project to suit your learning objectives!

Run the cell below to bring your Docker container down:

In [None]:
%%bash
docker compose down 