
### Overview
This notebook shows how to consume messages from a Kafka topic on Confluent Cloud and check that they are well formed before downstream processing and analytics rely on them. The steps are: connect to the cluster, consume messages, validate them, and handle errors along the way.

### Prerequisites
Ensure the following prerequisites are met before running this notebook:
- **Kafka Setup:** Confirm that Kafka is set up and data is being produced into a Kafka topic. This should have been completed in the notebook **01 Kafka Getting Started - Produce Messages**.

### Validate Data using Consumer
Follow these steps to consume and validate the data in your Kafka topic:

1. **Connect to Confluent Kafka:**
   - Create a consumer configured with your cluster's bootstrap server, API key, and API secret. The cells in this notebook authenticate over `SASL_SSL` with the `PLAIN` mechanism.

2. **Consume Data:**
   - Subscribe to the topic and poll for messages with the `confluent_kafka` `Consumer`.
   - Consume a few messages as a smoke test before running a long-lived loop.

3. **Data Validation:**
   - Confirm that each message decodes as UTF-8 JSON.
   - Check that the expected fields are present and hold plausible values. For the `users` topic these are `user_id`, `user_first_name`, `user_last_name`, `user_email`, `created_ts`, and `last_updated_ts`.

4. **Logging and Output:**
   - Optionally, print or log the validation results, or write them to a file, so that you can review the consumed messages later.
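
The validation step can be sketched as a small function that inspects one raw message at a time. This is a minimal sketch, not the notebook's own method: the function name `validate_user_message` is hypothetical, the field list is taken from the sample output of the `users` topic, and the rule that `last_updated_ts` must not precede `created_ts` is an assumption about the data.

```python
import json
import uuid
from datetime import datetime
from typing import List

# Field names as they appear in the sample "users" messages; adjust for other topics.
REQUIRED_FIELDS = (
    "user_id", "user_first_name", "user_last_name",
    "user_email", "created_ts", "last_updated_ts",
)


def validate_user_message(raw: bytes) -> List[str]:
    """Return the problems found in one message; an empty list means it passed."""
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not valid UTF-8 JSON: {exc}"]
    if not isinstance(record, dict):
        return ["payload is not a JSON object"]

    # Stop early if fields are missing; the checks below need all of them.
    missing = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if missing:
        return missing

    problems = []
    try:
        uuid.UUID(record["user_id"])
    except (ValueError, TypeError, AttributeError):
        problems.append("user_id is not a UUID")

    # Deliberately loose email check: only the presence of '@' is tested.
    if "@" not in str(record["user_email"]):
        problems.append("user_email has no '@'")

    try:
        created = datetime.fromisoformat(record["created_ts"])
        updated = datetime.fromisoformat(record["last_updated_ts"])
        # Assumption: a record cannot be updated before it was created.
        if updated < created:
            problems.append("last_updated_ts is earlier than created_ts")
    except (ValueError, TypeError):
        problems.append("timestamps are not valid ISO 8601 values")
    return problems
```

Inside a poll loop this would be called as `validate_user_message(msg.value())`, and the returned list can be printed, logged, or written to a file as described in step 4.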

### Additional Notes
- Handle errors and exceptions raised while connecting or consuming. A message returned by `poll()` can carry an error instead of a payload, so check `msg.error()` before reading its value.
- Confirm that the consumer points at the intended cluster (`bootstrap.servers`) and topic before trusting the results.
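
One way to apply these notes is to read a bounded sample instead of looping forever, recording error events rather than stopping on the first one. The helper below is a sketch under assumptions: `consume_sample` and its parameters are hypothetical names, and it only relies on the consumer exposing `poll(timeout)` and on messages exposing `error()` and `value()`, as `confluent_kafka` messages do.

```python
import time


def consume_sample(consumer, max_messages=5, max_wait_s=30.0, poll_timeout_s=1.0):
    """Collect up to max_messages payloads, giving up after max_wait_s seconds."""
    payloads, errors = [], []
    deadline = time.monotonic() + max_wait_s
    while len(payloads) < max_messages and time.monotonic() < deadline:
        msg = consumer.poll(poll_timeout_s)
        if msg is None:
            # Nothing arrived within the poll timeout; try again until the deadline.
            continue
        if msg.error():
            # Record the error event and keep polling instead of raising.
            errors.append(str(msg.error()))
            continue
        payloads.append(msg.value())
    return payloads, errors
```

The caller is still responsible for closing the consumer, for example by wrapping the call in `try`/`finally` with `consumer.close()`. Collecting errors instead of raising is a design choice suited to a quick smoke test; a production consumer would likely want to fail fast on fatal errors.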

### Conclusion
Validating data as it is consumed from Kafka catches malformed messages and misconfiguration early, before they reach downstream systems. The steps above give a repeatable way to check data integrity in a real-time streaming application.

In [0]:
from confluent_kafka import Consumer, KafkaError, KafkaException

In [0]:
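# Kafka and Confluent Cloud Configuration
import os

kafka_bootstrap_servers = "pkc-rgm37.us-west-2.aws.confluent.cloud:9092"
kafka_topic = "users"
# Read credentials from the environment rather than hardcoding them in the notebook.
# The variable names below are placeholders; use whatever your environment defines.
kafka_api_key = os.environ["KAFKA_API_KEY"]
kafka_api_secret = os.environ["KAFKA_API_SECRET"]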

In [0]:
# Consumer configuration
conf = {
    'bootstrap.servers': kafka_bootstrap_servers,
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'sasl.username': kafka_api_key,
    'sasl.password': kafka_api_secret,
    'group.id': 'user-consumer-group',
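    # Applies only when the group has no committed offsets: with 'latest' the consumer
    # sees only messages produced after it joins. Use 'earliest' to read existing data.
    'auto.offset.reset': 'latest'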
}

In [0]:
# Create Consumer instance
consumer = Consumer(conf)

In [0]:
# Subscribe to topic
consumer.subscribe([kafka_topic])
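
Subscribing does not by itself confirm that the topic exists, so a quick metadata lookup can catch a wrong topic name before the poll loop starts. The following is a sketch rather than part of the original notebook: `topic_is_available` is a hypothetical helper, and it assumes the client offers `list_topics(topic=..., timeout=...)` returning metadata with a `topics` dictionary, as the `confluent_kafka` `Consumer` does.

```python
def topic_is_available(client, topic, timeout_s=10.0):
    """Return True if the cluster metadata lists `topic` without an error."""
    metadata = client.list_topics(topic=topic, timeout=timeout_s)
    topic_metadata = metadata.topics.get(topic)
    # A requested topic that does not exist is reported with an error set on its entry.
    return topic_metadata is not None and topic_metadata.error is None
```

With the consumer created above, `topic_is_available(consumer, kafka_topic)` would be called once before entering the poll loop.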

In [0]:
# Read messages from Kafka
try:
    while True:
        msg = consumer.poll(1.0)  # Timeout of 1 second

        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # End of partition event (only emitted when 'enable.partition.eof' is enabled)
                print(f"{msg.topic()} [{msg.partition()}] reached end at offset {msg.offset()}")
            else:
                # Any other error is unexpected: surface it to the caller
                raise KafkaException(msg.error())
        else:
            # Proper message
            print(f"Received message: {msg.value().decode('utf-8')}")

except KeyboardInterrupt:
    pass
finally:
    # Close down consumer to commit final offsets.
    consumer.close()


Received message: {"user_id": "e8d0a0bf-d01a-4714-aaac-cb8d87f2d17e", "user_first_name": "Shane", "user_last_name": "Dickerson", "user_email": "wholland@example.net", "created_ts": "2024-02-29T01:37:35.460836", "last_updated_ts": "2025-04-13T22:58:22.186992"}
Received message: {"user_id": "686f0b87-7edc-4ad0-b812-fd3f30b394bb", "user_first_name": "Tyler", "user_last_name": "Murphy", "user_email": "charlesramirez@example.com", "created_ts": "2024-05-12T07:23:47.512102", "last_updated_ts": "2024-12-25T14:24:57.735028"}
Received message: {"user_id": "17c1fae4-abf3-43ca-abac-b3233d7e53d6", "user_first_name": "Christine", "user_last_name": "Campbell", "user_email": "kyle52@example.com", "created_ts": "2024-03-09T08:51:31.272298", "last_updated_ts": "2024-11-30T21:18:32.002136"}
Received message: {"user_id": "9d82664d-d0ed-4d5c-a0bb-49eb971e234d", "user_first_name": "Bianca", "user_last_name": "Martinez", "user_email": "angela38@example.com", "created_ts": "2022-08-06T20:40:24.707969", "last