### Imports
- The code imports the `Consumer`, `KafkaError`, and `KafkaException` classes from the `confluent_kafka` library. `KafkaError` is needed for the `_PARTITION_EOF` check in the consume loop.
- It imports the `MongoClient` class from the `pymongo` library.
- It also imports the `json` module.
- Together, these imports cover consuming messages from Kafka, connecting to a MongoDB database, and parsing JSON data.


In [0]:
from confluent_kafka import Consumer, KafkaError, KafkaException
from pymongo import MongoClient
import json

### Configurations
This code snippet defines a configuration dictionary `kafka_conf` for a Kafka consumer. Here is an explanation of the key-value pairs in the dictionary:

- `'bootstrap.servers': 'pkc-p11xm.us-east-1.aws.confluent.cloud:9092'`: This specifies the Kafka broker's address that the consumer will connect to.
- `'group.id': 'stdtrdt'`: This sets the consumer group ID, which is used to identify the consumer group to which this consumer belongs.
- `'auto.offset.reset': 'earliest'`: This determines where the consumer starts reading when the group has no committed offset or the committed offset is invalid. With `'earliest'`, it starts from the oldest message still available in each partition.
- `'security.protocol': 'SASL_SSL'`: This specifies the security protocol to be used for communication with the Kafka broker.
- `'sasl.mechanisms': 'PLAIN'`: This sets the SASL mechanism to be used for authentication.
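- `'sasl.username'`: This is the SASL username used for authentication. On Confluent Cloud it is the cluster API key.
- `'sasl.password'`: This is the SASL password used for authentication. On Confluent Cloud it is the API secret. Both values are credentials, so in a shared notebook they are better loaded from environment variables or a secrets manager than written into the cell.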

These configuration settings are essential for setting up a Kafka consumer that can connect to a Kafka broker securely and consume messages from specified topics.


In [0]:
# Kafka consumer configuration
kafka_conf = {
    'bootstrap.servers': 'pkc-p11xm.us-east-1.aws.confluent.cloud:9092',
    'group.id': 'stdtrdt',
    'auto.offset.reset': 'earliest',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': 'F6EYWWXMDPDQSNBE',
    'sasl.password': 'qis/bvd/QNa6WLOQ6oCM5TNnGMsudIg2GulTtW4SM8QAo7t+j+lHdnFeCv0Z0wU3'
}
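
The same dictionary can be assembled without writing credentials into the notebook. The following is a minimal sketch, assuming the values are supplied through environment variables. The variable names (`KAFKA_BOOTSTRAP_SERVERS`, `KAFKA_GROUP_ID`, `KAFKA_API_KEY`, `KAFKA_API_SECRET`) and the helper `kafka_conf_from_env` are hypothetical and not part of the original notebook.

```python
import os
from typing import Mapping


def kafka_conf_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build the consumer configuration from environment variables.

    The variable names are illustrative assumptions. A missing required
    variable raises KeyError instead of silently using an empty value.
    """
    return {
        'bootstrap.servers': env['KAFKA_BOOTSTRAP_SERVERS'],
        # Fall back to the group ID used in this notebook
        'group.id': env.get('KAFKA_GROUP_ID', 'stdtrdt'),
        'auto.offset.reset': 'earliest',
        'security.protocol': 'SASL_SSL',
        'sasl.mechanisms': 'PLAIN',
        'sasl.username': env['KAFKA_API_KEY'],
        'sasl.password': env['KAFKA_API_SECRET'],
    }
```

With the variables exported in the shell that starts the notebook, `kafka_conf = kafka_conf_from_env()` would produce a dictionary with the same keys as the cell above.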

In [0]:
# Kafka topic to consume from
topic = 'transformed_data'

In [0]:
# MongoDB configuration
mongo_uri = 'mongodb+srv://divinesam100:Divinesam1.@cluster0.daoynj5.mongodb.net/'
mongo_db_name = 'student_db'
mongo_collection_name = 'student_transformed_data'
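
A MongoDB connection string that embeds a username and password needs both percent-encoded if they contain reserved characters such as `@`, `:` or `/`. The sketch below builds the URI with the standard library's `quote_plus`. The helper name `build_mongo_uri` and the environment variable names `MONGO_USER` and `MONGO_PASSWORD` are assumptions for illustration, and the fallback values are placeholders.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str) -> str:
    """Return a mongodb+srv URI with percent-encoded credentials."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Hypothetical environment variable names; the host is the one used above.
mongo_uri = build_mongo_uri(
    os.environ.get('MONGO_USER', 'example_user'),
    os.environ.get('MONGO_PASSWORD', 'example_password'),
    'cluster0.daoynj5.mongodb.net',
)
```

The resulting `mongo_uri` can be passed to `MongoClient` in the same way as the hard-coded string.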

In [0]:
# Create Kafka consumer
consumer = Consumer(kafka_conf)

In [0]:
# Subscribe to the Kafka topic
consumer.subscribe([topic])

In [0]:
# Create MongoDB client
mongo_client = MongoClient(mongo_uri)

In [0]:
# Get the MongoDB database and collection
mongo_db = mongo_client[mongo_db_name]
mongo_collection = mongo_db[mongo_collection_name]

### Consume messages from a Kafka Topic to a MongoDB Collection
This cell polls the Kafka topic in a loop and inserts each message it receives into the MongoDB collection. The loop is wrapped in a try-except-finally block so that the consumer is closed however the loop ends. Here's an explanation of each part:

1. `try:`: Wraps the polling loop. Only `KeyboardInterrupt` is caught by the `except` block below; any other exception, such as the `KafkaException` raised in step 7, propagates to the caller after the `finally` block has run.

2. `while True:`: This creates an infinite loop to continuously poll for new messages from the Kafka topic.

3. `msg = consumer.poll(timeout=1.0)`: The consumer's `poll()` method returns a single message (or event) from the subscribed topic. The `timeout` parameter is the maximum time, here 1.0 second, to block while waiting for one.

4. `if msg is None:`: If no message is received during the polling, the loop continues to the next iteration.

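5. `if msg.error():`: `poll()` can also deliver error events as message objects; `msg.error()` returns a `KafkaError` for those and `None` for a normal message.

6. `if msg.error().code() == KafkaError._PARTITION_EOF:`: If the error code marks the end of a partition, the loop continues to the next iteration. In recent librdkafka versions this event is only emitted when `enable.partition.eof` is set to `true`, so with the configuration shown earlier this branch acts as a safeguard.

7. `else:`: Any other error is wrapped in a `KafkaException` and raised, which ends the loop.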

8. `message_value = msg.value().decode('utf-8')`: The message value, which is in bytes, is decoded to a string using UTF-8 encoding.

9. `message_data = json.loads(message_value)`: The JSON message is parsed and converted into a Python dictionary using `json.loads()`.

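10. `mongo_collection.insert_one(message_data)`: The message data is inserted into the MongoDB collection using the `insert_one()` method. If the dictionary has no `_id` key, `insert_one()` adds the generated `ObjectId` to it in place, which is why `_id` appears in the printed output.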

11. `print(f"Inserted message into MongoDB: {message_data}")`: A message is printed to indicate that the message has been successfully inserted into the MongoDB collection.

12. `except KeyboardInterrupt:`: If a keyboard interrupt (Ctrl+C, or interrupting the kernel in a notebook) occurs, this block catches it. The body is just `pass`, so the loop stops without a traceback.

13. `finally:`: The code within this block always runs, whether or not an exception occurred.

14. `consumer.close()`: The Kafka consumer is closed. This stops consumption, commits final offsets (unless auto-commit is disabled), and leaves the consumer group so its partitions can be reassigned.

This code snippet demonstrates the process of continuously consuming messages from a Kafka topic, decoding and parsing the messages, and then inserting the message data into a MongoDB collection. The code also includes handling for keyboard interrupts and properly closing the Kafka consumer.
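
Steps 8 and 9 assume that every message value is valid UTF-8 containing a JSON object. A single malformed message would otherwise raise and stop the loop. As a minimal sketch of a more tolerant variant, the hypothetical helper `parse_message` below (not part of the original notebook) returns `None` for payloads that cannot be decoded, parsed, or inserted as a document.

```python
import json
from typing import Optional


def parse_message(raw: Optional[bytes]) -> Optional[dict]:
    """Decode a Kafka message value and parse it as a JSON object.

    Returns None when the value is missing, is not valid UTF-8, is not
    valid JSON, or is valid JSON but not an object (insert_one() expects
    a mapping, not a list or scalar).
    """
    if raw is None:
        # msg.value() can be None, for example for a tombstone record
        return None
    try:
        data = json.loads(raw.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None
```

Inside the loop this could replace the two separate decode and parse statements: call `message_data = parse_message(msg.value())` and `continue` when the result is `None`. Whether to skip, log, or divert bad messages elsewhere is a design choice that depends on the pipeline.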


In [0]:
try:
    while True:
        # Poll for new messages from Kafka
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # End of partition
                continue
            else:
                raise KafkaException(msg.error())
        
        # Decode the message value from bytes to string
        message_value = msg.value().decode('utf-8')
        
        # Parse the JSON message
        message_data = json.loads(message_value)
        
        # Insert the message data into MongoDB collection
        mongo_collection.insert_one(message_data)
        print(f"Inserted message into MongoDB: {message_data}")

except KeyboardInterrupt:
    pass

finally:
    # Close the Kafka consumer, then the MongoDB client
    consumer.close()
    mongo_client.close()

Inserted message into MongoDB: {'Student_id': 111, 'Name': 'Alice Smith', 'Age': 16, 'Grade': '10th', 'Attendance': 18, 'Marks_outof350': 253, '_id': ObjectId('6619d70702be31e92c52a180')}
Inserted message into MongoDB: {'Student_id': 112, 'Name': 'Bob Johnson', 'Age': 17, 'Grade': '11th', 'Attendance': 20, 'Marks_outof350': 262, '_id': ObjectId('6619d70902be31e92c52a181')}
Inserted message into MongoDB: {'Student_id': 113, 'Name': 'Charlie Lee', 'Age': 15, 'Grade': '9th', 'Attendance': 15, 'Marks_outof350': 228, '_id': ObjectId('6619d70902be31e92c52a182')}
Inserted message into MongoDB: {'Student_id': 114, 'Name': 'David Williams', 'Age': 18, 'Grade': '12th', 'Attendance': 19, 'Marks_outof350': 242, '_id': ObjectId('6619d70902be31e92c52a183')}
Inserted message into MongoDB: {'Student_id': 115, 'Name': 'Emily Brown', 'Age': 16, 'Grade': '10th', 'Attendance': 17, 'Marks_outof350': 258, '_id': ObjectId('6619d70902be31e92c52a184')}
Inserted message into MongoDB: {'Student_id': 116, 'Name':