### 1. Library Imports (packages to install: pymongo, confluent-kafka)
This Python snippet imports the modules needed to work with MongoDB and Kafka.

1. `from pymongo import MongoClient`: This line imports the `MongoClient` class from the `pymongo` module. The `MongoClient` class is used to establish a connection to a MongoDB database and perform operations on it.

2. `from confluent_kafka import Producer`: This line imports the `Producer` class from the `confluent_kafka` module. The `Producer` class is used to produce messages to a Kafka topic.

3. `import json`: This line imports Python's built-in `json` module, which provides functions for encoding and decoding JSON data. Here it is used to convert MongoDB documents to JSON strings before they are sent to Kafka.

By importing these modules, you can now use the `MongoClient` class to interact with MongoDB databases and the `Producer` class to produce messages to Kafka topics in your Python code.


In [0]:
from pymongo import MongoClient
from confluent_kafka import Producer
import json

### 2. Configurations
Here we are establishing a connection to a MongoDB database and accessing a specific collection within that database using the `pymongo` module in Python.

1. `mongo_client = MongoClient('mongodb+srv://divinesam100:[your_password]@cluster0.daoynj5.mongodb.net/')`: This line creates a `MongoClient` object named `mongo_client` for the MongoDB deployment at `cluster0.daoynj5.mongodb.net`. The connection string includes the username `divinesam100` and a placeholder `[your_password]` for the password, which you must replace with the actual password. `MongoClient` connects lazily, so an incorrect password typically surfaces as an error on the first database operation rather than on this line.

2. `mongo_db = mongo_client['student_db']`: This line selects the database named `student_db` from the MongoDB server and assigns it to the variable `mongo_db`. This allows us to work with the collections within the `student_db` database.

3. `mongo_collection = mongo_db['info']`: This line selects a specific collection named `info` from the `student_db` database and assigns it to the variable `mongo_collection`. A collection in MongoDB is similar to a table in a relational database and stores documents (data records) in a structured format.

By executing these lines of code with the correct password provided in the connection string, you can establish a connection to the MongoDB database, select the `student_db` database, and access the `info` collection within that database for performing operations like inserting, updating, querying, or deleting documents.
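
One way to keep the password out of the notebook is to read it from an environment variable and build the connection string at run time. The sketch below is only an illustration: the variable name `MONGO_PASSWORD` and the helper `build_mongo_uri` are assumptions, not part of the original code. It also percent-encodes the credentials with `urllib.parse.quote_plus`, which is needed when a password contains characters such as `@`, `/`, or `:`.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(username: str, password: str, host: str) -> str:
    """Return a mongodb+srv URI with percent-encoded credentials (hypothetical helper)."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


# MONGO_PASSWORD is an assumed variable name; it resolves to "" when unset.
password = os.environ.get("MONGO_PASSWORD", "")
mongo_uri = build_mongo_uri("divinesam100", password, "cluster0.daoynj5.mongodb.net")

# The URI can then be passed to the client in place of the literal string:
# mongo_client = MongoClient(mongo_uri)
```

With this approach the notebook can be shared or committed without exposing the password, provided the variable is set in the environment that runs it.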


In [0]:
# Replace [your_password] with the actual password; avoid committing it with the notebook
mongo_client = MongoClient('mongodb+srv://divinesam100:[your_password]@cluster0.daoynj5.mongodb.net/')
mongo_db = mongo_client['student_db']
mongo_collection = mongo_db['info']

We are defining a dictionary named `kafka_conf` that contains configuration settings for connecting to a Kafka cluster using the `confluent_kafka` Python library.

1. `'bootstrap.servers': 'pkc-p11xm.us-east-1.aws.confluent.cloud:9092'`: This setting specifies the list of Kafka brokers (bootstrap servers) that the producer or consumer should connect to. In this case, the Kafka cluster is hosted at `pkc-p11xm.us-east-1.aws.confluent.cloud` on port `9092`.

2. `'security.protocol': 'SASL_SSL'`: This setting defines the security protocol used to communicate with the Kafka cluster. `SASL_SSL` means the client authenticates with SASL (Simple Authentication and Security Layer) over a TLS-encrypted connection; the `SSL` in the name is historical.

3. `'sasl.mechanisms': 'PLAIN'`: This setting specifies the SASL mechanism to be used for authentication. Here, it is set to `PLAIN`, which is a simple username/password authentication mechanism.

4. `'sasl.username': '[your_api_key]'`: This setting provides the username for SASL authentication. On Confluent Cloud, this is the cluster API key.

5. `'sasl.password': '[your_api_secret]'`: This setting provides the password for SASL authentication. On Confluent Cloud, this is the API secret. Replace both placeholders with your own credentials, keep them secure, and do not expose them in your code or in any public repository.

By setting up these configuration parameters in the `kafka_conf` dictionary, you can establish a secure connection to the specified Kafka cluster using the provided SASL credentials.
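
As a minimal sketch of keeping the SASL credentials out of the notebook, the configuration can be assembled from environment variables. The helper `build_kafka_conf` and the variable names `KAFKA_API_KEY` and `KAFKA_API_SECRET` are assumptions made for this illustration; the remaining settings are the ones described above.

```python
import os


def build_kafka_conf(env=os.environ) -> dict:
    """Build the producer configuration, taking credentials from `env` (hypothetical helper)."""
    return {
        "bootstrap.servers": "pkc-p11xm.us-east-1.aws.confluent.cloud:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        # Assumed variable names; a KeyError is raised if either one is unset.
        "sasl.username": env["KAFKA_API_KEY"],
        "sasl.password": env["KAFKA_API_SECRET"],
    }


# Usage, once both variables are set in the environment:
# kafka_conf = build_kafka_conf()
# producer = Producer(kafka_conf)
```

Failing with a `KeyError` when a variable is missing is a deliberate choice here: it stops the notebook before a producer is created with incomplete credentials.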


In [0]:
# Replace the placeholders with your own Confluent Cloud API key and secret
kafka_conf = {'bootstrap.servers': 'pkc-p11xm.us-east-1.aws.confluent.cloud:9092',
              'security.protocol': 'SASL_SSL',
              'sasl.mechanisms': 'PLAIN',
              'sasl.username': '[your_api_key]',
              'sasl.password': '[your_api_secret]'}

In [0]:
# Create Kafka producer
producer = Producer(kafka_conf)

In [0]:
# Kafka topic
topic = 'raw_data'

### 3. Produce Messages to the Kafka Topic
In this code snippet, a try-except block is used to handle the process of retrieving data from a MongoDB collection and producing messages to a Kafka topic. Here's an explanation of each part:

1. `try:`: The code within this block is executed, and any exceptions that occur during the execution are caught and handled in the `except` block.

2. `mongodb_data = mongo_collection.find()`: This line queries the MongoDB collection stored in the `mongo_collection` variable. Called without a filter, `find()` matches every document in the collection. It returns a cursor, so documents are fetched from the server as the cursor is iterated rather than all at once.

3. `for data in mongodb_data:`: This loop iterates over each document retrieved from the MongoDB collection.

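4. `message = json.dumps(data)`: Each document is converted to a JSON string with `json.dumps()` so it can be sent as a Kafka message. Documents returned by `find()` normally include an `_id` field of type `ObjectId`, which `json.dumps()` cannot serialize by default. In that case the call raises a `TypeError` unless a `default` handler is supplied or the field is excluded from the query.

5. `producer.produce(topic, message.encode('utf-8'))`: The `produce()` method of the Kafka producer queues the UTF-8 encoded message for delivery to the specified topic. The call is asynchronous: it returns once the message is in the producer's local queue, before the broker has acknowledged it.

6. `producer.flush()`: The `flush()` method blocks until every queued message has either been delivered to the Kafka cluster or has failed.

7. `print("Messages produced successfully.")`: This message is printed when the `try` block finishes without raising an exception. Because delivery is asynchronous, it confirms that the messages were queued and flushed, not that each one was acknowledged by the broker. Per-message results are only available through a delivery callback.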

8. `except Exception as e:`: If an exception occurs during the try block execution, the code jumps to this block to handle the exception.

9. `print(f"Error producing messages: {e}")`: If an exception occurs, this line prints an error message indicating that there was an issue producing messages to the Kafka topic, along with the specific exception message (`e`).

Overall, this code snippet fetches documents from a MongoDB collection, converts each one to JSON, and produces it as a message to a Kafka topic. Exceptions raised along the way are caught and printed rather than re-raised.
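
The serialization step can fail on values that `json.dumps()` does not support, such as an `ObjectId` in `_id` or a `datetime` field. A minimal sketch of one workaround is to pass `default=str`, which converts any unsupported value to its string form. The `FakeObjectId` class and the sample document below are stand-ins invented for this illustration so that it runs without a database; the helper name `to_message` is also an assumption. PyMongo's `bson.json_util.dumps` is an alternative when the MongoDB type information should be preserved.

```python
import json
from datetime import datetime, timezone


class FakeObjectId:
    """Stand-in for bson.ObjectId so the sketch runs without MongoDB (illustrative only)."""

    def __init__(self, hex_value: str) -> None:
        self._hex = hex_value

    def __str__(self) -> str:
        return self._hex


def to_message(document: dict) -> bytes:
    """Serialize a document to UTF-8 JSON, converting unsupported values with str()."""
    return json.dumps(document, default=str).encode("utf-8")


# Made-up sample document with two values that plain json.dumps() rejects.
doc = {
    "_id": FakeObjectId("65f0c0ffee0000000000abcd"),
    "name": "Ada",
    "enrolled": datetime(2024, 1, 15, tzinfo=timezone.utc),
}
message = to_message(doc)
```

Note that `default=str` is lossy: a consumer receives plain strings and has to know which fields were identifiers or timestamps.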


In [0]:
try:
    # Retrieve data from MongoDB collection
    mongodb_data = mongo_collection.find()

    # Produce messages to the Kafka topic
    for data in mongodb_data:
        message = json.dumps(data)
        producer.produce(topic, message.encode('utf-8'))

    # Flush the producer to ensure all messages are sent
    producer.flush()

    print("Messages produced successfully.")
except Exception as e:
    print(f"Error producing messages: {e}")

Messages produced successfully.
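
The printed message only shows that no exception was raised. To see whether each message actually reached the broker, `produce()` accepts an `on_delivery` callback that is invoked from `poll()` or `flush()` with an error (or `None`) for every message. The function below is a sketch under that assumption; `produce_all` and its return value are hypothetical names chosen for this example, and `producer` is expected to be a `confluent_kafka.Producer` like the one created earlier.

```python
def produce_all(producer, topic, messages):
    """Produce every message and return (delivered, failed, still_queued) counts."""
    counts = {"delivered": 0, "failed": 0}

    def on_delivery(err, msg):
        # Invoked once per message from poll()/flush(); err is None on success.
        if err is not None:
            counts["failed"] += 1
        else:
            counts["delivered"] += 1

    for value in messages:
        while True:
            try:
                producer.produce(topic, value, on_delivery=on_delivery)
                break
            except BufferError:
                # The local queue is full: serve callbacks to free space, then retry.
                producer.poll(1)
        producer.poll(0)  # serve pending delivery callbacks without blocking

    # flush() returns the number of messages still queued when the timeout expires.
    still_queued = producer.flush(30)
    return counts["delivered"], counts["failed"], still_queued
```

A call such as `produce_all(producer, topic, encoded_messages)` would then report success only if the second and third values are both zero.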
