# Kafka Consumer


#### 1. Install required packages
We need to install a few Python packages to interact with Kafka, Event Hub, and stock data.

Run the following commands to install the necessary libraries:

In [0]:
%pip install requests
%pip install confluent_kafka
%pip install azure-eventhub

# Consumer 
#### Consuming Data from Kafka Topic (via Event Hubs)
Now, we need to consume the stock data from the Kafka topic in Azure Event Hubs.


In [0]:
!pip install confluent_kafka

Collecting confluent_kafka
  Downloading confluent_kafka-2.10.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 6.8 MB/s eta 0:00:00
Installing collected packages: confluent_kafka
Successfully installed confluent_kafka-2.10.0
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
from confluent_kafka import Consumer, KafkaException, KafkaError, TopicPartition
import json  # JSON module to deserialize the data
import pandas as pd  # Pandas for DataFrame
import time
import datetime

In [0]:
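# Azure Event Hubs connection details
group_id = "python-consumer"
event_hub_namespace = ""  # Namespace name only, without ".servicebus.windows.net"
event_hub_name = ""  # Event hub name, used as the Kafka topic
event_hub_connection_string = ""  # Used as the SASL password

# Global offset settings
# OFFSET_TYPE is passed to `auto.offset.reset`, which accepts "earliest" or "latest".
# "specific" is not a valid `auto.offset.reset` value; it needs an explicit partition assignment.
# OFFSET_TYPE = "earliest"
OFFSET_TYPE = "latest"
SPECIFIC_OFFSET = None  # Set to a number (e.g., 50) to start from a specific offset
POLL_INTERVAL = 12  # Interval between function calls in seconds
PARTITION = 0  # Change this if needed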
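
The `OFFSET_TYPE`, `SPECIFIC_OFFSET`, and `PARTITION` settings can be tied together roughly as in the sketch below. It is only one possible approach: `resolve_start_position` and `assign_start_position` are hypothetical helpers that do not appear elsewhere in this notebook, while `TopicPartition` and `Consumer.assign` are the regular `confluent_kafka` calls for starting at an explicit offset.

```python
def resolve_start_position(offset_type, specific_offset=None):
    """Validate the offset settings and return (mode, value).

    mode is "reset" when `auto.offset.reset` alone is enough, or
    "assign" when the consumer must be given an explicit offset.
    """
    if offset_type in ("earliest", "latest"):
        return ("reset", offset_type)
    if offset_type == "specific":
        if not isinstance(specific_offset, int) or specific_offset < 0:
            raise ValueError("SPECIFIC_OFFSET must be a non-negative integer")
        return ("assign", specific_offset)
    raise ValueError(f"Unsupported OFFSET_TYPE: {offset_type!r}")


def assign_start_position(consumer, topic, partition, offset):
    """Assign one partition at an explicit offset instead of subscribing."""
    # Imported lazily so the validation helper above has no Kafka dependency.
    from confluent_kafka import TopicPartition

    consumer.assign([TopicPartition(topic, partition, offset)])


mode, value = resolve_start_position("specific", specific_offset=50)
print(mode, value)  # assign 50
```

When the mode is `"assign"`, `assign_start_position(consumer, event_hub_name, PARTITION, value)` would be called in place of `consumer.subscribe(...)`, and `auto.offset.reset` would still need to be `"earliest"` or `"latest"` in the configuration.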

In [0]:
# Kafka configuration for Event Hubs
conf = {
    'bootstrap.servers': f'{event_hub_namespace}.servicebus.windows.net:9093',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': '$ConnectionString',
    'sasl.password': event_hub_connection_string,
    'group.id': group_id,
    "auto.offset.reset": OFFSET_TYPE,  # Uses global OFFSET_TYPE
    "enable.auto.commit": False  # Manually commit offsets
}

# Create a Kafka Consumer instance
consumer = Consumer(conf)

# Subscribe to the topic where data is being produced
consumer.subscribe([event_hub_name])
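
The `bootstrap.servers` value must match the namespace inside the connection string. As a hedged sketch, the host can be derived from the connection string instead of being typed twice. This assumes the usual `Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...` layout; `parse_connection_string` and `bootstrap_server_from` are hypothetical helper names, and the sample string is a made-up placeholder.

```python
from urllib.parse import urlparse


def parse_connection_string(connection_string):
    """Split a connection string into a dict of its key=value parts."""
    parts = {}
    for segment in connection_string.strip().split(";"):
        if not segment:
            continue
        # Split on the first "=" only, because key values may end with "=".
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


def bootstrap_server_from(connection_string, port=9093):
    """Build the Kafka bootstrap address from the Endpoint part."""
    endpoint = parse_connection_string(connection_string).get("Endpoint")
    if not endpoint:
        raise ValueError("Connection string has no Endpoint part")
    host = urlparse(endpoint).hostname
    if not host:
        raise ValueError(f"Cannot read a host name from {endpoint!r}")
    return f"{host}:{port}"


# Placeholder values for illustration only.
sample = (
    "Endpoint=sb://example-ns.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=abc123=;EntityPath=stocks"
)
print(bootstrap_server_from(sample))  # example-ns.servicebus.windows.net:9093
```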

In [0]:

def consume_from_kafka():
    try:
        while True:
            # Poll for messages (timeout of 1 second)
            msg = consumer.poll(timeout=1.0)
            
            if msg is None:
                continue  # No message available, continue polling

            if msg.error():
                # Handle any errors that occur
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    # End of partition reached
                    print(f"End of partition {msg.partition()} reached at offset {msg.offset()}")
                else:
                    raise KafkaException(msg.error())
            else:
                # Successfully received a message
                stock_data = json.loads(msg.value().decode('utf-8'))  # Deserialize the message to JSON
                print(f"Received stock data: {stock_data}")  # Print the consumed stock data

                # Introduce a 10-second delay before processing the next message
                time.sleep(10)  # Wait for 10 seconds before consuming the next message

    except KeyboardInterrupt:
        print("Terminating consumer...")
    finally:
        # Close the consumer connection
        consumer.close()

# Start consuming messages
consume_from_kafka()

Received stock data: {'ticker': 'AAPL', 'queryCount': 1, 'resultsCount': 1, 'adjusted': True, 'results': [{'T': 'AAPL', 'v': 68616943.0, 'vw': 195.9883, 'o': 199.17, 'c': 196.25, 'h': 199.44, 'l': 193.25, 't': 1746648000000, 'n': 753075}], 'status': 'OK', 'request_id': '5809c4020759db0ac84e4dfbdabf987d', 'count': 1}
Received stock data: {'ticker': 'AAPL', 'queryCount': 1, 'resultsCount': 1, 'adjusted': True, 'results': [{'T': 'AAPL', 'v': 68616943.0, 'vw': 195.9883, 'o': 199.17, 'c': 196.25, 'h': 199.44, 'l': 193.25, 't': 1746648000000, 'n': 753075}], 'status': 'OK', 'request_id': '9b291004b79f3af84d8a1a077fc99d3d', 'count': 1}
Received stock data: {'ticker': 'AAPL', 'queryCount': 1, 'resultsCount': 1, 'adjusted': True, 'results': [{'T': 'AAPL', 'v': 68616943.0, 'vw': 195.9883, 'o': 199.17, 'c': 196.25, 'h': 199.44, 'l': 193.25, 't': 1746648000000, 'n': 753075}], 'status': 'OK', 'request_id': '783a54b3c0fd00861f54ac8982ace980', 'count': 1}
Received stock data: {'ticker': 'AAPL', 'query

com.databricks.backend.common.rpc.CommandCancelledException
	at com.databricks.spark.chauffeur.SequenceExecutionState.$anonfun$cancel$5(SequenceExecutionState.scala:136)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.spark.chauffeur.SequenceExecutionState.$anonfun$cancel$3(SequenceExecutionState.scala:136)
	at com.databricks.spark.chauffeur.SequenceExecutionState.$anonfun$cancel$3$adapted(SequenceExecutionState.scala:133)
	at scala.collection.immutable.Range.foreach(Range.scala:158)
	at com.databricks.spark.chauffeur.SequenceExecutionState.cancel(SequenceExecutionState.scala:133)
	at com.databricks.spark.chauffeur.ExecContextState.cancelRunningSequence(ExecContextState.scala:717)
	at com.databricks.spark.chauffeur.ExecContextState.$anonfun$cancel$1(ExecContextState.scala:435)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.spark.chauffeur.ExecContextState.cancel(ExecContextState.scala:435)
	at com.databricks.spark.chauffeur.ExecutionContextManagerV1.can

# Read Messages from Beginning
In this section, the consumer reads messages from the beginning of the topic.

- The `consume_from_kafka` function continuously polls for new messages.
- When a message is received, it is deserialized from JSON and added to the `consumed_data` list.
- The data is displayed using **Pandas** for easy visualization in the notebook.
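
The deserialize-and-collect step described above might look like the following minimal sketch. `flatten_stock_message` is a hypothetical helper, and reading `o`, `h`, `l`, `c`, `v` as open, high, low, close, volume and `t` as epoch milliseconds is an assumption based on the sample output, not something the notebook states.

```python
import json
from datetime import datetime, timezone


def flatten_stock_message(raw_value):
    """Decode one message value (UTF-8 JSON bytes) into flat row dicts."""
    payload = json.loads(raw_value.decode("utf-8"))
    rows = []
    for bar in payload.get("results", []):
        rows.append({
            "ticker": payload.get("ticker"),
            "open": bar.get("o"),
            "high": bar.get("h"),
            "low": bar.get("l"),
            "close": bar.get("c"),
            "volume": bar.get("v"),
            # Assumption: 't' is a Unix timestamp in milliseconds.
            "timestamp": datetime.fromtimestamp(bar["t"] / 1000, tz=timezone.utc),
        })
    return rows


consumed_data = []

# Sample payload with the same shape as the messages printed earlier.
sample = json.dumps({
    "ticker": "AAPL",
    "results": [{"T": "AAPL", "v": 68616943.0, "o": 199.17, "c": 196.25,
                 "h": 199.44, "l": 193.25, "t": 1746648000000}],
    "status": "OK",
}).encode("utf-8")

consumed_data.extend(flatten_stock_message(sample))
print(consumed_data[0]["ticker"], consumed_data[0]["close"])  # AAPL 196.25
```

In the notebook, `pd.DataFrame(consumed_data)` would then render the collected rows as a table.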

### Key Points for `consume_from_kafka()`:

1. **Offset Handling**:
   - `auto.offset.reset` decides where the consumer starts when its consumer group has no valid committed offset for a partition.
   - Setting `OFFSET_TYPE` to `"earliest"` starts from the earliest available message; the notebook's default, `"latest"`, reads only new messages.

2. **Polling for Messages**:
   - The `poll()` method is used with a 1-second timeout, continuously checking for new messages from the Kafka topic.

3. **Error Handling**:
   - A `_PARTITION_EOF` error is reported with a printed notice and polling continues; any other error is raised as a `KafkaException`.

4. **Message Consumption & Processing**:
   - Successfully received messages are decoded from JSON and printed.
   - A delay of 10 seconds (`time.sleep(10)`) is added before consuming the next message.

5. **Graceful Termination**:
   - The consumer will stop gracefully on a keyboard interrupt, ensuring the connection is closed properly.
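
The per-message branching in these key points can be isolated into a small function and exercised without a broker, as in the sketch below. `handle_message` and `FakeMessage` are hypothetical names introduced for illustration. The sketch raises `RuntimeError` where the notebook raises `KafkaException`, so that it has no Kafka dependency, and `-191` stands in for `KafkaError._PARTITION_EOF`.

```python
import json


def handle_message(msg, eof_code):
    """Classify one polled message and return (status, payload).

    status is "empty", "eof", or "data"; any other error is raised.
    """
    if msg is None:
        return ("empty", None)  # poll() timed out without a message
    err = msg.error()
    if err is not None:
        if err.code() == eof_code:
            return ("eof", (msg.partition(), msg.offset()))
        raise RuntimeError(f"Kafka error: {err}")
    return ("data", json.loads(msg.value().decode("utf-8")))


class FakeMessage:
    """Stand-in for a confluent_kafka Message, for offline testing only."""

    def __init__(self, value=None, error=None, partition=0, offset=0):
        self._value = value
        self._error = error
        self._partition = partition
        self._offset = offset

    def value(self):
        return self._value

    def error(self):
        return self._error

    def partition(self):
        return self._partition

    def offset(self):
        return self._offset


status, payload = handle_message(FakeMessage(value=b'{"ticker": "AAPL"}'), eof_code=-191)
print(status, payload)  # data {'ticker': 'AAPL'}
```

Inside the polling loop, the call would be `handle_message(consumer.poll(timeout=1.0), KafkaError._PARTITION_EOF)`, with the `time.sleep(10)` delay applied only when the status is `"data"`.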
