# Azure Event Hubs
Azure Event Hubs is a fully managed, cloud-based event streaming platform on Microsoft Azure. It acts as a hub to collect, consume, and distribute real-time event data from multiple sources, enabling applications to respond to events as they occur. It is designed for reliability, scalability, and operational efficiency, which makes it a common choice for high-throughput, low-latency data ingestion.

![Event Hubs Architecture](https://learn.microsoft.com/en-us/azure/event-hubs/media/event-hubs-about/components.png)

## Key Concepts

- **Producer applications**: These are programs that send data to an event hub. They use special tools or libraries (like Event Hubs SDK or Kafka producer) to send the data.

- **Namespace**: A namespace is like a folder where you keep multiple event hubs or Kafka topics. It helps manage things like security, capacity, and disaster recovery.

- **Event Hubs/Kafka topic**: An event hub or Kafka topic is where your data (events) is stored. It works like a log, keeping data in order. It can be split into smaller parts called partitions.

- **Partitions**: Partitions help scale (or increase) the capacity of the event hub. Think of partitions as lanes on a highway. If you need more space for data, you add more lanes (partitions).

- **Consumer applications**: These are programs that read the data from the event hub. They keep track of where they left off in the data and continue from that point.

- **Consumer group**: A group of consumers that can independently read data from the same event hub or Kafka topic. Each consumer in the group reads at its own speed and keeps track of its own position in the data.
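To make the relationship between partitions, keys, and consumer groups concrete, here is a small in-memory model. It is only a teaching sketch: `ToyEventHub` is a hypothetical class, and hashing the key with CRC32 is an assumption chosen for illustration, not the hashing that Event Hubs uses internally.

```python
import zlib
from collections import defaultdict


class ToyEventHub:
    """In-memory stand-in for an event hub: a fixed set of append-only partitions."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        # Each consumer group keeps its own read offset for every partition.
        self.offsets = defaultdict(lambda: [0] * partition_count)

    def send(self, key, event):
        """Append the event to the partition selected by hashing the key."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(event)
        return index

    def read(self, group, partition):
        """Return the events this group has not read yet and advance only its offset."""
        start = self.offsets[group][partition]
        events = self.partitions[partition][start:]
        self.offsets[group][partition] = len(self.partitions[partition])
        return events


hub = ToyEventHub(partition_count=4)
p = hub.send("AAPL", {"close": 198.53})
hub.send("AAPL", {"close": 199.10})  # same key, so it lands in the same partition

print(hub.read("dashboard", p))  # both events, in the order they were sent
print(hub.read("dashboard", p))  # nothing new for this group
print(hub.read("archiver", p))   # a second group sees both events again
```

The two groups read the same partition without affecting each other, which is the behavior the consumer group bullet describes.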

#### 1. Install required packages
We need to install a few Python packages to interact with Kafka, Event Hub, and stock data.

Run the following commands to install the necessary libraries:





In [0]:
%pip install requests
%pip install confluent_kafka
%pip install azure-eventhub

These packages are essential for:

- Fetching stock data from external APIs.
- Interacting with Kafka to send and consume messages.
- Sending data to Azure Event Hub.

#### 2. Fetching Real-Time Stock Data
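To fetch stock data, we will use the Polygon.io API, which provides market data for various securities. The example below calls the previous-day aggregates endpoint (`/v2/aggs/ticker/{ticker}/prev`), so each response describes the most recent completed trading day rather than a live tick.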

In [0]:
import time
import requests

In [0]:

ticker = "AAPL"
URL = f"https://api.polygon.io/v2/aggs/ticker/{ticker}/prev"
Params = {
    "apiKey": "<YOUR_POLYGON_API_KEY>"  # Replace with your own key; never publish a real key
}

In [0]:

# Global variables
MAX_API_CALLS_PER_MINUTE = 3  # Maximum number of API calls allowed in one minute
CALLS_MADE = 0  # Tracks how many API calls have been made in the current minute
START_TIME = time.time()  # Start of the current one-minute window

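# Sends a GET request to the Polygon API for the given ticker symbol (like "AAPL")
# and returns the stock data as parsed JSON, or None on a non-200 response.


def get_stock_data(ticker):
    # Build the URL from the argument so the function works for any ticker
    url = f"https://api.polygon.io/v2/aggs/ticker/{ticker}/prev"
    params = Params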

    # Make the API call
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        data = response.json()
        return data
    else:
        print(f"Error: {response.status_code}")
        return None

# Ensures that no more than MAX_API_CALLS_PER_MINUTE (3) API calls are made per minute.
# If the call limit is reached, it waits until the current one-minute window ends before making another request.
# It also tracks the number of API calls and resets the count when a new minute begins.
def rate_limited_get_stock_data(ticker):
    global CALLS_MADE, START_TIME  # Ensure these are accessed globally
    
    current_time = time.time()

    # Check if the time window has passed (60 seconds)
    if current_time - START_TIME >= 60:
        CALLS_MADE = 0  # Reset call count
        START_TIME = current_time  # Reset the start time for the new minute

    # Ensure we don't exceed the rate limit
    if CALLS_MADE >= MAX_API_CALLS_PER_MINUTE:
        print("Rate limit reached. Waiting until the next minute...")
        time.sleep(60 - (current_time - START_TIME))  # Wait until the next minute
        CALLS_MADE = 0  # A new window starts once the wait is over
        START_TIME = time.time()

    # Make the API call and count it even if it fails,
    # since a failed request still uses up part of the quota
    CALLS_MADE += 1
    return get_stock_data(ticker)

# Example to get data for AAPL

stock_data = rate_limited_get_stock_data(ticker)

# Display the result
print(stock_data)

{'ticker': 'AAPL', 'queryCount': 1, 'resultsCount': 1, 'adjusted': True, 'results': [{'T': 'AAPL', 'v': 36453923.0, 'vw': 198.6614, 'o': 199, 'c': 198.53, 'h': 200.5399, 'l': 197.535, 't': 1746820800000, 'n': 423067}], 'status': 'OK', 'request_id': 'c00ea33b517f00b0158a041b0cc98bcd', 'count': 1}
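The single-letter keys in the response printed above are compact, so it can help to map them to descriptive names before sending the record downstream. The sketch below assumes the usual meaning of Polygon's aggregate bar fields (`o`/`h`/`l`/`c` for open, high, low, close, `v` for volume, `vw` for volume-weighted average price, `n` for number of trades, `t` for a Unix timestamp in milliseconds). `flatten_bar` is a hypothetical helper, not part of any SDK.

```python
from datetime import datetime, timezone


def flatten_bar(response):
    """Turn the first entry of `results` into a record with descriptive keys."""
    if not response or not response.get("results"):
        return None
    bar = response["results"][0]
    return {
        "ticker": bar["T"],
        "open": bar["o"],
        "high": bar["h"],
        "low": bar["l"],
        "close": bar["c"],
        "volume": bar["v"],
        "vwap": bar["vw"],
        "trades": bar["n"],
        # `t` is assumed to be a Unix timestamp in milliseconds (UTC)
        "timestamp": datetime.fromtimestamp(bar["t"] / 1000, tz=timezone.utc).isoformat(),
    }


# Sample taken from the response shown above
sample = {
    "ticker": "AAPL",
    "results": [{"T": "AAPL", "v": 36453923.0, "vw": 198.6614, "o": 199,
                 "c": 198.53, "h": 200.5399, "l": 197.535,
                 "t": 1746820800000, "n": 423067}],
    "status": "OK",
}
print(flatten_bar(sample))
```

Returning `None` for an empty or missing `results` list mirrors how `get_stock_data` signals a failed request, so callers can use the same truthiness check.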


- The function `rate_limited_get_stock_data` ensures that no more than 3 API calls are made per minute.
- If the API call limit is reached, the function waits for the next minute to continue fetching data.
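The function above uses module-level globals and a fixed one-minute window. As an alternative sketch (a different technique, not what the notebook uses), a sliding-window limiter keeps the timestamps of recent calls and waits only until the oldest one leaves the window. `SlidingWindowLimiter` and its parameters are hypothetical names. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of the calls still inside the window

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call falls out of the window
            self.sleep(self.period - (now - self.calls[0]))


limiter = SlidingWindowLimiter(max_calls=3, period=60.0)
for _ in range(3):
    limiter.acquire()  # the first three calls pass without waiting
print(len(limiter.calls))
```

In the notebook's loop, calling `limiter.acquire()` immediately before `get_stock_data(ticker)` would play the role of `rate_limited_get_stock_data`.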

## Integration with Kafka and Azure Event Hubs

#### Apache Kafka Integration with Azure Event Hubs
Azure Event Hubs is a versatile event streaming engine supporting the AMQP, Apache Kafka, and HTTPS protocols. Because it exposes a Kafka-compatible endpoint, existing Kafka workloads can usually be migrated by changing only the client configuration, without rewriting application code or managing Kafka clusters.

For more information, see [Azure Event Hubs for Apache Kafka](https://learn.microsoft.com/en-us/azure/event-hubs/azure-event-hubs-apache-kafka-overview).
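As an illustration of what Kafka compatibility means on the reading side, the following sketch points a standard `confluent_kafka` consumer at the namespace's Kafka endpoint. The helper names, the `$Default` group, and the `max_polls` bound are assumptions made for this example. The namespace, connection string, and topic must come from your own environment.

```python
def build_consumer_config(namespace, connection_string, group_id="$Default"):
    """Kafka client settings that point at the Event Hubs Kafka endpoint."""
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "$ConnectionString",  # literal value, not a placeholder
        "sasl.password": connection_string,
        "group.id": group_id,             # consumer group name (assumed default)
        "auto.offset.reset": "earliest",  # start from the oldest retained event
    }


def consume(namespace, connection_string, topic, max_polls=30):
    """Poll the topic up to `max_polls` times and print each payload received."""
    # Imported here so the config helper works even without the package installed
    from confluent_kafka import Consumer

    consumer = Consumer(build_consumer_config(namespace, connection_string))
    consumer.subscribe([topic])
    try:
        for _ in range(max_polls):
            msg = consumer.poll(1.0)  # wait up to one second for a message
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            print(msg.value().decode("utf-8"))
    finally:
        consumer.close()  # leave the group cleanly
```

Nothing in `consume` is specific to Event Hubs. Only the configuration dictionary differs from a consumer that talks to a self-managed Kafka cluster.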

#### Azure Schema Registry
Schema Registry in Event Hubs simplifies schema management for streaming applications by ensuring consistent data exchange between producers and consumers. It supports schema evolution, validation, and governance, enhancing data compatibility and efficiency.


![Kafka and Event Hub](https://learn.microsoft.com/en-us/azure/event-hubs/media/event-hubs-about/schema-registry.png)
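To show the kind of check a schema enforces, here is a deliberately simplified, pure-Python stand-in. It does not use the Azure Schema Registry SDK or a real schema format such as Avro. `STOCK_EVENT_SCHEMA` and `validate` are hypothetical names introduced only for this sketch.

```python
# Hypothetical schema: field name -> expected Python type
STOCK_EVENT_SCHEMA = {"ticker": str, "close": float, "volume": float}


def validate(event, schema):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            actual = type(event[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


good = {"ticker": "AAPL", "close": 198.53, "volume": 36453923.0}
bad = {"ticker": "AAPL", "close": "198.53"}  # wrong type, and volume is missing

print(validate(good, STOCK_EVENT_SCHEMA))
print(validate(bad, STOCK_EVENT_SCHEMA))
```

A producer that runs a check like this before sending, and a consumer that runs it after receiving, agree on the shape of each event. A registry centralizes that agreement and adds versioning on top of it.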



#### 3. Sending Data to Kafka (via Event Hubs)
Here’s how to send the stock data fetched from the API to Azure Event Hubs using Kafka:

In [0]:
from confluent_kafka import Producer  
import json 

In [0]:
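# Azure Event Hubs connection details
event_hub_namespace = "Transformatics-eventhubs"  
# Name of the event hub, which the Kafka client uses as the topic name
event_hub_name = ""  
# Connection string (it contains the shared access key) for the Event Hubs namespace
event_hub_connection_string =  ""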

In [0]:
# Kafka configuration for Event Hubs
conf = {
    'bootstrap.servers': f'{event_hub_namespace}.servicebus.windows.net:9093',  # The Event Hub broker address
    'security.protocol': 'SASL_SSL',  # SASL authentication over a TLS-encrypted connection
    'sasl.mechanisms': 'PLAIN',  # SASL authentication mechanism
    'sasl.username': '$ConnectionString',  # Literal string "$ConnectionString", not a variable
    'sasl.password': event_hub_connection_string,  # The full connection string acts as the password
    'client.id': 'python-producer'  # Client ID for the Kafka producer
}

# Create a Kafka Producer instance
producer = Producer(conf)  # The Producer object will be responsible for sending messages to Kafka

def send_to_kafka(topic, data):
    """Send the stock data to Kafka (Event Hub)"""
    try:
        # The data must be serialized to JSON format for transmission
        producer.produce(topic, key="stock-data", value=json.dumps(data))  # Send the stock data with 'stock-data' as the key
        producer.flush()  # Ensure that all messages are sent to Kafka
        print(f"Data sent to Kafka topic: {topic}")  # Print confirmation message
        print(data)  # Optionally print the data that was sent
    except Exception as e:
        # Handle errors that may occur while sending data
        print(f"Error sending data to Kafka: {e}")

In [0]:
while True:
    stock_data = rate_limited_get_stock_data(ticker)
    if stock_data:
        send_to_kafka(event_hub_name, stock_data)
    time.sleep(10)  # Optional: Wait 10 seconds before next request

Data sent to Kafka topic: zong-training
{'ticker': 'AAPL', 'queryCount': 1, 'resultsCount': 1, 'adjusted': True, 'results': [{'T': 'AAPL', 'v': 36453923.0, 'vw': 198.6614, 'o': 199, 'c': 198.53, 'h': 200.5399, 'l': 197.535, 't': 1746820800000, 'n': 423067}], 'status': 'OK', 'request_id': '1e34a03203482b0718c8eed31db6e465', 'count': 1}
Data sent to Kafka topic: zong-training
{'ticker': 'AAPL', 'queryCount': 1, 'resultsCount': 1, 'adjusted': True, 'results': [{'T': 'AAPL', 'v': 36453923.0, 'vw': 198.6614, 'o': 199, 'c': 198.53, 'h': 200.5399, 'l': 197.535, 't': 1746820800000, 'n': 423067}], 'status': 'OK', 'request_id': 'a6c6a809bcfb9288b24ab66b1b8d2005', 'count': 1}



- We configure the Kafka producer to connect to Azure Event Hubs over **SASL_SSL**.
- The function `send_to_kafka` sends stock data to the specified event hub, which the Kafka client addresses as a topic.
- The `produce` method sends each message with the key `stock-data`. Kafka clients choose the partition by hashing the key, so a single constant key routes every message to the same partition.
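If several tickers were streamed, a variation worth considering is to use the ticker itself as the key, so that each symbol keeps its own ordering while different symbols can spread across partitions. The sketch below also registers a delivery callback, which reports the partition and offset once the broker acknowledges a message. `send_keyed` and `delivery_report` are hypothetical names. `producer` is expected to be a `confluent_kafka.Producer` such as the one created earlier.

```python
import json


def delivery_report(err, msg):
    """Called once per message when the broker acknowledges or rejects it."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}] at offset {msg.offset()}")


def send_keyed(producer, topic, data):
    """Send `data` keyed by its ticker and return the key that was used."""
    key = data.get("ticker", "unknown")
    producer.produce(
        topic,
        key=key,
        value=json.dumps(data),
        on_delivery=delivery_report,
    )
    producer.poll(0)  # serve delivery callbacks from earlier sends without blocking
    return key


# Usage with the objects defined in this notebook (not executed here):
# send_keyed(producer, event_hub_name, stock_data)
# producer.flush()  # call once before shutdown to drain pending messages
```

Calling `poll(0)` after each send and `flush()` only at shutdown avoids blocking on every message, at the cost of learning about delivery failures slightly later.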
