## Consuming data using Kafka and visualising it (20%)
In this task, we will implement an Apache Kafka consumer to consume the data from Part 2.  
  
Important:   
-	In this part, Kafka consumers are used to consume the streaming data published from task 2.8.  
-	This visualisation part doesn’t require parallel processing, so please do not use Spark. It’s OK to use Pandas or any Python library to do simple calculations for the visualisation.  

In [None]:
!pip install kafka-python pandas matplotlib seaborn



### 1. (Basic plot) Plot a diagram to show data from 6a (i.e. every 15 seconds, plot the total revenue for each type of order). You are free to choose the type of plot.

In [None]:
from kafka import KafkaConsumer
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time

# Kafka Configurations
kafka_bootstrap_servers = "kafka:9092"
kafka_topic_6a = "time_windowed_revenue_topic"

# Initialize Kafka Consumer with a timeout (10 seconds)
consumer = KafkaConsumer(
    kafka_topic_6a,
    bootstrap_servers=kafka_bootstrap_servers,
    auto_offset_reset="earliest",
    enable_auto_commit=True,
    value_deserializer=lambda x: json.loads(x.decode("utf-8")),
    consumer_timeout_ms=10000  # Timeout after 10 seconds if no data
)

# Data Storage
data_list = []
start_time = time.time()

# Consume messages from Kafka
for message in consumer:
    data = message.value
    time_window_start = data["time_window"]["start"]
    order_type = data["type_of_order"]
    revenue = data["total_revenue"]

    # Store data in list
    data_list.append({
        "time_window": time_window_start,
        "type_of_order": order_type,
        "total_revenue": revenue
    })

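    # Stop once enough data has been collected. The elapsed-time check only runs
    # when a message arrives; consumer_timeout_ms covers the case of an idle topic.
    if len(data_list) >= 50 or (time.time() - start_time) > 15:  # At most 50 messages or 15 seconds
        break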

# Close consumer
consumer.close()

# Print status
print(f"Collected {len(data_list)} messages.")

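# If no data arrived, stop here. Raising is used instead of exit(),
# which would shut down the notebook kernel.
if not data_list:
    raise RuntimeError("No data received from Kafka.")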
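
One thing worth guarding against before plotting: with `auto_offset_reset="earliest"` the consumer replays the topic from the start, and if the Part 2 query emits updated results for a window as more orders arrive (an assumption; the brief does not say which output mode is used), the same window and order type can appear more than once with different totals. The sketch below keeps only the last value seen per (window start, order type) pair. `latest_per_window` is a hypothetical helper, and the sample rows are made-up values in the same shape as `data_list`.

```python
def latest_per_window(rows):
    """Keep the last-seen total for each (window start, order type) pair.

    `rows` is an iterable of dicts shaped like the entries of data_list:
    {"time_window": ..., "type_of_order": ..., "total_revenue": ...}.
    Later rows overwrite earlier ones that share the same key.
    """
    latest = {}
    for row in rows:
        key = (row["time_window"], row["type_of_order"])
        latest[key] = row["total_revenue"]
    # Flatten back into rows that pd.DataFrame(...) accepts directly.
    return [
        {"time_window": start, "type_of_order": order_type, "total_revenue": total}
        for (start, order_type), total in sorted(latest.items())
    ]


# Illustrative rows only (not real output from Part 2).
sample = [
    {"time_window": "2024-01-01T10:00:00", "type_of_order": "Snack", "total_revenue": 10.0},
    {"time_window": "2024-01-01T10:00:00", "type_of_order": "Meal", "total_revenue": 30.0},
    {"time_window": "2024-01-01T10:00:00", "type_of_order": "Snack", "total_revenue": 25.0},
]
rows = latest_per_window(sample)
print(rows)
```

If this kind of duplication does occur, `pd.DataFrame(latest_per_window(data_list))` could replace `pd.DataFrame(data_list)` in the next cell.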


In [1]:
import pandas as pd
# Convert to Pandas DataFrame
df = pd.DataFrame(data_list)

# Convert time_window to datetime for plotting
df["time_window"] = pd.to_datetime(df["time_window"])

# Display first few rows
print(df.head())




In [None]:
# Set up the visualization
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="time_window", y="total_revenue", hue="type_of_order", marker="o")

# Formatting
plt.xlabel("Time Window (Every 15s)")
plt.ylabel("Total Revenue")
plt.title("Total Revenue per Order Type (Every 15 Seconds)")
plt.xticks(rotation=45)
plt.legend(title="Order Type")
plt.grid(True)

# Show plot
plt.show()
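
The cells above draw the chart once from a fixed batch. Since the brief asks for plots that update as new streaming data arrives, one option is to poll the consumer repeatedly and redraw after each poll. What follows is a rough sketch rather than a definitive implementation: `fold_batch` and `live_plot` are hypothetical helpers, the one-second poll timeout and the number of rounds are arbitrary choices, and it assumes a `KafkaConsumer` configured as above that has not yet been closed.

```python
def fold_batch(batch, totals):
    """Merge one result of KafkaConsumer.poll() into `totals`.

    `batch` maps each topic partition to a list of records whose `.value`
    is the already-deserialised message dict. `totals` maps
    (window start, order type) -> latest total revenue and is updated in place.
    Returns the number of records folded in.
    """
    seen = 0
    for records in batch.values():
        for record in records:
            msg = record.value
            key = (msg["time_window"]["start"], msg["type_of_order"])
            totals[key] = msg["total_revenue"]
            seen += 1
    return seen


def live_plot(consumer, rounds=20, poll_ms=1000):
    """Poll `consumer` `rounds` times, redrawing the line chart after each poll."""
    # Imported here so fold_batch stays usable without the plotting stack.
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from IPython.display import clear_output

    totals = {}
    for _ in range(rounds):
        if fold_batch(consumer.poll(timeout_ms=poll_ms), totals) == 0:
            continue  # Nothing new: keep the previous figure on screen
        df = pd.DataFrame(
            [{"time_window": w, "type_of_order": t, "total_revenue": r}
             for (w, t), r in totals.items()]
        )
        df["time_window"] = pd.to_datetime(df["time_window"])
        clear_output(wait=True)  # Replace the previous figure in the notebook
        plt.figure(figsize=(12, 6))
        sns.lineplot(data=df.sort_values("time_window"), x="time_window",
                     y="total_revenue", hue="type_of_order", marker="o")
        plt.xticks(rotation=45)
        plt.show()
```

Calling `live_plot(consumer)` in its own cell would then refresh the chart roughly once per second for as long as new messages keep arriving.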


### 2. (Advanced plot) Plot a choropleth or bubble map to visualise data from 6b (restaurant’s suburb-based order count for <=15 and >15 minutes; you may use different colors or subplots).  
Choropleth: https://python-graph-gallery.com/choropleth-map/  
Bubble Map: https://python-graph-gallery.com/bubble-map/  
Note: Both plots shall be real-time plots, which will be updated if new streaming data comes in from part 2. For the advanced plot, if you need additional data for the plots, you can add them in part 2.  

In [None]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

# Load the data from the saved parquet file
delivery_time_aggregation_df = pd.read_parquet("parquet/delivery_time_aggregation")

# Load the map data (Assuming we have a shapefile with suburb boundaries)
suburb_map = gpd.read_file("suburb_boundaries.shp")

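# Aggregate counts per suburb (sum only the two count columns so that
# non-numeric columns do not break the aggregation)
suburb_aggregated = (
    delivery_time_aggregation_df
    .groupby("restaurant_suburb")[["count_under_15", "count_over_15"]]
    .sum()
    .reset_index()
)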

# Merge with the geographical data
merged = suburb_map.merge(suburb_aggregated, left_on="suburb_name", right_on="restaurant_suburb")

# Create the figure
fig, ax = plt.subplots(1, 2, figsize=(15, 7))

# Plot orders <= 15 minutes
merged.plot(column="count_under_15", cmap="Blues", linewidth=0.8, edgecolor="black", legend=True, ax=ax[0])
ax[0].set_title("Orders Delivered in ≤15 Minutes")
ax[0].set_axis_off()

# Plot orders > 15 minutes
merged.plot(column="count_over_15", cmap="Reds", linewidth=0.8, edgecolor="black", legend=True, ax=ax[1])
ax[1].set_title("Orders Delivered in >15 Minutes")
ax[1].set_axis_off()

# Show the plot
plt.tight_layout()
plt.show()
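
The default inner `merge` silently drops any suburb whose name differs between the two tables, for example in letter case or surrounding whitespace, and those suburbs would simply be missing from the map. A small sanity check before merging can surface such mismatches. This is a sketch under assumptions: `normalise` and `unmatched_suburbs` are hypothetical helpers, treating names as equal after trimming and lower-casing is an assumed rule that may not suit every dataset, and the sample names are made up.

```python
def normalise(name):
    """Collapse whitespace and lower-case a suburb name before comparing."""
    return " ".join(str(name).split()).lower()


def unmatched_suburbs(data_names, map_names):
    """Return the data-side names that have no counterpart on the map side."""
    known = {normalise(n) for n in map_names}
    return sorted({n for n in data_names if normalise(n) not in known})


# Made-up names for illustration only.
from_data = ["Clayton", "CAULFIELD ", "Notting Hill"]
from_map = ["clayton", "Caulfield", "Carlton"]
print(unmatched_suburbs(from_data, from_map))
```

With the frames from the cell above, the call would look like `unmatched_suburbs(suburb_aggregated["restaurant_suburb"], suburb_map["suburb_name"])`; an empty list suggests every aggregated suburb has a boundary to draw.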
