# Lab 6: Trino SQL Engine - Distributed Query Engine

## 🎯 Objectives
- Understand Trino SQL engine architecture
- Learn to query multiple data sources with SQL
- Practice federated queries across databases
- Implement data lakehouse analytics
- Optimize query performance

## 📋 Prerequisites
- Complete Labs 1-5
- All database containers are running
- Sample data is loaded
- Basic SQL knowledge

## 🏗️ Architecture Overview
Trino is a distributed SQL query engine designed to query large datasets distributed over one or more heterogeneous data sources. It allows you to:
- Query data from multiple sources with a single SQL statement
- Perform federated queries across different databases
- Access data lakes and data warehouses
- Scale horizontally for high performance
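
Trino addresses every table as `catalog.schema.table`, which is what lets one statement span several sources. The following is a minimal sketch of that naming scheme, assuming the `mongodb` and `postgresql` catalogs used later in this lab. The helper `qualified_name` is hypothetical and is not part of the Trino Python client.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Trino table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Table names taken from the catalogs used in this lab
products = qualified_name("mongodb", "ecommerce", "products")
orders = qualified_name("postgresql", "public", "orders")

# One SQL statement that references tables in two different catalogs
query = f"""
SELECT p.name, COUNT(o.order_id) AS times_ordered
FROM {products} p
LEFT JOIN {orders} o ON p.productId = o.product_id
GROUP BY p.name
"""
print(query)
```

The string is only built here, not executed; it can be passed to any Trino cursor once a cluster is reachable.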

## 📊 Use Case: Multi-Source Analytics Platform
We'll build an analytics platform that can query:
- **MongoDB** (Document Store) - Product catalog
- **Neo4j** (Graph Database) - User relationships  
- **Redis** (Key-Value Store) - Session data
- **PostgreSQL** (Relational) - Order transactions
- **CSV Files** (Data Lake) - Historical data


In [1]:
# Install Trino Python client
%pip install trino

# Import required libraries
import trino
import pandas as pd
import json
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

print("✅ Trino client installed successfully!")

# Real Trino Connection Setup
# The connection code below is wrapped in a triple-quoted string so it does not run.
# Remove the surrounding triple quotes to connect to a real Trino cluster.

"""
# Connect to Trino cluster
conn = trino.dbapi.connect(
    host='localhost',
    port=8080,
    user='admin',
    catalog='mongodb',  # Default catalog
    schema='ecommerce'   # Default schema
)

# Test connection
cursor = conn.cursor()
cursor.execute('SELECT 1')
result = cursor.fetchone()
print(f"✅ Connected to Trino cluster! Test query result: {result}")
"""

print("📝 Note: To use real Trino, uncomment the connection code above")
print("📝 For now, we'll use a simulator to demonstrate concepts")


Note: you may need to restart the kernel to use updated packages.
✅ Trino client installed successfully!
📝 Note: To use real Trino, uncomment the connection code above
📝 For now, we'll use a simulator to demonstrate concepts


In [2]:
# Real Trino Connection Setup
# Connect to actual Trino cluster

import trino
import pandas as pd
import json
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

class TrinoClient:
    """
    Real Trino client for connecting to Trino cluster
    """
    
    def __init__(self, host='localhost', port=8080, user='admin'):
        self.host = host
        self.port = port
        self.user = user
        self.conn = None
    
    def connect(self):
        """Connect to Trino cluster"""
        try:
            self.conn = trino.dbapi.connect(
                host=self.host,
                port=self.port,
                user=self.user,
                catalog='mongodb',  # Default catalog
                schema='ecommerce'   # Default schema
            )
            print(f"✅ Connected to Trino cluster at {self.host}:{self.port}")
            return True
        except Exception as e:
            print(f"❌ Failed to connect to Trino: {e}")
            print("📝 Make sure Trino cluster is running with: docker-compose -f docker-compose-trino.yml up")
            return False
    
    def execute_query(self, query: str):
        """Execute SQL query and return results"""
        if not self.conn:
            print("❌ Not connected to Trino. Call connect() first.")
            return None
        
        try:
            cursor = self.conn.cursor()
            cursor.execute(query)
            
            # Get column names
            columns = [desc[0] for desc in cursor.description]
            
            # Fetch all results
            results = cursor.fetchall()
            
            # Convert to DataFrame
            df = pd.DataFrame(results, columns=columns)
            
            print("🔍 Query executed successfully!")
            print(f"📊 Returned {len(df)} rows")
            return df
            
        except Exception as e:
            print(f"❌ Query failed: {e}")
            return None
    
    def show_catalogs(self):
        """Show available catalogs"""
        query = "SHOW CATALOGS"
        return self.execute_query(query)
    
    def show_schemas(self, catalog: str):
        """Show schemas in catalog"""
        query = f"SHOW SCHEMAS FROM {catalog}"
        return self.execute_query(query)
    
    def show_tables(self, catalog: str, schema: str):
        """Show tables in schema"""
        query = f"SHOW TABLES FROM {catalog}.{schema}"
        return self.execute_query(query)

# Initialize Trino client
trino_client = TrinoClient()

# Try to connect to real Trino cluster
if trino_client.connect():
    print("🎉 Successfully connected to Trino cluster!")
    
    # Show available catalogs
    print("\n📚 Available Catalogs:")
    catalogs_df = trino_client.show_catalogs()
    if catalogs_df is not None:
        print(catalogs_df)
    
    # Show schemas for each catalog
    for catalog in ['mongodb', 'postgresql']:
        print(f"\n📁 Schemas in {catalog}:")
        schemas_df = trino_client.show_schemas(catalog)
        if schemas_df is not None:
            print(schemas_df)
else:
    print("⚠️  Not connected. Start the Trino cluster to run the queries in this lab.")
    print("📝 Run: docker-compose -f docker-compose-trino.yml up")


✅ Connected to Trino cluster at localhost:8080
🎉 Successfully connected to Trino cluster!

📚 Available Catalogs:
🔍 Query executed successfully!
📊 Returned 3 rows
      Catalog
0     mongodb
1  postgresql
2      system

📁 Schemas in mongodb:
🔍 Query executed successfully!
📊 Returned 2 rows
               Schema
0           ecommerce
1  information_schema

📁 Schemas in postgresql:
🔍 Query executed successfully!
📊 Returned 3 rows
               Schema
0  information_schema
1          pg_catalog
2              public


## Exercise 1: Single-Source Queries

Let's start with basic queries on individual data sources to understand the data structure.


In [3]:
# Exercise 1: Single-Source Queries

# 1. Query MongoDB - Product Catalog
print("🔍 Querying MongoDB - Product Catalog")
mongodb_query = """
SELECT 
    productId,
    name,
    category,
    price,
    inventory.available as available_stock,
    metadata.reviews.averageRating as rating
FROM mongodb.ecommerce.products 
WHERE category = 'electronics'
ORDER BY price DESC
LIMIT 10
"""

result1 = trino_client.execute_query(mongodb_query)
if result1 is not None:
    print("📊 MongoDB Results:")
    print(result1)
print()

# 2. Query PostgreSQL - Available Tables
print("🔍 Querying PostgreSQL - Available Tables")
postgresql_query = """
SELECT table_name, table_type 
FROM postgresql.information_schema.tables 
WHERE table_schema = 'public'
ORDER BY table_name
"""

result2 = trino_client.execute_query(postgresql_query)
if result2 is not None:
    print("📊 PostgreSQL Tables:")
    print(result2)
print()

# 3. Show Available Catalogs
print("🔍 Available Catalogs:")
catalogs_df = trino_client.show_catalogs()
if catalogs_df is not None:
    print("📊 Available Catalogs:")
    print(catalogs_df)
print()

# 4. Show MongoDB Schemas and Tables
print("🔍 MongoDB Schema Information:")
schemas_df = trino_client.show_schemas('mongodb')
if schemas_df is not None:
    print("📊 MongoDB Schemas:")
    print(schemas_df)
    
    # Show tables in ecommerce schema
    tables_df = trino_client.show_tables('mongodb', 'ecommerce')
    if tables_df is not None:
        print("📊 MongoDB Tables in ecommerce schema:")
        print(tables_df)
print()

# 5. Show PostgreSQL Schemas and Tables
print("🔍 PostgreSQL Schema Information:")
schemas_df = trino_client.show_schemas('postgresql')
if schemas_df is not None:
    print("📊 PostgreSQL Schemas:")
    print(schemas_df)
    
    # Show tables in public schema
    tables_df = trino_client.show_tables('postgresql', 'public')
    if tables_df is not None:
        print("📊 PostgreSQL Tables in public schema:")
        print(tables_df)
print()

print("✅ Single-source queries completed!")


🔍 Querying MongoDB - Product Catalog
🔍 Query executed successfully!
📊 Returned 2 rows
📊 MongoDB Results:
  productId                name     category     price  available_stock  \
0  prod_001       iPhone 15 Pro  electronics  24000000               49   
1  prod_002  Samsung Galaxy S24  electronics  22000000               29   

   rating  
0     4.8  
1     4.6  

🔍 Querying PostgreSQL - Available Tables
🔍 Query executed successfully!
📊 Returned 3 rows
📊 PostgreSQL Tables:
      table_name  table_type
0      customers  BASE TABLE
1  order_summary  BASE TABLE
2         orders  BASE TABLE

🔍 Available Catalogs:
🔍 Query executed successfully!
📊 Returned 3 rows
📊 Available Catalogs:
      Catalog
0     mongodb
1  postgresql
2      system

🔍 MongoDB Schema Information:
🔍 Query executed successfully!
📊 Returned 2 rows
📊 MongoDB Schemas:
               Schema
0           ecommerce
1  information_schema
🔍 Query executed successfully!
📊 Returned 2 rows
📊 MongoDB Tables in ecommerce schema:
   

## Exercise 2: Federated Queries

Now let's perform federated queries that join data across multiple sources.
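
To make the semantics of a cross-source `LEFT JOIN` with aggregation concrete, here is a plain-Python sketch. The rows are hypothetical stand-ins for `mongodb.ecommerce.products` and `postgresql.public.orders`, not the lab's data, and the code only illustrates what the result looks like. It does not describe how Trino executes the join internally.

```python
from collections import defaultdict

# Hypothetical rows standing in for mongodb.ecommerce.products
products = [
    {"productId": "p1", "name": "Phone"},
    {"productId": "p2", "name": "Laptop"},
    {"productId": "p3", "name": "Tablet"},  # never ordered
]

# Hypothetical rows standing in for postgresql.public.orders
orders = [
    {"order_id": 101, "product_id": "p1", "total_amount": 100.0},
    {"order_id": 102, "product_id": "p1", "total_amount": 150.0},
    {"order_id": 103, "product_id": "p2", "total_amount": 900.0},
]

# Index the orders by the join key
orders_by_product = defaultdict(list)
for order in orders:
    orders_by_product[order["product_id"]].append(order)

# LEFT JOIN + GROUP BY: every product is kept, even without matching orders
summary = []
for product in products:
    matched = orders_by_product.get(product["productId"], [])
    summary.append({
        "productId": product["productId"],
        "times_ordered": len(matched),  # COUNT(o.order_id)
        # SUM over no rows is NULL in SQL, modelled here as None
        "total_revenue": sum(o["total_amount"] for o in matched) if matched else None,
    })

for row in summary:
    print(row)
```

The product without orders still appears, with a count of 0 and a missing revenue value. This is why the product query in this exercise sorts with `NULLS LAST`.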



In [4]:
# Exercise 2: Federated Queries

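# 1. Cross-Database User Analysis
# Note: this query joins two PostgreSQL tables (customers and orders) only.
# The join between MongoDB and PostgreSQL is in query 2 below.
print("🔍 Cross-Database User Analysis")
print("Combining MongoDB products with PostgreSQL orders")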
federated_query1 = """
SELECT 
    c.customer_id,
    c.name as customer_name,
    c.customer_tier,
    COUNT(o.order_id) as total_orders,
    SUM(o.total_amount) as total_spent,
    AVG(o.total_amount) as avg_order_value,
    MAX(o.order_date) as last_order_date
FROM postgresql.public.customers c
LEFT JOIN postgresql.public.orders o ON c.customer_id = o.user_id
GROUP BY c.customer_id, c.name, c.customer_tier
ORDER BY total_spent DESC
"""

result1 = trino_client.execute_query(federated_query1)
if result1 is not None:
    print("📊 Customer Analysis Results:")
    print(result1)
print()

# 2. Product Performance Analysis
print("🔍 Product Performance Analysis")
print("Combining MongoDB product catalog with PostgreSQL order data")
federated_query2 = """
SELECT 
    p.productId,
    p.name as product_name,
    p.category,
    p.price as current_price,
    p.inventory.available as available_stock,
    COUNT(o.order_id) as times_ordered,
    SUM(o.quantity) as total_quantity_sold,
    SUM(o.total_amount) as total_revenue,
    AVG(o.unit_price) as avg_selling_price
FROM mongodb.ecommerce.products p
LEFT JOIN postgresql.public.orders o ON p.productId = o.product_id
GROUP BY p.productId, p.name, p.category, p.price, p.inventory.available
ORDER BY total_revenue DESC NULLS LAST
"""

result2 = trino_client.execute_query(federated_query2)
if result2 is not None:
    print("📊 Product Performance Results:")
    print(result2)
print()

# 3. Real-time Business Intelligence Dashboard
print("🔍 Real-time Business Intelligence Dashboard")
print("Comprehensive business metrics across databases")
federated_query3 = """
SELECT 
    'Total Customers' as metric_name,
    COUNT(*) as metric_value,
    'customers' as unit
FROM postgresql.public.customers
UNION ALL
SELECT 
    'Total Products' as metric_name,
    COUNT(*) as metric_value,
    'products' as unit
FROM mongodb.ecommerce.products
UNION ALL
SELECT 
    'Total Orders' as metric_name,
    COUNT(*) as metric_value,
    'orders' as unit
FROM postgresql.public.orders
UNION ALL
SELECT 
    'Total Revenue' as metric_name,
    SUM(total_amount) as metric_value,
    'VND' as unit
FROM postgresql.public.orders
UNION ALL
SELECT 
    'Average Order Value' as metric_name,
    AVG(total_amount) as metric_value,
    'VND' as unit
FROM postgresql.public.orders
UNION ALL
SELECT 
    'Gold Customers' as metric_name,
    COUNT(*) as metric_value,
    'customers' as unit
FROM postgresql.public.customers
WHERE customer_tier = 'gold'
"""

result3 = trino_client.execute_query(federated_query3)
if result3 is not None:
    print("📊 Business Intelligence Dashboard:")
    print(result3)
print()

# 4. Customer Segmentation Analysis
print("🔍 Customer Segmentation Analysis")
print("Analyzing customer behavior patterns")
federated_query4 = """
SELECT 
    c.customer_tier,
    COUNT(DISTINCT c.customer_id) as customer_count,
    COUNT(o.order_id) as total_orders,
    SUM(o.total_amount) as total_revenue,
    AVG(o.total_amount) as avg_order_value,
    AVG(EXTRACT(DAY FROM CURRENT_TIMESTAMP - o.order_date)) as avg_days_since_last_order
FROM postgresql.public.customers c
LEFT JOIN postgresql.public.orders o ON c.customer_id = o.user_id
GROUP BY c.customer_tier
ORDER BY total_revenue DESC
"""

result4 = trino_client.execute_query(federated_query4)
if result4 is not None:
    print("📊 Customer Segmentation Results:")
    print(result4)
print()

print("✅ Federated queries completed!")


🔍 Cross-Database User Analysis
Combining MongoDB products with PostgreSQL orders
🔍 Query executed successfully!
📊 Returned 5 rows
📊 Customer Analysis Results:
  customer_id  customer_name customer_tier  total_orders  total_spent  \
0    cust_001  Alice Johnson          gold             2  45000000.00   
1    cust_004   David Wilson          gold             1  44000000.00   
2    cust_002      Bob Smith        silver             2  37000000.00   
3    cust_003    Carol Davis        bronze             1  15000000.00   
4    cust_005      Eva Brown        silver             1   5000000.00   

  avg_order_value     last_order_date  
0     22500000.00 2025-09-17 09:15:00  
1     44000000.00 2025-09-19 11:30:00  
2     18500000.00 2025-09-21 08:10:00  
3     15000000.00 2025-09-18 16:45:00  
4      5000000.00 2025-09-20 13:20:00  

🔍 Product Performance Analysis
Combining MongoDB product catalog with PostgreSQL order data
🔍 Query executed successfully!
📊 Returned 2 rows
📊 Product Performanc

## Exercise 3: Data Lakehouse Analytics

This exercise runs more advanced analytics (time-series trends, customer segmentation, and product performance) over the MongoDB and PostgreSQL catalogs, in the style of a data lakehouse workload.
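
The segmentation query in this exercise assigns each customer to value, frequency, and recency segments with `CASE` expressions. The sketch below mirrors those thresholds in plain Python so the rules are easy to check by hand. The function names are hypothetical, and `None` is used to model SQL `NULL` for customers with no orders, which fall through to the `ELSE` branch.

```python
from typing import Optional


def value_segment(total_spent: Optional[float]) -> str:
    """Mirror of the value_segment CASE expression (amounts in VND)."""
    if total_spent is not None and total_spent >= 30_000_000:
        return "High Value"
    if total_spent is not None and total_spent >= 15_000_000:
        return "Medium Value"
    return "Low Value"


def frequency_segment(order_frequency: int) -> str:
    """Mirror of the frequency_segment CASE expression."""
    if order_frequency >= 3:
        return "Frequent"
    if order_frequency >= 2:
        return "Regular"
    return "Occasional"


def recency_segment(days_since_last_order: Optional[int]) -> str:
    """Mirror of the recency_segment CASE expression."""
    if days_since_last_order is not None and days_since_last_order <= 7:
        return "Recent"
    if days_since_last_order is not None and days_since_last_order <= 30:
        return "Active"
    return "Inactive"


# Spend and order count as reported for cust_001 in Exercise 2;
# the 5 days since the last order is a hypothetical value.
print(value_segment(45_000_000), frequency_segment(2), recency_segment(5))
```

Because the boundaries use `>=` and `<=`, a customer who spent exactly 15,000,000 VND lands in "Medium Value", and one whose last order was exactly 30 days ago is still "Active".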


In [11]:
# Exercise 3: Data Lakehouse Analytics

# 1. Time-Series Analysis
print("🔍 Time-Series Analysis")
print("Analyzing order trends over time")
timeseries_query = """
WITH daily_metrics AS (
    SELECT 
        DATE_TRUNC('day', order_date) as order_day,
        COUNT(DISTINCT user_id) as unique_customers,
        COUNT(*) as total_orders,
        SUM(total_amount) as daily_revenue,
        AVG(total_amount) as avg_order_value
    FROM postgresql.public.orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30' DAY
    GROUP BY DATE_TRUNC('day', order_date)
)
SELECT 
    order_day,
    unique_customers,
    total_orders,
    daily_revenue,
    avg_order_value,
    -- Calculate growth rates
    LAG(daily_revenue) OVER (ORDER BY order_day) as prev_day_revenue,
    CASE 
        WHEN LAG(daily_revenue) OVER (ORDER BY order_day) > 0 
        THEN (daily_revenue - LAG(daily_revenue) OVER (ORDER BY order_day)) / 
             LAG(daily_revenue) OVER (ORDER BY order_day) * 100 
        ELSE 0 
    END as revenue_growth_pct
FROM daily_metrics
ORDER BY order_day DESC
"""

result1 = trino_client.execute_query(timeseries_query)
if result1 is not None:
    print("📊 Time-Series Analysis Results:")
    print(result1)
print()

# 2. Customer Segmentation Analysis
print("🔍 Customer Segmentation Analysis")
print("Analyzing customer behavior patterns")
segmentation_query = """
WITH customer_metrics AS (
    SELECT 
        c.customer_id,
        c.name as customer_name,
        c.customer_tier,
        -- Purchase behavior
        COUNT(DISTINCT o.order_id) as order_frequency,
        SUM(o.total_amount) as total_spent,
        AVG(o.total_amount) as avg_order_value,
        MAX(o.order_date) as last_order_date,
        -- Days since last order
        EXTRACT(DAY FROM CURRENT_TIMESTAMP - MAX(o.order_date)) as days_since_last_order
    FROM postgresql.public.customers c
    LEFT JOIN postgresql.public.orders o ON c.customer_id = o.user_id
    GROUP BY c.customer_id, c.name, c.customer_tier
),
customer_segments AS (
    SELECT 
        customer_id,
        customer_name,
        customer_tier,
        order_frequency,
        total_spent,
        avg_order_value,
        days_since_last_order,
        -- RFM Analysis
        CASE 
            WHEN total_spent >= 30000000 THEN 'High Value'
            WHEN total_spent >= 15000000 THEN 'Medium Value'
            ELSE 'Low Value'
        END as value_segment,
        CASE 
            WHEN order_frequency >= 3 THEN 'Frequent'
            WHEN order_frequency >= 2 THEN 'Regular'
            ELSE 'Occasional'
        END as frequency_segment,
        CASE 
            WHEN days_since_last_order <= 7 THEN 'Recent'
            WHEN days_since_last_order <= 30 THEN 'Active'
            ELSE 'Inactive'
        END as recency_segment
    FROM customer_metrics
)
SELECT 
    value_segment,
    frequency_segment,
    recency_segment,
    COUNT(*) as customer_count,
    AVG(total_spent) as avg_total_spent,
    AVG(order_frequency) as avg_order_frequency,
    AVG(days_since_last_order) as avg_days_since_last_order
FROM customer_segments
GROUP BY value_segment, frequency_segment, recency_segment
ORDER BY avg_total_spent DESC
"""

result2 = trino_client.execute_query(segmentation_query)
if result2 is not None:
    print("📊 Customer Segmentation Results:")
    print(result2)
print()

# 3. Product Performance Analytics
print("🔍 Product Performance Analytics")
print("Analyzing product performance and inventory")
product_analytics_query = """
WITH product_performance AS (
    SELECT 
        p.productId,
        p.name,
        p.category,
        p.price,
        p.inventory.available as current_stock,
        p.metadata.reviews.averageRating as rating,
        -- Sales performance
        COUNT(DISTINCT o.order_id) as times_ordered,
        SUM(o.quantity) as total_quantity_sold,
        SUM(o.total_amount) as total_revenue,
        AVG(o.unit_price) as avg_selling_price,
        -- Inventory turnover (cast to DOUBLE so integer operands are not truncated)
        CASE 
            WHEN p.inventory.available > 0 
            THEN CAST(SUM(o.quantity) AS DOUBLE) / p.inventory.available 
            ELSE 0 
        END as inventory_turnover_ratio
    FROM mongodb.ecommerce.products p
    LEFT JOIN postgresql.public.orders o ON p.productId = o.product_id
    GROUP BY p.productId, p.name, p.category, p.price, p.inventory.available, p.metadata.reviews.averageRating
),
product_insights AS (
    SELECT 
        productId,
        name,
        category,
        price,
        current_stock,
        rating,
        times_ordered,
        total_quantity_sold,
        total_revenue,
        avg_selling_price,
        inventory_turnover_ratio,
        -- Performance categories
        CASE 
            WHEN total_revenue >= 50000000 THEN 'Top Performer'
            WHEN total_revenue >= 20000000 THEN 'Good Performer'
            WHEN total_revenue >= 10000000 THEN 'Average Performer'
            ELSE 'Low Performer'
        END as performance_category,
        -- Stock status
        CASE 
            WHEN current_stock <= 10 THEN 'Low Stock'
            WHEN current_stock <= 30 THEN 'Medium Stock'
            ELSE 'High Stock'
        END as stock_status,
        -- Recommendation
        CASE 
            WHEN current_stock <= 10 AND times_ordered >= 2 THEN 'URGENT: Restock'
            WHEN current_stock <= 30 AND times_ordered >= 1 THEN 'Restock Soon'
            WHEN times_ordered = 0 AND current_stock > 20 THEN 'Consider Promotion'
            ELSE 'Monitor'
        END as recommended_action
    FROM product_performance
)
SELECT 
    productId,
    name,
    category,
    price,
    current_stock,
    rating,
    performance_category,
    stock_status,
    recommended_action,
    times_ordered,
    total_revenue,
    inventory_turnover_ratio
FROM product_insights
ORDER BY 
    CASE recommended_action
        WHEN 'URGENT: Restock' THEN 1
        WHEN 'Restock Soon' THEN 2
        WHEN 'Consider Promotion' THEN 3
        ELSE 4
    END,
    total_revenue DESC
"""

result3 = trino_client.execute_query(product_analytics_query)
if result3 is not None:
    print("📊 Product Performance Analytics Results:")
    print(result3)
print()

print("✅ Data lakehouse analytics completed!")


🔍 Time-Series Analysis
Analyzing order trends over time
🔍 Query executed successfully!
📊 Returned 7 rows
📊 Time-Series Analysis Results:
   order_day  unique_customers  total_orders daily_revenue avg_order_value  \
0 2025-09-21                 1             1   15000000.00     15000000.00   
1 2025-09-20                 1             1    5000000.00      5000000.00   
2 2025-09-19                 1             1   44000000.00     44000000.00   
3 2025-09-18                 1             1   15000000.00     15000000.00   
4 2025-09-17                 1             1   15000000.00     15000000.00   
5 2025-09-16                 1             1   22000000.00     22000000.00   
6 2025-09-15                 1             1   30000000.00     30000000.00   

  prev_day_revenue revenue_growth_pct  
0       5000000.00             200.00  
1      44000000.00             -89.00  
2      15000000.00             193.00  
3      15000000.00               0.00  
4      22000000.00             -32.00 
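
The `LAG`-based growth calculation in the time-series query can be reproduced by hand. The sketch below recomputes it in plain Python from the daily revenue figures shown in the output above. The function name `growth_pct` is hypothetical. Results here are rounded to two decimals, whereas the values in the notebook output appear to be rounded to whole percentages.

```python
# Daily revenue taken from the time-series output, oldest day first
daily_revenue = [
    ("2025-09-15", 30_000_000),
    ("2025-09-16", 22_000_000),
    ("2025-09-17", 15_000_000),
    ("2025-09-18", 15_000_000),
    ("2025-09-19", 44_000_000),
    ("2025-09-20", 5_000_000),
    ("2025-09-21", 15_000_000),
]


def growth_pct(rows):
    """Return (day, previous_revenue, growth_pct) for rows ordered by day.

    Mirrors the CASE expression in the query: growth is 0 when there is
    no previous day or the previous revenue is not positive.
    """
    result = []
    prev = None  # LAG(daily_revenue) is NULL on the first row
    for day, revenue in rows:
        if prev is not None and prev > 0:
            pct = (revenue - prev) / prev * 100
        else:
            pct = 0.0
        result.append((day, prev, round(pct, 2)))
        prev = revenue
    return result


for row in growth_pct(daily_revenue):
    print(row)
```

For 2025-09-21 this gives (15,000,000 - 5,000,000) / 5,000,000 × 100 = 200, matching the first row of the query output.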