# ACME Corp S3 Tables Complete Guide

This notebook walks through working with ACME Corp data end to end: converting it to Parquet, storing it in Amazon S3, registering it in the AWS Glue Data Catalog for SageMaker Lakehouse, and querying it with Amazon Athena.

## Table of Contents
1. [Environment Setup](#1-environment-setup)
2. [Data Preparation](#2-data-preparation)
3. [S3 Upload and Configuration](#3-s3-upload)
4. [SageMaker Lakehouse Setup](#4-sagemaker-lakehouse)
5. [Athena Query Examples](#5-athena-queries)
6. [MCP Server Integration](#6-mcp-server)
7. [Troubleshooting](#7-troubleshooting)
8. [Advanced Analytics](#8-advanced-analytics)

## 1. Environment Setup <a id='1-environment-setup'></a>

First, let's install and import all required libraries.

In [None]:
# Install required packages (%pip installs into the environment of the running kernel)
%pip install pandas pyarrow boto3 matplotlib seaborn plotly --quiet

# Import libraries
import pandas as pd
import numpy as np
import boto3
import json
import time
import os
from datetime import datetime
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display, Markdown, HTML

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Configure pandas display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

print("✅ Environment setup complete!")

### AWS Configuration

Configure your AWS credentials and region.

In [None]:
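# AWS Configuration
AWS_REGION = os.getenv('AWS_REGION', 'us-west-2')
# None falls back to boto3's default credential chain (env vars, ~/.aws, instance role)
AWS_PROFILE = os.getenv('AWS_PROFILE')

# Initialize AWS clients with the selected profile and region
session = boto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION)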
s3 = session.client('s3')
glue = session.client('glue')
athena = session.client('athena')
sts = session.client('sts')

# Get account ID
ACCOUNT_ID = sts.get_caller_identity()['Account']

# Define the bucket name and the Athena query-results location (an S3 URI, not a bucket)
S3_BUCKET = f'acme-corp-lakehouse-{ACCOUNT_ID}'
ATHENA_RESULTS_LOCATION = f's3://{S3_BUCKET}/athena-results/'

print(f"🔧 AWS Configuration:")
print(f"   Region: {AWS_REGION}")
print(f"   Account ID: {ACCOUNT_ID}")
print(f"   S3 Bucket: {S3_BUCKET}")
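The bucket name embeds the account ID so that it is globally unique. As a quick local sanity check before calling `create_bucket`, the sketch below tests a name against a subset of the S3 general purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; a letter or digit at each end; no adjacent periods; not formatted like an IPv4 address). `is_valid_bucket_name` is a hypothetical helper, and the S3 documentation lists further restrictions that it does not cover.

```python
import re

# Hypothetical helper: checks only a SUBSET of the S3 bucket naming rules.
# First char + 1..61 middle chars + last char = 3..63 characters in total.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic S3 bucket naming checks."""
    if not _BUCKET_RE.match(name):
        return False  # wrong length, illegal characters, or bad first/last char
    if ".." in name:
        return False  # adjacent periods are not allowed
    if _IPV4_RE.match(name):
        return False  # must not look like an IP address
    return True

print(is_valid_bucket_name("acme-corp-lakehouse-123456789012"))  # True
print(is_valid_bucket_name("ACME_Corp"))  # False
```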

## 2. Data Preparation <a id='2-data-preparation'></a>

Convert the ACME Corp CSV files to Snappy-compressed Parquet, the columnar format that the Glue tables in Section 4 expect.

In [None]:
def prepare_s3_tables():
    """
    Convert CSV files to Parquet format for S3 Tables
    """
    # Create output directory
    s3_tables_dir = "s3_tables_format"
    os.makedirs(s3_tables_dir, exist_ok=True)
    
    # Define data directories
    data_dirs = {
        "ad_campaign": "ad_campaign_data",
        "streaming": "streaming_analytics", 
        "users": "user_details"
    }
    
    table_stats = []
    
    for category, dir_name in data_dirs.items():
        category_dir = os.path.join(s3_tables_dir, category)
        os.makedirs(category_dir, exist_ok=True)
        
        # Find all CSV files
        csv_files = list(Path(dir_name).glob("*.csv"))
        
        for csv_file in csv_files:
            try:
                # Read CSV
                df = pd.read_csv(csv_file)
                
                # Convert to Parquet
                table_name = Path(csv_file).stem
                output_path = os.path.join(category_dir, f"{table_name}.parquet")
                df.to_parquet(output_path, engine='pyarrow', compression='snappy')
                
                # Collect statistics
                stats = {
                    'table': table_name,
                    'category': category,
                    'rows': len(df),
                    'columns': len(df.columns),
                    'size_mb': os.path.getsize(output_path) / (1024 * 1024)
                }
                table_stats.append(stats)
                
                print(f"✅ Converted {table_name}: {len(df):,} rows")
                
            except Exception as e:
                print(f"❌ Error converting {csv_file}: {e}")
    
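    # Build the statistics table (explicit columns keep it well-formed even if nothing was converted)
    stats_df = pd.DataFrame(table_stats, columns=['table', 'category', 'rows', 'columns', 'size_mb'])
    if stats_df.empty:
        print("⚠️  No CSV files were converted; check the data directories.")
        return stats_df
    display(stats_df)
    
    print(f"\n📊 Summary:")
    print(f"   Total tables: {len(stats_df)}")
    print(f"   Total rows: {stats_df['rows'].sum():,}")
    print(f"   Total size: {stats_df['size_mb'].sum():.2f} MB")
    
    return stats_df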

# Run data preparation
if os.path.exists('ad_campaign_data'):
    table_stats = prepare_s3_tables()
else:
    print("⚠️  Data directories not found. Please ensure ACME Corp data is in the current directory.")
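`prepare_s3_tables()` converts everything it finds in one pass. When pointing it at a new data drop, a dry run that only lists the planned conversions can confirm the directory layout first. The sketch below mirrors the paths used above; `plan_conversions` and `parquet_path_for` are hypothetical helpers that read and write nothing.

```python
from pathlib import Path

def parquet_path_for(csv_file, category, out_dir="s3_tables_format"):
    """Local Parquet path that prepare_s3_tables() writes for one CSV file."""
    return Path(out_dir) / category / f"{Path(csv_file).stem}.parquet"

def plan_conversions(data_dirs, out_dir="s3_tables_format"):
    """
    Dry run: return (csv_path, parquet_path) pairs for every CSV file in the
    given {category: directory} mapping. Nothing is read or written.
    """
    plan = []
    for category, dir_name in data_dirs.items():
        source_dir = Path(dir_name)
        if not source_dir.is_dir():
            print(f"⚠️  Skipping missing directory: {source_dir}")
            continue
        for csv_file in sorted(source_dir.glob("*.csv")):
            plan.append((csv_file, parquet_path_for(csv_file, category, out_dir)))
    return plan

# Same category -> directory mapping as in prepare_s3_tables()
for src, dst in plan_conversions({
    "ad_campaign": "ad_campaign_data",
    "streaming": "streaming_analytics",
    "users": "user_details",
}):
    print(f"{src} -> {dst}")
```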

### Data Quality Check

Let's verify the data quality and structure.

In [None]:
def check_data_quality(parquet_file):
    """
    Perform data quality checks on a Parquet file
    """
    df = pd.read_parquet(parquet_file)
    
    print(f"\n📋 Data Quality Report: {os.path.basename(parquet_file)}")
    print("=" * 50)
    
    # Basic info
    print(f"\nShape: {df.shape[0]:,} rows × {df.shape[1]} columns")
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    
    # Missing values
    missing = df.isnull().sum()
    if missing.any():
        print("\n⚠️  Missing values:")
        display(missing[missing > 0])
    else:
        print("\n✅ No missing values")
    
    # Data types
    print("\n📊 Data types:")
    display(df.dtypes.value_counts())
    
    # Sample data
    print("\n🔍 Sample data (first 5 rows):")
    display(df.head())
    
    return df

# Check a sample file
sample_file = "s3_tables_format/users/user_details.parquet"
if os.path.exists(sample_file):
    sample_df = check_data_quality(sample_file)
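The quality report checks values, not the set of columns. A quick structural check is to compare the file's columns with the Glue schema it will be registered under. The sketch below assumes a schema in the same `{"Name": ..., "Type": ...}` shape that `get_table_schema()` returns in Section 4; `compare_columns` is a hypothetical helper.

```python
def compare_columns(actual_columns, expected_schema):
    """
    Compare the columns found in a file with a Glue-style schema
    (a list of {"Name": ..., "Type": ...} dicts). Returns the missing and
    unexpected column names, each sorted.
    """
    actual = set(actual_columns)
    expected = {col["Name"] for col in expected_schema}
    return {
        "missing": sorted(expected - actual),     # in the schema, not in the file
        "unexpected": sorted(actual - expected),  # in the file, not in the schema
    }

expected = [{"Name": "user_id", "Type": "string"}, {"Name": "email", "Type": "string"}]
print(compare_columns(["user_id", "signup_date"], expected))
# {'missing': ['email'], 'unexpected': ['signup_date']}
```

Once the Section 4 cell has run, `compare_columns(sample_df.columns, get_table_schema('user_details'))` should return two empty lists if the file and the schema agree.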

## 3. S3 Upload and Configuration <a id='3-s3-upload'></a>

Upload the prepared Parquet files to S3.

In [None]:
def create_s3_bucket(bucket_name, region='us-west-2'):
    """
    Create S3 bucket if it doesn't exist
    """
    try:
        if region == 'us-east-1':
            s3.create_bucket(Bucket=bucket_name)
        else:
            s3.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={'LocationConstraint': region}
            )
        print(f"✅ Created bucket: {bucket_name}")
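    except s3.exceptions.BucketAlreadyExists:
        # Bucket names are global: this name belongs to another AWS account
        raise RuntimeError(f"Bucket name '{bucket_name}' is taken by another account; choose a different name")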
    except s3.exceptions.BucketAlreadyOwnedByYou:
        print(f"ℹ️  You already own bucket: {bucket_name}")

def upload_to_s3(local_path, s3_path):
    """
    Upload file to S3
    """
    try:
        s3.upload_file(local_path, S3_BUCKET, s3_path)
        print(f"✅ Uploaded: {s3_path}")
        return True
    except Exception as e:
        print(f"❌ Error uploading {local_path}: {e}")
        return False

def upload_all_tables():
    """
    Upload all Parquet files to S3
    """
    # Create bucket
    create_s3_bucket(S3_BUCKET, AWS_REGION)
    
    # Upload files
    upload_count = 0
    s3_tables_dir = "s3_tables_format"
    
    for category in ['users', 'streaming', 'ad_campaign']:
        category_dir = os.path.join(s3_tables_dir, category)
        if os.path.exists(category_dir):
            for parquet_file in Path(category_dir).glob("*.parquet"):
                table_name = parquet_file.stem
                s3_key = f"tables/{category}/{table_name}/{table_name}.parquet"
                
                if upload_to_s3(str(parquet_file), s3_key):
                    upload_count += 1
    
    print(f"\n✅ Upload complete! {upload_count} files uploaded to S3.")
    return upload_count

# Upload tables to S3
# upload_count = upload_all_tables()
print("⚠️  Uncomment the line above to actually upload files to S3")
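The upload places each Parquet file in its own folder, `tables/<category>/<table>/`. That matters for the Glue registration in Section 4: a table's `Location` has to be that folder (prefix), not the object key, because Athena reads every object under the location. The sketch below spells out both paths for one table; `s3_key_for` and `glue_location_for` are hypothetical helpers that simply restate the conventions used in this notebook.

```python
def s3_key_for(category, table_name):
    """Object key that upload_all_tables() uses for one table's Parquet file."""
    return f"tables/{category}/{table_name}/{table_name}.parquet"

def glue_location_for(bucket, category, table_name):
    """Glue table Location for the same table: the folder, with a trailing '/'."""
    return f"s3://{bucket}/tables/{category}/{table_name}/"

# "my-bucket" is a placeholder; the notebook uses S3_BUCKET
print(s3_key_for("users", "user_details"))
# tables/users/user_details/user_details.parquet
print(glue_location_for("my-bucket", "users", "user_details"))
# s3://my-bucket/tables/users/user_details/
```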

## 4. SageMaker Lakehouse Setup <a id='4-sagemaker-lakehouse'></a>

Set up the Glue Data Catalog for SageMaker Lakehouse integration.

In [None]:
# Glue database configuration
DATABASE_NAME = 'acme_corp_lakehouse'

def create_glue_database():
    """
    Create Glue database for SageMaker Lakehouse
    """
    try:
        glue.create_database(
            DatabaseInput={
                'Name': DATABASE_NAME,
                'Description': 'ACME Corp SageMaker Lakehouse database',
                'LocationUri': f's3://{S3_BUCKET}/databases/{DATABASE_NAME}/'
            }
        )
        print(f"✅ Created Glue database: {DATABASE_NAME}")
    except glue.exceptions.AlreadyExistsException:
        print(f"ℹ️  Database already exists: {DATABASE_NAME}")

def get_table_schema(table_name):
    """
    Define table schemas for Glue
    """
    schemas = {
        "user_details": [
            {"Name": "user_id", "Type": "string"},
            {"Name": "email", "Type": "string"},
            {"Name": "age", "Type": "bigint"},
            {"Name": "gender", "Type": "string"},
            {"Name": "country", "Type": "string"},
            {"Name": "city", "Type": "string"},
            {"Name": "subscription_plan", "Type": "string"},
            {"Name": "monthly_price", "Type": "double"},
            {"Name": "signup_date", "Type": "string"},
            {"Name": "is_active", "Type": "boolean"},
            {"Name": "last_payment_date", "Type": "string"},
            {"Name": "primary_device", "Type": "string"},
            {"Name": "num_profiles", "Type": "bigint"},
            {"Name": "payment_method", "Type": "string"},
            {"Name": "lifetime_value", "Type": "double"},
            {"Name": "referral_source", "Type": "string"},
            {"Name": "language_preference", "Type": "string"}
        ],
        "campaigns": [
            {"Name": "campaign_id", "Type": "string"},
            {"Name": "campaign_name", "Type": "string"},
            {"Name": "campaign_type", "Type": "string"},
            {"Name": "objective", "Type": "string"},
            {"Name": "start_date", "Type": "string"},
            {"Name": "end_date", "Type": "string"},
            {"Name": "budget", "Type": "double"},
            {"Name": "target_audience", "Type": "string"},
            {"Name": "target_countries", "Type": "string"},
            {"Name": "promoted_content_id", "Type": "string"},
            {"Name": "promoted_content_title", "Type": "string"}
        ]
        # Add other table schemas as needed
    }
    return schemas.get(table_name, [])

def register_glue_table(table_name, s3_location, schema):
    """
    Register table in Glue Data Catalog
    """
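    if not schema:
        print(f"❌ No schema defined for {table_name}; add it to get_table_schema() first")
        return False
    
    try:
        # Drop any existing definition so the table is recreated from scratch
        try:
            glue.delete_table(DatabaseName=DATABASE_NAME, Name=table_name)
            print(f"🗑️  Deleted existing table: {table_name}")
        except glue.exceptions.EntityNotFoundException:
            pass  # No existing table to delete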
        
        # Create table with correct Parquet configuration
        table_input = {
            'Name': table_name,
            'StorageDescriptor': {
                'Columns': schema,
                'Location': s3_location,
                'InputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
                'OutputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat',
                'SerdeInfo': {
                    'SerializationLibrary': 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe',
                    'Parameters': {'serialization.format': '1'}
                },
                'StoredAsSubDirectories': False
            },
            'PartitionKeys': [],
            'TableType': 'EXTERNAL_TABLE',
            'Parameters': {
                'EXTERNAL': 'TRUE',
                'parquet.compression': 'SNAPPY',
                'classification': 'parquet'
            }
        }
        
        glue.create_table(
            DatabaseName=DATABASE_NAME,
            TableInput=table_input
        )
        
        print(f"✅ Registered table: {table_name}")
        return True
        
    except Exception as e:
        print(f"❌ Error registering table {table_name}: {e}")
        return False

# Example: Register a table
# create_glue_database()
# schema = get_table_schema('user_details')
# register_glue_table('user_details', f's3://{S3_BUCKET}/tables/users/user_details/', schema)
print("⚠️  Uncomment the lines above to create Glue database and tables")
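`get_table_schema()` only covers two tables and returns an empty list for the rest. One way to fill the gap, sketched below under the assumption that the pandas dtypes of the converted files are correct, is to derive the Glue column list from the DataFrame itself. `PANDAS_TO_GLUE` and `glue_columns_from_dtypes` are hypothetical; the mapping is deliberately small and raises on anything it does not know, such as datetime columns, whose Parquet encoding needs separate care.

```python
# Hypothetical mapping from pandas dtype names to Glue column types.
# Only dtypes with a straightforward Parquet/Athena mapping are listed.
PANDAS_TO_GLUE = {
    "int64": "bigint",
    "int32": "int",
    "float64": "double",
    "float32": "float",
    "bool": "boolean",
    "object": "string",  # pandas stores CSV text columns as object
}

def glue_columns_from_dtypes(dtypes):
    """
    Build a Glue 'Columns' list from a {column_name: dtype_name} mapping,
    such as df.dtypes.astype(str).to_dict(). Names are lowercased because
    the Glue Data Catalog stores column names in lowercase. Raises
    ValueError for unmapped dtypes so they get explicit handling.
    """
    columns = []
    for name, dtype in dtypes.items():
        glue_type = PANDAS_TO_GLUE.get(str(dtype))
        if glue_type is None:
            raise ValueError(f"No Glue type mapping for column {name!r} with dtype {dtype}")
        columns.append({"Name": name.lower(), "Type": glue_type})
    return columns

print(glue_columns_from_dtypes({"user_id": "object", "age": "int64"}))
# [{'Name': 'user_id', 'Type': 'string'}, {'Name': 'age', 'Type': 'bigint'}]
```

For example, `glue_columns_from_dtypes(pd.read_parquet('s3_tables_format/users/user_details.parquet').dtypes.astype(str).to_dict())` produces a list that can be passed as `schema` to `register_glue_table`. Review the result before registering: a CSV column of dates is read as `object` and therefore becomes `string`, which matches the hand-written schemas above.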

## 5. Athena Query Examples <a id='5-athena-queries'></a>

Execute queries using Amazon Athena.
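Athena queries run asynchronously: `start_query_execution` returns a `QueryExecutionId` immediately, and the caller polls `get_query_execution` until the state is `SUCCEEDED`, `FAILED`, or `CANCELLED` before fetching results. The `AthenaQueryExecutor` class in the next cell polls once per second. As an alternative, the sketch below polls with exponential backoff, which issues fewer API calls for long-running queries. `wait_for_terminal_state` is a hypothetical helper; `get_state` is any zero-argument callable, and `sleep` can be swapped out for testing.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}

def wait_for_terminal_state(get_state, timeout=60.0, initial_delay=0.5,
                            max_delay=5.0, sleep=time.sleep):
    """
    Poll get_state() until it returns a terminal Athena state or `timeout`
    seconds of cumulative sleep have elapsed. The delay doubles after each
    poll, capped at max_delay. Returns the last state seen, which may still
    be QUEUED or RUNNING if the timeout was reached.
    """
    delay, waited = initial_delay, 0.0
    state = get_state()
    while state not in TERMINAL_STATES and waited < timeout:
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
        state = get_state()
    return state

# Offline demo with a fake state sequence and a no-op sleep
states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(states), sleep=lambda s: None))  # SUCCEEDED
```

With the clients from Section 1, `get_state` could be `lambda: athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']`. Note that `timeout` counts only time spent sleeping, not API latency.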

In [None]:
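class AthenaQueryExecutor:
    """
    Helper class for executing Athena queries.
    Athena returns every value as a string; convert columns (for example
    with pd.to_numeric) before doing arithmetic on them.
    """
    def __init__(self, database=DATABASE_NAME, workgroup='primary'):
        self.athena = athena
        self.database = database
        self.workgroup = workgroup
        self.output_location = f's3://{S3_BUCKET}/athena-results/'
    
    def execute_query(self, query, max_wait=30):
        """
        Execute an Athena query and return the results as a DataFrame.
        Returns None if the query fails or is still running after max_wait seconds.
        """
        try:
            # Start query execution in the configured workgroup
            response = self.athena.start_query_execution(
                QueryString=query,
                QueryExecutionContext={'Database': self.database},
                ResultConfiguration={'OutputLocation': self.output_location},
                WorkGroup=self.workgroup
            )
            
            query_id = response['QueryExecutionId']
            print(f"🔄 Query ID: {query_id}")
            
            # Poll once per second until the query reaches a terminal state
            status = 'QUEUED'
            for i in range(max_wait):
                result = self.athena.get_query_execution(QueryExecutionId=query_id)
                status = result['QueryExecution']['Status']['State']
                
                if status in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
                    break
                    
                time.sleep(1)
                print(f"⏳ Waiting... ({i+1}s)", end='\r')
            
            if status not in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
                # Timed out: cancel the query instead of reporting it as failed
                self.athena.stop_query_execution(QueryExecutionId=query_id)
                print(f"\n⏱️  Query still {status} after {max_wait}s; cancelled it")
                return None
            
            if status == 'SUCCEEDED':
                print(f"\n✅ Query succeeded!")
                
                # get_query_results returns at most 1,000 rows per call, so paginate
                paginator = self.athena.get_paginator('get_query_results')
                columns, data, first_page = [], [], True
                for page in paginator.paginate(QueryExecutionId=query_id):
                    rows = page['ResultSet']['Rows']
                    if first_page:
                        # For SELECT queries, the first row of the first page holds the column names
                        columns = [col.get('VarCharValue') for col in rows[0]['Data']] if rows else []
                        rows = rows[1:]
                        first_page = False
                    # A missing 'VarCharValue' key means the value is NULL
                    data.extend([col.get('VarCharValue') for col in row['Data']] for row in rows)
                return pd.DataFrame(data, columns=columns)
                    
            else:
                error = result['QueryExecution']['Status'].get('StateChangeReason', 'Unknown')
                print(f"\n❌ Query {status.lower()}: {error}")
                return None
                
        except Exception as e:
            print(f"\n❌ Error: {e}")
            return None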

# Initialize query executor
query_executor = AthenaQueryExecutor()

# Example queries
example_queries = {
    "user_stats": """
        SELECT 
            subscription_plan,
            COUNT(*) as user_count,
            AVG(CAST(monthly_price AS DOUBLE)) as avg_price,
            SUM(CASE WHEN is_active = true THEN 1 ELSE 0 END) as active_users
        FROM user_details
        GROUP BY subscription_plan
        ORDER BY user_count DESC
    """,
    
    "payment_analysis": """
        SELECT 
            payment_method,
            COUNT(*) as users,
            ROUND(AVG(lifetime_value), 2) as avg_ltv,
            ROUND(SUM(lifetime_value), 2) as total_ltv
        FROM user_details
        GROUP BY payment_method
        ORDER BY total_ltv DESC
    """,
    
    "device_distribution": """
        SELECT 
            primary_device,
            subscription_plan,
            COUNT(*) as user_count
        FROM user_details
        WHERE is_active = true
        GROUP BY primary_device, subscription_plan
        ORDER BY primary_device, user_count DESC
    """
}

# Execute a sample query
# result_df = query_executor.execute_query(example_queries['user_stats'])
# if result_df is not None:
#     display(result_df)
print("⚠️  Uncomment the lines above to execute Athena queries")
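Every value in a `GetQueryResults` response arrives as a string in `VarCharValue`, which is why the visualization below calls `pd.to_numeric`. The response also carries column types in `ResultSetMetadata.ColumnInfo`, so the casting can be done once at fetch time. The sketch below is a hypothetical `rows_to_records` helper that handles a few common types and leaves others (for example `decimal` and `date`) as strings; it assumes the header row is the first row, as on the first page of a `SELECT` result.

```python
def rows_to_records(result_set):
    """
    Convert the 'ResultSet' part of an Athena GetQueryResults response into a
    list of dicts, casting a few common column types. Assumes the first row
    is the header row (first page of a SELECT query's results).
    """
    casts = {
        "bigint": int, "integer": int, "smallint": int, "tinyint": int,
        "double": float, "float": float, "real": float,
        "boolean": lambda v: v.lower() == "true",
    }
    columns = result_set["ResultSetMetadata"]["ColumnInfo"]
    records = []
    for row in result_set["Rows"][1:]:  # skip the header row
        record = {}
        for info, cell in zip(columns, row["Data"]):
            value = cell.get("VarCharValue")  # a missing key means SQL NULL
            cast = casts.get(info["Type"])
            record[info["Name"]] = cast(value) if (cast and value is not None) else value
        records.append(record)
    return records
```

For example, `pd.DataFrame(rows_to_records(athena.get_query_results(QueryExecutionId=query_id)['ResultSet']))` would give typed columns for a single page of results.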

### Query Visualization

Visualize query results with interactive charts.

In [None]:
def visualize_subscription_distribution(df):
    """
    Create interactive visualization of subscription distribution
    """
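    # Athena returns values as strings, so convert the counts to numbers
    # (on a copy, so the caller's DataFrame is not modified)
    df = df.copy()
    df['user_count'] = pd.to_numeric(df['user_count'])
    df['active_users'] = pd.to_numeric(df['active_users'])
    
    # Create a figure with two grouped bar traces
    fig = go.Figure()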
    
    # Add bars for total users
    fig.add_trace(go.Bar(
        name='Total Users',
        x=df['subscription_plan'],
        y=df['user_count'],
        text=df['user_count'],
        textposition='auto',
    ))
    
    # Add bars for active users
    fig.add_trace(go.Bar(
        name='Active Users',
        x=df['subscription_plan'],
        y=df['active_users'],
        text=df['active_users'],
        textposition='auto',
    ))
    
    # Update layout
    fig.update_layout(
        title='Subscription Plan Distribution',
        xaxis_title='Subscription Plan',
        yaxis_title='Number of Users',
        barmode='group',
        height=500,
        template='plotly_white'
    )
    
    return fig

# Example: visualize hard-coded sample values (swap in execute_query output for real data)
sample_data = pd.DataFrame({
    'subscription_plan': ['Premium', 'Standard', 'Basic', 'Premium Plus'],
    'user_count': [3143, 2992, 2238, 1627],
    'active_users': [2984, 2702, 1887, 1556]
})

fig = visualize_subscription_distribution(sample_data)
fig.show()

## 6. MCP Server Integration <a id='6-mcp-server'></a>

Configure the AWS Data Processing MCP Server, which lets AI agents work with the Glue Data Catalog and Athena.

In [None]:
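# MCP Server Configuration
# The "mcpServers" section follows the standard MCP client format; the
# "capabilities" section is notebook-specific and records the Athena/Glue settings for reference.
mcp_config = {
    "mcpServers": {
        "aws-dataprocessing": {
            "command": "uvx",
            "args": [
                "awslabs.aws-dataprocessing-mcp-server@latest",
                "--allow-write"
            ],
            "env": {
                "AWS_REGION": AWS_REGION,
                "AWS_PROFILE": AWS_PROFILE or "default"
            }
        }
    },
    "capabilities": {
        "athena": {
            "enabled": True,
            "workgroup": "primary",
            # Same results location that AthenaQueryExecutor uses
            "output_location": f"s3://{S3_BUCKET}/athena-results/",
            "database": DATABASE_NAME
        },
        "glue": {
            "enabled": True,
            "catalog_id": "auto"
        }
    }
}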

print("📋 MCP Server Configuration:")
print(json.dumps(mcp_config, indent=2))

# Setup instructions
print("\n📦 To use the MCP Server:")
print("1. Install uv: curl -LsSf https://astral.sh/uv/install.sh | sh")
print("2. No separate install step is needed: uvx downloads and runs the server when the client starts it")
print("3. Add the 'mcpServers' entry above to your MCP client's configuration file (its location depends on the client)")
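The configuration above is printed, not saved. The sketch below writes the `mcpServers` entry into a JSON config file, merging it with any servers already defined there. `mcp_server_entry`, `merge_mcp_config`, and `write_mcp_config` are hypothetical helpers, and the target path is deliberately a parameter because each MCP client keeps its configuration in a different place. As we understand the server's documentation, write operations are only enabled when `--allow-write` is passed, so `allow_write=False` should give a read-only setup.

```python
import json
from pathlib import Path

def mcp_server_entry(region, profile="default", allow_write=True):
    """Server definition in the standard 'mcpServers' shape."""
    args = ["awslabs.aws-dataprocessing-mcp-server@latest"]
    if allow_write:
        args.append("--allow-write")
    return {"command": "uvx", "args": args,
            "env": {"AWS_REGION": region, "AWS_PROFILE": profile}}

def merge_mcp_config(existing, entry, name="aws-dataprocessing"):
    """Return a copy of `existing` with `entry` added under mcpServers[name]."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged

def write_mcp_config(path, region, profile="default", allow_write=True):
    """Merge the entry into the JSON file at `path` (created if missing)."""
    path = Path(path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    config = merge_mcp_config(existing, mcp_server_entry(region, profile, allow_write))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return config

print(json.dumps(merge_mcp_config({}, mcp_server_entry("us-west-2")), indent=2))
```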

### AI Agent Query Examples

Example natural-language questions paired with the SQL an AI agent might generate for them.

In [None]:
# Natural language query examples
nl_queries = [
    {
        "question": "What is the average lifetime value of Premium subscribers?",
        "sql": """
            SELECT 
                subscription_plan,
                COUNT(*) as user_count,
                ROUND(AVG(lifetime_value), 2) as avg_lifetime_value,
                ROUND(MIN(lifetime_value), 2) as min_ltv,
                ROUND(MAX(lifetime_value), 2) as max_ltv
            FROM user_details
            WHERE subscription_plan = 'Premium'
            GROUP BY subscription_plan
        """
    },
    {
        "question": "Which payment methods are most popular among active users?",
        "sql": """
            SELECT 
                payment_method,
                COUNT(*) as active_user_count,
                ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
            FROM user_details
            WHERE is_active = true
            GROUP BY payment_method
            ORDER BY active_user_count DESC
        """
    },
    {
        "question": "Show me the geographic distribution of Premium Plus subscribers",
        "sql": """
            SELECT 
                country,
                COUNT(*) as subscriber_count,
                COUNT(DISTINCT city) as cities_count
            FROM user_details
            WHERE subscription_plan = 'Premium Plus'
            GROUP BY country
            ORDER BY subscriber_count DESC
            LIMIT 10
        """
    }
]

# Display natural language queries
for i, query in enumerate(nl_queries, 1):
    print(f"\n🤖 Query {i}: {query['question']}")
    print(f"\nSQL Translation:")
    print(query['sql'])
    print("=" * 80)
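These translations are written by hand, but in practice the SQL comes from the agent, and the server above is launched with `--allow-write`. One coarse safeguard, sketched below as an assumption rather than a feature of the MCP server, is to accept only single statements that start with `SELECT` or `WITH` and contain no data-modifying keywords. `is_read_only` is a hypothetical helper; it is not a SQL parser and errs on the side of rejecting, for example a string literal that happens to contain the word `DELETE`.

```python
import re

# Hypothetical guard: a coarse text check, not a SQL parser.
_LEADING_COMMENTS = re.compile(r"^(\s*--[^\n]*\n)*\s*")
_STARTS_READ_ONLY = re.compile(r"(SELECT|WITH)\b", re.IGNORECASE)
_WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|UNLOAD|MSCK|VACUUM|OPTIMIZE)\b",
    re.IGNORECASE,
)

def is_read_only(sql):
    """Return True only for a single statement that looks like a read-only query."""
    body = sql.strip().rstrip(";")
    if ";" in body:
        return False  # more than one statement (or a ';' in a literal): reject
    body = _LEADING_COMMENTS.sub("", body, count=1)  # allow leading '--' comments
    if not _STARTS_READ_ONLY.match(body):
        return False
    # Scan the whole text, literals and inline comments included, so the check errs on the side of rejecting
    return not _WRITE_KEYWORDS.search(body)

print(is_read_only("SELECT 1"))  # True
print(is_read_only("DROP TABLE user_details"))  # False
```

All three queries in `nl_queries` pass this check.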

## 7. Troubleshooting <a id='7-troubleshooting'></a>

Common issues and solutions.

In [None]:
def diagnose_athena_issues():
    """
    Diagnose common Athena query issues
    """
    print("🔍 Athena Diagnostics")
    print("=" * 50)
    
    # Check database exists
    try:
        glue.get_database(Name=DATABASE_NAME)
        print(f"✅ Database '{DATABASE_NAME}' exists")
    except glue.exceptions.EntityNotFoundException:
        print(f"❌ Database '{DATABASE_NAME}' not found")
        return
    
    # Check tables (get_tables is paginated, so iterate over every page)
    try:
        paginator = glue.get_paginator('get_tables')
        tables = [t for page in paginator.paginate(DatabaseName=DATABASE_NAME)
                  for t in page['TableList']]
        print(f"\n📋 Found {len(tables)} tables:")
        
        for table in tables:
            table_name = table['Name']
            sd = table.get('StorageDescriptor', {})
            location = sd.get('Location', '')
            serde = sd.get('SerdeInfo', {}).get('SerializationLibrary', 'n/a')
            
            print(f"\n  Table: {table_name}")
            print(f"  Location: {location}")
            print(f"  SerDe: {serde}")
            
            if not location.startswith('s3://'):
                print("  ❌ Location is not an s3:// URI")
                continue
            
            # Check that at least one object exists under the location
            bucket = location.split('/')[2]
            prefix = '/'.join(location.split('/')[3:])
            
            try:
                listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
                if 'Contents' in listing:
                    print("  ✅ S3 location contains data")
                else:
                    print("  ❌ S3 location is empty")
            except Exception as e:
                print(f"  ❌ Cannot access S3 location: {e}")
                
    except Exception as e:
        print(f"\n❌ Error checking tables: {e}")

# Run diagnostics
# diagnose_athena_issues()
print("⚠️  Uncomment the line above to run Athena diagnostics")
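The diagnostics cell splits the table location on `/` by position, which works for well-formed `s3://bucket/prefix/` values. The sketch below is a slightly stricter, hypothetical `split_s3_uri` helper: it rejects non-S3 locations explicitly and, by default, makes sure a non-empty prefix ends in `/`, so that a `list_objects_v2` check on `.../user_details/` does not also match a sibling prefix such as `.../user_details_old/`.

```python
def split_s3_uri(uri, as_folder=True):
    """
    Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix/').
    With as_folder=True a non-empty key gets a trailing '/', so it only
    matches objects inside that folder. Raises ValueError for non-S3 URIs.
    """
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"S3 URI has no bucket: {uri!r}")
    if as_folder and key and not key.endswith("/"):
        key += "/"
    return bucket, key

print(split_s3_uri("s3://my-bucket/tables/users/user_details"))
# ('my-bucket', 'tables/users/user_details/')
```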

### Common Issues and Solutions

In [None]:
# Display common issues and solutions
issues = [
    {
        "issue": "HIVE_UNSUPPORTED_FORMAT error",
        "cause": "Incorrect SerDe configuration for Parquet files",
        "solution": "Use ParquetHiveSerDe instead of LazySimpleSerDe"
    },
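    {
        "issue": "COLUMN_NOT_FOUND error",
        "cause": "The query references a column that is not in the Glue table definition",
        "solution": "Check column names (Glue stores them in lowercase) against the table definition"
    },
    {
        "issue": "HIVE_BAD_DATA type mismatch",
        "cause": "A Glue column type does not match the Parquet column type (e.g. bigint vs. double)",
        "solution": "Align Glue column types with the Parquet schema, then re-register the table"
    },
    {
        "issue": "S3 access denied",
        "cause": "Missing IAM permissions",
        "solution": "Grant s3:GetObject and s3:ListBucket on the data location, plus s3:PutObject and s3:GetBucketLocation on the Athena results location"
    },
    {
        "issue": "Query timeout",
        "cause": "Large data scan or complex query",
        "solution": "Use partitions, add filters, or optimize the query; with execute_query, also raise max_wait"
    }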
]

issues_df = pd.DataFrame(issues)
display(HTML(issues_df.to_html(index=False, escape=False)))

## 8. Advanced Analytics <a id='8-advanced-analytics'></a>

Complex analytical queries and visualizations.

In [None]:
# Advanced query examples
advanced_queries = {
    "cohort_analysis": """
        -- Share of each monthly signup cohort that is still active (assumes signup_date is 'YYYY-MM-DD')
        WITH cohorts AS (
            SELECT 
                user_id,
                DATE_TRUNC('month', CAST(signup_date AS DATE)) as cohort_month,
                subscription_plan,
                is_active
            FROM user_details
        )
        SELECT 
            cohort_month,
            subscription_plan,
            COUNT(DISTINCT user_id) as cohort_size,
            SUM(CASE WHEN is_active = true THEN 1 ELSE 0 END) as active_users,
            ROUND(100.0 * SUM(CASE WHEN is_active = true THEN 1 ELSE 0 END) / COUNT(*), 2) as retention_rate
        FROM cohorts
        GROUP BY cohort_month, subscription_plan
        ORDER BY cohort_month DESC, subscription_plan
    """,
    
    "ltv_by_segment": """
        -- Customer lifetime value by segment (descriptive, not a predictive model)
        SELECT 
            subscription_plan,
            payment_method,
            CASE 
                WHEN age < 25 THEN '18-24'
                WHEN age < 35 THEN '25-34'
                WHEN age < 45 THEN '35-44'
                WHEN age < 55 THEN '45-54'
                ELSE '55+'
            END as age_group,
            COUNT(*) as user_count,
            ROUND(AVG(lifetime_value), 2) as avg_ltv,
            -- Athena has no PERCENTILE_CONT; approx_percentile returns an approximate median
            ROUND(approx_percentile(lifetime_value, 0.5), 2) as median_ltv,
            ROUND(STDDEV(lifetime_value), 2) as ltv_stddev
        FROM user_details
        WHERE is_active = true
        -- Athena cannot GROUP BY a SELECT alias, so group by column position
        GROUP BY 1, 2, 3
        HAVING COUNT(*) >= 10
        ORDER BY avg_ltv DESC
    """,
    
    "churn_analysis": """
        -- Churn analysis by multiple factors
        SELECT 
            subscription_plan,
            primary_device,
            COUNT(*) as total_users,
            SUM(CASE WHEN is_active = false THEN 1 ELSE 0 END) as churned_users,
            ROUND(100.0 * SUM(CASE WHEN is_active = false THEN 1 ELSE 0 END) / COUNT(*), 2) as churn_rate,
            ROUND(AVG(CASE WHEN is_active = false THEN lifetime_value ELSE NULL END), 2) as avg_churned_ltv
        FROM user_details
        GROUP BY subscription_plan, primary_device
        HAVING COUNT(*) >= 50
        ORDER BY churn_rate DESC
    """
}

print("📊 Advanced Analytics Queries Available:")
for name, query in advanced_queries.items():
    print(f"\n• {name.replace('_', ' ').title()}")
    print(f"  Query length: {len(query)} characters")
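The `churn_analysis` query is easiest to trust once its logic has been checked on a handful of rows whose answer can be computed by hand. The sketch below mirrors that query in plain Python: group by plan and device, drop groups below the `HAVING` threshold, and sort by churn rate. `churn_by_segment` is a hypothetical helper; like the SQL `is_active = false`, it counts only users whose flag is explicitly `False` as churned, and, like SQL `AVG` over no rows, it reports `None` for a group with no churned users.

```python
from collections import defaultdict

def churn_by_segment(users, min_size=50):
    """
    Pure-Python mirror of the churn_analysis query on a list of user dicts
    with subscription_plan, primary_device, is_active and lifetime_value keys.
    """
    groups = defaultdict(list)
    for u in users:
        groups[(u["subscription_plan"], u["primary_device"])].append(u)
    rows = []
    for (plan, device), members in groups.items():
        if len(members) < min_size:
            continue  # the HAVING COUNT(*) >= min_size clause
        churned = [m for m in members if m["is_active"] is False]
        ltvs = [m["lifetime_value"] for m in churned]
        rows.append({
            "subscription_plan": plan,
            "primary_device": device,
            "total_users": len(members),
            "churned_users": len(churned),
            "churn_rate": round(100.0 * len(churned) / len(members), 2),
            # AVG over no rows is NULL in SQL; None plays that role here
            "avg_churned_ltv": round(sum(ltvs) / len(ltvs), 2) if ltvs else None,
        })
    return sorted(rows, key=lambda r: r["churn_rate"], reverse=True)
```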

### Interactive Dashboard Components

In [None]:
def create_analytics_dashboard():
    """
    Create interactive dashboard components
    """
    # Hard-coded sample metrics (illustrative; not computed from the tables)
    metrics = {
        'Total Users': 10000,
        'Active Users': 9129,
        'Avg Monthly Revenue': 450000,
        'Churn Rate': '8.71%',
        'Avg LTV': '$438.50'
    }
    
    # Create metrics cards
    html = '<div style="display: flex; gap: 20px; flex-wrap: wrap;">'
    
    for metric, value in metrics.items():
        # The ',' thousands separator only applies to numbers; strings are shown as-is
        display_value = f"{value:,}" if isinstance(value, (int, float)) else value
        html += f'''
        <div style="
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            padding: 20px;
            border-radius: 10px;
            min-width: 150px;
            text-align: center;
            box-shadow: 0 4px 6px rgba(0,0,0,0.1);
        ">
            <h3 style="margin: 0; font-size: 14px; opacity: 0.9;">{metric}</h3>
            <p style="margin: 10px 0 0 0; font-size: 24px; font-weight: bold;">{display_value}</p>
        </div>
        '''
    
    html += '</div>'
    
    display(HTML(html))
    
    # Create sample time series data (seeded so the chart is reproducible)
    rng = np.random.default_rng(42)
    dates = pd.date_range('2024-01-01', periods=30, freq='D')
    revenue_data = pd.DataFrame({
        'date': dates,
        'revenue': rng.normal(15000, 2000, 30).cumsum(),
        'new_users': rng.poisson(50, 30),
        'churned_users': rng.poisson(5, 30)
    })
    
    # Create time series plot
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=revenue_data['date'],
        y=revenue_data['revenue'],
        mode='lines+markers',
        name='Cumulative Revenue',
        line=dict(color='#667eea', width=3)
    ))
    
    fig.update_layout(
        title='Revenue Trend (Sample Data)',
        xaxis_title='Date',
        yaxis_title='Revenue ($)',
        height=400,
        template='plotly_white',
        hovermode='x unified'
    )
    
    fig.show()

# Create dashboard
create_analytics_dashboard()
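The hard-coded metrics are at least internally consistent: with 10,000 users of whom 9,129 are active, the churn rate is (10,000 − 9,129) / 10,000 = 871 / 10,000 = 8.71%. Deriving the display strings from raw numbers, rather than typing them in, keeps the cards consistent when the inputs change and avoids mixing strings with numeric format specs. The sketch below is a hypothetical `summary_metrics` helper; in practice its inputs would come from an Athena query over `user_details`.

```python
def summary_metrics(total_users, active_users, avg_ltv):
    """
    Format headline dashboard metrics from raw numbers. Churn rate is the
    share of all users who are not active, as in the churn_analysis query.
    """
    churn_rate = 100.0 * (total_users - active_users) / total_users
    return {
        "Total Users": f"{total_users:,}",
        "Active Users": f"{active_users:,}",
        "Churn Rate": f"{churn_rate:.2f}%",
        "Avg LTV": f"${avg_ltv:,.2f}",
    }

print(summary_metrics(10000, 9129, 438.50))
# {'Total Users': '10,000', 'Active Users': '9,129', 'Churn Rate': '8.71%', 'Avg LTV': '$438.50'}
```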

## Summary and Next Steps

### What We've Covered
1. ✅ Data preparation - Converting CSV to Parquet
2. ✅ S3 upload - Storing the Parquet files in Amazon S3
3. ✅ Glue catalog - Registering tables for querying
4. ✅ Athena queries - Executing SQL queries on the data
5. ✅ MCP server - Enabling AI agent integration
6. ✅ Visualizations - Creating interactive charts

### Next Steps
1. **Production Deployment**
   - Set up automated data pipelines
   - Implement data quality checks
   - Configure monitoring and alerts

2. **Advanced Analytics**
   - Build ML models using SageMaker
   - Create real-time dashboards
   - Implement predictive analytics

3. **Integration**
   - Connect to BI tools (QuickSight, Tableau)
   - Set up API endpoints
   - Enable cross-account access

### Resources
- [AWS S3 Tables Documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables.html)
- [Amazon Athena Best Practices](https://docs.aws.amazon.com/athena/latest/ug/best-practices.html)
- [SageMaker Lakehouse Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/lakehouse.html)
- [MCP Server Documentation](https://awslabs.github.io/mcp/)