# Lab 14: Performance Tuning and Optimization - Interactive Practice

This notebook provides hands-on practice with MySQL performance tuning and optimization techniques. You'll learn to analyze query performance, optimize indexes, tune configuration settings, and monitor database performance.

## Learning Objectives
- Analyze query performance using EXPLAIN plans
- Design and implement effective indexing strategies
- Optimize slow queries and identify bottlenecks
- Configure MySQL server parameters for optimal performance
- Monitor database performance and resource usage
- Implement performance testing and benchmarking
- Apply best practices for database optimization

## Prerequisites
- Lab 13: Database Triggers and Events
- MySQL Server running with administrative access
- Large dataset for performance testing
- Python packages: mysql-connector-python, pandas, matplotlib, numpy

## Setup

In [None]:
# Install required packages if not already installed
# !pip install mysql-connector-python pandas matplotlib numpy

# Import required libraries
import mysql.connector
from mysql.connector import Error
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import time
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Database connection configuration
config = {
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',  # Replace with your MySQL password
    'database': 'performance_test',
    'autocommit': True
}

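def create_connection():
    """Create database connection, creating the target database if it is missing"""
    try:
        # Connect without selecting a database first: the connection is refused
        # if performance_test does not exist yet
        server_config = {k: v for k, v in config.items() if k != 'database'}
        connection = mysql.connector.connect(**server_config)
        cursor = connection.cursor()
        cursor.execute(f"CREATE DATABASE IF NOT EXISTS `{config['database']}`")
        cursor.execute(f"USE `{config['database']}`")
        cursor.close()
        print("✅ Connected to MySQL database")
        return connection
    except Error as e:
        print(f"❌ Error connecting to MySQL: {e}")
        return None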

def execute_query(connection, query, description="", fetch=True, timing=False):
    """Execute a query and optionally return results as DataFrame"""
    start_time = time.time() if timing else None
    
    try:
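        # Buffered cursor: rows are read eagerly, so calls with fetch=False
        # do not leave an unread result set on the connection
        cursor = connection.cursor(buffered=True)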
        cursor.execute(query)
        
        if description:
            print(f"\n{description}")
        
        if timing:
            execution_time = time.time() - start_time
            print(f"⏱️  Execution time: {execution_time:.4f} seconds")
        
        if fetch and cursor.description:
            columns = [desc[0] for desc in cursor.description]
            rows = cursor.fetchall()
            cursor.close()
            
            if rows:
                df = pd.DataFrame(rows, columns=columns)
                return df
            else:
                print("No results returned")
                return None
        else:
            cursor.close()
            print("Query executed successfully")
            return None
            
    except Error as e:
        print(f"❌ Error executing query: {e}")
        return None

def run_performance_test(connection, test_name, iterations=10):
    """Run a performance test with timing"""
    print(f"\n🧪 Running performance test: {test_name}")
    
    # Test queries
    queries = [
        "SELECT COUNT(*) FROM customers WHERE state = 'CA'",
        "SELECT COUNT(*) FROM customers c JOIN orders o ON c.id = o.customer_id WHERE c.state = 'CA' AND o.total_amount > 100",
        # MySQL rejects LIMIT directly inside an IN subquery, so the limit is wrapped in a derived table
        "SELECT AVG(total_amount) FROM orders WHERE customer_id IN (SELECT id FROM (SELECT id FROM customers WHERE state = 'CA' LIMIT 100) AS ca_customers)"
    ]
    
    total_time = 0
    
    for i in range(iterations):
        for query in queries:
            start_time = time.time()
            execute_query(connection, query, fetch=False)
            total_time += time.time() - start_time
    
    avg_time = total_time / (iterations * len(queries))
    print(f"📊 Average query time: {avg_time:.4f} seconds")
    print(f"🚀 Queries per second: {1/avg_time:.2f}")
    
    return avg_time

# Test connection
conn = create_connection()
if conn:
    conn.close()
    print("✅ Connection test successful")
else:
    print("❌ Please check your database configuration")

## Exercise 1: Setting Up Performance Test Database

Let's create the performance test database and generate sample data for our optimization exercises.
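
Benchmark numbers are easier to compare between runs when the sample data is reproducible. The following is a minimal sketch, not part of the original lab, that builds customer rows from a seeded generator using only the standard library. The helper name `make_customers`, the seed value, and the fixed reference timestamp are illustrative assumptions.

```python
import random
from datetime import datetime, timedelta

# Same city/state pairs the lab uses for its sample customers
CITIES = [("New York", "NY"), ("Los Angeles", "CA"), ("Chicago", "IL"),
          ("Houston", "TX"), ("Phoenix", "AZ")]


def make_customers(n, seed=42, now=datetime(2024, 1, 1)):
    """Return n customer tuples; the same seed always yields the same rows.

    `seed` and `now` are hypothetical defaults chosen for this sketch.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    rows = []
    for i in range(1, n + 1):
        city, state = CITIES[i % 5]
        created = (now - timedelta(days=rng.randint(1, 364))).date()
        last_login = now - timedelta(days=rng.randint(1, 29))
        rows.append((f"Customer {i}", f"customer{i}@example.com",
                     city, state, f"{i:05d}", created, last_login))
    return rows


sample_rows = make_customers(3)
for row in sample_rows:
    print(row)
```

Each tuple has the seven fields the lab's `INSERT INTO customers` statement expects, so a list built this way could be handed to `cursor.executemany` in place of randomly generated rows.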

In [None]:
# Connect to database
conn = create_connection()

if conn:
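    # performance_schema is read-only at runtime (it is set in the server
    # configuration file and needs a restart), so only check its status here
    df = execute_query(conn, "SHOW VARIABLES LIKE 'performance_schema'",
                       description="Checking performance schema status")
    if df is not None:
        display(df)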
    execute_query(conn, "SET GLOBAL slow_query_log = 'ON'", 
                  description="Enabling slow query log", fetch=False)
    execute_query(conn, "SET GLOBAL long_query_time = 1", 
                  description="Setting slow query threshold to 1 second", fetch=False)
    
    # Create database
    execute_query(conn, "CREATE DATABASE IF NOT EXISTS performance_test", 
                  description="Creating performance_test database", fetch=False)
    execute_query(conn, "USE performance_test", 
                  description="Switching to performance_test database", fetch=False)
    
    # Create tables
    tables_sql = """
    CREATE TABLE IF NOT EXISTS customers (
        id INT PRIMARY KEY AUTO_INCREMENT,
        name VARCHAR(100),
        email VARCHAR(100),
        city VARCHAR(50),
        state VARCHAR(2),
        zip_code VARCHAR(10),
        created_date DATE,
        last_login TIMESTAMP,
        INDEX idx_name (name),
        INDEX idx_email (email),
        INDEX idx_city_state (city, state)
    );
    
    CREATE TABLE IF NOT EXISTS orders (
        id INT PRIMARY KEY AUTO_INCREMENT,
        customer_id INT,
        order_date DATE,
        total_amount DECIMAL(10,2),
        status ENUM('pending', 'processing', 'shipped', 'delivered'),
        shipping_address TEXT,
        FOREIGN KEY (customer_id) REFERENCES customers(id),
        INDEX idx_customer_date (customer_id, order_date),
        INDEX idx_status_date (status, order_date),
        INDEX idx_total (total_amount)
    );
    
    CREATE TABLE IF NOT EXISTS order_items (
        id INT PRIMARY KEY AUTO_INCREMENT,
        order_id INT,
        product_id INT,
        quantity INT,
        unit_price DECIMAL(8,2),
        discount DECIMAL(5,2) DEFAULT 0,
        FOREIGN KEY (order_id) REFERENCES orders(id),
        INDEX idx_order_product (order_id, product_id),
        INDEX idx_product (product_id)
    );
    """
    
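    # Run the CREATE TABLE statements one at a time: a single execute() call
    # handles one statement by default
    print("\nCreating test tables")
    for statement in tables_sql.split(';'):
        if statement.strip():
            execute_query(conn, statement, fetch=False)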
    
    # Generate sample data (smaller dataset for faster testing)
    print("\n📊 Generating sample data...")
    
    # Insert customers
    customers_data = []
    for i in range(1, 1001):  # 1000 customers
        customers_data.append((
            f'Customer {i}',
            f'customer{i}@example.com',
            ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'][i % 5],
            ['NY', 'CA', 'IL', 'TX', 'AZ'][i % 5],
            f'{i:05d}',
            (datetime.now() - timedelta(days=np.random.randint(1, 365))).date(),
            datetime.now() - timedelta(days=np.random.randint(1, 30))
        ))
    
    cursor = conn.cursor()
    cursor.executemany(
        "INSERT INTO customers (name, email, city, state, zip_code, created_date, last_login) VALUES (%s, %s, %s, %s, %s, %s, %s)",
        customers_data
    )
    print(f"✅ Inserted {len(customers_data)} customers")
    
    # Insert orders
    orders_data = []
    for i in range(1, 5001):  # 5000 orders
        orders_data.append((
            np.random.randint(1, 1001),  # customer_id
            (datetime.now() - timedelta(days=np.random.randint(1, 365))).date(),
            round(np.random.uniform(10, 1000), 2),
            ['pending', 'processing', 'shipped', 'delivered'][np.random.randint(0, 4)],
            f'Address for order {i}'
        ))
    
    cursor.executemany(
        "INSERT INTO orders (customer_id, order_date, total_amount, status, shipping_address) VALUES (%s, %s, %s, %s, %s)",
        orders_data
    )
    print(f"✅ Inserted {len(orders_data)} orders")
    
    # Insert order items
    items_data = []
    for i in range(1, 20001):  # 20,000 order items
        items_data.append((
            np.random.randint(1, 5001),  # order_id
            np.random.randint(1, 1001),  # product_id
            np.random.randint(1, 11),    # quantity
            round(np.random.uniform(5, 200), 2),  # unit_price
            round(np.random.uniform(0, 20), 2)     # discount
        ))
    
    cursor.executemany(
        "INSERT INTO order_items (order_id, product_id, quantity, unit_price, discount) VALUES (%s, %s, %s, %s, %s)",
        items_data
    )
    print(f"✅ Inserted {len(items_data)} order items")
    
    cursor.close()
    
    # Check table sizes
    df = execute_query(conn, """
    SELECT table_name, table_rows 
    FROM information_schema.tables 
    WHERE table_schema = 'performance_test' AND table_name IN ('customers', 'orders', 'order_items')
    """, description="Checking table sizes (table_rows is an estimate for InnoDB tables)")
    if df is not None:
        display(df)
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 2: Query Analysis with EXPLAIN

Let's analyze query performance using EXPLAIN plans to understand how MySQL executes queries.
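
When reading an EXPLAIN plan, a few columns carry most of the signal: `type` (where `ALL` means a full table scan), `key` (the index actually chosen), and `Extra` (where `Using filesort` and `Using temporary` point to extra work). The sketch below is one possible way to scan plan rows for those warning signs. The function name `review_explain` and the two sample rows are invented for illustration and are not output from the lab database.

```python
def review_explain(rows):
    """Return human-readable warnings for a list of EXPLAIN rows (dicts)."""
    findings = []
    for row in rows:
        table = row.get("table")
        extra = row.get("Extra") or ""
        if row.get("type") == "ALL":
            findings.append(f"{table}: full table scan (type=ALL)")
        if row.get("key") is None and row.get("possible_keys"):
            findings.append(f"{table}: candidate indexes exist but none was chosen")
        if "Using filesort" in extra:
            findings.append(f"{table}: extra sort pass (Using filesort)")
        if "Using temporary" in extra:
            findings.append(f"{table}: temporary table needed (Using temporary)")
    return findings


# Hypothetical plan rows, shaped like the columns EXPLAIN returns
sample_plan = [
    {"table": "c", "type": "ALL", "possible_keys": None, "key": None,
     "rows": 1000, "Extra": "Using where; Using temporary; Using filesort"},
    {"table": "o", "type": "ref", "possible_keys": "idx_customer_date",
     "key": "idx_customer_date", "rows": 5, "Extra": "Using index"},
]

for finding in review_explain(sample_plan):
    print(finding)
```

In the notebook, the DataFrame returned by `execute_query` for an EXPLAIN statement could be converted with `df.to_dict("records")` and passed to the same function.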

In [None]:
# Connect to database
conn = create_connection()

if conn:
    # Basic EXPLAIN examples
    explain_queries = [
        ("Simple primary key lookup", "EXPLAIN SELECT * FROM customers WHERE id = 1"),
        ("Index scan on name", "EXPLAIN SELECT * FROM customers WHERE name LIKE 'Customer 1%'"),
        ("Composite index usage", "EXPLAIN SELECT * FROM customers WHERE city = 'New York' AND state = 'NY'"),
        ("JOIN without optimization", """EXPLAIN SELECT c.name, COUNT(o.id) as order_count
                                         FROM customers c
                                         LEFT JOIN orders o ON c.id = o.customer_id
                                         WHERE c.state = 'CA'
                                         GROUP BY c.id, c.name
                                         ORDER BY order_count DESC LIMIT 10""")
    ]
    
    for description, query in explain_queries:
        df = execute_query(conn, query, description=f"{description}:")
        if df is not None:
            display(df)
    
    # Analyze query execution time
    print("\n⏱️  Query Performance Comparison:")
    
    test_queries = [
        ("Simple count", "SELECT COUNT(*) FROM customers"),
        ("Filtered count", "SELECT COUNT(*) FROM customers WHERE state = 'CA'"),
        ("JOIN query", "SELECT COUNT(*) FROM customers c JOIN orders o ON c.id = o.customer_id WHERE c.state = 'CA'")
    ]
    
    performance_results = []
    
    for description, query in test_queries:
        start_time = time.time()
        execute_query(conn, query, fetch=False)
        execution_time = time.time() - start_time
        performance_results.append((description, execution_time))
        print(f"{description}: {execution_time:.4f} seconds")
    
    # Visualize performance
    if performance_results:
        df_perf = pd.DataFrame(performance_results, columns=['Query', 'Time'])
        plt.figure(figsize=(10, 6))
        plt.bar(df_perf['Query'], df_perf['Time'])
        plt.title('Query Performance Comparison')
        plt.ylabel('Execution Time (seconds)')
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 3: Index Optimization

Let's create and analyze the impact of different indexing strategies on query performance.
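
A composite index is generally usable only through its leading columns (the leftmost-prefix rule), which is why a filter on `state` alone gets little help from an index on `(city, state)`. The sketch below models that rule for equality filters only. It is a deliberate simplification: it ignores range conditions and optimizer features such as skip scan, and the function name `usable_index_prefix` is hypothetical.

```python
def usable_index_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality filters can use.

    Walks the index left to right and stops at the first column that the
    query does not constrain.
    """
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break
        prefix.append(column)
    return prefix


idx_city_state = ("city", "state")

# WHERE state = 'CA' -> no usable prefix, the index starts with city
print(usable_index_prefix(idx_city_state, {"state"}))

# WHERE city = 'New York' AND state = 'NY' -> both columns usable
print(usable_index_prefix(idx_city_state, {"city", "state"}))

# A dedicated single-column index on state serves the first query
print(usable_index_prefix(("state",), {"state"}))
```

Under this model, adding a separate index on `state` is what lets the state-only filter avoid a full scan.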

In [None]:
# Connect to database
conn = create_connection()

if conn:
    # Check existing indexes
    df = execute_query(conn, "SHOW INDEXES FROM customers",
                      description="Existing indexes on customers table")
    if df is not None:
        display(df)
    
    # Performance test before adding indexes
    print("\n📊 Baseline Performance Test:")
    baseline_time = run_performance_test(conn, "Baseline (no additional indexes)", iterations=5)
    
    # Add strategic indexes
    index_queries = [
        "CREATE INDEX idx_customers_state ON customers (state)",
        "CREATE INDEX idx_orders_customer_amount ON orders (customer_id, total_amount)",
        "CREATE INDEX idx_orders_date_status ON orders (order_date, status)",
        "CREATE INDEX idx_order_items_order_quantity ON order_items (order_id, quantity)"
    ]
    
    for query in index_queries:
        execute_query(conn, query, description="Creating index", fetch=False)
    
    print("\n✅ Indexes created successfully")
    
    # Performance test after adding indexes
    print("\n📊 Performance Test After Index Optimization:")
    optimized_time = run_performance_test(conn, "After Index Optimization", iterations=5)
    
    # Calculate improvement
    if baseline_time > 0:
        improvement = ((baseline_time - optimized_time) / baseline_time) * 100
        print(f"\n🎯 Performance Improvement: {improvement:.1f}%")
        print(f"⚡ Speed increase: {baseline_time/optimized_time:.1f}x faster")
    
    # Analyze index usage
    df = execute_query(conn, """
    SELECT 
        object_name,
        index_name,
        count_read,
        count_fetch,
        count_insert,
        count_update,
        count_delete
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE object_schema = 'performance_test'
    AND count_read > 0
    ORDER BY count_read DESC
    """, description="Index usage statistics")
    if df is not None:
        display(df)
    
    # Test specific query optimizations
    print("\n🔍 Specific Query Optimizations:")
    
    # Query 1: Customer search by state
    explain_state = execute_query(conn, "EXPLAIN SELECT COUNT(*) FROM customers WHERE state = 'CA'",
                                  description="EXPLAIN: State-based customer count")
    
    # Query 2: High-value orders
    explain_join = execute_query(conn, """EXPLAIN SELECT c.name, o.total_amount
                                         FROM customers c 
                                         JOIN orders o ON c.id = o.customer_id 
                                         WHERE c.state = 'CA' AND o.total_amount > 500""",
                                 description="EXPLAIN: High-value orders for CA customers")
    
    # Query 3: Recent orders analysis
    explain_agg = execute_query(conn, """EXPLAIN SELECT DATE_FORMAT(order_date, '%Y-%m') as month, 
                                         COUNT(*) as orders, SUM(total_amount) as revenue
                                         FROM orders 
                                         WHERE order_date >= '2023-01-01'
                                         GROUP BY DATE_FORMAT(order_date, '%Y-%m')""",
                                description="EXPLAIN: Monthly order aggregation")
    
    # Show each plan so the chosen indexes can be inspected
    for plan in (explain_state, explain_join, explain_agg):
        if plan is not None:
            display(plan)
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 4: Query Optimization Techniques

Let's explore different query optimization techniques and their impact on performance.
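
Before comparing timings, it helps to confirm that the rewritten queries return the same rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, with a tiny invented dataset, to show that `IN`, `EXISTS`, and `JOIN ... DISTINCT` agree, while a plain `JOIN` repeats a customer once per matching order. Only the result sets carry over to MySQL here; the timings and query plans do not.

```python
import sqlite3

# In-memory database with a tiny, invented dataset
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total_amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 600), (2, 1, 700), (3, 2, 50), (4, 3, 900);
""")

queries = {
    "IN": """SELECT c.id FROM customers c
             WHERE c.id IN (SELECT customer_id FROM orders WHERE total_amount > 500)""",
    "EXISTS": """SELECT c.id FROM customers c
                 WHERE EXISTS (SELECT 1 FROM orders o
                               WHERE o.customer_id = c.id AND o.total_amount > 500)""",
    "JOIN": """SELECT c.id FROM customers c
               JOIN orders o ON c.id = o.customer_id
               WHERE o.total_amount > 500""",
    "JOIN DISTINCT": """SELECT DISTINCT c.id FROM customers c
                        JOIN orders o ON c.id = o.customer_id
                        WHERE o.total_amount > 500""",
}

# Collect the customer ids each form returns, sorted for comparison
results = {name: sorted(row[0] for row in demo.execute(sql))
           for name, sql in queries.items()}

for name, ids in results.items():
    print(f"{name}: {ids}")

demo.close()
```

Customer 1 has two orders above 500, so the plain `JOIN` lists that id twice. This is why the JOIN rewrite needs `DISTINCT` to match the subquery forms.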

In [None]:
# Connect to database
conn = create_connection()

if conn:
    # Subquery vs JOIN optimization
    print("🔄 Subquery vs JOIN Optimization:")
    
    # IN subquery (MySQL 5.6+ usually rewrites this as a semijoin, so it is not necessarily slower)
    subquery_sql = """
    SELECT *
    FROM customers c
    WHERE c.id IN (
        SELECT customer_id
        FROM orders
        WHERE total_amount > 500
    )
    """
    
    # Equivalent JOIN (DISTINCT is needed because a customer can have several matching orders)
    join_sql = """
    SELECT DISTINCT c.*
    FROM customers c
    JOIN orders o ON c.id = o.customer_id
    WHERE o.total_amount > 500
    """
    
    # EXISTS optimization
    exists_sql = """
    SELECT c.*
    FROM customers c
    WHERE EXISTS (
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.id
        AND o.total_amount > 500
    )
    """
    
    queries_to_test = [
        ("Subquery", subquery_sql),
        ("JOIN", join_sql),
        ("EXISTS", exists_sql)
    ]
    
    optimization_results = []
    
    for name, query in queries_to_test:
        start_time = time.time()
        df = execute_query(conn, query)
        execution_time = time.time() - start_time
        
        result_count = len(df) if df is not None else 0
        optimization_results.append((name, execution_time, result_count))
        print(f"{name}: {execution_time:.4f}s, {result_count} results")
    
    # Visualize optimization results
    if optimization_results:
        df_opt = pd.DataFrame(optimization_results, columns=['Method', 'Time', 'Results'])
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
        
        ax1.bar(df_opt['Method'], df_opt['Time'])
        ax1.set_title('Execution Time by Method')
        ax1.set_ylabel('Time (seconds)')
        
        ax2.bar(df_opt['Method'], df_opt['Results'])
        ax2.set_title('Result Count by Method')
        ax2.set_ylabel('Number of Results')
        
        plt.tight_layout()
        plt.show()
    
    # LIMIT optimization
    print("\n📏 LIMIT Optimization:")
    
    limit_queries = [
        ("Without LIMIT", "SELECT * FROM orders ORDER BY total_amount DESC"),
        ("With LIMIT 10", "SELECT * FROM orders ORDER BY total_amount DESC LIMIT 10"),
        ("With LIMIT 100", "SELECT * FROM orders ORDER BY total_amount DESC LIMIT 100")
    ]
    
    limit_results = []
    
    for name, query in limit_queries:
        start_time = time.time()
        df = execute_query(conn, query)
        execution_time = time.time() - start_time
        
        result_count = len(df) if df is not None else 0
        limit_results.append((name, execution_time, result_count))
        print(f"{name}: {execution_time:.4f}s, {result_count} results")
    
    # UNION vs UNION ALL
    print("\n🔗 UNION vs UNION ALL:")
    
    union_sql = """
    SELECT customer_id, 'high_value' as category FROM orders WHERE total_amount > 500
    UNION
    SELECT customer_id, 'frequent' as category FROM orders GROUP BY customer_id HAVING COUNT(*) > 10
    """
    
    union_all_sql = """
    SELECT customer_id, 'high_value' as category FROM orders WHERE total_amount > 500
    UNION ALL
    SELECT customer_id, 'frequent' as category FROM orders GROUP BY customer_id HAVING COUNT(*) > 10
    """
    
    # Test UNION
    start_time = time.time()
    df_union = execute_query(conn, union_sql)
    union_time = time.time() - start_time
    
    # Test UNION ALL
    start_time = time.time()
    df_union_all = execute_query(conn, union_all_sql)
    union_all_time = time.time() - start_time
    
    print(f"UNION: {union_time:.4f}s, {len(df_union) if df_union is not None else 0} results")
    print(f"UNION ALL: {union_all_time:.4f}s, {len(df_union_all) if df_union_all is not None else 0} results")
    
    if union_all_time > 0:
        speedup = union_time / union_all_time
        print(f"UNION ALL is {speedup:.1f}x faster")
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 5: MySQL Configuration Tuning

Let's analyze and optimize MySQL server configuration settings.
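
Sizing the InnoDB buffer pool is mostly arithmetic: take a fraction of the host's RAM and round to a multiple of the pool's chunk size. The sketch below works through that calculation. The 70% fraction and the 128 MiB chunk size are assumptions chosen for illustration, and the function name `suggest_buffer_pool_bytes` is hypothetical; the right values depend on what else runs on the machine.

```python
MIB = 1024 * 1024


def suggest_buffer_pool_bytes(total_ram_bytes, fraction=0.7, chunk_bytes=128 * MIB):
    """Return a buffer pool size: a fraction of RAM rounded down to whole chunks."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    target = int(total_ram_bytes * fraction)
    whole_chunks = target // chunk_bytes
    # Never suggest less than a single chunk
    return max(chunk_bytes, whole_chunks * chunk_bytes)


ram_bytes = 16 * 1024 * MIB  # a hypothetical host with 16 GiB of RAM
size = suggest_buffer_pool_bytes(ram_bytes)
print(f"Suggested innodb_buffer_pool_size: {size} bytes ({size // MIB} MiB)")
```

For the 16 GiB example, 70% is 11468.8 MiB, which rounds down to 89 chunks of 128 MiB, or 11392 MiB.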

In [None]:
# Connect to database
conn = create_connection()

if conn:
    # Check current configuration
    config_vars = [
        'innodb_buffer_pool_size',
        'innodb_log_file_size',
        'max_connections',
        'query_cache_size',
        'tmp_table_size',
        'max_heap_table_size',
        'slow_query_log',
        'long_query_time'
    ]
    
    print("⚙️  Current MySQL Configuration:")
    for var in config_vars:
        df = execute_query(conn, f"SHOW VARIABLES LIKE '{var}'")
        if df is not None and not df.empty:
            value = df.iloc[0]['Value']
            # Convert bytes to MB for readability
            if var in ['innodb_buffer_pool_size', 'innodb_log_file_size', 'query_cache_size', 'tmp_table_size', 'max_heap_table_size']:
                try:
                    mb_value = int(value) / 1024 / 1024
                    print(f"{var}: {mb_value:.1f} MB")
                except ValueError:
                    print(f"{var}: {value}")
            else:
                print(f"{var}: {value}")
    
    # Get system information for recommendations
    print("\n💻 System Information:")
    
    # MySQL does not report host RAM; check the server version here and look up the machine's memory separately
    df = execute_query(conn, "SELECT @@version as mysql_version")
    if df is not None:
        print(f"MySQL Version: {df.iloc[0]['mysql_version']}")
    
    # Performance recommendations
    print("\n📋 Performance Tuning Recommendations:")
    print("1. Buffer Pool Size: 70-80% of RAM on a dedicated database server (less if other services share the host)")
    print("2. Max Connections: Based on application requirements")
    print("3. Query Cache: Only for read-heavy workloads on MySQL 5.7 and earlier (removed in MySQL 8.0)")
    print("4. Temporary Tables: Increase tmp_table_size and max_heap_table_size together for complex queries")
    print("5. Slow Query Log: Enable for performance monitoring")
    
    # Apply some optimizations (be careful in production!)
    print("\n🔧 Applying Configuration Optimizations:")
    
    # Note: These are examples - adjust based on your system
    optimizations = [
        ("Setting max_connections", "SET GLOBAL max_connections = 200"),
        ("Setting tmp_table_size", "SET GLOBAL tmp_table_size = 134217728"),  # 128MB
        ("Setting max_heap_table_size", "SET GLOBAL max_heap_table_size = 134217728"),  # 128MB
        ("Configuring slow query log", "SET GLOBAL slow_query_log = 'ON'"),
        ("Setting slow query threshold", "SET GLOBAL long_query_time = 2")
    ]
    
    for description, query in optimizations:
        try:
            execute_query(conn, query, fetch=False)
            print(f"✅ {description}")
        except Exception as e:
            print(f"⚠️  {description}: {str(e)}")
    
    # Test configuration impact
    print("\n🧪 Testing Configuration Impact:")
    
    # Run performance test with new configuration
    config_time = run_performance_test(conn, "After Configuration Tuning", iterations=3)
    
    # Check current InnoDB status (fetch the row so no unread result is left on the connection)
    df = execute_query(conn, "SHOW ENGINE INNODB STATUS")
    
    # Check process list
    df = execute_query(conn, "SHOW PROCESSLIST",
                      description="Current database connections")
    if df is not None:
        print(f"Active connections: {len(df)}")
    
    # Check open tables
    df = execute_query(conn, "SHOW OPEN TABLES",
                      description="Open tables status")
    if df is not None:
        print(f"Open tables: {len(df)}")
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 6: Performance Monitoring

Let's implement comprehensive performance monitoring and analysis.
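
Performance Schema timer columns such as `avg_timer_wait` and `sum_timer_wait` are reported in picoseconds, so a one-second threshold is 10^12, not 10^9. The sketch below shows the conversion and a simple filter for slow statement digests. The helper names and the two sample rows are invented for illustration.

```python
PICOSECONDS_PER_SECOND = 10 ** 12


def picoseconds_to_seconds(picoseconds):
    """Convert a Performance Schema timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND


def slow_digests(rows, threshold_seconds=1.0):
    """Keep (digest_text, avg_seconds) pairs whose average exceeds the threshold."""
    slow = []
    for digest_text, avg_timer_wait in rows:
        avg_seconds = picoseconds_to_seconds(avg_timer_wait)
        if avg_seconds > threshold_seconds:
            slow.append((digest_text, avg_seconds))
    return slow


# Hypothetical (digest_text, avg_timer_wait) rows
sample_digests = [
    ("SELECT COUNT ( * ) FROM `orders`", 2_500_000_000_000),  # 2.5 s
    ("SELECT * FROM `customers` WHERE `id` = ?", 3_000_000_000),  # 3 ms
]

for text, seconds in slow_digests(sample_digests):
    print(f"{seconds:.3f}s  {text}")
```

Only the first sample row passes the one-second threshold; the second averages three milliseconds.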

In [None]:
# Connect to database
conn = create_connection()

if conn:
    # Create performance monitoring dashboard
    print("📊 Performance Monitoring Dashboard")
    
    # System overview
    df = execute_query(conn, "SELECT @@version as mysql_version, @@innodb_buffer_pool_size/1024/1024 as buffer_pool_mb")
    if df is not None:
        display(df)
    
    # Connection status
    df = execute_query(conn, "SELECT COUNT(*) as active_connections, SUM(time) as total_connection_time FROM information_schema.processlist")
    if df is not None:
        display(df)
    
    # Database size information
    df = execute_query(conn, """
    SELECT 
        table_schema,
        ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) as size_mb,
        COUNT(*) as tables_count
    FROM information_schema.tables
    WHERE table_schema NOT IN ('information_schema', 'performance_schema', 'mysql')
    GROUP BY table_schema
    ORDER BY size_mb DESC
    """, description="Database size information")
    if df is not None:
        display(df)
    
    # Table-specific statistics
    df = execute_query(conn, """
    SELECT
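        table_name AS table_name, -- explicit aliases: MySQL 8.0 returns information_schema column names in uppercase
        table_rows AS table_rows,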
        ROUND(data_length/1024/1024, 2) as data_mb,
        ROUND(index_length/1024/1024, 2) as index_mb,
        ROUND((data_length + index_length)/1024/1024, 2) as total_mb
    FROM information_schema.tables
    WHERE table_schema = 'performance_test'
    ORDER BY data_length DESC
    """, description="Table statistics")
    if df is not None:
        display(df)
        
        # Visualize table sizes
        plt.figure(figsize=(10, 6))
        plt.bar(df['table_name'], df['total_mb'])
        plt.title('Table Size Distribution')
        plt.ylabel('Size (MB)')
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
    
    # Index usage analysis
    df = execute_query(conn, """
    SELECT
        object_name,
        index_name,
        count_read,
        count_fetch,
        ROUND(count_read / (count_read + count_fetch + 1) * 100, 2) as usage_pct
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE object_schema = 'performance_test'
    AND count_read > 0
    ORDER BY count_read DESC
    LIMIT 10
    """, description="Top 10 most used indexes")
    if df is not None:
        display(df)
    
    # Slow query analysis (Performance Schema timer columns are in picoseconds)
    df = execute_query(conn, """
    SELECT
        digest_text AS sql_text,
        count_star AS exec_count,
        ROUND(avg_timer_wait/1000000000000, 3) as avg_time_sec,
        ROUND(sum_timer_wait/1000000000000, 3) as total_time_sec
    FROM performance_schema.events_statements_summary_by_digest
    WHERE avg_timer_wait > 1000000000000 -- > 1 second average
    ORDER BY sum_timer_wait DESC
    LIMIT 5
    """, description="Slowest queries (top 5 by total time)")
    if df is not None:
        display(df)
    
    # Memory usage
    df = execute_query(conn, """
    SELECT
        event_name,
        ROUND(current_alloc/1024/1024, 2) as current_mb,
        ROUND(high_alloc/1024/1024, 2) as peak_mb
    FROM sys.x$memory_global_by_current_bytes -- x$ view returns raw byte counts instead of formatted text
    WHERE current_alloc > 1024*1024 -- > 1MB
    ORDER BY current_alloc DESC
    LIMIT 10
    """, description="Top memory consumers")
    if df is not None:
        display(df)
    
    # Create performance summary report
    print("\n📈 Performance Summary Report")
    
    # Index usage overview (simplified health indicator)
    df = execute_query(conn, """
    SELECT 
        COUNT(*) as total_indexes,
        SUM(CASE WHEN count_read > 0 THEN 1 ELSE 0 END) as used_indexes
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE object_schema = 'performance_test'
    """)
    if df is not None:
        display(df)
    
    # Calculate some metrics (timer columns are in picoseconds: 1 second = 10^12)
    df = execute_query(conn, "SELECT COUNT(*) as slow_queries FROM performance_schema.events_statements_summary_by_digest WHERE avg_timer_wait > 1000000000000")
    slow_queries = df.iloc[0]['slow_queries'] if df is not None and not df.empty else 0
    
    df = execute_query(conn, "SELECT COUNT(*) as total_connections FROM information_schema.processlist")
    connections = df.iloc[0]['total_connections'] if df is not None and not df.empty else 0
    
    # Ask the server whether the Performance Schema is actually enabled
    df = execute_query(conn, "SELECT @@performance_schema as enabled")
    monitoring_enabled = df is not None and not df.empty and int(df.iloc[0]['enabled']) == 1
    
    print(f"Slow queries (>1s): {slow_queries}")
    print(f"Active connections: {connections}")
    print(f"Performance monitoring: {'Enabled' if monitoring_enabled else 'Disabled'}")
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 7: Advanced Optimization Techniques

Let's implement advanced optimization techniques like partitioning and summary tables.
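
Partition pruning means the server reads only the partitions whose value range can contain matching rows. The sketch below models a `RANGE (YEAR(order_date))` layout with `VALUES LESS THAN` bounds and works out which partitions a date range touches. It is a plain-Python model for building intuition, not MySQL's actual pruning logic, and the function names are hypothetical.

```python
from datetime import date

# (partition name, exclusive upper bound on the year); None models MAXVALUE
PARTITIONS = [
    ("p2020", 2021),
    ("p2021", 2022),
    ("p2022", 2023),
    ("p2023", 2024),
    ("p2024", 2025),
    ("p_future", None),
]


def partition_for_year(year):
    """Return the first partition whose upper bound is above the year."""
    for name, upper_bound in PARTITIONS:
        if upper_bound is None or year < upper_bound:
            return name
    raise ValueError("no partition accepts this year")


def partitions_for_range(start, end):
    """Return the partitions a query on [start, end] would need to read."""
    touched = []
    for year in range(start.year, end.year + 1):
        name = partition_for_year(year)
        if name not in touched:
            touched.append(name)
    return touched


print(partitions_for_range(date(2023, 1, 1), date(2023, 12, 31)))
print(partitions_for_range(date(2023, 6, 1), date(2024, 2, 1)))
print(partitions_for_range(date(2019, 5, 1), date(2019, 6, 1)))
```

Note that rows from 2019 land in `p2020`, because the first partition accepts every year below 2021, and any year from 2025 onward falls into `p_future`.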

In [None]:
# Connect to database
conn = create_connection()

if conn:
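    # Table partitioning
    print("📊 Implementing Table Partitioning:")
    
    # `orders` cannot be partitioned in place: partitioned InnoDB tables do not
    # support foreign keys, and the primary key must include the partitioning
    # column. Build a partitioned copy (orders_partitioned) by year instead.
    partition_sql = """
    CREATE TABLE IF NOT EXISTS orders_partitioned (
        id INT NOT NULL AUTO_INCREMENT,
        customer_id INT,
        order_date DATE NOT NULL,
        total_amount DECIMAL(10,2),
        status ENUM('pending', 'processing', 'shipped', 'delivered'),
        shipping_address TEXT,
        PRIMARY KEY (id, order_date),
        INDEX idx_customer_date (customer_id, order_date)
    )
    PARTITION BY RANGE (YEAR(order_date)) (
        PARTITION p2020 VALUES LESS THAN (2021),
        PARTITION p2021 VALUES LESS THAN (2022),
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION p2024 VALUES LESS THAN (2025),
        PARTITION p_future VALUES LESS THAN MAXVALUE
    )
    """
    
    execute_query(conn, partition_sql,
                  description="Creating partitioned copy of orders", fetch=False)
    
    # INSERT IGNORE keeps the cell re-runnable: rows already copied are skipped
    execute_query(conn, """
    INSERT IGNORE INTO orders_partitioned
        (id, customer_id, order_date, total_amount, status, shipping_address)
    SELECT id, customer_id, order_date, total_amount, status, shipping_address
    FROM orders
    """, description="Copying orders into the partitioned table", fetch=False)
    
    # Check partition information
    df = execute_query(conn, """
    SELECT
        table_name,
        partition_name,
        table_rows,
        ROUND(data_length/1024/1024, 2) as data_mb
    FROM information_schema.partitions
    WHERE table_schema = 'performance_test'
    AND table_name = 'orders_partitioned'
    ORDER BY partition_ordinal_position
    """, description="Partition information")
    if df is not None:
        display(df)
    
    # Test partition pruning
    print("\n🎯 Testing Partition Pruning:")
    
    partition_tests = [
        ("All data", "SELECT COUNT(*) FROM orders_partitioned"),
        ("2024 data only", "SELECT COUNT(*) FROM orders_partitioned WHERE order_date >= '2024-01-01'"),
        ("2023 data only", "SELECT COUNT(*) FROM orders_partitioned WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'")
    ]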
    
    for description, query in partition_tests:
        df = execute_query(conn, f"EXPLAIN {query}", description=f"EXPLAIN: {description}")
        if df is not None:
            # EXPLAIN always has a partitions column; pruning shows up as a shorter list in it
            if 'partitions' in df.columns:
                print(f"Partitions scanned: {df['partitions'].iloc[0]}")
    
    # Create summary tables
    print("\n📋 Creating Summary Tables:")
    
    # Monthly sales summary
    # YEAR_MONTH is a reserved word in MySQL, so the column name must be quoted with backticks
    summary_sql = """
    CREATE TABLE IF NOT EXISTS monthly_sales_summary (
        `year_month` VARCHAR(7) PRIMARY KEY,
        total_orders INT DEFAULT 0,
        total_revenue DECIMAL(12,2) DEFAULT 0,
        avg_order_value DECIMAL(10,2) DEFAULT 0,
        unique_customers INT DEFAULT 0,
        last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    )
    """
    
    execute_query(conn, summary_sql, description="Creating monthly sales summary table", fetch=False)
    
    # Populate summary table
    populate_sql = """
    INSERT INTO monthly_sales_summary (`year_month`, total_orders, total_revenue, avg_order_value, unique_customers, last_updated)
    SELECT
        DATE_FORMAT(order_date, '%Y-%m') as `year_month`,
        COUNT(DISTINCT o.id) as total_orders,
        ROUND(SUM(o.total_amount), 2) as total_revenue,
        ROUND(AVG(o.total_amount), 2) as avg_order_value,
        COUNT(DISTINCT o.customer_id) as unique_customers,
        NOW() as last_updated
    FROM orders o
    GROUP BY DATE_FORMAT(order_date, '%Y-%m')
    ORDER BY `year_month`
    ON DUPLICATE KEY UPDATE
        total_orders = VALUES(total_orders),
        total_revenue = VALUES(total_revenue),
        avg_order_value = VALUES(avg_order_value),
        unique_customers = VALUES(unique_customers),
        last_updated = NOW()
    """
    
    execute_query(conn, populate_sql, description="Populating monthly sales summary", fetch=False)
    
    # Compare query performance
    print("\n⚡ Performance Comparison: Original vs Summary Table")
    
    # Query using original table
    original_query = """
    SELECT 
        DATE_FORMAT(order_date, '%Y-%m') as month,
        COUNT(DISTINCT id) as orders,
        ROUND(SUM(total_amount), 2) as revenue,
        ROUND(AVG(total_amount), 2) as avg_order
    FROM orders 
    WHERE order_date >= '2023-01-01'
    GROUP BY DATE_FORMAT(order_date, '%Y-%m')
    ORDER BY month
    """
    
    # Query using summary table
    summary_query = """
    SELECT 
        `year_month` as month,
        total_orders as orders,
        total_revenue as revenue,
        avg_order_value as avg_order
    FROM monthly_sales_summary
    WHERE `year_month` >= '2023-01'
    ORDER BY `year_month`
    """
    
    # Time original query
    start_time = time.time()
    df_original = execute_query(conn, original_query)
    original_time = time.time() - start_time
    
    # Time summary query
    start_time = time.time()
    df_summary = execute_query(conn, summary_query)
    summary_time = time.time() - start_time
    
    print(f"Original table query: {original_time:.4f} seconds")
    print(f"Summary table query: {summary_time:.4f} seconds")
    
    # Guard the divisor: a very fast summary query can measure as 0 seconds
    if summary_time > 0:
        speedup = original_time / summary_time
        print(f"Speedup from summary table: {speedup:.1f}x")
    
    # Verify results are the same (both queries already alias their columns identically)
    if df_original is not None and df_summary is not None:
        results_match = df_original.equals(df_summary)
        print(f"Results match: {results_match}")
    
    # Display summary data
    df = execute_query(conn, "SELECT * FROM monthly_sales_summary ORDER BY `year_month` DESC LIMIT 5",
                      description="Recent monthly sales summary")
    if df is not None:
        display(df)
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Exercise 8: Performance Testing and Benchmarking

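This exercise builds a small benchmarking framework: a results table, a stored procedure that runs three representative queries in a loop, and a before/after comparison around a set of additional indexes.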
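
The stored procedure in this exercise measures elapsed time on the server. A complementary view is to time queries from the client, which also captures network and driver overhead. The block below is a minimal sketch of that idea, not part of the lab's framework: `benchmark` and `percent_improvement` are hypothetical helper names, and `run_once` is assumed to be any zero-argument callable, for example a lambda wrapping `execute_query(conn, some_sql)`.

```python
import statistics
import time


def benchmark(run_once, iterations=50, warmup=5):
    """Time run_once() repeatedly and summarise the latencies.

    run_once is a hypothetical zero-argument callable; in this lab it could
    wrap a call such as execute_query(conn, some_sql).
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    # Warm-up runs let caches settle and are not measured.
    for _ in range(warmup):
        run_once()

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)

    total = sum(timings)
    return {
        "iterations": iterations,
        "avg_seconds": total / iterations,
        "median_seconds": statistics.median(timings),
        "max_seconds": max(timings),
        # Guard against a zero total on very coarse clocks.
        "queries_per_second": iterations / total if total > 0 else float("inf"),
    }


def percent_improvement(baseline_qps, optimized_qps):
    """Relative change in throughput, as a percentage of the baseline."""
    if baseline_qps <= 0:
        raise ValueError("baseline_qps must be positive")
    return (optimized_qps - baseline_qps) / baseline_qps * 100.0
```

Reporting the median alongside the average is a deliberate choice here: a single slow run (a cold cache, a background flush) can distort the average, while the median stays close to the typical latency.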

In [None]:
# Connect to database
conn = create_connection()

if conn:
    # Create performance testing framework
    print("🧪 Creating Performance Testing Framework")
    
    # Create results table
    results_table_sql = """
    CREATE TABLE IF NOT EXISTS performance_test_results (
        id INT PRIMARY KEY AUTO_INCREMENT,
        test_name VARCHAR(100),
        test_scenario VARCHAR(100),
        iterations INT,
        avg_query_time DECIMAL(10,6),
        total_time DECIMAL(10,6),
        queries_per_second DECIMAL(10,2),
        test_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        notes TEXT
    )
    """
    
    execute_query(conn, results_table_sql, description="Creating performance test results table", fetch=False)
    
    # Create performance testing procedure.
    # DELIMITER is a mysql command-line client directive, not SQL, so it is left out:
    # the connector sends the whole CREATE PROCEDURE body as a single statement.
    execute_query(conn, "DROP PROCEDURE IF EXISTS run_performance_test",
                  description="Dropping previous version of the procedure", fetch=False)
    
    test_procedure_sql = """
    CREATE PROCEDURE run_performance_test(
        IN test_name VARCHAR(100),
        IN test_scenario VARCHAR(100),
        IN iterations INT
    )
    BEGIN
        DECLARE i INT DEFAULT 1;
        DECLARE start_time DATETIME(6);  -- microsecond precision for short runs
        DECLARE total_time DECIMAL(10,6);
        DECLARE avg_time DECIMAL(10,6);
        DECLARE qps DECIMAL(10,2);
        
        -- SYSDATE(6) returns the time at which it executes
        SET start_time = SYSDATE(6);
        SET total_time = 0;
        
        -- Run test iterations
        WHILE i <= iterations DO
            -- Test Query 1: Simple customer lookup
            SELECT SQL_NO_CACHE COUNT(*) INTO @count1 
            FROM customers 
            WHERE state = 'CA';
            
            -- Test Query 2: Customer orders JOIN
            SELECT SQL_NO_CACHE COUNT(*) INTO @count2
            FROM customers c
            JOIN orders o ON c.id = o.customer_id
            WHERE c.state = 'CA' AND o.total_amount > 100;
            
            -- Test Query 3: Aggregation query
            -- MySQL rejects LIMIT directly inside an IN subquery, so wrap it in a derived table
            SELECT SQL_NO_CACHE AVG(total_amount) INTO @avg_amount
            FROM orders
            WHERE customer_id IN (
                SELECT id FROM (
                    SELECT id FROM customers WHERE state = 'CA' LIMIT 100
                ) AS ca_customers
            );
            
            SET i = i + 1;
        END WHILE;
        
        -- Calculate metrics (NULLIF avoids division by zero)
        SET total_time = TIMESTAMPDIFF(MICROSECOND, start_time, SYSDATE(6)) / 1000000;
        SET avg_time = total_time / NULLIF(iterations * 3, 0); -- 3 queries per iteration
        SET qps = (iterations * 3) / NULLIF(total_time, 0);
        
        -- Store results
        INSERT INTO performance_test_results 
        (test_name, test_scenario, iterations, avg_query_time, total_time, queries_per_second)
        VALUES (test_name, test_scenario, iterations, avg_time, total_time, qps);
        
        -- Return results
        SELECT 
            test_name,
            test_scenario,
            iterations,
            ROUND(avg_time, 6) as avg_query_time,
            ROUND(total_time, 3) as total_time_seconds,
            ROUND(qps, 2) as queries_per_second,
            CONCAT('Test completed: ', iterations * 3, ' queries in ', ROUND(total_time, 3), ' seconds') as status;
    END
    """
    
    execute_query(conn, test_procedure_sql, description="Creating performance testing procedure", fetch=False)
    
    # Run baseline test
    print("\n📊 Running Baseline Performance Test")
    df = execute_query(conn, "CALL run_performance_test('Baseline Test', 'No Optimizations', 50)")
    if df is not None:
        display(df)
    
    # Add more indexes for optimization test
    additional_indexes = [
        "CREATE INDEX idx_orders_status_date ON orders (status, order_date)",
        "CREATE INDEX idx_customers_created_state ON customers (created_date, state)",
        "CREATE INDEX idx_order_items_product_quantity ON order_items (product_id, quantity)"
    ]
    
    for index_sql in additional_indexes:
        execute_query(conn, index_sql, description="Adding optimization index", fetch=False)
    
    # Run optimized test
    print("\n📊 Running Optimized Performance Test")
    df = execute_query(conn, "CALL run_performance_test('Optimized Test', 'With Additional Indexes', 50)")
    if df is not None:
        display(df)
    
    # Compare results
    df = execute_query(conn, """
    SELECT 
        test_name,
        test_scenario,
        iterations,
        ROUND(avg_query_time * 1000, 3) as avg_query_time_ms,
        ROUND(queries_per_second, 2) as queries_per_second,
        test_date
    FROM performance_test_results
    ORDER BY test_date DESC, id DESC
    LIMIT 2
    """, description="Performance test comparison")
    if df is not None:
        display(df)
        
        # Calculate improvement
        if len(df) >= 2:
            baseline_qps = df.iloc[1]['queries_per_second']  # Older test
            optimized_qps = df.iloc[0]['queries_per_second']  # Newer test
            
            if baseline_qps > 0:
                improvement = ((optimized_qps - baseline_qps) / baseline_qps) * 100
                print(f"\n🎯 Performance Improvement: {improvement:.1f}%")
                print(f"⚡ Queries per second: {baseline_qps:.1f} → {optimized_qps:.1f}")
    
    # Visualize performance trends
    df = execute_query(conn, """
    SELECT 
        test_name,
        queries_per_second,
        test_date
    FROM performance_test_results
    ORDER BY test_date, id
    """, description="Performance trend data")
    if df is not None and len(df) > 1:
        plt.figure(figsize=(10, 6))
        plt.plot(df['test_name'], df['queries_per_second'], marker='o')
        plt.title('Performance Test Results')
        plt.ylabel('Queries per Second')
        plt.xticks(rotation=45)
        plt.grid(True)
        plt.tight_layout()
        plt.show()
    
    conn.close()
else:
    print("❌ Cannot proceed without database connection")

## Summary

In this lab, you learned:

1. **Query Analysis**: Using EXPLAIN plans to understand query execution
2. **Index Optimization**: Creating and managing database indexes effectively
3. **Query Optimization**: Various techniques to improve query performance
4. **Configuration Tuning**: Optimizing MySQL server settings
5. **Performance Monitoring**: Tracking and analyzing database performance
6. **Advanced Techniques**: Partitioning, summary tables, and caching
7. **Performance Testing**: Creating benchmarks and measuring improvements

### Key Takeaways:
- Always analyze a query with EXPLAIN before optimizing it
- Proper indexing is crucial for query performance
- Configuration should match your workload requirements
- Regular monitoring helps identify performance issues early
- Summary tables speed up reporting queries by reading pre-aggregated rows instead of re-aggregating the base table, at the cost of keeping the summary refreshed
- Partitioning helps manage large tables efficiently
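
The first takeaway can be partly automated. The sketch below scans EXPLAIN output for a few common warning signs. It assumes the plan has been fetched as a list of dicts keyed by MySQL's EXPLAIN column names (for instance via a cursor created with `dictionary=True`). `flag_explain_rows` and `row_threshold` are hypothetical names for this illustration, and the sample plan, including the index name `idx_orders_customer`, is made up.

```python
def flag_explain_rows(explain_rows, row_threshold=1000):
    """Return (table, reason) pairs for EXPLAIN rows that deserve a closer look.

    explain_rows is a list of dicts keyed by MySQL's EXPLAIN column names
    ('table', 'type', 'key', 'rows', 'Extra'). row_threshold is an arbitrary
    cut-off chosen for this sketch.
    """
    findings = []
    for row in explain_rows:
        table = row.get("table")
        estimated_rows = row.get("rows") or 0
        extra = row.get("Extra") or ""

        # type = ALL means every row of the table is read.
        if row.get("type") == "ALL" and estimated_rows >= row_threshold:
            findings.append((table, "full table scan"))
        # key is NULL when the optimizer chose no index for this table.
        elif row.get("key") is None and estimated_rows >= row_threshold:
            findings.append((table, "no index used"))

        if "Using filesort" in extra:
            findings.append((table, "extra sort pass (Using filesort)"))
        if "Using temporary" in extra:
            findings.append((table, "temporary table (Using temporary)"))
    return findings


# Made-up plan rows, shaped like EXPLAIN output for a two-table join.
sample_plan = [
    {"table": "c", "type": "ALL", "key": None, "rows": 50000, "Extra": "Using where"},
    {"table": "o", "type": "ref", "key": "idx_orders_customer", "rows": 4, "Extra": None},
]
print(flag_explain_rows(sample_plan))
```

A flagged row is a prompt to investigate, not a verdict: a full scan of a small table, or of a table where most rows match the filter, can be the cheapest plan available.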

### Performance Optimization Hierarchy:
1. **Fix the query** - Optimize SQL statements first
2. **Add indexes** - Ensure proper indexing
3. **Tune configuration** - Adjust server settings
4. **Use summary tables** - Pre-aggregate common queries
5. **Implement partitioning** - For very large tables
6. **Consider hardware upgrades** - As a last resort
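
Before reaching step 3, it helps to have a number that suggests whether memory settings are a bottleneck at all. One commonly used indicator is the InnoDB buffer pool hit ratio, derived from two counters in `SHOW GLOBAL STATUS`: `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that had to go to disk). The sketch below assumes those two values have already been collected into a dict. `buffer_pool_hit_ratio` is a hypothetical helper name and the sample numbers are invented.

```python
def buffer_pool_hit_ratio(status):
    """Fraction of InnoDB logical reads served from memory.

    status maps names from SHOW GLOBAL STATUS to their values, e.g. the rows
    returned by "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'".
    Returns None when no read requests have been recorded yet.
    """
    # The server reports status values as strings, so convert explicitly.
    requests = int(status["Innodb_buffer_pool_read_requests"])
    disk_reads = int(status["Innodb_buffer_pool_reads"])
    if requests == 0:
        return None
    return 1.0 - disk_reads / requests


# Invented counter values for illustration only.
sample_status = {
    "Innodb_buffer_pool_read_requests": "1000000",
    "Innodb_buffer_pool_reads": "25000",
}
ratio = buffer_pool_hit_ratio(sample_status)
print(f"Buffer pool hit ratio: {ratio:.2%}")
```

Both counters are cumulative since server start, so a ratio computed from the difference between two samples taken some minutes apart reflects current behaviour better than a single reading.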

### Next Steps:
- **Lab 15**: Database Security and User Management
- **Lab 16**: Backup, Recovery, and High Availability
- Apply these techniques to your production databases
- Set up automated monitoring and alerting
- Consider MySQL Enterprise features for advanced optimization

Remember: Performance optimization is an ongoing process. Monitor your database regularly and adjust as your workload changes!