# Lab 10: Advanced JOINs and Set Operations

**Duration**: 90 minutes  
**Prerequisites**: Lab 7 (Keys and Relationships)  
**Learning Objectives**:
- Master the JOIN types MySQL supports (INNER, LEFT, RIGHT, CROSS, and self joins)
- Understand set operations (UNION/UNION ALL)
- Apply JOINs in complex real-world scenarios
- Optimize JOIN performance

---

## Step 1: Environment Setup

First, let's install the MySQL connector and set up our environment.

In [None]:
# Install MySQL connector (run this in Google Colab or your local environment)
# !pip install mysql-connector-python

# Import required libraries
import mysql.connector
import pandas as pd
from IPython.display import display, HTML

print("Libraries imported successfully!")

In [None]:
# Connect to the MySQL server
# Replace with your actual database credentials
# No database is selected here: Step 2 creates advanced_joins_db and switches to it with USE
try:
    connection = mysql.connector.connect(
        host='localhost',
        user='your_username',
        password='your_password'
    )
    cursor = connection.cursor()
    print("✅ Connected to MySQL server successfully!")
except mysql.connector.Error as err:
    print(f"❌ Connection failed: {err}")
    print("Please check your database credentials and ensure MySQL is running.")

## Step 2: Create Sample Database

Let's create the sample tables and data for our JOIN exercises.

In [None]:
# Define a helper for running SQL in the rest of the lab
def execute_query(query, description=""):
    """Helper function to execute SQL queries and display results"""
    try:
        if description:
            print(f"\n📝 {description}")
        
        # Split multiple statements (naive: assumes no ';' inside string literals)
        statements = [stmt.strip() for stmt in query.split(';') if stmt.strip()]
        
        for stmt in statements:
            cursor.execute(stmt)
        
        # Try to fetch results from the last statement if it was a SELECT
        try:
            results = cursor.fetchall()
            if results:
                df = pd.DataFrame(results, columns=[desc[0] for desc in cursor.description])
                display(df)
                print(f"📊 {len(results)} rows returned")
            else:
                print("✅ Query executed successfully (no results to display)")
        except mysql.connector.Error:
            print("✅ Query executed successfully")
        
        # Commit so INSERT/UPDATE changes persist (autocommit is off by default)
        connection.commit()
            
    except mysql.connector.Error as err:
        print(f"❌ Error: {err}")

print("Helper function created!")
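
The helper decides whether to display results by calling `fetchall()` and catching the error when there is nothing to fetch. An alternative, sketched below, is to check `cursor.description`, which the Python DB-API (PEP 249) defines as `None` when a statement returns no result set. The sketch uses Python's built-in `sqlite3` as a stand-in so it runs without a MySQL server; `run_statement` is a hypothetical name, and the assumption is that a `mysql.connector` cursor behaves the same way.

```python
import sqlite3

def run_statement(cursor, stmt):
    """Execute one statement; return its rows, or None if it has no result set."""
    cursor.execute(stmt)
    # Per the DB-API, description is None for statements that return no rows
    if cursor.description is None:
        return None
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL connection
cur = conn.cursor()

ddl_result = run_statement(cur, "CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
insert_result = run_statement(cur, "INSERT INTO departments (name) VALUES ('IT'), ('HR')")
select_result = run_statement(cur, "SELECT name FROM departments ORDER BY id")

print(ddl_result, insert_result, select_result)
conn.close()
```

Checking `description` keeps the "no result set" case out of the exception path, so a genuine fetch error is not mistaken for a successful statement.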

In [None]:
# Create database
execute_query("""
CREATE DATABASE IF NOT EXISTS advanced_joins_db;
USE advanced_joins_db;
""", "Creating database")

# Create tables
execute_query("""
CREATE TABLE IF NOT EXISTS departments (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    location VARCHAR(50),
    budget DECIMAL(10,2)
);

CREATE TABLE IF NOT EXISTS users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    department_id INT,
    manager_id INT,
    email VARCHAR(100),
    hire_date DATE
);

CREATE TABLE IF NOT EXISTS orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT,
    product_name VARCHAR(100),
    amount DECIMAL(10,2),
    order_date DATE,
    status ENUM('pending', 'shipped', 'delivered', 'cancelled'),
    FOREIGN KEY (user_id) REFERENCES users(id)
);
""", "Creating tables")

In [None]:
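# Insert sample data (run this cell once)
# The id column is auto-generated and no other unique key exists, so
# ON DUPLICATE KEY UPDATE never fires here and re-running adds duplicate rows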
execute_query("""
INSERT INTO departments (name, location, budget) VALUES
('IT', 'Floor 1', 500000.00),
('HR', 'Floor 2', 300000.00),
('Sales', 'Floor 3', 400000.00),
('Marketing', 'Floor 4', 350000.00)
ON DUPLICATE KEY UPDATE name = VALUES(name);
""", "Inserting department data")

execute_query("""
INSERT INTO users (name, department_id, email, hire_date) VALUES
('Aarav', 1, 'aarav@company.com', '2023-01-15'),
('Sneha', 2, 'sneha@company.com', '2023-02-20'),
('Raj', 3, 'raj@company.com', '2023-03-10'),
('Priya', 1, 'priya@company.com', '2023-04-05'),
('Vikram', 3, 'vikram@company.com', '2023-05-12'),
('Anjali', 4, 'anjali@company.com', '2023-06-18')
ON DUPLICATE KEY UPDATE name = VALUES(name);
""", "Inserting user data")

# Set up manager hierarchy
execute_query("""
UPDATE users SET manager_id = 1 WHERE id IN (2, 3);
UPDATE users SET manager_id = 2 WHERE id = 4;
UPDATE users SET manager_id = 3 WHERE id = 5;
""", "Setting up manager hierarchy")

In [None]:
# Insert orders data (run this cell once; re-running adds duplicate rows)
execute_query("""
INSERT INTO orders (user_id, product_name, amount, order_date, status) VALUES
(1, 'Laptop', 1200.00, '2024-01-15', 'delivered'),
(1, 'Mouse', 25.00, '2024-01-16', 'delivered'),
(2, 'Keyboard', 75.00, '2024-01-20', 'shipped'),
(3, 'Monitor', 300.00, '2024-01-25', 'delivered'),
(4, 'Headphones', 150.00, '2024-02-01', 'pending'),
(1, 'Webcam', 80.00, '2024-02-05', 'shipped'),
(5, 'Printer', 250.00, '2024-02-10', 'delivered'),
(2, 'USB Drive', 15.00, '2024-02-12', 'delivered')
ON DUPLICATE KEY UPDATE product_name = VALUES(product_name);
""", "Inserting orders data")

## Step 3: INNER JOIN Practice

INNER JOIN returns only the rows that have matching values in both tables.

In [None]:
# Basic INNER JOIN
execute_query("""
SELECT u.name, d.name as department, d.location
FROM users u
INNER JOIN departments d ON u.department_id = d.id;
""", "INNER JOIN: Users with their departments")

In [None]:
# INNER JOIN with three tables
execute_query("""
SELECT u.name, d.name as department, o.product_name, o.amount
FROM users u
INNER JOIN departments d ON u.department_id = d.id
INNER JOIN orders o ON u.id = o.user_id;
""", "INNER JOIN: Complete order information")

**Exercise 1**: Write an INNER JOIN query to show all orders over $100 with customer and department information.

In [None]:
# Exercise 1 Solution
execute_query("""
-- Write your INNER JOIN query here
SELECT u.name, d.name as department, o.product_name, o.amount
FROM users u
INNER JOIN departments d ON u.department_id = d.id
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
""", "Exercise 1: Orders over $100")

## Step 4: LEFT JOIN Practice

LEFT JOIN returns all rows from the left table and the matching rows from the right table. Where a left row has no match, the right table's columns are NULL.
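
Because an unmatched left row still appears once (with NULLs on the right), the choice between `COUNT(*)` and `COUNT(column)` matters when aggregating over a LEFT JOIN. The sketch below shows the difference on a two-user subset of the lab data. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server; both engines are assumed to behave the same way for this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'Aarav'), (6, 'Anjali');
INSERT INTO orders VALUES (1, 1, 1200.0), (2, 1, 25.0);
""")

# Anjali has no orders, so her single joined row carries NULL in every orders column
rows = conn.execute("""
SELECT u.name,
       COUNT(*)    AS count_star,
       COUNT(o.id) AS count_order_id
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
ORDER BY u.id
""").fetchall()
print(rows)
conn.close()
```

`COUNT(*)` counts the NULL-padded row and reports one order for Anjali, while `COUNT(o.id)` skips NULLs and reports zero, which is why the lab's queries count `o.id`.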

In [None]:
# LEFT JOIN: All users, even those without orders
execute_query("""
SELECT u.name, u.email, o.product_name, o.amount, o.status
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
ORDER BY u.name;
""", "LEFT JOIN: All users with their orders (NULL if no orders)")

In [None]:
# LEFT JOIN with aggregation
execute_query("""
SELECT 
    u.name,
    COUNT(o.id) as order_count,
    COALESCE(SUM(o.amount), 0) as total_spent,
    COALESCE(AVG(o.amount), 0) as avg_order_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
ORDER BY total_spent DESC;
""", "LEFT JOIN: Order statistics per user")

**Exercise 2**: Find users who haven't placed any orders using a LEFT JOIN.

In [None]:
# Exercise 2 Solution
execute_query("""
-- Write your LEFT JOIN query here
SELECT u.name, u.email
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;
""", "Exercise 2: Users with no orders")

## Step 5: RIGHT JOIN Practice

RIGHT JOIN returns all rows from the right table and the matching rows from the left table. Where a right row has no match, the left table's columns are NULL.
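
Any RIGHT JOIN can be rewritten as a LEFT JOIN with the two tables swapped, which many people find easier to read. The sketch below writes `users RIGHT JOIN departments` in its LEFT JOIN form. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and it adds a hypothetical `Finance` department with no employees (not part of the lab data) so that an unmatched row is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
# 'Finance' is a hypothetical department with no users, added only for illustration
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'IT'), (2, 'HR'), (5, 'Finance');
INSERT INTO users VALUES (1, 'Aarav', 1), (2, 'Sneha', 2);
""")

# Equivalent to: FROM users u RIGHT JOIN departments d ON u.department_id = d.id
rows = conn.execute("""
SELECT d.name AS department, u.name AS employee
FROM departments d
LEFT JOIN users u ON u.department_id = d.id
ORDER BY d.name
""").fetchall()
print(rows)
conn.close()
```

The `Finance` row comes back with `None` as its employee, the Python rendering of SQL NULL.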

In [None]:
# RIGHT JOIN: All departments, even those without users
execute_query("""
SELECT d.name as department, d.location, u.name as employee
FROM users u
RIGHT JOIN departments d ON u.department_id = d.id
ORDER BY d.name;
""", "RIGHT JOIN: All departments with their employees")

**Exercise 3**: Show department statistics including employee count and total budget.

In [None]:
# Exercise 3 Solution
execute_query("""
-- Write your RIGHT JOIN query here
SELECT 
    d.name as department,
    d.location,
    d.budget,
    COUNT(u.id) as employee_count
FROM users u
RIGHT JOIN departments d ON u.department_id = d.id
GROUP BY d.id, d.name, d.location, d.budget;
""", "Exercise 3: Department statistics")

## Step 6: CROSS JOIN Practice

CROSS JOIN returns the Cartesian product of both tables: every row of the first table paired with every row of the second.
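
The size of a Cartesian product is the product of the two row counts, which is why CROSS JOIN results grow quickly. A minimal sketch in plain Python, using `itertools.product` on the names from the lab's sample data:

```python
from itertools import product

users = ["Aarav", "Sneha", "Raj", "Priya", "Vikram", "Anjali"]
departments = ["IT", "HR", "Sales", "Marketing"]

# Pair every user with every department, sorted like ORDER BY u.name, d.name
pairs = sorted(product(users, departments))
print(len(pairs))  # 6 users x 4 departments = 24
print(pairs[:3])
```

With 6 users and 4 departments the full result has 24 rows, so a `LIMIT 20` on the same CROSS JOIN leaves out the last four.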

In [None]:
# CROSS JOIN: All possible combinations
execute_query("""
SELECT u.name, d.name as department
FROM users u
CROSS JOIN departments d
ORDER BY u.name, d.name
LIMIT 20;
""", "CROSS JOIN: All user-department combinations (limited to 20 rows)")

## Step 7: SELF JOIN Practice

A SELF JOIN joins a table with itself, using two aliases to tell the copies apart.
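
When a self join pairs rows that share a value, an extra condition is usually needed to drop self-pairs and mirrored pairs. The sketch below counts the rows produced by three variants of that condition on the lab's manager hierarchy. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and `count_pairs` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO users VALUES
  (1, 'Aarav', NULL), (2, 'Sneha', 1), (3, 'Raj', 1),
  (4, 'Priya', 2), (5, 'Vikram', 3), (6, 'Anjali', NULL);
""")

def count_pairs(extra_condition):
    """Count self-join rows for a given extra ON condition (hypothetical helper)."""
    sql = f"""
    SELECT COUNT(*) FROM users e1
    JOIN users e2 ON e1.manager_id = e2.manager_id {extra_condition}
    """
    return conn.execute(sql).fetchone()[0]

no_filter = count_pairs("")                    # includes self-pairs and mirrored pairs
not_equal = count_pairs("AND e1.id <> e2.id")  # drops self-pairs, keeps mirrored pairs
less_than = count_pairs("AND e1.id < e2.id")   # one row per unordered pair
print(no_filter, not_equal, less_than)
conn.close()
```

Only Sneha and Raj share a manager, so `e1.id < e2.id` yields a single row. Users whose `manager_id` is NULL never match, because `NULL = NULL` is not true in SQL.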

In [None]:
# SELF JOIN: Employee-manager relationships
execute_query("""
SELECT 
    e.name as employee,
    m.name as manager
FROM users e
LEFT JOIN users m ON e.manager_id = m.id
ORDER BY m.name, e.name;
""", "SELF JOIN: Employee-manager hierarchy")

**Exercise 4**: Find employees who share the same manager.

In [None]:
# Exercise 4 Solution
execute_query("""
-- Write your SELF JOIN query here
SELECT 
    e1.name as employee1,
    e2.name as employee2,
    m.name as manager
FROM users e1
JOIN users e2 ON e1.manager_id = e2.manager_id AND e1.id < e2.id
JOIN users m ON e1.manager_id = m.id;
""", "Exercise 4: Employees with same manager")

## Step 8: UNION and UNION ALL Practice

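UNION combines result sets and removes duplicate rows; UNION ALL keeps every row, duplicates included. The SELECT statements being combined must return the same number of columns.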
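
Duplicate removal compares whole rows, so a literal tag column such as `'current'` or `'archived'` makes otherwise identical rows distinct. The sketch below counts rows for three combinations. It uses Python's built-in `sqlite3` as a stand-in for MySQL, a hypothetical archived `Mouse` row that duplicates a current order (the lab's archive has no such overlap), and a hypothetical `row_count` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
# The archived 'Mouse' row is hypothetical, added so the two tables share a row
conn.executescript("""
CREATE TABLE orders (product_name TEXT, amount REAL);
CREATE TABLE archived_orders (product_name TEXT, amount REAL);
INSERT INTO orders VALUES ('Laptop', 1200.0), ('Mouse', 25.0);
INSERT INTO archived_orders VALUES ('Mouse', 25.0), ('Old Laptop', 800.0);
""")

def row_count(set_operator, with_source):
    """Count rows produced by combining both tables (hypothetical helper)."""
    current_tag = ", 'current' AS source" if with_source else ""
    archived_tag = ", 'archived' AS source" if with_source else ""
    sql = f"""
    SELECT COUNT(*) FROM (
        SELECT product_name, amount{current_tag} FROM orders
        {set_operator}
        SELECT product_name, amount{archived_tag} FROM archived_orders
    )
    """
    return conn.execute(sql).fetchone()[0]

union_plain = row_count("UNION", with_source=False)         # the shared 'Mouse' row collapses
union_all_plain = row_count("UNION ALL", with_source=False)  # all four rows are kept
union_tagged = row_count("UNION", with_source=True)          # the tag makes every row distinct
print(union_plain, union_all_plain, union_tagged)
conn.close()
```

When a query includes a source tag like this, UNION and UNION ALL return the same rows, and UNION ALL avoids the cost of the duplicate check.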

In [None]:
# Create archived orders table for UNION examples
execute_query("""
CREATE TABLE IF NOT EXISTS archived_orders (
    id INT PRIMARY KEY,
    user_id INT,
    product_name VARCHAR(100),
    amount DECIMAL(10,2),
    archived_date DATE
);

INSERT INTO archived_orders VALUES
(1, 1, 'Old Laptop', 800.00, '2023-01-15'),
(2, 2, 'Old Keyboard', 50.00, '2023-02-20'),
(3, 3, 'Old Monitor', 200.00, '2023-03-10')
ON DUPLICATE KEY UPDATE product_name = VALUES(product_name);
""", "Creating archived orders table")

In [None]:
# UNION: Remove duplicates
execute_query("""
SELECT product_name, amount, 'current' as source
FROM orders
UNION
SELECT product_name, amount, 'archived' as source
FROM archived_orders
ORDER BY product_name;
""", "UNION: Combine orders without duplicates")

In [None]:
# UNION ALL: Keep duplicates
execute_query("""
SELECT product_name, amount, 'current' as source
FROM orders
UNION ALL
SELECT product_name, amount, 'archived' as source
FROM archived_orders
ORDER BY product_name;
""", "UNION ALL: Combine orders with duplicates preserved")

## Step 9: Complex JOIN Scenarios

Let's practice complex real-world scenarios combining multiple JOINs.
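
One thing to watch when chaining JOINs is row fan-out: joining a one-to-many table repeats each parent row once per child, which inflates plain counts. The sketch below joins the IT department's two users to their orders and compares `COUNT` with `COUNT(DISTINCT)`. It uses Python's built-in `sqlite3` as a stand-in for MySQL, with the user and order ids taken from the lab's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users VALUES (1, 'Aarav', 1), (4, 'Priya', 1);
INSERT INTO orders VALUES (1, 1), (2, 1), (5, 4), (6, 1);
""")

# Aarav appears three times in the joined rows (one per order), Priya once
plain_count, distinct_count = conn.execute("""
SELECT COUNT(u.id), COUNT(DISTINCT u.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.department_id = 1
""").fetchone()
print(plain_count, distinct_count)
conn.close()
```

The plain count reports four "employees" because it counts joined rows, while `COUNT(DISTINCT u.id)` reports the two actual users.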

In [None]:
# Complex business intelligence query
execute_query("""
SELECT
    d.name as department,
    d.location,
    COUNT(DISTINCT u.id) as employee_count,
    COUNT(o.id) as total_orders,
    COALESCE(SUM(o.amount), 0) as total_amount,
    COALESCE(AVG(o.amount), 0) as avg_order_amount,
    MAX(o.order_date) as last_order_date
FROM departments d
LEFT JOIN users u ON d.id = u.department_id
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY d.id, d.name, d.location
ORDER BY total_amount DESC;
""", "Complex JOIN: Department performance analysis")

**Exercise 5**: Create a customer lifetime value analysis using JOINs.

In [None]:
# Exercise 5 Solution
execute_query("""
-- Write your complex JOIN query here
SELECT
    u.name as customer,
    u.hire_date,
    MIN(o.order_date) as first_order,
    MAX(o.order_date) as last_order,
    COUNT(o.id) as total_orders,
    SUM(o.amount) as lifetime_value,
    AVG(o.amount) as avg_order_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name, u.hire_date
HAVING total_orders > 0
ORDER BY lifetime_value DESC;
""", "Exercise 5: Customer lifetime value analysis")

## Step 10: Performance Optimization

Let's learn about optimizing JOIN performance.
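
As a rough analogy for why an index on the join column helps, the sketch below counts the work done by a join with and without a lookup structure. It is plain Python, not MySQL: a `dict` stands in for the index, whereas InnoDB indexes are B-trees, so treat the numbers as an illustration of the idea rather than a model of the server.

```python
from collections import defaultdict

users = [(1, "Aarav"), (2, "Sneha"), (3, "Raj"), (4, "Priya"), (5, "Vikram"), (6, "Anjali")]
order_user_ids = [1, 1, 2, 3, 4, 1, 5, 2]  # user_id column of the lab's eight orders

# Without an index: compare every user against every order (nested loop)
comparisons = 0
nested_counts = {}
for user_id, name in users:
    nested_counts[name] = 0
    for order_user_id in order_user_ids:
        comparisons += 1
        if order_user_id == user_id:
            nested_counts[name] += 1

# With an "index": group orders by user_id once, then do one lookup per user
index = defaultdict(int)
for order_user_id in order_user_ids:
    index[order_user_id] += 1
lookups = 0
indexed_counts = {}
for user_id, name in users:
    lookups += 1
    indexed_counts[name] = index.get(user_id, 0)

print(comparisons, lookups)
print(nested_counts == indexed_counts)
```

Both approaches produce the same order counts, but the nested loop needs 48 comparisons against 6 lookups, and the gap widens as the tables grow.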

In [None]:
# Check query execution plan
execute_query("""
EXPLAIN SELECT u.name, d.name
FROM users u
INNER JOIN departments d ON u.department_id = d.id;
""", "EXPLAIN: Check query execution plan")

In [None]:
# Create indexes for better performance
# MySQL does not support CREATE INDEX IF NOT EXISTS (that is MariaDB syntax),
# so run this cell once; re-running reports a duplicate key name error
execute_query("""
CREATE INDEX idx_users_dept ON users(department_id);
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_date ON orders(order_date);
""", "Creating indexes for JOIN optimization")

In [None]:
# Check the execution plan after indexing
execute_query("""
EXPLAIN SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
""", "EXPLAIN: Check the execution plan after indexing")

## Step 11: Final Project

**Real-world Scenario**: Create a comprehensive sales dashboard query that shows:
- Department performance
- Top customers by spending
- Order status distribution
- Monthly sales trends

Use multiple JOINs, aggregations, and set operations.

In [None]:
# Final Project Solution (covers department performance and top customers)
# Each branch is parenthesized so ORDER BY and LIMIT apply only to the Top Customers branch.
# UNION matches columns by position and takes the column names from the first SELECT.
execute_query("""
-- Department Performance Dashboard
(SELECT 
    'Department Performance' as report_type,
    d.name as department,
    COUNT(DISTINCT u.id) as employees,
    COUNT(o.id) as orders,
    SUM(o.amount) as revenue,
    AVG(o.amount) as avg_order
FROM departments d
LEFT JOIN users u ON d.id = u.department_id
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY d.id, d.name)

UNION ALL

-- Top Customers
(SELECT 
    'Top Customers' as report_type,
    u.name as customer,
    d.name as department,
    COUNT(o.id) as orders,
    SUM(o.amount) as total_spent,
    MAX(o.order_date) as last_order
FROM users u
JOIN departments d ON u.department_id = d.id
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name, d.name
ORDER BY total_spent DESC
LIMIT 5);
""", "Final Project: Sales Dashboard")

## Summary

Congratulations! You've completed Lab 10: Advanced JOINs and Set Operations.

### What You Learned:
- **INNER JOIN**: Matching rows only
- **LEFT JOIN**: All left table rows + matching right rows
- **RIGHT JOIN**: All right table rows + matching left rows
- **CROSS JOIN**: Cartesian product
- **SELF JOIN**: Join table with itself
- **UNION/UNION ALL**: Combine result sets
- **Performance optimization**: Indexes and EXPLAIN

### Key Takeaways:
1. Choose the right JOIN type for your data requirements
2. Use table aliases to avoid column name conflicts
3. Consider performance implications of different JOINs
4. UNION removes duplicates, UNION ALL preserves them
5. Indexes on join columns can significantly improve JOIN performance; use EXPLAIN to confirm they are used

### Next Steps:
- Practice with larger datasets
- Learn about subqueries and CTEs
- Explore window functions
- Study database design for optimal JOINs

---

**Lab 10 Complete!** 🎉

In [None]:
# Close the database connection
if 'connection' in locals() and connection.is_connected():
    cursor.close()
    connection.close()
    print("✅ Database connection closed.")

print("\n🏁 Lab 10: Advanced JOINs and Set Operations - COMPLETED!")
print("Great work mastering complex SQL JOINs and set operations!")