# Lab 11: Subqueries and Advanced Querying

**Duration**: 90 minutes  
**Prerequisites**: Lab 8 (Advanced Querying)  
**Learning Objectives**:
- Master subqueries in all contexts (WHERE, FROM, SELECT, HAVING)
- Understand GROUP BY and HAVING clauses
- Apply ROLLUP operator for summary reports
- Choose between WHERE and HAVING appropriately
- Optimize complex queries

---

## Step 1: Environment Setup

First, let's install the MySQL connector and set up our environment.

In [None]:
# Install MySQL connector (run this in Google Colab or your local environment)
# !pip install mysql-connector-python

# Import required libraries
import mysql.connector
import pandas as pd
from IPython.display import display, HTML

print("Libraries imported successfully!")

In [None]:
# Connect to the MySQL server
# Replace with your actual database credentials
# No database is selected here because subqueries_db is created in Step 2
try:
    connection = mysql.connector.connect(
        host='localhost',
        user='your_username',
        password='your_password'
    )
    cursor = connection.cursor()
    print("✅ Connected to MySQL server successfully!")
except mysql.connector.Error as err:
    print(f"❌ Connection failed: {err}")
    print("Please check your database credentials and ensure MySQL is running.")

## Step 2: Create Sample Database

Let's create the sample tables and data for our subqueries exercises.

In [None]:
# Helper for running SQL and displaying results
def execute_query(query, description=""):
    """Helper function to execute SQL queries and display results"""
    try:
        if description:
            print(f"\n📝 {description}")
        
        # Split multiple statements (naive split: assumes no ';' inside string literals)
        statements = [stmt.strip() for stmt in query.split(';') if stmt.strip()]
        
        for stmt in statements:
            cursor.execute(stmt)
        
        # Try to fetch results if the last statement was a SELECT
        try:
            results = cursor.fetchall()
            if results:
                df = pd.DataFrame(results, columns=[desc[0] for desc in cursor.description])
                display(df)
                print(f"📊 {len(results)} rows returned")
            else:
                print("✅ Query executed successfully (no results to display)")
        except mysql.connector.Error:
            print("✅ Query executed successfully")
        
        # Persist INSERT/UPDATE changes (autocommit is off by default)
        connection.commit()
            
    except mysql.connector.Error as err:
        print(f"❌ Error: {err}")

print("Helper function created!")
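
The helper splits its input on every `;`. That works for the statements in this lab, but it would cut a statement in half if a string literal contained a semicolon. A minimal sketch of that caveat follows; `split_statements` and the sample SQL text are hypothetical names used only for illustration.

```python
# Same naive splitting rule that execute_query uses (hypothetical standalone copy)
def split_statements(query):
    return [stmt.strip() for stmt in query.split(';') if stmt.strip()]

# Two ordinary statements split cleanly
ok = split_statements("CREATE DATABASE demo; USE demo;")

# A semicolon inside a string literal is treated as a statement boundary
broken = split_statements("INSERT INTO notes (body) VALUES ('a;b');")

print(ok)
print(broken)
```

If you ever need semicolons inside literals, run that statement with a direct `cursor.execute(...)` call instead of the helper.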

In [None]:
# Create database
execute_query("""
CREATE DATABASE IF NOT EXISTS subqueries_db;
USE subqueries_db;
""", "Creating database")

# Create tables
execute_query("""
CREATE TABLE IF NOT EXISTS departments (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL UNIQUE,  -- UNIQUE lets ON DUPLICATE KEY UPDATE detect reruns
    budget DECIMAL(12,2),
    location VARCHAR(50)
);

CREATE TABLE IF NOT EXISTS employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL UNIQUE,  -- UNIQUE lets ON DUPLICATE KEY UPDATE detect reruns
    department_id INT,
    salary DECIMAL(10,2),
    hire_date DATE,
    manager_id INT
);

CREATE TABLE IF NOT EXISTS projects (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) UNIQUE,  -- UNIQUE lets ON DUPLICATE KEY UPDATE detect reruns
    department_id INT,
    budget DECIMAL(12,2),
    start_date DATE,
    end_date DATE
);
""", "Creating tables")

In [None]:
# Insert sample data
execute_query("""
INSERT INTO departments (name, budget, location) VALUES
('IT', 500000.00, 'Floor 1'),
('HR', 300000.00, 'Floor 2'),
('Sales', 400000.00, 'Floor 3'),
('Marketing', 350000.00, 'Floor 4'),
('Finance', 450000.00, 'Floor 5')
ON DUPLICATE KEY UPDATE name = VALUES(name);
""", "Inserting department data")

execute_query("""
INSERT INTO employees (name, department_id, salary, hire_date) VALUES
('Aarav', 1, 75000.00, '2023-01-15'),
('Sneha', 2, 65000.00, '2023-02-20'),
('Raj', 3, 80000.00, '2023-03-10'),
('Priya', 1, 70000.00, '2023-04-05'),
('Vikram', 3, 55000.00, '2023-05-12'),
('Anjali', 4, 60000.00, '2023-06-18'),
('Rohit', 1, 72000.00, '2023-07-22'),
('Kavita', 5, 85000.00, '2023-08-30'),
('Suresh', 3, 58000.00, '2023-09-14'),
('Meera', 2, 62000.00, '2023-10-08')
ON DUPLICATE KEY UPDATE name = VALUES(name);
""", "Inserting employee data")

# Set up manager hierarchy
execute_query("""
UPDATE employees SET manager_id = 1 WHERE id IN (2, 4, 7);
UPDATE employees SET manager_id = 3 WHERE id IN (5, 9);
UPDATE employees SET manager_id = 8 WHERE id = 6;
""", "Setting up manager hierarchy")

In [None]:
# Insert projects data
execute_query("""
INSERT INTO projects (name, department_id, budget, start_date, end_date) VALUES
('Website Redesign', 1, 150000.00, '2024-01-01', '2024-06-30'),
('HR System Upgrade', 2, 80000.00, '2024-02-01', '2024-08-31'),
('Sales CRM', 3, 200000.00, '2024-01-15', '2024-12-15'),
('Marketing Campaign', 4, 120000.00, '2024-03-01', '2024-09-30'),
('Financial Reporting', 5, 90000.00, '2024-04-01', '2024-10-31'),
('Mobile App', 1, 250000.00, '2024-05-01', '2024-12-31')
ON DUPLICATE KEY UPDATE name = VALUES(name);
""", "Inserting projects data")

## Step 3: Single-Row Subqueries

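A single-row (scalar) subquery returns exactly one row with one column, so it can be used with comparison operators such as =, <, and >. If it returns more than one row, MySQL raises an error.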
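
It can help to see a single-row subquery as two steps folded into one statement. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption, so that it runs without a server) and a trimmed copy of the lab's employee data.

```python
import sqlite3

# In-memory SQLite database standing in for the lab's MySQL table (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Aarav", 75000), ("Sneha", 65000), ("Raj", 80000), ("Priya", 70000),
     ("Vikram", 55000), ("Anjali", 60000), ("Rohit", 72000), ("Kavita", 85000),
     ("Suresh", 58000), ("Meera", 62000)],
)

# Step 1: the inner query yields exactly one row with one column
(avg_salary,) = conn.execute("SELECT AVG(salary) FROM employees").fetchone()

# Step 2: the outer query compares every row against that single value
two_step = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (avg_salary,)
).fetchall()

# The same logic as one statement with a scalar subquery
one_step = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) ORDER BY name"
).fetchall()

print(avg_salary)            # 68200.0
print(one_step == two_step)  # True
```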

In [None]:
# Find employees in the Sales department
execute_query("""
SELECT name, salary
FROM employees
WHERE department_id = (
    SELECT id
    FROM departments
    WHERE name = 'Sales'
);
""", "Single-row subquery: Employees in Sales department")

In [None]:
# Find employees with salary above company average
execute_query("""
SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);
""", "Single-row subquery: Above-average salary employees")

**Exercise 1**: Find the employee with the highest salary using a subquery.

In [None]:
# Exercise 1 Solution
execute_query("""
-- Write your single-row subquery here
SELECT name, salary
FROM employees
WHERE salary = (
    SELECT MAX(salary)
    FROM employees
);
""", "Exercise 1: Employee with highest salary")

## Step 4: Multiple-Row Subqueries

Multiple-row subqueries return a set of values and are used with operators such as IN, ANY, and ALL.
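
The difference between ANY and ALL maps directly onto Python's `any()` and `all()`. The sketch below recomputes the per-department averages in plain Python from a copy of the lab's employee rows; it is an illustration of the semantics, not how MySQL evaluates the query.

```python
from collections import defaultdict

# (name, department_id, salary) copied from the lab's sample data
employees = [
    ("Aarav", 1, 75000), ("Sneha", 2, 65000), ("Raj", 3, 80000),
    ("Priya", 1, 70000), ("Vikram", 3, 55000), ("Anjali", 4, 60000),
    ("Rohit", 1, 72000), ("Kavita", 5, 85000), ("Suresh", 3, 58000),
    ("Meera", 2, 62000),
]

# Equivalent of: SELECT AVG(salary) FROM employees GROUP BY department_id
by_dept = defaultdict(list)
for _, dept_id, salary in employees:
    by_dept[dept_id].append(salary)
dept_averages = [sum(s) / len(s) for s in by_dept.values()]

# salary > ANY (...): at least one comparison must be true
above_any = [name for name, _, salary in employees
             if any(salary > avg for avg in dept_averages)]

# salary > ALL (...): every comparison must be true
above_all = [name for name, _, salary in employees
             if all(salary > avg for avg in dept_averages)]

print(above_any)  # ['Aarav', 'Sneha', 'Raj', 'Priya', 'Rohit', 'Kavita', 'Meera']
print(above_all)  # []
```

For a non-empty subquery without NULLs, `> ANY` behaves like "greater than the minimum" and `> ALL` like "greater than the maximum". With this data nobody beats the highest department average (85,000), so the ALL version returns no rows.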

In [None]:
# Find employees in IT or HR departments
execute_query("""
SELECT e.name, e.salary, d.name as department
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.department_id IN (
    SELECT id
    FROM departments
    WHERE name IN ('IT', 'HR')
);
""", "Multiple-row subquery: Employees in IT or HR")

In [None]:
# Find employees with salary above ANY department average
execute_query("""
SELECT e.name, e.salary, d.name as department
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > ANY (
    SELECT AVG(salary)
    FROM employees
    GROUP BY department_id
);
""", "Multiple-row subquery: Salary above ANY department average")

**Exercise 2**: Find employees earning more than ALL department averages.

In [None]:
# Exercise 2 Solution
execute_query("""
-- Write your multiple-row subquery here
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > ALL (
    SELECT AVG(salary)
    FROM employees
    GROUP BY department_id
);
""", "Exercise 2: Salary above ALL department averages")

## Step 5: Correlated Subqueries

Correlated subqueries reference columns from the outer query, so they are conceptually evaluated once per outer row.
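
A correlated subquery can often be rewritten as a join against a derived table that computes the per-group value once. The sketch below checks that both forms return the same rows, using Python's built-in `sqlite3` as a stand-in for MySQL (an assumption) and a copy of the lab's employee data; the alias `dept_avg` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Aarav", 1, 75000), ("Sneha", 2, 65000), ("Raj", 3, 80000),
     ("Priya", 1, 70000), ("Vikram", 3, 55000), ("Anjali", 4, 60000),
     ("Rohit", 1, 72000), ("Kavita", 5, 85000), ("Suresh", 3, 58000),
     ("Meera", 2, 62000)],
)

# Correlated form: the inner query depends on e1.department_id
correlated = conn.execute("""
    SELECT e1.name
    FROM employees e1
    WHERE e1.salary > (
        SELECT AVG(e2.salary)
        FROM employees e2
        WHERE e2.department_id = e1.department_id
    )
    ORDER BY e1.name
""").fetchall()

# Join form: compute each department average once, then join to it
joined = conn.execute("""
    SELECT e.name
    FROM employees e
    JOIN (
        SELECT department_id, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department_id
    ) dept_avg ON dept_avg.department_id = e.department_id
    WHERE e.salary > dept_avg.avg_salary
    ORDER BY e.name
""").fetchall()

print(correlated)            # [('Aarav',), ('Raj',), ('Sneha',)]
print(correlated == joined)  # True
```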

In [None]:
# Find employees earning more than their department average
execute_query("""
SELECT e1.name, e1.salary, d.name as department
FROM employees e1
JOIN departments d ON e1.department_id = d.id
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);
""", "Correlated subquery: Above department average salary")

In [None]:
# Find employees who earn more than their manager
execute_query("""
-- The inner query looks up the salary of each employee's own manager
SELECT e.name as employee, e.salary as emp_salary
FROM employees e
WHERE e.salary > (
    SELECT m.salary
    FROM employees m
    WHERE m.id = e.manager_id
);
""", "Correlated subquery: Employees earning more than manager")

**Exercise 3**: Find departments where all employees earn above 60,000.
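
One way to reason about "all employees earn above 60,000" is its negation: no employee in the department earns 60,000 or less. The sketch below expresses that with NOT EXISTS, using Python's built-in `sqlite3` as a stand-in for MySQL (an assumption) and assuming no NULL salaries. The `Legal` department is hypothetical and was added only to show what happens when a department has no employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "IT"), (2, "HR"), (3, "Sales"), (4, "Marketing"), (5, "Finance"),
     (6, "Legal")],  # Legal is hypothetical and has no employees
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Aarav", 1, 75000), ("Sneha", 2, 65000), ("Raj", 3, 80000),
     ("Priya", 1, 70000), ("Vikram", 3, 55000), ("Anjali", 4, 60000),
     ("Rohit", 1, 72000), ("Kavita", 5, 85000), ("Suresh", 3, 58000),
     ("Meera", 2, 62000)],
)

# "Every salary is above 60000" == "no salary is 60000 or less"
rows = conn.execute("""
    SELECT d.name
    FROM departments d
    WHERE NOT EXISTS (
        SELECT 1
        FROM employees e
        WHERE e.department_id = d.id
          AND e.salary <= 60000
    )
    ORDER BY d.id
""").fetchall()
names = [r[0] for r in rows]

print(names)    # ['IT', 'HR', 'Finance', 'Legal']
print(all([]))  # True: a condition over an empty set holds vacuously
```

Marketing drops out because its only employee earns exactly 60,000, which is not "above". The empty department passes the check, which is also how an ALL comparison behaves on an empty subquery; add an EXISTS condition if such departments should be excluded.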

In [None]:
# Exercise 3 Solution
execute_query("""
-- Write your correlated subquery here
SELECT d.name
FROM departments d
WHERE 60000 < ALL (
    SELECT e.salary
    FROM employees e
    WHERE e.department_id = d.id
);
""", "Exercise 3: Departments with all employees above 60k")

## Step 6: Subqueries in FROM Clause

A subquery in the FROM clause creates a derived table, which MySQL requires you to give an alias.

In [None]:
# Department salary statistics using derived table
execute_query("""
SELECT
    dept_stats.department_name,
    dept_stats.employee_count,
    dept_stats.avg_salary
FROM (
    SELECT
        d.name as department_name,
        COUNT(e.id) as employee_count,
        ROUND(AVG(e.salary), 2) as avg_salary
    FROM departments d
    LEFT JOIN employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
) dept_stats
WHERE dept_stats.avg_salary > 65000;
""", "Derived table: Departments with average salary above 65k")

**Exercise 4**: Show project budget as percentage of department budget.

In [None]:
# Exercise 4 Solution
execute_query("""
-- Write your derived table query here
SELECT
    p.name as project_name,
    p.budget as project_budget,
    dept_info.dept_budget,
    ROUND((p.budget / dept_info.dept_budget * 100), 2) as budget_percentage
FROM projects p
JOIN (
    SELECT id, budget as dept_budget
    FROM departments
) dept_info ON p.department_id = dept_info.id;
""", "Exercise 4: Project budget percentages")

## Step 7: GROUP BY and HAVING

GROUP BY collects rows into groups; HAVING then filters those groups after aggregation.
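
The order of evaluation is what separates WHERE from HAVING: WHERE removes individual rows before they are grouped, while HAVING removes whole groups after the aggregates are computed. The sketch below contrasts the two on a copy of the lab's employee data, using Python's built-in `sqlite3` as a stand-in for MySQL (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Aarav", 1, 75000), ("Sneha", 2, 65000), ("Raj", 3, 80000),
     ("Priya", 1, 70000), ("Vikram", 3, 55000), ("Anjali", 4, 60000),
     ("Rohit", 1, 72000), ("Kavita", 5, 85000), ("Suresh", 3, 58000),
     ("Meera", 2, 62000)],
)

# WHERE runs first: drop low salaries, then count what is left per department
where_rows = conn.execute("""
    SELECT department_id, COUNT(*)
    FROM employees
    WHERE salary > 60000
    GROUP BY department_id
    ORDER BY department_id
""").fetchall()

# HAVING runs after grouping: count everyone, then drop whole departments
having_rows = conn.execute("""
    SELECT department_id, COUNT(*)
    FROM employees
    GROUP BY department_id
    HAVING AVG(salary) > 65000
    ORDER BY department_id
""").fetchall()

print(where_rows)   # [(1, 3), (2, 2), (3, 1), (5, 1)]
print(having_rows)  # [(1, 3), (5, 1)]
```

Department 3 appears in the WHERE result with a reduced count because only one of its three employees passes the row filter. It disappears from the HAVING result because its overall average is below 65,000.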

In [None]:
# Basic GROUP BY
execute_query("""
SELECT d.name as department, COUNT(e.id) as employee_count
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
ORDER BY employee_count DESC;
""", "GROUP BY: Employee count per department")

In [None]:
# GROUP BY with HAVING
execute_query("""
SELECT d.name as department, COUNT(e.id) as employee_count
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
HAVING COUNT(e.id) > 1;
""", "HAVING: Departments with more than 1 employee")

In [None]:
# Complex HAVING with aggregates
execute_query("""
SELECT d.name as department, ROUND(AVG(e.salary), 2) as avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
HAVING AVG(e.salary) > 65000;
""", "HAVING: Departments with average salary above 65k")

**Exercise 5**: Show departments where total salary exceeds department budget.

In [None]:
# Exercise 5 Solution
execute_query("""
-- Write your GROUP BY and HAVING query here
SELECT d.name as department, SUM(e.salary) as total_salary, d.budget
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name, d.budget
HAVING SUM(e.salary) > d.budget;
""", "Exercise 5: Departments exceeding budget")

## Step 8: ROLLUP Operator

ROLLUP creates subtotals and grand totals for grouped data.
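
To see which rows ROLLUP adds, it can be emulated by stacking one GROUP BY per level with UNION ALL. The sketch below does this with Python's built-in `sqlite3`, which has no ROLLUP of its own (so this is an emulation, not MySQL's implementation). The flattened `hires` table is a hypothetical stand-in for the joined departments and employees data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per hire, matching the lab's head counts
conn.execute("CREATE TABLE hires (department TEXT, hire_year INTEGER)")
conn.executemany(
    "INSERT INTO hires VALUES (?, ?)",
    [("IT", 2023)] * 3 + [("HR", 2023)] * 2 + [("Sales", 2023)] * 3
    + [("Marketing", 2023), ("Finance", 2023)],
)

rows = conn.execute("""
    -- Detail level: one row per (department, year)
    SELECT department, hire_year, COUNT(*) FROM hires GROUP BY department, hire_year
    UNION ALL
    -- Subtotal level: one row per department, year shown as NULL
    SELECT department, NULL, COUNT(*) FROM hires GROUP BY department
    UNION ALL
    -- Grand total: both grouping columns shown as NULL
    SELECT NULL, NULL, COUNT(*) FROM hires
""").fetchall()

for row in rows:
    print(row)
```

The NULLs in the subtotal and grand-total rows are placeholders for "all values", which is why reports usually relabel them.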

In [None]:
# ROLLUP: Department and year subtotals
execute_query("""
SELECT
    d.name as department,
    YEAR(e.hire_date) as hire_year,
    COUNT(e.id) as employees_hired
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.name, YEAR(e.hire_date) WITH ROLLUP;
""", "ROLLUP: Department and year hiring subtotals")

In [None]:
# ROLLUP with COALESCE for better formatting (ORDER BY together with ROLLUP requires MySQL 8.0.12+)
execute_query("""
SELECT
    COALESCE(d.name, 'TOTAL') as department,
    COALESCE(YEAR(e.hire_date), 'ALL YEARS') as hire_year,
    COUNT(e.id) as employees_hired,
    ROUND(SUM(e.salary), 2) as total_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.name, YEAR(e.hire_date) WITH ROLLUP
ORDER BY
    CASE WHEN d.name IS NULL THEN 1 ELSE 0 END,
    d.name,
    CASE WHEN YEAR(e.hire_date) IS NULL THEN 1 ELSE 0 END,
    YEAR(e.hire_date);
""", "ROLLUP: Formatted summary report")

## Step 9: Complex Subquery Scenarios

Combining subqueries with other advanced SQL features.
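
EXISTS is one such feature. It asks only whether the inner query returns at least one row, and the same question can usually be written with IN. The sketch below checks that both forms agree on a copy of the lab's data, using Python's built-in `sqlite3` as a stand-in for MySQL (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "IT"), (2, "HR"), (3, "Sales"), (4, "Marketing"), (5, "Finance")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Aarav", 1, 75000), ("Sneha", 2, 65000), ("Raj", 3, 80000),
     ("Priya", 1, 70000), ("Vikram", 3, 55000), ("Anjali", 4, 60000),
     ("Rohit", 1, 72000), ("Kavita", 5, 85000), ("Suresh", 3, 58000),
     ("Meera", 2, 62000)],
)

# EXISTS: correlated check, true as soon as one matching employee is found
exists_rows = conn.execute("""
    SELECT d.name
    FROM departments d
    WHERE EXISTS (
        SELECT 1 FROM employees e
        WHERE e.department_id = d.id AND e.salary > 70000
    )
    ORDER BY d.id
""").fetchall()

# IN: uncorrelated list of department ids that have such an employee
in_rows = conn.execute("""
    SELECT d.name
    FROM departments d
    WHERE d.id IN (
        SELECT e.department_id FROM employees e WHERE e.salary > 70000
    )
    ORDER BY d.id
""").fetchall()

print(exists_rows)             # [('IT',), ('Sales',), ('Finance',)]
print(exists_rows == in_rows)  # True
```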

In [None]:
# Subquery with EXISTS
execute_query("""
SELECT d.name as department
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.id
    AND e.salary > 70000
);
""", "EXISTS: Departments with high-salary employees")

In [None]:
# Nested subqueries
execute_query("""
SELECT name, salary
FROM employees
WHERE department_id = (
    SELECT id
    FROM departments
    WHERE budget = (
        SELECT MAX(budget)
        FROM departments
    )
);
""", "Nested subqueries: Employees in highest-budget department")

**Exercise 6**: Create a department performance dashboard using subqueries and GROUP BY.

In [None]:
# Exercise 6 Solution
execute_query("""
-- Write your complex query here
SELECT
    d.name as department,
    COUNT(e.id) as employees,
    ROUND(AVG(e.salary), 2) as avg_salary,
    (
        SELECT COUNT(*)
        FROM projects p
        WHERE p.department_id = d.id
    ) as project_count,
    (
        SELECT SUM(budget)
        FROM projects p
        WHERE p.department_id = d.id
    ) as total_project_budget
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
ORDER BY total_project_budget DESC;
""", "Exercise 6: Department performance dashboard")

## Step 10: Performance Considerations

This step compares a subquery with an equivalent JOIN so you can check with EXPLAIN which form the optimizer handles better.

In [None]:
# Check execution plan
execute_query("""
EXPLAIN SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary) FROM employees
);
""", "EXPLAIN: Check subquery execution plan")

In [None]:
# Alternative using a derived table and CROSS JOIN (not necessarily faster; compare the plans with EXPLAIN)
execute_query("""
SELECT e.name, e.salary
FROM employees e
CROSS JOIN (SELECT AVG(salary) as avg_salary FROM employees) avg_table
WHERE e.salary > avg_table.avg_salary;
""", "Alternative: Using CROSS JOIN instead of subquery")

## Step 11: Final Project

**Business Intelligence Dashboard**: Create a comprehensive company analytics report using:
- Subqueries for complex filtering
- GROUP BY and ROLLUP for summaries
- Multiple aggregation levels
- Performance-optimized queries

Generate reports for:
- Department performance metrics
- Employee salary analysis
- Project budget utilization
- Hiring trends and patterns

In [None]:
# Final Project: Comprehensive Business Intelligence Dashboard
execute_query("""
-- Department Performance Summary
-- Both branches are parenthesized so that ORDER BY and LIMIT apply only to the second SELECT
(
SELECT
    'Department Performance' as report_section,
    d.name as department,
    COUNT(DISTINCT e.id) as employee_count,
    ROUND(AVG(e.salary), 2) as avg_salary,
    ROUND(d.budget / COUNT(DISTINCT e.id), 2) as budget_per_employee,
    (
        SELECT COUNT(*) FROM projects p WHERE p.department_id = d.id
    ) as active_projects
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name, d.budget
)

UNION ALL

-- Top Performers by Salary (output column names come from the first SELECT)
(
SELECT
    'Top Salary Earners' as report_section,
    e.name as employee,
    d.name as department,
    e.salary as salary,
    ROUND(
        (e.salary - (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id)) /
        (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) * 100, 2
    ) as percent_above_dept_avg,
    NULL as active_projects
FROM employees e
JOIN departments d ON e.department_id = d.id
ORDER BY e.salary DESC
LIMIT 5
);
""", "Final Project: Business Intelligence Dashboard")

## Summary

Congratulations! You've completed Lab 11: Subqueries and Advanced Querying.

### What You Learned:
- **Single-row subqueries**: Return one value (=, <, >)
- **Multiple-row subqueries**: Return multiple values (IN, ANY, ALL)
- **Correlated subqueries**: Reference outer query columns
- **Derived tables**: Subqueries in FROM clause
- **GROUP BY**: Group rows that share the same values in the listed columns for aggregation
- **HAVING**: Filter groups after aggregation
- **ROLLUP**: Create subtotals and grand totals
- **EXISTS/NOT EXISTS**: Check for record existence

### Key Takeaways:
1. **WHERE** filters rows before grouping, **HAVING** filters groups after
2. Correlated subqueries are conceptually evaluated once per outer row (consider performance)
3. Use appropriate subquery type for your data requirements
4. ROLLUP creates hierarchical summary reports
5. EXISTS can be faster than IN for existence checks, depending on the optimizer and the data; compare plans with EXPLAIN

### Next Steps:
- Practice with larger datasets
- Learn Common Table Expressions (CTEs)
- Explore window functions
- Study query optimization
- Consider stored procedures for complex logic

---

**Lab 11 Complete!** 🎉

In [None]:
# Close the database connection
if 'connection' in locals() and connection.is_connected():
    cursor.close()
    connection.close()
    print("✅ Database connection closed.")

print("\n🏁 Lab 11: Subqueries and Advanced Querying - COMPLETED!")
print("Great work mastering advanced SQL querying techniques!")