# Advanced Querying Lab

## Lab Objectives
By the end of this lab, you will be able to:
- Write complex SELECT queries with multiple filtering conditions
- Use advanced filtering techniques (BETWEEN, LIKE, IN, EXISTS)
- Implement sorting and pagination
- Handle NULL values effectively
- Perform data modification operations safely
- Modify table structures with ALTER TABLE

## Prerequisites
- Basic SQL knowledge (SELECT, FROM, WHERE)
- Understanding of MySQL data types
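- MySQL Server 8.0.16 or later installed and running (Step 2 uses an expression default, which needs 8.0.13+, and Step 9 relies on CHECK constraints, which MySQL enforces from 8.0.16)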

## Lab Duration
Approximately 90 minutes

## Materials Needed
- MySQL Server
- Python environment with mysql-connector-python
- This Jupyter notebook

## Advanced Querying Concepts

### Filtering Techniques
- **BETWEEN**: Range-based filtering for dates and numbers (both endpoints are included)
- **LIKE**: Pattern matching with wildcards (% and _)
- **IN**: Match against multiple values
- **EXISTS**: Check for existence in subqueries
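BETWEEN and IN are shorthand for longer conditions: `BETWEEN low AND high` means `>= low AND <= high` (both endpoints included), and `IN (a, b)` means `= a OR = b`. The sketch below checks both equivalences. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL so it runs without a server (an assumption; both engines agree on these operators for plain integers), and the `salaries` table is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); hypothetical salaries table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE salaries (id INTEGER PRIMARY KEY, amount INTEGER)')
conn.executemany('INSERT INTO salaries (amount) VALUES (?)',
                 [(59999,), (60000,), (70000,), (80000,), (80001,)])

# BETWEEN is inclusive: shorthand for >= low AND <= high
between = conn.execute(
    'SELECT amount FROM salaries WHERE amount BETWEEN 60000 AND 80000 ORDER BY amount'
).fetchall()
explicit = conn.execute(
    'SELECT amount FROM salaries WHERE amount >= 60000 AND amount <= 80000 ORDER BY amount'
).fetchall()

# IN is shorthand for equality tests joined by OR
in_list = conn.execute(
    'SELECT amount FROM salaries WHERE amount IN (60000, 80000) ORDER BY amount'
).fetchall()
or_chain = conn.execute(
    'SELECT amount FROM salaries WHERE amount = 60000 OR amount = 80000 ORDER BY amount'
).fetchall()

print(between)  # [(60000,), (70000,), (80000,)]
print(in_list)  # [(60000,), (80000,)]
```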

### Logical Operators
- **AND**: All conditions must be true
- **OR**: At least one condition must be true
- **NOT**: Negate a condition
- **Parentheses**: Control evaluation order (without them, AND is evaluated before OR)
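Because AND is evaluated before OR, leaving out the parentheses in a mixed condition silently changes which rows match. This sketch uses an in-memory SQLite database as a stand-in for MySQL (an assumption, so it runs without a server) and a hypothetical `staff` table modeled on the lab's employees.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); hypothetical staff rows
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE staff (name TEXT, job_title TEXT, salary INTEGER)')
conn.executemany('INSERT INTO staff VALUES (?, ?, ?)', [
    ('John', 'Senior Engineer', 60000),
    ('Frank', 'Senior Developer', 85000),
    ('Bob', 'Marketing Specialist', 90000),
])

# Without parentheses AND binds first:
#   Engineer OR (Developer AND salary >= 70000)
no_parens = conn.execute('''
    SELECT name FROM staff
    WHERE job_title LIKE '%Engineer%' OR job_title LIKE '%Developer%'
      AND salary >= 70000
    ORDER BY name
''').fetchall()

# With parentheses the salary filter applies to both job titles
with_parens = conn.execute('''
    SELECT name FROM staff
    WHERE (job_title LIKE '%Engineer%' OR job_title LIKE '%Developer%')
      AND salary >= 70000
    ORDER BY name
''').fetchall()

print(no_parens)    # [('Frank',), ('John',)]
print(with_parens)  # [('Frank',)]
```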

### NULL Handling
- **IS NULL**: Check for NULL values
- **IS NOT NULL**: Check for non-NULL values
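- Comparisons such as = NULL or <> NULL evaluate to NULL (unknown), not TRUE or FALSE, and WHERE keeps only rows whose condition is TRUE; use IS NULL / IS NOT NULL instead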
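The sketch below shows why `= NULL` never matches: the comparison itself evaluates to NULL, and WHERE drops any row whose condition is not true. It assumes an in-memory SQLite database as a stand-in for MySQL (MySQL returns the same NULL and 1 for the first query), with a hypothetical `contacts` table.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); hypothetical contacts table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE contacts (name TEXT, phone TEXT)')
conn.executemany('INSERT INTO contacts VALUES (?, ?)',
                 [('Frank', None), ('Grace', '555-0109')])

# "= NULL" evaluates to NULL (None in Python); IS NULL evaluates to true (1)
null_eq, null_is = conn.execute('SELECT NULL = NULL, NULL IS NULL').fetchone()
print(null_eq, null_is)  # None 1

# WHERE keeps only rows whose condition is true, so "= NULL" matches nothing
eq_rows = conn.execute('SELECT name FROM contacts WHERE phone = NULL').fetchall()
is_rows = conn.execute('SELECT name FROM contacts WHERE phone IS NULL').fetchall()
print(eq_rows, is_rows)  # [] [('Frank',)]
```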

### Sorting and Pagination
- **ORDER BY**: Sort by one or more columns
- **LIMIT**: Restrict number of rows
- **OFFSET**: Skip rows before returning
- **ASC/DESC**: Sort direction

## Step-by-Step Guide

First, install the required Python package:

```python
%pip install mysql-connector-python
```

## Step 1: Connect to MySQL and Set Up Database

Connect to MySQL and create the practice database with sample data.

```python
import mysql.connector

conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='your_password'  # replace with your own MySQL password
)
cursor = conn.cursor()

# Create database
cursor.execute('CREATE DATABASE IF NOT EXISTS advanced_queries_lab')
cursor.execute('USE advanced_queries_lab')
print('Database ready for advanced querying practice')
```
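Hard-coding the password in the notebook works for a local lab but is easy to leak when the notebook is shared. One alternative, sketched below, reads the connection settings from environment variables. The variable names `LAB_DB_HOST`, `LAB_DB_USER`, and `LAB_DB_PASSWORD` and the helper `mysql_connection_kwargs` are hypothetical choices for this example.

```python
import os


def mysql_connection_kwargs(env=None):
    """Build keyword arguments for mysql.connector.connect() from environment
    variables. LAB_DB_HOST, LAB_DB_USER and LAB_DB_PASSWORD are hypothetical
    names chosen for this sketch."""
    env = os.environ if env is None else env
    password = env.get('LAB_DB_PASSWORD')
    if password is None:
        raise RuntimeError('Set LAB_DB_PASSWORD before connecting')
    return {
        'host': env.get('LAB_DB_HOST', 'localhost'),
        'user': env.get('LAB_DB_USER', 'root'),
        'password': password,
    }


# Usage in the notebook (needs a running MySQL server):
# conn = mysql.connector.connect(**mysql_connection_kwargs())
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.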

## Step 2: Create Sample Tables and Data

Create tables with diverse data for practicing advanced queries.

```python
# Create departments table
# The expression default DEFAULT (CURRENT_DATE) requires MySQL 8.0.13 or later
cursor.execute('''
CREATE TABLE IF NOT EXISTS departments (
    department_id INT PRIMARY KEY AUTO_INCREMENT,
    department_name VARCHAR(50) NOT NULL UNIQUE,
    location VARCHAR(50),
    budget DECIMAL(12, 2),
    created_date DATE DEFAULT (CURRENT_DATE)
)
''')

# Create employees table
# manager_id points back at employees itself (a self-referencing foreign key)
cursor.execute('''
CREATE TABLE IF NOT EXISTS employees (
    employee_id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    phone VARCHAR(20),
    hire_date DATE NOT NULL,
    salary DECIMAL(10, 2),
    department_id INT,
    manager_id INT,
    job_title VARCHAR(50),
    FOREIGN KEY (department_id) REFERENCES departments(department_id),
    FOREIGN KEY (manager_id) REFERENCES employees(employee_id)
)
''')

print('Tables created successfully')
```

```python
# Insert sample data
departments_data = [
    ('Engineering', 'Building A', 500000.00),
    ('Marketing', 'Building B', 300000.00),
    ('Sales', 'Building C', 400000.00),
    ('HR', 'Building A', 200000.00),
    ('Finance', 'Building B', 350000.00)
]

cursor.executemany('''
INSERT IGNORE INTO departments (department_name, location, budget)
VALUES (%s, %s, %s)
''', departments_data)
dept_count = cursor.rowcount  # capture now: the next statement overwrites rowcount

employees_data = [
    ('John', 'Doe', 'john.doe@company.com', '555-0101', '2020-01-15', 75000.00, 1, None, 'Senior Engineer'),
    ('Jane', 'Smith', 'jane.smith@company.com', '555-0102', '2019-03-20', 80000.00, 1, None, 'Engineering Manager'),
    ('Bob', 'Johnson', 'bob.johnson@company.com', '555-0103', '2021-06-10', 65000.00, 2, None, 'Marketing Specialist'),
    ('Alice', 'Williams', 'alice.williams@company.com', '555-0104', '2018-11-05', 90000.00, 3, None, 'Sales Director'),
    ('Charlie', 'Brown', 'charlie.brown@company.com', '555-0105', '2022-01-20', 55000.00, 4, None, 'HR Coordinator'),
    ('Diana', 'Davis', 'diana.davis@company.com', '555-0106', '2020-09-15', 70000.00, 5, None, 'Financial Analyst'),
    ('Eve', 'Miller', 'eve.miller@company.com', '555-0107', '2021-12-01', 60000.00, 2, None, 'Marketing Coordinator'),
    ('Frank', 'Wilson', 'frank.wilson@company.com', None, '2019-07-30', 85000.00, 1, None, 'Senior Developer'),
    ('Grace', 'Moore', 'grace.moore@company.com', '555-0109', '2020-04-25', 72000.00, 3, None, 'Sales Representative'),
    ('Henry', 'Taylor', 'henry.taylor@company.com', '555-0110', '2022-03-15', 58000.00, 4, None, 'HR Assistant')
]

cursor.executemany('''
INSERT IGNORE INTO employees 
(first_name, last_name, email, phone, hire_date, salary, department_id, manager_id, job_title)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
''', employees_data)
emp_count = cursor.rowcount

# Update manager relationships (assumes a fresh table, so Jane is id 2 and Alice is id 4)
cursor.execute('UPDATE employees SET manager_id = 2 WHERE employee_id IN (1, 8)')  # Engineering team reports to Jane
cursor.execute('UPDATE employees SET manager_id = 4 WHERE employee_id IN (3, 7, 9)')  # Marketing/Sales staff report to Alice

conn.commit()
print(f'Inserted {dept_count} departments and {emp_count} employees')
```
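The manager UPDATEs above hard-code `employee_id` values 2 and 4, which only holds on a freshly created table. A sketch that looks the manager's id up by email first is shown below. It runs a separate SELECT because MySQL does not allow an UPDATE to select from the same table in a subquery. The helper name `assign_manager` is hypothetical, and the example uses an in-memory SQLite database as a stand-in for MySQL, so its placeholders are `?` where mysql-connector-python uses `%s`.

```python
import sqlite3


def assign_manager(conn, manager_email, report_emails):
    """Hypothetical helper: point each report's manager_id at the manager found
    by email instead of a hard-coded id. Returns the number of rows updated."""
    if not report_emails:
        return 0
    row = conn.execute('SELECT employee_id FROM employees WHERE email = ?',
                       (manager_email,)).fetchone()
    if row is None:
        raise ValueError(f'No employee with email {manager_email}')
    placeholders = ', '.join('?' for _ in report_emails)
    cur = conn.execute(
        f'UPDATE employees SET manager_id = ? WHERE email IN ({placeholders})',
        (row[0], *report_emails))
    return cur.rowcount


# In-memory SQLite stand-in for MySQL (assumption); trimmed-down employees table
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE,
    manager_id INTEGER REFERENCES employees(employee_id))''')
conn.executemany('INSERT INTO employees (email) VALUES (?)', [
    ('john.doe@company.com',),
    ('jane.smith@company.com',),
    ('frank.wilson@company.com',),
])

updated = assign_manager(conn, 'jane.smith@company.com',
                         ['john.doe@company.com', 'frank.wilson@company.com'])
conn.commit()
rows = conn.execute(
    'SELECT email, manager_id FROM employees ORDER BY employee_id').fetchall()
print(updated)  # 2
print(rows)
```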

## Step 3: Practice Basic Advanced Queries

Start with complex WHERE clauses combining multiple conditions.

```python
# Query 1: Complex WHERE with multiple conditions
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    salary,
    hire_date,
    department_id
FROM employees
WHERE salary BETWEEN 60000 AND 80000
  AND hire_date >= '2020-01-01'
  AND department_id IN (1, 2, 3)
ORDER BY salary DESC
''')

results = cursor.fetchall()
print('Employees with salary $60K-$80K, hired on or after 2020-01-01, in departments 1-3:')
print('-' * 70)
for row in results:
    print(f'{row[0]:<3} | {row[1]:<20} | ${row[2]:>8,.0f} | {row[3]} | Dept {row[4]}')
```

```python
# Query 2: Pattern matching with LIKE
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    email,
    job_title
FROM employees
WHERE first_name LIKE 'J%'
   OR last_name LIKE '%son'
   OR job_title LIKE '%Manager%'
ORDER BY first_name
''')

results = cursor.fetchall()
print("\nEmployees whose first name starts with J, last name ends with 'son', or title contains 'Manager':")
print('-' * 80)
for row in results:
    print(f'{row[0]:<3} | {row[1]:<20} | {row[2]:<25} | {row[3]}')
```
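In LIKE patterns, `%` matches any run of characters (including none) and `_` matches exactly one. To match a literal `%` or `_`, escape it; the sketch below names the escape character with an ESCAPE clause, which both MySQL and SQLite accept. It runs against an in-memory SQLite database as a stand-in for MySQL (an assumption). Case sensitivity depends on the engine and collation, so the examples avoid relying on it.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); no table needed
conn = sqlite3.connect(':memory:')

checks = conn.execute('''
    SELECT
        '555-0101' LIKE '555-01__',             -- _ matches exactly one character
        '555-010' LIKE '555-01__',              -- too short: each _ needs a character
        'Johnson' LIKE '%son',                  -- % matches any run of characters
        '50% off' LIKE '50!% off' ESCAPE '!',   -- !% is a literal percent sign
        '500 off' LIKE '50!% off' ESCAPE '!'
''').fetchone()
print(checks)  # (1, 0, 1, 1, 0)
```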

## Step 4: NULL Value Handling

Practice handling NULL values in queries.

```python
# Query 3: NULL value handling
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    phone,
    manager_id,
    CASE 
        WHEN phone IS NULL THEN 'No phone'
        ELSE 'Has phone'
    END AS phone_status,
    CASE 
        WHEN manager_id IS NULL THEN 'No manager'
        ELSE 'Has manager'
    END AS manager_status
FROM employees
WHERE phone IS NULL 
   OR manager_id IS NULL
ORDER BY employee_id
''')

results = cursor.fetchall()
print('Employees with a missing phone or no assigned manager (NULL values):')
print('-' * 85)
for row in results:
    # NULL columns arrive in Python as None, so test for None explicitly
    phone_display = row[2] if row[2] is not None else 'NULL'
    manager_display = str(row[3]) if row[3] is not None else 'NULL'
    print(f'{row[0]:<3} | {row[1]:<20} | {phone_display:<12} | {manager_display:<8} | {row[4]:<9} | {row[5]}')
```
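The NULL-to-label logic in the loop above can also live in SQL: COALESCE returns its first non-NULL argument. A minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL (COALESCE behaves the same way in both) and a trimmed-down, hypothetical `employees` table:

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); trimmed-down employees table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees (full_name TEXT, phone TEXT)')
conn.executemany('INSERT INTO employees VALUES (?, ?)', [
    ('Frank Wilson', None),
    ('Charlie Brown', '555-0105'),
])

# COALESCE returns its first non-NULL argument, so missing phones get a label in SQL
rows = conn.execute('''
    SELECT full_name, COALESCE(phone, 'No phone') AS phone_display
    FROM employees
    ORDER BY full_name
''').fetchall()
for row in rows:
    print(row)
# ('Charlie Brown', '555-0105')
# ('Frank Wilson', 'No phone')
```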

## Step 5: Advanced Filtering Techniques

Practice the BETWEEN and IN operators (EXISTS is covered in Step 10).

```python
# Query 4: BETWEEN for ranges
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    salary,
    hire_date
FROM employees
WHERE salary BETWEEN 65000 AND 85000
  AND hire_date BETWEEN '2019-01-01' AND '2021-12-31'
ORDER BY salary DESC
''')

results = cursor.fetchall()
print('Employees with salary $65K-$85K (inclusive), hired 2019 through 2021:')
print('-' * 65)
for row in results:
    print(f'{row[0]:<3} | {row[1]:<20} | ${row[2]:>8,.0f} | {row[3]}')
```
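BETWEEN works cleanly here because `hire_date` is a DATE. On a DATETIME column, a bare end date such as '2021-12-31' is compared as midnight, so rows later that day are missed; a half-open range avoids this. The sketch below uses an in-memory SQLite database as a stand-in for MySQL. SQLite stores these values as text, so the upper bound is spelled out as '2021-12-31 00:00:00' to mirror how MySQL interprets the bare date; the `logins` table is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); hypothetical logins table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE logins (username TEXT, login_at TEXT)')
conn.executemany('INSERT INTO logins VALUES (?, ?)', [
    ('alice', '2021-12-31 00:00:00'),
    ('bob', '2021-12-31 14:30:00'),  # later on the last day of the range
])

# BETWEEN with an end bound of midnight (how MySQL reads a bare '2021-12-31'
# against a DATETIME) misses Bob's afternoon login
between = conn.execute('''
    SELECT username FROM logins
    WHERE login_at BETWEEN '2021-01-01 00:00:00' AND '2021-12-31 00:00:00'
    ORDER BY username
''').fetchall()

# A half-open range (>= first day AND < the day after the last) covers the whole last day
half_open = conn.execute('''
    SELECT username FROM logins
    WHERE login_at >= '2021-01-01' AND login_at < '2022-01-01'
    ORDER BY username
''').fetchall()

print(between)    # [('alice',)]
print(half_open)  # [('alice',), ('bob',)]
```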

```python
# Query 5: IN operator
cursor.execute('''
SELECT 
    d.department_name,
    COUNT(e.employee_id) as employee_count,
    AVG(e.salary) as avg_salary
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
WHERE d.location IN ('Building A', 'Building B')
GROUP BY d.department_id, d.department_name
ORDER BY employee_count DESC
''')

results = cursor.fetchall()
print('\nDepartments in Building A or B:')
print('-' * 50)
for row in results:
    # AVG() returns NULL (None) for a department with no employees
    avg_salary = f'${row[2]:,.0f}' if row[2] is not None else 'N/A'
    print(f'{row[0]:<15} | {row[1]:<5} employees | {avg_salary}')
```

## Step 6: Complex Logical Expressions

Practice combining AND, OR, and NOT with proper parentheses.

```python
# Query 6: Complex AND/OR logic
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    job_title,
    salary,
    hire_date
FROM employees
WHERE (job_title LIKE '%Engineer%' OR job_title LIKE '%Developer%')
  AND salary >= 70000
  AND (hire_date <= '2020-12-31' OR manager_id IS NOT NULL)
ORDER BY salary DESC
''')

results = cursor.fetchall()
print('Engineers/Developers with salary >= $70K, hired before 2021 or have manager:')
print('-' * 85)
for row in results:
    print(f'{row[0]:<3} | {row[1]:<20} | {row[2]:<20} | ${row[3]:>8,.0f} | {row[4]}')
```

## Step 7: Sorting and Pagination

Practice ORDER BY, LIMIT, and OFFSET.

```python
# Query 7: Multi-column sorting
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    department_id,
    salary,
    hire_date
FROM employees
ORDER BY department_id ASC, salary DESC, hire_date ASC
''')

results = cursor.fetchall()
print('Employees sorted by department (ASC), then salary (DESC), then hire date (ASC):')
print('-' * 80)
for row in results:
    print(f'{row[0]:<3} | {row[1]:<20} | Dept {row[2]:<2} | ${row[3]:>8,.0f} | {row[4]}')
```

```python
# Query 8: Pagination with LIMIT and OFFSET
page_size = 3
for page in range(1, 4):  # Show the first 3 pages
    offset = (page - 1) * page_size
    # LIMIT/OFFSET are passed as query parameters instead of formatted into the SQL.
    # employee_id breaks salary ties so rows cannot shift between pages.
    cursor.execute('''
    SELECT 
        employee_id,
        CONCAT(first_name, ' ', last_name) AS full_name,
        salary
    FROM employees
    ORDER BY salary DESC, employee_id
    LIMIT %s OFFSET %s
    ''', (page_size, offset))
    
    results = cursor.fetchall()
    print(f'\nPage {page} (Top salaries):')
    print('-' * 35)
    for row in results:
        print(f'{row[0]:<3} | {row[1]:<20} | ${row[2]:>8,.0f}')
```
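The pagination loop above can be wrapped in a generator that keeps fetching until a page comes back empty, rather than hard-coding the number of pages. A minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL (its placeholders are `?` rather than mysql-connector-python's `%s`); `fetch_pages` is a hypothetical name:

```python
import sqlite3


def fetch_pages(conn, page_size):
    """Hypothetical helper: yield successive pages of (employee_id, salary),
    highest salary first. employee_id breaks ties so no row is skipped or
    repeated between pages."""
    offset = 0
    while True:
        page = conn.execute(
            'SELECT employee_id, salary FROM employees '
            'ORDER BY salary DESC, employee_id '
            'LIMIT ? OFFSET ?',
            (page_size, offset)).fetchall()
        if not page:
            break
        yield page
        offset += page_size


# In-memory SQLite stand-in for MySQL (assumption); note the tied 80000 salaries
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)')
conn.executemany('INSERT INTO employees (salary) VALUES (?)',
                 [(75000,), (80000,), (65000,), (80000,), (55000,)])

pages = list(fetch_pages(conn, page_size=2))
for number, page in enumerate(pages, start=1):
    print(number, page)
```

The final query returns no rows; that empty page is the stopping signal.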

## Step 8: Data Modification Operations

Practice UPDATE operations safely: preview the affected rows first, then update and commit.

```python
# First, let's see current data before updates
cursor.execute('''
SELECT employee_id, CONCAT(first_name, ' ', last_name) AS full_name, salary, phone
FROM employees 
WHERE phone IS NULL OR salary < 60000
ORDER BY employee_id
''')

results = cursor.fetchall()
print('Employees needing updates (NULL phone or low salary):')
print('-' * 60)
for row in results:
    phone_display = row[3] if row[3] else 'NULL'
    print(f'{row[0]:<3} | {row[1]:<20} | ${row[2]:>8,.0f} | {phone_display}')
```

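```python
# Snapshot the ids shown in the preview so the changes can be verified afterwards
cursor.execute('SELECT employee_id FROM employees WHERE phone IS NULL OR salary < 60000')
target_ids = [row[0] for row in cursor.fetchall()]

# UPDATE 1: give employees without a phone a placeholder number
cursor.execute("UPDATE employees SET phone = '555-0000' WHERE phone IS NULL")
print(f'Added a placeholder phone for {cursor.rowcount} employee(s)')

# UPDATE 2: give a 5% raise to employees earning under $60K
cursor.execute('UPDATE employees SET salary = salary * 1.05 WHERE salary < 60000')
print(f'Gave a 5% raise to {cursor.rowcount} employee(s)')

# Verify the updates on the previewed rows
if target_ids:
    placeholders = ', '.join(['%s'] * len(target_ids))
    cursor.execute(f'''
    SELECT employee_id, CONCAT(first_name, ' ', last_name) AS full_name, salary, phone
    FROM employees
    WHERE employee_id IN ({placeholders})
    ORDER BY employee_id
    ''', target_ids)

    results = cursor.fetchall()
    print('\nUpdated employees:')
    print('-' * 50)
    for row in results:
        print(f'{row[0]:<3} | {row[1]:<20} | ${row[2]:>8,.0f} | {row[3]}')

conn.commit()  # Make the changes permanent
```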
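A common safety net for UPDATE and DELETE is to run the statement inside a transaction, compare the affected-row count with what a preview SELECT predicted, and roll back on a mismatch. The sketch below applies this to a DELETE. It uses an in-memory SQLite database as a stand-in for MySQL, and `guarded_execute` is a hypothetical helper. mysql-connector-python also leaves autocommit off by default, so `conn.commit()` and `conn.rollback()` play the same roles there; note, though, that MySQL's default row count for UPDATE covers only rows whose values actually changed, which a guard on UPDATE would need to account for.

```python
import sqlite3


def guarded_execute(conn, sql, params, expected_rows):
    """Hypothetical helper: run an UPDATE/DELETE and commit only if it touched
    exactly expected_rows rows; otherwise roll the transaction back."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()
        return False
    conn.commit()
    return True


# In-memory SQLite stand-in for MySQL (assumption); hypothetical rows
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees (name TEXT, salary INTEGER)')
conn.executemany('INSERT INTO employees VALUES (?, ?)',
                 [('Charlie', 55000), ('Henry', 58000), ('Alice', 90000)])
conn.commit()

# Preview with the same WHERE clause the DELETE will use (a constant, not user input)
where, params = 'salary < ?', (60000,)
expected = conn.execute(
    f'SELECT COUNT(*) FROM employees WHERE {where}', params).fetchone()[0]

# A mistyped threshold deletes more rows than previewed, so it is rolled back
ok_bad = guarded_execute(conn, 'DELETE FROM employees WHERE salary < ?', (100000,), expected)
ok_good = guarded_execute(conn, f'DELETE FROM employees WHERE {where}', params, expected)
remaining = conn.execute('SELECT name FROM employees').fetchall()
print(expected, ok_bad, ok_good, remaining)  # 2 False True [('Alice',)]
```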

## Step 9: Table Structure Modification

Practice ALTER TABLE operations.

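```python
# ALTER TABLE: Add performance rating column
# CHECK constraints are enforced from MySQL 8.0.16; older versions parse and ignore them.
# Re-running this cell fails with a duplicate-column error because the column already exists.
cursor.execute('''
ALTER TABLE employees 
ADD COLUMN performance_rating INT DEFAULT 3 
CHECK (performance_rating >= 1 AND performance_rating <= 5)
''')

print('Added performance_rating column to employees table')

# Set performance ratings from salary bands
cursor.execute('''
UPDATE employees 
SET performance_rating = CASE 
    WHEN salary > 80000 THEN 5
    WHEN salary > 70000 THEN 4
    WHEN salary > 60000 THEN 3
    ELSE 2
END
''')

conn.commit()
# MySQL counts rows actually changed, so rows already holding 3 may not be included
print(f'Updated performance ratings for {cursor.rowcount} employees')
```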
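The CHECK constraint added above is meant to reject out-of-range ratings. The sketch below shows the same kind of constraint turning an invalid insert into an error. It assumes an in-memory SQLite database as a stand-in for MySQL, with the constraint declared in CREATE TABLE; MySQL enforces CHECK constraints from 8.0.16 and raises its own error type. The `ratings` table is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); hypothetical ratings table
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE ratings (
    employee_id INTEGER PRIMARY KEY,
    performance_rating INTEGER DEFAULT 3
        CHECK (performance_rating >= 1 AND performance_rating <= 5))''')

conn.execute('INSERT INTO ratings (employee_id) VALUES (1)')  # falls back to DEFAULT 3

try:
    conn.execute('INSERT INTO ratings VALUES (2, 6)')  # 6 violates the CHECK
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print('Rejected:', exc)

rows = conn.execute('SELECT employee_id, performance_rating FROM ratings').fetchall()
print(rows)  # [(1, 3)]
```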

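```python
# Backfill any remaining NULL phones first: under strict SQL mode (MySQL's default),
# changing a column to NOT NULL can fail while it still contains NULLs
cursor.execute("UPDATE employees SET phone = 'Not provided' WHERE phone IS NULL")
conn.commit()

# ALTER TABLE: Modify phone column to be NOT NULL
cursor.execute('''
ALTER TABLE employees 
MODIFY COLUMN phone VARCHAR(25) NOT NULL DEFAULT 'Not provided'
''')

print('Modified phone column to be NOT NULL with default value')

# Verify the table structure
cursor.execute('DESCRIBE employees')
columns = cursor.fetchall()
print('\nEmployees table structure:')
print('-' * 60)
for col in columns:
    # DESCRIBE returns (Field, Type, Null, Key, Default, Extra)
    nullable = col[2]  # already 'YES' or 'NO'
    default = col[4] if col[4] is not None else 'NULL'
    print(f'{col[0]:<20} | {col[1]:<15} | {nullable:<3} | {default}')
```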

## Step 10: Advanced Query Patterns

Practice complex queries with subqueries and aggregations.

```python
# Query 9: Subquery with IN
cursor.execute('''
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    department_id,
    salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location = 'Building A'
)
ORDER BY salary DESC
''')

results = cursor.fetchall()
print('Employees in Building A departments:')
print('-' * 55)
for row in results:
    print(f'{row[0]:<3} | {row[1]:<20} | Dept {row[2]:<2} | ${row[3]:>8,.0f}')
```

```python
# Query 10: EXISTS subquery
cursor.execute('''
SELECT 
    d.department_name,
    d.location,
    COUNT(e.employee_id) as employee_count
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
WHERE EXISTS (
    SELECT 1 
    FROM employees e2 
    WHERE e2.department_id = d.department_id 
    AND e2.salary > 70000
)
GROUP BY d.department_id, d.department_name, d.location
ORDER BY employee_count DESC
''')

results = cursor.fetchall()
print('\nDepartments with at least one employee earning > $70K:')
print('-' * 55)
for row in results:
    print(f'{row[0]:<15} | {row[1]:<12} | {row[2]:<5} employees')
```
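A related pitfall: NOT IN against a subquery that can return NULL filters out every row, while NOT EXISTS does not. This matters for this schema because `employees.department_id` is nullable. A sketch, assuming an in-memory SQLite database as a stand-in for MySQL (both follow standard SQL NULL semantics here) and hypothetical rows:

```python
import sqlite3

# In-memory SQLite stand-in for MySQL (assumption); hypothetical rows
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER)')
conn.executemany('INSERT INTO departments VALUES (?, ?)',
                 [(1, 'Engineering'), (2, 'Legal')])
# One employee has no department (NULL department_id)
conn.executemany('INSERT INTO employees (department_id) VALUES (?)', [(1,), (None,)])

# 2 NOT IN (1, NULL) is NULL (unknown), so Legal is dropped as well
not_in = conn.execute('''
    SELECT name FROM departments
    WHERE department_id NOT IN (SELECT department_id FROM employees)
''').fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do not interfere
not_exists = conn.execute('''
    SELECT name FROM departments d
    WHERE NOT EXISTS (
        SELECT 1 FROM employees e WHERE e.department_id = d.department_id)
''').fetchall()

print(not_in, not_exists)  # [] [('Legal',)]
```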

## Step 11: Clean Up

Close the database connection.

```python
cursor.close()
conn.close()
print('Database connection closed')
```

## Lab Summary

Excellent! You have successfully completed the Advanced Querying Lab. In this lab, you learned how to:

1. **Write Complex Queries**: Multi-condition WHERE clauses with AND/OR logic
2. **Advanced Filtering**: BETWEEN, LIKE, IN, and EXISTS operators
3. **NULL Handling**: Proper NULL value checking and handling
4. **Sorting & Pagination**: ORDER BY, LIMIT, and OFFSET for result control
5. **Data Modification**: Safe UPDATE operations with a preview step and an explicit commit
6. **Table Alteration**: ALTER TABLE for schema modifications
7. **Subqueries**: IN subqueries and correlated EXISTS subqueries

## Key Concepts Learned
- **BETWEEN**: Range filtering for dates and numbers
- **LIKE**: Pattern matching with % and _ wildcards
- **IN/EXISTS**: Multiple value and existence checking
- **IS NULL**: Proper NULL value detection
- **LIMIT/OFFSET**: Result pagination
- **UPDATE/DELETE**: Data modification with WHERE clauses
- **ALTER TABLE**: Schema modification commands

## Best Practices
- Always test queries on small datasets first
- Use parentheses to control operator precedence
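- Never run UPDATE or DELETE without a WHERE clause unless you mean to change every row; preview the rows first with a SELECT that uses the same WHERE clause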
- Use LIMIT when exploring large tables
- Backup data before major modifications
- Use meaningful aliases for complex queries

## Common Query Patterns
```sql
-- Range filtering
WHERE salary BETWEEN 50000 AND 100000

-- Pattern matching
WHERE name LIKE 'John%'

-- NULL checking
WHERE email IS NOT NULL

-- Pagination
ORDER BY created_date DESC LIMIT 10 OFFSET 20

-- Safe updates
UPDATE users SET status = 'active' WHERE last_login > '2023-01-01'
```

## Advanced Topics to Explore
- Window functions (ROW_NUMBER, RANK, LAG/LEAD)
- Common Table Expressions (CTEs)
- Recursive queries
- Query optimization and indexing
- Stored procedures and functions
- Triggers for automatic data management

## Performance Tips
- Use indexes on frequently filtered columns
- Avoid SELECT * in production
- Use LIMIT for large result sets
- Inspect query execution plans with EXPLAIN
- Batch large updates/deletes

## Next Steps
- Learn JOIN operations for multi-table queries
- Study database design and normalization
- Explore stored procedures and triggers
- Practice with real-world datasets
- Learn about query optimization

## Challenge Exercises

### Challenge 1: Complex Employee Report
Create a query that shows:
- Employees hired in the last 2 years
- With salaries above department average
- Who have phone numbers
- Sorted by department and salary

### Challenge 2: Department Analysis
Write queries to:
- Find departments with no employees
- Calculate department salary statistics
- Identify departments needing budget increases
- Compare department performance

### Challenge 3: Data Maintenance
Create safe scripts to:
- Update all NULL phones to 'Not available'
- Give raises to employees based on performance
- Archive employees hired before 2019
- Detect duplicate email addresses in a table that lacks a UNIQUE constraint (the employees table already enforces one)

Remember: Advanced querying is about extracting meaningful insights from your data. The more complex your queries become, the more important it is to write clear, maintainable SQL code!