# SQL Performance Best Practices

This notebook covers essential performance tips and best practices for writing efficient SQL queries across different operations.

## Table of Contents
1. Introduction to Performance Tips
2. Performance Tips for Fetching Data
3. Performance Tips for Filtering
4. Performance Tips for Joining
5. Performance Tips for Aggregation
6. Performance Tips for Subqueries
7. Performance Tips for DDL
8. Performance Tips for Indexing


## 1. Introduction to Performance Tips

### Why Performance Matters
- **Query execution time**: Faster queries improve user experience
- **Resource utilization**: Efficient queries reduce CPU, memory, and I/O usage
- **Scalability**: Well-optimized queries handle larger datasets better
- **Cost**: In cloud environments, inefficient queries increase costs

### Key Performance Principles
1. **Minimize data movement**: Only fetch what you need
2. **Reduce I/O operations**: Fewer disk reads/writes = better performance
3. **Leverage indexes**: Proper indexing dramatically speeds up queries
4. **Optimize joins**: Choose appropriate join types and order
5. **Avoid unnecessary operations**: Eliminate redundant calculations and subqueries
6. **Use appropriate data types**: Right-sized data types reduce storage and improve speed
7. **Plan for scale**: Design queries that perform well as data grows

### Performance Analysis Tools
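- **EXPLAIN / EXPLAIN ANALYZE**: Inspect the query execution plan; where supported, EXPLAIN ANALYZE also executes the query and reports actual row counts and timings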
- **Query profilers**: Identify bottlenecks in query execution
- **Database statistics**: Monitor table sizes, index usage, and query patterns
- **Performance monitoring**: Track slow queries and resource usage
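
As a minimal sketch of reading a plan, the snippet below uses SQLite through Python's built-in `sqlite3` module. The `orders` table, its data, and the `plan` helper are made up for illustration, and other engines print their plans in a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return the descriptive steps of SQLite's query plan for `sql`."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT id, amount FROM orders WHERE customer_id = 7"

plan_before = plan(query)  # no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
plan_after = plan(query)   # same query, index now available

print(plan_before)
print(plan_after)
```

Before the index exists the plan is a full `SCAN`; afterwards it becomes a `SEARCH` that names `idx_orders_customer_id`.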


## 2. Performance Tips for Fetching Data

### Use SELECT Specific Columns
**❌ Bad:**
```sql
SELECT * FROM large_table;
```

**✅ Good:**
```sql
SELECT id, name, email FROM large_table;
```

**Why:** Fetching only needed columns reduces:
- Network transfer time
- Memory usage
- I/O operations
- Processing overhead

### Limit Result Sets
**❌ Bad:**
```sql
SELECT * FROM orders;
```

**✅ Good:**
```sql
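-- Pair the limit with ORDER BY when the same rows must come back every time (e.g. pagination)
SELECT * FROM orders ORDER BY id LIMIT 100;                  -- MySQL, PostgreSQL, SQLite
SELECT TOP 100 * FROM orders ORDER BY id;                    -- SQL Server
SELECT * FROM orders ORDER BY id FETCH FIRST 100 ROWS ONLY;  -- Standard SQL: DB2, Oracle 12c+, PostgreSQL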
```

**Why:** Limiting results prevents fetching unnecessary data, especially useful for:
- Pagination
- Preview queries
- Testing queries

### Use DISTINCT Sparingly
**❌ Bad:**
```sql
SELECT DISTINCT * FROM large_table;
```

**✅ Good:**
```sql
-- Only use DISTINCT when necessary
SELECT DISTINCT customer_id FROM orders;

-- GROUP BY returns the same rows; most optimizers plan both forms identically
SELECT customer_id FROM orders GROUP BY customer_id;
```

**Why:** DISTINCT has to sort or hash the entire result set to remove duplicates, which is expensive on large or wide results

### Avoid SELECT in Loops
**❌ Bad:**
```sql
-- In application code (pseudocode): one query per customer
-- for each customer:
--     SELECT * FROM orders WHERE customer_id = customer.id;
```

**✅ Good:**
```sql
-- Fetch all at once
SELECT * FROM orders WHERE customer_id IN (1, 2, 3, ...);
```

**Why:** One batched query avoids a network round trip and a parse/plan cycle for every customer
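
A sketch of the batched form from application code, assuming Python's built-in `sqlite3` module and a made-up `orders` table. The placeholder list is generated from the ids so the values stay bound parameters rather than being concatenated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0), (3, 7.5), (4, 99.0)],
)
customer_ids = [1, 2, 3]

# One query per customer: len(customer_ids) round trips.
per_customer = []
for cid in customer_ids:
    per_customer += conn.execute(
        "SELECT id, customer_id, amount FROM orders WHERE customer_id = ?", (cid,)
    ).fetchall()

# One batched query: one "?" placeholder per id, values bound separately.
placeholders = ", ".join("?" for _ in customer_ids)
batched = conn.execute(
    f"SELECT id, customer_id, amount FROM orders "
    f"WHERE customer_id IN ({placeholders})",
    customer_ids,
).fetchall()
```

Both approaches return the same rows; the batched one issues a single statement regardless of how many ids are requested.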

### Use Appropriate Data Types
**❌ Bad:**
```sql
CREATE TABLE users (
    id VARCHAR(255),  -- Should be INT
    age TEXT,         -- Should be INT
    created_at TEXT   -- Should be TIMESTAMP
);
```

**✅ Good:**
```sql
CREATE TABLE users (
    id INT PRIMARY KEY,
    age INT,
    created_at TIMESTAMP
);
```

**Why:** Proper data types:
- Use less storage
- Enable better indexing
- Allow faster comparisons and sorting


## 3. Performance Tips for Filtering

### Use Indexed Columns in WHERE Clauses
**❌ Bad:**
```sql
SELECT * FROM orders WHERE order_date = '2024-01-01';
-- If order_date is not indexed
```

**✅ Good:**
```sql
-- Create index first
CREATE INDEX idx_order_date ON orders(order_date);

-- Then query
SELECT * FROM orders WHERE order_date = '2024-01-01';
```

**Why:** Indexes allow the database to locate matching rows without a full table scan

### Avoid Functions on Indexed Columns
**❌ Bad:**
```sql
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
SELECT * FROM users WHERE UPPER(name) = 'JOHN';
```

**✅ Good:**
```sql
SELECT * FROM orders 
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

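SELECT * FROM users WHERE name = 'JOHN';  -- Store in consistent case, or use an expression index on UPPER(name) where supported
```

**Why:** Wrapping an indexed column in a function usually prevents a plain index on that column from being used, forcing a full table scan. Expression (function-based) indexes are the exception on engines that support them.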
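
The effect can be checked with a query plan. This sketch assumes SQLite via Python's `sqlite3`, stores dates as ISO-8601 text, and uses SQLite's `strftime` in place of `YEAR()`; the table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_order_date ON orders(order_date)")

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Function wrapped around the indexed column: every row must be visited.
wrapped = plan("SELECT * FROM orders WHERE strftime('%Y', order_date) = '2024'")

# Equivalent half-open range on the bare column: the index can be searched.
ranged = plan(
    "SELECT * FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
)

print(wrapped)
print(ranged)
```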

### Use Appropriate Comparison Operators
**❌ Bad:**
```sql
SELECT * FROM products WHERE price != 100;
SELECT * FROM orders WHERE status <> 'cancelled';
```

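**✅ Good:**
```sql
-- When the wanted values are known, name them instead of excluding the rest
-- (assumes these three are the only non-cancelled statuses)
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');
```

**Why:** 
- Equality and IN predicates map directly onto index lookups; `!=` and `<>` usually force a scan
- The rewrite must return the same rows, so it only applies when the positive list is complete
- Consider selectivity: when a negation such as `price != 100` keeps most rows, a full scan is the right plan and no rewrite helps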

### Filter Early with WHERE Before JOIN
**❌ Bad:**
```sql
SELECT o.*, c.name 
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2024-01-01';
```

**✅ Good:**
```sql
-- Filter orders first (if possible in your DBMS)
SELECT o.*, c.name 
FROM (SELECT * FROM orders WHERE order_date > '2024-01-01') o
JOIN customers c ON o.customer_id = c.id;
```

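**Why:** Reducing rows before joining shrinks the join input. Most modern optimizers push the WHERE predicate below the join on their own, so both forms often compile to the same plan; check with EXPLAIN before rewriting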

### Use IN Instead of Multiple ORs (When Appropriate)
**❌ Bad:**
```sql
SELECT * FROM orders 
WHERE status = 'pending' OR status = 'processing' OR status = 'shipped';
```

**✅ Good:**
```sql
SELECT * FROM orders 
WHERE status IN ('pending', 'processing', 'shipped');
```

**Why:** IN is shorter and easier to maintain; most query planners treat the two forms identically, and some optimize IN lists better

### Avoid LIKE with Leading Wildcards
**❌ Bad:**
```sql
SELECT * FROM products WHERE name LIKE '%shirt%';
SELECT * FROM products WHERE name LIKE '%shirt';
```

**✅ Good:**
```sql
-- If possible, use prefix search
SELECT * FROM products WHERE name LIKE 'shirt%';

-- Or use full-text search (MySQL syntax; needs a FULLTEXT index on name and matches whole words, not substrings)
SELECT * FROM products WHERE MATCH(name) AGAINST('shirt' IN NATURAL LANGUAGE MODE);
```

**Why:** A leading wildcard prevents a B-tree index from being used; a prefix search can use one


## 4. Performance Tips for Joining

### Join Order Matters
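**Strategy:** Keep intermediate results small. Cost-based optimizers usually reorder inner joins themselves, so the written order matters mainly on engines, or with hints, that fix it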

**✅ Good:**
```sql
-- Join smaller filtered tables first
SELECT o.*, c.name, p.name
FROM (SELECT * FROM orders WHERE order_date > '2024-01-01') o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id;
```

**Why:** Smaller intermediate result sets reduce join complexity

### Use Appropriate Join Types
**❌ Bad:**
```sql
-- Using LEFT JOIN when INNER JOIN is sufficient
SELECT o.*, c.name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NOT NULL;  -- Effectively an INNER JOIN
```

**✅ Good:**
```sql
SELECT o.*, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
```

**Why:** 
- An INNER JOIN gives the optimizer more freedom to reorder tables than a LEFT JOIN
- It states the intent directly instead of building NULL-extended rows and filtering them out
- Only use LEFT/RIGHT JOIN when you need unmatched rows

### Index Join Columns
**❌ Bad:**
```sql
-- Joining without indexes
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- If customer_id or c.id are not indexed
```

**✅ Good:**
```sql
-- Create indexes on join columns
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- customers.id is usually the primary key and therefore already indexed;
-- create this index only if it is not
CREATE INDEX idx_customers_id ON customers(id);

SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```

**Why:** An index on the join column enables index nested-loop joins and can supply pre-sorted input for merge joins; hash joins do not need one

### Avoid Cartesian Products
**❌ Bad:**
```sql
SELECT * FROM orders, customers;  -- Missing join condition
```

**✅ Good:**
```sql
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id;
```

**Why:** Cartesian products create massive result sets (n × m rows)

### Use EXISTS Instead of JOIN for Existence Checks
**❌ Bad:**
```sql
SELECT DISTINCT c.*
FROM customers c
JOIN orders o ON c.id = o.customer_id;
```

**✅ Good:**
```sql
SELECT c.*
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```

**Why:** EXISTS can stop at the first matching order per customer (a semi-join), so no duplicate rows are produced and no DISTINCT pass is needed
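
A small check that the two forms agree, assuming SQLite via Python's `sqlite3` and made-up `customers` and `orders` rows. The plain join shows why DISTINCT was needed in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders (customer_id) VALUES (1), (1), (2);
""")

# Plain join: one output row per matching order, so Ann appears twice.
join_rows = conn.execute("""
    SELECT c.id, c.name FROM customers c
    JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

# DISTINCT removes the duplicates after the join has produced them.
distinct_rows = conn.execute("""
    SELECT DISTINCT c.id, c.name FROM customers c
    JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

# EXISTS never produces the duplicates at all.
exists_rows = conn.execute("""
    SELECT c.id, c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()
```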

### Filter Before Joining Large Tables
**❌ Bad:**
```sql
SELECT o.*, c.*
FROM large_orders_table o
JOIN large_customers_table c ON o.customer_id = c.id
WHERE o.order_date > '2024-01-01';
```

**✅ Good:**
```sql
SELECT o.*, c.*
FROM (SELECT * FROM large_orders_table WHERE order_date > '2024-01-01') o
JOIN large_customers_table c ON o.customer_id = c.id;
```

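**Why:** Fewer rows entering the join means less memory and processing. Optimizers that push predicates down do this automatically, so confirm with EXPLAIN that the rewrite actually changes the plan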


## 5. Performance Tips for Aggregation

### Filter Before Aggregating
**❌ Bad:**
```sql
-- Row-level condition placed in HAVING: every group is built first
SELECT customer_id, COUNT(*), SUM(amount)
FROM orders
GROUP BY customer_id
HAVING customer_id >= 1000 AND COUNT(*) > 10;
```

**✅ Good:**
```sql
-- Row-level condition in WHERE, group-level condition in HAVING
SELECT customer_id, COUNT(*), SUM(amount)
FROM orders
WHERE customer_id >= 1000
GROUP BY customer_id
HAVING COUNT(*) > 10;
```

**Why:** 
- WHERE filters rows before grouping, so fewer rows are aggregated
- HAVING filters groups after aggregation, so reserve it for conditions on aggregates such as COUNT(*)
- Use WHERE for row-level filters, HAVING for group-level filters
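
As a minimal sketch of the WHERE/HAVING split, the snippet below places the same row-level condition in each clause and confirms that the resulting groups are identical. It assumes SQLite via Python's `sqlite3` and a made-up `orders` table; only the amount of work done before filtering differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5), (2, 6), (3, 7)],
)

# Row-level condition in HAVING: all customers are grouped, then discarded.
in_having = conn.execute("""
    SELECT customer_id, COUNT(*), SUM(amount)
    FROM orders
    GROUP BY customer_id
    HAVING customer_id >= 2 AND COUNT(*) >= 2
    ORDER BY customer_id
""").fetchall()

# Row-level condition in WHERE: customer 1 is never grouped at all.
in_where = conn.execute("""
    SELECT customer_id, COUNT(*), SUM(amount)
    FROM orders
    WHERE customer_id >= 2
    GROUP BY customer_id
    HAVING COUNT(*) >= 2
    ORDER BY customer_id
""").fetchall()
```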

### Use Appropriate Aggregate Functions
**❌ Bad:**
```sql
SELECT customer_id, COUNT(DISTINCT order_id), COUNT(DISTINCT product_id)
FROM orders
GROUP BY customer_id;
```

**✅ Good:**
```sql
-- If order_id is already unique per row, a plain COUNT returns the same number
SELECT customer_id, 
       COUNT(order_id) as order_count,
       COUNT(DISTINCT product_id) as unique_products
FROM orders
GROUP BY customer_id;
```

**Why:** COUNT(DISTINCT) needs a sort or hash per group; use it only where duplicates are actually possible

### Index GROUP BY Columns
**❌ Bad:**
```sql
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
-- If customer_id is not indexed
```

**✅ Good:**
```sql
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
```

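**Why:** An index on the grouping column lets the engine read rows already ordered by group and aggregate them as a stream instead of sorting or hashing, which helps most on large datasets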

### Avoid Unnecessary Aggregations
**❌ Bad:**
```sql
SELECT customer_id, COUNT(*), MAX(order_date), MIN(order_date)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) = 1;  -- Only customers with one order
```

**✅ Good:**
```sql
-- If you only need customers with one order, filter differently
SELECT customer_id, order_date
FROM orders o1
WHERE NOT EXISTS (
    SELECT 1 FROM orders o2 
    WHERE o2.customer_id = o1.customer_id 
    AND o2.id != o1.id
);
```

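**Why:** An anti-join can sometimes replace an aggregation, but it is not automatically cheaper; compare both plans on realistic data. This form also returns the single order's row rather than the aggregate columns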

### Use Window Functions Instead of Self-Joins
**❌ Bad:**
```sql
SELECT o1.customer_id, o1.amount,
       (SELECT SUM(o2.amount) 
        FROM orders o2 
        WHERE o2.customer_id = o1.customer_id 
        AND o2.order_date <= o1.order_date) as running_total
FROM orders o1;
```

**✅ Good:**
```sql
SELECT customer_id, amount,
       SUM(amount) OVER (
           PARTITION BY customer_id 
           ORDER BY order_date 
           ROWS UNBOUNDED PRECEDING
       ) as running_total
FROM orders;
```

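**Why:** A window function computes the running total in one ordered pass per partition, whereas the correlated subquery (or an equivalent self-join) re-reads earlier rows for every row. If `order_date` can repeat within a customer, `ROWS` and the `<=` subquery treat ties differently; use `RANGE` or add a tiebreaker column to match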
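
The two formulations can be compared directly. This sketch assumes SQLite 3.25 or newer (the first release with window functions) through Python's `sqlite3`, integer amounts, and made-up rows in which each customer's order dates are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10), (1, "2024-01-02", 20), (1, "2024-01-03", 30),
     (2, "2024-01-01", 5), (2, "2024-01-05", 5)],
)

# Correlated subquery: re-sums the earlier rows for every output row.
subquery_totals = conn.execute("""
    SELECT o1.customer_id, o1.order_date,
           (SELECT SUM(o2.amount) FROM orders o2
            WHERE o2.customer_id = o1.customer_id
              AND o2.order_date <= o1.order_date) AS running_total
    FROM orders o1
    ORDER BY o1.customer_id, o1.order_date
""").fetchall()

# Window function: one ordered pass per customer partition.
window_totals = conn.execute("""
    SELECT customer_id, order_date,
           SUM(amount) OVER (
               PARTITION BY customer_id
               ORDER BY order_date
               ROWS UNBOUNDED PRECEDING
           ) AS running_total
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()
```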


## 6. Performance Tips for Subqueries

### Use JOINs Instead of Correlated Subqueries When Possible
**❌ Bad:**
```sql
SELECT c.id, c.name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) as order_count
FROM customers c;
```

**✅ Good:**
```sql
SELECT c.id, c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
```

**Why:** 
- A correlated subquery may be executed once per outer row unless the optimizer can decorrelate it
- JOINs are typically more efficient and can be better optimized

### Use EXISTS Instead of IN for Large Subqueries
**❌ Bad:**
```sql
SELECT * FROM customers
WHERE id IN (
    SELECT customer_id FROM orders WHERE amount > 1000
);
```

**✅ Good:**
```sql
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.id AND o.amount > 1000
);
```

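**Why:** 
- EXISTS can stop at the first match for each outer row
- Older or simpler optimizers materialize the whole IN subquery result; most modern ones plan both forms as the same semi-join
- The NULL pitfall is in the negated forms: `NOT IN` returns no rows when the subquery yields a NULL, while `NOT EXISTS` behaves as expected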
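
The NULL-handling difference is easiest to see in the negated forms. The sketch below assumes SQLite via Python's `sqlite3` and a made-up `orders` table in which one row has a NULL `customer_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id) VALUES (1), (2), (3);
    INSERT INTO orders (customer_id) VALUES (1), (NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so no row qualifies.
not_in = conn.execute("""
    SELECT id FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY id
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so customers 2 and 3 remain.
not_exists = conn.execute("""
    SELECT c.id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(not_in)
print(not_exists)
```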

### Use IN Instead of EXISTS for Small Static Lists
**❌ Bad:**
```sql
SELECT * FROM products
WHERE EXISTS (
    SELECT 1 FROM (VALUES ('shirt'), ('pants'), ('shoes')) t(category)
    WHERE t.category = products.category
);
```

**✅ Good:**
```sql
SELECT * FROM products
WHERE category IN ('shirt', 'pants', 'shoes');
```

**Why:** For small static lists, IN is simpler and often faster

### Avoid Multiple Subqueries in SELECT
**❌ Bad:**
```sql
SELECT c.id,
       (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) as order_count,
       (SELECT SUM(amount) FROM orders WHERE customer_id = c.id) as total_amount,
       (SELECT MAX(order_date) FROM orders WHERE customer_id = c.id) as last_order
FROM customers c;
```

**✅ Good:**
```sql
SELECT c.id,
       COUNT(o.id) as order_count,
       SUM(o.amount) as total_amount,
       MAX(o.order_date) as last_order
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id;
```

**Why:** Single JOIN with aggregation is more efficient than multiple subqueries

### Materialize Expensive Subqueries
**❌ Bad:**
```sql
SELECT o.*
FROM orders o
WHERE o.customer_id IN (
    SELECT customer_id FROM large_complex_query
);
```

**✅ Good:**
```sql
-- Materialize subquery result
WITH customer_list AS (
    SELECT DISTINCT customer_id FROM large_complex_query
)
SELECT o.*
FROM orders o
WHERE o.customer_id IN (SELECT customer_id FROM customer_list);
```

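**Why:** A CTE names the expensive subquery once, and an engine that materializes it evaluates it a single time. Whether that happens is engine-specific: PostgreSQL 12+ inlines simple CTEs unless they are marked `MATERIALIZED`, so verify with EXPLAIN or use a temporary table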

### Index Subquery Columns
**❌ Bad:**
```sql
SELECT * FROM orders
WHERE customer_id IN (
    SELECT id FROM customers WHERE status = 'active'
);
-- If customers.id or customers.status not indexed
```

**✅ Good:**
```sql
CREATE INDEX idx_customers_status ON customers(status);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

SELECT * FROM orders
WHERE customer_id IN (
    SELECT id FROM customers WHERE status = 'active'
);
```

**Why:** Indexes on subquery columns improve performance significantly


## 7. Performance Tips for DDL (Data Definition Language)

### Choose Appropriate Data Types
**❌ Bad:**
```sql
CREATE TABLE users (
    id VARCHAR(255),
    age VARCHAR(10),
    created_at VARCHAR(50),
    status VARCHAR(100)
);
```

**✅ Good:**
```sql
CREATE TABLE users (
    id INT PRIMARY KEY,
    age INT,
    created_at TIMESTAMP,
    status VARCHAR(20)
);
```

**Why:** 
- Smaller data types use less storage
- Faster comparisons and sorting
- Better index performance
- Reduced I/O operations

### Use NOT NULL Constraints
**❌ Bad:**
```sql
CREATE TABLE orders (
    id INT,
    customer_id INT,
    amount DECIMAL(10,2)
);
```

**✅ Good:**
```sql
CREATE TABLE orders (
    id INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL
);
```

**Why:** 
- Enables better query optimization
- May reduce storage on engines that keep a per-row NULL bitmap
- Prevents data quality issues
- Allows more efficient indexes

### Create Indexes Strategically
**❌ Bad:**
```sql
-- Creating too many indexes
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    amount DECIMAL(10,2)
);
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_date ON orders(order_date);
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_amount ON orders(amount);
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_customer_status ON orders(customer_id, status);
-- Too many indexes!
```

**✅ Good:**
```sql
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    amount DECIMAL(10,2)
);
-- Create indexes based on actual query patterns
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_status ON orders(status);  -- Only if frequently filtered
```

**Why:** 
- Indexes speed up SELECT but slow down INSERT/UPDATE/DELETE
- Each index requires storage and maintenance
- Create indexes based on actual query patterns

### Use Composite Indexes Wisely
**❌ Bad:**
```sql
-- Multiple single-column indexes
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_date ON orders(order_date);
-- Query: WHERE customer_id = X AND order_date = Y
```

**✅ Good:**
```sql
-- Composite index for common query patterns
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
-- Can be used for:
-- WHERE customer_id = X
-- WHERE customer_id = X AND order_date = Y
```

**Why:** 
- Composite indexes can serve multiple query patterns
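- Order matters: put columns compared with `=` first, then range or sort columns; use selectivity to break ties
- Can use leftmost prefix of composite index, so `WHERE order_date = Y` alone cannot seek on it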

### Partition Large Tables
**❌ Bad:**
```sql
-- Single large table
CREATE TABLE orders (
    id INT PRIMARY KEY,
    order_date DATE
    -- millions of rows
);
```

**✅ Good:**
```sql
-- Partitioned table (MySQL syntax; other engines use different DDL)
CREATE TABLE orders (
    id INT,
    order_date DATE
    -- other columns
) PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);
```

**Why:** 
- Partition pruning: queries only scan relevant partitions
- Easier maintenance: can drop/add partitions
- Better parallel processing
- Improved index performance per partition

### Use Appropriate Table Types
**❌ Bad:**
```sql
-- Using wrong storage engine
CREATE TABLE logs (
    id INT,
    log_message TEXT,
    created_at TIMESTAMP
) ENGINE=InnoDB;  -- If you only append and never update
```

**✅ Good:**
```sql
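-- For append-only logs a lighter engine such as MyISAM or ARCHIVE may be
-- smaller and faster to write (MySQL example), and columnar storage suits
-- analytics. MyISAM has no transactions or crash recovery, so measure first
CREATE TABLE logs (
    id INT,
    log_message TEXT,
    created_at TIMESTAMP
) ENGINE=MyISAM;  -- Or use appropriate engine for your use case
```

**Why:** Different storage engines are optimized for different workloads; weigh write speed and size against durability and locking behavior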

### Normalize Appropriately
**❌ Bad:**
```sql
-- Over-normalized (too many joins needed)
SELECT o.*, c.name, c.email, a.street, a.city, a.state, p.name, p.price
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN addresses a ON c.address_id = a.id
JOIN products p ON o.product_id = p.id;
```

**✅ Good:**
```sql
-- Denormalize for read-heavy workloads
CREATE TABLE order_summary (
    order_id INT,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    product_name VARCHAR(100)
    -- denormalized data
);
```

**Why:** 
- Normalization reduces redundancy but increases joins
- Denormalization improves read performance at cost of storage
- Balance based on read vs write patterns


## 8. Performance Tips for Indexing

### Index Frequently Filtered Columns
**✅ Good:**
```sql
-- Index columns used in WHERE clauses
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_date ON orders(order_date);
```

**Why:** Indexes dramatically speed up filtering operations

### Index Foreign Keys
**✅ Good:**
```sql
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
```

**Why:** 
- Foreign keys are frequently used in JOINs
- Some databases don't auto-index foreign keys
- Speeds up referential integrity checks

### Use Composite Indexes for Multi-Column Filters
**❌ Bad:**
```sql
-- Multiple single-column indexes
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_date ON orders(order_date);
-- Query: WHERE customer_id = X AND order_date = Y
```

**✅ Good:**
```sql
-- Composite index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
-- Can use for:
-- WHERE customer_id = X
-- WHERE customer_id = X AND order_date = Y
```

**Why:** 
- Single index can serve multiple query patterns
- More efficient than using multiple indexes
- Order matters: equality columns first, then range or sort columns

### Index Columns Used in ORDER BY
**✅ Good:**
```sql
-- If you frequently sort by order_date
CREATE INDEX idx_orders_date ON orders(order_date);

-- Query can use index for sorting
SELECT * FROM orders ORDER BY order_date;
```

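**Why:** A B-tree index stores its keys in sorted order, so the engine can read rows in index order and skip the sort; the gain is largest with a LIMIT or a covering index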
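
Whether a sort step disappears can be read from the plan. The sketch assumes SQLite via Python's `sqlite3` and a hypothetical `orders` table; SQLite reports an explicit sort as `USE TEMP B-TREE FOR ORDER BY`, and other engines word it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM orders ORDER BY order_date"

sort_plan = plan(query)   # no index: scan, then sort in a temporary B-tree
conn.execute("CREATE INDEX idx_orders_date ON orders(order_date)")
index_plan = plan(query)  # index available: rows are read in index order

print(sort_plan)
print(index_plan)
```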

### Index Columns Used in JOINs
**✅ Good:**
```sql
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Only needed if customers.id is not already the primary key
CREATE INDEX idx_customers_id ON customers(id);

-- Join will be faster
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```

**Why:** Indexes on join columns enable index nested-loop joins and pre-sorted merge joins; a hash join works without them

### Avoid Over-Indexing
**❌ Bad:**
```sql
-- Too many indexes
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    amount DECIMAL(10,2)
);
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_date ON orders(order_date);
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_amount ON orders(amount);
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_customer_status ON orders(customer_id, status);
CREATE INDEX idx_date_status ON orders(order_date, status);
-- Too many!
```

**✅ Good:**
```sql
-- Index based on actual query patterns
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    amount DECIMAL(10,2)
);
-- Only create indexes you actually use
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
```

**Why:** 
- Each index slows down INSERT/UPDATE/DELETE
- Indexes consume storage space
- Maintenance overhead increases
- Query optimizer may choose wrong index

### Use Covering Indexes
**❌ Bad:**
```sql
CREATE INDEX idx_customer ON orders(customer_id);
-- Query: SELECT customer_id, order_date, amount WHERE customer_id = X
-- Must look up rows in table after index scan
```

**✅ Good:**
```sql
-- Covering index includes all columns needed
CREATE INDEX idx_customer_covering ON orders(customer_id, order_date, amount);
-- Query can be satisfied entirely from index
SELECT customer_id, order_date, amount 
FROM orders 
WHERE customer_id = X;
```

**Why:** 
- Eliminates need to access table data
- Faster query execution
- Reduced I/O operations
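
SQLite labels this case explicitly in its plans, which makes it convenient for a quick check. The sketch assumes Python's `sqlite3` and a hypothetical `orders` table; the narrow index is dropped before the covering one is created so that only one candidate exists at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, amount REAL, status TEXT)"
)

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT customer_id, order_date, amount FROM orders WHERE customer_id = 1"

# Narrow index: finds the rows, then fetches order_date and amount from the table.
conn.execute("CREATE INDEX idx_customer ON orders(customer_id)")
narrow_plan = plan(query)

# Covering index: every selected column is stored in the index itself.
conn.execute("DROP INDEX idx_customer")
conn.execute(
    "CREATE INDEX idx_customer_covering ON orders(customer_id, order_date, amount)"
)
covering_plan = plan(query)

print(narrow_plan)
print(covering_plan)
```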

### Consider Partial Indexes
**✅ Good:**
```sql
-- Index only active orders (if most queries filter by status='active')
-- PostgreSQL and SQLite syntax; SQL Server calls these filtered indexes, MySQL has no equivalent
CREATE INDEX idx_active_orders ON orders(customer_id, order_date)
WHERE status = 'active';
```

**Why:** 
- Smaller index size
- Faster index operations
- Useful when filtering by specific values frequently
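
A partial index is only considered when the query's own WHERE clause implies the index condition. The sketch below shows that with SQLite via Python's `sqlite3` on a hypothetical `orders` table; the literal `status = 'active'` has to appear in the query for the index to qualify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_active_orders ON orders(customer_id, order_date) "
    "WHERE status = 'active'"
)

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# The query repeats the index condition, so the partial index qualifies.
active_plan = plan(
    "SELECT id, order_date FROM orders "
    "WHERE status = 'active' AND customer_id = 1"
)

# No status filter: the index may be missing rows, so it cannot be used.
any_status_plan = plan("SELECT id, order_date FROM orders WHERE customer_id = 1")

print(active_plan)
print(any_status_plan)
```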

### Monitor Index Usage
**✅ Good:**
```sql
-- Check which indexes are actually used (syntax varies by DBMS)
-- PostgreSQL:
SELECT * FROM pg_stat_user_indexes;

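-- MySQL (SHOW INDEX only lists definitions; sys.schema_unused_indexes reports usage):
SHOW INDEX FROM orders;
SELECT * FROM sys.schema_unused_indexes;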

-- SQL Server:
SELECT * FROM sys.dm_db_index_usage_stats;
```

**Why:** 
- Identify unused indexes to drop
- Find missing indexes for slow queries
- Optimize index strategy based on actual usage

### Rebuild/Reorganize Indexes Regularly
**✅ Good:**
```sql
-- Rebuild indexes to reduce fragmentation (syntax varies)
-- SQL Server:
ALTER INDEX ALL ON orders REBUILD;

-- PostgreSQL:
REINDEX TABLE orders;

-- MySQL:
ALTER TABLE orders ENGINE=InnoDB;  -- Rebuilds the table and its indexes
```

**Why:** 
- Heavily fragmented or bloated indexes take more I/O to read
- Rebuilds can lock or load the table, so schedule them when measured fragmentation warrants it
- Improves query performance over time


## Summary

### Key Takeaways

1. **Fetch Only What You Need**: Use specific columns, limit results, avoid SELECT *
2. **Filter Efficiently**: Use indexed columns, avoid functions on indexed columns, filter early
3. **Join Wisely**: Index join columns, use appropriate join types, filter before joining
4. **Aggregate Smartly**: Filter with WHERE before GROUP BY, index GROUP BY columns
5. **Optimize Subqueries**: Prefer JOINs over correlated subqueries, use EXISTS for large subqueries
6. **Design Tables Well**: Choose right data types, use constraints, normalize appropriately
7. **Index Strategically**: Index frequently filtered/joined columns, avoid over-indexing, use covering indexes

### Best Practices Checklist

- [ ] Use EXPLAIN/EXPLAIN ANALYZE to understand query plans
- [ ] Index foreign keys and frequently filtered columns
- [ ] Filter early with WHERE before JOINs and aggregations
- [ ] Use appropriate data types and constraints
- [ ] Monitor slow queries and optimize them
- [ ] Regularly maintain indexes (rebuild/reorganize)
- [ ] Test query performance with realistic data volumes
- [ ] Review and remove unused indexes
- [ ] Consider partitioning for very large tables
- [ ] Balance normalization vs denormalization based on workload

### Remember
Performance optimization is an iterative process. Always measure before and after changes, and optimize based on actual query patterns and data volumes in your specific environment.
