## Table of Contents

1. [Execution Plans](#execution-plans)
2. [Indexes and Index Operations](#indexes-and-index-operations)
3. [Table Scan vs Index Scan](#table-scans-vs-index-scans)
4. [Heap Tables](#heap-tables)
5. [Index Seek vs Index Scan](#index-seek-vs-index-scan)
6. [Join Algorithms (Nested Loop)](#join-algorithms)
7. [Partitioning](#partitioning)
8. [Materialized Views](#materialized-views)


## 1. Execution Plans

An **Execution Plan** (also called Query Plan) is a roadmap that the database optimizer creates to execute a SQL query. It shows:
- The order of operations
- Which indexes will be used
- Join algorithms
- Estimated costs and row counts
- Access methods (table scan, index scan, index seek)

Understanding execution plans is crucial for identifying performance bottlenecks.

### Key Components of Execution Plans:
- **Operators**: Physical operations (Scan, Seek, Join, Sort, etc.)
- **Cost**: Estimated resource consumption (CPU, I/O, memory)
- **Cardinality**: Estimated number of rows
- **Access Methods**: How data is retrieved (Index Seek, Index Scan, Table Scan)
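
To see how an optimizer's choice of access method shows up in a plan, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in (it is neither SQL Server nor Snowflake), and the table contents and the index name `ix_orders_order_date` are made up for illustration. The same query is planned before and after an index exists:

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, total_amount REAL)"
)
# Hypothetical sample data: 1000 orders spread over the months of 2024.
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, total_amount) VALUES (?, ?, ?)",
    [(i % 50, f"2024-{(i % 12) + 1:02d}-01", float(i)) for i in range(1000)],
)

query = "SELECT customer_id, total_amount FROM orders WHERE order_date = ?"

# Without an index the only access method is a full scan of the table.
plan_before = plan_details(conn, query, ("2024-03-01",))

# With an index on the filtered column the planner can search the index instead.
conn.execute("CREATE INDEX ix_orders_order_date ON orders(order_date)")
plan_after = plan_details(conn, query, ("2024-03-01",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan and the second names the index.
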

In [None]:
-- ============================================
-- SQL SERVER: Viewing Execution Plans
-- ============================================

-- Method 1: Graphical Execution Plan (SSMS)
-- Press Ctrl+M or enable "Include Actual Execution Plan" in SSMS
-- Then run your query

-- Method 2: Runtime I/O and timing statistics
-- (these are execution statistics, not a plan; SET SHOWPLAN_TEXT ON returns a text plan)
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC;

-- Method 3: XML Execution Plan
SET SHOWPLAN_XML ON;
GO
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC;
GO
SET SHOWPLAN_XML OFF;

-- Method 4: Using sys.dm_exec_query_plan for cached plans
SELECT 
    qp.query_plan,
    st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%orders%';

-- ============================================
-- SNOWFLAKE: Viewing Execution Plans
-- ============================================

-- Method 1: EXPLAIN command (shows query plan)
EXPLAIN
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC;

-- Method 2: EXPLAIN using JSON for detailed plan
EXPLAIN USING JSON
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC;

-- Method 3: EXPLAIN using TABULAR format
EXPLAIN USING TABULAR
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC;

-- Method 4: View query profile (after query execution)
-- In Snowflake UI: History tab -> Click on query -> View Profile
-- Or use INFORMATION_SCHEMA.QUERY_HISTORY
SELECT 
    query_id,
    query_text,
    total_elapsed_time,
    bytes_scanned,
    rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text LIKE '%orders%'
ORDER BY start_time DESC
LIMIT 10;


## 2. Indexes and Index Operations

**Indexes** are database structures that improve query performance by providing fast access paths to data. They work like a book's index, allowing the database to quickly locate rows without scanning the entire table.

### Types of Indexes:

#### SQL Server Index Types:
- **Clustered Index**: Physically orders data rows (one per table)
- **Non-Clustered Index**: Separate structure pointing to data rows (multiple allowed)
- **Unique Index**: Ensures no duplicate values
- **Composite Index**: Index on multiple columns
- **Covering Index**: Contains all columns needed for a query (includes columns in INCLUDE clause)
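
The difference between a plain non-clustered index and a covering index can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how SQL Server stores indexes: the rows are hypothetical, a sorted list stands in for the B-tree, and a list position stands in for the row locator.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table rows: (order_id, customer_id, order_date, total_amount)
rows = [
    (1, 10, "2024-01-05", 50.0),
    (2, 11, "2024-01-03", 20.0),
    (3, 10, "2024-01-05", 75.0),
    (4, 12, "2024-01-09", 10.0),
]

# Non-clustered index on order_date: sorted (key, row_position) entries.
ix_order_date = sorted((row[2], pos) for pos, row in enumerate(rows))
ix_order_date_keys = [key for key, _ in ix_order_date]

# Covering index: same key, but customer_id and total_amount are stored in the
# index entry itself (the INCLUDE columns), so the table is never touched.
ix_covering = sorted((row[2], row[1], row[3]) for row in rows)
ix_covering_keys = [entry[0] for entry in ix_covering]

def seek_then_lookup(order_date):
    """Find the key range in the index, then fetch each row from the table."""
    lo = bisect_left(ix_order_date_keys, order_date)
    hi = bisect_right(ix_order_date_keys, order_date)
    return [(rows[pos][1], rows[pos][3]) for _, pos in ix_order_date[lo:hi]]

def seek_covering(order_date):
    """Find the key range and answer from the index entries alone."""
    lo = bisect_left(ix_covering_keys, order_date)
    hi = bisect_right(ix_covering_keys, order_date)
    return [(customer_id, amount) for _, customer_id, amount in ix_covering[lo:hi]]

lookup_result = seek_then_lookup("2024-01-05")
covering_result = seek_covering("2024-01-05")
print(lookup_result)
print(covering_result)
```

Both functions return the same rows; the covering version simply skips the extra lookup per matching row.
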

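#### Snowflake Index Types:
Standard Snowflake tables have no user-defined indexes; the following features play a comparable role:
- **Clustering Keys**: Columns or expressions that Snowflake uses to co-locate related rows in the same micro-partitions, which improves pruning
- **Search Optimization Service**: For point lookups on large tables
- **Automatic Clustering**: Background service that reclusters a table as its data changes, once a clustering key has been defined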


In [None]:
-- ============================================
-- SQL SERVER: Creating and Managing Indexes
-- ============================================

-- Create a non-clustered index
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON orders(order_date);

-- Create a composite index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerDate
ON orders(customer_id, order_date);

-- Create a covering index (includes additional columns)
CREATE NONCLUSTERED INDEX IX_Orders_Covering
ON orders(order_date)
INCLUDE (customer_id, total_amount);

-- Create a unique index
CREATE UNIQUE NONCLUSTERED INDEX IX_Customers_Email
ON customers(email);

-- Create a clustered index (only one per table)
CREATE CLUSTERED INDEX IX_Orders_OrderID
ON orders(order_id);

-- View existing indexes
SELECT 
    i.name AS IndexName,
    i.type_desc AS IndexType,
    COL_NAME(ic.object_id, ic.column_id) AS ColumnName,
    ic.is_included_column
FROM sys.indexes i
INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
WHERE i.object_id = OBJECT_ID('orders')
ORDER BY i.name, ic.key_ordinal;

-- Check index usage statistics
SELECT 
    OBJECT_NAME(s.object_id) AS TableName,
    i.name AS IndexName,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates,
    s.last_user_seek,
    s.last_user_scan
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.database_id = DB_ID()  -- this DMV is server-wide; restrict it to the current database
  AND s.object_id = OBJECT_ID('orders');

-- Rebuild index (maintenance)
ALTER INDEX IX_Orders_OrderDate ON orders REBUILD;

-- Reorganize index (lighter maintenance)
ALTER INDEX IX_Orders_OrderDate ON orders REORGANIZE;

-- Drop an index
DROP INDEX IX_Orders_OrderDate ON orders;

-- ============================================
-- SNOWFLAKE: Clustering and Search Optimization
-- ============================================

-- Create a table with clustering key
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(50)
)
CLUSTER BY (order_date, customer_id);

-- Add clustering key to existing table
ALTER TABLE orders CLUSTER BY (order_date);

-- View clustering information
SELECT 
    SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

-- Check clustering depth (lower is better, 1 is optimal)
SELECT SYSTEM$CLUSTERING_DEPTH('orders', '(order_date)');

-- Automatic Clustering is enabled by default once a clustering key is defined
-- (the ALTER TABLE ... CLUSTER BY statement above); no separate statement is needed

-- Suspend automatic clustering
ALTER TABLE orders SUSPEND RECLUSTER;

-- Resume automatic clustering
ALTER TABLE orders RESUME RECLUSTER;

-- Create Search Optimization Service (for point lookups)
ALTER TABLE orders ADD SEARCH OPTIMIZATION ON EQUALITY(customer_id, order_id);

-- View search optimization status (see the search_optimization column in the output)
SHOW TABLES LIKE 'orders';
-- View search optimization maintenance history
SELECT * FROM TABLE(INFORMATION_SCHEMA.SEARCH_OPTIMIZATION_HISTORY())
WHERE table_name = 'ORDERS';

-- View table clustering information
SELECT 
    table_name,
    clustering_key,
    auto_clustering_on
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'ORDERS';


## 3. Table Scan vs Index Scan

### TABLE SCAN
A **Table Scan** (also called Full Table Scan) reads every row in the table sequentially. Its cost grows with table size, so on large tables it is usually the most expensive access method when the query needs only a small fraction of the rows.

**When Table Scans Occur:**
- No suitable index exists
- Query needs to read most/all rows (optimizer decides scan is cheaper)
- Index is not selective enough
- Statistics are outdated

**Characteristics:**
- Reads entire table
- High I/O cost
- Slow for large tables
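- Sometimes the cheapest option (e.g., when the query reads a large share of the rows; figures such as 20-30% are rough rules of thumb, and the real tipping point depends on row width and available indexes)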

### INDEX SCAN
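An **Index Scan** reads all entries in an index sequentially. A non-clustered index scan is usually cheaper than a table scan because index entries are narrower than full rows, but it still reads the entire index. A clustered index scan reads every data row, so it costs about the same as a table scan.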

**When Index Scans Occur:**
- Query can use an index but needs to read most index entries
- Range queries without selective predicates
- Index is not selective enough for the query

**Characteristics:**
- Reads entire index structure
- Usually cheaper than a table scan when the index is non-clustered (index entries are narrower than full rows)
- Still involves sequential reading
- Faster than table scan but slower than index seek
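
A back-of-the-envelope calculation shows why a narrow index is cheaper to scan than the table itself. The sketch below assumes 8 KB pages (the SQL Server page size), ignores page headers and fill factor, and uses made-up widths of 200 bytes per table row and 16 bytes per index entry:

```python
import math

PAGE_BYTES = 8192  # 8 KB page; header overhead and fill factor are ignored here

def pages_needed(row_count, bytes_per_entry):
    """Number of pages required to hold row_count entries of the given width."""
    entries_per_page = PAGE_BYTES // bytes_per_entry
    return math.ceil(row_count / entries_per_page)

row_count = 1_000_000
table_pages = pages_needed(row_count, bytes_per_entry=200)  # assumed full row width
index_pages = pages_needed(row_count, bytes_per_entry=16)   # assumed key + row locator

print("table scan reads", table_pages, "pages")
print("index scan reads", index_pages, "pages")
```

Under these assumptions the index scan touches roughly a thirteenth of the pages, which is the whole advantage: both operations are still sequential reads of an entire structure.
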


In [None]:
-- ============================================
-- SQL SERVER: Identifying Table Scans and Index Scans
-- ============================================

-- Example query that might cause a TABLE SCAN
-- (No index on order_date, or reading too many rows)
SET STATISTICS IO ON;
SELECT * 
FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2024-12-31';
-- Look for "Table Scan" (heap) or "Clustered Index Scan" (table with a clustered index) in the execution plan

-- Example query that might cause an INDEX SCAN
-- (Index exists but query reads most of the index)
SELECT customer_id, order_date
FROM orders
WHERE order_date >= '2023-01-01'
ORDER BY order_date;
-- Look for "Index Scan" in execution plan

-- Force a scan access path with FORCESCAN (not recommended, for demonstration only)
SELECT * 
FROM orders WITH (FORCESCAN)
WHERE customer_id = 12345;

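-- Check cumulative scan and lookup activity per index
-- (counters reset when the metadata cache entry is evicted or the instance restarts)
SELECT 
    OBJECT_NAME(object_id) AS TableName,
    index_id,
    range_scan_count,
    singleton_lookup_count,
    forwarded_fetch_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('orders'), NULL, NULL)
WHERE range_scan_count > 0;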

-- ============================================
-- SNOWFLAKE: Understanding Scans
-- ============================================

-- Query that causes full table scan
-- (No clustering key or reading large portion of data)
SELECT * 
FROM orders
WHERE status = 'pending';
-- Check query profile to see "TableScan" operator

-- Query with clustering key (more efficient)
SELECT * 
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';
-- If order_date is in clustering key, this uses micro-partition pruning

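-- View query statistics to identify scans
-- (partition counts are exposed in the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view, which lags real time)
SELECT 
    query_id,
    query_text,
    bytes_scanned,
    rows_produced,
    partitions_scanned,
    partitions_total
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY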
WHERE query_text LIKE '%orders%'
ORDER BY start_time DESC
LIMIT 5;

-- Check table storage footprint (storage metrics, not scan statistics)
SELECT 
    table_name,
    active_bytes,
    time_travel_bytes,
    failsafe_bytes
FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS
WHERE table_name = 'ORDERS';


## 4. Heap Tables

A **heap** is a table without a clustered index. Its rows are stored in no particular order, and non-clustered indexes locate them through row identifiers (RIDs).


In [None]:
-- ============================================
-- SQL SERVER: Working with Heap Tables
-- ============================================

-- Create a heap table (no clustered index)
CREATE TABLE heap_orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);
-- This is a heap because no clustered index is defined

-- Check if a table is a heap
SELECT 
    OBJECT_NAME(object_id) AS TableName,
    type_desc,
    CASE 
        WHEN index_id = 0 THEN 'Heap'
        WHEN index_id = 1 THEN 'Clustered Index'
        ELSE 'Non-Clustered Index'
    END AS IndexType
FROM sys.indexes
WHERE object_id = OBJECT_ID('heap_orders')
ORDER BY index_id;

-- Create non-clustered index on heap
CREATE NONCLUSTERED INDEX IX_HeapOrders_OrderDate
ON heap_orders(order_date);

-- Query on heap table (will use RID lookup)
SELECT order_id, total_amount
FROM heap_orders
WHERE order_date = '2024-01-15';
-- Execution plan will show: Index Seek -> RID Lookup -> Heap

-- Check for forwarded records (heap fragmentation)
SELECT 
    OBJECT_NAME(object_id) AS TableName,
    forwarded_record_count,
    avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID(), 
    OBJECT_ID('heap_orders'), 
    0, -- Heap
    NULL, 
    'DETAILED'
);

-- Rebuild heap to remove forwarded records
ALTER TABLE heap_orders REBUILD;

-- Convert heap to clustered index table
CREATE CLUSTERED INDEX IX_HeapOrders_OrderID
ON heap_orders(order_id);

-- ============================================
-- SNOWFLAKE: Note on Heaps
-- ============================================

-- Note: Snowflake doesn't have the same heap concept as SQL Server
-- All tables in Snowflake are automatically organized using micro-partitions
-- However, tables without clustering keys can be considered "unorganized"

-- Create table without clustering key (similar to heap concept)
CREATE TABLE unclustered_orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);
-- This table will still use micro-partitions but without explicit clustering

-- Check table organization
SELECT 
    table_name,
    clustering_key,
    auto_clustering_on
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'UNCLUSTERED_ORDERS';

-- Add clustering to improve organization
ALTER TABLE unclustered_orders CLUSTER BY (order_date);


## 5. Index Seek vs Index Scan

### INDEX SEEK
An **Index Seek** is the most efficient index operation. It directly navigates to specific rows in the index using the B-tree structure, similar to looking up a word in a dictionary.

**When Index Seek Occurs:**
- Query has a selective WHERE clause
- Index key columns are used in WHERE clause
- Query returns small percentage of rows
- Equality or range predicates on indexed columns

**Characteristics:**
- Direct navigation to data
- Very fast (O(log n) complexity)
- Low I/O cost
- Ideal for point lookups and selective queries

### INDEX SCAN
An **Index Scan** reads through the entire index sequentially, similar to reading a dictionary page by page.

**When Index Scan Occurs:**
- Query needs to read most index entries
- Non-selective predicates
- Query optimizer determines scan is cheaper than seek
- Range queries that cover large portion of index

**Performance Comparison (typical for selective queries):**
- **Index Seek**: Fastest (direct navigation)
- **Index Scan**: Moderate (sequential read of index)
- **Table Scan**: Slowest (sequential read of entire table)
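
The gap between a seek and a scan can be made concrete by counting index entries touched. In this sketch a sorted Python list stands in for the index leaf level, binary search stands in for B-tree navigation, and the key distribution (100,000 entries over 1,000 distinct `category_id` values) is invented for illustration:

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical index on category_id: 100,000 entries, 1,000 distinct keys.
index_keys = sorted(i % 1000 for i in range(100_000))

def index_scan(keys, target):
    """Read every index entry and count the matches."""
    entries_read = 0
    matches = 0
    for key in keys:
        entries_read += 1
        if key == target:
            matches += 1
    return matches, entries_read

def index_seek(keys, target):
    """Navigate to the first match, then read only the matching range."""
    lo = bisect_left(keys, target)
    hi = bisect_right(keys, target)
    probes = math.ceil(math.log2(len(keys)))  # approximate cost of the navigation
    return hi - lo, probes + (hi - lo)

scan_matches, scan_reads = index_scan(index_keys, 5)
seek_matches, seek_reads = index_seek(index_keys, 5)
print("scan:", scan_matches, "matches,", scan_reads, "entries read")
print("seek:", seek_matches, "matches,", seek_reads, "entries read")
```

Both approaches find the same matches; the seek's work is proportional to log n plus the size of the result, while the scan's work is proportional to the size of the index.
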


In [None]:
-- ============================================
-- SQL SERVER: Index Seek vs Index Scan Examples
-- ============================================

-- Create sample table and index
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category_id INT,
    price DECIMAL(10,2),
    created_date DATE
);

CREATE NONCLUSTERED INDEX IX_Products_CategoryID
ON products(category_id);

-- Example 1: INDEX SEEK (selective query)
-- Returns small number of rows - optimizer uses Index Seek
SELECT product_id, product_name, price
FROM products
WHERE category_id = 5;
-- Execution plan typically shows: Index Seek (NonClustered) plus a Key Lookup for product_name and price

-- Example 2: INDEX SCAN (less selective query)
-- Returns large portion of rows - optimizer may use Index Scan
SELECT product_id, product_name
FROM products
WHERE category_id BETWEEN 1 AND 10;
-- Execution plan may show: Index Scan (NonClustered) or Clustered Index Scan

-- Example 3: INDEX SEEK with range
SELECT product_id, product_name
FROM products
WHERE category_id >= 50 AND category_id <= 55;
-- Execution plan shows: Index Seek (NonClustered) with range

-- Force use of a specific index (the INDEX hint picks the index, WITH (FORCESEEK) forces a seek; not recommended in production)
SELECT product_id, product_name
FROM products WITH (INDEX(IX_Products_CategoryID))
WHERE category_id BETWEEN 1 AND 10;

-- View I/O statistics for the query
SET STATISTICS IO ON;
SELECT product_id, product_name
FROM products
WHERE category_id = 5;
-- Check logical reads - Index Seek will have fewer reads

-- Compare performance
-- Index Seek (selective):
SELECT COUNT(*) FROM products WHERE category_id = 5;

-- Index Scan (less selective):
SELECT COUNT(*) FROM products WHERE category_id BETWEEN 1 AND 100;

-- ============================================
-- SNOWFLAKE: Understanding Seek vs Scan
-- ============================================

-- Note: Snowflake uses micro-partition pruning instead of traditional index seeks
-- Clustering keys enable efficient data access similar to index seeks

-- Create table with clustering key
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100),
    category_id INT,
    price DECIMAL(10,2),
    created_date DATE
) CLUSTER BY (category_id);

-- Example 1: Efficient lookup (similar to Index Seek)
-- Uses micro-partition pruning
SELECT product_id, product_name, price
FROM products
WHERE category_id = 5;
-- Query profile shows: TableScan with partition pruning

-- Example 2: Less efficient (similar to Index Scan)
-- Scans more micro-partitions
SELECT product_id, product_name
FROM products
WHERE category_id BETWEEN 1 AND 10;
-- Query profile shows: TableScan with less pruning

-- Check query profile for partition pruning
EXPLAIN USING JSON
SELECT product_id, product_name
FROM products
WHERE category_id = 5;
-- Compare "partitionsAssigned" with "partitionsTotal" in the plan

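-- View partition pruning statistics
-- (partition counts are exposed in the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view, which lags real time)
SELECT 
    query_id,
    partitions_scanned,
    partitions_total,
    CASE 
        WHEN partitions_total > 0 
        THEN (1 - partitions_scanned::FLOAT / partitions_total) * 100
        ELSE 0 
    END AS pruning_percentage  -- share of partitions skipped
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY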
WHERE query_text LIKE '%products%'
ORDER BY start_time DESC
LIMIT 10;


## 6. Join Algorithms: Nested Loop Join

**Nested Loop Join** is one of the three main join algorithms used by database optimizers. It works by iterating through each row in the outer table and searching for matching rows in the inner table.

### How Nested Loop Join Works:
1. For each row in the **outer table** (left table)
2. Search for matching rows in the **inner table** (right table)
3. Return matching rows

### Characteristics:
- **Best for**: Small tables, or when one table is much smaller than the other
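- **Cost**: O(n × m) comparisons without an index on the inner table, where n = outer table rows and m = inner table rows; roughly O(n log m) when each inner lookup is an index seek
- **Benefits from**: Index on join column of inner table for efficiency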
- **Memory**: Low memory usage
- **When Used**: 
  - Small result sets
  - One table is small
  - Index exists on inner table's join column
  - Top N queries with joins

### Other Join Algorithms:
- **Hash Join**: Builds hash table from smaller table, probes with larger table
- **Merge Join**: Sorts both tables and merges them (requires sorted inputs)
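
The join strategies described above can be compared side by side in plain Python. This is a simplified sketch with hypothetical `customers` and `orders` rows; a sorted list with binary search stands in for an index on the inner table's join column, and a dict stands in for the hash table:

```python
from bisect import bisect_left, bisect_right

customers = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]  # (customer_id, name)
orders = [(100, 1, 25.0), (101, 3, 40.0), (102, 1, 15.0), (103, 4, 99.0)]  # (order_id, customer_id, amount)

def nested_loop_join(outer, inner):
    """For each outer row, compare against every inner row: O(n * m)."""
    result = []
    for customer_id, name in outer:
        for order_id, order_customer_id, amount in inner:
            if order_customer_id == customer_id:
                result.append((name, order_id, amount))
    return result

def indexed_nested_loop_join(outer, inner):
    """For each outer row, seek into a sorted index on the inner join column."""
    index = sorted((o[1], o[0], o[2]) for o in inner)  # stand-in for an existing index
    keys = [entry[0] for entry in index]
    result = []
    for customer_id, name in outer:
        lo = bisect_left(keys, customer_id)
        hi = bisect_right(keys, customer_id)
        result.extend((name, order_id, amount) for _, order_id, amount in index[lo:hi])
    return result

def hash_join(build, probe):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    names_by_id = {customer_id: name for customer_id, name in build}
    return [
        (names_by_id[customer_id], order_id, amount)
        for order_id, customer_id, amount in probe
        if customer_id in names_by_id
    ]

nl_result = nested_loop_join(customers, orders)
inl_result = indexed_nested_loop_join(customers, orders)
hj_result = hash_join(customers, orders)
print(sorted(nl_result))
```

All three return the same rows (possibly in a different order); they differ only in how much work it takes to find the matches.
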


In [None]:
-- ============================================
-- SQL SERVER: Nested Loop Join Examples
-- ============================================

-- Create sample tables
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON orders(customer_id);

-- Example 1: Nested Loop Join (small result set)
-- SQL Server will likely choose Nested Loop
SELECT c.customer_name, o.order_date, o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id BETWEEN 1 AND 100;
-- Execution plan shows: Nested Loops (Inner Join)

-- Example 2: Force Nested Loop Join (for demonstration)
SELECT c.customer_name, o.order_date
FROM customers c
INNER LOOP JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id < 50;

-- Example 3: Nested Loop with Index Seek on inner table
-- This is the optimal scenario for Nested Loop
SELECT c.customer_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = 123;
-- Execution plan: Nested Loops -> Index Seek on orders

-- View join statistics
SET STATISTICS IO ON;
SELECT c.customer_name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
-- Check execution plan for join type

-- Compare join algorithms
-- Nested Loop (good for small outer table):
SELECT TOP 10 c.customer_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- ============================================
-- SNOWFLAKE: Join Algorithms
-- ============================================

-- Note: Snowflake's optimizer chooses the join strategy automatically
-- Inspect the Query Profile or EXPLAIN output to see which join operators it picked

-- Create sample tables
CREATE TABLE customers (
    customer_id INT,
    customer_name VARCHAR(100),
    email VARCHAR(100)
) CLUSTER BY (customer_id);

CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
) CLUSTER BY (customer_id);

-- Example 1: Join restricted to a small range of customers
SELECT c.customer_name, o.order_date, o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id BETWEEN 1 AND 100;

-- View join plan
EXPLAIN USING JSON
SELECT c.customer_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = 123;
-- Look for join operator in the plan

-- Check join statistics
SELECT 
    query_id,
    query_text,
    bytes_scanned,
    rows_produced,
    total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text LIKE '%JOIN%'
ORDER BY start_time DESC
LIMIT 10;

-- Optimize join performance with clustering
-- Both tables clustered on join key improves performance
SELECT c.customer_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id BETWEEN 1 AND 1000;


## 7. Partitioning

**Partitioning** divides large tables into smaller, more manageable pieces called partitions. Each partition can be stored and accessed independently, improving query performance and maintenance operations.

### Benefits of Partitioning:
- **Query Performance**: Partition elimination (only scan relevant partitions)
- **Maintenance**: Operations like backups, index rebuilds on specific partitions
- **Parallel Processing**: Different partitions can be processed in parallel
- **Data Management**: Easy archival and deletion of old partitions

### Types of Partitioning:

#### SQL Server Partitioning:
- **Range Partitioning**: Partition by date ranges, numeric ranges
- **Partition Function**: Defines partition boundaries
- **Partition Scheme**: Maps partitions to filegroups
- **Partitioned Indexes**: Indexes can also be partitioned
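
How a `RANGE RIGHT` partition function maps a value to a partition number, and how a date-range predicate translates into partition elimination, can be sketched as follows. The yearly boundaries from 2020-01-01 through 2024-01-01 are an assumption chosen for illustration, and the function names are hypothetical:

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Five boundary values define six partitions.
boundaries = [date(year, 1, 1) for year in range(2020, 2025)]

def partition_number(value):
    """1-based partition for RANGE RIGHT: a boundary value belongs to the partition on its right."""
    return bisect_right(boundaries, value) + 1

def partitions_for_range(start, end_exclusive):
    """Partitions that can contain rows with start <= value < end_exclusive."""
    first = partition_number(start)
    last = bisect_left(boundaries, end_exclusive) + 1
    return list(range(first, last + 1))

print(partition_number(date(2019, 12, 31)))   # before the first boundary
print(partition_number(date(2023, 6, 15)))    # inside the 2023 range
print(partitions_for_range(date(2023, 1, 1), date(2024, 1, 1)))
```

With these boundaries, a query for all of 2023 only needs partition 5; the other five partitions are eliminated without being read.
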

#### Snowflake Partitioning:
- **Automatic Micro-partitioning**: Snowflake automatically partitions data
- **Clustering Keys**: Control which rows are co-located in the same micro-partitions
- **Partition Pruning**: Automatic elimination of irrelevant partitions
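
Pruning works because Snowflake keeps the range of values for each column in every micro-partition and can skip partitions whose range cannot match the filter. The sketch below models that idea with hypothetical metadata tuples; it illustrates why clustering matters and is not Snowflake's actual implementation:

```python
from datetime import date

# Hypothetical metadata: (partition_id, min_order_date, max_order_date)
well_clustered = [
    (0, date(2022, 1, 1), date(2022, 6, 30)),
    (1, date(2022, 7, 1), date(2022, 12, 31)),
    (2, date(2023, 1, 1), date(2023, 6, 30)),
    (3, date(2023, 7, 1), date(2023, 12, 31)),
]
# Same data volume, but every partition holds dates from the whole two years.
poorly_clustered = [
    (pid, date(2022, 1, 1), date(2023, 12, 31)) for pid in range(4)
]

def partitions_to_scan(metadata, start, end_exclusive):
    """Keep partitions whose [min, max] range overlaps start <= value < end_exclusive."""
    return [
        pid for pid, lo, hi in metadata
        if hi >= start and lo < end_exclusive
    ]

query_start, query_end = date(2023, 1, 1), date(2023, 7, 1)
scan_clustered = partitions_to_scan(well_clustered, query_start, query_end)
scan_unclustered = partitions_to_scan(poorly_clustered, query_start, query_end)
print("well clustered:  ", scan_clustered)
print("poorly clustered:", scan_unclustered)
```

When value ranges overlap heavily, no partition can be ruled out, which is the situation a clustering key is meant to prevent.
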


In [None]:
-- ============================================
-- SQL SERVER: Table Partitioning
-- ============================================

-- Step 1: Create Partition Function (defines boundaries)
CREATE PARTITION FUNCTION pf_OrderDate (DATE)
AS RANGE RIGHT FOR VALUES 
('2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01', '2024-01-01');

-- Step 2: Create Partition Scheme (maps to filegroups)
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]);
-- In production, use different filegroups for better performance

-- Step 3: Create Partitioned Table
CREATE TABLE partitioned_orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
) ON ps_OrderDate(order_date);

-- Step 4: Create Indexes on Partitioned Table
CREATE NONCLUSTERED INDEX IX_PartOrders_CustomerID
ON partitioned_orders(customer_id)
ON ps_OrderDate(order_date); -- Partition-aligned index

-- Query with partition elimination
SELECT order_id, total_amount
FROM partitioned_orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
-- Only scans partition 5 (2023 data)

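-- View partition information
-- (sys.partitions has no function id; reach the partition function through the index and scheme)
SELECT 
    p.partition_number,
    p.rows,
    r.value AS lower_boundary_value,
    CASE 
        WHEN r.boundary_id IS NULL THEN 'Leftmost (no lower bound)'
        ELSE CAST(r.value AS VARCHAR(30))
    END AS range_start
FROM sys.partitions p
INNER JOIN sys.indexes i
    ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN sys.partition_schemes ps
    ON i.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_range_values r 
    ON ps.function_id = r.function_id 
    AND p.partition_number = r.boundary_id + 1
WHERE p.object_id = OBJECT_ID('partitioned_orders')
  AND i.index_id IN (0, 1)  -- base table only (heap or clustered index)
ORDER BY p.partition_number;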

-- Switch partition (fast data archival)
-- Create staging table with same structure
CREATE TABLE orders_archive_2020 (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
) ON ps_OrderDate(order_date);

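-- Switch partition 2 (the 2020 data under RANGE RIGHT) to the archive table
-- (the target needs matching indexes and an empty partition 2)
ALTER TABLE partitioned_orders
SWITCH PARTITION 2 TO orders_archive_2020 PARTITION 2;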

-- Merge partitions
ALTER PARTITION FUNCTION pf_OrderDate()
MERGE RANGE ('2021-01-01');

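-- Split partition (add new partition)
-- The partition scheme needs a NEXT USED filegroup before the split
ALTER PARTITION SCHEME ps_OrderDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_OrderDate()
SPLIT RANGE ('2025-01-01');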

-- ============================================
-- SNOWFLAKE: Automatic Partitioning and Clustering
-- ============================================

-- Snowflake automatically partitions data into micro-partitions
-- Clustering keys control organization for efficient pruning

-- Create table with clustering key (controls micro-partition organization)
CREATE TABLE partitioned_orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
) CLUSTER BY (TO_DATE(order_date));

-- Insert data (automatically organized into micro-partitions)
INSERT INTO partitioned_orders
SELECT 
    SEQ4() AS order_id,
    MOD(SEQ4(), 1000) AS customer_id,
    DATEADD(day, MOD(SEQ4(), 1460), '2020-01-01'::DATE) AS order_date,
    UNIFORM(1, 1000, RANDOM()) AS total_amount  -- RANDOM() alone returns a 64-bit integer that overflows DECIMAL(10,2)
FROM TABLE(GENERATOR(ROWCOUNT => 1000000));

-- Query with partition pruning
SELECT order_id, total_amount
FROM partitioned_orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
-- Snowflake automatically prunes irrelevant micro-partitions

-- Check clustering information
SELECT SYSTEM$CLUSTERING_INFORMATION('partitioned_orders', '(order_date)');

-- View partition pruning in query profile
EXPLAIN USING JSON
SELECT COUNT(*) 
FROM partitioned_orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';
-- Compare "partitionsAssigned" with "partitionsTotal" in the plan

-- Check table size and clustering key
SELECT 
    table_name,
    bytes,
    row_count,
    clustering_key
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'PARTITIONED_ORDERS';

-- Define or change the clustering key
ALTER TABLE partitioned_orders CLUSTER BY (TO_DATE(order_date));
-- Snowflake's Automatic Clustering service then maintains clustering in the background;
-- no manual recluster step is required

-- Check clustering depth (1 is optimal)
SELECT SYSTEM$CLUSTERING_DEPTH('partitioned_orders', '(order_date)');

-- View automatic clustering history
SELECT 
    table_name,
    start_time,
    end_time,
    num_rows_reclustered,
    num_bytes_reclustered
FROM TABLE(INFORMATION_SCHEMA.AUTOMATIC_CLUSTERING_HISTORY())
WHERE table_name = 'PARTITIONED_ORDERS'
ORDER BY start_time DESC;


## 8. Materialized Views

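**Materialized Views** (implemented as Indexed Views in SQL Server) are database objects that physically store the results of a query. Unlike regular views, they contain actual data. Depending on the platform, the stored results are either kept in sync with the base tables automatically (SQL Server indexed views, Snowflake materialized views) or refreshed periodically or on demand.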

### Benefits:
- **Pre-computed Results**: Expensive calculations done once
- **Query Performance**: Fast access to aggregated or joined data
- **Reduced Load**: Less processing on base tables
- **Consistency**: Snapshot of data at refresh time

### Use Cases:
- Complex aggregations
- Expensive joins
- Frequently accessed query results
- Data warehouse reporting
- Summary tables

### Trade-offs:
- **Storage**: Requires additional storage space
- **Maintenance**: Must be kept in sync, either automatically on base-table changes or by periodic refresh
- **Staleness**: With periodic refresh, data may be slightly outdated
- **Overhead**: Refresh operations consume resources
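
The maintenance cost comes from applying every base-table change to the stored result. A minimal sketch of incremental maintenance for a per-customer summary is shown below; the class and method names are hypothetical. It stores only a count and a sum and derives the average on read, which is also why engines that maintain aggregates incrementally tend to prefer `SUM` and `COUNT` over a stored `AVG`:

```python
from collections import defaultdict

class CustomerOrderSummary:
    """Toy materialized aggregate keyed by customer_id."""

    def __init__(self):
        self.order_count = defaultdict(int)
        self.total_spent = defaultdict(float)

    def apply_insert(self, customer_id, amount):
        """Fold a newly inserted order into the stored aggregates."""
        self.order_count[customer_id] += 1
        self.total_spent[customer_id] += amount

    def apply_delete(self, customer_id, amount):
        """Remove a deleted order; drop the group when it becomes empty."""
        self.order_count[customer_id] -= 1
        self.total_spent[customer_id] -= amount
        if self.order_count[customer_id] == 0:
            del self.order_count[customer_id]
            del self.total_spent[customer_id]

    def avg_order_amount(self, customer_id):
        """AVG is derived from SUM and COUNT instead of being stored."""
        return self.total_spent[customer_id] / self.order_count[customer_id]

summary = CustomerOrderSummary()
summary.apply_insert(1, 10.0)
summary.apply_insert(1, 30.0)
summary.apply_insert(2, 20.0)
avg_before = summary.avg_order_amount(1)

summary.apply_delete(1, 10.0)
avg_after = summary.avg_order_amount(1)
print(avg_before, avg_after)
```

Each base-table change costs a constant amount of extra work here, while reads of the summary never touch the order rows.
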


In [None]:
-- ============================================
-- SQL SERVER: Indexed Views (Materialized Views)
-- ============================================

-- Note: SQL Server uses "Indexed Views" instead of materialized views
-- They are automatically maintained and always up-to-date

-- Create base tables
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    region VARCHAR(50)
);

-- Step 1: Create a view
CREATE VIEW dbo.vw_CustomerOrderSummary
WITH SCHEMABINDING
AS
SELECT 
    c.customer_id,
    c.customer_name,
    c.region,
    COUNT_BIG(*) AS order_count,
    SUM(ISNULL(o.total_amount, 0)) AS total_spent
FROM dbo.customers c
INNER JOIN dbo.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.region;
-- SCHEMABINDING is required for indexed views
-- AVG is not allowed in an indexed view; derive it as total_spent / order_count
-- SUM must not reference a nullable expression, hence ISNULL

-- Step 2: Create unique clustered index (this materializes the view)
CREATE UNIQUE CLUSTERED INDEX IX_CustomerOrderSummary
ON vw_CustomerOrderSummary(customer_id);

-- Query the indexed view (uses pre-computed results)
-- On editions other than Enterprise, add WITH (NOEXPAND) so the optimizer uses the view's index
SELECT customer_name, total_spent, order_count
FROM vw_CustomerOrderSummary
WHERE region = 'North';

-- View is automatically maintained - no refresh needed
-- Data is always current with base tables

-- Check indexed view usage
SELECT 
    OBJECT_NAME(object_id) AS ViewName,
    type_desc,
    is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID('vw_CustomerOrderSummary');

-- Drop indexed view
DROP VIEW vw_CustomerOrderSummary;

-- ============================================
-- SNOWFLAKE: Materialized Views
-- ============================================

-- Create base tables
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);

CREATE TABLE customers (
    customer_id INT,
    customer_name VARCHAR(100),
    region VARCHAR(50)
);

-- Create materialized view (requires Enterprise Edition or higher)
-- A Snowflake materialized view can query only a single table, so joins are not allowed
CREATE MATERIALIZED VIEW mv_customer_order_summary
AS
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent,
    AVG(total_amount) AS avg_order_amount
FROM orders
GROUP BY customer_id;

-- Query materialized view (uses pre-computed results; join to customers at query time)
SELECT c.customer_name, mv.total_spent, mv.order_count
FROM mv_customer_order_summary mv
INNER JOIN customers c ON mv.customer_id = c.customer_id
WHERE c.region = 'North';

-- Snowflake maintains materialized views automatically in the background;
-- there is no manual REFRESH command or AUTO_REFRESH setting for them

-- Suspend or resume background maintenance
ALTER MATERIALIZED VIEW mv_customer_order_summary SUSPEND;
ALTER MATERIALIZED VIEW mv_customer_order_summary RESUME;

-- View materialized view information
SHOW MATERIALIZED VIEWS LIKE 'mv_customer_order_summary';

SELECT 
    table_name,
    table_type,
    comment
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'MV_CUSTOMER_ORDER_SUMMARY';

-- Check materialized view refresh history
SELECT 
    materialized_view_name,
    start_time,
    end_time,
    credits_used
FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY(
    MATERIALIZED_VIEW_NAME => 'mv_customer_order_summary'))
ORDER BY start_time DESC;

-- Drop materialized view
DROP MATERIALIZED VIEW mv_customer_order_summary;

-- ============================================
-- Performance Comparison Example
-- ============================================

-- Without materialized view (slower - computes on-the-fly)
SELECT 
    c.region,
    COUNT(*) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.region;
-- Execution time: ~5 seconds (example)

-- With materialized view (faster - uses pre-computed data)
SELECT 
    c.region,
    SUM(mv.order_count) AS total_orders,
    SUM(mv.total_spent) AS total_spent
FROM mv_customer_order_summary mv
INNER JOIN customers c ON mv.customer_id = c.customer_id
GROUP BY c.region;
-- Execution time: ~0.1 seconds (example)


## Performance Optimization Best Practices

### General Guidelines:

1. **Monitor Execution Plans**
   - Regularly review execution plans for expensive operations
   - Look for table scans, missing indexes, and inefficient joins

2. **Index Strategy**
   - Create indexes on frequently queried columns
   - Use covering indexes to avoid key lookups
   - Monitor index usage and remove unused indexes
   - Keep statistics updated

3. **Query Optimization**
   - Use WHERE clauses to filter early
   - Avoid SELECT * when possible
   - Use appropriate join types
   - Consider query hints only when necessary

4. **Partitioning Strategy**
   - Partition large tables by date or other frequently filtered columns
   - Ensure partition elimination in queries
   - Align indexes with partition scheme

5. **Materialized Views**
   - Use for expensive, frequently-run queries
   - Balance refresh frequency with data freshness requirements
   - Monitor storage costs

### SQL Server Specific:
- Keep statistics updated: `UPDATE STATISTICS`
- Rebuild or reorganize indexes based on measured fragmentation
- Use appropriate fill factor for indexes
- Monitor fragmentation levels

### Snowflake Specific:
- Define clustering keys on large tables whose queries filter on those columns
- Let Automatic Clustering maintain them, and monitor its credit usage
- Use Search Optimization Service for point lookups
- Monitor query profiles and partition pruning
- Consider materialized views for complex aggregations
