# CTAS and Temporary Tables in Snowflake

## Learning Objectives
- Understand CTAS (Create Table As Select) and its syntax
- Learn the differences between CTAS and CREATE/INSERT
- Understand CTAS vs Views
- Learn how databases execute CTAS operations
- Explore use cases for CTAS in performance optimization and data warehousing
- Understand temporary tables and their syntax
- Learn how databases execute temporary tables
- Explore use cases for temporary tables


## 1. Introduction to CTAS (Create Table As Select)

**CTAS** (Create Table As Select) is a SQL statement that creates a new table and populates it with data from a SELECT query in a single operation. It's one of the most efficient ways to create and populate a table in a data warehouse.

### Key Characteristics:
- **Single Operation**: Creates table and inserts data in one statement
- **Atomic**: Either succeeds completely or fails completely
- **Efficient**: Optimized for bulk data loading
- **Schema Inference**: Automatically infers column types from SELECT query
- **No Separate INSERT**: Data is loaded during table creation

### Why Use CTAS?
1. **Performance**: Loads the data as one set-based statement; the gain is largest compared with row-by-row inserts, while `INSERT ... SELECT` is usually comparable
2. **Simplicity**: One statement instead of multiple
3. **Atomicity**: All-or-nothing operation
4. **Schema Flexibility**: Automatically creates schema from query results


## 2. CTAS Syntax

The basic syntax for CTAS in Snowflake is:

```sql
CREATE [OR REPLACE] TABLE [IF NOT EXISTS] table_name
[(
    column_name [column_type],
    ...
)]
[CLUSTER BY (expr, ...)]
AS
SELECT ...
[ORDER BY ...];
```

### Syntax Components:
- **CREATE TABLE**: Standard table creation clause
- **OR REPLACE**: Optionally replace an existing table (cannot be combined with `IF NOT EXISTS` in the same statement)
- **IF NOT EXISTS**: Only create the table if it doesn't already exist
- **Column list**: Optional column names and types; when omitted, both are inferred from the query
- **CLUSTER BY**: Optional clustering key; it belongs to the table definition, so it goes before `AS`
- **AS SELECT**: The query that populates the table
- **ORDER BY**: Optional ordering of the query result (determines the initial physical order of the stored rows in Snowflake)
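
As a minimal sketch of how these clauses fit together, the statement below builds a table with a clustering key, an explicit cast, and an initial sort order. The table name `orders_typed_sketch` is hypothetical, and the sketch assumes the sample `orders` table created in the next cell already exists.

```sql
-- Hypothetical table name; assumes the sample `orders` table exists.
CREATE OR REPLACE TABLE orders_typed_sketch
    CLUSTER BY (order_date)                        -- clustering key goes before AS
AS
SELECT
    order_id,
    order_date,
    total_amount::NUMBER(12, 2) AS total_amount,   -- explicit cast controls the inferred type
    UPPER(status)               AS status
FROM orders
ORDER BY order_date;                               -- initial physical order of the written rows

-- Inspect the column names and types that were inferred from the query
DESCRIBE TABLE orders_typed_sketch;
```

Casting inside the SELECT is one way to control the resulting column types without listing columns explicitly. Defining a clustering key also opts the table into automatic reclustering, which may consume credits, so on small tables the `ORDER BY` alone is often enough.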


In [None]:
-- First, let's create some sample tables in Snowflake for our examples
-- Note: Snowflake records PRIMARY KEY constraints on standard tables but does not enforce them
-- Create a customers table
CREATE OR REPLACE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    city VARCHAR(50),
    country VARCHAR(50),
    registration_date DATE,
    customer_tier VARCHAR(20)
);

-- Create an orders table
CREATE OR REPLACE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status VARCHAR(20),
    product_category VARCHAR(50)
);

-- Create a products table
CREATE OR REPLACE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2),
    stock_quantity INT
);

-- Insert dummy data into customers table
INSERT INTO customers VALUES
(1, 'John', 'Doe', 'john.doe@email.com', 'New York', 'USA', '2023-01-15', 'Gold'),
(2, 'Jane', 'Smith', 'jane.smith@email.com', 'London', 'UK', '2023-02-20', 'Platinum'),
(3, 'Bob', 'Johnson', 'bob.johnson@email.com', 'Toronto', 'Canada', '2023-03-10', 'Silver'),
(4, 'Alice', 'Williams', 'alice.williams@email.com', 'Sydney', 'Australia', '2023-04-05', 'Gold'),
(5, 'Charlie', 'Brown', 'charlie.brown@email.com', 'New York', 'USA', '2023-05-12', 'Bronze'),
(6, 'David', 'Miller', 'david.miller@email.com', 'Chicago', 'USA', '2023-06-01', 'Platinum'),
(7, 'Emma', 'Davis', 'emma.davis@email.com', 'Manchester', 'UK', '2023-07-10', 'Silver');

-- Insert dummy data into orders table
INSERT INTO orders VALUES
(101, 1, '2023-06-01', 150.00, 'Completed', 'Electronics'),
(102, 1, '2023-07-15', 250.50, 'Completed', 'Clothing'),
(103, 2, '2023-06-20', 75.25, 'Pending', 'Books'),
(104, 3, '2023-08-01', 320.00, 'Completed', 'Electronics'),
(105, 4, '2023-08-10', 180.75, 'Completed', 'Clothing'),
(106, 1, '2023-09-05', 95.50, 'Pending', 'Books'),
(107, 5, '2023-09-12', 210.00, 'Completed', 'Electronics'),
(108, 2, '2023-09-20', 450.00, 'Completed', 'Electronics'),
(109, 6, '2023-10-01', 275.00, 'Completed', 'Clothing'),
(110, 7, '2023-10-05', 125.00, 'Completed', 'Books');

-- Insert dummy data into products table
INSERT INTO products VALUES
(1, 'Laptop', 'Electronics', 999.99, 50),
(2, 'Smartphone', 'Electronics', 699.99, 100),
(3, 'T-Shirt', 'Clothing', 29.99, 200),
(4, 'Jeans', 'Clothing', 79.99, 150),
(5, 'Novel', 'Books', 14.99, 300),
(6, 'Textbook', 'Books', 89.99, 75);


In [None]:
-- Example 1: Basic CTAS - Simple table creation from query
-- Create a table with customer summary information
CREATE OR REPLACE TABLE customer_summary AS
SELECT 
    customer_id,
    first_name || ' ' || last_name AS full_name,
    email,
    city,
    country,
    customer_tier
FROM customers
WHERE country = 'USA';

-- Query the newly created table
SELECT * FROM customer_summary;


In [None]:
-- Example 2: CTAS with explicit column definitions
-- Create a table with specific column types and constraints
CREATE OR REPLACE TABLE high_value_customers (
    customer_id INT NOT NULL,
    customer_name VARCHAR(200),
    total_spent DECIMAL(12, 2),
    order_count INT,
    avg_order_value DECIMAL(10, 2)
) AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    SUM(o.total_amount) AS total_spent,
    COUNT(o.order_id) AS order_count,
    AVG(o.total_amount) AS avg_order_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING SUM(o.total_amount) > 200;

-- Query the table
SELECT * FROM high_value_customers ORDER BY total_spent DESC;


In [None]:
-- Example 3: CTAS with ORDER BY (for clustering optimization in Snowflake)
-- Create a table ordered by country for better query performance
CREATE OR REPLACE TABLE customer_by_country AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    c.city,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country, c.city
ORDER BY c.country, total_spent DESC;

-- Query the table
SELECT * FROM customer_by_country;


## 3. CTAS vs CREATE/INSERT

Understanding the differences between CTAS and the traditional CREATE + INSERT approach is crucial for performance optimization.

### Comparison:

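| Aspect | CTAS | CREATE + INSERT |
|--------|------|-----------------|
| **Operations** | Single atomic statement | Two separate statements |
| **Performance** | Set-based load in one statement | Comparable with `INSERT ... SELECT`; much slower with row-by-row `INSERT ... VALUES` |
| **Transaction** | One statement, committed as a unit | `CREATE TABLE` commits on its own; the `INSERT` runs as a separate transaction |
| **Schema** | Inferred from SELECT (or listed explicitly) | Must be explicitly defined |
| **Error Handling** | All-or-nothing | Partial success possible: the table can exist but stay empty if the `INSERT` fails |
| **Use Case** | Bulk loads and full rebuilds | Incremental loads into an existing table |
| **Locking** | New table is not visible to other sessions until the statement completes | No row-level locks on standard Snowflake tables; inserts add new micro-partitions |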


In [None]:
-- Demonstration: CTAS vs CREATE/INSERT

-- Method 1: Using CTAS (single statement)
CREATE OR REPLACE TABLE customer_orders_ctas AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    o.order_id,
    o.order_date,
    o.total_amount,
    o.status
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'Completed';

-- Method 2: Using CREATE + INSERT (two statements, schema defined by hand)
-- Step 1: Create the table structure
CREATE OR REPLACE TABLE customer_orders_insert (
    customer_id INT,
    customer_name VARCHAR(200),
    order_id INT,
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status VARCHAR(20)
);

-- Step 2: Insert data
INSERT INTO customer_orders_insert
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    o.order_id,
    o.order_date,
    o.total_amount,
    o.status
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'Completed';

-- Both methods produce the same rows; CTAS does it in one atomic statement
SELECT 'CTAS Method' AS method, COUNT(*) AS row_count FROM customer_orders_ctas
UNION ALL
SELECT 'CREATE+INSERT Method' AS method, COUNT(*) AS row_count FROM customer_orders_insert;


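### Why CTAS Can Be Faster:

1. **Optimized Execution Plan**: The database plans table creation and loading as one unit
2. **Set-Based Loading**: Rows are written in bulk rather than one at a time; the biggest gain is over row-by-row `INSERT ... VALUES`, since `INSERT ... SELECT` is also set-based
3. **Reduced Overhead**: One statement to parse, plan, and submit instead of two
4. **Simpler Concurrency**: The new table is not visible to other sessions until the statement completes, so the load does not contend with readers or writers
5. **Less Logging in Some Engines**: Engines such as SQL Server and Oracle can reduce transaction logging for CTAS-style loads; Snowflake has no user-managed transaction log and instead writes new micro-partitions
6. **Parallel Processing**: The SELECT and the write are distributed across the warehouse's compute resources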
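
Rather than assuming one approach is faster, the timings can be measured. The sketch below reads the current session's query history after running the demonstration cell above; it assumes the two demo tables `customer_orders_ctas` and `customer_orders_insert` were created in this session. On the small sample data the differences will be negligible, so treat it as a template for larger tables.

```sql
-- Compare elapsed times (in milliseconds) of the statements that touched the demo tables.
SELECT
    query_type,
    total_elapsed_time,
    compilation_time,
    execution_time,
    LEFT(query_text, 60) AS query_start
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(RESULT_LIMIT => 100))
WHERE (query_text ILIKE '%customer_orders_ctas%'
       OR query_text ILIKE '%customer_orders_insert%')
  AND query_text NOT ILIKE '%QUERY_HISTORY_BY_SESSION%'   -- exclude this query itself
ORDER BY start_time;
```

For the CREATE + INSERT method, add the elapsed times of both statements before comparing them with the single CTAS statement.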


## 4. CTAS vs Views

CTAS creates a physical table, while views are virtual. Understanding when to use each is important.

### Comparison:

| Aspect | CTAS (Table) | View |
|--------|--------------|------|
| **Storage** | Physical storage of data | No physical storage (virtual) |
| **Data Freshness** | Static (snapshot at creation) | Dynamic (always current) |
| **Performance** | Fast (reads stored data) | Slower (executes query each time) |
| **Storage Cost** | Consumes storage space | No storage cost |
| **Update** | Requires manual refresh/reload | Always reflects current data |
| **Indexing / Clustering** | No indexes on standard Snowflake tables; a clustering key can be defined instead | Not applicable to regular views |
| **Use Case** | Historical snapshots, performance | Real-time data access |
| **DML Operations** | Full support | Not supported (Snowflake views are read-only) |


In [None]:
-- Demonstration: CTAS vs View

-- Method 1: Create a table using CTAS (physical storage)
CREATE OR REPLACE TABLE monthly_sales_table AS
SELECT 
    DATE_TRUNC('month', order_date) AS sales_month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE status = 'Completed'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sales_month DESC;

-- Method 2: Create a view (virtual, no storage)
CREATE OR REPLACE VIEW monthly_sales_view AS
SELECT 
    DATE_TRUNC('month', order_date) AS sales_month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE status = 'Completed'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sales_month DESC;

-- Both return the same data initially
SELECT 'Table (CTAS)' AS type, * FROM monthly_sales_table;
SELECT 'View' AS type, * FROM monthly_sales_view;

-- Now add a new order
INSERT INTO orders VALUES
(111, 2, '2023-10-15', 300.00, 'Completed', 'Electronics');

-- Query again - see the difference!
SELECT 'Table (CTAS) - Static' AS type, * FROM monthly_sales_table;
SELECT 'View - Dynamic' AS type, * FROM monthly_sales_view;

-- The table still has old data (needs to be refreshed)
-- The view automatically shows new data (dynamic)


### When to Use CTAS vs View:

**Use CTAS (Table) when:**
- You need fast query performance on large datasets
- Data doesn't need to be real-time
- You're creating historical snapshots
- You need a clustering key or a specific sort order for optimization
- You're doing ETL/ELT transformations
- You need to store aggregated/pre-computed data

**Use View when:**
- You need real-time, always-current data
- Storage cost is a concern
- Data changes frequently
- You want to simplify complex queries for end users
- You need to implement security/access control
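
Because a CTAS table is a snapshot, it has to be rebuilt to pick up new rows. A minimal refresh sketch, assuming the `monthly_sales_table` and `monthly_sales_view` objects from the demonstration above exist, is to rerun the same CTAS with `OR REPLACE`; `COPY GRANTS` is included here on the assumption that existing privileges on the table should be kept.

```sql
-- Rebuild the snapshot so it reflects the current contents of `orders`.
CREATE OR REPLACE TABLE monthly_sales_table COPY GRANTS AS
SELECT
    DATE_TRUNC('month', order_date) AS sales_month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE status = 'Completed'
GROUP BY DATE_TRUNC('month', order_date);

-- After the rebuild, the table and the view should agree month by month.
SELECT
    t.sales_month,
    t.total_revenue AS table_revenue,
    v.total_revenue AS view_revenue
FROM monthly_sales_table t
JOIN monthly_sales_view v ON t.sales_month = v.sales_month
ORDER BY t.sales_month;
```

How often to run such a refresh is a design choice: the less often it runs, the cheaper it is, but the staler the table becomes relative to the view.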


## 5. How Database Executes CTAS

Understanding the execution process helps optimize CTAS operations.

### Execution Steps:

1. **Query Parsing**: Database parses and compiles the statement, including the SELECT query
2. **Schema Inference**: Determines column names and types from the query's output (unless they are listed explicitly)
3. **Table Creation**: Prepares the new table definition, including any clustering key
4. **Query Execution**: Executes the SELECT query
5. **Data Loading**: Writes the results directly into the new table in bulk; in Snowflake this means new compressed, columnar micro-partitions
6. **Finalization**: Records table metadata; Snowflake standard tables have no indexes to build
7. **Commit**: Commits the statement (all-or-nothing); if it fails, no new table is left behind

### Key Optimizations:

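- **Bulk Loading**: Rows are written in bulk, not one at a time
- **Parallel Processing**: The SELECT and the write can run in parallel across the warehouse's compute resources
- **Minimal Logging**: An optimization in engines such as SQL Server; Snowflake has no user-managed transaction log to tune
- **Direct Path**: An optimization in engines such as Oracle that bypasses the buffer cache; Snowflake instead writes new immutable micro-partitions
- **Compression**: Applied automatically during the load (in Snowflake)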
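
The outcome of these steps can be inspected after the fact. The sketch below assumes the `customer_summary` table from Example 1 exists in the current database and schema, and reads the inferred schema and the stored size from `INFORMATION_SCHEMA`.

```sql
-- Column names, types, and nullability inferred by the CTAS.
SELECT column_name, data_type, is_nullable
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = CURRENT_SCHEMA()
  AND table_name = 'CUSTOMER_SUMMARY'      -- unquoted identifiers are stored in upper case
ORDER BY ordinal_position;

-- Row count and stored bytes of the resulting table.
SELECT table_name, table_type, row_count, bytes, created
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = CURRENT_SCHEMA()
  AND table_name = 'CUSTOMER_SUMMARY';
```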


In [None]:
-- Example: Understanding CTAS execution with EXPLAIN
-- In Snowflake, EXPLAIN compiles the statement and returns its plan without executing it,
-- so no table is created by this cell

EXPLAIN
CREATE OR REPLACE TABLE customer_analytics AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS lifetime_value,
    AVG(o.total_amount) AS avg_order_value,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country;

-- The execution plan shows:
-- 1. How the SELECT query is executed
-- 2. Join operations
-- 3. Aggregations
-- 4. Table creation and data loading


## 6. CTAS Use Case - Performance Optimization

CTAS is commonly used to optimize query performance by pre-computing expensive operations.

### Performance Optimization Strategies:

1. **Pre-aggregation**: Store aggregated data instead of computing on-the-fly
2. **Denormalization**: Combine related data to reduce joins
3. **Filtering**: Pre-filter data to reduce table size
4. **Clustering**: Order data for better query performance
5. **Materialization**: Store complex query results for fast access
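
For the clustering strategy, Snowflake can report how well a table is ordered on a given column. The sketch below is an illustration only: `orders_sorted_sketch` is a hypothetical table name, and it assumes the sample `orders` table exists.

```sql
-- Write the rows sorted by date so that similar dates land in the same micro-partitions.
CREATE OR REPLACE TABLE orders_sorted_sketch AS
SELECT order_id, customer_id, order_date, total_amount
FROM orders
ORDER BY order_date;

-- Returns a JSON document describing how the table is clustered on order_date.
-- Passing the column list allows this to run on a table with no clustering key.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders_sorted_sketch', '(order_date)');
```

On the sample data everything fits in a single micro-partition, so the report is trivial; the check becomes informative on tables large enough to span many micro-partitions.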


In [None]:
-- Use Case 1: Pre-aggregation for Performance
-- Instead of running expensive aggregations every time, create a pre-aggregated table

-- Slow approach: recomputing the join and aggregation on every query, e.g.
-- SELECT c.country, SUM(o.total_amount)
-- FROM customers c JOIN orders o ON c.customer_id = o.customer_id
-- GROUP BY c.country; (runs every time)

-- Fast approach: Pre-aggregate using CTAS
CREATE OR REPLACE TABLE country_sales_summary AS
SELECT 
    c.country,
    COUNT(DISTINCT c.customer_id) AS total_customers,
    COUNT(DISTINCT o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_revenue,
    AVG(o.total_amount) AS avg_order_value,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.country;

-- Now queries are much faster (reading from pre-aggregated table)
SELECT * FROM country_sales_summary ORDER BY total_revenue DESC;


In [None]:
-- Use Case 2: Denormalization for Performance
-- Combine multiple tables into one to avoid joins

-- Instead of joining customers and orders every time:
-- SELECT ... FROM customers c JOIN orders o ON c.customer_id = o.customer_id

-- Create a denormalized table with CTAS
CREATE OR REPLACE TABLE customer_orders_denormalized AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.email,
    c.city,
    c.country,
    c.customer_tier,
    o.order_id,
    o.order_date,
    o.total_amount,
    o.status,
    o.product_category
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- Now queries don't need joins (faster reads)
SELECT 
    country,
    product_category,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM customer_orders_denormalized
WHERE status = 'Completed'
GROUP BY country, product_category
ORDER BY revenue DESC;


In [None]:
-- Use Case 3: Pre-filtering and Clustering for Performance
-- Create a filtered, clustered table for specific query patterns

-- Create a table with only completed orders, ordered by date for time-series queries
CREATE OR REPLACE TABLE completed_orders_clustered AS
SELECT 
    o.order_id,
    o.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    o.order_date,
    o.total_amount,
    o.product_category
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'Completed'
ORDER BY o.order_date DESC, o.total_amount DESC;

-- Sorting on write helps Snowflake prune micro-partitions for date filters once the table is large
SELECT * 
FROM completed_orders_clustered 
WHERE order_date >= '2023-09-01'
ORDER BY order_date DESC;


## 7. CTAS Use Case - Data Warehouse

In data warehouse environments, CTAS is essential for ETL/ELT processes and data transformation.

### Data Warehouse Use Cases:

1. **Staging Tables**: Load raw data into staging area
2. **Transformation**: Transform data between layers (staging → ODS → DWH)
3. **Data Marts**: Create subject-specific data marts
4. **Historical Snapshots**: Create point-in-time snapshots
5. **Data Quality**: Create cleaned/validated datasets
6. **Incremental Loads**: Create delta tables for incremental processing
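
The incremental-load case can be sketched with a simple date watermark. The table names `orders_loaded_sketch` and `orders_delta_sketch` are hypothetical, and the sketch assumes the sample `orders` table exists; a production pipeline would more likely use a load timestamp or a change-tracking mechanism than the business date.

```sql
-- Hypothetical target that already holds the orders placed before September 2023.
CREATE OR REPLACE TABLE orders_loaded_sketch AS
SELECT * FROM orders WHERE order_date < '2023-09-01';

-- Delta table: source rows newer than anything already loaded.
CREATE OR REPLACE TABLE orders_delta_sketch AS
SELECT o.*
FROM orders o
WHERE o.order_date > (
    SELECT COALESCE(MAX(order_date), '1900-01-01'::DATE)
    FROM orders_loaded_sketch
);

-- Append the delta to the target.
INSERT INTO orders_loaded_sketch
SELECT * FROM orders_delta_sketch;
```

A strict `>` comparison on a date misses late-arriving rows that share the watermark date, which is one reason this is only a sketch.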


In [None]:
-- Use Case 1: Creating a Data Mart (Sales Data Mart)
-- A data mart is a subject-specific subset of a data warehouse

CREATE OR REPLACE TABLE sales_data_mart AS
SELECT 
    o.order_id,
    o.order_date,
    DATE_TRUNC('month', o.order_date) AS order_month,
    DATE_TRUNC('quarter', o.order_date) AS order_quarter,
    DATE_TRUNC('year', o.order_date) AS order_year,
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country AS customer_country,
    c.customer_tier,
    o.product_category,
    o.total_amount AS order_amount,
    o.status AS order_status,
    CASE 
        WHEN o.total_amount > 300 THEN 'High Value'
        WHEN o.total_amount > 150 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS order_segment
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

-- This data mart can be used for sales analytics
SELECT 
    order_year,
    order_quarter,
    customer_country,
    product_category,
    COUNT(*) AS order_count,
    SUM(order_amount) AS total_revenue
FROM sales_data_mart
WHERE order_status = 'Completed'
GROUP BY order_year, order_quarter, customer_country, product_category
ORDER BY order_year DESC, order_quarter DESC, total_revenue DESC;


In [None]:
-- Use Case 2: Creating Historical Snapshot
-- Capture data at a specific point in time for reporting

CREATE OR REPLACE TABLE customer_snapshot_2023_q3 AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    c.customer_tier,
    COUNT(o.order_id) AS q3_order_count,
    SUM(o.total_amount) AS q3_total_spent,
    AVG(o.total_amount) AS q3_avg_order_value,
    MAX(o.order_date) AS last_order_date_q3
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id 
    AND o.status = 'Completed'
    AND o.order_date >= '2023-07-01' 
    AND o.order_date < '2023-10-01'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country, c.customer_tier;

-- This snapshot preserves Q3 2023 data even if source data changes
SELECT * FROM customer_snapshot_2023_q3 ORDER BY q3_total_spent DESC;


In [None]:
-- Use Case 3: Data Quality - Creating Cleaned Dataset
-- Apply data quality rules and create a clean dataset

CREATE OR REPLACE TABLE cleaned_customer_orders AS
SELECT 
    o.order_id,
    o.customer_id,
    COALESCE(c.first_name || ' ' || c.last_name, 'Unknown Customer') AS customer_name,
    COALESCE(c.country, 'Unknown') AS country,
    o.order_date,
    -- Data quality: Ensure amount is positive
    CASE 
        WHEN o.total_amount < 0 THEN 0 
        ELSE o.total_amount 
    END AS total_amount,
    -- Data quality: Standardize status
    UPPER(TRIM(o.status)) AS status,
    COALESCE(o.product_category, 'Uncategorized') AS product_category,
    -- Add data quality flags
    CASE WHEN o.total_amount < 0 THEN 1 ELSE 0 END AS has_negative_amount,
    CASE WHEN c.customer_id IS NULL THEN 1 ELSE 0 END AS has_missing_customer
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date IS NOT NULL  -- Filter out invalid dates
    AND o.order_id IS NOT NULL;  -- Filter out invalid order IDs

-- Query the cleaned dataset
SELECT 
    status,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    SUM(has_negative_amount) AS orders_with_negative_amount,
    SUM(has_missing_customer) AS orders_with_missing_customer
FROM cleaned_customer_orders
GROUP BY status;


## 8. Introduction to Temporary Tables

**Temporary tables** are tables that exist only for the duration of the session in which they are created (in Snowflake they are session-scoped, not transaction-scoped). They are useful for storing intermediate results during complex data processing.

### Key Characteristics:
- **Session-scoped**: Exists only for the current session
- **Automatic Cleanup**: Automatically dropped when session ends
- **No Persistence**: Data is not persisted across sessions
- **Storage**: In Snowflake, stored like other tables and billed for storage while they exist, but with no Fail-safe period and at most one day of Time Travel
- **Isolation**: Each session has its own temporary tables
- **No Sharing**: Cannot be accessed by other sessions

### Why Use Temporary Tables?
1. **Intermediate Results**: Store results of complex calculations
2. **Lower Overhead**: No Fail-safe storage and no cleanup job to maintain; query speed is generally the same as for permanent tables
3. **Isolation**: Each session's data is isolated
4. **Cleanup**: Automatic cleanup reduces maintenance
5. **Testing**: Safe for testing without affecting production data


## 9. Temporary Table Syntax and Examples

In Snowflake, temporary tables are created using the `TEMPORARY` or `TEMP` keyword.

### Syntax:

```sql
-- The column list is required unless AS SELECT is used
CREATE [OR REPLACE] TEMPORARY TABLE [IF NOT EXISTS] table_name
[(
    column_name column_type [DEFAULT default_value] [NOT NULL],
    ...
)]
[AS SELECT ...];
```

### Key Points:
- Use `TEMPORARY` or `TEMP` keyword
- Automatically dropped at end of session
- Can use CTAS syntax with temporary tables
- Cannot be shared across sessions
- Stored like regular Snowflake tables (no inherent speed advantage), but without Fail-safe
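
To confirm that a table really is temporary, its kind can be read back from `SHOW TABLES`. This is a minimal sketch: `temp_scope_sketch` is a hypothetical name, and it assumes the sample `customers` table exists.

```sql
CREATE OR REPLACE TEMPORARY TABLE temp_scope_sketch AS
SELECT customer_id, country
FROM customers;

-- List the table; the output includes a "kind" column.
SHOW TABLES LIKE 'temp_scope_sketch';

-- Read the SHOW output as a result set (SHOW column names are lower case, hence the quotes).
SELECT "name", "kind"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```

Running the same `SHOW TABLES` from a different session should not list the table at all, since temporary tables are visible only to the session that created them.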


In [None]:
-- Example 1: Create a temporary table with explicit schema
CREATE OR REPLACE TEMPORARY TABLE temp_customer_summary (
    customer_id INT,
    customer_name VARCHAR(200),
    total_orders INT,
    total_spent DECIMAL(10, 2)
);

-- Insert data into temporary table
INSERT INTO temp_customer_summary
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name;

-- Query the temporary table
SELECT * FROM temp_customer_summary ORDER BY total_spent DESC;


In [None]:
-- Example 2: Create temporary table using CTAS (Temporary CTAS)
CREATE OR REPLACE TEMPORARY TABLE temp_high_value_customers AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country
HAVING SUM(o.total_amount) > 200;

-- Query the temporary table
SELECT * FROM temp_high_value_customers ORDER BY total_spent DESC;


In [None]:
-- Example 3: Using temporary table for multi-step processing
-- Step 1: Create temporary table with filtered data
CREATE OR REPLACE TEMPORARY TABLE temp_recent_orders AS
SELECT 
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount,
    o.product_category
FROM orders o
WHERE o.order_date >= '2023-09-01'
    AND o.status = 'Completed';

-- Step 2: Use temporary table in further processing
CREATE OR REPLACE TEMPORARY TABLE temp_customer_analysis AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    COUNT(tro.order_id) AS recent_order_count,
    SUM(tro.total_amount) AS recent_spending,
    AVG(tro.total_amount) AS avg_recent_order_value
FROM customers c
INNER JOIN temp_recent_orders tro ON c.customer_id = tro.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.country;

-- Step 3: Final query using temporary tables
SELECT 
    country,
    COUNT(*) AS customer_count,
    SUM(recent_order_count) AS total_orders,
    SUM(recent_spending) AS total_revenue,
    AVG(recent_spending) AS avg_customer_spending
FROM temp_customer_analysis
GROUP BY country
ORDER BY total_revenue DESC;


In [None]:
-- Example 4: Temporary table for data transformation pipeline
-- Transform and clean data in stages using temporary tables

-- Stage 1: Extract and filter
CREATE OR REPLACE TEMPORARY TABLE temp_stage1_raw AS
SELECT 
    o.*,
    c.country,
    c.customer_tier
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2023-08-01';

-- Stage 2: Transform and enrich
CREATE OR REPLACE TEMPORARY TABLE temp_stage2_transformed AS
SELECT 
    order_id,
    customer_id,
    order_date,
    DATE_TRUNC('week', order_date) AS order_week,
    total_amount,
    product_category,
    country,
    customer_tier,
    CASE 
        WHEN total_amount > 300 THEN 'High'
        WHEN total_amount > 150 THEN 'Medium'
        ELSE 'Low'
    END AS order_size_category
FROM temp_stage1_raw
WHERE status = 'Completed';

-- Stage 3: Final aggregation
SELECT 
    order_week,
    country,
    customer_tier,
    order_size_category,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM temp_stage2_transformed
GROUP BY order_week, country, customer_tier, order_size_category
ORDER BY order_week DESC, revenue DESC;


## 10. How Database Executes Temporary Tables

Understanding temporary table execution helps optimize their usage.

### Execution Process:

1. **Table Creation**: Creates the table structure, visible only to the current session
2. **Storage Allocation**: In Snowflake, data is written to the same storage layer used by other tables, not held in memory
3. **Data Loading**: Loads data using same mechanisms as regular tables
4. **Query Execution**: Queries execute against temporary table
5. **Session Management**: Table exists only for current session
6. **Automatic Cleanup**: Dropped automatically when session ends

### Key Differences from Permanent Tables:

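- **Storage Location**: Same storage layer as permanent tables in Snowflake, but with no Fail-safe and at most one day of Time Travel
- **Lifecycle**: Session-scoped, not transaction-scoped
- **Isolation**: Each session sees only its own temporary tables
- **Performance**: Comparable to permanent tables; the benefit is lifecycle and cost, not speed
- **Cleanup**: Automatic at session end; an explicit DROP is optional but frees storage sooner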
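
Since a temporary table is billed for storage for as long as it exists, long-running sessions may want to drop large intermediates explicitly instead of waiting for the session to end. A minimal sketch, using the hypothetical name `temp_cleanup_sketch` and assuming the sample `orders` table exists:

```sql
CREATE OR REPLACE TEMPORARY TABLE temp_cleanup_sketch AS
SELECT order_id, total_amount
FROM orders
WHERE status = 'Completed';

-- Use the intermediate result ...
SELECT COUNT(*) AS completed_orders
FROM temp_cleanup_sketch;

-- ... then release it as soon as it is no longer needed.
DROP TABLE IF EXISTS temp_cleanup_sketch;
```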


In [None]:
-- Example: Understanding temporary table execution
-- In Snowflake, temporary tables use the same storage layer and query engine as permanent tables

-- Create a temporary table
CREATE OR REPLACE TEMPORARY TABLE temp_order_analysis AS
SELECT 
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount,
    o.product_category,
    c.country,
    c.customer_tier
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'Completed';

-- View execution plan for querying temporary table
EXPLAIN
SELECT 
    country,
    product_category,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM temp_order_analysis
GROUP BY country, product_category;

-- Note: Temporary tables behave like regular tables for query execution
-- The difference is in visibility (current session only), data retention, and lifecycle management


### Temporary Table Storage in Snowflake:

- **Storage**: Stored in the same storage layer as permanent tables and billed while the table exists; no Fail-safe, and Time Travel is limited to 0 or 1 day
- **Performance**: Comparable to permanent tables
- **Isolation**: Visible only to the session that created them
- **Cleanup**: Removed when:
  - The session ends (for example, when the connection closes)
  - An explicit DROP command is issued
- **Naming**: Can share a name with a permanent table in the same schema; within the session, the temporary table takes precedence and hides the permanent one
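
The naming rule can be illustrated with a small sketch. The name `shadow_demo_sketch` is hypothetical and nothing here touches the sample tables; the behaviour described in the comments follows from the documented precedence of temporary tables within a session.

```sql
-- A permanent table ...
CREATE OR REPLACE TABLE shadow_demo_sketch AS
SELECT 'permanent' AS table_kind;

-- ... and a temporary table with the same name in the same schema.
CREATE TEMPORARY TABLE shadow_demo_sketch AS
SELECT 'temporary' AS table_kind;

-- Within this session the name resolves to the temporary table.
SELECT table_kind FROM shadow_demo_sketch;

-- Operations by name affect the temporary table, so this drops it first ...
DROP TABLE shadow_demo_sketch;

-- ... after which the permanent table is reachable again.
SELECT table_kind FROM shadow_demo_sketch;

-- Clean up the permanent demo table.
DROP TABLE shadow_demo_sketch;
```

Shadowing is easy to trigger by accident, which is one argument for giving temporary tables a distinct prefix such as `temp_`, as the examples in this notebook do.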


## 11. Temporary Table Use Cases

Temporary tables are invaluable for various data processing scenarios.

### Common Use Cases:

1. **ETL/ELT Pipelines**: Store intermediate transformation results
2. **Complex Calculations**: Break down complex queries into steps
3. **Data Validation**: Store validation results temporarily
4. **Testing**: Test queries without affecting production data
5. **Staging Data**: Temporary staging for data loads
6. **Session-specific Processing**: Process data specific to a session
7. **Performance Optimization**: Store frequently accessed intermediate results
8. **Data Deduplication**: Identify and handle duplicates


In [None]:
-- Use Case 1: ETL Pipeline - Multi-stage Transformation
-- Break down complex ETL into manageable steps

-- Stage 1: Extract and initial filter
CREATE OR REPLACE TEMPORARY TABLE temp_extract AS
SELECT * FROM orders WHERE order_date >= '2023-09-01';

-- Stage 2: Transform - Add calculated fields
CREATE OR REPLACE TEMPORARY TABLE temp_transform AS
SELECT 
    o.*,
    c.country,
    c.customer_tier,
    CASE 
        WHEN o.total_amount > 300 THEN 'High Value'
        WHEN o.total_amount > 150 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS order_segment,
    DATEDIFF('day', c.registration_date, o.order_date) AS days_since_registration
FROM temp_extract o
INNER JOIN customers c ON o.customer_id = c.customer_id;

-- Stage 3: Load - Final aggregation for reporting
SELECT 
    country,
    customer_tier,
    order_segment,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    AVG(days_since_registration) AS avg_days_since_registration
FROM temp_transform
WHERE status = 'Completed'
GROUP BY country, customer_tier, order_segment
ORDER BY revenue DESC;


In [None]:
-- Use Case 2: Data Validation and Quality Checks
-- Store validation results in temporary tables

-- Create temporary table for validation results
CREATE OR REPLACE TEMPORARY TABLE temp_validation_results AS
SELECT 
    o.order_id,
    o.customer_id,
    o.total_amount,
    o.order_date,
    CASE WHEN o.total_amount < 0 THEN 'Invalid: Negative Amount' ELSE 'Valid' END AS amount_validation,
    CASE WHEN o.order_date > CURRENT_DATE() THEN 'Invalid: Future Date' ELSE 'Valid' END AS date_validation,
    CASE WHEN o.customer_id NOT IN (SELECT customer_id FROM customers) THEN 'Invalid: Missing Customer' ELSE 'Valid' END AS customer_validation
FROM orders o;

-- Analyze validation results
SELECT 
    amount_validation,
    date_validation,
    customer_validation,
    COUNT(*) AS record_count
FROM temp_validation_results
GROUP BY amount_validation, date_validation, customer_validation;

-- Get invalid records
SELECT * 
FROM temp_validation_results
WHERE amount_validation != 'Valid' 
    OR date_validation != 'Valid' 
    OR customer_validation != 'Valid';


In [None]:
-- Use Case 3: Complex Calculations - Breaking Down Complex Queries
-- Use temporary tables to simplify complex analytical queries

-- Step 1: Calculate customer lifetime value
CREATE OR REPLACE TEMPORARY TABLE temp_customer_ltv AS
SELECT 
    c.customer_id,
    c.country,
    c.customer_tier,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS lifetime_value,
    AVG(o.total_amount) AS avg_order_value,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.country, c.customer_tier;

-- Step 2: Calculate recency (days since last order)
CREATE OR REPLACE TEMPORARY TABLE temp_customer_rfm AS
SELECT 
    customer_id,
    country,
    customer_tier,
    total_orders,
    lifetime_value,
    avg_order_value,
    DATEDIFF('day', last_order_date, CURRENT_DATE()) AS days_since_last_order,
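    CASE 
        -- Customers without completed orders have no last_order_date
        WHEN last_order_date IS NULL THEN 'No Orders'
        WHEN DATEDIFF('day', last_order_date, CURRENT_DATE()) <= 30 THEN 'Recent'
        WHEN DATEDIFF('day', last_order_date, CURRENT_DATE()) <= 90 THEN 'Moderate'
        ELSE 'Churned'
    END AS recency_segment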
FROM temp_customer_ltv;

-- Step 3: Final analysis combining all metrics
SELECT 
    country,
    customer_tier,
    recency_segment,
    COUNT(*) AS customer_count,
    AVG(lifetime_value) AS avg_lifetime_value,
    AVG(total_orders) AS avg_orders,
    AVG(days_since_last_order) AS avg_days_since_last_order
FROM temp_customer_rfm
GROUP BY country, customer_tier, recency_segment
ORDER BY country, avg_lifetime_value DESC;


In [None]:
-- Use Case 4: Data Deduplication
-- Identify and handle duplicate records using temporary tables

-- Step 1: Identify duplicates
CREATE OR REPLACE TEMPORARY TABLE temp_duplicates AS
SELECT 
    customer_id,
    order_date,
    total_amount,
    COUNT(*) AS duplicate_count,
    LISTAGG(order_id, ', ') WITHIN GROUP (ORDER BY order_id) AS order_ids
FROM orders
GROUP BY customer_id, order_date, total_amount
HAVING COUNT(*) > 1;

-- Step 2: Get unique records (deduplicated)
CREATE OR REPLACE TEMPORARY TABLE temp_deduplicated_orders AS
SELECT DISTINCT
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    product_category
FROM orders
WHERE (customer_id, order_date, total_amount) NOT IN (
    SELECT customer_id, order_date, total_amount 
    FROM temp_duplicates
)
UNION ALL
-- Keep only one record from each duplicate group
SELECT 
    MIN(order_id) AS order_id,
    customer_id,
    order_date,
    total_amount,
    MAX(status) AS status,
    MAX(product_category) AS product_category
FROM orders
WHERE (customer_id, order_date, total_amount) IN (
    SELECT customer_id, order_date, total_amount 
    FROM temp_duplicates
)
GROUP BY customer_id, order_date, total_amount;

-- Verify deduplication
SELECT 
    'Original' AS source,
    COUNT(*) AS record_count
FROM orders
UNION ALL
SELECT 
    'Deduplicated' AS source,
    COUNT(*) AS record_count
FROM temp_deduplicated_orders;
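
The second branch of the deduplication above combines `MIN(order_id)` with `MAX(status)` and `MAX(product_category)`, so the surviving row can mix values from different duplicates. As an alternative sketch (not the approach used above), Snowflake's `QUALIFY` clause with `ROW_NUMBER()` keeps one complete row per group in a single pass. The table name `temp_deduplicated_orders_v2` is hypothetical, and the choice to keep the lowest `order_id` is an assumption.

```sql
-- Keep exactly one full row per (customer_id, order_date, total_amount) group.
-- temp_deduplicated_orders_v2 is a hypothetical name used only for this sketch.
CREATE OR REPLACE TEMPORARY TABLE temp_deduplicated_orders_v2 AS
SELECT
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    product_category
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id, order_date, total_amount
    ORDER BY order_id          -- assumption: the earliest order_id is the one to keep
) = 1;
```

Because the window partition treats NULL keys as equal, this version also avoids the NULL pitfalls of a multi-column `NOT IN`.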


In [None]:
-- Use Case 5: Testing and Development
-- Use temporary tables to test queries without affecting production

-- Create a temporary copy of production data for testing
CREATE OR REPLACE TEMPORARY TABLE temp_test_orders AS
SELECT * FROM orders WHERE order_date >= '2023-09-01' LIMIT 100;

-- Test your transformations on temporary data
CREATE OR REPLACE TEMPORARY TABLE temp_test_results AS
SELECT 
    customer_id,
    COUNT(*) AS test_order_count,
    SUM(total_amount) AS test_revenue
FROM temp_test_orders
WHERE status = 'Completed'
GROUP BY customer_id;

-- Verify test results
SELECT * FROM temp_test_results ORDER BY test_revenue DESC;

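-- Once validated, you can apply the same logic to production tables
-- Temporary tables are dropped automatically at session end, so nothing is left behind
-- Caveat: a temporary table that shares a name with a permanent table hides the permanent
-- one for the rest of the session, so keep a distinct prefix such as temp_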


## Key Takeaways

### CTAS:
1. **Single Operation**: Creates table and loads data in one statement
2. **Performance**: One statement to compile and run; usually at least as fast as CREATE + INSERT for bulk operations
3. **Physical Storage**: Creates actual table with stored data
4. **Static Data**: Data is a snapshot at creation time
5. **Use for**: Pre-aggregation, denormalization, data marts, snapshots

### Temporary Tables:
1. **Session-scoped**: Exists only for current session
2. **Automatic Cleanup**: Dropped when session ends
3. **Isolation**: Each session has its own temporary tables
4. **Cost**: Stored like permanent tables in Snowflake, but with no Fail-safe period and at most 1 day of Time Travel, so they are cheaper to keep around
5. **Use for**: ETL pipelines, intermediate results, testing, complex calculations

### When to Use What:
- **CTAS (Permanent Table)**: When you need persistent, fast-access data
- **CTAS (Temporary Table)**: When you need temporary storage for session-specific processing
- **View**: When you need real-time, always-current data
- **CREATE + INSERT**: When you need incremental data loading
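
The four options above can be sketched side by side against the sample `orders` table. All object names here (`completed_orders_snapshot`, `temp_completed_orders`, `completed_orders_view`) are hypothetical, and the incremental filter on `order_date` is a simplifying assumption that would miss late-arriving rows.

```sql
-- CTAS (permanent table): persistent snapshot for fast reads
CREATE OR REPLACE TABLE completed_orders_snapshot AS
SELECT * FROM orders WHERE status = 'Completed';

-- CTAS (temporary table): session-scoped working copy
CREATE OR REPLACE TEMPORARY TABLE temp_completed_orders AS
SELECT * FROM orders WHERE status = 'Completed';

-- View: the query runs on every read, so results are always current
CREATE OR REPLACE VIEW completed_orders_view AS
SELECT * FROM orders WHERE status = 'Completed';

-- INSERT into an existing table: append only rows newer than what is already loaded
INSERT INTO completed_orders_snapshot
SELECT *
FROM orders
WHERE status = 'Completed'
  AND order_date > (
      SELECT COALESCE(MAX(order_date), '1900-01-01'::DATE)
      FROM completed_orders_snapshot
  );
```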


## Practice Problems

### Problem 1: CTAS for Performance Optimization
Create a table using CTAS that pre-aggregates monthly sales data by country and product category. Include:
- Month (YYYY-MM format)
- Country
- Product category
- Total revenue
- Number of orders
- Average order value
- Number of unique customers

This table should be optimized for fast analytical queries on monthly sales by country and category.


In [None]:
-- Solution to Problem 1
CREATE OR REPLACE TABLE monthly_sales_by_country_category AS
SELECT 
    TO_CHAR(DATE_TRUNC('month', o.order_date), 'YYYY-MM') AS sales_month,
    c.country,
    o.product_category,
    SUM(o.total_amount) AS total_revenue,
    COUNT(o.order_id) AS order_count,
    AVG(o.total_amount) AS avg_order_value,
    COUNT(DISTINCT o.customer_id) AS unique_customers
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'Completed'
GROUP BY 
    DATE_TRUNC('month', o.order_date),
    c.country,
    o.product_category
ORDER BY sales_month DESC, total_revenue DESC;

-- Verify the table
SELECT * FROM monthly_sales_by_country_category;


### Problem 2: CTAS vs CREATE/INSERT Performance
Demonstrate the difference between CTAS and CREATE/INSERT by:
1. Creating a table using CTAS that contains customer order summary (customer_id, customer_name, total_orders, total_spent)
2. Creating the same table using CREATE + INSERT
3. Compare the execution time and explain when CTAS is the better choice


In [None]:
-- Solution to Problem 2

-- Method 1: Using CTAS (Single operation)
CREATE OR REPLACE TABLE customer_summary_ctas AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name;

-- Method 2: Using CREATE + INSERT (Two operations)
CREATE OR REPLACE TABLE customer_summary_insert (
    customer_id INT,
    customer_name VARCHAR(200),
    total_orders INT,
    total_spent DECIMAL(10, 2)
);

INSERT INTO customer_summary_insert
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name;

-- Compare results
SELECT 'CTAS Method' AS method, COUNT(*) AS row_count FROM customer_summary_ctas
UNION ALL
SELECT 'CREATE+INSERT Method' AS method, COUNT(*) AS row_count FROM customer_summary_insert;

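-- Explanation:
-- CTAS is usually the simpler (and often slightly faster) choice because:
-- 1. Single atomic statement: the table never exists in an empty, half-loaded state
-- 2. Column names and types are inferred from the SELECT, so there is no DDL to keep in sync
-- 3. Only one statement is compiled and executed instead of two
-- Note: for the same SELECT, the runtime difference in Snowflake is typically small,
-- so measure it rather than assume it.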
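
The solution above compares row counts but not execution time. One way to read the timings, sketched here under the assumption that both methods were run in the current session, is the `QUERY_HISTORY_BY_SESSION` table function in `INFORMATION_SCHEMA`. The `ILIKE` pattern simply matches the two table names used above.

```sql
-- Elapsed time (milliseconds) of recent statements in the current session
SELECT
    query_text,
    total_elapsed_time AS elapsed_ms,
    start_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(RESULT_LIMIT => 20))
WHERE query_text ILIKE '%customer_summary_%'
ORDER BY start_time DESC;
```

For CREATE + INSERT, add the elapsed times of both statements before comparing against the single CTAS statement. On small sample tables the numbers will be dominated by compilation and warehouse overhead.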


### Problem 3: CTAS for Data Warehouse - Creating a Data Mart
Create a sales data mart using CTAS that includes:
- Order details (order_id, order_date, order_month, order_quarter)
- Customer information (customer_id, customer_name, country, customer_tier)
- Product information (product_category)
- Financial metrics (order_amount, order_segment: High/Medium/Low based on amount)
- Time dimensions (order_year, order_quarter, order_month)

This should be a comprehensive data mart for sales analytics.


In [None]:
-- Solution to Problem 3
CREATE OR REPLACE TABLE sales_data_mart AS
SELECT 
    -- Order details
    o.order_id,
    o.order_date,
    DATE_TRUNC('month', o.order_date) AS order_month,
    DATE_TRUNC('quarter', o.order_date) AS order_quarter,
    DATE_TRUNC('year', o.order_date) AS order_year,
    TO_CHAR(DATE_TRUNC('month', o.order_date), 'YYYY-MM') AS order_month_str,
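    -- Snowflake date formats have no quarter element, so build the label (e.g. 2023-Q4) explicitly
    TO_CHAR(o.order_date, 'YYYY') || '-Q' || TO_CHAR(QUARTER(o.order_date)) AS order_quarter_str,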
    TO_CHAR(DATE_TRUNC('year', o.order_date), 'YYYY') AS order_year_str,
    
    -- Customer information
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    c.customer_tier,
    
    -- Product information
    o.product_category,
    
    -- Financial metrics
    o.total_amount AS order_amount,
    o.status AS order_status,
    CASE 
        WHEN o.total_amount > 300 THEN 'High'
        WHEN o.total_amount > 150 THEN 'Medium'
        ELSE 'Low'
    END AS order_segment
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC;

-- Query the data mart for analytics
SELECT 
    order_year_str,
    order_quarter_str,
    country,
    customer_tier,
    order_segment,
    COUNT(*) AS order_count,
    SUM(order_amount) AS total_revenue,
    AVG(order_amount) AS avg_order_amount
FROM sales_data_mart
WHERE order_status = 'Completed'
GROUP BY order_year_str, order_quarter_str, country, customer_tier, order_segment
ORDER BY order_year_str DESC, order_quarter_str DESC, total_revenue DESC;


### Problem 4: Temporary Table for Multi-step ETL
Create a multi-stage ETL process using temporary tables:
1. Stage 1: Extract orders from the last 3 months into a temporary table
2. Stage 2: Transform by adding customer information and calculated fields
3. Stage 3: Load final aggregated results showing monthly revenue by country and customer tier

Use temporary tables for each stage.


In [None]:
-- Solution to Problem 4: Multi-stage ETL with Temporary Tables

-- Stage 1: Extract - Get orders from last 3 months
CREATE OR REPLACE TEMPORARY TABLE temp_stage1_extract AS
SELECT 
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    product_category
FROM orders
WHERE order_date >= DATEADD('month', -3, CURRENT_DATE())
    AND status = 'Completed';

-- Stage 2: Transform - Add customer information and calculated fields
CREATE OR REPLACE TEMPORARY TABLE temp_stage2_transform AS
SELECT 
    tro.order_id,
    tro.order_date,
    DATE_TRUNC('month', tro.order_date) AS order_month,
    c.customer_id,
    c.country,
    c.customer_tier,
    tro.total_amount,
    tro.product_category,
    CASE 
        WHEN tro.total_amount > 300 THEN 'High Value'
        WHEN tro.total_amount > 150 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS order_segment,
    DATEDIFF('day', c.registration_date, tro.order_date) AS days_since_registration
FROM temp_stage1_extract tro
INNER JOIN customers c ON tro.customer_id = c.customer_id;

-- Stage 3: Load - Final aggregation
SELECT 
    TO_CHAR(order_month, 'YYYY-MM') AS sales_month,
    country,
    customer_tier,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers,
    AVG(days_since_registration) AS avg_days_since_registration
FROM temp_stage2_transform
GROUP BY order_month, country, customer_tier
ORDER BY order_month DESC, total_revenue DESC;
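
Stage 3 above only returns the aggregate to the client. If the result should outlive the session, the same query can be wrapped in a permanent CTAS. This is a minimal sketch, and the target name `monthly_revenue_by_country_tier` is hypothetical.

```sql
-- Persist the final aggregate; the temporary staging tables still disappear at session end
CREATE OR REPLACE TABLE monthly_revenue_by_country_tier AS
SELECT
    TO_CHAR(order_month, 'YYYY-MM') AS sales_month,
    country,
    customer_tier,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue
FROM temp_stage2_transform
GROUP BY order_month, country, customer_tier;
```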


### Problem 5: Temporary Table for Data Validation
Create a data validation process using temporary tables:
1. Create a temporary table that validates orders for:
   - Negative amounts
   - Future dates
   - Missing customers
   - Invalid status values
2. Generate a validation report showing counts of valid vs invalid records
3. Create a cleaned dataset with only valid records


In [None]:
-- Solution to Problem 5: Data Validation with Temporary Tables

-- Step 1: Create validation results table
CREATE OR REPLACE TEMPORARY TABLE temp_validation AS
SELECT 
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount,
    o.status,
    -- Validation flags
    CASE WHEN o.total_amount < 0 THEN 'Invalid' ELSE 'Valid' END AS amount_check,
    CASE WHEN o.order_date > CURRENT_DATE() THEN 'Invalid' ELSE 'Valid' END AS date_check,
    CASE WHEN o.customer_id NOT IN (SELECT customer_id FROM customers) THEN 'Invalid' ELSE 'Valid' END AS customer_check,
    CASE WHEN o.status NOT IN ('Completed', 'Pending', 'Cancelled') THEN 'Invalid' ELSE 'Valid' END AS status_check,
    -- Overall validation
    CASE 
        WHEN o.total_amount < 0 OR 
             o.order_date > CURRENT_DATE() OR 
             o.customer_id NOT IN (SELECT customer_id FROM customers) OR
             o.status NOT IN ('Completed', 'Pending', 'Cancelled')
        THEN 'Invalid'
        ELSE 'Valid'
    END AS overall_validation
FROM orders o;

-- Step 2: Generate validation report
SELECT 
    'Amount Check' AS validation_type,
    amount_check AS result,
    COUNT(*) AS record_count
FROM temp_validation
GROUP BY amount_check
UNION ALL
SELECT 
    'Date Check' AS validation_type,
    date_check AS result,
    COUNT(*) AS record_count
FROM temp_validation
GROUP BY date_check
UNION ALL
SELECT 
    'Customer Check' AS validation_type,
    customer_check AS result,
    COUNT(*) AS record_count
FROM temp_validation
GROUP BY customer_check
UNION ALL
SELECT 
    'Status Check' AS validation_type,
    status_check AS result,
    COUNT(*) AS record_count
FROM temp_validation
GROUP BY status_check
UNION ALL
SELECT 
    'Overall Validation' AS validation_type,
    overall_validation AS result,
    COUNT(*) AS record_count
FROM temp_validation
GROUP BY overall_validation
ORDER BY validation_type, result;

-- Step 3: Create cleaned dataset with only valid records
CREATE OR REPLACE TEMPORARY TABLE temp_cleaned_orders AS
SELECT 
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount,
    o.status,
    o.product_category
FROM orders o
WHERE o.total_amount >= 0
    AND o.order_date <= CURRENT_DATE()
    AND o.customer_id IN (SELECT customer_id FROM customers)
    AND o.status IN ('Completed', 'Pending', 'Cancelled');

-- Verify cleaned dataset
SELECT 
    'Original Orders' AS dataset,
    COUNT(*) AS record_count
FROM orders
UNION ALL
SELECT 
    'Cleaned Orders' AS dataset,
    COUNT(*) AS record_count
FROM temp_cleaned_orders;
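
One caveat with the `NOT IN (SELECT ...)` checks above: if `o.customer_id` is NULL, or if the subquery returns any NULL, the comparison evaluates to NULL rather than TRUE, so the `CASE` falls through to `'Valid'`. Meanwhile, the `IN` filter in Step 3 silently drops those same rows, so the report and the cleaned dataset can disagree. A NULL-safe variant of the customer check can be sketched with a `LEFT JOIN`, assuming `customer_id` is unique in `customers`.

```sql
-- NULL-safe customer check: any order without a matching customer row is flagged
SELECT
    o.order_id,
    o.customer_id,
    CASE
        WHEN c.customer_id IS NULL THEN 'Invalid'   -- covers NULL and unknown customer_id
        ELSE 'Valid'
    END AS customer_check
FROM orders o
LEFT JOIN customers c
    ON o.customer_id = c.customer_id;
```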


### Problem 6: CTAS with Clustering for Performance
Create a table using CTAS that is optimized for queries filtering by country and order_date. Use ORDER BY in the CTAS to cluster the data by country and order_date for better query performance.


In [None]:
-- Solution to Problem 6: CTAS with Clustering
CREATE OR REPLACE TABLE orders_by_country_date AS
SELECT 
    o.order_id,
    o.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    o.order_date,
    o.total_amount,
    o.status,
    o.product_category
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'Completed'
ORDER BY c.country, o.order_date DESC, o.total_amount DESC;

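-- This table is laid out for queries filtering by country and date
-- The ORDER BY sorts rows as they are written, so each micro-partition holds a narrow
-- range of country/order_date values and Snowflake can prune more of them.
-- The ordering is not maintained for rows added later; that requires a clustering key.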

-- Example optimized query
SELECT 
    country,
    DATE_TRUNC('month', order_date) AS order_month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders_by_country_date
WHERE country = 'USA' 
    AND order_date >= '2023-09-01'
GROUP BY country, DATE_TRUNC('month', order_date)
ORDER BY order_month DESC;
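
The `ORDER BY` in a CTAS only affects how the initial rows are laid out. As a hedged sketch of the alternative, a clustering key can be declared on the same table and its effect inspected with `SYSTEM$CLUSTERING_INFORMATION`. Automatic reclustering consumes credits, so this is usually worthwhile only for very large tables.

```sql
-- Declare a clustering key so Snowflake maintains the layout as data changes
ALTER TABLE orders_by_country_date CLUSTER BY (country, order_date);

-- Inspect how well the table is clustered on those columns (returns a JSON string)
SELECT SYSTEM$CLUSTERING_INFORMATION('orders_by_country_date', '(country, order_date)');
```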


### Problem 7: Temporary Table for Complex RFM Analysis
Create a temporary table-based solution for RFM (Recency, Frequency, Monetary) analysis:
1. Calculate customer recency (days since last order)
2. Calculate customer frequency (number of orders)
3. Calculate customer monetary value (total spent)
4. Segment customers into RFM categories
5. Generate final report with customer segments


In [None]:
-- Solution to Problem 7: RFM Analysis with Temporary Tables

-- Step 1: Calculate base RFM metrics
CREATE OR REPLACE TEMPORARY TABLE temp_rfm_base AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    -- Recency: Days since last order
    DATEDIFF('day', MAX(o.order_date), CURRENT_DATE()) AS recency_days,
    -- Frequency: Number of orders
    COUNT(o.order_id) AS frequency,
    -- Monetary: Total amount spent
    COALESCE(SUM(o.total_amount), 0) AS monetary_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country;

-- Step 2: Calculate RFM scores (1-5 scale)
CREATE OR REPLACE TEMPORARY TABLE temp_rfm_scores AS
SELECT 
    customer_id,
    customer_name,
    country,
    recency_days,
    frequency,
    monetary_value,
    -- Recency Score (lower days = higher score)
    CASE 
        WHEN recency_days <= 30 THEN 5
        WHEN recency_days <= 60 THEN 4
        WHEN recency_days <= 90 THEN 3
        WHEN recency_days <= 180 THEN 2
        ELSE 1
    END AS recency_score,
    -- Frequency Score
    CASE 
        WHEN frequency >= 5 THEN 5
        WHEN frequency >= 3 THEN 4
        WHEN frequency >= 2 THEN 3
        WHEN frequency >= 1 THEN 2
        ELSE 1
    END AS frequency_score,
    -- Monetary Score
    CASE 
        WHEN monetary_value >= 500 THEN 5
        WHEN monetary_value >= 300 THEN 4
        WHEN monetary_value >= 200 THEN 3
        WHEN monetary_value >= 100 THEN 2
        ELSE 1
    END AS monetary_score
FROM temp_rfm_base;

-- Step 3: Create customer segments
SELECT 
    customer_id,
    customer_name,
    country,
    recency_days,
    frequency,
    monetary_value,
    recency_score,
    frequency_score,
    monetary_score,
    recency_score || frequency_score || monetary_score AS rfm_cell,
    CASE 
        WHEN recency_score >= 4 AND frequency_score >= 4 AND monetary_score >= 4 THEN 'Champions'
        WHEN recency_score >= 3 AND frequency_score >= 3 AND monetary_score >= 3 THEN 'Loyal Customers'
        WHEN recency_score >= 4 AND frequency_score <= 2 THEN 'New Customers'
        WHEN recency_score <= 2 AND frequency_score >= 3 THEN 'At Risk'
        WHEN recency_score <= 2 AND frequency_score <= 2 AND monetary_score >= 3 THEN 'Cannot Lose Them'
        WHEN recency_score <= 2 THEN 'Lost'
        ELSE 'Need Attention'
    END AS customer_segment
FROM temp_rfm_scores
ORDER BY monetary_value DESC;


### Problem 8: Compare CTAS Table vs View Performance
Create both a CTAS table and a view for the same query (customer order summary). Then:
1. Query both and compare execution plans
2. Insert new data into source tables
3. Query both again and explain the difference in results
4. Explain when to use each approach


In [None]:
-- Solution to Problem 8: CTAS Table vs View Comparison

-- Create CTAS table (static snapshot)
CREATE OR REPLACE TABLE customer_order_summary_table AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country;

-- Create View (dynamic, always current)
CREATE OR REPLACE VIEW customer_order_summary_view AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.status = 'Completed'
GROUP BY c.customer_id, c.first_name, c.last_name, c.country;

-- Initial query - both return same data
SELECT 'Table (CTAS)' AS type, COUNT(*) AS customer_count, SUM(total_spent) AS total_revenue 
FROM customer_order_summary_table
UNION ALL
SELECT 'View' AS type, COUNT(*) AS customer_count, SUM(total_spent) AS total_revenue 
FROM customer_order_summary_view;

-- Add new data
INSERT INTO orders VALUES
(112, 2, '2023-10-20', 500.00, 'Completed', 'Electronics'),
(113, 4, '2023-10-21', 350.00, 'Completed', 'Clothing');

-- Query again - see the difference!
SELECT 'Table (CTAS) - Static' AS type, COUNT(*) AS customer_count, SUM(total_spent) AS total_revenue 
FROM customer_order_summary_table
UNION ALL
SELECT 'View - Dynamic' AS type, COUNT(*) AS customer_count, SUM(total_spent) AS total_revenue 
FROM customer_order_summary_view;

-- Explanation:
-- Table (CTAS): Shows old data - needs to be refreshed/recreated
-- View: Shows new data automatically - always reflects current state
-- 
-- Use CTAS Table when:
-- - You need fast query performance
-- - Data doesn't need to be real-time
-- - You're creating historical snapshots
-- - You want to define a clustering key (standard Snowflake tables do not support indexes)
--
-- Use View when:
-- - You need real-time, always-current data
-- - Storage cost is a concern
-- - Data changes frequently
-- - You want to simplify complex queries


## Additional Notes

### CTAS Best Practices in Snowflake:

1. **Use ORDER BY for Clustering**: Order data by frequently queried columns
2. **Explicit Column Types**: Define column types when needed for consistency
3. **OR REPLACE**: Use carefully - it drops existing table
4. **IF NOT EXISTS**: Use when you want to avoid errors if table exists
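5. **Clustering Keys**: Snowflake divides data into micro-partitions automatically; for very large tables, consider a clustering key (`CLUSTER BY`) rather than manual partitioning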
6. **Compression**: Snowflake automatically compresses data
7. **Monitoring**: Monitor storage costs for large CTAS operations
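
As an illustration of items 1 and 2, a CTAS can declare column types explicitly and sort the rows as they are written. This is a sketch under stated assumptions: the table name `customer_totals_typed` and the `NUMBER(12, 2)` precision are hypothetical choices, and the declared column list must match the SELECT list in number and order.

```sql
CREATE OR REPLACE TABLE customer_totals_typed (
    customer_id INT,
    total_spent NUMBER(12, 2)      -- assumed precision, fixed here instead of inferred
) AS
SELECT
    customer_id,
    SUM(total_amount)
FROM orders
WHERE status = 'Completed'
GROUP BY customer_id
ORDER BY customer_id;              -- initial sort order only; not maintained afterwards
```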

### Temporary Table Best Practices:

1. **Naming**: Use clear, descriptive names (e.g., `temp_stage1_extract`)
2. **Cleanup**: Temporary tables auto-cleanup, but you can DROP manually if needed
3. **Session Isolation**: Remember each session has its own temporary tables
4. **Performance**: Use for intermediate results in complex queries
5. **Testing**: Great for testing without affecting production
6. **Storage Costs**: Temporary tables count toward storage charges while they exist, so drop large ones as soon as you are done with them

### Management Commands:

```sql
-- Show all tables (including temporary)
SHOW TABLES;

-- Show tables whose names start with TEMP (a naming convention; permanent tables with that prefix match too)
SHOW TABLES LIKE 'TEMP%';

-- Drop temporary table manually (if needed)
DROP TABLE IF EXISTS temp_table_name;

-- Get table information
DESCRIBE TABLE table_name;
```
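
To list only tables that are actually temporary, regardless of their names, one option is to filter the output of `SHOW TABLES` on its `kind` column with `RESULT_SCAN`. This is a sketch: the `SELECT` must run immediately after `SHOW TABLES` in the same session, and the lowercase column names need double quotes.

```sql
SHOW TABLES;

-- Filter the previous statement's output to temporary tables only
SELECT "name", "kind", "rows"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "kind" = 'TEMPORARY';
```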


In [None]:
-- Example: Management commands for CTAS and temporary tables

-- List all tables
SHOW TABLES;

-- Describe a table structure
DESCRIBE TABLE customer_summary;

-- Temporary tables are listed by SHOW TABLES with TEMPORARY in the "kind" column
SHOW TABLES;

-- Drop a table
-- DROP TABLE IF EXISTS table_name;

-- Drop a temporary table (optional - auto-drops at session end)
-- DROP TABLE IF EXISTS temp_customer_summary;
