# SQL Keys and Constraints

## Introduction

**Keys and constraints are the rules that make databases reliable.**

Without them, your database is just a collection of tables with no guarantees. With them, you can ensure:
- ✅ No duplicate records
- ✅ Relationships between tables are valid
- ✅ Data follows business rules
- ✅ Critical fields are never empty

**This is the foundation of data integrity.**

### Snowflake-Specific Notes

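All SQL examples in this notebook are written for **Snowflake**. Key Snowflake differences:
- Uses `IDENTITY(1, 1)` (or its synonym `AUTOINCREMENT`) instead of MySQL's `AUTO_INCREMENT` for auto-incrementing columns
- Uses `CURRENT_DATE()` and `CURRENT_TIMESTAMP()` function syntax
- Uses `TIMESTAMP_NTZ` (no timezone) or `TIMESTAMP_LTZ` (local timezone) for timestamp types
- On standard tables, enforces only `NOT NULL`; `PRIMARY KEY`, `UNIQUE`, and `FOREIGN KEY` are stored as metadata but not enforced (hybrid tables do enforce them). Examples marked "This will FAIL" describe the behavior of an engine that enforces the constraint.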

---

## 1. Primary Key: The Unique Identifier

### What is a Primary Key?

A **Primary Key** is a column (or set of columns) that uniquely identifies each row in a table.

**Think of it as:** A social security number, a passport ID, or a student ID. There can only be one.

### Key Characteristics

1. **UNIQUE:** No two rows can have the same primary key value
2. **NOT NULL:** A primary key cannot be empty
3. **IMMUTABLE:** Should never change (best practice)
4. **ONE PER TABLE:** Each table can have only one primary key

### Example: Creating a Table with Primary Key

```sql
-- Single column primary key
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);
```

### Composite Primary Key

Sometimes, a single column isn't enough. You might need multiple columns together to form a unique identifier.

```sql
-- Composite primary key (multiple columns)
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    PRIMARY KEY (order_id, product_id)  -- Both together must be unique
);
```

**Why composite?**
- In an order_items table, the same product can appear in multiple orders
- The same order can have multiple products
- But the combination (order_id + product_id) must be unique
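
A few rows make the three cases concrete. This is a sketch against the `order_items` table above; the IDs and prices are made up for illustration, and the last statement is rejected only by an engine that enforces primary keys.

```sql
-- Same product (10) in two different orders: allowed
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (1, 10, 2, 19.99);
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (2, 10, 1, 19.99);

-- Same order (1) with a second product: allowed
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (1, 11, 1, 5.00);

-- Same (order_id, product_id) pair again: violates the composite key
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (1, 10, 5, 19.99);
```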

### Primary Key vs. Just Unique

**Question:** Why not just use UNIQUE instead of PRIMARY KEY?

**Answer:**
- Primary keys are automatically indexed in most databases (faster lookups); Snowflake standard tables do not use indexes
- Primary keys are used by foreign keys to establish relationships
- Primary keys are the "official" identifier of a row
- Only one primary key per table, but multiple UNIQUE constraints allowed
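
Where a primary key is declared but not enforced by the engine, a duplicate check can stand in for the guarantee. A minimal sketch, assuming the `employees` table from the earlier example; an empty result means the key is still unique.

```sql
-- List every employee_id that appears more than once
SELECT employee_id, COUNT(*) AS row_count
FROM employees
GROUP BY employee_id
HAVING COUNT(*) > 1;
```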

---

## 2. Foreign Key: Linking Tables Together

### What is a Foreign Key?

A **Foreign Key** is a column (or set of columns) in one table that references the primary key (or another unique key) of another table.

**Think of it as:** A pointer that says "this row belongs to that row in another table."

### Why Foreign Keys Matter

**Without foreign keys:**
- You can create an order for a customer that doesn't exist
- You can delete a customer, leaving orphaned orders
- No enforced relationship between tables

**With foreign keys:**
- ✅ Database enforces that relationships are valid
- ✅ Prevents orphaned records
- ✅ Maintains referential integrity

### Example: Creating Foreign Keys

```sql
-- Parent table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

-- Child table with foreign key
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,  -- This will reference customers.customer_id
    order_date DATE,
    total_amount DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
```

### What Foreign Keys Prevent

#### 1. Invalid References
```sql
-- This will FAIL because customer_id 999 doesn't exist
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (1, 999, '2024-01-15', 100.00);
-- Error: Foreign key constraint violation
```

#### 2. Orphaned Records
```sql
-- This will FAIL if there are orders referencing this customer
DELETE FROM customers WHERE customer_id = 123;
-- Error: Cannot delete customer because orders reference it
```

### Foreign Key Actions: ON DELETE and ON UPDATE

You can control what happens when the referenced row is deleted or updated.

#### ON DELETE CASCADE
```sql
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    FOREIGN KEY (customer_id) 
        REFERENCES customers(customer_id)
        ON DELETE CASCADE  -- If customer is deleted, delete their orders too
);
```

#### ON DELETE SET NULL
```sql
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,  -- Must allow NULL
    order_date DATE,
    FOREIGN KEY (customer_id) 
        REFERENCES customers(customer_id)
        ON DELETE SET NULL  -- If customer is deleted, set customer_id to NULL
);
```

#### ON DELETE RESTRICT
```sql
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    FOREIGN KEY (customer_id)
        REFERENCES customers(customer_id)
        ON DELETE RESTRICT  -- Prevent deletion if orders exist (same effect as the default, NO ACTION, for non-deferred constraints)
);
```

**Common Options:**
- `CASCADE`: Delete/update child rows when parent is deleted/updated
- `SET NULL`: Set foreign key to NULL when parent is deleted (column must allow NULL)
- `RESTRICT` / `NO ACTION`: Prevent deletion/update if child rows exist
- `SET DEFAULT`: Set foreign key to default value when parent is deleted
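
The examples above only show `ON DELETE`; `ON UPDATE` takes the same options, and the two clauses can be combined on one foreign key. A sketch in standard SQL syntax, assuming the `customers` table from earlier (engines differ in which actions they accept, so check yours):

```sql
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,  -- Must allow NULL because of SET NULL below
    order_date DATE,
    FOREIGN KEY (customer_id)
        REFERENCES customers(customer_id)
        ON DELETE SET NULL  -- If the customer is deleted, keep the order but clear the link
        ON UPDATE CASCADE   -- If customers.customer_id changes, update it here too
);
```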

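**Note:** Snowflake accepts foreign key definitions, but on standard tables it does not enforce them: they are stored as metadata only, and enforcement cannot be switched on for those tables. Referential integrity is enforced only on hybrid tables, so on standard tables it has to be checked in your loading process or with validation queries.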
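
When the engine does not enforce the relationship, orphaned rows can be found with an anti-join. A minimal sketch, assuming the `customers` and `orders` tables defined above; an empty result means every order points at an existing customer.

```sql
-- Orders whose customer_id has no matching row in customers
SELECT o.order_id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c
    ON c.customer_id = o.customer_id
WHERE o.customer_id IS NOT NULL  -- NULL means "no customer", not an orphan
  AND c.customer_id IS NULL;     -- No match found in the parent table
```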

---

## 3. NOT NULL: Mandatory Fields

### What is NOT NULL?

**NOT NULL** ensures that a column must always have a value. It cannot be empty or NULL.

**Think of it as:** A required field in a form. You can't submit without filling it.

### Example

```sql
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,  -- Must have a value
    last_name VARCHAR(50) NOT NULL,    -- Must have a value
    email VARCHAR(100),                -- Can be NULL (optional)
    phone VARCHAR(20)                  -- Can be NULL (optional)
);
```

### What NOT NULL Prevents

```sql
-- This will FAIL
INSERT INTO employees (employee_id, email)
VALUES (1, 'john@example.com');
-- Error: first_name and last_name cannot be NULL

-- This will SUCCEED
INSERT INTO employees (employee_id, first_name, last_name, email)
VALUES (1, 'John', 'Doe', 'john@example.com');
```

### When to Use NOT NULL

**Use NOT NULL for:**
- ✅ Primary keys (automatic)
- ✅ Critical business fields (name, email, price)
- ✅ Fields that must always have a value for data integrity

**Don't use NOT NULL for:**
- ❌ Optional fields (middle name, phone number)
- ❌ Fields that might not be known at creation time
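
A field that starts out optional can be made mandatory later. The sketch below assumes the `employees` table above and uses the `ALTER COLUMN ... SET NOT NULL` form accepted by Snowflake and PostgreSQL; other engines (MySQL, for example) use a different statement.

```sql
-- Step 1: count rows that would violate the new rule
SELECT COUNT(*) AS missing_emails
FROM employees
WHERE email IS NULL;

-- Step 2: once the count is 0, make the column mandatory
ALTER TABLE employees ALTER COLUMN email SET NOT NULL;
```

Running step 2 while NULLs remain causes the statement to be rejected, so the count is worth checking first.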

---

## 4. UNIQUE: No Duplicates Allowed

### What is UNIQUE?

**UNIQUE** ensures that all values in a column (or set of columns) are different. No duplicates allowed.

**Think of it as:** A username in a system. Each user must have a unique username.

### UNIQUE vs. Primary Key

| Feature | PRIMARY KEY | UNIQUE |
|---------|-------------|--------|
| Uniqueness | ✅ Yes | ✅ Yes |
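| NULL values | ❌ Not allowed | ✅ Allowed (most engines permit multiple NULLs; SQL Server permits one) |
| Number per table | 1 | Multiple |
| Auto-indexed | ✅ Yes (in most engines) | ✅ Yes (in most engines) |
| Used by foreign keys | ✅ Yes | ✅ Yes (a foreign key can reference a UNIQUE column, though the primary key is the usual target) |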

### Example: Single Column UNIQUE

```sql
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50) UNIQUE,  -- Each username must be unique
    email VARCHAR(100) UNIQUE,    -- Each email must be unique
    password_hash VARCHAR(255)
);
```

### Example: Composite UNIQUE

```sql
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    UNIQUE (student_id, course_id)  -- A student can't enroll in the same course twice
);
```

### What UNIQUE Prevents

```sql
-- This will SUCCEED (first use of 'johndoe')
INSERT INTO users (user_id, username, email)
VALUES (1, 'johndoe', 'john@example.com');

-- This will FAIL (duplicate username)
INSERT INTO users (user_id, username, email)
VALUES (2, 'johndoe', 'jane@example.com');
-- Error: Duplicate username 'johndoe'
```

---

## 5. CHECK: Custom Business Rules

### What is CHECK?

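A **CHECK** constraint lets you define custom rules that data must follow.

**Note:** At the time of writing, Snowflake's documented constraint types are `PRIMARY KEY`, `UNIQUE`, `FOREIGN KEY`, and `NOT NULL`; `CHECK` is not among them. Treat the examples in this section as standard SQL and confirm support in your environment.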

**Think of it as:** Validation rules. "Age must be between 0 and 150" or "Price must be positive."

### Example: Range Validation

```sql
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) CHECK (price > 0),  -- Price must be positive
    stock_quantity INT CHECK (stock_quantity >= 0),  -- Stock can't be negative
    discount_percent INT CHECK (discount_percent >= 0 AND discount_percent <= 100)  -- 0-100%
);
```

### Example: Pattern Validation

```sql
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    email VARCHAR(100) CHECK (email LIKE '%@%.%'),  -- Basic email format check
    age INT CHECK (age >= 18 AND age <= 65),  -- Working age
    salary DECIMAL(10, 2) CHECK (salary >= 0)
);
```

### What CHECK Prevents

```sql
-- This will FAIL
INSERT INTO products (product_id, name, price, stock_quantity)
VALUES (1, 'Laptop', -100.00, 10);
-- Error: CHECK constraint violation (price must be > 0)

-- This will FAIL
INSERT INTO employees (employee_id, email, age, salary)
VALUES (1, 'invalid-email', 25, 50000);
-- Error: CHECK constraint violation (email format)
```
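
The same rules can be written as a query that lists offending rows, which is useful for auditing data loaded before a constraint existed or in an engine that does not enforce CHECK. A sketch, assuming the `products` table from the range validation example:

```sql
-- Rows that break any of the three product rules
SELECT product_id, name, price, stock_quantity, discount_percent
FROM products
WHERE price <= 0
   OR stock_quantity < 0
   OR discount_percent < 0
   OR discount_percent > 100;
```

Rows where a checked column is NULL are not returned, which matches CHECK semantics: a CHECK rejects a row only when its condition evaluates to false.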

---

## 6. DEFAULT: Automatic Values

### What is DEFAULT?

**DEFAULT** provides a value for a column when no value is specified during INSERT.

**Think of it as:** Auto-fill. If you don't provide a value, use this one.

### Example

```sql
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE DEFAULT CURRENT_DATE(),  -- Use today's date if not provided (Snowflake)
    status VARCHAR(20) DEFAULT 'pending',   -- Default status is 'pending'
    created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()  -- Auto-set creation time (Snowflake)
);
```

### Using DEFAULT

```sql
-- This will use DEFAULT values for order_date, status, and created_at
INSERT INTO orders (order_id, customer_id)
VALUES (1, 123);
-- order_date = today's date
-- status = 'pending'
-- created_at = current timestamp

-- You can still override defaults
INSERT INTO orders (order_id, customer_id, order_date, status)
VALUES (2, 123, '2024-01-20', 'completed');
```
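
Two details are easy to miss: the `DEFAULT` keyword can be used in place of a value, and an explicit `NULL` is not the same as omitting the column. A short sketch against the `orders` table above (the IDs are made up):

```sql
-- Ask for the default explicitly with the DEFAULT keyword
INSERT INTO orders (order_id, customer_id, status)
VALUES (3, 123, DEFAULT);
-- status = 'pending'

-- An explicit NULL is a value, so the default is NOT applied
INSERT INTO orders (order_id, customer_id, status)
VALUES (4, 123, NULL);
-- status = NULL
```

If a column should never end up NULL this way, combine `DEFAULT` with `NOT NULL`.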

---

## 7. Putting It All Together: Real-World Example

### Complete Example: E-Commerce Database

```sql
-- Customers table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    phone VARCHAR(20),
    created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()  -- Snowflake timestamp
);

-- Products table
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL CHECK (price > 0),
    stock_quantity INT DEFAULT 0 CHECK (stock_quantity >= 0),
    category VARCHAR(50)
);

-- Orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE DEFAULT CURRENT_DATE(),  -- Snowflake function syntax
    status VARCHAR(20) DEFAULT 'pending' 
        CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
    total_amount DECIMAL(10, 2) CHECK (total_amount >= 0),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id) ON DELETE RESTRICT
);

-- Order items table
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price > 0),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id) ON DELETE RESTRICT
);
```

### What This Schema Enforces

✅ **Primary Keys:** Each table has a unique identifier  
✅ **Foreign Keys:** Orders must reference valid customers; order items must reference valid orders and products  
✅ **NOT NULL:** Critical fields like email, name, price cannot be empty  
✅ **UNIQUE:** Email addresses must be unique  
✅ **CHECK:** Prices must be positive, quantities must be valid, status must be from allowed values  
✅ **DEFAULT:** Order dates and timestamps are automatically set  
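
To see the schema at work, rows have to be inserted parents first so that every foreign key has something to point at. A minimal sketch with made-up sample data:

```sql
-- 1. Parent rows first
INSERT INTO customers (customer_id, email, name)
VALUES (1, 'ada@example.com', 'Ada Lovelace');

INSERT INTO products (product_id, name, price, stock_quantity)
VALUES (10, 'Laptop', 999.00, 5);

-- 2. Then the order, which references the customer
--    (order_date and status fall back to their defaults)
INSERT INTO orders (order_id, customer_id, total_amount)
VALUES (100, 1, 999.00);

-- 3. Finally the line item, which references both the order and the product
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
VALUES (100, 10, 1, 999.00);
```

Reversing the order (inserting the line item before its order exists) is exactly what the foreign keys are meant to reject.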

---

## 8. Best Practices

### Primary Key Best Practices

1. **Use integers (INT, BIGINT)** - Faster than strings
2. **Consider auto-increment** - Let the database generate IDs
3. **Never change primary keys** - They're meant to be permanent
4. **Keep them simple** - Prefer single column over composite when possible

```sql
-- Good: Auto-incrementing integer (Snowflake uses IDENTITY)
CREATE TABLE users (
    user_id INT PRIMARY KEY IDENTITY(1, 1),  -- Start at 1, increment by 1
    username VARCHAR(50)
);
```

### Foreign Key Best Practices

1. **Choose appropriate ON DELETE action:**
   - `CASCADE` for dependent data (order items when order is deleted)
   - `RESTRICT` for critical relationships (orders when customer is deleted)
   - `SET NULL` for optional relationships

2. **Index foreign keys** - Some databases (such as MySQL's InnoDB) do this automatically, but others (such as PostgreSQL, SQL Server, and Oracle) do not, so verify

3. **Name your constraints** - Makes error messages clearer

```sql
-- Named constraint: the CONSTRAINT clause comes before FOREIGN KEY
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id)
        REFERENCES customers(customer_id)
        ON DELETE RESTRICT
);
```

### Constraint Best Practices

1. **Use NOT NULL liberally** - If a field is always required, make it NOT NULL
2. **Use CHECK for business rules** - Enforce rules at the database level, not just in application code
3. **Use DEFAULT for common values** - Reduces errors and simplifies inserts
4. **Document your constraints** - Explain why they exist
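
One way to document a constraint is to attach the reasoning to the schema itself. The sketch below uses `COMMENT ON`, which Snowflake, PostgreSQL, and Oracle accept (MySQL uses a different syntax); the comment texts are hypothetical examples, not rules taken from this schema.

```sql
-- Record what the table holds
COMMENT ON TABLE products IS 'Catalog of sellable items; one row per product.';

-- Record why the rule on price exists (hypothetical business reason)
COMMENT ON COLUMN products.price IS 'Unit price; must be greater than 0 because free items are modeled as discounts.';
```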

---

## 9. Common Mistakes to Avoid

### Mistake 1: Forgetting Foreign Keys

```sql
-- BAD: No foreign key constraint
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT  -- No constraint! Can insert invalid customer_id
);

-- GOOD: With foreign key
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
```

### Mistake 2: Using Strings as Primary Keys

```sql
-- BAD: String primary key (slower, takes more space)
CREATE TABLE users (
    username VARCHAR(50) PRIMARY KEY,
    email VARCHAR(100)
);

-- GOOD: Integer primary key with unique constraint
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50) UNIQUE,
    email VARCHAR(100)
);
```

### Mistake 3: Not Using NOT NULL on Critical Fields

```sql
-- BAD: Price can be NULL
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10, 2)  -- What if price is NULL?
);

-- GOOD: Price is required
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) NOT NULL CHECK (price > 0)
);
```

---

## 10. Summary: The Constraint Hierarchy

**From most to least restrictive:**

1. **PRIMARY KEY** - Unique + NOT NULL + Only one per table
2. **UNIQUE** - Unique values (NULLs allowed; most engines permit more than one)
3. **NOT NULL** - Must have a value
4. **CHECK** - Must follow custom rules
5. **DEFAULT** - Provides a value if none given
6. **FOREIGN KEY** - Must reference valid row in another table
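
To review which constraints are actually declared, many engines (including Snowflake, PostgreSQL, and MySQL) expose them through `information_schema`. A sketch, assuming the four e-commerce tables from section 7; the view typically lists key constraints, while NOT NULL and DEFAULT are usually found in the column metadata instead.

```sql
-- Declared constraints for the e-commerce tables
SELECT table_name, constraint_name, constraint_type
FROM information_schema.table_constraints
WHERE UPPER(table_name) IN ('CUSTOMERS', 'PRODUCTS', 'ORDERS', 'ORDER_ITEMS')
ORDER BY table_name, constraint_type;
```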

**Remember:**
> Constraints are your database's way of saying "I won't let you store bad data." They're not suggestions—they're rules. Use them wisely, and your data will thank you.
