# Module 07: Database Design & Normalization

**Estimated Time:** 60 minutes

## Learning Objectives

By the end of this module, you will be able to:
- Understand database normalization (1NF, 2NF, 3NF)
- Design effective database schemas
- Create tables with proper constraints
- Use primary and foreign keys
- Apply constraints (UNIQUE, NOT NULL, CHECK, DEFAULT)
- Understand entity-relationship diagrams

In [None]:
# Setup
import sqlite3
import pandas as pd
from pathlib import Path

%load_ext sql

# Create a temporary database for examples
DB_PATH = Path.cwd().parent / "data" / "databases" / "design_examples.db"
DB_PATH.parent.mkdir(parents=True, exist_ok=True)  # sqlite3.connect fails if the folder is missing
conn = sqlite3.connect(DB_PATH)
%sql sqlite:///$DB_PATH

print("✓ Connected to design_examples.db")

## 1. What is Database Normalization?

**Normalization** is the process of organizing data to:
- Reduce data redundancy
- Improve data integrity
- Make databases easier to maintain

### Normal Forms
1. **First Normal Form (1NF)**: Atomic values, no repeating groups
2. **Second Normal Form (2NF)**: 1NF + No partial dependencies
3. **Third Normal Form (3NF)**: 2NF + No transitive dependencies

## 2. First Normal Form (1NF)

**Rules:**
- Each column contains atomic (indivisible) values
- Each column contains values of a single type
- Each column has a unique name
- Order of rows doesn't matter

### Example: NOT in 1NF

| student_id | name | phone_numbers |
|------------|------|---------------|
| 1 | John | 555-1234, 555-5678 |
| 2 | Jane | 555-9999 |

**Problem**: The `phone_numbers` column stores several values in one cell, so its values are not atomic.

### Example: In 1NF

| student_id | name | phone_number |
|------------|------|-------------|
| 1 | John | 555-1234 |
| 1 | John | 555-5678 |
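| 2 | Jane | 555-9999 |

This table is in 1NF, but `name` is now repeated for every phone number. The code below avoids that repetition by keeping names in `students` and phone numbers in a separate `student_phones` table.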

In [None]:
%%sql
-- Create 1NF example (drop the child table first, since it references students)
DROP TABLE IF EXISTS student_phones;
DROP TABLE IF EXISTS students;

CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE student_phones (
    phone_id INTEGER PRIMARY KEY,
    student_id INTEGER,
    phone_number TEXT NOT NULL,
    phone_type TEXT,
    FOREIGN KEY (student_id) REFERENCES students(student_id)
);
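
To see how the two-table layout behaves, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a throwaway in-memory database (the name `demo` is illustrative) so it does not touch `design_examples.db`, and it loads the sample rows from the tables above.

```python
import sqlite3

# In-memory database: an assumption for this sketch, separate from design_examples.db
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE student_phones (
    phone_id INTEGER PRIMARY KEY,
    student_id INTEGER,
    phone_number TEXT NOT NULL,
    phone_type TEXT,
    FOREIGN KEY (student_id) REFERENCES students(student_id)
);
INSERT INTO students (student_id, name) VALUES (1, 'John'), (2, 'Jane');
INSERT INTO student_phones (student_id, phone_number) VALUES
    (1, '555-1234'), (1, '555-5678'), (2, '555-9999');
""")

# Each name is stored once; the join rebuilds one row per phone number
phone_rows = demo.execute("""
    SELECT s.name, p.phone_number
    FROM students AS s
    JOIN student_phones AS p ON p.student_id = s.student_id
    ORDER BY s.student_id, p.phone_id
""").fetchall()
print(phone_rows)
demo.close()
```

The join returns the same three rows as the 1NF table above, while each student's name lives in exactly one place.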

## 3. Second Normal Form (2NF)

**Rules:**
- Must be in 1NF
- No partial dependencies (all non-key columns depend on the entire primary key)

### Example: NOT in 2NF

| order_id | product_id | product_name | quantity | price |
|----------|------------|--------------|----------|-------|
| 1 | 101 | Laptop | 2 | 1299.99 |
| 1 | 102 | Mouse | 1 | 29.99 |

**Problem**: product_name and price depend only on product_id, not on the full key (order_id, product_id)

### Solution: Separate into multiple tables

In [None]:
%%sql
-- Create 2NF example (drop the child table first, since it references products_2nf)
DROP TABLE IF EXISTS order_items_2nf;
DROP TABLE IF EXISTS products_2nf;

CREATE TABLE products_2nf (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    price REAL NOT NULL
);

CREATE TABLE order_items_2nf (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products_2nf(product_id)
);
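
The benefit of removing the partial dependency can be sketched as follows. This example assumes an in-memory database and adds a hypothetical second order (order 2) that is not in the table above, so that one product appears on two order lines.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.executescript("""
CREATE TABLE products_2nf (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    price REAL NOT NULL
);
CREATE TABLE order_items_2nf (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products_2nf(product_id)
);
INSERT INTO products_2nf (product_id, product_name, price)
    VALUES (101, 'Laptop', 1299.99), (102, 'Mouse', 29.99);
-- Order 2 is a hypothetical extra row added for illustration
INSERT INTO order_items_2nf (order_id, product_id, quantity)
    VALUES (1, 101, 2), (1, 102, 1), (2, 102, 3);
""")

# The product name is stored once, so a single UPDATE is enough
demo.execute(
    "UPDATE products_2nf SET product_name = 'Wireless Mouse' WHERE product_id = 102"
)

line_items = demo.execute("""
    SELECT oi.order_id, p.product_name, oi.quantity
    FROM order_items_2nf AS oi
    JOIN products_2nf AS p ON p.product_id = oi.product_id
    ORDER BY oi.order_id, oi.product_id
""").fetchall()
print(line_items)
demo.close()
```

Both order lines for product 102 pick up the new name, which would have required updating every matching row in the original single-table design.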

## 4. Third Normal Form (3NF)

**Rules:**
- Must be in 2NF
- No transitive dependencies (non-key columns should not depend on other non-key columns)

### Example: NOT in 3NF

| employee_id | name | department_id | department_name | department_location |
|-------------|------|---------------|-----------------|---------------------|
| 1 | John | 10 | Sales | New York |
| 2 | Jane | 20 | Marketing | Los Angeles |

**Problem**: `department_name` and `department_location` depend on `department_id`, a non-key column, so they depend on `employee_id` only transitively.

### Solution:

In [None]:
%%sql
-- Create 3NF example (drop the child table first, since it references departments_3nf)
DROP TABLE IF EXISTS employees_3nf;
DROP TABLE IF EXISTS departments_3nf;

CREATE TABLE departments_3nf (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL,
    department_location TEXT
);

CREATE TABLE employees_3nf (
    employee_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    FOREIGN KEY (department_id) REFERENCES departments_3nf(department_id)
);
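
A short sketch of why this split helps, again assuming an in-memory database. The third employee ("Sam") and the new location ("Boston") are hypothetical values added so that one department has more than one employee.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.executescript("""
CREATE TABLE departments_3nf (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL,
    department_location TEXT
);
CREATE TABLE employees_3nf (
    employee_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    FOREIGN KEY (department_id) REFERENCES departments_3nf(department_id)
);
INSERT INTO departments_3nf VALUES
    (10, 'Sales', 'New York'), (20, 'Marketing', 'Los Angeles');
-- 'Sam' is a hypothetical extra employee in the Sales department
INSERT INTO employees_3nf VALUES (1, 'John', 10), (2, 'Jane', 20), (3, 'Sam', 10);
""")

# Moving a department touches exactly one row
demo.execute(
    "UPDATE departments_3nf SET department_location = 'Boston' WHERE department_id = 10"
)

locations = demo.execute("""
    SELECT e.name, d.department_location
    FROM employees_3nf AS e
    JOIN departments_3nf AS d ON d.department_id = e.department_id
    ORDER BY e.employee_id
""").fetchall()
print(locations)
demo.close()
```

Every Sales employee now reports the new location, and there is no way for two employees in the same department to disagree about where it is.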

## 5. Primary Keys

A **primary key** uniquely identifies each row in a table.

**Properties:**
- Must be unique
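- Cannot be NULL (in SQLite, declare key columns `NOT NULL` explicitly: for historical reasons it allows NULL in primary key columns of ordinary tables unless the column is an `INTEGER PRIMARY KEY`)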
- One primary key per table
- Can be single or composite (multiple columns)

In [None]:
%%sql
-- Single-column primary key
DROP TABLE IF EXISTS users;

CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL UNIQUE,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
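
As a rough illustration of how this key is populated, the sketch below inserts two rows without supplying `user_id`. It assumes an in-memory database, and the usernames and email addresses are made up for the example.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL UNIQUE,
        email TEXT NOT NULL UNIQUE,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

# user_id and created_at are omitted, so SQLite fills them in
new_ids = []
for username, email in [("alice", "alice@example.com"), ("bob", "bob@example.com")]:
    cur = demo.execute(
        "INSERT INTO users (username, email) VALUES (?, ?)", (username, email)
    )
    new_ids.append(cur.lastrowid)  # key assigned to the row just inserted

created_at = demo.execute(
    "SELECT created_at FROM users WHERE user_id = 1"
).fetchone()[0]
print(new_ids)
print(created_at)  # UTC timestamp text in 'YYYY-MM-DD HH:MM:SS' form
demo.close()
```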

In [None]:
%%sql
-- Composite primary key
DROP TABLE IF EXISTS course_enrollments;

CREATE TABLE course_enrollments (
    student_id INTEGER NOT NULL,
    course_id INTEGER NOT NULL,
    enrollment_date TEXT NOT NULL,
    grade TEXT,
    PRIMARY KEY (student_id, course_id)
);
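
The effect of a composite key can be sketched like this. The example assumes an in-memory database, uses made-up IDs and dates, and declares the key columns `NOT NULL` as an assumption, since SQLite would otherwise accept NULL in them.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.execute("""
    CREATE TABLE course_enrollments (
        student_id INTEGER NOT NULL,
        course_id INTEGER NOT NULL,
        enrollment_date TEXT NOT NULL,
        grade TEXT,
        PRIMARY KEY (student_id, course_id)
    )
""")

insert_sql = (
    "INSERT INTO course_enrollments (student_id, course_id, enrollment_date) "
    "VALUES (?, ?, ?)"
)
demo.execute(insert_sql, (1, 101, "2024-01-15"))
demo.execute(insert_sql, (1, 102, "2024-01-15"))  # same student, different course: allowed

try:
    demo.execute(insert_sql, (1, 101, "2024-02-01"))  # same (student, course) pair again
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("Rejected:", exc)

enrollment_count = demo.execute(
    "SELECT COUNT(*) FROM course_enrollments"
).fetchone()[0]
print(enrollment_count)
demo.close()
```

Neither column is unique on its own; only the combination is, which is what lets a student enroll in many courses but in each course only once.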

## 6. Foreign Keys

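A **foreign key** is a column (or set of columns) whose values must match a primary key or unique key in another table, linking the two tables together.

**Benefits:**
- Enforces referential integrity
- Prevents orphaned records
- Documents relationships between tables

**Note:** SQLite does not enforce foreign keys by default. Enforcement must be enabled on each connection with `PRAGMA foreign_keys = ON`.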

In [None]:
%%sql
-- Example with foreign keys (drop the child table first, since it references authors)
DROP TABLE IF EXISTS books;
DROP TABLE IF EXISTS authors;

CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    author_name TEXT NOT NULL,
    country TEXT
);

CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL,
    published_year INTEGER,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);
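
Here is a hedged sketch of what `ON DELETE RESTRICT` and `ON UPDATE CASCADE` do in practice. It assumes an in-memory database with foreign key enforcement switched on through `PRAGMA foreign_keys = ON` (SQLite leaves it off by default). The helper `is_rejected` and the sample author and book are hypothetical names introduced for this example.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys without this
demo.executescript("""
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    author_name TEXT NOT NULL,
    country TEXT
);
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL,
    published_year INTEGER,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);
INSERT INTO authors (author_id, author_name) VALUES (1, 'Example Author');
INSERT INTO books (title, author_id) VALUES ('Example Title', 1);
""")


def is_rejected(sql):
    """Hypothetical helper: True if the statement violates a constraint."""
    try:
        demo.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A book pointing at a missing author would be an orphan
orphan_rejected = is_rejected(
    "INSERT INTO books (title, author_id) VALUES ('Orphan', 999)"
)
# RESTRICT blocks deleting an author who still has books
delete_rejected = is_rejected("DELETE FROM authors WHERE author_id = 1")

# CASCADE propagates a changed author_id to the books table
demo.execute("UPDATE authors SET author_id = 5 WHERE author_id = 1")
book_author_id = demo.execute("SELECT author_id FROM books").fetchone()[0]
print(orphan_rejected, delete_rejected, book_author_id)
demo.close()
```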

## 7. Constraints

Constraints enforce rules on data.

### Common Constraints:
- **NOT NULL**: Column cannot be null
- **UNIQUE**: All values must be unique
- **CHECK**: Values must satisfy a condition
- **DEFAULT**: Default value if none provided
- **PRIMARY KEY**: Unique identifier
- **FOREIGN KEY**: Links to another table

In [None]:
%%sql
-- Table with various constraints
DROP TABLE IF EXISTS products_constrained;

CREATE TABLE products_constrained (
    product_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name TEXT NOT NULL,
    sku TEXT NOT NULL UNIQUE,
    price REAL NOT NULL CHECK (price >= 0),
    discount_percent REAL DEFAULT 0 CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER DEFAULT 0 CHECK (stock_quantity >= 0),
    is_active INTEGER DEFAULT 1 CHECK (is_active IN (0, 1)),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

In [None]:
%%sql
-- Insert valid rows that satisfy every constraint
INSERT INTO products_constrained (product_name, sku, price, discount_percent)
VALUES 
    ('Laptop', 'LAP-001', 1299.99, 10),
    ('Mouse', 'MOU-001', 29.99, 0),
    ('Keyboard', 'KEY-001', 89.99, 15);

SELECT * FROM products_constrained;
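
The inserts above all succeed. To watch the constraints reject bad data, the following sketch tries several invalid rows against the same table definition. It assumes an in-memory database, and the invalid rows are made-up values chosen to break one rule each.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.execute("""
    CREATE TABLE products_constrained (
        product_id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_name TEXT NOT NULL,
        sku TEXT NOT NULL UNIQUE,
        price REAL NOT NULL CHECK (price >= 0),
        discount_percent REAL DEFAULT 0 CHECK (discount_percent BETWEEN 0 AND 100),
        stock_quantity INTEGER DEFAULT 0 CHECK (stock_quantity >= 0),
        is_active INTEGER DEFAULT 1 CHECK (is_active IN (0, 1)),
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

insert_sql = (
    "INSERT INTO products_constrained (product_name, sku, price, discount_percent) "
    "VALUES (?, ?, ?, ?)"
)
demo.execute(insert_sql, ("Laptop", "LAP-001", 1299.99, 10))  # valid row

# Each hypothetical row below breaks exactly one constraint
bad_rows = {
    "negative price (CHECK)": ("Cable", "CAB-001", -5.0, 0),
    "duplicate sku (UNIQUE)": ("Laptop Pro", "LAP-001", 1999.99, 0),
    "discount over 100 (CHECK)": ("Monitor", "MON-001", 249.99, 150),
    "missing name (NOT NULL)": (None, "NUL-001", 9.99, 0),
}

results = {}
for label, row in bad_rows.items():
    try:
        demo.execute(insert_sql, row)
        results[label] = "accepted"
    except sqlite3.IntegrityError as exc:
        results[label] = f"rejected: {exc}"
    print(label, "->", results[label])

# Columns left out of the INSERT fall back to their DEFAULT values
defaults = demo.execute(
    "SELECT stock_quantity, is_active FROM products_constrained WHERE sku = 'LAP-001'"
).fetchone()
row_count = demo.execute("SELECT COUNT(*) FROM products_constrained").fetchone()[0]
print(defaults, row_count)
demo.close()
```

Only the valid row is stored; each invalid row raises `sqlite3.IntegrityError` before it reaches the table.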

## 8. Real-World Schema Design

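Let's design a blog system database with users, posts, comments, and tags. The `post_tags` table resolves the many-to-many relationship between posts and tags.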

In [None]:
%%sql
-- Blog System Database Design

-- Drop child tables before the tables they reference
DROP TABLE IF EXISTS post_tags;
DROP TABLE IF EXISTS blog_comments;
DROP TABLE IF EXISTS blog_tags;
DROP TABLE IF EXISTS blog_posts;
DROP TABLE IF EXISTS blog_users;

-- Users table
CREATE TABLE blog_users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    is_active INTEGER DEFAULT 1 CHECK (is_active IN (0, 1))
);

-- Posts table
CREATE TABLE blog_posts (
    post_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    published_at TEXT,
    is_published INTEGER DEFAULT 0 CHECK (is_published IN (0, 1)),
    view_count INTEGER DEFAULT 0,
    FOREIGN KEY (user_id) REFERENCES blog_users(user_id)
);

-- Comments table
CREATE TABLE blog_comments (
    comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    post_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    comment_text TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (post_id) REFERENCES blog_posts(post_id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES blog_users(user_id)
);

-- Tags table
CREATE TABLE blog_tags (
    tag_id INTEGER PRIMARY KEY AUTOINCREMENT,
    tag_name TEXT NOT NULL UNIQUE
);

-- Many-to-many relationship between posts and tags
CREATE TABLE post_tags (
    post_id INTEGER,
    tag_id INTEGER,
    PRIMARY KEY (post_id, tag_id),
    FOREIGN KEY (post_id) REFERENCES blog_posts(post_id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES blog_tags(tag_id) ON DELETE CASCADE
);
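
To show how this schema is queried, here is a minimal sketch that looks up posts by tag and then deletes a post to trigger `ON DELETE CASCADE`. It assumes an in-memory database with `PRAGMA foreign_keys = ON`, omits the timestamp and flag columns for brevity, and uses made-up sample rows.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway database for this sketch
demo.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to fire
demo.executescript("""
CREATE TABLE blog_users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE blog_posts (
    post_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    FOREIGN KEY (user_id) REFERENCES blog_users(user_id)
);
CREATE TABLE blog_comments (
    comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    post_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    comment_text TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES blog_posts(post_id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES blog_users(user_id)
);
CREATE TABLE blog_tags (
    tag_id INTEGER PRIMARY KEY AUTOINCREMENT,
    tag_name TEXT NOT NULL UNIQUE
);
CREATE TABLE post_tags (
    post_id INTEGER,
    tag_id INTEGER,
    PRIMARY KEY (post_id, tag_id),
    FOREIGN KEY (post_id) REFERENCES blog_posts(post_id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES blog_tags(tag_id) ON DELETE CASCADE
);
-- Hypothetical sample data
INSERT INTO blog_users (username, email, password_hash)
    VALUES ('alice', 'alice@example.com', 'not-a-real-hash');
INSERT INTO blog_posts (user_id, title, content) VALUES
    (1, 'Intro to SQL', 'Draft text'), (1, 'Normalization Tips', 'Draft text');
INSERT INTO blog_tags (tag_name) VALUES ('sql'), ('design');
INSERT INTO post_tags (post_id, tag_id) VALUES (1, 1), (2, 1), (2, 2);
INSERT INTO blog_comments (post_id, user_id, comment_text) VALUES (2, 1, 'Nice post');
""")

# Many-to-many lookup: posts -> post_tags -> blog_tags
sql_posts = [row[0] for row in demo.execute("""
    SELECT p.title
    FROM blog_posts AS p
    JOIN post_tags AS pt ON pt.post_id = p.post_id
    JOIN blog_tags AS t ON t.tag_id = pt.tag_id
    WHERE t.tag_name = ?
    ORDER BY p.post_id
""", ("sql",))]

# Deleting a post also removes its tag links and comments
demo.execute("DELETE FROM blog_posts WHERE post_id = 2")
remaining_links = demo.execute(
    "SELECT post_id, tag_id FROM post_tags ORDER BY post_id, tag_id"
).fetchall()
remaining_comments = demo.execute("SELECT COUNT(*) FROM blog_comments").fetchone()[0]
print(sql_posts, remaining_links, remaining_comments)
demo.close()
```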

## 9. Exercises

### Exercise 1: Design a Library System
Create tables for a library management system with:
- Members (member_id, name, email, join_date)
- Books (book_id, title, author, isbn, copies_available)
- Loans (loan_id, member_id, book_id, loan_date, due_date, return_date)

Include appropriate constraints and foreign keys.

In [None]:
%%sql
-- Your code here

### Exercise 2: Normalize a Denormalized Table
Given this denormalized table, split it into proper 3NF tables:

| order_id | customer_name | customer_email | product_name | product_category | quantity | price |
|----------|---------------|----------------|--------------|------------------|----------|-------|
| 1 | John Doe | john@email.com | Laptop | Electronics | 1 | 1299.99 |
| 1 | John Doe | john@email.com | Mouse | Electronics | 2 | 29.99 |

In [None]:
%%sql
-- Your code here

### Exercise 3: Add Constraints
Create an `employees` table with these constraints:
- employee_id: PRIMARY KEY
- email: UNIQUE, NOT NULL
- salary: CHECK (salary > 0)
- hire_date: NOT NULL
- is_active: DEFAULT 1, CHECK (is_active IN (0, 1))

In [None]:
%%sql
-- Your code here

## Summary

In this module, you learned:
- ✓ Database normalization (1NF, 2NF, 3NF)
- ✓ Primary keys and their importance
- ✓ Foreign keys for referential integrity
- ✓ Various constraints (NOT NULL, UNIQUE, CHECK, DEFAULT)
- ✓ How to design well-structured databases
- ✓ Real-world schema design patterns

**Key Takeaways:**
- Normalization reduces redundancy and improves data integrity
- Primary keys uniquely identify rows
- Foreign keys maintain relationships and prevent orphaned data
- Constraints enforce business rules at the database level

**Next:** Module 08 - Advanced Queries

In [None]:
conn.close()