# Chapter 16: Databases and ORM

Most applications require persistent storage—data that survives program termination. While files suffice for simple needs, complex applications demand robust data management: querying capabilities, transaction integrity, concurrent access, and data relationships. Relational databases provide these features, and Object-Relational Mapping (ORM) layers bridge the object-oriented world of Python with the relational world of SQL.

This chapter explores SQLAlchemy, the most widely used database toolkit for Python (it is a third-party package, not part of the standard library). We begin with foundational relational concepts, progress through SQLAlchemy Core for explicit SQL control, and culminate in the ORM for elegant object persistence. You will learn to design schemas, express queries in Python code, manage relationships, and evolve database schemas with migrations using Alembic.

## 16.1 SQL Basics: Relational Database Concepts

Before using ORMs, understanding the underlying relational model is essential. Relational databases organize data into tables (relations) with rows (tuples) and columns (attributes), linked through keys.

### The Relational Model

```sql
-- Users Table (Entity)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,           -- Primary Key: Unique identifier
    username VARCHAR(50) NOT NULL,   -- NOT NULL constraint
    email VARCHAR(100) UNIQUE,       -- UNIQUE constraint
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Articles Table (Entity with relationship to Users)
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    content TEXT,
    author_id INTEGER REFERENCES users(id),  -- Foreign Key: Links to users table
    status VARCHAR(20) DEFAULT 'draft',
    published_at TIMESTAMP
);

-- Tags Table
CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

-- Article-Tags Junction Table (Many-to-Many Relationship)
CREATE TABLE article_tags (
    article_id INTEGER REFERENCES articles(id) ON DELETE CASCADE,
    tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (article_id, tag_id)  -- Composite Primary Key
);
```

### Key Concepts

**Primary Key:** A unique identifier for each row. Typically an auto-incrementing integer (`SERIAL` in PostgreSQL; `INTEGER PRIMARY KEY` in SQLite, optionally with `AUTOINCREMENT`) or a UUID.

**Foreign Key:** A column that references the primary key of another table, establishing relationships. Enforces referential integrity (cannot reference non-existent rows).

**Constraints:**
*   `NOT NULL`: Column must have a value
*   `UNIQUE`: No two rows can have the same value
*   `CHECK`: Custom validation (e.g., `CHECK (age >= 0)`)
*   `DEFAULT`: Value used if none provided
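
To see these constraints reject bad data, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The trimmed schema, the `age` column, and the `violates` helper are illustrative assumptions, not part of the chapter's blog schema. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        email TEXT UNIQUE,
        age INTEGER CHECK (age >= 0)
    );
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER REFERENCES users(id)
    );
""")
conn.execute(
    "INSERT INTO users (username, email, age) VALUES ('alice', 'alice@example.com', 30)"
)


def violates(sql: str) -> bool:
    """Return True if the database rejects the statement with a constraint error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    "not_null": violates("INSERT INTO users (username) VALUES (NULL)"),
    "unique": violates(
        "INSERT INTO users (username, email) VALUES ('bob', 'alice@example.com')"
    ),
    "check": violates("INSERT INTO users (username, age) VALUES ('carol', -1)"),
    "foreign_key": violates(
        "INSERT INTO articles (title, author_id) VALUES ('Orphan', 999)"
    ),
}
print(results)
```

Each entry is `True` because the database, not the application code, refuses the invalid row. That is the practical value of declaring constraints in the schema.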

### SQL Operations (CRUD)

```sql
-- Create (INSERT)
INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com');

-- Read (SELECT)
SELECT * FROM users;
SELECT username, email FROM users WHERE id = 1;
SELECT * FROM users WHERE username LIKE 'a%';  -- Starts with 'a'

-- Update (UPDATE)
UPDATE users SET email = 'new@example.com' WHERE id = 1;

-- Delete (DELETE)
DELETE FROM users WHERE id = 1;
```

### Joins: Combining Tables

Joins retrieve related data across tables:

```sql
-- INNER JOIN: Only matching rows from both tables
SELECT articles.title, users.username
FROM articles
INNER JOIN users ON articles.author_id = users.id;

-- LEFT JOIN: All articles, even if no author (author_id is NULL)
SELECT articles.title, users.username
FROM articles
LEFT JOIN users ON articles.author_id = users.id;
```
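
The difference between the two joins is easiest to see on real rows. This sketch runs both queries against an in-memory SQLite database through the standard `sqlite3` module; the sample rows and the simplified two-column tables are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users (id, username) VALUES (1, 'alice');
    INSERT INTO articles (title, author_id) VALUES ('Intro to SQL', 1);
    INSERT INTO articles (title, author_id) VALUES ('Unsigned draft', NULL);
""")

# INNER JOIN keeps only articles that have a matching user row.
inner_rows = conn.execute("""
    SELECT articles.title, users.username
    FROM articles
    INNER JOIN users ON articles.author_id = users.id
    ORDER BY articles.id
""").fetchall()

# LEFT JOIN keeps every article; missing authors come back as NULL (None in Python).
left_rows = conn.execute("""
    SELECT articles.title, users.username
    FROM articles
    LEFT JOIN users ON articles.author_id = users.id
    ORDER BY articles.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join returns one row and the left join returns two, with `None` standing in for the absent author.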

### Transactions (ACID)

Databases group operations into **transactions** that satisfy ACID properties:

*   **Atomicity**: All operations succeed or all fail (no partial state)
*   **Consistency**: Database moves from one valid state to another
*   **Isolation**: Concurrent transactions don't interfere
*   **Durability**: Committed changes persist even after power loss

```sql
BEGIN TRANSACTION;

-- Transfer money from account 1 to account 2
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If any error occurs:
-- ROLLBACK;  -- Undo all changes

COMMIT;  -- Save changes permanently
```
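
Atomicity can be observed directly from Python. The sketch below is one possible illustration using the standard `sqlite3` module: the `accounts` table, its `CHECK (balance >= 0)` rule, and the `transfer` and `balances` helpers are assumptions added for the demo. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts (id, balance) VALUES (1, 50), (2, 0);
""")


def transfer(amount: int) -> bool:
    """Move `amount` from account 1 to account 2; return True on success."""
    try:
        with conn:  # commit on success, rollback if the block raises
            # Credit first so that a failing debit leaves work to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,)
            )
    except sqlite3.IntegrityError:
        return False
    return True


def balances() -> list:
    """Return (id, balance) pairs ordered by account id."""
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


ok_small = transfer(30)    # both updates succeed and are committed together
after_small = balances()
ok_large = transfer(100)   # the debit violates the CHECK, so the credit is undone too
after_large = balances()
print(ok_small, after_small)
print(ok_large, after_large)
```

The second transfer fails on its debit step, and the earlier credit in the same transaction disappears with it, so the balances are unchanged.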

## 16.2 SQLAlchemy Core: Expressive SQL

SQLAlchemy Core provides a Pythonic SQL expression language. It generates SQL while remaining close to the underlying relational model, offering fine-grained control without string concatenation.

### Engine and Connection

The **Engine** manages connection pooling and dialect-specific SQL generation:

```python
from sqlalchemy import create_engine, text
from typing import List, Dict, Any

# Engine configuration
# Format: dialect+driver://username:password@host:port/database
DATABASE_URL = "postgresql+psycopg2://user:password@localhost/blog"
# DATABASE_URL = "sqlite:///blog.db"  # SQLite (file-based)

# Create engine with connection pooling
engine = create_engine(
    DATABASE_URL,
    pool_size=5,           # Number of connections to keep open
    max_overflow=10,       # Allow up to 10 connections beyond pool_size
    pool_pre_ping=True,    # Check connection health before use
    echo=True              # Log generated SQL (debug mode)
)

# Basic execution
with engine.connect() as conn:
    # Execute raw SQL
    result = conn.execute(text("SELECT * FROM users"))
    
    # Iterate over result rows (Row objects, which behave like named tuples)
    for row in result:
        print(f"User: {row.username}, Email: {row.email}")
    
    # Parameterized queries (prevents SQL injection)
    result = conn.execute(
        text("SELECT * FROM users WHERE username = :name"),
        {"name": "alice"}
    )
```

### Metadata and Tables

Define schema using Python objects:

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, DateTime, Text, ForeignKey
from datetime import datetime

# Metadata catalog (collection of Table objects)
metadata = MetaData()

# Define tables
users = Table(
    'users',
    metadata,
    Column('id', Integer, primary_key=True),
    Column('username', String(50), nullable=False, unique=True),
    Column('email', String(100), nullable=False),
    Column('created_at', DateTime, default=datetime.utcnow)
)

articles = Table(
    'articles',
    metadata,
    Column('id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('content', Text),
    Column('author_id', Integer, ForeignKey('users.id', ondelete='CASCADE')),
    Column('status', String(20), default='draft'),
    Column('published_at', DateTime, nullable=True)
)

# Create tables in database
metadata.create_all(engine)
# Drop all tables: metadata.drop_all(engine)
```
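
It can help to inspect the DDL that a `Table` definition produces before touching a real database. The following sketch, assuming SQLAlchemy 1.4 or later is installed, compiles a `CreateTable` construct against two dialects without opening any connection. The names `demo_metadata` and `demo_users` are illustrative and deliberately separate from the chapter's `metadata` and `users`.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql, sqlite
from sqlalchemy.schema import CreateTable

demo_metadata = MetaData()
demo_users = Table(
    "users",
    demo_metadata,
    Column("id", Integer, primary_key=True),
    Column("username", String(50), nullable=False, unique=True),
)

# Compiling against a dialect renders SQL text only; no database driver is needed.
pg_ddl = str(CreateTable(demo_users).compile(dialect=postgresql.dialect()))
sqlite_ddl = str(CreateTable(demo_users).compile(dialect=sqlite.dialect()))

print(pg_ddl)
print(sqlite_ddl)
```

Comparing the two outputs shows where dialects diverge. On PostgreSQL the integer primary key is typically rendered as `SERIAL`, while SQLite renders a plain `INTEGER` column.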

### Expression Language (Pythonic SQL)

Construct queries using Python operators and methods:

```python
from sqlalchemy import select, insert, update, delete, or_, and_, func

# SELECT statements
# SELECT id, username FROM users WHERE id > 5
stmt = select(users.c.id, users.c.username).where(users.c.id > 5)

with engine.connect() as conn:
    result = conn.execute(stmt)
    for row in result:
        print(row)

# INSERT statements
# INSERT INTO users (username, email) VALUES (:username, :email)
insert_stmt = insert(users).values(
    username='bob',
    email='bob@example.com'
)

with engine.connect() as conn:
    conn.execute(insert_stmt)
    conn.commit()  # Explicit commit required

# UPDATE statements
# UPDATE users SET email = :email WHERE username = :username
update_stmt = (
    update(users)
    .where(users.c.username == 'bob')
    .values(email='bob.smith@example.com')
)

# DELETE statements
delete_stmt = delete(users).where(users.c.id == 10)

# Complex WHERE clauses
complex_query = select(users).where(
    and_(
        users.c.username.like('a%'),
        or_(
            users.c.email.contains('example'),
            users.c.id.in_([1, 2, 3])
        )
    )
)

# ORDER BY, LIMIT, OFFSET
ordered_query = (
    select(users)
    .order_by(users.c.created_at.desc())
    .limit(10)
    .offset(20)  # Pagination: page 3 (skip 20, take 10)
)

# Aggregate functions
count_query = select(func.count(users.c.id)).where(users.c.username.like('a%'))

# JOINs
join_query = (
    select(articles.c.title, users.c.username)
    .select_from(articles.join(users, articles.c.author_id == users.c.id))
    .where(articles.c.status == 'published')
)
```
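
The statements above are only constructed, not executed. As a runnable end-to-end sketch, the example below builds a small schema in an in-memory SQLite database, inserts rows, and runs a join with an aggregate. The names `demo_engine`, `demo_md`, `users_t`, and `articles_t`, along with the sample data, are assumptions chosen so the example does not depend on a PostgreSQL server.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table,
    create_engine, func, insert, select,
)

demo_engine = create_engine("sqlite:///:memory:")
demo_md = MetaData()

users_t = Table(
    "users", demo_md,
    Column("id", Integer, primary_key=True),
    Column("username", String(50), nullable=False),
)
articles_t = Table(
    "articles", demo_md,
    Column("id", Integer, primary_key=True),
    Column("title", String(200), nullable=False),
    Column("author_id", Integer, ForeignKey("users.id")),
    Column("status", String(20)),
)
demo_md.create_all(demo_engine)

# engine.begin() commits automatically when the block exits without error.
with demo_engine.begin() as conn:
    conn.execute(insert(users_t), [
        {"id": 1, "username": "alice"},
        {"id": 2, "username": "bob"},
    ])
    conn.execute(insert(articles_t), [
        {"title": "Core basics", "author_id": 1, "status": "published"},
        {"title": "Unfinished", "author_id": 1, "status": "draft"},
        {"title": "Joins", "author_id": 2, "status": "published"},
    ])

# Count published articles per user: JOIN + WHERE + GROUP BY + ORDER BY.
stmt = (
    select(users_t.c.username, func.count(articles_t.c.id).label("published"))
    .join_from(users_t, articles_t, articles_t.c.author_id == users_t.c.id)
    .where(articles_t.c.status == "published")
    .group_by(users_t.c.username)
    .order_by(users_t.c.username)
)

with demo_engine.connect() as conn:
    rows = [tuple(row) for row in conn.execute(stmt)]

print(stmt)  # the SQL text rendered for inspection
print(rows)
```

Printing a statement is a cheap way to check what SQLAlchemy will send before executing it.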

### Transactions

```python
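from decimal import Decimal

from sqlalchemy import Numeric

# This example assumes an `accounts` table; the minimal definition here is illustrative.
accounts = Table(
    'accounts',
    metadata,
    Column('id', Integer, primary_key=True),
    Column('balance', Numeric(12, 2), nullable=False)
)

def transfer_funds(from_id: int, to_id: int, amount: Decimal) -> None: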
    """Perform transaction with automatic rollback on error."""
    with engine.begin() as conn:  # Auto-commits on success, rolls back on exception
        # Debit
        conn.execute(
            update(accounts)
            .where(accounts.c.id == from_id)
            .values(balance=accounts.c.balance - amount)
        )
        
        # Credit
        conn.execute(
            update(accounts)
            .where(accounts.c.id == to_id)
            .values(balance=accounts.c.balance + amount)
        )
        # Automatic commit when exiting context
```
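
To confirm that `engine.begin()` really discards partial work, here is a small sketch against in-memory SQLite. The `demo_accounts` table, the `InsufficientFunds` exception, and the balance check are hypothetical additions for the demonstration; they are not part of `transfer_funds` above.

```python
from sqlalchemy import (
    Column, Integer, MetaData, Table, create_engine, insert, select, update,
)

demo_engine = create_engine("sqlite:///:memory:")
demo_md = MetaData()
demo_accounts = Table(
    "accounts", demo_md,
    Column("id", Integer, primary_key=True),
    Column("balance", Integer, nullable=False),
)
demo_md.create_all(demo_engine)

with demo_engine.begin() as conn:
    conn.execute(insert(demo_accounts), [
        {"id": 1, "balance": 100},
        {"id": 2, "balance": 0},
    ])


class InsufficientFunds(Exception):
    """Raised when a transfer would overdraw the source account."""


def transfer(from_id: int, to_id: int, amount: int) -> None:
    with demo_engine.begin() as conn:
        # Credit, then debit, inside a single transaction.
        conn.execute(
            update(demo_accounts)
            .where(demo_accounts.c.id == to_id)
            .values(balance=demo_accounts.c.balance + amount)
        )
        conn.execute(
            update(demo_accounts)
            .where(demo_accounts.c.id == from_id)
            .values(balance=demo_accounts.c.balance - amount)
        )
        remaining = conn.scalar(
            select(demo_accounts.c.balance).where(demo_accounts.c.id == from_id)
        )
        if remaining < 0:
            # Leaving the block through an exception rolls back both updates.
            raise InsufficientFunds(f"account {from_id} would go negative")


transfer(1, 2, 40)        # committed
try:
    transfer(1, 2, 500)   # rolled back
except InsufficientFunds:
    pass

with demo_engine.connect() as conn:
    final = [
        tuple(row)
        for row in conn.execute(
            select(demo_accounts.c.id, demo_accounts.c.balance)
            .order_by(demo_accounts.c.id)
        )
    ]
print(final)
```

Only the first transfer is visible afterwards. The credit made by the failed transfer never reaches the database.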

## 16.3 SQLAlchemy ORM: Declarative Models

The ORM maps Python classes to database tables, allowing you to work with objects rather than raw SQL. SQLAlchemy 2.0+ uses a unified query interface with `select()`, bridging Core and ORM seamlessly.

### Declarative Mapping

```python
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
from sqlalchemy import String, Integer, ForeignKey, DateTime, Text
from typing import List, Optional
from datetime import datetime

class Base(DeclarativeBase):
    """Base class for all ORM models."""
    pass

class User(Base):
    """User model mapped to 'users' table."""
    
    __tablename__ = 'users'
    
    # Modern annotation-based mapping (SQLAlchemy 2.0+)
    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str] = mapped_column(String(50), unique=True)
    email: Mapped[str] = mapped_column(String(100))
    created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
    
    # Relationship: One User has many Articles
    articles: Mapped[List['Article']] = relationship(
        back_populates='author',
        cascade='all, delete-orphan'  # Delete articles when user deleted
    )
    
    def __repr__(self) -> str:
        return f"User(id={self.id}, username='{self.username}')"

class Article(Base):
    """Article model."""
    
    __tablename__ = 'articles'
    
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    content: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    status: Mapped[str] = mapped_column(String(20), default='draft')
    author_id: Mapped[int] = mapped_column(ForeignKey('users.id'))
    published_at: Mapped[Optional[datetime]] = mapped_column(nullable=True)
    
    # Relationship: Many Articles belong to One User
    author: Mapped['User'] = relationship(back_populates='articles')
    
    # Many-to-Many Relationship
    tags: Mapped[List['Tag']] = relationship(
        secondary='article_tags',
        back_populates='articles'
    )

class Tag(Base):
    __tablename__ = 'tags'
    
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), unique=True)
    
    articles: Mapped[List['Article']] = relationship(
        secondary='article_tags',
        back_populates='tags'
    )

# Association table for Many-to-Many (no ORM class needed for simple junction)
from sqlalchemy import Column, Table

article_tags = Table(
    'article_tags',
    Base.metadata,
    Column('article_id', Integer, ForeignKey('articles.id', ondelete='CASCADE'), primary_key=True),
    Column('tag_id', Integer, ForeignKey('tags.id', ondelete='CASCADE'), primary_key=True)
)
```

### The Session: Unit of Work

The `Session` manages object identity, tracks changes, and coordinates transactions:

```python
from sqlalchemy.orm import Session, sessionmaker

# Session factory
SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)

# Usage pattern
def create_user(username: str, email: str) -> User:
    """Create and persist a new user."""
    with Session(engine) as session:
        # Create object
        new_user = User(username=username, email=email)
        
        # Add to session (staged for insertion)
        session.add(new_user)
        
        # Flush sends SQL to database, assigns ID
        session.flush()
        print(f"User ID: {new_user.id}")
        
        # Commit transaction
        session.commit()
        
        # Refresh to get server-generated values
        session.refresh(new_user)
        return new_user

def get_or_create_tag(name: str) -> Tag:
    """Get existing tag or create new one."""
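    # SessionLocal sets expire_on_commit=False, so the returned tag stays readable
    # after commit; with a plain Session(engine) its attributes would be expired.
    with SessionLocal() as session: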
        tag = session.scalar(select(Tag).where(Tag.name == name))
        if not tag:
            tag = Tag(name=name)
            session.add(tag)
            session.commit()
        return tag

# Querying
from sqlalchemy import select

def list_published_articles() -> List[Article]:
    """Query articles using 2.0 style."""
    with Session(engine) as session:
        # select() only builds the statement; it runs when the session executes it
        stmt = (
            select(Article)
            .where(Article.status == 'published')
            .order_by(Article.published_at.desc())
        )
        
        # scalars() executes the statement and yields ORM objects; .all() collects them into a list
        articles = session.scalars(stmt).all()
        return list(articles)

def get_article_with_author(article_id: int) -> Optional[Article]:
    """Load article with author in single query (Eager Loading)."""
    from sqlalchemy.orm import joinedload
    
    with Session(engine) as session:
        stmt = (
            select(Article)
            .options(joinedload(Article.author))  # Prevent N+1 queries
            .where(Article.id == article_id)
        )
        return session.scalar(stmt)

# Updating
def publish_article(article_id: int) -> None:
    """Modify and persist changes."""
    with Session(engine) as session:
        article = session.scalar(select(Article).where(Article.id == article_id))
        if article:
            article.status = 'published'
            article.published_at = datetime.utcnow()
            # No explicit update needed - session tracks changes
            session.commit()  # Automatic dirty checking

# Deleting
def delete_user(user_id: int) -> None:
    """Delete user and cascade to articles."""
    with Session(engine) as session:
        user = session.scalar(select(User).where(User.id == user_id))
        if user:
            session.delete(user)
            session.commit()
```

### Relationships and Loading Strategies

Understanding how SQLAlchemy loads related objects is crucial for performance:

```python
from sqlalchemy.orm import selectinload, joinedload, subqueryload

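# Problem: N+1 Query Issue
# (these snippets assume an open `session`, e.g. inside `with Session(engine) as session:`)
# One query fetches N articles, then each lazy load of `author` issues another query
articles = session.scalars(select(Article)).all()
for article in articles:
    print(article.author.username)  # Lazy load: extra query for each author not yet in the session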

# Solution 1: Joined Eager Loading (JOIN in single query)
stmt = (
    select(Article)
    .options(joinedload(Article.author))
)
articles = session.scalars(stmt).all()
# Generates roughly: SELECT articles.*, users.* FROM articles LEFT OUTER JOIN users ...
# (pass innerjoin=True to joinedload() to get an INNER JOIN instead)

# Solution 2: Select In Loading (Separate IN query)
stmt = (
    select(Article)
    .options(selectinload(Article.tags))  # Load all tags in second query
)
articles = session.scalars(stmt).all()
# Generates roughly:
# 1. SELECT ... FROM articles
# 2. SELECT ... FROM articles JOIN article_tags ... JOIN tags ... WHERE articles.id IN (...)

# Solution 3: Subquery Loading (legacy strategy for collections; selectinload is generally preferred)
stmt = (
    select(User)
    .options(subqueryload(User.articles))
)
```

## 16.4 Database Migrations with Alembic

As applications evolve, schemas change. **Alembic** provides migration scripts to version-control database schemas, enabling upgrades and rollbacks.

### Setup

```bash
pip install alembic

# Initialize alembic in project
cd myproject
alembic init migrations
```

### Configuration

```python
# migrations/env.py
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
from alembic import context
import sys
import os

# Add project root to path
sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))

# Import your Base and models
from models import Base  # Your declarative base

config = context.config
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# Set target metadata for autogenerate
target_metadata = Base.metadata

def run_migrations_online():
    """Run migrations in 'online' mode."""
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix='sqlalchemy.',
        poolclass=pool.NullPool
    )
    
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            compare_type=True  # Detect column type changes
        )
        
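        with context.begin_transaction():
            context.run_migrations()

# Alembic executes env.py as a script, so the function must actually be called.
# Offline (--sql) mode is omitted from this excerpt.
run_migrations_online()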
```

```ini
# alembic.ini (configuration file)
[alembic]
script_location = migrations
prepend_sys_path = .
sqlalchemy.url = postgresql://user:password@localhost/blog

[loggers]
keys = root,sqlalchemy,alembic
# (the remaining logging sections generated by `alembic init` are omitted here)
```

### Creating and Running Migrations

```bash
# Generate migration from model changes (autogenerate)
alembic revision --autogenerate -m "Add email field to users"

# Generated file: migrations/versions/abc123_add_email_field.py
```

```python
# migrations/versions/abc123_add_email_field.py
"""Add email field to users

Revision ID: abc123
Revises: def456
Create Date: 2026-02-13 10:30:00
"""
from alembic import op
import sqlalchemy as sa

# Revision identifiers
revision = 'abc123'
down_revision = 'def456'
branch_labels = None
depends_on = None

def upgrade() -> None:
    """Apply changes."""
    op.add_column('users', sa.Column('email', sa.String(100), nullable=True))
    
    # Data migration
    op.execute("UPDATE users SET email = username || '@example.com'")
    
    # Make column non-nullable after data migration
    op.alter_column('users', 'email', existing_type=sa.String(100), nullable=False)

def downgrade() -> None:
    """Revert changes."""
    op.drop_column('users', 'email')
```

```bash
# Apply migrations (upgrade to latest)
alembic upgrade head

# Rollback one migration
alembic downgrade -1

# Rollback to specific revision
alembic downgrade abc123

# View migration history
alembic history

# View current revision
alembic current
```

### Programmatic Migrations

```python
# Complex migration with data transformations
def upgrade() -> None:
    # Create new table
    op.create_table(
        'categories',
        sa.Column('id', sa.Integer(), primary_key=True),
        sa.Column('name', sa.String(50), unique=True)
    )
    
    # Get connection for data operations
    conn = op.get_bind()
    
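    # Migrate data: create one category per tag that is attached to an article
    # (article_tags holds only article_id and tag_id, so names come from tags)
    conn.execute(
        sa.text("""
            INSERT INTO categories (name)
            SELECT DISTINCT tags.name
            FROM tags
            JOIN article_tags ON article_tags.tag_id = tags.id
        """)
    )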
    
    # Add foreign key
    op.add_column(
        'articles',
        sa.Column('category_id', sa.Integer, sa.ForeignKey('categories.id'))
    )
```

## Summary

Database mastery separates script writers from application architects. You now understand the **relational model**: tables, rows, primary keys, and foreign keys that establish the structural integrity of your data. SQL provides the query language, but SQL assembled from raw strings in application code is brittle and open to injection.

**SQLAlchemy Core** offers the middle ground: a Pythonic expression language that generates safe, dialect-specific SQL while keeping you close to the relational model. You can construct complex queries, joins, and updates using Python objects and operators, with connection pooling managing resource efficiency.

The **SQLAlchemy ORM** maps Python classes to database tables, allowing you to reason about your domain through objects. The `Session` implements the Unit of Work pattern, tracking changes and coordinating transactions automatically. You have also seen why loading strategies such as `joinedload` and `selectinload` matter: they prevent the N+1 query problem, which can cripple performance in production.

**Alembic** brings version control to your schemas. As models evolve, migrations capture incremental changes, enabling teams to coordinate schema modifications and deploy with confidence, knowing they can upgrade or roll back databases reliably.

Data persistence is the foundation, but modern applications demand data insight. In the next chapter, we explore Python's dominance in data science: NumPy for numerical computing, Pandas for data manipulation, and visualization tools that transform raw numbers into actionable intelligence.

**Next Chapter**: Chapter 17: The Data Science Stack (NumPy, Pandas, and Visualization).