# Part VII — Data, Integrations, and Advanced ORM  
## 31. Advanced Migrations and Data Management (Safe Changes, Backfills, Zero‑Downtime Mindset)

Migrations are “how your application evolves safely.” In production, migrations are
often riskier than code changes because they:
- change schema in ways that can lock tables
- backfill large amounts of data
- can break running app versions if not coordinated
- can be difficult to roll back

This chapter teaches you **production-safe migration discipline**:

- schema change patterns that avoid downtime
- data backfills without locking the entire table for minutes
- how to coordinate code + schema changes (expand/contract)
- how to validate and roll back
- how to manage large tables and long-running migrations

Even if you’re still learning locally, this is core “industry standard” Django.

---

## 31.0 Learning Outcomes

By the end, you should be able to:

1. Explain what makes migrations risky in production (locks, backfills, long DDL).
2. Use the **expand/contract** pattern for safe changes across deployments.
3. Add fields safely to large tables:
   - nullable first
   - backfill in batches
   - then enforce non-null constraints
4. Create and run data migrations correctly (`RunPython`) without importing models directly.
5. Use `SeparateDatabaseAndState` for advanced cases where DB state and Django state differ temporarily.
6. Use `RunSQL` safely and understand when you must (indexes concurrently, extensions).
7. Detect and mitigate locking issues (especially in PostgreSQL).
8. Create “migration runbooks” and practice rollback strategies.
9. Use migration checks in CI (e.g., ensure no missing migrations).

---

## 31.1 Why Migrations Break Production (The Real Causes)

### 31.1.1 Long-running DDL locks
In PostgreSQL, certain schema changes require locks, potentially blocking writes or even reads:
- altering column types
- adding constraints in certain ways
- creating indexes without CONCURRENTLY
- dropping columns

Result:
- requests hang or time out
- background workers pile up
- incidents happen

### 31.1.2 Backfills that scan huge tables
A “simple” `UPDATE table SET new_col = ...` on 50 million rows can:
- run for minutes or hours
- consume heavy I/O and CPU
- cause lock contention

### 31.1.3 Code and schema mismatch during deploy
If you deploy new code that expects a new column **before** it exists, you get runtime errors.
If you deploy migrations that remove a column before code stops using it, you get runtime errors.

This is why professional teams coordinate:
- code deploy steps
- migration steps
- rollbacks

---

## 31.2 The Industry Standard Pattern: Expand / Contract

This is the single most important migration strategy for zero/low downtime.

### Expand phase (backward compatible)
- Add new column nullable
- Add new table
- Add new indexes
- Add new code that can work with old + new schema

### Migrate data
- backfill new column gradually
- keep app running

### Contract phase (remove old)
- make new column non-null
- remove old columns / old code paths
- clean up

This supports “deploy code first, migrate later” or “migrate first, deploy later”
depending on your deployment style, without breaking.
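
One way to picture "code that can work with old + new schema" is a read path that prefers the new column and falls back to the old data while the backfill is still running. The sketch below is a minimal illustration, not project code: `display_summary` is a hypothetical helper, and plain dictionaries stand in for database rows (using the `summary` and `body` columns from the scenario in the next section).

```python
def display_summary(row, limit=300):
    """Return the text to show for an article row during expand/contract.

    `row` is a plain dict standing in for a database row (an assumption made
    for illustration). Rows written before the backfill have summary=None.
    """
    summary = row.get("summary")
    if summary:
        # New column is populated: use it.
        return summary
    # Not backfilled yet: derive the value from `body` instead.
    return (row.get("body") or "").strip()[:limit]


old_row = {"id": 1, "body": "  Long article text  ", "summary": None}
new_row = {"id": 2, "body": "Long article text", "summary": "Short teaser"}

print(display_summary(old_row))
print(display_summary(new_row))
```

Because the function tolerates both states, the same code can run before, during, and after the backfill, which is what makes the expand phase backward compatible.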

---

## 31.3 A Practical Scenario (We Will Implement): Add `summary` to Article

Suppose you want:
- `Article.summary` required (non-null), max 300 chars
- used in list pages and meta descriptions
- must be backfilled from existing `body` for existing rows

If you do it naïvely:
- add `summary` non-null
- migration tries to fill default for all rows
- locks/long migration risk (on big table)
- potential downtime

We’ll do it safely.

---

## 31.4 Step 1 (Expand): Add Field as Nullable + Blank

Edit `articles/models.py`:

```python
class Article(models.Model):
    # ...
    summary = models.CharField(max_length=300, null=True, blank=True)
```

Run migration:

```bash
python manage.py makemigrations
python manage.py migrate
```

### Why `null=True` first
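Because your DB already has rows. If you add a non-null field, the DB needs a value
for every existing row. Depending on the database and version, it may:
- lock the table while the column is added
- write a default into every row (this can rewrite the entire table; PostgreSQL 11+ avoids the rewrite for constant defaults)
- fail outright if no default is supplied

A nullable column avoids all three: existing rows simply read as NULL.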
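
The "fail without a default" case is easy to reproduce locally. The sketch below uses Python's built-in `sqlite3` module and a throwaway `article` table (both are stand-ins chosen for illustration; PostgreSQL behaves differently in the details, for example it only rejects the change when rows already exist).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO article (body) VALUES ('existing row')")

# Attempt 1: a required column with no default. SQLite refuses the change.
try:
    conn.execute("ALTER TABLE article ADD COLUMN summary VARCHAR(300) NOT NULL")
    not_null_error = None
except sqlite3.OperationalError as exc:
    not_null_error = str(exc)

# Attempt 2: a nullable column. This succeeds; existing rows read as NULL.
conn.execute("ALTER TABLE article ADD COLUMN summary VARCHAR(300)")
existing = conn.execute("SELECT id, summary FROM article").fetchall()

print("NOT NULL attempt failed:", not_null_error is not None)
print("rows after nullable add:", existing)
```

The nullable add touches no existing data, which is why it is the safe first step.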

---

## 31.5 Step 2 (Backfill): Data Migration in Batches (Production-Friendly)

### 31.5.1 Why batching matters
A single update on huge tables:
- can take a long time
- can create long locks
- can overwhelm replication / WAL
- can slow down live traffic

Batching:
- breaks work into smaller transactions
- reduces lock time
- allows pausing/retrying
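
To get a feel for the trade-off, a rough estimate helps. The sketch below is plain arithmetic; the 0.05 seconds per batch is an invented figure for illustration only, so measure your own on a staging copy.

```python
import math


def batch_plan(total_rows, batch_size, seconds_per_batch):
    """Return (number_of_batches, estimated_total_minutes)."""
    batches = math.ceil(total_rows / batch_size)
    return batches, batches * seconds_per_batch / 60


# 50 million rows, 2,000 rows per batch, and an ASSUMED 0.05 s per batch.
batches, minutes = batch_plan(50_000_000, 2_000, 0.05)
print(batches)            # 25000
print(round(minutes, 1))  # 20.8
```

The total runtime is similar to one big update, but each transaction holds its locks for roughly one batch duration instead of the full run.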

### 31.5.2 Create an empty migration
```bash
python manage.py makemigrations --empty articles --name backfill_article_summary
```

Edit the migration file:

```python
from django.db import migrations, transaction

BATCH_SIZE = 2000


def backfill_summary(apps, schema_editor):
    # Use the historical model, never a direct import from articles.models.
    Article = apps.get_model("articles", "Article")
    db = schema_editor.connection.alias

    # Only rows that are still NULL need a value. Ordering by id lets us
    # page with "id > last_id" instead of OFFSET.
    qs = Article.objects.using(db).filter(summary__isnull=True).order_by("id")

    last_id = 0
    while True:
        batch = list(qs.filter(id__gt=last_id).values("id", "body")[:BATCH_SIZE])
        if not batch:
            break

        # One short transaction per batch. Row-by-row updates keep this simple
        # and DB-agnostic; for huge tables use SQL CASE updates or temp tables.
        with transaction.atomic(using=db):
            for row in batch:
                summary = (row["body"] or "").strip()[:300]
                Article.objects.using(db).filter(id=row["id"]).update(summary=summary)

        last_id = batch[-1]["id"]


class Migration(migrations.Migration):
    # Without this, Django wraps the whole migration in a single transaction on
    # databases with transactional DDL (e.g. PostgreSQL), which defeats batching.
    atomic = False

    dependencies = [
        ("articles", "XXXX_add_summary_nullable"),  # adjust
    ]

    operations = [
        migrations.RunPython(backfill_summary, migrations.RunPython.noop),
    ]
```

### 31.5.3 Important explanation: why we use `apps.get_model`
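Migrations must not import your current models directly because:
- your code changes over time
- migrations must run against historical states

`apps.get_model` gives you the model as it existed at that point in the migration
history. Custom model methods and `save()` overrides are not available on these
historical models, so keep the backfill logic inside the migration function.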

### 31.5.4 Production optimization note
Row-by-row updates are simple but slower. For truly large tables, you often:
- do backfills in a management command
- or use SQL `UPDATE ... SET summary = substring(body,1,300) WHERE summary IS NULL`
  with careful chunking or `id` ranges
- or use a job queue to backfill gradually

But the pattern (nullable → backfill → enforce) remains the same.
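
The SQL-with-`id`-ranges option can be sketched as follows. This is an illustration under stated assumptions: it runs against an in-memory SQLite table named `article` so it can be tried anywhere, `backfill_in_id_ranges` is a hypothetical helper, and SQLite spells the function `substr` where PostgreSQL uses `substring`.

```python
import sqlite3


def backfill_in_id_ranges(conn, chunk=1000):
    """Fill NULL summaries one id range at a time, committing after each range."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM article").fetchone()
    updated = 0
    for low in range(0, max_id, chunk):
        cur = conn.execute(
            "UPDATE article "
            "SET summary = substr(trim(COALESCE(body, '')), 1, 300) "
            "WHERE summary IS NULL AND id > ? AND id <= ?",
            (low, low + chunk),
        )
        conn.commit()  # end the transaction so locks are released per range
        updated += cur.rowcount
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO article (body, summary) VALUES (?, ?)",
    [("x" * 400, None), ("short body", None), ("kept", "hand-written")],
)
conn.commit()

print(backfill_in_id_ranges(conn, chunk=2))  # 2 rows updated
```

The `summary IS NULL` filter makes the job idempotent: rerunning it after an interruption only touches rows that are still empty, and hand-written summaries are left alone.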

---

## 31.6 Step 3 (Contract): Enforce Not Null and Remove Blank

Now you want `summary` required.

Update `articles/models.py`:

```python
summary = models.CharField(max_length=300)
```

Run:

```bash
python manage.py makemigrations
python manage.py migrate
```

### Why this is safe now
Because:
- all existing rows have summary backfilled
- future creates/edits will provide summary (you must ensure forms/serializers do)
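- the DB can enforce NOT NULL without a full-table rewrite in many cases. On PostgreSQL,
  `SET NOT NULL` still scans the table under an exclusive lock to verify that no NULLs
  remain, so schedule it for a quiet period on large tables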
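
Before running the contract migration, it is worth confirming that the first assumption actually holds. The check below is a minimal sketch using `sqlite3` and a stand-in `article` table; `remaining_nulls` is a hypothetical helper, and in a Django shell the equivalent would be a count of rows filtered on `summary__isnull=True`.

```python
import sqlite3


def remaining_nulls(conn, table, column):
    """Count rows where `column` is still NULL.

    Table and column names cannot be bound as SQL parameters, so only pass
    trusted, hard-coded identifiers here.
    """
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO article (body, summary) VALUES (?, ?)",
    [("first", "first"), ("second", None)],
)

print(remaining_nulls(conn, "article", "summary"))  # 1: not ready for NOT NULL
conn.execute("UPDATE article SET summary = body WHERE summary IS NULL")
print(remaining_nulls(conn, "article", "summary"))  # 0: safe to enforce
```

If the count is not zero, the NOT NULL migration will fail partway through a deploy, so run the check first and finish the backfill before continuing.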

---

## 31.7 Coordinating Code + Schema Changes (Deployment Strategy)

If you deploy code and migrations separately (common in CI/CD pipelines), use this flow:

### Deploy 1
- Add nullable field + deploy code that writes it for new/updated records.

### Backfill
- Run data migration or background job to fill existing rows.

### Deploy 2
- Enforce NOT NULL
- Remove old code paths

This avoids having a window where:
- code expects non-null but DB has nulls
- or DB expects non-null but code isn’t writing it

---

## 31.8 Adding Indexes Safely in PostgreSQL (CONCURRENTLY)

Creating an index can lock writes. In PostgreSQL, you can create indexes concurrently:

```sql
CREATE INDEX CONCURRENTLY ...
```

Django 3.0+ ships `AddIndexConcurrently` and `RemoveIndexConcurrently` in
`django.contrib.postgres.operations` for this. The approach that works on any Django
version is `RunSQL` in a migration with `atomic = False`.

### 31.8.1 Example migration: create index concurrently
Create empty migration:

```bash
python manage.py makemigrations --empty articles --name idx_article_summary
```

Edit:

```python
from django.db import migrations


class Migration(migrations.Migration):
    atomic = False

    dependencies = [
        ("articles", "XXXX_previous"),
    ]

    operations = [
        migrations.RunSQL(
            sql=(
                "CREATE INDEX CONCURRENTLY IF NOT EXISTS "
                "article_summary_idx ON articles_article (summary);"
            ),
            reverse_sql="DROP INDEX CONCURRENTLY IF EXISTS article_summary_idx;",
        ),
    ]
```

### Explanation: why `atomic = False`
PostgreSQL does not allow `CREATE INDEX CONCURRENTLY` inside a transaction.
Django wraps migrations in transactions by default. Setting `atomic=False` allows
that SQL to run outside a transaction.

**Industry note:** This is a key migration technique for low-downtime systems.
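
On Django 3.0 or later with PostgreSQL, the same result can be expressed with the built-in `AddIndexConcurrently` operation instead of raw SQL. The migration below is a sketch under assumptions: the dependency name is a placeholder, and it assumes the same index is also declared in `Article.Meta.indexes` so that Django's model state and the database agree.

```python
from django.contrib.postgres.operations import AddIndexConcurrently
from django.db import migrations, models


class Migration(migrations.Migration):
    # Still required: CONCURRENTLY cannot run inside a transaction.
    atomic = False

    dependencies = [
        ("articles", "XXXX_previous"),  # adjust
    ]

    operations = [
        AddIndexConcurrently(
            model_name="article",
            index=models.Index(fields=["summary"], name="article_summary_idx"),
        ),
    ]
```

Compared with `RunSQL`, this keeps the index visible to Django's migration state, so later `makemigrations` runs do not try to create it again.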

---

## 31.9 `SeparateDatabaseAndState` (Advanced Tool for Tricky Transitions)

Sometimes you need Django’s model state to change differently than the DB operation.

Common use cases:
- rename a column in DB without breaking old code
- complex refactors where you keep old column temporarily but want new field name in Django
- using DB views or computed columns

Example conceptual pattern:
- add a new column
- keep old column
- write to both
- then swap

`SeparateDatabaseAndState` allows:
- database operations that don’t match Django state operations 1:1

You don’t need it often, but when you do, it’s the correct tool.
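
As one concrete illustration, the migration below removes a field from Django's model state while leaving the column in the database, so older code (or a code rollback) can still use it. It is a sketch: `legacy_excerpt` is a hypothetical field name, the dependency is a placeholder, and it assumes the column is nullable or has a database default, since new code will no longer supply a value on insert.

```python
from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [
        ("articles", "XXXX_previous"),  # adjust
    ]

    operations = [
        migrations.SeparateDatabaseAndState(
            # Django's model state forgets the field...
            state_operations=[
                migrations.RemoveField(model_name="article", name="legacy_excerpt"),
            ],
            # ...but no SQL runs, so the column stays in the table for now.
            database_operations=[],
        ),
    ]
```

A later migration can then drop the column with `RunSQL` once you are confident nothing references it.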

---

## 31.10 Managing Large Backfills (Professional Playbook)

For large tables, the best practice is often:

- **Schema migration** adds nullable column
- **Code deploy** writes the new field for new writes
- **Background backfill** (Celery job or management command) fills existing rows over hours/days
- **Constraint migration** enforces NOT NULL later

### Why not do huge backfills in migrations?
Because migrations run during deploy and can:
- block deploy pipeline
- cause long locks
- be hard to pause/retry safely
- cause incidents

### A better approach: management command backfill
Create `articles/management/commands/backfill_summary.py` with:
- chunked processing
- progress logs
- safe resume from last_id

This is an industry standard pattern.
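
The three properties listed above (chunking, progress logs, resume from `last_id`) can be sketched independently of Django. Everything here is illustrative: `run_backfill`, `fetch_batch`, and `apply_batch` are hypothetical names, the checkpoint is a dict standing in for durable storage, and an in-memory table replaces the database. In a real management command the same loop would live inside `handle()`.

```python
import logging

logger = logging.getLogger("backfill")


def run_backfill(fetch_batch, apply_batch, checkpoint, batch_size=2000):
    """Process rows in id order, saving progress after every batch.

    fetch_batch(last_id, limit) -> list of (id, body) with id > last_id
    apply_batch(rows)           -> writes the summaries for those rows
    checkpoint                  -> dict with key "last_id" (stand-in for durable storage)
    """
    total = 0
    while True:
        rows = fetch_batch(checkpoint["last_id"], batch_size)
        if not rows:
            return total
        apply_batch(rows)
        checkpoint["last_id"] = rows[-1][0]  # resume point after a crash
        total += len(rows)
        logger.info("backfilled %d rows, last_id=%d", total, checkpoint["last_id"])


# In-memory stand-ins so the sketch runs without a database.
table = {i: {"body": f"body {i}", "summary": None} for i in range(1, 8)}


def fetch_batch(last_id, limit):
    ids = sorted(i for i in table if i > last_id and table[i]["summary"] is None)
    return [(i, table[i]["body"]) for i in ids[:limit]]


def apply_batch(rows):
    for article_id, body in rows:
        table[article_id]["summary"] = body.strip()[:300]


checkpoint = {"last_id": 0}
print(run_backfill(fetch_batch, apply_batch, checkpoint, batch_size=3))  # 7
```

Passing a saved checkpoint back in restarts the job from the last completed batch rather than from the beginning.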

---

## 31.11 Safe Deletion and “Drop Column” Strategy

Dropping columns can be risky if old code still references them.

Professional pattern:
1. Stop reading the column in code (deploy)
2. Stop writing the column (deploy)
3. Wait until every running process (web workers, background workers, scheduled jobs) is on the new code
4. Drop the column in a migration

This reduces “rollback pain.” If you need to rollback code, the column still exists.

---

## 31.12 Rollbacks: Reality and Strategy

### 31.12.1 Rolling back migrations is not always safe
- Data migrations may not be reversible.
- Dropping columns loses data.
- Index changes can be reversed but can be expensive.

### 31.12.2 Industry rule
Prefer “forward fixes” over rollbacks for DB schema, unless you specifically built
reversible migrations.

For a safe rollback plan:
- code rollbacks should still work against the expanded schema (nullable fields, etc.)
- do contract steps only after confidence

---

## 31.13 Detecting Migration Risk Before Running in Production

Before running a migration:
- inspect SQL:
  - `python manage.py sqlmigrate <app_label> <migration_name>`
- check if it includes:
  - ALTER COLUMN TYPE
  - table rewrites
  - full-table updates
  - constraint validation
- in Postgres, test with:
  - staging dataset (or a copy)
  - `EXPLAIN` where relevant
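
Part of this checklist can be automated by scanning the `sqlmigrate` output for known-risky statements. The scanner below is a heuristic sketch, not a complete linter: the pattern list and the `flag_risky_sql` helper are assumptions made for illustration, and a clean result does not prove a migration is safe.

```python
import re

# Heuristic patterns only; an ASSUMED starting list, not a complete catalogue.
RISKY_PATTERNS = {
    "column type change": r"ALTER\s+COLUMN\s+\S+\s+TYPE\b",
    "blocking index build": r"CREATE\s+(UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)",
    "NOT NULL enforcement": r"SET\s+NOT\s+NULL",
    "column drop": r"DROP\s+COLUMN",
    "update without WHERE": r"^\s*UPDATE\b(?!.*\bWHERE\b)",
}


def flag_risky_sql(sql_text):
    """Return (label, statement) pairs for statements matching a risky pattern."""
    without_comments = re.sub(r"--[^\n]*", "", sql_text)
    findings = []
    for statement in without_comments.split(";"):
        for label, pattern in RISKY_PATTERNS.items():
            if re.search(pattern, statement, flags=re.IGNORECASE | re.DOTALL):
                findings.append((label, " ".join(statement.split())))
    return findings


sample = """
BEGIN;
-- Alter field summary on article
ALTER TABLE "articles_article" ALTER COLUMN "summary" SET NOT NULL;
CREATE INDEX "article_summary_idx" ON "articles_article" ("summary");
CREATE INDEX CONCURRENTLY "other_idx" ON "articles_article" ("id");
COMMIT;
"""

for label, statement in flag_risky_sql(sample):
    print(f"{label}: {statement}")
```

In practice you would pipe the real `sqlmigrate` output into a function like this and treat any finding as a prompt for manual review on a staging copy.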

---

## 31.14 CI Discipline for Migrations (Industry Standard)

### 31.14.1 Ensure migrations are created when models change
In CI, a common check:

```bash
python manage.py makemigrations --check --dry-run
```

This exits with a non-zero status when model changes are not covered by a migration
file, so a forgotten migration fails the build.

### 31.14.2 Ensure migrations apply cleanly
CI should run:
```bash
python manage.py migrate
python manage.py test
```

---

## 31.15 LAB: Execute a Safe Expand/Backfill/Contract Change

We’ll implement the `summary` field change in your project.

1. Add `summary` nullable field, migrate.
2. Add code (forms + serializers) to write summary for new creates/edits:
   - set a default summary from body in the form's `clean()` (or the serializer's `validate()`) if it is empty
3. Run backfill migration in batches.
4. Make `summary` required (not null), migrate.
5. Update templates to use `summary` for list/meta descriptions.

### Add summary to ArticleForm (so you don’t rely on body always)
In `articles/forms.py` include `summary` and auto-generate if blank:

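```python
class ArticleForm(forms.ModelForm):
    # ... existing Meta; make sure "summary" is listed in Meta.fields ...

    def clean(self):
        cleaned = super().clean()
        summary = (cleaned.get("summary") or "").strip()
        body = (cleaned.get("body") or "").strip()

        # Auto-generate the summary from the body when the user left it blank.
        if not summary and body:
            cleaned["summary"] = body[:300]

        return cleaned
```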

---

## 31.16 Exercises (Do These Before Proceeding)

1. Add a new field to `TaskExportJob`:
   - `expires_at` (DateTimeField nullable)
   - backfill for existing DONE exports to be `created_at + 7 days`
   - then enforce a cleanup job that deletes expired exports
   - explain why this is a safe expand/backfill pattern

2. Create a migration that enables a Postgres extension safely (`pg_trgm`):
   - use `RunSQL("CREATE EXTENSION IF NOT EXISTS pg_trgm;")`
   - ensure it’s reversible (drop extension only if safe; often you don’t drop)

3. Add a concurrent index for a hot query (Postgres):
   - migration with `atomic = False`
   - create index concurrently
   - explain why atomic false is required

4. Add CI migration checks to README:
   - `makemigrations --check --dry-run`
   - `migrate` and `test`

---

## 31.17 Chapter Summary

- Production migrations are risky because of locks, backfills, and code/schema mismatch.
- Use expand/contract:
  - add nullable
  - backfill safely (prefer batches or background jobs)
  - enforce constraints later
- Use `RunPython` with `apps.get_model`, never import models directly.
- Use concurrent index creation in Postgres for low downtime (`atomic=False` + `RunSQL`).
- Prefer forward fixes and avoid destructive migrations until you’re confident.
- Treat migration execution as an operational event: plan, measure, monitor, and have a runbook.

---

Next chapter: **32. Advanced Model Patterns**  
We’ll cover abstract base models, proxy models, generic relations (ContentTypes),
soft deletes, audit logging, and model inheritance pitfalls—plus when each is
appropriate in real systems.