# Part VII — Data, Integrations, and Advanced ORM  
## 30. PostgreSQL for Django (Practical Database Mastery)

This chapter upgrades your database skills from “I can use the ORM” to “I can run a
Django app on PostgreSQL like a production engineer.”

You will learn:

- how to switch Django from SQLite to PostgreSQL (local + production patterns)
- how PostgreSQL differs from SQLite in ways that affect correctness
- indexes that actually matter and how to choose them
- how to read query plans (`EXPLAIN`, `EXPLAIN ANALYZE`)
- transactional behavior, isolation levels, and locking basics
- PostgreSQL features Django teams commonly use:
  - JSONB fields
  - full-text search
  - trigram search
  - constraints and partial indexes
- operational basics: migrations at scale, backups, monitoring queries

We’ll keep everything tied to your existing domain:
- `articles` (published list/detail, tags, search)
- `tasks` (org-scoped list filters, exports)

---

## 30.0 Learning Outcomes

By the end, you should be able to:

1. Run PostgreSQL locally and connect Django to it reliably.
2. Explain key differences vs SQLite that matter in production (concurrency,
   transactions, constraints, LIKE/ILIKE behavior, ordering, NULL handling).
3. Use PostgreSQL tools in Django:
   - `django.db.connection`, `EXPLAIN`
   - `QuerySet.explain()`
4. Design indexes intentionally and verify they are used.
5. Apply performance patterns:
   - composite indexes for multi-column filters
   - partial indexes for “published only”
   - correct ordering indexes
6. Use PostgreSQL full-text search and/or trigram search for better search UX.
7. Understand transaction isolation and how locks can block your app.
8. Plan safe migrations on large tables (zero-downtime mindset).
9. Know what to monitor and how to avoid common production DB mistakes.

---

## 30.1 Why PostgreSQL (and Why SQLite Is Not “Bad”)

### 30.1.1 SQLite strengths (why we used it first)
- zero setup
- fast iteration for learning
- good for tests and small apps

### 30.1.2 Why PostgreSQL is the default for serious Django production
- high concurrency (many simultaneous writers)
- strong transactional guarantees
- robust indexing and query planner
- advanced features:
  - partial indexes
  - GIN/GiST indexes
  - full-text search
  - JSONB
  - sophisticated constraints

**Industry reality:** Most Django teams deploy to Postgres unless they have very
special requirements.

---

## 30.2 Set Up PostgreSQL Locally (Two Standard Approaches)

### Option A (recommended): Docker Compose
Create `docker-compose.yml` in repo root:

```yaml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: django_app
      POSTGRES_USER: django
      POSTGRES_PASSWORD: django
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

Start it:

```bash
docker compose up -d
```

Verify it’s running:

```bash
docker compose ps
```

### Option B: Install PostgreSQL natively
Install via your OS package manager and create a user/db. Docker is simpler and
more reproducible.

---

## 30.3 Install the PostgreSQL Driver (psycopg)

Django 4.2 and later support psycopg 3 (the `psycopg` package), which is the usual choice for new projects.

Install:

```bash
python -m pip install "psycopg[binary]"
python -m pip freeze > requirements.txt
```

Notes:
- `"psycopg[binary]"` includes a prebuilt binary wheel for convenience.
- In production, some teams prefer building from source and using system libs; that’s an ops choice.

---

## 30.4 Configure Django to Use PostgreSQL

### 30.4.1 Minimal explicit `DATABASES` (clear and readable)
In `config/settings.py` (or `config/settings/dev.py`):

```python
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB", "django_app"),
        "USER": os.environ.get("POSTGRES_USER", "django"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", "django"),
        "HOST": os.environ.get("POSTGRES_HOST", "127.0.0.1"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}
```

Set env vars (optional, since the defaults above match the Compose file). Docker
Compose sets these variables inside the container only; Django runs on the host,
so export them in your shell if you change any value:

```bash
export POSTGRES_DB=django_app
export POSTGRES_USER=django
export POSTGRES_PASSWORD=django
export POSTGRES_HOST=127.0.0.1
export POSTGRES_PORT=5432
```

### 30.4.2 `DATABASE_URL` style (common in production platforms)
Many platforms provide a single URL like:

```text
postgres://user:pass@host:5432/dbname
```

You can use a parser library (e.g., `dj-database-url`). That’s common, but the
explicit dict is easier for learning and debugging.
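
To make the relationship between the URL and the explicit dict concrete, here is a minimal standard-library sketch of what such a parser does. The function name `database_config_from_url` and the example host are illustrative assumptions; a library like `dj-database-url` handles more schemes and options than this.

```python
from urllib.parse import unquote, urlparse


def database_config_from_url(url: str) -> dict:
    """Translate a postgres:// URL into a Django DATABASES entry."""
    parts = urlparse(url)
    if parts.scheme not in {"postgres", "postgresql"}:
        raise ValueError(f"Unsupported database scheme: {parts.scheme!r}")
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": unquote(parts.path.lstrip("/")),
        # Credentials may be percent-encoded inside the URL.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or 5432),
    }


# Hypothetical URL; "%40" is an encoded "@" in the password.
config = database_config_from_url(
    "postgres://django:p%40ss@db.example.com:5432/django_app"
)
print(config["NAME"], config["HOST"], config["PORT"])
```

In settings you would assign the result to `DATABASES["default"]`, reading the URL from an environment variable such as `DATABASE_URL`.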

---

## 30.5 Create the Schema in Postgres

Run migrations against Postgres:

```bash
python manage.py migrate
```

Create superuser again (new DB):

```bash
python manage.py createsuperuser
```

Seed your models (shell or admin) as before.

---

## 30.6 PostgreSQL vs SQLite: Differences That Affect Your App

This section prevents “it worked in dev but broke in prod” surprises.

### 30.6.1 Concurrency and locking behavior
- SQLite allows only one writer at a time: a write locks the entire database file.
- PostgreSQL uses MVCC with row-level locks, so readers do not block writers and
  writers block each other only on the same rows.

Practical consequence:
- SQLite can hide locking bugs because it serializes writes aggressively.
- Postgres will expose real concurrency patterns.

### 30.6.2 Transaction isolation differences
Postgres default isolation is **READ COMMITTED**:
- each statement sees a snapshot committed before the statement starts
- non-repeatable reads are possible across statements in same transaction

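SQLite differs: it runs only one write transaction at a time, so its transactions
are effectively serializable and many race conditions never surface in development.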

You must design:
- idempotency in background tasks
- correct locking for counters and “assign next number” workflows
- safe migrations

### 30.6.3 Text comparisons and case sensitivity
Postgres:
- `LIKE` is case-sensitive
- `ILIKE` is case-insensitive

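Django’s `icontains` is case-insensitive on Postgres too, but it compiles to
`UPPER(column::text) LIKE UPPER(pattern)` rather than `ILIKE`; you can confirm this
with `print(qs.query)`.

SQLite’s `LIKE` is case-insensitive for ASCII characters by default, so `contains`
and `icontains` can return the same rows there but different rows on Postgres.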
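
The SQLite side of this difference can be checked with the standard-library `sqlite3` module. This is a small sketch with made-up titles; it shows SQLite's default behavior only, and the Postgres behavior still needs to be verified against a real Postgres database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("Django Templates",), ("django signals",), ("Flask Basics",)],
)

# SQLite's LIKE ignores ASCII case by default, unlike PostgreSQL's LIKE.
like_matches = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM articles WHERE title LIKE '%django%' ORDER BY title"
    )
]

# GLOB is SQLite's case-sensitive pattern operator, shown here for contrast.
glob_matches = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM articles WHERE title GLOB '*django*' ORDER BY title"
    )
]

print("LIKE:", like_matches)
print("GLOB:", glob_matches)
```

A test suite that only runs on SQLite would therefore never notice a `contains` filter that should have been `icontains`.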

### 30.6.4 NULL ordering and DISTINCT behavior
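Ordering with NULLs differs by database: PostgreSQL sorts NULLs as larger than any
value (last in `ASC`, first in `DESC`), while SQLite sorts them as smaller. Always
test the ordering you rely on, or make it explicit with
`F("published_at").desc(nulls_last=True)`. `DISTINCT` differs as well:
`distinct("field")` (`DISTINCT ON`) works only on PostgreSQL.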
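
A quick way to see SQLite's side of the NULL ordering difference is the standard-library `sqlite3` module. The table and rows below are illustrative assumptions; on PostgreSQL the same two queries would place the `draft` row at the opposite end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, published_at TEXT)")
conn.executemany(
    "INSERT INTO articles (title, published_at) VALUES (?, ?)",
    [("draft", None), ("older", "2024-01-01"), ("newer", "2024-06-01")],
)

# SQLite treats NULL as smaller than any other value.
ascending = [
    row[0]
    for row in conn.execute("SELECT title FROM articles ORDER BY published_at ASC")
]
descending = [
    row[0]
    for row in conn.execute("SELECT title FROM articles ORDER BY published_at DESC")
]

print("ASC: ", ascending)   # NULL row comes first on SQLite
print("DESC:", descending)  # NULL row comes last on SQLite
```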

---

## 30.7 Inspect SQL and Query Plans (EXPLAIN)

Postgres performance work is mostly about:
- writing a query the planner can execute efficiently
- adding indexes that match your filter/order patterns

### 30.7.1 Print query SQL from Django
In shell:

```python
from articles.models import Article

qs = Article.objects.published().order_by("-published_at", "-created_at")
print(qs.query)
```
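
`print(qs.query)` covers one queryset at a time, and the printed SQL does not quote parameters the way the driver does. To watch every statement a request runs during development, one option is Django's `django.db.backends` logger, which emits SQL at `DEBUG` level when `DEBUG = True`. A minimal settings sketch follows; the handler name and layout are illustrative choices, not requirements.

```python
# Development-only settings fragment: print every SQL statement to the console.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
            # Avoid duplicate lines if the root logger also has a handler.
            "propagate": False,
        },
    },
}
```

Keep this out of production settings: the output is verbose and can include user data.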

### 30.7.2 Use QuerySet.explain()
```python
print(qs.explain())
```

For deeper analysis (Postgres supports options):

```python
print(qs.explain(analyze=True, buffers=True))
```

Explanation:
- `analyze=True` actually runs the query and measures real time and rows.
- `buffers=True` shows I/O behavior (useful for “why is it slow?”).

**Important safety note:** `EXPLAIN ANALYZE` runs the query. Don’t run it on
production with destructive queries, and be careful with very expensive queries.
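
When reviewing many plans, it can help to flag sequential scans automatically. The sketch below is one possible helper, not part of Django: `tables_with_seq_scan` is a hypothetical name, and `SAMPLE_PLAN` is hand-written text in the shape of `EXPLAIN` output, not a real measurement.

```python
import re

# Hand-written plan text for illustration only; real text comes from qs.explain().
SAMPLE_PLAN = """\
Sort  (cost=230.12..233.12 rows=1200 width=72)
  Sort Key: published_at DESC, created_at DESC
  ->  Seq Scan on articles_article  (cost=0.00..155.00 rows=1200 width=72)
        Filter: ((status)::text = 'published'::text)
"""

SEQ_SCAN = re.compile(r"Seq Scan on (\S+)")


def tables_with_seq_scan(plan_text: str) -> list[str]:
    """Return the table names that the plan reads with a sequential scan."""
    return SEQ_SCAN.findall(plan_text)


print(tables_with_seq_scan(SAMPLE_PLAN))
```

In a shell session you could call `tables_with_seq_scan(qs.explain())`. A sequential scan is not always a problem: on small tables the planner often picks it on purpose.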

---

## 30.8 Index Design (Practical, Not Theoretical)

Indexes are not “always good.” They are tradeoffs:
- faster reads for indexed patterns
- slower writes (insert/update/delete must update indexes)
- more storage

### 30.8.1 The 80/20 rule for indexes
Index for:
- frequent filters
- frequent joins
- frequent orderings
- “hot paths” (list endpoints, admin changelist)

Don’t index:
- low-cardinality boolean fields alone (often not useful by itself)
- fields rarely used in queries
- everything “just in case”
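
The habit of checking the plan before and after adding an index can be practiced without a Postgres server. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from the standard library purely as a stand-in; the table, data, and index name are illustrative, and Postgres plan output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, status TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (status, title) VALUES (?, ?)",
    [("published" if i % 10 == 0 else "draft", f"Article {i}") for i in range(1000)],
)

QUERY = "SELECT title FROM articles WHERE status = 'published'"


def plan(sql: str) -> str:
    """Return the human-readable plan text for a query."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column holds the detail text


before = plan(QUERY)
conn.execute("CREATE INDEX idx_articles_status ON articles (status)")
after = plan(QUERY)

print("before:", before)
print("after: ", after)
```

The same loop applies on Postgres with `qs.explain()`: capture the plan, add the index, capture it again, and keep the index only if the plan actually changes.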

---

## 30.9 Indexing Your Real Queries (Articles + Tasks)

### 30.9.1 Articles list typical query
Public list usually does:

- filter `status=published`
- maybe filter by tag join
- order by `published_at DESC, created_at DESC`
- paginate

You already have indexes, but we can improve with **partial indexes**.

---

## 30.10 Partial Indexes (Postgres Superpower for “Published Only”)

A partial index indexes only rows matching a condition, e.g., only published
articles. This can be extremely effective if most content is drafts or archived.

### 30.10.1 Add a partial index for published ordering
In `articles/models.py` `Article.Meta.indexes`, add an index with a `condition`:

```python
from django.db import models
from django.db.models import Q

class Article(models.Model):
    ...
    class Meta:
        indexes = [
            # Existing indexes...
            models.Index(fields=["slug"]),
            models.Index(fields=["status", "-created_at"]),
            models.Index(
                fields=["-published_at", "-created_at"],
                name="article_pub_order_idx",
                condition=Q(status="published"),
            ),
        ]
```

#### Why this helps
Your public list orders by published_at/created_at and only includes published.
This index matches that exact query pattern.

#### Compatibility note
Django applies `condition=` on PostgreSQL and SQLite. MySQL and MariaDB ignore the
condition, and Oracle does not support partial indexes; this is one reason to choose
database features intentionally.

Run migrations:

```bash
python manage.py makemigrations
python manage.py migrate
```

Inspect plan:

```python
qs = Article.objects.published().order_by("-published_at", "-created_at")
print(qs.explain(analyze=True))
```

You should see an Index Scan using your partial index on sufficiently large data.

---

## 30.11 GIN Indexes for Many-to-Many and Search (When Relevant)

### 30.11.1 Many-to-many tag join performance
M2M queries like:

```python
Article.objects.published().filter(tags__slug="django")
```

depend on:
- indexes on the join table (`article_id`, `tag_id`)
- index on `Tag.slug`

Django automatically creates indexes on FK columns. Your `Tag.slug` is unique, so
it’s indexed.

If you see slow tag filtering at scale:
- ensure join table indexes exist (they should)
- consider query shape and selectivity
- consider caching “tag pages” if content rarely changes

---

## 30.12 Full-Text Search (PostgreSQL) — Better Than icontains

`icontains` cannot use a regular B-tree index, so it scans every row, and it gives
weak search UX: no ranking, and substring matches that ignore word boundaries.
Postgres full-text search gives:
- tokenization (words)
- ranking
- language configs
- fast search with GIN indexes

### 30.12.1 Enable postgres contrib (not required for basic FTS)
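Basic full-text search uses functions built into PostgreSQL, so no database
extension is required. On the Django side, the classes used below live in
`django.contrib.postgres`; add it to `INSTALLED_APPS` if you also want the `search`
lookup. Trigram search needs the `pg_trgm` extension (covered later).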

### 30.12.2 Implement a search vector field (practical approach)

Add to `articles/models.py`:

```python
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField

class Article(models.Model):
    ...
    search_vector = SearchVectorField(null=True, blank=True)

    class Meta:
        indexes = [
            # keep existing indexes...
            GinIndex(fields=["search_vector"], name="article_search_vec_gin"),
        ]
```

Run migrations:

```bash
python manage.py makemigrations
python manage.py migrate
```

Now you need to populate `search_vector`.

### 30.12.3 Populate search_vector (data migration approach)

Create an empty migration:

```bash
python manage.py makemigrations --empty articles --name backfill_search_vector
```

Edit it:

```python
from django.db import migrations
from django.contrib.postgres.search import SearchVector


def backfill(apps, schema_editor):
    Article = apps.get_model("articles", "Article")
    # Use schema_editor.connection for DB operations if needed.
    # Here we use ORM update with SearchVector (works with Postgres).
    Article.objects.update(
        search_vector=(
            SearchVector("title", weight="A") +
            SearchVector("body", weight="B")
        )
    )


class Migration(migrations.Migration):
    dependencies = [
        ("articles", "XXXX_previous"),
    ]

    operations = [
        migrations.RunPython(backfill, migrations.RunPython.noop),
    ]
```

Now use it in queries:

```python
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F

from articles.models import Article

q = "django templates"
query = SearchQuery(q)

qs = (
    Article.objects.published()
    .annotate(rank=SearchRank(F("search_vector"), query))
    .filter(search_vector=query)
    .order_by("-rank", "-published_at")
)
```

#### Why this is “real”
- search becomes faster (GIN)
- results can be ranked by relevance
- you stop scanning large text fields with `LIKE` pattern matching

### 30.12.4 Keeping search_vector updated automatically
You have options:
1) Update in application code on save (can be tricky with bulk updates)
2) Use database triggers (more advanced but robust)
3) Use a periodic job to reindex (sometimes acceptable)

For mastery-level Django, triggers are often used in serious apps. We’ll keep the
workbook application-level for now:

The simplest place is an `Article.save()` override (keep it minimal). Better options are:
- a service layer that sets it
- a post-save signal

A professional approach is a DB trigger, but that’s outside “core Django” and
depends on operational comfort.

---

## 30.13 Trigram Search (pg_trgm) — Great for “Fuzzy” Search UX

Trigram search helps with:
- typos
- partial matches
- “contains” style search but indexed

### 30.13.1 Enable extension
Create a migration in any app (commonly a `core` app) using `RunSQL`:

```bash
python manage.py makemigrations --empty articles --name enable_pg_trgm
```

Edit:

```python
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ("articles", "XXXX_previous"),
    ]

    operations = [
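        migrations.RunSQL(
            "CREATE EXTENSION IF NOT EXISTS pg_trgm;",
            # Leave the extension installed on rollback so the migration stays reversible.
            reverse_sql=migrations.RunSQL.noop,
        ),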
    ]
```

### 30.13.2 Add trigram index (example)
Django provides `GinIndex` with opclasses (advanced). Many teams use raw SQL for
trigram indexes; Django support exists but you must be precise.

A simpler “workbook-level” approach:
- use FTS for production search
- use trigram only if you understand and need it

If you do need it, `GinIndex` accepts `opclasses=["gin_trgm_ops"]` (an explicit
index `name` is required when `opclasses` is set), and `django.contrib.postgres`
provides the `trigram_similar` lookup and `TrigramSimilarity` for querying.
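
For reference, a trigram index can be declared roughly as follows. This is a sketch under assumptions: it targets the `title` field, the index name is illustrative, `pg_trgm` must already be enabled, and `django.contrib.postgres` must be in `INSTALLED_APPS` for the `trigram_similar` lookup to work.

```python
# articles/models.py (sketch; field choice and index name are illustrative)
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class Article(models.Model):
    ...

    class Meta:
        indexes = [
            # keep existing indexes...
            GinIndex(
                fields=["title"],
                name="article_title_trgm_gin",
                opclasses=["gin_trgm_ops"],  # trigram operator class from pg_trgm
            ),
        ]
```

After running `makemigrations` and `migrate`, a typo-tolerant query such as `Article.objects.filter(title__trigram_similar="djagno")` is a reasonable first test; check with `explain()` whether the index is actually chosen on your data.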

---

## 30.14 Transactions, Isolation, and Locks (What Django Devs Must Know)

### 30.14.1 Autocommit and request transactions
Django uses autocommit by default:
- each query is its own transaction unless you wrap it in `atomic()` or enable
  `ATOMIC_REQUESTS`, which wraps each view in a transaction

You already used `transaction.atomic()` in services—good.

### 30.14.2 `select_for_update()` (row locking)
Used when you must safely update based on current DB state.

Example (conceptual): assign next ticket number in an org:

```python
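from django.db import transaction

# Counter is a hypothetical model with one row per organization.
with transaction.atomic():
    # Lock the row until the transaction commits; concurrent callers wait here.
    counter = Counter.objects.select_for_update().get(org=org)
    counter.value += 1
    counter.save(update_fields=["value"])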
```

This prevents two transactions from getting the same counter value.

### 30.14.3 Deadlocks (what they are)
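A deadlock happens when:
- Tx A locks row 1 and waits for row 2
- Tx B locks row 2 and waits for row 1

Postgres detects the deadlock and aborts one of the transactions with an error,
which Django surfaces as `OperationalError`; the other transaction proceeds.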

How to reduce deadlocks:
- lock rows in consistent order
- keep transactions short
- avoid doing network calls inside transactions
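
The "consistent order" rule can be illustrated without a database. In this sketch, in-process `threading.Lock` objects stand in for row locks and the dictionary keys play the role of primary keys; the account data and the `transfer` function are illustrative assumptions, not code from the project.

```python
import threading

# In-process locks stand in for row locks; keys play the role of primary keys.
row_locks = {1: threading.Lock(), 2: threading.Lock()}
balances = {1: 100, 2: 100}


def transfer(src: int, dst: int, amount: int) -> None:
    """Move an amount between two rows, locking them in a fixed order."""
    if src == dst:
        raise ValueError("source and destination must differ")
    # Always acquire locks in ascending key order, whatever the transfer direction.
    first, second = sorted((src, dst))
    with row_locks[first], row_locks[second]:
        balances[src] -= amount
        balances[dst] += amount


# Two opposite transfers: without the sorted() step these could deadlock.
threads = [
    threading.Thread(target=transfer, args=(1, 2, 10)),
    threading.Thread(target=transfer, args=(2, 1, 5)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(balances)
```

The Django equivalent of the `sorted()` step is to lock rows through one ordered query inside `atomic()`, for example `Model.objects.select_for_update().filter(pk__in=ids).order_by("pk")`.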

---

## 30.15 Connection Pooling (Production Reality)

Postgres has a connection limit. Django processes + Celery workers can consume many
connections.

Common approaches:
- tune worker counts (web and celery)
- use connection pooling (PgBouncer) or managed pooler
- set `CONN_MAX_AGE` appropriately (but note Django async guidance if using ASGI)

A common safe baseline for sync web apps:
```python
DATABASES["default"]["CONN_MAX_AGE"] = 60  # seconds; set per database, not top-level
```

For async-heavy ASGI + ORM usage, Django docs recommend:
```python
DATABASES["default"]["CONN_MAX_AGE"] = 0  # close the connection after each request
```

Choose based on deployment mode and confirm with monitoring.
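
A back-of-the-envelope connection budget makes the limit tangible. The sketch below assumes each web thread and each Celery worker process holds one persistent connection, and it uses PostgreSQL's stock defaults (`max_connections = 100`, with 3 reserved for superusers); the worker counts and the helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConnectionBudget:
    needed: int
    available: int

    @property
    def headroom(self) -> int:
        """Connections left over for shells, migrations, and monitoring."""
        return self.available - self.needed


def connection_budget(
    web_processes: int,
    threads_per_process: int,
    celery_processes: int,
    max_connections: int = 100,  # PostgreSQL default
    reserved: int = 3,  # default superuser_reserved_connections
) -> ConnectionBudget:
    # One connection per web thread plus one per Celery worker process.
    needed = web_processes * threads_per_process + celery_processes
    return ConnectionBudget(needed=needed, available=max_connections - reserved)


budget = connection_budget(web_processes=8, threads_per_process=4, celery_processes=16)
print(budget.needed, budget.available, budget.headroom)
```

If the headroom is small or negative across all your hosts, that is the signal to reduce worker counts or put a pooler such as PgBouncer in front of Postgres.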

---

## 30.16 Monitoring Queries (What Pros Check)

### 30.16.1 Enable slow query logging (Postgres)
In production, you can set:
- `log_min_duration_statement = 200ms` (example)
Then Postgres logs slow statements.

### 30.16.2 `pg_stat_statements` (high value extension)
This extension aggregates query stats (mean time, total time, calls). Many teams
enable it in production because it answers:
- “Which queries are slowest overall?”
- “Which queries consume most total DB time?”

It’s an ops-level feature; implement when you manage your Postgres config.

---

## 30.17 Backups and Restore Drills (Yes, Even for “Simple Apps”)

### 30.17.1 The real rule
A backup you never restored is not a backup—it’s a hope.

Minimum professional baseline:
- automated daily backups
- retention policy (e.g., 7–30 days)
- periodic restore drill into staging/local

### 30.17.2 Common tools
- `pg_dump` for logical backups
- provider-managed snapshots (RDS, Cloud SQL)
- WAL archiving for point-in-time recovery (advanced)

We’ll cover operational runbooks later in deployment operations chapters, but the
DB mindset starts here.

---

## 30.18 Hands-On Labs

### Lab A — Switch your project to Postgres
1. Start Postgres via Docker compose.
2. Install psycopg.
3. Update `DATABASES`.
4. Run migrations.
5. Create superuser.
6. Load seed data and confirm pages work.

### Lab B — Confirm index usage
1. Seed 5,000+ articles quickly (management command or shell loop).
2. Run `qs.explain(analyze=True)` for:
   - published list ordering query
   - tag filter query
3. Add a partial index for published ordering.
4. Compare query plans before/after.

### Lab C — Add full-text search
1. Add `search_vector` + GIN index.
2. Backfill vector via migration.
3. Implement a search endpoint using `SearchQuery` and `SearchRank`.
4. Compare response time vs `icontains` on large dataset.

---

## 30.19 Exercises (Do These Before Proceeding)

1. Add a composite index for tasks list:
   - your most common query is likely `organization + status + created_at`
   - create an index that matches it and verify with `explain()`

2. Implement a “top tags” query optimized for Postgres:
   - annotate tag counts and order by it
   - verify query count is stable and add an index if needed

3. Document your DB config in README:
   - how to run Postgres
   - required env vars
   - how to run migrations

4. Write a “DB performance checklist” for your app:
   - every list view paginated
   - every cross-relation template uses prefetch/select_related
   - explain plans reviewed for hot endpoints

---

## 30.20 Chapter Summary

- PostgreSQL is the production standard because it handles concurrency, indexing,
  and advanced queries far better than SQLite.
- Django integrates cleanly via psycopg and `DATABASES`.
- Performance is driven by:
  - correct query shapes
  - correct indexes (including partial indexes)
  - measuring with `explain(analyze=True)`
- Full-text search (GIN) is the professional replacement for naive `icontains` at scale.
- Transactions and locks matter for correctness under concurrency.
- Operational practices (pooling, monitoring, backups) are part of “mastering Django.”

---

Next chapter: **31. Advanced Migrations and Data Management**  
We’ll learn safe schema changes in production: backfills, large tables, avoiding
downtime, `RunPython` patterns, `SeparateDatabaseAndState`, and rollout/rollback
strategies.

