Skip to content

Configurable database backend: SQLAlchemy Core abstraction (SQLite + PostgreSQL) #300

@vybe

Description

@vybe

Summary

Replace the raw sqlite3 database layer with SQLAlchemy Core to support configurable database backends — starting with SQLite (default, zero-config) and PostgreSQL (production-grade, self-hosted or external).

Motivation

  • Production readiness: SQLite works for small deployments but has concurrency limitations under load (single-writer lock, no true connection pooling)
  • Future flexibility: SQLAlchemy Core supports SQLite, PostgreSQL, MySQL, MariaDB — adding new backends later is trivial
  • Data integrity: PostgreSQL provides native BOOLEAN, TIMESTAMPTZ, JSONB, proper constraint enforcement
  • Operational: PostgreSQL enables proper backups, replication, monitoring with standard tooling

Current State

Metric Value
Database SQLite3 (synchronous, no ORM)
DB code ~11,400 lines across 36 Python files in src/backend/db/
Tables 36 tables, 84 indexes
Migrations 30 idempotent migrations (hand-rolled, PRAGMA-based introspection)
Connection Context manager in db/connection.py (28 lines)
Queries 100% raw SQL with ? placeholders, no ORM
Operations classes 25 *Operations classes composed via DatabaseManager facade

SQLite-Specific Syntax Found

Pattern Count PG Equivalent
INTEGER PRIMARY KEY AUTOINCREMENT 6 tables SERIAL PRIMARY KEY
? param placeholders Hundreds %s (psycopg2)
INSERT OR REPLACE 4 places INSERT ... ON CONFLICT DO UPDATE
INSERT OR IGNORE 4 places INSERT ... ON CONFLICT DO NOTHING
PRAGMA table_info() 30 migrations information_schema.columns
sqlite3.IntegrityError 8 places sqlalchemy.exc.IntegrityError
sqlite3.Row row factory 1 place RowMapping
INTEGER for booleans Many columns Boolean
TEXT for timestamps Many columns DateTime
TEXT for JSON ~15 columns JSON

Proposed Approach: SQLAlchemy Core (not ORM)

Use SQLAlchemy Core — the table/expression layer — without the ORM. This gives dialect-agnostic SQL generation while keeping the explicit query-building style the codebase already uses.

Why Core, Not ORM

  • Current code uses raw SQL with explicit row-to-dict converters — Core maps naturally to this pattern
  • No object identity map overhead
  • Operations classes stay as-is (they become query builders instead of SQL string builders)
  • DatabaseManager facade API doesn't change — callers are unaffected

Configuration

# SQLite (default, zero-config — current behavior)
DATABASE_URL=sqlite:///data/trinity.db

# PostgreSQL (self-hosted container)
DATABASE_URL=postgresql://trinity:secret@postgres:5432/trinity

# PostgreSQL (external)
DATABASE_URL=postgresql://user:pass@your-rds.amazonaws.com:5432/trinity

Docker Compose Addition

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: trinity
      POSTGRES_USER: trinity
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-trinity}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U trinity"]
      interval: 5s
      timeout: 5s
      retries: 5
    profiles: ["postgres"]  # Only starts with: docker compose --profile postgres up

Implementation Plan

Phase 1: Foundation (no behavior change)

  • Add sqlalchemy>=2.0 to requirements
  • Create db/tables.py — SQLAlchemy MetaData + Table() definitions for all 36 tables
  • Create db/engine.py — engine factory from DATABASE_URL, connection pooling config
  • Replace db/connection.py context manager with SQLAlchemy engine.begin() wrapper
  • Verify all existing tests pass with SQLite via SQLAlchemy (no query changes yet)

Phase 2: Query Migration (module by module)

Convert each *Operations class from raw SQL strings to SQLAlchemy Core expressions. Each module is independent, so this can be done incrementally:

  • db/users.py — simple CRUD, good pilot module
  • db/settings.py — has upsert pattern (INSERT OR REPLACE)
  • db/tags.py — has INSERT OR IGNORE pattern
  • db/permissions.py — batch upsert + delete-and-replace
  • db/mcp_keys.py — CRUD + hash lookup
  • db/email_auth.py — time-limited codes
  • db/activities.py — fix positional row access (row[0]) to named access
  • db/chat.py — multi-table transactions (message + session update)
  • db/agents.py + db/agent_settings/ — mixin composition, JOINs
  • db/schedules.py — largest module (~53K), complex aggregations
  • db/shared_folders.py, db/notifications.py, db/subscriptions.py
  • db/public_links.py, db/public_chat.py
  • db/slack.py, db/slack_channels.py, db/telegram_channels.py
  • db/monitoring.py, db/dashboard_history.py
  • db/operator_queue.py, db/event_subscriptions.py
  • db/nevermined.py, db/system_views.py, db/skills.py

Phase 3: Schema & Migrations

  • Replace db/schema.py DDL strings with metadata.create_all(engine) (auto-generates dialect-appropriate DDL)
  • Convert db/migrations.py from PRAGMA-based introspection to Alembic or SQLAlchemy-based introspection (inspect(engine).get_columns())
  • Test fresh install + migration path on both SQLite and PostgreSQL

Phase 4: PostgreSQL Integration

  • Add PostgreSQL service to docker-compose.yml (with profiles: ["postgres"])
  • Add PostgreSQL connection pooling config (pool_size, max_overflow)
  • Test full platform on PostgreSQL
  • Update deployment docs

Phase 5: Cleanup

  • Remove direct sqlite3 imports from all db modules
  • Update CLAUDE.md and architecture.md
  • Add database configuration section to deployment guide

Key Design Decisions

  1. SQLAlchemy Core only — no ORM Session, no mapped classes, no relationship loading
  2. Single DATABASE_URL env var — determines engine type at startup
  3. SQLite remains default — zero-config for local dev and small deployments
  4. PostgreSQL via Docker profiledocker compose --profile postgres up enables it
  5. Upsert uses dialect-specific importsfrom sqlalchemy.dialects.{sqlite,postgresql} import insert
  6. Keep DatabaseManager facade — callers don't know or care which backend is active
  7. Keep *Operations classes — they become SQLAlchemy query builders instead of SQL string builders
  8. Alembic optional — can keep hand-rolled migrations using inspect() instead of PRAGMA

Scope & Risk

Component Lines Difficulty
db/tables.py (new) ~400 Easy
db/engine.py (new) ~60 Easy
db/connection.py rewrite Easy
db/schema.py ~800 rewrite Medium
db/migrations.py ~1000 rewrite Hard
29 db modules (queries) ~9500 audit Medium
database.py (facade) ~50 changes Easy
docker-compose.yml ~20 new Easy

Risk mitigation: Phase 1 changes zero queries — just wraps existing SQLite access through SQLAlchemy. Each Phase 2 module can be merged independently. Rollback is straightforward at any point.

Acceptance Criteria

  • DATABASE_URL=sqlite:///data/trinity.db works identically to current behavior
  • DATABASE_URL=postgresql://... works with PostgreSQL 16
  • All 36 tables created correctly on both backends
  • Migrations run successfully on both backends
  • No raw sqlite3 imports remain in db/ modules
  • Docker Compose supports optional PostgreSQL via --profile postgres
  • Zero changes to router or service layer code
  • Documentation updated

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions