Skip to content

YgriegaSB/tool-dbdiff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dbdiff - Database Schema Comparison Tool

Python Version License: MIT Version

dbdiff is a powerful CLI tool for comparing and validating database schemas between environments (staging vs production, source vs target). Built with Python 3.10+, it provides comprehensive schema analysis with detailed reporting capabilities.

Table of Contents

Features

  • 🔍 Comprehensive Schema Comparison

    • Tables, columns, constraints (PK, FK, UNIQUE, CHECK)
    • Indexes (including partial and expression indexes)
    • Views and materialized views
    • Functions, procedures, and triggers
    • Sequences, custom types, and extensions
    • Grants/privileges
    • Scheduled jobs (pg_cron, pgAgent, custom)
  • 🎯 Smart Difference Detection

    • Type equivalence handling (int8 ≡ bigint, etc.)
    • Normalized SQL comparison
    • Breaking change identification
    • Customizable fail conditions
  • 📊 Multiple Output Formats

    • Markdown reports with badges and summaries
    • JSON for programmatic access
    • Plain text summaries
    • Catalog exports for baseline management
  • 🚀 Production-Ready

    • Read-only, safe for production
    • Connection pooling and timeouts
    • Structured logging (JSON/text)
    • Comprehensive test suite
  • 🔌 Extensible Architecture

    • Plugin-based adapter system
    • Currently supports PostgreSQL
    • Easy to add MySQL, SQL Server, etc.

Installation

Prerequisites

  • Python 3.10 or higher
  • PostgreSQL database access (read-only permissions sufficient)
  • pip package manager

Install from source

# Clone the repository
git clone https://github.com/YgriegaSB/tool-dbdiff.git
cd tool-dbdiff

# Create and activate virtual environment (recommended)
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate

# Install dbdiff and dependencies
pip install -e .

Verify installation

dbdiff --help

Quick Start

Note: For a complete quick start guide in Spanish, see QUICK_START.md

Step 1: Configure database connections

Create a .env file in the project root:

# Source database configuration
SOURCE_DB_HOST=localhost
SOURCE_DB_PORT=5432
SOURCE_DB_NAME=sourcedb
SOURCE_DB_USER=your_username
SOURCE_DB_PASSWORD=your_password

# Target database configuration
TARGET_DB_HOST=localhost
TARGET_DB_PORT=5433
TARGET_DB_NAME=targetdb
TARGET_DB_USER=your_username
TARGET_DB_PASSWORD=your_password

Alternative: Use full connection URLs

SOURCE_DATABASE_URL=postgresql://user:pass@localhost:5432/sourcedb
TARGET_DATABASE_URL=postgresql://user:pass@localhost:5433/targetdb

Step 2: Run your first comparison

# Basic comparison (uses .env configuration)
dbdiff --report-dir ./reports

# Or with explicit URLs
dbdiff \
  --source-url "postgresql://user:pass@localhost:5432/sourcedb" \
  --target-url "postgresql://user:pass@localhost:5433/targetdb" \
  --report-dir ./reports

Step 3: Review results

Check the generated reports in ./reports/:

  • diff.md - Human-readable Markdown report
  • diff.json - Complete JSON data for programmatic access
  • summary.txt - Executive summary

Compare specific schemas

dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --schemas public,app \
  --report-dir ./reports

Usage Examples

Fail on specific conditions

dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --fail-on missing_table,changed_column_type,missing_fk \
  --report-dir ./reports

Exit codes:

  • 0: No differences (or non-breaking only)
  • 1: Differences match fail-on conditions
  • 2: Error during execution

Exclude temporary objects

dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --exclude '^(tmp_|audit_|temp_)' \
  --report-dir ./reports

Include only specific object types

dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --include-kinds table,view,function \
  --report-dir ./reports

Export catalogs for baseline management

# Export current catalog
dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --export-catalog \
  --report-dir ./baseline

# Later, compare against baseline
dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --baseline ./baseline/diff.json \
  --report-dir ./reports

Custom cron job discovery

dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --jobs-discovery-sql "SELECT id, schedule, command FROM my_jobs_table" \
  --report-dir ./reports

JSON logging for CI/CD

dbdiff \
  --source-url "$SOURCE_URL" \
  --target-url "$TARGET_URL" \
  --log-format json \
  --log-level INFO \
  --report-dir ./reports

Security: Automatic credential masking

dbdiff automatically protects your credentials in logs by default. No configuration needed!

# Your normal command - credentials are automatically masked
dbdiff --report-dir ./reports

# What you see in logs:
# [INFO] Connecting to postgresql://user:****@host:5432/db
#                                         ^^^^
#                                      Protected!

# To disable masking (NOT recommended in production)
dbdiff --report-dir ./reports --mask-secrets=false

What gets masked:

  • Database passwords: postgresql://user:pass@hostpostgresql://user:****@host
  • API keys: api_key=secretapi_key=****
  • Tokens: token=abc123token=****
  • Auth headers: Authorization: Bearer xyzAuthorization: ****

This works with:

  • ✅ Environment variables (.env file)
  • ✅ Direct URLs (--source-url)
  • ✅ Both text and JSON log formats
  • ✅ All output reports

Output Reports

Generated files

reports/
├── diff.json          # Detailed JSON report
├── diff.md            # Human-readable Markdown
├── summary.txt        # Plain text summary
├── source.json        # Source catalog (with --export-catalog)
└── target.json        # Target catalog (with --export-catalog)

Markdown report structure

# Database Schema Diff Report

## Executive Summary
- Total Differences: 42
- Breaking Changes: 8

### Summary by Object Type
| Object Type | Missing | Extra | Changed | Breaking |
|-------------|---------|-------|---------|----------|
| Column      | 3       | 2     | 5       | 4        |
| Index       | 1       | 0     | 2       | 0        |
...

## Detailed Differences

### Column (10)

#### ❌ Missing in Target (3)
- **`app.users.updated_at`** **[BREAKING]**
  - Source: `timestamp with time zone`
  - Target: `NULL`
  - 💡 Column needs to be added

#### 🔄 Changed (5)
- **`app.payments.status`** **[BREAKING]**
  - **data_type** ⚠️ **BREAKING**
    - Source: `character varying`
    - Target: `text`
    - 💡 Data type change may require migration

JSON report structure

{
  "generated_at": "2025-10-19T12:00:00Z",
  "source_url": "postgresql://user:****@localhost:5432/sourcedb",
  "target_url": "postgresql://user:****@localhost:5433/targetdb",
  "summary": {
    "column": {
      "missing": 3,
      "extra": 2,
      "changed": 5,
      "breaking": 4
    }
  },
  "diffs": [
    {
      "kind": "column",
      "schema": "app",
      "name": "users",
      "subname": "updated_at",
      "status": "missing_in_target",
      "details": [
        {
          "property": "column",
          "source_value": "exists",
          "target_value": null,
          "hint": "Column needs to be added",
          "is_breaking": true
        }
      ]
    }
  ]
}

How It Works

dbdiff compares PostgreSQL databases using a safe, read-only approach:

  1. Connection - Establishes secure connections to SOURCE and TARGET databases
  2. Extraction - Queries system catalogs (pg_catalog) to extract schema metadata
  3. Normalization - Normalizes SQL definitions and handles type equivalences
  4. Comparison - Compares 14+ types of database objects across schemas
  5. Analysis - Identifies breaking changes and potential migration issues
  6. Reporting - Generates comprehensive reports in multiple formats

Supported Object Types

Object Type Description Breaking Change Detection
Tables Table definitions and properties Yes
Columns Column definitions, types, defaults Yes
Constraints PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK Yes
Indexes B-tree, GiST, GIN, BRIN, partial, expression Partial
Views Regular and materialized views Yes
Functions Stored procedures and functions Partial
Triggers Table triggers Partial
Sequences Auto-increment sequences Yes
Types Custom types (ENUM, COMPOSITE, DOMAIN) Yes
Extensions Installed extensions Yes
Grants Table and schema privileges No
Scheduled Jobs pg_cron, pgAgent, custom No

Configuration

Command-Line Options

Database Connection:
  --source-url URL              Source database connection URL
  --target-url URL              Target database connection URL
  --connect-timeout SECONDS     Connection timeout (default: 30)
  --statement-timeout MS        Query timeout in milliseconds (default: 60000)

Schema Filtering:
  --schemas SCHEMA[,SCHEMA...]  Compare only specific schemas (comma-separated)
  --exclude PATTERN             Regex pattern to exclude objects
  --include-kinds TYPE[,TYPE...]  Include only specific object types
  --exclude-kinds TYPE[,TYPE...]  Exclude specific object types

Reporting:
  --report-dir PATH             Output directory for reports (required)
  --report-format FORMAT        Output formats: markdown,json,summary (default: all)
  --export-catalog              Export source/target catalogs as JSON

Behavior:
  --fail-on CONDITION[,...]     Exit with error code 1 on specific conditions
  --baseline PATH               Compare against a baseline catalog
  --jobs-discovery-sql SQL      Custom SQL for discovering scheduled jobs
  --mask-secrets BOOL           Mask credentials in logs (default: true)

Logging:
  --log-level LEVEL             DEBUG, INFO, WARNING, ERROR (default: INFO)
  --log-format FORMAT           text or json (default: text)

Environment Variables

All options can be configured via environment variables:

Variable Description Example
SOURCE_DATABASE_URL Full source connection URL postgresql://user:pass@host:5432/db
TARGET_DATABASE_URL Full target connection URL postgresql://user:pass@host:5432/db
SOURCE_DB_HOST Source database host localhost
SOURCE_DB_PORT Source database port 5432
SOURCE_DB_NAME Source database name mydb
SOURCE_DB_USER Source database user postgres
SOURCE_DB_PASSWORD Source database password secret
TARGET_DB_* Target database settings (same as source)
LOG_LEVEL Logging level INFO
LOG_FORMAT Log output format json
CONNECT_TIMEOUT Connection timeout (seconds) 30
STATEMENT_TIMEOUT Query timeout (milliseconds) 60000
REPORT_DIR Output directory ./reports

Fail-On Conditions

Exit with error code 1 when specific differences are detected:

Condition Description
missing_table Table exists in source but not in target
extra_table Table exists in target but not in source
missing_column Column missing in target
extra_column Column exists in target but not in source
changed_column_type Column data type changed
changed_column_nullable Column nullability changed
missing_pk Primary key missing in target
changed_pk Primary key definition changed
missing_fk Foreign key missing in target
missing_index Index missing in target
missing_view View missing in target
changed_view View definition changed
missing_function Function missing in target
changed_function Function definition changed

Example:

dbdiff --fail-on missing_table,changed_column_type,missing_fk --report-dir ./reports

Troubleshooting

Connection errors

# Test connectivity
psql -h localhost -U user -d database

# Check firewall rules
telnet localhost 5432

# Verify credentials
dbdiff --source-url "$URL" --log-level DEBUG

Timeout errors

# Increase timeouts
dbdiff \
  --source-url "$URL" \
  --target-url "$URL" \
  --connect-timeout 60 \
  --statement-timeout 120000

Memory issues with large databases

# Compare specific schemas only
dbdiff \
  --source-url "$URL" \
  --target-url "$URL" \
  --schemas public

# Exclude large tables
dbdiff \
  --source-url "$URL" \
  --target-url "$URL" \
  --exclude '^(large_table_|audit_)'

pg_cron not detected

-- Ensure pg_cron extension is installed
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Grant permissions
GRANT SELECT ON cron.job TO your_user;

Custom job tables

# Provide custom discovery SQL
dbdiff \
  --source-url "$URL" \
  --target-url "$URL" \
  --jobs-discovery-sql "SELECT job_id AS name, cron_expr AS schedule, sql_cmd AS command FROM custom_jobs"

CI/CD Integration

GitHub Actions example

name: Schema Diff

on:
  pull_request:
    branches: [main]

jobs:
  schema-diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dbdiff
        run: pip install dbdiff

      - name: Run schema diff
        env:
          SOURCE_DATABASE_URL: ${{ secrets.PROD_DB_URL }}
          TARGET_DATABASE_URL: ${{ secrets.STAGING_DB_URL }}
        run: |
          dbdiff \
            --report-dir ./reports \
            --fail-on missing_table,changed_column_type \
            --log-format json

      - name: Upload reports
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: schema-diff-reports
          path: reports/

GitLab CI example

schema-diff:
  image: python:3.10
  script:
    - pip install dbdiff
    - |
      dbdiff \
        --source-url "$PROD_DB_URL" \
        --target-url "$STAGING_DB_URL" \
        --report-dir ./reports \
        --fail-on missing_table,changed_column_type
  artifacts:
    paths:
      - reports/
    when: always

FAQ

Q: Does dbdiff modify my database? A: No, dbdiff is read-only and only queries metadata from system catalogs.

Q: Can I compare databases on different servers? A: Yes, just provide different connection URLs for source and target.

Q: How do I ignore specific differences? A: Use --exclude with a regex pattern, or --exclude-kinds to skip entire object types.

Q: Can I use dbdiff in production? A: Yes, it's designed to be safe with read-only queries, connection pooling, and timeouts.

Q: How do I add support for MySQL? A: Extend the DatabaseAdapter base class and implement the abstract methods. See "Adding a new database adapter" above.

Q: What's the performance on large databases? A: dbdiff uses connection pooling and can handle databases with thousands of objects. For very large schemas, use --schemas to compare incrementally.

License

MIT License - see LICENSE file for details.

Author

Created by Nicolás Pinochet Flores (@YgriegaSB)

For issues or questions, please open an issue on GitHub.


Version 0.1.0 - Database Schema Comparison Tool for PostgreSQL

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages