dbdiff is a powerful CLI tool for comparing and validating database schemas between environments (staging vs production, source vs target). Built with Python 3.10+, it provides comprehensive schema analysis with detailed reporting capabilities.
- Features
- Installation
- Quick Start
- Usage Examples
- Output Reports
- Configuration
- CI/CD Integration
- Troubleshooting
- FAQ
- Contributing
- License
-
🔍 Comprehensive Schema Comparison
- Tables, columns, constraints (PK, FK, UNIQUE, CHECK)
- Indexes (including partial and expression indexes)
- Views and materialized views
- Functions, procedures, and triggers
- Sequences, custom types, and extensions
- Grants/privileges
- Scheduled jobs (pg_cron, pgAgent, custom)
-
🎯 Smart Difference Detection
- Type equivalence handling (int8 ≡ bigint, etc.)
- Normalized SQL comparison
- Breaking change identification
- Customizable fail conditions
-
📊 Multiple Output Formats
- Markdown reports with badges and summaries
- JSON for programmatic access
- Plain text summaries
- Catalog exports for baseline management
-
🚀 Production-Ready
- Read-only, safe for production
- Connection pooling and timeouts
- Structured logging (JSON/text)
- Comprehensive test suite
-
🔌 Extensible Architecture
- Plugin-based adapter system
- Currently supports PostgreSQL
- Easy to add MySQL, SQL Server, etc.
- Python 3.10 or higher
- PostgreSQL database access (read-only permissions sufficient)
- pip package manager
# Clone the repository
git clone https://github.com/YgriegaSB/tool-dbdiff.git
cd tool-dbdiff
# Create and activate virtual environment (recommended)
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
# Install dbdiff and dependencies
pip install -e .dbdiff --helpNote: For a complete quick start guide in Spanish, see QUICK_START.md
Create a .env file in the project root:
# Source database configuration
SOURCE_DB_HOST=localhost
SOURCE_DB_PORT=5432
SOURCE_DB_NAME=sourcedb
SOURCE_DB_USER=your_username
SOURCE_DB_PASSWORD=your_password
# Target database configuration
TARGET_DB_HOST=localhost
TARGET_DB_PORT=5433
TARGET_DB_NAME=targetdb
TARGET_DB_USER=your_username
TARGET_DB_PASSWORD=your_passwordAlternative: Use full connection URLs
SOURCE_DATABASE_URL=postgresql://user:pass@localhost:5432/sourcedb
TARGET_DATABASE_URL=postgresql://user:pass@localhost:5433/targetdb# Basic comparison (uses .env configuration)
dbdiff --report-dir ./reports
# Or with explicit URLs
dbdiff \
--source-url "postgresql://user:pass@localhost:5432/sourcedb" \
--target-url "postgresql://user:pass@localhost:5433/targetdb" \
--report-dir ./reportsCheck the generated reports in ./reports/:
diff.md- Human-readable Markdown reportdiff.json- Complete JSON data for programmatic accesssummary.txt- Executive summary
dbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--schemas public,app \
--report-dir ./reportsdbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--fail-on missing_table,changed_column_type,missing_fk \
--report-dir ./reportsExit codes:
0: No differences (or non-breaking only)1: Differences match fail-on conditions2: Error during execution
dbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--exclude '^(tmp_|audit_|temp_)' \
--report-dir ./reportsdbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--include-kinds table,view,function \
--report-dir ./reports# Export current catalog
dbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--export-catalog \
--report-dir ./baseline
# Later, compare against baseline
dbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--baseline ./baseline/diff.json \
--report-dir ./reportsdbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--jobs-discovery-sql "SELECT id, schedule, command FROM my_jobs_table" \
--report-dir ./reportsdbdiff \
--source-url "$SOURCE_URL" \
--target-url "$TARGET_URL" \
--log-format json \
--log-level INFO \
--report-dir ./reportsdbdiff automatically protects your credentials in logs by default. No configuration needed!
# Your normal command - credentials are automatically masked
dbdiff --report-dir ./reports
# What you see in logs:
# [INFO] Connecting to postgresql://user:****@host:5432/db
# ^^^^
# Protected!
# To disable masking (NOT recommended in production)
dbdiff --report-dir ./reports --mask-secrets=falseWhat gets masked:
- Database passwords:
postgresql://user:pass@host→postgresql://user:****@host - API keys:
api_key=secret→api_key=**** - Tokens:
token=abc123→token=**** - Auth headers:
Authorization: Bearer xyz→Authorization: ****
This works with:
- ✅ Environment variables (
.envfile) - ✅ Direct URLs (
--source-url) - ✅ Both text and JSON log formats
- ✅ All output reports
reports/
├── diff.json # Detailed JSON report
├── diff.md # Human-readable Markdown
├── summary.txt # Plain text summary
├── source.json # Source catalog (with --export-catalog)
└── target.json # Target catalog (with --export-catalog)
# Database Schema Diff Report
## Executive Summary
- Total Differences: 42
- Breaking Changes: 8
### Summary by Object Type
| Object Type | Missing | Extra | Changed | Breaking |
|-------------|---------|-------|---------|----------|
| Column | 3 | 2 | 5 | 4 |
| Index | 1 | 0 | 2 | 0 |
...
## Detailed Differences
### Column (10)
#### ❌ Missing in Target (3)
- **`app.users.updated_at`** **[BREAKING]**
- Source: `timestamp with time zone`
- Target: `NULL`
- 💡 Column needs to be added
#### 🔄 Changed (5)
- **`app.payments.status`** **[BREAKING]**
- **data_type** ⚠️ **BREAKING**
- Source: `character varying`
- Target: `text`
- 💡 Data type change may require migration{
"generated_at": "2025-10-19T12:00:00Z",
"source_url": "postgresql://user:****@localhost:5432/sourcedb",
"target_url": "postgresql://user:****@localhost:5433/targetdb",
"summary": {
"column": {
"missing": 3,
"extra": 2,
"changed": 5,
"breaking": 4
}
},
"diffs": [
{
"kind": "column",
"schema": "app",
"name": "users",
"subname": "updated_at",
"status": "missing_in_target",
"details": [
{
"property": "column",
"source_value": "exists",
"target_value": null,
"hint": "Column needs to be added",
"is_breaking": true
}
]
}
]
}dbdiff compares PostgreSQL databases using a safe, read-only approach:
- Connection - Establishes secure connections to SOURCE and TARGET databases
- Extraction - Queries system catalogs (pg_catalog) to extract schema metadata
- Normalization - Normalizes SQL definitions and handles type equivalences
- Comparison - Compares 14+ types of database objects across schemas
- Analysis - Identifies breaking changes and potential migration issues
- Reporting - Generates comprehensive reports in multiple formats
| Object Type | Description | Breaking Change Detection |
|---|---|---|
| Tables | Table definitions and properties | Yes |
| Columns | Column definitions, types, defaults | Yes |
| Constraints | PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK | Yes |
| Indexes | B-tree, GiST, GIN, BRIN, partial, expression | Partial |
| Views | Regular and materialized views | Yes |
| Functions | Stored procedures and functions | Partial |
| Triggers | Table triggers | Partial |
| Sequences | Auto-increment sequences | Yes |
| Types | Custom types (ENUM, COMPOSITE, DOMAIN) | Yes |
| Extensions | Installed extensions | Yes |
| Grants | Table and schema privileges | No |
| Scheduled Jobs | pg_cron, pgAgent, custom | No |
Database Connection:
--source-url URL Source database connection URL
--target-url URL Target database connection URL
--connect-timeout SECONDS Connection timeout (default: 30)
--statement-timeout MS Query timeout in milliseconds (default: 60000)
Schema Filtering:
--schemas SCHEMA[,SCHEMA...] Compare only specific schemas (comma-separated)
--exclude PATTERN Regex pattern to exclude objects
--include-kinds TYPE[,TYPE...] Include only specific object types
--exclude-kinds TYPE[,TYPE...] Exclude specific object types
Reporting:
--report-dir PATH Output directory for reports (required)
--report-format FORMAT Output formats: markdown,json,summary (default: all)
--export-catalog Export source/target catalogs as JSON
Behavior:
--fail-on CONDITION[,...] Exit with error code 1 on specific conditions
--baseline PATH Compare against a baseline catalog
--jobs-discovery-sql SQL Custom SQL for discovering scheduled jobs
--mask-secrets BOOL Mask credentials in logs (default: true)
Logging:
--log-level LEVEL DEBUG, INFO, WARNING, ERROR (default: INFO)
--log-format FORMAT text or json (default: text)
All options can be configured via environment variables:
| Variable | Description | Example |
|---|---|---|
SOURCE_DATABASE_URL |
Full source connection URL | postgresql://user:pass@host:5432/db |
TARGET_DATABASE_URL |
Full target connection URL | postgresql://user:pass@host:5432/db |
SOURCE_DB_HOST |
Source database host | localhost |
SOURCE_DB_PORT |
Source database port | 5432 |
SOURCE_DB_NAME |
Source database name | mydb |
SOURCE_DB_USER |
Source database user | postgres |
SOURCE_DB_PASSWORD |
Source database password | secret |
TARGET_DB_* |
Target database settings | (same as source) |
LOG_LEVEL |
Logging level | INFO |
LOG_FORMAT |
Log output format | json |
CONNECT_TIMEOUT |
Connection timeout (seconds) | 30 |
STATEMENT_TIMEOUT |
Query timeout (milliseconds) | 60000 |
REPORT_DIR |
Output directory | ./reports |
Exit with error code 1 when specific differences are detected:
| Condition | Description |
|---|---|
missing_table |
Table exists in source but not in target |
extra_table |
Table exists in target but not in source |
missing_column |
Column missing in target |
extra_column |
Column exists in target but not in source |
changed_column_type |
Column data type changed |
changed_column_nullable |
Column nullability changed |
missing_pk |
Primary key missing in target |
changed_pk |
Primary key definition changed |
missing_fk |
Foreign key missing in target |
missing_index |
Index missing in target |
missing_view |
View missing in target |
changed_view |
View definition changed |
missing_function |
Function missing in target |
changed_function |
Function definition changed |
Example:
dbdiff --fail-on missing_table,changed_column_type,missing_fk --report-dir ./reports# Test connectivity
psql -h localhost -U user -d database
# Check firewall rules
telnet localhost 5432
# Verify credentials
dbdiff --source-url "$URL" --log-level DEBUG# Increase timeouts
dbdiff \
--source-url "$URL" \
--target-url "$URL" \
--connect-timeout 60 \
--statement-timeout 120000# Compare specific schemas only
dbdiff \
--source-url "$URL" \
--target-url "$URL" \
--schemas public
# Exclude large tables
dbdiff \
--source-url "$URL" \
--target-url "$URL" \
--exclude '^(large_table_|audit_)'-- Ensure pg_cron extension is installed
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Grant permissions
GRANT SELECT ON cron.job TO your_user;# Provide custom discovery SQL
dbdiff \
--source-url "$URL" \
--target-url "$URL" \
--jobs-discovery-sql "SELECT job_id AS name, cron_expr AS schedule, sql_cmd AS command FROM custom_jobs"name: Schema Diff
on:
pull_request:
branches: [main]
jobs:
schema-diff:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install dbdiff
run: pip install dbdiff
- name: Run schema diff
env:
SOURCE_DATABASE_URL: ${{ secrets.PROD_DB_URL }}
TARGET_DATABASE_URL: ${{ secrets.STAGING_DB_URL }}
run: |
dbdiff \
--report-dir ./reports \
--fail-on missing_table,changed_column_type \
--log-format json
- name: Upload reports
uses: actions/upload-artifact@v3
if: always()
with:
name: schema-diff-reports
path: reports/schema-diff:
image: python:3.10
script:
- pip install dbdiff
- |
dbdiff \
--source-url "$PROD_DB_URL" \
--target-url "$STAGING_DB_URL" \
--report-dir ./reports \
--fail-on missing_table,changed_column_type
artifacts:
paths:
- reports/
when: alwaysQ: Does dbdiff modify my database? A: No, dbdiff is read-only and only queries metadata from system catalogs.
Q: Can I compare databases on different servers? A: Yes, just provide different connection URLs for source and target.
Q: How do I ignore specific differences?
A: Use --exclude with a regex pattern, or --exclude-kinds to skip entire object types.
Q: Can I use dbdiff in production? A: Yes, it's designed to be safe with read-only queries, connection pooling, and timeouts.
Q: How do I add support for MySQL?
A: Extend the DatabaseAdapter base class and implement the abstract methods. See "Adding a new database adapter" above.
Q: What's the performance on large databases?
A: dbdiff uses connection pooling and can handle databases with thousands of objects. For very large schemas, use --schemas to compare incrementally.
MIT License - see LICENSE file for details.
Created by Nicolás Pinochet Flores (@YgriegaSB)
For issues or questions, please open an issue on GitHub.
Version 0.1.0 - Database Schema Comparison Tool for PostgreSQL