I built this lightweight tool that crawls PostgreSQL database schemas, tracks changes over time, and provides both CLI and web interfaces for schema management and comparison.
- Schema Discovery: Automatically discover and catalog all tables, columns, and constraints
- Change Detection: Compare schema snapshots to detect additions, removals, and modifications
- Version History: Maintain a complete history of schema changes with timestamps
- Multiple Output Formats: Export schemas as JSON, CSV, or Markdown
- Rich CLI Interface: Beautiful terminal output with tables and progress indicators
- Web Dashboard: Interactive Streamlit web interface for schema exploration
- Metadata Storage: Store custom descriptions, ownership, and tags
- Scheduled Crawling: Support for automated schema monitoring
To use my tool, you'll need:
- Python 3.8+
- PostgreSQL database access
- Required Python packages (see `requirements.txt`)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd postgres-schema-crawler
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure the database connection by editing `config.yaml` with your PostgreSQL credentials:

  ```yaml
  database:
    host: localhost
    port: 5432
    name: your_database
    user: your_username
    password: your_password
    schema: public
  ```
Then start everything with a single command:

- Windows: `run_tool.bat`
- Linux/Mac: `./run_tool.sh`
- Or directly with Python: `python run_tool.py`
This single command will:
- Check all dependencies are installed
- Test the database connection using your `config.yaml`
- Take a new schema snapshot
- Launch the Streamlit web interface at http://localhost:8501
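Under the hood, a launcher of this shape is straightforward. Here is a rough sketch of the idea (illustrative only; the real `run_tool.py` may differ):

```python
# Illustrative launcher sketch; the actual run_tool.py may differ.
import importlib.util
import subprocess
import sys

# 1. Check that the required packages are importable
for pkg in ("psycopg2", "sqlalchemy", "click", "rich", "streamlit"):
    if importlib.util.find_spec(pkg) is None:
        sys.exit(f"Missing dependency: {pkg} (run: pip install -r requirements.txt)")

# 2. Test the connection and take a snapshot via the CLI
subprocess.run([sys.executable, "src/schema_crawler.py", "crawl"], check=True)

# 3. Launch the Streamlit web interface (defaults to port 8501)
subprocess.run(["streamlit", "run", "src/web_ui.py"], check=True)
```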
If you want to run specific parts of the tool:
```bash
python src/schema_crawler.py crawl                               # take a new schema snapshot
python src/schema_crawler.py list-snapshots                      # list saved snapshots
python src/schema_crawler.py show <snapshot_id>                  # inspect one snapshot
python src/schema_crawler.py diff <snapshot1_id> <snapshot2_id>  # compare two snapshots
python src/schema_crawler.py diff-latest                         # compare the two most recent
streamlit run src/web_ui.py                                      # launch the web interface
```
Before running, make sure you have:
- Python dependencies installed: `pip install -r requirements.txt`
- Database configuration set up in `config.yaml` (see the example above)
- A PostgreSQL database accessible with the credentials in your config
If you get errors:
- Check dependencies: `python -c "import psycopg2, sqlalchemy, click, rich, streamlit, plotly, pandas"`
- Test the database connection: `python src/schema_crawler.py crawl`
- Check that the config file exists: `ls config.yaml`
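If the CLI fails before printing anything useful, a standalone connection check can isolate the problem. This is a minimal sketch (not part of the tool), assuming the `config.yaml` layout shown above and the psycopg2 driver from `requirements.txt`:

```python
# Hypothetical standalone helper; assumes the config.yaml layout shown above.
import psycopg2
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)["database"]

# Open a connection with the configured credentials and run a trivial query
conn = psycopg2.connect(
    host=cfg["host"],
    port=cfg["port"],
    dbname=cfg["name"],
    user=cfg["user"],
    password=cfg["password"],
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print("Connected:", cur.fetchone()[0])
conn.close()
```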
My tool provides a comprehensive command-line interface with multiple commands:
```bash
# Basic crawl with default settings
python src/schema_crawler.py crawl

# Custom database connection
python src/schema_crawler.py crawl --host localhost --port 5432 --database mydb --user postgres --password mypass --schema public

# Table filtering examples
python src/schema_crawler.py crawl --include-tables employees departments
python src/schema_crawler.py crawl --exclude-tables temp_table backup_table
python src/schema_crawler.py crawl --include-patterns "user*" "*_log"
python src/schema_crawler.py crawl --exclude-patterns "*_backup" "test_*"
python src/schema_crawler.py crawl --include-tables employees --include-patterns "dept*" --exclude-patterns "*_temp"
```
```bash
# View all saved schema snapshots
python src/schema_crawler.py list-snapshots

# Compare two specific snapshots
python src/schema_crawler.py diff 1 2

# Compare the two most recent snapshots
python src/schema_crawler.py diff-latest

# Generate a diff report
python src/schema_crawler.py diff 1 2 --output changes.md
```
```bash
# Export as JSON
python src/schema_crawler.py export 1 --format json

# Export as CSV
python src/schema_crawler.py export 1 --format csv

# Export as Markdown
python src/schema_crawler.py export 1 --format markdown
```
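The three formats carry the same information and differ only in serialization. As a rough illustration (the tool's actual export code and snapshot layout may differ; the `columns` structure below is invented for the example):

```python
import csv
import io
import json

# Invented snapshot fragment for illustration; the real layout may differ
columns = [
    {"table": "users", "column": "id", "type": "integer", "nullable": False},
    {"table": "users", "column": "status", "type": "text", "nullable": True},
]

# JSON: dump the structure as-is
print(json.dumps(columns, indent=2))

# CSV: one row per column, with a header row
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["table", "column", "type", "nullable"])
writer.writeheader()
writer.writerows(columns)
print(buf.getvalue())

# Markdown: a simple pipe table
print("| Table | Column | Type | Nullable |")
print("|-------|--------|------|----------|")
for c in columns:
    print(f"| {c['table']} | {c['column']} | {c['type']} | {c['nullable']} |")
```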
Launch my interactive web dashboard:
```bash
streamlit run src/web_ui.py
```
My web interface provides:
- Dashboard: Overview of schema snapshots and activity
- Schema Crawler: Interactive database connection and crawling
- Schema History: Browse and export historical snapshots
- Schema Comparison: Visual comparison of schema changes
- Settings: Configure database connections and preferences
My tool captures comprehensive metadata for each schema object:
Tables:
- Table name and type (BASE TABLE, VIEW, FOREIGN TABLE)
- Table owner
- Creation timestamp
- Custom descriptions and tags

Columns:
- Column name and data type
- Nullability constraints
- Default values
- Ordinal position
- Character length, numeric precision, scale
- Custom descriptions

Constraints:
- Primary keys
- Foreign keys
- Unique constraints
- Check constraints
- Not null constraints
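Most of the column-level metadata above maps directly onto PostgreSQL's standard `information_schema.columns` view. The following sketch shows the kind of query involved (the crawler's actual queries may differ):

```python
import psycopg2

# Placeholder credentials; substitute your own config values
conn = psycopg2.connect(host="localhost", port=5432, dbname="your_database",
                        user="your_username", password="your_password")

# information_schema.columns exposes name, type, nullability, defaults,
# ordinal position, and length/precision/scale for every column
query = """
    SELECT table_name, column_name, data_type, is_nullable, column_default,
           ordinal_position, character_maximum_length,
           numeric_precision, numeric_scale
    FROM information_schema.columns
    WHERE table_schema = %s
    ORDER BY table_name, ordinal_position
"""
with conn.cursor() as cur:
    cur.execute(query, ("public",))
    for row in cur.fetchall():
        print(row)
conn.close()
```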
My schema diffing engine detects:
- Added Objects: New tables, columns, or constraints
- Removed Objects: Deleted tables, columns, or constraints
- Modified Objects: Changes to data types, nullability, defaults, etc.
Example diff output:

```text
Schema Changes Summary:
Added: 2 | Removed: 1 | Modified: 3

🟢 Added Objects (2):
┌────────┬───────────┬────────┬─────────────────────────────────────┐
│ Type   │ Name      │ Parent │ Details                             │
├────────┼───────────┼────────┼─────────────────────────────────────┤
│ Table  │ new_table │ -      │ Table 'new_table' was added         │
│ Column │ status    │ users  │ Column 'status' was added to table  │
└────────┴───────────┴────────┴─────────────────────────────────────┘
```
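Conceptually, the added/removed split reduces to set operations over object identifiers, plus an attribute comparison for objects present in both snapshots. A simplified sketch of the idea (the real engine lives in `src/schema_diff.py` and tracks more detail):

```python
# Simplified diff over two snapshots keyed by (table, column); illustrative only
old = {
    ("users", "id"): {"type": "integer", "nullable": False},
    ("users", "name"): {"type": "text", "nullable": True},
}
new = {
    ("users", "id"): {"type": "bigint", "nullable": False},   # type changed
    ("users", "status"): {"type": "text", "nullable": True},  # added
}

added = new.keys() - old.keys()
removed = old.keys() - new.keys()
modified = {k for k in old.keys() & new.keys() if old[k] != new[k]}

print("Added:", sorted(added))        # [('users', 'status')]
print("Removed:", sorted(removed))    # [('users', 'name')]
print("Modified:", sorted(modified))  # [('users', 'id')]
```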
```text
postgres-schema-crawler/
├── src/
│   ├── schema_crawler.py     # Main crawler and CLI interface
│   ├── schema_diff.py        # Schema comparison engine
│   └── web_ui.py             # Streamlit web interface
├── data/
│   ├── schema_metadata.db    # SQLite database for snapshots
│   └── annotations.yaml      # Custom metadata annotations
├── config.yaml               # Configuration file
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
Configure your PostgreSQL connection in `config.yaml`:
```yaml
database:
  host: localhost
  port: 5432
  name: your_database
  user: your_username
  password: your_password
  schema: public

crawler:
  include_types:
    - BASE TABLE
    - VIEW
    - FOREIGN TABLE
  max_tables: 0  # 0 = unlimited
  include_constraints: true
  include_indexes: false
```
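Because the config file is plain YAML, it is easy to reuse in your own scripts. A sketch of loading it and building a SQLAlchemy connection URL (assuming PyYAML and SQLAlchemy from `requirements.txt`; the tool's own loading code may differ):

```python
import yaml
from sqlalchemy import create_engine

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

db = cfg["database"]
# Build a standard PostgreSQL URL from the configured fields
url = (f"postgresql://{db['user']}:{db['password']}"
       f"@{db['host']}:{db['port']}/{db['name']}")
engine = create_engine(url)

with engine.connect() as conn:
    print("Connected; crawling schema:", db["schema"])
```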
My tool supports table filtering so a crawl can focus on just the tables you care about:
```yaml
crawler:
  table_filter:
    # Include only specific tables
    include_tables:
      - employees
      - departments
      - salaries

    # Exclude specific tables
    exclude_tables:
      - temp_table
      - backup_table

    # Include tables matching patterns (supports wildcards)
    include_patterns:
      - "user*"    # Tables starting with "user"
      - "*_log"    # Tables ending with "_log"
      - "temp_*"   # Tables starting with "temp_"

    # Exclude tables matching patterns
    exclude_patterns:
      - "*_backup" # Exclude backup tables
      - "test_*"   # Exclude test tables
      - "*_old"    # Exclude old tables

    # Case-sensitive pattern matching
    case_sensitive: false
```
My Filtering Options:

- `include_tables`: List of specific tables to include (empty = all tables)
- `exclude_tables`: List of specific tables to exclude
- `include_patterns`: Wildcard patterns for tables to include
- `exclude_patterns`: Wildcard patterns for tables to exclude
- `case_sensitive`: Whether pattern matching is case sensitive
My Pattern Examples:

- `"user*"`: Tables starting with "user"
- `"*_log"`: Tables ending with "_log"
- `"temp_*"`: Tables starting with "temp_"
- `"*_backup"`: Tables ending with "_backup"
Output options live in the same file:

```yaml
output:
  data_dir: data
  export_format: json
  create_reports: true
```
Typical use cases:

Documentation:
- Generate comprehensive schema documentation
- Track schema evolution over time
- Maintain up-to-date data dictionaries

Development:
- Monitor schema changes in development environments
- Validate migration scripts
- Ensure compliance with schema standards

Governance:
- Track ownership and responsibility
- Monitor data lineage
- Maintain audit trails

Releases:
- Compare development and production schemas
- Validate database migrations
- Document schema changes for releases
Set up automated schema monitoring using cron:
```bash
# Crawl schema daily at 2 AM
0 2 * * * cd /path/to/postgres-schema-crawler && python src/schema_crawler.py crawl

# Compare with previous day
0 3 * * * cd /path/to/postgres-schema-crawler && python src/schema_crawler.py diff-latest --output daily_changes.md
```
Integrate my schema validation into your CI/CD pipeline:
```yaml
# Example GitHub Actions workflow step
- name: Validate Schema Changes
  run: |
    python src/schema_crawler.py crawl
    python src/schema_crawler.py diff-latest --output schema_changes.md
    # Fail if breaking changes detected
    if grep -q "removed\|modified" schema_changes.md; then
      echo "Breaking schema changes detected!"
      exit 1
    fi
```
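The grep check above is deliberately crude: it matches those words anywhere in the report. If your report contains the summary line shown earlier ("Added: N | Removed: N | Modified: N"), a slightly more robust gate can parse the counts instead. A sketch, assuming that summary format:

```python
# Hypothetical CI helper; assumes the "Added: N | Removed: N | Modified: N"
# summary line appears in the generated diff report.
import re
import sys

text = open("schema_changes.md").read()
m = re.search(r"Added:\s*(\d+)\s*\|\s*Removed:\s*(\d+)\s*\|\s*Modified:\s*(\d+)", text)
if m is None:
    sys.exit("No summary line found in schema_changes.md")

added, removed, modified = map(int, m.groups())
if removed or modified:
    print(f"Breaking schema changes detected: {removed} removed, {modified} modified")
    sys.exit(1)
print(f"Schema check passed ({added} additive changes)")
```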
- Store database credentials securely (use environment variables)
- Limit database user permissions to read-only access
- Regularly rotate database passwords
- Use SSL connections for production databases
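One way to follow the first recommendation is to keep placeholders in `config.yaml` and let environment variables override them at runtime. A sketch of that pattern (the override logic is my suggestion, not something the tool ships; `PGHOST`/`PGUSER`/`PGPASSWORD` mirror the standard PostgreSQL client variables):

```python
import os
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

db = cfg["database"]
# Standard PostgreSQL client variables take precedence over the file,
# so real credentials never need to be committed in config.yaml
db["host"] = os.environ.get("PGHOST", db["host"])
db["user"] = os.environ.get("PGUSER", db["user"])
db["password"] = os.environ.get("PGPASSWORD", db["password"])

print("Connecting as", db["user"], "to", db["host"])
```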
I welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
If you need help with my tool:
- Check the documentation
- Search existing issues
- Create a new issue with detailed information
- Support for other database systems (MySQL, SQL Server)
- Advanced visualization and reporting
- Integration with dbt and other data tools
- API endpoints for programmatic access
- Advanced change impact analysis
- Schema validation rules engine