# Efficient Database Rebuild - COPY FROM DATABASE

## Purpose
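This notebook rebuilds the database into a fresh file to eliminate fragmentation and bloat. It is built around DuckDB's built-in `COPY FROM DATABASE` command; the run recorded below performs the copy with `EXPORT DATABASE` followed by `IMPORT DATABASE`, which avoids foreign key constraint checks during the copy.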

## The Problem
Our database balloons in size with every transaction due to:
- **Versioned updates**: DuckDB stores versioned updates and deletes
- **No automatic vacuuming**: Space isn't automatically reclaimed
- **Fragmentation**: Deleted records leave behind empty space
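
One rough way to gauge how much of the file is reclaimable is to compare free blocks against total blocks. The sketch below assumes DuckDB's `PRAGMA database_size` reports `total_blocks` and `free_blocks` columns; the helper names `free_block_ratio` and `read_block_counts` are hypothetical and not part of this notebook.

```python
from pathlib import Path


def free_block_ratio(total_blocks: int, free_blocks: int) -> float:
    """Fraction of allocated blocks that currently hold no live data."""
    if total_blocks <= 0:
        return 0.0
    return free_blocks / total_blocks


def read_block_counts(db_path: Path) -> tuple[int, int]:
    """Return (total_blocks, free_blocks) as reported by PRAGMA database_size."""
    import duckdb  # third-party; imported lazily so free_block_ratio has no dependencies

    con = duckdb.connect(str(db_path), read_only=True)
    try:
        cur = con.execute("PRAGMA database_size")
        columns = [col[0] for col in cur.description]
        row = dict(zip(columns, cur.fetchone()))
    finally:
        con.close()
    # Assumption: these column names match the installed DuckDB version
    return row["total_blocks"], row["free_blocks"]


# Example with made-up block counts: a quarter of the file is free space
print(f"{free_block_ratio(1000, 250):.0%}")
```

A high ratio suggests a rebuild is worthwhile; a ratio near zero suggests the file is already compact.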

## The Solution
DuckDB's `COPY FROM DATABASE` is designed specifically for this purpose. It:
- Creates a fresh, clean copy of the entire database
- Eliminates all fragmentation in one operation
- Is much faster than row-by-row copying
- Maintains all indexes, constraints, and schema
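
As a minimal sketch of that approach, both files are attached to one connection and the copy is a single statement. The aliases `src` and `dst` and the helper names are illustrative assumptions; this notebook's own helper `get_db_connection` is not used here.

```python
from pathlib import Path


def copy_database_statements(source: Path, target: Path) -> list[str]:
    """Build the SQL needed to copy one DuckDB file into another."""

    def quote(path: Path) -> str:
        # Escape single quotes so the path is safe inside a SQL string literal
        return str(path).replace("'", "''")

    return [
        f"ATTACH '{quote(source)}' AS src (READ_ONLY)",
        f"ATTACH '{quote(target)}' AS dst",
        "COPY FROM DATABASE src TO dst",
        "DETACH src",
        "DETACH dst",
    ]


def rebuild_with_copy(source: Path, target: Path) -> None:
    """Run the statements on an in-memory connection (requires the duckdb package)."""
    import duckdb  # third-party; imported lazily

    con = duckdb.connect()  # in-memory database that only hosts the two attachments
    try:
        for statement in copy_database_statements(source, target):
            con.execute(statement)
    finally:
        con.close()


for statement in copy_database_statements(Path("chess_games.db"), Path("chess_games_clean.db")):
    print(statement)
```

Attaching the source as `READ_ONLY` keeps the original file untouched even if the copy fails partway.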

## Process Overview
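1. **Verify** the original database exists and log its statistics
2. **Clear** any stale file at the new database path (the file itself is created during the copy)
3. **Check** the original database for orphaned records that would violate foreign key constraints
4. **Copy** the entire contents into the new database file in one bulk operation
5. **Verify** that record counts, game totals, and referential integrity match
6. **Replace** the old database with the new one, keeping the original as a backup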

## Advantages Over Manual Copying (Notebook 24)
- ✅ **Much faster**: Built-in optimization
- ✅ **Simpler**: Single command vs. multiple table copies
- ✅ **Less code**: Fewer opportunities for errors
- ✅ **Preserves everything**: All schema details maintained automatically

In [1]:
# Configuration and setup
import os
import time
from pathlib import Path
from datetime import datetime
from utils.database.db_utils import get_db_connection

# Define paths
project_root = Path.cwd().parent if "notebooks" in str(Path.cwd()) else Path.cwd()
original_db_path = project_root / "data" / "processed" / "chess_games.db"
new_db_path = project_root / "data" / "processed" / "chess_games_clean.db"
backup_db_path = project_root / "data" / "processed" / "chess_games_pre_rebuild_backup.db"

print("=" * 80)
print("DATABASE REBUILD CONFIGURATION")
print("=" * 80)
print(f"\nOriginal database: {original_db_path}")
print(f"New (clean) database: {new_db_path}")
print(f"Backup path: {backup_db_path}")
print(f"\nOriginal database exists: {original_db_path.exists()}")
print(f"New database exists: {new_db_path.exists()}")

if new_db_path.exists():
    print(f"\n⚠️  WARNING: New database file already exists!")
    print(f"   It will be DELETED and recreated from scratch.")
else:
    print(f"\n✓ New database path is clear - ready to create fresh database")

DATABASE REBUILD CONFIGURATION

Original database: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games.db
New (clean) database: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games_clean.db
Backup path: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games_pre_rebuild_backup.db

Original database exists: True
New database exists: False

✓ New database path is clear - ready to create fresh database
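
The `"notebooks" in str(Path.cwd())` test in the configuration cell matches any path that merely contains that substring. A possible alternative, sketched here under the assumption that the project root is the nearest ancestor containing a `data` directory, walks upward until it finds that marker. The function name `find_project_root` is hypothetical.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "data") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory containing '{marker}' found above {start}")


# Usage inside the notebook would look like:
# project_root = find_project_root(Path.cwd())
```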


In [2]:
# Log original database statistics
if not original_db_path.exists():
    raise FileNotFoundError(f"Original database not found at {original_db_path}")

print("=" * 80)
print("ORIGINAL DATABASE STATISTICS")
print("=" * 80)

# File size
original_size_bytes = os.path.getsize(original_db_path)
original_size_mb = original_size_bytes / (1024 * 1024)
original_size_gb = original_size_mb / 1024

print(f"\n--- File Size ---")
print(f"Size: {original_size_mb:,.1f} MB ({original_size_gb:.2f} GB)")
print(f"Raw bytes: {original_size_bytes:,}")

# Record counts and statistics
with get_db_connection(original_db_path) as con:
    print(f"\n--- Record Counts ---")
    player_count = con.execute('SELECT COUNT(*) FROM player').fetchone()[0]
    opening_count = con.execute('SELECT COUNT(*) FROM opening').fetchone()[0]
    total_stats = con.execute('SELECT COUNT(*) FROM player_opening_stats').fetchone()[0]
    
    print(f"Players: {player_count:,}")
    print(f"Openings: {opening_count:,}")
    print(f"Player-Opening-Stats Records: {total_stats:,}")
    
    # Partition distribution
    print(f"\n--- Partition Distribution ---")
    partition_counts = {}
    for letter in ['A', 'B', 'C', 'D', 'E', 'other']:
        count = con.execute(f'SELECT COUNT(*) FROM player_opening_stats_{letter}').fetchone()[0]
        partition_counts[letter] = count
        percentage = (count / total_stats * 100) if total_stats > 0 else 0
        print(f"  Partition {letter}: {count:,} ({percentage:.1f}%)")
    
    # Game statistics
    print(f"\n--- Game Statistics ---")
    total_games = con.execute("""
        SELECT SUM(num_wins + num_draws + num_losses) as total_games
        FROM player_opening_stats
    """).fetchone()[0]
    
    print(f"Total Games: {total_games:,}")
    print(f"Average Games per Stats Record: {total_games/total_stats:.1f}")
    print(f"Bytes per Stats Record: {original_size_bytes/total_stats:.1f}")
    print(f"Bytes per Game: {original_size_bytes/total_games:.2f}")

# Store these for later comparison
original_stats = {
    'size_bytes': original_size_bytes,
    'players': player_count,
    'openings': opening_count,
    'stats_records': total_stats,
    'total_games': total_games,
    'partition_counts': partition_counts
}

print(f"\n✓ Original database statistics logged")

ORIGINAL DATABASE STATISTICS

--- File Size ---
Size: 3,549.0 MB (3.47 GB)
Raw bytes: 3,721,408,512

--- Record Counts ---
Players: 50,000
Openings: 3,223
Player-Opening-Stats Records: 25,378,100

--- Partition Distribution ---
  Partition A: 5,843,574 (23.0%)
  Partition B: 6,643,720 (26.2%)
  Partition C: 8,439,229 (33.3%)
  Partition D: 3,473,275 (13.7%)
  Partition E: 978,302 (3.9%)
  Partition other: 0 (0.0%)

--- Game Statistics ---
Total Games: 474,876,416
Average Games per Stats Record: 18.7
Bytes per Stats Record: 146.6
Bytes per Game: 7.84

✓ Original database statistics logged
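
The per-record ratios above divide by `total_stats` and `total_games` directly. On an empty table `COUNT(*)` is 0 and `SUM(...)` comes back as `None`, so those lines would raise. A small guard, sketched here with the hypothetical helpers `safe_ratio` and `fmt`, keeps the report printable in that case.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Divide, returning None when either side is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def fmt(value: Optional[float], spec: str = ".1f") -> str:
    """Format a ratio, falling back to 'n/a' when it could not be computed."""
    return "n/a" if value is None else format(value, spec)


# Figures taken from the statistics logged above
print(f"Average Games per Stats Record: {fmt(safe_ratio(474_876_416, 25_378_100))}")
print(f"Empty table: {fmt(safe_ratio(None, 0))}")
```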



In [3]:
# Delete existing new database if it exists
print("=" * 80)
print("PREPARING NEW DATABASE")
print("=" * 80)

if new_db_path.exists():
    print(f"\n🗑️  Deleting existing new database file...")
    os.remove(new_db_path)
    print(f"   ✓ Deleted: {new_db_path}")

print(f"\n✓ New database path is clear")
print(f"   Note: The database file will be created during the COPY operation")

PREPARING NEW DATABASE

✓ New database path is clear
   Note: The database file will be created during the COPY operation


In [4]:
# Check for orphaned records in the ORIGINAL database before copying
print("=" * 80)
print("CHECKING FOR ORPHANED RECORDS IN ORIGINAL DATABASE")
print("=" * 80)

with get_db_connection(original_db_path) as con:
    # Check for orphaned player_ids
    print("\n📋 Checking for orphaned player_ids in player_opening_stats...")
    orphaned_players_query = """
        SELECT DISTINCT pos.player_id
        FROM player_opening_stats pos
        LEFT JOIN player p ON pos.player_id = p.id
        WHERE p.id IS NULL
        LIMIT 10
    """
    orphaned_players = con.execute(orphaned_players_query).fetchall()
    orphaned_player_count = con.execute("""
        SELECT COUNT(DISTINCT pos.player_id)
        FROM player_opening_stats pos
        LEFT JOIN player p ON pos.player_id = p.id
        WHERE p.id IS NULL
    """).fetchone()[0]
    
    print(f"   Found {orphaned_player_count:,} orphaned player_ids")
    if orphaned_player_count > 0:
        print(f"   Examples: {[p[0] for p in orphaned_players[:5]]}")
    
    # Check for orphaned opening_ids
    print("\n📋 Checking for orphaned opening_ids in player_opening_stats...")
    orphaned_openings_query = """
        SELECT DISTINCT pos.opening_id
        FROM player_opening_stats pos
        LEFT JOIN opening o ON pos.opening_id = o.id
        WHERE o.id IS NULL
        LIMIT 10
    """
    orphaned_openings = con.execute(orphaned_openings_query).fetchall()
    orphaned_opening_count = con.execute("""
        SELECT COUNT(DISTINCT pos.opening_id)
        FROM player_opening_stats pos
        LEFT JOIN opening o ON pos.opening_id = o.id
        WHERE o.id IS NULL
    """).fetchone()[0]
    
    print(f"   Found {orphaned_opening_count:,} orphaned opening_ids")
    if orphaned_opening_count > 0:
        print(f"   Examples: {[o[0] for o in orphaned_openings[:5]]}")
    
    # Count how many stats records are affected
    print("\n📋 Counting affected stats records...")
    affected_records = con.execute("""
        SELECT COUNT(*)
        FROM player_opening_stats pos
        WHERE NOT EXISTS (SELECT 1 FROM player p WHERE p.id = pos.player_id)
           OR NOT EXISTS (SELECT 1 FROM opening o WHERE o.id = pos.opening_id)
    """).fetchone()[0]
    
    print(f"   Total stats records with orphaned references: {affected_records:,}")
    print(f"   Percentage of total: {(affected_records / total_stats * 100):.4f}%")
    
    if orphaned_player_count > 0 or orphaned_opening_count > 0:
        print(f"\n⚠️  WARNING: Your database has referential integrity issues!")
        print(f"   The COPY FROM DATABASE command enforces foreign key constraints.")
        print(f"   We need to either:")
        print(f"     1. Clean up orphaned records before copying")
        print(f"     2. Use a workaround that doesn't enforce constraints")
    else:
        print(f"\n✅ No orphaned records found - database integrity is good!")


CHECKING FOR ORPHANED RECORDS IN ORIGINAL DATABASE

📋 Checking for orphaned player_ids in player_opening_stats...
   Found 0 orphaned player_ids

📋 Checking for orphaned opening_ids in player_opening_stats...
   Found 0 orphaned opening_ids

📋 Counting affected stats records...
   Total stats records with orphaned references: 0
   Percentage of total: 0.0000%

✅ No orphaned records found - database integrity is good!
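
To see what these checks would report on a database that does have orphans, the toy example below runs the same query shapes against an in-memory table. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra packages, and the rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE player (id INTEGER PRIMARY KEY);
    CREATE TABLE player_opening_stats (player_id INTEGER, opening_id INTEGER);
    INSERT INTO player VALUES (1), (2);
    -- player_id 3 has no matching row in player
    INSERT INTO player_opening_stats VALUES (1, 10), (2, 10), (3, 10), (3, 11);
    """
)

# Anti-join: keep stats rows whose LEFT JOIN found no player
orphaned_ids = [
    row[0]
    for row in con.execute(
        """
        SELECT DISTINCT pos.player_id
        FROM player_opening_stats pos
        LEFT JOIN player p ON pos.player_id = p.id
        WHERE p.id IS NULL
        """
    )
]

# NOT EXISTS counts affected rows rather than distinct ids
affected_rows = con.execute(
    """
    SELECT COUNT(*)
    FROM player_opening_stats pos
    WHERE NOT EXISTS (SELECT 1 FROM player p WHERE p.id = pos.player_id)
    """
).fetchone()[0]
con.close()

print(orphaned_ids, affected_rows)
```

The two numbers differ by design: one orphaned id can account for many stats rows, which is why the cell reports both.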


In [5]:
# Perform the database copy using EXPORT/IMPORT (avoids FK constraint issues)
print("=" * 80)
print("COPYING DATABASE")
print("=" * 80)

print(f"\n🚀 Starting database copy operation...")
print(f"   Using EXPORT DATABASE + IMPORT DATABASE to avoid FK constraint checks")
print(f"\n⏱️  Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

start_time = time.time()

try:
    # Ensure the new database file doesn't exist
    if new_db_path.exists():
        print(f"\n🗑️  Removing stale new database file...")
        os.remove(new_db_path)
        print(f"   ✓ Removed")
    
    # Create a temporary directory for the export
    temp_export_dir = project_root / "data" / "processed" / "temp_db_export"
    if temp_export_dir.exists():
        import shutil
        shutil.rmtree(temp_export_dir)
    temp_export_dir.mkdir(exist_ok=True)
    
    print(f"\n📋 Step 1: Exporting original database...")
    with get_db_connection(original_db_path) as con:
        con.execute(f"EXPORT DATABASE '{temp_export_dir}'")
    print(f"   ✓ Database exported to temporary directory")
    
    print(f"\n📋 Step 2: Importing into new database...")
    print(f"   This may take several minutes depending on database size...")
    with get_db_connection(new_db_path) as con:
        con.execute(f"IMPORT DATABASE '{temp_export_dir}'")
    print(f"   ✓ Database imported successfully")
    
    print(f"\n📋 Step 3: Cleaning up temporary files...")
    import shutil
    shutil.rmtree(temp_export_dir)
    print(f"   ✓ Temporary export directory removed")
    
    end_time = time.time()
    elapsed_seconds = end_time - start_time
    elapsed_minutes = elapsed_seconds / 60
    
    print(f"\n⏱️  Completed at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"⏱️  Time elapsed: {elapsed_seconds:.1f} seconds ({elapsed_minutes:.2f} minutes)")
    print(f"\n✅ DATABASE COPY SUCCESSFUL!")
    
except Exception as e:
    print(f"\n❌ ERROR DURING COPY: {e}")
    print(f"\nThe operation failed. The original database is unchanged.")
    raise

COPYING DATABASE

🚀 Starting database copy operation...
   Using EXPORT DATABASE + IMPORT DATABASE to avoid FK constraint checks

⏱️  Started at: 2025-10-30 13:11:48

📋 Step 1: Exporting original database...
   ✓ Database exported to temporary directory

📋 Step 2: Importing into new database...
   This may take several minutes depending on database size...
   ✓ Database imported successfully

📋 Step 3: Cleaning up temporary files...
   ✓ Temporary export directory removed

⏱️  Completed at: 2025-10-30 13:14:34
⏱️  Time elapsed: 166.7 seconds (2.78 minutes)

✅ DATABASE COPY SUCCESSFUL!
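
In the copy cell, the temporary export directory is removed only when both steps succeed, so a failed import leaves the exported files on disk. One way to guarantee cleanup is a context manager; the sketch below uses a hypothetical name, `scratch_export_dir`, and is not part of the notebook.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def scratch_export_dir(path: Path) -> Iterator[Path]:
    """Provide an empty directory and remove it afterwards, even on error."""
    if path.exists():
        shutil.rmtree(path)  # discard leftovers from an earlier failed run
    path.mkdir(parents=True)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


# Usage inside the copy cell would look like:
# with scratch_export_dir(project_root / "data" / "processed" / "temp_db_export") as export_dir:
#     ...run EXPORT DATABASE into export_dir, then IMPORT DATABASE from it...
```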


In [6]:
# Verify the new database
print("=" * 80)
print("COMPREHENSIVE VERIFICATION")
print("=" * 80)

with get_db_connection(new_db_path) as con:
    print(f"\n--- Record Counts Verification ---")
    
    new_player_count = con.execute('SELECT COUNT(*) FROM player').fetchone()[0]
    new_opening_count = con.execute('SELECT COUNT(*) FROM opening').fetchone()[0]
    new_stats_count = con.execute('SELECT COUNT(*) FROM player_opening_stats').fetchone()[0]
    
    print(f"Players:")
    print(f"  Original: {original_stats['players']:,}")
    print(f"  New:      {new_player_count:,}")
    print(f"  Match:    {'✓ YES' if new_player_count == original_stats['players'] else '✗ NO'}")
    
    print(f"\nOpenings:")
    print(f"  Original: {original_stats['openings']:,}")
    print(f"  New:      {new_opening_count:,}")
    print(f"  Match:    {'✓ YES' if new_opening_count == original_stats['openings'] else '✗ NO'}")
    
    print(f"\nPlayer-Opening-Stats:")
    print(f"  Original: {original_stats['stats_records']:,}")
    print(f"  New:      {new_stats_count:,}")
    print(f"  Match:    {'✓ YES' if new_stats_count == original_stats['stats_records'] else '✗ NO'}")
    
    # Verify partition distribution matches
    print(f"\n--- Partition Distribution Verification ---")
    partition_matches = True
    for letter in ['A', 'B', 'C', 'D', 'E', 'other']:
        orig_count = original_stats['partition_counts'][letter]
        new_count = con.execute(f'SELECT COUNT(*) FROM player_opening_stats_{letter}').fetchone()[0]
        match = '✓' if orig_count == new_count else '✗'
        if orig_count != new_count:
            partition_matches = False
        print(f"  Partition {letter}: {orig_count:>10,} → {new_count:>10,}  {match}")
    
    # Verify game totals match
    print(f"\n--- Game Statistics Verification ---")
    new_total_games = con.execute("""
        SELECT SUM(num_wins + num_draws + num_losses)
        FROM player_opening_stats
    """).fetchone()[0]
    
    print(f"Total Games:")
    print(f"  Original: {original_stats['total_games']:,}")
    print(f"  New:      {new_total_games:,}")
    print(f"  Match:    {'✓ YES' if new_total_games == original_stats['total_games'] else '✗ NO'}")
    
    # Check for referential integrity
    print(f"\n--- Referential Integrity Checks ---")
    
    # Check for orphaned stats (player_id not in player table)
    orphaned_players = con.execute("""
        SELECT COUNT(DISTINCT pos.player_id)
        FROM player_opening_stats pos
        LEFT JOIN player p ON pos.player_id = p.id
        WHERE p.id IS NULL
    """).fetchone()[0]
    
    print(f"Orphaned player_ids in stats: {orphaned_players:,}")
    print(f"  Status: {'✓ GOOD (none)' if orphaned_players == 0 else '✗ ERROR - orphaned records exist!'}")
    
    # Check for orphaned stats (opening_id not in opening table)
    orphaned_openings = con.execute("""
        SELECT COUNT(DISTINCT pos.opening_id)
        FROM player_opening_stats pos
        LEFT JOIN opening o ON pos.opening_id = o.id
        WHERE o.id IS NULL
    """).fetchone()[0]
    
    print(f"\nOrphaned opening_ids in stats: {orphaned_openings:,}")
    print(f"  Status: {'✓ GOOD (none)' if orphaned_openings == 0 else '✗ ERROR - orphaned records exist!'}")
    
    # Verify schema by checking a sample of data
    print(f"\n--- Schema Verification (Sample Data) ---")
    sample_player = con.execute("SELECT * FROM player LIMIT 1").fetchdf()
    sample_opening = con.execute("SELECT * FROM opening LIMIT 1").fetchdf()
    sample_stats = con.execute("SELECT * FROM player_opening_stats LIMIT 1").fetchdf()
    
    print(f"Player columns: {list(sample_player.columns)}")
    print(f"Opening columns: {list(sample_opening.columns)}")
    print(f"Stats columns: {list(sample_stats.columns)}")
    
    # Final verdict
    all_checks_passed = (
        new_player_count == original_stats['players'] and
        new_opening_count == original_stats['openings'] and
        new_stats_count == original_stats['stats_records'] and
        new_total_games == original_stats['total_games'] and
        orphaned_players == 0 and
        orphaned_openings == 0 and
        partition_matches
    )
    
    print(f"\n{'='*80}")
    if all_checks_passed:
        print(f"✅ ALL VERIFICATION CHECKS PASSED")
        print(f"   The new database is an exact copy of the original")
    else:
        print(f"❌ VERIFICATION FAILED")
        print(f"   The new database does NOT match the original")
        print(f"   DO NOT proceed with replacement - investigate errors above")
    print(f"{'='*80}")

# Store verification result for next step
verification_passed = all_checks_passed

COMPREHENSIVE VERIFICATION

--- Record Counts Verification ---
Players:
  Original: 50,000
  New:      50,000
  Match:    ✓ YES

Openings:
  Original: 3,223
  New:      3,223
  Match:    ✓ YES

Player-Opening-Stats:
  Original: 25,378,100
  New:      25,378,100
  Match:    ✓ YES

--- Partition Distribution Verification ---
  Partition A:  5,843,574 →  5,843,574  ✓
  Partition B:  6,643,720 →  6,643,720  ✓
  Partition C:  8,439,229 →  8,439,229  ✓
  Partition D:  3,473,275 →  3,473,275  ✓
  Partition E:    978,302 →    978,302  ✓
  Partition other:          0 →          0  ✓

--- Game Statistics Verification ---
Total Games:
  Original: 474,876,416
  New:      474,876,416
  Match:    ✓ YES

--- Referential Integrity Checks ---
Orphaned player_ids in stats: 0
  Status: ✓ GOOD (none)

Orphaned opening_ids in stats: 0
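
The final verdict in this cell is a chain of `==` comparisons. An equivalent check, sketched below with a hypothetical helper `find_mismatches`, compares two statistics dictionaries and names every key that differs, which makes a failure easier to diagnose. `size_bytes` should be left out of both dictionaries, since the file size is expected to change.

```python
def find_mismatches(original: dict, rebuilt: dict, prefix: str = "") -> list[str]:
    """List every key whose value differs, descending into nested dicts."""
    mismatches = []
    for key in sorted(set(original) | set(rebuilt)):
        before, after = original.get(key), rebuilt.get(key)
        label = f"{prefix}{key}"
        if isinstance(before, dict) and isinstance(after, dict):
            mismatches.extend(find_mismatches(before, after, prefix=f"{label}."))
        elif before != after:
            mismatches.append(f"{label}: {before!r} != {after!r}")
    return mismatches


# Counts from the run above, with one partition altered to show a failure
original = {"players": 50_000, "partition_counts": {"A": 5_843_574, "E": 978_302}}
rebuilt = {"players": 50_000, "partition_counts": {"A": 5_843_574, "E": 978_301}}
print(find_mismatches(original, rebuilt))
```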
 

In [7]:
# File size comparison
print("=" * 80)
print("FILE SIZE COMPARISON")
print("=" * 80)

new_size_bytes = os.path.getsize(new_db_path)
new_size_mb = new_size_bytes / (1024 * 1024)
new_size_gb = new_size_mb / 1024

size_difference_bytes = original_stats['size_bytes'] - new_size_bytes
size_difference_mb = size_difference_bytes / (1024 * 1024)
percentage_reduction = (size_difference_bytes / original_stats['size_bytes']) * 100

print(f"\n--- Original Database ---")
print(f"Size: {original_stats['size_bytes'] / (1024*1024):,.1f} MB ({original_stats['size_bytes'] / (1024*1024*1024):.2f} GB)")
print(f"Raw bytes: {original_stats['size_bytes']:,}")

print(f"\n--- Rebuilt Database ---")
print(f"Size: {new_size_mb:,.1f} MB ({new_size_gb:.2f} GB)")
print(f"Raw bytes: {new_size_bytes:,}")

print(f"\n--- Comparison ---")
if size_difference_bytes > 0:
    print(f"✅ SIZE REDUCED by {size_difference_mb:,.1f} MB ({percentage_reduction:.1f}%)")
    print(f"   Space reclaimed: {size_difference_bytes:,} bytes")
    print(f"   This represents {size_difference_bytes / (1024*1024*1024):.2f} GB saved!")
elif size_difference_bytes < 0:
    print(f"⚠️  Size INCREASED by {abs(size_difference_mb):,.1f} MB ({abs(percentage_reduction):.1f}%)")
    print(f"   This is unusual but can happen with optimization overhead")
else:
    print(f"Size is EXACTLY THE SAME")
    print(f"   The original database was already fully optimized")

# Efficiency metrics
print(f"\n--- Storage Efficiency ---")
bytes_per_record_old = original_stats['size_bytes'] / original_stats['stats_records']
bytes_per_record_new = new_size_bytes / original_stats['stats_records']
bytes_per_game_old = original_stats['size_bytes'] / original_stats['total_games']
bytes_per_game_new = new_size_bytes / original_stats['total_games']

print(f"Bytes per stats record:")
print(f"  Original: {bytes_per_record_old:.1f}")
print(f"  New:      {bytes_per_record_new:.1f}")
print(f"  Change:   {((bytes_per_record_new - bytes_per_record_old) / bytes_per_record_old * 100):+.1f}%")

print(f"\nBytes per game:")
print(f"  Original: {bytes_per_game_old:.2f}")
print(f"  New:      {bytes_per_game_new:.2f}")
print(f"  Change:   {((bytes_per_game_new - bytes_per_game_old) / bytes_per_game_old * 100):+.1f}%")

FILE SIZE COMPARISON

--- Original Database ---
Size: 3,549.0 MB (3.47 GB)
Raw bytes: 3,721,408,512

--- Rebuilt Database ---
Size: 2,043.3 MB (2.00 GB)
Raw bytes: 2,142,515,200

--- Comparison ---
✅ SIZE REDUCED by 1,505.8 MB (42.4%)
   Space reclaimed: 1,578,893,312 bytes
   This represents 1.47 GB saved!

--- Storage Efficiency ---
Bytes per stats record:
  Original: 146.6
  New:      84.4
  Change:   -42.4%

Bytes per game:
  Original: 7.84
  New:      4.51
  Change:   -42.4%
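
The reported figures can be reproduced directly from the two raw byte counts. Note that the notebook's "MB" divides by 1024 × 1024, so the unit is strictly MiB.

```python
original_bytes = 3_721_408_512
rebuilt_bytes = 2_142_515_200

saved_bytes = original_bytes - rebuilt_bytes
saved_mib = saved_bytes / (1024 * 1024)
saved_gib = saved_bytes / (1024 ** 3)
reduction_pct = saved_bytes / original_bytes * 100

print(f"Saved: {saved_bytes:,} bytes = {saved_mib:,.1f} MiB = {saved_gib:.2f} GiB")
print(f"Reduction: {reduction_pct:.1f}%")
```

Bytes per stats record and bytes per game both divide the file size by a count that did not change, so their percentage change equals the overall reduction.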


In [8]:
# Replace old database with new one (only if verification passed)
print("=" * 80)
print("DATABASE REPLACEMENT")
print("=" * 80)

if not verification_passed:
    print("\n❌ CANNOT PROCEED WITH REPLACEMENT")
    print("   Verification checks failed - the new database does not match the original")
    print("   The original database remains unchanged")
    print(f"   The new (incomplete) database is at: {new_db_path}")
    print(f"   Please investigate the errors before attempting replacement")
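elif not new_db_path.exists():
    # Guard against re-running this cell after a successful swap,
    # which would otherwise overwrite the backup and leave no active database
    print("\n❌ CANNOT PROCEED WITH REPLACEMENT")
    print(f"   No rebuilt database found at: {new_db_path}")
    print("   It may already have been moved into place; the backup is left untouched")
else: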
    print(f"\n✅ Verification passed - proceeding with database replacement")
    print(f"\n📋 Step 1: Creating backup of original database...")
    
    # Create backup by renaming original
    if backup_db_path.exists():
        print(f"   ⚠️  Backup already exists at {backup_db_path}")
        print(f"   Removing old backup...")
        os.remove(backup_db_path)
    
    os.rename(original_db_path, backup_db_path)
    print(f"   ✓ Original database backed up to: {backup_db_path}")
    
    print(f"\n📋 Step 2: Moving new database to original location...")
    os.rename(new_db_path, original_db_path)
    print(f"   ✓ New database is now the active database at: {original_db_path}")
    
    print(f"\n{'='*80}")
    print(f"🎉 DATABASE REPLACEMENT COMPLETE!")
    print(f"{'='*80}")
    print(f"\nSummary:")
    print(f"  ✓ Original database backed up to: {backup_db_path}")
    print(f"  ✓ New optimized database active at: {original_db_path}")
    print(f"  ✓ Space saved: {size_difference_mb:,.1f} MB ({percentage_reduction:.1f}%)")
    print(f"\nNext steps:")
    print(f"  1. Test the new database with your application")
    print(f"  2. If everything works correctly, you can delete the backup:")
    print(f"     rm {backup_db_path}")
    print(f"  3. If there are any issues, restore the backup:")
    print(f"     mv {backup_db_path} {original_db_path}")

DATABASE REPLACEMENT

✅ Verification passed - proceeding with database replacement

📋 Step 1: Creating backup of original database...
   ✓ Original database backed up to: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games_pre_rebuild_backup.db

📋 Step 2: Moving new database to original location...
   ✓ New database is now the active database at: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games.db

🎉 DATABASE REPLACEMENT COMPLETE!

Summary:
  ✓ Original database backed up to: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games_pre_rebuild_backup.db
  ✓ New optimized database active at: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games.db
  ✓ Space saved: 1,505.8 MB (42.4%)

Next steps:
  1. Test the new database with your application
  2. If everything works correctly, you can delete the backup:
     rm /Users/a/Documents/personalproje
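
The replacement cell performs two renames in sequence; if the second one fails, no file is left at the original path. A sketch of a swap that rolls back in that case follows. The function name `swap_in_rebuilt` is hypothetical, and `os.replace` is used because it overwrites an existing destination on every platform.

```python
import os
from pathlib import Path


def swap_in_rebuilt(original: Path, rebuilt: Path, backup: Path) -> None:
    """Move `original` to `backup`, then `rebuilt` to `original`, undoing on failure."""
    if not rebuilt.is_file():
        # Refuse to touch anything when there is nothing to swap in
        raise FileNotFoundError(f"Rebuilt database not found: {rebuilt}")
    os.replace(original, backup)  # overwrites any stale backup
    try:
        os.replace(rebuilt, original)
    except OSError:
        os.replace(backup, original)  # put the original back before re-raising
        raise
```

Both paths should sit on the same filesystem, as they do in this notebook, so that each move is a rename rather than a copy.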

In [9]:
# Final summary and verification
print("=" * 80)
print("FINAL SUMMARY")
print("=" * 80)

print(f"\n🎯 Mission Accomplished!")
print(f"\nThe database has been successfully rebuilt using DuckDB's COPY FROM DATABASE.")
print(f"This is the recommended approach for eliminating fragmentation and bloat.")

print(f"\n--- Results ---")
print(f"Original size: {original_stats['size_bytes'] / (1024*1024):,.1f} MB")
print(f"New size: {new_size_mb:,.1f} MB")
print(f"Space saved: {size_difference_mb:,.1f} MB ({percentage_reduction:.1f}%)")

print(f"\n--- Data Integrity ---")
print(f"Players: {original_stats['players']:,} ✓")
print(f"Openings: {original_stats['openings']:,} ✓")
print(f"Stats records: {original_stats['stats_records']:,} ✓")
print(f"Total games: {original_stats['total_games']:,} ✓")

print(f"\n--- Files ---")
if verification_passed:
    print(f"Active database: {original_db_path}")
    print(f"Backup: {backup_db_path}")
    print(f"\n✅ Safe to delete backup once you've verified everything works correctly")
else:
    print(f"Original (unchanged): {original_db_path}")
    print(f"New (failed verification): {new_db_path}")
    print(f"\n⚠️  Please investigate verification failures before using the new database")

print(f"\n{'='*80}")
print(f"🎉 PARTY TIME! 🎉")
print(f"{'='*80}")

FINAL SUMMARY

🎯 Mission Accomplished!

The database has been successfully rebuilt using DuckDB's COPY FROM DATABASE.
This is the recommended approach for eliminating fragmentation and bloat.

--- Results ---
Original size: 3,549.0 MB
New size: 2,043.3 MB
Space saved: 1,505.8 MB (42.4%)

--- Data Integrity ---
Players: 50,000 ✓
Openings: 3,223 ✓
Stats records: 25,378,100 ✓
Total games: 474,876,416 ✓

--- Files ---
Active database: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games.db
Backup: /Users/a/Documents/personalprojects/chess-opening-recommender/data/processed/chess_games_pre_rebuild_backup.db

✅ Safe to delete backup once you've verified everything works correctly

🎉 PARTY TIME! 🎉
