# 🎯 Compute Optimal Restoration Parameters

**Goal**: For each damaged/undamaged image pair, find the color correction and sharpening parameters that make the restored image most similar to its undamaged counterpart.

**What we'll do**:
1. Load paired dataset (damaged + undamaged)
2. For each pair, test different restoration parameter combinations
3. Measure similarity using PSNR and SSIM
4. Record optimal parameters that give highest similarity
5. Create training dataset: [14 FFT features] → [optimal parameters]

**Output**: CSV with features + optimal restoration parameters for regression training

---

## 📦 Step 1: Import Libraries

In [1]:
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from skimage.metrics import structural_similarity as ssim
from skimage.metrics import peak_signal_noise_ratio as psnr

# Import our modules
import sys
sys.path.append('../')
from src.basics.optimized_restoration import restore_image_optimized
from src.ml.feature_extractor import extract_ml_features

print('✅ All libraries imported successfully!')

✅ All libraries imported successfully!


## 📂 Step 2: Load Paired Dataset

In [2]:
# Paths to damaged and undamaged images
damaged_dir = '../data/raw/AI_for_Art_Restoration_2/paired_dataset_art/damaged'
undamaged_dir = '../data/raw/AI_for_Art_Restoration_2/paired_dataset_art/undamaged'

# Get list of image pairs
IMAGE_EXTS = ('.png', '.jpg', '.jpeg')
damaged_files = sorted(f for f in os.listdir(damaged_dir) if f.endswith(IMAGE_EXTS))
undamaged_files = sorted(f for f in os.listdir(undamaged_dir) if f.endswith(IMAGE_EXTS))

# Find matching pairs (same filename in both folders); a set makes the lookup O(1)
undamaged_set = set(undamaged_files)
paired_files = [f for f in damaged_files if f in undamaged_set]

print(f'📊 Dataset Statistics:')
print('='*70)
print(f'Damaged images found: {len(damaged_files)}')
print(f'Undamaged images found: {len(undamaged_files)}')
print(f'Matching pairs: {len(paired_files)}')
print()
print(f'First 5 pairs: {paired_files[:5]}')

📊 Dataset Statistics:
Damaged images found: 114
Undamaged images found: 114
Matching pairs: 112

First 5 pairs: ['1.png', '10.png', '100.png', '101.jpg', '102.jpg']
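
Only 112 of the 114 files pair up by exact filename. The notebook does not say why. One possible cause is the same image saved with a different extension or letter case in the two folders. The sketch below, under that assumption, pairs files by lowercase stem instead. `pair_by_stem` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

IMAGE_EXTS = {'.png', '.jpg', '.jpeg'}

def pair_by_stem(damaged_names, undamaged_names):
    """Pair filenames whose stems match, ignoring extension and case.

    Hypothetical helper: returns a list of (damaged_name, undamaged_name)
    tuples, sorted by stem.
    """
    def index(names):
        # Map lowercase stem -> filename, keeping only image files
        return {
            Path(n).stem.lower(): n
            for n in names
            if Path(n).suffix.lower() in IMAGE_EXTS
        }

    damaged = index(damaged_names)
    undamaged = index(undamaged_names)
    common = sorted(damaged.keys() & undamaged.keys())
    return [(damaged[s], undamaged[s]) for s in common]

pairs = pair_by_stem(['1.png', '2.JPG', '3.png', 'notes.txt'],
                     ['1.png', '2.jpg', '4.png'])
print(pairs)  # [('1.png', '1.png'), ('2.JPG', '2.jpg')]
```

With this pairing, later cells would need to join `damaged_dir` and `undamaged_dir` with the two names separately, since they can differ.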


## 🔍 Step 3: Define Parameter Search Space

We'll test different combinations of restoration parameters to find what works best for each image.

In [3]:
# Parameter ranges to test
# We'll test a grid of parameters and find the best combination

COLOR_METHODS = ['white_balance', 'none']  # none means skip color correction
SHARPEN_SIGMA = [0.5, 1.0, 1.5, 2.0]  # Gaussian blur sigma
SHARPEN_STRENGTH = [0.5, 1.0, 1.5, 2.0, 2.5]  # Sharpening intensity

print('🎯 Parameter Search Space:')
print('='*70)
print(f'Color correction methods: {COLOR_METHODS}')
print(f'Sharpening sigma values: {SHARPEN_SIGMA}')
print(f'Sharpening strength values: {SHARPEN_STRENGTH}')
print()
total_combinations = len(COLOR_METHODS) * len(SHARPEN_SIGMA) * len(SHARPEN_STRENGTH)
print(f'Total parameter combinations to test: {total_combinations}')
print(f'Total evaluations: {len(paired_files) * total_combinations}')

🎯 Parameter Search Space:
Color correction methods: ['white_balance', 'none']
Sharpening sigma values: [0.5, 1.0, 1.5, 2.0]
Sharpening strength values: [0.5, 1.0, 1.5, 2.0, 2.5]

Total parameter combinations to test: 40
Total evaluations: 4480
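
The grid size is the product of the three list lengths: 2 × 4 × 5 = 40 combinations, and 40 × 112 pairs = 4480 evaluations. As a small sketch, the same grid can be enumerated with `itertools.product`, which yields the triples in the same order as three nested loops. `param_grid` is an illustrative name.

```python
from itertools import product

COLOR_METHODS = ['white_balance', 'none']
SHARPEN_SIGMA = [0.5, 1.0, 1.5, 2.0]
SHARPEN_STRENGTH = [0.5, 1.0, 1.5, 2.0, 2.5]

# Every (color_method, sigma, strength) triple, in nested-loop order
param_grid = list(product(COLOR_METHODS, SHARPEN_SIGMA, SHARPEN_STRENGTH))

print(len(param_grid))        # 40
print(param_grid[0])          # ('white_balance', 0.5, 0.5)
print(len(param_grid) * 112)  # 4480
```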


## 🧪 Step 4: Find Optimal Parameters for Each Image Pair

For each damaged image:
1. Try all parameter combinations
2. Apply restoration
3. Compare with ground truth (undamaged image)
4. Record the parameters that give the highest combined PSNR/SSIM score

In [4]:
def compute_similarity_score(restored, undamaged):
    """
    Compute combined similarity score using PSNR and SSIM.
    
    Args:
        restored: Restored image
        undamaged: Ground truth undamaged image
    
    Returns:
        score: Combined score (higher is better)
        psnr_val: PSNR value
        ssim_val: SSIM value
    """
    # Resize images to same size if needed
    if restored.shape != undamaged.shape:
        restored = cv2.resize(restored, (undamaged.shape[1], undamaged.shape[0]))
    
    # Calculate PSNR (higher is better, typically 20-50 dB)
    # data_range is set explicitly so PSNR and SSIM use the same 8-bit scale
    psnr_val = psnr(undamaged, restored, data_range=255)
    
    # Calculate SSIM (higher is better, 0-1 range)
    # For color images, compute SSIM per channel and average
    if len(restored.shape) == 3:
        ssim_val = ssim(undamaged, restored, channel_axis=2, data_range=255)
    else:
        ssim_val = ssim(undamaged, restored, data_range=255)
    
    # Combined score: normalize PSNR (divide by 50 dB, capped at 1) and weight both metrics equally.
    # The cap keeps the PSNR term in the 0-1 range even when PSNR exceeds 50 dB
    # or is infinite (identical images).
    combined_score = 0.5 * min(psnr_val / 50.0, 1.0) + 0.5 * ssim_val
    
    return combined_score, psnr_val, ssim_val

print('✅ Similarity function defined!')

✅ Similarity function defined!
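
For reference, PSNR for 8-bit images is 10 · log10(255² / MSE), where MSE is the mean squared pixel difference. The sketch below computes it by hand on a tiny synthetic image and then applies the same 50/50 weighting as `compute_similarity_score`. `psnr_uint8` is a hypothetical helper and the SSIM value of 0.8 is an assumed placeholder, not a measured result.

```python
import numpy as np

def psnr_uint8(reference, test):
    """PSNR in dB for two 8-bit images of the same shape (hypothetical helper)."""
    # Cast to float first so the subtraction cannot wrap around in uint8
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)

reference = np.full((4, 4), 100, dtype=np.uint8)
test = np.full((4, 4), 110, dtype=np.uint8)  # every pixel off by 10, so MSE = 100

psnr_val = psnr_uint8(reference, test)
print(round(psnr_val, 2))  # 28.13, i.e. 10 * log10(65025 / 100)

# Same weighting as compute_similarity_score, with an assumed SSIM of 0.8
combined = 0.5 * (psnr_val / 50.0) + 0.5 * 0.8
print(round(combined, 3))  # 0.681
```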


In [5]:
def find_optimal_parameters(damaged_path, undamaged_path):
    """
    Find optimal restoration parameters for a damaged/undamaged image pair.
    
    Args:
        damaged_path: Path to damaged image
        undamaged_path: Path to undamaged image
    
    Returns:
        best_params: Dictionary with optimal parameters
        best_score: Best similarity score achieved
        best_psnr: PSNR at best parameters
        best_ssim: SSIM at best parameters
    """
    # Load images
    damaged = cv2.imread(damaged_path)
    undamaged = cv2.imread(undamaged_path)
    
    if damaged is None or undamaged is None:
        return None, 0, 0, 0
    
    best_score = -1
    best_params = {}
    best_psnr = 0
    best_ssim = 0
    
    # Import once, outside the search loops, so an import failure is not swallowed below
    from src.basics.advanced_restoration import unsharp_mask

    # Test all parameter combinations
    for color_method in COLOR_METHODS:
        for sigma in SHARPEN_SIGMA:
            for strength in SHARPEN_STRENGTH:
                try:
                    # Apply restoration with these parameters
                    if color_method == 'none':
                        # Only sharpening, no color correction
                        restored = unsharp_mask(damaged.copy(), sigma=sigma, strength=strength)
                    else:
                        restored = restore_image_optimized(
                            damaged.copy(),
                            color_method=color_method,
                            sharpen_sigma=sigma,
                            sharpen_strength=strength
                        )
                    
                    # Compute similarity with ground truth
                    score, psnr_val, ssim_val = compute_similarity_score(restored, undamaged)
                    
                    # Update best parameters if this is better
                    if score > best_score:
                        best_score = score
                        best_psnr = psnr_val
                        best_ssim = ssim_val
                        best_params = {
                            'color_method': color_method,
                            'apply_color_correction': 1 if color_method != 'none' else 0,
                            'sharpen_sigma': sigma,
                            'sharpen_strength': strength
                        }
                
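                except Exception as e:
                    # Report the failure instead of silently skipping this combination
                    print(f'Skipping {color_method}, sigma={sigma}, strength={strength}: {e}')
                    continue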
    
    return best_params, best_score, best_psnr, best_ssim

print('✅ Parameter optimization function defined!')

✅ Parameter optimization function defined!
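
The project's `unsharp_mask` is not shown in this notebook. Unsharp masking in general blurs the image and adds back the difference: `sharpened = image + strength * (image - blurred)`, with `sigma` controlling the blur width. The NumPy-only sketch below illustrates that idea for a grayscale image. `unsharp_mask_sketch` and `gaussian_kernel_1d` are hypothetical names, and the project's function may differ (for example in color handling or kernel size).

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def unsharp_mask_sketch(image, sigma=1.0, strength=1.0):
    """Sharpen a 2-D uint8 grayscale image (illustrative, not the project's code)."""
    img = image.astype(np.float64)
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2

    # Separable Gaussian blur: filter rows, then columns, with reflected borders
    padded = np.pad(img, radius, mode='reflect')
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode='valid')
    blurred = np.apply_along_axis(np.convolve, 0, rows, kernel, mode='valid')

    # Add back the detail the blur removed, scaled by strength
    sharpened = img + strength * (img - blurred)
    return np.clip(np.rint(sharpened), 0, 255).astype(np.uint8)

# Vertical step edge: left half 100, right half 150
edge = np.full((16, 16), 100, dtype=np.uint8)
edge[:, 8:] = 150
out = unsharp_mask_sketch(edge, sigma=1.0, strength=1.0)
print(out[0, 6:10])  # pixels next to the edge overshoot: darker on the left, brighter on the right
```

Larger `strength` increases the overshoot at edges, which is why over-sharpening can lower PSNR and SSIM against the ground truth. With OpenCV, the blur step would typically be `cv2.GaussianBlur(image, (0, 0), sigma)`.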


In [6]:
# Process all image pairs and find optimal parameters
print('🔄 Processing image pairs to find optimal parameters...')
print('='*70)
print('This may take several minutes...')
print()

results = []

for filename in tqdm(paired_files[:20], desc='Optimizing parameters'):  # Start with 20 images for testing
    damaged_path = os.path.join(damaged_dir, filename)
    undamaged_path = os.path.join(undamaged_dir, filename)
    
    # Find optimal parameters
    best_params, score, psnr_val, ssim_val = find_optimal_parameters(damaged_path, undamaged_path)
    
    if best_params:
        # Extract features from damaged image
        try:
            features, feature_names = extract_ml_features(damaged_path)
            
            # Combine features and optimal parameters
            result = {
                'filename': filename,
                'similarity_score': score,
                'psnr': psnr_val,
                'ssim': ssim_val,
            }
            
            # Add features
            for name, value in zip(feature_names, features):
                result[name] = value
            
            # Add optimal parameters
            result.update(best_params)
            
            results.append(result)
        
        except Exception as e:
            print(f'Error processing {filename}: {e}')

print()
print(f'✅ Processed {len(results)} image pairs successfully!')

🔄 Processing image pairs to find optimal parameters...
This may take several minutes...

Optimizing parameters:   0%|          | 0/20 [02:18<?, ?it/s]

KeyboardInterrupt: 
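
The recorded run was interrupted before the first image finished, so `results` stayed empty. For long searches it can help to append each finished row to disk and skip files that are already recorded when the cell is rerun. The sketch below uses only the standard library. `load_done`, `process_with_checkpoint`, and the `compute_row` callback are hypothetical names, and the demo uses a stand-in function with placeholder values rather than the real search.

```python
import csv
import os
import tempfile

def load_done(checkpoint_path):
    """Return the set of filenames already recorded in the checkpoint CSV."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, newline='') as f:
        return {row['filename'] for row in csv.DictReader(f)}

def process_with_checkpoint(filenames, compute_row, checkpoint_path):
    """Call compute_row(filename) -> dict for each new file, appending as we go."""
    done = load_done(checkpoint_path)
    for name in filenames:
        if name in done:
            continue  # finished in an earlier run
        row = compute_row(name)
        write_header = not os.path.exists(checkpoint_path)
        with open(checkpoint_path, 'a', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=list(row.keys()))
            if write_header:
                writer.writeheader()
            writer.writerow(row)
        done.add(name)

# Demo with a stand-in for the real per-image search
calls = []

def fake_compute(name):
    calls.append(name)
    return {'filename': name, 'psnr': 0.0}  # placeholder values only

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, 'checkpoint.csv')
    process_with_checkpoint(['a.png', 'b.png'], fake_compute, ckpt)
    # A second run only computes the file that is new
    process_with_checkpoint(['a.png', 'b.png', 'c.png'], fake_compute, ckpt)
    recorded = load_done(ckpt)

print(calls)  # ['a.png', 'b.png', 'c.png']
```

This assumes every row has the same keys, which holds here as long as the feature extractor returns the same feature names for every image.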

## 📊 Step 5: Analyze Optimal Parameters

In [None]:
# Convert to DataFrame
df_optimal = pd.DataFrame(results)

print('📊 Optimal Parameters Dataset:')
print('='*70)
print(f'Total samples: {len(df_optimal)}')
print(f'Columns: {len(df_optimal.columns)}')
print()
print('Sample data:')
df_optimal.head()

In [None]:
# Statistical summary of optimal parameters
print('📈 Statistical Summary of Optimal Parameters:')
print('='*70)
print()
print('Similarity Metrics:')
print(df_optimal[['similarity_score', 'psnr', 'ssim']].describe())
print()
print('Optimal Restoration Parameters:')
print(df_optimal[['apply_color_correction', 'sharpen_sigma', 'sharpen_strength']].describe())

In [None]:
# Visualize parameter distributions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Similarity metrics
axes[0, 0].hist(df_optimal['similarity_score'], bins=20, color='#4ecdc4', edgecolor='black')
axes[0, 0].set_title('Combined Similarity Score', fontweight='bold')
axes[0, 0].set_xlabel('Score')
axes[0, 0].set_ylabel('Frequency')

axes[0, 1].hist(df_optimal['psnr'], bins=20, color='#ff6b6b', edgecolor='black')
axes[0, 1].set_title('PSNR (dB)', fontweight='bold')
axes[0, 1].set_xlabel('PSNR')
axes[0, 1].set_ylabel('Frequency')

axes[0, 2].hist(df_optimal['ssim'], bins=20, color='#95e1d3', edgecolor='black')
axes[0, 2].set_title('SSIM', fontweight='bold')
axes[0, 2].set_xlabel('SSIM')
axes[0, 2].set_ylabel('Frequency')

# Optimal parameters
axes[1, 0].hist(df_optimal['apply_color_correction'], bins=2, color='#ffeaa7', edgecolor='black')
axes[1, 0].set_title('Color Correction Usage', fontweight='bold')
axes[1, 0].set_xlabel('Apply (0=No, 1=Yes)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_xticks([0, 1])

axes[1, 1].hist(df_optimal['sharpen_sigma'], bins=len(SHARPEN_SIGMA), color='#dfe6e9', edgecolor='black')
axes[1, 1].set_title('Optimal Sharpen Sigma', fontweight='bold')
axes[1, 1].set_xlabel('Sigma')
axes[1, 1].set_ylabel('Frequency')

axes[1, 2].hist(df_optimal['sharpen_strength'], bins=len(SHARPEN_STRENGTH), color='#fab1a0', edgecolor='black')
axes[1, 2].set_title('Optimal Sharpen Strength', fontweight='bold')
axes[1, 2].set_xlabel('Strength')
axes[1, 2].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

print('✅ Parameter distribution analysis complete!')

In [None]:
# Analyze most common optimal settings
print('🎯 Most Common Optimal Settings:')
print('='*70)
print()
print(f"Color correction used: {df_optimal['apply_color_correction'].sum()} / {len(df_optimal)} images ({df_optimal['apply_color_correction'].sum()/len(df_optimal)*100:.1f}%)")
print()
print('Most common sharpen_sigma values:')
print(df_optimal['sharpen_sigma'].value_counts().head())
print()
print('Most common sharpen_strength values:')
print(df_optimal['sharpen_strength'].value_counts().head())
print()
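# These are absolute scores of the restored image against ground truth, not gains over the damaged image
print('Average PSNR at optimal parameters:', df_optimal['psnr'].mean(), 'dB')
print('Average SSIM at optimal parameters:', df_optimal['ssim'].mean())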
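
The `psnr` and `ssim` columns store the restored image's scores against ground truth. Measuring how much restoration actually helped would also need the same scores for the unrestored damaged image, with the gain taken as the difference. The sketch below shows that arithmetic. All numbers are made-up illustrative values, not results from this dataset, and the variable names are hypothetical.

```python
# Made-up per-image PSNR values in dB, for illustration only
baseline_psnr = [21.4, 18.9, 25.0]  # damaged vs. undamaged (no restoration)
restored_psnr = [23.1, 18.7, 27.5]  # best restoration vs. undamaged

# Gain per image: positive means restoration moved the image closer to ground truth
gains = [r - b for r, b in zip(restored_psnr, baseline_psnr)]
mean_gain = sum(gains) / len(gains)
n_improved = sum(g > 0 for g in gains)

print(f'Mean PSNR gain: {mean_gain:.2f} dB')
print(f'Images improved: {n_improved} / {len(gains)}')
```

In the notebook, the baseline could come from calling `compute_similarity_score(damaged, undamaged)` once per pair and storing it alongside the best score.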

## 💾 Step 6: Save Training Dataset

In [None]:
# Save to CSV for regression training
output_path = '../data/processed/regression_training_data.csv'
os.makedirs(os.path.dirname(output_path), exist_ok=True)  # create the folder if it is missing
df_optimal.to_csv(output_path, index=False)

print('💾 Training Dataset Saved!')
print('='*70)
print(f'File: {output_path}')
print(f'Samples: {len(df_optimal)}')
print(f'Features: {len([col for col in df_optimal.columns if col in feature_names])}')
print('Target variables: apply_color_correction, sharpen_sigma, sharpen_strength')
print()
print('✅ Ready for regression model training!')

## 🎨 Step 7: Visualize Sample Results

In [None]:
# Show before/after examples with optimal parameters
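# Never request more rows than there are samples, so no subplot row is left empty
n_examples = min(3, len(df_optimal))
sample_results = df_optimal.sample(n_examples)

# squeeze=False keeps axes 2-D even when there is only one row
fig, axes = plt.subplots(n_examples, 3, figsize=(15, 5*n_examples), squeeze=False)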

for idx, (_, row) in enumerate(sample_results.iterrows()):
    filename = row['filename']
    
    # Load images
    damaged_path = os.path.join(damaged_dir, filename)
    undamaged_path = os.path.join(undamaged_dir, filename)
    
    damaged = cv2.imread(damaged_path)
    undamaged = cv2.imread(undamaged_path)
    
    # Apply optimal restoration
    if row['apply_color_correction'] == 1:
        restored = restore_image_optimized(
            damaged.copy(),
            color_method=row['color_method'],
            sharpen_sigma=row['sharpen_sigma'],
            sharpen_strength=row['sharpen_strength']
        )
    else:
        from src.basics.advanced_restoration import unsharp_mask
        restored = unsharp_mask(
            damaged.copy(),
            sigma=row['sharpen_sigma'],
            strength=row['sharpen_strength']
        )
    
    # Convert BGR to RGB for display
    damaged = cv2.cvtColor(damaged, cv2.COLOR_BGR2RGB)
    restored = cv2.cvtColor(restored, cv2.COLOR_BGR2RGB)
    undamaged = cv2.cvtColor(undamaged, cv2.COLOR_BGR2RGB)
    
    # Display
    axes[idx, 0].imshow(damaged)
    axes[idx, 0].set_title('Damaged', fontweight='bold')
    axes[idx, 0].axis('off')
    
    axes[idx, 1].imshow(restored)
    axes[idx, 1].set_title(f'Restored\nPSNR: {row["psnr"]:.2f} dB, SSIM: {row["ssim"]:.3f}', fontweight='bold')
    axes[idx, 1].axis('off')
    
    axes[idx, 2].imshow(undamaged)
    axes[idx, 2].set_title('Ground Truth', fontweight='bold')
    axes[idx, 2].axis('off')
    
    # Add parameter info
    param_text = f"Color: {'Yes' if row['apply_color_correction'] == 1 else 'No'}\n"
    param_text += f"Sigma: {row['sharpen_sigma']}, Strength: {row['sharpen_strength']}"
    axes[idx, 1].text(0.5, -0.1, param_text, ha='center', transform=axes[idx, 1].transAxes, fontsize=9)

plt.tight_layout()
plt.show()

print('✅ Visual comparison complete!')

## 🎉 Summary

### ✅ What We Accomplished:
1. Loaded paired damaged/undamaged images
2. Tested multiple restoration parameter combinations
3. Found optimal parameters that maximize similarity to ground truth
4. Extracted FFT features from damaged images
5. Created training dataset: **[14 features] → [optimal parameters]**

### 📊 Dataset Created:
- **Input**: 14 FFT features (damage characteristics)
- **Output**: 3 optimal parameters (apply_color_correction, sharpen_sigma, sharpen_strength)
- **Quality**: Measured by PSNR and SSIM against ground truth

### 🚀 Next Steps:
1. **Train regression model** in the next notebook (`4_train_regression_model.ipynb`)
2. **Use cross-validation** to ensure model generalizes well
3. **Test on new images** and validate restoration quality
4. **Deploy smart restoration** system

### 💡 Key Insights:
- We now have **optimal restoration recipes** for each image
- Model will learn to predict these recipes from damage features
- Targets come from parameters scored against ground truth, which should discourage **over-restoration** (within the limits of the tested grid)

**Ready to train the regression model!** 🤖✨