# SVD Image Compression Experiments

This notebook analyzes image compression using Singular Value Decomposition (SVD). It explores the trade-off between compression ratio and image quality across several image types and values of the retained rank k.

## Table of Contents
1. [Setup and Imports](#setup)
2. [Data Loading and Preprocessing](#data-loading)
3. [SVD Compression Analysis](#svd-analysis)
4. [Quality Metrics Evaluation](#quality-metrics)
5. [Visualization and Results](#visualization)
6. [Comparative Analysis](#comparative-analysis)
7. [Conclusions](#conclusions)

## 1. Setup and Imports {#setup}

First, let's import all necessary libraries and set up our environment for reproducible experiments.

In [None]:
# Standard library imports
import sys
import os
from pathlib import Path
import warnings

# Add src directory to path for imports
sys.path.append('../src')

# Scientific computing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Image processing
from PIL import Image
import cv2

# Our custom modules
from compression.svd_compressor import SVDCompressor
from data.image_loader import ImageLoader
from data.dataset_manager import DatasetManager
from evaluation.metrics_calculator import MetricsCalculator
from evaluation.performance_profiler import PerformanceProfiler
from visualization.plot_generator import PlotGenerator
from batch.experiment_runner import ExperimentRunner, ExperimentConfig

# Configure warnings and display
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Set random seeds for reproducibility
np.random.seed(42)

print("✅ All imports successful!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")

## 2. Data Loading and Preprocessing {#data-loading}

Let's load our image datasets and prepare them for compression experiments.

In [None]:
# Initialize our core components
dataset_manager = DatasetManager()
image_loader = ImageLoader()
compressor = SVDCompressor()
metrics_calc = MetricsCalculator()
profiler = PerformanceProfiler()
plot_gen = PlotGenerator()

print("✅ Core components initialized!")

In [None]:
# Load and organize datasets
print("Loading datasets...")
datasets = dataset_manager.load_datasets()

print(f"Available datasets: {list(datasets.keys())}")
for dataset_name, images in datasets.items():
    print(f"  {dataset_name}: {len(images)} images")

# Generate dataset manifest
manifest_df = dataset_manager.generate_manifest()
print(f"\nDataset manifest generated with {len(manifest_df)} entries")
print(manifest_df.head())

In [None]:
# Load a sample image from each dataset for demonstration
sample_images = {}
sample_paths = {}

for dataset_name, image_paths in datasets.items():
    if image_paths:  # Check if dataset has images
        # Load first image as sample
        sample_path = image_paths[0]
        sample_image = image_loader.load_image(sample_path)
        sample_images[dataset_name] = sample_image
        sample_paths[dataset_name] = sample_path
        print(f"Loaded sample from {dataset_name}: {sample_path.name} - Shape: {sample_image.shape}")

print(f"\n✅ Loaded {len(sample_images)} sample images")

### Visualize Sample Images

Let's take a look at our sample images to understand the different types of content we'll be compressing.

In [None]:
# Create a grid showing sample images from each dataset
if sample_images:
    images_list = list(sample_images.values())
    titles_list = [f"{name.title()} Sample" for name in sample_images.keys()]
    
    fig = plot_gen.create_image_grid(
        images_list, 
        titles_list,
        figsize=(15, 5)
    )
    plt.suptitle("Sample Images from Each Dataset", fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No sample images available. Please ensure datasets are properly loaded.")

## 3. SVD Compression Analysis {#svd-analysis}

Now let's analyze the SVD compression characteristics of our sample images.

In [None]:
# Analyze singular value spectra for each sample image
singular_value_data = {}

for dataset_name, image in sample_images.items():
    # Convert to grayscale for singular value analysis (ignore any alpha channel)
    if image.ndim == 3:
        gray_image = np.mean(image[..., :3], axis=2)
    else:
        gray_image = image
    
    # Get singular value spectrum
    singular_values = compressor.singular_value_spectrum(gray_image)
    singular_value_data[dataset_name] = singular_values
    
    print(f"{dataset_name}: {len(singular_values)} singular values")
    print(f"  Top 5 values: {singular_values[:5]}")
    print(f"  Energy in top 10: {np.sum(singular_values[:10]**2) / np.sum(singular_values**2):.3f}")
    print()
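
The energy figure printed above is the share of the squared Frobenius norm carried by the leading singular values. A minimal NumPy sketch of that calculation follows. It assumes `singular_value_spectrum` returns the values in descending order, as `np.linalg.svd` does; the project's own implementation is not shown here. `energy_fraction` and `rank_for_energy` are hypothetical helper names, and the image is synthetic.

```python
import numpy as np

def energy_fraction(singular_values, k):
    """Share of the total squared singular values held by the first k."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return float(s2[:k].sum() / s2.sum())

def rank_for_energy(singular_values, target=0.90):
    """Smallest k whose cumulative energy share reaches `target`."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    # searchsorted returns the first index where cumulative >= target
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, s2.size)

# Synthetic stand-in for a grayscale image: rank-3 structure plus weak noise.
rng = np.random.default_rng(0)
image = rng.random((64, 3)) @ rng.random((3, 48)) + 0.01 * rng.random((64, 48))
spectrum = np.linalg.svd(image, compute_uv=False)

print(f"Energy in top 3: {energy_fraction(spectrum, 3):.4f}")
print(f"k for 90% energy: {rank_for_energy(spectrum, 0.90)}")
```

An image whose energy share climbs quickly with k is a good candidate for low-rank compression; a slowly decaying spectrum signals fine texture or noise that needs a larger k.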

In [None]:
# Plot singular value decay for each dataset
fig, axes = plt.subplots(1, len(singular_value_data), figsize=(15, 5), squeeze=False)
axes = axes[0]  # squeeze=False always returns a 2-D array, even for a single dataset

for i, (dataset_name, sv_data) in enumerate(singular_value_data.items()):
    ax = axes[i]
    
    # Plot singular values on log scale
    indices = np.arange(1, len(sv_data) + 1)
    ax.semilogy(indices, sv_data, 'b-', linewidth=2, alpha=0.8)
    
    # Highlight first 50 values
    ax.semilogy(indices[:50], sv_data[:50], 'r-', linewidth=3, alpha=0.9)
    
    ax.set_title(f'{dataset_name.title()} Singular Values', fontweight='bold')
    ax.set_xlabel('Singular Value Index')
    ax.set_ylabel('Singular Value (log scale)')
    ax.grid(True, alpha=0.3)
    
    # Add annotation for energy concentration
    energy_50 = np.sum(sv_data[:50]**2) / np.sum(sv_data**2)
    ax.text(0.05, 0.95, f'Energy in top 50: {energy_50:.1%}', 
            transform=ax.transAxes, fontsize=10, 
            bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()

### Compression Demonstration

Let's demonstrate the compression process with different k-values on one sample image.

In [None]:
# Choose first available sample for demonstration
demo_dataset = list(sample_images.keys())[0]
demo_image = sample_images[demo_dataset]

print(f"Demonstrating compression on {demo_dataset} image")
print(f"Original image shape: {demo_image.shape}")

# Test different k-values
k_values = [5, 15, 30, 60]
compressed_images = []
compression_info = []

# Add original image
compressed_images.append(demo_image)
titles = ['Original']

for k in k_values:
    # Compress image
    compressed_img, metadata = compressor.compress_image(demo_image, k)
    compressed_images.append(compressed_img)
    
    # Calculate metrics
    psnr = metrics_calc.calculate_psnr(demo_image, compressed_img)
    ssim = metrics_calc.calculate_ssim(demo_image, compressed_img)
    
    titles.append(f'k={k}\nPSNR: {psnr:.1f}dB\nSSIM: {ssim:.3f}')
    compression_info.append({
        'k': k,
        'psnr': psnr,
        'ssim': ssim,
        'compression_ratio': metadata['compression_ratio']
    })

# Display compression results
fig = plot_gen.create_image_grid(
    compressed_images,
    titles,
    nrows=1,
    figsize=(20, 4)
)
plt.suptitle(f'SVD Compression Demonstration - {demo_dataset.title()}', 
             fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Print compression statistics
print("\nCompression Statistics:")
for info in compression_info:
    print(f"k={info['k']:2d}: Ratio={info['compression_ratio']:.2f}x, "
          f"PSNR={info['psnr']:.1f}dB, SSIM={info['ssim']:.3f}")
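
For reference, the rank-k approximation behind this demonstration can be sketched directly with NumPy. This is an illustration under assumptions, not the code in `SVDCompressor`: `truncate_svd` and `storage_ratio` are hypothetical names, the input is a synthetic single-channel array, and the ratio assumes that k columns of U, k singular values, and k rows of Vᵀ are stored, which may differ from how `metadata['compression_ratio']` is defined.

```python
import numpy as np

def truncate_svd(matrix, k):
    """Best rank-k approximation of a 2-D array in the Frobenius norm."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, s.size)
    # Scale the first k columns of U by their singular values, then project.
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def storage_ratio(shape, k):
    """Original value count divided by the values stored at rank k."""
    m, n = shape
    return (m * n) / (k * (m + n + 1))

rng = np.random.default_rng(42)
gray = rng.random((120, 80))                 # synthetic stand-in for one channel
s = np.linalg.svd(gray, compute_uv=False)

for k in (5, 15, 30, 60):
    approx = truncate_svd(gray, k)
    error = np.linalg.norm(gray - approx)     # Frobenius norm of the residual
    discarded = np.sqrt(np.sum(s[k:] ** 2))   # energy of the dropped values
    print(f"k={k:2d}: ratio={storage_ratio(gray.shape, k):.2f}x, "
          f"error={error:.4f}, discarded={discarded:.4f}")
```

The two error columns agree because the residual of a truncated SVD is exactly the discarded singular values. Under this storage model the ratio drops below 1 once k exceeds mn / (m + n + 1), which is about 47.8 for a 120 × 80 array, so k = 60 stores more numbers than the original. A colour image would typically be handled by applying the same truncation to each channel.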

## 4. Quality Metrics Evaluation {#quality-metrics}

Let's run systematic experiments to evaluate quality metrics across different compression levels.

In [None]:
# Configure experiment parameters
experiment_config = ExperimentConfig(
    datasets=list(datasets.keys()),
    k_values=list(range(5, 101, 5)),  # k from 5 to 100 in steps of 5
    output_dir=Path('../results'),
    save_images=False,  # Don't save images in notebook to save space
    parallel=True,
    random_seed=42
)

print(f"Experiment configuration:")
print(f"  Datasets: {experiment_config.datasets}")
print(f"  K-values: {len(experiment_config.k_values)} values from {min(experiment_config.k_values)} to {max(experiment_config.k_values)}")
print(f"  Output directory: {experiment_config.output_dir}")

In [None]:
# Run batch experiments
print("Running batch experiments...")
print("This may take a few minutes depending on the number of images and k-values.")

experiment_runner = ExperimentRunner()
results_df = experiment_runner.run_batch_experiments(experiment_config)

print(f"\n✅ Experiments completed!")
print(f"Generated {len(results_df)} result records")
print(f"\nResults summary:")
print(results_df.describe())

In [None]:
# Display sample results
print("Sample results:")
print(results_df.head(10))

print("\nResults by dataset:")
print(results_df.groupby('dataset').agg({
    'psnr': ['mean', 'std'],
    'ssim': ['mean', 'std'],
    'compression_ratio': ['mean', 'std']
}).round(3))
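
To make the `psnr` and `mse` columns concrete, here is a small sketch of the standard definitions, PSNR = 10 · log10(MAX² / MSE). It assumes 8-bit images (`data_range=255`); the functions are illustrative stand-ins, and `MetricsCalculator` may differ in details such as the data range or how colour channels are averaged.

```python
import numpy as np

def mse(reference, test):
    """Mean squared error, computed in float64 to avoid uint8 wrap-around."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(test, dtype=np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))

reference = np.full((4, 4), 100, dtype=np.uint8)
shifted = reference + 5                      # every pixel off by 5 grey levels

print(f"MSE:  {mse(reference, shifted):.1f}")
print(f"PSNR: {psnr(reference, shifted):.2f} dB")
```

The cast to `float64` matters: subtracting two `uint8` arrays directly wraps around modulo 256 and silently corrupts the error. SSIM is more involved; scikit-image's `structural_similarity` is a common choice, though whether this project uses it is not shown here.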

## 5. Visualization and Results {#visualization}

Now let's create comprehensive visualizations of our experimental results.

In [None]:
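# Make sure the plot directory exists before any figure is saved
Path('../results/plots').mkdir(parents=True, exist_ok=True)

# Plot PSNR vs k-value for all datasets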
fig = plot_gen.plot_quality_vs_k(
    results_df, 
    metric='psnr',
    save_path=Path('../results/plots/psnr_vs_k_notebook.png')
)
plt.show()

# Plot SSIM vs k-value for all datasets
fig = plot_gen.plot_quality_vs_k(
    results_df, 
    metric='ssim',
    save_path=Path('../results/plots/ssim_vs_k_notebook.png')
)
plt.show()

In [None]:
# Create compression analysis scatter plots
fig = plot_gen.plot_compression_analysis(
    results_df,
    quality_metric='psnr',
    save_path=Path('../results/plots/compression_vs_psnr_notebook.png')
)
plt.show()

fig = plot_gen.plot_compression_analysis(
    results_df,
    quality_metric='ssim',
    save_path=Path('../results/plots/compression_vs_ssim_notebook.png')
)
plt.show()

In [None]:
# Create multi-metric comparison plot
fig = plot_gen.plot_multiple_metrics(
    results_df,
    metrics=['psnr', 'ssim'],
    save_path=Path('../results/plots/multi_metrics_notebook.png')
)
plt.show()

### Performance Analysis

Let's analyze the computational performance of our compression algorithm.

In [None]:
# Analyze processing times
if 'processing_time' in results_df.columns:
    print("Processing Time Analysis:")
    print(f"Mean processing time: {results_df['processing_time'].mean():.4f} seconds")
    print(f"Std processing time: {results_df['processing_time'].std():.4f} seconds")
    
    # Plot processing time vs k-value
    fig, ax = plt.subplots(figsize=(10, 6))
    
    for dataset in results_df['dataset'].unique():
        dataset_data = results_df[results_df['dataset'] == dataset]
        grouped = dataset_data.groupby('k_value')['processing_time'].mean().reset_index()
        
        ax.plot(grouped['k_value'], grouped['processing_time'], 
                marker='o', label=dataset.title(), linewidth=2)
    
    ax.set_xlabel('Number of Singular Values (k)')
    ax.set_ylabel('Processing Time (seconds)')
    ax.set_title('Processing Time vs k-value', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("Processing time data not available in results.")
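
When reading the processing-time plot, it helps to separate the two stages of the method. If a full SVD is computed for every call (an assumption here; the internals of `SVDCompressor` are not shown), the decomposition cost does not depend on k and only the reconstruction grows with k. The sketch below times both stages on a synthetic matrix; `time_call` and `reconstruct` are hypothetical helpers, and absolute timings will vary by machine.

```python
import time
import numpy as np

def time_call(func, repeats=3):
    """Best wall-clock time over a few repeats, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best

def reconstruct(u, s, vt, k):
    """Rebuild the rank-k approximation from precomputed factors."""
    return (u[:, :k] * s[:k]) @ vt[:k, :]

rng = np.random.default_rng(0)
matrix = rng.random((256, 256))

svd_time = time_call(lambda: np.linalg.svd(matrix, full_matrices=False))
u, s, vt = np.linalg.svd(matrix, full_matrices=False)

print(f"Decomposition (independent of k): {svd_time * 1e3:.2f} ms")
for k in (5, 50, 100):
    rec_time = time_call(lambda: reconstruct(u, s, vt, k))
    print(f"Reconstruction at k={k:3d}: {rec_time * 1e3:.3f} ms")
```

If the measured curve is nearly flat in k, the decomposition dominates, and caching the factors across k-values for the same image would be the natural optimization.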

## 6. Comparative Analysis {#comparative-analysis}

Let's perform detailed comparative analysis across different image types and compression levels.

In [None]:
# Find optimal k-values for different quality thresholds
quality_thresholds = {
    'High Quality': {'psnr_min': 30, 'ssim_min': 0.9},
    'Medium Quality': {'psnr_min': 25, 'ssim_min': 0.8},
    'Low Quality': {'psnr_min': 20, 'ssim_min': 0.7}
}

optimal_k_analysis = {}

for quality_level, thresholds in quality_thresholds.items():
    print(f"\n{quality_level} Analysis (PSNR ≥ {thresholds['psnr_min']}dB, SSIM ≥ {thresholds['ssim_min']}):")
    
    for dataset in results_df['dataset'].unique():
        dataset_data = results_df[results_df['dataset'] == dataset]
        
        # Average over images at each k so a single easy image cannot set the result,
        # then keep the k-values whose mean quality meets both thresholds
        per_k = dataset_data.groupby('k_value')[['psnr', 'ssim', 'compression_ratio']].mean()
        quality_data = per_k[
            (per_k['psnr'] >= thresholds['psnr_min']) & 
            (per_k['ssim'] >= thresholds['ssim_min'])
        ]
        
        if len(quality_data) > 0:
            # Find minimum k whose dataset average meets the quality requirements
            min_k = quality_data.index.min()
            optimal_row = quality_data.loc[min_k]
            
            print(f"  {dataset}: k={min_k}, "
                  f"PSNR={optimal_row['psnr']:.1f}dB, "
                  f"SSIM={optimal_row['ssim']:.3f}, "
                  f"Ratio={optimal_row['compression_ratio']:.2f}x")
            
            optimal_k_analysis[f"{quality_level}_{dataset}"] = {
                'k': min_k,
                'psnr': optimal_row['psnr'],
                'ssim': optimal_row['ssim'],
                'compression_ratio': optimal_row['compression_ratio']
            }
        else:
            print(f"  {dataset}: No k-value meets quality requirements")
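
For a single-channel image, the PSNR threshold search has a shortcut worth knowing: the MSE of a rank-k truncation equals the sum of the discarded squared singular values divided by the pixel count, so the whole PSNR-versus-k curve follows from the spectrum without reconstructing any image. The sketch below illustrates this under stated assumptions: reconstructions are kept as floats with no clipping or rounding to 8 bits, `psnr_from_spectrum` and `smallest_k` are hypothetical names, and the image is synthetic. SSIM has no comparable closed form.

```python
import numpy as np

def psnr_from_spectrum(singular_values, shape, data_range=255.0):
    """PSNR of every rank-k truncation (k = 1..r) from the spectrum alone."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    # suffix[j] = sum of s2[j:]; the energy discarded at rank k is suffix[k]
    suffix = np.cumsum(s2[::-1])[::-1]
    discarded = np.append(suffix[1:], 0.0)
    mse = discarded / (shape[0] * shape[1])
    with np.errstate(divide="ignore"):
        return 10.0 * np.log10(data_range ** 2 / mse)

def smallest_k(psnr_by_k, psnr_min):
    """Smallest k (1-based) whose PSNR meets the threshold, or None."""
    hits = np.nonzero(np.asarray(psnr_by_k) >= psnr_min)[0]
    return int(hits[0]) + 1 if hits.size else None

# Synthetic image on a 0-255 scale: low-rank structure plus Gaussian noise.
rng = np.random.default_rng(1)
image = 255.0 * (rng.random((60, 4)) @ rng.random((4, 40))) / 4.0
image += rng.normal(0.0, 2.0, size=image.shape)

spectrum = np.linalg.svd(image, compute_uv=False)
curve = psnr_from_spectrum(spectrum, image.shape)

for label, threshold in (("High", 30), ("Medium", 25), ("Low", 20)):
    print(f"{label} quality (PSNR >= {threshold} dB): k = {smallest_k(curve, threshold)}")
```

Because the curve never decreases with k, the first k that meets a PSNR threshold is also the only crossing point, which makes per-image searches cheap before running the full batch.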

In [None]:
# Create summary statistics table
summary_stats = results_df.groupby(['dataset', 'k_value']).agg({
    'psnr': ['mean', 'std'],
    'ssim': ['mean', 'std'],
    'compression_ratio': ['mean', 'std']
}).round(3)

print("Summary Statistics by Dataset and K-value:")
print(summary_stats.head(15))

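# Save summary to CSV (create the results directory if it does not exist yet)
summary_path = Path('../results/summary_statistics_notebook.csv')
summary_path.parent.mkdir(parents=True, exist_ok=True)
summary_stats.to_csv(summary_path)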
print(f"\n📊 Summary statistics saved to {summary_path}")

In [None]:
# Correlation analysis between metrics
correlation_matrix = results_df[['k_value', 'psnr', 'ssim', 'mse', 'compression_ratio']].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
            square=True, linewidths=0.5, cbar_kws={"shrink": .8})
plt.title('Correlation Matrix of Compression Metrics', fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

print("Key Correlations:")
print(f"k_value vs PSNR: {correlation_matrix.loc['k_value', 'psnr']:.3f}")
print(f"k_value vs SSIM: {correlation_matrix.loc['k_value', 'ssim']:.3f}")
print(f"PSNR vs SSIM: {correlation_matrix.loc['psnr', 'ssim']:.3f}")
print(f"Compression Ratio vs PSNR: {correlation_matrix.loc['compression_ratio', 'psnr']:.3f}")
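
One caveat when reading these numbers: `DataFrame.corr()` defaults to the Pearson coefficient, which measures linear association only. PSNR is a logarithmic function of MSE, so the two can be perfectly monotone yet show a Pearson value well short of -1. The sketch below contrasts Pearson with a rank-based (Spearman) coefficient on made-up MSE values; the numbers are illustrative and are not results from this notebook.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Pearson correlation of the ranks; assumes no tied values."""
    def rank(values):
        return np.argsort(np.argsort(values))
    return pearson(rank(x), rank(y))

mse = np.array([400.0, 150.0, 60.0, 20.0, 5.0, 1.0])   # illustrative values only
psnr = 10.0 * np.log10(255.0 ** 2 / mse)

print(f"Pearson  MSE vs PSNR:        {pearson(mse, psnr):.3f}")
print(f"Spearman MSE vs PSNR:        {spearman(mse, psnr):.3f}")
print(f"Pearson  log10(MSE) vs PSNR: {pearson(np.log10(mse), psnr):.3f}")
```

With pandas, the rank-based variant is available as `results_df.corr(method='spearman')`, which can be a useful cross-check whenever a heatmap cell looks weaker than the plots suggest.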

## 7. Conclusions {#conclusions}

Based on our comprehensive analysis of SVD image compression, we can draw several important conclusions:

In [None]:
# Generate automated conclusions based on results
print("📋 EXPERIMENTAL CONCLUSIONS")
print("=" * 50)

# Overall performance summary
overall_stats = results_df.groupby('k_value').agg({
    'psnr': 'mean',
    'ssim': 'mean',
    'compression_ratio': 'mean'
})

# Find sweet spot k-values
good_quality_k = overall_stats[(overall_stats['psnr'] >= 30) & (overall_stats['ssim'] >= 0.85)]
if len(good_quality_k) > 0:
    optimal_k = good_quality_k.index.min()
    print(f"\n1. OPTIMAL COMPRESSION LEVEL:")
    print(f"   k = {optimal_k} provides good quality (PSNR ≥ 30dB, SSIM ≥ 0.85)")
    print(f"   at this level: PSNR = {good_quality_k.loc[optimal_k, 'psnr']:.1f}dB, "
          f"SSIM = {good_quality_k.loc[optimal_k, 'ssim']:.3f}")

# Dataset comparison
dataset_performance = results_df.groupby('dataset').agg({
    'psnr': 'mean',
    'ssim': 'mean',
    'compression_ratio': 'mean'
}).round(3)

best_dataset = dataset_performance['psnr'].idxmax()
worst_dataset = dataset_performance['psnr'].idxmin()

print(f"\n2. DATASET PERFORMANCE:")
print(f"   Best performing: {best_dataset} (avg PSNR: {dataset_performance.loc[best_dataset, 'psnr']:.1f}dB)")
print(f"   Most challenging: {worst_dataset} (avg PSNR: {dataset_performance.loc[worst_dataset, 'psnr']:.1f}dB)")

# Compression efficiency
high_compression = results_df[results_df['compression_ratio'] >= 5.0]
if len(high_compression) > 0:
    avg_quality_high_comp = high_compression['psnr'].mean()
    print(f"\n3. COMPRESSION EFFICIENCY:")
    print(f"   At 5x+ compression: average PSNR = {avg_quality_high_comp:.1f}dB")
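    if avg_quality_high_comp >= 30:
        print(f"   Average PSNR stays at or above the 30dB good-quality threshold")
    else:
        print(f"   Average PSNR falls below the 30dB good-quality threshold")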

# Key findings
print(f"\n4. KEY FINDINGS:")
print(f"   • Correlation between k and PSNR: r = {correlation_matrix.loc['k_value', 'psnr']:.3f}")
print(f"   • PSNR and SSIM are highly correlated (r = {correlation_matrix.loc['psnr', 'ssim']:.3f})")
print(f"   • Different image types show varying compression characteristics")
print(f"   • Energy concentration in top singular values enables effective compression")

print(f"\n5. RECOMMENDATIONS:")
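if len(good_quality_k) > 0:
    print(f"   • For high quality: use k ≥ {optimal_k}")
else:
    print(f"   • For high quality: no tested k reached PSNR ≥ 30dB and SSIM ≥ 0.85 on average; "
          f"try k > {overall_stats.index.max()}")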
print(f"   • For balanced compression: k = 20-40 range provides good trade-offs")
print(f"   • Consider image content type when selecting compression parameters")
print(f"   • Monitor both PSNR and SSIM for comprehensive quality assessment")

print("\n" + "=" * 50)
print("✅ Analysis Complete!")

### Save Results and Plots

Finally, let's save our results and generated plots for future reference.

In [None]:
# Save complete results DataFrame (create the results directory if needed)
results_path = Path('../results/notebook_experiment_results.csv')
results_path.parent.mkdir(parents=True, exist_ok=True)
results_df.to_csv(results_path, index=False)
print(f"📊 Complete results saved to {results_path}")

# Save optimal k analysis
if optimal_k_analysis:
    optimal_df = pd.DataFrame(optimal_k_analysis).T
    optimal_path = Path('../results/optimal_k_analysis.csv')
    optimal_df.to_csv(optimal_path)
    print(f"🎯 Optimal k analysis saved to {optimal_path}")

# Create a final summary plot
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: PSNR vs k
for dataset in results_df['dataset'].unique():
    dataset_data = results_df[results_df['dataset'] == dataset]
    grouped = dataset_data.groupby('k_value')['psnr'].mean().reset_index()
    ax1.plot(grouped['k_value'], grouped['psnr'], marker='o', label=dataset.title())
ax1.set_xlabel('k-value')
ax1.set_ylabel('PSNR (dB)')
ax1.set_title('PSNR vs k-value')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: SSIM vs k
for dataset in results_df['dataset'].unique():
    dataset_data = results_df[results_df['dataset'] == dataset]
    grouped = dataset_data.groupby('k_value')['ssim'].mean().reset_index()
    ax2.plot(grouped['k_value'], grouped['ssim'], marker='o', label=dataset.title())
ax2.set_xlabel('k-value')
ax2.set_ylabel('SSIM')
ax2.set_title('SSIM vs k-value')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Compression ratio vs PSNR
for dataset in results_df['dataset'].unique():
    dataset_data = results_df[results_df['dataset'] == dataset]
    ax3.scatter(dataset_data['compression_ratio'], dataset_data['psnr'], 
               alpha=0.6, label=dataset.title())
ax3.set_xlabel('Compression Ratio')
ax3.set_ylabel('PSNR (dB)')
ax3.set_title('Compression Ratio vs PSNR')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: Dataset performance comparison
dataset_means = results_df.groupby('dataset')[['psnr', 'ssim']].mean()
x_pos = np.arange(len(dataset_means))
width = 0.35

ax4.bar(x_pos - width/2, dataset_means['psnr'], width, label='PSNR', alpha=0.8)
ax4_twin = ax4.twinx()
ax4_twin.bar(x_pos + width/2, dataset_means['ssim'], width, 
             label='SSIM', alpha=0.8, color='orange')

ax4.set_xlabel('Dataset')
ax4.set_ylabel('PSNR (dB)', color='blue')
ax4_twin.set_ylabel('SSIM', color='orange')
ax4.set_title('Average Performance by Dataset')
ax4.set_xticks(x_pos)
ax4.set_xticklabels([d.title() for d in dataset_means.index])
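ax4.grid(True, alpha=0.3)
ax4_twin.grid(False)  # avoid a second, misaligned grid from the twin axis

# Combine handles from both y-axes so the PSNR and SSIM labels are actually shown
handles, labels = ax4.get_legend_handles_labels()
twin_handles, twin_labels = ax4_twin.get_legend_handles_labels()
ax4.legend(handles + twin_handles, labels + twin_labels, loc='upper left')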

plt.suptitle('SVD Image Compression - Complete Analysis Summary', 
             fontsize=16, fontweight='bold')
plt.tight_layout()

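# Save summary plot (create the plot directory if it does not exist yet)
summary_plot_path = Path('../results/plots/complete_analysis_summary.png')
summary_plot_path.parent.mkdir(parents=True, exist_ok=True)
plt.savefig(summary_plot_path, dpi=300, bbox_inches='tight')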
plt.show()

print(f"📈 Summary plot saved to {summary_plot_path}")
print("\n🎉 Notebook analysis complete! All results and plots have been saved.")

---

## Notebook Summary

This notebook has provided a comprehensive analysis of SVD image compression, including:

1. **Data Loading**: Systematic loading and preprocessing of image datasets
2. **SVD Analysis**: Examination of singular value spectra and energy concentration
3. **Compression Demonstration**: Visual comparison of different compression levels
4. **Quality Evaluation**: Systematic measurement of PSNR, SSIM, and other metrics
5. **Performance Analysis**: Computational efficiency and processing time evaluation
6. **Comparative Study**: Cross-dataset analysis and optimal parameter identification
7. **Results Visualization**: Professional plots and statistical summaries

The results demonstrate that SVD provides an effective method for image compression with tunable quality-compression trade-offs. The analysis shows clear relationships between the number of singular values retained (k) and resulting image quality, with different image types showing varying compression characteristics.

All results, plots, and analysis summaries have been saved to the `../results/` directory (relative to this notebook) for further use and presentation.

---

*This notebook was generated as part of the SVD Image Compression project. For more information, see the project documentation and source code.*