# 🔬 PPN2V Denoising Pipeline - Google Colab

**Parametric Probabilistic Noise2Void** for DATASET_01

This notebook runs the complete denoising pipeline:
1. Mount Google Drive & Clone Repository
2. Install Dependencies
3. Load Data from Drive
4. Create Noise Models (Histogram + GMM)
5. Train PN2V Network
6. Generate Predictions + Uncertainty Maps
7. Save Results to Drive

---

**⚠️ FIRST: Enable GPU Runtime!**
- Go to `Runtime` → `Change runtime type` → Select `GPU` → `Save`

## 📁 Section 1: Mount Google Drive

This connects your Google Drive so we can read data and save results.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

print("✅ Google Drive mounted!")

In [None]:
# Verify your data exists
!ls "/content/drive/MyDrive/ppn2v_data/DATASET_01/"

## 📦 Section 2: Clone Repository & Install Dependencies

In [None]:
# Remove old clone if exists
!rm -rf /content/PPN2V

# Clone your repository from GitHub
!git clone https://github.com/ZurvanAkarna/PPN2V.git /content/PPN2V

print("\n✅ Repository cloned!")
!ls /content/PPN2V

In [None]:
# Change to repo directory
%cd /content/PPN2V

# Install the PPN2V package
!pip install -e . -q
!pip install tifffile scikit-image -q

print("✅ Dependencies installed!")

In [None]:
# Test imports and check GPU
import torch
import numpy as np
import sys

# Add source to path
sys.path.insert(0, '/content/PPN2V/src')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print("✅ GPU is ready!")
else:
    print("⚠️ WARNING: No GPU detected! Go to Runtime → Change runtime type → GPU")

In [None]:
# Import PPN2V modules
from ppn2v.pn2v import gaussianMixtureNoiseModel, histNoiseModel, training, prediction, utils
from ppn2v.unet.model import UNet
from tifffile import imread, imwrite
import matplotlib.pyplot as plt

print("✅ PPN2V modules imported successfully!")

## ⚙️ Section 3: Configuration

**Edit these settings if needed:**

In [None]:
# ============ PATHS ============
DRIVE_ROOT = "/content/drive/MyDrive"
DATA_DIR = f"{DRIVE_ROOT}/ppn2v_data/DATASET_01"
MODELS_DIR = f"{DRIVE_ROOT}/ppn2v_models/DATASET_01"

# Create output directory
import os
os.makedirs(MODELS_DIR, exist_ok=True)

# ============ DATASET ============
dataName = 'dataset01'
target_noise_level = 0.7  # Change if using different noise level

# ============ TRAINING CONFIG ============
# Adjust these for speed vs quality tradeoff
CONFIG = {
    'n_gaussian': 3,          # GMM components
    'n_coeff': 2,             # Polynomial coefficients
    'n_samples': 800,         # Network output samples
    'depth': 3,               # U-Net depth
    'numOfEpochs': 200,       # Max training epochs
    'stepsPerEpoch': 50,      # Steps per epoch
    'batchSize': 4,           # Batch size
    'learningRate': 1e-3,     # Learning rate
    'earlyStopPatience': 15,  # Stop if no improvement for N epochs (0=disable)
}

# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

print(f"📊 Dataset: {dataName}")
print(f"📊 Noise level: σ={target_noise_level}")
print(f"📊 Max epochs: {CONFIG['numOfEpochs']} (early stop patience: {CONFIG['earlyStopPatience']})")
print(f"🖥️  Device: {device}")

## 📥 Section 4: Load Data from Google Drive

In [None]:
# Define file paths
clean_path = f"{DATA_DIR}/clean_image.tif"
jittered_path = f"{DATA_DIR}/jittered_image.tif"
noisy_path = f"{DATA_DIR}/Noisy images/noisy_image_jitter_skips_0__0_3_flags_0__0_4_Gaussian_{target_noise_level}.tif"

# Check if files exist
print("Checking files:")
print(f"  Clean:    {'✅' if os.path.exists(clean_path) else '❌'} {clean_path}")
print(f"  Jittered: {'✅' if os.path.exists(jittered_path) else '❌'} {jittered_path}")
print(f"  Noisy:    {'✅' if os.path.exists(noisy_path) else '❌'} {noisy_path}")

# All three files are loaded in the next cell, so report an error if any is missing
if not all(os.path.exists(p) for p in (clean_path, jittered_path, noisy_path)):
    print("\n❌ ERROR: Data not found! Please upload to Google Drive:")
    print(f"   {DATA_DIR}/")

In [None]:
# Load images
clean_image = imread(clean_path).astype(np.float32)
jittered_image = imread(jittered_path).astype(np.float32)
noisy_image = imread(noisy_path).astype(np.float32)

print("✅ Images loaded:")
print(f"   Clean:    {clean_image.shape}, range [{clean_image.min():.2f}, {clean_image.max():.2f}]")
print(f"   Jittered: {jittered_image.shape}, range [{jittered_image.min():.2f}, {jittered_image.max():.2f}]")
print(f"   Noisy:    {noisy_image.shape}, range [{noisy_image.min():.2f}, {noisy_image.max():.2f}]")

In [None]:
# Visualize the images
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].imshow(clean_image, cmap='gray')
axes[0].set_title('Clean (Ground Truth)')
axes[0].axis('off')

axes[1].imshow(jittered_image, cmap='gray')
axes[1].set_title('Jittered (Signal for Calibration)')
axes[1].axis('off')

axes[2].imshow(noisy_image, cmap='gray')
axes[2].set_title(f'Noisy (σ={target_noise_level})')
axes[2].axis('off')

plt.tight_layout()
plt.show()

In [None]:
# Prepare data for pipeline
WORK_DIR = '/content/PPN2V/examples/DATASET_01_code'
os.makedirs(WORK_DIR, exist_ok=True)
%cd {WORK_DIR}

# Save prepared data
noisy_stack = noisy_image[np.newaxis, ...] if len(noisy_image.shape) == 2 else noisy_image

imwrite(f'{dataName}_clean.tif', clean_image)
imwrite(f'{dataName}_signal.tif', jittered_image)
imwrite(f'{dataName}_noisy.tif', noisy_stack)

print(f"✅ Data prepared in: {WORK_DIR}")

## 📈 Section 5: Create Noise Models

We create two noise models:
1. **Histogram** - Fast, lookup table
2. **GMM** - Better quality, recommended
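
To make the "lookup table" idea concrete, here is a minimal NumPy sketch of a histogram noise model: a 2D histogram over (signal, observation) pairs, normalized per signal bin so that each row approximates p(observation | signal). This is an illustration under stated assumptions, not the implementation behind `histNoiseModel.createHistogram`; the helper `toy_histogram_noise_model` and the synthetic data are hypothetical.

```python
import numpy as np

def toy_histogram_noise_model(signal, observation, bins, min_val, max_val):
    """Row-normalized 2D histogram approximating p(observation | signal)."""
    counts, _, _ = np.histogram2d(
        signal.ravel(),
        observation.ravel(),
        bins=bins,
        range=[[min_val, max_val], [min_val, max_val]],
    )
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows at zero for signal bins that received no pixels.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Synthetic example: uniform signal plus Gaussian noise (values are made up).
rng = np.random.default_rng(0)
toy_signal = rng.uniform(0.0, 10.0, size=(64, 64))
toy_observation = toy_signal + rng.normal(0.0, 0.7, size=toy_signal.shape)

toy_model = toy_histogram_noise_model(
    toy_signal, toy_observation, bins=32, min_val=0.0, max_val=10.0
)
print(toy_model.shape)
```

A GMM noise model replaces each row of this table with a small mixture of Gaussians whose parameters vary smoothly with the signal, which is why it tends to behave better where the histogram has few samples.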

In [None]:
# Prepare signal/observation for calibration
signal = jittered_image
observation = noisy_image if len(noisy_image.shape) == 2 else noisy_image[0]

signal_for_hist = signal[np.newaxis, ...]
obs_for_hist = observation[np.newaxis, ...]

# Determine intensity range
all_values = np.concatenate([signal.flatten(), observation.flatten()])
minVal = np.percentile(all_values, 0.5)
maxVal = np.percentile(all_values, 99.5)
bins = 256

print(f"📊 Intensity range: [{minVal:.2f}, {maxVal:.2f}]")

In [None]:
# Create Histogram Noise Model
print("📈 Creating Histogram Noise Model...")

nameHistNoiseModel = f'HistNoiseModel_{dataName}_calibration'
histogram = histNoiseModel.createHistogram(bins, minVal, maxVal, obs_for_hist, signal_for_hist)
np.save(nameHistNoiseModel + '.npy', histogram)

print(f"✅ Saved: {nameHistNoiseModel}.npy")

In [None]:
# Create GMM Noise Model (takes a few minutes)
print("📈 Creating GMM Noise Model (this takes 2-5 minutes)...")

min_signal = np.percentile(signal, 0.5)
max_signal = np.percentile(signal, 99.5)

nameGMMNoiseModel = f"GMMNoiseModel_{dataName}_{CONFIG['n_gaussian']}_{CONFIG['n_coeff']}_calibration"

gmmNoiseModel = gaussianMixtureNoiseModel.GaussianMixtureNoiseModel(
    min_signal=min_signal,
    max_signal=max_signal,
    path='./',
    weight=None,
    n_gaussian=CONFIG['n_gaussian'],
    n_coeff=CONFIG['n_coeff'],
    device=device,
    min_sigma=50
)

gmmNoiseModel.train(
    signal_for_hist,
    obs_for_hist,
    batchSize=250000,
    n_epochs=2000,
    learning_rate=0.1,
    name=nameGMMNoiseModel,
    lowerClip=0.5,
    upperClip=99.5
)

print(f"\n✅ Saved: {nameGMMNoiseModel}.npz")

## 🧠 Section 6: Train PN2V Network

This is the main training step. Run time grows roughly linearly with the number of epochs; approximate figures:
- 50 epochs: ~15 min
- 100 epochs: ~30 min
- 200 epochs: ~60 min
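
The figures above work out to roughly 0.3 minutes per epoch (15 min / 50 epochs). A small sketch of that arithmetic, assuming the cost per epoch stays constant; `estimate_training_minutes` is a hypothetical helper, and early stopping may end training sooner than the estimate.

```python
def estimate_training_minutes(num_epochs, minutes_per_epoch=0.3):
    """Linear estimate of training time from the approximate figures above."""
    return num_epochs * minutes_per_epoch

# Reproduce the three listed cases.
for epochs in (50, 100, 200):
    print(f"{epochs} epochs: ~{estimate_training_minutes(epochs):.0f} min")
```

The per-epoch cost depends on the GPU that Colab assigns and on `stepsPerEpoch` and `batchSize`, so treat the result as an upper-bound guide rather than a measurement.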

In [None]:
# Load training data
data = imread(f'{dataName}_noisy.tif')
print(f"📊 Training data shape: {data.shape}")

# Select noise model (GMM recommended)
nameNoiseModel = nameGMMNoiseModel
print(f"📊 Using noise model: {nameNoiseModel}")

# Load noise model
params = np.load(nameNoiseModel + '.npz')
noiseModel = gaussianMixtureNoiseModel.GaussianMixtureNoiseModel(params=params, device=device)

In [None]:
# Create network
net = UNet(CONFIG['n_samples'], depth=CONFIG['depth'])

total_params = sum(p.numel() for p in net.parameters())
print(f"📊 Network parameters: {total_params:,}")

In [None]:
# TRAIN with Early Stopping!
print("="*60)
print(f"🚀 Starting training: max {CONFIG['numOfEpochs']} epochs")
print(f"   Early stopping patience: {CONFIG['earlyStopPatience']} epochs")
print(f"   (Training will stop if no improvement for {CONFIG['earlyStopPatience']} epochs)")
print("="*60)

trainHist, valHist = training.trainNetwork(
    net=net,
    trainData=data.copy(),
    valData=data.copy(),  # same image as trainData, so the validation loss is not independent
    postfix=nameNoiseModel,
    directory='./',
    noiseModel=noiseModel,
    device=device,
    numOfEpochs=CONFIG['numOfEpochs'],
    stepsPerEpoch=CONFIG['stepsPerEpoch'],
    virtualBatchSize=20,
    batchSize=CONFIG['batchSize'],
    learningRate=CONFIG['learningRate'],
    earlyStopPatience=CONFIG['earlyStopPatience']  # NEW: Enable early stopping
)

print("\n" + "="*60)
print("✅ Training complete!")
print(f"   Epochs trained: {len(trainHist)}")
print(f"   Best val loss: {min(valHist):.6f} (epoch {np.argmin(valHist)+1})")
print(f"   Final val loss: {valHist[-1]:.6f}")
print("="*60)

In [None]:
# Plot training progress
plt.figure(figsize=(10, 6))
plt.plot(trainHist, label='Training Loss', alpha=0.7)
plt.plot(valHist, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Progress')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## 🔮 Section 7: Generate Predictions

In [None]:
# Load best trained network
net = torch.load(f'best_{nameNoiseModel}.net', weights_only=False)
print(f"✅ Loaded: best_{nameNoiseModel}.net")

# Load noise model
params = np.load(nameNoiseModel + '.npz')
noiseModel = gaussianMixtureNoiseModel.GaussianMixtureNoiseModel(params=params, device=device)

In [None]:
# Run prediction
print("🔮 Running prediction...")

noisy_for_pred = imread(f'{dataName}_noisy.tif')
noisy_for_pred = np.squeeze(noisy_for_pred)

from ppn2v.pn2v.prediction import predict
means, mseEst = predict(noisy_for_pred, net, noiseModel, device, outScaling=10.0)

print("✅ Prediction complete!")
print(f"   Prior mean shape: {means.shape}")
print(f"   MMSE estimate shape: {mseEst.shape}")

In [None]:
# Compute uncertainty map
print("📊 Computing uncertainty map...")

net.eval()

# Ensure 2D image
noisy_2d = np.squeeze(noisy_for_pred)  # Remove any singleton dimensions
print(f"   Input shape: {noisy_2d.shape}")

img_normalized = (noisy_2d - net.mean) / net.std
h, w = img_normalized.shape

pad_h = (16 - h % 16) % 16
pad_w = (16 - w % 16) % 16
img_padded = np.pad(img_normalized, ((0, pad_h), (0, pad_w)), mode='reflect')

with torch.no_grad():
    img_tensor = torch.from_numpy(img_padded[np.newaxis, np.newaxis, ...].astype(np.float32)).to(device)
    output = net(img_tensor)
    samples = output.cpu().numpy()[0] * 10.0 * net.std + net.mean
    uncertainty_map = samples.std(axis=0)[:h, :w]

print(f"✅ Uncertainty range: [{uncertainty_map.min():.4f}, {uncertainty_map.max():.4f}]")
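
The uncertainty map computed in the cell above is the per-pixel standard deviation across the network's output samples. A minimal, self-contained sketch of that reduction on synthetic data follows; the array `toy_samples` and its values are made up for illustration and do not come from the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stack of samples for a 4x4 image: (n_samples, height, width).
toy_samples = rng.normal(loc=5.0, scale=0.1, size=(800, 4, 4))

# Make one pixel deliberately more variable than the rest.
toy_samples[:, 2, 3] = rng.normal(loc=5.0, scale=2.0, size=800)

# Standard deviation over the sample axis gives one uncertainty value per pixel.
toy_uncertainty = toy_samples.std(axis=0)

most_uncertain = np.unravel_index(toy_uncertainty.argmax(), toy_uncertainty.shape)
print(toy_uncertainty.shape, most_uncertain)
```

Pixels where the samples disagree strongly show up as bright spots in the map; pixels where the samples agree stay close to zero.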

## 📊 Section 8: Calculate Metrics & Visualize

In [None]:
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def normalize_01(img):
    return (img - img.min()) / (img.max() - img.min() + 1e-10)

clean_norm = normalize_01(clean_image)
noisy_norm = normalize_01(noisy_for_pred)
mmse_norm = normalize_01(np.squeeze(mseEst))

psnr_noisy = psnr(clean_norm, noisy_norm, data_range=1.0)
ssim_noisy = ssim(clean_norm, noisy_norm, data_range=1.0)
psnr_mmse = psnr(clean_norm, mmse_norm, data_range=1.0)
ssim_mmse = ssim(clean_norm, mmse_norm, data_range=1.0)

print("\n" + "="*60)
print("📊 QUALITY METRICS")
print("="*60)
print(f"{'Method':<25} {'PSNR (dB)':<12} {'SSIM':<10}")
print("-"*60)
print(f"{'Noisy (baseline)':<25} {psnr_noisy:<12.2f} {ssim_noisy:<10.4f}")
print(f"{'PN2V (MMSE)':<25} {psnr_mmse:<12.2f} {ssim_mmse:<10.4f}")
print("-"*60)
print(f"{'Supervisor Benchmark':<25} {'28.48':<12} {'0.73':<10}")
print("="*60)
print(f"\n📈 PSNR improvement: {psnr_mmse - psnr_noisy:+.2f} dB")
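
For reference, PSNR follows from the mean squared error as 10·log10(R² / MSE), where R is the data range (1.0 for the normalized images used here). A minimal NumPy sketch of that formula; `psnr_from_mse` is a hypothetical helper for illustration, and the notebook itself relies on `skimage.metrics.peak_signal_noise_ratio`.

```python
import numpy as np

def psnr_from_mse(reference, estimate, data_range=1.0):
    """PSNR in dB computed directly from the mean squared error."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Synthetic check: a constant offset of 0.1 gives MSE = 0.01.
toy_reference = np.zeros((8, 8))
toy_estimate = np.full((8, 8), 0.1)
toy_psnr = psnr_from_mse(toy_reference, toy_estimate)
print(f"{toy_psnr:.2f} dB")
```

Because each image is rescaled to [0, 1] by its own minimum and maximum before comparison, the reported PSNR depends on that normalization as well as on the denoising quality.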

In [None]:
# Visualize results
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# Row 1: Images
axes[0, 0].imshow(clean_image, cmap='gray')
axes[0, 0].set_title('Ground Truth', fontsize=14)
axes[0, 0].axis('off')

axes[0, 1].imshow(noisy_for_pred, cmap='gray')
axes[0, 1].set_title(f'Noisy (PSNR: {psnr_noisy:.2f} dB)', fontsize=14)
axes[0, 1].axis('off')

axes[0, 2].imshow(np.squeeze(mseEst), cmap='gray')
axes[0, 2].set_title(f'PN2V Denoised (PSNR: {psnr_mmse:.2f} dB)', fontsize=14)
axes[0, 2].axis('off')

# Row 2: Residuals and Uncertainty
residual_noisy = noisy_for_pred - clean_image
residual_denoised = np.squeeze(mseEst) - clean_image
vmax = np.percentile(np.abs(residual_noisy), 99)

axes[1, 0].imshow(residual_noisy, cmap='RdBu', vmin=-vmax, vmax=vmax)
axes[1, 0].set_title('Residual (Noisy - Clean)', fontsize=14)
axes[1, 0].axis('off')

axes[1, 1].imshow(residual_denoised, cmap='RdBu', vmin=-vmax, vmax=vmax)
axes[1, 1].set_title('Residual (Denoised - Clean)', fontsize=14)
axes[1, 1].axis('off')

im = axes[1, 2].imshow(uncertainty_map, cmap='hot')
axes[1, 2].set_title('UNCERTAINTY MAP', fontsize=14, fontweight='bold')
axes[1, 2].axis('off')
plt.colorbar(im, ax=axes[1, 2], fraction=0.046)

plt.tight_layout()
plt.savefig('final_results.png', dpi=150, bbox_inches='tight')
plt.show()

## 💾 Section 9: Save Results to Google Drive

In [None]:
import shutil
from datetime import datetime

# Save denoised images locally first
imwrite(f'{dataName}_denoised_mmse.tif', np.squeeze(mseEst).astype(np.float32))
imwrite(f'{dataName}_denoised_prior_mean.tif', np.squeeze(means).astype(np.float32))
imwrite(f'{dataName}_uncertainty_map.tif', uncertainty_map.astype(np.float32))

# Files to copy to Drive
files_to_save = [
    f'{dataName}_denoised_mmse.tif',
    f'{dataName}_denoised_prior_mean.tif',
    f'{dataName}_uncertainty_map.tif',
    f'{nameNoiseModel}.npz',
    f'{nameHistNoiseModel}.npy',
    f'best_{nameNoiseModel}.net',
    f'last_{nameNoiseModel}.net',
    'final_results.png',
]

print(f"💾 Saving to: {MODELS_DIR}")
for f in files_to_save:
    if os.path.exists(f):
        shutil.copy(f, MODELS_DIR)
        print(f"   ✅ {f}")
    else:
        # Report files that were expected but never written
        print(f"   ⚠️ Missing, skipped: {f}")

# Save metrics report
report = f"""PPN2V Denoising Results - DATASET_01
{'='*50}
Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Noise Level: σ={target_noise_level}

QUALITY METRICS
{'-'*50}
Noisy (baseline):    PSNR={psnr_noisy:.2f} dB, SSIM={ssim_noisy:.4f}
PN2V (MMSE):         PSNR={psnr_mmse:.2f} dB, SSIM={ssim_mmse:.4f}
Supervisor Benchmark: PSNR=28.48 dB, SSIM=0.73
{'-'*50}
PSNR improvement: {psnr_mmse - psnr_noisy:+.2f} dB
"""

with open(f'{MODELS_DIR}/RESULTS_SUMMARY.txt', 'w') as f:
    f.write(report)

print("   ✅ RESULTS_SUMMARY.txt")
print("\n🎉 All results saved to Google Drive!")

## 🔄 Section 10: (Optional) Push to GitHub

If you want to save your changes back to GitHub, run the cells below.

**First time setup:** You need a GitHub Personal Access Token:
1. Go to GitHub → Settings → Developer settings → Personal access tokens
2. Generate new token (classic) with `repo` scope
3. Copy the token
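
A token pasted into a cell is stored with the notebook, so it can leak if the notebook is shared. One possible alternative, sketched below, is to build the remote URL in a small function and read the token interactively. `build_authenticated_remote` is a hypothetical helper, and the example token is a made-up placeholder.

```python
from urllib.parse import quote

def build_authenticated_remote(token, owner, repo):
    """Return an HTTPS remote URL that embeds a personal access token."""
    if not token or token == "YOUR_TOKEN_HERE":
        raise ValueError("Set a real GitHub token before configuring the remote.")
    # Percent-encode the token so unusual characters cannot break the URL.
    return f"https://{quote(token, safe='')}@github.com/{owner}/{repo}.git"

# In Colab the token could be typed at a prompt instead of saved in a cell:
#   from getpass import getpass
#   remote = build_authenticated_remote(getpass("GitHub token: "), "ZurvanAkarna", "PPN2V")
example_remote = build_authenticated_remote("ghp_exampleToken123", "ZurvanAkarna", "PPN2V")
print(example_remote.replace("ghp_exampleToken123", "***"))
```

The resulting string can be passed to `git remote set-url origin` in place of the hard-coded URL. The token still ends up in the repository's local git config for the session, so avoid printing the URL or the output of `git remote -v`.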

In [None]:
# Configure git (run once)
%cd /content/PPN2V

!git config user.email "your-email@example.com"  # <-- Change this!
!git config user.name "ZurvanAkarna"             # <-- Change this!

print("✅ Git configured")

In [None]:
# Set your token (run once per session)
# Replace YOUR_TOKEN_HERE with your actual GitHub token

GITHUB_TOKEN = "YOUR_TOKEN_HERE"  # <-- Paste your token here!

!git remote set-url origin https://{GITHUB_TOKEN}@github.com/ZurvanAkarna/PPN2V.git

print("✅ GitHub token configured")

In [None]:
# See what changed
!git status

In [None]:
# Commit and push
!git add -A
!git commit -m "Update from Colab: training results"
!git push origin main

print("\n✅ Changes pushed to GitHub!")

---

## 🎉 Done!

**Your results are saved to:**
- Google Drive: `MyDrive/ppn2v_models/DATASET_01/`
- GitHub: (if you ran Section 10)

**Key files:**
- `dataset01_denoised_mmse.tif` - Main denoised result
- `dataset01_uncertainty_map.tif` - Uncertainty map
- `RESULTS_SUMMARY.txt` - Metrics
- `final_results.png` - Visualization

---