# Day 1: Foundations & Data Pipeline

**Goals:**
- Build a deep understanding of autoencoder theory
- Understand SAR physics and speckle statistics
- Audit and verify the preprocessing pipeline

**Time:** 6 hours

In [None]:
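import sys
import os
# Make the project's src/ directory importable (resolved relative to the current working directory)
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '../../src')))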

import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage, stats
%matplotlib inline

In [None]:
# Import your modules
try:
    from data.preprocessing import *
    print("Loaded data.preprocessing")
except ImportError as e:
    print(f"Could not load: {e}")

try:
    from data.dataset import *
    print("Loaded data.dataset")
except ImportError as e:
    print(f"Could not load: {e}")

---
# Part 1: Theory Questions (1.5 hours)

**Answer in the markdown cells before proceeding.**

## Q1.1: Information Bottleneck

Your autoencoder compresses a 256x256x1 input (65,536 dims) to a 16x16x64 latent (16,384 dims).

a) Why does this force the network to learn useful features?

b) Assuming an entropy of 3 bits/pixel, calculate the total input entropy. Is the bottleneck tight enough?

c) How would you detect whether the bottleneck is too tight or too loose?

### Your Answer:

a) 

b) 

c) 

## Q1.2: SAR Physics

Predict the relative backscatter brightness and explain the physics for each case:
- a) Calm lake at 35° incidence angle
- b) Lake with 30 cm waves
- c) Dry plowed field
- d) Same field after rain
- e) Dense forest
- f) Metal bridge (double-bounce)

### Your Answer:

a) 

b) 

c) 

d) 

e) 

f) 

## Q1.3: Speckle

Sentinel-1 GRD has ~4.4 looks.

a) What does "4.4 looks" mean? What is the resolution trade-off?

b) What is the expected coefficient of variation (CV)? Show the formula.

c) The reconstruction has CV = 0.35 instead of 0.48. Is this desirable? What could cause it?

d) How would you distinguish speckle reduction from texture smoothing?

### Your Answer:

a) 

b) 

c) 

d) 

## Q1.4: Preprocessing

Pipeline: log transform -> clip to [-25, +5] dB -> normalize to [0, 1]

a) Why apply a log transform?

b) What gets clipped at each bound?

c) Why not normalize each image independently?

d) Which normalization parameters should be used for inference on test data?

### Your Answer:

a) 

b) 

c) 

d) 

## Q1.5: Loss Functions

a) Why does MSE alone produce blurry reconstructions?

b) What does SSIM measure that MSE ignores?

c) For speckled SAR imagery, is exact pixel preservation important?

### Your Answer:

a) 

b) 

c) 

---
# Part 2: Preprocessing Audit (2 hours)

## Exercise 1.1: Invalid Value Handling

In [None]:
test_array = np.array([
    [0.1, 0.0, -0.1],
    [np.nan, np.inf, 0.5],
    [1e-10, 1e10, 0.3]
], dtype=np.float32)

print("Test array with problematic values:")
print(test_array)

In [None]:
# TODO: Test your function
# result = handle_invalid_values(test_array)
# assert np.all(np.isfinite(result))
# assert np.all(result >= 0)
# print("PASSED")
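If `handle_invalid_values` failed to import, the checks above can still be tried against a stand-in. The sketch below is a hypothetical implementation, not the project's function: it assumes NaN, infinite, and non-positive entries should all be replaced by a small positive floor (`1e-10` is an arbitrary choice) so that a later log transform is well defined. Your own function may follow a different policy, such as interpolation or masking.

```python
import numpy as np

def example_handle_invalid_values(arr, floor=1e-10):
    """Hypothetical stand-in, NOT the project's handle_invalid_values.

    Replaces NaN, +/-inf, and non-positive entries with a small positive
    floor so that a later log transform is well defined.
    """
    out = np.array(arr, dtype=np.float32)  # np.array copies by default
    invalid = ~np.isfinite(out) | (out <= 0)
    out[invalid] = floor
    return out

demo = np.array([
    [0.1, 0.0, -0.1],
    [np.nan, np.inf, 0.5],
    [1e-10, 1e10, 0.3],
], dtype=np.float32)

result = example_handle_invalid_values(demo)

# Same checks as the TODO cell
assert np.all(np.isfinite(result))
assert np.all(result >= 0)

# Entries that were already valid should pass through unchanged
valid = np.isfinite(demo) & (demo > 0)
assert np.array_equal(result[valid], demo[valid])
print("PASSED")
```

Note that `1e10` is finite and positive, so this stand-in leaves it untouched; whether such extreme values should be capped here or left to the dB clipping step is a design decision worth documenting.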

## Exercise 1.2: dB Conversion

In [None]:
test_cases = [
    (1.0, 0.0),
    (10.0, 10.0),
    (0.1, -10.0),
    (0.01, -20.0),
]

for intensity, expected in test_cases:
    actual = 10 * np.log10(intensity)
    print(f"{intensity} -> {actual:.1f} dB (expected: {expected})")

In [None]:
# TODO: Test your to_db function against these values
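One way to structure this test is with `np.testing.assert_allclose`, which reports the offending values on failure. The sketch below uses a hypothetical `example_to_db` as a placeholder; the floor of `1e-10` for non-positive inputs is an assumption, so swap in your own `to_db` and its actual behavior.

```python
import numpy as np

def example_to_db(intensity, floor=1e-10):
    """Hypothetical stand-in: linear intensity to decibels, with a floor to avoid log10(0)."""
    safe = np.maximum(np.asarray(intensity, dtype=np.float64), floor)
    return 10.0 * np.log10(safe)

cases = [
    (1.0, 0.0),
    (10.0, 10.0),
    (0.1, -10.0),
    (0.01, -20.0),
]
inputs = np.array([c[0] for c in cases])
expected_db = np.array([c[1] for c in cases])

np.testing.assert_allclose(example_to_db(inputs), expected_db, atol=1e-6)

# Edge case: zero intensity maps to the floor (here -100 dB) rather than -inf
assert np.isfinite(example_to_db(0.0))
print("dB conversion checks passed")
```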

## Exercise 1.3: Normalization

In [None]:
vmin, vmax = -25, 5
test_db = np.array([[-30, -25, -20], [-10, 0, 5], [5, 10, 15]])
expected = np.array([[0.0, 0.0, 0.167], [0.5, 0.833, 1.0], [1.0, 1.0, 1.0]])

print(f"Input (dB):\n{test_db}")
print(f"\nExpected output:\n{expected}")

In [None]:
# TODO: Test your normalize function
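The expected values above follow from clipping to `[vmin, vmax]` and then rescaling linearly, for example (-20 + 25) / 30 ≈ 0.167. A minimal sketch of that check, using a hypothetical `example_normalize` in place of your own function:

```python
import numpy as np

def example_normalize(db, vmin=-25.0, vmax=5.0):
    """Hypothetical stand-in: clip dB values to [vmin, vmax], then rescale to [0, 1]."""
    clipped = np.clip(np.asarray(db, dtype=np.float64), vmin, vmax)
    return (clipped - vmin) / (vmax - vmin)

demo_db = np.array([[-30, -25, -20], [-10, 0, 5], [5, 10, 15]])
demo_expected = np.array([[0.0, 0.0, 0.167], [0.5, 0.833, 1.0], [1.0, 1.0, 1.0]])

demo_out = example_normalize(demo_db)

# atol=1e-3 because the expected values are rounded to three decimals
np.testing.assert_allclose(demo_out, demo_expected, atol=1e-3)
assert demo_out.min() >= 0.0 and demo_out.max() <= 1.0
print("Normalization checks passed")
```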

## Exercise 1.4: Roundtrip Test

In [None]:
np.random.seed(42)
original = np.random.gamma(shape=4.4, scale=0.1, size=(64, 64)).astype(np.float32)
print(f"Original: range=[{original.min():.4f}, {original.max():.4f}], mean={original.mean():.4f}")

In [None]:
# TODO: Test roundtrip
# preprocessed, params = preprocess_complete(original)
# reconstructed = inverse_preprocess(preprocessed, params)
# error = np.abs(original - reconstructed).mean()
# print(f"Mean absolute error: {error}")
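For reference, the roundtrip can be sketched end to end with hypothetical stand-ins for `preprocess_complete` and `inverse_preprocess`. The function names, the `params` dictionary layout, and the fixed [-25, +5] dB bounds are assumptions for illustration only. The roundtrip is exact only for pixels inside the clip range; anything clipped cannot be recovered, so the sketch also reports the clipped fraction.

```python
import numpy as np

VMIN_DB, VMAX_DB = -25.0, 5.0

def example_preprocess(intensity, vmin=VMIN_DB, vmax=VMAX_DB, floor=1e-10):
    """Hypothetical stand-in: intensity -> dB -> clip -> [0, 1]."""
    safe = np.maximum(np.asarray(intensity, dtype=np.float64), floor)
    db = 10.0 * np.log10(safe)
    clipped_fraction = float(np.mean((db < vmin) | (db > vmax)))
    norm = (np.clip(db, vmin, vmax) - vmin) / (vmax - vmin)
    params = {"vmin": vmin, "vmax": vmax, "clipped_fraction": clipped_fraction}
    return norm, params

def example_inverse_preprocess(norm, params):
    """Hypothetical stand-in: [0, 1] -> dB -> linear intensity."""
    db = norm * (params["vmax"] - params["vmin"]) + params["vmin"]
    return 10.0 ** (db / 10.0)

rng = np.random.default_rng(42)
synthetic = rng.gamma(shape=4.4, scale=0.1, size=(64, 64)).astype(np.float32)

norm, params = example_preprocess(synthetic)
restored = example_inverse_preprocess(norm, params)

rel_error = np.abs(restored - synthetic) / synthetic
print(f"Clipped fraction: {params['clipped_fraction']:.4f}")
print(f"Max relative roundtrip error: {rel_error.max():.2e}")
```

A relative error is used here because the intensities span a wide dynamic range, and a mean absolute error would be dominated by the brightest pixels.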

---
# Part 3: Speckle Analysis (1.5 hours)

In [None]:
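def compute_local_cv(image, window_size=32):
    """Local coefficient of variation (std / mean) in a sliding square window."""
    # Work in float64: E[x^2] - E[x]^2 is prone to cancellation error in float32
    image_safe = np.maximum(image, 1e-10).astype(np.float64)
    local_mean = ndimage.uniform_filter(image_safe, size=window_size)
    local_sq_mean = ndimage.uniform_filter(image_safe**2, size=window_size)
    # Clamp small negative variances caused by floating-point rounding
    local_var = np.maximum(local_sq_mean - local_mean**2, 0)
    local_std = np.sqrt(local_var)
    return local_std / (local_mean + 1e-10)

def estimate_enl(image):
    """Return (ENL, CV) from global intensity statistics.

    Only meaningful for a homogeneous region: scene texture inflates CV and lowers ENL.
    """
    image_clean = image[image > 0]
    cv = np.std(image_clean) / np.mean(image_clean)
    return 1 / (cv ** 2), cv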

In [None]:
# Test on synthetic data
cv_map = compute_local_cv(original, window_size=16)
enl, cv = estimate_enl(original)

print(f"Measured CV: {cv:.3f}")
print(f"Expected CV (L=4.4): {1/np.sqrt(4.4):.3f}")
print(f"Estimated ENL: {enl:.2f}")
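The `1/np.sqrt(L)` relation used above can be checked by simulation. The sketch below assumes fully developed speckle over a homogeneous area, where single-look intensity is exponentially distributed, and models multilooking as averaging `n_looks` independent single-look samples. Integer look counts are used because a plain average cannot represent a fractional value such as 4.4; the helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_multilook(n_looks, n_pixels=100_000):
    """Average n_looks independent single-look (exponential) intensities per pixel."""
    single_looks = rng.exponential(scale=1.0, size=(n_looks, n_pixels))
    return single_looks.mean(axis=0)

def coefficient_of_variation(x):
    """Global CV = std / mean."""
    return float(np.std(x) / np.mean(x))

for n_looks in (1, 4, 16):
    cv_sim = coefficient_of_variation(simulate_multilook(n_looks))
    print(f"L={n_looks:2d}: simulated CV={cv_sim:.3f}, 1/sqrt(L)={1 / np.sqrt(n_looks):.3f}")
```

Averaging more looks lowers the CV, but in real imagery those looks come from spatial or spectral averaging, which is where the resolution trade-off enters.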

In [None]:
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(original, cmap='gray')
axes[0].set_title('Original')
im = axes[1].imshow(cv_map, cmap='viridis', vmin=0, vmax=1)
axes[1].set_title(f'Local CV (mean={cv_map.mean():.3f})')
plt.colorbar(im, ax=axes[1])
plt.tight_layout()
plt.show()

## Verify Speckle Distribution

In [None]:
# Normalize to unit mean, then fit a gamma distribution with the location fixed at 0
image_clean = original[original > 0]
normalized = image_clean / np.mean(image_clean)
shape, loc, scale = stats.gamma.fit(normalized, floc=0)

print(f"Fitted shape (ENL): {shape:.2f}")
print("Expected ENL: 4.4")

# Plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(normalized, bins=100, density=True, alpha=0.7, label='Data')
x = np.linspace(0, 4, 200)
ax.plot(x, stats.gamma.pdf(x, a=4.4, scale=1/4.4), 'r-', lw=2, label='Gamma(L=4.4)')
ax.plot(x, stats.gamma.pdf(x, a=shape, scale=scale), 'g--', lw=2, label=f'Fitted(L={shape:.1f})')
ax.legend()
ax.set_xlabel('Normalized Intensity')
ax.set_ylabel('Density')
plt.show()
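The visual comparison can be complemented with a Kolmogorov-Smirnov statistic from `scipy.stats.kstest`. The sketch below regenerates its own synthetic sample so that it runs on its own. Treat the p-value as optimistic, since the gamma parameters are fitted to the same data being tested; the KS statistic itself is still a useful measure of the distance between the empirical and fitted distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.gamma(shape=4.4, scale=0.1, size=64 * 64)
sample_norm = sample / sample.mean()  # unit-mean normalization, as in the cell above

# Fit with the location fixed at 0, then compare the empirical CDF to the fitted gamma CDF
fit_shape, fit_loc, fit_scale = stats.gamma.fit(sample_norm, floc=0)
ks_result = stats.kstest(sample_norm, "gamma", args=(fit_shape, fit_loc, fit_scale))

print(f"Fitted shape: {fit_shape:.2f}")
print(f"KS statistic: {ks_result.statistic:.4f}, p-value: {ks_result.pvalue:.3f}")
```

On real SAR patches, a large KS statistic often points to scene heterogeneity within the patch rather than to a problem with the speckle model.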

---
# Day 1 Checklist

- [ ] Answered all theory questions
- [ ] Invalid value handling test passed
- [ ] dB conversion test passed
- [ ] Normalization test passed
- [ ] Roundtrip test passed
- [ ] Speckle statistics analyzed
- [ ] Documented issues/fixes

## Notes

*Document issues and fixes here:*

1. 
2. 
3. 