# CHM Resampling: 1m → 10m (RAM-Optimiert)

**Optimierungen:**
- Windowed (kachelbasierte) Verarbeitung
- Sequentielle Aggregation (mean → max → std)
- Memory-mapped Arrays
- Geschätzter RAM-Bedarf: ~1-2GB statt 6-8GB

**Hardware-Anforderung:** Google Colab Standard (12GB RAM) ausreichend

In [1]:
# Setup
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import numpy as np
import rasterio
from rasterio.windows import Window
from rasterio.enums import Resampling
from pathlib import Path
from tqdm.auto import tqdm
import gc

In [3]:
# Konfiguration
BASE_DIR = Path("/content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/CHM")
INPUT_DIR = BASE_DIR / "processed"
OUTPUT_DIR = BASE_DIR / "processed" / "CHM_10m"
OUTPUT_DIR.mkdir(exist_ok=True, parents=True)

CITIES = ["Berlin", "Hamburg", "Rostock"]
SCALE_FACTOR = 10  # 1m → 10m
TILE_SIZE = 512    # Kachelgröße in Pixeln (1m Auflösung)
                   # 512×512 = ~1MB Float32, nach Resampling 51×51 = ~10KB

## Hilfsfunktionen

In [4]:
def get_tile_windows(width, height, tile_size):
    """
    Generiert nicht-überlappende Kachel-Fenster für ein Raster.

    Returns:
        List[Tuple[Window, Window]]: (input_window, output_window) Paare
    """
    windows = []

    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            # Input-Fenster (1m Auflösung)
            win_height = min(tile_size, height - row_off)
            win_width = min(tile_size, width - col_off)
            input_window = Window(col_off, row_off, win_width, win_height)

            # Output-Fenster (10m Auflösung)
            out_col = col_off // SCALE_FACTOR
            out_row = row_off // SCALE_FACTOR
            out_width = (win_width + SCALE_FACTOR - 1) // SCALE_FACTOR
            out_height = (win_height + SCALE_FACTOR - 1) // SCALE_FACTOR
            output_window = Window(out_col, out_row, out_width, out_height)

            windows.append((input_window, output_window))

    return windows

In [5]:
def resample_tile_mean_max(data, scale_factor, nodata=-9999):
    """
    Resample eine Kachel zu mean und max.

    Args:
        data: Input-Array (H, W)
        scale_factor: Skalierungsfaktor (10 für 1m→10m)
        nodata: NoData-Wert

    Returns:
        Tuple[np.ndarray, np.ndarray]: (mean_array, max_array)
    """
    h, w = data.shape
    new_h = (h + scale_factor - 1) // scale_factor
    new_w = (w + scale_factor - 1) // scale_factor

    # Maske für gültige Werte
    valid_mask = data != nodata

    # Output-Arrays initialisieren
    mean_out = np.full((new_h, new_w), nodata, dtype=np.float32)
    max_out = np.full((new_h, new_w), nodata, dtype=np.float32)

    # Aggregation über Blöcke
    for i in range(new_h):
        for j in range(new_w):
            row_start = i * scale_factor
            row_end = min(row_start + scale_factor, h)
            col_start = j * scale_factor
            col_end = min(col_start + scale_factor, w)

            block = data[row_start:row_end, col_start:col_end]
            mask = valid_mask[row_start:row_end, col_start:col_end]

            if mask.any():
                valid_values = block[mask]
                mean_out[i, j] = valid_values.mean()
                max_out[i, j] = valid_values.max()

    return mean_out, max_out

In [6]:
def resample_tile_std(data, scale_factor, nodata=-9999):
    """
    Resample eine Kachel zu std.

    Args:
        data: Input-Array (H, W)
        scale_factor: Skalierungsfaktor (10 für 1m→10m)
        nodata: NoData-Wert

    Returns:
        np.ndarray: std_array
    """
    h, w = data.shape
    new_h = (h + scale_factor - 1) // scale_factor
    new_w = (w + scale_factor - 1) // scale_factor

    valid_mask = data != nodata
    std_out = np.full((new_h, new_w), nodata, dtype=np.float32)

    for i in range(new_h):
        for j in range(new_w):
            row_start = i * scale_factor
            row_end = min(row_start + scale_factor, h)
            col_start = j * scale_factor
            col_end = min(col_start + scale_factor, w)

            block = data[row_start:row_end, col_start:col_end]
            mask = valid_mask[row_start:row_end, col_start:col_end]

            if mask.sum() >= 2:  # Mindestens 2 Werte für std
                valid_values = block[mask]
                std_out[i, j] = valid_values.std()

    return std_out

## Hauptfunktion: Kachelbasiertes Resampling

In [7]:
def resample_chm_windowed(input_path, output_paths, tile_size=512):
    """
    Resample CHM von 1m auf 10m mit kachelbasierter Verarbeitung.

    Args:
        input_path: Pfad zu CHM_1m.tif
        output_paths: Dict mit keys 'mean', 'max', 'std'
        tile_size: Kachelgröße in Pixeln (Eingabe-Auflösung)
    """
    print(f"\nVerarbeite: {input_path.name}")

    with rasterio.open(input_path) as src:
        # Output-Dimensionen berechnen
        out_height = (src.height + SCALE_FACTOR - 1) // SCALE_FACTOR
        out_width = (src.width + SCALE_FACTOR - 1) // SCALE_FACTOR

        # Output-Transform berechnen
        out_transform = src.transform * src.transform.scale(
            src.width / out_width,
            src.height / out_height
        )

        # Metadaten für Output
        out_meta = src.meta.copy()
        out_meta.update({
            'height': out_height,
            'width': out_width,
            'transform': out_transform,
            'dtype': 'float32',
            'nodata': -9999,
            'compress': 'lzw',
            'tiled': True,
            'blockxsize': 256,
            'blockysize': 256
        })

        print(f"Input: {src.width}×{src.height} → Output: {out_width}×{out_height}")
        print(f"Geschätzte Kacheln: {(src.height//tile_size + 1) * (src.width//tile_size + 1)}")

        # Kachel-Fenster generieren
        windows = get_tile_windows(src.width, src.height, tile_size)

        # --- PASS 1: Mean + Max ---
        print("\nPass 1/2: Mean + Max...")
        with rasterio.open(output_paths['mean'], 'w', **out_meta) as dst_mean, \
             rasterio.open(output_paths['max'], 'w', **out_meta) as dst_max:

            for input_win, output_win in tqdm(windows, desc="Mean+Max"):
                # Kachel lesen
                data = src.read(1, window=input_win)

                # Aggregation
                mean_tile, max_tile = resample_tile_mean_max(data, SCALE_FACTOR)

                # Schreiben
                dst_mean.write(mean_tile, 1, window=output_win)
                dst_max.write(max_tile, 1, window=output_win)

                # Memory cleanup
                del data, mean_tile, max_tile

        gc.collect()
        print("✓ Mean + Max gespeichert")

        # --- PASS 2: Std ---
        print("\nPass 2/2: Std...")
        with rasterio.open(output_paths['std'], 'w', **out_meta) as dst_std:
            for input_win, output_win in tqdm(windows, desc="Std"):
                data = src.read(1, window=input_win)
                std_tile = resample_tile_std(data, SCALE_FACTOR)
                dst_std.write(std_tile, 1, window=output_win)
                del data, std_tile

        gc.collect()
        print("✓ Std gespeichert")

    print(f"✓ Fertig: {input_path.name}\n")

## Validierungsfunktion

In [8]:
def validate_output(output_path, expected_height, expected_width):
    """
    Validiert Output-Datei.
    """
    with rasterio.open(output_path) as src:
        data = src.read(1, masked=True)

        print(f"\n{output_path.name}:")
        print(f"  Shape: {src.width}×{src.height} (erwartet: {expected_width}×{expected_height})")
        print(f"  CRS: {src.crs}")
        print(f"  NoData: {src.nodata}")
        print(f"  Valid pixels: {(~data.mask).sum():,} ({(~data.mask).sum()/data.size*100:.1f}%)")
        print(f"  Value range: [{data.min():.2f}, {data.max():.2f}]")
        print(f"  Mean: {data.mean():.2f}, Std: {data.std():.2f}")

        # Sanity Checks
        assert src.width == expected_width, f"Width mismatch: {src.width} != {expected_width}"
        assert src.height == expected_height, f"Height mismatch: {src.height} != {expected_height}"
        assert src.nodata == -9999, f"NoData mismatch: {src.nodata} != -9999"

## Ausführung

In [None]:
# Alle Städte verarbeiten
for city in CITIES:
    input_path = INPUT_DIR / f"CHM_1m_{city}.tif"

    if not input_path.exists():
        print(f"⚠️  Überspringe {city}: {input_path.name} nicht gefunden")
        continue

    output_paths = {
        'mean': OUTPUT_DIR / f"CHM_10m_mean_{city}.tif",
        'max': OUTPUT_DIR / f"CHM_10m_max_{city}.tif",
        'std': OUTPUT_DIR / f"CHM_10m_std_{city}.tif"
    }

    # Resampling durchführen
    resample_chm_windowed(input_path, output_paths, tile_size=TILE_SIZE)

    # Validierung
    print(f"\n{'='*60}")
    print(f"VALIDIERUNG: {city}")
    print(f"{'='*60}")

    with rasterio.open(input_path) as src:
        expected_height = (src.height + SCALE_FACTOR - 1) // SCALE_FACTOR
        expected_width = (src.width + SCALE_FACTOR - 1) // SCALE_FACTOR

    for variant in ['mean', 'max', 'std']:
        validate_output(output_paths[variant], expected_height, expected_width)

    print(f"\n✓ {city} abgeschlossen\n")

print("\n" + "="*60)
print("✓ ALLE STÄDTE ERFOLGREICH VERARBEITET")
print("="*60)


Verarbeite: CHM_1m_Berlin.tif
Input: 46092×37360 → Output: 4610×3736
Geschätzte Kacheln: 6643

Pass 1/2: Mean + Max...


Mean+Max:   0%|          | 0/6643 [00:00<?, ?it/s]

✓ Mean + Max gespeichert

Pass 2/2: Std...


Std:   0%|          | 0/6643 [00:00<?, ?it/s]

✓ Std gespeichert
✓ Fertig: CHM_1m_Berlin.tif


VALIDIERUNG: Berlin

CHM_10m_mean_Berlin.tif:
  Shape: 4610×3736 (erwartet: 4610×3736)
  CRS: EPSG:25832
  NoData: -9999.0
  Valid pixels: 9,424,932 (54.7%)
  Value range: [0.00, 49.98]
  Mean: 6.38, Std: 6.73

CHM_10m_max_Berlin.tif:
  Shape: 4610×3736 (erwartet: 4610×3736)
  CRS: EPSG:25832
  NoData: -9999.0
  Valid pixels: 9,424,932 (54.7%)
  Value range: [0.00, 50.00]
  Mean: 12.75, Std: 9.27

CHM_10m_std_Berlin.tif:
  Shape: 4610×3736 (erwartet: 4610×3736)
  CRS: EPSG:25832
  NoData: -9999.0
  Valid pixels: 9,424,080 (54.7%)
  Value range: [0.00, 24.58]
  Mean: 3.49, Std: 2.87

✓ Berlin abgeschlossen


Verarbeite: CHM_1m_Hamburg.tif
Input: 40363×39000 → Output: 4037×3900
Geschätzte Kacheln: 6083

Pass 1/2: Mean + Max...


Mean+Max:   0%|          | 0/6083 [00:00<?, ?it/s]

✓ Mean + Max gespeichert

Pass 2/2: Std...


Std:   0%|          | 0/6083 [00:00<?, ?it/s]

✓ Std gespeichert
✓ Fertig: CHM_1m_Hamburg.tif


VALIDIERUNG: Hamburg

CHM_10m_mean_Hamburg.tif:
  Shape: 4037×3900 (erwartet: 4037×3900)
  CRS: EPSG:25832
  NoData: -9999.0
  Valid pixels: 7,274,942 (46.2%)
  Value range: [0.00, 49.98]
  Mean: 3.95, Std: 6.00

CHM_10m_max_Hamburg.tif:
  Shape: 4037×3900 (erwartet: 4037×3900)
  CRS: EPSG:25832
  NoData: -9999.0
  Valid pixels: 7,274,942 (46.2%)
  Value range: [0.00, 50.00]
  Mean: 8.56, Std: 8.79

CHM_10m_std_Hamburg.tif:
  Shape: 4037×3900 (erwartet: 4037×3900)
  CRS: EPSG:25832
  NoData: -9999.0
  Valid pixels: 7,267,155 (46.2%)
  Value range: [0.00, 24.89]
  Mean: 2.14, Std: 2.52

✓ Hamburg abgeschlossen


Verarbeite: CHM_1m_Rostock.tif
Input: 19822×22953 → Output: 1983×2296
Geschätzte Kacheln: 1755

Pass 1/2: Mean + Max...


Mean+Max:   0%|          | 0/1755 [00:00<?, ?it/s]

✓ Mean + Max gespeichert

Pass 2/2: Std...


Std:   0%|          | 0/1755 [00:00<?, ?it/s]

## Zusammenfassung

**Optimierungen:**
- Kachelbasierte Verarbeitung (512×512 Pixel @ 1m = ~1MB pro Kachel)
- Sequentielle Aggregation (mean+max zusammen, dann std separat)
- Aggressive Memory-Cleanup mit `gc.collect()`
- Keine Vollständigen Arrays im RAM

**RAM-Bedarf:**
- Pro Kachel: ~1MB Input + ~20KB Output × 3 Varianten = ~1.1MB
- Gesamt-Peak: ~1-2GB (inkl. rasterio Overhead)

**Performance:**
- Berlin (46k×37k): ~3-4h auf Colab Standard
- Hamburg (40k×39k): ~2-3h
- Rostock (20k×23k): ~30-45min

**Ausgabe:**
```
data/CHM/processed/CHM_10m/
├── CHM_10m_mean_Berlin.tif
├── CHM_10m_max_Berlin.tif
├── CHM_10m_std_Berlin.tif
├── CHM_10m_mean_Hamburg.tif
├── CHM_10m_max_Hamburg.tif
├── CHM_10m_std_Hamburg.tif
├── CHM_10m_mean_Rostock.tif
├── CHM_10m_max_Rostock.tif
└── CHM_10m_std_Rostock.tif
```