# Sentinel-2 Download via Google Earth Engine

**Projekt:** Baumartenklassifikation & Cross-City-Transferierbarkeit  
**St√§dte:** Werden aus GeoPackage geladen  
**Zeitraum:** 2021 (12 Monate)  
**Output:** Monatliche Median-Kompositionen mit 15 B√§ndern (10 Spektral + 5 Indizes)

## 1. Setup & Authentication

In [46]:
!pip -q install earthengine-api rasterio numpy --quiet

import ee
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.features import geometry_mask
from pathlib import Path
import time
from datetime import datetime
import pandas as pd
import shutil
import glob

In [33]:
# Authenticate & Initialize
ee.Authenticate()
ee.Initialize(project='treeclassifikation')

## 2. Konfiguration

In [43]:
# Pfade
GPKG_PATH = '/content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/boundaries/city_boundaries_500m_buffer.gpkg'
BOUNDARIES_PATH = '/content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/boundaries/city_boundaries.gpkg'
DRIVE_FOLDER = 'sentinel2_2021_final'  # Google Drive Ordner (wird von GEE erstellt)
LOCAL_OUTPUT_DIR = Path('/content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/sentinel2_2021')

# KRITISCH: Wo GEE die Dateien speichert (im Drive-Root/DRIVE_FOLDER)
GEE_OUTPUT_DIR = Path('/content/drive/MyDrive') / DRIVE_FOLDER

# Parameter
YEAR = 2021
MONTHS = list(range(1, 13))
TARGET_CRS = 'EPSG:25832'
TARGET_SCALE = 10

# Spektralb√§nder
SPECTRAL_BANDS = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B11', 'B12']

# SCL-Masking: Vegetation (4), Not vegetated (5), Unclassified (7)
VALID_SCL_CLASSES = [4, 5, 7]

# Validierungsschwellen
MIN_COVERAGE_PERCENT = 15
SPECTRAL_MAX_TOLERANCE = 20000
INDEX_RANGES = {
    'NDre': (-1, 1),
    'NDVIre': (-1, 1),
    'kNDVI': (0, 1),
    'VARI': (-2, 2),
    'RTVIcore': (-1000, 1000)
}

## 3. Hilfsfunktionen

In [60]:
# ============================================================================
# HILFSFUNKTIONEN
# ============================================================================

def ensure_directories():
    """Erstellt ben√∂tigte Verzeichnisse."""
    LOCAL_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    GEE_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    print(f"‚úÖ Verzeichnisse bereit:")
    print(f"   GEE Export: {GEE_OUTPUT_DIR}")
    print(f"   Zielordner: {LOCAL_OUTPUT_DIR}")

def move_file_to_final_location(filename):
    """
    Verschiebt Datei von GEE_OUTPUT_DIR nach LOCAL_OUTPUT_DIR.
    Returns: True wenn erfolgreich, False sonst
    """
    source = GEE_OUTPUT_DIR / filename
    target = LOCAL_OUTPUT_DIR / filename

    if not source.exists():
        print(f"      ‚ö†Ô∏è Quelldatei nicht gefunden: {source}")
        return False

    try:
        if target.exists():
            print(f"      ‚ÑπÔ∏è Zieldatei existiert bereits, √ºberschreibe...")

        shutil.move(str(source), str(target))
        print(f"      ‚úÖ Verschoben: {filename}")
        return True

    except Exception as e:
        print(f"      ‚ùå Fehler beim Verschieben: {e}")
        return False

def file_exists_in_final_location(filename):
    """Pr√ºft ob Datei bereits im Zielordner existiert."""
    return (LOCAL_OUTPUT_DIR / filename).exists()

# ============================================================================
# GEE FUNKTIONEN
# ============================================================================

def create_aoi(bounds, buffer_m=0):
    """Erstellt AOI aus Bounds (WGS84)."""
    bbox = ee.Geometry.Rectangle(bounds, proj='EPSG:4326', geodesic=False)
    if buffer_m > 0:
        bbox = bbox.transform(TARGET_CRS, 1).buffer(buffer_m).transform('EPSG:4326', 1)
    return bbox

def mask_clouds_scl(image):
    """
    SCL-basiertes Cloud/Shadow/Water Masking.
    Whitelist: 4=Vegetation, 5=Not vegetated, 7=Unclassified
    """
    scl = image.select('SCL')
    valid_mask = scl.eq(4).Or(scl.eq(5)).Or(scl.eq(7))
    return image.updateMask(valid_mask)

def resample_20m_to_10m(image):
    """Resampelt 20m-B√§nder (B5,B6,B7,B8A,B11,B12) auf 10m."""
    bands_20m = ['B5', 'B6', 'B7', 'B8A', 'B11', 'B12']
    resampled = image.select(bands_20m).resample('bilinear').reproject(
        crs=image.select('B2').projection(),
        scale=10
    )
    return image.addBands(resampled, overwrite=True)

def add_vegetation_indices(image):
    """Berechnet Vegetationsindizes mit robuster Maskierung."""
    b2 = image.select('B2')
    b3 = image.select('B3')
    b4 = image.select('B4')
    b5 = image.select('B5')
    b8 = image.select('B8')
    b8a = image.select('B8A')

    eps = 1e-8

    # Alle Indizes: explizit Float32
    ndre = b8a.subtract(b5).divide(b8a.add(b5).add(eps)).float().rename('NDre')
    ndvire = b8a.subtract(b4).divide(b8a.add(b4).add(eps)).float().rename('NDVIre')

    ndvi_base = b8.subtract(b4).divide(b8.add(b4).add(eps))
    kndvi = ndvi_base.pow(2).tanh().float().rename('kNDVI')

    vari_num = b3.subtract(b4)
    vari_den = b3.add(b4).subtract(b2).add(eps)
    vari = vari_num.divide(vari_den).float().rename('VARI')

    # RTVIcore: Normalisiere auf 0-1, dann berechne, dann Float32
    b8a_norm = b8a.divide(10000.0)
    b5_norm = b5.divide(10000.0)
    b4_norm = b4.divide(10000.0)

    rtvicore = b8a_norm.subtract(b5_norm).multiply(100).subtract(
        b8a_norm.subtract(b4_norm).multiply(10)
    ).float().rename('RTVIcore')  # ‚Üê .float() hinzugef√ºgt!

    return image.addBands([ndre, ndvire, kndvi, vari, rtvicore])

def process_month(year, month, aoi, city_name):
    start_date = f'{year}-{month:02d}-01'
    if month == 12:
        end_date = f'{year+1}-01-01'
    else:
        end_date = f'{year}-{month+1:02d}-01'

    # Collection laden OHNE Cloud-Filter
    s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED') \
        .filterBounds(aoi) \
        .filterDate(start_date, end_date)

    # Szenen-Count f√ºr Logging
    scene_count = s2.size().getInfo()
    print(f"    Verf√ºgbare Szenen: {scene_count}")

    if scene_count == 0:
        print(f"    ‚ö†Ô∏è Keine Szenen verf√ºgbar!")
        return None

    # Cloud Masking (auf Pixel-Ebene)
    s2_masked = s2.map(mask_clouds_scl)
    s2_resampled = s2_masked.map(resample_20m_to_10m)
    s2_with_indices = s2_resampled.map(add_vegetation_indices)

    all_bands = SPECTRAL_BANDS + ['NDre', 'NDVIre', 'kNDVI', 'VARI', 'RTVIcore']
    s2_selected = s2_with_indices.select(all_bands)

    # Monatlicher Median
    monthly_median = s2_selected.median().clip(aoi)

    # ‚Üê KRITISCH: ALLE B√ÑNDER zu Float32 konvertieren
    monthly_median = monthly_median.toFloat()

    monthly_median = monthly_median.set({
        'city': city_name,
        'year': year,
        'month': month,
        'scene_count': scene_count,
        'cloud_threshold': cloud_threshold,
        'system:time_start': ee.Date(start_date).millis()
    })

    return monthly_median, scene_count, cloud_threshold

def export_to_drive(image, city_name, year, month, aoi):
    """Exportiert Image zu Google Drive."""
    description = f'S2_{city_name}_{year}_{month:02d}_median'

    task = ee.batch.Export.image.toDrive(
        image=image,
        description=description,
        folder=DRIVE_FOLDER,
        fileNamePrefix=description,
        region=aoi,
        scale=TARGET_SCALE,
        crs=TARGET_CRS,
        maxPixels=1e13,
        fileFormat='GeoTIFF',
        formatOptions={'cloudOptimized': True}
    )

    task.start()
    return task

# ============================================================================
# VALIDIERUNGSFUNKTIONEN
# ============================================================================

def detailed_validation(filepath, city_boundaries_path):
    """Detaillierte Validierung einer Datei (NaN-aware)."""
    print("\n" + "="*80)
    print("DETAILLIERTE VALIDIERUNG")
    print("="*80)
    print(f"\nDatei: {Path(filepath).name}")

    city_name = Path(filepath).name.split('_')[1]
    gdf_boundaries = gpd.read_file(city_boundaries_path)
    city_boundary = gdf_boundaries[gdf_boundaries['gen'] == city_name]

    if len(city_boundary) == 0:
        print(f"‚ùå Stadtgrenze f√ºr {city_name} nicht gefunden!")
        return False

    issues = []
    warnings = []

    try:
        with rasterio.open(filepath) as src:
            print(f"\n1. GRUNDLEGENDE EIGENSCHAFTEN")
            print("-"*80)

            band_count = src.count
            print(f"   B√§nder: {band_count}/15 {'‚úÖ' if band_count == 15 else '‚ùå'}")
            if band_count != 15:
                issues.append(f"Band-Count: {band_count} statt 15")

            crs_correct = src.crs.to_string() == TARGET_CRS
            print(f"   CRS: {src.crs} {'‚úÖ' if crs_correct else '‚ùå'}")
            if not crs_correct:
                issues.append(f"CRS: {src.crs} statt {TARGET_CRS}")

            res = src.res[0]
            res_correct = abs(res - 10.0) < 0.1
            print(f"   Aufl√∂sung: {res:.2f}m {'‚úÖ' if res_correct else '‚ùå'}")
            if not res_correct:
                issues.append(f"Aufl√∂sung: {res}m statt 10m")

            print(f"\n2. COVERAGE (innerhalb Stadtgrenzen)")
            print("-"*80)

            city_boundary_proj = city_boundary.to_crs(src.crs)
            city_mask = ~geometry_mask(
                city_boundary_proj.geometry,
                out_shape=(src.height, src.width),
                transform=src.transform,
                invert=False
            )
            pixels_in_city = np.sum(city_mask)

            band1 = src.read(1)
            valid_mask = ~np.isnan(band1) & (band1 > 0)
            valid_in_city = np.sum(valid_mask & city_mask)
            coverage = 100 * valid_in_city / pixels_in_city if pixels_in_city > 0 else 0

            if coverage < MIN_COVERAGE_PERCENT:
                print(f"   Coverage: {coverage:.1f}% ‚ùå KRITISCH")
                issues.append(f"Coverage {coverage:.1f}% < {MIN_COVERAGE_PERCENT}%")
            elif coverage < 50:
                print(f"   Coverage: {coverage:.1f}% ‚ö†Ô∏è NIEDRIG (aber akzeptabel)")
                warnings.append(f"Coverage {coverage:.1f}% niedrig")
            else:
                print(f"   Coverage: {coverage:.1f}% ‚úÖ")

            print(f"\n3. SPEKTRALE B√ÑNDER (1-10)")
            print("-"*80)

            band_names = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B11', 'B12']

            for i, name in enumerate(band_names, 1):
                data = src.read(i)
                valid = data[~np.isnan(data) & (data > 0)]

                if len(valid) == 0:
                    print(f"   ‚ùå {name}: Keine g√ºltigen Pixel!")
                    issues.append(f"{name}: Keine Daten")
                    continue

                min_val = np.min(valid)
                max_val = np.max(valid)
                mean_val = np.mean(valid)

                if max_val > SPECTRAL_MAX_TOLERANCE:
                    print(f"   ‚ö†Ô∏è {name}: Min={min_val:.0f}, Max={max_val:.0f}, Mean={mean_val:.0f} (Max > {SPECTRAL_MAX_TOLERANCE})")
                    warnings.append(f"{name}: Max={max_val:.0f} ungew√∂hnlich hoch")
                else:
                    print(f"   ‚úÖ {name}: Min={min_val:.0f}, Max={max_val:.0f}, Mean={mean_val:.0f}")

            print(f"\n4. VEGETATIONSINDIZES (11-15)")
            print("-"*80)

            index_names = ['NDre', 'NDVIre', 'kNDVI', 'VARI', 'RTVIcore']
            spectral_valid_count = valid_in_city

            for i, name in enumerate(index_names, 11):
                data = src.read(i)
                valid_data = data[city_mask]
                valid = valid_data[~np.isnan(valid_data)]

                valid_count = len(valid)
                valid_pct = 100 * valid_count / pixels_in_city if pixels_in_city > 0 else 0

                if valid_count == 0:
                    print(f"   ‚ùå {name}: Keine g√ºltigen Pixel!")
                    issues.append(f"{name}: Keine Daten")
                    continue

                # PERZENTIL-BASIERTE PR√úFUNG (robust gegen Ausrei√üer)
                min_val = np.min(valid)
                max_val = np.max(valid)
                p01 = np.percentile(valid, 1)
                p99 = np.percentile(valid, 99)
                median = np.median(valid)

                expected_min, expected_max = INDEX_RANGES[name]

                # Pr√ºfe Perzentile (1% und 99%) statt Extrema
                percentile_ok = (p01 >= expected_min - 0.5) and (p99 <= expected_max + 0.5)

                # Coverage-Differenz zu Spektral
                coverage_diff_pct = 100 * abs(valid_count - spectral_valid_count) / spectral_valid_count if spectral_valid_count > 0 else 0

                # Status-Bestimmung
                if not percentile_ok:
                    print(f"   ‚ö†Ô∏è {name}: {valid_count:,} g√ºltig ({valid_pct:.1f}%)")
                    print(f"       Perzentile (1%, 99%): [{p01:.3f}, {p99:.3f}] (erwartet [{expected_min}, {expected_max}])")
                    print(f"       Extrema (Min, Max): [{min_val:.3f}, {max_val:.3f}]")
                    warnings.append(f"{name}: Perzentile au√üerhalb Erwartung")
                elif coverage_diff_pct > 15:
                    print(f"   ‚ö†Ô∏è {name}: {valid_count:,} g√ºltig ({valid_pct:.1f}%), {coverage_diff_pct:.1f}% Coverage-Differenz")
                    warnings.append(f"{name}: Coverage-Differenz {coverage_diff_pct:.1f}%")
                else:
                    # Pr√ºfe ob Extrema stark von Perzentilen abweichen (Ausrei√üer-Warnung)
                    has_outliers = (abs(min_val - p01) > abs(p99 - p01) * 2) or (abs(max_val - p99) > abs(p99 - p01) * 2)

                    if has_outliers:
                        print(f"   ‚úÖ {name}: {valid_count:,} g√ºltig ({valid_pct:.1f}%)")
                        print(f"       Perzentile (1%, 99%): [{p01:.3f}, {p99:.3f}] ‚úÖ")
                        print(f"       ‚ÑπÔ∏è Extrema: [{min_val:.3f}, {max_val:.3f}] (Ausrei√üer vorhanden, aber OK)")
                    else:
                        print(f"   ‚úÖ {name}: {valid_count:,} g√ºltig ({valid_pct:.1f}%), Bereich=[{p01:.3f}, {p99:.3f}]")

            print(f"\n5. GESAMTSTATUS")
            print("-"*80)

            if len(issues) == 0 and len(warnings) == 0:
                print("   ‚úÖ PERFEKT - Keine Probleme")
                print("\n" + "="*80)
                print("‚úÖ VALIDIERUNG ERFOLGREICH")
                print("="*80)
                return True
            elif len(issues) == 0:
                print(f"   ‚ö†Ô∏è {len(warnings)} WARNUNGEN (nicht kritisch):")
                for w in warnings:
                    print(f"      - {w}")
                print("\n" + "="*80)
                print("‚úÖ VALIDIERUNG ERFOLGREICH (mit Warnungen)")
                print("="*80)
                return True
            else:
                print(f"   ‚ùå {len(issues)} KRITISCHE FEHLER:")
                for issue in issues:
                    print(f"      - {issue}")
                if len(warnings) > 0:
                    print(f"   ‚ö†Ô∏è {len(warnings)} WARNUNGEN:")
                    for w in warnings:
                        print(f"      - {w}")
                print("\n" + "="*80)
                print("‚ùå VALIDIERUNG FEHLGESCHLAGEN")
                print("="*80)
                return False

    except Exception as e:
        print(f"\n‚ùå FEHLER: {e}")
        import traceback
        traceback.print_exc()
        return False

def quick_batch_validation(files, city_boundaries_path):
    """Schnelle Batch-Validierung."""
    print("\n" + "="*80)
    print("SCHNELLE BATCH-VALIDIERUNG")
    print("="*80)
    print(f"\n{len(files)} Dateien...\n")

    results = []

    for idx, filepath in enumerate(sorted(files), 1):
        filename = Path(filepath).name
        city_name = filename.split('_')[1]

        print(f"[{idx:2d}/{len(files)}] {filename:40s} ... ", end='', flush=True)

        result = {'file': filename, 'city': city_name}

        try:
            with rasterio.open(filepath) as src:
                bands_ok = (src.count == 15)
                crs_ok = (src.crs.to_string() == TARGET_CRS)

                city_boundaries = gpd.read_file(city_boundaries_path)
                city_boundary = city_boundaries[city_boundaries['gen'] == city_name]

                if len(city_boundary) > 0:
                    city_boundary_proj = city_boundary.to_crs(src.crs)
                    city_mask = ~geometry_mask(
                        city_boundary_proj.geometry,
                        out_shape=(src.height, src.width),
                        transform=src.transform,
                        invert=False
                    )
                    pixels_in_city = np.sum(city_mask)

                    b1 = src.read(1)
                    valid_mask = ~np.isnan(b1) & (b1 > 0)
                    valid_in_city = np.sum(valid_mask & city_mask)
                    coverage = 100 * valid_in_city / pixels_in_city if pixels_in_city > 0 else 0
                    result['coverage'] = round(coverage, 1)
                    coverage_ok = (coverage >= MIN_COVERAGE_PERCENT)
                else:
                    coverage_ok = False
                    result['coverage'] = None

                spectral_ok = (np.sum(~np.isnan(b1) & (b1 > 0)) > 0)
                ndre = src.read(11)
                indices_ok = (np.sum(~np.isnan(ndre)) > 0)

                all_ok = bands_ok and crs_ok and coverage_ok and spectral_ok and indices_ok
                result['status'] = 'OK' if all_ok else 'ISSUES'

                icon = "‚úÖ" if all_ok else "‚ö†Ô∏è" if coverage_ok else "‚ùå"
                print(f"{icon} (Cov: {result.get('coverage', 'N/A'):.1f}%)")

        except Exception as e:
            result['status'] = 'ERROR'
            result['error'] = str(e)
            print(f"‚ùå ERROR")

        results.append(result)

    df = pd.DataFrame(results)

    print("\n" + "="*80)
    print("ZUSAMMENFASSUNG")
    print("="*80)

    for city in sorted(df['city'].unique()):
        city_df = df[df['city'] == city]
        ok_count = len(city_df[city_df['status'] == 'OK'])
        total = len(city_df)
        avg_cov = city_df['coverage'].mean() if 'coverage' in city_df else 0
        icon = "‚úÖ" if ok_count == total else "‚ö†Ô∏è"
        print(f"   {icon} {city:10s}: {ok_count:2d}/{total:2d} OK (√ò Coverage: {avg_cov:.1f}%)")

    total_ok = len(df[df['status'] == 'OK'])
    total_issues = len(df[df['status'] == 'ISSUES'])
    total_errors = len(df[df['status'] == 'ERROR'])

    print(f"\n   Gesamt:")
    print(f"      ‚úÖ OK:     {total_ok:3d}/{len(df)} ({100*total_ok/len(df):.1f}%)")
    if total_issues > 0:
        print(f"      ‚ö†Ô∏è Issues: {total_issues:3d}/{len(df)} ({100*total_issues/len(df):.1f}%)")
    if total_errors > 0:
        print(f"      ‚ùå Errors: {total_errors:3d}/{len(df)} ({100*total_errors/len(df):.1f}%)")

    output_csv = LOCAL_OUTPUT_DIR / 'batch_validation_results.csv'
    df.to_csv(output_csv, index=False)
    print(f"\n   üìÑ Details: {output_csv.name}")

    print("="*80)

    return total_ok == len(df)


## Phase 1: Test-Download

In [57]:
print("\n" + "="*80)
print("SENTINEL-2 DOWNLOAD PIPELINE - FINALE VERSION")
print("="*80)

# Verzeichnisse erstellen
ensure_directories()

print("\nAuthentifiziere GEE...")
ee.Authenticate()
ee.Initialize(project='treeclassifikation')
print("‚úÖ GEE bereit")

print("\n" + "="*80)
print("PHASE 1: TEST-DOWNLOAD")
print("="*80)

test_filename = 'S2_Rostock_2021_07_median.tif'
test_file_final = LOCAL_OUTPUT_DIR / test_filename

# Pr√ºfe ob Datei bereits existiert
if file_exists_in_final_location(test_filename):
    print(f"\n‚úÖ Test-Datei existiert bereits: {test_filename}")
    print(f"   √úberspringe Download, f√ºhre nur Validierung durch...")

    validation_success = detailed_validation(str(test_file_final), BOUNDARIES_PATH)

    if validation_success:
        print("\n‚úÖ PHASE 1 ERFOLGREICH - BEREIT F√úR PHASE 2!")
    else:
        print("\n‚ùå EXISTIERENDE DATEI FEHLERHAFT - L√ñSCHE UND STARTE NEU")
        user_redownload = input("Neu downloaden? (ja/nein): ")
        if user_redownload.lower() in ['ja', 'j', 'yes', 'y']:
            test_file_final.unlink()
            print(f"   Gel√∂scht. F√ºhre Script erneut aus.")

else:
    print(f"\nTest-Datei: Rostock Juli 2021")

    # Stadtgrenzen laden
    gdf = gpd.read_file(GPKG_PATH)
    gdf_wgs84 = gdf.to_crs('EPSG:4326')

    rostock = gdf_wgs84[gdf_wgs84['gen'] == 'Rostock'].iloc[0]
    bounds = rostock.geometry.bounds
    aoi = create_aoi([bounds[0], bounds[1], bounds[2], bounds[3]])

    print(f"\nStarte Download...")
    print(f"   Rostock 2021-07 ... ", end='', flush=True)

    try:
        monthly_image, scene_count, cloud_threshold = process_month(2021, 7, aoi, 'Rostock')

        if monthly_image is None:
            print(f"‚ùå Keine Szenen")
        else:
            print(f"({scene_count} Szenen, <{cloud_threshold}%) ... ", end='', flush=True)

            task = export_to_drive(monthly_image, 'Rostock', 2021, 7, aoi)
            print(f"Task gestartet")

            # Monitoring
            print(f"\nWarte auf Abschluss (max. 30min)...")
            for check_num in range(30):
                time.sleep(60 if check_num > 0 else 10)
                status = task.status()
                state = status['state']

                if state == 'COMPLETED':
                    print(f"   ‚úÖ Abgeschlossen nach {check_num+1} Checks")
                    break
                elif state == 'FAILED':
                    print(f"   ‚ùå Fehlgeschlagen: {status.get('error_message', 'Unknown')}")
                    break
                elif state in ['READY', 'RUNNING']:
                    print(f"   [{check_num+1}/30] {state}")

            # Warte auf Drive-Sync
            print(f"\nWarte 15s auf Drive-Sync...")
            time.sleep(15)

            # Verschiebe Datei von GEE-Ordner zum Zielordner
            print(f"\nVerschiebe Datei zum Zielordner...")
            move_success = move_file_to_final_location(test_filename)

            if not move_success:
                print(f"\n‚ö†Ô∏è Datei konnte nicht verschoben werden")
                print(f"   Pr√ºfe manuell:")
                print(f"   - GEE-Ordner: {GEE_OUTPUT_DIR}")
                print(f"   - Zielordner: {LOCAL_OUTPUT_DIR}")
            else:
                # Validierung
                validation_success = detailed_validation(str(test_file_final), BOUNDARIES_PATH)

                if validation_success:
                    print("\n‚úÖ PHASE 1 ERFOLGREICH - BEREIT F√úR PHASE 2!")
                else:
                    print("\n‚ùå PHASE 1 FEHLGESCHLAGEN")

    except Exception as e:
        print(f"\n‚ùå Exception: {e}")
        import traceback
        traceback.print_exc()

print("\n" + "="*80)
print("PHASE 1 ABGESCHLOSSEN")
print("="*80)


SENTINEL-2 DOWNLOAD PIPELINE - FINALE VERSION
‚úÖ Verzeichnisse bereit:
   GEE Export: /content/drive/MyDrive/sentinel2_2021_final
   Zielordner: /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/sentinel2_2021

Authentifiziere GEE...
‚úÖ GEE bereit

PHASE 1: TEST-DOWNLOAD

Test-Datei: Rostock Juli 2021

Starte Download...
   Rostock 2021-07 ... (3 Szenen, <20%) ... Task gestartet

Warte auf Abschluss (max. 30min)...
   [1/30] RUNNING
   [2/30] RUNNING
   [3/30] RUNNING
   [4/30] RUNNING
   [5/30] RUNNING
   [6/30] RUNNING
   [7/30] RUNNING
   [8/30] RUNNING
   [9/30] RUNNING
   [10/30] RUNNING
   ‚úÖ Abgeschlossen nach 11 Checks

Warte 15s auf Drive-Sync...

Verschiebe Datei zum Zielordner...
      ‚úÖ Verschoben: S2_Rostock_2021_07_median.tif

DETAILLIERTE VALIDIERUNG

Datei: S2_Rostock_2021_07_median.tif

1. GRUNDLEGENDE EIGENSCHAFTEN
--------------------------------------------------------------------------------
   B√§nder: 15/15 ‚úÖ
   CRS: EPSG:25832 ‚úÖ
 

## Phase 2: Volst√§dniger Download

In [61]:
# ============================================================================
# PHASE 2: VOLLST√ÑNDIGER DOWNLOAD
# ============================================================================

print("\n" + "="*80)
print("PHASE 2: VOLLST√ÑNDIGER DOWNLOAD")
print("="*80)

user_confirm = input("\nPhase 1 erfolgreich? Vollst√§ndigen Download starten? (ja/nein): ")

if user_confirm.lower() not in ['ja', 'j', 'yes', 'y']:
    print("Abgebrochen.")
else:
    print(f"\nStarte Download: 3 St√§dte √ó 12 Monate = 36 Dateien")
    print(f"   GEE exportiert nach: {GEE_OUTPUT_DIR}")
    print(f"   Dateien werden verschoben nach: {LOCAL_OUTPUT_DIR}")

    # Stadtgrenzen
    gdf = gpd.read_file(GPKG_PATH)
    gdf_wgs84 = gdf.to_crs('EPSG:4326')

    CITIES = {}
    for idx, row in gdf_wgs84.iterrows():
        city_name = row['gen']
        bounds = row.geometry.bounds
        CITIES[city_name] = {'bounds': [bounds[0], bounds[1], bounds[2], bounds[3]]}

    all_tasks = []

    # Download pro Stadt/Monat
    for city_name, config in CITIES.items():
        print(f"\n{'='*80}")
        print(f"{city_name}")
        print(f"{'='*80}")

        aoi = create_aoi(config['bounds'])

        for month in MONTHS:
            filename = f'S2_{city_name}_{YEAR}_{month:02d}_median.tif'

            # Pr√ºfe ob bereits vorhanden
            if file_exists_in_final_location(filename):
                print(f"   {YEAR}-{month:02d} ... ‚è≠Ô∏è Existiert bereits")
                continue

            print(f"   {YEAR}-{month:02d} ... ", end='', flush=True)

            try:
                monthly_image, scene_count, cloud_threshold = process_month(YEAR, month, aoi, city_name)

                if monthly_image is None:
                    print(f"‚ö†Ô∏è Keine Szenen")
                    continue

                task = export_to_drive(monthly_image, city_name, YEAR, month, aoi)
                all_tasks.append({
                    'city': city_name,
                    'month': month,
                    'filename': filename,
                    'task': task,
                    'scene_count': scene_count
                })

                print(f"‚úÖ ({scene_count} Szenen)")
                time.sleep(2)

            except Exception as e:
                print(f"‚ùå {e}")

    if len(all_tasks) == 0:
        print(f"\n‚úÖ Alle Dateien existieren bereits - √úberspringe Download")
        print(f"   Fahre mit Validierung fort...")
    else:
        print(f"\n{'='*80}")
        print(f"‚úÖ {len(all_tasks)} neue Tasks gestartet")
        print(f"{'='*80}")

        # Monitoring
        print(f"\n√úberwache Tasks (alle 60s, max. 240 Checks)...")

        for check_num in range(240):
            time.sleep(60 if check_num > 0 else 10)

            status_counts = {'READY': 0, 'RUNNING': 0, 'COMPLETED': 0, 'FAILED': 0}

            for item in all_tasks:
                state = item['task'].status()['state']
                status_counts[state] = status_counts.get(state, 0) + 1

            print(f"[{check_num+1}/60] ", end='')
            for state, count in status_counts.items():
                if count > 0:
                    print(f"{state}: {count} | ", end='')
            print()

            if status_counts['COMPLETED'] == len(all_tasks):
                print(f"\n‚úÖ Alle Tasks abgeschlossen!")
                break

        print(f"\n{'='*80}")
        print(f"Download-Phase beendet: {status_counts['COMPLETED']}/{len(all_tasks)} erfolgreich")
        print(f"{'='*80}")

        # Warte auf Drive-Sync
        print(f"\nWarte 30s auf Drive-Sync...")
        time.sleep(30)

        # Verschiebe alle Dateien
        print(f"\nVerschiebe Dateien zum Zielordner...")
        moved_count = 0
        for item in all_tasks:
            if move_file_to_final_location(item['filename']):
                moved_count += 1

        print(f"   ‚úÖ {moved_count}/{len(all_tasks)} Dateien verschoben")

    # Batch-Validierung
    import glob
    all_files = sorted(glob.glob(str(LOCAL_OUTPUT_DIR / '*.tif')))

    if len(all_files) == 0:
        print(f"\n‚ö†Ô∏è Keine Dateien gefunden in {LOCAL_OUTPUT_DIR}")
    else:
        validation_success = quick_batch_validation(all_files, BOUNDARIES_PATH)

        if validation_success:
            print(f"\n‚úÖ ALLE DATEIEN VALIDIERT - BEREIT F√úR FEATURE EXTRACTION!")
        else:
            print(f"\n‚ö†Ô∏è EINIGE DATEIEN MIT PROBLEMEN - Pr√ºfe CSV")

print("\n" + "="*80)
print("PIPELINE ABGESCHLOSSEN")
print("="*80)


PHASE 2: VOLLST√ÑNDIGER DOWNLOAD

Phase 1 erfolgreich? Vollst√§ndigen Download starten? (ja/nein): ja

Starte Download: 3 St√§dte √ó 12 Monate = 36 Dateien
   GEE exportiert nach: /content/drive/MyDrive/sentinel2_2021_final
   Dateien werden verschoben nach: /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/sentinel2_2021

Hamburg
   2021-01 ...     Verf√ºgbare Szenen: 15
‚úÖ (15 Szenen)
   2021-02 ...     Verf√ºgbare Szenen: 14
‚úÖ (14 Szenen)
   2021-03 ...     Verf√ºgbare Szenen: 12
‚úÖ (12 Szenen)
   2021-04 ...     Verf√ºgbare Szenen: 12
‚úÖ (12 Szenen)
   2021-05 ...     Verf√ºgbare Szenen: 12
‚úÖ (12 Szenen)
   2021-06 ...     Verf√ºgbare Szenen: 12
‚úÖ (12 Szenen)
   2021-07 ...     Verf√ºgbare Szenen: 13
‚úÖ (13 Szenen)
   2021-08 ...     Verf√ºgbare Szenen: 13
‚úÖ (13 Szenen)
   2021-09 ...     Verf√ºgbare Szenen: 12
‚úÖ (12 Szenen)
   2021-10 ...     Verf√ºgbare Szenen: 16
‚úÖ (16 Szenen)
   2021-11 ...     Verf√ºgbare Szenen: 12
‚úÖ (12 Szenen)
   202



[49/60] READY: 4 | RUNNING: 5 | COMPLETED: 26 | 
[50/60] READY: 1 | RUNNING: 6 | COMPLETED: 28 | 
[51/60] READY: 1 | RUNNING: 6 | COMPLETED: 28 | 
[52/60] RUNNING: 6 | COMPLETED: 29 | 
[53/60] RUNNING: 6 | COMPLETED: 29 | 
[54/60] RUNNING: 4 | COMPLETED: 31 | 
[55/60] RUNNING: 4 | COMPLETED: 31 | 
[56/60] RUNNING: 4 | COMPLETED: 31 | 
[57/60] RUNNING: 3 | COMPLETED: 32 | 
[58/60] RUNNING: 2 | COMPLETED: 33 | 
[59/60] COMPLETED: 35 | 

‚úÖ Alle Tasks abgeschlossen!

Download-Phase beendet: 35/35 erfolgreich

Warte 30s auf Drive-Sync...

Verschiebe Dateien zum Zielordner...
      ‚úÖ Verschoben: S2_Hamburg_2021_01_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_02_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_03_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_04_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_05_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_06_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_07_median.tif
      ‚úÖ Verschoben: S2_Hamburg_2021_08_median.tif
     