# Phase 5: Feature Normalization & Model-Ready Export

**Goal:** Standardize features without data leakage

**Input:**
- `data/splits/*.gpkg` (6 files from Phase 4 with 184 features per tree)

**Output:**
- `data/model_ready/experiment_0_1_single_city/` (Hamburg/Berlin single-city)
- `data/model_ready/experiment_2_cross_city/` (Hamburg+Berlin → Rostock zero-shot)
- `data/model_ready/experiment_3_finetuning/` (fine-tuning eval)
- `feature_names.json`, `label_encoder.pkl`

**Method:**
- StandardScaler (mean = 0, std = 1)
- Experiment-specific scalers, each fitted on training data only (no leakage)
- Label encoding: genus → integer (0-6)
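
The leakage rule in the method list can be sketched on synthetic data: the scaler's statistics come from the training split alone and are then reused unchanged for validation. The arrays below are made-up stand-ins (an assumption for illustration), not the project's features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a train and a validation split.
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_val = rng.normal(loc=5.5, scale=2.5, size=(200, 3))

# Fit on the training split only, so no validation statistics leak in.
scaler = StandardScaler().fit(X_train)

# Apply the same training statistics to both splits.
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
val_means = X_val_scaled.mean(axis=0)

print('train mean per feature:', np.round(train_means, 4))
print('train std per feature: ', np.round(train_stds, 4))
print('val mean per feature:  ', np.round(val_means, 4))
```

Only the training split is guaranteed to end up with mean 0 and std 1 per feature; the validation split keeps whatever offset it has relative to the training distribution.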

## 1. Setup: Google Drive Mount + Imports

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Imports
import geopandas as gpd
import pandas as pd
import numpy as np
import json
import pickle
from pathlib import Path
from sklearn.preprocessing import StandardScaler, LabelEncoder
import warnings
warnings.filterwarnings('ignore')

print('✅ Imports erfolgreich')

✅ Imports erfolgreich


## 2. Configuration

In [3]:
DRIVE_ROOT = Path("/content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit")

# Input paths
SPLITS_DIR = DRIVE_ROOT / 'data' / 'splits'

# Output paths
MODEL_READY_DIR = DRIVE_ROOT / 'data' / 'model_ready'
MODEL_READY_DIR.mkdir(parents=True, exist_ok=True)

# Experiment-specific folders
EXP_01_DIR = MODEL_READY_DIR / 'experiment_0_1_single_city'
EXP_02_DIR = MODEL_READY_DIR / 'experiment_2_cross_city'
EXP_03_DIR = MODEL_READY_DIR / 'experiment_3_finetuning'

for d in [EXP_01_DIR, EXP_02_DIR, EXP_03_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# Hamburg/Berlin subfolders for Exp. 0/1
(EXP_01_DIR / 'hamburg').mkdir(exist_ok=True)
(EXP_01_DIR / 'berlin').mkdir(exist_ok=True)

# Viable genera (7 genera)
VIABLE_GENERA = ['TILIA', 'ACER', 'QUERCUS', 'FRAXINUS', 'BETULA', 'SORBUS', 'PRUNUS']

# Number of features (184 = 180 S2 + 4 CHM)
N_FEATURES = 184

print(f'Konfiguration:')
print(f'  Drive Root: {DRIVE_ROOT}')
print(f'  Viable Genera: {len(VIABLE_GENERA)} ({VIABLE_GENERA})')
print(f'  Expected Features: {N_FEATURES}')

Konfiguration:
  Drive Root: /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit
  Viable Genera: 7 (['TILIA', 'ACER', 'QUERCUS', 'FRAXINUS', 'BETULA', 'SORBUS', 'PRUNUS'])
  Expected Features: 184


## 3. Helper Functions

In [4]:
def load_split_data(filename):
    """
    Load a split file and extract features and labels.

    Args:
        filename: Name of the GPKG file (e.g. 'hamburg_train.gpkg')

    Returns:
        X: Feature matrix (n_samples, 184) as float32
        y: Label array (n_samples,) of genus names
        gdf: Original GeoDataFrame
        feature_cols: List of feature column names, in the column order of X
    """
    path = SPLITS_DIR / filename
    print(f'  Lade {filename}...')

    gdf = gpd.read_file(path)

    # Feature columns: all numeric columns except the ID, label, and geometry columns
    exclude_cols = ['tree_id', 'city', 'genus_latin', 'geometry', 'block_id']
    numeric_dtypes = ['float64', 'int64', 'float32', 'int32']
    feature_cols = [
        col for col in gdf.columns
        if col not in exclude_cols and gdf[col].dtype in numeric_dtypes
    ]

    # Validation: warn if the feature count differs from the expected value
    if len(feature_cols) != N_FEATURES:
        print(f'    ⚠️  WARNING: Erwartet {N_FEATURES} Features, gefunden {len(feature_cols)}')

    # Extract features
    X = gdf[feature_cols].values.astype(np.float32)

    # Extract labels
    y = gdf['genus_latin'].values

    print(f'    ✅ {len(gdf):,} Samples, {X.shape[1]} Features')

    return X, y, gdf, feature_cols


def save_arrays(output_dir, X_train, y_train, X_val, y_val, scaler, suffix=''):
    """
    Save NumPy arrays and the fitted scaler.

    Args:
        output_dir: Target directory
        X_train, y_train, X_val, y_val: Arrays
        scaler: Fitted StandardScaler
        suffix: Optional suffix for the array file names (e.g. '_hamburg');
            the scaler is always written to 'scaler.pkl'
    """
    output_dir = Path(output_dir)

    np.save(output_dir / f'X_train{suffix}.npy', X_train)
    np.save(output_dir / f'y_train{suffix}.npy', y_train)
    np.save(output_dir / f'X_val{suffix}.npy', X_val)
    np.save(output_dir / f'y_val{suffix}.npy', y_val)

    with open(output_dir / 'scaler.pkl', 'wb') as f:
        pickle.dump(scaler, f)

    print(f'    ✅ Gespeichert in {output_dir}')


def save_test_arrays(output_dir, X_test, y_test, suffix=''):
    """
    Save test arrays.
    """
    output_dir = Path(output_dir)

    np.save(output_dir / f'X_test{suffix}.npy', X_test)
    np.save(output_dir / f'y_test{suffix}.npy', y_test)

    print(f'    ✅ Test-Arrays gespeichert in {output_dir}')


print('✅ Hilfsfunktionen definiert')

✅ Hilfsfunktionen definiert


## 4. Loading the Data

In [5]:
print('=== LADE SPLIT-DATEN ===')

# Hamburg
print('\nHamburg:')
X_train_hh, y_train_hh, _, feature_cols = load_split_data('hamburg_train.gpkg')
X_val_hh, y_val_hh, _, _ = load_split_data('hamburg_val.gpkg')

# Berlin
print('\nBerlin:')
X_train_be, y_train_be, _, _ = load_split_data('berlin_train.gpkg')
X_val_be, y_val_be, _, _ = load_split_data('berlin_val.gpkg')

# Rostock
print('\nRostock:')
X_test_rostock_zs, y_test_rostock_zs, _, _ = load_split_data('rostock_zero_shot.gpkg')
X_test_rostock_ft, y_test_rostock_ft, _, _ = load_split_data('rostock_finetune_eval.gpkg')

print('\n✅ Alle Daten geladen')

=== LADE SPLIT-DATEN ===

Hamburg:
  Lade hamburg_train.gpkg...
    ✅ 8,371 Samples, 184 Features
  Lade hamburg_val.gpkg...
    ✅ 2,129 Samples, 184 Features

Berlin:
  Lade berlin_train.gpkg...
    ✅ 8,299 Samples, 184 Features
  Lade berlin_val.gpkg...
    ✅ 1,989 Samples, 184 Features

Rostock:
  Lade rostock_zero_shot.gpkg...
    ✅ 6,675 Samples, 184 Features
  Lade rostock_finetune_eval.gpkg...
    ✅ 1,403 Samples, 184 Features

✅ Alle Daten geladen


## 5. Label Encoding

**Important:** The label encoder is fitted once on the full list of viable genera (not only on the training labels), so every split shares the same genus-to-integer mapping.
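
One detail worth keeping in mind: scikit-learn's `LabelEncoder` stores its classes in sorted order, so the encoded integers follow alphabetical order rather than the order of the input list. The short sketch below reuses the genus list from the configuration cell; the variable names `encoder`, `classes`, `tilia_index`, and `decoded` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

VIABLE_GENERA = ['TILIA', 'ACER', 'QUERCUS', 'FRAXINUS', 'BETULA', 'SORBUS', 'PRUNUS']

encoder = LabelEncoder().fit(VIABLE_GENERA)

# classes_ is sorted, so index i maps to classes_[i], not to VIABLE_GENERA[i].
classes = encoder.classes_.tolist()
tilia_index = int(encoder.transform(['TILIA'])[0])
decoded = encoder.inverse_transform([0, 6]).tolist()

print(classes)
print('TILIA ->', tilia_index)
print('0, 6 ->', decoded)
```

When turning encoded labels back into names, `encoder.inverse_transform` or `encoder.classes_` is therefore the safe lookup; indexing `VIABLE_GENERA` directly would return the wrong genus for most labels.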

In [6]:
print('=== LABEL ENCODING ===')

# Fit the label encoder
label_encoder = LabelEncoder()
label_encoder.fit(VIABLE_GENERA)

print(f'\nLabel Mapping:')
for i, genus in enumerate(label_encoder.classes_):
    print(f'  {i}: {genus}')

# Transform all labels
print('\nTransformiere Labels...')
y_train_hh = label_encoder.transform(y_train_hh)
y_val_hh = label_encoder.transform(y_val_hh)
y_train_be = label_encoder.transform(y_train_be)
y_val_be = label_encoder.transform(y_val_be)
y_test_rostock_zs = label_encoder.transform(y_test_rostock_zs)
y_test_rostock_ft = label_encoder.transform(y_test_rostock_ft)

# Save the label encoder
with open(MODEL_READY_DIR / 'label_encoder.pkl', 'wb') as f:
    pickle.dump(label_encoder, f)

print('✅ Label Encoding abgeschlossen')

=== LABEL ENCODING ===

Label Mapping:
  0: ACER
  1: BETULA
  2: FRAXINUS
  3: PRUNUS
  4: QUERCUS
  5: SORBUS
  6: TILIA

Transformiere Labels...
✅ Label Encoding abgeschlossen


## 6. Validation: Label Distribution

In [7]:
print('=== CHECK: Label Distribution ===')

def check_labels(y, name):
    unique_labels = np.unique(y)
    counts = pd.Series(y).value_counts().sort_index()
    print(f'\n{name}:')
    print(f'  Gattungen vorhanden: {len(unique_labels)}/{len(VIABLE_GENERA)}')
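    if len(unique_labels) < len(VIABLE_GENERA):
        # Encoded indices follow label_encoder.classes_ (sorted), not the order of VIABLE_GENERA
        missing = sorted(set(range(len(label_encoder.classes_))) - set(unique_labels))
        print(f'  ⚠️  Fehlende Gattungen: {[str(label_encoder.classes_[i]) for i in missing]}')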
    print(f'  Verteilung:')
    for label, count in counts.items():
        genus_name = label_encoder.inverse_transform([label])[0]
        print(f'    {genus_name}: {count:,} ({count/len(y)*100:.1f}%)')

check_labels(y_train_hh, 'Hamburg Train')
check_labels(y_val_hh, 'Hamburg Val')
check_labels(y_train_be, 'Berlin Train')
check_labels(y_val_be, 'Berlin Val')
check_labels(y_test_rostock_zs, 'Rostock Zero-Shot')
check_labels(y_test_rostock_ft, 'Rostock Fine-Tune Eval')

=== CHECK: Label Distribution ===

Hamburg Train:
  Gattungen vorhanden: 7/7
  Verteilung:
    ACER: 1,192 (14.2%)
    BETULA: 1,217 (14.5%)
    FRAXINUS: 1,157 (13.8%)
    PRUNUS: 1,194 (14.3%)
    QUERCUS: 1,200 (14.3%)
    SORBUS: 1,205 (14.4%)
    TILIA: 1,206 (14.4%)

Hamburg Val:
  Gattungen vorhanden: 7/7
  Verteilung:
    ACER: 308 (14.5%)
    BETULA: 283 (13.3%)
    FRAXINUS: 343 (16.1%)
    PRUNUS: 306 (14.4%)
    QUERCUS: 300 (14.1%)
    SORBUS: 295 (13.9%)
    TILIA: 294 (13.8%)

Berlin Train:
  Gattungen vorhanden: 7/7
  Verteilung:
    ACER: 1,224 (14.7%)
    BETULA: 1,198 (14.4%)
    FRAXINUS: 1,239 (14.9%)
    PRUNUS: 1,184 (14.3%)
    QUERCUS: 1,226 (14.8%)
    SORBUS: 1,026 (12.4%)
    TILIA: 1,202 (14.5%)

Berlin Val:
  Gattungen vorhanden: 7/7
  Verteilung:
    ACER: 276 (13.9%)
    BETULA: 302 (15.2%)
    FRAXINUS: 261 (13.1%)
    PRUNUS: 316 (15.9%)
    QUERCUS: 274 (13.8%)
    SORBUS: 262 (13.2%)
    TILIA: 298 (15.0%)

Rostock Zero-Shot:
  Gattungen vorhanden: 7

## 7. Normalization: Experiment 0/1 - Hamburg Single-City

In [8]:
print('\n=== EXPERIMENT 0/1: Hamburg Single-City ===')

# Fit the scaler on Hamburg train only
scaler_hamburg = StandardScaler()
scaler_hamburg.fit(X_train_hh)

# Transform Hamburg train/val
X_train_hh_scaled = scaler_hamburg.transform(X_train_hh)
X_val_hh_scaled = scaler_hamburg.transform(X_val_hh)

# Statistics (global mean/std over all features)
print('\nTrain (nach Skalierung):')
print(f'  Mean: {X_train_hh_scaled.mean():.4f} (sollte ~0 sein)')
print(f'  Std:  {X_train_hh_scaled.std():.4f} (sollte ~1 sein)')

print('\nVal (nach Skalierung):')
print(f'  Mean: {X_val_hh_scaled.mean():.4f} (kann ≠ 0 sein)')
print(f'  Std:  {X_val_hh_scaled.std():.4f} (kann ≠ 1 sein)')

# Save
output_dir = EXP_01_DIR / 'hamburg'
save_arrays(output_dir, X_train_hh_scaled, y_train_hh, X_val_hh_scaled, y_val_hh, scaler_hamburg)

print('\n✅ Hamburg Single-City abgeschlossen')


=== EXPERIMENT 0/1: Hamburg Single-City ===

Train (nach Skalierung):
  Mean: -0.0000 (sollte ~0 sein)
  Std:  1.0000 (sollte ~1 sein)

Val (nach Skalierung):
  Mean: 0.0148 (kann ≠ 0 sein)
  Std:  1.9760 (kann ≠ 1 sein)
    ✅ Gespeichert in /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/model_ready/experiment_0_1_single_city/hamburg

✅ Hamburg Single-City abgeschlossen
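
The global validation std of 1.9760 is noticeably further from 1 than the training value, and a single global figure cannot show which features are responsible. A per-feature breakdown is one way to investigate; the following is a minimal sketch on synthetic data, where the helper `per_feature_report` and the feature names are hypothetical and not part of the notebook.

```python
import numpy as np


def per_feature_report(X_scaled, feature_names, top_k=3):
    """Return the top_k features with the largest per-column std, largest first."""
    stds = X_scaled.std(axis=0)
    order = np.argsort(stds)[::-1][:top_k]
    return [(feature_names[i], float(stds[i])) for i in order]


rng = np.random.default_rng(7)

# Synthetic "scaled" matrix: four roughly unit-variance columns ...
X_demo = rng.normal(size=(500, 4))
# ... where a handful of extreme values inflate one column.
X_demo[:5, 2] += 40.0

names = ['f0', 'f1', 'f2', 'f3']
report = per_feature_report(X_demo, names, top_k=2)

for name, std in report:
    print(f'{name}: std = {std:.2f}')
```

In the notebook's setting the same function could be called with `X_val_hh_scaled` and `feature_cols` to see whether a few columns with outliers dominate the global figure.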


## 8. Normalization: Experiment 0/1 - Berlin Single-City

In [9]:
print('\n=== EXPERIMENT 0/1: Berlin Single-City ===')

# Fit the scaler on Berlin train only
scaler_berlin = StandardScaler()
scaler_berlin.fit(X_train_be)

# Transform Berlin train/val
X_train_be_scaled = scaler_berlin.transform(X_train_be)
X_val_be_scaled = scaler_berlin.transform(X_val_be)

# Statistics (global mean/std over all features)
print('\nTrain (nach Skalierung):')
print(f'  Mean: {X_train_be_scaled.mean():.4f} (sollte ~0 sein)')
print(f'  Std:  {X_train_be_scaled.std():.4f} (sollte ~1 sein)')

print('\nVal (nach Skalierung):')
print(f'  Mean: {X_val_be_scaled.mean():.4f} (kann ≠ 0 sein)')
print(f'  Std:  {X_val_be_scaled.std():.4f} (kann ≠ 1 sein)')

# Save
output_dir = EXP_01_DIR / 'berlin'
save_arrays(output_dir, X_train_be_scaled, y_train_be, X_val_be_scaled, y_val_be, scaler_berlin)

print('\n✅ Berlin Single-City abgeschlossen')


=== EXPERIMENT 0/1: Berlin Single-City ===

Train (nach Skalierung):
  Mean: -0.0000 (sollte ~0 sein)
  Std:  1.0000 (sollte ~1 sein)

Val (nach Skalierung):
  Mean: -0.0382 (kann ≠ 0 sein)
  Std:  1.0143 (kann ≠ 1 sein)
    ✅ Gespeichert in /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/model_ready/experiment_0_1_single_city/berlin

✅ Berlin Single-City abgeschlossen


## 9. Normalization: Experiment 2 - Cross-City Transfer (Hamburg+Berlin → Rostock)

**Critical:** The scaler is fitted on the combined Hamburg+Berlin training data, never on Rostock.
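
To make the combined fit concrete, here is a small sketch on synthetic data (the two "city" arrays are made up). It shows that fitting on the stacked training sets yields pooled statistics, and that `StandardScaler.partial_fit` called once per city arrives at the same result, which can be handy when the arrays are too large to stack.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical training features for two source cities with different distributions.
X_city_a = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
X_city_b = rng.normal(loc=2.0, scale=3.0, size=(700, 4))

# Variant 1: stack both training sets and fit once.
X_combined = np.vstack([X_city_a, X_city_b])
scaler_combined = StandardScaler().fit(X_combined)

# Variant 2: update the running statistics city by city.
scaler_incremental = StandardScaler()
scaler_incremental.partial_fit(X_city_a)
scaler_incremental.partial_fit(X_city_b)

same_mean = np.allclose(scaler_combined.mean_, scaler_incremental.mean_)
same_scale = np.allclose(scaler_combined.scale_, scaler_incremental.scale_)

print('pooled mean: ', np.round(scaler_combined.mean_, 3))
print('same mean:   ', same_mean)
print('same scale:  ', same_scale)
```

Because the statistics are pooled over samples, the city with more training rows contributes more to the mean and variance.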

In [10]:
print('\n=== EXPERIMENT 2: Cross-City Transfer (Hamburg+Berlin → Rostock) ===')

# Combine Hamburg + Berlin train
X_train_combined = np.vstack([X_train_hh, X_train_be])
print(f'\nCombined Train: {X_train_combined.shape[0]:,} Samples')

# Fit the scaler on the combined train set
scaler_cross_city = StandardScaler()
scaler_cross_city.fit(X_train_combined)

# Transform all splits
print('\nTransformiere...')
X_train_hh_cc = scaler_cross_city.transform(X_train_hh)
X_train_be_cc = scaler_cross_city.transform(X_train_be)
X_val_hh_cc = scaler_cross_city.transform(X_val_hh)
X_val_be_cc = scaler_cross_city.transform(X_val_be)
X_test_rostock_zs_scaled = scaler_cross_city.transform(X_test_rostock_zs)

# Statistics (global mean/std over all features)
print('\nTrain Combined (nach Skalierung):')
X_train_combined_scaled = np.vstack([X_train_hh_cc, X_train_be_cc])
print(f'  Mean: {X_train_combined_scaled.mean():.4f} (sollte ~0 sein)')
print(f'  Std:  {X_train_combined_scaled.std():.4f} (sollte ~1 sein)')

print('\nRostock Zero-Shot Test (nach Skalierung):')
print(f'  Mean: {X_test_rostock_zs_scaled.mean():.4f} (kann ≠ 0 sein)')
print(f'  Std:  {X_test_rostock_zs_scaled.std():.4f} (kann ≠ 1 sein)')

# Save
print('\nSpeichere...')
np.save(EXP_02_DIR / 'X_train_hamburg.npy', X_train_hh_cc)
np.save(EXP_02_DIR / 'y_train_hamburg.npy', y_train_hh)
np.save(EXP_02_DIR / 'X_train_berlin.npy', X_train_be_cc)
np.save(EXP_02_DIR / 'y_train_berlin.npy', y_train_be)
np.save(EXP_02_DIR / 'X_val_hamburg.npy', X_val_hh_cc)
np.save(EXP_02_DIR / 'y_val_hamburg.npy', y_val_hh)
np.save(EXP_02_DIR / 'X_val_berlin.npy', X_val_be_cc)
np.save(EXP_02_DIR / 'y_val_berlin.npy', y_val_be)
np.save(EXP_02_DIR / 'X_test_rostock_zero_shot.npy', X_test_rostock_zs_scaled)
np.save(EXP_02_DIR / 'y_test_rostock_zero_shot.npy', y_test_rostock_zs)

with open(EXP_02_DIR / 'scaler.pkl', 'wb') as f:
    pickle.dump(scaler_cross_city, f)

print(f'  ✅ Gespeichert in {EXP_02_DIR}')

print('\n✅ Cross-City Transfer abgeschlossen')


=== EXPERIMENT 2: Cross-City Transfer (Hamburg+Berlin → Rostock) ===

Combined Train: 16,670 Samples

Transformiere...

Train Combined (nach Skalierung):
  Mean: -0.0000 (sollte ~0 sein)
  Std:  1.0000 (sollte ~1 sein)

Rostock Zero-Shot Test (nach Skalierung):
  Mean: 0.1568 (kann ≠ 0 sein)
  Std:  1.0341 (kann ≠ 1 sein)

Speichere...
  ✅ Gespeichert in /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/model_ready/experiment_2_cross_city

✅ Cross-City Transfer abgeschlossen


## 10. Normalization: Experiment 3 - Fine-Tuning Eval

**Important:** Uses the same scaler as Exp. 2 (`scaler_cross_city`), so both experiments share one feature scale.
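
Reusing a scaler across experiments amounts to persisting the fitted object and applying it unchanged to new data. The sketch below illustrates that round trip with `pickle` in a temporary directory; the arrays and the path are placeholders, not the project's files.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical source-city training data and target-city evaluation data.
X_source_train = rng.normal(loc=10.0, scale=4.0, size=(300, 5))
X_target_eval = rng.normal(loc=11.0, scale=5.0, size=(50, 5))

scaler = StandardScaler().fit(X_source_train)

with tempfile.TemporaryDirectory() as tmp:
    scaler_path = Path(tmp) / 'scaler.pkl'

    # Persist the fitted scaler ...
    with open(scaler_path, 'wb') as f:
        pickle.dump(scaler, f)

    # ... and load it again, as a later experiment would.
    with open(scaler_path, 'rb') as f:
        loaded_scaler = pickle.load(f)

# The loaded scaler applies the source-city statistics; it is not refitted.
X_eval_scaled = loaded_scaler.transform(X_target_eval)
expected = (X_target_eval - X_source_train.mean(axis=0)) / X_source_train.std(axis=0)

print('matches manual scaling:', np.allclose(X_eval_scaled, expected))
```

Pickled scikit-learn objects are generally only safe to load with the same library version that wrote them, which is worth noting when the exported files are used in a different environment.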

In [11]:
print('\n=== EXPERIMENT 3: Fine-Tuning Eval ===')

# Transform Rostock fine-tuning eval with the cross-city scaler
X_test_rostock_ft_scaled = scaler_cross_city.transform(X_test_rostock_ft)

# Statistics (global mean/std over all features)
print('\nRostock Fine-Tuning Eval (nach Skalierung):')
print(f'  Mean: {X_test_rostock_ft_scaled.mean():.4f}')
print(f'  Std:  {X_test_rostock_ft_scaled.std():.4f}')

# Save
print('\nSpeichere...')
np.save(EXP_03_DIR / 'X_test_rostock_finetune_eval.npy', X_test_rostock_ft_scaled)
np.save(EXP_03_DIR / 'y_test_rostock_finetune_eval.npy', y_test_rostock_ft)

# Copy the Exp. 2 scaler (a real copy, not a symlink) so the Exp. 3 folder is self-contained
import shutil
shutil.copy(EXP_02_DIR / 'scaler.pkl', EXP_03_DIR / 'scaler.pkl')

print(f'  ✅ Gespeichert in {EXP_03_DIR}')
print('  ℹ️  Nutzt denselben Scaler wie Exp. 2 (scaler_cross_city)')

print('\n✅ Fine-Tuning Eval abgeschlossen')


=== EXPERIMENT 3: Fine-Tuning Eval ===

Rostock Fine-Tuning Eval (nach Skalierung):
  Mean: 0.1621
  Std:  1.0837

Speichere...
  ✅ Gespeichert in /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/model_ready/experiment_3_finetuning
  ℹ️  Nutzt denselben Scaler wie Exp. 2 (scaler_cross_city)

✅ Fine-Tuning Eval abgeschlossen


## 11. Export: Feature Names & Metadata

In [12]:
print('\n=== EXPORT: Feature Names & Metadaten ===')

# Feature Names
feature_metadata = {
    'n_features': len(feature_cols),
    'feature_names': feature_cols,
    'feature_groups': {
        'sentinel2_bands': [col for col in feature_cols if any(band in col for band in ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12'])],
        'vegetation_indices': [col for col in feature_cols if any(idx in col for idx in ['NDre', 'NDVIre', 'kNDVI', 'VARI', 'RTVIcore'])],
        'chm_features': [col for col in feature_cols if 'CHM' in col or col == 'height_m']
    },
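    # Stored in encoded-label order (index i = class i), matching label_encoder.classes_
    'viable_genera': label_encoder.classes_.tolist(),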
    'n_classes': len(VIABLE_GENERA)
}

# Save
with open(MODEL_READY_DIR / 'feature_names.json', 'w') as f:
    json.dump(feature_metadata, f, indent=2)

print(f'✅ Feature Metadata gespeichert: {MODEL_READY_DIR / "feature_names.json"}')
print(f'   {feature_metadata["n_features"]} Features total')
print(f'   - Sentinel-2 Bands: {len(feature_metadata["feature_groups"]["sentinel2_bands"])}')
print(f'   - Vegetation Indices: {len(feature_metadata["feature_groups"]["vegetation_indices"])}')
print(f'   - CHM Features: {len(feature_metadata["feature_groups"]["chm_features"])}')


=== EXPORT: Feature Names & Metadaten ===
✅ Feature Metadata gespeichert: /content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/model_ready/feature_names.json
   184 Features total
   - Sentinel-2 Bands: 120
   - Vegetation Indices: 60
   - CHM Features: 4


## 12. Validation: Shape & Consistency

In [13]:
print('\n=== FINAL VALIDATION: Shape & Konsistenz ===')

def validate_arrays(name, X, y, expected_features=N_FEATURES):
    print(f'\n{name}:')
    print(f'  X Shape: {X.shape}')
    print(f'  y Shape: {y.shape}')

    # Shape Checks
    assert X.shape[0] == y.shape[0], f"❌ Sample count mismatch: X={X.shape[0]}, y={y.shape[0]}"
    assert X.shape[1] == expected_features, f"❌ Feature count mismatch: expected {expected_features}, got {X.shape[1]}"

    # NaN Check
    if np.isnan(X).any():
        print(f"  ⚠️  WARNING: {np.isnan(X).sum()} NaN values in X")

    # Label Check
    unique_labels = np.unique(y)
    if len(unique_labels) < len(VIABLE_GENERA):
        print(f"  ⚠️  WARNING: Only {len(unique_labels)}/{len(VIABLE_GENERA)} classes present")

    print(f'  ✅ Validation passed')

# Experiment 0/1
validate_arrays('Hamburg Train', X_train_hh_scaled, y_train_hh)
validate_arrays('Hamburg Val', X_val_hh_scaled, y_val_hh)
validate_arrays('Berlin Train', X_train_be_scaled, y_train_be)
validate_arrays('Berlin Val', X_val_be_scaled, y_val_be)

# Experiment 2
validate_arrays('Rostock Zero-Shot', X_test_rostock_zs_scaled, y_test_rostock_zs)

# Experiment 3
validate_arrays('Rostock Fine-Tune Eval', X_test_rostock_ft_scaled, y_test_rostock_ft)

print('\n✅ Alle Validierungen bestanden')


=== FINAL VALIDATION: Shape & Konsistenz ===

Hamburg Train:
  X Shape: (8371, 184)
  y Shape: (8371,)
  ✅ Validation passed

Hamburg Val:
  X Shape: (2129, 184)
  y Shape: (2129,)
  ✅ Validation passed

Berlin Train:
  X Shape: (8299, 184)
  y Shape: (8299,)
  ✅ Validation passed

Berlin Val:
  X Shape: (1989, 184)
  y Shape: (1989,)
  ✅ Validation passed

Rostock Zero-Shot:
  X Shape: (6675, 184)
  y Shape: (6675,)
  ✅ Validation passed

Rostock Fine-Tune Eval:
  X Shape: (1403, 184)
  y Shape: (1403,)
  ✅ Validation passed

✅ Alle Validierungen bestanden


## 13. Summary

In [15]:
print('\n' + '='*70)
print('PHASE 5 ABGESCHLOSSEN: Feature Normalisierung & Model-Ready Export')
print('='*70)

print('\n📊 DATENSATZ-STATISTIK:')

print('\nExperiment 0/1 - Single-City:')
print(f'  Hamburg Train: {X_train_hh_scaled.shape[0]:,} Samples')
print(f'  Hamburg Val:   {X_val_hh_scaled.shape[0]:,} Samples')
print(f'  Berlin Train:  {X_train_be_scaled.shape[0]:,} Samples')
print(f'  Berlin Val:    {X_val_be_scaled.shape[0]:,} Samples')

print('\nExperiment 2 - Cross-City Transfer:')
print(f'  Combined Train (HH+BE): {X_train_combined_scaled.shape[0]:,} Samples')
print(f'  Rostock Zero-Shot Test: {X_test_rostock_zs_scaled.shape[0]:,} Samples')

print('\nExperiment 3 - Fine-Tuning:')
print(f'  Rostock Fine-Tune Eval: {X_test_rostock_ft_scaled.shape[0]:,} Samples')

print('\n📁 EXPORTIERTE DATEIEN:')
print(f'\n{MODEL_READY_DIR}/')
print('├── experiment_0_1_single_city/')
print('│   ├── hamburg/')
print('│   │   ├── X_train.npy, y_train.npy')
print('│   │   ├── X_val.npy, y_val.npy')
print('│   │   └── scaler.pkl')
print('│   └── berlin/')
print('│       ├── X_train.npy, y_train.npy')
print('│       ├── X_val.npy, y_val.npy')
print('│       └── scaler.pkl')
print('├── experiment_2_cross_city/')
print('│   ├── X_train_hamburg.npy, y_train_hamburg.npy')
print('│   ├── X_train_berlin.npy, y_train_berlin.npy')
print('│   ├── X_val_hamburg.npy, y_val_hamburg.npy')
print('│   ├── X_val_berlin.npy, y_val_berlin.npy')
print('│   ├── X_test_rostock_zero_shot.npy')
print('│   ├── y_test_rostock_zero_shot.npy')
print('│   └── scaler.pkl (Hamburg+Berlin Combined)')
print('├── experiment_3_finetuning/')
print('│   ├── X_test_rostock_finetune_eval.npy')
print('│   ├── y_test_rostock_finetune_eval.npy')
print('│   └── scaler.pkl (Copy of Exp. 2)')
print('├── feature_names.json')
print('└── label_encoder.pkl')

print('\n🔬 SCALER-STRATEGIE:')
print('  ✅ Hamburg Single-City: Scaler auf Hamburg Train gefittet')
print('  ✅ Berlin Single-City: Scaler auf Berlin Train gefittet')
print('  ✅ Cross-City (Exp. 2): Scaler auf Hamburg+Berlin Combined gefittet')
print('  ✅ Fine-Tuning (Exp. 3): Nutzt Exp. 2 Scaler (kein Leakage!)')

print('\n✅ KEIN DATA LEAKAGE - Scaler immer NUR auf Train gefittet!')
print('\n⏭️  NÄCHSTER SCHRITT: Experiment 0 - RF/CNN Baseline Training')


PHASE 5 ABGESCHLOSSEN: Feature Normalisierung & Model-Ready Export

📊 DATENSATZ-STATISTIK:

Experiment 0/1 - Single-City:
  Hamburg Train: 8,371 Samples
  Hamburg Val:   2,129 Samples
  Berlin Train:  8,299 Samples
  Berlin Val:    1,989 Samples

Experiment 2 - Cross-City Transfer:
  Combined Train (HH+BE): 16,670 Samples
  Rostock Zero-Shot Test: 6,675 Samples

Experiment 3 - Fine-Tuning:
  Rostock Fine-Tune Eval: 1,403 Samples

📁 EXPORTIERTE DATEIEN:

/content/drive/MyDrive/Studium/Geoinformation/Module/Projektarbeit/data/model_ready/
├── experiment_0_1_single_city/
│   ├── hamburg/
│   │   ├── X_train.npy, y_train.npy
│   │   ├── X_val.npy, y_val.npy
│   │   └── scaler.pkl
│   └── berlin/
│       ├── X_train.npy, y_train.npy
│       ├── X_val.npy, y_val.npy
│       └── scaler.pkl
├── experiment_2_cross_city/
│   ├── X_train_hamburg.npy, y_train_hamburg.npy
│   ├── X_train_berlin.