# City2TABULA Validation Notebook

This notebook validates the calculations performed by the City2TABULA pipeline by comparing calculated building attributes against source thematic data from CityGML/CityJSON datasets.

## Validation Strategy

1. **Building-Level Attributes**: Height, footprint area, aggregated surface areas
2. **Surface-Level Attributes**: Individual surface area, tilt (roof only), azimuth (roof only)

The validation uses a configuration-driven approach where source property names are mapped to City2TABULA calculated columns via YAML configuration files.

## Stage 0: Load Configuration and Setup Database Connection

Load the validation configuration from YAML file based on the `COUNTRY` environment variable. The configuration contains:
- Dataset information and metadata
- Attribute mappings (source property names → City2TABULA columns)
- Database connection settings (automatically configured)
- Validation tolerances

In [7]:
# Add parent directory to Python path to import validation modules
import sys
import os

# Get the notebook directory
notebook_dir = os.getcwd()
print(f"Notebook directory: {notebook_dir}")

# Add to path (no need to go up if already in validation/)
if notebook_dir not in sys.path:
    sys.path.insert(0, notebook_dir)

# Configure matplotlib backend BEFORE importing pyplot
import matplotlib
# Use non-interactive backend for saving figures
matplotlib.use('Agg')  # Non-GUI backend for file output

# Now import from modules
from modules.config import load_config, print_config_summary
from modules.db import get_db_engine

# Get country from environment variable
country = os.getenv('COUNTRY', 'germany').lower()

# Build path to config file
config_path = os.path.join('configs', f'config_{country}.yaml')

# Load configuration
print(f"\nLoading configuration from: config_{country}.yaml")
config = load_config(config_path)

# Display configuration summary
print_config_summary(config)

# Set up output directory
output_dir = os.path.join('outputs')
os.makedirs(output_dir, exist_ok=True)
print(f"\nOutput directory: {output_dir}")

# Figure output format
fig_format = 'png'  # Options: 'png', 'svg', 'pdf', 'ipe'

# Initialize database engine
print("\nInitializing database connection...")
db_engine = get_db_engine(config)
print(f"Connected to database: city2tabula_{country}")

Notebook directory: /home/jayravani/projects/work/City2TABULA/validation

Loading configuration from: config_germany.yaml
Loaded configuration for: Germany

CONFIGURATION SUMMARY

 Dataset: LoD2 Dataset of Bavaria
   Country: Germany
   LoD: 2
   Description: Bavarian 3D city models in CityGML format with German property names

 Building Attributes:
   min_height           <- 'value' (m)
   max_height           <- 'value' (m)
   footprint_area       <- 'Flaeche' (m²)

 Surface Attributes:
   ROOF:
   surface_area         <- 'Flaeche' (m²)
   tilt                 <- 'Dachneigung' (degrees)
   azimuth              <- 'Dachorientierung' (degrees)
   WALL:
   surface_area         <- 'Flaeche' (m²)
   FLOOR:
   surface_area         <- 'Flaeche' (m²)

 Validation Tolerances:
   Absolute:
   height               ±0.5
   tilt                 ±2.0
   azimuth              ±5.0
   Percentage:
   footprint_area       ±5.0%
   surface_area         ±5.0%

Output directory: outputs

Initializing data

## Stage 1: Load Data from PostgreSQL Database

Load calculated data from City2TABULA tables and extract attribute mappings from config.

In [8]:
from modules.utils import load_city2tabula_data

# Load calculated data from City2TABULA tables
bf_df, sf_df = load_city2tabula_data(db_engine, config)

print("\nData loading complete.")
display(bf_df.head())
display(sf_df.head())

Loading building features from city2tabula.lod2_building_feature...
Loaded 26317 buildings
Loading surface features from city2tabula.lod2_child_feature_surface...
Loaded 26317 buildings
Loading surface features from city2tabula.lod2_child_feature_surface...
Loaded 376831 surfaces

Data loading complete.
Loaded 376831 surfaces

Data loading complete.


Unnamed: 0,id,building_feature_id,tabula_variant_code_id,tabula_variant_code,construction_year,comment,heating_demand,heating_demand_unit,footprint_area,footprint_complexity,...,area_total_roof_unit,area_total_wall,area_total_wall_unit,area_total_floor,area_total_floor_unit,surface_count_floor,surface_count_roof,surface_count_wall,building_centroid_geom,building_footprint_geom
0,bd818a73-4af7-4d67-b92a-04944e9bf0cd,268185,31,DE.N.MFH.04.Gen.ReEx.001.001,,,,,1416.86153,1,...,sqm,157.205249,sqm,0.0,sqm,0,1,4,0101000020E86400003989418080022841C420B00E56A9...,01060000A0E86400000100000001030000800100000005...
1,60a6a4ea-1440-4cb3-a482-8752d9db97f6,268192,121,DE.N.TH.02.Gen.ReEx.001.001,,,,,44.7281,1,...,sqm,82.992452,sqm,44.7281,sqm,0,2,7,0101000020E8640000B60021D8CD062841076BCC19CFAA...,01060000A0E86400000100000001030000800100000007...
2,2d749319-a0d9-4379-8d80-8fdcc2d53448,268203,43,DE.N.MFH.08.Gen.ReEx.001.001,,,,,260.1117,2,...,sqm,947.63963,sqm,1300.5585,sqm,0,4,15,0101000020E8640000458D70FC240428411E0C3245B5AA...,01060000A0E8640000010000000103000080010000000B...
3,e46159a8-fcb0-4db7-8f82-5d61610331b4,20250,127,DE.N.TH.04.Gen.ReEx.001.001,,,,,15.925839,1,...,sqm,129.442293,sqm,31.851678,sqm,0,3,10,0101000020E86400006BEBF03E83162841A83F3F2D66AA...,01060000A0E86400000100000001030000800100000005...
4,26fddfce-c3df-4218-8b3b-8212a6c8957a,109073,121,DE.N.TH.02.Gen.ReEx.001.001,,,,,19.69275,0,...,sqm,65.564082,sqm,39.3855,sqm,0,1,3,0101000020E864000037D0698313212841C058F2CBBCA9...,01060000A0E86400000100000001030000800100000004...


Unnamed: 0,surface_feature_id,building_feature_id,objectclass_id,classname,surface_area,tilt,azimuth,is_valid,is_planar
0,35700,36438,709,WallSurface,22.572968,0.0,-1.0,False,True
1,35221,35216,712,RoofSurface,24.055029,59.594224,194.493579,True,True
2,32993,32980,709,WallSurface,42.060006,0.0,-1.0,False,True
3,42877,42876,709,WallSurface,23.298616,0.0,-1.0,True,True
4,42846,42823,709,WallSurface,2.470607,0.0,-1.0,False,True


## Stage 2: Validate Surface Attributes

Validate calculated surface attributes (area, tilt, azimuth) against source thematic data.

In [9]:
from modules.utils import load_thematic_building_data, load_thematic_surface_data
from modules.validators import validate_building_attributes, validate_surface_attributes, get_validation_summary
from modules.config import get_building_attribute_mapping, get_surface_attribute_mapping
import pandas as pd

# =============================================================================
# BUILDING-LEVEL VALIDATION
# =============================================================================
print("="*80)
print("BUILDING-LEVEL ATTRIBUTE VALIDATION")
print("="*80)

# Get building attribute mapping
building_attr_map = get_building_attribute_mapping(config)
print(f"\nValidating {len(building_attr_map)} building attributes:")
for attr, label in building_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Get building IDs
building_ids = bf_df['building_feature_id'].tolist()
print(f"\nBuildings to validate: {len(building_ids)}")

# Load thematic data from CityDB
building_thematic_df = load_thematic_building_data(
    engine=db_engine,
    config=config,
    building_feature_ids=building_ids,
    attribute_mapping=building_attr_map
)

# Validate building attributes
building_validation_df = validate_building_attributes(
    building_calc_df=bf_df,
    building_thematic_df=building_thematic_df,
    attribute_mapping=building_attr_map
)

# Display summary
if not building_validation_df.empty:
    building_summary = get_validation_summary(building_validation_df)
    print("\n" + "="*80)
    print("BUILDING VALIDATION SUMMARY")
    print("="*80)
    display(building_summary)
else:
    print("No building validation results")

# =============================================================================
# SURFACE-LEVEL VALIDATION (ROOFS)
# =============================================================================
print("\n" + "="*80)
print("ROOF SURFACE ATTRIBUTE VALIDATION")
print("="*80)

# Get roof attribute mapping
roof_attr_map = get_surface_attribute_mapping(config, 'roof')
print(f"\nValidating {len(roof_attr_map)} roof attributes:")
for attr, label in roof_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Filter for roof surfaces
roof_surfaces_df = sf_df[sf_df['classname'] == 'RoofSurface'].copy()
roof_ids = roof_surfaces_df['surface_feature_id'].tolist()
print(f"\nRoof surfaces to validate: {len(roof_ids)}")

if roof_ids:
    # Load thematic data from CityDB
    roof_thematic_df = load_thematic_surface_data(
        engine=db_engine,
        config=config,
        surface_feature_ids=roof_ids,
        attribute_mapping=roof_attr_map,
        surface_type='RoofSurface'
    )

    # Validate roof attributes
    roof_validation_df = validate_surface_attributes(
        surface_calc_df=sf_df,
        surface_thematic_df=roof_thematic_df,
        attribute_mapping=roof_attr_map,
        surface_type='RoofSurface'
    )

    # Display summary
    if not roof_validation_df.empty:
        roof_summary = get_validation_summary(roof_validation_df)
        print("\n" + "="*80)
        print("ROOF VALIDATION SUMMARY")
        print("="*80)
        display(roof_summary)
    else:
        print("No roof validation results")
else:
    print("No roof surfaces found")
    roof_validation_df = pd.DataFrame()

# =============================================================================
# SURFACE-LEVEL VALIDATION (WALLS)
# =============================================================================
print("\n" + "="*80)
print("WALL SURFACE ATTRIBUTE VALIDATION")
print("="*80)

# Get wall attribute mapping
wall_attr_map = get_surface_attribute_mapping(config, 'wall')
print(f"\nValidating {len(wall_attr_map)} wall attributes:")
for attr, label in wall_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Filter for wall surfaces
wall_surfaces_df = sf_df[sf_df['classname'] == 'WallSurface'].copy()
wall_ids = wall_surfaces_df['surface_feature_id'].tolist()
print(f"\nWall surfaces to validate: {len(wall_ids)}")

if wall_ids and wall_attr_map:
    # Load thematic data from CityDB
    wall_thematic_df = load_thematic_surface_data(
        engine=db_engine,
        config=config,
        surface_feature_ids=wall_ids,
        attribute_mapping=wall_attr_map,
        surface_type='WallSurface'
    )

    # Validate wall attributes
    wall_validation_df = validate_surface_attributes(
        surface_calc_df=sf_df,
        surface_thematic_df=wall_thematic_df,
        attribute_mapping=wall_attr_map,
        surface_type='WallSurface'
    )

    # Display summary
    if not wall_validation_df.empty:
        wall_summary = get_validation_summary(wall_validation_df)
        print("\n" + "="*80)
        print("WALL VALIDATION SUMMARY")
        print("="*80)
        display(wall_summary)
    else:
        print("No wall validation results")
else:
    print("No wall surfaces or attributes to validate")
    wall_validation_df = pd.DataFrame()

# =============================================================================
# SURFACE-LEVEL VALIDATION (FLOORS/GROUND)
# =============================================================================
print("\n" + "="*80)
print("FLOOR/GROUND SURFACE ATTRIBUTE VALIDATION")
print("="*80)

# Get floor attribute mapping
floor_attr_map = get_surface_attribute_mapping(config, 'floor')
print(f"\nValidating {len(floor_attr_map)} floor attributes:")
for attr, label in floor_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Filter for ground surfaces
floor_surfaces_df = sf_df[sf_df['classname'] == 'GroundSurface'].copy()
floor_ids = floor_surfaces_df['surface_feature_id'].tolist()
print(f"\nFloor/Ground surfaces to validate: {len(floor_ids)}")

if floor_ids and floor_attr_map:
    # Load thematic data from CityDB
    floor_thematic_df = load_thematic_surface_data(
        engine=db_engine,
        config=config,
        surface_feature_ids=floor_ids,
        attribute_mapping=floor_attr_map,
        surface_type='GroundSurface'
    )

    # Validate floor attributes
    floor_validation_df = validate_surface_attributes(
        surface_calc_df=sf_df,
        surface_thematic_df=floor_thematic_df,
        attribute_mapping=floor_attr_map,
        surface_type='GroundSurface'
    )

    # Display summary
    if not floor_validation_df.empty:
        floor_summary = get_validation_summary(floor_validation_df)
        print("\n" + "="*80)
        print("FLOOR/GROUND VALIDATION SUMMARY")
        print("="*80)
        display(floor_summary)
    else:
        print("No floor validation results")
else:
    print("No floor/ground surfaces or attributes to validate")
    floor_validation_df = pd.DataFrame()

BUILDING-LEVEL ATTRIBUTE VALIDATION

Validating 3 building attributes:
  - min_height: 'value'
  - max_height: 'value'
  - footprint_area: 'Flaeche'

Buildings to validate: 26317
Loaded thematic data for 52634 building attribute values
Validated 52634 building attribute values across 26317 buildings

BUILDING VALIDATION SUMMARY
Loaded thematic data for 52634 building attribute values
Validated 52634 building attribute values across 26317 buildings

BUILDING VALIDATION SUMMARY


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,max_height,26317,2.7881,2.3583,3.6517,53.3025,32.6402,116.0297
1,min_height,26317,0.9439,2.273,2.4612,28.6414,0.0,117.3885



ROOF SURFACE ATTRIBUTE VALIDATION

Validating 3 roof attributes:
  - surface_area: 'Flaeche'
  - tilt: 'Dachneigung'
  - azimuth: 'Dachorientierung'

Roof surfaces to validate: 76023
Loaded thematic data for 170862 RoofSurface attribute values
  Excluded 13692 surfaces with azimuth = -1 (flat roofs/undefined)
Validated 214377 RoofSurface attribute values across 56954 surfaces

ROOF VALIDATION SUMMARY
Loaded thematic data for 170862 RoofSurface attribute values
  Excluded 13692 surfaces with azimuth = -1 (flat roofs/undefined)
Validated 214377 RoofSurface attribute values across 56954 surfaces

ROOF VALIDATION SUMMARY


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,azimuth,62331,0.3297,41.8805,41.8814,349.7754,-0.0002,19108.5761
1,surface_area,76023,-1.0772,15.3251,15.3628,-1.4898,-0.0001,8.7721
2,tilt,76023,-0.4806,7.2407,7.2565,-0.4712,0.0,52.3048



WALL SURFACE ATTRIBUTE VALIDATION

Validating 1 wall attributes:
  - surface_area: 'Flaeche'

Wall surfaces to validate: 262163
Loaded thematic data for 185391 WallSurface attribute values
Validated 262163 WallSurface attribute values across 185391 surfaces

WALL VALIDATION SUMMARY
Loaded thematic data for 185391 WallSurface attribute values
Validated 262163 WallSurface attribute values across 185391 surfaces

WALL VALIDATION SUMMARY


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,surface_area,262163,-2.0636,31.8479,31.9146,-2.6607,0.0001,16.8403



FLOOR/GROUND SURFACE ATTRIBUTE VALIDATION

Validating 1 floor attributes:
  - surface_area: 'Flaeche'

Floor/Ground surfaces to validate: 38645
Loaded thematic data for 26317 GroundSurface attribute values
Validated 38645 GroundSurface attribute values across 26317 surfaces

FLOOR/GROUND VALIDATION SUMMARY
Loaded thematic data for 26317 GroundSurface attribute values
Validated 38645 GroundSurface attribute values across 26317 surfaces

FLOOR/GROUND VALIDATION SUMMARY


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,surface_area,38645,0.0137,0.2359,0.2363,0.0087,0.0,0.032


In [10]:
# =============================================================================
# SAVE VALIDATION RESULTS
# =============================================================================
import os
from datetime import datetime

# Create timestamped output directory
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_dir = os.path.join(output_dir, config.get('dataset', {}).get('country', 'unknown'), f"validation_{timestamp}")
os.makedirs(results_dir, exist_ok=True)

print(f"\nSaving validation results to: {results_dir}")

# Save building validation results
if not building_validation_df.empty:
    building_output = os.path.join(results_dir, "building_validation.csv")
    building_validation_df.to_csv(building_output, index=False)
    print(f"Saved building validation: {building_output}")
    
    building_summary_output = os.path.join(results_dir, "building_summary.csv")
    building_summary.to_csv(building_summary_output, index=False)
    print(f"Saved building summary: {building_summary_output}")

# Save roof validation results
if not roof_validation_df.empty:
    roof_output = os.path.join(results_dir, "roof_validation.csv")
    roof_validation_df.to_csv(roof_output, index=False)
    print(f"Saved roof validation: {roof_output}")
    
    roof_summary_output = os.path.join(results_dir, "roof_summary.csv")
    roof_summary.to_csv(roof_summary_output, index=False)
    print(f"Saved roof summary: {roof_summary_output}")

# Save wall validation results
if not wall_validation_df.empty:
    wall_output = os.path.join(results_dir, "wall_validation.csv")
    wall_validation_df.to_csv(wall_output, index=False)
    print(f"Saved wall validation: {wall_output}")
    
    wall_summary_output = os.path.join(results_dir, "wall_summary.csv")
    wall_summary.to_csv(wall_summary_output, index=False)
    print(f"Saved wall summary: {wall_summary_output}")

# Save floor validation results
if not floor_validation_df.empty:
    floor_output = os.path.join(results_dir, "floor_validation.csv")
    floor_validation_df.to_csv(floor_output, index=False)
    
    floor_summary_output = os.path.join(results_dir, "floor_summary.csv")
    floor_summary.to_csv(floor_summary_output, index=False)
    print(f"Saved floor summary: {floor_summary_output}")

print(f"\n{'='*80}")
print("RESULTS SAVED")
print(f"{'='*80}")


Saving validation results to: outputs/Germany/validation_20260217_225807
Saved building validation: outputs/Germany/validation_20260217_225807/building_validation.csv
Saved building summary: outputs/Germany/validation_20260217_225807/building_summary.csv
Saved building validation: outputs/Germany/validation_20260217_225807/building_validation.csv
Saved building summary: outputs/Germany/validation_20260217_225807/building_summary.csv
Saved roof validation: outputs/Germany/validation_20260217_225807/roof_validation.csv
Saved roof summary: outputs/Germany/validation_20260217_225807/roof_summary.csv
Saved roof validation: outputs/Germany/validation_20260217_225807/roof_validation.csv
Saved roof summary: outputs/Germany/validation_20260217_225807/roof_summary.csv
Saved wall validation: outputs/Germany/validation_20260217_225807/wall_validation.csv
Saved wall summary: outputs/Germany/validation_20260217_225807/wall_summary.csv
Saved floor summary: outputs/Germany/validation_20260217_225807/

## Stage 2.5: Export Problematic Buildings

Identify buildings with surfaces that have high validation errors and export them with geometries for inspection in QGIS.

In [11]:
from modules.validators import export_problematic_surfaces

# =============================================================================
# EXPORT PROBLEMATIC SURFACES WITH GEOMETRIES
# =============================================================================

# Define error threshold (percentage error to flag as problematic)
error_threshold = 10.0

print("="*80)
print("EXPORTING PROBLEMATIC SURFACES")
print(f"Error threshold: {error_threshold}%")
print("="*80)

# Export problematic roofs
if not roof_validation_df.empty:
    print("\n--- Roof Surfaces ---")
    roof_prob_file = os.path.join(results_dir, 'problematic_roofs.csv')
    roof_prob = export_problematic_surfaces(roof_validation_df, roof_prob_file, error_threshold)

# Export problematic walls
if not wall_validation_df.empty:
    print("\n--- Wall Surfaces ---")
    wall_prob_file = os.path.join(results_dir, 'problematic_walls.csv')
    wall_prob = export_problematic_surfaces(wall_validation_df, wall_prob_file, error_threshold)

# Export problematic floors
if not floor_validation_df.empty:
    print("\n--- Floor Surfaces ---")
    floor_prob_file = os.path.join(results_dir, 'problematic_floors.csv')
    floor_prob = export_problematic_surfaces(floor_validation_df, floor_prob_file, error_threshold)

print("\n" + "="*80)
print("PROBLEMATIC SURFACES EXPORT COMPLETE")
print("="*80)
print("\nFiles contain: building_feature_id, surface_feature_id, geometry (WKT),")
print("              calculated_value, thematic_value, difference, percent_error")
print("\nTo visualize in QGIS:")
print("1. Layer → Add Layer → Add Delimited Text Layer")
print("2. Select the problematic_*.csv file")
print("3. Geometry definition → Well-Known Text (WKT)")
print("4. Geometry field: geom")
print("="*80)

EXPORTING PROBLEMATIC SURFACES
Error threshold: 10.0%

--- Roof Surfaces ---

Exported 11950 problematic validations
  - 4658 unique surfaces
  - 4655 unique buildings
  - Saved to: outputs/Germany/validation_20260217_225807/problematic_roofs.csv


--- Wall Surfaces ---

Exported 11429 problematic validations
  - 8405 unique surfaces
  - 6971 unique buildings
  - Saved to: outputs/Germany/validation_20260217_225807/problematic_walls.csv


--- Floor Surfaces ---
No surfaces found with errors above 10.0% threshold

PROBLEMATIC SURFACES EXPORT COMPLETE

Files contain: building_feature_id, surface_feature_id, geometry (WKT),
              calculated_value, thematic_value, difference, percent_error

To visualize in QGIS:
1. Layer → Add Layer → Add Delimited Text Layer
2. Select the problematic_*.csv file
3. Geometry definition → Well-Known Text (WKT)
4. Geometry field: geom

Exported 11950 problematic validations
  - 4658 unique surfaces
  - 4655 unique buildings
  - Saved to: outputs/Germa

## Stage 3: Generate Validation Plots

Create scatter plots and error distribution visualizations for validated attributes.

In [12]:
from modules.plots import (plot_comparison_scatter, plot_error_distribution, 
                            plot_percent_error_distribution, plot_multi_attribute_comparison)
import matplotlib.pyplot as plt

# Create plots subdirectory
plots_dir = os.path.join(results_dir, "plots")
os.makedirs(plots_dir, exist_ok=True)

print("="*80)
print("GENERATING VALIDATION PLOTS")
print("="*80)

# =============================================================================
# BUILDING ATTRIBUTE PLOTS
# =============================================================================
if not building_validation_df.empty:
    print("\nGenerating building attribute plots...")
    print(building_validation_df.head())
    # Multi-attribute comparison
    plot_multi_attribute_comparison(
        building_validation_df,
        save_path=os.path.join(plots_dir, f"building_multi_comparison.{fig_format}"),
        title_prefix="Building",
        fig_format=fig_format
    )
    
    # Individual attribute plots
    for attr in building_validation_df['attribute_name'].unique():        
        plot_comparison_scatter(
            building_validation_df, attr,
            save_path=os.path.join(plots_dir, f"building_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            building_validation_df, attr,
            save_path=os.path.join(plots_dir, f"building_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            building_validation_df, attr,
            save_path=os.path.join(plots_dir, f"building_{attr}_percent_error.{fig_format}")
        )

# =============================================================================
# ROOF SURFACE ATTRIBUTE PLOTS
# =============================================================================
if not roof_validation_df.empty:
    print("\nGenerating roof surface attribute plots...")
    
    plot_multi_attribute_comparison(
        roof_validation_df,
        save_path=os.path.join(plots_dir, f"roof_multi_comparison.{fig_format}"),
        title_prefix="Roof",
        fig_format=fig_format
    )
    
    for attr in roof_validation_df['attribute_name'].unique():
        
        plot_comparison_scatter(
            roof_validation_df, attr,
            save_path=os.path.join(plots_dir, f"roof_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            roof_validation_df, attr,
            save_path=os.path.join(plots_dir, f"roof_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            roof_validation_df, attr,
            save_path=os.path.join(plots_dir, f"roof_{attr}_percent_error.{fig_format}")
        )

# =============================================================================
# WALL SURFACE ATTRIBUTE PLOTS
# =============================================================================
if not wall_validation_df.empty:
    print("\nGenerating wall surface attribute plots...")
    
    plot_multi_attribute_comparison(
        wall_validation_df,
        save_path=os.path.join(plots_dir, f"wall_multi_comparison.{fig_format}"),
        title_prefix="Wall",
        fig_format=fig_format
    )
    
    for attr in wall_validation_df['attribute_name'].unique():

        plot_comparison_scatter(
            wall_validation_df, attr,
            save_path=os.path.join(plots_dir, f"wall_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            wall_validation_df, attr,
            save_path=os.path.join(plots_dir, f"wall_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            wall_validation_df, attr,
            save_path=os.path.join(plots_dir, f"wall_{attr}_percent_error.{fig_format}")
        )

# =============================================================================
# FLOOR SURFACE ATTRIBUTE PLOTS
# =============================================================================
if not floor_validation_df.empty:
    print("\nGenerating floor surface attribute plots...")
    
    plot_multi_attribute_comparison(
        floor_validation_df,
        save_path=os.path.join(plots_dir, f"floor_multi_comparison.{fig_format}"),
        title_prefix="Floor",
        fig_format=fig_format
    )
    
    for attr in floor_validation_df['attribute_name'].unique():
        
        plot_comparison_scatter(
            floor_validation_df, attr,
            save_path=os.path.join(plots_dir, f"floor_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            floor_validation_df, attr,
            save_path=os.path.join(plots_dir, f"floor_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            floor_validation_df, attr,
            save_path=os.path.join(plots_dir, f"floor_{attr}_percent_error.{fig_format}")
        )

print(f"\nAll plots saved to: {plots_dir}")

GENERATING VALIDATION PLOTS

Generating building attribute plots...
   building_feature_id attribute_name  calculated_value  thematic_value  \
0               268185     min_height             1.000           0.442   
1               268192     min_height             3.200           3.197   
2               268203     min_height            12.040          12.039   
3                20250     min_height             4.050           3.910   
4               109073     min_height             4.639           4.639   

     difference  percent_error  
0  5.580000e-01   1.262443e+02  
1  3.000000e-03   9.383797e-02  
2  1.000000e-03   8.306338e-03  
3  1.400000e-01   3.580563e+00  
4  9.769963e-15   2.106049e-13  

Generating roof surface attribute plots...

Generating roof surface attribute plots...
No valid percentage errors for attribute 'tilt'
No valid percentage errors for attribute 'tilt'
No valid percentage errors for attribute 'azimuth'

Generating wall surface attribute plots...
No v

## Stage 4: Interpretation & Summary

Review the validation results and summary statistics.

In [13]:
print("="*80)
print("VALIDATION SUMMARY REPORT")
print("="*80)

# =============================================================================
# BUILDING VALIDATION SUMMARY
# =============================================================================
if not building_validation_df.empty:
    print("\n" + "="*80)
    print("BUILDING ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal buildings validated: {building_validation_df['building_feature_id'].nunique()}")
    print(f"Total comparisons: {len(building_validation_df)}")
    print("\nValidation Statistics:")
    display(building_summary)
else:
    print("\nNo building validation data available")

# =============================================================================
# ROOF SURFACE VALIDATION SUMMARY
# =============================================================================
if not roof_validation_df.empty:
    print("\n" + "="*80)
    print("ROOF SURFACE ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal roof surfaces validated: {roof_validation_df['surface_feature_id'].nunique()}")
    print(f"Total comparisons: {len(roof_validation_df)}")
    print("\nValidation Statistics:")
    display(roof_summary)
else:
    print("\nNo roof surface validation data available")

# =============================================================================
# WALL SURFACE VALIDATION SUMMARY
# =============================================================================
if not wall_validation_df.empty:
    print("\n" + "="*80)
    print("WALL SURFACE ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal wall surfaces validated: {wall_validation_df['surface_feature_id'].nunique()}")
    print(f"Total comparisons: {len(wall_validation_df)}")
    print("\nValidation Statistics:")
    display(wall_summary)
else:
    print("\nNo wall surface validation data available")

# =============================================================================
# FLOOR SURFACE VALIDATION SUMMARY
# =============================================================================
if not floor_validation_df.empty:
    print("\n" + "="*80)
    print("FLOOR SURFACE ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal floor surfaces validated: {floor_validation_df['surface_feature_id'].nunique()}")
    print(f"Total comparisons: {len(floor_validation_df)}")
    print("\nValidation Statistics:")
    display(floor_summary)
else:
    print("\nNo floor surface validation data available")

# =============================================================================
# OVERALL SUMMARY
# =============================================================================
print("\n" + "="*80)
print("OVERALL VALIDATION SUMMARY")
print("="*80)

total_validations = 0
if not building_validation_df.empty:
    total_validations += len(building_validation_df)
if not roof_validation_df.empty:
    total_validations += len(roof_validation_df)
if not wall_validation_df.empty:
    total_validations += len(wall_validation_df)
if not floor_validation_df.empty:
    total_validations += len(floor_validation_df)

print(f"\nTotal validation comparisons: {total_validations}")
print(f"Results directory: {results_dir}")
print("\n" + "="*80)

VALIDATION SUMMARY REPORT

BUILDING ATTRIBUTE VALIDATION

Total buildings validated: 26317
Total comparisons: 52634

Validation Statistics:


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,max_height,26317,2.7881,2.3583,3.6517,53.3025,32.6402,116.0297
1,min_height,26317,0.9439,2.273,2.4612,28.6414,0.0,117.3885



ROOF SURFACE ATTRIBUTE VALIDATION

Total roof surfaces validated: 56954
Total comparisons: 214377

Validation Statistics:


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,azimuth,62331,0.3297,41.8805,41.8814,349.7754,-0.0002,19108.5761
1,surface_area,76023,-1.0772,15.3251,15.3628,-1.4898,-0.0001,8.7721
2,tilt,76023,-0.4806,7.2407,7.2565,-0.4712,0.0,52.3048



WALL SURFACE ATTRIBUTE VALIDATION

Total wall surfaces validated: 185391
Total comparisons: 262163

Validation Statistics:


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,surface_area,262163,-2.0636,31.8479,31.9146,-2.6607,0.0001,16.8403



FLOOR SURFACE ATTRIBUTE VALIDATION

Total floor surfaces validated: 26317
Total comparisons: 38645

Validation Statistics:


Unnamed: 0,attribute_name,count,mean_difference,std_difference,rmse,mean_percent_error,median_percent_error,std_percent_error
0,surface_area,38645,0.0137,0.2359,0.2363,0.0087,0.0,0.032



OVERALL VALIDATION SUMMARY

Total validation comparisons: 567819
Results directory: outputs/Germany/validation_20260217_225807



## Stage 5: Export Notebook as HTML & PDF

Export this notebook with all outputs to HTML and PDF formats for documentation.

In [14]:
import subprocess

print("="*80)
print("EXPORTING NOTEBOOK")
print("="*80)

# Get the notebook filename
notebook_path = "validation.ipynb"
notebook_name = os.path.splitext(os.path.basename(notebook_path))[0]

# Export paths
html_output = os.path.join(results_dir, f"{notebook_name}_report.html")
pdf_output = os.path.join(results_dir, f"{notebook_name}_report.pdf")

try: 
    # Export to HTML
    try:
        result = subprocess.run(
            ["jupyter", "nbconvert", "--to", "html", notebook_path, "--output", html_output],
            capture_output=True,
            text=True,
            check=True
        )
    except subprocess.CalledProcessError as e:
        print(f"HTML export failed: {e.stderr}")
    except FileNotFoundError:
        print("jupyter nbconvert not found. Install with: pip install nbconvert")

    # Export to PDF (requires nbconvert and additional dependencies)
    try:
        # Check if wkhtmltopdf or similar is available
        result = subprocess.run(
            ["jupyter", "nbconvert", "--to", "pdf", notebook_path, "--output", pdf_output],
            capture_output=True,
            text=True,
            check=True
        )
    except subprocess.CalledProcessError as e:
        print(f"PDF export failed: {e.stderr}")
        print("   PDF export requires additional dependencies:")
        print("   - Install pandoc: conda install pandoc")
        print("   - Install LaTeX: conda install -c conda-forge texlive-core")
        print("   Alternative: Use HTML export and print to PDF from browser")
    except FileNotFoundError:
        print("jupyter nbconvert not found. Install with: pip install nbconvert")

    print("="*80)
    print("NOTEBOOK EXPORTED SUCCESSFULLY AS PDF & HTML DOCUMENT!")
    print("="*80)

except:
    print("="*80)
    print("NOTEBOOK EXPORTED FAILED!")
    print("="*80)

EXPORTING NOTEBOOK
HTML export failed: usage: jupyter [-h] [--version] [--config-dir] [--data-dir] [--runtime-dir]
               [--paths] [--json] [--debug]
               [subcommand]

Jupyter: Interactive Computing

positional arguments:
  subcommand     the subcommand to launch

options:
  -h, --help     show this help message and exit
  --version      show the versions of core jupyter packages and exit
  --config-dir   show Jupyter config dir
  --data-dir     show Jupyter data dir
  --runtime-dir  show Jupyter runtime dir
  --paths        show all Jupyter paths. Add --json for machine-readable
                 format.
  --json         output paths as machine-readable json
  --debug        output debug information about paths

Available subcommands: kernel kernelspec migrate run troubleshoot

Jupyter command `jupyter-nbconvert` not found.

PDF export failed: usage: jupyter [-h] [--version] [--config-dir] [--data-dir] [--runtime-dir]
               [--paths] [--json] [--debug]
    