# City2TABULA Validation Notebook

This notebook validates the calculations performed by the City2TABULA pipeline by comparing calculated building attributes against source thematic data from CityGML/CityJSON datasets.

## Validation Strategy

1. **Building-Level Attributes**: Height, footprint area, aggregated surface areas
2. **Surface-Level Attributes**: Individual surface area, tilt (roof only), azimuth (roof only)

The validation uses a configuration-driven approach where source property names are mapped to City2TABULA calculated columns via YAML configuration files.
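
As a rough illustration of this configuration-driven mapping, the sketch below represents a parsed YAML configuration as a plain Python dictionary and looks up the source property name for each calculated column. The key layout (`building_attributes`, `source`, `unit`) and the helper `attribute_mapping` are assumptions made for illustration; the actual schema is defined by the files under `configs/` and the functions in `modules.config`.

```python
# Hypothetical shape of a parsed config; the real YAML schema may differ.
example_config = {
    "building_attributes": {
        "min_height": {"source": "value", "unit": "m"},
        "footprint_area": {"source": "Flaeche", "unit": "m²"},
    }
}


def attribute_mapping(config: dict, section: str) -> dict:
    """Return {calculated_column: source_property_name} for one config section."""
    entries = config.get(section, {})
    return {column: spec["source"] for column, spec in entries.items()}


mapping = attribute_mapping(example_config, "building_attributes")
for column, source_name in mapping.items():
    print(f"{column} <- '{source_name}'")
```

Keeping the mapping in data rather than in code means a new dataset with different property names should only need a new configuration file.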

## Stage 0: Load Configuration and Set Up Database Connection

Load the validation configuration from the YAML file selected by the `COUNTRY` environment variable (default: `germany`). The configuration contains:
- Dataset information and metadata
- Attribute mappings (source property names → City2TABULA columns)
- Database connection settings (automatically configured)
- Validation tolerances
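
The two kinds of tolerance can be read as follows: an absolute tolerance bounds the raw difference, while a percentage tolerance bounds the difference relative to the source value. The sketch below assumes this interpretation. The function name `within_tolerance` is hypothetical and not part of the validation modules; the values 0.5 and 5.0 are the tolerances this notebook reports for height and footprint area.

```python
from typing import Optional


def within_tolerance(calculated: float, reference: float,
                     abs_tol: Optional[float] = None,
                     pct_tol: Optional[float] = None) -> bool:
    """Check a calculated value against a reference value.

    abs_tol bounds |calculated - reference|; pct_tol bounds the same
    difference expressed as a percentage of |reference|.
    """
    diff = abs(calculated - reference)
    if abs_tol is not None:
        return diff <= abs_tol
    if pct_tol is not None:
        if reference == 0:
            return diff == 0  # percentage error is undefined for a zero reference
        return diff / abs(reference) * 100.0 <= pct_tol
    raise ValueError("Provide either abs_tol or pct_tol")


print(within_tolerance(12.3, 12.0, abs_tol=0.5))    # height, tolerance of 0.5 m
print(within_tolerance(106.0, 100.0, pct_tol=5.0))  # footprint area, tolerance of 5 %
```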

In [1]:
# Add the notebook directory to the Python path so the validation modules can be imported
import sys
import os

# Get the notebook directory
notebook_dir = os.getcwd()
print(f"Notebook directory: {notebook_dir}")

# Add to path (no need to go up if already in validation/)
if notebook_dir not in sys.path:
    sys.path.insert(0, notebook_dir)

# Now import from modules
from modules.config import load_config, print_config_summary
from modules.db import get_db_engine

# Get country from environment variable
country = os.getenv('COUNTRY', 'germany').lower()

# Build path to config file
config_path = os.path.join('configs', f'config_{country}.yaml')

# Load configuration
print(f"\nLoading configuration from: config_{country}.yaml")
config = load_config(config_path)

# Display configuration summary
print_config_summary(config)

# Set up output directory
output_dir = 'outputs'
os.makedirs(output_dir, exist_ok=True)
print(f"\nOutput directory: {output_dir}")

# Figure output format
fig_format = 'png'  # Options: 'png', 'svg', 'pdf', 'ipe'

# Initialize database engine
print("\nInitializing database connection...")
db_engine = get_db_engine(config)
print(f"Connected to database: city2tabula_{country}")

Notebook directory: /home/jayravani/Projects/Work/City2TABULA/github/City2TABULA/validation

Loading configuration from: config_germany.yaml
Loaded configuration for: Germany

CONFIGURATION SUMMARY

 Dataset: LoD2 Dataset of Bavaria
   Country: Germany
   LoD: 2
   Description: Bavarian 3D city models in CityGML format with German property names

 Building Attributes:
   min_height           <- 'value' (m)
   max_height           <- 'value' (m)
   footprint_area       <- 'Flaeche' (m²)

 Surface Attributes:
   ROOF:
   surface_area         <- 'Flaeche' (m²)
   tilt                 <- 'Dachneigung' (degrees)
   azimuth              <- 'Dachorientierung' (degrees)
   WALL:
   surface_area         <- 'Flaeche' (m²)
   FLOOR:
   surface_area         <- 'Flaeche' (m²)

 Validation Tolerances:
   Absolute:
   height               ±0.5
   tilt                 ±2.0
   azimuth              ±5.0
   Percentage:
   footprint_area       ±5.0%
   surface_area         ±5.0%

Output directory: outputs

## Stage 1: Load Data from PostgreSQL Database

Load calculated data from City2TABULA tables and extract attribute mappings from config.

In [2]:
from modules.utils import load_city2tabula_data

# Load calculated data from City2TABULA tables
bf_df, sf_df = load_city2tabula_data(db_engine, config)

print("\nData loading complete.")
display(bf_df.head())
display(sf_df.head())

Loading building features from city2tabula.lod2_building_feature...
Loaded 1314 buildings
Loading surface features from city2tabula.lod2_child_feature_surface...
Loaded 0 surfaces

Data loading complete.


| | id | building_feature_id | tabula_variant_code_id | tabula_variant_code | construction_year | comment | heating_demand | heating_demand_unit | footprint_area | footprint_complexity | ... | max_volume_unit | area_total_roof | area_total_roof_unit | area_total_wall | area_total_wall_unit | area_total_floor | area_total_floor_unit | surface_count_floor | surface_count_roof | surface_count_wall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f6c49fc4-013c-4fca-9e67-6ec0b267b29a | 5953 | 124 | DE.N.TH.03.Gen.ReEx.001.001 | | | | | 46.989415 | 1 | ... | cbm | 46.989415 | sqm | 65.574002 | sqm | 46.989415 | sqm | 0 | 1 | 4 |
| 1 | 1bf29014-d71c-469d-bac4-cdf43c3a7131 | 11341 | 188 | DE.DistrictMZLerch.F.DHH.SD.ReEx.001.001 | | | | | 61.874395 | 1 | ... | cbm | 63.150227 | sqm | 127.387552 | sqm | 123.748791 | sqm | 0 | 1 | 4 |
| 2 | 9d259953-4b91-40df-85aa-e9c351b10622 | 10990 | 49 | DE.N.MFH.10.Gen.ReEx.001.001 | | | | | 971.988341 | 2 | ... | cbm | 986.909597 | sqm | 1752.527071 | sqm | 4859.941703 | sqm | 0 | 4 | 12 |
| 3 | 855be477-1001-466e-81d0-2a044bc96b8e | 10999 | 100 | DE.N.SFH.10.Gen.ReEx.001.001 | | | | | 41.025056 | 1 | ... | cbm | 31.555906 | sqm | 187.756996 | sqm | 123.075168 | sqm | 0 | 1 | 4 |
| 4 | ac11f0c8-58bf-4eeb-8f2b-9a9b2ac6226b | 11006 | 7 | DE.N.AB.04.Gen.ReEx.001.001 | | | | | 364.436411 | 2 | ... | cbm | 370.084311 | sqm | 1336.718871 | sqm | 1822.182057 | sqm | 0 | 2 | 12 |

| | surface_feature_id | building_feature_id | objectclass_id | classname | surface_area | tilt | azimuth | is_valid | is_planar |
|---|---|---|---|---|---|---|---|---|---|


## Stage 2: Validate Building and Surface Attributes

Validate calculated building attributes (height, footprint area) and surface attributes (area, tilt, azimuth) against source thematic data.
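
One detail worth keeping in mind when comparing azimuths: the angle wraps around at 360°, so 359° and 1° are only 2° apart. The sketch below shows one way to compute such a wrap-aware difference. Whether `validate_surface_attributes` handles azimuth this way is not shown in this notebook, and the helper name `angular_difference` is hypothetical.

```python
def angular_difference(calculated_deg: float, reference_deg: float) -> float:
    """Smallest absolute difference between two compass angles, in degrees (0 to 180)."""
    diff = abs(calculated_deg - reference_deg) % 360.0
    return min(diff, 360.0 - diff)


# A naive subtraction would report 358 degrees for the first pair.
print(angular_difference(359.0, 1.0))
print(angular_difference(90.0, 270.0))
```

Applying an absolute tolerance to this wrapped difference avoids flagging north-facing roofs purely because their azimuths fall on either side of 0°.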

In [3]:
from modules.utils import load_thematic_building_data, load_thematic_surface_data
from modules.validators import validate_building_attributes, validate_surface_attributes, get_validation_summary
from modules.config import get_building_attribute_mapping, get_surface_attribute_mapping
import pandas as pd

# =============================================================================
# BUILDING-LEVEL VALIDATION
# =============================================================================
print("="*80)
print("BUILDING-LEVEL ATTRIBUTE VALIDATION")
print("="*80)

# Get building attribute mapping
building_attr_map = get_building_attribute_mapping(config)
print(f"\nValidating {len(building_attr_map)} building attributes:")
for attr, label in building_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Get building IDs
building_ids = bf_df['building_feature_id'].tolist()
print(f"\nBuildings to validate: {len(building_ids)}")

# Load thematic data from CityDB
building_thematic_df = load_thematic_building_data(
    engine=db_engine,
    config=config,
    building_feature_ids=building_ids,
    attribute_mapping=building_attr_map
)

# Validate building attributes
building_validation_df = validate_building_attributes(
    building_calc_df=bf_df,
    building_thematic_df=building_thematic_df,
    attribute_mapping=building_attr_map
)

# Display summary
if not building_validation_df.empty:
    building_summary = get_validation_summary(building_validation_df)
    print("\n" + "="*80)
    print("BUILDING VALIDATION SUMMARY")
    print("="*80)
    display(building_summary)
else:
    print("No building validation results")

# =============================================================================
# SURFACE-LEVEL VALIDATION (ROOFS)
# =============================================================================
print("\n" + "="*80)
print("ROOF SURFACE ATTRIBUTE VALIDATION")
print("="*80)

# Get roof attribute mapping
roof_attr_map = get_surface_attribute_mapping(config, 'roof')
print(f"\nValidating {len(roof_attr_map)} roof attributes:")
for attr, label in roof_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Filter for roof surfaces
roof_surfaces_df = sf_df[sf_df['classname'] == 'RoofSurface'].copy()
roof_ids = roof_surfaces_df['surface_feature_id'].tolist()
print(f"\nRoof surfaces to validate: {len(roof_ids)}")

if roof_ids:
    # Load thematic data from CityDB
    roof_thematic_df = load_thematic_surface_data(
        engine=db_engine,
        config=config,
        surface_feature_ids=roof_ids,
        attribute_mapping=roof_attr_map,
        surface_type='RoofSurface'
    )

    # Validate roof attributes
    roof_validation_df = validate_surface_attributes(
        surface_calc_df=sf_df,
        surface_thematic_df=roof_thematic_df,
        attribute_mapping=roof_attr_map,
        surface_type='RoofSurface'
    )

    # Display summary
    if not roof_validation_df.empty:
        roof_summary = get_validation_summary(roof_validation_df)
        print("\n" + "="*80)
        print("ROOF VALIDATION SUMMARY")
        print("="*80)
        display(roof_summary)
    else:
        print("No roof validation results")
else:
    print("No roof surfaces found")
    roof_validation_df = pd.DataFrame()

# =============================================================================
# SURFACE-LEVEL VALIDATION (WALLS)
# =============================================================================
print("\n" + "="*80)
print("WALL SURFACE ATTRIBUTE VALIDATION")
print("="*80)

# Get wall attribute mapping
wall_attr_map = get_surface_attribute_mapping(config, 'wall')
print(f"\nValidating {len(wall_attr_map)} wall attributes:")
for attr, label in wall_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Filter for wall surfaces
wall_surfaces_df = sf_df[sf_df['classname'] == 'WallSurface'].copy()
wall_ids = wall_surfaces_df['surface_feature_id'].tolist()
print(f"\nWall surfaces to validate: {len(wall_ids)}")

if wall_ids and wall_attr_map:
    # Load thematic data from CityDB
    wall_thematic_df = load_thematic_surface_data(
        engine=db_engine,
        config=config,
        surface_feature_ids=wall_ids,
        attribute_mapping=wall_attr_map,
        surface_type='WallSurface'
    )

    # Validate wall attributes
    wall_validation_df = validate_surface_attributes(
        surface_calc_df=sf_df,
        surface_thematic_df=wall_thematic_df,
        attribute_mapping=wall_attr_map,
        surface_type='WallSurface'
    )

    # Display summary
    if not wall_validation_df.empty:
        wall_summary = get_validation_summary(wall_validation_df)
        print("\n" + "="*80)
        print("WALL VALIDATION SUMMARY")
        print("="*80)
        display(wall_summary)
    else:
        print("No wall validation results")
else:
    print("No wall surfaces or attributes to validate")
    wall_validation_df = pd.DataFrame()

# =============================================================================
# SURFACE-LEVEL VALIDATION (FLOORS/GROUND)
# =============================================================================
print("\n" + "="*80)
print("FLOOR/GROUND SURFACE ATTRIBUTE VALIDATION")
print("="*80)

# Get floor attribute mapping
floor_attr_map = get_surface_attribute_mapping(config, 'floor')
print(f"\nValidating {len(floor_attr_map)} floor attributes:")
for attr, label in floor_attr_map.items():
    print(f"  - {attr}: '{label}'")

# Filter for ground surfaces
floor_surfaces_df = sf_df[sf_df['classname'] == 'GroundSurface'].copy()
floor_ids = floor_surfaces_df['surface_feature_id'].tolist()
print(f"\nFloor/Ground surfaces to validate: {len(floor_ids)}")

if floor_ids and floor_attr_map:
    # Load thematic data from CityDB
    floor_thematic_df = load_thematic_surface_data(
        engine=db_engine,
        config=config,
        surface_feature_ids=floor_ids,
        attribute_mapping=floor_attr_map,
        surface_type='GroundSurface'
    )

    # Validate floor attributes
    floor_validation_df = validate_surface_attributes(
        surface_calc_df=sf_df,
        surface_thematic_df=floor_thematic_df,
        attribute_mapping=floor_attr_map,
        surface_type='GroundSurface'
    )

    # Display summary
    if not floor_validation_df.empty:
        floor_summary = get_validation_summary(floor_validation_df)
        print("\n" + "="*80)
        print("FLOOR/GROUND VALIDATION SUMMARY")
        print("="*80)
        display(floor_summary)
    else:
        print("No floor validation results")
else:
    print("No floor/ground surfaces or attributes to validate")
    floor_validation_df = pd.DataFrame()

BUILDING-LEVEL ATTRIBUTE VALIDATION

Validating 3 building attributes:
  - min_height: 'value'
  - max_height: 'value'
  - footprint_area: 'Flaeche'

Buildings to validate: 1314


ProgrammingError: (psycopg2.errors.UndefinedTable) relation "lod2.property" does not exist
LINE 18:     FROM lod2.property AS p
                  ^

[SQL: 
    SELECT
        p.feature_id,
        p.name AS source_label,
        COALESCE(
            p.val_double,
            CASE
                WHEN p.val_string IS NOT NULL
                THEN
                    CASE
                        WHEN p.val_string ~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$'
                        THEN p.val_string::numeric
                        ELSE NULL
                    END
                ELSE NULL
            END
        ) AS thematic_value
    FROM lod2.property AS p
    WHERE p.feature_id IN (5953,11341,10990,10999,11006, ... [list of 1314 feature IDs abridged] ..., 14097,14107,14115)
      AND p.name IN ('value','value','Flaeche')
      AND (
          p.val_double IS NOT NULL
          OR (p.val_string IS NOT NULL AND p.val_string ~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$')
      )
    ORDER BY p.feature_id, p.name;
    ]
(Background on this error at: https://sqlalche.me/e/20/f405)

In [None]:
# =============================================================================
# SAVE VALIDATION RESULTS
# =============================================================================
import os
from datetime import datetime

# Create timestamped output directory
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_dir = os.path.join(output_dir, config.get('dataset', {}).get('country', 'unknown'), f"validation_{timestamp}")
os.makedirs(results_dir, exist_ok=True)

print(f"\nSaving validation results to: {results_dir}")

# Save building validation results
if not building_validation_df.empty:
    building_output = os.path.join(results_dir, "building_validation.csv")
    building_validation_df.to_csv(building_output, index=False)
    print(f"Saved building validation: {building_output}")
    
    building_summary_output = os.path.join(results_dir, "building_summary.csv")
    building_summary.to_csv(building_summary_output, index=False)
    print(f"Saved building summary: {building_summary_output}")

# Save roof validation results
if not roof_validation_df.empty:
    roof_output = os.path.join(results_dir, "roof_validation.csv")
    roof_validation_df.to_csv(roof_output, index=False)
    print(f"Saved roof validation: {roof_output}")
    
    roof_summary_output = os.path.join(results_dir, "roof_summary.csv")
    roof_summary.to_csv(roof_summary_output, index=False)
    print(f"Saved roof summary: {roof_summary_output}")

# Save wall validation results
if not wall_validation_df.empty:
    wall_output = os.path.join(results_dir, "wall_validation.csv")
    wall_validation_df.to_csv(wall_output, index=False)
    print(f"Saved wall validation: {wall_output}")
    
    wall_summary_output = os.path.join(results_dir, "wall_summary.csv")
    wall_summary.to_csv(wall_summary_output, index=False)
    print(f"Saved wall summary: {wall_summary_output}")

# Save floor validation results
if not floor_validation_df.empty:
    floor_output = os.path.join(results_dir, "floor_validation.csv")
    floor_validation_df.to_csv(floor_output, index=False)
    print(f"Saved floor validation: {floor_output}")
    
    floor_summary_output = os.path.join(results_dir, "floor_summary.csv")
    floor_summary.to_csv(floor_summary_output, index=False)
    print(f"Saved floor summary: {floor_summary_output}")

print(f"\n{'='*80}")
print("RESULTS SAVED")
print(f"{'='*80}")

## Stage 2.5: Export Problematic Surfaces

Identify surfaces whose validation error exceeds a threshold and export them, together with their building IDs and geometries, for inspection in QGIS.
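
The selection step amounts to a threshold on the percentage error. The sketch below does this in plain Python, assuming each validation record carries `calculated_value` and `thematic_value` (the column names listed for the exported files). The helper `flag_problematic` and the sample records are made up for illustration and do not reproduce `export_problematic_surfaces`.

```python
def percent_error(calculated: float, thematic: float) -> float:
    """Absolute error as a percentage of the thematic (source) value."""
    return abs(calculated - thematic) / abs(thematic) * 100.0


def flag_problematic(records: list, threshold: float = 10.0) -> list:
    """Return the records whose percent error exceeds the threshold."""
    flagged = []
    for rec in records:
        if rec["thematic_value"] == 0:
            continue  # percent error is undefined here; handle such rows separately
        err = percent_error(rec["calculated_value"], rec["thematic_value"])
        if err > threshold:
            flagged.append({**rec, "percent_error": err})
    return flagged


# Made-up sample records, not taken from the dataset.
sample = [
    {"surface_feature_id": 1, "calculated_value": 50.0, "thematic_value": 49.0},
    {"surface_feature_id": 2, "calculated_value": 30.0, "thematic_value": 40.0},
]
for rec in flag_problematic(sample):
    print(rec["surface_feature_id"], round(rec["percent_error"], 1))
```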

In [None]:
from modules.validators import export_problematic_surfaces

# =============================================================================
# EXPORT PROBLEMATIC SURFACES WITH GEOMETRIES
# =============================================================================

# Define error threshold (percentage error to flag as problematic)
error_threshold = 10.0

print("="*80)
print("EXPORTING PROBLEMATIC SURFACES")
print(f"Error threshold: {error_threshold}%")
print("="*80)

# Export problematic roofs
if not roof_validation_df.empty:
    print("\n--- Roof Surfaces ---")
    roof_prob_file = os.path.join(results_dir, 'problematic_roofs.csv')
    roof_prob = export_problematic_surfaces(roof_validation_df, roof_prob_file, error_threshold)

# Export problematic walls
if not wall_validation_df.empty:
    print("\n--- Wall Surfaces ---")
    wall_prob_file = os.path.join(results_dir, 'problematic_walls.csv')
    wall_prob = export_problematic_surfaces(wall_validation_df, wall_prob_file, error_threshold)

# Export problematic floors
if not floor_validation_df.empty:
    print("\n--- Floor Surfaces ---")
    floor_prob_file = os.path.join(results_dir, 'problematic_floors.csv')
    floor_prob = export_problematic_surfaces(floor_validation_df, floor_prob_file, error_threshold)

print("\n" + "="*80)
print("PROBLEMATIC SURFACES EXPORT COMPLETE")
print("="*80)
print("\nFiles contain: building_feature_id, surface_feature_id, geometry (WKT),")
print("              calculated_value, thematic_value, difference, percent_error")
print("\nTo visualize in QGIS:")
print("1. Layer → Add Layer → Add Delimited Text Layer")
print("2. Select the problematic_*.csv file")
print("3. Geometry definition → Well-Known Text (WKT)")
print("4. Geometry field: geom")
print("="*80)

## Stage 3: Generate Validation Plots

Create scatter plots and error distribution visualizations for validated attributes.

In [None]:
from modules.plots import (plot_comparison_scatter, plot_error_distribution, 
                            plot_percent_error_distribution, plot_multi_attribute_comparison)
import matplotlib.pyplot as plt

# Create plots subdirectory
plots_dir = os.path.join(results_dir, "plots")
os.makedirs(plots_dir, exist_ok=True)

print("="*80)
print("GENERATING VALIDATION PLOTS")
print("="*80)

# =============================================================================
# BUILDING ATTRIBUTE PLOTS
# =============================================================================
if not building_validation_df.empty:
    print("\nGenerating building attribute plots...")
    print(building_validation_df.head())
    # Multi-attribute comparison
    plot_multi_attribute_comparison(
        building_validation_df,
        save_path=os.path.join(plots_dir, f"building_multi_comparison.{fig_format}"),
        title_prefix="Building",
        fig_format=fig_format
    )
    
    # Individual attribute plots
    for attr in building_validation_df['attribute_name'].unique():        
        plot_comparison_scatter(
            building_validation_df, attr,
            save_path=os.path.join(plots_dir, f"building_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            building_validation_df, attr,
            save_path=os.path.join(plots_dir, f"building_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            building_validation_df, attr,
            save_path=os.path.join(plots_dir, f"building_{attr}_percent_error.{fig_format}")
        )

# =============================================================================
# ROOF SURFACE ATTRIBUTE PLOTS
# =============================================================================
if not roof_validation_df.empty:
    print("\nGenerating roof surface attribute plots...")
    
    plot_multi_attribute_comparison(
        roof_validation_df,
        save_path=os.path.join(plots_dir, f"roof_multi_comparison.{fig_format}"),
        title_prefix="Roof",
        fig_format=fig_format
    )
    
    for attr in roof_validation_df['attribute_name'].unique():
        
        plot_comparison_scatter(
            roof_validation_df, attr,
            save_path=os.path.join(plots_dir, f"roof_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            roof_validation_df, attr,
            save_path=os.path.join(plots_dir, f"roof_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            roof_validation_df, attr,
            save_path=os.path.join(plots_dir, f"roof_{attr}_percent_error.{fig_format}")
        )

# =============================================================================
# WALL SURFACE ATTRIBUTE PLOTS
# =============================================================================
if not wall_validation_df.empty:
    print("\nGenerating wall surface attribute plots...")
    
    plot_multi_attribute_comparison(
        wall_validation_df,
        save_path=os.path.join(plots_dir, f"wall_multi_comparison.{fig_format}"),
        title_prefix="Wall",
        fig_format=fig_format
    )
    
    for attr in wall_validation_df['attribute_name'].unique():

        plot_comparison_scatter(
            wall_validation_df, attr,
            save_path=os.path.join(plots_dir, f"wall_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            wall_validation_df, attr,
            save_path=os.path.join(plots_dir, f"wall_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            wall_validation_df, attr,
            save_path=os.path.join(plots_dir, f"wall_{attr}_percent_error.{fig_format}")
        )

# =============================================================================
# FLOOR SURFACE ATTRIBUTE PLOTS
# =============================================================================
if not floor_validation_df.empty:
    print("\nGenerating floor surface attribute plots...")
    
    plot_multi_attribute_comparison(
        floor_validation_df,
        save_path=os.path.join(plots_dir, f"floor_multi_comparison.{fig_format}"),
        title_prefix="Floor",
        fig_format=fig_format
    )
    
    for attr in floor_validation_df['attribute_name'].unique():
        
        plot_comparison_scatter(
            floor_validation_df, attr,
            save_path=os.path.join(plots_dir, f"floor_{attr}_scatter.{fig_format}")
        )
        
        plot_error_distribution(
            floor_validation_df, attr,
            save_path=os.path.join(plots_dir, f"floor_{attr}_error_dist.{fig_format}")
        )
        
        plot_percent_error_distribution(
            floor_validation_df, attr,
            save_path=os.path.join(plots_dir, f"floor_{attr}_percent_error.{fig_format}")
        )

print(f"\nAll plots saved to: {plots_dir}")

## Stage 4: Interpretation & Summary

Review the validation results and summary statistics.
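
When reading the summary tables, three figures are usually enough to characterise an attribute: the mean error (bias), the mean absolute error, and the root-mean-square error. The sketch below computes them with the standard library. It assumes errors are defined as calculated minus thematic; the helper `error_summary` is hypothetical, and the exact columns returned by `get_validation_summary` may differ.

```python
import math
from statistics import fmean


def error_summary(calculated: list, thematic: list) -> dict:
    """Bias, MAE and RMSE of calculated values against thematic values."""
    if len(calculated) != len(thematic) or not calculated:
        raise ValueError("Inputs must be non-empty and of equal length")
    errors = [c - t for c, t in zip(calculated, thematic)]
    return {
        "count": len(errors),
        "bias": fmean(errors),                            # signed mean error
        "mae": fmean(abs(e) for e in errors),             # mean absolute error
        "rmse": math.sqrt(fmean(e * e for e in errors)),  # root-mean-square error
    }


# Made-up values for illustration only.
summary = error_summary([10.0, 12.0, 9.0], [9.0, 12.0, 11.0])
print(summary)
```

A bias close to zero combined with a large RMSE points to scattered errors, whereas a bias of similar magnitude to the MAE suggests a systematic offset.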

In [None]:
print("="*80)
print("VALIDATION SUMMARY REPORT")
print("="*80)

# =============================================================================
# BUILDING VALIDATION SUMMARY
# =============================================================================
if not building_validation_df.empty:
    print("\n" + "="*80)
    print("BUILDING ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal buildings validated: {building_validation_df['building_feature_id'].nunique()}")
    print(f"Total comparisons: {len(building_validation_df)}")
    print("\nValidation Statistics:")
    display(building_summary)
else:
    print("\nNo building validation data available")

# =============================================================================
# ROOF SURFACE VALIDATION SUMMARY
# =============================================================================
if not roof_validation_df.empty:
    print("\n" + "="*80)
    print("ROOF SURFACE ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal roof surfaces validated: {roof_validation_df['surface_feature_id'].nunique()}")
    print(f"Total comparisons: {len(roof_validation_df)}")
    print("\nValidation Statistics:")
    display(roof_summary)
else:
    print("\nNo roof surface validation data available")

# =============================================================================
# WALL SURFACE VALIDATION SUMMARY
# =============================================================================
if not wall_validation_df.empty:
    print("\n" + "="*80)
    print("WALL SURFACE ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal wall surfaces validated: {wall_validation_df['surface_feature_id'].nunique()}")
    print(f"Total comparisons: {len(wall_validation_df)}")
    print("\nValidation Statistics:")
    display(wall_summary)
else:
    print("\nNo wall surface validation data available")

# =============================================================================
# FLOOR SURFACE VALIDATION SUMMARY
# =============================================================================
if not floor_validation_df.empty:
    print("\n" + "="*80)
    print("FLOOR SURFACE ATTRIBUTE VALIDATION")
    print("="*80)
    print(f"\nTotal floor surfaces validated: {floor_validation_df['surface_feature_id'].nunique()}")
    print(f"Total comparisons: {len(floor_validation_df)}")
    print("\nValidation Statistics:")
    display(floor_summary)
else:
    print("\nNo floor surface validation data available")

# =============================================================================
# OVERALL SUMMARY
# =============================================================================
print("\n" + "="*80)
print("OVERALL VALIDATION SUMMARY")
print("="*80)

total_validations = 0
if not building_validation_df.empty:
    total_validations += len(building_validation_df)
if not roof_validation_df.empty:
    total_validations += len(roof_validation_df)
if not wall_validation_df.empty:
    total_validations += len(wall_validation_df)
if not floor_validation_df.empty:
    total_validations += len(floor_validation_df)

print(f"\nTotal validation comparisons: {total_validations}")
print(f"Results directory: {results_dir}")
print("\n" + "="*80)

## Stage 5: Export Notebook as HTML & PDF

Export this notebook with all outputs to HTML and PDF formats for documentation.

In [None]:
import subprocess

print("="*80)
print("EXPORTING NOTEBOOK")
print("="*80)

# Get the notebook filename
notebook_path = "validation.ipynb"
notebook_name = os.path.splitext(os.path.basename(notebook_path))[0]

# Export paths
html_output = os.path.join(results_dir, f"{notebook_name}_report.html")
pdf_output = os.path.join(results_dir, f"{notebook_name}_report.pdf")

exported = []  # formats that were written successfully

# Export to HTML
try:
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "html", notebook_path, "--output", html_output],
        capture_output=True,
        text=True,
        check=True
    )
    exported.append("HTML")
except subprocess.CalledProcessError as e:
    print(f"HTML export failed: {e.stderr}")
except FileNotFoundError:
    print("jupyter nbconvert not found. Install with: pip install nbconvert")

# Export to PDF (requires nbconvert plus pandoc and a LaTeX installation)
try:
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "pdf", notebook_path, "--output", pdf_output],
        capture_output=True,
        text=True,
        check=True
    )
    exported.append("PDF")
except subprocess.CalledProcessError as e:
    print(f"PDF export failed: {e.stderr}")
    print("   PDF export requires additional dependencies:")
    print("   - Install pandoc: conda install pandoc")
    print("   - Install LaTeX: conda install -c conda-forge texlive-core")
    print("   Alternative: Use HTML export and print to PDF from browser")
except FileNotFoundError:
    print("jupyter nbconvert not found. Install with: pip install nbconvert")

# Report only the formats that were actually exported
print("="*80)
if exported:
    print(f"NOTEBOOK EXPORTED AS: {' & '.join(exported)}")
else:
    print("NOTEBOOK EXPORT FAILED!")
print("="*80)