# Validator Module – Structural Geometry Pipeline  
**Author**: Thaddeus da Silva Correa  
**Project**: Automated Extraction and Interpretation of Structural Geometry from CAD Drawings for BIM Integration  
**Module**: 4 of 4 – Validator  
**Environment**: Google Colab  
**Last updated**: June 2025

---

This module validates the structured output generated by the **JSON Builder**. It checks that each file conforms to the expected BIM schema, that shapes are classified correctly, that part labels match expectations, and that the geometry is internally consistent.

**Inputs**: Structured parts data in JSON format (produced by the JSON Builder)  
**Outputs**: Validated parts definitions in JSON format, including error/warning messages if the output is invalid or inconsistent


## 1. Setup  
Import required libraries and define the validation schema for structured BIM JSON files.


In [157]:
import json
import os
import math
from pathlib import Path
from collections import defaultdict, Counter
from typing import Dict, List

import pandas as pd
from IPython.display import display
import jsonschema
from jsonschema import validate, ValidationError

# JSON Schema used to validate the BIM section of the structured output
STRUCTURED_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "bim": {
            "type": "object",
            "properties": {
                "insertion_points": {"type": "object"},
                "chaining_point": {
                    "type": "object",
                    "properties": {
                        "is_insertion_point": {"type": "boolean"},
                        "origin": {
                            "type": "object",
                            "properties": {
                                "x": {"type": "number"},
                                "y": {"type": "number"},
                                "z": {"type": "number"}
                            },
                            "required": ["x", "y", "z"]
                        }
                    },
                    "required": ["is_insertion_point", "origin"]
                },
                "measurement_points": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "x": {"type": "number"},
                            "y": {"type": "number"},
                            "z": {"type": "number"}
                        },
                        "required": ["x", "y", "z"]
                    }
                },
                "parts": {
                    "type": "object",
                    "additionalProperties": {
                        "type": "object",
                        "properties": {
                            "shape": {"type": "string"},
                            "origin": {
                                "type": "object",
                                "properties": {
                                    "x": {"type": "number"},
                                    "y": {"type": "number"},
                                    "z": {"type": "number"},
                                    "rotation": {
                                        "type": "array",
                                        "items": {"type": "number"},
                                        "minItems": 3,
                                        "maxItems": 3
                                    }
                                },
                                "required": ["x", "y", "z"]
                            },
                            "profile": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "x": {"type": "number"},
                                        "y": {"type": "number"},
                                        "z": {"type": "number"},
                                        "type": {"type": "string"}
                                    },
                                    "required": ["x", "y", "z", "type"]
                                }
                            },
                            "cutout": {
                                "type": "array",
                                "items": {"type": "object"}
                            },
                            "bounding_box": {"type": "object"}
                        },
                        "required": ["shape"]
                    }
                },
                "classification": {"type": "object"},
                "insertion_box": {
                    "type": "object",
                    "properties": {
                        "shape": {"type": "string"},
                        "min_point": {
                            "type": "object",
                            "properties": {
                                "x": {"type": "number"},
                                "y": {"type": "number"},
                                "z": {"type": "number"}
                            },
                            "required": ["x", "y", "z"]
                        },
                        "max_point": {
                            "type": "object",
                            "properties": {
                                "x": {"type": "number"},
                                "y": {"type": "number"},
                                "z": {"type": "number"}
                            },
                            "required": ["x", "y", "z"]
                        }
                    },
                    "required": ["shape", "min_point", "max_point"]
                }
            },
            "required": ["parts", "insertion_box"]
        }
    },
    "required": ["bim"]
}
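
For orientation, the smallest document that satisfies the `required` entries of the schema above might look like the dictionary below. The part label `beam_1` and all coordinate values are made up for illustration, and `missing_required` is a hypothetical stdlib-only helper that only checks required keys. It is a sketch for reading the schema, not a substitute for `jsonschema.validate`.

```python
# Hypothetical minimal document: only the keys the schema marks as required.
minimal_doc = {
    "bim": {
        "parts": {
            "beam_1": {"shape": "cuboid"},  # "shape" is the only required part field
        },
        "insertion_box": {
            "shape": "cuboid",
            "min_point": {"x": 0.0, "y": 0.0, "z": 0.0},
            "max_point": {"x": 1.0, "y": 1.0, "z": 1.0},
        },
    }
}


def missing_required(doc: dict) -> list:
    """Return dotted names of required keys that are absent (illustrative helper)."""
    bim = doc.get("bim")
    if not isinstance(bim, dict):
        return ["bim"]

    missing = []
    for key in ("parts", "insertion_box"):
        if key not in bim:
            missing.append(f"bim.{key}")

    box = bim.get("insertion_box")
    if isinstance(box, dict):
        for key in ("shape", "min_point", "max_point"):
            if key not in box:
                missing.append(f"bim.insertion_box.{key}")

    parts = bim.get("parts", {})
    if isinstance(parts, dict):
        for label, part in parts.items():
            if "shape" not in part:
                missing.append(f"bim.parts.{label}.shape")
    return missing


complete = missing_required(minimal_doc)
incomplete = missing_required({"bim": {"parts": {"a": {}}}})
```

An empty result means every required key is present; the value types are still left to the JSON Schema validation.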


## 2. Core Validation Functions  
In this section, we define the core functions used to validate structured geometry output. These functions are organized into the following categories:

**A. I/O and Distance Utilities**  
**B. Bounding Box and Volume Calculations**  
**C. Part Feature Checks**  
**D. Schema Validators**  
**E. Batch Validation Logic**  



### A. I/O and Distance Utilities  
Helper functions for loading structured JSON files and calculating basic geometric distances.


In [158]:
def load_json(filepath: str) -> Dict:
    """
    Load a JSON file from the given filepath.

    Args:
        filepath (str): Path to the JSON file.

    Returns:
        dict: Parsed JSON data.
    """
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)


def euclidean_distance_3d(p1: Dict, p2: Dict) -> float:
    """
    Compute Euclidean distance between two 3D points.

    Args:
        p1 (dict): First point with 'x', 'y', 'z' keys.
        p2 (dict): Second point with 'x', 'y', 'z' keys.

    Returns:
        float: Euclidean distance between p1 and p2.
    """
    return math.sqrt(
        (p1["x"] - p2["x"])**2 +
        (p1["y"] - p2["y"])**2 +
        (p1["z"] - p2["z"])**2
    )


def mean_profile_distance(p1: Dict, p2: Dict) -> float:
    """
    Compute the mean distance between corresponding vertices of two profiles.

    Args:
        p1 (dict): First part with a 'profile' key (a list of points, or a
            dict holding the points under 'vertices').
        p2 (dict): Second part, in the same format.

    Returns:
        float: Mean distance between corresponding vertices in the XY plane,
            or None if the profiles are empty or differ in length.
    """
    def vertices(part: Dict) -> List[Dict]:
        # The schema stores 'profile' as a list; a {'vertices': [...]} dict is accepted too
        profile = part.get("profile", [])
        if isinstance(profile, dict):
            return profile.get("vertices", [])
        return profile

    v1, v2 = vertices(p1), vertices(p2)
    if len(v1) != len(v2) or not v1:
        return None
    return sum(
        math.sqrt((a["x"] - b["x"])**2 + (a["y"] - b["y"])**2)
        for a, b in zip(v1, v2)
    ) / len(v1)
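
As a quick worked example of the two distance measures above, the sketch below recomputes them on hand-picked points. The helper names `distance_3d` and `mean_xy_distance` and all coordinates are illustrative stand-ins, not part of the pipeline.

```python
import math


def distance_3d(p1: dict, p2: dict) -> float:
    """Euclidean distance between two points given as dicts with x, y, z."""
    return math.sqrt(sum((p1[k] - p2[k]) ** 2 for k in ("x", "y", "z")))


def mean_xy_distance(v1: list, v2: list):
    """Mean XY distance between corresponding vertices, or None if not comparable."""
    if not v1 or len(v1) != len(v2):
        return None
    return sum(math.hypot(a["x"] - b["x"], a["y"] - b["y"]) for a, b in zip(v1, v2)) / len(v1)


# sqrt(1^2 + 2^2 + 2^2) = sqrt(9) = 3
d = distance_3d({"x": 0.0, "y": 0.0, "z": 0.0}, {"x": 1.0, "y": 2.0, "z": 2.0})

# First vertex pair is 1 apart, second pair coincides: mean = (1 + 0) / 2
generated = [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 0.0}]
reference = [{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 0.0}]
mean_d = mean_xy_distance(generated, reference)

# Profiles with different vertex counts cannot be compared index by index
mismatch = mean_xy_distance(generated, reference[:1])
```

Because vertices are paired by index, two identical profiles whose vertex lists start at different corners would still report a non-zero error.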


### B. Bounding Box and Volume  
Functions to compute approximate bounding boxes and intersection-over-union (IoU) for part comparison.
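
The revolution case can be sketched in isolation. Assuming the revolution axis is parallel to Z (as the notebook's radius computation implies), each profile vertex traces a circle, and sampling that circle at 0, 90, 180, and 270 degrees already touches the extreme X and Y values. The function name `revolution_bbox` and the tuple-based point format are assumptions made for this example.

```python
import math


def revolution_bbox(profile, axis_xy):
    """Axis-aligned box of a full revolution about an axis parallel to Z."""
    cx, cy = axis_xy
    xs, ys, zs = [], [], []
    for x, y, z in profile:
        r = math.hypot(x - cx, y - cy)  # radius of the circle traced by this vertex
        for deg in (0, 90, 180, 270):   # the four quadrant points
            t = math.radians(deg)
            xs.append(cx + r * math.cos(t))
            ys.append(cy + r * math.sin(t))
            zs.append(z)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# A vertical line at radius 2, from z = 0 to z = 5, swept around the Z axis
lo, hi = revolution_bbox([(2.0, 0.0, 0.0), (2.0, 0.0, 5.0)], axis_xy=(0.0, 0.0))
# lo is approximately (-2, -2, 0) and hi approximately (2, 2, 5)
```

For a part whose `sweep_angle` is less than 360 degrees, this box is an upper bound rather than a tight fit.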


In [159]:
def extract_bounding_box(part: Dict) -> (Dict, Dict):
    """
    Extract the 3D bounding box for a given part, based on its shape type.

    Supports:
    - 'cuboid': uses provided min/max points
    - 'extrusion': projects profile vertices along extrusion path
    - 'revolution': approximates full 360 sweep to generate boundary points

    Args:
        part (dict): Part dictionary from structured output.

    Returns:
        tuple: (min_point, max_point) each a dict with 'x', 'y', 'z' keys.
               Returns (None, None) if bounding box can't be computed.
    """
    shape = part.get("shape")

    if shape == "cuboid":
        return part.get("min_point"), part.get("max_point")

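    elif shape == "extrusion":
        profile = part.get("profile", [])
        if isinstance(profile, dict):  # also accept {"vertices": [...]}
            profile = profile.get("vertices", [])
        path = part.get("path", {})
        if not profile or not path:
            return None, None

        if isinstance(path, list):
            # Path given as a list of 3D points: sweep from the first to the last point
            dx = path[-1]["x"] - path[0]["x"]
            dy = path[-1]["y"] - path[0]["y"]
            dz = path[-1]["z"] - path[0]["z"]
        else:
            # Path given as a single offset vector
            dx, dy, dz = path.get("x", 0), path.get("y", 0), path.get("z", 0)
        points = []
        for v in profile:
            points.append(v)
            points.append({
                "x": v["x"] + dx,
                "y": v["y"] + dy,
                "z": v["z"] + dz,
            })

    elif shape == "revolution":
        profile = part.get("profile", [])
        if isinstance(profile, dict):  # also accept {"vertices": [...]}
            profile = profile.get("vertices", [])
        axis_origin = part.get("axis_origin", {})
        if not profile or not axis_origin:
            return None, None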

        points = []
        for v in profile:
            r = ((v["x"] - axis_origin["x"])**2 + (v["y"] - axis_origin["y"])**2) ** 0.5
            z = v["z"]
            for theta in [0, 90, 180, 270]:  # Sample 4 quadrants
                rad = math.radians(theta)
                points.append({
                    "x": axis_origin["x"] + r * math.cos(rad),
                    "y": axis_origin["y"] + r * math.sin(rad),
                    "z": z
                })

    else:
        return None, None

    # Compute axis-aligned bounding box
    xs = [p["x"] for p in points]
    ys = [p["y"] for p in points]
    zs = [p["z"] for p in points]
    return {
        "x": min(xs), "y": min(ys), "z": min(zs)
    }, {
        "x": max(xs), "y": max(ys), "z": max(zs)
    }


def compute_bounding_box_iou(part_a: Dict, part_b: Dict) -> float:
    """
    Compute the intersection-over-union (IoU) of the bounding boxes of two parts.

    Args:
        part_a (dict): First part dictionary.
        part_b (dict): Second part dictionary.

    Returns:
        float: IoU value between 0 and 1. Returns None if computation fails.
    """
    min_a, max_a = extract_bounding_box(part_a)
    min_b, max_b = extract_bounding_box(part_b)

    if not all((min_a, max_a, min_b, max_b)):
        return None  # Can't compute without complete bounding boxes

    def volume(min_pt: Dict, max_pt: Dict) -> float:
        dx = max_pt["x"] - min_pt["x"]
        dy = max_pt["y"] - min_pt["y"]
        dz = max_pt["z"] - min_pt["z"]
        return max(0, dx) * max(0, dy) * max(0, dz)

    inter_min = {
        "x": max(min_a["x"], min_b["x"]),
        "y": max(min_a["y"], min_b["y"]),
        "z": max(min_a["z"], min_b["z"]),
    }
    inter_max = {
        "x": min(max_a["x"], max_b["x"]),
        "y": min(max_a["y"], max_b["y"]),
        "z": min(max_a["z"], max_b["z"]),
    }

    inter_vol = volume(inter_min, inter_max)
    union_vol = volume(min_a, max_a) + volume(min_b, max_b) - inter_vol

    if union_vol == 0:
        return 0.0
    return inter_vol / union_vol
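
To make the IoU arithmetic concrete, here is a small self-contained sketch on plain `(x, y, z)` tuples. The names `box_volume` and `box_iou` and the example boxes are illustrative; the notebook's own function works on part dictionaries instead.

```python
def box_volume(lo, hi):
    """Volume of an axis-aligned box; inverted extents count as zero."""
    return (
        max(0.0, hi[0] - lo[0])
        * max(0.0, hi[1] - lo[1])
        * max(0.0, hi[2] - lo[2])
    )


def box_iou(lo_a, hi_a, lo_b, hi_b):
    """Intersection-over-union of two axis-aligned boxes."""
    inter_lo = tuple(max(a, b) for a, b in zip(lo_a, lo_b))
    inter_hi = tuple(min(a, b) for a, b in zip(hi_a, hi_b))
    inter = box_volume(inter_lo, inter_hi)
    union = box_volume(lo_a, hi_a) + box_volume(lo_b, hi_b) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2x2 boxes shifted by 1 along X:
# intersection = 1 * 2 * 2 = 4, union = 8 + 8 - 4 = 12, IoU = 4 / 12
overlapping = box_iou((0, 0, 0), (2, 2, 2), (1, 0, 0), (3, 2, 2))

# Boxes that do not touch have an empty intersection
disjoint = box_iou((0, 0, 0), (1, 1, 1), (5, 5, 5), (6, 6, 6))

# A box compared with itself
identical = box_iou((0, 0, 0), (2, 2, 2), (0, 0, 0), (2, 2, 2))
```

Clamping each extent at zero is what makes disjoint boxes yield an intersection volume of 0 instead of a spurious value from negative side lengths.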

### C. Part Feature Checks  
Geometry-specific validations for individual parts, such as planarity, extrusion alignment, volume checks, and anchor symmetry.
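
The extrusion alignment idea can be illustrated on its own: normalize the start-to-end direction and test whether its largest component exceeds a threshold. The sketch below uses a hypothetical `is_axis_aligned` helper on plain tuples with the same 0.99 threshold as the notebook; the sample coordinates are made up.

```python
import math


def is_axis_aligned(start, end, threshold=0.99):
    """True if the direction start -> end is nearly parallel to X, Y, or Z."""
    dx, dy, dz = (e - s for s, e in zip(start, end))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0:
        return False  # a zero-length path has no direction
    # Largest normalized component is the cosine of the angle to the nearest axis
    return max(abs(dx), abs(dy), abs(dz)) / length > threshold


vertical = is_axis_aligned((0, 0, 0), (0, 0, 3000))       # exactly along Z
diagonal = is_axis_aligned((0, 0, 0), (1000, 1000, 0))    # 45 degrees off X and Y
slightly_off = is_axis_aligned((0, 0, 0), (1000, 50, 0))  # about 2.9 degrees off X

# Angular tolerance implied by the threshold: acos(0.99), roughly 8 degrees
tolerance_deg = math.degrees(math.acos(0.99))
```

A threshold of 0.99 therefore accepts paths that deviate by up to about 8 degrees from a principal axis, which may or may not be tight enough for a given drawing set.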


In [160]:
def check_extrusion_path_alignment(part: Dict) -> bool:
    """
    Check if an extrusion's path is aligned along a principal axis (X, Y, or Z).

    Args:
        part (dict): Part dictionary with 'path' key containing a list of 3D points.

    Returns:
        bool: True if path is well-aligned along one axis, False otherwise.
    """
    path = part.get("path", [])
    if len(path) < 2:
        return False

    start, end = path[0], path[-1]
    dx, dy, dz = end["x"] - start["x"], end["y"] - start["y"], end["z"] - start["z"]
    length = (dx**2 + dy**2 + dz**2)**0.5
    if length == 0:
        return False

    # Check if direction is mostly along one axis
    nx, ny, nz = abs(dx / length), abs(dy / length), abs(dz / length)
    return max(nx, ny, nz) > 0.99


def check_profile_planarity(profile: List[Dict]) -> bool:
    """
    Validate that all points in a profile lie on the same Z-plane.

    Args:
        profile (List[dict]): List of 3D points.

    Returns:
        bool: True if profile is planar in Z-direction.
    """
    z_values = [round(p["z"], 4) for p in profile]
    return len(set(z_values)) == 1


def compute_part_volume(part: Dict) -> float:
    """
    Compute an approximate volume of a part.

    Currently only supports:
    - Cuboids (from min/max points)

    Args:
        part (dict): Part dictionary.

    Returns:
        float: Estimated volume in cubic units. Returns 0 for unsupported shapes.
    """
    shape = part.get("shape")
    if shape == "cuboid":
        min_pt = part.get("min_point", {})
        max_pt = part.get("max_point", {})
        dx = max_pt.get("x", 0) - min_pt.get("x", 0)
        dy = max_pt.get("y", 0) - min_pt.get("y", 0)
        dz = max_pt.get("z", 0) - min_pt.get("z", 0)
        return dx * dy * dz
    return 0  # Volume estimation not supported for other shapes yet


def check_anchor_symmetry(instances: List[Dict]) -> bool:
    """
    Check whether a list of anchor parts are symmetrically placed along the X-axis.

    Assumes symmetry across origin (X=0).

    Args:
        instances (List[dict]): List of anchor instances (revolutions).

    Returns:
        bool: True if anchors are symmetrically placed, False otherwise.
    """
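    def origin_x(instance: Dict) -> float:
        # 'axis_origin' may be a dict with an 'x' key or an [x, y, z] list
        origin = instance["axis_origin"]
        return origin["x"] if isinstance(origin, dict) else origin[0]

    xs = sorted(round(origin_x(i), 2) for i in instances if "axis_origin" in i)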
    if len(xs) < 2:
        return False

    for i in range(len(xs) // 2):
        if abs(xs[i] + xs[-(i + 1)]) >= 0.1:
            return False
    return True
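
The pairing logic behind the anchor symmetry check can be tried on bare X coordinates. In this sketch, `is_symmetric_about_zero` is a hypothetical stand-in that takes a list of numbers rather than anchor instances, and the coordinates are invented.

```python
def is_symmetric_about_zero(xs, tol=0.1):
    """Pair the smallest with the largest X and require each pair to sum to ~0."""
    xs = sorted(xs)
    if len(xs) < 2:
        return False
    return all(abs(xs[i] + xs[-(i + 1)]) < tol for i in range(len(xs) // 2))


mirrored = is_symmetric_about_zero([-300.0, -100.0, 100.0, 300.0])  # pairs cancel
shifted = is_symmetric_about_zero([-300.0, -100.0, 100.0, 250.0])   # -300 + 250 = -50
single = is_symmetric_about_zero([120.0])                           # too few anchors

# With an odd count the middle value is never paired, so it is not checked
odd_count = is_symmetric_about_zero([-100.0, 40.0, 100.0])
```

The last case returns `True` even though the middle anchor sits at X = 40, which is worth keeping in mind when interpreting a passing result for an odd number of anchors.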


### D. Schema Validators  
Checks whether structured output JSON files conform to the expected schema definitions using both custom logic and strict JSON Schema validation.


In [161]:
def validate_output_schema(data: dict) -> bool:
    """
    Perform a basic sanity check on the structure of a structured output JSON.

    This validator ensures:
    - The "bim" section exists.
    - Each part under "bim.parts" has a valid label and required fields.
    - Revolution parts include required geometry fields.

    Args:
        data (dict): Parsed JSON object.

    Returns:
        bool: True if validation passes, False otherwise.
    """
    try:
        assert "bim" in data, "'bim' section missing"
        parts = data["bim"].get("parts", {})
        assert isinstance(parts, dict), "'bim.parts' must be a dictionary"

        for label, part in parts.items():
            assert isinstance(label, str), "Part label must be a string"
            assert isinstance(part, dict), "Each part must be a dictionary"
            assert "shape" in part, f"Part '{label}' missing required 'shape' field"

            if part["shape"] == "revolution":
                for field in ["axis_origin", "axis_direction", "sweep_angle"]:
                    assert field in part, f"Revolution part '{label}' missing '{field}'"

        return True

    except AssertionError as e:
        print(f"❌ Schema sanity check failed: {e}")
        return False


def validate_output_schema_strict(data: dict, filename: str = "") -> bool:
    """
    Validate structured output against the full JSON Schema definition.

    Args:
        data (dict): Parsed JSON object.
        filename (str): Optional filename for error reporting.

    Returns:
        bool: True if schema validation succeeds, False otherwise.
    """
    try:
        validate(instance=data, schema=STRUCTURED_OUTPUT_SCHEMA)
        return True
    except ValidationError as e:
        print(f"❌ Schema error in {filename}: {e.message}")
        return False


def run_schema_validation(data: dict, filename: str = "") -> bool:
    """
    Run both shallow and strict validations on structured JSON output.

    Args:
        data (dict): Parsed JSON object.
        filename (str): Optional filename for reporting.

    Returns:
        bool: True if all validations pass.
    """
    if not validate_output_schema(data):
        print(f"❌ Failed shallow validation: {filename}")
        return False

    if not validate_output_schema_strict(data, filename):
        print(f"❌ Failed strict schema validation: {filename}")
        return False

    return True
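
One possible variation on the shallow check, shown here only as a sketch: `assert` statements are skipped when Python runs with the `-O` flag, and the first failure hides any later ones. Collecting messages in a list avoids both. The function name `collect_schema_issues` and the sample part labels are assumptions made for this example.

```python
from typing import Dict, List

REVOLUTION_FIELDS = ("axis_origin", "axis_direction", "sweep_angle")


def collect_schema_issues(data: Dict) -> List[str]:
    """Return every structural problem found instead of stopping at the first."""
    bim = data.get("bim")
    if not isinstance(bim, dict):
        return ["'bim' section missing"]

    parts = bim.get("parts", {})
    if not isinstance(parts, dict):
        return ["'bim.parts' must be a dictionary"]

    issues = []
    for label, part in parts.items():
        if not isinstance(part, dict):
            issues.append(f"Part '{label}' must be a dictionary")
            continue
        if "shape" not in part:
            issues.append(f"Part '{label}' missing required 'shape' field")
        elif part["shape"] == "revolution":
            for field in REVOLUTION_FIELDS:
                if field not in part:
                    issues.append(f"Revolution part '{label}' missing '{field}'")
    return issues


sample = {
    "bim": {
        "parts": {
            "plate": {},  # no shape at all
            "bolt": {"shape": "revolution", "axis_origin": {"x": 0, "y": 0, "z": 0}},
        }
    }
}
issues = collect_schema_issues(sample)
```

Here `issues` holds three messages (one for `plate`, two for `bolt`), so a single run reports everything that needs fixing in the file.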

### E. Batch Validation Logic  
This section processes a folder of generated structured output files and, if available, compares them to ground-truth reference files.
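
Pairing a generated file with its reference is done by file-name prefix. The sketch below reproduces that idea on in-memory paths so it runs without any files; the file names are invented, and `match_reference_strict` is a hypothetical, stricter variant rather than something the notebook defines.

```python
from pathlib import PurePosixPath

SUFFIX = "_structured_output"


def match_reference(gen_file, reference_files):
    """Return the first reference whose stem starts with the generated prefix."""
    prefix = gen_file.stem.replace(SUFFIX, "")
    for ref in reference_files:
        if ref.stem.startswith(prefix):
            return ref
    return None


def match_reference_strict(gen_file, reference_files):
    """Variant that requires an exact stem or a '_' right after the prefix."""
    prefix = gen_file.stem.replace(SUFFIX, "")
    for ref in reference_files:
        if ref.stem == prefix or ref.stem.startswith(prefix + "_"):
            return ref
    return None


gen = PurePosixPath("out/beam_1_structured_output.json")
refs = [
    PurePosixPath("ref/beam_10_reference.json"),
    PurePosixPath("ref/beam_1_reference.json"),
]

loose = match_reference(gen, refs)          # "beam_10_..." also starts with "beam_1"
strict = match_reference_strict(gen, refs)  # only "beam_1_..." qualifies
```

With plain prefix matching, numbered drawings such as `beam_1` and `beam_10` can be paired with the wrong reference depending on listing order, so a naming scheme with a separator after the prefix is safer.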


In [162]:
def validate_structured_outputs(generated_folder, reference_folder=None):
    """
    Validate structured geometry outputs by performing schema checks and comparing to reference files.

    Args:
        generated_folder (str or Path): Folder containing *_structured_output.json files.
        reference_folder (str or Path, optional): Folder with reference JSON files for comparison.

    Returns:
        None. Prints results and displays validation tables using pandas and IPython.
    """
    generated_path = Path(generated_folder)
    reference_path = Path(reference_folder) if reference_folder else None

    generated_files = list(generated_path.glob("*_structured_output.json"))
    reference_files = list(reference_path.glob("*.json")) if reference_folder else []

    summary_rows = []

    def match_reference(gen_file):
        gen_prefix = gen_file.stem.replace("_structured_output", "")
        for ref in reference_files:
            if ref.stem.startswith(gen_prefix):
                return ref
        return None

    for gen_file in generated_files:
        gen_data = load_json(gen_file)
        if not run_schema_validation(gen_data, gen_file.name):
            continue  # Skip invalid file

        ref_file = match_reference(gen_file)

        if not ref_file:
            ref_data = None  # No reference data
        else:
            ref_data = load_json(ref_file)

        # Validate content
        gen_parts = gen_data.get("bim", {}).get("parts", {})

        # --- Unsupervised quality checks (no GT needed) ---
        unsupervised_issues = []
        total_part_volume = 0

        for label, part in gen_parts.items():
            shape = part.get("shape", "").lower()

            # Compute part volume
            part_volume = compute_part_volume(part)
            total_part_volume += part_volume

            # Check extrusion alignment
            if shape == "extrusion" and not check_extrusion_path_alignment(part):
                unsupervised_issues.append(f" {label}: misaligned extrusion axis")

            # Check profile planarity
            if shape in {"extrusion", "revolution"} and not check_profile_planarity(part.get("profile", [])):
                unsupervised_issues.append(f" {label}: non-planar profile")

            # Symmetry check for parts labeled with 'anchor' (also covers the 'anchors' prefix)
            if "anchor" in label.lower() and not check_anchor_symmetry(part.get("instances", [])):
                unsupervised_issues.append(f" {label}: anchor symmetry issue")

            # Shape-specific field checks
            if shape == "extrusion":
                if "path" not in part or not isinstance(part["path"], list):
                    unsupervised_issues.append(f" {label}: missing or invalid 'path' for extrusion")

            if shape == "revolution":
                if "axis_origin" not in part:
                    unsupervised_issues.append(f" {label}: missing 'axis_origin' for revolution")
                if "sweep_angle" in part and not (0 < part["sweep_angle"] <= 360):
                    unsupervised_issues.append(f" {label}: sweep_angle out of bounds (0–360)")

            # Generic numeric sanity checks
            origin = part.get("origin", {})
            if any(origin.get(dim, 0) != origin.get(dim, 0) for dim in ("x", "y", "z")):  # NaN check
                unsupervised_issues.append(f" {label}: invalid origin coordinates (NaN)")

            profile = part.get("profile", [])
            for i, pt in enumerate(profile):
                for dim in ("x", "y", "z"):
                    val = pt.get(dim)
                    if not isinstance(val, (int, float)) or not math.isfinite(val):
                        unsupervised_issues.append(f" {label}: profile point {i} has invalid {dim}={val}")

        # Estimate bounding box volume and part coverage.
        # The loop above also counted the 'bounding_box' entry, so subtract it again.
        bbox_volume = compute_part_volume(gen_parts.get("bounding_box", {}))
        if bbox_volume > 0:
            coverage_ratio = (total_part_volume - bbox_volume) / bbox_volume
            if coverage_ratio < 0.05:
                unsupervised_issues.append(" volume coverage too low (<5%)")
            elif coverage_ratio > 1.5:
                unsupervised_issues.append(" volume coverage too high (>150%)")

        # --- Initialize variables for errors ---
        iou_results = []
        origin_errors = []
        profile_errors = []
        cutout_results = []

        if ref_data:
            ref_parts = ref_data.get("bim", {}).get("parts", {})

            # --- Compute bounding box IoU for all supported shapes ---
            shared_labels = set(gen_parts.keys()) & set(ref_parts.keys())
            for label in shared_labels:
                gen_part = gen_parts[label]
                ref_part = ref_parts[label]

                iou = compute_bounding_box_iou(gen_part, ref_part)
                if iou is not None:
                    iou_results.append({
                        "File": gen_file.name,
                        "Label": label,
                        "Shape": gen_part.get("shape"),
                        "IoU": round(iou, 3)
                    })

            # --- Compute origin placement error for parts with 'origin' ---
            for label in shared_labels:
                gen_part = gen_parts[label]
                ref_part = ref_parts[label]

                if "origin" in gen_part and "origin" in ref_part:
                    origin_err = euclidean_distance_3d(gen_part["origin"], ref_part["origin"])
                    origin_errors.append({
                        "File": gen_file.name,
                        "Label": label,
                        "Origin Error": round(origin_err, 4)
                    })

            # --- Compare profile geometry (for extrusions and revolutions) ---
            for label in shared_labels:
                gen_part = gen_parts[label]
                ref_part = ref_parts[label]

                if gen_part.get("shape") in {"extrusion", "revolution"} and ref_part.get("shape") == gen_part.get("shape"):
                    profile_err = mean_profile_distance(gen_part, ref_part)
                    if profile_err is not None:
                        profile_errors.append({
                            "File": gen_file.name,
                            "Label": label,
                            "Profile Error": round(profile_err, 4)
                        })

            for label in shared_labels:
                gen_part = gen_parts[label]
                ref_part = ref_parts[label]

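                # The schema names this field 'cutout'; 'cutouts' is accepted as well
                gen_cutouts = gen_part.get("cutouts", gen_part.get("cutout", []))
                ref_cutouts = ref_part.get("cutouts", ref_part.get("cutout", []))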

                if isinstance(gen_cutouts, list) and isinstance(ref_cutouts, list):
                    tp = min(len(gen_cutouts), len(ref_cutouts))  # assumed matched cutouts
                    fp = max(0, len(gen_cutouts) - len(ref_cutouts))
                    fn = max(0, len(ref_cutouts) - len(gen_cutouts))

                    precision = tp / (tp + fp) if (tp + fp) else 0
                    recall = tp / (tp + fn) if (tp + fn) else 0
                    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0

                    cutout_results.append({
                        "File": gen_file.name,
                        "Label": label,
                        "Gen Cutouts": len(gen_cutouts),
                        "Ref Cutouts": len(ref_cutouts),
                        "Precision": round(precision, 2),
                        "Recall": round(recall, 2),
                        "F1 Score": round(f1, 2)
                    })

        # Label counts
        gen_labels = Counter(gen_parts.keys())
        ref_labels = Counter(ref_parts.keys()) if ref_data else {}

        gen_shapes = Counter(p.get("shape", "unknown") for p in gen_parts.values() if isinstance(p, dict))
        ref_shapes = Counter(p.get("shape", "unknown") for p in ref_parts.values() if isinstance(p, dict)) if ref_data else {}

        # Scores
        matched_labels = set(gen_labels.keys()) & set(ref_labels.keys()) if ref_data else set(gen_labels.keys())
        label_precision = len(matched_labels) / len(gen_labels) if gen_labels else 0
        label_recall = len(matched_labels) / len(ref_labels) if ref_labels else 0

        matched_shapes = set(gen_shapes.keys()) & set(ref_shapes.keys()) if ref_data else set(gen_shapes.keys())
        shape_accuracy = len(matched_shapes) / len(ref_shapes) if ref_shapes else 0

        generalization_score = (label_precision + label_recall + shape_accuracy) / 3

        # Bounding box presence
        has_bbox = "bounding_box" in gen_parts or any("bounding_box" in k for k in gen_parts)

        summary = {
            "File": gen_file.name,
            "Part Count": len(gen_parts),
            "Label Count": len(gen_labels),
            "Shape Count": len(gen_shapes),
            "Has Bounding Box": has_bbox,
        }

        if ref_data:
            summary.update({
                "Part Count Error": abs(len(gen_parts) - len(ref_parts)),
                "Label Precision": round(label_precision, 2),
                "Label Recall": round(label_recall, 2),
                "Shape Accuracy": round(shape_accuracy, 2),
                "Generalization Score": round(generalization_score, 2),
            })
        else:
            summary.update({
                "Part Count Error": "N/A",
                "Label Precision": round(label_precision, 2),
                "Label Recall": "N/A",
                "Shape Accuracy": "N/A",
                "Generalization Score": "N/A",
            })

        summary_rows.append(summary)


        # --- Per-label metrics ---
        all_labels = set(ref_labels.keys()) | set(gen_labels.keys()) if ref_data else set(gen_labels.keys())
        label_metrics = []

        for label in sorted(all_labels):
            tp = int(label in gen_labels and label in ref_labels) if ref_data else 0
            fp = int(label in gen_labels and label not in ref_labels) if ref_data else 0
            fn = int(label in ref_labels and label not in gen_labels) if ref_data else 0

            precision = tp / (tp + fp) if (tp + fp) else 0
            recall = tp / (tp + fn) if (tp + fn) else 0
            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0

            label_metrics.append({
                "File": gen_file.name,
                "Label": label,
                "TP": tp,
                "FP": fp,
                "FN": fn,
                "Precision": round(precision, 2),
                "Recall": round(recall, 2),
                "F1 Score": round(f1, 2)
            })

        label_metrics_df = pd.DataFrame(label_metrics)
        iou_df = pd.DataFrame(iou_results)
        if not iou_df.empty:
            print("\n Bounding Box IoU (Cuboid, Extrusion, Revolution)")
            display(iou_df)

        origin_df = pd.DataFrame(origin_errors)
        if not origin_df.empty:
            print("\n Origin Placement Error (Euclidean Distance)")
            display(origin_df)

        profile_df = pd.DataFrame(profile_errors)
        if not profile_df.empty:
            print("\n Profile Geometry Error (Mean Vertex Distance)")
            display(profile_df)

        cutout_df = pd.DataFrame(cutout_results)
        if not cutout_df.empty:
            print("\n Cutout Count Accuracy")
            display(cutout_df)

        # Print details
        print(f"\n {gen_file.name}")
        if unsupervised_issues:
            print(" Unsupervised Validation Issues:")
            for issue in unsupervised_issues:
                print(f"   {issue}")
        else:
            print(" Unsupervised checks passed.")
        print(f" Labels: {dict(gen_labels)}")
        print(f" Shapes: {dict(gen_shapes)}")

    # Summary (shown once, after all files have been processed)
    if not summary_rows:
        print("\n No valid structured-reference matches were found.")
        return

    summary_df = pd.DataFrame(summary_rows)
    print("\n Validation Summary")
    display(summary_df)

    score_cols = ["File", "Label Precision", "Label Recall", "Shape Accuracy", "Generalization Score"]
    if all(col in summary_df.columns for col in score_cols):
        print("\n Scoring Overview")
        display(summary_df[score_cols])
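
The cutout comparison in the batch function is purely count-based: it assumes the overlapping number of cutouts are correct matches. A worked example of that arithmetic, using a hypothetical helper `count_based_prf`, might look like this.

```python
def count_based_prf(n_generated: int, n_reference: int):
    """Precision, recall, and F1 from item counts alone (no geometric matching)."""
    tp = min(n_generated, n_reference)      # assumed matches
    fp = max(0, n_generated - n_reference)  # surplus generated items
    fn = max(0, n_reference - n_generated)  # reference items with no counterpart

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 3 generated cutouts against 4 reference cutouts: tp = 3, fp = 0, fn = 1
precision, recall, f1 = count_based_prf(3, 4)
# precision = 3/3, recall = 3/4, F1 = 2 * 1 * 0.75 / 1.75 = 6/7

# No cutouts on either side yields zeros rather than a division error
empty = count_based_prf(0, 0)
```

Since positions are never compared, a part with the right number of cutouts in the wrong places still scores 1.0; the metric is best read as a count check.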


## 3. File Paths  
Define where the validator will read structured output files from, and (optionally) where the ground truth reference files are located.

If no reference is available, only a generated folder is needed.  
If reference validation is desired, provide both.
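
Before running the validator, it can help to confirm that the chosen folder actually contains files matching the expected naming pattern. The sketch below does this against a temporary directory so it is runnable anywhere; `list_structured_outputs` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def list_structured_outputs(folder):
    """Return the structured output files in a folder, sorted by name."""
    return sorted(Path(folder).glob("*_structured_output.json"))


with tempfile.TemporaryDirectory() as tmp:
    # One file that follows the naming convention and one that does not
    (Path(tmp) / "beam_1_structured_output.json").write_text("{}", encoding="utf-8")
    (Path(tmp) / "notes.json").write_text("{}", encoding="utf-8")

    found = [p.name for p in list_structured_outputs(tmp)]
```

Only the file ending in `_structured_output.json` is picked up; an empty list here would explain a validator run that prints nothing.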


In [163]:
# Define the input directories for validation
# Set your actual folder paths here
parts_folder = Path("")
reference_folder = None  # Optional: set to a Path to enable reference comparison (Path("") means the current directory)

## 4. Execute Validator  
This section runs validation on all structured output JSON files in the given folder.  
If a reference folder is provided, comparison will also be performed.


In [None]:
validate_structured_outputs(parts_folder, reference_folder)

## Final Notes

This notebook is part of a **four-step modular pipeline** for extracting and validating BIM-ready geometry from structural engineering drawings.

### Output Location
- Validation results are displayed in the notebook via dataframes and printed logs.
- No new files are written unless the notebook is extended with report generation.

### How to Run
1. Set your `parts_folder` (and optionally `reference_folder`) paths in **Section 3**.
2. Ensure structured output files (`*_structured_output.json`) exist in the input folder.
3. If reference data is available, place ground truth JSON files in the reference folder.
4. Run all cells from top to bottom to validate schema and geometry.

### Next Step
- This is the final step in the pipeline. Review any flagged validation issues.

### Documentation
For full setup instructions and pipeline details, see the [README.md](https://github.com/ThadaMan/Thesis/blob/main/README.md) in the repository.
