# JSON Builder Module – Structural Geometry Pipeline  
**Author**: Thaddeus da Silva Correa  
**Project**: Automated Extraction and Interpretation of Structural Geometry from CAD Drawings for BIM Integration  
**Module**: 3 of 4 – JSON Builder  
**Environment**: Google Colab  
**Last updated**: June 2025

---

This module takes the interpreted geometry data in `*_parts.json` files and converts it into a BIM-compatible JSON format. It formats each type of part (extrusions, revolutions, cuboids, and custom shapes), assigns semantic labels, checks the result against basic schema rules, and saves the structured output for use in structural engineering applications.

**Inputs**: Geometry data in JSON format (produced by the Geometry Interpreter)  
**Outputs**: Structured parts definitions in JSON, including labeled extrusions, revolutions, and cuboids with metadata


## 1. Setup  
Import the required libraries. Input and output paths are defined in Section 3.


In [17]:
import json
from typing import List, Dict
from collections import defaultdict
from pathlib import Path
import uuid

## 2. Core Parsing Functions  
In this section, we define the core functions used to format and structure the parsed parts into JSON. These functions are organized into the following categories:

**A. Geometry and Shape Normalization**  
**B. Part Formatting**  
**C. Output Validation**  
**D. File I/O**  
**E. Part Processing**


### A. Geometry and Shape Normalization  
Normalize point profiles, shape types, and labels, then assign each part a semantic label (channel, plate, anchors, bounding box, and so on).
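Labels pass through two lookup tables: `normalize_label` maps raw or model-generated labels to canonical names, and `map_label_to_shape` maps canonical names to shape types, falling back to `custom` (at which point `guess_shape_from_geometry` inspects the geometry keys instead). The chain can be sketched with a few entries copied from those tables; `label_to_shape` and the two constant names are hypothetical, and the tables here are deliberately incomplete.

```python
from typing import Tuple

# Condensed copies of a few entries from the two lookup tables in Section A (not the full tables).
LABEL_ALIASES = {"main_body": "channel", "support": "bracket", "anchor": "anchors", "extruded": "unlabeled"}
LABEL_TO_SHAPE = {"channel": "extrusion", "plate": "extrusion", "bracket": "extrusion",
                  "anchors": "rebar", "bounding_box": "cuboid"}


def label_to_shape(raw_label: str) -> Tuple[str, str]:
    """Return (canonical label, shape); unknown labels pass through and map to 'custom'."""
    label = raw_label.lower()
    canonical = LABEL_ALIASES.get(label, label)
    return canonical, LABEL_TO_SHAPE.get(canonical, "custom")


for raw in ["Main_Body", "support", "anchor", "extruded"]:
    print(raw, "->", label_to_shape(raw))
```

Note that `extruded` ends up as `unlabeled` with a `custom` shape, which is exactly the case where geometry-based inference takes over.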


In [18]:
def flatten_profile(points: List) -> List[List[float]]:
    """
    Project a list of coordinate points onto the z=0 plane, producing the
    [x, y, 0.0] lists that format_extrusion and format_custom expect.
    Entries that are not lists/tuples with at least two values are dropped.
    """
    return [[p[0], p[1], 0.0] for p in points if isinstance(p, (list, tuple)) and len(p) >= 2]


def normalize_shape(part: Dict) -> str:
    """
    Normalize the geometric type of a part for standardization.
    """
    t = part.get("type", "").lower()
    if t == "extrusion":
        return "extrusion"
    elif t == "revolution":
        return "revolution"
    elif t in {"bounding_box", "cuboid", "box"}:
        return "cuboid"
    return "custom"


def normalize_label(label: str) -> str:
    """
    Normalize raw or model-generated labels into canonical form.
    """
    label = label.lower()
    mapping = {
        "main_body": "channel",
        "channel": "channel",
        "bracket": "bracket",
        "support": "bracket",
        "bounding_box": "bounding_box",
        "channel_bounding_box": "channel_bounding_box",
        "anchors_bounding_box": "anchors_bounding_box",
        "anchor": "anchors",
        "extruded": "unlabeled",
        "unlabeled": "unlabeled",
    }
    return mapping.get(label, label)


def guess_shape_from_geometry(part: Dict) -> str:
    """
    Infer shape type from geometry keys present in the part dictionary.
    """
    if "profile" in part and "path" in part:
        return "extrusion"
    if "axis" in part:
        return "revolution"
    if "boundingBox" in part or "parameters" in part:
        return "cuboid"
    return "custom"


def map_label_to_shape(label: str) -> str:
    """
    Map normalized labels to canonical shape types.
    """
    shape_map = {
        "anchors": "rebar",
        "channel": "extrusion",
        "plate": "extrusion",
        "bracket": "extrusion",
        "bounding_box": "cuboid",
        "anchors_bounding_box": "cuboid",
        "channel_bounding_box": "cuboid",
    }
    return shape_map.get(label, "custom")


def assign_labels_smart(part: Dict) -> str:
    """
    Assign the most appropriate label to a part based on its geometry and existing metadata.
    """
    shape = normalize_shape(part)
    label = normalize_label(part.get("label", ""))

    # Handle bounding box types
    if part.get("type") == "bounding_box" or "bounding_box" in label:
        return label or "bounding_box"

    # Handle revolutions (typically anchors)
    if shape == "revolution" or label in {"anchor", "anchors"}:
        return "anchors"

    # Classify extrusion shapes by length
    if shape == "extrusion":
        profile = part.get("profile", [])
        path = part.get("path", [])
        if profile and path and len(path) > 1:
            p0, p1 = path[0], path[-1]
            dz = abs(p1[2] - p0[2])
            if dz > 500:  # Threshold to distinguish channel from plate
                return "channel"
            return "plate"

    return label if label else "unlabeled"
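The channel-versus-plate rule in `assign_labels_smart` depends only on the z-extent of the extrusion path, so it can be isolated and checked on its own. The sketch below assumes path points are `[x, y, z]` lists in the drawing's units (the threshold of 500 suggests millimetres, but the source does not state units); `classify_extrusion` and its `threshold` parameter are hypothetical names. Unlike the original, it does not also require a non-empty profile.

```python
from typing import List, Optional, Sequence


def classify_extrusion(path: List[Sequence[float]], threshold: float = 500.0) -> Optional[str]:
    """Return 'channel' or 'plate' from the z-extent of an extrusion path.

    Mirrors the rule in assign_labels_smart: only the first and last path
    points are compared, and only along z. Returns None when the path has
    fewer than two points, i.e. when no length can be measured.
    """
    if len(path) < 2:
        return None
    dz = abs(path[-1][2] - path[0][2])
    # Strictly greater than the threshold counts as a channel, as in the original.
    return "channel" if dz > threshold else "plate"


print(classify_extrusion([[0, 0, 0], [0, 0, 1200]]))
print(classify_extrusion([[0, 0, 0], [0, 0, 20]]))
```

Because only the z-axis is measured, a channel drawn along x or y would be classified as a plate; measuring the full distance between the first and last path points is one alternative if drawings are not always z-aligned.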

### B. Part Formatting  
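Helpers that promote a lone ambiguous extrusion to a channel and compute the overall bounding box, plus formatters that convert interpreted geometry into standardized JSON-compatible structures for extrusion, revolution, cuboid, and custom parts.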
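`format_extrusion` and `format_custom` both convert `[x, y, z]` lists into rounded point dictionaries, tagging profile points as `boundary` and path points as `centerline`. A minimal shared helper could look like this; `format_points` is a hypothetical name, and like the originals it silently drops entries that are not 3-element lists or tuples.

```python
from typing import Dict, List, Sequence, Union


def format_points(points: List[Sequence[float]], point_type: str,
                  ndigits: int = 4) -> List[Dict[str, Union[float, str]]]:
    """Round 3D points and tag each one with a point type ('boundary' or 'centerline')."""
    return [
        {"x": round(p[0], ndigits), "y": round(p[1], ndigits), "z": round(p[2], ndigits), "type": point_type}
        for p in points
        if isinstance(p, (list, tuple)) and len(p) == 3  # skip malformed entries
    ]


profile = format_points([[0.123456, 1, 0], [5, 5], "bad"], "boundary")
print(profile)
```

Only the first point survives here: `[5, 5]` has two coordinates and `"bad"` is not a list, so both are filtered out.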


In [19]:
def force_single_channel(parts: List[Dict]):
    """
    If only one plausible extrusion exists in the parts list and its label is ambiguous,
    force it to be a 'channel' for clarity in downstream processing.
    """
    candidates = [
        p for p in parts
        if normalize_shape(p) == "extrusion" and p.get("label") in {"bracket", "plate", "extruded", "unlabeled"}
    ]
    if len(candidates) == 1:
        candidates[0]["label"] = "channel"
        candidates[0]["shape"] = "extrusion"
        candidates[0]["classification_reason"] = "forced channel due to only plausible extrusion"


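def compute_overall_bounding_box(parts: List[Dict]) -> Dict:
    """
    Compute the global axis-aligned bounding box enclosing all parts with size
    information, taken from 'boundingBox' (min/max lists) or from 'parameters'
    (center and full dimensions). Parts with neither are skipped; if none qualify,
    a zero-size box at the origin is returned instead of infinite bounds.
    """
    mins = [float("inf")] * 3
    maxs = [float("-inf")] * 3

    for part in parts:
        if "boundingBox" in part:
            lo = part["boundingBox"].get("min", [0, 0, 0])
            hi = part["boundingBox"].get("max", [0, 0, 0])
        elif part.get("parameters", {}).get("dimensions"):
            c = part["parameters"].get("center", {})
            d = part["parameters"]["dimensions"]
            center = (c.get("x", 0), c.get("y", 0), c.get("z", 0))
            half = (d.get("length", 0) / 2, d.get("width", 0) / 2, d.get("height", 0) / 2)
            lo = [center[i] - half[i] for i in range(3)]
            hi = [center[i] + half[i] for i in range(3)]
        else:
            continue  # no size information: don't add a phantom box at the origin

        for i in range(3):
            mins[i] = min(mins[i], lo[i])
            maxs[i] = max(maxs[i], hi[i])

    if mins[0] == float("inf"):
        mins, maxs = [0.0] * 3, [0.0] * 3

    return {
        "type": "cuboid",
        "min_point": {"x": mins[0], "y": mins[1], "z": mins[2]},
        "max_point": {"x": maxs[0], "y": maxs[1], "z": maxs[2]}
    }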


def format_extrusion(part: Dict) -> Dict:
    """
    Format an extrusion part into a standardized output schema with profile and path geometry.
    """
    def fmt_point(p): return {
        "x": round(p[0], 4), "y": round(p[1], 4), "z": round(p[2], 4), "type": "boundary"
    }

    def fmt_path_point(p): return {
        "x": round(p[0], 4), "y": round(p[1], 4), "z": round(p[2], 4), "type": "centerline"
    }

    profile_raw = part.get("profile", [])
    if not profile_raw or not isinstance(profile_raw, list):
        raise ValueError("Extrusion part missing or invalid 'profile' points")

    profile = [fmt_point(pt) for pt in profile_raw if isinstance(pt, (list, tuple)) and len(pt) == 3]

    path_raw = part.get("path", [])
    path = [fmt_path_point(pt) for pt in path_raw if isinstance(pt, (list, tuple)) and len(pt) == 3]

    origin_pt = part.get("origin", [0, 0, 0])
    if not isinstance(origin_pt, list) or len(origin_pt) != 3:
        origin_pt = [0, 0, 0]

    output = {
        "type": "extrusion",
        "origin": {
            "x": round(origin_pt[0], 4),
            "y": round(origin_pt[1], 4),
            "z": round(origin_pt[2], 4),
            "rotation": [0, 0, 0]
        },
        "profile": profile,
        "path": path
    }

    if isinstance(part.get("cutout"), list) and part["cutout"]:
        output["cutout"] = part["cutout"]

    return output


def format_revolution(part: Dict) -> Dict:
    """
    Format a revolution part using axis and sweep parameters.
    """
    axis = part.get("axis", {})
    origin = axis.get("origin", [0, 0, 0])
    direction = axis.get("direction", [0, 0, 1])
    angle = part.get("angle", 360)

    if not (isinstance(origin, list) and len(origin) == 3):
        origin = [0, 0, 0]
    if not (isinstance(direction, list) and len(direction) == 3):
        direction = [0, 0, 1]
    if not isinstance(angle, (int, float)):
        angle = 360

    return {
        "shape": "revolution",
        "axis_origin": [round(x, 4) for x in origin],
        "axis_direction": [round(x, 4) for x in direction],
        "sweep_angle": round(angle, 4),
        "origin": {
            "x": round(origin[0], 4),
            "y": round(origin[1], 4),
            "z": round(origin[2], 4),
            "rotation": [0, 0, 0]
        }
    }


def format_cuboid(part: Dict) -> Dict:
    """
    Format a cuboid part from bounding box or parameter info.
    """
    if "boundingBox" in part:
        bbox = part["boundingBox"]
        min_pt = bbox.get("min", [0, 0, 0])
        max_pt = bbox.get("max", [0, 0, 0])
    elif "parameters" in part:
        center = part["parameters"].get("center", {"x": 0, "y": 0, "z": 0})
        dims = part["parameters"].get("dimensions", {"length": 0, "width": 0, "height": 0})
        half_l = dims.get("length", 0) / 2
        half_w = dims.get("width", 0) / 2
        half_h = dims.get("height", 0) / 2
        min_pt = [
            center.get("x", 0) - half_l,
            center.get("y", 0) - half_w,
            center.get("z", 0) - half_h
        ]
        max_pt = [
            center.get("x", 0) + half_l,
            center.get("y", 0) + half_w,
            center.get("z", 0) + half_h
        ]
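    elif "min_point" in part and "max_point" in part:
        # Already in min/max form, e.g. the result of compute_overall_bounding_box
        min_pt = [part["min_point"].get(k, 0) for k in ("x", "y", "z")]
        max_pt = [part["max_point"].get(k, 0) for k in ("x", "y", "z")]
        if any(abs(v) == float("inf") for v in min_pt + max_pt):
            min_pt, max_pt = [0, 0, 0], [0, 0, 0]  # no part contributed size information
    else:
        min_pt = [0, 0, 0]
        max_pt = [0, 0, 0]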

    return {
        "shape": "cuboid",
        "min_point": {
            "x": round(min_pt[0], 4),
            "y": round(min_pt[1], 4),
            "z": round(min_pt[2], 4)
        },
        "max_point": {
            "x": round(max_pt[0], 4),
            "y": round(max_pt[1], 4),
            "z": round(max_pt[2], 4)
        }
    }


def format_custom(part: Dict) -> Dict:
    """
    Fallback formatter for shapes that don't fit standard categories (extrusion, revolution, cuboid).
    """
    allowed_keys = {
        "origin", "cutout", "profile", "path",
        "axis_origin", "axis_direction", "sweep_angle"
    }
    cleaned = {
        "shape": "custom"
    }

    for key in allowed_keys:
        val = part.get(key)
        if val is None:
            continue

        if key == "origin" and isinstance(val, list) and len(val) == 3:
            cleaned["origin"] = {
                "x": round(val[0], 4),
                "y": round(val[1], 4),
                "z": round(val[2], 4),
                "rotation": [0, 0, 0]
            }

        elif key in {"profile", "path"} and isinstance(val, list):
            pt_type = "boundary" if key == "profile" else "centerline"
            points = [
                {"x": round(p[0], 4), "y": round(p[1], 4), "z": round(p[2], 4), "type": pt_type}
                for p in val if isinstance(p, (list, tuple)) and len(p) == 3
            ]
            if points:
                cleaned[key] = points

        elif key in {"axis_origin", "axis_direction"} and isinstance(val, list) and len(val) == 3:
            cleaned[key] = [round(v, 4) for v in val]

        elif key == "cutout" and isinstance(val, list):
            cleaned["cutout"] = val

        elif key == "sweep_angle" and isinstance(val, (int, float)):
            cleaned["sweep_angle"] = round(val, 4)

    if "origin" not in cleaned:
        cleaned["origin"] = {
            "x": 0,
            "y": 0,
            "z": 0,
            "rotation": [0, 0, 0]
        }

    return cleaned
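At its core, `compute_overall_bounding_box` is a union of axis-aligned boxes: for a part described by a center and full dimensions, the box spans half of each dimension on either side of the center, and the overall box takes the smallest minimum and the largest maximum on each axis. The sketch below shows that union on its own, with two choices made explicit as assumptions: parts without dimensions are skipped rather than treated as a zero-size box at the origin, and an empty input returns `None` rather than infinite bounds. `union_bounds` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]


def union_bounds(parts: List[Dict]) -> Optional[Tuple[Point, Point]]:
    """Return (min_point, max_point) enclosing every part that has dimensions, or None."""
    mins = [float("inf")] * 3
    maxs = [float("-inf")] * 3
    found = False
    for part in parts:
        params = part.get("parameters", {})
        dims = params.get("dimensions")
        if not dims:
            continue  # no size information for this part
        center = params.get("center", {})
        c = (center.get("x", 0), center.get("y", 0), center.get("z", 0))
        half = (dims.get("length", 0) / 2, dims.get("width", 0) / 2, dims.get("height", 0) / 2)
        for i in range(3):
            mins[i] = min(mins[i], c[i] - half[i])
            maxs[i] = max(maxs[i], c[i] + half[i])
        found = True
    if not found:
        return None
    return tuple(mins), tuple(maxs)


demo_parts = [
    {"parameters": {"center": {"x": 0, "y": 0, "z": 0}, "dimensions": {"length": 2, "width": 2, "height": 2}}},
    {"parameters": {"center": {"x": 5, "y": 0, "z": 0}, "dimensions": {"length": 2, "width": 4, "height": 2}}},
    {"type": "revolution"},  # skipped: no dimensions
]
print(union_bounds(demo_parts))  # ((-1.0, -2.0, -1.0), (6.0, 2.0, 1.0))
```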


### C. Output Validation  
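Check the generated BIM JSON for the required `general` metadata keys and for the attributes each part needs given its shape. Problems are printed as warnings; validation does not raise or stop the pipeline.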

In [20]:
def validate_output_schema(output: Dict):
    """
    Validate the structure of the output JSON against the required schema.

    Checks for:
    - Required fields in the 'general' metadata section.
    - Required attributes per part depending on its shape.
    """
    required_general_keys = {"product", "variant", "reference", "type", "material", "manufacturer"}
    general = output.get("general", {})
    missing = [k for k in required_general_keys if k not in general]

    if missing:
        print(f" Missing general keys: {missing}")

    parts = output.get("bim", {}).get("parts", {})
    for label, part in parts.items():
        shape = part.get("shape", "")
        if shape == "extrusion":
            for key in ["profile", "origin"]:
                if key not in part:
                    print(f" Part '{label}' is missing required extrusion key '{key}'")

        elif shape == "rebar":
            if "instances" not in part or not isinstance(part["instances"], list):
                print(f" Part '{label}' has invalid or missing 'instances' list for rebar")

        elif shape == "cuboid":
            if "min_point" not in part or "max_point" not in part:
                print(f" Part '{label}' is missing min/max points for cuboid")
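Because `validate_output_schema` only prints, its findings cannot be asserted in a test or used to stop a batch run. A variant that returns the problems as a list is sketched below; it checks the same keys as the original, and the names `collect_schema_issues` and `REQUIRED_GENERAL_KEYS` are hypothetical.

```python
from typing import Dict, List

REQUIRED_GENERAL_KEYS = {"product", "variant", "reference", "type", "material", "manufacturer"}


def collect_schema_issues(output: Dict) -> List[str]:
    """Return schema problems as strings; an empty list means every check passed."""
    issues = []
    general = output.get("general", {})
    for key in sorted(REQUIRED_GENERAL_KEYS - set(general)):
        issues.append(f"missing general key '{key}'")

    for label, part in output.get("bim", {}).get("parts", {}).items():
        shape = part.get("shape", "")
        if shape == "extrusion":
            issues += [f"part '{label}' missing extrusion key '{k}'" for k in ("profile", "origin") if k not in part]
        elif shape == "rebar" and not isinstance(part.get("instances"), list):
            issues.append(f"part '{label}' has invalid or missing 'instances' list")
        elif shape == "cuboid" and not ("min_point" in part and "max_point" in part):
            issues.append(f"part '{label}' missing min/max points")
    return issues


demo_output = {
    "general": {"product": "HTA", "variant": "X", "reference": "X", "type": "cast_in_channel", "material": "HCR"},
    "bim": {"parts": {
        "channel": {"shape": "extrusion", "origin": {"x": 0, "y": 0, "z": 0}},
        "anchors": {"shape": "rebar", "instances": []},
    }},
}
for issue in collect_schema_issues(demo_output):
    print(issue)
```

The caller can then choose a policy, for example skipping the save or raising an error when the list is non-empty.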

### D. File I/O  
Save the structured output dictionary to a JSON file with two-space indentation.

In [21]:
def save_json_output(data: Dict, output_path: str):
    """
    Save the final structured JSON output to disk with indentation.

    Args:
        data (Dict): Final JSON-compatible dictionary.
        output_path (str): Filepath where the JSON will be saved.
    """
    with open(output_path, "w") as f:
        json.dump(data, f, indent=2)
    print(f" Saved structured output to {output_path}")
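By default `json.dump` writes non-finite floats as `Infinity` or `NaN`, which are not valid JSON and are rejected by strict parsers. A hedged variant of `save_json_output` is sketched below: `allow_nan=False` makes `json.dump` raise `ValueError` instead, and writing to a temporary file before `os.replace` avoids leaving a half-written output file if serialization fails. The name `save_json_strict` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, Union


def save_json_strict(data: Dict, output_path: Union[str, Path]) -> None:
    """Write `data` as strict JSON, replacing `output_path` only if serialization succeeds."""
    output_path = Path(output_path)
    # Temporary file in the target directory, so os.replace is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=str(output_path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            # allow_nan=False raises ValueError for inf/NaN instead of writing Infinity/NaN.
            json.dump(data, f, indent=2, allow_nan=False)
        os.replace(tmp_name, output_path)
    except BaseException:
        os.unlink(tmp_name)  # remove the partial temporary file, then re-raise
        raise
    print(f" Saved structured output to {output_path}")


# Usage in a scratch directory
scratch = Path(tempfile.mkdtemp())
save_json_strict({"a": 1}, scratch / "ok.json")
try:
    save_json_strict({"bad": float("inf")}, scratch / "bad.json")
except ValueError as err:
    print("Refused to write non-finite value:", err)
```

One side effect of this approach: `tempfile.mkstemp` creates the file readable and writable only by its owner, so the saved output keeps those permissions.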

### E. Part Processing  
Handle formatting, labeling, and assembling of parsed parts into a final structured JSON output.
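Within `build_structured_output`, repeated labels become `label`, `label_2`, `label_3`, and so on, so that every entry in `bim.parts` has a unique key; a `while` loop keeps counting if a generated ID is already taken. The same scheme in isolation (the function name `unique_label_ids` is hypothetical):

```python
from collections import defaultdict
from typing import Iterable, List


def unique_label_ids(labels: Iterable[str]) -> List[str]:
    """Suffix repeated labels with _2, _3, ... while avoiding clashes with IDs already used."""
    counter = defaultdict(int)
    used = set()
    ids = []
    for base in labels:
        counter[base] += 1
        label_id = base if counter[base] == 1 else f"{base}_{counter[base]}"
        # A raw label such as "plate_2" may already be taken; keep counting until the ID is free.
        while label_id in used:
            counter[base] += 1
            label_id = f"{base}_{counter[base]}"
        used.add(label_id)
        ids.append(label_id)
    return ids


print(unique_label_ids(["plate", "plate", "plate_2", "channel"]))
```

Here the second `plate` claims `plate_2`, so the raw label `plate_2` is pushed on to `plate_2_2`.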


In [22]:
def build_structured_output(parts: List[Dict], product_code: str, product_family: str = "HTA", material: str = "HCR") -> Dict:
    """
    Build a structured BIM-ready JSON output from interpreted parts.

    Args:
        parts (List[Dict]): List of interpreted geometry parts.
        product_code (str): Unique product variant or reference code.
        product_family (str): Product family name (e.g., HTA).
        material (str): Material name (e.g., HCR).

    Returns:
        Dict: Structured JSON output ready for BIM consumption.
    """
    output = {
        "_id": str(uuid.uuid4()),
        "general": {
            "product": product_family,
            "variant": product_code,
            "reference": product_code,
            "type": "cast_in_channel",
            "material": material,
            "manufacturer": "Leviat"
        },
        "calculation": {},
        "bim": {
            "parts": {},
            "insertion_points": {},
            "chaining_point": {
                "is_insertion_point": True,
                "origin": {"x": 0, "y": 0, "z": 0}
            },
            "measurement_points": [],
            "classification": {}
        }
    }

    print(f"\n Building JSON for {product_code}")
    print(f" Number of interpreted parts: {len(parts)}")

    force_single_channel(parts)

    # Clean and normalize all parts. 'label', 'boundingBox' and 'parameters' are kept:
    # labeling, cuboid formatting and the overall bounding box rely on them.
    for part in parts:
        part["cutout"] = part.pop("cutouts", part.get("cutout", []))
        if "profile" in part:
            part["profile"] = flatten_profile(part["profile"])
        part["type"] = (part.get("type") or "").lower()
        for key in ["classification_reason", "source", "debug"]:
            part.pop(key, None)

    # Assign semantic labels and determine shape types
    for part in parts:
        label = assign_labels_smart(part)
        shape = map_label_to_shape(label)
        if shape == "custom":
            shape = guess_shape_from_geometry(part)

        part.update({
            "label": label,
            "shape": shape,
            "classification_reason": f"assigned label '{label}' with shape '{shape}'"
        })

    grouped = defaultdict(list)
    cutout_ids = set()
    label_counter = defaultdict(int)
    used_labels = set()

    # Group and mark cutouts to avoid top-level duplication
    for part in parts:
        grouped[part["label"]].append(part)
        for cutout in part.get("cutout", []):
            cutout_ids.add(id(cutout))

    for label, instances in grouped.items():
        if label == "anchors":
            formatted_instances = [format_revolution({k: v for k, v in part.items() if k not in {"label", "classification_reason"}}) for part in instances]
            output["bim"]["parts"][label] = {
                "shape": "rebar",
                "instances": formatted_instances
            }
            continue

        for part in instances:
            if id(part) in cutout_ids:
                print(f" Skipping cutout part from top-level output: {part.get('label', '')}")
                continue

            base_label = label
            label_counter[base_label] += 1
            label_id = f"{base_label}_{label_counter[base_label]}" if label_counter[base_label] > 1 else base_label
            while label_id in used_labels:
                label_counter[base_label] += 1
                label_id = f"{base_label}_{label_counter[base_label]}"
            used_labels.add(label_id)

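            # Attach every other part whose shape is "revolution" as a cutout.
            # No containment test is made, and anchors (shape "rebar") are not included.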
            cutouts = []
            for other in parts:
                if other is part or other.get("shape") != "revolution":
                    continue
                try:
                    cutouts.append(format_revolution(other))
                except Exception as e:
                    print(f" Failed to format cutout: {e}")
            if cutouts:
                part["cutout"] = cutouts

            # Format part geometry
            try:
                if part["shape"] == "extrusion":
                    geometry = format_extrusion(part)
                    print(f"✂️ Cutout count in part '{label_id}': {len(geometry.get('cutout', []))}")
                elif part["shape"] == "cuboid":
                    geometry = format_cuboid(part)
                elif part["shape"] == "revolution":
                    geometry = format_revolution(part)
                else:
                    geometry = format_custom(part)

                if "shape" not in geometry:
                    geometry["shape"] = part["shape"]

                output["bim"]["parts"][label_id] = geometry

            except Exception as e:
                print(f" Failed to insert part '{label_id}': {e}")

    # Add a fallback bounding box and the insertion box
    bbox_raw = compute_overall_bounding_box(parts)
    bbox_formatted = format_cuboid(bbox_raw)

    if "bounding_box" not in output["bim"]["parts"]:
        output["bim"]["parts"]["bounding_box"] = {
            "shape": "cuboid",
            **bbox_formatted
        }

    output["bim"]["insertion_box"] = {
        "shape": "cuboid",
        **bbox_formatted
    }

    validate_output_schema(output)
    return output


def process_parts_folder(parts_folder: str, output_folder: str):
    """
    Process all *_parts.json files in a folder and generate structured BIM JSON files.

    Args:
        parts_folder (str): Folder containing parsed part JSONs.
        output_folder (str): Destination folder for structured output.
    """
    input_path = Path(parts_folder)
    output_path = Path(output_folder)
    output_path.mkdir(parents=True, exist_ok=True)

    for file in input_path.glob("*_parts.json"):
        with open(file) as f:
            parts = json.load(f)
        product_code = file.stem.replace("_parts", "")
        structured = build_structured_output(parts, product_code)
        save_json_output(structured, str(output_path / f"{product_code}_structured_output.json"))
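`process_parts_folder` derives the product code with `file.stem.replace("_parts", "")`, which removes every occurrence of `_parts`, not only the trailing one. If a product code could itself contain `_parts`, `str.removesuffix` (Python 3.9+) strips only the end. The sketch also sorts the files so that batch order is reproducible, since `Path.glob` does not guarantee any particular order. `list_part_files` is a hypothetical helper, and the file names in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_part_files(folder: Path) -> List[Tuple[str, Path]]:
    """Return (product_code, path) pairs for every *_parts.json file, sorted by path."""
    pairs = []
    for file in sorted(Path(folder).glob("*_parts.json")):
        # removesuffix strips only a trailing "_parts", unlike str.replace.
        pairs.append((file.stem.removesuffix("_parts"), file))
    return pairs


demo_dir = Path(tempfile.mkdtemp())
for name in ["HTA_52_34_parts.json", "spare_parts_parts.json", "notes.json"]:
    (demo_dir / name).write_text("[]")
print([code for code, _ in list_part_files(demo_dir)])  # ['HTA_52_34', 'spare_parts']
```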


## 3. File Paths  
Define where the JSON Builder will read part files from and where it will save the structured output.


In [23]:
# Set these to your input and output directories before running Section 4.
# Note: Path("") resolves to the current working directory.
parts_folder = Path("")
output_folder = Path("")
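To try the builder before real interpreter output is available, a small hypothetical `*_parts.json` file can be written to a scratch folder. The part records below use only keys that the functions above read (`type`, `label`, `profile`, `path`, `axis`, `angle`, `parameters`); the values, the product code `DEMO`, and the folder are made up for illustration and do not come from a real drawing.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data: one extrusion, one anchor (revolution) and one bounding box.
sample_parts = [
    {"type": "extrusion", "label": "main_body",
     "profile": [[0, 0, 0], [50, 0, 0], [50, 30, 0], [0, 30, 0]],
     "path": [[0, 0, 0], [0, 0, 1000]]},
    {"type": "revolution", "label": "anchor",
     "axis": {"origin": [25, 15, 100], "direction": [0, 1, 0]}, "angle": 360},
    {"type": "bounding_box", "label": "bounding_box",
     "parameters": {"center": {"x": 25, "y": 15, "z": 500},
                    "dimensions": {"length": 50, "width": 30, "height": 1000}}},
]

demo_input = Path(tempfile.mkdtemp())
demo_file = demo_input / "DEMO_parts.json"
demo_file.write_text(json.dumps(sample_parts, indent=2))
print(demo_file)
```

Setting `parts_folder = demo_input` then runs Section 4 on this single file.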

## 4. Execute JSON Builder  
This section processes the parts from the defined folder and saves the structured output JSON to the specified output folder.


In [None]:
# Run the JSON builder process across all part files
process_parts_folder(parts_folder, output_folder)
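After the run, a quick way to check what was produced is to reload each `*_structured_output.json` file and list its parts with their shapes. A minimal sketch, assuming the layout written by `build_structured_output` (`bim.parts` keyed by label, each entry carrying a `shape`); `summarize_outputs` is a hypothetical helper, demonstrated here on a hand-written file.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def summarize_outputs(output_folder) -> Dict[str, Dict[str, str]]:
    """Map each output file's product code to {part label: shape}."""
    summary = {}
    for file in sorted(Path(output_folder).glob("*_structured_output.json")):
        data = json.loads(file.read_text())
        parts = data.get("bim", {}).get("parts", {})
        code = file.name.removesuffix("_structured_output.json")
        summary[code] = {label: part.get("shape", "?") for label, part in parts.items()}
    return summary


# Demo on a hand-written file; in the notebook, call summarize_outputs(output_folder).
demo_out = Path(tempfile.mkdtemp())
(demo_out / "DEMO_structured_output.json").write_text(json.dumps(
    {"bim": {"parts": {"channel": {"shape": "extrusion"}, "anchors": {"shape": "rebar"}}}}
))
print(summarize_outputs(demo_out))
```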

## Final Notes

This notebook is part of a **four-step modular pipeline** for extracting and validating BIM-ready geometry from structural engineering drawings.

### Output Location
- Structured BIM geometry is saved as `<product_code>_structured_output.json` files in the configured `output_folder`.

### How to Run
1. Set your `parts_folder` and `output_folder` paths in **Section 3**.
2. Ensure interpreted part files (`*_parts.json`) exist in the input folder.
3. Run all cells from top to bottom.

### Next Step
- Continue to the next notebook: **Validator** (module 4 of 4).

### Documentation
For full setup instructions and pipeline details, see the [README.md](https://github.com/ThadaMan/Thesis/blob/main/README.md) in the repository.
