Scan-to-BIM

AI-Driven 2D Floorplan to BIM-Grade 3D Reconstruction

This project implements a research-oriented Scan-to-BIM pipeline that converts a 2D architectural floorplan image into a metric-scaled, editable 3D building model (GLB) using deep learning, computational geometry, and raster-to-vector reconstruction.

Unlike mesh-based visualizers, the system reconstructs architectural structure—walls, doors, topology, and scale—suitable for BIM-style downstream use.

Project Objective

Input

A raster floorplan image (PNG / JPG)

Output

A metric-scale 3D building model containing:

Walls represented as solids
Door openings with headers (lintels)
Floor slab
Consistent pixel-to-meter scaling
Exportable .glb model
Interactive browser-based visualization

The system focuses on geometric reconstruction, not surface visualization.

The Current Working System

The system pairs lightweight learned prediction with geometric reconstruction. Neural networks only predict where walls and objects are, and computational geometry turns those predictions into structure.

Models

U-Net (ResNet-34 backbone): customized version 3 (v3 weights), trained for high-fidelity pixel-wise semantic segmentation of walls.
YOLOv8 Medium: trained to detect openings (doors, sliding doors, windows) and furniture (beds, toilets, sinks, cabinets).

Datasets

Custom Synthesized Raster Set: augmented on the fly with OpenCV morphological operations (dilations, erosions) so that the U-Net learns to bridge gaps in wall strokes.
FloorPlanCAD (Adapted): Subset of high-quality architectural CAD vectors exported to raster specifically for training the YOLO object detector.

Algorithms & Math

Raster-to-Vector topology: reduces 2D wall masks to 1-pixel-wide skeletons that capture wall connectivity.
RDP (Ramer-Douglas-Peucker): polyline simplification that reduces traced skeleton paths to a few straight vector segments.
PCA (Principal Component Analysis): used for least-squares (orthogonal-distance) line fitting on segmented wall pixel clouds.
Hann-window blending: $H(n) = 0.5 \left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right)$, where $n = 0, \dots, N-1$ indexes positions across a window of length $N$. Used to merge overlapping $512 \times 512$ tile predictions smoothly back into a full-resolution probability map.
Bounding Centroids: centroid calculations used so that the Babylon.js first-person camera spawns inside the building's interior boundary.
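The RDP simplification listed above can be sketched as a short recursive function. This is the textbook algorithm, not necessarily the project's implementation. It runs on an ordered (N, 2) path, such as one traced along the skeleton between two junctions. The `epsilon` tolerance (in pixels) is a hypothetical parameter. OpenCV's `cv2.approxPolyDP` implements the same algorithm and may be what the pipeline actually calls.

```python
import numpy as np


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep only points deviating more than epsilon from the chord."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    start, end = pts[0], pts[-1]
    seg = end - start
    seg_len = np.hypot(*seg)
    if seg_len == 0:
        dists = np.hypot(*(pts - start).T)  # closed path: distance to the shared endpoint
    else:
        # Perpendicular distance of every point to the start-end chord (2D cross product).
        dists = np.abs(seg[0] * (pts[:, 1] - start[1]) - seg[1] * (pts[:, 0] - start[0])) / seg_len
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:
        # Split at the farthest point and simplify both halves recursively.
        left = rdp(pts[: idx + 1], epsilon)
        right = rdp(pts[idx:], epsilon)
        return np.vstack([left[:-1], right])  # drop the duplicated split point
    return np.vstack([start, end])
```

For an L-shaped skeleton path sampled at every pixel, `rdp(path, 0.5)` returns just the three corner points. Each resulting segment then defines one wall for the PCA fitting step.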

Libraries

AI/ML: torch (PyTorch inference & training), ultralytics (YOLO engine).
Computer Vision: cv2 (OpenCV morphologies), skimage.morphology (skeletonization).
Geometry: networkx (graph traversal), numpy (vector math), trimesh (3D boolean construction & GLB export).
Frontend: streamlit (UI sandboxing), Babylon.js (WebGL interactive gamified navigation layer).

Failed Trials & Deprecated Approaches

Throughout research and scale-up, several approaches were abandoned:

Abandoned Models (Zero-Shot SAM): we initially tried Meta's Segment Anything Model (SAM) together with the Roboflow API. The zero-shot foundation model failed to interpret floorplan drawing conventions (e.g., standard CAD door-swing arcs) and was too slow for single-pass inference.
Abandoned Datasets (Raw FloorPlanCAD): training the U-Net on raw CAD vectors yielded brittle inferences. We had to deliberately degrade the training data with noise and artifact injection to build a model robust to real-world, messy raster images.
Abandoned Algorithms (2D Topological Carves - Scenario B/C): we first tried to splice wall vectors purely in 2D before 3D generation. This proved extremely brittle, causing cascading intersection failures on non-orthogonal walls. We pivoted to a 3D projection approach in which the Python engine emits door bounding-box matrices (door_metadata) and Babylon.js constructs the collision dynamics natively.
Abandoned Algorithms (Hough Line Transform): a standard computer-vision heuristic for floorplan line extraction. It produced fragmented, overlapping, unusable lines across thick wall segments and was abandoned in favor of the hybrid Skeletonization + PCA strategy.

System Architecture

Floorplan Image
      │
      ▼
[ Phase 1 ]  CNN-Based Wall Probability Estimation
      │
      ▼
[ Phase 4 ]  High-Resolution Mask Generation
      │
      ▼
[ Phase 5 ]  Raster-to-Vector Geometry Extraction
      │
      ▼
[ Phase 3 ]  Procedural BIM-Style Construction
      │
      ▼
     GLB 3D Building Model

Each phase is modular and independently extensible.

Phase Overview

Phase 1 — Perception (Deep Learning)

Status: Implemented

Model: U-Net with ResNet-34 encoder
Task: Pixel-wise wall probability estimation
Output: Floating-point probability map

Key characteristics:

ImageNet-normalized inference
Aspect-ratio–preserving resizing
GPU-accelerated PyTorch pipeline
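The aspect-ratio-preserving resize and ImageNet normalization listed above can be sketched as a letterbox step. The following are assumptions:

- Nearest-neighbour resizing is done in pure NumPy to keep the sketch self-contained. The real pipeline likely uses `cv2.resize`.
- The 512-pixel target matches the Phase 4A tile size.
- Padding with white assumes a white drawing background.
- The function names are hypothetical.

The resulting array would then be converted with `torch.from_numpy` and moved to the GPU.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order), as used by ImageNet-pretrained encoders.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def letterbox_params(h, w, target=512):
    """Scale factor, resized size, and (top, bottom, left, right) padding for a square input."""
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = target - new_h, target - new_w
    return scale, (new_h, new_w), (pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2)


def nearest_resize(img, new_h, new_w):
    """Nearest-neighbour resize in pure NumPy (a stand-in for cv2.resize)."""
    rows = (np.arange(new_h) * img.shape[0] / new_h).astype(int)
    cols = (np.arange(new_w) * img.shape[1] / new_w).astype(int)
    return img[rows][:, cols]


def preprocess(rgb_uint8, target=512):
    """Letterbox an HxWx3 uint8 RGB image and ImageNet-normalize it to a 1x3xTxT float array."""
    h, w = rgb_uint8.shape[:2]
    scale, (nh, nw), (top, bottom, left, right) = letterbox_params(h, w, target)
    resized = nearest_resize(rgb_uint8, nh, nw)
    # Pad with white (255), assuming floorplans are drawn on a white background.
    padded = np.pad(resized, ((top, bottom), (left, right), (0, 0)), constant_values=255)
    x = (padded.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    # HWC -> NCHW; scale and (top, left) let predictions be mapped back to the original image.
    return x.transpose(2, 0, 1)[None], scale, (top, left)
```

Returning `scale` and the padding offsets matters because the wall probability map must be cropped and rescaled back to the source image before vectorization.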

Phase 4A — High-Resolution Tiled Inference

Status: Implemented

Addresses the resolution mismatch between fixed-input CNNs and large floorplans.

Input images are divided into overlapping 512×512 tiles
Each tile is independently processed by the CNN
Outputs are merged using Hann-window weighted blending

This preserves wall continuity and global geometry while maintaining high spatial resolution.

This approach is conceptually aligned with SAHI-style inference.
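The tiling and Hann-weighted merge can be sketched as below. This is a minimal sketch under these assumptions:

- `predict_fn` is a callable that maps an image tile to a same-sized 2D probability map. It stands in for the U-Net forward pass.
- The function names, the 128-pixel overlap, and the `eps` floor are hypothetical.

`np.hanning` computes exactly the $H(n)$ given in Algorithms & Math. A pure Hann window is zero at its edges, so pixels on the outer image border would be covered only by zero weights. The small floor prevents a division by zero there.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length), the last one flush with the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def hann2d(tile, eps=1e-3):
    """Separable 2D Hann window; np.hanning(N)[n] = 0.5 * (1 - cos(2*pi*n / (N - 1)))."""
    w = np.hanning(tile)
    return np.maximum(np.outer(w, w), eps)  # floor avoids zero weight on the image border


def tiled_predict(image, predict_fn, tile=512, overlap=128):
    """Run predict_fn on overlapping tiles and blend the outputs with Hann weights."""
    h, w = image.shape[:2]
    stride = tile - overlap
    window = hann2d(tile)
    acc = np.zeros((h, w))
    weight_sum = np.zeros((h, w))
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            prob = predict_fn(image[y:y + tile, x:x + tile])  # 2D probability map for this tile
            ph, pw = prob.shape
            win = window[:ph, :pw]
            acc[y:y + ph, x:x + pw] += prob * win
            weight_sum[y:y + ph, x:x + pw] += win
    return acc / weight_sum  # weighted average of all overlapping predictions
```

Dividing by the accumulated weights is a normalization step. Wherever tiles agree, the blended output equals their common prediction. Where they disagree near tile borders, the tile whose center is closer dominates, which is what suppresses seams.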

Phase 4B — Mask Refinement

Status: Implemented

Post-processing of CNN probability maps to ensure structural integrity before vectorization.

Operations include:

Hard thresholding
Connected-component filtering for noise removal
Morphological closing to bridge small gaps
Morphological opening for edge smoothing

The output is a clean, contiguous wall mask suitable for geometric processing.
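The four refinement operations can be sketched in plain NumPy. The project itself uses OpenCV for these. Here, binary dilation/erosion and an 8-connected component filter are written out so the sketch is self-contained. The threshold, kernel size, and `min_size` defaults are hypothetical. The order follows the list above: threshold, component filtering, closing, then opening.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dilate(mask, k=3):
    """Binary dilation with a k x k square kernel (outside the image counts as background)."""
    padded = np.pad(mask, k // 2, constant_values=False)
    return sliding_window_view(padded, (k, k)).any(axis=(-1, -2))


def erode(mask, k=3):
    """Binary erosion with a k x k square kernel (outside the image counts as foreground)."""
    padded = np.pad(mask, k // 2, constant_values=True)
    return sliding_window_view(padded, (k, k)).all(axis=(-1, -2))


def remove_small_components(mask, min_size):
    """Keep only 8-connected foreground components with at least min_size pixels (BFS labeling)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    keep = np.zeros((h, w), dtype=bool)
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, component = deque([(sy, sx)]), []
        while queue:
            y, x = queue.popleft()
            component.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        if len(component) >= min_size:
            ys, xs = zip(*component)
            keep[list(ys), list(xs)] = True
    return keep


def refine_mask(prob, threshold=0.5, min_size=50, k=3):
    mask = prob > threshold                          # 1. hard thresholding
    mask = remove_small_components(mask, min_size)   # 2. drop isolated specks
    mask = erode(dilate(mask, k), k)                 # 3. closing bridges small gaps
    mask = dilate(erode(mask, k), k)                 # 4. opening smooths ragged edges
    return mask
```

Filtering components before closing matters: otherwise closing could merge isolated specks into nearby walls.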

Phase 5A — Hybrid Raster-to-Vector Conversion

Status: Implemented

This phase performs the core Scan-to-BIM transformation by combining:

Skeleton-based topology extraction
Pixel-region–based geometric fitting

Pipeline steps:

Skeletonization of the refined mask
Conversion of the skeleton into a graph
Tracing of wall paths between junctions
Segmentation at geometric bends using RDP
Extraction of wall pixel regions around each segment
Least-squares (PCA) line fitting on pixel clouds
Generation of CAD-grade wall axes

This produces straight, metric-accurate wall vectors while preserving connectivity.
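The PCA line-fitting step can be sketched as a total-least-squares fit. The first principal direction of the centered pixel cloud is the line that minimizes squared perpendicular distances. The function name and the rough thickness estimate are hypothetical additions for illustration. Coordinates are assumed to be `(x, y)` pixel positions gathered around one RDP segment.

```python
import numpy as np


def fit_wall_axis(pixels):
    """Fit a straight wall axis to an (N, 2) cloud of wall-pixel coordinates via PCA.

    Returns (start, end, thickness): the axis endpoints, clipped to the extent of the
    cloud along the principal direction, and a rough thickness in pixels.
    """
    pts = np.asarray(pixels, dtype=float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # Rows of vt are the principal directions; vt[0] has maximum variance.
    # A line along vt[0] minimizes the sum of squared perpendicular distances.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction, normal = vt[0], vt[1]
    along = centered @ direction
    across = centered @ normal
    start = centroid + along.min() * direction
    end = centroid + along.max() * direction
    thickness = (across.max() - across.min()) + 1.0  # +1: pixel centres to pixel edges
    return start, end, thickness
```

Fitting to the whole pixel region rather than the skeleton alone is what removes the skeleton's jitter. Every pixel across the wall's width contributes to the direction estimate.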

Phase 5B — Junction and Topology Optimization

Status: Implemented

Enhancements include:

Vertex snapping and corner closure
Manhattan-world (orthogonality) enforcement
Noise-gap healing and wall truncation bridging

Objective: to produce a topologically consistent floorplan suitable for semantic reasoning.
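Two of these enhancements, Manhattan enforcement and vertex snapping, can be sketched as below. This is an illustrative sketch, not the project's code. The function names, the 10° angle tolerance, the 8-pixel snap radius, and the greedy clustering strategy are all assumptions. Diagonal walls outside the tolerance are deliberately left untouched.

```python
import numpy as np


def snap_manhattan(p0, p1, angle_tol_deg=10.0):
    """Make a near-horizontal or near-vertical segment exactly axis-aligned.

    The coordinate that becomes constant is set to the segment's midpoint value;
    segments outside the tolerance (genuinely diagonal walls) are returned unchanged.
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    dx, dy = p1 - p0
    angle = np.degrees(np.arctan2(dy, dx)) % 180.0   # undirected angle in [0, 180)
    mid = (p0 + p1) / 2.0
    if min(angle, 180.0 - angle) <= angle_tol_deg:   # near horizontal
        return np.array([p0[0], mid[1]]), np.array([p1[0], mid[1]])
    if abs(angle - 90.0) <= angle_tol_deg:           # near vertical
        return np.array([mid[0], p0[1]]), np.array([mid[0], p1[1]])
    return p0, p1


def snap_endpoints(segments, radius=8.0):
    """Greedy vertex snapping: endpoints within `radius` of a seed endpoint are merged to their mean."""
    pts = np.array([p for seg in segments for p in seg], dtype=float)
    cluster = np.full(len(pts), -1)
    for i in range(len(pts)):
        if cluster[i] >= 0:
            continue
        dist = np.hypot(*(pts - pts[i]).T)
        cluster[(dist <= radius) & (cluster < 0)] = i
    snapped = pts.copy()
    for c in np.unique(cluster):
        members = cluster == c
        snapped[members] = pts[members].mean(axis=0)
    return [(snapped[2 * k], snapped[2 * k + 1]) for k in range(len(segments))]
```

Averaging endpoints can reintroduce a slight tilt after Manhattan snapping. A fuller implementation might instead snap corners to the intersection of the two wall axes, or alternate both passes until they stabilize.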

Phase 2C — Semantic Object Detection & Structural Correction

Status: Implemented

Incorporates YOLOv8 object detection to identify architectural features (doors, sliding doors) and carve door openings into the vectorized walls wherever detections intersect them, using probabilistic geometric slicing heuristics.
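One way to carve a door gap is to project a detected door box onto a wall's axis and remove that interval from the segment. The sketch below is a simplified geometric stand-in for the project's heuristics, not its actual method. The function name `carve_door`, the box format `(x_min, y_min, x_max, y_max)` in pixels, and the `max_dist` acceptance threshold are hypothetical. For diagonal walls, an axis-aligned box overestimates the opening width.

```python
import numpy as np


def carve_door(p0, p1, door_box, max_dist=15.0):
    """Split wall segment p0 -> p1 around an axis-aligned door box (x_min, y_min, x_max, y_max).

    Returns (wall_pieces, opening); opening is None when the box does not sit on this wall.
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    length = float(np.hypot(*(p1 - p0)))
    u = (p1 - p0) / length                  # unit vector along the wall axis
    n = np.array([-u[1], u[0]])             # unit normal to the wall
    x0, y0, x1, y1 = door_box
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], float)
    center = corners.mean(axis=0)
    if abs((center - p0) @ n) > max_dist:   # door box too far from this wall's axis
        return [(p0, p1)], None
    t = (corners - p0) @ u                  # project box corners onto the wall axis
    t_lo, t_hi = max(t.min(), 0.0), min(t.max(), length)
    if t_hi <= t_lo:                        # projection misses the segment
        return [(p0, p1)], None
    pieces = []
    if t_lo > 0.0:
        pieces.append((p0, p0 + t_lo * u))
    if t_hi < length:
        pieces.append((p0 + t_hi * u, p1))
    return pieces, (p0 + t_lo * u, p0 + t_hi * u)
```

The returned `opening` interval is also what a later stage needs to place a header (lintel) over the gap.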

Phase 3 — BIM-Style 3D Construction

Status: Implemented

Using Trimesh, vector geometry is converted into 3D architectural solids:

Pixel-to-meter scaling
Wall extrusion with thickness and height
Door gap detection
Header (lintel) generation
Floor slab construction
Watertight GLB export
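The scaling, extrusion, and header steps can be sketched as computing a box size and pose per segment. The sketch assumes `trimesh.creation.box(extents=..., transform=...)` then builds the mesh; that is a real Trimesh function, but whether the project calls it this way is an assumption. The function name `extrude_segment`, the metres-per-pixel factor, and the heights in the usage note are hypothetical. The frame is Z-up with plan coordinates in X/Y. glTF (and therefore GLB) is Y-up, so a final axis rotation may be needed before export.

```python
import numpy as np


def extrude_segment(p0_px, p1_px, thickness_px, m_per_px, z_bottom, z_top):
    """Return (extents, transform) for a box spanning a 2D segment between two heights.

    Units are metres after scaling; the result can be passed to
    trimesh.creation.box(extents=extents, transform=transform).
    """
    p0 = np.asarray(p0_px, float) * m_per_px   # pixel-to-meter scaling
    p1 = np.asarray(p1_px, float) * m_per_px
    d = p1 - p0
    length = float(np.hypot(*d))
    c, s = d / length                          # cos and sin of the wall's heading
    transform = np.eye(4)
    transform[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    mid = (p0 + p1) / 2.0
    transform[:3, 3] = [mid[0], mid[1], (z_bottom + z_top) / 2.0]
    extents = np.array([length, thickness_px * m_per_px, z_top - z_bottom])
    return extents, transform
```

A full-height wall piece would use `z_bottom=0.0, z_top=wall_height`. A header over a carved door opening reuses the same function with `z_bottom=door_height, z_top=wall_height`, for example 2.1 m and 2.7 m. The floor slab is simply a thin box under the plan's bounding rectangle.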

Phase 6 — Gamified 3D Navigation Layer (Babylon.js)

Status: Implemented

Replaces the passive model viewer with a fully interactive first-person experience embedded in Streamlit:

Physics & Collision: bounding-centroid calculations place the Babylon.js UniversalCamera spawn point inside the building geometry.
Canvas Minimap: a static HTML canvas draws the 2D wall layout and tracks the player's live position.
Raycast Interaction: click-and-drag mouse look; a 2.5-meter raycast from the camera detects doors within reach and triggers $90^\circ$ door swings.
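A centroid alone does not guarantee an interior spawn. The area centroid of a non-convex plan (an L- or U-shaped apartment, for instance) can fall outside the footprint. The sketch below adds a point-in-polygon check and a grid fallback. It assumes the footprint is available as a simple polygon in metres on the Python side, with the result passed to Babylon.js as the camera position. The function names and the 0.25 m grid step are hypothetical.

```python
def polygon_centroid(poly):
    """Area centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    area2 = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross                    # twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(pt, poly):
    """Even-odd ray casting: count crossings of a horizontal ray to the right of pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):          # edge straddles the ray's y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def spawn_point(poly, step=0.25):
    """Centroid if it lies inside the footprint, else the interior grid point nearest to it."""
    c = polygon_centroid(poly)
    if point_in_polygon(c, poly):
        return c
    xs, ys = zip(*poly)
    best, best_d2 = None, float("inf")
    for i in range(int((max(xs) - min(xs)) / step)):
        for j in range(int((max(ys) - min(ys)) / step)):
            # Sample cell centres so candidates never sit exactly on a wall line.
            p = (min(xs) + (i + 0.5) * step, min(ys) + (j + 0.5) * step)
            d2 = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
            if d2 < best_d2 and point_in_polygon(p, poly):
                best, best_d2 = p, d2
    return best
```

For a square room, the spawn point is simply the centroid. For an L-shaped plan whose centroid falls in the missing corner, the fallback returns the nearest point that is actually inside one of the two wings.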

Phase 7 — Parametric BIM Extensions & Multi-Floor

Status: Planned

Editable wall thickness and door dimensions
IFC / Revit-compatible export
Stair detection and floor stacking
Absolute scale calibration

Academic Context & Compute Constraints

This engine performs heuristic 3D reconstruction using only free-tier compute (1x Kaggle P100/T4, 30 hrs/week).

Compared to contemporary "Floorplan-to-3D" repositories:

FloorPlanTo3D-unityClient (Custom Mask R-CNN) relies on multi-day training loops with ResNet-101 backbones to extract wall geometry. This project instead uses a faster U-Net whose output is cleaned with morphological operations.
3DPlanNet & DeepFloorplan (Hybrid Approaches) use similar object-detection logic but typically render offline in heavyweight desktop clients. This project cuts door openings in real time by projecting YOLOv8 Medium detections geometrically onto the wall vectors, keeping peak VRAM usage to roughly 12 GB.
Babylon.js & Streamlit vs. Unity/Unreal Engine: unlike Unity-client implementations, this project runs the entire spatial simulation in sandboxed HTML iframes and needs no dependencies beyond standard pip-installable packages.

By using deep learning (U-Net and YOLO) strictly for prediction and computational geometry (raster-to-vector conversion and wall-gap carving) for reconstruction, the engine stays within the limits of free-tier and consumer GPUs while maintaining high precision.

Web Interface

A Streamlit-based interface provides:

Floorplan upload
Inference mode selection (Fast / High-Fidelity)
End-to-end Scan-to-BIM execution
Interactive 3D visualization
GLB export

The viewer is implemented using Google’s <model-viewer> and Base64-embedded GLB rendering.

Repository Structure

app.py        → Web interface
pipeline.py   → Scan-to-BIM processing pipeline
model_link.txt
sampleio/

Current Capabilities

Capability	Status
CNN-based wall detection	Implemented
High-resolution tiled inference	Implemented
Mask refinement	Implemented
Skeleton-based topology	Implemented
Pixel-cloud line fitting	Implemented
CAD-grade wall vectors	Implemented
Phase 5B (Geometric Cleanup)	Implemented
Supervised YOLOv8 Training	Implemented (mAP50=0.80)
Phase 2C (Wall-Gap Correction)	Implemented
Metric scaling	Implemented
BIM-style 3D model	Implemented
Web-based visualization	Implemented

Known Limitations

Thin or ambiguous walls may be missed by the U-Net.
Confusion between YOLOv8 door classes (single vs. double vs. sliding doors), mitigated by treating all door classes interchangeably for structural logic.
Furniture orientation relies on snap-alignment to the nearest wall, which can be inaccurate for furniture placed in the middle of a room.
Single-floor support only.

Project Scope and Distinction

Most systems generate surface meshes. This system reconstructs architectural geometry.

It explicitly models topology, geometry, and scale to generate a true BIM-oriented pipeline.

Current Status

Version 2.0 — Structural Scan-to-BIM Engine

Phase 1 (Geometry Execution): U-Net Inference, Trimesh routing, and Phase 5B Junction Optimization are fully implemented.
Phase 2 (Architectural Detection): a custom YOLOv8 object detection model has been trained on Kaggle (mAP50=0.80) on 15k FloorPlanCAD samples.
Phase 2C (Structural Correlation): fully implemented. YOLO detections are projected onto the Phase 1 wall geometry, slicing walls where they intersect to create real door openings and placing dynamically sized furniture.

Evaluation Metrics

The core Phase 1 U-Net perception model was evaluated on a held-out dataset of 50 samples:

Model	IoU	Dice	Precision	Recall	F1	Accuracy
v1 (10ep)	0.8224	0.9005	0.8957	0.9080	0.9005	0.9817
v2 (+30ep)	0.9333	0.9653	0.9716	0.9593	0.9653	0.9936
v3 (+30ep)	0.9613	0.9801	0.9824	0.9781	0.9801	0.9964
v4 (aug+val)	0.9190	0.9576	0.9415	0.9745	0.9576	0.9921

The active repository currently uses the v3 weights.
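For reference, the table's metrics follow from pixel-wise confusion counts on binary wall masks. The sketch below uses a hypothetical function name. It pools counts over all pixels of one mask pair. The README does not say whether the reported numbers are pooled or averaged per image. For binary masks, Dice and F1 are algebraically identical, which is consistent with those two columns matching in every row.

```python
import numpy as np


def segmentation_metrics(pred, gt):
    """Pixel-wise metrics for binary wall masks, pooled over all pixels."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    tn = int(np.sum(~pred & ~gt))

    def div(a, b):
        return a / b if b else 0.0  # guard against empty masks

    precision, recall = div(tp, tp + fp), div(tp, tp + fn)
    return {
        "iou": div(tp, tp + fp + fn),
        "dice": div(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": div(2 * precision * recall, precision + recall),
        "accuracy": (tp + tn) / pred.size,
    }
```

Accuracy is dominated by the large background class in floorplans, which is why it sits above 0.98 even for v1. IoU and Dice are the more informative columns.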
