3DGeoRef

3DGeoRef is an automated pipeline for georeferencing 3D models using synthetic rendering, AI-powered geolocation, and satellite imagery. The system transforms an arbitrary 3D model into a georeferenced asset aligned with real-world geographic coordinates, ready for integration into GIS systems or 3D viewers.

Overview

Given a 3D model (GLB or GLTF format), 3DGeoRef performs the following operations:

  1. Synthetic View Generation: Creates multiple rendered views of the model using Blender
  2. Geolocation Estimation: Estimates geographic coordinates using AI models (GeoCLIP, Ollama, or Gemini)
  3. Satellite Image Download: Retrieves high-resolution satellite imagery from the Mapbox API
  4. Image Matching: Performs feature matching between synthetic renders and satellite images using Deep Image Matching
  5. Transformation Computation: Calculates the affine transformation matrix to align the model
  6. Georeferencing: Applies the transformation and elevation alignment to produce the final georeferenced model

Features

  • Multi-Model AI Geolocation: Choose between GeoCLIP, Ollama (llama3.2-vision), or Google Gemini for location estimation
  • Automated Pipeline: End-to-end processing from raw 3D model to georeferenced output
  • Flexible Execution Modes: Run the full pipeline, geolocation only, or image matching only
  • Docker Support: Fully containerized environment with GPU support for CUDA acceleration
  • High-Quality Rendering: Blender-based synthetic view generation with HDRI lighting
  • Robust Image Matching: Integration with Deep Image Matching for accurate feature correspondence
  • Comprehensive Transformation Library: Advanced 3D transformation utilities for precise alignment

Project Structure

3DGeoRef/
├── main.py                             # Main entry point for the pipeline
├── Dockerfile                          # Docker image for the main pipeline (with Blender & CUDA)
├── docker-compose.yml                  # Docker Compose configuration
├── hdri/                               # HDRI environment maps for rendering
├── pipeline/                           # Core pipeline modules
│   ├── __init__.py
│   ├── core.py                         # PipelineProcessor - main orchestration logic
│   ├── geolocation/                    # Geolocation estimation modules
│   │   ├── __init__.py
│   │   ├── geoclip.py                  # GeoCLIP-based geolocation
│   │   ├── ollama.py                   # Ollama AI-based geolocation
│   │   └── gemini.py                   # Google Gemini-based geolocation
│   ├── georeferencing/                 # Georeferencing and transformation modules
│   │   ├── __init__.py
│   │   ├── dim.py                      # Deep Image Matching integration
│   │   └── transformer.py              # 3D model transformation and alignment
│   ├── rendering/                      # 3D rendering modules
│   │   ├── __init__.py
│   │   └── multiview.py                # Blender-based multi-view synthetic rendering
│   ├── services/                       # External service integrations
│   │   ├── __init__.py
│   │   └── satellite_downloader.py     # Mapbox satellite imagery download
│   └── utils/                          # Utility modules
│       ├── __init__.py
│       └── transformations.py          # 3D transformation matrix utilities
└── README.md                           # This file

Installation

Docker Setup (Recommended)

The easiest way to run 3DGeoRef is using Docker, which provides a pre-configured environment with all dependencies including Blender, CUDA, and Deep Image Matching.

Prerequisites

  • Docker Engine 20.10+
  • Docker Compose 2.0+
  • NVIDIA GPU with CUDA support (optional but recommended)
  • NVIDIA Container Toolkit (for GPU acceleration)

Build and Run

# Clone the repository
git clone https://github.com/3DOM-FBK/3DGeoRef.git
cd 3DGeoRef

# Pull the pre-built Docker image
docker pull 3domfbk/3d-georef:04032026

# Or build the Docker image locally
docker build -t 3domfbk/3d-georef:04032026 .

# Run the container interactively
docker run --rm -it \
  --gpus all \
  -v /path/to/your/data:/data \
  3domfbk/3d-georef:04032026 bash

Usage

Basic Usage

Run the complete georeferencing pipeline on a 3D model:

python main.py \
  -i /path/to/model.glb \
  -o /path/to/output \
  --geoloc_model gemini

Docker Usage Examples

Example 1: Full Pipeline with Automatic Geolocation

Process a 3D model with automatic geolocation using Gemini AI:

docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
    -i /data/input/model.glb \
    -o /data/output \
    --geoloc_model gemini \
    --gemini_model gemini-2.5-flash \
    --gemini_api_key "YOUR_GEMINI_API_KEY" \
    --mapbox_api_key "YOUR_MAPBOX_API_KEY" \
    --cleanup \
    --streetviews 8 \
    --area_size 500 \
    --zoom 18

Example 2: Using GeoCLIP for Geolocation

Use the GeoCLIP model for fast, offline geolocation (no Gemini API key required):

docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
    -i /data/input/building.glb \
    -o /data/output \
    --geoloc_model geoclip \
    --nr_prediction 3 \
    --cleanup

Example 3: Manual Coordinates with DIM Mode

Skip geolocation and run only Deep Image Matching with known coordinates:

docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
    -i /data/input/monument.glb \
    -o /data/output \
    --mode dim \
    --lat 46.0669 \
    --lon 11.1216 \
    --mapbox_api_key "YOUR_MAPBOX_API_KEY" \
    --area_size 300 \
    --zoom 20

Example 4: Using Custom Orthophoto

Use your own orthophoto instead of downloading satellite imagery:

docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
    -i /data/input/site.glb \
    -o /data/output \
    --ortho /data/orthophoto.tif \
    --lat 48.8582 \
    --lon 2.2945

Example 5: Using Docker Compose with Ollama

To use Ollama-based geolocation, run both services with Docker Compose:

# Start the services
docker-compose up -d

# Wait for Ollama to download the model (first run only)
docker-compose logs -f ollama

# Run the pipeline (in a new terminal)
docker exec -it 3dgeoref_python python main.py \
  -i /data/input/model.glb \
  -o /data/output \
  --geoloc_model ollama

Example 6: Interactive Development Mode

Run the container interactively for development and debugging:

docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026

# Inside the container, you can run commands manually:
python main.py -i /data/input/test.glb -o /data/output --mode geoloc

Command-Line Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| -i, --input_file | str | required | Path to input 3D model (.glb/.gltf) |
| -o, --output_folder | str | required | Output folder for results |
| --streetviews | int | 5 | Number of street-view style renderings around the model |
| --nr_prediction | int | 1 | Number of GPS predictions (GeoCLIP only) |
| --area_size | int | 500 | Side length of the square area to download (meters) |
| --zoom | int | 18 | Satellite imagery zoom level (18-20 recommended) |
| --lat | float | None | Manual latitude (skips geolocation if provided with --lon) |
| --lon | float | None | Manual longitude (skips geolocation if provided with --lat) |
| --ortho | str | None | Path to custom orthophoto (skips satellite download) |
| --mode | str | auto | Execution mode: auto, geoloc, or dim |
| --geoloc_model | str | gemini | Geolocation model: geoclip, ollama, or gemini |
| --gemini_model | str | gemini-2.5-flash | Gemini model version to use |
| --gemini_api_key | str | None | API key for Google Gemini (can also be set via env var) |
| --mapbox_api_key | str | None | API key for Mapbox (can also be set via env var) |
| --cleanup | bool | False | Delete temporary working directory in /tmp after execution |

Execution Modes

  • auto (default): Full pipeline from 3D model to georeferenced output
  • geoloc: Only perform geolocation estimation and stop
  • dim: Skip geolocation, perform only Deep Image Matching (requires --lat and --lon)
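
The arguments and modes above could be wired up with argparse roughly as follows. This is a minimal sketch, not the project's actual main.py: the helper names build_parser and validate_args are hypothetical, and the environment variable names GEMINI_API_KEY and MAPBOX_API_KEY are assumptions, since the README does not state which variables are read.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    p = argparse.ArgumentParser(description="3DGeoRef pipeline (sketch)")
    p.add_argument("-i", "--input_file", required=True)
    p.add_argument("-o", "--output_folder", required=True)
    p.add_argument("--streetviews", type=int, default=5)
    p.add_argument("--nr_prediction", type=int, default=1)
    p.add_argument("--area_size", type=int, default=500)
    p.add_argument("--zoom", type=int, default=18)
    p.add_argument("--lat", type=float, default=None)
    p.add_argument("--lon", type=float, default=None)
    p.add_argument("--ortho", default=None)
    p.add_argument("--mode", choices=["auto", "geoloc", "dim"], default="auto")
    p.add_argument("--geoloc_model", choices=["geoclip", "ollama", "gemini"],
                   default="gemini")
    p.add_argument("--gemini_model", default="gemini-2.5-flash")
    # Environment variable names below are assumptions, not documented names.
    p.add_argument("--gemini_api_key", default=os.environ.get("GEMINI_API_KEY"))
    p.add_argument("--mapbox_api_key", default=os.environ.get("MAPBOX_API_KEY"))
    p.add_argument("--cleanup", action="store_true")
    return p


def validate_args(args: argparse.Namespace) -> None:
    """Enforce the documented rule that dim mode needs both coordinates."""
    if args.mode == "dim" and (args.lat is None or args.lon is None):
        raise ValueError("--mode dim requires both --lat and --lon")
    if args.lat is not None and not -90.0 <= args.lat <= 90.0:
        raise ValueError("--lat must be within [-90, 90]")
    if args.lon is not None and not -180.0 <= args.lon <= 180.0:
        raise ValueError("--lon must be within [-180, 180]")
```

Validating the mode and coordinate combination before any rendering starts lets a misconfigured run fail immediately instead of after the expensive Blender stage.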

Pipeline Workflow

The complete pipeline follows these steps:

1. Synthetic View Generation (pipeline/rendering/multiview.py)

  • Loads the 3D model into Blender
  • Computes bounding box and optimal camera positions
  • Generates orthographic top-down view
  • Creates multiple street-view perspective renderings
  • Applies HDRI lighting for realistic appearance
  • Exports rendered images and scaled model
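
One plausible way to place the street-view cameras is to space them evenly on a circle around the model's bounding box. The sketch below illustrates that idea in plain Python; it is not taken from multiview.py. The function name ring_camera_positions and the elevation_deg and margin parameters are hypothetical, and a Z-up coordinate system is assumed.

```python
import math


def ring_camera_positions(bbox_min, bbox_max, n_views, elevation_deg=15.0, margin=2.0):
    """Return the bounding-box centre and n_views camera positions around it.

    bbox_min / bbox_max are (x, y, z) corners with Z up. Cameras are evenly
    spaced in azimuth and are all meant to look at the centre.
    """
    bbox_min, bbox_max = tuple(bbox_min), tuple(bbox_max)
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(bbox_min, bbox_max))
    # Half of the bounding-box diagonal bounds the model's extent from the centre.
    half_diag = 0.5 * math.dist(bbox_min, bbox_max)
    distance = margin * half_diag  # camera-to-centre distance
    elev = math.radians(elevation_deg)
    positions = []
    for k in range(n_views):
        azimuth = 2.0 * math.pi * k / n_views
        x = center[0] + distance * math.cos(elev) * math.cos(azimuth)
        y = center[1] + distance * math.cos(elev) * math.sin(azimuth)
        z = center[2] + distance * math.sin(elev)
        positions.append((x, y, z))
    return center, positions
```

With this layout, the --streetviews value would map directly to n_views, and every camera sits at the same distance from the model centre.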

2. Geolocation Estimation (pipeline/geolocation/)

  • GeoCLIP (geoclip.py): Fast, offline geolocation using CLIP embeddings
  • Ollama (ollama.py): Vision-language model (llama3.2-vision) for location reasoning
  • Gemini (gemini.py): Google's multimodal AI for high-accuracy geolocation
  • Filters out top-down views for better accuracy
  • Returns GPS coordinates (latitude, longitude)
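
Vision-language models typically answer in free text, so the coordinates have to be extracted and range-checked before later stages can use them. The following is a minimal sketch under that assumption; parse_coordinates is a hypothetical helper, and the reply format shown in the usage note is invented for illustration.

```python
import re
from typing import Optional, Tuple

# Matches signed decimal numbers such as "46.0669" or "-11.12".
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_coordinates(reply: str) -> Optional[Tuple[float, float]]:
    """Extract the first plausible (latitude, longitude) pair from a text reply.

    Scans consecutive numbers and returns the first pair whose values fall in
    the valid latitude and longitude ranges, or None if there is no such pair.
    """
    values = [float(m) for m in _NUMBER.findall(reply)]
    for lat, lon in zip(values, values[1:]):
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            return lat, lon
    return None
```

For example, parse_coordinates("Latitude: 46.0669, Longitude: 11.1216") yields the pair (46.0669, 11.1216). Unrelated small numbers in a reply can still be mistaken for coordinates, so prompting the model for a strict format such as JSON is likely more robust than this heuristic.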

3. Satellite Image Download (pipeline/services/satellite_downloader.py)

  • Queries Mapbox Static API for satellite tiles
  • Downloads tiles at specified zoom level
  • Stitches tiles into georeferenced mosaic
  • Exports as GeoTIFF with proper coordinate system
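
The relationship between --zoom, --area_size, and the imagery that must be fetched follows from standard Web Mercator tile arithmetic. The sketch below shows that arithmetic only; it does not call Mapbox and is not taken from satellite_downloader.py. All function names are hypothetical, the 256-pixel tile size is an assumption (adjust it if the provider serves 512-pixel tiles), and the metres-per-degree constant is a rough approximation.

```python
import math


def latlon_to_tile(lat: float, lon: float, zoom: int):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def ground_resolution(lat: float, zoom: int, tile_size: int = 256) -> float:
    """Approximate metres per pixel at a latitude and zoom level."""
    equator_circumference = 2.0 * math.pi * 6378137.0  # WGS84 semi-major axis
    return equator_circumference * math.cos(math.radians(lat)) / (tile_size * 2 ** zoom)


def bounding_box(lat: float, lon: float, area_size_m: float):
    """Return (south, west, north, east) of a square of side area_size_m."""
    half = area_size_m / 2.0
    dlat = half / 111_320.0  # rough metres per degree of latitude
    dlon = half / (111_320.0 * math.cos(math.radians(lat)))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon
```

Converting the south-west and north-east corners of the bounding box with latlon_to_tile gives the range of tile indices to download, and ground_resolution gives the pixel size needed when writing the mosaic's georeferencing.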

4. Image Matching (pipeline/georeferencing/dim.py)

  • Integrates with Deep Image Matching (DIM)
  • Extracts and matches keypoints between synthetic renders and satellite imagery
  • Uses pycolmap for robust feature matching
  • Computes homography and affine transformation matrices
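
To illustrate how matched keypoints lead to an alignment, the sketch below fits a 2D similarity transform (uniform scale, rotation, translation) by least squares. This is a deliberately restricted special case of the affine transformation mentioned above, chosen because it has a short closed-form solution; it is not the method used in dim.py. The names fit_similarity and apply_affine are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_similarity(src: List[Point], dst: List[Point]):
    """Least-squares 2D similarity transform mapping src points onto dst points.

    Points are treated as complex numbers, so the model is z' = a * z + b,
    where a encodes scale and rotation and b encodes translation.
    Returns the equivalent 2x3 affine matrix [[A, -B, tx], [B, A, ty]].
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two point pairs of equal count")
    zs = [complex(x, y) for x, y in src]
    zd = [complex(x, y) for x, y in dst]
    mean_s = sum(zs) / len(zs)
    mean_d = sum(zd) / len(zd)
    num = sum((d - mean_d) * (s - mean_s).conjugate() for s, d in zip(zs, zd))
    den = sum(abs(s - mean_s) ** 2 for s in zs)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = mean_d - a * mean_s
    return [[a.real, -a.imag, b.real], [a.imag, a.real, b.imag]]


def apply_affine(matrix, point: Point) -> Point:
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

Real matches contain outliers, so a fit like this would normally run inside a robust estimator such as RANSAC rather than on all correspondences at once.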

5. Model Transformation (pipeline/georeferencing/transformer.py)

  • Applies computed transformation to 3D model
  • Aligns model to correct elevation using DEM data
  • Handles coordinate system conversions (WGS84, UTM, local)
  • Exports georeferenced model in original format
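
Converting between WGS84 and UTM first requires picking the UTM zone for the estimated location. A minimal sketch of that selection is shown below, assuming the standard 6-degree zone grid and ignoring the special-case zones around Norway and Svalbard. The function names utm_zone and utm_epsg are hypothetical and not taken from transformer.py.

```python
import math


def utm_zone(lon: float) -> int:
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zones are 6 degrees wide starting at -180; lon == 180 is clamped to zone 60.
    return min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)


def utm_epsg(lat: float, lon: float) -> int:
    """Return the EPSG code of the WGS84 / UTM zone containing a point."""
    base = 32600 if lat >= 0.0 else 32700  # 326xx = northern, 327xx = southern
    return base + utm_zone(lon)
```

For the coordinates used in Example 3 (46.0669, 11.1216), this selects zone 32 in the northern hemisphere, that is EPSG:32632. The resulting code can then be handed to a projection library to perform the actual coordinate conversion.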

6. Output Generation (pipeline/core.py)

  • Saves georeferenced 3D model
  • Exports transformation matrices
  • Generates debug visualizations
  • Creates processing logs and metadata

Module Documentation

Core Module (pipeline/core.py)

PipelineProcessor: Main orchestration class that manages the entire pipeline.

  • Handles logging and temporary directory management
  • Coordinates all pipeline stages
  • Manages error handling and recovery
  • Provides progress tracking
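
The responsibilities listed above (stage coordination, logging, temporary directory handling, error reporting) can be pictured with a small orchestrator. The following is an illustrative sketch only and does not reproduce PipelineProcessor; the StageRunner class, its stage signature, and the georef_ directory prefix are all hypothetical.

```python
import logging
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Path, Dict], None]]


class StageRunner:
    """Run named stages in order inside one temporary working directory."""

    def __init__(self, stages: List[Stage], cleanup: bool = False):
        self.stages = stages
        self.cleanup = cleanup
        self.log = logging.getLogger("stage_runner")

    def run(self) -> Dict:
        workdir = Path(tempfile.mkdtemp(prefix="georef_"))
        context: Dict = {"workdir": workdir}  # shared state passed between stages
        try:
            for index, (name, stage) in enumerate(self.stages, start=1):
                self.log.info("[%d/%d] %s", index, len(self.stages), name)
                try:
                    stage(workdir, context)
                except Exception:
                    # Log which stage failed, then let the error propagate.
                    self.log.exception("stage %r failed", name)
                    raise
        finally:
            # Mirrors the documented --cleanup flag: remove the temp directory.
            if self.cleanup:
                shutil.rmtree(workdir, ignore_errors=True)
        return context
```

Each stage receives the working directory and a shared dictionary, so a rendering stage can leave image paths for the geolocation stage without the stages importing one another.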

Geolocation Modules (pipeline/geolocation/)

  • geoclip.py: GeoCLIP-based geolocation using CLIP embeddings
  • ollama.py: Ollama AI integration for vision-language geolocation
  • gemini.py: Google Gemini API integration for multimodal geolocation

Georeferencing Modules (pipeline/georeferencing/)

  • dim.py: Deep Image Matching integration for feature correspondence
  • transformer.py: 3D transformation and georeferencing utilities

Rendering Module (pipeline/rendering/)

  • multiview.py: Blender-based synthetic view generation with HDRI lighting

Services Module (pipeline/services/)

  • satellite_downloader.py: Mapbox satellite imagery download and mosaicking

Utils Module (pipeline/utils/)

  • transformations.py: Comprehensive 3D transformation library
    • Rotation matrices (Euler angles, quaternions, axis-angle)
    • Translation and scaling
    • Homography and affine transformations
    • Matrix decomposition and composition
    • Coordinate system conversions
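
As a small illustration of the composition utilities listed above, the sketch below builds 4x4 homogeneous matrices for scaling, rotation about the Z axis, and translation, and chains them. It uses plain Python lists and hypothetical function names; it is not the API of transformations.py.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Homogeneous translation matrix."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def scaling(s: float) -> Matrix:
    """Homogeneous uniform scaling matrix."""
    return [[s, 0.0, 0.0, 0.0], [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle_rad: float) -> Matrix:
    """Homogeneous rotation about the Z axis (counter-clockwise)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def transform_point(m: Matrix, p: Sequence[float]) -> List[float]:
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    x, y, z = p
    return [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3)]
```

Matrices compose right to left: matmul(translation(10, 5, 0), matmul(rotation_z(math.pi / 2), scaling(2))) scales first, then rotates, then translates, so the point (1, 0, 0) ends up at approximately (10, 7, 0).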

Requirements

System Requirements

  • OS: Linux (Ubuntu 22.04+ recommended) or Windows with WSL2
  • RAM: 16 GB minimum, 32 GB recommended
  • GPU: NVIDIA GPU with 8+ GB VRAM (optional but highly recommended)
  • Storage: 10 GB for Docker images, additional space for data

Software Dependencies

  • Blender 4.4.0+
  • Python 3.9+
  • CUDA 12.1+ (for GPU acceleration)
  • Deep Image Matching (dev branch)

Python Packages

See Dockerfile for complete list. Key dependencies:

  • torch, torchvision (PyTorch)
  • geoclip (geolocation)
  • pycolmap (image matching)
  • trimesh, open3d (3D processing)
  • rasterio (geospatial data)
  • google-genai (Gemini API)
  • ollama (Ollama integration)

License

This project is developed by 3DOM-FBK (Fondazione Bruno Kessler, 3D Optical Metrology unit).

For licensing information, please contact the authors.


Future Updates

  • Improved Elevation Alignment: Refine how elevation is applied to the 3D model for better integration with Cesium. This involves resolving potential discrepancies between ellipsoidal heights and orthometric (geoid-referenced) heights when fetching data from OpenTopoData.
  • Support for Multiple 3D Formats: Extend the input pipeline to support various 3D formats, specifically point clouds. Currently, the pipeline is optimized for GLB models.

Changelog

2026-01-16

  • Image Matching Improvement: Integrated SuperPoint+SuperGlue combined with LoFTR into the Deep Image Matching (DIM) step for more robust feature correspondence.
  • Nadir Dimension Estimation: Implemented automatic estimation of the nadir image dimensions using the Gemini model.

Acknowledgments

  • Deep Image Matching: 3DOM-FBK/deep-image-matching
  • GeoCLIP: Geolocation estimation using CLIP
  • Blender: Open-source 3D creation suite
  • Google Gemini: Multimodal AI for geolocation
  • Ollama: Local AI model inference

Contact

For questions, issues, or contributions, please open an issue on the GitHub repository or contact:

3DOM-FBK
Fondazione Bruno Kessler
Via Sommarive 18, 38123 Trento, Italy
https://3dom.fbk.eu
