3DGeoRef is an automated pipeline for georeferencing 3D models using synthetic rendering, AI-powered geolocation, and satellite imagery. The system transforms an arbitrary 3D model into a georeferenced asset aligned with real-world geographic coordinates, ready for integration into GIS systems or 3D viewers.
- Overview
- Features
- Project Structure
- Installation
- Usage
- Pipeline Workflow
- Module Documentation
- Requirements
- License
- Future Updates
- Changelog
Given a 3D model (GLB or GLTF format), 3DGeoRef performs the following operations:
- Synthetic View Generation: Creates multiple rendered views of the model using Blender
- Geolocation Estimation: Estimates geographic coordinates using AI models (GeoCLIP, Ollama, or Gemini)
- Satellite Image Download: Retrieves high-resolution satellite imagery from Mapbox API
- Image Matching: Performs feature matching between synthetic renders and satellite images using Deep Image Matching
- Transformation Computation: Calculates the affine transformation matrix to align the model
- Georeferencing: Applies the transformation and elevation alignment to produce the final georeferenced model
- Multi-Model AI Geolocation: Choose between GeoCLIP, Ollama (llama3.2-vision), or Google Gemini for location estimation
- Automated Pipeline: End-to-end processing from raw 3D model to georeferenced output
- Flexible Execution Modes: Run the full pipeline, geolocation only, or image matching only
- Docker Support: Fully containerized environment with GPU support for CUDA acceleration
- High-Quality Rendering: Blender-based synthetic view generation with HDRI lighting
- Robust Image Matching: Integration with Deep Image Matching for accurate feature correspondence
- Comprehensive Transformation Library: Advanced 3D transformation utilities for precise alignment
```
3DGeoRef/
├── main.py                      # Main entry point for the pipeline
├── Dockerfile                   # Docker image for the main pipeline (with Blender & CUDA)
├── docker-compose.yml           # Docker Compose configuration
├── hdri/                        # HDRI environment maps for rendering
├── pipeline/                    # Core pipeline modules
│   ├── __init__.py
│   ├── core.py                  # PipelineProcessor - main orchestration logic
│   ├── geolocation/             # Geolocation estimation modules
│   │   ├── __init__.py
│   │   ├── geoclip.py           # GeoCLIP-based geolocation
│   │   ├── ollama.py            # Ollama AI-based geolocation
│   │   └── gemini.py            # Google Gemini-based geolocation
│   ├── georeferencing/          # Georeferencing and transformation modules
│   │   ├── __init__.py
│   │   ├── dim.py               # Deep Image Matching integration
│   │   └── transformer.py       # 3D model transformation and alignment
│   ├── rendering/               # 3D rendering modules
│   │   ├── __init__.py
│   │   └── multiview.py         # Blender-based multi-view synthetic rendering
│   ├── services/                # External service integrations
│   │   ├── __init__.py
│   │   └── satellite_downloader.py  # Mapbox satellite imagery download
│   └── utils/                   # Utility modules
│       ├── __init__.py
│       └── transformations.py   # 3D transformation matrix utilities
└── README.md                    # This file
```
The easiest way to run 3DGeoRef is using Docker, which provides a pre-configured environment with all dependencies including Blender, CUDA, and Deep Image Matching.
- Docker Engine 20.10+
- Docker Compose 2.0+
- NVIDIA GPU with CUDA support (optional but recommended)
- NVIDIA Container Toolkit (for GPU acceleration)
```bash
# Clone the repository
git clone https://github.com/3DOM-FBK/3DGeoRef.git
cd 3DGeoRef

# Pull the pre-built Docker image
docker pull 3domfbk/3d-georef:04032026

# Or build the Docker image locally
docker build -t 3domfbk/3d-georef:04032026 .

# Run the container interactively
docker run --rm -it \
  --gpus all \
  -v /path/to/your/data:/data \
  3domfbk/3d-georef:04032026 bash
```

Run the complete georeferencing pipeline on a 3D model:
```bash
python main.py \
  -i /path/to/model.glb \
  -o /path/to/output \
  --geoloc_model gemini
```

Process a 3D model with automatic geolocation using Gemini AI:
```bash
docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
  -i /data/input/model.glb \
  -o /data/output \
  --geoloc_model gemini \
  --gemini_model gemini-2.5-flash \
  --gemini_api_key "YOUR_GEMINI_API_KEY" \
  --mapbox_api_key "YOUR_MAPBOX_API_KEY" \
  --cleanup \
  --streetviews 8 \
  --area_size 500 \
  --zoom 18
```

Use the GeoCLIP model for faster geolocation (no API key required):
```bash
docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
  -i /data/input/building.glb \
  -o /data/output \
  --geoloc_model geoclip \
  --nr_prediction 3 \
  --cleanup
```

Skip geolocation and run only Deep Image Matching with known coordinates:
```bash
docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
  -i /data/input/monument.glb \
  -o /data/output \
  --mode dim \
  --lat 46.0669 \
  --lon 11.1216 \
  --mapbox_api_key "YOUR_MAPBOX_API_KEY" \
  --area_size 300 \
  --zoom 20
```

Use your own orthophoto instead of downloading satellite imagery:
```bash
docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026 \
  -i /data/input/site.glb \
  -o /data/output \
  --ortho /data/orthophoto.tif \
  --lat 48.8582 \
  --lon 2.2945
```

For Ollama-based geolocation, use Docker Compose to run both services:
```bash
# Start the services
docker-compose up -d

# Wait for Ollama to download the model (first run only)
docker-compose logs -f ollama

# Run the pipeline (in a new terminal)
docker exec -it 3dgeoref_python python main.py \
  -i /data/input/model.glb \
  -o /data/output \
  --geoloc_model ollama
```

Run the container interactively for development and debugging:
```bash
docker run --rm -it \
  --gpus all \
  -v /path/to/data:/data \
  3domfbk/3d-georef:04032026

# Inside the container, you can run commands manually:
python main.py -i /data/input/test.glb -o /data/output --mode geoloc
```

| Argument | Type | Default | Description |
|---|---|---|---|
| `-i, --input_file` | str | required | Path to input 3D model (`.glb`/`.gltf`) |
| `-o, --output_folder` | str | required | Output folder for results |
| `--streetviews` | int | 5 | Number of street-view style renderings around the model |
| `--nr_prediction` | int | 1 | Number of GPS predictions (GeoCLIP only) |
| `--area_size` | int | 500 | Side length of square area to download (meters) |
| `--zoom` | int | 18 | Satellite imagery zoom level (18-20 recommended) |
| `--lat` | float | None | Manual latitude (skips geolocation if provided with `--lon`) |
| `--lon` | float | None | Manual longitude (skips geolocation if provided with `--lat`) |
| `--ortho` | str | None | Path to custom orthophoto (skips satellite download) |
| `--mode` | str | auto | Execution mode: `auto`, `geoloc`, or `dim` |
| `--geoloc_model` | str | gemini | Geolocation model: `geoclip`, `ollama`, or `gemini` |
| `--gemini_model` | str | gemini-2.5-flash | Gemini model version to use |
| `--gemini_api_key` | str | None | API key for Google Gemini (can also be set via env var) |
| `--mapbox_api_key` | str | None | API key for Mapbox (can also be set via env var) |
| `--cleanup` | bool | False | Delete temporary working directory in /tmp after execution |
- `auto` (default): Full pipeline from 3D model to georeferenced output
- `geoloc`: Only perform geolocation estimation and stop
- `dim`: Skip geolocation, perform only Deep Image Matching (requires `--lat` and `--lon`)
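As an illustration of how these options fit together, here is a minimal `argparse` sketch mirroring a subset of the table above. It is not the actual `main.py` parser; names and defaults are taken from the table, and validation details are omitted.

```python
import argparse

def build_parser():
    """Illustrative parser covering a subset of 3DGeoRef's CLI options."""
    p = argparse.ArgumentParser(prog="main.py")
    p.add_argument("-i", "--input_file", required=True,
                   help="Path to input 3D model (.glb/.gltf)")
    p.add_argument("-o", "--output_folder", required=True)
    p.add_argument("--mode", choices=["auto", "geoloc", "dim"], default="auto")
    p.add_argument("--geoloc_model",
                   choices=["geoclip", "ollama", "gemini"], default="gemini")
    p.add_argument("--lat", type=float, default=None)
    p.add_argument("--lon", type=float, default=None)
    p.add_argument("--zoom", type=int, default=18)
    p.add_argument("--cleanup", action="store_true")
    return p

# dim mode with manual coordinates, as in the Deep Image Matching example
args = build_parser().parse_args(
    ["-i", "m.glb", "-o", "out", "--mode", "dim",
     "--lat", "46.0669", "--lon", "11.1216"])
```

In `dim` mode, a real implementation would additionally verify that both `--lat` and `--lon` were supplied before skipping geolocation.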
The complete pipeline follows these steps:
- Loads the 3D model into Blender
- Computes bounding box and optimal camera positions
- Generates orthographic top-down view
- Creates multiple street-view perspective renderings
- Applies HDRI lighting for realistic appearance
- Exports rendered images and scaled model
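The camera-placement step above can be sketched as follows: cameras are distributed on a ring around the model's bounding box, each looking at its center. The function name, the ring layout, and the 1.5x radius factor are illustrative assumptions, not the actual `multiview.py` implementation.

```python
import numpy as np

def ring_cameras(bbox_min, bbox_max, n_views=5, radius_factor=1.5):
    """Place n_views cameras on a horizontal ring around the bounding box,
    each with a unit view direction pointing at the box center (sketch)."""
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)
    center = (bbox_min + bbox_max) / 2.0
    # Radius proportional to half the bounding-box diagonal
    radius = radius_factor * np.linalg.norm(bbox_max - bbox_min) / 2.0
    cameras = []
    for k in range(n_views):
        theta = 2.0 * np.pi * k / n_views
        position = center + radius * np.array([np.cos(theta), np.sin(theta), 0.0])
        look_dir = (center - position) / np.linalg.norm(center - position)
        cameras.append((position, look_dir))
    return cameras
```

In Blender, each `(position, look_dir)` pair would then be assigned to a camera object before rendering a still, with the orthographic top-down view handled separately.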
- GeoCLIP (`geoclip.py`): Fast, offline geolocation using CLIP embeddings
- Ollama (`ollama.py`): Vision-language model (llama3.2-vision) for location reasoning
- Gemini (`gemini.py`): Google's multimodal AI for high-accuracy geolocation
- Filters out top-down views for better accuracy
- Returns GPS coordinates (latitude, longitude)
- Queries Mapbox Static API for satellite tiles
- Downloads tiles at specified zoom level
- Stitches tiles into georeferenced mosaic
- Exports as GeoTIFF with proper coordinate system
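Mapbox tiles follow the standard Web Mercator "slippy map" XYZ scheme, so the tiles covering a coordinate can be computed with the usual formulas below. This is the generic tile math, shown for reference; it is not code from `satellite_downloader.py`.

```python
import math

def deg2tile(lat, lon, zoom):
    """WGS84 lat/lon -> Web Mercator (slippy map) tile indices at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile2deg(x, y, zoom):
    """Lat/lon of the tile's north-west corner, used to georeference the mosaic."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lat, lon
```

Stitching then amounts to downloading the tile range covering the requested area, concatenating tiles row by row, and writing the corner coordinates from `tile2deg` into the GeoTIFF's geotransform.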
- Integrates with Deep Image Matching (DIM)
- Extracts and matches keypoints between synthetic renders and satellite imagery
- Uses pycolmap for robust feature matching
- Computes homography and affine transformation matrices
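Given the pixel correspondences that DIM produces between a render and the georeferenced satellite image, a 2D affine transform can be recovered by least squares, as sketched below. A production pipeline would wrap this in robust estimation (e.g. RANSAC) to reject outlier matches; this sketch omits that step.

```python
import numpy as np

def fit_affine_2d(src, dst):
    """Least-squares 2x3 affine mapping src points onto dst points.
    src, dst: (N, 2) arrays of matched keypoint coordinates, N >= 3."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    ones = np.ones((len(src), 1))
    A = np.hstack([src, ones])                        # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2) solution
    return params.T                                   # (2, 3) = [M | t]

# Sanity check: recover a known transform (2x scale, 90° rotation, translation)
T = np.array([[0.0, -2.0, 10.0],
              [2.0,  0.0, -5.0]])
pts = np.random.default_rng(0).uniform(0, 100, (20, 2))
mapped = pts @ T[:, :2].T + T[:, 2]
assert np.allclose(fit_affine_2d(pts, mapped), T)
```

The recovered 2D affine (scale, rotation, translation in the map plane) is then lifted into the 3D transformation applied to the model in the next step.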
- Applies computed transformation to 3D model
- Aligns model to correct elevation using DEM data
- Handles coordinate system conversions (WGS84, UTM, local)
- Exports georeferenced model in original format
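The core of this step, applying a homogeneous transform to the model's vertices and snapping its lowest point to the terrain elevation, can be sketched as below. The function name is hypothetical and the real `transformer.py` additionally handles coordinate-system conversions and DEM lookup.

```python
import numpy as np

def georeference_vertices(vertices, M, ground_elevation):
    """Apply a 4x4 homogeneous transform M to (N, 3) vertices, then shift
    the model vertically so its lowest point sits at ground_elevation."""
    v = np.asarray(vertices, dtype=float)
    homo = np.hstack([v, np.ones((len(v), 1))])      # (N, 4) homogeneous coords
    out = (homo @ M.T)[:, :3]                        # transformed vertices
    out[:, 2] += ground_elevation - out[:, 2].min()  # elevation alignment
    return out
```

For example, with `M` a pure translation into projected map coordinates and `ground_elevation` sampled from a DEM, the returned vertices are ready for export in the model's original format.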
- Saves georeferenced 3D model
- Exports transformation matrices
- Generates debug visualizations
- Creates processing logs and metadata
`PipelineProcessor`: Main orchestration class that manages the entire pipeline.
- Handles logging and temporary directory management
- Coordinates all pipeline stages
- Manages error handling and recovery
- Provides progress tracking
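The responsibilities above can be pictured with the skeleton below: stages run in order inside a temporary working directory, with progress logging and optional cleanup. This is an illustrative sketch of the orchestration pattern, not the actual `core.py` class.

```python
import logging
import shutil
import tempfile
from pathlib import Path

class PipelineProcessor:
    """Sketch of a stage-based orchestrator: runs named stages in order,
    logs progress, and removes the temp working directory on request."""

    def __init__(self, stages, cleanup=False):
        self.stages = stages          # list of (name, callable(workdir))
        self.cleanup = cleanup
        self.log = logging.getLogger("3dgeoref")

    def run(self):
        workdir = Path(tempfile.mkdtemp(prefix="3dgeoref_"))
        results = {}
        try:
            for i, (name, stage) in enumerate(self.stages, start=1):
                self.log.info("[%d/%d] %s", i, len(self.stages), name)
                results[name] = stage(workdir)
            return results
        finally:
            # Mirrors the --cleanup flag: drop the /tmp working directory
            if self.cleanup:
                shutil.rmtree(workdir, ignore_errors=True)
```

Keeping stages as named callables makes the `--mode` options straightforward: `geoloc` runs a prefix of the stage list, while `dim` starts from the matching stage with manually supplied coordinates.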
- `geoclip.py`: GeoCLIP-based geolocation using CLIP embeddings
- `ollama.py`: Ollama AI integration for vision-language geolocation
- `gemini.py`: Google Gemini API integration for multimodal geolocation
- `dim.py`: Deep Image Matching integration for feature correspondence
- `transformer.py`: 3D transformation and georeferencing utilities
`multiview.py`: Blender-based synthetic view generation with HDRI lighting
`satellite_downloader.py`: Mapbox satellite imagery download and mosaicking
`transformations.py`: Comprehensive 3D transformation library
- Rotation matrices (Euler angles, quaternions, axis-angle)
- Translation and scaling
- Homography and affine transformations
- Matrix decomposition and composition
- Coordinate system conversions
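As a flavor of what such a library provides, here is a standard Euler-angle-to-rotation-matrix composition; the function name and convention are illustrative, and the actual `transformations.py` API may differ.

```python
import numpy as np

def rotation_xyz(rx, ry, rz):
    """Rotation matrix from Euler angles (radians), applied in X, Y, Z order."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx  # X rotation first, then Y, then Z

R = rotation_xyz(0.1, 0.2, 0.3)
assert np.allclose(R @ R.T, np.eye(3))    # orthonormal
assert np.isclose(np.linalg.det(R), 1.0)  # proper rotation (no reflection)
```

Such rotations are composed with translation and scale into the 4x4 homogeneous matrices used during georeferencing.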
- OS: Linux (Ubuntu 22.04+ recommended), Windows with WSL2
- RAM: 16 GB minimum, 32 GB recommended
- GPU: NVIDIA GPU with 8+ GB VRAM (optional but highly recommended)
- Storage: 10 GB for Docker images, additional space for data
- Blender 4.4.0+
- Python 3.9+
- CUDA 12.1+ (for GPU acceleration)
- Deep Image Matching (dev branch)
See Dockerfile for complete list. Key dependencies:
- `torch`, `torchvision` (PyTorch)
- `geoclip` (geolocation)
- `pycolmap` (image matching)
- `trimesh`, `open3d` (3D processing)
- `rasterio` (geospatial data)
- `google-genai` (Gemini API)
- `ollama` (Ollama integration)
This project is developed by 3DOM-FBK (Fondazione Bruno Kessler, 3D Optical Metrology unit).
For licensing information, please contact the authors.
- Improved Elevation Alignment: Refine the elevation applied to the 3D model for better integration with Cesium. This involves addressing potential discrepancies between ellipsoidal and orthometric height references when fetching data from OpenTopoData.
- Support for Multiple 3D Formats: Extend the input pipeline to support various 3D formats, specifically point clouds. Currently, the pipeline is optimized for GLB models.
- Image Matching Improvement: Integrated SuperPoint+SuperGlue combined with LoFTR into the Deep Image Matching (DIM) step for more robust feature correspondence.
- Nadir Dimension Estimation: Implemented automatic estimation of the nadir image dimensions using the Gemini model.
- Deep Image Matching: 3DOM-FBK/deep-image-matching
- GeoCLIP: Geolocation estimation using CLIP
- Blender: Open-source 3D creation suite
- Google Gemini: Multimodal AI for geolocation
- Ollama: Local AI model inference
For questions, issues, or contributions, please open an issue on the GitHub repository or contact:
3DOM-FBK
Fondazione Bruno Kessler
Via Sommarive 18, 38123 Trento, Italy
https://3dom.fbk.eu