Code for "Language-Driven Semantic Change Detection in Urban Maps via Multi-Modal Deep Learning" (Huaze Liu, Zihao Gao, Adyasha Mohanty — Harvey Mudd College, ION GNSS+ 2025). [paper PDF]
See demo/ for the source clip and how it was generated.
High-integrity maps are essential for safe autonomous navigation in dynamic urban environments, where frequent changes and sensor limitations present significant challenges. This work introduces a deep-learning-driven framework for continuous map uncertainty monitoring and semantic change detection that fuses vision and LiDAR features. Zero-shot semantic segmentation via large pre-trained vision-language models provides interpretable, language-driven explanations for detected map inconsistencies, while Kullback-Leibler divergence tracks map consistency over time and enables proactive real-time alerts.
┌─────────────────────────┐ ┌──────────────────────────┐
RGB frame ───────▶│ VISION MODULE │ │ LIDAR MODULE │◀─────── depth + semantic
(before/after) │ Grounding DINO v2 + SAM │ │ depth → pseudo-LiDAR → │ frame (before/after)
│ → per-class label mask │ │ PointNet (enhanced) │
│ → change map C(x,y) │ │ → class-removal variant │
└────────────┬─────────────┘ └────────────┬─────────────┘
│ D_vis_KL (KL divergence) │ D_lidar_KL (weighted
│ │ normalized KL scatter)
└───────────────┬────────────────────┘
▼
S = α·D_vis_KL + β·D_lidar_KL
(α,β heuristic per weather condition)
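The fusion step at the bottom of the diagram is a weighted sum, which a minimal sketch can make concrete. The weight table below is an assumption for illustration: only the `clear` condition name appears in this README (via `--condition clear`), and the other condition names and all numeric weights are hypothetical, not the values used in the paper or in map_uncertainty/fusion/score.py.

```python
# Hypothetical (alpha, beta) weights per weather condition.
# These numbers are illustrative placeholders, not the paper's values.
WEIGHTS = {
    "clear": (0.5, 0.5),
    "fog": (0.3, 0.7),   # assumed: trust LiDAR more when vision degrades
    "rain": (0.4, 0.6),
}


def fusion_score(d_vis_kl: float, d_lidar_kl: float, condition: str = "clear") -> float:
    """Return S = alpha * D_vis_KL + beta * D_lidar_KL for one condition."""
    try:
        alpha, beta = WEIGHTS[condition]
    except KeyError:
        raise ValueError(f"unknown condition: {condition!r}") from None
    return alpha * d_vis_kl + beta * d_lidar_kl


if __name__ == "__main__":
    print(fusion_score(0.2, 0.4, "clear"))
```

Keeping the weights in a lookup table makes the heuristic explicit and easy to swap out per condition.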
| Paper section | Code |
|---|---|
| I.1 Vision Module (Grounding DINO v2 + SAM, change map) | map_uncertainty/vision/ |
| I.1 baselines: CLIP patch-diff, LoFTR (Sec. III.1.a) | map_uncertainty/vision/baselines.py |
| I.2 LiDAR Module (depth→pseudo-LiDAR, enhanced PointNet) | map_uncertainty/lidar/ |
| I.3 Fusion score S = α·D_vis + β·D_lidar | map_uncertainty/fusion/score.py |
| II.2 KL divergence, weighted KL scatter, Change Ratio, Jaccard Distance | map_uncertainty/metrics/ |
| II.1 Class-removal scene variants (Telea inpainting, point removal) | map_uncertainty/lidar/variants.py |
| Root prototype: CLIPSeg construction-object segmentation | SemantSeg.ipynb, map_uncertainty/vision/clipseg_pipeline.py |
| Vendored classical PointNet (Qi et al., 2017) reference impl | 3D/point_net/ (MIT, © Isaac Berrios — see 3D/LICENSE) |
| PointNet exploration / S3DIS visualization notebooks | 3D/ |
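To give a feel for the metrics listed in the table above, here is a dependency-free sketch computed on small per-pixel label masks. The definitions are assumptions, since this README does not spell them out: KL divergence is taken between smoothed class-frequency histograms of the two masks, Change Ratio as the fraction of pixels whose label differs, and Jaccard Distance as 1 − IoU of one class's binary masks. All function names are hypothetical and need not match map_uncertainty/metrics/. The weighted normalized KL scatter is not covered here.

```python
import math
from collections import Counter


def class_histogram(mask, num_classes, eps=1e-8):
    """Smoothed, normalized class-frequency distribution of a 2D label mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    probs = [counts[c] / total + eps for c in range(num_classes)]
    norm = sum(probs)
    return [p / norm for p in probs]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def change_ratio(before, after):
    """Fraction of pixels whose label differs between the two masks (assumed definition)."""
    changed = total = 0
    for row_b, row_a in zip(before, after):
        for b, a in zip(row_b, row_a):
            changed += b != a
            total += 1
    return changed / total


def jaccard_distance(before, after, cls):
    """1 - IoU of the binary masks for class `cls` (assumed definition)."""
    inter = union = 0
    for row_b, row_a in zip(before, after):
        for b, a in zip(row_b, row_a):
            in_b, in_a = b == cls, a == cls
            inter += in_b and in_a
            union += in_b or in_a
    return 0.0 if union == 0 else 1.0 - inter / union


if __name__ == "__main__":
    # Toy masks: class 2 is present before and removed after.
    before = [[0, 0, 1], [0, 2, 1]]
    after = [[0, 0, 1], [0, 0, 1]]
    p = class_histogram(before, num_classes=3)
    q = class_histogram(after, num_classes=3)
    print(kl_divergence(p, q), change_ratio(before, after), jaccard_distance(before, after, 2))
```

The `eps` smoothing keeps the KL divergence finite when a class disappears entirely from one of the masks. A NumPy version would vectorize the loops.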
map_uncertainty/ importable package -- the paper's pipeline
vision/ Grounding DINO+SAM, CLIPSeg, CLIP/LoFTR baselines, change maps
lidar/ depth→pointcloud, enhanced PointNet, class-removal variants
fusion/ vision+LiDAR fusion score
metrics/ KL divergence, weighted KL scatter, Change Ratio, Jaccard Distance
data/ synthetic Virtual-KITTI-shaped sample generator
scripts/ CLI entry points (see "Quickstart")
tests/ pytest unit tests (`pytest tests/`)
3D/ vendored PointNet reference implementation + S3DIS notebooks
demo/ source assets + script for the short demo clip above
imgs/, roadwork.jpg, ... example construction-scene images used by the prototype notebooks
SemantSeg.ipynb, PointNet.ipynb original exploratory notebooks
python -m venv .venv && source .venv/bin/activate # or your conda env of choice
pip install -r requirements.txt

The proposed Grounding DINO + SAM method and the LoFTR baseline need heavier, optional dependencies (and their own checkpoint downloads -- see below):
pip install -r requirements-optional.txt

The full Virtual KITTI 2 dataset (Cabon et al., 2020) used in the paper's experiments is tens of
GB and is not vendored here. map_uncertainty/data/synthetic.py generates small
synthetic RGB/depth/semantic frames with the same schema so every pipeline stage below runs
end-to-end without it -- point the same functions at real Virtual KITTI frames to reproduce the
paper's actual numbers (see "Reproducing the paper's results").
# 1. Generate a synthetic sample sequence
python scripts/generate_synthetic_sample.py --out data/sample_scene --frames 5
# 2. Vision branch: CLIPSeg backend (no extra checkpoints needed beyond `transformers`)
python scripts/run_vision_pipeline.py --backend clipseg \
--before without_roadwork.jpg --after roadwork.jpg \
--labels "cone" "barricade" "construction vehicle" \
--out outputs/vision_clipseg.png
# CLIP patch-difference baseline (Sec. III.1.a)
python scripts/run_vision_pipeline.py --backend clip \
--before without_roadwork.jpg --after roadwork.jpg --out outputs/vision_clip.png
# 3. LiDAR branch: depth→pointcloud, remove a class, score the change
python scripts/run_lidar_pipeline.py --variant noTrafficSigns
# 4. Fused vision+LiDAR score
python scripts/run_fusion.py --condition clear

Run `pytest tests/` to check the metrics/vision/lidar/fusion modules (23 tests, no GPU or
checkpoints required).
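For orientation, the LiDAR-branch step above (depth→pointcloud, then removing a class) can be sketched without dependencies as follows. This assumes a standard pinhole camera model with depth in meters. The function names, intrinsics, and class IDs are hypothetical and are not taken from scripts/run_lidar_pipeline.py or map_uncertainty/lidar/.

```python
def depth_to_pointcloud(depth, semantic, fx, fy, cx, cy):
    """Back-project a depth image to 3D camera-frame points (pinhole model).

    `depth` and `semantic` are row-major nested lists of equal shape.
    Pixels with non-positive depth are skipped as invalid.
    """
    points, labels = [], []
    for v, (depth_row, label_row) in enumerate(zip(depth, semantic)):
        for u, (z, label) in enumerate(zip(depth_row, label_row)):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
            labels.append(label)
    return points, labels


def remove_class(points, labels, cls):
    """Drop every point whose semantic label equals `cls` (a class-removal variant)."""
    kept = [(p, l) for p, l in zip(points, labels) if l != cls]
    return [p for p, _ in kept], [l for _, l in kept]


if __name__ == "__main__":
    # Toy 2x2 frame; label 2 stands in for a hypothetical "traffic sign" class ID.
    depth = [[2.0, 0.0], [4.0, 4.0]]
    semantic = [[1, 1], [2, 3]]
    pts, lbls = depth_to_pointcloud(depth, semantic, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
    pts_removed, lbls_removed = remove_class(pts, lbls, cls=2)
    print(len(pts), len(pts_removed))
```

Comparing a descriptor of the original cloud against the class-removed cloud is what lets a change score be attributed to a specific semantic class.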
Not fetched automatically by this repo -- download separately if you want to run the proposed Grounding DINO + SAM method or the LoFTR baseline for real:
| Model | Used by | Approx. size | Source |
|---|---|---|---|
| GroundingDINO (Swin-T) | vision.GroundingDINOSAM | ~660MB | IDEA-Research/GroundingDINO |
| SAM vit_b | vision.GroundingDINOSAM | ~375MB | facebookresearch/segment-anything (vit_h is 2.4GB and unnecessary here) |
| CLIP ViT-B/32 | vision.CLIPPatchDiff | ~605MB | auto-cached via transformers (openai/clip-vit-base-patch32) |
| CLIPSeg (rd64-refined) | vision.clipseg_pipeline | ~600MB | auto-cached via transformers (CIDAS/clipseg-rd64-refined) |
| LoFTR (outdoor) | vision.LoFTRChangeDetector | ~46MB | auto-cached via kornia |
Check your available disk space before downloading GroundingDINO/SAM -- together they need roughly 1GB free.
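One way to do that check from Python is sketched below, using only the standard library. The 1 GiB threshold mirrors the rough figure above, and the helper name is hypothetical.

```python
import shutil

REQUIRED_BYTES = 1 * 1024**3  # roughly 1 GiB for GroundingDINO + SAM vit_b


def has_free_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    print("enough space" if has_free_space() else "free up disk space first")
```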
This repo is the reference implementation of every formula and architectural component described in the paper (change map construction, weighted normalized KL scatter, Change Ratio, Jaccard Distance, fusion score, the enhanced PointNet head). The tables and figures in the paper (TPR, mIoU, KL divergence, and Pearson correlation vs. baselines) were produced on the full Virtual KITTI 2 dataset, with the LiDAR classifier trained on a GPU. Reproducing those exact numbers requires that dataset and compute, neither of which this repo bundles. The synthetic-data path above verifies that every stage of the pipeline runs correctly end-to-end; it does not reproduce the paper's benchmark numbers.
@inproceedings{liu2025languagedriven,
title = {Language-Driven Semantic Change Detection in Urban Maps via Multi-Modal Deep Learning},
author = {Liu, Huaze and Gao, Zihao and Mohanty, Adyasha},
booktitle = {Proceedings of the ION GNSS+ 2025},
year = {2025},
url = {https://huazeliu.github.io/files/paper/ION_GNSS_2025.pdf}
}

Also cite the underlying methods this framework builds on: PointNet (Qi et al., 2017), Grounding DINO (Liu et al., 2024), Segment Anything (Kirillov et al., 2023), LoFTR (Sun et al., 2021), and Virtual KITTI 2 (Cabon et al., 2020).
MIT (see LICENSE). The vendored PointNet implementation in 3D/point_net/ is
separately MIT-licensed by Isaac Berrios (3D/LICENSE).