NIneeeeeem/LiveVLN
LiveVLN: Breaking the Stop-and-Go Loop in Vision-Language Navigation

A training-free runtime framework that keeps actions continuously available during embodied VLN deployment.

arXiv paper

⚡ Training-free  |  🔄 Guarded handoff  |  🧠 Real-time adaptation

LiveVLN teaser illustrating multi-step execution for smoother and more efficient navigation

LiveVLN is a runtime wrapper for compatible pretrained VLM-based navigators. Instead of exposing the robot to a blocking sense-inference-execution loop, it overlaps execution with background refresh so the next action chunk is ready before the committed prefix is exhausted.

✨ Overview

Even strong VLN systems still deploy with a blocking sense → inference → execution interface: the robot visibly pauses because motion must wait for sensing, transmission, and inference to finish before the next action chunk arrives.

LiveVLN removes that stop-and-go loop at runtime with a dual-thread interface:

  • Guarded handoff: keep executing a committed guard buffer while the next continuation is refreshed in the background.
  • Revisable tail: expose only the minimal safe prefix and keep later actions editable under new observations.
  • Real-time adaptation: resize the guard budget from recent sense-inference latency instead of using a fixed horizon.
  • Training-free integration: wrap the same pretrained checkpoint without retraining the backbone.
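The guarded-handoff loop above can be sketched in a few lines. This is a minimal illustration, not the repository's actual implementation: the `GuardedHandoff` class, its `policy` callable, and the splice rule for merging a fresh chunk behind the committed guard prefix are all hypothetical simplifications.

```python
import threading
from collections import deque

class GuardedHandoff:
    """Sketch of a LiveVLN-style guarded handoff (hypothetical interface):
    the executor keeps consuming a committed guard prefix while a background
    thread refreshes the remaining, still-revisable tail."""

    def __init__(self, policy, guard_size=2):
        self.policy = policy          # callable: observation -> list of actions
        self.guard_size = guard_size  # number of actions committed per handoff
        self.plan = deque()           # committed prefix + revisable tail
        self.lock = threading.Lock()

    def refresh(self, observation):
        """Background thread: recompute the continuation, then splice it in
        after the still-committed guard prefix. (A real system would align the
        new chunk with already-executed actions more carefully.)"""
        new_chunk = self.policy(observation)
        with self.lock:
            committed = list(self.plan)[: self.guard_size]
            self.plan = deque(committed + new_chunk[len(committed):])

    def step(self, observation):
        """Executor thread: pop the next committed action and kick off a
        background refresh so a new chunk is ready before the guard runs out."""
        with self.lock:
            if not self.plan:
                # Cold start: the very first chunk is unavoidably blocking.
                self.plan.extend(self.policy(observation))
            action = self.plan.popleft()
        threading.Thread(target=self.refresh, args=(observation,),
                         daemon=True).start()
        return action
```

Because the executor only ever pops from the committed prefix, motion never waits on the background refresh after the cold start.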

LiveVLN overview with executed actions, guard buffer, revisable tail, and dual-thread handoff

LiveVLN keeps a short committed prefix for continuous motion, while the remaining tail stays revisable under new observations.

📌 Why it matters

The core issue is runtime exposure: if inference latency is directly exposed to the controller, the robot pauses even when the policy is strong.

On a real NaVIDA deployment, native blocking execution spends:

  • 10.64s waiting per episode
  • 30.5% waiting ratio
  • 94.9% stop-and-go rounds

LiveVLN hides that latency behind committed execution, so motion stays available while the next continuation is refreshed.
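How many actions to commit follows from recent latency: the guard must cover one full sense-inference round. A minimal sketch, assuming an exponential moving average over recent round latencies and a fixed per-action execution time; the function name, parameter values, and clamping rule are illustrative, not LiveVLN's exact rule.

```python
import math

def guard_budget(latencies, action_duration=0.4, ema_alpha=0.3,
                 min_guard=1, max_guard=6):
    """Pick how many actions to commit so execution outlasts one refresh.
    `latencies`: recent sense+inference round times in seconds.
    All names and defaults here are illustrative assumptions."""
    if not latencies:
        return min_guard
    est = latencies[0]
    for t in latencies[1:]:
        est = ema_alpha * t + (1 - ema_alpha) * est  # smooth recent rounds
    # Commit just enough actions to cover the estimated refresh latency.
    budget = math.ceil(est / action_duration)
    return max(min_guard, min(max_guard, budget))
```

A fixed horizon would either over-commit (late corrections) or under-commit (pauses return); deriving the budget from measured latency tracks the deployment's actual sensing and inference cost.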

Real-robot diagnosis showing waiting ratio and stop-and-go behavior in native blocking deployment

📊 Results

Across R2R, RxR, and real-robot streaming, LiveVLN keeps benchmark performance close to the original checkpoints while substantially improving continuity:

  • StreamVLN
    • waiting time: 7.32s → 1.63s (77.7%↓)
    • wall-clock episode time: 41.98s → 36.71s (12.6%↓)
    • pause count: 6.75 → 0.80
  • NaVIDA
    • waiting time: 10.64s → 2.89s (72.8%↓)
    • wall-clock episode time: 34.90s → 28.06s (19.6%↓)
    • pause count: 9.25 → 1.20

🚀 Quick Start

Environment

The setup largely follows NaVIDA; reusing the same environment is recommended.

  • Python 3.10
  • CUDA 11.8+
  • Habitat-Sim 0.2.4
  • Habitat-Lab 0.2.4
  • PyTorch / Transformers / PEFT
  • qwen_vl_utils, vllm, flash-attn
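A setup sketch for the dependencies above, assuming a conda-based workflow as is typical for Habitat projects. The environment name, channels, and install order are illustrative assumptions; defer to the NaVIDA instructions for exact versions and sources.

```shell
# Illustrative setup; channel names follow the usual Habitat conventions,
# and exact pins may differ in your deployment.
conda create -n livevln python=3.10 -y
conda activate livevln
conda install habitat-sim=0.2.4 -c aihabitat -c conda-forge -y
pip install torch transformers peft qwen_vl_utils vllm flash-attn
# Habitat-Lab is typically installed from source at the matching version.
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
pip install -e habitat-lab
```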

Data and Checkpoints

Following NaVIDA, the expected layout is the standard Habitat VLN-CE structure:

data/
├── scene_datasets/
├── R2R_VLNCE_v1-3_preprocessed/
└── ...

Download the pretrained weights of StreamVLN and NaVIDA.

🤖 Real-World Demo

Useful entry points:

  • real_world_demo/agent_service_live.py: local Flask action service
  • real_world_demo/real_world_vln_live.py: RGB-D + robot control client

πŸ“ Update the endpoint and device settings before deployment in your own robot environment.

# For NaVIDA demo
cd real_world_demo
bash start_server.sh   # on server
python3 real_world_vln_live.py   # on robot
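The client/server round trip behind these two scripts can be mimicked with a small HTTP exchange. This is a sketch only: the `/act` endpoint path, JSON field names, and base64 encoding are hypothetical assumptions, not the repository's actual API.

```python
import base64
import json

def encode_observation(rgb_bytes, depth_bytes, instruction):
    """Pack one RGB-D frame plus the instruction into a JSON payload.
    Base64 for raw image bytes; field names are illustrative."""
    return json.dumps({
        "rgb": base64.b64encode(rgb_bytes).decode("ascii"),
        "depth": base64.b64encode(depth_bytes).decode("ascii"),
        "instruction": instruction,
    })

def decode_actions(response_text):
    """Parse the action chunk returned by the (hypothetical) action service."""
    payload = json.loads(response_text)
    return payload.get("actions", [])

# On the robot, the round trip would then look roughly like:
#   resp = requests.post("http://<server>:5000/act",
#                        data=encode_observation(rgb, depth, instruction))
#   actions = decode_actions(resp.text)
```

Returning a whole action chunk per request (rather than one action) is what lets the client keep executing while the next server response is in flight.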

πŸ™ Acknowledgements

This repository builds on or reuses ideas and code from StreamVLN, NaVIDA, Habitat-Sim, and Habitat-Lab.
