A training-free runtime framework that keeps actions continuously available during embodied VLN deployment.
Training-free  |  Guarded handoff  |  Real-time adaptation
LiveVLN is a runtime wrapper for compatible pretrained VLM-based navigators. Instead of exposing the robot to a blocking sense-inference-execution loop, it overlaps execution with background refresh so the next action chunk is ready before the committed prefix is exhausted.
Previous strong VLN systems still deploy with a blocking sense → inference → execution interface, so the robot visibly pauses: motion must wait for sensing, transmission, and inference to finish before the next continuation arrives.
LiveVLN removes that stop-and-go loop at runtime with a dual-thread interface:
- Guarded handoff: keep executing a committed guard buffer while the next continuation is refreshed in the background.
- Revisable tail: expose only the minimal safe prefix and keep later actions editable under new observations.
- Real-time adaptation: resize the guard budget from recent sense-inference latency instead of using a fixed horizon.
- Training-free integration: wrap the same pretrained checkpoint without retraining the backbone.
LiveVLN keeps a short committed prefix for continuous motion, while the remaining tail stays revisable under new observations.
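The guarded handoff above can be sketched as a small thread-safe action queue. This is an illustrative sketch, not the repository's actual API: the class name, the `guard` parameter, and the action strings are all assumptions; the point is that a refresh replaces only the revisable tail while the committed prefix keeps executing.

```python
import threading
from collections import deque

class GuardedPlan:
    """Guarded-handoff sketch (illustrative names, not LiveVLN's API):
    a committed prefix stays executable while the tail remains revisable."""

    def __init__(self, guard=2):
        self._lock = threading.Lock()
        self._actions = deque()
        self.guard = guard  # length of the committed (non-revisable) prefix

    def refresh(self, new_chunk):
        # Background refresh thread: replace only the revisable tail; the
        # first `guard` queued actions stay committed so motion never stalls.
        with self._lock:
            committed = list(self._actions)[: self.guard]
            self._actions = deque(committed + list(new_chunk))

    def pop(self):
        # Execution thread: take the next action, or None if the queue ran dry.
        with self._lock:
            return self._actions.popleft() if self._actions else None

# Usage: a revision rewrites the tail but keeps the committed prefix.
plan = GuardedPlan(guard=2)
plan.refresh(["fwd", "fwd", "left", "left"])  # initial chunk
plan.refresh(["right", "stop"])               # revision under new observations
```

In a real deployment the executor would call `pop()` at the control rate while a background thread calls `refresh()` whenever a new continuation arrives from the model.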
The core issue is runtime exposure: if inference latency is directly exposed to the controller, the robot pauses even when the policy is strong.
In a real NaVIDA deployment, native blocking execution shows:
- 10.64s waiting per episode
- 30.5% waiting ratio
- 94.9% stop-and-go rounds
LiveVLN hides that latency behind committed execution, so motion stays available while the next continuation is refreshed.
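One way to size the committed prefix so it covers a refresh is to derive a guard budget from recent sense-inference latency. The class below is a sketch under stated assumptions: the EMA smoothing rule, the `margin` safety factor, and all names are illustrative, not the paper's exact estimator.

```python
import math

class GuardBudget:
    """Adaptive guard sizing sketch (illustrative, not LiveVLN's estimator):
    commit enough actions to cover one refresh, with a safety margin."""

    def __init__(self, alpha=0.3, action_period=0.5, margin=1.2):
        self.alpha = alpha                  # EMA smoothing factor
        self.action_period = action_period  # seconds per executed action (assumed)
        self.margin = margin                # safety factor over the latency estimate
        self.ema_latency = None

    def update(self, observed_latency):
        # Exponential moving average over recent sense+inference latency.
        if self.ema_latency is None:
            self.ema_latency = observed_latency
        else:
            self.ema_latency = ((1 - self.alpha) * self.ema_latency
                                + self.alpha * observed_latency)
        return self.budget()

    def budget(self):
        # Number of actions to commit so execution outlasts the next refresh.
        return max(1, math.ceil(self.margin * self.ema_latency / self.action_period))
```

With a 0.5 s action period and a 1 s latency estimate, this commits 3 actions; a latency spike automatically widens the committed prefix instead of stalling the robot.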
Across R2R, RxR, and real-robot streaming, LiveVLN keeps benchmark performance close to the original checkpoints while substantially improving continuity:
- StreamVLN
  - waiting time: 7.32s → 1.63s (77.7% ↓)
  - wall-clock episode time: 41.98s → 36.71s (12.6% ↓)
  - pause count: 6.75 → 0.80
- NaVIDA
  - waiting time: 10.64s → 2.89s (72.8% ↓)
  - wall-clock episode time: 34.90s → 28.06s (19.6% ↓)
  - pause count: 9.25 → 1.20
The setup is largely aligned with NaVIDA, and reusing the same environment is recommended.
- Python 3.10
- CUDA 11.8+
- Habitat-Sim 0.2.4
- Habitat-Lab 0.2.4
- PyTorch / Transformers / PEFT
- qwen_vl_utils / vllm / flash-attn
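One way to set this environment up, following the versions listed above. This is a sketch: the environment name, conda channels, and exact install path for Habitat-Lab may differ in your setup, so treat it as a starting point rather than the repository's official recipe.

```shell
# Illustrative environment setup (names and channels are assumptions)
conda create -n livevln python=3.10 -y
conda activate livevln

# Habitat-Sim 0.2.4 (headless, with Bullet physics)
conda install habitat-sim=0.2.4 withbullet headless -c conda-forge -c aihabitat -y

# Habitat-Lab 0.2.4 is typically installed from source at the matching tag
# (git clone, then `pip install -e habitat-lab`)

# Model-side dependencies
pip install torch transformers peft qwen-vl-utils vllm flash-attn
```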
Following NaVIDA, the expected layout is the standard Habitat VLN-CE structure:
```
data/
├── scene_datasets/
├── R2R_VLNCE_v1-3_preprocessed/
└── ...
```
Download the pretrained weights of StreamVLN and NaVIDA.
Useful entry points:
- `real_world_demo/agent_service_live.py`: local Flask action service
- `real_world_demo/real_world_vln_live.py`: RGB-D + robot control client
Update the endpoint and device settings for your own robot environment before deployment.
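For custom integrations, the sketch below shows one way to talk to an HTTP action service from the robot side. The route, payload fields, and response shape are illustrative assumptions, not the actual interface of `agent_service_live.py`; match them to the routes that script really exposes.

```python
import json
from urllib.request import Request, urlopen

def build_request(server_url, rgb_b64, depth_b64, instruction):
    # Hypothetical payload shape -- adjust the field names to the
    # routes actually exposed by agent_service_live.py.
    payload = json.dumps({
        "rgb": rgb_b64,            # base64-encoded RGB frame
        "depth": depth_b64,        # base64-encoded depth frame
        "instruction": instruction,
    }).encode("utf-8")
    return Request(server_url, data=payload,
                   headers={"Content-Type": "application/json"})

def query_action(req, timeout=10.0):
    # Blocking call; in the LiveVLN setting this runs on the background
    # refresh thread, so the robot keeps executing the committed prefix.
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```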
```shell
# For NaVIDA demo
cd real_world_demo
bash start_server.sh            # on server
python3 real_world_vln_live.py  # on robot
```

This repository builds on or reuses ideas/code from:


