Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments

This repository is the official implementation of Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments [Project Website].

In the Vision-and-Language Navigation (VLN) task, an embodied agent navigates a 3D environment by following natural language instructions. A challenge in this task is handling 'off the path' scenarios, where the agent veers from the reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not aligned with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of the language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.

Setup

We build on top of the VLN-CE codebase. Please follow the setup instructions and download the data as described in the VLN-CE codebase. Next, clone this repository and install the dependencies from requirements.txt:

git clone git@github.com:3dlg-hcvc/LAW-VLNCE.git
cd LAW-VLNCE
python -m pip install -r requirements.txt

Issues

If you run into an issue installing torch-scatter, use the following command, replacing {cuda-version} with your CUDA version and {torch-version} with your installed torch version: pip install torch-scatter==latest+{cuda-version} -f https://pytorch-geometric.com/whl/torch-{torch-version}.html

Refer to the torch-scatter documentation for details.
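
For example, assuming torch 1.6.0 built against CUDA 10.2 (the version strings below are illustrative; substitute your own):

pip install torch-scatter==latest+cu102 -f https://pytorch-geometric.com/whl/torch-1.6.0.html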

Usage

The run.py script handles training and evaluation for all model configurations. Specify a configuration file and a run type as follows:

python run.py \
  --exp-config path/to/experiment_config.yaml \
  --run-type {train | eval | inference}

Training

We follow a training regime similar to VLN-CE: we first train with teacher forcing on the augmented data and then fine-tune with DAgger on the original Room-to-Room data.

For our LAW pano model, we first train using the cma_pm_aug.yaml config. We then evaluate all the checkpoints and select the best-performing one based on the nDTW metric. This checkpoint is then fine-tuned using the cma_pm_da_aug_tune config, after updating its LOAD_FROM_CKPT and CKPT_TO_LOAD fields; see the sketch below.
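
A minimal sketch of the two-stage recipe, assuming cma_pm_da_aug_tune.yaml lives alongside cma_pm_aug.yaml in the law_pano_config folder (the .yaml extension and path for the tune config are assumptions here):

# Stage 1: teacher forcing on the augmented data
python run.py \
  --exp-config vlnce_baselines/config/paper_configs/law_pano_config/cma_pm_aug.yaml \
  --run-type train

# Stage 2: DAgger fine-tuning, after setting LOAD_FROM_CKPT and
# CKPT_TO_LOAD in the config to point at the best nDTW checkpoint
python run.py \
  --exp-config vlnce_baselines/config/paper_configs/law_pano_config/cma_pm_da_aug_tune.yaml \
  --run-type train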

Evaluation

The same config may be used for evaluating the models. EVAL_CKPT_PATH_DIR specifies the path to a checkpoint (or to a folder, for evaluating multiple checkpoints), STATS_EVAL_DIR specifies the folder where evaluation results are saved, EVAL.SPLIT specifies the dataset split (val_seen or val_unseen), and EVAL.EPISODE_COUNT specifies the number of episodes to evaluate.

python run.py \
  --exp-config vlnce_baselines/config/paper_configs/law_pano_config/cma_pm_aug.yaml \
  --run-type eval
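
If run.py accepts trailing config overrides (as habitat-baselines-style run scripts do; this is an assumption, not confirmed by this README), the evaluation fields can also be set on the command line instead of editing the YAML. The split and episode count below are illustrative:

python run.py \
  --exp-config vlnce_baselines/config/paper_configs/law_pano_config/cma_pm_aug.yaml \
  --run-type eval \
  EVAL.SPLIT val_unseen \
  EVAL.EPISODE_COUNT 100

Otherwise, set EVAL_CKPT_PATH_DIR, STATS_EVAL_DIR, EVAL.SPLIT, and EVAL.EPISODE_COUNT directly in the experiment config.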

Citing

If you use LAW-VLNCE in your research, please cite our paper.

Acknowledgements

We thank Jacob Krantz for the VLN-CE codebase, on which this repository is built.
