Visual Representations for Semantic Target Driven Navigation
Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, James Davidson
arXiv 2018
ArXiv: https://arxiv.org/abs/1805.06066
The code depends on the following packages:
networkx
gin-config
Get the code by cloning the TensorFlow models repository:
git clone --depth 1 https://github.com/tensorflow/models.git
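For example, the dependencies can be installed with pip and the cloned repository entered as follows (a sketch; the research/cognitive_planning path and the TensorFlow dependency are assumptions based on where this code is typically hosted, so adjust to your setup):
# Install Python dependencies (TensorFlow is assumed, since the code lives in tensorflow/models)
pip install tensorflow networkx gin-config
# The navigation code is assumed to live under research/cognitive_planning
cd models/research/cognitive_planning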
We used the Active Vision Dataset (AVD), which can be downloaded from here. To make our code faster and reduce its memory footprint, we created the AVD Minimal dataset. AVD Minimal consists of low-resolution images from the original AVD dataset. In addition, we added annotations for target views, object detections predicted by a detector pre-trained on MS-COCO, and semantic segmentations predicted by a model pre-trained on NYU-v2. AVD Minimal can be downloaded from here. Set $AVD_DIR to the path of the downloaded AVD Minimal.
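For instance, after downloading you might set things up as follows (a sketch; the archive name avd_minimal.zip and the target directory are hypothetical placeholders for your actual download):
# Extract AVD Minimal and point $AVD_DIR at it
unzip avd_minimal.zip -d $HOME/datasets/avd_minimal
export AVD_DIR=$HOME/datasets/avd_minimal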
If you wish to navigate the environment and see what the AVD looks like, you can use the following command:
python viz_active_vision_dataset_main.py \
--mode=human \
--gin_config=envs/configs/active_vision_config.gin \
--gin_params="ActiveVisionDatasetEnv.dataset_root=$AVD_DIR"
Right now, the released version only supports training and inference using real data from the Active Vision Dataset.
Use the following command for training:
# Train
python train_supervised_active_vision.py \
--mode='train' \
--logdir=$CHECKPOINT_DIR \
--modality_types='det' \
--batch_size=8 \
--train_iters=200000 \
--lstm_cell_size=2048 \
--policy_fc_size=2048 \
--sequence_length=20 \
--max_eval_episode_length=100 \
--test_iters=194 \
--gin_config=envs/configs/active_vision_config.gin \
--gin_params="ActiveVisionDatasetEnv.dataset_root=$AVD_DIR" \
--logtostderr
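Training progress can be monitored with TensorBoard, assuming the training job writes its summaries under the checkpoint directory (a sketch):
# Monitor training (assumes summaries are written to $CHECKPOINT_DIR)
tensorboard --logdir=$CHECKPOINT_DIR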
Use the following command to unroll the policy on the eval environments. The inference code periodically checks the checkpoint folder for new checkpoints and uses each one to unroll the policy on the eval environments. After each evaluation, it creates a folder $CHECKPOINT_DIR/evals/$ITER, where $ITER is the iteration number at which the checkpoint was stored.
# Eval
python train_supervised_active_vision.py \
--mode='eval' \
--logdir=$CHECKPOINT_DIR \
--modality_types='det' \
--batch_size=8 \
--train_iters=200000 \
--lstm_cell_size=2048 \
--policy_fc_size=2048 \
--sequence_length=20 \
--max_eval_episode_length=100 \
--test_iters=194 \
--gin_config=envs/configs/active_vision_config.gin \
--gin_params="ActiveVisionDatasetEnv.dataset_root=$AVD_DIR" \
--logtostderr
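Because the eval job polls the checkpoint directory, it is typically run alongside training. On a multi-GPU machine you might pin the two jobs to different devices, for example (a sketch; assumes at least two GPUs, with the remaining flags the same as in the commands above):
# Run training and evaluation concurrently on separate GPUs
CUDA_VISIBLE_DEVICES=0 python train_supervised_active_vision.py --mode='train' ... &
CUDA_VISIBLE_DEVICES=1 python train_supervised_active_vision.py --mode='eval' ...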
At any point, you can run the following command to compute statistics, such as success rate, over all the evaluations so far. It also generates GIF animations of the best policy's unrolled trajectories.
# Visualize and Compute Stats
python viz_active_vision_dataset_main.py \
--mode=eval \
--eval_folder=$CHECKPOINT_DIR/evals/ \
--output_folder=$OUTPUT_GIFS_FOLDER \
--gin_config=envs/configs/active_vision_config.gin \
--gin_params="ActiveVisionDatasetEnv.dataset_root=$AVD_DIR"
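The per-checkpoint evaluation results and the generated GIFs can then be inspected directly (a sketch; the folder names follow the $ITER convention described above):
# List evaluations and generated GIFs
ls $CHECKPOINT_DIR/evals/
ls $OUTPUT_GIFS_FOLDER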
If you find our work useful in your research, please consider citing our paper:
@article{MousavianArxiv18,
  author = {A. Mousavian and A. Toshev and M. Fiser and J. Kosecka and J. Davidson},
  title = {Visual Representations for Semantic Target Driven Navigation},
  journal = {arXiv preprint arXiv:1805.06066},
  year = {2018},
}