MOVILAN

Modular and simple vision language navigation framework

  1. Get familiar with the concepts (paper: https://arxiv.org/abs/2101.07891 ; explanation slides: https://github.com/Homagn/MOVILAN/blob/main/MOVILAN_detailed_explanation.pptx)

  2. Set up Docker on your system. If you do not have nvidia-docker, follow the instructions at https://github.com/Homagn/Dockerfiles/blob/main/Docker-knowhows/nvidia-docker-setup
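    (optional sanity check - a minimal sketch, assuming the nvidia/cuda:11.0-base image tag is still published on Docker Hub; if this prints your GPU details, nvidia-docker is working)

    sudo nvidia-docker run --rm nvidia/cuda:11.0-base nvidia-smi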

  3. Build the Docker image or pull the prebuilt one. To build it, download the source code from GitHub, navigate to the Dockerfile location in MOVILAN/, and run:

    sudo nvidia-docker build -t homagni/vision_language:latest .

    OR

    Pull the prebuilt Docker image:

    docker pull homagni/vision_language:latest
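    Either way, you can confirm the image is available locally with:

    docker images homagni/vision_language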

  4. Download the necessary model weights and data: go to the Google Drive folder https://drive.google.com/file/d/1Spz3o5wmYUIMyXsYl3tKYYTMapzkca1_/view?usp=sharing , download the zip file, and extract its contents into your MOVILAN/ source folder as follows (see the sketch after this list):

    alfred_model_1000_modification -> language_understanding/alfred_model_1000_modification

    data -> mapper/data

    nn_weights -> mapper/nn_weights

    unet_weights.pth -> cross_modal/unet_weights.pth

    prehash.npy -> cross_modal/prehash.npy

    descriptions.json -> cross_modal/data/descriptions.json
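    A sketch of the extraction, assuming the zip unpacks into a single folder containing the items listed above (MOVILAN_weights.zip and the temporary folder name are placeholders; adjust them if the archive layout differs):

    cd /home/homagni/Desktop/MOVILAN
    unzip ~/Downloads/MOVILAN_weights.zip -d /tmp/movilan_weights
    mv /tmp/movilan_weights/alfred_model_1000_modification language_understanding/
    mv /tmp/movilan_weights/data mapper/data
    mv /tmp/movilan_weights/nn_weights mapper/nn_weights
    mv /tmp/movilan_weights/unet_weights.pth cross_modal/unet_weights.pth
    mv /tmp/movilan_weights/prehash.npy cross_modal/prehash.npy
    mv /tmp/movilan_weights/descriptions.json cross_modal/data/descriptions.json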

  5. Run the docker instance

    (in a Linux terminal)

    xhost +

    (then, on a new line; NOTE: replace /home/homagni/Desktop/MOVILAN/ with the location where you downloaded the source code)

    sudo nvidia-docker run --rm -ti --mount type=bind,source=/home/homagni/Desktop/MOVILAN/,target=/ai2thor --net=host --ipc=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --env="QT_X11_NO_MITSHM=1" homagni/vision_language

    (you will now be inside the terminal of the Docker instance)

    (run the test code)

    cd /ai2thor

    python3 main_interactive.py

    (it should open an AI2-THOR instance and run our algorithm on an instruction from the ALFRED dataset)
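    (if no window appears, first confirm the display is forwarded into the container; these are generic shell checks, not project-specific commands)

    echo $DISPLAY

    ls /tmp/.X11-unix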

EXTRA NOTES:

In mapper/params.py you can set debug_viz = True or False, depending on whether you want to see the internal map state of the robot.
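A one-liner to flip the flag, assuming params.py contains a line of the form debug_viz = False (check the exact spelling in the file first):

sed -i 's/debug_viz *= *False/debug_viz = True/' mapper/params.py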

The code produces a lot of log output depicting the various stages of decision making. To capture it in a file, you can try:

python3 main_batchrun.py > SomeFile.txt
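Error messages go to stderr and are not captured by the redirection above; to collect both streams in one file:

python3 main_batchrun.py > SomeFile.txt 2>&1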

VIEWING EXPERT TRAJECTORIES

cd /ai2thor/robot/

(replace the room and task numbers as needed)

python3 master_execution.py --room 1 --task 1 --gendata
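A hypothetical loop for viewing several trajectories in one go (the room and task ranges below are made up; adjust them to your dataset):

for room in 1 2 3; do
  for task in 1 2 3; do
    python3 master_execution.py --room $room --task $task --gendata
  done
done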

DEBUGGING PIPELINE

For new rooms, objects may not be identifiable in a map. Go to /ai2thor/mapper/datagen.py and follow the instructions in the comments at the end of the file to generate and correct maps.

Go to /ai2thor/log_instructions.py to generate a list of the existing instructions in the dataset.
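Presumably this is run from the repository root like the other entry points (an assumption; the exact invocation is not documented):

cd /ai2thor
python3 log_instructions.py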

ERRORS

If the display does not open from the Docker instance (for example, when running Docker inside a Linux Azure VM), see mviereck/x11docker#186 and (probably the last instruction of) https://github.com/stas-pavlov/azure-glx-rendering

https://unix.stackexchange.com/questions/403424/x11-forwarding-from-a-docker-container-in-remote-server

Using the --privileged flag, as in https://answers.ros.org/question/301056/ros2-rviz-in-docker-container/, can make Gazebo work with a display from Docker on the Azure cloud (see the sketch below).
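A sketch of the run command from step 5 with the --privileged flag added per that workaround (all other options unchanged):

sudo nvidia-docker run --privileged --rm -ti --mount type=bind,source=/home/homagni/Desktop/MOVILAN/,target=/ai2thor --net=host --ipc=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --env="QT_X11_NO_MITSHM=1" homagni/vision_language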
