-
Get familiar with the concepts: (paper here: https://arxiv.org/abs/2101.07891) (explanation slides here: https://github.com/Homagn/MOVILAN/blob/main/MOVILAN_detailed_explanation.pptx)
-
First set up docker on your system. If you don't have nvidia-docker, follow the instructions here: https://github.com/Homagn/Dockerfiles/blob/main/Docker-knowhows/nvidia-docker-setup
-
Pull the necessary environment, or build it yourself. To build the docker image: (download the source code from GitHub) (navigate to the Dockerfile location in MOVILAN/)
sudo nvidia-docker build -t homagni/vision_language:latest .
OR
Pull the prebuilt docker image like this:
docker pull homagni/vision_language:latest
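(Optional sanity check: after building or pulling, the image should show up in your local image list)
docker images homagni/vision_language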
-
Download the necessary model weights and data: go to the Google Drive folder -> https://drive.google.com/file/d/1Spz3o5wmYUIMyXsYl3tKYYTMapzkca1_/view?usp=sharing , download the zip file, and then extract the contents into your source MOVILAN/ folder like this (a shell sketch of these steps follows the list):
alfred_model_1000_modification -> language_understanding/alfred_model_1000_modification
data -> mapper/data
nn_weights -> mapper/nn_weights
unet_weights.pth -> cross_modal/unet_weights.pth
prehash.npy -> cross_modal/prehash.npy
descriptions.json -> cross_modal/data/descriptions.json
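(A minimal shell sketch of the placement above, assuming you are in the MOVILAN/ source folder and the downloaded archive is named movilan_weights.zip; adjust the archive name and the temporary folder to match your actual download)
unzip movilan_weights.zip -d weights_tmp/
mv weights_tmp/alfred_model_1000_modification language_understanding/
mv weights_tmp/data mapper/
mv weights_tmp/nn_weights mapper/
mv weights_tmp/unet_weights.pth cross_modal/
mv weights_tmp/prehash.npy cross_modal/
mv weights_tmp/descriptions.json cross_modal/data/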
-
Run the docker instance
(in a Linux terminal)
xhost +
(then, on a new line)
(NOTE: replace /home/homagni/Desktop/MOVILAN/ with the location where you downloaded the source code)
sudo nvidia-docker run --rm -ti --mount type=bind,source=/home/homagni/Desktop/MOVILAN/,target=/ai2thor --net=host --ipc=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --env="QT_X11_NO_MITSHM=1" homagni/vision_language
(Now you'll be inside the terminal of the docker instance)
(run the test code)
cd /ai2thor
python3 main_interactive.py
(it should open up an ai2thor instance and run our algorithm on an instruction from the ALFRED dataset)
EXTRA NOTES:
In mapper/params.py you can set debug_viz = True or debug_viz = False depending on whether you want to see the internal map state of the robot
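(For example, from inside the container you can check the current value before changing it; editing mapper/params.py by hand in a text editor works just as well)
grep debug_viz /ai2thor/mapper/params.py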
The code produces a lot of log output depicting the various stages of decision making. To capture it in a log file, you can run
python3 main_batchrun.py > SomeFile.txt
VIEWING EXPERT TRAJECTORIES
cd /ai2thor/robot/
(replace the room number and task number as needed)
python3 master_execution.py --room 1 --task 1 --gendata
DEBUGGING THE PIPELINE
For new rooms, objects may not be identifiable in a map. Go to /ai2thor/mapper/datagen.py and follow the instructions in the comments at the end of the file to generate and correct maps.
Go to /ai2thor/log_instructions.py to generate a list of the existing instructions in the dataset.
(ERRORS?)
If the display is not opening up from the docker instance (for example, if you are using a Linux Azure VM and running docker inside it), see mviereck/x11docker#186 and (probably the last instruction of) https://github.com/stas-pavlov/azure-glx-rendering
Using the --privileged flag, as in https://answers.ros.org/question/301056/ros2-rviz-in-docker-container/, can make Gazebo work with the display from docker on an Azure cloud VM (see the example run command below).
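(For example, the same run command from above with --privileged added; this is only a sketch of that workaround, and whether it resolves the display issue depends on your setup. Adjust the source path as before.)
sudo nvidia-docker run --rm -ti --privileged --mount type=bind,source=/home/homagni/Desktop/MOVILAN/,target=/ai2thor --net=host --ipc=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --env="QT_X11_NO_MITSHM=1" homagni/vision_language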