`ex_enjoy_pretrained_agent.bash` hanging in fresh Docker image #90

playertr · 2021-08-31T08:04:20Z

Thank you for sharing this wonderful project! I'm looking at your Ignition integration to judge whether Ignition is a viable simulator for a supervised-learning grasping project, and I've enjoyed learning from your repo. I've run into an apparent bug, though:

When I run the script to enjoy the pretrained agent from a fresh Docker image, it hangs at `enjoy.py:101`, on the line `obs = env.reset()`. CPU usage stays high for at least five minutes, but I don't see anything other than the image below:

I expected to see "an agent trying to grasp one of four objects in a fully randomised novel environment". Is the apparent hanging expected behavior, for instance if large assets are being downloaded? Am I running the script correctly?

Thank you!

Terminal Command and Output

tim@tim-UBUNTU:~/Research/drl_grasping$ ./docker/run.bash andrejorsula/drl_grasping:latest ros2 run drl_grasping ex_enjoy_pretrained_agent.bash
Launching ign_moveit2 in background:
ros2 launch drl_grasping ign_moveit2.launch.py

Executing enjoy command:
ros2 run drl_grasping enjoy.py --env Grasp-OctreeWithColor-Gazebo-v0 --algo tqc --seed 69 --folder /root/drl_grasping/repos/drl_grasping/install/share/drl_grasping/pretrained_agents/Grasp-OctreeWithColor-Gazebo-v0/panda --env-kwargs robot_model:"panda" 

[INFO] [launch]: All log files can be found below /root/.ros/log/2021-08-31-00-56-19-357172-tim-UBUNTU-42
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [move_group-1]: process started with pid [59]
[INFO] [rviz2-2]: process started with pid [61]
[INFO] [parameter_bridge-3]: process started with pid [63]
[INFO] [parameter_bridge-4]: process started with pid [65]
[INFO] [parameter_bridge-5]: process started with pid [67]
[INFO] [parameter_bridge-6]: process started with pid [69]
[INFO] [parameter_bridge-7]: process started with pid [71]
[rviz2-2] QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
[move_group-1] Parsing robot urdf xml string.
[move_group-1] Link panda_link1 had 1 children
[move_group-1] Link panda_link2 had 1 children
[move_group-1] Link panda_link3 had 1 children
[move_group-1] Link panda_link4 had 1 children
[move_group-1] Link panda_link5 had 1 children
[move_group-1] Link panda_link6 had 1 children
[move_group-1] Link panda_link7 had 1 children
[move_group-1] Link panda_hand had 3 children
[move_group-1] Link panda_link8 had 0 children
[move_group-1] Link panda_leftfinger had 0 children
[move_group-1] Link panda_rightfinger had 0 children
[move_group-1] Starting planning scene monitors...
[move_group-1] Planning scene monitors started.
[move_group-1] Loading 'move_group/ApplyPlanningSceneService'...
[move_group-1] Loading 'move_group/ClearOctomapService'...
[move_group-1] Loading 'move_group/MoveGroupCartesianPathService'...
[move_group-1] Loading 'move_group/MoveGroupExecuteTrajectoryAction'...
[move_group-1] Loading 'move_group/MoveGroupGetPlanningSceneService'...
[move_group-1] Loading 'move_group/MoveGroupKinematicsService'...
[move_group-1] Loading 'move_group/MoveGroupMoveAction'...
[move_group-1] Loading 'move_group/MoveGroupPlanService'...
[move_group-1] Loading 'move_group/MoveGroupQueryPlannersService'...
[move_group-1] Loading 'move_group/MoveGroupStateValidationService'...
[move_group-1] 
[move_group-1] You can start planning now!
[move_group-1] 
[rviz2-2] Parsing robot urdf xml string.
[rviz2-2] Link panda_link1 had 1 children
[rviz2-2] Link panda_link2 had 1 children
[rviz2-2] Link panda_link3 had 1 children
[rviz2-2] Link panda_link4 had 1 children
[rviz2-2] Link panda_link5 had 1 children
[rviz2-2] Link panda_link6 had 1 children
[rviz2-2] Link panda_link7 had 1 children
[rviz2-2] Link panda_hand had 3 children
[rviz2-2] Link panda_link8 had 0 children
[rviz2-2] Link panda_leftfinger had 0 children
[rviz2-2] Link panda_rightfinger had 0 children
[rviz2-2] Warning: Invalid frame ID "world" passed to canTransform argument target_frame - frame does not exist
[rviz2-2]          at line 133 in /tmp/binarydeb/ros-foxy-tf2-0.13.10/src/buffer_core.cpp
[rviz2-2] Warning: Invalid frame ID "world" passed to canTransform argument target_frame - frame does not exist
[rviz2-2]          at line 133 in /tmp/binarydeb/ros-foxy-tf2-0.13.10/src/buffer_core.cpp
Loading latest experiment, id=0
[INFO] [1630396582.040055036] [ign_moveit2_py_0]: ign_moveit2_py initialised successfuly
Setting callback for signal SIGINT
Setting callback for signal SIGTERM
Setting callback for signal SIGABRT
Initialised OctreeCnnFeaturesExtractor with 679482 parameters
Initialised OctreeCnnFeaturesExtractor with 679482 parameters
Inserting robot
Warning [GenericJoint.hpp:1480] [GenericJoint::setRestPosition] Value of _q0 [0], is out of the limit range [-3.07178, -0.0698132] for index [0] of Joint [Joint].
Inserting camera

AndrejOrsula · 2021-08-31T08:43:09Z

Hello,

The output you provided is expected and looks complete. The Docker image already contains all the assets it needs to run the example, so it is not stuck downloading them. However, the hang that you are experiencing might occur while the models are being loaded into the simulation.

First, please make sure that your CUDA device works inside the container (nvidia-smi).
Then, could you please monitor your RAM and VRAM usage as well while running the example?
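The two checks above can be sketched as shell commands. This is only a minimal sketch: it assumes `nvidia-smi` is available inside the container and that `docker/run.bash` forwards the trailing command into the container, as in the other invocations in this thread. The helper name `vram_percent` is hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical helper: reads "used, total" lines (in MiB), as printed by
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# and prints the used VRAM percentage for each GPU as an integer.
vram_percent() {
    awk -F', *' '{ printf "%d\n", $1 * 100 / $2 }'
}

# Example usage (requires an NVIDIA GPU; not executed here):
#   ./docker/run.bash andrejorsula/drl_grasping:latest nvidia-smi
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_percent
#   free -h    # system RAM usage
```

Running the query in a second terminal while the example starts up shows whether VRAM is close to exhausted during model loading.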

If that seems fine, could you please try running a simpler world in Ignition? For example, check that the camera_sensor.sdf world shows all the models (including the cone), and that the camera sensor provides a video stream (start the simulation and press the refresh button next to the topic selector under Image display on the right).

./docker/run.bash andrejorsula/drl_grasping:latest ign gazebo camera_sensor.sdf

Also, just to confirm: are you using the pre-built image pulled from Docker Hub? Building it from scratch might cause unexpected problems due to updated dependencies.

docker pull andrejorsula/drl_grasping:latest

playertr · 2021-08-31T16:31:28Z

Thanks for the debugging tips!

nvidia-smi works inside the container. I have CUDA 11.4.
When I run the enjoyment script, my RAM and VRAM usage both increase but do not fill up entirely. Screenshots including htop and nvidia-smi are attached.
When I run the command you suggested, a black screen pops up. Eventually, a box pops up informing me that "ign-gazebo-gui" is not responding.

I suspect that somewhere along the chain, my NVIDIA driver -> X server -> Docker setup is misconfigured. Interestingly, glxgears works just fine inside the container. I tried to run the Ignition Gazebo container to debug, but I could not find an official image or build it from source.

Do you have any idea of next steps to find where the problem might lie?

Monitoring Statistics:

 ./docker/run.bash andrejorsula/drl_grasping:latest ros2 run drl_grasping ex_enjoy_pretrained_agent.bash

Simple world:

./docker/run.bash andrejorsula/drl_grasping:latest ign gazebo camera_sensor.sdf

playertr · 2021-08-31T17:39:19Z

Solved!

For some reason, I was running into Ignition Gazebo's ancient resolved Issue #38, which affects ign_transport.

The workaround was to pass a new environment variable to the Docker container by adding `-e IGN_IP=127.0.0.1` to `run.bash:53`. I could submit a PR containing that change, but I'm not convinced that the bug affects other users.
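For anyone who wants to try the same workaround without editing the script, here is a minimal sketch of the idea. The helper name `ign_ip_docker_flag` is hypothetical, and the real `docker/run.bash` passes additional options (GPU, display, and so on) that are omitted here.

```shell
#!/usr/bin/env sh
# Hypothetical helper: prints the docker flag that sets IGN_IP inside the
# container. The address defaults to the loopback value used in this thread.
ign_ip_docker_flag() {
    printf '%s %s' '-e' "IGN_IP=${1:-127.0.0.1}"
}

# Example usage (not executed here; the unquoted substitution is intentional
# so that the output splits into two arguments):
#   docker run $(ign_ip_docker_flag) andrejorsula/drl_grasping:latest ign gazebo camera_sensor.sdf
```

Setting `IGN_IP` tells Ignition Transport which address to use, so the GUI and the server processes inside the container can discover each other over loopback.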

With the change, ex_enjoy_pretrained_agent.bash executes and looks totally dope.

For posterity, these were the steps leading me to this solution:

1. Tried the suggested simple command, `ign gazebo camera_sensor.sdf`. It failed with no informative errors.
2. Increased verbosity with the flag `-v 4` and noticed that it was hanging on the message `[GUI] [Dbg] [Gui.cc:151] GUI requesting list of world names. The server may be busy downloading resources. Please be patient.`
3. Found a relevant forum post leading to the original issue, which describes the bug and the workaround.
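The verbose-run check above can be sketched as a small shell helper. This is a rough sketch under assumptions: the pattern is copied from the debug message quoted above, while the helper name and the log path are hypothetical. The message can also appear briefly during a normal start, so it only points to this problem if the GUI never gets past it.

```shell
#!/usr/bin/env sh
# Pattern taken from the debug message quoted in this thread.
HANG_PATTERN='GUI requesting list of world names'

# Hypothetical helper: returns 0 if the given log file contains the message.
log_shows_gui_hang() {
    grep -qF "$HANG_PATTERN" "$1"
}

# Example usage (run inside the container; not executed here):
#   timeout 60 ign gazebo -v 4 camera_sensor.sdf > /tmp/ign_gazebo.log 2>&1
#   log_shows_gui_hang /tmp/ign_gazebo.log && echo "GUI is still waiting for the server"
```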

AndrejOrsula · 2021-08-31T18:34:06Z

Interesting! I have not experienced a similar issue before, but I am happy that you were able to resolve it.

> I'm looking at your Ignition integration to judge whether Ignition is a viable simulator for a supervised-learning grasping project, ...

As a side note, I thought I would share a few remarks on this, as this project might not be fully representative of the current state of Ignition.

This project still uses Ignition Dome, so later version(s) might have some features that you could require, in addition to better performance.
Duration of loading a model is a bit misleading in the example above. Initial loading of each model will be quite slow because it goes through pre-processing first + loading of assets into RAM. Each subsequent loading should be almost instantaneous, as all assets are also kept in RAM**.
Since all models are kept in RAM**, RAM usage in this example might be much higher than what you would experience in your own project.
You might have already seen it, but there is a recent ML extension for Ignition that could be useful for you - https://community.gazebosim.org/t/gsoc-2021-machine-learning-extension-to-ignition-gazebo/1070. (I have not tried it myself)
If you plan to use ROS 2/MoveIt 2 for motion planning+control and you are not in a hurry, then waiting for https://github.com/ignitionrobotics/ign_ros2_control might be a better option over the current approach for controlling arms (e.g. https://github.com/AndrejOrsula/ign_moveit2 and https://community.gazebosim.org/t/community-meeting-ros-2-ignition-gazebo-august-2021/1058#mobile-manipulators-with-moveit-2-1).

** This might no longer be the case in future releases. The issue can be tracked under gazebosim/gz-gui#208, but there was a mention during one of the community meetings that it is already resolved (or partially resolved).

playertr closed this as completed Aug 31, 2021
