ex_enjoy_pretrained_agent.bash hanging in fresh Docker image #90

Closed
playertr opened this issue Aug 31, 2021 · 4 comments

playertr commented Aug 31, 2021

Thank you for sharing this wonderful project! I'm looking at your Ignition integration to judge whether Ignition is a viable simulator for a supervised-learning grasping project, and I've enjoyed learning from your repo. I've run into an apparent bug, though:

When I run the script to enjoy the pretrained agent from a fresh Docker image, it hangs at enjoy.py:101, on the line obs = env.reset(). CPU usage stays high for at least five minutes, but I don't see anything other than the image below:

Screenshot from 2021-08-31 00-47-06

I expected to see "an agent trying to grasp one of four objects in a fully randomised novel environment". Is the apparent hanging expected behavior, for instance if large assets are being downloaded? Am I running the script correctly?

Thank you!

Terminal Command and Output
tim@tim-UBUNTU:~/Research/drl_grasping$ ./docker/run.bash andrejorsula/drl_grasping:latest ros2 run drl_grasping ex_enjoy_pretrained_agent.bash
Launching ign_moveit2 in background:
ros2 launch drl_grasping ign_moveit2.launch.py

Executing enjoy command:
ros2 run drl_grasping enjoy.py --env Grasp-OctreeWithColor-Gazebo-v0 --algo tqc --seed 69 --folder /root/drl_grasping/repos/drl_grasping/install/share/drl_grasping/pretrained_agents/Grasp-OctreeWithColor-Gazebo-v0/panda --env-kwargs robot_model:"panda" 

[INFO] [launch]: All log files can be found below /root/.ros/log/2021-08-31-00-56-19-357172-tim-UBUNTU-42
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [move_group-1]: process started with pid [59]
[INFO] [rviz2-2]: process started with pid [61]
[INFO] [parameter_bridge-3]: process started with pid [63]
[INFO] [parameter_bridge-4]: process started with pid [65]
[INFO] [parameter_bridge-5]: process started with pid [67]
[INFO] [parameter_bridge-6]: process started with pid [69]
[INFO] [parameter_bridge-7]: process started with pid [71]
[rviz2-2] QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
[move_group-1] Parsing robot urdf xml string.
[move_group-1] Link panda_link1 had 1 children
[move_group-1] Link panda_link2 had 1 children
[move_group-1] Link panda_link3 had 1 children
[move_group-1] Link panda_link4 had 1 children
[move_group-1] Link panda_link5 had 1 children
[move_group-1] Link panda_link6 had 1 children
[move_group-1] Link panda_link7 had 1 children
[move_group-1] Link panda_hand had 3 children
[move_group-1] Link panda_link8 had 0 children
[move_group-1] Link panda_leftfinger had 0 children
[move_group-1] Link panda_rightfinger had 0 children
[move_group-1] Starting planning scene monitors...
[move_group-1] Planning scene monitors started.
[move_group-1] Loading 'move_group/ApplyPlanningSceneService'...
[move_group-1] Loading 'move_group/ClearOctomapService'...
[move_group-1] Loading 'move_group/MoveGroupCartesianPathService'...
[move_group-1] Loading 'move_group/MoveGroupExecuteTrajectoryAction'...
[move_group-1] Loading 'move_group/MoveGroupGetPlanningSceneService'...
[move_group-1] Loading 'move_group/MoveGroupKinematicsService'...
[move_group-1] Loading 'move_group/MoveGroupMoveAction'...
[move_group-1] Loading 'move_group/MoveGroupPlanService'...
[move_group-1] Loading 'move_group/MoveGroupQueryPlannersService'...
[move_group-1] Loading 'move_group/MoveGroupStateValidationService'...
[move_group-1] 
[move_group-1] You can start planning now!
[move_group-1] 
[rviz2-2] Parsing robot urdf xml string.
[rviz2-2] Link panda_link1 had 1 children
[rviz2-2] Link panda_link2 had 1 children
[rviz2-2] Link panda_link3 had 1 children
[rviz2-2] Link panda_link4 had 1 children
[rviz2-2] Link panda_link5 had 1 children
[rviz2-2] Link panda_link6 had 1 children
[rviz2-2] Link panda_link7 had 1 children
[rviz2-2] Link panda_hand had 3 children
[rviz2-2] Link panda_link8 had 0 children
[rviz2-2] Link panda_leftfinger had 0 children
[rviz2-2] Link panda_rightfinger had 0 children
[rviz2-2] Warning: Invalid frame ID "world" passed to canTransform argument target_frame - frame does not exist
[rviz2-2]          at line 133 in /tmp/binarydeb/ros-foxy-tf2-0.13.10/src/buffer_core.cpp
[rviz2-2] Warning: Invalid frame ID "world" passed to canTransform argument target_frame - frame does not exist
[rviz2-2]          at line 133 in /tmp/binarydeb/ros-foxy-tf2-0.13.10/src/buffer_core.cpp
Loading latest experiment, id=0
[INFO] [1630396582.040055036] [ign_moveit2_py_0]: ign_moveit2_py initialised successfuly
Setting callback for signal SIGINT
Setting callback for signal SIGTERM
Setting callback for signal SIGABRT
Initialised OctreeCnnFeaturesExtractor with 679482 parameters
Initialised OctreeCnnFeaturesExtractor with 679482 parameters
Inserting robot
Warning [GenericJoint.hpp:1480] [GenericJoint::setRestPosition] Value of _q0 [0], is out of the limit range [-3.07178, -0.0698132] for index [0] of Joint [Joint].
Inserting camera

@AndrejOrsula
Owner

Hello,

The output you provided is expected and looks complete. The Docker image already contains all the assets needed to run the example, so it is not stuck downloading them. However, the hang you are experiencing might occur while the models are being loaded into the simulation.

  • First, please make sure that your CUDA device works inside the container (e.g., by running nvidia-smi).
  • Then, could you please also monitor your RAM and VRAM usage while running the example?
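
The RAM/VRAM check suggested above can also be scripted. A minimal sketch, assuming a Linux host or container (for /proc/meminfo) and, for the VRAM part, an NVIDIA driver with nvidia-smi on the PATH; the helper names are hypothetical and not part of drl_grasping:

```python
import shutil
import subprocess
from typing import Dict, List, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def ram_used_fraction(meminfo: Dict[str, int]) -> float:
    """Fraction of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal."""
    total = meminfo["MemTotal"]
    return (total - meminfo["MemAvailable"]) / total


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' rows (MiB) from nvidia-smi's CSV query output, one row per GPU."""
    rows: List[Tuple[int, int]] = []
    for line in csv_text.strip().splitlines():
        used, total = (int(value) for value in line.split(","))
        rows.append((used, total))
    return rows


def sample_usage() -> None:
    """Print one RAM sample and, if nvidia-smi is available, one VRAM sample per GPU."""
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    print(f"RAM used: {ram_used_fraction(meminfo):.1%}")
    if shutil.which("nvidia-smi") is None:
        return
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for index, (used, total) in enumerate(parse_gpu_memory(out)):
        print(f"GPU {index} VRAM: {used} / {total} MiB")
```

Calling sample_usage() every few seconds while ex_enjoy_pretrained_agent.bash runs gives a rough trace comparable to watching htop and nvidia-smi side by side.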

If that seems fine, could you please try running a simpler world in Ignition? For example, check that the camera_sensor.sdf world shows all of its models (including the cone), and that the camera sensor provides a video stream (start the simulation and press the refresh button next to the topic selector under the Image display on the right).

./docker/run.bash andrejorsula/drl_grasping:latest ign gazebo camera_sensor.sdf

Also, just to confirm: are you using the pre-built image pulled from Docker Hub? Building it from scratch might cause unexpected problems due to updated dependencies.

docker pull andrejorsula/drl_grasping:latest

@playertr
Author

playertr commented Aug 31, 2021

Thanks for the debugging tips!

  • nvidia-smi works inside the container. I have CUDA 11.4.
  • When I run the enjoy script, my RAM and VRAM usage both increase but never fill up entirely. Screenshots of htop and nvidia-smi are attached below.
  • When I run the command you suggested, a black screen pops up. Eventually, a box pops up informing me that "ign-gazebo-gui" is not responding.

I suspect that somewhere along the chain (NVIDIA driver -> X server -> Docker), my setup is misconfigured. Interestingly, glxgears works just fine inside the container. I tried to run the Ignition Gazebo container to debug this, but I could neither find an official image nor build one from source.

Do you have any ideas for next steps to narrow down where the problem lies?

Monitoring Statistics:

./docker/run.bash andrejorsula/drl_grasping:latest ros2 run drl_grasping ex_enjoy_pretrained_agent.bash

Screenshot from 2021-08-31 07-55-38

Simple world:

./docker/run.bash andrejorsula/drl_grasping:latest ign gazebo camera_sensor.sdf

Screenshot from 2021-08-31 09-27-23

@playertr
Author

playertr commented Aug 31, 2021

Solved!

For some reason, I was running into Ignition Gazebo's old, long-resolved Issue #38, which affects ign_transport.

The workaround was to set a new environment variable in the Docker container by adding -e IGN_IP=127.0.0.1 to run.bash:53. I could submit a PR containing that change, but I'm not convinced that the bug affects other users.
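
For processes launched from Python rather than through run.bash, the same workaround amounts to passing IGN_IP=127.0.0.1 in the child's environment. A minimal sketch, assuming it runs inside the container with ign on the PATH; env_with_ign_ip and launch_world are hypothetical helpers, not part of drl_grasping:

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_with_ign_ip(base: Optional[Mapping[str, str]] = None,
                    ip: str = "127.0.0.1") -> Dict[str, str]:
    """Copy an environment mapping (default: os.environ) and set IGN_IP in the copy."""
    env = dict(os.environ if base is None else base)
    env["IGN_IP"] = ip  # ign-transport reads this to choose the interface it uses
    return env


def launch_world(world: str = "camera_sensor.sdf") -> subprocess.Popen:
    """Start `ign gazebo <world>` with IGN_IP pinned to the loopback address."""
    return subprocess.Popen(["ign", "gazebo", world], env=env_with_ign_ip())
```

Inside the container, launch_world() should behave like running IGN_IP=127.0.0.1 ign gazebo camera_sensor.sdf from a shell. It does not affect processes started by run.bash itself, which is why the fix described here sets the variable there.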

With the change, ex_enjoy_pretrained_agent.bash executes and looks totally dope.

Screenshot from 2021-08-31 10-37-31

For posterity, these were the steps leading me to this solution:

  1. Try out the suggested simple command, ign gazebo camera_sensor.sdf. It failed with no informative errors.
  2. Increase verbosity with the flag -v 4 and notice that it was hanging on the message [GUI] [Dbg] [Gui.cc:151] GUI requesting list of world names. The server may be busy downloading resources. Please be patient.
  3. Identify a relevant forum post leading to the original issue describing the bug and workaround.
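
Step 2 can be partly automated. A minimal sketch, assuming ign is on the PATH inside the container: run ign gazebo -v 4 for a bounded time and search the log for the debug message quoted above. STALL_MARKER is copied from that message; run_verbose, find_stall_line and the 60-second default are illustrative choices, not part of Ignition or drl_grasping:

```python
import os
import signal
import subprocess
from typing import Iterable, Optional

# Substring of the debug message that `ign gazebo -v 4` printed while stuck.
STALL_MARKER = "GUI requesting list of world names"


def find_stall_line(lines: Iterable[str]) -> Optional[str]:
    """Return the first log line containing the stall marker, or None."""
    for line in lines:
        if STALL_MARKER in line:
            return line.rstrip("\n")
    return None


def run_verbose(world: str = "camera_sensor.sdf", timeout_s: float = 60.0) -> str:
    """Run `ign gazebo -v 4 <world>` for at most timeout_s seconds and return its output."""
    proc = subprocess.Popen(
        ["ign", "gazebo", "-v", "4", world],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into one log
        text=True,
        start_new_session=True,  # own process group, so server and GUI children can be signalled together
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Still running after the deadline: ask the whole group to shut down first.
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            output, _ = proc.communicate(timeout=10)
        except subprocess.TimeoutExpired:
            # Shutdown did not finish in time: force-kill and collect what was printed.
            os.killpg(proc.pid, signal.SIGKILL)
            output, _ = proc.communicate()
    return output
```

For example, find_stall_line(run_verbose().splitlines()) returns the offending line when the GUI keeps waiting for the server. Because the child processes may buffer their output when it goes to a pipe, a missing match is not conclusive; running the command interactively, as in step 2, remains the direct check.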

@AndrejOrsula
Owner

Interesting! I have not experienced a similar issue before, but I am happy that you were able to resolve it.


I'm looking at your Ignition integration to judge whether Ignition is a viable simulator for a supervised-learning grasping project, ...

I thought I would add a couple of remarks on this as a side note, since this project might not be fully representative of the current state of Ignition.

** This might no longer be the case in future releases. The issue can be tracked under gazebosim/gz-gui#208, but there was a mention during one of the community meetings that it is already resolved (or partially resolved).
