deepbots-panda to deepworlds #33
Conversation
Hey @KelvinYang0320! First of all, thank you for this great addition to the repository. I would also like to commend you on the quality of the code, great job! One issue I noticed, and that seems to cause a significant slowdown in the training process, is that when the arm self-collides, the simulation slows down to a crawl. I am including some screenshots below. I guess that the robot mesh becomes red when self-colliding and the simulation struggles to move forward, sometimes even showing the physics step warning. I am running the simulation at as-fast-as-possible speed; it goes up to x15 when no self-collision is happening and slows down to x1.0 or less when self-colliding. Can you confirm that this is indeed happening on your system as well? One possible solution I can imagine is detecting self-collision and adding it as a termination (done) condition with a possible negative reward. It might greatly improve the speed of the training process. I can see that the Webots docs state that the simulation might slow down a lot, but I am not exactly sure if there's a way to detect it in code. I posted a question in the Webots Discord and will notify you here when I get an answer.
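The termination-and-penalty idea could look something like the sketch below. Note that the `self_collided` flag is a hypothetical input: Webots does not currently expose a self-collision signal to controller code (that is exactly the open question above), so this only shows how such a signal would plug into the reward and done logic, with an untuned illustrative penalty.

```python
# Hypothetical sketch: self-collision as a termination condition with a
# negative reward. `self_collided` is an assumed signal, not an existing
# Webots API; the penalty value is illustrative, not tuned.
SELF_COLLISION_PENALTY = -10.0

def get_reward(distance_to_goal, self_collided):
    """Penalize self-collision heavily; otherwise reward being close to the goal."""
    if self_collided:
        return SELF_COLLISION_PENALTY
    return -distance_to_goal

def is_done(self_collided, steps_taken, steps_per_episode):
    """End the episode on self-collision or when the step budget is exhausted."""
    return self_collided or steps_taken >= steps_per_episode
```

Ending the episode early this way would also skip the slow, self-colliding physics steps entirely, which is where the expected training speedup comes from.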
Hello @tsampazk! The gif below shows the same goal-reaching task that was solved in my IKPY-based RL env; this environment is without the self-collision and boundingObject.
I opened a feature request on the Webots repository, because it seems like a great addition for RL tasks. For this PR, maybe you can try removing the bounding box as you suggest, as it is not critical. We can re-address this in the future, in light of the feedback we get from the Webots developers.
Just also note that one main cause of this robot model's low simulation performance is the boundingObjects.
@all-contributors please add @KelvinYang0320 for code, doc, example, ideas
I've put up a pull request to add @KelvinYang0320! 🎉
@tsampazk
Yes, I guess it's OK for the purposes of this example! 😄 You can go ahead and remove it, and after some more thorough reviews it should be good to go.
@tsampazk Okay, I have removed it.
Since you have observed the results many more times and have more experience with the problem than me, if you think it is better this way, it looks good to me! As a minor note, I think we would prefer to keep the solved conditions and hyperparameters in a state where it is as easy as possible for someone to train the agent out of the box. For example, in the find target problem we provide, the training procedure is problematic and we are looking into fixing it in the future.
To be honest, I tried running the example on my system and the robot controller crashes silently after 50-100 episodes. Obviously this is not happening to you, so I am wondering which Webots version you are using? Moreover, I think we are both running the latest dev version of deepbots installed from here, correct?
@KelvinYang0320 Also, you can take a look at the suggestion posted here. Maybe it is something you can look into in the future if you find yourself working on a similar problem?
Yes, we are both running the latest dev version of deepbots with Webots R2021a, and I ran this example on Ubuntu 18.04 & Ubuntu 20.04. I have listed this information in the README.md.
I can remove 6 of the 9 goals to simplify this problem.
However, since the robot controls 7 motor positions (a huge action space), it still might not be solvable within 50-100 episodes.
Thank you very much for opening this issue on Webots for me. 😄
Oops, sorry, I didn't notice that part of the README! I am having the issue on Windows 10.
RAM usage seems stable, the GPU is barely used, and the rest of the system runs fine. Even disabling the SYNCHRONIZATION flag of the controller allows the rest of the simulation to continue running. One thing I noticed is that it happened during the handshaking process. I tried to reproduce it another time, but it reached 250 episodes and the crash hasn't happened again. Really weird indeed. @ManosMagnus, could you please do a test run and see if the issue appears on your system as well when you get a chance? If my colleague Manos can't reproduce it, I think we can blame it on my system.
Of course! Then we would have a pretty spectacular RL agent on our hands! 😛 You can keep it as it is, no problem at all. The fact that the provided code solves the problem is good enough. If you can find a way to make it faster, that would be great, but it is not a must, so don't worry about it.
You are welcome! Just food for thought for the future! 😁
I also tried to reproduce the issue you mentioned on my Windows 10 machine, but it reached over 500 episodes and the crash hasn't happened. 🤔
Thank you! I think I will just keep it as it is.
Hello @KelvinYang0320, sorry for the delay. I will provide a code review in the following days.
Looks good to me. I've run the example without any problem. I added some minor comments.
# agent.load_models()  # Load the pretrained model
episodeCount = 0
episodeLimit = 50000
Could this be a universal constant or an argument of the run function?
episodeCount = 0 and episodeLimit = 50000 are set as universal constants, but we can still change these constants when doing other tasks. Similar to these lines in the cartpole_discrete example.
I meant that this can be at the top of the file (below the imports, something like EPISODE_LIMIT) in order to be visible to the end user. Indeed this is as in cartpole; maybe we should change it there as well. Of course, it is really minor; if you feel it's better as it is, feel free to resolve the comment.
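As a rough sketch of that suggestion (the names are illustrative, not the ones in the PR), the tunable could sit just below the imports and double as a default argument:

```python
# Sketch: hoisting a tunable to a module-level constant below the imports,
# so it is immediately visible to the end user. Names are illustrative.
EPISODE_LIMIT = 50000  # maximum number of training episodes

def run(episode_limit=EPISODE_LIMIT):
    # Using the constant as a default keeps it overridable per call.
    episode_count = 0
    while episode_count < episode_limit:
        # ... one training episode would run here ...
        episode_count += 1
    return episode_count
```

A caller can then use `run()` for the full training budget or `run(episode_limit=100)` for a quick smoke test, without editing the file.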
[Four resolved review threads on examples/panda/panda_goal_reaching/controllers/robot_supervisor_manager/DDPG_runner.py, two marked outdated; content collapsed.]
# self.fc1.weight.data.uniform_(-f1, f1)
# self.fc1.bias.data.uniform_(-f1, f1)
Remove those lines?
Ok
[Resolved (outdated) review thread on examples/panda/panda_goal_reaching/controllers/robot_supervisor_manager/checkConvergence.py; content collapsed.]
self.setup_motors()

# Set up misc
self.stepsPerEpisode = 300  # How many steps to run each episode (changing this messes up the solved condition)
Argument or constant?
self.stepsPerEpisode = 300 is a constant. If the user is trying to reach a farther target, they can change this constant.
Same as the previous comment. Feel free to resolve the comment if need be.
self.stepsPerEpisode = 300  # How many steps to run each episode (changing this messes up the solved condition)
self.episodeScore = 0  # Score accumulated during an episode
self.episodeScoreList = []  # A list to save all the episode scores, used to check if the task is solved
self.motorVelocity = 10
Argument or constant?
Similar to these lines in the cartpole_discrete example.
I set these as constants, and users can modify these lines if they want.
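A minimal sketch of that convention, assuming a supervisor class like the one under review (the class body here is illustrative, not the actual PR code):

```python
# Illustrative sketch: keeping the tunables as plainly visible attributes set
# in __init__, as in the cartpole_discrete example referenced above.
class PandaRobotSupervisor:
    def __init__(self):
        self.stepsPerEpisode = 300  # Changing this affects the solved condition
        self.motorVelocity = 10     # Velocity used for motor position commands
        self.episodeScore = 0       # Score accumulated during an episode
        self.episodeScoreList = []  # All episode scores, used to check if solved

supervisor = PandaRobotSupervisor()  # users edit the values above to retune
```

Users who want a farther target or a different step budget edit these assignments directly, at the cost of them being less discoverable than module-level constants.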
[Resolved (outdated) review thread on examples/panda/panda_goal_reaching/controllers/robot_supervisor_manager/checkConvergence.py; content collapsed.]
@ManosMagnus Thank you for your comments!
LGTM, just two minor comments. Could you please squash the commits so the PR can get merged?
@ManosMagnus I have set these as constants. Do you think we should create a Python file, say constant.py, for these constants?
Thank you @KelvinYang0320! That's a good idea. However, I am afraid that it could be invisible to the end user. Do you think it is better to add the constants at the top of the file?
Sorry for the delay in responding.
Yes, that is a good idea. I will modify it.
It might be because of the cross imports. We can just keep it where it is. I think we can merge the PR for now. Please remove the constants from the manager and squash the commits so the PR can be merged. Thank you!
[Squashed commit messages: update readme; update readme; update readme; del break; Update README.md; Update README.md; refactoring; refactoring; refactoring; forget to save my models!; os; os; tmp/ddpg; save models every 200 episodes; del boundingObject; del boundingObject; fix solved(); SMA100->SMA500; fix maintainer comments; del #... in ddpg.py; add const; const: capital letters; update]
Ok, I have removed them and squashed all commits into one commit.
Thank you @KelvinYang0320 for contributing such a nice example!