
Extend deepbots to support stable-baselines and implement gym-style default environments #85

Closed
eakirtas opened this issue Mar 19, 2021 · 3 comments

@eakirtas
Member

OpenAI Gym provides several environments that demonstrate the capabilities of RL on different problems. The goal of deepbots is to demonstrate those capabilities in a 3D, high-fidelity simulator such as Webots, and in this way to bridge the gap between software-based RL problems and real-life scenarios. Along the way, different environments would make a perfect test bed for researchers. A great strength of OpenAI Gym is that it includes several easy-to-use examples. The goal of this project is to implement some of the existing OpenAI Gym examples using deepbots in the Webots simulator.
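For reference, such environments expose the familiar gym-style `reset`/`step` interface. Below is a minimal, stdlib-only sketch of that contract; the class name, observation layout, dynamics, and reward are purely illustrative, not actual deepbots or Gym code:

```python
import random

class CartPoleLikeEnv:
    """Illustrative gym-style environment skeleton (not real deepbots code)."""

    def reset(self):
        # Return the initial observation:
        # [cart position, cart velocity, pole angle, pole angular velocity]
        self.state = [0.0, 0.0, random.uniform(-0.05, 0.05), 0.0]
        self.steps = 0
        return self.state

    def step(self, action):
        # action: 0 pushes the cart left, 1 pushes it right
        push = -1.0 if action == 0 else 1.0
        self.state[1] += 0.1 * push     # crude velocity update
        self.state[0] += self.state[1]  # integrate position
        self.steps += 1
        done = abs(self.state[0]) > 2.4 or self.steps >= 200
        reward = 1.0                    # +1 per step survived
        return self.state, reward, done, {}

env = CartPoleLikeEnv()
obs = env.reset()
obs, reward, done, info = env.step(1)
```

In a real deepbots environment, `reset` and `step` would communicate with the Webots supervisor instead of updating a toy state vector, but the calling convention an agent sees is the same.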

For example, some OpenAI Gym environments have already been implemented in the deepworlds repository:

  • CartPole-v1 has already been implemented using a custom robot, with the same objectives as the original.
  • MountainCar-v0 has been implemented using the BB-8 robot, with the same objectives as the original.

Several other OpenAI Gym environments could be replicated in Webots:

  • MountainCarContinuous-v0 is similar to the MountainCar-v0 problem but with a continuous action space.
  • CarRacing-v0 features a car-like robot that tries to remain on a road while achieving the best possible score. A similar objective can be replicated in Webots using any existing robot (such as Boe-Bot, Elisa, e-puck, Khepera IV, etc.) or even a car-like robot (such as ALTINO).
  • BipedalWalker-v2 is a 'two-legged' robot that tries to 'walk' along a straight line. A similar environment can be replicated in Webots using the existing 'two-legged' robots (such as Atlas, HOAP-2, KHR-3HV, etc.).
  • BipedalWalkerHardcore-v2 is a more challenging version of BipedalWalker-v2, in which the robot not only walks along a straight line but also has to overcome the obstacles it finds in its way.
  • LunarLander-v2 is a drone-like robot that tries to land on the ground. This can be perfectly replicated in Webots using the Mavic 2 PRO.
  • LunarLanderContinuous-v2 is the same problem as LunarLander-v2 but with a continuous action space.
  • Ant-v2 is a four-legged creature that walks forward as fast as possible. Several four-legged robots are included in Webots (such as Aibo ERS7, bioloid, GhostDog, etc.).
  • Robotic-arm problems such as FetchPickAndPlace-v1, FetchPush-v1, FetchReach-v1 and FetchSlide-v1 can be integrated into deepworlds using Webots robots such as IPR, IRB 4600/40, P-Rob 3, etc.
  • MultiCartPole is similar to CartPole but with more than one cart trying to stabilize two (or more) linked poles.

Of course, Webots includes a variety of robots that can potentially be used on a wide range of problems. Any ideas are more than welcome. However, we recommend starting with the OpenAI Gym environments, since they are already known to the community and can be solved without much research.

Contributors interested in this project are highly recommended to use the stable-baselines algorithms, which are well-established algorithms for Reinforcement Learning problems. Stable-baselines has been supported by the deepbots framework since version v0.1.3-dev2.

Finally, a great enhancement to the deepworlds repository would be a mechanism to run those environments easily and out of the box: a well-established infrastructure that lets users install each environment separately and run it with minimal setup. This feature would bring deepbots closer to the OpenAI Gym toolkit. Reference issue#7
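One possible shape for such a mechanism is a small registry mapping environment ids to factory functions, loosely mirroring gym's `register`/`make` pattern. The sketch below uses only the standard library; all names (`register`, `make`, the stand-in environment class) are hypothetical and not existing deepworlds code:

```python
# Hypothetical environment registry, loosely mirroring gym's register/make.
_REGISTRY = {}

def register(env_id, factory):
    """Associate an environment id with a zero-argument factory."""
    if env_id in _REGISTRY:
        raise ValueError(f"{env_id} is already registered")
    _REGISTRY[env_id] = factory

def make(env_id):
    """Instantiate a registered environment by id."""
    try:
        return _REGISTRY[env_id]()
    except KeyError:
        raise KeyError(f"Unknown environment: {env_id}") from None

# Example: registering a stand-in environment class.
class DummyCartPole:
    def reset(self):
        return [0.0, 0.0, 0.0, 0.0]

register("CartPole-deepbots-v0", DummyCartPole)
env = make("CartPole-deepbots-v0")
```

Each deepworlds environment package could call `register` on import, so users would only need `make("...")` to get a runnable environment, much like `gym.make`.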

Regarding the GSoC proposals: we do not expect you to include all of the environments recommended above, but rather those that interest you most (2-3 environments would be more than great) and that fit your timeline. At the end of the program we expect to deliver some of those environments along with the setup tool.

Feel free to post your ideas, thoughts or any disagreements.

@sanketsans

Hi @ManosMagnus, I am interested in working on this problem for this GSoC.
Contribution

  • Develop RL-based environments based on the OpenAI Gym scenarios ( https://gym.openai.com/envs/#classic_control ). I also want to develop other testbeds that differ from the gym-based environments.

  • The environments should be enhanced to support the following categories:

    ○ continuous state space, discrete action space
    ○ continuous state and action spaces
    ○ discrete state and action spaces

  • Implement and integrate DQN, CEM and REINFORCE algorithms in the codebase. Currently, deepbots supports DDPG and PPO, which are based only on continuous state-action pairs. Other algorithms supporting an enhanced environment catalog will help beginners and researchers better understand RL algorithms and how they work on different sets of state-action pairs.
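To illustrate the discrete-action side of this: the action-selection step at the heart of DQN is epsilon-greedy over estimated Q-values. A stdlib-only sketch (the function name and the Q-value list are illustrative, not deepbots code):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of estimated action values for the current state.
    """
    if rng.random() < epsilon:
        # explore: uniformly random action index
        return rng.randrange(len(q_values))
    # exploit: index of the highest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)

# With epsilon=0 the choice is purely greedy:
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

DDPG and PPO output continuous actions directly, which is why adding discrete-action algorithms like DQN goes hand in hand with adding discrete-action-space environments.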

@tsampazk
Member

tsampazk commented Dec 5, 2021

Assigned to @NickKok

@SidharajYadav

How can I contribute?
Please guide me.
