
pybullet gym copy-safe #2536

Closed
matthieu637 opened this issue Dec 11, 2019 · 5 comments

Comments

@matthieu637

Hello,
I'm trying to sample several next observations from the current state. To do so, I usually use copy.deepcopy(a_gym_environment).
However, this doesn't work with pybullet; I observed that it is not copy-safe.

Example where the 2 prints should return the same output:

import pybullet_envs
import gym
import copy
import numpy as np

ac=[ 0, 0, 0, 0, 0, 0]
np.random.seed(0)

e1=gym.make("HalfCheetahBulletEnv-v0")
e1.seed(0)
e1.reset()

e2=copy.deepcopy(e1)
#e2=gym.make("HalfCheetahBulletEnv-v0")
print(e1.step(ac))
e2.seed(0)
e2.reset()
print(e2.step(ac))

It displays two different outputs, which means that e1 and e2 interfere with each other.

I guess it's because of the client-server architecture of pybullet. I tried to implement a __deepcopy__ method inside BulletClient, but it isn't working so far.

Any hints?

@erwincoumans
Member

erwincoumans commented Dec 11, 2019

PyBullet is a C plugin, so deepcopy cannot be used.

  1. Each copy would need its own unique copy of the simulation. Manually copy the state from one env to the other: use saveState to write to disk and restoreState to read it back.

  2. If you reuse PyBullet instances (which deepcopy effectively does, since it cannot copy the full C state), then manually use pybullet.saveState and restoreState.

Both require serious work, I suspect.
It seems easier to create multiple separate envs and copy the state of one env to the other(s).
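
For what it's worth, the reuse-one-instance idea in point 2 can be illustrated without pybullet at all. Below is a pure-Python toy sketch (FakeClient, FakeEnv, save_state, and restore_state are made-up names standing in for the BulletClient and pybullet.saveState/restoreState) showing why a "copy" that shares the C backend interferes with the original, and how snapshot/restore still makes rollouts reproducible:

```python
class FakeClient:
    """Toy stand-in for pybullet's C-side simulation state (hypothetical)."""
    def __init__(self):
        self.t = 0          # pretend this counter is the full physics state

    def step(self):
        self.t += 1
        return self.t

class FakeEnv:
    """Wraps a client the way a gym env wraps a BulletClient."""
    def __init__(self, client):
        self._p = client

    def step(self):
        return self._p.step()

    def save_state(self):
        # analogue of pybullet.saveState(): snapshot the backend state
        return self._p.t

    def restore_state(self, snapshot):
        # analogue of pybullet.restoreState()
        self._p.t = snapshot

# deepcopy cannot duplicate the C-side state, so a "copied" env ends up
# talking to the same backend as the original:
shared = FakeClient()
e1, e2 = FakeEnv(shared), FakeEnv(shared)
out1 = e1.step()   # advances the shared backend
out2 = e2.step()   # advances it again -- the two envs interfere

# save/restore makes a rollout reproducible on the single shared backend:
snapshot = e1.save_state()
out3 = e1.step()
e1.restore_state(snapshot)
out4 = e1.step()   # the same transition, replayed
print(out1, out2, out3, out4)
```

With a real BulletClient the snapshot is the simulation state rather than an integer, but the control flow of the workaround is the same.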

@floringogianu

@matthieu637 I just started working on support for this here: benelot/pybullet-gym#42, take a look for a discussion of how other envs are doing it.

Here's an example using the envs in pybullet directly; sorry if it's a little messy. It seems to be doing the right thing, but I'm not sure that's really the case. Maybe @erwincoumans could confirm it?

import time
import numpy as np
import multiprocessing as mp

from pybullet_envs.gym_locomotion_envs import *
from pybullet_envs.gym_manipulator_envs import *
from pybullet_envs.bullet.racecarZEDGymEnv import RacecarZEDGymEnv
np.set_printoptions(precision=4, suppress=True)


ENVS = {
    "Ant": AntBulletEnv,
    "Reacher": ReacherBulletEnv,
    "Pusher": PusherBulletEnv
}


def mc_rollout(env_name, state_path, crt_step):
    np.set_printoptions(precision=4, suppress=True)

    env = ENVS[env_name](render=False)

    obs = env.reset()
    env._p.restoreState(fileName=state_path)

    print(f"Loaded state from {state_path}.")

    Gt, step, done, first_action = 0, 0, False, None
    while not done:
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)

        Gt += reward
        step += 1

        if step % 100 == 0:
            print(f"Rollout: did {step} steps.")

        if step == 1:
            print(f"\nState #{step} in rollout:\n", obs, "\n")
            first_action = action

        if step == (1000 - crt_step):
            break

    print(f"\nRollout done after {step} steps, return={Gt:3.2f}.")
    return Gt, step, first_action


def main():
    pool = mp.Pool(processes=1)
    
    env_name = "Reacher"
    env = ENVS[env_name](render=False)
    print(f"\nStarting {env_name} environment.\n")

    obs, done, roll_act = env.reset(), False, None
    Gt, step = 0, 0
    while not done:
        if step % 100 == 0:
            print(f"Main: did {step} steps.")

        action = env.action_space.sample() if roll_act is None else roll_act
        obs, reward, done, _ = env.step(action)

        if step == 500:

            state_path = "/run/shm/state.bullet"
            env._p.saveBullet(state_path)

            # start a monte-carlo rollout
            task = pool.starmap_async(mc_rollout, [(env_name, state_path, step)])
            # and wait for it to finish
            Gmc, Hmc, roll_act = task.get()[0]
            print(f"Rollout returned after steps={Hmc}, return={Gmc:3.2f}.\n")
        
        if step == 501:
            print(f"State #{step} in main:\n", obs, "\n")

        if step == 1000:
            break

        Gt += reward
        step += 1

    print(f"Done after {step} steps, return={Gt:3.2f}.")


if __name__ == "__main__":
    main()

@erwincoumans
Member

Thanks. Benelot's environments are not the same as pybullet_envs, the ones that ship with pybullet (in case you run pip3 install pybullet). So it would still be good to have those improvements in this repository.

Note that bullet_client is now in pybullet_utils, so the envs you are working on in Benelot's repo need to make this change (i.e., not using from pybullet_envs.bullet import bullet_client).

Note that stable baselines uses the environments in this repo. You can also train them in colab:
https://colab.sandbox.google.com/drive/15JSROMJbeiqxcUwifPR2NYeeFBKmyIlX#scrollTo=E2eWDjPZsQc5

See https://github.com/hill-a/stable-baselines and https://github.com/araffin/rl-baselines-zoo

@floringogianu

Thank you, I'll look at the resources you pointed to.

Just to make sure: the example I gave in the code snippet above uses pybullet_envs, the ones in this repo. All I am doing is using env._p (the pybullet BulletClient) for saving and reloading.

@erwincoumans
Member

@floringogianu Thanks, at a glance it looks good.
Closing this, since I don't expect further contributions in this area.
Thanks!
