
Save/Resume support #42

Closed
wants to merge 4 commits into from

Conversation

@floringogianu floringogianu commented Jan 28, 2020

Started work on #40.

Because saving and loading with pybullet.saveState() does not work across Python processes (which don't share memory), I decided to use the pybullet.saveBullet() API, which writes the state of the environment to disk. This could potentially be expensive, so I added an example in which (on Linux at least) we can save the states to /run/shm/, a RAM-backed tmpfs mount on Fedora- and Ubuntu-based distributions.

state_path = "/run/shm/state.bullet"  # or somewhere on the disk

env.save_env_state(state_path)
...
env = gym.make(...)
env.reset()  # this is necessary before restoring
env.restore_env_state(state_path)
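To avoid hard-coding the tmpfs path, the snapshot location could be picked at runtime. A minimal sketch (the helper name and the fallback behavior are my own, not part of pybullet-gym):

```python
import os
import tempfile

def bullet_state_path(filename="state.bullet"):
    """Pick a RAM-backed directory for the .bullet snapshot when available.

    /run/shm (or /dev/shm) is a tmpfs mount on most Linux distributions,
    so writes never touch the disk; fall back to the regular temp
    directory elsewhere.
    """
    for shm_dir in ("/run/shm", "/dev/shm"):
        if os.path.isdir(shm_dir) and os.access(shm_dir, os.W_OK):
            return os.path.join(shm_dir, filename)
    return os.path.join(tempfile.gettempdir(), filename)
```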

I think ideally pybullet would return the serialized state so we can pass it around among python processes, but I looked at the C++ backend and it doesn't look trivial to implement.

Compatibility with OpenAI/gym:

The contributors over at OpenAI/gym haven't decided on an API for saving and resuming state: openai/gym#402.

ALE implements it like this:

snapshot = env.ale.cloneState()
...
env.ale.restoreState(snapshot)

The recommended way for MuJoCo is:

saved_state = env.sim.get_state()
...
env.sim.set_state(saved_state)

Or, if the env is entirely written in Python, you can just deepcopy it. Other envs, such as those based on Box2D, simply don't support this.
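For the pure-Python case, deepcopy really is the whole story. A toy illustration (ToyEnv is a stand-in I made up, not a gym environment):

```python
import copy

class ToyEnv:
    """Stand-in for an environment implemented entirely in Python."""
    def __init__(self):
        self.t = 0

    def step(self):
        self.t += 1
        return self.t

env = ToyEnv()
env.step()                      # t == 1
snapshot = copy.deepcopy(env)   # full state capture, no disk involved
env.step()                      # t == 2
env = snapshot                  # roll back to the snapshot
assert env.step() == 2          # replays the same transition
```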

Eventually we will need to get close to the API they'll decide on.

@benelot (Owner) commented Jan 28, 2020

As far as I know, the reset mechanism in pybullet is based on saving the state and loading it again. I would suggest looking into this, as I believe it is not disk-based.

@floringogianu (Author) replied:
Yes, I first tried to work with the same mechanism used in env.reset(), as mentioned above, but it does not work between processes. If you have any pointers on how pybullet.saveState() can be made to work across Python processes, beyond how it is used in env.reset() and in this example here, let me know.

@mantle2048 commented Apr 21, 2021

Hi, thanks for your great work on saving and restoring state in pybullet-gym; it helped me a lot.
However, the following test shows there is a problem in the implementation.

import gym
import pybulletgym


if __name__ == '__main__':
    env = gym.make("HalfCheetahMuJoCoEnv-v0")
    env.seed(0)
    env.action_space.seed(0)
    obs, done = env.reset(), False
    for i in range(200):
        nx_obs, rew, done, _ = env.step(env.action_space.sample())

    env.save_env_state('state.bullet')
    act = env.action_space.sample()
    state1 = env.step(act)

    env.restore_env_state('state.bullet')
    state2 = env.step(act)
    print(state1)
    print(state2)
#  the output of the test
(array([-4.1603717e-01,  1.4024903e-01,  3.2494479e-01,  3.7975675e-01,
        5.4050684e-01,  8.0007428e-01, -1.2108239e+00, -1.9657683e+00,
       -4.3360400e-01, -3.7912399e-01,  5.7443666e-01, -3.6629651e+00,
        2.3992314e+00,  1.3094640e+01, -4.5016697e-03, -4.8106198e+00,
        1.6316326e+01], dtype=float32), -0.6035058938981799, False, {})
(array([-4.1603717e-01,  1.4024903e-01,  3.2494479e-01,  3.7975675e-01,
        5.4050684e-01,  8.0007428e-01, -1.2108239e+00, -1.9657683e+00,
       -4.3360400e-01, -3.7912399e-01,  5.7443666e-01, -3.6629651e+00,
        2.3992314e+00,  1.3094640e+01, -4.5016697e-03, -4.8106198e+00,
        1.6316326e+01], dtype=float32), -0.22421693801879883, False, {})

As expected, state1 and state2 should be exactly the same, but the two env.step(act) calls produce different rewards. After debugging, I found that pybullet.saveBullet() and pybullet.saveState() only save the state of the simulator, while the reward is computed in Python code.

Specifically, take the env HalfCheetahMuJoCoEnv-v0 as an example: self.robot represents the HalfCheetah robot in the env.

In HalfCheetah, self.pos_after records the current position of the robot, and calc_potential both updates self.pos_after and computes the potential, which is part of the reward.

After restoring the saved state, the simulator's state did change, but self.pos_after didn't, so the potential reward in the first env.step() after restoring will be zero.

Therefore, I think we should save pos_after of the current state in save_env_state, and then set pos_after back to the correct value in restore_env_state.

The specific implementation is:

def save_env_state(self, file_path):
    self.prev_pos = self.robot.pos_after
    self._p.saveBullet(file_path)

def restore_env_state(self, file_path):
    self.robot.pos_after = self.prev_pos
    self._p.restoreState(fileName=file_path)
    try:
        os.remove(file_path)  # requires `import os`
    except FileNotFoundError:
        pass
#  After modification, the output of the test
(array([-0.406764  ,  0.13484241,  0.36529362,  0.31630808,  0.58991945,
        0.8009333 , -1.1393532 , -2.4010131 , -0.4189203 , -0.22211011,
        0.41537938, -3.5352209 ,  1.9927833 , 14.179333  , -0.05656279,
       -3.5729892 , 14.309292  ], dtype=float32), -0.5812075076813373, False, {})
(array([-0.406764  ,  0.13484241,  0.36529362,  0.31630808,  0.58991945,
        0.8009333 , -1.1393532 , -2.4010131 , -0.4189203 , -0.22211011,
        0.41537938, -3.5352209 ,  1.9927833 , 14.179333  , -0.05656279,
       -3.5729892 , 14.309292  ], dtype=float32), -0.5812075076813373, False, {})

Now state1 and state2 are exactly the same.
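Building on this, one generic way to keep any Python-side bookkeeping in sync with the bullet snapshot is to write a pickle sidecar next to the .bullet file. This is only a sketch of a possible convention (save_full_state, restore_full_state, the attrs list, and the .pkl suffix are my own names, not pybullet-gym API):

```python
import pickle

def save_full_state(env, file_path, attrs=("pos_after",)):
    """Save the bullet snapshot plus the listed Python-side robot
    attributes in a pickle sidecar next to it."""
    env._p.saveBullet(file_path)
    side = {a: getattr(env.robot, a) for a in attrs if hasattr(env.robot, a)}
    with open(file_path + ".pkl", "wb") as f:
        pickle.dump(side, f)

def restore_full_state(env, file_path):
    """Restore the simulator state, then put the saved Python-side
    attributes back on the robot."""
    env._p.restoreState(fileName=file_path)
    with open(file_path + ".pkl", "rb") as f:
        for name, value in pickle.load(f).items():
            setattr(env.robot, name, value)
```

This keeps the per-env fix above working while making it easy to cover other envs that track extra Python-side state.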

@floringogianu floringogianu closed this by deleting the head repository Oct 30, 2023