[Question] influence of buffer size when using vecenv and save customized replay buffer #1885
Yes. See stable-baselines3/stable_baselines3/common/buffers.py, Lines 196 to 197 in 5623d98: the buffer is allocated with shape `(buffer_size // n_envs, n_envs, ...)`, i.e. each slot along the first axis stores one transition per environment. So at the end, the total number of transitions stored is the same, and you should not need to do anything. About the size of the replay buffer in general:
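To illustrate the point above, here is a minimal sketch (toy numbers and a bare NumPy array, not SB3's actual code) showing that splitting the capacity across environments leaves the total number of storable transitions unchanged:

```python
import numpy as np

# Toy sketch: a replay buffer that splits its capacity across n_envs
# parallel environments. Each slot along the first axis holds one
# transition per environment, so total capacity stays ~buffer_size.
buffer_size = 1_000_000
obs_dim = 3  # arbitrary toy observation dimension

for n_envs in (1, 4, 8):
    per_env = max(buffer_size // n_envs, 1)
    observations = np.zeros((per_env, n_envs, obs_dim), dtype=np.float32)
    total_transitions = observations.shape[0] * observations.shape[1]
    print(n_envs, per_env, total_transitions)
```

With `buffer_size` divisible by `n_envs`, `total_transitions` is the same (1,000,000) in every case; only the per-env slice shrinks.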
Not sure how custom you want it to be.
Many thanks.
My task is about moving a robot from one random state to another random state, so it might need as much recent data as possible. I tested the buffer size by printing it where transitions are added:

```python
print(replay_buffer.size())
replay_buffer.add(
    self._last_original_obs,  # type: ignore[arg-type]
    next_obs,  # type: ignore[arg-type]
    buffer_action,
    reward_,
    dones,
    infos,
)
```

The output shows the buffer fills up quickly. So for such a randomized task, should I increase the buffer size significantly? Or is RL perhaps not well suited to such a task?
❓ Question
My first question: should I resize the default `buffer_size` when using an off-policy algorithm with a vectorized env? I noticed that the default `buffer_size` of SAC is 1e6 for env_num=1. However, for a vecenv, the buffer size for each env is `buffer_size / num_env`, which means much less replay buffer per env when I'm using `make_vec_env`.

For SAC, the replay buffer contains transitions $(s_t, a_t, s_{t+1}, r_t)$ collected by different policies; with the help of maximum entropy it can avoid getting stuck in a local minimum, i.e. it keeps exploring. So will a smaller per-env buffer size cause bad behavior? And is there a limit on the maximum `buffer_size` per env?

My second question is close to issue 278, which explains how to add data to a customized `replay_buffer`. But now I want to create a customized replay buffer and then save it. Here is my code: is this way recommended, or is there another way to do this (create a customized replay buffer, then save it)?
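On saving a customized replay buffer: a custom buffer is ultimately just a Python object, so a plain pickle round-trip works as a minimal sketch (the `TinyBuffer` class below is a toy stand-in, not SB3's `ReplayBuffer`):

```python
import io
import pickle

# Toy stand-in for a custom replay buffer: any picklable object can be
# persisted and restored. SB3's off-policy algorithms also expose
# model.save_replay_buffer(path) / model.load_replay_buffer(path).
class TinyBuffer:
    def __init__(self):
        self.transitions = []

    def add(self, obs, next_obs, action, reward, done):
        self.transitions.append((obs, next_obs, action, reward, done))

buf = TinyBuffer()
buf.add(0.0, 1.0, 0, 1.0, False)

# Round-trip through an in-memory byte stream (a file path works the same).
blob = io.BytesIO()
pickle.dump(buf, blob)
blob.seek(0)
restored = pickle.load(blob)
print(len(restored.transitions))  # → 1
```

If the custom buffer subclasses SB3's `ReplayBuffer` and is attached to the model, the built-in `model.save_replay_buffer(path)` and `model.load_replay_buffer(path)` are the simpler route, since they handle the pickling for you.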