```python
def normalize_reward(self, reward: np.ndarray) -> np.ndarray:
    """
    Normalize rewards using this VecNormalize's rewards statistics.
    Calling this method does not update statistics.
    """
    if self.norm_reward:
        reward = np.clip(reward / np.sqrt(self.ret_rms.var + self.epsilon), -self.clip_reward, self.clip_reward)
    return reward
```
I wonder why we do not subtract the mean in `normalize_reward`? I have run some experiments, and they indeed show that subtracting the mean reduces performance, but could you tell me why exactly? Is there any research or explanation validating this?
stable-baselines3/stable_baselines3/common/vec_env/vec_normalize.py
Line 177 in 237223f