Hello,
I was comparing baselines and rl-baselines-zoo, and when using HER I see that the latter outputs just one success rate.
Baselines, on the other hand, outputs two success rates, train and test. The difference between the two, if I understand correctly, is that the train one is the success rate during training, which is likely lower than the test one because of the noise used for exploration.
May I ask, therefore, what the success_rate in this output is? Also, is there a resource I can study to confirm my understanding of the other outputted parameters?
The following is an example of the output. I should also ask: why doesn't the number of epochs grow? This should be the 26th epoch.
----------------------------------------
| obs_rms_mean             | -0.0829   |
| obs_rms_std              | 0.384     |
| reference_Q_mean         | -8.74     |
| reference_Q_std          | 7.05      |
| reference_action_mean    | -0.232    |
| reference_action_std     | 0.925     |
| reference_actor_Q_mean   | -8.5      |
| reference_actor_Q_std    | 7.12      |
| rollout/Q_mean           | -8.01     |
| rollout/actions_mean     | -0.04     |
| rollout/actions_std      | 0.704     |
| rollout/episode_steps    | 150       |
| rollout/episodes         | 1.73e+03  |
| rollout/return           | -105      |
| rollout/return_history   | -82.7     |
| success rate             | 0.86      |
| total/duration           | 5.09e+04  |
| total/episodes           | 1.73e+03  |
| total/epochs             | 1         |
| total/steps              | 259998    |
| total/steps_per_second   | 5.1       |
| train/loss_actor         | 4.76      |
| train/loss_critic        | 0.0953    |
| train/param_noise_di...  | 0         |
----------------------------------------
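For reference, this is roughly how I would measure a test-style success rate myself, by rolling out the trained policy without exploration noise and checking the is_success flag in info. This is just a minimal sketch; evaluate_success_rate and n_episodes are names I made up for it, and it assumes the environment reports is_success the way goal-based Gym envs usually do:

def evaluate_success_rate(model, env, n_episodes=100):
    """Run deterministic rollouts (no exploration noise) and
    return the fraction of episodes that end in success."""
    successes = 0.0
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        info = {}
        while not done:
            # deterministic=True disables the action noise used during training
            action, _ = model.predict(obs, deterministic=True)
            obs, _, done, info = env.step(action)
        # goal-based Gym envs put an 'is_success' flag in info at episode end
        successes += float(info.get('is_success', 0.0))
    return successes / n_episodes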
I used these hyperparameters:
MyEnv-v0:
  n_timesteps: !!float 20000
  policy: 'MlpPolicy'
  model_class: 'ddpg'
  n_sampled_goal: 4
  goal_selection_strategy: 'future'
  buffer_size: 1000000
  batch_size: 256
  gamma: 0.95
  random_exploration: 0.3
  actor_lr: !!float 1e-3
  critic_lr: !!float 1e-3
  noise_type: 'normal'
  noise_std: 0.2
  normalize_observations: true
  normalize_returns: false
  policy_kwargs: "dict(layers=[256, 256, 256])"
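In case it helps, this is how I understand that YAML mapping onto a direct stable-baselines call. It is a rough sketch under my assumptions, not the zoo's actual loader code: I am assuming a stable-baselines 2.x-style API, and MyEnv-v0 and the noise wiring are my own guesses at how noise_type/noise_std get translated:

import gym
import numpy as np
from stable_baselines import HER, DDPG
from stable_baselines.ddpg.noise import NormalActionNoise

env = gym.make('MyEnv-v0')
n_actions = env.action_space.shape[0]
# noise_type: 'normal' with noise_std: 0.2
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.2 * np.ones(n_actions))

# HER forwards the extra keyword arguments to the wrapped model class (DDPG here)
model = HER('MlpPolicy', env, model_class=DDPG,
            n_sampled_goal=4, goal_selection_strategy='future',
            buffer_size=1000000, batch_size=256, gamma=0.95,
            random_exploration=0.3, actor_lr=1e-3, critic_lr=1e-3,
            action_noise=action_noise,
            normalize_observations=True, normalize_returns=False,
            policy_kwargs=dict(layers=[256, 256, 256]),
            verbose=1)
model.learn(total_timesteps=20000)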
Best regards