# Hyperparameter optimization for a PPO RL agent using the ASHA scheduler and random search

In this lesson, we learn how to use the ASHA scheduler to stop less-promising trials (those with poor hyperparameter combinations) early, which speeds up the search while we optimize the hyperparameters.

```python
from pathlib import Path
from ray import air, tune
from ray.tune.schedulers import ASHAScheduler
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.algorithm import Algorithm
```

For this notebook, we will optimize the learning rate `lr` and the discount factor `gamma` (the same hyperparameters as in [lesson 2 notebook 2](2-optimize_ppo_hyperparameters_cartpole.ipynb)).
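To see why the `gamma` candidates span such a wide range, a common rule of thumb (an approximation, not an exact property of PPO) is that a discount factor `gamma` makes the agent weigh rewards up to roughly `1 / (1 - gamma)` steps ahead. CartPole-v1 episodes are capped at 500 steps, so small values such as 0.5 make the agent very short-sighted. The helper name `effective_horizon` below is ours, chosen for illustration.

```python
# Rule-of-thumb "effective horizon" of a discount factor (approximation only):
# rewards much further than ~1 / (1 - gamma) steps away contribute little
# to the discounted return.
gammas = [0.5, 0.9, 0.99, 0.999, 0.9999]


def effective_horizon(gamma: float) -> float:
    """Return the approximate number of steps the agent 'looks ahead'."""
    return 1.0 / (1.0 - gamma)


for g in gammas:
    print(f"gamma={g}: ~{effective_horizon(g):.0f} steps")
```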

```python
search_space = {
    "lr": tune.loguniform(1e-5, 1),
    "gamma": tune.choice(
        [
            0.5,
            0.6,
            0.7,
            0.8,
            0.9,
            0.95,
            0.98,
            0.99,
            0.995,
            0.999,
            0.9999,
        ]
    ),
}
```
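`tune.loguniform(1e-5, 1)` samples the learning rate so that its logarithm is uniformly distributed, which gives every order of magnitude the same chance of being tried. The sketch below reimplements that idea with the standard library only (it is not Ray's code, and `sample_loguniform` is a hypothetical helper) and compares it with a plain uniform draw over the same interval.

```python
import math
import random


def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose natural log is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_loguniform(1e-5, 1.0, rng) for _ in range(10_000)]

# Bucket samples by decade: [1e-5, 1e-4), [1e-4, 1e-3), ..., [1e-1, 1]
decades = [0] * 5
for s in samples:
    idx = int(math.floor(math.log10(s))) + 5
    decades[max(0, min(idx, 4))] += 1  # clamp guards against float rounding at the edges
print(decades)  # each count should be close to 2,000

# Under a plain uniform distribution on [1e-5, 1], only about 0.1% of draws
# fall below 1e-3, so small learning rates would almost never be tested.
uniform_rng = random.Random(0)
uniform_small = sum(uniform_rng.uniform(1e-5, 1.0) < 1e-3 for _ in range(10_000))
print(uniform_small)
```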

As in [lesson 2 notebook 2](2-optimize_ppo_hyperparameters_cartpole.ipynb), we use a random search algorithm.

```python
search_algo = tune.search.basic_variant.BasicVariantGenerator()  # Random search
```
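Random search draws every trial's configuration independently from the search space and never looks at earlier results; the pruning of bad trials is left entirely to the scheduler. A minimal standard-library sketch of that behaviour, assuming the same `lr` and `gamma` space as above (`sample_config` and `random_search` are hypothetical names, not Ray APIs):

```python
import random

# Hypothetical stand-in for the search space defined above:
# lr is log-uniform on [1e-5, 1], gamma is a uniform choice over a list.
GAMMAS = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999]


def sample_config(rng: random.Random) -> dict:
    """Draw one trial configuration, independently of all earlier trials."""
    return {
        "lr": 10 ** rng.uniform(-5, 0),  # log-uniform between 1e-5 and 1
        "gamma": rng.choice(GAMMAS),
    }


def random_search(num_samples: int, seed: int = 0) -> list[dict]:
    """Random search: num_samples independent draws, no feedback from results."""
    rng = random.Random(seed)
    return [sample_config(rng) for _ in range(num_samples)]


for i, cfg in enumerate(random_search(10)):
    print(f"trial {i:02d}: lr={cfg['lr']:.3g}, gamma={cfg['gamma']}")
```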

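In this example, we use an ASHA (Asynchronous Successive Halving Algorithm) scheduler to stop unpromising trials early: at each milestone ("rung"), a trial whose reward falls below that of the best-performing trials that already reached the same rung is terminated instead of consuming its full training budget.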

```python
scheduler_algo = ASHAScheduler(
    time_attr="training_iteration",  # Unit of "time" used to compare trials
    max_t=10,  # Max time units (here: training iterations) per trial
    grace_period=1,  # Only stop trials that are at least this old (in time_attr units)
    reduction_factor=3,  # Sets the halving rate: roughly the top 1/3 continue at each rung
    brackets=1,  # Number of brackets. Each bracket has a different halving rate, specified by the reduction factor.
)  # ASHA trial scheduler
```
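With `grace_period=1`, `reduction_factor=3` and `max_t=10`, the rungs fall at iterations 1, 3 and 9, and `max_t` stops any survivor at iteration 10. That is consistent with the results table further below, where every trial ended after 1, 3 or 10 iterations. The sketch below is a simplified, standard-library reconstruction of the rung layout and the stopping rule. We assume the cutoff is the `(1 - 1/reduction_factor)` quantile of the rewards already recorded at that rung, which is how we understand Ray's implementation; the function names are hypothetical.

```python
import math


def asha_rungs(grace_period: int, reduction_factor: int, max_t: int) -> list[int]:
    """Milestones grace_period * reduction_factor**k that do not exceed max_t."""
    rungs = []
    milestone = grace_period
    while milestone <= max_t:
        rungs.append(milestone)
        milestone *= reduction_factor
    return rungs


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def should_stop(reward: float, recorded: list[float], reduction_factor: int) -> bool:
    """Stop a trial at a rung if its reward is below the top 1/reduction_factor
    of the rewards already recorded by other trials at that rung."""
    if not recorded:
        return False  # the first trial to reach a rung has nothing to compare against
    cutoff = percentile(recorded, (1 - 1 / reduction_factor) * 100)
    return reward < cutoff


print(asha_rungs(grace_period=1, reduction_factor=3, max_t=10))  # [1, 3, 9]
```

Because the rule only compares a trial against the trials that reached the rung *before* it, early trials face a weaker bar than later ones; this is the price ASHA pays for never waiting on stragglers.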

Once the search and scheduler algorithms are defined, we can define our Tune configuration:

```python
number_trials = 10
tune_config = tune.TuneConfig(
    metric="env_runners/episode_reward_mean",  # Result metric used to rank and prune trials
    mode="max",  # Higher mean episode reward is better, so we maximize it
    scheduler=scheduler_algo,
    search_alg=search_algo,
    num_samples=number_trials,  # Number of trials to run
)
```
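The metric name `env_runners/episode_reward_mean` is a path into the nested result dictionary that each PPO training iteration reports: Tune flattens nested results using `/` as the separator. The sketch below mimics that flattening with a hypothetical `flatten` helper (not Ray's function); the result excerpt is a hand-made, shortened dictionary whose values are rounded from trial `00000` in the output below.

```python
def flatten(result: dict, prefix: str = "", delimiter: str = "/") -> dict:
    """Flatten nested result dicts into 'parent/child' keys."""
    flat = {}
    for key, value in result.items():
        full_key = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, delimiter))
        else:
            flat[full_key] = value
    return flat


# Hypothetical, heavily shortened excerpt shaped like one PPO training result
result = {
    "training_iteration": 1,
    "env_runners": {"episode_reward_mean": 21.24, "episode_reward_max": 74.0},
}
print(flatten(result)["env_runners/episode_reward_mean"])
```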

Now it's time to train our PPO RL agent with this Tune configuration. When you execute the cell below, pay attention to how many trials have the status PENDING and RUNNING, and to how many trials are TERMINATED before reaching 10 iterations.

```python
config = PPOConfig().environment("CartPole-v1")
stop = {
    "training_iteration": 10,  # Hard stop for every trial (matches max_t above)
}
checkpoint_frequency = 0  # 0 disables periodic checkpoints; only the final one is kept
store_results_path = str(Path("./ray_results/").resolve()) + "/nb_3/"
agent_name = "ppo_cartpole"

tuner = tune.Tuner(
    "PPO",
    param_space={
        **config.to_dict(),
        **search_space,
    },  # Merge the base PPO config with the search space (search-space keys take precedence)
    tune_config=tune_config,
    run_config=air.RunConfig(
        storage_path=store_results_path,
        name=agent_name,
        stop=stop,
        verbose=2,
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=checkpoint_frequency,
            checkpoint_at_end=True,  # Save a checkpoint when each trial finishes
        ),
    ),
)
results = tuner.fit()
print(results)
```

```text
2024-12-01 00:53:44,932	INFO worker.py:1783 -- Started a local Ray instance.
2024-12-01 00:53:45,446	INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2024-12-01 00:53:45,448	INFO tune.py:616 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949
```


| Field | Value |
|---|---|
| Current time | 2024-12-01 00:55:25 |
| Running for | 00:01:39.91 |
| Memory | 4.9/23.9 GiB |

| Trial name | status | loc | gamma | lr | iter | total time (s) | ts | num_healthy_workers | num_in_flight_async_sample_reqs | num_remote_worker_restarts |
|---|---|---|---|---|---|---|---|---|---|---|
| PPO_CartPole-v1_b6f11_00000 | TERMINATED | 200.239.93.233:574827 | 0.9999 | 1.33874e-05 | 1 | 9.66023 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00001 | TERMINATED | 200.239.93.233:574828 | 0.995 | 0.000147899 | 3 | 29.1785 | 12000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00002 | TERMINATED | 200.239.93.233:574829 | 0.995 | 0.0034044 | 1 | 9.87723 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00003 | TERMINATED | 200.239.93.233:574830 | 0.999 | 0.00018428 | 1 | 9.89045 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00004 | TERMINATED | 200.239.93.233:574831 | 0.9 | 0.000830229 | 10 | 91.5173 | 40000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00005 | TERMINATED | 200.239.93.233:575786 | 0.6 | 0.897171 | 3 | 36.9139 | 12000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00006 | TERMINATED | 200.239.93.233:575787 | 0.9999 | 0.000812042 | 1 | 9.76093 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00007 | TERMINATED | 200.239.93.233:575845 | 0.9 | 0.00236283 | 1 | 9.72787 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00008 | TERMINATED | 200.239.93.233:576385 | 0.995 | 5.03801e-05 | 1 | 9.57406 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_b6f11_00009 | TERMINATED | 200.239.93.233:576444 | 0.5 | 0.0654679 | 1 | 9.50552 | 4000 | 2 | 0 | 0 |
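The status table above lets us estimate how much work ASHA saved. The sketch below copies each trial's final `iter` and `total time (s)` from that table and compares them with a run in which every trial trains for the full `max_t = 10` iterations. The time estimate assumes each trial's time per iteration stays constant, which is only an approximation; it is also summed trial time, not wall-clock time, because trials run concurrently (the status table reports about 1 min 40 s of wall-clock time).

```python
# Final iteration count and total training time (s) per trial,
# copied from the status table above.
trials = {
    "00000": (1, 9.66023),
    "00001": (3, 29.1785),
    "00002": (1, 9.87723),
    "00003": (1, 9.89045),
    "00004": (10, 91.5173),
    "00005": (3, 36.9139),
    "00006": (1, 9.76093),
    "00007": (1, 9.72787),
    "00008": (1, 9.57406),
    "00009": (1, 9.50552),
}
max_t = 10  # iterations each trial would run without early stopping

used_iters = sum(it for it, _ in trials.values())
budget_iters = max_t * len(trials)
print(f"iterations used: {used_iters} of {budget_iters} "
      f"({1 - used_iters / budget_iters:.0%} saved)")

# Rough estimate (assumes constant time per iteration within each trial):
# scale every trial's measured time up to max_t iterations.
used_time = sum(t for _, t in trials.values())
full_time = sum(t / it * max_t for it, t in trials.values())
print(f"summed trial time: {used_time:.0f} s vs. ~{full_time:.0f} s without ASHA")
```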


[36m(PPO pid=574831)[0m Install gputil for GPU system monitoring.


Trial name,agent_timesteps_total,counters,custom_metrics,env_runners,episode_media,info,num_agent_steps_sampled,num_agent_steps_sampled_lifetime,num_agent_steps_trained,num_env_steps_sampled,num_env_steps_sampled_lifetime,num_env_steps_sampled_this_iter,num_env_steps_sampled_throughput_per_sec,num_env_steps_trained,num_env_steps_trained_this_iter,num_env_steps_trained_throughput_per_sec,num_healthy_workers,num_in_flight_async_sample_reqs,num_remote_worker_restarts,num_steps_trained_this_iter,perf,timers
PPO_CartPole-v1_b6f11_00000,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 74.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(21.240641711229948), 'episode_len_mean': np.float64(21.240641711229948), 'episode_media': {}, 'episodes_timesteps_total': 3972, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(74.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.240641711229948)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [19.0, 66.0, 21.0, 16.0, 19.0, 44.0, 12.0, 17.0, 14.0, 20.0, 22.0, 16.0, 29.0, 15.0, 16.0, 16.0, 13.0, 18.0, 12.0, 13.0, 40.0, 17.0, 25.0, 26.0, 35.0, 38.0, 24.0, 11.0, 9.0, 13.0, 11.0, 20.0, 14.0, 21.0, 36.0, 16.0, 16.0, 14.0, 48.0, 28.0, 18.0, 13.0, 28.0, 24.0, 13.0, 24.0, 49.0, 16.0, 14.0, 12.0, 74.0, 15.0, 21.0, 20.0, 23.0, 22.0, 12.0, 24.0, 33.0, 22.0, 15.0, 13.0, 17.0, 46.0, 17.0, 49.0, 47.0, 10.0, 32.0, 34.0, 17.0, 24.0, 13.0, 17.0, 16.0, 59.0, 14.0, 20.0, 19.0, 11.0, 13.0, 26.0, 16.0, 21.0, 16.0, 16.0, 26.0, 13.0, 13.0, 49.0, 16.0, 20.0, 21.0, 13.0, 13.0, 16.0, 31.0, 15.0, 14.0, 14.0, 23.0, 35.0, 12.0, 15.0, 26.0, 12.0, 18.0, 13.0, 18.0, 14.0, 19.0, 22.0, 34.0, 47.0, 18.0, 28.0, 24.0, 29.0, 11.0, 29.0, 16.0, 28.0, 18.0, 33.0, 14.0, 25.0, 21.0, 23.0, 17.0, 15.0, 15.0, 44.0, 11.0, 12.0, 17.0, 10.0, 21.0, 20.0, 10.0, 23.0, 37.0, 19.0, 20.0, 18.0, 30.0, 12.0, 23.0, 19.0, 12.0, 11.0, 28.0, 20.0, 13.0, 15.0, 19.0, 9.0, 45.0, 9.0, 12.0, 16.0, 21.0, 19.0, 12.0, 12.0, 21.0, 12.0, 20.0, 18.0, 28.0, 17.0, 15.0, 12.0, 11.0, 9.0, 32.0, 17.0, 27.0, 10.0, 23.0, 17.0, 13.0, 18.0, 15.0, 25.0, 48.0, 33.0, 11.0], 'episode_lengths': [19, 66, 21, 16, 19, 44, 12, 17, 14, 20, 22, 16, 29, 15, 16, 16, 13, 18, 12, 13, 40, 17, 25, 26, 35, 38, 24, 11, 9, 13, 11, 20, 14, 21, 36, 16, 16, 14, 48, 28, 18, 13, 28, 24, 13, 24, 49, 16, 14, 12, 74, 15, 21, 20, 23, 22, 12, 
24, 33, 22, 15, 13, 17, 46, 17, 49, 47, 10, 32, 34, 17, 24, 13, 17, 16, 59, 14, 20, 19, 11, 13, 26, 16, 21, 16, 16, 26, 13, 13, 49, 16, 20, 21, 13, 13, 16, 31, 15, 14, 14, 23, 35, 12, 15, 26, 12, 18, 13, 18, 14, 19, 22, 34, 47, 18, 28, 24, 29, 11, 29, 16, 28, 18, 33, 14, 25, 21, 23, 17, 15, 15, 44, 11, 12, 17, 10, 21, 20, 10, 23, 37, 19, 20, 18, 30, 12, 23, 19, 12, 11, 28, 20, 13, 15, 19, 9, 45, 9, 12, 16, 21, 19, 12, 12, 21, 12, 20, 18, 28, 17, 15, 12, 11, 9, 32, 17, 27, 10, 23, 17, 13, 18, 15, 25, 48, 33, 11], 'policy_default_policy_reward': [19.0, 66.0, 21.0, 16.0, 19.0, 44.0, 12.0, 17.0, 14.0, 20.0, 22.0, 16.0, 29.0, 15.0, 16.0, 16.0, 13.0, 18.0, 12.0, 13.0, 40.0, 17.0, 25.0, 26.0, 35.0, 38.0, 24.0, 11.0, 9.0, 13.0, 11.0, 20.0, 14.0, 21.0, 36.0, 16.0, 16.0, 14.0, 48.0, 28.0, 18.0, 13.0, 28.0, 24.0, 13.0, 24.0, 49.0, 16.0, 14.0, 12.0, 74.0, 15.0, 21.0, 20.0, 23.0, 22.0, 12.0, 24.0, 33.0, 22.0, 15.0, 13.0, 17.0, 46.0, 17.0, 49.0, 47.0, 10.0, 32.0, 34.0, 17.0, 24.0, 13.0, 17.0, 16.0, 59.0, 14.0, 20.0, 19.0, 11.0, 13.0, 26.0, 16.0, 21.0, 16.0, 16.0, 26.0, 13.0, 13.0, 49.0, 16.0, 20.0, 21.0, 13.0, 13.0, 16.0, 31.0, 15.0, 14.0, 14.0, 23.0, 35.0, 12.0, 15.0, 26.0, 12.0, 18.0, 13.0, 18.0, 14.0, 19.0, 22.0, 34.0, 47.0, 18.0, 28.0, 24.0, 29.0, 11.0, 29.0, 16.0, 28.0, 18.0, 33.0, 14.0, 25.0, 21.0, 23.0, 17.0, 15.0, 15.0, 44.0, 11.0, 12.0, 17.0, 10.0, 21.0, 20.0, 10.0, 23.0, 37.0, 19.0, 20.0, 18.0, 30.0, 12.0, 23.0, 19.0, 12.0, 11.0, 28.0, 20.0, 13.0, 15.0, 19.0, 9.0, 45.0, 9.0, 12.0, 16.0, 21.0, 19.0, 12.0, 12.0, 21.0, 12.0, 20.0, 18.0, 28.0, 17.0, 15.0, 12.0, 11.0, 9.0, 32.0, 17.0, 27.0, 10.0, 23.0, 17.0, 13.0, 18.0, 15.0, 25.0, 48.0, 33.0, 11.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.21891919434834076), 'mean_inference_ms': np.float64(0.6758492674212827), 'mean_action_processing_ms': np.float64(0.08024121606923107), 'mean_env_wait_ms': np.float64(0.04002833072215309), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 
'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004381037013415984), 'StateBufferConnector_ms': np.float64(0.0034086207017541567), 'ViewRequirementAgentConnector_ms': np.float64(0.08246885901466411)}, 'num_episodes': 187, 'episode_return_max': 74.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(21.240641711229948), 'episodes_this_iter': 187}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(1.8215415), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(1.3387362386843284e-05), 'total_loss': np.float64(9.208352335037723), 'policy_loss': np.float64(-0.03128230159711694), 'vf_loss': np.float64(9.235793708473123), 'vf_explained_var': np.float64(-0.000993846244709466), 'kl': np.float64(0.019204470895124115), 'entropy': np.float64(0.6744765489332137), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,414.267,4000,4000,414.267,2,0,0,4000,"{'cpu_util_percent': np.float64(36.84285714285714), 'ram_util_percent': np.float64(38.471428571428575)}","{'training_iteration_time_ms': 9655.61, 'restore_workers_time_ms': 0.021, 'training_step_time_ms': 9655.557, 'sample_time_ms': 2065.697, 'load_time_ms': 0.488, 'load_throughput': 8204017.604, 'learn_time_ms': 7584.957, 'learn_throughput': 527.36, 'synch_weights_time_ms': 3.752}"
PPO_CartPole-v1_b6f11_00001,12000,"{'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",{},"{'episode_reward_max': 313.0, 'episode_reward_min': 12.0, 'episode_reward_mean': np.float64(79.28), 'episode_len_mean': np.float64(79.28), 'episode_media': {}, 'episodes_timesteps_total': 7928, 'policy_reward_min': {'default_policy': np.float64(12.0)}, 'policy_reward_max': {'default_policy': np.float64(313.0)}, 'policy_reward_mean': {'default_policy': np.float64(79.28)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [15.0, 15.0, 15.0, 27.0, 54.0, 133.0, 21.0, 13.0, 18.0, 130.0, 60.0, 64.0, 135.0, 83.0, 13.0, 78.0, 43.0, 145.0, 16.0, 22.0, 128.0, 180.0, 85.0, 60.0, 62.0, 42.0, 117.0, 28.0, 41.0, 39.0, 72.0, 48.0, 65.0, 16.0, 43.0, 44.0, 18.0, 91.0, 54.0, 20.0, 59.0, 17.0, 31.0, 14.0, 37.0, 213.0, 60.0, 74.0, 62.0, 38.0, 31.0, 37.0, 15.0, 12.0, 14.0, 33.0, 29.0, 76.0, 111.0, 44.0, 109.0, 146.0, 64.0, 43.0, 19.0, 42.0, 48.0, 56.0, 129.0, 28.0, 15.0, 67.0, 142.0, 92.0, 139.0, 117.0, 182.0, 134.0, 123.0, 242.0, 23.0, 161.0, 231.0, 245.0, 17.0, 199.0, 29.0, 93.0, 59.0, 61.0, 163.0, 250.0, 152.0, 205.0, 313.0, 31.0, 160.0, 85.0, 128.0, 26.0], 'episode_lengths': [15, 15, 15, 27, 54, 133, 21, 13, 18, 130, 60, 64, 135, 83, 13, 78, 43, 145, 16, 22, 128, 180, 85, 60, 62, 42, 117, 28, 41, 39, 72, 48, 65, 16, 43, 44, 18, 91, 54, 20, 59, 17, 31, 14, 37, 213, 60, 74, 62, 38, 31, 37, 15, 12, 14, 33, 29, 76, 111, 44, 109, 146, 64, 43, 19, 42, 48, 56, 129, 28, 15, 67, 142, 92, 139, 117, 182, 134, 123, 242, 23, 161, 231, 245, 17, 199, 29, 93, 59, 61, 163, 250, 152, 205, 313, 31, 160, 85, 128, 26], 'policy_default_policy_reward': [15.0, 15.0, 15.0, 27.0, 54.0, 133.0, 21.0, 13.0, 18.0, 130.0, 60.0, 64.0, 135.0, 83.0, 13.0, 78.0, 43.0, 145.0, 16.0, 22.0, 128.0, 180.0, 85.0, 60.0, 62.0, 42.0, 117.0, 28.0, 41.0, 39.0, 72.0, 48.0, 65.0, 16.0, 43.0, 44.0, 18.0, 91.0, 54.0, 20.0, 59.0, 17.0, 31.0, 14.0, 37.0, 
213.0, 60.0, 74.0, 62.0, 38.0, 31.0, 37.0, 15.0, 12.0, 14.0, 33.0, 29.0, 76.0, 111.0, 44.0, 109.0, 146.0, 64.0, 43.0, 19.0, 42.0, 48.0, 56.0, 129.0, 28.0, 15.0, 67.0, 142.0, 92.0, 139.0, 117.0, 182.0, 134.0, 123.0, 242.0, 23.0, 161.0, 231.0, 245.0, 17.0, 199.0, 29.0, 93.0, 59.0, 61.0, 163.0, 250.0, 152.0, 205.0, 313.0, 31.0, 160.0, 85.0, 128.0, 26.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.21399125313845066), 'mean_inference_ms': np.float64(0.6808071244621118), 'mean_action_processing_ms': np.float64(0.0805726636070855), 'mean_env_wait_ms': np.float64(0.04069024424448269), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.00419306755065918), 'StateBufferConnector_ms': np.float64(0.0032105445861816406), 'ViewRequirementAgentConnector_ms': np.float64(0.08512163162231445)}, 'num_episodes': 29, 'episode_return_max': 313.0, 'episode_return_min': 12.0, 'episode_return_mean': np.float64(79.28), 'episodes_this_iter': 29}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.4670086), 'cur_kl_coeff': np.float64(0.3), 'cur_lr': np.float64(0.00014789929860875513), 'total_loss': np.float64(9.807838360981275), 'policy_loss': np.float64(-0.022226133475941354), 'vf_loss': np.float64(9.82706548219086), 'vf_explained_var': np.float64(0.006808120012283325), 'kl': np.float64(0.009996683280784069), 'entropy': np.float64(0.5744201308937483), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(2325.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",12000,12000,12000,12000,12000,4000,415.999,12000,4000,415.999,2,0,0,4000,"{'cpu_util_percent': 
np.float64(38.94285714285714), 'ram_util_percent': np.float64(38.07142857142857)}","{'training_iteration_time_ms': 9722.392, 'restore_workers_time_ms': 0.019, 'training_step_time_ms': 9722.342, 'sample_time_ms': 2114.384, 'load_time_ms': 0.35, 'load_throughput': 11428621.253, 'learn_time_ms': 7602.946, 'learn_throughput': 526.112, 'synch_weights_time_ms': 4.072}"
PPO_CartPole-v1_b6f11_00002,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 66.0, 'episode_reward_min': 10.0, 'episode_reward_mean': np.float64(21.52972972972973), 'episode_len_mean': np.float64(21.52972972972973), 'episode_media': {}, 'episodes_timesteps_total': 3983, 'policy_reward_min': {'default_policy': np.float64(10.0)}, 'policy_reward_max': {'default_policy': np.float64(66.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.52972972972973)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [11.0, 34.0, 15.0, 23.0, 18.0, 21.0, 23.0, 14.0, 37.0, 16.0, 11.0, 31.0, 31.0, 18.0, 23.0, 18.0, 23.0, 18.0, 43.0, 16.0, 21.0, 17.0, 27.0, 44.0, 31.0, 16.0, 33.0, 16.0, 19.0, 26.0, 14.0, 24.0, 24.0, 12.0, 48.0, 44.0, 25.0, 13.0, 16.0, 15.0, 11.0, 19.0, 29.0, 13.0, 24.0, 19.0, 25.0, 13.0, 34.0, 20.0, 37.0, 19.0, 10.0, 20.0, 10.0, 20.0, 11.0, 15.0, 21.0, 24.0, 10.0, 28.0, 20.0, 11.0, 31.0, 15.0, 16.0, 18.0, 50.0, 18.0, 38.0, 11.0, 12.0, 33.0, 33.0, 66.0, 19.0, 19.0, 34.0, 17.0, 15.0, 17.0, 26.0, 18.0, 15.0, 17.0, 33.0, 13.0, 23.0, 27.0, 19.0, 12.0, 12.0, 12.0, 11.0, 21.0, 13.0, 22.0, 10.0, 23.0, 10.0, 43.0, 17.0, 16.0, 22.0, 17.0, 17.0, 28.0, 46.0, 23.0, 12.0, 16.0, 32.0, 28.0, 36.0, 27.0, 24.0, 20.0, 16.0, 21.0, 11.0, 29.0, 10.0, 18.0, 11.0, 20.0, 12.0, 12.0, 18.0, 36.0, 15.0, 30.0, 19.0, 22.0, 19.0, 16.0, 27.0, 11.0, 13.0, 26.0, 25.0, 14.0, 21.0, 32.0, 18.0, 24.0, 27.0, 35.0, 34.0, 22.0, 12.0, 27.0, 17.0, 13.0, 15.0, 26.0, 18.0, 14.0, 11.0, 17.0, 28.0, 18.0, 13.0, 22.0, 40.0, 32.0, 23.0, 12.0, 13.0, 13.0, 21.0, 19.0, 17.0, 21.0, 14.0, 24.0, 35.0, 13.0, 24.0, 27.0, 13.0, 42.0, 18.0, 17.0, 15.0], 'episode_lengths': [11, 34, 15, 23, 18, 21, 23, 14, 37, 16, 11, 31, 31, 18, 23, 18, 23, 18, 43, 16, 21, 17, 27, 44, 31, 16, 33, 16, 19, 26, 14, 24, 24, 12, 48, 44, 25, 13, 16, 15, 11, 19, 29, 13, 24, 19, 25, 13, 34, 20, 37, 19, 10, 20, 10, 20, 11, 15, 21, 
24, 10, 28, 20, 11, 31, 15, 16, 18, 50, 18, 38, 11, 12, 33, 33, 66, 19, 19, 34, 17, 15, 17, 26, 18, 15, 17, 33, 13, 23, 27, 19, 12, 12, 12, 11, 21, 13, 22, 10, 23, 10, 43, 17, 16, 22, 17, 17, 28, 46, 23, 12, 16, 32, 28, 36, 27, 24, 20, 16, 21, 11, 29, 10, 18, 11, 20, 12, 12, 18, 36, 15, 30, 19, 22, 19, 16, 27, 11, 13, 26, 25, 14, 21, 32, 18, 24, 27, 35, 34, 22, 12, 27, 17, 13, 15, 26, 18, 14, 11, 17, 28, 18, 13, 22, 40, 32, 23, 12, 13, 13, 21, 19, 17, 21, 14, 24, 35, 13, 24, 27, 13, 42, 18, 17, 15], 'policy_default_policy_reward': [11.0, 34.0, 15.0, 23.0, 18.0, 21.0, 23.0, 14.0, 37.0, 16.0, 11.0, 31.0, 31.0, 18.0, 23.0, 18.0, 23.0, 18.0, 43.0, 16.0, 21.0, 17.0, 27.0, 44.0, 31.0, 16.0, 33.0, 16.0, 19.0, 26.0, 14.0, 24.0, 24.0, 12.0, 48.0, 44.0, 25.0, 13.0, 16.0, 15.0, 11.0, 19.0, 29.0, 13.0, 24.0, 19.0, 25.0, 13.0, 34.0, 20.0, 37.0, 19.0, 10.0, 20.0, 10.0, 20.0, 11.0, 15.0, 21.0, 24.0, 10.0, 28.0, 20.0, 11.0, 31.0, 15.0, 16.0, 18.0, 50.0, 18.0, 38.0, 11.0, 12.0, 33.0, 33.0, 66.0, 19.0, 19.0, 34.0, 17.0, 15.0, 17.0, 26.0, 18.0, 15.0, 17.0, 33.0, 13.0, 23.0, 27.0, 19.0, 12.0, 12.0, 12.0, 11.0, 21.0, 13.0, 22.0, 10.0, 23.0, 10.0, 43.0, 17.0, 16.0, 22.0, 17.0, 17.0, 28.0, 46.0, 23.0, 12.0, 16.0, 32.0, 28.0, 36.0, 27.0, 24.0, 20.0, 16.0, 21.0, 11.0, 29.0, 10.0, 18.0, 11.0, 20.0, 12.0, 12.0, 18.0, 36.0, 15.0, 30.0, 19.0, 22.0, 19.0, 16.0, 27.0, 11.0, 13.0, 26.0, 25.0, 14.0, 21.0, 32.0, 18.0, 24.0, 27.0, 35.0, 34.0, 22.0, 12.0, 27.0, 17.0, 13.0, 15.0, 26.0, 18.0, 14.0, 11.0, 17.0, 28.0, 18.0, 13.0, 22.0, 40.0, 32.0, 23.0, 12.0, 13.0, 13.0, 21.0, 19.0, 17.0, 21.0, 14.0, 24.0, 35.0, 13.0, 24.0, 27.0, 13.0, 42.0, 18.0, 17.0, 15.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.22625293080165929), 'mean_inference_ms': np.float64(0.7096701957599282), 'mean_action_processing_ms': np.float64(0.08426176818499764), 'mean_env_wait_ms': np.float64(0.04297801892036754), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': 
{'ObsPreprocessorConnector_ms': np.float64(0.004438322943610114), 'StateBufferConnector_ms': np.float64(0.0033521652221679688), 'ViewRequirementAgentConnector_ms': np.float64(0.08593920114878062)}, 'num_episodes': 185, 'episode_return_max': 66.0, 'episode_return_min': 10.0, 'episode_return_mean': np.float64(21.52972972972973), 'episodes_this_iter': 185}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(5.515938), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.0034043997621444467), 'total_loss': np.float64(5.423406871159871), 'policy_loss': np.float64(-0.05221704922175856), 'vf_loss': np.float64(5.468204048884812), 'vf_explained_var': np.float64(0.47720687684192453), 'kl': np.float64(0.03709944877018545), 'entropy': np.float64(0.657301079201442), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,405.17,4000,4000,405.17,2,0,0,4000,"{'cpu_util_percent': np.float64(36.65999999999999), 'ram_util_percent': np.float64(38.28666666666667)}","{'training_iteration_time_ms': 9872.408, 'restore_workers_time_ms': 0.017, 'training_step_time_ms': 9872.357, 'sample_time_ms': 2233.843, 'load_time_ms': 0.464, 'load_throughput': 8625817.995, 'learn_time_ms': 7633.028, 'learn_throughput': 524.038, 'synch_weights_time_ms': 4.298}"
PPO_CartPole-v1_b6f11_00003,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 65.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(21.63586956521739), 'episode_len_mean': np.float64(21.63586956521739), 'episode_media': {}, 'episodes_timesteps_total': 3981, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(65.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.63586956521739)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [13.0, 15.0, 16.0, 21.0, 20.0, 36.0, 24.0, 35.0, 15.0, 15.0, 40.0, 10.0, 26.0, 33.0, 33.0, 16.0, 24.0, 21.0, 10.0, 26.0, 29.0, 23.0, 10.0, 9.0, 27.0, 20.0, 31.0, 21.0, 44.0, 31.0, 16.0, 11.0, 10.0, 17.0, 26.0, 31.0, 22.0, 14.0, 17.0, 45.0, 23.0, 56.0, 35.0, 17.0, 46.0, 40.0, 16.0, 15.0, 25.0, 31.0, 65.0, 33.0, 15.0, 17.0, 34.0, 15.0, 22.0, 10.0, 17.0, 13.0, 28.0, 13.0, 21.0, 36.0, 11.0, 10.0, 15.0, 39.0, 13.0, 9.0, 11.0, 17.0, 17.0, 12.0, 18.0, 20.0, 12.0, 27.0, 23.0, 32.0, 12.0, 16.0, 12.0, 33.0, 15.0, 14.0, 15.0, 19.0, 26.0, 32.0, 44.0, 21.0, 21.0, 26.0, 29.0, 21.0, 30.0, 20.0, 19.0, 28.0, 22.0, 12.0, 10.0, 11.0, 17.0, 13.0, 20.0, 12.0, 44.0, 25.0, 17.0, 15.0, 23.0, 13.0, 19.0, 15.0, 14.0, 17.0, 19.0, 16.0, 22.0, 9.0, 19.0, 31.0, 16.0, 33.0, 25.0, 9.0, 12.0, 18.0, 32.0, 22.0, 20.0, 9.0, 11.0, 53.0, 25.0, 21.0, 19.0, 21.0, 17.0, 42.0, 15.0, 18.0, 13.0, 16.0, 14.0, 17.0, 36.0, 35.0, 24.0, 23.0, 12.0, 11.0, 12.0, 26.0, 32.0, 18.0, 13.0, 19.0, 40.0, 11.0, 12.0, 28.0, 18.0, 24.0, 16.0, 19.0, 24.0, 15.0, 12.0, 18.0, 36.0, 12.0, 60.0, 15.0, 21.0, 35.0, 23.0, 9.0, 13.0, 12.0, 19.0, 10.0], 'episode_lengths': [13, 15, 16, 21, 20, 36, 24, 35, 15, 15, 40, 10, 26, 33, 33, 16, 24, 21, 10, 26, 29, 23, 10, 9, 27, 20, 31, 21, 44, 31, 16, 11, 10, 17, 26, 31, 22, 14, 17, 45, 23, 56, 35, 17, 46, 40, 16, 15, 25, 31, 65, 33, 15, 17, 34, 15, 22, 10, 17, 13, 28, 13, 21, 
36, 11, 10, 15, 39, 13, 9, 11, 17, 17, 12, 18, 20, 12, 27, 23, 32, 12, 16, 12, 33, 15, 14, 15, 19, 26, 32, 44, 21, 21, 26, 29, 21, 30, 20, 19, 28, 22, 12, 10, 11, 17, 13, 20, 12, 44, 25, 17, 15, 23, 13, 19, 15, 14, 17, 19, 16, 22, 9, 19, 31, 16, 33, 25, 9, 12, 18, 32, 22, 20, 9, 11, 53, 25, 21, 19, 21, 17, 42, 15, 18, 13, 16, 14, 17, 36, 35, 24, 23, 12, 11, 12, 26, 32, 18, 13, 19, 40, 11, 12, 28, 18, 24, 16, 19, 24, 15, 12, 18, 36, 12, 60, 15, 21, 35, 23, 9, 13, 12, 19, 10], 'policy_default_policy_reward': [13.0, 15.0, 16.0, 21.0, 20.0, 36.0, 24.0, 35.0, 15.0, 15.0, 40.0, 10.0, 26.0, 33.0, 33.0, 16.0, 24.0, 21.0, 10.0, 26.0, 29.0, 23.0, 10.0, 9.0, 27.0, 20.0, 31.0, 21.0, 44.0, 31.0, 16.0, 11.0, 10.0, 17.0, 26.0, 31.0, 22.0, 14.0, 17.0, 45.0, 23.0, 56.0, 35.0, 17.0, 46.0, 40.0, 16.0, 15.0, 25.0, 31.0, 65.0, 33.0, 15.0, 17.0, 34.0, 15.0, 22.0, 10.0, 17.0, 13.0, 28.0, 13.0, 21.0, 36.0, 11.0, 10.0, 15.0, 39.0, 13.0, 9.0, 11.0, 17.0, 17.0, 12.0, 18.0, 20.0, 12.0, 27.0, 23.0, 32.0, 12.0, 16.0, 12.0, 33.0, 15.0, 14.0, 15.0, 19.0, 26.0, 32.0, 44.0, 21.0, 21.0, 26.0, 29.0, 21.0, 30.0, 20.0, 19.0, 28.0, 22.0, 12.0, 10.0, 11.0, 17.0, 13.0, 20.0, 12.0, 44.0, 25.0, 17.0, 15.0, 23.0, 13.0, 19.0, 15.0, 14.0, 17.0, 19.0, 16.0, 22.0, 9.0, 19.0, 31.0, 16.0, 33.0, 25.0, 9.0, 12.0, 18.0, 32.0, 22.0, 20.0, 9.0, 11.0, 53.0, 25.0, 21.0, 19.0, 21.0, 17.0, 42.0, 15.0, 18.0, 13.0, 16.0, 14.0, 17.0, 36.0, 35.0, 24.0, 23.0, 12.0, 11.0, 12.0, 26.0, 32.0, 18.0, 13.0, 19.0, 40.0, 11.0, 12.0, 28.0, 18.0, 24.0, 16.0, 19.0, 24.0, 15.0, 12.0, 18.0, 36.0, 12.0, 60.0, 15.0, 21.0, 35.0, 23.0, 9.0, 13.0, 12.0, 19.0, 10.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.2280622464935029), 'mean_inference_ms': np.float64(0.7105704337954871), 'mean_action_processing_ms': np.float64(0.08417534472809059), 'mean_env_wait_ms': np.float64(0.04354904652874787), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': 
np.float64(0.004236076189124066), 'StateBufferConnector_ms': np.float64(0.0035742054814877715), 'ViewRequirementAgentConnector_ms': np.float64(0.0867040260978367)}, 'num_episodes': 184, 'episode_return_max': 65.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(21.63586956521739), 'episodes_this_iter': 184}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.77927935), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.00018428008605071825), 'total_loss': np.float64(8.761600986603767), 'policy_loss': np.float64(-0.048131551242042936), 'vf_loss': np.float64(8.803909748344012), 'vf_explained_var': np.float64(-0.08611863915638257), 'kl': np.float64(0.02911387098221041), 'entropy': np.float64(0.6642696369078851), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,404.63,4000,4000,404.63,2,0,0,4000,"{'cpu_util_percent': np.float64(36.64666666666666), 'ram_util_percent': np.float64(38.48666666666667)}","{'training_iteration_time_ms': 9885.577, 'restore_workers_time_ms': 0.023, 'training_step_time_ms': 9885.515, 'sample_time_ms': 2178.955, 'load_time_ms': 0.498, 'load_throughput': 8035065.134, 'learn_time_ms': 7701.071, 'learn_throughput': 519.408, 'synch_weights_time_ms': 4.291}"
PPO_CartPole-v1_b6f11_00004,40000,"{'num_env_steps_sampled': 40000, 'num_env_steps_trained': 40000, 'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}",{},"{'episode_reward_max': 500.0, 'episode_reward_min': 17.0, 'episode_reward_mean': np.float64(315.21), 'episode_len_mean': np.float64(315.21), 'episode_media': {}, 'episodes_timesteps_total': 31521, 'policy_reward_min': {'default_policy': np.float64(17.0)}, 'policy_reward_max': {'default_policy': np.float64(500.0)}, 'policy_reward_mean': {'default_policy': np.float64(315.21)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [57.0, 34.0, 27.0, 81.0, 80.0, 196.0, 198.0, 108.0, 304.0, 69.0, 240.0, 343.0, 155.0, 174.0, 57.0, 167.0, 211.0, 180.0, 31.0, 62.0, 175.0, 189.0, 17.0, 80.0, 68.0, 200.0, 91.0, 138.0, 198.0, 139.0, 258.0, 224.0, 373.0, 214.0, 155.0, 195.0, 311.0, 252.0, 306.0, 346.0, 209.0, 180.0, 284.0, 254.0, 26.0, 282.0, 280.0, 500.0, 297.0, 384.0, 492.0, 500.0, 492.0, 500.0, 421.0, 500.0, 500.0, 398.0, 500.0, 500.0, 500.0, 344.0, 500.0, 356.0, 352.0, 500.0, 461.0, 362.0, 500.0, 500.0, 267.0, 352.0, 394.0, 500.0, 500.0, 423.0, 500.0, 355.0, 500.0, 473.0, 500.0, 300.0, 500.0, 500.0, 270.0, 500.0, 500.0, 500.0, 500.0, 479.0, 500.0, 337.0, 500.0, 326.0, 500.0, 418.0, 366.0, 292.0, 461.0, 431.0], 'episode_lengths': [57, 34, 27, 81, 80, 196, 198, 108, 304, 69, 240, 343, 155, 174, 57, 167, 211, 180, 31, 62, 175, 189, 17, 80, 68, 200, 91, 138, 198, 139, 258, 224, 373, 214, 155, 195, 311, 252, 306, 346, 209, 180, 284, 254, 26, 282, 280, 500, 297, 384, 492, 500, 492, 500, 421, 500, 500, 398, 500, 500, 500, 344, 500, 356, 352, 500, 461, 362, 500, 500, 267, 352, 394, 500, 500, 423, 500, 355, 500, 473, 500, 300, 500, 500, 270, 500, 500, 500, 500, 479, 500, 337, 500, 326, 500, 418, 366, 292, 461, 431], 'policy_default_policy_reward': [57.0, 34.0, 27.0, 81.0, 80.0, 196.0, 198.0, 108.0, 304.0, 69.0, 240.0, 343.0, 155.0, 174.0, 57.0, 167.0, 211.0, 180.0, 31.0, 62.0, 175.0, 189.0, 17.0, 80.0, 68.0, 
200.0, 91.0, 138.0, 198.0, 139.0, 258.0, 224.0, 373.0, 214.0, 155.0, 195.0, 311.0, 252.0, 306.0, 346.0, 209.0, 180.0, 284.0, 254.0, 26.0, 282.0, 280.0, 500.0, 297.0, 384.0, 492.0, 500.0, 492.0, 500.0, 421.0, 500.0, 500.0, 398.0, 500.0, 500.0, 500.0, 344.0, 500.0, 356.0, 352.0, 500.0, 461.0, 362.0, 500.0, 500.0, 267.0, 352.0, 394.0, 500.0, 500.0, 423.0, 500.0, 355.0, 500.0, 473.0, 500.0, 300.0, 500.0, 500.0, 270.0, 500.0, 500.0, 500.0, 500.0, 479.0, 500.0, 337.0, 500.0, 326.0, 500.0, 418.0, 366.0, 292.0, 461.0, 431.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.19759328999861733), 'mean_inference_ms': np.float64(0.6760221125083454), 'mean_action_processing_ms': np.float64(0.08034020241912142), 'mean_env_wait_ms': np.float64(0.04132095149512996), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004463672637939453), 'StateBufferConnector_ms': np.float64(0.003154277801513672), 'ViewRequirementAgentConnector_ms': np.float64(0.08138275146484375)}, 'num_episodes': 9, 'episode_return_max': 500.0, 'episode_return_min': 17.0, 'episode_return_mean': np.float64(315.21), 'episodes_this_iter': 9}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(2.7765515), 'cur_kl_coeff': np.float64(0.22500000000000006), 'cur_lr': np.float64(0.000830229371395691), 'total_loss': np.float64(0.10098183428948765), 'policy_loss': np.float64(-0.004712139101078113), 'vf_loss': np.float64(0.10389814629357833), 'vf_explained_var': np.float64(-0.5651304345618012), 'kl': np.float64(0.007981459990265498), 'entropy': np.float64(0.49980736708128326), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(8835.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 40000, 'num_env_steps_trained': 40000, 
'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}",40000,40000,40000,40000,40000,4000,451.776,40000,4000,451.776,2,0,0,4000,"{'cpu_util_percent': np.float64(8.05), 'ram_util_percent': np.float64(20.599999999999998)}","{'training_iteration_time_ms': 9148.177, 'restore_workers_time_ms': 0.018, 'training_step_time_ms': 9148.129, 'sample_time_ms': 1983.514, 'load_time_ms': 0.315, 'load_throughput': 12680232.787, 'learn_time_ms': 7159.827, 'learn_throughput': 558.673, 'synch_weights_time_ms': 3.971}"
PPO_CartPole-v1_b6f11_00005,12000,"{'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",{},"{'episode_reward_max': 11.0, 'episode_reward_min': 8.0, 'episode_reward_mean': np.float64(9.4), 'episode_len_mean': np.float64(9.4), 'episode_media': {}, 'episodes_timesteps_total': 3995, 'policy_reward_min': {'default_policy': np.float64(8.0)}, 'policy_reward_max': {'default_policy': np.float64(11.0)}, 'policy_reward_mean': {'default_policy': np.float64(9.4)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 11.0, 8.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 8.0, 8.0, 9.0, 10.0, 9.0, 10.0, 11.0, 10.0, 9.0, 8.0, 9.0, 9.0, 8.0, 10.0, 8.0, 8.0, 9.0, 10.0, 10.0, 10.0, 10.0, 8.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 8.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 11.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 11.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 8.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 11.0, 10.0, 10.0, 8.0, 8.0, 10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 8.0, 9.0, 10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 8.0, 9.0, 9.0, 9.0, 11.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 9.0, 8.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 11.0, 10.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 11.0, 9.0, 9.0, 9.0, 9.0, 8.0, 9.0, 8.0, 8.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 8.0, 8.0, 9.0, 10.0, 10.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 8.0, 10.0, 8.0, 9.0, 10.0, 9.0, 
9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 8.0, 10.0, 8.0, 9.0, 10.0, 10.0, 8.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 9.0, 10.0, 11.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 8.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 11.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 9.0, 10.0, 8.0, 10.0, 10.0, 8.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 11.0, 10.0, 10.0, 9.0, 10.0, 9.0, 8.0, 9.0, 11.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 10.0, 11.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 11.0, 11.0, 11.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 8.0, 9.0, 9.0, 10.0, 10.0, 8.0, 10.0, 8.0, 11.0, 9.0, 10.0, 10.0, 8.0, 10.0, 10.0, 9.0, 8.0, 9.0, 10.0, 10.0, 9.0, 11.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 8.0], 'episode_lengths': [10, 8, 9, 9, 9, 10, 10, 9, 9, 9, 9, 10, 10, 8, 10, 9, 10, 11, 8, 10, 9, 9, 10, 9, 10, 10, 9, 10, 10, 9, 9, 9, 10, 10, 10, 10, 8, 8, 9, 10, 9, 10, 11, 10, 9, 8, 9, 9, 8, 10, 8, 8, 9, 10, 10, 10, 10, 8, 9, 10, 10, 10, 9, 10, 9, 10, 8, 10, 10, 10, 10, 10, 9, 10, 9, 10, 8, 10, 10, 9, 10, 9, 10, 9, 10, 10, 10, 9, 10, 9, 10, 10, 9, 11, 10, 9, 9, 10, 10, 9, 10, 8, 9, 9, 10, 8, 9, 10, 9, 9, 10, 10, 9, 9, 11, 10, 9, 10, 10, 9, 10, 9, 9, 9, 10, 10, 9, 10, 8, 8, 8, 9, 9, 10, 10, 9, 9, 8, 10, 11, 10, 10, 8, 8, 10, 8, 9, 9, 9, 10, 9, 10, 10, 10, 9, 10, 10, 9, 10, 10, 9, 9, 10, 8, 9, 8, 9, 10, 8, 9, 9, 9, 10, 10, 9, 8, 9, 9, 10, 10, 9, 9, 8, 10, 10, 8, 9, 9, 9, 11, 9, 10, 10, 8, 10, 9, 9, 8, 10, 9, 10, 10, 10, 9, 11, 10, 10, 10, 8, 10, 9, 10, 8, 10, 9, 10, 10, 10, 10, 10, 10, 10, 11, 9, 9, 9, 9, 8, 9, 8, 8, 10, 9, 10, 10, 9, 9, 9, 8, 8, 9, 10, 10, 10, 10, 8, 10, 9, 10, 10, 8, 10, 8, 9, 10, 9, 9, 10, 10, 10, 10, 10, 9, 8, 10, 9, 10, 9, 9, 8, 10, 8, 9, 10, 10, 8, 9, 10, 10, 8, 10, 9, 9, 10, 11, 9, 10, 10, 9, 9, 9, 10, 9, 9, 9, 9, 10, 10, 10, 9, 10, 
9, 9, 8, 9, 10, 9, 10, 9, 9, 11, 10, 10, 9, 9, 8, 10, 10, 9, 10, 8, 10, 10, 8, 9, 10, 10, 9, 9, 10, 9, 10, 9, 9, 11, 10, 10, 9, 10, 9, 8, 9, 11, 10, 10, 10, 10, 10, 9, 10, 11, 10, 9, 9, 9, 10, 9, 10, 10, 9, 9, 9, 9, 9, 10, 9, 9, 10, 10, 9, 10, 9, 9, 10, 10, 9, 11, 11, 11, 9, 8, 10, 9, 10, 9, 9, 10, 9, 10, 10, 9, 10, 10, 9, 10, 10, 8, 9, 9, 10, 10, 8, 10, 8, 11, 9, 10, 10, 8, 10, 10, 9, 8, 9, 10, 10, 9, 11, 9, 9, 9, 10, 10, 10, 8], 'policy_default_policy_reward': [10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 11.0, 8.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 8.0, 8.0, 9.0, 10.0, 9.0, 10.0, 11.0, 10.0, 9.0, 8.0, 9.0, 9.0, 8.0, 10.0, 8.0, 8.0, 9.0, 10.0, 10.0, 10.0, 10.0, 8.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 8.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 11.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 11.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 8.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 11.0, 10.0, 10.0, 8.0, 8.0, 10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 8.0, 9.0, 10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 8.0, 9.0, 9.0, 9.0, 11.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 9.0, 8.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 11.0, 10.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 11.0, 9.0, 9.0, 9.0, 9.0, 8.0, 9.0, 8.0, 8.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 8.0, 8.0, 9.0, 10.0, 10.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 8.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 8.0, 10.0, 8.0, 9.0, 10.0, 10.0, 8.0, 9.0, 10.0, 10.0, 
8.0, 10.0, 9.0, 9.0, 10.0, 11.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 8.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 11.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 9.0, 10.0, 8.0, 10.0, 10.0, 8.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 11.0, 10.0, 10.0, 9.0, 10.0, 9.0, 8.0, 9.0, 11.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 10.0, 11.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 11.0, 11.0, 11.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 10.0, 8.0, 9.0, 9.0, 10.0, 10.0, 8.0, 10.0, 8.0, 11.0, 9.0, 10.0, 10.0, 8.0, 10.0, 10.0, 9.0, 8.0, 9.0, 10.0, 10.0, 9.0, 11.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 8.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.2458848259330914), 'mean_inference_ms': np.float64(0.694349102014019), 'mean_action_processing_ms': np.float64(0.08114332429080856), 'mean_env_wait_ms': np.float64(0.040843391270755626), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004269599914550781), 'StateBufferConnector_ms': np.float64(0.004369062535903033), 'ViewRequirementAgentConnector_ms': np.float64(0.0846407834221335)}, 'num_episodes': 425, 'episode_return_max': 11.0, 'episode_return_min': 8.0, 'episode_return_mean': np.float64(9.4), 'episodes_this_iter': 425}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.0), 'cur_kl_coeff': np.float64(0.4500000000000001), 'cur_lr': np.float64(0.8971711294551244), 'total_loss': np.float64(9.994943341901225), 'policy_loss': np.float64(-0.005056699107010518), 'vf_loss': np.float64(10.0), 'vf_explained_var': np.float64(-0.9888143116428006), 'kl': np.float64(0.0), 'entropy': np.float64(0.0), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 
'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(2325.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",12000,12000,12000,12000,12000,4000,309.203,12000,4000,309.203,2,0,0,4000,"{'cpu_util_percent': np.float64(18.916666666666664), 'ram_util_percent': np.float64(27.93333333333333)}","{'training_iteration_time_ms': 12298.415, 'restore_workers_time_ms': 0.023, 'training_step_time_ms': 12298.361, 'sample_time_ms': 2214.545, 'load_time_ms': 0.346, 'load_throughput': 11573154.288, 'learn_time_ms': 10079.012, 'learn_throughput': 396.864, 'synch_weights_time_ms': 3.946}"
PPO_CartPole-v1_b6f11_00006,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 81.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(21.311827956989248), 'episode_len_mean': np.float64(21.311827956989248), 'episode_media': {}, 'episodes_timesteps_total': 3964, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(81.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.311827956989248)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [26.0, 29.0, 15.0, 18.0, 11.0, 23.0, 14.0, 18.0, 18.0, 13.0, 66.0, 24.0, 16.0, 23.0, 13.0, 12.0, 12.0, 20.0, 10.0, 10.0, 26.0, 11.0, 20.0, 27.0, 17.0, 17.0, 30.0, 15.0, 20.0, 12.0, 24.0, 20.0, 15.0, 14.0, 13.0, 21.0, 18.0, 16.0, 30.0, 12.0, 15.0, 21.0, 22.0, 16.0, 21.0, 21.0, 25.0, 11.0, 81.0, 27.0, 35.0, 29.0, 11.0, 12.0, 14.0, 22.0, 28.0, 18.0, 33.0, 38.0, 18.0, 46.0, 29.0, 11.0, 26.0, 16.0, 13.0, 43.0, 17.0, 16.0, 48.0, 14.0, 30.0, 16.0, 19.0, 37.0, 24.0, 27.0, 47.0, 20.0, 18.0, 31.0, 13.0, 23.0, 19.0, 9.0, 30.0, 17.0, 15.0, 18.0, 22.0, 38.0, 29.0, 11.0, 19.0, 43.0, 14.0, 13.0, 16.0, 9.0, 13.0, 12.0, 28.0, 25.0, 18.0, 31.0, 25.0, 28.0, 11.0, 25.0, 18.0, 38.0, 16.0, 14.0, 12.0, 13.0, 58.0, 18.0, 9.0, 23.0, 22.0, 30.0, 42.0, 12.0, 31.0, 24.0, 14.0, 21.0, 16.0, 14.0, 29.0, 15.0, 11.0, 10.0, 20.0, 18.0, 16.0, 15.0, 16.0, 50.0, 18.0, 15.0, 14.0, 9.0, 14.0, 18.0, 32.0, 10.0, 19.0, 33.0, 29.0, 13.0, 22.0, 14.0, 38.0, 15.0, 10.0, 14.0, 27.0, 15.0, 10.0, 35.0, 18.0, 30.0, 17.0, 13.0, 18.0, 13.0, 32.0, 32.0, 28.0, 11.0, 17.0, 12.0, 14.0, 23.0, 14.0, 46.0, 11.0, 22.0, 19.0, 20.0, 13.0, 18.0, 33.0, 12.0], 'episode_lengths': [26, 29, 15, 18, 11, 23, 14, 18, 18, 13, 66, 24, 16, 23, 13, 12, 12, 20, 10, 10, 26, 11, 20, 27, 17, 17, 30, 15, 20, 12, 24, 20, 15, 14, 13, 21, 18, 16, 30, 12, 15, 21, 22, 16, 21, 21, 25, 11, 81, 27, 35, 29, 11, 12, 14, 22, 28, 18, 
33, 38, 18, 46, 29, 11, 26, 16, 13, 43, 17, 16, 48, 14, 30, 16, 19, 37, 24, 27, 47, 20, 18, 31, 13, 23, 19, 9, 30, 17, 15, 18, 22, 38, 29, 11, 19, 43, 14, 13, 16, 9, 13, 12, 28, 25, 18, 31, 25, 28, 11, 25, 18, 38, 16, 14, 12, 13, 58, 18, 9, 23, 22, 30, 42, 12, 31, 24, 14, 21, 16, 14, 29, 15, 11, 10, 20, 18, 16, 15, 16, 50, 18, 15, 14, 9, 14, 18, 32, 10, 19, 33, 29, 13, 22, 14, 38, 15, 10, 14, 27, 15, 10, 35, 18, 30, 17, 13, 18, 13, 32, 32, 28, 11, 17, 12, 14, 23, 14, 46, 11, 22, 19, 20, 13, 18, 33, 12], 'policy_default_policy_reward': [26.0, 29.0, 15.0, 18.0, 11.0, 23.0, 14.0, 18.0, 18.0, 13.0, 66.0, 24.0, 16.0, 23.0, 13.0, 12.0, 12.0, 20.0, 10.0, 10.0, 26.0, 11.0, 20.0, 27.0, 17.0, 17.0, 30.0, 15.0, 20.0, 12.0, 24.0, 20.0, 15.0, 14.0, 13.0, 21.0, 18.0, 16.0, 30.0, 12.0, 15.0, 21.0, 22.0, 16.0, 21.0, 21.0, 25.0, 11.0, 81.0, 27.0, 35.0, 29.0, 11.0, 12.0, 14.0, 22.0, 28.0, 18.0, 33.0, 38.0, 18.0, 46.0, 29.0, 11.0, 26.0, 16.0, 13.0, 43.0, 17.0, 16.0, 48.0, 14.0, 30.0, 16.0, 19.0, 37.0, 24.0, 27.0, 47.0, 20.0, 18.0, 31.0, 13.0, 23.0, 19.0, 9.0, 30.0, 17.0, 15.0, 18.0, 22.0, 38.0, 29.0, 11.0, 19.0, 43.0, 14.0, 13.0, 16.0, 9.0, 13.0, 12.0, 28.0, 25.0, 18.0, 31.0, 25.0, 28.0, 11.0, 25.0, 18.0, 38.0, 16.0, 14.0, 12.0, 13.0, 58.0, 18.0, 9.0, 23.0, 22.0, 30.0, 42.0, 12.0, 31.0, 24.0, 14.0, 21.0, 16.0, 14.0, 29.0, 15.0, 11.0, 10.0, 20.0, 18.0, 16.0, 15.0, 16.0, 50.0, 18.0, 15.0, 14.0, 9.0, 14.0, 18.0, 32.0, 10.0, 19.0, 33.0, 29.0, 13.0, 22.0, 14.0, 38.0, 15.0, 10.0, 14.0, 27.0, 15.0, 10.0, 35.0, 18.0, 30.0, 17.0, 13.0, 18.0, 13.0, 32.0, 32.0, 28.0, 11.0, 17.0, 12.0, 14.0, 23.0, 14.0, 46.0, 11.0, 22.0, 19.0, 20.0, 13.0, 18.0, 33.0, 12.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.22398119664154148), 'mean_inference_ms': np.float64(0.7003584945990375), 'mean_action_processing_ms': np.float64(0.08302012350974683), 'mean_env_wait_ms': np.float64(0.04221489381391201), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': 
{'ObsPreprocessorConnector_ms': np.float64(0.004281279861286122), 'StateBufferConnector_ms': np.float64(0.0034584793993221816), 'ViewRequirementAgentConnector_ms': np.float64(0.08334767433904833)}, 'num_episodes': 186, 'episode_return_max': 81.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(21.311827956989248), 'episodes_this_iter': 186}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(3.3784993), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.0008120417938303223), 'total_loss': np.float64(6.346510570792741), 'policy_loss': np.float64(-0.05784688922927104), 'vf_loss': np.float64(6.396882474807001), 'vf_explained_var': np.float64(0.2685656417441624), 'kl': np.float64(0.03737488245682304), 'entropy': np.float64(0.6564255635584554), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,409.995,4000,4000,409.995,2,0,0,4000,"{'cpu_util_percent': np.float64(36.74285714285714), 'ram_util_percent': np.float64(38.707142857142856)}","{'training_iteration_time_ms': 9756.215, 'restore_workers_time_ms': 0.019, 'training_step_time_ms': 9756.165, 'sample_time_ms': 2135.098, 'load_time_ms': 0.46, 'load_throughput': 8692858.031, 'learn_time_ms': 7616.001, 'learn_throughput': 525.21, 'synch_weights_time_ms': 3.959}"
PPO_CartPole-v1_b6f11_00007,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 77.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(20.895287958115183), 'episode_len_mean': np.float64(20.895287958115183), 'episode_media': {}, 'episodes_timesteps_total': 3991, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(77.0)}, 'policy_reward_mean': {'default_policy': np.float64(20.895287958115183)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [19.0, 23.0, 21.0, 28.0, 10.0, 23.0, 22.0, 9.0, 9.0, 11.0, 17.0, 27.0, 19.0, 15.0, 27.0, 17.0, 16.0, 34.0, 50.0, 12.0, 15.0, 25.0, 31.0, 11.0, 14.0, 21.0, 17.0, 43.0, 15.0, 24.0, 10.0, 17.0, 13.0, 18.0, 12.0, 14.0, 13.0, 14.0, 11.0, 14.0, 43.0, 17.0, 18.0, 27.0, 30.0, 26.0, 26.0, 14.0, 31.0, 16.0, 17.0, 10.0, 35.0, 30.0, 15.0, 16.0, 11.0, 16.0, 11.0, 18.0, 13.0, 32.0, 13.0, 19.0, 21.0, 24.0, 37.0, 13.0, 10.0, 17.0, 45.0, 14.0, 22.0, 19.0, 19.0, 67.0, 29.0, 15.0, 22.0, 13.0, 13.0, 22.0, 11.0, 36.0, 18.0, 11.0, 43.0, 27.0, 18.0, 11.0, 21.0, 12.0, 10.0, 13.0, 24.0, 34.0, 15.0, 23.0, 15.0, 9.0, 39.0, 17.0, 77.0, 30.0, 20.0, 14.0, 50.0, 26.0, 11.0, 15.0, 26.0, 20.0, 16.0, 30.0, 13.0, 27.0, 32.0, 12.0, 27.0, 12.0, 22.0, 16.0, 24.0, 14.0, 14.0, 21.0, 17.0, 24.0, 12.0, 26.0, 17.0, 26.0, 27.0, 14.0, 48.0, 21.0, 19.0, 19.0, 31.0, 10.0, 24.0, 16.0, 15.0, 49.0, 12.0, 18.0, 16.0, 24.0, 26.0, 13.0, 34.0, 12.0, 14.0, 20.0, 15.0, 18.0, 11.0, 25.0, 23.0, 12.0, 15.0, 19.0, 30.0, 16.0, 40.0, 22.0, 17.0, 39.0, 14.0, 27.0, 22.0, 14.0, 20.0, 17.0, 15.0, 17.0, 15.0, 15.0, 11.0, 14.0, 42.0, 20.0, 11.0, 20.0, 21.0, 13.0, 14.0, 29.0, 19.0, 19.0, 13.0], 'episode_lengths': [19, 23, 21, 28, 10, 23, 22, 9, 9, 11, 17, 27, 19, 15, 27, 17, 16, 34, 50, 12, 15, 25, 31, 11, 14, 21, 17, 43, 15, 24, 10, 17, 13, 18, 12, 14, 13, 14, 11, 14, 43, 17, 18, 27, 30, 26, 26, 14, 31, 16, 17, 
10, 35, 30, 15, 16, 11, 16, 11, 18, 13, 32, 13, 19, 21, 24, 37, 13, 10, 17, 45, 14, 22, 19, 19, 67, 29, 15, 22, 13, 13, 22, 11, 36, 18, 11, 43, 27, 18, 11, 21, 12, 10, 13, 24, 34, 15, 23, 15, 9, 39, 17, 77, 30, 20, 14, 50, 26, 11, 15, 26, 20, 16, 30, 13, 27, 32, 12, 27, 12, 22, 16, 24, 14, 14, 21, 17, 24, 12, 26, 17, 26, 27, 14, 48, 21, 19, 19, 31, 10, 24, 16, 15, 49, 12, 18, 16, 24, 26, 13, 34, 12, 14, 20, 15, 18, 11, 25, 23, 12, 15, 19, 30, 16, 40, 22, 17, 39, 14, 27, 22, 14, 20, 17, 15, 17, 15, 15, 11, 14, 42, 20, 11, 20, 21, 13, 14, 29, 19, 19, 13], 'policy_default_policy_reward': [19.0, 23.0, 21.0, 28.0, 10.0, 23.0, 22.0, 9.0, 9.0, 11.0, 17.0, 27.0, 19.0, 15.0, 27.0, 17.0, 16.0, 34.0, 50.0, 12.0, 15.0, 25.0, 31.0, 11.0, 14.0, 21.0, 17.0, 43.0, 15.0, 24.0, 10.0, 17.0, 13.0, 18.0, 12.0, 14.0, 13.0, 14.0, 11.0, 14.0, 43.0, 17.0, 18.0, 27.0, 30.0, 26.0, 26.0, 14.0, 31.0, 16.0, 17.0, 10.0, 35.0, 30.0, 15.0, 16.0, 11.0, 16.0, 11.0, 18.0, 13.0, 32.0, 13.0, 19.0, 21.0, 24.0, 37.0, 13.0, 10.0, 17.0, 45.0, 14.0, 22.0, 19.0, 19.0, 67.0, 29.0, 15.0, 22.0, 13.0, 13.0, 22.0, 11.0, 36.0, 18.0, 11.0, 43.0, 27.0, 18.0, 11.0, 21.0, 12.0, 10.0, 13.0, 24.0, 34.0, 15.0, 23.0, 15.0, 9.0, 39.0, 17.0, 77.0, 30.0, 20.0, 14.0, 50.0, 26.0, 11.0, 15.0, 26.0, 20.0, 16.0, 30.0, 13.0, 27.0, 32.0, 12.0, 27.0, 12.0, 22.0, 16.0, 24.0, 14.0, 14.0, 21.0, 17.0, 24.0, 12.0, 26.0, 17.0, 26.0, 27.0, 14.0, 48.0, 21.0, 19.0, 19.0, 31.0, 10.0, 24.0, 16.0, 15.0, 49.0, 12.0, 18.0, 16.0, 24.0, 26.0, 13.0, 34.0, 12.0, 14.0, 20.0, 15.0, 18.0, 11.0, 25.0, 23.0, 12.0, 15.0, 19.0, 30.0, 16.0, 40.0, 22.0, 17.0, 39.0, 14.0, 27.0, 22.0, 14.0, 20.0, 17.0, 15.0, 17.0, 15.0, 15.0, 11.0, 14.0, 42.0, 20.0, 11.0, 20.0, 21.0, 13.0, 14.0, 29.0, 19.0, 19.0, 13.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.22562084860595225), 'mean_inference_ms': np.float64(0.7020440844431554), 'mean_action_processing_ms': np.float64(0.08252964188805796), 'mean_env_wait_ms': np.float64(0.0420951681317918), 
'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004446818566447153), 'StateBufferConnector_ms': np.float64(0.003451951511243251), 'ViewRequirementAgentConnector_ms': np.float64(0.0848234635997193)}, 'num_episodes': 191, 'episode_return_max': 77.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(20.895287958115183), 'episodes_this_iter': 191}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(9.39041), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.0023628301008279525), 'total_loss': np.float64(1.9635839023256814), 'policy_loss': np.float64(-0.054887130516030454), 'vf_loss': np.float64(2.0110506119907545), 'vf_explained_var': np.float64(0.6678159539417554), 'kl': np.float64(0.03710211996515219), 'entropy': np.float64(0.6568742670679605), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,411.391,4000,4000,411.391,2,0,0,4000,"{'cpu_util_percent': np.float64(36.7642857142857), 'ram_util_percent': np.float64(38.714285714285715)}","{'training_iteration_time_ms': 9723.122, 'restore_workers_time_ms': 0.022, 'training_step_time_ms': 9723.066, 'sample_time_ms': 2140.459, 'load_time_ms': 0.443, 'load_throughput': 9034580.506, 'learn_time_ms': 7576.912, 'learn_throughput': 527.92, 'synch_weights_time_ms': 4.51}"
PPO_CartPole-v1_b6f11_00008,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 89.0, 'episode_reward_min': 8.0, 'episode_reward_mean': np.float64(20.957894736842107), 'episode_len_mean': np.float64(20.957894736842107), 'episode_media': {}, 'episodes_timesteps_total': 3982, 'policy_reward_min': {'default_policy': np.float64(8.0)}, 'policy_reward_max': {'default_policy': np.float64(89.0)}, 'policy_reward_mean': {'default_policy': np.float64(20.957894736842107)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [10.0, 22.0, 13.0, 9.0, 9.0, 16.0, 11.0, 28.0, 26.0, 23.0, 12.0, 15.0, 12.0, 17.0, 14.0, 18.0, 33.0, 57.0, 14.0, 13.0, 18.0, 14.0, 15.0, 17.0, 22.0, 12.0, 13.0, 13.0, 19.0, 10.0, 15.0, 24.0, 14.0, 16.0, 26.0, 20.0, 22.0, 31.0, 12.0, 24.0, 21.0, 14.0, 16.0, 10.0, 18.0, 15.0, 22.0, 22.0, 20.0, 18.0, 25.0, 21.0, 26.0, 41.0, 16.0, 33.0, 30.0, 14.0, 24.0, 15.0, 34.0, 12.0, 26.0, 16.0, 17.0, 16.0, 17.0, 8.0, 13.0, 40.0, 11.0, 19.0, 21.0, 11.0, 11.0, 11.0, 89.0, 17.0, 58.0, 27.0, 17.0, 18.0, 41.0, 19.0, 10.0, 27.0, 37.0, 37.0, 11.0, 19.0, 37.0, 30.0, 21.0, 41.0, 15.0, 39.0, 18.0, 18.0, 11.0, 35.0, 10.0, 19.0, 31.0, 15.0, 43.0, 34.0, 20.0, 10.0, 10.0, 16.0, 9.0, 14.0, 29.0, 18.0, 11.0, 13.0, 40.0, 23.0, 17.0, 23.0, 24.0, 52.0, 24.0, 20.0, 14.0, 14.0, 24.0, 15.0, 36.0, 22.0, 10.0, 20.0, 12.0, 11.0, 36.0, 10.0, 11.0, 22.0, 22.0, 25.0, 13.0, 10.0, 12.0, 14.0, 22.0, 28.0, 18.0, 28.0, 13.0, 15.0, 23.0, 23.0, 10.0, 17.0, 19.0, 50.0, 14.0, 16.0, 30.0, 12.0, 21.0, 17.0, 21.0, 36.0, 18.0, 22.0, 22.0, 18.0, 18.0, 12.0, 16.0, 12.0, 17.0, 22.0, 15.0, 29.0, 18.0, 27.0, 29.0, 15.0, 17.0, 22.0, 22.0, 21.0, 22.0, 13.0, 28.0, 28.0, 23.0, 40.0], 'episode_lengths': [10, 22, 13, 9, 9, 16, 11, 28, 26, 23, 12, 15, 12, 17, 14, 18, 33, 57, 14, 13, 18, 14, 15, 17, 22, 12, 13, 13, 19, 10, 15, 24, 14, 16, 26, 20, 22, 31, 12, 24, 21, 14, 16, 10, 18, 15, 22, 22, 20, 18, 25, 21, 26, 
41, 16, 33, 30, 14, 24, 15, 34, 12, 26, 16, 17, 16, 17, 8, 13, 40, 11, 19, 21, 11, 11, 11, 89, 17, 58, 27, 17, 18, 41, 19, 10, 27, 37, 37, 11, 19, 37, 30, 21, 41, 15, 39, 18, 18, 11, 35, 10, 19, 31, 15, 43, 34, 20, 10, 10, 16, 9, 14, 29, 18, 11, 13, 40, 23, 17, 23, 24, 52, 24, 20, 14, 14, 24, 15, 36, 22, 10, 20, 12, 11, 36, 10, 11, 22, 22, 25, 13, 10, 12, 14, 22, 28, 18, 28, 13, 15, 23, 23, 10, 17, 19, 50, 14, 16, 30, 12, 21, 17, 21, 36, 18, 22, 22, 18, 18, 12, 16, 12, 17, 22, 15, 29, 18, 27, 29, 15, 17, 22, 22, 21, 22, 13, 28, 28, 23, 40], 'policy_default_policy_reward': [10.0, 22.0, 13.0, 9.0, 9.0, 16.0, 11.0, 28.0, 26.0, 23.0, 12.0, 15.0, 12.0, 17.0, 14.0, 18.0, 33.0, 57.0, 14.0, 13.0, 18.0, 14.0, 15.0, 17.0, 22.0, 12.0, 13.0, 13.0, 19.0, 10.0, 15.0, 24.0, 14.0, 16.0, 26.0, 20.0, 22.0, 31.0, 12.0, 24.0, 21.0, 14.0, 16.0, 10.0, 18.0, 15.0, 22.0, 22.0, 20.0, 18.0, 25.0, 21.0, 26.0, 41.0, 16.0, 33.0, 30.0, 14.0, 24.0, 15.0, 34.0, 12.0, 26.0, 16.0, 17.0, 16.0, 17.0, 8.0, 13.0, 40.0, 11.0, 19.0, 21.0, 11.0, 11.0, 11.0, 89.0, 17.0, 58.0, 27.0, 17.0, 18.0, 41.0, 19.0, 10.0, 27.0, 37.0, 37.0, 11.0, 19.0, 37.0, 30.0, 21.0, 41.0, 15.0, 39.0, 18.0, 18.0, 11.0, 35.0, 10.0, 19.0, 31.0, 15.0, 43.0, 34.0, 20.0, 10.0, 10.0, 16.0, 9.0, 14.0, 29.0, 18.0, 11.0, 13.0, 40.0, 23.0, 17.0, 23.0, 24.0, 52.0, 24.0, 20.0, 14.0, 14.0, 24.0, 15.0, 36.0, 22.0, 10.0, 20.0, 12.0, 11.0, 36.0, 10.0, 11.0, 22.0, 22.0, 25.0, 13.0, 10.0, 12.0, 14.0, 22.0, 28.0, 18.0, 28.0, 13.0, 15.0, 23.0, 23.0, 10.0, 17.0, 19.0, 50.0, 14.0, 16.0, 30.0, 12.0, 21.0, 17.0, 21.0, 36.0, 18.0, 22.0, 22.0, 18.0, 18.0, 12.0, 16.0, 12.0, 17.0, 22.0, 15.0, 29.0, 18.0, 27.0, 29.0, 15.0, 17.0, 22.0, 22.0, 21.0, 22.0, 13.0, 28.0, 28.0, 23.0, 40.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.21710593350673163), 'mean_inference_ms': np.float64(0.6683753539449885), 'mean_action_processing_ms': np.float64(0.0799397911277066), 'mean_env_wait_ms': np.float64(0.04076434070995829), 'mean_env_render_ms': 
np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004289526688425164), 'StateBufferConnector_ms': np.float64(0.0032836512515419408), 'ViewRequirementAgentConnector_ms': np.float64(0.081609424791838)}, 'num_episodes': 190, 'episode_return_max': 89.0, 'episode_return_min': 8.0, 'episode_return_mean': np.float64(20.957894736842107), 'episodes_this_iter': 190}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(1.7115636), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(5.038014815372315e-05), 'total_loss': np.float64(8.989190988643195), 'policy_loss': np.float64(-0.04115404137360153), 'vf_loss': np.float64(9.024740335505495), 'vf_explained_var': np.float64(-0.051808938992920744), 'kl': np.float64(0.02802332705017401), 'entropy': np.float64(0.6659084981487643), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,417.999,4000,4000,417.999,2,0,0,4000,"{'cpu_util_percent': np.float64(29.014285714285716), 'ram_util_percent': np.float64(34.31428571428571)}","{'training_iteration_time_ms': 9569.41, 'restore_workers_time_ms': 0.025, 'training_step_time_ms': 9569.352, 'sample_time_ms': 2064.446, 'load_time_ms': 0.466, 'load_throughput': 8590484.383, 'learn_time_ms': 7499.703, 'learn_throughput': 533.354, 'synch_weights_time_ms': 4.119}"
PPO_CartPole-v1_b6f11_00009,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 61.0, 'episode_reward_min': 8.0, 'episode_reward_mean': np.float64(21.376344086021504), 'episode_len_mean': np.float64(21.376344086021504), 'episode_media': {}, 'episodes_timesteps_total': 3976, 'policy_reward_min': {'default_policy': np.float64(8.0)}, 'policy_reward_max': {'default_policy': np.float64(61.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.376344086021504)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [24.0, 17.0, 21.0, 36.0, 27.0, 12.0, 33.0, 33.0, 33.0, 18.0, 20.0, 17.0, 20.0, 12.0, 27.0, 26.0, 24.0, 20.0, 14.0, 31.0, 22.0, 18.0, 33.0, 28.0, 14.0, 14.0, 27.0, 15.0, 42.0, 16.0, 19.0, 14.0, 19.0, 38.0, 50.0, 17.0, 47.0, 14.0, 35.0, 12.0, 20.0, 34.0, 18.0, 11.0, 13.0, 11.0, 21.0, 21.0, 11.0, 14.0, 23.0, 17.0, 20.0, 25.0, 14.0, 21.0, 13.0, 14.0, 31.0, 12.0, 28.0, 22.0, 15.0, 26.0, 12.0, 16.0, 11.0, 17.0, 19.0, 23.0, 23.0, 16.0, 23.0, 23.0, 22.0, 35.0, 14.0, 11.0, 15.0, 12.0, 30.0, 16.0, 28.0, 39.0, 22.0, 15.0, 54.0, 32.0, 17.0, 14.0, 21.0, 16.0, 10.0, 16.0, 20.0, 9.0, 54.0, 12.0, 16.0, 26.0, 41.0, 24.0, 61.0, 15.0, 33.0, 25.0, 19.0, 9.0, 33.0, 22.0, 10.0, 17.0, 22.0, 28.0, 12.0, 26.0, 22.0, 18.0, 30.0, 11.0, 17.0, 40.0, 10.0, 16.0, 13.0, 10.0, 20.0, 23.0, 25.0, 17.0, 25.0, 29.0, 21.0, 27.0, 14.0, 23.0, 15.0, 12.0, 17.0, 17.0, 22.0, 18.0, 22.0, 12.0, 35.0, 14.0, 15.0, 20.0, 23.0, 14.0, 14.0, 31.0, 24.0, 32.0, 45.0, 30.0, 14.0, 29.0, 19.0, 36.0, 12.0, 27.0, 18.0, 10.0, 17.0, 21.0, 13.0, 15.0, 8.0, 12.0, 8.0, 13.0, 17.0, 13.0, 17.0, 38.0, 18.0, 13.0, 31.0, 14.0, 23.0, 14.0, 17.0, 18.0, 36.0, 22.0], 'episode_lengths': [24, 17, 21, 36, 27, 12, 33, 33, 33, 18, 20, 17, 20, 12, 27, 26, 24, 20, 14, 31, 22, 18, 33, 28, 14, 14, 27, 15, 42, 16, 19, 14, 19, 38, 50, 17, 47, 14, 35, 12, 20, 34, 18, 11, 13, 11, 21, 21, 11, 14, 23, 17, 20, 25, 14, 21, 13, 14, 
31, 12, 28, 22, 15, 26, 12, 16, 11, 17, 19, 23, 23, 16, 23, 23, 22, 35, 14, 11, 15, 12, 30, 16, 28, 39, 22, 15, 54, 32, 17, 14, 21, 16, 10, 16, 20, 9, 54, 12, 16, 26, 41, 24, 61, 15, 33, 25, 19, 9, 33, 22, 10, 17, 22, 28, 12, 26, 22, 18, 30, 11, 17, 40, 10, 16, 13, 10, 20, 23, 25, 17, 25, 29, 21, 27, 14, 23, 15, 12, 17, 17, 22, 18, 22, 12, 35, 14, 15, 20, 23, 14, 14, 31, 24, 32, 45, 30, 14, 29, 19, 36, 12, 27, 18, 10, 17, 21, 13, 15, 8, 12, 8, 13, 17, 13, 17, 38, 18, 13, 31, 14, 23, 14, 17, 18, 36, 22], 'policy_default_policy_reward': [24.0, 17.0, 21.0, 36.0, 27.0, 12.0, 33.0, 33.0, 33.0, 18.0, 20.0, 17.0, 20.0, 12.0, 27.0, 26.0, 24.0, 20.0, 14.0, 31.0, 22.0, 18.0, 33.0, 28.0, 14.0, 14.0, 27.0, 15.0, 42.0, 16.0, 19.0, 14.0, 19.0, 38.0, 50.0, 17.0, 47.0, 14.0, 35.0, 12.0, 20.0, 34.0, 18.0, 11.0, 13.0, 11.0, 21.0, 21.0, 11.0, 14.0, 23.0, 17.0, 20.0, 25.0, 14.0, 21.0, 13.0, 14.0, 31.0, 12.0, 28.0, 22.0, 15.0, 26.0, 12.0, 16.0, 11.0, 17.0, 19.0, 23.0, 23.0, 16.0, 23.0, 23.0, 22.0, 35.0, 14.0, 11.0, 15.0, 12.0, 30.0, 16.0, 28.0, 39.0, 22.0, 15.0, 54.0, 32.0, 17.0, 14.0, 21.0, 16.0, 10.0, 16.0, 20.0, 9.0, 54.0, 12.0, 16.0, 26.0, 41.0, 24.0, 61.0, 15.0, 33.0, 25.0, 19.0, 9.0, 33.0, 22.0, 10.0, 17.0, 22.0, 28.0, 12.0, 26.0, 22.0, 18.0, 30.0, 11.0, 17.0, 40.0, 10.0, 16.0, 13.0, 10.0, 20.0, 23.0, 25.0, 17.0, 25.0, 29.0, 21.0, 27.0, 14.0, 23.0, 15.0, 12.0, 17.0, 17.0, 22.0, 18.0, 22.0, 12.0, 35.0, 14.0, 15.0, 20.0, 23.0, 14.0, 14.0, 31.0, 24.0, 32.0, 45.0, 30.0, 14.0, 29.0, 19.0, 36.0, 12.0, 27.0, 18.0, 10.0, 17.0, 21.0, 13.0, 15.0, 8.0, 12.0, 8.0, 13.0, 17.0, 13.0, 17.0, 38.0, 18.0, 13.0, 31.0, 14.0, 23.0, 14.0, 17.0, 18.0, 36.0, 22.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.2187755353571475), 'mean_inference_ms': np.float64(0.6755617177063695), 'mean_action_processing_ms': np.float64(0.08024024535977349), 'mean_env_wait_ms': np.float64(0.040455830844574686), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': 
{'ObsPreprocessorConnector_ms': np.float64(0.004239236154863911), 'StateBufferConnector_ms': np.float64(0.003239159942955099), 'ViewRequirementAgentConnector_ms': np.float64(0.08251449113251061)}, 'num_episodes': 186, 'episode_return_max': 61.0, 'episode_return_min': 8.0, 'episode_return_mean': np.float64(21.376344086021504), 'episodes_this_iter': 186}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(7.1603327), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.06546785766458763), 'total_loss': np.float64(4.956189763226536), 'policy_loss': np.float64(0.14899155548473278), 'vf_loss': np.float64(4.476274557680052), 'vf_explained_var': np.float64(-0.8354643350006432), 'kl': np.float64(1.6546182267557645), 'entropy': np.float64(0.24148590566852418), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,421.039,4000,4000,421.039,2,0,0,4000,"{'cpu_util_percent': np.float64(28.735714285714284), 'ram_util_percent': np.float64(34.31428571428571)}","{'training_iteration_time_ms': 9500.324, 'restore_workers_time_ms': 0.02, 'training_step_time_ms': 9500.273, 'sample_time_ms': 2059.527, 'load_time_ms': 0.523, 'load_throughput': 7643378.588, 'learn_time_ms': 7435.818, 'learn_throughput': 537.937, 'synch_weights_time_ms': 3.768}"


(PPO pid=574827) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/lasse/ray_minicourse/lesson_2/ray_results/nb_3/ppo_cartpole/PPO_CartPole-v1_b6f11_00000_0_gamma=0.9999,lr=0.0000_2024-12-01_00-53-45/checkpoint_000000)
(PPO pid=574829) Install gputil for GPU system monitoring. [repeated 4x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(PPO pid=574829) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/lasse/ray_minicourse/lesson_2/ray_results/nb_3/ppo_cartpole/PPO_CartPole-v1_b6f11_00002_2_gamma=0.9950,lr=0.0034_2024-12-01_00-53-45/checkpoint_000000) [repeated 2x across cluster]
(PPO pid=575787) Install gputil for GPU system monitoring.
(PPO pid=575786) Install gputil for GPU system monitoring.

ResultGrid<[
  Result(
    metrics={'custom_metrics': {}, 'episode_media': {}, 'info': {'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(1.8215415), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(1.3387362386843284e-05), 'total_loss': np.float64(9.208352335037723), 'policy_loss': np.float64(-0.03128230159711694), 'vf_loss': np.float64(9.235793708473123), 'vf_explained_var': np.float64(-0.000993846244709466), 'kl': np.float64(0.019204470895124115), 'entropy': np.float64(0.6744765489332137), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}, 'env_runners': {'episode_reward_max': 74.0, 'episode_reward_mi

Look at the `Trial progress` table in the logs above, in particular the `agent_timesteps_total` column. Not all trials reached the $40000$ predefined steps (training iterations $\times$ batch size $= 10 \times 4000$). The ASHA scheduler terminated the unpromising trials before they finished all their training iterations, which saves time and speeds up the hyperparameter optimization.
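Where a trial stops is not arbitrary: a single-bracket ASHA scheduler only compares trials at fixed milestones ("rungs") of `grace_period * reduction_factor**k` training iterations, up to `max_t`. The minimal sketch below computes these rungs for the scheduler defined earlier (`grace_period=1`, `reduction_factor=3`, `max_t=10`) and converts them to environment steps, assuming 4000 steps per iteration as above. The formula is our reading of how Ray builds ASHA rungs, not Ray's own code, and `asha_rungs` is a hypothetical helper. The resulting milestones (4000, 12000 and 36000 steps) are consistent with the table above, where trial `00005` stopped at 12000 steps and trials `00006` to `00009` at 4000 steps.

```python
import math


def asha_rungs(grace_period: int, max_t: int, reduction_factor: int) -> list[int]:
    """Iterations at which a single-bracket ASHA scheduler compares trials.

    Rungs are grace_period * reduction_factor**k for
    k = 0, 1, ..., floor(log_rf(max_t / grace_period)).
    """
    num_rungs = int(math.log(max_t / grace_period) / math.log(reduction_factor) + 1)
    return [grace_period * reduction_factor**k for k in range(num_rungs)]


steps_per_iteration = 4000  # batch size assumed from "10 * 4000" in the text above
rungs = asha_rungs(grace_period=1, max_t=10, reduction_factor=3)
rung_steps = [r * steps_per_iteration for r in rungs]
print(rungs)       # [1, 3, 9]
print(rung_steps)  # [4000, 12000, 36000]
```

A trial that survives the last rung (9 iterations) simply runs until `max_t`, which is why the surviving trials report the full $40000$ steps.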

Now, let's look at how the reward evolved during training by plotting the `episode_reward_mean` metric (logged under `env_runners`) in TensorBoard.

In [7]:
%load_ext tensorboard
%tensorboard --logdir ray_results/nb_3
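If TensorBoard is not available, the same information can be read from the `progress.csv` file that Ray Tune writes by default into each trial directory (one row per reported result). The sketch below is a hypothetical helper, not part of Ray: it returns the last `training_iteration` of every trial, assuming the `ray_results/nb_3/ppo_cartpole/<trial>/progress.csv` layout suggested by the checkpoint paths in the logs above.

```python
import csv
from pathlib import Path


def iterations_per_trial(experiment_dir: str) -> dict[str, int]:
    """Return the last `training_iteration` logged by each trial.

    Assumes every trial directory inside `experiment_dir` contains a
    `progress.csv` file with a `training_iteration` column.
    """
    root = Path(experiment_dir)
    if not root.is_dir():
        return {}  # experiment not run yet, or a different results path
    iterations = {}
    for progress_file in sorted(root.glob("*/progress.csv")):
        with progress_file.open(newline="") as f:
            rows = list(csv.DictReader(f))
        if rows:  # skip trials that have not reported any result
            iterations[progress_file.parent.name] = int(rows[-1]["training_iteration"])
    return iterations


for trial, n_iter in iterations_per_trial("ray_results/nb_3/ppo_cartpole").items():
    print(f"{trial}: {n_iter} iteration(s)")
```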

The reward curves confirm that not all trials completed the 10 training iterations. The ASHA scheduler stopped the trials that were performing poorly, so only the most promising trials trained until the end.
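How does ASHA decide that a trial is "performing poorly"? The following is a simplified sketch of the promotion rule as we understand Ray's ASHA implementation (with `mode="max"`): when a trial reaches a rung, its reward is compared with the $(1 - 1/\text{reduction\_factor})$ quantile of the rewards already recorded at that rung, and the trial stops if it falls below. With `reduction_factor=3`, roughly the top third of trials continue. The helper names and the reward values are hypothetical, chosen only to illustrate the rule.

```python
def percentile(values: list[float], q: float) -> float:
    """q-th percentile with linear interpolation (NumPy's default method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def asha_decision(recorded: list[float], reward: float, reduction_factor: int = 3) -> str:
    """Decide whether a trial that just reached a rung continues (mode="max").

    `recorded` holds the rewards of trials that reached this rung earlier.
    """
    if not recorded:
        return "CONTINUE"  # nothing to compare against yet
    cutoff = percentile(recorded, (1 - 1 / reduction_factor) * 100)
    return "STOP" if reward < cutoff else "CONTINUE"


# Hypothetical mean rewards of four trials, in the order they reach the same rung.
arrivals = [20.0, 9.4, 35.0, 21.0]
recorded: list[float] = []
decisions = []
for reward in arrivals:
    decisions.append(asha_decision(recorded, reward))
    recorded.append(reward)  # every arrival is recorded, stopped or not
print(decisions)  # ['CONTINUE', 'STOP', 'CONTINUE', 'STOP']
```

Because each decision uses only the rewards recorded so far, the scheduler never waits for all trials to reach a rung. This is what makes ASHA asynchronous, at the cost of occasionally keeping an early trial that a later comparison would have stopped.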