# Hyperparameter optimization for a PPO RL agent using the ASHA scheduler and random search

In this lesson, we learn to use the ASHA scheduler to stop less-promising trials (those with poor hyperparameter combinations) early, which speeds up training while the hyperparameters are being optimized.

In [1]:
from pathlib import Path
from ray import air, tune
from ray.tune.schedulers import ASHAScheduler
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.algorithm import Algorithm

For this notebook, we will try to optimize the learning rate `lr` and the discount factor `gamma`, the same hyperparameters as in [lesson 2 notebook 2](2-optimize_ppo_hyperparameters_cartpole.ipynb).

In [2]:
search_space = {
    "lr": tune.loguniform(1e-5, 1),
    "gamma": tune.choice(
        [
            0.5,
            0.6,
            0.7,
            0.8,
            0.9,
            0.95,
            0.98,
            0.99,
            0.995,
            0.999,
            0.9999,
        ]
    ),
}
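To build some intuition for what a log-uniform search space means, here is a minimal standalone sketch. It does not use Ray; `sample_loguniform` is a hypothetical helper written for illustration, and it assumes that "log-uniform" means the logarithm of the value is drawn uniformly between the logarithms of the bounds.

```python
import math
import random


def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform in [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
lr_samples = [sample_loguniform(1e-5, 1.0, rng) for _ in range(10_000)]

# Count how many draws fall in each decade: [1e-5, 1e-4), ..., [1e-1, 1].
decade_counts = [0] * 5
for lr in lr_samples:
    decade = int(math.floor(math.log10(lr))) + 5
    decade = max(0, min(decade, 4))  # guard against floating-point edge cases
    decade_counts[decade] += 1

print(decade_counts)
```

Each decade should receive roughly one fifth of the samples, so small learning rates are explored as often as large ones, which a plain uniform distribution over `[1e-5, 1]` would not do.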

Similarly to [lesson 2 notebook 2](2-optimize_ppo_hyperparameters_cartpole.ipynb), we use a random search algorithm.

In [3]:
search_algo = tune.search.basic_variant.BasicVariantGenerator()  # Random search

In this example, we utilize an ASHA scheduler to stop unpromising trials (with bad performance).

In [4]:
scheduler_algo = ASHAScheduler(
    time_attr="training_iteration",  # Attribute that measures a trial's progress ("time"); not the metric being optimized
    max_t=10,  # Max time units per trial.
    grace_period=1,  # Only stop trials at least this old in time.
    reduction_factor=3,  # Used to set halving rate and amount.
    brackets=1,  # Number of brackets. Each bracket has a different halving rate, specified by the reduction factor.
)  # ASHA trial scheduler
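To see where ASHA can stop a trial with these settings, the sketch below computes the rung milestones for a single bracket. It is a simplified illustration, not Ray's implementation: `rung_milestones` is a hypothetical helper, and it assumes that milestones start at `grace_period` and are multiplied by `reduction_factor` until they exceed `max_t`.

```python
def rung_milestones(grace_period: int, max_t: int, reduction_factor: int) -> list[int]:
    """Return the time values at which trials are compared and possibly stopped."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor  # each rung is reduction_factor times later
    return milestones


milestones = rung_milestones(grace_period=1, max_t=10, reduction_factor=3)
print(milestones)  # [1, 3, 9]
```

At each milestone, a trial generally continues only if its metric ranks in roughly the top `1 / reduction_factor` of the results recorded at that rung so far. Under these assumptions, trials would end after 1, 3, or 9 iterations, or run to the full 10, which is consistent with the `iter` column of the status table printed after training.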

Once the search and scheduler algorithms are defined, we can define our Tune configuration:

In [5]:
number_trials = 10
tune_config = tune.TuneConfig(
    metric="env_runners/episode_reward_mean",  # That's the metric we want to maximize/minimize
    mode="max",  # Here we indicate we want to maximize the metric env_runners/episode_reward_mean
    scheduler=scheduler_algo,
    search_alg=search_algo,
    num_samples=number_trials,  # Number of trials to run
)
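The metric name `env_runners/episode_reward_mean` contains a slash because the training results are nested dictionaries, and nested entries are addressed by joining the keys with `/`. The sketch below illustrates the idea with a hypothetical helper, `flatten_result`, written for this example (it is not Ray's own code); the sample values are taken from the result of the first trial shown later in this notebook.

```python
def flatten_result(result: dict, prefix: str = "", sep: str = "/") -> dict:
    """Flatten a nested dict, joining nested keys with `sep`."""
    flat = {}
    for key, value in result.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested dicts.
            flat.update(flatten_result(value, name, sep))
        else:
            flat[name] = value
    return flat


example_result = {
    "training_iteration": 10,
    "env_runners": {"episode_reward_mean": 193.75, "episode_reward_max": 386.0},
}
flat = flatten_result(example_result)
print(flat["env_runners/episode_reward_mean"])  # 193.75
```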

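Now it's time to train our PPO RL agent using this Tune configuration. When executing the cell below, pay attention to the number of trials with status PENDING and RUNNING, and to the iteration at which each trial ends, since ASHA terminates unpromising trials early. If the number of parallel trials is limited (for example to `number_parallel_trials = 2`, which is not set in the Tune configuration shown above), the other trials remain in PENDING status until a running trial finishes.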

In [6]:
config = PPOConfig().environment("CartPole-v1")
stop = {
    "training_iteration": 10,
}
checkpoint_frequency = 0
store_results_path = str(Path("./ray_results/").resolve()) + "/nb_3/"
agent_name = "ppo_cartpole"

tuner = tune.Tuner(
    "PPO",
    param_space={
        **config.to_dict(),
        **search_space,
    },  # Here we mix the Algo config with the search space
    tune_config=tune_config,
    run_config=air.RunConfig(
        storage_path=store_results_path,
        name=agent_name,
        stop=stop,
        verbose=2,
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=checkpoint_frequency,
            checkpoint_at_end=True,
        ),
    ),
)
results = tuner.fit()
print(results)

2024-11-30 03:20:50,723	INFO worker.py:1783 -- Started a local Ray instance.
2024-11-30 03:20:51,248	INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2024-11-30 03:20:51,250	INFO tune.py:616 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949
  gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
  logger.warn(
  logger.warn(f"{pre} is not within the observation space.")


Current time: 2024-11-30 03:23:18
Running for: 00:02:27.32
Memory: 6.5/23.9 GiB

| Trial name | status | loc | gamma | lr | iter | total time (s) | ts | num_healthy_workers | num_in_flight_async_sample_reqs | num_remote_worker_restarts |
|---|---|---|---|---|---|---|---|---|---|---|
| PPO_CartPole-v1_191a4_00000 | TERMINATED | 200.239.93.233:524331 | 0.5 | 0.0001187 | 10 | 98.1027 | 40000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00001 | TERMINATED | 200.239.93.233:524332 | 0.98 | 0.108495 | 3 | 34.4772 | 12000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00002 | TERMINATED | 200.239.93.233:524333 | 0.5 | 0.000775 | 9 | 88.8118 | 36000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00003 | TERMINATED | 200.239.93.233:524334 | 0.9999 | 0.0272017 | 1 | 10.0148 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00004 | TERMINATED | 200.239.93.233:524335 | 0.5 | 8.12688e-05 | 3 | 29.8847 | 12000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00005 | TERMINATED | 200.239.93.233:525267 | 0.5 | 0.00843513 | 1 | 9.84887 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00006 | TERMINATED | 200.239.93.233:525454 | 0.6 | 0.00114071 | 1 | 9.7989 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00007 | TERMINATED | 200.239.93.233:525509 | 0.7 | 6.27415e-05 | 1 | 10.0325 | 4000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00008 | TERMINATED | 200.239.93.233:525817 | 0.8 | 5.6009e-05 | 10 | 95.4426 | 40000 | 2 | 0 | 0 |
| PPO_CartPole-v1_191a4_00009 | TERMINATED | 200.239.93.233:526038 | 0.98 | 3.04454e-05 | 1 | 10.1776 | 4000 | 2 | 0 | 0 |
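As a quick worked example of what early stopping saved in this run, the sketch below sums the `iter` column of the status table above and compares it with the budget that would have been spent if every trial had run for the full 10 iterations. The numbers are copied from the table; the variable names are only illustrative.

```python
# Iterations completed by each trial, copied from the `iter` column above.
iterations_per_trial = {
    "00000": 10, "00001": 3, "00002": 9, "00003": 1, "00004": 3,
    "00005": 1, "00006": 1, "00007": 1, "00008": 10, "00009": 1,
}
max_iterations = 10  # the stop criterion / max_t used in this notebook

used = sum(iterations_per_trial.values())
full_budget = max_iterations * len(iterations_per_trial)
print(used, full_budget, used / full_budget)  # 40 100 0.4
```

In this particular run, only 40 of the 100 possible training iterations were executed. The wall-clock saving is not necessarily the same fraction, since it also depends on how the trials were scheduled in parallel.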


(PPO pid=524331) Install gputil for GPU system monitoring.


Trial name,agent_timesteps_total,counters,custom_metrics,env_runners,episode_media,info,num_agent_steps_sampled,num_agent_steps_sampled_lifetime,num_agent_steps_trained,num_env_steps_sampled,num_env_steps_sampled_lifetime,num_env_steps_sampled_this_iter,num_env_steps_sampled_throughput_per_sec,num_env_steps_trained,num_env_steps_trained_this_iter,num_env_steps_trained_throughput_per_sec,num_healthy_workers,num_in_flight_async_sample_reqs,num_remote_worker_restarts,num_steps_trained_this_iter,perf,timers
PPO_CartPole-v1_191a4_00000,40000,"{'num_env_steps_sampled': 40000, 'num_env_steps_trained': 40000, 'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}",{},"{'episode_reward_max': 386.0, 'episode_reward_min': 24.0, 'episode_reward_mean': np.float64(193.75), 'episode_len_mean': np.float64(193.75), 'episode_media': {}, 'episodes_timesteps_total': 19375, 'policy_reward_min': {'default_policy': np.float64(24.0)}, 'policy_reward_max': {'default_policy': np.float64(386.0)}, 'policy_reward_mean': {'default_policy': np.float64(193.75)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [153.0, 149.0, 130.0, 148.0, 156.0, 177.0, 116.0, 180.0, 77.0, 51.0, 218.0, 228.0, 177.0, 40.0, 41.0, 24.0, 172.0, 39.0, 155.0, 152.0, 116.0, 64.0, 83.0, 188.0, 229.0, 172.0, 263.0, 120.0, 136.0, 241.0, 128.0, 162.0, 160.0, 224.0, 209.0, 213.0, 152.0, 200.0, 180.0, 147.0, 117.0, 165.0, 240.0, 161.0, 157.0, 71.0, 68.0, 252.0, 386.0, 304.0, 272.0, 252.0, 175.0, 176.0, 191.0, 219.0, 204.0, 296.0, 182.0, 330.0, 198.0, 249.0, 267.0, 213.0, 179.0, 231.0, 172.0, 209.0, 233.0, 168.0, 238.0, 184.0, 344.0, 277.0, 296.0, 242.0, 165.0, 149.0, 200.0, 185.0, 215.0, 374.0, 133.0, 201.0, 305.0, 206.0, 166.0, 242.0, 191.0, 303.0, 335.0, 256.0, 297.0, 222.0, 242.0, 192.0, 197.0, 290.0, 216.0, 205.0], 'episode_lengths': [153, 149, 130, 148, 156, 177, 116, 180, 77, 51, 218, 228, 177, 40, 41, 24, 172, 39, 155, 152, 116, 64, 83, 188, 229, 172, 263, 120, 136, 241, 128, 162, 160, 224, 209, 213, 152, 200, 180, 147, 117, 165, 240, 161, 157, 71, 68, 252, 386, 304, 272, 252, 175, 176, 191, 219, 204, 296, 182, 330, 198, 249, 267, 213, 179, 231, 172, 209, 233, 168, 238, 184, 344, 277, 296, 242, 165, 149, 200, 185, 215, 374, 133, 201, 305, 206, 166, 242, 191, 303, 335, 256, 297, 222, 242, 192, 197, 290, 216, 205], 'policy_default_policy_reward': [153.0, 149.0, 130.0, 148.0, 156.0, 177.0, 116.0, 180.0, 77.0, 51.0, 218.0, 228.0, 177.0, 40.0, 41.0, 24.0, 172.0, 39.0, 155.0, 152.0, 116.0, 64.0, 83.0, 
188.0, 229.0, 172.0, 263.0, 120.0, 136.0, 241.0, 128.0, 162.0, 160.0, 224.0, 209.0, 213.0, 152.0, 200.0, 180.0, 147.0, 117.0, 165.0, 240.0, 161.0, 157.0, 71.0, 68.0, 252.0, 386.0, 304.0, 272.0, 252.0, 175.0, 176.0, 191.0, 219.0, 204.0, 296.0, 182.0, 330.0, 198.0, 249.0, 267.0, 213.0, 179.0, 231.0, 172.0, 209.0, 233.0, 168.0, 238.0, 184.0, 344.0, 277.0, 296.0, 242.0, 165.0, 149.0, 200.0, 185.0, 215.0, 374.0, 133.0, 201.0, 305.0, 206.0, 166.0, 242.0, 191.0, 303.0, 335.0, 256.0, 297.0, 222.0, 242.0, 192.0, 197.0, 290.0, 216.0, 205.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.20270511114949755), 'mean_inference_ms': np.float64(0.6917176231146419), 'mean_action_processing_ms': np.float64(0.08134807468208846), 'mean_env_wait_ms': np.float64(0.04069190992184943), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.00440216064453125), 'StateBufferConnector_ms': np.float64(0.0031888484954833984), 'ViewRequirementAgentConnector_ms': np.float64(0.08451128005981445)}, 'num_episodes': 18, 'episode_return_max': 386.0, 'episode_return_min': 24.0, 'episode_return_mean': np.float64(193.75), 'episodes_this_iter': 18}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.37658748), 'cur_kl_coeff': np.float64(0.05000000000000001), 'cur_lr': np.float64(0.00011869961140365569), 'total_loss': np.float64(-0.0008616650679370299), 'policy_loss': np.float64(-0.006050677709682014), 'vf_loss': np.float64(0.004951831043589302), 'vf_explained_var': np.float64(-0.24358116740821512), 'kl': np.float64(0.004743618728305799), 'entropy': np.float64(0.5449451564140217), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(8835.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 40000, 
'num_env_steps_trained': 40000, 'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}",40000,40000,40000,40000,40000,4000,425.412,40000,4000,425.412,2,0,0,4000,"{'cpu_util_percent': np.float64(16.307142857142857), 'ram_util_percent': np.float64(32.10000000000001)}","{'training_iteration_time_ms': 9806.498, 'restore_workers_time_ms': 0.022, 'training_step_time_ms': 9806.439, 'sample_time_ms': 2079.171, 'load_time_ms': 0.346, 'load_throughput': 11545809.648, 'learn_time_ms': 7722.274, 'learn_throughput': 517.982, 'synch_weights_time_ms': 4.12}"
PPO_CartPole-v1_191a4_00001,12000,"{'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",{},"{'episode_reward_max': 12.0, 'episode_reward_min': 8.0, 'episode_reward_mean': np.float64(9.312354312354312), 'episode_len_mean': np.float64(9.312354312354312), 'episode_media': {}, 'episodes_timesteps_total': 3995, 'policy_reward_min': {'default_policy': np.float64(8.0)}, 'policy_reward_max': {'default_policy': np.float64(12.0)}, 'policy_reward_mean': {'default_policy': np.float64(9.312354312354312)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [10.0, 10.0, 10.0, 11.0, 9.0, 9.0, 9.0, 11.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 8.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 11.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 8.0, 9.0, 10.0, 10.0, 9.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 9.0, 8.0, 9.0, 8.0, 9.0, 8.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 8.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 8.0, 9.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 8.0, 10.0, 10.0, 9.0, 8.0, 8.0, 10.0, 11.0, 9.0, 11.0, 8.0, 9.0, 8.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 8.0, 9.0, 8.0, 9.0, 9.0, 8.0, 8.0, 10.0, 10.0, 10.0, 8.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 12.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 9.0, 8.0, 9.0, 9.0, 9.0, 9.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 8.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 8.0, 9.0, 11.0, 11.0, 9.0, 
9.0, 9.0, 11.0, 8.0, 10.0, 10.0, 10.0, 11.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 11.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 9.0, 8.0, 10.0, 9.0, 8.0, 12.0, 10.0, 9.0, 10.0, 9.0, 8.0, 10.0, 8.0, 11.0, 10.0, 9.0, 10.0, 10.0, 9.0, 11.0, 10.0, 9.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 8.0, 9.0, 8.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 11.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 8.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 8.0, 9.0, 8.0, 8.0, 10.0, 8.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0], 'episode_lengths': [10, 10, 10, 11, 9, 9, 9, 11, 9, 9, 10, 10, 9, 10, 8, 10, 9, 9, 9, 10, 9, 8, 10, 9, 10, 10, 9, 9, 9, 11, 9, 10, 9, 10, 9, 9, 9, 9, 8, 9, 10, 10, 9, 8, 9, 9, 10, 10, 9, 9, 10, 9, 8, 9, 8, 9, 8, 9, 9, 10, 9, 10, 10, 8, 9, 10, 9, 10, 9, 9, 9, 9, 9, 9, 9, 10, 9, 10, 9, 10, 10, 9, 9, 8, 9, 9, 10, 8, 10, 10, 9, 9, 10, 8, 9, 8, 10, 10, 9, 8, 8, 10, 11, 9, 11, 8, 9, 8, 9, 9, 10, 9, 10, 10, 9, 10, 9, 9, 9, 10, 9, 10, 9, 8, 9, 8, 9, 9, 8, 8, 10, 10, 10, 8, 8, 10, 9, 10, 9, 9, 10, 10, 10, 10, 10, 8, 9, 9, 10, 10, 9, 10, 8, 10, 9, 10, 10, 8, 10, 9, 10, 10, 9, 9, 9, 10, 10, 9, 9, 10, 12, 10, 10, 9, 9, 9, 9, 10, 9, 9, 10, 9, 9, 9, 9, 10, 9, 10, 9, 10, 9, 10, 9, 10, 9, 9, 9, 10, 8, 9, 10, 8, 9, 9, 9, 10, 8, 10, 10, 9, 10, 9, 10, 9, 10, 10, 10, 9, 9, 8, 9, 9, 9, 9, 9, 8, 10, 9, 10, 9, 9, 10, 9, 9, 9, 9, 9, 8, 9, 9, 10, 10, 10, 10, 9, 9, 9, 10, 8, 9, 11, 11, 9, 9, 9, 11, 8, 10, 10, 10, 11, 10, 9, 9, 10, 9, 10, 10, 10, 9, 9, 8, 10, 10, 8, 9, 10, 9, 9, 10, 9, 10, 9, 10, 9, 9, 9, 9, 9, 11, 9, 9, 10, 9, 9, 10, 10, 10, 10, 9, 8, 10, 
9, 8, 12, 10, 9, 10, 9, 8, 10, 8, 11, 10, 9, 10, 10, 9, 11, 10, 9, 9, 8, 10, 9, 10, 9, 8, 9, 8, 10, 9, 9, 9, 9, 9, 10, 8, 9, 10, 9, 9, 11, 10, 9, 9, 10, 8, 9, 10, 9, 10, 10, 9, 10, 9, 9, 8, 10, 10, 9, 10, 9, 10, 9, 9, 10, 9, 9, 10, 9, 9, 10, 8, 9, 9, 10, 8, 10, 10, 9, 10, 9, 8, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 10, 9, 10, 9, 9, 9, 9, 9, 10, 9, 9, 10, 10, 9, 9, 10, 10, 10, 9, 9, 10, 9, 9, 8, 9, 8, 8, 10, 8, 9, 9, 9, 9, 9, 9], 'policy_default_policy_reward': [10.0, 10.0, 10.0, 11.0, 9.0, 9.0, 9.0, 11.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 8.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 11.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 8.0, 9.0, 10.0, 10.0, 9.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 9.0, 8.0, 9.0, 8.0, 9.0, 8.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 8.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 8.0, 9.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 8.0, 10.0, 10.0, 9.0, 8.0, 8.0, 10.0, 11.0, 9.0, 11.0, 8.0, 9.0, 8.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 8.0, 9.0, 8.0, 9.0, 9.0, 8.0, 8.0, 10.0, 10.0, 10.0, 8.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 8.0, 9.0, 9.0, 10.0, 10.0, 9.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 8.0, 10.0, 9.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 12.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 8.0, 9.0, 9.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 9.0, 8.0, 9.0, 9.0, 9.0, 9.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 8.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 9.0, 9.0, 9.0, 10.0, 8.0, 9.0, 11.0, 11.0, 9.0, 9.0, 9.0, 11.0, 8.0, 10.0, 10.0, 10.0, 11.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 10.0, 
9.0, 9.0, 9.0, 9.0, 9.0, 11.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 9.0, 8.0, 10.0, 9.0, 8.0, 12.0, 10.0, 9.0, 10.0, 9.0, 8.0, 10.0, 8.0, 11.0, 10.0, 9.0, 10.0, 10.0, 9.0, 11.0, 10.0, 9.0, 9.0, 8.0, 10.0, 9.0, 10.0, 9.0, 8.0, 9.0, 8.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 9.0, 9.0, 11.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 10.0, 9.0, 10.0, 10.0, 9.0, 10.0, 9.0, 9.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 9.0, 10.0, 8.0, 10.0, 10.0, 9.0, 10.0, 9.0, 8.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 10.0, 10.0, 9.0, 9.0, 10.0, 10.0, 10.0, 9.0, 9.0, 10.0, 9.0, 9.0, 8.0, 9.0, 8.0, 8.0, 10.0, 8.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.24664928150184673), 'mean_inference_ms': np.float64(0.6979483582889632), 'mean_action_processing_ms': np.float64(0.08254765170745679), 'mean_env_wait_ms': np.float64(0.04153919823678567), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.00457680308735454), 'StateBufferConnector_ms': np.float64(0.004668947144266053), 'ViewRequirementAgentConnector_ms': np.float64(0.08616286279994013)}, 'num_episodes': 429, 'episode_return_max': 12.0, 'episode_return_min': 8.0, 'episode_return_mean': np.float64(9.312354312354312), 'episodes_this_iter': 429}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.08098202), 'cur_kl_coeff': np.float64(0.4500000000000001), 'cur_lr': np.float64(0.10849519052357619), 'total_loss': np.float64(inf), 'policy_loss': np.float64(-0.020868281116809256), 'vf_loss': np.float64(10.0), 'vf_explained_var': np.float64(-0.9706218473372921), 'kl': np.float64(inf), 'entropy': np.float64(0.029401574637879072), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 
'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(2325.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",12000,12000,12000,12000,12000,4000,329.163,12000,4000,329.163,2,0,0,4000,"{'cpu_util_percent': np.float64(36.77222222222222), 'ram_util_percent': np.float64(42.599999999999994)}","{'training_iteration_time_ms': 11485.662, 'restore_workers_time_ms': 0.026, 'training_step_time_ms': 11485.599, 'sample_time_ms': 2218.655, 'load_time_ms': 0.386, 'load_throughput': 10356306.173, 'learn_time_ms': 9261.588, 'learn_throughput': 431.891, 'synch_weights_time_ms': 4.368}"
PPO_CartPole-v1_191a4_00002,36000,"{'num_env_steps_sampled': 36000, 'num_env_steps_trained': 36000, 'num_agent_steps_sampled': 36000, 'num_agent_steps_trained': 36000}",{},"{'episode_reward_max': 344.0, 'episode_reward_min': 14.0, 'episode_reward_mean': np.float64(137.45), 'episode_len_mean': np.float64(137.45), 'episode_media': {}, 'episodes_timesteps_total': 13745, 'policy_reward_min': {'default_policy': np.float64(14.0)}, 'policy_reward_max': {'default_policy': np.float64(344.0)}, 'policy_reward_mean': {'default_policy': np.float64(137.45)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [104.0, 169.0, 134.0, 91.0, 58.0, 115.0, 113.0, 111.0, 24.0, 137.0, 174.0, 162.0, 219.0, 20.0, 82.0, 114.0, 101.0, 209.0, 173.0, 128.0, 138.0, 99.0, 181.0, 138.0, 147.0, 14.0, 70.0, 166.0, 145.0, 96.0, 179.0, 87.0, 135.0, 91.0, 196.0, 205.0, 197.0, 106.0, 121.0, 57.0, 141.0, 176.0, 215.0, 148.0, 162.0, 160.0, 260.0, 63.0, 122.0, 148.0, 159.0, 46.0, 142.0, 136.0, 95.0, 344.0, 119.0, 151.0, 70.0, 175.0, 127.0, 64.0, 109.0, 84.0, 164.0, 175.0, 219.0, 92.0, 14.0, 144.0, 121.0, 141.0, 134.0, 72.0, 138.0, 151.0, 110.0, 153.0, 175.0, 90.0, 135.0, 266.0, 170.0, 162.0, 178.0, 197.0, 298.0, 133.0, 170.0, 159.0, 45.0, 30.0, 232.0, 150.0, 163.0, 120.0, 163.0, 162.0, 138.0, 159.0], 'episode_lengths': [104, 169, 134, 91, 58, 115, 113, 111, 24, 137, 174, 162, 219, 20, 82, 114, 101, 209, 173, 128, 138, 99, 181, 138, 147, 14, 70, 166, 145, 96, 179, 87, 135, 91, 196, 205, 197, 106, 121, 57, 141, 176, 215, 148, 162, 160, 260, 63, 122, 148, 159, 46, 142, 136, 95, 344, 119, 151, 70, 175, 127, 64, 109, 84, 164, 175, 219, 92, 14, 144, 121, 141, 134, 72, 138, 151, 110, 153, 175, 90, 135, 266, 170, 162, 178, 197, 298, 133, 170, 159, 45, 30, 232, 150, 163, 120, 163, 162, 138, 159], 'policy_default_policy_reward': [104.0, 169.0, 134.0, 91.0, 58.0, 115.0, 113.0, 111.0, 24.0, 137.0, 174.0, 162.0, 219.0, 20.0, 82.0, 114.0, 101.0, 209.0, 173.0, 128.0, 138.0, 99.0, 181.0, 138.0, 147.0, 14.0, 70.0, 
166.0, 145.0, 96.0, 179.0, 87.0, 135.0, 91.0, 196.0, 205.0, 197.0, 106.0, 121.0, 57.0, 141.0, 176.0, 215.0, 148.0, 162.0, 160.0, 260.0, 63.0, 122.0, 148.0, 159.0, 46.0, 142.0, 136.0, 95.0, 344.0, 119.0, 151.0, 70.0, 175.0, 127.0, 64.0, 109.0, 84.0, 164.0, 175.0, 219.0, 92.0, 14.0, 144.0, 121.0, 141.0, 134.0, 72.0, 138.0, 151.0, 110.0, 153.0, 175.0, 90.0, 135.0, 266.0, 170.0, 162.0, 178.0, 197.0, 298.0, 133.0, 170.0, 159.0, 45.0, 30.0, 232.0, 150.0, 163.0, 120.0, 163.0, 162.0, 138.0, 159.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.2064835866649936), 'mean_inference_ms': np.float64(0.6956462653244899), 'mean_action_processing_ms': np.float64(0.08152815083245629), 'mean_env_wait_ms': np.float64(0.04109868912027457), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004165172576904297), 'StateBufferConnector_ms': np.float64(0.003198385238647461), 'ViewRequirementAgentConnector_ms': np.float64(0.08608675003051758)}, 'num_episodes': 27, 'episode_return_max': 344.0, 'episode_return_min': 14.0, 'episode_return_mean': np.float64(137.45), 'episodes_this_iter': 27}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.52357596), 'cur_kl_coeff': np.float64(0.3), 'cur_lr': np.float64(0.0007749995641855787), 'total_loss': np.float64(0.0006495361461088822), 'policy_loss': np.float64(-0.00564217398303651), 'vf_loss': np.float64(0.004352648840847573), 'vf_explained_var': np.float64(0.21421237889156547), 'kl': np.float64(0.0064635378154300795), 'entropy': np.float64(0.5701948198259518), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(7905.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 36000, 'num_env_steps_trained': 36000, 'num_agent_steps_sampled': 36000, 
'num_agent_steps_trained': 36000}",36000,36000,36000,36000,36000,4000,399.95,36000,4000,399.95,2,0,0,4000,"{'cpu_util_percent': np.float64(23.814285714285713), 'ram_util_percent': np.float64(36.39999999999999)}","{'training_iteration_time_ms': 9863.956, 'restore_workers_time_ms': 0.022, 'training_step_time_ms': 9863.898, 'sample_time_ms': 2094.955, 'load_time_ms': 0.338, 'load_throughput': 11848316.384, 'learn_time_ms': 7763.942, 'learn_throughput': 515.202, 'synch_weights_time_ms': 4.146}"
PPO_CartPole-v1_191a4_00003,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 55.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(21.55135135135135), 'episode_len_mean': np.float64(21.55135135135135), 'episode_media': {}, 'episodes_timesteps_total': 3987, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(55.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.55135135135135)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [16.0, 30.0, 16.0, 19.0, 25.0, 26.0, 15.0, 28.0, 18.0, 27.0, 9.0, 28.0, 15.0, 38.0, 39.0, 10.0, 29.0, 18.0, 12.0, 18.0, 24.0, 17.0, 13.0, 36.0, 14.0, 20.0, 21.0, 14.0, 29.0, 18.0, 18.0, 19.0, 49.0, 34.0, 19.0, 12.0, 12.0, 29.0, 27.0, 26.0, 14.0, 12.0, 22.0, 13.0, 15.0, 18.0, 17.0, 16.0, 20.0, 22.0, 34.0, 23.0, 10.0, 26.0, 18.0, 20.0, 17.0, 18.0, 16.0, 14.0, 28.0, 23.0, 23.0, 9.0, 23.0, 18.0, 27.0, 15.0, 29.0, 41.0, 32.0, 11.0, 46.0, 25.0, 37.0, 15.0, 17.0, 29.0, 15.0, 19.0, 47.0, 19.0, 53.0, 10.0, 29.0, 13.0, 11.0, 15.0, 28.0, 22.0, 13.0, 13.0, 11.0, 24.0, 21.0, 13.0, 23.0, 18.0, 18.0, 18.0, 12.0, 25.0, 21.0, 21.0, 30.0, 17.0, 18.0, 16.0, 18.0, 17.0, 19.0, 33.0, 15.0, 14.0, 13.0, 22.0, 21.0, 24.0, 15.0, 12.0, 13.0, 14.0, 32.0, 14.0, 55.0, 23.0, 17.0, 15.0, 39.0, 34.0, 11.0, 52.0, 26.0, 48.0, 31.0, 31.0, 15.0, 13.0, 20.0, 13.0, 17.0, 16.0, 29.0, 11.0, 17.0, 9.0, 21.0, 19.0, 21.0, 50.0, 11.0, 14.0, 26.0, 43.0, 33.0, 13.0, 36.0, 11.0, 16.0, 13.0, 13.0, 34.0, 19.0, 15.0, 17.0, 15.0, 28.0, 11.0, 18.0, 15.0, 15.0, 42.0, 15.0, 23.0, 16.0, 16.0, 29.0, 40.0, 12.0, 28.0, 11.0, 32.0, 18.0, 14.0, 13.0], 'episode_lengths': [16, 30, 16, 19, 25, 26, 15, 28, 18, 27, 9, 28, 15, 38, 39, 10, 29, 18, 12, 18, 24, 17, 13, 36, 14, 20, 21, 14, 29, 18, 18, 19, 49, 34, 19, 12, 12, 29, 27, 26, 14, 12, 22, 13, 15, 18, 17, 16, 20, 22, 34, 23, 10, 26, 18, 20, 17, 18, 16, 14, 
28, 23, 23, 9, 23, 18, 27, 15, 29, 41, 32, 11, 46, 25, 37, 15, 17, 29, 15, 19, 47, 19, 53, 10, 29, 13, 11, 15, 28, 22, 13, 13, 11, 24, 21, 13, 23, 18, 18, 18, 12, 25, 21, 21, 30, 17, 18, 16, 18, 17, 19, 33, 15, 14, 13, 22, 21, 24, 15, 12, 13, 14, 32, 14, 55, 23, 17, 15, 39, 34, 11, 52, 26, 48, 31, 31, 15, 13, 20, 13, 17, 16, 29, 11, 17, 9, 21, 19, 21, 50, 11, 14, 26, 43, 33, 13, 36, 11, 16, 13, 13, 34, 19, 15, 17, 15, 28, 11, 18, 15, 15, 42, 15, 23, 16, 16, 29, 40, 12, 28, 11, 32, 18, 14, 13], 'policy_default_policy_reward': [16.0, 30.0, 16.0, 19.0, 25.0, 26.0, 15.0, 28.0, 18.0, 27.0, 9.0, 28.0, 15.0, 38.0, 39.0, 10.0, 29.0, 18.0, 12.0, 18.0, 24.0, 17.0, 13.0, 36.0, 14.0, 20.0, 21.0, 14.0, 29.0, 18.0, 18.0, 19.0, 49.0, 34.0, 19.0, 12.0, 12.0, 29.0, 27.0, 26.0, 14.0, 12.0, 22.0, 13.0, 15.0, 18.0, 17.0, 16.0, 20.0, 22.0, 34.0, 23.0, 10.0, 26.0, 18.0, 20.0, 17.0, 18.0, 16.0, 14.0, 28.0, 23.0, 23.0, 9.0, 23.0, 18.0, 27.0, 15.0, 29.0, 41.0, 32.0, 11.0, 46.0, 25.0, 37.0, 15.0, 17.0, 29.0, 15.0, 19.0, 47.0, 19.0, 53.0, 10.0, 29.0, 13.0, 11.0, 15.0, 28.0, 22.0, 13.0, 13.0, 11.0, 24.0, 21.0, 13.0, 23.0, 18.0, 18.0, 18.0, 12.0, 25.0, 21.0, 21.0, 30.0, 17.0, 18.0, 16.0, 18.0, 17.0, 19.0, 33.0, 15.0, 14.0, 13.0, 22.0, 21.0, 24.0, 15.0, 12.0, 13.0, 14.0, 32.0, 14.0, 55.0, 23.0, 17.0, 15.0, 39.0, 34.0, 11.0, 52.0, 26.0, 48.0, 31.0, 31.0, 15.0, 13.0, 20.0, 13.0, 17.0, 16.0, 29.0, 11.0, 17.0, 9.0, 21.0, 19.0, 21.0, 50.0, 11.0, 14.0, 26.0, 43.0, 33.0, 13.0, 36.0, 11.0, 16.0, 13.0, 13.0, 34.0, 19.0, 15.0, 17.0, 15.0, 28.0, 11.0, 18.0, 15.0, 15.0, 42.0, 15.0, 23.0, 16.0, 16.0, 29.0, 40.0, 12.0, 28.0, 11.0, 32.0, 18.0, 14.0, 13.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.22122594122159714), 'mean_inference_ms': np.float64(0.6863691976919686), 'mean_action_processing_ms': np.float64(0.08124438989558053), 'mean_env_wait_ms': np.float64(0.04252018646368427), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': 
{'ObsPreprocessorConnector_ms': np.float64(0.004531499501821157), 'StateBufferConnector_ms': np.float64(0.0033555159697661527), 'ViewRequirementAgentConnector_ms': np.float64(0.08311220117517419)}, 'num_episodes': 185, 'episode_return_max': 55.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(21.55135135135135), 'episodes_this_iter': 185}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(7.1916866), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.02720165399320536), 'total_loss': np.float64(6.366919623651812), 'policy_loss': np.float64(0.04390071377418535), 'vf_loss': np.float64(6.276510743428302), 'vf_explained_var': np.float64(0.28789040145053657), 'kl': np.float64(0.23254085417327272), 'entropy': np.float64(0.5402095665454224), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,399.604,4000,4000,399.604,2,0,0,4000,"{'cpu_util_percent': np.float64(36.25333333333334), 'ram_util_percent': np.float64(45.22666666666666)}","{'training_iteration_time_ms': 10009.912, 'restore_workers_time_ms': 0.02, 'training_step_time_ms': 10009.858, 'sample_time_ms': 2095.372, 'load_time_ms': 0.503, 'load_throughput': 7955057.373, 'learn_time_ms': 7908.931, 'learn_throughput': 505.757, 'synch_weights_time_ms': 4.287}"
PPO_CartPole-v1_191a4_00004,12000,"{'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",{},"{'episode_reward_max': 116.0, 'episode_reward_min': 10.0, 'episode_reward_mean': np.float64(43.43), 'episode_len_mean': np.float64(43.43), 'episode_media': {}, 'episodes_timesteps_total': 4343, 'policy_reward_min': {'default_policy': np.float64(10.0)}, 'policy_reward_max': {'default_policy': np.float64(116.0)}, 'policy_reward_mean': {'default_policy': np.float64(43.43)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [26.0, 41.0, 10.0, 48.0, 78.0, 22.0, 21.0, 19.0, 33.0, 113.0, 11.0, 65.0, 54.0, 12.0, 65.0, 15.0, 22.0, 91.0, 76.0, 86.0, 100.0, 12.0, 18.0, 28.0, 19.0, 32.0, 71.0, 80.0, 52.0, 21.0, 34.0, 18.0, 18.0, 25.0, 51.0, 87.0, 30.0, 46.0, 40.0, 42.0, 97.0, 69.0, 18.0, 52.0, 52.0, 26.0, 28.0, 13.0, 72.0, 16.0, 15.0, 25.0, 45.0, 25.0, 14.0, 37.0, 59.0, 100.0, 54.0, 64.0, 11.0, 42.0, 40.0, 44.0, 22.0, 92.0, 30.0, 51.0, 72.0, 31.0, 97.0, 21.0, 20.0, 60.0, 23.0, 34.0, 111.0, 20.0, 29.0, 22.0, 101.0, 26.0, 62.0, 15.0, 25.0, 91.0, 11.0, 26.0, 72.0, 54.0, 20.0, 17.0, 30.0, 116.0, 26.0, 25.0, 48.0, 26.0, 16.0, 51.0], 'episode_lengths': [26, 41, 10, 48, 78, 22, 21, 19, 33, 113, 11, 65, 54, 12, 65, 15, 22, 91, 76, 86, 100, 12, 18, 28, 19, 32, 71, 80, 52, 21, 34, 18, 18, 25, 51, 87, 30, 46, 40, 42, 97, 69, 18, 52, 52, 26, 28, 13, 72, 16, 15, 25, 45, 25, 14, 37, 59, 100, 54, 64, 11, 42, 40, 44, 22, 92, 30, 51, 72, 31, 97, 21, 20, 60, 23, 34, 111, 20, 29, 22, 101, 26, 62, 15, 25, 91, 11, 26, 72, 54, 20, 17, 30, 116, 26, 25, 48, 26, 16, 51], 'policy_default_policy_reward': [26.0, 41.0, 10.0, 48.0, 78.0, 22.0, 21.0, 19.0, 33.0, 113.0, 11.0, 65.0, 54.0, 12.0, 65.0, 15.0, 22.0, 91.0, 76.0, 86.0, 100.0, 12.0, 18.0, 28.0, 19.0, 32.0, 71.0, 80.0, 52.0, 21.0, 34.0, 18.0, 18.0, 25.0, 51.0, 87.0, 30.0, 46.0, 40.0, 42.0, 97.0, 69.0, 18.0, 52.0, 52.0, 26.0, 28.0, 13.0, 72.0, 16.0, 15.0, 25.0, 45.0, 25.0, 
14.0, 37.0, 59.0, 100.0, 54.0, 64.0, 11.0, 42.0, 40.0, 44.0, 22.0, 92.0, 30.0, 51.0, 72.0, 31.0, 97.0, 21.0, 20.0, 60.0, 23.0, 34.0, 111.0, 20.0, 29.0, 22.0, 101.0, 26.0, 62.0, 15.0, 25.0, 91.0, 11.0, 26.0, 72.0, 54.0, 20.0, 17.0, 30.0, 116.0, 26.0, 25.0, 48.0, 26.0, 16.0, 51.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.21616805394353655), 'mean_inference_ms': np.float64(0.70578695443594), 'mean_action_processing_ms': np.float64(0.08410695712122358), 'mean_env_wait_ms': np.float64(0.04160179116317246), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004484653472900391), 'StateBufferConnector_ms': np.float64(0.0033702850341796875), 'ViewRequirementAgentConnector_ms': np.float64(0.08547019958496094)}, 'num_episodes': 91, 'episode_return_max': 116.0, 'episode_return_min': 10.0, 'episode_return_mean': np.float64(43.43), 'episodes_this_iter': 91}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.58899194), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(8.126883165634085e-05), 'total_loss': np.float64(0.015533676806096268), 'policy_loss': np.float64(-0.012189933070312103), 'vf_loss': np.float64(0.026299381530493177), 'vf_explained_var': np.float64(0.08277984408922093), 'kl': np.float64(0.007121137788558566), 'entropy': np.float64(0.6334050155455067), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(2325.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 12000, 'num_env_steps_trained': 12000, 'num_agent_steps_sampled': 12000, 'num_agent_steps_trained': 12000}",12000,12000,12000,12000,12000,4000,408.155,12000,4000,408.155,2,0,0,4000,"{'cpu_util_percent': np.float64(39.58571428571428), 'ram_util_percent': 
np.float64(45.471428571428575)}","{'training_iteration_time_ms': 9956.124, 'restore_workers_time_ms': 0.025, 'training_step_time_ms': 9956.064, 'sample_time_ms': 2216.292, 'load_time_ms': 0.377, 'load_throughput': 10598367.656, 'learn_time_ms': 7734.08, 'learn_throughput': 517.191, 'synch_weights_time_ms': 4.7}"
PPO_CartPole-v1_191a4_00005,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 74.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(21.675824175824175), 'episode_len_mean': np.float64(21.675824175824175), 'episode_media': {}, 'episodes_timesteps_total': 3945, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(74.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.675824175824175)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [13.0, 14.0, 12.0, 14.0, 17.0, 13.0, 19.0, 22.0, 18.0, 10.0, 15.0, 14.0, 13.0, 24.0, 56.0, 13.0, 16.0, 15.0, 14.0, 25.0, 13.0, 23.0, 18.0, 41.0, 18.0, 10.0, 12.0, 14.0, 11.0, 26.0, 17.0, 24.0, 10.0, 16.0, 9.0, 29.0, 14.0, 18.0, 12.0, 17.0, 14.0, 15.0, 13.0, 19.0, 28.0, 20.0, 17.0, 22.0, 36.0, 15.0, 33.0, 22.0, 36.0, 12.0, 20.0, 18.0, 30.0, 11.0, 32.0, 10.0, 29.0, 13.0, 18.0, 23.0, 26.0, 74.0, 21.0, 19.0, 19.0, 10.0, 17.0, 13.0, 42.0, 13.0, 59.0, 15.0, 30.0, 30.0, 44.0, 15.0, 21.0, 14.0, 16.0, 25.0, 27.0, 22.0, 15.0, 17.0, 17.0, 36.0, 18.0, 17.0, 19.0, 16.0, 23.0, 12.0, 14.0, 21.0, 29.0, 24.0, 11.0, 20.0, 16.0, 9.0, 22.0, 28.0, 18.0, 19.0, 26.0, 10.0, 17.0, 38.0, 19.0, 21.0, 26.0, 36.0, 18.0, 10.0, 21.0, 49.0, 28.0, 11.0, 15.0, 18.0, 36.0, 18.0, 24.0, 30.0, 25.0, 34.0, 15.0, 18.0, 45.0, 44.0, 19.0, 12.0, 21.0, 48.0, 17.0, 16.0, 21.0, 19.0, 36.0, 13.0, 71.0, 23.0, 18.0, 17.0, 22.0, 12.0, 18.0, 15.0, 19.0, 24.0, 36.0, 13.0, 17.0, 37.0, 25.0, 14.0, 15.0, 12.0, 45.0, 22.0, 16.0, 27.0, 25.0, 55.0, 9.0, 9.0, 43.0, 13.0, 13.0, 12.0, 18.0, 18.0, 13.0, 21.0, 22.0, 16.0, 39.0, 19.0], 'episode_lengths': [13, 14, 12, 14, 17, 13, 19, 22, 18, 10, 15, 14, 13, 24, 56, 13, 16, 15, 14, 25, 13, 23, 18, 41, 18, 10, 12, 14, 11, 26, 17, 24, 10, 16, 9, 29, 14, 18, 12, 17, 14, 15, 13, 19, 28, 20, 17, 22, 36, 15, 33, 22, 36, 12, 20, 18, 30, 11, 32, 10, 29, 13, 18, 23, 
26, 74, 21, 19, 19, 10, 17, 13, 42, 13, 59, 15, 30, 30, 44, 15, 21, 14, 16, 25, 27, 22, 15, 17, 17, 36, 18, 17, 19, 16, 23, 12, 14, 21, 29, 24, 11, 20, 16, 9, 22, 28, 18, 19, 26, 10, 17, 38, 19, 21, 26, 36, 18, 10, 21, 49, 28, 11, 15, 18, 36, 18, 24, 30, 25, 34, 15, 18, 45, 44, 19, 12, 21, 48, 17, 16, 21, 19, 36, 13, 71, 23, 18, 17, 22, 12, 18, 15, 19, 24, 36, 13, 17, 37, 25, 14, 15, 12, 45, 22, 16, 27, 25, 55, 9, 9, 43, 13, 13, 12, 18, 18, 13, 21, 22, 16, 39, 19], 'policy_default_policy_reward': [13.0, 14.0, 12.0, 14.0, 17.0, 13.0, 19.0, 22.0, 18.0, 10.0, 15.0, 14.0, 13.0, 24.0, 56.0, 13.0, 16.0, 15.0, 14.0, 25.0, 13.0, 23.0, 18.0, 41.0, 18.0, 10.0, 12.0, 14.0, 11.0, 26.0, 17.0, 24.0, 10.0, 16.0, 9.0, 29.0, 14.0, 18.0, 12.0, 17.0, 14.0, 15.0, 13.0, 19.0, 28.0, 20.0, 17.0, 22.0, 36.0, 15.0, 33.0, 22.0, 36.0, 12.0, 20.0, 18.0, 30.0, 11.0, 32.0, 10.0, 29.0, 13.0, 18.0, 23.0, 26.0, 74.0, 21.0, 19.0, 19.0, 10.0, 17.0, 13.0, 42.0, 13.0, 59.0, 15.0, 30.0, 30.0, 44.0, 15.0, 21.0, 14.0, 16.0, 25.0, 27.0, 22.0, 15.0, 17.0, 17.0, 36.0, 18.0, 17.0, 19.0, 16.0, 23.0, 12.0, 14.0, 21.0, 29.0, 24.0, 11.0, 20.0, 16.0, 9.0, 22.0, 28.0, 18.0, 19.0, 26.0, 10.0, 17.0, 38.0, 19.0, 21.0, 26.0, 36.0, 18.0, 10.0, 21.0, 49.0, 28.0, 11.0, 15.0, 18.0, 36.0, 18.0, 24.0, 30.0, 25.0, 34.0, 15.0, 18.0, 45.0, 44.0, 19.0, 12.0, 21.0, 48.0, 17.0, 16.0, 21.0, 19.0, 36.0, 13.0, 71.0, 23.0, 18.0, 17.0, 22.0, 12.0, 18.0, 15.0, 19.0, 24.0, 36.0, 13.0, 17.0, 37.0, 25.0, 14.0, 15.0, 12.0, 45.0, 22.0, 16.0, 27.0, 25.0, 55.0, 9.0, 9.0, 43.0, 13.0, 13.0, 12.0, 18.0, 18.0, 13.0, 21.0, 22.0, 16.0, 39.0, 19.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.223068546889689), 'mean_inference_ms': np.float64(0.705487994267935), 'mean_action_processing_ms': np.float64(0.0840943242670164), 'mean_env_wait_ms': np.float64(0.042491642615481294), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004348519084217784), 
'StateBufferConnector_ms': np.float64(0.0033643219497177627), 'ViewRequirementAgentConnector_ms': np.float64(0.0847262340587574)}, 'num_episodes': 182, 'episode_return_max': 74.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(21.675824175824175), 'episodes_this_iter': 182}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(1.4478945), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.00843513103595342), 'total_loss': np.float64(0.07624960629474772), 'policy_loss': np.float64(-0.007256694225173804), 'vf_loss': np.float64(0.07971334639631251), 'vf_explained_var': np.float64(0.6886625298248824), 'kl': np.float64(0.018964765817023967), 'entropy': np.float64(0.6754167434348854), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,406.341,4000,4000,406.341,2,0,0,4000,"{'cpu_util_percent': np.float64(36.66666666666666), 'ram_util_percent': np.float64(45.46666666666667)}","{'training_iteration_time_ms': 9843.966, 'restore_workers_time_ms': 0.023, 'training_step_time_ms': 9843.911, 'sample_time_ms': 2157.003, 'load_time_ms': 0.53, 'load_throughput': 7550502.25, 'learn_time_ms': 7681.368, 'learn_throughput': 520.741, 'synch_weights_time_ms': 4.309}"
PPO_CartPole-v1_191a4_00006,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 81.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(21.47027027027027), 'episode_len_mean': np.float64(21.47027027027027), 'episode_media': {}, 'episodes_timesteps_total': 3972, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(81.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.47027027027027)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [12.0, 12.0, 21.0, 13.0, 14.0, 37.0, 35.0, 27.0, 11.0, 13.0, 22.0, 21.0, 12.0, 23.0, 39.0, 50.0, 44.0, 23.0, 21.0, 15.0, 41.0, 10.0, 22.0, 11.0, 11.0, 11.0, 11.0, 14.0, 11.0, 10.0, 16.0, 45.0, 9.0, 18.0, 19.0, 27.0, 16.0, 18.0, 35.0, 11.0, 18.0, 16.0, 25.0, 21.0, 22.0, 15.0, 9.0, 13.0, 35.0, 25.0, 17.0, 31.0, 37.0, 19.0, 46.0, 61.0, 17.0, 13.0, 13.0, 25.0, 21.0, 24.0, 21.0, 11.0, 35.0, 11.0, 11.0, 13.0, 24.0, 14.0, 12.0, 11.0, 26.0, 23.0, 19.0, 22.0, 15.0, 13.0, 42.0, 21.0, 31.0, 32.0, 31.0, 11.0, 44.0, 10.0, 60.0, 44.0, 18.0, 18.0, 18.0, 14.0, 26.0, 14.0, 16.0, 26.0, 29.0, 12.0, 14.0, 11.0, 21.0, 12.0, 20.0, 24.0, 43.0, 34.0, 29.0, 19.0, 19.0, 12.0, 16.0, 26.0, 27.0, 10.0, 11.0, 20.0, 17.0, 10.0, 18.0, 20.0, 26.0, 13.0, 20.0, 56.0, 16.0, 13.0, 12.0, 15.0, 34.0, 30.0, 81.0, 22.0, 19.0, 31.0, 21.0, 16.0, 11.0, 28.0, 19.0, 17.0, 37.0, 13.0, 21.0, 13.0, 24.0, 15.0, 15.0, 13.0, 12.0, 12.0, 14.0, 24.0, 16.0, 18.0, 12.0, 28.0, 22.0, 20.0, 30.0, 19.0, 22.0, 22.0, 31.0, 25.0, 19.0, 13.0, 23.0, 15.0, 28.0, 13.0, 47.0, 14.0, 49.0, 15.0, 16.0, 17.0, 14.0, 9.0, 15.0, 14.0, 15.0, 23.0, 11.0, 19.0, 20.0], 'episode_lengths': [12, 12, 21, 13, 14, 37, 35, 27, 11, 13, 22, 21, 12, 23, 39, 50, 44, 23, 21, 15, 41, 10, 22, 11, 11, 11, 11, 14, 11, 10, 16, 45, 9, 18, 19, 27, 16, 18, 35, 11, 18, 16, 25, 21, 22, 15, 9, 13, 35, 25, 17, 31, 37, 19, 46, 61, 17, 13, 13, 25, 21, 
24, 21, 11, 35, 11, 11, 13, 24, 14, 12, 11, 26, 23, 19, 22, 15, 13, 42, 21, 31, 32, 31, 11, 44, 10, 60, 44, 18, 18, 18, 14, 26, 14, 16, 26, 29, 12, 14, 11, 21, 12, 20, 24, 43, 34, 29, 19, 19, 12, 16, 26, 27, 10, 11, 20, 17, 10, 18, 20, 26, 13, 20, 56, 16, 13, 12, 15, 34, 30, 81, 22, 19, 31, 21, 16, 11, 28, 19, 17, 37, 13, 21, 13, 24, 15, 15, 13, 12, 12, 14, 24, 16, 18, 12, 28, 22, 20, 30, 19, 22, 22, 31, 25, 19, 13, 23, 15, 28, 13, 47, 14, 49, 15, 16, 17, 14, 9, 15, 14, 15, 23, 11, 19, 20], 'policy_default_policy_reward': [12.0, 12.0, 21.0, 13.0, 14.0, 37.0, 35.0, 27.0, 11.0, 13.0, 22.0, 21.0, 12.0, 23.0, 39.0, 50.0, 44.0, 23.0, 21.0, 15.0, 41.0, 10.0, 22.0, 11.0, 11.0, 11.0, 11.0, 14.0, 11.0, 10.0, 16.0, 45.0, 9.0, 18.0, 19.0, 27.0, 16.0, 18.0, 35.0, 11.0, 18.0, 16.0, 25.0, 21.0, 22.0, 15.0, 9.0, 13.0, 35.0, 25.0, 17.0, 31.0, 37.0, 19.0, 46.0, 61.0, 17.0, 13.0, 13.0, 25.0, 21.0, 24.0, 21.0, 11.0, 35.0, 11.0, 11.0, 13.0, 24.0, 14.0, 12.0, 11.0, 26.0, 23.0, 19.0, 22.0, 15.0, 13.0, 42.0, 21.0, 31.0, 32.0, 31.0, 11.0, 44.0, 10.0, 60.0, 44.0, 18.0, 18.0, 18.0, 14.0, 26.0, 14.0, 16.0, 26.0, 29.0, 12.0, 14.0, 11.0, 21.0, 12.0, 20.0, 24.0, 43.0, 34.0, 29.0, 19.0, 19.0, 12.0, 16.0, 26.0, 27.0, 10.0, 11.0, 20.0, 17.0, 10.0, 18.0, 20.0, 26.0, 13.0, 20.0, 56.0, 16.0, 13.0, 12.0, 15.0, 34.0, 30.0, 81.0, 22.0, 19.0, 31.0, 21.0, 16.0, 11.0, 28.0, 19.0, 17.0, 37.0, 13.0, 21.0, 13.0, 24.0, 15.0, 15.0, 13.0, 12.0, 12.0, 14.0, 24.0, 16.0, 18.0, 12.0, 28.0, 22.0, 20.0, 30.0, 19.0, 22.0, 22.0, 31.0, 25.0, 19.0, 13.0, 23.0, 15.0, 28.0, 13.0, 47.0, 14.0, 49.0, 15.0, 16.0, 17.0, 14.0, 9.0, 15.0, 14.0, 15.0, 23.0, 11.0, 19.0, 20.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.22098627704493765), 'mean_inference_ms': np.float64(0.6902100228798944), 'mean_action_processing_ms': np.float64(0.08145220052310864), 'mean_env_wait_ms': np.float64(0.04177061496449536), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': 
{'ObsPreprocessorConnector_ms': np.float64(0.004284317429001267), 'StateBufferConnector_ms': np.float64(0.0033821930756440036), 'ViewRequirementAgentConnector_ms': np.float64(0.08304583059774863)}, 'num_episodes': 185, 'episode_return_max': 81.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(21.47027027027027), 'episodes_this_iter': 185}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(1.1772361), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(0.0011407113612718776), 'total_loss': np.float64(0.28568219874850326), 'policy_loss': np.float64(-0.011556516021430012), 'vf_loss': np.float64(0.2940322115356403), 'vf_explained_var': np.float64(0.5407953379615661), 'kl': np.float64(0.01603252647911462), 'entropy': np.float64(0.6774524308660979), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,408.44,4000,4000,408.44,2,0,0,4000,"{'cpu_util_percent': np.float64(38.88571428571429), 'ram_util_percent': np.float64(44.67857142857143)}","{'training_iteration_time_ms': 9793.363, 'restore_workers_time_ms': 0.023, 'training_step_time_ms': 9793.308, 'sample_time_ms': 2101.357, 'load_time_ms': 0.499, 'load_throughput': 8008217.661, 'learn_time_ms': 7686.687, 'learn_throughput': 520.38, 'synch_weights_time_ms': 4.036}"
PPO_CartPole-v1_191a4_00007,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 72.0, 'episode_reward_min': 8.0, 'episode_reward_mean': np.float64(21.010695187165776), 'episode_len_mean': np.float64(21.010695187165776), 'episode_media': {}, 'episodes_timesteps_total': 3929, 'policy_reward_min': {'default_policy': np.float64(8.0)}, 'policy_reward_max': {'default_policy': np.float64(72.0)}, 'policy_reward_mean': {'default_policy': np.float64(21.010695187165776)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [12.0, 15.0, 26.0, 10.0, 22.0, 32.0, 18.0, 43.0, 28.0, 36.0, 21.0, 18.0, 21.0, 18.0, 18.0, 22.0, 47.0, 17.0, 11.0, 11.0, 24.0, 13.0, 17.0, 18.0, 22.0, 24.0, 28.0, 15.0, 51.0, 10.0, 11.0, 11.0, 23.0, 30.0, 23.0, 13.0, 22.0, 16.0, 25.0, 24.0, 35.0, 36.0, 22.0, 15.0, 18.0, 13.0, 18.0, 15.0, 12.0, 14.0, 21.0, 12.0, 29.0, 12.0, 72.0, 11.0, 29.0, 60.0, 13.0, 23.0, 11.0, 25.0, 22.0, 29.0, 15.0, 23.0, 13.0, 26.0, 28.0, 14.0, 20.0, 10.0, 15.0, 11.0, 17.0, 12.0, 14.0, 42.0, 14.0, 15.0, 19.0, 10.0, 10.0, 26.0, 12.0, 23.0, 11.0, 22.0, 11.0, 11.0, 15.0, 23.0, 32.0, 11.0, 27.0, 11.0, 12.0, 34.0, 24.0, 13.0, 10.0, 10.0, 27.0, 14.0, 10.0, 25.0, 10.0, 20.0, 16.0, 16.0, 15.0, 28.0, 32.0, 14.0, 17.0, 13.0, 13.0, 13.0, 32.0, 8.0, 29.0, 13.0, 12.0, 54.0, 19.0, 23.0, 14.0, 17.0, 11.0, 29.0, 17.0, 11.0, 19.0, 14.0, 55.0, 11.0, 21.0, 17.0, 13.0, 36.0, 19.0, 18.0, 11.0, 17.0, 25.0, 18.0, 24.0, 20.0, 17.0, 14.0, 16.0, 34.0, 21.0, 20.0, 19.0, 39.0, 11.0, 36.0, 33.0, 22.0, 21.0, 23.0, 35.0, 15.0, 12.0, 35.0, 11.0, 16.0, 10.0, 11.0, 39.0, 21.0, 15.0, 20.0, 13.0, 24.0, 23.0, 25.0, 12.0, 28.0, 10.0, 30.0, 43.0, 23.0, 35.0, 36.0, 52.0], 'episode_lengths': [12, 15, 26, 10, 22, 32, 18, 43, 28, 36, 21, 18, 21, 18, 18, 22, 47, 17, 11, 11, 24, 13, 17, 18, 22, 24, 28, 15, 51, 10, 11, 11, 23, 30, 23, 13, 22, 16, 25, 24, 35, 36, 22, 15, 18, 13, 18, 15, 12, 14, 21, 12, 29, 12, 72, 11, 
29, 60, 13, 23, 11, 25, 22, 29, 15, 23, 13, 26, 28, 14, 20, 10, 15, 11, 17, 12, 14, 42, 14, 15, 19, 10, 10, 26, 12, 23, 11, 22, 11, 11, 15, 23, 32, 11, 27, 11, 12, 34, 24, 13, 10, 10, 27, 14, 10, 25, 10, 20, 16, 16, 15, 28, 32, 14, 17, 13, 13, 13, 32, 8, 29, 13, 12, 54, 19, 23, 14, 17, 11, 29, 17, 11, 19, 14, 55, 11, 21, 17, 13, 36, 19, 18, 11, 17, 25, 18, 24, 20, 17, 14, 16, 34, 21, 20, 19, 39, 11, 36, 33, 22, 21, 23, 35, 15, 12, 35, 11, 16, 10, 11, 39, 21, 15, 20, 13, 24, 23, 25, 12, 28, 10, 30, 43, 23, 35, 36, 52], 'policy_default_policy_reward': [12.0, 15.0, 26.0, 10.0, 22.0, 32.0, 18.0, 43.0, 28.0, 36.0, 21.0, 18.0, 21.0, 18.0, 18.0, 22.0, 47.0, 17.0, 11.0, 11.0, 24.0, 13.0, 17.0, 18.0, 22.0, 24.0, 28.0, 15.0, 51.0, 10.0, 11.0, 11.0, 23.0, 30.0, 23.0, 13.0, 22.0, 16.0, 25.0, 24.0, 35.0, 36.0, 22.0, 15.0, 18.0, 13.0, 18.0, 15.0, 12.0, 14.0, 21.0, 12.0, 29.0, 12.0, 72.0, 11.0, 29.0, 60.0, 13.0, 23.0, 11.0, 25.0, 22.0, 29.0, 15.0, 23.0, 13.0, 26.0, 28.0, 14.0, 20.0, 10.0, 15.0, 11.0, 17.0, 12.0, 14.0, 42.0, 14.0, 15.0, 19.0, 10.0, 10.0, 26.0, 12.0, 23.0, 11.0, 22.0, 11.0, 11.0, 15.0, 23.0, 32.0, 11.0, 27.0, 11.0, 12.0, 34.0, 24.0, 13.0, 10.0, 10.0, 27.0, 14.0, 10.0, 25.0, 10.0, 20.0, 16.0, 16.0, 15.0, 28.0, 32.0, 14.0, 17.0, 13.0, 13.0, 13.0, 32.0, 8.0, 29.0, 13.0, 12.0, 54.0, 19.0, 23.0, 14.0, 17.0, 11.0, 29.0, 17.0, 11.0, 19.0, 14.0, 55.0, 11.0, 21.0, 17.0, 13.0, 36.0, 19.0, 18.0, 11.0, 17.0, 25.0, 18.0, 24.0, 20.0, 17.0, 14.0, 16.0, 34.0, 21.0, 20.0, 19.0, 39.0, 11.0, 36.0, 33.0, 22.0, 21.0, 23.0, 35.0, 15.0, 12.0, 35.0, 11.0, 16.0, 10.0, 11.0, 39.0, 21.0, 15.0, 20.0, 13.0, 24.0, 23.0, 25.0, 12.0, 28.0, 10.0, 30.0, 43.0, 23.0, 35.0, 36.0, 52.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.2263765500720226), 'mean_inference_ms': np.float64(0.7089788519634345), 'mean_action_processing_ms': np.float64(0.08454907655262921), 'mean_env_wait_ms': np.float64(0.04289150537504307), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 
'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004479464362649357), 'StateBufferConnector_ms': np.float64(0.0034719864952372993), 'ViewRequirementAgentConnector_ms': np.float64(0.08508202863887032)}, 'num_episodes': 187, 'episode_return_max': 72.0, 'episode_return_min': 8.0, 'episode_return_mean': np.float64(21.010695187165776), 'episodes_this_iter': 187}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(3.6509023), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(6.274147546121654e-05), 'total_loss': np.float64(5.961005552097033), 'policy_loss': np.float64(-0.01725069465245851), 'vf_loss': np.float64(5.974096602265552), 'vf_explained_var': np.float64(-0.8107874786341062), 'kl': np.float64(0.02079825449742552), 'entropy': np.float64(0.6730567523869135), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,398.892,4000,4000,398.892,2,0,0,4000,"{'cpu_util_percent': np.float64(38.07333333333333), 'ram_util_percent': np.float64(44.85333333333333)}","{'training_iteration_time_ms': 10027.788, 'restore_workers_time_ms': 0.021, 'training_step_time_ms': 10027.731, 'sample_time_ms': 2189.972, 'load_time_ms': 0.475, 'load_throughput': 8413849.549, 'learn_time_ms': 7832.494, 'learn_throughput': 510.693, 'synch_weights_time_ms': 4.11}"
PPO_CartPole-v1_191a4_00008,40000,"{'num_env_steps_sampled': 40000, 'num_env_steps_trained': 40000, 'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}",{},"{'episode_reward_max': 500.0, 'episode_reward_min': 41.0, 'episode_reward_mean': np.float64(262.17), 'episode_len_mean': np.float64(262.17), 'episode_media': {}, 'episodes_timesteps_total': 26217, 'policy_reward_min': {'default_policy': np.float64(41.0)}, 'policy_reward_max': {'default_policy': np.float64(500.0)}, 'policy_reward_mean': {'default_policy': np.float64(262.17)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [73.0, 114.0, 295.0, 96.0, 168.0, 204.0, 210.0, 137.0, 52.0, 41.0, 122.0, 254.0, 121.0, 237.0, 134.0, 193.0, 178.0, 243.0, 278.0, 181.0, 252.0, 179.0, 225.0, 226.0, 150.0, 268.0, 203.0, 147.0, 217.0, 346.0, 139.0, 265.0, 241.0, 272.0, 155.0, 378.0, 320.0, 201.0, 330.0, 248.0, 265.0, 254.0, 232.0, 221.0, 294.0, 198.0, 164.0, 336.0, 220.0, 236.0, 208.0, 374.0, 223.0, 246.0, 226.0, 289.0, 210.0, 205.0, 227.0, 388.0, 231.0, 240.0, 351.0, 238.0, 294.0, 294.0, 272.0, 500.0, 500.0, 224.0, 310.0, 386.0, 214.0, 245.0, 300.0, 331.0, 353.0, 221.0, 291.0, 280.0, 438.0, 350.0, 465.0, 391.0, 292.0, 276.0, 209.0, 388.0, 213.0, 299.0, 500.0, 206.0, 306.0, 305.0, 431.0, 310.0, 463.0, 452.0, 239.0, 500.0], 'episode_lengths': [73, 114, 295, 96, 168, 204, 210, 137, 52, 41, 122, 254, 121, 237, 134, 193, 178, 243, 278, 181, 252, 179, 225, 226, 150, 268, 203, 147, 217, 346, 139, 265, 241, 272, 155, 378, 320, 201, 330, 248, 265, 254, 232, 221, 294, 198, 164, 336, 220, 236, 208, 374, 223, 246, 226, 289, 210, 205, 227, 388, 231, 240, 351, 238, 294, 294, 272, 500, 500, 224, 310, 386, 214, 245, 300, 331, 353, 221, 291, 280, 438, 350, 465, 391, 292, 276, 209, 388, 213, 299, 500, 206, 306, 305, 431, 310, 463, 452, 239, 500], 'policy_default_policy_reward': [73.0, 114.0, 295.0, 96.0, 168.0, 204.0, 210.0, 137.0, 52.0, 41.0, 122.0, 254.0, 121.0, 237.0, 134.0, 193.0, 178.0, 243.0, 278.0, 181.0, 252.0, 
179.0, 225.0, 226.0, 150.0, 268.0, 203.0, 147.0, 217.0, 346.0, 139.0, 265.0, 241.0, 272.0, 155.0, 378.0, 320.0, 201.0, 330.0, 248.0, 265.0, 254.0, 232.0, 221.0, 294.0, 198.0, 164.0, 336.0, 220.0, 236.0, 208.0, 374.0, 223.0, 246.0, 226.0, 289.0, 210.0, 205.0, 227.0, 388.0, 231.0, 240.0, 351.0, 238.0, 294.0, 294.0, 272.0, 500.0, 500.0, 224.0, 310.0, 386.0, 214.0, 245.0, 300.0, 331.0, 353.0, 221.0, 291.0, 280.0, 438.0, 350.0, 465.0, 391.0, 292.0, 276.0, 209.0, 388.0, 213.0, 299.0, 500.0, 206.0, 306.0, 305.0, 431.0, 310.0, 463.0, 452.0, 239.0, 500.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.1974792995616771), 'mean_inference_ms': np.float64(0.675338905446882), 'mean_action_processing_ms': np.float64(0.07920919711218329), 'mean_env_wait_ms': np.float64(0.039645796888998934), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.0041201114654541016), 'StateBufferConnector_ms': np.float64(0.003187894821166992), 'ViewRequirementAgentConnector_ms': np.float64(0.08067917823791504)}, 'num_episodes': 13, 'episode_return_max': 500.0, 'episode_return_min': 41.0, 'episode_return_mean': np.float64(262.17), 'episodes_this_iter': 13}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.7365763), 'cur_kl_coeff': np.float64(0.0375), 'cur_lr': np.float64(5.600895220717451e-05), 'total_loss': np.float64(0.03933217683052819), 'policy_loss': np.float64(-0.005168025131006875), 'vf_loss': np.float64(0.044386060029583244), 'vf_explained_var': np.float64(-0.4231460569366332), 'kl': np.float64(0.0030437823495647826), 'entropy': np.float64(0.5371327212741298), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(8835.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 40000, 
'num_env_steps_trained': 40000, 'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}",40000,40000,40000,40000,40000,4000,428.305,40000,4000,428.305,2,0,0,4000,"{'cpu_util_percent': np.float64(8.235714285714286), 'ram_util_percent': np.float64(27.0)}","{'training_iteration_time_ms': 9540.723, 'restore_workers_time_ms': 0.02, 'training_step_time_ms': 9540.673, 'sample_time_ms': 2013.951, 'load_time_ms': 0.322, 'load_throughput': 12413774.325, 'learn_time_ms': 7521.935, 'learn_throughput': 531.778, 'synch_weights_time_ms': 3.967}"
PPO_CartPole-v1_191a4_00009,4000,"{'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",{},"{'episode_reward_max': 80.0, 'episode_reward_min': 9.0, 'episode_reward_mean': np.float64(22.338983050847457), 'episode_len_mean': np.float64(22.338983050847457), 'episode_media': {}, 'episodes_timesteps_total': 3954, 'policy_reward_min': {'default_policy': np.float64(9.0)}, 'policy_reward_max': {'default_policy': np.float64(80.0)}, 'policy_reward_mean': {'default_policy': np.float64(22.338983050847457)}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [14.0, 17.0, 22.0, 14.0, 10.0, 16.0, 42.0, 18.0, 10.0, 24.0, 44.0, 21.0, 17.0, 21.0, 21.0, 19.0, 39.0, 14.0, 12.0, 26.0, 14.0, 11.0, 26.0, 13.0, 11.0, 41.0, 20.0, 11.0, 11.0, 29.0, 15.0, 12.0, 14.0, 19.0, 15.0, 54.0, 14.0, 36.0, 16.0, 21.0, 13.0, 34.0, 23.0, 19.0, 32.0, 26.0, 17.0, 14.0, 16.0, 21.0, 17.0, 18.0, 38.0, 20.0, 17.0, 13.0, 29.0, 20.0, 10.0, 22.0, 12.0, 11.0, 53.0, 13.0, 22.0, 13.0, 13.0, 27.0, 26.0, 12.0, 17.0, 15.0, 15.0, 27.0, 11.0, 19.0, 13.0, 18.0, 14.0, 19.0, 20.0, 11.0, 12.0, 30.0, 9.0, 16.0, 43.0, 31.0, 18.0, 16.0, 33.0, 19.0, 25.0, 33.0, 42.0, 13.0, 14.0, 11.0, 22.0, 12.0, 17.0, 17.0, 15.0, 20.0, 38.0, 13.0, 19.0, 12.0, 12.0, 14.0, 44.0, 14.0, 33.0, 23.0, 51.0, 13.0, 17.0, 17.0, 13.0, 20.0, 51.0, 32.0, 15.0, 44.0, 39.0, 32.0, 30.0, 12.0, 15.0, 37.0, 49.0, 18.0, 80.0, 16.0, 12.0, 35.0, 11.0, 16.0, 15.0, 34.0, 18.0, 20.0, 38.0, 25.0, 14.0, 53.0, 17.0, 18.0, 18.0, 40.0, 21.0, 9.0, 56.0, 25.0, 29.0, 12.0, 47.0, 61.0, 16.0, 42.0, 14.0, 32.0, 14.0, 47.0, 27.0, 11.0, 14.0, 21.0, 12.0, 9.0, 13.0, 26.0, 21.0, 15.0, 12.0, 11.0, 18.0], 'episode_lengths': [14, 17, 22, 14, 10, 16, 42, 18, 10, 24, 44, 21, 17, 21, 21, 19, 39, 14, 12, 26, 14, 11, 26, 13, 11, 41, 20, 11, 11, 29, 15, 12, 14, 19, 15, 54, 14, 36, 16, 21, 13, 34, 23, 19, 32, 26, 17, 14, 16, 21, 17, 18, 38, 20, 17, 13, 29, 20, 10, 22, 12, 11, 53, 13, 22, 13, 13, 27, 26, 12, 17, 
15, 15, 27, 11, 19, 13, 18, 14, 19, 20, 11, 12, 30, 9, 16, 43, 31, 18, 16, 33, 19, 25, 33, 42, 13, 14, 11, 22, 12, 17, 17, 15, 20, 38, 13, 19, 12, 12, 14, 44, 14, 33, 23, 51, 13, 17, 17, 13, 20, 51, 32, 15, 44, 39, 32, 30, 12, 15, 37, 49, 18, 80, 16, 12, 35, 11, 16, 15, 34, 18, 20, 38, 25, 14, 53, 17, 18, 18, 40, 21, 9, 56, 25, 29, 12, 47, 61, 16, 42, 14, 32, 14, 47, 27, 11, 14, 21, 12, 9, 13, 26, 21, 15, 12, 11, 18], 'policy_default_policy_reward': [14.0, 17.0, 22.0, 14.0, 10.0, 16.0, 42.0, 18.0, 10.0, 24.0, 44.0, 21.0, 17.0, 21.0, 21.0, 19.0, 39.0, 14.0, 12.0, 26.0, 14.0, 11.0, 26.0, 13.0, 11.0, 41.0, 20.0, 11.0, 11.0, 29.0, 15.0, 12.0, 14.0, 19.0, 15.0, 54.0, 14.0, 36.0, 16.0, 21.0, 13.0, 34.0, 23.0, 19.0, 32.0, 26.0, 17.0, 14.0, 16.0, 21.0, 17.0, 18.0, 38.0, 20.0, 17.0, 13.0, 29.0, 20.0, 10.0, 22.0, 12.0, 11.0, 53.0, 13.0, 22.0, 13.0, 13.0, 27.0, 26.0, 12.0, 17.0, 15.0, 15.0, 27.0, 11.0, 19.0, 13.0, 18.0, 14.0, 19.0, 20.0, 11.0, 12.0, 30.0, 9.0, 16.0, 43.0, 31.0, 18.0, 16.0, 33.0, 19.0, 25.0, 33.0, 42.0, 13.0, 14.0, 11.0, 22.0, 12.0, 17.0, 17.0, 15.0, 20.0, 38.0, 13.0, 19.0, 12.0, 12.0, 14.0, 44.0, 14.0, 33.0, 23.0, 51.0, 13.0, 17.0, 17.0, 13.0, 20.0, 51.0, 32.0, 15.0, 44.0, 39.0, 32.0, 30.0, 12.0, 15.0, 37.0, 49.0, 18.0, 80.0, 16.0, 12.0, 35.0, 11.0, 16.0, 15.0, 34.0, 18.0, 20.0, 38.0, 25.0, 14.0, 53.0, 17.0, 18.0, 18.0, 40.0, 21.0, 9.0, 56.0, 25.0, 29.0, 12.0, 47.0, 61.0, 16.0, 42.0, 14.0, 32.0, 14.0, 47.0, 27.0, 11.0, 14.0, 21.0, 12.0, 9.0, 13.0, 26.0, 21.0, 15.0, 12.0, 11.0, 18.0]}, 'sampler_perf': {'mean_raw_obs_processing_ms': np.float64(0.22878931180848605), 'mean_inference_ms': np.float64(0.71699312189872), 'mean_action_processing_ms': np.float64(0.08440754341623496), 'mean_env_wait_ms': np.float64(0.04434275310817298), 'mean_env_render_ms': np.float64(0.0)}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': np.float64(0.004892564762783589), 'StateBufferConnector_ms': np.float64(0.0035308848666605976), 
'ViewRequirementAgentConnector_ms': np.float64(0.08774463739772301)}, 'num_episodes': 177, 'episode_return_max': 80.0, 'episode_return_min': 9.0, 'episode_return_mean': np.float64(22.338983050847457), 'episodes_this_iter': 177}",{},"{'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(1.589623), 'cur_kl_coeff': np.float64(0.20000000000000004), 'cur_lr': np.float64(3.0445361465236456e-05), 'total_loss': np.float64(9.160076042400894), 'policy_loss': np.float64(-0.03459041225022927), 'vf_loss': np.float64(9.189447409106839), 'vf_explained_var': np.float64(-0.01805909865645952), 'kl': np.float64(0.02609530152609836), 'entropy': np.float64(0.6678357378129036), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(465.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}",4000,4000,4000,4000,4000,4000,393.21,4000,4000,393.21,2,0,0,4000,"{'cpu_util_percent': np.float64(29.2), 'ram_util_percent': np.float64(40.886666666666656)}","{'training_iteration_time_ms': 10172.7, 'restore_workers_time_ms': 0.021, 'training_step_time_ms': 10172.638, 'sample_time_ms': 2184.732, 'load_time_ms': 0.507, 'load_throughput': 7887736.718, 'learn_time_ms': 7982.339, 'learn_throughput': 501.106, 'synch_weights_time_ms': 4.405}"


(PPO pid=524334) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/lasse/ray_minicourse/lesson_2/ray_results/nb_3/ppo_cartpole/PPO_CartPole-v1_191a4_00003_3_gamma=0.9999,lr=0.0272_2024-11-30_03-20-51/checkpoint_000000)
(PPO pid=524332) Install gputil for GPU system monitoring. [repeated 4x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(PPO pid=525267) Install gputil for GPU system monitoring.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
(PPO pid=525267) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/lasse/ray_minicourse/lesson_2/ray_results/nb_3/ppo_cartpole/PPO_CartPole-v1_191a4_00005_5_gamma=0.5000,lr=0.0084_2024-11-30_03-20-51/checkpoint_000000)
NaN or Inf found in input tensor.
NaN

ResultGrid<[
  Result(
    metrics={'custom_metrics': {}, 'episode_media': {}, 'info': {'learner': {'default_policy': {'learner_stats': {'allreduce_latency': np.float64(0.0), 'grad_gnorm': np.float32(0.37658748), 'cur_kl_coeff': np.float64(0.05000000000000001), 'cur_lr': np.float64(0.00011869961140365569), 'total_loss': np.float64(-0.0008616650679370299), 'policy_loss': np.float64(-0.006050677709682014), 'vf_loss': np.float64(0.004951831043589302), 'vf_explained_var': np.float64(-0.24358116740821512), 'kl': np.float64(0.004743618728305799), 'entropy': np.float64(0.5449451564140217), 'entropy_coeff': np.float64(0.0)}, 'model': {}, 'custom_metrics': {}, 'num_agent_steps_trained': np.float64(128.0), 'num_grad_updates_lifetime': np.float64(8835.5), 'diff_num_grad_updates_vs_sampler_policy': np.float64(464.5)}}, 'num_env_steps_sampled': 40000, 'num_env_steps_trained': 40000, 'num_agent_steps_sampled': 40000, 'num_agent_steps_trained': 40000}, 'env_runners': {'episode_reward_max': 386.0, 'ep

Pay attention to the `agent_timesteps_total` column of the `Trial progress` table in the logs above. Not all trials completed the pre-defined $40000$ steps (training iterations * batch size = 10 * 4000). The ASHA scheduler terminated the unpromising trials before they finished all training iterations, which saves time and speeds up the hyperparameter optimization process.
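To see at which points a trial can be stopped, it helps to work out the successive-halving checkpoints ("rungs") implied by the scheduler settings used in this notebook (`grace_period=1`, `reduction_factor=3`, `max_t=10`). The snippet below is a simplified sketch, not Ray's internal code: the helper `asha_rungs` is a hypothetical name introduced for illustration, and it assumes a single bracket whose rungs are the grace period multiplied by successive powers of the reduction factor.

```python
def asha_rungs(grace_period: int, reduction_factor: int, max_t: int) -> list[int]:
    """Return the iterations at which trials are compared (simplified, single bracket)."""
    rungs = []
    t = grace_period
    while t <= max_t:
        rungs.append(t)
        t *= reduction_factor  # each rung is reduction_factor times later than the previous one
    return rungs


# Scheduler settings taken from this notebook.
rungs = asha_rungs(grace_period=1, reduction_factor=3, max_t=10)

# Each training iteration samples 4000 environment steps in this notebook.
steps_per_iteration = 4000
rung_timesteps = [r * steps_per_iteration for r in rungs]

print(rungs)           # [1, 3, 9]
print(rung_timesteps)  # [4000, 12000, 36000]
```

Under these assumptions, a trial that is stopped early should end at one of these timestep counts, which agrees with the $4000$ and $12000$ values that appear in the trial progress rows, while a surviving trial runs to the full $40000$ steps.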

Now, let's inspect how the reward evolves during training in TensorBoard by looking at the `episode_reward_mean` metric.

In [9]:
%load_ext tensorboard
%tensorboard --logdir ray_results/nb_3

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6010 (pid 526429), started 0:00:21 ago. (Use '!kill 526429' to kill it.)

The training reward curves show that not all trials completed the 10 iterations. The ASHA scheduler stopped the trials that were not performing well, so only the best trials ran until the end of training.
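The same conclusion can be checked numerically from the trial progress rows printed earlier. The sketch below hard-codes the timestep totals of the six trials visible in that output (suffixes `00004` to `00009`) and converts them to completed iterations; the dictionary and variable names are illustrative only, and the remaining trials are omitted because their rows are not shown here.

```python
# Timestep totals copied from the trial progress rows for trials 00004-00009.
timesteps_per_trial = {
    "00004": 12000,
    "00005": 4000,
    "00006": 4000,
    "00007": 4000,
    "00008": 40000,
    "00009": 4000,
}

steps_per_iteration = 4000  # environment steps sampled per training iteration
max_iterations = 10         # max_t passed to the ASHA scheduler

# Number of training iterations each trial actually ran.
iterations = {
    trial: steps // steps_per_iteration
    for trial, steps in timesteps_per_trial.items()
}

# Trials that ASHA stopped before reaching max_t.
stopped_early = [trial for trial, n in iterations.items() if n < max_iterations]

# Iterations actually run versus the budget without early stopping.
used = sum(iterations.values())
budget = max_iterations * len(iterations)

print(iterations)
print(stopped_early)
print(f"{used} of {budget} iterations run")  # 17 of 60 iterations run
```

For these six trials, only `00008` reaches the full 10 iterations. The others stop after 1 or 3 iterations, which is where the time saving comes from.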