# Tutorial 08: Creating Custom Environments

This tutorial walks you through the process of creating custom environments in Flow. Custom environments contain specific methods that define the problem space of a task, such as the state and action spaces of the RL agent and the signal (or reward) that the RL algorithm will optimize over. By specifying a few methods within a custom environment, individuals can use Flow to design traffic control tasks of various types, such as optimal traffic light signal timing and flow regulation via mixed autonomy traffic (see the figures below). Finally, these environments are compatible with OpenAI Gym.

The rest of the tutorial is organized as follows: section 1 walks through the process of creating an environment for mixed autonomy vehicle control where the autonomous vehicles perceive all vehicles in the network, and section 2 tests the environment in simulation and then trains it with RLlib.

<img src="img/sample_envs.png">


## 1. Creating an Environment Class

In this tutorial we will create an environment in which the accelerations of a handful of vehicles in the network are specified by a single centralized agent, with the objective of the agent being to improve the average speed of all vehicles in the network. In order to create this environment, we begin by inheriting the base environment class located in *flow.envs*:

In [1]:
import os

# confirm that SUMO is installed and discoverable
print(os.environ["SUMO_HOME"])

# import the base environment class
from flow.envs import Env

# define the environment class, and inherit properties from the base environment class
class myEnv(Env):
    pass

/home/rrishi/sumo_binaries/bin


`Env` provides the interface for running and modifying a SUMO simulation. Using this class, we are able to start SUMO, provide a network to specify a configuration and controllers, perform simulation steps, and reset the simulation to an initial configuration.

By inheriting Flow's base environment, a custom environment for varying control tasks can be created by adding the following functions to the child class: 
* **action_space**
* **observation_space**
* **apply_rl_actions**
* **get_state**
* **compute_reward**

Each of these components is covered in the next few subsections.

### 1.1 ADDITIONAL_ENV_PARAMS

The features used to parametrize components of the state/action space as well as the reward function are specified within the `EnvParams` input, as discussed in tutorial 1. Specifically, for the sake of our environment, the `additional_params` attribute within `EnvParams` will be responsible for storing information on the maximum possible accelerations and decelerations by the autonomous vehicles in the network. Accordingly, for this problem, we define an `ADDITIONAL_ENV_PARAMS` variable of the form:

In [2]:
ADDITIONAL_ENV_PARAMS = {
    "max_accel": 1,
    "max_decel": 1,
}

All environments presented in Flow provide a unique `ADDITIONAL_ENV_PARAMS` component containing the information needed to properly define some environment-specific parameters. We assume that these values are always provided by the user, and accordingly can be called from `env_params`. For example, if we would like to call the "max_accel" parameter, we simply type:

    max_accel = env_params.additional_params["max_accel"]
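
Because the values are assumed to be provided by the user, it can help to check for them once, when the environment is constructed. The following is a minimal sketch of such a check; it uses a plain dict in place of `env_params.additional_params`, and the helper name `check_additional_params` is hypothetical rather than part of Flow.

```python
ADDITIONAL_ENV_PARAMS = {
    "max_accel": 1,
    "max_decel": 1,
}


def check_additional_params(additional_params, required=ADDITIONAL_ENV_PARAMS):
    """Raise a KeyError if any required environment parameter is missing.

    `additional_params` stands in for `env_params.additional_params`;
    this helper is an illustration, not a Flow function.
    """
    for key in required:
        if key not in additional_params:
            raise KeyError(
                'Environment parameter "{}" not supplied'.format(key))


# all required keys are present, so no error is raised
check_additional_params({"max_accel": 1, "max_decel": 1})

# a missing key is reported immediately rather than in the middle of a rollout
try:
    check_additional_params({"max_accel": 1})
except KeyError as err:
    print("missing parameter:", err)
```

Running a check of this kind in the environment's constructor surfaces configuration mistakes before the simulation starts.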

### 1.2 action_space

The `action_space` method defines the number and bounds of the actions provided by the RL agent. In order to define these bounds within an OpenAI Gym setting, we use several objects located within *gym.spaces*. For instance, the `Box` object is used to define a bounded array of values in $\mathbb{R}^n$.

In [3]:
from gym.spaces.box import Box

In addition, `Tuple` objects (not used by this tutorial) allow users to combine multiple `Box` elements together.

In [4]:
from gym.spaces import Tuple

Once we have imported the above objects, we are ready to define the bounds of our action space. Our actions consist of a list of n real numbers (where n is the number of autonomous vehicles), bounded from above by "max_accel" and from below by the negative of "max_decel" (see section 1.1), so the action space is a `Box` of shape (n,) with these bounds.
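
The bounds described above can be sketched in plain Python as follows. The helper `action_bounds` is hypothetical and only computes the values; inside the environment class they would be handed to `Box` in an `action_space` property. The count of 24 autonomous vehicles matches the training script later in this tutorial.

```python
def action_bounds(additional_params, num_rl_vehicles):
    """Return (low, high, shape) for the acceleration action space.

    `additional_params` stands in for `self.env_params.additional_params`
    and `num_rl_vehicles` for the number of autonomous vehicles.
    """
    high = additional_params["max_accel"]
    # "max_decel" is stored as a positive magnitude, so it is negated here
    low = -abs(additional_params["max_decel"])
    return low, high, (num_rl_vehicles,)


low, high, shape = action_bounds({"max_accel": 1, "max_decel": 1}, 24)
print(low, high, shape)

# Inside the environment class these values would be passed to Box, e.g.:
#     @property
#     def action_space(self):
#         return Box(low=low, high=high, shape=shape)
```

Taking the absolute value before negating keeps the lower bound negative even if a user supplies "max_decel" with a minus sign.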

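The next three cells define the custom network used later in this tutorial: a four-way intersection whose center node is connected to the left, right, top, and bottom nodes by three-lane entry and exit edges.

In [5]: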
ADDITIONAL_NET_PARAMS = {
    # radius of the circular components
    "radius_ring": 30,
    # number of lanes
    "lanes": 1,
    # speed limit for all edges
    "speed_limit": 80,
    # resolution of the curved portions
    "resolution": 40
}

In [6]:
from flow.networks import Network

class myNetwork(Network):  # update my network class

    def specify_nodes(self, net_params):
        # one of the elements net_params will need is a "radius" value
        r = 200  # net_params.additional_params["radius"]
        y = 20  # offset used only by the disabled nodes listed below

        # specify the name and position (x,y) of each node
        nodes = [{"id": "center", "x": 0,  "y": 0},
                 {"id": "right", "x": r,  "y": 0},
                 {"id": "bottom", "x": 0,  "y": -r},
                 {"id": "left", "x": -r,  "y": 0},
                 {"id": "top", "x": 0,  "y": r}]

        # additional nodes, currently disabled:
        # {"id": "topright", "x": y, "y": y},
        # {"id": "bottomright", "x": y, "y": -y},
        # {"id": "topleft", "x": -y, "y": y},
        # {"id": "bottomleft", "x": -y, "y": -y},
        # {"id": "toprightright", "x": r, "y": y},
        # {"id": "bottomrightright", "x": r, "y": -y},
        # {"id": "topleftleft", "x": -r, "y": y},
        # {"id": "bottomleftleft", "x": -r, "y": -y},
        # {"id": "toptopright", "x": y, "y": r},
        # {"id": "toptopleft", "x": -y, "y": r},
        # {"id": "bottombottomright", "x": y, "y": -r},
        # {"id": "bottombottomleft", "x": -y, "y": -r},

        return nodes

In [7]:
# some mathematical operations that may be used
from numpy import pi, sin, cos, linspace

class myNetwork(myNetwork):  # update my network class

    def specify_edges(self, net_params):
        # r = 50  # net_params.additional_params["radius"]
        edgelen = 200  # r * pi / 2
        # this will let us control the number of lanes in the network
        lanes = 3  # net_params.additional_params["num_lanes"]
        # speed limit of vehicles in the network
        speed_limit = net_params.additional_params["speed_limit"]

        edges = [
            {#exit
                "id": "exit_edge1", 
                "numLanes": lanes, 
                "speed": speed_limit,
                "from": "center",
                "to": "right", 
                "length": edgelen,
            },
            {
                "id": "enter_edge2", 
                "numLanes": lanes, 
                "speed": speed_limit,
                "from": "left",
                "to": "center",
                "length": edgelen,
            },
            {
                "id": "enter_edge3", 
                "numLanes": lanes, 
                "speed": speed_limit,
                "from": "bottom",
                "to": "center",
                "length": edgelen,
            },
            {#exit
                "id": "exit_edge4",
                "numLanes": lanes,
                "speed": speed_limit,
                "from": "center",
                "to": "top", 
                "length": edgelen,
            },
            {
                "id": "enter_edge1",
                "numLanes": lanes, 
                "speed": speed_limit,
                "from": "right",
                "to": "center",
                "length": edgelen,
            },
            {#exit
                "id": "exit_edge2",
                "numLanes": lanes, 
                "speed": speed_limit,
                "from": "center",
                "to": "left",
                "length": edgelen,
            },
            {#exit
                "id": "exit_edge3", 
                "numLanes": lanes, 
                "speed": speed_limit,
                "from": "center",
                "to": "bottom",
                "length": edgelen,
            },
            {
                "id": "enter_edge4",
                "numLanes": lanes,
                "speed": speed_limit,
                "from": "top",
                "to": "center", 
                "length": edgelen,
            }
        ]

        return edges



### 1.3 observation_space
The observation space of an environment represents the number and types of observations that are provided to the reinforcement learning agent. For this example, we will observe two values for each vehicle: its position and speed. Accordingly, we need an observation space that is twice the size of the number of vehicles in the network.
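
The sizing rule above can be sketched as follows. The helper `observation_bounds` is hypothetical, and the bounds of 0 and infinity are an assumption (positions and speeds taken to be non-negative and unbounded above); the vehicle counts match the training script later in this tutorial.

```python
def observation_bounds(num_vehicles):
    """Return (low, high, shape) for a position-and-speed observation.

    Two values (position, speed) are observed per vehicle. The bounds are
    an assumption: both quantities are taken to be non-negative.
    """
    return 0, float("inf"), (2 * num_vehicles,)


# 24 autonomous vehicles plus 8 human-driven vehicles
low, high, shape = observation_bounds(24 + 8)
print(shape)

# Inside the environment class these values would be passed to Box, e.g.:
#     @property
#     def observation_space(self):
#         return Box(low=low, high=high, shape=shape)
```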

### 1.4 apply_rl_actions
The function `apply_rl_actions` is responsible for transforming commands specified by the RL agent into actual actions performed within the simulator. The vehicle kernel within the environment class contains several helper methods that may be of use to facilitate this process. These functions include:
* **apply_acceleration** (list of str, list of float) -> None: converts an action, or a list of actions, into accelerations to the specified vehicles (in simulation)
* **apply_lane_change** (list of str, list of {-1, 0, 1}) -> None: converts an action, or a list of actions, into lane change directions for the specified vehicles (in simulation)
* **choose_route** (list of str, list of list of str) -> None: converts an action, or a list of actions, into rerouting commands for the specified vehicles (in simulation)

For our example we consider a situation where the RL agent can only specify accelerations for the RL vehicles; accordingly, the actuation method for the RL agent only needs to pass the agent's actions to `apply_acceleration` for those vehicles.
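
The actuation step can be sketched as below. `StubVehicleKernel` is a hypothetical stand-in for `self.k.vehicle` that only records what it receives, so the example runs without SUMO; in the real environment the same two calls would be made on the vehicle kernel from a method of the environment class (depending on the Flow version, the overridden method may be named `_apply_rl_actions`).

```python
class StubVehicleKernel:
    """Hypothetical stand-in for `self.k.vehicle`; records applied actions."""

    def __init__(self, rl_ids):
        self._rl_ids = list(rl_ids)
        self.applied = {}

    def get_rl_ids(self):
        return self._rl_ids

    def apply_acceleration(self, veh_ids, acc):
        # the real kernel forwards these values to the simulator
        for veh_id, a in zip(veh_ids, acc):
            self.applied[veh_id] = a


def apply_rl_actions(vehicle_kernel, rl_actions):
    """Send one acceleration per RL vehicle to the simulator."""
    rl_ids = vehicle_kernel.get_rl_ids()
    vehicle_kernel.apply_acceleration(rl_ids, rl_actions)


kernel = StubVehicleKernel(["rl_0", "rl_1"])
apply_rl_actions(kernel, [0.5, -1.0])
print(kernel.applied)
```

The order of `rl_actions` must match the order of the IDs returned by `get_rl_ids`, since the two lists are paired element by element.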

### 1.5 get_state

The `get_state` method extracts features from within the environment and provides them as inputs to the policy provided by the RL agent. Several helper methods exist within Flow to facilitate this process. Some useful helper methods can be accessed from the following objects:
* **self.k.vehicle**: provides current state information for all vehicles within the network
* **self.k.traffic_light**: provides state information on the traffic lights
* **self.k.network**: information on the network, which unlike the vehicles and traffic lights is static
* More accessor objects and methods can be found within the Flow documentation at: http://berkeleyflow.readthedocs.io/en/latest/

In order to model global observability within the network, our state space consists of the speeds and positions of all vehicles (as mentioned in section 1.3). The `get_state` method therefore collects both quantities for every vehicle in the network and returns them as a single array.
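
A minimal sketch of this state construction is shown below. `StubVehicleKernel` is a hypothetical stand-in for `self.k.vehicle` with fixed values, and the function returns a plain list for simplicity; in the environment class the result would typically be converted to a NumPy array matching the observation space.

```python
class StubVehicleKernel:
    """Hypothetical stand-in for `self.k.vehicle` with fixed values."""

    def __init__(self, positions, speeds):
        self._pos = dict(positions)
        self._vel = dict(speeds)

    def get_ids(self):
        return list(self._pos)

    def get_x_by_id(self, veh_id):
        return self._pos[veh_id]

    def get_speed(self, veh_id):
        return self._vel[veh_id]


def get_state(vehicle_kernel):
    """Return all positions followed by all speeds, in vehicle-ID order."""
    ids = vehicle_kernel.get_ids()
    pos = [vehicle_kernel.get_x_by_id(veh_id) for veh_id in ids]
    vel = [vehicle_kernel.get_speed(veh_id) for veh_id in ids]
    return pos + vel


kernel = StubVehicleKernel(
    positions={"rl_0": 10.0, "flow_0": 55.0},
    speeds={"rl_0": 3.0, "flow_0": 4.5},
)
state = get_state(kernel)
print(state)
```

The length of the returned state is twice the number of vehicles, which is the size chosen for the observation space in section 1.3.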

### 1.6 compute_reward

The `compute_reward` method returns the reward associated with any given state. This value may be computed from quantities within the state space (defined in section 1.5) or may draw on information provided by the environment but not immediately available within the state, as is the case in partially observable tasks (or POMDPs).

For this tutorial, we choose the reward function to be the average speed of all vehicles currently in the network. In order to extract this information from the environment, we use the `get_speed` method within the Vehicle kernel class to collect the current speed of all vehicles in the network, and return the average of these speeds as the reward.
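
The average-speed reward can be sketched as below. `StubVehicleKernel` is again a hypothetical stand-in for `self.k.vehicle`, and returning 0.0 for an empty network is an assumption made here to avoid dividing by zero.

```python
class StubVehicleKernel:
    """Hypothetical stand-in for `self.k.vehicle` with fixed speeds."""

    def __init__(self, speeds):
        self._vel = dict(speeds)

    def get_ids(self):
        return list(self._vel)

    def get_speed(self, veh_ids):
        # accepts a list of IDs and returns the matching list of speeds
        return [self._vel[veh_id] for veh_id in veh_ids]


def compute_reward(vehicle_kernel):
    """Return the mean speed of all vehicles currently in the network."""
    ids = vehicle_kernel.get_ids()
    speeds = vehicle_kernel.get_speed(ids)
    if not speeds:
        # assumption: an empty network yields zero reward
        return 0.0
    return sum(speeds) / len(speeds)


kernel = StubVehicleKernel({"rl_0": 2.0, "rl_1": 4.0, "flow_0": 6.0})
reward = compute_reward(kernel)
print(reward)
```

Because this reward is non-negative at every step, longer rollouts accumulate a larger return, which is worth keeping in mind when comparing episodes of different lengths.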

## 2. Testing the New Environment


### 2.1 Testing in Simulation
Now that we have successfully created our new environment, we are ready to test this environment in simulation. We begin by running this environment in a non-RL based simulation. The return provided at the end of the simulation is indicative of the cumulative expected reward when jam-like behavior exists within the network.

In [8]:
import re
from flow.controllers.routing_controllers import MinicityRouter

from flow.controllers import IDMController, ContinuousRouter, RLController
from flow.core.experiment import Experiment
from flow.core.params import SumoParams, EnvParams, \
    InitialConfig, NetParams
from flow.core.params import VehicleParams
from flow.networks.ring import RingNetwork, ADDITIONAL_NET_PARAMS

from myEnv import myEnv
from flow.envs import AccelEnv

# sim_params = SumoParams(sim_step=0.1, render=False)

# vehicles = VehicleParams()
# vehicles.add(veh_id="rl",
#              acceleration_controller=(RLController, {}),
#              routing_controller=(MinicityRouter, {}),
#              num_vehicles=25)

# env_params = EnvParams(additional_params=ADDITIONAL_ENV_PARAMS)

# additional_net_params = ADDITIONAL_NET_PARAMS.copy()
# net_params = NetParams(additional_params=additional_net_params)

# initial_config = InitialConfig(bunching=20)

# flow_params = dict(
#     exp_tag='ring',
#     env_name=myEnv,  # using my new environment for the simulation
#     network=myNetwork,
#     simulator='traci',
#     sim=sim_params,
#     env=env_params,
#     net=net_params,
#     veh=vehicles,
#     initial=initial_config,
# )

# # number of time steps
# flow_params['env'].horizon = 1500
# exp = Experiment(flow_params)

# # run the sumo simulation
# _ = exp.run(1)

### 2.2 Training the New Environment

Next, we wish to train this environment in the presence of the autonomous vehicle agent to reduce the formation of waves in the network, thereby pushing the performance of vehicles in the network past the above expected return.

The code block below may be used to train the above environment using the Proximal Policy Optimization (PPO) algorithm provided by RLlib. In order to register the environment with OpenAI Gym, the environment must first be placed in a separate ".py" file and then imported (here, via `from myEnv import myEnv` in section 2.1). Once that is done, the training script below should run as expected.

In [9]:
# :|

**Note**: We do not recommend training this environment to completion within a Jupyter notebook setting; however, once training is complete, visualization of the resulting policy should show that the autonomous vehicles learn to dissipate the formation and propagation of waves in the network.

In [None]:
import json
import ray
from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env

from flow.networks.ring import RingNetwork, ADDITIONAL_NET_PARAMS
from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder
from flow.core.params import SumoParams, EnvParams, InitialConfig, NetParams
from flow.core.params import VehicleParams, SumoCarFollowingParams
from flow.controllers import RLController, IDMController, ContinuousRouter

# time horizon of a single rollout
HORIZON = 1500
# number of rollouts per training iteration
N_ROLLOUTS = 10
# number of parallel workers
N_CPUS = 15


vehicles = VehicleParams()


# We place 25 autonomous vehicles in the network
# vehicles.add(
#     veh_id="rl",
#     acceleration_controller=(RLController, {}),
#     routing_controller=(MinicityRouter, {}),
#     num_vehicles=25)


# Places x number of autonomous vehicles and y number of human-driven vehicles
x = 24
y = 8

if x != 0:
    vehicles.add(
        veh_id="rl",
        acceleration_controller=(RLController, {}),
        routing_controller=(MinicityRouter, {}),
        num_vehicles=x)

if y != 0:
    vehicles.add(
        veh_id="flow",
        acceleration_controller=(IDMController, {}),
        routing_controller=(MinicityRouter, {}),
        num_vehicles=y,
        color="white")

flow_params = dict(
    # name of the experiment
    exp_tag="experiment",

    # name of the flow environment the experiment is running on
    env_name=myEnv,  # <------ here we replace the environment with our new environment

    # name of the network class the experiment is running on
    network=myNetwork,

    # simulator that is used by the experiment
    simulator='traci',

    # sumo-related parameters (see flow.core.params.SumoParams)
    sim=SumoParams(
        sim_step=0.1,
        render=False,
        restart_instance=True,
    ),

    # environment related parameters (see flow.core.params.EnvParams)
    env=EnvParams(
        horizon=HORIZON,
        warmup_steps=0,
        clip_actions=False,
        additional_params={
#             "target_velocity": 50,
            "sort_vehicles": False,
            "max_accel": 1,
            "max_decel": 1,
        },
    ),
                                         
    # network-related parameters (see flow.core.params.NetParams and the
    # network's documentation or ADDITIONAL_NET_PARAMS component)
    net=NetParams(
        additional_params=ADDITIONAL_NET_PARAMS.copy()
    ),

    # vehicles to be placed in the network at the start of a rollout (see
    # flow.core.params.VehicleParams)
    veh=vehicles,

    # parameters specifying the positioning of vehicles upon initialization/
    # reset (see flow.core.params.InitialConfig)
    initial=InitialConfig(
        bunching=20,
    ),
)


def setup_exps():
    """Return the relevant components of an RLlib experiment.

    Returns
    -------
    str
        name of the training algorithm
    str
        name of the gym environment to be trained
    dict
        training configuration parameters
    """
    alg_run = "PPO"

    agent_cls = get_agent_class(alg_run)
    config = agent_cls._default_config.copy()
    config["num_workers"] = N_CPUS
    config["train_batch_size"] = HORIZON * N_ROLLOUTS
    config["gamma"] = 0.999  # discount rate
    config["model"].update({"fcnet_hiddens": [3, 3]})
    config["use_gae"] = True
    config["lambda"] = 0.97
    config["kl_target"] = 0.02
    config["num_sgd_iter"] = 10
    config['clip_actions'] = False  # FIXME(ev) temporary ray bug
    config["horizon"] = HORIZON

    # save the flow params for replay
    flow_json = json.dumps(
        flow_params, cls=FlowParamsEncoder, sort_keys=True, indent=4)
    config['env_config']['flow_params'] = flow_json
    config['env_config']['run'] = alg_run

    create_env, gym_name = make_create_env(params=flow_params, version=0)

    # Register as rllib env
    register_env(gym_name, create_env)
    return alg_run, gym_name, config


alg_run, gym_name, config = setup_exps()
ray.init(num_cpus=N_CPUS + 1)
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 20,
        "checkpoint_at_end": True,
        "max_failures": 999,
        "stop": {
            "training_iteration": 1000,
        },
    }
})


# VISUALIZE ITERATION

# trials = run_experiments({
#     flow_params["exp_tag"]: {
#         "run": alg_run,
#         "env": gym_name,
#         "config": {
#             **config
#         },
#         "restore": "~/ray_results/experiment/248-1000/checkpoint_980/checkpoint-980",
# #         "checkpoint_freq": 20,
#         "checkpoint_at_end": False,
#         "max_failures": 999,
#         "stop": {
#             "training_iteration": 981,
#         },
#     },
# })

2021-03-08 15:14:25,545	INFO resource_spec.py:216 -- Starting Ray with 14.79 GiB memory available for workers and up to 7.41 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2021-03-08 15:14:26,252	INFO ray_trial_executor.py:121 -- Trial PPO_myEnv-v0_06b10b5c: Setting up new remote runner.


Trial name,status,loc
PPO_myEnv-v0_06b10b5c,RUNNING,


[2m[36m(pid=14857)[0m 2021-03-08 15:14:28,116	INFO trainer.py:371 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
[2m[36m(pid=14857)[0m 2021-03-08 15:14:28,453	INFO trainer.py:512 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
[2m[36m(pid=14857)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14852)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14855)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14864)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14854)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14862)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14850)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14860)[0m No routes specified, defaulting to single edge routes.
[2m[36m(pid=14861)[0m





Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-14-50
  done: false
  episode_len_mean: 638.6666666666666
  episode_reward_max: 1000.321654419291
  episode_reward_mean: 895.8704237249609
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 3
  episodes_total: 3
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3523.237
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 34.12252426147461
        entropy_coeff: 0.0
        kl: 0.0024834556970745325
        policy_loss: -0.0047480259090662
        total_loss: 1802.8619384765625
        vf_explained_var: 0.0007854322902858257
        vf_loss: 1802.8660888671875
    load_time_ms: 84.844
    num_steps_sampled: 15000
    num_steps_trained: 14976
    sample_time_ms: 12787.356
    update_time_ms: 606.004
  iterations_since_restore: 1
  node_ip: 192.168.107.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,1,17.0869,15000,895.87




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-15-04
  done: false
  episode_len_mean: 1241.0
  episode_reward_max: 2461.4325809870265
  episode_reward_mean: 1726.6162190664527
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 13
  episodes_total: 16
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3366.176
    learner:
      default_policy:
        cur_kl_coeff: 0.10000000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 34.11256408691406
        entropy_coeff: 0.0
        kl: 0.0022558714263141155
        policy_loss: -0.005574571900069714
        total_loss: 1692.2352294921875
        vf_explained_var: 0.0003996078739874065
        vf_loss: 1692.2403564453125
    load_time_ms: 50.459
    num_steps_sampled: 30000
    num_steps_trained: 29952
    sample_time_ms: 11333.889
    update_time_ms: 307.218
  iterations_since_restore: 2
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,2,30.2223,30000,1726.62






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-15-17
  done: false
  episode_len_mean: 1290.5333333333333
  episode_reward_max: 2461.4325809870265
  episode_reward_mean: 1733.767699603947
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 14
  episodes_total: 30
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3310.305
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 34.123382568359375
        entropy_coeff: 0.0
        kl: 0.002882912289351225
        policy_loss: -0.006402074359357357
        total_loss: 1581.2396240234375
        vf_explained_var: 0.0016743887681514025
        vf_loss: 1581.245849609375
    load_time_ms: 40.258
    num_steps_sampled: 45000
    num_steps_trained: 44928
    sample_time_ms: 10970.681
    update_time_ms: 209.02
  iterations_since_restore: 3
  node_ip: 192.168.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,3,43.7165,45000,1733.77




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-15-30
  done: false
  episode_len_mean: 1298.157894736842
  episode_reward_max: 2562.318823354186
  episode_reward_mean: 1751.2049096923342
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 8
  episodes_total: 38
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3279.057
    learner:
      default_policy:
        cur_kl_coeff: 0.02500000037252903
        cur_lr: 4.999999873689376e-05
        entropy: 34.180625915527344
        entropy_coeff: 0.0
        kl: 0.0038769261445850134
        policy_loss: -0.007699020206928253
        total_loss: 1582.5723876953125
        vf_explained_var: 0.00019316744874231517
        vf_loss: 1582.580322265625
    load_time_ms: 33.02
    num_steps_sampled: 60000
    num_steps_trained: 59904
    sample_time_ms: 10766.588
    update_time_ms: 158.981
  iterations_since_restore: 4
  node_ip: 192.168.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,4,57.0961,60000,1751.2




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-15-45
  done: false
  episode_len_mean: 1282.3
  episode_reward_max: 2562.318823354186
  episode_reward_mean: 1743.486264963334
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 12
  episodes_total: 50
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3276.094
    learner:
      default_policy:
        cur_kl_coeff: 0.012500000186264515
        cur_lr: 4.999999873689376e-05
        entropy: 34.20689010620117
        entropy_coeff: 0.0
        kl: 0.003084396943449974
        policy_loss: -0.005685742478817701
        total_loss: 1821.65185546875
        vf_explained_var: -0.0006134683499112725
        vf_loss: 1821.6572265625
    load_time_ms: 29.893
    num_steps_sampled: 75000
    num_steps_trained: 74880
    sample_time_ms: 10812.522
    update_time_ms: 128.965
  iterations_since_restore: 5
  node_ip: 192.168.107.157
  num_h

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,5,71.4032,75000,1743.49






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-16-00
  done: false
  episode_len_mean: 1278.4285714285713
  episode_reward_max: 2562.318823354186
  episode_reward_mean: 1735.8673493631013
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 13
  episodes_total: 63
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3264.767
    learner:
      default_policy:
        cur_kl_coeff: 0.0062500000931322575
        cur_lr: 4.999999873689376e-05
        entropy: 34.21315383911133
        entropy_coeff: 0.0
        kl: 0.003262594109401107
        policy_loss: -0.006288408767431974
        total_loss: 1704.56396484375
        vf_explained_var: 0.006257759407162666
        vf_loss: 1704.570556640625
    load_time_ms: 27.916
    num_steps_sampled: 90000
    num_steps_trained: 89856
    sample_time_ms: 10930.298
    update_time_ms: 109.067
  iterations_since_restore: 6
  node_ip: 192.168.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,6,86.1772,90000,1735.87






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-16-14
  done: false
  episode_len_mean: 1261.4415584415585
  episode_reward_max: 2562.318823354186
  episode_reward_mean: 1736.2711934523643
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 14
  episodes_total: 77
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3257.054
    learner:
      default_policy:
        cur_kl_coeff: 0.0031250000465661287
        cur_lr: 4.999999873689376e-05
        entropy: 34.221466064453125
        entropy_coeff: 0.0
        kl: 0.003068143967539072
        policy_loss: -0.006566986441612244
        total_loss: 1896.223876953125
        vf_explained_var: 0.0038422788493335247
        vf_loss: 1896.23046875
    load_time_ms: 26.374
    num_steps_sampled: 105000
    num_steps_trained: 104832
    sample_time_ms: 10967.338
    update_time_ms: 94.799
  iterations_since_restore: 7
  node_ip: 192.168.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,7,100.625,105000,1736.27




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-16-29
  done: false
  episode_len_mean: 1258.0113636363637
  episode_reward_max: 2562.318823354186
  episode_reward_mean: 1749.5471243851432
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 11
  episodes_total: 88
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3252.558
    learner:
      default_policy:
        cur_kl_coeff: 0.0015625000232830644
        cur_lr: 4.999999873689376e-05
        entropy: 34.18902587890625
        entropy_coeff: 0.0
        kl: 0.0036672838032245636
        policy_loss: -0.007534302305430174
        total_loss: 1870.0711669921875
        vf_explained_var: 0.009615126065909863
        vf_loss: 1870.0789794921875
    load_time_ms: 25.206
    num_steps_sampled: 120000
    num_steps_trained: 119808
    sample_time_ms: 11024.579
    update_time_ms: 84.117
  iterations_since_restore: 8
  node_ip: 192.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,8,115.322,120000,1749.55




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-16-43
  done: false
  episode_len_mean: 1270.8775510204082
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1779.6084843109684
  episode_reward_min: 697.5786569193747
  episodes_this_iter: 10
  episodes_total: 98
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3249.628
    learner:
      default_policy:
        cur_kl_coeff: 0.0007812500116415322
        cur_lr: 4.999999873689376e-05
        entropy: 34.12698745727539
        entropy_coeff: 0.0
        kl: 0.0032600811682641506
        policy_loss: -0.007399260997772217
        total_loss: 1908.8782958984375
        vf_explained_var: 0.006370424292981625
        vf_loss: 1908.8858642578125
    load_time_ms: 24.284
    num_steps_sampled: 135000
    num_steps_trained: 134784
    sample_time_ms: 11018.001
    update_time_ms: 75.791
  iterations_since_restore: 9
  node_ip: 192.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,9,129.559,135000,1779.61




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-16-56
  done: false
  episode_len_mean: 1294.53
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1829.0812476461413
  episode_reward_min: 762.1429259638348
  episodes_this_iter: 12
  episodes_total: 110
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3245.686
    learner:
      default_policy:
        cur_kl_coeff: 0.0003906250058207661
        cur_lr: 4.999999873689376e-05
        entropy: 34.11781311035156
        entropy_coeff: 0.0
        kl: 0.0032171092461794615
        policy_loss: -0.007235899567604065
        total_loss: 2162.8837890625
        vf_explained_var: 0.005768269766122103
        vf_loss: 2162.890869140625
    load_time_ms: 23.313
    num_steps_sampled: 150000
    num_steps_trained: 149760
    sample_time_ms: 10908.174
    update_time_ms: 69.099
  iterations_since_restore: 10
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,10,142.732,150000,1829.08




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-17-10
  done: false
  episode_len_mean: 1292.5
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1862.908975072539
  episode_reward_min: 762.1429259638348
  episodes_this_iter: 11
  episodes_total: 121
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.581
    learner:
      default_policy:
        cur_kl_coeff: 0.00019531250291038305
        cur_lr: 4.999999873689376e-05
        entropy: 34.13369369506836
        entropy_coeff: 0.0
        kl: 0.0032488673459738493
        policy_loss: -0.006651131436228752
        total_loss: 2063.204345703125
        vf_explained_var: 0.0016225383151322603
        vf_loss: 2063.2109375
    load_time_ms: 15.98
    num_steps_sampled: 165000
    num_steps_trained: 164736
    sample_time_ms: 10651.376
    update_time_ms: 9.373
  iterations_since_restore: 11
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,11,156.155,165000,1862.91




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-17-24
  done: false
  episode_len_mean: 1281.23
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1882.497520730224
  episode_reward_min: 762.1429259638348
  episodes_this_iter: 11
  episodes_total: 132
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.455
    learner:
      default_policy:
        cur_kl_coeff: 9.765625145519152e-05
        cur_lr: 4.999999873689376e-05
        entropy: 34.12575149536133
        entropy_coeff: 0.0
        kl: 0.0031702020205557346
        policy_loss: -0.00612522242590785
        total_loss: 2321.36328125
        vf_explained_var: 0.010841649025678635
        vf_loss: 2321.36962890625
    load_time_ms: 15.712
    num_steps_sampled: 180000
    num_steps_trained: 179712
    sample_time_ms: 10765.622
    update_time_ms: 9.451
  iterations_since_restore: 12
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,12,170.46,180000,1882.5




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-17-38
  done: false
  episode_len_mean: 1289.06
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1935.2875108167102
  episode_reward_min: 762.1429259638348
  episodes_this_iter: 12
  episodes_total: 144
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.896
    learner:
      default_policy:
        cur_kl_coeff: 4.882812572759576e-05
        cur_lr: 4.999999873689376e-05
        entropy: 34.17047119140625
        entropy_coeff: 0.0
        kl: 0.0035137252416461706
        policy_loss: -0.007413653656840324
        total_loss: 2158.455810546875
        vf_explained_var: 0.00038223082083277404
        vf_loss: 2158.46337890625
    load_time_ms: 15.3
    num_steps_sampled: 195000
    num_steps_trained: 194688
    sample_time_ms: 10833.528
    update_time_ms: 9.248
  iterations_since_restore: 13
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,13,184.609,195000,1935.29




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-17-52
  done: false
  episode_len_mean: 1290.26
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1972.4922651591464
  episode_reward_min: 762.1429259638348
  episodes_this_iter: 12
  episodes_total: 156
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3214.367
    learner:
      default_policy:
        cur_kl_coeff: 2.441406286379788e-05
        cur_lr: 4.999999873689376e-05
        entropy: 34.15592575073242
        entropy_coeff: 0.0
        kl: 0.003175248857587576
        policy_loss: -0.006078826729208231
        total_loss: 2163.29296875
        vf_explained_var: 0.007631607353687286
        vf_loss: 2163.29931640625
    load_time_ms: 15.688
    num_steps_sampled: 210000
    num_steps_trained: 209664
    sample_time_ms: 10836.934
    update_time_ms: 9.299
  iterations_since_restore: 14
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,14,198.065,210000,1972.49






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-18-05
  done: false
  episode_len_mean: 1280.77
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 1993.122849783216
  episode_reward_min: 770.3785985103654
  episodes_this_iter: 12
  episodes_total: 168
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.835
    learner:
      default_policy:
        cur_kl_coeff: 1.220703143189894e-05
        cur_lr: 4.999999873689376e-05
        entropy: 34.168277740478516
        entropy_coeff: 0.0
        kl: 0.0033458194229751825
        policy_loss: -0.006662840489298105
        total_loss: 2354.18505859375
        vf_explained_var: 0.008646074682474136
        vf_loss: 2354.19140625
    load_time_ms: 15.06
    num_steps_sampled: 225000
    num_steps_trained: 224640
    sample_time_ms: 10728.0
    update_time_ms: 9.266
  iterations_since_restore: 15
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,15,211.222,225000,1993.12






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-18-19
  done: false
  episode_len_mean: 1270.09
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 2004.456906695624
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 14
  episodes_total: 182
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.815
    learner:
      default_policy:
        cur_kl_coeff: 6.10351571594947e-06
        cur_lr: 4.999999873689376e-05
        entropy: 34.26426696777344
        entropy_coeff: 0.0
        kl: 0.004381966777145863
        policy_loss: -0.008135647512972355
        total_loss: 2183.003662109375
        vf_explained_var: 0.005735671613365412
        vf_loss: 2183.01171875
    load_time_ms: 14.394
    num_steps_sampled: 240000
    num_steps_trained: 239616
    sample_time_ms: 10670.424
    update_time_ms: 9.416
  iterations_since_restore: 16
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,16,225.397,240000,2004.46




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-18-32
  done: false
  episode_len_mean: 1276.27
  episode_reward_max: 3082.188908698603
  episode_reward_mean: 2053.557509368225
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 12
  episodes_total: 194
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.96
    learner:
      default_policy:
        cur_kl_coeff: 3.051757857974735e-06
        cur_lr: 4.999999873689376e-05
        entropy: 34.279197692871094
        entropy_coeff: 0.0
        kl: 0.005102739669382572
        policy_loss: -0.007341334130614996
        total_loss: 2499.289794921875
        vf_explained_var: 0.005779585801064968
        vf_loss: 2499.29736328125
    load_time_ms: 13.97
    num_steps_sampled: 255000
    num_steps_trained: 254592
    sample_time_ms: 10560.844
    update_time_ms: 9.42
  iterations_since_restore: 17
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,17,238.726,255000,2053.56






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-18-46
  done: false
  episode_len_mean: 1262.57
  episode_reward_max: 2931.949707661664
  episode_reward_mean: 2048.68502677465
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 12
  episodes_total: 206
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.688
    learner:
      default_policy:
        cur_kl_coeff: 1.5258789289873675e-06
        cur_lr: 4.999999873689376e-05
        entropy: 34.33005905151367
        entropy_coeff: 0.0
        kl: 0.004478642717003822
        policy_loss: -0.008337135426700115
        total_loss: 2487.466552734375
        vf_explained_var: 0.0038945653941482306
        vf_loss: 2487.474853515625
    load_time_ms: 13.832
    num_steps_sampled: 270000
    num_steps_trained: 269568
    sample_time_ms: 10463.906
    update_time_ms: 9.688
  iterations_since_restore: 18
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,18,252.45,270000,2048.69




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-19-00
  done: false
  episode_len_mean: 1234.83
  episode_reward_max: 2931.949707661664
  episode_reward_mean: 2031.1130544847529
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 9
  episodes_total: 215
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.794
    learner:
      default_policy:
        cur_kl_coeff: 7.629394644936838e-07
        cur_lr: 4.999999873689376e-05
        entropy: 34.19166564941406
        entropy_coeff: 0.0
        kl: 0.003950192593038082
        policy_loss: -0.006311224773526192
        total_loss: 2839.04931640625
        vf_explained_var: 0.001735502271912992
        vf_loss: 2839.056396484375
    load_time_ms: 13.437
    num_steps_sampled: 285000
    num_steps_trained: 284544
    sample_time_ms: 10469.288
    update_time_ms: 9.735
  iterations_since_restore: 19
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,19,266.691,285000,2031.11






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-19-15
  done: false
  episode_len_mean: 1221.26
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2053.8532109599364
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 17
  episodes_total: 232
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.528
    learner:
      default_policy:
        cur_kl_coeff: 3.814697322468419e-07
        cur_lr: 4.999999873689376e-05
        entropy: 34.142215728759766
        entropy_coeff: 0.0
        kl: 0.004060724750161171
        policy_loss: -0.00710097374394536
        total_loss: 2984.907470703125
        vf_explained_var: 0.0007674143998883665
        vf_loss: 2984.91455078125
    load_time_ms: 13.999
    num_steps_sampled: 300000
    num_steps_trained: 299520
    sample_time_ms: 10645.806
    update_time_ms: 9.781
  iterations_since_restore: 20
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,20,281.627,300000,2053.85






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-19-30
  done: false
  episode_len_mean: 1185.08
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2017.9492649569988
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 15
  episodes_total: 247
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.909
    learner:
      default_policy:
        cur_kl_coeff: 1.9073486612342094e-07
        cur_lr: 4.999999873689376e-05
        entropy: 34.1189079284668
        entropy_coeff: 0.0
        kl: 0.004265725612640381
        policy_loss: -0.006387073080986738
        total_loss: 2831.70263671875
        vf_explained_var: 0.007929480634629726
        vf_loss: 2831.709228515625
    load_time_ms: 14.679
    num_steps_sampled: 315000
    num_steps_trained: 314496
    sample_time_ms: 10713.019
    update_time_ms: 9.777
  iterations_since_restore: 21
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,21,295.816,315000,2017.95






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-19-43
  done: false
  episode_len_mean: 1182.95
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2029.3858906418664
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 11
  episodes_total: 258
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3202.291
    learner:
      default_policy:
        cur_kl_coeff: 9.536743306171047e-08
        cur_lr: 4.999999873689376e-05
        entropy: 34.045921325683594
        entropy_coeff: 0.0
        kl: 0.0036696174647659063
        policy_loss: -0.007360586430877447
        total_loss: 2665.36572265625
        vf_explained_var: 0.0032667762134224176
        vf_loss: 2665.373291015625
    load_time_ms: 14.943
    num_steps_sampled: 330000
    num_steps_trained: 329472
    sample_time_ms: 10628.426
    update_time_ms: 9.769
  iterations_since_restore: 22
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,22,309.223,330000,2029.39






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-19-57
  done: false
  episode_len_mean: 1168.15
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2037.23240031103
  episode_reward_min: 783.1383551611767
  episodes_this_iter: 11
  episodes_total: 269
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.368
    learner:
      default_policy:
        cur_kl_coeff: 4.7683716530855236e-08
        cur_lr: 4.999999873689376e-05
        entropy: 34.04957580566406
        entropy_coeff: 0.0
        kl: 0.0032347894739359617
        policy_loss: -0.006375861819833517
        total_loss: 3328.624755859375
        vf_explained_var: 0.0009383141295984387
        vf_loss: 3328.630859375
    load_time_ms: 14.477
    num_steps_sampled: 345000
    num_steps_trained: 344448
    sample_time_ms: 10575.453
    update_time_ms: 9.593
  iterations_since_restore: 23
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,23,322.87,345000,2037.23






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-20-10
  done: false
  episode_len_mean: 1190.18
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2093.7983565900367
  episode_reward_min: 1175.5750742018204
  episodes_this_iter: 13
  episodes_total: 282
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.036
    learner:
      default_policy:
        cur_kl_coeff: 2.3841858265427618e-08
        cur_lr: 4.999999873689376e-05
        entropy: 34.03968811035156
        entropy_coeff: 0.0
        kl: 0.004364223685115576
        policy_loss: -0.005818452686071396
        total_loss: 2484.424560546875
        vf_explained_var: 0.005356175824999809
        vf_loss: 2484.430908203125
    load_time_ms: 14.582
    num_steps_sampled: 360000
    num_steps_trained: 359424
    sample_time_ms: 10563.553
    update_time_ms: 9.461
  iterations_since_restore: 24
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,24,336.215,360000,2093.8






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-20-24
  done: false
  episode_len_mean: 1184.85
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2077.274760114194
  episode_reward_min: 1126.2311826846874
  episodes_this_iter: 14
  episodes_total: 296
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.739
    learner:
      default_policy:
        cur_kl_coeff: 1.1920929132713809e-08
        cur_lr: 4.999999873689376e-05
        entropy: 34.083309173583984
        entropy_coeff: 0.0
        kl: 0.003358376445248723
        policy_loss: -0.007218760438263416
        total_loss: 2510.68359375
        vf_explained_var: 0.0017644319450482726
        vf_loss: 2510.69091796875
    load_time_ms: 15.513
    num_steps_sampled: 375000
    num_steps_trained: 374400
    sample_time_ms: 10668.041
    update_time_ms: 9.579
  iterations_since_restore: 25
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,25,350.455,375000,2077.27




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-20-38
  done: false
  episode_len_mean: 1178.55
  episode_reward_max: 4721.013874957428
  episode_reward_mean: 2109.066081900261
  episode_reward_min: 1126.2311826846874
  episodes_this_iter: 10
  episodes_total: 306
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.505
    learner:
      default_policy:
        cur_kl_coeff: 5.9604645663569045e-09
        cur_lr: 4.999999873689376e-05
        entropy: 34.17039489746094
        entropy_coeff: 0.0
        kl: 0.00474796025082469
        policy_loss: -0.008076782338321209
        total_loss: 2986.9423828125
        vf_explained_var: 0.0036513381637632847
        vf_loss: 2986.950439453125
    load_time_ms: 15.973
    num_steps_sampled: 390000
    num_steps_trained: 389376
    sample_time_ms: 10593.261
    update_time_ms: 9.356
  iterations_since_restore: 26
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,26,363.895,390000,2109.07






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-20-52
  done: false
  episode_len_mean: 1165.34
  episode_reward_max: 3519.267071180578
  episode_reward_mean: 2074.707639325714
  episode_reward_min: 1126.2311826846874
  episodes_this_iter: 12
  episodes_total: 318
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.821
    learner:
      default_policy:
        cur_kl_coeff: 2.9802322831784522e-09
        cur_lr: 4.999999873689376e-05
        entropy: 34.184051513671875
        entropy_coeff: 0.0
        kl: 0.004545126110315323
        policy_loss: -0.007809564471244812
        total_loss: 2694.65380859375
        vf_explained_var: 0.0052872332744300365
        vf_loss: 2694.66162109375
    load_time_ms: 15.809
    num_steps_sampled: 405000
    num_steps_trained: 404352
    sample_time_ms: 10657.495
    update_time_ms: 9.315
  iterations_since_restore: 27
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,27,377.856,405000,2074.71






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-21-06
  done: false
  episode_len_mean: 1182.74
  episode_reward_max: 3519.267071180578
  episode_reward_mean: 2122.0669707798015
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 15
  episodes_total: 333
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3215.228
    learner:
      default_policy:
        cur_kl_coeff: 1.4901161415892261e-09
        cur_lr: 4.999999873689376e-05
        entropy: 34.119632720947266
        entropy_coeff: 0.0
        kl: 0.00573598500341177
        policy_loss: -0.007560104597359896
        total_loss: 3093.821044921875
        vf_explained_var: 0.0016637423541396856
        vf_loss: 3093.829345703125
    load_time_ms: 16.487
    num_steps_sampled: 420000
    num_steps_trained: 419328
    sample_time_ms: 10724.849
    update_time_ms: 8.974
  iterations_since_restore: 28
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,28,392.309,420000,2122.07






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-21-20
  done: false
  episode_len_mean: 1192.56
  episode_reward_max: 3519.267071180578
  episode_reward_mean: 2148.3570838153864
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 14
  episodes_total: 347
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3217.432
    learner:
      default_policy:
        cur_kl_coeff: 7.450580707946131e-10
        cur_lr: 4.999999873689376e-05
        entropy: 34.15761184692383
        entropy_coeff: 0.0
        kl: 0.004760488867759705
        policy_loss: -0.007461878005415201
        total_loss: 2806.988525390625
        vf_explained_var: 0.006014873273670673
        vf_loss: 2806.996337890625
    load_time_ms: 16.792
    num_steps_sampled: 435000
    num_steps_trained: 434304
    sample_time_ms: 10640.299
    update_time_ms: 8.805
  iterations_since_restore: 29
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,29,405.728,435000,2148.36






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-21-34
  done: false
  episode_len_mean: 1182.22
  episode_reward_max: 3519.267071180578
  episode_reward_mean: 2152.084606668209
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 14
  episodes_total: 361
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3220.458
    learner:
      default_policy:
        cur_kl_coeff: 3.7252903539730653e-10
        cur_lr: 4.999999873689376e-05
        entropy: 34.08131408691406
        entropy_coeff: 0.0
        kl: 0.0043394314125180244
        policy_loss: -0.007302083540707827
        total_loss: 2925.428466796875
        vf_explained_var: 0.0022341888397932053
        vf_loss: 2925.43505859375
    load_time_ms: 16.541
    num_steps_sampled: 450000
    num_steps_trained: 449280
    sample_time_ms: 10558.987
    update_time_ms: 8.789
  iterations_since_restore: 30
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,30,419.886,450000,2152.08






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-21-48
  done: false
  episode_len_mean: 1173.22
  episode_reward_max: 3519.267071180578
  episode_reward_mean: 2149.4219013154884
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 14
  episodes_total: 375
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3216.239
    learner:
      default_policy:
        cur_kl_coeff: 1.8626451769865326e-10
        cur_lr: 4.999999873689376e-05
        entropy: 34.12302017211914
        entropy_coeff: 0.0
        kl: 0.004148859065026045
        policy_loss: -0.006572313141077757
        total_loss: 2992.13671875
        vf_explained_var: 0.004446534905582666
        vf_loss: 2992.142822265625
    load_time_ms: 16.596
    num_steps_sampled: 465000
    num_steps_trained: 464256
    sample_time_ms: 10521.489
    update_time_ms: 9.025
  iterations_since_restore: 31
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,31,433.656,465000,2149.42




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-22-01
  done: false
  episode_len_mean: 1161.35
  episode_reward_max: 3519.267071180578
  episode_reward_mean: 2133.862541097165
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 10
  episodes_total: 385
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3218.781
    learner:
      default_policy:
        cur_kl_coeff: 9.313225884932663e-11
        cur_lr: 4.999999873689376e-05
        entropy: 34.05091094970703
        entropy_coeff: 0.0
        kl: 0.003348428988829255
        policy_loss: -0.0062864916399121284
        total_loss: 3026.408935546875
        vf_explained_var: 0.0029381385538727045
        vf_loss: 3026.4150390625
    load_time_ms: 16.886
    num_steps_sampled: 480000
    num_steps_trained: 479232
    sample_time_ms: 10528.932
    update_time_ms: 8.963
  iterations_since_restore: 32
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,32,447.165,480000,2133.86






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-22-16
  done: false
  episode_len_mean: 1154.15
  episode_reward_max: 3147.1543158166737
  episode_reward_mean: 2145.878050629319
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 13
  episodes_total: 398
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3220.299
    learner:
      default_policy:
        cur_kl_coeff: 4.6566129424663316e-11
        cur_lr: 4.999999873689376e-05
        entropy: 34.19114685058594
        entropy_coeff: 0.0
        kl: 0.005297821015119553
        policy_loss: -0.006531808525323868
        total_loss: 3106.9697265625
        vf_explained_var: 0.00414911238476634
        vf_loss: 3106.976806640625
    load_time_ms: 17.314
    num_steps_sampled: 495000
    num_steps_trained: 494208
    sample_time_ms: 10592.927
    update_time_ms: 9.148
  iterations_since_restore: 33
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,33,461.472,495000,2145.88






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-22-30
  done: false
  episode_len_mean: 1164.84
  episode_reward_max: 3147.1543158166737
  episode_reward_mean: 2174.263153402738
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 13
  episodes_total: 411
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3217.34
    learner:
      default_policy:
        cur_kl_coeff: 2.3283064712331658e-11
        cur_lr: 4.999999873689376e-05
        entropy: 34.09205627441406
        entropy_coeff: 0.0
        kl: 0.004164163023233414
        policy_loss: -0.0081402026116848
        total_loss: 3373.189208984375
        vf_explained_var: -1.1362071745679714e-05
        vf_loss: 3373.197021484375
    load_time_ms: 16.994
    num_steps_sampled: 510000
    num_steps_trained: 509184
    sample_time_ms: 10684.504
    update_time_ms: 9.273
  iterations_since_restore: 34
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,34,475.699,510000,2174.26






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-22-44
  done: false
  episode_len_mean: 1157.73
  episode_reward_max: 3147.1543158166737
  episode_reward_mean: 2185.373721702138
  episode_reward_min: 770.7573613917094
  episodes_this_iter: 14
  episodes_total: 425
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.834
    learner:
      default_policy:
        cur_kl_coeff: 1.1641532356165829e-11
        cur_lr: 4.999999873689376e-05
        entropy: 34.14986801147461
        entropy_coeff: 0.0
        kl: 0.004197294358164072
        policy_loss: -0.007147384807467461
        total_loss: 3152.43994140625
        vf_explained_var: 0.002440757816657424
        vf_loss: 3152.44677734375
    load_time_ms: 16.386
    num_steps_sampled: 525000
    num_steps_trained: 524160
    sample_time_ms: 10707.35
    update_time_ms: 9.222
  iterations_since_restore: 35
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,35,490.124,525000,2185.37




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-22-58
  done: false
  episode_len_mean: 1145.38
  episode_reward_max: 3147.1543158166737
  episode_reward_mean: 2182.4232136643473
  episode_reward_min: 945.3942664529986
  episodes_this_iter: 11
  episodes_total: 436
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.032
    learner:
      default_policy:
        cur_kl_coeff: 5.8207661780829145e-12
        cur_lr: 4.999999873689376e-05
        entropy: 34.12147903442383
        entropy_coeff: 0.0
        kl: 0.004108601715415716
        policy_loss: -0.007434830069541931
        total_loss: 3505.601318359375
        vf_explained_var: 0.0019895147997885942
        vf_loss: 3505.609619140625
    load_time_ms: 16.375
    num_steps_sampled: 540000
    num_steps_trained: 539136
    sample_time_ms: 10778.957
    update_time_ms: 9.118
  iterations_since_restore: 36
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,36,504.271,540000,2182.42






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-23-13
  done: false
  episode_len_mean: 1160.56
  episode_reward_max: 3147.1543158166737
  episode_reward_mean: 2235.2819299434304
  episode_reward_min: 945.3942664529986
  episodes_this_iter: 15
  episodes_total: 451
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.977
    learner:
      default_policy:
        cur_kl_coeff: 2.9103830890414573e-12
        cur_lr: 4.999999873689376e-05
        entropy: 34.172027587890625
        entropy_coeff: 0.0
        kl: 0.00514204939827323
        policy_loss: -0.007857291027903557
        total_loss: 3166.2158203125
        vf_explained_var: 0.003291290020570159
        vf_loss: 3166.223876953125
    load_time_ms: 16.877
    num_steps_sampled: 555000
    num_steps_trained: 554112
    sample_time_ms: 10846.567
    update_time_ms: 9.096
  iterations_since_restore: 37
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,37,518.933,555000,2235.28






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-23-26
  done: false
  episode_len_mean: 1165.02
  episode_reward_max: 3355.9855012254798
  episode_reward_mean: 2256.6622539800533
  episode_reward_min: 945.3942664529986
  episodes_this_iter: 12
  episodes_total: 463
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.717
    learner:
      default_policy:
        cur_kl_coeff: 1.4551915445207286e-12
        cur_lr: 4.999999873689376e-05
        entropy: 34.136199951171875
        entropy_coeff: 0.0
        kl: 0.0037890085950493813
        policy_loss: -0.005821863189339638
        total_loss: 3212.888427734375
        vf_explained_var: 0.001695467857643962
        vf_loss: 3212.894287109375
    load_time_ms: 16.0
    num_steps_sampled: 570000
    num_steps_trained: 569088
    sample_time_ms: 10727.425
    update_time_ms: 9.174
  iterations_since_restore: 38
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,38,532.114,570000,2256.66






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-23-40
  done: false
  episode_len_mean: 1168.17
  episode_reward_max: 3448.094615957843
  episode_reward_mean: 2272.437007145542
  episode_reward_min: 1241.5913591648764
  episodes_this_iter: 11
  episodes_total: 474
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.341
    learner:
      default_policy:
        cur_kl_coeff: 7.275957722603643e-13
        cur_lr: 4.999999873689376e-05
        entropy: 34.126060485839844
        entropy_coeff: 0.0
        kl: 0.005518027581274509
        policy_loss: -0.0074599795043468475
        total_loss: 3332.261962890625
        vf_explained_var: 0.0011380906216800213
        vf_loss: 3332.26904296875
    load_time_ms: 16.131
    num_steps_sampled: 585000
    num_steps_trained: 584064
    sample_time_ms: 10715.227
    update_time_ms: 9.26
  iterations_since_restore: 39
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,39,545.45,585000,2272.44






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-23-54
  done: false
  episode_len_mean: 1160.59
  episode_reward_max: 3448.094615957843
  episode_reward_mean: 2272.9887880128354
  episode_reward_min: 1057.6465847966463
  episodes_this_iter: 15
  episodes_total: 489
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.261
    learner:
      default_policy:
        cur_kl_coeff: 3.6379788613018216e-13
        cur_lr: 4.999999873689376e-05
        entropy: 34.1389045715332
        entropy_coeff: 0.0
        kl: 0.003977231681346893
        policy_loss: -0.006286475341767073
        total_loss: 3972.1884765625
        vf_explained_var: 0.0016292125219479203
        vf_loss: 3972.19482421875
    load_time_ms: 16.078
    num_steps_sampled: 600000
    num_steps_trained: 599040
    sample_time_ms: 10741.353
    update_time_ms: 9.058
  iterations_since_restore: 40
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,40,559.842,600000,2272.99






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-24-08
  done: false
  episode_len_mean: 1176.07
  episode_reward_max: 3448.094615957843
  episode_reward_mean: 2317.4369781742703
  episode_reward_min: 848.441470044161
  episodes_this_iter: 13
  episodes_total: 502
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.706
    learner:
      default_policy:
        cur_kl_coeff: 1.8189894306509108e-13
        cur_lr: 4.999999873689376e-05
        entropy: 34.186161041259766
        entropy_coeff: 0.0
        kl: 0.004472072701901197
        policy_loss: -0.008954274468123913
        total_loss: 3567.48291015625
        vf_explained_var: 0.0010431490372866392
        vf_loss: 3567.49169921875
    load_time_ms: 15.275
    num_steps_sampled: 615000
    num_steps_trained: 614016
    sample_time_ms: 10710.748
    update_time_ms: 8.699
  iterations_since_restore: 41
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,41,573.31,615000,2317.44






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-24-21
  done: false
  episode_len_mean: 1180.52
  episode_reward_max: 3472.5353796577274
  episode_reward_mean: 2351.9284431541478
  episode_reward_min: 848.441470044161
  episodes_this_iter: 13
  episodes_total: 515
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.988
    learner:
      default_policy:
        cur_kl_coeff: 9.094947153254554e-14
        cur_lr: 4.999999873689376e-05
        entropy: 34.28214645385742
        entropy_coeff: 0.0
        kl: 0.00511997751891613
        policy_loss: -0.0075654941610991955
        total_loss: 3455.665771484375
        vf_explained_var: 0.00033993396209552884
        vf_loss: 3455.673828125
    load_time_ms: 14.496
    num_steps_sampled: 630000
    num_steps_trained: 628992
    sample_time_ms: 10698.482
    update_time_ms: 8.729
  iterations_since_restore: 42
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,42,586.671,630000,2351.93






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-24-34
  done: false
  episode_len_mean: 1162.53
  episode_reward_max: 3472.5353796577274
  episode_reward_mean: 2345.623956064064
  episode_reward_min: 848.441470044161
  episodes_this_iter: 12
  episodes_total: 527
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.753
    learner:
      default_policy:
        cur_kl_coeff: 4.547473576627277e-14
        cur_lr: 4.999999873689376e-05
        entropy: 34.270137786865234
        entropy_coeff: 0.0
        kl: 0.0032179683912545443
        policy_loss: -0.006223858334124088
        total_loss: 4486.49365234375
        vf_explained_var: 0.00038448561099357903
        vf_loss: 4486.5
    load_time_ms: 14.475
    num_steps_sampled: 645000
    num_steps_trained: 643968
    sample_time_ms: 10598.324
    update_time_ms: 8.815
  iterations_since_restore: 43
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,43,599.981,645000,2345.62






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-24-49
  done: false
  episode_len_mean: 1165.61
  episode_reward_max: 3523.542477189971
  episode_reward_mean: 2368.1949519248037
  episode_reward_min: 830.972175022633
  episodes_this_iter: 14
  episodes_total: 541
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.851
    learner:
      default_policy:
        cur_kl_coeff: 2.2737367883136385e-14
        cur_lr: 4.999999873689376e-05
        entropy: 34.257205963134766
        entropy_coeff: 0.0
        kl: 0.006029774900525808
        policy_loss: -0.008298478089272976
        total_loss: 3525.486083984375
        vf_explained_var: 0.0003595489833969623
        vf_loss: 3525.494873046875
    load_time_ms: 14.789
    num_steps_sampled: 660000
    num_steps_trained: 658944
    sample_time_ms: 10656.6
    update_time_ms: 8.743
  iterations_since_restore: 44
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,44,614.823,660000,2368.19






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-25-04
  done: false
  episode_len_mean: 1168.02
  episode_reward_max: 3523.542477189971
  episode_reward_mean: 2382.7113243514736
  episode_reward_min: 830.972175022633
  episodes_this_iter: 13
  episodes_total: 554
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.756
    learner:
      default_policy:
        cur_kl_coeff: 1.1368683941568192e-14
        cur_lr: 4.999999873689376e-05
        entropy: 34.206111907958984
        entropy_coeff: 0.0
        kl: 0.0045669302344322205
        policy_loss: -0.007431959267705679
        total_loss: 4343.76025390625
        vf_explained_var: 0.0012452913215383887
        vf_loss: 4343.767578125
    load_time_ms: 15.027
    num_steps_sampled: 675000
    num_steps_trained: 673920
    sample_time_ms: 10657.444
    update_time_ms: 8.594
  iterations_since_restore: 45
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,45,629.24,675000,2382.71






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-25-17
  done: false
  episode_len_mean: 1160.05
  episode_reward_max: 3823.971459155344
  episode_reward_mean: 2428.0948971788634
  episode_reward_min: 830.972175022633
  episodes_this_iter: 12
  episodes_total: 566
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.903
    learner:
      default_policy:
        cur_kl_coeff: 5.684341970784096e-15
        cur_lr: 4.999999873689376e-05
        entropy: 34.180789947509766
        entropy_coeff: 0.0
        kl: 0.005210080649703741
        policy_loss: -0.005731509067118168
        total_loss: 4218.666015625
        vf_explained_var: 0.0003632189764175564
        vf_loss: 4218.67236328125
    load_time_ms: 14.641
    num_steps_sampled: 690000
    num_steps_trained: 688896
    sample_time_ms: 10623.263
    update_time_ms: 8.802
  iterations_since_restore: 46
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,46,643.029,690000,2428.09






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-25-32
  done: false
  episode_len_mean: 1143.72
  episode_reward_max: 3823.971459155344
  episode_reward_mean: 2430.9372129848102
  episode_reward_min: 830.972175022633
  episodes_this_iter: 14
  episodes_total: 580
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.794
    learner:
      default_policy:
        cur_kl_coeff: 2.842170985392048e-15
        cur_lr: 4.999999873689376e-05
        entropy: 34.15691375732422
        entropy_coeff: 0.0
        kl: 0.004329792223870754
        policy_loss: -0.006076607387512922
        total_loss: 5072.3486328125
        vf_explained_var: 0.0005313657457008958
        vf_loss: 5072.3544921875
    load_time_ms: 14.687
    num_steps_sampled: 705000
    num_steps_trained: 703872
    sample_time_ms: 10581.104
    update_time_ms: 8.769
  iterations_since_restore: 47
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,47,657.252,705000,2430.94






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-25-45
  done: false
  episode_len_mean: 1160.14
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2493.3603482428275
  episode_reward_min: 830.972175022633
  episodes_this_iter: 12
  episodes_total: 592
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.406
    learner:
      default_policy:
        cur_kl_coeff: 1.421085492696024e-15
        cur_lr: 4.999999873689376e-05
        entropy: 34.24629592895508
        entropy_coeff: 0.0
        kl: 0.005160065367817879
        policy_loss: -0.007786164991557598
        total_loss: 3755.17919921875
        vf_explained_var: 0.0015296069905161858
        vf_loss: 3755.187255859375
    load_time_ms: 14.655
    num_steps_sampled: 720000
    num_steps_trained: 718848
    sample_time_ms: 10607.411
    update_time_ms: 8.717
  iterations_since_restore: 48
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,48,670.714,720000,2493.36






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-25-59
  done: false
  episode_len_mean: 1150.25
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2494.144071189573
  episode_reward_min: 830.972175022633
  episodes_this_iter: 13
  episodes_total: 605
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.802
    learner:
      default_policy:
        cur_kl_coeff: 7.10542746348012e-16
        cur_lr: 4.999999873689376e-05
        entropy: 34.222782135009766
        entropy_coeff: 0.0
        kl: 0.004947925917804241
        policy_loss: -0.008592679165303707
        total_loss: 4351.654296875
        vf_explained_var: 0.00023640322615392506
        vf_loss: 4351.6630859375
    load_time_ms: 14.508
    num_steps_sampled: 735000
    num_steps_trained: 733824
    sample_time_ms: 10614.636
    update_time_ms: 8.764
  iterations_since_restore: 49
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,49,684.104,735000,2494.14






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-26-12
  done: false
  episode_len_mean: 1158.81
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2522.844307190913
  episode_reward_min: 830.972175022633
  episodes_this_iter: 11
  episodes_total: 616
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.358
    learner:
      default_policy:
        cur_kl_coeff: 3.55271373174006e-16
        cur_lr: 4.999999873689376e-05
        entropy: 34.28549575805664
        entropy_coeff: 0.0
        kl: 0.004904798232018948
        policy_loss: -0.006036922801285982
        total_loss: 4328.28173828125
        vf_explained_var: 0.00044163272832520306
        vf_loss: 4328.28759765625
    load_time_ms: 13.878
    num_steps_sampled: 750000
    num_steps_trained: 748800
    sample_time_ms: 10494.052
    update_time_ms: 8.916
  iterations_since_restore: 50
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,50,697.287,750000,2522.84






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-26-26
  done: false
  episode_len_mean: 1161.09
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2538.335698638801
  episode_reward_min: 830.972175022633
  episodes_this_iter: 12
  episodes_total: 628
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.359
    learner:
      default_policy:
        cur_kl_coeff: 1.77635686587003e-16
        cur_lr: 4.999999873689376e-05
        entropy: 34.248207092285156
        entropy_coeff: 0.0
        kl: 0.004098993260413408
        policy_loss: -0.005935890134423971
        total_loss: 4561.35400390625
        vf_explained_var: 0.00012007228360744193
        vf_loss: 4561.35986328125
    load_time_ms: 14.668
    num_steps_sampled: 765000
    num_steps_trained: 763776
    sample_time_ms: 10553.283
    update_time_ms: 8.967
  iterations_since_restore: 51
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,51,711.347,765000,2538.34






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-26-39
  done: false
  episode_len_mean: 1183.21
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2615.8849411287138
  episode_reward_min: 1159.6632936119222
  episodes_this_iter: 12
  episodes_total: 640
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3202.93
    learner:
      default_policy:
        cur_kl_coeff: 8.88178432935015e-17
        cur_lr: 4.999999873689376e-05
        entropy: 34.201507568359375
        entropy_coeff: 0.0
        kl: 0.004181759897619486
        policy_loss: -0.006704001221805811
        total_loss: 5093.1767578125
        vf_explained_var: 0.0012961269821971655
        vf_loss: 5093.18310546875
    load_time_ms: 14.897
    num_steps_sampled: 780000
    num_steps_trained: 778752
    sample_time_ms: 10550.305
    update_time_ms: 8.992
  iterations_since_restore: 52
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,52,724.664,780000,2615.88






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-26-54
  done: false
  episode_len_mean: 1194.04
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2662.4299527210637
  episode_reward_min: 1159.6632936119222
  episodes_this_iter: 15
  episodes_total: 655
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.105
    learner:
      default_policy:
        cur_kl_coeff: 4.440892164675075e-17
        cur_lr: 4.999999873689376e-05
        entropy: 34.2403564453125
        entropy_coeff: 0.0
        kl: 0.005471518728882074
        policy_loss: -0.006621215026825666
        total_loss: 4421.6201171875
        vf_explained_var: 0.00020184475579299033
        vf_loss: 4421.62646484375
    load_time_ms: 15.101
    num_steps_sampled: 795000
    num_steps_trained: 793728
    sample_time_ms: 10662.613
    update_time_ms: 8.82
  iterations_since_restore: 53
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,53,739.064,795000,2662.43




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-27-07
  done: false
  episode_len_mean: 1190.39
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2664.862486332258
  episode_reward_min: 1302.9832225357786
  episodes_this_iter: 8
  episodes_total: 663
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3194.694
    learner:
      default_policy:
        cur_kl_coeff: 2.2204460823375376e-17
        cur_lr: 4.999999873689376e-05
        entropy: 34.096580505371094
        entropy_coeff: 0.0
        kl: 0.00482152821496129
        policy_loss: -0.00803071167320013
        total_loss: 5484.1982421875
        vf_explained_var: 0.00027854371001012623
        vf_loss: 5484.20703125
    load_time_ms: 15.158
    num_steps_sampled: 810000
    num_steps_trained: 808704
    sample_time_ms: 10502.95
    update_time_ms: 8.858
  iterations_since_restore: 54
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,54,752.256,810000,2664.86






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-27-21
  done: false
  episode_len_mean: 1234.25
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2779.0614535072928
  episode_reward_min: 1302.9832225357786
  episodes_this_iter: 14
  episodes_total: 677
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3198.039
    learner:
      default_policy:
        cur_kl_coeff: 1.1102230411687688e-17
        cur_lr: 4.999999873689376e-05
        entropy: 34.12976837158203
        entropy_coeff: 0.0
        kl: 0.00465679494664073
        policy_loss: -0.007022792939096689
        total_loss: 4632.2578125
        vf_explained_var: 0.0001289401261601597
        vf_loss: 4632.26513671875
    load_time_ms: 15.288
    num_steps_sampled: 825000
    num_steps_trained: 823680
    sample_time_ms: 10492.641
    update_time_ms: 8.905
  iterations_since_restore: 55
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,55,766.606,825000,2779.06






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-27-36
  done: false
  episode_len_mean: 1244.43
  episode_reward_max: 5222.305266420309
  episode_reward_mean: 2827.612379610093
  episode_reward_min: 1315.010858148405
  episodes_this_iter: 13
  episodes_total: 690
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3197.423
    learner:
      default_policy:
        cur_kl_coeff: 5.551115205843844e-18
        cur_lr: 4.999999873689376e-05
        entropy: 34.14225769042969
        entropy_coeff: 0.0
        kl: 0.004882914014160633
        policy_loss: -0.007565107196569443
        total_loss: 4783.4052734375
        vf_explained_var: 0.00019080414494965225
        vf_loss: 4783.41357421875
    load_time_ms: 15.723
    num_steps_sampled: 840000
    num_steps_trained: 838656
    sample_time_ms: 10574.722
    update_time_ms: 8.888
  iterations_since_restore: 56
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,56,781.216,840000,2827.61






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-27-49
  done: false
  episode_len_mean: 1228.63
  episode_reward_max: 5047.883706191511
  episode_reward_mean: 2814.671484176985
  episode_reward_min: 1315.010858148405
  episodes_this_iter: 14
  episodes_total: 704
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.004
    learner:
      default_policy:
        cur_kl_coeff: 2.775557602921922e-18
        cur_lr: 4.999999873689376e-05
        entropy: 34.11270523071289
        entropy_coeff: 0.0
        kl: 0.00487287575379014
        policy_loss: -0.0069445339031517506
        total_loss: 4292.095703125
        vf_explained_var: 0.0012616412714123726
        vf_loss: 4292.1025390625
    load_time_ms: 15.162
    num_steps_sampled: 855000
    num_steps_trained: 853632
    sample_time_ms: 10475.863
    update_time_ms: 9.051
  iterations_since_restore: 57
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,57,794.46,855000,2814.67






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-28-03
  done: false
  episode_len_mean: 1206.01
  episode_reward_max: 5047.883706191511
  episode_reward_mean: 2777.428905100354
  episode_reward_min: 1315.010858148405
  episodes_this_iter: 12
  episodes_total: 716
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3197.96
    learner:
      default_policy:
        cur_kl_coeff: 1.387778801460961e-18
        cur_lr: 4.999999873689376e-05
        entropy: 34.11088180541992
        entropy_coeff: 0.0
        kl: 0.00557288434356451
        policy_loss: -0.009956423193216324
        total_loss: 5540.8486328125
        vf_explained_var: -3.1295494409278035e-05
        vf_loss: 5540.85791015625
    load_time_ms: 15.954
    num_steps_sampled: 870000
    num_steps_trained: 868608
    sample_time_ms: 10567.628
    update_time_ms: 9.122
  iterations_since_restore: 58
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,58,808.836,870000,2777.43






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-28-18
  done: false
  episode_len_mean: 1193.78
  episode_reward_max: 5047.883706191511
  episode_reward_mean: 2790.486359146754
  episode_reward_min: 1511.2103496590055
  episodes_this_iter: 15
  episodes_total: 731
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3197.687
    learner:
      default_policy:
        cur_kl_coeff: 6.938894007304805e-19
        cur_lr: 4.999999873689376e-05
        entropy: 34.012630462646484
        entropy_coeff: 0.0
        kl: 0.004751990083605051
        policy_loss: -0.0052295527420938015
        total_loss: 5499.81640625
        vf_explained_var: 0.0008200417505577207
        vf_loss: 5499.822265625
    load_time_ms: 15.631
    num_steps_sampled: 885000
    num_steps_trained: 883584
    sample_time_ms: 10694.605
    update_time_ms: 8.915
  iterations_since_restore: 59
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,59,823.488,885000,2790.49






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-28-32
  done: false
  episode_len_mean: 1130.38
  episode_reward_max: 4394.003640841881
  episode_reward_mean: 2701.4427051927983
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 17
  episodes_total: 748
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3196.656
    learner:
      default_policy:
        cur_kl_coeff: 3.4694470036524025e-19
        cur_lr: 4.999999873689376e-05
        entropy: 34.006046295166016
        entropy_coeff: 0.0
        kl: 0.0038905064575374126
        policy_loss: -0.006853824947029352
        total_loss: 5120.79931640625
        vf_explained_var: 0.00048675903235562146
        vf_loss: 5120.80615234375
    load_time_ms: 15.741
    num_steps_sampled: 900000
    num_steps_trained: 898560
    sample_time_ms: 10800.183
    update_time_ms: 8.961
  iterations_since_restore: 60
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,60,837.714,900000,2701.44






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-28-46
  done: false
  episode_len_mean: 1132.59
  episode_reward_max: 4394.003640841881
  episode_reward_mean: 2705.5775542583388
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 13
  episodes_total: 761
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3192.397
    learner:
      default_policy:
        cur_kl_coeff: 1.7347235018262012e-19
        cur_lr: 4.999999873689376e-05
        entropy: 34.006202697753906
        entropy_coeff: 0.0
        kl: 0.005222780164331198
        policy_loss: -0.008412384428083897
        total_loss: 5546.109375
        vf_explained_var: 5.925211007706821e-05
        vf_loss: 5546.1171875
    load_time_ms: 15.03
    num_steps_sampled: 915000
    num_steps_trained: 913536
    sample_time_ms: 10732.885
    update_time_ms: 8.931
  iterations_since_restore: 61
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,61,851.052,915000,2705.58






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-28-59
  done: false
  episode_len_mean: 1096.12
  episode_reward_max: 4394.003640841881
  episode_reward_mean: 2672.493579474292
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 14
  episodes_total: 775
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3192.915
    learner:
      default_policy:
        cur_kl_coeff: 8.673617509131006e-20
        cur_lr: 4.999999873689376e-05
        entropy: 33.921791076660156
        entropy_coeff: 0.0
        kl: 0.00623610895127058
        policy_loss: -0.006262301467359066
        total_loss: 7362.19140625
        vf_explained_var: 0.0003057494177483022
        vf_loss: 7362.19677734375
    load_time_ms: 14.88
    num_steps_sampled: 930000
    num_steps_trained: 928512
    sample_time_ms: 10737.892
    update_time_ms: 9.111
  iterations_since_restore: 62
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,62,864.426,930000,2672.49






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-29-14
  done: false
  episode_len_mean: 1069.21
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2690.482881051753
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 13
  episodes_total: 788
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3189.309
    learner:
      default_policy:
        cur_kl_coeff: 4.336808754565503e-20
        cur_lr: 4.999999873689376e-05
        entropy: 33.93666458129883
        entropy_coeff: 0.0
        kl: 0.004814676009118557
        policy_loss: -0.006592577788978815
        total_loss: 6287.57763671875
        vf_explained_var: 2.621840212668758e-05
        vf_loss: 6287.58447265625
    load_time_ms: 14.262
    num_steps_sampled: 945000
    num_steps_trained: 943488
    sample_time_ms: 10747.572
    update_time_ms: 9.034
  iterations_since_restore: 63
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,63,878.876,945000,2690.48








Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-29-28
  done: false
  episode_len_mean: 1028.21
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2648.398071970398
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 17
  episodes_total: 805
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3191.578
    learner:
      default_policy:
        cur_kl_coeff: 2.1684043772827515e-20
        cur_lr: 4.999999873689376e-05
        entropy: 33.83816146850586
        entropy_coeff: 0.0
        kl: 0.005182808265089989
        policy_loss: -0.007801621221005917
        total_loss: 5555.30615234375
        vf_explained_var: 5.892504850635305e-05
        vf_loss: 5555.31396484375
    load_time_ms: 14.18
    num_steps_sampled: 960000
    num_steps_trained: 958464
    sample_time_ms: 10892.3
    update_time_ms: 8.883
  iterations_since_restore: 64
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,64,893.535,960000,2648.4






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-29-43
  done: false
  episode_len_mean: 1022.08
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2627.3714186765988
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 13
  episodes_total: 818
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3189.981
    learner:
      default_policy:
        cur_kl_coeff: 1.0842021886413758e-20
        cur_lr: 4.999999873689376e-05
        entropy: 33.86321258544922
        entropy_coeff: 0.0
        kl: 0.004638574086129665
        policy_loss: -0.007662082556635141
        total_loss: 5185.68115234375
        vf_explained_var: 8.540632552467287e-05
        vf_loss: 5185.68896484375
    load_time_ms: 13.876
    num_steps_sampled: 975000
    num_steps_trained: 973440
    sample_time_ms: 10889.236
    update_time_ms: 8.825
  iterations_since_restore: 65
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,65,907.833,975000,2627.37






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-29-57
  done: false
  episode_len_mean: 1025.62
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2638.1980045373507
  episode_reward_min: 1044.1944152020228
  episodes_this_iter: 16
  episodes_total: 834
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3191.104
    learner:
      default_policy:
        cur_kl_coeff: 5.421010943206879e-21
        cur_lr: 4.999999873689376e-05
        entropy: 33.84092330932617
        entropy_coeff: 0.0
        kl: 0.0052543990314006805
        policy_loss: -0.007474096026271582
        total_loss: 5712.61865234375
        vf_explained_var: 0.00023510567552875727
        vf_loss: 5712.62548828125
    load_time_ms: 13.822
    num_steps_sampled: 990000
    num_steps_trained: 988416
    sample_time_ms: 10868.688
    update_time_ms: 8.665
  iterations_since_restore: 66
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,66,922.246,990000,2638.2






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-30-11
  done: false
  episode_len_mean: 1044.85
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2653.4339207165085
  episode_reward_min: 1132.7105141491547
  episodes_this_iter: 16
  episodes_total: 850
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3190.501
    learner:
      default_policy:
        cur_kl_coeff: 2.7105054716034394e-21
        cur_lr: 4.999999873689376e-05
        entropy: 33.94505310058594
        entropy_coeff: 0.0
        kl: 0.004812614060938358
        policy_loss: -0.008095013909041882
        total_loss: 5178.54638671875
        vf_explained_var: 0.0005682014161720872
        vf_loss: 5178.5556640625
    load_time_ms: 14.202
    num_steps_sampled: 1005000
    num_steps_trained: 1003392
    sample_time_ms: 10951.649
    update_time_ms: 8.36
  iterations_since_restore: 67
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,67,936.314,1005000,2653.43






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-30-25
  done: false
  episode_len_mean: 1025.84
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2627.491084013533
  episode_reward_min: 1132.7105141491547
  episodes_this_iter: 12
  episodes_total: 862
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3189.183
    learner:
      default_policy:
        cur_kl_coeff: 1.3552527358017197e-21
        cur_lr: 4.999999873689376e-05
        entropy: 33.86417770385742
        entropy_coeff: 0.0
        kl: 0.007115734741091728
        policy_loss: -0.009247939102351665
        total_loss: 5369.43701171875
        vf_explained_var: 2.558211053838022e-05
        vf_loss: 5369.44677734375
    load_time_ms: 13.613
    num_steps_sampled: 1020000
    num_steps_trained: 1018368
    sample_time_ms: 10906.579
    update_time_ms: 8.453
  iterations_since_restore: 68
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,68,950.22,1020000,2627.49






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-30-40
  done: false
  episode_len_mean: 1020.74
  episode_reward_max: 4627.0521930749865
  episode_reward_mean: 2588.115399315732
  episode_reward_min: 1132.7105141491547
  episodes_this_iter: 13
  episodes_total: 875
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3187.613
    learner:
      default_policy:
        cur_kl_coeff: 6.776263679008599e-22
        cur_lr: 4.999999873689376e-05
        entropy: 33.93277359008789
        entropy_coeff: 0.0
        kl: 0.005674649961292744
        policy_loss: -0.0072809127159416676
        total_loss: 5741.0966796875
        vf_explained_var: 0.00046272206236608326
        vf_loss: 5741.103515625
    load_time_ms: 14.012
    num_steps_sampled: 1035000
    num_steps_trained: 1033344
    sample_time_ms: 10886.178
    update_time_ms: 8.617
  iterations_since_restore: 69
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,69,964.655,1035000,2588.12








Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-30-54
  done: false
  episode_len_mean: 1032.07
  episode_reward_max: 4386.339801262414
  episode_reward_mean: 2588.8861689503715
  episode_reward_min: 1286.3991898953402
  episodes_this_iter: 19
  episodes_total: 894
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3186.773
    learner:
      default_policy:
        cur_kl_coeff: 3.3881318395042993e-22
        cur_lr: 4.999999873689376e-05
        entropy: 33.88072204589844
        entropy_coeff: 0.0
        kl: 0.004960719496011734
        policy_loss: -0.007742301560938358
        total_loss: 7124.35400390625
        vf_explained_var: 0.00013821553147863597
        vf_loss: 7124.36279296875
    load_time_ms: 14.459
    num_steps_sampled: 1050000
    num_steps_trained: 1048320
    sample_time_ms: 10939.46
    update_time_ms: 8.636
  iterations_since_restore: 70
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,70,979.412,1050000,2588.89






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-31-09
  done: false
  episode_len_mean: 1032.45
  episode_reward_max: 4386.339801262414
  episode_reward_mean: 2580.9025785100703
  episode_reward_min: 1286.3991898953402
  episodes_this_iter: 15
  episodes_total: 909
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3191.787
    learner:
      default_policy:
        cur_kl_coeff: 1.6940659197521496e-22
        cur_lr: 4.999999873689376e-05
        entropy: 34.05192947387695
        entropy_coeff: 0.0
        kl: 0.005706950090825558
        policy_loss: -0.007559114135801792
        total_loss: 5400.55615234375
        vf_explained_var: 1.3857825251761824e-05
        vf_loss: 5400.5634765625
    load_time_ms: 14.984
    num_steps_sampled: 1065000
    num_steps_trained: 1063296
    sample_time_ms: 11020.539
    update_time_ms: 8.715
  iterations_since_restore: 71
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,71,993.618,1065000,2580.9




[2m[36m(pid=14852)[0m 


Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-31-22
  done: false
  episode_len_mean: 994.65
  episode_reward_max: 4386.339801262414
  episode_reward_mean: 2513.1138150891625
  episode_reward_min: 923.9499390747911
  episodes_this_iter: 13
  episodes_total: 922
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3198.06
    learner:
      default_policy:
        cur_kl_coeff: 8.470329598760748e-23
        cur_lr: 4.999999873689376e-05
        entropy: 34.03219985961914
        entropy_coeff: 0.0
        kl: 0.004757540766149759
        policy_loss: -0.007963969372212887
        total_loss: 7006.62646484375
        vf_explained_var: 4.567906216834672e-05
        vf_loss: 7006.634765625
    load_time_ms: 15.669
    num_steps_sampled: 1080000
    num_steps_trained: 1078272
    sample_time_ms: 11018.462
    update_time_ms: 8.551
  iterations_since_restore: 72
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,72,1007.04,1080000,2513.11






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-31-36
  done: false
  episode_len_mean: 1014.76
  episode_reward_max: 4721.040200467894
  episode_reward_mean: 2599.4390293563188
  episode_reward_min: 923.9499390747911
  episodes_this_iter: 17
  episodes_total: 939
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3201.862
    learner:
      default_policy:
        cur_kl_coeff: 4.235164799380374e-23
        cur_lr: 4.999999873689376e-05
        entropy: 34.001365661621094
        entropy_coeff: 0.0
        kl: 0.005033332854509354
        policy_loss: -0.007723943796008825
        total_loss: 6036.67041015625
        vf_explained_var: 5.228448208072223e-05
        vf_loss: 6036.677734375
    load_time_ms: 15.917
    num_steps_sampled: 1095000
    num_steps_trained: 1093248
    sample_time_ms: 11006.505
    update_time_ms: 8.492
  iterations_since_restore: 73
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,73,1021.41,1095000,2599.44






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-31-50
  done: false
  episode_len_mean: 989.35
  episode_reward_max: 4721.040200467894
  episode_reward_mean: 2572.9883060933553
  episode_reward_min: 923.9499390747911
  episodes_this_iter: 14
  episodes_total: 953
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3203.861
    learner:
      default_policy:
        cur_kl_coeff: 2.117582399690187e-23
        cur_lr: 4.999999873689376e-05
        entropy: 34.03175735473633
        entropy_coeff: 0.0
        kl: 0.005747227463871241
        policy_loss: -0.008693796582520008
        total_loss: 6725.85009765625
        vf_explained_var: 3.629260618254193e-06
        vf_loss: 6725.8583984375
    load_time_ms: 15.628
    num_steps_sampled: 1110000
    num_steps_trained: 1108224
    sample_time_ms: 10907.249
    update_time_ms: 8.538
  iterations_since_restore: 74
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,74,1035.1,1110000,2572.99






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-32-05
  done: false
  episode_len_mean: 1002.59
  episode_reward_max: 4721.040200467894
  episode_reward_mean: 2606.960288402936
  episode_reward_min: 923.9499390747911
  episodes_this_iter: 13
  episodes_total: 966
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.798
    learner:
      default_policy:
        cur_kl_coeff: 1.0587911998450935e-23
        cur_lr: 4.999999873689376e-05
        entropy: 34.052162170410156
        entropy_coeff: 0.0
        kl: 0.008405512198805809
        policy_loss: -0.005055007990449667
        total_loss: 5296.6162109375
        vf_explained_var: 3.746075526578352e-05
        vf_loss: 5296.62109375
    load_time_ms: 15.501
    num_steps_sampled: 1125000
    num_steps_trained: 1123200
    sample_time_ms: 10960.17
    update_time_ms: 8.57
  iterations_since_restore: 75
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,75,1049.94,1125000,2606.96








Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-32-20
  done: false
  episode_len_mean: 967.94
  episode_reward_max: 4721.040200467894
  episode_reward_mean: 2564.110255004649
  episode_reward_min: 923.9499390747911
  episodes_this_iter: 19
  episodes_total: 985
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.619
    learner:
      default_policy:
        cur_kl_coeff: 5.293955999225468e-24
        cur_lr: 4.999999873689376e-05
        entropy: 33.902950286865234
        entropy_coeff: 0.0
        kl: 0.006455190014094114
        policy_loss: -0.009721637703478336
        total_loss: 7924.90380859375
        vf_explained_var: 0.00012590741971507668
        vf_loss: 7924.9130859375
    load_time_ms: 15.704
    num_steps_sampled: 1140000
    num_steps_trained: 1138176
    sample_time_ms: 11043.412
    update_time_ms: 8.701
  iterations_since_restore: 76
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,76,1065.23,1140000,2564.11






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-32-35
  done: false
  episode_len_mean: 975.23
  episode_reward_max: 4721.040200467894
  episode_reward_mean: 2593.741725994943
  episode_reward_min: 923.9499390747911
  episodes_this_iter: 18
  episodes_total: 1003
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.964
    learner:
      default_policy:
        cur_kl_coeff: 2.646977999612734e-24
        cur_lr: 4.999999873689376e-05
        entropy: 33.902259826660156
        entropy_coeff: 0.0
        kl: 0.004571858793497086
        policy_loss: -0.006444551516324282
        total_loss: 6027.46728515625
        vf_explained_var: 0.0004656676610466093
        vf_loss: 6027.47412109375
    load_time_ms: 15.308
    num_steps_sampled: 1155000
    num_steps_trained: 1153152
    sample_time_ms: 11114.954
    update_time_ms: 8.897
  iterations_since_restore: 77
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,77,1080.01,1155000,2593.74






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-32-50
  done: false
  episode_len_mean: 970.48
  episode_reward_max: 4721.040200467894
  episode_reward_mean: 2634.1939141373678
  episode_reward_min: 939.5371777523453
  episodes_this_iter: 16
  episodes_total: 1019
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.501
    learner:
      default_policy:
        cur_kl_coeff: 1.323488999806367e-24
        cur_lr: 4.999999873689376e-05
        entropy: 33.93164825439453
        entropy_coeff: 0.0
        kl: 0.007032421883195639
        policy_loss: -0.009826266206800938
        total_loss: 7304.818359375
        vf_explained_var: 0.00020448571012821048
        vf_loss: 7304.828125
    load_time_ms: 14.982
    num_steps_sampled: 1170000
    num_steps_trained: 1168128
    sample_time_ms: 11207.004
    update_time_ms: 8.819
  iterations_since_restore: 78
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,78,1094.88,1170000,2634.19






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-33-03
  done: false
  episode_len_mean: 972.39
  episode_reward_max: 4504.736450241661
  episode_reward_mean: 2647.80534439307
  episode_reward_min: 939.5371777523453
  episodes_this_iter: 11
  episodes_total: 1030
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.588
    learner:
      default_policy:
        cur_kl_coeff: 6.617444999031835e-25
        cur_lr: 4.999999873689376e-05
        entropy: 33.91541290283203
        entropy_coeff: 0.0
        kl: 0.006538694724440575
        policy_loss: -0.008414401672780514
        total_loss: 7025.4462890625
        vf_explained_var: 1.2116044672438875e-05
        vf_loss: 7025.455078125
    load_time_ms: 14.468
    num_steps_sampled: 1185000
    num_steps_trained: 1183104
    sample_time_ms: 11087.969
    update_time_ms: 8.927
  iterations_since_restore: 79
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,79,1108.08,1185000,2647.81






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-33-18
  done: false
  episode_len_mean: 966.81
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2675.9733755300194
  episode_reward_min: 939.5371777523453
  episodes_this_iter: 15
  episodes_total: 1045
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.204
    learner:
      default_policy:
        cur_kl_coeff: 3.3087224995159173e-25
        cur_lr: 4.999999873689376e-05
        entropy: 33.84640884399414
        entropy_coeff: 0.0
        kl: 0.007581103127449751
        policy_loss: -0.007323131430894136
        total_loss: 8223.951171875
        vf_explained_var: 0.00023096952645573765
        vf_loss: 8223.9580078125
    load_time_ms: 14.493
    num_steps_sampled: 1200000
    num_steps_trained: 1198080
    sample_time_ms: 11053.392
    update_time_ms: 8.844
  iterations_since_restore: 80
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,80,1122.51,1200000,2675.97






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-33-32
  done: false
  episode_len_mean: 977.03
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2718.47043349201
  episode_reward_min: 939.5371777523453
  episodes_this_iter: 15
  episodes_total: 1060
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.049
    learner:
      default_policy:
        cur_kl_coeff: 1.6543612497579586e-25
        cur_lr: 4.999999873689376e-05
        entropy: 33.927608489990234
        entropy_coeff: 0.0
        kl: 0.004720563068985939
        policy_loss: -0.005812278017401695
        total_loss: 6778.14306640625
        vf_explained_var: 8.142503793351352e-05
        vf_loss: 6778.14892578125
    load_time_ms: 14.189
    num_steps_sampled: 1215000
    num_steps_trained: 1213056
    sample_time_ms: 11113.364
    update_time_ms: 8.715
  iterations_since_restore: 81
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,81,1137.28,1215000,2718.47






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-33-46
  done: false
  episode_len_mean: 995.76
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2761.678350952053
  episode_reward_min: 1114.8859998731798
  episodes_this_iter: 14
  episodes_total: 1074
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3202.896
    learner:
      default_policy:
        cur_kl_coeff: 8.271806248789793e-26
        cur_lr: 4.999999873689376e-05
        entropy: 33.910438537597656
        entropy_coeff: 0.0
        kl: 0.006322893314063549
        policy_loss: -0.005752264056354761
        total_loss: 8051.4990234375
        vf_explained_var: 2.043369477178203e-06
        vf_loss: 8051.5048828125
    load_time_ms: 13.545
    num_steps_sampled: 1230000
    num_steps_trained: 1228032
    sample_time_ms: 11108.771
    update_time_ms: 8.629
  iterations_since_restore: 82
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,82,1150.6,1230000,2761.68






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-34-00
  done: false
  episode_len_mean: 1025.41
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2836.7601705268935
  episode_reward_min: 1114.8859998731798
  episodes_this_iter: 13
  episodes_total: 1087
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3202.074
    learner:
      default_policy:
        cur_kl_coeff: 4.1359031243948966e-26
        cur_lr: 4.999999873689376e-05
        entropy: 33.918487548828125
        entropy_coeff: 0.0
        kl: 0.00384651031345129
        policy_loss: -0.006142349913716316
        total_loss: 6741.55029296875
        vf_explained_var: 0.00011448626173660159
        vf_loss: 6741.55615234375
    load_time_ms: 13.295
    num_steps_sampled: 1245000
    num_steps_trained: 1243008
    sample_time_ms: 11082.183
    update_time_ms: 8.653
  iterations_since_restore: 83
  node_ip: 192.168.107.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,83,1164.7,1245000,2836.76






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-34-14
  done: false
  episode_len_mean: 1040.45
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2891.303022961818
  episode_reward_min: 1221.319451664544
  episodes_this_iter: 13
  episodes_total: 1100
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.609
    learner:
      default_policy:
        cur_kl_coeff: 2.0679515621974483e-26
        cur_lr: 4.999999873689376e-05
        entropy: 33.793617248535156
        entropy_coeff: 0.0
        kl: 0.007099559064954519
        policy_loss: -0.007173670455813408
        total_loss: 6907.765625
        vf_explained_var: 3.3091786463046446e-05
        vf_loss: 6907.7724609375
    load_time_ms: 13.871
    num_steps_sampled: 1260000
    num_steps_trained: 1257984
    sample_time_ms: 11140.897
    update_time_ms: 8.888
  iterations_since_restore: 84
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,84,1178.97,1260000,2891.3






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-34-28
  done: false
  episode_len_mean: 1068.13
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2937.479206506716
  episode_reward_min: 1221.319451664544
  episodes_this_iter: 14
  episodes_total: 1114
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.492
    learner:
      default_policy:
        cur_kl_coeff: 1.0339757810987241e-26
        cur_lr: 4.999999873689376e-05
        entropy: 33.9090690612793
        entropy_coeff: 0.0
        kl: 0.004323728382587433
        policy_loss: -0.007985815405845642
        total_loss: 6608.6982421875
        vf_explained_var: 9.321249308413826e-06
        vf_loss: 6608.7080078125
    load_time_ms: 14.249
    num_steps_sampled: 1275000
    num_steps_trained: 1272960
    sample_time_ms: 11080.407
    update_time_ms: 9.012
  iterations_since_restore: 85
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,85,1193.19,1275000,2937.48






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-34-43
  done: false
  episode_len_mean: 1067.79
  episode_reward_max: 5393.495504548492
  episode_reward_mean: 2951.816585040372
  episode_reward_min: 1221.319451664544
  episodes_this_iter: 13
  episodes_total: 1127
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.502
    learner:
      default_policy:
        cur_kl_coeff: 5.169878905493621e-27
        cur_lr: 4.999999873689376e-05
        entropy: 33.8161735534668
        entropy_coeff: 0.0
        kl: 0.005795715376734734
        policy_loss: -0.00740985618904233
        total_loss: 8582.005859375
        vf_explained_var: 1.0093052878801245e-05
        vf_loss: 8582.0126953125
    load_time_ms: 13.662
    num_steps_sampled: 1290000
    num_steps_trained: 1287936
    sample_time_ms: 10971.531
    update_time_ms: 8.948
  iterations_since_restore: 86
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,86,1207.38,1290000,2951.82






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-34-56
  done: false
  episode_len_mean: 1101.91
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 3039.4225643974637
  episode_reward_min: 1340.011633636725
  episodes_this_iter: 14
  episodes_total: 1141
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.824
    learner:
      default_policy:
        cur_kl_coeff: 2.5849394527468104e-27
        cur_lr: 4.999999873689376e-05
        entropy: 33.834373474121094
        entropy_coeff: 0.0
        kl: 0.005586844403296709
        policy_loss: -0.009549886919558048
        total_loss: 6819.45849609375
        vf_explained_var: 0.00011999841080978513
        vf_loss: 6819.4677734375
    load_time_ms: 14.275
    num_steps_sampled: 1305000
    num_steps_trained: 1302912
    sample_time_ms: 10845.388
    update_time_ms: 9.019
  iterations_since_restore: 87
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,87,1220.92,1305000,3039.42






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-35-11
  done: false
  episode_len_mean: 1099.13
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 3032.7397517692116
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 13
  episodes_total: 1154
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.008
    learner:
      default_policy:
        cur_kl_coeff: 1.2924697263734052e-27
        cur_lr: 4.999999873689376e-05
        entropy: 33.85645294189453
        entropy_coeff: 0.0
        kl: 0.007135145831853151
        policy_loss: -0.008026973344385624
        total_loss: 6371.31884765625
        vf_explained_var: 9.653914503360284e-07
        vf_loss: 6371.32763671875
    load_time_ms: 14.673
    num_steps_sampled: 1320000
    num_steps_trained: 1317888
    sample_time_ms: 10805.024
    update_time_ms: 8.907
  iterations_since_restore: 88
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,88,1235.38,1320000,3032.74






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-35-24
  done: false
  episode_len_mean: 1102.22
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 3009.1957315234454
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 15
  episodes_total: 1169
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.748
    learner:
      default_policy:
        cur_kl_coeff: 6.462348631867026e-28
        cur_lr: 4.999999873689376e-05
        entropy: 33.89148712158203
        entropy_coeff: 0.0
        kl: 0.006054412107914686
        policy_loss: -0.007104792166501284
        total_loss: 7091.923828125
        vf_explained_var: 3.1233853405865375e-06
        vf_loss: 7091.9306640625
    load_time_ms: 15.263
    num_steps_sampled: 1335000
    num_steps_trained: 1332864
    sample_time_ms: 10821.369
    update_time_ms: 8.912
  iterations_since_restore: 89
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,89,1248.84,1335000,3009.2






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-35-39
  done: false
  episode_len_mean: 1095.08
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 3016.793459204851
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 16
  episodes_total: 1185
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.463
    learner:
      default_policy:
        cur_kl_coeff: 3.231174315933513e-28
        cur_lr: 4.999999873689376e-05
        entropy: 33.8347053527832
        entropy_coeff: 0.0
        kl: 0.005528660956770182
        policy_loss: -0.006159741431474686
        total_loss: 6037.8642578125
        vf_explained_var: 3.30214825225994e-05
        vf_loss: 6037.87109375
    load_time_ms: 15.115
    num_steps_sampled: 1350000
    num_steps_trained: 1347840
    sample_time_ms: 10861.263
    update_time_ms: 8.715
  iterations_since_restore: 90
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,90,1263.65,1350000,3016.79






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-35-52
  done: false
  episode_len_mean: 1089.02
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 2989.8999461014323
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 10
  episodes_total: 1195
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.136
    learner:
      default_policy:
        cur_kl_coeff: 1.6155871579667565e-28
        cur_lr: 4.999999873689376e-05
        entropy: 33.868858337402344
        entropy_coeff: 0.0
        kl: 0.006741335149854422
        policy_loss: -0.004363319370895624
        total_loss: 8063.90380859375
        vf_explained_var: 1.0375283636676613e-05
        vf_loss: 8063.90869140625
    load_time_ms: 15.399
    num_steps_sampled: 1365000
    num_steps_trained: 1362816
    sample_time_ms: 10702.24
    update_time_ms: 8.857
  iterations_since_restore: 91
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,91,1276.85,1365000,2989.9






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-36-08
  done: false
  episode_len_mean: 1077.86
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 2969.436508361693
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 20
  episodes_total: 1215
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.952
    learner:
      default_policy:
        cur_kl_coeff: 8.077935789833782e-29
        cur_lr: 4.999999873689376e-05
        entropy: 33.80921936035156
        entropy_coeff: 0.0
        kl: 0.005911604966968298
        policy_loss: -0.0074473414570093155
        total_loss: 6547.17138671875
        vf_explained_var: 2.2154077669256367e-05
        vf_loss: 6547.1787109375
    load_time_ms: 15.415
    num_steps_sampled: 1380000
    num_steps_trained: 1377792
    sample_time_ms: 10905.203
    update_time_ms: 8.916
  iterations_since_restore: 92
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,92,1292.18,1380000,2969.44






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-36-21
  done: false
  episode_len_mean: 1097.2
  episode_reward_max: 4720.659208961551
  episode_reward_mean: 3018.50203644692
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 9
  episodes_total: 1224
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.36
    learner:
      default_policy:
        cur_kl_coeff: 4.038967894916891e-29
        cur_lr: 4.999999873689376e-05
        entropy: 33.76314926147461
        entropy_coeff: 0.0
        kl: 0.007482707034796476
        policy_loss: -0.008189043030142784
        total_loss: 5966.40185546875
        vf_explained_var: 3.921374445781112e-05
        vf_loss: 5966.40966796875
    load_time_ms: 15.613
    num_steps_sampled: 1395000
    num_steps_trained: 1392768
    sample_time_ms: 10830.719
    update_time_ms: 9.053
  iterations_since_restore: 93
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,93,1305.55,1395000,3018.5






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-36-36
  done: false
  episode_len_mean: 1080.29
  episode_reward_max: 4654.542748701891
  episode_reward_mean: 2895.801766140307
  episode_reward_min: 1258.901259865751
  episodes_this_iter: 14
  episodes_total: 1238
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.659
    learner:
      default_policy:
        cur_kl_coeff: 2.0194839474584456e-29
        cur_lr: 4.999999873689376e-05
        entropy: 33.73035430908203
        entropy_coeff: 0.0
        kl: 0.005563142243772745
        policy_loss: -0.00802732165902853
        total_loss: 6088.7734375
        vf_explained_var: 0.0001418437750544399
        vf_loss: 6088.7822265625
    load_time_ms: 15.398
    num_steps_sampled: 1410000
    num_steps_trained: 1407744
    sample_time_ms: 10891.147
    update_time_ms: 8.88
  iterations_since_restore: 94
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,94,1320.43,1410000,2895.8






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-36-50
  done: false
  episode_len_mean: 1094.78
  episode_reward_max: 4959.574967549573
  episode_reward_mean: 2925.231404939388
  episode_reward_min: 1269.0982724564099
  episodes_this_iter: 12
  episodes_total: 1250
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.929
    learner:
      default_policy:
        cur_kl_coeff: 1.0097419737292228e-29
        cur_lr: 4.999999873689376e-05
        entropy: 33.67770004272461
        entropy_coeff: 0.0
        kl: 0.008495545014739037
        policy_loss: -0.007098873611539602
        total_loss: 7000.927734375
        vf_explained_var: 4.847334821533877e-06
        vf_loss: 7000.935546875
    load_time_ms: 14.78
    num_steps_sampled: 1425000
    num_steps_trained: 1422720
    sample_time_ms: 10905.374
    update_time_ms: 8.995
  iterations_since_restore: 95
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,95,1334.79,1425000,2925.23






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-37-04
  done: false
  episode_len_mean: 1109.13
  episode_reward_max: 4959.574967549573
  episode_reward_mean: 2971.4741265008106
  episode_reward_min: 1269.0982724564099
  episodes_this_iter: 12
  episodes_total: 1262
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.379
    learner:
      default_policy:
        cur_kl_coeff: 5.048709868646114e-30
        cur_lr: 4.999999873689376e-05
        entropy: 33.6562385559082
        entropy_coeff: 0.0
        kl: 0.007175872102379799
        policy_loss: -0.008624250069260597
        total_loss: 7819.12939453125
        vf_explained_var: 7.296219791896874e-06
        vf_loss: 7819.1376953125
    load_time_ms: 15.083
    num_steps_sampled: 1440000
    num_steps_trained: 1437696
    sample_time_ms: 10820.298
    update_time_ms: 8.847
  iterations_since_restore: 96
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,96,1348.11,1440000,2971.47






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-37-18
  done: false
  episode_len_mean: 1111.92
  episode_reward_max: 4959.574967549573
  episode_reward_mean: 3012.786450069543
  episode_reward_min: 1106.7867912976258
  episodes_this_iter: 15
  episodes_total: 1277
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3211.169
    learner:
      default_policy:
        cur_kl_coeff: 2.524354934323057e-30
        cur_lr: 4.999999873689376e-05
        entropy: 33.56068801879883
        entropy_coeff: 0.0
        kl: 0.006642816588282585
        policy_loss: -0.008351963013410568
        total_loss: 7156.91357421875
        vf_explained_var: 1.3788031537842471e-05
        vf_loss: 7156.9228515625
    load_time_ms: 15.064
    num_steps_sampled: 1455000
    num_steps_trained: 1452672
    sample_time_ms: 10889.413
    update_time_ms: 8.672
  iterations_since_restore: 97
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,97,1362.37,1455000,3012.79






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-37-33
  done: false
  episode_len_mean: 1117.66
  episode_reward_max: 4959.574967549573
  episode_reward_mean: 2999.6195608205894
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 17
  episodes_total: 1294
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.481
    learner:
      default_policy:
        cur_kl_coeff: 1.2621774671615285e-30
        cur_lr: 4.999999873689376e-05
        entropy: 33.47092056274414
        entropy_coeff: 0.0
        kl: 0.0068502118811011314
        policy_loss: -0.007549270521849394
        total_loss: 6538.5693359375
        vf_explained_var: 4.130245361011475e-05
        vf_loss: 6538.5771484375
    load_time_ms: 14.525
    num_steps_sampled: 1470000
    num_steps_trained: 1467648
    sample_time_ms: 10905.166
    update_time_ms: 8.725
  iterations_since_restore: 98
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,98,1376.96,1470000,2999.62






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-37-46
  done: false
  episode_len_mean: 1097.09
  episode_reward_max: 4959.574967549573
  episode_reward_mean: 3006.7360956935904
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 11
  episodes_total: 1305
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.362
    learner:
      default_policy:
        cur_kl_coeff: 6.3108873358076425e-31
        cur_lr: 4.999999873689376e-05
        entropy: 33.5351448059082
        entropy_coeff: 0.0
        kl: 0.007566096726804972
        policy_loss: -0.007288618013262749
        total_loss: 8096.63525390625
        vf_explained_var: 3.9392049075104296e-05
        vf_loss: 8096.642578125
    load_time_ms: 14.339
    num_steps_sampled: 1485000
    num_steps_trained: 1482624
    sample_time_ms: 10907.212
    update_time_ms: 8.678
  iterations_since_restore: 99
  node_ip: 192.168.107.157


Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,99,1390.4,1485000,3006.74






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-38-00
  done: false
  episode_len_mean: 1107.85
  episode_reward_max: 4959.574967549573
  episode_reward_mean: 3035.092105554306
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 13
  episodes_total: 1318
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.672
    learner:
      default_policy:
        cur_kl_coeff: 3.1554436679038213e-31
        cur_lr: 4.999999873689376e-05
        entropy: 33.55628204345703
        entropy_coeff: 0.0
        kl: 0.004523781128227711
        policy_loss: -0.005230198614299297
        total_loss: 8626.845703125
        vf_explained_var: 3.4468805552023696e-06
        vf_loss: 8626.8515625
    load_time_ms: 14.741
    num_steps_sampled: 1500000
    num_steps_trained: 1497600
    sample_time_ms: 10858.9
    update_time_ms: 9.038
  iterations_since_restore: 100
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,100,1404.75,1500000,3035.09






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-38-15
  done: false
  episode_len_mean: 1117.08
  episode_reward_max: 5399.5034073792285
  episode_reward_mean: 3091.221962410559
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 13
  episodes_total: 1331
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.208
    learner:
      default_policy:
        cur_kl_coeff: 1.5777218339519106e-31
        cur_lr: 4.999999873689376e-05
        entropy: 33.637115478515625
        entropy_coeff: 0.0
        kl: 0.007653938606381416
        policy_loss: -0.0063405707478523254
        total_loss: 7351.81982421875
        vf_explained_var: 2.4253995434264652e-05
        vf_loss: 7351.82763671875
    load_time_ms: 14.804
    num_steps_sampled: 1515000
    num_steps_trained: 1512576
    sample_time_ms: 10963.283
    update_time_ms: 8.947
  iterations_since_restore: 101
  node_ip: 192.168.10

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,101,1419.01,1515000,3091.22






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-38-29
  done: false
  episode_len_mean: 1115.56
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3167.7091013608856
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 14
  episodes_total: 1345
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.088
    learner:
      default_policy:
        cur_kl_coeff: 7.888609169759553e-32
        cur_lr: 4.999999873689376e-05
        entropy: 33.58855438232422
        entropy_coeff: 0.0
        kl: 0.005586362909525633
        policy_loss: -0.00778333330526948
        total_loss: 6811.4423828125
        vf_explained_var: 1.2333576933087897e-06
        vf_loss: 6811.44970703125
    load_time_ms: 14.777
    num_steps_sampled: 1530000
    num_steps_trained: 1527552
    sample_time_ms: 10830.705
    update_time_ms: 8.988
  iterations_since_restore: 102
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,102,1433.03,1530000,3167.71






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-38-42
  done: false
  episode_len_mean: 1102.48
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3109.097846869731
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 12
  episodes_total: 1357
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.629
    learner:
      default_policy:
        cur_kl_coeff: 3.9443045848797766e-32
        cur_lr: 4.999999873689376e-05
        entropy: 33.63173294067383
        entropy_coeff: 0.0
        kl: 0.006032771896570921
        policy_loss: -0.006416637450456619
        total_loss: 7582.47509765625
        vf_explained_var: 3.148347786918748e-07
        vf_loss: 7582.48046875
    load_time_ms: 15.123
    num_steps_sampled: 1545000
    num_steps_trained: 1542528
    sample_time_ms: 10818.946
    update_time_ms: 8.881
  iterations_since_restore: 103
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,103,1446.29,1545000,3109.1






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-38-56
  done: false
  episode_len_mean: 1121.39
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3161.5235450972077
  episode_reward_min: 931.6807419056057
  episodes_this_iter: 14
  episodes_total: 1371
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.202
    learner:
      default_policy:
        cur_kl_coeff: 1.9721522924398883e-32
        cur_lr: 4.999999873689376e-05
        entropy: 33.61213302612305
        entropy_coeff: 0.0
        kl: 0.006104090251028538
        policy_loss: -0.006875457242131233
        total_loss: 7388.3076171875
        vf_explained_var: 7.655019726371393e-05
        vf_loss: 7388.3134765625
    load_time_ms: 15.255
    num_steps_sampled: 1560000
    num_steps_trained: 1557504
    sample_time_ms: 10702.977
    update_time_ms: 8.963
  iterations_since_restore: 104
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,104,1460.01,1560000,3161.52






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-39-10
  done: false
  episode_len_mean: 1111.61
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3154.669616060491
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 14
  episodes_total: 1385
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.835
    learner:
      default_policy:
        cur_kl_coeff: 9.860761462199441e-33
        cur_lr: 4.999999873689376e-05
        entropy: 33.6093864440918
        entropy_coeff: 0.0
        kl: 0.006838678382337093
        policy_loss: -0.006880303379148245
        total_loss: 7772.802734375
        vf_explained_var: 1.1157276276207995e-05
        vf_loss: 7772.80859375
    load_time_ms: 15.706
    num_steps_sampled: 1575000
    num_steps_trained: 1572480
    sample_time_ms: 10689.876
    update_time_ms: 8.76
  iterations_since_restore: 105
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,105,1474.27,1575000,3154.67






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-39-25
  done: false
  episode_len_mean: 1111.77
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3169.7163129249925
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 15
  episodes_total: 1400
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3215.058
    learner:
      default_policy:
        cur_kl_coeff: 4.930380731099721e-33
        cur_lr: 4.999999873689376e-05
        entropy: 33.54035949707031
        entropy_coeff: 0.0
        kl: 0.006684566382318735
        policy_loss: -0.007062532007694244
        total_loss: 7300.18359375
        vf_explained_var: 5.995615993015235e-06
        vf_loss: 7300.18994140625
    load_time_ms: 15.311
    num_steps_sampled: 1590000
    num_steps_trained: 1587456
    sample_time_ms: 10844.639
    update_time_ms: 9.02
  iterations_since_restore: 106
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,106,1489.17,1590000,3169.72




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-39-39
  done: false
  episode_len_mean: 1121.32
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3189.090174398776
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 9
  episodes_total: 1409
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.016
    learner:
      default_policy:
        cur_kl_coeff: 2.4651903655498604e-33
        cur_lr: 4.999999873689376e-05
        entropy: 33.485469818115234
        entropy_coeff: 0.0
        kl: 0.0058462657034397125
        policy_loss: -0.007463091518729925
        total_loss: 7061.87744140625
        vf_explained_var: 3.567465319065377e-05
        vf_loss: 7061.884765625
    load_time_ms: 15.427
    num_steps_sampled: 1605000
    num_steps_trained: 1602432
    sample_time_ms: 10870.258
    update_time_ms: 9.137
  iterations_since_restore: 107
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,107,1503.63,1605000,3189.09






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-39-54
  done: false
  episode_len_mean: 1134.23
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3183.16143617341
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 14
  episodes_total: 1423
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.86
    learner:
      default_policy:
        cur_kl_coeff: 1.2325951827749302e-33
        cur_lr: 4.999999873689376e-05
        entropy: 33.43513107299805
        entropy_coeff: 0.0
        kl: 0.006756259128451347
        policy_loss: -0.007976830936968327
        total_loss: 7314.8125
        vf_explained_var: 1.05383051050012e-05
        vf_loss: 7314.8203125
    load_time_ms: 16.003
    num_steps_sampled: 1620000
    num_steps_trained: 1617408
    sample_time_ms: 10842.765
    update_time_ms: 9.115
  iterations_since_restore: 108
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,108,1517.96,1620000,3183.16






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-40-08
  done: false
  episode_len_mean: 1131.34
  episode_reward_max: 5702.948315201679
  episode_reward_mean: 3159.396051241415
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 14
  episodes_total: 1437
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.173
    learner:
      default_policy:
        cur_kl_coeff: 6.162975913874651e-34
        cur_lr: 4.999999873689376e-05
        entropy: 33.27067565917969
        entropy_coeff: 0.0
        kl: 0.010963058099150658
        policy_loss: -0.006541367154568434
        total_loss: 7689.56298828125
        vf_explained_var: 2.1340499642974464e-06
        vf_loss: 7689.56884765625
    load_time_ms: 15.595
    num_steps_sampled: 1635000
    num_steps_trained: 1632384
    sample_time_ms: 10895.995
    update_time_ms: 9.021
  iterations_since_restore: 109
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,109,1531.92,1635000,3159.4






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-40-21
  done: false
  episode_len_mean: 1166.18
  episode_reward_max: 4684.775481093586
  episode_reward_mean: 3261.5128542081766
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 12
  episodes_total: 1449
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.054
    learner:
      default_policy:
        cur_kl_coeff: 6.162975913874651e-34
        cur_lr: 4.999999873689376e-05
        entropy: 33.24433135986328
        entropy_coeff: 0.0
        kl: 0.0061359768733382225
        policy_loss: -0.007847312837839127
        total_loss: 7485.00244140625
        vf_explained_var: 9.154662166110938e-07
        vf_loss: 7485.0107421875
    load_time_ms: 15.335
    num_steps_sampled: 1650000
    num_steps_trained: 1647360
    sample_time_ms: 10816.088
    update_time_ms: 8.912
  iterations_since_restore: 110
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,110,1545.51,1650000,3261.51






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-40-36
  done: false
  episode_len_mean: 1150.2
  episode_reward_max: 4868.256654122747
  episode_reward_mean: 3243.853172851882
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 13
  episodes_total: 1462
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.808
    learner:
      default_policy:
        cur_kl_coeff: 3.0814879569373254e-34
        cur_lr: 4.999999873689376e-05
        entropy: 33.20346450805664
        entropy_coeff: 0.0
        kl: 0.006054229568690062
        policy_loss: -0.008255749940872192
        total_loss: 8474.6640625
        vf_explained_var: 1.80342254907373e-07
        vf_loss: 8474.6728515625
    load_time_ms: 15.576
    num_steps_sampled: 1665000
    num_steps_trained: 1662336
    sample_time_ms: 10806.006
    update_time_ms: 8.993
  iterations_since_restore: 111
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,111,1559.66,1665000,3243.85






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-40-49
  done: false
  episode_len_mean: 1163.45
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3287.740342742979
  episode_reward_min: 1299.1575890175116
  episodes_this_iter: 14
  episodes_total: 1476
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.714
    learner:
      default_policy:
        cur_kl_coeff: 1.5407439784686627e-34
        cur_lr: 4.999999873689376e-05
        entropy: 33.29183578491211
        entropy_coeff: 0.0
        kl: 0.0061942036263644695
        policy_loss: -0.00945048127323389
        total_loss: 7262.01220703125
        vf_explained_var: 4.645086846721824e-06
        vf_loss: 7262.02197265625
    load_time_ms: 15.924
    num_steps_sampled: 1680000
    num_steps_trained: 1677312
    sample_time_ms: 10731.168
    update_time_ms: 9.052
  iterations_since_restore: 112
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,112,1572.95,1680000,3287.74






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-41-03
  done: false
  episode_len_mean: 1162.45
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3268.4648915151643
  episode_reward_min: 1382.469131605557
  episodes_this_iter: 14
  episodes_total: 1490
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3216.549
    learner:
      default_policy:
        cur_kl_coeff: 7.703719892343314e-35
        cur_lr: 4.999999873689376e-05
        entropy: 33.2354850769043
        entropy_coeff: 0.0
        kl: 0.006303480360656977
        policy_loss: -0.00741534773260355
        total_loss: 6789.07373046875
        vf_explained_var: 6.57179413110498e-08
        vf_loss: 6789.08154296875
    load_time_ms: 15.426
    num_steps_sampled: 1695000
    num_steps_trained: 1692288
    sample_time_ms: 10836.618
    update_time_ms: 9.059
  iterations_since_restore: 113
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,113,1587.28,1695000,3268.46






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-41-18
  done: false
  episode_len_mean: 1131.05
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3209.709258293654
  episode_reward_min: 1382.469131605557
  episodes_this_iter: 15
  episodes_total: 1505
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3215.024
    learner:
      default_policy:
        cur_kl_coeff: 3.851859946171657e-35
        cur_lr: 4.999999873689376e-05
        entropy: 33.15487289428711
        entropy_coeff: 0.0
        kl: 0.007653933949768543
        policy_loss: -0.0074725584127008915
        total_loss: 8693.6279296875
        vf_explained_var: 5.410267931438284e-07
        vf_loss: 8693.634765625
    load_time_ms: 15.259
    num_steps_sampled: 1710000
    num_steps_trained: 1707264
    sample_time_ms: 10898.859
    update_time_ms: 8.668
  iterations_since_restore: 114
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,114,1601.6,1710000,3209.71






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-41-31
  done: false
  episode_len_mean: 1118.66
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3220.1831872436674
  episode_reward_min: 1382.469131605557
  episodes_this_iter: 11
  episodes_total: 1516
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3214.958
    learner:
      default_policy:
        cur_kl_coeff: 1.9259299730858284e-35
        cur_lr: 4.999999873689376e-05
        entropy: 33.20631408691406
        entropy_coeff: 0.0
        kl: 0.0074579124338924885
        policy_loss: -0.006270637270063162
        total_loss: 7592.9658203125
        vf_explained_var: 1.0725779247877654e-05
        vf_loss: 7592.97216796875
    load_time_ms: 15.125
    num_steps_sampled: 1725000
    num_steps_trained: 1722240
    sample_time_ms: 10850.551
    update_time_ms: 8.634
  iterations_since_restore: 115
  node_ip: 192.168.107.

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,115,1615.38,1725000,3220.18






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-41-45
  done: false
  episode_len_mean: 1128.53
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3221.1569523085723
  episode_reward_min: 1382.469131605557
  episodes_this_iter: 14
  episodes_total: 1530
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3214.149
    learner:
      default_policy:
        cur_kl_coeff: 9.629649865429142e-36
        cur_lr: 4.999999873689376e-05
        entropy: 33.2099723815918
        entropy_coeff: 0.0
        kl: 0.007758496329188347
        policy_loss: -0.008295116014778614
        total_loss: 8524.1708984375
        vf_explained_var: 1.538920696475543e-05
        vf_loss: 8524.1787109375
    load_time_ms: 15.473
    num_steps_sampled: 1740000
    num_steps_trained: 1737216
    sample_time_ms: 10729.243
    update_time_ms: 8.57
  iterations_since_restore: 116
  node_ip: 192.168.107.157
  

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,116,1629.06,1740000,3221.16








Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-41-59
  done: false
  episode_len_mean: 1092.78
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3163.368679553072
  episode_reward_min: 1403.0786027303466
  episodes_this_iter: 17
  episodes_total: 1547
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3215.56
    learner:
      default_policy:
        cur_kl_coeff: 4.814824932714571e-36
        cur_lr: 4.999999873689376e-05
        entropy: 33.217491149902344
        entropy_coeff: 0.0
        kl: 0.0056132664903998375
        policy_loss: -0.008707115426659584
        total_loss: 7014.1611328125
        vf_explained_var: 1.9104052739749022e-07
        vf_loss: 7014.16943359375
    load_time_ms: 14.729
    num_steps_sampled: 1755000
    num_steps_trained: 1752192
    sample_time_ms: 10690.551
    update_time_ms: 8.676
  iterations_since_restore: 117
  node_ip: 192.168.107.1

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,117,1643.14,1755000,3163.37






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-42-14
  done: false
  episode_len_mean: 1057.1
  episode_reward_max: 5449.226558330097
  episode_reward_mean: 3102.4059251726717
  episode_reward_min: 1403.0786027303466
  episodes_this_iter: 15
  episodes_total: 1562
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3214.058
    learner:
      default_policy:
        cur_kl_coeff: 2.4074124663572855e-36
        cur_lr: 4.999999873689376e-05
        entropy: 33.06761169433594
        entropy_coeff: 0.0
        kl: 0.00902444776147604
        policy_loss: -0.008387893438339233
        total_loss: 9524.787109375
        vf_explained_var: 9.883163443191734e-08
        vf_loss: 9524.7939453125
    load_time_ms: 14.499
    num_steps_sampled: 1770000
    num_steps_trained: 1767168
    sample_time_ms: 10695.256
    update_time_ms: 8.704
  iterations_since_restore: 118
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,118,1657.5,1770000,3102.41






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-42-27
  done: false
  episode_len_mean: 1044.14
  episode_reward_max: 5169.4487483332105
  episode_reward_mean: 3070.1646435209077
  episode_reward_min: 1403.0786027303466
  episodes_this_iter: 12
  episodes_total: 1574
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.304
    learner:
      default_policy:
        cur_kl_coeff: 1.2037062331786428e-36
        cur_lr: 4.999999873689376e-05
        entropy: 33.11418151855469
        entropy_coeff: 0.0
        kl: 0.0063783214427530766
        policy_loss: -0.007483061868697405
        total_loss: 8913.97265625
        vf_explained_var: 4.0857202066035825e-07
        vf_loss: 8913.98046875
    load_time_ms: 14.551
    num_steps_sampled: 1785000
    num_steps_trained: 1782144
    sample_time_ms: 10624.973
    update_time_ms: 8.664
  iterations_since_restore: 119
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,119,1670.75,1785000,3070.16






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-42-42
  done: false
  episode_len_mean: 1047.18
  episode_reward_max: 5216.874961808858
  episode_reward_mean: 3137.800254505741
  episode_reward_min: 1377.0132217051253
  episodes_this_iter: 16
  episodes_total: 1590
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.938
    learner:
      default_policy:
        cur_kl_coeff: 6.018531165893214e-37
        cur_lr: 4.999999873689376e-05
        entropy: 33.0322380065918
        entropy_coeff: 0.0
        kl: 0.0053712595254182816
        policy_loss: -0.006069586146622896
        total_loss: 8209.013671875
        vf_explained_var: 1.976123257918516e-06
        vf_loss: 8209.0205078125
    load_time_ms: 14.402
    num_steps_sampled: 1800000
    num_steps_trained: 1797120
    sample_time_ms: 10744.674
    update_time_ms: 8.625
  iterations_since_restore: 120
  node_ip: 192.168.107.157
 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,120,1685.48,1800000,3137.8






Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-42-56
  done: false
  episode_len_mean: 1050.65
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3173.799770607187
  episode_reward_min: 1377.0132217051253
  episodes_this_iter: 14
  episodes_total: 1604
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.251
    learner:
      default_policy:
        cur_kl_coeff: 3.009265582946607e-37
        cur_lr: 4.999999873689376e-05
        entropy: 33.07440185546875
        entropy_coeff: 0.0
        kl: 0.007050238084048033
        policy_loss: -0.006519998889416456
        total_loss: 9553.267578125
        vf_explained_var: 2.2146947230794467e-05
        vf_loss: 9553.2734375
    load_time_ms: 13.71
    num_steps_sampled: 1815000
    num_steps_trained: 1812096
    sample_time_ms: 10749.597
    update_time_ms: 8.664
  iterations_since_restore: 121
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,121,1699.69,1815000,3173.8




Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-43-10
  done: false
  episode_len_mean: 1060.72
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3192.402817492848
  episode_reward_min: 1377.0132217051253
  episodes_this_iter: 11
  episodes_total: 1615
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.012
    learner:
      default_policy:
        cur_kl_coeff: 1.5046327914733034e-37
        cur_lr: 4.999999873689376e-05
        entropy: 33.10015106201172
        entropy_coeff: 0.0
        kl: 0.007764381356537342
        policy_loss: -0.006309342570602894
        total_loss: 8822.5478515625
        vf_explained_var: 2.6766051632876042e-06
        vf_loss: 8822.5537109375
    load_time_ms: 13.565
    num_steps_sampled: 1830000
    num_steps_trained: 1827072
    sample_time_ms: 10838.822
    update_time_ms: 8.317
  iterations_since_restore: 122
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,122,1713.88,1830000,3192.4

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-43-23
  done: false
  episode_len_mean: 1067.52
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3242.7411173770893
  episode_reward_min: 1377.0132217051253
  episodes_this_iter: 14
  episodes_total: 1629
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.372
    learner:
      default_policy:
        cur_kl_coeff: 7.523163957366517e-38
        cur_lr: 4.999999873689376e-05
        entropy: 33.11360168457031
        entropy_coeff: 0.0
        kl: 0.007603565696626902
        policy_loss: -0.006995213683694601
        total_loss: 8984.9794921875
        vf_explained_var: 2.1768430542579154e-06
        vf_loss: 8984.9853515625
    load_time_ms: 14.163
    num_steps_sampled: 1845000
    num_steps_trained: 1842048
    sample_time_ms: 10744.327
    update_time_ms: 8.307
  iterations_since_restore: 123
  node_ip: 192.168.107.15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,123,1727.29,1845000,3242.74

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-43-38
  done: false
  episode_len_mean: 1086.21
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3324.659172016516
  episode_reward_min: 1377.0132217051253
  episodes_this_iter: 12
  episodes_total: 1641
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3215.723
    learner:
      default_policy:
        cur_kl_coeff: 3.7615819786832586e-38
        cur_lr: 4.999999873689376e-05
        entropy: 33.05644607543945
        entropy_coeff: 0.0
        kl: 0.0065994951874017715
        policy_loss: -0.005452702287584543
        total_loss: 7912.169921875
        vf_explained_var: 3.74439446204633e-06
        vf_loss: 7912.17529296875
    load_time_ms: 14.184
    num_steps_sampled: 1860000
    num_steps_trained: 1857024
    sample_time_ms: 10772.602
    update_time_ms: 8.802
  iterations_since_restore: 124
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,124,1741.93,1860000,3324.66

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-43-53
  done: false
  episode_len_mean: 1121.56
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3403.217040787864
  episode_reward_min: 1256.1956496319401
  episodes_this_iter: 12
  episodes_total: 1653
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.306
    learner:
      default_policy:
        cur_kl_coeff: 1.8807909893416293e-38
        cur_lr: 4.999999873689376e-05
        entropy: 33.04002380371094
        entropy_coeff: 0.0
        kl: 0.006435176357626915
        policy_loss: -0.008635490201413631
        total_loss: 8421.8955078125
        vf_explained_var: 9.707915523904376e-06
        vf_loss: 8421.904296875
    load_time_ms: 14.421
    num_steps_sampled: 1875000
    num_steps_trained: 1872000
    sample_time_ms: 10835.736
    update_time_ms: 8.97
  iterations_since_restore: 125
  node_ip: 192.168.107.157

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,125,1756.32,1875000,3403.22

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-44-07
  done: false
  episode_len_mean: 1110.22
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3384.6276643280294
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 19
  episodes_total: 1672
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.011
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.9567985534668
        entropy_coeff: 0.0
        kl: 0.005577952601015568
        policy_loss: -0.006399760954082012
        total_loss: 8415.5009765625
        vf_explained_var: 2.9496657134586712e-06
        vf_loss: 8415.5078125
    load_time_ms: 14.677
    num_steps_sampled: 1890000
    num_steps_trained: 1886976
    sample_time_ms: 10939.427
    update_time_ms: 8.879
  iterations_since_restore: 126
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,126,1771.02,1890000,3384.63

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-44-22
  done: false
  episode_len_mean: 1077.11
  episode_reward_max: 5289.124112981242
  episode_reward_mean: 3291.5459446174737
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 15
  episodes_total: 1687
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.691
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.96991729736328
        entropy_coeff: 0.0
        kl: 0.008746202103793621
        policy_loss: -0.00919954665005207
        total_loss: 9369.474609375
        vf_explained_var: 2.099407993227942e-06
        vf_loss: 9369.484375
    load_time_ms: 15.48
    num_steps_sampled: 1905000
    num_steps_trained: 1901952
    sample_time_ms: 11003.44
    update_time_ms: 8.586
  iterations_since_restore: 127
  node_ip: 192.168.107.157
  num_healthy_workers: 15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,127,1785.77,1905000,3291.55

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-44-36
  done: false
  episode_len_mean: 1094.41
  episode_reward_max: 4993.250391631941
  episode_reward_mean: 3333.0684881521324
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 14
  episodes_total: 1701
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.027
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.01030349731445
        entropy_coeff: 0.0
        kl: 0.0045412625186145306
        policy_loss: -0.007197348866611719
        total_loss: 8039.5107421875
        vf_explained_var: 1.106710533349542e-05
        vf_loss: 8039.51708984375
    load_time_ms: 15.746
    num_steps_sampled: 1920000
    num_steps_trained: 1916928
    sample_time_ms: 10914.998
    update_time_ms: 8.457
  iterations_since_restore: 128
  node_ip: 192.168.107.157
  num_healthy_w

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,128,1799.23,1920000,3333.07

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-44-49
  done: false
  episode_len_mean: 1104.23
  episode_reward_max: 4993.250391631941
  episode_reward_mean: 3359.242679947973
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 9
  episodes_total: 1710
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3213.335
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.966705322265625
        entropy_coeff: 0.0
        kl: 0.005760694853961468
        policy_loss: -0.008183643221855164
        total_loss: 10348.3857421875
        vf_explained_var: 6.062352753133382e-08
        vf_loss: 10348.3955078125
    load_time_ms: 15.786
    num_steps_sampled: 1935000
    num_steps_trained: 1931904
    sample_time_ms: 10915.344
    update_time_ms: 8.482
  iterations_since_restore: 129
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,129,1812.5,1935000,3359.24

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-45-03
  done: false
  episode_len_mean: 1077.33
  episode_reward_max: 4993.250391631941
  episode_reward_mean: 3316.154672069741
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 18
  episodes_total: 1728
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3215.563
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.930511474609375
        entropy_coeff: 0.0
        kl: 0.006288551725447178
        policy_loss: -0.007486117538064718
        total_loss: 9390.5341796875
        vf_explained_var: 1.1976967471127864e-06
        vf_loss: 9390.54296875
    load_time_ms: 15.895
    num_steps_sampled: 1950000
    num_steps_trained: 1946880
    sample_time_ms: 10865.728
    update_time_ms: 8.602
  iterations_since_restore: 130
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,130,1826.76,1950000,3316.15

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-45-17
  done: false
  episode_len_mean: 1034.26
  episode_reward_max: 4993.250391631941
  episode_reward_mean: 3222.17988448978
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 15
  episodes_total: 1743
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3211.238
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.066864013671875
        entropy_coeff: 0.0
        kl: 0.00704343942925334
        policy_loss: -0.00665196031332016
        total_loss: 8222.1513671875
        vf_explained_var: 1.5186448081294657e-06
        vf_loss: 8222.1572265625
    load_time_ms: 16.328
    num_steps_sampled: 1965000
    num_steps_trained: 1961856
    sample_time_ms: 10877.713
    update_time_ms: 8.6
  iterations_since_restore: 131
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,131,1841.05,1965000,3222.18

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-45-32
  done: false
  episode_len_mean: 1011.58
  episode_reward_max: 4970.98788915383
  episode_reward_mean: 3174.6855304018572
  episode_reward_min: 1135.4965111227264
  episodes_this_iter: 14
  episodes_total: 1757
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3212.643
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.99678421020508
        entropy_coeff: 0.0
        kl: 0.007756083272397518
        policy_loss: -0.005053028464317322
        total_loss: 10940.75
        vf_explained_var: 1.1099709809059277e-05
        vf_loss: 10940.75390625
    load_time_ms: 16.465
    num_steps_sampled: 1980000
    num_steps_trained: 1976832
    sample_time_ms: 10898.728
    update_time_ms: 8.732
  iterations_since_restore: 132
  node_ip: 192.168.107.157
  num_healthy_workers: 15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,132,1855.47,1980000,3174.69

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-45-47
  done: false
  episode_len_mean: 1033.51
  episode_reward_max: 5114.324438522805
  episode_reward_mean: 3251.8278473507507
  episode_reward_min: 1188.047033853368
  episodes_this_iter: 19
  episodes_total: 1776
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3208.654
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.072059631347656
        entropy_coeff: 0.0
        kl: 0.00444599287584424
        policy_loss: -0.006710733287036419
        total_loss: 9475.552734375
        vf_explained_var: 8.386933586734813e-06
        vf_loss: 9475.55859375
    load_time_ms: 16.49
    num_steps_sampled: 1995000
    num_steps_trained: 1991808
    sample_time_ms: 11092.431
    update_time_ms: 8.775
  iterations_since_restore: 133
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,133,1870.77,1995000,3251.83

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-46-02
  done: false
  episode_len_mean: 1020.97
  episode_reward_max: 5114.324438522805
  episode_reward_mean: 3208.641708808758
  episode_reward_min: 1188.047033853368
  episodes_this_iter: 10
  episodes_total: 1786
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.469
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.08684158325195
        entropy_coeff: 0.0
        kl: 0.004607933573424816
        policy_loss: -0.005932260770350695
        total_loss: 10379.0634765625
        vf_explained_var: 4.7887493082043875e-08
        vf_loss: 10379.0693359375
    load_time_ms: 16.368
    num_steps_sampled: 2010000
    num_steps_trained: 2006784
    sample_time_ms: 11068.275
    update_time_ms: 8.555
  iterations_since_restore: 134
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,134,1885.14,2010000,3208.64

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-46-16
  done: false
  episode_len_mean: 1017.59
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3270.036081457316
  episode_reward_min: 1333.3859582659256
  episodes_this_iter: 16
  episodes_total: 1802
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.673
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.9961051940918
        entropy_coeff: 0.0
        kl: 0.0062454198487102985
        policy_loss: -0.007066851016134024
        total_loss: 10604.626953125
        vf_explained_var: 1.7127420051110676e-06
        vf_loss: 10604.6328125
    load_time_ms: 15.783
    num_steps_sampled: 2025000
    num_steps_trained: 2021760
    sample_time_ms: 11061.137
    update_time_ms: 8.399
  iterations_since_restore: 135
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,135,1899.44,2025000,3270.04

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-46-31
  done: false
  episode_len_mean: 1027.39
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3317.0339474504103
  episode_reward_min: 1333.3859582659256
  episodes_this_iter: 16
  episodes_total: 1818
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.229
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.218379974365234
        entropy_coeff: 0.0
        kl: 0.005484454333782196
        policy_loss: -0.0059650614857673645
        total_loss: 8101.03173828125
        vf_explained_var: 2.8701929295493755e-06
        vf_loss: 8101.03759765625
    load_time_ms: 15.576
    num_steps_sampled: 2040000
    num_steps_trained: 2036736
    sample_time_ms: 11094.329
    update_time_ms: 8.4
  iterations_since_restore: 136
  node_ip: 192.168.107.157
  num_healthy_

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,136,1914.47,2040000,3317.03

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-46-46
  done: false
  episode_len_mean: 1001.36
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3255.747708997893
  episode_reward_min: 1326.34580475462
  episodes_this_iter: 16
  episodes_total: 1834
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3198.336
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.04042434692383
        entropy_coeff: 0.0
        kl: 0.006136408541351557
        policy_loss: -0.0060455696657299995
        total_loss: 10293.091796875
        vf_explained_var: 1.0428265113660018e-06
        vf_loss: 10293.09765625
    load_time_ms: 15.215
    num_steps_sampled: 2055000
    num_steps_trained: 2051712
    sample_time_ms: 11115.182
    update_time_ms: 8.628
  iterations_since_restore: 137
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,137,1929.35,2055000,3255.75

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-47-00
  done: false
  episode_len_mean: 1010.98
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3293.3043997983414
  episode_reward_min: 1326.34580475462
  episodes_this_iter: 12
  episodes_total: 1846
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3203.126
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.11211395263672
        entropy_coeff: 0.0
        kl: 0.005075411405414343
        policy_loss: -0.007589033339172602
        total_loss: 10690.7275390625
        vf_explained_var: 9.465421157983656e-07
        vf_loss: 10690.7353515625
    load_time_ms: 15.441
    num_steps_sampled: 2070000
    num_steps_trained: 2066688
    sample_time_ms: 11168.29
    update_time_ms: 8.806
  iterations_since_restore: 138
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,138,1943.4,2070000,3293.3

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-47-15
  done: false
  episode_len_mean: 1029.52
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3350.255129480886
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 14
  episodes_total: 1860
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3203.174
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.12456130981445
        entropy_coeff: 0.0
        kl: 0.008201666176319122
        policy_loss: -0.008566424250602722
        total_loss: 8493.3212890625
        vf_explained_var: 3.472862090347917e-06
        vf_loss: 8493.3291015625
    load_time_ms: 15.281
    num_steps_sampled: 2085000
    num_steps_trained: 2081664
    sample_time_ms: 11304.597
    update_time_ms: 9.014
  iterations_since_restore: 139
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,139,1958.03,2085000,3350.26

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-47-28
  done: false
  episode_len_mean: 1037.99
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3382.256737893059
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 15
  episodes_total: 1875
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.936
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.219844818115234
        entropy_coeff: 0.0
        kl: 0.008415299467742443
        policy_loss: -0.0069823856465518475
        total_loss: 8925.5244140625
        vf_explained_var: 9.022207336784049e-07
        vf_loss: 8925.53125
    load_time_ms: 14.906
    num_steps_sampled: 2100000
    num_steps_trained: 2096640
    sample_time_ms: 11250.012
    update_time_ms: 9.041
  iterations_since_restore: 140
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,140,1971.71,2100000,3382.26

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-47-43
  done: false
  episode_len_mean: 1070.8
  episode_reward_max: 6214.045858591703
  episode_reward_mean: 3467.7110068599127
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 12
  episodes_total: 1887
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3201.171
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.21627426147461
        entropy_coeff: 0.0
        kl: 0.011900994926691055
        policy_loss: -0.0059645590372383595
        total_loss: 8711.9521484375
        vf_explained_var: 8.700750186108053e-06
        vf_loss: 8711.9580078125
    load_time_ms: 14.719
    num_steps_sampled: 2115000
    num_steps_trained: 2111616
    sample_time_ms: 11306.96
    update_time_ms: 8.957
  iterations_since_restore: 141
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,141,1986.58,2115000,3467.71

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-47-58
  done: false
  episode_len_mean: 1064.55
  episode_reward_max: 5617.926844985676
  episode_reward_mean: 3402.9430802467396
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 13
  episodes_total: 1900
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3197.405
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.08283615112305
        entropy_coeff: 0.0
        kl: 0.0077137090265750885
        policy_loss: -0.0066588702611625195
        total_loss: 9038.0302734375
        vf_explained_var: 6.430678695323877e-06
        vf_loss: 9038.037109375
    load_time_ms: 14.623
    num_steps_sampled: 2130000
    num_steps_trained: 2126592
    sample_time_ms: 11340.87
    update_time_ms: 9.124
  iterations_since_restore: 142
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,142,2001.3,2130000,3402.94

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-48-13
  done: false
  episode_len_mean: 1056.56
  episode_reward_max: 5617.926844985676
  episode_reward_mean: 3350.0525496000714
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 19
  episodes_total: 1919
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3195.742
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.25571060180664
        entropy_coeff: 0.0
        kl: 0.006142019759863615
        policy_loss: -0.008662273176014423
        total_loss: 8345.00390625
        vf_explained_var: 1.7264969756070059e-06
        vf_loss: 8345.0126953125
    load_time_ms: 14.509
    num_steps_sampled: 2145000
    num_steps_trained: 2141568
    sample_time_ms: 11293.243
    update_time_ms: 9.14
  iterations_since_restore: 143
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,143,2016.11,2145000,3350.05

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-48-26
  done: false
  episode_len_mean: 1062.11
  episode_reward_max: 5617.926844985676
  episode_reward_mean: 3376.7608940566333
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 12
  episodes_total: 1931
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.886
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.08761215209961
        entropy_coeff: 0.0
        kl: 0.006194961257278919
        policy_loss: -0.007778472732752562
        total_loss: 11020.291015625
        vf_explained_var: 5.451023099567465e-08
        vf_loss: 11020.2998046875
    load_time_ms: 15.059
    num_steps_sampled: 2160000
    num_steps_trained: 2156544
    sample_time_ms: 11199.45
    update_time_ms: 9.373
  iterations_since_restore: 144
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,144,2029.59,2160000,3376.76

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-48-40
  done: false
  episode_len_mean: 1077.44
  episode_reward_max: 5617.926844985676
  episode_reward_mean: 3411.860527608113
  episode_reward_min: 1320.885943205682
  episodes_this_iter: 12
  episodes_total: 1943
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.502
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.17955780029297
        entropy_coeff: 0.0
        kl: 0.006178529933094978
        policy_loss: -0.006425152998417616
        total_loss: 8972.53515625
        vf_explained_var: 1.213489440488047e-06
        vf_loss: 8972.5419921875
    load_time_ms: 15.154
    num_steps_sampled: 2175000
    num_steps_trained: 2171520
    sample_time_ms: 11138.409
    update_time_ms: 9.524
  iterations_since_restore: 145
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,145,2043.29,2175000,3411.86

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-48-55
  done: false
  episode_len_mean: 1053.43
  episode_reward_max: 5617.926844985676
  episode_reward_mean: 3328.0914561655086
  episode_reward_min: 1412.021982803792
  episodes_this_iter: 17
  episodes_total: 1960
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.151
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.98936462402344
        entropy_coeff: 0.0
        kl: 0.007695071864873171
        policy_loss: -0.005919224116951227
        total_loss: 9415.5947265625
        vf_explained_var: 1.502852171597624e-07
        vf_loss: 9415.6005859375
    load_time_ms: 15.538
    num_steps_sampled: 2190000
    num_steps_trained: 2186496
    sample_time_ms: 11170.977
    update_time_ms: 9.588
  iterations_since_restore: 146
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,146,2058.64,2190000,3328.09

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-49-10
  done: false
  episode_len_mean: 1049.96
  episode_reward_max: 5617.926844985676
  episode_reward_mean: 3321.344346146978
  episode_reward_min: 1412.021982803792
  episodes_this_iter: 15
  episodes_total: 1975
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3201.853
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 33.098453521728516
        entropy_coeff: 0.0
        kl: 0.0055236127227544785
        policy_loss: -0.006179091054946184
        total_loss: 8898.65625
        vf_explained_var: -6.113296979748384e-09
        vf_loss: 8898.662109375
    load_time_ms: 15.287
    num_steps_sampled: 2205000
    num_steps_trained: 2201472
    sample_time_ms: 11169.106
    update_time_ms: 9.48
  iterations_since_restore: 147
  node_ip: 192.168.107.157
  num_healthy_workers: 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,147,2073.52,2205000,3321.34

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-49-25
  done: false
  episode_len_mean: 1010.07
  episode_reward_max: 5153.094888212891
  episode_reward_mean: 3229.184071248401
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 11
  episodes_total: 1986
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3198.348
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.98851013183594
        entropy_coeff: 0.0
        kl: 0.00755184143781662
        policy_loss: -0.00958089530467987
        total_loss: 8710.048828125
        vf_explained_var: 1.503871089880704e-06
        vf_loss: 8710.0595703125
    load_time_ms: 15.161
    num_steps_sampled: 2220000
    num_steps_trained: 2216448
    sample_time_ms: 11221.095
    update_time_ms: 9.471
  iterations_since_restore: 148
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,148,2088.05,2220000,3229.18

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-49-40
  done: false
  episode_len_mean: 1037.71
  episode_reward_max: 5110.434420607418
  episode_reward_mean: 3236.323519353216
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 15
  episodes_total: 2001
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.205
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.94436264038086
        entropy_coeff: 0.0
        kl: 0.008888522163033485
        policy_loss: -0.0075632319785654545
        total_loss: 8998.6572265625
        vf_explained_var: 2.5624902377785475e-07
        vf_loss: 8998.666015625
    load_time_ms: 15.51
    num_steps_sampled: 2235000
    num_steps_trained: 2231424
    sample_time_ms: 11264.676
    update_time_ms: 9.328
  iterations_since_restore: 149
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,149,2103.13,2235000,3236.32

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-49-54
  done: false
  episode_len_mean: 1054.61
  episode_reward_max: 5110.434420607418
  episode_reward_mean: 3304.4758498088595
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 15
  episodes_total: 2016
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3199.48
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.940616607666016
        entropy_coeff: 0.0
        kl: 0.005297812633216381
        policy_loss: -0.005935681983828545
        total_loss: 9616.5947265625
        vf_explained_var: 8.171949957613833e-06
        vf_loss: 9616.6025390625
    load_time_ms: 15.437
    num_steps_sampled: 2250000
    num_steps_trained: 2246400
    sample_time_ms: 11344.779
    update_time_ms: 9.305
  iterations_since_restore: 150
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,150,2117.61,2250000,3304.48

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-50-08
  done: false
  episode_len_mean: 1063.31
  episode_reward_max: 5110.434420607418
  episode_reward_mean: 3294.974817368444
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 14
  episodes_total: 2030
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.461
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.82268524169922
        entropy_coeff: 0.0
        kl: 0.010975389741361141
        policy_loss: -0.005960923619568348
        total_loss: 9006.3828125
        vf_explained_var: 4.381196205827109e-08
        vf_loss: 9006.388671875
    load_time_ms: 15.929
    num_steps_sampled: 2265000
    num_steps_trained: 2261376
    sample_time_ms: 11231.794
    update_time_ms: 9.262
  iterations_since_restore: 151
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,151,2131.4,2265000,3294.97

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-50-23
  done: false
  episode_len_mean: 1064.47
  episode_reward_max: 5199.064412802137
  episode_reward_mean: 3316.640861387978
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 13
  episodes_total: 2043
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.126
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.898895263671875
        entropy_coeff: 0.0
        kl: 0.006171270739287138
        policy_loss: -0.005293356254696846
        total_loss: 9998.8837890625
        vf_explained_var: 2.2058813442527025e-07
        vf_loss: 9998.888671875
    load_time_ms: 16.125
    num_steps_sampled: 2280000
    num_steps_trained: 2276352
    sample_time_ms: 11197.189
    update_time_ms: 9.234
  iterations_since_restore: 152
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,152,2145.77,2280000,3316.64

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-50-38
  done: false
  episode_len_mean: 1061.89
  episode_reward_max: 5449.121920150912
  episode_reward_mean: 3301.7288687987793
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 16
  episodes_total: 2059
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3210.328
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.87190246582031
        entropy_coeff: 0.0
        kl: 0.008275349624454975
        policy_loss: -0.00754887331277132
        total_loss: 9434.435546875
        vf_explained_var: 1.7718372191666276e-06
        vf_loss: 9434.4443359375
    load_time_ms: 16.167
    num_steps_sampled: 2295000
    num_steps_trained: 2291328
    sample_time_ms: 11221.624
    update_time_ms: 9.198
  iterations_since_restore: 153
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,153,2160.88,2295000,3301.73

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-50-53
  done: false
  episode_len_mean: 1049.52
  episode_reward_max: 5449.121920150912
  episode_reward_mean: 3267.8189488571584
  episode_reward_min: 1389.2733717657552
  episodes_this_iter: 16
  episodes_total: 2075
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.138
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.90180206298828
        entropy_coeff: 0.0
        kl: 0.0076407333835959435
        policy_loss: -0.0077870674431324005
        total_loss: 9482.4970703125
        vf_explained_var: 1.5064182434798568e-06
        vf_loss: 9482.5048828125
    load_time_ms: 15.564
    num_steps_sampled: 2310000
    num_steps_trained: 2306304
    sample_time_ms: 11379.711
    update_time_ms: 8.965
  iterations_since_restore: 154
  node_ip: 192.168.107.157
  num_healthy_

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,154,2175.89,2310000,3267.82

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-51-08
  done: false
  episode_len_mean: 1046.51
  episode_reward_max: 5449.121920150912
  episode_reward_mean: 3283.965158830685
  episode_reward_min: 1260.1653354777807
  episodes_this_iter: 15
  episodes_total: 2090
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3203.185
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.963096618652344
        entropy_coeff: 0.0
        kl: 0.006152490619570017
        policy_loss: -0.008759675547480583
        total_loss: 9332.2080078125
        vf_explained_var: 6.592171644115297e-07
        vf_loss: 9332.21875
    load_time_ms: 15.996
    num_steps_sampled: 2325000
    num_steps_trained: 2321280
    sample_time_ms: 11488.74
    update_time_ms: 8.706
  iterations_since_restore: 155
  node_ip: 192.168.107.157
  num_healthy_workers: 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,155,2190.66,2325000,3283.97

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-51-22
  done: false
  episode_len_mean: 1026.03
  episode_reward_max: 5449.121920150912
  episode_reward_mean: 3266.4838608800155
  episode_reward_min: 1260.1653354777807
  episodes_this_iter: 14
  episodes_total: 2104
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.102
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.92096710205078
        entropy_coeff: 0.0
        kl: 0.004855059087276459
        policy_loss: -0.006731868255883455
        total_loss: 8072.0546875
        vf_explained_var: 1.749421812746732e-06
        vf_loss: 8072.06103515625
    load_time_ms: 15.302
    num_steps_sampled: 2340000
    num_steps_trained: 2336256
    sample_time_ms: 11382.093
    update_time_ms: 8.673
  iterations_since_restore: 156
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,156,2204.96,2340000,3266.48

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-51-36
  done: false
  episode_len_mean: 994.75
  episode_reward_max: 5449.121920150912
  episode_reward_mean: 3186.774033758142
  episode_reward_min: 1260.1653354777807
  episodes_this_iter: 16
  episodes_total: 2120
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3209.917
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.83591079711914
        entropy_coeff: 0.0
        kl: 0.007011088076978922
        policy_loss: -0.007758973632007837
        total_loss: 11419.892578125
        vf_explained_var: 1.0698269825581974e-08
        vf_loss: 11419.8994140625
    load_time_ms: 15.363
    num_steps_sampled: 2355000
    num_steps_trained: 2351232
    sample_time_ms: 11322.189
    update_time_ms: 8.733
  iterations_since_restore: 157
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,157,2219.28,2355000,3186.77

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-51-50
  done: false
  episode_len_mean: 989.9
  episode_reward_max: 5449.121920150912
  episode_reward_mean: 3157.444422870611
  episode_reward_min: 1260.1653354777807
  episodes_this_iter: 16
  episodes_total: 2136
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.886
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.85202407836914
        entropy_coeff: 0.0
        kl: 0.00621029594913125
        policy_loss: -0.008274303749203682
        total_loss: 9542.0654296875
        vf_explained_var: 1.6072876860562246e-06
        vf_loss: 9542.072265625
    load_time_ms: 14.637
    num_steps_sampled: 2370000
    num_steps_trained: 2366208
    sample_time_ms: 11270.772
    update_time_ms: 8.505
  iterations_since_restore: 158
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,158,2233.27,2370000,3157.44

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-52-05
  done: false
  episode_len_mean: 959.99
  episode_reward_max: 5313.772510083225
  episode_reward_mean: 3107.9515285104967
  episode_reward_min: 1260.1653354777807
  episodes_this_iter: 17
  episodes_total: 2153
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.943
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.86961364746094
        entropy_coeff: 0.0
        kl: 0.007694270461797714
        policy_loss: -0.006525751668959856
        total_loss: 10514.8603515625
        vf_explained_var: 4.62063354689235e-07
        vf_loss: 10514.8642578125
    load_time_ms: 14.741
    num_steps_sampled: 2385000
    num_steps_trained: 2381184
    sample_time_ms: 11213.544
    update_time_ms: 8.362
  iterations_since_restore: 159
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,159,2247.77,2385000,3107.95

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-52-19
  done: false
  episode_len_mean: 933.65
  episode_reward_max: 4955.909454907497
  episode_reward_mean: 3084.4936974488055
  episode_reward_min: 1260.1653354777807
  episodes_this_iter: 19
  episodes_total: 2172
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.284
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.682186126708984
        entropy_coeff: 0.0
        kl: 0.006587899290025234
        policy_loss: -0.005662743002176285
        total_loss: 11422.7275390625
        vf_explained_var: 8.202006540614093e-08
        vf_loss: 11422.7314453125
    load_time_ms: 15.267
    num_steps_sampled: 2400000
    num_steps_trained: 2396160
    sample_time_ms: 11207.201
    update_time_ms: 8.422
  iterations_since_restore: 160
  node_ip: 192.168.107.157
  num_healthy_w

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,160,2262.19,2400000,3084.49

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-52-33
  done: false
  episode_len_mean: 930.06
  episode_reward_max: 4955.909454907497
  episode_reward_mean: 3079.043212663209
  episode_reward_min: 1280.5280776098743
  episodes_this_iter: 13
  episodes_total: 2185
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3203.665
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.80503845214844
        entropy_coeff: 0.0
        kl: 0.00682383356615901
        policy_loss: -0.008319126442074776
        total_loss: 9315.140625
        vf_explained_var: 6.836703505541664e-07
        vf_loss: 9315.1494140625
    load_time_ms: 14.738
    num_steps_sampled: 2415000
    num_steps_trained: 2411136
    sample_time_ms: 11184.402
    update_time_ms: 8.52
  iterations_since_restore: 161
  node_ip: 192.168.107.157
  num_healthy_workers: 15

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,161,2275.72,2415000,3079.04

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-52-47
  done: false
  episode_len_mean: 918.91
  episode_reward_max: 4955.909454907497
  episode_reward_mean: 3064.2274725795623
  episode_reward_min: 1280.5280776098743
  episodes_this_iter: 18
  episodes_total: 2203
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3201.462
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.85300064086914
        entropy_coeff: 0.0
        kl: 0.01624702848494053
        policy_loss: -0.005298192612826824
        total_loss: 9615.9287109375
        vf_explained_var: 2.96494903295752e-07
        vf_loss: 9615.93359375
    load_time_ms: 14.596
    num_steps_sampled: 2430000
    num_steps_trained: 2426112
    sample_time_ms: 11168.61
    update_time_ms: 8.652
  iterations_since_restore: 162
  node_ip: 192.168.107.157
  num_healthy_workers: 

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,162,2289.91,2430000,3064.23

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-53-01
  done: false
  episode_len_mean: 914.15
  episode_reward_max: 4955.909454907497
  episode_reward_mean: 3083.9270758283938
  episode_reward_min: 1280.5280776098743
  episodes_this_iter: 18
  episodes_total: 2221
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3195.644
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.81293487548828
        entropy_coeff: 0.0
        kl: 0.005912702530622482
        policy_loss: -0.006143747828900814
        total_loss: 11456.5361328125
        vf_explained_var: 1.3754917915775877e-07
        vf_loss: 11456.54296875
    load_time_ms: 14.046
    num_steps_sampled: 2445000
    num_steps_trained: 2441088
    sample_time_ms: 11094.179
    update_time_ms: 8.578
  iterations_since_restore: 163
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,163,2304.22,2445000,3083.93

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-53-15
  done: false
  episode_len_mean: 877.69
  episode_reward_max: 4772.052973054323
  episode_reward_mean: 3007.9869665692927
  episode_reward_min: 1158.8988841443131
  episodes_this_iter: 18
  episodes_total: 2239
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3195.941
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.74207305908203
        entropy_coeff: 0.0
        kl: 0.008643249981105328
        policy_loss: -0.00919030699878931
        total_loss: 11372.1884765625
        vf_explained_var: -5.094414334827491e-10
        vf_loss: 11372.1982421875
    load_time_ms: 13.934
    num_steps_sampled: 2460000
    num_steps_trained: 2456064
    sample_time_ms: 11007.458
    update_time_ms: 8.748
  iterations_since_restore: 164
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,164,2318.36,2460000,3007.99

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-53-31
  done: false
  episode_len_mean: 908.73
  episode_reward_max: 5243.774369636617
  episode_reward_mean: 3101.535313464248
  episode_reward_min: 1158.8988841443131
  episodes_this_iter: 17
  episodes_total: 2256
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3194.976
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.817047119140625
        entropy_coeff: 0.0
        kl: 0.009590039029717445
        policy_loss: -0.008266233839094639
        total_loss: 11027.8291015625
        vf_explained_var: 3.4234463441862317e-07
        vf_loss: 11027.837890625
    load_time_ms: 13.387
    num_steps_sampled: 2475000
    num_steps_trained: 2471040
    sample_time_ms: 11059.791
    update_time_ms: 8.989
  iterations_since_restore: 165
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,165,2333.64,2475000,3101.54

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-53-45
  done: false
  episode_len_mean: 896.55
  episode_reward_max: 5243.774369636617
  episode_reward_mean: 3068.41472204003
  episode_reward_min: 1158.8988841443131
  episodes_this_iter: 12
  episodes_total: 2268
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3193.717
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.82625961303711
        entropy_coeff: 0.0
        kl: 0.00972396694123745
        policy_loss: -0.0066976104862987995
        total_loss: 11622.876953125
        vf_explained_var: 4.22836379243563e-08
        vf_loss: 11622.8837890625
    load_time_ms: 13.854
    num_steps_sampled: 2490000
    num_steps_trained: 2486016
    sample_time_ms: 11058.858
    update_time_ms: 8.974
  iterations_since_restore: 166
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,166,2347.92,2490000,3068.41

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-53-59
  done: false
  episode_len_mean: 919.28
  episode_reward_max: 5243.774369636617
  episode_reward_mean: 3138.739070830569
  episode_reward_min: 1158.8988841443131
  episodes_this_iter: 15
  episodes_total: 2283
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3191.89
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.777610778808594
        entropy_coeff: 0.0
        kl: 0.007194995414465666
        policy_loss: -0.006343757268041372
        total_loss: 9901.1904296875
        vf_explained_var: 7.697659611949348e-07
        vf_loss: 9901.1962890625
    load_time_ms: 14.153
    num_steps_sampled: 2505000
    num_steps_trained: 2500992
    sample_time_ms: 11046.228
    update_time_ms: 9.107
  iterations_since_restore: 167
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,167,2362.11,2505000,3138.74

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-54-14
  done: false
  episode_len_mean: 928.16
  episode_reward_max: 5243.774369636617
  episode_reward_mean: 3172.2693651664144
  episode_reward_min: 1158.8988841443131
  episodes_this_iter: 14
  episodes_total: 2297
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3193.308
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.82453918457031
        entropy_coeff: 0.0
        kl: 0.007888229563832283
        policy_loss: -0.005971113685518503
        total_loss: 11139.0048828125
        vf_explained_var: 2.664378655481414e-07
        vf_loss: 11139.0107421875
    load_time_ms: 14.575
    num_steps_sampled: 2520000
    num_steps_trained: 2515968
    sample_time_ms: 11081.329
    update_time_ms: 9.394
  iterations_since_restore: 168
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,168,2376.47,2520000,3172.27

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-54-29
  done: false
  episode_len_mean: 942.24
  episode_reward_max: 5243.774369636617
  episode_reward_mean: 3257.6286912776727
  episode_reward_min: 1158.8988841443131
  episodes_this_iter: 17
  episodes_total: 2314
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3187.559
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.74151611328125
        entropy_coeff: 0.0
        kl: 0.005306250881403685
        policy_loss: -0.006812548730522394
        total_loss: 9580.501953125
        vf_explained_var: 1.4264359471383159e-08
        vf_loss: 9580.509765625
    load_time_ms: 14.669
    num_steps_sampled: 2535000
    num_steps_trained: 2530944
    sample_time_ms: 11124.785
    update_time_ms: 9.425
  iterations_since_restore: 169
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,169,2391.35,2535000,3257.63

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-54-43
  done: false
  episode_len_mean: 993.81
  episode_reward_max: 5382.137331821315
  episode_reward_mean: 3377.7525270880014
  episode_reward_min: 1309.577030963477
  episodes_this_iter: 16
  episodes_total: 2330
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3191.197
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.7758903503418
        entropy_coeff: 0.0
        kl: 0.005472263786941767
        policy_loss: -0.007763718720525503
        total_loss: 10459.48046875
        vf_explained_var: 2.2262589993715665e-07
        vf_loss: 10459.48828125
    load_time_ms: 14.617
    num_steps_sampled: 2550000
    num_steps_trained: 2545920
    sample_time_ms: 11095.593
    update_time_ms: 9.224
  iterations_since_restore: 170
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,170,2405.51,2550000,3377.75

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-54-57
  done: false
  episode_len_mean: 984.7
  episode_reward_max: 5382.137331821315
  episode_reward_mean: 3371.4594924716166
  episode_reward_min: 1309.577030963477
  episodes_this_iter: 14
  episodes_total: 2344
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3191.974
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.65520095825195
        entropy_coeff: 0.0
        kl: 0.010715831071138382
        policy_loss: -0.007276094984263182
        total_loss: 12020.6337890625
        vf_explained_var: 2.4707907186893863e-07
        vf_loss: 12020.640625
    load_time_ms: 14.889
    num_steps_sampled: 2565000
    num_steps_trained: 2560896
    sample_time_ms: 11174.424
    update_time_ms: 9.355
  iterations_since_restore: 171
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,171,2419.84,2565000,3371.46

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-55-12
  done: false
  episode_len_mean: 982.07
  episode_reward_max: 5382.137331821315
  episode_reward_mean: 3368.5037281821988
  episode_reward_min: 1369.0559528758126
  episodes_this_iter: 16
  episodes_total: 2360
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3194.885
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.612789154052734
        entropy_coeff: 0.0
        kl: 0.007605390623211861
        policy_loss: -0.0076107908971607685
        total_loss: 11214.359375
        vf_explained_var: 2.424941101253353e-07
        vf_loss: 11214.3671875
    load_time_ms: 14.44
    num_steps_sampled: 2580000
    num_steps_trained: 2575872
    sample_time_ms: 11233.728
    update_time_ms: 9.175
  iterations_since_restore: 172
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,172,2434.65,2580000,3368.5

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-55-26
  done: false
  episode_len_mean: 987.45
  episode_reward_max: 5382.137331821315
  episode_reward_mean: 3409.6318980586575
  episode_reward_min: 1369.0559528758126
  episodes_this_iter: 18
  episodes_total: 2378
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3195.14
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.60939025878906
        entropy_coeff: 0.0
        kl: 0.008937302976846695
        policy_loss: -0.007725942879915237
        total_loss: 11885.9013671875
        vf_explained_var: 3.005704485303795e-08
        vf_loss: 11885.9111328125
    load_time_ms: 14.417
    num_steps_sampled: 2595000
    num_steps_trained: 2590848
    sample_time_ms: 11240.575
    update_time_ms: 9.238
  iterations_since_restore: 173
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,173,2449.03,2595000,3409.63

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-55-41
  done: false
  episode_len_mean: 955.16
  episode_reward_max: 5382.137331821315
  episode_reward_mean: 3356.7141376295763
  episode_reward_min: 1369.0559528758126
  episodes_this_iter: 15
  episodes_total: 2393
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3194.752
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.6422004699707
        entropy_coeff: 0.0
        kl: 0.005617329850792885
        policy_loss: -0.0067428238689899445
        total_loss: 11935.3935546875
        vf_explained_var: 1.2226593959496768e-08
        vf_loss: 11935.3994140625
    load_time_ms: 14.758
    num_steps_sampled: 2610000
    num_steps_trained: 2605824
    sample_time_ms: 11268.912
    update_time_ms: 9.27
  iterations_since_restore: 174
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,174,2463.45,2610000,3356.71

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-55-55
  done: false
  episode_len_mean: 939.56
  episode_reward_max: 5382.137331821315
  episode_reward_mean: 3304.518966652627
  episode_reward_min: 1311.9221922211411
  episodes_this_iter: 17
  episodes_total: 2410
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3202.067
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.64692306518555
        entropy_coeff: 0.0
        kl: 0.008103049360215664
        policy_loss: -0.008452213369309902
        total_loss: 9269.533203125
        vf_explained_var: 1.436115326214349e-06
        vf_loss: 9269.54296875
    load_time_ms: 15.278
    num_steps_sampled: 2625000
    num_steps_trained: 2620800
    sample_time_ms: 11186.568
    update_time_ms: 9.189
  iterations_since_restore: 175
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,175,2477.99,2625000,3304.52

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-56-10
  done: false
  episode_len_mean: 936.42
  episode_reward_max: 5369.599100761181
  episode_reward_mean: 3291.934115775007
  episode_reward_min: 1311.9221922211411
  episodes_this_iter: 14
  episodes_total: 2424
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.874
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.380672454833984
        entropy_coeff: 0.0
        kl: 0.006621667183935642
        policy_loss: -0.006816852372139692
        total_loss: 13492.7392578125
        vf_explained_var: 1.2736035337468365e-08
        vf_loss: 13492.7431640625
    load_time_ms: 14.864
    num_steps_sampled: 2640000
    num_steps_trained: 2635776
    sample_time_ms: 11185.313
    update_time_ms: 9.29
  iterations_since_restore: 176
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,176,2492.24,2640000,3291.93

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-56-23
  done: false
  episode_len_mean: 940.75
  episode_reward_max: 6254.900707894104
  episode_reward_mean: 3350.218642357462
  episode_reward_min: 1311.9221922211411
  episodes_this_iter: 15
  episodes_total: 2439
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3203.126
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.377723693847656
        entropy_coeff: 0.0
        kl: 0.009821034036576748
        policy_loss: -0.009064369834959507
        total_loss: 12258.03125
        vf_explained_var: 3.0158932418089535e-07
        vf_loss: 12258.0419921875
    load_time_ms: 14.697
    num_steps_sampled: 2655000
    num_steps_trained: 2650752
    sample_time_ms: 11117.154
    update_time_ms: 9.193
  iterations_since_restore: 177
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,177,2505.77,2655000,3350.22

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-56-38
  done: false
  episode_len_mean: 947.84
  episode_reward_max: 6254.900707894104
  episode_reward_mean: 3382.44043280636
  episode_reward_min: 1311.9221922211411
  episodes_this_iter: 16
  episodes_total: 2455
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.366
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.576080322265625
        entropy_coeff: 0.0
        kl: 0.006734136026352644
        policy_loss: -0.007184748072177172
        total_loss: 10993.9921875
        vf_explained_var: 8.100118265019773e-08
        vf_loss: 10993.998046875
    load_time_ms: 14.89
    num_steps_sampled: 2670000
    num_steps_trained: 2665728
    sample_time_ms: 11112.162
    update_time_ms: 9.103
  iterations_since_restore: 178
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,178,2520.09,2670000,3382.44

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-56-52
  done: false
  episode_len_mean: 930.18
  episode_reward_max: 6254.900707894104
  episode_reward_mean: 3288.5334931984626
  episode_reward_min: 1311.9221922211411
  episodes_this_iter: 19
  episodes_total: 2474
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3207.624
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.69350814819336
        entropy_coeff: 0.0
        kl: 0.006105550564825535
        policy_loss: -0.006784538738429546
        total_loss: 10947.0302734375
        vf_explained_var: 4.6461056513180665e-07
        vf_loss: 10947.0341796875
    load_time_ms: 14.568
    num_steps_sampled: 2685000
    num_steps_trained: 2680704
    sample_time_ms: 11047.767
    update_time_ms: 9.308
  iterations_since_restore: 179
  node_ip: 192.168.107.157
  num_healthy_w

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,179,2534.36,2685000,3288.53

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-57-06
  done: false
  episode_len_mean: 918.07
  episode_reward_max: 6254.900707894104
  episode_reward_mean: 3239.1540897793525
  episode_reward_min: 1311.9221922211411
  episodes_this_iter: 17
  episodes_total: 2491
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3206.023
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.61582565307617
        entropy_coeff: 0.0
        kl: 0.005844343453645706
        policy_loss: -0.008196495473384857
        total_loss: 10786.6005859375
        vf_explained_var: 1.2827734963138937e-06
        vf_loss: 10786.607421875
    load_time_ms: 14.854
    num_steps_sampled: 2700000
    num_steps_trained: 2695680
    sample_time_ms: 11074.269
    update_time_ms: 9.397
  iterations_since_restore: 180
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,180,2548.77,2700000,3239.15

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-57-21
  done: false
  episode_len_mean: 889.27
  episode_reward_max: 6254.900707894104
  episode_reward_mean: 3130.0416457560664
  episode_reward_min: 1456.1736178823746
  episodes_this_iter: 19
  episodes_total: 2510
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3201.542
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.538475036621094
        entropy_coeff: 0.0
        kl: 0.00917898491024971
        policy_loss: -0.00871740747243166
        total_loss: 10570.271484375
        vf_explained_var: 3.9532653772766935e-07
        vf_loss: 10570.28125
    load_time_ms: 14.157
    num_steps_sampled: 2715000
    num_steps_trained: 2710656
    sample_time_ms: 11079.362
    update_time_ms: 9.369
  iterations_since_restore: 181
  node_ip: 192.168.107.157
  num_healthy_workers:

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,181,2563.1,2715000,3130.04

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-57-35
  done: false
  episode_len_mean: 887.07
  episode_reward_max: 6254.900707894104
  episode_reward_mean: 3138.757961566638
  episode_reward_min: 1130.241898485596
  episodes_this_iter: 17
  episodes_total: 2527
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.841
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.487030029296875
        entropy_coeff: 0.0
        kl: 0.005756831262260675
        policy_loss: -0.0073068272322416306
        total_loss: 11328.2646484375
        vf_explained_var: 1.258320310171257e-07
        vf_loss: 11328.271484375
    load_time_ms: 14.9
    num_steps_sampled: 2730000
    num_steps_trained: 2725632
    sample_time_ms: 11010.897
    update_time_ms: 9.436
  iterations_since_restore: 182
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,182,2577.27,2730000,3138.76

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-57-49
  done: false
  episode_len_mean: 807.76
  episode_reward_max: 5993.861092464269
  episode_reward_mean: 2886.763295451008
  episode_reward_min: 1130.241898485596
  episodes_this_iter: 24
  episodes_total: 2551
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.893
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.3554801940918
        entropy_coeff: 0.0
        kl: 0.0071089668199419975
        policy_loss: -0.007201753556728363
        total_loss: 13852.537109375
        vf_explained_var: 1.7321008272119798e-07
        vf_loss: 13852.5458984375
    load_time_ms: 15.362
    num_steps_sampled: 2745000
    num_steps_trained: 2740608
    sample_time_ms: 10965.371
    update_time_ms: 9.495
  iterations_since_restore: 183
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,183,2591.19,2745000,2886.76

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-58-04
  done: false
  episode_len_mean: 764.02
  episode_reward_max: 5993.861092464269
  episode_reward_mean: 2805.9079132319516
  episode_reward_min: 1130.241898485596
  episodes_this_iter: 23
  episodes_total: 2574
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.044
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.31724166870117
        entropy_coeff: 0.0
        kl: 0.006905051413923502
        policy_loss: -0.006100557744503021
        total_loss: 12476.5869140625
        vf_explained_var: 9.679387069638778e-09
        vf_loss: 12476.591796875
    load_time_ms: 14.785
    num_steps_sampled: 2760000
    num_steps_trained: 2755584
    sample_time_ms: 11042.429
    update_time_ms: 9.169
  iterations_since_restore: 184
  node_ip: 192.168.107.157
  num_healthy_work

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,184,2606.38,2760000,2805.91

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-58-19
  done: false
  episode_len_mean: 731.94
  episode_reward_max: 5993.861092464269
  episode_reward_mean: 2735.888295478804
  episode_reward_min: 1130.241898485596
  episodes_this_iter: 23
  episodes_total: 2597
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3204.088
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.28040313720703
        entropy_coeff: 0.0
        kl: 0.009768706746399403
        policy_loss: -0.00820294301956892
        total_loss: 12157.416015625
        vf_explained_var: 4.635916894812908e-08
        vf_loss: 12157.4208984375
    load_time_ms: 15.28
    num_steps_sampled: 2775000
    num_steps_trained: 2770560
    sample_time_ms: 11120.925
    update_time_ms: 8.989
  iterations_since_restore: 185
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,185,2621.69,2775000,2735.89

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-58-34
  done: false
  episode_len_mean: 728.83
  episode_reward_max: 5993.861092464269
  episode_reward_mean: 2759.037665734448
  episode_reward_min: 1130.241898485596
  episodes_this_iter: 17
  episodes_total: 2614
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3200.684
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.44777297973633
        entropy_coeff: 0.0
        kl: 0.008616230450570583
        policy_loss: -0.0055970801040530205
        total_loss: 11816.8388671875
        vf_explained_var: -9.679387069638778e-09
        vf_loss: 11816.84375
    load_time_ms: 15.694
    num_steps_sampled: 2790000
    num_steps_trained: 2785536
    sample_time_ms: 11154.043
    update_time_ms: 8.708
  iterations_since_restore: 186
  node_ip: 192.168.107.157
  num_healthy_workers

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,186,2636.25,2790000,2759.04

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-58-48
  done: false
  episode_len_mean: 730.04
  episode_reward_max: 5822.909884910611
  episode_reward_mean: 2777.204031611365
  episode_reward_min: 1157.2631214088542
  episodes_this_iter: 19
  episodes_total: 2633
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3188.224
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.31306457519531
        entropy_coeff: 0.0
        kl: 0.008315199986100197
        policy_loss: -0.007891099900007248
        total_loss: 13137.974609375
        vf_explained_var: 1.9104052739749022e-07
        vf_loss: 13137.9814453125
    load_time_ms: 15.368
    num_steps_sampled: 2805000
    num_steps_trained: 2800512
    sample_time_ms: 11232.756
    update_time_ms: 8.674
  iterations_since_restore: 187
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,187,2650.43,2805000,2777.2

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-59-03
  done: false
  episode_len_mean: 742.37
  episode_reward_max: 5822.909884910611
  episode_reward_mean: 2828.5879829045657
  episode_reward_min: 1157.2631214088542
  episodes_this_iter: 18
  episodes_total: 2651
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3195.096
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.349151611328125
        entropy_coeff: 0.0
        kl: 0.00544386962428689
        policy_loss: -0.006691742688417435
        total_loss: 11997.8173828125
        vf_explained_var: 2.2670143096092943e-07
        vf_loss: 11997.826171875
    load_time_ms: 15.244
    num_steps_sampled: 2820000
    num_steps_trained: 2815488
    sample_time_ms: 11316.698
    update_time_ms: 8.328
  iterations_since_restore: 188
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,188,2665.65,2820000,2828.59

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-59-19
  done: false
  episode_len_mean: 765.57
  episode_reward_max: 5822.909884910611
  episode_reward_mean: 2857.7868652091174
  episode_reward_min: 1157.2631214088542
  episodes_this_iter: 19
  episodes_total: 2670
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3205.845
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.32548904418945
        entropy_coeff: 0.0
        kl: 0.005609674379229546
        policy_loss: -0.006079245824366808
        total_loss: 11530.6953125
        vf_explained_var: 7.590676887048176e-08
        vf_loss: 11530.7021484375
    load_time_ms: 15.601
    num_steps_sampled: 2835000
    num_steps_trained: 2830464
    sample_time_ms: 11403.646
    update_time_ms: 8.428
  iterations_since_restore: 189
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,189,2680.91,2835000,2857.79

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-59-34
  done: false
  episode_len_mean: 762.82
  episode_reward_max: 5822.909884910611
  episode_reward_mean: 2854.0401236940907
  episode_reward_min: 1157.2631214088542
  episodes_this_iter: 22
  episodes_total: 2692
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3217.362
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.521114349365234
        entropy_coeff: 0.0
        kl: 0.008883585222065449
        policy_loss: -0.008209341205656528
        total_loss: 10713.8310546875
        vf_explained_var: 7.692565162642495e-08
        vf_loss: 10713.8408203125
    load_time_ms: 15.152
    num_steps_sampled: 2850000
    num_steps_trained: 2845440
    sample_time_ms: 11470.251
    update_time_ms: 8.43
  iterations_since_restore: 190
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,190,2696.1,2850000,2854.04

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_15-59-48
  done: false
  episode_len_mean: 749.98
  episode_reward_max: 5822.909884910611
  episode_reward_mean: 2794.8717206833367
  episode_reward_min: 1147.3501549506614
  episodes_this_iter: 23
  episodes_total: 2715
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3228.537
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.24672317504883
        entropy_coeff: 0.0
        kl: 0.008797578513622284
        policy_loss: -0.007819319143891335
        total_loss: 13847.7646484375
        vf_explained_var: 9.01711274536865e-08
        vf_loss: 13847.7734375
    load_time_ms: 15.649
    num_steps_sampled: 2865000
    num_steps_trained: 2860416
    sample_time_ms: 11467.388
    update_time_ms: 8.427
  iterations_since_restore: 191
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,191,2710.51,2865000,2794.87

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-00-04
  done: false
  episode_len_mean: 747.19
  episode_reward_max: 5007.469373646983
  episode_reward_mean: 2772.7946541291935
  episode_reward_min: 1147.3501549506614
  episodes_this_iter: 19
  episodes_total: 2734
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3227.258
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.47814178466797
        entropy_coeff: 0.0
        kl: 0.009094121865928173
        policy_loss: -0.006068889982998371
        total_loss: 10690.4677734375
        vf_explained_var: 4.432140343624269e-08
        vf_loss: 10690.4755859375
    load_time_ms: 15.191
    num_steps_sampled: 2880000
    num_steps_trained: 2875392
    sample_time_ms: 11597.237
    update_time_ms: 8.257
  iterations_since_restore: 192
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,192,2725.96,2880000,2772.79

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-00-18
  done: false
  episode_len_mean: 740.59
  episode_reward_max: 5007.469373646983
  episode_reward_mean: 2718.384027783743
  episode_reward_min: 1147.3501549506614
  episodes_this_iter: 19
  episodes_total: 2753
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3231.477
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.47286605834961
        entropy_coeff: 0.0
        kl: 0.0076568108052015305
        policy_loss: -0.006918212398886681
        total_loss: 11493.2724609375
        vf_explained_var: 7.692565162642495e-08
        vf_loss: 11493.279296875
    load_time_ms: 15.088
    num_steps_sampled: 2895000
    num_steps_trained: 2890368
    sample_time_ms: 11669.064
    update_time_ms: 8.093
  iterations_since_restore: 193
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,193,2740.64,2895000,2718.38

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-00-33
  done: false
  episode_len_mean: 728.57
  episode_reward_max: 5007.469373646983
  episode_reward_mean: 2738.1882562331116
  episode_reward_min: 1147.3501549506614
  episodes_this_iter: 22
  episodes_total: 2775
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3234.882
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.271270751953125
        entropy_coeff: 0.0
        kl: 0.0063324267975986
        policy_loss: -0.0067916554398834705
        total_loss: 13851.7265625
        vf_explained_var: 2.801927756479472e-08
        vf_loss: 13851.732421875
    load_time_ms: 15.698
    num_steps_sampled: 2910000
    num_steps_trained: 2905344
    sample_time_ms: 11558.602
    update_time_ms: 8.177
  iterations_since_restore: 194
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,194,2754.76,2910000,2738.19

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-00-47
  done: false
  episode_len_mean: 740.03
  episode_reward_max: 5755.040896356811
  episode_reward_mean: 2746.6947933382444
  episode_reward_min: 1264.1404794675632
  episodes_this_iter: 21
  episodes_total: 2796
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3221.316
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.40583801269531
        entropy_coeff: 0.0
        kl: 0.009676715359091759
        policy_loss: -0.007912734523415565
        total_loss: 11363.259765625
        vf_explained_var: 3.209481036492434e-08
        vf_loss: 11363.2685546875
    load_time_ms: 14.968
    num_steps_sampled: 2925000
    num_steps_trained: 2920320
    sample_time_ms: 11493.468
    update_time_ms: 8.315
  iterations_since_restore: 195
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,195,2769.29,2925000,2746.69

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-01-03
  done: false
  episode_len_mean: 710.98
  episode_reward_max: 5755.040896356811
  episode_reward_mean: 2682.50631106183
  episode_reward_min: 1264.1404794675632
  episodes_this_iter: 23
  episodes_total: 2819
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3227.748
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.273563385009766
        entropy_coeff: 0.0
        kl: 0.009758539497852325
        policy_loss: -0.006090736947953701
        total_loss: 12310.0068359375
        vf_explained_var: -1.2736035337468365e-08
        vf_loss: 12310.0107421875
    load_time_ms: 15.087
    num_steps_sampled: 2940000
    num_steps_trained: 2935296
    sample_time_ms: 11569.84
    update_time_ms: 8.595
  iterations_since_restore: 196
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,196,2784.68,2940000,2682.51

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-01-17
  done: false
  episode_len_mean: 715.63
  episode_reward_max: 5864.327617379778
  episode_reward_mean: 2732.2566146481167
  episode_reward_min: 1264.1404794675632
  episodes_this_iter: 20
  episodes_total: 2839
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3237.729
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.35956954956055
        entropy_coeff: 0.0
        kl: 0.013771713711321354
        policy_loss: -0.007850994355976582
        total_loss: 12157.677734375
        vf_explained_var: -6.113296979748384e-09
        vf_loss: 12157.6884765625
    load_time_ms: 15.094
    num_steps_sampled: 2955000
    num_steps_trained: 2950272
    sample_time_ms: 11599.542
    update_time_ms: 8.697
  iterations_since_restore: 197
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,197,2799.26,2955000,2732.26

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-01-32
  done: false
  episode_len_mean: 698.87
  episode_reward_max: 5864.327617379778
  episode_reward_mean: 2675.5913657608435
  episode_reward_min: 1264.1404794675632
  episodes_this_iter: 18
  episodes_total: 2857
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3233.308
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.26490020751953
        entropy_coeff: 0.0
        kl: 0.010220196098089218
        policy_loss: -0.009056040085852146
        total_loss: 12258.095703125
        vf_explained_var: 1.4773800849354757e-08
        vf_loss: 12258.103515625
    load_time_ms: 15.3
    num_steps_sampled: 2970000
    num_steps_trained: 2965248
    sample_time_ms: 11573.466
    update_time_ms: 9.013
  iterations_since_restore: 198
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,198,2814.19,2970000,2675.59

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-01-47
  done: false
  episode_len_mean: 736.59
  episode_reward_max: 5864.327617379778
  episode_reward_mean: 2770.9595697631034
  episode_reward_min: 1264.1404794675632
  episodes_this_iter: 21
  episodes_total: 2878
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3235.517
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.466941833496094
        entropy_coeff: 0.0
        kl: 0.00929640606045723
        policy_loss: -0.007317007053643465
        total_loss: 11199.45703125
        vf_explained_var: 3.056648489874192e-09
        vf_loss: 11199.46484375
    load_time_ms: 15.623
    num_steps_sampled: 2985000
    num_steps_trained: 2980224
    sample_time_ms: 11527.531
    update_time_ms: 8.966
  iterations_since_restore: 199
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,199,2829.01,2985000,2770.96

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-02-03
  done: false
  episode_len_mean: 727.52
  episode_reward_max: 5864.327617379778
  episode_reward_mean: 2732.0405595583284
  episode_reward_min: 1314.2168700494578
  episodes_this_iter: 22
  episodes_total: 2900
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3225.144
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.292659759521484
        entropy_coeff: 0.0
        kl: 0.009222351014614105
        policy_loss: -0.008599299937486649
        total_loss: 11781.5361328125
        vf_explained_var: 1.528324244937096e-09
        vf_loss: 11781.544921875
    load_time_ms: 15.748
    num_steps_sampled: 3000000
    num_steps_trained: 2995200
    sample_time_ms: 11634.718
    update_time_ms: 8.968
  iterations_since_restore: 200
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,200,2845.17,3000000,2732.04

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-02-18
  done: false
  episode_len_mean: 739.16
  episode_reward_max: 5864.327617379778
  episode_reward_mean: 2762.3348539705617
  episode_reward_min: 1455.8159642368157
  episodes_this_iter: 16
  episodes_total: 2916
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3201.541
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.34967803955078
        entropy_coeff: 0.0
        kl: 0.0082527045160532
        policy_loss: -0.008918317966163158
        total_loss: 11050.634765625
        vf_explained_var: 3.515145863275393e-08
        vf_loss: 11050.6435546875
    load_time_ms: 15.927
    num_steps_sampled: 3015000
    num_steps_trained: 3010176
    sample_time_ms: 11697.22
    update_time_ms: 8.803
  iterations_since_restore: 201
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,201,2859.97,3015000,2762.33

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-02-33
  done: false
  episode_len_mean: 733.56
  episode_reward_max: 5391.938177604421
  episode_reward_mean: 2697.9759612280495
  episode_reward_min: 1240.6494943046648
  episodes_this_iter: 20
  episodes_total: 2936
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3202.017
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.57575988769531
        entropy_coeff: 0.0
        kl: 0.0068259225226938725
        policy_loss: -0.007215083111077547
        total_loss: 12298.70703125
        vf_explained_var: 5.654799650756104e-08
        vf_loss: 12298.71484375
    load_time_ms: 15.707
    num_steps_sampled: 3030000
    num_steps_trained: 3025152
    sample_time_ms: 11690.518
    update_time_ms: 8.763
  iterations_since_restore: 202
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,202,2875.35,3030000,2697.98

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-02-48
  done: false
  episode_len_mean: 786.3
  episode_reward_max: 5391.938177604421
  episode_reward_mean: 2834.9490044344006
  episode_reward_min: 1240.6494943046648
  episodes_this_iter: 18
  episodes_total: 2954
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3180.853
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.559329986572266
        entropy_coeff: 0.0
        kl: 0.007145124953240156
        policy_loss: -0.006883104797452688
        total_loss: 10009.7763671875
        vf_explained_var: 5.5019672373646245e-08
        vf_loss: 10009.7841796875
    load_time_ms: 15.363
    num_steps_sampled: 3045000
    num_steps_trained: 3040128
    sample_time_ms: 11753.487
    update_time_ms: 8.875
  iterations_since_restore: 203
  node_ip: 192.168.107.157
  num_healthy_w

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,203,2890.45,3045000,2834.95

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-03-04
  done: false
  episode_len_mean: 770.84
  episode_reward_max: 5124.914666509165
  episode_reward_mean: 2735.907776661581
  episode_reward_min: 1175.2432156274394
  episodes_this_iter: 21
  episodes_total: 2975
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3171.146
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.49265670776367
        entropy_coeff: 0.0
        kl: 0.009282813407480717
        policy_loss: -0.007672078441828489
        total_loss: 11088.455078125
        vf_explained_var: 1.839083552113152e-07
        vf_loss: 11088.462890625
    load_time_ms: 14.731
    num_steps_sampled: 3060000
    num_steps_trained: 3055104
    sample_time_ms: 11910.029
    update_time_ms: 9.095
  iterations_since_restore: 204
  node_ip: 192.168.107.157
  num_healthy_worke

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,204,2906.04,3060000,2735.91

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-03-19
  done: false
  episode_len_mean: 795.45
  episode_reward_max: 5124.914666509165
  episode_reward_mean: 2829.7837232923125
  episode_reward_min: 1175.2432156274394
  episodes_this_iter: 18
  episodes_total: 2993
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3187.667
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.44514465332031
        entropy_coeff: 0.0
        kl: 0.012001406401395798
        policy_loss: -0.00810584332793951
        total_loss: 11218.2822265625
        vf_explained_var: 0.0
        vf_loss: 11218.2890625
    load_time_ms: 14.813
    num_steps_sampled: 3075000
    num_steps_trained: 3070080
    sample_time_ms: 11977.804
    update_time_ms: 9.355
  iterations_since_restore: 205
  node_ip: 192.168.107.157
  num_healthy_workers: 15
  off_policy

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,205,2921.4,3075000,2829.78

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-03-35
  done: false
  episode_len_mean: 790.32
  episode_reward_max: 5124.914666509165
  episode_reward_mean: 2835.390873110786
  episode_reward_min: 1137.5882795459881
  episodes_this_iter: 19
  episodes_total: 3012
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3187.399
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.40248107910156
        entropy_coeff: 0.0
        kl: 0.009555650874972343
        policy_loss: -0.007494200486689806
        total_loss: 11170.1953125
        vf_explained_var: 2.547207111902594e-09
        vf_loss: 11170.2021484375
    load_time_ms: 14.515
    num_steps_sampled: 3090000
    num_steps_trained: 3085056
    sample_time_ms: 11992.289
    update_time_ms: 9.045
  iterations_since_restore: 206
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,206,2936.93,3090000,2835.39

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-03-50
  done: false
  episode_len_mean: 763.28
  episode_reward_max: 5362.620290760927
  episode_reward_mean: 2766.2690417905415
  episode_reward_min: 1137.5882795459881
  episodes_this_iter: 25
  episodes_total: 3037
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3188.16
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.1799430847168
        entropy_coeff: 0.0
        kl: 0.006314282305538654
        policy_loss: -0.0073523325845599174
        total_loss: 11567.5595703125
        vf_explained_var: -3.5660898678457897e-09
        vf_loss: 11567.56640625
    load_time_ms: 14.896
    num_steps_sampled: 3105000
    num_steps_trained: 3100032
    sample_time_ms: 12068.544
    update_time_ms: 8.982
  iterations_since_restore: 207
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,207,2952.28,3105000,2766.27

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-04-05
  done: false
  episode_len_mean: 734.44
  episode_reward_max: 5362.620290760927
  episode_reward_mean: 2681.711376827134
  episode_reward_min: 1137.5882795459881
  episodes_this_iter: 18
  episodes_total: 3055
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3187.868
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.55005645751953
        entropy_coeff: 0.0
        kl: 0.007590185385197401
        policy_loss: -0.00779733108356595
        total_loss: 12221.0244140625
        vf_explained_var: 2.0377656895220753e-08
        vf_loss: 12221.0322265625
    load_time_ms: 14.884
    num_steps_sampled: 3120000
    num_steps_trained: 3115008
    sample_time_ms: 12032.353
    update_time_ms: 9.021
  iterations_since_restore: 208
  node_ip: 192.168.107.157
  num_healthy_wor

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,208,2966.85,3120000,2681.71

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-04-20
  done: false
  episode_len_mean: 745.43
  episode_reward_max: 5362.620290760927
  episode_reward_mean: 2726.952879431863
  episode_reward_min: 1137.5882795459881
  episodes_this_iter: 20
  episodes_total: 3075
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3186.695
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.30479431152344
        entropy_coeff: 0.0
        kl: 0.00808705110102892
        policy_loss: -0.008481542579829693
        total_loss: 10477.5634765625
        vf_explained_var: 6.266129304322021e-08
        vf_loss: 10477.572265625
    load_time_ms: 14.069
    num_steps_sampled: 3135000
    num_steps_trained: 3129984
    sample_time_ms: 12016.28
    update_time_ms: 9.192
  iterations_since_restore: 209
  node_ip: 192.168.107.157
  num_healthy_worker

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,209,2981.49,3135000,2726.95

Result for PPO_myEnv-v0_06b10b5c:
  custom_metrics: {}
  date: 2021-03-08_16-04-35
  done: false
  episode_len_mean: 716.45
  episode_reward_max: 5362.620290760927
  episode_reward_mean: 2600.4605390637284
  episode_reward_min: 1137.5882795459881
  episodes_this_iter: 21
  episodes_total: 3096
  experiment_id: ca3df19adaf641c4a8a37e6d5e7d79ba
  experiment_tag: '0'
  hostname: Crystalcomp
  info:
    grad_time_ms: 3193.038
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 32.30915069580078
        entropy_coeff: 0.0
        kl: 0.009967433288693428
        policy_loss: -0.007576256524771452
        total_loss: 11930.001953125
        vf_explained_var: -8.660504313695583e-09
        vf_loss: 11930.0087890625
    load_time_ms: 14.457
    num_steps_sampled: 3150000
    num_steps_trained: 3144960
    sample_time_ms: 11888.366
    update_time_ms: 9.194
  iterations_since_restore: 210
  node_ip: 192.168.107.157
  num_healthy_wo

Trial name,status,loc,iter,total time (s),timesteps,reward
PPO_myEnv-v0_06b10b5c,RUNNING,192.168.107.157:14857,210,2996.44,3150000,2600.46

In [None]:
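# Optional: resume training from a previously saved checkpoint.
# Uncomment to run; the "restore" path must point to an existing checkpoint.
# trials = run_experiments({
#     flow_params["exp_tag"]: {
#         "run": alg_run,
#         "env": gym_name,
#         "config": {
#             **config
#         },
#         "restore": "/ray_results/stabilizing_the_ring/TestE/checkpoint_500/checkpoint-500",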
#         "checkpoint_freq": 1,
#         "checkpoint_at_end": False,
#         "max_failures": 999,
#         "stop": {
#             "training_iteration": 1,
#         },
#     },
# })