# Sequential Q-learning

In this notebook, we'll build a neural network that can learn to play games through reinforcement learning. More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

![Cart-Pole](assets/cart-pole.jpg)

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
import gym
import numpy as np

In [2]:
# In this one we should define and detect GPUs for tensorflow
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.7.1
Default GPU Device: 


>**Note:** Make sure you have OpenAI Gym cloned into the same directory with this notebook. I've included `gym` as a submodule, so you can run `git submodule --init --recursive` to pull the contents into the `gym` repo.

>**Note:** Make sure you have OpenAI Gym cloned. Then run this command `pip install -e gym/[all]`.

In [3]:
import gym

# Create the Cart-Pole game environment
#env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m




We interact with the simulation through `env`. To show the simulation running, you can use `env.render()` to render one frame. Passing in an action as an integer to `env.step` will generate the next step in the simulation.  You can see how many actions are possible from `env.action_space` and to get a random action you can use `env.action_space.sample()`. This is general to all Gym games. In the Cart-Pole game, there are two possible actions, moving the cart left or right. So there are two actions we can take, encoded as 0 and 1.

Run the code below to watch the simulation run.

In [4]:
env.reset()
for _ in range(10):
    # env.render()
    action = env.action_space.sample()
    state, reward, done, info = env.step(action) # take a random action
    print('state, action, reward, done, info:', state, action, reward, done, info)
    if done:
        env.reset()

state, action, reward, done, info: [-0.01977852 -0.15034484 -0.02802841  0.2722864 ] 0 1.0 False {}
state, action, reward, done, info: [-0.02278542  0.04516561 -0.02258268 -0.02910336] 1 1.0 False {}
state, action, reward, done, info: [-0.02188211  0.24060401 -0.02316475 -0.32882495] 1 1.0 False {}
state, action, reward, done, info: [-0.01707003  0.04581936 -0.02974125 -0.04353619] 0 1.0 False {}
state, action, reward, done, info: [-0.01615364  0.24135489 -0.03061197 -0.34545241] 1 1.0 False {}
state, action, reward, done, info: [-0.01132654  0.43689862 -0.03752102 -0.6476291 ] 1 1.0 False {}
state, action, reward, done, info: [-0.00258857  0.63252268 -0.0504736  -0.95188725] 1 1.0 False {}
state, action, reward, done, info: [ 0.01006188  0.82828621 -0.06951135 -1.25999152] 1 1.0 False {}
state, action, reward, done, info: [ 0.02662761  1.0242251  -0.09471118 -1.57360997] 1 1.0 False {}
state, action, reward, done, info: [ 0.04711211  1.22034044 -0.12618338 -1.89426804] 1 1.0 False {}


To shut the window showing the simulation, use `env.close()`.

If you ran the simulation above, we can look at the rewards:

In [5]:
# print(rewards[-20:])
# print(np.array(rewards).shape, np.array(states).shape, np.array(actions).shape, np.array(dones).shape)
# print(np.array(rewards).dtype, np.array(states).dtype, np.array(actions).dtype, np.array(dones).dtype)
# print(np.max(np.array(actions)), np.min(np.array(actions)))
# print((np.max(np.array(actions)) - np.min(np.array(actions)))+1)
# print(np.max(np.array(rewards)), np.min(np.array(rewards)))
# print(np.max(np.array(states)), np.min(np.array(states)))

The game resets after the pole has fallen past a certain angle. For each frame while the simulation is running, it returns a reward of 1.0. The longer the game runs, the more reward we get. Then, our network's goal is to maximize the reward by keeping the pole vertical. It will do this by moving the cart to the left and the right.

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max{Q(s', a')}
$$

where $s$ is a state, $a$ is an action, and $s'$ is the next state from state $s$ and action $a$.

Before we used this equation to learn values for a Q-_table_. However, for this game there are a huge number of states available. The state has four values: the position and velocity of the cart, and the position and velocity of the pole. These are all real-valued numbers, so ignoring floating point precisions, you practically have infinite states. Instead of using a table then, we'll replace it with a neural network that will approximate the Q-table lookup function.

<img src="assets/deep-q-learning.png" width=450px>

Now, our Q value, $Q(s, a)$ is calculated by passing in a state to the network. The output will be Q-values for each available action, with fully connected hidden layers.

<img src="assets/q-network.png" width=550px>


As I showed before, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max{Q(s', a')}$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$. 

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.

Below is my implementation of the Q-network. I used two fully connected layers with ReLU activations. Two seems to be good enough, three might be better. Feel free to try it out.

In [6]:
def model_input(state_size, lstm_size, batch_size=1):
    actions = tf.placeholder(tf.int32, [None], name='actions')
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    labelQs = tf.placeholder(tf.float32, [None], name='labelQs')
        
    gru = tf.nn.rnn_cell.GRUCell(lstm_size)
    cell = tf.nn.rnn_cell.MultiRNNCell([gru], state_is_tuple=False)
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return actions, states, targetQs, labelQs, cell, initial_state

In [7]:
# RNN generator or sequence generator
def generator(states, initial_state, cell, lstm_size, num_classes, reuse=False): 
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        inputs = tf.layers.dense(inputs=states, units=lstm_size)
        print(states.shape, inputs.shape)
        
        # with tf.variable_scope('dynamic_rnn_', reuse=tf.AUTO_REUSE):
        # dynamic means adapt to the batch_size
        inputs_rnn = tf.reshape(inputs, [1, -1, lstm_size]) # NxH -> 1xNxH
        print(inputs_rnn.shape, initial_state.shape)
        outputs_rnn, final_state = tf.nn.dynamic_rnn(cell=cell, inputs=inputs_rnn, initial_state=initial_state)
        print(outputs_rnn.shape, final_state.shape)
        outputs = tf.reshape(outputs_rnn, [-1, lstm_size]) # 1xNxH -> NxH
        print(outputs.shape)

        # Last fully connected layer
        logits = tf.layers.dense(inputs=outputs, units=num_classes)
        print(logits.shape)
        #predictions = tf.nn.softmax(logits)
        
        # logits are the action logits
        return logits, final_state

In [8]:
def model_loss(action_size, hidden_size, states, cell, initial_state, actions, targetQs, labelQs):
    actions_logits, final_state = generator(states=states, cell=cell, initial_state=initial_state, 
                                            lstm_size=hidden_size, num_classes=action_size)
    actions_labels = tf.one_hot(indices=actions, depth=action_size, dtype=actions_logits.dtype)
    Qs = tf.reduce_max(actions_logits*actions_labels, axis=1)
    lossQtgt = tf.reduce_mean(tf.square(Qs - targetQs)) # next state, next action and nextQs
    lossQlbl = tf.reduce_mean(tf.square(Qs - labelQs)) # current state, action, and currentQs
    lossQtgt_sigm = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Qs, 
                                                                           labels=tf.nn.sigmoid(targetQs)))
    lossQlbl_sigm = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Qs,
                                                                           labels=tf.nn.sigmoid(labelQs)))
    loss = lossQtgt + lossQlbl + lossQtgt_sigm + lossQlbl_sigm
    return actions_logits, final_state, loss, lossQtgt, lossQlbl, lossQtgt_sigm, lossQlbl_sigm

In [9]:
def model_opt(loss, learning_rate):
    """
    Get optimization operations in order
    :param loss: Generator loss Tensor for action prediction
    :param learning_rate: Learning Rate Placeholder
    :return: A tuple of (qfunction training, generator training, discriminator training)
    """
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]

    # # Optimize
    # with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
    # #opt = tf.train.AdamOptimizer(learning_rate).minimize(loss, var_list=g_vars)

    #grads, _ = tf.clip_by_global_norm(t_list=tf.gradients(loss, g_vars), clip_norm=5) # usually around 1-5
    grads = tf.gradients(loss, g_vars)
    opt = tf.train.AdamOptimizer(learning_rate).apply_gradients(grads_and_vars=zip(grads, g_vars))

    return opt

In [10]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.actions, self.states, self.targetQs, self.labelQs, cell, self.initial_state = model_input(
            state_size=state_size, lstm_size=hidden_size)
        
        # Create the Model: calculating the loss and forwad pass
        self.actions_logits, self.final_state, self.loss, self.lossQtgt, self.lossQlbl, self.lossQtgt_sigm, self.lossQlbl_sigm = model_loss(
            action_size=action_size, hidden_size=hidden_size, 
            states=self.states, actions=self.actions, 
            targetQs=self.targetQs, labelQs=self.labelQs, 
            cell=cell, initial_state=self.initial_state)

        # Update the model: backward pass and backprop
        self.opt = model_opt(loss=self.loss, learning_rate=learning_rate)

## Experience replay

Reinforcement learning algorithms can have stability issues due to correlations between states. To reduce correlations when training, we can store the agent's experiences and later draw a random mini-batch of those experiences to train on. 

Here, we'll create a `Memory` object that will store our experiences, our transitions $<s, a, r, s'>$. This memory will have a maxmium capacity, so we can keep newer experiences in memory while getting rid of older experiences. Then, we'll sample a random mini-batch of transitions $<s, a, r, s'>$ and train on those.

Below, I've implemented a `Memory` object. If you're unfamiliar with `deque`, this is a double-ended queue. You can think of it like a tube open on both sides. You can put objects in either side of the tube. But if it's full, adding anything more will push an object out the other side. This is a great data structure to use for the memory buffer.

In [11]:
from collections import deque

class Memory():    
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
        self.states = deque(maxlen=max_size)

## Exploration - Exploitation

To learn about the environment and rules of the game, the agent needs to explore by taking random actions. We'll do this by choosing a random action with some probability $\epsilon$ (epsilon).  That is, with some probability $\epsilon$ the agent will make a random action and with probability $1 - \epsilon$, the agent will choose an action from $Q(s,a)$. This is called an **$\epsilon$-greedy policy**.


At first, the agent needs to do a lot of exploring. Later when it has learned more, the agent can favor choosing actions based on what it has learned. This is called _exploitation_. We'll set it up so the agent is more likely to explore early in training, then more likely to exploit later in training.

## Q-Learning training algorithm

Putting all this together, we can list out the algorithm we'll use to train the network. We'll train the network in _episodes_. One *episode* is one simulation of the game. For this game, the goal is to keep the pole upright for 195 frames. So we can start a new episode once meeting that goal. The game ends if the pole tilts over too far, or if the cart moves too far the left or right. When a game ends, we'll start a new episode. Now, to train the agent:

* Initialize the memory $D$
* Initialize the action-value network $Q$ with random weights
* **For** episode = 1, $M$ **do**
  * **For** $t$, $T$ **do**
     * With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \mathrm{argmax}_a Q(s,a)$
     * Execute action $a_t$ in simulator and observe reward $r_{t+1}$ and new state $s_{t+1}$
     * Store transition $<s_t, a_t, r_{t+1}, s_{t+1}>$ in memory $D$
     * Sample random mini-batch from $D$: $<s_j, a_j, r_j, s'_j>$
     * Set $\hat{Q}_j = r_j$ if the episode ends at $j+1$, otherwise set $\hat{Q}_j = r_j + \gamma \max_{a'}{Q(s'_j, a')}$
     * Make a gradient descent step with loss $(\hat{Q}_j - Q(s_j, a_j))^2$
  * **endfor**
* **endfor**

## Hyperparameters

One of the more difficult aspects of reinforcememt learning are the large number of hyperparameters. Not only are we tuning the network, but we're tuning the simulation.

In [12]:
# print('state:', np.array(states).shape[1], 
#       'action size: {}'.format((np.max(np.array(actions)) - np.min(np.array(actions)))+1))

In [13]:
# Training parameters
# Network parameters
state_size = 4                 # number of units for the input state/observation -- simulation
action_size = 2                # number of units for the output actions -- simulation
hidden_size = 64               # number of units in each Q-network hidden layer -- simulation
batch_size = 32                # number of samples in the memory/ experience as mini-batch size
learning_rate = 0.001          # learning rate for adam

In [14]:
# Reset/init the graph/session
graph = tf.reset_default_graph()

# Init the model
model = Model(action_size=action_size, hidden_size=hidden_size, state_size=state_size, learning_rate=learning_rate)

# Init the memory
memory = Memory(max_size=batch_size)

(?, 4) (?, 64)
(1, ?, 64) (1, 64)
(1, ?, 64) (1, 64)
(?, 64)
(?, 2)


## Populate the memory (exprience memory)

Here I'm re-initializing the simulation and pre-populating the memory. The agent is taking random actions and storing the transitions in memory. This will help the agent with exploring the game.

In [15]:
state = env.reset()
for _ in range(batch_size):
    action = env.action_space.sample()
    next_state, reward, done, _ = env.step(action)
    memory.buffer.append([state, action, next_state, reward, float(done)])
    state = next_state
    if done is True:
        state = env.reset()

## Training the model

Below we'll train our agent. If you want to watch it train, uncomment the `env.render()` line. This is slow because it's rendering the frames slower than the network can train. But, it's cool to watch the agent get better at the game.

In [16]:
memory.buffer[0]

[array([ 0.0345243 , -0.00959876,  0.04293128,  0.02900301]),
 1,
 array([ 0.03433232,  0.18488207,  0.04351134, -0.24983153]),
 1.0,
 0.0]

In [17]:
# states, rewards, actions

In [None]:
# Now train with experiences
saver = tf.train.Saver() # save the trained model
rewards_list, loss_list = [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    episode_loss = deque(maxlen=batch_size)
    episode_reward = deque(maxlen=batch_size)
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        loss_batch = []
        lossQlbl_batch, lossQlbl_sigm_batch, lossQtgt_batch, lossQtgt_sigm_batch = [], [], [], []
        state = env.reset()
        initial_state = sess.run(model.initial_state) # Qs or current batch or states[:-1]

        # Training steps/batches
        while True:
            # Testing
            action_logits, final_state = sess.run([model.actions_logits, model.final_state],
                                                  feed_dict = {model.states: state.reshape([1, -1]), 
                                                               model.initial_state: initial_state})
            action = np.argmax(action_logits)
            next_state, reward, done, _ = env.step(action)
            memory.buffer.append([state, action, next_state, reward, float(done)])
            memory.states.append([initial_state, final_state])
            total_reward += reward
            initial_state = final_state
            state = next_state

            # Training
            batch = memory.buffer
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            next_states = np.array([each[2] for each in batch])
            rewards = np.array([each[3] for each in batch])
            dones = np.array([each[4] for each in batch])
            rnn_states = memory.states
            initial_states = np.array([each[0] for each in rnn_states])
            final_states = np.array([each[1] for each in rnn_states])
            actions_logits = sess.run(model.actions_logits, 
                                      feed_dict = {model.states: states, 
                                                   model.initial_state: initial_states[0].reshape([1, -1])})
            labelQs = np.max(actions_logits, axis=1) # explore
            next_actions_logits = sess.run(model.actions_logits, 
                                           feed_dict = {model.states: next_states, 
                                                        model.initial_state: final_states[0].reshape([1, -1])})
            nextQs = np.max(next_actions_logits, axis=1) * (1-dones) # exploit
            targetQs = rewards + (0.99 * nextQs)
            loss, _, lossQlbl, lossQlbl_sigm, lossQtgt, lossQtgt_sigm = sess.run([model.loss, model.opt, 
                                                                                  model.lossQlbl, 
                                                                                  model.lossQlbl_sigm, 
                                                                                  model.lossQtgt, 
                                                                                  model.lossQtgt_sigm], 
                                            feed_dict = {model.states: states, 
                                                         model.actions: actions,
                                                         model.targetQs: targetQs,
                                                         
                                                         model.labelQs: labelQs,
                                                         model.initial_state: initial_states[0].reshape([1, -1])})
            loss_batch.append(loss)
            lossQlbl_batch.append(lossQlbl)
            lossQlbl_sigm_batch.append(lossQlbl_sigm)
            lossQtgt_batch.append(lossQtgt)
            lossQtgt_sigm_batch.append(lossQtgt_sigm)
            if done is True:
                break
                
        episode_reward.append(total_reward)
        print('Episode: {}'.format(ep),
              'meanReward: {:.4f}'.format(np.mean(episode_reward)),
              'meanLoss: {:.4f}'.format(np.mean(loss_batch)),
              'meanLossQlbl: {:.4f}'.format(np.mean(lossQlbl_batch)),
              'meanLossQlbl_sigm: {:.4f}'.format(np.mean(lossQlbl_sigm_batch)),
              'meanLossQtgt: {:.4f}'.format(np.mean(lossQtgt_batch)),
              'meanLossQtgt_sigm: {:.4f}'.format(np.mean(lossQtgt_sigm_batch)))
        rewards_list.append([ep, np.mean(episode_reward)])
        loss_list.append([ep, np.mean(loss_batch)])
        if(np.mean(episode_reward) >= 500):
            break
    
    saver.save(sess, 'checkpoints/model5.ckpt')

Episode: 0 meanReward: 8.0000 meanLoss: 2.4964 meanLossQlbl: 0.0702 meanLossQlbl_sigm: 0.6805 meanLossQtgt: 1.0969 meanLossQtgt_sigm: 0.6489
Episode: 1 meanReward: 8.5000 meanLoss: 2.5235 meanLossQlbl: 0.6727 meanLossQlbl_sigm: 0.6896 meanLossQtgt: 0.4811 meanLossQtgt_sigm: 0.6801
Episode: 2 meanReward: 8.6667 meanLoss: 4.3015 meanLossQlbl: 2.0428 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 0.8724 meanLossQtgt_sigm: 0.6931
Episode: 3 meanReward: 8.7500 meanLoss: 6.0796 meanLossQlbl: 3.1585 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 1.5363 meanLossQtgt_sigm: 0.6918
Episode: 4 meanReward: 9.0000 meanLoss: 7.2201 meanLossQlbl: 3.9690 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 1.8655 meanLossQtgt_sigm: 0.6926
Episode: 5 meanReward: 8.8333 meanLoss: 7.6302 meanLossQlbl: 4.2173 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 2.0279 meanLossQtgt_sigm: 0.6920
Episode: 6 meanReward: 8.8571 meanLoss: 7.1715 meanLossQlbl: 3.9283 meanLossQlbl_sigm: 0.6930 meanLossQtgt: 1.8586 meanLossQtgt_sigm: 0.6916
Episode: 7 me

Episode: 59 meanReward: 9.3438 meanLoss: 0.7034 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0620 meanLossQtgt: 0.5220 meanLossQtgt_sigm: 0.1194
Episode: 60 meanReward: 9.3750 meanLoss: 0.6408 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1006 meanLossQtgt: 0.4061 meanLossQtgt_sigm: 0.1342
Episode: 61 meanReward: 9.3438 meanLoss: 0.6298 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1203 meanLossQtgt: 0.3688 meanLossQtgt_sigm: 0.1407
Episode: 62 meanReward: 9.3438 meanLoss: 0.7013 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0875 meanLossQtgt: 0.4843 meanLossQtgt_sigm: 0.1295
Episode: 63 meanReward: 9.3438 meanLoss: 0.6899 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0748 meanLossQtgt: 0.4902 meanLossQtgt_sigm: 0.1249
Episode: 64 meanReward: 9.3438 meanLoss: 0.5796 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0798 meanLossQtgt: 0.3751 meanLossQtgt_sigm: 0.1246
Episode: 65 meanReward: 9.3438 meanLoss: 0.5218 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0959 meanLossQtgt: 0.2973 meanLossQtgt_sigm: 0.1285
Episod

Episode: 117 meanReward: 9.4688 meanLoss: 0.5337 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1361 meanLossQtgt: 0.2497 meanLossQtgt_sigm: 0.1479
Episode: 118 meanReward: 9.4062 meanLoss: 0.5715 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0779 meanLossQtgt: 0.3708 meanLossQtgt_sigm: 0.1229
Episode: 119 meanReward: 9.4062 meanLoss: 0.4397 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0886 meanLossQtgt: 0.2282 meanLossQtgt_sigm: 0.1229
Episode: 120 meanReward: 9.4062 meanLoss: 0.3824 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1185 meanLossQtgt: 0.1287 meanLossQtgt_sigm: 0.1352
Episode: 121 meanReward: 9.4375 meanLoss: 0.3821 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1418 meanLossQtgt: 0.0952 meanLossQtgt_sigm: 0.1451
Episode: 122 meanReward: 9.4688 meanLoss: 0.3861 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1118 meanLossQtgt: 0.1473 meanLossQtgt_sigm: 0.1269
Episode: 123 meanReward: 9.5312 meanLoss: 0.3619 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0921 meanLossQtgt: 0.1519 meanLossQtgt_sigm: 0.1179

Episode: 175 meanReward: 9.7188 meanLoss: 0.5085 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0724 meanLossQtgt: 0.3247 meanLossQtgt_sigm: 0.1114
Episode: 176 meanReward: 9.7188 meanLoss: 0.5438 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0851 meanLossQtgt: 0.3403 meanLossQtgt_sigm: 0.1184
Episode: 177 meanReward: 9.6875 meanLoss: 0.6433 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1227 meanLossQtgt: 0.3828 meanLossQtgt_sigm: 0.1378
Episode: 178 meanReward: 9.6562 meanLoss: 0.7493 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1679 meanLossQtgt: 0.4176 meanLossQtgt_sigm: 0.1639
Episode: 179 meanReward: 9.7188 meanLoss: 0.7726 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1806 meanLossQtgt: 0.4220 meanLossQtgt_sigm: 0.1700
Episode: 180 meanReward: 9.7188 meanLoss: 0.9204 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1133 meanLossQtgt: 0.6712 meanLossQtgt_sigm: 0.1360
Episode: 181 meanReward: 9.6875 meanLoss: 1.0372 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0719 meanLossQtgt: 0.8439 meanLossQtgt_sigm: 0.1214

Episode: 234 meanReward: 9.0000 meanLoss: 0.5215 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1501 meanLossQtgt: 0.2129 meanLossQtgt_sigm: 0.1585
Episode: 235 meanReward: 9.0000 meanLoss: 0.5242 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1477 meanLossQtgt: 0.2220 meanLossQtgt_sigm: 0.1545
Episode: 236 meanReward: 9.0312 meanLoss: 0.5683 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0952 meanLossQtgt: 0.3431 meanLossQtgt_sigm: 0.1300
Episode: 237 meanReward: 9.0312 meanLoss: 0.5954 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0814 meanLossQtgt: 0.3910 meanLossQtgt_sigm: 0.1229
Episode: 238 meanReward: 9.0625 meanLoss: 0.5603 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0914 meanLossQtgt: 0.3379 meanLossQtgt_sigm: 0.1310
Episode: 239 meanReward: 9.0000 meanLoss: 0.5134 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0960 meanLossQtgt: 0.2878 meanLossQtgt_sigm: 0.1296
Episode: 240 meanReward: 9.0625 meanLoss: 0.5156 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0822 meanLossQtgt: 0.3063 meanLossQtgt_sigm: 0.1271

Episode: 292 meanReward: 9.4375 meanLoss: 0.7262 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0944 meanLossQtgt: 0.4872 meanLossQtgt_sigm: 0.1446
Episode: 293 meanReward: 9.4375 meanLoss: 0.5456 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0998 meanLossQtgt: 0.3080 meanLossQtgt_sigm: 0.1378
Episode: 294 meanReward: 9.4688 meanLoss: 0.4525 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0897 meanLossQtgt: 0.2387 meanLossQtgt_sigm: 0.1242
Episode: 295 meanReward: 9.4688 meanLoss: 0.4120 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0830 meanLossQtgt: 0.2121 meanLossQtgt_sigm: 0.1170
Episode: 296 meanReward: 9.4375 meanLoss: 0.3764 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0968 meanLossQtgt: 0.1580 meanLossQtgt_sigm: 0.1216
Episode: 297 meanReward: 9.4062 meanLoss: 0.3584 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1132 meanLossQtgt: 0.1162 meanLossQtgt_sigm: 0.1290
Episode: 298 meanReward: 9.4375 meanLoss: 0.3372 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1054 meanLossQtgt: 0.1079 meanLossQtgt_sigm: 0.1239

Episode: 350 meanReward: 9.3125 meanLoss: 0.5018 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0832 meanLossQtgt: 0.2877 meanLossQtgt_sigm: 0.1309
Episode: 351 meanReward: 9.2812 meanLoss: 0.5544 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1471 meanLossQtgt: 0.2437 meanLossQtgt_sigm: 0.1637
Episode: 352 meanReward: 9.2812 meanLoss: 0.7049 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2318 meanLossQtgt: 0.2546 meanLossQtgt_sigm: 0.2185
Episode: 353 meanReward: 9.2812 meanLoss: 0.8111 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2550 meanLossQtgt: 0.3252 meanLossQtgt_sigm: 0.2309
Episode: 354 meanReward: 9.3125 meanLoss: 0.8045 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1834 meanLossQtgt: 0.4310 meanLossQtgt_sigm: 0.1900
Episode: 355 meanReward: 9.3125 meanLoss: 0.8917 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1274 meanLossQtgt: 0.5998 meanLossQtgt_sigm: 0.1644
Episode: 356 meanReward: 9.3750 meanLoss: 1.0998 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0783 meanLossQtgt: 0.8759 meanLossQtgt_sigm: 0.1457

Episode: 409 meanReward: 9.2812 meanLoss: 0.6937 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1037 meanLossQtgt: 0.4343 meanLossQtgt_sigm: 0.1557
Episode: 410 meanReward: 9.2500 meanLoss: 0.6041 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1134 meanLossQtgt: 0.3377 meanLossQtgt_sigm: 0.1529
Episode: 411 meanReward: 9.2812 meanLoss: 0.6253 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0900 meanLossQtgt: 0.3940 meanLossQtgt_sigm: 0.1413
Episode: 412 meanReward: 9.2500 meanLoss: 0.5614 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0918 meanLossQtgt: 0.3304 meanLossQtgt_sigm: 0.1392
Episode: 413 meanReward: 9.2188 meanLoss: 0.5356 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1029 meanLossQtgt: 0.2918 meanLossQtgt_sigm: 0.1409
Episode: 414 meanReward: 9.2500 meanLoss: 0.5485 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1026 meanLossQtgt: 0.3042 meanLossQtgt_sigm: 0.1416
Episode: 415 meanReward: 9.2812 meanLoss: 0.5128 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1106 meanLossQtgt: 0.2561 meanLossQtgt_sigm: 0.1461

Episode: 468 meanReward: 9.4062 meanLoss: 0.7340 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0576 meanLossQtgt: 0.5478 meanLossQtgt_sigm: 0.1286
Episode: 469 meanReward: 9.3750 meanLoss: 0.6423 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0684 meanLossQtgt: 0.4478 meanLossQtgt_sigm: 0.1261
Episode: 470 meanReward: 9.3438 meanLoss: 0.5773 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0825 meanLossQtgt: 0.3624 meanLossQtgt_sigm: 0.1325
Episode: 471 meanReward: 9.3125 meanLoss: 0.5205 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0865 meanLossQtgt: 0.3007 meanLossQtgt_sigm: 0.1333
Episode: 472 meanReward: 9.2500 meanLoss: 0.4739 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1022 meanLossQtgt: 0.2332 meanLossQtgt_sigm: 0.1385
Episode: 473 meanReward: 9.2500 meanLoss: 0.4143 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1222 meanLossQtgt: 0.1456 meanLossQtgt_sigm: 0.1465
Episode: 474 meanReward: 9.1875 meanLoss: 0.3834 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1336 meanLossQtgt: 0.0998 meanLossQtgt_sigm: 0.1500

Episode: 527 meanReward: 9.6562 meanLoss: 0.3588 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1040 meanLossQtgt: 0.1300 meanLossQtgt_sigm: 0.1249
Episode: 528 meanReward: 9.6875 meanLoss: 0.3906 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1383 meanLossQtgt: 0.1098 meanLossQtgt_sigm: 0.1425
Episode: 529 meanReward: 9.6875 meanLoss: 0.3454 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1322 meanLossQtgt: 0.0735 meanLossQtgt_sigm: 0.1397
Episode: 530 meanReward: 9.6250 meanLoss: 0.3498 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1163 meanLossQtgt: 0.0979 meanLossQtgt_sigm: 0.1355
Episode: 531 meanReward: 9.6562 meanLoss: 0.3447 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1445 meanLossQtgt: 0.0487 meanLossQtgt_sigm: 0.1515
Episode: 532 meanReward: 9.6250 meanLoss: 0.3719 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1705 meanLossQtgt: 0.0291 meanLossQtgt_sigm: 0.1723
Episode: 533 meanReward: 9.5938 meanLoss: 0.4266 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2054 meanLossQtgt: 0.0249 meanLossQtgt_sigm: 0.1963

Episode: 586 meanReward: 9.4375 meanLoss: 0.4844 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2069 meanLossQtgt: 0.0855 meanLossQtgt_sigm: 0.1920
Episode: 587 meanReward: 9.4375 meanLoss: 0.4052 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1143 meanLossQtgt: 0.1546 meanLossQtgt_sigm: 0.1363
Episode: 588 meanReward: 9.4688 meanLoss: 0.4195 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1096 meanLossQtgt: 0.1690 meanLossQtgt_sigm: 0.1408
Episode: 589 meanReward: 9.4062 meanLoss: 0.4163 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1697 meanLossQtgt: 0.0788 meanLossQtgt_sigm: 0.1678
Episode: 590 meanReward: 9.4375 meanLoss: 0.3970 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1553 meanLossQtgt: 0.0821 meanLossQtgt_sigm: 0.1596
Episode: 591 meanReward: 9.5000 meanLoss: 0.3733 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1122 meanLossQtgt: 0.1255 meanLossQtgt_sigm: 0.1356
Episode: 592 meanReward: 9.5000 meanLoss: 0.3763 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1103 meanLossQtgt: 0.1319 meanLossQtgt_sigm: 0.1341

Episode: 644 meanReward: 9.4375 meanLoss: 0.4465 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2014 meanLossQtgt: 0.0542 meanLossQtgt_sigm: 0.1909
Episode: 645 meanReward: 9.4688 meanLoss: 0.3820 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1288 meanLossQtgt: 0.1040 meanLossQtgt_sigm: 0.1491
Episode: 646 meanReward: 9.4062 meanLoss: 0.3721 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1626 meanLossQtgt: 0.0434 meanLossQtgt_sigm: 0.1661
Episode: 647 meanReward: 9.4062 meanLoss: 0.3667 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1567 meanLossQtgt: 0.0470 meanLossQtgt_sigm: 0.1630
Episode: 648 meanReward: 9.4062 meanLoss: 0.3869 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1621 meanLossQtgt: 0.0625 meanLossQtgt_sigm: 0.1623
Episode: 649 meanReward: 9.4062 meanLoss: 0.3719 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1447 meanLossQtgt: 0.0760 meanLossQtgt_sigm: 0.1511
Episode: 650 meanReward: 9.4062 meanLoss: 0.3673 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1502 meanLossQtgt: 0.0642 meanLossQtgt_sigm: 0.1529

Episode: 702 meanReward: 9.2188 meanLoss: 0.3861 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0997 meanLossQtgt: 0.1594 meanLossQtgt_sigm: 0.1270
Episode: 703 meanReward: 9.2500 meanLoss: 0.3819 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1070 meanLossQtgt: 0.1427 meanLossQtgt_sigm: 0.1321
Episode: 704 meanReward: 9.2500 meanLoss: 0.3645 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1308 meanLossQtgt: 0.0928 meanLossQtgt_sigm: 0.1410
Episode: 705 meanReward: 9.2812 meanLoss: 0.3877 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1572 meanLossQtgt: 0.0730 meanLossQtgt_sigm: 0.1576
Episode: 706 meanReward: 9.2812 meanLoss: 0.3609 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1386 meanLossQtgt: 0.0796 meanLossQtgt_sigm: 0.1427
Episode: 707 meanReward: 9.3125 meanLoss: 0.3372 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1281 meanLossQtgt: 0.0715 meanLossQtgt_sigm: 0.1376
Episode: 708 meanReward: 9.2500 meanLoss: 0.3365 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1049 meanLossQtgt: 0.1051 meanLossQtgt_sigm: 0.1266

Episode: 761 meanReward: 9.3750 meanLoss: 0.3286 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1132 meanLossQtgt: 0.0834 meanLossQtgt_sigm: 0.1320
Episode: 762 meanReward: 9.3438 meanLoss: 0.3846 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1714 meanLossQtgt: 0.0488 meanLossQtgt_sigm: 0.1644
Episode: 763 meanReward: 9.3438 meanLoss: 0.4145 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1927 meanLossQtgt: 0.0414 meanLossQtgt_sigm: 0.1803
Episode: 764 meanReward: 9.3125 meanLoss: 0.3259 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1208 meanLossQtgt: 0.0732 meanLossQtgt_sigm: 0.1319
Episode: 765 meanReward: 9.3125 meanLoss: 0.3272 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1041 meanLossQtgt: 0.0960 meanLossQtgt_sigm: 0.1271
Episode: 766 meanReward: 9.3125 meanLoss: 0.3264 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1451 meanLossQtgt: 0.0353 meanLossQtgt_sigm: 0.1461
Episode: 767 meanReward: 9.3125 meanLoss: 0.3607 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1662 meanLossQtgt: 0.0307 meanLossQtgt_sigm: 0.1637

Episode: 820 meanReward: 9.3750 meanLoss: 0.5184 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2209 meanLossQtgt: 0.0967 meanLossQtgt_sigm: 0.2008
Episode: 821 meanReward: 9.4062 meanLoss: 0.4263 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1544 meanLossQtgt: 0.1181 meanLossQtgt_sigm: 0.1538
Episode: 822 meanReward: 9.3750 meanLoss: 0.3939 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1069 meanLossQtgt: 0.1544 meanLossQtgt_sigm: 0.1326
Episode: 823 meanReward: 9.3438 meanLoss: 0.3752 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1043 meanLossQtgt: 0.1422 meanLossQtgt_sigm: 0.1286
Episode: 824 meanReward: 9.3438 meanLoss: 0.3562 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1166 meanLossQtgt: 0.1053 meanLossQtgt_sigm: 0.1343
Episode: 825 meanReward: 9.3125 meanLoss: 0.3735 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1520 meanLossQtgt: 0.0633 meanLossQtgt_sigm: 0.1582
Episode: 826 meanReward: 9.3750 meanLoss: 0.4025 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1791 meanLossQtgt: 0.0532 meanLossQtgt_sigm: 0.1702

Episode: 878 meanReward: 9.5312 meanLoss: 0.3970 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1884 meanLossQtgt: 0.0250 meanLossQtgt_sigm: 0.1837
Episode: 879 meanReward: 9.5938 meanLoss: 0.3863 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1794 meanLossQtgt: 0.0373 meanLossQtgt_sigm: 0.1696
Episode: 880 meanReward: 9.5938 meanLoss: 0.3099 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1216 meanLossQtgt: 0.0570 meanLossQtgt_sigm: 0.1313
Episode: 881 meanReward: 9.5625 meanLoss: 0.3113 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0965 meanLossQtgt: 0.0968 meanLossQtgt_sigm: 0.1181
Episode: 882 meanReward: 9.5625 meanLoss: 0.3153 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1234 meanLossQtgt: 0.0578 meanLossQtgt_sigm: 0.1340
Episode: 883 meanReward: 9.5938 meanLoss: 0.3675 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1715 meanLossQtgt: 0.0331 meanLossQtgt_sigm: 0.1628
Episode: 884 meanReward: 9.5938 meanLoss: 0.3330 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1539 meanLossQtgt: 0.0288 meanLossQtgt_sigm: 0.1502

Episode: 937 meanReward: 9.3125 meanLoss: 0.4008 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1886 meanLossQtgt: 0.0281 meanLossQtgt_sigm: 0.1841
Episode: 938 meanReward: 9.2812 meanLoss: 0.4445 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2086 meanLossQtgt: 0.0375 meanLossQtgt_sigm: 0.1984
Episode: 939 meanReward: 9.2500 meanLoss: 0.3596 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1481 meanLossQtgt: 0.0593 meanLossQtgt_sigm: 0.1522
Episode: 940 meanReward: 9.2812 meanLoss: 0.3538 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1009 meanLossQtgt: 0.1260 meanLossQtgt_sigm: 0.1269
Episode: 941 meanReward: 9.2500 meanLoss: 0.3666 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1438 meanLossQtgt: 0.0703 meanLossQtgt_sigm: 0.1525
Episode: 942 meanReward: 9.2812 meanLoss: 0.5206 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2307 meanLossQtgt: 0.0770 meanLossQtgt_sigm: 0.2128
Episode: 943 meanReward: 9.3125 meanLoss: 0.4225 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1759 meanLossQtgt: 0.0806 meanLossQtgt_sigm: 0.1660

Episode: 996 meanReward: 9.2188 meanLoss: 0.3762 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0832 meanLossQtgt: 0.1739 meanLossQtgt_sigm: 0.1192
Episode: 997 meanReward: 9.2500 meanLoss: 0.3501 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1421 meanLossQtgt: 0.0586 meanLossQtgt_sigm: 0.1494
Episode: 998 meanReward: 9.2188 meanLoss: 0.4375 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2014 meanLossQtgt: 0.0484 meanLossQtgt_sigm: 0.1876
Episode: 999 meanReward: 9.2188 meanLoss: 0.3931 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1816 meanLossQtgt: 0.0404 meanLossQtgt_sigm: 0.1711
Episode: 1000 meanReward: 9.1875 meanLoss: 0.3200 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1140 meanLossQtgt: 0.0752 meanLossQtgt_sigm: 0.1307
Episode: 1001 meanReward: 9.2188 meanLoss: 0.3257 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1230 meanLossQtgt: 0.0644 meanLossQtgt_sigm: 0.1383
Episode: 1002 meanReward: 9.2500 meanLoss: 0.3226 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1469 meanLossQtgt: 0.0248 meanLossQtgt_sigm: 0.1

Episode: 1053 meanReward: 9.3125 meanLoss: 0.3183 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1457 meanLossQtgt: 0.0252 meanLossQtgt_sigm: 0.1474
Episode: 1054 meanReward: 9.2812 meanLoss: 0.3110 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1252 meanLossQtgt: 0.0496 meanLossQtgt_sigm: 0.1362
Episode: 1055 meanReward: 9.2812 meanLoss: 0.3248 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1228 meanLossQtgt: 0.0631 meanLossQtgt_sigm: 0.1389
Episode: 1056 meanReward: 9.3125 meanLoss: 0.4090 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1836 meanLossQtgt: 0.0480 meanLossQtgt_sigm: 0.1773
Episode: 1057 meanReward: 9.3125 meanLoss: 0.5038 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2221 meanLossQtgt: 0.0772 meanLossQtgt_sigm: 0.2046
Episode: 1058 meanReward: 9.3438 meanLoss: 0.4329 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1811 meanLossQtgt: 0.0807 meanLossQtgt_sigm: 0.1711
Episode: 1059 meanReward: 9.3125 meanLoss: 0.3852 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0905 meanLossQtgt: 0.1740 meanLossQtgt_sigm:

Episode: 1111 meanReward: 9.5000 meanLoss: 0.5402 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2055 meanLossQtgt: 0.1469 meanLossQtgt_sigm: 0.1878
Episode: 1112 meanReward: 9.4688 meanLoss: 0.6292 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2247 meanLossQtgt: 0.2052 meanLossQtgt_sigm: 0.1993
Episode: 1113 meanReward: 9.4375 meanLoss: 0.5581 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1544 meanLossQtgt: 0.2438 meanLossQtgt_sigm: 0.1598
Episode: 1114 meanReward: 9.4062 meanLoss: 0.5339 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1024 meanLossQtgt: 0.2968 meanLossQtgt_sigm: 0.1348
Episode: 1115 meanReward: 9.3750 meanLoss: 0.4605 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0860 meanLossQtgt: 0.2479 meanLossQtgt_sigm: 0.1267
Episode: 1116 meanReward: 9.4062 meanLoss: 0.4038 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1015 meanLossQtgt: 0.1718 meanLossQtgt_sigm: 0.1306
Episode: 1117 meanReward: 9.3750 meanLoss: 0.4072 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1368 meanLossQtgt: 0.1233 meanLossQtgt_sigm:

Episode: 1168 meanReward: 9.4375 meanLoss: 0.3249 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1235 meanLossQtgt: 0.0681 meanLossQtgt_sigm: 0.1333
Episode: 1169 meanReward: 9.4375 meanLoss: 0.3564 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1548 meanLossQtgt: 0.0490 meanLossQtgt_sigm: 0.1525
Episode: 1170 meanReward: 9.4375 meanLoss: 0.3750 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1708 meanLossQtgt: 0.0430 meanLossQtgt_sigm: 0.1612
Episode: 1171 meanReward: 9.4375 meanLoss: 0.3236 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1329 meanLossQtgt: 0.0536 meanLossQtgt_sigm: 0.1371
Episode: 1172 meanReward: 9.4375 meanLoss: 0.3139 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0994 meanLossQtgt: 0.0939 meanLossQtgt_sigm: 0.1205
Episode: 1173 meanReward: 9.4688 meanLoss: 0.2990 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1150 meanLossQtgt: 0.0556 meanLossQtgt_sigm: 0.1283
Episode: 1174 meanReward: 9.4688 meanLoss: 0.3218 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1445 meanLossQtgt: 0.0325 meanLossQtgt_sigm:

Episode: 1226 meanReward: 9.1562 meanLoss: 0.3560 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1113 meanLossQtgt: 0.1144 meanLossQtgt_sigm: 0.1303
Episode: 1227 meanReward: 9.1562 meanLoss: 0.3589 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0919 meanLossQtgt: 0.1453 meanLossQtgt_sigm: 0.1216
Episode: 1228 meanReward: 9.1562 meanLoss: 0.3178 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1222 meanLossQtgt: 0.0639 meanLossQtgt_sigm: 0.1316
Episode: 1229 meanReward: 9.1562 meanLoss: 0.3607 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1623 meanLossQtgt: 0.0438 meanLossQtgt_sigm: 0.1545
Episode: 1230 meanReward: 9.2188 meanLoss: 0.3710 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1710 meanLossQtgt: 0.0401 meanLossQtgt_sigm: 0.1599
Episode: 1231 meanReward: 9.2500 meanLoss: 0.3024 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1300 meanLossQtgt: 0.0401 meanLossQtgt_sigm: 0.1323
Episode: 1232 meanReward: 9.2500 meanLoss: 0.2954 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1011 meanLossQtgt: 0.0761 meanLossQtgt_sigm:

Episode: 1283 meanReward: 9.3438 meanLoss: 0.3081 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0986 meanLossQtgt: 0.0889 meanLossQtgt_sigm: 0.1206
Episode: 1284 meanReward: 9.3125 meanLoss: 0.3202 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1097 meanLossQtgt: 0.0801 meanLossQtgt_sigm: 0.1304
Episode: 1285 meanReward: 9.3125 meanLoss: 0.3523 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1646 meanLossQtgt: 0.0270 meanLossQtgt_sigm: 0.1607
Episode: 1286 meanReward: 9.3125 meanLoss: 0.3803 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1799 meanLossQtgt: 0.0301 meanLossQtgt_sigm: 0.1703
Episode: 1287 meanReward: 9.3125 meanLoss: 0.3174 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1404 meanLossQtgt: 0.0359 meanLossQtgt_sigm: 0.1411
Episode: 1288 meanReward: 9.3125 meanLoss: 0.2951 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0972 meanLossQtgt: 0.0794 meanLossQtgt_sigm: 0.1186
Episode: 1289 meanReward: 9.3438 meanLoss: 0.2971 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1163 meanLossQtgt: 0.0519 meanLossQtgt_sigm:

Episode: 1340 meanReward: 9.4375 meanLoss: 0.3063 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1265 meanLossQtgt: 0.0439 meanLossQtgt_sigm: 0.1359
Episode: 1341 meanReward: 9.4375 meanLoss: 0.2938 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1240 meanLossQtgt: 0.0363 meanLossQtgt_sigm: 0.1335
Episode: 1342 meanReward: 9.4375 meanLoss: 0.3036 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1419 meanLossQtgt: 0.0184 meanLossQtgt_sigm: 0.1433
Episode: 1343 meanReward: 9.4062 meanLoss: 0.3301 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1603 meanLossQtgt: 0.0158 meanLossQtgt_sigm: 0.1539
Episode: 1344 meanReward: 9.4375 meanLoss: 0.2920 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1412 meanLossQtgt: 0.0102 meanLossQtgt_sigm: 0.1406
Episode: 1345 meanReward: 9.5000 meanLoss: 0.2832 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1338 meanLossQtgt: 0.0144 meanLossQtgt_sigm: 0.1350
Episode: 1346 meanReward: 9.4375 meanLoss: 0.2983 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1330 meanLossQtgt: 0.0304 meanLossQtgt_sigm:

Episode: 1397 meanReward: 9.4062 meanLoss: 0.3949 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0996 meanLossQtgt: 0.1671 meanLossQtgt_sigm: 0.1282
Episode: 1398 meanReward: 9.4062 meanLoss: 0.3922 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0873 meanLossQtgt: 0.1811 meanLossQtgt_sigm: 0.1238
Episode: 1399 meanReward: 9.4062 meanLoss: 0.3412 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1226 meanLossQtgt: 0.0827 meanLossQtgt_sigm: 0.1359
Episode: 1400 meanReward: 9.4375 meanLoss: 0.3321 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1496 meanLossQtgt: 0.0351 meanLossQtgt_sigm: 0.1474
Episode: 1401 meanReward: 9.4062 meanLoss: 0.3148 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1457 meanLossQtgt: 0.0251 meanLossQtgt_sigm: 0.1440
Episode: 1402 meanReward: 9.4375 meanLoss: 0.2818 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1209 meanLossQtgt: 0.0329 meanLossQtgt_sigm: 0.1280
Episode: 1403 meanReward: 9.4375 meanLoss: 0.2653 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1130 meanLossQtgt: 0.0311 meanLossQtgt_sigm:

Episode: 1454 meanReward: 9.3438 meanLoss: 0.3119 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1505 meanLossQtgt: 0.0134 meanLossQtgt_sigm: 0.1480
Episode: 1455 meanReward: 9.3750 meanLoss: 0.3234 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1567 meanLossQtgt: 0.0149 meanLossQtgt_sigm: 0.1518
Episode: 1456 meanReward: 9.3438 meanLoss: 0.2838 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1261 meanLossQtgt: 0.0246 meanLossQtgt_sigm: 0.1332
Episode: 1457 meanReward: 9.3438 meanLoss: 0.2893 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1304 meanLossQtgt: 0.0223 meanLossQtgt_sigm: 0.1366
Episode: 1458 meanReward: 9.3438 meanLoss: 0.3255 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1573 meanLossQtgt: 0.0129 meanLossQtgt_sigm: 0.1553
Episode: 1459 meanReward: 9.3438 meanLoss: 0.3310 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1601 meanLossQtgt: 0.0149 meanLossQtgt_sigm: 0.1560
Episode: 1460 meanReward: 9.3125 meanLoss: 0.2839 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1193 meanLossQtgt: 0.0344 meanLossQtgt_sigm:

Episode: 1511 meanReward: 9.3438 meanLoss: 0.2987 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0893 meanLossQtgt: 0.0961 meanLossQtgt_sigm: 0.1133
Episode: 1512 meanReward: 9.3125 meanLoss: 0.2903 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1084 meanLossQtgt: 0.0589 meanLossQtgt_sigm: 0.1230
Episode: 1513 meanReward: 9.2188 meanLoss: 0.3590 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1619 meanLossQtgt: 0.0381 meanLossQtgt_sigm: 0.1590
Episode: 1514 meanReward: 9.2812 meanLoss: 0.4502 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2090 meanLossQtgt: 0.0470 meanLossQtgt_sigm: 0.1941
Episode: 1515 meanReward: 9.2812 meanLoss: 0.3658 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1564 meanLossQtgt: 0.0543 meanLossQtgt_sigm: 0.1550
Episode: 1516 meanReward: 9.2500 meanLoss: 0.3287 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1025 meanLossQtgt: 0.1026 meanLossQtgt_sigm: 0.1237
Episode: 1517 meanReward: 9.2500 meanLoss: 0.3114 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1033 meanLossQtgt: 0.0856 meanLossQtgt_sigm:

Episode: 1569 meanReward: 9.0938 meanLoss: 0.3226 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1501 meanLossQtgt: 0.0219 meanLossQtgt_sigm: 0.1506
Episode: 1570 meanReward: 9.0312 meanLoss: 0.3370 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1594 meanLossQtgt: 0.0199 meanLossQtgt_sigm: 0.1577
Episode: 1571 meanReward: 9.0625 meanLoss: 0.3108 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1446 meanLossQtgt: 0.0185 meanLossQtgt_sigm: 0.1477
Episode: 1572 meanReward: 9.1250 meanLoss: 0.2986 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1376 meanLossQtgt: 0.0188 meanLossQtgt_sigm: 0.1422
Episode: 1573 meanReward: 9.0938 meanLoss: 0.3355 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1570 meanLossQtgt: 0.0256 meanLossQtgt_sigm: 0.1529
Episode: 1574 meanReward: 9.0938 meanLoss: 0.3202 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1499 meanLossQtgt: 0.0233 meanLossQtgt_sigm: 0.1470
Episode: 1575 meanReward: 9.0938 meanLoss: 0.3134 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1382 meanLossQtgt: 0.0345 meanLossQtgt_sigm:

Episode: 1626 meanReward: 9.2188 meanLoss: 0.3413 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1099 meanLossQtgt: 0.0987 meanLossQtgt_sigm: 0.1327
Episode: 1627 meanReward: 9.2500 meanLoss: 0.3654 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1598 meanLossQtgt: 0.0459 meanLossQtgt_sigm: 0.1597
Episode: 1628 meanReward: 9.2812 meanLoss: 0.3783 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1779 meanLossQtgt: 0.0329 meanLossQtgt_sigm: 0.1676
Episode: 1629 meanReward: 9.2500 meanLoss: 0.3102 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1387 meanLossQtgt: 0.0318 meanLossQtgt_sigm: 0.1397
Episode: 1630 meanReward: 9.2500 meanLoss: 0.2904 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1165 meanLossQtgt: 0.0456 meanLossQtgt_sigm: 0.1282
Episode: 1631 meanReward: 9.2812 meanLoss: 0.2961 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1330 meanLossQtgt: 0.0263 meanLossQtgt_sigm: 0.1368
Episode: 1632 meanReward: 9.2812 meanLoss: 0.3220 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1540 meanLossQtgt: 0.0179 meanLossQtgt_sigm:

Episode: 1684 meanReward: 9.2500 meanLoss: 0.3614 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1734 meanLossQtgt: 0.0238 meanLossQtgt_sigm: 0.1642
Episode: 1685 meanReward: 9.2812 meanLoss: 0.3233 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1532 meanLossQtgt: 0.0203 meanLossQtgt_sigm: 0.1497
Episode: 1686 meanReward: 9.2500 meanLoss: 0.2786 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1154 meanLossQtgt: 0.0367 meanLossQtgt_sigm: 0.1265
Episode: 1687 meanReward: 9.2500 meanLoss: 0.2776 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1222 meanLossQtgt: 0.0253 meanLossQtgt_sigm: 0.1301
Episode: 1688 meanReward: 9.2500 meanLoss: 0.3196 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1548 meanLossQtgt: 0.0147 meanLossQtgt_sigm: 0.1500
Episode: 1689 meanReward: 9.2500 meanLoss: 0.3409 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1657 meanLossQtgt: 0.0171 meanLossQtgt_sigm: 0.1581
Episode: 1690 meanReward: 9.2188 meanLoss: 0.3046 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1393 meanLossQtgt: 0.0257 meanLossQtgt_sigm:

Episode: 1742 meanReward: 9.4688 meanLoss: 0.2928 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1355 meanLossQtgt: 0.0174 meanLossQtgt_sigm: 0.1400
Episode: 1743 meanReward: 9.4688 meanLoss: 0.3269 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1601 meanLossQtgt: 0.0110 meanLossQtgt_sigm: 0.1558
Episode: 1744 meanReward: 9.4375 meanLoss: 0.3315 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1626 meanLossQtgt: 0.0108 meanLossQtgt_sigm: 0.1581
Episode: 1745 meanReward: 9.4375 meanLoss: 0.3104 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1473 meanLossQtgt: 0.0142 meanLossQtgt_sigm: 0.1489
Episode: 1746 meanReward: 9.4062 meanLoss: 0.3044 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1426 meanLossQtgt: 0.0153 meanLossQtgt_sigm: 0.1465
Episode: 1747 meanReward: 9.4062 meanLoss: 0.3213 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1549 meanLossQtgt: 0.0129 meanLossQtgt_sigm: 0.1535
Episode: 1748 meanReward: 9.3438 meanLoss: 0.3169 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1529 meanLossQtgt: 0.0133 meanLossQtgt_sigm:

Episode: 1799 meanReward: 9.3438 meanLoss: 0.2859 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1335 meanLossQtgt: 0.0176 meanLossQtgt_sigm: 0.1347
Episode: 1800 meanReward: 9.3125 meanLoss: 0.2913 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1378 meanLossQtgt: 0.0171 meanLossQtgt_sigm: 0.1365
Episode: 1801 meanReward: 9.2812 meanLoss: 0.2730 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1217 meanLossQtgt: 0.0240 meanLossQtgt_sigm: 0.1273
Episode: 1802 meanReward: 9.3125 meanLoss: 0.3007 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1409 meanLossQtgt: 0.0182 meanLossQtgt_sigm: 0.1416
Episode: 1803 meanReward: 9.2500 meanLoss: 0.2942 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1404 meanLossQtgt: 0.0120 meanLossQtgt_sigm: 0.1418
Episode: 1804 meanReward: 9.3125 meanLoss: 0.3015 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1426 meanLossQtgt: 0.0133 meanLossQtgt_sigm: 0.1455
Episode: 1805 meanReward: 9.3750 meanLoss: 0.3464 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1661 meanLossQtgt: 0.0195 meanLossQtgt_sigm:

Episode: 1856 meanReward: 9.4062 meanLoss: 0.3948 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1869 meanLossQtgt: 0.0327 meanLossQtgt_sigm: 0.1753
Episode: 1857 meanReward: 9.3438 meanLoss: 0.3609 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1684 meanLossQtgt: 0.0331 meanLossQtgt_sigm: 0.1594
Episode: 1858 meanReward: 9.3750 meanLoss: 0.3156 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0988 meanLossQtgt: 0.0964 meanLossQtgt_sigm: 0.1204
Episode: 1859 meanReward: 9.3438 meanLoss: 0.3186 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1250 meanLossQtgt: 0.0590 meanLossQtgt_sigm: 0.1346
Episode: 1860 meanReward: 9.3125 meanLoss: 0.3493 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1611 meanLossQtgt: 0.0313 meanLossQtgt_sigm: 0.1569
Episode: 1861 meanReward: 9.3438 meanLoss: 0.3374 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1578 meanLossQtgt: 0.0272 meanLossQtgt_sigm: 0.1524
Episode: 1862 meanReward: 9.3125 meanLoss: 0.2916 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1087 meanLossQtgt: 0.0586 meanLossQtgt_sigm:

Episode: 1914 meanReward: 9.4062 meanLoss: 0.5891 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0547 meanLossQtgt: 0.4236 meanLossQtgt_sigm: 0.1108
Episode: 1915 meanReward: 9.3750 meanLoss: 0.6052 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0521 meanLossQtgt: 0.4379 meanLossQtgt_sigm: 0.1152
Episode: 1916 meanReward: 9.4062 meanLoss: 0.4008 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1243 meanLossQtgt: 0.1394 meanLossQtgt_sigm: 0.1371
Episode: 1917 meanReward: 9.4062 meanLoss: 0.4642 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1942 meanLossQtgt: 0.0924 meanLossQtgt_sigm: 0.1776
Episode: 1918 meanReward: 9.4062 meanLoss: 0.4422 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1792 meanLossQtgt: 0.0972 meanLossQtgt_sigm: 0.1658
Episode: 1919 meanReward: 9.4062 meanLoss: 0.3581 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1073 meanLossQtgt: 0.1254 meanLossQtgt_sigm: 0.1254
Episode: 1920 meanReward: 9.4062 meanLoss: 0.4391 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0645 meanLossQtgt: 0.2623 meanLossQtgt_sigm:

Episode: 1971 meanReward: 9.1562 meanLoss: 0.3203 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1547 meanLossQtgt: 0.0143 meanLossQtgt_sigm: 0.1513
Episode: 1972 meanReward: 9.1875 meanLoss: 0.3058 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1389 meanLossQtgt: 0.0224 meanLossQtgt_sigm: 0.1445
Episode: 1973 meanReward: 9.1562 meanLoss: 0.3065 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1367 meanLossQtgt: 0.0252 meanLossQtgt_sigm: 0.1446
Episode: 1974 meanReward: 9.2188 meanLoss: 0.3274 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1556 meanLossQtgt: 0.0147 meanLossQtgt_sigm: 0.1571
Episode: 1975 meanReward: 9.1875 meanLoss: 0.3527 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1689 meanLossQtgt: 0.0184 meanLossQtgt_sigm: 0.1654
Episode: 1976 meanReward: 9.2188 meanLoss: 0.3197 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1420 meanLossQtgt: 0.0308 meanLossQtgt_sigm: 0.1468
Episode: 1977 meanReward: 9.2500 meanLoss: 0.3461 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1591 meanLossQtgt: 0.0312 meanLossQtgt_sigm:

Episode: 2028 meanReward: 9.5938 meanLoss: 0.2762 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1326 meanLossQtgt: 0.0089 meanLossQtgt_sigm: 0.1346
Episode: 2029 meanReward: 9.5625 meanLoss: 0.2788 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1339 meanLossQtgt: 0.0087 meanLossQtgt_sigm: 0.1362
Episode: 2030 meanReward: 9.5312 meanLoss: 0.2858 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1392 meanLossQtgt: 0.0058 meanLossQtgt_sigm: 0.1408
Episode: 2031 meanReward: 9.5625 meanLoss: 0.3083 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1531 meanLossQtgt: 0.0044 meanLossQtgt_sigm: 0.1508
Episode: 2032 meanReward: 9.5938 meanLoss: 0.3067 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1497 meanLossQtgt: 0.0089 meanLossQtgt_sigm: 0.1481
Episode: 2033 meanReward: 9.5312 meanLoss: 0.2834 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1347 meanLossQtgt: 0.0113 meanLossQtgt_sigm: 0.1374
Episode: 2034 meanReward: 9.5000 meanLoss: 0.2930 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1433 meanLossQtgt: 0.0069 meanLossQtgt_sigm:

Episode: 2085 meanReward: 9.3750 meanLoss: 0.3788 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1485 meanLossQtgt: 0.0789 meanLossQtgt_sigm: 0.1514
Episode: 2086 meanReward: 9.3750 meanLoss: 0.4830 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2109 meanLossQtgt: 0.0823 meanLossQtgt_sigm: 0.1898
Episode: 2087 meanReward: 9.4062 meanLoss: 0.3991 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1547 meanLossQtgt: 0.0925 meanLossQtgt_sigm: 0.1518
Episode: 2088 meanReward: 9.3438 meanLoss: 0.4518 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0702 meanLossQtgt: 0.2652 meanLossQtgt_sigm: 0.1163
Episode: 2089 meanReward: 9.3438 meanLoss: 0.3987 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.0882 meanLossQtgt: 0.1871 meanLossQtgt_sigm: 0.1235
Episode: 2090 meanReward: 9.3125 meanLoss: 0.4613 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1754 meanLossQtgt: 0.1200 meanLossQtgt_sigm: 0.1659
Episode: 2091 meanReward: 9.2812 meanLoss: 0.6201 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.2389 meanLossQtgt: 0.1727 meanLossQtgt_sigm:

Episode: 2143 meanReward: 9.2812 meanLoss: 0.3184 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1524 meanLossQtgt: 0.0152 meanLossQtgt_sigm: 0.1508
Episode: 2144 meanReward: 9.2812 meanLoss: 0.3325 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1625 meanLossQtgt: 0.0141 meanLossQtgt_sigm: 0.1558
Episode: 2145 meanReward: 9.2812 meanLoss: 0.2930 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1332 meanLossQtgt: 0.0238 meanLossQtgt_sigm: 0.1360
Episode: 2146 meanReward: 9.2812 meanLoss: 0.2929 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1288 meanLossQtgt: 0.0291 meanLossQtgt_sigm: 0.1351
Episode: 2147 meanReward: 9.3125 meanLoss: 0.2779 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1273 meanLossQtgt: 0.0191 meanLossQtgt_sigm: 0.1315
Episode: 2148 meanReward: 9.3438 meanLoss: 0.2742 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1290 meanLossQtgt: 0.0141 meanLossQtgt_sigm: 0.1310
Episode: 2149 meanReward: 9.2812 meanLoss: 0.2881 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1382 meanLossQtgt: 0.0128 meanLossQtgt_sigm:

Episode: 2200 meanReward: 9.2500 meanLoss: 0.2899 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1225 meanLossQtgt: 0.0361 meanLossQtgt_sigm: 0.1313
Episode: 2201 meanReward: 9.2188 meanLoss: 0.2918 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1350 meanLossQtgt: 0.0191 meanLossQtgt_sigm: 0.1377
Episode: 2202 meanReward: 9.2188 meanLoss: 0.3295 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1602 meanLossQtgt: 0.0159 meanLossQtgt_sigm: 0.1534
Episode: 2203 meanReward: 9.2500 meanLoss: 0.2916 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1367 meanLossQtgt: 0.0192 meanLossQtgt_sigm: 0.1357
Episode: 2204 meanReward: 9.2500 meanLoss: 0.2669 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1126 meanLossQtgt: 0.0323 meanLossQtgt_sigm: 0.1220
Episode: 2205 meanReward: 9.2812 meanLoss: 0.2700 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1125 meanLossQtgt: 0.0347 meanLossQtgt_sigm: 0.1228
Episode: 2206 meanReward: 9.3438 meanLoss: 0.3352 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1598 meanLossQtgt: 0.0230 meanLossQtgt_sigm:

Episode: 2257 meanReward: 9.5000 meanLoss: 0.3177 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1495 meanLossQtgt: 0.0231 meanLossQtgt_sigm: 0.1452
Episode: 2258 meanReward: 9.5312 meanLoss: 0.2784 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1094 meanLossQtgt: 0.0478 meanLossQtgt_sigm: 0.1211
Episode: 2259 meanReward: 9.4688 meanLoss: 0.2828 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1187 meanLossQtgt: 0.0379 meanLossQtgt_sigm: 0.1262
Episode: 2260 meanReward: 9.4375 meanLoss: 0.3096 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1356 meanLossQtgt: 0.0340 meanLossQtgt_sigm: 0.1400
Episode: 2261 meanReward: 9.4375 meanLoss: 0.3386 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1577 meanLossQtgt: 0.0258 meanLossQtgt_sigm: 0.1551
Episode: 2262 meanReward: 9.4375 meanLoss: 0.3358 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1513 meanLossQtgt: 0.0341 meanLossQtgt_sigm: 0.1504
Episode: 2263 meanReward: 9.4062 meanLoss: 0.3015 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1285 meanLossQtgt: 0.0396 meanLossQtgt_sigm:

Episode: 2314 meanReward: 9.3750 meanLoss: 0.3136 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1304 meanLossQtgt: 0.0438 meanLossQtgt_sigm: 0.1394
Episode: 2315 meanReward: 9.3438 meanLoss: 0.3034 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1081 meanLossQtgt: 0.0676 meanLossQtgt_sigm: 0.1277
Episode: 2316 meanReward: 9.3750 meanLoss: 0.3157 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1492 meanLossQtgt: 0.0180 meanLossQtgt_sigm: 0.1484
Episode: 2317 meanReward: 9.3438 meanLoss: 0.3596 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1747 meanLossQtgt: 0.0202 meanLossQtgt_sigm: 0.1647
Episode: 2318 meanReward: 9.3750 meanLoss: 0.3129 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1480 meanLossQtgt: 0.0192 meanLossQtgt_sigm: 0.1457
Episode: 2319 meanReward: 9.3750 meanLoss: 0.2916 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1212 meanLossQtgt: 0.0399 meanLossQtgt_sigm: 0.1305
Episode: 2320 meanReward: 9.3750 meanLoss: 0.2987 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1238 meanLossQtgt: 0.0413 meanLossQtgt_sigm:

Episode: 2372 meanReward: 9.2188 meanLoss: 0.3209 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1541 meanLossQtgt: 0.0180 meanLossQtgt_sigm: 0.1489
Episode: 2373 meanReward: 9.2188 meanLoss: 0.2837 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1216 meanLossQtgt: 0.0337 meanLossQtgt_sigm: 0.1284
Episode: 2374 meanReward: 9.2188 meanLoss: 0.2870 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1029 meanLossQtgt: 0.0629 meanLossQtgt_sigm: 0.1212
Episode: 2375 meanReward: 9.2188 meanLoss: 0.3034 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1377 meanLossQtgt: 0.0231 meanLossQtgt_sigm: 0.1426
Episode: 2376 meanReward: 9.2188 meanLoss: 0.3631 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1766 meanLossQtgt: 0.0176 meanLossQtgt_sigm: 0.1688
Episode: 2377 meanReward: 9.2500 meanLoss: 0.3182 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1534 meanLossQtgt: 0.0144 meanLossQtgt_sigm: 0.1503
Episode: 2378 meanReward: 9.2812 meanLoss: 0.2687 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1118 meanLossQtgt: 0.0336 meanLossQtgt_sigm:

Episode: 2429 meanReward: 9.2812 meanLoss: 0.2914 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1148 meanLossQtgt: 0.0477 meanLossQtgt_sigm: 0.1289
Episode: 2430 meanReward: 9.2188 meanLoss: 0.3020 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1358 meanLossQtgt: 0.0246 meanLossQtgt_sigm: 0.1416
Episode: 2431 meanReward: 9.2188 meanLoss: 0.3426 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1650 meanLossQtgt: 0.0166 meanLossQtgt_sigm: 0.1609
Episode: 2432 meanReward: 9.2500 meanLoss: 0.3520 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1704 meanLossQtgt: 0.0188 meanLossQtgt_sigm: 0.1628
Episode: 2433 meanReward: 9.2812 meanLoss: 0.2867 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1265 meanLossQtgt: 0.0276 meanLossQtgt_sigm: 0.1326
Episode: 2434 meanReward: 9.2188 meanLoss: 0.2785 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1257 meanLossQtgt: 0.0220 meanLossQtgt_sigm: 0.1308
Episode: 2435 meanReward: 9.2812 meanLoss: 0.2833 meanLossQlbl: 0.0000 meanLossQlbl_sigm: 0.1336 meanLossQtgt: 0.0133 meanLossQtgt_sigm:

Episode: 2486 meanReward: 13.5000 meanLoss: 22.5426 meanLossQlbl: 14.0024 meanLossQlbl_sigm: 0.6900 meanLossQtgt: 7.1557 meanLossQtgt_sigm: 0.6944
Episode: 2487 meanReward: 14.0000 meanLoss: 26.3023 meanLossQlbl: 16.1681 meanLossQlbl_sigm: 0.6930 meanLossQtgt: 8.7481 meanLossQtgt_sigm: 0.6932
Episode: 2488 meanReward: 14.1875 meanLoss: 26.7636 meanLossQlbl: 16.6558 meanLossQlbl_sigm: 0.6930 meanLossQtgt: 8.7214 meanLossQtgt_sigm: 0.6935
Episode: 2489 meanReward: 14.3438 meanLoss: 27.2647 meanLossQlbl: 17.0801 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7983 meanLossQtgt_sigm: 0.6931
Episode: 2490 meanReward: 14.6875 meanLoss: 28.6835 meanLossQlbl: 18.0126 meanLossQlbl_sigm: 0.6926 meanLossQtgt: 9.2853 meanLossQtgt_sigm: 0.6931
Episode: 2491 meanReward: 15.0625 meanLoss: 31.5480 meanLossQlbl: 19.4789 meanLossQlbl_sigm: 0.6930 meanLossQtgt: 10.6827 meanLossQtgt_sigm: 0.6934
Episode: 2492 meanReward: 15.3750 meanLoss: 31.1560 meanLossQlbl: 19.2320 meanLossQlbl_sigm: 0.6930 meanLossQtgt: 10.

Episode: 2542 meanReward: 18.0000 meanLoss: 28.2067 meanLossQlbl: 17.7434 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0770 meanLossQtgt_sigm: 0.6932
Episode: 2543 meanReward: 17.8750 meanLoss: 28.4214 meanLossQlbl: 17.8917 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.1433 meanLossQtgt_sigm: 0.6932
Episode: 2544 meanReward: 18.0938 meanLoss: 27.6541 meanLossQlbl: 17.1933 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0745 meanLossQtgt_sigm: 0.6932
Episode: 2545 meanReward: 18.0938 meanLoss: 26.0305 meanLossQlbl: 16.2582 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3861 meanLossQtgt_sigm: 0.6932
Episode: 2546 meanReward: 18.0938 meanLoss: 27.6979 meanLossQlbl: 17.2556 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0560 meanLossQtgt_sigm: 0.6932
Episode: 2547 meanReward: 17.9375 meanLoss: 27.5451 meanLossQlbl: 17.1428 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0160 meanLossQtgt_sigm: 0.6932
Episode: 2548 meanReward: 18.1250 meanLoss: 28.5341 meanLossQlbl: 17.6713 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.47

Episode: 2598 meanReward: 18.5000 meanLoss: 25.9688 meanLossQlbl: 16.3652 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2172 meanLossQtgt_sigm: 0.6932
Episode: 2599 meanReward: 18.1562 meanLoss: 26.1932 meanLossQlbl: 16.4902 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3167 meanLossQtgt_sigm: 0.6931
Episode: 2600 meanReward: 18.2188 meanLoss: 25.3373 meanLossQlbl: 15.9879 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9630 meanLossQtgt_sigm: 0.6931
Episode: 2601 meanReward: 18.1250 meanLoss: 26.0912 meanLossQlbl: 16.4899 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2150 meanLossQtgt_sigm: 0.6932
Episode: 2602 meanReward: 17.8438 meanLoss: 23.0928 meanLossQlbl: 14.7519 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 6.9547 meanLossQtgt_sigm: 0.6932
Episode: 2603 meanReward: 18.1562 meanLoss: 27.0169 meanLossQlbl: 16.7770 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8536 meanLossQtgt_sigm: 0.6931
Episode: 2604 meanReward: 18.1250 meanLoss: 25.9587 meanLossQlbl: 16.1978 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.37

Episode: 2654 meanReward: 16.9688 meanLoss: 28.1542 meanLossQlbl: 17.7044 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0635 meanLossQtgt_sigm: 0.6931
Episode: 2655 meanReward: 17.1250 meanLoss: 26.8527 meanLossQlbl: 16.9176 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5488 meanLossQtgt_sigm: 0.6931
Episode: 2656 meanReward: 17.0000 meanLoss: 26.7534 meanLossQlbl: 16.9020 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4651 meanLossQtgt_sigm: 0.6931
Episode: 2657 meanReward: 17.0625 meanLoss: 27.4953 meanLossQlbl: 17.3262 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7829 meanLossQtgt_sigm: 0.6931
Episode: 2658 meanReward: 16.9688 meanLoss: 25.6446 meanLossQlbl: 16.1636 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0947 meanLossQtgt_sigm: 0.6931
Episode: 2659 meanReward: 17.0938 meanLoss: 24.0975 meanLossQlbl: 15.0816 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.6296 meanLossQtgt_sigm: 0.6931
Episode: 2660 meanReward: 17.1875 meanLoss: 27.2085 meanLossQlbl: 16.8259 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.99

Episode: 2710 meanReward: 18.8438 meanLoss: 25.1842 meanLossQlbl: 15.7428 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0551 meanLossQtgt_sigm: 0.6931
Episode: 2711 meanReward: 18.8438 meanLoss: 26.8631 meanLossQlbl: 16.8970 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5799 meanLossQtgt_sigm: 0.6931
Episode: 2712 meanReward: 18.9062 meanLoss: 27.0788 meanLossQlbl: 17.0625 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6300 meanLossQtgt_sigm: 0.6932
Episode: 2713 meanReward: 18.9062 meanLoss: 23.9789 meanLossQlbl: 15.1320 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4606 meanLossQtgt_sigm: 0.6931
Episode: 2714 meanReward: 18.8438 meanLoss: 25.6058 meanLossQlbl: 15.9635 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2560 meanLossQtgt_sigm: 0.6931
Episode: 2715 meanReward: 18.7812 meanLoss: 26.4311 meanLossQlbl: 16.5196 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5251 meanLossQtgt_sigm: 0.6931
Episode: 2716 meanReward: 18.5312 meanLoss: 26.0996 meanLossQlbl: 16.4121 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.30

Episode: 2766 meanReward: 17.9062 meanLoss: 25.5196 meanLossQlbl: 15.9766 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1567 meanLossQtgt_sigm: 0.6931
Episode: 2767 meanReward: 18.0000 meanLoss: 26.5259 meanLossQlbl: 16.4402 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6994 meanLossQtgt_sigm: 0.6931
Episode: 2768 meanReward: 17.7812 meanLoss: 25.9105 meanLossQlbl: 16.0993 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4249 meanLossQtgt_sigm: 0.6931
Episode: 2769 meanReward: 17.7812 meanLoss: 27.2905 meanLossQlbl: 17.0294 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8748 meanLossQtgt_sigm: 0.6931
Episode: 2770 meanReward: 17.8438 meanLoss: 26.3798 meanLossQlbl: 16.4892 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5043 meanLossQtgt_sigm: 0.6931
Episode: 2771 meanReward: 17.7188 meanLoss: 23.1699 meanLossQlbl: 14.6828 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.1008 meanLossQtgt_sigm: 0.6931
Episode: 2772 meanReward: 17.4375 meanLoss: 23.3343 meanLossQlbl: 14.7945 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.15

Episode: 2822 meanReward: 17.7500 meanLoss: 26.0375 meanLossQlbl: 16.4322 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2190 meanLossQtgt_sigm: 0.6931
Episode: 2823 meanReward: 17.6562 meanLoss: 23.8908 meanLossQlbl: 15.2191 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.2855 meanLossQtgt_sigm: 0.6931
Episode: 2824 meanReward: 17.8125 meanLoss: 27.2539 meanLossQlbl: 16.9336 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9340 meanLossQtgt_sigm: 0.6931
Episode: 2825 meanReward: 17.7500 meanLoss: 23.3867 meanLossQlbl: 14.5371 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4633 meanLossQtgt_sigm: 0.6931
Episode: 2826 meanReward: 17.6562 meanLoss: 18.6192 meanLossQlbl: 12.0107 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 5.2222 meanLossQtgt_sigm: 0.6931
Episode: 2827 meanReward: 17.6875 meanLoss: 25.9791 meanLossQlbl: 16.1447 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4482 meanLossQtgt_sigm: 0.6931
Episode: 2828 meanReward: 17.7500 meanLoss: 27.7006 meanLossQlbl: 17.1114 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.20

Episode: 2878 meanReward: 18.5625 meanLoss: 26.2731 meanLossQlbl: 16.2888 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5979 meanLossQtgt_sigm: 0.6931
Episode: 2879 meanReward: 18.5938 meanLoss: 26.8678 meanLossQlbl: 16.6907 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7908 meanLossQtgt_sigm: 0.6931
Episode: 2880 meanReward: 18.8750 meanLoss: 26.7503 meanLossQlbl: 16.5359 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8282 meanLossQtgt_sigm: 0.6931
Episode: 2881 meanReward: 18.7500 meanLoss: 26.9228 meanLossQlbl: 16.7361 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8005 meanLossQtgt_sigm: 0.6931
Episode: 2882 meanReward: 18.8750 meanLoss: 26.1096 meanLossQlbl: 16.3370 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3863 meanLossQtgt_sigm: 0.6931
Episode: 2883 meanReward: 18.8125 meanLoss: 25.3411 meanLossQlbl: 15.6504 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3044 meanLossQtgt_sigm: 0.6931
Episode: 2884 meanReward: 18.7188 meanLoss: 23.4045 meanLossQlbl: 14.5973 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.42

Episode: 2934 meanReward: 18.2500 meanLoss: 23.6972 meanLossQlbl: 14.8532 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4577 meanLossQtgt_sigm: 0.6931
Episode: 2935 meanReward: 18.2500 meanLoss: 24.2585 meanLossQlbl: 15.3031 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.5691 meanLossQtgt_sigm: 0.6931
Episode: 2936 meanReward: 18.2188 meanLoss: 27.6014 meanLossQlbl: 17.2364 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9787 meanLossQtgt_sigm: 0.6931
Episode: 2937 meanReward: 18.2500 meanLoss: 26.9273 meanLossQlbl: 16.9745 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5665 meanLossQtgt_sigm: 0.6931
Episode: 2938 meanReward: 18.2188 meanLoss: 25.9162 meanLossQlbl: 16.5681 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9619 meanLossQtgt_sigm: 0.6931
Episode: 2939 meanReward: 18.1875 meanLoss: 26.5456 meanLossQlbl: 16.8510 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3083 meanLossQtgt_sigm: 0.6931
Episode: 2940 meanReward: 18.2188 meanLoss: 27.1565 meanLossQlbl: 17.1900 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.58

Episode: 2990 meanReward: 18.2500 meanLoss: 25.3648 meanLossQlbl: 15.7844 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1941 meanLossQtgt_sigm: 0.6931
Episode: 2991 meanReward: 18.2812 meanLoss: 25.6345 meanLossQlbl: 15.9990 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2492 meanLossQtgt_sigm: 0.6931
Episode: 2992 meanReward: 18.0312 meanLoss: 27.6772 meanLossQlbl: 17.1344 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.1565 meanLossQtgt_sigm: 0.6931
Episode: 2993 meanReward: 18.1250 meanLoss: 25.1905 meanLossQlbl: 15.6658 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1384 meanLossQtgt_sigm: 0.6931
Episode: 2994 meanReward: 18.0938 meanLoss: 25.3941 meanLossQlbl: 15.8398 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1680 meanLossQtgt_sigm: 0.6931
Episode: 2995 meanReward: 18.1875 meanLoss: 27.1199 meanLossQlbl: 16.9531 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7805 meanLossQtgt_sigm: 0.6931
Episode: 2996 meanReward: 18.2812 meanLoss: 27.6012 meanLossQlbl: 17.3701 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.84

Episode: 3046 meanReward: 18.5000 meanLoss: 26.4404 meanLossQlbl: 16.5795 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4746 meanLossQtgt_sigm: 0.6931
Episode: 3047 meanReward: 18.5625 meanLoss: 27.4701 meanLossQlbl: 17.2619 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8219 meanLossQtgt_sigm: 0.6931
Episode: 3048 meanReward: 18.4688 meanLoss: 26.7055 meanLossQlbl: 16.8772 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4420 meanLossQtgt_sigm: 0.6931
Episode: 3049 meanReward: 18.5000 meanLoss: 26.9955 meanLossQlbl: 17.0585 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5507 meanLossQtgt_sigm: 0.6931
Episode: 3050 meanReward: 18.6250 meanLoss: 25.4752 meanLossQlbl: 16.3223 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7667 meanLossQtgt_sigm: 0.6931
Episode: 3051 meanReward: 18.7500 meanLoss: 25.7740 meanLossQlbl: 16.0924 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2953 meanLossQtgt_sigm: 0.6931
Episode: 3052 meanReward: 18.6875 meanLoss: 25.5666 meanLossQlbl: 15.8239 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.35

Episode: 3102 meanReward: 17.0938 meanLoss: 26.6900 meanLossQlbl: 16.5418 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7620 meanLossQtgt_sigm: 0.6931
Episode: 3103 meanReward: 16.8125 meanLoss: 24.6491 meanLossQlbl: 15.2980 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9648 meanLossQtgt_sigm: 0.6931
Episode: 3104 meanReward: 17.0625 meanLoss: 23.5428 meanLossQlbl: 14.7240 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4325 meanLossQtgt_sigm: 0.6931
Episode: 3105 meanReward: 17.3125 meanLoss: 26.5290 meanLossQlbl: 16.4300 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7127 meanLossQtgt_sigm: 0.6931
Episode: 3106 meanReward: 17.3125 meanLoss: 23.5110 meanLossQlbl: 14.6981 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4267 meanLossQtgt_sigm: 0.6931
Episode: 3107 meanReward: 17.2500 meanLoss: 20.1385 meanLossQlbl: 12.8506 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 5.9015 meanLossQtgt_sigm: 0.6931
Episode: 3108 meanReward: 17.5312 meanLoss: 26.2668 meanLossQlbl: 16.2919 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.58

Episode: 3158 meanReward: 17.9375 meanLoss: 24.1576 meanLossQlbl: 15.2216 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.5497 meanLossQtgt_sigm: 0.6931
Episode: 3159 meanReward: 18.0938 meanLoss: 23.9670 meanLossQlbl: 14.9825 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.5982 meanLossQtgt_sigm: 0.6931
Episode: 3160 meanReward: 18.3438 meanLoss: 26.5821 meanLossQlbl: 16.4808 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7150 meanLossQtgt_sigm: 0.6931
Episode: 3161 meanReward: 18.0938 meanLoss: 24.6741 meanLossQlbl: 15.4496 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.8382 meanLossQtgt_sigm: 0.6931
Episode: 3162 meanReward: 18.3125 meanLoss: 25.2303 meanLossQlbl: 15.8308 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0133 meanLossQtgt_sigm: 0.6931
Episode: 3163 meanReward: 18.2812 meanLoss: 26.1529 meanLossQlbl: 16.3664 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4002 meanLossQtgt_sigm: 0.6931
Episode: 3164 meanReward: 18.2500 meanLoss: 22.4228 meanLossQlbl: 14.2418 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 6.79

Episode: 3214 meanReward: 17.8438 meanLoss: 27.3456 meanLossQlbl: 16.8081 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.1512 meanLossQtgt_sigm: 0.6931
Episode: 3215 meanReward: 17.5312 meanLoss: 24.1020 meanLossQlbl: 15.1803 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.5355 meanLossQtgt_sigm: 0.6931
Episode: 3216 meanReward: 17.4062 meanLoss: 26.5344 meanLossQlbl: 16.6000 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5481 meanLossQtgt_sigm: 0.6931
Episode: 3217 meanReward: 17.5625 meanLoss: 25.2740 meanLossQlbl: 15.7038 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1839 meanLossQtgt_sigm: 0.6931
Episode: 3218 meanReward: 17.4375 meanLoss: 25.9208 meanLossQlbl: 16.1208 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4138 meanLossQtgt_sigm: 0.6931
Episode: 3219 meanReward: 17.4062 meanLoss: 27.3208 meanLossQlbl: 17.0139 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9206 meanLossQtgt_sigm: 0.6931
Episode: 3220 meanReward: 17.3438 meanLoss: 25.0219 meanLossQlbl: 15.6147 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.02

Episode: 3270 meanReward: 17.9375 meanLoss: 25.8334 meanLossQlbl: 16.1017 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3454 meanLossQtgt_sigm: 0.6931
Episode: 3271 meanReward: 17.9688 meanLoss: 24.1976 meanLossQlbl: 15.4365 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.3748 meanLossQtgt_sigm: 0.6931
Episode: 3272 meanReward: 17.8125 meanLoss: 27.1711 meanLossQlbl: 17.0561 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7287 meanLossQtgt_sigm: 0.6931
Episode: 3273 meanReward: 17.7812 meanLoss: 24.8974 meanLossQlbl: 15.7411 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7700 meanLossQtgt_sigm: 0.6931
Episode: 3274 meanReward: 17.6875 meanLoss: 28.0880 meanLossQlbl: 17.4779 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.2239 meanLossQtgt_sigm: 0.6931
Episode: 3275 meanReward: 18.0625 meanLoss: 27.0360 meanLossQlbl: 17.0294 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6204 meanLossQtgt_sigm: 0.6931
Episode: 3276 meanReward: 18.0938 meanLoss: 24.4706 meanLossQlbl: 15.3632 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.72

Episode: 3326 meanReward: 19.6875 meanLoss: 24.9582 meanLossQlbl: 15.5136 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0582 meanLossQtgt_sigm: 0.6931
Episode: 3327 meanReward: 19.4062 meanLoss: 23.4070 meanLossQlbl: 14.6266 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.3941 meanLossQtgt_sigm: 0.6931
Episode: 3328 meanReward: 19.4688 meanLoss: 22.7685 meanLossQlbl: 14.5242 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 6.8581 meanLossQtgt_sigm: 0.6931
Episode: 3329 meanReward: 19.4688 meanLoss: 27.6619 meanLossQlbl: 17.2942 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9813 meanLossQtgt_sigm: 0.6931
Episode: 3330 meanReward: 19.4375 meanLoss: 25.6768 meanLossQlbl: 16.0039 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2866 meanLossQtgt_sigm: 0.6931
Episode: 3331 meanReward: 19.1562 meanLoss: 25.3635 meanLossQlbl: 15.9187 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0585 meanLossQtgt_sigm: 0.6931
Episode: 3332 meanReward: 19.1875 meanLoss: 26.2880 meanLossQlbl: 16.4188 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.48

Episode: 3382 meanReward: 16.4688 meanLoss: 28.1584 meanLossQlbl: 17.7938 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9783 meanLossQtgt_sigm: 0.6931
Episode: 3383 meanReward: 16.7188 meanLoss: 27.3256 meanLossQlbl: 17.1586 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7807 meanLossQtgt_sigm: 0.6931
Episode: 3384 meanReward: 16.9375 meanLoss: 26.6800 meanLossQlbl: 16.6318 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6619 meanLossQtgt_sigm: 0.6931
Episode: 3385 meanReward: 16.8750 meanLoss: 24.1730 meanLossQlbl: 15.0862 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7004 meanLossQtgt_sigm: 0.6931
Episode: 3386 meanReward: 16.7188 meanLoss: 21.8364 meanLossQlbl: 13.7761 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 6.6740 meanLossQtgt_sigm: 0.6931
Episode: 3387 meanReward: 16.7500 meanLoss: 25.6442 meanLossQlbl: 15.8954 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3625 meanLossQtgt_sigm: 0.6931
Episode: 3388 meanReward: 16.7812 meanLoss: 26.0284 meanLossQlbl: 16.0870 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.55

Episode: 3438 meanReward: 18.8125 meanLoss: 24.5307 meanLossQlbl: 15.2091 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9353 meanLossQtgt_sigm: 0.6931
Episode: 3439 meanReward: 18.6562 meanLoss: 23.5747 meanLossQlbl: 14.8044 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.3840 meanLossQtgt_sigm: 0.6931
Episode: 3440 meanReward: 18.6562 meanLoss: 23.2954 meanLossQlbl: 14.7015 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.2077 meanLossQtgt_sigm: 0.6931
Episode: 3441 meanReward: 18.5625 meanLoss: 24.9094 meanLossQlbl: 15.4310 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0921 meanLossQtgt_sigm: 0.6931
Episode: 3442 meanReward: 18.9688 meanLoss: 27.5716 meanLossQlbl: 16.9958 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.1896 meanLossQtgt_sigm: 0.6931
Episode: 3443 meanReward: 19.0938 meanLoss: 25.8680 meanLossQlbl: 15.8616 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6202 meanLossQtgt_sigm: 0.6931
Episode: 3444 meanReward: 19.4375 meanLoss: 27.0629 meanLossQlbl: 16.6544 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.02

Episode: 3494 meanReward: 16.4375 meanLoss: 26.6318 meanLossQlbl: 16.7369 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5086 meanLossQtgt_sigm: 0.6931
Episode: 3495 meanReward: 16.5938 meanLoss: 26.2128 meanLossQlbl: 16.4013 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4252 meanLossQtgt_sigm: 0.6931
Episode: 3496 meanReward: 16.3438 meanLoss: 24.9001 meanLossQlbl: 15.5241 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9896 meanLossQtgt_sigm: 0.6931
Episode: 3497 meanReward: 16.0625 meanLoss: 22.9919 meanLossQlbl: 14.5812 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.0244 meanLossQtgt_sigm: 0.6931
Episode: 3498 meanReward: 16.0000 meanLoss: 26.8555 meanLossQlbl: 16.8767 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5925 meanLossQtgt_sigm: 0.6931
Episode: 3499 meanReward: 15.9375 meanLoss: 26.0878 meanLossQlbl: 16.6943 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0072 meanLossQtgt_sigm: 0.6931
Episode: 3500 meanReward: 15.9375 meanLoss: 26.0250 meanLossQlbl: 16.5720 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.06

Episode: 3550 meanReward: 17.8750 meanLoss: 26.3273 meanLossQlbl: 16.3646 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5764 meanLossQtgt_sigm: 0.6931
Episode: 3551 meanReward: 17.7188 meanLoss: 24.2873 meanLossQlbl: 15.1028 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7982 meanLossQtgt_sigm: 0.6931
Episode: 3552 meanReward: 17.5312 meanLoss: 25.6542 meanLossQlbl: 15.8731 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3947 meanLossQtgt_sigm: 0.6931
Episode: 3553 meanReward: 17.3438 meanLoss: 26.9194 meanLossQlbl: 16.7738 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7593 meanLossQtgt_sigm: 0.6931
Episode: 3554 meanReward: 17.2812 meanLoss: 28.0629 meanLossQlbl: 17.5280 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.1486 meanLossQtgt_sigm: 0.6931
Episode: 3555 meanReward: 17.1250 meanLoss: 27.2655 meanLossQlbl: 17.2087 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6705 meanLossQtgt_sigm: 0.6931
Episode: 3556 meanReward: 17.0625 meanLoss: 26.5910 meanLossQlbl: 16.9421 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.26

Episode: 3606 meanReward: 16.6250 meanLoss: 27.5728 meanLossQlbl: 17.2258 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9607 meanLossQtgt_sigm: 0.6931
Episode: 3607 meanReward: 16.8438 meanLoss: 27.1763 meanLossQlbl: 16.8767 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9133 meanLossQtgt_sigm: 0.6931
Episode: 3608 meanReward: 17.0000 meanLoss: 25.7690 meanLossQlbl: 16.0731 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3096 meanLossQtgt_sigm: 0.6931
Episode: 3609 meanReward: 17.0938 meanLoss: 25.9256 meanLossQlbl: 16.1947 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3446 meanLossQtgt_sigm: 0.6931
Episode: 3610 meanReward: 17.2188 meanLoss: 25.5465 meanLossQlbl: 16.0699 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0903 meanLossQtgt_sigm: 0.6931
Episode: 3611 meanReward: 17.4375 meanLoss: 25.6473 meanLossQlbl: 16.1466 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1145 meanLossQtgt_sigm: 0.6931
Episode: 3612 meanReward: 17.5000 meanLoss: 24.7348 meanLossQlbl: 15.5381 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.81

Episode: 3662 meanReward: 17.6250 meanLoss: 25.7927 meanLossQlbl: 16.1125 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2939 meanLossQtgt_sigm: 0.6931
Episode: 3663 meanReward: 17.3438 meanLoss: 25.8662 meanLossQlbl: 15.9799 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5000 meanLossQtgt_sigm: 0.6931
Episode: 3664 meanReward: 17.2188 meanLoss: 26.1152 meanLossQlbl: 16.2298 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4991 meanLossQtgt_sigm: 0.6931
Episode: 3665 meanReward: 17.4062 meanLoss: 27.9750 meanLossQlbl: 17.3174 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.2713 meanLossQtgt_sigm: 0.6931
Episode: 3666 meanReward: 17.5312 meanLoss: 26.9940 meanLossQlbl: 16.7232 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8845 meanLossQtgt_sigm: 0.6931
Episode: 3667 meanReward: 17.6562 meanLoss: 25.7248 meanLossQlbl: 16.0604 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2781 meanLossQtgt_sigm: 0.6931
Episode: 3668 meanReward: 17.5625 meanLoss: 27.9353 meanLossQlbl: 17.3563 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.19

Episode: 3718 meanReward: 17.8750 meanLoss: 27.8043 meanLossQlbl: 17.4833 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9347 meanLossQtgt_sigm: 0.6931
Episode: 3719 meanReward: 17.7500 meanLoss: 28.5286 meanLossQlbl: 17.7661 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3763 meanLossQtgt_sigm: 0.6931
Episode: 3720 meanReward: 17.7812 meanLoss: 27.7644 meanLossQlbl: 17.2879 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0902 meanLossQtgt_sigm: 0.6931
Episode: 3721 meanReward: 17.6250 meanLoss: 26.6154 meanLossQlbl: 16.5577 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6714 meanLossQtgt_sigm: 0.6931
Episode: 3722 meanReward: 17.6562 meanLoss: 25.8959 meanLossQlbl: 16.2041 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3055 meanLossQtgt_sigm: 0.6931
Episode: 3723 meanReward: 17.6562 meanLoss: 26.4266 meanLossQlbl: 16.5699 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4705 meanLossQtgt_sigm: 0.6931
Episode: 3724 meanReward: 17.4062 meanLoss: 26.1406 meanLossQlbl: 16.4261 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.32

Episode: 3774 meanReward: 16.3438 meanLoss: 26.4898 meanLossQlbl: 16.5434 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5601 meanLossQtgt_sigm: 0.6931
Episode: 3775 meanReward: 16.2812 meanLoss: 26.8876 meanLossQlbl: 16.8397 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6616 meanLossQtgt_sigm: 0.6931
Episode: 3776 meanReward: 16.3438 meanLoss: 25.7719 meanLossQlbl: 16.0182 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3674 meanLossQtgt_sigm: 0.6931
Episode: 3777 meanReward: 16.5625 meanLoss: 26.6413 meanLossQlbl: 16.4370 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8180 meanLossQtgt_sigm: 0.6931
Episode: 3778 meanReward: 16.6250 meanLoss: 25.1832 meanLossQlbl: 15.7228 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0741 meanLossQtgt_sigm: 0.6931
Episode: 3779 meanReward: 16.4688 meanLoss: 22.4609 meanLossQlbl: 14.4137 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 6.6609 meanLossQtgt_sigm: 0.6931
Episode: 3780 meanReward: 16.5625 meanLoss: 25.1623 meanLossQlbl: 15.8331 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.94

Episode: 3830 meanReward: 17.8125 meanLoss: 25.6610 meanLossQlbl: 16.2841 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9906 meanLossQtgt_sigm: 0.6931
Episode: 3831 meanReward: 17.7188 meanLoss: 25.4864 meanLossQlbl: 16.2680 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.8320 meanLossQtgt_sigm: 0.6931
Episode: 3832 meanReward: 17.7812 meanLoss: 26.4628 meanLossQlbl: 16.6547 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4218 meanLossQtgt_sigm: 0.6931
Episode: 3833 meanReward: 17.8438 meanLoss: 26.3774 meanLossQlbl: 16.2606 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7305 meanLossQtgt_sigm: 0.6931
Episode: 3834 meanReward: 18.0000 meanLoss: 25.9336 meanLossQlbl: 16.0296 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5177 meanLossQtgt_sigm: 0.6931
Episode: 3835 meanReward: 17.7188 meanLoss: 25.5567 meanLossQlbl: 15.9711 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1993 meanLossQtgt_sigm: 0.6931
Episode: 3836 meanReward: 17.4375 meanLoss: 24.0167 meanLossQlbl: 15.1698 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.46

Episode: 3886 meanReward: 18.0938 meanLoss: 26.5736 meanLossQlbl: 16.4201 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7672 meanLossQtgt_sigm: 0.6931
Episode: 3887 meanReward: 18.3438 meanLoss: 27.2332 meanLossQlbl: 16.8276 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0192 meanLossQtgt_sigm: 0.6931
Episode: 3888 meanReward: 18.5312 meanLoss: 26.7975 meanLossQlbl: 16.5720 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8392 meanLossQtgt_sigm: 0.6931
Episode: 3889 meanReward: 18.6250 meanLoss: 26.5803 meanLossQlbl: 16.4661 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7280 meanLossQtgt_sigm: 0.6931
Episode: 3890 meanReward: 18.3438 meanLoss: 24.6939 meanLossQlbl: 15.3330 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9746 meanLossQtgt_sigm: 0.6931
Episode: 3891 meanReward: 18.3750 meanLoss: 26.3982 meanLossQlbl: 16.2887 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7232 meanLossQtgt_sigm: 0.6931
Episode: 3892 meanReward: 18.3750 meanLoss: 25.1133 meanLossQlbl: 15.5086 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.21

Episode: 3942 meanReward: 18.9062 meanLoss: 24.1185 meanLossQlbl: 15.3626 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.3696 meanLossQtgt_sigm: 0.6931
Episode: 3943 meanReward: 19.0938 meanLoss: 26.7156 meanLossQlbl: 16.7135 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6158 meanLossQtgt_sigm: 0.6931
Episode: 3944 meanReward: 19.3438 meanLoss: 26.3776 meanLossQlbl: 16.3520 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6394 meanLossQtgt_sigm: 0.6931
Episode: 3945 meanReward: 19.1562 meanLoss: 25.4872 meanLossQlbl: 15.8827 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2183 meanLossQtgt_sigm: 0.6931
Episode: 3946 meanReward: 18.8438 meanLoss: 25.1094 meanLossQlbl: 15.7650 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9581 meanLossQtgt_sigm: 0.6931
Episode: 3947 meanReward: 19.1250 meanLoss: 27.0658 meanLossQlbl: 16.9015 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7780 meanLossQtgt_sigm: 0.6931
Episode: 3948 meanReward: 18.8438 meanLoss: 24.8421 meanLossQlbl: 15.5069 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.94

Episode: 3998 meanReward: 16.2812 meanLoss: 26.3457 meanLossQlbl: 16.3784 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5810 meanLossQtgt_sigm: 0.6931
Episode: 3999 meanReward: 16.3125 meanLoss: 26.2899 meanLossQlbl: 16.3540 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5496 meanLossQtgt_sigm: 0.6931
Episode: 4000 meanReward: 16.0938 meanLoss: 26.4062 meanLossQlbl: 16.5385 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4814 meanLossQtgt_sigm: 0.6931
Episode: 4001 meanReward: 16.3438 meanLoss: 26.8163 meanLossQlbl: 16.8456 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5844 meanLossQtgt_sigm: 0.6931
Episode: 4002 meanReward: 16.3125 meanLoss: 26.2944 meanLossQlbl: 16.5744 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3337 meanLossQtgt_sigm: 0.6931
Episode: 4003 meanReward: 16.3750 meanLoss: 26.0981 meanLossQlbl: 16.1939 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5179 meanLossQtgt_sigm: 0.6931
Episode: 4004 meanReward: 16.3750 meanLoss: 24.1850 meanLossQlbl: 15.2518 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.54

Episode: 4054 meanReward: 18.4688 meanLoss: 25.7105 meanLossQlbl: 16.1281 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1961 meanLossQtgt_sigm: 0.6931
Episode: 4055 meanReward: 18.5312 meanLoss: 26.7842 meanLossQlbl: 16.5662 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8318 meanLossQtgt_sigm: 0.6931
Episode: 4056 meanReward: 18.3750 meanLoss: 25.8672 meanLossQlbl: 16.0909 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3900 meanLossQtgt_sigm: 0.6931
Episode: 4057 meanReward: 18.1875 meanLoss: 26.9717 meanLossQlbl: 16.7954 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7900 meanLossQtgt_sigm: 0.6931
Episode: 4058 meanReward: 18.3125 meanLoss: 27.0801 meanLossQlbl: 16.9296 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7642 meanLossQtgt_sigm: 0.6931
Episode: 4059 meanReward: 18.4375 meanLoss: 26.9627 meanLossQlbl: 16.7093 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8672 meanLossQtgt_sigm: 0.6931
Episode: 4060 meanReward: 18.6875 meanLoss: 26.2957 meanLossQlbl: 16.3534 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.55

Episode: 4110 meanReward: 17.5625 meanLoss: 27.9324 meanLossQlbl: 17.4856 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0605 meanLossQtgt_sigm: 0.6931
Episode: 4111 meanReward: 17.7188 meanLoss: 27.9614 meanLossQlbl: 17.3888 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.1863 meanLossQtgt_sigm: 0.6931
Episode: 4112 meanReward: 17.6250 meanLoss: 24.9354 meanLossQlbl: 15.5965 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9526 meanLossQtgt_sigm: 0.6931
Episode: 4113 meanReward: 17.6562 meanLoss: 22.9579 meanLossQlbl: 14.6416 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 6.9301 meanLossQtgt_sigm: 0.6931
Episode: 4114 meanReward: 17.8438 meanLoss: 25.9433 meanLossQlbl: 16.2996 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2574 meanLossQtgt_sigm: 0.6931
Episode: 4115 meanReward: 17.4688 meanLoss: 25.5642 meanLossQlbl: 15.8906 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2873 meanLossQtgt_sigm: 0.6931
Episode: 4116 meanReward: 17.2500 meanLoss: 23.7224 meanLossQlbl: 14.8807 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.45

Episode: 4166 meanReward: 17.6875 meanLoss: 27.0472 meanLossQlbl: 16.8736 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7873 meanLossQtgt_sigm: 0.6931
Episode: 4167 meanReward: 17.6875 meanLoss: 24.7052 meanLossQlbl: 15.4008 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9181 meanLossQtgt_sigm: 0.6931
Episode: 4168 meanReward: 17.7188 meanLoss: 22.8812 meanLossQlbl: 14.4373 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.0576 meanLossQtgt_sigm: 0.6931
Episode: 4169 meanReward: 17.9688 meanLoss: 28.0324 meanLossQlbl: 17.4114 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.2348 meanLossQtgt_sigm: 0.6931
Episode: 4170 meanReward: 18.3438 meanLoss: 26.8801 meanLossQlbl: 16.7335 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7603 meanLossQtgt_sigm: 0.6931
Episode: 4171 meanReward: 18.1875 meanLoss: 26.4284 meanLossQlbl: 16.4285 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6136 meanLossQtgt_sigm: 0.6931
Episode: 4172 meanReward: 17.8125 meanLoss: 25.5244 meanLossQlbl: 16.1598 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.97

Episode: 4222 meanReward: 19.0625 meanLoss: 27.0994 meanLossQlbl: 16.7344 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9787 meanLossQtgt_sigm: 0.6931
Episode: 4223 meanReward: 19.1875 meanLoss: 26.9568 meanLossQlbl: 16.7778 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7927 meanLossQtgt_sigm: 0.6931
Episode: 4224 meanReward: 19.1875 meanLoss: 25.6180 meanLossQlbl: 16.0173 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2144 meanLossQtgt_sigm: 0.6931
Episode: 4225 meanReward: 19.2812 meanLoss: 24.5711 meanLossQlbl: 15.5866 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.5982 meanLossQtgt_sigm: 0.6931
Episode: 4226 meanReward: 19.1562 meanLoss: 27.4608 meanLossQlbl: 17.2076 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8668 meanLossQtgt_sigm: 0.6931
Episode: 4227 meanReward: 19.0625 meanLoss: 26.4908 meanLossQlbl: 16.4277 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6767 meanLossQtgt_sigm: 0.6931
Episode: 4228 meanReward: 18.8125 meanLoss: 24.7878 meanLossQlbl: 15.4329 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.96

Episode: 4278 meanReward: 17.1562 meanLoss: 24.3517 meanLossQlbl: 15.4135 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.5519 meanLossQtgt_sigm: 0.6931
Episode: 4279 meanReward: 17.2500 meanLoss: 26.1276 meanLossQlbl: 16.2795 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4618 meanLossQtgt_sigm: 0.6931
Episode: 4280 meanReward: 17.3125 meanLoss: 25.8091 meanLossQlbl: 16.1140 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3088 meanLossQtgt_sigm: 0.6931
Episode: 4281 meanReward: 17.3125 meanLoss: 25.2072 meanLossQlbl: 15.8762 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9447 meanLossQtgt_sigm: 0.6931
Episode: 4282 meanReward: 17.3438 meanLoss: 25.2729 meanLossQlbl: 15.8202 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0664 meanLossQtgt_sigm: 0.6931
Episode: 4283 meanReward: 17.1562 meanLoss: 28.5968 meanLossQlbl: 17.8444 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3662 meanLossQtgt_sigm: 0.6931
Episode: 4284 meanReward: 17.3438 meanLoss: 27.6040 meanLossQlbl: 17.3246 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.89

Episode: 4334 meanReward: 18.3750 meanLoss: 26.7280 meanLossQlbl: 16.5448 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7968 meanLossQtgt_sigm: 0.6931
Episode: 4335 meanReward: 18.5625 meanLoss: 27.6327 meanLossQlbl: 16.9967 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.2497 meanLossQtgt_sigm: 0.6931
Episode: 4336 meanReward: 18.7812 meanLoss: 26.2001 meanLossQlbl: 16.2551 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5588 meanLossQtgt_sigm: 0.6931
Episode: 4337 meanReward: 18.5312 meanLoss: 24.6657 meanLossQlbl: 15.5376 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7418 meanLossQtgt_sigm: 0.6931
Episode: 4338 meanReward: 18.6250 meanLoss: 25.8465 meanLossQlbl: 16.2463 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2139 meanLossQtgt_sigm: 0.6931
Episode: 4339 meanReward: 18.5938 meanLoss: 25.6992 meanLossQlbl: 16.1002 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.2128 meanLossQtgt_sigm: 0.6931
Episode: 4340 meanReward: 18.5938 meanLoss: 26.5077 meanLossQlbl: 16.5422 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.57

Episode: 4390 meanReward: 18.6250 meanLoss: 25.8366 meanLossQlbl: 16.2689 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.1814 meanLossQtgt_sigm: 0.6931
Episode: 4391 meanReward: 18.3438 meanLoss: 24.9969 meanLossQlbl: 15.7217 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.8889 meanLossQtgt_sigm: 0.6931
Episode: 4392 meanReward: 18.5938 meanLoss: 25.2351 meanLossQlbl: 15.8531 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9957 meanLossQtgt_sigm: 0.6931
Episode: 4393 meanReward: 18.5625 meanLoss: 26.4078 meanLossQlbl: 16.5619 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.4597 meanLossQtgt_sigm: 0.6931
Episode: 4394 meanReward: 18.3750 meanLoss: 24.4534 meanLossQlbl: 15.5692 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4979 meanLossQtgt_sigm: 0.6931
Episode: 4395 meanReward: 18.2812 meanLoss: 23.6597 meanLossQlbl: 15.0422 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.2312 meanLossQtgt_sigm: 0.6931
Episode: 4396 meanReward: 18.3438 meanLoss: 25.6569 meanLossQlbl: 15.8229 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.44

Episode: 4446 meanReward: 10.8125 meanLoss: 34.8583 meanLossQlbl: 21.6645 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8075 meanLossQtgt_sigm: 0.6931
Episode: 4447 meanReward: 10.5625 meanLoss: 33.3690 meanLossQlbl: 20.9354 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0473 meanLossQtgt_sigm: 0.6931
Episode: 4448 meanReward: 10.5000 meanLoss: 34.1142 meanLossQlbl: 21.3796 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3483 meanLossQtgt_sigm: 0.6931
Episode: 4449 meanReward: 10.5000 meanLoss: 33.3685 meanLossQlbl: 20.9040 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0781 meanLossQtgt_sigm: 0.6931
Episode: 4450 meanReward: 10.6250 meanLoss: 34.5960 meanLossQlbl: 21.4971 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7126 meanLossQtgt_sigm: 0.6931
Episode: 4451 meanReward: 10.7812 meanLoss: 35.1724 meanLossQlbl: 21.8140 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9721 meanLossQtgt_sigm: 0.6931
Episode: 4452 meanReward: 10.7500 meanLoss: 32.8998 meanLossQlbl: 20.5525 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 4503 meanReward: 11.0938 meanLoss: 33.6282 meanLossQlbl: 21.0630 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1788 meanLossQtgt_sigm: 0.6931
Episode: 4504 meanReward: 11.1250 meanLoss: 34.6117 meanLossQlbl: 21.6189 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6065 meanLossQtgt_sigm: 0.6931
Episode: 4505 meanReward: 10.9688 meanLoss: 35.7677 meanLossQlbl: 22.2285 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1530 meanLossQtgt_sigm: 0.6931
Episode: 4506 meanReward: 10.9688 meanLoss: 32.5714 meanLossQlbl: 20.5049 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6802 meanLossQtgt_sigm: 0.6931
Episode: 4507 meanReward: 10.8750 meanLoss: 32.4517 meanLossQlbl: 20.4449 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6205 meanLossQtgt_sigm: 0.6931
Episode: 4508 meanReward: 11.0312 meanLoss: 35.2934 meanLossQlbl: 22.0759 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8311 meanLossQtgt_sigm: 0.6931
Episode: 4509 meanReward: 11.0312 meanLoss: 36.8135 meanLossQlbl: 22.8283 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 4559 meanReward: 10.6250 meanLoss: 33.3722 meanLossQlbl: 20.7992 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1867 meanLossQtgt_sigm: 0.6931
Episode: 4560 meanReward: 10.5938 meanLoss: 33.2567 meanLossQlbl: 20.6835 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1870 meanLossQtgt_sigm: 0.6931
Episode: 4561 meanReward: 10.5938 meanLoss: 29.0714 meanLossQlbl: 18.3249 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3602 meanLossQtgt_sigm: 0.6931
Episode: 4562 meanReward: 10.4375 meanLoss: 25.0046 meanLossQlbl: 16.1279 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.4904 meanLossQtgt_sigm: 0.6931
Episode: 4563 meanReward: 10.5938 meanLoss: 31.8308 meanLossQlbl: 20.0539 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3906 meanLossQtgt_sigm: 0.6931
Episode: 4564 meanReward: 10.5938 meanLoss: 36.3588 meanLossQlbl: 22.5784 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3941 meanLossQtgt_sigm: 0.6931
Episode: 4565 meanReward: 10.7500 meanLoss: 35.0213 meanLossQlbl: 21.6910 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 

Episode: 4616 meanReward: 10.9062 meanLoss: 34.0183 meanLossQlbl: 21.2283 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4037 meanLossQtgt_sigm: 0.6931
Episode: 4617 meanReward: 10.9062 meanLoss: 33.9479 meanLossQlbl: 21.1924 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3692 meanLossQtgt_sigm: 0.6931
Episode: 4618 meanReward: 10.6875 meanLoss: 33.8138 meanLossQlbl: 21.0831 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3444 meanLossQtgt_sigm: 0.6931
Episode: 4619 meanReward: 10.7188 meanLoss: 29.6033 meanLossQlbl: 18.8261 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3909 meanLossQtgt_sigm: 0.6931
Episode: 4620 meanReward: 10.7500 meanLoss: 33.3755 meanLossQlbl: 20.9857 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0035 meanLossQtgt_sigm: 0.6931
Episode: 4621 meanReward: 10.6875 meanLoss: 34.7147 meanLossQlbl: 21.7154 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6130 meanLossQtgt_sigm: 0.6931
Episode: 4622 meanReward: 10.7188 meanLoss: 35.3856 meanLossQlbl: 22.1502 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 4672 meanReward: 10.8750 meanLoss: 35.5201 meanLossQlbl: 22.2454 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8884 meanLossQtgt_sigm: 0.6931
Episode: 4673 meanReward: 10.8438 meanLoss: 34.9204 meanLossQlbl: 21.8867 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6475 meanLossQtgt_sigm: 0.6931
Episode: 4674 meanReward: 10.6562 meanLoss: 34.7930 meanLossQlbl: 21.7230 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6837 meanLossQtgt_sigm: 0.6931
Episode: 4675 meanReward: 10.6562 meanLoss: 35.7777 meanLossQlbl: 22.2834 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1080 meanLossQtgt_sigm: 0.6931
Episode: 4676 meanReward: 10.6875 meanLoss: 30.6619 meanLossQlbl: 19.3434 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.9322 meanLossQtgt_sigm: 0.6931
Episode: 4677 meanReward: 10.5938 meanLoss: 34.3841 meanLossQlbl: 21.5058 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4920 meanLossQtgt_sigm: 0.6931
Episode: 4678 meanReward: 10.6250 meanLoss: 35.6867 meanLossQlbl: 22.1999 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 4728 meanReward: 10.7188 meanLoss: 33.6225 meanLossQlbl: 21.0915 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1447 meanLossQtgt_sigm: 0.6931
Episode: 4729 meanReward: 10.5938 meanLoss: 35.1582 meanLossQlbl: 21.9476 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8243 meanLossQtgt_sigm: 0.6931
Episode: 4730 meanReward: 10.5000 meanLoss: 36.0156 meanLossQlbl: 22.3870 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2423 meanLossQtgt_sigm: 0.6931
Episode: 4731 meanReward: 10.4062 meanLoss: 30.7735 meanLossQlbl: 19.4780 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.9093 meanLossQtgt_sigm: 0.6931
Episode: 4732 meanReward: 10.2500 meanLoss: 35.9322 meanLossQlbl: 22.3601 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1858 meanLossQtgt_sigm: 0.6931
Episode: 4733 meanReward: 10.1875 meanLoss: 35.6066 meanLossQlbl: 22.1825 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0378 meanLossQtgt_sigm: 0.6931
Episode: 4734 meanReward: 10.0312 meanLoss: 30.3931 meanLossQlbl: 19.1608 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 4785 meanReward: 10.6562 meanLoss: 35.4837 meanLossQlbl: 22.1721 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9253 meanLossQtgt_sigm: 0.6931
Episode: 4786 meanReward: 10.7812 meanLoss: 35.2288 meanLossQlbl: 21.8015 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0411 meanLossQtgt_sigm: 0.6931
Episode: 4787 meanReward: 10.9062 meanLoss: 34.7925 meanLossQlbl: 21.5692 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8370 meanLossQtgt_sigm: 0.6931
Episode: 4788 meanReward: 10.8750 meanLoss: 32.1093 meanLossQlbl: 20.0884 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6347 meanLossQtgt_sigm: 0.6931
Episode: 4789 meanReward: 10.8750 meanLoss: 31.3089 meanLossQlbl: 19.6717 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2508 meanLossQtgt_sigm: 0.6931
Episode: 4790 meanReward: 10.9062 meanLoss: 33.3984 meanLossQlbl: 20.8409 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1711 meanLossQtgt_sigm: 0.6931
Episode: 4791 meanReward: 10.8750 meanLoss: 33.9821 meanLossQlbl: 21.2028 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 4841 meanReward: 11.2812 meanLoss: 34.7972 meanLossQlbl: 21.6612 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7497 meanLossQtgt_sigm: 0.6931
Episode: 4842 meanReward: 11.1875 meanLoss: 35.4427 meanLossQlbl: 21.9759 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0805 meanLossQtgt_sigm: 0.6931
Episode: 4843 meanReward: 11.1875 meanLoss: 34.3962 meanLossQlbl: 21.5119 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4980 meanLossQtgt_sigm: 0.6931
Episode: 4844 meanReward: 11.1562 meanLoss: 33.2874 meanLossQlbl: 20.8710 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0301 meanLossQtgt_sigm: 0.6931
Episode: 4845 meanReward: 11.0625 meanLoss: 34.9277 meanLossQlbl: 21.8286 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7129 meanLossQtgt_sigm: 0.6931
Episode: 4846 meanReward: 11.2188 meanLoss: 33.6966 meanLossQlbl: 21.0024 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3080 meanLossQtgt_sigm: 0.6931
Episode: 4847 meanReward: 11.1875 meanLoss: 32.9113 meanLossQlbl: 20.6436 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 4898 meanReward: 10.8750 meanLoss: 28.6258 meanLossQlbl: 18.2530 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9865 meanLossQtgt_sigm: 0.6931
Episode: 4899 meanReward: 10.8438 meanLoss: 35.6385 meanLossQlbl: 22.0731 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1791 meanLossQtgt_sigm: 0.6931
Episode: 4900 meanReward: 10.7812 meanLoss: 34.8089 meanLossQlbl: 21.5996 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8230 meanLossQtgt_sigm: 0.6931
Episode: 4901 meanReward: 10.8125 meanLoss: 32.8859 meanLossQlbl: 20.6640 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8356 meanLossQtgt_sigm: 0.6931
Episode: 4902 meanReward: 10.9062 meanLoss: 31.5043 meanLossQlbl: 19.8442 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2738 meanLossQtgt_sigm: 0.6931
Episode: 4903 meanReward: 10.7500 meanLoss: 35.1781 meanLossQlbl: 21.8885 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9032 meanLossQtgt_sigm: 0.6931
Episode: 4904 meanReward: 10.7500 meanLoss: 34.8903 meanLossQlbl: 21.6870 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 4955 meanReward: 10.5000 meanLoss: 32.9068 meanLossQlbl: 20.5553 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9652 meanLossQtgt_sigm: 0.6931
Episode: 4956 meanReward: 10.5000 meanLoss: 33.0474 meanLossQlbl: 20.6480 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0131 meanLossQtgt_sigm: 0.6931
Episode: 4957 meanReward: 10.5000 meanLoss: 34.9924 meanLossQlbl: 21.7105 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8957 meanLossQtgt_sigm: 0.6931
Episode: 4958 meanReward: 10.5000 meanLoss: 32.1167 meanLossQlbl: 20.2932 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4372 meanLossQtgt_sigm: 0.6931
Episode: 4959 meanReward: 10.6562 meanLoss: 33.6204 meanLossQlbl: 20.9941 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2400 meanLossQtgt_sigm: 0.6931
Episode: 4960 meanReward: 10.6875 meanLoss: 35.1059 meanLossQlbl: 21.8135 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9061 meanLossQtgt_sigm: 0.6931
Episode: 4961 meanReward: 10.6875 meanLoss: 32.5228 meanLossQlbl: 20.3099 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5012 meanReward: 10.3750 meanLoss: 34.5497 meanLossQlbl: 21.5564 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6071 meanLossQtgt_sigm: 0.6931
Episode: 5013 meanReward: 10.3438 meanLoss: 34.5661 meanLossQlbl: 21.6782 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5017 meanLossQtgt_sigm: 0.6931
Episode: 5014 meanReward: 10.2812 meanLoss: 35.9982 meanLossQlbl: 22.4261 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1857 meanLossQtgt_sigm: 0.6931
Episode: 5015 meanReward: 10.4375 meanLoss: 33.8418 meanLossQlbl: 21.1132 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3423 meanLossQtgt_sigm: 0.6931
Episode: 5016 meanReward: 10.4375 meanLoss: 35.3768 meanLossQlbl: 21.9793 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0113 meanLossQtgt_sigm: 0.6931
Episode: 5017 meanReward: 10.5000 meanLoss: 32.8766 meanLossQlbl: 20.5087 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9815 meanLossQtgt_sigm: 0.6931
Episode: 5018 meanReward: 10.3438 meanLoss: 28.5688 meanLossQlbl: 18.2467 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5068 meanReward: 10.6562 meanLoss: 30.2551 meanLossQlbl: 19.1946 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.6742 meanLossQtgt_sigm: 0.6931
Episode: 5069 meanReward: 10.5938 meanLoss: 32.7884 meanLossQlbl: 20.6892 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7129 meanLossQtgt_sigm: 0.6931
Episode: 5070 meanReward: 10.6250 meanLoss: 35.5722 meanLossQlbl: 22.2659 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9200 meanLossQtgt_sigm: 0.6931
Episode: 5071 meanReward: 10.6250 meanLoss: 36.1520 meanLossQlbl: 22.5025 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2632 meanLossQtgt_sigm: 0.6931
Episode: 5072 meanReward: 10.7500 meanLoss: 35.2408 meanLossQlbl: 21.8749 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9796 meanLossQtgt_sigm: 0.6931
Episode: 5073 meanReward: 10.9062 meanLoss: 34.3705 meanLossQlbl: 21.3609 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6234 meanLossQtgt_sigm: 0.6931
Episode: 5074 meanReward: 10.9062 meanLoss: 34.3654 meanLossQlbl: 21.3535 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5124 meanReward: 10.7500 meanLoss: 24.1163 meanLossQlbl: 15.5775 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.1524 meanLossQtgt_sigm: 0.6931
Episode: 5125 meanReward: 10.7500 meanLoss: 32.8050 meanLossQlbl: 20.5092 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9094 meanLossQtgt_sigm: 0.6931
Episode: 5126 meanReward: 10.9375 meanLoss: 36.6880 meanLossQlbl: 22.6188 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.6829 meanLossQtgt_sigm: 0.6931
Episode: 5127 meanReward: 10.9062 meanLoss: 32.7892 meanLossQlbl: 20.3350 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0679 meanLossQtgt_sigm: 0.6931
Episode: 5128 meanReward: 10.9375 meanLoss: 30.6084 meanLossQlbl: 19.1082 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1140 meanLossQtgt_sigm: 0.6931
Episode: 5129 meanReward: 11.0000 meanLoss: 32.2540 meanLossQlbl: 20.1434 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7244 meanLossQtgt_sigm: 0.6931
Episode: 5130 meanReward: 11.0625 meanLoss: 35.1525 meanLossQlbl: 21.8681 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5180 meanReward: 10.7188 meanLoss: 31.6603 meanLossQlbl: 19.9744 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2995 meanLossQtgt_sigm: 0.6931
Episode: 5181 meanReward: 10.8438 meanLoss: 31.7870 meanLossQlbl: 19.9803 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4204 meanLossQtgt_sigm: 0.6931
Episode: 5182 meanReward: 10.8750 meanLoss: 36.1762 meanLossQlbl: 22.4582 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3317 meanLossQtgt_sigm: 0.6931
Episode: 5183 meanReward: 10.8750 meanLoss: 35.4930 meanLossQlbl: 22.0505 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0561 meanLossQtgt_sigm: 0.6931
Episode: 5184 meanReward: 11.0000 meanLoss: 31.7693 meanLossQlbl: 20.0036 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3793 meanLossQtgt_sigm: 0.6931
Episode: 5185 meanReward: 11.0938 meanLoss: 35.5488 meanLossQlbl: 21.9884 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1741 meanLossQtgt_sigm: 0.6931
Episode: 5186 meanReward: 11.1875 meanLoss: 35.1650 meanLossQlbl: 21.7518 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5236 meanReward: 10.4688 meanLoss: 34.1767 meanLossQlbl: 21.5327 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2576 meanLossQtgt_sigm: 0.6931
Episode: 5237 meanReward: 10.4688 meanLoss: 34.6010 meanLossQlbl: 21.7301 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4846 meanLossQtgt_sigm: 0.6931
Episode: 5238 meanReward: 10.4688 meanLoss: 36.0231 meanLossQlbl: 22.4745 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1624 meanLossQtgt_sigm: 0.6931
Episode: 5239 meanReward: 10.4062 meanLoss: 34.9475 meanLossQlbl: 21.7745 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7867 meanLossQtgt_sigm: 0.6931
Episode: 5240 meanReward: 10.5000 meanLoss: 35.5872 meanLossQlbl: 22.0427 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1582 meanLossQtgt_sigm: 0.6931
Episode: 5241 meanReward: 10.3750 meanLoss: 35.3041 meanLossQlbl: 22.0088 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9090 meanLossQtgt_sigm: 0.6931
Episode: 5242 meanReward: 10.5000 meanLoss: 33.8017 meanLossQlbl: 21.0167 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5292 meanReward: 10.1875 meanLoss: 33.8862 meanLossQlbl: 21.2832 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2167 meanLossQtgt_sigm: 0.6931
Episode: 5293 meanReward: 10.2500 meanLoss: 34.6071 meanLossQlbl: 21.7219 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4989 meanLossQtgt_sigm: 0.6931
Episode: 5294 meanReward: 10.1875 meanLoss: 35.0440 meanLossQlbl: 21.9506 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7071 meanLossQtgt_sigm: 0.6931
Episode: 5295 meanReward: 10.1562 meanLoss: 34.8401 meanLossQlbl: 21.7922 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6616 meanLossQtgt_sigm: 0.6931
Episode: 5296 meanReward: 10.1562 meanLoss: 33.7434 meanLossQlbl: 21.2193 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1378 meanLossQtgt_sigm: 0.6931
Episode: 5297 meanReward: 10.0000 meanLoss: 34.0260 meanLossQlbl: 21.4200 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2198 meanLossQtgt_sigm: 0.6931
Episode: 5298 meanReward: 9.9688 meanLoss: 34.6496 meanLossQlbl: 21.8002 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5348 meanReward: 11.2188 meanLoss: 32.3741 meanLossQlbl: 20.3790 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6088 meanLossQtgt_sigm: 0.6931
Episode: 5349 meanReward: 11.0625 meanLoss: 34.7389 meanLossQlbl: 21.7232 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6294 meanLossQtgt_sigm: 0.6931
Episode: 5350 meanReward: 10.9375 meanLoss: 32.5023 meanLossQlbl: 20.4690 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6471 meanLossQtgt_sigm: 0.6931
Episode: 5351 meanReward: 10.9375 meanLoss: 32.1942 meanLossQlbl: 20.2924 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5155 meanLossQtgt_sigm: 0.6931
Episode: 5352 meanReward: 10.9688 meanLoss: 33.8080 meanLossQlbl: 21.2675 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1542 meanLossQtgt_sigm: 0.6931
Episode: 5353 meanReward: 10.8438 meanLoss: 35.5978 meanLossQlbl: 22.2827 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9288 meanLossQtgt_sigm: 0.6931
Episode: 5354 meanReward: 10.6875 meanLoss: 35.3577 meanLossQlbl: 22.0746 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5404 meanReward: 10.9375 meanLoss: 35.1652 meanLossQlbl: 22.0866 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6923 meanLossQtgt_sigm: 0.6931
Episode: 5405 meanReward: 11.0000 meanLoss: 35.3459 meanLossQlbl: 22.1648 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7949 meanLossQtgt_sigm: 0.6931
Episode: 5406 meanReward: 11.0000 meanLoss: 35.3800 meanLossQlbl: 22.1062 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8876 meanLossQtgt_sigm: 0.6931
Episode: 5407 meanReward: 11.0000 meanLoss: 34.5368 meanLossQlbl: 21.6148 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5357 meanLossQtgt_sigm: 0.6931
Episode: 5408 meanReward: 10.9062 meanLoss: 33.9930 meanLossQlbl: 21.1832 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4235 meanLossQtgt_sigm: 0.6931
Episode: 5409 meanReward: 10.7812 meanLoss: 35.3899 meanLossQlbl: 21.9761 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0275 meanLossQtgt_sigm: 0.6931
Episode: 5410 meanReward: 10.7812 meanLoss: 35.4604 meanLossQlbl: 22.0357 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5460 meanReward: 10.7188 meanLoss: 33.1986 meanLossQlbl: 20.6320 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1803 meanLossQtgt_sigm: 0.6931
Episode: 5461 meanReward: 10.8438 meanLoss: 32.6704 meanLossQlbl: 20.3303 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9538 meanLossQtgt_sigm: 0.6931
Episode: 5462 meanReward: 10.9375 meanLoss: 33.2634 meanLossQlbl: 20.6671 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2101 meanLossQtgt_sigm: 0.6931
Episode: 5463 meanReward: 10.9062 meanLoss: 32.0479 meanLossQlbl: 20.0124 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6492 meanLossQtgt_sigm: 0.6931
Episode: 5464 meanReward: 10.9062 meanLoss: 29.8210 meanLossQlbl: 18.8718 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5629 meanLossQtgt_sigm: 0.6931
Episode: 5465 meanReward: 11.0625 meanLoss: 33.2825 meanLossQlbl: 20.7929 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1033 meanLossQtgt_sigm: 0.6931
Episode: 5466 meanReward: 11.0938 meanLoss: 35.3184 meanLossQlbl: 21.8453 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5516 meanReward: 10.7500 meanLoss: 35.6553 meanLossQlbl: 22.1713 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0977 meanLossQtgt_sigm: 0.6931
Episode: 5517 meanReward: 10.7500 meanLoss: 36.3923 meanLossQlbl: 22.5316 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4745 meanLossQtgt_sigm: 0.6931
Episode: 5518 meanReward: 10.5312 meanLoss: 34.3831 meanLossQlbl: 21.4711 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5257 meanLossQtgt_sigm: 0.6931
Episode: 5519 meanReward: 10.4062 meanLoss: 30.8423 meanLossQlbl: 19.4652 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.9909 meanLossQtgt_sigm: 0.6931
Episode: 5520 meanReward: 10.6250 meanLoss: 34.8913 meanLossQlbl: 21.6871 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8179 meanLossQtgt_sigm: 0.6931
Episode: 5521 meanReward: 10.5938 meanLoss: 36.7415 meanLossQlbl: 22.6665 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.6886 meanLossQtgt_sigm: 0.6931
Episode: 5522 meanReward: 10.5938 meanLoss: 33.7634 meanLossQlbl: 21.0414 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5572 meanReward: 11.0625 meanLoss: 32.5706 meanLossQlbl: 20.5308 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6534 meanLossQtgt_sigm: 0.6931
Episode: 5573 meanReward: 11.0000 meanLoss: 33.5581 meanLossQlbl: 21.0844 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0874 meanLossQtgt_sigm: 0.6931
Episode: 5574 meanReward: 11.1562 meanLoss: 35.9022 meanLossQlbl: 22.2106 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3053 meanLossQtgt_sigm: 0.6931
Episode: 5575 meanReward: 11.1250 meanLoss: 35.3601 meanLossQlbl: 21.9176 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0562 meanLossQtgt_sigm: 0.6931
Episode: 5576 meanReward: 11.1562 meanLoss: 32.6922 meanLossQlbl: 20.4707 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8352 meanLossQtgt_sigm: 0.6931
Episode: 5577 meanReward: 11.1250 meanLoss: 30.0744 meanLossQlbl: 19.0961 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5920 meanLossQtgt_sigm: 0.6931
Episode: 5578 meanReward: 11.0000 meanLoss: 33.0453 meanLossQlbl: 20.8364 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5628 meanReward: 10.5312 meanLoss: 31.0520 meanLossQlbl: 19.5819 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0837 meanLossQtgt_sigm: 0.6931
Episode: 5629 meanReward: 10.5312 meanLoss: 34.6590 meanLossQlbl: 21.7151 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5575 meanLossQtgt_sigm: 0.6931
Episode: 5630 meanReward: 10.4375 meanLoss: 35.5336 meanLossQlbl: 22.1887 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9586 meanLossQtgt_sigm: 0.6931
Episode: 5631 meanReward: 10.4688 meanLoss: 33.9195 meanLossQlbl: 21.3278 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2054 meanLossQtgt_sigm: 0.6931
Episode: 5632 meanReward: 10.4688 meanLoss: 32.5497 meanLossQlbl: 20.5225 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6409 meanLossQtgt_sigm: 0.6931
Episode: 5633 meanReward: 10.3438 meanLoss: 33.8004 meanLossQlbl: 21.2569 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1572 meanLossQtgt_sigm: 0.6931
Episode: 5634 meanReward: 10.5000 meanLoss: 35.3679 meanLossQlbl: 22.0889 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5684 meanReward: 11.0625 meanLoss: 29.8949 meanLossQlbl: 18.8224 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.6862 meanLossQtgt_sigm: 0.6931
Episode: 5685 meanReward: 11.0000 meanLoss: 30.2702 meanLossQlbl: 19.0896 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7943 meanLossQtgt_sigm: 0.6931
Episode: 5686 meanReward: 10.9375 meanLoss: 31.6492 meanLossQlbl: 19.9483 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3146 meanLossQtgt_sigm: 0.6931
Episode: 5687 meanReward: 11.0000 meanLoss: 32.2284 meanLossQlbl: 20.3169 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5252 meanLossQtgt_sigm: 0.6931
Episode: 5688 meanReward: 11.0000 meanLoss: 36.1832 meanLossQlbl: 22.4821 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3149 meanLossQtgt_sigm: 0.6931
Episode: 5689 meanReward: 10.9062 meanLoss: 34.0123 meanLossQlbl: 21.2930 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3330 meanLossQtgt_sigm: 0.6931
Episode: 5690 meanReward: 10.8438 meanLoss: 33.6959 meanLossQlbl: 21.1365 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 

Episode: 5740 meanReward: 10.5938 meanLoss: 33.3128 meanLossQlbl: 20.8778 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0487 meanLossQtgt_sigm: 0.6931
Episode: 5741 meanReward: 10.6250 meanLoss: 30.9235 meanLossQlbl: 19.4796 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0577 meanLossQtgt_sigm: 0.6931
Episode: 5742 meanReward: 10.6562 meanLoss: 31.2809 meanLossQlbl: 19.7675 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1271 meanLossQtgt_sigm: 0.6931
Episode: 5743 meanReward: 10.7188 meanLoss: 35.5589 meanLossQlbl: 22.0281 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1445 meanLossQtgt_sigm: 0.6931
Episode: 5744 meanReward: 10.6562 meanLoss: 36.8452 meanLossQlbl: 22.6947 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.7641 meanLossQtgt_sigm: 0.6931
Episode: 5745 meanReward: 10.5625 meanLoss: 32.7395 meanLossQlbl: 20.3616 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9916 meanLossQtgt_sigm: 0.6931
Episode: 5746 meanReward: 10.7812 meanLoss: 31.4090 meanLossQlbl: 19.6511 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5796 meanReward: 10.8750 meanLoss: 31.9332 meanLossQlbl: 19.9529 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5940 meanLossQtgt_sigm: 0.6931
Episode: 5797 meanReward: 10.9375 meanLoss: 33.4673 meanLossQlbl: 20.8126 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2684 meanLossQtgt_sigm: 0.6931
Episode: 5798 meanReward: 11.1250 meanLoss: 33.8860 meanLossQlbl: 21.0024 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4973 meanLossQtgt_sigm: 0.6931
Episode: 5799 meanReward: 11.1250 meanLoss: 33.9377 meanLossQlbl: 21.0275 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5239 meanLossQtgt_sigm: 0.6931
Episode: 5800 meanReward: 11.0625 meanLoss: 31.6392 meanLossQlbl: 19.7978 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4551 meanLossQtgt_sigm: 0.6931
Episode: 5801 meanReward: 11.0312 meanLoss: 28.1160 meanLossQlbl: 17.9597 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7701 meanLossQtgt_sigm: 0.6931
Episode: 5802 meanReward: 11.0000 meanLoss: 35.2176 meanLossQlbl: 21.9795 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 5852 meanReward: 11.0000 meanLoss: 32.6527 meanLossQlbl: 20.4727 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7937 meanLossQtgt_sigm: 0.6931
Episode: 5853 meanReward: 11.0000 meanLoss: 34.7616 meanLossQlbl: 21.7221 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6532 meanLossQtgt_sigm: 0.6931
Episode: 5854 meanReward: 10.9688 meanLoss: 35.3498 meanLossQlbl: 22.0738 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8897 meanLossQtgt_sigm: 0.6931
Episode: 5855 meanReward: 10.8750 meanLoss: 31.9135 meanLossQlbl: 20.0840 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4431 meanLossQtgt_sigm: 0.6931
Episode: 5856 meanReward: 10.7812 meanLoss: 32.8279 meanLossQlbl: 20.6889 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7528 meanLossQtgt_sigm: 0.6931
Episode: 5857 meanReward: 10.9688 meanLoss: 35.3281 meanLossQlbl: 21.9940 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9478 meanLossQtgt_sigm: 0.6931
Episode: 5858 meanReward: 11.0938 meanLoss: 36.6881 meanLossQlbl: 22.5829 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5908 meanReward: 10.6875 meanLoss: 34.8664 meanLossQlbl: 21.7521 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7279 meanLossQtgt_sigm: 0.6931
Episode: 5909 meanReward: 10.7500 meanLoss: 34.6531 meanLossQlbl: 21.5520 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7148 meanLossQtgt_sigm: 0.6931
Episode: 5910 meanReward: 10.7500 meanLoss: 33.3572 meanLossQlbl: 20.9807 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9902 meanLossQtgt_sigm: 0.6931
Episode: 5911 meanReward: 10.6875 meanLoss: 31.8960 meanLossQlbl: 20.0948 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4150 meanLossQtgt_sigm: 0.6931
Episode: 5912 meanReward: 10.5625 meanLoss: 32.2863 meanLossQlbl: 20.3573 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5427 meanLossQtgt_sigm: 0.6931
Episode: 5913 meanReward: 10.4688 meanLoss: 33.8163 meanLossQlbl: 21.2904 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1396 meanLossQtgt_sigm: 0.6931
Episode: 5914 meanReward: 10.4688 meanLoss: 35.2821 meanLossQlbl: 22.1171 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 5965 meanReward: 10.8438 meanLoss: 33.8177 meanLossQlbl: 21.1610 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2705 meanLossQtgt_sigm: 0.6931
Episode: 5966 meanReward: 10.6250 meanLoss: 31.9783 meanLossQlbl: 20.2352 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3568 meanLossQtgt_sigm: 0.6931
Episode: 5967 meanReward: 10.6875 meanLoss: 34.1161 meanLossQlbl: 21.2871 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4427 meanLossQtgt_sigm: 0.6931
Episode: 5968 meanReward: 10.6562 meanLoss: 34.6798 meanLossQlbl: 21.6678 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6256 meanLossQtgt_sigm: 0.6931
Episode: 5969 meanReward: 10.6562 meanLoss: 34.7823 meanLossQlbl: 21.6646 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7314 meanLossQtgt_sigm: 0.6931
Episode: 5970 meanReward: 10.5312 meanLoss: 30.7711 meanLossQlbl: 19.4443 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.9406 meanLossQtgt_sigm: 0.6931
Episode: 5971 meanReward: 10.4375 meanLoss: 31.1644 meanLossQlbl: 19.6435 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6021 meanReward: 10.5000 meanLoss: 33.4175 meanLossQlbl: 20.9833 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0479 meanLossQtgt_sigm: 0.6931
Episode: 6022 meanReward: 10.6875 meanLoss: 33.7582 meanLossQlbl: 21.0237 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3483 meanLossQtgt_sigm: 0.6931
Episode: 6023 meanReward: 10.7812 meanLoss: 33.7217 meanLossQlbl: 20.9555 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3800 meanLossQtgt_sigm: 0.6931
Episode: 6024 meanReward: 10.8125 meanLoss: 32.5722 meanLossQlbl: 20.2875 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8984 meanLossQtgt_sigm: 0.6931
Episode: 6025 meanReward: 10.7188 meanLoss: 33.3865 meanLossQlbl: 20.7793 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2210 meanLossQtgt_sigm: 0.6931
Episode: 6026 meanReward: 10.7500 meanLoss: 31.0139 meanLossQlbl: 19.4872 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1404 meanLossQtgt_sigm: 0.6931
Episode: 6027 meanReward: 10.7500 meanLoss: 25.5084 meanLossQlbl: 16.4041 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6077 meanReward: 10.8125 meanLoss: 35.5721 meanLossQlbl: 22.1411 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0447 meanLossQtgt_sigm: 0.6931
Episode: 6078 meanReward: 10.7812 meanLoss: 35.2850 meanLossQlbl: 22.0597 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8390 meanLossQtgt_sigm: 0.6931
Episode: 6079 meanReward: 10.9062 meanLoss: 34.3063 meanLossQlbl: 21.3098 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6102 meanLossQtgt_sigm: 0.6931
Episode: 6080 meanReward: 10.9688 meanLoss: 33.0952 meanLossQlbl: 20.6362 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0727 meanLossQtgt_sigm: 0.6931
Episode: 6081 meanReward: 10.9062 meanLoss: 33.2602 meanLossQlbl: 20.6947 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1792 meanLossQtgt_sigm: 0.6931
Episode: 6082 meanReward: 10.7188 meanLoss: 31.2212 meanLossQlbl: 19.6661 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1687 meanLossQtgt_sigm: 0.6931
Episode: 6083 meanReward: 10.7500 meanLoss: 34.2917 meanLossQlbl: 21.3502 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6134 meanReward: 11.0000 meanLoss: 33.1095 meanLossQlbl: 20.6567 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0665 meanLossQtgt_sigm: 0.6931
Episode: 6135 meanReward: 10.9375 meanLoss: 32.4241 meanLossQlbl: 20.3922 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6456 meanLossQtgt_sigm: 0.6931
Episode: 6136 meanReward: 10.9688 meanLoss: 35.1909 meanLossQlbl: 21.8980 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9066 meanLossQtgt_sigm: 0.6931
Episode: 6137 meanReward: 10.9062 meanLoss: 30.3118 meanLossQlbl: 19.1858 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7397 meanLossQtgt_sigm: 0.6931
Episode: 6138 meanReward: 10.7812 meanLoss: 33.6159 meanLossQlbl: 21.1249 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1047 meanLossQtgt_sigm: 0.6931
Episode: 6139 meanReward: 10.8125 meanLoss: 35.6791 meanLossQlbl: 22.3130 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9798 meanLossQtgt_sigm: 0.6931
Episode: 6140 meanReward: 10.8125 meanLoss: 34.1475 meanLossQlbl: 21.4352 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6190 meanReward: 11.0625 meanLoss: 33.6427 meanLossQlbl: 20.9316 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3248 meanLossQtgt_sigm: 0.6931
Episode: 6191 meanReward: 11.1562 meanLoss: 35.0774 meanLossQlbl: 21.6716 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0196 meanLossQtgt_sigm: 0.6931
Episode: 6192 meanReward: 10.9688 meanLoss: 32.2642 meanLossQlbl: 20.1279 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7500 meanLossQtgt_sigm: 0.6931
Episode: 6193 meanReward: 11.0000 meanLoss: 30.2545 meanLossQlbl: 19.2811 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5871 meanLossQtgt_sigm: 0.6931
Episode: 6194 meanReward: 11.0312 meanLoss: 32.2294 meanLossQlbl: 20.2447 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5984 meanLossQtgt_sigm: 0.6931
Episode: 6195 meanReward: 11.0312 meanLoss: 33.4636 meanLossQlbl: 21.0553 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0220 meanLossQtgt_sigm: 0.6931
Episode: 6196 meanReward: 11.0000 meanLoss: 35.2709 meanLossQlbl: 22.0644 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6246 meanReward: 11.1250 meanLoss: 35.8160 meanLossQlbl: 22.3078 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1219 meanLossQtgt_sigm: 0.6931
Episode: 6247 meanReward: 11.0625 meanLoss: 34.0966 meanLossQlbl: 21.3741 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3362 meanLossQtgt_sigm: 0.6931
Episode: 6248 meanReward: 10.9375 meanLoss: 35.4201 meanLossQlbl: 22.1129 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9209 meanLossQtgt_sigm: 0.6931
Episode: 6249 meanReward: 10.9375 meanLoss: 34.3683 meanLossQlbl: 21.5563 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4257 meanLossQtgt_sigm: 0.6931
Episode: 6250 meanReward: 11.0312 meanLoss: 33.6888 meanLossQlbl: 21.1756 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1269 meanLossQtgt_sigm: 0.6931
Episode: 6251 meanReward: 10.9688 meanLoss: 35.1260 meanLossQlbl: 21.9456 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7941 meanLossQtgt_sigm: 0.6931
Episode: 6252 meanReward: 11.0000 meanLoss: 34.8896 meanLossQlbl: 21.7222 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6302 meanReward: 10.5000 meanLoss: 32.1141 meanLossQlbl: 20.1474 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5804 meanLossQtgt_sigm: 0.6931
Episode: 6303 meanReward: 10.4688 meanLoss: 34.4744 meanLossQlbl: 21.5579 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5302 meanLossQtgt_sigm: 0.6931
Episode: 6304 meanReward: 10.2812 meanLoss: 31.9364 meanLossQlbl: 20.1196 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4305 meanLossQtgt_sigm: 0.6931
Episode: 6305 meanReward: 10.4062 meanLoss: 33.0909 meanLossQlbl: 20.7857 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9189 meanLossQtgt_sigm: 0.6931
Episode: 6306 meanReward: 10.5312 meanLoss: 35.6863 meanLossQlbl: 22.0655 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2345 meanLossQtgt_sigm: 0.6931
Episode: 6307 meanReward: 10.4688 meanLoss: 34.6417 meanLossQlbl: 21.5145 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7408 meanLossQtgt_sigm: 0.6931
Episode: 6308 meanReward: 10.4688 meanLoss: 31.7734 meanLossQlbl: 19.9749 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6358 meanReward: 11.0000 meanLoss: 35.6755 meanLossQlbl: 22.0268 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2624 meanLossQtgt_sigm: 0.6931
Episode: 6359 meanReward: 11.1250 meanLoss: 31.4443 meanLossQlbl: 19.6077 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4502 meanLossQtgt_sigm: 0.6931
Episode: 6360 meanReward: 11.1875 meanLoss: 32.1393 meanLossQlbl: 20.0048 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7482 meanLossQtgt_sigm: 0.6931
Episode: 6361 meanReward: 11.0938 meanLoss: 32.6679 meanLossQlbl: 20.3634 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9182 meanLossQtgt_sigm: 0.6931
Episode: 6362 meanReward: 11.1250 meanLoss: 31.4904 meanLossQlbl: 19.7963 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3079 meanLossQtgt_sigm: 0.6931
Episode: 6363 meanReward: 11.0938 meanLoss: 28.7091 meanLossQlbl: 18.2855 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0373 meanLossQtgt_sigm: 0.6931
Episode: 6364 meanReward: 10.8750 meanLoss: 33.0684 meanLossQlbl: 20.7979 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6414 meanReward: 10.3438 meanLoss: 31.3640 meanLossQlbl: 19.7658 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2119 meanLossQtgt_sigm: 0.6931
Episode: 6415 meanReward: 10.4688 meanLoss: 34.7751 meanLossQlbl: 21.7400 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6488 meanLossQtgt_sigm: 0.6931
Episode: 6416 meanReward: 10.4688 meanLoss: 36.4418 meanLossQlbl: 22.5134 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.5421 meanLossQtgt_sigm: 0.6931
Episode: 6417 meanReward: 10.5000 meanLoss: 34.8120 meanLossQlbl: 21.6151 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8106 meanLossQtgt_sigm: 0.6931
Episode: 6418 meanReward: 10.4688 meanLoss: 32.2061 meanLossQlbl: 20.2740 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5458 meanLossQtgt_sigm: 0.6931
Episode: 6419 meanReward: 10.4375 meanLoss: 33.8577 meanLossQlbl: 21.2362 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2351 meanLossQtgt_sigm: 0.6931
Episode: 6420 meanReward: 10.4688 meanLoss: 32.4531 meanLossQlbl: 20.3799 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6471 meanReward: 10.3750 meanLoss: 30.3494 meanLossQlbl: 19.1045 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.8586 meanLossQtgt_sigm: 0.6931
Episode: 6472 meanReward: 10.3750 meanLoss: 24.0984 meanLossQlbl: 15.5016 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.2105 meanLossQtgt_sigm: 0.6931
Episode: 6473 meanReward: 10.3750 meanLoss: 29.6062 meanLossQlbl: 18.7712 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.4487 meanLossQtgt_sigm: 0.6931
Episode: 6474 meanReward: 10.4062 meanLoss: 35.9490 meanLossQlbl: 22.4491 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1136 meanLossQtgt_sigm: 0.6931
Episode: 6475 meanReward: 10.5625 meanLoss: 36.5213 meanLossQlbl: 22.5486 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.5865 meanLossQtgt_sigm: 0.6931
Episode: 6476 meanReward: 10.5625 meanLoss: 34.0985 meanLossQlbl: 21.1580 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5543 meanLossQtgt_sigm: 0.6931
Episode: 6477 meanReward: 10.6875 meanLoss: 32.1856 meanLossQlbl: 20.0992 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 1

Episode: 6528 meanReward: 10.5938 meanLoss: 34.0199 meanLossQlbl: 21.1805 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4531 meanLossQtgt_sigm: 0.6931
Episode: 6529 meanReward: 10.5000 meanLoss: 27.6068 meanLossQlbl: 17.6781 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.5425 meanLossQtgt_sigm: 0.6931
Episode: 6530 meanReward: 10.5000 meanLoss: 31.2210 meanLossQlbl: 19.7568 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0778 meanLossQtgt_sigm: 0.6931
Episode: 6531 meanReward: 10.3438 meanLoss: 35.2029 meanLossQlbl: 22.0730 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7436 meanLossQtgt_sigm: 0.6931
Episode: 6532 meanReward: 10.3438 meanLoss: 35.3371 meanLossQlbl: 22.1397 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8111 meanLossQtgt_sigm: 0.6931
Episode: 6533 meanReward: 10.3125 meanLoss: 34.7690 meanLossQlbl: 21.8064 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5762 meanLossQtgt_sigm: 0.6931
Episode: 6534 meanReward: 10.3125 meanLoss: 34.6904 meanLossQlbl: 21.7446 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6585 meanReward: 10.8438 meanLoss: 32.1588 meanLossQlbl: 20.3259 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4465 meanLossQtgt_sigm: 0.6931
Episode: 6586 meanReward: 10.8750 meanLoss: 34.5486 meanLossQlbl: 21.6416 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5207 meanLossQtgt_sigm: 0.6931
Episode: 6587 meanReward: 10.7500 meanLoss: 34.1361 meanLossQlbl: 21.3451 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4048 meanLossQtgt_sigm: 0.6931
Episode: 6588 meanReward: 10.9062 meanLoss: 32.5647 meanLossQlbl: 20.3624 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8160 meanLossQtgt_sigm: 0.6931
Episode: 6589 meanReward: 10.9062 meanLoss: 35.0259 meanLossQlbl: 21.8748 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7648 meanLossQtgt_sigm: 0.6931
Episode: 6590 meanReward: 11.0938 meanLoss: 34.8532 meanLossQlbl: 21.6237 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8432 meanLossQtgt_sigm: 0.6931
Episode: 6591 meanReward: 11.2500 meanLoss: 34.3189 meanLossQlbl: 21.3034 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6642 meanReward: 10.2812 meanLoss: 33.3278 meanLossQlbl: 20.9722 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9692 meanLossQtgt_sigm: 0.6931
Episode: 6643 meanReward: 10.2812 meanLoss: 31.6993 meanLossQlbl: 20.0226 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2904 meanLossQtgt_sigm: 0.6931
Episode: 6644 meanReward: 10.2812 meanLoss: 34.0049 meanLossQlbl: 21.3363 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2823 meanLossQtgt_sigm: 0.6931
Episode: 6645 meanReward: 10.2812 meanLoss: 35.8003 meanLossQlbl: 22.2528 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1612 meanLossQtgt_sigm: 0.6931
Episode: 6646 meanReward: 10.3750 meanLoss: 36.0316 meanLossQlbl: 22.3117 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3335 meanLossQtgt_sigm: 0.6931
Episode: 6647 meanReward: 10.3125 meanLoss: 35.3769 meanLossQlbl: 21.9965 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9941 meanLossQtgt_sigm: 0.6931
Episode: 6648 meanReward: 10.2188 meanLoss: 33.7683 meanLossQlbl: 21.2373 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6698 meanReward: 10.5625 meanLoss: 36.1302 meanLossQlbl: 22.3593 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3845 meanLossQtgt_sigm: 0.6931
Episode: 6699 meanReward: 10.5312 meanLoss: 33.2121 meanLossQlbl: 20.7009 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1249 meanLossQtgt_sigm: 0.6931
Episode: 6700 meanReward: 10.5938 meanLoss: 28.4391 meanLossQlbl: 18.1301 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9227 meanLossQtgt_sigm: 0.6931
Episode: 6701 meanReward: 10.4062 meanLoss: 34.7586 meanLossQlbl: 21.5987 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7737 meanLossQtgt_sigm: 0.6931
Episode: 6702 meanReward: 10.4062 meanLoss: 34.8266 meanLossQlbl: 21.6801 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7603 meanLossQtgt_sigm: 0.6931
Episode: 6703 meanReward: 10.4375 meanLoss: 32.6885 meanLossQlbl: 20.6026 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6996 meanLossQtgt_sigm: 0.6931
Episode: 6704 meanReward: 10.5938 meanLoss: 33.0120 meanLossQlbl: 20.6224 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6754 meanReward: 11.3750 meanLoss: 32.5369 meanLossQlbl: 20.2506 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9000 meanLossQtgt_sigm: 0.6931
Episode: 6755 meanReward: 11.3750 meanLoss: 34.1278 meanLossQlbl: 21.2449 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4967 meanLossQtgt_sigm: 0.6931
Episode: 6756 meanReward: 11.3750 meanLoss: 33.6535 meanLossQlbl: 20.9518 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3154 meanLossQtgt_sigm: 0.6931
Episode: 6757 meanReward: 11.4688 meanLoss: 33.6454 meanLossQlbl: 20.8941 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3650 meanLossQtgt_sigm: 0.6931
Episode: 6758 meanReward: 11.4062 meanLoss: 32.6737 meanLossQlbl: 20.3536 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9338 meanLossQtgt_sigm: 0.6931
Episode: 6759 meanReward: 11.4062 meanLoss: 30.5272 meanLossQlbl: 19.3623 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7786 meanLossQtgt_sigm: 0.6931
Episode: 6760 meanReward: 11.4375 meanLoss: 31.5699 meanLossQlbl: 19.8602 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 6810 meanReward: 10.7812 meanLoss: 34.0517 meanLossQlbl: 21.1518 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5136 meanLossQtgt_sigm: 0.6931
Episode: 6811 meanReward: 10.8438 meanLoss: 33.1591 meanLossQlbl: 20.6433 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1295 meanLossQtgt_sigm: 0.6931
Episode: 6812 meanReward: 10.8438 meanLoss: 34.7493 meanLossQlbl: 21.5461 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8169 meanLossQtgt_sigm: 0.6931
Episode: 6813 meanReward: 10.8438 meanLoss: 33.4915 meanLossQlbl: 20.9806 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1247 meanLossQtgt_sigm: 0.6931
Episode: 6814 meanReward: 10.9062 meanLoss: 32.0486 meanLossQlbl: 20.1985 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4638 meanLossQtgt_sigm: 0.6931
Episode: 6815 meanReward: 10.9375 meanLoss: 34.0312 meanLossQlbl: 21.3115 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3334 meanLossQtgt_sigm: 0.6931
Episode: 6816 meanReward: 11.0938 meanLoss: 35.0411 meanLossQlbl: 21.6913 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6866 meanReward: 11.0312 meanLoss: 33.8783 meanLossQlbl: 21.2631 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2289 meanLossQtgt_sigm: 0.6931
Episode: 6867 meanReward: 10.8125 meanLoss: 35.9854 meanLossQlbl: 22.3831 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2160 meanLossQtgt_sigm: 0.6931
Episode: 6868 meanReward: 10.7812 meanLoss: 35.3339 meanLossQlbl: 22.0353 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9123 meanLossQtgt_sigm: 0.6931
Episode: 6869 meanReward: 10.8750 meanLoss: 31.0748 meanLossQlbl: 19.5611 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1274 meanLossQtgt_sigm: 0.6931
Episode: 6870 meanReward: 11.0000 meanLoss: 35.1755 meanLossQlbl: 21.8351 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9542 meanLossQtgt_sigm: 0.6931
Episode: 6871 meanReward: 10.9062 meanLoss: 34.5948 meanLossQlbl: 21.4993 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7092 meanLossQtgt_sigm: 0.6931
Episode: 6872 meanReward: 10.9375 meanLoss: 32.2565 meanLossQlbl: 20.3295 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6923 meanReward: 11.4062 meanLoss: 33.7976 meanLossQlbl: 21.2235 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1878 meanLossQtgt_sigm: 0.6931
Episode: 6924 meanReward: 11.4688 meanLoss: 33.1298 meanLossQlbl: 20.7366 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0069 meanLossQtgt_sigm: 0.6931
Episode: 6925 meanReward: 11.5938 meanLoss: 35.0582 meanLossQlbl: 21.7218 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9501 meanLossQtgt_sigm: 0.6931
Episode: 6926 meanReward: 11.5938 meanLoss: 33.9155 meanLossQlbl: 21.0252 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5040 meanLossQtgt_sigm: 0.6931
Episode: 6927 meanReward: 11.4375 meanLoss: 32.0343 meanLossQlbl: 19.9997 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6483 meanLossQtgt_sigm: 0.6931
Episode: 6928 meanReward: 11.4062 meanLoss: 33.6997 meanLossQlbl: 21.1347 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1787 meanLossQtgt_sigm: 0.6931
Episode: 6929 meanReward: 11.5625 meanLoss: 31.0999 meanLossQlbl: 19.5939 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 6979 meanReward: 10.6875 meanLoss: 33.4774 meanLossQlbl: 20.9059 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1853 meanLossQtgt_sigm: 0.6931
Episode: 6980 meanReward: 10.5312 meanLoss: 35.0710 meanLossQlbl: 21.8922 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7925 meanLossQtgt_sigm: 0.6931
Episode: 6981 meanReward: 10.4688 meanLoss: 33.6061 meanLossQlbl: 21.0407 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1791 meanLossQtgt_sigm: 0.6931
Episode: 6982 meanReward: 10.6250 meanLoss: 33.8212 meanLossQlbl: 21.1789 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2561 meanLossQtgt_sigm: 0.6931
Episode: 6983 meanReward: 10.5938 meanLoss: 36.8434 meanLossQlbl: 22.7240 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.7331 meanLossQtgt_sigm: 0.6931
Episode: 6984 meanReward: 10.4375 meanLoss: 34.1529 meanLossQlbl: 21.2030 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5635 meanLossQtgt_sigm: 0.6931
Episode: 6985 meanReward: 10.3125 meanLoss: 31.4667 meanLossQlbl: 19.8093 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7035 meanReward: 10.5312 meanLoss: 34.9523 meanLossQlbl: 21.7482 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8177 meanLossQtgt_sigm: 0.6931
Episode: 7036 meanReward: 10.5312 meanLoss: 34.8189 meanLossQlbl: 21.6611 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7716 meanLossQtgt_sigm: 0.6931
Episode: 7037 meanReward: 10.5000 meanLoss: 33.6535 meanLossQlbl: 20.9984 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2688 meanLossQtgt_sigm: 0.6931
Episode: 7038 meanReward: 10.5000 meanLoss: 30.2683 meanLossQlbl: 19.2189 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.6630 meanLossQtgt_sigm: 0.6931
Episode: 7039 meanReward: 10.4375 meanLoss: 34.9086 meanLossQlbl: 21.8266 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6958 meanLossQtgt_sigm: 0.6931
Episode: 7040 meanReward: 10.5625 meanLoss: 35.1180 meanLossQlbl: 21.8730 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8587 meanLossQtgt_sigm: 0.6931
Episode: 7041 meanReward: 10.5625 meanLoss: 34.2253 meanLossQlbl: 21.3892 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7091 meanReward: 10.7812 meanLoss: 33.2362 meanLossQlbl: 20.8015 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0484 meanLossQtgt_sigm: 0.6931
Episode: 7092 meanReward: 10.9375 meanLoss: 35.5585 meanLossQlbl: 22.0511 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1212 meanLossQtgt_sigm: 0.6931
Episode: 7093 meanReward: 10.9688 meanLoss: 34.7093 meanLossQlbl: 21.5171 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8060 meanLossQtgt_sigm: 0.6931
Episode: 7094 meanReward: 10.8438 meanLoss: 31.9565 meanLossQlbl: 19.9830 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5872 meanLossQtgt_sigm: 0.6931
Episode: 7095 meanReward: 10.9375 meanLoss: 30.7797 meanLossQlbl: 19.3719 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0215 meanLossQtgt_sigm: 0.6931
Episode: 7096 meanReward: 10.9688 meanLoss: 34.0413 meanLossQlbl: 21.1625 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4925 meanLossQtgt_sigm: 0.6931
Episode: 7097 meanReward: 10.9062 meanLoss: 35.0180 meanLossQlbl: 21.7201 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7147 meanReward: 10.8438 meanLoss: 31.9212 meanLossQlbl: 19.9350 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5999 meanLossQtgt_sigm: 0.6931
Episode: 7148 meanReward: 10.7500 meanLoss: 28.2820 meanLossQlbl: 18.0574 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8383 meanLossQtgt_sigm: 0.6931
Episode: 7149 meanReward: 10.7188 meanLoss: 34.0009 meanLossQlbl: 21.3437 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2709 meanLossQtgt_sigm: 0.6931
Episode: 7150 meanReward: 10.7500 meanLoss: 34.1334 meanLossQlbl: 21.4424 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3046 meanLossQtgt_sigm: 0.6931
Episode: 7151 meanReward: 10.6562 meanLoss: 35.5549 meanLossQlbl: 22.2039 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9648 meanLossQtgt_sigm: 0.6931
Episode: 7152 meanReward: 10.5625 meanLoss: 34.8040 meanLossQlbl: 21.8216 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5962 meanLossQtgt_sigm: 0.6931
Episode: 7153 meanReward: 10.6875 meanLoss: 35.3792 meanLossQlbl: 22.1079 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7204 meanReward: 10.6250 meanLoss: 33.5492 meanLossQlbl: 21.0973 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0657 meanLossQtgt_sigm: 0.6931
Episode: 7205 meanReward: 10.6875 meanLoss: 35.2736 meanLossQlbl: 22.0983 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7890 meanLossQtgt_sigm: 0.6931
Episode: 7206 meanReward: 10.6250 meanLoss: 35.6890 meanLossQlbl: 22.2775 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0252 meanLossQtgt_sigm: 0.6931
Episode: 7207 meanReward: 10.7812 meanLoss: 34.8983 meanLossQlbl: 21.6595 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8525 meanLossQtgt_sigm: 0.6931
Episode: 7208 meanReward: 10.7812 meanLoss: 33.8582 meanLossQlbl: 21.0295 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4424 meanLossQtgt_sigm: 0.6931
Episode: 7209 meanReward: 10.6875 meanLoss: 33.7810 meanLossQlbl: 20.9784 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4163 meanLossQtgt_sigm: 0.6931
Episode: 7210 meanReward: 10.7188 meanLoss: 31.5962 meanLossQlbl: 19.8418 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7260 meanReward: 10.7812 meanLoss: 32.1497 meanLossQlbl: 20.1590 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6044 meanLossQtgt_sigm: 0.6931
Episode: 7261 meanReward: 10.8125 meanLoss: 33.9224 meanLossQlbl: 21.1811 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3550 meanLossQtgt_sigm: 0.6931
Episode: 7262 meanReward: 10.9062 meanLoss: 33.9441 meanLossQlbl: 21.1106 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4472 meanLossQtgt_sigm: 0.6931
Episode: 7263 meanReward: 11.0000 meanLoss: 33.5937 meanLossQlbl: 20.9338 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2736 meanLossQtgt_sigm: 0.6931
Episode: 7264 meanReward: 10.8438 meanLoss: 34.3929 meanLossQlbl: 21.4444 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5622 meanLossQtgt_sigm: 0.6931
Episode: 7265 meanReward: 10.8438 meanLoss: 32.4353 meanLossQlbl: 20.3843 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6647 meanLossQtgt_sigm: 0.6931
Episode: 7266 meanReward: 10.7500 meanLoss: 30.2969 meanLossQlbl: 19.1829 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7317 meanReward: 11.4375 meanLoss: 33.5680 meanLossQlbl: 20.9829 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1988 meanLossQtgt_sigm: 0.6931
Episode: 7318 meanReward: 11.4375 meanLoss: 35.8442 meanLossQlbl: 22.2250 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2329 meanLossQtgt_sigm: 0.6931
Episode: 7319 meanReward: 11.4375 meanLoss: 33.5839 meanLossQlbl: 20.8753 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3223 meanLossQtgt_sigm: 0.6931
Episode: 7320 meanReward: 11.4062 meanLoss: 34.0215 meanLossQlbl: 21.1181 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5171 meanLossQtgt_sigm: 0.6931
Episode: 7321 meanReward: 11.3750 meanLoss: 33.4585 meanLossQlbl: 20.8557 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2166 meanLossQtgt_sigm: 0.6931
Episode: 7322 meanReward: 11.4375 meanLoss: 26.9594 meanLossQlbl: 17.2681 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3050 meanLossQtgt_sigm: 0.6931
Episode: 7323 meanReward: 11.2500 meanLoss: 32.5007 meanLossQlbl: 20.4440 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7373 meanReward: 10.3750 meanLoss: 34.6424 meanLossQlbl: 21.4839 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7723 meanLossQtgt_sigm: 0.6931
Episode: 7374 meanReward: 10.3750 meanLoss: 32.8947 meanLossQlbl: 20.5783 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9301 meanLossQtgt_sigm: 0.6931
Episode: 7375 meanReward: 10.3750 meanLoss: 33.3923 meanLossQlbl: 20.9165 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0895 meanLossQtgt_sigm: 0.6931
Episode: 7376 meanReward: 10.1562 meanLoss: 32.3156 meanLossQlbl: 20.4089 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5204 meanLossQtgt_sigm: 0.6931
Episode: 7377 meanReward: 10.0312 meanLoss: 32.8952 meanLossQlbl: 20.7258 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7831 meanLossQtgt_sigm: 0.6931
Episode: 7378 meanReward: 10.1250 meanLoss: 34.9683 meanLossQlbl: 21.8872 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6949 meanLossQtgt_sigm: 0.6931
Episode: 7379 meanReward: 10.1562 meanLoss: 35.7812 meanLossQlbl: 22.2134 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7429 meanReward: 10.7812 meanLoss: 33.7805 meanLossQlbl: 21.1544 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2398 meanLossQtgt_sigm: 0.6931
Episode: 7430 meanReward: 10.7500 meanLoss: 33.4238 meanLossQlbl: 20.9882 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0493 meanLossQtgt_sigm: 0.6931
Episode: 7431 meanReward: 10.5938 meanLoss: 34.0384 meanLossQlbl: 21.4156 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2366 meanLossQtgt_sigm: 0.6931
Episode: 7432 meanReward: 10.6250 meanLoss: 35.3155 meanLossQlbl: 22.0674 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8619 meanLossQtgt_sigm: 0.6931
Episode: 7433 meanReward: 10.6562 meanLoss: 35.7412 meanLossQlbl: 22.1822 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1727 meanLossQtgt_sigm: 0.6931
Episode: 7434 meanReward: 10.6562 meanLoss: 33.9300 meanLossQlbl: 21.1003 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4435 meanLossQtgt_sigm: 0.6931
Episode: 7435 meanReward: 10.6875 meanLoss: 32.5320 meanLossQlbl: 20.3646 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7485 meanReward: 11.4688 meanLoss: 30.8321 meanLossQlbl: 19.4465 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.9993 meanLossQtgt_sigm: 0.6931
Episode: 7486 meanReward: 11.6250 meanLoss: 36.7077 meanLossQlbl: 22.6682 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.6533 meanLossQtgt_sigm: 0.6931
Episode: 7487 meanReward: 11.5938 meanLoss: 32.8106 meanLossQlbl: 20.4201 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0043 meanLossQtgt_sigm: 0.6931
Episode: 7488 meanReward: 11.7812 meanLoss: 31.5723 meanLossQlbl: 19.7140 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4720 meanLossQtgt_sigm: 0.6931
Episode: 7489 meanReward: 11.9062 meanLoss: 34.9525 meanLossQlbl: 21.6331 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9330 meanLossQtgt_sigm: 0.6931
Episode: 7490 meanReward: 11.8125 meanLoss: 34.2296 meanLossQlbl: 21.2418 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6015 meanLossQtgt_sigm: 0.6931
Episode: 7491 meanReward: 11.6875 meanLoss: 32.1387 meanLossQlbl: 20.2572 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7542 meanReward: 10.6875 meanLoss: 35.1280 meanLossQlbl: 22.1058 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6359 meanLossQtgt_sigm: 0.6931
Episode: 7543 meanReward: 10.8438 meanLoss: 35.9260 meanLossQlbl: 22.4243 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1154 meanLossQtgt_sigm: 0.6931
Episode: 7544 meanReward: 10.8438 meanLoss: 35.8764 meanLossQlbl: 22.3427 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1474 meanLossQtgt_sigm: 0.6931
Episode: 7545 meanReward: 10.7812 meanLoss: 36.0142 meanLossQlbl: 22.3147 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3132 meanLossQtgt_sigm: 0.6931
Episode: 7546 meanReward: 10.7188 meanLoss: 30.5395 meanLossQlbl: 19.3567 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7965 meanLossQtgt_sigm: 0.6931
Episode: 7547 meanReward: 10.7812 meanLoss: 34.9248 meanLossQlbl: 21.7230 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8155 meanLossQtgt_sigm: 0.6931
Episode: 7548 meanReward: 10.7500 meanLoss: 36.5718 meanLossQlbl: 22.6022 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7597 meanReward: 10.8438 meanLoss: 32.3461 meanLossQlbl: 20.1320 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8278 meanLossQtgt_sigm: 0.6931
Episode: 7598 meanReward: 10.8750 meanLoss: 30.1301 meanLossQlbl: 19.0572 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.6866 meanLossQtgt_sigm: 0.6931
Episode: 7599 meanReward: 10.8438 meanLoss: 31.7296 meanLossQlbl: 20.0447 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2986 meanLossQtgt_sigm: 0.6931
Episode: 7600 meanReward: 10.6875 meanLoss: 33.8311 meanLossQlbl: 21.2460 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1989 meanLossQtgt_sigm: 0.6931
Episode: 7601 meanReward: 10.7500 meanLoss: 33.4965 meanLossQlbl: 21.1306 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9797 meanLossQtgt_sigm: 0.6931
Episode: 7602 meanReward: 10.7188 meanLoss: 34.0671 meanLossQlbl: 21.3780 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3028 meanLossQtgt_sigm: 0.6931
Episode: 7603 meanReward: 10.6250 meanLoss: 35.3582 meanLossQlbl: 22.1232 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7654 meanReward: 10.6562 meanLoss: 31.3549 meanLossQlbl: 19.8094 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1592 meanLossQtgt_sigm: 0.6931
Episode: 7655 meanReward: 10.6875 meanLoss: 33.8330 meanLossQlbl: 21.2311 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2156 meanLossQtgt_sigm: 0.6931
Episode: 7656 meanReward: 10.6562 meanLoss: 33.5188 meanLossQlbl: 21.0813 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0512 meanLossQtgt_sigm: 0.6931
Episode: 7657 meanReward: 10.6562 meanLoss: 34.7627 meanLossQlbl: 21.7859 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5905 meanLossQtgt_sigm: 0.6931
Episode: 7658 meanReward: 10.7500 meanLoss: 34.1980 meanLossQlbl: 21.4540 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3576 meanLossQtgt_sigm: 0.6931
Episode: 7659 meanReward: 10.8438 meanLoss: 35.8725 meanLossQlbl: 22.2398 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2465 meanLossQtgt_sigm: 0.6931
Episode: 7660 meanReward: 10.8438 meanLoss: 35.4250 meanLossQlbl: 21.9788 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7710 meanReward: 10.2812 meanLoss: 32.9499 meanLossQlbl: 20.7957 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7679 meanLossQtgt_sigm: 0.6931
Episode: 7711 meanReward: 10.2812 meanLoss: 35.1323 meanLossQlbl: 22.0647 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6814 meanLossQtgt_sigm: 0.6931
Episode: 7712 meanReward: 10.2500 meanLoss: 35.0742 meanLossQlbl: 22.0160 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6719 meanLossQtgt_sigm: 0.6931
Episode: 7713 meanReward: 10.2188 meanLoss: 34.6873 meanLossQlbl: 21.8091 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4919 meanLossQtgt_sigm: 0.6931
Episode: 7714 meanReward: 10.3438 meanLoss: 34.9541 meanLossQlbl: 21.9207 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6471 meanLossQtgt_sigm: 0.6931
Episode: 7715 meanReward: 10.3750 meanLoss: 35.4406 meanLossQlbl: 22.0711 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9832 meanLossQtgt_sigm: 0.6931
Episode: 7716 meanReward: 10.4062 meanLoss: 34.9224 meanLossQlbl: 21.7228 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7766 meanReward: 10.9375 meanLoss: 32.5098 meanLossQlbl: 20.2893 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8342 meanLossQtgt_sigm: 0.6931
Episode: 7767 meanReward: 10.9688 meanLoss: 34.4578 meanLossQlbl: 21.4386 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6329 meanLossQtgt_sigm: 0.6931
Episode: 7768 meanReward: 10.9062 meanLoss: 31.3294 meanLossQlbl: 19.8248 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1183 meanLossQtgt_sigm: 0.6931
Episode: 7769 meanReward: 11.0312 meanLoss: 32.8002 meanLossQlbl: 20.5623 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8516 meanLossQtgt_sigm: 0.6931
Episode: 7770 meanReward: 11.1562 meanLoss: 35.4121 meanLossQlbl: 21.9155 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1102 meanLossQtgt_sigm: 0.6931
Episode: 7771 meanReward: 11.0625 meanLoss: 35.4271 meanLossQlbl: 21.9246 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1161 meanLossQtgt_sigm: 0.6931
Episode: 7772 meanReward: 11.0625 meanLoss: 32.9860 meanLossQlbl: 20.7110 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7822 meanReward: 10.5938 meanLoss: 34.7347 meanLossQlbl: 21.6033 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7451 meanLossQtgt_sigm: 0.6931
Episode: 7823 meanReward: 10.5938 meanLoss: 32.1501 meanLossQlbl: 20.2910 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4728 meanLossQtgt_sigm: 0.6931
Episode: 7824 meanReward: 10.6562 meanLoss: 34.3137 meanLossQlbl: 21.5120 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4154 meanLossQtgt_sigm: 0.6931
Episode: 7825 meanReward: 10.7812 meanLoss: 33.5811 meanLossQlbl: 20.9277 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2672 meanLossQtgt_sigm: 0.6931
Episode: 7826 meanReward: 10.7500 meanLoss: 35.7686 meanLossQlbl: 22.1837 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1986 meanLossQtgt_sigm: 0.6931
Episode: 7827 meanReward: 10.5625 meanLoss: 35.1952 meanLossQlbl: 22.0024 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8066 meanLossQtgt_sigm: 0.6931
Episode: 7828 meanReward: 10.3750 meanLoss: 30.7549 meanLossQlbl: 19.4328 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 7878 meanReward: 10.5000 meanLoss: 33.5618 meanLossQlbl: 20.8756 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2999 meanLossQtgt_sigm: 0.6931
Episode: 7879 meanReward: 10.3125 meanLoss: 35.3144 meanLossQlbl: 21.8869 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0412 meanLossQtgt_sigm: 0.6931
Episode: 7880 meanReward: 10.2812 meanLoss: 33.0395 meanLossQlbl: 20.5913 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0619 meanLossQtgt_sigm: 0.6931
Episode: 7881 meanReward: 10.3438 meanLoss: 28.3636 meanLossQlbl: 18.1556 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.8217 meanLossQtgt_sigm: 0.6931
Episode: 7882 meanReward: 10.3438 meanLoss: 32.6019 meanLossQlbl: 20.5415 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6741 meanLossQtgt_sigm: 0.6931
Episode: 7883 meanReward: 10.2500 meanLoss: 34.5793 meanLossQlbl: 21.6851 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5079 meanLossQtgt_sigm: 0.6931
Episode: 7884 meanReward: 10.1562 meanLoss: 34.9892 meanLossQlbl: 21.9114 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7934 meanReward: 10.5938 meanLoss: 32.1868 meanLossQlbl: 20.1390 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6615 meanLossQtgt_sigm: 0.6931
Episode: 7935 meanReward: 10.5625 meanLoss: 35.4895 meanLossQlbl: 22.0802 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0230 meanLossQtgt_sigm: 0.6931
Episode: 7936 meanReward: 10.4062 meanLoss: 35.1405 meanLossQlbl: 21.8783 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8759 meanLossQtgt_sigm: 0.6931
Episode: 7937 meanReward: 10.4062 meanLoss: 29.4061 meanLossQlbl: 18.7059 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3139 meanLossQtgt_sigm: 0.6931
Episode: 7938 meanReward: 10.4375 meanLoss: 33.2106 meanLossQlbl: 20.8693 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9550 meanLossQtgt_sigm: 0.6931
Episode: 7939 meanReward: 10.3125 meanLoss: 35.8543 meanLossQlbl: 22.3662 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1018 meanLossQtgt_sigm: 0.6931
Episode: 7940 meanReward: 10.1562 meanLoss: 34.2575 meanLossQlbl: 21.4954 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 7991 meanReward: 11.0000 meanLoss: 32.3198 meanLossQlbl: 20.2770 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6565 meanLossQtgt_sigm: 0.6931
Episode: 7992 meanReward: 10.8438 meanLoss: 31.4606 meanLossQlbl: 19.8181 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2562 meanLossQtgt_sigm: 0.6931
Episode: 7993 meanReward: 10.7812 meanLoss: 33.7560 meanLossQlbl: 21.1742 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1955 meanLossQtgt_sigm: 0.6931
Episode: 7994 meanReward: 10.6562 meanLoss: 33.6655 meanLossQlbl: 21.1887 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0905 meanLossQtgt_sigm: 0.6931
Episode: 7995 meanReward: 10.8438 meanLoss: 35.3033 meanLossQlbl: 21.9918 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9252 meanLossQtgt_sigm: 0.6931
Episode: 7996 meanReward: 10.7188 meanLoss: 36.3535 meanLossQlbl: 22.4792 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4880 meanLossQtgt_sigm: 0.6931
Episode: 7997 meanReward: 10.6250 meanLoss: 33.3388 meanLossQlbl: 20.7691 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8047 meanReward: 10.8125 meanLoss: 31.9967 meanLossQlbl: 19.9823 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6282 meanLossQtgt_sigm: 0.6931
Episode: 8048 meanReward: 10.7812 meanLoss: 33.4025 meanLossQlbl: 20.9552 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0610 meanLossQtgt_sigm: 0.6931
Episode: 8049 meanReward: 10.7500 meanLoss: 33.7242 meanLossQlbl: 21.1434 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1945 meanLossQtgt_sigm: 0.6931
Episode: 8050 meanReward: 10.8125 meanLoss: 32.3978 meanLossQlbl: 20.3262 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6853 meanLossQtgt_sigm: 0.6931
Episode: 8051 meanReward: 10.7500 meanLoss: 35.4230 meanLossQlbl: 22.0742 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9625 meanLossQtgt_sigm: 0.6931
Episode: 8052 meanReward: 10.7188 meanLoss: 34.2067 meanLossQlbl: 21.2445 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5759 meanLossQtgt_sigm: 0.6931
Episode: 8053 meanReward: 10.7812 meanLoss: 33.8124 meanLossQlbl: 21.0611 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8103 meanReward: 11.2812 meanLoss: 34.2446 meanLossQlbl: 21.2829 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5754 meanLossQtgt_sigm: 0.6931
Episode: 8104 meanReward: 11.2812 meanLoss: 33.2140 meanLossQlbl: 20.6595 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1682 meanLossQtgt_sigm: 0.6931
Episode: 8105 meanReward: 11.2812 meanLoss: 32.9271 meanLossQlbl: 20.5591 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9817 meanLossQtgt_sigm: 0.6931
Episode: 8106 meanReward: 11.1250 meanLoss: 31.6693 meanLossQlbl: 19.9682 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3148 meanLossQtgt_sigm: 0.6931
Episode: 8107 meanReward: 11.1562 meanLoss: 34.7706 meanLossQlbl: 21.6316 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7528 meanLossQtgt_sigm: 0.6931
Episode: 8108 meanReward: 11.0625 meanLoss: 30.5905 meanLossQlbl: 19.4053 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7988 meanLossQtgt_sigm: 0.6931
Episode: 8109 meanReward: 10.9062 meanLoss: 34.0334 meanLossQlbl: 21.3317 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 8159 meanReward: 10.3438 meanLoss: 34.9537 meanLossQlbl: 21.9709 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5966 meanLossQtgt_sigm: 0.6931
Episode: 8160 meanReward: 10.3438 meanLoss: 34.7450 meanLossQlbl: 21.8349 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5238 meanLossQtgt_sigm: 0.6931
Episode: 8161 meanReward: 10.2188 meanLoss: 34.7951 meanLossQlbl: 21.8514 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5574 meanLossQtgt_sigm: 0.6931
Episode: 8162 meanReward: 10.2188 meanLoss: 34.8810 meanLossQlbl: 21.9216 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5731 meanLossQtgt_sigm: 0.6931
Episode: 8163 meanReward: 10.0000 meanLoss: 35.1104 meanLossQlbl: 22.0181 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7060 meanLossQtgt_sigm: 0.6931
Episode: 8164 meanReward: 10.1250 meanLoss: 35.2674 meanLossQlbl: 22.0475 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8336 meanLossQtgt_sigm: 0.6931
Episode: 8165 meanReward: 10.0000 meanLoss: 35.7804 meanLossQlbl: 22.2480 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8215 meanReward: 10.9688 meanLoss: 32.7028 meanLossQlbl: 20.5055 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8110 meanLossQtgt_sigm: 0.6931
Episode: 8216 meanReward: 10.8125 meanLoss: 31.2370 meanLossQlbl: 19.7749 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0758 meanLossQtgt_sigm: 0.6931
Episode: 8217 meanReward: 10.7188 meanLoss: 34.1382 meanLossQlbl: 21.3652 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3867 meanLossQtgt_sigm: 0.6931
Episode: 8218 meanReward: 10.6250 meanLoss: 35.1041 meanLossQlbl: 21.8827 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8352 meanLossQtgt_sigm: 0.6931
Episode: 8219 meanReward: 10.5625 meanLoss: 34.2723 meanLossQlbl: 21.3255 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5605 meanLossQtgt_sigm: 0.6931
Episode: 8220 meanReward: 10.6562 meanLoss: 32.5704 meanLossQlbl: 20.4428 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7413 meanLossQtgt_sigm: 0.6931
Episode: 8221 meanReward: 10.7812 meanLoss: 34.8370 meanLossQlbl: 21.6494 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8272 meanReward: 10.8125 meanLoss: 34.9425 meanLossQlbl: 21.8585 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6977 meanLossQtgt_sigm: 0.6931
Episode: 8273 meanReward: 11.0000 meanLoss: 34.4472 meanLossQlbl: 21.5973 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4637 meanLossQtgt_sigm: 0.6931
Episode: 8274 meanReward: 10.8438 meanLoss: 36.6816 meanLossQlbl: 22.7366 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.5587 meanLossQtgt_sigm: 0.6931
Episode: 8275 meanReward: 10.9688 meanLoss: 32.8454 meanLossQlbl: 20.4580 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0011 meanLossQtgt_sigm: 0.6931
Episode: 8276 meanReward: 10.9688 meanLoss: 28.6995 meanLossQlbl: 18.2760 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.0372 meanLossQtgt_sigm: 0.6931
Episode: 8277 meanReward: 10.7812 meanLoss: 34.3593 meanLossQlbl: 21.4069 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5661 meanLossQtgt_sigm: 0.6931
Episode: 8278 meanReward: 10.7500 meanLoss: 30.8466 meanLossQlbl: 19.4756 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 8329 meanReward: 11.0312 meanLoss: 30.7781 meanLossQlbl: 19.3259 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0659 meanLossQtgt_sigm: 0.6931
Episode: 8330 meanReward: 10.9688 meanLoss: 31.7661 meanLossQlbl: 19.9700 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4099 meanLossQtgt_sigm: 0.6931
Episode: 8331 meanReward: 10.8438 meanLoss: 33.2007 meanLossQlbl: 20.6877 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1267 meanLossQtgt_sigm: 0.6931
Episode: 8332 meanReward: 10.8750 meanLoss: 26.3766 meanLossQlbl: 16.9130 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.0773 meanLossQtgt_sigm: 0.6931
Episode: 8333 meanReward: 10.9062 meanLoss: 32.5925 meanLossQlbl: 20.5054 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7009 meanLossQtgt_sigm: 0.6931
Episode: 8334 meanReward: 10.8750 meanLoss: 32.1093 meanLossQlbl: 20.2179 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5051 meanLossQtgt_sigm: 0.6931
Episode: 8335 meanReward: 10.7188 meanLoss: 34.2899 meanLossQlbl: 21.4856 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 8386 meanReward: 10.0312 meanLoss: 32.4775 meanLossQlbl: 20.4170 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6743 meanLossQtgt_sigm: 0.6931
Episode: 8387 meanReward: 10.0625 meanLoss: 33.5903 meanLossQlbl: 21.1012 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1028 meanLossQtgt_sigm: 0.6931
Episode: 8388 meanReward: 10.1562 meanLoss: 35.0656 meanLossQlbl: 21.8531 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8262 meanLossQtgt_sigm: 0.6931
Episode: 8389 meanReward: 10.2188 meanLoss: 34.9052 meanLossQlbl: 21.6939 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8249 meanLossQtgt_sigm: 0.6931
Episode: 8390 meanReward: 10.2188 meanLoss: 35.3843 meanLossQlbl: 22.0233 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9747 meanLossQtgt_sigm: 0.6931
Episode: 8391 meanReward: 10.2500 meanLoss: 32.6174 meanLossQlbl: 20.5032 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7279 meanLossQtgt_sigm: 0.6931
Episode: 8392 meanReward: 10.1875 meanLoss: 32.8392 meanLossQlbl: 20.6263 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8442 meanReward: 10.4688 meanLoss: 32.9476 meanLossQlbl: 20.7144 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8469 meanLossQtgt_sigm: 0.6931
Episode: 8443 meanReward: 10.5000 meanLoss: 34.5168 meanLossQlbl: 21.6665 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4640 meanLossQtgt_sigm: 0.6931
Episode: 8444 meanReward: 10.6562 meanLoss: 35.6396 meanLossQlbl: 22.2280 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0253 meanLossQtgt_sigm: 0.6931
Episode: 8445 meanReward: 10.5000 meanLoss: 36.1186 meanLossQlbl: 22.3537 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3786 meanLossQtgt_sigm: 0.6931
Episode: 8446 meanReward: 10.5938 meanLoss: 34.0636 meanLossQlbl: 21.1745 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5029 meanLossQtgt_sigm: 0.6931
Episode: 8447 meanReward: 10.7500 meanLoss: 34.6544 meanLossQlbl: 21.4814 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7868 meanLossQtgt_sigm: 0.6931
Episode: 8448 meanReward: 10.7188 meanLoss: 32.7554 meanLossQlbl: 20.3846 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8498 meanReward: 10.6562 meanLoss: 34.8363 meanLossQlbl: 21.7534 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6966 meanLossQtgt_sigm: 0.6931
Episode: 8499 meanReward: 10.6250 meanLoss: 36.3316 meanLossQlbl: 22.4775 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4679 meanLossQtgt_sigm: 0.6931
Episode: 8500 meanReward: 10.4688 meanLoss: 33.6530 meanLossQlbl: 21.0477 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2190 meanLossQtgt_sigm: 0.6931
Episode: 8501 meanReward: 10.4375 meanLoss: 26.1058 meanLossQlbl: 16.7302 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.9894 meanLossQtgt_sigm: 0.6931
Episode: 8502 meanReward: 10.3438 meanLoss: 31.3731 meanLossQlbl: 19.8304 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1564 meanLossQtgt_sigm: 0.6931
Episode: 8503 meanReward: 10.2812 meanLoss: 35.1910 meanLossQlbl: 22.0241 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7807 meanLossQtgt_sigm: 0.6931
Episode: 8504 meanReward: 10.1562 meanLoss: 34.5665 meanLossQlbl: 21.6701 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 8555 meanReward: 11.2500 meanLoss: 31.4527 meanLossQlbl: 19.6633 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4031 meanLossQtgt_sigm: 0.6931
Episode: 8556 meanReward: 11.2812 meanLoss: 33.8847 meanLossQlbl: 21.1370 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3614 meanLossQtgt_sigm: 0.6931
Episode: 8557 meanReward: 11.3750 meanLoss: 34.2209 meanLossQlbl: 21.2601 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5745 meanLossQtgt_sigm: 0.6931
Episode: 8558 meanReward: 11.3750 meanLoss: 33.4452 meanLossQlbl: 20.8301 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2288 meanLossQtgt_sigm: 0.6931
Episode: 8559 meanReward: 11.4062 meanLoss: 34.3639 meanLossQlbl: 21.3383 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6393 meanLossQtgt_sigm: 0.6931
Episode: 8560 meanReward: 11.5938 meanLoss: 33.0591 meanLossQlbl: 20.5713 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1015 meanLossQtgt_sigm: 0.6931
Episode: 8561 meanReward: 11.5312 meanLoss: 32.9992 meanLossQlbl: 20.5630 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8612 meanReward: 10.4375 meanLoss: 35.3242 meanLossQlbl: 21.9792 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9587 meanLossQtgt_sigm: 0.6931
Episode: 8613 meanReward: 10.4375 meanLoss: 35.6424 meanLossQlbl: 22.1327 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1234 meanLossQtgt_sigm: 0.6931
Episode: 8614 meanReward: 10.3438 meanLoss: 33.1749 meanLossQlbl: 20.6616 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1270 meanLossQtgt_sigm: 0.6931
Episode: 8615 meanReward: 10.2812 meanLoss: 25.6649 meanLossQlbl: 16.5165 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7621 meanLossQtgt_sigm: 0.6931
Episode: 8616 meanReward: 10.2812 meanLoss: 33.9255 meanLossQlbl: 21.2281 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3110 meanLossQtgt_sigm: 0.6931
Episode: 8617 meanReward: 10.3438 meanLoss: 34.3323 meanLossQlbl: 21.4187 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5273 meanLossQtgt_sigm: 0.6931
Episode: 8618 meanReward: 10.4688 meanLoss: 33.9450 meanLossQlbl: 21.1068 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 8668 meanReward: 11.1875 meanLoss: 35.0318 meanLossQlbl: 21.7894 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8561 meanLossQtgt_sigm: 0.6931
Episode: 8669 meanReward: 11.2500 meanLoss: 32.6799 meanLossQlbl: 20.5246 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7690 meanLossQtgt_sigm: 0.6931
Episode: 8670 meanReward: 11.2188 meanLoss: 36.0358 meanLossQlbl: 22.4706 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1789 meanLossQtgt_sigm: 0.6931
Episode: 8671 meanReward: 11.1562 meanLoss: 32.9409 meanLossQlbl: 20.6344 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9202 meanLossQtgt_sigm: 0.6931
Episode: 8672 meanReward: 11.2812 meanLoss: 33.7490 meanLossQlbl: 21.1919 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1708 meanLossQtgt_sigm: 0.6931
Episode: 8673 meanReward: 11.3750 meanLoss: 35.4470 meanLossQlbl: 22.0243 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0365 meanLossQtgt_sigm: 0.6931
Episode: 8674 meanReward: 11.2188 meanLoss: 36.0831 meanLossQlbl: 22.3327 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8724 meanReward: 11.0938 meanLoss: 34.8663 meanLossQlbl: 21.7557 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7243 meanLossQtgt_sigm: 0.6931
Episode: 8725 meanReward: 11.0938 meanLoss: 34.5559 meanLossQlbl: 21.6398 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5298 meanLossQtgt_sigm: 0.6931
Episode: 8726 meanReward: 11.0000 meanLoss: 35.5691 meanLossQlbl: 22.1687 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0141 meanLossQtgt_sigm: 0.6931
Episode: 8727 meanReward: 10.9062 meanLoss: 32.3357 meanLossQlbl: 20.3132 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6362 meanLossQtgt_sigm: 0.6931
Episode: 8728 meanReward: 10.7500 meanLoss: 34.2618 meanLossQlbl: 21.4931 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3825 meanLossQtgt_sigm: 0.6931
Episode: 8729 meanReward: 10.6875 meanLoss: 35.1679 meanLossQlbl: 22.0650 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7166 meanLossQtgt_sigm: 0.6931
Episode: 8730 meanReward: 10.6875 meanLoss: 34.4179 meanLossQlbl: 21.5943 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8780 meanReward: 10.2500 meanLoss: 32.0360 meanLossQlbl: 20.1636 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4861 meanLossQtgt_sigm: 0.6931
Episode: 8781 meanReward: 10.3125 meanLoss: 33.5393 meanLossQlbl: 20.9181 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2349 meanLossQtgt_sigm: 0.6931
Episode: 8782 meanReward: 10.2812 meanLoss: 36.3522 meanLossQlbl: 22.5439 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4220 meanLossQtgt_sigm: 0.6931
Episode: 8783 meanReward: 10.2188 meanLoss: 34.8920 meanLossQlbl: 21.6597 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8460 meanLossQtgt_sigm: 0.6931
Episode: 8784 meanReward: 10.3438 meanLoss: 33.1041 meanLossQlbl: 20.6734 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0444 meanLossQtgt_sigm: 0.6931
Episode: 8785 meanReward: 10.5000 meanLoss: 36.2037 meanLossQlbl: 22.4024 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4150 meanLossQtgt_sigm: 0.6931
Episode: 8786 meanReward: 10.4688 meanLoss: 32.6391 meanLossQlbl: 20.4373 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8837 meanReward: 10.5312 meanLoss: 31.1860 meanLossQlbl: 19.5320 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2677 meanLossQtgt_sigm: 0.6931
Episode: 8838 meanReward: 10.7188 meanLoss: 34.2281 meanLossQlbl: 21.2626 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5792 meanLossQtgt_sigm: 0.6931
Episode: 8839 meanReward: 10.6875 meanLoss: 32.8811 meanLossQlbl: 20.4256 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0692 meanLossQtgt_sigm: 0.6931
Episode: 8840 meanReward: 10.6250 meanLoss: 30.1185 meanLossQlbl: 18.9754 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7568 meanLossQtgt_sigm: 0.6931
Episode: 8841 meanReward: 10.6875 meanLoss: 27.2786 meanLossQlbl: 17.4967 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.3957 meanLossQtgt_sigm: 0.6931
Episode: 8842 meanReward: 10.6875 meanLoss: 34.4764 meanLossQlbl: 21.5328 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5573 meanLossQtgt_sigm: 0.6931
Episode: 8843 meanReward: 10.6875 meanLoss: 34.5782 meanLossQlbl: 21.5090 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 

Episode: 8893 meanReward: 10.7188 meanLoss: 36.1738 meanLossQlbl: 22.5317 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2558 meanLossQtgt_sigm: 0.6931
Episode: 8894 meanReward: 10.7500 meanLoss: 34.8318 meanLossQlbl: 21.6098 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8357 meanLossQtgt_sigm: 0.6931
Episode: 8895 meanReward: 10.6250 meanLoss: 31.8097 meanLossQlbl: 19.9612 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4622 meanLossQtgt_sigm: 0.6931
Episode: 8896 meanReward: 10.6250 meanLoss: 34.8025 meanLossQlbl: 21.6740 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7421 meanLossQtgt_sigm: 0.6931
Episode: 8897 meanReward: 10.6875 meanLoss: 33.9663 meanLossQlbl: 21.2171 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3629 meanLossQtgt_sigm: 0.6931
Episode: 8898 meanReward: 10.5938 meanLoss: 32.7172 meanLossQlbl: 20.4873 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8436 meanLossQtgt_sigm: 0.6931
Episode: 8899 meanReward: 10.5625 meanLoss: 33.2421 meanLossQlbl: 20.8360 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 8950 meanReward: 10.7500 meanLoss: 36.1678 meanLossQlbl: 22.3999 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3816 meanLossQtgt_sigm: 0.6931
Episode: 8951 meanReward: 10.6250 meanLoss: 30.2659 meanLossQlbl: 19.1606 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7191 meanLossQtgt_sigm: 0.6931
Episode: 8952 meanReward: 10.5938 meanLoss: 33.5519 meanLossQlbl: 21.0900 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0756 meanLossQtgt_sigm: 0.6931
Episode: 8953 meanReward: 10.5625 meanLoss: 35.1866 meanLossQlbl: 22.0498 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7505 meanLossQtgt_sigm: 0.6931
Episode: 8954 meanReward: 10.7812 meanLoss: 35.6147 meanLossQlbl: 22.1596 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0688 meanLossQtgt_sigm: 0.6931
Episode: 8955 meanReward: 10.6875 meanLoss: 36.3301 meanLossQlbl: 22.4934 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4504 meanLossQtgt_sigm: 0.6931
Episode: 8956 meanReward: 10.6562 meanLoss: 35.2873 meanLossQlbl: 21.9191 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9007 meanReward: 10.6562 meanLoss: 31.8806 meanLossQlbl: 20.1747 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.3197 meanLossQtgt_sigm: 0.6931
Episode: 9008 meanReward: 10.6250 meanLoss: 36.1876 meanLossQlbl: 22.5801 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2211 meanLossQtgt_sigm: 0.6931
Episode: 9009 meanReward: 10.5938 meanLoss: 35.9222 meanLossQlbl: 22.2318 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3041 meanLossQtgt_sigm: 0.6931
Episode: 9010 meanReward: 10.7188 meanLoss: 33.7435 meanLossQlbl: 20.9970 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3602 meanLossQtgt_sigm: 0.6931
Episode: 9011 meanReward: 10.7500 meanLoss: 33.3457 meanLossQlbl: 20.7306 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2288 meanLossQtgt_sigm: 0.6931
Episode: 9012 meanReward: 10.8438 meanLoss: 33.4542 meanLossQlbl: 20.7643 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3036 meanLossQtgt_sigm: 0.6931
Episode: 9013 meanReward: 10.8438 meanLoss: 32.4622 meanLossQlbl: 20.3176 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 9064 meanReward: 11.0000 meanLoss: 28.6290 meanLossQlbl: 18.2714 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.9714 meanLossQtgt_sigm: 0.6931
Episode: 9065 meanReward: 10.9375 meanLoss: 33.0271 meanLossQlbl: 20.7918 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8490 meanLossQtgt_sigm: 0.6931
Episode: 9066 meanReward: 10.7812 meanLoss: 35.8363 meanLossQlbl: 22.3959 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0541 meanLossQtgt_sigm: 0.6931
Episode: 9067 meanReward: 10.9375 meanLoss: 34.5427 meanLossQlbl: 21.5603 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5961 meanLossQtgt_sigm: 0.6931
Episode: 9068 meanReward: 10.9688 meanLoss: 35.2453 meanLossQlbl: 21.8375 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0216 meanLossQtgt_sigm: 0.6931
Episode: 9069 meanReward: 10.9375 meanLoss: 34.3860 meanLossQlbl: 21.3149 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6849 meanLossQtgt_sigm: 0.6931
Episode: 9070 meanReward: 10.8750 meanLoss: 32.1258 meanLossQlbl: 20.2306 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9121 meanReward: 10.9062 meanLoss: 29.6511 meanLossQlbl: 18.7636 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5012 meanLossQtgt_sigm: 0.6931
Episode: 9122 meanReward: 10.7500 meanLoss: 29.6454 meanLossQlbl: 18.8448 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.4143 meanLossQtgt_sigm: 0.6931
Episode: 9123 meanReward: 10.7500 meanLoss: 32.6355 meanLossQlbl: 20.5633 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6859 meanLossQtgt_sigm: 0.6931
Episode: 9124 meanReward: 10.7500 meanLoss: 34.9199 meanLossQlbl: 21.8787 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6549 meanLossQtgt_sigm: 0.6931
Episode: 9125 meanReward: 10.8125 meanLoss: 34.7594 meanLossQlbl: 21.7538 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6194 meanLossQtgt_sigm: 0.6931
Episode: 9126 meanReward: 10.7500 meanLoss: 34.8403 meanLossQlbl: 21.6239 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8302 meanLossQtgt_sigm: 0.6931
Episode: 9127 meanReward: 10.7812 meanLoss: 35.1356 meanLossQlbl: 21.8821 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 

Episode: 9177 meanReward: 10.2500 meanLoss: 31.9769 meanLossQlbl: 20.0455 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5452 meanLossQtgt_sigm: 0.6931
Episode: 9178 meanReward: 10.4062 meanLoss: 35.2019 meanLossQlbl: 21.7794 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0363 meanLossQtgt_sigm: 0.6931
Episode: 9179 meanReward: 10.4375 meanLoss: 34.1545 meanLossQlbl: 21.1306 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6376 meanLossQtgt_sigm: 0.6931
Episode: 9180 meanReward: 10.4688 meanLoss: 31.1558 meanLossQlbl: 19.5034 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2662 meanLossQtgt_sigm: 0.6931
Episode: 9181 meanReward: 10.4688 meanLoss: 28.0144 meanLossQlbl: 17.9599 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.6681 meanLossQtgt_sigm: 0.6931
Episode: 9182 meanReward: 10.5938 meanLoss: 34.2549 meanLossQlbl: 21.4123 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4563 meanLossQtgt_sigm: 0.6931
Episode: 9183 meanReward: 10.6875 meanLoss: 36.4000 meanLossQlbl: 22.4936 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9233 meanReward: 10.6562 meanLoss: 32.1559 meanLossQlbl: 20.2892 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4804 meanLossQtgt_sigm: 0.6931
Episode: 9234 meanReward: 10.6562 meanLoss: 35.0612 meanLossQlbl: 21.9485 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7264 meanLossQtgt_sigm: 0.6931
Episode: 9235 meanReward: 10.8125 meanLoss: 35.4185 meanLossQlbl: 22.1204 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9119 meanLossQtgt_sigm: 0.6931
Episode: 9236 meanReward: 10.7188 meanLoss: 35.2791 meanLossQlbl: 21.9052 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9877 meanLossQtgt_sigm: 0.6931
Episode: 9237 meanReward: 10.7188 meanLoss: 34.3881 meanLossQlbl: 21.4225 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5793 meanLossQtgt_sigm: 0.6931
Episode: 9238 meanReward: 10.5625 meanLoss: 29.8943 meanLossQlbl: 19.0254 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.4826 meanLossQtgt_sigm: 0.6931
Episode: 9239 meanReward: 10.5312 meanLoss: 32.4281 meanLossQlbl: 20.4354 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9290 meanReward: 10.7812 meanLoss: 34.7202 meanLossQlbl: 21.8558 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4781 meanLossQtgt_sigm: 0.6931
Episode: 9291 meanReward: 10.9062 meanLoss: 35.3740 meanLossQlbl: 22.1523 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8355 meanLossQtgt_sigm: 0.6931
Episode: 9292 meanReward: 10.9062 meanLoss: 35.9231 meanLossQlbl: 22.3365 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2003 meanLossQtgt_sigm: 0.6931
Episode: 9293 meanReward: 10.7188 meanLoss: 35.8658 meanLossQlbl: 22.3151 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1644 meanLossQtgt_sigm: 0.6931
Episode: 9294 meanReward: 10.6250 meanLoss: 30.6464 meanLossQlbl: 19.4158 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.8444 meanLossQtgt_sigm: 0.6931
Episode: 9295 meanReward: 10.5000 meanLoss: 32.9828 meanLossQlbl: 20.8071 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7894 meanLossQtgt_sigm: 0.6931
Episode: 9296 meanReward: 10.5000 meanLoss: 34.5816 meanLossQlbl: 21.7655 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9347 meanReward: 10.4375 meanLoss: 35.1673 meanLossQlbl: 21.9645 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8165 meanLossQtgt_sigm: 0.6931
Episode: 9348 meanReward: 10.4375 meanLoss: 35.9768 meanLossQlbl: 22.2762 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3143 meanLossQtgt_sigm: 0.6931
Episode: 9349 meanReward: 10.4375 meanLoss: 33.5096 meanLossQlbl: 20.8551 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2682 meanLossQtgt_sigm: 0.6931
Episode: 9350 meanReward: 10.6250 meanLoss: 31.2188 meanLossQlbl: 19.5843 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2482 meanLossQtgt_sigm: 0.6931
Episode: 9351 meanReward: 10.5312 meanLoss: 34.6161 meanLossQlbl: 21.5639 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6658 meanLossQtgt_sigm: 0.6931
Episode: 9352 meanReward: 10.5938 meanLoss: 33.3218 meanLossQlbl: 20.7884 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1471 meanLossQtgt_sigm: 0.6931
Episode: 9353 meanReward: 10.4688 meanLoss: 29.5820 meanLossQlbl: 18.6965 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 9403 meanReward: 10.5312 meanLoss: 35.3361 meanLossQlbl: 22.0043 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9455 meanLossQtgt_sigm: 0.6931
Episode: 9404 meanReward: 10.5312 meanLoss: 30.1026 meanLossQlbl: 18.9879 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.7284 meanLossQtgt_sigm: 0.6931
Episode: 9405 meanReward: 10.5938 meanLoss: 33.1191 meanLossQlbl: 20.7604 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9724 meanLossQtgt_sigm: 0.6931
Episode: 9406 meanReward: 10.7500 meanLoss: 35.1237 meanLossQlbl: 21.7478 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9896 meanLossQtgt_sigm: 0.6931
Episode: 9407 meanReward: 10.6250 meanLoss: 34.0808 meanLossQlbl: 21.1516 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5429 meanLossQtgt_sigm: 0.6931
Episode: 9408 meanReward: 10.7188 meanLoss: 33.0216 meanLossQlbl: 20.5922 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0431 meanLossQtgt_sigm: 0.6931
Episode: 9409 meanReward: 10.8438 meanLoss: 33.1090 meanLossQlbl: 20.6478 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9459 meanReward: 11.0938 meanLoss: 34.9581 meanLossQlbl: 21.6943 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8776 meanLossQtgt_sigm: 0.6931
Episode: 9460 meanReward: 11.2812 meanLoss: 32.4859 meanLossQlbl: 20.2566 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8430 meanLossQtgt_sigm: 0.6931
Episode: 9461 meanReward: 11.1562 meanLoss: 32.6219 meanLossQlbl: 20.3763 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8593 meanLossQtgt_sigm: 0.6931
Episode: 9462 meanReward: 11.1250 meanLoss: 32.8943 meanLossQlbl: 20.5912 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9168 meanLossQtgt_sigm: 0.6931
Episode: 9463 meanReward: 11.2812 meanLoss: 29.3855 meanLossQlbl: 18.6117 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3875 meanLossQtgt_sigm: 0.6931
Episode: 9464 meanReward: 11.1562 meanLoss: 35.7192 meanLossQlbl: 22.2102 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1227 meanLossQtgt_sigm: 0.6931
Episode: 9465 meanReward: 11.1562 meanLoss: 36.7560 meanLossQlbl: 22.8321 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9515 meanReward: 10.7812 meanLoss: 33.6560 meanLossQlbl: 20.9209 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3488 meanLossQtgt_sigm: 0.6931
Episode: 9516 meanReward: 10.7812 meanLoss: 30.7105 meanLossQlbl: 19.2337 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0905 meanLossQtgt_sigm: 0.6931
Episode: 9517 meanReward: 10.7188 meanLoss: 29.1854 meanLossQlbl: 18.4049 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3942 meanLossQtgt_sigm: 0.6931
Episode: 9518 meanReward: 10.7188 meanLoss: 29.9572 meanLossQlbl: 19.0079 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5630 meanLossQtgt_sigm: 0.6931
Episode: 9519 meanReward: 10.8438 meanLoss: 34.8671 meanLossQlbl: 21.6201 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8607 meanLossQtgt_sigm: 0.6931
Episode: 9520 meanReward: 10.8125 meanLoss: 34.2044 meanLossQlbl: 21.2852 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5329 meanLossQtgt_sigm: 0.6931
Episode: 9521 meanReward: 10.8438 meanLoss: 34.9537 meanLossQlbl: 21.6905 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 

Episode: 9571 meanReward: 11.0000 meanLoss: 34.1958 meanLossQlbl: 21.2606 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5489 meanLossQtgt_sigm: 0.6931
Episode: 9572 meanReward: 10.9062 meanLoss: 34.0909 meanLossQlbl: 21.1520 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5526 meanLossQtgt_sigm: 0.6931
Episode: 9573 meanReward: 10.9062 meanLoss: 32.0623 meanLossQlbl: 20.0723 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6037 meanLossQtgt_sigm: 0.6931
Episode: 9574 meanReward: 10.8750 meanLoss: 29.8266 meanLossQlbl: 19.0458 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3945 meanLossQtgt_sigm: 0.6931
Episode: 9575 meanReward: 10.9688 meanLoss: 34.3289 meanLossQlbl: 21.3142 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6284 meanLossQtgt_sigm: 0.6931
Episode: 9576 meanReward: 11.1562 meanLoss: 33.6933 meanLossQlbl: 20.9568 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3502 meanLossQtgt_sigm: 0.6931
Episode: 9577 meanReward: 11.1250 meanLoss: 34.7217 meanLossQlbl: 21.5137 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9627 meanReward: 11.4062 meanLoss: 30.8756 meanLossQlbl: 19.2473 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2420 meanLossQtgt_sigm: 0.6931
Episode: 9628 meanReward: 11.3438 meanLoss: 31.7940 meanLossQlbl: 19.8393 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5684 meanLossQtgt_sigm: 0.6931
Episode: 9629 meanReward: 11.3125 meanLoss: 30.7789 meanLossQlbl: 19.3361 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0565 meanLossQtgt_sigm: 0.6931
Episode: 9630 meanReward: 11.3125 meanLoss: 29.8621 meanLossQlbl: 18.9451 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5307 meanLossQtgt_sigm: 0.6931
Episode: 9631 meanReward: 11.1875 meanLoss: 32.0644 meanLossQlbl: 20.1971 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4810 meanLossQtgt_sigm: 0.6931
Episode: 9632 meanReward: 11.1562 meanLoss: 34.3819 meanLossQlbl: 21.5412 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4544 meanLossQtgt_sigm: 0.6931
Episode: 9633 meanReward: 11.2188 meanLoss: 33.7889 meanLossQlbl: 21.2338 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9683 meanReward: 10.5625 meanLoss: 29.4838 meanLossQlbl: 18.7636 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3339 meanLossQtgt_sigm: 0.6931
Episode: 9684 meanReward: 10.5625 meanLoss: 32.1931 meanLossQlbl: 20.3270 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4798 meanLossQtgt_sigm: 0.6931
Episode: 9685 meanReward: 10.5312 meanLoss: 34.8335 meanLossQlbl: 21.8449 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6024 meanLossQtgt_sigm: 0.6931
Episode: 9686 meanReward: 10.4375 meanLoss: 35.2744 meanLossQlbl: 22.0159 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8723 meanLossQtgt_sigm: 0.6931
Episode: 9687 meanReward: 10.4375 meanLoss: 36.0260 meanLossQlbl: 22.3242 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3155 meanLossQtgt_sigm: 0.6931
Episode: 9688 meanReward: 10.3438 meanLoss: 35.6839 meanLossQlbl: 22.2105 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0871 meanLossQtgt_sigm: 0.6931
Episode: 9689 meanReward: 10.3125 meanLoss: 31.4414 meanLossQlbl: 19.8273 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9739 meanReward: 10.6562 meanLoss: 34.8967 meanLossQlbl: 21.8391 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6713 meanLossQtgt_sigm: 0.6931
Episode: 9740 meanReward: 10.5938 meanLoss: 34.6944 meanLossQlbl: 21.7714 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5367 meanLossQtgt_sigm: 0.6931
Episode: 9741 meanReward: 10.4375 meanLoss: 34.9692 meanLossQlbl: 21.9644 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6185 meanLossQtgt_sigm: 0.6931
Episode: 9742 meanReward: 10.5625 meanLoss: 34.9423 meanLossQlbl: 21.8808 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6752 meanLossQtgt_sigm: 0.6931
Episode: 9743 meanReward: 10.4688 meanLoss: 35.4343 meanLossQlbl: 21.9997 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0483 meanLossQtgt_sigm: 0.6931
Episode: 9744 meanReward: 10.3750 meanLoss: 34.1361 meanLossQlbl: 21.2341 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5157 meanLossQtgt_sigm: 0.6931
Episode: 9745 meanReward: 10.5938 meanLoss: 31.7436 meanLossQlbl: 19.8752 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 9795 meanReward: 11.6875 meanLoss: 32.3834 meanLossQlbl: 20.1775 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8196 meanLossQtgt_sigm: 0.6931
Episode: 9796 meanReward: 11.6562 meanLoss: 32.0227 meanLossQlbl: 20.0057 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6307 meanLossQtgt_sigm: 0.6931
Episode: 9797 meanReward: 11.6875 meanLoss: 34.1091 meanLossQlbl: 21.1704 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5525 meanLossQtgt_sigm: 0.6931
Episode: 9798 meanReward: 11.5938 meanLoss: 32.8007 meanLossQlbl: 20.4301 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9843 meanLossQtgt_sigm: 0.6931
Episode: 9799 meanReward: 11.4062 meanLoss: 31.1385 meanLossQlbl: 19.6739 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0784 meanLossQtgt_sigm: 0.6931
Episode: 9800 meanReward: 11.4062 meanLoss: 31.3440 meanLossQlbl: 19.7817 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1760 meanLossQtgt_sigm: 0.6931
Episode: 9801 meanReward: 11.2500 meanLoss: 33.8353 meanLossQlbl: 21.2546 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 9851 meanReward: 10.5938 meanLoss: 32.6106 meanLossQlbl: 20.3859 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.8384 meanLossQtgt_sigm: 0.6931
Episode: 9852 meanReward: 10.5625 meanLoss: 36.0537 meanLossQlbl: 22.3894 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2780 meanLossQtgt_sigm: 0.6931
Episode: 9853 meanReward: 10.5938 meanLoss: 34.2355 meanLossQlbl: 21.3193 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5300 meanLossQtgt_sigm: 0.6931
Episode: 9854 meanReward: 10.5938 meanLoss: 29.5705 meanLossQlbl: 18.8121 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.3721 meanLossQtgt_sigm: 0.6931
Episode: 9855 meanReward: 10.4375 meanLoss: 33.5367 meanLossQlbl: 21.0614 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0890 meanLossQtgt_sigm: 0.6931
Episode: 9856 meanReward: 10.4688 meanLoss: 33.7408 meanLossQlbl: 21.2202 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1343 meanLossQtgt_sigm: 0.6931
Episode: 9857 meanReward: 10.5625 meanLoss: 34.6540 meanLossQlbl: 21.7475 meanLossQlbl_sigm: 0.6931 meanLossQtgt:

Episode: 9907 meanReward: 10.3750 meanLoss: 34.0443 meanLossQlbl: 21.3476 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3105 meanLossQtgt_sigm: 0.6931
Episode: 9908 meanReward: 10.5312 meanLoss: 34.3467 meanLossQlbl: 21.4380 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5224 meanLossQtgt_sigm: 0.6931
Episode: 9909 meanReward: 10.5625 meanLoss: 35.0435 meanLossQlbl: 21.8179 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8393 meanLossQtgt_sigm: 0.6931
Episode: 9910 meanReward: 10.4688 meanLoss: 34.6634 meanLossQlbl: 21.5758 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7013 meanLossQtgt_sigm: 0.6931
Episode: 9911 meanReward: 10.5625 meanLoss: 32.9478 meanLossQlbl: 20.6110 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9504 meanLossQtgt_sigm: 0.6931
Episode: 9912 meanReward: 10.6250 meanLoss: 35.3360 meanLossQlbl: 21.9817 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9680 meanLossQtgt_sigm: 0.6931
Episode: 9913 meanReward: 10.5000 meanLoss: 34.7110 meanLossQlbl: 21.6768 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 9963 meanReward: 10.1562 meanLoss: 36.2817 meanLossQlbl: 22.4866 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4088 meanLossQtgt_sigm: 0.6931
Episode: 9964 meanReward: 10.1875 meanLoss: 31.2900 meanLossQlbl: 19.7188 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1850 meanLossQtgt_sigm: 0.6931
Episode: 9965 meanReward: 10.0625 meanLoss: 34.0561 meanLossQlbl: 21.3712 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2986 meanLossQtgt_sigm: 0.6931
Episode: 9966 meanReward: 10.1562 meanLoss: 35.3426 meanLossQlbl: 22.0615 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8948 meanLossQtgt_sigm: 0.6931
Episode: 9967 meanReward: 10.2188 meanLoss: 35.8220 meanLossQlbl: 22.2615 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1743 meanLossQtgt_sigm: 0.6931
Episode: 9968 meanReward: 10.2188 meanLoss: 34.4776 meanLossQlbl: 21.4404 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6509 meanLossQtgt_sigm: 0.6931
Episode: 9969 meanReward: 10.1875 meanLoss: 31.3120 meanLossQlbl: 19.8107 meanLossQlbl_sigm: 0.6931 meanLossQtgt

Episode: 10019 meanReward: 11.0938 meanLoss: 31.2199 meanLossQlbl: 19.8138 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.0199 meanLossQtgt_sigm: 0.6931
Episode: 10020 meanReward: 10.9062 meanLoss: 35.7066 meanLossQlbl: 22.3889 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.9314 meanLossQtgt_sigm: 0.6931
Episode: 10021 meanReward: 11.0312 meanLoss: 36.2757 meanLossQlbl: 22.6388 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2505 meanLossQtgt_sigm: 0.6931
Episode: 10022 meanReward: 11.0625 meanLoss: 36.4337 meanLossQlbl: 22.5756 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4718 meanLossQtgt_sigm: 0.6931
Episode: 10023 meanReward: 11.0312 meanLoss: 36.0804 meanLossQlbl: 22.4040 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.2902 meanLossQtgt_sigm: 0.6931
Episode: 10024 meanReward: 10.9062 meanLoss: 32.4905 meanLossQlbl: 20.4730 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6312 meanLossQtgt_sigm: 0.6931
Episode: 10025 meanReward: 10.9688 meanLoss: 34.3876 meanLossQlbl: 21.5596 meanLossQlbl_sigm: 0.6931 meanL

Episode: 10075 meanReward: 10.3750 meanLoss: 33.9572 meanLossQlbl: 21.3395 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2314 meanLossQtgt_sigm: 0.6931
Episode: 10076 meanReward: 10.4062 meanLoss: 34.9197 meanLossQlbl: 21.8993 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6341 meanLossQtgt_sigm: 0.6931
Episode: 10077 meanReward: 10.5312 meanLoss: 35.1920 meanLossQlbl: 21.9130 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8927 meanLossQtgt_sigm: 0.6931
Episode: 10078 meanReward: 10.7500 meanLoss: 35.4263 meanLossQlbl: 21.9566 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0833 meanLossQtgt_sigm: 0.6931
Episode: 10079 meanReward: 10.6562 meanLoss: 34.1955 meanLossQlbl: 21.2199 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5893 meanLossQtgt_sigm: 0.6931
Episode: 10080 meanReward: 10.6250 meanLoss: 32.0038 meanLossQlbl: 20.0970 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5205 meanLossQtgt_sigm: 0.6931
Episode: 10081 meanReward: 10.6875 meanLoss: 28.1934 meanLossQlbl: 17.9644 meanLossQlbl_sigm: 0.6931 meanL

Episode: 10132 meanReward: 11.1875 meanLoss: 29.9112 meanLossQlbl: 18.9837 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 9.5412 meanLossQtgt_sigm: 0.6931
Episode: 10133 meanReward: 11.0938 meanLoss: 32.3473 meanLossQlbl: 20.3469 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.6141 meanLossQtgt_sigm: 0.6931
Episode: 10134 meanReward: 11.1562 meanLoss: 34.1687 meanLossQlbl: 21.4339 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3485 meanLossQtgt_sigm: 0.6931
Episode: 10135 meanReward: 11.2188 meanLoss: 35.2590 meanLossQlbl: 22.0564 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8163 meanLossQtgt_sigm: 0.6931
Episode: 10136 meanReward: 11.1875 meanLoss: 35.4598 meanLossQlbl: 22.0733 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0002 meanLossQtgt_sigm: 0.6931
Episode: 10137 meanReward: 11.0938 meanLoss: 35.0502 meanLossQlbl: 21.8955 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7684 meanLossQtgt_sigm: 0.6931
Episode: 10138 meanReward: 11.0312 meanLoss: 32.6844 meanLossQlbl: 20.5385 meanLossQlbl_sigm: 0.6931 meanLo

Episode: 10188 meanReward: 11.3125 meanLoss: 36.0653 meanLossQlbl: 22.3177 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3613 meanLossQtgt_sigm: 0.6931
Episode: 10189 meanReward: 11.5000 meanLoss: 34.0464 meanLossQlbl: 21.1857 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4744 meanLossQtgt_sigm: 0.6931
Episode: 10190 meanReward: 11.4688 meanLoss: 33.4310 meanLossQlbl: 20.8474 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1973 meanLossQtgt_sigm: 0.6931
Episode: 10191 meanReward: 11.4688 meanLoss: 31.2189 meanLossQlbl: 19.5555 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.2771 meanLossQtgt_sigm: 0.6931
Episode: 10192 meanReward: 11.4688 meanLoss: 25.5576 meanLossQlbl: 16.4132 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 7.7581 meanLossQtgt_sigm: 0.6931
Episode: 10193 meanReward: 11.5625 meanLoss: 32.1566 meanLossQlbl: 20.2435 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5268 meanLossQtgt_sigm: 0.6931
Episode: 10194 meanReward: 11.7500 meanLoss: 35.9215 meanLossQlbl: 22.2066 meanLossQlbl_sigm: 0.6931 meanLo

Episode: 10244 meanReward: 10.6875 meanLoss: 35.5265 meanLossQlbl: 22.0116 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1286 meanLossQtgt_sigm: 0.6931
Episode: 10245 meanReward: 10.7188 meanLoss: 33.2217 meanLossQlbl: 20.7879 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0475 meanLossQtgt_sigm: 0.6931
Episode: 10246 meanReward: 10.8125 meanLoss: 33.6295 meanLossQlbl: 21.0322 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.2110 meanLossQtgt_sigm: 0.6931
Episode: 10247 meanReward: 10.8438 meanLoss: 33.9091 meanLossQlbl: 21.1604 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.3625 meanLossQtgt_sigm: 0.6931
Episode: 10248 meanReward: 11.0625 meanLoss: 34.0912 meanLossQlbl: 21.2058 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4991 meanLossQtgt_sigm: 0.6931
Episode: 10249 meanReward: 11.0000 meanLoss: 33.1930 meanLossQlbl: 20.7139 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.0928 meanLossQtgt_sigm: 0.6931
Episode: 10250 meanReward: 11.0000 meanLoss: 33.2060 meanLossQlbl: 20.7668 meanLossQlbl_sigm: 0.6931 meanL

Episode: 10300 meanReward: 10.2500 meanLoss: 33.0852 meanLossQlbl: 20.7894 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.9095 meanLossQtgt_sigm: 0.6931
Episode: 10301 meanReward: 10.2812 meanLoss: 35.6530 meanLossQlbl: 22.1467 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1200 meanLossQtgt_sigm: 0.6931
Episode: 10302 meanReward: 10.2812 meanLoss: 36.0627 meanLossQlbl: 22.3235 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.3529 meanLossQtgt_sigm: 0.6931
Episode: 10303 meanReward: 10.4688 meanLoss: 35.1820 meanLossQlbl: 21.7495 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.0462 meanLossQtgt_sigm: 0.6931
Episode: 10304 meanReward: 10.4375 meanLoss: 34.4258 meanLossQlbl: 21.2678 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.7717 meanLossQtgt_sigm: 0.6931
Episode: 10305 meanReward: 10.4375 meanLoss: 30.8837 meanLossQlbl: 19.3569 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.1405 meanLossQtgt_sigm: 0.6931
Episode: 10306 meanReward: 10.5312 meanLoss: 30.3325 meanLossQlbl: 19.1494 meanLossQlbl_sigm: 0.6931 meanL

Episode: 10356 meanReward: 10.7500 meanLoss: 35.6147 meanLossQlbl: 22.0795 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.1489 meanLossQtgt_sigm: 0.6931
Episode: 10357 meanReward: 10.5312 meanLoss: 34.6108 meanLossQlbl: 21.6959 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.5286 meanLossQtgt_sigm: 0.6931
Episode: 10358 meanReward: 10.5625 meanLoss: 31.8748 meanLossQlbl: 20.0867 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.4018 meanLossQtgt_sigm: 0.6931
Episode: 10359 meanReward: 10.5625 meanLoss: 34.3987 meanLossQlbl: 21.5696 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4428 meanLossQtgt_sigm: 0.6931
Episode: 10360 meanReward: 10.5312 meanLoss: 35.0443 meanLossQlbl: 21.9868 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.6712 meanLossQtgt_sigm: 0.6931
Episode: 10361 meanReward: 10.5312 meanLoss: 34.5742 meanLossQlbl: 21.7174 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.4705 meanLossQtgt_sigm: 0.6931
Episode: 10362 meanReward: 10.4062 meanLoss: 34.3366 meanLossQlbl: 21.6211 meanLossQlbl_sigm: 0.6931 meanL

Episode: 10412 meanReward: 10.8125 meanLoss: 31.9797 meanLossQlbl: 20.0741 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.5193 meanLossQtgt_sigm: 0.6931
Episode: 10413 meanReward: 10.8125 meanLoss: 28.1479 meanLossQlbl: 17.9798 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 8.7818 meanLossQtgt_sigm: 0.6931
Episode: 10414 meanReward: 10.7188 meanLoss: 32.6927 meanLossQlbl: 20.5972 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 10.7092 meanLossQtgt_sigm: 0.6931
Episode: 10415 meanReward: 10.8125 meanLoss: 36.1989 meanLossQlbl: 22.3763 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 12.4364 meanLossQtgt_sigm: 0.6931
Episode: 10416 meanReward: 10.9062 meanLoss: 34.8690 meanLossQlbl: 21.6098 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.8730 meanLossQtgt_sigm: 0.6931
Episode: 10417 meanReward: 10.8750 meanLoss: 33.3107 meanLossQlbl: 20.7657 meanLossQlbl_sigm: 0.6931 meanLossQtgt: 11.1587 meanLossQtgt_sigm: 0.6931
Episode: 10418 meanReward: 10.8438 meanLoss: 34.0143 meanLossQlbl: 21.3296 meanLossQlbl_sigm: 0.6931 meanLo

# Visualizing training

Below I'll plot the total rewards for each episode. I'm plotting the rolling average too, in blue.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[N:] - cumsum[:-N]) / N 

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Average losses')

## Testing

Let's checkout how our trained agent plays the game.

In [184]:
import gym

# Create the Cart-Pole game environment
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model-seq.ckpt')    
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    initial_state = sess.run(model.initial_state) # Qs or current batch or states[:-1]
    state = env.reset()
    total_reward = 0
    while True:
        env.render()
        action_logits, initial_state = sess.run([model.actions_logits, model.final_state],
                                                feed_dict = {model.states: state.reshape([1, -1]), 
                                                             model.initial_state: initial_state})
        action = np.argmax(action_logits)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
print('total_reward:{}'.format(total_reward))
env.close()

[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m
[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m
INFO:tensorflow:Restoring parameters from checkpoints/model.ckpt




total_reward:120.0


## Extending this

So, Cart-Pole is a pretty simple game. However, the same model can be used to train an agent to play something much more complicated like Pong or Space Invaders. Instead of a state like we're using here though, you'd want to use convolutional layers to get the state from the screen images.

![Deep Q-Learning Atari](assets/atari-network.png)

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. Here's the original paper which will get you started: http://www.davidqiu.com:8888/research/nature14236.pdf.