# Sequential DQN

In this notebook, we'll build a neural network that can learn to play games through reinforcement learning. More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

![Cart-Pole](assets/cart-pole.jpg)

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
# In this one we should define and detect GPUs for tensorflow
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

  from ._conv import register_converters as _register_converters


TensorFlow Version: 1.7.1
Default GPU Device: 


>**Note:** Make sure you have OpenAI Gym cloned into the same directory with this notebook. I've included `gym` as a submodule, so you can run `git submodule --init --recursive` to pull the contents into the `gym` repo.

>**Note:** Make sure you have OpenAI Gym cloned. Then run this command `pip install -e gym/[all]`.

In [2]:
import gym

# Create the Cart-Pole game environment
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

We interact with the simulation through `env`. To show the simulation running, you can use `env.render()` to render one frame. Passing in an action as an integer to `env.step` will generate the next step in the simulation.  You can see how many actions are possible from `env.action_space` and to get a random action you can use `env.action_space.sample()`. This is general to all Gym games. In the Cart-Pole game, there are two possible actions, moving the cart left or right. So there are two actions we can take, encoded as 0 and 1.

Run the code below to watch the simulation run.

In [3]:
import numpy as np
state = env.reset()
for _ in range(10):
    # env.render()
    action = env.action_space.sample()
    next_state, reward, done, info = env.step(action) # take a random action
    #print('state, action, next_state, reward, done, info:', state, action, next_state, reward, done, info)
    state = next_state
    if done:
        state = env.reset()

To shut the window showing the simulation, use `env.close()`.

If you ran the simulation above, we can look at the rewards:

In [4]:
# print(rewards[-20:])
# print(np.array(rewards).shape, np.array(states).shape, np.array(actions).shape, np.array(dones).shape)
# print(np.array(rewards).dtype, np.array(states).dtype, np.array(actions).dtype, np.array(dones).dtype)
# print(np.max(np.array(actions)), np.min(np.array(actions)))
# print((np.max(np.array(actions)) - np.min(np.array(actions)))+1)
# print(np.max(np.array(rewards)), np.min(np.array(rewards)))
# print(np.max(np.array(states)), np.min(np.array(states)))

The game resets after the pole has fallen past a certain angle. For each frame while the simulation is running, it returns a reward of 1.0. The longer the game runs, the more reward we get. Then, our network's goal is to maximize the reward by keeping the pole vertical. It will do this by moving the cart to the left and the right.

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max{Q(s', a')}
$$

where $s$ is a state, $a$ is an action, and $s'$ is the next state from state $s$ and action $a$.

Before we used this equation to learn values for a Q-_table_. However, for this game there are a huge number of states available. The state has four values: the position and velocity of the cart, and the position and velocity of the pole. These are all real-valued numbers, so ignoring floating point precisions, you practically have infinite states. Instead of using a table then, we'll replace it with a neural network that will approximate the Q-table lookup function.

<img src="assets/deep-q-learning.png" width=450px>

Now, our Q value, $Q(s, a)$ is calculated by passing in a state to the network. The output will be Q-values for each available action, with fully connected hidden layers.

<img src="assets/q-network.png" width=550px>


As I showed before, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max{Q(s', a')}$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$. 

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.

Below is my implementation of the Q-network. I used two fully connected layers with ReLU activations. Two seems to be good enough, three might be better. Feel free to try it out.

In [5]:
def model_input(state_size, hidden_size, batch_size=1):
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    actions = tf.placeholder(tf.int32, [None], name='actions')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    # RNN
    #cell = tf.nn.rnn_cell.GRUCell(hidden_size)
    #cell = tf.nn.rnn_cell.LSTMCell(hidden_size) #??? NOT working
    cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
    cells = tf.nn.rnn_cell.MultiRNNCell([cell], state_is_tuple=True)
    initial_state = cells.zero_state(batch_size, tf.float32)
    return states, actions, targetQs, cells, initial_state

In [6]:
# RNN generator or sequence generator
def generator(states, action_size, initial_state, cells, hidden_size, reuse=False): 
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        inputs = tf.layers.dense(inputs=states, units=hidden_size)
        print(states.shape, inputs.shape)
        
        # with tf.variable_scope('dynamic_rnn_', reuse=tf.AUTO_REUSE):
        # dynamic means adapt to the batch_size and
        # static means can NOT adapt to the batch_size
        inputs_rnn = tf.reshape(inputs, [1, -1, hidden_size]) # NxH -> 1xNxH
        print(inputs_rnn.shape, initial_state)
        outputs_rnn, final_state = tf.nn.dynamic_rnn(cell=cells, inputs=inputs_rnn, initial_state=initial_state)
        print(outputs_rnn.shape, final_state)
        outputs = tf.reshape(outputs_rnn, [-1, hidden_size]) # 1xNxH -> NxH
        print(outputs.shape)

        # Last fully connected layer
        logits = tf.layers.dense(inputs=outputs, units=action_size)
        print(logits.shape)
        #predictions = tf.nn.softmax(logits)
        
        # logits are the action logits
        return logits, final_state

In [7]:
def model_loss(action_size, hidden_size, states, cells, initial_state, actions, targetQs):
    actions_logits, final_state = generator(states=states, cells=cells, initial_state=initial_state, 
                                            hidden_size=hidden_size, action_size=action_size)
    actions_labels = tf.one_hot(indices=actions, depth=action_size, dtype=actions_logits.dtype)
    Qs = tf.reduce_max(actions_logits*actions_labels, axis=1)
    loss = tf.reduce_mean(tf.square(Qs - targetQs))
    return actions_logits, final_state, loss

In [8]:
def model_opt(loss, learning_rate):
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]

    # # Optimize MLP/CNN
    # with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
    # #opt = tf.train.AdamOptimizer(learning_rate).minimize(loss, var_list=g_vars)

    # # Optimize RNN
    #grads, _ = tf.clip_by_global_norm(t_list=tf.gradients(loss, g_vars), clip_norm=5) # usually around 1-5
    grads = tf.gradients(loss, g_vars)
    opt = tf.train.AdamOptimizer(learning_rate).apply_gradients(grads_and_vars=zip(grads, g_vars))

    return opt

In [9]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs, cells, self.initial_state = model_input(
                state_size=state_size, hidden_size=hidden_size)
        
        # Create the Model: calculating the loss and forwad pass
        self.actions_logits, self.final_state, self.loss = model_loss(
            action_size=action_size, hidden_size=hidden_size, 
            states=self.states, actions=self.actions, 
            targetQs=self.targetQs, cells=cells, initial_state=self.initial_state)

        # Update the model: backward pass and backprop
        self.opt = model_opt(loss=self.loss, learning_rate=learning_rate)

## Experience replay

Reinforcement learning algorithms can have stability issues due to correlations between states. To reduce correlations when training, we can store the agent's experiences and later draw a random mini-batch of those experiences to train on. 

Here, we'll create a `Memory` object that will store our experiences, our transitions $<s, a, r, s'>$. This memory will have a maxmium capacity, so we can keep newer experiences in memory while getting rid of older experiences. Then, we'll sample a random mini-batch of transitions $<s, a, r, s'>$ and train on those.

Below, I've implemented a `Memory` object. If you're unfamiliar with `deque`, this is a double-ended queue. You can think of it like a tube open on both sides. You can put objects in either side of the tube. But if it's full, adding anything more will push an object out the other side. This is a great data structure to use for the memory buffer.

In [10]:
from collections import deque

class Memory():    
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
        self.states = deque(maxlen=max_size)

## Exploration - Exploitation

To learn about the environment and rules of the game, the agent needs to explore by taking random actions. We'll do this by choosing a random action with some probability $\epsilon$ (epsilon).  That is, with some probability $\epsilon$ the agent will make a random action and with probability $1 - \epsilon$, the agent will choose an action from $Q(s,a)$. This is called an **$\epsilon$-greedy policy**.


At first, the agent needs to do a lot of exploring. Later when it has learned more, the agent can favor choosing actions based on what it has learned. This is called _exploitation_. We'll set it up so the agent is more likely to explore early in training, then more likely to exploit later in training.

## Q-Learning training algorithm

Putting all this together, we can list out the algorithm we'll use to train the network. We'll train the network in _episodes_. One *episode* is one simulation of the game. For this game, the goal is to keep the pole upright for 195 frames. So we can start a new episode once meeting that goal. The game ends if the pole tilts over too far, or if the cart moves too far the left or right. When a game ends, we'll start a new episode. Now, to train the agent:

* Initialize the memory $D$
* Initialize the action-value network $Q$ with random weights
* **For** episode = 1, $M$ **do**
  * **For** $t$, $T$ **do**
     * With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \mathrm{argmax}_a Q(s,a)$
     * Execute action $a_t$ in simulator and observe reward $r_{t+1}$ and new state $s_{t+1}$
     * Store transition $<s_t, a_t, r_{t+1}, s_{t+1}>$ in memory $D$
     * Sample random mini-batch from $D$: $<s_j, a_j, r_j, s'_j>$
     * Set $\hat{Q}_j = r_j$ if the episode ends at $j+1$, otherwise set $\hat{Q}_j = r_j + \gamma \max_{a'}{Q(s'_j, a')}$
     * Make a gradient descent step with loss $(\hat{Q}_j - Q(s_j, a_j))^2$
  * **endfor**
* **endfor**

## Hyperparameters

One of the more difficult aspects of reinforcememt learning are the large number of hyperparameters. Not only are we tuning the network, but we're tuning the simulation.

In [11]:
# Network parameters
action_size = 2
state_size = 4
hidden_size = 64               # number of units in each Q-network hidden layer
learning_rate = 0.0001         # Q-network learning rate

# Memory parameters
memory_size = 128            # memory capacity - 1000 DQN
batch_size = 128             # experience mini-batch size - 20 DQN
gamma = 0.99                 # future reward discount

In [12]:
# Reset/init the graph/session
graph = tf.reset_default_graph()

# Init the model
model = Model(action_size=action_size, hidden_size=hidden_size, state_size=state_size, learning_rate=learning_rate)

# Init the memory
memory = Memory(max_size=batch_size)

(?, 4) (?, 64)
(1, ?, 64) (LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState/zeros:0' shape=(1, 64) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState/zeros_1:0' shape=(1, 64) dtype=float32>),)
(1, ?, 64) (LSTMStateTuple(c=<tf.Tensor 'generator/rnn/while/Exit_3:0' shape=(1, 64) dtype=float32>, h=<tf.Tensor 'generator/rnn/while/Exit_4:0' shape=(1, 64) dtype=float32>),)
(?, 64)
(?, 2)


In [13]:
model.initial_state[0]

LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState/zeros:0' shape=(1, 64) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState/zeros_1:0' shape=(1, 64) dtype=float32>)

## Populate the memory (exprience memory)

Here I'm re-initializing the simulation and pre-populating the memory. The agent is taking random actions and storing the transitions in memory. This will help the agent with exploring the game.

In [14]:
state = env.reset()
for _ in range(memory_size):
    action = env.action_space.sample()
    next_state, reward, done, _ = env.step(action)
    memory.buffer.append([state, action, next_state, reward, float(done)])
    #memory.states.append(np.zeros([1, hidden_size])) # gru
    memory.states.append([np.zeros([1, hidden_size]), np.zeros([1, hidden_size])]) # lstm
    state = next_state
    if done is True:
        # Reseting the env/first state
        state = env.reset()

In [15]:
# # Training
# batch = memory.buffer
# states = np.array([each[0] for each in batch])
# actions = np.array([each[1] for each in batch])
# next_states = np.array([each[2] for each in batch])
# rewards = np.array([each[3] for each in batch])
# dones = np.array([each[4] for each in batch])

In [16]:
# memory.states[0].shape, model.initial_state[0].shape # gru
memory.states[0][1].shape, model.initial_state[0][1].shape #lstm

((1, 64), TensorShape([Dimension(1), Dimension(64)]))

In [17]:
# memory.states[0][0].shape, model.initial_state[0][0].shape

## Training the model

Below we'll train our agent. If you want to watch it train, uncomment the `env.render()` line. This is slow because it's rendering the frames slower than the network can train. But, it's cool to watch the agent get better at the game.

In [18]:
# initial_states = np.array(memory.states)
# initial_states.shape

In [19]:
saver = tf.train.Saver()
episode_rewards_list, rewards_list, loss_list = [], [], []

# TF session for training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    episode_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        loss_batch = []
        state = env.reset()
        initial_state = sess.run(model.initial_state)

        # Training steps/batches
        while True:
            action_logits, final_state = sess.run([model.actions_logits, model.final_state],
                                                  feed_dict = {model.states: state.reshape([1, -1]), 
                                                               model.initial_state: initial_state})
            action = np.argmax(action_logits)
            next_state, reward, done, _ = env.step(action)
            memory.buffer.append([state, action, next_state, reward, float(done)])
            memory.states.append(initial_state)
            total_reward += reward
            state = next_state
            initial_state = final_state

            # Training
            batch = memory.buffer
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            next_states = np.array([each[2] for each in batch])
            rewards = np.array([each[3] for each in batch])
            dones = np.array([each[4] for each in batch])
            initial_states = memory.states
            next_actions_logits = sess.run(model.actions_logits, 
                                           feed_dict = {model.states: next_states,
                                                        model.initial_state: initial_states[1]})
            nextQs = np.max(next_actions_logits, axis=1) * (1-dones)
            targetQs = rewards + (gamma * nextQs)
            loss, _ = sess.run([model.loss, model.opt], feed_dict = {model.states: states, 
                                                                     model.actions: actions,
                                                                     model.targetQs: targetQs,
                                                                     model.initial_state: initial_states[0]})
            # End of training
            loss_batch.append(loss)
            if done is True:
                break
                
        # Outputing: priting out/Potting
        episode_reward.append(total_reward)
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episode_reward)),
              'R:{:.4f}'.format(total_reward),
              'loss:{:.4f}'.format(np.mean(loss_batch)))
        # Ploting out
        episode_rewards_list.append([ep, np.mean(episode_reward)])
        rewards_list.append([ep, total_reward])
        loss_list.append([ep, np.mean(loss_batch)])
        # Break episode/epoch loop
        if np.mean(episode_reward) >= 500:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model.ckpt')

Episode:0 meanR:9.0000 R:9.0000 loss:1.0361
Episode:1 meanR:9.0000 R:9.0000 loss:1.0564
Episode:2 meanR:9.0000 R:9.0000 loss:1.0656
Episode:3 meanR:8.7500 R:8.0000 loss:1.0961
Episode:4 meanR:8.8000 R:9.0000 loss:1.1396
Episode:5 meanR:8.6667 R:8.0000 loss:1.1949
Episode:6 meanR:8.8571 R:10.0000 loss:1.2547
Episode:7 meanR:9.0000 R:10.0000 loss:1.3066
Episode:8 meanR:8.8889 R:8.0000 loss:1.3911
Episode:9 meanR:8.9000 R:9.0000 loss:1.3862
Episode:10 meanR:9.0000 R:10.0000 loss:1.4211
Episode:11 meanR:9.0000 R:9.0000 loss:1.5247
Episode:12 meanR:9.0000 R:9.0000 loss:1.6145
Episode:13 meanR:9.0000 R:9.0000 loss:1.8368
Episode:14 meanR:9.0667 R:10.0000 loss:2.0206
Episode:15 meanR:9.0000 R:8.0000 loss:2.2681
Episode:16 meanR:9.0000 R:9.0000 loss:2.5478
Episode:17 meanR:9.0000 R:9.0000 loss:2.8759
Episode:18 meanR:8.9474 R:8.0000 loss:3.1003
Episode:19 meanR:8.9500 R:9.0000 loss:3.2063
Episode:20 meanR:8.9524 R:9.0000 loss:3.5263
Episode:21 meanR:8.9545 R:9.0000 loss:3.7069
Episode:22 meanR

Episode:179 meanR:9.3900 R:10.0000 loss:0.1023
Episode:180 meanR:9.3900 R:10.0000 loss:0.0949
Episode:181 meanR:9.3900 R:10.0000 loss:0.0919
Episode:182 meanR:9.3900 R:9.0000 loss:0.0895
Episode:183 meanR:9.4000 R:9.0000 loss:0.0929
Episode:184 meanR:9.4000 R:9.0000 loss:0.0898
Episode:185 meanR:9.3900 R:9.0000 loss:0.0978
Episode:186 meanR:9.4000 R:9.0000 loss:0.1017
Episode:187 meanR:9.3900 R:9.0000 loss:0.1019
Episode:188 meanR:9.4000 R:10.0000 loss:0.0856
Episode:189 meanR:9.4000 R:9.0000 loss:0.0828
Episode:190 meanR:9.4100 R:10.0000 loss:0.0741
Episode:191 meanR:9.4200 R:10.0000 loss:0.0660
Episode:192 meanR:9.4200 R:9.0000 loss:0.0650
Episode:193 meanR:9.4200 R:10.0000 loss:0.0603
Episode:194 meanR:9.4200 R:10.0000 loss:0.0677
Episode:195 meanR:9.4200 R:10.0000 loss:0.0630
Episode:196 meanR:9.4200 R:10.0000 loss:0.0622
Episode:197 meanR:9.4200 R:10.0000 loss:0.0573
Episode:198 meanR:9.4100 R:8.0000 loss:0.0659
Episode:199 meanR:9.4100 R:9.0000 loss:0.0724
Episode:200 meanR:9.420

Episode:356 meanR:9.2800 R:9.0000 loss:0.0672
Episode:357 meanR:9.2600 R:8.0000 loss:0.0697
Episode:358 meanR:9.2600 R:10.0000 loss:0.0629
Episode:359 meanR:9.2600 R:9.0000 loss:0.0599
Episode:360 meanR:9.2500 R:9.0000 loss:0.0588
Episode:361 meanR:9.2500 R:9.0000 loss:0.0548
Episode:362 meanR:9.2600 R:10.0000 loss:0.0540
Episode:363 meanR:9.2600 R:9.0000 loss:0.0565
Episode:364 meanR:9.2700 R:10.0000 loss:0.0566
Episode:365 meanR:9.2800 R:10.0000 loss:0.0670
Episode:366 meanR:9.2600 R:8.0000 loss:0.0574
Episode:367 meanR:9.2400 R:8.0000 loss:0.0418
Episode:368 meanR:9.2300 R:10.0000 loss:0.0563
Episode:369 meanR:9.2200 R:9.0000 loss:0.0734
Episode:370 meanR:9.2100 R:9.0000 loss:0.0508
Episode:371 meanR:9.2000 R:9.0000 loss:0.0576
Episode:372 meanR:9.2000 R:10.0000 loss:0.0446
Episode:373 meanR:9.1800 R:8.0000 loss:0.0436
Episode:374 meanR:9.1800 R:10.0000 loss:0.0571
Episode:375 meanR:9.2000 R:10.0000 loss:0.0518
Episode:376 meanR:9.2000 R:9.0000 loss:0.0533
Episode:377 meanR:9.2100 R

Episode:533 meanR:9.3300 R:9.0000 loss:0.0542
Episode:534 meanR:9.3300 R:10.0000 loss:0.0533
Episode:535 meanR:9.3500 R:11.0000 loss:0.0444
Episode:536 meanR:9.3600 R:10.0000 loss:0.0468
Episode:537 meanR:9.3700 R:10.0000 loss:0.0415
Episode:538 meanR:9.3500 R:8.0000 loss:0.0440
Episode:539 meanR:9.3600 R:10.0000 loss:0.0435
Episode:540 meanR:9.3800 R:10.0000 loss:0.0453
Episode:541 meanR:9.4000 R:10.0000 loss:0.0420
Episode:542 meanR:9.4000 R:10.0000 loss:0.0479
Episode:543 meanR:9.4000 R:8.0000 loss:0.0566
Episode:544 meanR:9.3800 R:8.0000 loss:0.0613
Episode:545 meanR:9.3900 R:11.0000 loss:0.0505
Episode:546 meanR:9.3900 R:9.0000 loss:0.0552
Episode:547 meanR:9.3900 R:10.0000 loss:0.0530
Episode:548 meanR:9.3900 R:8.0000 loss:0.0571
Episode:549 meanR:9.3800 R:9.0000 loss:0.0532
Episode:550 meanR:9.3900 R:9.0000 loss:0.0527
Episode:551 meanR:9.3900 R:10.0000 loss:0.0540
Episode:552 meanR:9.3800 R:9.0000 loss:0.0573
Episode:553 meanR:9.3800 R:9.0000 loss:0.0464
Episode:554 meanR:9.390

Episode:710 meanR:9.4200 R:10.0000 loss:0.0391
Episode:711 meanR:9.4400 R:11.0000 loss:0.0332
Episode:712 meanR:9.4400 R:9.0000 loss:0.0547
Episode:713 meanR:9.4300 R:9.0000 loss:0.0435
Episode:714 meanR:9.4300 R:9.0000 loss:0.0426
Episode:715 meanR:9.4500 R:10.0000 loss:0.0522
Episode:716 meanR:9.4600 R:9.0000 loss:0.0453
Episode:717 meanR:9.4600 R:10.0000 loss:0.0385
Episode:718 meanR:9.4500 R:9.0000 loss:0.0467
Episode:719 meanR:9.4500 R:10.0000 loss:0.0416
Episode:720 meanR:9.4400 R:8.0000 loss:0.0434
Episode:721 meanR:9.4400 R:9.0000 loss:0.0372
Episode:722 meanR:9.4300 R:9.0000 loss:0.0416
Episode:723 meanR:9.4200 R:9.0000 loss:0.0323
Episode:724 meanR:9.4200 R:10.0000 loss:0.0309
Episode:725 meanR:9.4400 R:11.0000 loss:0.0267
Episode:726 meanR:9.4400 R:10.0000 loss:0.0281
Episode:727 meanR:9.4400 R:9.0000 loss:0.0279
Episode:728 meanR:9.4400 R:10.0000 loss:0.0295
Episode:729 meanR:9.4500 R:11.0000 loss:0.0251
Episode:730 meanR:9.4400 R:9.0000 loss:0.0329
Episode:731 meanR:9.4400

Episode:887 meanR:9.3200 R:9.0000 loss:0.0347
Episode:888 meanR:9.3200 R:10.0000 loss:0.0307
Episode:889 meanR:9.3200 R:10.0000 loss:0.0270
Episode:890 meanR:9.3200 R:8.0000 loss:0.0306
Episode:891 meanR:9.3200 R:9.0000 loss:0.0307
Episode:892 meanR:9.3100 R:9.0000 loss:0.0286
Episode:893 meanR:9.3100 R:10.0000 loss:0.0312
Episode:894 meanR:9.3000 R:9.0000 loss:0.0295
Episode:895 meanR:9.3000 R:10.0000 loss:0.0285
Episode:896 meanR:9.3100 R:10.0000 loss:0.0297
Episode:897 meanR:9.3100 R:10.0000 loss:0.0309
Episode:898 meanR:9.3300 R:10.0000 loss:0.0274
Episode:899 meanR:9.3400 R:9.0000 loss:0.0281
Episode:900 meanR:9.3500 R:9.0000 loss:0.0312
Episode:901 meanR:9.3300 R:8.0000 loss:0.0330
Episode:902 meanR:9.3500 R:10.0000 loss:0.0292
Episode:903 meanR:9.3500 R:9.0000 loss:0.0310
Episode:904 meanR:9.3400 R:9.0000 loss:0.0328
Episode:905 meanR:9.3500 R:10.0000 loss:0.0288
Episode:906 meanR:9.3700 R:10.0000 loss:0.0254
Episode:907 meanR:9.3600 R:9.0000 loss:0.0257
Episode:908 meanR:9.3600

Episode:1062 meanR:9.2500 R:9.0000 loss:0.0226
Episode:1063 meanR:9.2400 R:9.0000 loss:0.0204
Episode:1064 meanR:9.2600 R:10.0000 loss:0.0182
Episode:1065 meanR:9.2500 R:9.0000 loss:0.0206
Episode:1066 meanR:9.2500 R:9.0000 loss:0.0236
Episode:1067 meanR:9.2400 R:9.0000 loss:0.0132
Episode:1068 meanR:9.2300 R:9.0000 loss:0.0264
Episode:1069 meanR:9.2500 R:10.0000 loss:0.0189
Episode:1070 meanR:9.2600 R:10.0000 loss:0.0233
Episode:1071 meanR:9.2700 R:9.0000 loss:0.0251
Episode:1072 meanR:9.2900 R:10.0000 loss:0.0204
Episode:1073 meanR:9.2800 R:8.0000 loss:0.0227
Episode:1074 meanR:9.2700 R:9.0000 loss:0.0313
Episode:1075 meanR:9.2800 R:10.0000 loss:0.0247
Episode:1076 meanR:9.2900 R:9.0000 loss:0.0255
Episode:1077 meanR:9.2900 R:10.0000 loss:0.0245
Episode:1078 meanR:9.2800 R:9.0000 loss:0.0375
Episode:1079 meanR:9.2800 R:10.0000 loss:0.0256
Episode:1080 meanR:9.2700 R:9.0000 loss:0.0313
Episode:1081 meanR:9.2600 R:9.0000 loss:0.0263
Episode:1082 meanR:9.2600 R:10.0000 loss:0.0361
Episo

Episode:1235 meanR:9.4300 R:9.0000 loss:0.0176
Episode:1236 meanR:9.4100 R:8.0000 loss:0.0188
Episode:1237 meanR:9.3900 R:8.0000 loss:0.0182
Episode:1238 meanR:9.3900 R:9.0000 loss:0.0173
Episode:1239 meanR:9.3800 R:9.0000 loss:0.0198
Episode:1240 meanR:9.3700 R:8.0000 loss:0.0163
Episode:1241 meanR:9.3600 R:9.0000 loss:0.0144
Episode:1242 meanR:9.3500 R:9.0000 loss:0.0167
Episode:1243 meanR:9.3500 R:10.0000 loss:0.0172
Episode:1244 meanR:9.3800 R:11.0000 loss:0.0209
Episode:1245 meanR:9.3700 R:9.0000 loss:0.0196
Episode:1246 meanR:9.3800 R:9.0000 loss:0.0169
Episode:1247 meanR:9.3800 R:10.0000 loss:0.0176
Episode:1248 meanR:9.3700 R:9.0000 loss:0.0238
Episode:1249 meanR:9.3700 R:9.0000 loss:0.0230
Episode:1250 meanR:9.3600 R:8.0000 loss:0.0233
Episode:1251 meanR:9.3600 R:9.0000 loss:0.0238
Episode:1252 meanR:9.3700 R:10.0000 loss:0.0267
Episode:1253 meanR:9.3700 R:10.0000 loss:0.0235
Episode:1254 meanR:9.3800 R:10.0000 loss:0.0221
Episode:1255 meanR:9.3900 R:10.0000 loss:0.0193
Episod

Episode:1408 meanR:9.4000 R:9.0000 loss:0.0122
Episode:1409 meanR:9.4100 R:10.0000 loss:0.0128
Episode:1410 meanR:9.4300 R:10.0000 loss:0.0118
Episode:1411 meanR:9.4100 R:8.0000 loss:0.0132
Episode:1412 meanR:9.4100 R:10.0000 loss:0.0123
Episode:1413 meanR:9.4200 R:10.0000 loss:0.0130
Episode:1414 meanR:9.4200 R:8.0000 loss:0.0151
Episode:1415 meanR:9.4300 R:10.0000 loss:0.0145
Episode:1416 meanR:9.4300 R:9.0000 loss:0.0127
Episode:1417 meanR:9.4400 R:10.0000 loss:0.0111
Episode:1418 meanR:9.4400 R:9.0000 loss:0.0111
Episode:1419 meanR:9.4500 R:10.0000 loss:0.0110
Episode:1420 meanR:9.4500 R:10.0000 loss:0.0126
Episode:1421 meanR:9.4700 R:10.0000 loss:0.0133
Episode:1422 meanR:9.4700 R:10.0000 loss:0.0119
Episode:1423 meanR:9.4600 R:9.0000 loss:0.0105
Episode:1424 meanR:9.4700 R:9.0000 loss:0.0119
Episode:1425 meanR:9.4700 R:10.0000 loss:0.0106
Episode:1426 meanR:9.4500 R:8.0000 loss:0.0103
Episode:1427 meanR:9.4600 R:10.0000 loss:0.0138
Episode:1428 meanR:9.4500 R:8.0000 loss:0.0144
E

Episode:1581 meanR:9.3300 R:10.0000 loss:0.0086
Episode:1582 meanR:9.3500 R:10.0000 loss:0.0071
Episode:1583 meanR:9.3500 R:10.0000 loss:0.0058
Episode:1584 meanR:9.3700 R:11.0000 loss:0.0084
Episode:1585 meanR:9.3600 R:8.0000 loss:0.0064
Episode:1586 meanR:9.3500 R:10.0000 loss:0.0097
Episode:1587 meanR:9.3400 R:9.0000 loss:0.0106
Episode:1588 meanR:9.3400 R:10.0000 loss:0.0105
Episode:1589 meanR:9.3200 R:8.0000 loss:0.0133
Episode:1590 meanR:9.3300 R:10.0000 loss:0.0126
Episode:1591 meanR:9.3300 R:10.0000 loss:0.0129
Episode:1592 meanR:9.3300 R:8.0000 loss:0.0124
Episode:1593 meanR:9.3500 R:11.0000 loss:0.0145
Episode:1594 meanR:9.3400 R:9.0000 loss:0.0164
Episode:1595 meanR:9.3500 R:10.0000 loss:0.0154
Episode:1596 meanR:9.3600 R:10.0000 loss:0.0144
Episode:1597 meanR:9.3600 R:9.0000 loss:0.0140
Episode:1598 meanR:9.3700 R:10.0000 loss:0.0183
Episode:1599 meanR:9.3900 R:11.0000 loss:0.0161
Episode:1600 meanR:9.3900 R:9.0000 loss:0.0134
Episode:1601 meanR:9.3900 R:10.0000 loss:0.0175

Episode:1754 meanR:9.2700 R:9.0000 loss:0.0157
Episode:1755 meanR:9.2800 R:10.0000 loss:0.0166
Episode:1756 meanR:9.2700 R:8.0000 loss:0.0219
Episode:1757 meanR:9.2700 R:10.0000 loss:0.0185
Episode:1758 meanR:9.2800 R:9.0000 loss:0.0171
Episode:1759 meanR:9.2700 R:8.0000 loss:0.0173
Episode:1760 meanR:9.2800 R:10.0000 loss:0.0180
Episode:1761 meanR:9.2800 R:9.0000 loss:0.0204
Episode:1762 meanR:9.2800 R:10.0000 loss:0.0185
Episode:1763 meanR:9.2700 R:9.0000 loss:0.0155
Episode:1764 meanR:9.2800 R:11.0000 loss:0.0150
Episode:1765 meanR:9.2800 R:10.0000 loss:0.0141
Episode:1766 meanR:9.2700 R:9.0000 loss:0.0198
Episode:1767 meanR:9.2800 R:9.0000 loss:0.0176
Episode:1768 meanR:9.2800 R:9.0000 loss:0.0161
Episode:1769 meanR:9.2700 R:9.0000 loss:0.0164
Episode:1770 meanR:9.2700 R:10.0000 loss:0.0152
Episode:1771 meanR:9.2800 R:10.0000 loss:0.0198
Episode:1772 meanR:9.2700 R:9.0000 loss:0.0208
Episode:1773 meanR:9.2700 R:9.0000 loss:0.0184
Episode:1774 meanR:9.2600 R:9.0000 loss:0.0189
Episo

Episode:1927 meanR:9.3200 R:10.0000 loss:0.0115
Episode:1928 meanR:9.3200 R:9.0000 loss:0.0117
Episode:1929 meanR:9.3200 R:10.0000 loss:0.0117
Episode:1930 meanR:9.3200 R:10.0000 loss:0.0142
Episode:1931 meanR:9.3100 R:9.0000 loss:0.0154
Episode:1932 meanR:9.3000 R:9.0000 loss:0.0151
Episode:1933 meanR:9.3000 R:9.0000 loss:0.0146
Episode:1934 meanR:9.3000 R:9.0000 loss:0.0162
Episode:1935 meanR:9.3000 R:10.0000 loss:0.0151
Episode:1936 meanR:9.2800 R:8.0000 loss:0.0162
Episode:1937 meanR:9.2800 R:10.0000 loss:0.0162
Episode:1938 meanR:9.2800 R:9.0000 loss:0.0110
Episode:1939 meanR:9.2800 R:10.0000 loss:0.0133
Episode:1940 meanR:9.2800 R:10.0000 loss:0.0102
Episode:1941 meanR:9.2900 R:10.0000 loss:0.0086
Episode:1942 meanR:9.3000 R:9.0000 loss:0.0080
Episode:1943 meanR:9.3000 R:9.0000 loss:0.0071
Episode:1944 meanR:9.3000 R:10.0000 loss:0.0076
Episode:1945 meanR:9.3000 R:9.0000 loss:0.0067
Episode:1946 meanR:9.2900 R:10.0000 loss:0.0065
Episode:1947 meanR:9.2900 R:9.0000 loss:0.0063
Epi

Episode:2100 meanR:9.3000 R:9.0000 loss:0.0135
Episode:2101 meanR:9.2900 R:10.0000 loss:0.0117
Episode:2102 meanR:9.2800 R:9.0000 loss:0.0123
Episode:2103 meanR:9.2800 R:9.0000 loss:0.0136
Episode:2104 meanR:9.2800 R:9.0000 loss:0.0114
Episode:2105 meanR:9.2700 R:9.0000 loss:0.0119
Episode:2106 meanR:9.2700 R:9.0000 loss:0.0097
Episode:2107 meanR:9.2800 R:10.0000 loss:0.0117
Episode:2108 meanR:9.2700 R:8.0000 loss:0.0103
Episode:2109 meanR:9.2800 R:10.0000 loss:0.0127
Episode:2110 meanR:9.2900 R:10.0000 loss:0.0116
Episode:2111 meanR:9.2800 R:8.0000 loss:0.0121
Episode:2112 meanR:9.2800 R:9.0000 loss:0.0113
Episode:2113 meanR:9.2900 R:9.0000 loss:0.0104
Episode:2114 meanR:9.3000 R:10.0000 loss:0.0115
Episode:2115 meanR:9.3100 R:10.0000 loss:0.0084
Episode:2116 meanR:9.3000 R:9.0000 loss:0.0127
Episode:2117 meanR:9.3100 R:9.0000 loss:0.0099
Episode:2118 meanR:9.3000 R:9.0000 loss:0.0115
Episode:2119 meanR:9.2900 R:8.0000 loss:0.0102
Episode:2120 meanR:9.2900 R:9.0000 loss:0.0096
Episode

Episode:2273 meanR:9.4400 R:10.0000 loss:0.0127
Episode:2274 meanR:9.4300 R:8.0000 loss:0.0122
Episode:2275 meanR:9.4400 R:10.0000 loss:0.0112
Episode:2276 meanR:9.4300 R:9.0000 loss:0.0126
Episode:2277 meanR:9.4200 R:9.0000 loss:0.0123
Episode:2278 meanR:9.4000 R:8.0000 loss:0.0146
Episode:2279 meanR:9.3900 R:9.0000 loss:0.0139
Episode:2280 meanR:9.3900 R:9.0000 loss:0.0128
Episode:2281 meanR:9.3900 R:10.0000 loss:0.0118
Episode:2282 meanR:9.3900 R:8.0000 loss:0.0124
Episode:2283 meanR:9.4100 R:10.0000 loss:0.0105
Episode:2284 meanR:9.4100 R:9.0000 loss:0.0113
Episode:2285 meanR:9.4200 R:9.0000 loss:0.0104
Episode:2286 meanR:9.4300 R:10.0000 loss:0.0107
Episode:2287 meanR:9.4400 R:10.0000 loss:0.0107
Episode:2288 meanR:9.4500 R:9.0000 loss:0.0115
Episode:2289 meanR:9.4400 R:8.0000 loss:0.0100
Episode:2290 meanR:9.4400 R:10.0000 loss:0.0095
Episode:2291 meanR:9.4200 R:8.0000 loss:0.0077
Episode:2292 meanR:9.4100 R:8.0000 loss:0.0102
Episode:2293 meanR:9.4100 R:10.0000 loss:0.0092
Episo

Episode:2446 meanR:9.4800 R:9.0000 loss:0.0138
Episode:2447 meanR:9.4700 R:9.0000 loss:0.0128
Episode:2448 meanR:9.4800 R:10.0000 loss:0.0153
Episode:2449 meanR:9.4600 R:8.0000 loss:0.0116
Episode:2450 meanR:9.4500 R:9.0000 loss:0.0168
Episode:2451 meanR:9.4500 R:10.0000 loss:0.0140
Episode:2452 meanR:9.4500 R:9.0000 loss:0.0143
Episode:2453 meanR:9.4400 R:9.0000 loss:0.0141
Episode:2454 meanR:9.4500 R:10.0000 loss:0.0141
Episode:2455 meanR:9.4400 R:9.0000 loss:0.0137
Episode:2456 meanR:9.4400 R:10.0000 loss:0.0147
Episode:2457 meanR:9.4200 R:8.0000 loss:0.0140
Episode:2458 meanR:9.4200 R:10.0000 loss:0.0135
Episode:2459 meanR:9.4000 R:8.0000 loss:0.0145
Episode:2460 meanR:9.4200 R:10.0000 loss:0.0120
Episode:2461 meanR:9.4300 R:10.0000 loss:0.0105
Episode:2462 meanR:9.4200 R:9.0000 loss:0.0099
Episode:2463 meanR:9.4200 R:10.0000 loss:0.0106
Episode:2464 meanR:9.4300 R:11.0000 loss:0.0077
Episode:2465 meanR:9.4400 R:10.0000 loss:0.0094
Episode:2466 meanR:9.4600 R:11.0000 loss:0.0080
Ep

Episode:2619 meanR:9.4600 R:10.0000 loss:0.0084
Episode:2620 meanR:9.4600 R:9.0000 loss:0.0078
Episode:2621 meanR:9.4700 R:10.0000 loss:0.0071
Episode:2622 meanR:9.4700 R:10.0000 loss:0.0073
Episode:2623 meanR:9.4800 R:9.0000 loss:0.0069
Episode:2624 meanR:9.4700 R:9.0000 loss:0.0055
Episode:2625 meanR:9.4600 R:9.0000 loss:0.0059
Episode:2626 meanR:9.4600 R:10.0000 loss:0.0077
Episode:2627 meanR:9.4400 R:8.0000 loss:0.0098
Episode:2628 meanR:9.4300 R:9.0000 loss:0.0088
Episode:2629 meanR:9.4400 R:10.0000 loss:0.0089
Episode:2630 meanR:9.4400 R:9.0000 loss:0.0095
Episode:2631 meanR:9.4500 R:9.0000 loss:0.0107
Episode:2632 meanR:9.4500 R:10.0000 loss:0.0101
Episode:2633 meanR:9.4500 R:10.0000 loss:0.0094
Episode:2634 meanR:9.4400 R:9.0000 loss:0.0094
Episode:2635 meanR:9.4300 R:9.0000 loss:0.0095
Episode:2636 meanR:9.4200 R:9.0000 loss:0.0099
Episode:2637 meanR:9.4200 R:8.0000 loss:0.0115
Episode:2638 meanR:9.4200 R:10.0000 loss:0.0112
Episode:2639 meanR:9.4200 R:10.0000 loss:0.0111
Epis

Episode:2792 meanR:9.2400 R:8.0000 loss:0.0120
Episode:2793 meanR:9.2300 R:8.0000 loss:0.0130
Episode:2794 meanR:9.2300 R:9.0000 loss:0.0144
Episode:2795 meanR:9.2400 R:9.0000 loss:0.0152
Episode:2796 meanR:9.2400 R:10.0000 loss:0.0159
Episode:2797 meanR:9.2500 R:10.0000 loss:0.0139
Episode:2798 meanR:9.2400 R:9.0000 loss:0.0136
Episode:2799 meanR:9.2500 R:10.0000 loss:0.0144
Episode:2800 meanR:9.2600 R:10.0000 loss:0.0136
Episode:2801 meanR:9.2700 R:10.0000 loss:0.0138
Episode:2802 meanR:9.2600 R:9.0000 loss:0.0120
Episode:2803 meanR:9.2700 R:9.0000 loss:0.0137
Episode:2804 meanR:9.2800 R:11.0000 loss:0.0136
Episode:2805 meanR:9.2700 R:9.0000 loss:0.0137
Episode:2806 meanR:9.2800 R:9.0000 loss:0.0140
Episode:2807 meanR:9.2900 R:10.0000 loss:0.0107
Episode:2808 meanR:9.2900 R:9.0000 loss:0.0089
Episode:2809 meanR:9.2800 R:9.0000 loss:0.0090
Episode:2810 meanR:9.2900 R:10.0000 loss:0.0082
Episode:2811 meanR:9.2900 R:9.0000 loss:0.0084
Episode:2812 meanR:9.3000 R:10.0000 loss:0.0089
Epis

Episode:2965 meanR:9.3600 R:10.0000 loss:0.0206
Episode:2966 meanR:9.3600 R:10.0000 loss:0.0176
Episode:2967 meanR:9.3500 R:9.0000 loss:0.0156
Episode:2968 meanR:9.3400 R:8.0000 loss:0.0153
Episode:2969 meanR:9.3600 R:10.0000 loss:0.0137
Episode:2970 meanR:9.3600 R:10.0000 loss:0.0130
Episode:2971 meanR:9.3700 R:10.0000 loss:0.0132
Episode:2972 meanR:9.3800 R:10.0000 loss:0.0150
Episode:2973 meanR:9.3900 R:10.0000 loss:0.0109
Episode:2974 meanR:9.3900 R:10.0000 loss:0.0110
Episode:2975 meanR:9.3900 R:10.0000 loss:0.0124
Episode:2976 meanR:9.3800 R:9.0000 loss:0.0132
Episode:2977 meanR:9.3800 R:10.0000 loss:0.0127
Episode:2978 meanR:9.3800 R:10.0000 loss:0.0123
Episode:2979 meanR:9.3800 R:10.0000 loss:0.0112
Episode:2980 meanR:9.3800 R:9.0000 loss:0.0103
Episode:2981 meanR:9.3800 R:9.0000 loss:0.0112
Episode:2982 meanR:9.3800 R:9.0000 loss:0.0100
Episode:2983 meanR:9.4000 R:10.0000 loss:0.0119
Episode:2984 meanR:9.4200 R:10.0000 loss:0.0128
Episode:2985 meanR:9.4200 R:10.0000 loss:0.013

Episode:3138 meanR:9.3700 R:9.0000 loss:0.0158
Episode:3139 meanR:9.3600 R:9.0000 loss:0.0150
Episode:3140 meanR:9.3500 R:9.0000 loss:0.0148
Episode:3141 meanR:9.3400 R:9.0000 loss:0.0133
Episode:3142 meanR:9.3500 R:10.0000 loss:0.0130
Episode:3143 meanR:9.3600 R:9.0000 loss:0.0127
Episode:3144 meanR:9.3700 R:10.0000 loss:0.0132
Episode:3145 meanR:9.3700 R:9.0000 loss:0.0089
Episode:3146 meanR:9.3500 R:8.0000 loss:0.0089
Episode:3147 meanR:9.3600 R:10.0000 loss:0.0071
Episode:3148 meanR:9.3600 R:10.0000 loss:0.0082
Episode:3149 meanR:9.3700 R:11.0000 loss:0.0094
Episode:3150 meanR:9.3600 R:9.0000 loss:0.0128
Episode:3151 meanR:9.3600 R:9.0000 loss:0.0134
Episode:3152 meanR:9.3700 R:10.0000 loss:0.0141
Episode:3153 meanR:9.3700 R:10.0000 loss:0.0152
Episode:3154 meanR:9.3700 R:10.0000 loss:0.0167
Episode:3155 meanR:9.3700 R:10.0000 loss:0.0180
Episode:3156 meanR:9.3800 R:10.0000 loss:0.0184
Episode:3157 meanR:9.3700 R:9.0000 loss:0.0177
Episode:3158 meanR:9.3600 R:9.0000 loss:0.0181
Epi

Episode:3311 meanR:9.3700 R:10.0000 loss:0.0106
Episode:3312 meanR:9.3400 R:8.0000 loss:0.0100
Episode:3313 meanR:9.3500 R:10.0000 loss:0.0101
Episode:3314 meanR:9.3500 R:10.0000 loss:0.0103
Episode:3315 meanR:9.3600 R:10.0000 loss:0.0100
Episode:3316 meanR:9.3600 R:10.0000 loss:0.0089
Episode:3317 meanR:9.3400 R:8.0000 loss:0.0077
Episode:3318 meanR:9.3500 R:9.0000 loss:0.0092
Episode:3319 meanR:9.3600 R:10.0000 loss:0.0115
Episode:3320 meanR:9.3600 R:9.0000 loss:0.0096
Episode:3321 meanR:9.3700 R:10.0000 loss:0.0094
Episode:3322 meanR:9.3700 R:9.0000 loss:0.0115
Episode:3323 meanR:9.3800 R:10.0000 loss:0.0092
Episode:3324 meanR:9.3600 R:9.0000 loss:0.0098
Episode:3325 meanR:9.3700 R:10.0000 loss:0.0128
Episode:3326 meanR:9.3700 R:10.0000 loss:0.0090
Episode:3327 meanR:9.3600 R:8.0000 loss:0.0110
Episode:3328 meanR:9.3700 R:10.0000 loss:0.0112
Episode:3329 meanR:9.3500 R:8.0000 loss:0.0121
Episode:3330 meanR:9.3700 R:10.0000 loss:0.0114
Episode:3331 meanR:9.3700 R:10.0000 loss:0.0103


Episode:3484 meanR:9.2600 R:9.0000 loss:0.0156
Episode:3485 meanR:9.2700 R:10.0000 loss:0.0148
Episode:3486 meanR:9.2700 R:9.0000 loss:0.0140
Episode:3487 meanR:9.2800 R:10.0000 loss:0.0131
Episode:3488 meanR:9.2600 R:8.0000 loss:0.0140
Episode:3489 meanR:9.2600 R:10.0000 loss:0.0144
Episode:3490 meanR:9.2500 R:9.0000 loss:0.0153
Episode:3491 meanR:9.2500 R:9.0000 loss:0.0121
Episode:3492 meanR:9.2400 R:8.0000 loss:0.0104
Episode:3493 meanR:9.2500 R:11.0000 loss:0.0103
Episode:3494 meanR:9.2500 R:9.0000 loss:0.0100
Episode:3495 meanR:9.2500 R:10.0000 loss:0.0094
Episode:3496 meanR:9.2500 R:8.0000 loss:0.0109
Episode:3497 meanR:9.2600 R:10.0000 loss:0.0105
Episode:3498 meanR:9.2700 R:10.0000 loss:0.0094
Episode:3499 meanR:9.2700 R:9.0000 loss:0.0090
Episode:3500 meanR:9.2600 R:8.0000 loss:0.0087
Episode:3501 meanR:9.2700 R:9.0000 loss:0.0096
Episode:3502 meanR:9.2700 R:9.0000 loss:0.0141
Episode:3503 meanR:9.2800 R:10.0000 loss:0.0119
Episode:3504 meanR:9.2800 R:9.0000 loss:0.0122
Episo

Episode:3657 meanR:9.3900 R:10.0000 loss:0.0090
Episode:3658 meanR:9.3900 R:10.0000 loss:0.0142
Episode:3659 meanR:9.3900 R:9.0000 loss:0.0147
Episode:3660 meanR:9.4100 R:10.0000 loss:0.0117
Episode:3661 meanR:9.4000 R:8.0000 loss:0.0097
Episode:3662 meanR:9.4000 R:10.0000 loss:0.0105
Episode:3663 meanR:9.4000 R:9.0000 loss:0.0089
Episode:3664 meanR:9.3900 R:9.0000 loss:0.0083
Episode:3665 meanR:9.3800 R:9.0000 loss:0.0084
Episode:3666 meanR:9.3700 R:9.0000 loss:0.0074
Episode:3667 meanR:9.3700 R:9.0000 loss:0.0064
Episode:3668 meanR:9.3800 R:9.0000 loss:0.0046
Episode:3669 meanR:9.3900 R:10.0000 loss:0.0044
Episode:3670 meanR:9.3800 R:8.0000 loss:0.0079
Episode:3671 meanR:9.3800 R:10.0000 loss:0.0079
Episode:3672 meanR:9.3700 R:9.0000 loss:0.0060
Episode:3673 meanR:9.3800 R:10.0000 loss:0.0079
Episode:3674 meanR:9.3900 R:10.0000 loss:0.0075
Episode:3675 meanR:9.4000 R:10.0000 loss:0.0114
Episode:3676 meanR:9.3900 R:8.0000 loss:0.0097
Episode:3677 meanR:9.4100 R:10.0000 loss:0.0111
Epi

Episode:3830 meanR:9.5200 R:10.0000 loss:0.0152
Episode:3831 meanR:9.5300 R:10.0000 loss:0.0144
Episode:3832 meanR:9.5200 R:8.0000 loss:0.0122
Episode:3833 meanR:9.5200 R:10.0000 loss:0.0099
Episode:3834 meanR:9.5200 R:10.0000 loss:0.0102
Episode:3835 meanR:9.5100 R:9.0000 loss:0.0099
Episode:3836 meanR:9.5000 R:9.0000 loss:0.0088
Episode:3837 meanR:9.5000 R:9.0000 loss:0.0107
Episode:3838 meanR:9.4900 R:9.0000 loss:0.0099
Episode:3839 meanR:9.4700 R:8.0000 loss:0.0123
Episode:3840 meanR:9.4600 R:9.0000 loss:0.0101
Episode:3841 meanR:9.4500 R:9.0000 loss:0.0108
Episode:3842 meanR:9.4400 R:9.0000 loss:0.0102
Episode:3843 meanR:9.4300 R:9.0000 loss:0.0106
Episode:3844 meanR:9.4300 R:10.0000 loss:0.0117
Episode:3845 meanR:9.4200 R:9.0000 loss:0.0116
Episode:3846 meanR:9.4200 R:10.0000 loss:0.0136
Episode:3847 meanR:9.4100 R:9.0000 loss:0.0095
Episode:3848 meanR:9.4100 R:9.0000 loss:0.0088
Episode:3849 meanR:9.4100 R:10.0000 loss:0.0084
Episode:3850 meanR:9.4200 R:10.0000 loss:0.0088
Episo

Episode:4003 meanR:9.3300 R:10.0000 loss:0.0162
Episode:4004 meanR:9.3200 R:9.0000 loss:0.0143
Episode:4005 meanR:9.3200 R:10.0000 loss:0.0150
Episode:4006 meanR:9.3200 R:9.0000 loss:0.0144
Episode:4007 meanR:9.3400 R:11.0000 loss:0.0138
Episode:4008 meanR:9.3300 R:9.0000 loss:0.0137
Episode:4009 meanR:9.3100 R:9.0000 loss:0.0145
Episode:4010 meanR:9.2900 R:8.0000 loss:0.0131
Episode:4011 meanR:9.2900 R:9.0000 loss:0.0126
Episode:4012 meanR:9.2900 R:8.0000 loss:0.0125
Episode:4013 meanR:9.3000 R:10.0000 loss:0.0175
Episode:4014 meanR:9.2900 R:9.0000 loss:0.0142
Episode:4015 meanR:9.2900 R:10.0000 loss:0.0135
Episode:4016 meanR:9.3000 R:10.0000 loss:0.0180
Episode:4017 meanR:9.2900 R:8.0000 loss:0.0183
Episode:4018 meanR:9.2800 R:9.0000 loss:0.0214
Episode:4019 meanR:9.3000 R:10.0000 loss:0.0204
Episode:4020 meanR:9.3000 R:10.0000 loss:0.0204
Episode:4021 meanR:9.3000 R:9.0000 loss:0.0206
Episode:4022 meanR:9.3200 R:10.0000 loss:0.0192
Episode:4023 meanR:9.3000 R:8.0000 loss:0.0208
Epis

Episode:4176 meanR:9.3400 R:9.0000 loss:0.0128
Episode:4177 meanR:9.3400 R:9.0000 loss:0.0079
Episode:4178 meanR:9.3400 R:9.0000 loss:0.0083
Episode:4179 meanR:9.3400 R:9.0000 loss:0.0099
Episode:4180 meanR:9.3400 R:10.0000 loss:0.0124
Episode:4181 meanR:9.3500 R:11.0000 loss:0.0099
Episode:4182 meanR:9.3500 R:9.0000 loss:0.0107
Episode:4183 meanR:9.3400 R:9.0000 loss:0.0105
Episode:4184 meanR:9.3400 R:8.0000 loss:0.0090
Episode:4185 meanR:9.3500 R:9.0000 loss:0.0105
Episode:4186 meanR:9.3600 R:10.0000 loss:0.0090
Episode:4187 meanR:9.3700 R:10.0000 loss:0.0100
Episode:4188 meanR:9.3800 R:9.0000 loss:0.0104
Episode:4189 meanR:9.3600 R:8.0000 loss:0.0099
Episode:4190 meanR:9.3600 R:10.0000 loss:0.0111
Episode:4191 meanR:9.3500 R:9.0000 loss:0.0109
Episode:4192 meanR:9.3500 R:9.0000 loss:0.0106
Episode:4193 meanR:9.3600 R:9.0000 loss:0.0130
Episode:4194 meanR:9.3500 R:8.0000 loss:0.0140
Episode:4195 meanR:9.3500 R:9.0000 loss:0.0201
Episode:4196 meanR:9.3500 R:9.0000 loss:0.0194
Episode:

Episode:4349 meanR:9.3100 R:9.0000 loss:0.0086
Episode:4350 meanR:9.3000 R:9.0000 loss:0.0086
Episode:4351 meanR:9.3100 R:10.0000 loss:0.0060
Episode:4352 meanR:9.2900 R:9.0000 loss:0.0060
Episode:4353 meanR:9.3000 R:10.0000 loss:0.0065
Episode:4354 meanR:9.3100 R:11.0000 loss:0.0068
Episode:4355 meanR:9.3100 R:9.0000 loss:0.0093
Episode:4356 meanR:9.3200 R:10.0000 loss:0.0090
Episode:4357 meanR:9.3100 R:9.0000 loss:0.0085
Episode:4358 meanR:9.3100 R:9.0000 loss:0.0084
Episode:4359 meanR:9.3200 R:9.0000 loss:0.0099
Episode:4360 meanR:9.3200 R:10.0000 loss:0.0104
Episode:4361 meanR:9.3300 R:10.0000 loss:0.0097
Episode:4362 meanR:9.3500 R:10.0000 loss:0.0099
Episode:4363 meanR:9.3500 R:9.0000 loss:0.0091
Episode:4364 meanR:9.3400 R:9.0000 loss:0.0089
Episode:4365 meanR:9.3400 R:10.0000 loss:0.0081
Episode:4366 meanR:9.3300 R:9.0000 loss:0.0085
Episode:4367 meanR:9.3200 R:8.0000 loss:0.0088
Episode:4368 meanR:9.3300 R:10.0000 loss:0.0097
Episode:4369 meanR:9.3300 R:10.0000 loss:0.0070
Epi

Episode:4522 meanR:9.3300 R:9.0000 loss:0.0125
Episode:4523 meanR:9.3300 R:9.0000 loss:0.0129
Episode:4524 meanR:9.3200 R:9.0000 loss:0.0117
Episode:4525 meanR:9.3000 R:8.0000 loss:0.0134
Episode:4526 meanR:9.3000 R:10.0000 loss:0.0137
Episode:4527 meanR:9.2900 R:9.0000 loss:0.0099
Episode:4528 meanR:9.2800 R:8.0000 loss:0.0091
Episode:4529 meanR:9.2800 R:10.0000 loss:0.0098
Episode:4530 meanR:9.2800 R:9.0000 loss:0.0100
Episode:4531 meanR:9.2800 R:10.0000 loss:0.0151
Episode:4532 meanR:9.2700 R:9.0000 loss:0.0086
Episode:4533 meanR:9.2700 R:9.0000 loss:0.0082
Episode:4534 meanR:9.2700 R:9.0000 loss:0.0085
Episode:4535 meanR:9.2900 R:10.0000 loss:0.0113
Episode:4536 meanR:9.3000 R:10.0000 loss:0.0092
Episode:4537 meanR:9.3200 R:10.0000 loss:0.0087
Episode:4538 meanR:9.3300 R:10.0000 loss:0.0074
Episode:4539 meanR:9.3200 R:9.0000 loss:0.0079
Episode:4540 meanR:9.3300 R:10.0000 loss:0.0076
Episode:4541 meanR:9.3200 R:9.0000 loss:0.0085
Episode:4542 meanR:9.3400 R:10.0000 loss:0.0112
Epis

Episode:4695 meanR:9.2900 R:9.0000 loss:0.0068
Episode:4696 meanR:9.2900 R:9.0000 loss:0.0126
Episode:4697 meanR:9.3000 R:10.0000 loss:0.0063
Episode:4698 meanR:9.3000 R:9.0000 loss:0.0049
Episode:4699 meanR:9.3100 R:11.0000 loss:0.0061
Episode:4700 meanR:9.3200 R:9.0000 loss:0.0060
Episode:4701 meanR:9.3200 R:9.0000 loss:0.0044
Episode:4702 meanR:9.3000 R:8.0000 loss:0.0048
Episode:4703 meanR:9.3000 R:9.0000 loss:0.0056
Episode:4704 meanR:9.3100 R:10.0000 loss:0.0072
Episode:4705 meanR:9.3100 R:10.0000 loss:0.0067
Episode:4706 meanR:9.3100 R:10.0000 loss:0.0066
Episode:4707 meanR:9.3200 R:9.0000 loss:0.0066
Episode:4708 meanR:9.3100 R:9.0000 loss:0.0067
Episode:4709 meanR:9.3000 R:9.0000 loss:0.0062
Episode:4710 meanR:9.3000 R:9.0000 loss:0.0074
Episode:4711 meanR:9.2900 R:9.0000 loss:0.0076
Episode:4712 meanR:9.3000 R:10.0000 loss:0.0083
Episode:4713 meanR:9.2800 R:8.0000 loss:0.0064
Episode:4714 meanR:9.2900 R:10.0000 loss:0.0060
Episode:4715 meanR:9.2800 R:9.0000 loss:0.0089
Episod

Episode:4868 meanR:9.1300 R:10.0000 loss:0.0081
Episode:4869 meanR:9.1400 R:10.0000 loss:0.0071
Episode:4870 meanR:9.1300 R:9.0000 loss:0.0072
Episode:4871 meanR:9.1200 R:10.0000 loss:0.0060
Episode:4872 meanR:9.1000 R:8.0000 loss:0.0067
Episode:4873 meanR:9.1100 R:10.0000 loss:0.0060
Episode:4874 meanR:9.1200 R:9.0000 loss:0.0066
Episode:4875 meanR:9.1100 R:9.0000 loss:0.0055
Episode:4876 meanR:9.1000 R:9.0000 loss:0.0059
Episode:4877 meanR:9.1000 R:10.0000 loss:0.0057
Episode:4878 meanR:9.1000 R:9.0000 loss:0.0068
Episode:4879 meanR:9.1100 R:9.0000 loss:0.0066
Episode:4880 meanR:9.1000 R:8.0000 loss:0.0073
Episode:4881 meanR:9.1100 R:10.0000 loss:0.0102
Episode:4882 meanR:9.1300 R:10.0000 loss:0.0084
Episode:4883 meanR:9.1300 R:9.0000 loss:0.0087
Episode:4884 meanR:9.1200 R:9.0000 loss:0.0094
Episode:4885 meanR:9.1300 R:10.0000 loss:0.0093
Episode:4886 meanR:9.1300 R:9.0000 loss:0.0100
Episode:4887 meanR:9.1400 R:10.0000 loss:0.0101
Episode:4888 meanR:9.1400 R:9.0000 loss:0.0106
Epis

Episode:5041 meanR:9.3900 R:9.0000 loss:0.0074
Episode:5042 meanR:9.3900 R:10.0000 loss:0.0073
Episode:5043 meanR:9.4000 R:9.0000 loss:0.0065
Episode:5044 meanR:9.4100 R:9.0000 loss:0.0067
Episode:5045 meanR:9.4100 R:10.0000 loss:0.0069
Episode:5046 meanR:9.4000 R:9.0000 loss:0.0081
Episode:5047 meanR:9.3900 R:8.0000 loss:0.0089
Episode:5048 meanR:9.3900 R:9.0000 loss:0.0095
Episode:5049 meanR:9.3800 R:8.0000 loss:0.0096
Episode:5050 meanR:9.4000 R:10.0000 loss:0.0083
Episode:5051 meanR:9.4000 R:9.0000 loss:0.0076
Episode:5052 meanR:9.3900 R:8.0000 loss:0.0070
Episode:5053 meanR:9.4000 R:10.0000 loss:0.0077
Episode:5054 meanR:9.4000 R:8.0000 loss:0.0060
Episode:5055 meanR:9.3900 R:9.0000 loss:0.0064
Episode:5056 meanR:9.4100 R:10.0000 loss:0.0067
Episode:5057 meanR:9.4000 R:9.0000 loss:0.0078
Episode:5058 meanR:9.4100 R:10.0000 loss:0.0069
Episode:5059 meanR:9.4000 R:9.0000 loss:0.0065
Episode:5060 meanR:9.4000 R:9.0000 loss:0.0070
Episode:5061 meanR:9.4100 R:10.0000 loss:0.0069
Episod

Episode:5214 meanR:9.2500 R:10.0000 loss:0.0087
Episode:5215 meanR:9.2600 R:10.0000 loss:0.0098
Episode:5216 meanR:9.2600 R:10.0000 loss:0.0088
Episode:5217 meanR:9.2400 R:8.0000 loss:0.0087
Episode:5218 meanR:9.2400 R:10.0000 loss:0.0077
Episode:5219 meanR:9.2400 R:9.0000 loss:0.0086
Episode:5220 meanR:9.2300 R:9.0000 loss:0.0061
Episode:5221 meanR:9.2300 R:10.0000 loss:0.0067
Episode:5222 meanR:9.2200 R:9.0000 loss:0.0070
Episode:5223 meanR:9.2100 R:8.0000 loss:0.0058
Episode:5224 meanR:9.2100 R:10.0000 loss:0.0089
Episode:5225 meanR:9.2100 R:10.0000 loss:0.0107
Episode:5226 meanR:9.2200 R:11.0000 loss:0.0107
Episode:5227 meanR:9.2300 R:10.0000 loss:0.0174
Episode:5228 meanR:9.2300 R:10.0000 loss:0.0092
Episode:5229 meanR:9.2500 R:10.0000 loss:0.0075
Episode:5230 meanR:9.2500 R:9.0000 loss:0.0088
Episode:5231 meanR:9.2400 R:8.0000 loss:0.0090
Episode:5232 meanR:9.2300 R:9.0000 loss:0.0072
Episode:5233 meanR:9.2200 R:9.0000 loss:0.0071
Episode:5234 meanR:9.2500 R:11.0000 loss:0.0072
E

Episode:5387 meanR:9.3900 R:10.0000 loss:0.0133
Episode:5388 meanR:9.3800 R:9.0000 loss:0.0151
Episode:5389 meanR:9.3800 R:8.0000 loss:0.0102
Episode:5390 meanR:9.3700 R:8.0000 loss:0.0125
Episode:5391 meanR:9.3800 R:10.0000 loss:0.0182
Episode:5392 meanR:9.3800 R:9.0000 loss:0.0170
Episode:5393 meanR:9.3900 R:10.0000 loss:0.0162
Episode:5394 meanR:9.3900 R:10.0000 loss:0.0148
Episode:5395 meanR:9.3900 R:10.0000 loss:0.0118
Episode:5396 meanR:9.3900 R:9.0000 loss:0.0102
Episode:5397 meanR:9.4000 R:10.0000 loss:0.0103
Episode:5398 meanR:9.4000 R:9.0000 loss:0.0104
Episode:5399 meanR:9.4000 R:10.0000 loss:0.0101
Episode:5400 meanR:9.4000 R:8.0000 loss:0.0091
Episode:5401 meanR:9.4100 R:9.0000 loss:0.0093
Episode:5402 meanR:9.4000 R:9.0000 loss:0.0116
Episode:5403 meanR:9.3900 R:9.0000 loss:0.0151
Episode:5404 meanR:9.3800 R:9.0000 loss:0.0151
Episode:5405 meanR:9.3600 R:8.0000 loss:0.0085
Episode:5406 meanR:9.3600 R:10.0000 loss:0.0083
Episode:5407 meanR:9.3500 R:8.0000 loss:0.0097
Episo

Episode:5560 meanR:9.2500 R:8.0000 loss:0.0061
Episode:5561 meanR:9.2600 R:10.0000 loss:0.0069
Episode:5562 meanR:9.2500 R:9.0000 loss:0.0056
Episode:5563 meanR:9.2400 R:9.0000 loss:0.0066
Episode:5564 meanR:9.2400 R:10.0000 loss:0.0071
Episode:5565 meanR:9.2400 R:10.0000 loss:0.0059
Episode:5566 meanR:9.2500 R:9.0000 loss:0.0069
Episode:5567 meanR:9.2500 R:9.0000 loss:0.0095
Episode:5568 meanR:9.2400 R:9.0000 loss:0.0069
Episode:5569 meanR:9.2300 R:9.0000 loss:0.0089
Episode:5570 meanR:9.2300 R:9.0000 loss:0.0099
Episode:5571 meanR:9.2300 R:9.0000 loss:0.0086
Episode:5572 meanR:9.2400 R:10.0000 loss:0.0071
Episode:5573 meanR:9.2400 R:9.0000 loss:0.0082
Episode:5574 meanR:9.2400 R:10.0000 loss:0.0073
Episode:5575 meanR:9.2400 R:10.0000 loss:0.0089
Episode:5576 meanR:9.2300 R:10.0000 loss:0.0107
Episode:5577 meanR:9.2300 R:10.0000 loss:0.0098
Episode:5578 meanR:9.2500 R:10.0000 loss:0.0077
Episode:5579 meanR:9.2400 R:9.0000 loss:0.0076
Episode:5580 meanR:9.2400 R:10.0000 loss:0.0079
Epi

Episode:5733 meanR:9.3500 R:9.0000 loss:0.0106
Episode:5734 meanR:9.3400 R:9.0000 loss:0.0117
Episode:5735 meanR:9.3300 R:8.0000 loss:0.0149
Episode:5736 meanR:9.3200 R:9.0000 loss:0.0121
Episode:5737 meanR:9.3300 R:10.0000 loss:0.0135
Episode:5738 meanR:9.3200 R:9.0000 loss:0.0111
Episode:5739 meanR:9.3400 R:10.0000 loss:0.0156
Episode:5740 meanR:9.3600 R:10.0000 loss:0.0115
Episode:5741 meanR:9.3700 R:10.0000 loss:0.0108
Episode:5742 meanR:9.3600 R:9.0000 loss:0.0095
Episode:5743 meanR:9.3600 R:10.0000 loss:0.0114
Episode:5744 meanR:9.3500 R:9.0000 loss:0.0117
Episode:5745 meanR:9.3500 R:9.0000 loss:0.0111
Episode:5746 meanR:9.3600 R:9.0000 loss:0.0096
Episode:5747 meanR:9.3600 R:10.0000 loss:0.0081
Episode:5748 meanR:9.3400 R:8.0000 loss:0.0086
Episode:5749 meanR:9.3200 R:8.0000 loss:0.0098
Episode:5750 meanR:9.3200 R:10.0000 loss:0.0112
Episode:5751 meanR:9.3100 R:10.0000 loss:0.0112
Episode:5752 meanR:9.3000 R:8.0000 loss:0.0105
Episode:5753 meanR:9.2900 R:9.0000 loss:0.0113
Episo

Episode:5906 meanR:9.4200 R:9.0000 loss:0.0083
Episode:5907 meanR:9.4100 R:8.0000 loss:0.0095
Episode:5908 meanR:9.4000 R:9.0000 loss:0.0179
Episode:5909 meanR:9.4000 R:9.0000 loss:0.0247
Episode:5910 meanR:9.4100 R:10.0000 loss:0.0138
Episode:5911 meanR:9.4000 R:9.0000 loss:0.0113
Episode:5912 meanR:9.4000 R:9.0000 loss:0.0107
Episode:5913 meanR:9.4100 R:10.0000 loss:0.0090
Episode:5914 meanR:9.4100 R:9.0000 loss:0.0082
Episode:5915 meanR:9.4100 R:8.0000 loss:0.0071
Episode:5916 meanR:9.4200 R:10.0000 loss:0.0074
Episode:5917 meanR:9.4100 R:9.0000 loss:0.0089
Episode:5918 meanR:9.3900 R:8.0000 loss:0.0064
Episode:5919 meanR:9.3900 R:9.0000 loss:0.0072
Episode:5920 meanR:9.3900 R:9.0000 loss:0.0082
Episode:5921 meanR:9.3900 R:9.0000 loss:0.0150
Episode:5922 meanR:9.4000 R:10.0000 loss:0.0115
Episode:5923 meanR:9.3900 R:9.0000 loss:0.0110
Episode:5924 meanR:9.4000 R:10.0000 loss:0.0096
Episode:5925 meanR:9.3900 R:10.0000 loss:0.0092
Episode:5926 meanR:9.3900 R:10.0000 loss:0.0098
Episod

Episode:6079 meanR:9.4100 R:10.0000 loss:0.0142
Episode:6080 meanR:9.3900 R:8.0000 loss:0.0130
Episode:6081 meanR:9.4000 R:10.0000 loss:0.0152
Episode:6082 meanR:9.4000 R:10.0000 loss:0.0188
Episode:6083 meanR:9.3800 R:8.0000 loss:0.0143
Episode:6084 meanR:9.3800 R:9.0000 loss:0.0125
Episode:6085 meanR:9.3700 R:9.0000 loss:0.0116
Episode:6086 meanR:9.3600 R:8.0000 loss:0.0136
Episode:6087 meanR:9.3500 R:8.0000 loss:0.0149
Episode:6088 meanR:9.3500 R:10.0000 loss:0.0117
Episode:6089 meanR:9.3600 R:10.0000 loss:0.0110
Episode:6090 meanR:9.3600 R:9.0000 loss:0.0090
Episode:6091 meanR:9.3700 R:10.0000 loss:0.0082
Episode:6092 meanR:9.3700 R:9.0000 loss:0.0112
Episode:6093 meanR:9.3700 R:9.0000 loss:0.0092
Episode:6094 meanR:9.3800 R:10.0000 loss:0.0168
Episode:6095 meanR:9.3800 R:9.0000 loss:0.0103
Episode:6096 meanR:9.3800 R:10.0000 loss:0.0114
Episode:6097 meanR:9.3600 R:8.0000 loss:0.0104
Episode:6098 meanR:9.3600 R:9.0000 loss:0.0082
Episode:6099 meanR:9.3500 R:9.0000 loss:0.0082
Episo

Episode:6252 meanR:9.3500 R:10.0000 loss:0.0145
Episode:6253 meanR:9.3500 R:11.0000 loss:0.0151
Episode:6254 meanR:9.3400 R:9.0000 loss:0.0142
Episode:6255 meanR:9.3500 R:10.0000 loss:0.0146
Episode:6256 meanR:9.3600 R:10.0000 loss:0.0180
Episode:6257 meanR:9.3500 R:10.0000 loss:0.0158
Episode:6258 meanR:9.3800 R:11.0000 loss:0.0098
Episode:6259 meanR:9.3900 R:10.0000 loss:0.0119
Episode:6260 meanR:9.3800 R:9.0000 loss:0.0089
Episode:6261 meanR:9.3800 R:10.0000 loss:0.0094
Episode:6262 meanR:9.3600 R:8.0000 loss:0.0107
Episode:6263 meanR:9.3700 R:9.0000 loss:0.0184
Episode:6264 meanR:9.3600 R:8.0000 loss:0.0101
Episode:6265 meanR:9.3700 R:10.0000 loss:0.0093
Episode:6266 meanR:9.3500 R:9.0000 loss:0.0086
Episode:6267 meanR:9.3400 R:9.0000 loss:0.0086
Episode:6268 meanR:9.3400 R:9.0000 loss:0.0084
Episode:6269 meanR:9.3400 R:9.0000 loss:0.0082
Episode:6270 meanR:9.3400 R:9.0000 loss:0.0085
Episode:6271 meanR:9.3400 R:10.0000 loss:0.0087
Episode:6272 meanR:9.3200 R:9.0000 loss:0.0099
Epi

Episode:6425 meanR:9.4200 R:10.0000 loss:0.0066
Episode:6426 meanR:9.4300 R:10.0000 loss:0.0071
Episode:6427 meanR:9.4100 R:8.0000 loss:0.0075
Episode:6428 meanR:9.4000 R:8.0000 loss:0.0073
Episode:6429 meanR:9.4000 R:9.0000 loss:0.0078
Episode:6430 meanR:9.4100 R:10.0000 loss:0.0065
Episode:6431 meanR:9.4100 R:9.0000 loss:0.0066
Episode:6432 meanR:9.4000 R:9.0000 loss:0.0063
Episode:6433 meanR:9.4100 R:10.0000 loss:0.0071
Episode:6434 meanR:9.4200 R:11.0000 loss:0.0062
Episode:6435 meanR:9.4200 R:9.0000 loss:0.0065
Episode:6436 meanR:9.4300 R:10.0000 loss:0.0064
Episode:6437 meanR:9.4400 R:9.0000 loss:0.0073
Episode:6438 meanR:9.4300 R:9.0000 loss:0.0078
Episode:6439 meanR:9.4400 R:10.0000 loss:0.0086
Episode:6440 meanR:9.4200 R:9.0000 loss:0.0065
Episode:6441 meanR:9.4300 R:10.0000 loss:0.0085
Episode:6442 meanR:9.4100 R:8.0000 loss:0.0073
Episode:6443 meanR:9.4100 R:9.0000 loss:0.0069
Episode:6444 meanR:9.3900 R:8.0000 loss:0.0068
Episode:6445 meanR:9.3800 R:9.0000 loss:0.0076
Episo

Episode:6598 meanR:9.3600 R:10.0000 loss:0.0133
Episode:6599 meanR:9.3500 R:9.0000 loss:0.0096
Episode:6600 meanR:9.3700 R:10.0000 loss:0.0098
Episode:6601 meanR:9.3800 R:10.0000 loss:0.0099
Episode:6602 meanR:9.3700 R:8.0000 loss:0.0097
Episode:6603 meanR:9.3800 R:10.0000 loss:0.0094
Episode:6604 meanR:9.3800 R:9.0000 loss:0.0125
Episode:6605 meanR:9.3600 R:8.0000 loss:0.0099
Episode:6606 meanR:9.3600 R:10.0000 loss:0.0081
Episode:6607 meanR:9.3500 R:9.0000 loss:0.0079
Episode:6608 meanR:9.3500 R:9.0000 loss:0.0085
Episode:6609 meanR:9.3400 R:8.0000 loss:0.0109
Episode:6610 meanR:9.3500 R:10.0000 loss:0.0076
Episode:6611 meanR:9.3600 R:10.0000 loss:0.0080
Episode:6612 meanR:9.3400 R:9.0000 loss:0.0085
Episode:6613 meanR:9.3300 R:8.0000 loss:0.0070
Episode:6614 meanR:9.3400 R:11.0000 loss:0.0070
Episode:6615 meanR:9.3300 R:9.0000 loss:0.0086
Episode:6616 meanR:9.3400 R:10.0000 loss:0.0088
Episode:6617 meanR:9.3500 R:10.0000 loss:0.0082
Episode:6618 meanR:9.3500 R:9.0000 loss:0.0092
Epi

Episode:6771 meanR:9.3300 R:9.0000 loss:0.0074
Episode:6772 meanR:9.3300 R:10.0000 loss:0.0098
Episode:6773 meanR:9.3200 R:9.0000 loss:0.0079
Episode:6774 meanR:9.3200 R:9.0000 loss:0.0074
Episode:6775 meanR:9.3100 R:9.0000 loss:0.0077
Episode:6776 meanR:9.3100 R:9.0000 loss:0.0068
Episode:6777 meanR:9.3100 R:9.0000 loss:0.0066
Episode:6778 meanR:9.3000 R:9.0000 loss:0.0073
Episode:6779 meanR:9.2800 R:8.0000 loss:0.0078
Episode:6780 meanR:9.2800 R:10.0000 loss:0.0077
Episode:6781 meanR:9.2800 R:10.0000 loss:0.0076
Episode:6782 meanR:9.2900 R:10.0000 loss:0.0080
Episode:6783 meanR:9.2700 R:8.0000 loss:0.0082
Episode:6784 meanR:9.2800 R:10.0000 loss:0.0067
Episode:6785 meanR:9.2700 R:8.0000 loss:0.0069
Episode:6786 meanR:9.2800 R:9.0000 loss:0.0065
Episode:6787 meanR:9.2700 R:9.0000 loss:0.0082
Episode:6788 meanR:9.2700 R:10.0000 loss:0.0083
Episode:6789 meanR:9.2700 R:10.0000 loss:0.0113
Episode:6790 meanR:9.2600 R:9.0000 loss:0.0087
Episode:6791 meanR:9.2700 R:10.0000 loss:0.0084
Episo

Episode:6944 meanR:9.3100 R:9.0000 loss:0.0083
Episode:6945 meanR:9.3100 R:10.0000 loss:0.0093
Episode:6946 meanR:9.3000 R:9.0000 loss:0.0109
Episode:6947 meanR:9.2900 R:9.0000 loss:0.0135
Episode:6948 meanR:9.2900 R:9.0000 loss:0.0129
Episode:6949 meanR:9.2900 R:10.0000 loss:0.0122
Episode:6950 meanR:9.3100 R:10.0000 loss:0.0100
Episode:6951 meanR:9.3100 R:10.0000 loss:0.0098
Episode:6952 meanR:9.3100 R:8.0000 loss:0.0099
Episode:6953 meanR:9.3200 R:10.0000 loss:0.0100
Episode:6954 meanR:9.3100 R:9.0000 loss:0.0103
Episode:6955 meanR:9.3000 R:9.0000 loss:0.0115
Episode:6956 meanR:9.2900 R:9.0000 loss:0.0094
Episode:6957 meanR:9.3000 R:9.0000 loss:0.0101
Episode:6958 meanR:9.3000 R:10.0000 loss:0.0081
Episode:6959 meanR:9.2900 R:9.0000 loss:0.0073
Episode:6960 meanR:9.3000 R:9.0000 loss:0.0077
Episode:6961 meanR:9.3100 R:9.0000 loss:0.0070
Episode:6962 meanR:9.3200 R:10.0000 loss:0.0068
Episode:6963 meanR:9.3300 R:10.0000 loss:0.0067
Episode:6964 meanR:9.3300 R:9.0000 loss:0.0066
Episo

Episode:7117 meanR:9.4500 R:9.0000 loss:0.0065
Episode:7118 meanR:9.4500 R:10.0000 loss:0.0059
Episode:7119 meanR:9.4400 R:8.0000 loss:0.0078
Episode:7120 meanR:9.4300 R:9.0000 loss:0.0068
Episode:7121 meanR:9.4400 R:10.0000 loss:0.0056
Episode:7122 meanR:9.4300 R:9.0000 loss:0.0063
Episode:7123 meanR:9.4300 R:10.0000 loss:0.0076
Episode:7124 meanR:9.4300 R:10.0000 loss:0.0080
Episode:7125 meanR:9.4300 R:9.0000 loss:0.0169
Episode:7126 meanR:9.4300 R:10.0000 loss:0.0168
Episode:7127 meanR:9.4300 R:9.0000 loss:0.0123
Episode:7128 meanR:9.4300 R:9.0000 loss:0.0191
Episode:7129 meanR:9.4300 R:10.0000 loss:0.0104
Episode:7130 meanR:9.4300 R:9.0000 loss:0.0266
Episode:7131 meanR:9.4300 R:9.0000 loss:0.0217
Episode:7132 meanR:9.4300 R:9.0000 loss:0.0110
Episode:7133 meanR:9.4400 R:10.0000 loss:0.0276
Episode:7134 meanR:9.4300 R:10.0000 loss:0.0138
Episode:7135 meanR:9.4400 R:9.0000 loss:0.0126
Episode:7136 meanR:9.4400 R:10.0000 loss:0.0186
Episode:7137 meanR:9.4400 R:9.0000 loss:0.0073
Epis

Episode:7290 meanR:9.3000 R:10.0000 loss:0.0108
Episode:7291 meanR:9.3000 R:9.0000 loss:0.0109
Episode:7292 meanR:9.3100 R:11.0000 loss:0.0099
Episode:7293 meanR:9.3100 R:10.0000 loss:0.0104
Episode:7294 meanR:9.3000 R:9.0000 loss:0.0071
Episode:7295 meanR:9.3100 R:9.0000 loss:0.0066
Episode:7296 meanR:9.2900 R:8.0000 loss:0.0059
Episode:7297 meanR:9.2800 R:8.0000 loss:0.0055
Episode:7298 meanR:9.2700 R:9.0000 loss:0.0060
Episode:7299 meanR:9.2600 R:9.0000 loss:0.0064
Episode:7300 meanR:9.2700 R:10.0000 loss:0.0060
Episode:7301 meanR:9.2700 R:9.0000 loss:0.0060
Episode:7302 meanR:9.2800 R:10.0000 loss:0.0063
Episode:7303 meanR:9.2900 R:10.0000 loss:0.0067
Episode:7304 meanR:9.2900 R:9.0000 loss:0.0131
Episode:7305 meanR:9.3000 R:10.0000 loss:0.0076
Episode:7306 meanR:9.2900 R:8.0000 loss:0.0073
Episode:7307 meanR:9.3100 R:11.0000 loss:0.0064
Episode:7308 meanR:9.3200 R:9.0000 loss:0.0070
Episode:7309 meanR:9.3200 R:10.0000 loss:0.0086
Episode:7310 meanR:9.3300 R:10.0000 loss:0.0089
Epi

Episode:7463 meanR:9.3200 R:9.0000 loss:0.0082
Episode:7464 meanR:9.3200 R:9.0000 loss:0.0067
Episode:7465 meanR:9.3100 R:10.0000 loss:0.0075
Episode:7466 meanR:9.3000 R:10.0000 loss:0.0072
Episode:7467 meanR:9.3000 R:9.0000 loss:0.0090
Episode:7468 meanR:9.2900 R:8.0000 loss:0.0081
Episode:7469 meanR:9.3000 R:10.0000 loss:0.0085
Episode:7470 meanR:9.2900 R:8.0000 loss:0.0083
Episode:7471 meanR:9.2900 R:10.0000 loss:0.0091
Episode:7472 meanR:9.2900 R:10.0000 loss:0.0090
Episode:7473 meanR:9.2900 R:10.0000 loss:0.0094
Episode:7474 meanR:9.3000 R:10.0000 loss:0.0092
Episode:7475 meanR:9.3000 R:10.0000 loss:0.0090
Episode:7476 meanR:9.3100 R:9.0000 loss:0.0103
Episode:7477 meanR:9.3100 R:10.0000 loss:0.0085
Episode:7478 meanR:9.3200 R:9.0000 loss:0.0076
Episode:7479 meanR:9.3300 R:9.0000 loss:0.0075
Episode:7480 meanR:9.3400 R:9.0000 loss:0.0077
Episode:7481 meanR:9.3500 R:11.0000 loss:0.0108
Episode:7482 meanR:9.3500 R:9.0000 loss:0.0096
Episode:7483 meanR:9.3500 R:9.0000 loss:0.0068
Epi

Episode:7636 meanR:9.3200 R:10.0000 loss:0.0073
Episode:7637 meanR:9.3200 R:9.0000 loss:0.0067
Episode:7638 meanR:9.3200 R:10.0000 loss:0.0063
Episode:7639 meanR:9.3200 R:9.0000 loss:0.0062
Episode:7640 meanR:9.3100 R:9.0000 loss:0.0065
Episode:7641 meanR:9.3100 R:10.0000 loss:0.0082
Episode:7642 meanR:9.2900 R:8.0000 loss:0.0066
Episode:7643 meanR:9.2700 R:8.0000 loss:0.0067
Episode:7644 meanR:9.2600 R:9.0000 loss:0.0069
Episode:7645 meanR:9.2800 R:10.0000 loss:0.0070
Episode:7646 meanR:9.2700 R:9.0000 loss:0.0079
Episode:7647 meanR:9.2600 R:9.0000 loss:0.0079
Episode:7648 meanR:9.2600 R:9.0000 loss:0.0101
Episode:7649 meanR:9.2500 R:9.0000 loss:0.0080
Episode:7650 meanR:9.2400 R:9.0000 loss:0.0058
Episode:7651 meanR:9.2600 R:10.0000 loss:0.0057
Episode:7652 meanR:9.2700 R:9.0000 loss:0.0055
Episode:7653 meanR:9.2600 R:8.0000 loss:0.0050
Episode:7654 meanR:9.2500 R:9.0000 loss:0.0067
Episode:7655 meanR:9.2500 R:10.0000 loss:0.0071
Episode:7656 meanR:9.2500 R:9.0000 loss:0.0055
Episode

Episode:7809 meanR:9.3900 R:9.0000 loss:0.0074
Episode:7810 meanR:9.3900 R:9.0000 loss:0.0084
Episode:7811 meanR:9.3900 R:9.0000 loss:0.0124
Episode:7812 meanR:9.4100 R:10.0000 loss:0.0068
Episode:7813 meanR:9.4000 R:9.0000 loss:0.0068
Episode:7814 meanR:9.4100 R:10.0000 loss:0.0079
Episode:7815 meanR:9.4000 R:9.0000 loss:0.0076
Episode:7816 meanR:9.3900 R:8.0000 loss:0.0076
Episode:7817 meanR:9.3900 R:10.0000 loss:0.0085
Episode:7818 meanR:9.4000 R:10.0000 loss:0.0085
Episode:7819 meanR:9.3900 R:8.0000 loss:0.0072
Episode:7820 meanR:9.3900 R:10.0000 loss:0.0070
Episode:7821 meanR:9.3800 R:9.0000 loss:0.0058
Episode:7822 meanR:9.3800 R:10.0000 loss:0.0058
Episode:7823 meanR:9.3700 R:8.0000 loss:0.0062
Episode:7824 meanR:9.3700 R:9.0000 loss:0.0063
Episode:7825 meanR:9.3600 R:8.0000 loss:0.0068
Episode:7826 meanR:9.3600 R:10.0000 loss:0.0062
Episode:7827 meanR:9.3700 R:10.0000 loss:0.0061
Episode:7828 meanR:9.3700 R:10.0000 loss:0.0058
Episode:7829 meanR:9.3800 R:10.0000 loss:0.0060
Epi

Episode:7982 meanR:9.4500 R:9.0000 loss:0.0061
Episode:7983 meanR:9.4400 R:9.0000 loss:0.0053
Episode:7984 meanR:9.4300 R:9.0000 loss:0.0057
Episode:7985 meanR:9.4100 R:8.0000 loss:0.0060
Episode:7986 meanR:9.4100 R:9.0000 loss:0.0059
Episode:7987 meanR:9.4100 R:9.0000 loss:0.0064
Episode:7988 meanR:9.4100 R:10.0000 loss:0.0074
Episode:7989 meanR:9.3900 R:8.0000 loss:0.0072
Episode:7990 meanR:9.3800 R:10.0000 loss:0.0082
Episode:7991 meanR:9.3900 R:10.0000 loss:0.0077
Episode:7992 meanR:9.3800 R:9.0000 loss:0.0074
Episode:7993 meanR:9.3900 R:10.0000 loss:0.0076
Episode:7994 meanR:9.3700 R:9.0000 loss:0.0083
Episode:7995 meanR:9.3800 R:10.0000 loss:0.0068
Episode:7996 meanR:9.3700 R:10.0000 loss:0.0069
Episode:7997 meanR:9.3800 R:9.0000 loss:0.0068
Episode:7998 meanR:9.3800 R:10.0000 loss:0.0056
Episode:7999 meanR:9.3700 R:9.0000 loss:0.0054
Episode:8000 meanR:9.3600 R:9.0000 loss:0.0058
Episode:8001 meanR:9.3600 R:9.0000 loss:0.0057
Episode:8002 meanR:9.3600 R:10.0000 loss:0.0064
Episo

Episode:8155 meanR:9.2900 R:10.0000 loss:0.0139
Episode:8156 meanR:9.2800 R:10.0000 loss:0.0079
Episode:8157 meanR:9.2800 R:9.0000 loss:0.0099
Episode:8158 meanR:9.2900 R:10.0000 loss:0.0067
Episode:8159 meanR:9.3000 R:10.0000 loss:0.0054
Episode:8160 meanR:9.3100 R:10.0000 loss:0.0070
Episode:8161 meanR:9.3100 R:10.0000 loss:0.0055
Episode:8162 meanR:9.2900 R:8.0000 loss:0.0056
Episode:8163 meanR:9.2900 R:9.0000 loss:0.0124
Episode:8164 meanR:9.2800 R:9.0000 loss:0.0070
Episode:8165 meanR:9.2900 R:10.0000 loss:0.0106
Episode:8166 meanR:9.2800 R:9.0000 loss:0.0152
Episode:8167 meanR:9.2700 R:9.0000 loss:0.0079
Episode:8168 meanR:9.2700 R:9.0000 loss:0.0135
Episode:8169 meanR:9.2600 R:9.0000 loss:0.0074
Episode:8170 meanR:9.2700 R:10.0000 loss:0.0110
Episode:8171 meanR:9.2800 R:10.0000 loss:0.0110
Episode:8172 meanR:9.2900 R:9.0000 loss:0.0074
Episode:8173 meanR:9.3000 R:10.0000 loss:0.0075
Episode:8174 meanR:9.2900 R:9.0000 loss:0.0061
Episode:8175 meanR:9.2800 R:9.0000 loss:0.0058
Epi

Episode:8328 meanR:9.2400 R:9.0000 loss:0.0054
Episode:8329 meanR:9.2500 R:10.0000 loss:0.0061
Episode:8330 meanR:9.2600 R:10.0000 loss:0.0072
Episode:8331 meanR:9.2700 R:9.0000 loss:0.0069
Episode:8332 meanR:9.2700 R:10.0000 loss:0.0043
Episode:8333 meanR:9.2700 R:10.0000 loss:0.0044
Episode:8334 meanR:9.2700 R:10.0000 loss:0.0057
Episode:8335 meanR:9.2600 R:8.0000 loss:0.0042
Episode:8336 meanR:9.2700 R:10.0000 loss:0.0050
Episode:8337 meanR:9.2600 R:8.0000 loss:0.0060
Episode:8338 meanR:9.2600 R:10.0000 loss:0.0038
Episode:8339 meanR:9.2600 R:10.0000 loss:0.0060
Episode:8340 meanR:9.2500 R:9.0000 loss:0.0053
Episode:8341 meanR:9.2600 R:9.0000 loss:0.0078
Episode:8342 meanR:9.2500 R:9.0000 loss:0.0074
Episode:8343 meanR:9.2700 R:10.0000 loss:0.0051
Episode:8344 meanR:9.2700 R:10.0000 loss:0.0051
Episode:8345 meanR:9.2800 R:10.0000 loss:0.0052
Episode:8346 meanR:9.2700 R:9.0000 loss:0.0051
Episode:8347 meanR:9.2500 R:8.0000 loss:0.0047
Episode:8348 meanR:9.2600 R:10.0000 loss:0.0056
E

Episode:8501 meanR:9.4300 R:10.0000 loss:0.0119
Episode:8502 meanR:9.4400 R:10.0000 loss:0.0087
Episode:8503 meanR:9.4400 R:9.0000 loss:0.0068
Episode:8504 meanR:9.4400 R:10.0000 loss:0.0081
Episode:8505 meanR:9.4200 R:8.0000 loss:0.0112
Episode:8506 meanR:9.4400 R:10.0000 loss:0.0120
Episode:8507 meanR:9.4300 R:8.0000 loss:0.0092
Episode:8508 meanR:9.4300 R:10.0000 loss:0.0104
Episode:8509 meanR:9.4400 R:10.0000 loss:0.0101
Episode:8510 meanR:9.4200 R:8.0000 loss:0.0097
Episode:8511 meanR:9.4300 R:10.0000 loss:0.0079
Episode:8512 meanR:9.4200 R:9.0000 loss:0.0080
Episode:8513 meanR:9.4100 R:9.0000 loss:0.0055
Episode:8514 meanR:9.4100 R:11.0000 loss:0.0065
Episode:8515 meanR:9.4100 R:10.0000 loss:0.0037
Episode:8516 meanR:9.4200 R:10.0000 loss:0.0047
Episode:8517 meanR:9.4200 R:10.0000 loss:0.0079
Episode:8518 meanR:9.4200 R:9.0000 loss:0.0057
Episode:8519 meanR:9.4300 R:10.0000 loss:0.0057
Episode:8520 meanR:9.4100 R:8.0000 loss:0.0065
Episode:8521 meanR:9.4000 R:9.0000 loss:0.0053
E

Episode:8674 meanR:9.4100 R:10.0000 loss:0.0038
Episode:8675 meanR:9.4200 R:10.0000 loss:0.0039
Episode:8676 meanR:9.4300 R:10.0000 loss:0.0041
Episode:8677 meanR:9.4300 R:9.0000 loss:0.0050
Episode:8678 meanR:9.4300 R:9.0000 loss:0.0048
Episode:8679 meanR:9.4500 R:10.0000 loss:0.0052
Episode:8680 meanR:9.4700 R:10.0000 loss:0.0040
Episode:8681 meanR:9.4700 R:8.0000 loss:0.0041
Episode:8682 meanR:9.4600 R:9.0000 loss:0.0042
Episode:8683 meanR:9.4500 R:9.0000 loss:0.0051
Episode:8684 meanR:9.4500 R:9.0000 loss:0.0052
Episode:8685 meanR:9.4600 R:10.0000 loss:0.0074
Episode:8686 meanR:9.4600 R:9.0000 loss:0.0062
Episode:8687 meanR:9.4800 R:10.0000 loss:0.0049
Episode:8688 meanR:9.4800 R:9.0000 loss:0.0064
Episode:8689 meanR:9.4900 R:10.0000 loss:0.0061
Episode:8690 meanR:9.4900 R:10.0000 loss:0.0062
Episode:8691 meanR:9.5000 R:10.0000 loss:0.0057
Episode:8692 meanR:9.4800 R:8.0000 loss:0.0061
Episode:8693 meanR:9.4800 R:10.0000 loss:0.0071
Episode:8694 meanR:9.4700 R:9.0000 loss:0.0071
Ep

Episode:8847 meanR:9.4500 R:10.0000 loss:0.0050
Episode:8848 meanR:9.4500 R:8.0000 loss:0.0049
Episode:8849 meanR:9.4400 R:9.0000 loss:0.0056
Episode:8850 meanR:9.4300 R:9.0000 loss:0.0074
Episode:8851 meanR:9.4200 R:9.0000 loss:0.0071
Episode:8852 meanR:9.4200 R:9.0000 loss:0.0079
Episode:8853 meanR:9.4200 R:10.0000 loss:0.0064
Episode:8854 meanR:9.4100 R:10.0000 loss:0.0062
Episode:8855 meanR:9.4200 R:10.0000 loss:0.0068
Episode:8856 meanR:9.4300 R:9.0000 loss:0.0069
Episode:8857 meanR:9.4400 R:9.0000 loss:0.0063
Episode:8858 meanR:9.4400 R:10.0000 loss:0.0057
Episode:8859 meanR:9.4300 R:9.0000 loss:0.0056
Episode:8860 meanR:9.4200 R:9.0000 loss:0.0080
Episode:8861 meanR:9.4100 R:9.0000 loss:0.0046
Episode:8862 meanR:9.4100 R:10.0000 loss:0.0049
Episode:8863 meanR:9.4200 R:10.0000 loss:0.0039
Episode:8864 meanR:9.4300 R:10.0000 loss:0.0038
Episode:8865 meanR:9.4300 R:9.0000 loss:0.0033
Episode:8866 meanR:9.4100 R:8.0000 loss:0.0037
Episode:8867 meanR:9.4100 R:9.0000 loss:0.0044
Episo

Episode:9020 meanR:9.3500 R:11.0000 loss:0.0047
Episode:9021 meanR:9.3500 R:9.0000 loss:0.0047
Episode:9022 meanR:9.3500 R:10.0000 loss:0.0043
Episode:9023 meanR:9.3700 R:11.0000 loss:0.0045
Episode:9024 meanR:9.3700 R:9.0000 loss:0.0048
Episode:9025 meanR:9.3600 R:9.0000 loss:0.0052
Episode:9026 meanR:9.3600 R:9.0000 loss:0.0135
Episode:9027 meanR:9.3700 R:10.0000 loss:0.0088
Episode:9028 meanR:9.3700 R:9.0000 loss:0.0053
Episode:9029 meanR:9.3800 R:9.0000 loss:0.0053
Episode:9030 meanR:9.3800 R:10.0000 loss:0.0043
Episode:9031 meanR:9.3800 R:10.0000 loss:0.0037
Episode:9032 meanR:9.3800 R:9.0000 loss:0.0113
Episode:9033 meanR:9.3800 R:9.0000 loss:0.0053
Episode:9034 meanR:9.3700 R:9.0000 loss:0.0039
Episode:9035 meanR:9.3700 R:10.0000 loss:0.0043
Episode:9036 meanR:9.3800 R:10.0000 loss:0.0035
Episode:9037 meanR:9.3700 R:8.0000 loss:0.0033
Episode:9038 meanR:9.3700 R:10.0000 loss:0.0030
Episode:9039 meanR:9.3800 R:10.0000 loss:0.0079
Episode:9040 meanR:9.3800 R:9.0000 loss:0.0065
Epi

Episode:9193 meanR:9.4500 R:9.0000 loss:0.0029
Episode:9194 meanR:9.4600 R:11.0000 loss:0.0034
Episode:9195 meanR:9.4500 R:9.0000 loss:0.0038
Episode:9196 meanR:9.4600 R:11.0000 loss:0.0035
Episode:9197 meanR:9.4600 R:10.0000 loss:0.0044
Episode:9198 meanR:9.4400 R:8.0000 loss:0.0038
Episode:9199 meanR:9.4300 R:8.0000 loss:0.0059
Episode:9200 meanR:9.4300 R:10.0000 loss:0.0072
Episode:9201 meanR:9.4300 R:9.0000 loss:0.0069
Episode:9202 meanR:9.4300 R:9.0000 loss:0.0135
Episode:9203 meanR:9.4200 R:9.0000 loss:0.0093
Episode:9204 meanR:9.4300 R:10.0000 loss:0.0134
Episode:9205 meanR:9.4400 R:10.0000 loss:0.0086
Episode:9206 meanR:9.4500 R:10.0000 loss:0.0079
Episode:9207 meanR:9.4500 R:9.0000 loss:0.0056
Episode:9208 meanR:9.4500 R:10.0000 loss:0.0036
Episode:9209 meanR:9.4600 R:9.0000 loss:0.0034
Episode:9210 meanR:9.4500 R:9.0000 loss:0.0026
Episode:9211 meanR:9.4500 R:9.0000 loss:0.0025
Episode:9212 meanR:9.4500 R:9.0000 loss:0.0048
Episode:9213 meanR:9.4500 R:9.0000 loss:0.0038
Episo

Episode:9366 meanR:9.3300 R:8.0000 loss:0.0085
Episode:9367 meanR:9.3100 R:8.0000 loss:0.0091
Episode:9368 meanR:9.3100 R:9.0000 loss:0.0096
Episode:9369 meanR:9.3200 R:9.0000 loss:0.0101
Episode:9370 meanR:9.3200 R:9.0000 loss:0.0100
Episode:9371 meanR:9.3100 R:9.0000 loss:0.0091
Episode:9372 meanR:9.3100 R:10.0000 loss:0.0094
Episode:9373 meanR:9.3000 R:8.0000 loss:0.0082
Episode:9374 meanR:9.2900 R:9.0000 loss:0.0082
Episode:9375 meanR:9.2900 R:10.0000 loss:0.0068
Episode:9376 meanR:9.3100 R:10.0000 loss:0.0037
Episode:9377 meanR:9.3000 R:9.0000 loss:0.0031
Episode:9378 meanR:9.3100 R:10.0000 loss:0.0032
Episode:9379 meanR:9.3000 R:9.0000 loss:0.0122
Episode:9380 meanR:9.3100 R:9.0000 loss:0.0043
Episode:9381 meanR:9.3000 R:9.0000 loss:0.0048
Episode:9382 meanR:9.3100 R:9.0000 loss:0.0030
Episode:9383 meanR:9.3300 R:11.0000 loss:0.0032
Episode:9384 meanR:9.3500 R:10.0000 loss:0.0029
Episode:9385 meanR:9.3500 R:9.0000 loss:0.0032
Episode:9386 meanR:9.3500 R:9.0000 loss:0.0030
Episode

Episode:9539 meanR:9.4000 R:9.0000 loss:0.0054
Episode:9540 meanR:9.4100 R:10.0000 loss:0.0051
Episode:9541 meanR:9.4000 R:9.0000 loss:0.0049
Episode:9542 meanR:9.4100 R:11.0000 loss:0.0056
Episode:9543 meanR:9.3900 R:9.0000 loss:0.0062
Episode:9544 meanR:9.3900 R:10.0000 loss:0.0041
Episode:9545 meanR:9.3800 R:8.0000 loss:0.0037
Episode:9546 meanR:9.3700 R:9.0000 loss:0.0067
Episode:9547 meanR:9.3800 R:11.0000 loss:0.0055
Episode:9548 meanR:9.3800 R:9.0000 loss:0.0058
Episode:9549 meanR:9.3800 R:10.0000 loss:0.0054
Episode:9550 meanR:9.3900 R:10.0000 loss:0.0044
Episode:9551 meanR:9.3900 R:10.0000 loss:0.0040
Episode:9552 meanR:9.4000 R:10.0000 loss:0.0041
Episode:9553 meanR:9.4100 R:9.0000 loss:0.0052
Episode:9554 meanR:9.4100 R:10.0000 loss:0.0063
Episode:9555 meanR:9.4100 R:10.0000 loss:0.0078
Episode:9556 meanR:9.4200 R:10.0000 loss:0.0066
Episode:9557 meanR:9.4400 R:10.0000 loss:0.0061
Episode:9558 meanR:9.4200 R:8.0000 loss:0.0125
Episode:9559 meanR:9.4300 R:10.0000 loss:0.0060


Episode:9712 meanR:9.4100 R:10.0000 loss:0.0025
Episode:9713 meanR:9.4100 R:8.0000 loss:0.0035
Episode:9714 meanR:9.4100 R:9.0000 loss:0.0033
Episode:9715 meanR:9.4000 R:8.0000 loss:0.0035
Episode:9716 meanR:9.3900 R:9.0000 loss:0.0044
Episode:9717 meanR:9.3900 R:9.0000 loss:0.0047
Episode:9718 meanR:9.3900 R:10.0000 loss:0.0042
Episode:9719 meanR:9.4000 R:10.0000 loss:0.0048
Episode:9720 meanR:9.4100 R:10.0000 loss:0.0050
Episode:9721 meanR:9.3800 R:8.0000 loss:0.0055
Episode:9722 meanR:9.3900 R:10.0000 loss:0.0052
Episode:9723 meanR:9.3900 R:9.0000 loss:0.0056
Episode:9724 meanR:9.4000 R:10.0000 loss:0.0051
Episode:9725 meanR:9.3900 R:8.0000 loss:0.0053
Episode:9726 meanR:9.3900 R:9.0000 loss:0.0047
Episode:9727 meanR:9.4100 R:10.0000 loss:0.0045
Episode:9728 meanR:9.4000 R:8.0000 loss:0.0048
Episode:9729 meanR:9.4000 R:9.0000 loss:0.0056
Episode:9730 meanR:9.3900 R:9.0000 loss:0.0085
Episode:9731 meanR:9.3800 R:8.0000 loss:0.0078
Episode:9732 meanR:9.3800 R:8.0000 loss:0.0067
Episod

Episode:9885 meanR:9.3800 R:9.0000 loss:0.0023
Episode:9886 meanR:9.3900 R:9.0000 loss:0.0025
Episode:9887 meanR:9.3900 R:9.0000 loss:0.0024
Episode:9888 meanR:9.4000 R:10.0000 loss:0.0024
Episode:9889 meanR:9.4000 R:9.0000 loss:0.0029
Episode:9890 meanR:9.4000 R:9.0000 loss:0.0028
Episode:9891 meanR:9.4000 R:9.0000 loss:0.0036
Episode:9892 meanR:9.4100 R:10.0000 loss:0.0034
Episode:9893 meanR:9.4200 R:10.0000 loss:0.0028
Episode:9894 meanR:9.4200 R:10.0000 loss:0.0031
Episode:9895 meanR:9.4400 R:10.0000 loss:0.0027
Episode:9896 meanR:9.4300 R:9.0000 loss:0.0022
Episode:9897 meanR:9.4300 R:9.0000 loss:0.0038
Episode:9898 meanR:9.4400 R:10.0000 loss:0.0066
Episode:9899 meanR:9.4400 R:9.0000 loss:0.0077
Episode:9900 meanR:9.4300 R:9.0000 loss:0.0110
Episode:9901 meanR:9.4300 R:9.0000 loss:0.0073
Episode:9902 meanR:9.4300 R:10.0000 loss:0.0117
Episode:9903 meanR:9.4300 R:9.0000 loss:0.0093
Episode:9904 meanR:9.4200 R:9.0000 loss:0.0064
Episode:9905 meanR:9.4200 R:10.0000 loss:0.0104
Episo

Episode:10057 meanR:9.4400 R:9.0000 loss:0.0054
Episode:10058 meanR:9.4400 R:9.0000 loss:0.0045
Episode:10059 meanR:9.4200 R:8.0000 loss:0.0059
Episode:10060 meanR:9.4100 R:9.0000 loss:0.0066
Episode:10061 meanR:9.4100 R:10.0000 loss:0.0055
Episode:10062 meanR:9.4000 R:8.0000 loss:0.0128
Episode:10063 meanR:9.4000 R:9.0000 loss:0.0103
Episode:10064 meanR:9.4000 R:10.0000 loss:0.0092
Episode:10065 meanR:9.4100 R:9.0000 loss:0.0104
Episode:10066 meanR:9.3900 R:8.0000 loss:0.0097
Episode:10067 meanR:9.4100 R:10.0000 loss:0.0116
Episode:10068 meanR:9.4000 R:9.0000 loss:0.0076
Episode:10069 meanR:9.3900 R:9.0000 loss:0.0055
Episode:10070 meanR:9.3900 R:9.0000 loss:0.0053
Episode:10071 meanR:9.3800 R:8.0000 loss:0.0048
Episode:10072 meanR:9.3900 R:10.0000 loss:0.0048
Episode:10073 meanR:9.3800 R:9.0000 loss:0.0082
Episode:10074 meanR:9.3800 R:10.0000 loss:0.0061
Episode:10075 meanR:9.3800 R:10.0000 loss:0.0074
Episode:10076 meanR:9.3700 R:10.0000 loss:0.0059
Episode:10077 meanR:9.3700 R:10.0

Episode:10227 meanR:9.3500 R:9.0000 loss:0.0035
Episode:10228 meanR:9.3500 R:9.0000 loss:0.0038
Episode:10229 meanR:9.3300 R:8.0000 loss:0.0093
Episode:10230 meanR:9.3400 R:9.0000 loss:0.0036
Episode:10231 meanR:9.3400 R:10.0000 loss:0.0034
Episode:10232 meanR:9.3400 R:9.0000 loss:0.0042
Episode:10233 meanR:9.3300 R:9.0000 loss:0.0033
Episode:10234 meanR:9.3100 R:9.0000 loss:0.0061
Episode:10235 meanR:9.3200 R:10.0000 loss:0.0066
Episode:10236 meanR:9.3200 R:9.0000 loss:0.0075
Episode:10237 meanR:9.3400 R:10.0000 loss:0.0044
Episode:10238 meanR:9.3500 R:11.0000 loss:0.0055
Episode:10239 meanR:9.3400 R:8.0000 loss:0.0074
Episode:10240 meanR:9.3500 R:10.0000 loss:0.0056
Episode:10241 meanR:9.3500 R:10.0000 loss:0.0047
Episode:10242 meanR:9.3400 R:9.0000 loss:0.0048
Episode:10243 meanR:9.3500 R:10.0000 loss:0.0046
Episode:10244 meanR:9.3600 R:9.0000 loss:0.0049
Episode:10245 meanR:9.3400 R:9.0000 loss:0.0040
Episode:10246 meanR:9.3400 R:10.0000 loss:0.0040
Episode:10247 meanR:9.3400 R:10.

Episode:10396 meanR:9.4800 R:10.0000 loss:0.0060
Episode:10397 meanR:9.4800 R:9.0000 loss:0.0080
Episode:10398 meanR:9.4700 R:9.0000 loss:0.0087
Episode:10399 meanR:9.4600 R:9.0000 loss:0.0058
Episode:10400 meanR:9.4500 R:9.0000 loss:0.0061
Episode:10401 meanR:9.4500 R:9.0000 loss:0.0063
Episode:10402 meanR:9.4500 R:10.0000 loss:0.0048
Episode:10403 meanR:9.4500 R:9.0000 loss:0.0037
Episode:10404 meanR:9.4400 R:8.0000 loss:0.0036
Episode:10405 meanR:9.4500 R:10.0000 loss:0.0093
Episode:10406 meanR:9.4500 R:9.0000 loss:0.0098
Episode:10407 meanR:9.4500 R:10.0000 loss:0.0089
Episode:10408 meanR:9.4500 R:10.0000 loss:0.0084
Episode:10409 meanR:9.4400 R:9.0000 loss:0.0087
Episode:10410 meanR:9.4400 R:10.0000 loss:0.0078
Episode:10411 meanR:9.4500 R:9.0000 loss:0.0078
Episode:10412 meanR:9.4600 R:10.0000 loss:0.0084
Episode:10413 meanR:9.4700 R:10.0000 loss:0.0076
Episode:10414 meanR:9.4600 R:8.0000 loss:0.0079
Episode:10415 meanR:9.4500 R:8.0000 loss:0.0089
Episode:10416 meanR:9.4700 R:10.

Episode:10565 meanR:9.4400 R:10.0000 loss:0.0021
Episode:10566 meanR:9.4400 R:9.0000 loss:0.0029
Episode:10567 meanR:9.4200 R:9.0000 loss:0.0043
Episode:10568 meanR:9.4200 R:9.0000 loss:0.0049
Episode:10569 meanR:9.4000 R:8.0000 loss:0.0040
Episode:10570 meanR:9.4100 R:10.0000 loss:0.0043
Episode:10571 meanR:9.4000 R:9.0000 loss:0.0044
Episode:10572 meanR:9.3900 R:8.0000 loss:0.0053
Episode:10573 meanR:9.3900 R:10.0000 loss:0.0043
Episode:10574 meanR:9.3900 R:9.0000 loss:0.0044
Episode:10575 meanR:9.3800 R:8.0000 loss:0.0053
Episode:10576 meanR:9.3700 R:8.0000 loss:0.0057
Episode:10577 meanR:9.3700 R:9.0000 loss:0.0058
Episode:10578 meanR:9.3700 R:9.0000 loss:0.0061
Episode:10579 meanR:9.3800 R:10.0000 loss:0.0059
Episode:10580 meanR:9.3700 R:9.0000 loss:0.0053
Episode:10581 meanR:9.3800 R:10.0000 loss:0.0040
Episode:10582 meanR:9.3800 R:9.0000 loss:0.0036
Episode:10583 meanR:9.3800 R:9.0000 loss:0.0047
Episode:10584 meanR:9.3700 R:9.0000 loss:0.0090
Episode:10585 meanR:9.3600 R:9.0000

Episode:10735 meanR:9.4000 R:8.0000 loss:0.0044
Episode:10736 meanR:9.3800 R:8.0000 loss:0.0050
Episode:10737 meanR:9.3600 R:8.0000 loss:0.0059
Episode:10738 meanR:9.3600 R:10.0000 loss:0.0078
Episode:10739 meanR:9.3500 R:8.0000 loss:0.0074
Episode:10740 meanR:9.3300 R:8.0000 loss:0.0061
Episode:10741 meanR:9.3400 R:10.0000 loss:0.0053
Episode:10742 meanR:9.3500 R:9.0000 loss:0.0055
Episode:10743 meanR:9.3500 R:10.0000 loss:0.0061
Episode:10744 meanR:9.3300 R:8.0000 loss:0.0054
Episode:10745 meanR:9.3200 R:10.0000 loss:0.0047
Episode:10746 meanR:9.3200 R:9.0000 loss:0.0048
Episode:10747 meanR:9.3100 R:9.0000 loss:0.0064
Episode:10748 meanR:9.3100 R:9.0000 loss:0.0062
Episode:10749 meanR:9.3000 R:9.0000 loss:0.0060
Episode:10750 meanR:9.3000 R:9.0000 loss:0.0063
Episode:10751 meanR:9.3000 R:10.0000 loss:0.0108
Episode:10752 meanR:9.2900 R:8.0000 loss:0.0043
Episode:10753 meanR:9.2900 R:8.0000 loss:0.0040
Episode:10754 meanR:9.3000 R:11.0000 loss:0.0048
Episode:10755 meanR:9.3000 R:9.000

Episode:10905 meanR:9.3800 R:10.0000 loss:0.0064
Episode:10906 meanR:9.3700 R:10.0000 loss:0.0104
Episode:10907 meanR:9.3600 R:10.0000 loss:0.0067
Episode:10908 meanR:9.3600 R:10.0000 loss:0.0053
Episode:10909 meanR:9.3700 R:10.0000 loss:0.0065
Episode:10910 meanR:9.3800 R:10.0000 loss:0.0068
Episode:10911 meanR:9.3900 R:10.0000 loss:0.0085
Episode:10912 meanR:9.3800 R:8.0000 loss:0.0088
Episode:10913 meanR:9.3900 R:10.0000 loss:0.0079
Episode:10914 meanR:9.3900 R:9.0000 loss:0.0092
Episode:10915 meanR:9.3900 R:9.0000 loss:0.0084
Episode:10916 meanR:9.3900 R:10.0000 loss:0.0081
Episode:10917 meanR:9.3800 R:9.0000 loss:0.0066
Episode:10918 meanR:9.4000 R:11.0000 loss:0.0058
Episode:10919 meanR:9.4200 R:10.0000 loss:0.0057
Episode:10920 meanR:9.4000 R:8.0000 loss:0.0055
Episode:10921 meanR:9.4000 R:10.0000 loss:0.0068
Episode:10922 meanR:9.3900 R:8.0000 loss:0.0066
Episode:10923 meanR:9.4000 R:10.0000 loss:0.0060
Episode:10924 meanR:9.4100 R:9.0000 loss:0.0063
Episode:10925 meanR:9.3900 

Episode:11075 meanR:9.2800 R:10.0000 loss:0.0029
Episode:11076 meanR:9.2700 R:8.0000 loss:0.0055
Episode:11077 meanR:9.2600 R:8.0000 loss:0.0127
Episode:11078 meanR:9.2500 R:9.0000 loss:0.0069
Episode:11079 meanR:9.2400 R:9.0000 loss:0.0138
Episode:11080 meanR:9.2600 R:10.0000 loss:0.0152
Episode:11081 meanR:9.2600 R:9.0000 loss:0.0109
Episode:11082 meanR:9.2800 R:11.0000 loss:0.0246
Episode:11083 meanR:9.2700 R:8.0000 loss:0.0150
Episode:11084 meanR:9.2700 R:8.0000 loss:0.0196
Episode:11085 meanR:9.2600 R:10.0000 loss:0.0283
Episode:11086 meanR:9.2600 R:10.0000 loss:0.0302
Episode:11087 meanR:9.2700 R:10.0000 loss:0.0241
Episode:11088 meanR:9.2700 R:9.0000 loss:0.0258
Episode:11089 meanR:9.2800 R:9.0000 loss:0.0366
Episode:11090 meanR:9.2900 R:10.0000 loss:0.0226
Episode:11091 meanR:9.2800 R:9.0000 loss:0.0334
Episode:11092 meanR:9.2700 R:9.0000 loss:0.0474
Episode:11093 meanR:9.2900 R:10.0000 loss:0.0309
Episode:11094 meanR:9.3000 R:10.0000 loss:0.0203
Episode:11095 meanR:9.2800 R:9.

# Visualizing training

Below I'll plot the total rewards for each episode. I'm plotting the rolling average too, in blue.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[N:] - cumsum[:-N]) / N 

In [None]:
eps, arr = np.array(episode_rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Average losses')

## Testing

Let's checkout how our trained agent plays the game.

In [38]:
import gym
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Episode/epoch
    for _ in range(10):
        total_reward = 0
        state = env.reset()
        initial_state = sess.run(model.initial_state) # Qs or current batch or states[:-1]
        
        # Steps/batches
        while True:
            env.render()
            action_logits, initial_state = sess.run([model.actions_logits, model.final_state],
                                                    feed_dict = {model.states: state.reshape([1, -1]), 
                                                                 model.initial_state: initial_state})
            action = np.argmax(action_logits)
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if done:
                break
        # At the end of each episode
        print('total_reward:{}'.format(total_reward))

# Close the env
env.close()

INFO:tensorflow:Restoring parameters from checkpoints/model.ckpt
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0


## Extending this

So, Cart-Pole is a pretty simple game. However, the same model can be used to train an agent to play something much more complicated like Pong or Space Invaders. Instead of a state like we're using here though, you'd want to use convolutional layers to get the state from the screen images.

![Deep Q-Learning Atari](assets/atari-network.png)

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. Here's the original paper which will get you started: http://www.davidqiu.com:8888/research/nature14236.pdf.