# Sequential DQN

In this notebook, we'll build a neural network that can learn to play games through reinforcement learning. More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

![Cart-Pole](assets/cart-pole.jpg)

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
# In this one we should define and detect GPUs for tensorflow
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

  from ._conv import register_converters as _register_converters


TensorFlow Version: 1.7.1
Default GPU Device: 


>**Note:** Make sure you have OpenAI Gym cloned into the same directory with this notebook. I've included `gym` as a submodule, so you can run `git submodule --init --recursive` to pull the contents into the `gym` repo.

>**Note:** Make sure you have OpenAI Gym cloned. Then run this command `pip install -e gym/[all]`.

In [2]:
import gym

# Create the Cart-Pole game environment
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

We interact with the simulation through `env`. To show the simulation running, you can use `env.render()` to render one frame. Passing in an action as an integer to `env.step` will generate the next step in the simulation.  You can see how many actions are possible from `env.action_space` and to get a random action you can use `env.action_space.sample()`. This is general to all Gym games. In the Cart-Pole game, there are two possible actions, moving the cart left or right. So there are two actions we can take, encoded as 0 and 1.

Run the code below to watch the simulation run.

In [3]:
import numpy as np
state = env.reset()
for _ in range(10):
    # env.render()
    action = env.action_space.sample()
    next_state, reward, done, info = env.step(action) # take a random action
    #print('state, action, next_state, reward, done, info:', state, action, next_state, reward, done, info)
    state = next_state
    if done:
        state = env.reset()

To shut the window showing the simulation, use `env.close()`.

If you ran the simulation above, we can look at the rewards:

In [4]:
# print(rewards[-20:])
# print(np.array(rewards).shape, np.array(states).shape, np.array(actions).shape, np.array(dones).shape)
# print(np.array(rewards).dtype, np.array(states).dtype, np.array(actions).dtype, np.array(dones).dtype)
# print(np.max(np.array(actions)), np.min(np.array(actions)))
# print((np.max(np.array(actions)) - np.min(np.array(actions)))+1)
# print(np.max(np.array(rewards)), np.min(np.array(rewards)))
# print(np.max(np.array(states)), np.min(np.array(states)))

The game resets after the pole has fallen past a certain angle. For each frame while the simulation is running, it returns a reward of 1.0. The longer the game runs, the more reward we get. Then, our network's goal is to maximize the reward by keeping the pole vertical. It will do this by moving the cart to the left and the right.

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max{Q(s', a')}
$$

where $s$ is a state, $a$ is an action, and $s'$ is the next state from state $s$ and action $a$.

Before we used this equation to learn values for a Q-_table_. However, for this game there are a huge number of states available. The state has four values: the position and velocity of the cart, and the position and velocity of the pole. These are all real-valued numbers, so ignoring floating point precisions, you practically have infinite states. Instead of using a table then, we'll replace it with a neural network that will approximate the Q-table lookup function.

<img src="assets/deep-q-learning.png" width=450px>

Now, our Q value, $Q(s, a)$ is calculated by passing in a state to the network. The output will be Q-values for each available action, with fully connected hidden layers.

<img src="assets/q-network.png" width=550px>


As I showed before, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max{Q(s', a')}$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$. 

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.

Below is my implementation of the Q-network. I used two fully connected layers with ReLU activations. Two seems to be good enough, three might be better. Feel free to try it out.

In [5]:
def model_input(state_size, lstm_size, batch_size=1):
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    actions = tf.placeholder(tf.int32, [None], name='actions')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    # RNN
    gru = tf.nn.rnn_cell.GRUCell(lstm_size)
    cell = tf.nn.rnn_cell.MultiRNNCell([gru], state_is_tuple=False)
    initial_state = cell.zero_state(batch_size, tf.float32)
    return states, actions, targetQs, cell, initial_state

In [6]:
# RNN generator or sequence generator
def generator(states, num_classes, initial_state, cell, lstm_size, reuse=False): 
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        inputs = tf.layers.dense(inputs=states, units=lstm_size)
        print(states.shape, inputs.shape)
        
        # with tf.variable_scope('dynamic_rnn_', reuse=tf.AUTO_REUSE):
        # dynamic means adapt to the batch_size
        inputs_rnn = tf.reshape(inputs, [1, -1, lstm_size]) # NxH -> 1xNxH
        print(inputs_rnn.shape, initial_state.shape)
        outputs_rnn, final_state = tf.nn.dynamic_rnn(cell=cell, inputs=inputs_rnn, initial_state=initial_state)
        print(outputs_rnn.shape, final_state.shape)
        outputs = tf.reshape(outputs_rnn, [-1, lstm_size]) # 1xNxH -> NxH
        print(outputs.shape)

        # Last fully connected layer
        logits = tf.layers.dense(inputs=outputs, units=num_classes)
        print(logits.shape)
        #predictions = tf.nn.softmax(logits)
        
        # logits are the action logits
        return logits, final_state

In [7]:
def model_loss(action_size, hidden_size, states, cell, initial_state, actions, targetQs):
    actions_logits, final_state = generator(states=states, cell=cell, initial_state=initial_state, 
                                            lstm_size=hidden_size, num_classes=action_size)
    actions_labels = tf.one_hot(indices=actions, depth=action_size, dtype=actions_logits.dtype)
    Qs = tf.reduce_max(actions_logits*actions_labels, axis=1)
    loss = tf.reduce_mean(tf.square(Qs - targetQs))
    return actions_logits, final_state, loss

In [8]:
def model_opt(loss, learning_rate):
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]

    # # Optimize
    # with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
    # #opt = tf.train.AdamOptimizer(learning_rate).minimize(loss, var_list=g_vars)

    #grads, _ = tf.clip_by_global_norm(t_list=tf.gradients(loss, g_vars), clip_norm=5) # usually around 1-5
    grads = tf.gradients(loss, g_vars)
    opt = tf.train.AdamOptimizer(learning_rate).apply_gradients(grads_and_vars=zip(grads, g_vars))

    return opt

In [9]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs, cell, self.initial_state = model_input(
            state_size=state_size, lstm_size=hidden_size)
        
        # Create the Model: calculating the loss and forwad pass
        self.actions_logits, self.final_state, self.loss = model_loss(
            action_size=action_size, hidden_size=hidden_size, 
            states=self.states, actions=self.actions, 
            targetQs=self.targetQs, cell=cell, initial_state=self.initial_state)

        # Update the model: backward pass and backprop
        self.opt = model_opt(loss=self.loss, learning_rate=learning_rate)

## Experience replay

Reinforcement learning algorithms can have stability issues due to correlations between states. To reduce correlations when training, we can store the agent's experiences and later draw a random mini-batch of those experiences to train on. 

Here, we'll create a `Memory` object that will store our experiences, our transitions $<s, a, r, s'>$. This memory will have a maxmium capacity, so we can keep newer experiences in memory while getting rid of older experiences. Then, we'll sample a random mini-batch of transitions $<s, a, r, s'>$ and train on those.

Below, I've implemented a `Memory` object. If you're unfamiliar with `deque`, this is a double-ended queue. You can think of it like a tube open on both sides. You can put objects in either side of the tube. But if it's full, adding anything more will push an object out the other side. This is a great data structure to use for the memory buffer.

In [10]:
from collections import deque

class Memory():    
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
        self.states = deque(maxlen=max_size)
    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), 
                               size=batch_size, 
                               replace=False)
        return [self.buffer[ii] for ii in idx], [self.states[ii] for ii in idx]

## Exploration - Exploitation

To learn about the environment and rules of the game, the agent needs to explore by taking random actions. We'll do this by choosing a random action with some probability $\epsilon$ (epsilon).  That is, with some probability $\epsilon$ the agent will make a random action and with probability $1 - \epsilon$, the agent will choose an action from $Q(s,a)$. This is called an **$\epsilon$-greedy policy**.


At first, the agent needs to do a lot of exploring. Later when it has learned more, the agent can favor choosing actions based on what it has learned. This is called _exploitation_. We'll set it up so the agent is more likely to explore early in training, then more likely to exploit later in training.

## Q-Learning training algorithm

Putting all this together, we can list out the algorithm we'll use to train the network. We'll train the network in _episodes_. One *episode* is one simulation of the game. For this game, the goal is to keep the pole upright for 195 frames. So we can start a new episode once meeting that goal. The game ends if the pole tilts over too far, or if the cart moves too far the left or right. When a game ends, we'll start a new episode. Now, to train the agent:

* Initialize the memory $D$
* Initialize the action-value network $Q$ with random weights
* **For** episode = 1, $M$ **do**
  * **For** $t$, $T$ **do**
     * With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \mathrm{argmax}_a Q(s,a)$
     * Execute action $a_t$ in simulator and observe reward $r_{t+1}$ and new state $s_{t+1}$
     * Store transition $<s_t, a_t, r_{t+1}, s_{t+1}>$ in memory $D$
     * Sample random mini-batch from $D$: $<s_j, a_j, r_j, s'_j>$
     * Set $\hat{Q}_j = r_j$ if the episode ends at $j+1$, otherwise set $\hat{Q}_j = r_j + \gamma \max_{a'}{Q(s'_j, a')}$
     * Make a gradient descent step with loss $(\hat{Q}_j - Q(s_j, a_j))^2$
  * **endfor**
* **endfor**

## Hyperparameters

One of the more difficult aspects of reinforcememt learning are the large number of hyperparameters. Not only are we tuning the network, but we're tuning the simulation.

In [11]:
# print('state:', np.array(states).shape[1], 
#       'action size: {}'.format((np.max(np.array(actions)) - np.min(np.array(actions)))+1))

In [12]:
# Network parameters
action_size = 2
state_size = 4
hidden_size = 64               # number of units in each Q-network hidden layer
learning_rate = 0.0001         # Q-network learning rate

# Memory parameters
memory_size = 128            # memory capacity - 1000 DQN
batch_size = 128             # experience mini-batch size - 20 DQN
gamma = 0.99                 # future reward discount

In [13]:
# Reset/init the graph/session
graph = tf.reset_default_graph()

# Init the model
model = Model(action_size=action_size, hidden_size=hidden_size, state_size=state_size, learning_rate=learning_rate)

# Init the memory
memory = Memory(max_size=batch_size)

(?, 4) (?, 64)
(1, ?, 64) (1, 64)
(1, ?, 64) (1, 64)
(?, 64)
(?, 2)


## Populate the memory (exprience memory)

Here I'm re-initializing the simulation and pre-populating the memory. The agent is taking random actions and storing the transitions in memory. This will help the agent with exploring the game.

In [14]:
state = env.reset()
for _ in range(memory_size):
    action = env.action_space.sample()
    next_state, reward, done, _ = env.step(action)
    memory.buffer.append([state, action, next_state, reward, float(done)])
    memory.states.append(np.zeros([1, hidden_size]))
    state = next_state
    if done is True:
        # Reseting the env/first state
        state = env.reset()

In [15]:
# # Training
# batch = memory.buffer
# states = np.array([each[0] for each in batch])
# actions = np.array([each[1] for each in batch])
# next_states = np.array([each[2] for each in batch])
# rewards = np.array([each[3] for each in batch])
# dones = np.array([each[4] for each in batch])

## Training the model

Below we'll train our agent. If you want to watch it train, uncomment the `env.render()` line. This is slow because it's rendering the frames slower than the network can train. But, it's cool to watch the agent get better at the game.

In [16]:
# initial_states = np.array(memory.states)
# initial_states.shape

In [None]:
saver = tf.train.Saver()
episode_rewards_list, rewards_list, loss_list = [], [], []

# TF session for training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    episode_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        loss_batch = []
        state = env.reset()
        initial_state = sess.run(model.initial_state)

        # Training steps/batches
        while True:
            action_logits, final_state = sess.run([model.actions_logits, model.final_state],
                                                  feed_dict = {model.states: state.reshape([1, -1]), 
                                                               model.initial_state: initial_state})
            action = np.argmax(action_logits)
            next_state, reward, done, _ = env.step(action)
            memory.buffer.append([state, action, next_state, reward, float(done)])
            memory.states.append(initial_state)
            total_reward += reward
            state = next_state
            initial_state = final_state

            # Training
            batch = memory.buffer
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            next_states = np.array([each[2] for each in batch])
            rewards = np.array([each[3] for each in batch])
            dones = np.array([each[4] for each in batch])
            #initial_states = np.array(memory.states)
            initial_states = memory.states
            next_actions_logits = sess.run(model.actions_logits, 
                                           feed_dict = {model.states: next_states,
                                                        model.initial_state: initial_states[1]})
            nextQs = np.max(next_actions_logits, axis=1) * (1-dones)
            targetQs = rewards + (gamma * nextQs)
            loss, _ = sess.run([model.loss, model.opt], feed_dict = {model.states: states, 
                                                                     model.actions: actions,
                                                                     model.targetQs: targetQs,
                                                                     model.initial_state: initial_states[0]})
            # End of training
            loss_batch.append(loss)
            if done is True:
                break
                
        # Outputing: priting out/Potting
        episode_reward.append(total_reward)
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episode_reward)),
              'R:{:.4f}'.format(total_reward),
              'loss:{:.4f}'.format(np.mean(loss_batch)))
        # Ploting out
        episode_rewards_list.append([ep, np.mean(episode_reward)])
        rewards_list.append([ep, total_reward])
        loss_list.append([ep, np.mean(loss_batch)])
        # Break episode/epoch loop
        if np.mean(episode_reward) >= 500:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model.ckpt')

Episode:0 meanR:52.0000 R:52.0000 loss:1.1622
Episode:1 meanR:45.5000 R:39.0000 loss:0.9973
Episode:2 meanR:41.3333 R:33.0000 loss:1.0668
Episode:3 meanR:41.5000 R:42.0000 loss:1.0128
Episode:4 meanR:41.0000 R:39.0000 loss:1.3224
Episode:5 meanR:37.6667 R:21.0000 loss:1.6030
Episode:6 meanR:34.7143 R:17.0000 loss:1.8615
Episode:7 meanR:32.6250 R:18.0000 loss:2.1240
Episode:8 meanR:31.2222 R:20.0000 loss:2.4263
Episode:9 meanR:29.7000 R:16.0000 loss:2.7760
Episode:10 meanR:28.8182 R:20.0000 loss:2.9228
Episode:11 meanR:27.9167 R:18.0000 loss:3.1931
Episode:12 meanR:27.6154 R:24.0000 loss:3.2269
Episode:13 meanR:27.4286 R:25.0000 loss:3.1949
Episode:14 meanR:27.4667 R:28.0000 loss:3.2389
Episode:15 meanR:26.9375 R:19.0000 loss:3.3012
Episode:16 meanR:26.5882 R:21.0000 loss:3.4123
Episode:17 meanR:26.4444 R:24.0000 loss:3.4982
Episode:18 meanR:26.3684 R:25.0000 loss:3.6283
Episode:19 meanR:26.3000 R:25.0000 loss:3.6873
Episode:20 meanR:26.2857 R:26.0000 loss:3.8464
Episode:21 meanR:26.000

Episode:173 meanR:37.2800 R:102.0000 loss:0.7342
Episode:174 meanR:37.5100 R:34.0000 loss:1.7905
Episode:175 meanR:37.7000 R:31.0000 loss:9.7223
Episode:176 meanR:37.8400 R:27.0000 loss:16.4098
Episode:177 meanR:38.4700 R:78.0000 loss:19.2473
Episode:178 meanR:38.5800 R:24.0000 loss:13.5344
Episode:179 meanR:38.6300 R:20.0000 loss:15.1545
Episode:180 meanR:38.6800 R:18.0000 loss:17.6511
Episode:181 meanR:38.7700 R:24.0000 loss:15.7806
Episode:182 meanR:39.7100 R:108.0000 loss:72.1386
Episode:183 meanR:40.7300 R:120.0000 loss:9.9799
Episode:184 meanR:40.8500 R:31.0000 loss:9.6109
Episode:185 meanR:41.0100 R:33.0000 loss:4.7333
Episode:186 meanR:41.1900 R:31.0000 loss:8.5522
Episode:187 meanR:42.7900 R:176.0000 loss:8.4791
Episode:188 meanR:43.7900 R:117.0000 loss:9.4129
Episode:189 meanR:43.8500 R:20.0000 loss:12.7997
Episode:190 meanR:43.8300 R:18.0000 loss:15.7790
Episode:191 meanR:43.9000 R:25.0000 loss:18.9757
Episode:192 meanR:44.9800 R:130.0000 loss:13.3513
Episode:193 meanR:45.95

Episode:339 meanR:289.9600 R:500.0000 loss:15.9674
Episode:340 meanR:294.1800 R:500.0000 loss:15.6195
Episode:341 meanR:298.5100 R:500.0000 loss:15.6869
Episode:342 meanR:300.6600 R:272.0000 loss:28.6322
Episode:343 meanR:305.0600 R:487.0000 loss:15.3149
Episode:344 meanR:307.9000 R:339.0000 loss:22.5444
Episode:345 meanR:309.2300 R:192.0000 loss:27.9462
Episode:346 meanR:310.2800 R:159.0000 loss:19.8253
Episode:347 meanR:310.9000 R:98.0000 loss:20.1440
Episode:348 meanR:310.9900 R:54.0000 loss:68.2759
Episode:349 meanR:311.2400 R:51.0000 loss:66.1013
Episode:350 meanR:315.8100 R:500.0000 loss:7.2205
Episode:351 meanR:320.3200 R:500.0000 loss:4.7770
Episode:352 meanR:323.0600 R:307.0000 loss:25.1335
Episode:353 meanR:326.6400 R:500.0000 loss:6.3606
Episode:354 meanR:327.8500 R:500.0000 loss:14.5608
Episode:355 meanR:328.7300 R:500.0000 loss:6.2355
Episode:356 meanR:328.0000 R:83.0000 loss:55.3152
Episode:357 meanR:331.6600 R:500.0000 loss:23.0249
Episode:358 meanR:335.3900 R:500.0000 l

Episode:500 meanR:339.3000 R:11.0000 loss:340.3174
Episode:501 meanR:334.4000 R:10.0000 loss:341.3881
Episode:502 meanR:329.4900 R:9.0000 loss:336.7712
Episode:503 meanR:324.5800 R:9.0000 loss:343.1965
Episode:504 meanR:319.6900 R:11.0000 loss:347.2753
Episode:505 meanR:318.0200 R:10.0000 loss:347.4333
Episode:506 meanR:313.1100 R:9.0000 loss:346.4355
Episode:507 meanR:308.2000 R:9.0000 loss:347.8940
Episode:508 meanR:306.5200 R:9.0000 loss:350.7670
Episode:509 meanR:301.6100 R:9.0000 loss:351.6598
Episode:510 meanR:296.7100 R:10.0000 loss:347.6264
Episode:511 meanR:291.8100 R:10.0000 loss:344.3522
Episode:512 meanR:286.8900 R:8.0000 loss:336.9310
Episode:513 meanR:281.9800 R:9.0000 loss:338.4356
Episode:514 meanR:279.3000 R:10.0000 loss:330.6525
Episode:515 meanR:274.3900 R:9.0000 loss:320.4992
Episode:516 meanR:269.4800 R:9.0000 loss:310.7948
Episode:517 meanR:264.5700 R:9.0000 loss:299.8535
Episode:518 meanR:259.6600 R:9.0000 loss:291.5416
Episode:519 meanR:254.7500 R:9.0000 loss:28

Episode:666 meanR:17.8200 R:20.0000 loss:10.6096
Episode:667 meanR:17.9100 R:22.0000 loss:10.2937
Episode:668 meanR:17.9500 R:18.0000 loss:11.0787
Episode:669 meanR:18.0500 R:23.0000 loss:10.6568
Episode:670 meanR:18.1300 R:23.0000 loss:10.6074
Episode:671 meanR:18.2000 R:20.0000 loss:10.5958
Episode:672 meanR:18.2500 R:18.0000 loss:10.6981
Episode:673 meanR:18.3400 R:21.0000 loss:10.1802
Episode:674 meanR:18.4200 R:20.0000 loss:10.5114
Episode:675 meanR:18.5000 R:21.0000 loss:10.1636
Episode:676 meanR:18.6000 R:22.0000 loss:9.7167
Episode:677 meanR:18.6700 R:22.0000 loss:10.2270
Episode:678 meanR:18.7600 R:21.0000 loss:9.9439
Episode:679 meanR:18.7900 R:19.0000 loss:9.6087
Episode:680 meanR:18.8700 R:22.0000 loss:9.3588
Episode:681 meanR:18.9300 R:19.0000 loss:9.0355
Episode:682 meanR:19.0000 R:23.0000 loss:8.6183
Episode:683 meanR:19.0700 R:19.0000 loss:8.7489
Episode:684 meanR:19.1300 R:19.0000 loss:8.9008
Episode:685 meanR:19.2100 R:20.0000 loss:8.6647
Episode:686 meanR:19.2800 R:2

Episode:836 meanR:19.2000 R:20.0000 loss:38.0785
Episode:837 meanR:19.1700 R:18.0000 loss:39.8385
Episode:838 meanR:19.1300 R:17.0000 loss:37.4982
Episode:839 meanR:19.0900 R:18.0000 loss:36.1362
Episode:840 meanR:19.0700 R:19.0000 loss:39.2880
Episode:841 meanR:19.0600 R:17.0000 loss:37.6500
Episode:842 meanR:19.0300 R:17.0000 loss:35.0668
Episode:843 meanR:18.9900 R:18.0000 loss:27.7230
Episode:844 meanR:18.9800 R:19.0000 loss:32.8657
Episode:845 meanR:18.9800 R:19.0000 loss:36.5918
Episode:846 meanR:18.9600 R:18.0000 loss:36.8581
Episode:847 meanR:18.9000 R:15.0000 loss:38.9546
Episode:848 meanR:18.8500 R:16.0000 loss:41.5162
Episode:849 meanR:18.8200 R:17.0000 loss:44.8396
Episode:850 meanR:18.7600 R:16.0000 loss:49.4894
Episode:851 meanR:18.7400 R:17.0000 loss:52.3739
Episode:852 meanR:18.7300 R:18.0000 loss:49.2151
Episode:853 meanR:18.7100 R:18.0000 loss:52.3307
Episode:854 meanR:18.6600 R:17.0000 loss:52.6387
Episode:855 meanR:18.6400 R:18.0000 loss:46.6638
Episode:856 meanR:18

Episode:1004 meanR:14.4500 R:15.0000 loss:3.5953
Episode:1005 meanR:14.4300 R:14.0000 loss:2.8041
Episode:1006 meanR:14.4100 R:14.0000 loss:4.2447
Episode:1007 meanR:14.4200 R:15.0000 loss:12.1262
Episode:1008 meanR:14.4000 R:14.0000 loss:22.0749
Episode:1009 meanR:14.3900 R:14.0000 loss:24.6699
Episode:1010 meanR:14.3800 R:13.0000 loss:18.6150
Episode:1011 meanR:14.3400 R:13.0000 loss:16.2628
Episode:1012 meanR:14.3400 R:15.0000 loss:15.4490
Episode:1013 meanR:14.3300 R:13.0000 loss:14.1519
Episode:1014 meanR:14.3300 R:15.0000 loss:12.6924
Episode:1015 meanR:14.3100 R:14.0000 loss:8.2318
Episode:1016 meanR:14.3000 R:13.0000 loss:7.6791
Episode:1017 meanR:14.2800 R:14.0000 loss:8.1962
Episode:1018 meanR:14.2600 R:15.0000 loss:7.6384
Episode:1019 meanR:14.2600 R:13.0000 loss:6.9418
Episode:1020 meanR:14.2400 R:13.0000 loss:5.6737
Episode:1021 meanR:14.2500 R:16.0000 loss:5.0334
Episode:1022 meanR:14.2600 R:14.0000 loss:5.3818
Episode:1023 meanR:14.2700 R:15.0000 loss:5.0955
Episode:1024

Episode:1168 meanR:112.9600 R:10.0000 loss:180.3252
Episode:1169 meanR:112.7800 R:8.0000 loss:171.0389
Episode:1170 meanR:112.6100 R:8.0000 loss:175.2194
Episode:1171 meanR:112.3700 R:10.0000 loss:194.7549
Episode:1172 meanR:112.1700 R:10.0000 loss:225.1641
Episode:1173 meanR:111.8700 R:8.0000 loss:244.7726
Episode:1174 meanR:111.5200 R:9.0000 loss:247.2076
Episode:1175 meanR:111.3600 R:11.0000 loss:233.6577
Episode:1176 meanR:111.1600 R:10.0000 loss:223.3749
Episode:1177 meanR:110.9600 R:10.0000 loss:214.3481
Episode:1178 meanR:110.7600 R:9.0000 loss:211.5576
Episode:1179 meanR:110.5200 R:10.0000 loss:241.0211
Episode:1180 meanR:110.2800 R:10.0000 loss:247.6936
Episode:1181 meanR:110.1000 R:10.0000 loss:244.3373
Episode:1182 meanR:109.9400 R:11.0000 loss:237.3374
Episode:1183 meanR:109.7000 R:10.0000 loss:213.5377
Episode:1184 meanR:109.4300 R:9.0000 loss:187.9494
Episode:1185 meanR:108.9600 R:9.0000 loss:184.3361
Episode:1186 meanR:108.6100 R:9.0000 loss:189.7935
Episode:1187 meanR:1

Episode:1331 meanR:9.9800 R:10.0000 loss:251.2981
Episode:1332 meanR:9.9800 R:9.0000 loss:246.3806
Episode:1333 meanR:9.9800 R:11.0000 loss:250.5294
Episode:1334 meanR:10.0000 R:12.0000 loss:238.3743
Episode:1335 meanR:10.0100 R:10.0000 loss:219.8756
Episode:1336 meanR:10.0200 R:11.0000 loss:205.3947
Episode:1337 meanR:10.0300 R:11.0000 loss:192.2325
Episode:1338 meanR:10.0500 R:11.0000 loss:204.6201
Episode:1339 meanR:10.0700 R:12.0000 loss:235.2532
Episode:1340 meanR:10.0800 R:10.0000 loss:223.2649
Episode:1341 meanR:10.1300 R:15.0000 loss:255.8900
Episode:1342 meanR:10.1300 R:11.0000 loss:276.8381
Episode:1343 meanR:10.1400 R:12.0000 loss:264.0474
Episode:1344 meanR:10.2100 R:17.0000 loss:263.6966
Episode:1345 meanR:10.2500 R:14.0000 loss:290.4488
Episode:1346 meanR:10.3200 R:17.0000 loss:345.7412
Episode:1347 meanR:10.3400 R:13.0000 loss:371.9074
Episode:1348 meanR:10.3700 R:13.0000 loss:379.0882
Episode:1349 meanR:10.4100 R:13.0000 loss:360.4068
Episode:1350 meanR:10.5900 R:27.000

Episode:1491 meanR:462.8900 R:500.0000 loss:18.8890
Episode:1492 meanR:459.2400 R:135.0000 loss:80.5810
Episode:1493 meanR:459.2400 R:500.0000 loss:15.3880
Episode:1494 meanR:459.2400 R:500.0000 loss:15.6941
Episode:1495 meanR:459.2400 R:500.0000 loss:20.8895
Episode:1496 meanR:459.2400 R:500.0000 loss:18.8128
Episode:1497 meanR:459.2400 R:500.0000 loss:17.1775
Episode:1498 meanR:459.2400 R:500.0000 loss:16.7552
Episode:1499 meanR:459.2400 R:500.0000 loss:19.4401
Episode:1500 meanR:460.2200 R:500.0000 loss:19.0735
Episode:1501 meanR:460.2200 R:500.0000 loss:16.1950
Episode:1502 meanR:459.8200 R:126.0000 loss:75.9136
Episode:1503 meanR:455.0100 R:19.0000 loss:72.5444
Episode:1504 meanR:451.3000 R:129.0000 loss:96.5180
Episode:1505 meanR:447.6700 R:137.0000 loss:29.3659
Episode:1506 meanR:443.9500 R:128.0000 loss:8.6900
Episode:1507 meanR:446.8800 R:500.0000 loss:2.9494
Episode:1508 meanR:450.7600 R:500.0000 loss:17.3400
Episode:1509 meanR:454.6700 R:500.0000 loss:22.2095
Episode:1510 me

Episode:1649 meanR:435.4300 R:500.0000 loss:16.5732
Episode:1650 meanR:435.4300 R:500.0000 loss:18.3569
Episode:1651 meanR:435.4300 R:500.0000 loss:15.9977
Episode:1652 meanR:435.4300 R:500.0000 loss:15.7517
Episode:1653 meanR:435.4300 R:500.0000 loss:18.0780
Episode:1654 meanR:435.4300 R:500.0000 loss:15.8211
Episode:1655 meanR:435.4300 R:500.0000 loss:15.3664
Episode:1656 meanR:435.4300 R:500.0000 loss:17.4847
Episode:1657 meanR:435.4300 R:500.0000 loss:19.4509
Episode:1658 meanR:435.4300 R:500.0000 loss:18.5019
Episode:1659 meanR:435.4300 R:500.0000 loss:17.8561
Episode:1660 meanR:431.7300 R:130.0000 loss:43.6907
Episode:1661 meanR:431.7300 R:500.0000 loss:4.2151
Episode:1662 meanR:431.7300 R:500.0000 loss:16.1250
Episode:1663 meanR:431.7300 R:500.0000 loss:18.8814
Episode:1664 meanR:431.7300 R:500.0000 loss:12.7218
Episode:1665 meanR:431.7300 R:500.0000 loss:14.6357
Episode:1666 meanR:433.8800 R:500.0000 loss:10.1968
Episode:1667 meanR:437.3700 R:500.0000 loss:14.4117
Episode:1668 

Episode:1809 meanR:195.5300 R:500.0000 loss:15.8915
Episode:1810 meanR:195.5300 R:500.0000 loss:18.0478
Episode:1811 meanR:191.4000 R:87.0000 loss:66.8313
Episode:1812 meanR:187.1900 R:79.0000 loss:54.0832
Episode:1813 meanR:187.1900 R:500.0000 loss:2.9792
Episode:1814 meanR:187.1900 R:500.0000 loss:13.8104
Episode:1815 meanR:187.1900 R:500.0000 loss:15.7548
Episode:1816 meanR:187.1900 R:500.0000 loss:18.4483
Episode:1817 meanR:187.1900 R:500.0000 loss:16.6389
Episode:1818 meanR:187.1900 R:500.0000 loss:18.6065
Episode:1819 meanR:191.4500 R:500.0000 loss:17.5253
Episode:1820 meanR:195.8400 R:500.0000 loss:18.7286
Episode:1821 meanR:200.1700 R:500.0000 loss:19.7381
Episode:1822 meanR:205.0600 R:500.0000 loss:17.6229
Episode:1823 meanR:209.0800 R:414.0000 loss:22.1608
Episode:1824 meanR:212.5700 R:500.0000 loss:5.1678
Episode:1825 meanR:212.5700 R:500.0000 loss:17.0771
Episode:1826 meanR:212.5700 R:500.0000 loss:19.7760
Episode:1827 meanR:212.5700 R:500.0000 loss:19.1111
Episode:1828 mea

Episode:1969 meanR:199.2900 R:132.0000 loss:5.7932
Episode:1970 meanR:199.6100 R:150.0000 loss:3.6500
Episode:1971 meanR:199.8600 R:147.0000 loss:3.6811
Episode:1972 meanR:200.2300 R:157.0000 loss:3.3711
Episode:1973 meanR:201.4300 R:243.0000 loss:2.7027
Episode:1974 meanR:205.2000 R:500.0000 loss:1.8604
Episode:1975 meanR:208.9200 R:500.0000 loss:13.8819
Episode:1976 meanR:212.5700 R:500.0000 loss:10.5679
Episode:1977 meanR:216.0500 R:481.0000 loss:13.8608
Episode:1978 meanR:219.4600 R:500.0000 loss:1.5128
Episode:1979 meanR:221.4700 R:500.0000 loss:16.5874
Episode:1980 meanR:221.4700 R:500.0000 loss:13.9946
Episode:1981 meanR:221.4700 R:500.0000 loss:10.9955
Episode:1982 meanR:221.4700 R:500.0000 loss:12.0619
Episode:1983 meanR:221.4700 R:500.0000 loss:17.4234
Episode:1984 meanR:224.2400 R:500.0000 loss:12.5880
Episode:1985 meanR:224.2400 R:500.0000 loss:12.7581
Episode:1986 meanR:224.2400 R:500.0000 loss:12.3079
Episode:1987 meanR:224.2400 R:500.0000 loss:17.4326
Episode:1988 meanR:

Episode:2127 meanR:272.4000 R:149.0000 loss:855.2007
Episode:2128 meanR:268.8200 R:142.0000 loss:883.1461
Episode:2129 meanR:267.7900 R:138.0000 loss:898.5370
Episode:2130 meanR:267.7500 R:128.0000 loss:1000.0914
Episode:2131 meanR:267.8900 R:134.0000 loss:883.1301
Episode:2132 meanR:267.8900 R:129.0000 loss:1077.6124
Episode:2133 meanR:268.0500 R:128.0000 loss:1057.5833
Episode:2134 meanR:268.2600 R:124.0000 loss:1058.1725
Episode:2135 meanR:268.5500 R:114.0000 loss:1123.9359
Episode:2136 meanR:267.4700 R:120.0000 loss:1371.4823
Episode:2137 meanR:268.1300 R:123.0000 loss:1159.8857
Episode:2138 meanR:268.4400 R:75.0000 loss:943.0972
Episode:2139 meanR:268.5500 R:126.0000 loss:1853.7974
Episode:2140 meanR:268.8300 R:129.0000 loss:1817.9319
Episode:2141 meanR:269.1800 R:136.0000 loss:1754.3126
Episode:2142 meanR:269.1700 R:119.0000 loss:1923.0643
Episode:2143 meanR:269.2900 R:130.0000 loss:1866.0007
Episode:2144 meanR:269.4500 R:128.0000 loss:1925.5392
Episode:2145 meanR:269.3200 R:132.

Episode:2282 meanR:386.5900 R:500.0000 loss:12.5125
Episode:2283 meanR:390.3000 R:500.0000 loss:14.6635
Episode:2284 meanR:393.7800 R:500.0000 loss:12.2105
Episode:2285 meanR:395.9500 R:385.0000 loss:23.7992
Episode:2286 meanR:399.3100 R:500.0000 loss:1.2359
Episode:2287 meanR:402.5800 R:500.0000 loss:18.9675
Episode:2288 meanR:404.9100 R:331.0000 loss:27.4774
Episode:2289 meanR:408.9900 R:500.0000 loss:3.6947
Episode:2290 meanR:412.6700 R:500.0000 loss:18.8684
Episode:2291 meanR:413.8400 R:261.0000 loss:32.9235
Episode:2292 meanR:417.0200 R:500.0000 loss:4.4352
Episode:2293 meanR:420.3800 R:500.0000 loss:12.9274
Episode:2294 meanR:424.1700 R:500.0000 loss:19.1516
Episode:2295 meanR:426.3900 R:500.0000 loss:18.1421
Episode:2296 meanR:429.8300 R:500.0000 loss:16.1010
Episode:2297 meanR:431.1300 R:500.0000 loss:15.0699
Episode:2298 meanR:432.8400 R:500.0000 loss:14.6091
Episode:2299 meanR:433.1700 R:500.0000 loss:14.7679
Episode:2300 meanR:433.1700 R:500.0000 loss:17.0040
Episode:2301 me

Episode:2441 meanR:358.0300 R:379.0000 loss:22.2128
Episode:2442 meanR:359.3900 R:251.0000 loss:32.3619
Episode:2443 meanR:360.4700 R:193.0000 loss:5.1040
Episode:2444 meanR:364.6700 R:500.0000 loss:5.3674
Episode:2445 meanR:366.4300 R:292.0000 loss:15.9863
Episode:2446 meanR:370.2300 R:500.0000 loss:8.7914
Episode:2447 meanR:373.9300 R:500.0000 loss:16.9404
Episode:2448 meanR:377.6300 R:500.0000 loss:15.9554
Episode:2449 meanR:381.5900 R:500.0000 loss:13.2699
Episode:2450 meanR:385.0500 R:500.0000 loss:14.6889
Episode:2451 meanR:388.2400 R:500.0000 loss:18.7668
Episode:2452 meanR:388.2400 R:500.0000 loss:16.0850
Episode:2453 meanR:388.2400 R:500.0000 loss:23.1332
Episode:2454 meanR:388.2400 R:500.0000 loss:16.2908
Episode:2455 meanR:391.4400 R:500.0000 loss:17.3916
Episode:2456 meanR:394.8000 R:500.0000 loss:16.0028
Episode:2457 meanR:397.9600 R:500.0000 loss:16.2543
Episode:2458 meanR:401.2100 R:500.0000 loss:14.1994
Episode:2459 meanR:404.4400 R:500.0000 loss:13.4589
Episode:2460 me

Episode:2600 meanR:378.1200 R:500.0000 loss:13.8334
Episode:2601 meanR:378.1200 R:500.0000 loss:19.7180
Episode:2602 meanR:378.1200 R:500.0000 loss:17.1464
Episode:2603 meanR:378.1200 R:500.0000 loss:18.8909
Episode:2604 meanR:378.1200 R:500.0000 loss:16.9283
Episode:2605 meanR:378.1200 R:500.0000 loss:16.4047
Episode:2606 meanR:378.1200 R:500.0000 loss:16.4628
Episode:2607 meanR:378.1200 R:500.0000 loss:17.4562
Episode:2608 meanR:378.1200 R:500.0000 loss:16.1146
Episode:2609 meanR:378.1200 R:500.0000 loss:13.9908
Episode:2610 meanR:378.1200 R:500.0000 loss:15.6542
Episode:2611 meanR:378.1200 R:500.0000 loss:13.1663
Episode:2612 meanR:374.4400 R:132.0000 loss:43.6212
Episode:2613 meanR:369.5200 R:8.0000 loss:14.9228
Episode:2614 meanR:364.6300 R:11.0000 loss:78.7284
Episode:2615 meanR:359.7500 R:12.0000 loss:108.4816
Episode:2616 meanR:355.4600 R:71.0000 loss:57.5979
Episode:2617 meanR:351.3200 R:86.0000 loss:25.0843
Episode:2618 meanR:349.1400 R:96.0000 loss:15.5524
Episode:2619 meanR

Episode:2758 meanR:470.8400 R:500.0000 loss:25.2220
Episode:2759 meanR:470.8400 R:500.0000 loss:24.3799
Episode:2760 meanR:470.8400 R:500.0000 loss:23.7072
Episode:2761 meanR:470.8400 R:500.0000 loss:22.3306
Episode:2762 meanR:470.8400 R:500.0000 loss:19.7789
Episode:2763 meanR:470.8400 R:500.0000 loss:21.0887
Episode:2764 meanR:470.8400 R:500.0000 loss:30.3065
Episode:2765 meanR:470.8400 R:500.0000 loss:24.0619
Episode:2766 meanR:470.8400 R:500.0000 loss:21.7156
Episode:2767 meanR:470.8400 R:500.0000 loss:28.5509
Episode:2768 meanR:470.8400 R:500.0000 loss:28.7576
Episode:2769 meanR:470.8400 R:500.0000 loss:22.7289
Episode:2770 meanR:470.8400 R:500.0000 loss:31.4283
Episode:2771 meanR:470.8400 R:500.0000 loss:30.5141
Episode:2772 meanR:474.1600 R:500.0000 loss:25.3225
Episode:2773 meanR:474.1600 R:500.0000 loss:23.6018
Episode:2774 meanR:474.1600 R:500.0000 loss:22.9169
Episode:2775 meanR:474.1600 R:500.0000 loss:32.6846
Episode:2776 meanR:474.1600 R:500.0000 loss:32.8410
Episode:2777

Episode:2916 meanR:202.2900 R:500.0000 loss:86.4857
Episode:2917 meanR:202.2900 R:500.0000 loss:15.4818
Episode:2918 meanR:202.2900 R:500.0000 loss:8.1881
Episode:2919 meanR:205.9200 R:500.0000 loss:6.8652
Episode:2920 meanR:205.9200 R:500.0000 loss:8.6519
Episode:2921 meanR:205.1100 R:419.0000 loss:21.3762
Episode:2922 meanR:205.1100 R:500.0000 loss:9.6000
Episode:2923 meanR:200.8400 R:73.0000 loss:48.1994
Episode:2924 meanR:198.4000 R:256.0000 loss:20.0137
Episode:2925 meanR:194.0300 R:63.0000 loss:7.2841
Episode:2926 meanR:189.8400 R:81.0000 loss:28.6034
Episode:2927 meanR:186.1000 R:126.0000 loss:10.6207
Episode:2928 meanR:181.6500 R:55.0000 loss:12.7065
Episode:2929 meanR:177.1300 R:48.0000 loss:26.5750
Episode:2930 meanR:178.3900 R:142.0000 loss:16.5866
Episode:2931 meanR:174.7300 R:134.0000 loss:5.9636
Episode:2932 meanR:174.7300 R:500.0000 loss:2.6711
Episode:2933 meanR:173.6800 R:395.0000 loss:6.7842
Episode:2934 meanR:173.6800 R:500.0000 loss:3.8037
Episode:2935 meanR:172.400

Episode:3076 meanR:254.2700 R:10.0000 loss:48.9318
Episode:3077 meanR:253.5700 R:9.0000 loss:36.9043
Episode:3078 meanR:252.8500 R:8.0000 loss:29.6136
Episode:3079 meanR:252.2000 R:10.0000 loss:32.7905
Episode:3080 meanR:251.3500 R:9.0000 loss:46.5243
Episode:3081 meanR:250.5900 R:8.0000 loss:53.3382
Episode:3082 meanR:249.8800 R:10.0000 loss:60.4562
Episode:3083 meanR:249.2400 R:10.0000 loss:69.6165
Episode:3084 meanR:248.6000 R:10.0000 loss:78.6376
Episode:3085 meanR:247.9300 R:10.0000 loss:89.2962
Episode:3086 meanR:247.4400 R:10.0000 loss:91.3074
Episode:3087 meanR:246.8700 R:10.0000 loss:26.0487
Episode:3088 meanR:245.9800 R:9.0000 loss:14.0099
Episode:3089 meanR:245.1300 R:9.0000 loss:19.2429
Episode:3090 meanR:244.3800 R:9.0000 loss:23.7636
Episode:3091 meanR:243.7900 R:9.0000 loss:28.8114
Episode:3092 meanR:243.0000 R:8.0000 loss:34.4856
Episode:3093 meanR:242.3300 R:10.0000 loss:33.5564
Episode:3094 meanR:241.7000 R:9.0000 loss:37.2311
Episode:3095 meanR:241.0800 R:10.0000 los

Episode:3240 meanR:9.8700 R:68.0000 loss:589.2590
Episode:3241 meanR:9.8500 R:9.0000 loss:1105.7502
Episode:3242 meanR:9.8400 R:9.0000 loss:1159.6455
Episode:3243 meanR:9.8400 R:10.0000 loss:1191.5400
Episode:3244 meanR:9.8300 R:9.0000 loss:1155.5493
Episode:3245 meanR:9.8300 R:9.0000 loss:1106.5569
Episode:3246 meanR:9.8400 R:10.0000 loss:1073.3020
Episode:3247 meanR:9.8300 R:8.0000 loss:638.5466
Episode:3248 meanR:9.8200 R:9.0000 loss:190.8757
Episode:3249 meanR:9.8200 R:10.0000 loss:227.1645
Episode:3250 meanR:9.8200 R:9.0000 loss:261.9593
Episode:3251 meanR:9.8400 R:11.0000 loss:274.7426
Episode:3252 meanR:9.8500 R:9.0000 loss:321.5275
Episode:3253 meanR:9.8500 R:10.0000 loss:335.5562
Episode:3254 meanR:9.8600 R:10.0000 loss:334.5485
Episode:3255 meanR:9.8500 R:9.0000 loss:349.4679
Episode:3256 meanR:9.8600 R:9.0000 loss:354.9635
Episode:3257 meanR:9.8700 R:10.0000 loss:359.9340
Episode:3258 meanR:10.0100 R:24.0000 loss:429.2357
Episode:3259 meanR:10.0000 R:9.0000 loss:518.5564
Epi

Episode:3403 meanR:11.6400 R:10.0000 loss:621.8044
Episode:3404 meanR:11.6400 R:9.0000 loss:640.0967
Episode:3405 meanR:11.6400 R:10.0000 loss:632.6816
Episode:3406 meanR:11.6600 R:10.0000 loss:649.9931
Episode:3407 meanR:11.6500 R:9.0000 loss:647.6259
Episode:3408 meanR:11.7700 R:21.0000 loss:697.6985
Episode:3409 meanR:11.9600 R:29.0000 loss:729.2465
Episode:3410 meanR:11.9600 R:9.0000 loss:832.3474
Episode:3411 meanR:12.0400 R:17.0000 loss:911.7537
Episode:3412 meanR:12.4100 R:46.0000 loss:1045.5112
Episode:3413 meanR:12.2900 R:13.0000 loss:1087.4585
Episode:3414 meanR:12.2900 R:10.0000 loss:984.8372
Episode:3415 meanR:12.2100 R:15.0000 loss:844.3379
Episode:3416 meanR:17.1200 R:500.0000 loss:2668.3391
Episode:3417 meanR:17.1400 R:10.0000 loss:3330.8550
Episode:3418 meanR:17.0000 R:10.0000 loss:3171.7703
Episode:3419 meanR:17.0000 R:10.0000 loss:2972.3035
Episode:3420 meanR:17.2500 R:34.0000 loss:2594.3865
Episode:3421 meanR:17.4900 R:34.0000 loss:2037.0380
Episode:3422 meanR:17.500

Episode:3561 meanR:68.6600 R:90.0000 loss:1380.7880
Episode:3562 meanR:67.9600 R:12.0000 loss:1537.4362
Episode:3563 meanR:64.7800 R:93.0000 loss:1286.4961
Episode:3564 meanR:65.2400 R:85.0000 loss:1157.6471
Episode:3565 meanR:64.9400 R:10.0000 loss:1389.4646
Episode:3566 meanR:65.8300 R:142.0000 loss:1217.6686
Episode:3567 meanR:63.0800 R:91.0000 loss:91.5623
Episode:3568 meanR:61.2400 R:88.0000 loss:62.9729
Episode:3569 meanR:61.4500 R:68.0000 loss:113.6013
Episode:3570 meanR:60.5900 R:11.0000 loss:126.8602
Episode:3571 meanR:60.2500 R:43.0000 loss:124.1480
Episode:3572 meanR:59.4500 R:13.0000 loss:167.4257
Episode:3573 meanR:57.3900 R:12.0000 loss:181.8576
Episode:3574 meanR:57.4800 R:42.0000 loss:235.8837
Episode:3575 meanR:52.6000 R:12.0000 loss:214.6286
Episode:3576 meanR:52.6000 R:12.0000 loss:138.8555
Episode:3577 meanR:52.4900 R:40.0000 loss:236.8571
Episode:3578 meanR:51.2500 R:12.0000 loss:298.8075
Episode:3579 meanR:51.5800 R:45.0000 loss:306.4034
Episode:3580 meanR:51.2600

Episode:3718 meanR:107.2300 R:102.0000 loss:1098.0903
Episode:3719 meanR:103.2300 R:100.0000 loss:1012.0223
Episode:3720 meanR:99.6000 R:137.0000 loss:931.0997
Episode:3721 meanR:99.0200 R:107.0000 loss:1394.8618
Episode:3722 meanR:99.2900 R:110.0000 loss:1172.1669
Episode:3723 meanR:102.9200 R:432.0000 loss:1137.7300
Episode:3724 meanR:104.8800 R:280.0000 loss:775.6511
Episode:3725 meanR:106.0900 R:198.0000 loss:695.5633
Episode:3726 meanR:110.3300 R:500.0000 loss:187.9822
Episode:3727 meanR:114.5600 R:500.0000 loss:11.6570
Episode:3728 meanR:118.7800 R:500.0000 loss:13.0107
Episode:3729 meanR:123.0600 R:500.0000 loss:15.8448
Episode:3730 meanR:127.8900 R:500.0000 loss:16.4844
Episode:3731 meanR:132.1700 R:500.0000 loss:12.8425
Episode:3732 meanR:136.4400 R:500.0000 loss:15.6728
Episode:3733 meanR:140.6800 R:500.0000 loss:13.5845
Episode:3734 meanR:144.0100 R:500.0000 loss:16.9782
Episode:3735 meanR:148.3400 R:500.0000 loss:15.2010
Episode:3736 meanR:152.6700 R:500.0000 loss:14.5461
E

Episode:3876 meanR:384.8400 R:500.0000 loss:165.3040
Episode:3877 meanR:384.8400 R:500.0000 loss:13.7413
Episode:3878 meanR:383.1100 R:327.0000 loss:278.4303
Episode:3879 meanR:381.3200 R:321.0000 loss:245.7599
Episode:3880 meanR:379.9500 R:363.0000 loss:76.4814
Episode:3881 meanR:378.5000 R:355.0000 loss:73.1224
Episode:3882 meanR:376.9400 R:344.0000 loss:67.9629
Episode:3883 meanR:375.1700 R:323.0000 loss:62.5391
Episode:3884 meanR:373.2700 R:310.0000 loss:107.1892
Episode:3885 meanR:372.2900 R:402.0000 loss:149.6291
Episode:3886 meanR:371.5400 R:425.0000 loss:130.3033
Episode:3887 meanR:370.9400 R:440.0000 loss:96.5267
Episode:3888 meanR:370.9400 R:500.0000 loss:128.1819
Episode:3889 meanR:370.9400 R:500.0000 loss:14.4630
Episode:3890 meanR:369.2200 R:328.0000 loss:248.7093
Episode:3891 meanR:369.2200 R:500.0000 loss:583.1298
Episode:3892 meanR:369.2200 R:500.0000 loss:17.2462
Episode:3893 meanR:369.2200 R:500.0000 loss:13.7898
Episode:3894 meanR:369.2200 R:500.0000 loss:9.5380
Epis

Episode:4034 meanR:436.2700 R:500.0000 loss:17.5690
Episode:4035 meanR:437.6300 R:500.0000 loss:16.1906
Episode:4036 meanR:439.2100 R:500.0000 loss:15.6959
Episode:4037 meanR:439.2100 R:500.0000 loss:14.3267
Episode:4038 meanR:439.2100 R:500.0000 loss:14.0335
Episode:4039 meanR:439.2100 R:500.0000 loss:12.4935
Episode:4040 meanR:441.2600 R:500.0000 loss:13.0444
Episode:4041 meanR:441.2600 R:500.0000 loss:14.4865
Episode:4042 meanR:438.0500 R:179.0000 loss:28.5761
Episode:4043 meanR:435.6000 R:52.0000 loss:9.3902
Episode:4044 meanR:432.8500 R:46.0000 loss:15.4700
Episode:4045 meanR:429.9700 R:34.0000 loss:18.8025
Episode:4046 meanR:425.0700 R:10.0000 loss:16.8022
Episode:4047 meanR:421.9800 R:9.0000 loss:31.2044
Episode:4048 meanR:419.4300 R:10.0000 loss:41.3947
Episode:4049 meanR:414.5500 R:12.0000 loss:63.5948
Episode:4050 meanR:411.8400 R:10.0000 loss:51.6557
Episode:4051 meanR:407.1000 R:26.0000 loss:26.9101
Episode:4052 meanR:402.3700 R:27.0000 loss:80.4942
Episode:4053 meanR:397.6

Episode:4192 meanR:402.0500 R:500.0000 loss:18.0040
Episode:4193 meanR:402.0500 R:500.0000 loss:20.4524
Episode:4194 meanR:402.0500 R:500.0000 loss:18.2117
Episode:4195 meanR:402.0500 R:500.0000 loss:19.2365
Episode:4196 meanR:402.0500 R:500.0000 loss:18.2863
Episode:4197 meanR:402.0500 R:500.0000 loss:22.0190
Episode:4198 meanR:397.1500 R:10.0000 loss:86.5358
Episode:4199 meanR:392.2600 R:11.0000 loss:148.7958
Episode:4200 meanR:387.3700 R:11.0000 loss:127.4793
Episode:4201 meanR:383.9400 R:157.0000 loss:37.7112
Episode:4202 meanR:380.0500 R:111.0000 loss:13.4813
Episode:4203 meanR:380.1300 R:124.0000 loss:42.3249
Episode:4204 meanR:380.2600 R:85.0000 loss:32.1256
Episode:4205 meanR:380.5000 R:66.0000 loss:55.8098
Episode:4206 meanR:380.8200 R:65.0000 loss:59.3328
Episode:4207 meanR:381.3900 R:93.0000 loss:19.2428
Episode:4208 meanR:381.6300 R:41.0000 loss:14.0616
Episode:4209 meanR:381.6800 R:36.0000 loss:19.0169
Episode:4210 meanR:381.5700 R:30.0000 loss:25.2450
Episode:4211 meanR:3

Episode:4352 meanR:423.3100 R:500.0000 loss:35.1926
Episode:4353 meanR:418.4500 R:14.0000 loss:55.5299
Episode:4354 meanR:415.2700 R:13.0000 loss:121.5672
Episode:4355 meanR:411.8600 R:13.0000 loss:181.0562
Episode:4356 meanR:412.8200 R:500.0000 loss:279.2469
Episode:4357 meanR:412.8200 R:500.0000 loss:19.0484
Episode:4358 meanR:416.5200 R:500.0000 loss:14.4973
Episode:4359 meanR:419.0800 R:500.0000 loss:17.0379
Episode:4360 meanR:420.5100 R:500.0000 loss:17.6552
Episode:4361 meanR:421.8300 R:500.0000 loss:17.6702
Episode:4362 meanR:424.2700 R:500.0000 loss:17.1327
Episode:4363 meanR:425.7400 R:500.0000 loss:17.1367
Episode:4364 meanR:428.3000 R:500.0000 loss:17.6242
Episode:4365 meanR:428.5400 R:500.0000 loss:17.8190
Episode:4366 meanR:430.4400 R:500.0000 loss:16.7243
Episode:4367 meanR:432.1500 R:500.0000 loss:16.4215
Episode:4368 meanR:432.2000 R:500.0000 loss:15.5604
Episode:4369 meanR:432.2000 R:500.0000 loss:15.2024
Episode:4370 meanR:433.0200 R:500.0000 loss:15.6525
Episode:4371

Episode:4511 meanR:296.5600 R:88.0000 loss:1.4824
Episode:4512 meanR:292.5100 R:95.0000 loss:1.2971
Episode:4513 meanR:288.5800 R:107.0000 loss:1.4944
Episode:4514 meanR:284.9500 R:137.0000 loss:0.9758
Episode:4515 meanR:281.4600 R:151.0000 loss:0.9653
Episode:4516 meanR:277.8000 R:134.0000 loss:0.6519
Episode:4517 meanR:274.1800 R:138.0000 loss:0.7400
Episode:4518 meanR:270.3900 R:121.0000 loss:0.6305
Episode:4519 meanR:266.6700 R:128.0000 loss:0.7429
Episode:4520 meanR:267.0200 R:162.0000 loss:0.7135
Episode:4521 meanR:266.3900 R:240.0000 loss:0.9212
Episode:4522 meanR:267.0600 R:195.0000 loss:0.6315
Episode:4523 meanR:264.6300 R:257.0000 loss:0.4648
Episode:4524 meanR:262.2700 R:264.0000 loss:0.6403
Episode:4525 meanR:262.2700 R:500.0000 loss:0.7416
Episode:4526 meanR:260.8200 R:166.0000 loss:38.3397
Episode:4527 meanR:263.5600 R:500.0000 loss:1.8484
Episode:4528 meanR:265.2200 R:366.0000 loss:19.8316
Episode:4529 meanR:265.2600 R:214.0000 loss:15.7010
Episode:4530 meanR:266.6700 R:

Episode:4669 meanR:467.7200 R:500.0000 loss:13.5312
Episode:4670 meanR:467.7200 R:500.0000 loss:13.3972
Episode:4671 meanR:467.7200 R:500.0000 loss:13.4687
Episode:4672 meanR:467.7200 R:500.0000 loss:13.2123
Episode:4673 meanR:467.7200 R:500.0000 loss:12.0485
Episode:4674 meanR:467.7200 R:500.0000 loss:14.1018
Episode:4675 meanR:467.7200 R:500.0000 loss:13.9584
Episode:4676 meanR:467.7200 R:500.0000 loss:14.2664
Episode:4677 meanR:467.7200 R:500.0000 loss:15.3096
Episode:4678 meanR:467.7200 R:500.0000 loss:16.9869
Episode:4679 meanR:467.7200 R:500.0000 loss:16.6286
Episode:4680 meanR:467.7200 R:500.0000 loss:15.2352
Episode:4681 meanR:466.2300 R:351.0000 loss:243.3427
Episode:4682 meanR:465.2900 R:406.0000 loss:645.7288
Episode:4683 meanR:466.1500 R:469.0000 loss:1064.0508
Episode:4684 meanR:465.9500 R:480.0000 loss:1107.4982
Episode:4685 meanR:465.9500 R:500.0000 loss:314.9479
Episode:4686 meanR:465.9500 R:500.0000 loss:13.8027
Episode:4687 meanR:465.9500 R:500.0000 loss:12.4008
Episo

Episode:4825 meanR:225.7200 R:215.0000 loss:861.5836
Episode:4826 meanR:221.4700 R:75.0000 loss:1185.0629
Episode:4827 meanR:218.2700 R:180.0000 loss:1234.0135
Episode:4828 meanR:213.7900 R:52.0000 loss:716.1951
Episode:4829 meanR:209.3800 R:59.0000 loss:997.1424
Episode:4830 meanR:206.4500 R:207.0000 loss:1139.2291
Episode:4831 meanR:205.6900 R:181.0000 loss:1746.5232
Episode:4832 meanR:204.3900 R:126.0000 loss:962.4808
Episode:4833 meanR:203.5700 R:185.0000 loss:1003.8818
Episode:4834 meanR:200.5200 R:169.0000 loss:1931.3660
Episode:4835 meanR:195.9000 R:38.0000 loss:1027.4030
Episode:4836 meanR:194.1100 R:38.0000 loss:1241.6622
Episode:4837 meanR:192.9300 R:203.0000 loss:855.8108
Episode:4838 meanR:188.7200 R:46.0000 loss:779.9087
Episode:4839 meanR:189.0000 R:239.0000 loss:958.4959
Episode:4840 meanR:189.2300 R:500.0000 loss:1430.4025
Episode:4841 meanR:188.7000 R:243.0000 loss:1149.3959
Episode:4842 meanR:185.6100 R:191.0000 loss:1164.6038
Episode:4843 meanR:180.9700 R:36.0000 los

Episode:4980 meanR:194.6500 R:500.0000 loss:13.0602
Episode:4981 meanR:198.1100 R:500.0000 loss:14.1785
Episode:4982 meanR:202.6500 R:500.0000 loss:14.0865
Episode:4983 meanR:205.5600 R:500.0000 loss:14.0781
Episode:4984 meanR:208.5700 R:500.0000 loss:14.0218
Episode:4985 meanR:213.0100 R:500.0000 loss:15.2047
Episode:4986 meanR:213.0100 R:500.0000 loss:14.9382
Episode:4987 meanR:215.9400 R:500.0000 loss:14.6149
Episode:4988 meanR:215.9400 R:500.0000 loss:15.1518
Episode:4989 meanR:220.3800 R:500.0000 loss:13.1404
Episode:4990 meanR:225.0000 R:500.0000 loss:14.8202
Episode:4991 meanR:229.6600 R:500.0000 loss:14.7203
Episode:4992 meanR:233.1300 R:500.0000 loss:13.8653
Episode:4993 meanR:236.3600 R:500.0000 loss:14.7090
Episode:4994 meanR:239.6400 R:500.0000 loss:16.0147
Episode:4995 meanR:243.0700 R:500.0000 loss:11.6118
Episode:4996 meanR:244.3100 R:192.0000 loss:47.8684
Episode:4997 meanR:242.5700 R:10.0000 loss:42.6557
Episode:4998 meanR:242.2500 R:12.0000 loss:74.9898
Episode:4999 m

Episode:5138 meanR:430.1600 R:500.0000 loss:5.9480
Episode:5139 meanR:430.1600 R:500.0000 loss:2.8184
Episode:5140 meanR:430.1600 R:500.0000 loss:17.2872
Episode:5141 meanR:430.1600 R:500.0000 loss:14.2135
Episode:5142 meanR:425.2700 R:11.0000 loss:76.0602
Episode:5143 meanR:420.4000 R:13.0000 loss:126.2410
Episode:5144 meanR:415.5100 R:11.0000 loss:197.2821
Episode:5145 meanR:410.8100 R:30.0000 loss:212.4244
Episode:5146 meanR:405.9300 R:12.0000 loss:199.2189
Episode:5147 meanR:406.3600 R:500.0000 loss:29.8156
Episode:5148 meanR:406.9200 R:240.0000 loss:8.7112
Episode:5149 meanR:407.8800 R:240.0000 loss:8.7046
Episode:5150 meanR:409.1800 R:279.0000 loss:3.6921
Episode:5151 meanR:412.5200 R:500.0000 loss:3.6413
Episode:5152 meanR:415.1200 R:500.0000 loss:6.0762
Episode:5153 meanR:415.1200 R:500.0000 loss:20.3102
Episode:5154 meanR:418.1800 R:500.0000 loss:19.1102
Episode:5155 meanR:416.7600 R:358.0000 loss:11.7725
Episode:5156 meanR:416.7600 R:500.0000 loss:4.2226
Episode:5157 meanR:41

Episode:5297 meanR:365.8300 R:500.0000 loss:10.7087
Episode:5298 meanR:365.8300 R:500.0000 loss:9.2090
Episode:5299 meanR:365.8300 R:500.0000 loss:15.0746
Episode:5300 meanR:365.8300 R:500.0000 loss:14.9017
Episode:5301 meanR:365.8300 R:500.0000 loss:18.6953
Episode:5302 meanR:365.8300 R:500.0000 loss:16.4436
Episode:5303 meanR:365.8300 R:500.0000 loss:17.7911
Episode:5304 meanR:365.8300 R:500.0000 loss:18.1205
Episode:5305 meanR:365.8300 R:500.0000 loss:16.6420
Episode:5306 meanR:365.8300 R:500.0000 loss:17.2909
Episode:5307 meanR:365.8300 R:500.0000 loss:16.3833
Episode:5308 meanR:360.9400 R:11.0000 loss:82.6394
Episode:5309 meanR:360.9400 R:500.0000 loss:21.2510
Episode:5310 meanR:363.7500 R:500.0000 loss:14.1565
Episode:5311 meanR:363.7500 R:500.0000 loss:13.5757
Episode:5312 meanR:363.7500 R:500.0000 loss:12.6375
Episode:5313 meanR:363.7500 R:500.0000 loss:13.5822
Episode:5314 meanR:361.2100 R:246.0000 loss:21.0725
Episode:5315 meanR:359.0500 R:284.0000 loss:7.9820
Episode:5316 me

Episode:5456 meanR:440.7700 R:500.0000 loss:21.9582
Episode:5457 meanR:440.7700 R:500.0000 loss:19.6364
Episode:5458 meanR:440.7700 R:500.0000 loss:16.4725
Episode:5459 meanR:440.7700 R:500.0000 loss:19.0670
Episode:5460 meanR:440.7700 R:500.0000 loss:14.8492
Episode:5461 meanR:440.7700 R:500.0000 loss:17.0633
Episode:5462 meanR:436.3400 R:57.0000 loss:68.4071
Episode:5463 meanR:436.3400 R:500.0000 loss:16.0167
Episode:5464 meanR:436.3400 R:500.0000 loss:14.7691
Episode:5465 meanR:438.4600 R:500.0000 loss:12.1156
Episode:5466 meanR:440.2400 R:500.0000 loss:13.5437
Episode:5467 meanR:440.4700 R:500.0000 loss:16.4402
Episode:5468 meanR:440.4700 R:500.0000 loss:10.2860
Episode:5469 meanR:440.4700 R:500.0000 loss:14.4714
Episode:5470 meanR:440.6500 R:500.0000 loss:14.5434
Episode:5471 meanR:441.5300 R:500.0000 loss:14.6705
Episode:5472 meanR:442.1500 R:500.0000 loss:10.9947
Episode:5473 meanR:443.7100 R:500.0000 loss:10.8089
Episode:5474 meanR:443.7100 R:500.0000 loss:14.2794
Episode:5475 

Episode:5615 meanR:256.9900 R:11.0000 loss:29.5704
Episode:5616 meanR:252.0900 R:10.0000 loss:27.3997
Episode:5617 meanR:247.2100 R:12.0000 loss:44.0668
Episode:5618 meanR:242.3200 R:11.0000 loss:40.1139
Episode:5619 meanR:237.4200 R:10.0000 loss:47.0132
Episode:5620 meanR:232.5400 R:12.0000 loss:50.4532
Episode:5621 meanR:227.6500 R:11.0000 loss:63.5466
Episode:5622 meanR:222.7700 R:12.0000 loss:59.1780
Episode:5623 meanR:217.8900 R:12.0000 loss:63.5537
Episode:5624 meanR:212.9900 R:10.0000 loss:54.0776
Episode:5625 meanR:208.1000 R:11.0000 loss:53.1476
Episode:5626 meanR:203.2000 R:10.0000 loss:52.0988
Episode:5627 meanR:198.3100 R:11.0000 loss:47.2246
Episode:5628 meanR:193.4200 R:11.0000 loss:37.5181
Episode:5629 meanR:188.5300 R:11.0000 loss:33.7462
Episode:5630 meanR:183.6500 R:12.0000 loss:28.6797
Episode:5631 meanR:178.7700 R:12.0000 loss:30.1585
Episode:5632 meanR:173.8800 R:11.0000 loss:22.8504
Episode:5633 meanR:168.9900 R:11.0000 loss:22.7982
Episode:5634 meanR:164.1100 R:1

Episode:5779 meanR:10.8000 R:11.0000 loss:11.6879
Episode:5780 meanR:10.8600 R:17.0000 loss:11.5489
Episode:5781 meanR:10.8600 R:11.0000 loss:11.1703
Episode:5782 meanR:10.8900 R:13.0000 loss:8.1759
Episode:5783 meanR:10.8900 R:12.0000 loss:7.7015
Episode:5784 meanR:10.9000 R:11.0000 loss:7.4816
Episode:5785 meanR:10.9000 R:11.0000 loss:7.6652
Episode:5786 meanR:10.9000 R:11.0000 loss:7.1203
Episode:5787 meanR:10.9000 R:11.0000 loss:7.9821
Episode:5788 meanR:10.9100 R:12.0000 loss:10.0587
Episode:5789 meanR:10.9200 R:12.0000 loss:11.6719
Episode:5790 meanR:10.9400 R:12.0000 loss:13.3229
Episode:5791 meanR:11.0200 R:20.0000 loss:13.9595
Episode:5792 meanR:11.0400 R:14.0000 loss:14.4005
Episode:5793 meanR:11.1100 R:18.0000 loss:14.5031
Episode:5794 meanR:11.1400 R:14.0000 loss:11.6802
Episode:5795 meanR:11.1500 R:11.0000 loss:10.1930
Episode:5796 meanR:11.1800 R:13.0000 loss:9.4904
Episode:5797 meanR:11.2600 R:19.0000 loss:8.4617
Episode:5798 meanR:11.3900 R:23.0000 loss:7.3853
Episode:5

Episode:5940 meanR:414.0100 R:500.0000 loss:16.8268
Episode:5941 meanR:414.0100 R:500.0000 loss:16.9411
Episode:5942 meanR:418.1000 R:500.0000 loss:16.0417
Episode:5943 meanR:416.2100 R:311.0000 loss:10.2611
Episode:5944 meanR:413.5000 R:229.0000 loss:1.3033
Episode:5945 meanR:413.5000 R:500.0000 loss:0.7101
Episode:5946 meanR:411.0900 R:259.0000 loss:31.8061
Episode:5947 meanR:411.0900 R:500.0000 loss:4.0807
Episode:5948 meanR:407.3700 R:128.0000 loss:43.3540
Episode:5949 meanR:403.1200 R:75.0000 loss:271.4441
Episode:5950 meanR:398.7100 R:59.0000 loss:277.7843
Episode:5951 meanR:394.3400 R:63.0000 loss:27.3069
Episode:5952 meanR:389.8600 R:52.0000 loss:35.4863
Episode:5953 meanR:385.5900 R:73.0000 loss:24.7052
Episode:5954 meanR:381.7000 R:111.0000 loss:20.3319
Episode:5955 meanR:381.7000 R:500.0000 loss:9.4929
Episode:5956 meanR:381.7000 R:500.0000 loss:18.6658
Episode:5957 meanR:379.7600 R:306.0000 loss:30.5077
Episode:5958 meanR:384.6500 R:500.0000 loss:223.2814
Episode:5959 meanR

Episode:6098 meanR:445.6200 R:500.0000 loss:18.3193
Episode:6099 meanR:447.6000 R:500.0000 loss:16.2376
Episode:6100 meanR:450.1700 R:500.0000 loss:15.8134
Episode:6101 meanR:450.1700 R:500.0000 loss:14.9521
Episode:6102 meanR:450.1700 R:500.0000 loss:13.1381
Episode:6103 meanR:450.1700 R:500.0000 loss:18.5187
Episode:6104 meanR:450.1700 R:500.0000 loss:11.2652
Episode:6105 meanR:450.1700 R:500.0000 loss:12.4047
Episode:6106 meanR:450.1700 R:500.0000 loss:10.6330
Episode:6107 meanR:450.1700 R:500.0000 loss:14.0869
Episode:6108 meanR:450.1700 R:500.0000 loss:17.9093
Episode:6109 meanR:450.1700 R:500.0000 loss:16.8259
Episode:6110 meanR:450.1700 R:500.0000 loss:16.6323
Episode:6111 meanR:450.1700 R:500.0000 loss:15.2996
Episode:6112 meanR:450.1700 R:500.0000 loss:13.3209
Episode:6113 meanR:450.1700 R:500.0000 loss:12.2252
Episode:6114 meanR:450.1700 R:500.0000 loss:14.2493
Episode:6115 meanR:450.1700 R:500.0000 loss:16.9666
Episode:6116 meanR:450.1700 R:500.0000 loss:14.9408
Episode:6117

Episode:6256 meanR:433.0700 R:500.0000 loss:18.7947
Episode:6257 meanR:433.0700 R:500.0000 loss:17.3100
Episode:6258 meanR:433.0700 R:500.0000 loss:16.6763
Episode:6259 meanR:433.0700 R:500.0000 loss:11.4194
Episode:6260 meanR:433.0700 R:500.0000 loss:14.7015
Episode:6261 meanR:433.0700 R:500.0000 loss:15.9371
Episode:6262 meanR:433.0700 R:500.0000 loss:13.3170
Episode:6263 meanR:433.0700 R:500.0000 loss:10.6551
Episode:6264 meanR:433.0700 R:500.0000 loss:18.6788
Episode:6265 meanR:433.0700 R:500.0000 loss:10.6068
Episode:6266 meanR:433.0700 R:500.0000 loss:17.2991
Episode:6267 meanR:433.0700 R:500.0000 loss:5.8880
Episode:6268 meanR:431.6100 R:354.0000 loss:26.1753
Episode:6269 meanR:427.4800 R:87.0000 loss:48.5116
Episode:6270 meanR:427.4800 R:500.0000 loss:5.4950
Episode:6271 meanR:423.0100 R:53.0000 loss:69.8344
Episode:6272 meanR:418.4100 R:40.0000 loss:69.3336
Episode:6273 meanR:417.2300 R:382.0000 loss:14.8616
Episode:6274 meanR:412.8200 R:59.0000 loss:51.9782
Episode:6275 meanR

Episode:6415 meanR:414.4400 R:500.0000 loss:17.5640
Episode:6416 meanR:414.4400 R:500.0000 loss:17.0310
Episode:6417 meanR:414.4400 R:500.0000 loss:15.6308
Episode:6418 meanR:414.4400 R:500.0000 loss:18.9798
Episode:6419 meanR:414.4400 R:500.0000 loss:17.2016
Episode:6420 meanR:414.4400 R:500.0000 loss:15.3432
Episode:6421 meanR:414.4400 R:500.0000 loss:16.9668
Episode:6422 meanR:414.4400 R:500.0000 loss:13.1526
Episode:6423 meanR:414.4400 R:500.0000 loss:22.8868
Episode:6424 meanR:414.4400 R:500.0000 loss:16.1853
Episode:6425 meanR:414.4400 R:500.0000 loss:17.3252
Episode:6426 meanR:414.4400 R:500.0000 loss:15.3369
Episode:6427 meanR:414.4400 R:500.0000 loss:14.1995
Episode:6428 meanR:412.5600 R:312.0000 loss:22.0761
Episode:6429 meanR:415.1900 R:500.0000 loss:3.3376
Episode:6430 meanR:414.1700 R:12.0000 loss:73.6380
Episode:6431 meanR:417.1700 R:500.0000 loss:36.7789
Episode:6432 meanR:417.1700 R:500.0000 loss:14.7171
Episode:6433 meanR:418.1300 R:500.0000 loss:16.1081
Episode:6434 m

Episode:6573 meanR:136.7900 R:71.0000 loss:28.2131
Episode:6574 meanR:136.7300 R:65.0000 loss:49.6789
Episode:6575 meanR:137.2200 R:86.0000 loss:49.8490
Episode:6576 meanR:137.4500 R:74.0000 loss:33.6119
Episode:6577 meanR:137.6500 R:80.0000 loss:33.6004
Episode:6578 meanR:137.9400 R:66.0000 loss:42.4241
Episode:6579 meanR:138.2200 R:88.0000 loss:41.2704
Episode:6580 meanR:138.3500 R:72.0000 loss:36.4021
Episode:6581 meanR:138.6500 R:89.0000 loss:41.8304
Episode:6582 meanR:143.3400 R:500.0000 loss:15.2479
Episode:6583 meanR:147.9800 R:500.0000 loss:15.8376
Episode:6584 meanR:152.6900 R:500.0000 loss:16.8839
Episode:6585 meanR:153.2200 R:97.0000 loss:53.1440
Episode:6586 meanR:157.8000 R:500.0000 loss:13.2300
Episode:6587 meanR:162.2400 R:500.0000 loss:15.6197
Episode:6588 meanR:166.6100 R:500.0000 loss:15.1273
Episode:6589 meanR:171.2800 R:500.0000 loss:15.9579
Episode:6590 meanR:176.0000 R:500.0000 loss:13.9969
Episode:6591 meanR:180.7000 R:500.0000 loss:15.7399
Episode:6592 meanR:185

Episode:6732 meanR:313.6400 R:124.0000 loss:4.2871
Episode:6733 meanR:309.8100 R:117.0000 loss:16.4833
Episode:6734 meanR:305.9200 R:111.0000 loss:14.3713
Episode:6735 meanR:302.2500 R:133.0000 loss:14.9056
Episode:6736 meanR:298.5600 R:131.0000 loss:61.2481
Episode:6737 meanR:294.5600 R:100.0000 loss:106.0795
Episode:6738 meanR:290.9600 R:140.0000 loss:43.8464
Episode:6739 meanR:290.9600 R:500.0000 loss:2.6726
Episode:6740 meanR:290.9600 R:500.0000 loss:15.9134
Episode:6741 meanR:290.9600 R:500.0000 loss:14.5306
Episode:6742 meanR:290.9600 R:500.0000 loss:12.6858
Episode:6743 meanR:290.9600 R:500.0000 loss:11.6054
Episode:6744 meanR:290.9600 R:500.0000 loss:13.6888
Episode:6745 meanR:290.9600 R:500.0000 loss:12.4171
Episode:6746 meanR:290.9600 R:500.0000 loss:9.3861
Episode:6747 meanR:290.9600 R:500.0000 loss:11.4013
Episode:6748 meanR:289.6200 R:366.0000 loss:13.6073
Episode:6749 meanR:289.6200 R:500.0000 loss:3.5585
Episode:6750 meanR:289.6200 R:500.0000 loss:17.0971
Episode:6751 me

Episode:6891 meanR:409.5100 R:179.0000 loss:15.3279
Episode:6892 meanR:412.6000 R:500.0000 loss:6.0299
Episode:6893 meanR:415.7500 R:500.0000 loss:14.1984
Episode:6894 meanR:419.4800 R:500.0000 loss:13.9284
Episode:6895 meanR:419.4800 R:500.0000 loss:14.2959
Episode:6896 meanR:419.4800 R:500.0000 loss:12.3395
Episode:6897 meanR:421.2400 R:500.0000 loss:16.0660
Episode:6898 meanR:421.2400 R:500.0000 loss:16.0964
Episode:6899 meanR:421.2400 R:500.0000 loss:16.2897
Episode:6900 meanR:421.2400 R:500.0000 loss:16.7843
Episode:6901 meanR:421.2400 R:500.0000 loss:15.3955
Episode:6902 meanR:421.2400 R:500.0000 loss:14.8116
Episode:6903 meanR:421.2400 R:500.0000 loss:18.3594
Episode:6904 meanR:424.1700 R:500.0000 loss:17.2941
Episode:6905 meanR:424.1700 R:500.0000 loss:18.5413
Episode:6906 meanR:424.1700 R:500.0000 loss:17.5289
Episode:6907 meanR:424.1700 R:500.0000 loss:16.3538
Episode:6908 meanR:424.6200 R:500.0000 loss:17.6968
Episode:6909 meanR:424.6200 R:500.0000 loss:16.9825
Episode:6910 

Episode:7050 meanR:410.4100 R:298.0000 loss:3.2009
Episode:7051 meanR:413.2600 R:500.0000 loss:1.6129
Episode:7052 meanR:413.3100 R:223.0000 loss:34.9236
Episode:7053 meanR:413.8300 R:276.0000 loss:4.7335
Episode:7054 meanR:416.5100 R:500.0000 loss:2.4062
Episode:7055 meanR:419.4600 R:500.0000 loss:14.1879
Episode:7056 meanR:421.3900 R:377.0000 loss:27.2357
Episode:7057 meanR:423.8500 R:500.0000 loss:2.9561
Episode:7058 meanR:421.1700 R:232.0000 loss:12.4404
Episode:7059 meanR:421.1700 R:500.0000 loss:0.6279
Episode:7060 meanR:423.7300 R:500.0000 loss:16.0452
Episode:7061 meanR:426.4000 R:500.0000 loss:15.9974
Episode:7062 meanR:429.3400 R:500.0000 loss:16.2521
Episode:7063 meanR:432.2500 R:500.0000 loss:19.2713
Episode:7064 meanR:431.2000 R:177.0000 loss:43.1624
Episode:7065 meanR:431.1700 R:137.0000 loss:3.8773
Episode:7066 meanR:433.7300 R:500.0000 loss:1.3964
Episode:7067 meanR:435.6400 R:377.0000 loss:20.1691
Episode:7068 meanR:435.6400 R:500.0000 loss:10.8571
Episode:7069 meanR:4

Episode:7209 meanR:436.3600 R:500.0000 loss:16.0487
Episode:7210 meanR:436.3600 R:500.0000 loss:15.4246
Episode:7211 meanR:434.3200 R:296.0000 loss:30.0980
Episode:7212 meanR:430.6600 R:134.0000 loss:4.0378
Episode:7213 meanR:426.9700 R:131.0000 loss:18.8098
Episode:7214 meanR:423.2700 R:130.0000 loss:7.4916
Episode:7215 meanR:419.7500 R:148.0000 loss:7.1083
Episode:7216 meanR:420.7000 R:500.0000 loss:4.2424
Episode:7217 meanR:419.9600 R:426.0000 loss:17.3645
Episode:7218 meanR:418.7500 R:379.0000 loss:13.5493
Episode:7219 meanR:415.5200 R:177.0000 loss:1.5748
Episode:7220 meanR:412.1500 R:163.0000 loss:24.6820
Episode:7221 meanR:408.7800 R:163.0000 loss:17.5486
Episode:7222 meanR:405.4500 R:167.0000 loss:7.0406
Episode:7223 meanR:402.3200 R:187.0000 loss:3.8315
Episode:7224 meanR:399.1700 R:185.0000 loss:3.7001
Episode:7225 meanR:396.2200 R:205.0000 loss:2.1145
Episode:7226 meanR:393.6000 R:238.0000 loss:1.6022
Episode:7227 meanR:393.6000 R:500.0000 loss:1.2361
Episode:7228 meanR:393.

Episode:7368 meanR:426.0000 R:500.0000 loss:14.2777
Episode:7369 meanR:426.0000 R:500.0000 loss:15.5260
Episode:7370 meanR:426.0000 R:500.0000 loss:10.8214
Episode:7371 meanR:426.0000 R:500.0000 loss:12.7507
Episode:7372 meanR:426.0000 R:500.0000 loss:8.6066
Episode:7373 meanR:426.0000 R:500.0000 loss:10.9662
Episode:7374 meanR:426.0000 R:500.0000 loss:18.1298
Episode:7375 meanR:426.0000 R:500.0000 loss:20.0988
Episode:7376 meanR:426.0000 R:500.0000 loss:8.7716
Episode:7377 meanR:426.0000 R:500.0000 loss:20.0016
Episode:7378 meanR:426.0000 R:500.0000 loss:17.2214
Episode:7379 meanR:426.0000 R:500.0000 loss:14.4004
Episode:7380 meanR:426.0000 R:500.0000 loss:19.2747
Episode:7381 meanR:426.0000 R:500.0000 loss:17.8517
Episode:7382 meanR:426.0000 R:500.0000 loss:18.1147
Episode:7383 meanR:426.0000 R:500.0000 loss:16.9478
Episode:7384 meanR:421.1300 R:13.0000 loss:79.7532
Episode:7385 meanR:416.2800 R:15.0000 loss:120.6237
Episode:7386 meanR:416.2800 R:500.0000 loss:25.0410
Episode:7387 me

Episode:7527 meanR:428.5300 R:500.0000 loss:15.0037
Episode:7528 meanR:430.2100 R:500.0000 loss:15.6383
Episode:7529 meanR:432.4300 R:500.0000 loss:15.4447
Episode:7530 meanR:432.4300 R:500.0000 loss:14.5887
Episode:7531 meanR:432.4300 R:500.0000 loss:14.5752
Episode:7532 meanR:432.4300 R:500.0000 loss:16.9287
Episode:7533 meanR:432.4300 R:500.0000 loss:14.4962
Episode:7534 meanR:432.4300 R:500.0000 loss:16.6255
Episode:7535 meanR:434.5700 R:500.0000 loss:16.2604
Episode:7536 meanR:434.5700 R:500.0000 loss:14.7810
Episode:7537 meanR:434.5700 R:500.0000 loss:16.1282
Episode:7538 meanR:434.5700 R:500.0000 loss:15.4777
Episode:7539 meanR:434.5700 R:500.0000 loss:17.9677
Episode:7540 meanR:437.1000 R:500.0000 loss:12.1105
Episode:7541 meanR:437.1000 R:500.0000 loss:16.5278
Episode:7542 meanR:439.4600 R:500.0000 loss:18.5246
Episode:7543 meanR:439.4600 R:500.0000 loss:17.7246
Episode:7544 meanR:439.4600 R:500.0000 loss:16.7556
Episode:7545 meanR:440.8400 R:500.0000 loss:15.7961
Episode:7546

Episode:7686 meanR:387.3600 R:500.0000 loss:16.6019
Episode:7687 meanR:387.3600 R:500.0000 loss:15.1438
Episode:7688 meanR:383.9400 R:125.0000 loss:55.1882
Episode:7689 meanR:384.2400 R:170.0000 loss:2.5138
Episode:7690 meanR:387.6200 R:479.0000 loss:0.5480
Episode:7691 meanR:391.5000 R:500.0000 loss:0.3800
Episode:7692 meanR:394.9500 R:464.0000 loss:18.5287
Episode:7693 meanR:398.6700 R:500.0000 loss:6.2596
Episode:7694 meanR:402.2800 R:500.0000 loss:18.7541
Episode:7695 meanR:405.8100 R:500.0000 loss:17.8337
Episode:7696 meanR:409.3000 R:500.0000 loss:15.7958
Episode:7697 meanR:412.6000 R:500.0000 loss:14.6014
Episode:7698 meanR:411.9200 R:124.0000 loss:57.8564
Episode:7699 meanR:408.1000 R:118.0000 loss:6.2828
Episode:7700 meanR:410.0900 R:500.0000 loss:1.6192
Episode:7701 meanR:408.7400 R:365.0000 loss:13.2814
Episode:7702 meanR:410.6000 R:500.0000 loss:2.5011
Episode:7703 meanR:410.6000 R:500.0000 loss:17.5873
Episode:7704 meanR:412.5800 R:500.0000 loss:16.2656
Episode:7705 meanR:

Episode:7845 meanR:438.5200 R:500.0000 loss:13.3232
Episode:7846 meanR:436.9400 R:342.0000 loss:21.8755
Episode:7847 meanR:432.6700 R:73.0000 loss:26.5775
Episode:7848 meanR:427.7800 R:11.0000 loss:94.9237
Episode:7849 meanR:423.4100 R:63.0000 loss:77.4735
Episode:7850 meanR:423.4100 R:500.0000 loss:19.1366
Episode:7851 meanR:423.4100 R:500.0000 loss:13.3796
Episode:7852 meanR:423.4100 R:500.0000 loss:13.1162
Episode:7853 meanR:423.4100 R:500.0000 loss:17.7828
Episode:7854 meanR:420.7200 R:231.0000 loss:43.8327
Episode:7855 meanR:420.7200 R:500.0000 loss:1.6236
Episode:7856 meanR:418.1500 R:243.0000 loss:15.9294
Episode:7857 meanR:418.1500 R:500.0000 loss:0.8761
Episode:7858 meanR:418.1500 R:500.0000 loss:12.5617
Episode:7859 meanR:418.1500 R:500.0000 loss:19.1808
Episode:7860 meanR:418.1500 R:500.0000 loss:15.7028
Episode:7861 meanR:418.1500 R:500.0000 loss:17.5601
Episode:7862 meanR:418.1500 R:500.0000 loss:13.9605
Episode:7863 meanR:418.1500 R:500.0000 loss:7.3645
Episode:7864 meanR

Episode:8005 meanR:342.5700 R:500.0000 loss:14.6607
Episode:8006 meanR:346.1900 R:500.0000 loss:16.3458
Episode:8007 meanR:349.7400 R:500.0000 loss:17.8992
Episode:8008 meanR:353.3500 R:500.0000 loss:17.3979
Episode:8009 meanR:357.1300 R:500.0000 loss:18.3577
Episode:8010 meanR:358.1000 R:230.0000 loss:38.8098
Episode:8011 meanR:361.9100 R:500.0000 loss:1.1266
Episode:8012 meanR:366.6700 R:500.0000 loss:17.7467
Episode:8013 meanR:370.4700 R:500.0000 loss:16.3354
Episode:8014 meanR:373.3200 R:411.0000 loss:19.3989
Episode:8015 meanR:374.7000 R:245.0000 loss:1.0249
Episode:8016 meanR:378.8200 R:500.0000 loss:1.0250
Episode:8017 meanR:383.0300 R:500.0000 loss:16.1987
Episode:8018 meanR:386.2200 R:389.0000 loss:20.3149
Episode:8019 meanR:390.4400 R:500.0000 loss:0.8172
Episode:8020 meanR:394.5900 R:500.0000 loss:17.3336
Episode:8021 meanR:395.3900 R:196.0000 loss:37.2208
Episode:8022 meanR:394.8000 R:139.0000 loss:3.4146
Episode:8023 meanR:391.8700 R:207.0000 loss:1.3754
Episode:8024 meanR

Episode:8164 meanR:429.5800 R:500.0000 loss:16.5240
Episode:8165 meanR:429.5800 R:500.0000 loss:16.1101
Episode:8166 meanR:429.5800 R:500.0000 loss:12.5447
Episode:8167 meanR:432.0100 R:500.0000 loss:15.8727
Episode:8168 meanR:432.0100 R:500.0000 loss:14.6886
Episode:8169 meanR:432.0100 R:500.0000 loss:13.1246
Episode:8170 meanR:432.0100 R:500.0000 loss:15.6816
Episode:8171 meanR:432.0100 R:500.0000 loss:14.6938
Episode:8172 meanR:435.1700 R:500.0000 loss:18.7192
Episode:8173 meanR:436.3500 R:500.0000 loss:16.8197
Episode:8174 meanR:436.3500 R:500.0000 loss:16.2129
Episode:8175 meanR:436.3500 R:500.0000 loss:15.1452
Episode:8176 meanR:441.2500 R:500.0000 loss:15.5781
Episode:8177 meanR:445.8000 R:500.0000 loss:13.3356
Episode:8178 meanR:445.8600 R:500.0000 loss:15.7140
Episode:8179 meanR:449.3000 R:500.0000 loss:14.9107
Episode:8180 meanR:449.3000 R:500.0000 loss:12.3915
Episode:8181 meanR:449.3000 R:500.0000 loss:16.7691
Episode:8182 meanR:452.7800 R:500.0000 loss:15.3966
Episode:8183

Episode:8322 meanR:397.2800 R:500.0000 loss:14.9737
Episode:8323 meanR:401.7900 R:500.0000 loss:15.0101
Episode:8324 meanR:406.5100 R:500.0000 loss:15.8396
Episode:8325 meanR:406.5100 R:500.0000 loss:13.7680
Episode:8326 meanR:406.5100 R:500.0000 loss:16.4175
Episode:8327 meanR:401.6500 R:14.0000 loss:75.6694
Episode:8328 meanR:401.6500 R:500.0000 loss:22.7110
Episode:8329 meanR:401.6500 R:500.0000 loss:15.5256
Episode:8330 meanR:401.6500 R:500.0000 loss:15.8983
Episode:8331 meanR:401.6500 R:500.0000 loss:13.4011
Episode:8332 meanR:401.6500 R:500.0000 loss:17.1338
Episode:8333 meanR:401.6500 R:500.0000 loss:14.9729
Episode:8334 meanR:401.6500 R:500.0000 loss:16.2007
Episode:8335 meanR:401.6500 R:500.0000 loss:16.4339
Episode:8336 meanR:401.6500 R:500.0000 loss:17.2710
Episode:8337 meanR:401.6500 R:500.0000 loss:15.6122
Episode:8338 meanR:401.6500 R:500.0000 loss:15.5471
Episode:8339 meanR:403.8800 R:500.0000 loss:15.3600
Episode:8340 meanR:406.6800 R:500.0000 loss:15.0494
Episode:8341 

Episode:8481 meanR:312.7900 R:500.0000 loss:15.1326
Episode:8482 meanR:311.4800 R:10.0000 loss:73.5051
Episode:8483 meanR:309.8800 R:10.0000 loss:125.0860
Episode:8484 meanR:308.8600 R:16.0000 loss:152.3776
Episode:8485 meanR:313.7500 R:500.0000 loss:18.0431
Episode:8486 meanR:313.8800 R:30.0000 loss:65.0279
Episode:8487 meanR:314.9100 R:212.0000 loss:35.8863
Episode:8488 meanR:314.9100 R:500.0000 loss:8.7263
Episode:8489 meanR:318.3400 R:500.0000 loss:14.3613
Episode:8490 meanR:313.8000 R:46.0000 loss:69.3001
Episode:8491 meanR:308.9400 R:14.0000 loss:128.6534
Episode:8492 meanR:304.3700 R:43.0000 loss:141.8902
Episode:8493 meanR:301.3600 R:199.0000 loss:57.8576
Episode:8494 meanR:298.1200 R:176.0000 loss:2.6646
Episode:8495 meanR:294.7600 R:164.0000 loss:2.0935
Episode:8496 meanR:291.6500 R:189.0000 loss:2.1035
Episode:8497 meanR:291.7000 R:146.0000 loss:1.4889
Episode:8498 meanR:292.3900 R:145.0000 loss:1.2543
Episode:8499 meanR:292.8600 R:139.0000 loss:1.2071
Episode:8500 meanR:290

Episode:8640 meanR:434.3700 R:500.0000 loss:17.0680
Episode:8641 meanR:434.3700 R:500.0000 loss:17.4789
Episode:8642 meanR:434.3700 R:500.0000 loss:16.5497
Episode:8643 meanR:434.3700 R:500.0000 loss:16.1413
Episode:8644 meanR:429.4800 R:11.0000 loss:72.4531
Episode:8645 meanR:424.6100 R:13.0000 loss:102.3807
Episode:8646 meanR:423.5600 R:395.0000 loss:18.7431
Episode:8647 meanR:422.0600 R:350.0000 loss:5.9357
Episode:8648 meanR:420.1600 R:310.0000 loss:3.6771
Episode:8649 meanR:420.1600 R:500.0000 loss:6.1022
Episode:8650 meanR:420.1600 R:500.0000 loss:18.2182
Episode:8651 meanR:420.1600 R:500.0000 loss:18.7506
Episode:8652 meanR:425.0500 R:500.0000 loss:17.6776
Episode:8653 meanR:425.0500 R:500.0000 loss:17.3089
Episode:8654 meanR:425.0500 R:500.0000 loss:16.8934
Episode:8655 meanR:425.0500 R:500.0000 loss:16.9767
Episode:8656 meanR:425.0500 R:500.0000 loss:16.7817
Episode:8657 meanR:425.0500 R:500.0000 loss:16.5145
Episode:8658 meanR:420.2000 R:15.0000 loss:72.8071
Episode:8659 mean

Episode:8799 meanR:344.5200 R:500.0000 loss:17.5034
Episode:8800 meanR:344.5200 R:500.0000 loss:15.0421
Episode:8801 meanR:344.5200 R:500.0000 loss:12.2530
Episode:8802 meanR:344.5200 R:500.0000 loss:15.8906
Episode:8803 meanR:347.1300 R:500.0000 loss:18.5358
Episode:8804 meanR:352.0100 R:500.0000 loss:15.1739
Episode:8805 meanR:356.8900 R:500.0000 loss:14.4861
Episode:8806 meanR:361.7700 R:500.0000 loss:15.7965
Episode:8807 meanR:366.6600 R:500.0000 loss:9.6750
Episode:8808 meanR:371.5300 R:500.0000 loss:15.2472
Episode:8809 meanR:371.9700 R:500.0000 loss:18.6634
Episode:8810 meanR:371.9700 R:500.0000 loss:16.5998
Episode:8811 meanR:371.9700 R:500.0000 loss:17.3374
Episode:8812 meanR:371.9700 R:500.0000 loss:16.2014
Episode:8813 meanR:367.1000 R:13.0000 loss:74.8373
Episode:8814 meanR:366.3600 R:31.0000 loss:122.4213
Episode:8815 meanR:366.4800 R:97.0000 loss:103.2635
Episode:8816 meanR:370.2000 R:500.0000 loss:4.9033
Episode:8817 meanR:373.8200 R:500.0000 loss:15.8207
Episode:8818 me

Episode:8957 meanR:431.8000 R:500.0000 loss:19.4228
Episode:8958 meanR:431.8000 R:500.0000 loss:19.6477
Episode:8959 meanR:431.8000 R:500.0000 loss:13.0991
Episode:8960 meanR:431.8000 R:500.0000 loss:12.8124
Episode:8961 meanR:431.8000 R:500.0000 loss:12.0254
Episode:8962 meanR:431.8000 R:500.0000 loss:13.3981
Episode:8963 meanR:431.8000 R:500.0000 loss:12.3087
Episode:8964 meanR:431.8000 R:500.0000 loss:14.8183
Episode:8965 meanR:431.8000 R:500.0000 loss:14.3636
Episode:8966 meanR:431.8000 R:500.0000 loss:14.9198
Episode:8967 meanR:431.8000 R:500.0000 loss:16.7380
Episode:8968 meanR:431.8000 R:500.0000 loss:13.4966
Episode:8969 meanR:431.8000 R:500.0000 loss:9.1049
Episode:8970 meanR:431.8000 R:500.0000 loss:13.8467
Episode:8971 meanR:431.8000 R:500.0000 loss:12.2850
Episode:8972 meanR:431.8000 R:500.0000 loss:15.2456
Episode:8973 meanR:431.8000 R:500.0000 loss:13.7061
Episode:8974 meanR:431.8000 R:500.0000 loss:15.0676
Episode:8975 meanR:431.8000 R:500.0000 loss:15.0047
Episode:8976 

Episode:9115 meanR:435.3000 R:500.0000 loss:18.5768
Episode:9116 meanR:435.3000 R:500.0000 loss:16.5022
Episode:9117 meanR:435.3000 R:500.0000 loss:14.8494
Episode:9118 meanR:435.3000 R:500.0000 loss:15.4706
Episode:9119 meanR:435.3000 R:500.0000 loss:15.2699
Episode:9120 meanR:435.3000 R:500.0000 loss:15.3586
Episode:9121 meanR:435.3000 R:500.0000 loss:16.0633
Episode:9122 meanR:435.3000 R:500.0000 loss:15.1276
Episode:9123 meanR:435.3000 R:500.0000 loss:14.3538
Episode:9124 meanR:435.3000 R:500.0000 loss:16.9442
Episode:9125 meanR:435.3000 R:500.0000 loss:16.8663
Episode:9126 meanR:435.3000 R:500.0000 loss:16.7988
Episode:9127 meanR:435.3000 R:500.0000 loss:16.4024
Episode:9128 meanR:435.3000 R:500.0000 loss:15.2384
Episode:9129 meanR:435.3000 R:500.0000 loss:15.7307


# Visualizing training

Below I'll plot the total rewards for each episode. I'm plotting the rolling average too, in blue.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[N:] - cumsum[:-N]) / N 

In [None]:
eps, arr = np.array(episode_rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Average losses')

## Testing

Let's checkout how our trained agent plays the game.

In [38]:
import gym
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Episode/epoch
    for _ in range(10):
        total_reward = 0
        state = env.reset()
        initial_state = sess.run(model.initial_state) # Qs or current batch or states[:-1]
        
        # Steps/batches
        while True:
            env.render()
            action_logits, initial_state = sess.run([model.actions_logits, model.final_state],
                                                    feed_dict = {model.states: state.reshape([1, -1]), 
                                                                 model.initial_state: initial_state})
            action = np.argmax(action_logits)
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if done:
                break
        # At the end of each episode
        print('total_reward:{}'.format(total_reward))

# Close the env
env.close()

INFO:tensorflow:Restoring parameters from checkpoints/model.ckpt
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0
total_reward:500.0


## Extending this

So, Cart-Pole is a pretty simple game. However, the same model can be used to train an agent to play something much more complicated like Pong or Space Invaders. Instead of a state like we're using here though, you'd want to use convolutional layers to get the state from the screen images.

![Deep Q-Learning Atari](assets/atari-network.png)

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. Here's the original paper which will get you started: http://www.davidqiu.com:8888/research/nature14236.pdf.