# Deep Q-learning (DQL) or Deep Q-network (DQN)

In this notebook, we'll build a neural network that can learn to play games through reinforcement learning. More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

![Cart-Pole](assets/cart-pole.jpg)

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
import numpy as np

In [2]:
import tensorflow as tf
print('TensorFlow Version: {}'.format(tf.__version__))
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.12.0
Default GPU Device: /device:GPU:0


>**Note:** Make sure you have OpenAI Gym cloned into the same directory as this notebook. I've included `gym` as a submodule, so you can run `git submodule update --init --recursive` to pull its contents into the `gym` directory.

>**Note:** Once OpenAI Gym is cloned, install it with `pip install -e gym/[all]`.

In [3]:
import gym

## Create the Cart-Pole game environment
# env = gym.make('CartPole-v0') # 200 total reward as goal
env = gym.make('CartPole-v1') # 500 total reward as goal

We interact with the simulation through `env`. To show the simulation running, call `env.render()` to render one frame. Passing an action as an integer to `env.step` advances the simulation by one step. `env.action_space` tells you how many actions are possible, and `env.action_space.sample()` returns a random one. This interface is common to all Gym games. In the Cart-Pole game there are two possible actions, moving the cart left or right, encoded as 0 and 1.

Uncomment and run the code below (including the `env.render()` line) to watch the simulation run.

In [4]:
# env.reset()
# batch = []
# for _ in range(1000):
#     # env.render()
#     action = env.action_space.sample()
#     state, reward, done, info = env.step(action) # take a random action
#     batch.append([action, state, reward, done, info])
#     #print('state, action, reward, done, info:', state, action, reward, done, info)
#     if done:
#         env.reset()

To shut the window showing the simulation, use `env.close()`.

If you ran the simulation above, we can look at the rewards:

In [5]:
# batch[0], batch[0][1].shape

In [6]:
# actions = np.array([each[0] for each in batch])
# states = np.array([each[1] for each in batch])
# rewards = np.array([each[2] for each in batch])
# dones = np.array([each[3] for each in batch])
# infos = np.array([each[4] for each in batch])

In [7]:
# print(rewards[-20:])
# print(np.array(rewards).shape, np.array(states).shape, np.array(actions).shape, np.array(dones).shape)
# print(np.array(rewards).dtype, np.array(states).dtype, np.array(actions).dtype, np.array(dones).dtype)
# print(np.max(np.array(actions)), np.min(np.array(actions)))
# print((np.max(np.array(actions)) - np.min(np.array(actions)))+1)
# print(np.max(np.array(rewards)), np.min(np.array(rewards)))
# print(np.max(np.array(states)), np.min(np.array(states)))

The game resets after the pole has fallen past a certain angle. For each frame while the simulation is running, it returns a reward of 1.0. The longer the game runs, the more reward we get. Then, our network's goal is to maximize the reward by keeping the pole vertical. It will do this by moving the cart to the left and the right.

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
$$

where $s$ is a state, $a$ is an action, $r$ is the reward received, $\gamma$ is the discount factor, and $s'$ is the next state reached from state $s$ by taking action $a$.
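
To make the right-hand side concrete, here is a minimal sketch that evaluates it for a single transition. The function name `bellman_target` and the numbers are my own illustrative assumptions, not part of the notebook's model code.

```python
def bellman_target(reward, next_q_values, gamma):
    """Return r + gamma * max_a' Q(s', a') for one transition.

    `next_q_values` holds hypothetical Q-value estimates, one per
    action available in the next state s'.
    """
    return reward + gamma * max(next_q_values)


# Illustrative numbers only: reward 1.0, two actions in s', gamma = 0.99
target = bellman_target(reward=1.0, next_q_values=[2.0, 3.5], gamma=0.99)
print(target)
```

Only the largest next-state value (3.5 here) enters the target; the value of the other action is ignored.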

Previously, we used this equation to learn values for a Q-_table_. However, this game has a huge number of states. The state has four values: the position and velocity of the cart, and the angle and angular velocity of the pole. These are all real-valued numbers, so ignoring floating-point precision, you have practically infinitely many states. Instead of using a table, then, we'll replace it with a neural network that approximates the Q-table lookup function.

<img src="assets/deep-q-learning.png" width=450px>

Now the Q-value $Q(s, a)$ is calculated by passing a state into the network. The state goes through fully connected hidden layers, and the output layer produces one Q-value for each available action.

<img src="assets/q-network.png" width=550px>


As shown above, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max_{a'} Q(s', a')$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$.

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.
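
As a rough illustration of the quantity being minimized, the sketch below averages the squared difference between targets and predictions over a small batch. It uses plain Python lists instead of tensors, and the helper name `mean_squared_td_error` and the numbers are assumptions made for this example.

```python
def mean_squared_td_error(q_values, targets):
    """Average of (target - Q(s, a))**2 over a mini-batch."""
    if len(q_values) != len(targets):
        raise ValueError("q_values and targets must have the same length")
    squared_errors = [(t - q) ** 2 for q, t in zip(q_values, targets)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions Q(s, a) and targets Q_hat(s, a) for three transitions
q_values = [1.0, 2.0, 0.5]
targets = [1.5, 2.0, 1.5]
loss = mean_squared_td_error(q_values, targets)
print(loss)
```

The second transition contributes nothing because its prediction already matches its target, so the loss here comes from the other two transitions.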

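Below is my implementation. Note that the code departs from the single Q-network pictured above: it defines an actor `G`, which maps a state to action logits, and a critic `D`, which maps a state and action logits to a single Q-value. Each has two fully connected hidden layers with batch normalization and leaky ReLU activations. Two hidden layers seem to be good enough; three might be better. Feel free to try it out.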

In [8]:
def model_input(state_size, action_size):
    actions = tf.placeholder(tf.int32, [None], name='actions')
    actions_logits = tf.placeholder(tf.float32, [None, action_size], name='actions_logits')
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    next_states = tf.placeholder(tf.float32, [None, state_size], name='next_states')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    is_training = tf.placeholder(tf.bool, name='is_training')
    return actions, actions_logits, states, next_states, targetQs, is_training

In [9]:
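def init_xavier(random_seed=1, dtype=tf.float32, uniform=False):
    # tf.set_random_seed returns None, so set the graph-level seed on its own
    # line and leave the op-level seed unset (same behavior as before).
    tf.set_random_seed(random_seed)
    xavier = tf.contrib.layers.xavier_initializer(
        dtype=dtype,
        seed=None,
        uniform=uniform) # False: normal
    return xavier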

In [10]:
def mlp(inputs, units, trainable=True):
    outputs = tf.layers.dense(
        inputs=inputs,
        units=units,
        activation=None,
        use_bias=True,
        kernel_initializer=init_xavier(), # Xavier with normal init
        bias_initializer=tf.zeros_initializer(),
        kernel_regularizer=None,
        bias_regularizer=None,
        activity_regularizer=None,
        kernel_constraint=None,
        bias_constraint=None,
        trainable=trainable,
        name=None,
        reuse=None)
    return outputs

In [11]:
# tf.nn.leaky_relu(
#     features,
#     alpha=0.2,
#     name=None
# )
def nl(inputs, alpha=0.2):
    outputs = tf.maximum(alpha * inputs, inputs)
    return outputs
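
As a quick sanity check on `nl`, here is a plain-Python sketch for scalar inputs (the name `leaky_relu` is mine) showing why taking the maximum of `alpha * x` and `x` gives a leaky ReLU, assuming `0 < alpha < 1`.

```python
def leaky_relu(x, alpha=0.2):
    """Scalar leaky ReLU written as max(alpha * x, x).

    For 0 < alpha < 1:
      x > 0  ->  alpha * x < x, so the max returns x
      x < 0  ->  alpha * x > x, so the max returns alpha * x
    """
    return max(alpha * x, x)


for x in (-5.0, 0.0, 2.0):
    print(x, leaky_relu(x))
```

Negative inputs are scaled by `alpha` rather than zeroed, so a small gradient still flows through units that would be inactive under a plain ReLU.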

In [12]:
def bn(inputs, training=False):
    outputs = tf.layers.batch_normalization(
        inputs=inputs,
        axis=-1,
        momentum=0.99,
        epsilon=0.001,
        center=True,
        scale=True,
        beta_initializer=tf.zeros_initializer(),
        gamma_initializer=tf.ones_initializer(),
        moving_mean_initializer=tf.zeros_initializer(),
        moving_variance_initializer=tf.ones_initializer(),
        beta_regularizer=None,
        gamma_regularizer=None,
        beta_constraint=None,
        gamma_constraint=None,
        training=training,
        trainable=True,
        name=None,
        reuse=None,
        renorm=False,
        renorm_clipping=None,
        renorm_momentum=0.99,
        fused=None,
        virtual_batch_size=None,
        adjustment=None)
    return outputs

In [14]:
# Critic
def D(states, actions_logits, state_size, hidden_size, reuse=False, alpha=0.2, is_training=False):
    with tf.variable_scope('D', reuse=reuse):
        # First fully connected layer
        h = mlp(inputs=states, units=hidden_size)
        h = bn(inputs=h, training=is_training)
        h = nl(h, alpha=alpha)
        print(states.shape, h.shape)
        
        # Second fully connected layer
        h = tf.concat([h, actions_logits], axis=1)
        h = mlp(inputs=h, units=hidden_size)
        h = bn(inputs=h, training=is_training)
        h = nl(h, alpha=alpha)
        print(h.shape)
        
        # Output layer
        Qvalues_ = mlp(inputs=h, units=1)
        return Qvalues_

In [18]:
# Actor
def G(states, action_size, hidden_size, reuse=False, alpha=0.2, is_training=False):
    with tf.variable_scope('G', reuse=reuse):
        # First fully connected layer
        h = mlp(inputs=states, units=hidden_size)
        h = bn(inputs=h, training=is_training)
        h = nl(h, alpha=alpha)
        print(states.shape, h.shape)
        
        # Second fully connected layer
        h = mlp(inputs=h, units=hidden_size)
        h = bn(inputs=h, training=is_training)
        h = nl(h, alpha=alpha)
        print(h.shape)
        
        # Output layer
        actions_ = mlp(inputs=h, units=action_size)
        return actions_

In [19]:
def model_loss(actions, actions_logits, states, next_states, targetQs, 
               state_size, action_size, hidden_size, 
               is_training=False):

    Qvalues_ = D(states=states, actions_logits=actions_logits, hidden_size=hidden_size, 
                 state_size=state_size, is_training=is_training)
    
    Qs = tf.reshape(Qvalues_, [-1])
    print('Qvalues_.shape, Qs.shape:', Qvalues_.shape, Qs.shape)
    
    #gloss = tf.reduce_mean(tf.reduce_sum((next_states - next_states_)**2, axis=1))

    actions_logits_ = G(states=states, hidden_size=hidden_size, action_size=action_size, 
                        is_training=is_training)
    
    gQvalues_ = D(states=states, actions_logits=actions_logits_, hidden_size=hidden_size, 
                  state_size=state_size, is_training=is_training, reuse=True)
    
    gQs = tf.reshape(gQvalues_, [-1])
    print('Qvalues_.shape, Qs.shape:', gQvalues_.shape, gQs.shape)
    
    dloss = tf.reduce_mean((Qs - targetQs)**2)
    gloss = -tf.reduce_mean(gQs)
    
    return actions_logits_, gQvalues_, gloss, dloss

In [20]:
def model_opt(gloss, dloss, learning_rate):
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('G')]
    d_vars = [var for var in t_vars if var.name.startswith('D')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
        g_opt = tf.train.AdamOptimizer(learning_rate).minimize(gloss, var_list=g_vars)
        d_opt = tf.train.AdamOptimizer(learning_rate).minimize(dloss, var_list=d_vars)

    return g_opt, d_opt

In [21]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        (self.actions, self.actions_logits, self.states, self.next_states, self.targetQs, \
         self.is_training) = model_input(state_size=state_size, action_size=action_size)

        # Create the Model: calculating the loss and forward pass
        self.actions_logits_, self.gQvalues_, self.gloss, self.dloss = model_loss(
            action_size=action_size, hidden_size=hidden_size, state_size=state_size,
            states=self.states, next_states=self.next_states, 
            actions=self.actions, actions_logits=self.actions_logits,
            targetQs=self.targetQs, is_training=self.is_training)

        # Update the model: backward pass and backprop
        self.g_opt, self.d_opt = model_opt(gloss=self.gloss, dloss=self.dloss, learning_rate=learning_rate)

## Experience replay

Reinforcement learning algorithms can have stability issues due to correlations between states. To reduce correlations when training, we can store the agent's experiences and later draw a random mini-batch of those experiences to train on. 

Here, we'll create a `Memory` object that will store our experiences, the transitions $\langle s, a, r, s' \rangle$. This memory will have a maximum capacity, so we can keep newer experiences in memory while getting rid of older ones. Then, we'll sample a random mini-batch of transitions $\langle s, a, r, s' \rangle$ and train on those.

Below, I've implemented a `Memory` object. If you're unfamiliar with `deque`, this is a double-ended queue. You can think of it like a tube open on both sides. You can put objects in either side of the tube. But if it's full, adding anything more will push an object out the other side. This is a great data structure to use for the memory buffer.

In [22]:
from collections import deque
class Memory():
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), 
                               size=batch_size, 
                               replace=False)
        return [self.buffer[ii] for ii in idx]
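
To see the "tube" behavior in isolation, here is a small standard-library sketch. The string labels stand in for transitions, and `random.sample` plays the role that `np.random.choice(..., replace=False)` plays in `Memory.sample`; both are choices made for this illustration only.

```python
import random
from collections import deque

# A tiny buffer that holds at most three items
buffer = deque(maxlen=3)
for transition in ["t0", "t1", "t2", "t3"]:
    buffer.append(transition)

# "t0" was pushed out the other side when "t3" arrived
print(list(buffer))  # ['t1', 't2', 't3']

# Draw a mini-batch of two distinct items (sampling without replacement)
batch = random.sample(list(buffer), 2)
print(batch)
```

Sampling without replacement fails if the requested batch is larger than the buffer, which is why the training loop only starts learning once `len(memory.buffer) >= batch_size`.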

## Exploration - Exploitation

To learn about the environment and rules of the game, the agent needs to explore by taking random actions. We'll do this by choosing a random action with some probability $\epsilon$ (epsilon). That is, with probability $\epsilon$ the agent takes a random action, and with probability $1 - \epsilon$ it chooses the action with the highest $Q(s,a)$. This is called an **$\epsilon$-greedy policy**.


At first, the agent needs to do a lot of exploring. Later when it has learned more, the agent can favor choosing actions based on what it has learned. This is called _exploitation_. We'll set it up so the agent is more likely to explore early in training, then more likely to exploit later in training.
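
One common way to get this behavior is to decay $\epsilon$ exponentially with the number of steps taken. The sketch below follows the decay formula that appears as a comment in the training cell; the default values for `explore_start`, `explore_stop`, and `decay_rate` are assumptions picked for illustration, and the helper names are mine.

```python
import math
import random


def exploration_probability(step, explore_start=1.0, explore_stop=0.01,
                            decay_rate=1e-4):
    """Decay epsilon from explore_start toward explore_stop as steps grow."""
    return explore_stop + (explore_start - explore_stop) * math.exp(-decay_rate * step)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Epsilon shrinks as training progresses
for step in (0, 10000, 100000):
    print(step, round(exploration_probability(step), 4))

# With epsilon = 0 the choice is purely greedy
print(epsilon_greedy([0.1, 0.9], epsilon=0.0))
```

Early in training almost every action is random; late in training the agent mostly exploits, but `explore_stop` keeps a small amount of exploration alive.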

## Q-Learning training algorithm

Putting all this together, we can list out the algorithm we'll use to train the network. We'll train the network in _episodes_. One *episode* is one simulation of the game. An episode ends if the pole tilts over too far, if the cart moves too far to the left or right, or when the environment's step limit is reached (200 steps for `CartPole-v0`, 500 for `CartPole-v1`, which this notebook uses). For `CartPole-v0`, the usual goal is an average reward of at least 195 over 100 consecutive episodes. When a game ends, we'll start a new episode. Now, to train the agent:

* Initialize the memory $D$
* Initialize the action-value network $Q$ with random weights
* **For** episode = 1, $M$ **do**
  * **For** $t = 1$, $T$ **do**
     * With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \mathrm{argmax}_a Q(s_t, a)$
     * Execute action $a_t$ in simulator and observe reward $r_{t+1}$ and new state $s_{t+1}$
     * Store transition $\langle s_t, a_t, r_{t+1}, s_{t+1} \rangle$ in memory $D$
     * Sample random mini-batch from $D$: $\langle s_j, a_j, r_j, s'_j \rangle$
     * Set $\hat{Q}_j = r_j$ if the episode ends at $j+1$, otherwise set $\hat{Q}_j = r_j + \gamma \max_{a'}{Q(s'_j, a')}$
     * Make a gradient descent step with loss $(\hat{Q}_j - Q(s_j, a_j))^2$
  * **endfor**
* **endfor**
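
The steps above can be sketched end to end on a toy problem. This is not the Cart-Pole agent: the simulator is a hypothetical five-state corridor, and a lookup table stands in for the network, so the "gradient step" becomes a simple move of the table entry toward its target (the constant factor of the squared loss is folded into `lr`). All names and hyperparameter values here are assumptions for illustration.

```python
import random
from collections import deque

N_STATES, GOAL = 5, 4  # toy corridor: states 0..4, the episode ends at state 4


def step(state, action):
    """Hypothetical toy simulator: action 0 moves left, action 1 moves right."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=200, max_steps=50, gamma=0.9, lr=0.5, epsilon=0.3,
          batch_size=8, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=1000)                  # initialize the memory D
    Q = [[0.0, 0.0] for _ in range(N_STATES)]    # table standing in for the network
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection (ties broken at random)
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(Q[state])
                action = rng.choice([a for a in (0, 1) if Q[state][a] == best])
            next_state, reward, done = step(state, action)
            memory.append((state, action, reward, next_state, done))
            # Sample a random mini-batch and move each entry toward its target
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for s, a, r, s2, d in batch:
                target = r if d else r + gamma * max(Q[s2])
                Q[s][a] += lr * (target - Q[s][a])
            state = next_state
            if done:
                break
    return Q


Q = train()
greedy_policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

The replay buffer lets each stored transition be reused many times, which is what allows the reward found at the goal to propagate back along the corridor.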

## Hyperparameters

One of the more difficult aspects of reinforcement learning is the large number of hyperparameters. Not only are we tuning the network, but we're tuning the simulation.

In [23]:
# print('state size:{}'.format(states.shape), 
#       'actions:{}'.format(actions.shape)) 
# print('action size:', np.max(actions) - np.min(actions)+1)

In [24]:
# Network parameters
action_size = 2
state_size = 4
hidden_size = 64               # number of units in each Q-network hidden layer
learning_rate = 1e-04         # Q-network learning rate

# Memory parameters
memory_size = int(1e6)            # memory capacity
batch_size = 1024             # experience mini-batch size
gamma = 0.99                 # future reward discount

In [25]:
# Reset/init the graph/session
tf.reset_default_graph()  # returns None, so fetch the new default graph explicitly
graph = tf.get_default_graph()

# Init the model
model = Model(action_size=action_size, hidden_size=hidden_size, state_size=state_size, 
              learning_rate=learning_rate)

# Init the memory
memory = Memory(max_size=memory_size)

(?, 4) (?, 64)
(?, 64)
Qvalues_.shape, Qs.shape: (?, 1) (?,)
(?, 4) (?, 64)
(?, 64)
(?, 4) (?, 64)
(?, 64)
Qvalues_.shape, Qs.shape: (?, 1) (?,)


## Populate the memory (experience memory)

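Here I'm re-initializing the simulation and pre-populating the memory. The agent takes random actions and stores the transitions in memory, which helps the agent explore the game. In this version the cell below is commented out, so the memory starts empty and fills up during the first training episodes instead.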

In [26]:
# state = env.reset()

# for _ in range(memory_size):
    
#     action = env.action_space.sample()
    
#     next_state, reward, done, _ = env.step(action)
    
#     memory.buffer.append([state, action, next_state, reward, float(done)])
    
#     state = next_state
    
#     if done is True:
#         state = env.reset()

## Training the model

Below we'll train our agent. The training loop does not render the simulation; if you want to watch the agent train, add an `env.render()` call inside the step loop. Rendering slows training down because frames are drawn more slowly than the network can train, but it's cool to watch the agent get better at the game.

In [27]:
model.gQvalues_

<tf.Tensor 'D_1/dense_2/BiasAdd:0' shape=(?, 1) dtype=float32>

In [28]:
def learn(sess, memory, batch_size):

    # Sample a mini-batch; each transition is stored as
    # [state, action, next_state, reward, done, action_logits]
    batch = memory.sample(batch_size)
    states = np.array([each[0] for each in batch])
    actions = np.array([each[1] for each in batch])
    next_states = np.array([each[2] for each in batch])
    rewards = np.array([each[3] for each in batch])
    dones = np.array([each[4] for each in batch])
    actions_logits = np.array([each[5] for each in batch])

    # Critic's value of the actor's own action in each next state
    nextQvalues_ = sess.run(model.gQvalues_, feed_dict = {model.states: next_states,
                                                          model.is_training: False})

    #nextQs = np.max(next_actions_logits_, axis=1) * (1-dones)
    # Zero the bootstrap term for terminal transitions (dones == 1.0)
    nextQs = nextQvalues_.reshape(-1) * (1-dones)

    # Bellman targets for the critic
    targetQs = rewards + (gamma * nextQs)

    feed_dict = {model.states: states, model.actions: actions, model.actions_logits: actions_logits,
                 model.next_states: next_states, model.targetQs: targetQs, model.is_training: True}

    # Update the critic (D) first, then the actor (G)
    dloss, _ = sess.run([model.dloss, model.d_opt], feed_dict)
    gloss, _ = sess.run([model.gloss, model.g_opt], feed_dict)

    return gloss, dloss
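
The `(1-dones)` factor in `learn` is what handles the terminal case: when an episode ended on a transition, the target is just the reward. Here is a minimal list-based sketch of that masking; the helper name `masked_targets` and the numbers are assumptions for illustration, and the real code does the same thing with NumPy arrays.

```python
def masked_targets(rewards, next_qs, dones, gamma=0.99):
    """Compute r + gamma * Q_next * (1 - done) for each transition.

    `dones` holds 1.0 for terminal transitions and 0.0 otherwise,
    matching the float(done) values stored in the memory.
    """
    return [r + gamma * q * (1.0 - d)
            for r, q, d in zip(rewards, next_qs, dones)]


# Two transitions with the same reward and next-state value;
# only the second one ended its episode.
targets = masked_targets(rewards=[1.0, 1.0], next_qs=[5.0, 5.0],
                         dones=[0.0, 1.0], gamma=0.5)
print(targets)  # [3.5, 1.0]
```

Without the mask, the critic would bootstrap from a state that is never actually visited after the pole has fallen.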

In [29]:
# def env_(sess, state, action_logits):
    
#     next_state_ = sess.run(model.next_states_, feed_dict={
#         model.states: state.reshape([1, -1]), 
#         model.actions_logits: action_logits.reshape([1, -1]),
#         model.is_training: False})
    
#     return next_state_.reshape(-1)

In [30]:
def act(sess, state):
    
    action_logits_ = sess.run(model.actions_logits_, feed_dict={model.states: state.reshape([1, -1]), 
                                                                model.is_training: False})
    
    action = np.argmax(action_logits_, axis=1)[0]
    #print(action)
    
    return action, action_logits_.reshape(-1)

In [31]:
# l = []
# l.append([0, 1])

In [32]:
# l[0][0]

In [None]:
# Save/load the model and save for plotting
saver = tf.train.Saver()
episode_rewards_list, rewards_list, loss_list = [], [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    #total_step = 0 # Explore or exploit parameter
    episode_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        loss_batch = []
        state = env.reset()

        # Training steps/batches
        while True:
            action, action_logits = act(sess, state)
                
            next_state, reward, done, _ = env.step(action)
            
            memory.buffer.append([state, action, next_state, reward, float(done), action_logits])
            
            # # Explore (Env) or Exploit (Model)
            # #total_step += 1
            # #explore_p = explore_stop + (explore_start - explore_stop) * np.exp(-decay_rate * total_step)
            # explore_p = np.random.normal(0, 1)
            # if np.abs(explore_p) < 1e-3:
            #     print('np.abs(explore_p) < 1e-3:', explore_p)
            #     next_state = env_(sess, state=state, action_logits=action_logits)
            
            total_reward += reward
            state = next_state

            # Training
            if len(memory.buffer) >= batch_size:
                gloss, dloss = learn(sess, memory, batch_size)
                loss_batch.append([gloss, dloss])
            
            if done:
                break
                
        if len(memory.buffer) >= batch_size:
            episode_reward.append(total_reward)

            print('Episode:{}'.format(ep),
                  'meanR:{:.4f}'.format(np.mean(episode_reward)),
                  'R:{}'.format(total_reward),
                  'gloss:{:.4f}'.format(np.mean(loss_batch, axis=0)[0]),
                  'dloss:{:.4f}'.format(np.mean(loss_batch, axis=0)[1]))
            
            # Plotting out
            episode_rewards_list.append([ep, np.mean(episode_reward)])
            rewards_list.append([ep, total_reward])
            loss_list.append([ep, np.mean(loss_batch)])

            # Break episode/epoch loop
            if np.mean(episode_reward) >= 500:
                break
        
        else: print('len(memory.buffer) >= batch_size:', len(memory.buffer), batch_size)
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model.ckpt')

len(memory.buffer) >= batch_size: 9 1024
len(memory.buffer) >= batch_size: 18 1024
len(memory.buffer) >= batch_size: 28 1024
len(memory.buffer) >= batch_size: 38 1024
len(memory.buffer) >= batch_size: 48 1024
len(memory.buffer) >= batch_size: 57 1024
len(memory.buffer) >= batch_size: 67 1024
len(memory.buffer) >= batch_size: 77 1024
len(memory.buffer) >= batch_size: 87 1024
len(memory.buffer) >= batch_size: 97 1024
len(memory.buffer) >= batch_size: 106 1024
len(memory.buffer) >= batch_size: 114 1024
len(memory.buffer) >= batch_size: 124 1024
len(memory.buffer) >= batch_size: 134 1024
len(memory.buffer) >= batch_size: 143 1024
len(memory.buffer) >= batch_size: 151 1024
len(memory.buffer) >= batch_size: 165 1024
len(memory.buffer) >= batch_size: 174 1024
len(memory.buffer) >= batch_size: 183 1024
len(memory.buffer) >= batch_size: 191 1024
len(memory.buffer) >= batch_size: 200 1024
len(memory.buffer) >= batch_size: 209 1024
len(memory.buffer) >= batch_size: 217 1024

Episode:168 meanR:11.5246 R:10.0 gloss:-2.2800 dloss:1.0268
Episode:169 meanR:11.5000 R:10.0 gloss:-2.3155 dloss:0.6442
Episode:170 meanR:11.4762 R:10.0 gloss:-2.3164 dloss:0.5485
Episode:171 meanR:11.4531 R:10.0 gloss:-2.3584 dloss:1.1065
Episode:172 meanR:11.4154 R:9.0 gloss:-2.4079 dloss:0.9967
Episode:173 meanR:11.3939 R:10.0 gloss:-2.4097 dloss:0.4786
Episode:174 meanR:11.3731 R:10.0 gloss:-2.3999 dloss:0.6714
Episode:175 meanR:11.3529 R:10.0 gloss:-2.4668 dloss:1.4253
Episode:176 meanR:11.3188 R:9.0 gloss:-2.4971 dloss:1.4587
Episode:177 meanR:11.3000 R:10.0 gloss:-2.5321 dloss:0.5312
Episode:178 meanR:11.2958 R:11.0 gloss:-2.5492 dloss:0.4013
Episode:179 meanR:11.2778 R:10.0 gloss:-2.5811 dloss:0.6086
Episode:180 meanR:11.2877 R:12.0 gloss:-2.6124 dloss:1.1243
Episode:181 meanR:11.2703 R:10.0 gloss:-2.6568 dloss:0.7817
Episode:182 meanR:11.2533 R:10.0 gloss:-2.6721 dloss:0.3712
Episode:183 meanR:11.2500 R:11.0 gloss:-2.6928 dloss:0.4314
Episode:184 meanR:11.2468 R:11.0 gloss:-2.

Episode:306 meanR:11.2600 R:11.0 gloss:-5.1206 dloss:0.8660
Episode:307 meanR:11.3200 R:15.0 gloss:-5.1646 dloss:0.8628
Episode:308 meanR:11.4100 R:18.0 gloss:-5.2081 dloss:0.8938
Episode:309 meanR:11.4900 R:18.0 gloss:-5.2804 dloss:0.8722
Episode:310 meanR:11.5000 R:11.0 gloss:-5.2912 dloss:0.9519
Episode:311 meanR:11.5400 R:14.0 gloss:-5.3388 dloss:0.7825
Episode:312 meanR:11.6000 R:16.0 gloss:-5.3764 dloss:0.7274
Episode:313 meanR:11.6400 R:14.0 gloss:-5.4257 dloss:0.7002
Episode:314 meanR:11.7100 R:15.0 gloss:-5.4690 dloss:0.7205
Episode:315 meanR:11.7400 R:12.0 gloss:-5.4728 dloss:0.7473
Episode:316 meanR:11.8100 R:17.0 gloss:-5.4955 dloss:0.7206
Episode:317 meanR:11.8500 R:14.0 gloss:-5.5438 dloss:0.8219
Episode:318 meanR:11.8800 R:17.0 gloss:-5.6031 dloss:0.8991
Episode:319 meanR:11.9100 R:13.0 gloss:-5.6021 dloss:0.8548
Episode:320 meanR:11.9800 R:17.0 gloss:-5.6026 dloss:0.7974
Episode:321 meanR:12.0400 R:17.0 gloss:-5.6259 dloss:0.9413
Episode:322 meanR:12.1100 R:16.0 gloss:-

Episode:443 meanR:13.5000 R:9.0 gloss:-7.2205 dloss:1.9793
Episode:444 meanR:13.4200 R:9.0 gloss:-7.2109 dloss:1.7426
Episode:445 meanR:13.3400 R:10.0 gloss:-7.2216 dloss:1.3430
Episode:446 meanR:13.2900 R:9.0 gloss:-7.2200 dloss:0.9951
Episode:447 meanR:13.2500 R:10.0 gloss:-7.2526 dloss:0.9635
Episode:448 meanR:13.1900 R:9.0 gloss:-7.2849 dloss:1.2811
Episode:449 meanR:13.1800 R:12.0 gloss:-7.2959 dloss:2.4381
Episode:450 meanR:13.1400 R:9.0 gloss:-7.3173 dloss:4.7240
Episode:451 meanR:13.2000 R:18.0 gloss:-7.4236 dloss:3.5783
Episode:452 meanR:13.2600 R:20.0 gloss:-7.5587 dloss:2.8205
Episode:453 meanR:13.2900 R:16.0 gloss:-7.4498 dloss:3.7387
Episode:454 meanR:13.3600 R:19.0 gloss:-7.2807 dloss:3.4825
Episode:455 meanR:13.2700 R:9.0 gloss:-7.1677 dloss:3.9715
Episode:456 meanR:13.2600 R:11.0 gloss:-7.1195 dloss:3.4549
Episode:457 meanR:13.2100 R:11.0 gloss:-7.1018 dloss:3.2221
Episode:458 meanR:13.1700 R:8.0 gloss:-7.0944 dloss:2.7728
Episode:459 meanR:13.0800 R:10.0 gloss:-7.0891 

Episode:580 meanR:16.5300 R:28.0 gloss:-7.8339 dloss:0.9372
Episode:581 meanR:16.6900 R:30.0 gloss:-7.9244 dloss:1.2555
Episode:582 meanR:16.7200 R:14.0 gloss:-8.0139 dloss:1.0236
Episode:583 meanR:16.6900 R:22.0 gloss:-8.0323 dloss:0.9181
Episode:584 meanR:16.7100 R:15.0 gloss:-8.0141 dloss:0.8527
Episode:585 meanR:16.7400 R:14.0 gloss:-8.0005 dloss:0.8050
Episode:586 meanR:16.7700 R:16.0 gloss:-8.0154 dloss:0.7738
Episode:587 meanR:16.7300 R:11.0 gloss:-8.0094 dloss:0.9191
Episode:588 meanR:16.7100 R:11.0 gloss:-7.9697 dloss:1.0431
Episode:589 meanR:16.8200 R:20.0 gloss:-7.9046 dloss:1.0869
Episode:590 meanR:16.8300 R:15.0 gloss:-7.8263 dloss:1.0655
Episode:591 meanR:16.9800 R:26.0 gloss:-7.8307 dloss:0.8010
Episode:592 meanR:16.9500 R:20.0 gloss:-7.8803 dloss:0.7669
Episode:593 meanR:16.9900 R:16.0 gloss:-7.9338 dloss:0.8905
Episode:594 meanR:17.0000 R:18.0 gloss:-7.9719 dloss:1.1799
Episode:595 meanR:17.0500 R:16.0 gloss:-8.0571 dloss:1.0460
Episode:596 meanR:17.0700 R:19.0 gloss:-

Episode:717 meanR:22.5800 R:17.0 gloss:-8.6241 dloss:0.8936
Episode:718 meanR:22.6200 R:28.0 gloss:-8.7103 dloss:0.8118
Episode:719 meanR:22.9400 R:53.0 gloss:-8.6565 dloss:1.0215
Episode:720 meanR:22.9100 R:12.0 gloss:-8.5653 dloss:0.8693
Episode:721 meanR:23.0600 R:33.0 gloss:-8.6580 dloss:0.9166
Episode:722 meanR:23.1900 R:35.0 gloss:-8.7907 dloss:0.8860
Episode:723 meanR:23.2800 R:25.0 gloss:-8.7345 dloss:1.0083
Episode:724 meanR:23.7500 R:61.0 gloss:-8.7449 dloss:0.8588
Episode:725 meanR:23.8500 R:26.0 gloss:-8.8999 dloss:0.9175
Episode:726 meanR:24.0900 R:41.0 gloss:-8.7759 dloss:1.0069
Episode:727 meanR:24.2400 R:29.0 gloss:-8.7249 dloss:0.9287
Episode:728 meanR:24.2000 R:28.0 gloss:-8.7700 dloss:1.8809
Episode:729 meanR:24.1600 R:12.0 gloss:-8.9345 dloss:1.6364
Episode:730 meanR:24.2200 R:22.0 gloss:-8.9756 dloss:1.6215
Episode:731 meanR:24.3300 R:31.0 gloss:-8.7946 dloss:1.1965
Episode:732 meanR:24.4000 R:21.0 gloss:-8.7071 dloss:0.9521
Episode:733 meanR:24.4400 R:30.0 gloss:-

Episode:853 meanR:26.6500 R:16.0 gloss:-11.2988 dloss:1.2599
Episode:854 meanR:26.5900 R:18.0 gloss:-11.3011 dloss:1.1188
Episode:855 meanR:26.6400 R:32.0 gloss:-11.2807 dloss:1.2524
Episode:856 meanR:26.4600 R:20.0 gloss:-11.3452 dloss:1.3337
Episode:857 meanR:26.5000 R:20.0 gloss:-11.2997 dloss:2.1296
Episode:858 meanR:26.1800 R:17.0 gloss:-11.2437 dloss:1.7237
Episode:859 meanR:26.2400 R:20.0 gloss:-11.3434 dloss:2.2799
Episode:860 meanR:26.3400 R:24.0 gloss:-11.4734 dloss:1.9135
Episode:861 meanR:26.4000 R:24.0 gloss:-11.3761 dloss:1.5529
Episode:862 meanR:26.6700 R:43.0 gloss:-11.4398 dloss:1.9915
Episode:863 meanR:26.7700 R:29.0 gloss:-11.4050 dloss:2.2241
Episode:864 meanR:26.7200 R:15.0 gloss:-11.3924 dloss:6.8674
Episode:865 meanR:26.7200 R:31.0 gloss:-11.6483 dloss:4.3680
Episode:866 meanR:26.2800 R:12.0 gloss:-11.4828 dloss:3.8742
Episode:867 meanR:26.6300 R:51.0 gloss:-11.4567 dloss:2.4092
Episode:868 meanR:26.5100 R:16.0 gloss:-11.6614 dloss:1.4612
Episode:869 meanR:26.650

Episode:988 meanR:29.1000 R:13.0 gloss:-13.3652 dloss:2.7445
Episode:989 meanR:29.6500 R:83.0 gloss:-13.5112 dloss:3.3379
Episode:990 meanR:29.7400 R:40.0 gloss:-13.9262 dloss:3.8382
Episode:991 meanR:29.8900 R:34.0 gloss:-13.9291 dloss:3.3840
Episode:992 meanR:29.6000 R:10.0 gloss:-13.9533 dloss:3.4108
Episode:993 meanR:29.9100 R:40.0 gloss:-14.0862 dloss:3.6237
Episode:994 meanR:29.9000 R:14.0 gloss:-14.1962 dloss:4.2023
Episode:995 meanR:30.0000 R:42.0 gloss:-14.4218 dloss:4.8486
Episode:996 meanR:32.2100 R:251.0 gloss:-15.4280 dloss:11.1947
Episode:997 meanR:31.7400 R:9.0 gloss:-16.4557 dloss:17.1230
Episode:998 meanR:31.6000 R:9.0 gloss:-16.4734 dloss:18.6833
Episode:999 meanR:31.4200 R:10.0 gloss:-16.5131 dloss:20.1787
Episode:1000 meanR:31.3700 R:9.0 gloss:-16.5854 dloss:20.6754
Episode:1001 meanR:31.1600 R:9.0 gloss:-16.6467 dloss:20.5801
Episode:1002 meanR:31.1400 R:10.0 gloss:-16.6751 dloss:19.7869
Episode:1003 meanR:31.0700 R:10.0 gloss:-16.7119 dloss:19.3768

Episode:1118 meanR:22.3800 R:9.0 gloss:-31.1649 dloss:123.6979
Episode:1119 meanR:22.0400 R:9.0 gloss:-31.2766 dloss:133.1294
Episode:1120 meanR:22.0300 R:9.0 gloss:-31.4017 dloss:133.4956
Episode:1121 meanR:22.0300 R:9.0 gloss:-31.4519 dloss:127.8243
Episode:1122 meanR:21.9300 R:8.0 gloss:-31.5522 dloss:133.0038
Episode:1123 meanR:21.6600 R:8.0 gloss:-31.6005 dloss:132.8962
Episode:1124 meanR:21.6500 R:8.0 gloss:-31.6781 dloss:147.3672
Episode:1125 meanR:21.6400 R:9.0 gloss:-31.7377 dloss:141.8280
Episode:1126 meanR:21.6200 R:8.0 gloss:-31.7320 dloss:147.2095
Episode:1127 meanR:21.6100 R:9.0 gloss:-31.7838 dloss:145.1676
Episode:1128 meanR:21.6000 R:9.0 gloss:-31.8592 dloss:138.4692
Episode:1129 meanR:21.4800 R:8.0 gloss:-31.8764 dloss:131.6874
Episode:1130 meanR:21.3700 R:9.0 gloss:-31.9224 dloss:128.8570
Episode:1131 meanR:20.3100 R:10.0 gloss:-31.8542 dloss:132.1703
Episode:1132 meanR:20.3100 R:10.0 gloss:-31.8875 dloss:128.4876
Episode:1133 meanR:20.2600 R:10.0 gloss:-31.8460 dlos

Episode:1250 meanR:9.3800 R:9.0 gloss:-28.2297 dloss:31.3798
Episode:1251 meanR:9.3800 R:9.0 gloss:-28.1876 dloss:28.2476
Episode:1252 meanR:9.3800 R:9.0 gloss:-28.1955 dloss:25.6849
Episode:1253 meanR:9.3600 R:8.0 gloss:-28.1023 dloss:25.9970
Episode:1254 meanR:9.3700 R:10.0 gloss:-28.1001 dloss:24.6007
Episode:1255 meanR:9.3700 R:10.0 gloss:-28.0682 dloss:24.8970
Episode:1256 meanR:9.3500 R:9.0 gloss:-28.0396 dloss:27.1772
Episode:1257 meanR:9.3600 R:10.0 gloss:-27.9719 dloss:25.1595
Episode:1258 meanR:9.3700 R:10.0 gloss:-27.8850 dloss:25.9536
Episode:1259 meanR:9.3600 R:9.0 gloss:-27.8843 dloss:26.5619
Episode:1260 meanR:9.3600 R:10.0 gloss:-27.8418 dloss:27.4962
Episode:1261 meanR:9.3700 R:9.0 gloss:-27.7879 dloss:28.3809
Episode:1262 meanR:9.3500 R:8.0 gloss:-27.8195 dloss:26.6564
Episode:1263 meanR:9.3500 R:9.0 gloss:-27.7311 dloss:27.3411
Episode:1264 meanR:9.3600 R:9.0 gloss:-27.7972 dloss:27.2953
Episode:1265 meanR:9.3600 R:9.0 gloss:-27.8149 dloss:28.2845

Episode:1384 meanR:9.3400 R:11.0 gloss:-24.6930 dloss:8.3764
Episode:1385 meanR:9.3600 R:10.0 gloss:-24.6534 dloss:8.4661
Episode:1386 meanR:9.3500 R:9.0 gloss:-24.6672 dloss:9.0436
Episode:1387 meanR:9.3400 R:8.0 gloss:-24.5984 dloss:8.1836
Episode:1388 meanR:9.3300 R:8.0 gloss:-24.6129 dloss:9.2345
Episode:1389 meanR:9.3400 R:10.0 gloss:-24.5440 dloss:9.0058
Episode:1390 meanR:9.3400 R:10.0 gloss:-24.5118 dloss:8.5175
Episode:1391 meanR:9.3500 R:10.0 gloss:-24.4738 dloss:8.5769
Episode:1392 meanR:9.3500 R:9.0 gloss:-24.3893 dloss:7.8024
Episode:1393 meanR:9.3400 R:9.0 gloss:-24.3914 dloss:7.5772
Episode:1394 meanR:9.3400 R:10.0 gloss:-24.3299 dloss:8.5234
Episode:1395 meanR:9.3300 R:8.0 gloss:-24.2905 dloss:8.2522
Episode:1396 meanR:9.3300 R:9.0 gloss:-24.3067 dloss:8.4989
Episode:1397 meanR:9.3300 R:10.0 gloss:-24.3160 dloss:8.6007
Episode:1398 meanR:9.3200 R:9.0 gloss:-24.3179 dloss:8.5109
Episode:1399 meanR:9.3200 R:10.0 gloss:-24.2088 dloss:7.9653
Episode:1400 meanR:9.3200 R:9.0 

Episode:1518 meanR:12.5600 R:10.0 gloss:-19.2912 dloss:7.2652
Episode:1519 meanR:12.6800 R:24.0 gloss:-19.3507 dloss:10.2115
Episode:1520 meanR:12.6300 R:10.0 gloss:-19.1682 dloss:6.2608
Episode:1521 meanR:12.6200 R:10.0 gloss:-19.1302 dloss:7.1642
Episode:1522 meanR:12.6700 R:14.0 gloss:-19.1834 dloss:6.2336
Episode:1523 meanR:12.6200 R:9.0 gloss:-19.2095 dloss:6.5671
Episode:1524 meanR:12.6000 R:10.0 gloss:-19.1803 dloss:7.1585
Episode:1525 meanR:12.6000 R:10.0 gloss:-19.2124 dloss:7.9999
Episode:1526 meanR:12.5500 R:10.0 gloss:-19.0795 dloss:7.1111
Episode:1527 meanR:12.5500 R:10.0 gloss:-19.0985 dloss:7.8267
Episode:1528 meanR:12.4900 R:8.0 gloss:-19.0579 dloss:7.0716
Episode:1529 meanR:12.5700 R:19.0 gloss:-18.9832 dloss:6.3413
Episode:1530 meanR:12.6700 R:26.0 gloss:-18.8844 dloss:5.2102
Episode:1531 meanR:12.6400 R:9.0 gloss:-18.7463 dloss:5.3242
Episode:1532 meanR:12.5900 R:8.0 gloss:-18.7297 dloss:5.7427
Episode:1533 meanR:12.5700 R:8.0 gloss:-18.6983 dloss:6.2858
Episode:1534

Episode:1651 meanR:10.3200 R:10.0 gloss:-16.3592 dloss:8.5591
Episode:1652 meanR:10.2800 R:9.0 gloss:-16.3358 dloss:8.3542
Episode:1653 meanR:10.3000 R:10.0 gloss:-16.3384 dloss:8.6170
Episode:1654 meanR:10.3000 R:9.0 gloss:-16.3341 dloss:7.6400
Episode:1655 meanR:10.2200 R:9.0 gloss:-16.3578 dloss:8.1052
Episode:1656 meanR:10.2200 R:9.0 gloss:-16.3402 dloss:8.5448
Episode:1657 meanR:10.2100 R:9.0 gloss:-16.3624 dloss:8.4377
Episode:1658 meanR:10.3900 R:26.0 gloss:-16.3385 dloss:8.1370
Episode:1659 meanR:10.4400 R:13.0 gloss:-16.2816 dloss:7.3926
Episode:1660 meanR:10.5500 R:21.0 gloss:-16.3555 dloss:7.0533
Episode:1661 meanR:10.6900 R:23.0 gloss:-16.3148 dloss:7.0875
Episode:1662 meanR:10.7100 R:13.0 gloss:-16.3228 dloss:7.7961
Episode:1663 meanR:10.6800 R:11.0 gloss:-16.3278 dloss:8.4543
Episode:1664 meanR:10.7500 R:16.0 gloss:-16.3318 dloss:9.1868
Episode:1665 meanR:10.7600 R:10.0 gloss:-16.3677 dloss:10.3253
Episode:1666 meanR:10.7700 R:9.0 gloss:-16.3354 dloss:10.3937
Episode:1667

Episode:1784 meanR:22.5000 R:21.0 gloss:-18.0438 dloss:4.3031
Episode:1785 meanR:22.5000 R:16.0 gloss:-18.2365 dloss:3.7605
Episode:1786 meanR:22.6200 R:24.0 gloss:-18.2710 dloss:5.0227
Episode:1787 meanR:22.6300 R:17.0 gloss:-18.1656 dloss:4.2119
Episode:1788 meanR:22.6700 R:24.0 gloss:-18.1170 dloss:4.1026
Episode:1789 meanR:22.6900 R:25.0 gloss:-18.3319 dloss:4.2020
Episode:1790 meanR:22.6900 R:26.0 gloss:-18.3563 dloss:7.2923
Episode:1791 meanR:22.5900 R:17.0 gloss:-18.1309 dloss:6.4162
Episode:1792 meanR:22.5900 R:21.0 gloss:-17.9929 dloss:4.3318
Episode:1793 meanR:22.4900 R:22.0 gloss:-18.0123 dloss:4.4263
Episode:1794 meanR:22.5500 R:26.0 gloss:-18.1603 dloss:6.0970
Episode:1795 meanR:22.5900 R:22.0 gloss:-18.2838 dloss:6.1871
Episode:1796 meanR:22.7000 R:27.0 gloss:-18.1611 dloss:7.1503
Episode:1797 meanR:22.7200 R:16.0 gloss:-17.9730 dloss:5.9608
Episode:1798 meanR:22.7600 R:24.0 gloss:-17.8640 dloss:3.8429
Episode:1799 meanR:22.7600 R:16.0 gloss:-17.9195 dloss:4.9109
Episode:

Episode:1916 meanR:21.9900 R:19.0 gloss:-19.3146 dloss:17.4129
Episode:1917 meanR:21.9800 R:18.0 gloss:-19.3783 dloss:24.1939
Episode:1918 meanR:21.8400 R:20.0 gloss:-19.1301 dloss:30.5019
Episode:1919 meanR:21.8200 R:18.0 gloss:-18.7233 dloss:41.7442
Episode:1920 meanR:21.7400 R:18.0 gloss:-18.6709 dloss:39.7686
Episode:1921 meanR:21.8000 R:22.0 gloss:-18.6628 dloss:18.0177
Episode:1922 meanR:21.7600 R:21.0 gloss:-18.4981 dloss:9.3520
Episode:1923 meanR:21.9100 R:32.0 gloss:-18.3952 dloss:13.1359
Episode:1924 meanR:21.9900 R:31.0 gloss:-18.6541 dloss:8.5939
Episode:1925 meanR:22.0000 R:16.0 gloss:-18.8701 dloss:11.5790
Episode:1926 meanR:22.1200 R:26.0 gloss:-18.7662 dloss:14.5792
Episode:1927 meanR:22.2000 R:25.0 gloss:-18.5270 dloss:15.2488
Episode:1928 meanR:22.2600 R:21.0 gloss:-18.3927 dloss:9.1333
Episode:1929 meanR:22.2500 R:18.0 gloss:-18.3545 dloss:7.4101
Episode:1930 meanR:22.3300 R:29.0 gloss:-18.3818 dloss:9.4137
Episode:1931 meanR:22.3400 R:21.0 gloss:-18.4739 dloss:9.672

Episode:2048 meanR:27.0500 R:25.0 gloss:-19.4824 dloss:12.5966
Episode:2049 meanR:27.1400 R:28.0 gloss:-19.7067 dloss:7.2383
Episode:2050 meanR:27.2800 R:36.0 gloss:-19.6886 dloss:9.1298
Episode:2051 meanR:27.2200 R:20.0 gloss:-19.4736 dloss:10.1638
Episode:2052 meanR:27.2600 R:27.0 gloss:-19.3309 dloss:9.7887
Episode:2053 meanR:27.3000 R:30.0 gloss:-19.2416 dloss:8.2869
Episode:2054 meanR:27.2800 R:21.0 gloss:-19.1819 dloss:5.5964
Episode:2055 meanR:27.3300 R:21.0 gloss:-19.1274 dloss:6.8987
Episode:2056 meanR:27.2800 R:18.0 gloss:-19.1801 dloss:9.9069
Episode:2057 meanR:27.2100 R:19.0 gloss:-19.2742 dloss:10.5444
Episode:2058 meanR:27.1600 R:22.0 gloss:-19.4525 dloss:8.1607
Episode:2059 meanR:27.2200 R:22.0 gloss:-19.5803 dloss:5.1524
Episode:2060 meanR:27.4400 R:47.0 gloss:-19.5580 dloss:5.6705
Episode:2061 meanR:27.3000 R:21.0 gloss:-19.4022 dloss:5.8776
Episode:2062 meanR:27.1700 R:25.0 gloss:-19.2914 dloss:6.2042
Episode:2063 meanR:27.1000 R:16.0 gloss:-19.2351 dloss:5.6764
Episo

Episode:2180 meanR:22.5700 R:34.0 gloss:-16.4357 dloss:5.5084
Episode:2181 meanR:22.6200 R:32.0 gloss:-16.5948 dloss:4.9749
Episode:2182 meanR:22.7500 R:33.0 gloss:-16.7770 dloss:9.3043
Episode:2183 meanR:22.7000 R:15.0 gloss:-16.8500 dloss:9.6459
Episode:2184 meanR:22.6700 R:15.0 gloss:-16.8234 dloss:8.1111
Episode:2185 meanR:22.7200 R:28.0 gloss:-16.6569 dloss:6.4702
Episode:2186 meanR:22.6100 R:18.0 gloss:-16.4684 dloss:4.4110
Episode:2187 meanR:22.5800 R:31.0 gloss:-16.4696 dloss:3.8907
Episode:2188 meanR:22.0700 R:30.0 gloss:-16.6000 dloss:3.1903
Episode:2189 meanR:22.1400 R:24.0 gloss:-16.7135 dloss:4.0114
Episode:2190 meanR:22.2700 R:30.0 gloss:-16.7470 dloss:5.4043
Episode:2191 meanR:22.3200 R:21.0 gloss:-16.7168 dloss:5.0723
Episode:2192 meanR:22.5000 R:36.0 gloss:-16.5969 dloss:4.2821
Episode:2193 meanR:22.2600 R:16.0 gloss:-16.4855 dloss:3.5934
Episode:2194 meanR:22.3100 R:26.0 gloss:-16.4952 dloss:2.7414
Episode:2195 meanR:22.5000 R:36.0 gloss:-16.5668 dloss:3.1669
Episode:

Episode:2312 meanR:35.6300 R:25.0 gloss:-16.2666 dloss:11.3454
Episode:2313 meanR:36.0000 R:54.0 gloss:-16.1248 dloss:7.6045
Episode:2314 meanR:35.9900 R:30.0 gloss:-16.1528 dloss:5.4856
Episode:2315 meanR:35.7100 R:17.0 gloss:-16.3285 dloss:5.8435
Episode:2316 meanR:35.6800 R:21.0 gloss:-16.4725 dloss:5.7129
Episode:2317 meanR:35.8100 R:36.0 gloss:-16.5146 dloss:4.9302
Episode:2318 meanR:35.6000 R:38.0 gloss:-16.3674 dloss:8.3851
Episode:2319 meanR:35.7800 R:47.0 gloss:-16.1922 dloss:12.5264
Episode:2320 meanR:35.8300 R:33.0 gloss:-15.9832 dloss:6.6878
Episode:2321 meanR:35.7400 R:25.0 gloss:-16.0567 dloss:7.0567
Episode:2322 meanR:35.4700 R:31.0 gloss:-16.2399 dloss:8.5637
Episode:2323 meanR:35.3000 R:34.0 gloss:-16.4476 dloss:7.3382
Episode:2324 meanR:35.3300 R:18.0 gloss:-16.4343 dloss:6.8188
Episode:2325 meanR:35.2800 R:29.0 gloss:-16.2767 dloss:9.3581
Episode:2326 meanR:35.4400 R:62.0 gloss:-16.0589 dloss:11.5747
Episode:2327 meanR:35.8200 R:68.0 gloss:-15.9699 dloss:4.6532
Episo

Episode:2445 meanR:34.9900 R:23.0 gloss:-15.1762 dloss:2.6519
Episode:2446 meanR:35.1300 R:41.0 gloss:-15.1206 dloss:3.1160
Episode:2447 meanR:35.4300 R:55.0 gloss:-15.0094 dloss:3.7042
Episode:2448 meanR:35.4100 R:16.0 gloss:-14.9879 dloss:3.4177
Episode:2449 meanR:35.2200 R:45.0 gloss:-14.9663 dloss:3.8926
Episode:2450 meanR:34.6200 R:18.0 gloss:-14.8929 dloss:5.9351
Episode:2451 meanR:34.7600 R:32.0 gloss:-14.8114 dloss:5.3753
Episode:2452 meanR:34.5300 R:33.0 gloss:-14.7529 dloss:3.3877
Episode:2453 meanR:34.0000 R:22.0 gloss:-14.6845 dloss:3.1536
Episode:2454 meanR:33.4500 R:18.0 gloss:-14.7463 dloss:4.2801
Episode:2455 meanR:33.2700 R:23.0 gloss:-14.8186 dloss:4.7030
Episode:2456 meanR:33.5500 R:51.0 gloss:-14.8738 dloss:4.7970
Episode:2457 meanR:33.8000 R:45.0 gloss:-14.8293 dloss:5.8933
Episode:2458 meanR:33.5600 R:32.0 gloss:-14.7382 dloss:7.6457
Episode:2459 meanR:33.6700 R:28.0 gloss:-14.6102 dloss:5.3304
Episode:2460 meanR:32.7100 R:20.0 gloss:-14.5933 dloss:3.5432
Episode:

Episode:2577 meanR:41.7000 R:82.0 gloss:-18.4292 dloss:4.6744
Episode:2578 meanR:41.5300 R:24.0 gloss:-18.4255 dloss:6.3216
Episode:2579 meanR:41.7600 R:51.0 gloss:-18.2072 dloss:5.0220
Episode:2580 meanR:41.8900 R:36.0 gloss:-18.0681 dloss:5.4720
Episode:2581 meanR:42.4700 R:75.0 gloss:-18.0977 dloss:4.0572
Episode:2582 meanR:42.4400 R:20.0 gloss:-18.0189 dloss:2.9527
Episode:2583 meanR:41.6600 R:20.0 gloss:-18.0158 dloss:2.1645
Episode:2584 meanR:42.8900 R:151.0 gloss:-18.0555 dloss:2.5148
Episode:2585 meanR:42.7200 R:33.0 gloss:-18.0443 dloss:2.2425
Episode:2586 meanR:43.5800 R:130.0 gloss:-18.2467 dloss:2.3987
Episode:2587 meanR:44.6700 R:128.0 gloss:-18.4196 dloss:2.4647
Episode:2588 meanR:44.6800 R:39.0 gloss:-18.5157 dloss:2.6770
Episode:2589 meanR:45.0200 R:57.0 gloss:-18.5278 dloss:2.4947
Episode:2590 meanR:44.9700 R:33.0 gloss:-18.5342 dloss:3.1000
Episode:2591 meanR:44.8600 R:27.0 gloss:-18.5835 dloss:2.6002
Episode:2592 meanR:44.8300 R:25.0 gloss:-18.5943 dloss:2.3879
Episo

Episode:2709 meanR:77.4900 R:52.0 gloss:-39.3628 dloss:25.6365
Episode:2710 meanR:77.5000 R:49.0 gloss:-39.6303 dloss:24.6112
Episode:2711 meanR:77.4700 R:28.0 gloss:-39.7139 dloss:23.0166
Episode:2712 meanR:77.6400 R:46.0 gloss:-39.7911 dloss:22.3146
Episode:2713 meanR:77.4300 R:32.0 gloss:-39.8410 dloss:22.8969
Episode:2714 meanR:75.1300 R:59.0 gloss:-39.9036 dloss:18.1429
Episode:2715 meanR:74.8100 R:34.0 gloss:-40.1314 dloss:18.4316
Episode:2716 meanR:74.5700 R:28.0 gloss:-39.8460 dloss:18.3898
Episode:2717 meanR:74.6500 R:50.0 gloss:-39.6986 dloss:17.6034
Episode:2718 meanR:74.5100 R:28.0 gloss:-39.4472 dloss:17.6669
Episode:2719 meanR:74.6400 R:38.0 gloss:-39.3456 dloss:15.4312
Episode:2720 meanR:73.1100 R:51.0 gloss:-39.3491 dloss:14.9301
Episode:2721 meanR:72.5700 R:31.0 gloss:-39.3155 dloss:15.1476
Episode:2722 meanR:72.0800 R:51.0 gloss:-39.1926 dloss:16.6204
Episode:2723 meanR:71.7400 R:59.0 gloss:-38.8600 dloss:15.6714
Episode:2724 meanR:71.4200 R:66.0 gloss:-38.4290 dloss:

Episode:2839 meanR:96.7600 R:38.0 gloss:-43.2701 dloss:29.2770
Episode:2840 meanR:97.8900 R:144.0 gloss:-43.1096 dloss:17.4793
Episode:2841 meanR:99.3600 R:181.0 gloss:-42.6584 dloss:17.6863
Episode:2842 meanR:100.3400 R:134.0 gloss:-42.4904 dloss:16.7444
Episode:2843 meanR:101.3700 R:148.0 gloss:-42.6024 dloss:18.0062
Episode:2844 meanR:101.5800 R:57.0 gloss:-42.6967 dloss:19.5212
Episode:2845 meanR:104.0000 R:275.0 gloss:-42.7317 dloss:18.4127
Episode:2846 meanR:104.8300 R:138.0 gloss:-43.0508 dloss:22.2482
Episode:2847 meanR:105.9100 R:138.0 gloss:-42.9009 dloss:36.9535
Episode:2848 meanR:106.0300 R:46.0 gloss:-42.9125 dloss:28.7620
Episode:2849 meanR:107.5500 R:181.0 gloss:-42.8014 dloss:19.1937
Episode:2850 meanR:108.6200 R:141.0 gloss:-43.2000 dloss:19.4536
Episode:2851 meanR:109.9000 R:190.0 gloss:-43.5155 dloss:18.8623
Episode:2852 meanR:110.8900 R:147.0 gloss:-44.1430 dloss:23.2656
Episode:2853 meanR:111.8300 R:182.0 gloss:-44.8132 dloss:29.5785
Episode:2854 meanR:111.8100 R:6

Episode:2966 meanR:101.8800 R:40.0 gloss:-51.5884 dloss:36.9403
Episode:2967 meanR:100.9200 R:36.0 gloss:-51.2769 dloss:31.5281
Episode:2968 meanR:100.3500 R:69.0 gloss:-51.4275 dloss:32.5606
Episode:2969 meanR:100.6500 R:102.0 gloss:-52.2751 dloss:31.2604
Episode:2970 meanR:100.9800 R:107.0 gloss:-52.7267 dloss:34.6034
Episode:2971 meanR:100.7600 R:96.0 gloss:-53.4291 dloss:43.0932
Episode:2972 meanR:99.7500 R:56.0 gloss:-52.8252 dloss:34.1290
Episode:2973 meanR:99.9100 R:83.0 gloss:-53.2620 dloss:36.6659
Episode:2974 meanR:99.1100 R:42.0 gloss:-53.5178 dloss:34.0378
Episode:2975 meanR:98.3900 R:72.0 gloss:-53.1958 dloss:35.6965
Episode:2976 meanR:98.3000 R:115.0 gloss:-53.3874 dloss:35.5821
Episode:2977 meanR:97.5400 R:38.0 gloss:-51.3117 dloss:81.0448
Episode:2978 meanR:96.7800 R:80.0 gloss:-50.0538 dloss:827.4852
Episode:2979 meanR:95.7800 R:54.0 gloss:-48.9562 dloss:110.5291
Episode:2980 meanR:95.2900 R:95.0 gloss:-49.4552 dloss:48.9472
Episode:2981 meanR:94.8000 R:67.0 gloss:-50.

Episode:3093 meanR:157.2500 R:98.0 gloss:-73.6350 dloss:65.8447
Episode:3094 meanR:157.3400 R:166.0 gloss:-74.9116 dloss:100.7309
Episode:3095 meanR:157.7400 R:160.0 gloss:-74.9900 dloss:40.8755
Episode:3096 meanR:158.0200 R:179.0 gloss:-76.2629 dloss:45.0906
Episode:3097 meanR:158.8000 R:195.0 gloss:-75.1278 dloss:81.5024
Episode:3098 meanR:159.2900 R:152.0 gloss:-75.3611 dloss:72.6309
Episode:3099 meanR:160.3500 R:231.0 gloss:-74.4627 dloss:105.0904
Episode:3100 meanR:160.8800 R:196.0 gloss:-74.7949 dloss:48.5236
Episode:3101 meanR:161.4600 R:201.0 gloss:-75.1927 dloss:41.6455
Episode:3102 meanR:161.6800 R:215.0 gloss:-76.0158 dloss:39.2584
Episode:3103 meanR:162.0800 R:166.0 gloss:-75.5882 dloss:43.6208
Episode:3104 meanR:161.9100 R:163.0 gloss:-75.4009 dloss:50.2323
Episode:3105 meanR:162.6800 R:133.0 gloss:-75.8242 dloss:71.6874
Episode:3106 meanR:163.7400 R:177.0 gloss:-75.7083 dloss:147.3405
Episode:3107 meanR:164.8900 R:201.0 gloss:-75.2362 dloss:39.8177
Episode:3108 meanR:164.

Episode:3220 meanR:160.4700 R:250.0 gloss:-88.0563 dloss:177.9822
Episode:3221 meanR:160.5600 R:230.0 gloss:-88.8154 dloss:56.9460
Episode:3222 meanR:161.1600 R:202.0 gloss:-89.2633 dloss:49.5359
Episode:3223 meanR:161.5300 R:215.0 gloss:-89.1690 dloss:46.5201
Episode:3224 meanR:162.3800 R:228.0 gloss:-87.8698 dloss:54.2086
Episode:3225 meanR:163.3800 R:256.0 gloss:-87.5432 dloss:53.3582
Episode:3226 meanR:164.2700 R:243.0 gloss:-88.9456 dloss:53.9564
Episode:3227 meanR:164.8500 R:252.0 gloss:-84.6479 dloss:115.3546
Episode:3228 meanR:165.3300 R:197.0 gloss:-82.4241 dloss:133.1711
Episode:3229 meanR:167.2600 R:336.0 gloss:-85.7769 dloss:110.5245
Episode:3230 meanR:167.7600 R:166.0 gloss:-83.5866 dloss:497.0826
Episode:3231 meanR:168.9000 R:233.0 gloss:-86.8240 dloss:69.4955
Episode:3232 meanR:171.2300 R:311.0 gloss:-87.7770 dloss:60.1147
Episode:3233 meanR:172.1900 R:254.0 gloss:-88.1178 dloss:54.3407
Episode:3234 meanR:174.0500 R:273.0 gloss:-89.1510 dloss:59.2942
Episode:3235 meanR:1

Episode:3346 meanR:170.2900 R:314.0 gloss:-95.0741 dloss:101.2938
Episode:3347 meanR:169.5500 R:89.0 gloss:-94.5968 dloss:77.6670
Episode:3348 meanR:168.4500 R:71.0 gloss:-94.3366 dloss:78.8544
Episode:3349 meanR:166.2700 R:62.0 gloss:-93.7601 dloss:82.7041
Episode:3350 meanR:164.2500 R:68.0 gloss:-93.5095 dloss:78.5225
Episode:3351 meanR:161.8800 R:62.0 gloss:-93.8960 dloss:77.8838
Episode:3352 meanR:160.7200 R:56.0 gloss:-93.2707 dloss:81.4340
Episode:3353 meanR:159.0400 R:68.0 gloss:-91.9941 dloss:88.9231
Episode:3354 meanR:159.9600 R:208.0 gloss:-91.5901 dloss:78.4656
Episode:3355 meanR:160.7000 R:120.0 gloss:-91.0510 dloss:70.4350
Episode:3356 meanR:158.8700 R:84.0 gloss:-90.4029 dloss:75.6131
Episode:3357 meanR:158.1500 R:143.0 gloss:-90.6519 dloss:69.4731
Episode:3358 meanR:156.3700 R:112.0 gloss:-89.4408 dloss:92.6871
Episode:3359 meanR:154.6300 R:108.0 gloss:-89.5063 dloss:78.9702
Episode:3360 meanR:153.6500 R:169.0 gloss:-89.0741 dloss:66.3661
Episode:3361 meanR:152.7500 R:13

Episode:3475 meanR:47.2700 R:83.0 gloss:-76.1359 dloss:57.6333
Episode:3476 meanR:47.6700 R:77.0 gloss:-76.1754 dloss:54.6135
Episode:3477 meanR:47.5400 R:50.0 gloss:-76.4379 dloss:52.4417
Episode:3478 meanR:47.9000 R:56.0 gloss:-76.2780 dloss:57.6028
Episode:3479 meanR:48.0500 R:49.0 gloss:-75.6733 dloss:59.1329
Episode:3480 meanR:48.3500 R:56.0 gloss:-75.7834 dloss:63.4778
Episode:3481 meanR:48.5500 R:39.0 gloss:-77.3655 dloss:74.9870
Episode:3482 meanR:48.2100 R:24.0 gloss:-78.5498 dloss:76.2547
Episode:3483 meanR:48.0700 R:25.0 gloss:-79.4767 dloss:82.9259
Episode:3484 meanR:47.8100 R:10.0 gloss:-80.4168 dloss:94.8679
Episode:3485 meanR:47.7600 R:10.0 gloss:-80.8000 dloss:151.8977
Episode:3486 meanR:47.7200 R:8.0 gloss:-81.3871 dloss:126.6137
Episode:3487 meanR:47.6200 R:10.0 gloss:-81.7134 dloss:147.8592
Episode:3488 meanR:47.5200 R:10.0 gloss:-82.2672 dloss:143.4926
Episode:3489 meanR:47.3700 R:9.0 gloss:-82.6539 dloss:150.5998
Episode:3490 meanR:46.8700 R:8.0 gloss:-83.2515 dlos

Episode:3606 meanR:28.8300 R:9.0 gloss:-71.5893 dloss:44.2498
Episode:3607 meanR:28.8300 R:9.0 gloss:-71.6987 dloss:48.8404
Episode:3608 meanR:28.9800 R:25.0 gloss:-71.7034 dloss:40.0812
Episode:3609 meanR:29.0000 R:12.0 gloss:-71.8713 dloss:38.6249
Episode:3610 meanR:29.0000 R:9.0 gloss:-72.0711 dloss:36.1193
Episode:3611 meanR:29.0500 R:13.0 gloss:-72.1560 dloss:40.1299
Episode:3612 meanR:29.0400 R:11.0 gloss:-72.0722 dloss:39.0407
Episode:3613 meanR:28.9000 R:11.0 gloss:-72.3484 dloss:36.3577
Episode:3614 meanR:28.7700 R:10.0 gloss:-72.2908 dloss:37.6382
Episode:3615 meanR:28.2800 R:9.0 gloss:-72.4239 dloss:36.7500
Episode:3616 meanR:27.9100 R:12.0 gloss:-72.4476 dloss:38.4261
Episode:3617 meanR:27.6500 R:10.0 gloss:-72.4189 dloss:39.8863
Episode:3618 meanR:27.3600 R:10.0 gloss:-72.3465 dloss:40.6353
Episode:3619 meanR:27.1000 R:11.0 gloss:-72.5033 dloss:36.3941
Episode:3620 meanR:26.9900 R:9.0 gloss:-72.3415 dloss:39.2065
Episode:3621 meanR:26.8500 R:9.0 gloss:-72.4456 dloss:40.207

Episode:3737 meanR:11.9100 R:11.0 gloss:-70.0479 dloss:52.1031
Episode:3738 meanR:11.9500 R:15.0 gloss:-70.1091 dloss:44.1194
Episode:3739 meanR:11.9400 R:10.0 gloss:-70.0132 dloss:51.4150
Episode:3740 meanR:11.9500 R:11.0 gloss:-69.9791 dloss:43.4002
Episode:3741 meanR:11.9500 R:9.0 gloss:-69.9749 dloss:36.4378
Episode:3742 meanR:11.9300 R:9.0 gloss:-70.0755 dloss:47.4982
Episode:3743 meanR:11.9300 R:10.0 gloss:-70.1465 dloss:46.8850
Episode:3744 meanR:11.9200 R:11.0 gloss:-70.1925 dloss:36.4485
Episode:3745 meanR:11.9000 R:9.0 gloss:-70.3089 dloss:52.5789
Episode:3746 meanR:11.9300 R:12.0 gloss:-70.2591 dloss:44.0903
Episode:3747 meanR:11.9700 R:15.0 gloss:-70.1388 dloss:41.8442
Episode:3748 meanR:11.9900 R:14.0 gloss:-70.0259 dloss:38.5820
Episode:3749 meanR:11.9900 R:13.0 gloss:-69.9783 dloss:44.1529
Episode:3750 meanR:11.4300 R:12.0 gloss:-69.8460 dloss:39.6553
Episode:3751 meanR:11.4300 R:10.0 gloss:-70.0046 dloss:47.4662
Episode:3752 meanR:11.5100 R:19.0 gloss:-69.9939 dloss:45.

Episode:3868 meanR:14.5200 R:10.0 gloss:-73.0040 dloss:28.6096
Episode:3869 meanR:14.6000 R:19.0 gloss:-73.1219 dloss:29.8507
Episode:3870 meanR:14.5700 R:19.0 gloss:-73.2777 dloss:31.3152
Episode:3871 meanR:14.5900 R:20.0 gloss:-73.4839 dloss:35.0366
Episode:3872 meanR:14.6600 R:17.0 gloss:-73.5005 dloss:36.9986
Episode:3873 meanR:14.7100 R:15.0 gloss:-73.9184 dloss:36.6540
Episode:3874 meanR:14.7400 R:14.0 gloss:-73.9809 dloss:34.8109
Episode:3875 meanR:14.7000 R:11.0 gloss:-74.1592 dloss:36.4700
Episode:3876 meanR:14.7800 R:23.0 gloss:-74.4073 dloss:38.2510
Episode:3877 meanR:14.6600 R:18.0 gloss:-74.6715 dloss:34.6398
Episode:3878 meanR:14.7100 R:17.0 gloss:-74.9191 dloss:29.8716
Episode:3879 meanR:14.7400 R:15.0 gloss:-74.8199 dloss:34.4284
Episode:3880 meanR:14.7900 R:16.0 gloss:-74.5744 dloss:33.3103
Episode:3881 meanR:14.9400 R:26.0 gloss:-74.2240 dloss:33.3470
Episode:3882 meanR:15.0100 R:17.0 gloss:-73.9111 dloss:32.9312
Episode:3883 meanR:15.1000 R:21.0 gloss:-73.5987 dloss:

Episode:3999 meanR:14.2700 R:20.0 gloss:-72.8875 dloss:44.2860
Episode:4000 meanR:14.2600 R:17.0 gloss:-73.1919 dloss:41.7040
Episode:4001 meanR:14.2900 R:18.0 gloss:-73.4782 dloss:41.5895
Episode:4002 meanR:14.2900 R:15.0 gloss:-73.6841 dloss:41.3092
Episode:4003 meanR:14.3300 R:15.0 gloss:-73.8595 dloss:39.8720
Episode:4004 meanR:14.3800 R:17.0 gloss:-73.8455 dloss:37.0709
Episode:4005 meanR:14.4000 R:14.0 gloss:-73.8344 dloss:42.8537
Episode:4006 meanR:14.4400 R:21.0 gloss:-74.0306 dloss:48.0082
Episode:4007 meanR:14.4600 R:16.0 gloss:-74.4056 dloss:49.0992
Episode:4008 meanR:14.4600 R:17.0 gloss:-74.6004 dloss:37.5567
Episode:4009 meanR:14.5000 R:14.0 gloss:-74.6056 dloss:45.3051
Episode:4010 meanR:14.5600 R:17.0 gloss:-74.2714 dloss:41.4315
Episode:4011 meanR:14.6900 R:23.0 gloss:-74.0609 dloss:39.0253
Episode:4012 meanR:14.7600 R:20.0 gloss:-73.8772 dloss:32.0850
Episode:4013 meanR:14.9000 R:25.0 gloss:-74.0512 dloss:37.5125
Episode:4014 meanR:15.0400 R:24.0 gloss:-74.2037 dloss:

Episode:4130 meanR:13.4900 R:11.0 gloss:-71.0713 dloss:25.7714
Episode:4131 meanR:13.3800 R:10.0 gloss:-71.1917 dloss:28.2321
Episode:4132 meanR:13.2500 R:10.0 gloss:-71.2075 dloss:27.8264
Episode:4133 meanR:13.0800 R:12.0 gloss:-71.0849 dloss:26.2357
Episode:4134 meanR:12.9800 R:11.0 gloss:-71.3103 dloss:32.6079
Episode:4135 meanR:12.8700 R:12.0 gloss:-71.3586 dloss:29.2540
Episode:4136 meanR:12.8200 R:10.0 gloss:-71.3146 dloss:30.1146
Episode:4137 meanR:12.7500 R:11.0 gloss:-71.3702 dloss:28.9348
Episode:4138 meanR:12.6600 R:10.0 gloss:-71.3791 dloss:30.3297
Episode:4139 meanR:12.5900 R:10.0 gloss:-71.4062 dloss:27.6707
Episode:4140 meanR:12.5800 R:13.0 gloss:-71.3776 dloss:29.3608
Episode:4141 meanR:12.5800 R:9.0 gloss:-71.6110 dloss:29.7263
Episode:4142 meanR:12.5700 R:10.0 gloss:-71.7313 dloss:26.2434
Episode:4143 meanR:12.5600 R:9.0 gloss:-71.7628 dloss:25.1818
Episode:4144 meanR:12.5400 R:9.0 gloss:-71.7754 dloss:27.8197
Episode:4145 meanR:12.5200 R:10.0 gloss:-71.6964 dloss:24.

Episode:4261 meanR:11.4600 R:13.0 gloss:-69.2919 dloss:26.1880
Episode:4262 meanR:11.4300 R:10.0 gloss:-69.2059 dloss:29.5230
Episode:4263 meanR:11.4400 R:9.0 gloss:-69.3617 dloss:30.4310
Episode:4264 meanR:11.4400 R:12.0 gloss:-69.4158 dloss:33.3656
Episode:4265 meanR:11.4300 R:9.0 gloss:-69.3059 dloss:30.4137
Episode:4266 meanR:11.4300 R:9.0 gloss:-69.4571 dloss:26.8375
Episode:4267 meanR:11.4600 R:16.0 gloss:-69.3382 dloss:30.7320
Episode:4268 meanR:11.4800 R:10.0 gloss:-69.3957 dloss:24.5530
Episode:4269 meanR:11.4800 R:10.0 gloss:-69.5694 dloss:28.4567
Episode:4270 meanR:11.4400 R:11.0 gloss:-69.5210 dloss:33.1705
Episode:4271 meanR:11.4000 R:9.0 gloss:-69.4755 dloss:26.1113
Episode:4272 meanR:11.3600 R:9.0 gloss:-69.4827 dloss:26.9103
Episode:4273 meanR:11.3200 R:10.0 gloss:-69.3354 dloss:25.0228
Episode:4274 meanR:11.3900 R:16.0 gloss:-69.3783 dloss:28.9698
Episode:4275 meanR:11.4200 R:13.0 gloss:-69.4280 dloss:25.1792
Episode:4276 meanR:11.4400 R:11.0 gloss:-69.6559 dloss:23.47

Episode:4392 meanR:13.1800 R:13.0 gloss:-68.5709 dloss:88.6803
Episode:4393 meanR:13.1700 R:15.0 gloss:-68.2770 dloss:79.0692
Episode:4394 meanR:13.2100 R:20.0 gloss:-67.8333 dloss:66.3963
Episode:4395 meanR:13.2000 R:16.0 gloss:-67.5628 dloss:46.7898
Episode:4396 meanR:13.1200 R:8.0 gloss:-67.4963 dloss:42.1032
Episode:4397 meanR:13.0700 R:16.0 gloss:-67.2178 dloss:43.4668
Episode:4398 meanR:13.2200 R:26.0 gloss:-67.1168 dloss:46.7730
Episode:4399 meanR:13.4400 R:34.0 gloss:-67.1709 dloss:40.6664
Episode:4400 meanR:13.5300 R:23.0 gloss:-67.2879 dloss:52.1088
Episode:4401 meanR:13.4700 R:12.0 gloss:-67.3765 dloss:60.9088
Episode:4402 meanR:13.4500 R:9.0 gloss:-67.4810 dloss:46.0971
Episode:4403 meanR:13.4200 R:11.0 gloss:-67.5380 dloss:44.4195
Episode:4404 meanR:13.3600 R:11.0 gloss:-67.6773 dloss:46.9679
Episode:4405 meanR:13.2800 R:11.0 gloss:-67.7248 dloss:43.9014
Episode:4406 meanR:13.2000 R:9.0 gloss:-67.6801 dloss:42.3423
Episode:4407 meanR:13.2000 R:9.0 gloss:-67.7454 dloss:38.5

Episode:4523 meanR:11.7600 R:13.0 gloss:-64.8081 dloss:19.8311
Episode:4524 meanR:11.8100 R:15.0 gloss:-64.7529 dloss:24.3870
Episode:4525 meanR:11.8300 R:11.0 gloss:-64.6112 dloss:24.3375
Episode:4526 meanR:11.8300 R:13.0 gloss:-64.4620 dloss:24.3163
Episode:4527 meanR:11.7800 R:10.0 gloss:-64.5665 dloss:22.0020
Episode:4528 meanR:11.6900 R:11.0 gloss:-64.4845 dloss:28.1966
Episode:4529 meanR:11.6500 R:10.0 gloss:-64.8210 dloss:27.0880
Episode:4530 meanR:11.6700 R:13.0 gloss:-64.9149 dloss:25.2350
Episode:4531 meanR:11.7700 R:19.0 gloss:-65.0665 dloss:28.5576
Episode:4532 meanR:11.7800 R:12.0 gloss:-65.2319 dloss:31.0336
Episode:4533 meanR:11.6800 R:8.0 gloss:-65.1316 dloss:27.6534
Episode:4534 meanR:11.6500 R:8.0 gloss:-65.0383 dloss:24.0890
Episode:4535 meanR:11.6400 R:9.0 gloss:-65.0746 dloss:26.6515
Episode:4536 meanR:11.6700 R:12.0 gloss:-65.2350 dloss:29.7864
Episode:4537 meanR:11.6700 R:9.0 gloss:-65.1753 dloss:38.9890
Episode:4538 meanR:11.6700 R:11.0 gloss:-64.9338 dloss:33.4

In [None]:
# np.mean(loss_batch, axis=0)

## Visualizing training

Below I plot the total reward for each episode in grey, with a 10-episode rolling average drawn on top in blue.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

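def running_mean(x, N):
    # Prepend a zero so each window sum is a difference of two cumulative sums
    cumsum = np.cumsum(np.insert(x, 0, 0))
    # Mean over every full window of N consecutive values
    return (cumsum[N:] - cumsum[:-N]) / N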
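
To see what the cumulative-sum trick in `running_mean` does, here is a small self-contained check. The reward values are made up for illustration, and the comparison against `np.convolve` is only a cross-check, not something the notebook itself uses.

```python
import numpy as np

def running_mean(x, N):
    # Prepend a zero so each window sum is a difference of two cumulative sums
    cumsum = np.cumsum(np.insert(x, 0, 0))
    # Mean over every full window of N consecutive values
    return (cumsum[N:] - cumsum[:-N]) / N

# Made-up episode rewards, for illustration only
rewards = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = running_mean(rewards, 2)

# Cross-check against a direct moving average over full windows
direct = np.convolve(rewards, np.ones(2) / 2, mode='valid')

print(smoothed)
print(len(rewards), len(smoothed))
```

The smoothed array has `len(x) - N + 1` entries because only full windows are averaged. That is why the plotting cells slice the episode axis with `eps[-len(smoothed_arr):]` before plotting the rolling average.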

In [None]:
eps, arr = np.array(episode_rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Loss')

## Testing

Let's check out how our trained agent plays the game.

In [35]:
# Creating a gym env
import gym

# env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')

# Open a session on the training graph and restore the saved weights
with tf.Session(graph=graph) as sess:
    #sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Episodes/epochs
    for _ in range(10):
        state = env.reset()
        total_reward = 0

        # Steps/batches
        while True:
            env.render()
            
            action, _ = act(sess, state)
            
            state, reward, done, _ = env.step(action)
            
            total_reward += reward
            
            if done:
                break
                
        # Report the total reward for this episode
        print('total_reward: {}'.format(total_reward))

# Close the env at the end
env.close()

INFO:tensorflow:Restoring parameters from checkpoints/model.ckpt


total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
total_reward: 500.0
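
The test loop above can be factored into a reusable helper. The sketch below is one way to do that, assuming the classic Gym interface used in this notebook (`reset()` returns a state and `step()` returns a 4-tuple). `run_episode` and `CountdownEnv` are hypothetical names introduced here. The toy environment only stands in for Cart-Pole so that the snippet runs without Gym or a trained model.

```python
import numpy as np

def run_episode(env, policy, max_steps=500):
    """Play one episode with `policy` and return the total reward.

    Assumes the classic Gym API: reset() -> state and
    step(action) -> (state, reward, done, info).
    """
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

class CountdownEnv:
    """Hypothetical toy environment, not part of Gym: every step pays 1.0
    and the episode ends after `length` steps."""
    def __init__(self, length):
        self.length = length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(4)

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.length
        return np.zeros(4), 1.0, done, {}

# Evaluate a trivial policy that always picks action 0
rewards = [run_episode(CountdownEnv(7), policy=lambda state: 0) for _ in range(3)]
print('mean total reward: {}'.format(np.mean(rewards)))
```

With the real environment, the policy would wrap the trained network. Assuming `act` returns the action as its first value, as in the cell above, something like `lambda state: act(sess, state)[0]` could play that role.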


## Extending this

Cart-Pole is a pretty simple game, but the same approach can be used to train an agent to play something much more complicated, like Pong or Space Invaders. Instead of reading a small state vector directly from the environment as we do here, you'd feed the screen images through convolutional layers and let the network learn the state representation from the pixels.

![Deep Q-Learning Atari](assets/atari-network.png)
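
To get a feel for the convolutional front end, the sketch below traces the tensor shapes through three unpadded convolutions. The layer settings (84x84 input with 4 stacked frames, then 8x8/stride 4, 4x4/stride 2, and 3x3/stride 1 filters) are taken from the DQN Nature paper as I recall them, so treat them as an assumption to verify against the paper. `conv_output_size` and `trace_shapes` are hypothetical helper names, and this is shape arithmetic only, not a network implementation.

```python
def conv_output_size(size, kernel, stride):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

# Assumed layer settings: (kernel size, stride, number of filters)
conv_layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

def trace_shapes(input_size=84, input_channels=4, layers=conv_layers):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(input_size, input_size, input_channels)]
    size = input_size
    for kernel, stride, filters in layers:
        size = conv_output_size(size, kernel, stride)
        shapes.append((size, size, filters))
    return shapes

shapes = trace_shapes()
height, width, channels = shapes[-1]
flat_features = height * width * channels

for shape in shapes:
    print(shape)
print('Flattened features fed to the dense layers: {}'.format(flat_features))
```

The flattened output of the last convolution takes the place of the state vector used in this notebook. The fully connected layers on top of it, ending in one Q-value per action, stay conceptually the same.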

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. The original DQN paper is a good place to start: http://www.davidqiu.com:8888/research/nature14236.pdf