# Deep cortical machine learning: Policy gradients + Q-learning + GAN


In this notebook, we'll build a neural network that can learn to play games through reinforcement learning. More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

![Cart-Pole](assets/cart-pole.jpg)

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
# Detect whether TensorFlow can see a GPU; otherwise it runs on the CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.7.1
Default GPU Device: 


>**Note:** Make sure you have OpenAI Gym cloned into the same directory as this notebook. I've included `gym` as a submodule, so you can run `git submodule update --init --recursive` to pull the contents into the `gym` directory.

>**Note:** Once OpenAI Gym is cloned, install it with `pip install -e gym/[all]`.

In [2]:
import gym

## Create the Cart-Pole game environment
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')
# env = gym.make('Acrobot-v1')
# env = gym.make('MountainCar-v0')
# env = gym.make('MountainCarContinuous-v0')
# env = gym.make('Pendulum-v0')
# env = gym.make('Blackjack-v0')
# env = gym.make('FrozenLake-v0')
# env = gym.make('AirRaid-ram-v0')
# env = gym.make('AirRaid-v0')
# env = gym.make('BipedalWalker-v2')
# env = gym.make('Copy-v0')
# env = gym.make('CarRacing-v0')
# env = gym.make('Ant-v2') #mujoco
# env = gym.make('FetchPickAndPlace-v1') # mujoco required!

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.




We interact with the simulation through `env`. To show the simulation running, you can use `env.render()` to render one frame. Passing an action as an integer to `env.step` generates the next step in the simulation. You can see how many actions are possible from `env.action_space`, and you can get a random action with `env.action_space.sample()`. This interface is the same for all Gym environments. In the Cart-Pole game there are two possible actions, moving the cart left or right, encoded as 0 and 1.

Run the code below to watch the simulation run.

In [3]:
env.reset()
batch = []
for _ in range(1111):
    #env.render()
    action = env.action_space.sample()
    state, reward, done, info = env.step(action) # take a random action
    batch.append([action, state, reward, done, info])
    #print('state, action, reward, done, info:', state, action, reward, done, info)
    if done:
        env.reset()

To shut the window showing the simulation, use `env.close()`.

In [4]:
# env.close()

If you ran the simulation above, we can inspect the collected transitions and their rewards:

In [5]:
batch[0][1].shape, state.shape

((4,), (4,))

In [6]:
import numpy as np
actions = np.array([each[0] for each in batch])
states = np.array([each[1] for each in batch])
rewards = np.array([each[2] for each in batch])
dones = np.array([each[3] for each in batch])
infos = np.array([each[4] for each in batch])

In [7]:
# print(rewards[-20:])
print('shapes:', np.array(rewards).shape, np.array(states).shape, np.array(actions).shape, np.array(dones).shape)
print('dtypes:', np.array(rewards).dtype, np.array(states).dtype, np.array(actions).dtype, np.array(dones).dtype)
print('states:', np.max(np.array(states)), np.min(np.array(states)))
print('actions:', np.max(np.array(actions)), np.min(np.array(actions)))
# print((np.max(np.array(actions)) - np.min(np.array(actions)))+1)
print('rewards:', np.max(np.array(rewards)), np.min(np.array(rewards)))

shapes: (1111,) (1111, 4) (1111,) (1111,)
dtypes: float64 float64 int64 bool
states: 2.7618828170398175 -2.7965065300181307
actions: 1 0
rewards: 1.0 1.0


In [8]:
actions[:10]

array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1])

In [9]:
rewards[:10]

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [10]:
# import numpy as np
def sigmoid(x, derivative=False):
    # With derivative=True, x must already be a sigmoid output: s'(z) = s(z) * (1 - s(z))
    return x*(1-x) if derivative else 1/(1+np.exp(-x))

In [11]:
sigmoid(np.max(np.array(rewards))), sigmoid(np.min(np.array(rewards)))

(0.7310585786300049, 0.7310585786300049)
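
The `derivative=True` branch of `sigmoid` returns `x*(1-x)`, which is the sigmoid's derivative only when `x` is already a sigmoid output. The following is a small sketch, using only the standard library and a hypothetical helper name `sigmoid_grad_from_output`, that checks this identity against a finite difference:

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad_from_output(y):
    """Derivative of the sigmoid written in terms of its output y = sigmoid(x)."""
    return y * (1.0 - y)

x = 1.0
y = sigmoid(x)
analytic = sigmoid_grad_from_output(y)

# Central finite difference as an independent check of the derivative at x
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(y, analytic, numeric)
```

The two derivative estimates should agree to several decimal places.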

In [12]:
print('rewards:', np.max(np.array(rewards))/100, np.min(np.array(rewards))/100)

rewards: 0.01 0.01


The game resets after the pole has fallen past a certain angle or the cart has moved too far from the center. For each frame while the simulation is running, it returns a reward of 1.0, so the longer the game runs, the more reward we get. Our network's goal is then to maximize the reward by keeping the pole vertical, which it does by moving the cart to the left and the right.
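
Because every step pays 1.0, an episode's total reward is simply its length. Here is a minimal sketch of how the plain sum relates to a discounted return. The helper `discounted_return` and the discount factor of 0.99 are assumptions for illustration; the notebook does not set a discount factor here.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one episode (gamma is an assumed value)."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total

episode_rewards = [1.0] * 46           # an episode that lasted 46 steps
undiscounted = sum(episode_rewards)     # equals the episode length
discounted = discounted_return(episode_rewards)
print(undiscounted, discounted)
```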

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
$$

where $s$ is a state, $a$ is an action, $r$ is the reward received for taking action $a$ in state $s$, $\gamma$ is the discount factor, and $s'$ is the next state from state $s$ and action $a$.

Previously, we used this equation to learn values for a Q-_table_. For this game, however, there are a huge number of states available. The state has four values: the position and velocity of the cart, and the angle and angular velocity of the pole. These are all real-valued numbers, so ignoring floating-point precision, you have practically infinite states. Instead of using a table, then, we'll replace it with a neural network that approximates the Q-table lookup function.
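
To get a feel for why a table does not scale, here is a rough sketch that cuts each of the four state values into a fixed number of bins and counts the resulting table entries. The helper name and the bin counts are illustrative assumptions, not values used in this notebook.

```python
def q_table_entries(bins_per_dim, state_dims=4, n_actions=2):
    """Number of Q-table cells if each state value is cut into `bins_per_dim` bins."""
    return (bins_per_dim ** state_dims) * n_actions

# The table grows with the fourth power of the resolution
for bins in (10, 100, 1000):
    print(bins, q_table_entries(bins))
```

Even a coarse grid produces a table with far more cells than the agent will ever visit, which is the motivation for a function approximator.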

<img src="assets/deep-q-learning.png" width=450px>

Now our Q-value, $Q(s, a)$, is calculated by passing a state into the network. The network has fully connected hidden layers and outputs one Q-value for each available action.

<img src="assets/q-network.png" width=550px>


As I showed before, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max_{a'} Q(s', a')$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$.

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.
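
The target computation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's training code. The helper names, the discount factor of 0.99, the made-up Q-values, and the common convention of dropping the bootstrap term on terminal steps are all mine.

```python
def td_targets(rewards, next_qs, dones, gamma=0.99):
    """Return r + gamma * max_a' Q(s', a') per transition; terminal steps get just r."""
    targets = []
    for r, q_next, done in zip(rewards, next_qs, dones):
        bootstrap = 0.0 if done else max(q_next)
        targets.append(r + gamma * bootstrap)
    return targets

def squared_error(targets, predicted):
    """Mean of (Q_hat - Q)^2 over the batch."""
    return sum((t - q) ** 2 for t, q in zip(targets, predicted)) / len(targets)

rewards = [1.0, 1.0]
next_qs = [[0.5, 2.0], [3.0, 1.0]]    # made-up network outputs for s'
dones = [False, True]                  # the second transition ended the episode
targets = td_targets(rewards, next_qs, dones)
loss = squared_error(targets, predicted=[2.5, 1.5])
print(targets, loss)
```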

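Below is my implementation. It is split into a generator, which outputs action logits, and a discriminator, which scores state-action pairs. Each uses two fully connected hidden layers with batch normalization and leaky ReLU activations. Two layers seem to be good enough; three might be better. Feel free to try it out.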
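
The activation in the cells below is written as `tf.maximum(alpha * x, x)`, which is a leaky ReLU when `0 < alpha < 1`. A plain-Python sketch (the helper name `leaky_relu` is mine) shows its effect on single values:

```python
def leaky_relu(x, alpha=0.1):
    """max(alpha * x, x): identity for x >= 0, slope `alpha` for x < 0 (0 < alpha < 1)."""
    return max(alpha * x, x)

# Negative inputs are scaled by alpha instead of being zeroed out
print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])
```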

In [13]:
# Data of the model
def model_input(state_size):
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    actions = tf.placeholder(tf.int32, [None], name='actions')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    reward = tf.placeholder(tf.float32, [], name='reward')
    return states, actions, targetQs, reward

In [14]:
# Generator/Controller: Generating/predicting the actions
def generator(states, action_size, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=action_size)        
        #predictions = tf.nn.softmax(logits)

        # return actions logits
        return logits

In [15]:
# Discriminator/Dopamine: Reward function/planner/navigator/advisor/supervisor/cortical columns
def discriminator(states, actions, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('discriminator', reuse=reuse):
        # Fusion/merge states and actions/ SA/ SM
        x_fused = tf.concat(axis=1, values=[states, actions])
        
        # First fully connected layer
        h1 = tf.layers.dense(inputs=x_fused, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=1)        
        #predictions = tf.nn.softmax(logits)

        # return rewards logits
        return logits

In [16]:
def model_loss(action_size, hidden_size, states, actions, targetQs, reward):
    # G
    actions_logits = generator(states=states, hidden_size=hidden_size, action_size=action_size)
    actions_labels = tf.one_hot(indices=actions, depth=action_size, dtype=actions_logits.dtype)
    neg_log_prob_actions = tf.nn.softmax_cross_entropy_with_logits_v2(logits=actions_logits, 
                                                                      labels=actions_labels)
    g_loss = tf.reduce_mean(neg_log_prob_actions[:-1] * targetQs[1:])
    
    # D
    Qs_logits = discriminator(actions=actions_logits, hidden_size=hidden_size, states=states)
    d_lossR = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Qs_logits,
                                                                     labels=reward*tf.ones_like(Qs_logits)))
    d_lossQ = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Qs_logits[:-1],
                                                                     labels=tf.reshape(tf.nn.sigmoid(targetQs[1:]),
                                                                                       shape=[-1, 1])))
    d_loss = d_lossR #+ d_lossQ

    return actions_logits, Qs_logits, g_loss, d_loss, d_lossR, d_lossQ
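
The generator loss above weights each step's negative log-probability of the chosen action by the score fed in for the following step (`neg_log_prob_actions[:-1] * targetQs[1:]`). Here is a plain-Python sketch of that arithmetic, with made-up logits and scores and hypothetical helper names. It assumes an episode of at least two steps.

```python
import math

def neg_log_prob(logits, action):
    """Softmax cross-entropy for one step: -log softmax(logits)[action]."""
    m = max(logits)                                    # subtract max for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[action]

def weighted_pg_loss(step_logits, actions, scores):
    """Mean over t of nlp[t] * scores[t+1]; the last step has no successor and is dropped."""
    nlp = [neg_log_prob(l, a) for l, a in zip(step_logits, actions)]
    pairs = list(zip(nlp[:-1], scores[1:]))
    return sum(n * s for n, s in pairs) / len(pairs)

step_logits = [[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]]   # made-up action logits
actions = [0, 1, 0]                                    # actions actually taken
scores = [0.2, -0.4, 0.3]                              # made-up per-step scores
pg_loss = weighted_pg_loss(step_logits, actions, scores)
print(pg_loss)
```

Because the scores can be negative, this loss can be negative too, which matches the negative `gloss` values in the training log.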

In [17]:
# Optimizing/training/learning G & D
def model_opt(g_loss, d_loss, learning_rate):
    """
    Get optimization operations in order
    :param g_loss: Generator loss Tensor for action prediction
    :param d_loss: Discriminator loss Tensor for reward prediction for generated/prob/logits action
    :param learning_rate: Learning rate for the Adam optimizers
    :return: A tuple of (generator training op, discriminator training op)
    """
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
        g_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)
        d_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)

    return g_opt, d_opt

In [18]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs, self.reward = model_input(state_size=state_size)

        # Create the Model: calculating the loss and forward pass
        self.actions_logits, self.Qs_logits, self.g_loss, self.d_loss, self.d_lossR, self.d_lossQ = model_loss(
            action_size=action_size, hidden_size=hidden_size, # model init parameters
            states=self.states, actions=self.actions, # model input
            targetQs=self.targetQs, reward=self.reward) # model input
        
        # Update the model: backward pass and backprop
        self.g_opt, self.d_opt = model_opt(g_loss=self.g_loss, d_loss=self.d_loss, learning_rate=learning_rate)

## Hyperparameters

One of the more difficult aspects of reinforcement learning is the large number of hyperparameters. Not only are we tuning the network, but we're tuning the simulation.

In [19]:
print('state size:{}'.format(states.shape), 
      'actions:{}'.format(actions.shape)) 
print('action size:{}'.format(np.max(actions) - np.min(actions)+1))

state size:(1111, 4) actions:(1111,)
action size:2


In [20]:
# Training parameters
# Network parameters
state_size = 4               # number of units for the input state/observation -- simulation
action_size = 2              # number of units for the output actions -- simulation
hidden_size = 64             # number of units in each Q-network hidden layer -- simulation
learning_rate = 0.001          # learning rate for adam

In [21]:
# Reset/init the graph/session
tf.reset_default_graph()  # returns None, so fetch the fresh default graph explicitly
graph = tf.get_default_graph()

# Init the model
model = Model(action_size=action_size, hidden_size=hidden_size, state_size=state_size, learning_rate=learning_rate)

## Training the model

Below we'll train our agent. If you want to watch it train, add an `env.render()` call inside the step loop. Rendering slows training down, because frames are drawn more slowly than the network can train, but it's cool to watch the agent get better at the game.
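
The training cell tracks progress with a `deque(maxlen=100)`, so `meanR` in the log is the mean total reward over at most the last 100 episodes. Here is a small sketch of that windowed mean, using a window of 3 and made-up rewards to keep it short:

```python
from collections import deque

window = deque(maxlen=3)           # the notebook uses maxlen=100; 3 keeps the demo short
running_means = []
for total_reward in [10.0, 20.0, 30.0, 40.0]:
    window.append(total_reward)    # the oldest entry is dropped once the window is full
    running_means.append(sum(window) / len(window))
print(running_means)
```

Until the window fills, the mean covers only the episodes seen so far, which is why `meanR` swings widely in the first few log lines.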

In [22]:
# import gym

# ## Create the Cart-Pole game environment
# env = gym.make('CartPole-v0')
# env = gym.make('CartPole-v1')
# env = gym.make('Acrobot-v1')
# # env = gym.make('MountainCar-v0')
# # env = gym.make('Pendulum-v0')
# # env = gym.make('Blackjack-v0')
# # env = gym.make('FrozenLake-v0')
# # env = gym.make('AirRaid-ram-v0')
# # env = gym.make('AirRaid-v0')
# # env = gym.make('BipedalWalker-v2')
# # env = gym.make('Copy-v0')
# # env = gym.make('CarRacing-v0')
# # env = gym.make('Ant-v2') #mujoco
# # env = gym.make('FetchPickAndPlace-v1') # mujoco required!

In [None]:
from collections import deque
episodes_total_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
saver = tf.train.Saver()
rewards_list, g_loss_list, d_loss_list = [], [], []
d_lossR_list, d_lossQ_list = [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Training episodes/epochs
    for ep in range(111111):
        batch = [] # every data batch
        total_reward = 0
        state = env.reset() # env first state

        # Training steps/batches
        while True:
            action_logits, Q_logits = sess.run(fetches=[model.actions_logits, model.Qs_logits], 
                                               feed_dict={model.states: np.reshape(state, [1, -1])})
            action = np.argmax(action_logits)
            batch.append([state, action, Q_logits])
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if done: # episode ended success/failure
                episodes_total_reward.append(total_reward) # stopping criteria
                rate = total_reward/ 500 # success is 500 points
                break

        # Training using batches
        #batch = memory.buffer
        states = np.array([each[0] for each in batch])
        actions = np.array([each[1] for each in batch])
        targetQs = np.array([each[2] for each in batch])
        g_loss, d_loss, d_lossR, d_lossQ, _, _ = sess.run([model.g_loss, model.d_loss, 
                                                           model.d_lossR, model.d_lossQ, 
                                                           model.g_opt, model.d_opt],
                                                          feed_dict = {model.states: states, 
                                                                       model.actions: actions,
                                                                       model.reward: rate,
                                                                       model.targetQs: targetQs.reshape([-1])})
        # Average 100 episode total reward
        # Print out
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episodes_total_reward)),
              'gloss:{:.4f}'.format(g_loss),
              'dloss:{:.4f}'.format(d_loss),
              'dlossR:{:.4f}'.format(d_lossR),
              'dlossQ:{:.4f}'.format(d_lossQ))
        # Plotting out
        rewards_list.append([ep, np.mean(episodes_total_reward)])
        g_loss_list.append([ep, g_loss])
        d_loss_list.append([ep, d_loss])
        d_lossR_list.append([ep, d_lossR])
        d_lossQ_list.append([ep, d_lossQ])
        # Break episode/epoch loop
        if np.mean(episodes_total_reward) >= 500:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model2.ckpt')

Episode:0 meanR:46.0000 gloss:-0.0105 dloss:0.6879 dlossR:0.6879 dlossQ:0.6925
Episode:1 meanR:138.5000 gloss:0.0243 dloss:0.6955 dlossR:0.6955 dlossQ:0.6923
Episode:2 meanR:99.3333 gloss:0.0103 dloss:0.6993 dlossR:0.6993 dlossQ:0.6931
Episode:3 meanR:78.0000 gloss:0.0492 dloss:0.7240 dlossR:0.7240 dlossQ:0.6924
Episode:4 meanR:64.4000 gloss:0.0422 dloss:0.7186 dlossR:0.7186 dlossQ:0.6922
Episode:5 meanR:55.1667 gloss:0.0073 dloss:0.6942 dlossR:0.6942 dlossQ:0.6930
Episode:6 meanR:48.5714 gloss:-0.0253 dloss:0.6712 dlossR:0.6712 dlossQ:0.6929
Episode:7 meanR:43.7500 gloss:-0.0599 dloss:0.6476 dlossR:0.6476 dlossQ:0.6920
Episode:8 meanR:40.0000 gloss:-0.0982 dloss:0.6203 dlossR:0.6203 dlossQ:0.6898
Episode:9 meanR:36.8000 gloss:-0.1347 dloss:0.5985 dlossR:0.5985 dlossQ:0.6872
Episode:10 meanR:34.3636 gloss:-0.1867 dloss:0.5703 dlossR:0.5703 dlossQ:0.6819
Episode:11 meanR:32.1667 gloss:-0.2226 dloss:0.5528 dlossR:0.5528 dlossQ:0.6785
Episode:12 meanR:30.4615 gloss:-0.2882 dloss:0.5231 dl

Episode:105 meanR:30.8700 gloss:-1.7023 dloss:0.6261 dlossR:0.6261 dlossQ:0.2744
Episode:106 meanR:32.4800 gloss:-1.7848 dloss:0.9572 dlossR:0.9572 dlossQ:0.2528
Episode:107 meanR:32.9900 gloss:-1.7318 dloss:0.3853 dlossR:0.3853 dlossQ:0.2653
Episode:108 meanR:33.9800 gloss:-1.7852 dloss:0.6386 dlossR:0.6386 dlossQ:0.2533
Episode:109 meanR:34.4800 gloss:-1.7358 dloss:0.3701 dlossR:0.3701 dlossQ:0.2650
Episode:110 meanR:34.6800 gloss:-1.7164 dloss:0.2292 dlossR:0.2292 dlossQ:0.2704
Episode:111 meanR:34.8000 gloss:-1.7736 dloss:0.1768 dlossR:0.1768 dlossQ:0.2561
Episode:112 meanR:34.8900 gloss:-1.8613 dloss:0.1682 dlossR:0.1682 dlossQ:0.2360
Episode:113 meanR:34.8500 gloss:-1.8814 dloss:0.1575 dlossR:0.1575 dlossQ:0.2318
Episode:114 meanR:34.7000 gloss:-1.9276 dloss:0.1669 dlossR:0.1669 dlossQ:0.2216
Episode:115 meanR:34.3400 gloss:-1.9328 dloss:0.1393 dlossR:0.1393 dlossQ:0.2207
Episode:116 meanR:33.5900 gloss:-2.1017 dloss:0.1177 dlossR:0.1177 dlossQ:0.1883
Episode:117 meanR:31.3300 gl

Episode:209 meanR:47.8800 gloss:-1.9607 dloss:0.1229 dlossR:0.1229 dlossQ:0.2201
Episode:210 meanR:47.6700 gloss:-1.9431 dloss:0.1187 dlossR:0.1187 dlossQ:0.2240
Episode:211 meanR:47.6100 gloss:-1.8661 dloss:0.1468 dlossR:0.1468 dlossQ:0.2410
Episode:212 meanR:47.6300 gloss:-1.5528 dloss:0.1978 dlossR:0.1978 dlossQ:0.3193
Episode:213 meanR:47.7000 gloss:-1.7077 dloss:0.2072 dlossR:0.2072 dlossQ:0.2843
Episode:214 meanR:48.3600 gloss:-1.6612 dloss:0.4991 dlossR:0.4991 dlossQ:0.2924
Episode:215 meanR:48.8200 gloss:-1.5960 dloss:0.3767 dlossR:0.3767 dlossQ:0.3087
Episode:216 meanR:49.4700 gloss:-1.6177 dloss:0.4499 dlossR:0.4499 dlossQ:0.2994
Episode:217 meanR:49.6300 gloss:-1.5549 dloss:0.2263 dlossR:0.2263 dlossQ:0.3139
Episode:218 meanR:49.7000 gloss:-1.5685 dloss:0.1847 dlossR:0.1847 dlossQ:0.3104
Episode:219 meanR:49.8300 gloss:-1.6101 dloss:0.2143 dlossR:0.2143 dlossQ:0.2989
Episode:220 meanR:49.9800 gloss:-1.5808 dloss:0.2252 dlossR:0.2252 dlossQ:0.3069
Episode:221 meanR:49.9400 gl

Episode:321 meanR:37.9400 gloss:-1.8171 dloss:0.1362 dlossR:0.1362 dlossQ:0.2469
Episode:322 meanR:37.9300 gloss:-1.8834 dloss:0.1478 dlossR:0.1478 dlossQ:0.2314
Episode:323 meanR:37.9400 gloss:-1.9400 dloss:0.1454 dlossR:0.1454 dlossQ:0.2194
Episode:324 meanR:37.9900 gloss:-1.9716 dloss:0.1555 dlossR:0.1555 dlossQ:0.2131
Episode:325 meanR:38.0100 gloss:-2.0082 dloss:0.1322 dlossR:0.1322 dlossQ:0.2062
Episode:326 meanR:38.0100 gloss:-2.0714 dloss:0.1300 dlossR:0.1300 dlossQ:0.1940
Episode:327 meanR:38.0300 gloss:-2.0977 dloss:0.1407 dlossR:0.1407 dlossQ:0.1893
Episode:328 meanR:37.9900 gloss:-2.0926 dloss:0.1523 dlossR:0.1523 dlossQ:0.1902
Episode:329 meanR:37.9500 gloss:-2.1001 dloss:0.1347 dlossR:0.1347 dlossQ:0.1892
Episode:330 meanR:38.2900 gloss:-1.9332 dloss:0.4118 dlossR:0.4118 dlossQ:0.2283
Episode:331 meanR:39.8200 gloss:-1.8965 dloss:1.0858 dlossR:1.0858 dlossQ:0.2375
Episode:332 meanR:39.7700 gloss:-2.1225 dloss:0.2576 dlossR:0.2576 dlossQ:0.1879
Episode:333 meanR:39.7100 gl

Episode:434 meanR:28.5200 gloss:-2.3795 dloss:0.1094 dlossR:0.1094 dlossQ:0.1507
Episode:435 meanR:28.3500 gloss:-2.3963 dloss:0.1086 dlossR:0.1086 dlossQ:0.1488
Episode:436 meanR:27.9600 gloss:-2.4515 dloss:0.1085 dlossR:0.1085 dlossQ:0.1446
Episode:437 meanR:27.6500 gloss:-2.5578 dloss:0.1039 dlossR:0.1039 dlossQ:0.1212
Episode:438 meanR:27.2200 gloss:-2.5790 dloss:0.1174 dlossR:0.1174 dlossQ:0.1171
Episode:439 meanR:27.2200 gloss:-2.6764 dloss:0.1034 dlossR:0.1034 dlossQ:0.1077
Episode:440 meanR:27.2500 gloss:-2.7172 dloss:0.1178 dlossR:0.1178 dlossQ:0.1022
Episode:441 meanR:27.2400 gloss:-2.7003 dloss:0.1037 dlossR:0.1037 dlossQ:0.1064
Episode:442 meanR:27.2100 gloss:-2.7002 dloss:0.0969 dlossR:0.0969 dlossQ:0.1088
Episode:443 meanR:26.9600 gloss:-2.5048 dloss:0.1372 dlossR:0.1372 dlossQ:0.1257
Episode:444 meanR:26.5200 gloss:-2.2003 dloss:0.1693 dlossR:0.1693 dlossQ:0.1722
Episode:445 meanR:26.5800 gloss:-2.0520 dloss:0.2112 dlossR:0.2112 dlossQ:0.1990
Episode:446 meanR:27.0100 gl

Episode:538 meanR:26.3300 gloss:-1.6279 dloss:0.4976 dlossR:0.4976 dlossQ:0.2978
Episode:539 meanR:27.2100 gloss:-1.6421 dloss:0.5565 dlossR:0.5565 dlossQ:0.2944
Episode:540 meanR:27.6800 gloss:-1.6640 dloss:0.3704 dlossR:0.3704 dlossQ:0.2870
Episode:541 meanR:29.2500 gloss:-1.8625 dloss:0.9651 dlossR:0.9651 dlossQ:0.2395
Episode:542 meanR:31.1400 gloss:-1.7638 dloss:1.0856 dlossR:1.0856 dlossQ:0.2638
Episode:543 meanR:32.6900 gloss:-1.7199 dloss:0.9260 dlossR:0.9260 dlossQ:0.2743
Episode:544 meanR:34.1000 gloss:-1.6491 dloss:0.8567 dlossR:0.8567 dlossQ:0.2930
Episode:545 meanR:34.7500 gloss:-1.3939 dloss:0.4962 dlossR:0.4962 dlossQ:0.3637
Episode:546 meanR:35.1900 gloss:-1.3190 dloss:0.5468 dlossR:0.5468 dlossQ:0.3865
Episode:547 meanR:35.6100 gloss:-1.4191 dloss:0.5078 dlossR:0.5078 dlossQ:0.3575
Episode:548 meanR:36.5500 gloss:-1.3504 dloss:0.6421 dlossR:0.6421 dlossQ:0.3786
Episode:549 meanR:36.3900 gloss:-1.4846 dloss:0.2114 dlossR:0.2114 dlossQ:0.3406
Episode:550 meanR:36.6400 gl

Episode:642 meanR:51.8600 gloss:-1.6713 dloss:0.1781 dlossR:0.1781 dlossQ:0.2933
Episode:643 meanR:50.4600 gloss:-1.6132 dloss:0.2377 dlossR:0.2377 dlossQ:0.3058
Episode:644 meanR:49.2800 gloss:-1.4234 dloss:0.3013 dlossR:0.3013 dlossQ:0.3585
Episode:645 meanR:48.4600 gloss:-2.0519 dloss:0.1239 dlossR:0.1239 dlossQ:0.2093
Episode:646 meanR:47.8000 gloss:-1.4698 dloss:0.2919 dlossR:0.2919 dlossQ:0.3456
Episode:647 meanR:47.8000 gloss:-1.4130 dloss:0.5084 dlossR:0.5084 dlossQ:0.3601
Episode:648 meanR:47.5700 gloss:-1.5616 dloss:0.5881 dlossR:0.5881 dlossQ:0.3209
Episode:649 meanR:48.6900 gloss:-1.2547 dloss:0.6388 dlossR:0.6388 dlossQ:0.4079
Episode:650 meanR:49.1300 gloss:-1.4596 dloss:0.5511 dlossR:0.5511 dlossQ:0.3469
Episode:651 meanR:48.9100 gloss:-1.2545 dloss:0.4714 dlossR:0.4714 dlossQ:0.4077
Episode:652 meanR:48.7500 gloss:-1.3943 dloss:0.4140 dlossR:0.4140 dlossQ:0.3659
Episode:653 meanR:48.4600 gloss:-1.4744 dloss:0.2592 dlossR:0.2592 dlossQ:0.3452
Episode:654 meanR:52.0500 gl

Episode:746 meanR:67.7800 gloss:-1.2166 dloss:0.2335 dlossR:0.2335 dlossQ:0.4210
Episode:747 meanR:67.0400 gloss:-1.2863 dloss:0.2233 dlossR:0.2233 dlossQ:0.3989
Episode:748 meanR:66.1400 gloss:-1.2846 dloss:0.2130 dlossR:0.2130 dlossQ:0.3993
Episode:749 meanR:64.9800 gloss:-1.3024 dloss:0.2144 dlossR:0.2144 dlossQ:0.3943
Episode:750 meanR:64.1100 gloss:-1.3395 dloss:0.2025 dlossR:0.2025 dlossQ:0.3827
Episode:751 meanR:63.3700 gloss:-1.4499 dloss:0.1810 dlossR:0.1810 dlossQ:0.3502
Episode:752 meanR:62.8600 gloss:-1.4312 dloss:0.2067 dlossR:0.2067 dlossQ:0.3554
Episode:753 meanR:62.7000 gloss:-1.5639 dloss:0.1819 dlossR:0.1819 dlossQ:0.3179
Episode:754 meanR:58.8600 gloss:-1.5237 dloss:0.2024 dlossR:0.2024 dlossQ:0.3298
Episode:755 meanR:57.8700 gloss:-1.5479 dloss:0.1957 dlossR:0.1957 dlossQ:0.3229
Episode:756 meanR:56.4000 gloss:-1.5638 dloss:0.1989 dlossR:0.1989 dlossQ:0.3192
Episode:757 meanR:55.0900 gloss:-1.5604 dloss:0.2255 dlossR:0.2255 dlossQ:0.3206
Episode:758 meanR:54.9800 gl

Episode:851 meanR:61.8900 gloss:-1.4912 dloss:0.2139 dlossR:0.2139 dlossQ:0.3407
Episode:852 meanR:61.9600 gloss:-1.3981 dloss:0.2388 dlossR:0.2388 dlossQ:0.3680
Episode:853 meanR:62.2400 gloss:-1.1775 dloss:0.3238 dlossR:0.3238 dlossQ:0.4335
Episode:854 meanR:64.5300 gloss:-1.0637 dloss:0.9825 dlossR:0.9825 dlossQ:0.4639
Episode:855 meanR:65.6000 gloss:-1.2876 dloss:0.6175 dlossR:0.6175 dlossQ:0.3973
Episode:856 meanR:65.9400 gloss:-1.3314 dloss:0.3487 dlossR:0.3487 dlossQ:0.3842
Episode:857 meanR:66.0800 gloss:-1.3924 dloss:0.2922 dlossR:0.2922 dlossQ:0.3671
Episode:858 meanR:66.1000 gloss:-1.5268 dloss:0.2595 dlossR:0.2595 dlossQ:0.3317
Episode:859 meanR:65.4700 gloss:-1.5764 dloss:0.1927 dlossR:0.1927 dlossQ:0.3162
Episode:860 meanR:64.5300 gloss:-1.6838 dloss:0.1767 dlossR:0.1767 dlossQ:0.2885
Episode:861 meanR:64.1100 gloss:-1.7951 dloss:0.1557 dlossR:0.1557 dlossQ:0.2602
Episode:862 meanR:64.2200 gloss:-1.6907 dloss:0.1859 dlossR:0.1859 dlossQ:0.2871
Episode:863 meanR:64.5800 gl

Episode:955 meanR:74.4200 gloss:-1.3096 dloss:0.2145 dlossR:0.2145 dlossQ:0.3959
Episode:956 meanR:74.1300 gloss:-1.3126 dloss:0.2443 dlossR:0.2443 dlossQ:0.3957
Episode:957 meanR:74.3600 gloss:-1.0952 dloss:0.3915 dlossR:0.3915 dlossQ:0.4605
Episode:958 meanR:75.1900 gloss:-0.9739 dloss:0.5495 dlossR:0.5495 dlossQ:0.4980
Episode:959 meanR:75.9500 gloss:-0.9864 dloss:0.4915 dlossR:0.4915 dlossQ:0.4940
Episode:960 meanR:76.3100 gloss:-1.1095 dloss:0.3581 dlossR:0.3581 dlossQ:0.4542
Episode:961 meanR:76.6100 gloss:-1.1723 dloss:0.3233 dlossR:0.3233 dlossQ:0.4352
Episode:962 meanR:76.7000 gloss:-1.2258 dloss:0.2637 dlossR:0.2637 dlossQ:0.4189
Episode:963 meanR:76.5500 gloss:-1.3035 dloss:0.2658 dlossR:0.2658 dlossQ:0.3963
Episode:964 meanR:76.4400 gloss:-1.3268 dloss:0.2561 dlossR:0.2561 dlossQ:0.3894
Episode:965 meanR:76.5300 gloss:-1.3231 dloss:0.2373 dlossR:0.2373 dlossQ:0.3892
Episode:966 meanR:76.6700 gloss:-1.3668 dloss:0.2450 dlossR:0.2450 dlossQ:0.3761
Episode:967 meanR:76.7900 gl

Episode:1056 meanR:68.7700 gloss:-1.3877 dloss:0.2739 dlossR:0.2739 dlossQ:0.3729
Episode:1057 meanR:68.3600 gloss:-1.6535 dloss:0.2044 dlossR:0.2044 dlossQ:0.3001
Episode:1058 meanR:67.5000 gloss:-1.3709 dloss:0.2570 dlossR:0.2570 dlossQ:0.3767
Episode:1059 meanR:67.1000 gloss:-1.4395 dloss:0.3559 dlossR:0.3559 dlossQ:0.3567
Episode:1060 meanR:67.6600 gloss:-1.4800 dloss:0.5870 dlossR:0.5870 dlossQ:0.3460
Episode:1061 meanR:68.0700 gloss:-1.4015 dloss:0.4800 dlossR:0.4800 dlossQ:0.3701
Episode:1062 meanR:68.1500 gloss:-1.5275 dloss:0.2887 dlossR:0.2887 dlossQ:0.3494
Episode:1063 meanR:68.3500 gloss:-1.4538 dloss:0.3458 dlossR:0.3458 dlossQ:0.3589
Episode:1064 meanR:68.1900 gloss:-2.2420 dloss:0.1429 dlossR:0.1429 dlossQ:0.1804
Episode:1065 meanR:68.0400 gloss:-2.7274 dloss:0.1242 dlossR:0.1242 dlossQ:0.1398
Episode:1066 meanR:68.0400 gloss:-1.5647 dloss:0.2356 dlossR:0.2356 dlossQ:0.3267
Episode:1067 meanR:68.8500 gloss:-1.3052 dloss:0.5545 dlossR:0.5545 dlossQ:0.3940
Episode:1068 mea

Episode:1160 meanR:122.1800 gloss:-0.4766 dloss:0.5885 dlossR:0.5885 dlossQ:0.6375
Episode:1161 meanR:123.3500 gloss:-0.4936 dloss:0.6940 dlossR:0.6940 dlossQ:0.6342
Episode:1162 meanR:127.9800 gloss:-0.3960 dloss:1.0261 dlossR:1.0261 dlossQ:0.6568
Episode:1163 meanR:130.9700 gloss:-0.6490 dloss:0.9972 dlossR:0.9972 dlossQ:0.5986
Episode:1164 meanR:135.1100 gloss:-0.5295 dloss:1.0574 dlossR:1.0574 dlossQ:0.6190
Episode:1165 meanR:136.9600 gloss:-0.5354 dloss:0.7156 dlossR:0.7156 dlossQ:0.6114
Episode:1166 meanR:136.8300 gloss:-0.6857 dloss:0.3573 dlossR:0.3573 dlossQ:0.5842
Episode:1167 meanR:135.8500 gloss:-0.7315 dloss:0.3424 dlossR:0.3424 dlossQ:0.5703
Episode:1168 meanR:135.0300 gloss:-0.9098 dloss:0.2954 dlossR:0.2954 dlossQ:0.5147
Episode:1169 meanR:134.9700 gloss:-0.9586 dloss:0.2861 dlossR:0.2861 dlossQ:0.4999
Episode:1170 meanR:135.2200 gloss:-0.8758 dloss:0.3748 dlossR:0.3748 dlossQ:0.5348
Episode:1171 meanR:136.2300 gloss:-0.5423 dloss:0.5969 dlossR:0.5969 dlossQ:0.6090
Epis

Episode:1259 meanR:107.0400 gloss:-1.6705 dloss:0.2016 dlossR:0.2016 dlossQ:0.3073
Episode:1260 meanR:105.8900 gloss:-1.8838 dloss:0.1585 dlossR:0.1585 dlossQ:0.2540
Episode:1261 meanR:104.0000 gloss:-1.8971 dloss:0.1647 dlossR:0.1647 dlossQ:0.2525
Episode:1262 meanR:99.1800 gloss:-1.8333 dloss:0.1836 dlossR:0.1836 dlossQ:0.2656
Episode:1263 meanR:95.8100 gloss:-1.9154 dloss:0.1589 dlossR:0.1589 dlossQ:0.2460
Episode:1264 meanR:91.8300 gloss:-1.4547 dloss:0.2763 dlossR:0.2763 dlossQ:0.3815
Episode:1265 meanR:90.7300 gloss:-1.1145 dloss:0.4727 dlossR:0.4727 dlossQ:0.4661
Episode:1266 meanR:91.2300 gloss:-1.0851 dloss:0.4044 dlossR:0.4044 dlossQ:0.4765
Episode:1267 meanR:91.4600 gloss:-1.0856 dloss:0.3157 dlossR:0.3157 dlossQ:0.4717
Episode:1268 meanR:91.7800 gloss:-1.0516 dloss:0.3406 dlossR:0.3406 dlossQ:0.4831
Episode:1269 meanR:91.9300 gloss:-1.1499 dloss:0.2717 dlossR:0.2717 dlossQ:0.4496
Episode:1270 meanR:91.7800 gloss:-1.1591 dloss:0.2757 dlossR:0.2757 dlossQ:0.4463
Episode:1271 

Episode:1360 meanR:72.1900 gloss:-1.8531 dloss:0.1633 dlossR:0.1633 dlossQ:0.2606
Episode:1361 meanR:72.5300 gloss:-1.1995 dloss:0.3534 dlossR:0.3534 dlossQ:0.4413
Episode:1362 meanR:77.3500 gloss:-1.0170 dloss:1.6787 dlossR:1.6787 dlossQ:0.4878
Episode:1363 meanR:78.9900 gloss:-0.9724 dloss:0.7207 dlossR:0.7207 dlossQ:0.4993
Episode:1364 meanR:80.1200 gloss:-0.9723 dloss:0.6226 dlossR:0.6226 dlossQ:0.4994
Episode:1365 meanR:79.8000 gloss:-1.0521 dloss:0.3617 dlossR:0.3617 dlossQ:0.4758
Episode:1366 meanR:79.5000 gloss:-1.1445 dloss:0.3003 dlossR:0.3003 dlossQ:0.4506
Episode:1367 meanR:79.4300 gloss:-1.2077 dloss:0.2636 dlossR:0.2636 dlossQ:0.4308
Episode:1368 meanR:79.2800 gloss:-1.1816 dloss:0.2702 dlossR:0.2702 dlossQ:0.4402
Episode:1369 meanR:79.3200 gloss:-1.1755 dloss:0.2727 dlossR:0.2727 dlossQ:0.4396
Episode:1370 meanR:79.5200 gloss:-1.1030 dloss:0.3423 dlossR:0.3423 dlossQ:0.4636
Episode:1371 meanR:79.9500 gloss:-1.0012 dloss:0.4529 dlossR:0.4529 dlossQ:0.4900
Episode:1372 mea

Episode:1466 meanR:99.4300 gloss:-1.0524 dloss:0.2765 dlossR:0.2765 dlossQ:0.4773
Episode:1467 meanR:99.6500 gloss:-1.1088 dloss:0.3471 dlossR:0.3471 dlossQ:0.4555
Episode:1468 meanR:99.9400 gloss:-1.1542 dloss:0.3724 dlossR:0.3724 dlossQ:0.4458
Episode:1469 meanR:100.1300 gloss:-1.1567 dloss:0.3392 dlossR:0.3392 dlossQ:0.4403
Episode:1470 meanR:99.8400 gloss:-1.3264 dloss:0.2230 dlossR:0.2230 dlossQ:0.3926
Episode:1471 meanR:99.2000 gloss:-1.2721 dloss:0.2351 dlossR:0.2351 dlossQ:0.4094
Episode:1472 meanR:97.9500 gloss:-1.4617 dloss:0.2039 dlossR:0.2039 dlossQ:0.3551
Episode:1473 meanR:97.2100 gloss:-1.3984 dloss:0.2367 dlossR:0.2367 dlossQ:0.3684
Episode:1474 meanR:95.8000 gloss:-1.3497 dloss:0.2445 dlossR:0.2445 dlossQ:0.3861
Episode:1475 meanR:94.7100 gloss:-1.3504 dloss:0.2451 dlossR:0.2451 dlossQ:0.4035
Episode:1476 meanR:94.0200 gloss:-1.2597 dloss:0.2841 dlossR:0.2841 dlossQ:0.4240
Episode:1477 meanR:96.2400 gloss:-0.9276 dloss:0.9448 dlossR:0.9448 dlossQ:0.5178

Episode:1572 meanR:76.8700 gloss:-1.3157 dloss:0.4152 dlossR:0.4152 dlossQ:0.4001
Episode:1573 meanR:78.6900 gloss:-1.4357 dloss:0.9906 dlossR:0.9906 dlossQ:0.3697
Episode:1574 meanR:79.1700 gloss:-1.8994 dloss:0.4820 dlossR:0.4820 dlossQ:0.2600
Episode:1575 meanR:79.0800 gloss:-2.8828 dloss:0.1327 dlossR:0.1327 dlossQ:0.1202
Episode:1576 meanR:79.1800 gloss:-2.0606 dloss:0.2795 dlossR:0.2795 dlossQ:0.2150
Episode:1577 meanR:76.7100 gloss:-2.3164 dloss:0.1570 dlossR:0.1570 dlossQ:0.1740
Episode:1578 meanR:76.7800 gloss:-2.1436 dloss:0.2093 dlossR:0.2093 dlossQ:0.2075
Episode:1579 meanR:76.6800 gloss:-2.5413 dloss:0.1344 dlossR:0.1344 dlossQ:0.1455
Episode:1580 meanR:76.6500 gloss:-3.0565 dloss:0.1200 dlossR:0.1200 dlossQ:0.1170
Episode:1581 meanR:75.1900 gloss:-2.4847 dloss:0.1673 dlossR:0.1673 dlossQ:0.1681
Episode:1582 meanR:74.4600 gloss:-2.7885 dloss:0.1351 dlossR:0.1351 dlossQ:0.1215
Episode:1583 meanR:74.5000 gloss:-2.6034 dloss:0.1444 dlossR:0.1444 dlossQ:0.1353

Episode:1672 meanR:77.6300 gloss:-0.9305 dloss:0.7849 dlossR:0.7849 dlossQ:0.5251
Episode:1673 meanR:76.7000 gloss:-1.2568 dloss:0.5986 dlossR:0.5986 dlossQ:0.4471
Episode:1674 meanR:76.2200 gloss:-2.0029 dloss:0.2764 dlossR:0.2764 dlossQ:0.3327
Episode:1675 meanR:76.6500 gloss:-1.2996 dloss:0.3554 dlossR:0.3554 dlossQ:0.4020
Episode:1676 meanR:76.4100 gloss:-1.8417 dloss:0.1524 dlossR:0.1524 dlossQ:0.2541
Episode:1677 meanR:76.4000 gloss:-1.8182 dloss:0.1631 dlossR:0.1631 dlossQ:0.2597
Episode:1678 meanR:76.2900 gloss:-1.9067 dloss:0.1486 dlossR:0.1486 dlossQ:0.2395
Episode:1679 meanR:76.3200 gloss:-1.9225 dloss:0.1603 dlossR:0.1603 dlossQ:0.2394
Episode:1680 meanR:76.3700 gloss:-1.9434 dloss:0.1538 dlossR:0.1538 dlossQ:0.2340
Episode:1681 meanR:76.3600 gloss:-1.9600 dloss:0.1582 dlossR:0.1582 dlossQ:0.2317
Episode:1682 meanR:76.9100 gloss:-1.3002 dloss:0.4056 dlossR:0.4056 dlossQ:0.4030
Episode:1683 meanR:77.9300 gloss:-1.0673 dloss:0.5619 dlossR:0.5619 dlossQ:0.4780

Episode:1773 meanR:159.7500 gloss:-0.2287 dloss:0.6159 dlossR:0.6159 dlossQ:0.6788
Episode:1774 meanR:160.8400 gloss:-0.2300 dloss:0.6336 dlossR:0.6336 dlossQ:0.6779
Episode:1775 meanR:161.4900 gloss:-0.2787 dloss:0.6118 dlossR:0.6118 dlossQ:0.6724
Episode:1776 meanR:162.3300 gloss:-0.3185 dloss:0.5832 dlossR:0.5832 dlossQ:0.6665
Episode:1777 meanR:163.2000 gloss:-0.3243 dloss:0.5866 dlossR:0.5866 dlossQ:0.6650
Episode:1778 meanR:164.5200 gloss:-0.2885 dloss:0.6333 dlossR:0.6333 dlossQ:0.6689
Episode:1779 meanR:165.2300 gloss:-0.3922 dloss:0.5523 dlossR:0.5523 dlossQ:0.6536
Episode:1780 meanR:166.2600 gloss:-0.3942 dloss:0.5875 dlossR:0.5875 dlossQ:0.6525
Episode:1781 meanR:167.0100 gloss:-0.4598 dloss:0.5415 dlossR:0.5415 dlossQ:0.6395
Episode:1782 meanR:167.6000 gloss:-0.4073 dloss:0.6000 dlossR:0.6000 dlossQ:0.6467
Episode:1783 meanR:167.8900 gloss:-0.4329 dloss:0.6215 dlossR:0.6215 dlossQ:0.6415
Episode:1784 meanR:168.2300 gloss:-0.4674 dloss:0.5757 dlossR:0.5757 dlossQ:0.6360

Episode:1875 meanR:122.5700 gloss:-0.9232 dloss:0.5618 dlossR:0.5618 dlossQ:0.5225
Episode:1876 meanR:123.6000 gloss:-0.9767 dloss:0.8040 dlossR:0.8040 dlossQ:0.5058
Episode:1877 meanR:124.2600 gloss:-0.8058 dloss:0.6695 dlossR:0.6695 dlossQ:0.5503
Episode:1878 meanR:124.4400 gloss:-0.8873 dloss:0.6700 dlossR:0.6700 dlossQ:0.5267
Episode:1879 meanR:124.5300 gloss:-0.9487 dloss:0.4949 dlossR:0.4949 dlossQ:0.5076
Episode:1880 meanR:123.7300 gloss:-1.3303 dloss:0.3154 dlossR:0.3154 dlossQ:0.4096
Episode:1881 meanR:123.5100 gloss:-1.4985 dloss:0.4129 dlossR:0.4129 dlossQ:0.3428
Episode:1882 meanR:122.7600 gloss:-1.3586 dloss:0.3615 dlossR:0.3615 dlossQ:0.3982
Episode:1883 meanR:122.2500 gloss:-1.2239 dloss:0.5073 dlossR:0.5073 dlossQ:0.4250
Episode:1884 meanR:121.2100 gloss:-2.2396 dloss:0.1552 dlossR:0.1552 dlossQ:0.2094
Episode:1885 meanR:120.7800 gloss:-0.8295 dloss:0.5184 dlossR:0.5184 dlossQ:0.5446
Episode:1886 meanR:120.3100 gloss:-0.9118 dloss:0.5096 dlossR:0.5096 dlossQ:0.5245

Episode:1979 meanR:69.3600 gloss:-1.5741 dloss:0.3739 dlossR:0.3739 dlossQ:0.3254
Episode:1980 meanR:69.2800 gloss:-1.6848 dloss:0.2414 dlossR:0.2414 dlossQ:0.2985
Episode:1981 meanR:68.7700 gloss:-1.8356 dloss:0.1780 dlossR:0.1780 dlossQ:0.2663
Episode:1982 meanR:68.4800 gloss:-1.7671 dloss:0.2155 dlossR:0.2155 dlossQ:0.2907
Episode:1983 meanR:67.6500 gloss:-2.1338 dloss:0.1527 dlossR:0.1527 dlossQ:0.2239
Episode:1984 meanR:67.6600 gloss:-2.1412 dloss:0.1437 dlossR:0.1437 dlossQ:0.2063
Episode:1985 meanR:66.7600 gloss:-2.2560 dloss:0.1318 dlossR:0.1318 dlossQ:0.1895
Episode:1986 meanR:66.1400 gloss:-1.6326 dloss:0.2683 dlossR:0.2683 dlossQ:0.3140
Episode:1987 meanR:66.1300 gloss:-2.1936 dloss:0.1343 dlossR:0.1343 dlossQ:0.2082
Episode:1988 meanR:65.2900 gloss:-1.9286 dloss:0.1533 dlossR:0.1533 dlossQ:0.2405
Episode:1989 meanR:62.6900 gloss:-1.6207 dloss:0.2036 dlossR:0.2036 dlossQ:0.3115
Episode:1990 meanR:58.0500 gloss:-1.4016 dloss:0.2822 dlossR:0.2822 dlossQ:0.3748

Episode:2081 meanR:59.5700 gloss:-1.1795 dloss:0.4973 dlossR:0.4973 dlossQ:0.4378
Episode:2082 meanR:61.1300 gloss:-0.8321 dloss:0.7047 dlossR:0.7047 dlossQ:0.5387
Episode:2083 meanR:62.9100 gloss:-0.8920 dloss:0.7482 dlossR:0.7482 dlossQ:0.5187
Episode:2084 meanR:63.8200 gloss:-1.4147 dloss:0.5457 dlossR:0.5457 dlossQ:0.3620
Episode:2085 meanR:64.1900 gloss:-1.5597 dloss:0.3252 dlossR:0.3252 dlossQ:0.3347
Episode:2086 meanR:64.1600 gloss:-1.6870 dloss:0.2449 dlossR:0.2449 dlossQ:0.2971
Episode:2087 meanR:64.2400 gloss:-2.0256 dloss:0.1617 dlossR:0.1617 dlossQ:0.2159
Episode:2088 meanR:64.3100 gloss:-1.9476 dloss:0.1815 dlossR:0.1815 dlossQ:0.2327
Episode:2089 meanR:64.2500 gloss:-2.1258 dloss:0.1494 dlossR:0.1494 dlossQ:0.1950
Episode:2090 meanR:64.0600 gloss:-2.0517 dloss:0.1662 dlossR:0.1662 dlossQ:0.2157
Episode:2091 meanR:63.7600 gloss:-2.1051 dloss:0.1592 dlossR:0.1592 dlossQ:0.2051
Episode:2092 meanR:63.4700 gloss:-1.6286 dloss:0.4077 dlossR:0.4077 dlossQ:0.3119

Episode:2183 meanR:63.9300 gloss:-1.1399 dloss:0.3815 dlossR:0.3815 dlossQ:0.4446
Episode:2184 meanR:63.3800 gloss:-1.2672 dloss:0.3266 dlossR:0.3266 dlossQ:0.4054
Episode:2185 meanR:63.2600 gloss:-1.3954 dloss:0.2720 dlossR:0.2720 dlossQ:0.3690
Episode:2186 meanR:63.3300 gloss:-1.3185 dloss:0.2937 dlossR:0.2937 dlossQ:0.3950
Episode:2187 meanR:63.5400 gloss:-1.2584 dloss:0.3014 dlossR:0.3014 dlossQ:0.4148
Episode:2188 meanR:63.5700 gloss:-1.5050 dloss:0.2171 dlossR:0.2171 dlossQ:0.3381
Episode:2189 meanR:63.6000 gloss:-1.5807 dloss:0.1907 dlossR:0.1907 dlossQ:0.3169
Episode:2190 meanR:63.6000 gloss:-1.6110 dloss:0.1836 dlossR:0.1836 dlossQ:0.3080
Episode:2191 meanR:63.6300 gloss:-1.5516 dloss:0.1973 dlossR:0.1973 dlossQ:0.3247
Episode:2192 meanR:63.1800 gloss:-1.4358 dloss:0.2122 dlossR:0.2122 dlossQ:0.3611
Episode:2193 meanR:62.4000 gloss:-1.7809 dloss:0.1644 dlossR:0.1644 dlossQ:0.2668
Episode:2194 meanR:61.9600 gloss:-1.7057 dloss:0.1788 dlossR:0.1788 dlossQ:0.2868

Episode:2286 meanR:104.9200 gloss:-1.0628 dloss:0.3213 dlossR:0.3213 dlossQ:0.4765
Episode:2287 meanR:105.2600 gloss:-0.7174 dloss:0.4567 dlossR:0.4567 dlossQ:0.5771
Episode:2288 meanR:106.6900 gloss:-0.4736 dloss:0.6483 dlossR:0.6483 dlossQ:0.6311
Episode:2289 meanR:109.3600 gloss:-0.3977 dloss:0.7911 dlossR:0.7911 dlossQ:0.6417
Episode:2290 meanR:111.4100 gloss:-0.3409 dloss:0.7067 dlossR:0.7067 dlossQ:0.6559
Episode:2291 meanR:114.8100 gloss:-0.3289 dloss:0.8333 dlossR:0.8333 dlossQ:0.6608
Episode:2292 meanR:117.9600 gloss:-0.6864 dloss:0.9840 dlossR:0.9840 dlossQ:0.5857
Episode:2293 meanR:122.3900 gloss:-0.7052 dloss:1.2443 dlossR:1.2443 dlossQ:0.5801
Episode:2294 meanR:124.8700 gloss:-0.7010 dloss:0.8509 dlossR:0.8509 dlossQ:0.5819
Episode:2295 meanR:126.3800 gloss:-0.6503 dloss:0.6676 dlossR:0.6676 dlossQ:0.5957
Episode:2296 meanR:127.6300 gloss:-0.6394 dloss:0.6404 dlossR:0.6404 dlossQ:0.5988
Episode:2297 meanR:127.4400 gloss:-0.6197 dloss:0.6495 dlossR:0.6495 dlossQ:0.6019

Episode:2386 meanR:176.2800 gloss:-0.0814 dloss:0.6800 dlossR:0.6800 dlossQ:0.6854
Episode:2387 meanR:176.3800 gloss:0.0010 dloss:0.6970 dlossR:0.6970 dlossQ:0.6906
Episode:2388 meanR:175.1800 gloss:-0.2184 dloss:0.5919 dlossR:0.5919 dlossQ:0.6705
Episode:2389 meanR:172.9100 gloss:-0.5224 dloss:0.4803 dlossR:0.4803 dlossQ:0.6269
Episode:2390 meanR:170.7900 gloss:-1.2592 dloss:0.2466 dlossR:0.2466 dlossQ:0.4154
Episode:2391 meanR:167.5300 gloss:-0.6872 dloss:0.3959 dlossR:0.3959 dlossQ:0.5841
Episode:2392 meanR:164.5100 gloss:-0.7459 dloss:0.3787 dlossR:0.3787 dlossQ:0.5674
Episode:2393 meanR:160.0900 gloss:-1.0257 dloss:0.2996 dlossR:0.2996 dlossQ:0.4816
Episode:2394 meanR:157.5300 gloss:-1.5283 dloss:0.2239 dlossR:0.2239 dlossQ:0.3561
Episode:2395 meanR:157.5800 gloss:-0.2101 dloss:0.6734 dlossR:0.6734 dlossQ:0.6758
Episode:2396 meanR:156.0700 gloss:-0.8838 dloss:0.3148 dlossR:0.3148 dlossQ:0.5243
Episode:2397 meanR:154.4800 gloss:-1.2004 dloss:0.2397 dlossR:0.2397 dlossQ:0.4314

Episode:2488 meanR:96.3600 gloss:-0.4594 dloss:0.7118 dlossR:0.7118 dlossQ:0.6409
Episode:2489 meanR:97.0100 gloss:-0.5230 dloss:0.5773 dlossR:0.5773 dlossQ:0.6268
Episode:2490 meanR:97.9900 gloss:-0.6373 dloss:0.5420 dlossR:0.5420 dlossQ:0.5990
Episode:2491 meanR:98.6000 gloss:-0.6910 dloss:0.5085 dlossR:0.5085 dlossQ:0.5858
Episode:2492 meanR:98.7600 gloss:-0.9417 dloss:0.3686 dlossR:0.3686 dlossQ:0.5103
Episode:2493 meanR:99.0800 gloss:-1.1032 dloss:0.3455 dlossR:0.3455 dlossQ:0.4622
Episode:2494 meanR:99.4100 gloss:-1.1897 dloss:0.3199 dlossR:0.3199 dlossQ:0.4370
Episode:2495 meanR:97.9600 gloss:-1.2112 dloss:0.3105 dlossR:0.3105 dlossQ:0.4346
Episode:2496 meanR:97.9800 gloss:-1.5931 dloss:0.1839 dlossR:0.1839 dlossQ:0.3160
Episode:2497 meanR:98.1100 gloss:-1.5072 dloss:0.2219 dlossR:0.2219 dlossQ:0.3462
Episode:2498 meanR:94.1500 gloss:-1.7437 dloss:0.1697 dlossR:0.1697 dlossQ:0.2827
Episode:2499 meanR:91.8900 gloss:-1.7375 dloss:0.1866 dlossR:0.1866 dlossQ:0.2875

Episode:2589 meanR:90.2100 gloss:-0.6704 dloss:0.6233 dlossR:0.6233 dlossQ:0.5911
Episode:2590 meanR:93.3900 gloss:-0.7908 dloss:1.2576 dlossR:1.2576 dlossQ:0.5572
Episode:2591 meanR:93.9600 gloss:-0.5270 dloss:0.6238 dlossR:0.6238 dlossQ:0.6274
Episode:2592 meanR:94.7800 gloss:-0.8902 dloss:0.6000 dlossR:0.6000 dlossQ:0.5266
Episode:2593 meanR:95.3900 gloss:-0.8907 dloss:0.5308 dlossR:0.5308 dlossQ:0.5268
Episode:2594 meanR:95.7800 gloss:-0.9969 dloss:0.4637 dlossR:0.4637 dlossQ:0.4933
Episode:2595 meanR:95.8700 gloss:-1.2872 dloss:0.3260 dlossR:0.3260 dlossQ:0.4014
Episode:2596 meanR:95.8100 gloss:-2.0740 dloss:0.1490 dlossR:0.1490 dlossQ:0.2332
Episode:2597 meanR:96.7200 gloss:-0.7992 dloss:0.5487 dlossR:0.5487 dlossQ:0.5529
Episode:2598 meanR:97.9300 gloss:-0.8612 dloss:0.5961 dlossR:0.5961 dlossQ:0.5381
Episode:2599 meanR:97.8800 gloss:-1.9808 dloss:0.1481 dlossR:0.1481 dlossQ:0.2367
Episode:2600 meanR:97.9100 gloss:-1.8884 dloss:0.1768 dlossR:0.1768 dlossQ:0.2641

Episode:2692 meanR:86.8500 gloss:-0.5927 dloss:0.8783 dlossR:0.8783 dlossQ:0.6110
Episode:2693 meanR:85.8500 gloss:-2.0508 dloss:0.1674 dlossR:0.1674 dlossQ:0.2564
Episode:2694 meanR:85.4900 gloss:-0.9179 dloss:0.3919 dlossR:0.3919 dlossQ:0.5229
Episode:2695 meanR:88.0700 gloss:-0.5419 dloss:0.8587 dlossR:0.8587 dlossQ:0.6229
Episode:2696 meanR:88.7400 gloss:-0.9667 dloss:0.4378 dlossR:0.4378 dlossQ:0.5013
Episode:2697 meanR:87.9700 gloss:-1.2770 dloss:0.2959 dlossR:0.2959 dlossQ:0.4107
Episode:2698 meanR:86.8500 gloss:-1.5054 dloss:0.2159 dlossR:0.2159 dlossQ:0.3426
Episode:2699 meanR:86.9600 gloss:-1.4732 dloss:0.2240 dlossR:0.2240 dlossQ:0.3546
Episode:2700 meanR:87.0400 gloss:-1.3819 dloss:0.2383 dlossR:0.2383 dlossQ:0.3831
Episode:2701 meanR:87.0500 gloss:-1.4025 dloss:0.2228 dlossR:0.2228 dlossQ:0.3763
Episode:2702 meanR:86.0900 gloss:-1.6004 dloss:0.1853 dlossR:0.1853 dlossQ:0.3172
Episode:2703 meanR:83.6300 gloss:-1.5182 dloss:0.2072 dlossR:0.2072 dlossQ:0.3443

Episode:2792 meanR:56.4400 gloss:-0.9460 dloss:0.9087 dlossR:0.9087 dlossQ:0.5065
Episode:2793 meanR:57.4000 gloss:-1.1711 dloss:0.5287 dlossR:0.5287 dlossQ:0.4387
Episode:2794 meanR:57.3700 gloss:-1.4792 dloss:0.3028 dlossR:0.3028 dlossQ:0.3524
Episode:2795 meanR:54.7400 gloss:-1.5625 dloss:0.3036 dlossR:0.3036 dlossQ:0.3337
Episode:2796 meanR:54.3200 gloss:-1.6347 dloss:0.2632 dlossR:0.2632 dlossQ:0.3132
Episode:2797 meanR:54.2800 gloss:-1.5588 dloss:0.2530 dlossR:0.2530 dlossQ:0.3278
Episode:2798 meanR:55.0200 gloss:-1.1702 dloss:0.4956 dlossR:0.4956 dlossQ:0.4396
Episode:2799 meanR:56.2800 gloss:-1.0162 dloss:0.6422 dlossR:0.6422 dlossQ:0.4864
Episode:2800 meanR:58.2500 gloss:-0.8720 dloss:0.8052 dlossR:0.8052 dlossQ:0.5303
Episode:2801 meanR:59.7300 gloss:-0.9254 dloss:0.6812 dlossR:0.6812 dlossQ:0.5133
Episode:2802 meanR:60.8800 gloss:-0.9933 dloss:0.5899 dlossR:0.5899 dlossQ:0.4934
Episode:2803 meanR:61.8700 gloss:-1.0587 dloss:0.5535 dlossR:0.5535 dlossQ:0.4729

Episode:2894 meanR:94.9200 gloss:-0.7464 dloss:0.5513 dlossR:0.5513 dlossQ:0.5674
Episode:2895 meanR:95.9700 gloss:-0.7136 dloss:0.6113 dlossR:0.6113 dlossQ:0.5774
Episode:2896 meanR:96.6000 gloss:-0.7636 dloss:0.5128 dlossR:0.5128 dlossQ:0.5610
Episode:2897 meanR:97.8600 gloss:-0.7112 dloss:0.6410 dlossR:0.6410 dlossQ:0.5749
Episode:2898 meanR:98.2000 gloss:-0.7494 dloss:0.5834 dlossR:0.5834 dlossQ:0.5642
Episode:2899 meanR:97.7000 gloss:-0.7956 dloss:0.5066 dlossR:0.5066 dlossQ:0.5522
Episode:2900 meanR:96.6900 gloss:-0.7965 dloss:0.5539 dlossR:0.5539 dlossQ:0.5520
Episode:2901 meanR:95.8500 gloss:-0.8912 dloss:0.4708 dlossR:0.4708 dlossQ:0.5232
Episode:2902 meanR:95.4900 gloss:-0.9227 dloss:0.4904 dlossR:0.4904 dlossQ:0.5150
Episode:2903 meanR:94.4200 gloss:-1.7858 dloss:0.1650 dlossR:0.1650 dlossQ:0.2864
Episode:2904 meanR:95.4200 gloss:-0.6176 dloss:0.6607 dlossR:0.6607 dlossQ:0.6023
Episode:2905 meanR:94.8700 gloss:-1.9930 dloss:0.1576 dlossR:0.1576 dlossQ:0.2547

Episode:2995 meanR:88.4600 gloss:-0.8045 dloss:0.6882 dlossR:0.6882 dlossQ:0.5514
Episode:2996 meanR:89.8500 gloss:-0.7667 dloss:0.8187 dlossR:0.8187 dlossQ:0.5635
Episode:2997 meanR:89.6900 gloss:-1.0263 dloss:0.6359 dlossR:0.6359 dlossQ:0.4882
Episode:2998 meanR:89.2600 gloss:-1.4269 dloss:0.4901 dlossR:0.4901 dlossQ:0.3705
Episode:2999 meanR:88.4000 gloss:-2.1154 dloss:0.1524 dlossR:0.1524 dlossQ:0.2224
Episode:3000 meanR:88.5600 gloss:-1.0783 dloss:0.6168 dlossR:0.6168 dlossQ:0.4694
Episode:3001 meanR:88.9700 gloss:-0.6973 dloss:0.5700 dlossR:0.5700 dlossQ:0.5813
Episode:3002 meanR:88.5400 gloss:-1.0266 dloss:0.3758 dlossR:0.3758 dlossQ:0.4842
Episode:3003 meanR:88.5700 gloss:-1.7429 dloss:0.1707 dlossR:0.1707 dlossQ:0.2904
Episode:3004 meanR:89.0900 gloss:-0.7586 dloss:0.7867 dlossR:0.7867 dlossQ:0.5635
Episode:3005 meanR:90.6400 gloss:-0.8688 dloss:0.6655 dlossR:0.6655 dlossQ:0.5330
Episode:3006 meanR:89.0400 gloss:-1.3741 dloss:0.2997 dlossR:0.2997 dlossQ:0.3900

Episode:3099 meanR:107.6400 gloss:-0.8067 dloss:0.4336 dlossR:0.4336 dlossQ:0.5489
Episode:3100 meanR:110.3500 gloss:-0.5083 dloss:0.9912 dlossR:0.9912 dlossQ:0.6292
Episode:3101 meanR:110.4900 gloss:-0.5650 dloss:0.6020 dlossR:0.6020 dlossQ:0.6094
Episode:3102 meanR:110.7800 gloss:-1.0031 dloss:0.4513 dlossR:0.4513 dlossQ:0.4890
Episode:3103 meanR:110.8400 gloss:-1.6663 dloss:0.2023 dlossR:0.2023 dlossQ:0.3089
Episode:3104 meanR:108.7100 gloss:-1.7551 dloss:0.1798 dlossR:0.1798 dlossQ:0.2910
Episode:3105 meanR:107.2200 gloss:-1.7526 dloss:0.1833 dlossR:0.1833 dlossQ:0.2878
Episode:3106 meanR:108.4400 gloss:-0.8016 dloss:0.6508 dlossR:0.6508 dlossQ:0.5492
Episode:3107 meanR:110.7200 gloss:-0.6456 dloss:0.8096 dlossR:0.8096 dlossQ:0.5976
Episode:3108 meanR:112.5000 gloss:-0.4865 dloss:0.7285 dlossR:0.7285 dlossQ:0.6332
Episode:3109 meanR:111.9200 gloss:-0.5328 dloss:0.5475 dlossR:0.5475 dlossQ:0.6215
Episode:3110 meanR:110.5700 gloss:-0.7459 dloss:0.5256 dlossR:0.5256 dlossQ:0.5678

Episode:3201 meanR:124.1200 gloss:-1.2529 dloss:0.3374 dlossR:0.3374 dlossQ:0.4138
Episode:3202 meanR:123.7000 gloss:-1.2317 dloss:0.3051 dlossR:0.3051 dlossQ:0.4204
Episode:3203 meanR:123.9600 gloss:-1.3364 dloss:0.3224 dlossR:0.3224 dlossQ:0.3905
Episode:3204 meanR:124.3100 gloss:-1.3614 dloss:0.3328 dlossR:0.3328 dlossQ:0.3835
Episode:3205 meanR:124.5100 gloss:-1.4399 dloss:0.2753 dlossR:0.2753 dlossQ:0.3578
Episode:3206 meanR:123.2100 gloss:-1.4534 dloss:0.2532 dlossR:0.2532 dlossQ:0.3571
Episode:3207 meanR:121.0900 gloss:-1.4314 dloss:0.3104 dlossR:0.3104 dlossQ:0.3629
Episode:3208 meanR:119.2300 gloss:-1.4257 dloss:0.3034 dlossR:0.3034 dlossQ:0.3644
Episode:3209 meanR:118.8000 gloss:-1.2297 dloss:0.3803 dlossR:0.3803 dlossQ:0.4214
Episode:3210 meanR:118.5800 gloss:-0.9628 dloss:0.4634 dlossR:0.4634 dlossQ:0.5026
Episode:3211 meanR:118.8200 gloss:-1.0012 dloss:0.4492 dlossR:0.4492 dlossQ:0.4901
Episode:3212 meanR:119.8500 gloss:-0.8979 dloss:0.5510 dlossR:0.5510 dlossQ:0.5219

Episode:3304 meanR:72.3600 gloss:-1.9372 dloss:0.1982 dlossR:0.1982 dlossQ:0.2529
Episode:3305 meanR:72.1600 gloss:-2.1574 dloss:0.1589 dlossR:0.1589 dlossQ:0.2036
Episode:3306 meanR:72.2900 gloss:-1.7290 dloss:0.3093 dlossR:0.3093 dlossQ:0.2995
Episode:3307 meanR:73.3600 gloss:-0.8798 dloss:0.6337 dlossR:0.6337 dlossQ:0.5300
Episode:3308 meanR:75.7700 gloss:-0.9011 dloss:0.9839 dlossR:0.9839 dlossQ:0.5263
Episode:3309 meanR:75.6300 gloss:-1.3704 dloss:0.3357 dlossR:0.3357 dlossQ:0.3937
Episode:3310 meanR:74.9600 gloss:-2.0010 dloss:0.1735 dlossR:0.1735 dlossQ:0.2451
Episode:3311 meanR:74.6600 gloss:-1.6118 dloss:0.3235 dlossR:0.3235 dlossQ:0.3149
Episode:3312 meanR:75.5300 gloss:-1.1450 dloss:0.8579 dlossR:0.8579 dlossQ:0.4516
Episode:3313 meanR:74.5800 gloss:-1.3325 dloss:0.3954 dlossR:0.3954 dlossQ:0.3977
Episode:3314 meanR:72.8600 gloss:-2.2774 dloss:0.1556 dlossR:0.1556 dlossQ:0.1998
Episode:3315 meanR:71.5100 gloss:-1.5324 dloss:0.2511 dlossR:0.2511 dlossQ:0.3383

Episode:3409 meanR:73.9200 gloss:-0.7965 dloss:1.4305 dlossR:1.4305 dlossQ:0.5590
Episode:3410 meanR:75.5900 gloss:-0.7494 dloss:0.6907 dlossR:0.6907 dlossQ:0.5677
Episode:3411 meanR:76.5400 gloss:-0.8567 dloss:0.6109 dlossR:0.6109 dlossQ:0.5371
Episode:3412 meanR:75.3100 gloss:-0.9550 dloss:0.4490 dlossR:0.4490 dlossQ:0.5061
Episode:3413 meanR:75.1200 gloss:-1.2367 dloss:0.3162 dlossR:0.3162 dlossQ:0.4167
Episode:3414 meanR:75.4400 gloss:-1.5003 dloss:0.3066 dlossR:0.3066 dlossQ:0.3400
Episode:3415 meanR:75.4500 gloss:-1.5605 dloss:0.2485 dlossR:0.2485 dlossQ:0.3241
Episode:3416 meanR:75.4500 gloss:-1.6905 dloss:0.2243 dlossR:0.2243 dlossQ:0.2932
Episode:3417 meanR:75.1800 gloss:-1.8717 dloss:0.1712 dlossR:0.1712 dlossQ:0.2480
Episode:3418 meanR:74.6100 gloss:-1.8907 dloss:0.1595 dlossR:0.1595 dlossQ:0.2456
Episode:3419 meanR:74.5800 gloss:-1.7078 dloss:0.1976 dlossR:0.1976 dlossQ:0.2926
Episode:3420 meanR:74.6500 gloss:-1.9204 dloss:0.1767 dlossR:0.1767 dlossQ:0.2421

Episode:3509 meanR:87.5000 gloss:-0.5630 dloss:0.5611 dlossR:0.5611 dlossQ:0.6160
Episode:3510 meanR:87.2800 gloss:-0.5548 dloss:0.6339 dlossR:0.6339 dlossQ:0.6170
Episode:3511 meanR:86.7900 gloss:-0.4841 dloss:0.5474 dlossR:0.5474 dlossQ:0.6279
Episode:3512 meanR:86.8600 gloss:-0.6269 dloss:0.5022 dlossR:0.5022 dlossQ:0.5993
Episode:3513 meanR:87.8100 gloss:-0.6515 dloss:0.6002 dlossR:0.6002 dlossQ:0.5909
Episode:3514 meanR:88.1100 gloss:-0.8523 dloss:0.4556 dlossR:0.4556 dlossQ:0.5351
Episode:3515 meanR:88.5800 gloss:-1.0081 dloss:0.4529 dlossR:0.4529 dlossQ:0.4909
Episode:3516 meanR:88.9700 gloss:-1.2068 dloss:0.4033 dlossR:0.4033 dlossQ:0.4275
Episode:3517 meanR:88.8900 gloss:-1.9877 dloss:0.1633 dlossR:0.1633 dlossQ:0.2573
Episode:3518 meanR:91.2600 gloss:-0.5904 dloss:0.7917 dlossR:0.7917 dlossQ:0.6087
Episode:3519 meanR:91.1800 gloss:-2.0592 dloss:0.1649 dlossR:0.1649 dlossQ:0.2449
Episode:3520 meanR:91.1100 gloss:-1.8849 dloss:0.1949 dlossR:0.1949 dlossQ:0.2905

Episode:3611 meanR:124.1300 gloss:-1.1940 dloss:0.3342 dlossR:0.3342 dlossQ:0.4444
Episode:3612 meanR:123.5900 gloss:-1.2763 dloss:0.3053 dlossR:0.3053 dlossQ:0.4237
Episode:3613 meanR:122.4200 gloss:-1.5710 dloss:0.2136 dlossR:0.2136 dlossQ:0.3266
Episode:3614 meanR:121.9700 gloss:-1.3911 dloss:0.2682 dlossR:0.2682 dlossQ:0.3852
Episode:3615 meanR:121.4900 gloss:-1.4300 dloss:0.2725 dlossR:0.2725 dlossQ:0.3726
Episode:3616 meanR:121.1300 gloss:-1.4566 dloss:0.2702 dlossR:0.2702 dlossQ:0.3689
Episode:3617 meanR:121.3700 gloss:-1.4693 dloss:0.2825 dlossR:0.2825 dlossQ:0.3663
Episode:3618 meanR:119.2800 gloss:-1.4278 dloss:0.3134 dlossR:0.3134 dlossQ:0.3723
Episode:3619 meanR:119.6600 gloss:-1.3866 dloss:0.3435 dlossR:0.3435 dlossQ:0.3843
Episode:3620 meanR:120.1400 gloss:-1.1779 dloss:0.3761 dlossR:0.3761 dlossQ:0.4397
Episode:3621 meanR:120.3900 gloss:-0.6631 dloss:0.6553 dlossR:0.6553 dlossQ:0.5922
Episode:3622 meanR:121.6300 gloss:-0.5402 dloss:0.5926 dlossR:0.5926 dlossQ:0.6203

Episode:3712 meanR:90.3200 gloss:-1.4136 dloss:0.2359 dlossR:0.2359 dlossQ:0.3833
Episode:3713 meanR:90.3000 gloss:-1.5613 dloss:0.2111 dlossR:0.2111 dlossQ:0.3350
Episode:3714 meanR:93.2100 gloss:-0.8138 dloss:1.0299 dlossR:1.0299 dlossQ:0.5483
Episode:3715 meanR:93.3100 gloss:-1.3796 dloss:0.3125 dlossR:0.3125 dlossQ:0.3878
Episode:3716 meanR:93.1500 gloss:-1.8374 dloss:0.1783 dlossR:0.1783 dlossQ:0.2770
Episode:3717 meanR:92.9600 gloss:-1.8279 dloss:0.1820 dlossR:0.1820 dlossQ:0.2738
Episode:3718 meanR:92.7800 gloss:-1.4793 dloss:0.2417 dlossR:0.2417 dlossQ:0.3571
Episode:3719 meanR:93.4900 gloss:-0.6302 dloss:0.5693 dlossR:0.5693 dlossQ:0.5953
Episode:3720 meanR:96.8800 gloss:-0.5353 dloss:1.0061 dlossR:1.0061 dlossQ:0.6189
Episode:3721 meanR:96.7600 gloss:-0.6168 dloss:0.6334 dlossR:0.6334 dlossQ:0.6016
Episode:3722 meanR:96.6400 gloss:-0.9354 dloss:0.5737 dlossR:0.5737 dlossQ:0.5085
Episode:3723 meanR:93.7500 gloss:-1.6026 dloss:0.3498 dlossR:0.3498 dlossQ:0.3113

Episode:3814 meanR:104.5000 gloss:-0.8695 dloss:0.4704 dlossR:0.4704 dlossQ:0.5296
Episode:3815 meanR:105.1100 gloss:-0.7805 dloss:0.5246 dlossR:0.5246 dlossQ:0.5536
Episode:3816 meanR:106.0500 gloss:-0.7606 dloss:0.5355 dlossR:0.5355 dlossQ:0.5609
Episode:3817 meanR:107.3600 gloss:-0.7285 dloss:0.6196 dlossR:0.6196 dlossQ:0.5686
Episode:3818 meanR:108.7800 gloss:-0.7226 dloss:0.6728 dlossR:0.6728 dlossQ:0.5675
Episode:3819 meanR:109.8200 gloss:-0.6128 dloss:0.7557 dlossR:0.7557 dlossQ:0.5991
Episode:3820 meanR:107.2600 gloss:-0.6588 dloss:0.6063 dlossR:0.6063 dlossQ:0.5879
Episode:3821 meanR:107.3500 gloss:-0.5541 dloss:0.6418 dlossR:0.6418 dlossQ:0.6174
Episode:3822 meanR:109.5500 gloss:-0.4879 dloss:0.8869 dlossR:0.8869 dlossQ:0.6335
Episode:3823 meanR:112.1700 gloss:-0.4388 dloss:0.8294 dlossR:0.8294 dlossQ:0.6438
Episode:3824 meanR:114.9300 gloss:-0.2803 dloss:0.7688 dlossR:0.7688 dlossQ:0.6560
Episode:3825 meanR:116.6700 gloss:-0.3418 dloss:0.6743 dlossR:0.6743 dlossQ:0.6550

Episode:3913 meanR:105.2200 gloss:-0.7260 dloss:1.3526 dlossR:1.3526 dlossQ:0.5743
Episode:3914 meanR:105.2900 gloss:-1.0387 dloss:0.5119 dlossR:0.5119 dlossQ:0.4664
Episode:3915 meanR:104.4400 gloss:-2.0181 dloss:0.1762 dlossR:0.1762 dlossQ:0.2399
Episode:3916 meanR:103.5000 gloss:-2.1213 dloss:0.1613 dlossR:0.1613 dlossQ:0.2169
Episode:3917 meanR:102.2400 gloss:-1.7709 dloss:0.2054 dlossR:0.2054 dlossQ:0.2874
Episode:3918 meanR:101.1100 gloss:-1.1509 dloss:0.3545 dlossR:0.3545 dlossQ:0.4429
Episode:3919 meanR:100.5300 gloss:-0.5361 dloss:0.6540 dlossR:0.6540 dlossQ:0.6141
Episode:3920 meanR:101.5600 gloss:-0.4079 dloss:0.7370 dlossR:0.7370 dlossQ:0.6484
Episode:3921 meanR:104.9000 gloss:-0.4014 dloss:1.0383 dlossR:1.0383 dlossQ:0.6459
Episode:3922 meanR:103.3200 gloss:-0.4657 dloss:0.6667 dlossR:0.6667 dlossQ:0.6371
Episode:3923 meanR:101.4700 gloss:-0.5575 dloss:0.5914 dlossR:0.5914 dlossQ:0.6155
Episode:3924 meanR:99.6100 gloss:-0.7140 dloss:0.5458 dlossR:0.5458 dlossQ:0.5755

Episode:4014 meanR:159.6500 gloss:-0.1041 dloss:0.7009 dlossR:0.7009 dlossQ:0.6891
Episode:4015 meanR:161.0700 gloss:-0.1011 dloss:0.6727 dlossR:0.6727 dlossQ:0.6880
Episode:4016 meanR:163.0300 gloss:-0.1053 dloss:0.6864 dlossR:0.6864 dlossQ:0.6886
Episode:4017 meanR:164.7100 gloss:-0.1018 dloss:0.6798 dlossR:0.6798 dlossQ:0.6888
Episode:4018 meanR:165.1500 gloss:-0.1239 dloss:0.6480 dlossR:0.6480 dlossQ:0.6855
Episode:4019 meanR:164.8500 gloss:-0.1145 dloss:0.6621 dlossR:0.6621 dlossQ:0.6877
Episode:4020 meanR:164.6700 gloss:-0.1050 dloss:0.6907 dlossR:0.6907 dlossQ:0.6889
Episode:4021 meanR:160.7100 gloss:-0.1112 dloss:0.6536 dlossR:0.6536 dlossQ:0.6870
Episode:4022 meanR:161.0200 gloss:-0.1059 dloss:0.6867 dlossR:0.6867 dlossQ:0.6891
Episode:4023 meanR:161.1500 gloss:-0.1210 dloss:0.6629 dlossR:0.6629 dlossQ:0.6873
Episode:4024 meanR:162.2100 gloss:-0.0965 dloss:0.6901 dlossR:0.6901 dlossQ:0.6870
Episode:4025 meanR:163.9800 gloss:-0.1208 dloss:0.6966 dlossR:0.6966 dlossQ:0.6880

Episode:4113 meanR:136.9500 gloss:-0.2547 dloss:0.7092 dlossR:0.7092 dlossQ:0.6744
Episode:4114 meanR:135.5000 gloss:-0.5081 dloss:0.5684 dlossR:0.5684 dlossQ:0.6299
Episode:4115 meanR:134.3200 gloss:-1.1124 dloss:0.3301 dlossR:0.3301 dlossQ:0.4542
Episode:4116 meanR:132.3900 gloss:-1.4367 dloss:0.2297 dlossR:0.2297 dlossQ:0.3639
Episode:4117 meanR:130.6300 gloss:-1.4958 dloss:0.2366 dlossR:0.2366 dlossQ:0.3615
Episode:4118 meanR:129.8000 gloss:-1.3786 dloss:0.2618 dlossR:0.2618 dlossQ:0.3916
Episode:4119 meanR:128.5700 gloss:-1.6621 dloss:0.2283 dlossR:0.2283 dlossQ:0.3264
Episode:4120 meanR:126.4400 gloss:-1.5461 dloss:0.2177 dlossR:0.2177 dlossQ:0.3442
Episode:4121 meanR:125.6500 gloss:-1.4520 dloss:0.2549 dlossR:0.2549 dlossQ:0.3676
Episode:4122 meanR:124.0400 gloss:-0.9774 dloss:0.3885 dlossR:0.3885 dlossQ:0.4973
Episode:4123 meanR:124.0300 gloss:-0.2936 dloss:0.6298 dlossR:0.6298 dlossQ:0.6682
Episode:4124 meanR:126.8900 gloss:-0.2088 dloss:0.8700 dlossR:0.8700 dlossQ:0.6700

Episode:4212 meanR:172.4100 gloss:-1.3152 dloss:0.4049 dlossR:0.4049 dlossQ:0.3906
Episode:4213 meanR:170.0700 gloss:-1.5740 dloss:0.2602 dlossR:0.2602 dlossQ:0.3556
Episode:4214 meanR:169.0300 gloss:-1.6040 dloss:0.2656 dlossR:0.2656 dlossQ:0.3441
Episode:4215 meanR:168.7500 gloss:-1.5929 dloss:0.2586 dlossR:0.2586 dlossQ:0.3460
Episode:4216 meanR:168.7100 gloss:-1.6294 dloss:0.2598 dlossR:0.2598 dlossQ:0.3344
Episode:4217 meanR:168.9100 gloss:-1.4739 dloss:0.3009 dlossR:0.3009 dlossQ:0.3630
Episode:4218 meanR:169.1700 gloss:-1.0877 dloss:0.3726 dlossR:0.3726 dlossQ:0.4735
Episode:4219 meanR:169.8800 gloss:-0.6709 dloss:0.5475 dlossR:0.5475 dlossQ:0.5874
Episode:4220 meanR:170.6500 gloss:-0.6499 dloss:0.5650 dlossR:0.5650 dlossQ:0.5934
Episode:4221 meanR:170.7200 gloss:-1.0392 dloss:0.4256 dlossR:0.4256 dlossQ:0.5124
Episode:4222 meanR:174.5500 gloss:-0.1145 dloss:0.7601 dlossR:0.7601 dlossQ:0.6896
Episode:4223 meanR:173.2400 gloss:-1.4484 dloss:0.2674 dlossR:0.2674 dlossQ:0.3743

Episode:4312 meanR:115.5800 gloss:-0.6343 dloss:0.5636 dlossR:0.5636 dlossQ:0.5977
Episode:4313 meanR:116.7900 gloss:-0.6127 dloss:0.5937 dlossR:0.5937 dlossQ:0.5998
Episode:4314 meanR:118.6500 gloss:-0.4791 dloss:0.6841 dlossR:0.6841 dlossQ:0.6383
Episode:4315 meanR:120.6200 gloss:-0.3962 dloss:0.6919 dlossR:0.6919 dlossQ:0.6512
Episode:4316 meanR:123.6000 gloss:-0.4057 dloss:0.8161 dlossR:0.8161 dlossQ:0.6523
Episode:4317 meanR:126.1800 gloss:-0.2816 dloss:0.7520 dlossR:0.7520 dlossQ:0.6703
Episode:4318 meanR:128.2000 gloss:-0.3326 dloss:0.7239 dlossR:0.7239 dlossQ:0.6612
Episode:4319 meanR:129.2300 gloss:-0.3616 dloss:0.6726 dlossR:0.6726 dlossQ:0.6558
Episode:4320 meanR:129.7300 gloss:-0.4120 dloss:0.6197 dlossR:0.6197 dlossQ:0.6449
Episode:4321 meanR:130.9600 gloss:-0.5176 dloss:0.6329 dlossR:0.6329 dlossQ:0.6256
Episode:4322 meanR:127.5500 gloss:-0.5140 dloss:0.5376 dlossR:0.5376 dlossQ:0.6257
Episode:4323 meanR:128.4800 gloss:-0.6623 dloss:0.5352 dlossR:0.5352 dlossQ:0.5906

Episode:4414 meanR:117.5300 gloss:-0.6276 dloss:0.5553 dlossR:0.5553 dlossQ:0.5960
Episode:4415 meanR:116.9600 gloss:-0.3846 dloss:0.6331 dlossR:0.6331 dlossQ:0.6498
Episode:4416 meanR:117.1900 gloss:-0.0589 dloss:0.7112 dlossR:0.7112 dlossQ:0.6902
Episode:4417 meanR:119.3000 gloss:-0.2477 dloss:0.8896 dlossR:0.8896 dlossQ:0.6797
Episode:4418 meanR:118.8300 gloss:-0.1197 dloss:0.6873 dlossR:0.6873 dlossQ:0.6827
Episode:4419 meanR:118.1900 gloss:-0.3435 dloss:0.6174 dlossR:0.6174 dlossQ:0.6521
Episode:4420 meanR:118.0700 gloss:-0.3982 dloss:0.6062 dlossR:0.6062 dlossQ:0.6499
Episode:4421 meanR:116.9500 gloss:-0.9970 dloss:0.3534 dlossR:0.3534 dlossQ:0.4988
Episode:4422 meanR:116.1400 gloss:-1.6087 dloss:0.2195 dlossR:0.2195 dlossQ:0.3369
Episode:4423 meanR:115.2500 gloss:-1.6087 dloss:0.2101 dlossR:0.2101 dlossQ:0.3307
Episode:4424 meanR:115.4100 gloss:-0.5641 dloss:0.5161 dlossR:0.5161 dlossQ:0.6048
Episode:4425 meanR:115.8100 gloss:-0.3639 dloss:0.5858 dlossR:0.5858 dlossQ:0.6525

Episode:4513 meanR:111.4200 gloss:-0.4042 dloss:0.8718 dlossR:0.8718 dlossQ:0.6518
Episode:4514 meanR:111.9100 gloss:-0.3870 dloss:0.6389 dlossR:0.6389 dlossQ:0.6532
Episode:4515 meanR:111.0800 gloss:-0.4689 dloss:0.5220 dlossR:0.5220 dlossQ:0.6331
Episode:4516 meanR:109.6800 gloss:-0.4037 dloss:0.6798 dlossR:0.6798 dlossQ:0.6523
Episode:4517 meanR:106.3900 gloss:-0.4296 dloss:0.6537 dlossR:0.6537 dlossQ:0.6482
Episode:4518 meanR:105.6500 gloss:-0.3840 dloss:0.6011 dlossR:0.6011 dlossQ:0.6533
Episode:4519 meanR:106.8900 gloss:-0.2731 dloss:0.7171 dlossR:0.7171 dlossQ:0.6709
Episode:4520 meanR:108.5800 gloss:-0.3602 dloss:0.7821 dlossR:0.7821 dlossQ:0.6605
Episode:4521 meanR:111.8200 gloss:-0.4406 dloss:0.8969 dlossR:0.8969 dlossQ:0.6461
Episode:4522 meanR:113.6400 gloss:-0.4594 dloss:0.6819 dlossR:0.6819 dlossQ:0.6400
Episode:4523 meanR:115.1500 gloss:-0.4894 dloss:0.6396 dlossR:0.6396 dlossQ:0.6343
Episode:4524 meanR:115.2700 gloss:-0.7083 dloss:0.4959 dlossR:0.4959 dlossQ:0.5794

Episode:4615 meanR:121.2200 gloss:-0.2795 dloss:0.7168 dlossR:0.7168 dlossQ:0.6704
Episode:4616 meanR:121.1800 gloss:-0.4417 dloss:0.6696 dlossR:0.6696 dlossQ:0.6430
Episode:4617 meanR:120.2500 gloss:-1.0735 dloss:0.5053 dlossR:0.5053 dlossQ:0.4567
Episode:4618 meanR:119.1500 gloss:-2.3317 dloss:0.1984 dlossR:0.1984 dlossQ:0.2303
Episode:4619 meanR:116.8200 gloss:-1.9906 dloss:0.2053 dlossR:0.2053 dlossQ:0.2609
Episode:4620 meanR:114.3400 gloss:-1.2125 dloss:0.3479 dlossR:0.3479 dlossQ:0.4306
Episode:4621 meanR:112.9700 gloss:-0.4078 dloss:0.7190 dlossR:0.7190 dlossQ:0.6486
Episode:4622 meanR:115.3800 gloss:-0.2735 dloss:0.8671 dlossR:0.8671 dlossQ:0.6711
Episode:4623 meanR:114.6700 gloss:-0.4147 dloss:0.5732 dlossR:0.5732 dlossQ:0.6459
Episode:4624 meanR:114.1900 gloss:-1.1509 dloss:0.3500 dlossR:0.3500 dlossQ:0.4694
Episode:4625 meanR:113.8200 gloss:-1.0403 dloss:0.3635 dlossR:0.3635 dlossQ:0.5056
Episode:4626 meanR:114.7400 gloss:-0.4966 dloss:0.6322 dlossR:0.6322 dlossQ:0.6334

Episode:4715 meanR:164.7500 gloss:-0.2622 dloss:0.6707 dlossR:0.6707 dlossQ:0.6667
Episode:4716 meanR:163.0700 gloss:-0.8074 dloss:0.3884 dlossR:0.3884 dlossQ:0.5593
Episode:4717 meanR:165.5700 gloss:-0.2051 dloss:0.7568 dlossR:0.7568 dlossQ:0.6774
Episode:4718 meanR:166.5000 gloss:-0.8501 dloss:0.5252 dlossR:0.5252 dlossQ:0.5343
Episode:4719 meanR:166.5400 gloss:-1.3326 dloss:0.2927 dlossR:0.2927 dlossQ:0.4191
Episode:4720 meanR:166.8400 gloss:-1.0388 dloss:0.4639 dlossR:0.4639 dlossQ:0.4793
Episode:4721 meanR:164.7200 gloss:-1.2593 dloss:0.2915 dlossR:0.2915 dlossQ:0.4347
Episode:4722 meanR:161.0400 gloss:-1.1411 dloss:0.4141 dlossR:0.4141 dlossQ:0.4474
Episode:4723 meanR:160.8500 gloss:-1.2385 dloss:0.4449 dlossR:0.4449 dlossQ:0.4161
Episode:4724 meanR:160.6400 gloss:-1.5136 dloss:0.2508 dlossR:0.2508 dlossQ:0.3665
Episode:4725 meanR:160.4600 gloss:-1.4706 dloss:0.2564 dlossR:0.2564 dlossQ:0.3802
Episode:4726 meanR:159.0900 gloss:-1.9066 dloss:0.2018 dlossR:0.2018 dlossQ:0.2799

Episode:4817 meanR:160.4100 gloss:-0.7555 dloss:0.6454 dlossR:0.6454 dlossQ:0.5632
Episode:4818 meanR:161.2000 gloss:-0.7569 dloss:0.7108 dlossR:0.7108 dlossQ:0.5632
Episode:4819 meanR:162.3300 gloss:-0.9369 dloss:0.6143 dlossR:0.6143 dlossQ:0.5080
Episode:4820 meanR:162.7700 gloss:-1.1157 dloss:0.6053 dlossR:0.6053 dlossQ:0.4574
Episode:4821 meanR:162.7400 gloss:-1.7686 dloss:0.2157 dlossR:0.2157 dlossQ:0.2972
Episode:4822 meanR:162.1500 gloss:-2.0872 dloss:0.1963 dlossR:0.1963 dlossQ:0.2615
Episode:4823 meanR:162.7000 gloss:-0.8818 dloss:0.6135 dlossR:0.6135 dlossQ:0.5262
Episode:4824 meanR:165.5700 gloss:-0.3168 dloss:0.7811 dlossR:0.7811 dlossQ:0.6611
Episode:4825 meanR:167.1200 gloss:-0.2233 dloss:0.6642 dlossR:0.6642 dlossQ:0.6786
Episode:4826 meanR:167.1000 gloss:-2.0368 dloss:0.2115 dlossR:0.2115 dlossQ:0.2808
Episode:4827 meanR:165.8500 gloss:-2.3671 dloss:0.1700 dlossR:0.1700 dlossQ:0.2144
Episode:4828 meanR:166.2800 gloss:-1.3852 dloss:0.4315 dlossR:0.4315 dlossQ:0.3755

## Visualizing training

Below I'll plot the total reward for each episode in grey, with a 10-episode rolling average on top in blue. The generator and discriminator losses get the same treatment.

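In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def running_mean(x, N):
    # Rolling average over a window of N points, computed from differences
    # of cumulative sums. Returns len(x) - N + 1 values.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N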
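
To see what `running_mean` returns, here is a small standalone check on made-up numbers (the `rewards` values are illustrative, not taken from the training run). It shows that the result has `len(x) - N + 1` entries, which is why the plots below index the episode axis with `eps[-len(smoothed_arr):]`.

```python
import numpy as np

def running_mean(x, N):
    # Same cumulative-sum trick as in the notebook cell.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

# Made-up episode rewards, for illustration only
rewards = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

smoothed = running_mean(rewards, 3)
print(smoothed)       # [20. 30. 40.]
print(len(smoothed))  # 3, i.e. len(rewards) - 3 + 1
```

Each output value is the mean of one full window, so the first `N - 1` episodes have no smoothed counterpart.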

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(g_loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('G losses')

In [None]:
eps, arr = np.array(d_loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('D losses')

## Testing

Let's check out how our trained agent plays the game.

In [28]:
import gym
env = gym.make('CartPole-v0')  # immediately replaced by the v1 environment below
env = gym.make('CartPole-v1')  # episodes are capped at 500 steps
# env = gym.make('Acrobot-v1')
# env = gym.make('MountainCar-v0')
# env = gym.make('Pendulum-v0')
# env = gym.make('Blackjack-v0')
# env = gym.make('FrozenLake-v0')
# env = gym.make('AirRaid-ram-v0')
# env = gym.make('AirRaid-v0')
# env = gym.make('BipedalWalker-v2')
# env = gym.make('Copy-v0')
# env = gym.make('CarRacing-v0')
# env = gym.make('Ant-v2') #mujoco
# env = gym.make('FetchPickAndPlace-v1') # mujoco required!

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Episodes/epochs
    for _ in range(1):
    #while True:
        state = env.reset()
        total_reward = 0

        # Steps/batches
        #for _ in range(111111111111111111):
        while True:
            env.render()
            action_logits = sess.run(model.actions_logits, feed_dict={model.states: np.reshape(state, [1, -1])})
            action = np.argmax(action_logits)
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if done:
                break
        # Report the episode return; 500 is the maximum for CartPole-v1,
        # so stop as soon as the agent reaches it
        print('total_reward: {}'.format(total_reward))
        if total_reward == 500:
            break
                
# Closing the env
env.close()



WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
INFO:tensorflow:Restoring parameters from checkpoints/model2.ckpt
total_reward: 500.0


## Extending this

Cart-Pole is a pretty simple game. However, the same approach can be used to train an agent to play something much more complicated, like Pong or Space Invaders. Instead of the low-dimensional state vector we're using here, though, you'd feed in the screen images and use convolutional layers to extract the state from them.

![Deep Q-Learning Atari](assets/atari-network.png)
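
As a rough guide to the sizes involved, the sketch below traces how a stack of screen frames shrinks through the convolutional layers. The numbers (four stacked 84x84 frames, three unpadded conv layers) are my reading of the DQN paper linked below, not something this notebook implements, and `conv_output_size` is a hypothetical helper written just for this illustration.

```python
def conv_output_size(size, kernel, stride):
    """Spatial size after a convolution with no padding."""
    return (size - kernel) // stride + 1

# (filters, kernel, stride) per conv layer; assumed from the DQN paper
conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

size, channels = 84, 4  # 84x84 preprocessed frames, 4 frames stacked
shapes = [(size, size, channels)]
for filters, kernel, stride in conv_layers:
    size = conv_output_size(size, kernel, stride)
    channels = filters
    shapes.append((size, size, channels))

# Number of features handed to the fully connected layers
flat_features = size * size * channels

print(shapes)         # [(84, 84, 4), (20, 20, 32), (9, 9, 64), (7, 7, 64)]
print(flat_features)  # 3136
```

The flattened features would then feed fully connected layers that output one Q-value per action, playing the same role as the action logits used in the test loop above.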

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. The original paper, "Human-level control through deep reinforcement learning" (Mnih et al., Nature, 2015), will get you started: http://www.davidqiu.com:8888/research/nature14236.pdf