# DPG for BipedalWalker

In this notebook, we'll build a neural network that can learn to play games through reinforcement learning. More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
# In this one we should define and detect GPUs for tensorflow
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.11.0
Default GPU Device: /device:GPU:0


>**Note:** Make sure you have OpenAI Gym cloned into the same directory with this notebook. I've included `gym` as a submodule, so you can run `git submodule --init --recursive` to pull the contents into the `gym` repo.

##### >**Note:** Make sure you have OpenAI Gym cloned. Then run this command `pip install -e gym/[all]`.

In [2]:
import gym
env = gym.make('BipedalWalker-v2')

[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m
[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m


We interact with the simulation through `env`. To show the simulation running, you can use `env.render()` to render one frame. Passing in an action as an integer to `env.step` will generate the next step in the simulation.  You can see how many actions are possible from `env.action_space` and to get a random action you can use `env.action_space.sample()`. This is general to all Gym games. In the Cart-Pole game, there are two possible actions, moving the cart left or right. So there are two actions we can take, encoded as 0 and 1.

Run the code below to watch the simulation run.

In [3]:
env.observation_space, env.action_space

(Box(24,), Box(4,))

In [4]:
state = env.reset()
batch = []
for _ in range(1111):
    #env.render()
    action = env.action_space.sample()
    next_state, reward, done, _ = env.step(action) # take a random action
    batch.append([state, action, next_state, reward, float(done)])
    state = next_state
    if done:
        state = env.reset()

To shut the window showing the simulation, use `env.close()`.

In [5]:
# env.close()

If you ran the simulation above, we can look at the rewards:

In [6]:
import numpy as np
actions = np.array([each[0] for each in batch])
states = np.array([each[1] for each in batch])
rewards = np.array([each[2] for each in batch])
dones = np.array([each[3] for each in batch])
infos = np.array([each[4] for each in batch])

In [7]:
print('states:', np.max(np.array(states)), np.min(np.array(states)))
print('actions:', np.max(np.array(actions)), np.min(np.array(actions)))
print('rewards:', np.max(np.array(rewards)), np.min(np.array(rewards)))

states: 0.99992806 -0.9998551
actions: 2.40696652730306 -2.2228919665018716
rewards: 2.40696652730306 -2.2228919665018716


In [8]:
env.action_space.high, env.action_space.low

(array([1., 1., 1., 1.], dtype=float32),
 array([-1., -1., -1., -1.], dtype=float32))

The game resets after the pole has fallen past a certain angle. For each frame while the simulation is running, it returns a reward of 1.0. The longer the game runs, the more reward we get. Then, our network's goal is to maximize the reward by keeping the pole vertical. It will do this by moving the cart to the left and the right.

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max{Q(s', a')}
$$

where $s$ is a state, $a$ is an action, and $s'$ is the next state from state $s$ and action $a$.

Before we used this equation to learn values for a Q-_table_. However, for this game there are a huge number of states available. The state has four values: the position and velocity of the cart, and the position and velocity of the pole. These are all real-valued numbers, so ignoring floating point precisions, you practically have infinite states. Instead of using a table then, we'll replace it with a neural network that will approximate the Q-table lookup function.

<img src="assets/deep-q-learning.png" width=450px>

Now, our Q value, $Q(s, a)$ is calculated by passing in a state to the network. The output will be Q-values for each available action, with fully connected hidden layers.

<img src="assets/q-network.png" width=550px>


As I showed before, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max{Q(s', a')}$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$. 

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.

Below is my implementation of the Q-network. I used two fully connected layers with ReLU activations. Two seems to be good enough, three might be better. Feel free to try it out.

In [13]:
def model_input(state_size, action_size):
    #states = tf.placeholder(tf.float32, [None, *state_shape], name='states')
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    actions = tf.placeholder(tf.float32, [None, action_size], name='actions')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    rewards = tf.placeholder(tf.float32, [None], name='rewards')
    training = tf.placeholder(tf.bool, [], name='training')
    return states, actions, targetQs, rewards, training

In [14]:
# Generator/Controller: Generating/prediting the actions
def generator(states, action_size, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=action_size)        
        predictions = tf.nn.tanh(logits) # [-1, +1]

        # return actions logits
        return predictions

In [15]:
# Discriminator/Dopamine: Reward function/planner/naviator/advisor/supervisor/cortical columns
def discriminator(states, actions, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('discriminator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=action_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        fused = tf.concat(axis=1, values=[nl1, actions])
        h2 = tf.layers.dense(inputs=fused, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
                
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=1)        
        #predictions = tf.nn.softmax(logits)

        # return rewards logits
        return logits

In [16]:
# # DDPG
# def model_loss(action_size, hidden_size, states, actions, targetQs, rates):
#     actions_preds = generator(states=states, hidden_size=hidden_size, action_size=action_size)
#     gQs = discriminator(actions=actions_preds, hidden_size=hidden_size, states=states) # nextQs/targetQs
#     gloss = -tf.reduce_mean(gQs)
#     dQs = discriminator(actions=actions, hidden_size=hidden_size, states=states, reuse=True) # Qs
#     targetQs = tf.reshape(targetQs, shape=[-1, 1]) # gQs
#     dloss = tf.reduce_mean(tf.square(dQs - targetQs)) # DQN
#     rates = tf.reshape(rates, shape=[-1, 1]) # [-1,+1]
#     dloss += tf.reduce_mean(tf.square(tf.nn.tanh(dQs) - rates)) # DQN
#     return actions_preds, gQs, gloss, dloss

In [17]:
# Adverseial Q-learning
def model_loss(action_size, hidden_size, states, actions, targetQs, rewards, training):
    actions_preds = generator(states=states, hidden_size=hidden_size, action_size=action_size, training=training)
    gQs = discriminator(actions=actions_preds, hidden_size=hidden_size, states=states, training=training) #nextQs
    dQs = discriminator(actions=actions, hidden_size=hidden_size, states=states, training=training, reuse=True)#Qs
    targetQs = tf.reshape(targetQs, shape=[-1, 1]) # gQs
    gloss = tf.reduce_mean(tf.square(gQs - targetQs)) # DQN
    dloss = tf.reduce_mean(tf.square(dQs - targetQs)) # DQN
    rewards = tf.reshape(rewards, shape=[-1, 1]) # gQs
    dloss += tf.reduce_mean(tf.square(dQs - rewards)) # DQN
    return actions_preds, gQs, gloss, dloss

In [18]:
# Optimizating/training/learning G & D
def model_opt(g_loss, d_loss, g_learning_rate, d_learning_rate):
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
        g_opt = tf.train.AdamOptimizer(g_learning_rate).minimize(g_loss, var_list=g_vars)
        d_opt = tf.train.AdamOptimizer(d_learning_rate).minimize(d_loss, var_list=d_vars)
    return g_opt, d_opt

In [19]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, g_learning_rate, d_learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs, self.rewards, self.training = model_input(
            state_size=state_size, action_size=action_size)

        # Create the Model: calculating the loss and forwad pass
        self.actions_preds, self.Qs_logits, self.g_loss, self.d_loss = model_loss(
            action_size=action_size, hidden_size=hidden_size, # model init parameters
            states=self.states, actions=self.actions, targetQs=self.targetQs, 
            rewards=self.rewards, training=self.training) # model input
        
        # Update the model: backward pass and backprop
        self.g_opt, self.d_opt = model_opt(g_loss=self.g_loss, 
                                           d_loss=self.d_loss,
                                           g_learning_rate=g_learning_rate, 
                                           d_learning_rate=d_learning_rate)

In [20]:
from collections import deque
class Memory():
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), 
                               size=batch_size, 
                               replace=False)
        return [self.buffer[ii] for ii in idx]

## Hyperparameters

One of the more difficult aspects of reinforcememt learning are the large number of hyperparameters. Not only are we tuning the network, but we're tuning the simulation.

In [21]:
env.observation_space, env.action_space

(Box(24,), Box(4,))

In [22]:
# Exploration parameters
explore_start = 1.0            # exploration probability at start
explore_stop = 0.01           # minimum exploration probability 
decay_rate = 0.0001            # exponential decay rate for exploration prob

# Network parameters
state_size = 24
action_size = 4
hidden_size = 24*2             # number of units in each Q-network hidden layer
g_learning_rate = 1e-4         # Q-network learning rate
d_learning_rate = 1e-4         # Q-network learning rate

# Memory parameters
memory_size = int(1e5)            # memory capacity
batch_size = int(1e2)             # experience mini-batch size == one episode size is 1000/int(1e3) steps
gamma = 0.99                   # future reward discount

In [23]:
# Reset/init the graph/session
graph = tf.reset_default_graph()

# Init the model
model = Model(action_size=action_size, state_size=state_size, hidden_size=hidden_size,
              g_learning_rate=g_learning_rate, d_learning_rate=d_learning_rate)

# Init the memory
memory = Memory(max_size=memory_size)

In [24]:
env.action_space.high, env.action_space.low, env.action_space.shape, \
env.reward_range, env.action_space

(array([1., 1., 1., 1.], dtype=float32),
 array([-1., -1., -1., -1.], dtype=float32),
 (4,),
 (-inf, inf),
 Box(4,))

In [25]:
print('states:', np.max(np.array(states)), np.min(np.array(states)))
print('actions:', np.max(np.array(actions)), np.min(np.array(actions)))
print('rewards:', np.max(np.array(rewards)), np.min(np.array(rewards)))

states: 0.99992806 -0.9998551
actions: 2.40696652730306 -2.2228919665018716
rewards: 2.40696652730306 -2.2228919665018716


In [26]:
state = env.reset()
total_reward = 0
num_step = 0
for each_step in range(memory_size):
    action = env.action_space.sample() # randomness
    action = np.clip(action, -1, 1) # clipped: [-1, +1]
    next_state, reward, done, _ = env.step(action)
    rate = -1 # success rate: [-1, +1]
    memory.buffer.append([state, action, next_state, reward, float(done), rate])
    num_step += 1 # memory updated
    total_reward += reward # max reward 300
    state = next_state
    if done is True:
        print('Progress:', each_step/memory_size)
        state = env.reset()
        # Best 100-episode average reward was 220.62 ± 0.69. 
        # (BipedalWalker-v2 is considered "solved" 
        #  when the agent obtains an average reward of at least 300 over 100 consecutive episodes.)        
        rate = total_reward/300
        rate = np.clip(rate, -1, 1) 
        total_reward = 0 # reset
        for idx in range(num_step): # episode length
            if memory.buffer[-1-idx][-1] == -1:
                memory.buffer[-1-idx][-1] = rate
        num_step = 0 # reset

Progress: 0.01599
Progress: 0.03199
Progress: 0.0328
Progress: 0.0334
Progress: 0.03418
Progress: 0.03473
Progress: 0.03537
Progress: 0.03601
Progress: 0.05201
Progress: 0.05283
Progress: 0.0537
Progress: 0.0697
Progress: 0.07052
Progress: 0.08652
Progress: 0.10252
Progress: 0.11852
Progress: 0.13452
Progress: 0.13558
Progress: 0.13634
Progress: 0.13714
Progress: 0.13839
Progress: 0.13886
Progress: 0.13955
Progress: 0.14011
Progress: 0.14063
Progress: 0.14135
Progress: 0.15735
Progress: 0.15834
Progress: 0.15884
Progress: 0.15921
Progress: 0.15986
Progress: 0.16046
Progress: 0.17646
Progress: 0.19246
Progress: 0.20846
Progress: 0.20959
Progress: 0.22559
Progress: 0.22627
Progress: 0.22707
Progress: 0.24307
Progress: 0.24376
Progress: 0.24518
Progress: 0.2463
Progress: 0.24701
Progress: 0.24766
Progress: 0.24821
Progress: 0.24917
Progress: 0.24964
Progress: 0.26564
Progress: 0.26638
Progress: 0.28238
Progress: 0.28316
Progress: 0.28381
Progress: 0.28477
Progress: 0.30077
Progress: 0.301

## Training the model

Below we'll train our agent. If you want to watch it train, uncomment the `env.render()` line. This is slow because it's rendering the frames slower than the network can train. But, it's cool to watch the agent get better at the game.

In [None]:
# Save/load the model and save for plotting
saver = tf.train.Saver()
episode_rewards_list, rewards_list, gloss_list, dloss_list = [], [], [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model2.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    episode_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        num_step = 0
        gloss_batch, dloss_batch = [], []
        state = env.reset()

        # Training steps/batches
        while True:
            action_preds = sess.run(model.actions_preds, feed_dict={model.states: state.reshape([1, -1]), 
                                                                    model.training: False})
            noise = np.random.normal(loc=0, scale=0.1, size=action_size) # randomness
            action = action_preds.reshape([-1]) + noise
            #print(action.shape, action_logits.shape, noise.shape)
            action = np.clip(action, -1, 1) # clipped
            next_state, reward, done, _ = env.step(action)
            rate = -1 # success rate: -1 to +1
            memory.buffer.append([state, action, next_state, reward, float(done), rate])
            num_step += 1 # memory updated
            total_reward += reward # max reward 300
            state = next_state
            
            if done is True:
                # Best 100-episode average reward was 220.62 ± 0.69. 
                # (BipedalWalker-v2 is considered "solved" 
                #  when the agent obtains an average reward of at least 300 over 100 consecutive episodes.)        
                rate = total_reward/300
                rate = np.clip(rate, -1, 1)
                for idx in range(num_step): # episode length
                    if memory.buffer[-1-idx][-1] == -1:
                        memory.buffer[-1-idx][-1] = rate
                        
            # Training
            batch = memory.sample(batch_size)
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            next_states = np.array([each[2] for each in batch])
            rewards = np.array([each[3] for each in batch])
            dones = np.array([each[4] for each in batch])
            #rates = np.array([each[5] for each in batch])
            nextQs_logits = sess.run(model.Qs_logits, feed_dict = {model.states: next_states, 
                                                                   model.training: False})
            #nextQs = np.max(nextQs_logits, axis=1) * (1-dones) # discrete DQN
            nextQs = nextQs_logits.reshape([-1]) * (1-dones) # continuous DPG
            targetQs = rewards + (gamma * nextQs)
            gloss, dloss, _, _ = sess.run([model.g_loss, model.d_loss, model.g_opt, model.d_opt],
                                          feed_dict = {model.states: states, 
                                                       model.actions: actions,
                                                       model.targetQs: targetQs, 
                                                       model.rewards: rewards, 
                                                       model.training: False})
            gloss_batch.append(gloss)
            dloss_batch.append(dloss)
            if done is True:
                break
                
        episode_reward.append(total_reward)
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episode_reward)),
              'R:{:.4f}'.format(total_reward),
              'gloss:{:.4f}'.format(np.mean(gloss_batch)),
              'dloss:{:.4f}'.format(np.mean(dloss_batch)))
        # Ploting out
        episode_rewards_list.append([ep, np.mean(episode_reward)])
        rewards_list.append([ep, total_reward])
        gloss_list.append([ep, np.mean(gloss_batch)])
        dloss_list.append([ep, np.mean(dloss_batch)])
        # Break episode/epoch loop
        # Did not solve the environment. 
        # Best 100-episode average reward was 220.62 ± 0.69. 
        # (BipedalWalker-v2 is considered "solved" 
        #  when the agent obtains an average reward of at least 300 over 100 consecutive episodes.)        
        if np.mean(episode_reward) >= 300:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model.ckpt')

Episode:0 meanR:-111.7980 R:-111.7980 gloss:9.8430 dloss:19.8135
Episode:1 meanR:-109.8885 R:-107.9789 gloss:13.9129 dloss:27.9426
Episode:2 meanR:-110.1274 R:-110.6054 gloss:11.2913 dloss:22.7019
Episode:3 meanR:-110.2813 R:-110.7431 gloss:7.3838 dloss:14.9343
Episode:4 meanR:-110.4448 R:-111.0987 gloss:22.0639 dloss:44.4509
Episode:5 meanR:-110.9273 R:-113.3396 gloss:19.3048 dloss:39.0162
Episode:6 meanR:-111.3878 R:-114.1509 gloss:9.0165 dloss:18.3379
Episode:7 meanR:-111.7251 R:-114.0862 gloss:12.7251 dloss:25.8868
Episode:8 meanR:-112.5053 R:-118.7471 gloss:9.8833 dloss:20.1978
Episode:9 meanR:-112.9107 R:-116.5594 gloss:16.7994 dloss:34.2500
Episode:10 meanR:-113.2801 R:-116.9732 gloss:14.6961 dloss:30.1170
Episode:11 meanR:-113.5635 R:-116.6812 gloss:15.9536 dloss:32.6494
Episode:12 meanR:-113.9803 R:-118.9827 gloss:13.1512 dloss:27.1107
Episode:13 meanR:-114.2314 R:-117.4953 gloss:16.1415 dloss:33.1753
Episode:14 meanR:-113.7577 R:-107.1259 gloss:9.9885 dloss:20.7813
Episode:15

Episode:123 meanR:-112.8843 R:-102.3972 gloss:20.7027 dloss:45.1558
Episode:124 meanR:-112.6894 R:-103.0991 gloss:14.4157 dloss:31.7215
Episode:125 meanR:-112.5409 R:-102.6059 gloss:9.9077 dloss:24.2310
Episode:126 meanR:-112.3894 R:-102.1229 gloss:15.4140 dloss:33.7348
Episode:127 meanR:-112.2359 R:-102.5483 gloss:8.9320 dloss:21.3697
Episode:128 meanR:-112.0521 R:-101.9588 gloss:10.7403 dloss:23.8944
Episode:129 meanR:-111.8483 R:-101.8554 gloss:12.9678 dloss:28.9818
Episode:130 meanR:-111.4880 R:-101.9568 gloss:10.0014 dloss:23.7242
Episode:131 meanR:-111.1039 R:-103.2578 gloss:16.6071 dloss:35.8448
Episode:132 meanR:-110.5208 R:-101.4779 gloss:14.2578 dloss:31.9372
Episode:133 meanR:-109.8111 R:-102.2405 gloss:7.6782 dloss:18.5495
Episode:134 meanR:-109.0876 R:-101.5020 gloss:11.8704 dloss:26.7631
Episode:135 meanR:-108.4740 R:-104.1691 gloss:18.4932 dloss:39.5925
Episode:136 meanR:-107.8492 R:-101.7094 gloss:21.0530 dloss:46.1926
Episode:137 meanR:-107.1734 R:-101.6227 gloss:27.12

Episode:244 meanR:-103.1080 R:-104.6888 gloss:17.1103 dloss:44.3308
Episode:245 meanR:-103.1336 R:-104.6548 gloss:18.8239 dloss:47.7800
Episode:246 meanR:-103.1832 R:-107.0947 gloss:21.8734 dloss:56.3333
Episode:247 meanR:-103.2067 R:-104.5480 gloss:19.4688 dloss:50.7278
Episode:248 meanR:-103.8401 R:-167.8759 gloss:15.6641 dloss:42.1974
Episode:249 meanR:-103.8820 R:-105.9269 gloss:17.0835 dloss:46.0811
Episode:250 meanR:-103.9054 R:-104.5138 gloss:10.8119 dloss:33.1467
Episode:251 meanR:-103.9357 R:-104.5175 gloss:15.4464 dloss:41.2037
Episode:252 meanR:-103.9486 R:-103.4316 gloss:15.7599 dloss:42.0236
Episode:253 meanR:-103.9753 R:-104.6192 gloss:14.0971 dloss:38.9043
Episode:254 meanR:-103.9944 R:-104.5655 gloss:14.3083 dloss:39.1521
Episode:255 meanR:-104.0187 R:-104.4453 gloss:10.8316 dloss:32.4312
Episode:256 meanR:-104.0456 R:-104.4681 gloss:16.2564 dloss:42.9926
Episode:257 meanR:-104.0602 R:-104.7203 gloss:13.6910 dloss:37.5733
Episode:258 meanR:-104.0939 R:-104.9933 gloss:17

Episode:365 meanR:-103.3263 R:-102.0351 gloss:10.2737 dloss:35.3552
Episode:366 meanR:-103.2894 R:-102.5570 gloss:16.2545 dloss:49.2373
Episode:367 meanR:-103.2785 R:-102.0890 gloss:15.0784 dloss:45.4551
Episode:368 meanR:-103.2652 R:-102.4729 gloss:21.4870 dloss:61.2815
Episode:369 meanR:-103.2176 R:-103.4501 gloss:7.5291 dloss:33.0521
Episode:370 meanR:-102.5571 R:-102.5559 gloss:14.0514 dloss:43.4073
Episode:371 meanR:-102.5628 R:-102.3944 gloss:17.5748 dloss:50.5206
Episode:372 meanR:-102.5657 R:-102.7551 gloss:20.0415 dloss:59.8901
Episode:373 meanR:-102.5700 R:-103.2328 gloss:13.5138 dloss:45.5287
Episode:374 meanR:-102.5657 R:-102.4697 gloss:24.3097 dloss:68.9185
Episode:375 meanR:-102.5758 R:-103.1987 gloss:18.6447 dloss:57.3796
Episode:376 meanR:-102.5793 R:-102.3283 gloss:16.4179 dloss:53.3885
Episode:377 meanR:-102.5862 R:-102.4724 gloss:14.3712 dloss:44.5990
Episode:378 meanR:-102.5805 R:-102.5108 gloss:9.2366 dloss:37.5894
Episode:379 meanR:-102.5878 R:-103.0835 gloss:13.3

Episode:486 meanR:-102.6445 R:-103.4128 gloss:11.6588 dloss:44.1914
Episode:487 meanR:-102.6501 R:-102.8482 gloss:17.6935 dloss:65.9183
Episode:488 meanR:-102.6392 R:-102.2609 gloss:12.3218 dloss:52.0328
Episode:489 meanR:-102.6463 R:-103.3405 gloss:12.3096 dloss:49.8893
Episode:490 meanR:-102.6441 R:-102.3062 gloss:17.2763 dloss:58.4499
Episode:491 meanR:-102.6449 R:-102.4207 gloss:13.2451 dloss:55.7419
Episode:492 meanR:-102.6528 R:-103.3196 gloss:16.1602 dloss:58.0833
Episode:493 meanR:-102.6507 R:-101.9060 gloss:12.7373 dloss:51.4778
Episode:494 meanR:-102.6401 R:-102.2724 gloss:13.5622 dloss:56.2041
Episode:495 meanR:-102.6329 R:-102.3819 gloss:11.5353 dloss:51.7601
Episode:496 meanR:-102.6345 R:-102.1513 gloss:9.9359 dloss:41.6999
Episode:497 meanR:-102.6360 R:-103.4862 gloss:11.7060 dloss:48.6551
Episode:498 meanR:-102.6350 R:-102.3472 gloss:16.2324 dloss:64.2932
Episode:499 meanR:-102.6325 R:-102.2825 gloss:12.7156 dloss:58.2335
Episode:500 meanR:-102.6371 R:-102.3460 gloss:9.9

Episode:607 meanR:-102.6535 R:-102.0418 gloss:12.4395 dloss:51.7984
Episode:608 meanR:-102.6400 R:-101.9782 gloss:13.5824 dloss:52.9567
Episode:609 meanR:-102.6407 R:-102.4307 gloss:12.9237 dloss:58.4519
Episode:610 meanR:-102.6402 R:-101.9631 gloss:12.6938 dloss:54.3334
Episode:611 meanR:-102.6341 R:-101.9542 gloss:11.7002 dloss:45.8080
Episode:612 meanR:-102.6291 R:-102.0139 gloss:11.7612 dloss:51.1325
Episode:613 meanR:-102.6472 R:-104.1267 gloss:17.3940 dloss:60.0646
Episode:614 meanR:-102.6515 R:-102.4405 gloss:11.8665 dloss:52.4418
Episode:615 meanR:-102.6511 R:-103.1876 gloss:12.2730 dloss:52.3295
Episode:616 meanR:-102.6485 R:-102.0835 gloss:15.9060 dloss:58.3875
Episode:617 meanR:-102.6336 R:-101.7993 gloss:14.4072 dloss:57.7562
Episode:618 meanR:-102.6367 R:-103.5671 gloss:11.9483 dloss:55.0075
Episode:619 meanR:-102.6363 R:-102.2749 gloss:13.5784 dloss:55.2951
Episode:620 meanR:-102.6237 R:-102.0796 gloss:15.7540 dloss:62.1466
Episode:621 meanR:-102.6098 R:-102.0041 gloss:15

Episode:728 meanR:-102.6940 R:-103.1187 gloss:19.7717 dloss:67.4188
Episode:729 meanR:-102.7055 R:-103.0869 gloss:15.1503 dloss:63.0803
Episode:730 meanR:-102.7075 R:-103.3841 gloss:15.8818 dloss:62.3572
Episode:731 meanR:-102.7084 R:-103.2618 gloss:15.6908 dloss:54.0638
Episode:732 meanR:-102.7082 R:-101.9919 gloss:13.3767 dloss:53.9163
Episode:733 meanR:-102.7048 R:-101.7228 gloss:16.3547 dloss:62.8434
Episode:734 meanR:-102.7161 R:-104.1152 gloss:16.3718 dloss:63.7087
Episode:735 meanR:-102.7091 R:-102.4468 gloss:14.2971 dloss:49.7229
Episode:736 meanR:-102.7034 R:-102.6037 gloss:17.6155 dloss:64.5103
Episode:737 meanR:-102.6971 R:-102.3500 gloss:14.8574 dloss:55.6901
Episode:738 meanR:-102.7009 R:-102.3236 gloss:15.2550 dloss:57.9979
Episode:739 meanR:-102.7179 R:-103.6403 gloss:16.2525 dloss:65.2881
Episode:740 meanR:-102.7182 R:-103.1908 gloss:16.9262 dloss:66.0755
Episode:741 meanR:-102.7338 R:-103.5365 gloss:18.5623 dloss:65.8148
Episode:742 meanR:-102.7432 R:-103.3787 gloss:17

Episode:849 meanR:-102.7253 R:-102.0574 gloss:20.0008 dloss:69.5863
Episode:850 meanR:-102.7175 R:-102.3754 gloss:20.1596 dloss:72.2599
Episode:851 meanR:-102.7188 R:-101.8259 gloss:19.4985 dloss:79.6473
Episode:852 meanR:-102.7157 R:-103.1322 gloss:19.8892 dloss:70.1597
Episode:853 meanR:-102.7167 R:-103.2054 gloss:18.3750 dloss:72.3172
Episode:854 meanR:-102.7066 R:-102.2651 gloss:21.7233 dloss:84.7204
Episode:855 meanR:-102.7081 R:-103.6958 gloss:21.7034 dloss:78.0904
Episode:856 meanR:-102.7295 R:-104.1463 gloss:18.1690 dloss:71.4603
Episode:857 meanR:-102.7294 R:-104.1648 gloss:18.7129 dloss:70.6856
Episode:858 meanR:-102.7282 R:-103.0696 gloss:17.7420 dloss:74.1646
Episode:859 meanR:-102.7269 R:-103.2155 gloss:18.8887 dloss:72.5043
Episode:860 meanR:-102.7201 R:-102.0248 gloss:16.8600 dloss:58.0152
Episode:861 meanR:-102.7125 R:-103.2372 gloss:20.3986 dloss:76.0337
Episode:862 meanR:-102.7002 R:-102.0340 gloss:21.1185 dloss:84.1795
Episode:863 meanR:-102.7121 R:-103.1371 gloss:14

Episode:970 meanR:-102.2260 R:-101.1376 gloss:23.2023 dloss:82.2771
Episode:971 meanR:-102.2173 R:-101.1832 gloss:21.0399 dloss:76.5289
Episode:972 meanR:-102.2134 R:-101.7656 gloss:24.7815 dloss:81.9660
Episode:973 meanR:-102.1988 R:-101.6126 gloss:23.2502 dloss:77.4455
Episode:974 meanR:-102.2147 R:-103.3008 gloss:25.2804 dloss:88.6531
Episode:975 meanR:-102.2060 R:-101.2214 gloss:23.0335 dloss:77.9347
Episode:976 meanR:-102.1965 R:-101.7683 gloss:27.9717 dloss:94.8324
Episode:977 meanR:-102.1950 R:-103.2814 gloss:24.5801 dloss:84.3924
Episode:978 meanR:-102.1889 R:-101.4655 gloss:24.1547 dloss:77.9768
Episode:979 meanR:-102.1929 R:-103.3929 gloss:23.4136 dloss:79.4588
Episode:980 meanR:-102.1916 R:-101.3786 gloss:23.9366 dloss:82.3838
Episode:981 meanR:-102.1784 R:-101.7082 gloss:24.1498 dloss:83.2699
Episode:982 meanR:-102.1513 R:-101.7068 gloss:20.6277 dloss:74.1153
Episode:983 meanR:-102.1359 R:-101.7315 gloss:24.4914 dloss:84.2285
Episode:984 meanR:-102.1513 R:-103.2870 gloss:24

Episode:1090 meanR:-102.1315 R:-103.0075 gloss:29.3111 dloss:93.3961
Episode:1091 meanR:-102.1315 R:-101.1071 gloss:29.1490 dloss:90.7081
Episode:1092 meanR:-102.1391 R:-103.3355 gloss:27.2662 dloss:85.1563
Episode:1093 meanR:-102.1216 R:-101.1121 gloss:28.1148 dloss:95.7848
Episode:1094 meanR:-102.0902 R:-101.4501 gloss:27.8945 dloss:91.3652
Episode:1095 meanR:-102.0674 R:-101.1203 gloss:28.3316 dloss:89.6648
Episode:1096 meanR:-102.0650 R:-101.4904 gloss:28.7605 dloss:91.6138
Episode:1097 meanR:-102.0366 R:-101.2051 gloss:27.9824 dloss:91.4915
Episode:1098 meanR:-102.0183 R:-101.0777 gloss:25.1859 dloss:81.5212
Episode:1099 meanR:-102.0191 R:-102.5894 gloss:29.7827 dloss:95.3829
Episode:1100 meanR:-102.0160 R:-101.5167 gloss:25.0728 dloss:77.7862
Episode:1101 meanR:-102.0097 R:-101.0949 gloss:29.3441 dloss:93.1104
Episode:1102 meanR:-101.9880 R:-100.8206 gloss:28.6793 dloss:95.4004
Episode:1103 meanR:-101.9828 R:-101.0232 gloss:24.1338 dloss:80.5949
Episode:1104 meanR:-101.9827 R:-10

Episode:1209 meanR:-101.8333 R:-101.1440 gloss:31.0278 dloss:90.6095
Episode:1210 meanR:-101.8169 R:-100.9856 gloss:33.8465 dloss:102.4181
Episode:1211 meanR:-101.8294 R:-102.3931 gloss:26.4253 dloss:79.4146
Episode:1212 meanR:-101.8109 R:-101.1847 gloss:34.6843 dloss:109.6397
Episode:1213 meanR:-101.8085 R:-101.1201 gloss:32.4355 dloss:97.5544
Episode:1214 meanR:-101.8075 R:-101.1802 gloss:30.1711 dloss:92.7110
Episode:1215 meanR:-101.8207 R:-102.9944 gloss:31.6804 dloss:96.0133
Episode:1216 meanR:-101.8195 R:-101.3457 gloss:32.4773 dloss:100.1697
Episode:1217 meanR:-101.8026 R:-101.0864 gloss:30.5811 dloss:93.1036
Episode:1218 meanR:-101.7824 R:-101.0493 gloss:35.0777 dloss:104.9546
Episode:1219 meanR:-101.7830 R:-101.5673 gloss:33.4106 dloss:99.4891
Episode:1220 meanR:-101.7858 R:-101.5213 gloss:34.0856 dloss:104.2505
Episode:1221 meanR:-101.7917 R:-101.6359 gloss:30.2570 dloss:91.8312
Episode:1222 meanR:-101.7737 R:-101.6072 gloss:35.8630 dloss:108.0262
Episode:1223 meanR:-101.7505

Episode:1328 meanR:-101.5369 R:-100.7633 gloss:33.4418 dloss:96.3724
Episode:1329 meanR:-101.5331 R:-101.0897 gloss:34.6710 dloss:100.9524
Episode:1330 meanR:-101.5095 R:-101.7141 gloss:36.2986 dloss:105.1582
Episode:1331 meanR:-101.5243 R:-102.6288 gloss:36.6096 dloss:104.9166
Episode:1332 meanR:-101.5082 R:-101.1766 gloss:33.0446 dloss:92.4618
Episode:1333 meanR:-101.5037 R:-100.9669 gloss:32.8760 dloss:94.9597
Episode:1334 meanR:-101.4879 R:-100.7779 gloss:31.0053 dloss:92.1582
Episode:1335 meanR:-101.4855 R:-100.9148 gloss:34.0007 dloss:94.0209
Episode:1336 meanR:-101.4662 R:-100.9642 gloss:40.4239 dloss:113.0634
Episode:1337 meanR:-101.4488 R:-101.3407 gloss:34.3466 dloss:103.7367
Episode:1338 meanR:-101.4468 R:-100.8836 gloss:33.1308 dloss:94.5592
Episode:1339 meanR:-101.4376 R:-100.7316 gloss:31.7190 dloss:98.0635
Episode:1340 meanR:-101.4281 R:-101.1716 gloss:35.5365 dloss:106.7316
Episode:1341 meanR:-101.4235 R:-100.9507 gloss:31.8952 dloss:91.5059
Episode:1342 meanR:-101.4311

Episode:1446 meanR:-101.5828 R:-102.7434 gloss:33.4975 dloss:93.7935
Episode:1447 meanR:-101.5757 R:-100.8516 gloss:36.3150 dloss:101.8332
Episode:1448 meanR:-101.5963 R:-102.9701 gloss:42.6921 dloss:116.1877
Episode:1449 meanR:-101.6006 R:-101.5305 gloss:33.5693 dloss:90.5766
Episode:1450 meanR:-101.6023 R:-100.9914 gloss:42.5316 dloss:123.2792
Episode:1451 meanR:-101.6012 R:-100.9390 gloss:32.8812 dloss:93.1228
Episode:1452 meanR:-101.6062 R:-101.6297 gloss:37.4910 dloss:103.7329
Episode:1453 meanR:-101.6199 R:-102.4874 gloss:35.8952 dloss:97.2421
Episode:1454 meanR:-101.6441 R:-103.4367 gloss:39.8200 dloss:120.4238
Episode:1455 meanR:-101.6453 R:-100.9663 gloss:34.8883 dloss:94.2995
Episode:1456 meanR:-101.6373 R:-100.8117 gloss:45.4193 dloss:124.8458
Episode:1457 meanR:-101.6473 R:-101.9029 gloss:38.4256 dloss:107.9472
Episode:1458 meanR:-101.6456 R:-100.8581 gloss:35.1871 dloss:97.0746
Episode:1459 meanR:-101.6430 R:-100.7112 gloss:36.8925 dloss:102.2401
Episode:1460 meanR:-101.63

Episode:1564 meanR:-102.5775 R:-108.2020 gloss:37.5391 dloss:101.4265
Episode:1565 meanR:-102.6025 R:-103.4708 gloss:42.9892 dloss:119.8860
Episode:1566 meanR:-102.6785 R:-108.3614 gloss:41.4483 dloss:114.3820
Episode:1567 meanR:-102.6566 R:-102.1979 gloss:42.3871 dloss:119.7625
Episode:1568 meanR:-102.6808 R:-103.5386 gloss:48.5759 dloss:126.2762
Episode:1569 meanR:-102.6961 R:-102.3863 gloss:39.8150 dloss:108.5827
Episode:1570 meanR:-102.6685 R:-99.7274 gloss:45.1305 dloss:124.2958
Episode:1571 meanR:-102.6984 R:-103.8290 gloss:44.1214 dloss:118.5607
Episode:1572 meanR:-102.6902 R:-100.0059 gloss:40.0749 dloss:109.8020
Episode:1573 meanR:-102.7124 R:-103.0187 gloss:42.9816 dloss:115.3981
Episode:1574 meanR:-102.7122 R:-100.8042 gloss:44.4691 dloss:124.8490
Episode:1575 meanR:-102.7230 R:-101.9429 gloss:40.4824 dloss:109.1443
Episode:1576 meanR:-102.7371 R:-103.8208 gloss:41.5546 dloss:115.0430
Episode:1577 meanR:-102.7625 R:-103.4419 gloss:38.4296 dloss:101.7777
Episode:1578 meanR:-1

Episode:1682 meanR:-102.0853 R:-100.3319 gloss:46.3894 dloss:124.9179
Episode:1683 meanR:-102.0745 R:-102.0452 gloss:43.8577 dloss:118.8227
Episode:1684 meanR:-102.0726 R:-101.9118 gloss:41.2200 dloss:119.2564
Episode:1685 meanR:-102.0908 R:-102.8823 gloss:43.4429 dloss:117.3239
Episode:1686 meanR:-102.1028 R:-102.2632 gloss:44.7808 dloss:123.6579
Episode:1687 meanR:-102.0935 R:-101.0023 gloss:39.8218 dloss:112.2830
Episode:1688 meanR:-102.0790 R:-102.1718 gloss:44.3664 dloss:118.2294
Episode:1689 meanR:-102.0861 R:-100.8080 gloss:48.3253 dloss:133.6827
Episode:1690 meanR:-102.1135 R:-102.7031 gloss:43.9429 dloss:119.5624
Episode:1691 meanR:-102.1140 R:-102.5296 gloss:48.7614 dloss:131.2677
Episode:1692 meanR:-102.1120 R:-102.5713 gloss:44.8313 dloss:116.5293
Episode:1693 meanR:-102.0929 R:-100.6115 gloss:39.2315 dloss:110.7716
Episode:1694 meanR:-102.0765 R:-100.6219 gloss:43.0276 dloss:114.0140
Episode:1695 meanR:-102.0621 R:-100.5175 gloss:42.9015 dloss:117.1760
Episode:1696 meanR:-

Episode:1800 meanR:-100.9100 R:-100.4642 gloss:50.8085 dloss:143.9320
Episode:1801 meanR:-100.9077 R:-100.7218 gloss:44.3156 dloss:126.8076
Episode:1802 meanR:-100.9053 R:-100.5128 gloss:51.6338 dloss:141.0135
Episode:1803 meanR:-100.8727 R:-100.4417 gloss:46.4576 dloss:123.7551
Episode:1804 meanR:-100.8754 R:-100.6123 gloss:49.0177 dloss:124.1422
Episode:1805 meanR:-100.8716 R:-100.2904 gloss:48.5119 dloss:132.7769
Episode:1806 meanR:-100.8505 R:-100.4939 gloss:49.7761 dloss:129.3718
Episode:1807 meanR:-100.8278 R:-100.2814 gloss:54.5068 dloss:149.0485
Episode:1808 meanR:-100.8203 R:-100.3244 gloss:48.3853 dloss:137.5054
Episode:1809 meanR:-100.8198 R:-100.5718 gloss:48.2878 dloss:127.8036
Episode:1810 meanR:-100.8215 R:-100.4751 gloss:50.4744 dloss:135.9917
Episode:1811 meanR:-100.8225 R:-100.4246 gloss:41.7915 dloss:115.6514
Episode:1812 meanR:-100.8132 R:-100.3431 gloss:48.8813 dloss:126.6452
Episode:1813 meanR:-100.8338 R:-102.7761 gloss:45.3737 dloss:128.8336
Episode:1814 meanR:-

Episode:1918 meanR:-100.4840 R:-101.3793 gloss:47.3215 dloss:127.0289
Episode:1919 meanR:-100.4892 R:-100.9764 gloss:50.5874 dloss:135.1149
Episode:1920 meanR:-100.4940 R:-100.9478 gloss:44.0539 dloss:116.0014
Episode:1921 meanR:-100.5004 R:-100.9851 gloss:51.1274 dloss:138.2091
Episode:1922 meanR:-100.5042 R:-100.8582 gloss:45.3742 dloss:118.9074
Episode:1923 meanR:-100.5041 R:-100.9330 gloss:48.8818 dloss:132.3435
Episode:1924 meanR:-100.5082 R:-101.1092 gloss:47.0178 dloss:125.7234
Episode:1925 meanR:-100.5123 R:-100.8753 gloss:54.2303 dloss:144.1243
Episode:1926 meanR:-100.5170 R:-100.8126 gloss:48.7704 dloss:130.5954
Episode:1927 meanR:-100.5205 R:-100.8436 gloss:49.8984 dloss:137.7934
Episode:1928 meanR:-100.5283 R:-101.0374 gloss:48.2337 dloss:127.3566
Episode:1929 meanR:-100.5316 R:-100.4923 gloss:53.9835 dloss:142.8701
Episode:1930 meanR:-100.5356 R:-100.7021 gloss:55.1721 dloss:147.7764
Episode:1931 meanR:-100.5481 R:-101.6253 gloss:42.9383 dloss:111.7246
Episode:1932 meanR:-

Episode:2036 meanR:-102.7180 R:-111.7127 gloss:51.5712 dloss:432.7946
Episode:2037 meanR:-102.8118 R:-110.0217 gloss:71.3516 dloss:334.8354
Episode:2038 meanR:-102.9120 R:-110.5126 gloss:47.4414 dloss:317.7445
Episode:2039 meanR:-103.0123 R:-111.1097 gloss:69.3727 dloss:394.6035
Episode:2040 meanR:-103.1739 R:-116.8073 gloss:75.8209 dloss:396.6436
Episode:2041 meanR:-103.2308 R:-106.0825 gloss:71.5471 dloss:502.0707
Episode:2042 meanR:-103.3409 R:-111.2824 gloss:50.1539 dloss:419.5025
Episode:2043 meanR:-103.5076 R:-117.0172 gloss:63.1603 dloss:552.3697
Episode:2044 meanR:-103.6704 R:-116.7702 gloss:92.4795 dloss:583.8746
Episode:2045 meanR:-103.7148 R:-105.0962 gloss:75.3713 dloss:400.7059
Episode:2046 meanR:-103.8210 R:-111.2285 gloss:88.7369 dloss:424.0976
Episode:2047 meanR:-103.8682 R:-105.3525 gloss:52.0637 dloss:399.2988
Episode:2048 meanR:-104.0558 R:-119.1742 gloss:92.7356 dloss:427.5478
Episode:2049 meanR:-104.1569 R:-110.4889 gloss:47.4096 dloss:265.7876
Episode:2050 meanR:-

Episode:2154 meanR:-117.1469 R:-109.9396 gloss:37.4432 dloss:129.8700
Episode:2155 meanR:-117.0999 R:-108.0478 gloss:42.5601 dloss:139.4914
Episode:2156 meanR:-117.0603 R:-108.9108 gloss:35.8372 dloss:121.9962
Episode:2157 meanR:-117.0967 R:-109.3639 gloss:42.8368 dloss:140.3009
Episode:2158 meanR:-117.3043 R:-127.3022 gloss:43.8433 dloss:141.4226
Episode:2159 meanR:-117.4971 R:-132.0828 gloss:37.9335 dloss:131.0712
Episode:2160 meanR:-117.4493 R:-108.8322 gloss:35.4048 dloss:145.9598
Episode:2161 meanR:-117.4708 R:-115.8020 gloss:41.7982 dloss:142.0695
Episode:2162 meanR:-117.4425 R:-109.9255 gloss:39.9192 dloss:135.0939
Episode:2163 meanR:-117.4053 R:-108.7338 gloss:45.8279 dloss:164.4661
Episode:2164 meanR:-117.4620 R:-110.7801 gloss:38.3053 dloss:133.8902
Episode:2165 meanR:-117.4336 R:-109.4992 gloss:42.3682 dloss:151.2660
Episode:2166 meanR:-117.4392 R:-113.5846 gloss:38.4436 dloss:137.0309
Episode:2167 meanR:-117.6359 R:-133.0386 gloss:41.0114 dloss:142.2048
Episode:2168 meanR:-

Episode:2272 meanR:-111.4873 R:-112.1334 gloss:40.0738 dloss:149.3303
Episode:2273 meanR:-111.4881 R:-109.7616 gloss:52.6452 dloss:157.9440
Episode:2274 meanR:-111.4941 R:-109.6022 gloss:37.9685 dloss:123.1031
Episode:2275 meanR:-111.4880 R:-110.0756 gloss:41.2791 dloss:130.7903
Episode:2276 meanR:-111.4776 R:-109.2606 gloss:54.2839 dloss:162.4871
Episode:2277 meanR:-111.4899 R:-110.4817 gloss:46.3139 dloss:150.1643
Episode:2278 meanR:-111.5047 R:-110.9013 gloss:39.0706 dloss:124.9970
Episode:2279 meanR:-111.5275 R:-112.1293 gloss:43.3525 dloss:137.4123
Episode:2280 meanR:-111.5655 R:-112.9190 gloss:47.2066 dloss:147.6799
Episode:2281 meanR:-111.5810 R:-110.7965 gloss:53.0373 dloss:157.8456
Episode:2282 meanR:-111.5814 R:-107.1170 gloss:48.2253 dloss:143.1046
Episode:2283 meanR:-111.6294 R:-110.2808 gloss:46.2054 dloss:142.5567
Episode:2284 meanR:-111.6692 R:-111.4763 gloss:45.9874 dloss:147.3993
Episode:2285 meanR:-111.6646 R:-107.4852 gloss:45.0356 dloss:137.7542
Episode:2286 meanR:-

Episode:2390 meanR:-109.7502 R:-110.0294 gloss:43.3816 dloss:132.6402
Episode:2391 meanR:-109.7914 R:-111.4045 gloss:51.7583 dloss:152.0989
Episode:2392 meanR:-109.8314 R:-110.7805 gloss:49.5521 dloss:141.4143
Episode:2393 meanR:-109.8662 R:-110.5743 gloss:54.1894 dloss:160.5374
Episode:2394 meanR:-109.9077 R:-111.4083 gloss:56.6083 dloss:159.7211
Episode:2395 meanR:-109.9497 R:-111.5383 gloss:56.9174 dloss:157.6180
Episode:2396 meanR:-110.0017 R:-112.2375 gloss:51.0038 dloss:158.8524
Episode:2397 meanR:-110.0319 R:-109.8636 gloss:63.4251 dloss:176.5905
Episode:2398 meanR:-110.0308 R:-106.5096 gloss:48.0338 dloss:145.3005
Episode:2399 meanR:-110.0668 R:-110.0413 gloss:46.9881 dloss:144.2709
Episode:2400 meanR:-110.0970 R:-109.9748 gloss:52.0982 dloss:152.3556
Episode:2401 meanR:-110.0965 R:-106.7247 gloss:44.5902 dloss:135.1946
Episode:2402 meanR:-110.1485 R:-112.0471 gloss:59.4624 dloss:164.1850
Episode:2403 meanR:-110.2116 R:-112.1627 gloss:57.2945 dloss:163.5642
Episode:2404 meanR:-

Episode:2508 meanR:-110.8678 R:-107.9784 gloss:43.7458 dloss:140.3328
Episode:2509 meanR:-110.8414 R:-109.3797 gloss:44.4181 dloss:136.1261
Episode:2510 meanR:-110.8327 R:-109.8999 gloss:55.9525 dloss:157.9677
Episode:2511 meanR:-110.8086 R:-109.6585 gloss:58.3684 dloss:168.9456
Episode:2512 meanR:-110.8019 R:-109.4062 gloss:50.8808 dloss:155.8652
Episode:2513 meanR:-110.7989 R:-109.3125 gloss:53.7048 dloss:150.5560
Episode:2514 meanR:-110.7951 R:-109.9967 gloss:65.2146 dloss:184.2410
Episode:2515 meanR:-110.7710 R:-109.8494 gloss:62.3075 dloss:174.3138
Episode:2516 meanR:-110.7418 R:-109.5371 gloss:49.8364 dloss:154.9416
Episode:2517 meanR:-110.7757 R:-109.7928 gloss:50.9053 dloss:145.8077
Episode:2518 meanR:-110.7540 R:-110.0966 gloss:61.6184 dloss:175.1838
Episode:2519 meanR:-110.7918 R:-109.8370 gloss:46.5359 dloss:141.5817
Episode:2520 meanR:-110.7822 R:-110.0118 gloss:51.2888 dloss:157.0470
Episode:2521 meanR:-110.7647 R:-109.7774 gloss:57.7561 dloss:163.3943
Episode:2522 meanR:-

Episode:2626 meanR:-113.2632 R:-113.4031 gloss:56.4569 dloss:169.3863
Episode:2627 meanR:-113.2790 R:-114.0423 gloss:51.3082 dloss:163.9733
Episode:2628 meanR:-113.3005 R:-114.1589 gloss:40.0302 dloss:142.4294
Episode:2629 meanR:-113.3207 R:-114.3995 gloss:54.2770 dloss:161.8714
Episode:2630 meanR:-113.3409 R:-114.5475 gloss:55.2049 dloss:170.1649
Episode:2631 meanR:-113.3622 R:-114.8072 gloss:57.0581 dloss:177.7669
Episode:2632 meanR:-113.3749 R:-113.8532 gloss:64.7090 dloss:186.3922
Episode:2633 meanR:-113.3854 R:-114.1271 gloss:46.6362 dloss:151.9870
Episode:2634 meanR:-113.3981 R:-113.3064 gloss:53.3437 dloss:163.2108
Episode:2635 meanR:-113.4138 R:-115.0098 gloss:57.1496 dloss:171.6119
Episode:2636 meanR:-113.4511 R:-115.4401 gloss:57.6993 dloss:170.3307
Episode:2637 meanR:-113.4798 R:-115.0971 gloss:55.4997 dloss:167.5222
Episode:2638 meanR:-113.4848 R:-114.5433 gloss:59.4647 dloss:178.0912
Episode:2639 meanR:-113.5226 R:-116.1205 gloss:56.0540 dloss:176.9767
Episode:2640 meanR:-

Episode:2744 meanR:-114.6731 R:-113.7494 gloss:46.7745 dloss:157.8909
Episode:2745 meanR:-114.6523 R:-112.7511 gloss:40.1914 dloss:137.8368
Episode:2746 meanR:-114.6233 R:-113.5799 gloss:46.6682 dloss:156.5628
Episode:2747 meanR:-114.6057 R:-113.3671 gloss:53.4834 dloss:160.5274
Episode:2748 meanR:-114.5935 R:-112.5704 gloss:61.8555 dloss:180.4540
Episode:2749 meanR:-114.5757 R:-113.2659 gloss:53.5721 dloss:171.1971
Episode:2750 meanR:-114.5665 R:-113.5774 gloss:44.6986 dloss:151.2360
Episode:2751 meanR:-114.5532 R:-112.5978 gloss:43.8645 dloss:149.1445
Episode:2752 meanR:-114.5227 R:-112.5249 gloss:63.9141 dloss:191.2096
Episode:2753 meanR:-114.4978 R:-113.0536 gloss:55.9170 dloss:172.5199
Episode:2754 meanR:-114.4579 R:-112.7232 gloss:39.9411 dloss:143.6881
Episode:2755 meanR:-114.4364 R:-113.5989 gloss:63.2091 dloss:186.9442
Episode:2756 meanR:-114.4034 R:-112.9245 gloss:60.2117 dloss:179.7086
Episode:2757 meanR:-114.3895 R:-113.5260 gloss:49.7729 dloss:164.6912
Episode:2758 meanR:-

Episode:2862 meanR:-120.9836 R:-130.9204 gloss:34.9885 dloss:138.4334
Episode:2863 meanR:-121.1403 R:-127.8068 gloss:35.0744 dloss:133.7623
Episode:2864 meanR:-121.2436 R:-122.6454 gloss:40.3495 dloss:150.9272
Episode:2865 meanR:-121.2116 R:-109.0258 gloss:35.7936 dloss:140.9298
Episode:2866 meanR:-121.1307 R:-104.7746 gloss:29.6797 dloss:118.2517
Episode:2867 meanR:-121.0967 R:-110.0850 gloss:34.0257 dloss:129.0291
Episode:2868 meanR:-121.0791 R:-110.3449 gloss:29.5698 dloss:124.2429
Episode:2869 meanR:-121.0554 R:-110.2516 gloss:38.2092 dloss:145.5227
Episode:2870 meanR:-121.0116 R:-107.9811 gloss:36.6689 dloss:137.5782
Episode:2871 meanR:-120.9916 R:-110.3363 gloss:35.7258 dloss:136.1796
Episode:2872 meanR:-120.9764 R:-110.6313 gloss:37.4711 dloss:150.1585
Episode:2873 meanR:-120.9194 R:-106.8686 gloss:43.7199 dloss:161.0017
Episode:2874 meanR:-120.8765 R:-109.5722 gloss:46.2178 dloss:173.6578
Episode:2875 meanR:-120.8435 R:-110.0781 gloss:39.3701 dloss:144.1651
Episode:2876 meanR:-

Episode:2980 meanR:-111.5261 R:-112.3291 gloss:35.7311 dloss:131.9386
Episode:2981 meanR:-111.5131 R:-106.8554 gloss:31.1728 dloss:124.8315
Episode:2982 meanR:-111.5009 R:-110.0081 gloss:35.1291 dloss:137.3642
Episode:2983 meanR:-111.5591 R:-112.4420 gloss:39.1607 dloss:137.9511
Episode:2984 meanR:-111.6081 R:-112.9567 gloss:45.2361 dloss:157.2846
Episode:2985 meanR:-111.6522 R:-113.7831 gloss:41.6928 dloss:155.6023
Episode:2986 meanR:-111.6954 R:-112.7917 gloss:39.3066 dloss:145.2223
Episode:2987 meanR:-111.6619 R:-106.9623 gloss:39.7058 dloss:154.2882
Episode:2988 meanR:-111.7142 R:-112.7895 gloss:39.3375 dloss:147.6292
Episode:2989 meanR:-111.7173 R:-112.7309 gloss:35.1538 dloss:135.7696
Episode:2990 meanR:-111.7411 R:-114.0162 gloss:44.1279 dloss:156.0169
Episode:2991 meanR:-111.7380 R:-113.1681 gloss:42.5643 dloss:163.3975
Episode:2992 meanR:-111.7632 R:-113.7862 gloss:38.1247 dloss:139.5815
Episode:2993 meanR:-111.7830 R:-114.0845 gloss:41.8538 dloss:153.2716
Episode:2994 meanR:-

Episode:3098 meanR:-109.3461 R:-104.7654 gloss:41.4474 dloss:151.0239
Episode:3099 meanR:-109.3436 R:-105.8525 gloss:37.4882 dloss:140.2178
Episode:3100 meanR:-109.3569 R:-113.0481 gloss:39.0085 dloss:145.9366
Episode:3101 meanR:-109.3669 R:-113.3381 gloss:38.7174 dloss:145.7891
Episode:3102 meanR:-109.3756 R:-113.5452 gloss:34.6936 dloss:134.6234
Episode:3103 meanR:-109.3665 R:-104.9564 gloss:43.1108 dloss:147.6305
Episode:3104 meanR:-109.3770 R:-109.5097 gloss:37.5803 dloss:138.1886
Episode:3105 meanR:-109.4581 R:-114.1965 gloss:50.4827 dloss:173.3428
Episode:3106 meanR:-109.4763 R:-114.4505 gloss:39.0917 dloss:147.8633
Episode:3107 meanR:-109.5157 R:-111.1341 gloss:31.6452 dloss:136.1035
Episode:3108 meanR:-109.4594 R:-106.9575 gloss:37.5375 dloss:148.4724
Episode:3109 meanR:-109.5452 R:-114.0047 gloss:38.5338 dloss:147.5549
Episode:3110 meanR:-109.5493 R:-107.0229 gloss:42.4087 dloss:153.3037
Episode:3111 meanR:-109.5629 R:-114.4559 gloss:38.9036 dloss:147.4401
Episode:3112 meanR:-

Episode:3216 meanR:-109.6385 R:-113.1931 gloss:40.6125 dloss:155.2863
Episode:3217 meanR:-109.6507 R:-108.1845 gloss:43.5841 dloss:163.1862
Episode:3218 meanR:-109.7090 R:-113.3949 gloss:36.8604 dloss:144.7268
Episode:3219 meanR:-109.7512 R:-113.5758 gloss:35.9572 dloss:144.0507
Episode:3220 meanR:-109.7990 R:-113.4070 gloss:34.5718 dloss:140.5618
Episode:3221 meanR:-109.8218 R:-110.8923 gloss:43.3413 dloss:156.8935
Episode:3222 meanR:-109.8282 R:-111.2743 gloss:41.9412 dloss:159.3582
Episode:3223 meanR:-109.8313 R:-112.5905 gloss:32.9903 dloss:133.5847
Episode:3224 meanR:-109.8294 R:-106.5266 gloss:42.5937 dloss:161.0132
Episode:3225 meanR:-109.8432 R:-113.5949 gloss:44.5506 dloss:162.5470
Episode:3226 meanR:-109.8743 R:-113.1542 gloss:41.6802 dloss:159.7674
Episode:3227 meanR:-109.9426 R:-113.0562 gloss:35.1408 dloss:142.9952
Episode:3228 meanR:-110.0040 R:-113.6557 gloss:38.7770 dloss:159.8782
Episode:3229 meanR:-110.0690 R:-113.4598 gloss:39.3070 dloss:150.3785
Episode:3230 meanR:-

Episode:3334 meanR:-109.1104 R:-105.1451 gloss:36.3766 dloss:143.1476
Episode:3335 meanR:-109.0376 R:-105.8485 gloss:41.2196 dloss:153.8556
Episode:3336 meanR:-108.9840 R:-105.5834 gloss:35.5447 dloss:139.1058
Episode:3337 meanR:-108.9012 R:-105.2010 gloss:42.3553 dloss:153.3787
Episode:3338 meanR:-108.9742 R:-113.3243 gloss:45.3514 dloss:165.1019
Episode:3339 meanR:-108.9082 R:-106.3062 gloss:38.9103 dloss:152.5630
Episode:3340 meanR:-108.8161 R:-104.3262 gloss:41.1093 dloss:162.5821
Episode:3341 meanR:-108.7437 R:-106.5370 gloss:39.9204 dloss:158.6469
Episode:3342 meanR:-108.6665 R:-104.1769 gloss:35.7833 dloss:147.9346
Episode:3343 meanR:-108.5811 R:-105.2071 gloss:42.4827 dloss:152.5325
Episode:3344 meanR:-108.5105 R:-106.4171 gloss:41.3811 dloss:156.5781
Episode:3345 meanR:-108.5004 R:-105.0942 gloss:37.6010 dloss:147.5231
Episode:3346 meanR:-108.5365 R:-109.2252 gloss:37.7072 dloss:144.6406
Episode:3347 meanR:-108.5357 R:-105.6788 gloss:38.3590 dloss:150.7086
Episode:3348 meanR:-

Episode:3452 meanR:-108.0586 R:-105.8800 gloss:43.8737 dloss:167.8474
Episode:3453 meanR:-108.0648 R:-106.7122 gloss:43.0844 dloss:174.1324
Episode:3454 meanR:-108.0830 R:-107.1861 gloss:45.2908 dloss:167.8946
Episode:3455 meanR:-108.0996 R:-107.0053 gloss:39.6159 dloss:164.7795
Episode:3456 meanR:-108.0390 R:-105.9987 gloss:38.4360 dloss:162.7201
Episode:3457 meanR:-108.0257 R:-106.1410 gloss:41.3569 dloss:159.7258
Episode:3458 meanR:-108.0272 R:-106.0357 gloss:46.7504 dloss:173.0536
Episode:3459 meanR:-108.0188 R:-107.1733 gloss:42.0550 dloss:168.9498
Episode:3460 meanR:-108.0131 R:-104.7620 gloss:39.5511 dloss:162.2328
Episode:3461 meanR:-107.9382 R:-105.6617 gloss:41.1144 dloss:161.5135
Episode:3462 meanR:-107.8695 R:-106.4463 gloss:41.9832 dloss:163.5954
Episode:3463 meanR:-107.8179 R:-107.6964 gloss:42.2326 dloss:168.1625
Episode:3464 meanR:-107.8197 R:-105.7475 gloss:43.9580 dloss:170.3317
Episode:3465 meanR:-107.7560 R:-106.0328 gloss:45.8654 dloss:172.6920
Episode:3466 meanR:-

# Visualizing training

Below I'll plot the total rewards for each episode. I'm plotting the rolling average too, in blue.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[N:] - cumsum[:-N]) / N 

In [None]:
eps, arr = np.array(episode_rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(gloss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('G losses')

In [None]:
eps, arr = np.array(dloss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('D losses')

## Testing

Let's checkout how our trained agent plays the game.

In [85]:
import gym
env = gym.make('BipedalWalker-v2')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Episodes/epochs
    for _ in range(1):
        state = env.reset()
        total_reward = 0

        # Steps/batches
        while True:
            env.render()
            action_preds = sess.run(model.actions_preds, feed_dict={model.states: state.reshape([1, -1])})
            action = np.reshape(action_preds, [-1]) # For continuous action space
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if done:
                print('total_reward: {}'.format(total_reward))
                break
# End the env                
env.close()

[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m
[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m
INFO:tensorflow:Restoring parameters from checkpoints/model.ckpt
total_reward: -103.43347330792993


## Extending this

So, Cart-Pole is a pretty simple game. However, the same model can be used to train an agent to play something much more complicated like Pong or Space Invaders. Instead of a state like we're using here though, you'd want to use convolutional layers to get the state from the screen images.

![Deep Q-Learning Atari](assets/atari-network.png)

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. Here's the original paper which will get you started: http://www.davidqiu.com:8888/research/nature14236.pdf.