# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```

In [2]:
# env = UnityEnvironment(file_name="/home/arasdar/VisualBanana_Linux/Banana.x86")
env = UnityEnvironment(file_name="/home/arasdar/Banana_Linux/Banana.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana. 
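
As a small illustration of the action ids and reward scheme described above, the sketch below names the four actions and sums a made-up reward sequence. The `ACTION_NAMES` dictionary, the `episode_score` helper, and the example rewards are hypothetical and not part of the Unity API, which only expects the integer ids.

```python
# Illustrative names only; the environment itself just expects the integer ids.
ACTION_NAMES = {
    0: "walk forward",
    1: "walk backward",
    2: "turn left",
    3: "turn right",
}

def episode_score(rewards):
    """Sum per-step rewards: +1 per yellow banana, -1 per blue banana, 0 otherwise."""
    return sum(rewards)

# Hypothetical reward sequence: two yellow bananas and one blue banana.
example_rewards = [0.0, 1.0, 0.0, -1.0, 1.0]
print(ACTION_NAMES[2], episode_score(example_rewards))
```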

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
# print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)
# print(state.shape, len(env_info.vector_observations), env_info.vector_observations.shape)

Number of agents: 1
Number of actions: 4
States have length: 37


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you can watch how the agent performs when it selects an action uniformly at random at each time step.  A window should pop up that lets you observe the agent as it moves through the environment.  

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        print(state.shape)
        break
    
print("Score: {}".format(score))

(37,)
Score: 0.0
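
To make "uniformly at random" concrete, the sketch below draws many actions and checks that each of the four ids appears about a quarter of the time. It uses a seeded `np.random.default_rng` generator rather than the `np.random.randint` call in the cell above; that swap, the seed, and the sample count are assumptions made only so the example is reproducible.

```python
import numpy as np

action_size = 4                     # the four discrete actions listed earlier
rng = np.random.default_rng(0)      # seeded generator (assumption, for reproducibility)

# Draw integers in [0, action_size), the same range the random policy samples from.
samples = rng.integers(action_size, size=100_000)

# Empirical frequency of each action id; each should be close to 1 / action_size.
frequencies = np.bincount(samples, minlength=action_size) / samples.size
print(frequencies)
```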


When finished, you can close the environment.

In [6]:
# env.close()

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [7]:
env_info = env.reset(train_mode=True)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    #print(state)
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: 0.0


In [8]:
# Detect whether TensorFlow can see a GPU; an empty device name means it runs on the CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.7.1
Default GPU Device: 


In [9]:
env_info = env.reset(train_mode=True)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
batch = []
while True: # loop until the episode ends
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    #print(state, action, reward, done)
    batch.append([action, state, reward, done])
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
# print("Score: {}".format(score))

In [10]:
batch[0], batch[0][1].shape

([0, array([0.        , 0.        , 1.        , 0.        , 0.07196155,
         0.        , 0.        , 1.        , 0.        , 0.04892623,
         0.        , 1.        , 0.        , 0.        , 0.71655846,
         1.        , 0.        , 0.        , 0.        , 0.34735984,
         1.        , 0.        , 0.        , 0.        , 0.34166035,
         0.        , 0.        , 1.        , 0.        , 0.0420079 ,
         0.        , 0.        , 1.        , 0.        , 0.21867581,
         0.        , 0.        ]), 0.0, False], (37,))

In [11]:
batch[0][1].shape

(37,)

In [12]:
batch[0]

[0, array([0.        , 0.        , 1.        , 0.        , 0.07196155,
        0.        , 0.        , 1.        , 0.        , 0.04892623,
        0.        , 1.        , 0.        , 0.        , 0.71655846,
        1.        , 0.        , 0.        , 0.        , 0.34735984,
        1.        , 0.        , 0.        , 0.        , 0.34166035,
        0.        , 0.        , 1.        , 0.        , 0.0420079 ,
        0.        , 0.        , 1.        , 0.        , 0.21867581,
        0.        , 0.        ]), 0.0, False]

In [13]:
actions = np.array([each[0] for each in batch])
states = np.array([each[1] for each in batch])
rewards = np.array([each[2] for each in batch])
dones = np.array([each[3] for each in batch])
# infos = np.array([each[4] for each in batch])

In [14]:
# print(rewards[:])
print(rewards.shape, states.shape, actions.shape, dones.shape)
print(rewards.dtype, states.dtype, actions.dtype, dones.dtype)
print(np.max(actions), np.min(actions), (np.max(actions) - np.min(actions)) + 1)
print(np.max(rewards), np.min(rewards))
print(np.max(states), np.min(states))

(300,) (300, 37) (300,) (300,)
float64 float64 int64 bool
3 0 4
0.0 -1.0
10.589945793151855 -10.711225509643555
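
The per-field list comprehensions above can also be written as a single transpose. The sketch below shows that alternative on a hypothetical three-step batch with 3-dimensional states (instead of 37) in the same `[action, state, reward, done]` layout; the data is made up for illustration.

```python
import numpy as np

# Hypothetical mini-batch in the same [action, state, reward, done] layout.
batch = [
    [0, np.array([0.0, 1.0, 0.5]), 0.0, False],
    [3, np.array([1.0, 0.0, 0.2]), -1.0, False],
    [2, np.array([0.0, 0.0, 0.9]), 0.0, True],
]

# zip(*batch) transposes the list of transitions into one tuple per field.
actions, states, rewards, dones = (np.array(field) for field in zip(*batch))
print(actions.shape, states.shape, rewards.shape, dones.shape)
```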


In [16]:
# Data of the model
def model_input(state_size):
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    actions = tf.placeholder(tf.int32, [None], name='actions')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    rewards = tf.placeholder(tf.float32, [None], name='rewards')
    rate = tf.placeholder(tf.float32, [], name='rate')
    return states, actions, targetQs, rewards, rate

In [17]:
# Generator/Controller: Generating/predicting the actions
def generator(states, action_size, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=action_size)        
        #predictions = tf.nn.softmax(logits)

        # return actions logits
        return logits
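
The `tf.maximum(alpha * bn, bn)` lines in the generator act as a leaky ReLU: positive inputs pass through unchanged and negative inputs are scaled by `alpha`. A minimal NumPy sketch of the same expression, assuming `0 <= alpha <= 1` (the `leaky_relu` name and the sample inputs are illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU written as max(alpha * x, x); valid for 0 <= alpha <= 1."""
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y = leaky_relu(x)
print(y)
```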

In [18]:
# Discriminator/Dopamine: Reward function/planner/navigator/advisor/supervisor/cortical columns
def discriminator(states, actions, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('discriminator', reuse=reuse):
        # Fusion/merge states and actions/ SA/ SM
        x_fused = tf.concat(axis=1, values=[states, actions])
        
        # First fully connected layer
        h1 = tf.layers.dense(inputs=x_fused, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=1)        
        #predictions = tf.nn.softmax(logits)

        # return rewards logits
        return logits

In [19]:
def model_loss(action_size, hidden_size, states, actions, targetQs, rewards, rate):
    # G
    actions_logits = generator(states=states, hidden_size=hidden_size, action_size=action_size)
    actions_labels = tf.one_hot(indices=actions, depth=action_size, dtype=actions_logits.dtype)
    neg_log_prob_actions = tf.nn.softmax_cross_entropy_with_logits_v2(logits=actions_logits, 
                                                                      labels=actions_labels)
    Qs_labels = rewards[:-1] + (0.99 * targetQs[1:])
    #g_loss = tf.reduce_mean(neg_log_prob_actions[:-1] * targetQs[1:])
    g_loss = tf.reduce_mean(neg_log_prob_actions[:-1] * Qs_labels)
    
    # D
    Qs_logits = discriminator(actions=actions_logits, hidden_size=hidden_size, states=states)
    d_lossR = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Qs_logits,
                                                                     labels=rate * tf.ones_like(Qs_logits)))
    d_lossQ = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=tf.reshape(Qs_logits[:-1], shape=[-1]),
                                                                     labels=tf.nn.sigmoid(Qs_labels)))
    d_loss = d_lossR + d_lossQ

    return actions_logits, Qs_logits, g_loss, d_loss, d_lossR, d_lossQ
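
The `Qs_labels` line in `model_loss` pairs each step's reward with the discounted value estimate of the following step, so the last step, which has no successor, is dropped. The NumPy sketch below reproduces that slicing on a hypothetical four-step episode; the helper name and the numbers are illustrative, and `0.99` is the discount factor hard-coded in the cell above.

```python
import numpy as np

GAMMA = 0.99  # discount factor hard-coded in model_loss

def one_step_targets(rewards, q_values, gamma=GAMMA):
    """Return rewards[t] + gamma * q_values[t + 1] for every step except the last."""
    rewards = np.asarray(rewards, dtype=np.float64)
    q_values = np.asarray(q_values, dtype=np.float64)
    return rewards[:-1] + gamma * q_values[1:]

# Hypothetical 4-step episode.
rewards = [0.0, 1.0, 0.0, -1.0]
q_values = [0.5, 0.2, 0.1, 0.4]
targets = one_step_targets(rewards, q_values)
print(targets)
```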

In [20]:
# Optimizing/training/learning G & D
def model_opt(g_loss, d_loss, learning_rate):
    """
    Get optimization operations in order
    :param g_loss: Generator loss Tensor for action prediction
    :param d_loss: Discriminator loss Tensor for reward prediction for generated/prob/logits action
    :param learning_rate: Learning rate passed to both Adam optimizers
    :return: A tuple of (generator training op, discriminator training op)
    """
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
        g_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)
        d_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)

    return g_opt, d_opt

In [21]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs, self.rewards, self.rate = model_input(state_size=state_size)

        # Create the Model: calculating the loss and forward pass
        self.actions_logits, self.Qs_logits, self.g_loss, self.d_loss, self.d_lossR, self.d_lossQ = model_loss(
            action_size=action_size, hidden_size=hidden_size, # model init parameters
            states=self.states, actions=self.actions, # model input
            targetQs=self.targetQs, rewards=self.rewards, rate=self.rate) # model input
        
        # Update the model: backward pass and backprop
        self.g_opt, self.d_opt = model_opt(g_loss=self.g_loss, d_loss=self.d_loss, learning_rate=learning_rate)

In [22]:
print('state size:{}'.format(states.shape), 
      'actions:{}'.format(actions.shape)) 
print('action size:{}'.format(np.max(actions) - np.min(actions)+1))

state size:(300, 37) actions:(300,)
action size:4


In [23]:
# Training parameters
# Network parameters
state_size = 37              # number of units for the input state/observation -- simulation
action_size = 4              # number of units for the output actions -- simulation
hidden_size = 37*16          # number of units in each Q-network hidden layer -- simulation
learning_rate = 0.001          # learning rate for adam

In [24]:
# Reset/init the graph/session
tf.reset_default_graph()        # returns None, so fetch the fresh default graph explicitly
graph = tf.get_default_graph()

# Init the model
model = Model(action_size=action_size, hidden_size=hidden_size, state_size=state_size, learning_rate=learning_rate)

In [25]:
env_info = env.reset(train_mode=True)[brain_name] # reset the environment

while True: # loop until the episode ends
#for _ in range(batch_size):
    state = env_info.vector_observations[0]   # get the current state
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    #memory.buffer.append([action, state, done])
    if done:                                       # exit loop if episode finished
        break
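
The training cell that follows scales each episode's total reward by the target score of `+13`, clips it to `[-1, 1]`, maps it onto `[0, 1]`, and tracks a running mean over the last 100 episodes as its stopping criterion. The sketch below isolates those steps; the helper names and the example scores are hypothetical, and `np.clip` stands in for the two `if` statements used in the notebook.

```python
from collections import deque

import numpy as np

GOAL = 13.0  # target score used by the training cell

def scale_rate(total_reward, goal=GOAL):
    """Divide the episode score by the goal and clip the result to [-1, 1]."""
    return float(np.clip(total_reward / goal, -1.0, 1.0))

def rate_to_prob(rate):
    """Map a rate in [-1, 1] linearly onto [0, 1]."""
    return (rate + 1.0) / 2.0

# Rolling window over the most recent 100 episode scores.
window = deque(maxlen=100)
for score in [0.0, -1.0, 2.0, 20.0]:  # hypothetical episode scores
    window.append(score)

running_mean = float(np.mean(window))
solved = running_mean >= GOAL         # stopping criterion of the training loop
print(scale_rate(20.0), rate_to_prob(scale_rate(-13.0)), running_mean, solved)
```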

In [None]:
from collections import deque
episodes_total_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
saver = tf.train.Saver()
rewards_list, g_loss_list, d_loss_list = [], [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
    # Training episodes/epochs
    for ep in range(111111):
        batch = [] # every data batch
        total_reward = 0
        #state = env.reset() # env first state
        env_info = env.reset(train_mode=True)[brain_name] # reset the environment

        # Training steps/batches
        while True:
            state = env_info.vector_observations[0]   # get the current state
            action_logits, Q_logits = sess.run(fetches=[model.actions_logits, model.Qs_logits], 
                                               feed_dict={model.states: np.reshape(state, [1, -1])})
            action = np.argmax(action_logits)
            #state, reward, done, _ = env.step(action)
            env_info = env.step(action)[brain_name]        # send the action to the environment
            reward = env_info.rewards[0]                   # get the reward
            batch.append([state, action, Q_logits, reward])
            total_reward += reward
            done = env_info.local_done[0]                  # see if episode has finished
            if done: # episode ended success/failure
                episodes_total_reward.append(total_reward) # stopping criteria
                #rate = total_reward/ 500 # success is 500 points, rate is between 0 and +1 ~ sigmoid
                rate = total_reward/ +13 # success is +13; rate is between -1 and +1 ~ tanh
                if rate >= +1: rate = +1
                if rate <= -1: rate = -1
                # min -13, max +13
                # min -1, max +1
                #reward = x-(-13)/ 26 # 0-1
                #prob_rate = (rate - (-1))/ (1-(-1))
                #prob_rate = (rate - rate_min)/ (rate_max-rate_min)
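                rate_prob = (rate + 1)/2 # success rate 0-1
                # NOTE: rate_prob is not used in this cell; the feed_dict below passes `rate`
                # (range [-1, 1]) as the label, so the sigmoid cross-entropy term dlossR can go negative.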
                break

        # Training using batches
        #batch = memory.buffer
        states = np.array([each[0] for each in batch])
        actions = np.array([each[1] for each in batch])
        targetQs = np.array([each[2] for each in batch])
        rewards = np.array([each[3] for each in batch])
        g_loss, d_loss, d_lossR, d_lossQ, _, _ = sess.run([model.g_loss, model.d_loss, 
                                                           model.d_lossR, model.d_lossQ, 
                                                           model.g_opt, model.d_opt],
                                                          feed_dict = {model.states: states, 
                                                                       model.actions: actions,
                                                                       model.targetQs: targetQs.reshape([-1]),
                                                                       model.rewards: rewards, 
                                                                       model.rate: rate})
        # Average 100 episode total reward
        # Print out
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episodes_total_reward)),
              'rate:{:.4f}'.format(rate),
              'gloss:{:.4f}'.format(g_loss),
              'dloss:{:.4f}'.format(d_loss),
              'dlossR:{:.4f}'.format(d_lossR),
              'dlossQ:{:.4f}'.format(d_lossQ))
        # Collect values for plotting
        rewards_list.append([ep, np.mean(episodes_total_reward)])
        g_loss_list.append([ep, g_loss])
        d_loss_list.append([ep, d_loss])
        # Break episode/epoch loop
        if np.mean(episodes_total_reward) >= +13:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model-nav.ckpt')

Episode:0 meanR:-1.0000 rate:-0.0769 gloss:-0.0960 dloss:1.3444 dlossR:0.6518 dlossQ:0.6925
Episode:1 meanR:-1.0000 rate:-0.0769 gloss:-1.0128 dloss:0.8274 dlossR:0.2438 dlossQ:0.5836
Episode:2 meanR:-1.0000 rate:-0.0769 gloss:-1.9897 dloss:0.3188 dlossR:-0.0569 dlossQ:0.3757
Episode:3 meanR:-1.0000 rate:-0.0769 gloss:-2.2554 dloss:0.3928 dlossR:-0.0129 dlossQ:0.4057
Episode:4 meanR:-0.8000 rate:0.0000 gloss:-2.7214 dloss:0.4067 dlossR:0.0958 dlossQ:0.3109
Episode:5 meanR:-0.8333 rate:-0.0769 gloss:-3.4543 dloss:0.0417 dlossR:-0.1692 dlossQ:0.2110
Episode:6 meanR:-0.7143 rate:0.0000 gloss:-4.2276 dloss:0.1659 dlossR:0.0290 dlossQ:0.1369
Episode:7 meanR:-0.6250 rate:0.0000 gloss:-4.7804 dloss:0.0998 dlossR:0.0159 dlossQ:0.0839
Episode:8 meanR:-0.5556 rate:0.0000 gloss:-6.7373 dloss:0.0364 dlossR:0.0048 dlossQ:0.0316
Episode:9 meanR:-0.4000 rate:0.0769 gloss:-6.3728 dloss:0.6303 dlossR:0.6170 dlossQ:0.0133
Episode:10 meanR:-0.4545 rate:-0.0769 gloss:-6.2946 dloss:-0.6500 dlossR:-0.6578 d

Episode:88 meanR:-0.1011 rate:0.0000 gloss:-22.3054 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:89 meanR:-0.1222 rate:-0.1538 gloss:-23.0185 dloss:-2.7985 dlossR:-2.7986 dlossQ:0.0000
Episode:90 meanR:-0.1209 rate:0.0000 gloss:-28.3139 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:91 meanR:-0.1413 rate:-0.1538 gloss:-32.0400 dloss:-3.8123 dlossR:-3.8123 dlossQ:0.0000
Episode:92 meanR:-0.1505 rate:-0.0769 gloss:-28.9348 dloss:-1.7199 dlossR:-1.7200 dlossQ:0.0000
Episode:93 meanR:-0.1702 rate:-0.1538 gloss:-31.6225 dloss:-3.7509 dlossR:-3.7509 dlossQ:0.0000
Episode:94 meanR:-0.1789 rate:-0.0769 gloss:-39.7995 dloss:-2.3477 dlossR:-2.3477 dlossQ:0.0000
Episode:95 meanR:-0.1875 rate:-0.0769 gloss:-43.2032 dloss:-2.5325 dlossR:-2.5325 dlossQ:0.0000
Episode:96 meanR:-0.1753 rate:0.0769 gloss:-50.5707 dloss:3.0322 dlossR:3.0322 dlossQ:0.0000
Episode:97 meanR:-0.1735 rate:0.0000 gloss:-48.0225 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:98 meanR:-0.1818 rate:-0.0769 gloss:-40.7710

Episode:174 meanR:-0.2600 rate:0.2308 gloss:-447.9164 dloss:76.4020 dlossR:76.4020 dlossQ:0.0000
Episode:175 meanR:-0.2600 rate:-0.0769 gloss:-399.7676 dloss:-22.7611 dlossR:-22.7611 dlossQ:0.0000
Episode:176 meanR:-0.2600 rate:0.0000 gloss:-369.2938 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:177 meanR:-0.2500 rate:0.0769 gloss:-373.4454 dloss:21.3266 dlossR:21.3266 dlossQ:0.0000
Episode:178 meanR:-0.2500 rate:0.0000 gloss:-379.1991 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:179 meanR:-0.2500 rate:0.0000 gloss:-323.5201 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:180 meanR:-0.2800 rate:-0.2308 gloss:-360.5891 dloss:-61.3709 dlossR:-61.3709 dlossQ:0.0000
Episode:181 meanR:-0.2800 rate:0.0000 gloss:-360.7635 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:182 meanR:-0.2900 rate:-0.0769 gloss:-387.2152 dloss:-22.0906 dlossR:-22.0906 dlossQ:0.0000
Episode:183 meanR:-0.2900 rate:0.0000 gloss:-387.1479 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:184 meanR:-0.2900 rate:

Episode:259 meanR:-0.2300 rate:0.0769 gloss:-1175.0355 dloss:66.3027 dlossR:66.3027 dlossQ:0.0000
Episode:260 meanR:-0.2200 rate:0.0000 gloss:-1043.8257 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:261 meanR:-0.2400 rate:-0.1538 gloss:-1100.7161 dloss:-124.8912 dlossR:-124.8912 dlossQ:0.0000
Episode:262 meanR:-0.2200 rate:0.0000 gloss:-1097.1277 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:263 meanR:-0.2000 rate:0.0000 gloss:-1092.0068 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:264 meanR:-0.2300 rate:-0.3077 gloss:-1337.6400 dloss:-302.6616 dlossR:-302.6616 dlossQ:0.0000
Episode:265 meanR:-0.2300 rate:0.0000 gloss:-1237.0875 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:266 meanR:-0.2000 rate:0.0769 gloss:-1359.6835 dloss:77.4857 dlossR:77.4857 dlossQ:0.0000
Episode:267 meanR:-0.1800 rate:0.0000 gloss:-1387.4218 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:268 meanR:-0.1800 rate:0.0000 gloss:-1536.4841 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:269 meanR:-0.2

Episode:343 meanR:-0.1400 rate:-0.0769 gloss:-1078.4763 dloss:-60.7371 dlossR:-60.7371 dlossQ:0.0000
Episode:344 meanR:-0.1400 rate:0.0000 gloss:-1087.7310 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:345 meanR:-0.1300 rate:-0.0769 gloss:-1480.0924 dloss:-83.2410 dlossR:-83.2410 dlossQ:0.0000
Episode:346 meanR:-0.1300 rate:-0.0769 gloss:-1581.4884 dloss:-88.7949 dlossR:-88.7949 dlossQ:0.0000
Episode:347 meanR:-0.1400 rate:-0.0769 gloss:-1529.4735 dloss:-86.0644 dlossR:-86.0644 dlossQ:0.0000
Episode:348 meanR:-0.1300 rate:0.0769 gloss:-1532.7257 dloss:86.2775 dlossR:86.2775 dlossQ:0.0000
Episode:349 meanR:-0.1300 rate:0.0000 gloss:-1445.9327 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:350 meanR:-0.1100 rate:0.0000 gloss:-1475.1337 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:351 meanR:-0.1300 rate:-0.1538 gloss:-1375.8595 dloss:-155.1231 dlossR:-155.1231 dlossQ:0.0000
Episode:352 meanR:-0.1300 rate:-0.0769 gloss:-1477.3549 dloss:-83.1807 dlossR:-83.1807 dlossQ:0.0000

Episode:427 meanR:-0.0200 rate:0.0769 gloss:-1270.3295 dloss:71.4443 dlossR:71.4443 dlossQ:0.0000
Episode:428 meanR:-0.0500 rate:-0.0769 gloss:-1484.8170 dloss:-83.3979 dlossR:-83.3979 dlossQ:0.0000
Episode:429 meanR:-0.0400 rate:0.1538 gloss:-1324.5264 dloss:148.7534 dlossR:148.7534 dlossQ:0.0000
Episode:430 meanR:-0.0700 rate:-0.1538 gloss:-1188.9473 dloss:-133.6135 dlossR:-133.6135 dlossQ:0.0000
Episode:431 meanR:-0.0500 rate:0.1538 gloss:-1429.7743 dloss:160.5686 dlossR:160.5686 dlossQ:0.0000
Episode:432 meanR:-0.0500 rate:0.0000 gloss:-1738.6239 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:433 meanR:-0.0400 rate:0.0000 gloss:-1429.0695 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:434 meanR:-0.0200 rate:0.0769 gloss:-1705.1191 dloss:95.6519 dlossR:95.6519 dlossQ:0.0000
Episode:435 meanR:-0.0400 rate:0.0000 gloss:-1567.4060 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:436 meanR:-0.0800 rate:-0.3846 gloss:-1190.7365 dloss:-334.5394 dlossR:-334.5394 dlossQ:0.0000

Episode:511 meanR:-0.0100 rate:0.0000 gloss:-1350.2809 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:512 meanR:-0.0300 rate:-0.0769 gloss:-1285.3473 dloss:-72.1069 dlossR:-72.1069 dlossQ:0.0000
Episode:513 meanR:-0.0300 rate:0.0000 gloss:-1402.0217 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:514 meanR:-0.0300 rate:0.0000 gloss:-1237.4478 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:515 meanR:-0.0300 rate:0.0000 gloss:-1005.5048 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:516 meanR:-0.0300 rate:0.0000 gloss:-1204.1154 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:517 meanR:0.0200 rate:0.3077 gloss:-987.9174 dloss:221.5622 dlossR:221.5622 dlossQ:0.0000
Episode:518 meanR:0.0700 rate:0.0769 gloss:-949.5247 dloss:53.3119 dlossR:53.3119 dlossQ:0.0000
Episode:519 meanR:0.1100 rate:0.0000 gloss:-1368.3026 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:520 meanR:0.1200 rate:0.0769 gloss:-1058.7081 dloss:59.3644 dlossR:59.3644 dlossQ:0.0000
Episode:521 meanR:0.1000 rate:-0.

Episode:596 meanR:0.0200 rate:0.0769 gloss:-1109.3406 dloss:62.2924 dlossR:62.2924 dlossQ:0.0000
Episode:597 meanR:0.0200 rate:0.0000 gloss:-1125.5854 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:598 meanR:-0.0400 rate:0.0769 gloss:-1350.0594 dloss:75.7491 dlossR:75.7491 dlossQ:0.0000
Episode:599 meanR:-0.0300 rate:0.0000 gloss:-1478.8955 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:600 meanR:-0.0300 rate:0.0000 gloss:-1339.5002 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:601 meanR:-0.0300 rate:0.0000 gloss:-1148.3549 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:602 meanR:-0.0300 rate:0.0000 gloss:-1317.1490 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:603 meanR:-0.0400 rate:-0.0769 gloss:-1211.3918 dloss:-67.9417 dlossR:-67.9417 dlossQ:0.0000
Episode:604 meanR:-0.0500 rate:-0.1538 gloss:-1283.5492 dloss:-143.9723 dlossR:-143.9723 dlossQ:0.0000
Episode:605 meanR:-0.0600 rate:0.0000 gloss:-1100.0135 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:606 meanR:-0.0400 

Episode:681 meanR:0.0400 rate:0.0000 gloss:-1026.7194 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:682 meanR:0.0400 rate:0.0000 gloss:-975.4412 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:683 meanR:0.0600 rate:0.2308 gloss:-887.8188 dloss:149.4717 dlossR:149.4717 dlossQ:0.0000
Episode:684 meanR:0.0700 rate:0.1538 gloss:-848.8059 dloss:95.3968 dlossR:95.3968 dlossQ:0.0000
Episode:685 meanR:0.0700 rate:0.0769 gloss:-979.8604 dloss:54.9393 dlossR:54.9393 dlossQ:0.0000
Episode:686 meanR:0.0800 rate:-0.0769 gloss:-848.9452 dloss:-47.7313 dlossR:-47.7313 dlossQ:0.0000
Episode:687 meanR:0.0900 rate:0.0000 gloss:-991.2678 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:688 meanR:0.0800 rate:0.0000 gloss:-1289.1023 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:689 meanR:0.0900 rate:0.0000 gloss:-1319.1196 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:690 meanR:0.1200 rate:0.0769 gloss:-1140.7622 dloss:63.9553 dlossR:63.9553 dlossQ:0.0000
Episode:691 meanR:0.0900 rate:-0.0769 glo

Episode:767 meanR:0.3600 rate:0.0000 gloss:-583.0620 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:768 meanR:0.3500 rate:-0.0769 gloss:-374.3598 dloss:-21.0474 dlossR:-21.0474 dlossQ:0.0000
Episode:769 meanR:0.3200 rate:0.0769 gloss:-350.2140 dloss:19.6086 dlossR:19.6086 dlossQ:0.0000
Episode:770 meanR:0.3200 rate:0.0000 gloss:-361.2056 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:771 meanR:0.3300 rate:0.0769 gloss:-375.7027 dloss:21.1182 dlossR:21.1182 dlossQ:0.0000
Episode:772 meanR:0.3300 rate:0.0000 gloss:-320.7170 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:773 meanR:0.3300 rate:0.0769 gloss:-359.6692 dloss:20.1991 dlossR:20.1991 dlossQ:0.0000
Episode:774 meanR:0.3400 rate:0.0769 gloss:-367.1304 dloss:20.5829 dlossR:20.5829 dlossQ:0.0000
Episode:775 meanR:0.3300 rate:0.0000 gloss:-499.4108 dloss:0.0000 dlossR:0.0000 dlossQ:0.0000
Episode:776 meanR:0.3300 rate:0.1538 gloss:-517.7791 dloss:58.0466 dlossR:58.0466 dlossQ:0.0000
Episode:777 meanR:0.3200 rate:-0.1538 gloss:-

Episode:853 meanR:0.2300 rate:0.0769 gloss:-131.7823 dloss:8.4290 dlossR:7.7353 dlossQ:0.6937
Episode:854 meanR:0.2400 rate:0.1538 gloss:-91.4142 dloss:12.2171 dlossR:11.4020 dlossQ:0.8151
Episode:855 meanR:0.3000 rate:0.5385 gloss:-91.0146 dloss:37.8485 dlossR:37.0636 dlossQ:0.7850
Episode:856 meanR:0.2900 rate:-0.0769 gloss:-60.0790 dloss:1.7886 dlossR:-0.1389 dlossQ:1.9276
Episode:857 meanR:0.2900 rate:0.0000 gloss:-54.5755 dloss:5.5006 dlossR:3.4009 dlossQ:2.0997
Episode:858 meanR:0.2600 rate:-0.2308 gloss:-56.3348 dloss:-4.9660 dlossR:-7.0627 dlossQ:2.0967
Episode:859 meanR:0.3200 rate:0.5385 gloss:-57.3140 dloss:29.2410 dlossR:26.3358 dlossQ:2.9053
Episode:860 meanR:0.3700 rate:0.4615 gloss:-55.6066 dloss:26.2788 dlossR:22.5684 dlossQ:3.7104
Episode:861 meanR:0.4000 rate:0.3846 gloss:-78.4146 dloss:27.3583 dlossR:24.6094 dlossQ:2.7489
Episode:862 meanR:0.3800 rate:-0.1538 gloss:-99.8030 dloss:-7.4333 dlossR:-9.9403 dlossQ:2.5070
Episode:863 meanR:0.3900 rate:0.0769 gloss:-239.364

Episode:940 meanR:1.2900 rate:-0.0769 gloss:-21.9936 dloss:-1.2485 dlossR:-1.2498 dlossQ:0.0013
Episode:941 meanR:1.2600 rate:-0.1538 gloss:-139.0272 dloss:-15.5944 dlossR:-15.5951 dlossQ:0.0008
Episode:942 meanR:1.2400 rate:-0.0769 gloss:-50.7199 dloss:-2.8620 dlossR:-2.8623 dlossQ:0.0003
Episode:943 meanR:1.1900 rate:-0.0769 gloss:-23.8521 dloss:-1.3527 dlossR:-1.3535 dlossQ:0.0008
Episode:944 meanR:1.1900 rate:-0.0769 gloss:-19.9021 dloss:-1.1216 dlossR:-1.1222 dlossQ:0.0006
Episode:945 meanR:1.2100 rate:0.1538 gloss:-19.2694 dloss:2.2252 dlossR:2.2174 dlossQ:0.0078
Episode:946 meanR:1.2300 rate:0.3846 gloss:-21.5767 dloss:6.2034 dlossR:6.1955 dlossQ:0.0079
Episode:947 meanR:1.2800 rate:0.3846 gloss:-18.9010 dloss:5.3639 dlossR:5.3620 dlossQ:0.0019
Episode:948 meanR:1.3000 rate:0.1538 gloss:-22.2992 dloss:2.5141 dlossR:2.5121 dlossQ:0.0020
Episode:949 meanR:1.2800 rate:0.2308 gloss:-18.7416 dloss:3.2778 dlossR:3.2667 dlossQ:0.0111
Episode:950 meanR:1.3200 rate:0.1538 gloss:-17.4518 

Episode:1028 meanR:1.8000 rate:0.4615 gloss:-1.4379 dloss:1.4902 dlossR:0.8536 dlossQ:0.6366
Episode:1029 meanR:1.7900 rate:0.0769 gloss:-1.4114 dloss:1.0464 dlossR:0.4170 dlossQ:0.6294
Episode:1030 meanR:1.8000 rate:0.1538 gloss:-1.5933 dloss:1.1352 dlossR:0.5047 dlossQ:0.6305
Episode:1031 meanR:1.7900 rate:0.0000 gloss:-1.8225 dloss:0.8246 dlossR:0.2587 dlossQ:0.5659
Episode:1032 meanR:1.8100 rate:0.2308 gloss:-2.0922 dloss:1.1770 dlossR:0.6133 dlossQ:0.5636
Episode:1033 meanR:1.8000 rate:0.0000 gloss:-53.6066 dloss:0.0970 dlossR:0.0111 dlossQ:0.0859
Episode:1034 meanR:1.8000 rate:0.0769 gloss:-2.7039 dloss:0.7297 dlossR:0.3062 dlossQ:0.4235
Episode:1035 meanR:1.8100 rate:0.0769 gloss:-3.8069 dloss:0.7893 dlossR:0.3488 dlossQ:0.4405
Episode:1036 meanR:1.8000 rate:-0.0769 gloss:-3.4877 dloss:0.2155 dlossR:-0.1144 dlossQ:0.3299
Episode:1037 meanR:1.8300 rate:0.1538 gloss:-3.2841 dloss:0.8547 dlossR:0.4950 dlossQ:0.3597
Episode:1038 meanR:1.8600 rate:0.2308 gloss:-3.7408 dloss:1.0068 dl

Episode:1116 meanR:2.1700 rate:0.0000 gloss:-2.1832 dloss:1.3458 dlossR:0.4230 dlossQ:0.9228
Episode:1117 meanR:2.1900 rate:0.2308 gloss:-3.2332 dloss:1.9310 dlossR:0.9967 dlossQ:0.9343
Episode:1118 meanR:2.2200 rate:0.3077 gloss:-1.5427 dloss:1.6395 dlossR:0.8258 dlossQ:0.8137
Episode:1119 meanR:2.2000 rate:0.0000 gloss:-21.4873 dloss:0.4759 dlossR:0.1222 dlossQ:0.3537
Episode:1120 meanR:2.1800 rate:0.1538 gloss:-1.1184 dloss:1.3605 dlossR:0.5949 dlossQ:0.7655
Episode:1121 meanR:2.1300 rate:0.1538 gloss:-3.2395 dloss:1.5545 dlossR:0.7604 dlossQ:0.7940
Episode:1122 meanR:2.1300 rate:0.1538 gloss:-1.6668 dloss:1.2525 dlossR:0.5573 dlossQ:0.6951
Episode:1123 meanR:2.1100 rate:0.0769 gloss:-2.6816 dloss:1.1791 dlossR:0.4346 dlossQ:0.7444
Episode:1124 meanR:2.0900 rate:0.0000 gloss:-2.3534 dloss:0.9630 dlossR:0.2905 dlossQ:0.6725
Episode:1125 meanR:2.0700 rate:0.0769 gloss:-1.8973 dloss:1.0935 dlossR:0.4145 dlossQ:0.6789
Episode:1126 meanR:2.0700 rate:0.1538 gloss:-2.7432 dloss:1.1167 dlos

Episode:1204 meanR:2.2900 rate:0.3077 gloss:-2.1489 dloss:1.4352 dlossR:0.7978 dlossQ:0.6374
Episode:1205 meanR:2.3500 rate:0.2308 gloss:-1.7803 dloss:1.2215 dlossR:0.6086 dlossQ:0.6129
Episode:1206 meanR:2.3900 rate:0.3077 gloss:-1.7827 dloss:1.4544 dlossR:0.7804 dlossQ:0.6740
Episode:1207 meanR:2.3400 rate:0.2308 gloss:-1.3224 dloss:1.3474 dlossR:0.6575 dlossQ:0.6899
Episode:1208 meanR:2.3100 rate:0.3077 gloss:-2.5089 dloss:1.7232 dlossR:0.9666 dlossQ:0.7566
Episode:1209 meanR:2.3000 rate:0.3846 gloss:-0.9319 dloss:1.4133 dlossR:0.7216 dlossQ:0.6916
Episode:1210 meanR:2.3200 rate:0.4615 gloss:-1.1931 dloss:1.6855 dlossR:0.9427 dlossQ:0.7428
Episode:1211 meanR:2.3000 rate:0.1538 gloss:-1.5301 dloss:1.2743 dlossR:0.5834 dlossQ:0.6909
Episode:1212 meanR:2.2700 rate:0.3077 gloss:-2.1435 dloss:1.6816 dlossR:0.8931 dlossQ:0.7885
Episode:1213 meanR:2.2800 rate:0.0769 gloss:-2.8565 dloss:1.2466 dlossR:0.5279 dlossQ:0.7187
Episode:1214 meanR:2.3300 rate:0.4615 gloss:-1.3882 dloss:1.5109 dloss

Episode:1293 meanR:2.4900 rate:-0.0769 gloss:-1.7851 dloss:0.8202 dlossR:0.2410 dlossQ:0.5792
Episode:1294 meanR:2.4600 rate:0.1538 gloss:-1.9938 dloss:1.0069 dlossR:0.4658 dlossQ:0.5411
Episode:1295 meanR:2.5000 rate:0.3846 gloss:-2.3834 dloss:1.2770 dlossR:0.8405 dlossQ:0.4365
Episode:1296 meanR:2.4900 rate:0.1538 gloss:-2.2634 dloss:0.8899 dlossR:0.4358 dlossQ:0.4541
Episode:1297 meanR:2.5100 rate:0.3077 gloss:-2.9067 dloss:1.2365 dlossR:0.8152 dlossQ:0.4213
Episode:1298 meanR:2.5600 rate:0.4615 gloss:-3.7318 dloss:1.9246 dlossR:1.4381 dlossQ:0.4865
Episode:1299 meanR:2.5200 rate:-0.0769 gloss:-6.2570 dloss:0.3540 dlossR:-0.2001 dlossQ:0.5541
Episode:1300 meanR:2.4900 rate:0.0769 gloss:-7.6981 dloss:1.1384 dlossR:0.5772 dlossQ:0.5612
Episode:1301 meanR:2.5200 rate:0.2308 gloss:-1.7670 dloss:1.0933 dlossR:0.5545 dlossQ:0.5388
Episode:1302 meanR:2.5100 rate:0.1538 gloss:-5.1915 dloss:1.3690 dlossR:0.7833 dlossQ:0.5857
Episode:1303 meanR:2.4600 rate:0.0000 gloss:-2.6205 dloss:0.7817 dl

Episode:1381 meanR:2.2300 rate:0.4615 gloss:-1.3604 dloss:1.4156 dlossR:0.7996 dlossQ:0.6160
Episode:1382 meanR:2.2100 rate:0.0000 gloss:-2.0446 dloss:0.7635 dlossR:0.2398 dlossQ:0.5236
Episode:1383 meanR:2.1900 rate:-0.0769 gloss:-1.9033 dloss:0.6817 dlossR:0.1468 dlossQ:0.5349
Episode:1384 meanR:2.1800 rate:0.0769 gloss:-3.1205 dloss:0.8345 dlossR:0.3575 dlossQ:0.4770
Episode:1385 meanR:2.1600 rate:0.2308 gloss:-2.6287 dloss:1.0876 dlossR:0.6231 dlossQ:0.4645
Episode:1386 meanR:2.1300 rate:-0.0769 gloss:-6.4149 dloss:0.1558 dlossR:-0.2701 dlossQ:0.4259
Episode:1387 meanR:2.1400 rate:0.0769 gloss:-5.1775 dloss:0.8479 dlossR:0.4141 dlossQ:0.4338
Episode:1388 meanR:2.1200 rate:0.0769 gloss:-3.7507 dloss:0.7167 dlossR:0.3341 dlossQ:0.3826
Episode:1389 meanR:2.1300 rate:0.0769 gloss:-5.5935 dloss:0.7071 dlossR:0.3905 dlossQ:0.3166
Episode:1390 meanR:2.1200 rate:0.3846 gloss:-4.5905 dloss:1.7525 dlossR:1.4029 dlossQ:0.3496
Episode:1391 meanR:2.0900 rate:-0.0769 gloss:-4.6167 dloss:0.0941 d

Episode:1469 meanR:1.9200 rate:0.2308 gloss:-2.6106 dloss:1.0423 dlossR:0.5989 dlossQ:0.4434
Episode:1470 meanR:1.9400 rate:0.2308 gloss:-1.9138 dloss:1.1909 dlossR:0.6404 dlossQ:0.5505
Episode:1471 meanR:1.9500 rate:0.2308 gloss:-0.9820 dloss:1.6102 dlossR:0.8729 dlossQ:0.7373
Episode:1472 meanR:1.8900 rate:0.0769 gloss:-1.8618 dloss:0.9376 dlossR:0.3771 dlossQ:0.5605
Episode:1473 meanR:1.8400 rate:0.0000 gloss:-2.0201 dloss:0.7378 dlossR:0.2215 dlossQ:0.5163
Episode:1474 meanR:1.8800 rate:0.6154 gloss:-1.9206 dloss:1.6171 dlossR:1.1007 dlossQ:0.5163
Episode:1475 meanR:1.8900 rate:0.2308 gloss:-2.0313 dloss:1.0443 dlossR:0.5529 dlossQ:0.4914
Episode:1476 meanR:1.8700 rate:0.0000 gloss:-4.4953 dloss:0.6549 dlossR:0.1591 dlossQ:0.4958
Episode:1477 meanR:1.9200 rate:0.4615 gloss:-1.9946 dloss:1.3934 dlossR:0.8951 dlossQ:0.4984
Episode:1478 meanR:1.9400 rate:0.1538 gloss:-5.9773 dloss:1.3703 dlossR:0.8300 dlossQ:0.5403
Episode:1479 meanR:1.9600 rate:0.3077 gloss:-2.3399 dloss:1.3599 dloss

Episode:1557 meanR:1.6600 rate:0.0000 gloss:-3.0283 dloss:0.5261 dlossR:0.1362 dlossQ:0.3899
Episode:1558 meanR:1.6900 rate:0.2308 gloss:-2.4771 dloss:1.0070 dlossR:0.5810 dlossQ:0.4260
Episode:1559 meanR:1.7000 rate:0.2308 gloss:-2.9902 dloss:1.0327 dlossR:0.6383 dlossQ:0.3944
Episode:1560 meanR:1.7100 rate:0.2308 gloss:-2.6680 dloss:0.9986 dlossR:0.5956 dlossQ:0.4030
Episode:1561 meanR:1.6900 rate:0.0769 gloss:-4.3174 dloss:0.7089 dlossR:0.3425 dlossQ:0.3664
Episode:1562 meanR:1.6800 rate:0.1538 gloss:-4.1521 dloss:0.9613 dlossR:0.5776 dlossQ:0.3837
Episode:1563 meanR:1.7400 rate:0.2308 gloss:-3.5101 dloss:1.1370 dlossR:0.7214 dlossQ:0.4155
Episode:1564 meanR:1.7600 rate:0.1538 gloss:-3.1012 dloss:0.9224 dlossR:0.4976 dlossQ:0.4248
Episode:1565 meanR:1.7700 rate:0.2308 gloss:-2.7067 dloss:1.1080 dlossR:0.6390 dlossQ:0.4691
Episode:1566 meanR:1.7500 rate:0.1538 gloss:-3.5315 dloss:1.0181 dlossR:0.5534 dlossQ:0.4647
Episode:1567 meanR:1.7900 rate:0.3077 gloss:-5.9491 dloss:1.9341 dloss

Episode:1645 meanR:1.8300 rate:0.0000 gloss:-2.8405 dloss:0.5358 dlossR:0.1368 dlossQ:0.3991
Episode:1646 meanR:1.8300 rate:0.1538 gloss:-2.1745 dloss:0.9081 dlossR:0.4369 dlossQ:0.4712
Episode:1647 meanR:1.8100 rate:0.0000 gloss:-2.6098 dloss:0.5679 dlossR:0.1515 dlossQ:0.4164
Episode:1648 meanR:1.8100 rate:0.0769 gloss:-3.3187 dloss:0.6166 dlossR:0.2890 dlossQ:0.3276
Episode:1649 meanR:1.8300 rate:0.3077 gloss:-3.2752 dloss:1.1703 dlossR:0.8432 dlossQ:0.3271
Episode:1650 meanR:1.8100 rate:0.0769 gloss:-2.7985 dloss:0.6844 dlossR:0.2943 dlossQ:0.3901
Episode:1651 meanR:1.8200 rate:0.0769 gloss:-3.0206 dloss:0.7025 dlossR:0.3008 dlossQ:0.4017
Episode:1652 meanR:1.8300 rate:0.2308 gloss:-2.8936 dloss:1.0223 dlossR:0.6256 dlossQ:0.3966
Episode:1653 meanR:1.8400 rate:0.1538 gloss:-3.4108 dloss:0.7863 dlossR:0.4758 dlossQ:0.3104
Episode:1654 meanR:1.8100 rate:0.0769 gloss:-2.8129 dloss:0.6960 dlossR:0.2975 dlossQ:0.3985
Episode:1655 meanR:1.8100 rate:0.0000 gloss:-3.6201 dloss:0.3637 dloss

Episode:1733 meanR:0.8100 rate:0.0000 gloss:-5.4188 dloss:0.1266 dlossR:0.0212 dlossQ:0.1054
Episode:1734 meanR:0.7600 rate:-0.0769 gloss:-3.9435 dloss:0.0563 dlossR:-0.1638 dlossQ:0.2201
Episode:1735 meanR:0.7200 rate:0.0769 gloss:-5.1129 dloss:0.4407 dlossR:0.3138 dlossQ:0.1269
Episode:1736 meanR:0.7200 rate:0.0000 gloss:-5.8934 dloss:0.1095 dlossR:0.0180 dlossQ:0.0914
Episode:1737 meanR:0.6900 rate:-0.0769 gloss:-6.0432 dloss:-0.2344 dlossR:-0.3221 dlossQ:0.0877
Episode:1738 meanR:0.6800 rate:0.0769 gloss:-5.8859 dloss:0.4630 dlossR:0.3521 dlossQ:0.1109
Episode:1739 meanR:0.6500 rate:0.0000 gloss:-6.3093 dloss:0.1218 dlossR:0.0194 dlossQ:0.1024
Episode:1740 meanR:0.6400 rate:0.0769 gloss:-8.3636 dloss:0.5148 dlossR:0.4750 dlossQ:0.0398
Episode:1741 meanR:0.6200 rate:0.0769 gloss:-6.1829 dloss:0.4735 dlossR:0.3675 dlossQ:0.1060
Episode:1742 meanR:0.6200 rate:0.0769 gloss:-5.5526 dloss:0.4345 dlossR:0.3319 dlossQ:0.1026
Episode:1743 meanR:0.5900 rate:0.0000 gloss:-4.7664 dloss:0.2174 

Episode:1821 meanR:0.6800 rate:0.5385 gloss:-4.5162 dloss:2.1101 dlossR:1.8504 dlossQ:0.2597
Episode:1822 meanR:0.7100 rate:0.1538 gloss:-3.7020 dloss:0.8103 dlossR:0.5036 dlossQ:0.3067
Episode:1823 meanR:0.6900 rate:0.0000 gloss:-3.8657 dloss:0.4055 dlossR:0.0884 dlossQ:0.3172
Episode:1824 meanR:0.6800 rate:0.0769 gloss:-3.8759 dloss:0.6311 dlossR:0.3094 dlossQ:0.3217
Episode:1825 meanR:0.7000 rate:0.0000 gloss:-5.9764 dloss:0.3736 dlossR:0.0721 dlossQ:0.3016
Episode:1826 meanR:0.7100 rate:0.0769 gloss:-4.5663 dloss:0.6804 dlossR:0.3495 dlossQ:0.3309
Episode:1827 meanR:0.7500 rate:0.3077 gloss:-3.4641 dloss:1.2624 dlossR:0.8939 dlossQ:0.3685
Episode:1828 meanR:0.7500 rate:0.0769 gloss:-3.4741 dloss:0.6974 dlossR:0.3167 dlossQ:0.3808
Episode:1829 meanR:0.7700 rate:0.1538 gloss:-2.6862 dloss:0.8694 dlossR:0.4515 dlossQ:0.4179
Episode:1830 meanR:0.7600 rate:0.0000 gloss:-2.4531 dloss:0.5901 dlossR:0.1613 dlossQ:0.4288
Episode:1831 meanR:0.7400 rate:-0.0769 gloss:-2.1495 dloss:0.5948 dlos

Episode:1909 meanR:0.6200 rate:0.0000 gloss:-4.1827 dloss:0.4514 dlossR:0.1127 dlossQ:0.3388
Episode:1910 meanR:0.6400 rate:0.2308 gloss:-4.2227 dloss:1.0083 dlossR:0.7754 dlossQ:0.2329
Episode:1911 meanR:0.6500 rate:0.0000 gloss:-4.3661 dloss:0.2241 dlossR:0.0436 dlossQ:0.1806
Episode:1912 meanR:0.6300 rate:-0.0769 gloss:-4.7993 dloss:-0.0847 dlossR:-0.2350 dlossQ:0.1503
Episode:1913 meanR:0.6200 rate:0.0769 gloss:-4.5852 dloss:0.4859 dlossR:0.3007 dlossQ:0.1852
Episode:1914 meanR:0.6300 rate:0.0769 gloss:-6.0359 dloss:0.5421 dlossR:0.3729 dlossQ:0.1692
Episode:1915 meanR:0.6200 rate:0.0000 gloss:-6.2584 dloss:0.2065 dlossR:0.0339 dlossQ:0.1726
Episode:1916 meanR:0.6300 rate:0.1538 gloss:-4.9849 dloss:0.8040 dlossR:0.6054 dlossQ:0.1986
Episode:1917 meanR:0.6100 rate:-0.0769 gloss:-5.8245 dloss:-0.1011 dlossR:-0.2876 dlossQ:0.1865
Episode:1918 meanR:0.5800 rate:-0.0769 gloss:-4.5702 dloss:-0.0080 dlossR:-0.2070 dlossQ:0.1990
Episode:1919 meanR:0.5800 rate:0.0769 gloss:-5.8421 dloss:0.5

Episode:1997 meanR:0.3900 rate:0.0000 gloss:-8.1814 dloss:0.0897 dlossR:0.0120 dlossQ:0.0777
Episode:1998 meanR:0.3800 rate:0.0000 gloss:-7.1135 dloss:0.1011 dlossR:0.0150 dlossQ:0.0861
Episode:1999 meanR:0.3700 rate:0.0000 gloss:-7.0798 dloss:0.1158 dlossR:0.0176 dlossQ:0.0982
Episode:2000 meanR:0.3700 rate:0.0769 gloss:-6.7895 dloss:0.4973 dlossR:0.3983 dlossQ:0.0991
Episode:2001 meanR:0.3700 rate:0.0000 gloss:-7.3785 dloss:0.1005 dlossR:0.0133 dlossQ:0.0872
Episode:2002 meanR:0.3700 rate:0.0000 gloss:-5.9878 dloss:0.1056 dlossR:0.0167 dlossQ:0.0889
Episode:2003 meanR:0.3800 rate:0.1538 gloss:-6.1956 dloss:0.7887 dlossR:0.7099 dlossQ:0.0788
Episode:2004 meanR:0.3700 rate:0.0000 gloss:-4.4027 dloss:0.3252 dlossR:0.0658 dlossQ:0.2594
Episode:2005 meanR:0.3600 rate:0.0000 gloss:-3.6901 dloss:0.5055 dlossR:0.1139 dlossQ:0.3916
Episode:2006 meanR:0.3500 rate:0.0769 gloss:-4.9305 dloss:0.4433 dlossR:0.3063 dlossQ:0.1370
Episode:2007 meanR:0.3200 rate:-0.0769 gloss:-6.2330 dloss:-0.2453 dlo

Episode:2085 meanR:0.5600 rate:0.0000 gloss:-4.6075 dloss:0.2337 dlossR:0.0459 dlossQ:0.1878
Episode:2086 meanR:0.5600 rate:0.0000 gloss:-4.0415 dloss:0.4044 dlossR:0.0871 dlossQ:0.3172
Episode:2087 meanR:0.5900 rate:0.2308 gloss:-4.7667 dloss:1.0112 dlossR:0.8441 dlossQ:0.1671
Episode:2088 meanR:0.6000 rate:0.0769 gloss:-5.0455 dloss:0.4525 dlossR:0.3134 dlossQ:0.1391
Episode:2089 meanR:0.6100 rate:0.0769 gloss:-5.2500 dloss:0.4578 dlossR:0.3234 dlossQ:0.1344
Episode:2090 meanR:0.6100 rate:0.0000 gloss:-4.3265 dloss:0.3035 dlossR:0.0633 dlossQ:0.2402
Episode:2091 meanR:0.6000 rate:0.0000 gloss:-5.2276 dloss:0.1621 dlossR:0.0282 dlossQ:0.1339
Episode:2092 meanR:0.6100 rate:0.0769 gloss:-5.0458 dloss:0.4417 dlossR:0.3109 dlossQ:0.1308
Episode:2093 meanR:0.6200 rate:0.0769 gloss:-5.4693 dloss:0.4497 dlossR:0.3320 dlossQ:0.1177
Episode:2094 meanR:0.6200 rate:0.0000 gloss:-4.8147 dloss:0.1935 dlossR:0.0357 dlossQ:0.1578
Episode:2095 meanR:0.6200 rate:0.0000 gloss:-4.5693 dloss:0.2114 dloss

Episode:2173 meanR:0.5600 rate:0.0000 gloss:-6.9738 dloss:0.2226 dlossR:0.0385 dlossQ:0.1841
Episode:2174 meanR:0.5400 rate:-0.1538 gloss:-7.5259 dloss:-0.5881 dlossR:-0.8027 dlossQ:0.2146
Episode:2175 meanR:0.5500 rate:0.0000 gloss:-8.2375 dloss:0.2522 dlossR:0.0363 dlossQ:0.2159
Episode:2176 meanR:0.5500 rate:0.0000 gloss:-10.5386 dloss:0.2253 dlossR:0.0338 dlossQ:0.1915
Episode:2177 meanR:0.5400 rate:-0.0769 gloss:-10.2630 dloss:-0.3995 dlossR:-0.5508 dlossQ:0.1512
Episode:2178 meanR:0.5500 rate:0.0769 gloss:-11.2700 dloss:0.8020 dlossR:0.6572 dlossQ:0.1448
Episode:2179 meanR:0.5600 rate:0.0769 gloss:-13.4442 dloss:0.9351 dlossR:0.7758 dlossQ:0.1593
Episode:2180 meanR:0.5600 rate:0.0000 gloss:-11.4514 dloss:0.1742 dlossR:0.0225 dlossQ:0.1516
Episode:2181 meanR:0.5700 rate:0.0769 gloss:-9.6777 dloss:0.6979 dlossR:0.5650 dlossQ:0.1329
Episode:2182 meanR:0.5700 rate:0.0000 gloss:-12.7369 dloss:0.1559 dlossR:0.0189 dlossQ:0.1370
Episode:2183 meanR:0.5500 rate:0.0000 gloss:-8.2685 dloss:

Episode:2261 meanR:0.4100 rate:0.0000 gloss:-6.2756 dloss:0.2106 dlossR:0.0337 dlossQ:0.1769
Episode:2262 meanR:0.4300 rate:0.0769 gloss:-9.2699 dloss:0.6300 dlossR:0.5418 dlossQ:0.0881
Episode:2263 meanR:0.4600 rate:0.2308 gloss:-4.1063 dloss:1.0319 dlossR:0.7670 dlossQ:0.2650
Episode:2264 meanR:0.4500 rate:0.0000 gloss:-9.7489 dloss:0.0304 dlossR:0.0034 dlossQ:0.0270
Episode:2265 meanR:0.4300 rate:-0.0769 gloss:-11.6697 dloss:-0.2994 dlossR:-0.6077 dlossQ:0.3083
Episode:2266 meanR:0.4700 rate:0.3077 gloss:-3.6162 dloss:1.2252 dlossR:0.9089 dlossQ:0.3163
Episode:2267 meanR:0.4900 rate:0.1538 gloss:-3.7418 dloss:0.8001 dlossR:0.5049 dlossQ:0.2952
Episode:2268 meanR:0.4800 rate:0.0000 gloss:-8.0322 dloss:0.4339 dlossR:0.0763 dlossQ:0.3576
Episode:2269 meanR:0.4800 rate:0.3077 gloss:-3.7650 dloss:1.2755 dlossR:0.9463 dlossQ:0.3292
Episode:2270 meanR:0.5300 rate:0.4615 gloss:-3.0473 dloss:1.5476 dlossR:1.1680 dlossQ:0.3795
Episode:2271 meanR:0.5400 rate:0.0000 gloss:-19.5696 dloss:0.1410 

Episode:2349 meanR:0.5500 rate:0.0000 gloss:-9.7561 dloss:0.1939 dlossR:0.0266 dlossQ:0.1673
Episode:2350 meanR:0.5500 rate:0.0000 gloss:-12.1059 dloss:0.0491 dlossR:0.0065 dlossQ:0.0426
Episode:2351 meanR:0.5700 rate:0.0769 gloss:-11.6451 dloss:0.7745 dlossR:0.6672 dlossQ:0.1073
Episode:2352 meanR:0.5500 rate:0.0000 gloss:-10.1919 dloss:0.1258 dlossR:0.0158 dlossQ:0.1100
Episode:2353 meanR:0.5200 rate:-0.0769 gloss:-9.9600 dloss:-0.4381 dlossR:-0.5417 dlossQ:0.1036
Episode:2354 meanR:0.5100 rate:0.0769 gloss:-10.9905 dloss:0.7227 dlossR:0.6288 dlossQ:0.0939
Episode:2355 meanR:0.5000 rate:-0.0769 gloss:-9.3408 dloss:-0.4423 dlossR:-0.5141 dlossQ:0.0719
Episode:2356 meanR:0.5100 rate:0.0769 gloss:-15.9995 dloss:1.0026 dlossR:0.9063 dlossQ:0.0963
Episode:2357 meanR:0.5300 rate:0.1538 gloss:-9.2384 dloss:1.1416 dlossR:1.0512 dlossQ:0.0905
Episode:2358 meanR:0.5300 rate:0.0000 gloss:-7.9564 dloss:0.0999 dlossR:0.0126 dlossQ:0.0873
Episode:2359 meanR:0.5300 rate:0.0000 gloss:-7.4493 dloss:0

Episode:2437 meanR:0.3100 rate:-0.0769 gloss:-8.6339 dloss:-0.4359 dlossR:-0.4780 dlossQ:0.0421
Episode:2438 meanR:0.3000 rate:0.0000 gloss:-8.5987 dloss:0.0529 dlossR:0.0067 dlossQ:0.0463
Episode:2439 meanR:0.3100 rate:0.0000 gloss:-7.7582 dloss:0.0644 dlossR:0.0088 dlossQ:0.0556
Episode:2440 meanR:0.3000 rate:0.0000 gloss:-8.3715 dloss:0.0502 dlossR:0.0068 dlossQ:0.0435
Episode:2441 meanR:0.3000 rate:-0.0769 gloss:-7.9911 dloss:-0.3925 dlossR:-0.4406 dlossQ:0.0481
Episode:2442 meanR:0.3000 rate:0.0000 gloss:-8.1069 dloss:0.0528 dlossR:0.0069 dlossQ:0.0459
Episode:2443 meanR:0.3100 rate:0.0769 gloss:-9.1198 dloss:0.5433 dlossR:0.5147 dlossQ:0.0286
Episode:2444 meanR:0.3500 rate:0.0769 gloss:-6.3324 dloss:0.4622 dlossR:0.3719 dlossQ:0.0904
Episode:2445 meanR:0.3800 rate:0.0769 gloss:-6.2580 dloss:0.4504 dlossR:0.3665 dlossQ:0.0838
Episode:2446 meanR:0.3600 rate:0.0769 gloss:-5.9743 dloss:0.4750 dlossR:0.3606 dlossQ:0.1143
Episode:2447 meanR:0.3700 rate:-0.0769 gloss:-4.4012 dloss:0.305

Episode:2525 meanR:0.4600 rate:-0.1538 gloss:-7.0217 dloss:-0.5341 dlossR:-0.7515 dlossQ:0.2174
Episode:2526 meanR:0.4600 rate:0.0769 gloss:-6.1353 dloss:0.5990 dlossR:0.3841 dlossQ:0.2150
Episode:2527 meanR:0.4300 rate:0.0000 gloss:-6.1980 dloss:0.2560 dlossR:0.0409 dlossQ:0.2151
Episode:2528 meanR:0.4300 rate:0.0769 gloss:-5.3183 dloss:0.5054 dlossR:0.3356 dlossQ:0.1697
Episode:2529 meanR:0.4100 rate:-0.1538 gloss:-5.0056 dloss:-0.3138 dlossR:-0.5154 dlossQ:0.2015
Episode:2530 meanR:0.4100 rate:0.0000 gloss:-5.8012 dloss:0.2089 dlossR:0.0341 dlossQ:0.1749
Episode:2531 meanR:0.4200 rate:0.0000 gloss:-5.2174 dloss:0.2166 dlossR:0.0383 dlossQ:0.1783
Episode:2532 meanR:0.4300 rate:0.2308 gloss:-5.1993 dloss:1.1039 dlossR:0.9175 dlossQ:0.1864
Episode:2533 meanR:0.4600 rate:0.2308 gloss:-6.0257 dloss:1.1969 dlossR:1.0453 dlossQ:0.1516
Episode:2534 meanR:0.4200 rate:-0.0769 gloss:-5.4684 dloss:-0.1027 dlossR:-0.2702 dlossQ:0.1675
Episode:2535 meanR:0.4300 rate:0.0000 gloss:-6.5854 dloss:0.1

Episode:2613 meanR:0.3200 rate:0.0000 gloss:-4.6079 dloss:0.2303 dlossR:0.0430 dlossQ:0.1873
Episode:2614 meanR:0.3300 rate:0.0769 gloss:-4.2803 dloss:0.5001 dlossR:0.2910 dlossQ:0.2091
Episode:2615 meanR:0.3500 rate:0.0769 gloss:-4.0113 dloss:0.5248 dlossR:0.2871 dlossQ:0.2377
Episode:2616 meanR:0.3600 rate:0.0000 gloss:-2.6361 dloss:0.6847 dlossR:0.2058 dlossQ:0.4789
Episode:2617 meanR:0.3500 rate:0.0000 gloss:-5.0328 dloss:0.2462 dlossR:0.0408 dlossQ:0.2054
Episode:2618 meanR:0.3200 rate:-0.0769 gloss:-5.7752 dloss:-0.1675 dlossR:-0.2968 dlossQ:0.1293
Episode:2619 meanR:0.3100 rate:0.0000 gloss:-6.7724 dloss:0.1711 dlossR:0.0274 dlossQ:0.1437
Episode:2620 meanR:0.3100 rate:0.0000 gloss:-7.6425 dloss:0.1522 dlossR:0.0244 dlossQ:0.1278
Episode:2621 meanR:0.2900 rate:0.0000 gloss:-9.4342 dloss:0.1481 dlossR:0.0195 dlossQ:0.1286
Episode:2622 meanR:0.2800 rate:0.0000 gloss:-9.3041 dloss:0.1385 dlossR:0.0220 dlossQ:0.1165
Episode:2623 meanR:0.3300 rate:0.2308 gloss:-8.7945 dloss:1.6452 dl

Episode:2701 meanR:0.4200 rate:0.0769 gloss:-10.5917 dloss:1.0479 dlossR:0.6482 dlossQ:0.3998
Episode:2702 meanR:0.4200 rate:0.0000 gloss:-5.9500 dloss:0.1117 dlossR:0.0186 dlossQ:0.0932
Episode:2703 meanR:0.4100 rate:0.0000 gloss:-7.1320 dloss:0.3197 dlossR:0.0527 dlossQ:0.2670
Episode:2704 meanR:0.4100 rate:0.0769 gloss:-7.7645 dloss:0.7382 dlossR:0.4842 dlossQ:0.2540
Episode:2705 meanR:0.4300 rate:0.2308 gloss:-4.2276 dloss:0.9898 dlossR:0.7712 dlossQ:0.2186
Episode:2706 meanR:0.4600 rate:0.1538 gloss:-3.9318 dloss:0.7453 dlossR:0.5055 dlossQ:0.2398
Episode:2707 meanR:0.4300 rate:-0.1538 gloss:-4.7623 dloss:-0.2808 dlossR:-0.4852 dlossQ:0.2044
Episode:2708 meanR:0.4100 rate:0.0000 gloss:-5.1728 dloss:0.2368 dlossR:0.0438 dlossQ:0.1930
Episode:2709 meanR:0.3800 rate:-0.1538 gloss:-3.7456 dloss:-0.1081 dlossR:-0.3546 dlossQ:0.2465
Episode:2710 meanR:0.3900 rate:0.1538 gloss:-3.8324 dloss:0.7249 dlossR:0.4927 dlossQ:0.2323
Episode:2711 meanR:0.3700 rate:-0.1538 gloss:-3.9032 dloss:-0.1

Episode:2789 meanR:0.1900 rate:0.0000 gloss:-8.6387 dloss:0.0465 dlossR:0.0070 dlossQ:0.0395
Episode:2790 meanR:0.1900 rate:0.0000 gloss:-9.2525 dloss:0.1040 dlossR:0.0150 dlossQ:0.0889
Episode:2791 meanR:0.1800 rate:0.0000 gloss:-8.6141 dloss:0.1134 dlossR:0.0150 dlossQ:0.0984
Episode:2792 meanR:0.2000 rate:0.0000 gloss:-8.8397 dloss:0.1238 dlossR:0.0167 dlossQ:0.1071
Episode:2793 meanR:0.2100 rate:0.0000 gloss:-8.6862 dloss:0.1289 dlossR:0.0173 dlossQ:0.1116
Episode:2794 meanR:0.2000 rate:-0.0769 gloss:-11.4433 dloss:-0.5345 dlossR:-0.6283 dlossQ:0.0938
Episode:2795 meanR:0.2000 rate:0.0000 gloss:-9.5294 dloss:0.0456 dlossR:0.0060 dlossQ:0.0397
Episode:2796 meanR:0.2400 rate:0.2308 gloss:-8.4796 dloss:1.5751 dlossR:1.4487 dlossQ:0.1264
Episode:2797 meanR:0.2200 rate:0.0000 gloss:-8.4248 dloss:0.1639 dlossR:0.0230 dlossQ:0.1409
Episode:2798 meanR:0.2000 rate:-0.0769 gloss:-10.0183 dloss:-0.4059 dlossR:-0.5417 dlossQ:0.1358
Episode:2799 meanR:0.2100 rate:0.0000 gloss:-5.8276 dloss:0.14

Episode:2877 meanR:0.2100 rate:-0.0769 gloss:-10.8207 dloss:-0.5642 dlossR:-0.6031 dlossQ:0.0389
Episode:2878 meanR:0.1900 rate:0.0000 gloss:-9.1989 dloss:0.0464 dlossR:0.0053 dlossQ:0.0411
Episode:2879 meanR:0.1600 rate:-0.1538 gloss:-9.0284 dloss:-0.9722 dlossR:-1.0090 dlossQ:0.0368
Episode:2880 meanR:0.2000 rate:0.1538 gloss:-7.7308 dloss:0.9157 dlossR:0.8739 dlossQ:0.0419
Episode:2881 meanR:0.1800 rate:0.0000 gloss:-11.0821 dloss:0.0080 dlossR:0.0009 dlossQ:0.0071
Episode:2882 meanR:0.1700 rate:-0.0769 gloss:-9.1249 dloss:-0.4887 dlossR:-0.5114 dlossQ:0.0228
Episode:2883 meanR:0.1200 rate:-0.1538 gloss:-11.0142 dloss:-1.2078 dlossR:-1.2304 dlossQ:0.0227
Episode:2884 meanR:0.0600 rate:-0.1538 gloss:-12.2110 dloss:-1.3428 dlossR:-1.3652 dlossQ:0.0224
Episode:2885 meanR:0.0800 rate:-0.1538 gloss:-9.6654 dloss:-1.0566 dlossR:-1.0850 dlossQ:0.0284
Episode:2886 meanR:0.0800 rate:0.0769 gloss:-21.7176 dloss:1.2185 dlossR:1.2152 dlossQ:0.0033
Episode:2887 meanR:0.0700 rate:0.0000 gloss:-16

Episode:2963 meanR:-0.0400 rate:-0.0769 gloss:-10.7917 dloss:-0.5631 dlossR:-0.6035 dlossQ:0.0403
Episode:2964 meanR:-0.0200 rate:0.1538 gloss:-8.8784 dloss:1.0480 dlossR:1.0013 dlossQ:0.0468
Episode:2965 meanR:0.0100 rate:0.2308 gloss:-8.7427 dloss:1.5321 dlossR:1.4904 dlossQ:0.0417
Episode:2966 meanR:0.0300 rate:0.0769 gloss:-18.0128 dloss:1.0282 dlossR:1.0100 dlossQ:0.0182
Episode:2967 meanR:0.0800 rate:0.3077 gloss:-10.1267 dloss:2.3415 dlossR:2.2844 dlossQ:0.0572
Episode:2968 meanR:0.0800 rate:0.0000 gloss:-21.2021 dloss:0.0985 dlossR:0.0056 dlossQ:0.0929
Episode:2969 meanR:0.0800 rate:0.0000 gloss:-9.2695 dloss:0.0412 dlossR:0.0045 dlossQ:0.0367
Episode:2970 meanR:0.0700 rate:0.0000 gloss:-9.4088 dloss:0.0510 dlossR:0.0061 dlossQ:0.0449
Episode:2971 meanR:0.0600 rate:-0.1538 gloss:-10.8897 dloss:-1.1999 dlossR:-1.2156 dlossQ:0.0158
Episode:2972 meanR:0.0700 rate:0.0000 gloss:-23.1303 dloss:0.0024 dlossR:0.0003 dlossQ:0.0021
Episode:2973 meanR:0.0700 rate:0.0000 gloss:-10.6976 dlo

Episode:3051 meanR:0.3000 rate:0.1538 gloss:-9.5011 dloss:1.1487 dlossR:1.0792 dlossQ:0.0695
Episode:3052 meanR:0.2900 rate:0.0000 gloss:-9.4216 dloss:0.0337 dlossR:0.0048 dlossQ:0.0289
Episode:3053 meanR:0.2800 rate:-0.1538 gloss:-7.7926 dloss:-0.7565 dlossR:-0.8576 dlossQ:0.1011
Episode:3054 meanR:0.2600 rate:0.1538 gloss:-7.1921 dloss:1.0126 dlossR:0.8466 dlossQ:0.1660
Episode:3055 meanR:0.3100 rate:0.2308 gloss:-5.5625 dloss:1.1392 dlossR:0.9766 dlossQ:0.1626
Episode:3056 meanR:0.3200 rate:0.0769 gloss:-5.8059 dloss:0.5357 dlossR:0.3652 dlossQ:0.1705
Episode:3057 meanR:0.3000 rate:-0.1538 gloss:-5.4028 dloss:-0.3976 dlossR:-0.5680 dlossQ:0.1704
Episode:3058 meanR:0.2900 rate:-0.0769 gloss:-5.0040 dloss:-0.0289 dlossR:-0.2338 dlossQ:0.2049
Episode:3059 meanR:0.3400 rate:0.1538 gloss:-5.4501 dloss:0.8265 dlossR:0.6531 dlossQ:0.1734
Episode:3060 meanR:0.3100 rate:-0.1538 gloss:-5.4534 dloss:-0.3862 dlossR:-0.5714 dlossQ:0.1852
Episode:3061 meanR:0.3300 rate:0.1538 gloss:-5.6415 dloss:

Episode:3139 meanR:0.5100 rate:-0.1538 gloss:-6.0748 dloss:-0.4945 dlossR:-0.6508 dlossQ:0.1564
Episode:3140 meanR:0.5000 rate:0.0000 gloss:-7.1849 dloss:0.0493 dlossR:0.0070 dlossQ:0.0423
Episode:3141 meanR:0.4900 rate:0.0000 gloss:-10.8923 dloss:0.1213 dlossR:0.0133 dlossQ:0.1080
Episode:3142 meanR:0.4800 rate:0.0000 gloss:-11.3607 dloss:0.0164 dlossR:0.0013 dlossQ:0.0151
Episode:3143 meanR:0.5000 rate:0.1538 gloss:-10.3824 dloss:1.3524 dlossR:1.1858 dlossQ:0.1666
Episode:3144 meanR:0.5100 rate:0.0000 gloss:-13.9305 dloss:0.2515 dlossR:0.0177 dlossQ:0.2337
Episode:3145 meanR:0.5100 rate:0.0769 gloss:-9.3459 dloss:0.7095 dlossR:0.5448 dlossQ:0.1647
Episode:3146 meanR:0.5100 rate:0.0000 gloss:-6.2181 dloss:0.1559 dlossR:0.0238 dlossQ:0.1321
Episode:3147 meanR:0.5300 rate:0.1538 gloss:-5.9078 dloss:0.8282 dlossR:0.6925 dlossQ:0.1357
Episode:3148 meanR:0.5500 rate:0.1538 gloss:-5.3437 dloss:0.7755 dlossR:0.6313 dlossQ:0.1442
Episode:3149 meanR:0.5400 rate:0.1538 gloss:-5.7725 dloss:0.775

Episode:3227 meanR:0.6300 rate:-0.0769 gloss:-3.1040 dloss:0.2633 dlossR:-0.0690 dlossQ:0.3324
Episode:3228 meanR:0.6500 rate:0.1538 gloss:-3.2493 dloss:0.7909 dlossR:0.4632 dlossQ:0.3277
Episode:3229 meanR:0.6800 rate:0.3077 gloss:-3.0927 dloss:1.1295 dlossR:0.8001 dlossQ:0.3293
Episode:3230 meanR:0.6600 rate:0.0000 gloss:-3.0662 dloss:0.4810 dlossR:0.1192 dlossQ:0.3618
Episode:3231 meanR:0.6600 rate:-0.0769 gloss:-2.8925 dloss:0.3561 dlossR:-0.0288 dlossQ:0.3849
Episode:3232 meanR:0.6400 rate:-0.0769 gloss:-3.2079 dloss:0.3255 dlossR:-0.0546 dlossQ:0.3800
Episode:3233 meanR:0.6400 rate:0.0000 gloss:-3.1850 dloss:0.5033 dlossR:0.1246 dlossQ:0.3787
Episode:3234 meanR:0.6800 rate:0.3077 gloss:-5.9459 dloss:1.7887 dlossR:1.4311 dlossQ:0.3575
Episode:3235 meanR:0.6700 rate:0.0000 gloss:-4.3643 dloss:0.5727 dlossR:0.1373 dlossQ:0.4354
Episode:3236 meanR:0.6700 rate:0.0000 gloss:-5.0902 dloss:0.4449 dlossR:0.0970 dlossQ:0.3478
Episode:3237 meanR:0.6700 rate:0.0000 gloss:-7.0230 dloss:0.3431

Episode:3315 meanR:0.8000 rate:0.0769 gloss:-3.9520 dloss:0.6311 dlossR:0.3073 dlossQ:0.3238
Episode:3316 meanR:0.8000 rate:0.0000 gloss:-3.3587 dloss:0.3910 dlossR:0.0898 dlossQ:0.3011
Episode:3317 meanR:0.8300 rate:0.2308 gloss:-3.1516 dloss:0.9779 dlossR:0.6370 dlossQ:0.3409
Episode:3318 meanR:0.8200 rate:0.0000 gloss:-3.2099 dloss:0.4355 dlossR:0.1031 dlossQ:0.3324
Episode:3319 meanR:0.8100 rate:0.0000 gloss:-2.9733 dloss:0.4948 dlossR:0.1275 dlossQ:0.3673
Episode:3320 meanR:0.7800 rate:0.0000 gloss:-2.5285 dloss:0.7532 dlossR:0.2352 dlossQ:0.5179
Episode:3321 meanR:0.7800 rate:0.0000 gloss:-3.0863 dloss:0.5327 dlossR:0.1298 dlossQ:0.4029
Episode:3322 meanR:0.7800 rate:-0.0769 gloss:-4.0755 dloss:0.1626 dlossR:-0.1474 dlossQ:0.3099
Episode:3323 meanR:0.7600 rate:0.0000 gloss:-4.6512 dloss:0.2214 dlossR:0.0400 dlossQ:0.1813
Episode:3324 meanR:0.7200 rate:-0.1538 gloss:-7.0904 dloss:-0.4926 dlossR:-0.7425 dlossQ:0.2499
Episode:3325 meanR:0.6900 rate:0.0000 gloss:-11.0827 dloss:0.2435

Episode:3403 meanR:1.0100 rate:-0.0769 gloss:-3.0054 dloss:0.2836 dlossR:-0.0566 dlossQ:0.3401
Episode:3404 meanR:1.0200 rate:0.0769 gloss:-3.1625 dloss:0.5930 dlossR:0.2757 dlossQ:0.3173
Episode:3405 meanR:0.9900 rate:0.0000 gloss:-3.2244 dloss:0.4119 dlossR:0.0969 dlossQ:0.3150
Episode:3406 meanR:0.9800 rate:0.0000 gloss:-3.7993 dloss:0.3069 dlossR:0.0659 dlossQ:0.2410
Episode:3407 meanR:0.9600 rate:0.0000 gloss:-3.9584 dloss:0.2777 dlossR:0.0566 dlossQ:0.2212
Episode:3408 meanR:0.9400 rate:0.0000 gloss:-3.9527 dloss:0.2951 dlossR:0.0617 dlossQ:0.2334
Episode:3409 meanR:0.9500 rate:0.0000 gloss:-4.4534 dloss:0.2233 dlossR:0.0430 dlossQ:0.1803
Episode:3410 meanR:0.9400 rate:0.0000 gloss:-4.7705 dloss:0.2008 dlossR:0.0366 dlossQ:0.1642
Episode:3411 meanR:0.9500 rate:0.0000 gloss:-5.3076 dloss:0.1704 dlossR:0.0293 dlossQ:0.1411
Episode:3412 meanR:0.9100 rate:-0.0769 gloss:-6.7098 dloss:-0.2677 dlossR:-0.3591 dlossQ:0.0914
Episode:3413 meanR:0.8900 rate:0.0000 gloss:-6.5118 dloss:0.2179 

Episode:3491 meanR:0.2000 rate:-0.3077 gloss:-8.5735 dloss:-1.5485 dlossR:-1.8576 dlossQ:0.3091
Episode:3492 meanR:0.2100 rate:0.0769 gloss:-8.6907 dloss:0.5978 dlossR:0.5046 dlossQ:0.0932
Episode:3493 meanR:0.1800 rate:0.0000 gloss:-7.8281 dloss:0.3229 dlossR:0.0483 dlossQ:0.2745
Episode:3494 meanR:0.1800 rate:0.0000 gloss:-13.6375 dloss:0.4364 dlossR:0.0411 dlossQ:0.3953
Episode:3495 meanR:0.1900 rate:0.0769 gloss:-10.2353 dloss:1.0031 dlossR:0.6277 dlossQ:0.3754
Episode:3496 meanR:0.1900 rate:0.0769 gloss:-8.2443 dloss:0.8508 dlossR:0.5067 dlossQ:0.3442
Episode:3497 meanR:0.2100 rate:0.0769 gloss:-4.7188 dloss:0.4615 dlossR:0.3034 dlossQ:0.1581
Episode:3498 meanR:0.1900 rate:0.0000 gloss:-5.1020 dloss:0.1479 dlossR:0.0261 dlossQ:0.1218
Episode:3499 meanR:0.1700 rate:-0.0769 gloss:-4.7290 dloss:-0.0301 dlossR:-0.2210 dlossQ:0.1909
Episode:3500 meanR:0.2000 rate:0.0769 gloss:-4.0613 dloss:0.5127 dlossR:0.2870 dlossQ:0.2256
Episode:3501 meanR:0.2200 rate:0.0000 gloss:-4.4513 dloss:0.24

Episode:3579 meanR:0.2900 rate:0.1538 gloss:-5.4170 dloss:0.8516 dlossR:0.6519 dlossQ:0.1997
Episode:3580 meanR:0.2300 rate:0.0000 gloss:-9.5554 dloss:0.2475 dlossR:0.0451 dlossQ:0.2024
Episode:3581 meanR:0.2300 rate:0.0769 gloss:-4.2009 dloss:0.5667 dlossR:0.3073 dlossQ:0.2594
Episode:3582 meanR:0.2400 rate:0.0000 gloss:-2.6731 dloss:0.5997 dlossR:0.1812 dlossQ:0.4184
Episode:3583 meanR:0.2400 rate:0.2308 gloss:-3.4604 dloss:0.9576 dlossR:0.6688 dlossQ:0.2888
Episode:3584 meanR:0.2400 rate:0.0000 gloss:-3.3066 dloss:0.4020 dlossR:0.0937 dlossQ:0.3083
Episode:3585 meanR:0.2900 rate:0.3846 gloss:-3.2037 dloss:1.3322 dlossR:1.0044 dlossQ:0.3278
Episode:3586 meanR:0.2900 rate:0.0769 gloss:-3.2253 dloss:0.5885 dlossR:0.2763 dlossQ:0.3122
Episode:3587 meanR:0.3400 rate:0.3077 gloss:-2.9543 dloss:1.1357 dlossR:0.7829 dlossQ:0.3529
Episode:3588 meanR:0.3800 rate:0.0769 gloss:-2.8719 dloss:0.6836 dlossR:0.2962 dlossQ:0.3874
Episode:3589 meanR:0.3900 rate:0.0769 gloss:-2.8552 dloss:0.6708 dloss

Episode:3667 meanR:0.8000 rate:0.0769 gloss:-8.7576 dloss:0.6476 dlossR:0.5097 dlossQ:0.1378
Episode:3668 meanR:0.7700 rate:0.0000 gloss:-5.8498 dloss:0.1316 dlossR:0.0221 dlossQ:0.1095
Episode:3669 meanR:0.7700 rate:0.0769 gloss:-6.9980 dloss:0.4909 dlossR:0.4087 dlossQ:0.0822
Episode:3670 meanR:0.7400 rate:0.0000 gloss:-5.4177 dloss:0.1254 dlossR:0.0211 dlossQ:0.1043
Episode:3671 meanR:0.7300 rate:0.1538 gloss:-4.8698 dloss:0.7325 dlossR:0.5803 dlossQ:0.1522
Episode:3672 meanR:0.7300 rate:0.0000 gloss:-4.9427 dloss:0.1805 dlossR:0.0328 dlossQ:0.1477
Episode:3673 meanR:0.7300 rate:-0.0769 gloss:-5.0987 dloss:-0.1330 dlossR:-0.2589 dlossQ:0.1259
Episode:3674 meanR:0.7400 rate:0.0000 gloss:-4.8807 dloss:0.1820 dlossR:0.0332 dlossQ:0.1488
Episode:3675 meanR:0.7100 rate:-0.0769 gloss:-5.2345 dloss:-0.1393 dlossR:-0.2664 dlossQ:0.1272
Episode:3676 meanR:0.6700 rate:0.0000 gloss:-5.8348 dloss:0.1602 dlossR:0.0252 dlossQ:0.1351
Episode:3677 meanR:0.6700 rate:0.1538 gloss:-4.9817 dloss:0.7600

Episode:3754 meanR:0.0000 rate:0.0000 gloss:-20.8253 dloss:0.0271 dlossR:0.0011 dlossQ:0.0260
Episode:3755 meanR:-0.0100 rate:-0.0769 gloss:-16.8975 dloss:-0.8999 dlossR:-0.9452 dlossQ:0.0453
Episode:3756 meanR:0.0000 rate:0.0000 gloss:-9.9623 dloss:0.0213 dlossR:0.0023 dlossQ:0.0190
Episode:3757 meanR:-0.0100 rate:0.0000 gloss:-10.7784 dloss:0.0096 dlossR:0.0008 dlossQ:0.0088
Episode:3758 meanR:-0.0100 rate:0.0769 gloss:-9.4619 dloss:0.5488 dlossR:0.5323 dlossQ:0.0165
Episode:3759 meanR:0.0100 rate:0.0769 gloss:-9.7579 dloss:0.5642 dlossR:0.5485 dlossQ:0.0158
Episode:3760 meanR:0.0300 rate:0.0769 gloss:-9.3771 dloss:0.5434 dlossR:0.5277 dlossQ:0.0157
Episode:3761 meanR:0.0500 rate:0.1538 gloss:-9.0662 dloss:1.0445 dlossR:1.0226 dlossQ:0.0220
Episode:3762 meanR:0.0400 rate:0.0000 gloss:-9.8753 dloss:0.0167 dlossR:0.0019 dlossQ:0.0149
Episode:3763 meanR:0.0000 rate:-0.2308 gloss:-10.8656 dloss:-1.8191 dlossR:-1.8296 dlossQ:0.0105
Episode:3764 meanR:0.0000 rate:-0.0769 gloss:-10.0946 dlo

Episode:3841 meanR:0.5600 rate:0.1538 gloss:-5.4401 dloss:1.0135 dlossR:0.6842 dlossQ:0.3293
Episode:3842 meanR:0.5900 rate:0.2308 gloss:-5.6699 dloss:1.3070 dlossR:1.0215 dlossQ:0.2854
Episode:3843 meanR:0.5900 rate:0.0000 gloss:-4.5737 dloss:0.4342 dlossR:0.0852 dlossQ:0.3489
Episode:3844 meanR:0.5800 rate:-0.0769 gloss:-7.1274 dloss:-0.2380 dlossR:-0.3752 dlossQ:0.1372
Episode:3845 meanR:0.6000 rate:0.1538 gloss:-5.0261 dloss:0.9699 dlossR:0.6456 dlossQ:0.3243
Episode:3846 meanR:0.6100 rate:0.1538 gloss:-5.5841 dloss:1.0220 dlossR:0.6969 dlossQ:0.3251
Episode:3847 meanR:0.5900 rate:-0.1538 gloss:-5.7166 dloss:-0.2390 dlossR:-0.5629 dlossQ:0.3239
Episode:3848 meanR:0.5800 rate:0.0769 gloss:-8.5100 dloss:0.7701 dlossR:0.5203 dlossQ:0.2498
Episode:3849 meanR:0.6000 rate:0.0769 gloss:-6.3788 dloss:0.6709 dlossR:0.4111 dlossQ:0.2599
Episode:3850 meanR:0.5500 rate:-0.3077 gloss:-5.8968 dloss:-1.0308 dlossR:-1.2683 dlossQ:0.2374
Episode:3851 meanR:0.5700 rate:0.1538 gloss:-5.7589 dloss:0.9

Episode:3929 meanR:0.5400 rate:0.0769 gloss:-3.5850 dloss:0.5645 dlossR:0.2800 dlossQ:0.2845
Episode:3930 meanR:0.4900 rate:0.1538 gloss:-3.9667 dloss:0.7157 dlossR:0.5007 dlossQ:0.2149
Episode:3931 meanR:0.4900 rate:0.0769 gloss:-4.3098 dloss:0.4680 dlossR:0.2857 dlossQ:0.1823
Episode:3932 meanR:0.5600 rate:0.1538 gloss:-4.2033 dloss:0.7129 dlossR:0.5195 dlossQ:0.1934
Episode:3933 meanR:0.5500 rate:-0.0769 gloss:-4.6292 dloss:-0.0396 dlossR:-0.2180 dlossQ:0.1784
Episode:3934 meanR:0.5900 rate:0.2308 gloss:-4.7220 dloss:1.0508 dlossR:0.8437 dlossQ:0.2071
Episode:3935 meanR:0.5800 rate:0.0000 gloss:-4.0455 dloss:0.2543 dlossR:0.0513 dlossQ:0.2030
Episode:3936 meanR:0.6100 rate:0.2308 gloss:-7.9194 dloss:1.6849 dlossR:1.3845 dlossQ:0.3004
Episode:3937 meanR:0.6200 rate:0.2308 gloss:-8.4255 dloss:1.9037 dlossR:1.4721 dlossQ:0.4316
Episode:3938 meanR:0.6200 rate:0.2308 gloss:-5.7089 dloss:1.4515 dlossR:1.0323 dlossQ:0.4192
Episode:3939 meanR:0.5900 rate:-0.0769 gloss:-8.1711 dloss:0.2651 d

Episode:4017 meanR:0.5400 rate:0.0000 gloss:-17.1381 dloss:2.4452 dlossR:0.1219 dlossQ:2.3233
Episode:4018 meanR:0.5500 rate:0.0769 gloss:-2.6444 dloss:0.7676 dlossR:0.3241 dlossQ:0.4435
Episode:4019 meanR:0.5100 rate:-0.1538 gloss:-2.8263 dloss:0.2086 dlossR:-0.1734 dlossQ:0.3821
Episode:4020 meanR:0.5500 rate:0.3077 gloss:-2.6586 dloss:1.1651 dlossR:0.7612 dlossQ:0.4039
Episode:4021 meanR:0.5200 rate:-0.0769 gloss:-2.9605 dloss:0.3252 dlossR:-0.0368 dlossQ:0.3620
Episode:4022 meanR:0.5000 rate:-0.2308 gloss:-3.6185 dloss:-0.2556 dlossR:-0.5272 dlossQ:0.2716
Episode:4023 meanR:0.5000 rate:-0.1538 gloss:-4.0583 dloss:-0.1654 dlossR:-0.3914 dlossQ:0.2260
Episode:4024 meanR:0.4700 rate:0.0000 gloss:-4.0846 dloss:0.2816 dlossR:0.0605 dlossQ:0.2211
Episode:4025 meanR:0.4700 rate:0.0000 gloss:-4.9279 dloss:0.2045 dlossR:0.0389 dlossQ:0.1656
Episode:4026 meanR:0.4800 rate:0.0769 gloss:-5.1166 dloss:0.4547 dlossR:0.3179 dlossQ:0.1368
Episode:4027 meanR:0.4700 rate:-0.1538 gloss:-5.2754 dloss:

Episode:4104 meanR:0.1000 rate:-0.0769 gloss:-6.3486 dloss:-0.2485 dlossR:-0.3394 dlossQ:0.0909
Episode:4105 meanR:0.1100 rate:0.0000 gloss:-6.5053 dloss:0.0824 dlossR:0.0121 dlossQ:0.0703
Episode:4106 meanR:0.1200 rate:0.0769 gloss:-6.3382 dloss:0.4379 dlossR:0.3675 dlossQ:0.0704
Episode:4107 meanR:0.1100 rate:-0.0769 gloss:-5.7423 dloss:-0.2112 dlossR:-0.3051 dlossQ:0.0939
Episode:4108 meanR:0.0700 rate:-0.1538 gloss:-6.6659 dloss:-0.6735 dlossR:-0.7354 dlossQ:0.0620
Episode:4109 meanR:0.0500 rate:-0.0769 gloss:-7.3200 dloss:-0.3518 dlossR:-0.4019 dlossQ:0.0500
Episode:4110 meanR:0.0400 rate:-0.0769 gloss:-6.4944 dloss:-0.2794 dlossR:-0.3515 dlossQ:0.0721
Episode:4111 meanR:0.0200 rate:-0.0769 gloss:-9.7434 dloss:-0.4789 dlossR:-0.5360 dlossQ:0.0571
Episode:4112 meanR:0.0100 rate:0.0769 gloss:-14.5279 dloss:0.8825 dlossR:0.8210 dlossQ:0.0615
Episode:4113 meanR:0.0200 rate:0.0769 gloss:-17.3333 dloss:1.0256 dlossR:0.9746 dlossQ:0.0510
Episode:4114 meanR:-0.0100 rate:-0.2308 gloss:-13.

Episode:4191 meanR:0.1600 rate:-0.0769 gloss:-6.6584 dloss:-0.0781 dlossR:-0.3268 dlossQ:0.2486
Episode:4192 meanR:0.1900 rate:0.0000 gloss:-7.6414 dloss:0.2761 dlossR:0.0400 dlossQ:0.2362
Episode:4193 meanR:0.2300 rate:0.0000 gloss:-6.0934 dloss:0.2584 dlossR:0.0460 dlossQ:0.2124
Episode:4194 meanR:0.2200 rate:0.0769 gloss:-6.1518 dloss:0.6395 dlossR:0.3940 dlossQ:0.2455
Episode:4195 meanR:0.1900 rate:-0.1538 gloss:-6.3703 dloss:-0.4755 dlossR:-0.6766 dlossQ:0.2010
Episode:4196 meanR:0.1800 rate:0.0000 gloss:-4.2373 dloss:0.4027 dlossR:0.0768 dlossQ:0.3260
Episode:4197 meanR:0.1700 rate:0.0000 gloss:-4.2938 dloss:0.2607 dlossR:0.0529 dlossQ:0.2079
Episode:4198 meanR:0.2400 rate:0.3077 gloss:-3.9326 dloss:1.1903 dlossR:0.9485 dlossQ:0.2418
Episode:4199 meanR:0.2900 rate:0.3077 gloss:-4.0423 dloss:1.2043 dlossR:0.9699 dlossQ:0.2344
Episode:4200 meanR:0.3000 rate:0.2308 gloss:-4.5145 dloss:1.0033 dlossR:0.8096 dlossQ:0.1937
Episode:4201 meanR:0.3100 rate:0.0000 gloss:-4.1428 dloss:0.2926

Episode:4279 meanR:0.2500 rate:0.0000 gloss:-7.2681 dloss:0.0419 dlossR:0.0057 dlossQ:0.0362
Episode:4280 meanR:0.2300 rate:-0.0769 gloss:-7.3122 dloss:-0.3621 dlossR:-0.4033 dlossQ:0.0412
Episode:4281 meanR:0.2200 rate:-0.0769 gloss:-6.8032 dloss:-0.3184 dlossR:-0.3720 dlossQ:0.0536
Episode:4282 meanR:0.1900 rate:0.0769 gloss:-5.7321 dloss:0.4505 dlossR:0.3409 dlossQ:0.1096
Episode:4283 meanR:0.1600 rate:0.0000 gloss:-8.5354 dloss:0.0284 dlossR:0.0035 dlossQ:0.0249
Episode:4284 meanR:0.1900 rate:0.3077 gloss:-9.3432 dloss:2.1226 dlossR:2.1003 dlossQ:0.0223
Episode:4285 meanR:0.2200 rate:0.1538 gloss:-9.1358 dloss:1.0503 dlossR:1.0269 dlossQ:0.0234
Episode:4286 meanR:0.2300 rate:0.0000 gloss:-9.7945 dloss:0.0218 dlossR:0.0025 dlossQ:0.0194
Episode:4287 meanR:0.2200 rate:-0.0769 gloss:-10.2638 dloss:-0.5565 dlossR:-0.5729 dlossQ:0.0163
Episode:4288 meanR:0.2500 rate:0.2308 gloss:-9.3017 dloss:1.5967 dlossR:1.5682 dlossQ:0.0284
Episode:4289 meanR:0.2300 rate:0.0000 gloss:-8.6039 dloss:0.

Episode:4367 meanR:0.5600 rate:0.2308 gloss:-2.6775 dloss:1.0040 dlossR:0.5980 dlossQ:0.4060
Episode:4368 meanR:0.6100 rate:0.3077 gloss:-3.2519 dloss:1.2717 dlossR:0.8701 dlossQ:0.4016
Episode:4369 meanR:0.6600 rate:0.3077 gloss:-3.0754 dloss:1.2554 dlossR:0.8439 dlossQ:0.4115
Episode:4370 meanR:0.6500 rate:-0.0769 gloss:-3.7593 dloss:0.3235 dlossR:-0.0843 dlossQ:0.4078
Episode:4371 meanR:0.6500 rate:0.0769 gloss:-7.4076 dloss:0.7993 dlossR:0.4969 dlossQ:0.3024
Episode:4372 meanR:0.6400 rate:0.0000 gloss:-3.8109 dloss:0.5772 dlossR:0.1389 dlossQ:0.4383
Episode:4373 meanR:0.6400 rate:0.0000 gloss:-6.1305 dloss:0.7666 dlossR:0.0898 dlossQ:0.6768
Episode:4374 meanR:0.6800 rate:0.2308 gloss:-2.8847 dloss:1.0587 dlossR:0.6412 dlossQ:0.4175
Episode:4375 meanR:0.6800 rate:0.0000 gloss:-3.3493 dloss:0.4083 dlossR:0.0920 dlossQ:0.3162
Episode:4376 meanR:0.7100 rate:0.3077 gloss:-2.6219 dloss:1.1919 dlossR:0.7594 dlossQ:0.4325
Episode:4377 meanR:0.7400 rate:0.0769 gloss:-2.4911 dloss:0.7299 dlo

Episode:4455 meanR:0.8400 rate:0.0000 gloss:-4.7260 dloss:0.6029 dlossR:0.1608 dlossQ:0.4422
Episode:4456 meanR:0.8600 rate:0.1538 gloss:-6.6072 dloss:0.8403 dlossR:0.7569 dlossQ:0.0834
Episode:4457 meanR:0.8500 rate:0.0000 gloss:-6.3274 dloss:0.1042 dlossR:0.0169 dlossQ:0.0873
Episode:4458 meanR:0.8100 rate:0.0769 gloss:-5.8664 dloss:0.4406 dlossR:0.3469 dlossQ:0.0938
Episode:4459 meanR:0.7900 rate:-0.0769 gloss:-5.5769 dloss:-0.1913 dlossR:-0.2924 dlossQ:0.1011
Episode:4460 meanR:0.7900 rate:-0.0769 gloss:-5.7890 dloss:-0.2179 dlossR:-0.3068 dlossQ:0.0889
Episode:4461 meanR:0.8100 rate:0.2308 gloss:-9.5468 dloss:1.6458 dlossR:1.6098 dlossQ:0.0360
Episode:4462 meanR:0.8000 rate:0.0000 gloss:-6.7586 dloss:0.0835 dlossR:0.0119 dlossQ:0.0716
Episode:4463 meanR:0.7800 rate:-0.0769 gloss:-5.6898 dloss:-0.1859 dlossR:-0.2968 dlossQ:0.1108
Episode:4464 meanR:0.7300 rate:-0.0769 gloss:-5.1024 dloss:-0.1258 dlossR:-0.2577 dlossQ:0.1319
Episode:4465 meanR:0.7100 rate:-0.0769 gloss:-5.5280 dloss

Episode:4543 meanR:0.1900 rate:0.2308 gloss:-8.5666 dloss:1.5043 dlossR:1.4496 dlossQ:0.0547
Episode:4544 meanR:0.1900 rate:0.0000 gloss:-7.4980 dloss:0.0535 dlossR:0.0074 dlossQ:0.0461
Episode:4545 meanR:0.2100 rate:0.1538 gloss:-7.6572 dloss:0.9163 dlossR:0.8670 dlossQ:0.0492
Episode:4546 meanR:0.2000 rate:-0.0769 gloss:-7.3862 dloss:-0.3455 dlossR:-0.4035 dlossQ:0.0580
Episode:4547 meanR:0.2100 rate:0.0769 gloss:-8.4671 dloss:0.5297 dlossR:0.4819 dlossQ:0.0478
Episode:4548 meanR:0.1900 rate:0.0000 gloss:-6.5360 dloss:0.0951 dlossR:0.0142 dlossQ:0.0810
Episode:4549 meanR:0.1800 rate:0.0000 gloss:-5.4224 dloss:0.1580 dlossR:0.0279 dlossQ:0.1300
Episode:4550 meanR:0.1600 rate:0.0000 gloss:-6.8029 dloss:0.0590 dlossR:0.0083 dlossQ:0.0507
Episode:4551 meanR:0.1600 rate:0.0000 gloss:-6.2587 dloss:0.1009 dlossR:0.0155 dlossQ:0.0854
Episode:4552 meanR:0.1500 rate:-0.0769 gloss:-6.7583 dloss:-0.2964 dlossR:-0.3661 dlossQ:0.0698
Episode:4553 meanR:0.1100 rate:-0.0769 gloss:-7.2608 dloss:-0.33

Episode:4631 meanR:0.3400 rate:0.0000 gloss:-2.9495 dloss:0.5915 dlossR:0.1530 dlossQ:0.4385
Episode:4632 meanR:0.3500 rate:0.0769 gloss:-7.2490 dloss:1.0486 dlossR:0.4982 dlossQ:0.5504
Episode:4633 meanR:0.3700 rate:0.0769 gloss:-3.6919 dloss:0.7191 dlossR:0.3300 dlossQ:0.3892
Episode:4634 meanR:0.3900 rate:0.1538 gloss:-2.8320 dloss:0.9235 dlossR:0.4793 dlossQ:0.4441
Episode:4635 meanR:0.4200 rate:0.0000 gloss:-2.9994 dloss:0.5426 dlossR:0.1359 dlossQ:0.4068
Episode:4636 meanR:0.4200 rate:0.0769 gloss:-3.3063 dloss:0.7476 dlossR:0.3313 dlossQ:0.4163
Episode:4637 meanR:0.4400 rate:0.3077 gloss:-2.8460 dloss:1.2695 dlossR:0.8052 dlossQ:0.4643
Episode:4638 meanR:0.4300 rate:-0.0769 gloss:-2.6464 dloss:0.4325 dlossR:0.0106 dlossQ:0.4219
Episode:4639 meanR:0.4300 rate:0.0000 gloss:-2.9549 dloss:0.5120 dlossR:0.1302 dlossQ:0.3817
Episode:4640 meanR:0.4300 rate:-0.0769 gloss:-2.9357 dloss:0.3284 dlossR:-0.0395 dlossQ:0.3678
Episode:4641 meanR:0.4700 rate:0.2308 gloss:-2.7349 dloss:0.9925 dl
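
The log lines above follow a fixed `key:value` layout, so they can be turned into records for later analysis. The sketch below is not part of the original notebook: `parse_log_line`, `FIELDS`, and `VALUE` are hypothetical names, and the four-decimal check is an assumption inferred from the printed output. Lines that were cut off are rejected rather than guessed at.

```python
import re

# Field order as it appears in the printed training log.
FIELDS = ("Episode", "meanR", "rate", "gloss", "dloss", "dlossR", "dlossQ")
# Assumption: every metric is printed with exactly four decimal places.
VALUE = re.compile(r"-?\d+\.\d{4}")

def parse_log_line(line):
    """Return a dict of the fields in one log line, or None if it is incomplete."""
    tokens = line.split()
    if len(tokens) != len(FIELDS):
        return None
    record = {}
    for name, token in zip(FIELDS, tokens):
        key, sep, value = token.partition(":")
        if key != name or not sep:
            return None
        if name == "Episode":
            if not value.isdigit():
                return None
            record[name] = int(value)
        elif VALUE.fullmatch(value):
            record[name] = float(value)
        else:
            return None
    return record

complete = ("Episode:4631 meanR:0.3400 rate:0.0000 gloss:-2.9495 "
            "dloss:0.5915 dlossR:0.1530 dlossQ:0.4385")
truncated = "Episode:4641 meanR:0.4700 rate:0.2308 gloss:-2.7349 dloss:0.9925 dl"

parsed = parse_log_line(complete)
skipped = parse_log_line(truncated)  # None: the line was cut off
print(parsed)
print(skipped)
```

Returning `None` for incomplete lines keeps partial values out of any downstream statistics.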

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

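def running_mean(x, N):
    # Moving average over a window of N samples, computed from a cumulative sum.
    # Returns len(x) - N + 1 values (one per full window).
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N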
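
As a quick sanity check, the cumulative-sum trick in `running_mean` can be compared against a direct moving average. The snippet below is a minimal sketch on made-up data; the input array `x` and the use of `np.convolve` as a reference are illustrative choices, not part of the original notebook.

```python
import numpy as np

def running_mean(x, N):
    # Same definition as in the notebook cell above.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

# Made-up input purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

smoothed = running_mean(x, 3)
# Reference: average each full window of 3 samples with a box filter.
reference = np.convolve(x, np.ones(3) / 3, mode='valid')

print(smoothed)                          # one value per full window
print(np.allclose(smoothed, reference))  # both approaches should agree
```

With a window of `N` samples the result has `len(x) - N + 1` entries, which is why the plotting cells slice the episode axis before drawing the smoothed curve.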

In [None]:
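# rewards_list is assumed to hold (episode, total reward) pairs.
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)  # 10-episode moving average
# The smoothed series is shorter, so plot it against the trailing episodes.
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)  # raw per-episode values
plt.xlabel('Episode')
plt.ylabel('Total rewards')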
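
The slice `eps[-len(smoothed_arr):]` pairs each smoothed value with the last episode of its averaging window. The sketch below shows that alignment on hypothetical data; the contents of `rewards_list` here are invented for illustration, and a window of 3 is used instead of 10 to keep the example short.

```python
import numpy as np

def running_mean(x, N):
    # Same definition as in the notebook.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

# Hypothetical (episode, total reward) pairs standing in for rewards_list.
rewards_list = [(ep, float(ep)) for ep in range(1, 8)]

eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 3)

# Keep only the trailing episodes so both arrays have the same length.
x_axis = eps[-len(smoothed_arr):]

for episode, value in zip(x_axis, smoothed_arr):
    print(int(episode), value)
```

Each printed episode is the final episode of the window that produced the average next to it, so the smoothed curve starts `N - 1` episodes after the raw curve.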

In [None]:
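# loss_list is assumed to hold (episode, average loss) pairs.
eps, arr = np.array(loss_list).T
smoothed_arr = running_mean(arr, 10)  # 10-episode moving average
# The smoothed series is shorter, so plot it against the trailing episodes.
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)  # raw per-episode values
plt.xlabel('Episode')
plt.ylabel('Average losses')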

In [33]:
# # import gym
# # # env = gym.make('CartPole-v0')
# # env = gym.make('CartPole-v1')
# # # env = gym.make('Acrobot-v1')
# # # env = gym.make('MountainCar-v0')
# # # env = gym.make('Pendulum-v0')
# # # env = gym.make('Blackjack-v0')
# # # env = gym.make('FrozenLake-v0')
# # # env = gym.make('AirRaid-ram-v0')
# # # env = gym.make('AirRaid-v0')
# # # env = gym.make('BipedalWalker-v2')
# # # env = gym.make('Copy-v0')
# # # env = gym.make('CarRacing-v0')
# # # env = gym.make('Ant-v2') #mujoco
# # # env = gym.make('FetchPickAndPlace-v1') # mujoco required!

# with tf.Session() as sess:
#     #sess.run(tf.global_variables_initializer())
#     saver.restore(sess, 'checkpoints/model-nav.ckpt')    
#     #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    
#     # Episodes/epochs
#     for _ in range(1):
#         state = env.reset()
#         total_reward = 0

#         # Steps/batches
#         #for _ in range(111111111111111111):
#         while True:
#             env.render()
#             action_logits = sess.run(model.actions_logits, feed_dict={model.states: np.reshape(state, [1, -1])})
#             action = np.argmax(action_logits)
#             state, reward, done, _ = env.step(action)
#             total_reward += reward
#             if done:
#                 break
                
#         # Closing the env
#         print('total_reward: {:.2f}'.format(total_reward))
#         env.close()