# Continuous Control

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`
- **Linux** (x86, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86"`
- **Linux** (x86_64, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Reacher.app")
```

In [2]:
# env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux/Reacher.x86_64')
# env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux_v1/Reacher.x86_64')
env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux_OneAgent/Reacher_Linux/Reacher.x86_64')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, a double-jointed arm can move to target locations. A reward of `+0.1` is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of `33` variables corresponding to position, rotation, velocity, and angular velocities of the arm.  Each action is a vector with four numbers, corresponding to torque applicable to two joints.  Every entry in the action vector must be a number between `-1` and `1`.

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 1
Size of each action: 4
There are 1 agents. Each observes a state with length: 33
The state for the first agent looks like: [ 0.00000000e+00 -4.00000000e+00  0.00000000e+00  1.00000000e+00
 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00 -1.00000000e+01  0.00000000e+00
  1.00000000e+00 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.75471878e+00 -1.00000000e+00
  5.55726671e+00  0.00000000e+00  1.00000000e+00  0.00000000e+00
 -1.68164849e-01]


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you will watch the agent's performance, if it selects an action at random with each time step.  A window should pop up that allows you to observe the agent, as it moves through the environment.  

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
states = env_info.vector_observations                  # get the current state (for each agent)
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
while True:
    actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to tne environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += env_info.rewards                         # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

Total score (averaged over agents) this episode: 0.09999999776482582


When finished, you can close the environment.

In [6]:
# env.close()

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the environment, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [7]:
# Testing the train mode
env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
state = env_info.vector_observations[0]                  # get the current state (for each agent)
#scores = np.zeros(num_agents)                          # initialize the score (for each agent)
while True:
    action = np.random.randn(num_agents, action_size) # select an action (for each agent)
    #print(action)
    action = np.clip(action, -1, 1)                  # all actions between -1 and 1
    #print(action)
    env_info = env.step(action)[brain_name]           # send all actions to tne environment
    next_state = env_info.vector_observations[0]         # get next state (for each agent)
    reward = env_info.rewards[0]                         # get reward (for each agent)
    done = env_info.local_done[0]                        # see if episode finished
    #scores += env_info.rewards                         # update the score (for each agent)
    state = next_state                               # roll over states to next time step
    if done is True:                                  # exit loop if episode finished
        #print(action.shape, reward)
        #print(done)
        break
# print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

## Option 1: Solve the First Version
The task is episodic, and in order to solve the environment, your agent must get an average score of +30 over 100 consecutive episodes.

In [8]:
# In this one we should define and detect GPUs for tensorflow
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.7.1
Default GPU Device: 


In [9]:
def model_input(state_size, action_size):
    #states = tf.placeholder(tf.float32, [None, *state_size], name='states')
    states = tf.placeholder(tf.float64, [None, state_size], name='states')
    actions = tf.placeholder(tf.float64, [None, action_size], name='actions')
    targetQs = tf.placeholder(tf.float64, [None], name='targetQs')
    return states, actions, targetQs

In [10]:
# Generator/Controller: Generating/prediting the actions
def generator(states, action_size, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=action_size)        
        #predictions = tf.nn.softmax(logits)

        # return actions logits
        return logits

In [11]:
# Discriminator/Dopamine: Reward function/planner/naviator/advisor/supervisor/cortical columns
def discriminator(states, actions, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('discriminator', reuse=reuse):
        # Fusion/merge states and actions/ SA/ SM
        x_fused = tf.concat(axis=1, values=[states, actions])
        
        # First fully connected layer
        h1 = tf.layers.dense(inputs=x_fused, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=1)        
        #predictions = tf.nn.softmax(logits)

        # return rewards logits
        return logits

In [23]:
def model_loss(action_size, hidden_size, states, actions, targetQs):
    actions_logits = generator(states=states, hidden_size=hidden_size, action_size=action_size)
    neg_log_prob_actions = tf.nn.sigmoid_cross_entropy_with_logits(logits=actions_logits,
                                                                   labels=tf.nn.sigmoid(actions))
    gQs = discriminator(actions=actions_logits, hidden_size=hidden_size, states=states) # nextQs
    gloss = tf.reduce_mean(neg_log_prob_actions * tf.reshape(targetQs, shape=[-1, 1])) # DPG
    dQs = discriminator(actions=actions, hidden_size=hidden_size, states=states, reuse=True) # Qs
    dloss = tf.reduce_mean(tf.square(dQs - targetQs)) # DQN
    gloss1 = tf.reduce_mean(neg_log_prob_actions)
    gloss2 = tf.reduce_mean(gQs)
    gloss3 = tf.reduce_mean(dQs)
    gloss4 = tf.reduce_mean(targetQs)
    return actions_logits, gQs, gloss, dloss, gloss1, gloss2, gloss3, gloss4

In [24]:
# Optimizating/training/learning G & D
def model_opt(g_loss, d_loss, learning_rate):
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
        g_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)
        d_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)

    return g_opt, d_opt

In [25]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs = model_input(state_size=state_size, action_size=action_size)

        # Create the Model: calculating the loss and forwad pass
        self.actions_logits, self.Qs_logits, self.g_loss, self.d_loss, self.g_loss1, self.g_loss2, self.g_loss3, self.g_loss4 = model_loss(
            action_size=action_size, hidden_size=hidden_size, # model init parameters
            states=self.states, actions=self.actions, targetQs=self.targetQs) # model input
        
        # Update the model: backward pass and backprop
        self.g_opt, self.d_opt = model_opt(g_loss=self.g_loss, d_loss=self.d_loss, learning_rate=learning_rate)

In [26]:
from collections import deque
class Memory():
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), 
                               size=batch_size, 
                               replace=False)
        return [self.buffer[ii] for ii in idx]

In [27]:
print('state size:{}'.format(states.shape), 
      'actions:{}'.format(actions.shape)) 
print('action size:{}'.format(np.max(actions) - np.min(actions)+1))

state size:(100, 33) actions:(100, 4)
action size:7.04159214216411


In [28]:
# Exploration parameters
explore_start = 1.0            # exploration probability at start
explore_stop = 0.01            # minimum exploration probability 
decay_rate = 0.0001            # exponential decay rate for exploration prob

# Network parameters
state_size = 33
action_size = 4
hidden_size = 33*2             # number of units in each Q-network hidden layer
learning_rate = 0.0001         # Q-network learning rate

# Memory parameters
memory_size = 10000            # memory capacity
batch_size = 100               # experience mini-batch size
gamma = 0.99                   # future reward discount

In [29]:
# Reset/init the graph/session
graph = tf.reset_default_graph()

# Init the model
model = Model(action_size=action_size, state_size=state_size, hidden_size=hidden_size, learning_rate=learning_rate)

# Init the memory
memory = Memory(max_size=memory_size)

In [30]:
# Initializing the memory buffer
env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
state = env_info.vector_observations[0]                  # get the current state (for each agent)
for _ in range(memory_size):
    action = np.random.randn(num_agents, action_size) # select an action (for each agent)
    #action = np.clip(action, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(action)[brain_name]           # send all actions to tne environment
    next_state = env_info.vector_observations[0]         # get next state (for each agent)
    reward = env_info.rewards[0]                         # get reward (for each agent)
    done = env_info.local_done[0]                        # see if episode finished
    memory.buffer.append([state, action.reshape([-1]), next_state, reward, float(done)])
    #print(state.shape, action.reshape([-1]).shape, reward, float(done))
    state = next_state                               # roll over states to next time step
    if done is True:                                  # exit loop if episode finished
        #print(done)
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        state = env_info.vector_observations[0]                  # get the current state (for each agent)
        break

In [31]:
# len(memory.buffer), memory.buffer[100]

In [None]:
# Save/load the model and save for plotting
saver = tf.train.Saver()
episode_rewards_list, rewards_list, gloss_list, dloss_list = [], [], [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    total_step = 0 # Explore or exploit parameter
    episode_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        gloss_batch, dloss_batch = [], []
        gloss1_batch, gloss2_batch, gloss3_batch, gloss4_batch = [], [], [], []
        #state = env.reset()
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        state = env_info.vector_observations[0]                  # get the current state (for each agent)

        # Training steps/batches
        while True:
            # Explore (Env) or Exploit (Model)
            total_step += 1
            explore_p = explore_stop + (explore_start - explore_stop) * np.exp(-decay_rate * total_step) 
            if explore_p > np.random.rand():
                #action = env.action_space.sample()
                action = np.random.randn(num_agents, action_size) # select an action (for each agent)
            else:
                action = sess.run(model.actions_logits, feed_dict={model.states: state.reshape([1, -1])})
                #action = np.clip(action_logits, -1, 1)                  # all actions between -1 and 1
            #print(action.shape)
            #action = np.reshape(action_logits, [-1]) # For continuous action space
            #action = np.argmax(action_logits) # For discrete action space
            #next_state, reward, done, _ = env.step(action)
            env_info = env.step(action)[brain_name]           # send all actions to tne environment
            next_state = env_info.vector_observations[0]         # get next state (for each agent)
            reward = env_info.rewards[0]                         # get reward (for each agent)
            done = env_info.local_done[0]                        # see if episode finished
            memory.buffer.append([state, action.reshape([-1]), next_state, reward, float(done)])
            total_reward += reward
            state = next_state

            # Training
            batch = memory.sample(batch_size)
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            next_states = np.array([each[2] for each in batch])
            rewards = np.array([each[3] for each in batch])
            dones = np.array([each[4] for each in batch])
            nextQs_logits = sess.run(model.Qs_logits, feed_dict = {model.states: next_states})
            #nextQs = np.max(nextQs_logits, axis=1) * (1-dones)
            nextQs = nextQs_logits.reshape([-1]) * (1-dones)
            targetQs = rewards + (gamma * nextQs)
            gloss, dloss, gloss1, gloss2, gloss3, gloss4, _, _ = sess.run([model.g_loss, model.d_loss,
                                                                           model.g_loss1, model.g_loss2, 
                                                                           model.g_loss3, model.g_loss4,
                                                                           model.g_opt, model.d_opt],
                                                                          feed_dict = {model.states: states, 
                                                                                       model.actions: actions,
                                                                                       model.targetQs: targetQs})
            gloss_batch.append(gloss)
            dloss_batch.append(dloss)
            gloss1_batch.append(gloss1)
            gloss2_batch.append(gloss2)
            gloss3_batch.append(gloss3)
            gloss4_batch.append(gloss4)
            if done is True:
                break
                
        episode_reward.append(total_reward)
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episode_reward)),
              'R:{:.4f}'.format(total_reward),
              'gloss:{:.4f}'.format(np.mean(gloss_batch)),
              'gloss1:{:.4f}'.format(np.mean(gloss1_batch)), #-logp
              'gloss2:{:.4f}'.format(np.mean(gloss2_batch)),#gQs
              'gloss3:{:.4f}'.format(np.mean(gloss3_batch)),#dQs
              'gloss3:{:.4f}'.format(np.mean(gloss4_batch)),#tgtQs
              'dloss:{:.4f}'.format(np.mean(dloss_batch)),
              'exploreP:{:.4f}'.format(explore_p))

        # Ploting out
        episode_rewards_list.append([ep, np.mean(episode_reward)])
        rewards_list.append([ep, total_reward])
        gloss_list.append([ep, np.mean(gloss_batch)])
        dloss_list.append([ep, np.mean(dloss_batch)])
        # Break episode/epoch loop
        ## Option 1: Solve the First Version
        #The task is episodic, and in order to solve the environment, 
        #your agent must get an average score of +30 over 100 consecutive episodes.        
        if np.mean(episode_reward) >= +30:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model.ckpt')

Episode:0 meanR:0.0000 R:0.0000 gloss:-56690957.3866 gloss1:235.6740 gloss2:-53106.0561 gloss3:-757.9370 gloss3:-52563.8088 dloss:17872345392.9994 exploreP:0.9057
Episode:1 meanR:0.0000 R:0.0000 gloss:-197091622555.8232 gloss1:5465.1795 gloss2:-15010665.4542 gloss3:-248752.1567 gloss3:-14863050.8328 dloss:699288525925114.5000 exploreP:0.8204
Episode:2 meanR:0.0733 R:0.2200 gloss:-21823996820796.5352 gloss1:27200.6082 gloss2:-400135213.5598 gloss3:-9933475.8035 gloss3:-395184196.4187 dloss:385404671158926400.0000 exploreP:0.7432
Episode:3 meanR:0.1325 R:0.3100 gloss:-538003670743966.0625 gloss1:75349.3380 gloss2:-3934707667.5229 gloss3:-124382224.0168 gloss3:-3884383901.9273 dloss:31328690181519925248.0000 exploreP:0.6734
Episode:4 meanR:0.1060 R:0.0000 gloss:-4075030265848659.0000 gloss1:156994.0983 gloss2:-15647228332.9122 gloss3:-677526345.4778 gloss3:-15461796004.1642 dloss:393358503054281211904.0000 exploreP:0.6102
Episode:5 meanR:0.3067 R:1.3100 gloss:-15461075627377090.0000 gloss

Episode:39 meanR:0.6852 R:0.1600 gloss:-5950279075204475.0000 gloss1:2112909.8290 gloss2:-1879180914.0281 gloss3:-1496565686.1064 gloss3:-1854041948.1802 dloss:5927945300539445248.0000 exploreP:0.0281
Episode:40 meanR:0.6863 R:0.7300 gloss:-730666161631881.3750 gloss1:2133547.2653 gloss2:-294555099.3937 gloss3:-245113898.4305 gloss3:-290898349.1692 dloss:174903036122237376.0000 exploreP:0.0263
Episode:41 meanR:0.7050 R:1.4700 gloss:-552521659686737.9375 gloss1:2003321.1653 gloss2:-254036869.0358 gloss3:-217981549.3147 gloss3:-251197653.8609 dloss:169197490305222368.0000 exploreP:0.0248
Episode:42 meanR:0.7298 R:1.7700 gloss:-433960651146850.6250 gloss1:1819873.2931 gloss2:-227853663.6442 gloss3:-202071817.0506 gloss3:-225322411.1795 dloss:153482996823589824.0000 exploreP:0.0234
Episode:43 meanR:0.7320 R:0.8300 gloss:-412097321156087.3750 gloss1:1620295.3449 gloss2:-247283599.6357 gloss3:-225006992.2613 gloss3:-244300146.0609 dloss:186940641841459328.0000 exploreP:0.0221
Episode:44 mean

Episode:82 meanR:0.8435 R:1.3700 gloss:-1103291120426.5156 gloss1:2842969.8915 gloss2:-375732.7735 gloss3:-290817.8137 gloss3:-372556.4266 dloss:1256748540273.1531 exploreP:0.0102
Episode:83 meanR:0.8507 R:1.4500 gloss:4313759711120.8677 gloss1:2962653.0725 gloss2:914848.4433 gloss3:571513.2810 gloss3:903980.5216 dloss:2733253532060.5986 exploreP:0.0102
Episode:84 meanR:0.8418 R:0.0900 gloss:8725430658462.0244 gloss1:2435195.0976 gloss2:2676362.7383 gloss3:1785602.2378 gloss3:2647886.0758 dloss:8591472576955.8311 exploreP:0.0102
Episode:85 meanR:0.8383 R:0.5400 gloss:5885088550760.6895 gloss1:2448487.0602 gloss2:1819196.4852 gloss3:1259640.2242 gloss3:1799199.6710 dloss:3538508191859.0981 exploreP:0.0102
Episode:86 meanR:0.8360 R:0.6400 gloss:4336575155631.1528 gloss1:2455046.4480 gloss2:1312432.3495 gloss3:908527.9252 gloss3:1299360.5761 dloss:2327373632222.4478 exploreP:0.0102
Episode:87 meanR:0.8391 R:1.1100 gloss:2198415832990.9600 gloss1:2180041.8437 gloss2:720991.1062 gloss3:5132

Episode:129 meanR:0.8832 R:0.6000 gloss:-332730509055.4635 gloss1:1407127.9640 gloss2:-219878.4001 gloss3:-147576.2344 gloss3:-217441.4297 dloss:330013400347.1296 exploreP:0.0100
Episode:130 meanR:0.8864 R:1.6200 gloss:233711043061.3415 gloss1:1465732.7106 gloss2:112984.5211 gloss3:74051.6955 gloss3:111982.5789 dloss:174701456607.5004 exploreP:0.0100
Episode:131 meanR:0.8845 R:1.0700 gloss:-6940542373.2180 gloss1:1273928.8485 gloss2:33156.1344 gloss3:21616.4068 gloss3:33208.0038 dloss:187556264606.1618 exploreP:0.0100
Episode:132 meanR:0.8890 R:0.7600 gloss:640891441412.0739 gloss1:1321933.4916 gloss2:305113.7307 gloss3:200288.0813 gloss3:301971.3152 dloss:411438028260.8167 exploreP:0.0100
Episode:133 meanR:0.8934 R:0.7700 gloss:96049449352.0063 gloss1:1218279.6352 gloss2:72140.3076 gloss3:48861.4023 gloss3:71509.1895 dloss:140479094995.0535 exploreP:0.0100
Episode:134 meanR:0.8895 R:0.7200 gloss:-60532259753.5655 gloss1:1280792.7179 gloss2:-28784.7647 gloss3:-21892.8483 gloss3:-28288.

Episode:176 meanR:0.9502 R:0.7000 gloss:-161280544026.1105 gloss1:843428.5501 gloss2:-166549.2327 gloss3:-119337.2933 gloss3:-164603.8653 dloss:97922109352.0168 exploreP:0.0100
Episode:177 meanR:0.9519 R:0.6500 gloss:-30318258282.3437 gloss1:776149.1958 gloss2:-20775.3179 gloss3:-14290.7018 gloss3:-21097.7161 dloss:102654222841.5441 exploreP:0.0100
Episode:178 meanR:0.9484 R:0.0600 gloss:-649901110301.0834 gloss1:918533.5376 gloss2:-574986.4807 gloss3:-374811.1359 gloss3:-568617.7454 dloss:537144737714.6252 exploreP:0.0100
Episode:179 meanR:0.9583 R:1.5700 gloss:-816877182241.6438 gloss1:967648.0907 gloss2:-718799.2160 gloss3:-482832.6986 gloss3:-711276.6401 dloss:662771699902.1765 exploreP:0.0100
Episode:180 meanR:0.9482 R:0.4100 gloss:-502224858378.8238 gloss1:967501.9998 gloss2:-425521.4002 gloss3:-288938.0896 gloss3:-420747.9312 dloss:380716200821.8192 exploreP:0.0100
Episode:181 meanR:0.9393 R:0.4100 gloss:-182293439300.3600 gloss1:1096785.5006 gloss2:-150693.7894 gloss3:-101200.1

Episode:223 meanR:0.8988 R:1.5800 gloss:-127572423468.2816 gloss1:2674374.9805 gloss2:-51414.6970 gloss3:-37104.6297 gloss3:-50410.6171 dloss:350643287917.9797 exploreP:0.0100
Episode:224 meanR:0.9085 R:1.2200 gloss:-217753058822.8239 gloss1:2508032.4770 gloss2:-49619.8489 gloss3:-36293.2165 gloss3:-49149.8487 dloss:100581952352.9607 exploreP:0.0100
Episode:225 meanR:0.9063 R:0.6800 gloss:-172724680071.4601 gloss1:2333597.5070 gloss2:-23442.2094 gloss3:-18098.8772 gloss3:-23379.9286 dloss:138660668933.6297 exploreP:0.0100
Episode:226 meanR:0.9008 R:0.7300 gloss:54316048356.4335 gloss1:2039933.2946 gloss2:-16526.1090 gloss3:-12577.1180 gloss3:-15992.1922 dloss:191701729043.7051 exploreP:0.0100
Episode:227 meanR:0.9101 R:1.4400 gloss:42418789332.9798 gloss1:2072678.5684 gloss2:-17425.1748 gloss3:-14041.8711 gloss3:-17401.5419 dloss:255894075512.3324 exploreP:0.0100
Episode:228 meanR:0.9027 R:0.6400 gloss:-102768071335.1940 gloss1:1826592.1834 gloss2:-25657.0499 gloss3:-19033.8319 gloss3:

Episode:270 meanR:0.8760 R:0.2800 gloss:2514704497172.4634 gloss1:1938852.5470 gloss2:918780.6830 gloss3:633608.4012 gloss3:906865.4510 dloss:1046831494772.2832 exploreP:0.0100
Episode:271 meanR:0.8608 R:0.0800 gloss:3081705983433.2905 gloss1:1783127.5715 gloss2:1280696.2693 gloss3:891218.3044 gloss3:1265512.2022 dloss:1494427243536.4255 exploreP:0.0100
Episode:272 meanR:0.8591 R:0.5400 gloss:4592584800775.0674 gloss1:2015495.0479 gloss2:1531965.6776 gloss3:1063393.2184 gloss3:1514140.9771 dloss:2164390058926.1528 exploreP:0.0100
Episode:273 meanR:0.8469 R:0.2500 gloss:2900026613216.8247 gloss1:1876084.1379 gloss2:1025646.5094 gloss3:713548.8255 gloss3:1016129.0722 dloss:1093013800154.0880 exploreP:0.0100
Episode:274 meanR:0.8388 R:0.4300 gloss:1699976886103.9834 gloss1:1938317.1201 gloss2:608160.7209 gloss3:423090.7238 gloss3:601812.9356 dloss:453262400427.6885 exploreP:0.0100
Episode:275 meanR:0.8343 R:0.7100 gloss:1102853022635.3677 gloss1:1682468.4479 gloss2:436201.7192 gloss3:2980

Episode:318 meanR:0.8831 R:0.4500 gloss:41411478060.2340 gloss1:1184281.6354 gloss2:36558.0383 gloss3:24571.7146 gloss3:36013.1081 dloss:19488535094.0737 exploreP:0.0100
Episode:319 meanR:0.8929 R:1.3500 gloss:17439536490.8998 gloss1:1271202.1759 gloss2:15453.4357 gloss3:10714.2951 gloss3:15520.2554 dloss:32852059782.2746 exploreP:0.0100
Episode:320 meanR:0.8919 R:0.8000 gloss:-44411427614.5461 gloss1:1307318.3608 gloss2:-29857.4616 gloss3:-20656.5394 gloss3:-29829.6682 dloss:19146866131.3356 exploreP:0.0100
Episode:321 meanR:0.8858 R:0.2300 gloss:-65515907034.9123 gloss1:1176316.4019 gloss2:-43520.2270 gloss3:-29930.3837 gloss3:-42886.4289 dloss:20187658784.3690 exploreP:0.0100
Episode:322 meanR:0.8886 R:0.4200 gloss:-18784815679.6959 gloss1:1164516.6994 gloss2:-16636.3108 gloss3:-11479.6401 gloss3:-16456.7455 dloss:32626866733.3509 exploreP:0.0100
Episode:323 meanR:0.8787 R:0.5900 gloss:-30314570783.2963 gloss1:1110389.1435 gloss2:-23224.2068 gloss3:-16498.9872 gloss3:-23203.1440 dlo

Episode:367 meanR:0.8403 R:0.4300 gloss:17663475214.6165 gloss1:868773.6050 gloss2:5162.7227 gloss3:2914.7847 gloss3:5121.2636 dloss:13071598177.2636 exploreP:0.0100
Episode:368 meanR:0.8440 R:1.0200 gloss:13882786758.0909 gloss1:805530.7420 gloss2:13034.1095 gloss3:8249.3028 gloss3:12906.4145 dloss:4137828691.5625 exploreP:0.0100
Episode:369 meanR:0.8490 R:0.9200 gloss:24672952164.7669 gloss1:845902.9897 gloss2:18304.1527 gloss3:11751.6075 gloss3:18130.4799 dloss:3819704500.1678 exploreP:0.0100
Episode:370 meanR:0.8518 R:0.5600 gloss:29402165026.5758 gloss1:754242.3143 gloss2:27485.3019 gloss3:18076.0659 gloss3:27244.8910 dloss:8534120813.5428 exploreP:0.0100
Episode:371 meanR:0.8527 R:0.1700 gloss:36786652006.6562 gloss1:797151.7569 gloss2:32414.3440 gloss3:21397.0735 gloss3:32028.4899 dloss:3449647111.6702 exploreP:0.0100
Episode:372 meanR:0.8519 R:0.4600 gloss:64223765038.6398 gloss1:686018.5116 gloss2:66567.1705 gloss3:44007.6029 gloss3:65802.1158 dloss:10567773477.1487 exploreP:0

Episode:417 meanR:0.8803 R:0.7300 gloss:-17105259847.2916 gloss1:272560.0692 gloss2:-45247.4368 gloss3:-28836.0454 gloss3:-44713.6344 dloss:4168841519.2072 exploreP:0.0100
Episode:418 meanR:0.8834 R:0.7600 gloss:-70828939827.9684 gloss1:321249.0153 gloss2:-153413.5140 gloss3:-98916.9978 gloss3:-151671.5202 dloss:25035120719.3592 exploreP:0.0100
Episode:419 meanR:0.8784 R:0.8500 gloss:-34133281408.7380 gloss1:363302.2943 gloss2:-68119.1986 gloss3:-44499.5008 gloss3:-67395.7939 dloss:7490756966.4437 exploreP:0.0100
Episode:420 meanR:0.8836 R:1.3200 gloss:-1085812893.3460 gloss1:359855.9091 gloss2:-2802.5455 gloss3:-1767.9467 gloss3:-2770.1420 dloss:132859269.3594 exploreP:0.0100
Episode:421 meanR:0.8863 R:0.5000 gloss:-64341943.9021 gloss1:388883.1916 gloss2:-1124.6001 gloss3:-663.9080 gloss3:-1143.1217 dloss:1427227026.7204 exploreP:0.0100
Episode:422 meanR:0.9026 R:2.0500 gloss:2803813731.7672 gloss1:409637.1348 gloss2:3129.3888 gloss3:2128.6219 gloss3:3093.6414 dloss:1423437992.4100 e

Episode:467 meanR:0.9706 R:0.9200 gloss:-4828101428.6885 gloss1:307517.6416 gloss2:-10534.6240 gloss3:-6929.5542 gloss3:-10438.2665 dloss:482839749.6356 exploreP:0.0100
Episode:468 meanR:0.9719 R:1.1500 gloss:-1780559254.2507 gloss1:304392.7356 gloss2:-3785.3677 gloss3:-2515.5773 gloss3:-3781.1576 dloss:720461917.5787 exploreP:0.0100
Episode:469 meanR:0.9670 R:0.4300 gloss:-845212459.6759 gloss1:318272.1758 gloss2:-1340.5662 gloss3:-833.8312 gloss3:-1334.2462 dloss:572729191.5995 exploreP:0.0100
Episode:470 meanR:0.9655 R:0.4100 gloss:-1333335446.8501 gloss1:298458.4456 gloss2:-2447.0444 gloss3:-1586.9679 gloss3:-2435.2418 dloss:781529101.9523 exploreP:0.0100
Episode:471 meanR:0.9777 R:1.3900 gloss:-1987963741.7774 gloss1:340781.4493 gloss2:-3649.5545 gloss3:-2365.7975 gloss3:-3601.2530 dloss:656169135.8926 exploreP:0.0100
Episode:472 meanR:0.9792 R:0.6100 gloss:-703768178.9801 gloss1:328309.7327 gloss2:-3055.4902 gloss3:-1917.9864 gloss3:-3011.2570 dloss:2535324244.5405 exploreP:0.010

Episode:517 meanR:0.9790 R:1.3800 gloss:6049907150.0816 gloss1:339719.1313 gloss2:14496.6077 gloss3:9494.5839 gloss3:14289.6418 dloss:834816580.0965 exploreP:0.0100
Episode:518 meanR:0.9783 R:0.6900 gloss:1831849943.7054 gloss1:349477.6280 gloss2:3825.6758 gloss3:2574.0561 gloss3:3802.4576 dloss:584771632.7851 exploreP:0.0100
Episode:519 meanR:0.9707 R:0.0900 gloss:2873653997.9377 gloss1:392636.1935 gloss2:5435.7290 gloss3:3584.0682 gloss3:5383.1567 dloss:1071767143.5889 exploreP:0.0100
Episode:520 meanR:0.9679 R:1.0400 gloss:518282210.8589 gloss1:414483.1843 gloss2:2978.5544 gloss3:1954.9707 gloss3:2971.0839 dloss:546440895.4388 exploreP:0.0100
Episode:521 meanR:0.9737 R:1.0800 gloss:807089689.5289 gloss1:423923.5572 gloss2:394.3490 gloss3:281.8778 gloss3:397.4887 dloss:544295876.7140 exploreP:0.0100
Episode:522 meanR:0.9591 R:0.5900 gloss:-171708751.4502 gloss1:410841.6412 gloss2:933.7793 gloss3:632.2988 gloss3:911.4887 dloss:250599358.0581 exploreP:0.0100
Episode:523 meanR:0.9574 R:

Episode:568 meanR:0.8899 R:1.5200 gloss:1795703184.9296 gloss1:265775.9295 gloss2:4603.8916 gloss3:3118.8526 gloss3:4551.3006 dloss:68276225.4344 exploreP:0.0100
Episode:569 meanR:0.8986 R:1.3000 gloss:1508857493.5589 gloss1:257259.3049 gloss2:3929.6767 gloss3:2629.1726 gloss3:3871.4810 dloss:236025303.2814 exploreP:0.0100
Episode:570 meanR:0.9031 R:0.8600 gloss:1159854844.3593 gloss1:301227.1125 gloss2:3153.7898 gloss3:2097.8469 gloss3:3110.0622 dloss:219026019.7053 exploreP:0.0100
Episode:571 meanR:0.8951 R:0.5900 gloss:3031076752.1589 gloss1:280670.8386 gloss2:6221.0983 gloss3:4087.0602 gloss3:6159.4484 dloss:284876870.9999 exploreP:0.0100
Episode:572 meanR:0.8941 R:0.5100 gloss:3601116440.3573 gloss1:284391.5278 gloss2:7425.0991 gloss3:4797.1408 gloss3:7341.7570 dloss:244630646.3953 exploreP:0.0100
Episode:573 meanR:0.8873 R:0.6000 gloss:2242188370.3258 gloss1:259110.4880 gloss2:5653.0579 gloss3:3651.1109 gloss3:5596.4099 dloss:352870091.1960 exploreP:0.0100
Episode:574 meanR:0.898

Episode:619 meanR:0.8492 R:0.5100 gloss:828366953.0514 gloss1:165745.5034 gloss2:3704.2735 gloss3:2459.3311 gloss3:3687.2243 dloss:203130597.2315 exploreP:0.0100
Episode:620 meanR:0.8541 R:1.5300 gloss:1415829212.7805 gloss1:152615.4923 gloss2:6355.6909 gloss3:4168.3305 gloss3:6286.6740 dloss:159630517.9982 exploreP:0.0100
Episode:621 meanR:0.8517 R:0.8400 gloss:1162780806.3585 gloss1:134744.8382 gloss2:4996.5316 gloss3:3291.2734 gloss3:4977.2733 dloss:253395679.9740 exploreP:0.0100
Episode:622 meanR:0.8517 R:0.5900 gloss:542170140.1405 gloss1:141729.2497 gloss2:3276.3772 gloss3:2219.5037 gloss3:3239.0073 dloss:159067066.9653 exploreP:0.0100
Episode:623 meanR:0.8618 R:1.5300 gloss:100630372.4284 gloss1:119455.3289 gloss2:1170.2079 gloss3:854.3086 gloss3:1160.3421 dloss:155756933.4090 exploreP:0.0100
Episode:624 meanR:0.8509 R:0.1000 gloss:349591076.7989 gloss1:112591.3232 gloss2:1452.6813 gloss3:1051.9051 gloss3:1448.3326 dloss:189282903.6548 exploreP:0.0100
Episode:625 meanR:0.8591 R:

Episode:670 meanR:0.8656 R:0.4300 gloss:-1926900094.2134 gloss1:116750.1559 gloss2:-13779.5164 gloss3:-8873.7284 gloss3:-13629.7517 dloss:321604400.1690 exploreP:0.0100
Episode:671 meanR:0.8628 R:0.3100 gloss:-2106164276.5454 gloss1:136361.6810 gloss2:-13455.5004 gloss3:-8673.0038 gloss3:-13321.5746 dloss:694109423.7514 exploreP:0.0100
Episode:672 meanR:0.8611 R:0.3400 gloss:-70887232.6316 gloss1:129336.6823 gloss2:198.0275 gloss3:184.3239 gloss3:199.0461 dloss:47858340.1504 exploreP:0.0100
Episode:673 meanR:0.8558 R:0.0700 gloss:-127023877.6994 gloss1:128932.4795 gloss2:-583.8211 gloss3:-332.2090 gloss3:-578.4078 dloss:216548841.0778 exploreP:0.0100
Episode:674 meanR:0.8506 R:0.6700 gloss:-7526716.5602 gloss1:148771.9937 gloss2:912.4320 gloss3:616.9451 gloss3:896.8869 dloss:118879981.0288 exploreP:0.0100
Episode:675 meanR:0.8475 R:0.4700 gloss:-4012390.7952 gloss1:143266.9322 gloss2:-535.0921 gloss3:-298.9380 gloss3:-522.7189 dloss:203973650.1829 exploreP:0.0100
Episode:676 meanR:0.84

Episode:721 meanR:0.8916 R:0.5100 gloss:-249484866.0809 gloss1:235282.8572 gloss2:-351.2262 gloss3:-230.6301 gloss3:-339.7783 dloss:109820740.6617 exploreP:0.0100
Episode:722 meanR:0.8997 R:1.4000 gloss:-334544123.1329 gloss1:268867.7564 gloss2:-1587.9353 gloss3:-1054.0118 gloss3:-1565.5819 dloss:341353048.1069 exploreP:0.0100
Episode:723 meanR:0.8887 R:0.4300 gloss:-1114631183.5382 gloss1:298900.1983 gloss2:-1958.8531 gloss3:-1289.4621 gloss3:-1917.6631 dloss:316230558.2711 exploreP:0.0100
Episode:724 meanR:0.9076 R:1.9900 gloss:-152169126.3082 gloss1:280008.1538 gloss2:-1202.0273 gloss3:-825.1778 gloss3:-1191.8299 dloss:134918870.5129 exploreP:0.0100
Episode:725 meanR:0.9037 R:1.0100 gloss:-53414928.8485 gloss1:303271.0494 gloss2:-838.4453 gloss3:-625.0701 gloss3:-804.1694 dloss:348183255.1539 exploreP:0.0100
Episode:726 meanR:0.9193 R:1.6100 gloss:-553440016.4699 gloss1:323242.2128 gloss2:-1408.3575 gloss3:-962.2569 gloss3:-1382.4478 dloss:168380168.9548 exploreP:0.0100
Episode:727 

Episode:771 meanR:0.8870 R:0.4500 gloss:-3590381688.6174 gloss1:340722.3281 gloss2:-7262.3251 gloss3:-4914.4382 gloss3:-7165.4881 dloss:325070612.8550 exploreP:0.0100
Episode:772 meanR:0.8912 R:0.7600 gloss:-8269851792.7930 gloss1:350573.6647 gloss2:-14340.2583 gloss3:-9671.1201 gloss3:-14167.9138 dloss:592990136.5633 exploreP:0.0100
Episode:773 meanR:0.8919 R:0.1400 gloss:-9432759276.2728 gloss1:358999.1791 gloss2:-18395.3860 gloss3:-12343.7609 gloss3:-18139.8391 dloss:803963907.6239 exploreP:0.0100
Episode:774 meanR:0.8963 R:1.1100 gloss:-2333163847.0698 gloss1:373312.6311 gloss2:-6805.3580 gloss3:-4631.7703 gloss3:-6719.2283 dloss:424724682.0523 exploreP:0.0100
Episode:775 meanR:0.8942 R:0.2600 gloss:-2714051127.3395 gloss1:417885.7578 gloss2:-3349.1366 gloss3:-2231.8196 gloss3:-3323.8159 dloss:586544892.5516 exploreP:0.0100
Episode:776 meanR:0.8857 R:0.4600 gloss:-3592265491.9282 gloss1:445505.7167 gloss2:-6489.1431 gloss3:-4398.7815 gloss3:-6429.0888 dloss:452965290.9774 exploreP:

Episode:821 meanR:0.9223 R:0.8600 gloss:3090056211.7138 gloss1:565635.2771 gloss2:2560.2029 gloss3:1633.9784 gloss3:2510.5553 dloss:705156704.8304 exploreP:0.0100
Episode:822 meanR:0.9215 R:1.3200 gloss:3848920248.3099 gloss1:506138.6417 gloss2:6451.2028 gloss3:4324.0313 gloss3:6402.5357 dloss:614122742.4743 exploreP:0.0100
Episode:823 meanR:0.9286 R:1.1400 gloss:5676384908.3479 gloss1:532425.3045 gloss2:6985.7684 gloss3:4475.6636 gloss3:6698.8416 dloss:718521388.5353 exploreP:0.0100
Episode:824 meanR:0.9207 R:1.2000 gloss:3973831247.7468 gloss1:490577.7527 gloss2:5161.6547 gloss3:3366.5440 gloss3:5073.4705 dloss:478569056.6790 exploreP:0.0100
Episode:825 meanR:0.9189 R:0.8300 gloss:4862951584.5041 gloss1:479239.5194 gloss2:6251.3743 gloss3:4118.7884 gloss3:6174.5267 dloss:557526792.5362 exploreP:0.0100
Episode:826 meanR:0.9176 R:1.4800 gloss:-275165340.4948 gloss1:409491.9550 gloss2:1592.3783 gloss3:1032.3844 gloss3:1567.5430 dloss:832933166.3404 exploreP:0.0100
Episode:827 meanR:0.91

Episode:871 meanR:0.8877 R:1.0200 gloss:-115122838.7078 gloss1:634702.2673 gloss2:-1182.8772 gloss3:-933.9527 gloss3:-1137.3356 dloss:998370044.5828 exploreP:0.0100
Episode:872 meanR:0.8925 R:1.2400 gloss:-11956228903.3510 gloss1:598706.2020 gloss2:-14844.7002 gloss3:-10139.8133 gloss3:-14663.9252 dloss:867525270.9008 exploreP:0.0100
Episode:873 meanR:0.9040 R:1.2900 gloss:-23618372608.3533 gloss1:678656.5576 gloss2:-26057.5457 gloss3:-17724.1168 gloss3:-25757.5789 dloss:1962949795.1588 exploreP:0.0100
Episode:874 meanR:0.8999 R:0.7000 gloss:-37293381096.6632 gloss1:669328.1928 gloss2:-38956.8053 gloss3:-26276.3270 gloss3:-38583.9536 dloss:2268754832.6718 exploreP:0.0100
Episode:875 meanR:0.9067 R:0.9400 gloss:-47474297429.7649 gloss1:725834.8291 gloss2:-45253.2481 gloss3:-30375.0427 gloss3:-44733.5698 dloss:3581400607.0758 exploreP:0.0100
Episode:876 meanR:0.9050 R:0.2900 gloss:-21172126222.7953 gloss1:661533.6969 gloss2:-20077.7042 gloss3:-13652.0543 gloss3:-19841.1040 dloss:11456953

Episode:921 meanR:0.8826 R:1.7400 gloss:349346803.3570 gloss1:598247.5305 gloss2:2292.9076 gloss3:1487.1327 gloss3:2249.5032 dloss:648424930.1535 exploreP:0.0100
Episode:922 meanR:0.8729 R:0.3500 gloss:-6772171298.0012 gloss1:544445.5887 gloss2:-6100.2304 gloss3:-4324.7369 gloss3:-6014.2427 dloss:1330482566.4274 exploreP:0.0100
Episode:923 meanR:0.8672 R:0.5700 gloss:2955322576.5482 gloss1:535834.1880 gloss2:2975.9799 gloss3:1908.6367 gloss3:2923.8676 dloss:450449421.6559 exploreP:0.0100
Episode:924 meanR:0.8609 R:0.5700 gloss:2453353313.3908 gloss1:455721.1602 gloss2:4183.5898 gloss3:2724.2827 gloss3:4130.4966 dloss:927638045.1130 exploreP:0.0100
Episode:925 meanR:0.8590 R:0.6400 gloss:-389795670.2865 gloss1:430213.5555 gloss2:425.4235 gloss3:193.8222 gloss3:442.9693 dloss:632670509.6403 exploreP:0.0100
Episode:926 meanR:0.8478 R:0.3600 gloss:-897937733.3415 gloss1:417877.1700 gloss2:-1956.3072 gloss3:-1426.6011 gloss3:-1932.0905 dloss:798131780.7956 exploreP:0.0100
Episode:927 meanR:

Episode:971 meanR:0.9039 R:1.0500 gloss:4227762263.8185 gloss1:507069.3872 gloss2:-905.6288 gloss3:-742.8857 gloss3:-938.4371 dloss:6012070098.2713 exploreP:0.0100
Episode:972 meanR:0.8937 R:0.2200 gloss:858749134.5447 gloss1:469214.3204 gloss2:2008.0859 gloss3:1187.4556 gloss3:1989.3409 dloss:280050736.4971 exploreP:0.0100
Episode:973 meanR:0.8925 R:1.1700 gloss:-1464375043.7604 gloss1:526542.9842 gloss2:-833.3303 gloss3:-785.6697 gloss3:-813.7278 dloss:819818082.1386 exploreP:0.0100
Episode:974 meanR:0.8897 R:0.4200 gloss:-1611455421.9368 gloss1:557321.1611 gloss2:-838.4168 gloss3:-827.7123 gloss3:-858.2406 dloss:1482341284.0745 exploreP:0.0100
Episode:975 meanR:0.8925 R:1.2200 gloss:-4166913378.2061 gloss1:653083.8207 gloss2:-4366.6439 gloss3:-3015.1554 gloss3:-4306.3464 dloss:2047490966.3059 exploreP:0.0100
Episode:976 meanR:0.8960 R:0.6400 gloss:-10711939935.4392 gloss1:673714.2231 gloss2:-13724.9120 gloss3:-9383.5151 gloss3:-13569.5408 dloss:416575414.3861 exploreP:0.0100
Episode

Episode:1021 meanR:0.8576 R:0.5500 gloss:-5785409450.6093 gloss1:777081.8912 gloss2:-6714.1220 gloss3:-4740.4421 gloss3:-6631.6602 dloss:73281208.5414 exploreP:0.0100
Episode:1022 meanR:0.8584 R:0.4300 gloss:-615920294.9459 gloss1:771902.1323 gloss2:867.4775 gloss3:436.0593 gloss3:853.6936 dloss:339692747.8162 exploreP:0.0100
Episode:1023 meanR:0.8536 R:0.0900 gloss:283165504.7918 gloss1:690236.7953 gloss2:1005.2903 gloss3:536.9026 gloss3:980.3529 dloss:980393531.1278 exploreP:0.0100
Episode:1024 meanR:0.8479 R:0.0000 gloss:2829074757.0359 gloss1:711636.9886 gloss2:2652.4055 gloss3:1644.7045 gloss3:2619.2436 dloss:1147686593.4788 exploreP:0.0100
Episode:1025 meanR:0.8516 R:1.0100 gloss:4142901390.8835 gloss1:661830.1567 gloss2:4014.8400 gloss3:2613.3535 gloss3:3994.7273 dloss:541677977.5456 exploreP:0.0100
Episode:1026 meanR:0.8561 R:0.8100 gloss:3225701204.5692 gloss1:675181.4107 gloss2:3907.6196 gloss3:2558.2274 gloss3:3884.5379 dloss:1128712303.5473 exploreP:0.0100
Episode:1027 mean

Episode:1071 meanR:0.8210 R:1.7500 gloss:846717511.3554 gloss1:535185.0227 gloss2:521.3978 gloss3:214.5513 gloss3:519.8773 dloss:441227546.9439 exploreP:0.0100
Episode:1072 meanR:0.8247 R:0.5900 gloss:-1795443909.9352 gloss1:560044.0148 gloss2:-167.0967 gloss3:-199.3108 gloss3:-129.2624 dloss:1630987670.3675 exploreP:0.0100
Episode:1073 meanR:0.8237 R:1.0700 gloss:886634865.8097 gloss1:640411.6786 gloss2:528.4757 gloss3:225.1598 gloss3:516.6863 dloss:89920459.9992 exploreP:0.0100
Episode:1074 meanR:0.8266 R:0.7100 gloss:-2432095785.4879 gloss1:671343.2542 gloss2:-1201.7861 gloss3:-939.6142 gloss3:-1182.6455 dloss:721185811.9068 exploreP:0.0100
Episode:1075 meanR:0.8248 R:1.0400 gloss:-5320907736.0231 gloss1:741304.7295 gloss2:-2277.2010 gloss3:-1645.6360 gloss3:-2160.2923 dloss:1015747549.9279 exploreP:0.0100
Episode:1076 meanR:0.8264 R:0.8000 gloss:-702325805.1867 gloss1:765651.5180 gloss2:-2568.1379 gloss3:-1894.4719 gloss3:-2533.0750 dloss:415963798.6275 exploreP:0.0100
Episode:1077

Episode:1121 meanR:0.8323 R:0.9500 gloss:83782525.1122 gloss1:772606.0712 gloss2:2544.8114 gloss3:1640.2512 gloss3:2507.9508 dloss:1461177544.4511 exploreP:0.0100
Episode:1122 meanR:0.8312 R:0.3200 gloss:3717770657.2911 gloss1:819987.5670 gloss2:2598.5657 gloss3:1595.8186 gloss3:2592.4550 dloss:672240661.0810 exploreP:0.0100
Episode:1123 meanR:0.8344 R:0.4100 gloss:4689043377.5974 gloss1:878176.5190 gloss2:295.4026 gloss3:-26.9580 gloss3:293.6682 dloss:1006122693.0218 exploreP:0.0100
Episode:1124 meanR:0.8444 R:1.0000 gloss:-8621555115.1747 gloss1:932547.6902 gloss2:-3637.5480 gloss3:-2988.7766 gloss3:-3685.0098 dloss:19877220821.4715 exploreP:0.0100
Episode:1125 meanR:0.8395 R:0.5200 gloss:-4489692012.3271 gloss1:927696.4310 gloss2:-4131.0324 gloss3:-3027.9267 gloss3:-4089.4072 dloss:25578684.6735 exploreP:0.0100
Episode:1126 meanR:0.8445 R:1.3100 gloss:-22076821162.0886 gloss1:1005634.7156 gloss2:-18022.2171 gloss3:-12617.2503 gloss3:-17817.6321 dloss:669827455.2574 exploreP:0.0100
E