# Continuous Control

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`
- **Linux** (x86, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86"`
- **Linux** (x86_64, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```python
env = UnityEnvironment(file_name="Reacher.app")
```

In [2]:
# env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux/Reacher.x86_64')
# env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux_v1/Reacher.x86_64')
# env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux_OneAgent/Reacher_Linux/Reacher.x86_64')
env = UnityEnvironment(file_name='/home/arasdar/unity-envs/Reacher_Linux_NoVis_OneAgent/Reacher_Linux_NoVis/Reacher.x86_64')
# env = UnityEnvironment(file_name='/home/aras/unity-envs/Reacher_Linux_NoVis/Reacher.x86_64')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, a double-jointed arm can move to target locations. A reward of `+0.1` is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of `33` variables corresponding to position, rotation, velocity, and angular velocities of the arm.  Each action is a vector with four numbers, corresponding to torque applicable to two joints.  Every entry in the action vector must be a number between `-1` and `1`.
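
As a minimal sketch of the action format described above, the snippet below draws one random action per agent and clips every entry into `[-1, 1]`. The values `num_agents = 1` and `action_size = 4` are assumptions matching the single-agent build used in this notebook, and the fixed seed is added only for repeatability.

```python
import numpy as np

num_agents = 1   # assumed: the single-agent Reacher build
action_size = 4  # per-agent action size reported by the environment

rng = np.random.RandomState(0)  # fixed seed, for repeatability only
raw_actions = rng.randn(num_agents, action_size)  # unbounded Gaussian samples
actions = np.clip(raw_actions, -1, 1)             # enforce the [-1, 1] bound

print(actions.shape)
```

Clipping (rather than rescaling) keeps small actions unchanged and only saturates the entries that fall outside the valid range.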

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 1
Size of each action: 4
There are 1 agents. Each observes a state with length: 33
The state for the first agent looks like: [ 0.00000000e+00 -4.00000000e+00  0.00000000e+00  1.00000000e+00
 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00 -1.00000000e+01  0.00000000e+00
  1.00000000e+00 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.75471878e+00 -1.00000000e+00
  5.55726671e+00  0.00000000e+00  1.00000000e+00  0.00000000e+00
 -1.68164849e-01]


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you can watch how the agent performs when it selects an action at random at each time step.  A window should pop up that lets you observe the agent as it moves through the environment.  

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
# env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
# states = env_info.vector_observations                  # get the current state (for each agent)
# scores = np.zeros(num_agents)                          # initialize the score (for each agent)
# while True:
#     actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
#     actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
#     env_info = env.step(actions)[brain_name]           # send all actions to the environment
#     next_states = env_info.vector_observations         # get next state (for each agent)
#     rewards = env_info.rewards                         # get reward (for each agent)
#     dones = env_info.local_done                        # see if episode finished
#     scores += env_info.rewards                         # update the score (for each agent)
#     states = next_states                               # roll over states to next time step
#     if np.any(dones):                                  # exit loop if episode finished
#         break
# print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

When finished, you can close the environment.

In [6]:
# env.close()

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [7]:
# # Testing the train mode
# env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
# state = env_info.vector_observations[0]                  # get the current state (for each agent)
# #scores = np.zeros(num_agents)                          # initialize the score (for each agent)
# num_steps = 0
# while True:
#     num_steps += 1
#     action = np.random.randn(num_agents, action_size) # select an action (for each agent)
#     #print(action)
#     action = np.clip(action, -1, 1)                  # all actions between -1 and 1
#     #print(action)
#     env_info = env.step(action)[brain_name]           # send all actions to the environment
#     next_state = env_info.vector_observations[0]         # get next state (for each agent)
#     reward = env_info.rewards[0]                         # get reward (for each agent)
#     done = env_info.local_done[0]                        # see if episode finished
#     #scores += env_info.rewards                         # update the score (for each agent)
#     state = next_state                               # roll over states to next time step
#     if done is True:                                  # exit loop if episode finished
#         #print(action.shape, reward)
#         #print(done)
#         break
# print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))
# num_steps

## Option 1: Solve the First Version
The task is episodic, and in order to solve the environment, your agent must get an average score of +30 over 100 consecutive episodes.
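
The solve criterion above can be tracked with a fixed-length window of episode scores. The sketch below is one possible reading of it: the helper name `is_solved` is hypothetical, and it additionally requires the window to be full, which is stricter than checking the running mean alone.

```python
from collections import deque


def is_solved(scores_window, target=30.0, window=100):
    """Return True once `window` scores are recorded and their mean >= target."""
    return len(scores_window) == window and sum(scores_window) / window >= target


scores_window = deque(maxlen=100)  # keeps only the latest 100 episode scores
for episode_score in [31.0] * 99:  # made-up scores, for illustration only
    scores_window.append(episode_score)
print(is_solved(scores_window))    # checked with only 99 episodes recorded

scores_window.append(31.0)
print(is_solved(scores_window))    # checked with a full 100-episode window
```

Because the `deque` has `maxlen=100`, older scores drop out automatically, so the mean is always taken over the most recent episodes.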

In [8]:
# Import TensorFlow and check whether a GPU is available
# (an empty device name means TensorFlow will run on the CPU)
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.7.1
Default GPU Device: 


In [9]:
def model_input(state_size, action_size):
    #states = tf.placeholder(tf.float32, [None, *state_shape], name='states')
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    actions = tf.placeholder(tf.float32, [None, action_size], name='actions')
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    rates = tf.placeholder(tf.float32, [None], name='rates')
    training = tf.placeholder(tf.bool, [], name='training')
    return states, actions, targetQs, rates, training

In [10]:
# Generator/Controller: Generating/predicting the actions
def generator(states, action_size, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=hidden_size)
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=action_size) # [-inf, +inf]        
        predictions = tf.nn.tanh(logits) # [-1, +1]
        return predictions
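
The two nonlinearities used in the generator can be checked in isolation. The NumPy sketch below (not the TensorFlow graph itself) shows that `max(alpha * x, x)` acts as a leaky ReLU when `0 < alpha < 1`, and that `tanh` squashes unbounded outputs into the `[-1, 1]` range the environment expects. The input values are made up for illustration.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    """Elementwise max(alpha * x, x); a leaky ReLU when 0 < alpha < 1."""
    return np.maximum(alpha * x, x)


pre_activations = np.array([-2.0, 0.0, 3.0])
hidden = leaky_relu(pre_activations)  # negative inputs are scaled by alpha

logits = np.array([-50.0, -0.5, 0.0, 0.5, 50.0])  # unbounded network outputs
actions = np.tanh(logits)                          # squashed into [-1, 1]

print(hidden)
print(actions)
```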

In [11]:
# Discriminator/Dopamine: Reward function/planner/navigator/advisor/supervisor/cortical columns
def discriminator(states, actions, hidden_size, reuse=False, alpha=0.1, training=False):
    with tf.variable_scope('discriminator', reuse=reuse):
        # First fully connected layer
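        # NOTE: action_size is not a parameter of this function; it is read from the notebook's global scope
        h1 = tf.layers.dense(inputs=states, units=action_size)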
        bn1 = tf.layers.batch_normalization(h1, training=training)        
        nl1 = tf.maximum(alpha * bn1, bn1)
        
        # Second fully connected layer
        fused = tf.concat(axis=1, values=[nl1, actions])
        h2 = tf.layers.dense(inputs=fused, units=hidden_size)
        bn2 = tf.layers.batch_normalization(h2, training=training)        
        nl2 = tf.maximum(alpha * bn2, bn2)
                
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=1)        
        #predictions = tf.nn.softmax(logits)
        return logits

In [12]:
def model_loss(action_size, hidden_size, states, actions, targetQs, rates, training):
    actions_preds = generator(states=states, hidden_size=hidden_size, action_size=action_size, training=training)
    gQs = discriminator(actions=actions_preds, hidden_size=hidden_size, states=states, training=training)
    gloss = -tf.reduce_mean(gQs)
    dQs = discriminator(actions=actions, hidden_size=hidden_size, states=states, training=training, reuse=True)
    rates = tf.reshape(rates, shape=[-1, 1]) # [0, 1]
    dloss = tf.nn.sigmoid_cross_entropy_with_logits(logits=dQs, # [-inf, +inf] to [0, 1]
                                                    labels=rates) # [0, 1]
                                                    #labels=(rates+1)/2) # [-1, +1] to [0, 1]
    targetQs = tf.reshape(targetQs, shape=[-1, 1])
    #dloss = tf.reduce_mean(tf.square(dQs - targetQs)) # DQN
    dloss += tf.nn.sigmoid_cross_entropy_with_logits(logits=dQs, # [-inf, +inf] to [0, 1]
                                                     labels=tf.nn.sigmoid(targetQs)) # [0, 1]
    return actions_preds, gQs, gloss, dloss
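
The discriminator loss above feeds soft labels in `[0, 1]` to a sigmoid cross-entropy. As a rough NumPy sketch of what that loss computes per element, the snippet below uses the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))` and compares it with the textbook definition. The logits and labels are made-up stand-ins, not values from the model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_cross_entropy(logits, labels):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))


logits = np.array([-2.0, 0.0, 1.5])  # stand-in for discriminator outputs
rates = np.array([0.0, 0.5, 1.0])    # soft labels in [0, 1]

loss = sigmoid_cross_entropy(logits, rates)

# Textbook definition, fine for moderate logits but prone to overflow otherwise
probs = sigmoid(logits)
naive = -(rates * np.log(probs) + (1 - rates) * np.log(1 - probs))

print(loss)
```

With a soft label such as `0.5`, the loss is minimized when the logit is `0`, so the label acts as a target probability rather than a hard class.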

In [13]:
# Optimizing/training/learning G & D
def model_opt(g_loss, d_loss, g_learning_rate, d_learning_rate):
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): # Required for batchnorm (BN)
        g_opt = tf.train.AdamOptimizer(g_learning_rate).minimize(g_loss, var_list=g_vars)
        d_opt = tf.train.AdamOptimizer(d_learning_rate).minimize(d_loss, var_list=d_vars)
    return g_opt, d_opt

In [14]:
class Model:
    def __init__(self, state_size, action_size, hidden_size, g_learning_rate, d_learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs, self.rates, self.training = model_input(
            state_size=state_size, action_size=action_size)

        # Create the Model: calculating the loss and forward pass
        self.actions_preds, self.Qs_logits, self.g_loss, self.d_loss = model_loss(
            action_size=action_size, hidden_size=hidden_size, # model init parameters
            states=self.states, actions=self.actions, targetQs=self.targetQs, 
            rates=self.rates, training=self.training) # model input
        
        # Update the model: backward pass and backprop
        self.g_opt, self.d_opt = model_opt(g_loss=self.g_loss, d_loss=self.d_loss,
                                           g_learning_rate=g_learning_rate, d_learning_rate=d_learning_rate)

In [15]:
def sample(buffer, batch_size):
    idx = np.random.choice(np.arange(len(buffer)), 
                           size=batch_size, 
                           replace=False)
    return [buffer[ii] for ii in idx]

In [16]:
from collections import deque
class Memory():
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), 
                               size=batch_size, 
                               replace=False)
        return [self.buffer[ii] for ii in idx]
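
To see how a bounded `deque` behaves as a replay memory, here is a small standalone sketch. It uses a tiny capacity and placeholder tuples (both assumptions for illustration) and samples without replacement in the same way as the cells above.

```python
from collections import deque

import numpy as np

buffer = deque(maxlen=5)   # tiny capacity, for illustration only
for step in range(8):      # push more items than the buffer can hold
    buffer.append(("state_%d" % step, step))

np.random.seed(0)          # fixed seed, for repeatability only
idx = np.random.choice(np.arange(len(buffer)), size=3, replace=False)
batch = [buffer[i] for i in idx]

print(len(buffer), buffer[0])
print(batch)
```

Once the buffer is full, each `append` silently discards the oldest entry. Note that `replace=False` requires at least `size` items in the buffer, otherwise `np.random.choice` raises a `ValueError`.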

In [17]:
# inspect the shapes and sizes reported by the environment and the brain
env_info.vector_observations.shape, env_info.previous_vector_actions.shape, \
brain.vector_action_space_size, brain.number_visual_observations, \
brain.vector_action_space_size, brain.vector_observation_space_size

((1, 33), (1, 4), 4, 0, 4, 33)

In [18]:
# Exploration parameters
explore_start = 1.0            # exploration probability at start
explore_stop = 0.01           # minimum exploration probability 
decay_rate = 0.0001            # exponential decay rate for exploration prob

# Network parameters
state_size = 33
action_size = 4
hidden_size = 33*2             # number of units in each hidden layer
g_learning_rate = 1e-4         # generator learning rate
d_learning_rate = 1e-4         # discriminator learning rate

# Memory parameters
memory_size = int(1e5)            # memory capacity
batch_size = int(1e2)             # experience mini-batch size
gamma = 0.99                   # future reward discount
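
The exploration parameters above are described as an exponential decay, but the training loop in this notebook adds Gaussian noise to the actions and does not use them. For reference, the sketch below shows one common way such parameters are turned into a schedule; the formula and the helper name `explore_probability` are assumptions, not something defined elsewhere in this notebook.

```python
import math

explore_start = 1.0   # exploration probability at start
explore_stop = 0.01   # minimum exploration probability
decay_rate = 0.0001   # exponential decay rate for exploration prob


def explore_probability(step):
    """Assumed schedule: decay exponentially from explore_start to explore_stop."""
    return explore_stop + (explore_start - explore_stop) * math.exp(-decay_rate * step)


for step in (0, 10000, 100000):
    print(step, round(explore_probability(step), 4))
```

Under this schedule the probability starts at `explore_start` and approaches `explore_stop` from above without ever dropping below it.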

In [19]:
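# Reset/init the graph/session
tf.reset_default_graph()          # returns None, so its result is not assigned
graph = tf.get_default_graph()    # handle to the fresh default graph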

# Init the model
model = Model(action_size=action_size, state_size=state_size, hidden_size=hidden_size,
              g_learning_rate=g_learning_rate, d_learning_rate=d_learning_rate)

# Init the memory
memory = Memory(max_size=memory_size)

In [20]:
# Initializing the memory buffer
env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
state = env_info.vector_observations[0]                  # get the current state (for each agent)
total_reward = 0
num_step = 0
for each_step in range(memory_size):
    action = np.random.randn(num_agents, action_size) # select an action (for each agent)
    action = np.clip(action, -1, 1) # [-1, +1]
    env_info = env.step(action)[brain_name]           # send all actions to the environment
    next_state = env_info.vector_observations[0]         # get next state (for each agent)
    reward = env_info.rewards[0]                         # get reward (for each agent)
    done = env_info.local_done[0]                        # see if episode finished
    rate = -1 # success rate: [-1, +1]
    memory.buffer.append([state, action.reshape([-1]), next_state, reward, float(done), rate])
    num_step += 1 # memory updated
    total_reward += reward # max reward 30
    state = next_state
    if done:
        print('Progress:', each_step/memory_size)
        #state = env.reset()
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        state = env_info.vector_observations[0]                  # get the current state (for each agent)
        rate = total_reward/30
        rate = np.clip(rate, -1, 1) # [-1, +1]
        total_reward = 0 # reset
        for idx in range(num_step): # episode length
            if memory.buffer[-1-idx][-1] == -1:
                memory.buffer[-1-idx][-1] = rate
        num_step = 0 # reset

Progress: 0.01
Progress: 0.02001
Progress: 0.03002
Progress: 0.04003
Progress: 0.05004
Progress: 0.06005
Progress: 0.07006
Progress: 0.08007
Progress: 0.09008
Progress: 0.10009
Progress: 0.1101
Progress: 0.12011
Progress: 0.13012
Progress: 0.14013
Progress: 0.15014
Progress: 0.16015
Progress: 0.17016
Progress: 0.18017
Progress: 0.19018
Progress: 0.20019
Progress: 0.2102
Progress: 0.22021
Progress: 0.23022
Progress: 0.24023
Progress: 0.25024
Progress: 0.26025
Progress: 0.27026
Progress: 0.28027
Progress: 0.29028
Progress: 0.30029
Progress: 0.3103
Progress: 0.32031
Progress: 0.33032
Progress: 0.34033
Progress: 0.35034
Progress: 0.36035
Progress: 0.37036
Progress: 0.38037
Progress: 0.39038
Progress: 0.40039
Progress: 0.4104
Progress: 0.42041
Progress: 0.43042
Progress: 0.44043
Progress: 0.45044
Progress: 0.46045
Progress: 0.47046
Progress: 0.48047
Progress: 0.49048
Progress: 0.50049
Progress: 0.5105
Progress: 0.52051
Progress: 0.53052
Progress: 0.54053
Progress: 0.55054
Progress: 0.56055
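
The memory-filling cell above stores every transition with a placeholder rate of `-1` and overwrites it once the episode ends. The sketch below isolates that step on a plain list with made-up transitions. The helper name `rate_episode` is hypothetical, and `max_reward=30.0` mirrors the target score used above.

```python
def rate_episode(buffer, num_step, total_reward, max_reward=30.0):
    """Overwrite the -1 placeholder rate on the last `num_step` transitions."""
    rate = max(-1.0, min(1.0, total_reward / max_reward))  # clip to [-1, 1]
    for idx in range(num_step):
        if buffer[-1 - idx][-1] == -1:
            buffer[-1 - idx][-1] = rate
    return rate


# Hypothetical transitions: [state, action, next_state, reward, done, rate]
buffer = [
    ["s0", "a0", "s1", 0.0, 0.0, 0.5],  # already rated in an earlier episode
    ["s1", "a1", "s2", 0.1, 0.0, -1],   # unrated
    ["s2", "a2", "s3", 0.1, 1.0, -1],   # unrated, terminal step
]
rate = rate_episode(buffer, num_step=2, total_reward=3.0)
print(rate, [transition[-1] for transition in buffer])
```

Every transition in an episode receives the same rate, so the label reflects how well the whole episode went rather than the quality of any single step.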


In [None]:
# Save/load the model and save for plotting
saver = tf.train.Saver()
episode_rewards_list, rewards_list, gloss_list, dloss_list = [], [], [], []

# TF session for training
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    #saver.restore(sess, 'checkpoints/model.ckpt')    
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    episode_reward = deque(maxlen=100) # 100 episodes average/running average/running mean/window
    
    # Training episodes/epochs
    for ep in range(11111):
        total_reward = 0
        num_step = 0
        gloss_batch, dloss_batch = [], []
        #state = env.reset()
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        state = env_info.vector_observations[0]                  # get the current state (for each agent)

        # Training steps/batches
        while True:
            action_preds = sess.run(model.actions_preds, feed_dict={model.states: state.reshape([1, -1]), 
                                                                    model.training: False})
            noise = np.random.normal(loc=0, scale=0.1, size=action_size) # randomness
            action = action_preds + noise
            #print(action.shape, action_logits.shape, noise.shape)
            action = np.clip(action, -1, 1) # [-1, +1]
            #next_state, reward, done, _ = env.step(action)
            env_info = env.step(action)[brain_name]           # send all actions to the environment
            next_state = env_info.vector_observations[0]         # get next state (for each agent)
            reward = env_info.rewards[0]                         # get reward (for each agent)
            done = env_info.local_done[0]                        # see if episode finished
            rate = -1 # success rate: # [-1, +1]
            memory.buffer.append([state, action.reshape([-1]), next_state, reward, float(done), rate])
            num_step += 1 # memory updated
            total_reward += reward # max reward 30
            state = next_state
            
            if done:
                # Scale the episode return by the target score of +30
                # (Reacher is considered "solved" when the agent obtains an
                #  average reward of at least +30 over 100 consecutive episodes.)
                rate = total_reward/30
                rate = np.clip(rate, -1, 1) # [-1, +1]
                for idx in range(num_step): # episode length
                    if memory.buffer[-1-idx][-1] == -1:
                        memory.buffer[-1-idx][-1] = rate
                        
            # Training
            allbatch = memory.buffer
            rates = np.array([each[5] for each in allbatch])
            ratedbatch = np.array(memory.buffer)[rates>0] # excluding -1 or unrated ones or min rated ones
            batch = sample(batch_size=batch_size, buffer=ratedbatch)
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            next_states = np.array([each[2] for each in batch])
            rewards = np.array([each[3] for each in batch])
            dones = np.array([each[4] for each in batch])
            rates = np.array([each[5] for each in batch])
            nextQs_logits = sess.run(model.Qs_logits, feed_dict = {model.states: next_states, 
                                                                   model.training: False})
            #nextQs = np.max(nextQs_logits, axis=1) * (1-dones) # discrete DQN
            nextQs = nextQs_logits.reshape([-1]) * (1-dones) # continuous DPG
            targetQs = rewards + (gamma * nextQs)
            gloss, dloss, _, _ = sess.run([model.g_loss, model.d_loss, model.g_opt, model.d_opt],
                                          feed_dict = {model.states: states, 
                                                       model.actions: actions,
                                                       model.targetQs: targetQs, 
                                                       model.rates: rates, 
                                                       model.training: False}) # NOTE: with False, batch-norm moving statistics are never updated
            gloss_batch.append(gloss)
            dloss_batch.append(dloss)
            if done:
                break
                
        episode_reward.append(total_reward)
        print('Episode:{}'.format(ep),
              'meanR:{:.4f}'.format(np.mean(episode_reward)),
              'R:{:.4f}'.format(total_reward),
              'rate:{:.4f}'.format(rate),
              'gloss:{:.4f}'.format(np.mean(gloss_batch)),
              'dloss:{:.4f}'.format(np.mean(dloss_batch)))
        # Plotting
        episode_rewards_list.append([ep, np.mean(episode_reward)])
        rewards_list.append([ep, total_reward])
        gloss_list.append([ep, np.mean(gloss_batch)])
        #dloss_list.append([ep, np.mean(dloss_batch)])
        # Break the episode/epoch loop once the running mean reaches the target
        # (Reacher is considered "solved" when the agent obtains an
        #  average reward of at least +30 over 100 consecutive episodes.)
        if np.mean(episode_reward) >= 30:
            break
            
    # At the end of all training episodes/epochs
    saver.save(sess, 'checkpoints/model.ckpt')

Episode:0 meanR:2.6900 R:2.6900 rate:0.0897 gloss:2.2616 dloss:0.5702
Episode:1 meanR:1.6600 R:0.6300 rate:0.0210 gloss:3.9771 dloss:0.2150
Episode:2 meanR:1.2200 R:0.3400 rate:0.0113 gloss:3.9731 dloss:0.1952
Episode:3 meanR:1.2450 R:1.3200 rate:0.0440 gloss:3.9787 dloss:0.1857
Episode:4 meanR:1.2260 R:1.1500 rate:0.0383 gloss:3.9258 dloss:0.1872
Episode:5 meanR:1.1717 R:0.9000 rate:0.0300 gloss:3.8400 dloss:0.1927
Episode:6 meanR:1.0043 R:0.0000 rate:0.0000 gloss:3.7764 dloss:0.1964
Episode:7 meanR:1.0662 R:1.5000 rate:0.0500 gloss:3.7653 dloss:0.1959
Episode:8 meanR:0.9833 R:0.3200 rate:0.0107 gloss:3.7158 dloss:0.2028
Episode:9 meanR:0.9310 R:0.4600 rate:0.0153 gloss:3.7487 dloss:0.1993
Episode:10 meanR:0.9609 R:1.2600 rate:0.0420 gloss:3.7589 dloss:0.1985
Episode:11 meanR:0.9217 R:0.4900 rate:0.0163 gloss:3.7416 dloss:0.2019
Episode:12 meanR:0.9238 R:0.9500 rate:0.0317 gloss:3.7512 dloss:0.2013
Episode:13 meanR:0.9043 R:0.6500 rate:0.0217 gloss:3.7172 dloss:0.2054
Episode:14 meanR

Episode:116 meanR:0.9913 R:0.9800 rate:0.0327 gloss:3.1728 dloss:0.3229
Episode:117 meanR:0.9923 R:1.2800 rate:0.0427 gloss:3.1673 dloss:0.3235
Episode:118 meanR:0.9957 R:0.8700 rate:0.0290 gloss:3.1603 dloss:0.3251
Episode:119 meanR:0.9957 R:1.0300 rate:0.0343 gloss:3.1616 dloss:0.3244
Episode:120 meanR:0.9968 R:1.0000 rate:0.0333 gloss:3.1521 dloss:0.3260
Episode:121 meanR:1.0032 R:1.3300 rate:0.0443 gloss:3.1464 dloss:0.3273
Episode:122 meanR:1.0215 R:2.7600 rate:0.0920 gloss:3.1478 dloss:0.3272
Episode:123 meanR:1.0268 R:0.8500 rate:0.0283 gloss:3.1257 dloss:0.3328
Episode:124 meanR:1.0382 R:1.6400 rate:0.0547 gloss:3.1195 dloss:0.3342
Episode:125 meanR:1.0461 R:1.2300 rate:0.0410 gloss:3.1167 dloss:0.3353
Episode:126 meanR:1.0331 R:0.2500 rate:0.0083 gloss:3.1123 dloss:0.3361
Episode:127 meanR:1.0318 R:0.3000 rate:0.0100 gloss:3.1207 dloss:0.3349
Episode:128 meanR:1.0311 R:0.6200 rate:0.0207 gloss:3.1295 dloss:0.3335
Episode:129 meanR:1.0377 R:1.1700 rate:0.0390 gloss:3.1367 dloss

Episode:230 meanR:1.0861 R:1.4700 rate:0.0490 gloss:3.1468 dloss:0.3363
Episode:231 meanR:1.0807 R:0.8800 rate:0.0293 gloss:3.1509 dloss:0.3357
Episode:232 meanR:1.0785 R:0.7500 rate:0.0250 gloss:3.1559 dloss:0.3347
Episode:233 meanR:1.0794 R:0.8000 rate:0.0267 gloss:3.1561 dloss:0.3350
Episode:234 meanR:1.0564 R:0.0000 rate:0.0000 gloss:3.1661 dloss:0.3327
Episode:235 meanR:1.0504 R:0.6700 rate:0.0223 gloss:3.1715 dloss:0.3312
Episode:236 meanR:1.0432 R:0.2400 rate:0.0080 gloss:3.1708 dloss:0.3311
Episode:237 meanR:1.0411 R:0.7300 rate:0.0243 gloss:3.1819 dloss:0.3290
Episode:238 meanR:1.0265 R:0.3800 rate:0.0127 gloss:3.1896 dloss:0.3275
Episode:239 meanR:1.0201 R:0.6300 rate:0.0210 gloss:3.1710 dloss:0.3294
Episode:240 meanR:1.0120 R:0.0000 rate:0.0000 gloss:3.1762 dloss:0.3283
Episode:241 meanR:1.0134 R:0.6600 rate:0.0220 gloss:3.1757 dloss:0.3287
Episode:242 meanR:1.0090 R:0.3700 rate:0.0123 gloss:3.1731 dloss:0.3288
Episode:243 meanR:1.0005 R:0.1200 rate:0.0040 gloss:3.2064 dloss
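
The training loop above builds its targets as `rewards + gamma * nextQs`, with `nextQs` zeroed on terminal transitions. The NumPy sketch below reproduces that arithmetic on a made-up batch of three transitions; the array values are illustrative stand-ins, not outputs of the model.

```python
import numpy as np

gamma = 0.99  # future reward discount, as in the hyperparameter cell

rewards = np.array([0.1, 0.0, 0.1])
dones = np.array([0.0, 0.0, 1.0])    # 1.0 marks a terminal transition
next_qs = np.array([2.0, 1.0, 5.0])  # stand-in for next-state value estimates

next_qs = next_qs * (1 - dones)      # no bootstrapping past the episode end
target_qs = rewards + gamma * next_qs

print(target_qs)
```

For the terminal transition the target reduces to the immediate reward, since there is no next state to bootstrap from.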