<a href="https://colab.research.google.com/github/NeuromatchAcademy/course-content-dl/blob/main/projects/ReinforcementLearning/human_rl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using RL to Model Cognitive Tasks

**By Neuromatch Academy**

__Content creators:__

__Production editor:__ Spiros Chavlis


**Our 2021 Sponsors, including Presenting Sponsor Facebook Reality Labs**

<p align='center'><img src='https://github.com/NeuromatchAcademy/widgets/blob/master/sponsors.png?raw=True'/></p>

---
# Objective

- This project aims to use behavioral data to train an agent and then use the agent to investigate data produced by human subjects. Having a computational agent that mimics humans in such tests, we will be able to compare its mechanics with human data.

- Alternatively, we could fit an agent that learns many cognitive tasks requiring abstract-level constructs such as executive functions. This is a multi-task control problem.




---
# Setup

In [None]:
# @title Install dependencies
!pip install dm-env --quiet
!pip install dm-sonnet --quiet
!pip install dm-acme==0.2.0 dm-acme[tf]==0.2.0 dm-acme[reverb]==0.2.0 --quiet



In [None]:
# Imports

import time

import numpy as np
import pandas as pd
import sonnet as snt
import seaborn as sns
import matplotlib.pyplot as plt

import dm_env
import dm_env.specs  # accessed below as `dm_env.specs`

import acme
from acme import specs  # environment specs, used as `specs.make_environment_spec`
from acme import wrappers
from acme.tf import networks
from acme.testing import fakes
from acme import EnvironmentLoop
from acme.agents.tf.dqn import DQN
from acme.agents.tf.d4pg import D4PG
from acme.agents.tf.ddpg import DDPG
from acme.agents.tf.r2d2 import R2D2
from acme.tf import utils as tf2_utils
from acme.utils.loggers import TerminalLogger
from acme.agents.tf.dmpo import DistributionalMPO

In [None]:
# @title Figure settings
sns.set()

In [None]:
# @title `InMemoryLogger` that keeps the data in memory

class InMemoryLogger(acme.utils.loggers.Logger):
  """A simple logger that keeps all data in memory.

  Reference:
    https://github.com/deepmind/acme/blob/master/acme/utils/loggers/dataframe.py
  """
  def __init__(self):
    self._data = []

  def write(self, data: acme.utils.loggers.LoggingData):
    self._data.append(data)

---
# Background

- Cognitive scientists use standard lab tests to tap into specific processes in the brain and behavior. Examples include the Stroop, N-back, Digit Span, TMT (Trail Making Test), and WCST (Wisconsin Card Sorting Test).

- Despite an extensive body of research that explains human performance using descriptive what-models, we still need a more sophisticated approach to gain a better understanding of the underlying processes (i.e., a how-model).

- Interestingly, many such tests can be thought of as a continuous stream of stimuli and corresponding actions, which is consonant with the RL formulation. In fact, RL itself is in part motivated by how the brain enables goal-directed behavior using reward systems, making it a good choice to explain human performance.

- One behavioral test example would be the N-back task.

  - In the N-back, participants view a sequence of stimuli, one by one, and are asked to categorize each stimulus as being either match or non-match. Stimuli are usually numbers, and feedback is given at both timestep and trajectory levels.

  - The agent is rewarded when its response matches the stimulus that was shown N steps back in the episode. A simpler version of the N-back uses a two-choice action schema: match vs. non-match. When the present stimulus matches the one presented N steps back, the agent is expected to respond with `match`.


- Given a trained RL agent, we can then look for correlates of its fitted parameters in brain mechanisms. The most straightforward approach would be to correlate model parameters with brain activity.

## Datasets

- HCP WM task ([NMA-CN HCP notebooks](https://github.com/NeuromatchAcademy/course-content/tree/master/projects/fMRI))

Any dataset that used cognitive tests would work. Open questions:

- Should we limit ourselves to behavioral data, or also use fMRI?
- Which stimuli and actions should we use? Classic tests can be modeled using bounded symbolic stimuli/actions (e.g., A, B, C), but more sophisticated ones would require text or images (e.g., face vs. neutral images in the social Stroop dataset).

The HCP dataset from NMA-CN contains behavioral and imaging data for 7 cognitive tests, including various versions of the N-back.

## N-back task

In the N-back task, participants view a sequence of stimuli, one at a time, and are asked to categorize each stimulus as either a match or a non-match. Stimuli are usually numbers, and feedback is given at both the timestep and trajectory levels.

In a typical neuro setup, both accuracy and response time are measured, but here, for the sake of brevity, we focus only on accuracy of responses.
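
To make the match rule concrete, here is a minimal sketch that labels each stimulus in a sequence as `match` or `non-match` for a given N. The function name `label_nback` is hypothetical, and labeling the first N burn-in trials as `None` is an assumption made for illustration.

```python
def label_nback(stimuli, n=2):
  """Return the expected response for each stimulus in an N-back sequence.

  The first `n` trials have no stimulus N steps back, so they are labeled None
  (an assumption of this sketch).
  """
  labels = []
  for i, stim in enumerate(stimuli):
    if i < n:
      labels.append(None)  # burn-in trial: nothing to compare against
    elif stim == stimuli[i - n]:
      labels.append('match')
    else:
      labels.append('non-match')
  return labels


print(label_nback(list('ABABCB'), n=2))
# [None, None, 'match', 'match', 'non-match', 'match']
```

Accuracy on the task is then simply the fraction of non-burn-in trials on which the response equals the expected label.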

---
# Cognitive Tests Environment

First we develop an environment in which agents perform a cognitive test, here the N-back.

## Human dataset

We need a dataset of humans performing an N-back test, with the following features:

- `participant_id`: following the BIDS format, it contains a unique identifier for each participant.
- `trial_index`: same as `time_step`.
- `stimulus`: same as `observation`.
- `response`: same as `action`, recorded response by the human subject.
- `expected_response`: correct response.
- `is_correct`: same as `reward`, whether the human subject responded correctly.
- `response_time`: won't be used here.

Here we generate a mock dataset with those features, but remember to **replace this with real human data.**
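
Since `stimulus` doubles as the agent's observation, letter stimuli generally need a numeric encoding before an agent can consume them. The sketch below shows two common options, integer codes and one-hot vectors. The helper names `encode_stimuli` and `one_hot` are hypothetical, and the default choice set `'ABCDEF'` is assumed to match the stimuli in your dataset.

```python
def encode_stimuli(stimuli, choices='ABCDEF'):
  """Map each symbolic stimulus to its index in `choices`, stored as a float."""
  index = {symbol: i for i, symbol in enumerate(choices)}
  return [float(index[s]) for s in stimuli]


def one_hot(code, n_choices):
  """Turn an integer-valued code into a one-hot list of length `n_choices`."""
  vec = [0.0] * n_choices
  vec[int(code)] = 1.0
  return vec


codes = encode_stimuli('BAD')
print(codes)              # [1.0, 0.0, 3.0]
print(one_hot(codes[0], 6))  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Integer codes are compact but impose an artificial ordering on the symbols; one-hot vectors avoid that at the cost of a wider observation.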

In [None]:
def generate_mock_nback_dataset(N=2,
                                n_participants=10,
                                n_trials=32,
                                stimulus_choices=list('ABCDEF'),
                                response_choices=['match', 'non-match']):
  """Generate a mock dataset for the N-back task."""

  n_rows = n_participants * n_trials

  participant_ids = sorted([f'sub-{pid}' for pid in range(1,n_participants+1)] * n_trials)
  trial_indices = list(range(1,n_trials+1)) * n_participants
  stimulus_sequence = np.random.choice(stimulus_choices, n_rows)

  responses = np.random.choice(response_choices, n_rows)
  is_corrects = np.random.choice([True, False], n_rows)
  response_times = np.random.exponential(size=n_rows)

  df = pd.DataFrame({
      'participant_id': participant_ids,
      'trial_index': trial_indices,
      'stimulus': stimulus_sequence,
      'response': responses,
      'is_correct': is_corrects,
      #TODO: is_match
      'response_time': response_times
  })

  # mark matching stimuli
  _nback_stim = df['stimulus'].shift(N)
  df['expected_response'] = (df['stimulus'] == _nback_stim).map({True: 'match', False: 'non-match'})

  # we don't care about burn-in trials (trial_index <= N)
  df.loc[df['trial_index'] <= N, 'is_correct'] = True
  df.loc[df['trial_index'] <= N, ['response','response_time','expected_response']] = None

  return df


# ========
# now generate the actual data with the provided function and plot some of its features
mock_nback_data = generate_mock_nback_dataset()

sns.displot(data=mock_nback_data, x='response_time')
plt.suptitle('response time distribution of the mock N-back dataset', y=1.01)
plt.show()

sns.displot(data=mock_nback_data, x='is_correct')
plt.suptitle('Accuracy distribution of the mock N-back dataset', y=1.06)
plt.show()

mock_nback_data



| | participant_id | trial_index | stimulus | response | is_correct | response_time | expected_response |
|---|---|---|---|---|---|---|---|
| 0 | sub-1 | 1 | B | | True | | |
| 1 | sub-1 | 2 | B | | True | | |
| 2 | sub-1 | 3 | D | match | False | 0.733423 | non-match |
| 3 | sub-1 | 4 | B | non-match | True | 0.386090 | match |
| 4 | sub-1 | 5 | D | match | False | 0.072855 | match |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 315 | sub-9 | 28 | C | non-match | True | 0.928347 | match |
| 316 | sub-9 | 29 | A | match | False | 0.179742 | non-match |
| 317 | sub-9 | 30 | A | match | True | 0.897553 | non-match |
| 318 | sub-9 | 31 | C | non-match | True | 0.345323 | non-match |
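
A natural first summary of such a dataset is per-participant accuracy. The sketch below computes it in plain Python from rows shaped like the table above. The function name `accuracy_by_participant` and the toy rows are hypothetical. Burn-in trials (`trial_index <= N`) are skipped because the mock generator marks them all as correct, which would otherwise inflate the estimate.

```python
from collections import defaultdict


def accuracy_by_participant(rows, n=2):
  """Mean of `is_correct` per participant, skipping burn-in trials."""
  hits = defaultdict(list)
  for row in rows:
    if row['trial_index'] > n:  # keep only trials that have an N-back target
      hits[row['participant_id']].append(bool(row['is_correct']))
  return {pid: sum(vals) / len(vals) for pid, vals in hits.items()}


# A tiny hand-made example (hypothetical values, not real data).
rows = [
    {'participant_id': 'sub-1', 'trial_index': 1, 'is_correct': True},
    {'participant_id': 'sub-1', 'trial_index': 2, 'is_correct': True},
    {'participant_id': 'sub-1', 'trial_index': 3, 'is_correct': False},
    {'participant_id': 'sub-1', 'trial_index': 4, 'is_correct': True},
    {'participant_id': 'sub-2', 'trial_index': 3, 'is_correct': True},
]
print(accuracy_by_participant(rows))
# {'sub-1': 0.5, 'sub-2': 1.0}
```

With a pandas DataFrame, a roughly equivalent expression should be `df[df['trial_index'] > N].groupby('participant_id')['is_correct'].mean()`.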


## Implementation scheme


### Environment

The following cell implements the N-back environment, which we later use to train an RL agent on human data. It supports two kinds of simulation:
- Normative: rewards the agent when its action is correct (i.e., a normative model of the environment).
- Imitation: receives human data (or mock data if you prefer), replays the stimuli a participant saw, and rewards the agent when its action matches that participant's response. This is more useful for preference-based RL.
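
The difference between the two reward schemes can be sketched in a few lines. The function names `normative_reward` and `imitation_reward` are hypothetical. The values (+1/-1 for correct/incorrect, and 1/0 for agreeing or disagreeing with the human) are assumptions chosen for illustration.

```python
def normative_reward(agent_action, expected_action):
  """+1 when the agent gives the correct response, -1 otherwise."""
  return 1.0 if agent_action == expected_action else -1.0


def imitation_reward(agent_action, human_action):
  """1 when the agent repeats what the human did, 0 otherwise."""
  return float(agent_action == human_action)


agent = ['match', 'non-match', 'non-match', 'match']
human = ['match', 'match', 'non-match', 'match']          # what a participant did
expected = ['match', 'non-match', 'non-match', 'non-match']  # what is correct

print([normative_reward(a, e) for a, e in zip(agent, expected)])
# [1.0, 1.0, 1.0, -1.0]

agreement = [imitation_reward(a, h) for a, h in zip(agent, human)]
print(agreement, sum(agreement) / len(agreement))
# [1.0, 0.0, 1.0, 1.0] 0.75
```

Note that the two schemes can disagree: on the last trial the agent is penalized by the normative scheme but rewarded for copying the human's error.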

In [None]:
class NBack(dm_env.Environment):

  ACTIONS = ['match', 'non-match']

  def __init__(self,
               N=2,
               episode_steps=32,
               stimuli_choices=list('ABCDEF'),
               human_data=None,
               seed=1,
               ):
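    """
    Args:
      N: Number of steps back that the current stimulus is compared against.
      episode_steps: Number of stimuli per episode when stimuli are generated
        randomly.
      stimuli_choices: Symbols from which random stimuli are drawn.
      human_data: Optional DataFrame of human trials; when given, the agent is
        rewarded for reproducing the human responses.
      seed: Random seed (accepted but currently unused).

    """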
    self.N = N
    self.episode_steps = episode_steps
    self.stimuli_choices = stimuli_choices
    self.stimuli = np.empty(shape=episode_steps)  # will be filled in the `reset()`

    self._reset_next_step = True

    # whether mimic humans or reward the agent once it responds optimally.
    if human_data is None:
      self._imitate_human = False
      self.human_data = None
      self.human_subject_data = None
    else:
      self._imitate_human = True
      self.human_data = human_data
      self.human_subject_data = None

    self._action_history = []

  def reset(self):
    self._reset_next_step = False
    self._current_step = 0
    self._action_history.clear()

    # generate a random sequence instead of relying on human data
    if self.human_data is None:
      # self.stimuli = np.random.choice(self.stimuli_choices, self.episode_steps)
      # FIXME This is a fix for acme & reverb issue with string observation. Agent should be able to handle strings
      self.stimuli = np.random.choice(len(self.stimuli_choices), self.episode_steps).astype(np.float32)
    else:
      # randomly choose a subject from the human data and follow her trials and responses.
      self.human_subject_data = self.human_data.query('participant_id == participant_id.sample().iloc[0]',
                                                engine='python').sort_values('trial_index')
      self.stimuli = self.human_subject_data['stimulus'].values
      # FIXME should we always use one specific human subject or randomly select one in each episode?

    return dm_env.restart(self._observation())


  def _episode_return(self):
    if self._imitate_human:
      return np.mean(self.human_subject_data['response'] == self._action_history)
    else:
      return 0

  def step(self, action: int):
    if self._reset_next_step:
      return self.reset()

    if self._imitate_human:
      # if it was the same action as the human subject, then reward the agent
      human_action = self.human_subject_data['response'].iloc[self._current_step]
      agent_action = NBack.ACTIONS[action]
      step_reward = (agent_action == human_action)
    else:
      # assume the agent is rational and does not try to reproduce human behavior; reward it when the response is correct
      expected_action = 'match' if (self.stimuli[self._current_step] == self.stimuli[self._current_step - self.N]) else 'non-match'
      agent_action = NBack.ACTIONS[action]
      step_reward = 1. if (agent_action == expected_action) else -1.

    self._action_history.append(agent_action)

    self._current_step += 1

    # Check for termination.
    if self._current_step == self.stimuli.shape[0]:
      self._reset_next_step = True
      # the final step returns the episode-level return: mean agreement with the human responses in imitation mode, 0 otherwise
      return dm_env.termination(reward=self._episode_return(),
                                observation=self._observation())
    else:
      return dm_env.transition(reward=step_reward,
                               observation=self._observation())

  def observation_spec(self):
    return dm_env.specs.BoundedArray(
        shape=self.stimuli.shape,
        dtype=self.stimuli.dtype,
        name='nback_stimuli', minimum=0, maximum=1)

  def action_spec(self):
    return dm_env.specs.DiscreteArray(
        dtype=int,
        num_values=len(NBack.ACTIONS),
        name='action')

  def _observation(self):

    # agents observe only the current stimulus
    obs = self.stimuli[self._current_step - 1]

    # TODO uncomment to observe all the previous stimuli instead of only the current stimulus
    # obs = self.stimuli[:self._current_step]
    # obs = ''.join(obs)

    return obs

  def plot_state(self):
    """Display current state of the environment.

     Note: `M` mean `match`, and `.` is a `non-match`.
    """
    from IPython.display import HTML
    stimuli = self.stimuli[:self._current_step - 1]
    actions = ['M' if a=='match' else '.' for a in self._action_history[:self._current_step - 1]]
    return HTML(
        f'<b>Environment ({self.N}-back):</b><br />'
        f'<pre><b>Stimuli:</b> {"".join(map(str,map(int,stimuli)))}</pre>'
        f'<pre><b>Actions:</b> {"".join(actions)}</pre>'
    )

  @staticmethod
  def create_environment():
    """Utility function to create an N-back environment and its spec."""

    # Make sure the environment outputs single-precision floats.
    environment = wrappers.SinglePrecisionWrapper(NBack())

    # Grab the spec of the environment.
    environment_spec = specs.make_environment_spec(environment)

    return environment, environment_spec

### Define a random agent

For more information you can refer to NMA-DL W3D2 Basic Reinforcement learning.
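
As a rough sanity check on what to expect from a random agent, the sketch below simulates a uniformly random policy under a +1/-1 reward for correct/incorrect responses. It is plain Python and does not use the `NBack` environment or acme. The function name `random_agent_return` is hypothetical, and skipping the first N burn-in trials is an assumption of this sketch.

```python
import random


def random_agent_return(n_steps=32, n_back=2, n_stimuli=6, rng=random):
  """Simulate one episode of a uniformly random agent under +1/-1 rewards."""
  stimuli = [rng.randrange(n_stimuli) for _ in range(n_steps)]
  total = 0
  for t in range(n_back, n_steps):  # burn-in trials are skipped (assumption)
    expected = 'match' if stimuli[t] == stimuli[t - n_back] else 'non-match'
    action = rng.choice(['match', 'non-match'])
    total += 1 if action == expected else -1
  return total


rng = random.Random(0)  # seeded for reproducibility
returns = [random_agent_return(rng=rng) for _ in range(2000)]
mean_return = sum(returns) / len(returns)
print(mean_return)  # should be close to 0
```

Because a random action is correct half of the time regardless of the stimuli, the expected return is 0, while single episodes scatter widely around it. A learning agent should eventually do clearly better than this baseline.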

In [None]:
class RandomAgent(acme.Actor):

  def __init__(self, environment_spec):
    """Gets the number of available actions from the environment spec."""
    self._num_actions = environment_spec.actions.num_values

  def select_action(self, observation):
    """Selects an action uniformly at random."""
    action = np.random.randint(self._num_actions)
    return action

  def observe_first(self, timestep):
    """Does not record as the RandomAgent has no use for data."""
    pass

  def observe(self, action, next_timestep):
    """Does not record as the RandomAgent has no use for data."""
    pass

  def update(self):
    """Does not update as the RandomAgent does not learn from data."""
    pass

### Initialize the environment and the agent

In [None]:
environment, environment_spec = NBack.create_environment()
agent = RandomAgent(environment_spec)

print('actions:\n', environment_spec.actions)
print('observations:\n', environment_spec.observations)
print('rewards:\n', environment_spec.rewards)

# DEBUG
# timestep = environment.step(0)  # action index 0 is 'match'; returns a dm_env.TimeStep
# timestep

actions:
 DiscreteArray(shape=(), dtype=int32, name=action, minimum=0, maximum=1, num_values=2)
observations:
 BoundedArray(shape=(32,), dtype=dtype('float32'), name='nback_stimuli', minimum=0.0, maximum=1.0)
rewards:
 Array(shape=(), dtype=dtype('float32'), name='reward')


### Run the loop

For more details, see NMA-DL W3D2.

In [None]:
# fitting parameters
n_episodes = 1_000
n_total_steps = 0
log_loss = False
n_steps = n_episodes * 32
all_returns = []


# main loop
for episode in range(n_episodes):
  episode_steps = 0
  episode_return = 0
  episode_loss = 0

  start_time = time.time()

  timestep = environment.reset()

  # Make the first observation.
  agent.observe_first(timestep)

  # Run an episode
  while not timestep.last():

    # DEBUG
    # print(timestep)

    # Generate an action from the agent's policy and step the environment.
    action = agent.select_action(timestep.observation)
    timestep = environment.step(action)

    # Have the agent observe the timestep and let the agent update itself.
    agent.observe(action, next_timestep=timestep)
    agent.update()

    # Book-keeping.
    episode_steps += 1
    n_total_steps += 1
    episode_return += timestep.reward

    if log_loss:
      episode_loss += agent.last_loss

    if n_steps is not None and n_total_steps >= n_steps:
      break

  # Collect the results and combine with counts.
  steps_per_second = episode_steps / (time.time() - start_time)
  result = {
      'episode': episode,
      'episode_length': episode_steps,
      'episode_return': episode_return,
  }
  if log_loss:
    result['loss_avg'] = episode_loss/episode_steps

  all_returns.append(episode_return)

  from IPython.display import display
  display(environment.plot_state())
  # Log the given results.
  print(result)

  if n_steps is not None and n_total_steps >= n_steps:
    break

print('\n', 'All episode returns:', all_returns)

{'episode': 0, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 1, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 2, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 3, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 4, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 5, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 6, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 7, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 8, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 9, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 10, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 11, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 12, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 13, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 14, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 15, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 16, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 17, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 18, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 19, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 20, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 21, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 22, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 23, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 24, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 25, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 26, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 27, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 28, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 29, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 30, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 31, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 32, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 33, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 34, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 35, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 36, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 37, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 38, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 39, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 40, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 41, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 42, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 43, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 44, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 45, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 46, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 47, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 48, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 49, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 50, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 51, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 52, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 53, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 54, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 55, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 56, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 57, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 58, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 59, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 60, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 61, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 62, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 63, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 64, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 65, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 66, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 67, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 68, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 69, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 70, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 71, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 72, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 73, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 74, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 75, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 76, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 77, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 78, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 79, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 80, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 81, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 82, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 83, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 84, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 85, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 86, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 87, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 88, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 89, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 90, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 91, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 92, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 93, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 94, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 95, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 96, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 97, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 98, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 99, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 100, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 101, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 102, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 103, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 104, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 105, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 106, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 107, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 108, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 109, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 110, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 111, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 112, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 113, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 114, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 115, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 116, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 117, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 118, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 119, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 120, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 121, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 122, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 123, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 124, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 125, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 126, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 127, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 128, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 129, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 130, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 131, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 132, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 133, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 134, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 135, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 136, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 137, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 138, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 139, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 140, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 141, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 142, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 143, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 144, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 145, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 146, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 147, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 148, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 149, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 150, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 151, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 152, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 153, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 154, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 155, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 156, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 157, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 158, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 159, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 160, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 161, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 162, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 163, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 164, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 165, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 166, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 167, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 168, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 169, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 170, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 171, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 172, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 173, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 174, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 175, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 176, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 177, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 178, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 179, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 180, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 181, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 182, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 183, 'episode_length': 32, 'episode_return': 15.0}


{'episode': 184, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 185, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 186, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 187, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 188, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 189, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 190, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 191, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 192, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 193, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 194, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 195, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 196, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 197, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 198, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 199, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 200, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 201, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 202, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 203, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 204, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 205, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 206, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 207, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 208, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 209, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 210, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 211, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 212, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 213, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 214, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 215, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 216, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 217, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 218, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 219, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 220, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 221, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 222, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 223, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 224, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 225, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 226, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 227, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 228, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 229, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 230, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 231, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 232, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 233, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 234, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 235, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 236, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 237, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 238, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 239, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 240, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 241, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 242, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 243, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 244, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 245, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 246, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 247, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 248, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 249, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 250, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 251, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 252, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 253, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 254, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 255, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 256, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 257, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 258, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 259, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 260, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 261, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 262, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 263, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 264, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 265, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 266, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 267, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 268, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 269, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 270, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 271, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 272, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 273, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 274, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 275, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 276, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 277, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 278, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 279, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 280, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 281, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 282, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 283, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 284, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 285, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 286, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 287, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 288, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 289, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 290, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 291, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 292, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 293, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 294, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 295, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 296, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 297, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 298, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 299, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 300, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 301, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 302, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 303, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 304, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 305, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 306, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 307, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 308, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 309, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 310, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 311, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 312, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 313, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 314, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 315, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 316, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 317, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 318, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 319, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 320, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 321, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 322, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 323, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 324, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 325, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 326, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 327, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 328, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 329, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 330, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 331, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 332, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 333, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 334, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 335, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 336, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 337, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 338, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 339, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 340, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 341, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 342, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 343, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 344, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 345, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 346, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 347, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 348, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 349, 'episode_length': 32, 'episode_return': 15.0}


{'episode': 350, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 351, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 352, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 353, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 354, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 355, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 356, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 357, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 358, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 359, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 360, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 361, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 362, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 363, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 364, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 365, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 366, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 367, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 368, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 369, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 370, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 371, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 372, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 373, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 374, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 375, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 376, 'episode_length': 32, 'episode_return': 15.0}


{'episode': 377, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 378, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 379, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 380, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 381, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 382, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 383, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 384, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 385, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 386, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 387, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 388, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 389, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 390, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 391, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 392, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 393, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 394, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 395, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 396, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 397, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 398, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 399, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 400, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 401, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 402, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 403, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 404, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 405, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 406, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 407, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 408, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 409, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 410, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 411, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 412, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 413, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 414, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 415, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 416, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 417, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 418, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 419, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 420, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 421, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 422, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 423, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 424, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 425, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 426, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 427, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 428, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 429, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 430, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 431, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 432, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 433, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 434, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 435, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 436, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 437, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 438, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 439, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 440, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 441, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 442, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 443, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 444, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 445, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 446, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 447, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 448, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 449, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 450, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 451, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 452, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 453, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 454, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 455, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 456, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 457, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 458, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 459, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 460, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 461, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 462, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 463, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 464, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 465, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 466, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 467, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 468, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 469, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 470, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 471, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 472, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 473, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 474, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 475, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 476, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 477, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 478, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 479, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 480, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 481, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 482, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 483, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 484, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 485, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 486, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 487, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 488, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 489, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 490, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 491, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 492, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 493, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 494, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 495, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 496, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 497, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 498, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 499, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 500, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 501, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 502, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 503, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 504, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 505, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 506, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 507, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 508, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 509, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 510, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 511, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 512, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 513, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 514, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 515, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 516, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 517, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 518, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 519, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 520, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 521, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 522, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 523, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 524, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 525, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 526, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 527, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 528, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 529, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 530, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 531, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 532, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 533, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 534, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 535, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 536, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 537, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 538, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 539, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 540, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 541, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 542, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 543, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 544, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 545, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 546, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 547, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 548, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 549, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 550, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 551, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 552, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 553, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 554, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 555, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 556, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 557, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 558, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 559, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 560, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 561, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 562, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 563, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 564, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 565, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 566, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 567, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 568, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 569, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 570, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 571, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 572, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 573, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 574, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 575, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 576, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 577, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 578, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 579, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 580, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 581, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 582, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 583, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 584, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 585, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 586, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 587, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 588, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 589, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 590, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 591, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 592, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 593, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 594, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 595, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 596, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 597, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 598, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 599, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 600, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 601, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 602, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 603, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 604, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 605, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 606, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 607, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 608, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 609, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 610, 'episode_length': 32, 'episode_return': 15.0}


{'episode': 611, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 612, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 613, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 614, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 615, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 616, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 617, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 618, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 619, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 620, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 621, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 622, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 623, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 624, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 625, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 626, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 627, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 628, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 629, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 630, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 631, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 632, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 633, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 634, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 635, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 636, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 637, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 638, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 639, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 640, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 641, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 642, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 643, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 644, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 645, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 646, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 647, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 648, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 649, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 650, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 651, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 652, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 653, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 654, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 655, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 656, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 657, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 658, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 659, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 660, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 661, 'episode_length': 32, 'episode_return': 15.0}


{'episode': 662, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 663, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 664, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 665, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 666, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 667, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 668, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 669, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 670, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 671, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 672, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 673, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 674, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 675, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 676, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 677, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 678, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 679, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 680, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 681, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 682, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 683, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 684, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 685, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 686, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 687, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 688, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 689, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 690, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 691, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 692, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 693, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 694, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 695, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 696, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 697, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 698, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 699, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 700, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 701, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 702, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 703, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 704, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 705, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 706, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 707, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 708, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 709, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 710, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 711, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 712, 'episode_length': 32, 'episode_return': 15.0}


{'episode': 713, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 714, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 715, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 716, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 717, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 718, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 719, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 720, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 721, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 722, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 723, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 724, 'episode_length': 32, 'episode_return': 13.0}


{'episode': 725, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 726, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 727, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 728, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 729, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 730, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 731, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 732, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 733, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 734, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 735, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 736, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 737, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 738, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 739, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 740, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 741, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 742, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 743, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 744, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 745, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 746, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 747, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 748, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 749, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 750, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 751, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 752, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 753, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 754, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 755, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 756, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 757, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 758, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 759, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 760, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 761, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 762, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 763, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 764, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 765, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 766, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 767, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 768, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 769, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 770, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 771, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 772, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 773, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 774, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 775, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 776, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 777, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 778, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 779, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 780, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 781, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 782, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 783, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 784, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 785, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 786, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 787, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 788, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 789, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 790, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 791, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 792, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 793, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 794, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 795, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 796, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 797, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 798, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 799, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 800, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 801, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 802, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 803, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 804, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 805, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 806, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 807, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 808, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 809, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 810, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 811, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 812, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 813, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 814, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 815, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 816, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 817, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 818, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 819, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 820, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 821, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 822, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 823, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 824, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 825, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 826, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 827, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 828, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 829, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 830, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 831, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 832, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 833, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 834, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 835, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 836, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 837, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 838, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 839, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 840, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 841, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 842, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 843, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 844, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 845, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 846, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 847, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 848, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 849, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 850, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 851, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 852, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 853, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 854, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 855, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 856, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 857, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 858, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 859, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 860, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 861, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 862, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 863, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 864, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 865, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 866, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 867, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 868, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 869, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 870, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 871, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 872, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 873, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 874, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 875, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 876, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 877, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 878, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 879, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 880, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 881, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 882, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 883, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 884, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 885, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 886, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 887, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 888, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 889, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 890, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 891, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 892, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 893, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 894, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 895, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 896, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 897, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 898, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 899, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 900, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 901, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 902, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 903, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 904, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 905, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 906, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 907, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 908, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 909, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 910, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 911, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 912, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 913, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 914, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 915, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 916, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 917, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 918, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 919, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 920, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 921, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 922, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 923, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 924, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 925, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 926, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 927, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 928, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 929, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 930, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 931, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 932, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 933, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 934, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 935, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 936, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 937, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 938, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 939, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 940, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 941, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 942, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 943, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 944, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 945, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 946, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 947, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 948, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 949, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 950, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 951, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 952, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 953, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 954, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 955, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 956, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 957, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 958, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 959, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 960, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 961, 'episode_length': 32, 'episode_return': 9.0}


{'episode': 962, 'episode_length': 32, 'episode_return': -13.0}


{'episode': 963, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 964, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 965, 'episode_length': 32, 'episode_return': -7.0}


{'episode': 966, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 967, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 968, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 969, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 970, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 971, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 972, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 973, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 974, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 975, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 976, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 977, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 978, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 979, 'episode_length': 32, 'episode_return': -11.0}


{'episode': 980, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 981, 'episode_length': 32, 'episode_return': 7.0}


{'episode': 982, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 983, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 984, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 985, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 986, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 987, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 988, 'episode_length': 32, 'episode_return': 3.0}


{'episode': 989, 'episode_length': 32, 'episode_return': -1.0}


{'episode': 990, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 991, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 992, 'episode_length': 32, 'episode_return': 11.0}


{'episode': 993, 'episode_length': 32, 'episode_return': 1.0}


{'episode': 994, 'episode_length': 32, 'episode_return': -5.0}


{'episode': 995, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 996, 'episode_length': 32, 'episode_return': 5.0}


{'episode': 997, 'episode_length': 32, 'episode_return': -9.0}


{'episode': 998, 'episode_length': 32, 'episode_return': -3.0}


{'episode': 999, 'episode_length': 32, 'episode_return': 3.0}

 All episode returns: [-1.0, 5.0, 5.0, -1.0, -11.0, 3.0, -7.0, 3.0, 3.0, -3.0, 1.0, -3.0, -3.0, 3.0, -9.0, 9.0, -3.0, 3.0, -7.0, -1.0, -5.0, -7.0, -5.0, 3.0, 5.0, -13.0, -1.0, -5.0, 1.0, 1.0, 1.0, 3.0, 1.0, 13.0, 1.0, 5.0, 13.0, 1.0, -9.0, 5.0, -5.0, 1.0, -1.0, 5.0, 7.0, -3.0, 1.0, 5.0, -7.0, -1.0, 5.0, -7.0, 3.0, -7.0, -5.0, -7.0, -7.0, -1.0, -1.0, 1.0, -7.0, 1.0, -5.0, 7.0, 1.0, -1.0, 7.0, 1.0, 1.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, -3.0, -11.0, -3.0, -1.0, -9.0, 1.0, -3.0, -3.0, 11.0, 7.0, 1.0, 5.0, 9.0, -5.0, -5.0, -1.0, -7.0, -1.0, 1.0, 5.0, -5.0, 7.0, 1.0, 9.0, -3.0, -5.0, 1.0, -7.0, 5.0, 1.0, -3.0, 5.0, 3.0, 3.0, 1.0, -9.0, -7.0, -3.0, -3.0, 5.0, 7.0, 1.0, -1.0, -5.0, -1.0, 1.0, -1.0, 5.0, -7.0, -7.0, -5.0, 5.0, 5.0, 5.0, -11.0, -1.0, -9.0, 3.0, -13.0, -3.0, 9.0, 7.0, 7.0, 5.0, -1.0, 5.0, -3.0, 3.0, -1.0, 5.0, 1.0, -5.0, -13.0, -1.0, 7.0, -7.0, 3.0, 3.0, -1.0, -3.0, -9.0, 1.0, 3.0, 1.0, 7.0, 3.0, -13.0, 7.0, -9.0, -1.0, 

**Note:** You can replace the manual environment loop with the `EnvironmentLoop` class from [DeepMind Acme](https://github.com/deepmind/acme), as the cells below do.
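
As a rough illustration of what such a loop automates, the sketch below runs one episode by hand. `ToyTimeStep`, `ToyEnvironment`, `ToyRandomActor`, and `run_episode` are hypothetical stand-ins written for this example. They imitate the reset/step and select-action/observe/update pattern, but they are not Acme or dm_env classes, and the reward rule is made up.

```python
import random
from typing import NamedTuple


class ToyTimeStep(NamedTuple):
    """Hypothetical stand-in for an environment time step."""
    observation: int
    reward: float
    is_last: bool

    def last(self) -> bool:
        return self.is_last


class ToyEnvironment:
    """Fixed-length episodes; reward is +1 if the action matches a hidden bit, else -1."""

    def __init__(self, episode_length: int = 32, seed: int = 0):
        self._episode_length = episode_length
        self._rng = random.Random(seed)
        self._t = 0
        self._target = 0

    def reset(self) -> ToyTimeStep:
        self._t = 0
        self._target = self._rng.randint(0, 1)
        return ToyTimeStep(observation=0, reward=0.0, is_last=False)

    def step(self, action: int) -> ToyTimeStep:
        reward = 1.0 if action == self._target else -1.0
        self._t += 1
        self._target = self._rng.randint(0, 1)
        return ToyTimeStep(0, reward, self._t >= self._episode_length)


class ToyRandomActor:
    """Picks actions uniformly at random and learns nothing."""

    def __init__(self, num_actions: int = 2, seed: int = 1):
        self._num_actions = num_actions
        self._rng = random.Random(seed)

    def select_action(self, observation: int) -> int:
        return self._rng.randrange(self._num_actions)

    def observe_first(self, timestep: ToyTimeStep) -> None:
        pass  # a learning agent would start a new trajectory here

    def observe(self, action: int, next_timestep: ToyTimeStep) -> None:
        pass  # a learning agent would store the transition here

    def update(self) -> None:
        pass  # a learning agent would take a learning step here


def run_episode(env: ToyEnvironment, actor: ToyRandomActor) -> dict:
    """Runs one episode and returns its length and summed reward."""
    timestep = env.reset()
    actor.observe_first(timestep)
    episode_return, episode_length = 0.0, 0
    while not timestep.last():
        action = actor.select_action(timestep.observation)
        timestep = env.step(action)
        actor.observe(action, next_timestep=timestep)
        actor.update()
        episode_return += timestep.reward
        episode_length += 1
    return {'episode_length': episode_length, 'episode_return': episode_return}


print(run_episode(ToyEnvironment(), ToyRandomActor()))
```

Repeating `run_episode` and collecting the returned dictionaries gives a log of the same shape as the per-episode output above.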

In [None]:
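# `tf2_utils.batch_concat` is used below; repeating this import is safe if it already ran above.
from acme.tf import utils as tf2_utils


def make_networks_d4pg(action_spec,
                       policy_layer_sizes=(256, 256, 256),
                       critic_layer_sizes=(512, 512, 256),
                       vmin=-150.,
                       vmax=150.,
                       num_atoms=51):
  """Builds the policy and distributional critic networks for a D4PG agent."""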
  action_size = np.prod(action_spec.shape, dtype=int)

  policy_network = snt.Sequential([
      tf2_utils.batch_concat,
      networks.LayerNormMLP(layer_sizes=policy_layer_sizes + (action_size,)),
      networks.TanhToSpec(spec=action_spec)
      ])
  critic_network = snt.Sequential([
      networks.CriticMultiplexer(
          action_network=networks.ClipToSpec(action_spec),
          critic_network=networks.LayerNormMLP(
              layer_sizes=critic_layer_sizes,
              activate_final=True),
      ),
      networks.DiscreteValuedHead(vmin=vmin,
                                  vmax=vmax,
                                  num_atoms=num_atoms)
      ])

  return policy_network, critic_network


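def make_networks_dqn(action_spec):
  """Builds a small MLP Q-network with one output per discrete action."""
  network = snt.Sequential([
      snt.Flatten(),
      # Two hidden layers of 50 units, then one Q-value per action.
      snt.nets.MLP([50, 50, action_spec.num_values]),
  ])
  return network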


policy_optimizer = snt.optimizers.Adam(1e-4)
critic_optimizer = snt.optimizers.Adam(1e-4)
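
The `vmin`, `vmax`, and `num_atoms` arguments of `make_networks_d4pg` configure the distributional critic head. The sketch below assumes the head places `num_atoms` evenly spaced values between `vmin` and `vmax` and predicts a categorical distribution over them. That is how a categorical value distribution is usually set up, but it is an assumption about the head's internals, not something this notebook shows. `value_support` and `expected_value` are hypothetical helpers written in plain Python for this illustration.

```python
import math


def value_support(vmin, vmax, num_atoms):
    """Evenly spaced candidate returns (atoms) between vmin and vmax."""
    step = (vmax - vmin) / (num_atoms - 1)
    return [vmin + i * step for i in range(num_atoms)]


def expected_value(logits, atoms):
    """Expected return under a softmax distribution over the atoms."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return sum(e / total * atom for e, atom in zip(exps, atoms))


# Defaults taken from make_networks_d4pg above.
atoms = value_support(vmin=-150., vmax=150., num_atoms=51)
spacing = atoms[1] - atoms[0]

# All-equal logits give a uniform distribution, whose mean is the midpoint.
uniform_value = expected_value([0.0] * len(atoms), atoms)

print(len(atoms), atoms[0], atoms[-1], spacing, uniform_value)
```

Under this assumption the default atoms are 6 units apart. The episode returns logged above stay within roughly ±15, so a narrower `[vmin, vmax]` range might give the critic finer resolution. That is a tuning suggestion, not a result from this notebook.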

In [None]:
# init the N-back environment
env, env_spec = NBack.create_environment()

# DEBUG fake testing environment.
# Uncomment this to debug your agent without using the N-back environment.
# env = fakes.DiscreteEnvironment(
#     num_actions=2,
#     num_observations=1000,
#     obs_dtype=np.float32,
#     episode_length=32)
# env_spec = specs.make_environment_spec(env)

In [None]:
# DQN agent
# agent = DQN(
#     environment_spec=env_spec,
#     network=make_networks_dqn(env_spec.actions))

# D4PG agent
# policy_network, critic_network = make_networks_d4pg(env_spec.actions)
# agent = D4PG(environment_spec=env_spec,
#              policy_network=policy_network,
#              critic_network=critic_network,
#              observation_network=tf2_utils.batch_concat, # Identity Op.
#              policy_optimizer=policy_optimizer,
#              critic_optimizer=critic_optimizer,
#              logger=InMemoryLogger())

# random agent
agent = RandomAgent(env_spec)

Now we run the environment loop with the initialized agent and print the tail of the training log.

In [None]:
# training loop
loop = EnvironmentLoop(env, agent, logger=InMemoryLogger())
loop.run(n_episodes)

# print logs
logs = pd.DataFrame(loop._logger._data)
logs.tail()

| | episode_length | episode_return | steps_per_second | episodes | steps |
|---|---|---|---|---|---|
| 995 | 32 | -3.0 | 12310.164909 | 996 | 31872 |
| 996 | 32 | 1.0 | 11280.696588 | 997 | 31904 |
| 997 | 32 | -1.0 | 10034.970318 | 998 | 31936 |
| 998 | 32 | -9.0 | 11047.635855 | 999 | 31968 |
| 999 | 32 | -5.0 | 11354.177142 | 1000 | 32000 |
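
Five rows say little about a noisy return signal, so a mean or moving average is often easier to read. The sketch below uses only the five rows shown above, copied by hand, and a hypothetical `moving_average` helper. On the full `logs` DataFrame, `logs['episode_return'].rolling(window).mean()` would be the pandas way to do the same smoothing.

```python
from statistics import mean

# (episode_return, episodes, steps) for the five rows of the log tail above.
tail_rows = [
    (-3.0, 996, 31872),
    (1.0, 997, 31904),
    (-1.0, 998, 31936),
    (-9.0, 999, 31968),
    (-5.0, 1000, 32000),
]
returns = [episode_return for episode_return, _, _ in tail_rows]


def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


tail_mean = mean(returns)
smoothed = moving_average(returns, window=3)

# Sanity check: the step counter advances by 32 steps per episode.
steps_consistent = all(steps == episodes * 32 for _, episodes, steps in tail_rows)

print(tail_mean, smoothed, steps_consistent)
```

A learning agent should show a rising moving average over the full log. These five rows are too few to say whether the returns are trending in either direction.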
