<a href="https://colab.research.google.com/github/christianhidber/easyagents/blob/master/jupyter_notebooks/easyagents_logging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Investigating an agent API through logging, seeding & fixing Jupyter output cell clearing

### Install packages (gym, tfagents, tensorflow, ...)

#### suppress package warnings, in colab: load additional packages for rendering

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import sys
import warnings

warnings.filterwarnings('ignore')
if 'google.colab' in sys.modules:
    !apt-get install xvfb >/dev/null
    !pip install pyvirtualdisplay >/dev/null    
    
    from pyvirtualdisplay import Display
    Display(visible=0, size=(960, 720)).start() 
else:
    #  for local installation
    sys.path.append('..')

#### install easyagents and rendering for orso

In [None]:
if 'google.colab' in sys.modules:
    !pip install -q easyagents >/dev/null
    !pip install -q networkx==2.3.0 >/dev/null

## Agent logging 

Use the log.Agent() callback to investigate how easyagents interacts with a backend:

In [1]:
from easyagents.agents import PpoAgent
from easyagents.callbacks import duration, log

ppoAgent = PpoAgent('CartPole-v0')
ppoAgent.train([log.Agent(), duration.Fast()], default_plots=False)

Using TensorFlow backend.
backend_name             tfagents 
TFPyEnvironment          ( suite_gym.load( ... ) ) 
AdamOptimizer            () 
ActorDistributionNetwork () 
ValueNetwork             () 
PpoAgent                 () 
tf_agent.initialize      () 
TFUniformReplayBuffer    () 
DynamicEpisodeDriver     () 
TFPyEnvironment          ( suite_gym.load( ... ) ) 
-----                    iteration    0 of 10        ----- 
collect_driver.run       () 
replay_buffer.gather_all () 
tf_agent.train           (experience=...) 
                         loss=4906.7  [actor=0.0     critic=4906.7 ] 
replay_buffer.clear      () 
-----                    iteration    1 of 10        ----- 
collect_driver.run       () 
replay_buffer.gather_all () 
tf_agent.train           (experience=...) 
                         loss=10658.6 [actor=0.0     critic=10658.6] 
replay_buffer.clear      () 
-----                    iteration    2 of 10        ----- 
collect_driver.run       () 
replay_buffer.gather_al


  TensorFlow's `tf-nightly` package will soon be updated to TensorFlow 2.0.

  Please upgrade your code to TensorFlow 2.0:
    * https://www.tensorflow.org/beta/guide/migration_guide

  Or install the latest stable TensorFlow 1.X release:
    * `pip install -U "tensorflow==1.*"`

  Otherwise your code may be broken by the change.

  


Instructions for updating:
SeedStream has moved to `tfp.util.SeedStream`.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


<easyagents.core.PpoTrainContext at 0x21acc8f7a48>

Plots clear the Jupyter output cell before each update and would thereby clear the log output as well, so we turned them off (default_plots=False).
Typically each call to the backend API during training is logged.
Note that the log starts with 'tfagents', the default backend for the PpoAgent.
We then see a sequence of calls performing the agent initialisation before we enter the train loop.
API calls during play or evaluation are not logged.
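
To make the idea of call logging concrete, here is a minimal, self-contained sketch of how a wrapper could record each call in a layout similar to the output above. It is not how easyagents implements log.Agent(): the decorator `logged`, the list `api_log`, and the functions `collect` and `train` are hypothetical toy stand-ins that only borrow the names seen in the log.

```python
import functools


def logged(name, log):
    """Return a decorator that records 'name (args)' in `log` before each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parts = [repr(a) for a in args]
            parts += [f"{key}={value!r}" for key, value in kwargs.items()]
            # Pad the call name to a fixed width, like the log output above.
            log.append(f"{name:<24} ({', '.join(parts)})")
            return func(*args, **kwargs)
        return wrapper
    return decorator


api_log = []


# Toy stand-ins (assumption): these are NOT the real tf-agents calls.
@logged("collect_driver.run", api_log)
def collect():
    return [1.0, 1.0, 1.0]


@logged("tf_agent.train", api_log)
def train(experience):
    return sum(experience)


for iteration in range(2):
    api_log.append(f"----- iteration {iteration} of 2 -----")
    loss = train(experience=collect())

print("\n".join(api_log))
```

Collecting the entries in a list instead of printing them directly keeps the sketch easy to inspect; a real callback would more likely write to a logger.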

Let's take a look at the tensorforce backend:

In [1]:
from easyagents.agents import PpoAgent
from easyagents.callbacks import duration, log

ppoAgent = PpoAgent('CartPole-v0', backend='tensorforce')
ppoAgent.train([log.Agent(), duration.Fast()], default_plots=False)

Using TensorFlow backend.
backend_name             tensorforce 
Creating Environment...  
Environment.create       (environment="gym", level=CartPole-v0) 
Creating network specification...
Agent.create             (agent="ppo", environment=..., network=[{'type': 'dense', 'size': 100, 'activation': 'relu'}, {'type': 'dense', 'size': 100, 'activation': 'relu'}]learning_rate=0.001, batch_size=3, optimization_steps=1, discount=1.0) 
Runner.create            (agent=..., environment=...) 
runner.run               (num_episodes=None, max_episode_timesteps=50) 
Environment.create       (environment="gym", level=CartPole-v0) 




  


Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
`tf.batch_gather` is deprecated, please use `tf.gather` with `batch_dims=-1` instead.
Instructions for updating:
reduction_indices is deprecated, use axis instead
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


<easyagents.core.PpoTrainContext at 0x2b4acb78108>

With tensorforce we also first see a sequence of calls that set up the environment and the agent. Note that in contrast to tfagents we do not
build up actor and critic policy networks but instead pass a network specification to the Agent.create call.
Moreover, tensorforce already implements the train loop through its Runner class.
Thus we only see a single call to runner.run instead of the many API calls for tfagents.

## Seeding

To set a seed use:

In [1]:
import easyagents

easyagents.agents.seed = 0

Using TensorFlow backend.




  




Once set, the seed is applied before each call to train. Let's validate this using our log.Agent callback:

In [2]:
from easyagents.agents import PpoAgent
from easyagents.callbacks import duration, log

ppoAgent = PpoAgent('CartPole-v0', backend='tensorforce')
ppoAgent.train([log.Agent(), duration.Fast()], default_plots=False)

backend_name             tensorforce 
tf.compat.v1.set_random_seed(0) 
tf.random.set_random_seed(seed=0) 
numpy.random.seed        (0) 
random.seed              (0) 
Environment.create       (environment="gym", level=CartPole-v0) 
Agent.create             (agent="ppo", environment=..., network=[{'type': 'dense', 'size': 100, 'activation': 'relu'}, {'type': 'dense', 'size': 100, 'activation': 'relu'}]learning_rate=0.001, batch_size=3, optimization_steps=1, discount=1.0) 
Runner.create            (agent=..., environment=...) 
runner.run               (num_episodes=None, max_episode_timesteps=50) 
Environment.create       (environment="gym", level=CartPole-v0) 



Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
`tf.batch_gather` is deprecated, please use `tf.gather` with `batch_dims=-1` instead.
Instructions for updating:
reduction_indices is deprecated, use axis instead
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


<easyagents.core.PpoTrainContext at 0x2a14a2c4e88>

<Figure size 1224x432 with 0 Axes>

Note the calls at the very beginning that set the seeds for tensorflow, numpy and python.
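
The effect of these seeding calls can be illustrated with a small sketch. It only covers Python's `random` module and, if installed, NumPy; the TensorFlow calls from the log are left out to keep the example dependency-free. The helper names `set_all_seeds` and `sample` are hypothetical and not part of easyagents.

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None


def set_all_seeds(seed):
    """Seed Python's and (if installed) NumPy's global random generators."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)


def sample():
    """Draw a few numbers from the global generators."""
    values = [random.random() for _ in range(3)]
    if np is not None:
        values.extend(np.random.rand(3).tolist())
    return values


set_all_seeds(0)
first = sample()

set_all_seeds(0)   # re-seeding resets the generators to the same state
second = sample()

unseeded = sample()  # continues from the current state, no re-seeding

print(first == second)    # same seed, same sequence
print(first == unseeded)  # without re-seeding the sequence moves on
```

Re-applying the seed before every run, as easyagents does before each call to train, is what makes repeated runs start from the same random state.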

## Gym steps logging
Use the log.Step() callback to investigate how the agent interacts with the gym environment:

In [3]:
from easyagents.agents import PpoAgent
from easyagents.callbacks import duration, log

ppoAgent = PpoAgent('CartPole-v0')
ppoAgent.train([log.Step(), duration.Fast()], default_plots=False)

Instructions for updating:
SeedStream has moved to `tfp.util.SeedStream`.


[CartPole-v0 3:0  :1  ] train iteration=0  step=0   play  episode=0  step=1     sum_of_rewards=1.0     reward=1.0   done=False action=1 observation=[-0.04363321  0.24146826  0.01284913 -0.30946528]
[CartPole-v0 3:0  :2  ] train iteration=0  step=0   play  episode=0  step=2     sum_of_rewards=2.0     reward=1.0   done=False action=0 observation=[-0.03880385  0.04616562  0.00665982 -0.01275795]
[CartPole-v0 3:0  :3  ] train iteration=0  step=0   play  episode=0  step=3     sum_of_rewards=3.0     reward=1.0   done=False action=1 observation=[-0.03788053  0.24119143  0.00640466 -0.30333221]
[CartPole-v0 3:0  :4  ] train iteration=0  step=0   play  episode=0  step=4     sum_of_rewards=4.0     reward=1.0   done=False action=0 observation=[-0.03305671  0.04597879  0.00033802 -0.0086363 ]
[CartPole-v0 3:0  :5  ] train iteration=0  step=0   play  episode=0  step=5     sum_of_rewards=5.0     reward=1.0   done=False action=1 observation=[-3.21371306e-02  2.41095889e-01  1.65294467e-04 -3.01212554

<easyagents.core.PpoTrainContext at 0x2a1609bec88>

<Figure size 1224x432 with 0 Axes>

For each call to the gym environment's step method you get a log entry, along with the action taken and the current
observation. Each entry starts with

[{gym_env_id} {instance_id}:{episode_in_instance}:{step_in_episode}]

followed by the id of the current training iteration as well as the step count within that iteration.
During an evaluation period you get the same statistics for the current evaluation episode.

You may easily implement other log callbacks to produce statistics specific to your problem domain.
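
If you want to post-process the step log rather than write a callback, the entry layout described above can be parsed with a regular expression. The following is a minimal sketch: the function `parse_step_entry` and the pattern `PREFIX` are hypothetical helpers, and the parsing of `reward` and `action` assumes a scalar reward and a single discrete action, as in the CartPole output shown earlier.

```python
import re

# Matches "[{gym_env_id} {instance_id}:{episode_in_instance}:{step_in_episode}]",
# tolerating the padding spaces seen in the log output.
PREFIX = re.compile(
    r"\[(?P<gym_env_id>\S+)\s+(?P<instance_id>\d+)\s*:"
    r"\s*(?P<episode_in_instance>\d+)\s*:\s*(?P<step_in_episode>\d+)\s*\]"
)


def parse_step_entry(line):
    """Extract the prefix fields plus reward and action from one log line."""
    match = PREFIX.match(line)
    if match is None:
        raise ValueError(f"not a step log entry: {line!r}")
    entry = {
        key: value if key == "gym_env_id" else int(value)
        for key, value in match.groupdict().items()
    }
    # Assumption: scalar reward and a single integer action.
    entry["reward"] = float(re.search(r"\breward=(\S+)", line).group(1))
    entry["action"] = int(re.search(r"\baction=(\S+)", line).group(1))
    return entry


line = (
    "[CartPole-v0 3:0  :1  ] train iteration=0  step=0   play  episode=0  "
    "step=1     sum_of_rewards=1.0     reward=1.0   done=False action=1 "
    "observation=[-0.04363321  0.24146826  0.01284913 -0.30946528]"
)
entry = parse_step_entry(line)
print(entry)
```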

## Fixing Jupyter output cell clearing
It seems that the Jupyter / matplotlib backend changes its behaviour when outputting the current figure of an
evaluated cell (if you can help here, please let us know by
[creating an issue](https://github.com/christianhidber/easyagents/issues/new/choose)).

Nonetheless, you can directly control easyagents' Jupyter output cell clearing behaviour through the plot.Clear()
callback:


In [None]:
from easyagents.agents import PpoAgent
from easyagents.callbacks import duration, plot

ppoAgent = PpoAgent('CartPole-v0')
# plot.Clear controls whether the output cell is cleared during training / play
ppoAgent.train([plot.Clear(on_train=False, on_play=False), duration.Fast()])

If your plot appears twice ("doubled") after cell evaluation, set on_train / on_play to True; if it disappears, set them to False.