Notebook based on Google CoLab research and projects that use Gym and Gym environments.

Links to the original notebooks:
 * [link1](https://colab.research.google.com/drive/1C5iArMcVaiIwGatAj2utZAMHVtEmLLfw)

 * [link2](https://colab.research.google.com/drive/18LdlDDT87eb8cCTHZsXyS9ksQPzL3i6H#scrollTo=3U99_zgNCk3t)

> This notebook trains a policy for the LunarLander task using the PPO function from `spinup`.



> Links to the original tutorial series:

>> First Python notebook in the series. It covers the setup needed to work with Gym in Google CoLab and explores Gym environments and simple policies. [link](https://colab.research.google.com/drive/18LdlDDT87eb8cCTHZsXyS9ksQPzL3i6H).

>> Second Python notebook in this series. It shows how to solve some tasks with random, fixed deterministic, or heuristic actions, and how to render a task run as a video. [link](https://colab.research.google.com/drive/1tug_bpg8RwrFOI8C6Ed-zo0OgD3yfnWy).

# CoLab Preambles

Most Python package requirements are already met in CoLab. To run Gym, you need to install prerequisites such as xvfb, OpenGL, and other python-dev packages using the code below.

In [None]:
!pip install gym
!apt-get install python-opengl -y
!apt install xvfb -y

# Special gym environment
!pip install gym[atari]

# For rendering environment, you can use pyvirtualdisplay.
!pip install pyvirtualdisplay
!pip install pyglet

# Start the virtual display once per session,
# before training an agent or rendering frames
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1400, 900))
display.start()

# This code creates a virtual display to draw game images on. 
# If you are running locally, just ignore it
import os
if not os.environ.get("DISPLAY"):  # DISPLAY is unset or empty
    !bash ../xvfb start
    %env DISPLAY=:1

#
# Import libraries
#
import gym
from gym import logger as gymlogger
from gym.wrappers import Monitor
gymlogger.set_level(40) # error only
import tensorflow as tf
import numpy as np
import random
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import math
import glob
import io
import base64
from IPython.display import HTML

from IPython import display as ipythondisplay

"""
Utility functions to enable video recording of gym environment and displaying it
To enable video, just do "env = wrap_env(env)"
"""

def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'r+b').read()
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else: 
    print("Could not find video")
    

def wrap_env(env):
  env = Monitor(env, './video', force=True)
  return env

Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-opengl is already the newest version (3.1.0+dfsg-1).
0 upgraded, 0 newly installed, 0 to remove and 21 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
xvfb is already the newest version (2:1.19.6-1ubuntu4.6).
0 upgraded, 0 newly installed, 0 to remove and 21 not upgraded.
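
The `show_video` helper above plays `mp4list[0]`, and `glob.glob` returns matches in arbitrary order, so when the monitor writes several recordings the clip shown is not necessarily the latest one. A possible variation is sketched below, assuming you want the most recently modified file; `latest_video` is a hypothetical helper name, not part of Gym or of the original notebook, and the demonstration uses empty throwaway files rather than real recordings.

```python
import glob
import os
import tempfile


def latest_video(folder, pattern="*.mp4"):
    """Return the most recently modified file matching pattern, or None."""
    paths = glob.glob(os.path.join(folder, pattern))
    return max(paths, key=os.path.getmtime) if paths else None


# Demonstration on empty placeholder files with explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    for i, name in enumerate(["a.mp4", "b.mp4"]):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(b"")
        os.utime(path, (1000 + i, 1000 + i))  # b.mp4 gets the later mtime
    newest = os.path.basename(latest_video(tmp))
    missing = latest_video(os.path.join(tmp, "no_such_folder"))
```

Inside `show_video`, the line `mp4 = mp4list[0]` could then be swapped for `mp4 = latest_video('video')`, keeping the existing "Could not find video" branch for the `None` case.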


### Installing spinningup in the CoLab environment

In [None]:
# Install spinningup on CoLab
!git clone https://github.com/openai/spinningup.git
# Note: `!cd spinningup` runs in a throwaway subshell and does not change the
# notebook's working directory, so `!pip install -e .` fails with
# 'File "setup.py" not found'. Install by path from /content instead.
!pip install -e spinningup

Cloning into 'spinningup'...
remote: Enumerating objects: 1263, done.
remote: Total 1263 (delta 0), reused 0 (delta 0), pack-reused 1263
Receiving objects: 100% (1263/1263), 31.02 MiB | 32.64 MiB/s, done.
Resolving deltas: 100% (590/590), done.
Obtaining file:///content/spinningup
Collecting cloudpickle==1.2.1
  Downloading https://files.pythonhosted.org/packages/09/f4/4a080c349c1680a2086196fcf0286a65931708156f39568ed7051e42ff6a/cloudpickle-1.2.1-py2.py3-none-any.whl
Collecting gym[atari,box2d,classic_control]~=0.15.3
  Downloading https://files.pythonhosted.org/packages/e0/01/8771e8f914a627022296dab694092a11a7d417b6c8364f0a44a8debca734/gym-0.15.7.tar.gz (1.6MB)
Collecting matplotlib==3.1.1
  Downloading https://files.pythonhosted.org/packages/57/4f/dd381ecf6c6ab9bcdaa8ea912e866dedc6e696756156d8ecc087e20817e2/matplotlib-3.1.1-cp36-cp36m-manylinux1_x86_64.whl (13.1MB)

In [None]:
import spinup

# OpenAI Gym

OpenAI Gym is a Python library that wraps many classic decision problems, including robot control, video games, and board games. We will use the environments it provides to test our reinforcement learning algorithms.
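
Every Gym task follows the same interaction pattern: call `reset` once, then call `step` repeatedly until the episode ends. The sketch below imitates that interface with a hypothetical `CoinFlipEnv` stand-in (not part of Gym) so it runs without any installed packages. The four-value `step` return matches the older Gym releases this notebook installs; newer releases return five values.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in that mimics the classic Gym reset/step interface."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 when the action matches the coin from the last observation.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        done = self.steps >= self.max_steps
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (total reward, episode length)."""
    obs = env.reset()
    total, length, done = 0.0, 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        length += 1
    return total, length


def copy_policy(obs):
    # Deterministic policy: echo the observed coin.
    return obs


ep_ret, ep_len = run_episode(CoinFlipEnv(), copy_policy)
```

With a real environment, `CoinFlipEnv()` would be replaced by something like `gym.make('LunarLander-v2')` and `copy_policy` by any function that maps an observation to a valid action.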

## Example: LunarLander-v2

Trains an agent with the PPO (Proximal Policy Optimization) algorithm from spinup and shows the learned policy in a video.

Ref.: https://openai.com/blog/openai-baselines-ppo/ 
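
To make the "proximal" part of PPO concrete, here is a minimal sketch of the per-sample clipped surrogate objective, `min(r * A, clip(r, 1 - eps, 1 + eps) * A)`, where `r` is the probability ratio between the new and old policy and `A` is the advantage. The function name `clipped_surrogate` and the sample values are illustrative assumptions; `0.2` is the `clip_ratio` reported in this notebook's saved training config. This is plain Python for reading, not spinup's TensorFlow implementation.

```python
def clipped_surrogate(ratio, advantage, clip_ratio=0.2):
    """Per-sample PPO-clip objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
    return min(ratio * advantage, clipped * advantage)


# (ratio, advantage) pairs covering the four sign/side combinations.
samples = [(1.5, 2.0), (0.5, 2.0), (1.5, -2.0), (0.5, -2.0)]
values = [clipped_surrogate(r, a) for r, a in samples]
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + clip_ratio`; with a negative advantage it stops improving once the ratio drops below `1 - clip_ratio`. In both cases the policy gains nothing from moving far from the one that collected the data.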

### Train PPO with spinup

In [None]:
# load packages
import gym
from spinup import ppo_tf1 as ppo
import tensorflow as tf

# after training, load policy and show results in video
from spinup.utils.test_policy import load_tf_policy, run_policy

# train policy
env_fn = lambda : gym.make('LunarLander-v2')

ac_kwargs = dict(hidden_sizes=[64,64], activation=tf.nn.relu)

logger_kwargs = dict(output_dir='output_dir', exp_name='experiment_name')
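
Besides the arguments set above, `ppo` also takes `gamma` (discount factor) and `lam` (the lambda of GAE-Lambda advantage estimation); the run logged in this notebook reports them as 0.99 and 0.97. The sketch below shows how those two numbers combine when turning rewards and value estimates into advantages. It is a plain-Python illustration with hypothetical helper names and made-up inputs, not the code spinup runs.

```python
def discount_cumsum(xs, discount):
    """Return, for every t, the sum over k of discount**k * xs[t + k]."""
    out, running = [], 0.0
    for x in reversed(xs):
        running = x + discount * running
        out.append(running)
    return out[::-1]


def gae_advantages(rewards, values, last_value=0.0, gamma=0.99, lam=0.97):
    """GAE-Lambda advantages for one trajectory segment."""
    vals = list(values) + [last_value]
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = [r + gamma * vals[t + 1] - vals[t] for t, r in enumerate(rewards)]
    # Advantages are the (gamma * lam)-discounted sums of the residuals.
    return discount_cumsum(deltas, gamma * lam)


# Illustrative inputs only: three steps with reward 1 and a flat value guess.
advantages = gae_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

Setting `lam=0` reduces each advantage to the one-step TD residual, while `lam=1` gives the full discounted return minus the value baseline; values in between trade bias against variance.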




In [None]:
ppo(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=5000, epochs=250, logger_kwargs=logger_kwargs)

Logging data to output_dir/progress.txt
Saving config:
{
    "ac_kwargs":	{
        "activation":	"relu",
        "hidden_sizes":	[
            64,
            64
        ]
    },
    "actor_critic":	"mlp_actor_critic",
    "clip_ratio":	0.2,
    "env_fn":	"<function <lambda> at 0x7f4255f5d1e0>",
    "epochs":	5,
    "exp_name":	"experiment_name",
    "gamma":	0.99,
    "lam":	0.97,
    "logger":	{
        "<spinup.utils.logx.EpochLogger object at 0x7f4255f55cf8>":	{
            "epoch_dict":	{},
            "exp_name":	"experiment_name",
            "first_row":	true,
            "log_current_row":	{},
            "log_headers":	[],
            "output_dir":	"output_dir",
            "output_file":	{
                "<_io.TextIOWrapper name='output_dir/progress.txt' mode='w' encoding='UTF-8'>":	{
                    "mode":	"w"
                }
            }
        }
    },
    "logger_kwargs":	{
        "exp_name":	"experiment_name",
        "output_dir":	"ou



---------------------------------------
|             Epoch |               0 |
|      AverageEpRet |            -212 |
|          StdEpRet |             131 |
|          MaxEpRet |           -31.3 |
|          MinEpRet |            -556 |
|             EpLen |            99.8 |
|      AverageVVals |         0.00749 |
|          StdVVals |           0.117 |
|          MaxVVals |           0.322 |
|          MinVVals |          -0.585 |
| TotalEnvInteracts |           5e+03 |
|            LossPi |         7.3e-09 |
|             LossV |        1.41e+04 |
|       DeltaLossPi |         -0.0161 |
|        DeltaLossV |       -2.98e+03 |
|           Entropy |            1.36 |
|                KL |          0.0129 |
|          ClipFrac |           0.132 |
|          StopIter |              79 |
|              Time |             7.6 |
---------------------------------------
---------------------------------------
|             Epoch |               1 |
|      AverageEpRet |            -182 |
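
The per-epoch tables printed above are also written to `output_dir/progress.txt`. Assuming that file is tab-separated with one header row (an assumption worth checking against your own run), a small reader like the one below can load it for plotting or comparison. `read_progress` is a hypothetical helper, and the inline sample contains only the two `AverageEpRet` values visible in this notebook's output.

```python
import csv
import io

# Two columns only, taken from the epoch 0 and epoch 1 tables shown in the log.
SAMPLE = "Epoch\tAverageEpRet\n0\t-212\n1\t-182\n"


def read_progress(text):
    """Parse tab-separated progress text into a list of {column: float} rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


rows = read_progress(SAMPLE)
improvement = rows[-1]["AverageEpRet"] - rows[0]["AverageEpRet"]
```

For a real run, the text would come from `open('output_dir/progress.txt').read()` instead of `SAMPLE`.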


In [None]:
from spinup.utils.test_policy import load_policy_and_env

In [None]:
# Show policy
_, get_action = load_policy_and_env('output_dir')
env2 = gym.make('LunarLander-v2')
env3 = wrap_env(env2)
run_policy(env3, get_action, max_ep_len=1000, num_episodes=50)
env3.close()
show_video() 



Loading from output_dir/tf1_save.


Using default action op.
Logging data to /tmp/experiments/1601485098/progress.txt
Episode 0 	 EpRet -128.853 	 EpLen 76
Episode 1 	 EpRet -82.507 	 EpLen 108
Episode 2 	 EpRet -82.847 	 EpLen 162
Episode 3 	 EpRet -97.729 	 EpLen 133
-------------------------------------
|    AverageEpRet |             -98 |
|        StdEpRet |            18.9 |
|        MaxEpRet |           -82.5 |
|        MinEpRet |            -129 |
|           EpLen |             120 |
-------------------------------------
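
The summary table can be reproduced from the four per-episode lines printed above it. The sketch below does so with the standard library; it assumes the logged `StdEpRet` is a population standard deviation, which matches the 18.9 shown. `summarize` is a hypothetical helper name.

```python
from statistics import mean, pstdev

# Values copied from the "Episode N" lines in the output above.
ep_rets = [-128.853, -82.507, -82.847, -97.729]
ep_lens = [76, 108, 162, 133]


def summarize(returns, lengths):
    """Compute the same statistics as the evaluation summary table."""
    return {
        "AverageEpRet": mean(returns),
        "StdEpRet": pstdev(returns),  # population std (assumption)
        "MaxEpRet": max(returns),
        "MinEpRet": min(returns),
        "EpLen": mean(lengths),
    }


summary = summarize(ep_rets, ep_lens)
```

An average return near -98 is far from a successful landing, so a longer training run or more evaluation episodes would be needed before drawing conclusions about the policy.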
