<a href="https://colab.research.google.com/github/JSJeong-me/Machine_Learning/blob/main/ML/10-cartpole.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Set up rendering dependencies for Google Colaboratory.

In [1]:
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1

Install d3rlpy!

In [2]:
!pip install d3rlpy

Collecting d3rlpy
  Downloading d3rlpy-1.1.0-cp37-cp37m-manylinux1_x86_64.whl (1.2 MB)
Collecting tensorboardX
  Downloading tensorboardX-2.5-py2.py3-none-any.whl (125 kB)
Collecting colorama
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting structlog
  Downloading structlog-21.5.0-py3-none-any.whl (53 kB)
Collecting GPUtil
  Downloading GPUtil-1.4.0.tar.gz (5.5 kB)
Building wheels for collected packages: GPUtil
  Building wheel for GPUtil (setup.py) ... done
  Created wheel for GPUtil: filename=GPUtil-1.4.0-py3-none-any.whl size=7411 sha256=ea749720a6d34b36e8cf6d81cd63188e30dfd6dd910a14befd64539b2f765c9b
  Stored in directory: /root/.cache/pip/wheels/6e/f8/83/534c52482d6da64622ddbf72cd93c35d2ef2881b78fd08ff0c
Successfully built GPUtil
Installing collected packages: tensorboardX

Set up the CartPole dataset.

In [3]:
from d3rlpy.datasets import get_cartpole

# get CartPole dataset
dataset, env = get_cartpole()

Downloading cartpole.pkl into d3rlpy_data/cartpole_replay_v1.1.0.h5...


Set up the data-driven deep reinforcement learning algorithm.

In [4]:
from d3rlpy.algos import DiscreteCQL
from d3rlpy.metrics.scorer import discounted_sum_of_advantage_scorer
from d3rlpy.metrics.scorer import evaluate_on_environment
from d3rlpy.metrics.scorer import td_error_scorer
from d3rlpy.metrics.scorer import average_value_estimation_scorer
from sklearn.model_selection import train_test_split

# setup CQL algorithm
cql = DiscreteCQL(use_gpu=False)

# split train and test episodes
train_episodes, test_episodes = train_test_split(dataset, test_size=0.2)

# start training
cql.fit(train_episodes,
        eval_episodes=test_episodes,
        n_epochs=1,
        scorers={
            'environment': evaluate_on_environment(env), # evaluate in the CartPole-v0 environment
            'advantage': discounted_sum_of_advantage_scorer, # smaller is better
            'td_error': td_error_scorer, # smaller is better
            'value_scale': average_value_estimation_scorer # smaller is better
        })

2022-05-06 06:02.00 [debug    ] RoundIterator is selected.
2022-05-06 06:02.00 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20220506060200
2022-05-06 06:02.00 [debug    ] Building models...
2022-05-06 06:02.00 [debug    ] Models have been built.
2022-05-06 06:02.00 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20220506060200/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': None, 'target_update_interval': 8000, 'use_gpu': None, 'algorithm': 'DiscreteCQL', 'observation

Epoch 1/1:   0%|          | 0/2502 [00:00<?, ?it/s]

2022-05-06 06:02.26 [info     ] DiscreteCQL_20220506060200: epoch=1 step=2502 epoch=1 metrics={'time_sample_batch': 0.00020415624745076985, 'time_algorithm_update': 0.00790301253565019, 'loss': 0.6818982881607769, 'time_step': 0.00830855300958208, 'environment': 200.0, 'advantage': -2.5915046483778155, 'td_error': 1.140804249640917, 'value_scale': 1.0549770076815035} step=2502
2022-05-06 06:02.26 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteCQL_20220506060200/model_2502.pt


[(1,
  {'advantage': -2.5915046483778155,
   'environment': 200.0,
   'loss': 0.6818982881607769,
   'td_error': 1.140804249640917,
   'time_algorithm_update': 0.00790301253565019,
   'time_sample_batch': 0.00020415624745076985,
   'time_step': 0.00830855300958208,
   'value_scale': 1.0549770076815035})]
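
To make the `td_error` and `value_scale` numbers above easier to read, here is a minimal sketch of what such metrics measure. It assumes a tiny tabular Q function in plain Python; the names `QTable`, `td_error`, and `value_scale` are hypothetical and are not d3rlpy's API. The library computes comparable quantities from neural network predictions over the test episodes.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular Q function: state name -> list of action values.
QTable = Dict[str, List[float]]
# A transition is (state, action, reward, next_state, terminal).
Transition = Tuple[str, int, float, str, bool]


def td_error(q: QTable, transitions: List[Transition], gamma: float = 0.99) -> float:
    """Mean squared one-step TD error over the given transitions."""
    errors = []
    for state, action, reward, next_state, terminal in transitions:
        # Bootstrap from the greedy next-state value unless the episode ended.
        bootstrap = 0.0 if terminal else gamma * max(q[next_state])
        target = reward + bootstrap
        errors.append((q[state][action] - target) ** 2)
    return sum(errors) / len(errors)


def value_scale(q: QTable, states: List[str]) -> float:
    """Average greedy value estimate, max_a Q(s, a), over the given states."""
    return sum(max(q[s]) for s in states) / len(states)


# Toy data, made up purely for illustration.
q_table = {"s0": [1.0, 2.0], "s1": [0.5, 0.0]}
batch = [("s0", 1, 1.0, "s1", False), ("s1", 0, 1.0, "s0", True)]

print(td_error(q_table, batch, gamma=0.5))
print(value_scale(q_table, ["s0", "s1"]))
```

A large TD error suggests the Q function is inconsistent with the logged transitions, while a very large average value can hint at overestimation, which is why both are tracked alongside the environment return.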

Set up rendering utilities for Google Colaboratory.

In [5]:
import glob
import io
import base64

from gym.wrappers import Monitor
from IPython.display import HTML
from IPython import display as ipythondisplay
from pyvirtualdisplay import Display

# start virtual display
display = Display(visible=0, size=(1400, 900))
display.start()

# play the first recorded video found in ./video inline in the notebook
def show_video():
    mp4list = glob.glob('video/*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        # read the file in binary mode and close it when done
        with open(mp4, 'rb') as f:
            video = f.read()
        # embed the video as a base64 data URI so no file server is needed
        encoded = base64.b64encode(video)
        ipythondisplay.display(HTML(data='''
            <video alt="test" autoplay loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
            </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")
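
The embedding step in `show_video` can be looked at in isolation. The sketch below uses only the standard library and a hypothetical helper name, `video_html`, to show how raw bytes become an inline `data:` URI inside a `<video>` tag; the placeholder bytes are not a real MP4.

```python
import base64


def video_html(video_bytes: bytes, height_px: int = 400) -> str:
    """Build a <video> tag whose source is an inline base64 data URI."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return (
        f'<video autoplay loop controls style="height: {height_px}px;">'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4" />'
        "</video>"
    )


# Placeholder bytes stand in for the contents of a recorded MP4 file.
html = video_html(b"not a real video")
print(html)
```

Inlining the data this way keeps the notebook self-contained, at the cost of a larger cell output for long recordings.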

Record video!

In [6]:
# wrap the environment with Monitor to record episodes into ./video
env = Monitor(env, './video', force=True)

# evaluate
evaluate_on_environment(env)(cql)

200.0
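
For intuition about the score above, the following sketch shows the kind of rollout loop an environment scorer performs: run the policy for several episodes and average the undiscounted return. `ToyEnv` and `evaluate` are hypothetical stand-ins that assume the older Gym API (`reset()` returns an observation, `step()` returns a 4-tuple); this is not d3rlpy's implementation.

```python
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (old-style API)."""

    def __init__(self, max_steps: int = 200) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> int:
        self.steps = 0
        return 0  # dummy observation

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.steps += 1
        # Reward 1 per step; the episode ends on action 0 or at the step cap.
        done = action == 0 or self.steps >= self.max_steps
        return self.steps, 1.0, done, {}


def evaluate(env: ToyEnv, policy: Callable[[int], int], n_trials: int = 10) -> float:
    """Average undiscounted return of `policy` over `n_trials` episodes."""
    returns: List[float] = []
    for _ in range(n_trials):
        observation, done, total = env.reset(), False, 0.0
        while not done:
            observation, reward, done, _ = env.step(policy(observation))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


# A policy that never picks the terminating action runs to the step cap.
print(evaluate(ToyEnv(), lambda obs: 1))
```

CartPole-v0 likewise gives a reward of 1 per step and caps episodes at 200 steps, so a return of 200.0 means the policy kept the pole balanced for the whole episode.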

Let's see how it works!

In [7]:
show_video()