## Set Up a Virtual Display

Install libraries for multimedia processing and virtual display

In [1]:
%%capture
%%bash
apt install -y python-opengl  # Python bindings to OpenGL and related APIs
apt install -y ffmpeg         # FFmpeg: libraries and tools to process multimedia content such as audio, video, subtitles and related metadata
apt install -y xvfb           # X virtual framebuffer: runs graphical apps without a physical monitor
pip3 install pyvirtualdisplay  # Python wrapper for Xvfb, Xephyr and Xvnc

Create and start a virtual display

In [2]:
from pyvirtualdisplay import Display

virtualdisplay = Display(visible=0, size=(1400, 900))
virtualdisplay.start()

<pyvirtualdisplay.display.Display at 0x7f0a3f420090>

## Installs and Imports

Install Gym, Stable-Baselines3, and the Hugging Face API libraries

In [3]:
%%capture
%%bash
pip install gym[box2d]
pip install stable-baselines3[extra]
pip install pyglet  # Python library for creating games and other multimedia applications
# pip install ale-py==0.7.4   # To overcome an issue with Gym (https://github.com/DLR-RM/stable-baselines3/issues/875)
# The Arcade Learning Environment (ALE) is a simple framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games
pip install huggingface_sb3

In [4]:
import gym

from stable_baselines3 import PPO  # the training algorithm we'll use
from stable_baselines3.common.env_util import make_vec_env  # creates the vectorized (optionally parallel) environments to train in
from stable_baselines3.common.vec_env import DummyVecEnv  # vectorized env built from a list of env-creating functions, stepped sequentially in one process
from stable_baselines3.common.monitor import Monitor  # records episode returns and lengths; evaluate_policy warns if the eval env is not wrapped with it
from stable_baselines3.common.evaluation import evaluate_policy  # utility function to evaluate the performance of our agent

from huggingface_hub import notebook_login  # log in to the Hugging Face Hub
from huggingface_sb3 import package_to_hub  # package and push your agent to the Hugging Face Hub
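`DummyVecEnv` builds one environment from each zero-argument callable in the list it receives (such as the `lambda: gym.make(ENV_ID)` used below) and steps them one after another in the same process. Like other SB3 vectorized environments, it resets a sub-environment automatically when that environment's episode ends and keeps the final observation in `info["terminal_observation"]`. The stdlib-only sketch below imitates that behavior on a hypothetical toy `CountdownEnv`; `SimpleDummyVecEnv` is an illustrative stand-in, not SB3's actual code.

```python
from typing import Callable, List, Tuple


class CountdownEnv:
    """Hypothetical toy env: each episode lasts `length` steps and pays reward 1.0 per step."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


class SimpleDummyVecEnv:
    """Sketch of a sequential vectorized env: one env per factory, auto-reset on episode end."""

    def __init__(self, env_fns: List[Callable[[], CountdownEnv]]) -> None:
        self.envs = [make_env() for make_env in env_fns]  # call each factory exactly once

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):  # step the envs one after another
            ob, reward, done, info = env.step(action)
            if done:
                info["terminal_observation"] = ob  # keep the episode's last observation
                ob = env.reset()                   # and start the next episode immediately
            obs.append(ob)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return obs, rewards, dones, infos


vec_env = SimpleDummyVecEnv([lambda: CountdownEnv(2), lambda: CountdownEnv(3)])
print(vec_env.reset())           # [0, 0]
print(vec_env.step([0, 0])[0])   # [1, 1]
print(vec_env.step([0, 0])[:3])  # ([0, 2], [1.0, 1.0], [True, False])
```

On the second step the first environment finishes its two-step episode, so its observation is already the fresh `0` from the automatic reset while the second environment keeps counting.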

## Create Environment and Agent

Create the [CarRacing](https://www.gymlibrary.ml/environments/box2d/car_racing/) environment (`CarRacing-v0`) for inspection, evaluation and training.

In [6]:
ENV_ID = "CarRacing-v0"
NUM_ENV = 1

env = gym.make(ENV_ID)

train_env = make_vec_env(ENV_ID, n_envs=NUM_ENV)
eval_env = Monitor(gym.make(ENV_ID))
hub_eval_env = DummyVecEnv([lambda: gym.make(ENV_ID)])
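`eval_env` is wrapped in `Monitor` so that episode statistics are recorded: when an episode ends, `Monitor` adds an `"episode"` entry to that step's `info` dict holding the total reward (`"r"`), the episode length (`"l"`) and the elapsed time (`"t"`). `evaluate_policy` relies on this and warns when the evaluation environment is not wrapped. The sketch below is a simplified, hypothetical `EpisodeStatsWrapper` (no timing, no CSV logging) run on a made-up toy environment, only to show the bookkeeping.

```python
from typing import Tuple


class ConstantRewardEnv:
    """Hypothetical toy env: episodes last `length` steps and pay `reward` on every step."""

    def __init__(self, length: int, reward: float) -> None:
        self.length, self.reward, self.t = length, reward, 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, self.reward, self.t >= self.length, {}


class EpisodeStatsWrapper:
    """Simplified Monitor-like wrapper: sums rewards and counts steps until the episode ends."""

    def __init__(self, env) -> None:
        self.env = env
        self.rewards = []          # rewards of the current episode
        self.episode_returns = []  # total reward of every finished episode

    def reset(self):
        self.rewards = []
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rewards.append(reward)
        if done:
            # Monitor reports the same "r" (return) and "l" (length) keys, plus a "t" timestamp.
            info["episode"] = {"r": sum(self.rewards), "l": len(self.rewards)}
            self.episode_returns.append(info["episode"]["r"])
        return obs, reward, done, info


toy_env = EpisodeStatsWrapper(ConstantRewardEnv(length=3, reward=-0.5))
toy_env.reset()
done = False
while not done:
    _, _, done, info = toy_env.step(action=None)
print(info["episode"])  # {'r': -1.5, 'l': 3}
```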

In [7]:
print("Observation Space", env.observation_space)
print("Action Space", env.action_space)

Observation Space Box([[[0 0 0] ... [0 0 0]]], [[[255 255 255] ... (output truncated)

(The printed lower bound is an all-zero RGB image array and the upper bound an all-255 one; the "Action Space" line was cut off.)
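The observation is a 96×96 RGB frame with `uint8` values in [0, 255], stored channel-last as (height, width, channel). PyTorch convolutions expect channel-first input, so when the `CnnPolicy` model is created SB3 logs "Wrapping the env in a VecTransposeImage." The sketch below shows the (H, W, C) to (C, H, W) index mapping on a tiny nested-list image; `hwc_to_chw` and `image_shape` are hypothetical helpers, while SB3 applies the equivalent NumPy transpose internally.

```python
from typing import List, Tuple

Image = List[List[List[int]]]


def hwc_to_chw(image: Image) -> Image:
    """Reorder a nested-list image from (height, width, channel) to (channel, height, width)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


def image_shape(image: Image) -> Tuple[int, int, int]:
    """Shape of a 3-level nested list, like ndarray.shape."""
    return (len(image), len(image[0]), len(image[0][0]))


# A 2x3 "RGB" image whose values encode (row, col) so the reordering is easy to follow.
img = [[[10 * h + w, 100 + 10 * h + w, 200 + 10 * h + w] for w in range(3)] for h in range(2)]
chw = hwc_to_chw(img)
print(image_shape(img), "->", image_shape(chw))  # (2, 3, 3) -> (3, 2, 3)
print(chw[0])  # red channel: [[0, 1, 2], [10, 11, 12]]
```

For a CarRacing frame the same mapping turns shape (96, 96, 3) into (3, 96, 96).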

Create the Model

In [8]:

# PPO hyperparameters (Stable-Baselines3 defaults in brackets):
# param learning_rate: [0.0003]
# param n_steps: [2048] Number of steps to run for each environment per update (the rollout buffer holds n_steps * n_envs transitions)
# param n_epochs: [10] Number of passes over the rollout buffer when optimizing the surrogate loss
# param batch_size: [64] Minibatch size used for each gradient step

train_env = make_vec_env(ENV_ID, n_envs=NUM_ENV)
model = PPO(policy="CnnPolicy", env=train_env, verbose=1)

Using cuda device
Wrapping the env in a VecTransposeImage.
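The hyperparameter comments above determine how much data each PPO update uses. Assuming the defaults listed there and `NUM_ENV = 1`, every iteration collects `n_steps * n_envs` transitions and then makes `n_epochs` passes over them in minibatches of `batch_size`. This matches the training log below, where `total_timesteps` grows by 2048 per iteration. The arithmetic for the `total_timesteps=5e5` run:

```python
import math

n_steps = 2048            # steps collected per environment per update (SB3 default)
n_envs = 1                # NUM_ENV in this notebook
batch_size = 64           # minibatch size (SB3 default)
n_epochs = 10             # passes over the rollout buffer per update (SB3 default)
total_timesteps = 500_000

buffer_size = n_steps * n_envs                               # transitions per iteration
minibatches_per_epoch = buffer_size // batch_size            # gradient steps per pass
gradient_steps_per_update = n_epochs * minibatches_per_epoch
iterations = math.ceil(total_timesteps / buffer_size)        # rollouts until the budget is reached

print(buffer_size, minibatches_per_epoch, gradient_steps_per_update, iterations)
# 2048 32 320 245
```

SB3 keeps collecting rollouts while the timestep count is below `total_timesteps`, so the last iteration overshoots slightly (245 × 2048 = 501,760 steps).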


## Train the Agent

Evaluate the untrained agent first to get a baseline return (mean, standard deviation) over 50 episodes.

In [9]:
evaluate_policy(model, eval_env, n_eval_episodes=50, deterministic=True)

Track generation: 1102..1383 -> 281-tiles track
Track generation: 1111..1393 -> 282-tiles track
Track generation: 1108..1389 -> 281-tiles track
Track generation: 1151..1443 -> 292-tiles track
Track generation: 1151..1442 -> 291-tiles track
Track generation: 1118..1402 -> 284-tiles track
Track generation: 1218..1527 -> 309-tiles track
Track generation: 1005..1266 -> 261-tiles track
Track generation: 941..1188 -> 247-tiles track
Track generation: 1131..1418 -> 287-tiles track
Track generation: 1123..1408 -> 285-tiles track
Track generation: 1272..1594 -> 322-tiles track
Track generation: 1232..1544 -> 312-tiles track
Track generation: 1032..1294 -> 262-tiles track
Track generation: 1318..1651 -> 333-tiles track
Track generation: 1255..1573 -> 318-tiles track
Track generation: 1024..1292 -> 268-tiles track
Track generation: 1141..1430 -> 289-tiles track
Track generation: 1077..1355 -> 278-tiles track
Track generation: 1231..1543 -> 312-tiles track
Track generation: 1199..1503 -> 304-tiles

(-90.86908824, 5.143391778536215)
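`evaluate_policy` returns the mean and the standard deviation of the per-episode returns, so the tuple above reads as an average return of about -90.9 with a spread of about 5.1 across the 50 episodes. SB3 computes these with `np.mean` and `np.std` (population standard deviation); the stdlib sketch below does the same with `statistics`, using made-up returns purely for illustration (`summarize_returns` is a hypothetical helper).

```python
import statistics
from typing import List, Tuple


def summarize_returns(episode_returns: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation, matching the np.mean / np.std defaults."""
    return statistics.fmean(episode_returns), statistics.pstdev(episode_returns)


# Hypothetical returns from four evaluation episodes (not real results).
returns = [-90.0, -95.0, -85.0, -90.0]
mean_ret, std_ret = summarize_returns(returns)
print(mean_ret, round(std_ret, 4))  # -90.0 3.5355
```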

In [10]:
model.learn(total_timesteps=500_000)

Track generation: 1216..1524 -> 308-tiles track
Track generation: 1027..1288 -> 261-tiles track
Track generation: 1087..1363 -> 276-tiles track
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -50.3    |
| time/              |          |
|    fps             | 165      |
|    iterations      | 1        |
|    time_elapsed    | 12       |
|    total_timesteps | 2048     |
---------------------------------
Track generation: 1380..1730 -> 350-tiles track
Track generation: 1099..1378 -> 279-tiles track
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -49.8      |
| time/                   |            |
|    fps                  | 142        |
|    iterations           | 2          |
|    time_elapsed         | 28         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    appro

KeyboardInterrupt: ignored
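The `n_epochs` passes in each update optimize PPO's clipped surrogate objective. With the probability ratio r = π_new(a|s) / π_old(a|s) and an advantage estimate A, the per-sample objective is min(r·A, clip(r, 1 - ε, 1 + ε)·A), and SB3's `PPO` uses ε = `clip_range` = 0.2 by default. The scalar sketch below, with a hypothetical `clipped_surrogate` helper, shows how clipping removes any incentive to push the ratio outside [1 - ε, 1 + ε]; SB3 computes the batched version on tensors and minimizes its negative.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_range: float = 0.2) -> float:
    """Per-sample PPO clipped objective (to be maximized): min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: raising the ratio above 1 + eps earns nothing extra.
print(clipped_surrogate(1.5, 1.0))   # 1.2
# Negative advantage: lowering the ratio below 1 - eps earns nothing extra.
print(clipped_surrogate(0.5, -1.0))  # -0.8
# Inside the clip range the objective is simply r * A.
print(clipped_surrogate(1.1, 2.0))   # 2.2
```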

## Evaluate the Agent

In [None]:
mean_return, std_return = evaluate_policy(model, eval_env, n_eval_episodes=50, deterministic=True)
mean_return, std_return

(-55.66208251999999, 75.09680179115783)

## Visualize the Agent

In [None]:
from stable_baselines3.common.vec_env import VecVideoRecorder

def record_video(model, env_id, prefix, folder="./videos", n_steps=200):
  env = VecVideoRecorder(DummyVecEnv([lambda: gym.make(env_id)]),
                         video_folder=folder, 
                         record_video_trigger=lambda step: step == 0, 
                         video_length=n_steps, 
                         name_prefix=prefix )
  
  obs = env.reset()  # VecVideoRecorder starts recording on reset
  for step in range(n_steps):
    action, _states = model.predict(obs, deterministic=True)
    obs, rew, done, info = env.step(action)  # the recorder captures a frame on every step
  
  env.close()

In [None]:
record_video(model, ENV_ID, "PPO_mountain_car", folder="./videos", n_steps=1000)

Saving video to /content/videos/PPO_mountain_car-step-0-to-step-1000.mp4
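The clip names printed in this notebook follow the pattern `{name_prefix}-step-{start}-to-step-{start + video_length}.mp4`, which is also why the hub upload further down saves `/content/-step-0-to-step-1000.mp4` (an empty prefix). The helper below is a hypothetical reconstruction of that naming, handy for building the path passed to `show_video`; it is not part of SB3's API.

```python
from pathlib import Path


def recorder_clip_path(folder: str, name_prefix: str, start_step: int, video_length: int) -> Path:
    """Hypothetical helper: rebuild the clip file name seen in the VecVideoRecorder log."""
    name = f"{name_prefix}-step-{start_step}-to-step-{start_step + video_length}"
    return Path(folder) / f"{name}.mp4"


# record_video(model, ENV_ID, "PPO_mountain_car", n_steps=1000) records from step 0.
print(recorder_clip_path("./videos", "PPO_mountain_car", 0, 1000).name)
# PPO_mountain_car-step-0-to-step-1000.mp4
```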


In [None]:
from pathlib import Path
from IPython import display as ipythondisplay
import base64

def show_video(video_path):
  video_path = Path(video_path)
  video_b64 = base64.b64encode(video_path.read_bytes())
  html = f'''
  <video alt="{video_path}" autoplay loop controls style="height: 400px;">
    <source src="data:video/mp4;base64,{video_b64.decode('ascii')}" type="video/mp4" />
  </video>
  '''

  ipythondisplay.display(ipythondisplay.HTML(data=html))
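`show_video` embeds the MP4 directly in the notebook's HTML as a base64 `data:` URI. RFC 2397 defines the form `data:[<mediatype>][;base64],<data>` with no spaces, so the media type and payload must follow `data:` and the comma directly. The sketch below builds such a URI and tag from raw bytes (dummy bytes here; `video_data_uri` and `video_tag` are hypothetical names) and checks that the payload decodes back to the input.

```python
import base64
import html


def video_data_uri(video_bytes: bytes, mime: str = "video/mp4") -> str:
    """Build an RFC 2397 data: URI with a base64 payload and no embedded whitespace."""
    payload = base64.b64encode(video_bytes).decode("ascii")
    return f"data:{mime};base64,{payload}"


def video_tag(video_bytes: bytes, alt: str) -> str:
    """<video> element playing the embedded clip; attribute values are quoted and HTML-escaped."""
    src = video_data_uri(video_bytes)
    return (
        f'<video alt="{html.escape(alt)}" autoplay loop controls style="height: 400px;">'
        f'<source src="{src}" type="video/mp4" /></video>'
    )


fake_mp4 = b"\x00\x00\x00\x18ftypmp42"  # dummy bytes standing in for a real MP4 file
uri = video_data_uri(fake_mp4)
# Round trip: drop the "data:...;base64," header and decode the payload again.
assert base64.b64decode(uri.split(",", 1)[1]) == fake_mp4
print(video_tag(fake_mp4, "videos/demo clip.mp4")[:60])
```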

In [None]:
show_video("./videos/PPO_mountain_car-step-0-to-step-1000.mp4")

## Package to Hub

In [None]:
notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
[1m[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store[0m


In [None]:
package_to_hub(model=model, model_name="ppo-mountain_car", 
               model_architecture="PPO", 
               env_id=ENV_ID, 
               eval_env=hub_eval_env, 
               repo_id="danieladejumo/ppo-mountain_car",
               commit_message="Create and train PPO model")

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue and use
push_to_hub instead.


Cloning https://huggingface.co/danieladejumo/ppo-mountan_car into local empty directory.


Saving video to /content/-step-0-to-step-1000.mp4
ℹ Pushing repo ppo-mountan_car to the Hugging Face Hub


Upload file replay.mp4:   1%|1         | 3.34k/236k [00:00<?, ?B/s]

Upload file ppo-mountan_car.zip:   3%|2         | 3.34k/127k [00:00<?, ?B/s]

Upload file ppo-mountan_car/policy.pth:   9%|8         | 3.34k/38.9k [00:00<?, ?B/s]

Upload file ppo-mountan_car/pytorch_variables.pth: 100%|##########| 431/431 [00:00<?, ?B/s]

Upload file ppo-mountan_car/policy.optimizer.pth:   4%|4         | 3.34k/76.3k [00:00<?, ?B/s]

To https://huggingface.co/danieladejumo/ppo-mountan_car
   f73b5b5..8e062f7  main -> main



ℹ Your model is pushed to the hub. You can view your model here:
https://huggingface.co/danieladejumo/ppo-mountan_car


'https://huggingface.co/danieladejumo/ppo-mountan_car'

In [None]:
from huggingface_sb3 import load_from_hub

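model_path = load_from_hub("danieladejumo/ppo-mountain_car",
                           "ppo-mountain_car.zip")  # package_to_hub saves the model as <model_name>.zip

env = Monitor(gym.make(ENV_ID))  # evaluate on the same environment the model was trained on

model = PPO.load(model_path, env=env)
mean_return, std_return = evaluate_policy(model, env, n_eval_episodes=50, deterministic=True)
mean_return, std_return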

In [12]:
%cd ./drive/MyDrive/Python/Machine\ Learning/Deep\ RL/stable-baselines3_gym-envs

/content/drive/MyDrive/Python/Machine Learning/Deep RL/stable-baselines3_gym-envs


In [14]:
%%bash
git init
git add .
git commit -m "Initial commit"
git branch -M main

Initialized empty Git repository in /content/drive/MyDrive/Python/Machine Learning/Deep RL/stable-baselines3_gym-envs/.git/



*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@d1a966b32234.(none)')
error: refname refs/heads/master not found
fatal: Branch rename failed


Set a Git identity (required by `git commit`), then re-run the commit cell above.

In [15]:
!git config --global user.email "adejumodaniel17@gmail.com"
!git config --global user.name "Daniel Adejumo"