<a href="https://colab.research.google.com/github/Saponjyan/CV/blob/main/Bipedalwalker_V2_PPO2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Deep Learning Labs S01 E05: BipedalWalker-v2

This Colab notebook lets you train, evaluate, and visualize your results. Because Google Colab doesn't support `env.render()`, we use a workaround: we "fake" a display, record a video, and then display it.
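
As a rough illustration of the "fake display" part of that workaround, here is a minimal sketch that builds the Xvfb command line and starts the virtual display only if Xvfb is installed. The helper names `xvfb_command` and `start_virtual_display` are hypothetical and not part of any library; the screen geometry mirrors the one this notebook uses later.

```python
import os
import shutil
import subprocess


def xvfb_command(display=":1", width=1024, height=768, depth=24):
    """Build the Xvfb command line for one virtual screen (hypothetical helper)."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def start_virtual_display(display=":1"):
    """Start Xvfb if it is installed and point DISPLAY at it.

    Returns the subprocess handle, or None when Xvfb is not available.
    """
    if shutil.which("Xvfb") is None:
        return None
    process = subprocess.Popen(xvfb_command(display))
    # Rendering code looks up the DISPLAY variable to find the X server
    os.environ["DISPLAY"] = display
    return process


# Only print the command here; call start_virtual_display() to actually launch it
print(" ".join(xvfb_command()))
```

Keeping the process handle makes it possible to call `terminate()` on the display once recording is done, which a background shell command does not offer as directly.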

The notebook runs in the classic **CPU** environment as well as on **GPU** and **TPU** runtimes.

> To run everything, open the `Runtime` menu and choose `Run all`.

![Nextgrid Deep learning labs](https://nextgrid.ai/wp-content/uploads/2019/12/Deck-wallpaper-logo-scaled.jpg)

### Stable Baselines OpenAI Gym BipedalWalker-v2

Notebook by [nextgrid.ai](https://nextgrid.ai) for [Deep learning labs](https://nextgrid.ai/deep-learning-labs/) #5.


Documentation for Stable Baselines is available at: [https://stable-baselines.readthedocs.io/](https://stable-baselines.readthedocs.io/)


notebook authored by M.   
[linkedin](https://www.linkedin.com/in/imathias) / [twitter](https://twitter.com/mathiiias123)   




## Install system-wide packages
Install Linux system packages using `apt-get` and Python packages using `pip`.

In [2]:
!apt-get install swig cmake libopenmpi-dev zlib1g-dev xvfb x11-utils ffmpeg -qq #remove -qq for full output
!pip install stable-baselines[mpi] box2d box2d-kengz pyvirtualdisplay pyglet==1.3.1 --quiet #remove --quiet for full output
# Stable Baselines only supports tensorflow 1.x for now
%tensorflow_version 1.x

  Preparing metadata (setup.py) ... done
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for box2d-kengz (setup.py) ... done
  Building wheel for mpi4py (pyproject.toml) ... canceled
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
    status = run_func(*args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/req_command.py", line 242, in wrapper
    return func(self, options, args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/commands/install.py", line 417, in run
    _, build_failures = build(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/wheel_builder.py", line 320, in build
    wheel_file = _build_one(
  File "/usr/local/lib/python3.10/dist-packages/pip/_inter

ValueError: ignored

## Dependencies
Import the dependencies required to run and train our model and to record a video.

In [None]:
import gym
import imageio
import numpy as np
import base64
import IPython
import PIL.Image
import pyvirtualdisplay

# Video stuff
from pathlib import Path
from IPython import display as ipythondisplay

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import VecVideoRecorder, SubprocVecEnv, DummyVecEnv
from stable_baselines import PPO2

## Define variables & functions
Here we define our variables and also create a couple of functions
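
To make the bookkeeping behind an evaluation helper easier to follow, here is a minimal, dependency-free sketch of turning a stream of per-step rewards and `done` flags into per-episode totals and a windowed mean. The function names and the reward values are made up for illustration. Unlike the notebook's `evaluate` function, this sketch counts only completed episodes.

```python
def episode_returns(rewards, dones):
    """Split a flat stream of step rewards into per-episode totals.

    Only completed episodes (those ending with done=True) are returned.
    """
    totals = []
    running = 0.0
    for reward, done in zip(rewards, dones):
        running += reward
        if done:
            totals.append(running)
            running = 0.0
    return totals


def mean_recent(totals, window=100):
    """Mean of the last `window` episode totals, or NaN if there are none."""
    recent = totals[-window:]
    return sum(recent) / len(recent) if recent else float("nan")


# Made-up step data: two finished episodes and one unfinished step at the end
rewards = [1.0, 2.0, -1.0, 0.5, 0.5, 3.0]
dones = [False, True, False, False, True, False]

returns = episode_returns(rewards, dones)
print(returns)               # [3.0, 0.0]
print(mean_recent(returns))  # 1.5
```

Whether to include the unfinished episode is a design choice: dropping it avoids skewing the mean with a partial total, at the cost of ignoring some collected steps.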

In [None]:
# Set the environment variables that we will use in our code
env_id = 'BipedalWalker-v2'
video_folder = 'videos/'  # relative path, matching the folder used when recording and showing videos
video_length = 100

# Set up our initial environment
env = DummyVecEnv([lambda: gym.make(env_id)])
obs = env.reset()

In [None]:
# Evaluation Function
def evaluate(model, num_steps=1000):
  """
  Evaluate a RL agent
  :param model: (BaseRLModel object) the RL Agent
  :param num_steps: (int) number of timesteps to evaluate it
  :return: (float) Mean reward for the last 100 episodes
  """
  episode_rewards = [0.0]
  obs = env.reset()
  for i in range(num_steps):
      # _states are only useful when using LSTM policies
      action, _states = model.predict(obs)

      # env is a vectorized environment, so rewards and dones are arrays
      obs, rewards, dones, info = env.step(action)

      # Stats (index 0: we only run a single environment)
      episode_rewards[-1] += rewards[0]
      if dones[0]:
          # A VecEnv resets itself automatically when an episode ends
          episode_rewards.append(0.0)
  # Compute mean reward for the last 100 episodes
  # (the final entry may be an episode that is still in progress)
  mean_100ep_reward = round(np.mean(episode_rewards[-100:]), 1)
  print("Mean reward:", mean_100ep_reward, "Num episodes:", len(episode_rewards))

  return mean_100ep_reward

In [None]:
# Make video
# Set up fake display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [None]:
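# Record video
def record_video(env_id, model, video_length=500, prefix='', video_folder='videos/'):
  """
  Record a video of the agent acting in a fresh copy of the environment
  :param env_id: (str) Gym environment id
  :param model: (RL model) the agent to record
  :param video_length: (int) number of steps to record
  :param prefix: (str) file name prefix for the video
  :param video_folder: (str) folder where the video is saved
  """
  # Build a separate evaluation environment from env_id
  eval_env = DummyVecEnv([lambda: gym.make(env_id)])
  # Start the video at step=0 and record video_length steps
  # (wrap eval_env, not the global training env, so closing it leaves training untouched)
  eval_env = VecVideoRecorder(eval_env, video_folder=video_folder,
                              record_video_trigger=lambda step: step == 0, video_length=video_length,
                              name_prefix=prefix)

  obs = eval_env.reset()
  for _ in range(video_length):
    action, _ = model.predict(obs)
    obs, _, _, _ = eval_env.step(action)

  # Close the video recorder
  eval_env.close()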

In [None]:
# Display video
def show_videos(video_path='', prefix=''):
  html = []
  for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
      video_b64 = base64.b64encode(mp4.read_bytes())
      html.append('''<video alt="{}" autoplay
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(mp4, video_b64.decode('ascii')))
  ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

# Define & configure our reinforcement learning algorithm
In this example we use PPO2 (Proximal Policy Optimization) with its default settings. Read more about how to define your PPO2 [parameters](https://stable-baselines.readthedocs.io/en/master/modules/ppo2.html#parameters).
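
For intuition about what Proximal Policy Optimization optimizes, here is a minimal sketch of the clipped surrogate objective for a single sample, written in plain Python. It illustrates the general PPO idea and is not the Stable Baselines implementation; the function name is hypothetical, and the clip range of 0.2 is assumed to match the library default.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO clipped surrogate objective for one sample (illustrative only).

    ratio:      new policy probability divided by old policy probability
    advantage:  estimated advantage of the action that was taken
    clip_range: how far the ratio may move away from 1 before it is clipped
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum keeps the pessimistic (lower) estimate of the objective
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage, ratio too high: the gain is capped at (1 + clip_range) * advantage
print(clipped_surrogate(1.5, 2.0))
# Negative advantage, ratio too low: the clipped (more negative) term is kept
print(clipped_surrogate(0.5, -1.0))
# Ratio inside the clip range: the objective is simply ratio * advantage
print(clipped_surrogate(1.1, 1.0))
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is what the clip range parameter controls when you tune it.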

In [None]:
# Define the model
# Works with the defaults; tweak the parameters linked above, measure the results, and iterate
model = PPO2(MlpPolicy, env, verbose=1)

## Train model 50k steps & evaluate results
Here we train, evaluate, save, record & display video

In [None]:

# Random Agent, before training
mean_reward_before_train = evaluate(model, num_steps=10000)

# Train model
model.learn(total_timesteps=50000)

# Save model
model.save("ppo2-walker-50000")

# Trained agent, after 50k training steps (same number of evaluation steps as before, for a fair comparison)
mean_reward_after_train = evaluate(model, num_steps=10000)

In [None]:
# Record & show video
record_video('BipedalWalker-v2', model, video_length=1500, prefix='ppo2-walker-50000')
show_videos('videos', prefix='ppo2-walker-50000')

## Train model another 500k steps & evaluate results


In [None]:
# Agent after the first 50k steps, before further training
mean_reward_before_train = evaluate(model, num_steps=10000)

# Train model
model.learn(total_timesteps=500000)

# Save model
model.save("ppo2-walker-500000")

# Trained agent, after another 500k training steps
mean_reward_after_train = evaluate(model, num_steps=10000)

In [None]:
# Record & show video
record_video('BipedalWalker-v2', model, video_length=1500, prefix='ppo2-walker-500000')
show_videos('videos', prefix='ppo2-walker-500000')