# Pendulum Swing Up

A description of the Pendulum environment: https://gymnasium.farama.org/environments/classic_control/pendulum/

The hyperparameters are taken from: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml


In [1]:
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.ppo.policies import MlpPolicy

## Create the Gym env and instantiate the agent

For this example, we will use the Pendulum environment, a classic control problem.

In [2]:
env = gym.make("Pendulum-v1", g=9.81)
model = PPO(
    MlpPolicy,
    env, 
    verbose=0, 
    n_steps=1024,
    gae_lambda=0.95,
    gamma=0.9,
    n_epochs=10,
    ent_coef=0.0,
    learning_rate=1e-3,
    clip_range=0.2,
    use_sde=True,
    sde_sample_freq=4)
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100, warn=False)
print(f"Before training: mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")

Before training: mean_reward: -1219.03 +/- 332.51
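
Two of the hyperparameters above, `gamma=0.9` and `gae_lambda=0.95`, control how PPO turns rewards into advantage estimates through generalized advantage estimation (GAE). The following is a minimal pure-Python sketch of that recursion, not SB3's actual implementation. The function name `gae_advantages` and the toy rewards and values are made up for illustration, and the sketch assumes no episode ends inside the segment.

```python
def gae_advantages(rewards, values, last_value, gamma=0.9, gae_lambda=0.95):
    """Hypothetical helper: GAE over one rollout segment without episode ends.

    rewards[t] and values[t] belong to step t; last_value is the value
    estimate of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    # Walk backwards so each step can reuse the advantage of the next one
    for t in reversed(range(len(rewards))):
        # One-step TD error
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        next_advantage = delta + gamma * gae_lambda * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


# Toy data: two steps with reward -1 and a value function that predicts 0
advantages = gae_advantages([-1.0, -1.0], [0.0, 0.0], last_value=0.0)
print(advantages)
```

With `gamma=0.9`, a TD error `k` steps ahead is weighted by `(0.9 * 0.95) ** k`, so the advantage estimate is dominated by the next few steps.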


## Train the agent and evaluate it

In [3]:
# Train the agent
model.learn(total_timesteps=100_000)

<stable_baselines3.ppo.ppo.PPO at 0x320913f40>
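
To get a feel for what `total_timesteps=100_000` means with `n_steps=1024`, the arithmetic below estimates the number of rollouts and gradient steps. It is a rough sketch under two assumptions that are not stated in this notebook: that SB3 keeps collecting whole rollouts until at least `total_timesteps` steps are reached, and that `batch_size` is left at SB3's default of 64.

```python
import math

total_timesteps = 100_000  # value passed to model.learn() above
n_steps = 1024             # rollout length per environment (PPO constructor)
n_envs = 1                 # a single environment is used in this notebook
n_epochs = 10              # optimization passes per rollout (PPO constructor)
batch_size = 64            # ASSUMPTION: SB3 default, not set in this notebook

rollout_size = n_steps * n_envs
# ASSUMPTION: training stops after the first rollout that reaches the budget
n_rollouts = math.ceil(total_timesteps / rollout_size)
actual_timesteps = n_rollouts * rollout_size
minibatches_per_epoch = rollout_size // batch_size
gradient_steps = n_rollouts * n_epochs * minibatches_per_epoch

print(f"rollouts: {n_rollouts}")
print(f"environment steps actually collected: {actual_timesteps}")
print(f"gradient steps: {gradient_steps}")
```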

In [4]:
# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)
print(f"After training: mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")

After training: mean_reward: -196.35 +/- 246.49


The training went well: the mean reward over 100 evaluation episodes rose from about -1219 to about -196, although the standard deviation (246.49) is still large.
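
To put these numbers in context, the sketch below reproduces the per-step reward formula given in the Pendulum documentation linked at the top, `-(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2)`, and derives the worst possible episode return. The constants (maximum speed 8, maximum torque 2, 200 steps per episode) are taken from that documentation as I understand it; the helper name `pendulum_reward` is made up for illustration.

```python
import math

MAX_SPEED = 8.0      # assumed angular velocity limit of Pendulum-v1
MAX_TORQUE = 2.0     # assumed torque limit of Pendulum-v1
EPISODE_STEPS = 200  # assumed time limit of Pendulum-v1


def pendulum_reward(theta, theta_dot, torque):
    """Hypothetical helper: per-step reward, theta = 0 means upright."""
    # Wrap the angle into [-pi, pi) before squaring it
    theta = ((theta + math.pi) % (2 * math.pi)) - math.pi
    return -(theta**2 + 0.1 * theta_dot**2 + 0.001 * torque**2)


best_step = pendulum_reward(0.0, 0.0, 0.0)
worst_step = pendulum_reward(math.pi, MAX_SPEED, MAX_TORQUE)
worst_episode = EPISODE_STEPS * worst_step

print(f"best step reward:  {best_step:.4f}")
print(f"worst step reward: {worst_step:.4f}")
print(f"worst episode return: {worst_episode:.2f}")
```

Under these assumptions, a mean return of -196.35 corresponds to an average cost of roughly 1 per step, most of which is typically paid during the swing-up phase before the pendulum is balanced.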

### Prepare video recording

In [5]:
# Set up fake display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

sh: Xvfb: command not found


In [6]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))
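
`show_videos` works by inlining each video file into the HTML as a base64 `data:` URI, so the notebook output does not depend on the file staying on disk. The sketch below isolates that step with the standard library only. `video_tag` is a hypothetical helper, and the bytes are a stand-in, not a real MP4.

```python
import base64


def video_tag(video_bytes, alt=""):
    """Hypothetical helper: build one <video> tag with the payload inlined."""
    payload = base64.b64encode(video_bytes).decode("ascii")
    return (
        f'<video alt="{alt}" autoplay loop controls style="height: 400px;">'
        f'<source src="data:video/mp4;base64,{payload}" type="video/mp4" />'
        "</video>"
    )


fake_video = b"\x00\x01not-a-real-mp4"  # placeholder bytes for illustration
tag = video_tag(fake_video, alt="demo")

# Recover the embedded payload to check that the round trip is lossless
embedded = tag.split("base64,")[1].split('"')[0]
decoded = base64.b64decode(embedded)
print(len(fake_video), len(embedded))
```

Base64 encodes every 3 bytes as 4 characters, so the embedded video makes the notebook about a third larger than the video file itself.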

We will record a video using the [VecVideoRecorder](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecvideorecorder) wrapper. You will learn about these wrappers in the next notebook.

In [7]:
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv


def record_video(env_id, model, video_length=500, prefix="", video_folder="videos/"):
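    """
    Record a video of the agent acting in the environment.

    :param env_id: (str) ID of the Gymnasium environment to record
    :param model: (RL model) Trained model used to choose the actions
    :param video_length: (int) Number of steps to record
    :param prefix: (str) Prefix of the video file name
    :param video_folder: (str) Folder where the video is saved
    """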
    eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])
    # Start the video at step=0 and record video_length steps
    eval_env = VecVideoRecorder(
        eval_env,
        video_folder=video_folder,
        record_video_trigger=lambda step: step == 0,
        video_length=video_length,
        name_prefix=prefix,
    )

    obs = eval_env.reset()
    for _ in range(video_length):
        action, _ = model.predict(obs)
        obs, _, _, _ = eval_env.step(action)

    # Close the video recorder
    eval_env.close()

### Visualize trained agent



In [8]:
record_video("Pendulum-v1", model, video_length=1000, prefix="ppo-pendulum")

Saving video to /Users/fenglongsong/PycharmProjects/differentiable-mpc/examples/reinforcement_learning/videos/ppo-pendulum-step-0-to-step-1000.mp4
Moviepy - Building video /Users/fenglongsong/PycharmProjects/differentiable-mpc/examples/reinforcement_learning/videos/ppo-pendulum-step-0-to-step-1000.mp4.
Moviepy - Writing video /Users/fenglongsong/PycharmProjects/differentiable-mpc/examples/reinforcement_learning/videos/ppo-pendulum-step-0-to-step-1000.mp4



                                                                 

Moviepy - Done !
Moviepy - video ready /Users/fenglongsong/PycharmProjects/differentiable-mpc/examples/reinforcement_learning/videos/ppo-pendulum-step-0-to-step-1000.mp4


In [9]:
show_videos("videos", prefix="ppo")