<a href="https://colab.research.google.com/github/BrutFab/ppo_BipedalWalker_v3/blob/main/ppo_BipedalWalker_v3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Setup**


### **Install Packages**

In [1]:
# Install necessary packages
!apt install swig cmake ffmpeg xvfb python3-opengl
!pip install stable-baselines3==2.0.0a5 gymnasium[box2d] huggingface_sb3 pyvirtualdisplay imageio[ffmpeg]


The next cell forces the notebook runtime to restart, so that the newly installed libraries are loaded.

In [None]:
import os
# Kill the current kernel process; Colab then restarts the runtime automatically
os.kill(os.getpid(), 9)

### **Start Virtual Display**

In [1]:
from pyvirtualdisplay import Display
virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

<pyvirtualdisplay.display.Display at 0x7eb0c3ea5cf0>

### **Setup Environment**

In [2]:
import gymnasium as gym
env = gym.make("BipedalWalker-v3", hardcore=True)
env.reset()

(array([ 2.7474449e-03, -1.2436845e-05,  9.6746074e-04, -1.5999915e-02,
         9.1980219e-02, -1.2767193e-03,  8.6025739e-01,  2.3463638e-03,
         1.0000000e+00,  3.2384779e-02, -1.2766306e-03,  8.5380816e-01,
         9.0290722e-04,  1.0000000e+00,  4.4081402e-01,  4.4582012e-01,
         4.6142277e-01,  4.8955020e-01,  5.3410280e-01,  6.0246104e-01,
         7.0914888e-01,  8.8593185e-01,  1.0000000e+00,  1.0000000e+00],
       dtype=float32),
 {})

### **Observation Space**
The observation is a vector of shape `(24,)` whose entries describe the walker's state:

- **Hull Angle**: The orientation (tilt) of the walker's main body.
- **Hull Angular Velocity**: The rate at which the main body is rotating.
- **Horizontal Speed**: The speed at which the walker is moving horizontally.
- **Vertical Speed**: The speed at which the walker is moving vertically.
- **Position of Joints**: The angles of the walker's 4 joints (two hips and two knees), taking up 4 values.
- **Joints Angular Speed**: The rate of change of each joint's angle, another 4 values.
- **Legs Contact with Ground**: Whether each leg is touching the ground; with two legs, this takes up 2 values.
- **10 Lidar Rangefinder Measurements**: Distance readings used to detect terrain and obstacles ahead of and below the walker, taking up 10 values.

Together these account for 1 + 1 + 1 + 1 + 4 + 4 + 2 + 10 = 24 values.
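To see how these components map onto indices, here is a minimal sketch that splits a raw observation into named groups. The index layout is an assumption taken from Gymnasium's `bipedal_walker.py` source, where the per-leg values are interleaved (hip angle, hip speed, knee angle, knee speed and ground contact for one leg, then the same for the other leg); `split_observation` is a hypothetical helper, so check the layout against your installed version. It agrees with the reset observation above, where indices 8 and 13 (the two ground contacts) are both `1.0`.

```python
import numpy as np

def split_observation(obs):
    """Split a BipedalWalker observation into named components.

    Assumes the index layout of Gymnasium's bipedal_walker.py; verify it
    against the installed version before relying on it.
    """
    obs = np.asarray(obs, dtype=np.float32)
    if obs.shape != (24,):
        raise ValueError(f"expected shape (24,), got {obs.shape}")
    return {
        "hull_angle": obs[0],
        "hull_angular_velocity": obs[1],
        "horizontal_speed": obs[2],
        "vertical_speed": obs[3],
        # Per leg: hip angle, hip speed, knee angle, knee speed, ground contact
        "joint_angles": obs[[4, 6, 9, 11]],   # hip 1, knee 1, hip 2, knee 2
        "joint_speeds": obs[[5, 7, 10, 12]],  # same joint order as above
        "leg_contacts": obs[[8, 13]],         # 1.0 when the leg touches the ground
        "lidar": obs[14:24],                  # 10 rangefinder readings
    }
```

For example, `split_observation(env.reset()[0])["leg_contacts"]` returns the two ground-contact flags of a freshly reset walker.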


In [3]:
print("_____OBSERVATION SPACE_____ \n")
print("Observation Space Shape", env.observation_space.shape)
print("Sample observation", env.observation_space.sample()) # Get a random observation

_____OBSERVATION SPACE_____ 

Observation Space Shape (24,)
Sample observation [ 3.0030336  -3.8087451   4.8383284  -2.9032204  -2.7099688   4.6612353
  0.99242914 -1.0835009   2.1865888  -0.06143188 -1.8567798   1.0854756
  3.164287    2.3931043   0.32011095 -0.14404017  0.57871926  0.6630736
 -0.9742642  -0.9428211  -0.18407333  0.9888316   0.54687035 -0.97685945]


### **Action Space**

Actions are motor speed values in the [-1, 1] range, one for each of the 4 joints (both hips and both knees).
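In Gymnasium's source, the environment does not treat each value as a raw speed by default: the sign selects the direction in which the joint motor turns, and the magnitude, clipped to [0, 1], scales the maximum motor torque. The sketch below mirrors that interpretation; `decode_action` is a hypothetical helper, and the joint order `[hip_1, knee_1, hip_2, knee_2]` is an assumption based on the same source.

```python
import numpy as np

def decode_action(action):
    """Interpret a 4-dim BipedalWalker action per joint.

    Assumed joint order: [hip_1, knee_1, hip_2, knee_2].
    Returns the motor direction (-1, 0 or +1) and the fraction of the
    maximum torque applied to each joint.
    """
    action = np.asarray(action, dtype=np.float32)
    directions = np.sign(action)                          # which way each motor turns
    torque_fractions = np.clip(np.abs(action), 0.0, 1.0)  # share of the maximum torque
    return directions, torque_fractions
```

Under this reading, 0.5 and 1.0 drive a motor in the same direction and differ only in how much torque is available.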

In [4]:
print("\n _____ACTION SPACE_____ \n")
print("Action Space Shape", env.action_space.shape)
print("Action Space Sample", env.action_space.sample()) # Take a random action


 _____ACTION SPACE_____ 

Action Space Shape (4,)
Action Space Sample [ 0.8176275   0.02198642 -0.91815233  0.8301315 ]


### **Vectorized Environment**
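Create a vectorized environment of 16 copies of the environment. A vectorized environment stacks several independent environments behind a single interface, so each call to `step` advances all 16 at once and the agent collects more diverse experience per update. Since `hardcore=True` is not passed here, these training environments use the default terrain.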

In [5]:
from stable_baselines3.common.env_util import make_vec_env
env = make_vec_env('BipedalWalker-v3', n_envs=16)
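To illustrate what "stacking" means, here is a minimal sketch of a sequential vectorized environment in the spirit of SB3's `DummyVecEnv` (the class `make_vec_env` uses by default): it steps each copy in turn, stacks the results into arrays with a leading `num_envs` dimension, and automatically resets any copy whose episode ended. `CounterEnv` and `ToyVecEnv` are hypothetical toy classes, not SB3 APIs; the real VecEnv also returns a list of info dicts, which carries the final observation of a finished episode.

```python
import numpy as np

class CounterEnv:
    """Hypothetical toy env: observation is the step count; ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.array([0.0], dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.array([float(self.t)], dtype=np.float32)
        return obs, float(action), self.t >= self.horizon


class ToyVecEnv:
    """Steps several independent envs one after another and stacks the results."""

    def __init__(self, env_fns):
        self.envs = [make_env() for make_env in env_fns]
        self.num_envs = len(self.envs)

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                obs = env.reset()  # auto-reset, so every copy is always mid-episode
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return np.stack(observations), np.array(rewards), np.array(dones)


# Two copies with different episode lengths (default arg binds h per lambda)
venv = ToyVecEnv([lambda h=h: CounterEnv(h) for h in (2, 3)])
first_obs = venv.reset()  # shape (2, 1): one row per copy
venv.step(np.ones(2))
obs, rewards, dones = venv.step(np.ones(2))  # first copy finishes and is reset
```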

# **2. Building the Model**

In [6]:
from stable_baselines3 import PPO
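model = PPO(
    policy='MlpPolicy',  # feed-forward actor-critic networks
    env=env,             # the 16 vectorized training environments
    n_steps=2048,        # steps collected per environment before each update
    batch_size=128,      # minibatch size for each gradient step
    n_epochs=6,          # passes over the collected rollout per update
    gamma=0.999,         # discount factor for future rewards
    gae_lambda=0.98,     # GAE parameter trading off bias and variance of advantages
    ent_coef=0.01,       # entropy bonus that encourages exploration
    verbose=1)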

Using cpu device
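With these settings, each PPO update works on a rollout of `n_steps` transitions from each of the 16 environments. The plain-Python arithmetic below, with values copied from the cells above, shows the resulting buffer size and how many minibatch gradient steps a single update performs; 32,768 is also the `total_timesteps` value reported in each block of the training logs.

```python
# Values copied from the make_vec_env and PPO cells above
n_envs = 16
n_steps = 2048
batch_size = 128
n_epochs = 6

rollout_size = n_steps * n_envs                     # transitions collected per update
minibatches_per_epoch = rollout_size // batch_size  # 32768 / 128
gradient_steps_per_update = minibatches_per_epoch * n_epochs

print(rollout_size, minibatches_per_epoch, gradient_steps_per_update)  # 32768 256 1536
```

Because 32,768 is divisible by 128, no minibatch is truncated; SB3 emits a warning when the rollout size is not a multiple of `batch_size`.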


# **3. Video Generation**

In [7]:
import os
import tempfile
from pathlib import Path

import numpy as np
from wasabi import Printer
from stable_baselines3.common.base_class import BaseAlgorithm
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import (
    DummyVecEnv,
    VecEnv,
    VecVideoRecorder,
)

In [8]:
msg = Printer()

In [9]:
def generate_replay(
    model: BaseAlgorithm,
    eval_env: VecEnv,
    video_length: int,
    is_deterministic: bool,
    local_path: Path,
):
    """
    Generate a replay video of the agent
    :param model: trained model
    :param eval_env: environment used to evaluate the agent
    :param video_length: length of the video (in timesteps)
    :param is_deterministic: use deterministic or stochastic actions
    :param local_path: path of the output video file
    """
    # Record into a temporary directory: SB3 creates "-step-0-to-..." meta files
    # and other artifacts that we don't want in the output directory.
    with tempfile.TemporaryDirectory() as tmpdirname:
        # Step 1: Create the VecVideoRecorder
        env = VecVideoRecorder(
            eval_env,
            tmpdirname,
            record_video_trigger=lambda x: x == 0,
            video_length=video_length,
            name_prefix="",
        )

        obs = env.reset()
        lstm_states = None
        episode_starts = np.ones((env.num_envs,), dtype=bool)

        try:
            for _ in range(video_length):
                action, lstm_states = model.predict(
                    obs,
                    state=lstm_states,
                    episode_start=episode_starts,
                    deterministic=is_deterministic,
                )
                obs, _, episode_starts, _ = env.step(action)

            # Save the video
            env.close()

            # Convert the video with x264 codec
            inp = env.video_recorder.path
            out = local_path
            os.system(f"ffmpeg -y -i {inp} -vcodec h264 {out}")
            print(f"Video saved to: {out}")
        except KeyboardInterrupt:
            pass
        except Exception as e:
            msg.fail(str(e))
            # Add a message for video
            msg.fail(
                "We are unable to generate a replay of your agent"
            )
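`os.system` returns ffmpeg's exit status, which `generate_replay` ignores, so "Video saved to" is printed even if the conversion failed; building the command as a single string also breaks on paths containing spaces. A hedged alternative is to pass an argument list to `subprocess.run` with `check=True`. The helper names below are hypothetical; the ffmpeg flags are the same ones used above.

```python
import subprocess

def ffmpeg_convert_cmd(inp, out):
    """Argument list that re-encodes `inp` to H.264 at `out`, overwriting `out`."""
    return ["ffmpeg", "-y", "-i", str(inp), "-vcodec", "h264", str(out)]

def convert_video(inp, out):
    """Run ffmpeg without a shell; raise CalledProcessError if it exits non-zero."""
    return subprocess.run(
        ffmpeg_convert_cmd(inp, out),
        check=True,           # turn a non-zero exit status into an exception
        capture_output=True,  # keep ffmpeg's log out of the notebook output
        text=True,
    )
```

Calling `convert_video(inp, out)` in place of the `os.system` line lets the existing `except Exception` branch report a failed conversion through `msg.fail`.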

# **4. Training, Saving, and Recording the Videos**

In [10]:
import os

In [11]:
# Create a directory to save the videos
video_dir = "/content/videos"
os.makedirs(video_dir, exist_ok=True)

In [12]:
env_id = "BipedalWalker-v3"
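# Train in 20 chunks of learn(total_timesteps=10000) and record a replay after each;
# adjust the range and chunk size to your liking.
# Each learn() call collects at least one full rollout (n_steps * n_envs = 32768 steps),
# so every chunk actually trains for 32768 steps, while the replay file names count 10000 per chunk.
# The replays are recorded with hardcore=True, whereas training (make_vec_env above) uses the default terrain.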
for i in range(0, 200000, 10000):
    model.learn(total_timesteps=10000)
    # Save the model
    model_name = "ppo-BipedalWalker-v3"
    model.save(model_name)
    video_name = f"replay_{i + 10000}.mp4"
    generate_replay(
        model=model,
        eval_env=DummyVecEnv([lambda: Monitor(gym.make(env_id, hardcore=True, render_mode="rgb_array"))]),
        video_length=1000,
        is_deterministic=True,
        local_path=os.path.join(video_dir, video_name)
    )

model_name = "ppo-BipedalWalker-v3"
model.save(model_name)


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 305      |
|    ep_rew_mean     | -112     |
| time/              |          |
|    fps             | 2068     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp3d2fu24j/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp3d2fu24j/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp3d2fu24j/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp3d2fu24j/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_10000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 391      |
|    ep_rew_mean     | -111     |
| time/              |          |
|    fps             | 2083     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpg_w5hw10/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpg_w5hw10/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpg_w5hw10/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpg_w5hw10/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_20000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 547      |
|    ep_rew_mean     | -110     |
| time/              |          |
|    fps             | 2150     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpk28l9y99/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpk28l9y99/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpk28l9y99/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpk28l9y99/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_30000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 750      |
|    ep_rew_mean     | -109     |
| time/              |          |
|    fps             | 2114     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp7u8s62ud/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp7u8s62ud/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp7u8s62ud/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp7u8s62ud/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_40000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 671      |
|    ep_rew_mean     | -109     |
| time/              |          |
|    fps             | 2173     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpfrz9hcp3/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpfrz9hcp3/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpfrz9hcp3/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpfrz9hcp3/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_50000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.01e+03 |
|    ep_rew_mean     | -105     |
| time/              |          |
|    fps             | 2055     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpds8ga63h/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpds8ga63h/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpds8ga63h/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpds8ga63h/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_60000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 947      |
|    ep_rew_mean     | -105     |
| time/              |          |
|    fps             | 2116     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp9laax118/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp9laax118/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp9laax118/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp9laax118/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_70000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 978      |
|    ep_rew_mean     | -101     |
| time/              |          |
|    fps             | 1975     |
|    iterations      | 1        |
|    time_elapsed    | 16       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp0je103yi/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp0je103yi/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp0je103yi/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp0je103yi/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_80000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.24e+03 |
|    ep_rew_mean     | -95      |
| time/              |          |
|    fps             | 2094     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp857rrax9/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp857rrax9/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp857rrax9/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp857rrax9/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_90000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.43e+03 |
|    ep_rew_mean     | -91.2    |
| time/              |          |
|    fps             | 2034     |
|    iterations      | 1        |
|    time_elapsed    | 16       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp0owmz6fm/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp0owmz6fm/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp0owmz6fm/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp0owmz6fm/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_100000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.36e+03 |
|    ep_rew_mean     | -86.1    |
| time/              |          |
|    fps             | 2067     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpbi7chmez/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpbi7chmez/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpbi7chmez/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpbi7chmez/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_110000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.29e+03 |
|    ep_rew_mean     | -85.4    |
| time/              |          |
|    fps             | 2147     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpgqhw8aqh/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpgqhw8aqh/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpgqhw8aqh/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpgqhw8aqh/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_120000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.43e+03 |
|    ep_rew_mean     | -80.9    |
| time/              |          |
|    fps             | 2014     |
|    iterations      | 1        |
|    time_elapsed    | 16       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp1ma5vi1_/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp1ma5vi1_/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp1ma5vi1_/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp1ma5vi1_/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_130000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.51e+03 |
|    ep_rew_mean     | -73.7    |
| time/              |          |
|    fps             | 1992     |
|    iterations      | 1        |
|    time_elapsed    | 16       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmph7l52bsi/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmph7l52bsi/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmph7l52bsi/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmph7l52bsi/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_140000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.43e+03 |
|    ep_rew_mean     | -72.8    |
| time/              |          |
|    fps             | 2165     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmph8r4jpqh/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmph8r4jpqh/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmph8r4jpqh/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmph8r4jpqh/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_150000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.6e+03  |
|    ep_rew_mean     | -63.7    |
| time/              |          |
|    fps             | 2110     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpfe1n7cy5/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpfe1n7cy5/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpfe1n7cy5/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpfe1n7cy5/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_160000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.51e+03 |
|    ep_rew_mean     | -55.3    |
| time/              |          |
|    fps             | 2148     |
|    iterations      | 1        |
|    time_elapsed    | 15       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpf21d7qgr/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpf21d7qgr/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpf21d7qgr/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpf21d7qgr/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_170000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.51e+03 |
|    ep_rew_mean     | -51.8    |
| time/              |          |
|    fps             | 1854     |
|    iterations      | 1        |
|    time_elapsed    | 17       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpstpb5r88/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpstpb5r88/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpstpb5r88/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpstpb5r88/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_180000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.49e+03 |
|    ep_rew_mean     | -50.7    |
| time/              |          |
|    fps             | 2193     |
|    iterations      | 1        |
|    time_elapsed    | 14       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpx_dup1ny/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpx_dup1ny/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpx_dup1ny/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpx_dup1ny/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_190000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.21e+03 |
|    ep_rew_mean     | -49.9    |
| time/              |          |
|    fps             | 2041     |
|    iterations      | 1        |
|    time_elapsed    | 16       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpavud1yey/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpavud1yey/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpavud1yey/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmpavud1yey/-step-0-to-step-1000.mp4
Video saved to: /content/videos/replay_200000.mp4
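Every chunk above reports `iterations 1` and `total_timesteps 32768`, even though each call asks for 10,000 steps. Two SB3 behaviors combine here: `learn()` resets its timestep counter on each call by default (`reset_num_timesteps=True`), and PPO keeps collecting whole rollouts of `n_steps * n_envs` transitions until the counter reaches `total_timesteps`. The sketch below uses a hypothetical helper that assumes exactly that stopping rule to estimate how much training the loop actually performed; the `replay_<N>.mp4` names therefore understate the number of steps behind each video.

```python
import math

def steps_per_learn_call(total_timesteps, n_steps, n_envs):
    """Timesteps collected by one learn() call whose counter starts at zero.

    Assumes PPO collects whole rollouts of n_steps * n_envs transitions
    until the counter reaches total_timesteps.
    """
    rollout_size = n_steps * n_envs
    return math.ceil(total_timesteps / rollout_size) * rollout_size

per_chunk = steps_per_learn_call(10_000, n_steps=2048, n_envs=16)
n_chunks = len(range(0, 200_000, 10_000))
print(per_chunk, n_chunks, per_chunk * n_chunks)  # 32768 20 655360
```

One option for making the labels match is to request a multiple of the rollout size per chunk and pass `reset_num_timesteps=False`, so the counter (and the logged `total_timesteps`) keeps growing across calls.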


In [13]:
# Write an ffmpeg concat list with every replay, in training order
with open(os.path.join(video_dir, "filelist.txt"), "w") as f:
    for i in range(0, 200000, 10000):
        video_name = f"replay_{i + 10000}.mp4"
        f.write(f"file '{os.path.join(video_dir, video_name)}'\n")
# Concatenate all the videos into one (-y overwrites an existing replay_all.mp4)
os.system(f"ffmpeg -y -f concat -safe 0 -i {os.path.join(video_dir, 'filelist.txt')} -c copy {os.path.join(video_dir, 'replay_all.mp4')}")

0

# **5. Visualize Final Video**

In [14]:
from IPython.display import HTML
from base64 import b64encode
with open(os.path.join(video_dir, 'replay_all.mp4'), 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=600 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

# **6. Evaluate the Model**

In [15]:
from stable_baselines3.common.evaluation import evaluate_policy

In [16]:
eval_env = Monitor(gym.make("BipedalWalker-v3"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=100.19 +/- 6.713420883846758
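`evaluate_policy` runs the policy for `n_eval_episodes` episodes and reports the mean and standard deviation of the episode returns. The loop below is a minimal sketch of that idea for a single Gymnasium-style environment (five-value `step` API); `evaluate` is a hypothetical helper, not SB3's implementation, which additionally handles vectorized environments and `Monitor` wrappers.

```python
import numpy as np

def evaluate(predict, env, n_episodes=10):
    """Return the mean and standard deviation of undiscounted episode returns.

    `predict` maps a single observation to an action; `env` follows the
    Gymnasium API (reset -> (obs, info), step -> 5-tuple).
    """
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        episode_return, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(predict(obs))
            episode_return += reward
            done = terminated or truncated  # stop on failure/goal or time limit
        returns.append(episode_return)
    return float(np.mean(returns)), float(np.std(returns))
```

With the trained agent, this would be called as `evaluate(lambda obs: model.predict(obs, deterministic=True)[0], gym.make("BipedalWalker-v3"))`, since `model.predict` returns an `(action, state)` pair.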
