# MuJoCo Agents with Stable Baselines3 and Gymnasium

This Jupyter notebook shows how to train and evaluate agents in MuJoCo environments using **Stable Baselines3** and **Gymnasium**.

## Introduction

- **MuJoCo** (Multi-Joint dynamics with Contact): A physics engine for fast, accurate simulation of articulated bodies with contact, widely used for robotics and continuous-control benchmarks.
- **Stable Baselines3**: A set of reliable implementations of reinforcement learning algorithms in Python, built on PyTorch.
- **Gymnasium**: An open-source Python library that provides a standard API for reinforcement learning environments, along with a collection of reference environments. It is the maintained fork of OpenAI Gym.

## Objective

This notebook walks through training and evaluating MuJoCo agents with Stable Baselines3 and Gymnasium. The topics covered include:

- Environment setup
- Agent training
- Hyperparameter tuning
- Performance evaluation

## Outcome

By the end of this notebook, you will know how to set up a MuJoCo environment, train PPO and SAC agents with Stable Baselines3, and evaluate the resulting policies.

Let's get started!

In [None]:
# Install the system packages MuJoCo needs for rendering (OpenGL, GLEW, OSMesa, virtual display)

!apt-get update -qq
!apt-get install -y \
    libgl1-mesa-dev \
    libgl1-mesa-glx \
    libglew-dev \
    libosmesa6-dev \
    software-properties-common \
    patchelf \
    xvfb


In [None]:
# Install the Python libraries

!pip install gymnasium
!pip install free-mujoco-py
!pip install mujoco
!pip install stable-baselines3

In [None]:
import gymnasium as gym
import stable_baselines3 as sb3
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy


In [None]:
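# HalfCheetah-v3 uses the mujoco-py bindings (installed above via free-mujoco-py);
# the -v4 and later environments use the official `mujoco` package instead.
env = gym.make('HalfCheetah-v3')
print(env)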

In [None]:
# Create the environment
env = gym.make('HalfCheetah-v3')

In [None]:
# Instantiate the PPO agent
model = sb3.PPO('MlpPolicy', env, verbose=1)

# Train the agent
total_timesteps = 1000000
model.learn(total_timesteps=total_timesteps)

# Save the trained model
model.save("ppo_halfcheetah")

# Load the trained model back from disk.
# A missing file raises FileNotFoundError, which is left uncaught so that
# the evaluation below never runs against an undefined `loaded_model`.
loaded_model = sb3.PPO.load("ppo_halfcheetah", env=env)

# Evaluate the trained agent
eval_env = gym.make('HalfCheetah-v3')
mean_reward, std_reward = evaluate_policy(loaded_model, eval_env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward}, Std reward: {std_reward}")

eval_env.close()
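
The two numbers printed above summarise the per-episode returns collected during evaluation. As a rough sketch of what they mean, assuming (as `evaluate_policy` does by default) that the mean and the population standard deviation are taken over the episode returns, the same summary can be reproduced with the standard library. The function name `summarize_returns` and the sample returns are hypothetical and serve only as an illustration.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of per-episode returns."""
    mean_return = statistics.fmean(episode_returns)
    # pstdev is the population standard deviation (divides by N, not N - 1)
    std_return = statistics.pstdev(episode_returns)
    return mean_return, std_return


# Hypothetical returns from five evaluation episodes (not real results)
example_returns = [10.0, 12.0, 14.0, 16.0, 18.0]
mean_return, std_return = summarize_returns(example_returns)
print(f"Mean reward: {mean_return:.2f}, Std reward: {std_return:.2f}")
```

A large standard deviation relative to the mean suggests the policy behaves inconsistently across episodes, in which case increasing `n_eval_episodes` gives a steadier estimate.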


In [None]:
# PPO agent with vectorised environments

# Number of parallel environments
n_envs = 4

# Create and vectorize the environments
env = make_vec_env('HalfCheetah-v3', n_envs=n_envs)

# Instantiate the PPO agent
model = sb3.PPO('MlpPolicy', env, verbose=1)

# Train the agent
total_timesteps = 1000000
model.learn(total_timesteps=total_timesteps)

# Save the trained model
model.save("ppo_halfcheetah")

# Load the trained model back from disk.
# A missing file raises FileNotFoundError, which is left uncaught so that
# the evaluation below never runs against an undefined `loaded_model`.
loaded_model = sb3.PPO.load("ppo_halfcheetah", env=env)

# Evaluate the trained agent
# Note: evaluate_policy also accepts a vectorized environment; a separate single
# environment is used here so evaluation does not share state with training.
eval_env = gym.make('HalfCheetah-v3')
mean_reward, std_reward = evaluate_policy(loaded_model, eval_env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward}, Std reward: {std_reward}")

eval_env.close()
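
With a vectorised environment, each call to `env.step` advances all `n_envs` copies at once, so PPO gathers `n_steps * n_envs` transitions per rollout before updating the policy. The sketch below works through that bookkeeping. It assumes SB3's default PPO value of `n_steps=2048`, which the cell above does not override, and the helper name `rollout_budget` is hypothetical.

```python
def rollout_budget(total_timesteps, n_envs, n_steps=2048):
    """Return (transitions per rollout, number of rollouts needed)."""
    per_rollout = n_steps * n_envs
    # Ceiling division: training only stops at a rollout boundary,
    # so the last rollout may overshoot total_timesteps.
    n_rollouts = -(-total_timesteps // per_rollout)
    return per_rollout, n_rollouts


per_rollout, n_rollouts = rollout_budget(1_000_000, n_envs=4)
print(f"{per_rollout} transitions per rollout, {n_rollouts} rollouts")
```

Under these assumptions, four environments mean fewer but larger policy updates for the same `total_timesteps` than a single environment would give.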


In [None]:
# SAC agent with vectorised environments

# Instantiate the SAC agent with the MlpPolicy.
# This reuses the vectorised `env` (n_envs = 4) created in the PPO cell above.
model = sb3.SAC('MlpPolicy', env, verbose=1)

# Train the agent
total_timesteps = 1000000
model.learn(total_timesteps=total_timesteps)

# Save the trained model
model.save("sac_halfcheetah")

# Load the trained model back from disk.
# A missing file raises FileNotFoundError, which is left uncaught so that
# the evaluation below never runs against an undefined `loaded_model`.
loaded_model = sb3.SAC.load("sac_halfcheetah", env=env)

# Evaluate the trained agent
# Note: evaluate_policy also accepts a vectorized environment; a separate single
# environment is used here so evaluation does not share state with training.
eval_env = gym.make('HalfCheetah-v3')
mean_reward, std_reward = evaluate_policy(loaded_model, eval_env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward}, Std reward: {std_reward}")

eval_env.close()
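
The agents above all use SB3's default hyperparameters. As one small illustration of the tuning topic listed in the objectives, SB3 accepts either a constant or a callable for `learning_rate`; the callable receives the remaining training progress, which runs from 1 at the start to 0 at the end. The sketch below defines a linear decay. The name `linear_schedule` and the initial value `3e-4` are illustrative choices, not settings taken from this notebook.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is 1.0 at the start of training and 0.0 at the end
        return progress_remaining * initial_value

    return schedule


lr_schedule = linear_schedule(3e-4)
for progress in (1.0, 0.5, 0.0):
    print(progress, lr_schedule(progress))
```

Such a schedule could then be passed at construction time, for example `sb3.PPO('MlpPolicy', env, learning_rate=linear_schedule(3e-4))`.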