# Gymnasium Basics

This notebook walks through the basics of the Gymnasium package and its interface. We start by exploring some simple environments and policies that solve them, then take a closer look at our environment of interest, MuJoCo.

## Table of Contents
1. [Basic Gym Environments](#Basic-Gym-Environments)
    - [Random Policies](#Random-Policies)
    - [Solving Easy Problems with StableBaselines](#Solving-Easy-Problems-with-StableBaselines)
2. [The MuJoCo Environment](#The-MuJoCo-Environment)

## Basic Gym Environments

Although our target is a more complicated Gymnasium environment, it's useful to get familiar with the library and its interface through simple examples. We'll explore four of them in this section: CartPole, MountainCar, Acrobot, and LunarLander.

### Random Policies

To begin, let's see how agents perform in these environments when they choose actions at random.

In [1]:
import gymnasium as gym

In [2]:
def play_agent(env_name, model=None, n_timesteps=500):
    """Render an agent for n_timesteps steps. Acts randomly when no model is given."""
    # render_mode="human" draws every frame during step(), so no explicit render() call is needed
    env = gym.make(env_name, render_mode="human")
    obs, info = env.reset()

    for _ in range(n_timesteps):  # Episodes that end early are reset and play continues
        # Action selection: use the passed-in policy if there is one, otherwise sample randomly
        if model is not None:
            action, _ = model.predict(obs)
        else:
            action = env.action_space.sample()
        # Keep `obs` up to date so the policy always sees the latest observation
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()

In [3]:
environments = ["CartPole-v1", "MountainCar-v0", "Acrobot-v1", "LunarLander-v3"]
for env_name in environments:
    print(f"Running {env_name}...")
    play_agent(env_name, None)

Running CartPole-v1...


error: display Surface quit

As expected, the random agents perform poorly. Still, watching them gives us a sense of each environment and of what an optimal policy has to achieve.
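One way to put a number on this is to average the total reward per episode instead of watching the rendering. The sketch below is only an illustration: `ToyWalkEnv` and `mean_episode_return` are hypothetical names that are not part of Gymnasium, and the toy environment merely copies the `reset()`/`step()` signatures used above so the example runs without any dependencies.

```python
import random
from statistics import mean


class ToyWalkEnv:
    """Hypothetical stand-in that copies the Gymnasium reset/step signatures.

    The agent walks on a line starting at 0. Action 1 moves right, action 0
    moves left. Every step pays a reward of 1.0, the episode terminates once
    the agent drifts more than `limit` away from the centre, and it is
    truncated after `max_steps` steps.
    """

    def __init__(self, limit=2, max_steps=50):
        self.limit = limit
        self.max_steps = max_steps

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos, {}

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.steps += 1
        terminated = abs(self.pos) > self.limit   # drifted too far from the centre
        truncated = self.steps >= self.max_steps  # time limit reached
        return self.pos, 1.0, terminated, truncated, {}


def mean_episode_return(env, policy, n_episodes=100):
    """Average the summed reward over complete episodes (no rendering)."""
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return mean(returns)


rng = random.Random(0)  # seeded so the random baseline is reproducible
env = ToyWalkEnv()
random_return = mean_episode_return(env, lambda obs: rng.choice([0, 1]))
# A hand-written policy that always steps back toward the centre.
centering_return = mean_episode_return(env, lambda obs: 0 if obs > 0 else 1)
print(f"random: {random_return:.1f}, centering: {centering_return:.1f}")
```

The same function should work with a real environment, assuming it is created without `render_mode="human"` (for example `env = gym.make("CartPole-v1")`) and the random policy is written as `lambda obs: env.action_space.sample()`.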

### Solving Easy Problems with StableBaselines

In [None]:
from stable_baselines3 import PPO

def train_agent(env_name, timesteps=10000, save_model=False):
    """Train a PPO agent with an MLP policy using Stable-Baselines3 and return the model."""
    env = gym.make(env_name)
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=timesteps)
    if save_model:
        model.save(f"{env_name}_ppo")
    env.close()
    return model

In [None]:
policies = {}
for env_name in environments:
    print(f"Training on {env_name}...")
    policies[env_name] = train_agent(env_name)

In [None]:
for env_name in environments:
    play_agent(env_name, policies[env_name])

## The MuJoCo Environment