# Gymnasium Basics

This notebook walks through the basics of the gymnasium package and interface. We start by exploring some simple gyms and policies to solve them, then we take a closer look at our environment of interest, MuJoCo.

## Table of Contents
1. [Basic Gym Environments](#Basic-Gym-Environments)
    - [Random Policies](#Random-Policies)
    - [Solving Easy Problems with StableBaselines](#Solving-Easy-Problems-with-StableBaselines)
3. [The MuJoCo Environment](#The-MuJoCo-Environment)

## Basic Gym Environments

### Random Policies

In [1]:
import gymnasium as gym

In [3]:
def play_agent(env_name: str, policy):
    env = gym.make(env_name, render_mode="human")
    model = PPO.load(f"{env_name}_ppo")
    obs, info = env.reset()

    for _ in range(500): #n_episodes = 500
        if policy not None: #action selection -- either according to passed in policy or random policy
            action, _ = model.predict(obs)
        else:
            action = env.action_space.sample()
    
        obs, reward, terminated, truncated, info = env.step(action) #execute action, get results, check for episode over
        if terminated or truncated:
            obs, info = env.reset()
    
    env.close()

Running CartPole-v1...
The history saving thread hit an unexpected error (OperationalError('attempt to write a readonly database')).History will not be written to the database.
Running MountainCar-v0...
Running Acrobot-v1...
Running LunarLander-v3...


In [None]:
environments = ["CartPole-v1", "MountainCar-v0", "Acrobot-v1", "LunarLander-v3"]
for env_name in environments:
    print(f"Running {env_name}...")
    play_agent(env_name, None)

### Solving Easy Problems with StableBaselines

In [4]:
from stable_baselines3 import PPO

def train_agent(env_name, timesteps=10000, save_model=False):
    env = gym.make(env_name)
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=timesteps)
    if save_model:
        model.save(f"{env_name}_ppo")
    env.close()
    return model

In [5]:
environments = ["CartPole-v1", "MountainCar-v0", "Acrobot-v1", "LunarLander-v2"]
policies = {}
for env_name in environments:
    print(f"Training on {env_name}...")
    policies[env_name] = train_agent(env_name)

Training on CartPole-v1...
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 21.8     |
|    ep_rew_mean     | 21.8     |
| time/              |          |
|    fps             | 1608     |
|    iterations      | 1        |
|    time_elapsed    | 1        |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 26.1        |
|    ep_rew_mean          | 26.1        |
| time/                   |             |
|    fps                  | 991         |
|    iterations           | 2           |
|    time_elapsed         | 4           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.008992441 |
|    clip_fraction        | 0.0963      |
|    clip_range           | 0.2    

  logger.deprecation(


DeprecatedEnv: Environment version v2 for `LunarLander` is deprecated. Please use `LunarLander-v3` instead.

## The MuJoCo Environment