
# **Verifying Gymnasium and MuJoCo Installations with a Comprehensive Testing Script**

Before developing or running reinforcement learning (RL) algorithms, confirm that your **Gymnasium**, **Atari**, and **MuJoCo** installations work. The Python script below runs a short random-policy episode in several Gymnasium environments, both image-based and text-based, to confirm that your setup is functioning as expected.
 
## **Overview**

The script serves as a diagnostic tool to:

- **Initialize** various Gymnasium environments.
- **Capture** frames from image-based environments.
- **Generate** and **display** animated GIFs for visual feedback.
- **Print** state sequences for text-based environments like FrozenLake.
- **Report** reward statistics to assess environment interactions.

By executing this script, you can verify whether your Gymnasium and MuJoCo installations are correctly configured and capable of running different types of environments without errors.

---

## **Environment Variable Configuration**

Before importing any libraries, set the environment variables that dictate how MuJoCo handles rendering. Setting them first prevents rendering-related errors, especially in headless environments such as a JupyterLab session on a remote server with no display attached.

```python
import os

# Set rendering backend for MuJoCo environments
# Options: 'egl', 'glfw', 'osmesa'
os.environ['MUJOCO_GL'] = 'egl'  # Change to 'glfw' or 'osmesa' if 'egl' causes issues
os.environ['PYOPENGL_PLATFORM'] = 'egl'

# Notes:
# - 'egl' is suitable for headless rendering without a display server.
# - 'glfw' is suitable for environments with a display server.
# - 'osmesa' is an alternative for offscreen rendering but may require additional setup.
```

**Key Points:**

- **`MUJOCO_GL`**: Determines the rendering backend MuJoCo uses.
    - **`'egl'`**: Ideal for headless setups where no display server is available.
    - **`'glfw'`**: Suitable when a display server is present.
    - **`'osmesa'`**: Alternative for offscreen rendering; may need extra configuration.
  
- **`PYOPENGL_PLATFORM`**: Specifies the platform PyOpenGL uses. For offscreen rendering, set it to match `MUJOCO_GL` (`'egl'` or `'osmesa'`); with `'glfw'` it can usually be left unset.

**Important:** These environment variables **must be set before** importing any other libraries, especially those related to rendering like Gymnasium and MuJoCo.
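
If the same notebook runs on both desktop and headless machines, the backend can be chosen at startup instead of hard-coded. The following is a minimal sketch, not part of the original script: `choose_mujoco_backend` is a hypothetical helper, and treating a non-empty `DISPLAY` variable as "a display server is available" is an assumption that only holds on X11-based Linux systems.

```python
import os

def choose_mujoco_backend(environ):
    """Pick a MUJOCO_GL value: keep an explicit setting, else guess from DISPLAY."""
    explicit = environ.get("MUJOCO_GL")
    if explicit:
        return explicit
    # Heuristic (assumption): a non-empty DISPLAY means an X server is reachable.
    return "glfw" if environ.get("DISPLAY") else "egl"

backend = choose_mujoco_backend(os.environ)
os.environ["MUJOCO_GL"] = backend
if backend in ("egl", "osmesa"):
    # Only the offscreen backends have a matching PyOpenGL platform name.
    os.environ["PYOPENGL_PLATFORM"] = backend

print(f"Using MuJoCo rendering backend: {backend}")

# Import rendering libraries only after this point, for example:
# import gymnasium as gym
```

Because the helper takes the environment mapping as an argument, it can be checked with a plain dictionary before it touches `os.environ`.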

---

## **Importing Necessary Libraries**

After setting the environment variables, import all required Python libraries. This sequence ensures that the libraries recognize the rendering configurations you've specified.

```python
import gymnasium as gym
import warnings
import numpy as np
import imageio
from IPython.display import Image, display, clear_output
from io import BytesIO
import time

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')
```

**Library Roles:**

- **`gymnasium`**: Core library for RL environments.
- **`warnings`**: Manages warning messages; suppressed here for clarity.
- **`numpy`**: Facilitates numerical operations, particularly for reward calculations.
- **`imageio`**: Creates animated GIFs from captured frames.
- **`IPython.display`**: Displays images and manages output within Jupyter notebooks.
- **`io.BytesIO`**: Handles in-memory byte streams, eliminating the need for disk I/O.
- **`time`**: Manages delays between state displays for animation effect.

---

## **Defining Utility Functions**

The script includes several functions tailored to handle both image-based and text-based environments. Here's a breakdown of each:

### **1. `capture_frames`**

**Purpose:**

- Executes an episode within the specified environment using a **random policy**.
- **Captures** each frame rendered during the episode.
- **Accumulates rewards** to provide performance metrics.

**Functionality:**

1. **Initialization**: Resets the environment with a fixed seed for reproducibility.
2. **Episode Execution**:
    - **Action Selection**: Samples a random action from the environment's action space.
    - **Environment Step**: Applies the action and observes the outcome.
    - **Frame Capture**: Renders and stores the current frame.
    - **Termination Check**: Ends the episode if `terminated` or `truncated` flags are `True`.
3. **Output**: Returns captured frames, total reward, per-step rewards, and termination step.

---

### **2. `generate_gif`**

**Purpose:**

- Converts a sequence of image frames into an **animated GIF**.
- Configures the GIF to **loop infinitely** for continuous playback.

**Functionality:**

1. **Buffer Setup**: Uses an in-memory byte stream (`BytesIO`) to store the GIF.
2. **GIF Creation**: Utilizes `imageio.mimsave` to compile frames into a GIF format.
3. **Loop Configuration**: Sets `loop=0` to enable infinite looping.
4. **Output**: Returns the GIF as bytes, ready for display.

---

### **3. `display_gif`**

**Purpose:**

- Renders the generated animated GIF **inline** within a Jupyter notebook.

**Functionality:**

- Leverages `IPython.display.Image` to display the GIF directly in the notebook cell output.

---

### **4. `display_text_animation`**


**Purpose:**

- Simulates animation for **text-based environments** (e.g., FrozenLake) by sequentially printing each state.
- **Delays** between prints create an illusion of movement.

**Functionality:**

1. **Initialization**: Resets the environment and prints the initial state.
2. **Episode Execution**:
    - **Action Selection**: Samples a random action.
    - **Environment Step**: Applies the action and observes the outcome.
    - **State Display**: Prints the current state without clearing previous outputs (as `clear_output` is commented out).
    - **Termination Check**: Ends the episode if `terminated` or `truncated` flags are `True`.
3. **Output**: Prints total rewards and termination step after the episode concludes.

**Note:** The `clear_output(wait=True)` line is commented out to allow all state prints to appear sequentially without overwriting previous outputs. This helps in tracking the agent's movement through the environment.

---

### **5. `test_env`**


**Purpose:**

- **Unified Testing Function**: Handles both image-based and text-based environments.
- **Automates** the testing process, providing consistent outputs for different environment types.

**Functionality:**

1. **Environment Identification**:
    - Checks the `category` of the environment to determine the rendering approach.
2. **Text-Based Environments (`'toytext'`)**:
    - Initializes with `render_mode="ansi"`.
    - Runs the `display_text_animation` function to print state sequences.
3. **Image-Based Environments**:
    - Initializes with `render_mode="rgb_array"`.
    - Executes `capture_frames` to collect frames during the episode.
    - Generates and displays an animated GIF if frames are captured.
4. **Error Handling**:
    - Catches and prints any exceptions that occur during testing, ensuring that failures in one environment don't halt the entire script.
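
The category check in step 1 boils down to choosing a `render_mode`. A minimal sketch of that decision, using a hypothetical helper `render_mode_for` and a hypothetical `TEXT_CATEGORIES` set that are not part of the original script:

```python
# Categories whose environments are rendered as text (assumption: only 'toytext').
TEXT_CATEGORIES = {"toytext"}

def render_mode_for(category):
    """Return 'ansi' for text-based categories and 'rgb_array' for everything else."""
    return "ansi" if category.lower() in TEXT_CATEGORIES else "rgb_array"

for category in ["mujoco", "atari", "classical_control", "box2d", "toytext"]:
    print(f"{category:>18} -> {render_mode_for(category)}")
```

Keeping the two branches mutually exclusive matters: a text-based environment should be run once with `render_mode="ansi"` and not a second time through the image pipeline.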

---

## **Executing the Environment Tests**

To utilize the defined functions and test your Gymnasium and MuJoCo installations, execute the following steps:

1. **Define the Environments to Test**:
    - Create a list of dictionaries, each containing the `env_id` and its corresponding `category`.
    - **Categories**:
        - `'mujoco'`: Image-based environments requiring MuJoCo.
        - `'atari'`: Image-based environments from the Atari suite.
        - `'classical_control'`: Image-based environments focusing on control tasks.
        - `'box2d'`: Image-based environments using Box2D physics.
        - `'toytext'`: Text-based environments like FrozenLake.
  
2. **Run Tests for Each Environment**:
    - Iterate through the list and call `test_env` for each environment.


**Execution Flow:**

1. **MuJoCo Environment (`Ant-v4`)**:
    - Attempts to initialize and run using MuJoCo rendering.
    - If successful, captures frames and displays an animated GIF.
    - Reports total and average rewards, along with termination step.
  
2. **Atari Environment (`ALE/Breakout-v5`)**:
    - Similar process as MuJoCo environments.
  
3. **Classical Control (`CartPole-v1`)** and **Box2D (`LunarLander-v2`)**:
    - Follows the same testing methodology.
  
4. **Text-Based Environment (`FrozenLake-v1`)**:
    - Initializes with `render_mode="ansi"`.
    - Sequentially prints each state to simulate animation.
    - Reports total rewards and termination step.
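
The reward report is simple arithmetic over the per-step rewards. The sketch below shows one way to compute it in plain Python; `summarize_rewards` is a hypothetical helper, and returning a mean of `0.0` for an empty episode is a design choice made here (NumPy's `np.mean` returns `nan` with a warning for an empty list).

```python
def summarize_rewards(step_rewards):
    """Return the step count, total reward, and mean reward per step."""
    steps = len(step_rewards)
    total = float(sum(step_rewards))
    # Guard against division by zero when no step was executed.
    mean = total / steps if steps else 0.0
    return {"steps": steps, "total": total, "mean": mean}

summary = summarize_rewards([1.0, 1.0, 0.0, 2.0])
print(f"Total Reward: {summary['total']:.2f}")
print(f"Average Reward per Step: {summary['mean']:.2f}")
print(f"Steps: {summary['steps']}")
```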

---

## **Understanding the State Sequence in FrozenLake**

When testing text-based environments like **FrozenLake-v1**, the script prints the state grid at each step, allowing you to visualize the agent's movement. Here's an example of what you might observe:

```
SFFF
FHFH
FFFH
HFFG

(Left)
SFFF
FHFH
FFFH
HFFG

(Up)
SFFF
FHFH
FFFH
HFFG

(Left)
SFFF
FHFH
FFFH
HFFG

...
```

**Interpretation:**

- **Grid Symbols**:
    - **S**: Start position.
    - **F**: Frozen tile (safe to walk on).
    - **H**: Hole (falling into these ends the episode).
    - **G**: Goal.
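    - The grid letters themselves never change. The `ansi` renderer marks the agent's current tile with a highlighted background (ANSI escape codes), which Jupyter shows as color but which is lost when the output is copied as plain text, as in the example above.

- **Agent's Movement**:
    - The line in parentheses (e.g., **Left**, **Up**) is the action the agent selected.
    - FrozenLake is slippery by default (`is_slippery=True`), so the actual movement might differ from the intended action.
    - The agent's new position is shown by the highlighted tile in the subsequent state grid.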
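
To make the stochasticity concrete: with the default `is_slippery=True`, the FrozenLake documentation describes the agent as moving in the intended direction or in one of the two perpendicular directions, each with probability 1/3. The sketch below models only that rule; it does not use Gymnasium, the helper names are hypothetical, and other Gymnasium versions or settings may use different probabilities.

```python
import random

# FrozenLake action encoding: 0 = Left, 1 = Down, 2 = Right, 3 = Up.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
ACTION_NAMES = {LEFT: "Left", DOWN: "Down", RIGHT: "Right", UP: "Up"}

def possible_moves(intended):
    """Directions a slippery step can take: the intended one and its two perpendiculars."""
    return [(intended - 1) % 4, intended, (intended + 1) % 4]

def sample_move(intended, rng):
    """Pick one of the possible directions uniformly at random."""
    return rng.choice(possible_moves(intended))

rng = random.Random(0)
for _ in range(5):
    actual = sample_move(LEFT, rng)
    print(f"intended: {ACTION_NAMES[LEFT]:<5} actual: {ACTION_NAMES[actual]}")
```

This is why a printed action such as `(Left)` may be followed by a grid in which the agent has moved up or down instead.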



---

## **Complete Script**

```python
import os

# Set rendering backend for MuJoCo environments
# Options: 'egl', 'glfw', 'osmesa'
os.environ['MUJOCO_GL'] = 'egl'  # You can change this if 'egl' causes issues
os.environ['PYOPENGL_PLATFORM'] = 'egl'

# Note:
# - 'egl' is suitable for headless rendering.
# - 'glfw' is suitable for environments with a display server.
# - 'osmesa' is an alternative for offscreen rendering but may require additional setup.

import gymnasium as gym
import warnings
import numpy as np
import imageio
from IPython.display import Image, display, clear_output
from io import BytesIO
import time

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')
```

```python
def capture_frames(env, max_steps=1000):
    """
    Runs an episode using a random policy and captures the frames.

    Args:
        env (gym.Env): The Gymnasium environment.
        max_steps (int): Maximum number of steps to run.

    Returns:
        frames (list): List of frames captured during the episode.
        total_reward (float): Total reward accumulated.
        step_rewards (list): List of rewards per step.
        termination_step (int): Last step executed; equals max_steps if the
            episode did not end earlier.
    """
    frames = []
    total_reward = 0
    step_rewards = []
    termination_step = 0

    obs, info = env.reset(seed=42)

    for step in range(max_steps):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        step_rewards.append(reward)
        # Track the last executed step so the report is meaningful
        # even when the episode outlasts max_steps
        termination_step = step + 1

        # Render the frame and append to frames list
        try:
            frame = env.render()
            if frame is not None:
                frames.append(frame)
        except Exception as render_exception:
            print(f"Rendering failed at step {step + 1}: {render_exception}")
            break

        if terminated or truncated:
            break

    return frames, total_reward, step_rewards, termination_step
```

```python
def generate_gif(frames, fps=30):
    """
    Generates an animated GIF from a list of frames with infinite looping.

    Args:
        frames (list): List of frames (as NumPy arrays).
        fps (int): Frames per second for the GIF.

    Returns:
        gif_bytes (bytes): The GIF image in bytes.
    """
    with BytesIO() as buffer:
        # 'loop=0' ensures the GIF loops infinitely
        imageio.mimsave(buffer, frames, format='GIF', fps=fps, loop=0)
        gif_bytes = buffer.getvalue()
    return gif_bytes

def display_gif(gif_bytes):
    """
    Displays an animated GIF in the Jupyter notebook with infinite looping.

    Args:
        gif_bytes (bytes): The GIF image in bytes.
    """
    display(Image(data=gif_bytes))
```

```python
def display_text_animation(env, max_steps=1000, delay=0.5):
    """
    Displays a text-based environment's states sequentially to simulate animation.

    Args:
        env (gym.Env): The Gymnasium environment.
        max_steps (int): Maximum number of steps to run.
        delay (float): Delay in seconds between frames.
    """
    obs, info = env.reset(seed=42)
    initial_state = env.render()
    print(initial_state)

    total_reward = 0
    steps_taken = 0  # Stays 0 if max_steps is 0, so the report below never fails

    for step in range(max_steps):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps_taken = step + 1

        # Uncomment to overwrite the previous state instead of appending to it
        # clear_output(wait=True)

        # Render the current state
        state = env.render()
        print(state)

        if terminated or truncated:
            break

        time.sleep(delay)

    # Display total reward and the last executed step (reported once)
    print(f"**Total Reward**: {total_reward:.2f}")
    print(f"**Episode terminated at step**: {steps_taken}.\n")
```

```python
def test_env(env_id, category, max_steps=1000):
    """
    Generalized function to test Gymnasium environments and display animations inline.

    Args:
        env_id (str): The Gymnasium environment ID.
        category (str): The category of the environment (e.g., 'mujoco', 'atari', 'toytext').
        max_steps (int): Maximum number of steps to run.
    """
    print(f"### Testing {category.capitalize()} Environment: {env_id}")
    try:
        if category.lower() == 'toytext':
            # Handle text-based environments differently
            env = gym.make(env_id, render_mode="ansi")
            print(f"**{env_id}** environment initialized successfully.\n")
            display_text_animation(env, max_steps=max_steps)
            env.close()
            # Stop here so the text-based environment is not run a second
            # time through the image-based path below
            return

        # Handle image-based environments
        env = gym.make(env_id, render_mode="rgb_array")
        print(f"**{env_id}** environment initialized successfully.\n")

        # Capture frames and run the episode
        frames, total_reward, step_rewards, termination_step = capture_frames(env, max_steps)

        env.close()
        print(f"**Completed** testing {env_id} environment.")
        print(f"**Total Reward**: {total_reward:.2f}")
        print(f"**Average Reward per Step**: {np.mean(step_rewards):.2f}")
        print(f"**Episode terminated at step**: {termination_step}.\n")

        if frames:
            # Generate and display the animated GIF
            gif_bytes = generate_gif(frames)
            print("**Displaying the trajectory animation:**\n")
            display_gif(gif_bytes)
        else:
            print("No frames captured; skipping animation display.\n")

    except Exception as e:
        print(f"**Error** testing {env_id} environment: {e}\n")
```



```python
environments = [
    {'env_id': 'Ant-v4', 'category': 'mujoco'},
    {'env_id': 'ALE/Breakout-v5', 'category': 'atari'},
    {'env_id': 'CartPole-v1', 'category': 'classical_control'},
    # Newer Gymnasium releases (1.0+) may require 'LunarLander-v3' instead
    {'env_id': 'LunarLander-v2', 'category': 'box2d'},
    {'env_id': 'FrozenLake-v1', 'category': 'toytext'},  # Added FrozenLake
]

for env in environments:
    test_env(env['env_id'], env['category'], max_steps=100)
```

### **States in FrozenLake**

**Link**: https://gymnasium.farama.org/environments/toy_text/frozen_lake/

Each character represents a tile in the grid:

- S: Start position.
- F: Frozen tile.
- H: Hole.
- G: Goal.
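
Because the plain-text grid does not show where the agent is, it can help to mark the position explicitly. FrozenLake reports the observation as a single integer, `current_row * ncols + current_col`, so the position can be recovered with integer division. The sketch below is illustrative only: `state_to_row_col` and `grid_with_agent` are hypothetical helpers, and `DEFAULT_MAP` is the 4x4 layout shown in the example output above.

```python
# The default 4x4 layout, as printed by the script.
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]

def state_to_row_col(state, ncols):
    """Convert a FrozenLake observation (row * ncols + col) to (row, col)."""
    return divmod(state, ncols)

def grid_with_agent(state, desc=DEFAULT_MAP):
    """Return the grid as text with the agent's tile replaced by 'A'."""
    row, col = state_to_row_col(state, len(desc[0]))
    lines = []
    for r, line in enumerate(desc):
        if r == row:
            line = line[:col] + "A" + line[col + 1:]
        lines.append(line)
    return "\n".join(lines)

# Observation 4 is row 1, column 0: the tile directly below the start.
print(grid_with_agent(4))
```

Inside `display_text_animation`, the value to pass would be the `obs` returned by `env.step(action)`.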