This notebook evaluates the performance of a random agent on four MiniWorld environments. It also reports manually obtained optimal episode lengths, following the procedure described in the paper.

In [7]:
import numpy as np
import miniworld  # importing registers the MiniWorld environments with Gymnasium
import gymnasium as gym
from tqdm.notebook import tqdm

# Episode lengths and rewards generated by a random agent

In [2]:
environments = ["MiniWorld-StarMazeArm-v0",
                "MiniWorld-StarMazeRandom-v0",
                "MiniWorld-WallGap-v0",
                "MiniWorld-FourColoredRooms-v0"]

In [22]:
num_episodes = 100

for env_name in environments:
    print(env_name)
    env = gym.make(env_name, view="agent", render_mode="rgb_array")
    episode_lengths = []
    rewards = []
    for _ in tqdm(range(num_episodes)):
        counter = 0   # steps taken in this episode
        reward = 0    # cumulative reward of this episode
        done = False
        env.reset()
        # Sample uniformly random actions until the episode terminates or is truncated
        while not done:
            obs, rew, terminated, truncated, info = env.step(env.action_space.sample())
            done = terminated or truncated
            counter += 1
            reward += rew
        rewards.append(reward)
        episode_lengths.append(counter)
    env.close()  # release the environment's rendering resources
    print(f"Episode lengths - Mean: {np.mean(episode_lengths)}, Min: {np.min(episode_lengths)}, Max: {np.max(episode_lengths)}")
    print(f"Rewards - Mean: {np.mean(rewards):.3f}, Min: {np.min(rewards):.3f}, Max: {np.max(rewards):.3f}")
    print("\n\n")

MiniWorld-StarMazeArm-v0



Episode lengths - Mean: 1133.67, Min: 53, Max: 1500
Rewards - Mean: 0.329, Min: 0.000, Max: 0.993



MiniWorld-StarMazeRandom-v0



Episode lengths - Mean: 1072.97, Min: 1, Max: 1500
Rewards - Mean: 0.361, Min: 0.000, Max: 1.000



MiniWorld-WallGap-v0



Episode lengths - Mean: 300.0, Min: 300, Max: 300
Rewards - Mean: 0.000, Min: 0.000, Max: 0.000



MiniWorld-FourColoredRooms-v0



Episode lengths - Mean: 231.03, Min: 9, Max: 250
Rewards - Mean: 0.111, Min: 0.000, Max: 0.993
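
The rollout loop above can also be factored into a reusable function with optional seeding, which would make the random baseline reproducible. This is a minimal sketch, not the code that produced the numbers above: `run_random_episode` is a hypothetical helper name, and it assumes the standard Gymnasium interface (`reset(seed=...)`, `action_space.seed(...)`, and a five-element tuple returned by `step`).

```python
def run_random_episode(env, seed=None):
    """Roll out one episode with uniformly sampled actions.

    Returns (episode_length, total_reward). Hypothetical helper,
    not part of MiniWorld or Gymnasium.
    """
    if seed is not None:
        # reset(seed=...) seeds the environment only, so the action
        # sampler is seeded separately.
        env.action_space.seed(seed)
    env.reset(seed=seed)

    steps = 0
    total_reward = 0.0
    done = False
    while not done:
        _, rew, terminated, truncated, _ = env.step(env.action_space.sample())
        done = terminated or truncated
        steps += 1
        total_reward += rew
    return steps, total_reward
```

Calling `run_random_episode(env, seed=i)` with a different `i` per episode should give distinct but repeatable rollouts; reusing one seed for every episode would replay the same rollout.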





# Human-generated optimal episode lengths and rewards

Each environment was played by a human for 10 episodes, using the top view and walking directly to the goal position.

### WallGap:

Episode lengths: [78, 61, 99, 70, 90, 56, 83, 57, 80, 82] | **Mean = 75.6**, Min = 56, Max = 99

Rewards: [0.95, 0.96, 0.93, 0.95, 0.94, 0.96, 0.94, 0.96, 0.95, 0.95] | Mean = 0.949, Min = 0.93, Max = 0.96

### FourColoredRooms:

Episode lengths: [16, 80, 51, 61, 13, 18, 71, 98, 68, 54] | **Mean = 53**, Min = 13, Max = 98

Rewards: [0.99, 0.94, 0.96, 0.95, 0.99, 0.99, 0.94, 0.92, 0.95, 0.96] | Mean = 0.959, Min = 0.92, Max = 0.99

### StarMazeArm:

Episode lengths: [42, 28, 39, 33, 38, 27, 36, 24, 43, 45] | **Mean = 35.5**, Min = 24, Max = 45

Rewards: [0.99, 1, 0.99, 1, 0.99, 1, 1, 1, 0.99, 0.99] | Mean = 0.995, Min = 0.99, Max = 1

### StarMazeRandom:

Episode lengths: [11, 19, 25, 18, 23, 15, 18, 14, 13, 8] | **Mean = 16.4**, Min = 8, Max = 25

Rewards: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] | Mean = 1, Min = 1, Max = 1
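
The rewards listed above appear to follow directly from the episode lengths. The sketch below checks them against a time-penalized reward of the form `1 - 0.2 * steps / max_steps`. Both the formula and the `0.2` coefficient are assumptions about how MiniWorld scores reaching the goal, not something stated in this notebook. The step budgets (300, 250, 1500) are taken from the maximum episode lengths of the random agent above. `expected_reward` is a hypothetical helper name.

```python
def expected_reward(steps, max_steps, penalty=0.2):
    # Assumed reward shape: full reward of 1, minus a penalty that
    # grows linearly with the fraction of the step budget used.
    return 1.0 - penalty * steps / max_steps

# Step budgets, read off the random agent's maximum episode lengths.
max_steps = {"WallGap": 300, "FourColoredRooms": 250,
             "StarMazeArm": 1500, "StarMazeRandom": 1500}

lengths = {
    "WallGap": [78, 61, 99, 70, 90, 56, 83, 57, 80, 82],
    "FourColoredRooms": [16, 80, 51, 61, 13, 18, 71, 98, 68, 54],
    "StarMazeArm": [42, 28, 39, 33, 38, 27, 36, 24, 43, 45],
    "StarMazeRandom": [11, 19, 25, 18, 23, 15, 18, 14, 13, 8],
}
rewards = {
    "WallGap": [0.95, 0.96, 0.93, 0.95, 0.94, 0.96, 0.94, 0.96, 0.95, 0.95],
    "FourColoredRooms": [0.99, 0.94, 0.96, 0.95, 0.99, 0.99, 0.94, 0.92, 0.95, 0.96],
    "StarMazeArm": [0.99, 1, 0.99, 1, 0.99, 1, 1, 1, 0.99, 0.99],
    "StarMazeRandom": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}

# For each environment, compare the recorded rewards with the
# prediction rounded to two decimals.
matches = {}
for name, steps_list in lengths.items():
    predicted = [round(expected_reward(n, max_steps[name]), 2) for n in steps_list]
    matches[name] = all(abs(p - r) < 1e-9 for p, r in zip(predicted, rewards[name]))
    print(name, "consistent with assumed formula:", matches[name])
```

If the assumption holds, episode length and reward carry the same information here, which would explain why the StarMaze rewards sit so close to 1: with a 1500-step budget, a few dozen steps barely reduce the reward.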


A small helper to quickly compute the mean, min, and max of a list:

In [21]:
lst = [11, 19, 25, 18, 23, 15, 18, 14, 13, 8]
print(np.mean(lst))
print(np.min(lst))
print(np.max(lst))

16.4
8
25
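
To avoid editing `lst` by hand for each environment, the same computation can be applied to all four episode-length lists at once. This is a small sketch using only the standard library. `summarize` is a hypothetical helper name, and the data are the human-generated episode lengths listed above.

```python
from statistics import mean

def summarize(values):
    # Same three statistics as the helper cell: mean, min, and max.
    return {"mean": mean(values), "min": min(values), "max": max(values)}

episode_lengths = {
    "WallGap": [78, 61, 99, 70, 90, 56, 83, 57, 80, 82],
    "FourColoredRooms": [16, 80, 51, 61, 13, 18, 71, 98, 68, 54],
    "StarMazeArm": [42, 28, 39, 33, 38, 27, 36, 24, 43, 45],
    "StarMazeRandom": [11, 19, 25, 18, 23, 15, 18, 14, 13, 8],
}

summary = {name: summarize(values) for name, values in episode_lengths.items()}
for name, stats in summary.items():
    print(f"{name}: Mean = {stats['mean']:.1f}, Min = {stats['min']}, Max = {stats['max']}")
```

The same function works on the reward lists, which is a quick way to cross-check the reported minima and maxima.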
