# Number Grid Env

Number Grid Env is an example environment for navigating ASCII worlds. It asks a series of questions about an
ASCII room containing a player (denoted by @) and a set of objects. Objects are denoted by numbers during training;
evaluation adds an out-of-distribution (OOD) test in which objects are denoted by letters.

In [1]:
# setup
from example_env import NumberGridEnv, create_example_grid



In [2]:
# Display some grids
q, a = create_example_grid("easy", False, 0, include_q_prompt=False)
print(q)
print(a)
print("--------------------------------")
q, a = create_example_grid("medium", False, 0, include_q_prompt=False)
print(q)
print(a)
print("--------------------------------")
q, a = create_example_grid("hard", False, 0, include_q_prompt=False)
print(q)
print(a)
print("--------------------------------")



Your position is (3,1).
Map:
#######
#   0 #
#   @ #
#  4  #
#     #
#     #
#######
Question:
What is the product of the coordinates of you and 4? Answer in (x,y) format.
\boxed{(6, 2)}
--------------------------------
Your position is (3,4).
Map:
#########
#       #
#   7   #
#    4  #
#       #
#   @   #
#  0    #
#       #
#########
Question:
What is the (nearest integer) Euclidean distance between you and 0?
\boxed{1})
--------------------------------
Your position is (8,2).
Map:
###########
#   03    #
#       4 #
#        @#
# 6       #
#         #
#   5    2#
#         #
#         #
# 7       #
###########
Question:
What is the (nearest integer) Euclidean distance between you and 6?
\boxed{7})
--------------------------------


## Why Number Grid?
Number Grid is a way to instill spatial knowledge in an LLM without requiring visual capabilities.

By asking simple questions (where are you, where is X, what is the distance between you and X?) about randomized
rooms and objects, we can begin to build spatial knowledge into the language model.
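To make the question types concrete, here is a minimal sketch of how the answers for the "easy" sample above can be recomputed from the map text. The helper `find_positions` is hypothetical and not part of `example_env`; it assumes coordinates are 0-indexed and exclude the outer wall, which is consistent with the sample outputs shown earlier.

```python
import math


def find_positions(map_text):
    """Return {symbol: (x, y)} for every cell that is not a wall or a space.

    Assumption: (0, 0) is the top-left cell inside the wall, so one is
    subtracted from both the column and the row index.
    """
    positions = {}
    rows = map_text.strip("\n").split("\n")
    for row_idx, row in enumerate(rows):
        for col_idx, ch in enumerate(row):
            if ch not in "# ":
                positions[ch] = (col_idx - 1, row_idx - 1)
    return positions


# The "easy" map from the output above.
EASY_MAP = """
#######
#   0 #
#   @ #
#  4  #
#     #
#     #
#######
"""

pos = find_positions(EASY_MAP)
player, target = pos["@"], pos["4"]

# Element-wise product of the two coordinates.
product = (player[0] * target[0], player[1] * target[1])

# Euclidean distance rounded to the nearest integer.
distance = round(math.dist(player, target))

print(player, target, product, distance)
```

With this convention the player sits at (3, 1) and object 4 at (2, 2), so the coordinate product is (6, 2), matching the boxed answer in the sample.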

There are many downstream use cases for the skills this environment can teach (e.g. playing roguelike games and
navigating complex tables). Because the environment is procedurally generated, we can easily sample a wide space without repetition.
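As an illustration of procedural generation, the sketch below builds a random walled room from a seed. It is not the generator used by `create_example_grid`; the function `make_room` and its parameters are assumptions made for this example only.

```python
import random


def make_room(size, num_objects, seed):
    """Build a hypothetical size x size room surrounded by '#' walls.

    Returns the player's (x, y) position and the map as a string.
    Objects are distinct single digits, so num_objects must be <= 10.
    """
    rng = random.Random(seed)  # seeded so the same room can be rebuilt
    cells = [(x, y) for y in range(size) for x in range(size)]
    chosen = rng.sample(cells, num_objects + 1)  # distinct cells
    digits = rng.sample(list("0123456789"), num_objects)

    grid = [[" "] * size for _ in range(size)]
    px, py = chosen[0]
    grid[py][px] = "@"
    for digit, (x, y) in zip(digits, chosen[1:]):
        grid[y][x] = digit

    border = "#" * (size + 2)
    lines = [border] + ["#" + "".join(row) + "#" for row in grid] + [border]
    return (px, py), "\n".join(lines)


player, room = make_room(size=5, num_objects=2, seed=0)
print(f"Your position is ({player[0]},{player[1]}).")
print(room)
```

Seeding the generator keeps each room reproducible, while varying the seed, room size, and object count yields a large space of distinct rooms.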

In [3]:
# Looks good, let's get an evaluation going...
import asyncio
env_config, server_configs = NumberGridEnv.config_init()
env = NumberGridEnv(
    env_config,
    server_configs,
    slurm=False,
)
await env.evaluate()
print(env.eval_metrics)


100%|██████████| 80/80 [00:09<00:00,  8.72it/s]

[('easy', 0.55), ('medium', 0.3), ('hard', 0.15), ('ood', 0.35)]





## Current State

As the evaluation above shows, there is considerable room for improvement in these models.
"hard" environments score only 15% with gpt-4.1-mini, and these are 7x7 or 9x9 rooms, not truly difficult ASCII maps.

Since accuracy falls off gradually with difficulty (0.55 easy, 0.3 medium, 0.15 hard) rather than collapsing, it appears
likely that we can teach the model how to navigate an ASCII map and collect (x,y) information.
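For readers who want to see how per-difficulty accuracies like those above could be computed, here is a minimal sketch. It is an assumption about the scoring, not the logic in `example_env`: it takes the last `\boxed{...}` span in a completion, ignores spaces, and compares it with the reference answer. The names `extract_boxed` and `accuracy_by_difficulty` are hypothetical.

```python
import re
from collections import defaultdict

# Matches the contents of \boxed{...} (no nested braces expected).
BOXED = re.compile(r"\\boxed\{([^}]*)\}")


def extract_boxed(text):
    """Return the last boxed answer with spaces removed, or None."""
    matches = BOXED.findall(text)
    return matches[-1].replace(" ", "") if matches else None


def accuracy_by_difficulty(records):
    """records: iterable of (difficulty, completion, reference) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for difficulty, completion, reference in records:
        totals[difficulty] += 1
        predicted = extract_boxed(completion)
        if predicted is not None and predicted == extract_boxed(reference):
            correct[difficulty] += 1
    return {d: correct[d] / totals[d] for d in totals}


# Made-up completions, used only to exercise the functions.
records = [
    ("easy", r"The answer is \boxed{(6,2)}", r"\boxed{(6, 2)}"),
    ("easy", "I am not sure.", r"\boxed{1}"),
    ("hard", r"\boxed{7}", r"\boxed{7})"),
]
print(accuracy_by_difficulty(records))
```

Exact matching after whitespace removal is the simplest choice; a real scorer may be stricter or more lenient about formatting.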

In [4]:
# Process command to show we can execute the main env loop and collect data.
# The data is saved to data/number_grid.jsonl, so we can also take a quick look at it.
!python example_env.py process

BaseEnvConfigWithDefaults(
    group_size=8,
    max_num_workers=-1,
    max_eval_workers=16,
    max_num_workers_per_node=8,
    steps_per_eval=100,
    max_token_length=2048,
    eval_handling=<EvalHandlingEnum.STOP_TRAIN: 'STOP_TRAIN'>,
    eval_limit_ratio=0.5,
    eval_on_start=False,
    inference_weight=1.0,
    batch_size=-1,
    max_batches_offpolicy=3,
    tokenizer_name='NousResearch/DeepHermes-3-Llama-3-3B-Preview',
    use_wandb=True,
    rollout_server_url='http://localhost:8000',
    total_steps=2,
    wandb_name=None,
    num_rollouts_to_keep=32,
    num_rollouts_per_group_for_logging=1,
    ensure_scores_are_not_same=False,
    data_path_to_save_groups='data/number_grid.jsonl',
    min_items_sent_before_logging=2,
    include_messages=True
)
[
    APIServerConfig(
        timeout=1200,
        num_max_requests_at_once=512,
        num_requests_for_eval=64,
        model_name='gpt-4.1-nano',
        rolling_buffer_length=1000,
        server_type='openai',
        api_k

wandb: Currently logged in as: dmahan93 to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.10
wandb: Run data is saved locally in c:\Users\dmaha\vscode_projects\atropos\examples\wandb\run-20250516_150016-ospuf97m
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run number_grid-2025-05-16-jagoqv
wandb:  View project at https://wandb.ai/dmahan93/atropos
wandb:  View run at https://wandb.ai/dmahan93/atropos/runs/ospuf97m
An unexpected error occurred during processing: 'list' object has no attribute 'strip'
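Once a `process` run completes successfully, the saved groups can be inspected with a few lines of Python. The sketch below assumes only that the file at `data_path_to_save_groups` holds one JSON value per line; the record schema is not shown in this notebook, so the code just reports the top-level keys. `load_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list; return [] if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    records = []
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


groups = load_jsonl("data/number_grid.jsonl")
print(f"{len(groups)} record(s) loaded")
if groups and isinstance(groups[0], dict):
    # The schema is an unknown here, so only list the top-level keys.
    print(sorted(groups[0].keys()))
```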
