This is the vectorized version of the environment. How does it work? Simply put, instead of running a single environment, we run several of them as a batch (vectorization). At each time step, instead of choosing a single action $a$, we define a vector $a = [a_1, \dots, a_{n_\text{envs}}]$ in which entry $a_i$ is the action to perform in environment $i$.

Thus, if you run 4 simultaneous environments, your observation space becomes (4, num_observations). Because we use a step size of 10 and have a total of 9 sensors, each step returns up to 10 rows of 9 values per environment, for a total of (40, 9) elements across the 4 environments. The core idea is to speed up inference and training by querying several environments at once instead of one at a time.
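
To make these shapes concrete, here is a minimal NumPy sketch that uses random stand-in data rather than the real environment. It assumes that each environment returns one array of shape (step_size, 9) per step; the names `num_envs`, `step_size`, and `num_sensors` are illustrative and not part of the client API.

```python
import numpy as np

num_envs, step_size, num_sensors = 4, 10, 9

# Stand-in for one step's observations:
# one (step_size, num_sensors) array per environment.
rng = np.random.default_rng(0)
per_env_obs = [
    rng.normal(size=(step_size, num_sensors)).astype(np.float32)
    for _ in range(num_envs)
]

# Stack into a single batch, keeping the environment axis.
batch = np.stack(per_env_obs)           # shape (4, 10, 9)

# Merge the environment and time axes: one row per sensor reading.
rows = batch.reshape(-1, num_sensors)   # shape (40, 9)

print(batch.shape, rows.shape)
```

Note that `np.stack` only works when every environment returns the same number of rows, which may not hold on the last step of an episode.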

In our implementation, whichever model you choose should take the observations as input, reshaped however you like, and predict the actions $a$ from your policy.
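
As an illustration of the observation-to-action mapping, the sketch below uses a hypothetical linear policy. It assumes 9 sensor values per environment and 3 discrete actions (the integers 0, 1, 2); `linear_policy` and `weights` are made-up names for this example, not part of the client.

```python
import numpy as np

NUM_SENSORS = 9
NUM_ACTIONS = 3  # assumption: actions are the integers 0, 1, 2


def linear_policy(observations, weights):
    """Return one action per environment from a batch of observations.

    observations: array of shape (num_envs, NUM_SENSORS)
    weights:      array of shape (NUM_SENSORS, NUM_ACTIONS)
    """
    scores = observations @ weights   # (num_envs, NUM_ACTIONS)
    return scores.argmax(axis=1)      # (num_envs,)


rng = np.random.default_rng(0)
weights = rng.normal(size=(NUM_SENSORS, NUM_ACTIONS))
observations = rng.normal(size=(4, NUM_SENSORS))  # stand-in for 4 environments

actions = linear_policy(observations, weights)
print(actions.shape)
```

The returned array has one entry per environment, which is the layout the action vector $a$ described above requires. In practice you would likely normalize the observations first, since the sensor values span very different scales.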

In [None]:
%load_ext autoreload
%autoreload 2

In [33]:
import numpy as np
from student_client.student_gym_env_vectorized import create_student_gym_env_vectorized

env = create_student_gym_env_vectorized(
    num_envs=1,
    step_size=10,
    user_token='your_token'
)

2026-02-13 11:32:32,112 - httpx - INFO - HTTP Request: GET http://127.0.0.1:8001/api/v1/version "HTTP/1.1 200 OK"
2026-02-13 11:32:32,113 - student_client.student_gym_env_vectorized - INFO - Client is up to date (version 0.1)
2026-02-13 11:32:32,119 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/session/create "HTTP/1.1 200 OK"
2026-02-13 11:32:32,120 - student_client.student_gym_env_vectorized - INFO - Created new session: c7432987-45ed-48ee-9529-206ecc0e88b2
2026-02-13 11:32:33,247 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/create "HTTP/1.1 200 OK"
2026-02-13 11:32:33,248 - student_client.student_gym_env_vectorized - INFO - Created new episode 1/1: 982aca35-ecfe-49fb-8b1e-fae4afcc3e13
2026-02-13 11:32:33,249 - student_client.student_gym_env_vectorized - INFO - StudentGymEnvVectorized initialized with 1 environments
2026-02-13 11:32:33,249 - student_client.student_gym_env_vectorized - INFO - Episode IDs: ['982aca35-ecfe-49fb-8b1e-fae4afc

In [34]:
print(f"Environment created with {env.num_envs} parallel environments")
print(f"   Episode IDs: {env.episode_ids}")

# Reset all environments
print(f"\n🔄 Resetting all environments...")
observations, infos = env.reset()

print(f"   Observations shape: {observations.shape}")
print(f"   First observation: {observations[0]}")

Environment created with 1 parallel environments
   Episode IDs: ['982aca35-ecfe-49fb-8b1e-fae4afcc3e13']

🔄 Resetting all environments...


2026-02-13 11:32:34,207 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_reset "HTTP/1.1 200 OK"
2026-02-13 11:32:34,208 - student_client.student_gym_env_vectorized - INFO - All 1 environments reset successfully


   Observations shape: (1, 9)
   First observation: [7.9856232e+02 1.9416766e+04 3.3628339e+02 1.1219095e+03 3.7247080e-01
 1.3702285e+06 3.9606172e+03 0.0000000e+00 1.0236157e+01]


## Training / Iterations

Here you can iterate through the vectorized environments. Note that the actions form a vector in which each entry is the action for the environment at the same index.

In step A), we automatically reset the environments that have terminated, so you can keep stepping for an indefinite number of steps. Because episodes do not all have the same length, they stop at different times; resetting terminated episodes on the fly keeps every environment active.

Tips:
- Each step returns up to `step_size` observations. Should you feed them to your model one by one, or as the full `step_size=10` window? The choice is yours!
- There are multiple ways of exploring the dataset.
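
The two options from the first tip can be sketched as follows. This is only an illustration on stand-in data. It assumes each environment returns a window of shape (step_size, 9) and that the last window of an episode may be shorter; `pad_window` is a hypothetical helper, and repeating the last row is just one possible padding choice.

```python
import numpy as np

STEP_SIZE, NUM_SENSORS = 10, 9


def pad_window(obs, step_size=STEP_SIZE):
    """Pad a (t, NUM_SENSORS) window to (step_size, NUM_SENSORS) by repeating its last row."""
    obs = np.atleast_2d(obs)
    missing = step_size - obs.shape[0]
    if missing <= 0:
        return obs[-step_size:]
    padding = np.repeat(obs[-1:], missing, axis=0)
    return np.concatenate([obs, padding], axis=0)


# Stand-in windows: a full one and a short one (e.g. the end of an episode).
full = np.arange(STEP_SIZE * NUM_SENSORS, dtype=np.float32).reshape(STEP_SIZE, NUM_SENSORS)
short = full[:1]

# Option 1: one model input per time step (here, simply the latest row).
latest_row = full[-1]                        # shape (9,)

# Option 2: one model input per window, flattened to a fixed-length vector.
flat_full = pad_window(full).reshape(-1)     # shape (90,)
flat_short = pad_window(short).reshape(-1)   # shape (90,)

print(latest_row.shape, flat_full.shape, flat_short.shape)
```

Padding keeps the input size fixed for models that expect a constant-length vector; a sequence model could instead consume the rows one at a time without padding.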

In [80]:
for step in range(40):

    # A) Check if any environments terminated
    terminated_envs = env.get_terminated_env_indices()
    if terminated_envs:
        print(f"   ⚠️  Environments {terminated_envs} terminated")
        reset_obs, reset_infos = env.reset_specific_envs(terminated_envs)
        for i, env_id in enumerate(terminated_envs):
            infos[env_id] = reset_infos[i] # reset previous info dict

    # Generate random actions for all environments
    actions = np.random.randint(0, 3, size=env.num_envs)

    print(f"\n   Step {step + 1}:")
    print(f"      Actions: {actions}")

    # Take step
    observations, rewards, terminateds, truncateds, infos = env.step(actions)

    print(f"      Rewards: {rewards}")
    print(f"      Terminated: {terminateds}")
    print(f"      Obs: {observations[0].shape}") # observation shape of first environment
    print(f"      Active environments: {env.get_active_count()}/{env.num_envs}")

    # Show filtered info (production mode)
    for i, info in enumerate(infos):
        print(f"      Env {i} info: {info}")

env.close()

   ⚠️  Environments [0] terminated


2026-02-13 11:37:21,747 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_reset "HTTP/1.1 200 OK"



   Step 1:
      Actions: [0]


2026-02-13 11:37:23,284 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"


      Rewards: [448.59598]
      Terminated: [False]
      Obs: (10, 9)
      Active environments: 1/1
      Env 0 info: {'step': 9, 'terminated': False, 'truncated': False}

   Step 2:
      Actions: [1]


2026-02-13 11:37:24,626 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"


      Rewards: [-556.77844]
      Terminated: [False]
      Obs: (10, 9)
      Active environments: 1/1
      Env 0 info: {'step': 19, 'terminated': False, 'truncated': False}

   Step 3:
      Actions: [0]


2026-02-13 11:37:25,898 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"


      Rewards: [422.15576]
      Terminated: [False]
      Obs: (10, 9)
      Active environments: 1/1
      Env 0 info: {'step': 29, 'terminated': False, 'truncated': False}

   Step 4:
      Actions: [0]


2026-02-13 11:37:27,280 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"


      Rewards: [349.74756]
      Terminated: [False]
      Obs: (10, 9)
      Active environments: 1/1
      Env 0 info: {'step': 39, 'terminated': False, 'truncated': False}

   Step 5:
      Actions: [1]


2026-02-13 11:37:28,822 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"


      Rewards: [-533.50397]
      Terminated: [False]
      Obs: (10, 9)
      Active environments: 1/1
      Env 0 info: {'step': 49, 'terminated': False, 'truncated': False}

   Step 6:
      Actions: [1]


2026-02-13 11:37:30,095 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"
2026-02-13 11:37:30,099 - httpx - INFO - HTTP Request: POST http://127.0.0.1:8001/api/v1/episode/vectorized_step "HTTP/1.1 200 OK"


      Rewards: [-785.50995]
      Terminated: [False]
      Obs: (10, 9)
      Active environments: 1/1
      Env 0 info: {'step': 59, 'terminated': False, 'truncated': False}

   Step 7:
      Actions: [2]
      Rewards: [92.64468]
      Terminated: [ True]
      Obs: (1, 9)
      Active environments: 0/1
      Env 0 info: {'step': 61, 'terminated': True, 'truncated': False}


In [78]:
observations

[array([[7.9424799e+02, 1.9262566e+04, 3.3565112e+02, 1.1190446e+03,
         3.7186900e-01, 1.3630226e+06, 3.9572827e+03, 0.0000000e+00,
         9.8477840e+00]], dtype=float32)]