### Template Code Exploration

This notebook explores the provided starter code and debugs its components step by step.

In [110]:
import gymnasium as gym
import random

In [111]:
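def show_state(step, env, obs, reward):
    """Print the decoded state, the reward, and the ANSI rendering of the grid."""
    ansi_state = env.render()  # text grid, since render_mode="ansi"
    # decode() yields (taxi_row, taxi_col, passenger_location, destination)
    array_state = list(env.unwrapped.decode(obs))
    print(f"Step {step}: {array_state}, Reward: {reward}")
    print(ansi_state)
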

In [112]:
env = gym.make("Taxi-v3", render_mode="ansi")
env

<TimeLimit<OrderEnforcing<PassiveEnvChecker<TaxiEnv<Taxi-v3>>>>>

In [113]:
obs, info = env.reset()
print(f'obs: {obs}')
print(f'info: {info}')

print(f"state: {env.unwrapped.s}")

obs: 391
info: {'prob': 1.0, 'action_mask': array([1, 1, 0, 1, 0, 0], dtype=int8)}
state: 391
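The `action_mask` in `info` has one entry per action and flags whether that action would move the environment to a new state (1) or not (0); here East, Pickup, and Dropoff are 0. Gymnasium's `Discrete.sample` accepts this directly, as `env.action_space.sample(mask=info["action_mask"])`. The plain-Python sketch below shows the same idea; `sample_valid_action` is a hypothetical helper name used only for illustration.

```python
import random
from typing import Sequence


def sample_valid_action(mask: Sequence[int], rng: random.Random) -> int:
    """Return a uniformly random action index whose mask entry is 1."""
    valid = [action for action, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("the action mask allows no actions")
    return rng.choice(valid)


mask = [1, 1, 0, 1, 0, 0]  # the action_mask printed after env.reset() above
rng = random.Random(0)     # seeded so the draw is reproducible
action = sample_valid_action(mask, rng)
print(action)  # one of 0, 1, 3
```

Sampling only from unmasked actions avoids wasting steps on moves that leave the state unchanged.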


In [114]:
# initial starting state
show_state(0, env, obs, 0)


Step 0: [3, 4, 2, 3], Reward: 0
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | :[43m [0m|
|[34;1mY[0m| : |[35mB[0m: |
+---------+
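`env.unwrapped.decode` unpacks the integer observation into `(taxi_row, taxi_col, passenger_location, destination)`. The sketch below re-implements that packing in plain Python, assuming it mirrors Taxi-v3's `encode`/`decode` (it is consistent with observation 391 decoding to `[3, 4, 2, 3]` above). Locations 0-3 index the R, G, Y, B landmarks, and passenger location 4 means the passenger is in the taxi.

```python
from typing import Tuple

N_COLS = 5            # grid is 5x5
N_PASSENGER_LOCS = 5  # R, G, Y, B, or 4 = inside the taxi
N_DESTINATIONS = 4    # R, G, Y, B


def encode(taxi_row: int, taxi_col: int, pass_loc: int, dest_idx: int) -> int:
    """Pack the four state components into one integer in [0, 500)."""
    state = taxi_row
    state = state * N_COLS + taxi_col
    state = state * N_PASSENGER_LOCS + pass_loc
    state = state * N_DESTINATIONS + dest_idx
    return state


def decode(state: int) -> Tuple[int, int, int, int]:
    """Invert encode() by peeling components off with % and //."""
    dest_idx = state % N_DESTINATIONS
    state //= N_DESTINATIONS
    pass_loc = state % N_PASSENGER_LOCS
    state //= N_PASSENGER_LOCS
    taxi_col = state % N_COLS
    taxi_row = state // N_COLS
    return taxi_row, taxi_col, pass_loc, dest_idx


print(decode(391))         # (3, 4, 2, 3)
print(encode(3, 4, 2, 3))  # 391
```

Reading the step-0 state this way: the taxi is at row 3, column 4, the passenger waits at Y (2), and the destination is B (3).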




In [115]:
# random example: take two random actions (0-5) and show each resulting state
for i in range(2):
    action = random.choice([0, 1, 2, 3, 4, 5])
    obs, reward, terminated, truncated, info = env.step(action)
    show_state(i + 1, env, obs, reward)


Step 1: [4, 4, 2, 3], Reward: -1
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|[34;1mY[0m| : |[35mB[0m:[43m [0m|
+---------+
  (South)

Step 2: [4, 3, 2, 3], Reward: -1
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|[34;1mY[0m| : |[35m[43mB[0m[0m: |
+---------+
  (West)
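Both random moves returned -1. As a sketch of how Taxi-v3 assigns rewards (restated here for illustration, not taken from the library's source): every step costs -1, a successful dropoff at the destination gives +20, and an illegal Pickup or Dropoff costs -10. `taxi_reward` is a hypothetical helper.

```python
PICKUP, DROPOFF = 4, 5


def taxi_reward(action: int, legal: bool, delivered: bool) -> int:
    """Reward for one Taxi-v3 step (illustrative restatement).

    action:    0-5 (South, North, East, West, Pickup, Dropoff)
    legal:     whether a Pickup/Dropoff was allowed in the current state
    delivered: whether this step dropped the passenger at the destination
    """
    if delivered:
        return 20
    if action in (PICKUP, DROPOFF) and not legal:
        return -10
    return -1  # ordinary step cost, including moves blocked by a wall


# The two moves above (South, then West) each cost -1.
print(taxi_reward(0, True, False) + taxi_reward(3, True, False))  # -2
```

The constant -1 per step is what pushes an agent toward short routes.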



Key excerpts from the Gymnasium `Env` API documentation:

https://gymnasium.farama.org/api/env/

The main API methods that users of this class need to know are:

step() - Updates the environment with an action, returning the next agent observation, the reward for taking that action, whether the environment has terminated or been truncated due to the latest action, and information from the environment about the step (e.g. metrics, debug info).

reset() - Resets the environment to an initial state; required before calling step(). Returns the first agent observation for an episode and information (e.g. metrics, debug info).

render() - Renders the environment to help visualise what the agent sees; example modes are "human", "rgb_array", and "ansi" for text. The mode is chosen when the environment is created, as in `gym.make("Taxi-v3", render_mode="ansi")`.

close() - Closes the environment; important when external software is used, e.g. pygame for rendering, or databases.
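The `reset()`/`step()` contract above can be sketched as a generic episode loop. To keep it runnable without Gymnasium, it is exercised here with a hypothetical toy environment, `CountdownEnv`, which only imitates the method signatures; `run_episode` is likewise a hypothetical helper name.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy env: action 1 decrements a counter; reaching 0 terminates."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.state = start

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        self.state = self.start
        return self.state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        if action == 1:
            self.state -= 1
        terminated = self.state == 0
        return self.state, -1.0, terminated, False, {}  # obs, reward, terminated, truncated, info

    def close(self) -> None:
        pass  # nothing external to release in this toy


def run_episode(env, policy: Callable[[Any], int], max_steps: int = 200) -> float:
    """Reset, then step until terminated or truncated; return the total reward."""
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            break
    return total


env = CountdownEnv(start=3)
print(run_episode(env, policy=lambda obs: 1))  # -3.0
env.close()
```

Since `run_episode` only relies on the five-tuple returned by `step()`, it should also work with `gym.make("Taxi-v3")` and a policy such as `lambda obs: env.action_space.sample()`.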



In [116]:
env.observation_space.n

np.int64(500)
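The 500 comes from multiplying the sizes of the state components: 25 taxi positions, 5 passenger locations, and 4 destinations. The Taxi environment documentation also notes that 400 of these states can occur during an episode, since a waiting passenger never sits at their own destination; the count below reproduces that figure by enumeration.

```python
from itertools import product

taxi_positions = 5 * 5   # 5x5 grid
passenger_locations = 5  # R, G, Y, B, or in the taxi (index 4)
destinations = 4         # R, G, Y, B

total = taxi_positions * passenger_locations * destinations
print(total)  # 500

# Exclude states where the passenger is waiting at the destination itself.
# pass_loc == 4 (in the taxi) can never equal a destination index 0-3.
reachable_during_episode = sum(
    1
    for row, col, pass_loc, dest in product(range(5), range(5), range(5), range(4))
    if pass_loc != dest
)
print(reachable_during_episode)  # 400
```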

In [117]:
env.action_space.n

np.int64(6)
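The six discrete actions map to the labels the ANSI renderer printed above, such as "(South)" and "(West)". A small lookup like the one below (the names `ACTIONS` and `action_name` are hypothetical) makes sampled actions readable; `int()` also converts the `np.int64` values the action space returns.

```python
# Action indices for Taxi-v3's Discrete(6) action space.
ACTIONS = {
    0: "South",    # move down one row
    1: "North",    # move up one row
    2: "East",     # move right one column
    3: "West",     # move left one column
    4: "Pickup",   # pick up the passenger
    5: "Dropoff",  # drop off the passenger
}


def action_name(action) -> str:
    """Map an action index (int or NumPy integer) to its name."""
    return ACTIONS[int(action)]


print(action_name(0))  # South
```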

In [118]:
env.action_space.sample()

np.int64(0)

In [120]:
# Episodes are capped at 200 steps: the 200th step returns truncated=True
print("Max Steps: ", env.spec.max_episode_steps)

Max Steps:  200
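This cap comes from the outermost `TimeLimit` wrapper shown in the wrapper chain printed earlier. The sketch below is a simplified, hypothetical re-creation of that logic (`TimeLimitSketch` is not a Gymnasium class): it counts steps and forces `truncated=True` once the count reaches `max_episode_steps`, while `terminated` still comes from the inner environment.

```python
from typing import Any, Callable, Tuple

StepResult = Tuple[Any, float, bool, bool, dict]


class TimeLimitSketch:
    """Hypothetical wrapper: flags truncation once max_episode_steps is reached."""

    def __init__(self, step_fn: Callable[[int], StepResult], max_episode_steps: int) -> None:
        self.step_fn = step_fn
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self) -> None:
        self.elapsed_steps = 0  # the real wrapper also resets the inner env

    def step(self, action: int) -> StepResult:
        obs, reward, terminated, truncated, info = self.step_fn(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


# An inner step function that never terminates on its own.
never_ends = lambda action: (0, -1.0, False, False, {})
env = TimeLimitSketch(never_ends, max_episode_steps=200)
steps = 0
truncated = False
while not truncated:
    _, _, _, truncated, _ = env.step(0)
    steps += 1
print(steps)  # 200
```

Keeping `terminated` and `truncated` separate lets a learning algorithm tell a real episode end (successful dropoff) apart from an artificial time cutoff.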
