## Defining a custom gym environment
- Note: this example is taken from the [gymnasium documentation](https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/).
- Each observation provides the locations of the agent and the target.
- There are 4 actions in our environment, corresponding to the movements "right", "up", "left", and "down".
- A done signal (`terminated` in the code below) is issued as soon as the agent has navigated to the grid cell where the target is located.
- Rewards are binary and sparse: the immediate reward is always 0, except on the step where the agent reaches the target, when it is 1.
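
The rules listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the `gym_examples` implementation: the function name `grid_step`, the 5x5 grid size, the action numbering (0 to 3 in the order listed above), the coordinate convention, and the clipping at the grid edges are all assumptions.

```python
GRID_SIZE = 5  # assumed side length of the square grid

# Assumed action numbering: 0=right, 1=up, 2=left, 3=down, as (dx, dy) offsets
ACTION_TO_DIRECTION = {
    0: (1, 0),
    1: (0, 1),
    2: (-1, 0),
    3: (0, -1),
}


def grid_step(agent, target, action, size=GRID_SIZE):
    """Move the agent one cell and return (new_agent, reward, terminated)."""
    dx, dy = ACTION_TO_DIRECTION[action]
    # Assumption: moves that would leave the grid are clipped to the border
    x = min(max(agent[0] + dx, 0), size - 1)
    y = min(max(agent[1] + dy, 0), size - 1)
    new_agent = (x, y)
    terminated = new_agent == target   # done once the agent is on the target cell
    reward = 1 if terminated else 0    # binary, sparse reward
    return new_agent, reward, terminated


# One move to the right from (0, 0) reaches a target at (1, 0)
print(grid_step((0, 0), (1, 0), action=0))
```

Every step that does not end on the target cell returns a reward of 0, which is what makes the reward signal sparse.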

In [2]:
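import gym_examples  # needed so that 'gym_examples/GridWorld-v0' is registered
import gymnasium as gym
from gymnasium.wrappers import FlattenObservation

env = gym.make('gym_examples/GridWorld-v0')
# Flatten the structured observation into a single 1-D array
wrapped_env = FlattenObservation(env)
print(wrapped_env.reset())  # (observation, info), e.g. (array([3, 0, 3, 3]), {...})
wrapped_env.close()  # closing the wrapper also closes the wrapped env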

(array([1, 2, 0, 1]), {'distance': 2.0})


## Rendering in human mode

In [None]:
# Test the environment with random actions, rendered in a window
env = gym.make('gym_examples/GridWorld-v0', render_mode='human')
observation, info = env.reset(seed=42)
for _ in range(100):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

In [None]:
# Run a random policy, recording rendered frames and per-episode returns

env = gym.make("LunarLander-v2", render_mode="rgb_array")
episode_rewards = []  # return of each completed episode
G = 0  # return accumulated in the current episode
random_images = []  # rendered RGB frames

observation, info = env.reset(seed=42)
random_images.append(env.render())

for _ in range(1000):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)
    random_images.append(env.render())
    G += reward  # count every step's reward, including the final step of an episode

    if terminated or truncated:
        episode_rewards.append(G)
        G = 0
        observation, info = env.reset()

env.close()
print(episode_rewards)
# Requires `import imageio` and an existing ./gifs directory:
# imageio.mimsave('./gifs/random-net.gif', random_images, fps = 60)
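
The return bookkeeping in the loop above can be checked without an environment. The following is a minimal sketch, assuming a hypothetical helper `episode_returns` (not part of gymnasium) that takes a list of `(reward, done)` pairs, where `done` stands for `terminated or truncated`. It adds each step's reward before checking whether the episode has ended, so the reward from the final step is included in that episode's return.

```python
def episode_returns(steps):
    """Sum rewards per completed episode from (reward, done) pairs."""
    returns = []
    G = 0.0  # return of the episode currently in progress
    for reward, done in steps:
        G += reward  # add the reward first, so the final step is counted
        if done:
            returns.append(G)
            G = 0.0
    return returns  # an unfinished trailing episode is not reported


# Hypothetical reward sequence: two complete episodes, then one unfinished step
steps = [(0.0, False), (1.0, True), (-0.5, False), (2.0, True), (0.25, False)]
print(episode_returns(steps))  # [1.0, 1.5]
```

As in the loop above, an episode that is still running when the step budget runs out does not appear in the result.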