## Initializing Environments

```python
import gymnasium as gym
```

```python
env = gym.make('CartPole-v1')
env  # in a notebook, the last expression displays the wrapper stack gym.make applied
```

```
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
```

To list the IDs of all registered environments:

```python
gym.envs.registry.keys()
```

```
dict_keys(['CartPole-v0', 'CartPole-v1', 'MountainCar-v0', 'MountainCarContinuous-v0', 'Pendulum-v1', 'Acrobot-v1', 'CartPoleJax-v0', 'CartPoleJax-v1', 'PendulumJax-v0', 'LunarLander-v2', 'LunarLanderContinuous-v2', 'BipedalWalker-v3', 'BipedalWalkerHardcore-v3', 'CarRacing-v2', 'Blackjack-v1', 'FrozenLake-v1', 'FrozenLake8x8-v1', 'CliffWalking-v0', 'Taxi-v3', 'Jax-Blackjack-v0', 'Reacher-v2', 'Reacher-v4', 'Pusher-v2', 'Pusher-v4', 'InvertedPendulum-v2', 'InvertedPendulum-v4', 'InvertedDoublePendulum-v2', 'InvertedDoublePendulum-v4', 'HalfCheetah-v2', 'HalfCheetah-v3', 'HalfCheetah-v4', 'Hopper-v2', 'Hopper-v3', 'Hopper-v4', 'Swimmer-v2', 'Swimmer-v3', 'Swimmer-v4', 'Walker2d-v2', 'Walker2d-v3', 'Walker2d-v4', 'Ant-v2', 'Ant-v3', 'Ant-v4', 'Humanoid-v2', 'Humanoid-v3', 'Humanoid-v4', 'HumanoidStandup-v2', 'HumanoidStandup-v4', 'GymV21Environment-v0', 'GymV26Environment-v0'])
```

The following is what the code represents:

![image.png](attachment:image.png)

```python
# LunarLander needs the Box2D extra: pip install "gymnasium[box2d]"
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    # random action; replace with an agent policy that uses the observation and info
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # start a new episode once the current one ends
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```


The code simulates an agent interacting with an environment.

- `make` creates the environment; `render_mode="human"` displays it in a window while the loop runs.
- We still have to build the agent that chooses actions; here it is replaced by random actions.
- We call `reset` to get the initial observation (and an `info` dict) of the environment.
  - The starting state is typically randomized; passing `seed=...` to `reset` makes it reproducible.
- `step` takes an action and returns the next observation, the reward, `terminated`, `truncated`, and an `info` dict.
  - A single action-observation exchange is called a timestep.
  - After enough timesteps, the environment may reach a terminal state (e.g. the lander crashes); `step` signals this with `terminated=True`.
  - The episode can also be cut off after a maximum number of timesteps; `step` signals this with `truncated=True`.
    - In either case, reset the environment with `reset` before continuing.

What are all the possible actions and observations?

- You can get the action space with `env.action_space`
- You can get the observation space with `env.observation_space`

How do you make an agent policy?
- Instead of a real agent policy, the example above just samples random actions with `env.action_space.sample()`.
  - There are tutorials for building and training policies.

What should an environment have?
- `action_space` and `observation_space`
  - They should be instances of `gym.spaces` classes, such as `Discrete` or `Box`.
  - We will use `Box`:
    - it describes an n-dimensional continuous space, bounded by lower and upper limits (`low` and `high`) that define the valid values our observations can take.
    
How do you modify the environment?
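- We use wrappers (subclasses of `gym.Wrapper`)
  - `gym.make` already applies some wrappers by default, such as `TimeLimit` and `OrderEnforcing` (visible in the `env` output above)
- You apply a wrapper by calling `env = SomeWrapper(env)`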


  

Overall goal:
- Create an environment
- Let Stable Baselines interact with the environment

Stable Baselines documentation: https://stable-baselines.readthedocs.io/en/master/