In [2]:
import gymnasium as gym

# Print the ID of every environment registered with Gymnasium
for env_id in gym.envs.registry.keys():
    print(env_id)

CartPole-v0
CartPole-v1
MountainCar-v0
MountainCarContinuous-v0
Pendulum-v1
Acrobot-v1
phys2d/CartPole-v0
phys2d/CartPole-v1
phys2d/Pendulum-v0
LunarLander-v2
LunarLanderContinuous-v2
BipedalWalker-v3
BipedalWalkerHardcore-v3
CarRacing-v2
Blackjack-v1
FrozenLake-v1
FrozenLake8x8-v1
CliffWalking-v0
Taxi-v3
tabular/Blackjack-v0
tabular/CliffWalking-v0
Reacher-v2
Reacher-v4
Pusher-v2
Pusher-v4
InvertedPendulum-v2
InvertedPendulum-v4
InvertedDoublePendulum-v2
InvertedDoublePendulum-v4
HalfCheetah-v2
HalfCheetah-v3
HalfCheetah-v4
Hopper-v2
Hopper-v3
Hopper-v4
Swimmer-v2
Swimmer-v3
Swimmer-v4
Walker2d-v2
Walker2d-v3
Walker2d-v4
Ant-v2
Ant-v3
Ant-v4
Humanoid-v2
Humanoid-v3
Humanoid-v4
HumanoidStandup-v2
HumanoidStandup-v4
GymV26Environment-v0
GymV21Environment-v0
Adventure-v0
AdventureDeterministic-v0
AdventureNoFrameskip-v0
Adventure-v4
AdventureDeterministic-v4
AdventureNoFrameskip-v4
Adventure-ram-v0
Adventure-ramDeterministic-v0
Adventure-ramNoFrameskip-v0
Adventure-ram-v4
Adventure-ramDe

Classic Control: These are canonical environments used in RL development; they form the basis of many textbook examples. They give the right mix of complexity and simplicity to test and benchmark new RL algorithms. Classic control environments in Gymnasium include: 
Acrobot
Cart Pole
Mountain Car Discrete
Mountain Car Continuous
Pendulum
Box2D: Box2D is a 2D physics engine for games. Environments based on this engine include simple games such as:
Lunar Lander
Car Racing
ToyText: These are small and simple environments often used to debug RL algorithms. Many of these environments are based on the small grid world model and simple card games. Examples include: 
Blackjack
Taxi
Frozen Lake
MuJoCo: Multi-Joint dynamics with Contact (MuJoCo) is an open-source physics engine that simulates environments for applications like robotics, biomechanics, ML, etc. MuJoCo environments in Gymnasium include:
Ant
Hopper
Humanoid
Swimmer
And more
In addition to the built-in environments, Gymnasium can be used with many external environments using the same API. 

We’ll use one of the canonical Classic Control environments in this tutorial. To create a specific environment, call the gym.make() function and pass the ID of the environment as an argument. For example, to create a new environment based on CartPole (version 1), use the command below: 

In [3]:
import gymnasium as gym
env = gym.make("CartPole-v1")

Observation space 
The observation space is the set of all possible observations. It also defines the format in which observations are stored. The observation space is typically represented as an object of type Box, which describes an n-dimensional array of values and specifies the lower and upper bounds of each dimension. You can view the observation space for an environment using the observation_space attribute:

In [4]:
print("observation space: ", env.observation_space)

observation space:  Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)


In this example, the CartPole-v1 observation space has 4 dimensions. The 4 elements of the observation array are:

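Cart position - varies between -4.8 and +4.8
Cart velocity - ranges between -3.4e38 and +3.4e38 (the float32 limits, so effectively unbounded)
Pole angle - varies between -0.4189 and +0.4189 radians
Pole angular velocity - ranges between -3.4e38 and +3.4e38 (effectively unbounded)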

In [5]:
# Call reset() to start an episode and see an individual observation

observation, info = env.reset()
print("observation: ", observation)

observation:  [ 0.03075869 -0.04980378  0.02226315  0.03467826]


Action space
The action space includes all possible actions that the agent can take. The action space also defines the format in which actions are represented. You can view the action space for an environment using the action_space attribute:

In [6]:
print("action space: ", env.action_space)

action space:  Discrete(2)


Understanding Reinforcement Learning Concepts in Gymnasium
In a nutshell, Reinforcement Learning consists of an agent (like a robot) that interacts with its environment. A policy decides the agent’s actions. Depending on the agent’s actions, the environment gives a reward (or penalty) at each timestep. The agent uses RL to figure out the optimal policy that maximizes the total rewards the agent earns. 

The following are the key components of an RL environment: 

Environment: The external system, world, or context. The agent interacts with the environment in a series of timesteps. In each timestep, based on the agent’s action, the environment:
Gives a reward (or penalty) 
Decides the next state 
State: A mathematical representation of the current configuration of the environment. 
For example, the state of a pendulum environment can include the pendulum's position and angular velocity at each timestep. 
Terminal state: A state that does not lead to new/other states. 
Agent: The algorithm that observes the environment and takes various actions based on this observation. The agent’s goal is to maximize its rewards. 
For example, the agent decides how hard and in what direction to push the pendulum.  
Observation: A mathematical representation of the agent’s view of the environment, acquired, for example, using sensors. 
Action: The decision made by the agent before proceeding to the next step. The action affects the next state of the environment and earns the agent a reward. 
Reward: The feedback from the environment to the agent. It can be positive or negative, depending on the action and the state of the environment. 
Return: The cumulative reward collected over future timesteps. Rewards from future timesteps can be discounted using a discount factor. 
Policy: The agent’s strategy for choosing an action in each state. For finite sets of states and actions, it can be represented as a probability matrix, P, with one row per state and one column per action.
Given m possible states and n possible actions, element P_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) denotes the probability of taking action a_j in state s_i.
Episode: The series of timesteps from the (randomized) initial state until the agent reaches a terminal state.
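
Two of the definitions above, the return and the policy matrix, can be made concrete in a few lines of plain Python. This is only a sketch: the matrix values, the reward list and the function names `discounted_return` and `sample_action` are invented for illustration and are not part of Gymnasium.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its timestep index."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def sample_action(policy, state, rng):
    """Draw an action index using the probabilities in row `state` of the policy matrix."""
    actions = range(len(policy[state]))
    return rng.choices(actions, weights=policy[state], k=1)[0]


# Hypothetical policy matrix: 3 states (rows) x 2 actions (columns); each row sums to 1.
policy = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.0, 1.0],
]

rng = random.Random(0)
# In state 2 the policy puts all probability on action 1.
print("action in state 2:", sample_action(policy, 2, rng))

rewards = [1.0, 1.0, 1.0]
print("undiscounted return:", discounted_return(rewards, gamma=1.0))  # 3.0
print("discounted return:", discounted_return(rewards, gamma=0.5))    # 1.75
```

With a discount factor of 0.5, the three rewards are weighted 1, 0.5 and 0.25, which is why the discounted return is 1.75.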

Observation space and action space
The observation is the information that the agent gathers about the environment. An agent, for example, a robot, could collect environmental information using sensors. Ideally, the agent should be able to observe the complete state, which describes all the aspects of the environment. In practice, the agent uses its observations as a proxy for the state. Thus, the observations decide the agent’s actions. 

A space is analogous to a mathematical set. The space of items X includes all possible instances of X. The space of X also defines the structure (syntax and format) of all items of type X. Each Gymnasium environment has two spaces, the action space, action_space, and the observation space, observation_space. Both the action and observation spaces derive from the parent gymnasium.spaces.Space superclass. 


source1: https://app.datacamp.com/learn/tutorials/reinforcement-learning-with-gymnasium?registration_source=google_onetap