# Reinforcement Learning Environments Exploration

## 1. Print the Total Number of Environments

In [26]:
import gym

# Retrieve all registered environments
envs = gym.envs.registry.keys()

# Count the total number of environments
total_envs = len(envs)
print(f"Total number of environments: {total_envs}")


Total number of environments: 44


## 2. Print All Registered Environments

In [None]:

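# Retrieve all registered environments.
# As in the previous cell, the registry is treated as a dict keyed by
# environment ID, so it has no .all() method (older Gym releases
# exposed gym.envs.registry.all() instead).
envs = gym.envs.registry

# Print the names of all environments
env_names = sorted(envs.keys())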
for name in env_names:
    print(name)
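
The sorted listing usually contains several versions of the same environment (for example `CartPole-v0` and `CartPole-v1`). As a minimal sketch, the IDs can be grouped by base name to keep only the highest version. The `sample_ids` list and the `latest_versions` helper are hypothetical and introduced only for illustration; in the notebook, `env_names` would take the place of `sample_ids`.

```python
import re
from collections import defaultdict

# Hypothetical sample of IDs; in the notebook this would be env_names.
sample_ids = ["CartPole-v0", "CartPole-v1", "FrozenLake-v1", "MountainCar-v0", "Taxi-v3"]

def latest_versions(env_ids):
    """Map each base environment name to its highest version number."""
    versions = defaultdict(int)
    for env_id in env_ids:
        match = re.fullmatch(r"(.+)-v(\d+)", env_id)
        if match is None:
            continue  # skip IDs that do not follow the Name-vN pattern
        name, version = match.group(1), int(match.group(2))
        versions[name] = max(versions[name], version)
    return dict(versions)

latest = latest_versions(sample_ids)
for name, version in sorted(latest.items()):
    print(f"{name}-v{version}")
```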


## 3. Explore Different Environments in RL


We explore six classic Gym environments and describe each as a Markov decision process (MDP) in terms of its states, actions, transitions, and rewards:
- **CartPole**
- **FrozenLake**
- **MountainCar**
- **Blackjack**
- **Taxi**
- **CliffWalking**


### CartPole Environment

In [6]:

import gym

env = gym.make("CartPole-v1")

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

print("\nMDP Definition for CartPole:")
print("States: Continuous observation of [cart position, cart velocity, pole angle, pole angular velocity]")
print("Actions: {0: Push cart left, 1: Push cart right}")
print("Transition: Deterministic physics-based transition depending on force and pole dynamics")
print("Reward: +1 for every timestep the pole remains upright")


Action Space: Discrete(2)
Observation Space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)

MDP Definition for CartPole:
States: Continuous observation of [cart position, cart velocity, pole angle, pole angular velocity]
Actions: {0: Push cart left, 1: Push cart right}
Transition: Deterministic physics-based transition depending on force and pole dynamics
Reward: +1 for every timestep the pole remains upright
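
The observation bounds printed above are wider than the region in which an episode actually continues. As a rough sketch, assuming the termination thresholds used by Gym's CartPole (cart position beyond ±2.4 and pole angle beyond ±12°, each half of the printed bound), the "+1 per timestep" reward can be tallied as follows. The helper names `is_terminated` and `episode_return` are hypothetical, and the trajectory is made-up data rather than simulator output.

```python
import math

# Assumed termination thresholds (half of the printed observation bounds).
X_THRESHOLD = 2.4                    # cart position limit
THETA_THRESHOLD = math.radians(12)   # pole angle limit in radians

def is_terminated(cart_position, pole_angle):
    """True once the cart leaves the track or the pole tips too far."""
    return abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD

def episode_return(trajectory):
    """Add +1 for every step taken, stopping after the first terminating state."""
    total = 0
    for cart_position, pole_angle in trajectory:
        total += 1
        if is_terminated(cart_position, pole_angle):
            break
    return total

# Made-up sequence of (cart position, pole angle) pairs after each action.
trajectory = [(0.0, 0.0), (0.1, 0.05), (0.2, 0.3), (0.3, 0.1)]
print("Return:", episode_return(trajectory))
```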


### FrozenLake Environment

In [9]:

env = gym.make("FrozenLake-v1", is_slippery=True)

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

print("\nMDP Definition for FrozenLake:")
print("States: Discrete cells in a grid (S, F, H, G)")
print("Actions: {0: Left, 1: Down, 2: Right, 3: Up}")
print("Transition: Stochastic when is_slippery=True (the intended move and the two perpendicular moves are equally likely)")
print("Reward: +1 for reaching the goal (G), 0 otherwise (falling into a hole (H) ends the episode without a penalty)")


Action Space: Discrete(4)
Observation Space: Discrete(16)

MDP Definition for FrozenLake:
States: Discrete cells in a grid (S, F, H, G)
Actions: {0: Left, 1: Down, 2: Right, 3: Up}
Transition: Stochastic when is_slippery=True (the intended move and the two perpendicular moves are equally likely)
Reward: +1 for reaching the goal (G), 0 otherwise (falling into a hole (H) ends the episode without a penalty)
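
To make the slippery transition concrete, here is a minimal pure-Python sketch that does not call Gym. It assumes the default 4x4 map and the Gym convention that the executed direction is drawn uniformly from the chosen action and its two perpendicular neighbours. The names `move`, `transition_distribution`, and `reward` are hypothetical helpers, and terminal-state handling is left out for brevity.

```python
from collections import defaultdict

# Default 4x4 layout (assumed); 16 cells match Discrete(16).
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = len(MAP), len(MAP[0])
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

def move(row, col, direction):
    """Apply one move, clipping at the grid boundary."""
    if direction == LEFT:
        col = max(col - 1, 0)
    elif direction == DOWN:
        row = min(row + 1, N_ROWS - 1)
    elif direction == RIGHT:
        col = min(col + 1, N_COLS - 1)
    elif direction == UP:
        row = max(row - 1, 0)
    return row, col

def transition_distribution(state, action):
    """Return {next_state: probability} for one slippery step."""
    row, col = divmod(state, N_COLS)
    dist = defaultdict(float)
    for direction in [(action - 1) % 4, action, (action + 1) % 4]:
        new_row, new_col = move(row, col, direction)
        dist[new_row * N_COLS + new_col] += 1.0 / 3.0
    return dict(dist)

def reward(state):
    """+1 on the goal cell, 0 on every other cell."""
    row, col = divmod(state, N_COLS)
    return 1.0 if MAP[row][col] == "G" else 0.0

print(transition_distribution(0, RIGHT))
```

From the start cell, choosing Right can also slip Down or Up, and the Up move is clipped at the wall, so the agent may stay in place.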


### MountainCar Environment

In [12]:

env = gym.make("MountainCar-v0")

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

print("\nMDP Definition for MountainCar:")
print("States: Continuous observation of [position, velocity]")
print("Actions: {0: Push left, 1: No push, 2: Push right}")
print("Transition: Deterministic physics-based motion")
print("Reward: -1 per timestep until the goal is reached at the top of the hill")


Action Space: Discrete(3)
Observation Space: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)

MDP Definition for MountainCar:
States: Continuous observation of [position, velocity]
Actions: {0: Push left, 1: No push, 2: Push right}
Transition: Deterministic physics-based motion
Reward: -1 per timestep until the goal is reached at the top of the hill
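
The "deterministic physics-based motion" can be sketched as a single update rule. The constants below (force 0.001, gravity 0.0025, goal at position 0.5) are assumed from the Gym MountainCar-v0 implementation, and the `step` function is a hypothetical stand-in for `env.step`, not the library call itself.

```python
import math

# Constants assumed from the Gym MountainCar-v0 implementation.
FORCE, GRAVITY = 0.001, 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6   # matches the printed position bounds
MAX_SPEED = 0.07                         # matches the printed velocity bounds
GOAL_POSITION = 0.5

def step(position, velocity, action):
    """One deterministic update; action is 0 (left), 1 (no push), or 2 (right)."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops dead at the left wall
    terminated = position >= GOAL_POSITION and velocity >= 0.0
    return position, velocity, -1.0, terminated

# Without pushing, the car at the valley floor drifts slightly left.
print(step(-0.5, 0.0, 1))
```

Because the push is weaker than gravity on the slope, the car cannot drive straight up; it has to rock back and forth to build momentum.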


### Blackjack Environment

In [15]:

env = gym.make("Blackjack-v1")

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

print("\nMDP Definition for Blackjack:")
print("States: Tuple (player_sum, dealer_card, usable_ace)")
print("Actions: {0: Stick, 1: Hit}")
print("Transition: Stochastic, depends on card draw")
print("Reward: +1 win, 0 draw, -1 lose")


Action Space: Discrete(2)
Observation Space: Tuple(Discrete(32), Discrete(11), Discrete(2))

MDP Definition for Blackjack:
States: Tuple (player_sum, dealer_card, usable_ace)
Actions: {0: Stick, 1: Hit}
Transition: Stochastic, depends on card draw
Reward: +1 win, 0 draw, -1 lose
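
The `usable_ace` component of the state is the least obvious one. A minimal sketch of how the observation tuple could be built from raw hands, assuming aces are stored as 1 and face cards as 10; the helper names are hypothetical and only mirror the idea, not necessarily the library's exact code.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21

def sum_hand(hand):
    """Hand total, counting one ace as 11 when that is usable."""
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)

def observation(player_hand, dealer_hand):
    """Build (player_sum, dealer_card, usable_ace) from the two hands."""
    return (sum_hand(player_hand), dealer_hand[0], int(usable_ace(player_hand)))

print(observation([1, 6], [10, 5]))       # ace counted as 11
print(observation([1, 6, 10], [10, 5]))   # ace must count as 1
```

Both hands total 17, yet they are different states: only the first can take another card without risk of going bust from the ace.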


### Taxi Environment

In [18]:

env = gym.make("Taxi-v3")

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

print("\nMDP Definition for Taxi:")
print("States: Discrete (taxi position, passenger location, destination)")
print("Actions: {0: South, 1: North, 2: East, 3: West, 4: Pickup, 5: Dropoff}")
print("Transition: Deterministic based on grid movement and passenger interactions")
print("Reward: -1 per timestep, +20 successful dropoff, -10 illegal pickup/dropoff")


Action Space: Discrete(6)
Observation Space: Discrete(500)

MDP Definition for Taxi:
States: Discrete (taxi position, passenger location, destination)
Actions: {0: South, 1: North, 2: East, 3: West, 4: Pickup, 5: Dropoff}
Transition: Deterministic based on grid movement and passenger interactions
Reward: -1 per timestep, +20 successful dropoff, -10 illegal pickup/dropoff
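
The 500 states come from 25 taxi positions × 5 passenger locations (four landmarks plus "in the taxi") × 4 destinations. The sketch below packs and unpacks that tuple as a mixed-radix integer, which is how the Taxi state index is understood to be laid out here; `encode` and `decode` are written as standalone illustrative functions rather than calls into Gym.

```python
def encode(taxi_row, taxi_col, passenger_location, destination):
    """Pack the four components into a single integer in [0, 500)."""
    state = taxi_row
    state = state * 5 + taxi_col
    state = state * 5 + passenger_location
    state = state * 4 + destination
    return state

def decode(state):
    """Inverse of encode: recover the four components from the integer."""
    state, destination = divmod(state, 4)
    state, passenger_location = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger_location, destination

print(encode(4, 4, 4, 3))  # largest index
print(decode(0))           # smallest index
```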


### CliffWalking Environment

In [21]:

env = gym.make("CliffWalking-v0")

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

print("\nMDP Definition for CliffWalking:")
print("States: Discrete grid states (start, goal, cliff cells)")
print("Actions: {0: Up, 1: Right, 2: Down, 3: Left}")
print("Transition: Deterministic moves")
print("Reward: -1 per step (including the step that reaches the goal), -100 if the agent steps into the cliff, which also returns it to the start")


Action Space: Discrete(4)
Observation Space: Discrete(48)

MDP Definition for CliffWalking:
States: Discrete grid states (start, goal, cliff cells)
Actions: {0: Up, 1: Right, 2: Down, 3: Left}
Transition: Deterministic moves
Reward: -1 per step (including the step that reaches the goal), -100 if the agent steps into the cliff, which also returns it to the start
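
The 48 states correspond to a 4x12 grid indexed row by row. A small sketch of the deterministic step, under the assumptions that the start is the bottom-left cell (state 36), the goal is the bottom-right cell (state 47), the cliff occupies the cells between them, every step costs -1, and stepping into the cliff costs -100 and resets the agent to the start. The `step` function here is a hypothetical stand-in for `env.step`.

```python
N_ROWS, N_COLS = 4, 12
START, GOAL = 36, 47                 # cells (3, 0) and (3, 11)
CLIFF = set(range(37, 47))           # bottom row between start and goal
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left

def step(state, action):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)   # clip at the grid edges
    col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = row * N_COLS + col
    if next_state in CLIFF:
        return START, -100, False                # fall: back to the start
    return next_state, -1, next_state == GOAL

print(step(36, 1))  # stepping right from the start falls into the cliff
print(step(35, 2))  # stepping down from above the goal ends the episode
```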
