# Explore Frozen Lake Environment

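Frozen Lake can be a stochastic environment: on slippery ice, the agent reaches the next state only with some probability, known as the **transition probability**. These probabilities are stored in `env.unwrapped.P[state][action]`, which is a list of `(probability, next_state, reward, terminated)` tuples. In this notebook the environment is created with `is_slippery=False`, so every transition happens with probability 1.0 and the dynamics are deterministic.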
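The following minimal pure-Python sketch makes the structure of that table concrete. It rebuilds a transition table for the 4x4 map without gymnasium. The names `MAP_4X4`, `move` and `build_transitions` are hypothetical helpers. The slip rule (the intended direction or one of the two perpendicular directions, each with probability 1/3) is assumed to match gymnasium's FrozenLake, so compare the result against `env.unwrapped.P` before relying on it.

```python
# Sketch (assumption): rebuild a FrozenLake-style transition table from scratch,
# shaped like gymnasium's env.unwrapped.P: P[state][action] -> list of
# (probability, next_state, reward, terminated) tuples.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def move(row, col, action, nrow, ncol):
    """Return the cell reached by one move; walls keep the agent in place."""
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, nrow - 1)
    elif action == RIGHT:
        col = min(col + 1, ncol - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row, col


def build_transitions(desc, is_slippery):
    nrow, ncol = len(desc), len(desc[0])
    P = {}
    for row in range(nrow):
        for col in range(ncol):
            s = row * ncol + col
            P[s] = {}
            for a in (LEFT, DOWN, RIGHT, UP):
                if desc[row][col] in "GH":
                    # Hole or goal: the episode is over, the agent stays put.
                    P[s][a] = [(1.0, s, 0.0, True)]
                    continue
                # Slippery ice: intended move or one of the two perpendicular
                # moves, each equally likely (assumed 1/3 rule).
                moves = [(a - 1) % 4, a, (a + 1) % 4] if is_slippery else [a]
                outcomes = []
                for b in moves:
                    r, c = move(row, col, b, nrow, ncol)
                    letter = desc[r][c]
                    reward = 1.0 if letter == "G" else 0.0
                    outcomes.append((1.0 / len(moves), r * ncol + c, reward, letter in "GH"))
                P[s][a] = outcomes
    return P


P_det = build_transitions(MAP_4X4, is_slippery=False)
P_slip = build_transitions(MAP_4X4, is_slippery=True)
print(P_det[0][RIGHT])   # [(1.0, 1, 0.0, False)]
print(P_slip[0][RIGHT])  # three outcomes (states 4, 1, 0), each with probability 1/3
```

With `is_slippery=False`, each list holds a single outcome with probability 1.0. That is why the random walks in this notebook never slide sideways.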

In [1]:
# Import necessary library
import gymnasium as gym

In [2]:
# Initialize the Frozen Lake Environment
env = gym.make('FrozenLake-v1', map_name="4x4", is_slippery=False, render_mode='ansi')

In [3]:
obs, info = env.reset()

# Print the initial position of the agent in the environment
print("The initial observation is: {}".format(obs))

The initial observation is: 0


In [4]:
# Let's render the environment and observe the complete frozen lake environment
print(env.render())


SFFF
FHFH
FFFH
HFFG


Here:

- **S** refers to the starting state,
- **F** refers to a frozen state,
- **H** refers to a hole state, and
- **G** refers to the goal state.

Our goal here is to reach the goal state **G** from the starting state **S** without stepping into any hole **H**. If the agent enters a hole, it falls in and the episode ends. Reaching **G** gives a reward of 1; every other step gives a reward of 0.
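The rendered map can also be read programmatically. This sketch hard-codes the 4x4 layout shown above as `MAP_4X4`. That is an assumption: in gymnasium the layout is available as `env.unwrapped.desc`, a byte array. The sketch lists which state indices are the start, the holes and the goal, numbering each cell as `row * 4 + col`. The helper `cells_by_letter` is hypothetical.

```python
# Hypothetical helper: locate the special cells of the 4x4 map by state index.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]


def cells_by_letter(desc):
    """Group state indices (row * ncol + col) by their map letter."""
    ncol = len(desc[0])
    cells = {}
    for row, line in enumerate(desc):
        for col, letter in enumerate(line):
            cells.setdefault(letter, []).append(row * ncol + col)
    return cells


cells = cells_by_letter(MAP_4X4)
print("Start:", cells["S"])  # Start: [0]
print("Holes:", cells["H"])  # Holes: [5, 7, 11, 12]
print("Goal:", cells["G"])   # Goal: [15]
```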

In [5]:
# Print the Observation space (or state space) and action space 
print("The observation space: {}".format(env.observation_space))
print("The action space: {}".format(env.action_space))

The observation space: Discrete(16)
The action space: Discrete(4)


The observation space is also called the state space. There are 16 states in our environment, numbered 0 to 15 from left to right and top to bottom. In each state, the agent can take one of the four actions defined by the action space:

- 0 => Left
- 1 => Down
- 2 => Right
- 3 => Up
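The state numbering and the four actions can be combined into a small deterministic move function. This sketch is not part of gymnasium. It assumes the non-slippery 4x4 grid, and the helpers `to_row_col`, `to_state` and `deterministic_step` are hypothetical. A move into an edge leaves the agent in place, which matches the "Action: left, Next State: 0" outputs seen below.

```python
# Hypothetical helpers for the 4x4 grid (not part of gymnasium):
# state index <-> (row, col), and a deterministic (non-slippery) move.
NROW, NCOL = 4, 4
ACTIONS = {0: "Left", 1: "Down", 2: "Right", 3: "Up"}
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # (row change, col change)


def to_row_col(state):
    return divmod(state, NCOL)


def to_state(row, col):
    return row * NCOL + col


def deterministic_step(state, action):
    row, col = to_row_col(state)
    d_row, d_col = DELTAS[action]
    # Clamp to the grid: moving into an edge leaves the agent where it was.
    row = min(max(row + d_row, 0), NROW - 1)
    col = min(max(col + d_col, 0), NCOL - 1)
    return to_state(row, col)


for action, name in ACTIONS.items():
    print(f"{name:>5} from state 6 -> state {deterministic_step(6, action)}")
```

From state 6 (row 1, column 2), moving left or right lands in the holes at states 5 and 7. Moving down or up leads to states 10 and 2.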

In [6]:
# Map action indices to readable names
action_map = {
    0: "left",
    1: "down",
    2: "right",
    3: "up"
}

# Take a random action from the action space
random_action = env.action_space.sample()

# Take the action; step() returns (observation, reward, terminated, truncated, info),
# where info["prob"] is the probability of the transition that occurred
next_state, reward, terminated, truncated, info = env.step(random_action)
print("Action: {}".format(action_map[random_action]))
print("Next State: {}".format(next_state))
print("Reward: {}".format(reward))

Action: left
Next State: 0
Reward: 0.0


In [7]:
# Let's render the environment and confirm the same in the environment
print(env.render())

  (Left)
SFFF
FHFH
FFFH
HFFG
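In gymnasium, `step()` returns five values:

- the new observation,
- the reward,
- `terminated` (the episode ended inside the environment, e.g. at a hole or the goal),
- `truncated` (an external limit such as a time limit cut it short), and
- an `info` dict. For FrozenLake, `info["prob"]` carries the transition probability.

The sketch below is a hypothetical, gymnasium-free stand-in (`TinyFrozenLake`) that only mimics that return shape for the non-slippery 4x4 map. Its 100-step limit is an assumption meant to mirror the time limit gymnasium registers for this environment.

```python
# Hypothetical stand-in that mimics the shape of gymnasium's reset()/step() results.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


class TinyFrozenLake:
    def __init__(self, desc=MAP_4X4, max_episode_steps=100):
        self.desc = desc
        self.nrow, self.ncol = len(desc), len(desc[0])
        self.max_episode_steps = max_episode_steps  # assumed time limit

    def reset(self):
        self.state, self.t = 0, 0
        return self.state, {"prob": 1.0}

    def step(self, action):
        row, col = divmod(self.state, self.ncol)
        d_row, d_col = DELTAS[action]
        row = min(max(row + d_row, 0), self.nrow - 1)
        col = min(max(col + d_col, 0), self.ncol - 1)
        self.state, self.t = row * self.ncol + col, self.t + 1
        letter = self.desc[row][col]
        reward = 1.0 if letter == "G" else 0.0
        terminated = letter in "GH"                    # reached a hole or the goal
        truncated = self.t >= self.max_episode_steps   # ran out of time instead
        # Non-slippery, so the transition that happened had probability 1.0.
        return self.state, reward, terminated, truncated, {"prob": 1.0}


toy_env = TinyFrozenLake()
obs, info = toy_env.reset()
next_state, reward, terminated, truncated, info = toy_env.step(2)  # move right
print(next_state, reward, terminated, truncated, info["prob"])  # 1 0.0 False False 1.0
```

Keeping `terminated` and `truncated` separate matters once values are learned. Falling into a hole is a real end of the episode, while hitting the time limit is not.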


## One Episode Random Walk

In [8]:
# Maximum number of moves the agent makes in this episode
num_timestep = 20

# Reset the environment
env.reset()
# Take up to num_timestep random steps in the environment
for i in range(num_timestep):
    print("------- Step: {} --------".format(i+1))
    # Take a random action from the action space.
    # Random actions mean we are following a random policy for now.
    random_action = env.action_space.sample()
    
    # Take the action and observe the outcome
    next_state, reward, terminated, truncated, info = env.step(random_action)
    
    print("Action: {}".format(action_map[random_action]))
    print("Next State: {}".format(next_state))
    print("Reward: {}".format(reward))
    
    print(env.render())
    
    # Stop when the agent falls into a hole, reaches the goal, or hits the time limit
    if terminated or truncated:
        break

------- Step: 1 --------
Action: up
Next State: 0
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG

------- Step: 2 --------
Action: up
Next State: 0
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG

------- Step: 3 --------
Action: right
Next State: 1
Reward: 0.0
  (Right)
SFFF
FHFH
FFFH
HFFG

------- Step: 4 --------
Action: right
Next State: 2
Reward: 0.0
  (Right)
SFFF
FHFH
FFFH
HFFG

------- Step: 5 --------
Action: right
Next State: 3
Reward: 0.0
  (Right)
SFFF
FHFH
FFFH
HFFG

------- Step: 6 --------
Action: up
Next State: 3
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG

------- Step: 7 --------
Action: up
Next State: 3
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG

------- Step: 8 --------
Action: up
Next State: 3
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG

------- Step: 9 --------
Action: up
Next State: 3
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG

------- Step: 10 --------
Action: up
Next State: 3
Reward: 0.0
  (Up)
SFFF
FHFH
FFFH
HFFG
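The same random walk can be reproduced without gymnasium to see the logic in isolation. This sketch assumes the non-slippery 4x4 map and a uniform random policy. `step` and `random_walk_episode` are hypothetical helpers, and seeding `random.Random` makes the walk repeatable.

```python
import random

# Sketch (assumption): non-slippery 4x4 FrozenLake re-implemented in plain Python.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
NROW, NCOL = len(MAP_4X4), len(MAP_4X4[0])
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up
ACTION_NAMES = {0: "left", 1: "down", 2: "right", 3: "up"}


def step(state, action):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = divmod(state, NCOL)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), NROW - 1)
    col = min(max(col + d_col, 0), NCOL - 1)
    letter = MAP_4X4[row][col]
    return row * NCOL + col, (1.0 if letter == "G" else 0.0), letter in "GH"


def random_walk_episode(max_steps, rng):
    """Follow a uniform random policy until a hole, the goal, or max_steps."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = rng.randrange(4)
        state, reward, terminated = step(state, action)
        trajectory.append((ACTION_NAMES[action], state, reward))
        if terminated:
            break
    return trajectory


traj = random_walk_episode(max_steps=20, rng=random.Random(0))
for i, (name, state, reward) in enumerate(traj, start=1):
    print(f"Step {i}: {name:>5} -> state {state}, reward {reward}")
```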

This is one episode of a random walk in the environment. We can repeat it over a series of episodes as follows:

## Multi-episode Random Walk

In [9]:
# Maximum number of moves per episode
num_timestep = 20

# Number of episodes
num_episodes = 10

for e in range(num_episodes):
    print("-----------------------------------")
    print("Episode {}/{}".format(e, num_episodes))
    print("-----------------------------------")
    # Reset the environment at the start of each episode
    env.reset()
    # Take up to num_timestep random steps in the environment
    for t in range(num_timestep):
        print("timestep: {}".format(t+1))
        print("-----------------------------------------------------")
        # Take a random action from the action space.
        # Random actions mean we are following a random policy for now.
        random_action = env.action_space.sample()
        
        # Take the action and observe the outcome
        next_state, reward, terminated, truncated, info = env.step(random_action)
        
        print("Action: {}".format(action_map[random_action]))
        print("Next State: {}".format(next_state))
        print("Reward: {}".format(reward))
        print("")
        
        print(env.render())
        
        # Stop when the agent falls into a hole, reaches the goal, or hits the time limit
        if terminated or truncated:
            break

-----------------------------------
Episode 0/10
-----------------------------------
timestep: 1
-----------------------------------------------------
Action: left
Next State: 0
Reward: 0.0

  (Left)
SFFF
FHFH
FFFH
HFFG

timestep: 2
-----------------------------------------------------
Action: right
Next State: 1
Reward: 0.0

  (Right)
SFFF
FHFH
FFFH
HFFG

timestep: 3
-----------------------------------------------------
Action: up
Next State: 1
Reward: 0.0

  (Up)
SFFF
FHFH
FFFH
HFFG

timestep: 4
-----------------------------------------------------
Action: up
Next State: 1
Reward: 0.0

  (Up)
SFFF
FHFH
FFFH
HFFG

timestep: 5
-----------------------------------------------------
Action: up
Next State: 1
Reward: 0.0

  (Up)
SFFF
FHFH
FFFH
HFFG

timestep: 6
-----------------------------------------------------
Action: down
Next State: 5
Reward: 0.0

  (Down)
SFFF
FHFH
FFFH
HFFG

-----------------------------------
Episode 1/10
-----------------------------------
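This sketch puts a number on how a random policy fares. It replays many episodes of the same random walk in plain Python and counts how each one ends. It assumes the non-slippery 4x4 map and a 20-step cap. `step` and `episode_outcome` are hypothetical helpers, and the exact counts depend on the seed.

```python
import random
from collections import Counter

# Sketch (assumption): non-slippery 4x4 FrozenLake re-implemented in plain Python.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
NROW, NCOL = len(MAP_4X4), len(MAP_4X4[0])
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def step(state, action):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = divmod(state, NCOL)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), NROW - 1)
    col = min(max(col + d_col, 0), NCOL - 1)
    letter = MAP_4X4[row][col]
    return row * NCOL + col, (1.0 if letter == "G" else 0.0), letter in "GH"


def episode_outcome(max_steps, rng):
    """Run one random-policy episode and report how it ended."""
    state = 0
    for _ in range(max_steps):
        state, reward, terminated = step(state, rng.randrange(4))
        if terminated:
            return "goal" if reward == 1.0 else "hole"
    return "timeout"


rng = random.Random(42)
outcomes = Counter(episode_outcome(max_steps=20, rng=rng) for _ in range(1000))
print(outcomes)
```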

With a random walk, as explored above, the agent often ends up in a hole. Even when it happens to reach the goal, it has not learned a policy that guarantees success. This is where reinforcement learning comes in. An RL algorithm learns an optimal policy that tells the agent which action to take in each state so that it reaches the goal. Various RL algorithms can learn such a policy.
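As one illustration of computing such a policy, the sketch below runs value iteration, a classic dynamic-programming method, on the non-slippery 4x4 map. Unlike the learning algorithms mentioned above, it assumes the full transition model is known. The discount factor `GAMMA = 0.9` and the helper names are assumptions for this example. The greedy policy it extracts walks from the start to the goal in six moves, the shortest possible route.

```python
# Sketch (assumption): value iteration on the known, non-slippery 4x4 model.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
NROW, NCOL = len(MAP_4X4), len(MAP_4X4[0])
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up
GAMMA = 0.9  # discount factor (assumed value)


def step(state, action):
    """Return (next_state, reward, terminated) for one deterministic move."""
    row, col = divmod(state, NCOL)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), NROW - 1)
    col = min(max(col + d_col, 0), NCOL - 1)
    letter = MAP_4X4[row][col]
    return row * NCOL + col, (1.0 if letter == "G" else 0.0), letter in "GH"


def is_terminal(state):
    return MAP_4X4[state // NCOL][state % NCOL] in "GH"


def q_value(V, s, a):
    # Immediate reward plus discounted value of the next state (0 if the episode ends).
    s2, r, done = step(s, a)
    return r + (0.0 if done else GAMMA * V[s2])


def value_iteration(tol=1e-10):
    V = [0.0] * (NROW * NCOL)
    while True:
        delta = 0.0
        for s in range(NROW * NCOL):
            if is_terminal(s):
                continue  # holes and the goal keep value 0
            best = max(q_value(V, s, a) for a in range(4))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration()
# Greedy policy: in each non-terminal state pick the action with the highest Q-value.
policy = {s: max(range(4), key=lambda a: q_value(V, s, a))
          for s in range(NROW * NCOL) if not is_terminal(s)}

# Follow the greedy policy from the start state (capped for safety).
state, path = 0, [0]
while not is_terminal(state) and len(path) <= NROW * NCOL:
    state, _, _ = step(state, policy[state])
    path.append(state)
print(path)
```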