## FrozenLake-v0

Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend.

The surface is described using a grid like the following:

SFFF

FHFH

FFFH

HFFG

(S: starting point, safe)

(F: frozen surface, safe)

(H: hole, fall to your doom)

(G: goal, where the frisbee is located)

The episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise.

The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.
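The "slippery" movement can be made concrete with a small stand-alone sketch that does not need Gym. It assumes Gym's usual FrozenLake conventions: actions numbered Left=0, Down=1, Right=2, Up=3, states numbered row by row, and the agent moving in the intended direction or in one of the two perpendicular directions with probability 1/3 each. The helper names (`move`, `slippery_step`, `outcome_distribution`) are hypothetical and only serve this illustration.

```python
import random

# Grid from the description above; action numbering assumed to follow Gym's FrozenLake.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def move(state, action):
    """Deterministic move on the 4x4 grid; walking into a wall keeps the agent in place."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)
    col = min(max(col + d_col, 0), 3)
    return row * 4 + col


def outcome_distribution(state, action):
    """Probability of each next state, assuming a 1/3 chance for the intended
    direction and for each of its two perpendicular directions."""
    dist = {}
    for actual in [(action - 1) % 4, action, (action + 1) % 4]:
        nxt = move(state, actual)
        dist[nxt] = dist.get(nxt, 0.0) + 1 / 3
    return dist


def slippery_step(state, action, rng=random):
    """Sample one transition; returns (new_state, reward, done)."""
    actual = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    new_state = move(state, actual)
    tile = GRID[new_state // 4][new_state % 4]
    reward = 1.0 if tile == "G" else 0.0
    done = tile in "HG"
    return new_state, reward, done


# From the start tile, choosing Right leads to three equally likely next states: 0, 1 and 4.
print(outcome_distribution(0, RIGHT))
```

Under these assumptions, even a sensible action only succeeds a third of the time, which is why a learned policy on this map does not reach the goal in every episode.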

In [1]:
# Importing Useful Packages

import gym
import numpy as np
import random
from IPython.display import clear_output
import time

In [2]:
# Setting up the Frozen Lake Environment

env = gym.make("FrozenLake-v0")

In [3]:
# Initializing the Q-Table Values

action_space_steps = env.action_space.n
state_space_steps = env.observation_space.n

q_table = np.zeros((state_space_steps, action_space_steps))
print(q_table)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
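Each of the 16 rows of the Q-table corresponds to one tile of the grid, and each of the 4 columns to one action. The sketch below maps a state index back to its tile, assuming states are numbered row by row from the top-left corner (the usual Gym convention). The helper `state_to_tile` is a hypothetical name used only here.

```python
# Grid from the description above.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(GRID[0])


def state_to_tile(state):
    """Return (row, col, tile letter) for a state index, assuming row-major numbering."""
    row, col = divmod(state, N_COLS)
    return row, col, GRID[row][col]


n_states = len(GRID) * N_COLS
hole_states = [s for s in range(n_states) if state_to_tile(s)[2] == "H"]
goal_state = next(s for s in range(n_states) if state_to_tile(s)[2] == "G")

print(hole_states)  # [5, 7, 11, 12]
print(goal_state)   # 15
```

Holes and the goal end the episode, so no action is ever taken from those states and their rows are never updated. This is consistent with the all-zero rows 5, 7, 11 and 12 in the trained Q-table printed later in this notebook.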


In [4]:
# Initializing the Algorithm Parameters

max_episodes = 10000
max_steps_per_episodes = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.001
exploration_decay_rate = 0.001
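To get a feel for these exploration parameters, the sketch below evaluates the exponential decay formula used at the end of each training episode for a few episode numbers. It uses `math.exp` instead of `np.exp` so that it runs without NumPy. The helper `exploration_rate_at` is a hypothetical name for this illustration.

```python
import math

max_exploration_rate = 1
min_exploration_rate = 0.001
exploration_decay_rate = 0.001


def exploration_rate_at(episode):
    """Exploration rate after `episode` episodes, using the notebook's decay formula."""
    return min_exploration_rate + (max_exploration_rate - min_exploration_rate) * math.exp(
        -exploration_decay_rate * episode
    )


# The rate starts at about 1 (almost pure exploration) and decays toward the minimum.
for episode in (0, 1000, 3000, 5000, 9999):
    print(episode, round(exploration_rate_at(episode), 4))
```

With a decay rate of 0.001, the exploration rate falls to roughly 37% after 1000 episodes, so most of the random exploration happens in the first few thousand episodes.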

In [5]:
# Q-Learning Algorithm

rewards_all_episodes = []

for episode in range(max_episodes):    
    
    state = env.reset()                
    
    done = False                       
    rewards_all_episode = 0            
    
    for step in range(max_steps_per_episodes):
        
        # Exploration-Exploitation Trade-off (epsilon-greedy)
        
        exploration_rate_threshold = random.uniform(0, 1)
        
        if exploration_rate_threshold > exploration_rate:
            action = np.argmax(q_table[state, :])    # Exploit: best known action
        else:
            action = env.action_space.sample()       # Explore: random action
                
        new_state, reward, done, info = env.step(action)
        
        # Update Q-Table for Q(s, a)
        
        q_table[state, action] = (1 - learning_rate) * q_table[state, action] + learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))
        
        state = new_state
        rewards_all_episode += reward
        
        if done:
            break
            
    
    # Exploration Rate Decay
    
    exploration_rate = min_exploration_rate + (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)
    
    rewards_all_episodes.append(rewards_all_episode)
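The update rule in the loop above can be checked by hand on a single transition. The sketch below applies it twice to a hypothetical state-action pair that leads straight to the goal (reward 1, and a next state whose Q-values are all 0), using the same learning and discount rates. The function name `q_update` is an assumption made for this illustration.

```python
learning_rate = 0.1
discount_rate = 0.99


def q_update(q_sa, reward, max_q_next):
    """One Q-learning update: blend the old estimate with the new target."""
    target = reward + discount_rate * max_q_next
    return (1 - learning_rate) * q_sa + learning_rate * target


# Hypothetical transition: the action reaches the goal (reward 1, terminal next state).
first = q_update(0.0, 1.0, 0.0)     # 0.9 * 0.0 + 0.1 * 1.0
second = q_update(first, 1.0, 0.0)  # 0.9 * 0.1 + 0.1 * 1.0

print(round(first, 4))   # 0.1
print(round(second, 4))  # 0.19
```

Each update moves the estimate a tenth of the way toward the target, which is why many successful episodes are needed before values near the goal approach 1.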
        

In [6]:
# Calculate and Print the Average Reward per Thousand Episodes

rewards_per_thousand_episodes = np.split(np.array(rewards_all_episodes), max_episodes // 1000)
count = 1000
print("Average Reward per Thousand Episodes")

for r in rewards_per_thousand_episodes:
    print(count, ":", str(sum(r/1000)))
    count += 1000
    
# Print Updated Q-Table

print("Q-Table")
print(q_table)

Average Reward per Thousand Episodes
1000 : 0.06100000000000005
2000 : 0.5340000000000004
3000 : 0.7140000000000005
4000 : 0.7380000000000005
5000 : 0.7240000000000005
6000 : 0.7110000000000005
7000 : 0.7210000000000005
8000 : 0.7450000000000006
9000 : 0.7090000000000005
10000 : 0.7340000000000005
Q-Table
[[0.51056433 0.43100658 0.43269299 0.43449086]
 [0.25741506 0.34586674 0.31114881 0.4526337 ]
 [0.37513415 0.24853227 0.24024026 0.25140213]
 [0.0260855  0.1612514  0.02236193 0.05209493]
 [0.52552053 0.30828324 0.34009718 0.38974292]
 [0.         0.         0.         0.        ]
 [0.10893429 0.0516517  0.32111082 0.08495761]
 [0.         0.         0.         0.        ]
 [0.45989429 0.3151485  0.35771801 0.55893846]
 [0.4219032  0.61019072 0.42607256 0.30835605]
 [0.67209557 0.34361512 0.31503831 0.24977597]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.33428762 0.53712028 0.75907383 0.52138421]
 [0.63676753 0.88960541 0.68838982 

In [7]:
# Rendering the Virtual Environment of the Frozen Lake

for episode in range(3):
    state = env.reset()
    done = False
    print("Episode: ", episode + 1, "\n")
    time.sleep(1)
    
    for step in range(max_steps_per_episodes):
        
        clear_output(wait=True)
        env.render()
        time.sleep(0.3)
        
        action = np.argmax(q_table[state, :])      
        new_state, reward, done, info = env.step(action)
        
        if done:
            clear_output(wait=True)
            env.render()
            
            if reward == 1:
                print("You reached the Goal \n")
                time.sleep(3)
            else:
                print("You fell through the Hole \n")
                time.sleep(3)
            clear_output(wait=True)
            break 
            
        state = new_state
        
env.close()

  (Up)
SFFF
FHFH
FFFH
HFFG
You fell through the Hole 

