# Q-Learning with OpenAI FrozenLake

This notebook is based on Thomas Simonini's Deep Reinforcement Learning course. You can find the syllabus here: https://simoninithomas.github.io/Deep_reinforcement_learning_Course/

In this notebook I will try to apply the same concepts and functions created for the Taxi-v3 notebook to train an agent to play Frozen Lake. Frozen Lake has a simpler reward system than Taxi: you get 1 point for reaching the end goal (G) and 0 points otherwise, including when you fall into a hole (H).

In [25]:
# Loading dependencies
import numpy as np
import gym
from qlearning_functions import agent_testing, agent_training

In [26]:
# Creating Frozen Lake environment
env = gym.make('FrozenLake-v0')

In [27]:
# Showing example render
env.render()

SFFF
FHFH
FFFH
HFFG


In [32]:
# Obtaining action and state size to create qtable
action_size = env.action_space.n
state_size = env.observation_space.n

print("Action size: ", action_size, "\nState size: ", state_size)

Action size:  4 
State size:  16
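
The 16 states are the cells of the 4x4 grid rendered above, numbered row by row, and in Gym's FrozenLake the 4 actions are 0 = left, 1 = down, 2 = right and 3 = up. The following is a minimal sketch of that indexing; the helper names `state_to_cell` and `tile_at` are hypothetical and not part of Gym.

```python
# The default 4x4 map, as rendered above
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Action encoding used by Gym's FrozenLake
ACTIONS = {0: "left", 1: "down", 2: "right", 3: "up"}


def state_to_cell(state, n_cols=4):
    """Convert a state index into (row, col), assuming row-major numbering."""
    return divmod(state, n_cols)


def tile_at(state):
    """Return the map letter (S, F, H or G) for a state index."""
    row, col = state_to_cell(state)
    return LAKE_MAP[row][col]


# State 5 is the first hole, state 15 is the goal
print(state_to_cell(5), tile_at(5))
print(state_to_cell(15), tile_at(15))
```

Each row of the Q-table created in the next cell therefore corresponds to one cell of the lake, and each column to one of the four moves.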


In [29]:
# Creating empty qtable
qtable = np.zeros((state_size, action_size))

In [30]:
# Training qtable with default parameters
trained_qtable = agent_training(env, qtable)

In [31]:
# Testing the trained agent
score = agent_testing(env, trained_qtable)

In [18]:
score

'0.0'
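
A score of 0.0 suggests that no test episode reached the goal. `agent_testing` lives in the separate `qlearning_functions` module and is not shown in this notebook, so the following is only a sketch of one plausible scoring scheme: the mean reward per episode when the agent always takes the highest-valued action. The function name `greedy_score`, its parameters and the classic Gym API (`reset()` returns the state, `step()` returns a 4-tuple) are all assumptions.

```python
def greedy_score(env, qtable, n_episodes=100, max_steps=99):
    """Mean reward per episode when always taking the highest-valued action.

    Assumes the classic Gym API: reset() -> state, step() -> (state, reward, done, info).
    """
    total_reward = 0.0
    for _ in range(n_episodes):
        state = env.reset()
        for _ in range(max_steps):
            row = list(qtable[state])
            action = row.index(max(row))  # greedy action; ties go to the lowest index
            state, reward, done, _info = env.step(action)
            total_reward += reward
            if done:
                break
    return total_reward / n_episodes
```

Under this scheme, a Q-table whose rows are still all zeros always picks action 0 (left). The default FrozenLake-v0 environment is also slippery, so even a good policy will not win every episode.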

Since the only reward comes from finishing the game, I will first train new agents with higher gamma values (0.9, 0.95 and 1) so that the agent weights the long-term reward, that is, winning the game, more heavily.
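
For reference, gamma enters through the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)). With a single reward at the goal, gamma controls how much of that reward is passed back to earlier states. The sketch below shows one such update; the function name `q_update` and the default `alpha=0.8` are assumptions, since the actual implementation inside `agent_training` is not shown here.

```python
def q_update(qtable, state, action, reward, next_state, alpha=0.8, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(qtable[next_state])      # value of the best action in the next state
    td_target = reward + gamma * best_next   # reward plus discounted future value
    qtable[state][action] += alpha * (td_target - qtable[state][action])
    return qtable[state][action]


# Toy example: two states, two actions; moving from state 0 to state 1 pays 1
q = [[0.0, 0.0], [0.0, 0.0]]
q_update(q, state=0, action=1, reward=1.0, next_state=1)
print(q[0][1])
```

The same indexing works on the NumPy Q-table used in this notebook, because `qtable[state]` returns a view of the row.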

In [21]:
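# Training one agent per gamma value; qtable.copy() would isolate the runs if agent_training updates its input in place
trained_qtables = [agent_training(env, qtable, gamma=gamma) for gamma in [0.9, 0.95, 1]]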

In [23]:
scores = [agent_testing(env, qtable=trained_qtable) for trained_qtable in trained_qtables]

In [24]:
scores

['0.203', '0.0', '0.0']

Next, I'll try gamma values between the default and 0.9, keeping 0.9 for comparison.

In [36]:
# Training one agent per gamma value
trained_qtables = [agent_training(env, qtable, gamma=gamma) for gamma in [0.7, 0.75, 0.85, 0.9]]

In [37]:
scores = [agent_testing(env, qtable=trained_qtable) for trained_qtable in trained_qtables]

In [38]:
scores

['0.0', '0.117', '0.03', '0.134']

I will keep using 0.9 as the gamma value given the scores, and I will try lower decay rates for epsilon so that the agent keeps exploring for longer. I will also reduce the number of training episodes, since the default of 50000 was chosen for Taxi, which is a bigger environment.
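
To get a feel for what the decay rate does, here is a sketch of an exponential epsilon schedule, which is a common choice for epsilon-greedy training. Whether `agent_training` uses exactly this formula is an assumption, as are the function name `epsilon_at` and the bounds `max_epsilon=1.0` and `min_epsilon=0.01`.

```python
import math


def epsilon_at(episode, decay_rate, max_epsilon=1.0, min_epsilon=0.01):
    """Exploration probability after a given number of episodes (assumed schedule)."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)


# Compare how much exploration is left after 1000 episodes for each decay rate
for decay_rate in [0.05, 0.025, 0.001, 0.0005]:
    print(decay_rate, round(epsilon_at(1000, decay_rate), 3))
```

Under this schedule, a smaller decay rate keeps epsilon high for more episodes, so the agent takes random actions for longer before it starts exploiting the Q-table.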

In [45]:
# Training one agent per decay rate, with fewer episodes and gamma fixed at 0.9
trained_qtables = [agent_training(env, qtable, total_episodes=10000, gamma=0.9, decay_rate=decay_rate)
                   for decay_rate in [0.05, 0.025, 0.001, 0.0005]]

In [46]:
scores = [agent_testing(env, qtable=trained_qtable) for trained_qtable in trained_qtables]

In [47]:
scores

['0.0', '0.0', '0.099', '0.045']