# Frozen Lake 

https://medium.com/swlh/introduction-to-reinforcement-learning-coding-q-learning-part-3-9778366a41c0

### Description

The lake is a 4x4 grid of tiles: S is the starting point, G is the goal, F is frozen ice the agent can safely stand on, and H is a hole that ends the episode if the agent steps into it.

The agent has 4 possible moves, which the environment encodes as 0, 1, 2, 3 for left, down, right, up respectively.

Every step onto an F tile gives 0 reward, and reaching G gives +1 reward. Falling into a hole H ends the episode; note that Gym's built-in FrozenLake-v0 also returns 0 reward in that case rather than a -1 penalty.
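As a small illustration of how the integer observations printed by the environment relate to the grid, here is a sketch that assumes states are numbered row by row from the top-left corner of the default 4x4 map (so 0 is S and 15 is G). The names `LAKE_MAP`, `state_to_pos`, and `tile_at` are made up for this example and are not part of Gym.

```python
# Default 4x4 layout, as printed by env.render()
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(LAKE_MAP[0])

def state_to_pos(state):
    """Convert a state index into (row, col), assuming row-major numbering."""
    return divmod(state, N_COLS)

def tile_at(state):
    """Return the tile letter (S, F, H or G) under the given state index."""
    row, col = state_to_pos(state)
    return LAKE_MAP[row][col]

# Show the position and tile type for a few state indices
for s in (0, 5, 6, 15):
    print(s, state_to_pos(s), tile_at(s))
```

Under this numbering, an observation of 6 means the agent is in the second row, third column, which is an F tile.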

In [1]:
import gym
import numpy as np
import time, pickle, os
import matplotlib.pyplot as plt

In [3]:
# A quick run of the FrozenLake environment with random actions
# prints each observation (the agent's current state index)
env = gym.make("FrozenLake-v0")
for i_episode in range(10):
    observation = env.reset()
    for t in range(50):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()

SFFF
FHFH
FFFH
HFFG
0
  (Up)
SFFF
FHFH
FFFH
HFFG
0
  (Up)
SFFF
FHFH
FFFH
HFFG
1
Episode finished after 3 timesteps
...

## Solving the problem

Let's start by importing the libraries we need and initializing the Frozen Lake environment.

In [None]:
import gym
import numpy as np
import time, pickle, os

env = gym.make('FrozenLake-v0')

#### Initialize Variables

The next cell initializes our variables: epsilon is the exploration rate for the epsilon-greedy approach, gamma is the discount factor, total_episodes is the number of episodes we'll run, max_steps is the maximum number of steps per episode, and lr_rate is the learning rate.

Its last line initializes our Q-table as a 16x4 matrix filled with zeros. env.observation_space.n gives the total number of states in the game and env.action_space.n gives the total number of actions.
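To make the shape of the Q-table concrete, here is a minimal sketch that builds it without Gym, assuming the 16 states and 4 actions of the 4x4 map. The names `n_states`, `n_actions`, and `Q_demo` are illustrative only.

```python
import numpy as np

n_states, n_actions = 16, 4   # assumed sizes for the 4x4 Frozen Lake
Q_demo = np.zeros((n_states, n_actions))

# One row per state, one column per action
print(Q_demo.shape)

# Q_demo[s, a] holds the current value estimate for taking action a in state s
Q_demo[14, 2] = 0.5
best_action = int(np.argmax(Q_demo[14, :]))
print(best_action)
```

Reading a row of the table and taking its argmax is how the greedy action for a state is chosen.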

In [None]:
epsilon = 0.9
total_episodes = 10000
max_steps = 100

lr_rate = 0.81
gamma = 0.96

Q = np.zeros((env.observation_space.n, env.action_space.n))
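To get a feel for gamma, the sketch below computes a discounted return, where a reward received k steps in the future is weighted by gamma**k. The helper `discounted_return` is a hypothetical function written for this illustration and is not used by the training code.

```python
def discounted_return(rewards, gamma=0.96):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    return sum(r * gamma ** k for k, r in enumerate(rewards))

# Reaching the goal (+1) on the third step is worth slightly less than 1
print(discounted_return([0, 0, 1]))

# The same reward ten steps later is worth less still
print(discounted_return([0] * 10 + [1]))
```

Because later rewards count for less, the agent is nudged toward shorter paths to the goal.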

In [None]:
    
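def choose_action(state):
    # Epsilon-greedy: with probability epsilon take a random action (explore),
    # otherwise take the action with the highest Q-value in this state (exploit).
    if np.random.uniform(0, 1) < epsilon:
        action = env.action_space.sample()
    else:
        action = np.argmax(Q[state, :])
    return action

def learn(state, state2, reward, action):
    # Q-learning update: move Q[state, action] toward the bootstrapped target
    # reward + gamma * max_a Q[state2, a], scaled by the learning rate.
    predict = Q[state, action]
    target = reward + gamma * np.max(Q[state2, :])
    Q[state, action] = Q[state, action] + lr_rate * (target - predict)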
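As a worked example of the update in learn, the following sketch applies it by hand to a fresh Q-table, using the same lr_rate and gamma values as above. The transitions are hypothetical: a step from state 14 with action 2 that lands on the goal (state 15) with reward 1, followed by a step from state 13 into state 14 with reward 0. The names `Q_demo` and `q_update` are for illustration only.

```python
import numpy as np

lr_rate, gamma = 0.81, 0.96
Q_demo = np.zeros((16, 4))

def q_update(Q, state, action, reward, state2):
    """Apply one Q-learning update in place."""
    target = reward + gamma * np.max(Q[state2, :])
    Q[state, action] += lr_rate * (target - Q[state, action])

# Step into the goal: target = 1 + 0.96 * 0 = 1, so Q moves 81% of the way to 1
q_update(Q_demo, state=14, action=2, reward=1.0, state2=15)
print(Q_demo[14, 2])

# One step earlier: target = 0 + 0.96 * Q_demo[14, 2], so value propagates backward
q_update(Q_demo, state=13, action=2, reward=0.0, state2=14)
print(Q_demo[13, 2])
```

The second update shows how the goal reward gradually spreads to states further from the goal, even though those transitions themselves pay nothing.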

In [None]:
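# Train the agent with Q-learning
for episode in range(total_episodes):
    state = env.reset()
    t = 0

    while t < max_steps:
        # Rendering (and the sleep below) is only for watching the agent;
        # remove both to make training much faster.
        env.render()

        action = choose_action(state)

        state2, reward, done, info = env.step(action)

        # Update the Q-table from this transition
        learn(state, state2, reward, action)

        state = state2

        t += 1

        if done:
            break

        time.sleep(0.1)

# Print the learned Q-table
print(Q)

# Save the Q-table so it can be reloaded later
with open("frozenLake_qTable.pkl", 'wb') as f:
    pickle.dump(Q, f)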
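Once training has finished, the saved table can be loaded back and turned into a greedy policy. The sketch below is self-contained, so it first pickles a small stand-in table to a temporary file; in practice you would open frozenLake_qTable.pkl instead. The names `stand_in_Q`, `loaded_Q`, `policy`, and the file name `qTable_demo.pkl` are assumptions made for this example.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for a trained table: 16 states x 4 actions with arbitrary values
rng = np.random.default_rng(0)
stand_in_Q = rng.random((16, 4))

path = os.path.join(tempfile.mkdtemp(), "qTable_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(stand_in_Q, f)

# Load the table back, as you would with the file saved by the training cell
with open(path, "rb") as f:
    loaded_Q = pickle.load(f)

# Greedy policy: in every state, pick the action with the highest Q-value
policy = np.argmax(loaded_Q, axis=1)
print(policy.reshape(4, 4))
```

Reshaping the policy to 4x4 lines the chosen action for each tile up with the map layout.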