# Frozen lake
The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others cause the agent to fall into the water. In addition, the agent's movement direction is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

The surface is described using a grid like the following:

    SFFF
    FHFH
    FFFH
    HFFG

- S: starting point, safe
- F: frozen surface, safe
- H: hole, fall to your doom
- G: goal, where the frisbee is located

The episode ends when you reach the goal or fall into a hole. You receive a reward of 1 if you reach the goal, and zero otherwise.

Possible actions:

- LEFT = 0
- DOWN = 1
- RIGHT = 2
- UP = 3
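The map description above translates directly into a small data structure. The sketch below is plain Python with no gym dependency. It parses the 4x4 grid and lists the start, hole, and goal states. The names `MAP_4X4`, `classify_tiles` and `reward` are hypothetical. Numbering states row by row (`row * ncol + col`) is an assumption, although it agrees with the outputs later in this notebook: moving DOWN from S gives state 4, and the hole in the second row is state 5.

```python
# The default 4x4 map, one string per row, copied from the description above.
MAP_4X4 = ["SFFF",
           "FHFH",
           "FFFH",
           "HFFG"]


def classify_tiles(grid):
    """Return (start, holes, goal) as state indices.

    Assumption: states are numbered row by row, state = row * ncol + col,
    which matches the 0..15 observation indices printed in this notebook.
    """
    ncol = len(grid[0])
    start, holes, goal = None, [], None
    for row, line in enumerate(grid):
        for col, tile in enumerate(line):
            state = row * ncol + col
            if tile == "S":
                start = state
            elif tile == "H":
                holes.append(state)
            elif tile == "G":
                goal = state
    return start, holes, goal


start, holes, goal = classify_tiles(MAP_4X4)
terminal = set(holes) | {goal}  # the episode ends on any of these tiles


def reward(next_state):
    """Reward of 1 only for stepping onto the goal, zero otherwise."""
    return 1.0 if next_state == goal else 0.0


print("start:", start, "holes:", holes, "goal:", goal)
# → start: 0 holes: [5, 7, 11, 12] goal: 15
```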

In [74]:
import gym
env = gym.make('FrozenLake-v0')
env.reset()
env.render()


[41mS[0mFFF
FHFH
FFFH
HFFG


In [75]:
# Action Space and Observation Space
# Action Space Bound - [0,3]
# Observation Space Bound - [0,15]
action_space = env.action_space.n
obs_space = env.observation_space.n
print("Action Space Count: "+str(action_space))
print("Observation Space Count: "+str(obs_space))

Action Space Count: 4
Observation Space Count: 16
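The 16 observations are the 16 cells of the 4x4 grid, and the 4 actions are the four directions. The following sketch converts between a state index and `(row, col)` and applies an intended, non-slippery move. The helper names are hypothetical. The assumption that walking into an edge leaves the agent in place reflects my understanding of FrozenLake's behaviour and is not shown in this notebook.

```python
NROW, NCOL = 4, 4                 # 4 x 4 grid -> 16 observations
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # the 4 actions


def to_coords(state):
    """Convert a state index (0..15) into (row, col)."""
    return divmod(state, NCOL)


def to_state(row, col):
    """Convert (row, col) back into a state index."""
    return row * NCOL + col


def move(state, action):
    """Intended (non-slippery) move.

    Assumption: moving into an edge keeps the agent on its current tile.
    """
    row, col = to_coords(state)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, NROW - 1)
    elif action == RIGHT:
        col = min(col + 1, NCOL - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return to_state(row, col)


print(NROW * NCOL)                                    # 16
print(to_coords(15))                                  # (3, 3): the goal tile
print([move(0, a) for a in (LEFT, DOWN, RIGHT, UP)])  # [0, 4, 1, 0]
```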


### Manual Approach
A demonstration of each action the agent can take in the environment.

In [76]:
print("Initial State:")
env.reset()
env.render()

Initial State:

[41mS[0mFFF
FHFH
FFFH
HFFG


In [77]:
state,reward,done,info = env.step(0)
env.render()
print("\nState: "+str(state)+"\nReward: "+str(reward)+"\nDone: "+str(done)+"\nInfo: "+str(info))

  (Left)
[41mS[0mFFF
FHFH
FFFH
HFFG

State: 0
Reward: 0.0
Done: False
Info: {'prob': 0.3333333333333333}


In [78]:
state,reward,done,info = env.step(1)
env.render()
print("\nState: "+str(state)+"\nReward: "+str(reward)+"\nDone: "+str(done)+"\nInfo: "+str(info))

  (Down)
SFFF
[41mF[0mHFH
FFFH
HFFG

State: 4
Reward: 0.0
Done: False
Info: {'prob': 0.3333333333333333}


In [79]:
state,reward,done,info = env.step(2)
env.render()
print("\nState: "+str(state)+"\nReward: "+str(reward)+"\nDone: "+str(done)+"\nInfo: "+str(info))

  (Right)
[41mS[0mFFF
FHFH
FFFH
HFFG

State: 0
Reward: 0.0
Done: False
Info: {'prob': 0.3333333333333333}


In [80]:
state,reward,done,info = env.step(3)
env.render()
print("\nState: "+str(state)+"\nReward: "+str(reward)+"\nDone: "+str(done)+"\nInfo: "+str(info))

  (Up)
[41mS[0mFFF
FHFH
FFFH
HFFG

State: 0
Reward: 0.0
Done: False
Info: {'prob': 0.3333333333333333}


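Here we can see that the agent does not always move in the chosen direction. For example, choosing RIGHT from state 4 moved the agent up to state 0. Because the ice is slippery, the direction in which the agent moves is probabilistic rather than deterministic, and `info` reports a probability of 1/3 for each of the transitions above. Solving the problem manually is therefore difficult, as it would take many steps.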
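The `prob` of 0.333 in every output is consistent with a slip rule in which the agent moves in the chosen direction or in one of the two perpendicular directions, each with probability 1/3. The sketch below encodes that rule as an assumption and computes the distribution over next states. It is based on my understanding of slippery FrozenLake, not on anything printed here. The function names are hypothetical, and the edge clamping is also assumed. With actions numbered LEFT=0, DOWN=1, RIGHT=2, UP=3, the perpendicular directions of `a` are `(a - 1) % 4` and `(a + 1) % 4`.

```python
from collections import defaultdict
from fractions import Fraction

NROW, NCOL = 4, 4
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def move(state, action):
    """Intended move on the grid; edges clamp (assumption)."""
    row, col = divmod(state, NCOL)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, NROW - 1)
    elif action == RIGHT:
        col = min(col + 1, NCOL - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row * NCOL + col


def slippery_outcomes(state, action):
    """Next-state distribution under the assumed slip rule.

    The chosen direction and its two perpendicular directions are each
    taken with probability 1/3. Outcomes landing on the same tile are merged.
    """
    dist = defaultdict(Fraction)
    for actual in ((action - 1) % 4, action, (action + 1) % 4):
        dist[move(state, actual)] += Fraction(1, 3)
    return dict(dist)


# RIGHT from state 4 (the In [79] cell) can end in 8, 5 (a hole) or 0.
print(slippery_outcomes(4, RIGHT))  # {8: Fraction(1, 3), 5: Fraction(1, 3), 0: Fraction(1, 3)}
# LEFT from the start keeps the agent at 0 two times out of three.
print(slippery_outcomes(0, LEFT))   # {0: Fraction(2, 3), 4: Fraction(1, 3)}
```

Under this rule the agent stayed at state 0 after LEFT in the In [77] cell, which is the most likely outcome for that action.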

### Random Agent
The agent takes random actions sampled from the action space until the episode is done, that is, until it reaches the goal or falls into a hole.

In [110]:
env.reset()
print("Initial State:")
env.render()
reward = 0
count = 0
done = False
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    count += 1
print("\nFinal State:")
env.render()
print("\nTotal Steps: "+str(count)+"\nState: "+str(state)+"\nReward: "+str(reward)+"\nDone: "+str(done)+"\nInfo: "+str(info))

Initial State:

[41mS[0mFFF
FHFH
FFFH
HFFG

Final State:
  (Down)
SFFF
F[41mH[0mFH
FFFH
HFFG

Total Steps: 13
State: 5
Reward: 0.0
Done: True
Info: {'prob': 0.3333333333333333}


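Here the agent took 13 steps and fell into the hole at state 5, so the reward is zero. A random agent ignores both the map and the transition probabilities, so it reaches the goal only occasionally, by chance. It cannot solve the problem reliably in this probabilistic environment. The problem can instead be solved by the Value Iteration Agent.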
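A single episode says little on its own, so one way to support the claim above is to estimate the random agent's success rate over many episodes. The sketch below uses a self-contained simulator rather than gym. It rests on these assumptions:

- the hole and goal states read off the map (5, 7, 11, 12 and 15);
- the 1/3 slip rule;
- edge clamping;
- a 100-step episode limit, which I believe is the limit gym registers for FrozenLake-v0.

`random_episode` and `success_rate` are hypothetical helpers.

```python
import random

NROW, NCOL = 4, 4
HOLES, GOAL = {5, 7, 11, 12}, 15
MAX_STEPS = 100  # assumed episode limit for FrozenLake-v0


def move(state, action):
    """0=LEFT, 1=DOWN, 2=RIGHT, 3=UP; edges clamp (assumption)."""
    row, col = divmod(state, NCOL)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, NROW - 1)
    elif action == 2:
        col = min(col + 1, NCOL - 1)
    else:
        row = max(row - 1, 0)
    return row * NCOL + col


def random_episode(rng):
    """Run one episode with uniformly random actions; return the reward (0 or 1)."""
    state = 0
    for _ in range(MAX_STEPS):
        action = rng.randrange(4)                       # like env.action_space.sample()
        actual = (action + rng.choice((-1, 0, 1))) % 4  # slip: intended or perpendicular, 1/3 each
        state = move(state, actual)
        if state == GOAL:
            return 1
        if state in HOLES:
            return 0
    return 0  # step limit reached without finding the goal


def success_rate(episodes=10_000, seed=0):
    """Fraction of random episodes that reach the goal."""
    rng = random.Random(seed)
    return sum(random_episode(rng) for _ in range(episodes)) / episodes


print(f"random agent success rate: {success_rate():.3f}")
```

The estimated rate comes out low, which fits the observation that a random agent reaches the goal only by chance. Because the sampled action is already uniform, adding the slip does not change the distribution of actual moves. The slip is kept only so that the loop mirrors the environment step by step.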