# Solving `FrozenLake-v1`: the Markov Decision Process Approach

<br>
<iframe width="800" height="600" src="https://www.gymlibrary.dev/environments/toy_text/frozen_lake/"></iframe>
<br>

## The Markov Decision Process

<br>
<iframe width="560" height="315" src="https://www.youtube.com/embed/sJIFUTITfBc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<br>

## Value Iteration

Notes:

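* The value iteration loop below never changes the values of terminal states (the holes and the goal): in the environment's transition table they are zero-reward self-loops, so each one keeps whatever value it starts with. Their values must therefore be initialized to zero.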

* Since $\gamma = 1$, the only non-zero reward is the 1 for reaching the gift, and that reward can be collected only once, the value of a state reflects the probability of eventually reaching the gift (instead of ending up in a hole) when acting optimally from that state.
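
Both notes can be checked on a toy problem. The sketch below is not the FrozenLake environment: it is a hypothetical 3-state MDP (the names `START`, `HOLE`, `GOAL` and the function `value_iteration` are made up for illustration) stored in the same `P[state][action] -> [(probability, next_state, reward, done), ...]` layout. It applies the update $V(s) \leftarrow \max_a \sum_{s'} p(s' \mid s, a)\,[r + \gamma V(s')]$ with $\gamma = 1$.

```python
# Hypothetical 3-state MDP, laid out like FrozenLake's transition table:
# P[state][action] -> list of (probability, next_state, reward, done).
START, HOLE, GOAL = 0, 1, 2
P = {
    START: {
        # Action 0: a coin flip between the goal and the hole.
        0: [(0.5, GOAL, 1.0, True), (0.5, HOLE, 0.0, True)],
        # Action 1: mostly reaches the goal, sometimes slips or stays put.
        1: [(0.8, GOAL, 1.0, True), (0.1, HOLE, 0.0, True), (0.1, START, 0.0, False)],
    },
    # Terminal states are zero-reward self-loops.
    HOLE: {0: [(1.0, HOLE, 0.0, True)], 1: [(1.0, HOLE, 0.0, True)]},
    GOAL: {0: [(1.0, GOAL, 0.0, True)], 1: [(1.0, GOAL, 0.0, True)]},
}


def value_iteration(P, gamma=1.0, sweeps=50, v_init=0.0):
    """Run a fixed number of synchronous Bellman optimality sweeps."""
    V = {s: v_init for s in P}
    for _ in range(sweeps):
        V = {
            s: max(
                sum(p * (r + gamma * V[s_next]) for p, s_next, r, done in P[s][a])
                for a in P[s]
            )
            for s in P
        }
    return V


# Zero initialization: V[START] is the best achievable probability of
# reaching the goal, 0.8 / (0.8 + 0.1) = 8/9 with action 1.
V = value_iteration(P)
print(round(V[START], 4), V[HOLE], V[GOAL])  # 0.8889 0.0 0.0

# Non-zero initialization: the terminal states keep the value 0.5 forever,
# and V[START] is no longer a probability (it exceeds 1).
V_bad = value_iteration(P, v_init=0.5)
print(round(V_bad[START], 4), V_bad[HOLE])  # 1.3889 0.5
```

With zero initialization the start value matches the hand-computed probability of 8/9. With a non-zero initialization the terminal values never move, which inflates every value that depends on them.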

```python
import gym
import numpy as np

# Load the FrozenLake-v1 environment
env = gym.make('FrozenLake-v1',
               desc=None,
               map_name="4x4",
               is_slippery=True)

# Transition table: P[s][a] is a list of (probability, next_state, reward, done).
# Read it from the unwrapped environment, since gym.make() returns wrappers.
P = env.unwrapped.P
n_states = env.observation_space.n
n_actions = env.action_space.n

# Set the discount factor (gamma)
gamma = 1.0

# Number of sweeps to run (fixed; there is no convergence test)
max_iterations = 100

# Initialize the value function with all zeros
V = np.zeros(n_states)

# Start the value iteration loop
for i in range(max_iterations):
    # Values computed in this sweep, using only the previous sweep's V
    V_updated = np.zeros(n_states)

    # Iterate over all states
    for s in range(n_states):
        # Expected return of each action taken from state s
        values = []

        # Iterate over all actions
        for a in range(n_actions):
            # Initialize the value for the state-action pair to be 0
            value = 0

            # Accumulate over all possible transitions
            for p, s_prime, r, _ in P[s][a]:
                value += p * (r + gamma * V[s_prime])

            values.append(value)

        # The state's value is the value of its best action
        V_updated[s] = max(values)

    # Set the updated value function as the current value function
    V = V_updated

# Print the final value function
print('The Value Functions')
print(V.reshape(4, 4))
```

```text
The Value Functions
[[0.74419029 0.71786905 0.69921264 0.68954284]
 [0.74998193 0.         0.47290225 0.        ]
 [0.7611395  0.7768436  0.72358054 0.        ]
 [0.         0.84920568 0.9239777  0.        ]]
```
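
A value function is usually turned into a policy by picking, in each state, the action with the best one-step lookahead. The sketch below shows one way to do that. The function name `greedy_policy` and the toy MDP are assumptions made for illustration, not part of the original notebook. The toy table uses the same `(probability, next_state, reward, done)` layout as FrozenLake's transition table.

```python
def greedy_policy(P, V, gamma=1.0):
    """Return, for every state, the action with the highest one-step lookahead value."""
    policy = {}
    for s in P:
        q = {
            a: sum(p * (r + gamma * V[s_next]) for p, s_next, r, done in P[s][a])
            for a in P[s]
        }
        # On ties, max() keeps the first action listed in P[s].
        policy[s] = max(q, key=q.get)
    return policy


# Hypothetical 3-state MDP: 0 = start, 1 = hole, 2 = goal.
P_toy = {
    0: {
        0: [(0.5, 2, 1.0, True), (0.5, 1, 0.0, True)],
        1: [(0.8, 2, 1.0, True), (0.1, 1, 0.0, True), (0.1, 0, 0.0, False)],
    },
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

# Optimal values of the toy MDP for gamma = 1 (start state: 8/9).
V_toy = [8 / 9, 0.0, 0.0]

policy = greedy_policy(P_toy, V_toy)
print(policy)  # {0: 1, 1: 0, 2: 0}
```

Under the same assumptions, the call for the lake would be `greedy_policy(env.unwrapped.P, V)`, with FrozenLake's action encoding of 0 = left, 1 = down, 2 = right, 3 = up. The actions reported for holes and the goal are meaningless, since every action ties at those states.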
