<a href="https://colab.research.google.com/github/Qitong323/Econ-CS-206/blob/main/Code%20Assignment%201%3A%20The%20computational%20pipline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Reinforcement Learning for Fun**

Citation (Chicago): 
Mendes, Rodolfo. “Gym Tutorial: The Frozen Lake.” Reinforcement Learning for Fun. June 16, 2019. https://reinforcement-learning4.fun/2019/06/16/gym-tutorial-frozen-lake/.

# **Gym Tutorial: The Frozen Lake**

In this article, we are going to learn how to create and explore the Frozen Lake environment using the [Gym library](https://gym.openai.com/), an open source project created by [OpenAI](https://openai.com/) and used for reinforcement learning experiments. The Gym library defines a uniform interface for environments, which makes the integration between algorithms and environments easier for developers. Among many ready-to-use environments, the default installation includes a text-mode version of the Frozen Lake game, used as an example in our last post.

## **The Frozen Lake Environment**

The first step to create the game is to import the Gym library and create the environment. The code below shows how to do it:

In [None]:
# frozen-lake-ex1.py
import gym # loading the Gym library
 
env = gym.make("FrozenLake-v0")
env.reset()                    
env.render()


[41mS[0mFFF
FHFH
FFFH
HFFG


The first instruction imports the Gym library into our current namespace. The next line calls the method *gym.make()* to create the Frozen Lake environment, and then we call the method *env.reset()* to put it in its initial state. Finally, we call the method *env.render()* to print its state:
<p align="center">
  <img src="https://launchyourintelligentapphome.files.wordpress.com/2019/06/screen-shot-2019-06-14-at-23.20.33.png" />
</p >
<br><center><i>Output of the method env.render()</i></center><br>

So, the same grid we saw in the [previous post](https://reinforcement-learning4.fun/2019/06/09/introduction-reinforcement-learning-frozen-lake-example/) is now represented by a matrix of characters. Their meaning is as follows:



*   **S**: initial state
*   **F**: frozen lake
*   **H**: hole
*   **G**: the goal
*   **Red square**: indicates the current position of the player
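
To make the legend concrete, here is a minimal sketch in plain Python that reads the default 4×4 map shown above and locates each kind of tile. It does not use Gym at all; the names `GRID`, `TILE_MEANING` and `find_tiles` are invented for this illustration.

```python
# Standalone sketch: plain Python, no Gym required.
# GRID copies the default 4x4 map printed above; all names are illustrative.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]

TILE_MEANING = {
    "S": "initial state",
    "F": "frozen lake",
    "H": "hole",
    "G": "goal",
}


def find_tiles(grid, tile):
    """Return the (row, col) positions of every cell holding `tile`."""
    return [
        (row, col)
        for row, line in enumerate(grid)
        for col, char in enumerate(line)
        if char == tile
    ]


start = find_tiles(GRID, "S")[0]
goal = find_tiles(GRID, "G")[0]
holes = find_tiles(GRID, "H")

print("Start:", start)
print("Goal:", goal)
print("Holes:", holes)
```

Rows are counted from the top and columns from the left, so the start tile sits in the top-left corner and the goal in the bottom-right corner.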

Also, we can inspect the possible actions to perform in the environment, as well as the possible states of the game:

In [None]:
# frozen-lake-ex1.py
 
print("Action space: ", env.action_space)
print("Observation space: ", env.observation_space)


Action space:  Discrete(4)
Observation space:  Discrete(16)


In the code above, we print the fields *action_space* and *observation_space* to the console. The returned objects are of the type *Discrete*, which describes a discrete space of size n. For example, the *action_space* for the Frozen Lake environment is a discrete space of 4 values, which means that the possible values for this space are 0, 1, 2 and 3. Likewise, the *observation_space* is a discrete space of 16 values, ranging from 0 to 15. These objects also offer some utility methods, like the *sample()* method, which returns a random value from the space. With this method, we can easily create a dummy agent that plays the game randomly:

In [None]:
# frozen-lake-ex2.py
import gym
 
MAX_ITERATIONS = 10
 
env = gym.make("FrozenLake-v0")
env.reset()
env.render()
for i in range(MAX_ITERATIONS):
    random_action = env.action_space.sample()
    new_state, reward, done, info = env.step(random_action)
    env.render()
    if done:
        break


[41mS[0mFFF
FHFH
FFFH
HFFG
  (Down)
S[41mF[0mFF
FHFH
FFFH
HFFG
  (Down)
SFFF
F[41mH[0mFH
FFFH
HFFG


The code above executes the game for a maximum of 10 iterations using the method *sample()* from the *action_space* object to select a random action. Then the *env.step()* method takes the action as input, executes the action on the environment and returns a tuple of four values:

*   **new_state**: the new state of the environment
*   **reward**: the reward
*   **done**: a boolean flag indicating if the returned state is a terminal state
*   **info**: an object with additional information for debugging purposes

Finally, we use the method *env.render()* to print the grid on the console and use the returned “done” flag to break the loop. Notice that the selected action is printed together with the grid:
<p align="center">
  <img src="https://launchyourintelligentapphome.files.wordpress.com/2019/06/screen-shot-2019-06-16-at-08.35.24.png" />
</p >
<br><center><i>Output of successive calls to the env.render() method, after selecting an action to execute</i></center><br>
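
To see what happens on the other side of *env.step()*, the following sketch builds a tiny stand-in environment in plain Python and plays it with a random agent. It is not Gym's implementation: the class `TinyLake`, the helper `run_random_episode` and the contents of `info` are hypothetical, the surface is deterministic (no slipping), and states are assumed to be numbered row by row starting at 0.

```python
import random

# Hypothetical stand-in for illustration only; this is NOT Gym's implementation.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
# Action encoding used in this tutorial: 0=Left, 1=Down, 2=Right, 3=Up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


class TinyLake:
    """Deterministic grid world exposing reset() and a four-value step()."""

    def __init__(self, grid):
        self.grid = grid
        self.nrows = len(grid)
        self.ncols = len(grid[0])
        self.n_actions = len(MOVES)
        self.state = 0

    def reset(self):
        self.state = 0  # the start tile S is in the top-left corner
        return self.state

    def step(self, action):
        row, col = divmod(self.state, self.ncols)
        d_row, d_col = MOVES[action]
        # A move that would leave the grid keeps the player at the edge.
        row = min(max(row + d_row, 0), self.nrows - 1)
        col = min(max(col + d_col, 0), self.ncols - 1)
        self.state = row * self.ncols + col
        tile = self.grid[row][col]
        reward = 1.0 if tile == "G" else 0.0
        done = tile in "HG"  # holes and the goal are terminal
        info = {"tile": tile}  # assumed debugging payload
        return self.state, reward, done, info


def run_random_episode(env, max_iterations, rng):
    """Play random actions until the episode ends or the budget runs out."""
    state = env.reset()
    history = [state]
    total_reward = 0.0
    for _ in range(max_iterations):
        action = rng.randrange(env.n_actions)  # plays the role of sample()
        state, reward, done, info = env.step(action)
        history.append(state)
        total_reward += reward
        if done:
            break
    return history, total_reward


history, total_reward = run_random_episode(TinyLake(GRID), 10, random.Random(0))
print("Visited states:", history)
print("Total reward:", total_reward)
```

The loop has the same shape as the random agent above: sample an action, pass it to `step()`, and stop as soon as `done` is true.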

## **Stochastic vs Deterministic**

Note in the previous output the cases in which the player moves in a different direction than the one chosen by the agent. This behavior is completely normal in the Frozen Lake environment because it simulates a slippery surface. Also, this behavior represents an important characteristic of real-world environments: the transitions from one state to another, for a given action, are probabilistic. For example, if we shoot a bow and arrow there’s a chance to hit the target as well as to miss it. The distribution between these two possibilities will depend on our skill and other factors, like the direction of the wind, for example. Due to this probabilistic nature, the final result of a state transition does not depend entirely on the taken action.

By default, the Frozen Lake environment provided in Gym has probabilistic transitions between states. In other words, even when our agent chooses to move in one direction, the environment can execute a movement in another direction:

In [None]:
# frozen-lake-ex3.py
import gym
 
actions = {
    'Left': 0,
    'Down': 1,
    'Right': 2, 
    'Up': 3
}
 
print('---- winning sequence ------ ')
winning_sequence = (
    (2 * ['Right']) + (3 * ['Down']) + ['Right']
)
print(winning_sequence)
 
env = gym.make("FrozenLake-v0")
env.reset()
env.render()
 
for a in winning_sequence:
    new_state, reward, done, info = env.step(actions[a])
    print()
    env.render()
    print("Reward: {:.2f}".format(reward))
    print(info)
    if done:
        break  
 
print()

Executing the code above, we can observe different results and paths at each execution. Also, using the info object returned by the step method we can inspect the probability used by the environment to choose the executed movement:
<p align="center">
  <img src="https://launchyourintelligentapphome.files.wordpress.com/2019/06/screen-shot-2019-06-16-at-17.09.04.png?w=808&h=808" />
</p >
<br><center><i>The character moved in directions other than the selected one, with probability of 0.3333…</i></center><br>
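
The probability of 0.3333… can be illustrated with a small standalone sketch. It assumes the following slip model: the chosen direction and the two directions perpendicular to it are equally likely, and the opposite direction never happens. The function names are invented for this example, and the code does not call Gym.

```python
import random
from collections import Counter

ACTION_NAMES = ["Left", "Down", "Right", "Up"]  # indices 0..3, as in the tutorial


def slippery_outcomes(action):
    """List (probability, executed_action) pairs for a chosen action.

    Assumed model: the chosen direction and its two perpendicular
    neighbours each have probability 1/3.
    """
    return [(1.0 / 3.0, (action + shift) % 4) for shift in (-1, 0, 1)]


def sample_executed_action(action, rng):
    """Draw the direction that is actually executed on the slippery surface."""
    outcomes = slippery_outcomes(action)
    weights = [prob for prob, _ in outcomes]
    executed = [act for _, act in outcomes]
    return rng.choices(executed, weights=weights, k=1)[0]


down = ACTION_NAMES.index("Down")
for prob, executed in slippery_outcomes(down):
    print("Chosen Down -> executed {} with probability {:.4f}".format(
        ACTION_NAMES[executed], prob))

rng = random.Random(42)
counts = Counter(
    ACTION_NAMES[sample_executed_action(down, rng)] for _ in range(3000)
)
print(counts)
```

With the action encoding used here, adding -1, 0 or +1 (modulo 4) to an action index yields exactly the chosen direction and its two perpendicular neighbours, which is why choosing Down can end in a move to the left or to the right but never up.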

However, the Frozen Lake environment can also be used in deterministic mode. By passing the argument *is_slippery=False* when creating the environment, the slippery surface is turned off and the environment always executes the action chosen by the agent:

In [None]:
# frozen-lake-ex4.py
env = gym.make("FrozenLake-v0", is_slippery=False)

Observe that the probabilities returned in the info object are always equal to 1.0.
<p align="center">
  <img src="https://launchyourintelligentapphome.files.wordpress.com/2019/06/screen-shot-2019-06-16-at-17.16.35.png?w=578&h=1174" />
</p >
<br><center><i>In deterministic mode, the agent always moves in the selected direction</i></center><br>
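
The difference between the two modes can also be estimated numerically. The sketch below replays the winning sequence many times on a plain-Python copy of the 4×4 map, once without slipping and once with an assumed slip model (chosen direction or one of its two perpendicular neighbours, each equally likely). The helpers `reaches_goal` and `success_rate` are hypothetical and independent of Gym, so the numbers only illustrate the idea.

```python
import random

GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # Left, Down, Right, Up
WINNING_SEQUENCE = [2, 2, 1, 1, 1, 2]  # Right, Right, Down, Down, Down, Right


def reaches_goal(actions, is_slippery, rng):
    """Replay `actions` from the start tile and report whether G is reached."""
    row, col = 0, 0
    nrows, ncols = len(GRID), len(GRID[0])
    for action in actions:
        if is_slippery:
            # Assumed slip model: chosen direction or one of the two
            # perpendicular directions, each equally likely.
            action = (action + rng.choice((-1, 0, 1))) % 4
        d_row, d_col = MOVES[action]
        row = min(max(row + d_row, 0), nrows - 1)
        col = min(max(col + d_col, 0), ncols - 1)
        tile = GRID[row][col]
        if tile == "G":
            return True
        if tile == "H":
            return False
    return False  # ran out of actions before reaching the goal


def success_rate(is_slippery, episodes=2000, seed=0):
    """Fraction of episodes in which the winning sequence reaches the goal."""
    rng = random.Random(seed)
    wins = sum(
        reaches_goal(WINNING_SEQUENCE, is_slippery, rng) for _ in range(episodes)
    )
    return wins / episodes


deterministic_rate = success_rate(is_slippery=False)
slippery_rate = success_rate(is_slippery=True)
print("Deterministic success rate: {:.3f}".format(deterministic_rate))
print("Slippery success rate:      {:.3f}".format(slippery_rate))
```

Without slipping, the fixed sequence reaches the goal in every episode. With slipping, the same sequence succeeds only occasionally, which is why a fixed plan is not enough in the stochastic setting.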

## **Map sizes and custom maps**

The default 4×4 map is not the only option for playing the Frozen Lake game. There is also an 8×8 version, which we can create in two different ways. The first is to use the specific environment id for the 8×8 map:

In [None]:
# frozen-lake-ex5.py
env = gym.make("FrozenLake8x8-v0")
env.reset()
env.render()


[41mS[0mFFFFFFF
FFFFFFFF
FFFHFFFF
FFFFFHFF
FFFHFFFF
FHHFFFHF
FHFFHFHF
FFFHFFFG


The second option is to call the *make* method, passing the value '8x8' as an argument to the *map_name* parameter:

In [None]:
# frozen-lake-ex5.py
env = gym.make('FrozenLake-v0', map_name='8x8')
env.reset()
env.render()


[41mS[0mFFFFFFF
FFFFFFFF
FFFHFFFF
FFFFFHFF
FFFHFFFF
FHHFFFHF
FHFFHFHF
FFFHFFFG
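
A larger map also means a larger observation space: an 8×8 grid has 64 states instead of 16. As a rough sketch, assuming that states are numbered row by row starting from the top-left corner, the conversion between a state index and a grid position looks like this (the two helper functions are illustrative and not part of Gym):

```python
def state_to_position(state, ncols):
    """Convert a state index into (row, col), assuming row-major numbering."""
    return divmod(state, ncols)


def position_to_state(row, col, ncols):
    """Convert (row, col) back into a state index."""
    return row * ncols + col


for size in (4, 8):
    n_states = size * size
    last = n_states - 1
    print("{0}x{0} map: {1} states, last state {2} is at {3}".format(
        size, n_states, last, state_to_position(last, size)))
```

In both maps shown above, the last state is the bottom-right corner, which is where the goal tile G is located.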


And finally, we can create our custom map of the Frozen Lake game by passing an array of strings representing the map as an argument to the parameter desc:

In [None]:
custom_map = [
    'SFFHF',
    'HFHFF',
    'HFFFH',
    'HHHFH',
    'HFFFG'
]
 
env = gym.make('FrozenLake-v0', desc=custom_map)
env.reset()
env.render()


[41mS[0mFFHF
HFHFF
HFFFH
HHHFH
HFFFG
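
When designing a custom map, it is worth checking that the goal can be reached at all. The sketch below is one possible check, written in plain Python with a breadth-first search; `has_safe_path` and `blocked_map` are names made up for this example, and the check ignores slipping.

```python
from collections import deque


def has_safe_path(grid):
    """Breadth-first search from S to G that never steps on a hole (H)."""
    nrows, ncols = len(grid), len(grid[0])
    start = next(
        (r, c) for r in range(nrows) for c in range(ncols) if grid[r][c] == "S"
    )
    queue = deque([start])
    seen = {start}
    while queue:
        row, col = queue.popleft()
        if grid[row][col] == "G":
            return True
        for d_row, d_col in ((0, -1), (1, 0), (0, 1), (-1, 0)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < nrows and 0 <= nxt[1] < ncols
            if inside and nxt not in seen and grid[nxt[0]][nxt[1]] != "H":
                seen.add(nxt)
                queue.append(nxt)
    return False


custom_map = ["SFFHF", "HFHFF", "HFFFH", "HHHFH", "HFFFG"]
blocked_map = ["SH", "HG"]  # hypothetical map with no safe route

print(has_safe_path(custom_map))
print(has_safe_path(blocked_map))
```

The custom map from this section passes the check, while the two-by-two `blocked_map` fails because both neighbours of the start tile are holes.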


## **Conclusion**

In this post, we learned how to use the Gym library to create an environment for training a reinforcement learning agent. We focused on the Frozen Lake environment, a text-mode game whose simple rules still allow us to explore the fundamental concepts of reinforcement learning.

## **References**

A brief introduction to reinforcement learning concepts can be found at [How AI Learns to Play Games](https://reinforcement-learning4.fun/2019/06/03/how-ai-learns-play-games/). The Frozen Lake game rules and fundamental concepts of reinforcement learning can be found at [Introduction to Reinforcement Learning: the Frozen Lake Example](https://reinforcement-learning4.fun/2019/06/09/introduction-reinforcement-learning-frozen-lake-example/). For instructions on how to install Gym, see the post [How to Install Gym](https://reinforcement-learning4.fun/2019/05/24/how-to-install-openai-gym/).

The code examples for this post can be found at https://github.com/rodmsmendes/reinforcementlearning4fun/tree/master/gym-tutorial-frozen-lake.





---


## **Answers to Research Questions**

*   **Research Question (RQ) 1**: How to experiment with the Frozen Lake environment in OpenAI Gym through the dichotomy of environment and agent? (2 points)
*   **Research Question (RQ) 2**: What are the general principles and references for lucid communication through professional markdown and code formatting? (2 points)
*   **Research Question (RQ) 3**: How to tell stories using the Frozen Lake environment in OpenAI Gym with lucid communication through professional markdown and code formatting in Jupyter Notebook? (2 points)

1.   To experiment with the environment, we first need to understand how it is composed. The Frozen Lake environment lays out a frozen lake as a grid map in which every cell is one of four tile types: start, frozen, hole, or goal. The agent chooses from a discrete action space of four moves (left, down, right, up) and, after each move, receives an observation (its new position) and a reward from the environment. In the examples above, an episode ends when the agent falls into a hole or reaches the goal. The dichotomy of environment and agent gives us two independent things to vary: the settings of the environment and the strategy (algorithm) of the agent. On the environment side, for instance, we can switch the slippery surface on or off with *is_slippery* and observe how the agent's behavior changes.

2.  Developers combine code cells with markdown cells to annotate their code and to present mathematical expressions and other structures, which keeps readers on track with the development of the code and its logic flow.

    The first principle of lucid communication is consistent styling. In headings, the number of # symbols sets the heading level. Italic and bold text are written with asterisks (\* for italics, \*\* for bold). Quotations go in block quotes, which start with a right angle bracket (>); repeating the bracket nests them. Beyond styling, the story should follow a standard format with smooth logic: divide the article into sections and list ideas in a deliberate order, with numbers or bullet points.

    For code formatting, include code where it helps demonstrate an idea or makes the notebook more interactive. A proper layout helps readers see the structure of the code, that is, which lines belong together and which are independent of each other. Readable code also depends on its content: each element, function, and variable should be named and written so that it makes sense to the reader. Finally, code comments should explain what the code does, so that readers can quickly understand what is going on whenever they return to the work.

3. The article above is a good example of how to tell the whole story. First, it has a clear logic flow: it starts by creating the Frozen Lake environment and develops step by step, from exploring the environment to playing the game, toward the conclusion. This gives readers a gentle learning curve and makes abstract concepts easier to digest by explaining them one at a time. Second, the article is well structured. A separate heading for each sub-section improves readability by breaking one large goal into small pieces that readers can complete one after another. Third, it combines code, hyperlinks, and screenshots of the output to make the story more vivid and understandable. Readers can run the code themselves or check the results to gain better insight into the concepts and the code. The hyperlinks also make further reading more accessible.