# This notebook sets up the Lunar Lander control task, a Box2D simulator environment from OpenAI Gym. In the next lecture, after we cover Q-learning, we will learn the optimal policy for this environment. It is excellent practice and preparation for the final course project. For more on the environment, visit:

https://www.gymlibrary.dev/environments/box2d/lunar_lander/

# Lunar Lander is part of the Box2D environments. 

![Screen%20Shot%202022-05-05%20at%201.52.40%20PM.png](attachment:Screen%20Shot%202022-05-05%20at%201.52.40%20PM.png)

# Before running this notebook you may have to install the Box2D simulator, as it's not included with the basic install of gym. Run `pip install Box2D`, or install it through gym's extras as shown in the next cell.

In [None]:
# To install run the command "pip install gym[box2d]" in your terminal

![Screen%20Shot%202022-05-05%20at%201.40.33%20PM.png](attachment:Screen%20Shot%202022-05-05%20at%201.40.33%20PM.png)

In [1]:
# First step is to import the gym library.
import gym 

In [2]:
# Second step is to use the "make" function to set up the environment:
env = gym.make("LunarLander-v2")

In [3]:
# Third step is to "reset" the environment to its initial state:
env.reset()

array([ 0.00244665,  1.3997078 ,  0.24779089, -0.49832776, -0.00282814,
       -0.05612831,  0.        ,  0.        ], dtype=float32)

# Check the observation space for more understanding: 

In [4]:
env.observation_space

Box([-inf -inf -inf -inf -inf -inf -inf -inf], [inf inf inf inf inf inf inf inf], (8,), float32)

# The documentation on the state space is very bad. Digging into a research paper (https://arxiv.org/pdf/2011.11850.pdf), I found what each of the 8 elements in the state vector means.

![Screen%20Shot%202022-05-05%20at%201.45.55%20PM.png](attachment:Screen%20Shot%202022-05-05%20at%201.45.55%20PM.png)
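To make the raw 8-element vector easier to read, here is a minimal sketch that attaches names to each element. It assumes the ordering given in the Gym documentation (position, linear velocity, angle, angular velocity, then the two leg-contact flags). The names `LanderState` and `label_state` are hypothetical helpers, not part of gym, and the left/right naming of the leg flags is an assumption.

```python
from typing import NamedTuple


class LanderState(NamedTuple):
    """Hypothetical container naming the 8 elements of a Lunar Lander state."""
    x: float                  # horizontal position (landing pad at 0)
    y: float                  # vertical position (landing pad at 0)
    vx: float                 # horizontal velocity
    vy: float                 # vertical velocity
    angle: float              # lander angle
    angular_velocity: float   # rate of change of the angle
    left_leg_contact: bool    # assumed: first leg flag is the left leg
    right_leg_contact: bool   # assumed: second leg flag is the right leg


def label_state(obs) -> LanderState:
    """Convert a length-8 observation (list or array) into a LanderState."""
    values = [float(v) for v in obs]
    if len(values) != 8:
        raise ValueError(f"expected 8 elements, got {len(values)}")
    return LanderState(*values[:6], bool(values[6]), bool(values[7]))


# The initial state printed by env.reset() earlier in this notebook.
initial = [0.00244665, 1.3997078, 0.24779089, -0.49832776,
           -0.00282814, -0.05612831, 0.0, 0.0]
state = label_state(initial)
print(state)
```

Reading `state.y` or `state.left_leg_contact` is then less error-prone than indexing the array by position.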

# Check the action space for more understanding: 

In [5]:
env.action_space

Discrete(4)

# Four discrete actions are available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine. For more, see the wiki: https://github.com/openai/gym/wiki/Leaderboard#lunarlander-v2, but reading the paper is probably the best bet for understanding this environment more deeply.
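Since the action space is just the integers 0 to 3, it can help to give them names. The sketch below assumes the integer indices follow the order in which the actions are listed above (0 = do nothing, ..., 3 = fire right orientation engine). `LanderAction` is a hypothetical name, not something gym provides.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete actions (assumed index order)."""
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire left orientation engine
    FIRE_MAIN = 2   # fire main engine
    FIRE_RIGHT = 3  # fire right orientation engine


# IntEnum members behave like plain ints, so they can be passed to env.step().
for action in LanderAction:
    print(int(action), action.name)
```

With this in place, `env.step(LanderAction.FIRE_MAIN)` reads more clearly than `env.step(2)`.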

# The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. The reward for moving from the top of the screen to the landing pad and coming to zero speed is about 100 to 140 points. If the lander moves away from the landing pad, it loses that reward. The episode finishes if the lander crashes or comes to rest, receiving an additional -100 or +100 points, respectively. Each leg ground contact is +10. Firing the main engine is -0.3 points each frame. Solved is 200 points. Landing outside the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.
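The reward terms described above can be combined into a back-of-the-envelope estimate of an episode's total return. This is only a rough sketch using the numbers quoted in the text; it is not the environment's actual reward function, which shapes the reward at every frame. The function name `rough_landing_return` and its parameters are hypothetical.

```python
SOLVED_THRESHOLD = 200.0  # "Solved is 200 points"


def rough_landing_return(approach_reward, main_engine_frames,
                         legs_down=2, crashed=False):
    """Rough episode return built from the terms quoted in the text.

    approach_reward: reward for reaching the pad at zero speed (about 100 to 140)
    main_engine_frames: number of frames the main engine was fired
    legs_down: number of legs touching the ground (+10 each)
    crashed: True gives the -100 terminal reward, False gives +100
    """
    terminal = -100.0 if crashed else 100.0
    leg_bonus = 10.0 * legs_down
    fuel_cost = 0.3 * main_engine_frames
    return approach_reward + terminal + leg_bonus - fuel_cost


# Example: a clean landing with 100 frames of main-engine thrust.
estimate = rough_landing_return(approach_reward=120.0, main_engine_frames=100)
print("Estimated return:", estimate)
print("Counts as solved?", estimate >= SOLVED_THRESHOLD)
```

The estimate shows why heavy use of the main engine matters: every 100 frames of thrust costs about 30 points, which can be the difference between clearing the 200-point mark and falling short.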

# Let's take a step in this environment. 

In [6]:
# Fourth step is to take an action, or a control as we call it. Action 0 is "do nothing".
next_state, reward, isDone, _ = env.step(0)

print("The next state is ", next_state)
print("The reward is ", reward)
print("Is the episode over? ", isDone)

The next state is  [ 0.00489321  1.387919    0.24745628 -0.5239548  -0.00560186 -0.05547903
  0.          0.        ]
The reward is  -1.3907480745574787
Is the episode over?  False
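A single call to `step` generalizes naturally to a full episode: keep stepping until the environment reports that the episode is over. The sketch below assumes the same 4-value `step` return and array-only `reset` return used in this notebook. So that it runs without Box2D installed, it uses `StubEnv`, a hypothetical stand-in that only mimics the interface and has no physics. `run_episode` and `random_policy` are likewise illustrative names.

```python
import random


def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return (total_reward, number_of_steps)."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, is_done, _ = env.step(action)
        total_reward += reward
        steps += 1
        if is_done:
            break
    return total_reward, steps


class StubEnv:
    """Hypothetical stand-in for the real environment (no physics).

    Returns a zero state, a reward of -1 per step, and ends after `horizon` steps.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 8

    def step(self, action):
        self.t += 1
        return [0.0] * 8, -1.0, self.t >= self.horizon, {}


def random_policy(state):
    """Ignore the state and pick one of the four actions uniformly at random."""
    return random.randrange(4)


total, steps = run_episode(StubEnv(horizon=5), random_policy)
print("Total reward:", total, "after", steps, "steps")
```

With gym and Box2D installed, passing `gym.make("LunarLander-v2")` in place of `StubEnv(horizon=5)` should work the same way. Note that newer Gym releases (0.26 and later) and Gymnasium return five values from `step` and a pair from `reset`, so the unpacking would need to be adjusted there.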


# Let's render this environment and see what it looks like:

In [7]:
# At any time during the simulation we can "render" or show the environment state by calling:
env.render()

True

In [8]:
# To close the window, run the "close" method of the environment:
env.close()

# In the next notebook, let's run a simulation of this using just the do-nothing control and just one of the other engine-firing controls.

In [None]:
# Ref: https://towardsdatascience.com/ai-learning-to-land-a-rocket-reinforcement-learning-84d61f97d055
# is a simpler article for the students to read. 