
# LunarLander-v2 Attempt Using DQN (Keras)



In [10]:
# Import Libraries
import gym
import numpy as np
import matplotlib.pyplot as plt


## Description

This environment is a classic rocket trajectory optimization problem.

According to Pontryagin's maximum principle, it is optimal to either fire the engine at full throttle or turn it off. This is why the environment has discrete actions: engine on or off.

The environment comes in two versions: discrete and continuous. This notebook uses the discrete version, `LunarLander-v2`.

The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector.

Landing outside the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.

## Action Space

There are four discrete actions available: `do nothing`, `fire left orientation engine`, `fire main engine`, `fire right orientation engine`.

| Action | Description |
| :--: | :--: |
| 0 | Do Nothing |
| 1 | Fire Left Orientation Engine |
| 2 | Fire Main Engine |
| 3 | Fire Right Orientation Engine |
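
When logging or debugging episodes, it can help to translate action indices into the labels from the table above. The following is a minimal sketch; `ACTION_NAMES` and `describe_action` are hypothetical helper names introduced here for illustration and are not part of gym.

```python
# Hypothetical helper (not part of gym): labels copied from the action table above.
ACTION_NAMES = {
    0: "Do Nothing",
    1: "Fire Left Orientation Engine",
    2: "Fire Main Engine",
    3: "Fire Right Orientation Engine",
}


def describe_action(action: int) -> str:
    """Return the human-readable label for a discrete action index."""
    action = int(action)  # also accepts NumPy integers returned by action_space.sample()
    if action not in ACTION_NAMES:
        raise ValueError(f"Unknown action index: {action}")
    return ACTION_NAMES[action]
```

For example, `describe_action(env.action_space.sample())` would return the label of a randomly sampled action.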

## Observation Space

Each observation is an 8-dimensional state vector: the lander's `x` & `y` coordinates, its linear velocities in `x` & `y`, its angle, its angular velocity, and two booleans indicating whether each leg is in contact with the ground.
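
To make the eight components easier to read, the state vector can be unpacked into named fields. This is a sketch that assumes the components arrive in the order listed above; `LanderState` and `unpack_state` are hypothetical names introduced here, and the leg flags are numbered rather than labelled left/right because the description does not say which is which.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8-dimensional observation (hypothetical helper)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg1_contact: bool
    leg2_contact: bool


def unpack_state(obs: Sequence[float]) -> LanderState:
    """Convert a raw observation into a LanderState, assuming the order listed above."""
    if len(obs) != 8:
        raise ValueError(f"Expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, ang_vel, leg1, leg2 = obs
    return LanderState(
        float(x), float(y), float(vx), float(vy),
        float(angle), float(ang_vel),
        bool(leg1), bool(leg2),  # contact flags are encoded as 0.0 / 1.0
    )
```

With an environment at hand, `unpack_state(env.reset())` would give a record whose fields can be accessed by name, such as `.x` or `.leg1_contact`.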

## Reward

> An episode is considered solved if it scores at least 200 points.

| Reward Scenario | Points |
| :--: | :--: |
| Moving from the top of the screen to the landing pad and reaching zero speed | 100 to 140 |
| Moving away from the landing pad | Loses reward |
| Lander Crashes | -100 |
| Lander comes to rest | +100 |
| Each leg with ground contact | +10 |
| Firing Main Engine | -0.3 per frame |
| Firing Side Engine | -0.03 per frame |
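
The engine costs in the table can be turned into a small calculation that shows how much reward a sequence of actions spends on fuel alone. This sketch covers only the two per-frame engine penalties and the 200-point threshold; the shaping, crash, and landing terms are computed by the environment and are not reproduced here. The names `fuel_penalty` and `is_solved` are hypothetical.

```python
from typing import Iterable

MAIN_ENGINE_COST = 0.3    # per frame, from the reward table
SIDE_ENGINE_COST = 0.03   # per frame, from the reward table
SOLVED_THRESHOLD = 200    # episode score treated as "solved"


def fuel_penalty(actions: Iterable[int]) -> float:
    """Sum the (negative) engine penalties for a sequence of discrete actions."""
    penalty = 0.0
    for action in actions:
        if action == 2:            # main engine
            penalty -= MAIN_ENGINE_COST
        elif action in (1, 3):     # left or right orientation engine
            penalty -= SIDE_ENGINE_COST
        # action 0 (do nothing) costs no fuel
    return penalty


def is_solved(episode_reward: float) -> bool:
    """Check an episode's total reward against the 200-point threshold."""
    return episode_reward >= SOLVED_THRESHOLD
```

For instance, firing the main engine for 100 consecutive frames costs about 30 points, which is a noticeable share of the 200 points needed.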

## Episode Termination

The episode finishes if:

1. the lander crashes (the lander body comes into contact with the moon)
2. the lander leaves the viewport (the absolute value of its `x` coordinate exceeds 1)
3. the lander is not awake (its body does not move and does not collide with any other body)





In [11]:
env = gym.make('LunarLander-v2')

In [12]:
# Action space
print("Action Space: {}".format(env.action_space))

# Shape of the observation space
# Reference: https://github.com/openai/gym/blob/58aeddb62fb9d46d2d2481d1f7b0a380d8c454b1/gym/spaces/box.py#L44
obs_space = env.observation_space.shape
print("Observation Space Shape: {}".format(obs_space))

Action Space: Discrete(4)
Observation Space Shape: (8,)


In [13]:
env.observation_space.shape

(8,)

## Random Sampling

In [14]:
# Record one episode driven by randomly sampled actions.
# Note: this relies on an older gym release that still ships gym.wrappers.Monitor.
env = gym.make('LunarLander-v2')
monitor = gym.wrappers.Monitor(env, './EDA Video/Random', force=True)
monitor.reset()
done = False
while not done:
    random_action = env.action_space.sample()
    state, reward, done, info = monitor.step(random_action)

env.close()
monitor.close()

## Action 0

In [None]:
# Record one episode that always takes action 0 (do nothing).
env = gym.make('LunarLander-v2')
monitor = gym.wrappers.Monitor(env, './EDA Video/Action 0', force=True)
monitor.reset()
done = False
while not done:
    state, reward, done, info = monitor.step(0)

env.close()
monitor.close()

## Action 1

In [None]:
# Record one episode that always takes action 1 (fire left orientation engine).
env = gym.make('LunarLander-v2')
monitor = gym.wrappers.Monitor(env, './EDA Video/Action 1', force=True)
monitor.reset()
done = False
while not done:
    state, reward, done, info = monitor.step(1)

env.close()
monitor.close()

## Action 2

In [None]:
# Record one episode that always takes action 2 (fire main engine).
env = gym.make('LunarLander-v2')
monitor = gym.wrappers.Monitor(env, './EDA Video/Action 2', force=True)
monitor.reset()
done = False
while not done:
    state, reward, done, info = monitor.step(2)

env.close()
monitor.close()

## Action 3

In [None]:
# Record one episode that always takes action 3 (fire right orientation engine).
env = gym.make('LunarLander-v2')
monitor = gym.wrappers.Monitor(env, './EDA Video/Action 3', force=True)
monitor.reset()
done = False
while not done:
    state, reward, done, info = monitor.step(3)

env.close()
monitor.close()
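
The five recording cells above share the same loop and differ only in how the action is chosen. One way to cut the repetition is a small helper that takes the action choice as a function. This is a sketch, not the notebook's original approach: `run_episode` and its `max_steps` safety limit are hypothetical, and it assumes the older gym API used above, where `reset()` returns the state and `step()` returns a 4-tuple.

```python
from typing import Any, Callable, Tuple


def run_episode(env: Any, policy: Callable[[Any], int],
                max_steps: int = 1000) -> Tuple[float, int]:
    """Run one episode and return (total_reward, steps_taken).

    `env` is any object with gym-style reset() and step(action) methods;
    `policy` maps the current state to a discrete action.
    """
    state = env.reset()
    total_reward = 0.0
    steps = 0
    done = False
    # Stop when the environment signals termination or the step limit is hit.
    while not done and steps < max_steps:
        action = policy(state)
        state, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps
```

With this helper, a fixed-action run could be written as `run_episode(monitor, lambda state: 3)` and a random run as `run_episode(monitor, lambda state: env.action_space.sample())`. The returned total reward also makes it easy to compare the fixed actions against random sampling.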