---
<div align = "center">

# Lunar Lander
</div>

---

---
<div align="center">

## Project Overview
</div>

---

This project explores the impact of customizing an ``OpenAI Gym`` environment on **reinforcement learning (RL) performance**. We modify an existing environment, Lunar Lander, train an RL agent on it with the Stable Baselines3 library, and then compare the results between the **customized and original environments**.

The process involves:

- **Environment Customization**: **Implement changes** to the environment, such as altered rewards or added challenges.
- **Agent Training**: Train an RL agent with **algorithms like PPO** and **tune its hyperparameters** to improve performance.
- **Evaluation**: **Compare agent performance** in both environments to analyze the effect of the customizations.
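
As an illustration of what an "altered reward" can look like, here is a minimal sketch of a reward-shaping helper. It is not the code in `MyLunarLander`: the function name `shape_reward` and both penalty values are hypothetical, chosen only to show the idea of charging extra for engine use. The action numbering follows the discrete Lunar Lander action space.

```python
# Hypothetical reward-shaping helper. The function name and the penalty
# values are illustrative assumptions, not the project's MyLunarLander code.

# Discrete Lunar Lander actions: 0 = do nothing, 1 = fire left engine,
# 2 = fire main engine, 3 = fire right engine.
MAIN_ENGINE = 2
SIDE_ENGINES = (1, 3)


def shape_reward(reward: float, action: int,
                 main_penalty: float = 0.5,
                 side_penalty: float = 0.05) -> float:
    """Return the base reward minus an extra fuel penalty for firing engines."""
    if action == MAIN_ENGINE:
        return reward - main_penalty
    if action in SIDE_ENGINES:
        return reward - side_penalty
    return reward


if __name__ == "__main__":
    # Show the shaped reward for each action, given a base reward of 1.0.
    for action in range(4):
        print(action, shape_reward(1.0, action))
```

In a custom environment, a helper like this would typically be applied to the reward inside `step` before it is returned to the agent.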

This project aims to analyse **how environment design influences the outcomes of a reinforcement learning algorithm**.
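
The comparison between the two environments can be sketched as a summary of per-episode returns. The helper names `summarize_returns` and `compare_environments` are hypothetical, and the numbers in the usage block are placeholders rather than results from this project.

```python
from statistics import mean, pstdev


def summarize_returns(returns):
    """Summarize a list of per-episode returns (population standard deviation)."""
    return {
        "episodes": len(returns),
        "mean": mean(returns),
        "std": pstdev(returns),
    }


def compare_environments(original_returns, custom_returns):
    """Compare episode returns collected in the original and custom environments."""
    original = summarize_returns(original_returns)
    custom = summarize_returns(custom_returns)
    return {
        "original": original,
        "custom": custom,
        "mean_difference": custom["mean"] - original["mean"],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    print(compare_environments([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

If the custom environment changes the reward function, raw returns from the two environments are on different scales. One option is to evaluate both agents under the same reward definition so that the difference in means is meaningful.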

---
## Dependencies
---

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
import gymnasium as gym
import numpy as np
import math

from stable_baselines3 import (PPO)
from stable_baselines3.common.env_util import (make_vec_env)
from stable_baselines3.common.vec_env import (SubprocVecEnv)

from Configuration import (CONFIG, PATHS_CONFIG,
                            PPO_SETTINGS_1)
from Environment import (MyLunarLander)

---
## Original Environment
---

First, we create the original Lunar Lander environment and define a reinforcement learning model to train on it.

In [None]:
# USED TO CHECK IF THE ENVIRONMENT IS WORKING

# Create a new instance of the Environment
env = gym.make("LunarLander", render_mode="human")

# Reset the Environment - To get the initial observation
observation, info = env.reset()

# Define a flag to determine whether the episode is over
episode_over = False

# Perform an episode
while not episode_over:
    # Choose a random action (no trained policy is used here)
    action = env.action_space.sample()

    # Perform an action / step
    observation, reward, terminated, truncated, info = env.step(action)

    # The episode is over when it terminates or is truncated
    episode_over = terminated or truncated

# Close the Environment
env.close()

---
### PPO (Setting 1)
---

In [None]:
# Create the vectorized environments (one subprocess per environment)
envs = make_vec_env('LunarLander', n_envs=CONFIG['N_ENVS'], seed=0, vec_env_cls=SubprocVecEnv)

# Define an RL model
model = PPO(policy="MlpPolicy", env=envs, device='cpu', verbose=1, **PPO_SETTINGS_1)

# Train the Model
model.learn(CONFIG['N_ITERATIONS'], progress_bar=True)

# Close the Environment
envs.close()
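
When PPO trains on several environments in parallel, each update uses a rollout of `n_steps` transitions from each of the `n_envs` environments, split into minibatches of `batch_size`. The sketch below checks how a given combination divides up. The helper `rollout_summary` and the example values are hypothetical; the actual contents of `CONFIG` and `PPO_SETTINGS_1` are not shown in this notebook.

```python
def rollout_summary(n_envs: int, n_steps: int, batch_size: int) -> dict:
    """Describe how one PPO rollout splits into minibatches."""
    # Total number of transitions collected before each policy update.
    buffer_size = n_envs * n_steps
    return {
        "buffer_size": buffer_size,
        "full_minibatches": buffer_size // batch_size,
        # A non-zero remainder means the last minibatch is smaller than the rest.
        "leftover_samples": buffer_size % batch_size,
    }


if __name__ == "__main__":
    # Illustrative values only, not the project's configuration.
    print(rollout_summary(n_envs=4, n_steps=1024, batch_size=64))
```

Choosing a `batch_size` that divides `n_envs * n_steps` evenly avoids a truncated final minibatch.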

In [None]:
# DEMO

# Define an Environment
env = gym.make("LunarLander", render_mode="human")

# Perform N Episodes
for ep in range(CONFIG['N_EPISODES']):
    obs, info = env.reset()
    terminated, truncated = False, False
    while not (terminated or truncated):
        # Pass the observation to the model to get the predicted action
        action, _states = model.predict(obs)

        # Pass the action to the environment; step() returns
        # (obs, reward, terminated, truncated, info) in that order
        obs, reward, terminated, truncated, info = env.step(action)

        # With render_mode="human" the window is updated on every step
        print(ep, reward, terminated, truncated)
        print("---------------")

# Close the Environment
env.close()
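
The episode loop can also be wrapped in a small helper that stops on either `terminated` or `truncated` and accumulates the episode return. This is a sketch: `run_episode` is a hypothetical helper, and `CountdownEnv` is a stand-in that only mimics the Gymnasium `reset`/`step` return signatures so the example runs without a simulator.

```python
class CountdownEnv:
    """Tiny stand-in that mimics the Gymnasium reset/step return signatures."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None):
        self.steps = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.steps += 1
        terminated = False                        # never reaches a terminal state
        truncated = self.steps >= self.max_steps  # stopped by a step limit
        return self.steps, 1.0, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and return (total_reward, number_of_steps)."""
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    # An episode ends when it either terminates or is truncated.
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    print(run_episode(CountdownEnv(max_steps=5), policy=lambda obs: 0))
```

With a trained Stable Baselines3 model, the policy could be passed as `lambda obs: model.predict(obs)[0]`.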

---
## Custom Environment
---

In [None]:
# USED TO CHECK IF THE ENVIRONMENT IS WORKING

# Register the Custom Environment
gym.register(
    id="MyLunarLander",
    entry_point=MyLunarLander,
)

# Create a new instance of the Environment
env = gym.make("MyLunarLander", render_mode="human")

# Reset the Environment - To get the initial observation
observation, info = env.reset()

# Define a flag to determine whether the episode is over
episode_over = False

# Perform an episode
while not episode_over:
    # Choose a random action (no trained policy is used here)
    action = env.action_space.sample()

    # Perform an action / step
    observation, reward, terminated, truncated, info = env.step(action)

    # The episode is over when it terminates or is truncated
    episode_over = terminated or truncated

# Close the Environment
env.close()

---
### PPO (Setting 1)
---

In [None]:
# Create the vectorized custom environments (one subprocess per environment)
envs = make_vec_env(MyLunarLander, n_envs=CONFIG['N_ENVS'], seed=0, vec_env_cls=SubprocVecEnv)

# Define an RL model
model = PPO(policy="MlpPolicy", env=envs, device='cpu', verbose=1, **PPO_SETTINGS_1)

# Train the Model
model.learn(CONFIG['N_ITERATIONS'], progress_bar=True)

# Close the Environment
envs.close()

In [None]:
# DEMO

# Define an Environment
env = gym.make("MyLunarLander", render_mode="human")

# Perform N Episodes
for ep in range(CONFIG['N_EPISODES']):
    obs, info = env.reset()
    terminated, truncated = False, False
    while not (terminated or truncated):
        # Pass the observation to the model to get the predicted action
        action, _states = model.predict(obs)

        # Pass the action to the environment; step() returns
        # (obs, reward, terminated, truncated, info) in that order
        obs, reward, terminated, truncated, info = env.step(action)

        # With render_mode="human" the window is updated on every step
        print(ep, reward, terminated, truncated)
        print("---------------")

# Close the Environment
env.close()

---
### Patches
---

In [None]:
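# Evaluating the DQN model after training
# Note: model_DQN is not defined in this notebook; it must be trained or loaded first
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model_DQN, env, n_eval_episodes=5, render=True)
print("-> DQN")
print("Mean Reward: {}\nStandard Deviation: {}".format(mean_reward, std_reward))
env.close()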

In [None]:
# Saving Trained Models
model_DQN.save("./Models/DQN_Lunar")
model_ACER.save("./Models/ACER_Lunar")

In [None]:
# Deleting the Models so that we can later load them
del model_DQN
del model_ACER

In [None]:
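# Loading the previously saved models
from stable_baselines3 import DQN

model_DQN = DQN.load("./Models/DQN_Lunar.zip")
# Note: ACER is not part of stable_baselines3 (it shipped with the older
# Stable Baselines library), so this line needs that package to run
model_ACER = ACER.load("./Models/ACER_Lunar.zip")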