# Lunar Lander Control - Assignment 2
 



This assignment aims to train an agent to play the Lunar Lander game using a convolutional neural network and a reinforcement learning algorithm.

In the OpenAI Gym game Lunar Lander (https://gym.openai.com/envs/LunarLander-v2/), the player's job is to control a small spaceship and land it safely on a landing pad. There are three thrusters that can be used for control, firing in three directions: up, left, and right. The player can also choose to do nothing. A dataset has been collected from an expert LunarLander player; it contains screenshots of the game state and the player's associated action (none, up, left, or right).

Training each of the models requires the following steps:
1. Import and preprocess data
    - Images are converted to grayscale, downsized to 84x84, and each pixel is normalised by dividing by 255
2. Train the model using 70 percent of the original data set
    - The remaining 30 percent is held out as the validation set
3. Evaluate each model as a game controller for 200 episodes
    - Using a Lunar Lander game player script for each model:
        - Convolutional Neural Network: lunar_lander_ml_images_player.py
        - Deep Q-Learning with Epsilon Greedy Policy: lunar_lander_rl_player.py
4. Compare the models on their performance in the Lunar Lander game and on computation time.
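In the notebook, step 1 is delegated to Keras (`flow_from_directory` with `color_mode='grayscale'`, `target_size=(84, 84)` and `rescale=1./255`). As a rough, framework-free illustration of what happens to each screenshot, the sketch below applies the same three steps with NumPy. It assumes frames arrive as H x W x 3 `uint8` RGB arrays; `preprocess_frame` is a hypothetical helper, the luma weights are the ITU-R 601-2 ones that PIL uses for grayscale conversion, and the simple nearest-neighbour resize may pick slightly different pixels than the resize Keras performs.

```python
import numpy as np

# Luma weights used by PIL's "L" (grayscale) conversion, ITU-R 601-2
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def preprocess_frame(rgb, size=84):
    """Turn an H x W x 3 uint8 RGB frame into a size x size x 1 float array in [0, 1]."""
    # 1. Grayscale: weighted sum over the three colour channels -> H x W
    gray = rgb.astype(np.float64) @ LUMA_WEIGHTS
    # 2. Nearest-neighbour resize: take evenly spaced rows and columns
    rows = (np.arange(size) * gray.shape[0] / size).astype(int)
    cols = (np.arange(size) * gray.shape[1] / size).astype(int)
    small = gray[np.ix_(rows, cols)]
    # 3. Normalise to [0, 1] and add a channel axis, as Conv2D expects (84, 84, 1)
    return (small / 255.0)[..., np.newaxis]

frame = np.full((400, 600, 3), 255, dtype=np.uint8)  # hypothetical all-white frame
print(preprocess_frame(frame).shape)
```

Dividing by 255 keeps inputs in a small, fixed range, which generally makes gradient-based training better behaved than raw 0-255 pixel values.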

### Import packages to use in model building and visualisation

In [1]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Conv2D, MaxPooling2D, Flatten
from keras.utils import np_utils
from keras.utils.np_utils import to_categorical
from keras.utils.vis_utils import model_to_dot
from keras import backend as K
from keras.optimizers import RMSprop, Adam
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.image import ImageDataGenerator

import sklearn
from sklearn.tree import export_graphviz
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.utils import shuffle

import rl
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory

import csv
import glob
import os
import os.path
import pickle
import random
import shutil

import cv2
import gym
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import PIL
import scipy as sp
import seaborn as sns
from IPython.display import display, HTML, Image, SVG

%matplotlib inline

Using TensorFlow backend.


### Split images into action class folders within the all_data folder

In [4]:
# Folder containing all the raw screenshots
folder_path = '\\Users\\aksunbul\\Desktop\\MSc Statistics\\Advanced Machine Learning\\Assignments\\Assignment 2\\assignment\\all_data'

images = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]

for image in images:
    # File names look like frame_2019-04-08-11-20-48_8_0.jpeg;
    # the last underscore-separated field is the action label
    frame, date, number, folder_name = os.path.splitext(image)[0].split('_')

    # Create one subfolder per action label if it does not exist yet
    new_path = os.path.join(folder_path, folder_name)
    os.makedirs(new_path, exist_ok=True)

    # Move the image into its action subfolder
    shutil.move(os.path.join(folder_path, image), os.path.join(new_path, image))
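
The folder split relies entirely on the file-name convention: the last underscore-separated field is the action label. A minimal helper that extracts that label (for example, to count classes without moving any files) could look like this. `action_label` is a hypothetical name, it assumes every file name follows the pattern of the example `frame_2019-04-08-11-20-48_8_0.jpeg`, and the second and third names in the usage list are made up for illustration.

```python
import os
from collections import Counter

def action_label(filename):
    """Return the action label stored in the last '_'-separated field of a frame file name."""
    stem = os.path.splitext(os.path.basename(filename))[0]  # drop folder and '.jpeg'
    return int(stem.rsplit('_', 1)[1])

# Count how many frames carry each action label; no files are touched
names = ['frame_2019-04-08-11-20-48_8_0.jpeg',
         'frame_2019-04-08-11-20-48_9_2.jpeg',
         'frame_2019-04-08-11-20-48_10_2.jpeg']
print(Counter(action_label(n) for n in names))
```

Using `rsplit('_', 1)` only depends on the label being the last field, so it keeps working even if the timestamp part of the name ever gains extra underscores.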

### Function to split folders into training and validation folders

NOTE: To use the split dataset function, the main directory (the same directory as the notebook file) must contain subdirectories named 'all_data', 'training_data', and 'validation_data'. The all_data directory must already contain one subdirectory per category, with every image stored in its category's subdirectory. The function randomly assigns each image to the training or validation set and copies it into the matching category subdirectory of 'training_data' or 'validation_data'. We then import these directories into generators for training models in Keras.

In [5]:
# This function was written by Daan Raman and shared on GitHub: https://github.com/keras-team/keras/issues/5862

def split_dataset_into_test_and_train_sets(all_data_dir, training_data_dir, testing_data_dir, testing_data_pct):
    # Recreate testing and training directories
    if testing_data_dir.count('/') > 1:
        shutil.rmtree(testing_data_dir, ignore_errors=False)
        os.makedirs(testing_data_dir)
        print("Successfully cleaned directory " + testing_data_dir)
    else:
        print("Refusing to delete testing data directory " + testing_data_dir + " as we prevent you from doing stupid things!")

    if training_data_dir.count('/') > 1:
        shutil.rmtree(training_data_dir, ignore_errors=False)
        os.makedirs(training_data_dir)
        print("Successfully cleaned directory " + training_data_dir)
    else:
        print("Refusing to delete training data directory " + training_data_dir + " as we prevent you from doing stupid things!")

    num_training_files = 0
    num_testing_files = 0

    for subdir, dirs, files in os.walk(all_data_dir):
        category_name = os.path.basename(subdir)

        # Don't create a subdirectory for the root directory
        print(category_name + " vs " + os.path.basename(all_data_dir))
        if category_name == os.path.basename(all_data_dir):
            continue

        training_data_category_dir = training_data_dir + '/' + category_name
        testing_data_category_dir = testing_data_dir + '/' + category_name

        if not os.path.exists(training_data_category_dir):
            os.mkdir(training_data_category_dir)

        if not os.path.exists(testing_data_category_dir):
            os.mkdir(testing_data_category_dir)

        for file in files:
            input_file = os.path.join(subdir, file)
            if np.random.rand(1) < testing_data_pct:
                shutil.copy(input_file, testing_data_dir + '/' + category_name + '/' + file)
                num_testing_files += 1
            else:
                shutil.copy(input_file, training_data_dir + '/' + category_name + '/' + file)
                num_training_files += 1

    print("Processed " + str(num_training_files) + " training files.")
    print("Processed " + str(num_testing_files) + " testing files.")



In [6]:
split_dataset_into_test_and_train_sets('all_data', 'training_data', 'validation_data', 0.3)

Refusing to delete testing data directory validation_data as we prevent you from doing stupid things!
Refusing to delete training data directory training_data as we prevent you from doing stupid things!
all_data vs all_data
0 vs all_data
1 vs all_data
2 vs all_data
3 vs all_data
Processed 44637 training files.
Processed 19034 testing files.
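
Because the function sends each file to the validation side independently with probability `testing_data_pct`, the split is only approximately 70/30: here 19,034 of the 63,671 files (about 29.9 percent) went to validation, and per-class proportions can drift slightly as well. The sketch below contrasts that per-file coin flip with a shuffle-and-cut split that hits the target fraction exactly. Both functions are hypothetical helpers, and the seeded `random.Random` generator is an addition that makes the split reproducible, which the original function does not do. scikit-learn's `train_test_split`, imported earlier, also accepts a `stratify` argument to keep class proportions equal across the two sides.

```python
import random

def bernoulli_split(files, test_pct, seed=None):
    """Mimic the function above: each file goes to the test side with probability test_pct."""
    rng = random.Random(seed)  # seeded generator -> reproducible split
    train, test = [], []
    for f in files:
        (test if rng.random() < test_pct else train).append(f)
    return train, test

def exact_split(files, test_pct, seed=None):
    """Shuffle once and cut, so the test side holds exactly round(len(files) * test_pct) files."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_pct)
    return shuffled[n_test:], shuffled[:n_test]

files = [f'img_{i}.jpeg' for i in range(1000)]  # hypothetical file names
print([len(part) for part in bernoulli_split(files, 0.3, seed=0)])
print([len(part) for part in exact_split(files, 0.3, seed=0)])
```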


### Balance classes of actions in TRAINING data

Only the training data is balanced, using the smallest classes as a reference. Class folders 1 and 3 each contain approximately 1,500 images, so we reduce the larger folders to within about twice that size by random undersampling.

In [1]:
# Count the number of images in each training class folder
import glob
import os
import numpy as np

train_dir = '\\Users\\aksunbul\\Desktop\\MSc Statistics\\Advanced Machine Learning\\Assignments\\Assignment 2\\assignment\\training_data'
file_lists = [glob.glob(os.path.join(train_dir, str(label), '*.jpeg')) for label in range(4)]

len_arr = np.array([len(files) for files in file_lists])
len_arr

array([2788, 1577, 3211, 1525])

Training data folder sizes before balancing:
- 0: 21168
- 1: 1577
- 2: 20207
- 3: 1525

We want to reduce folders 0 and 2 to roughly 3,000 images each.
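
The cells that follow delete a hard-coded number of shuffled files from each folder. An equivalent way to express the same random undersampling is to choose which files to keep, so that every over-represented class ends up with exactly `target` images. The helper below is a hypothetical sketch: it only decides which paths to keep and which to drop, and leaves the deletion (or, more safely, moving the dropped files elsewhere so the step can be undone) to the caller. The paths in the usage comment are assumptions.

```python
import random

def undersample(files, target, seed=None):
    """Randomly keep at most `target` of `files`; return (kept, dropped) lists."""
    files = list(files)
    if len(files) <= target:
        return files, []  # small classes are left untouched
    rng = random.Random(seed)
    kept = rng.sample(files, target)  # sampling without replacement
    kept_set = set(kept)
    dropped = [f for f in files if f not in kept_set]
    return kept, dropped

# Usage sketch (hypothetical paths): keep 3,000 images in class folder '0'
# kept, dropped = undersample(glob.glob('training_data/0/*.jpeg'), 3000, seed=0)
# for path in dropped:
#     os.remove(path)
```

Fixing the number kept, rather than the number removed, makes the final class sizes independent of how many files a particular split happened to put in each folder.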

Randomly delete images from the two over-represented folders to balance the classes

In [4]:
import glob
import os
import random

# Randomly delete images from folder '0' to undersample this class
dest0 = '\\Users\\aksunbul\\Desktop\\MSc Statistics\\Advanced Machine Learning\\Assignments\\Assignment 2\\assignment\\training_data\\0\\*.jpeg'
file_list = glob.glob(dest0)

random.shuffle(file_list)
images_to_remove0 = file_list[0:18268]

for file in images_to_remove0:
    os.remove(file)

Number of images in each folder

In [5]:
# Count the number of images in each training class folder
import glob
import os
import numpy as np

train_dir = '\\Users\\aksunbul\\Desktop\\MSc Statistics\\Advanced Machine Learning\\Assignments\\Assignment 2\\assignment\\training_data'
file_lists = [glob.glob(os.path.join(train_dir, str(label), '*.jpeg')) for label in range(4)]

len_arr = np.array([len(files) for files in file_lists])
len_arr

array([ 2788,  1577, 20479,  1525])

In [6]:
import glob
import os
import random

# Randomly delete images from folder '2' to undersample this class
dest2 = '\\Users\\aksunbul\\Desktop\\MSc Statistics\\Advanced Machine Learning\\Assignments\\Assignment 2\\assignment\\training_data\\2\\*.jpeg'
file_list = glob.glob(dest2)

random.shuffle(file_list)
images_to_remove2 = file_list[0:17268]

for file in images_to_remove2:
    os.remove(file)

Number of images in each folder

In [7]:
# Count the number of images in each training class folder
import glob
import os
import numpy as np

train_dir = '\\Users\\aksunbul\\Desktop\\MSc Statistics\\Advanced Machine Learning\\Assignments\\Assignment 2\\assignment\\training_data'
file_lists = [glob.glob(os.path.join(train_dir, str(label), '*.jpeg')) for label in range(4)]

len_arr = np.array([len(files) for files in file_lists])
len_arr

array([2788, 1577, 3211, 1525])

### Resize, convert to grayscale and rescale images

In [7]:
# https://medium.com/@vijayabhaskar96/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720
# Create generators for the training and validation data; rescale pixel values by 1/255 to [0, 1]
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

# Load images as grayscale and resize them to 84x84
train_generator = train_datagen.flow_from_directory(
        'training_data',
        target_size=(84, 84),
        batch_size=32,
        class_mode='categorical',
        color_mode = 'grayscale')

validation_generator = validation_datagen.flow_from_directory(
        'validation_data',
        target_size=(84, 84),
        batch_size=32,
        class_mode='categorical',
        color_mode = 'grayscale')

Found 9101 images belonging to 4 classes.
Found 19034 images belonging to 4 classes.


# Task 1: Convolutional Neural Network Modelling

### Create model architecture

In [12]:
# Build a CNN
model = Sequential()

model.add(Conv2D(32, (3,3), padding = 'valid', activation = 'relu', input_shape = (84, 84, 1)))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Conv2D(64, (3,3), padding = 'valid', activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Conv2D(64, (3,3), padding = 'valid', activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2,2)))

model.add(Flatten())
model.add(Dense(512, activation = 'relu'))
model.add(Dense(4, activation = 'softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 82, 82, 32)        320       
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 41, 41, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 39, 39, 64)        18496     
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 19, 19, 64)        0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 17, 17, 64)        36928     
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 4096)              0         
__________
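
The summary above is cut off after the Flatten layer, but its shapes and parameter counts follow from two rules: a 'valid' k x k convolution shrinks each spatial side by k - 1 and has (k·k·c_in + 1)·c_out parameters (weights plus one bias per filter), and 2x2 max pooling halves each side, rounding down. The hypothetical helper below applies those rules to a stack of Conv2D/MaxPooling2D blocks followed by a hidden Dense layer and a 4-way softmax. It is a hand calculation, not a Keras API; it reproduces the Param # values shown (320, 18496, 36928) and the 4096-wide Flatten output, as well as the second model's summary further down (832, 18496, 73856 and 8192).

```python
def cnn_summary(input_size, in_channels, conv_layers, dense_units, n_classes):
    """Trace 'valid' Conv2D + 2x2 MaxPooling2D blocks, then Dense -> softmax output.

    conv_layers: list of (filters, kernel_size) pairs.
    Returns (flatten_size, conv_param_counts, total_param_count).
    """
    side, channels = input_size, in_channels
    conv_params = []
    for filters, k in conv_layers:
        conv_params.append((k * k * channels + 1) * filters)  # weights + one bias per filter
        side, channels = side - k + 1, filters  # 'valid' padding shrinks each side by k - 1
        side //= 2  # 2x2 max pooling, rounding down
    flat = side * side * channels
    dense_params = (flat + 1) * dense_units + (dense_units + 1) * n_classes
    return flat, conv_params, sum(conv_params) + dense_params

# First model: 3x3 kernels with 32, 64, 64 filters, then Dense(512)
print(cnn_summary(84, 1, [(32, 3), (64, 3), (64, 3)], 512, 4))
# → (4096, [320, 18496, 36928], 2155460)
# Second model: 5x5 then 3x3 kernels with 32, 64, 128 filters, then Dense(256)
print(cnn_summary(84, 1, [(32, 5), (64, 3), (128, 3)], 256, 4))
# → (8192, [832, 18496, 73856], 2191620)
```

In both models more than 95 percent of the parameters sit in the hidden Dense layer that follows Flatten, so that layer, not the convolutions, dominates the model size.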

In [13]:
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

### Train the model on the training data and test on validation data

In [14]:
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
STEP_SIZE_VALID=validation_generator.n//validation_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=validation_generator,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=4
)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x149b3f812b0>
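
Both generators use `batch_size=32`, so floor division gives 9101 // 32 = 284 training steps and 19034 // 32 = 594 validation steps per epoch, which skips the last partial batch (13 training and 26 validation images each epoch). If every image should be seen once per epoch, rounding up is the usual fix. The helper below is a hypothetical sketch; to our understanding, `len(train_generator)` in Keras returns the same rounded-up count, because the directory iterator yields a final, smaller batch.

```python
import math

def steps_per_epoch(n_images, batch_size):
    """Steps needed to cover every image once per epoch (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)

for name, n in [('training', 9101), ('validation', 19034)]:
    print(name, 'floor:', n // 32, 'ceil:', steps_per_epoch(n, 32))
```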

In [15]:
model.save('ML_image_model_v2.h5')


### Second model with a modified architecture

In [4]:
# Build a CNN
model = Sequential()

model.add(Conv2D(32, (5,5), padding = 'valid', activation = 'relu', input_shape = (84, 84, 1)))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Conv2D(64, (3,3), padding = 'valid', activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Conv2D(128, (3,3), padding = 'valid', activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2,2)))

model.add(Flatten())
model.add(Dense(256, activation = 'relu'))
model.add(Dense(4, activation = 'softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 80, 80, 32)        832       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 40, 40, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 38, 38, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 19, 19, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 17, 17, 128)       73856     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0         
__________

In [5]:
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [8]:
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
STEP_SIZE_VALID=validation_generator.n//validation_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=validation_generator,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=4
)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1f7e61eccc0>

In [9]:
model.save('ML_image_model_v2_newParameters.h5')

# Task 2: Reinforcement Learning with Deep Q Learning

In [2]:
ENV_NAME = 'LunarLander-v2'
env = gym.make(ENV_NAME)
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

In [3]:
# Next, we build a simple model
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 8)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 16)                144       
_________________________________________________________________
activation_1 (Activation)    (None, 16)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
activation_2 (Activation)    (None, 16)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 16)                272       
_________________________________________________________________
activation_3 (Activation)    (None, 16)                0         
__________
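
This summary is truncated as well. The Q-network's size follows from the Dense rule (n_in + 1)·n_out: the 8-dimensional LunarLander observation feeds three 16-unit layers (8·16 + 16 = 144, then 16·16 + 16 = 272 twice, matching the summary) and a 4-unit linear output with one Q-value per action (16·4 + 4 = 68). That is 756 parameters in total, against roughly 2.2 million for the CNNs in Task 1, which helps explain why this model trains so much faster. The helper is a hypothetical hand calculation, not a Keras function.

```python
def mlp_param_counts(n_inputs, hidden_units, n_outputs):
    """Parameters of each Dense layer in a fully connected network: (n_in + 1) * n_out."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    return [(n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

counts = mlp_param_counts(8, [16, 16, 16], 4)
print(counts, sum(counts))  # → [144, 272, 272, 68] 756
```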

In [4]:
# Training Deep Q Learning with Epsilon Greedy policy
memory = SequentialMemory(limit=300000, window_length=1)
policy = EpsGreedyQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=200,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
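
Two settings in this cell shape how the agent learns. `nb_steps_warmup=200` delays training updates until the agent has taken 200 steps, which is why the training log reports `loss: --` for the first two episodes and a loss only from step 220 onwards. `target_model_update=1e-2` is below 1, which, to our understanding of keras-rl, makes the target network track the online network with a soft update (target = tau * online + (1 - tau) * target, with tau = 0.01) instead of a periodic hard copy. The sketch below shows that update together with the one-step Q-learning target y = r + gamma * max_a' Q_target(s', a') that the network is regressed onto. It is a NumPy illustration, not keras-rl's code: gamma = 0.99 is assumed (keras-rl's default discount, as far as we know), keras-rl may use the Double DQN form of the target depending on its `enable_double_dqn` setting, and the two transitions are made up.

```python
import numpy as np

def q_learning_targets(rewards, next_q_target, dones, gamma=0.99):
    """One-step targets y = r + gamma * max_a' Q_target(s', a'); no bootstrapping after terminal steps."""
    rewards = np.asarray(rewards, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * not_done * np.max(next_q_target, axis=1)

def soft_update(target_weights, online_weights, tau=1e-2):
    """Move every target-network weight array a fraction tau towards the online network."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_weights, online_weights)]

# Two hypothetical transitions: an ordinary step and a crash (terminal, reward -100)
y = q_learning_targets(rewards=[1.0, -100.0],
                       next_q_target=np.array([[0.5, 2.0, -1.0, 0.0],
                                               [3.0, 1.0, 0.0, 0.0]]),
                       dones=[False, True])
print(y)  # first target is 1 + 0.99 * 2.0; the crash is not bootstrapped
```

The slowly moving target network keeps the regression targets from shifting with every gradient step, which is the usual motivation for using one in DQN.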

In [5]:
# Fit the model
dqn.fit(env, nb_steps=100000, visualize=False, verbose=2)

# After training is done, we save the final weights.
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)

Training for 100000 steps ...
    80/100000: episode: 1, duration: 0.136s, episode steps: 80, steps per second: 588, episode reward: -611.151, mean reward: -7.639 [-100.000, 2.843], mean action: 1.550 [0.000, 2.000], mean observation: 0.136 [-5.029, 3.282], loss: --, mean_absolute_error: --, mean_q: --
   134/100000: episode: 2, duration: 0.037s, episode steps: 54, steps per second: 1442, episode reward: -479.214, mean reward: -8.874 [-100.000, -1.844], mean action: 1.407 [0.000, 2.000], mean observation: 0.080 [-2.033, 7.817], loss: --, mean_absolute_error: --, mean_q: --
   220/100000: episode: 3, duration: 0.810s, episode steps: 86, steps per second: 106, episode reward: -732.691, mean reward: -8.520 [-100.000, -0.404], mean action: 1.349 [0.000, 3.000], mean observation: 0.235 [-1.891, 4.834], loss: 63.888624, mean_absolute_error: 1.949395, mean_q: 0.291975
   306/100000: episode: 4, duration: 0.277s, episode steps: 86, steps per second: 310, episode reward: -133.441, mean reward: 

  2285/100000: episode: 30, duration: 0.200s, episode steps: 76, steps per second: 380, episode reward: -102.128, mean reward: -1.344 [-100.000, 10.619], mean action: 0.724 [0.000, 3.000], mean observation: -0.028 [-1.582, 4.637], loss: 13.179567, mean_absolute_error: 28.402786, mean_q: -35.223862
  2359/100000: episode: 31, duration: 0.181s, episode steps: 74, steps per second: 409, episode reward: -74.615, mean reward: -1.008 [-100.000, 19.660], mean action: 0.500 [0.000, 3.000], mean observation: -0.029 [-1.568, 1.413], loss: 9.157174, mean_absolute_error: 28.363958, mean_q: -35.429417
  2424/100000: episode: 32, duration: 0.173s, episode steps: 65, steps per second: 375, episode reward: -91.888, mean reward: -1.414 [-100.000, 13.965], mean action: 0.477 [0.000, 3.000], mean observation: -0.086 [-5.292, 1.408], loss: 8.510504, mean_absolute_error: 29.380829, mean_q: -36.841072
  2482/100000: episode: 33, duration: 0.142s, episode steps: 58, steps per second: 407, episode reward: -11

  4544/100000: episode: 58, duration: 0.335s, episode steps: 108, steps per second: 323, episode reward: -45.190, mean reward: -0.418 [-100.000, 16.004], mean action: 0.991 [0.000, 3.000], mean observation: -0.031 [-1.232, 2.191], loss: 13.129876, mean_absolute_error: 35.165718, mean_q: -44.561642
  4630/100000: episode: 59, duration: 0.263s, episode steps: 86, steps per second: 327, episode reward: -108.261, mean reward: -1.259 [-100.000, 8.455], mean action: 1.384 [0.000, 3.000], mean observation: -0.017 [-1.026, 3.466], loss: 12.443233, mean_absolute_error: 35.379581, mean_q: -44.691456
  4733/100000: episode: 60, duration: 0.332s, episode steps: 103, steps per second: 311, episode reward: -84.633, mean reward: -0.822 [-100.000, 12.727], mean action: 1.806 [0.000, 3.000], mean observation: -0.055 [-1.012, 3.853], loss: 16.306618, mean_absolute_error: 34.756184, mean_q: -43.623970
  4798/100000: episode: 61, duration: 0.174s, episode steps: 65, steps per second: 373, episode reward: 

 14304/100000: episode: 86, duration: 0.724s, episode steps: 221, steps per second: 305, episode reward: -89.323, mean reward: -0.404 [-100.000, 13.347], mean action: 1.516 [0.000, 3.000], mean observation: 0.116 [-1.408, 1.406], loss: 6.861284, mean_absolute_error: 15.231625, mean_q: -16.402731
 14531/100000: episode: 87, duration: 0.753s, episode steps: 227, steps per second: 301, episode reward: -147.033, mean reward: -0.648 [-100.000, 3.489], mean action: 1.454 [0.000, 3.000], mean observation: 0.169 [-0.410, 1.400], loss: 10.678433, mean_absolute_error: 15.028285, mean_q: -16.116560
 14684/100000: episode: 88, duration: 0.532s, episode steps: 153, steps per second: 288, episode reward: -153.772, mean reward: -1.005 [-100.000, 3.151], mean action: 1.510 [0.000, 3.000], mean observation: 0.175 [-0.496, 1.414], loss: 6.163798, mean_absolute_error: 15.888356, mean_q: -16.968409
 14892/100000: episode: 89, duration: 0.684s, episode steps: 208, steps per second: 304, episode reward: -15

 28026/100000: episode: 114, duration: 5.293s, episode steps: 1000, steps per second: 189, episode reward: -56.788, mean reward: -0.057 [-4.835, 4.611], mean action: 1.841 [0.000, 3.000], mean observation: 0.184 [-0.752, 1.398], loss: 5.450135, mean_absolute_error: 19.805973, mean_q: 4.397847
 29026/100000: episode: 115, duration: 5.035s, episode steps: 1000, steps per second: 199, episode reward: -57.279, mean reward: -0.057 [-4.611, 4.299], mean action: 1.709 [0.000, 3.000], mean observation: 0.162 [-0.222, 1.410], loss: 6.604644, mean_absolute_error: 20.053078, mean_q: 5.953378
 30026/100000: episode: 116, duration: 6.097s, episode steps: 1000, steps per second: 164, episode reward: -69.007, mean reward: -0.069 [-4.855, 4.505], mean action: 1.750 [0.000, 3.000], mean observation: 0.095 [-0.441, 1.433], loss: 7.142855, mean_absolute_error: 19.761776, mean_q: 8.204585
 31026/100000: episode: 117, duration: 4.343s, episode steps: 1000, steps per second: 230, episode reward: -119.015, m

 54518/100000: episode: 142, duration: 4.108s, episode steps: 1000, steps per second: 243, episode reward: -88.897, mean reward: -0.089 [-4.339, 4.849], mean action: 1.704 [0.000, 3.000], mean observation: 0.160 [-0.281, 1.398], loss: 6.967619, mean_absolute_error: 30.415562, mean_q: 32.640732
 55518/100000: episode: 143, duration: 3.851s, episode steps: 1000, steps per second: 260, episode reward: -82.119, mean reward: -0.082 [-4.736, 5.193], mean action: 1.794 [0.000, 3.000], mean observation: 0.175 [-0.234, 1.462], loss: 5.478444, mean_absolute_error: 30.530975, mean_q: 32.925949
 56518/100000: episode: 144, duration: 5.473s, episode steps: 1000, steps per second: 183, episode reward: -69.291, mean reward: -0.069 [-4.595, 5.407], mean action: 1.821 [0.000, 3.000], mean observation: 0.050 [-0.500, 1.401], loss: 6.984499, mean_absolute_error: 30.784012, mean_q: 33.945187
 57518/100000: episode: 145, duration: 4.527s, episode steps: 1000, steps per second: 221, episode reward: -70.083,

 81618/100000: episode: 170, duration: 4.155s, episode steps: 1000, steps per second: 241, episode reward: -71.033, mean reward: -0.071 [-4.652, 4.200], mean action: 1.420 [0.000, 3.000], mean observation: 0.184 [-0.152, 1.406], loss: 4.857139, mean_absolute_error: 28.906933, mean_q: 33.595646
 82618/100000: episode: 171, duration: 4.774s, episode steps: 1000, steps per second: 209, episode reward: -74.445, mean reward: -0.074 [-5.145, 4.943], mean action: 1.718 [0.000, 3.000], mean observation: 0.102 [-0.786, 1.412], loss: 4.999288, mean_absolute_error: 28.285997, mean_q: 32.568596
 83618/100000: episode: 172, duration: 3.933s, episode steps: 1000, steps per second: 254, episode reward: -92.447, mean reward: -0.092 [-4.959, 4.638], mean action: 1.513 [0.000, 3.000], mean observation: 0.221 [-0.157, 1.413], loss: 4.790727, mean_absolute_error: 27.500744, mean_q: 31.580008
 84618/100000: episode: 173, duration: 4.514s, episode steps: 1000, steps per second: 222, episode reward: -68.476,

Average total reward for 200 games: -161.32143

# Task 3: Results, Comparison and Computation Evaluation

### Convolutional Neural Network
The lander generally lands upside down, which suggests overshooting or persistence in the action choice when firing left or right. It also tends to simply drop onto the platform. The image dataset was unbalanced, with far more images in classes 0 and 2, so we undersampled these two classes to bring them closer in size to the other two. We trained two models in this section with different architectures. Both performed poorly and produced similar results in the game, with average total rewards of -336.5 and -377.2, respectively.

Training the convolutional neural network is extremely slow compared to the reinforcement learning model: each epoch takes on average 1500 seconds (approximately 25 minutes) with a batch size of 32. After 4 epochs, these models reach approximately 65% training accuracy and 68-69% validation accuracy. These results suggest that reaching accuracy comparable to a supervised model trained on the state features is significantly more computationally expensive with image data, because of its high dimensionality (84x84x1 = 7,056 input values per frame), even after downsizing and conversion to grayscale.

### Reinforcement Learning
Reinforcement learning using Deep Q-Learning on the state data improves upon the results from the convolutional neural network. The Deep Q-Learning agent plays a more sophisticated game, combining actions that lead to an upright landing, a landing point closer to the target, and a slower descent.

We trained the Deep Q-Learning model with an epsilon-greedy policy for 100,000 steps, which took approximately 4-5 minutes. The model trained with the epsilon-greedy policy performs better than the convolutional neural network models, yielding an average total reward of -130.8.
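
An epsilon-greedy policy explores by taking a uniformly random action with probability epsilon and otherwise exploits the action with the highest predicted Q-value. The sketch below is a NumPy illustration of that rule, not keras-rl's `EpsGreedyQPolicy` itself; epsilon = 0.1 is assumed because, as far as we know, it is that class's default and the notebook does not override it, and the Q-values are made up. A random pick can also land on the greedy action, so with epsilon = 0.1 and four actions the greedy action is chosen about 92.5 percent of the time.

```python
import numpy as np

def eps_greedy_action(q_values, eps, rng):
    """Return a random action with probability eps, otherwise the argmax of q_values."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))  # explore: uniform over all actions
    return int(np.argmax(q_values))  # exploit: highest predicted Q-value

rng = np.random.default_rng(123)
q = np.array([0.1, 2.0, -1.0, 0.5])  # hypothetical Q-values for the 4 actions
actions = [eps_greedy_action(q, 0.1, rng) for _ in range(10_000)]
print('share of greedy action:', np.mean(np.array(actions) == 1))
```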

Overall, we consider Deep Q-Learning with an epsilon-greedy policy to be the best-performing model, since it takes less time to train and test and achieves a higher reward.