There are three models to consider for this task:
1. The model from task 1, trained on the state vectors. It can be tested in the Lunar Lander game by running lunar_lander_ml_states_player.py, which saves the rewards to a csv file that is examined below. Alternatively, use lunarlander_ml_states_rewards.csv, which contains the rewards I obtained when I ran the code.
2. The model from task 2, trained on the image dataset. It can be tested in the Lunar Lander game by running lunar_lander_ml_images_player.py, which saves the rewards to a csv file that is examined below. Alternatively, use lunarlander_ml_images_rewards.csv, which contains the rewards I obtained when I ran the code.
3. The model from task 3. This is a deep Q-learning model and has already been tested as part of my task 3 evaluation. The results from that experiment are also compared below.

In [1]:
import csv
import numpy as np
from scipy.stats import ttest_ind
import pandas as pd

In [2]:
# States trained model
# Each row of the csv holds one episode reward in its first column
with open("lunarlander_ml_states_rewards.csv", "r") as f:
    result_array_state_vectors = np.array([float(row[0]) for row in csv.reader(f)])

In [3]:
m = np.mean(result_array_state_vectors)
s = np.std(result_array_state_vectors)
print("Supervised learning with state vectors achieves a mean reward of:", m, "with standard deviation:", s)

Supervised learning with state vectors achieves a mean reward of: 217.0581780573467 with standard deviation: 49.11343075739714


In [4]:
# Images trained model
# Each row of the csv holds one episode reward in its first column
with open("lunarlander_ml_images_rewards.csv", "r") as f:
    result_array_images = np.array([float(row[0]) for row in csv.reader(f)])

In [5]:
m2 = np.mean(result_array_images)
s2 = np.std(result_array_images)
print("Supervised learning with images achieves a mean reward of:", m2, "with standard deviation:", s2)

Supervised learning with images achieves a mean reward of: -314.3909846382998 with standard deviation: 149.00272316831354


In [6]:
# Load the rewards stored by the task 3 Jupyter notebook (reinforcement learning)
%store -r result_reinforcement_learning
m3 = np.mean(result_reinforcement_learning)
s3 = np.std(result_reinforcement_learning)
print("Reinforcement learning with state vectors achieves a mean reward of:", m3, "with standard deviation:", s3)

Reinforcement learning with state vectors achieves a mean reward of: 208.9124220234545 with standard deviation: 48.34306859394585


In [7]:
# Run a t-test to check whether the differences we observe are statistically significant
# State vectors vs images data
test = ttest_ind(result_array_state_vectors, result_array_images)
if test[1]<0.05:
    difference = "statistically significant"
else:
    difference = "not statistically significant"
print("State vectors vs Image dataset obtain t-statistic of", test[0], "and thus p-value of", test[1],". Thus the difference we observe is", difference, "at a 95% confidence level\n")

# State vectors vs reinforcement learning
test = ttest_ind(result_array_state_vectors, result_reinforcement_learning)
if test[1]<0.05:
    difference = "statistically significant"
else:
    difference = "not statistically significant"
print("State vectors vs reinforcement learning obtain t-statistic of", test[0], "and thus p-value of", test[1],". Thus the difference we observe is", difference, "at a 95% confidence level\n")

# Images data vs reinforcement learning
test = ttest_ind(result_array_images, result_reinforcement_learning)
if test[1]<0.05:
    difference = "statistically significant"
else:
    difference = "not statistically significant"
print("Images data vs reinforcement learning obtain t-statistic of", test[0], "and thus p-value of", test[1],". Thus the difference we observe is", difference, "at a 95% confidence level")

# Both comparisons involving the images model are significant; state vectors vs reinforcement learning is not


State vectors vs Image dataset obtain t-statistic of 47.7856712900519 and thus p-value of 5.846945195076932e-167 . Thus the difference we observe is statistically significant at a 95% confidence level

State vectors vs reinforcement learning obtain t-statistic of 1.6674336423101666 and thus p-value of 0.09621480890369682 . Thus the difference we observe is not statistically significant at a 95% confidence level

Images data vs reinforcement learning obtain t-statistic of -47.12516406233982 and thus p-value of 6.491117069447761e-165 . Thus the difference we observe is statistically significant at a 95% confidence level


In [8]:
# Summary
performance_ranking = pd.DataFrame(index = ["Reward"], columns=["Reinforcement Learning", "States trained model", "Images trained model"])
performance_ranking.loc["Reward", "Reinforcement Learning"] = m3
performance_ranking.loc["Reward", "States trained model"] = m
performance_ranking.loc["Reward", "Images trained model"] = m2
performance_ranking


        Reinforcement Learning  States trained model  Images trained model
Reward                 208.912               217.058              -314.391


### Document to describe results of the experiment

The first point to make when reporting the results is the ranking by mean reward. There were three approaches: two supervised models, trained on state vectors and images respectively, and a deep Q-learning model. The ranking by mean reward over 200 test runs is shown in the performance table above. The images trained model clearly performed poorly, with a mean reward of about -314; learning from raw images would need far more training to improve. The reinforcement learning model and the states trained model, however, both performed very well, with mean rewards of about 209 and 217. I applied an independent two-sample t-test (ttest_ind) to each pair of models. Both comparisons involving the images trained model are statistically significant, but the difference between the states trained model and the reinforcement learning model is not (p ≈ 0.096). I therefore cannot conclude that either of these two models outperforms the other, and I treat their performance as comparable.
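A non-significant t-test does not by itself show that two models are equal, so it can help to look at a confidence interval for the difference in mean reward. The sketch below is an illustration, not part of the original analysis: `mean_difference_ci` is a hypothetical helper, it assumes 200 episodes per model (the number of test runs mentioned above), and it uses a normal approximation (z = 1.96) instead of the exact t distribution. The means and standard deviations are the values printed earlier in this notebook.

```python
import math


def mean_difference_ci(mean_a, std_a, n_a, mean_b, std_b, n_b, z=1.96):
    """Approximate 95% confidence interval for mean_a - mean_b.

    std_a and std_b are population standard deviations (the np.std default,
    ddof=0), so they are rescaled to sample variances before use.
    """
    var_a = std_a ** 2 * n_a / (n_a - 1)
    var_b = std_b ** 2 * n_b / (n_b - 1)
    # Standard error of the difference between two independent means
    se = math.sqrt(var_a / n_a + var_b / n_b)
    diff = mean_a - mean_b
    return diff, se, (diff - z * se, diff + z * se)


# Summary statistics from the cells above; n = 200 per model is an assumption
diff, se, (low, high) = mean_difference_ci(
    217.0581780573467, 49.11343075739714, 200,  # states trained model
    208.9124220234545, 48.34306859394585, 200,  # reinforcement learning model
)
print("Difference in mean reward:", diff)
print("Approximate 95% confidence interval:", (low, high))
```

Under these assumptions the interval runs from roughly -1.4 to 17.7 reward points. It includes zero, which agrees with the t-test, but it also shows that the data are compatible with the states trained model being ahead by several points.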

Another key component of the assessment is the amount of computation required to train each model. I would make two points here. First, the computation and time required to train the reinforcement learning model to this level were far greater than for the states trained model. Second, this comparison doesn't account for the computation required to obtain such a high-quality data set in the first place. In fact, one could argue that, as the performance was above human standard, the only way to obtain such a good data set for training the states model is to first train a reinforcement learning model to that level. So although supervised learning has been successful here, it is unlikely to scale well to bigger environments.

A final interesting output from the experiment is the route the reinforcement learning model took to learning the best tactics. The model began picking up the physics of the game very quickly. By episode 20 it could last the full 1000 steps of an episode. It then spent approximately 200 episodes in this state, getting to grips with the physics and trying to stay in the air for as long as possible. Then, very suddenly (possibly due to ε-greedy exploration), the model stumbled across the fact that attempting to land quickly could lead to a far higher reward. Very quickly, episodes lasted only around 300 steps, with rewards jumping up by over 100. It is very interesting to see the model develop tactics by itself, helped by some randomness in the training phase. This would be interesting to explore in bigger environments.
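For readers unfamiliar with the exploration mechanism mentioned above, here is a minimal sketch of ε-greedy action selection with a decaying ε. It is not the code used in task 3: the function names, the decay schedule and all hyperparameter values are assumptions chosen for illustration. The only fixed detail is that Lunar Lander has four discrete actions.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: any action, regardless of its estimated value
        return rng.randrange(len(q_values))
    # Exploit: the action with the highest estimated Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(episode, start=1.0, end=0.01, decay=0.995):
    """Exponentially decay epsilon per episode, never dropping below `end`.

    The values of start, end and decay are hypothetical.
    """
    return max(end, start * decay ** episode)


# Hypothetical Q-value estimates for the four Lunar Lander actions
q_values = [0.1, 0.9, 0.3, 0.2]
for episode in (0, 200, 1000):
    eps = decayed_epsilon(episode)
    action = epsilon_greedy_action(q_values, eps)
    print("episode", episode, "epsilon", eps, "action", action)
```

Because a fraction ε of the actions is random, the agent occasionally tries something its current value estimates do not favour, such as descending early. That is one plausible way a sudden change in tactics like the one described above can appear during training.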
