# **Modeling the LunarLander Environment**
### CSCI 7000: Applied Deep Learning (Homework 2)

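**To run this notebook, you need OpenAI Gym with the Box2D extras. You can replicate my environment with the following `environment.yml` file for Anaconda. Note that `gym` itself is not pinned, while the code below uses the older Gym API (`env.seed()`, and `env.step()` returning four values), which later Gym releases (0.26 onward) replaced; pinning an older `gym` release may be necessary.**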

```yaml
name: lunarlander-modeling
channels:
  - conda-forge
dependencies:
  - python=3.7.*
  - tensorflow-gpu=2.1.*
  - numpy=1.18.*
  - scipy=1.4.*
  - nb_conda_kernels
  - pip
  - pip:
    - gym[box2d]
```

**The motivation behind this work is to explore a couple of ways to "model" the OpenAI Gym [LunarLander](https://gym.openai.com/envs/LunarLander-v2/) environment. The observations in this environment consist of 8 values associated with the lander:**
 - ***x* position,**
 - ***y* position,**
 - ***x* velocity,**
 - ***y* velocity,**
 - **angle,**
 - **angular velocity,**
 - **boolean variable indicating whether its left leg is in contact with the ground,**
 - **boolean variable indicating whether its right leg is in contact with the ground.**
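**For later analysis it can help to refer to these components by name instead of by index. The sketch below assumes the observation vector follows exactly the order listed above; `OBS_NAMES`, `OBS_INDEX` and `describe_observation` are hypothetical helpers, not part of Gym.**

```python
import numpy as np

# Assumed order of the 8 LunarLander-v2 observation components (as listed above).
OBS_NAMES = [
    "x_pos", "y_pos",
    "x_vel", "y_vel",
    "angle", "angular_vel",
    "left_leg_contact", "right_leg_contact",
]
OBS_INDEX = {name: i for i, name in enumerate(OBS_NAMES)}


def describe_observation(ob):
    """Pair each component of a single observation with its (assumed) name."""
    ob = np.asarray(ob, dtype=float)
    if ob.shape != (len(OBS_NAMES),):
        raise ValueError(f"expected shape ({len(OBS_NAMES)},), got {ob.shape}")
    return dict(zip(OBS_NAMES, ob.tolist()))


print(OBS_INDEX["y_vel"])  # 3
```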
 
**Specifically, I'm interested in finding out if we can predict with some accuracy the next state of the lander *despite* not knowing what the intervening action is. This might seem nonsensical at first glance: if we don't know what action is going to be taken, how are we going to know the next state?**

**However, I suspect that a given state might contain enough information to make a decent prediction of the next state. While the actions taken might change the lander's velocities, they only do so incrementally. For instance, if the lander has a large negative *y* velocity at a given timestep, its *y* position will decrease significantly by the next timestep regardless of what action was taken.**
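**As a sanity check on this intuition, here is a toy one-dimensional analogue (not the actual LunarLander dynamics): the velocity drifts by a small random acceleration each step, standing in for the unknown action, and we predict the next position without ever seeing that acceleration. The function name and the parameters `max_accel` and `seed` are my own assumptions for illustration.**

```python
import numpy as np


def toy_next_position_errors(n_steps=10_000, max_accel=0.01, seed=0):
    """Toy 1-D kinematics: compare two next-position predictors that never
    see the per-step acceleration (a stand-in for the unknown action)."""
    rng = np.random.default_rng(seed)
    # Bounded random acceleration each step.
    accel = rng.uniform(-max_accel, max_accel, size=n_steps)
    vel = rng.uniform(-1.0, 1.0) + np.cumsum(accel)
    # x[t + 1] = x[t] + v[t + 1], with a time step of 1.
    pos = np.cumsum(vel)

    target = pos[1:]
    hold_last = pos[:-1]               # predict "nothing moves"
    extrapolate = pos[:-1] + vel[:-1]  # carry the current velocity forward
    return {
        "hold_last_mae": float(np.mean(np.abs(target - hold_last))),
        "extrapolate_mae": float(np.mean(np.abs(target - extrapolate))),
    }


print(toy_next_position_errors())
```

**Carrying the current velocity forward leaves an error bounded by `max_accel`, while ignoring the velocity leaves an error on the order of the velocity itself, which is the argument above in miniature.**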

**Alright, let's get started. First, we import some libraries.**

In [1]:
import numpy as np
import gym

import tensorflow as tf
from tensorflow.keras import layers

**Now, we generate 100,000 episodes' worth of data from the LunarLander environment. Every action is sampled uniformly at random from the environment's four discrete actions, so no policy is involved.**

In [2]:
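env = gym.make('LunarLander-v2')

# env.seed() only seeds the environment; the action space has its own RNG,
# so seed it too to make the sampled actions reproducible.
env.seed(42)
env.action_space.seed(42)

episode_count = 100000

data = []  # one array of shape (episode length, 8) per episode

for i in range(episode_count):
    episode_obs = [env.reset()]  # keep the initial observation as well
    done = False
    while not done:
        action = env.action_space.sample()  # uniformly random action
        ob, reward, done, _ = env.step(action)
        episode_obs.append(ob)
    data.append(np.array(episode_obs))
    if i % 5000 == 0:
        print("Completed:", i)

env.close()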



Completed: 0
Completed: 5000
Completed: 10000
Completed: 15000
Completed: 20000
Completed: 25000
Completed: 30000
Completed: 35000
Completed: 40000
Completed: 45000
Completed: 50000
Completed: 55000
Completed: 60000
Completed: 65000
Completed: 70000
Completed: 75000
Completed: 80000
Completed: 85000
Completed: 90000
Completed: 95000


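**Now, we create training and test data from the generated data, splitting by episode: the first 80,000 episodes are used for training and the remaining 20,000 for testing. I'm taking a naive approach here and simply cutting each episode into overlapping sequences of 10 consecutive observations. The idea is that, given 9 observations in a row, we will try to predict the 10th.**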

In [3]:
MAX_SEQ_LEN = 10
STATE_DIM = env.observation_space.shape[0]

In [4]:
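def generate_fixed_seq_data(data):
    """Cut every episode into overlapping windows of MAX_SEQ_LEN consecutive observations."""
    fixed_seq_data = []
    for episode in data:
        # Window starts run from 0 to len(episode) - MAX_SEQ_LEN - 1, so the
        # episode's final window is not included (use + 1 to keep it).
        for i in range(len(episode) - MAX_SEQ_LEN):
            fixed_seq_data.append(episode[i:(i + MAX_SEQ_LEN)])

    return np.array(fixed_seq_data)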
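**The loop above creates one small array per window in Python, which is slow for millions of windows. A vectorized alternative builds all window indices of an episode at once with NumPy integer-array indexing (`np.lib.stride_tricks.sliding_window_view` would also work, but it needs NumPy 1.20, newer than the pinned 1.18). It keeps the same `range(len(episode) - seq_len)` window starts, so it should return an identical array. `generate_fixed_seq_data_vectorized` is a hypothetical name, and I haven't benchmarked it against the loop.**

```python
import numpy as np


def generate_fixed_seq_data_vectorized(data, seq_len=10):
    """Same windows as the loop version: starts 0 .. len(episode) - seq_len - 1."""
    windows = []
    for episode in data:
        episode = np.asarray(episode)
        n_windows = len(episode) - seq_len
        if n_windows <= 0:
            continue  # too short: the loop version produces no windows either
        # idx[i, j] = i + j, so episode[idx][i] == episode[i:i + seq_len]
        idx = np.arange(n_windows)[:, None] + np.arange(seq_len)[None, :]
        windows.append(episode[idx])  # shape (n_windows, seq_len, state_dim)
    if not windows:
        raise ValueError("no episode is long enough to produce a window")
    return np.concatenate(windows, axis=0)
```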

In [5]:
train_data = generate_fixed_seq_data(data[:80000])
test_data = generate_fixed_seq_data(data[80000:])

In [6]:
X_train, y_train = train_data[:, :-1], train_data[:, -1]
X_test, y_test = test_data[:, :-1], test_data[:, -1]

In [7]:
print("X_train:", X_train.shape)
print("y_train:", y_train.shape)
print("X_test:", X_test.shape)
print("y_test:", y_test.shape)

X_train: (6702422, 9, 8)
y_train: (6702422, 8)
X_test: (1672840, 9, 8)
y_test: (1672840, 8)
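**These shapes also give a rough idea of how long the random-policy episodes are. Since an episode with T observations yields T - 10 windows under the loop above, the mean T is the window count divided by the episode count, plus 10. This assumes no episode has fewer than 10 observations; shorter ones contribute zero windows rather than a negative count, and would push the estimate up. The helper name is hypothetical.**

```python
def mean_observations_per_episode(n_windows, n_episodes, seq_len=10):
    """An episode with T >= seq_len observations yields T - seq_len windows,
    so mean T = n_windows / n_episodes + seq_len (assuming no shorter episodes)."""
    return n_windows / n_episodes + seq_len


train_mean_T = mean_observations_per_episode(6702422, 80000)
test_mean_T = mean_observations_per_episode(1672840, 20000)
print(f"train: {train_mean_T:.2f}, test: {test_mean_T:.2f}")  # train: 93.78, test: 93.64
```

**So a random policy typically ends an episode after roughly 94 observations, i.e. about 93 `env.step()` calls, since each episode's array also includes the observation returned by `env.reset()`.**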


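**Let's define and train a simple LSTM model: a single 128-unit LSTM layer reads the sequence of 9 observations, and a linear `Dense` layer maps its final output to the 8 values of the predicted next state.**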

In [8]:
lstm_model = tf.keras.Sequential()
lstm_model.add(layers.LSTM(128, input_shape=X_train.shape[-2:]))
lstm_model.add(layers.Dense(STATE_DIM))

lstm_model.compile(loss='mse',
                   optimizer=tf.keras.optimizers.Adam(1e-4),
                   metrics=['mae', 'mse'])

In [9]:
lstm_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128)               70144     
_________________________________________________________________
dense (Dense)                (None, 8)                 1032      
Total params: 71,176
Trainable params: 71,176
Non-trainable params: 0
_________________________________________________________________
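**The parameter count in the summary can be checked by hand. A Keras `LSTM` layer has four gates, each with an input kernel, a recurrent kernel and a bias, so it holds `4 * (input_dim * units + units * units + units)` weights; the output layer adds 128 × 8 weights plus 8 biases. The helper name is hypothetical.**

```python
def lstm_param_count(input_dim, units):
    """Keras LSTM: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (input_dim * units + units * units + units)


lstm_params = lstm_param_count(8, 128)  # 70144, as in the summary
dense_head_params = 128 * 8 + 8         # 1032: weights plus biases of the output layer
print(lstm_params + dense_head_params)  # 71176
```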


In [10]:
lstm_model.fit(X_train, y_train,
               validation_split=0.2,
               batch_size=128,
               epochs=10)

Train on 5361937 samples, validate on 1340485 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7faea8022e50>
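**The `Train on 5361937 samples, validate on 1340485 samples` line follows from how Keras applies `validation_split`: the Keras documentation states that the validation data is taken from the last samples of the arrays, before shuffling. A sketch of the size arithmetic, assuming the split index is `int(n_samples * (1 - validation_split))` (this reproduces the logged numbers, but the helper itself is hypothetical):**

```python
def keras_validation_split_sizes(n_samples, validation_split):
    """Hypothetical helper: (train, validation) sizes, assuming Keras splits at
    int(n_samples * (1 - validation_split)) and validates on the tail."""
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


train_n, val_n = keras_validation_split_sizes(6702422, 0.2)
print(train_n, val_n)  # 5361937 1340485
```

**Because the windows are stored episode by episode, the validation set is roughly the last 16,000 of the 80,000 training episodes, and at most one episode has windows on both sides of the split.**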

**To have some basis of comparison, let's define and train a simple fully-connected model with roughly the same number of trainable parameters as the LSTM model (78,344 vs. 71,176). It flattens the 9 observations into a single 72-value input.**

In [11]:
dense_model = tf.keras.Sequential()
dense_model.add(layers.Flatten(input_shape=X_train.shape[-2:]))
dense_model.add(layers.Dense(384, activation='relu'))
dense_model.add(layers.Dense(128, activation='relu'))
dense_model.add(layers.Dense(STATE_DIM))

dense_model.compile(loss='mse',
                    optimizer=tf.keras.optimizers.Adam(1e-4),
                    metrics=['mae', 'mse'])

In [12]:
dense_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 72)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 384)               28032     
_________________________________________________________________
dense_2 (Dense)              (None, 128)               49280     
_________________________________________________________________
dense_3 (Dense)              (None, 8)                 1032      
Total params: 78,344
Trainable params: 78,344
Non-trainable params: 0
_________________________________________________________________
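**Checking the parameter counts the same way: a `Dense` layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, and `Flatten` turns each (9, 8) input into 72 features. The sketch also covers the single-observation model defined next. `mlp_param_count` is a hypothetical helper.**

```python
def mlp_param_count(layer_sizes):
    """Parameters of stacked Dense layers, given [input_dim, hidden_1, ..., output_dim]:
    each layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


flattened_dim = 9 * 8  # Flatten turns each (9, 8) input into 72 features
print(mlp_param_count([flattened_dim, 384, 128, 8]))  # 78344 (dense model)
print(mlp_param_count([8, 256, 8]))                   # 4360 (single-observation model)
```

**So the dense model has about 10% more parameters than the LSTM (78,344 vs. 71,176).**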


In [13]:
dense_model.fit(X_train, y_train,
                validation_split=0.2,
                batch_size=128,
                epochs=10)

Train on 5361937 samples, validate on 1340485 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fadac1e7950>

**Let's also try a fully-connected model that uses only the most recent observation to make a prediction (instead of the sequence of 9 observations). Note that this model is much smaller than the other two, with only 4,360 trainable parameters.**

In [14]:
X_train_single = X_train[:, -1, :]
X_test_single = X_test[:, -1, :]

In [15]:
dense_single_model = tf.keras.Sequential()
dense_single_model.add(layers.Dense(256, activation='relu', input_dim=X_train_single.shape[-1]))
dense_single_model.add(layers.Dense(STATE_DIM))

dense_single_model.compile(loss='mse',
                           optimizer=tf.keras.optimizers.Adam(1e-4),
                           metrics=['mae', 'mse'])

In [16]:
dense_single_model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 256)               2304      
_________________________________________________________________
dense_5 (Dense)              (None, 8)                 2056      
Total params: 4,360
Trainable params: 4,360
Non-trainable params: 0
_________________________________________________________________


In [17]:
dense_single_model.fit(X_train_single, y_train,
                       validation_split=0.2,
                       batch_size=128,
                       epochs=10)

Train on 5361937 samples, validate on 1340485 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fadac074d50>

**Finally, let's look at the results on the held-out test episodes. I also compare the models to a naive baseline that simply "predicts" that the next state equals the last observed state.**

In [18]:
y_lstm_pred = lstm_model.predict(X_test)

print("LSTM Network")
print("MSE:", np.mean(tf.keras.losses.mean_squared_error(y_test, y_lstm_pred)))
print("MAE:", np.mean(tf.keras.losses.mean_absolute_error(y_test, y_lstm_pred)))

LSTM Network
MSE: 0.0036692955
MAE: 0.0128733255
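**A single MSE value pools all 8 components, including the two leg-contact flags (which LunarLander reports as 0.0 or 1.0), so it can hide which parts of the state are hard to predict. The sketch below is a hypothetical helper, not something run in this notebook, that breaks the errors down per component. Averaging its per-component values over the 8 components gives the same overall MSE and MAE as the cells here, since every sample has the same number of components.**

```python
import numpy as np


def per_dimension_errors(y_true, y_pred, names):
    """Return {component name: (MSE, MAE)}, each averaged over all samples."""
    err = np.asarray(y_true, dtype=np.float64) - np.asarray(y_pred, dtype=np.float64)
    if err.ndim != 2 or err.shape[1] != len(names):
        raise ValueError("expected arrays of shape (n_samples, len(names))")
    mse = np.mean(err ** 2, axis=0)     # one value per component
    mae = np.mean(np.abs(err), axis=0)
    return {name: (float(m), float(a)) for name, m, a in zip(names, mse, mae)}
```

**With the arrays above it would be called as, for example, `per_dimension_errors(y_test, y_lstm_pred, names)`, where `names` lists the 8 components in observation order.**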


In [19]:
y_dense_pred = dense_model.predict(X_test)

print("Dense Network")
print("MSE:", np.mean(tf.keras.losses.mean_squared_error(y_test, y_dense_pred)))
print("MAE:", np.mean(tf.keras.losses.mean_absolute_error(y_test, y_dense_pred)))

Dense Network
MSE: 0.0034199469
MAE: 0.013440048


In [20]:
y_dense_single_pred = dense_single_model.predict(X_test_single)

print("Dense Network (Single Observation)")
print("MSE:", np.mean(tf.keras.losses.mean_squared_error(y_test, y_dense_single_pred)))
print("MAE:", np.mean(tf.keras.losses.mean_absolute_error(y_test, y_dense_single_pred)))

Dense Network (Single Observation)
MSE: 0.0042579677
MAE: 0.014133502


In [21]:
y_naive_baseline = X_test_single

print("Naive Baseline")
print("MSE:", np.mean(tf.keras.losses.mean_squared_error(y_test, y_naive_baseline)))
print("MAE:", np.mean(tf.keras.losses.mean_absolute_error(y_test, y_naive_baseline)))

Naive Baseline
MSE: 0.0066404184
MAE: 0.015631419
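**To put the four results side by side, it helps to express each model's error as the fraction of the naive baseline's error it removes, 1 - (model error / baseline error). The sketch below only does arithmetic on the test-set numbers printed above; the names are hypothetical.**

```python
# Test-set errors copied from the outputs above.
RESULTS = {
    "LSTM": {"mse": 0.0036692955, "mae": 0.0128733255},
    "Dense (9 obs)": {"mse": 0.0034199469, "mae": 0.013440048},
    "Dense (1 obs)": {"mse": 0.0042579677, "mae": 0.014133502},
}
NAIVE_BASELINE = {"mse": 0.0066404184, "mae": 0.015631419}


def skill_vs_baseline(model_err, baseline_err):
    """Fraction of the baseline's error removed: 1.0 is perfect, 0.0 is no better."""
    return 1.0 - model_err / baseline_err


skills = {
    name: {m: skill_vs_baseline(errs[m], NAIVE_BASELINE[m]) for m in ("mse", "mae")}
    for name, errs in RESULTS.items()
}

for name, s in skills.items():
    print(f"{name:<14} MSE skill: {s['mse']:.3f}  MAE skill: {s['mae']:.3f}")
```

**By this measure the nine-observation dense network removes the largest share of the baseline's MSE (about 48%), the LSTM removes the largest share of its MAE (about 18%), and the single-observation network trails both on each metric while still beating the baseline.**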
