# **Sample Tutorial: Testing your Models**

Hello teams! In order to help streamline the judging process for each of your models, this notebook will walk through how to format your models for easy and fair evaluation.

This sample tutorial uses the sample LSTM model from the training notebook to show how models will be tested. We hope this transparency helps teams understand exactly how models will be judged.

Let's begin by importing some basic libraries.

In [1]:
import h5py
import numpy as np
from datetime import datetime
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import LSTM, Dense, Reshape

We'll be using the first week of our original training dataset for testing. Let's open that data now.

Note: this will not be the actual dataset used for final model judging.

In [2]:
with h5py.File('datasets/training_data.h5', 'r') as f:
    # Access the trip dataset and their corresponding timestamps
    traffic_data = f['trip'][()]
    dates = f['timeslot'][()]

In [3]:
# 48 timeslots per day x 7 days = one week of data
test_set_size = 48 * 7

test_traffic_data = traffic_data[:test_set_size]
test_dates = dates[:test_set_size]
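
As a quick sanity check on the slicing above, here is a minimal sketch of the arithmetic. It assumes each timeslot covers 30 minutes (which is what 48 slots per day implies) and uses a made-up stand-in array whose per-sample shape `(2, 16, 8)` is inferred from the sample model's output layer, not read from the real dataset.

```python
import numpy as np

# Assumption: 30-minute timeslots, so 48 slots per day
SLOTS_PER_DAY = 48
DAYS_IN_TEST_SET = 7
sketch_test_size = SLOTS_PER_DAY * DAYS_IN_TEST_SET

# Hypothetical stand-in for traffic_data; the real array comes from the .h5 file
dummy_traffic = np.zeros((1000, 2, 16, 8))
dummy_test = dummy_traffic[:sketch_test_size]

print(sketch_test_size)   # 336
print(dummy_test.shape)   # (336, 2, 16, 8)
```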

With the data loaded, we can do any preprocessing needed for it to run in our models. If teams reshaped or adjusted their model's input data in any way, this code should be changed to reflect those adjustments.

The code below repeats the basic data handling from the model training tutorial: it parses each timestamp into year, month, day, hour, and minute, then reshapes the result into the input shape our model expects.

In [4]:
formatted_dates = []

for date_string in test_dates:
    # Timestamps are stored as byte strings, so decode before parsing
    formatted_date = datetime.strptime(date_string.decode(), '%Y%m%d%H%M')

    year = formatted_date.year
    month = formatted_date.month
    day = formatted_date.day
    hour = formatted_date.hour
    minute = formatted_date.minute

    formatted_dates.append(np.array([year, month, day, hour, minute]))

# Reshape to (samples, 5, 1) to match the model's input_shape of (5, 1)
test_dates = np.array(formatted_dates).reshape(test_set_size, 5, 1)
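
To make the parsing step easier to try in isolation, here is a small self-contained sketch of the same idea. The helper name `timestamp_to_features` and the two sample timestamps are made up for illustration; only the `'%Y%m%d%H%M'` format string comes from the cell above.

```python
from datetime import datetime

import numpy as np


def timestamp_to_features(raw):
    """Parse a byte-string timestamp into [year, month, day, hour, minute]."""
    parsed = datetime.strptime(raw.decode(), '%Y%m%d%H%M')
    return np.array([parsed.year, parsed.month, parsed.day,
                     parsed.hour, parsed.minute])


# Hypothetical timestamps, not taken from the real dataset
sample_timestamps = [b'202401150830', b'202401150900']

features = np.array([timestamp_to_features(t) for t in sample_timestamps])
features = features.reshape(len(sample_timestamps), 5, 1)

print(features.shape)        # (2, 5, 1)
print(features[0].ravel())   # [2024    1   15    8   30]
```

If your timestamps use a different layout, it is worth printing a few parsed values like this before running the full test set, since `strptime` may still succeed on strings it was not meant for.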

Now we'll define our model architecture and load the previously trained model. The architecture is shown for reference; `load_model` restores the full saved model, including its architecture and weights. If you have any questions about how we obtained the *lstm_model.keras* file, please review the previous LSTM model training tutorial, where we covered this in more depth.

Teams should replace the code snippet below to load their custom-built models.

In [5]:
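# Architecture of the sample LSTM model, shown for reference
model = Sequential()
model.add(LSTM(50, activation='tanh', input_shape=(5, 1)))
model.add(Dense(2 * 16 * 8, activation='linear'))
model.add(Reshape((2, 16, 8)))

# load_model restores the full saved model and replaces the object defined above
model = load_model('lstm_model.keras')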
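
Before evaluating, teams may find it useful to confirm that their test arrays match what the model expects. Here is a minimal sketch of such a check. The helper `check_shapes` is hypothetical, and its default shapes `(5, 1)` and `(2, 16, 8)` are those of the sample LSTM model in this notebook, so custom models would need their own values.

```python
import numpy as np


def check_shapes(inputs, targets, input_shape=(5, 1), output_shape=(2, 16, 8)):
    """Return a list of shape problems; an empty list means everything lines up."""
    problems = []
    if inputs.shape[1:] != tuple(input_shape):
        problems.append(
            f'inputs have per-sample shape {inputs.shape[1:]}, expected {tuple(input_shape)}')
    if targets.shape[1:] != tuple(output_shape):
        problems.append(
            f'targets have per-sample shape {targets.shape[1:]}, expected {tuple(output_shape)}')
    if inputs.shape[0] != targets.shape[0]:
        problems.append(
            f'{inputs.shape[0]} input samples but {targets.shape[0]} target samples')
    return problems


# Stand-in arrays with the sample model's shapes
good = check_shapes(np.zeros((336, 5, 1)), np.zeros((336, 2, 16, 8)))
bad = check_shapes(np.zeros((336, 5)), np.zeros((300, 2, 16, 8)))

print(good)       # []
print(len(bad))   # 2
```

In this notebook, the call would look like `check_shapes(test_dates, test_traffic_data)`.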

With the model correctly loaded, we can evaluate its performance on our "test set". Its RMSE gives us a quick idea of how the model performed.

In [6]:
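# evaluate returns the loss; this assumes the model was compiled with MSE loss and no extra metrics
mse = model.evaluate(test_dates, test_traffic_data)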

# Show rmse to see how model performs on the test set
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error: {rmse}')

Root Mean Squared Error: 7.137125238577685
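
For teams who want to cross-check the number above, here is a minimal sketch that computes RMSE directly with NumPy. It assumes the model was compiled with mean squared error as its loss, as the cell above implies. The function name `rmse` and the toy arrays are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error over every element of the two arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy example: only one of four values is off, by 2
toy_true = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_pred = np.array([[1.0, 2.0], [3.0, 6.0]])

print(rmse(toy_true, toy_pred))   # 1.0
```

In this notebook, `rmse(test_traffic_data, model.predict(test_dates))` should give a value close to the one reported by `model.evaluate`.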


Here we see our model's achieved RMSE. It's important for teams to remember that model performance will not be the sole judging criterion for this competition. Team presentations will also play a large role in determining overall placing.

This concludes this sample notebook. Teams are encouraged to send in their testing notebooks by 5pm EST on Friday, February 2nd.

If any teams have follow-up questions or need help with this notebook, please feel free to drop a question in the hackathon teams chat. We are more than happy to help answer any questions you might have!

# **Thank you and best of luck!**