Deep learning neural networks are very easy to create and evaluate in Python with Keras, but you must follow a strict model life-cycle.

In this notebook you will discover the step-by-step life-cycle for creating, training, and evaluating Long Short-Term Memory (LSTM) Recurrent Neural Networks in Keras and how to make predictions with a trained model.

Below is an overview of the 5 steps in the LSTM model life-cycle in Keras that we are going to look at:

**Define Network**

**Compile Network**

**Fit Network**

**Evaluate Network**

**Make Predictions**

**Environment**
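This tutorial assumes you have a Python SciPy environment installed, along with Keras and a backend such as TensorFlow. The example was written to run on either Python 2 or 3, although current Keras releases support Python 3 only.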

**Step 1. Define Network**


Neural networks are defined in Keras as a sequence of layers. The container for these layers is the Sequential class. 

The first step is to create an instance of the Sequential class. Then you can create your layers and add them in the order that they should be connected. The LSTM recurrent layer comprised of memory units is called LSTM(). A fully connected layer that often follows LSTM layers and is used for outputting a prediction is called Dense().

For example, we can do this in two steps:

In [None]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(2))
model.add(Dense(1))

But we can also do this in one step by creating a list of layers and passing it to the constructor of the Sequential class.

In [None]:
layers = [LSTM(2), Dense(1)]
model = Sequential(layers)

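LSTM layers expect input data to be three-dimensional, with the dimensions ordered as samples, timesteps, and features. Assuming your data is loaded as a NumPy array, you can convert a 2D dataset to a 3D dataset using the reshape() function in NumPy. If you would like columns to become timesteps for one feature, you can use: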

In [None]:
data = data.reshape((data.shape[0], data.shape[1], 1))

If you would like columns in your 2D data to become features with one timestep, you can use:

In [None]:
data = data.reshape((data.shape[0], 1, data.shape[1]))
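
A quick way to confirm what the two reshapes do is to run them on a small array and inspect the resulting shapes. The sketch below assumes a made-up 2D array of 4 samples by 3 columns; the array contents and variable names are illustrative only.

```python
import numpy as np

# Hypothetical 2D dataset: 4 samples (rows), 3 columns
data = np.arange(12, dtype=float).reshape(4, 3)

# Columns become timesteps of a single feature: [samples, timesteps, features]
as_timesteps = data.reshape((data.shape[0], data.shape[1], 1))

# Columns become features of a single timestep
as_features = data.reshape((data.shape[0], 1, data.shape[1]))

print(as_timesteps.shape)  # (4, 3, 1)
print(as_features.shape)   # (4, 1, 3)
```

In both cases the values are unchanged; only the way they are grouped into timesteps and features differs.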

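The first layer in the network must define the shape of the input. You can do this with the input_shape argument, which expects a tuple containing the number of timesteps and the number of features; the number of samples is not specified. For example, for a univariate sequence with two timesteps and one feature per timestep: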

In [None]:
model = Sequential()
model.add(LSTM(5, input_shape=(2,1)))
model.add(Dense(1))
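
To get a feel for the size of this small network, the number of trainable weights can be worked out by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the helper names are made up for illustration and are not part of Keras.

```python
def lstm_param_count(units, features):
    # Four gates (input, forget, cell, output); each has
    # input weights, recurrent weights, and one bias per unit
    return 4 * (units * features + units * units + units)


def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output
    return inputs * outputs + outputs


lstm_params = lstm_param_count(units=5, features=1)
dense_params = dense_param_count(inputs=5, outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 140 6 146
```

These figures should agree with what model.summary() reports for the model defined above. Note that the number of timesteps does not affect the count, because the same weights are reused at every timestep.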

LSTM layers can be stacked by adding them to the Sequential model. Importantly, when stacking LSTM layers, we must output a sequence rather than a single value for each input so that the subsequent LSTM layer can have the required 3D input. We can do this by setting the return_sequences argument to True. For example:

In [None]:
model = Sequential()
model.add(LSTM(5, input_shape=(2,1), return_sequences=True))
model.add(LSTM(5))
model.add(Dense(1))
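
The effect of return_sequences on the data flowing between the stacked layers can be traced with a little shape arithmetic. The helper below is hypothetical and only mimics how the output shape is derived; None stands for the number of samples, which is not fixed when the model is defined.

```python
def lstm_output_shape(timesteps, units, return_sequences=False):
    # With return_sequences=True the layer emits one vector per timestep;
    # otherwise only the output of the last timestep is returned
    if return_sequences:
        return (None, timesteps, units)
    return (None, units)


first = lstm_output_shape(timesteps=2, units=5, return_sequences=True)
second = lstm_output_shape(timesteps=first[1], units=5)
print(first)   # (None, 2, 5)
print(second)  # (None, 5)
```

The first layer keeps the timestep dimension, so its output is still 3D and can feed the second LSTM layer. The second layer returns a single vector per sample, which is what the Dense layer expects.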

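Concerns that are traditionally associated with a layer can also be split out and added as separate layers, which makes their role in transforming the data explicit. For example, the activation function that transforms the summed signal from each neuron in a layer can be extracted and added to the Sequential model as a layer-like object called Activation.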

In [None]:
from keras.layers import Activation
model = Sequential()
model.add(LSTM(5, input_shape=(2,1)))
model.add(Dense(1))
model.add(Activation('sigmoid'))
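
For reference, the sigmoid function applied by the activation layer above squashes any real value into the range between 0 and 1. A minimal plain-Python sketch of the function itself (not of the Keras layer) looks like this:

```python
import math


def sigmoid(x):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


for value in (-2.0, 0.0, 2.0):
    print(value, round(sigmoid(value), 4))
# -2.0 0.1192
# 0.0 0.5
# 2.0 0.8808
```

Because the output lies between 0 and 1, a sigmoid on the final layer is a common choice when the prediction is interpreted as a probability.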

**Step 2. Compile Network**

Compilation is an efficiency step. It transforms the simple sequence of layers that we defined into a highly efficient series of matrix transforms in a format intended to be executed on your GPU or CPU, depending on how Keras is configured.

Compilation requires a number of parameters tailored to training your network. The two most important are the optimization algorithm used to train the network and the loss function that the optimization algorithm minimizes.

For example, below is a case of compiling a defined model and specifying the stochastic gradient descent (sgd) optimization algorithm and the mean squared error (mean_squared_error) loss function, intended for a regression type problem.



In [None]:
model.compile(optimizer='sgd', loss='mean_squared_error')

Alternately, the optimizer can be created and configured before being provided as an argument to the compilation step.

In [None]:
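from keras.optimizers import SGD
# older Keras releases name this argument lr instead of learning_rate
algorithm = SGD(learning_rate=0.1, momentum=0.3)
model.compile(optimizer=algorithm, loss='mean_squared_error')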
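
To see what the learning rate and momentum settings do, here is a sketch of the classical momentum update applied to a single weight. It uses a toy loss, w squared, chosen only for illustration; the exact update used inside Keras should be treated as an assumption rather than something this sketch proves.

```python
learning_rate = 0.1
momentum = 0.3

w = 1.0          # single weight, arbitrary starting value
velocity = 0.0   # running update direction
history = [w]

for step in range(20):
    grad = 2.0 * w  # gradient of the toy loss w**2
    # Blend the previous update with the new gradient step
    velocity = momentum * velocity - learning_rate * grad
    w = w + velocity
    history.append(w)

print(history[:3])
print(w)
```

The weight moves steadily toward 0, the minimum of the toy loss. A larger momentum carries more of the previous update into the next one.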

**Step 3. Fit Network**

Once the network is compiled, it can be fit, which means adapting the weights on a training dataset. Fitting the network requires the training data to be specified, both a matrix of input patterns, X, and an array of matching output patterns, y.

The network is trained using the backpropagation algorithm and optimized according to the optimization algorithm and loss function specified when compiling the model. The backpropagation algorithm requires that the network be trained for a specified number of epochs or exposures to the training dataset.

Training can take a long time, from seconds to hours to days depending on the size of the network and the size of the training data.

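Each epoch can be partitioned into groups of input-output pattern pairs called batches, set with the batch_size argument; the weights are updated after each batch. The fit() function returns a history object that summarizes the loss recorded each epoch. By default, a progress bar is displayed for each epoch. You can reduce the amount of information displayed to just the loss each epoch by setting the verbose argument to 2. You can turn off all output by setting verbose to 0.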

In [None]:
history = model.fit(X, y, batch_size=10, epochs=100)

In [None]:
history = model.fit(X, y, batch_size=10, epochs=100, verbose=0)
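
The batch_size and epochs arguments together determine how many weight updates are performed during training. The arithmetic can be sketched as follows; the training set size of 95 samples is a made-up figure, and the helper name is hypothetical.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    # The final batch may be smaller than batch_size, so round up
    return math.ceil(n_samples / batch_size)


n_samples = 95   # hypothetical training set size
batch_size = 10
epochs = 100

per_epoch = updates_per_epoch(n_samples, batch_size)
print(per_epoch)           # 10
print(per_epoch * epochs)  # 1000
```

A smaller batch size means more updates per epoch and usually a longer run time for the same number of epochs.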

**Step 4. Evaluate Network**

Once the network is trained, it can be evaluated. The network can be evaluated on the training data, but this will not provide a useful indication of the performance of the network as a predictive model, as it has seen all of this data before. Instead, we can evaluate the performance of the network on a separate dataset, unseen during training. This will provide an estimate of the performance of the network at making predictions for unseen data in the future.

The model evaluates the loss across all of the test patterns, as well as any other metrics specified when the model was compiled, like classification accuracy. A list of evaluation metrics is returned.

For example, for a model compiled with the accuracy metric, we could evaluate it on a new dataset as follows:

In [None]:
loss, accuracy = model.evaluate(X, y)

As with fitting the network, verbose output is provided to give an idea of the progress of evaluating the model. We can turn this off by setting the verbose argument to 0.

In [None]:
loss, accuracy = model.evaluate(X, y, verbose=0)
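
One simple way to obtain the separate evaluation dataset described at the start of this step is to hold back the last portion of the data. The sketch below assumes a made-up dataset and an 80/20 split; both are illustrative choices, not recommendations from this tutorial.

```python
import numpy as np

# Hypothetical dataset: 20 samples, 2 timesteps, 1 feature
X = np.arange(40, dtype=float).reshape(20, 2, 1)
y = np.arange(20, dtype=float)

# Keep the first 80% for training and the rest for evaluation;
# order is preserved, which is a common choice for sequence data
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(X_train.shape, X_test.shape)  # (16, 2, 1) (4, 2, 1)
```

The training portion would then be passed to model.fit() and the held-out portion to model.evaluate().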

**Step 5. Make Predictions**

Once we are satisfied with the performance of our fit model, we can use it to make predictions on new data. This is as easy as calling the predict() function on the model with an array of new input patterns.



In [None]:
predictions = model.predict(X)

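Alternately, for classification problems, older versions of Keras provide the predict_classes() function, which automatically converts probabilistic predictions to crisp integer class values. This function was removed in TensorFlow 2.6, so with newer versions you threshold the output of predict() yourself. For a model with a single sigmoid output:

In [None]:
# on older Keras versions: predictions = model.predict_classes(X)
predictions = (model.predict(X) > 0.5).astype('int32')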

As with fitting and evaluating the network, verbose output is provided to give an idea of the progress of the model making predictions. We can turn this off by setting the verbose argument to 0.

In [None]:
predictions = model.predict(X, verbose=0)
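
The conversion from predicted probabilities to crisp class values, mentioned earlier in this step, can also be done by hand with NumPy. The sketch below uses made-up probability arrays in place of real model output and assumes a 0.5 threshold for the binary case.

```python
import numpy as np

# Hypothetical sigmoid outputs for 4 samples (one probability each)
binary_probs = np.array([[0.1], [0.7], [0.5], [0.93]])
binary_classes = (binary_probs > 0.5).astype("int32")
print(binary_classes.ravel())  # [0 1 0 1]

# Hypothetical softmax outputs for 3 samples over 3 classes
multi_probs = np.array([[0.2, 0.5, 0.3],
                        [0.8, 0.1, 0.1],
                        [0.1, 0.2, 0.7]])
multi_classes = np.argmax(multi_probs, axis=-1)
print(multi_classes)  # [1 0 2]
```

Thresholding suits a single sigmoid output, while taking the argmax suits a softmax output with one column per class.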

**End-to-End Worked Example**

Let's tie all of this together with a small worked example.

In [1]:
# Example of LSTM to learn a sequence
from pandas import DataFrame
from pandas import concat
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# create sequence
length = 10
sequence = [i/float(length) for i in range(length)]
print(sequence)
# create X/y pairs
df = DataFrame(sequence)
df = concat([df.shift(1), df], axis=1)
df.dropna(inplace=True)
# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)
# 1. define network
model = Sequential()
model.add(LSTM(10, input_shape=(1,1)))
model.add(Dense(1))
# 2. compile network
model.compile(optimizer='adam', loss='mean_squared_error')
# 3. fit network
history = model.fit(X, y, epochs=1000, batch_size=len(X), verbose=0)
# 4. evaluate network
loss = model.evaluate(X, y, verbose=0)
print(loss)
# 5. make predictions
predictions = model.predict(X, verbose=0)
print(predictions[:, 0])

[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
9.56310541369021e-05
[0.12466948 0.20848265 0.29829723 0.3931416  0.4919188  0.5934638
 0.69660336 0.8002115  0.90325487]
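
As a sanity check, the reported loss can be recomputed by hand from the printed predictions. The sketch below copies the predictions from the output above and compares them with the targets, which are the sequence values 0.1 to 0.9; it uses plain Python only.

```python
# Targets are the sequence values 0.1 ... 0.9 from the example above
targets = [i / 10.0 for i in range(1, 10)]

# Predictions copied from the printed output
predictions = [0.12466948, 0.20848265, 0.29829723, 0.3931416, 0.4919188,
               0.5934638, 0.69660336, 0.8002115, 0.90325487]

# Mean squared error: average of the squared differences
squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
mse = sum(squared_errors) / len(squared_errors)
print(mse)  # roughly 9.56e-05
```

The result agrees with the loss printed by model.evaluate() to several significant digits. The weights are initialized randomly, so the exact numbers will differ from run to run.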
