# Long Short-Term Memory Network

The Long Short-Term Memory network, or LSTM network, is a recurrent neural network that is trained using Backpropagation Through Time and is designed to overcome the vanishing gradient problem. LSTM networks have memory blocks that are connected through layers.

The question we want to answer is: given the log error of this month, what is the log error (log(Zestimate) - log(sale price)) next month?

We can write a simple function to convert our single column of data into a two-column dataset: the first column contains this month's (t) log error and the second column contains next month's (t+1) log error, which is the value to be predicted.
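As a rough illustration of this framing, the sketch below pairs each value with its successor. The series `toy` and its numbers are made up for the example and are not Zillow data.

```python
import numpy as np

# Hypothetical toy series standing in for consecutive log-error values.
toy = np.array([0.10, -0.02, 0.05, 0.01])

# Column 0 holds the value at time t, column 1 the value at time t+1.
pairs = np.column_stack([toy[:-1], toy[1:]])
print(pairs)
```

Each row is one training example: the first entry is the input and the second is the target.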

Before we get started, let’s first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import numpy as np
import pandas as pd
import math
from collections import defaultdict
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder
from keras.layers import Dropout, BatchNormalization

We fix the random number seed so that our results are reproducible.

In [3]:
# fix random seed for reproducibility
np.random.seed(7)
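A small sketch of what seeding does, using only NumPy: re-seeding the global generator restarts the same pseudo-random stream. The helper `draw` is a hypothetical name introduced for this example. Note that `np.random.seed` only controls NumPy's generator, so the Keras weight initialization may still vary between runs unless the backend is seeded as well.

```python
import numpy as np

def draw(seed, n=3):
    # Re-seeding NumPy's global generator restarts its pseudo-random stream.
    np.random.seed(seed)
    return np.random.rand(n)

first = draw(7)
second = draw(7)
print(np.array_equal(first, second))  # same seed, same draws

# Assumption: in recent Keras releases, keras.utils.set_random_seed(7)
# can be called as well so that weight initialization is seeded too.
```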

In [5]:
train = pd.read_csv("/content/drive/MyDrive/zillow-prize-1/train_2016_v2.csv", parse_dates=["transactiondate"])
prop = pd.read_csv('/content/drive/MyDrive/zillow-prize-1/properties_2016.csv')
sample = pd.read_csv('/content/drive/MyDrive/zillow-prize-1/sample_submission.csv')

print('Fitting Label Encoder on properties')
for c in prop.columns:
    prop[c]=prop[c].fillna(-1)
    if prop[c].dtype == 'object':
        lbl = LabelEncoder()
        lbl.fit(list(prop[c].values))
        prop[c] = lbl.transform(list(prop[c].values))

#Create df_train and x_train y_train from that
print('Creating training set:')
df_train = train.merge(prop, how='left', on='parcelid')

Fitting Label Encoder on properties
Creating training set:


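LSTMs are sensitive to the scale of the input data, particularly when the sigmoid or tanh activation functions are used. In Keras, the LSTM layer uses tanh as its default activation and sigmoid as its default recurrent (gate) activation.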

So, we rescale the data to the range 0 to 1, also called normalizing. We can easily normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library.

In [6]:
# fillna returns a new DataFrame, so assign the result back
df_train = df_train.fillna(-1.0)
dataset = df_train[['logerror']]
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
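To make the rescaling concrete, here is a minimal sketch of the min-max arithmetic written by hand in NumPy. The values in `x` are placeholders, not the real `logerror` column.

```python
import numpy as np

# Hypothetical log-error values; the real column comes from df_train.
x = np.array([[-0.5], [0.0], [0.25], [1.5]])

x_min, x_max = x.min(axis=0), x.max(axis=0)
scaled = (x - x_min) / (x_max - x_min)        # maps the column onto [0, 1]
restored = scaled * (x_max - x_min) + x_min   # the inverse mapping

print(scaled.ravel())
print(np.allclose(restored, x))
```

The inverse mapping is what lets us report errors in the original log-error units later on. One design point worth considering: fitting the scaler on the training portion only would avoid leaking the test set's minimum and maximum into training.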

After we model our data and estimate the skill of our model on the training dataset, we need to get an idea of the skill of the model on new, unseen data.
For a normal classification or regression problem, we would do this using cross-validation.

With time series data, the sequence of values is important, so we should not shuffle the observations.
A simple method that we can use is to split the ordered dataset into train and test datasets.
The code below calculates the index of the split point and separates the data into a training dataset with the first 90% of the observations, which we use to train our model, leaving the remaining 10% for testing the model.

In [7]:
# split into train and test sets
train_size = int(len(dataset) * 0.90)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))

81247 9028
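The same split can be wrapped in a small helper. This is only a sketch: the function name `chronological_split` and the placeholder `series` are introduced here for illustration.

```python
import numpy as np

def chronological_split(values, train_fraction=0.90):
    # Keep the original order: the first block trains, the tail tests.
    split = int(len(values) * train_fraction)
    return values[:split], values[split:]

series = np.arange(20).reshape(-1, 1)  # placeholder for the scaled data
head, tail = chronological_split(series)
print(len(head), len(tail))
```

Because the rows are never shuffled, every test observation comes after every training observation.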


We define a function to create the new dataset.

The function takes two arguments: the dataset, a NumPy array that we want to convert into a dataset, and look_back, the number of previous time steps to use as input variables to predict the next time period (1 by default).

This default creates a dataset where X is the log error at a given time (t) and Y is the log error at the next time (t + 1).

In [8]:
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [],[]
    for i in range(len(dataset)-look_back):
        a = dataset[i:(i+look_back), :]
        dataX.append(a)
        dataY.append(dataset[i + look_back, :])
    return np.array(dataX), np.array(dataY)
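To see what the function returns, the sketch below runs it on a tiny made-up array with two different look_back values. The function body is repeated so the example runs on its own, and `toy` is a placeholder.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    # Same windowing logic as the function above, repeated so this runs alone.
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:(i + look_back), :])
        dataY.append(dataset[i + look_back, :])
    return np.array(dataX), np.array(dataY)

toy = np.arange(6, dtype=float).reshape(-1, 1)  # values 0..5, one feature

X1, y1 = create_dataset(toy, look_back=1)
X3, y3 = create_dataset(toy, look_back=3)
print(X1.shape, y1.shape)
print(X3.shape, y3.shape)
print(X3[0].ravel(), y3[0])
```

Each input window holds look_back consecutive values, and the target is the value immediately after the window.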

We create a function build_model which takes the parameters train, test, look_back, activation, optimizer, epochs and loss as arguments.

The LSTM network expects the input data (X) to be provided with a specific array structure in the form of: [samples, time steps, features].

The arrays returned by create_dataset have the form [samples, look_back, 1], and we are framing the problem as one time step for each sample, with the look_back values treated as features. We can transform the prepared train and test input data into the expected structure, [samples, 1, look_back], using numpy.reshape().
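The reshape step can be checked in isolation. The sketch below uses a placeholder array `windows` shaped like the windowed input and applies the same call pattern that build_model uses.

```python
import numpy as np

look_back = 3
# Placeholder windows shaped like create_dataset output: (samples, look_back, 1).
windows = np.arange(12, dtype=float).reshape(4, look_back, 1)

# Same call pattern as in build_model: one time step, look_back features.
as_features = np.reshape(windows, (windows.shape[0], 1, windows.shape[1]))
print(windows.shape, "->", as_features.shape)
```

With look_back = 1, as in this notebook, both shapes are (samples, 1, 1), so the reshape leaves the data unchanged.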

We now design and fit our LSTM network for this problem.
The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks or neurons, and an output layer that makes a single value prediction. The activation function for the LSTM blocks is passed in as an argument. In the runs below, the network is trained for 100 epochs with a batch size of 256.
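As a sanity check on how small this network is, the following sketch counts its trainable parameters using the standard formulas for an LSTM layer and a dense layer with biases. The helper names are hypothetical.

```python
def lstm_param_count(units, input_dim):
    # Four gates (input, forget, cell, output), each with input weights,
    # recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=4, input_dim=1)
dense_params = dense_param_count(inputs=4, outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)
```

Calling `model.summary()` on the fitted model should report comparable totals for the two layers.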

In [9]:
look_back = 1
def build_model(train,test,look_back,activation,optimizer,epochs,loss):
    # reshape into X=t and Y=t+1
    trainX, trainY = create_dataset(train, look_back)
    testX, testY = create_dataset(test, look_back)
    # reshape input to be [samples, time steps, features]
    trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
    testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
    # create and fit the LSTM network
    model = Sequential()
    model.add(LSTM(4, input_shape=(1, look_back),activation = activation))
    model.add(Dense(1))
    model.compile(loss=loss, optimizer=optimizer)
    model.fit(trainX, trainY, epochs=epochs, batch_size=256, verbose=2)
    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    # invert predictions
    trainPredict = scaler.inverse_transform(trainPredict)
    trainY = scaler.inverse_transform(trainY)
    testPredict = scaler.inverse_transform(testPredict)
    testY = scaler.inverse_transform(testY)
    # calculate root mean squared error
    trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
    print('Test Score: %.2f RMSE' % (testScore))

The function build_model is called several times with different activation functions (sigmoid, relu, tanh), optimizers (adam, adagrad), and loss functions (mean_squared_error, hinge, logcosh), each with epochs=100.

In [10]:
build_model(train,test,1,'sigmoid','adam',100,'mean_squared_error')

Epoch 1/100
318/318 - 2s - 8ms/step - loss: 0.1380
Epoch 2/100
318/318 - 1s - 2ms/step - loss: 0.0019
Epoch 3/100
318/318 - 1s - 2ms/step - loss: 3.0286e-04
Epoch 4/100
318/318 - 1s - 2ms/step - loss: 3.0225e-04
Epoch 5/100
318/318 - 1s - 2ms/step - loss: 3.0225e-04
Epoch 6/100
318/318 - 1s - 2ms/step - loss: 3.0225e-04
Epoch 7/100
318/318 - 1s - 2ms/step - loss: 3.0225e-04
Epoch 8/100
318/318 - 1s - 2ms/step - loss: 3.0224e-04
Epoch 9/100
318/318 - 1s - 2ms/step - loss: 3.0226e-04
Epoch 10/100
318/318 - 1s - 2ms/step - loss: 3.0227e-04
Epoch 11/100
318/318 - 1s - 2ms/step - loss: 3.0231e-04
Epoch 12/100
318/318 - 1s - 2ms/step - loss: 3.0229e-04
Epoch 13/100
318/318 - 1s - 2ms/step - loss: 3.0228e-04
Epoch 14/100
318/318 - 1s - 2ms/step - loss: 3.0227e-04
Epoch 15/100
318/318 - 1s - 3ms/step - loss: 3.0233e-04
Epoch 16/100
318/318 - 1s - 4ms/step - loss: 3.0229e-04
Epoch 17/100
318/318 - 1s - 3ms/step - loss: 3.0234e-04
Epoch 18/100
318/318 - 1s - 4ms/step - loss: 3.0230e-04
Epoch 19/100
318/318 

We can see that, with the sigmoid activation, the model has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset. Of all the combinations tried, this set of activation, loss, and optimizer functions is among those that give the most accurate predictions of the log error of home values.

In [11]:
build_model(train,test,1,'relu','adam',100,'mean_squared_error')

Epoch 1/100
318/318 - 2s - 6ms/step - loss: 0.1037
Epoch 2/100
318/318 - 1s - 2ms/step - loss: 0.0018
Epoch 3/100
318/318 - 1s - 2ms/step - loss: 3.1878e-04
Epoch 4/100
318/318 - 1s - 5ms/step - loss: 3.1866e-04
Epoch 5/100
318/318 - 1s - 3ms/step - loss: 3.1846e-04
Epoch 6/100
318/318 - 1s - 4ms/step - loss: 3.1819e-04
Epoch 7/100
318/318 - 1s - 2ms/step - loss: 3.1781e-04
Epoch 8/100
318/318 - 1s - 4ms/step - loss: 3.1707e-04
Epoch 9/100
318/318 - 1s - 2ms/step - loss: 3.1550e-04
Epoch 10/100
318/318 - 1s - 2ms/step - loss: 3.1489e-04
Epoch 11/100
318/318 - 1s - 2ms/step - loss: 3.1455e-04
Epoch 12/100
318/318 - 1s - 2ms/step - loss: 3.1422e-04
Epoch 13/100
318/318 - 1s - 2ms/step - loss: 3.1382e-04
Epoch 14/100
318/318 - 1s - 2ms/step - loss: 3.1345e-04
Epoch 15/100
318/318 - 1s - 2ms/step - loss: 3.1305e-04
Epoch 16/100
318/318 - 1s - 2ms/step - loss: 3.1256e-04
Epoch 17/100
318/318 - 1s - 2ms/step - loss: 3.1208e-04
Epoch 18/100
318/318 - 1s - 2ms/step - loss: 3.1168e-04
Epoch 19/100
318/318 

With the relu activation, the model again has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset, so this combination predicts the log error of home values as accurately as the sigmoid run.

In [12]:
build_model(train,test,1,'tanh','adam',100,'mean_squared_error')

Epoch 1/100
318/318 - 2s - 7ms/step - loss: 0.0415
Epoch 2/100
318/318 - 1s - 2ms/step - loss: 3.3107e-04
Epoch 3/100
318/318 - 1s - 2ms/step - loss: 3.3080e-04
Epoch 4/100
318/318 - 1s - 2ms/step - loss: 3.3057e-04
Epoch 5/100
318/318 - 1s - 2ms/step - loss: 3.3027e-04
Epoch 6/100
318/318 - 1s - 2ms/step - loss: 3.2985e-04
Epoch 7/100
318/318 - 1s - 2ms/step - loss: 3.2938e-04
Epoch 8/100
318/318 - 1s - 2ms/step - loss: 3.2885e-04
Epoch 9/100
318/318 - 1s - 2ms/step - loss: 3.2824e-04
Epoch 10/100
318/318 - 1s - 2ms/step - loss: 3.2760e-04
Epoch 11/100
318/318 - 1s - 4ms/step - loss: 3.2670e-04
Epoch 12/100
318/318 - 1s - 3ms/step - loss: 3.2580e-04
Epoch 13/100
318/318 - 1s - 4ms/step - loss: 3.2480e-04
Epoch 14/100
318/318 - 1s - 3ms/step - loss: 3.2370e-04
Epoch 15/100
318/318 - 1s - 4ms/step - loss: 3.2242e-04
Epoch 16/100
318/318 - 1s - 2ms/step - loss: 3.2114e-04
Epoch 17/100
318/318 - 1s - 2ms/step - loss: 3.1969e-04
Epoch 18/100
318/318 - 1s - 2ms/step - loss: 3.1833e-04
Epoch 19/100
318/

With the tanh activation, the model has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset. This set of activation and optimizer functions therefore gives predictions of the log error that are as accurate as the previous two runs. The loss function here is mean_squared_error, which measures the average of the squares of the errors, that is, of the differences between the estimator and what is estimated.
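For reference, the RMSE that build_model reports is the square root of the mean squared error. A minimal NumPy version, with the hypothetical helper name `rmse` and made-up inputs, looks like this:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Square the residuals, average them, then take the square root.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up targets and predictions, not notebook results.
print(rmse([0.0, 0.1, -0.2], [0.0, 0.0, 0.0]))
```

Because the predictions are inverse-transformed first, the reported RMSE is in the original log-error units rather than in the scaled 0-to-1 range.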

In [13]:
#Hinge loss function
build_model(train,test,1,'tanh','adam',100,'hinge')

Epoch 1/100
318/318 - 3s - 10ms/step - loss: 0.7174
Epoch 2/100
318/318 - 1s - 3ms/step - loss: 0.0628
Epoch 3/100
318/318 - 1s - 2ms/step - loss: 9.9566e-04
Epoch 4/100
318/318 - 1s - 2ms/step - loss: 7.9582e-04
Epoch 5/100
318/318 - 1s - 2ms/step - loss: 6.7991e-04
Epoch 6/100
318/318 - 1s - 2ms/step - loss: 5.9799e-04
Epoch 7/100
318/318 - 1s - 2ms/step - loss: 5.3166e-04
Epoch 8/100
318/318 - 1s - 2ms/step - loss: 4.7451e-04
Epoch 9/100
318/318 - 1s - 2ms/step - loss: 4.2807e-04
Epoch 10/100
318/318 - 1s - 2ms/step - loss: 3.9334e-04
Epoch 11/100
318/318 - 1s - 2ms/step - loss: 3.6682e-04
Epoch 12/100
318/318 - 1s - 2ms/step - loss: 3.4391e-04
Epoch 13/100
318/318 - 1s - 2ms/step - loss: 3.2356e-04
Epoch 14/100
318/318 - 1s - 2ms/step - loss: 3.0725e-04
Epoch 15/100
318/318 - 1s - 2ms/step - loss: 2.9343e-04
Epoch 16/100
318/318 - 1s - 2ms/step - loss: 2.8031e-04
Epoch 17/100
318/318 - 1s - 2ms/step - loss: 2.6720e-04
Epoch 18/100
318/318 - 1s - 2ms/step - loss: 2.5431e-04
Epoch 19/100
318/318

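The model has an RMSE of about 57.46 on the training dataset and about 57.48 on the test dataset. This shows that this set of functions gives poor predictions of the log error, even though the training loss itself keeps decreasing. Hinge loss is intended for classification with labels of -1 and 1, not for regression.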
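One plausible explanation, which we have not verified against the fitted model, is that hinge loss on positive targets rewards large predictions rather than accurate ones. The sketch below uses the usual definition of hinge loss and made-up numbers to show that predictions far from the targets can still reach a loss of zero.

```python
import numpy as np

def hinge(y_true, y_pred):
    # Hinge loss as commonly defined: mean(max(0, 1 - y_true * y_pred)).
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * y_pred)))

y_true = np.array([0.48, 0.50, 0.52])   # made-up targets scaled into [0, 1]

close = np.array([0.49, 0.50, 0.51])    # near the targets
huge = np.array([10.0, 10.0, 10.0])     # far from the targets

print(hinge(y_true, close))
print(hinge(y_true, huge))
```

The accurate predictions get the higher loss here, which is consistent with a low training loss and a large RMSE appearing together.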

In [15]:
from keras import losses
build_model(train, test, 1, 'tanh', 'adam', 100, losses.LogCosh())

Epoch 1/100
318/318 - 2s - 7ms/step - loss: 0.0363
Epoch 2/100
318/318 - 1s - 2ms/step - loss: 1.7478e-04
Epoch 3/100
318/318 - 1s - 2ms/step - loss: 1.5317e-04
Epoch 4/100
318/318 - 1s - 2ms/step - loss: 1.5316e-04
Epoch 5/100
318/318 - 1s - 2ms/step - loss: 1.5314e-04
Epoch 6/100
318/318 - 1s - 2ms/step - loss: 1.5311e-04
Epoch 7/100
318/318 - 1s - 2ms/step - loss: 1.5310e-04
Epoch 8/100
318/318 - 1s - 2ms/step - loss: 1.5303e-04
Epoch 9/100
318/318 - 1s - 2ms/step - loss: 1.5300e-04
Epoch 10/100
318/318 - 1s - 2ms/step - loss: 1.5295e-04
Epoch 11/100
318/318 - 1s - 2ms/step - loss: 1.5291e-04
Epoch 12/100
318/318 - 1s - 2ms/step - loss: 1.5284e-04
Epoch 13/100
318/318 - 1s - 2ms/step - loss: 1.5273e-04
Epoch 14/100
318/318 - 1s - 2ms/step - loss: 1.5267e-04
Epoch 15/100
318/318 - 1s - 3ms/step - loss: 1.5257e-04
Epoch 16/100
318/318 - 1s - 3ms/step - loss: 1.5252e-04
Epoch 17/100
318/318 - 1s - 4ms/step - loss: 1.5235e-04
Epoch 18/100
318/318 - 1s - 3ms/step - loss: 1.5229e-04
Epoch

The model has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset. This shows that the logcosh loss gives essentially the same predictions of the log error as the mean squared error loss function.
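A short sketch of why the two losses behave alike here: for small residuals, log(cosh(x)) is approximately x²/2, so logcosh is close to half the mean squared error. This matches the logged losses above, where logcosh settles near 1.5e-04 and mean squared error near 3.0e-04. The arrays below are made up.

```python
import numpy as np

def log_cosh(y_true, y_pred):
    # Mean of log(cosh(residual)).
    return float(np.mean(np.log(np.cosh(y_pred - y_true))))

def mse(y_true, y_pred):
    # Mean of the squared residuals.
    return float(np.mean((y_pred - y_true) ** 2))

y_true = np.array([0.50, 0.51, 0.49])   # made-up scaled targets
y_pred = np.array([0.52, 0.50, 0.50])   # made-up predictions

print(log_cosh(y_true, y_pred), mse(y_true, y_pred) / 2)
```

The two losses only diverge for large residuals, where logcosh grows roughly linearly and is less sensitive to outliers.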

In [16]:
build_model(train,test,1,'tanh','Adagrad',100,'hinge')

Epoch 1/100
318/318 - 2s - 7ms/step - loss: 0.9709
Epoch 2/100
318/318 - 1s - 2ms/step - loss: 0.9513
Epoch 3/100
318/318 - 1s - 2ms/step - loss: 0.9378
Epoch 4/100
318/318 - 1s - 4ms/step - loss: 0.9266
Epoch 5/100
318/318 - 1s - 4ms/step - loss: 0.9167
Epoch 6/100
318/318 - 1s - 3ms/step - loss: 0.9076
Epoch 7/100
318/318 - 1s - 2ms/step - loss: 0.8991
Epoch 8/100
318/318 - 1s - 2ms/step - loss: 0.8911
Epoch 9/100
318/318 - 1s - 2ms/step - loss: 0.8835
Epoch 10/100
318/318 - 1s - 2ms/step - loss: 0.8762
Epoch 11/100
318/318 - 1s - 2ms/step - loss: 0.8691
Epoch 12/100
318/318 - 1s - 2ms/step - loss: 0.8622
Epoch 13/100
318/318 - 1s - 2ms/step - loss: 0.8555
Epoch 14/100
318/318 - 1s - 4ms/step - loss: 0.8490
Epoch 15/100
318/318 - 1s - 4ms/step - loss: 0.8426
Epoch 16/100
318/318 - 1s - 4ms/step - loss: 0.8363
Epoch 17/100
318/318 - 1s - 2ms/step - loss: 0.8301
Epoch 18/100
318/318 - 1s - 2ms/step - loss: 0.8240
Epoch 19/100
318/318 - 1s - 2ms/step - loss: 0.8180
Epoch 20/100
318/318 - 1s - 2ms/s

The model has an RMSE of about 18.40 on both the training and test datasets, which is again far worse than the runs that use the mean squared error loss.

In [17]:
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back),activation = 'tanh'))
model.add(Dense(1))
model.add(BatchNormalization())
model.add(Dropout(.6))
model.add(Dense(1))
model.compile(loss='hinge', optimizer='Adagrad')
model.fit(trainX, trainY, epochs=200, batch_size=256, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

Epoch 1/200
318/318 - 2s - 8ms/step - loss: 0.9853
Epoch 2/200
318/318 - 1s - 2ms/step - loss: 0.9715
Epoch 3/200
318/318 - 1s - 2ms/step - loss: 0.9622
Epoch 4/200
318/318 - 1s - 2ms/step - loss: 0.9543
Epoch 5/200
318/318 - 1s - 2ms/step - loss: 0.9474
Epoch 6/200
318/318 - 1s - 2ms/step - loss: 0.9413
Epoch 7/200
318/318 - 1s - 2ms/step - loss: 0.9354
Epoch 8/200
318/318 - 1s - 2ms/step - loss: 0.9298
Epoch 9/200
318/318 - 1s - 2ms/step - loss: 0.9248
Epoch 10/200
318/318 - 1s - 2ms/step - loss: 0.9197
Epoch 11/200
318/318 - 1s - 2ms/step - loss: 0.9148
Epoch 12/200
318/318 - 1s - 2ms/step - loss: 0.9104
Epoch 13/200
318/318 - 1s - 4ms/step - loss: 0.9058
Epoch 14/200
318/318 - 1s - 4ms/step - loss: 0.9016
Epoch 15/200
318/318 - 1s - 3ms/step - loss: 0.8974
Epoch 16/200
318/318 - 1s - 4ms/step - loss: 0.8932
Epoch 17/200
318/318 - 1s - 2ms/step - loss: 0.8891
Epoch 18/200
318/318 - 1s - 2ms/step - loss: 0.8857
Epoch 19/200
318/318 - 1s - 2ms/step - loss: 0.8815
Epoch 20/200
318/318 - 1s - 2ms/s

The number of epochs has been set to 200, but this doesn't improve the RMSE. Instead, the error increases to 22.20 on the training dataset and 22.19 on the test dataset, compared with the run at epochs=100.

### Kernel Initializers
Initializations define the way to set the initial random weights of Keras layers.

#### Normal
Initializer that generates tensors with a normal distribution.
Here, we have used a random normal distribution for the initial weights. With 200 epochs, the RMSE is 21.52 on the training dataset and 21.52 on the test dataset.

#### Random Uniform
Initializer that generates tensors with a uniform distribution.
Here, we have used a random uniform distribution for the initial weights. With 200 epochs, the RMSE is 21.51 on the training dataset and 21.50 on the test dataset.

Even with normally or uniformly distributed initial weights, the RMSE of the log error prediction doesn't improve.
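To give a feel for what these two initializers produce, the sketch below draws small weight matrices with NumPy instead of Keras. The standard deviation of 0.05 and the range [-0.05, 0.05] are assumptions that, to our understanding, mirror the Keras defaults for these initializers.

```python
import numpy as np

rng = np.random.default_rng(7)
shape = (4, 1)   # weights of a Dense(1) layer fed by 4 LSTM units

# Assumed settings: standard deviation 0.05 and range [-0.05, 0.05].
normal_w = rng.normal(loc=0.0, scale=0.05, size=shape)
uniform_w = rng.uniform(low=-0.05, high=0.05, size=shape)

print(normal_w.ravel())
print(uniform_w.ravel())
```

Both schemes start the weights close to zero, so it is not surprising that switching between them changes the final error very little.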

In [18]:
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back),activation = 'tanh'))
model.add(Dense(1,kernel_initializer = 'normal'))
model.add(BatchNormalization())
model.add(Dropout(.6))
model.add(Dense(1,kernel_initializer = 'normal'))
model.compile(loss='hinge', optimizer='Adagrad')
model.fit(trainX, trainY, epochs=200, batch_size=256, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

Epoch 1/200
318/318 - 2s - 8ms/step - loss: 0.9888
Epoch 2/200
318/318 - 1s - 4ms/step - loss: 0.9785
Epoch 3/200
318/318 - 1s - 4ms/step - loss: 0.9714
Epoch 4/200
318/318 - 1s - 3ms/step - loss: 0.9654
Epoch 5/200
318/318 - 1s - 4ms/step - loss: 0.9600
Epoch 6/200
318/318 - 1s - 2ms/step - loss: 0.9550
Epoch 7/200
318/318 - 1s - 4ms/step - loss: 0.9504
Epoch 8/200
318/318 - 1s - 2ms/step - loss: 0.9459
Epoch 9/200
318/318 - 1s - 2ms/step - loss: 0.9417
Epoch 10/200
318/318 - 1s - 2ms/step - loss: 0.9376
Epoch 11/200
318/318 - 1s - 2ms/step - loss: 0.9336
Epoch 12/200
318/318 - 1s - 2ms/step - loss: 0.9299
Epoch 13/200
318/318 - 1s - 2ms/step - loss: 0.9262
Epoch 14/200
318/318 - 1s - 2ms/step - loss: 0.9225
Epoch 15/200
318/318 - 1s - 2ms/step - loss: 0.9191
Epoch 16/200
318/318 - 1s - 2ms/step - loss: 0.9156
Epoch 17/200
318/318 - 1s - 2ms/step - loss: 0.9122
Epoch 18/200
318/318 - 1s - 3ms/step - loss: 0.9089
Epoch 19/200
318/318 - 1s - 3ms/step - loss: 0.9056
Epoch 20/200
318/318 - 1s - 4ms/s

In [19]:
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back),activation = 'tanh'))
model.add(Dense(1,kernel_initializer = 'random_uniform'))
model.add(BatchNormalization())
model.add(Dropout(.6))
model.add(Dense(1,kernel_initializer = 'random_uniform'))
model.compile(loss='hinge', optimizer='Adagrad')
model.fit(trainX, trainY, epochs=200, batch_size=256, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

Epoch 1/200
318/318 - 3s - 8ms/step - loss: 0.9890
Epoch 2/200
318/318 - 1s - 2ms/step - loss: 0.9789
Epoch 3/200
318/318 - 1s - 2ms/step - loss: 0.9721
Epoch 4/200
318/318 - 1s - 2ms/step - loss: 0.9663
Epoch 5/200
318/318 - 1s - 4ms/step - loss: 0.9610
Epoch 6/200
318/318 - 1s - 2ms/step - loss: 0.9561
Epoch 7/200
318/318 - 1s - 2ms/step - loss: 0.9516
Epoch 8/200
318/318 - 1s - 2ms/step - loss: 0.9472
Epoch 9/200
318/318 - 1s - 2ms/step - loss: 0.9431
Epoch 10/200
318/318 - 1s - 2ms/step - loss: 0.9390
Epoch 11/200
318/318 - 1s - 2ms/step - loss: 0.9352
Epoch 12/200
318/318 - 1s - 2ms/step - loss: 0.9314
Epoch 13/200
318/318 - 1s - 2ms/step - loss: 0.9278
Epoch 14/200
318/318 - 1s - 5ms/step - loss: 0.9241
Epoch 15/200
318/318 - 1s - 5ms/step - loss: 0.9206
Epoch 16/200
318/318 - 1s - 4ms/step - loss: 0.9172
Epoch 17/200
318/318 - 1s - 3ms/step - loss: 0.9138
Epoch 18/200
318/318 - 1s - 4ms/step - loss: 0.9105
Epoch 19/200
318/318 - 1s - 2ms/step - loss: 0.9072
Epoch 20/200
318/318 - 1s - 2ms/s

## Summary:

In this notebook, we implemented LSTM recurrent neural networks in Python, using the Keras and TensorFlow deep learning libraries, for time series prediction of LogError = (log(Zestimate) - log(SalePrice)). The data are the 2016 property dataset and the corresponding log error values provided by Zillow for home value prediction.

First, we fixed the random seed for reproducibility, normalized the dataset, split it into training (90%) and test (10%) datasets, and converted the array of values into a dataset matrix. We then built an LSTM network with 1 input and a single hidden layer of 4 LSTM blocks, and trained it with various epochs, optimizers, activation functions, and loss functions: epochs = 100 or 200, activation functions = (sigmoid, relu, tanh), optimizers = (adam, adagrad), and loss functions = (mean_squared_error, hinge, logcosh). We also set the initial weights through kernel initializers (normal and random uniform), which did not improve the error. The best RMSE, 0.16 on the training data and 0.15 on the test data, was obtained with the adam optimizer and the mean_squared_error or logcosh loss at 100 epochs.
