#  Training and evaluating an LSTM using reversed sequences
All you need to do is write a variant of the data generator where the input sequences are reversed along
the time dimension (replace the last line with yield samples[:, ::-1, :], targets).
For the IMDB data used below, the equivalent step is reversing each review's list of word indices.
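As a rough illustration of that one-line change, the sketch below wraps an existing batch iterator and flips each batch along the time axis. The wrapper name `reversed_batches` and the toy batch are assumptions made for this example only; they are not part of the original generator.

```python
import numpy as np

def reversed_batches(batches):
    """Yield (samples, targets) with samples flipped along the time axis."""
    for samples, targets in batches:
        # samples has shape (batch, timesteps, features); axis 1 is time
        yield samples[:, ::-1, :], targets

# Toy batch (made up): 2 samples, 3 timesteps, 1 feature
samples = np.arange(6).reshape(2, 3, 1)
targets = np.array([0, 1])

flipped, same_targets = next(reversed_batches([(samples, targets)]))
print(flipped[:, :, 0])  # each row is now in reverse time order
```

Only the samples are flipped; the targets are passed through unchanged.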

In [1]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

In [2]:
max_features = 10000
maxlen = 500

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

In [3]:
# Reverses sequences 
Rx_train = [x[::-1] for x in x_train]
Rx_test = [x[::-1] for x in x_test]

In [4]:
# Pad the reversed sequences so that all reviews have the same length

Rx_train = sequence.pad_sequences(Rx_train, maxlen=maxlen)
Rx_test = sequence.pad_sequences(Rx_test, maxlen=maxlen)
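By default, `pad_sequences` pads and truncates at the front of each sequence, so it matters whether a review is reversed before it is padded. The sketch below uses a small pure-Python stand-in, `pad_pre`, which is a hypothetical helper written for this example to mimic those defaults on a single sequence; the word indices are made up.

```python
def pad_pre(seq, maxlen, value=0):
    """Hypothetical stand-in for the pad_sequences defaults on one sequence:
    keep the last `maxlen` items, then left-pad with `value`."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

review = [11, 12, 13]  # made-up word indices

reversed_then_padded = pad_pre(review[::-1], maxlen=5)
padded_only = pad_pre(review, maxlen=5)

print(reversed_then_padded)  # [0, 0, 13, 12, 11]
print(padded_only)           # [0, 0, 11, 12, 13]
```

Padding the unreversed `x_train` at this point would silently discard the reversal, which is why the reversed lists have to be the ones passed in.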

In [5]:
# Pad the original (chronological) sequences so that all reviews have the same length

x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# LSTM using reversed sequences

In [6]:
model = Sequential()
model.add(layers.Embedding(max_features, 128))
model.add(layers.LSTM(32))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 128)         1280000   
_________________________________________________________________
lstm (LSTM)                  (None, 32)                20608     
_________________________________________________________________
dense (Dense)                (None, 1)                 33        
Total params: 1,300,641
Trainable params: 1,300,641
Non-trainable params: 0
_________________________________________________________________


In [7]:
history = model.fit(Rx_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [8]:
evaluation = model.evaluate(Rx_test,y_test,verbose=2)
print()
print("Loss: ",evaluation[0]*100,"%")
print("Accuracy: ",evaluation[1]*100,"%")

782/782 - 8s - loss: 0.4058 - acc: 0.8652

Loss:  40.58133661746979 %
Accuracy:  86.52399778366089 %


You get performance nearly identical to that of the chronological-order LSTM.
Remarkably, on such a text dataset, reversed-order processing works just as well as
chronological processing, confirming the hypothesis that, although word order does
matter in understanding language, which order you use isn't crucial.

# Bidirectional RNN

By processing a sequence both ways, a bidirectional RNN can catch patterns that may be overlooked by a unidirectional RNN.

#  Training and evaluating a bidirectional LSTM

To instantiate a bidirectional RNN in Keras, you use the Bidirectional layer, which takes
as its first argument a recurrent layer instance. Bidirectional creates a second, separate
instance of this recurrent layer and uses one instance for processing the input sequences
in chronological order and the other instance for processing the input sequences in
reversed order.
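The idea can be sketched in NumPy, under some simplifying assumptions: a plain tanh RNN stands in for the LSTM, the weights are random, and the two final states are concatenated (concatenation is the default `merge_mode` of `Bidirectional`). The helper `simple_rnn` and all variable names are illustrative only and are not Keras internals.

```python
import numpy as np

def simple_rnn(inputs, w_in, w_rec):
    """Run a tanh RNN over inputs of shape (timesteps, features).

    Returns the final hidden state, of shape (units,).
    """
    state = np.zeros(w_rec.shape[0])
    for x_t in inputs:
        state = np.tanh(x_t @ w_in + state @ w_rec)
    return state

rng = np.random.default_rng(0)
timesteps, features, units = 5, 3, 4
inputs = rng.normal(size=(timesteps, features))

# Two separate sets of weights, one per direction
fwd = (rng.normal(size=(features, units)), rng.normal(size=(units, units)))
bwd = (rng.normal(size=(features, units)), rng.normal(size=(units, units)))

forward_state = simple_rnn(inputs, *fwd)         # chronological order
backward_state = simple_rnn(inputs[::-1], *bwd)  # reversed order

merged = np.concatenate([forward_state, backward_state])
print(merged.shape)  # (8,)
```

The merged representation is twice as wide as either direction alone, which is what the layer after it receives.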

In [9]:
model = Sequential()
model.add(layers.Embedding(max_features, 32))
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

In [10]:
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [11]:
evaluation = model.evaluate(x_test,y_test,verbose=2)
print()
print("Loss: ",evaluation[0]*100,"%")
print("Accuracy: ",evaluation[1]*100,"%")

782/782 - 12s - loss: 0.4088 - acc: 0.8656

Loss:  40.87960422039032 %
Accuracy:  86.56399846076965 %


It performs slightly better than the regular LSTM you tried in the previous section,
achieving over 89% validation accuracy (the 86.56% printed above is test accuracy, not
validation accuracy). It also seems to overfit more quickly, which is
unsurprising because a bidirectional layer has twice as many parameters as a chronological LSTM.
With some regularization, the bidirectional approach would likely be a
strong performer on this task.
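The parameter claim can be checked with a little arithmetic. The sketch below assumes the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias; the function name `lstm_params` is made up for this example.

```python
def lstm_params(input_dim, units):
    """Trainable parameters in a standard LSTM layer.

    Four gates, each with an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)

# The LSTM(32) on 128-dimensional embeddings from the earlier summary
print(lstm_params(128, 32))  # 20608

# LSTM(32) on 32-dimensional embeddings, as in the bidirectional model
single = lstm_params(32, 32)
bidirectional = 2 * single   # two independent copies of the layer
print(single, bidirectional)  # 8320 16640
```

The first figure matches the `lstm (LSTM)` row of the model summary printed earlier, which suggests the formula is the right one for this layer.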

#  Training a bidirectional GRU

In [12]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop

In [13]:
# Note: this cell is not the book's code. The book's version appears to target the previous dataset (Jena climate).

model = Sequential()
model.add(layers.Embedding(max_features, 32))
model.add(layers.Bidirectional(layers.GRU(32)))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=RMSprop(), loss='binary_crossentropy', metrics=['acc'])

In [14]:
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [15]:
evaluation = model.evaluate(x_test,y_test,verbose=2)
print()
print("Loss: ",evaluation[0]*100,"%")
print("Accuracy: ",evaluation[1]*100,"%")

782/782 - 12s - loss: 0.5014 - acc: 0.8117

Loss:  50.13570189476013 %
Accuracy:  81.1680018901825 %


On the temperature-forecasting problem, a bidirectional GRU performs about as well as the regular GRU layer. It’s easy to understand why: all the
predictive capacity must come from the chronological half of the network, because the
antichronological half is known to be severely underperforming on that task (again,
because the recent past matters much more than the distant past in that case).
The IMDB run above reaches 81.17% test accuracy.

# Going even further
There are many other things you could try, in order to improve performance on the
temperature-forecasting problem:

- Adjust the number of units in each recurrent layer in the stacked setup. The
current choices are largely arbitrary and thus probably suboptimal.

- Adjust the learning rate used by the RMSprop optimizer.

- Try using LSTM layers instead of GRU layers.

- Try using a bigger densely connected regressor on top of the recurrent layers:
that is, a bigger Dense layer or even a stack of Dense layers.

- Don’t forget to eventually run the best-performing models (in terms of validation MAE) on the test set! Otherwise, you’ll develop architectures that are overfitting to the validation set.

# Wrapping up
Here’s what you should take away from this section:

- As you first learned in chapter 4, when approaching a new problem, it’s good to
first establish common-sense baselines for your metric of choice. If you don’t
have a baseline to beat, you can’t tell whether you’re making real progress.

- Try simple models before expensive ones, to justify the additional expense.
Sometimes a simple model will turn out to be your best option.

- When you have data where temporal ordering matters, recurrent networks are
a great fit and easily outperform models that first flatten the temporal data.

- To use dropout with recurrent networks, you should use a time-constant dropout mask and recurrent dropout mask. These are built into Keras recurrent layers, so all you have to do is use the dropout and recurrent_dropout arguments
of recurrent layers.

- Stacked RNNs provide more representational power than a single RNN layer.
They’re also much more expensive and thus not always worth it. Although they
offer clear gains on complex problems (such as machine translation), they may
not always be relevant to smaller, simpler problems.

- Bidirectional RNNs, which look at a sequence both ways, are useful on natural-language
processing problems. But they aren’t strong performers on sequence
data where the recent past is much more informative than the beginning of the
sequence.
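To make the "time-constant dropout mask" point above more concrete, here is a small NumPy sketch of the idea. It only illustrates masking of the inputs and is not how Keras implements it internally; the shapes, the rate, and the variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, features, rate = 4, 6, 0.5
inputs = np.ones((timesteps, features))

# Time-constant dropout: draw ONE mask per sequence and reuse it at every
# timestep. Kept units are scaled by 1 / (1 - rate), as in inverted dropout.
mask = (rng.random(features) >= rate) / (1.0 - rate)
time_constant = inputs * mask  # the mask broadcasts over the time axis

# For contrast: a fresh mask at every timestep, which disrupts the signal
# that the recurrent layer carries from one step to the next.
per_step_mask = (rng.random((timesteps, features)) >= rate) / (1.0 - rate)
per_step = inputs * per_step_mask

# With a time-constant mask, the same units are dropped at every timestep
print(np.all(time_constant == time_constant[0]))  # True
```

In practice none of this needs to be written by hand, since the `dropout` and `recurrent_dropout` arguments of the recurrent layers take care of it.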