
Stacked architecture #50

Closed

Sum02dean opened this issue Sep 17, 2019 · 1 comment
Sum02dean commented Sep 17, 2019

Hi Zafareli,

I was wondering how one would implement this code for an arbitrary number of stacked encoding and decoding layers. E.g. my architecture (shown below) contains 2 stacked LSTM layers in both the encoding phase and the decoding phase, with bi-directionality in the encoding phase:

PS. I am showing the batch (training) model; I also have a stateful inference model in which I transfer the weights and states over to the decoder LSTMs.

# Canonical Model
# (vocab, the X/Y arrays, and the custom LearningRateSchedule class are defined elsewhere)
import time

from keras import regularizers
from keras.callbacks import (History, ReduceLROnPlateau, EarlyStopping,
                             TensorBoard, LearningRateScheduler)
from keras.layers import Input, Dense, Bidirectional, Concatenate, CuDNNLSTM
from keras.models import Model
from keras.optimizers import Adam, RMSprop

charset = list(vocab.charset)

# Callbacks
h = History()
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=0.000001, verbose=1, min_delta=1e-5)
es = EarlyStopping(monitor='val_loss', min_delta=1e-6, patience=10, verbose=0, mode='auto')
l2 = 0.00002  # Don't use a high regularization.
tb = TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=250, update_freq='epoch')

# ---------------------- Encoder -------------------------- #
# Previously used 250
lstm_dim = 500
# Input_shape = X_train.shape[1:]

# Encoder input: one-hot encoded sequences over the charset
canonical_encoder_input = Input(shape=(None, len(charset)))

# First encoder layer
encoder_LSTM = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_01'))
encoder1_output, forward_h1, forward_c1, backward_h1, backward_c1 = encoder_LSTM(canonical_encoder_input)

# Second encoder layer
encoder_LSTM2 = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_02'))
encoder2_output, forward_h2, forward_c2, backward_h2, backward_c2 = encoder_LSTM2(encoder1_output)

# Concatenate all states together
encoder_states = Concatenate(axis=-1)([forward_h1, forward_c1, forward_h2, forward_c2,
                                       backward_h1, backward_c1, backward_h2, backward_c2])

encoder_dense_layer = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="enc_dense")
encoder_dense = encoder_dense_layer(encoder_states)
# Add dropout here?

print(type(encoder_dense))

# ---------------------- States --------------------------- #

# Initial states for the first decoder LSTM layer
canonical_decoder_input = Input(shape=(None, len(charset)))  # teacher forcing
dense_h1 = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="dec_dense_h1")
dense_c1 = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="dec_dense_c1")
state_h1 = dense_h1(encoder_dense)
state_c1 = dense_c1(encoder_dense)
states1 = [state_h1, state_c1]

# Initial states for the second decoder LSTM layer
dense_h2 = Dense(lstm_dim, activation='relu', name="dec_dense_h2")
dense_c2 = Dense(lstm_dim, activation='relu', name="dec_dense_c2")
state_h2 = dense_h2(encoder_dense)
state_c2 = dense_c2(encoder_dense)
states2 = [state_h2, state_c2]

# ------------------------ Decoder ------------------------ #

# First decoding LSTM, initialised from states1
decoder_LSTM1 = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True, name='dec_LSTM_01')
decoder1_output, _, _ = decoder_LSTM1(canonical_decoder_input, initial_state=states1)

# Couple the first LSTM with the 2nd LSTM, initialised from states2
decoder_LSTM2 = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True, name='dec_LSTM_02')
decoder2_output, _, _ = decoder_LSTM2(decoder1_output, initial_state=states2)

# Project the decoder outputs onto the charset with a softmax
decoder_dense = Dense(len(charset), kernel_regularizer=regularizers.l2(l2), activation='softmax', name="dec_dense_softmax")
decoder_out = decoder_dense(decoder2_output)

# ---------------------- Compilation ----------------------- #

# Model definition (canonical)
model = Model(inputs=[canonical_encoder_input, canonical_decoder_input], outputs=[decoder_out])

# Run training
start = time.time()

# Optimizers
# learning_rate=0.002  # comment out if using the exponential LearningRateScheduler
adam = Adam()
rms = RMSprop()

# Full (canonical) model
model.compile(optimizer=adam, loss='categorical_crossentropy')

# Custom exponential learning rate scheduler
lr_schedule = LearningRateSchedule(epoch_to_start=50, last_epoch=349)

lr_scheduler = LearningRateScheduler(schedule=lr_schedule.exp_decay, verbose=1)


# Fit
model.fit(x=[X_train, Y_train],
          y=Y_train_target,
          batch_size=250,
          epochs=350,
          shuffle=True,
          validation_data=([X_test, Y_test], Y_test_target),
          callbacks=[h, tb, lr_scheduler])

end = time.time()
print(end - start)
model.summary()

Is it possible to adapt your code to this architecture?

best,

Dean
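
One way to handle an arbitrary number of stacked encoding and decoding layers is to build both stacks in a loop. The sketch below is a minimal, hedged rewrite of the training model above for n_layers layers; it reuses only standard Keras layers, and n_layers, the layer names, and the bottleneck wiring are illustrative assumptions rather than this repository's code.

# A minimal sketch (not this repository's code): building n_layers stacked encoder and
# decoder layers in loops. charset and lstm_dim are as defined in the snippet above.
from keras.layers import Input, Dense, Bidirectional, Concatenate, CuDNNLSTM
from keras.models import Model
from keras.optimizers import Adam

n_layers = 2  # illustrative; the model above is the special case n_layers = 2

enc_in = Input(shape=(None, len(charset)))
x = enc_in
enc_states = []
for i in range(n_layers):
    # Each bidirectional layer returns its output sequence plus forward/backward h and c states.
    x, fh, fc, bh, bc = Bidirectional(
        CuDNNLSTM(lstm_dim // 2, return_sequences=True, return_state=True),
        name='bd_enc_stack_%02d' % (i + 1))(x)
    enc_states += [fh, fc, bh, bc]

# Single bottleneck over all collected encoder states, as in the 2-layer model above.
bottleneck = Dense(lstm_dim, activation='relu', name='enc_bottleneck')(
    Concatenate(axis=-1)(enc_states))

dec_in = Input(shape=(None, len(charset)))  # teacher forcing
y = dec_in
for i in range(n_layers):
    # One (h, c) pair per decoder layer, each derived from the shared bottleneck.
    h0 = Dense(lstm_dim, activation='relu', name='dec_h%d' % (i + 1))(bottleneck)
    c0 = Dense(lstm_dim, activation='relu', name='dec_c%d' % (i + 1))(bottleneck)
    y, _, _ = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True,
                        name='dec_stack_%02d' % (i + 1))(y, initial_state=[h0, c0])

out = Dense(len(charset), activation='softmax', name='softmax_out')(y)
stacked_model = Model([enc_in, dec_in], [out])
stacked_model.compile(optimizer=Adam(), loss='categorical_crossentropy')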
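
The post also mentions a separate stateful inference model. For completeness, here is a hedged sketch of one common way to build it by reusing the trained layers from the training code above (the tensor and layer names are the ones defined in that snippet); it is an assumption about the setup, not necessarily how the author or the repository does it.

# A minimal sketch (assumption: the tensors and layers from the training code above are in scope).
# Encoder inference model: maps an input sequence to the initial states of both decoder LSTMs.
encoder_model = Model(canonical_encoder_input,
                      [state_h1, state_c1, state_h2, state_c2])

# Explicit state inputs so the decoder can be stepped one character at a time.
dec_h1_in = Input(shape=(lstm_dim,))
dec_c1_in = Input(shape=(lstm_dim,))
dec_h2_in = Input(shape=(lstm_dim,))
dec_c2_in = Input(shape=(lstm_dim,))

# Reuse the trained decoder layers, but this time keep the returned states.
step1, h1, c1 = decoder_LSTM1(canonical_decoder_input, initial_state=[dec_h1_in, dec_c1_in])
step2, h2, c2 = decoder_LSTM2(step1, initial_state=[dec_h2_in, dec_c2_in])
step_probs = decoder_dense(step2)

decoder_model = Model([canonical_decoder_input, dec_h1_in, dec_c1_in, dec_h2_in, dec_c2_in],
                      [step_probs, h1, c1, h2, c2])
# At sampling time: get the four states from encoder_model.predict, then loop, feeding
# one character and the four returned states back in at every step.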

Sum02dean (Author) commented Sep 17, 2019

I believe that I should apply an attention layer to each of the encoder states prior to concatenation. However, the question of how to tie the encoder-side attention to the decoder states remains open.
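
One common way to tie the two sides together is to let the decoder attend over the per-timestep encoder outputs (encoder2_output) rather than over the concatenated final states. The sketch below wires Luong-style dot-product attention onto the tensors from the training code above; it is a hedged illustration, not the attention mechanism implemented in this repository.

# A minimal sketch (assumption: encoder2_output and decoder2_output from the training code
# above are in scope, both with feature dimension lstm_dim). Luong-style dot-product attention.
from keras.layers import Activation, Concatenate, Dense, Dot, TimeDistributed

# Alignment scores between every decoder step and every encoder step: (batch, T_dec, T_enc).
scores = Dot(axes=[2, 2])([decoder2_output, encoder2_output])
attn_weights = Activation('softmax')(scores)  # softmax over the encoder time axis

# Context vectors: attention-weighted sum of encoder outputs per decoder step: (batch, T_dec, lstm_dim).
context = Dot(axes=[2, 1])([attn_weights, encoder2_output])

# Combine each context vector with the corresponding decoder output before the final projection.
combined = Concatenate(axis=-1)([context, decoder2_output])
attn_vector = TimeDistributed(Dense(lstm_dim, activation='tanh'), name='attn_vector')(combined)
attn_out = Dense(len(charset), activation='softmax', name='attn_softmax')(attn_vector)
# attn_out would replace decoder_out as the model output; the rest of the training setup is unchanged.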
