Hi Zafareli,
I was wondering how one would implement this code for an arbitrary number of stacked encoding and decoding layers. E.g., my architecture (shown below) contains 2 stacked LSTM layers in both the encoding and decoding phases (with bi-directionality in the encoding phase):
PS: I am showing the batch model; I also have a stateful inference model, in which I transfer the weights and states over to the decoding LSTMs.
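For context, the usual Keras pattern for that kind of inference setup can be sketched as follows. This is a generic, hedged illustration (plain `LSTM` instead of `CuDNNLSTM`, toy dimensions, a single layer, made-up names like `encoder_model`), not the model discussed below: the trained layers are reused in two small inference `Model`s so weights are shared and states can be carried across single-step decoder calls by hand.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

vocab_size, lstm_dim = 10, 16  # toy sizes for illustration

# Stand-ins for the trained layers (in practice these come from training)
enc_in = Input(shape=(None, vocab_size))
encoder = LSTM(lstm_dim, return_state=True)
_, enc_h, enc_c = encoder(enc_in)

decoder = LSTM(lstm_dim, return_sequences=True, return_state=True)
dec_dense = Dense(vocab_size, activation='softmax')

# Inference encoder: maps an input sequence to initial decoder states
encoder_model = Model(enc_in, [enc_h, enc_c])

# Inference decoder: one step at a time, states fed back in explicitly
dec_in = Input(shape=(None, vocab_size))
state_h_in = Input(shape=(lstm_dim,))
state_c_in = Input(shape=(lstm_dim,))
dec_out, h, c = decoder(dec_in, initial_state=[state_h_in, state_c_in])
probs = dec_dense(dec_out)
decoder_model = Model([dec_in, state_h_in, state_c_in], [probs, h, c])

# One greedy decoding step: encode once, then step the decoder
h0, c0 = encoder_model.predict(np.zeros((1, 5, vocab_size)), verbose=0)
step, h1, c1 = decoder_model.predict(
    [np.zeros((1, 1, vocab_size)), h0, c0], verbose=0)
print(step.shape)
```

The key point is that `decoder` and `dec_dense` are the same layer objects used at training time, so no explicit weight copying is needed; only the states are threaded through by hand.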
# Canonical Model
import time
from keras.models import Model
from keras.layers import Input, Dense, Concatenate, Bidirectional, CuDNNLSTM
from keras.callbacks import (History, ReduceLROnPlateau, EarlyStopping,
                             TensorBoard, LearningRateScheduler)
from keras.optimizers import Adam, RMSprop
from keras import regularizers
# `vocab`, the training arrays, and the custom LearningRateSchedule class
# are assumed to be defined elsewhere.
charset = list(vocab.charset)
# Callbacks
h = History()
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=1e-6, verbose=1, min_delta=1e-5)
es = EarlyStopping(monitor='val_loss', min_delta=1e-6, patience=10, verbose=0, mode='auto')
l2 = 0.00002 # Don't use a high regularization.
tb = TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=250, update_freq='epoch')
#---------------------- Encoder--------------------------#
# Previously used 250
lstm_dim = 500
# Input shape: (timesteps, len(charset)), i.e. X_train.shape[1:]
canonical_encoder_input = Input(shape=(None, len(charset)))
# First encoder layer
encoder_LSTM = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_01'))
encoder1_output, forward_h1, forward_c1, backward_h1, backward_c1 = encoder_LSTM(canonical_encoder_input)
# Second encoder layer
encoder_LSTM2 = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_02'))
encoder2_output, forward_h2, forward_c2, backward_h2, backward_c2 = encoder_LSTM2(encoder1_output)
# Concatenate all states together
encoder_states = Concatenate(axis=-1)([forward_h1, forward_c1, forward_h2, forward_c2,
                                       backward_h1, backward_c1, backward_h2, backward_c2])
encoder_dense_layer = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="enc_dense")
encoder_dense = encoder_dense_layer(encoder_states)
# Add dropout here?
print(type(encoder_dense))
#---------------------- states--------------------------#
# States for the first LSTM layer
canonical_decoder_input = Input(shape=(None, len(charset)))  # teacher forcing
dense_h1 = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="dec_dense_h1")
dense_c1 = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="dec_dense_c1")
state_h1 = dense_h1(encoder_dense)
state_c1 = dense_c1(encoder_dense)
states1 = [state_h1, state_c1]
# States for the second LSTM layer
dense_h2 = Dense(lstm_dim, activation='relu', name="dec_dense_h2")
dense_c2 = Dense(lstm_dim, activation='relu', name="dec_dense_c2")
state_h2 = dense_h2(encoder_dense)
state_c2 = dense_c2(encoder_dense)
states2 = [state_h2, state_c2]
#------------------------Decoder------------------------#
# This goes through a decoding lstm
decoder_LSTM1 = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True, name='dec_LSTM_01')
decoder1_output, _, _ = decoder_LSTM1(canonical_decoder_input, initial_state=states1)
# Couple the first LSTM with the 2nd LSTM
decoder_LSTM2 = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True, name='dec_LSTM_02')
decoder2_output, _, _ = decoder_LSTM2(decoder1_output, initial_state=states2)
# Pass hidden states of decoder2_outputs to dense layer with softmax
decoder_dense = Dense(len(charset), kernel_regularizer=regularizers.l2(l2), activation='softmax', name="dec_dense_softmax")
decoder_out = decoder_dense(decoder2_output)
#----------------------compilations------------#
# Model compilation (canonical)
model = Model(inputs=[canonical_encoder_input, canonical_decoder_input], outputs=[decoder_out])
# Run training
start = time.time()
# Optimizers
# learning_rate=0.002  # comment out if using exponential LearningRateScheduler
adam = Adam()
rms = RMSprop()
# Full (canonical) model
model.compile(optimizer=adam, loss='categorical_crossentropy')
# Custom exponential learning rate scheduler
lr_schedule = LearningRateSchedule(epoch_to_start=50, last_epoch=349)
lr_scheduler = LearningRateScheduler(schedule=lr_schedule.exp_decay, verbose=1)
# Fit
model.fit(x=[X_train, Y_train],
          y=Y_train_target,
          batch_size=250,
          epochs=350,
          shuffle=True,
          validation_data=([X_test, Y_test], Y_test_target),
          callbacks=[h, tb, lr_scheduler])
end = time.time()
print(end - start)
model.summary()
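To make the "arbitrary number of layers" part of the question concrete, the two hard-coded encoder/decoder layers above could be generalized with a loop along these lines. This is a hedged sketch under assumptions, not the repo's code: plain `LSTM` stands in for `CuDNNLSTM` so it runs on CPU, dimensions are toy-sized, and names like `n_layers` are illustrative.

```python
import numpy as np
from tensorflow.keras.layers import (Input, LSTM, Bidirectional,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

vocab_size, lstm_dim, n_layers = 10, 16, 3  # toy sizes for illustration

# Encoder: n_layers stacked bidirectional LSTMs, collecting all states
enc_in = Input(shape=(None, vocab_size))
x, enc_states = enc_in, []
for i in range(n_layers):
    layer = Bidirectional(LSTM(lstm_dim // 2, return_sequences=True,
                               return_state=True, name=f'bd_enc_LSTM_{i}'))
    x, fh, fc, bh, bc = layer(x)
    enc_states += [fh, fc, bh, bc]

# Bottleneck: concatenate every (h, c) pair and project, as in the model above
bottleneck = Dense(lstm_dim, activation='relu')(Concatenate()(enc_states))

# Decoder: n_layers stacked LSTMs, each seeded from its own state projections
dec_in = Input(shape=(None, vocab_size))
y = dec_in
for i in range(n_layers):
    h0 = Dense(lstm_dim, activation='relu', name=f'dec_h{i}')(bottleneck)
    c0 = Dense(lstm_dim, activation='relu', name=f'dec_c{i}')(bottleneck)
    y, _, _ = LSTM(lstm_dim, return_sequences=True, return_state=True,
                   name=f'dec_LSTM_{i}')(y, initial_state=[h0, c0])

out = Dense(vocab_size, activation='softmax')(y)
model = Model([enc_in, dec_in], out)
pred = model.predict([np.zeros((2, 5, vocab_size)),
                      np.zeros((2, 4, vocab_size))], verbose=0)
print(pred.shape)
```

With `n_layers = 2` and `lstm_dim = 500` this reduces to the same topology as the hand-written model, while letting the depth vary in one place.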
Is it possible to implement your code for this architecture?
best,
Dean
I believe that I should apply an attention layer to each of the encoder states prior to concatenation. However, the question of how to tie the encoder attention to the decoder states remains open.
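One way to make that tie concrete, sketched here as an assumption rather than this repo's method, is Luong-style dot-product attention: each decoder hidden state is scored against every encoder output, the scores are softmaxed over the encoder time axis, and the result weights a context vector. The numpy function below shows just that core computation, outside of any Keras graph:

```python
import numpy as np

def dot_attention(dec_states, enc_outputs):
    """dec_states: (T_dec, d), enc_outputs: (T_enc, d) -> contexts: (T_dec, d)."""
    scores = dec_states @ enc_outputs.T              # (T_dec, T_enc) alignment scores
    scores -= scores.max(axis=1, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over encoder steps
    return weights @ enc_outputs                     # weighted sum of encoder outputs

rng = np.random.default_rng(0)
ctx = dot_attention(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)))
print(ctx.shape)
```

In a stacked model the usual choice is to attend over the top encoder layer's output sequence (`encoder2_output` above) rather than over the final states, and to concatenate each context vector with the matching decoder output before the softmax layer.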