
Stacked architecture #50

Closed

Sum02dean opened this issue Sep 17, 2019 · 1 comment
Sum02dean commented Sep 17, 2019

Hi Zafareli,

I was wondering how one would implement this code for an arbitrary number of stacked encoding and decoding layers. E.g. my architecture (shown below) contains 2 stacked LSTM layers in both the encoding phase and the decoding phase, with bi-directionality in the encoding phase:

PS. I am showing the batch (training) model; I also have a stateful inference model in which I transfer the weights and states over to the decoder LSTMs.

# Canonical Model
# (vocab, the X/Y arrays, and the custom LearningRateSchedule class are defined elsewhere)
import time

from keras import regularizers
from keras.callbacks import (History, ReduceLROnPlateau, EarlyStopping,
                             TensorBoard, LearningRateScheduler)
from keras.layers import Input, Dense, Bidirectional, Concatenate, CuDNNLSTM
from keras.models import Model
from keras.optimizers import Adam, RMSprop

charset = list(vocab.charset)

# Callbacks
h = History()
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=0.000001, verbose=1, min_delta=1e-5)
es = EarlyStopping(monitor='val_loss', min_delta=1e-6, patience=10, verbose=0, mode='auto')
l2 = 0.00002  # Don't use a high regularization.
tb = TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=250, update_freq='epoch')

# ---------------------- Encoder -------------------------- #
# Previously used 250
lstm_dim = 500
# Input_shape = X_train.shape[1:]

# Encoder input: one-hot encoded sequences over the charset
canonical_encoder_input = Input(shape=(None, len(charset)))

# First encoder layer
encoder_LSTM = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_01'))
encoder1_output, forward_h1, forward_c1, backward_h1, backward_c1 = encoder_LSTM(canonical_encoder_input)

# Second encoder layer
encoder_LSTM2 = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_02'))
encoder2_output, forward_h2, forward_c2, backward_h2, backward_c2 = encoder_LSTM2(encoder1_output)

# Concatenate all states together
encoder_states = Concatenate(axis=-1)([forward_h1, forward_c1, forward_h2, forward_c2,
                                       backward_h1, backward_c1, backward_h2, backward_c2])

encoder_dense_layer = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="enc_dense")
encoder_dense = encoder_dense_layer(encoder_states)
# Add dropout here?

print(type(encoder_dense))

# ---------------------- States --------------------------- #

# Initial states for the first decoder LSTM layer
canonical_decoder_input = Input(shape=(None, len(charset)))  # teacher forcing
dense_h1 = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="dec_dense_h1")
dense_c1 = Dense(lstm_dim, kernel_regularizer=regularizers.l2(l2), activation='relu', name="dec_dense_c1")
state_h1 = dense_h1(encoder_dense)
state_c1 = dense_c1(encoder_dense)
states1 = [state_h1, state_c1]

# Initial states for the second decoder LSTM layer
dense_h2 = Dense(lstm_dim, activation='relu', name="dec_dense_h2")
dense_c2 = Dense(lstm_dim, activation='relu', name="dec_dense_c2")
state_h2 = dense_h2(encoder_dense)
state_c2 = dense_c2(encoder_dense)
states2 = [state_h2, state_c2]

# ------------------------ Decoder ------------------------ #

# First decoding LSTM, initialised from states1
decoder_LSTM1 = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True, name='dec_LSTM_01')
decoder1_output, _, _ = decoder_LSTM1(canonical_decoder_input, initial_state=states1)

# Couple the first LSTM with the 2nd LSTM, initialised from states2
decoder_LSTM2 = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True, name='dec_LSTM_02')
decoder2_output, _, _ = decoder_LSTM2(decoder1_output, initial_state=states2)

# Project the decoder outputs onto the charset with a softmax
decoder_dense = Dense(len(charset), kernel_regularizer=regularizers.l2(l2), activation='softmax', name="dec_dense_softmax")
decoder_out = decoder_dense(decoder2_output)

# ---------------------- Compilation ----------------------- #

# Model definition (canonical)
model = Model(inputs=[canonical_encoder_input, canonical_decoder_input], outputs=[decoder_out])

# Run training
start = time.time()

# Optimizers
# learning_rate=0.002  # comment out if using the exponential LearningRateScheduler
adam = Adam()
rms = RMSprop()

# Full (canonical) model
model.compile(optimizer=adam, loss='categorical_crossentropy')

# Custom exponential learning rate scheduler
lr_schedule = LearningRateSchedule(epoch_to_start=50, last_epoch=349)

lr_scheduler = LearningRateScheduler(schedule=lr_schedule.exp_decay, verbose=1)


# Fit
model.fit(x=[X_train, Y_train],
          y=Y_train_target,
          batch_size=250,
          epochs=350,
          shuffle=True,
          validation_data=([X_test, Y_test], Y_test_target),
          callbacks=[h, tb, lr_scheduler])

end = time.time()
print(end - start)
model.summary()

Is it possible to adapt your code to this architecture?

best,

Dean
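
One way to handle an arbitrary number of stacked encoding and decoding layers is to build both stacks in a loop. The sketch below is a minimal, hedged rewrite of the training model above for n_layers layers; it reuses only standard Keras layers, and n_layers, the layer names, and the bottleneck wiring are illustrative assumptions rather than this repository's code.

# A minimal sketch (not this repository's code): building n_layers stacked encoder and
# decoder layers in loops. charset and lstm_dim are as defined in the snippet above.
from keras.layers import Input, Dense, Bidirectional, Concatenate, CuDNNLSTM
from keras.models import Model
from keras.optimizers import Adam

n_layers = 2  # illustrative; the model above is the special case n_layers = 2

enc_in = Input(shape=(None, len(charset)))
x = enc_in
enc_states = []
for i in range(n_layers):
    # Each bidirectional layer returns its output sequence plus forward/backward h and c states.
    x, fh, fc, bh, bc = Bidirectional(
        CuDNNLSTM(lstm_dim // 2, return_sequences=True, return_state=True),
        name='bd_enc_stack_%02d' % (i + 1))(x)
    enc_states += [fh, fc, bh, bc]

# Single bottleneck over all collected encoder states, as in the 2-layer model above.
bottleneck = Dense(lstm_dim, activation='relu', name='enc_bottleneck')(
    Concatenate(axis=-1)(enc_states))

dec_in = Input(shape=(None, len(charset)))  # teacher forcing
y = dec_in
for i in range(n_layers):
    # One (h, c) pair per decoder layer, each derived from the shared bottleneck.
    h0 = Dense(lstm_dim, activation='relu', name='dec_h%d' % (i + 1))(bottleneck)
    c0 = Dense(lstm_dim, activation='relu', name='dec_c%d' % (i + 1))(bottleneck)
    y, _, _ = CuDNNLSTM(lstm_dim, return_sequences=True, return_state=True,
                        name='dec_stack_%02d' % (i + 1))(y, initial_state=[h0, c0])

out = Dense(len(charset), activation='softmax', name='softmax_out')(y)
stacked_model = Model([enc_in, dec_in], [out])
stacked_model.compile(optimizer=Adam(), loss='categorical_crossentropy')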
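
The post also mentions a separate stateful inference model. For completeness, here is a hedged sketch of one common way to build it by reusing the trained layers from the training code above (the tensor and layer names are the ones defined in that snippet); it is an assumption about the setup, not necessarily how the author or the repository does it.

# A minimal sketch (assumption: the tensors and layers from the training code above are in scope).
# Encoder inference model: maps an input sequence to the initial states of both decoder LSTMs.
encoder_model = Model(canonical_encoder_input,
                      [state_h1, state_c1, state_h2, state_c2])

# Explicit state inputs so the decoder can be stepped one character at a time.
dec_h1_in = Input(shape=(lstm_dim,))
dec_c1_in = Input(shape=(lstm_dim,))
dec_h2_in = Input(shape=(lstm_dim,))
dec_c2_in = Input(shape=(lstm_dim,))

# Reuse the trained decoder layers, but this time keep the returned states.
step1, h1, c1 = decoder_LSTM1(canonical_decoder_input, initial_state=[dec_h1_in, dec_c1_in])
step2, h2, c2 = decoder_LSTM2(step1, initial_state=[dec_h2_in, dec_c2_in])
step_probs = decoder_dense(step2)

decoder_model = Model([canonical_decoder_input, dec_h1_in, dec_c1_in, dec_h2_in, dec_c2_in],
                      [step_probs, h1, c1, h2, c2])
# At sampling time: get the four states from encoder_model.predict, then loop, feeding
# one character and the four returned states back in at every step.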

Sum02dean (Author) commented Sep 17, 2019

I believe that I should apply an attention layer to each of the encoder states prior to concatenation. However, the question of how to tie the encoder-side attention to the decoder states remains open.
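
One common way to tie the two sides together is to let the decoder attend over the per-timestep encoder outputs (encoder2_output) rather than over the concatenated final states. The sketch below wires Luong-style dot-product attention onto the tensors from the training code above; it is a hedged illustration, not the attention mechanism implemented in this repository.

# A minimal sketch (assumption: encoder2_output and decoder2_output from the training code
# above are in scope, both with feature dimension lstm_dim). Luong-style dot-product attention.
from keras.layers import Activation, Concatenate, Dense, Dot, TimeDistributed

# Alignment scores between every decoder step and every encoder step: (batch, T_dec, T_enc).
scores = Dot(axes=[2, 2])([decoder2_output, encoder2_output])
attn_weights = Activation('softmax')(scores)  # softmax over the encoder time axis

# Context vectors: attention-weighted sum of encoder outputs per decoder step: (batch, T_dec, lstm_dim).
context = Dot(axes=[2, 1])([attn_weights, encoder2_output])

# Combine each context vector with the corresponding decoder output before the final projection.
combined = Concatenate(axis=-1)([context, decoder2_output])
attn_vector = TimeDistributed(Dense(lstm_dim, activation='tanh'), name='attn_vector')(combined)
attn_out = Dense(len(charset), activation='softmax', name='attn_softmax')(attn_vector)
# attn_out would replace decoder_out as the model output; the rest of the training setup is unchanged.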
