Hello, Mr. Ranjan. Thank you for your great article LSTM Autoencoder for Extreme Rare Event Classification in Keras and the code on GitHub. While reading your code, however, I came up with four questions.
I decided to ask the questions here rather than on Medium because I can upload pictures and quote code more accurately here. I hope you are okay with this.
Q1. Why is ‘return_sequences=True’ set for all the LSTM layers?
Supporting explanations
- I think an LSTM autoencoder is very similar to a seq2seq model: in an autoencoder, the input data is squeezed into a single latent vector shorter than the original input. In a seq2seq model, the same thing happens with an input sequence and a fixed-length vector.
< Figure 1. seq2seq model: encoding-decoding model >
- In the encoding stage, what the model needs to do is build a fixed-length vector (a latent vector) that contains all the information and time-wise relationships of the input sequence. In the decoding stage, the model's goal is to create an output that is as close as possible to the original input.
- So my guess is that in the encoding stage we do not need the per-timestep outputs shown in Figure 1, because the autoencoder's only goal there is to build a good hidden latent vector. The smaller the MSE between the input data and the output reconstructed from the latent vector in the decoding stage, the better the latent vector is.
- Doesn't this mean that we can set ‘return_sequences=False’, so that the per-timestep outputs are not produced in the encoding stage? (A small sketch of what I mean follows below.)
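To make the question concrete, here is a small standalone sketch I wrote (not from your article; the sizes are placeholders) showing what return_sequences changes about an LSTM layer's output shape:
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

timesteps, n_features = 5, 3   # placeholder sizes, only for illustration
inp = Input(shape=(timesteps, n_features))
seq_out = LSTM(4, return_sequences=True)(inp)    # one output per timestep -> (batch, 5, 4)
vec_out = LSTM(4, return_sequences=False)(inp)   # only the last timestep's output -> (batch, 4)
print(Model(inp, seq_out).output_shape)   # (None, 5, 4)
print(Model(inp, vec_out).output_shape)   # (None, 4)
This is why I wonder whether the encoder really needs the full sequence of outputs, rather than just the final vector.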
Q2. What would be the first hidden state (h0, c0) for the decoding stage?
Supporting explanations
- According to the code, the hidden latent vector is repeated for timesteps, as in lstm_autoencoder.add(RepeatVector(timesteps)). This means that the latent vector would be fed to the decoder as an input in the decoding stage. Below is the code snippet.
lstm_autoencoder = Sequential()
# Encoder
lstm_autoencoder.add(LSTM(timesteps, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_autoencoder.add(LSTM(16, activation='relu', return_sequences=True))
lstm_autoencoder.add(LSTM(1, activation='relu'))
lstm_autoencoder.add(RepeatVector(timesteps))
# Decoder
lstm_autoencoder.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_autoencoder.add(LSTM(16, activation='relu', return_sequences=True))
lstm_autoencoder.add(TimeDistributed(Dense(n_features)))
- If latent vectors are used as inputs in the decoding stage, what would be used for the initial hidden state (h0, c0)? In the seq2seq model (Figure 1) mentioned above, the latent vector is used as the initial hidden state (h0, c0) in the decoding stage, and the input in the decoding stage would be the sentence to be translated, for example from English to French.
- So I am curious to know what is used as the initial hidden and cell state (h0, c0) in your code! (For reference, a small seq2seq-style sketch follows below.)
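Below is a small sketch of my own (not from your article; latent_dim and the other sizes are placeholders) of how a seq2seq-style decoder can receive an explicit (h0, c0) through the functional API, in contrast to the Sequential model above, where I assume the decoder LSTMs just start from the default zero states:
from tensorflow.keras.layers import Input, LSTM, RepeatVector
from tensorflow.keras.models import Model

timesteps, n_features, latent_dim = 5, 59, 16   # placeholder sizes
enc_in = Input(shape=(timesteps, n_features))
# return_state=True also returns the encoder's final hidden state h and cell state c
enc_out, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_in)
dec_in = RepeatVector(timesteps)(enc_out)
# the encoder's final (h, c) are reused here as the decoder's (h0, c0)
dec_out = LSTM(latent_dim, return_sequences=True)(dec_in, initial_state=[state_h, state_c])
model = Model(enc_in, dec_out)
Is something like this happening implicitly in your model, or do the decoder layers simply start from zero states?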
Q3. Why does the output unit size increase from 5 to 16 in the encoding stage?
Supporting explanations
- From lstm_autoencoder.summary() we can see that the number of output units increases from 5 (in the layer 'lstm_16') to 16 (in the layer 'lstm_17').
< Figure 2. Summary of the LSTM autoencoder model >
- Since the output of the previous LSTM layer is the input to the next LSTM layer, I think the output size is equivalent to the hidden state size.
- If the hidden layer's size is greater than the number of inputs, the model can simply learn an 'identity function', which is not desirable. (Source: [What is the intuition behind the sparsity parameter in sparse autoencoders?](https://stats.stackexchange.com/questions/149478/what-is-the-intuition-behind-the-sparsity-parameter-in-sparse-autoencoders))
- Layer 'lstm_16' has only 5 units, while the next layer 'lstm_17' has 16 units. So I think 'lstm_17' could just copy 'lstm_16' (acting like an identity matrix), which would make the layer 'lstm_17' undesirable.
- I am curious to know why the output size (hidden layer size) increases rather than decreases! (A sketch of the narrowing shape I expected follows below.)
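For comparison, here is a minimal sketch of the monotonically narrowing architecture I had expected (my own illustration; the unit sizes are placeholders, not your code):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

timesteps, n_features = 5, 59   # placeholder sizes
narrowing_autoencoder = Sequential([
    LSTM(16, activation='relu', input_shape=(timesteps, n_features), return_sequences=True),
    LSTM(5, activation='relu', return_sequences=True),
    LSTM(1, activation='relu'),                 # bottleneck
    RepeatVector(timesteps),
    LSTM(5, activation='relu', return_sequences=True),
    LSTM(16, activation='relu', return_sequences=True),
    TimeDistributed(Dense(n_features)),
])
narrowing_autoencoder.summary()
Is there a reason you chose 5 -> 16 -> 1 instead of a strictly decreasing shape like this?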
Q4. How much does the input data size get reduced in the latent vector?
Supporting explanations
- In a fully-connected autoencoder, we can see how much the input vector gets reduced. For example, in the picture below, a 10-node input vector is reduced to a 4-node latent vector in the middle.
- In your code, how much does the 59-node input vector (one input at a given time step, with 59 features and 1 answer label) get reduced in the latent vector? (Below is how I tried to check this myself.)
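Here is the small check I tried (my own sketch, assuming the Sequential model quoted in Q2 is defined as lstm_autoencoder and that layers[2] is the LSTM(1, ...) bottleneck just before RepeatVector):
from tensorflow.keras.models import Model

# take the output of the assumed bottleneck layer, right before RepeatVector
encoder = Model(inputs=lstm_autoencoder.inputs,
                outputs=lstm_autoencoder.layers[2].output)
print(encoder.output_shape)   # I get something like (None, 1) — a single value per window?
Does this mean each window of 5 x 59 inputs is compressed into a single latent value, or am I reading the bottleneck incorrectly?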
Thanks again for this nice post.