
# Neural Machine Translation by Jointly Learning to Align and Translate - ICLR 2015

### Written by Mingdong

## Task

Machine translation from English to French.

This is a pure generation approach, like Sequence to Sequence Learning.

## Method

### Overview

This model consists of two parts:

  1. Encoder: Bi-directional RNN
  2. Decoder: GRU + Attention

As shown in Figure 1.

*(Figure 1: model architecture)*

### Encoder

Each annotation h_j is the concatenation of the hidden states at position j produced by a forward and a backward RNN.

The authors argue that, since RNN states tend to represent recent inputs best, each annotation h_j focuses on the words around x_j, which suits the attention mechanism in the decoder.
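A minimal numpy sketch of how such annotations could be built, assuming plain tanh RNN cells in place of the gated hidden units the paper actually uses; all function and parameter names here are illustrative, not the paper's:

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h):
    """One step of a simple tanh RNN cell (a stand-in for the paper's gated unit)."""
    return np.tanh(x @ W_x + h_prev @ W_h)

def bidirectional_annotations(X, W_x_f, W_h_f, W_x_b, W_h_b, hidden_dim):
    """Return one annotation h_j per source position: the concatenation of the
    forward state (having read x_1..x_j) and the backward state (having read x_T..x_j)."""
    T = X.shape[0]
    h_f = np.zeros(hidden_dim)
    h_b = np.zeros(hidden_dim)
    forward, backward = [], [None] * T
    for j in range(T):                    # left-to-right pass
        h_f = rnn_step(X[j], h_f, W_x_f, W_h_f)
        forward.append(h_f)
    for j in reversed(range(T)):          # right-to-left pass
        h_b = rnn_step(X[j], h_b, W_x_b, W_h_b)
        backward[j] = h_b
    # h_j = [h_f_j ; h_b_j], shape (T, 2 * hidden_dim)
    return np.stack([np.concatenate([f, b]) for f, b in zip(forward, backward)])
```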

### Decoder

The decoder is essentially a GRU with an attention mechanism. The flow is:

  1. We have all the annotations {h_j}, the previous hidden state s_{i-1}, and the previous output y_{i-1}.
  2. Use an alignment (score) function to compute e_ij = a(s_{i-1}, h_j), then normalize over j with a softmax to get the weights α_ij.
  3. Use α_ij to compute the weighted sum of all h_j, giving the context vector c_i.
    • In one sentence, the aim of steps 2 and 3 is to get a weighted sum of the annotations as the input of the RNN decoder (see the sketch after this list).
    • Step 2 gets the weights.
    • Step 3 gets the sum.
  4. Use s_{i-1}, y_{i-1}, and c_i to compute the next hidden state s_i.
  5. Use s_i, y_{i-1}, and c_i to predict the next output y_i.
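A minimal numpy sketch of steps 2 and 3, using the additive alignment model e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j); the function and parameter names are illustrative, and the GRU state update of steps 4 and 5 is omitted:

```python
import numpy as np

def softmax(e):
    e = e - e.max()            # subtract max for numerical stability
    exp = np.exp(e)
    return exp / exp.sum()

def attention_context(s_prev, H, W_a, U_a, v_a):
    """Steps 2-3: score each annotation h_j against the previous decoder state
    s_{i-1}, normalize the scores into weights, and return the weighted sum c_i."""
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), one score per source position j
    e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])
    alpha = softmax(e)         # attention weights alpha_ij, sum to 1 over j
    c = alpha @ H              # context vector c_i: weighted sum of annotations
    return c, alpha
```

Here H is the (T, 2 * hidden_dim) matrix of annotations from the encoder sketch above; c_i and y_{i-1} would then feed the GRU update that produces s_i.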

## Contribution

Instead of the earlier encoder-decoder structure that compresses the whole source sentence into a single representation vector, this paper uses attention to dynamically focus on different annotation vectors at each decoding step. This makes the model work better on long sequences, as the experiments show.

## Drawback

Several components could be swapped out, which makes me curious about their effects:

  1. Bi-RNN
    • Can it be replaced by an LSTM, a single-direction RNN, or just the raw input vectors?
  2. GRU
    • Can it be replaced by an LSTM or another RNN variant?

## Cite

BibTeX