
# Neural Machine Translation by Jointly Learning to Align and Translate - ICLR 2015

### Written by Mingdong

## Task

Machine translation from English to French.

This is a pure generation approach, like Sequence to Sequence Learning.

## Method

### Overview

This model consists of two parts:

  1. Encoder: Bi-directional RNN
  2. Decoder: GRU + Attention

As shown in Figure 1.

*(Figure 1: model architecture)*

### Encoder

Each annotation h_j is the concatenation of the hidden states at position j produced by a forward and a backward RNN.

The authors argue that, since RNN states tend to represent recent inputs best, each annotation h_j focuses on the words around x_j, which suits the attention mechanism in the decoder.
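A minimal numpy sketch of how such annotations could be built, assuming plain tanh RNN cells in place of the gated hidden units the paper actually uses; all function and parameter names here are illustrative, not the paper's:

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h):
    """One step of a simple tanh RNN cell (a stand-in for the paper's gated unit)."""
    return np.tanh(x @ W_x + h_prev @ W_h)

def bidirectional_annotations(X, W_x_f, W_h_f, W_x_b, W_h_b, hidden_dim):
    """Return one annotation h_j per source position: the concatenation of the
    forward state (having read x_1..x_j) and the backward state (having read x_T..x_j)."""
    T = X.shape[0]
    h_f = np.zeros(hidden_dim)
    h_b = np.zeros(hidden_dim)
    forward, backward = [], [None] * T
    for j in range(T):                    # left-to-right pass
        h_f = rnn_step(X[j], h_f, W_x_f, W_h_f)
        forward.append(h_f)
    for j in reversed(range(T)):          # right-to-left pass
        h_b = rnn_step(X[j], h_b, W_x_b, W_h_b)
        backward[j] = h_b
    # h_j = [h_f_j ; h_b_j], shape (T, 2 * hidden_dim)
    return np.stack([np.concatenate([f, b]) for f, b in zip(forward, backward)])
```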

### Decoder

The decoder is essentially a GRU with an attention mechanism. The flow is:

  1. We have all the annotations {h_j}, the previous hidden state s_{i-1}, and the previous output y_{i-1}.
  2. Use an alignment (score) function to compute e_ij = a(s_{i-1}, h_j), then normalize over j with a softmax to get the weights α_ij.
  3. Use α_ij to compute the weighted sum of all h_j, giving the context vector c_i.
    • In one sentence, the aim of steps 2 and 3 is to get a weighted sum of the annotations as the input of the RNN decoder (see the sketch after this list).
    • Step 2 gets the weights.
    • Step 3 gets the sum.
  4. Use s_{i-1}, y_{i-1}, and c_i to compute the next hidden state s_i.
  5. Use s_i, y_{i-1}, and c_i to predict the next output y_i.
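A minimal numpy sketch of steps 2 and 3, using the additive alignment model e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j); the function and parameter names are illustrative, and the GRU state update of steps 4 and 5 is omitted:

```python
import numpy as np

def softmax(e):
    e = e - e.max()            # subtract max for numerical stability
    exp = np.exp(e)
    return exp / exp.sum()

def attention_context(s_prev, H, W_a, U_a, v_a):
    """Steps 2-3: score each annotation h_j against the previous decoder state
    s_{i-1}, normalize the scores into weights, and return the weighted sum c_i."""
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), one score per source position j
    e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])
    alpha = softmax(e)         # attention weights alpha_ij, sum to 1 over j
    c = alpha @ H              # context vector c_i: weighted sum of annotations
    return c, alpha
```

Here H is the (T, 2 * hidden_dim) matrix of annotations from the encoder sketch above; c_i and y_{i-1} would then feed the GRU update that produces s_i.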

## Contribution

Instead of the earlier encoder-decoder structure that compresses the whole source sentence into a single representation vector, this paper uses attention to dynamically focus on different annotation vectors at each decoding step. This makes the model work better on long sequences, as the experiments show.

## Drawback

Several components could be swapped out, which makes me curious about their effects:

  1. Bi-RNN
    • Can it be replaced by an LSTM, a single-direction RNN, or just the raw input vectors?
  2. GRU
    • Can it be replaced by an LSTM or another RNN variant?

## Cite

BibTeX