Learned in Translation: Contextualized Word Vectors (2017)

Abstract

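To contextualize word vectors, the paper uses the deep LSTM encoder of an attentional seq2seq model trained for machine translation.
- **contextualize = to place a word in its surrounding context
Adding the resulting context vectors (CoVe) improved the performance of NLP models.
A paper that proposes using a translation model's Encoder as a Feature Extractor (by 조재민)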

Introduction

Pretrained CNNs improved performance when transferred to other tasks in the image domain, so what if we used a pretrained NLP encoder in the same way?
- **(Original: the successful transfer of CNNs trained on ImageNet to other tasks in computer vision)
Converting large amounts of unlabeled training data into information in the form of word vectors has made a wide range of research possible.
The ability to share a common representation of words in the context of sentences
Since LSTM-based encoders are widely used in other NLP models, let's take advantage of them!
In short, an Encoder trained well on one task should also work well when applied to other models.

Brief Model Overview

[Figure: architecture]

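CoVe is produced by attaching an Encoder trained in a separate translation model to another NLP model (the Encoder's role is to produce context).
- **(Original: We append the outputs of the MT-LSTMs, CoVe, to the word vectors typically used as inputs to these models)
This works better than using only word vectors trained with word2vec or GloVe.
The larger the data set used to train the MT-LSTM (the translation model in which the Encoder is trained), the larger the performance gain of the downstream models that use CoVe.
MT may look unrelated to text classification and question answering, but it helps because it requires understanding context.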
With CoVe, the baseline models reach the state of the art on fine-grained sentiment analysis and entailment!

Related Work

Transfer Learning

Recent work in NLP has continued in this direction by using pretrained word representations to improve models

Neural Machine Translation

The Encoder and Decoder concepts of the seq2seq model are important here.

Transfer Learning and Machine Translation

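Machine translation has to reproduce the sentence in the target language without losing information in the source language sentence, which makes it a suitable source domain for transfer learning.
There is also a lot of data available for MT, which is a relative advantage.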

Transfer Learning in Computer Vision

Using a pretrained CNN to extract features from region proposals improves object detection and semantic segmentation models.

Machine Translation Model

An English -> German translation model maps W1 = [a1, a2, ..., a_n] -> W2 = [b1, b2, ..., b_m] (the source and target lengths may differ).
GloVe(W1) is fed into a two-layer bidirectional long short-term memory network (MT-LSTM). The MT-LSTM passes context to the decoder, which determines the distribution over W2.
- The MT-LSTM later serves as the pretrained encoder.
The decoder feeds the previous target embedding and a context-adjusted hidden state into a two-layer, unidirectional LSTM.
The vector of attention weights α represents the relevance of each encoding time-step to the current decoder state.
H refers to the elements of h stacked along the time dimension.
To build the context-adjusted hidden state h-tilde, the attention-weighted sum of the encoder states (H^T α) is passed through a tanh layer and concatenated with the decoder state.
A final transformation of the context-adjusted hidden state produces the distribution over output words.
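The attention step described above can be sketched in NumPy. This is only an illustration under assumptions: it assumes the form alpha = softmax(H(W1 h_dec + b1)) and h_tilde = [tanh(W2 H^T alpha + b2); h_dec], the weights are random placeholders rather than trained parameters, and the helper name `attention_step` is hypothetical (not from the authors' code).

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_step(H, h_dec, W1, b1, W2, b2):
    """One decoder attention step (hypothetical helper).

    H:     (n, d) encoder outputs stacked along the time dimension
    h_dec: (d,)   current decoder hidden state
    """
    # Relevance of each encoding time-step to the current decoder state.
    alpha = softmax(H @ (W1 @ h_dec + b1))          # (n,)
    # Attention-weighted sum of encoder states, passed through tanh.
    context = np.tanh(W2 @ (H.T @ alpha) + b2)      # (d,)
    # Context-adjusted hidden state: context concatenated with decoder state.
    h_tilde = np.concatenate([context, h_dec])      # (2d,)
    return alpha, h_tilde

rng = np.random.default_rng(0)
n, d = 5, 8                                         # assumed toy sizes
H = rng.normal(size=(n, d))
h_dec = rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b1, b2 = np.zeros(d), np.zeros(d)

alpha, h_tilde = attention_step(H, h_dec, W1, b1, W2, b2)
print(alpha.shape, h_tilde.shape)  # (5,) (16,)
```

In the full model, h_tilde would then go through a final linear layer and softmax to give the distribution over output words; that layer is left out here to keep the sketch short.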

CoVe

CoVe(w) = MT-LSTM(GloVe(w)), where w is a sequence of words
The downstream model input is built as w_tilde = [GloVe(w); CoVe(w)].
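The concatenation above can be illustrated with a small shape-level sketch. Everything here is a placeholder: the GloVe table is random, `mt_lstm_stub` is a hypothetical stand-in that is not an LSTM and only mimics the output shape of the pretrained encoder, and the dimensions (300 for GloVe, 600 for CoVe) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}               # toy vocabulary
GLOVE_DIM, COVE_DIM = 300, 600                       # assumed sizes

# Placeholder for a pretrained GloVe lookup table.
glove_table = rng.normal(size=(len(vocab), GLOVE_DIM))
# Placeholder weights for the stand-in encoder.
proj = rng.normal(size=(GLOVE_DIM, COVE_DIM)) * 0.01

def glove(words):
    """Look up one GloVe vector per word: (len(words), GLOVE_DIM)."""
    return glove_table[[vocab[w] for w in words]]

def mt_lstm_stub(g):
    """Stand-in for the pretrained MT-LSTM. NOT an LSTM: it only
    produces an output of the right shape, (len(words), COVE_DIM)."""
    return np.tanh(g @ proj)

def cove_input(words):
    """w_tilde = [GloVe(w); CoVe(w)], concatenated per word."""
    g = glove(words)
    c = mt_lstm_stub(g)
    return np.concatenate([g, c], axis=-1)

w_tilde = cove_input(["the", "cat", "sat"])
print(w_tilde.shape)  # (3, 900)
```

The point of the sketch is that the GloVe part of the input is kept unchanged and the CoVe part is appended next to it, so the downstream model sees both.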

Classification with CoVe

[Figure: classification]

**(Original: This model is designed to handle both single-sentence and two-sentence classification tasks. In the case of single-sentence tasks, the input sequence is duplicated to form two sequences.)
Input sentences w_x, w_y -> w_x_tilde, w_y_tilde by CoVe
The task-specific representations x = biLSTM(f(w_x_tilde)), y = biLSTM(f(w_y_tilde)) are obtained, where f is a feedforward network with ReLU activation.
These sequences are stacked along the time axis to give the matrices X and Y.
** The Encoder in the figure above is not the Encoder used to produce CoVe but the task model's own Encoder.
Features are extracted with a bi-attention mechanism
- A = XY^T, A_x = softmax(A), A_y = softmax(A^T) give the attention weights
- C_x = A_x^T * X, C_y = A_y^T * Y are the results of the mechanism
- For background on attention mechanisms, see: http://freesearch.pe.kr/archives/4724
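The bi-attention formulas above can be written out in NumPy as a minimal sketch. It assumes that the softmax is normalized column-wise (over axis 0), that X has shape (n, d) and Y has shape (m, d), and the function name `biattention` is hypothetical.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def biattention(X, Y):
    """Bi-attention between two sequences (hypothetical helper).

    X: (n, d) and Y: (m, d), each stacked along the time axis.
    """
    A = X @ Y.T                    # (n, m) affinity matrix
    A_x = softmax(A, axis=0)       # (n, m), each column sums to 1
    A_y = softmax(A.T, axis=0)     # (m, n), each column sums to 1
    C_x = A_x.T @ X                # (m, d) summaries of X, one per position of Y
    C_y = A_y.T @ Y                # (n, d) summaries of Y, one per position of X
    return A_x, A_y, C_x, C_y

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))        # toy sequence of length 4
Y = rng.normal(size=(3, 6))        # toy sequence of length 3
A_x, A_y, C_x, C_y = biattention(X, Y)
print(C_x.shape, C_y.shape)  # (3, 6) (4, 6)
```

For a single-sentence task the same function would be called as `biattention(X, X)`, since the input sequence is duplicated.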
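Integrate
- Conditioning information is integrated into each sequence representation: X_ly = biLSTM([X; X - C_y; X * C_y]), Y_lx = biLSTM([Y; Y - C_x; Y * C_x]), where * is the element-wise product (X_ly denotes X conditioned on y, Y_lx denotes Y conditioned on x).
- Then self-attention weights are computed: alpha = softmax(X_ly * v1 + b1), beta = softmax(Y_lx * v2 + b2)
- These weights are multiplied with X_ly and Y_lx respectively to produce x_self = X_ly^T * alpha and y_self = Y_lx^T * beta
Pooling produces a vector of the form [max-pooling; mean-pooling; min-pooling; self]
- Pooling is applied to X_ly and Y_lx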
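The pooling step above can be sketched as follows. This is an illustration under assumptions: X_ly is taken to be an (n, d) matrix (the sequence X after conditioning on y), v1 and b1 are random placeholders for learned parameters, and the helper name `pool` is hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def pool(X_ly, v, b):
    """Pool an (n, d) sequence into one vector of size 4d (hypothetical helper).

    Returns [max-pooling; mean-pooling; min-pooling; self].
    """
    weights = softmax(X_ly @ v + b)        # (n,) self-attention weights over time
    x_self = X_ly.T @ weights              # (d,) attention-weighted sum over time
    return np.concatenate([
        X_ly.max(axis=0),                  # max-pooling over time
        X_ly.mean(axis=0),                 # mean-pooling over time
        X_ly.min(axis=0),                  # min-pooling over time
        x_self,                            # self-attentive pooling
    ])

rng = np.random.default_rng(0)
n, d = 5, 4                                # assumed toy sizes
X_ly = rng.normal(size=(n, d))
v1, b1 = rng.normal(size=d), 0.0           # placeholder parameters

x_pool = pool(X_ly, v1, b1)
print(x_pool.shape)  # (16,)
```

The same function would be applied to Y_lx with v2 and b2, and the two pooled vectors would then be passed to the classifier.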

Question Answering with CoVe

The overall process is the same as above, except that tanh is used as the activation function instead of ReLU.
One of the sequences is the document and the other the question in the question-document pair. These sequences are then fed through the coattention and dynamic decoder implemented as in the original Dynamic Coattention Network (DCN).

Datasets

[Figure: dataset]

Experiments

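CoVe was even more effective when combined with character n-gram embeddings.
Even with the same model, performance varies with the data.
The most effective model differed from data set to data set.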

Conclusion

CoVe, wonderful, best, the state of the art (practically Steve Jobs-level hype ;;)
