# seq2seq

Sequence to sequence (seq2seq) models map a variable length input sequence to a variable length output sequence
- can be built with LSTMs or attention

Sutskever et al. (2014) Sequence to Sequence Learning with Neural Networks - [arXiv](https://arxiv.org/abs/1409.3215)
- trained on 12 million English -> French sentence pairs
- 10 days to train across 8 GPUs
- reverses the order of words in the source (not the target)
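The source reversal can be sketched as a preprocessing step. This is a minimal illustration, not the paper's code: the function name `make_training_pair` and the example tokens are assumptions made for this note.

```python
def make_training_pair(source_tokens, target_tokens, eos="<EOS>"):
    """Reverse the source sentence; leave the target in its original order."""
    reversed_source = list(reversed(source_tokens))
    # The target keeps its order and is terminated with an end-of-sentence token.
    return reversed_source, list(target_tokens) + [eos]


src, tgt = make_training_pair(["I", "like", "cats"], ["j'aime", "les", "chats"])
print(src)  # ['cats', 'like', 'I']
print(tgt)  # ["j'aime", 'les', 'chats', '<EOS>']
```

Reversing puts the first source words close to the first target words, which shortens the distance the network has to carry information at the start of decoding.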

Applications
- translation, question answering, image captioning

One limitation of seq2seq is needing to know the set of possible output elements in advance
- this makes seq2seq a poor fit for problems like sorting or travelling salesman, where the possible outputs depend on the input


## Embeddings

Machines understand numbers - part of any NLP pipeline is tokenization of the text into discrete classes
- the tokens (that form our corpus) are then transformed into vectors

Special tokens include:
- `<EOS>` = end of sentence
- `<PAD>` = padding (if you require constant length sequences)
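A minimal sketch of tokenization with these special tokens, assuming whitespace tokenization and a vocabulary built from the corpus itself. The helper names `build_vocab` and `encode_sentence` are hypothetical, chosen for this example.

```python
PAD, EOS = "<PAD>", "<EOS>"


def build_vocab(sentences):
    """Map each distinct token to an integer id; special tokens get ids 0 and 1."""
    vocab = {PAD: 0, EOS: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_sentence(sentence, vocab, max_len):
    """Convert a sentence to ids, append <EOS>, then pad to a constant length."""
    ids = [vocab[token] for token in sentence.lower().split()] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError("sentence is longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


corpus = ["the cat sat", "the dog"]
vocab = build_vocab(corpus)
print(encode_sentence("the cat sat", vocab, max_len=5))  # [2, 3, 4, 1, 0]
print(encode_sentence("the dog", vocab, max_len=5))      # [2, 5, 1, 0, 0]
```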

Frequency based embeddings include bag of words or TF-IDF
- these are commonly used to generate features for non-neural network NLP approaches
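Both can be sketched in a few lines of plain Python. TF-IDF has several variants; this example assumes term frequency = count / document length and inverse document frequency = log(N / document frequency), with no smoothing. The function names are made up for this note.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word appears in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def tf_idf(docs):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    vocab = sorted({token for doc in docs for token in doc})
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    idf = {word: math.log(len(docs) / doc_freq[word]) for word in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[word] / len(doc) * idf[word] for word in vocab])
    return vocab, vectors


docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
vocab, vectors = tf_idf(docs)
print(vocab)                         # ['cat', 'dog', 'sat', 'the']
print(bag_of_words(docs[1], vocab))  # [0, 1, 2, 1]
```

Words that appear in every document (here `the` and `sat`) get an IDF of zero under this variant, so only the distinguishing words carry weight.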

It is more common with neural networks to use vector embeddings
- each word in corpus is mapped to a vector
- can be learnt unsupervised by predicting the current word from its surrounding context - known as Continuous Bag-of-Words (CBOW)
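A rough sketch of how CBOW training examples could be built from raw text, and how context vectors are averaged before predicting the target word. The window size, the helper names and the toy two-dimensional embeddings are all assumptions for illustration; no training is performed here.

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, target word) pairs from a token list."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def average_context(context, embeddings):
    """Look up each context word's vector and average them element-wise."""
    vectors = [embeddings[word] for word in context]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
pairs = cbow_pairs(tokens, window=1)
print(pairs[1])  # (['the', 'sat'], 'cat')

# Hypothetical 2-dimensional embeddings, normally learnt during training.
toy_embeddings = {"the": [1.0, 0.0], "sat": [0.0, 1.0]}
print(average_context(["the", "sat"], toy_embeddings))  # [0.5, 0.5]
```

In a full model the averaged vector would feed a softmax over the vocabulary, and the embedding table would be updated to make the true target word more likely.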

Word2Vec (all Mikolov et al. (2013)) - Efficient Estimation of Word Representations in Vector Space + Linguistic Regularities in Continuous Space Word Representations (two papers)

<img src="assets/word-embed.png" width="50%" />
*From Linguistic Regularities in Continuous Space Word Representations*

These word embeddings can be reused for other tasks
- transfer learning
- driving lots of recent progress in NLP

Further reading
- What are Word Embeddings - ML Mastery - [text](https://machinelearningmastery.com/what-are-word-embeddings/)


## Encoder-Decoder architecture

<img src="assets/seq2seq.png" width="50%" />

*From Sutskever et al. (2014)*

Encoder output is like a sentence embedding
- entire sequence is processed by an LSTM to produce a fixed length embedding (known as a context vector)
- decoder is initialized with this context, then generates the output sequence
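The data flow can be sketched with a toy, untrained model. To stay short and dependency free, this example swaps the LSTM for a plain tanh RNN with random weights, so the output tokens are meaningless. The sizes, the weight names and the use of `<EOS>` as the first decoder input are assumptions for illustration.

```python
import math
import random

random.seed(0)
VOCAB, HIDDEN = 6, 4  # hypothetical vocabulary and hidden sizes
EOS_ID = 1            # assumed id of the <EOS> token


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


EMBED = rand_matrix(VOCAB, HIDDEN)       # one embedding vector per token id
W_ENC = rand_matrix(HIDDEN, 2 * HIDDEN)  # encoder recurrence weights
W_DEC = rand_matrix(HIDDEN, 2 * HIDDEN)  # decoder recurrence weights
W_OUT = rand_matrix(VOCAB, HIDDEN)       # hidden state -> vocabulary scores


def rnn_step(weights, token_id, hidden):
    """One recurrent step: combine the token embedding with the previous state."""
    return [math.tanh(z) for z in matvec(weights, EMBED[token_id] + hidden)]


def encode(source_ids):
    """Read the whole source; the final hidden state is the context vector."""
    hidden = [0.0] * HIDDEN
    for token_id in source_ids:
        hidden = rnn_step(W_ENC, token_id, hidden)
    return hidden


def decode(context, max_len=10):
    """Start from the context vector and greedily emit tokens until <EOS>."""
    hidden, token_id, output = context, EOS_ID, []
    for _ in range(max_len):
        hidden = rnn_step(W_DEC, token_id, hidden)
        scores = matvec(W_OUT, hidden)
        token_id = scores.index(max(scores))  # greedy choice of next token
        output.append(token_id)
        if token_id == EOS_ID:
            break
    return output


context = encode([2, 3, 4])
print(len(context))     # always HIDDEN, whatever the source length
print(decode(context))  # token ids from an untrained model
```

The point to notice is that `encode` returns a vector of the same length for any source length, and `decode` sees the source only through that vector.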

<img src="assets/enc-dec.png" width="50%" />

*From Attention? Attention! - Lilian Weng*


## Resources used

Attention? Attention! - Lilian Weng - [text](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)

What are Word Embeddings - ML Mastery - [text](https://machinelearningmastery.com/what-are-word-embeddings/)