# seq2seq

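Sequence to sequence (seq2seq) models map a variable length input sequence to a variable length output sequence
- major contribution = input and output sequences can both vary in length, and can differ in length from each other
- can be built with LSTMs or with attention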

Sutskever et al. (2014) Sequence to Sequence Learning with Neural Networks - [arXiv](https://arxiv.org/abs/1409.3215)
- reverses the order of words in the source (not the target)
- trained on 12 million English -> French sentence pairs
- about 10 days to train across 8 GPUs
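
The source reversal trick can be sketched as a preprocessing step. This is a minimal illustration, not the paper's code; the function name `reverse_source` and the toy sentence pair are assumptions made for the example.

```python
def reverse_source(pairs):
    """Reverse the tokens of each source sentence; leave the target unchanged."""
    return [(list(reversed(src)), tgt) for src, tgt in pairs]


# Hypothetical toy pair of (source tokens, target tokens).
pairs = [(["a", "b", "c"], ["x", "y", "z"])]
reversed_pairs = reverse_source(pairs)
print(reversed_pairs)  # [(['c', 'b', 'a'], ['x', 'y', 'z'])]
```

After reversal, the first source word sits right next to the first target word, which shortens the distance between many corresponding words.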

Applications
- translation, question answering, image captioning

One limitation of seq2seq is that the set of possible output elements must be known in advance
- this prevents seq2seq from solving problems like sorting or travelling salesman, where the possible outputs depend on the input
- Pointer Networks (Vinyals et al. 2015) address this
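
To see why the output set is not fixed for a problem like sorting, the target can be written as positions in the input. The sketch below only builds such a target; it is not the Pointer Network model itself, and `sorting_pointers` is a name made up for this example.

```python
def sorting_pointers(values):
    """Express the sorted order as a sequence of positions in the input."""
    return sorted(range(len(values)), key=lambda i: values[i])


values = [0.7, 0.1, 0.4]
pointers = sorting_pointers(values)
print(pointers)  # [1, 2, 0]
```

The valid outputs are the positions `0 .. len(values) - 1`, so the size of the output set grows with the input length, which a fixed output vocabulary cannot express.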


## Embeddings

The text / sequence is first tokenized:
- each token (the tokens together form our corpus) is then transformed into a vector

Special tokens include:
- `<EOS>` = end of sentence
- `<PAD>` = padding (if you require constant length sequences)
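
A minimal sketch of how these special tokens might be applied to a batch. The helper name `pad_batch` and the choice to pad on the right are assumptions for illustration.

```python
EOS, PAD = "<EOS>", "<PAD>"


def pad_batch(sentences):
    """Append <EOS> to each tokenized sentence, then right-pad to the longest."""
    with_eos = [tokens + [EOS] for tokens in sentences]
    max_len = max(len(tokens) for tokens in with_eos)
    return [tokens + [PAD] * (max_len - len(tokens)) for tokens in with_eos]


batch = pad_batch([["the", "cat", "sat"], ["hello"]])
for row in batch:
    print(row)
# ['the', 'cat', 'sat', '<EOS>']
# ['hello', '<EOS>', '<PAD>', '<PAD>']
```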

Frequency based embeddings include bag of words and TF-IDF:
- these are commonly used to generate features for non-neural network NLP approaches
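
Both representations can be sketched in a few lines of plain Python. The toy documents are made up, and the TF-IDF variant used here (term frequency times `log(N / df)`, with no smoothing) is only one of several common definitions.

```python
import math
from collections import Counter

# Hypothetical toy corpus of tokenized documents.
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
vocab = sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocab]


def tf_idf(doc):
    """Term frequency scaled by log inverse document frequency (unsmoothed)."""
    counts = Counter(doc)
    vec = []
    for tok in vocab:
        tf = counts[tok] / len(doc)
        df = sum(1 for d in docs if tok in d)
        vec.append(tf * math.log(len(docs) / df))
    return vec


print(vocab)                  # ['cat', 'dog', 'down', 'sat', 'the']
print(bag_of_words(docs[0]))  # [1, 0, 0, 1, 1]
print(tf_idf(docs[0]))
```

Words that appear in every document (`the`, `sat`) get a TF-IDF weight of zero, while `cat` keeps a positive weight because it appears in only one document.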

It is more common with neural networks to use vector embeddings:
- each word in the corpus is mapped to a vector
- can be learnt unsupervised by predicting the current word from its surrounding context words - known as Continuous Bag-of-Words (CBOW)
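
The CBOW training signal can be sketched as (context, target) pairs taken from raw text. This shows only how the examples are built, not the model or its training; `cbow_pairs` and the `window` parameter are names chosen for this example.

```python
def cbow_pairs(tokens, window=2):
    """Pair each word with the words up to `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


pairs = cbow_pairs(["the", "cat", "sat", "down"], window=1)
for context, target in pairs:
    print(context, "->", target)
# ['cat'] -> the
# ['the', 'sat'] -> cat
# ['cat', 'down'] -> sat
# ['sat'] -> down
```

No labels are needed: the text itself provides the word to predict, which is why the embeddings can be learnt from unlabelled corpora.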

Word2Vec (both Mikolov et al. (2013)):
- Efficient Estimation of Word Representations in Vector Space
- Linguistic Regularities in Continuous Space Word Representations

![](assets/word-embed.png)
*From Linguistic Regularities in Continuous Space Word Representations*

Word embeddings can be reused across tasks
- this is a form of transfer learning
- driving lots of recent progress in NLP
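
One way reuse might look in practice is initializing a new model's embedding table from pretrained vectors. This is a sketch under assumptions: the `pretrained` dictionary holds made-up toy vectors, and `build_embedding_table` is a hypothetical helper, not a library function.

```python
import random

# Hypothetical pretrained vectors (toy values, 2 dimensions).
pretrained = {"cat": [0.1, 0.3], "dog": [0.2, 0.4]}


def build_embedding_table(vocab, pretrained, dim, seed=0):
    """Copy pretrained vectors where available; randomly initialize the rest."""
    rng = random.Random(seed)
    table = []
    for word in vocab:
        if word in pretrained:
            table.append(list(pretrained[word]))
        else:
            table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return table


vocab = ["cat", "dog", "axolotl"]
table = build_embedding_table(vocab, pretrained, dim=2)
print(table[0])  # [0.1, 0.3]
```

Words missing from the pretrained set (here `axolotl`) start from small random values and would be learnt during training on the new task.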

Further reading
- What are Word Embeddings - ML Mastery - [text](https://machinelearningmastery.com/what-are-word-embeddings/)

## Encoder-Decoder architecture

![](assets/seq2seq.png)

*From Sutskever et al. (2014)*

- the encoder uses an LSTM to compress the input sequence into a fixed length embedding (the context vector)
- the decoder is initialized with the context vector, then generates the output sequence

The context vector (aka sentence embedding or thought vector) is fixed length and summarizes the entire input sentence.
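
The data flow can be sketched in plain Python. This is an illustration under several assumptions: a plain tanh RNN step stands in for the LSTM, the weights are random and untrained (so the generated tokens are arbitrary), decoding is greedy, and `<EOS>` doubles as the decoder's start token. All names are made up for the example.

```python
import math
import random

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(W_x, W_h, x, h):
    """One plain tanh RNN step (a simplified stand-in for an LSTM cell)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]


VOCAB = ["<EOS>", "a", "b", "c"]
EMBED, HIDDEN = 3, 4
embed = rand_matrix(len(VOCAB), EMBED)  # one embedding row per token
enc_Wx, enc_Wh = rand_matrix(HIDDEN, EMBED), rand_matrix(HIDDEN, HIDDEN)
dec_Wx, dec_Wh = rand_matrix(HIDDEN, EMBED), rand_matrix(HIDDEN, HIDDEN)
W_out = rand_matrix(len(VOCAB), HIDDEN)


def encode(token_ids):
    """Read the whole input; the final hidden state is the context vector."""
    h = [0.0] * HIDDEN
    for t in token_ids:
        h = rnn_step(enc_Wx, enc_Wh, embed[t], h)
    return h


def decode(context, max_len=5):
    """Start from the context vector and greedily emit tokens until <EOS>."""
    h, prev, out = context, 0, []
    for _ in range(max_len):
        h = rnn_step(dec_Wx, dec_Wh, embed[prev], h)
        scores = matvec(W_out, h)
        prev = max(range(len(VOCAB)), key=lambda i: scores[i])
        if VOCAB[prev] == "<EOS>":
            break
        out.append(VOCAB[prev])
    return out


short_context = encode([1])
long_context = encode([1, 2, 3, 2, 1])
print(len(short_context), len(long_context))  # 4 4
print(decode(long_context))
```

Inputs of length 1 and length 5 both produce a context vector of size `HIDDEN`, and the decoder's output length is decided by when it emits `<EOS>`, not by the input length.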

![](assets/enc-dec.png)


## Resources used

Attention? Attention! - Lilian Weng - [text](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)

What are Word Embeddings - ML Mastery - [text](https://machinelearningmastery.com/what-are-word-embeddings/)