# seq2seq

seq2seq is the common name for the encoder-decoder sequence model introduced in 2014
- key contribution = variable-length input & output sequences
- can be used with LSTMs or attention

Sutskever et al. (2014) Sequence to Sequence Learning with Neural Networks - [arXiv](https://arxiv.org/abs/1409.3215)
- trained on 12 million English -> French sentence pairs
- 10 days to train across 8 GPUs
- reverses the order of words in the source (not the target)
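
The source-reversal trick can be sketched as a small preprocessing step. This is a minimal illustration, not the paper's code; the function name `make_training_pair` is hypothetical:

```python
def make_training_pair(source_tokens, target_tokens, eos="<EOS>"):
    """Reverse the source sentence; leave the target in its original order."""
    reversed_source = list(reversed(source_tokens))
    # The target keeps its order and is terminated with an end-of-sentence token.
    return reversed_source, list(target_tokens) + [eos]


src, tgt = make_training_pair(["a", "b", "c"], ["x", "y", "z"])
print(src)  # ['c', 'b', 'a']
print(tgt)  # ['x', 'y', 'z', '<EOS>']
```

After reversal, the first source word sits right next to the first target word, which shortens the distance between corresponding words at the start of the sequences.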

Applications
- translation, question answering, image captioning

One limitation of seq2seq is that the set of possible output elements must be known in advance
- this prevents seq2seq from solving problems like sorting or travelling salesman, where the possible outputs depend on the input


## Embeddings

Machines understand numbers 
- if we are working with text, we need to transform the text into discrete tokens (usually individual words)
- these tokens are then mapped to a vector representation

Special tokens include:
- `<EOS>` = end of sentence
- `<PAD>` = padding (if you require constant length sequences)
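
A minimal sketch of how these two tokens might be applied to a batch of tokenized sentences. The helper name `pad_batch` and the choice of right-padding are assumptions made for illustration:

```python
EOS, PAD = "<EOS>", "<PAD>"


def pad_batch(sentences, pad=PAD, eos=EOS):
    """Append EOS to each tokenized sentence, then right-pad to the longest length."""
    with_eos = [tokens + [eos] for tokens in sentences]
    max_len = max(len(tokens) for tokens in with_eos)
    return [tokens + [pad] * (max_len - len(tokens)) for tokens in with_eos]


batch = pad_batch([["we", "sacrifice", "determinism"], ["for", "simplicity"]])
for row in batch:
    print(row)
```

Every row in `batch` now has the same length, so the batch can be stacked into a single array once the tokens are mapped to integer ids.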

One-hot encoding is a type of embedding
- simple, sparse, no context information about position in the sequence

Frequency-based embeddings include bag of words and TF-IDF
- simple, accounts for frequency, sparse, no context
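
Both representations can be sketched with the standard library alone. The two-document corpus is made up for illustration, and the TF-IDF weighting used here (term frequency times `log(N / df)`) is only one of several common variants:

```python
import math
from collections import Counter

# Tiny illustrative corpus (an assumption, not from a real dataset).
docs = [
    "we sacrifice determinism for simplicity".split(),
    "we use chunked models".split(),
]
vocab = sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocab]


def tf_idf(doc):
    """Weight each term frequency by log(N / document frequency)."""
    counts = Counter(doc)
    weights = []
    for tok in vocab:
        tf = counts[tok] / len(doc)
        df = sum(1 for d in docs if tok in d)
        weights.append(tf * math.log(len(docs) / df))
    return weights


print(vocab)
print(bag_of_words(docs[0]))
print(tf_idf(docs[0]))
```

Words that appear in every document (here `we`) get a TF-IDF weight of zero, and both vectors are as long as the vocabulary, which is why they become sparse on real corpora.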

The word vector representation is more common with neural nets:
- each word in corpus is mapped to a dense vector (often length 300 or more)
- can be learnt unsupervised by predicting the current word from its surrounding context - known as Continuous Bag-of-Words (CBOW)
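
A rough sketch of how CBOW training examples could be built from a token sequence: each word becomes a prediction target and its neighbours become the input. The function name `cbow_pairs` and the `window` parameter are illustrative choices, not taken from the Word2Vec code:

```python
def cbow_pairs(tokens, window=2):
    """Yield (context_words, target_word) pairs for a CBOW-style objective."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, target


tokens = "we sacrifice determinism for simplicity".split()
for context, target in cbow_pairs(tokens, window=1):
    print(context, "->", target)
```

No labels are needed: the text itself supplies both inputs and targets, which is what makes the training unsupervised.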

Word2Vec (all Mikolov et al. (2013)) - Efficient Estimation of Word Representations in Vector Space + Linguistic Regularities in Continuous Space Word Representations (two papers)

<img src="assets/word-embed.png" width="50%" />
*From Linguistic Regularities in Continuous Space Word Representations*
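
That paper reports that relations between words show up as consistent vector offsets (the well-known king - man + woman ≈ queen example). The sketch below mimics the arithmetic with hand-picked 2-d toy vectors. These values are assumptions chosen to make the geometry obvious, not learned embeddings:

```python
import numpy as np

# Hand-picked toy vectors (NOT learned): axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man": np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


query = toy["king"] - toy["man"] + toy["woman"]
# Exclude the words used to build the query, as is usual for analogy tests.
candidates = [w for w in toy if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(query, toy[w]))
print(best)  # queen
```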

These word embeddings can be reused for other tasks
- transfer learning in NLP = very important
- you can use pretrained word embeddings on your task (classification, clustering etc)
- driving lots of recent progress in NLP

Further reading
- What are Word Embeddings - ML Mastery - [text](https://machinelearningmastery.com/what-are-word-embeddings/)

## How does a word embedding work?

Let's build a simple pipeline.  First we need a corpus (we need to know this ahead of time):

In [None]:
#  see page 312 of https://www.physixfan.com/wp-content/files/GEBen.pdf
text = 'In short, in using chunked high-level models, we sacrifice determinism for simplicity'

print(text)

A naive tokenization based on whitespace:

In [None]:
tokens = text.split(' ')
tokens

Initialize the word vectors (one for each word).  These will be updated as we train a neural net that uses these vectors:

In [None]:
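import numpy as np

embedding_dim = 4
# one randomly initialized vector per token (every token in this corpus is unique)
vecs = np.random.normal(size=(len(tokens), embedding_dim))
vecs.shape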

One hot encode the corpus:

In [None]:
!pip install scikit-learn -Uq
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
oh = enc.fit_transform(np.array(tokens).reshape(-1, 1))

Now make a sentence & one hot encode it:

In [None]:
sent = np.array('determinism for simplicity'.split(' ')).reshape(-1, 1)

oh = enc.transform(sent)

oh.todense()

And we can finally get our vectorized sentence representation:

In [None]:
# matrix product: each one-hot row selects one row of vecs
oh @ vecs
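
Multiplying a one-hot matrix by the embedding matrix is the same as looking up rows by index, which is how embedding layers are usually implemented. A small self-contained check, using a made-up four-word vocabulary rather than the corpus above:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["determinism", "for", "simplicity", "we"]  # illustrative vocabulary
embedding_dim = 4
vecs = rng.normal(size=(len(vocab), embedding_dim))  # one row per vocabulary word

sentence = ["for", "simplicity", "determinism"]
ids = np.array([vocab.index(word) for word in sentence])

one_hot = np.eye(len(vocab))[ids]  # shape (3, 4): one one-hot row per token
via_matmul = one_hot @ vecs        # explicit one-hot product
via_lookup = vecs[ids]             # direct row indexing

print(np.allclose(via_matmul, via_lookup))  # True
```

The lookup avoids materializing the sparse one-hot matrix, which matters once the vocabulary has tens of thousands of words.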

If your model can't handle variable-length sequences, it is common to average the word vectors to get a fixed-length representation.
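
A minimal sketch of that averaging step, using random stand-in vectors and a hypothetical helper called `sentence_vector`:

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_dim = 4


def sentence_vector(word_vectors):
    """Mean-pool a (num_words, embedding_dim) array into one (embedding_dim,) vector."""
    return word_vectors.mean(axis=0)


short = rng.normal(size=(3, embedding_dim))  # stand-in for a 3-word sentence
long = rng.normal(size=(7, embedding_dim))   # stand-in for a 7-word sentence

print(sentence_vector(short).shape, sentence_vector(long).shape)  # (4,) (4,)
```

Both sentences end up with the same shape, at the cost of discarding word order.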

## Encoder-decoder architecture

Now that we understand a little about how to embed sequences into vectors, we can look at the downstream architecture of the seq2seq model:
- process the input sequence in the encoder to a fixed-length context vector
- the encoder output is like a sentence embedding
- use that context vector to initialize the decoder and generate the output sequence
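
The three steps above can be sketched with a plain tanh RNN in NumPy. This only illustrates the data flow: the weights are random and untrained, the parameter names are made up, a vanilla RNN stands in for the deep LSTMs used by Sutskever et al., and feeding `<EOS>` as the first decoder input is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim = 6, 4, 8
EOS = 0  # assumed id of the <EOS> token


def init(*shape):
    """Small random (untrained) weights."""
    return rng.normal(scale=0.1, size=shape)


embed = init(vocab_size, embed_dim)
W_enc_x, W_enc_h = init(embed_dim, hidden_dim), init(hidden_dim, hidden_dim)
W_dec_x, W_dec_h = init(embed_dim, hidden_dim), init(hidden_dim, hidden_dim)
W_out = init(hidden_dim, vocab_size)


def encode(source_ids):
    """Run the encoder over the source and return its final hidden state."""
    h = np.zeros(hidden_dim)
    for token_id in source_ids:
        h = np.tanh(embed[token_id] @ W_enc_x + h @ W_enc_h)
    return h  # fixed-length context vector, whatever the source length


def decode(context, max_len=10):
    """Greedy decoding: start from the context vector, stop at <EOS> or max_len."""
    h, token_id, output = context, EOS, []
    for _ in range(max_len):
        h = np.tanh(embed[token_id] @ W_dec_x + h @ W_dec_h)
        token_id = int(np.argmax(h @ W_out))
        if token_id == EOS:
            break
        output.append(token_id)
    return output


context = encode([3, 1, 4, 1, 5])
print(context.shape)    # (8,)
print(decode(context))  # arbitrary token ids, since the weights are untrained
```

The input and output lengths are decoupled: the encoder accepts any number of tokens, and the decoder decides when to stop by emitting `<EOS>`.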

<img src="assets/seq2seq.png" width="70%" />

*From Sutskever et al. (2014)*

<img src="assets/enc-dec.png" width="70%" />

*From Attention? Attention! - Lilian Weng*


## Resources used

Attention? Attention! - Lilian Weng - [text](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)

What are Word Embeddings - ML Mastery - [text](https://machinelearningmastery.com/what-are-word-embeddings/)