# Lecture 18 (2023-04-04): Word Embeddings and Sequence Models

## Lecture Overview

* Word Embeddings (Word2Vec, GloVe, fastText, and ELMo)
* Sequence Models (RNNs, LSTMs, and BiLSTMs)

## Word Embeddings

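### Static Word Embeddings

Static embeddings such as Word2Vec, GloVe, and fastText assign each word type one fixed vector, learned once from a corpus. Their main limitations are: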

1. Lack of contextual representations: Static embeddings assign a single vector to each word regardless of the sentence it appears in, so the representation cannot reflect how the surrounding context shapes the word's meaning.

2. Polysemy: Because each word gets a single vector, static embeddings cannot distinguish the senses of a word with multiple meanings (e.g., "bank" as a financial institution vs. the bank of a river). The vector ends up as a blend of all senses, which can lead to misinterpretation and lower performance on downstream NLP tasks.

3. Limited vocabulary and out-of-vocabulary (OOV) words: Static embeddings are learned for a fixed vocabulary. Words not seen during training are out-of-vocabulary and receive no learned vector (typically they are dropped or mapped to a shared unknown-word vector). fastText mitigates this by composing word vectors from character n-grams.

4. Suboptimal handling of phrases and idiomatic expressions: Static embeddings represent individual words rather than multi-word units, so they often fail to capture phrases or idioms whose meaning is not the sum of their parts (e.g., "kick the bucket").

5. Frozen after training: Once trained, the vectors do not adapt to new domains or to changes in language use (e.g., new words or shifted meanings) unless the model is retrained. This limits their performance in applications that require up-to-date language understanding.

6. No explicit morphological information: Word2Vec and GloVe treat each word as an atomic token and ignore prefixes, suffixes, and inflections, so related forms such as "run" and "running" share no parameters. fastText is the exception among static models, since its character n-grams expose some of this structure.
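Several of these limitations can be seen in a minimal sketch, assuming toy pseudo-random vectors in place of trained ones: a static embedding is just a table lookup, so context never enters the computation, and an unseen word has no entry. The `char_ngrams` helper mirrors the subword idea behind fastText (one of the embeddings in the overview): character n-grams of the word wrapped in `<` and `>` boundary markers, with n from 3 to 6 as in the fastText paper. The vocabulary, `toy_vector`, and `embed_subword` are hypothetical, and averaging n-gram vectors is a simplification (real fastText learns the n-gram vectors and hashes n-grams into buckets).

```python
from __future__ import annotations

import random

DIM = 4  # toy embedding size (assumption; trained models typically use 100-300)

def toy_vector(key: str) -> list[float]:
    """Deterministic pseudo-random vector standing in for a trained embedding."""
    rng = random.Random(key)  # string seeds give the same vector on every run
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# Hypothetical static embedding table: exactly one fixed vector per word type.
VOCAB = ["i", "sat", "by", "the", "river", "bank", "they", "robbed"]
STATIC = {word: toy_vector(word) for word in VOCAB}

def embed_static(tokens: list[str]) -> list[list[float] | None]:
    """Look up each token independently; OOV tokens map to None."""
    return [STATIC.get(tok) for tok in tokens]

def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """All character n-grams of '<word>', where '<' and '>' mark the word boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def embed_subword(word: str) -> list[float]:
    """Simplified fastText-style vector: the average of the word's n-gram vectors."""
    vecs = [toy_vector("ngram:" + g) for g in char_ngrams(word)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

s1 = "i sat by the river bank".split()
s2 = "they robbed the bank".split()
v1 = embed_static(s1)[s1.index("bank")]
v2 = embed_static(s2)[s2.index("bank")]

print(v1 == v2)                       # True: one vector for both senses of "bank"
print(embed_static(["banking"]))      # [None]: out-of-vocabulary
print(char_ngrams("bank")[:4])        # ['<ba', 'ban', 'ank', 'nk>']
print(len(embed_subword("banking")))  # 4: a vector even for an unseen word
```

The lookup ignores neighboring tokens by construction, which is exactly what contextual models change.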

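### Dynamic or Contextual Word Embeddings

Contextual models compute a word's vector from the whole sentence it appears in, so the same word receives different vectors in different contexts: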

1. ELMo: Embeddings from Language Models
2. BERT: Bidirectional Encoder Representations from Transformers
3. GPT: Generative Pre-trained Transformer


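## Caveat: Evaluating Large Language Models with BIG-bench

* BIG-bench (Beyond the Imitation Game Benchmark): a collaborative benchmark of more than 200 tasks designed to probe the capabilities of large, general-purpose language models. [REPO](https://github.com/google/BIG-bench)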



## Sequence data

* Sequence data is data whose elements are ordered, e.g., the words in a sentence, the characters in a word, the notes in a song, or the frames in a video. Even an image can be treated as a sequence of pixels.

* Unlike Bag-of-Words models, which discard word order, sequence models take the order of the words in a sentence into account. This makes them well suited to tasks such as machine translation, speech recognition, and text summarization.

* We will follow the standard conventions and model sequence data as follows:

$$x^{(i)} = (x_1^{(i)}, x_2^{(i)}, \ldots, x_T^{(i)})$$

where $T$ is the length of the sequence and $x_t^{(i)}$ is the $t^{th}$ element of the $i^{th}$ sequence in the training set. Sequences can differ in length, in which case the length is written $T^{(i)}$.
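Both the contrast with Bag-of-Words and the $x_t^{(i)}$ notation can be made concrete with a small sketch. The toy sentences, the training set `train`, and the helper `x(i, t)` are hypothetical; the helper converts the 1-based indices used in the math to Python's 0-based lists.

```python
from collections import Counter

a = "dog bites man".split()
b = "man bites dog".split()

# Bag-of-Words keeps only token counts, so the two sentences look identical.
print(Counter(a) == Counter(b))  # True
# As sequences, order is preserved and the two inputs differ.
print(a == b)                    # False

# Hypothetical training set of two sequences with different lengths T^(i).
train = [
    ["the", "cat", "sat", "down"],  # x^(1), length 4
    ["dogs", "bark"],               # x^(2), length 2
]

def x(i: int, t: int) -> str:
    """Return x_t^(i): the t-th element of the i-th sequence (both 1-based)."""
    return train[i - 1][t - 1]

print(x(1, 3))                      # sat
print([len(seq) for seq in train])  # [4, 2]
```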

## Different categories of sequence models

* one-to-one: the input is a single fixed-size value (vector or scalar), and so is the output. This is the vanilla setting with no recurrence, e.g., image classification.
* one-to-many: the input is a single value and the output is a sequence, e.g., image captioning (an image in, a sequence of words out).
* many-to-one: the input is a sequence and the output is a single value, e.g., sentiment analysis (a sentence in, a sentiment label out).
* many-to-many: both the input and the output are sequences, e.g., machine translation. Variants differ in whether the input and output are synchronized: in video classification where every frame is labeled, there is one output per input step, whereas in machine translation the output sentence is produced after reading the input and can have a different length.

<center><img src="http://karpathy.github.io/assets/rnn/diags.jpeg" width="800" height="300"></center>

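N.B.: each rectangle is a vector and arrows represent functions (e.g., a matrix multiply). In the figure, input vectors are red, output vectors are blue, and green vectors hold the RNN's state.

Source: Andrej Karpathy, [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) (2015).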
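The categories above differ mainly in how inputs are fed in and which hidden states are read out. The sketch below shows this with a scalar recurrence $h_t = \tanh(w_x x_t + w_h h_{t-1})$; the weights `W_X` and `W_H`, the function names, and the zero-input convention for one-to-many are illustrative assumptions. The unsynchronized many-to-many case (an encoder followed by a decoder, as in translation) is omitted to keep it short.

```python
import math

# Illustrative scalar weights (assumption), shared across all time steps.
W_X, W_H = 0.5, 0.8

def run_rnn(inputs: list[float], h0: float = 0.0) -> list[float]:
    """Apply h_t = tanh(W_X * x_t + W_H * h_{t-1}) and return every h_t."""
    hs, h = [], h0
    for x_t in inputs:
        h = math.tanh(W_X * x_t + W_H * h)
        hs.append(h)
    return hs

# To keep the sketch short, hidden states double as outputs (no output layer).
def one_to_one(x: float) -> float:
    return run_rnn([x])[0]  # a single step, so the recurrence is never used

def one_to_many(x: float, steps: int) -> list[float]:
    # Input only at t = 1 and zeros afterwards (one simple convention, an assumption).
    return run_rnn([x] + [0.0] * (steps - 1))

def many_to_one(xs: list[float]) -> float:
    return run_rnn(xs)[-1]  # read out only the final state

def many_to_many_synced(xs: list[float]) -> list[float]:
    return run_rnn(xs)  # one output per input step

seq = [1.0, -0.5, 2.0]
print(len(many_to_many_synced(seq)))                     # 3
print(len(one_to_many(1.0, steps=4)))                    # 4
print(many_to_one(seq) == many_to_many_synced(seq)[-1])  # True
```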



## RNN Architecture

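```mermaid
flowchart LR
    x1["x_1"] --> h1["h_1"] --> y1["y_1"]
    x2["x_2"] --> h2["h_2"] --> y2["y_2"]
    h0["h_0"] --> h1 --> h2
```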
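A minimal sketch of a vanilla (Elman) RNN forward pass, assuming the common formulation $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$ and $y_t = W_{hy} h_t + b_y$ with $h_0 = 0$. The dimensions, random initialization, and helper names (`matvec`, `VanillaRNN`, and so on) are illustrative, and plain Python lists stand in for a tensor library. The key point is that the same weights are reused at every time step.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random initialization (an illustrative choice)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class VanillaRNN:
    """Elman RNN: one set of weights shared across all time steps."""

    def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
        rng = random.Random(seed)
        self.W_xh = rand_matrix(hidden_dim, input_dim, rng)
        self.W_hh = rand_matrix(hidden_dim, hidden_dim, rng)
        self.W_hy = rand_matrix(output_dim, hidden_dim, rng)
        self.b_h = [0.0] * hidden_dim
        self.b_y = [0.0] * output_dim
        self.hidden_dim = hidden_dim

    def forward(self, xs):
        """Return (outputs y_1..y_T, hidden states h_1..h_T) for inputs x_1..x_T."""
        h = [0.0] * self.hidden_dim  # h_0 = 0
        ys, hs = [], []
        for x_t in xs:
            pre = add(add(matvec(self.W_xh, x_t), matvec(self.W_hh, h)), self.b_h)
            h = [math.tanh(a) for a in pre]
            ys.append(add(matvec(self.W_hy, h), self.b_y))
            hs.append(h)
        return ys, hs

rnn = VanillaRNN(input_dim=3, hidden_dim=5, output_dim=2)
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # T = 3 one-hot inputs
ys, hs = rnn.forward(xs)
print(len(ys), len(ys[0]), len(hs[0]))  # 3 2 5
```

A many-to-one task would keep only `ys[-1]`; training by backpropagation through time is not shown.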

## Resources (Further Reading)

* [Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks"](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
* [Jay Alammar, "The Illustrated Word2vec"](https://jalammar.github.io/illustrated-word2vec/)
* [A Visual Introduction to Machine Learning](https://jalammar.github.io/visual-interactive-guide-basics-machine-learning-algorithms/)