# Understanding Word Embeddings

Word embeddings are a technique in natural language processing (NLP) that maps words or phrases from a vocabulary to vectors of real numbers.

This method represents words in a dense vector space where semantically similar words are mapped to nearby points. Because the vectors are learned from the contexts in which words appear, they capture aspects of word meaning and of the relationships between words, giving models a numerical representation of language to compute with.

### Key Concepts of Word Embeddings

- **Semantic Similarity**: Words that are used in similar contexts are embedded closely together in the vector space. For example, "king" and "queen" are closer than "king" and "apple".
- **Dimensionality Reduction**: Embeddings replace sparse, high-dimensional vectors (such as one-hot encodings, which have one dimension per vocabulary word) with dense vectors of much lower dimension, typically a few hundred. This lowers memory and compute costs and lets a model share what it learns across related words.
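
Both ideas can be made concrete with a small sketch. The three-dimensional vectors below are hand-picked toy values (an assumption for illustration, not output from a trained model), and `cosine_similarity` and `one_hot` are helper names introduced here. The sketch compares cosine similarity on dense vectors with the same measure on one-hot vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def one_hot(word, vocab):
    """Sparse representation: one dimension per vocabulary entry."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical toy embeddings, hand-picked for illustration only.
toy_embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "apple": [0.10, 0.20, 0.95],
}
vocab = sorted(toy_embeddings)

sim_king_queen = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
sim_king_apple = cosine_similarity(toy_embeddings["king"], toy_embeddings["apple"])
# One-hot vectors of two different words never share a non-zero dimension.
sim_one_hot = cosine_similarity(one_hot("king", vocab), one_hot("queen", vocab))

print(f"dense   king/queen: {sim_king_queen:.3f}")
print(f"dense   king/apple: {sim_king_apple:.3f}")
print(f"one-hot king/queen: {sim_one_hot:.3f}")
```

The one-hot similarity is zero for any pair of distinct words, so that representation carries no notion of relatedness. The toy vocabulary here has only three words; in a real vocabulary the one-hot vectors would also be far longer than the dense ones.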

### Representative Learning Algorithms

Several algorithms are commonly used to learn word embeddings. Two of the best known are:

1. **Word2Vec**: Introduced by Mikolov et al., Word2Vec offers two architecture choices for learning embeddings: Continuous Bag of Words (CBOW) and Skip-Gram.

  - **CBOW** predicts a target word from its surrounding context words. The objective it optimizes is:

  $$
      \max_{\theta} \sum_{t=1}^{T} \log p(w_t \mid C_t; \theta)
  $$
  where $T$ is the number of tokens in the corpus, $w_t$ is the target word at position $t$, $C_t$ is the set of context words in a window around position $t$, and $\theta$ are the parameters of the model.

  - **Skip-Gram** works in the opposite direction, predicting each context word from the target word. With the same notation, its objective is:

  $$
      \max_{\theta} \sum_{t=1}^{T} \sum_{c \in C_t} \log p(c \mid w_t; \theta)
  $$

2. **GloVe (Global Vectors for Word Representation)**: GloVe is another widely used method, built on global word co-occurrence counts from the corpus. The model minimizes a weighted least-squares objective so that the dot product of two words' vectors, plus their bias terms, approximates the logarithm of their co-occurrence count:
  $$
      J(\theta) = \sum_{i,j=1}^{V} f(X_{ij}) (w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij})^2
  $$

  where $V$ is the vocabulary size, $w_i$ is the embedding of word $i$, $\tilde{w}_j$ is the context embedding of word $j$, $b_i$ and $\tilde{b}_j$ are the corresponding biases, $X_{ij}$ is the number of times word $j$ appears in the context of word $i$, and $f$ is a weighting function applied to $X_{ij}$. The function $f$ down-weights rare pairs, caps the influence of very frequent ones, and is zero when $X_{ij} = 0$, so only pairs that actually co-occur contribute to the sum.
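
The objectives above can be explored with a minimal pure-Python sketch. It shows which training examples CBOW and Skip-Gram draw from a sentence and how the GloVe loss is evaluated for a given set of vectors. The function names, the toy sentence, and the toy co-occurrence matrix are assumptions made for illustration. `W_ctx` and `b_ctx` hold separate context-side vectors and biases, as in the original GloVe formulation; the same lists can be passed for both roles. The weighting function uses the defaults reported in the GloVe paper (`x_max=100`, `alpha=0.75`). No training happens here, only data preparation and loss evaluation.

```python
import math

def training_pairs(tokens, window=2):
    """Return (cbow_pairs, skipgram_pairs) for a list of tokens.

    cbow_pairs:     (context_words, target_word), one per position
    skipgram_pairs: (target_word, context_word), one per context word
    """
    cbow, skipgram = [], []
    for t, target in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != t]
        cbow.append((context, target))
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f: grows with the count and is capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, W, W_ctx, b, b_ctx):
    """Weighted least-squares loss over pairs with a non-zero count."""
    total = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            if x_ij > 0:  # pairs that never co-occur are skipped (f(0) = 0)
                dot = sum(a * c for a, c in zip(W[i], W_ctx[j]))
                err = dot + b[i] + b_ctx[j] - math.log(x_ij)
                total += glove_weight(x_ij) * err ** 2
    return total

# Toy sentence (assumption): inspect the examples each architecture sees.
tokens = "the cat sat on the mat".split()
cbow_pairs, skipgram_pairs = training_pairs(tokens, window=1)
print(cbow_pairs[1])       # context words and target for position 1
print(skipgram_pairs[:3])  # first few (target, context) pairs

# Toy co-occurrence counts and vectors (assumption): evaluate the loss once.
X = [[0, 3], [3, 0]]
W = [[0.1, 0.2], [0.3, 0.4]]
b = [0.0, 0.0]
print(glove_loss(X, W, W, b, b))
```

In practice both models are trained with gradient-based optimization over a large corpus, and Word2Vec replaces the full softmax with approximations such as negative sampling; this sketch leaves those parts out.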

### Applications of Word Embeddings

Word embeddings serve as input representations for many NLP tasks, including text classification, sentiment analysis, and machine translation. Starting from pre-trained embeddings often improves a model's performance, especially when labeled training data is limited, because the vectors already encode similarity between words.

In the next sections, we'll explore how to leverage these embeddings for text representation and delve into advanced neural network models for NLP tasks.


### Exercise 1: Exploring Word Embeddings through Analogy Tasks

* Use a pre-trained Word2Vec or GloVe model available in libraries like `gensim` or `spaCy`.
  * e.g. `word2vec-google-news-300`

* Solve the following analogy tasks using the model:

  ```
  king - man + woman = ?
  paris - france + italy = ?
  einstein - scientist + artist = ?
  ```

* Evaluate the model's performance on the [Google analogy dataset](http://download.tensorflow.org/data/questions-words.txt).
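
Before loading a large pre-trained model, it may help to see the mechanics of an analogy query on a tiny example. The sketch below assumes hand-made three-dimensional vectors whose axes loosely stand for royalty, maleness, and femaleness. These values and the `analogy` helper are hypothetical and do not come from any trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

# Hypothetical toy vectors; axes: [royalty, maleness, femaleness].
toy_vectors = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}

# king - man + woman = ?
print(analogy("king", "man", "woman", toy_vectors))
```

With a model loaded through `gensim`, the comparable query should be `model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)`, and `model.evaluate_word_analogies(path)` can score a local copy of the analogy file. Both are `KeyedVectors` methods in recent `gensim` releases; check the signatures against the installed version.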

```python
# import gensim.downloader as api

# Load the pre-trained Word2Vec model
# model = api.load('word2vec-google-news-300')

# implement here
```

### Exercise 2: Visualizing Word Embeddings

* Choose a diverse set of words
  * For example, words related to professions, animals, and emotions.
  * [Occupation list](https://gist.github.com/wsc/1083459)
  * [Emotion list](https://raw.githubusercontent.com/imsky/wordlists/master/adjectives/emotions.txt)

* Use the same pre-trained model to extract embeddings for the selected words.

* Use a dimensionality-reduction method from a library (e.g. `sklearn.manifold.TSNE` or `sklearn.decomposition.PCA`) to reduce the embeddings to 2 dimensions: fit the model to the embeddings and transform them into a 2D space.

* Plot the 2D embeddings. Label each point with its corresponding word to see how words cluster together.
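
The reduction step can be sketched with NumPy alone. Note that this uses PCA computed via SVD, a swapped-in technique: it is not one of the methods in `sklearn.manifold`. The word list and the random 300-dimensional vectors are placeholders standing in for embeddings from the pre-trained model, so the resulting layout carries no meaning here.

```python
import numpy as np

def pca_2d(embeddings):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Placeholder data (assumption): random vectors instead of real embeddings.
rng = np.random.default_rng(0)
words = ["doctor", "teacher", "cat", "dog", "happy", "sad"]
vectors = rng.normal(size=(len(words), 300))

coords = pca_2d(vectors)  # shape: (number of words, 2)
for word, (x, y) in zip(words, coords):
    print(f"{word:8s} {x:8.3f} {y:8.3f}")
```

The `coords` array can then be passed to a plotting library, for example `plt.scatter(coords[:, 0], coords[:, 1])` followed by `plt.annotate(word, (x, y))` for each point with `matplotlib`. If `sklearn.manifold.TSNE` is used instead, its `perplexity` parameter needs to be smaller than the number of words being plotted.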

```python
# implement here
```