# Chapter 3: Text Embeddings

## 3.1 Embeddings

Embeddings are procedures that turn input data into vectors. Recall from Chapter 1 that there are two kinds:
* Representational embeddings are designed by hand and are thus human-interpretable.
* Operational embeddings are learned from data.

In [28]:
import numpy as np

### 3.1.1 Embed by Hand: Representational Embeddings
**One-hot embeddings**
Every element of a one-hot vector is zero, except for a single 1 at the index that identifies the word.
<img src="images/oneHot.png"/>
Given the embedding above:
* The word _cat_ is represented by the vector $[0, 1, 0, 0, 0, 0]$.
* The word _sat_ is represented by the vector $[0, 0, 1, 0, 0, 0]$.

In [32]:
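def one_hot(lexicon, word):
    '''
    Build a one-hot vector for a word, using the words of a sentence as the lexicon.

    :param lexicon: a sentence string; its space-separated words define the vector positions
    :param word: the word to encode, as a string
    :return: np.array with 1 at each position where the lexicon contains the word, 0 elsewhere
    '''
    positions = np.array(lexicon.split(' '))  # split the sentence into words
    return (positions == word).astype(int)  # cast True/False to integers 1/0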

lex = 'want to go swim at the beach'

one_hot(lex, 'beach')

array([0, 0, 0, 0, 0, 0, 1])
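
One caveat: because `one_hot` compares the target against every word in the sentence, a lexicon that repeats a word (such as "the" in "the cat sat on the mat") produces more than one non-zero entry. The following is a minimal sketch of one way around this, assuming the sentence is first deduplicated into a vocabulary. The helper names `build_vocab` and `one_hot_unique` are illustrative and not part of the code above.

```python
import numpy as np

def build_vocab(sentence):
    # Hypothetical helper: keep each word once, in order of first appearance
    return list(dict.fromkeys(sentence.split()))

def one_hot_unique(vocab, word):
    # Hypothetical helper: row i of the identity matrix is the one-hot vector for vocab[i]
    # Note: vocab.index raises ValueError if the word is not in the vocabulary
    return np.eye(len(vocab), dtype=int)[vocab.index(word)]

vocab = build_vocab('the cat sat on the mat')
print(vocab)
print(one_hot_unique(vocab, 'the'))
```

With this variant the vector length equals the number of distinct words, so each word maps to exactly one index.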

Ideally, related words lie close to each other in vector space. One-hot embeddings cannot capture this: any two distinct words are exactly the same distance apart, as the comparison below shows.

In [34]:
a = one_hot(lex, 'beach')
b = one_hot(lex, 'swim')
c = one_hot(lex, 'want')
print(f'distance between "beach" and "swim": {np.linalg.norm(a - b)}')
print(f'distance between "beach" and "want": {np.linalg.norm(a - c)}')


distance between "beach" and "swim": 1.4142135623730951
distance between "beach" and "want": 1.4142135623730951
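
Both distances equal $\sqrt{2}$, and this is no coincidence: two distinct one-hot vectors differ in exactly two positions. The sketch below checks this for every pair of words in the same example sentence, assuming the sentence has no repeated words. The variable names are illustrative.

```python
import numpy as np

words = 'want to go swim at the beach'.split()
vectors = np.eye(len(words), dtype=int)  # row i is the one-hot vector for words[i]

# Pairwise Euclidean distances via broadcasting; the result has shape (7, 7)
diffs = vectors[:, None, :] - vectors[None, :, :]
distances = np.linalg.norm(diffs, axis=-1)

# Select every entry off the diagonal, i.e. every pair of distinct words
off_diagonal = distances[~np.eye(len(words), dtype=bool)]
print(np.allclose(off_diagonal, np.sqrt(2)))

# Dot products between distinct one-hot vectors are zero,
# so their cosine similarity is zero as well
gram = vectors @ vectors.T
print(np.array_equal(gram, np.eye(len(words), dtype=int)))
```

Since every pair is equally far apart and mutually orthogonal, a one-hot embedding carries no information about which words are related.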


### 3.1.2 Learning to Embed: Operational Embeddings

## 3.2 From Words to Vectors: word2vec

## 3.3 From Documents to Vectors: doc2vec

## 3.4 Summary