# Designing Data-Intensive Applications Notes

## Chapter 3: Text Embeddings

### Embeddings
* Embeddings are procedures for converting input data into vector representations.
    * A vector is an ordered list of numbers, which can be pictured as an arrow in space

#### Embeddings by Direct Computation: Representational 
* One-hot Embedding:
    * Given a lexicon of N entries, every word is represented as a vector of N-1 zeroes and a single one at a position unique to that word
    * Ideally, related words should lie close to each other
    * One-hot vectors fail at this from a linguistic point of view: any two distinct words differ in exactly two positions, so all words are equally far apart
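The equidistance problem with one-hot vectors can be checked directly. The following is a minimal sketch in plain Python; the three-word lexicon and the helper names `one_hot_vectors` and `euclidean_distance` are made up for illustration.

```python
import math


def one_hot_vectors(lexicon):
    """Map each word to a vector of zeroes with a single 1 at its own index."""
    size = len(lexicon)
    vectors = {}
    for index, word in enumerate(lexicon):
        vector = [0] * size
        vector[index] = 1
        vectors[word] = vector
    return vectors


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


lexicon = ["cat", "dog", "car"]  # hypothetical toy lexicon
vectors = one_hot_vectors(lexicon)

# "cat" is no closer to "dog" than it is to "car".
cat_dog = euclidean_distance(vectors["cat"], vectors["dog"])
cat_car = euclidean_distance(vectors["cat"], vectors["car"])
```

Both distances come out identical, which is why one-hot vectors carry no information about how words relate to each other.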

#### Learning to embed: procedural embeddings
* Keras Embeddings:
    * Trainable layers that deploy a matrix of weights that is optimized during training
    * They implicitly optimize the representations they create while a loss function is being minimized
        * Base criterion: maximize the distinctiveness of the vector representations
    * Pipeline: text -> words (via tokenization) -> words as integers (via CountVectorizer) -> documents as integer vectors -> Embedding layer -> word embeddings
    * The result can be easily visualized using t-SNE, which maps high-dimensional vectors to a low-dimensional (typically 2D) space
    * Untrained embeddings are, in general, randomly distributed
        * But you can train the encodings to be finetuned to encode semantic/lexical relationships!
        * Find the optimal embeddings for the words such that the provided document labeling is learned with maximum accuracy
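The pipeline above, up to the untrained embedding lookup, can be sketched without any framework. This is not the Keras API: it is a plain-Python illustration of what an embedding layer does before training, namely look up one row of a randomly initialized weight matrix per integer word id. The toy corpus, the helper names, and the choice of id 0 for padding are all assumptions made for this example.

```python
import random


def build_vocabulary(documents):
    """Assign each distinct token an integer id; 0 is reserved for padding."""
    vocabulary = {}
    for document in documents:
        for token in document.lower().split():
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary) + 1
    return vocabulary


def encode(document, vocabulary, length):
    """Turn a document into a fixed-length list of integer ids."""
    ids = [vocabulary[token] for token in document.lower().split()]
    ids = ids[:length]
    return ids + [0] * (length - len(ids))


def init_embedding_matrix(vocabulary_size, dimension, seed=0):
    """One row of small random weights per integer id (untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dimension)]
            for _ in range(vocabulary_size)]


documents = ["the movie was great", "the plot was dull"]  # hypothetical corpus
vocabulary = build_vocabulary(documents)
encoded = [encode(d, vocabulary, length=5) for d in documents]

# +1 because id 0 (padding) also needs a row in the matrix.
matrix = init_embedding_matrix(len(vocabulary) + 1, dimension=4)

# Embedding lookup: each integer id selects one row of the matrix.
embedded = [[matrix[i] for i in ids] for ids in encoded]
```

During training, a framework would adjust the rows of `matrix` by gradient descent so that the document labels are predicted as accurately as possible; that step is deliberately left out here.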

### From words to vectors: Word2Vec
* Word2Vec takes word context information into account for establishing relations between words
* The Keras embedding was trained to maximize accuracy on sentiment labels, not to establish relations between words that occur in similar contexts.
* Main functions:
    * Predicting words from contexts (the CBOW variant)
    * Predicting contexts from words (the skip-gram variant)
* In order to set up context prediction, we need both positive and negative contexts
    * Negative sampling: gather positive examples from the text and combine them with randomly generated negative examples
* In order to get truly meaningful and convincing results, the model should be trained on a large amount of data for many iterations
* You can also add embeddings to other embeddings!
    * Stanford University:
        * Word2Vec off the shelf
        * 6 billion words with a 400K-word vocabulary
    * GloVe:
        * 2014 version of English Wikipedia
* Task-specific embeddings can be superior to general pre-trained embeddings under specific circumstances
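The negative sampling setup mentioned above can be illustrated by generating the training examples themselves. This is a simplified sketch, not the reference Word2Vec implementation: the function name `skipgram_triples` and its parameters are hypothetical, and negatives are drawn uniformly from the vocabulary, whereas real implementations typically sample by word frequency. A random negative may occasionally coincide with a true context word; the sketch ignores that.

```python
import random


def skipgram_triples(tokens, window, negatives_per_positive, seed=0):
    """Return (target, context, label) triples.

    Label 1 marks a pair observed within the window in the text;
    label 0 marks a pair whose context word was sampled at random.
    """
    rng = random.Random(seed)
    vocabulary = sorted(set(tokens))
    triples = []
    for position, target in enumerate(tokens):
        start = max(0, position - window)
        end = min(len(tokens), position + window + 1)
        for context_position in range(start, end):
            if context_position == position:
                continue  # a word is not its own context
            triples.append((target, tokens[context_position], 1))
            for _ in range(negatives_per_positive):
                triples.append((target, rng.choice(vocabulary), 0))
    return triples


tokens = "the quick brown fox jumps".split()
triples = skipgram_triples(tokens, window=1, negatives_per_positive=2)
positives = [t for t in triples if t[2] == 1]
negatives = [t for t in triples if t[2] == 0]
```

A binary classifier trained to separate the label-1 triples from the label-0 triples learns word vectors as a side effect, which is the idea behind predicting contexts from words.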

### From documents to vectors: Doc2Vec
* Embeddings are not just restricted to the word level!
* You can create a sliding window over words in a doc to generate n-grams of a pre-specified size
    * The document vector is co-trained with the words in the document/paragraph!
* Functionally works in a very similar fashion to Word2Vec
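The sliding window over a document can be sketched as follows. The pairing of a document id with each window reflects one common formulation, in which the document vector and the context words together predict the next word; the function name, the tuple layout, and the id `"doc-0"` are assumptions made for illustration, not the API of any library.

```python
def doc2vec_examples(doc_id, tokens, size):
    """Slide a window of `size` tokens over the document.

    Each example is (doc_id, context_words, target_word): the first
    size-1 tokens of the window are the context, the last is the target.
    """
    examples = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        examples.append((doc_id, tuple(window[:-1]), window[-1]))
    return examples


tokens = "embeddings are not restricted to words".split()
examples = doc2vec_examples("doc-0", tokens, size=3)
```

Because the same `doc_id` appears in every example drawn from a document, its vector is updated alongside the word vectors throughout training.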

### Summary:
* Embeddings can be optimized while optimizing for an objective, such as training a sentiment classifier.
* Pre-trained embeddings are not always beneficial
* Sometimes it makes sense to use an on-the-fly embedding that is specific to and optimized for the NLP task at hand
* Two examples of embedding algorithms are Word2Vec and Doc2Vec