# Natural Language Processing
This notebook contains my notes from the *Natural Language Processing* talk by Brian Sletten at *UberConf 2019*. I've done ML on the vision side, but don't know much about NLP or even linguistics in general. I hoped this talk would explain some of the basic principles in NLP.

### Introduction
Some common NLP tasks:
* Search and retrieval
* Entity and relationship extraction
* Linguistic structure
* Translation
* Generative content
* Question answering

Brian gave a list of common NLP frameworks:
* GATE
* LingPipe
* Apache OpenNLP
* UIMA
* Stanford Parser
* Mallet

The above libraries are older and mostly rely on rule-based or classical statistical techniques rather than modern neural-network models.

From here, Brian transitioned into talking about several NLP models.

### Vector Space Model
This is an older model used for search and retrieval; I have seen parts of it applied to vision algorithms as well. It involves coming up with a *vector representation* of each document, then applying a *similarity measure* to the vectors to decide how well they match.

This works well for search: the query is also turned into a vector, and the documents whose vectors are most similar to it are returned as the best matches.

#### Bag of words model
This model represents each document as a single vector with one entry per vocabulary word, tracking the occurrences of that word in the document. The following are some variations you can use:
* 0/1 present/absent
* Word count in the document
* Term frequency (TF)
* Inverse document frequency (IDF)
* TF-IDF combination

This approach throws away information, namely the order in which the words occur.
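The variations above can be sketched in a few lines of plain Python. This is a minimal sketch, not code from the talk: the whitespace tokenizer, the function names, and the exact formulas (TF = count / document length, IDF = log(N / df)) are my assumptions; libraries such as scikit-learn use slightly different, smoothed variants.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace
    return text.lower().split()

def build_vocab(docs):
    # Sorted list of every distinct word in the collection
    return sorted({w for d in docs for w in tokenize(d)})

def count_vector(doc, vocab):
    # Raw word counts, one slot per vocabulary word; word order is discarded
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocab]

def tfidf_vectors(docs):
    vocab = build_vocab(docs)
    n = len(docs)
    token_lists = [tokenize(d) for d in docs]
    token_sets = [set(toks) for toks in token_lists]
    # Document frequency: number of documents containing each word
    df = {w: sum(1 for s in token_sets if w in s) for w in vocab}
    # IDF = log(N / df): a word that appears in every document gets weight 0
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # TF = count / document length, then weight by IDF
        vectors.append([counts[w] / len(toks) * idf[w] for w in vocab])
    return vocab, vectors

docs = ["john likes movies", "mary likes movies too", "john talks"]
vocab, vectors = tfidf_vectors(docs)
print(vocab)  # ['john', 'likes', 'mary', 'movies', 'talks', 'too']
print(count_vector("john likes movies", vocab))  # [1, 1, 0, 1, 0, 0]
# Word order is lost: the reordered sentence gets the same vector
print(count_vector("movies likes john", vocab))  # [1, 1, 0, 1, 0, 0]
```

The last two lines show the information loss described above: two different sentences map to identical vectors.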

#### N-gram model
Like a bag of words, but the dictionary consists of sequences of *n* consecutive words (e.g. bigrams such as "john likes" and "mary talks") instead of individual words. This preserves some local word order.
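Extracting n-grams amounts to sliding a window of *n* tokens over the text. A short sketch, assuming whitespace tokenization; the function names are illustrative, not from any particular library:

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of length n over the tokens; each window is one n-gram
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n=2):
    # Count how often each n-gram occurs (a bag of n-grams)
    return Counter(ngrams(text.lower().split(), n))

counts = ngram_counts("John likes Mary and Mary talks", n=2)
print(counts[("john", "likes")])  # 1
# Unlike a plain bag of words, word order now changes the result
print(ngram_counts("john likes mary") == ngram_counts("mary likes john"))  # False
```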

#### Similarity Measures
Once you have the vectors defined, you can use a similarity measure to compare the vectors, such as the following:
* Dot product
* Cosine similarity
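Both measures are easy to compute directly. The sketch below ranks documents against a query the way the search use case above describes. The toy count vectors and the "return 0.0 for an all-zero vector" convention are my own assumptions:

```python
import math

def dot(u, v):
    # Sum of element-wise products; grows with vector length
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Dot product divided by both lengths, so only direction matters:
    # 1.0 = same direction, 0.0 = no shared nonzero entries
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for all-zero vectors (assumption)
    return dot(u, v) / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    # Document indices ordered from most to least similar to the query
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Hypothetical count vectors over the vocabulary [cat, dog, fish]
docs = [[2, 0, 0], [1, 1, 0], [0, 0, 3]]
query = [1, 0, 0]  # the query "cat"
print(rank(query, docs))  # [0, 1, 2]
```

Cosine similarity is often preferred over the raw dot product because a long document with many repeated words would otherwise score higher simply for being long.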

A downside of the vector space model is that it treats words as isolated indices. It has no notion of relationships between concepts (an ontology), such as "a cat is an animal, so searching for animal should potentially return results with cats in them".

### Word Embeddings
We often want to be able to group words that have similar meanings. Brian showed an example using the Word2Vec model and briefly covered how its shallow (two-layer) neural network is trained.

He also went through parts of the following tutorials:

* http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
* https://deeplearning4j.org/docs/latest/deeplearning4j-nlp-word2vec
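The first tutorial covers the skip-gram variant of Word2Vec, where the network learns to predict the words around a given center word. The sketch below only shows how the (center, context) training pairs are generated from a sentence; it does not train the network. The window size of 2 is an arbitrary choice, and real Word2Vec implementations add refinements not shown here (such as subsampling frequent words and negative sampling).

```python
def skipgram_pairs(tokens, window=2):
    # Pair each center word with every word at most `window` positions away
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox jumps".split(), window=2)
print(pairs[:3])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]
print(len(pairs))  # 14
```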

#### Visualizing Embeddings
Brian showed Word2Vec's high-dimensional word embeddings visualized with t-SNE. He also talked about the relationships between words that show up when you do math operations on the vectors (King - Man + Woman ≈ Queen).
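The analogy arithmetic can be illustrated with hand-made vectors. The 3-D "embeddings" below are invented for illustration, with dimensions loosely meaning (royalty, male, female), so the answer is correct by construction; real Word2Vec vectors are learned from a corpus and have far more dimensions. Excluding the input words from the candidates is a common convention, assumed here.

```python
import math

# Toy vectors (hypothetical, not real Word2Vec output)
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
    "apple": [0.1, 0.2, 0.2],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, embeddings):
    # Compute a - b + c, then return the word whose vector is closest
    # to that point by cosine similarity (ignoring the three input words)
    target = add(sub(embeddings[a], embeddings[b]), embeddings[c])
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("king", "man", "woman", EMBEDDINGS))  # queen
```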

### Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes' theorem. An advantage when applying it to NLP is that it doesn't require much training data to be effective, and it is often used for document classification. The "naive" part comes from its assumption that the features (e.g. words) are conditionally independent of one another given the class.

Brian went through an example of training a spam classifier in R using Naive Bayes.
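Brian's example was in R; the sketch below is my own Python version of the same idea, not his code. It is a standard multinomial Naive Bayes with add-one (Laplace) smoothing, and the tiny training set is made up for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-split words, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def _log_score(self, words, label):
        # Unnormalized log posterior: log P(label) + sum of log P(word | label).
        # Summing per-word terms is where the "naive" independence assumption enters.
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        total_words = sum(self.word_counts[label].values())
        for w in words:
            # Add-one smoothing so an unseen word doesn't zero out the probability
            score += math.log((self.word_counts[label][w] + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, doc):
        words = doc.lower().split()
        return max(self.class_counts, key=lambda c: self._log_score(words, c))

# Hypothetical toy training data
docs = ["win money now", "cheap money offer", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("win cheap money"))  # spam
print(model.predict("lunch meeting"))    # ham
```

Working in log space avoids numerical underflow when multiplying many small word probabilities.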

### Takeaways
* Learn some basic linguistics; he used a lot of terms I didn't understand.
* Go through some of those tutorials myself
* Try out some of those older open-source frameworks