Word Embedding Variations
-----

How do we embed words that have multiple meanings?
-------

"bank" is related to money or a river

"burns" is a person or a verb

The straightforward solution is to make two tokens: $burns_{PERSON}$ and $burns_{VERB}$.

Each one will be tagged with part-of-speech (POS).

Let's play with a demo - https://demos.explosion.ai/sense2vec/

NLP Protip
------

Keep everything lowercase, but add a special character to mark the sense:

burns is the verb

^burns is the person
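A minimal sketch of this convention, assuming the tokens have already been tagged upstream (for example by a POS tagger or a named-entity recognizer). The function name `mark_senses` and the tag names are hypothetical, chosen only for illustration.

```python
def mark_senses(tagged_tokens, entity_tags=("PERSON",)):
    """Lowercase every token and prefix entity tokens with '^'.

    tagged_tokens: iterable of (token, tag) pairs from an upstream tagger.
    entity_tags: tags that receive the '^' marker (an assumption of this sketch).
    """
    marked = []
    for token, tag in tagged_tokens:
        token = token.lower()
        marked.append("^" + token if tag in entity_tags else token)
    return marked


# Hand-tagged toy input; a real pipeline would get the tags from a tagger.
sentence = [("Burns", "PERSON"), ("burns", "VERB"), ("the", "DET"), ("toast", "NOUN")]
tokens = mark_senses(sentence)
```

After this step the two senses are distinct vocabulary entries, so an embedding model trained on the output learns a separate vector for each.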

By the end of this session, you should be able to:
----

- Describe word embedding extensions:  
    - Dependency-Based Word Embeddings
    - Machine Translation
- Explain how word2vec can be extended to paragraphs and documents (doc2vec)
- Identify applications for cutting-edge algorithms such as Word Mover's Distance and Thought Vectors
- Understand how to vectorize __everything__

Check for understanding
---

What is the goal of word embeddings, in plain English?

Create a dense vector representation of words that models semantic meaning based on context.

Model the latent structure of language using literal string location.


Dependency-Based Word Embeddings
---

<center><img src="images/parse.png" width="700"/></center>

An alternative to the bag-of-words approach is to derive contexts based on the syntactic relations the word participates in.

Contexts can extend beyond the skip-gram window and are chosen by syntactic relation, not just token distance.
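To make the difference concrete, here is a small sketch that builds both kinds of context for one sentence. The dependency arcs are written by hand for illustration (a real pipeline would take them from a parser), and the function names are hypothetical. Following the convention in the paper cited as the source for this section, a word's modifiers are tagged with the relation label and its head is tagged with the inverse label.

```python
from collections import defaultdict

tokens = ["australian", "scientist", "discovers", "star", "with", "telescope"]

# Hand-written arcs (head, label, modifier); a real pipeline would get these
# from a dependency parser. The preposition "with" is folded into the label.
arcs = [
    ("scientist", "amod", "australian"),
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("discovers", "prep_with", "telescope"),
]


def window_contexts(tokens, k=2):
    """Bag-of-words contexts: the k tokens on each side of the target."""
    contexts = {}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        contexts[word] = [tokens[j] for j in range(lo, hi) if j != i]
    return contexts


def dependency_contexts(arcs):
    """Syntactic contexts: modifiers with their label, heads with the inverse label."""
    contexts = defaultdict(list)
    for head, label, modifier in arcs:
        contexts[head].append(f"{modifier}/{label}")
        contexts[modifier].append(f"{head}/{label}-1")
    return dict(contexts)


bow = window_contexts(tokens)
deps = dependency_contexts(arcs)
```

With a window of 2, `bow["discovers"]` misses "telescope" because it is three tokens away, while `deps["discovers"]` includes it through the `prep_with` relation.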

Results
----

<center><img src="images/dependency_results.png" width="700"/></center>

__How are bag-of-words (BoW) and dependency (deps) contexts different?__

BoW generates counties or cities in Florida (meronyms: parts of the whole).

Dependency generates other states, the "brothers and sisters" of Florida

(cohyponyms: words that share the same hypernym).

[Source](http://www.aclweb.org/anthology/P14-2050.pdf)

Machine Translation
---

<center><img src="images/machine_translation.png" width="700"/></center>

Translation between languages can be approximated by a linear transformation (a rotation and scaling) of the vector space.

How are Machine Translations learned?
----

The transformation matrix can be learned by bootstrapping from a small, manually labeled sample and then extended to the entire language.

Steps:

1. Create a word embedding in both languages
2. Manually specify word pairs (typically simple, concrete nouns)
3. Find the translation matrix
4. Apply the translation matrix across the entire language (including idiomatic expressions)
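The steps above can be sketched with NumPy on synthetic data. Everything here is an assumption made for illustration: the embeddings are random, a hidden matrix `true_map` stands in for the real relationship between the two languages, and the seed dictionary is simply the first `n_pairs` rows. Ordinary least squares is one straightforward way to find the matrix, not necessarily the method used in any particular system.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs, vocab_size = 5, 50, 200

# Step 1 (stand-in): random source-language embeddings, one row per word.
source_vectors = rng.normal(size=(vocab_size, dim))

# A hidden linear map plays the role of the relationship between languages.
true_map = rng.normal(size=(dim, dim))
target_vectors = source_vectors @ true_map

# Step 2: a small seed dictionary of aligned word pairs (the first n_pairs rows).
X, Y = source_vectors[:n_pairs], target_vectors[:n_pairs]

# Step 3: find W minimizing ||X W - Y||^2 with ordinary least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 4: apply W to every source word, including those outside the seed set.
projected = source_vectors @ W
max_error = np.abs(projected - target_vectors).max()
```

With real embeddings the fit is only approximate, so a translation is usually read off as the nearest target-language vector to the projected point.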

Doc2Vec, the most powerful extension of word2vec
---

<center><img src="http://img5.picload.org/image/paagccr/doc2vec.png" width="700"/></center>

Naive doc2vec
-----

<center><img src="images/vectors.png" width="300"/></center>



<center><img src="images/naive_doc.png" width="700"/></center>

Doc2vec (aka paragraph2vec or sentence embeddings) 
-----

Extends the word2vec algorithm to larger blocks of text, such as sentences, paragraphs, or entire documents.

<center><img src="images/overview_word.png" width="700"/></center>


<center><img src="images/overview_paragraph.png" width="700"/></center>

Every paragraph is mapped to a unique vector, represented by a column in matrix D, and every word is also mapped to a unique vector, represented by a column in matrix W.

Both the paragraph vector and the word vectors are used to predict the next word in a context.

Each additional context does not have to be a fixed length (because it is just a pointer!).

This adds parameters, but the updates are sparse and therefore still efficient.
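A rough sketch of the prediction step, assuming the paragraph vector and the context word vectors are averaged before a softmax layer (concatenation is the other common choice). `D` and `W` follow the naming above; the softmax parameters `U` and `b`, the sizes, and the function name are hypothetical. This is a forward pass only, with random weights and no training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paragraphs, vocab_size, dim = 3, 10, 4

D = rng.normal(size=(dim, n_paragraphs))  # one column per paragraph
W = rng.normal(size=(dim, vocab_size))    # one column per word
U = rng.normal(size=(vocab_size, dim))    # softmax weights (hypothetical name)
b = np.zeros(vocab_size)                  # softmax bias (hypothetical name)


def predict_next_word(paragraph_id, context_word_ids):
    """Return a probability distribution over the vocabulary for the next word."""
    # Columns are looked up by index, so only these columns would be updated
    # during training; that is why the updates stay sparse.
    vectors = [D[:, paragraph_id]] + [W[:, i] for i in context_word_ids]
    hidden = np.mean(vectors, axis=0)
    logits = U @ hidden + b
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()


probs = predict_next_word(paragraph_id=1, context_word_ids=[2, 5, 7])
```

The paragraph ID acts like an extra context word that is shared by every window drawn from the same paragraph.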

We have a MAJOR problem: TOO MANY CRAFT BEERS! 🍻
----

<center><img src="https://therooster.com/sites/default/files/styles/hero/public/beertaps-rooster.png?itok=gKvs_C7H" width="700"/></center>

Solution: Descri.beer using doc2vec
---

<img src="https://timebusinessblog.files.wordpress.com/2013/03/85632599-e1364519588629.jpg?w=360&h=240&crop=1" style="width: 400px;"/>

How do we make sense of 1.6M beer reviews?
-----

<center><img src="images/beer_space.jpg" width="700"/></center>



[Demo](http://descri.beer/)  
[Source](http://www.slideshare.net/BenEverson/describeer-demo)

Concept level document similarity
---

> The Sicilian gelato was extremely rich.

vs.

> The Italian ice-cream was very velvety.

The statements reference the __same__ idea but share __no__ words.

Word Mover’s Distance (WMD)
----

<center><img src="images/wmd_illustration_1.png" width="700"/></center>

Represent text documents as a weighted point cloud of embedded words. 

The distance between two text documents A and B is the minimum cumulative distance that words from document A need to travel to match exactly the point cloud of document B.
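A small sketch of this definition for a special case: two documents with the same number of words, each word carrying equal weight. In that case the cheapest flow is a one-to-one matching, so it can be found by brute force over permutations. The 2-D embeddings below are invented for illustration, stop words are assumed to have been removed, and the function name is hypothetical. Real implementations solve a general transportation problem instead.

```python
from itertools import permutations

import numpy as np

# Hypothetical 2-D embeddings, invented so that related words sit close together.
embeddings = {
    "sicilian": (1.0, 0.1), "italian": (1.0, 0.0),
    "gelato": (0.0, 1.0), "ice-cream": (0.1, 1.0),
    "extremely": (-1.0, 0.0), "very": (-1.0, 0.1),
    "rich": (0.0, -1.0), "velvety": (0.1, -1.0),
    "stock": (3.0, 3.0), "market": (3.0, 3.2),
    "fell": (3.2, 3.0), "sharply": (3.2, 3.2),
}


def wmd_equal_length(doc_a, doc_b):
    """Exact WMD for two documents with equally many, uniformly weighted words."""
    if len(doc_a) != len(doc_b):
        raise ValueError("this sketch only handles documents of equal length")
    a = np.array([embeddings[w] for w in doc_a])
    b = np.array([embeddings[w] for w in doc_b])
    # Pairwise Euclidean distances between every word in A and every word in B.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    n = len(doc_a)
    # Try every one-to-one matching and keep the cheapest average travel distance.
    return min(
        sum(cost[i, j] for i, j in enumerate(perm)) / n
        for perm in permutations(range(n))
    )


gelato = ["sicilian", "gelato", "extremely", "rich"]
ice_cream = ["italian", "ice-cream", "very", "velvety"]
finance = ["stock", "market", "fell", "sharply"]

near = wmd_equal_length(gelato, ice_cream)
far = wmd_equal_length(gelato, finance)
```

The two dessert sentences share no words, yet `near` is much smaller than `far` because each word has a close neighbor in the other document. The brute-force search grows factorially with document length, so it is only practical for toy inputs.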

Earth mover’s distance metric (EMD)
-----

Word Mover’s Distance (WMD) is a special case of the [earth mover’s distance metric (EMD)](https://en.wikipedia.org/wiki/Earth_mover%27s_distance).

EMD is a method to evaluate __dissimilarity__ between two multi-dimensional distributions in a feature space. The EMD 'lifts' this distance from individual features to full distributions.
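As a tiny illustration of "lifting" a ground distance to whole distributions, consider the one-dimensional case: two histograms over the same unit-spaced bins with equal total mass. There the EMD reduces to the sum of absolute differences between the cumulative sums. The function name is hypothetical and this shortcut applies only to that 1-D setting.

```python
import numpy as np


def emd_1d(p, q):
    """EMD between two histograms over the same unit-spaced bins."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if not np.isclose(p.sum(), q.sum()):
        raise ValueError("both histograms must hold the same total mass")
    # The running surplus at each bin boundary is the mass that must cross it.
    return np.abs(np.cumsum(p - q)).sum()


all_left = [1.0, 0.0, 0.0]
all_middle = [0.0, 1.0, 0.0]
all_right = [0.0, 0.0, 1.0]

one_step = emd_1d(all_left, all_middle)
two_steps = emd_1d(all_left, all_right)
```

A bin-by-bin comparison would score both pairs the same, since each differs in exactly two bins. EMD tells them apart because it accounts for how far the mass has to move.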

[Deep dive on EMD](http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/RUBNER/emd.htm)

Word Mover's Distance Example
----

<center><img src="images/WMD_worked_example.png" width="700"/></center>

Word Mover's Distance Performance
----

In the paper linked below, WMD achieved state-of-the-art k-nearest neighbors (kNN) classification accuracy, but it was the slowest of the metrics compared.

(Another example of speed / accuracy trade-off)

[Source: From Word Embeddings To Document Distances](http://jmlr.org/proceedings/papers/v37/kusnerb15.pdf)

[Application to Data Science](http://tech.opentable.com/2015/08/11/navigating-themes-in-restaurant-reviews-with-word-movers-distance/)

everything2vec
----

<center><img src="images/all_the_things.jpg" width="700"/></center>

Notable Vectorizations
-----

| Name | Embedding  | 
|:-------:|:------:| 
| [Char2Vec](http://arxiv.org/abs/1508.06615) | Character |
| [Word2Vec](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) | Word | 
| [GloVe](http://www-nlp.stanford.edu/pubs/glove.pdf) | Word | 
| [Doc2Vec](https://cs.stanford.edu/~quocle/paragraph_vector.pdf) | Sections of text |
| [Gene2Vec](https://davidcox143.github.io/Gene2vec/) | Functional unit of heredity |
| [Item2Vec](https://arxiv.org/abs/1603.04259) | Things to buy |
| [Image2Vec](https://arxiv.org/abs/1507.08818) | Image |
| [Video2Vec](https://www.dropbox.com/s/m99k5md8461xi0s/ICIP_Paper_Revised.pdf) | Video |


[Source](http://datascienceassn.org/content/table-xx2vec-algorithms)

Emojis have meaning based on order and context
------
<center><img src="images/emjois.png" width="700"/></center>

emoji2vec
----

<center><img src="https://s3.amazonaws.com/instagram-static/engineering-blog/emoji-hashtags/tsne_map_tight.png" width="700"/></center>

What would be a business use for emoji2vec?
------

Trust & Safety

[Image](https://s3.amazonaws.com/instagram-static/engineering-blog/emoji-hashtags/tsne_map_tight.png)  
[Source](http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji)

Embed Everything in the Space
----

<center><img src="images/airbnb.png" width="700"/></center>

Thought Vectors
---

<img src="https://cdn-images-1.medium.com/max/2000/1*KYLrhDHqAAdQaJiN1G4ytA.jpeg" style="width: 400px;"/>

Geoffrey Hinton's (Google) "top secret" new algorithm.

Instead of embedding words or documents in vector space, embed thoughts in vector space. Their features will represent how each thought relates to other thoughts. 

When Google farts 💨, the rest of the world 💩
------

It hasn't been released so it is mostly speculation. Keep your eye out for it.

[Thought2vec teaser](https://wtvox.com/robotics/google-is-working-on-a-new-algorithm-thought-vectors)  
[General introduction](http://deeplearning4j.org/thoughtvectors.html)<br>
[Skip-Thought Vectors paper](https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf)

Summary
---

- Word embeddings offer another perspective on machine translation: a rotation and scaling of the embedding space.
- Other semantic meanings can be captured by using dependency parsing as context.
- Longer pieces of text can also be embedded into the same space as words (i.e., doc2vec).
- Given the properties of word2vec (e.g., large input, straightforward training, and vector output), it can be applied to a variety of problems.
    - emojis
    - thoughts
    - `<insert your idea here>`

<br>
<br> 
<br>

----