----
Now that we have word vectors, what can we do?
----

Math with words!

<img src="http://rlv.zcache.com/math_is_awesome_poster-re24bd4726be24b82acc1d83fe7b4a8e4_cru_8byvr_512.jpg" style="width: 400px;"/>

----
Types of Word Math
----

1. Distance
2. Arithmetic
3. Clustering

---
1. Distance
---
<br>
<img src="http://blog.krecan.net/wp-content/family.png" style="width: 300px;"/>

The relationships between words can be encoded as distances in the vector space.

Words that are related will be closer together than unrelated words.

----
Ways to measure distance
----

<img src="http://i1.wp.com/dataaspirant.com/wp-content/uploads/2015/04/euclidean.png?w=600" style="width: 400px;"/>

<img src="http://i2.wp.com/dataaspirant.com/wp-content/uploads/2015/04/manhattan.png?w=600" style="width: 400px;"/>

<img src="http://i2.wp.com/dataaspirant.com/wp-content/uploads/2015/04/cosine.png?resize=610%2C468" style="width: 400px;"/>

[Read more here](http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/)
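A minimal sketch of the three measures pictured above, assuming NumPy arrays as inputs. The function names and the two example vectors are made up for illustration. Note that the two vectors point in the same direction but differ in length: the Euclidean and Manhattan distances are nonzero, while the cosine similarity treats them as identical.

```python
import numpy as np

def euclidean_distance(v1, v2):
    """Straight-line (L2) distance between two vectors."""
    return float(np.linalg.norm(v1 - v2))

def manhattan_distance(v1, v2):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.abs(v1 - v2).sum())

def cosine_similarity(v1, v2):
    """Cosine of the angle between two vectors (ignores their lengths)."""
    return float(v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Two made-up vectors: same direction, but b is twice as long as a.
a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])

print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0
print(cosine_similarity(a, b))   # 1.0 (up to floating-point rounding)
```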


Cosine similarity is the measure most often used in NLP because it is automatically normalized: it is bounded between -1 and 1, similar to a correlation.

<img src="https://upload.wikimedia.org/math/f/3/6/f369863aa2814d6e283f859986a1574d.png" style="width: 400px;"/>

Cosine values and their semantic meaning:

1 : word vectors point in the same direction (the words mean the same)  
0 : word vectors are orthogonal (mathematically unrelated)  
−1 : word vectors point in opposite directions (the words mean the opposite)  

In [12]:
import numpy as np

def cos_sim(v1, v2):
    "Calculate cosine similarity between vector 1 and 2"
    return v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

In [14]:
def test_cos_sim():
    # Compare with np.isclose: exact == on floats can fail from rounding error.
    v1 = np.array([1, 2, 3])
    assert np.isclose(cos_sim(v1, v1), 1.0)

    # A vector and its negation point in opposite directions.
    v2 = np.array([-1, -2, -3])
    assert np.isclose(cos_sim(v1, v2), -1.0)

    # Perpendicular vectors have zero similarity.
    v3 = np.array([0, 3])
    v4 = np.array([4, 0])
    assert np.isclose(cos_sim(v3, v4), 0.0)

    v5 = np.array([3, 45, 7, 2])
    v6 = np.array([2, 54, 13, 15])
    assert np.isclose(cos_sim(v5, v6), 0.97228425171235)
    return "tests pass :)"

print(test_cos_sim())

tests pass :)



----
Words closest to “Sweden”
----

<img src="http://deeplearning4j.org/img/sweden_cosine_distance.png" style="width: 400px;"/>
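A nearest-neighbor lookup like the one pictured can be sketched as a ranking by cosine similarity. The 3-d vectors and the `most_similar` helper below are hypothetical, invented for illustration; real word2vec vectors typically have a few hundred dimensions and come from a trained model.

```python
import numpy as np

# Hypothetical 3-d embeddings, invented for illustration only.
toy_vectors = {
    "sweden":  np.array([0.9, 0.8, 0.1]),
    "norway":  np.array([0.8, 0.9, 0.1]),
    "denmark": np.array([0.9, 0.6, 0.2]),
    "banana":  np.array([0.1, 0.0, 0.9]),
}

def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue  # skip the query word itself
        sim = query.dot(vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        scores.append((other, float(sim)))
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

for neighbor, score in most_similar("sweden", toy_vectors):
    print(f"{neighbor}: {score:.3f}")
```

With these toy values, the two Scandinavian neighbors rank well above "banana".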

---
2. Arithmetic: Word analogies
---

The "Hello, world!" of word2vec:
> Man is to woman as king is to queen

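$v_{king} - v_{man} + v_{woman} \approx v_{queen}$

In practice, the answer is the word $w$ whose vector has the highest cosine similarity to $v_{king} - v_{man} + v_{woman}$.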

![](http://multithreaded.stitchfix.com/assets/images/blog/vectors.gif)

[Demo](http://rare-technologies.com/word2vec-tutorial/#app)
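The analogy can be sketched with plain vector arithmetic. The 3-d vectors below are hypothetical (the axes loosely stand for "royalty", "male", and "female"), and `analogy` is an illustrative helper, not a library function. Excluding the three query words from the candidates is a common convention and is assumed here.

```python
import numpy as np

# Hypothetical embeddings: axes loosely mean (royalty, male, female).
toy_vectors = {
    "king":   np.array([0.9, 0.9, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.9]),
    "man":    np.array([0.1, 0.9, 0.1]),
    "woman":  np.array([0.1, 0.1, 0.9]),
    "prince": np.array([0.7, 0.8, 0.1]),
    "apple":  np.array([0.0, 0.2, 0.2]),
}

def analogy(a, b, c, vectors):
    """Solve "a is to b as c is to ?" by ranking words against b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):  # exclude the query words themselves
            continue
        sim = target.dot(vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

print(analogy("man", "woman", "king", toy_vectors))  # queen
```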

----
Different paths through word2vec space encode different relationships.
----

### Plurals

![](images/plurals.png)  


### Verb Tense

![](images/verb.png)

### Country-Capital
![](images/country.png)

----
How can you use word2vec to build data products?
----

<img src="https://assets.toptal.io/uploads/blog/image/827/toptal-blog-image-1423052243609.jpg" style="width: 400px;"/>

When I worked at an employment website, I built a recommendation engine for job seekers. Given a job seeker's resume, we would suggest jobs for them. My goal was to take a current job title and suggest a "better" job, which would increase platform engagement.
<br>
<br>
<details><summary>
What would be the next logical career move from Babysitter?
</summary>
A Nanny. 
<br>
A Nanny is to a Babysitter as a Senior Engineer is to an Engineer.
</details>

----
3. Clustering
----

<img src="http://static1.squarespace.com/static/52165be2e4b046d1ac57778c/t/55f4a66de4b016fee4ec7595/1442096821668/left.gif?format=1500w" style="width: 400px;"/>

[Source](http://douglasduhaime.com/blog/clustering-semantic-vectors-with-python)

Use your favorite clustering algorithm!

K-means is a good start.
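As a rough sketch of what K-means does with word vectors, here is a minimal NumPy version of Lloyd's algorithm. The 2-d vectors are made up, and seeding the centroids with the first `k` points is a simplification assumed here for reproducibility; library implementations usually pick random or k-means++ starting points.

```python
import numpy as np

def kmeans(points, k, n_iter=10):
    """Minimal Lloyd's algorithm; returns (labels, centroids)."""
    # Simplification: seed the centroids with the first k points.
    centroids = points[:k].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical 2-d word vectors, invented for illustration.
words = ["cat", "dog", "horse", "car", "bus", "train"]
points = np.array([
    [0.90, 0.10], [0.80, 0.20], [0.95, 0.05],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
])

labels, centroids = kmeans(points, k=2)
for cluster in range(2):
    print(cluster, [w for w, lab in zip(words, labels) if lab == cluster])
```

On this toy data the animals and the vehicles end up in separate clusters.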

---
Word2vec implementation
---

----
Code
----

1. [Google’s TensorFlow](https://www.tensorflow.org/versions/r0.8/tutorials/word2vec/index.html)
2. [Python’s Gensim package](https://radimrehurek.com/gensim/)  
3. [Google’s word2vec](https://code.google.com/p/word2vec/)  

----
Corpus (aka, data in NLP)
----

> "Data is the world's best regularizer"

You need __a lot__ of data.

100 billion words is a good start 😉. 100 million will work. 10 million is the minimum.

---
Check for understanding
---
<br>
<details><summary>
How can we evaluate word2vec, especially if it is built on a custom corpus?
</summary>
<br>
<br>
Word2Vec is an unsupervised learning algorithm, so there are no ground-truth labels to score the result against and no single objective metric. 
<br>
<br>
One possible method is to compare performance on analogy tasks against the pretrained Google vectors.
</details>

----
Why is word2vec so popular?
----

- It is a preprocessing step that turns text into a numerical form that Deep Learning Nets and machine learning algorithms can use.

- Dense vector outputs (in contrast to the sparse vectors typical in NLP)

- Can be trained and tuned on a custom collection of text (a corpus)

---
Summary
---

- Word2Vec is popular because it is straightforward to implement and creates dense embedding vectors.
- Word2Vec is a _relatively_ simple neural net with 1 input layer, 1 hidden layer, and 1 output layer.
- There are 2 common ways to represent context: 
    1. CBOW: given context, predict word
    2. skip-gram: given word, predict context
- After training, any vector operations can be applied to words. The most common operations are: 
    - Arithmetic (add, subtract, multiply, and divide)
    - Distance (typically using Cosine Similarity)
    - Clustering
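The two context schemes in the summary differ only in how training examples are built from a sliding window. A minimal sketch in plain Python follows; the function names, the example sentence, and the window size are assumptions made for illustration, and real implementations add details such as subsampling and negative sampling.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center_word, context_word) pairs: given word, predict context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """Return (context_words, center_word) pairs: given context, predict word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs

tokens = "the quick brown fox".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_pairs(tokens, window=1))
```

Skip-gram produces one training example per (word, neighbor) pair, while CBOW produces one example per position, with all neighbors bundled into the input.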

----
Further Study
----

- Read [Deep or Shallow, NLP Is Breaking Out](http://dl.acm.org/citation.cfm?id=2874915)  
- Watch [Udacity's Deep Learning course](https://www.udacity.com/course/deep-learning--ud730)
- Watch Ali Ghodsi's word2vec lectures [part 1](https://www.youtube.com/watch?v=TsEGsdVJjuA) and [part 2](https://www.youtube.com/watch?v=nuirUEmbaJU)
- Read [word2vec Parameter Learning Explained](http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf)
- Neural Networks Demystified:
    + [Watch](https://www.youtube.com/watch?v=5MXp9UUkSmc) 
    + [Code](http://nbviewer.ipython.org/github/stephencwelch/Neural-Networks-Demysitifed/)

<br>
<br>
<br>

----