Commit ab2e2f9

Code files added

1 parent b92e2d8 commit ab2e2f9

5 files changed: +1475 -0 lines changed

Lines changed: 74 additions & 0 deletions
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Understanding Word2vec model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Word2vec is one of the most popular and widely used models for generating word\n",
    "embeddings. What are word embeddings, though? Word embeddings are the vector\n",
    "representations of words in a vector space. The embeddings generated by the word2vec\n",
    "model capture the syntactic and semantic meaning of a word. Having a meaningful\n",
    "vector representation of a word helps a neural network understand the word better.\n",
    "\n",
    "For instance, let us consider the following text:\n",
    "\n",
    " _Archie used to live in New York; he then moved to Santa Clara. He loves apples and strawberries._\n",
    "\n",
    "The word2vec model generates a vector representation for each of the words in the above\n",
    "text. If we project and visualize the vectors in the embedding space, we can see how all the\n",
    "similar words are plotted close together.\n",
    "As you can see in the figure below, the words apples and strawberries are plotted close\n",
    "together, and New York and Santa Clara are plotted close together. This is because the\n",
    "word2vec model has learned that apples and strawberries are similar entities (i.e., fruits)\n",
    "and that New York and Santa Clara are similar entities (i.e., cities), so their vectors\n",
    "(embeddings) are similar to each other, which is why the distance between them is small.\n",
    "\n",
    "![image](images/1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thus, with word2vec, we can learn meaningful vector representations of words, which\n",
    "help neural networks understand what a word is about. Having a good representation of\n",
    "text is useful in various tasks. Since our network can understand the contextual and\n",
    "syntactic meaning of words, this branches out to various use cases such as text\n",
    "summarization, sentiment analysis, text generation, and more.\n",
    "\n",
    "Okay, but how does the word2vec model learn word embeddings? There are two types of\n",
    "word2vec model for learning the embeddings of a word:\n",
    "1. CBOW (Continuous Bag of Words)\n",
    "2. Skip-gram\n",
    "\n",
    "We will go into detail and learn how each of these models learns the vector\n",
    "representations of a word; as a quick preview, both variants appear in the sketch below."
   ]
  },
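  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a short preview of both variants using the gensim library on a toy corpus. This\n",
    "sketch assumes the gensim 4.x API (Python 3), where Word2Vec takes a vector_size\n",
    "parameter and selects CBOW or skip-gram via the sg flag; the corpus and parameter\n",
    "values are purely illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# assumes gensim 4.x (pip install gensim)\n",
    "from gensim.models import Word2Vec\n",
    "\n",
    "# a tiny toy corpus: each sentence is a list of tokens\n",
    "sentences = [\n",
    "    ['archie', 'used', 'to', 'live', 'in', 'new', 'york'],\n",
    "    ['he', 'then', 'moved', 'to', 'santa', 'clara'],\n",
    "    ['he', 'loves', 'apples', 'and', 'strawberries']\n",
    "]\n",
    "\n",
    "# sg=0 trains a CBOW model, sg=1 trains a skip-gram model\n",
    "cbow = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=0)\n",
    "skipgram = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1)\n",
    "\n",
    "# each word now has a learned embedding vector\n",
    "print(cbow.wv['apples'])"
   ]
  }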
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
Lines changed: 116 additions & 0 deletions
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Continuous Bag of Words\n",
    "\n",
    "Let us say we have a neural network with one input layer, one hidden layer, and one\n",
    "output layer. The goal of the network is to predict a word given its surrounding words.\n",
    "The word we are trying to predict is called the target word, and the words surrounding\n",
    "the target word are called the context words.\n",
    "\n",
    "How many context words do we use to predict the target word? We use a window size to\n",
    "choose the context words. If the window size is 2, then we use the two words before and\n",
    "the two words after the target word as the context words.\n",
    "\n",
    "Let us consider the sentence 'The sun rises in the east' with the word 'rises' as the target word.\n",
    "\n",
    "If we set the window size to 2, then we take the words 'the' and 'sun', which are the two\n",
    "words before, and 'in' and 'the', which are the two words after the target word 'rises', as\n",
    "the context words, as shown below:\n",
    "\n",
    "![image](images/1_1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the input to the network is the context words and the output is the target word. How\n",
    "do we feed these inputs to the network? A neural network accepts only numeric input, so\n",
    "we cannot feed the raw context words directly as input to the network. Hence, we convert\n",
    "all the words in the given sentence into numeric form using the one-hot encoding\n",
    "technique, as shown in the following figure:\n",
    "\n",
    "![image](images/2_1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The architecture of the CBOW model is shown in the figure below. As you can see, we feed\n",
    "the context words (the, sun, in, the) as inputs to the network, and it predicts the target\n",
    "word rises as the output.\n",
    "\n",
    "![image to be added](images/3_1.png)\n",
    "\n",
    "In the initial iterations, the network cannot predict the target word correctly. But over a\n",
    "series of iterations, it learns to predict the correct target word using gradient descent;\n",
    "that is, with gradient descent we update the weights of the network and find the optimal\n",
    "weights with which we can predict the correct target word.\n",
    "\n",
    "Since we have one input, one hidden, and one output layer, as shown in the above figure,\n",
    "we will have two sets of weights:\n",
    "\n",
    "* Input layer to hidden layer weights $W$\n",
    "* Hidden layer to output layer weights $W'$\n",
    "\n",
    "During the training process, the network will try to find the optimal values for these two\n",
    "sets of weights so that it can predict the correct target word.\n",
    "It turns out that the optimal weights between the input and hidden layer, $W$, form the\n",
    "vector representations of the words. They basically constitute the semantic meaning of the\n",
    "words. So after training, we simply remove the output layer, take the weights between the\n",
    "input and hidden layer, and assign them to the corresponding words.\n",
    "\n",
    "If we look at the matrix $W$, each row represents the embedding of one of the words, as\n",
    "shown below. So, the embedding for the word sun is [0.0, 0.3, 0.3, 0.6, 0.1].\n",
    "\n",
    "![image](images/4_1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Thus, the CBOW model learns to predict the target word given the context words. It\n",
    "learns to predict the correct target word using gradient descent: during training, it updates\n",
    "the weights of the network and finds the optimal weights with which we can predict the\n",
    "correct target word. The optimal weights between the input and hidden layer form the\n",
    "vector representations of the words. So after training, we simply take the weights between\n",
    "the input and hidden layer and assign them as vectors to the corresponding words.__\n",
    "\n",
    "Now that we have an intuitive understanding of the CBOW model, we will go into detail\n",
    "and learn mathematically how exactly the word embeddings are computed.\n",
    "\n",
    "We learned that the weights between the input and the hidden layer form the vector\n",
    "representations of the words. But how exactly does the CBOW model predict the target\n",
    "word? How does it learn the optimal weights using backpropagation? Let us inspect that\n",
    "in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:universe]",
   "language": "python",
   "name": "conda-env-universe-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
