PacktPublishing
diff --git a/Chapter07/7.01 Understanding Word2vec Model.ipynb b/Chapter07/7.01 Understanding Word2vec Model.ipynb
Lines changed: 74 additions & 0 deletions
diff --git a/Chapter07/7.02 Continuous Bag of words.ipynb b/Chapter07/7.02 Continuous Bag of words.ipynb
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,74 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Understanding Word2vec model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Word2vec is one of the most popular and widely used models for generating word\n",
+    "embeddings. What are word embeddings, though? Word embeddings are\n",
+    "vector representations of words in a vector space. The embeddings generated by the word2vec\n",
+    "model capture the syntactic and semantic meaning of a word. Having a meaningful\n",
+    "vector representation of a word gives the neural network a more informative input to learn from.\n",
+    "\n",
+    "For instance, let us consider the following text:\n",
+    "\n",
+    "  _Archie used to live in New York, he then moved to Santa Clara. He loves apples and strawberries._\n",
+    "\n",
+    "\n",
+    "The word2vec model generates a vector representation for each of the words in the above\n",
+    "text. If we project and visualize the vectors in the embedding space, we can see that\n",
+    "similar words are plotted close together.\n",
+    "As you can see in the figure below, the words apples and strawberries are plotted close\n",
+    "together, and New York and Santa Clara are plotted close together. They are plotted close\n",
+    "together because the word2vec model has learned that apples and strawberries are similar\n",
+    "entities, i.e., fruits, and that New York and Santa Clara are similar entities, i.e., cities. Their vectors\n",
+    "(embeddings) are therefore similar to each other, which is why the distance between them is small.\n",
+    "\n",
+    "![image](images/1.png)\n",
+    "\n",
+    "Thus, with word2vec, we can learn a meaningful vector representation of a word, which\n",
+    "helps neural networks capture what the word is about. Having a good\n",
+    "representation of text is useful in various tasks. Since the network can capture\n",
+    "the contextual and syntactic meaning of words, this benefits use cases\n",
+    "such as text summarization, sentiment analysis, text generation, and more.\n",
+    "\n",
+    "\n",
+    "Okay. But how does the word2vec model learn the word embeddings? There are two types of word2vec model for learning the embeddings of a word:\n",
+    "\n",
+    "1. CBOW (Continuous Bag of Words)\n",
+    "2. Skip-gram\n",
+    "\n",
+    "We will go into detail and learn how each of these models learns the vector representations of a word. "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python [default]",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,116 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Continuous Bag of words\n",
+    "\n",
+    "\n",
+    "Let us say we have a neural network with one input layer, one hidden layer, and one output layer. The goal of\n",
+    "the network is to predict a word given its surrounding words. The word we are\n",
+    "trying to predict is called the target word, and the words surrounding the target word are\n",
+    "called the context words.\n",
+    "\n",
+    "How many context words do we use to predict the target word? We use a fixed-size window\n",
+    "to choose the context words. If the window size is 2, then we use the two words before\n",
+    "and the two words after the target word as the context words.\n",
+    "\n",
+    "Let us consider the sentence 'The sun rises in the east' with the word 'rises' as the target word. \n",
+    "\n",
+    "\n",
+    "If we set\n",
+    "the window size to 2, then we take the words 'the' and 'sun', the two words before the target word 'rises',\n",
+    "and 'in' and 'the', the two words after it, as the context words, as\n",
+    "shown below:\n",
+    "\n",
+    "![image](images/1_1.png)\n",
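+    "\n",
+    "More generally, as a sketch in notation introduced here for illustration (the symbols $w_t$ and $c$ are assumptions, not taken from the text above): if $w_t$ is the target word at position $t$ and $c$ is the window size, the context words are\n",
+    "\n",
+    "$$(w_{t-c}, \\dots, w_{t-1}, w_{t+1}, \\dots, w_{t+c})$$\n",
+    "\n",
+    "For the sentence above, counting positions from 1, $t = 3$ and $c = 2$ give the context words (the, sun, in, the).\n",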
+    "\n",
+    "So the input to the network is the context words and the output is the target word. How do we feed\n",
+    "these inputs to the network? A neural network accepts only numeric input, so we cannot feed\n",
+    "the raw context words directly as input to the network. Hence, we convert all the words\n",
+    "in the given sentence into numeric form using the one-hot encoding technique, as shown in the\n",
+    "following figure: \n",
+    "\n",
+    "![image](images/2_1.png)\n",
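+    "\n",
+    "As a minimal sketch of one-hot encoding, assume the words are lowercased and the vocabulary is ordered as (the, sun, rises, in, east). This ordering is an assumption for illustration and may differ from the figure above. Each word is then a vector with a single 1 at its own index and 0 everywhere else:\n",
+    "\n",
+    "$$\\text{the} = [1, 0, 0, 0, 0], \\quad \\text{sun} = [0, 1, 0, 0, 0], \\quad \\text{rises} = [0, 0, 1, 0, 0], \\quad \\text{in} = [0, 0, 0, 1, 0], \\quad \\text{east} = [0, 0, 0, 0, 1]$$\n",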
+    "\n",
+    "The architecture of the CBOW model is shown in the figure below. As you can see, we feed the\n",
+    "context words (the, sun, in, the) as inputs to the network, and it predicts the target\n",
+    "word rises as the output.\n",
+    "\n",
+    "\n",
+    "![image to be added](images/3_1.png)\n",
+    "\n",
+    "In the initial iteration, the network cannot predict the target word correctly. But over a\n",
+    "series of iterations, it learns to predict the correct target word using gradient descent. That is,\n",
+    "with gradient descent, we update the weights of the network and find the optimal weights\n",
+    "with which we can predict the correct target word.\n",
+    "\n",
+    "Since we have one input layer, one hidden layer, and one output layer, as shown in the above figure, we\n",
+    "will have two sets of weights: \n",
+    "\n",
+    "* Input layer to hidden layer weight $W$\n",
+    "* Hidden layer to output layer weight $W'$\n",
+    "\n",
+    "\n",
+    "During the training process, the network will try to find the optimal values for these two\n",
+    "sets of weights so that it can predict the correct target word.\n",
+    "It turns out that the optimal weights between the input and hidden layers, $W$, form the vector representations of the words. They essentially capture the semantic meaning of the\n",
+    "words. So after training, we simply remove the output layer, take the weights between the\n",
+    "input and hidden layers, and assign them to the corresponding words.\n",
+    "\n",
+    "\n",
+    "If we look at the matrix $W$, it holds the embeddings for each of the words, as shown\n",
+    "below. So, the embedding for the word sun is [0.0, 0.3, 0.3, 0.6, 0.1].\n",
+    "\n",
+    "![image](images/4_1.png)\n",
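+    "\n",
+    "One way to see why the rows of $W$ act as embeddings, sketched under assumptions introduced here for illustration: let $W$ have one row per vocabulary word (shape $V \\times N$, with $V$ the vocabulary size and $N$ the hidden layer size), let the hidden layer be linear with no bias, and let $x_i$ be the one-hot vector of the word at index $i$. Multiplying by a one-hot vector then selects a single row:\n",
+    "\n",
+    "$$h = W^{\\top} x_i = (W_{i,:})^{\\top}$$\n",
+    "\n",
+    "Under this layout, the hidden layer output for a single input word is that word's row of $W$, which is the vector we keep as its embedding.\n",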
+    "\n",
+    "__Thus, the CBOW model learns to predict the target word given the context words. It learns to predict\n",
+    "the correct target word using gradient descent. During training, it updates the weights of the\n",
+    "network through gradient descent and finds the optimal weights with which we can predict the\n",
+    "correct target word. The optimal weights between the input and hidden layers form the vector\n",
+    "representations of the words. So after training, we simply take the weights between the input and hidden\n",
+    "layers and assign them as vectors to the corresponding words.__\n",
+    "\n",
+    "\n",
+    "Now that we have an intuitive understanding of the CBOW model, we will go into detail and\n",
+    "learn mathematically how exactly the word embeddings are computed.\n",
+    "\n",
+    "We learned that the weights between the input and hidden layers form the vector\n",
+    "representations of the words. But how exactly does the CBOW model predict the target word? How\n",
+    "does it learn the optimal weights using backpropagation? Let us inspect that in the next\n",
+    "section."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python [conda env:universe]",
+   "language": "python",
+   "name": "conda-env-universe-py"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}