# Word Embeddings: Intro to CBOW model, activation functions and working with NumPy

This lecture notebook introduces the continuous bag-of-words (CBOW) model, its activation functions, and some considerations when working with NumPy.

Let's dive into it!

In [2]:
import numpy as np

# The continuous bag-of-words model
The CBOW model is based on a neural network with a single hidden layer. As you'll recall from the lecture, its architecture looks like the figure below.

## Activation functions
Let's start by implementing the activation functions, ReLU and softmax.

### ReLU
ReLU computes the values of the hidden layer $\mathbf{h}$ from the pre-activation values $\mathbf{z_1}$, as in the following formulas:

\begin{align}
 \mathbf{z_1} &= \mathbf{W_1}\mathbf{x} + \mathbf{b_1}  \tag{1} \\
 \mathbf{h} &= \mathrm{ReLU}(\mathbf{z_1})  \tag{2}
\end{align}

Let's fix a value for $\mathbf{z_1}$ as a working example.

In [4]:
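# Seed the generator so the example is reproducible
np.random.seed(10)
# Draw 5 values uniformly from [-5, 5) and store them as a 5x1 column vector
z1 = 10*np.random.rand(5, 1) - 5
z1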

array([[ 2.71320643],
       [-4.79248051],
       [ 1.33648235],
       [ 2.48803883],
       [-0.01492988]])

In [5]:
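# Work on a copy so that z1 itself is not modified
h = z1.copy()
# Set every negative entry to 0, which is exactly what ReLU does
h[h < 0] = 0
h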

array([[2.71320643],
       [0.        ],
       [1.33648235],
       [2.48803883],
       [0.        ]])

In [6]:
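def relu(z):
    # Copy the input so the caller's array is left unchanged
    h = z.copy()
    # Replace negative values with 0; non-negative values pass through
    h[h < 0] = 0
    return h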

In [7]:
# Test relu on a column vector with both negative and positive entries
z = np.array([
    [-1.25459881], [ 4.50714306], [ 2.31993942], [ 0.98658484],
    [-3.4398136 ]])
relu(z)

array([[0.        ],
       [4.50714306],
       [2.31993942],
       [0.98658484],
       [0.        ]])
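
The copy-and-mask approach above is one way to write ReLU. As a minimal sketch of an alternative (the name `relu_maximum` is hypothetical and not part of the lecture code), NumPy's `np.maximum` compares each element against 0 and returns a new array, so no explicit copy is needed:

```python
import numpy as np

def relu_maximum(z):
    # Element-wise maximum of z and 0; returns a new array, z is not modified
    return np.maximum(z, 0)

z = np.array([
    [-1.25459881], [ 4.50714306], [ 2.31993942], [ 0.98658484],
    [-3.4398136 ]])
h = relu_maximum(z)
print(h)
```

Both versions should give the same values; which one you use is mostly a matter of taste.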

### Softmax
The second activation function you need is softmax. It computes the values of the output layer of the neural network from the hidden layer, using the following formulas:

\begin{align}
 \mathbf{z_2} &= \mathbf{W_2}\mathbf{h} + \mathbf{b_2}   \tag{3} \\
 \mathbf{\hat y} &= \mathrm{softmax}(\mathbf{z_2})   \tag{4}
\end{align}

For a vector $\mathbf{z}$ with $V$ elements, the $i$-th component of $\mathrm{softmax}(\mathbf{z})$ is given by:

$$ \mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum\limits_{j=1}^{V} e^{z_j}}  \tag{5} $$

Let's work through an example.

In [8]:
z = np.array([9, 8, 11, 10, 8.5])
z

array([ 9. ,  8. , 11. , 10. ,  8.5])

In [9]:
# Exponentiate each element: these are the numerators in formula (5)
e_z = np.exp(z)
e_z

array([ 8103.08392758,  2980.95798704, 59874.1417152 , 22026.46579481,
        4914.7688403 ])

In [11]:
# Divide by the sum of the exponentials: this is the denominator in formula (5)
e_z_norm = e_z / e_z.sum()
e_z_norm

array([0.08276948, 0.03044919, 0.61158833, 0.22499077, 0.05020223])

In [12]:
def softmax(z):
    # Exponentiate each element of z
    e_z = np.exp(z)
    # Normalize so that the outputs sum to 1
    e_z_norm = e_z / e_z.sum()
    return e_z_norm

In [13]:
softmax(z)

array([0.08276948, 0.03044919, 0.61158833, 0.22499077, 0.05020223])
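
One consideration when exponentiating raw scores: for large inputs `np.exp` can overflow. A common remedy, sketched here as an assumption rather than as part of the lecture code, is to subtract the maximum of $\mathbf{z}$ before exponentiating. The shift cancels out in the ratio of formula (5), so the result is mathematically the same. The names `softmax_stable` and `z_large` are hypothetical.

```python
import numpy as np

def softmax_stable(z):
    # Shifting by the maximum makes every exponent <= 0, so np.exp cannot overflow
    shifted = z - np.max(z)
    e_z = np.exp(shifted)
    return e_z / e_z.sum()

# Same example vector as above
z = np.array([9, 8, 11, 10, 8.5])
y_hat = softmax_stable(z)
print(y_hat)

# Scores this large would overflow in a plain np.exp(z)
z_large = np.array([1000.0, 1001.0, 1002.0])
y_large = softmax_stable(z_large)
print(y_large)
```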

In [15]:
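# Compare with a tolerance: floating-point rounding can make the sum differ slightly from 1
assert np.isclose(softmax(z).sum(), 1.)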
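
To see how formulas (1) to (4) fit together, here is a minimal sketch that chains the two activation functions. The hidden layer size `N`, the random weights and biases, and the placeholder input `x` are all assumptions made for illustration only; they are not values from the lecture.

```python
import numpy as np

def relu(z):
    h = z.copy()
    h[h < 0] = 0
    return h

def softmax(z):
    e_z = np.exp(z)
    return e_z / e_z.sum()

# Hypothetical sizes: vocabulary size V and hidden layer size N
V, N = 5, 3

# Hypothetical parameters, drawn at random just to have something to multiply
rng = np.random.default_rng(0)
W1 = rng.random((N, V))
b1 = rng.random((N, 1))
W2 = rng.random((V, N))
b2 = rng.random((V, 1))

# Placeholder input: a V x 1 column vector with a single non-zero entry
x = np.zeros((V, 1))
x[2] = 1.0

z1 = W1 @ x + b1      # formula (1), shape (N, 1)
h = relu(z1)          # formula (2), shape (N, 1)
z2 = W2 @ h + b2      # formula (3), shape (V, 1)
y_hat = softmax(z2)   # formula (4), shape (V, 1)

print(z1.shape, h.shape, z2.shape, y_hat.shape)
```

Note that every vector here is a 2-D column vector, so the matrix products and bias additions line up without surprises.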

## Dimensions: 1-D arrays vs 2-D column vectors

Before moving on to implement forward propagation, backpropagation, and gradient descent in the next lecture notebook, let's have a look at the dimensions of the vectors you've been handling until now.

Create a vector of length $V$ filled with zeros.

In [18]:
V = 5
x_array = np.zeros(V)
x_array

array([0., 0., 0., 0., 0.])

In [19]:
x_array.shape

(5,)
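
A shape of `(5,)` means the array has a single axis, so it is neither a row nor a column vector. As a small illustration, transposing a 1-D array leaves its shape unchanged:

```python
import numpy as np

x_array = np.zeros(5)

# A 1-D array has only one axis
print(x_array.ndim)

# Transposing reverses the order of the axes, which does nothing when there is only one
print(x_array.T.shape)
```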

In [20]:
# Copy the 1-D array, then set its shape to V rows and 1 column
x_column_vector = x_array.copy()
x_column_vector.shape = (V, 1)
x_column_vector

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [21]:
# Alternatively, pass -1 so that NumPy infers the number of rows from the array's size
x_column_vector = x_array.copy()
x_column_vector.shape = (-1, 1)
x_column_vector

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.]])
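
Assigning to `.shape` is not the only way to get a column vector. The sketch below shows two other options, `reshape` and indexing with `np.newaxis`, and then illustrates why the distinction matters: adding a 1-D array to a column vector does not raise an error, it silently broadcasts to a matrix. The names `col_a`, `col_b`, `b_1d`, `z_col` and `mixed` are hypothetical.

```python
import numpy as np

V = 5
x_array = np.zeros(V)

# reshape returns a new array object (a view of the same data when possible);
# x_array keeps its original shape
col_a = x_array.reshape(-1, 1)

# Indexing with np.newaxis appends an axis of length 1
col_b = x_array[:, np.newaxis]

print(x_array.shape, col_a.shape, col_b.shape)

# Mixing a 1-D array with a column vector broadcasts to a V x V matrix
b_1d = np.ones(V)           # shape (V,)
z_col = np.zeros((V, 1))    # shape (V, 1)
mixed = z_col + b_1d
print(mixed.shape)
```

Keeping every vector as a 2-D column vector avoids this kind of silent shape change.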