# Negative sampling for word2vec
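Compute the cost and gradients of the negative-sampling loss in the word2vec model.
* Parameters: **predicted**, **outputVectors**, **target**, **indices**
* **target** and **indices** are row indices into **outputVectors**: **target** selects the true output word and **indices** selects the negative samples
* $K$ is the length of **indices**, i.e. the number of negative samples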

\begin{align}
J_{neg-sample}(o, v_{c}, U) & = -\log(\delta(u_{o}^\top v_{c})) - \sum_{k=1}^{K} \log(\delta(-u_{k}^\top v_{c})) \\
\frac {\partial{J}} {\partial{v_{c}}} & = (\delta(u_{o}^\top v_{c})-1) u_{o} - \sum_{k=1}^{K} (\delta(-u_{k}^\top v_{c})-1) u_{k} \\
\frac {\partial{J}} {\partial{u_{o}}} & = (\delta(u_{o}^\top v_{c})-1) v_{c} \\
\frac {\partial{J}} {\partial{u_{k}}} & = -(\delta(-u_{k}^\top v_{c})-1) v_{c}, \quad \text {for all}\; k = 1,2,\dots, K
\end{align}
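
The gradients above all follow from one identity for the sigmoid. As a brief sketch of the intermediate step, let $w$ stand for any output vector and let $s \in \{+1, -1\}$ be a sign; both symbols are introduced here only for illustration and do not appear in the code.

\begin{align}
\delta(x) & = \frac{1}{1+e^{-x}}, \qquad \frac{d}{dx}\log\delta(x) = 1-\delta(x) \\
\frac{\partial}{\partial w}\left[-\log\delta(s\, w^\top v_{c})\right] & = s\,(\delta(s\, w^\top v_{c})-1)\, v_{c} \\
\frac{\partial}{\partial v_{c}}\left[-\log\delta(s\, w^\top v_{c})\right] & = s\,(\delta(s\, w^\top v_{c})-1)\, w
\end{align}

Setting $s=+1$, $w=u_{o}$ gives the target-word terms, and setting $s=-1$, $w=u_{k}$ gives the negative-sample terms.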

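In the following code, **predicted** is the vector $v_{c}$, **outputVectors** is the matrix $U$ whose rows are the output vectors, **target** is the subscript $o$, **indices** holds the subscripts of the $K$ negative samples, **cost** is $J_{neg-sample}(o, v_{c}, U)$, **gradPred** is $\frac {\partial{J}} {\partial{v_{c}}}$, and **grad** stores $\frac {\partial{J}} {\partial{u_{o}}}$ and $\frac {\partial{J}} {\partial{u_{k}}}$ in the corresponding rows. Throughout, $\delta$ denotes the sigmoid function.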
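
For reference, the whole computation can also be collected into one function. The following is a minimal sketch, not the notebook's own implementation: the function name `neg_sampling_cost_and_gradients` is hypothetical, and a local `sigmoid` is defined as a stand-in for the one imported from `q2_sigmoid`, so that the block runs on its own.

```python
import numpy as np


def sigmoid(x):
    """Logistic function; a stand-in for q2_sigmoid.sigmoid (assumption)."""
    return 1.0 / (1.0 + np.exp(-x))


def neg_sampling_cost_and_gradients(predicted, outputVectors, target, indices):
    """Hypothetical helper: negative-sampling cost and gradients for one pair.

    predicted     -- v_c, shape (d,)
    outputVectors -- U, shape (n, d), one output vector per row
    target        -- row index o of the true output word
    indices       -- row indices of the K negative samples
    """
    idx = np.asarray(indices)
    u_o = outputVectors[target]      # shape (d,)
    U_neg = outputVectors[idx]       # shape (K, d)

    pos = sigmoid(u_o @ predicted)         # sigmoid(u_o^T v_c)
    neg = sigmoid(-(U_neg @ predicted))    # sigmoid(-u_k^T v_c), shape (K,)

    cost = -np.log(pos) - np.sum(np.log(neg))

    # dJ/dv_c: target term minus the sum over negative samples
    gradPred = (pos - 1.0) * u_o - (neg - 1.0) @ U_neg

    # dJ/dU: only the target row and the sampled rows are non-zero
    grad = np.zeros(outputVectors.shape)
    grad[target] += (pos - 1.0) * predicted
    # np.add.at accumulates correctly even if an index occurs more than once
    np.add.at(grad, idx, -(neg - 1.0)[:, None] * predicted[None, :])

    return cost, gradPred, grad
```

Called with the values defined in the cells below, this should reproduce the same `cost`, `gradPred`, and `grad` as the step-by-step cells.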

In [None]:
import numpy as np
import random
from q2_sigmoid import sigmoid

In [None]:
predicted = np.array([0.26726124, 0.53452248, 0.80178373])
predicted.shape

In [None]:
outputVectors = np.array([
       [0.26726124, 0.53452248, 0.80178373],
       [0.45584231, 0.56980288, 0.68376346],
       [0.50257071, 0.57436653, 0.64616234],
       [0.52342392, 0.57576631, 0.62810871]])
outputVectors.shape

In [None]:
# target and indices are row indices into outputVectors
# K must be smaller than len(outputVectors), and no entry of indices may equal target
target, indices = 1, [0, 2]
K = len(indices)
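
The constraint in the comment above (K smaller than the number of output vectors, no negative index equal to the target) can be enforced by sampling instead of hard-coding. The helper below is a simplified sketch under stated assumptions: the name `sample_negative_indices` is hypothetical, and it draws uniformly without replacement, whereas word2vec implementations typically draw negatives from a smoothed unigram distribution.

```python
import random


def sample_negative_indices(target, num_outputs, K):
    """Hypothetical helper: draw K distinct row indices, none equal to target.

    Uniform sampling without replacement is a simplification chosen for
    illustration, not the sampling scheme of any particular implementation.
    """
    if not 0 <= target < num_outputs:
        raise ValueError("target must be a valid row index")
    if K >= num_outputs:
        raise ValueError("K must be smaller than the number of output vectors")
    # Every row except the target is a candidate negative sample
    candidates = [i for i in range(num_outputs) if i != target]
    return random.sample(candidates, K)
```

For the four output vectors used here, `sample_negative_indices(1, 4, 2)` returns two distinct indices drawn from 0, 2, and 3.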

In [None]:
gradPred = np.zeros(predicted.shape)
gradPred.shape

In [None]:
grad = np.zeros(outputVectors.shape)
grad.shape

In [None]:
np.dot(outputVectors[target], predicted)

In [None]:
out_sigmoid = sigmoid(np.dot(outputVectors[target], predicted))
out_sigmoid

In [None]:
cost = -np.log(out_sigmoid)

In [None]:
outputVectors[target].shape

In [None]:
(out_sigmoid - 1) * outputVectors[target]

In [None]:
gradPred += (out_sigmoid - 1) * outputVectors[target]
gradPred

In [None]:
(out_sigmoid - 1) * predicted

In [None]:
grad[target] += (out_sigmoid - 1) * predicted
grad

In [None]:
# Accumulate the contribution of each negative sample
for idx in indices:
    # sigmoid(-u_k^T v_c) for this negative sample
    out_sigmoid = sigmoid(-np.dot(outputVectors[idx], predicted))

    cost -= np.log(out_sigmoid)
    gradPred -= (out_sigmoid - 1) * outputVectors[idx]
    grad[idx] -= (out_sigmoid - 1) * predicted

In [None]:
gradPred

In [None]:
grad
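
One way to gain confidence in the analytic gradients above is to compare them with a central-difference estimate. The block below is a minimal sketch: `neg_sampling_cost` and `numerical_gradient` are hypothetical helper names, and the local `sigmoid` is a stand-in for the one imported from `q2_sigmoid`.

```python
import numpy as np


def sigmoid(x):
    """Logistic function; a stand-in for q2_sigmoid.sigmoid (assumption)."""
    return 1.0 / (1.0 + np.exp(-x))


def neg_sampling_cost(predicted, outputVectors, target, indices):
    """Cost only, following the first equation of this notebook."""
    cost = -np.log(sigmoid(np.dot(outputVectors[target], predicted)))
    for idx in indices:
        cost = cost - np.log(sigmoid(-np.dot(outputVectors[idx], predicted)))
    return cost


def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    x = np.array(x, dtype=float)      # work on a copy so the caller's array is untouched
    grad = np.zeros(x.shape)
    for ix in np.ndindex(*x.shape):   # visit every entry of x
        old = x[ix]
        x[ix] = old + h
        f_plus = f(x)
        x[ix] = old - h
        f_minus = f(x)
        x[ix] = old                   # restore the entry before moving on
        grad[ix] = (f_plus - f_minus) / (2.0 * h)
    return grad
```

With the notebook's variables, `numerical_gradient(lambda v: neg_sampling_cost(v, outputVectors, target, indices), predicted)` should agree closely with `gradPred`, and `numerical_gradient(lambda U: neg_sampling_cost(predicted, U, target, indices), outputVectors)` should agree closely with `grad`.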