# Word Embeddings

This notebook has two parts: data preparation and the continuous bag-of-words (CBOW) model.

In [1]:
import sys
!pip install emoji==1.4.1
#!{sys.executable} -m pip install emoji

Collecting emoji==1.4.1
  Downloading emoji-1.4.1.tar.gz (185 kB)
Building wheels for collected packages: emoji
  Building wheel for emoji (setup.py) ... done
  Created wheel for emoji: filename=emoji-1.4.1-py3-none-any.whl size=186394 sha256=a3500354171187e3e5a3735626ca9c993df2fd3c10956f06685ae418a6770482
  Stored in directory: /Users/hendriksugiarto/Library/Caches/pip/wheels/66/98/c2/683c7cb1a5449f5d0936d0b65fe1ddd5ebae8e45638a0cd5c0
Successfully built emoji
Installing collected packages: emoji
  Attempting uninstall: emoji
    Found existing installation: emoji 2.8.0
    Uninstalling emoji-2.8.0:
      Successfully uninstalled emoji-2.8.0
Successfully installed emoji-1.4.1


In [2]:
import re
import nltk
from nltk.tokenize import word_tokenize
import emoji
import numpy as np

from utils2 import get_dict

nltk.download('punkt')  # download pre-trained Punkt tokenizer for English

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/hendriksugiarto/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

# Data preparation

This part covers:
- Cleaning and tokenizing the corpus.
- Extracting the context words and center word pairs that make up the CBOW training set.
- Building simple vector representations of the context words (features) and center words (targets).

## Cleaning and tokenization


In [3]:
# Define a corpus
corpus = 'Who ❤️ "word embeddings" in 2022? I do 🙂 !!!'

In [4]:
print(f'Corpus:  {corpus}') # Print original corpus
data = re.sub(r'[,!?;-]+', '.', corpus) # Do the substitution
print(f'After cleaning punctuation:  {data}') # Print cleaned corpus

Corpus:  Who ❤️ "word embeddings" in 2022? I do 🙂 !!!
After cleaning punctuation:  Who ❤️ "word embeddings" in 2022. I do 🙂 .
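The pattern `[,!?;-]+` matches one or more consecutive characters from the set, so a run such as `!!!` or `?;` collapses into a single period. A minimal sketch on a made-up sentence (the sample string is an assumption, not part of the notebook's corpus):

```python
import re

# Hypothetical sample sentence, chosen to exercise every character in the set
sample = 'Hello, world!!! How are you?; fine-thanks'

# Each run of , ! ? ; - becomes exactly one period
cleaned = re.sub(r'[,!?;-]+', '.', sample)
print(cleaned)  # Hello. world. How are you. fine.thanks
```

Note that the hyphen in `fine-thanks` is replaced as well, so this rule also breaks hyphenated words apart.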


Use NLTK's tokenization engine to split the corpus into individual tokens.

In [5]:
print(f'Initial string:  {data}') # Print cleaned corpus
data = nltk.word_tokenize(data) # Tokenize the cleaned corpus
print(f'After tokenization:  {data}') # Print the tokenized version of the corpus

Initial string:  Who ❤️ "word embeddings" in 2022. I do 🙂 .
After tokenization:  ['Who', '❤️', '``', 'word', 'embeddings', "''", 'in', '2022', '.', 'I', 'do', '🙂', '.']


Remove numbers and punctuation (except periods), keep emoji, then convert everything to lowercase.

In [6]:
print(f'Initial list of tokens:  {data}') # Print the tokenized version of the corpus

# Filter tokenized corpus using list comprehension
data = [ ch.lower() for ch in data
         if ch.isalpha()
         or ch == '.'
         or emoji.get_emoji_regexp().search(ch)
       ]
print(f'After cleaning:  {data}') # Print the tokenized and filtered version of the corpus

Initial list of tokens:  ['Who', '❤️', '``', 'word', 'embeddings', "''", 'in', '2022', '.', 'I', 'do', '🙂', '.']
After cleaning:  ['who', '❤️', 'word', 'embeddings', 'in', '.', 'i', 'do', '🙂', '.']


### Exercise

Write a function that wraps all of the cleaning and tokenization steps above.

In [7]:
# Define the 'tokenize' function that will include the steps previously seen
def tokenize(corpus):
    data = re.sub(r'[,!?;-]+', '.', corpus) # Do the substitution
    data = nltk.word_tokenize(data) # Tokenize the cleaned corpus
    data = [ ch.lower() for ch in data
         if ch.isalpha()
         or ch == '.'
         or emoji.get_emoji_regexp().search(ch)
       ]
    return data

Try the function above on the sentence: "I am happy because I am learning"

In [8]:
corpus = 'I am happy because I am learning' # Define new corpus
print(f'Corpus:  {corpus}') # Print new corpus
words = tokenize(corpus) # Save tokenized version of corpus into 'words' variable
print(f'Words (tokens):  {words}') # Print the tokenized version of the corpus

Corpus:  I am happy because I am learning
Words (tokens):  ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']


Try it with a sentence of your own.

In [9]:
tokenize("Mari menulis kata apapun ::: !") # Run this with any sentence

['mari', 'menulis', 'kata', 'apapun', '.']

## Sliding window of words

Now you can slide a window of words across the corpus. For each window, extract the center word and the context words.

### Exercise
Write a function `get_windows` that performs this operation.

In [10]:
# Define the 'get_windows' function
def get_windows(words, C):
    i = C
    while i < len(words) - C:
        center_word = words[i]
        context_words = words[(i-C):i]+words[(i+1):(i+C+1)]
        yield context_words, center_word
        i += 1

In [11]:
# Print 'context_words' and 'center_word' for the new corpus with a 'context half-size' of 2
for x, y in get_windows(
            ['i', 'am', 'happy', 'because', 'i', 'am', 'learning'],
            2
        ):
    print(f'{x}\t{y}')

['i', 'am', 'because', 'i']	happy
['am', 'happy', 'i', 'am']	because
['happy', 'because', 'am', 'learning']	i


The first example consists of:
- context words: "i", "am", "because", "i",
- center word: "happy".
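A sliding window with half-size `C` needs `C` words on each side of the center word, so a list of `len(words)` tokens yields `len(words) - 2*C` windows (or none if the list is too short). The sketch below re-implements the generator with a `for` loop, which is meant to behave like the `while` version above, and checks that count. It is an illustration, not a replacement for the exercise solution.

```python
def get_windows(words, C):
    """Yield (context_words, center_word) pairs for a context half-size C."""
    for i in range(C, len(words) - C):
        # C words before the center plus C words after it
        context_words = words[i - C:i] + words[i + 1:i + C + 1]
        yield context_words, words[i]

words = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']

for C in (1, 2, 3, 4):
    windows = list(get_windows(words, C))
    expected = max(len(words) - 2 * C, 0)
    print(f'C={C}: {len(windows)} windows (expected {expected})')
```

Larger half-sizes give each example more context but leave fewer training examples for a corpus of the same length.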

Try it with your own words.

In [12]:
# Print 'context_words' and 'center_word' for any sentence with a 'context half-size' of 2
for x, y in get_windows(tokenize("Now it's your turn: sekarang kita sedang belajar NLP!"), 2):
    print(f'{x}\t{y}')

['now', 'it', 'turn', 'sekarang']	your
['it', 'your', 'sekarang', 'kita']	turn
['your', 'turn', 'kita', 'sedang']	sekarang
['turn', 'sekarang', 'sedang', 'belajar']	kita
['sekarang', 'kita', 'belajar', 'nlp']	sedang
['kita', 'sedang', 'nlp', '.']	belajar


## Converting words to vectors

In [13]:
# Get 'word2Ind' and 'Ind2word' dictionaries for the tokenized corpus
word2Ind, Ind2word = get_dict(words)

In [14]:
# Print 'word2Ind' dictionary
word2Ind

{'am': 0, 'because': 1, 'happy': 2, 'i': 3, 'learning': 4}

In [15]:
# Print value for the key 'i' within word2Ind dictionary
print("Index of the word 'i':  ",word2Ind['i'])

Index of the word 'i':   3


In [16]:
# Print 'Ind2word' dictionary
Ind2word

{0: 'am', 1: 'because', 2: 'happy', 3: 'i', 4: 'learning'}

In [17]:
# Print value for the key '2' within Ind2word dictionary
print("Word which has index 2:  ",Ind2word[2] )

Word which has index 2:   happy


In [18]:
V = len(word2Ind) # Save length of word2Ind dictionary into the 'V' variable
print("Size of vocabulary: ", V) # Print length of word2Ind dictionary

Size of vocabulary:  5
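`get_dict` comes from the `utils2` module, whose source is not shown here. The sketch below is an assumption about what such a helper could look like: it sorts the unique tokens and numbers them, which reproduces the dictionaries printed above. The name `build_dicts` is hypothetical.

```python
def build_dicts(tokens):
    """Map each unique token to an index (alphabetical order) and back.

    Assumed behavior only; the real utils2.get_dict may differ.
    """
    vocabulary = sorted(set(tokens))
    word_to_index = {word: index for index, word in enumerate(vocabulary)}
    index_to_word = {index: word for word, index in word_to_index.items()}
    return word_to_index, index_to_word

tokens = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']
word_to_index, index_to_word = build_dicts(tokens)
print(word_to_index)
print(index_to_word)
```

Sorting makes the index assignment deterministic, so the same corpus always produces the same one-hot vectors.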


### One-hot vectors


First, get the index of the word "happy".

In [19]:
n = word2Ind['happy'] # Save index of word 'happy' into the 'n' variable
n

2

Create a vector of size V filled with zeros.

In [20]:
center_word_vector = np.zeros(V) # Create vector with the same length as the vocabulary, filled with zeros
center_word_vector

array([0., 0., 0., 0., 0.])

Check its size.

In [21]:
len(center_word_vector) == V # Assert that the length of the vector is the same as the size of the vocabulary

True

Set the element at index $n$ to 1.

In [22]:
center_word_vector[n] = 1 # Replace element number 'n' with a 1

This is the resulting one-hot vector.

In [23]:
center_word_vector

array([0., 0., 1., 0., 0.])

### Exercise

Write a function that performs the operations above.

In [24]:
# Define the 'word_to_one_hot_vector' function that will include the steps previously seen
def word_to_one_hot_vector(word, word2Ind, V):
    # BEGIN your code here
    one_hot_vector = np.zeros(V)
    one_hot_vector[word2Ind[word]] = 1
    # END your code here
    return one_hot_vector

In [25]:
word_to_one_hot_vector('happy', word2Ind, V) # Print output of 'word_to_one_hot_vector' function for word 'happy'

array([0., 0., 1., 0., 0.])

### Exercise

Find the one-hot vector of the word "learning".

In [26]:
# BEGIN your code here
word_to_one_hot_vector('learning', word2Ind, V) # Print output of 'word_to_one_hot_vector' function for word 'learning'
# END your code here

array([0., 0., 0., 0., 1.])

Expected output:

    array([0., 0., 0., 0., 1.])

### Getting the context words vector

Compute the average of the one-hot vectors of the context words.


In [27]:
context_words = ['i', 'am', 'because', 'i'] # Define list containing context words

In [28]:
# Create one-hot vectors for each context word using list comprehension
context_words_vectors = [word_to_one_hot_vector(w, word2Ind, V) for w in context_words]
context_words_vectors

[array([0., 0., 0., 1., 0.]),
 array([1., 0., 0., 0., 0.]),
 array([0., 1., 0., 0., 0.]),
 array([0., 0., 0., 1., 0.])]

In [29]:
np.mean(context_words_vectors, axis=0) # Compute mean of the vectors using numpy

array([0.25, 0.25, 0.  , 0.5 , 0.  ])
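Averaging one-hot vectors is the same as counting how often each vocabulary word occurs in the context and dividing by the number of context words. The sketch below checks this equivalence for the context used above; the variable names `one_hots`, `mean_vector`, and `frequency_vector` are introduced here for illustration.

```python
from collections import Counter

import numpy as np

word2Ind = {'am': 0, 'because': 1, 'happy': 2, 'i': 3, 'learning': 4}
V = len(word2Ind)
context_words = ['i', 'am', 'because', 'i']

# Mean of the one-hot vectors, one row per context word
one_hots = np.zeros((len(context_words), V))
for row, word in enumerate(context_words):
    one_hots[row, word2Ind[word]] = 1
mean_vector = one_hots.mean(axis=0)

# The same vector built from relative word frequencies
frequency_vector = np.zeros(V)
for word, count in Counter(context_words).items():
    frequency_vector[word2Ind[word]] = count / len(context_words)

print(mean_vector)
print(np.allclose(mean_vector, frequency_vector))
```

This also shows why word order inside the window is lost: the model sees the context as a bag of words.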

### Exercise

Write a function `context_words_to_vector` that performs all of the operations above.

In [30]:
# Define the 'context_words_to_vector' function that will include the steps previously seen
def context_words_to_vector(context_words, word2Ind, V):
    # BEGIN your code here
    context_words_vectors = [word_to_one_hot_vector(w, word2Ind, V) for w in context_words]
    context_words_vectors = np.mean(context_words_vectors, axis=0) # Compute mean of the vectors using numpy
    # END your code here
    return context_words_vectors

In [31]:
# Print output of 'context_words_to_vector' function for context words: 'i', 'am', 'because', 'i'
context_words_to_vector(['i', 'am', 'because', 'i'], word2Ind, V) 

array([0.25, 0.25, 0.  , 0.5 , 0.  ])

### Exercise
What is the vector representation of the context words "am happy i am"?

In [32]:
# BEGIN your code here
context_words_to_vector(['am', 'happy', 'i', 'am'], word2Ind, V) 
# END your code here

array([0.5 , 0.  , 0.25, 0.25, 0.  ])

Expected output:

    array([0.5 , 0.  , 0.25, 0.25, 0.  ])


## Training set

Combine all of the functions above to build the training set.

In [33]:
words

['i', 'am', 'happy', 'because', 'i', 'am', 'learning']

In [34]:
# Print vectors associated to center and context words for corpus
for context_words, center_word in get_windows(words, 2):  # reminder: 2 is the context half-size
    print(f'Context words:  {context_words} -> {context_words_to_vector(context_words, word2Ind, V)}')
    print(f'Center word:  {center_word} -> {word_to_one_hot_vector(center_word, word2Ind, V)}')
    print()

Context words:  ['i', 'am', 'because', 'i'] -> [0.25 0.25 0.   0.5  0.  ]
Center word:  happy -> [0. 0. 1. 0. 0.]

Context words:  ['am', 'happy', 'i', 'am'] -> [0.5  0.   0.25 0.25 0.  ]
Center word:  because -> [0. 1. 0. 0. 0.]

Context words:  ['happy', 'because', 'am', 'learning'] -> [0.25 0.25 0.25 0.   0.25]
Center word:  i -> [0. 0. 0. 1. 0.]



In [35]:
# Define the generator function 'get_training_example'
def get_training_example(words, C, word2Ind, V):
    for context_words, center_word in get_windows(words, C):
        yield context_words_to_vector(context_words, word2Ind, V), word_to_one_hot_vector(center_word, word2Ind, V)

In [36]:
# Print vectors associated to center and context words for corpus using the generator function
for context_words_vector, center_word_vector in get_training_example(words, 2, word2Ind, V):
    print(f'Context words vector:  {context_words_vector}')
    print(f'Center word vector:  {center_word_vector}')
    print()

Context words vector:  [0.25 0.25 0.   0.5  0.  ]
Center word vector:  [0. 0. 1. 0. 0.]

Context words vector:  [0.5  0.   0.25 0.25 0.  ]
Center word vector:  [0. 1. 0. 0. 0.]

Context words vector:  [0.25 0.25 0.25 0.   0.25]
Center word vector:  [0. 0. 0. 1. 0.]



# The continuous bag-of-words model

This part covers:
- Activation functions.
- Forward propagation.
- Cross-entropy loss.
- Backpropagation.
- Gradient descent.
- Word embeddings.

## Activation functions

### ReLU


\begin{align}
 \mathbf{z_1} &= \mathbf{W_1}\mathbf{x} + \mathbf{b_1}  \tag{1} \\
 \mathbf{h} &= \mathrm{ReLU}(\mathbf{z_1})  \tag{2} \\
\end{align}


In [37]:
np.random.seed(10) # Define a random seed so all random outcomes can be reproduced
z_1 = 10*np.random.rand(5, 1)-5 # Define a 5X1 column vector using numpy
z_1

array([[ 2.71320643],
       [-4.79248051],
       [ 1.33648235],
       [ 2.48803883],
       [-0.01492988]])

ReLU sets every negative value to zero and leaves the other values unchanged.


In [38]:
h = z_1.copy() # Create copy of vector and save it in the 'h' variable

In [39]:
h < 0 # Determine which values met the criteria (this is possible because of vectorization)

array([[False],
       [ True],
       [False],
       [False],
       [ True]])

In [40]:
h[h < 0] = 0 # Set the negative entries to zero with a boolean mask. This is the same as applying ReLU

In [41]:
h # Print the vector after ReLU

array([[2.71320643],
       [0.        ],
       [1.33648235],
       [2.48803883],
       [0.        ]])

### Exercise
Write a ReLU function using the operations above.

In [42]:
# Define the 'relu' function that will include the steps previously seen
def relu(z):
    # BEGIN your code here
    result = z.copy() # Create a copy of the input so the original array is not modified
    result[result < 0] = 0 # Set the negative entries to zero with a boolean mask
    # END your code here
    
    return result

In [43]:
# Define a new vector and save it in the 'z' variable
z = np.array([[-1.25459881], [ 4.50714306], [ 2.31993942], [ 0.98658484], [-3.4398136 ]])
relu(z) # Apply ReLU to it

array([[0.        ],
       [4.50714306],
       [2.31993942],
       [0.98658484],
       [0.        ]])

Expected output:

    array([[0.        ],
           [4.50714306],
           [2.31993942],
           [0.98658484],
           [0.        ]])
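An equivalent way to write ReLU, shown here as an alternative sketch rather than the expected exercise answer, is `np.maximum(z, 0)`. It compares every element with zero and returns a new array without modifying the input. The name `relu_maximum` is introduced here for illustration.

```python
import numpy as np

def relu(z):
    """ReLU via copy and boolean mask, as in the exercise above."""
    result = z.copy()
    result[result < 0] = 0
    return result

def relu_maximum(z):
    """ReLU via element-wise maximum with zero."""
    return np.maximum(z, 0)

z = np.array([[-1.25459881], [4.50714306], [2.31993942], [0.98658484], [-3.4398136]])
print(np.array_equal(relu(z), relu_maximum(z)))  # True
```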

### Softmax



$$ \textrm{softmax}(\textbf{z})_i = \frac{e^{z_i} }{\sum\limits_{j=1}^{V} e^{z_j} }  \tag{5} $$



In [44]:
z = np.array([9, 8, 11, 10, 8.5]) # Define a new vector and save it in the 'z' variable
z

array([ 9. ,  8. , 11. , 10. ,  8.5])

In [45]:
e_z = np.exp(z) # Save exponentials of the values in a new vector
e_z

array([ 8103.08392758,  2980.95798704, 59874.1417152 , 22026.46579481,
        4914.7688403 ])

In [46]:
sum_e_z = np.sum(e_z) # Save the sum of the exponentials
sum_e_z

97899.41826492078

In [47]:
e_z[0]/sum_e_z # Print softmax value of the first element in the original vector

0.08276947985173956

### Exercise
Write a softmax function using the operations above.

In [48]:
# Define the 'softmax' function that will include the steps previously seen
def softmax(z):
    # BEGIN your code here
    e_z = np.exp(z)
    sum_e_z = np.sum(e_z)
    return e_z / sum_e_z
    # END your code here

In [49]:
print(softmax([9, 8, 11, 10, 8.5])) # Print softmax values for original vector
np.sum(softmax([9, 8, 11, 10, 8.5]))

[0.08276948 0.03044919 0.61158833 0.22499077 0.05020223]


1.0

Expected output:

    array([0.08276948, 0.03044919, 0.61158833, 0.22499077, 0.05020223])
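For large inputs `np.exp` overflows, so a common variant subtracts the maximum before exponentiating. This does not change the result, because $\textrm{softmax}(\mathbf{z}) = \textrm{softmax}(\mathbf{z} - c)$ for any constant $c$. The function name `stable_softmax` is introduced here for illustration and is not part of the exercise.

```python
import numpy as np

def stable_softmax(z):
    """Softmax over all elements of z, shifted by max(z) to avoid overflow."""
    z = np.asarray(z, dtype=float)
    e_z = np.exp(z - np.max(z))  # the largest exponent is exp(0) = 1
    return e_z / np.sum(e_z)

# Same input as above: the result should match the plain softmax
print(stable_softmax([9, 8, 11, 10, 8.5]))

# Inputs this large would overflow np.exp without the shift
print(stable_softmax([1000, 1001, 1002]))
```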

## Tensor dimensions

In [50]:
x_array = np.zeros(V) # Create a 1D array of zeros with the same length as the vocabulary
x_array

array([0., 0., 0., 0., 0.])

In [51]:
x_array.shape 

(5,)

In [52]:
x_column_vector = x_array.copy() # Copy vector
x_column_vector.shape = (V, 1)  # Reshape copy of vector # alternatively ... = (x_array.shape[0], 1)
x_column_vector

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [53]:
x_column_vector.shape

(5, 1)
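The distinction between shape `(5,)` and `(5, 1)` matters once biases are added: with a 1D input, NumPy broadcasting silently produces a matrix instead of a column vector. The sketch below uses placeholder weights (all ones, an assumption made only to show the shapes) and two alternative ways to obtain a column vector.

```python
import numpy as np

V, N = 5, 3
x_array = np.zeros(V)

# Two alternatives to assigning to .shape; both return a (V, 1) array
x_reshaped = x_array.reshape(-1, 1)
x_newaxis = x_array[:, np.newaxis]
print(x_reshaped.shape, x_newaxis.shape)

# Placeholder parameters, used only to illustrate the shapes
W1 = np.ones((N, V))
b1 = np.ones((N, 1))

column_result = np.dot(W1, x_reshaped) + b1  # (N, 1) + (N, 1)
broadcast_result = np.dot(W1, x_array) + b1  # (N,) + (N, 1) broadcasts
print(column_result.shape)
print(broadcast_result.shape)
```

The second result has shape `(3, 3)`, which raises no error but is not the intended column vector.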

## Forward propagation

In [54]:
N = 3 # Define the size of the word embedding vectors and save it in the variable 'N'

### Initializing weights and biases

In [55]:
# Define first matrix of weights
W1 = np.array([[ 0.41687358,  0.08854191, -0.23495225,  0.28320538,  0.41800106],
               [ 0.32735501,  0.22795148, -0.23951958,  0.4117634 , -0.23924344],
               [ 0.26637602, -0.23846886, -0.37770863, -0.11399446,  0.34008124]])

# Define second matrix of weights
W2 = np.array([[-0.22182064, -0.43008631,  0.13310965],
               [ 0.08476603,  0.08123194,  0.1772054 ],
               [ 0.1871551 , -0.06107263, -0.1790735 ],
               [ 0.07055222, -0.02015138,  0.36107434],
               [ 0.33480474, -0.39423389, -0.43959196]])

# Define first vector of biases
b1 = np.array([[ 0.09688219],
               [ 0.29239497],
               [-0.27364426]])

# Define second vector of biases
b2 = np.array([[ 0.0352008 ],
               [-0.36393384],
               [-0.12775555],
               [-0.34802326],
               [-0.07017815]])

In [56]:
# BEGIN your code here
print(f'V (vocabulary size): {V}') 
print(f'N (embedding size / size of the hidden layer): {N}')
print(f'size of W1: {W1.shape} (NxV)')
print(f'size of b1: {b1.shape} (Nx1)')
print(f'size of W2: {W2.shape} (VxN)')
print(f'size of b2: {b2.shape} (Vx1)')
# END your code here

V (vocabulary size): 5
N (embedding size / size of the hidden layer): 3
size of W1: (3, 5) (NxV)
size of b1: (3, 1) (Nx1)
size of W2: (5, 3) (VxN)
size of b2: (5, 1) (Vx1)


### Training example

In [57]:
training_examples = get_training_example(words, 2, word2Ind, V) # Save generator object in the 'training_examples' variable with the desired arguments

In [58]:
x_array, y_array = next(training_examples) # Get first values from generator

In [59]:
x_array

array([0.25, 0.25, 0.  , 0.5 , 0.  ])

In [60]:
y_array

array([0., 0., 1., 0., 0.])

Convert the vectors into 2D column arrays.

In [61]:
x = x_array.copy() # Copy vector
x.shape = (V, 1) # Reshape it
print('x')
print(x)
print()

y = y_array.copy()
y.shape = (V, 1)
print('y')
print(y)

x
[[0.25]
 [0.25]
 [0.  ]
 [0.5 ]
 [0.  ]]

y
[[0.]
 [0.]
 [1.]
 [0.]
 [0.]]


### Values of the hidden layer

\begin{align}
 \mathbf{z_1} &= \mathbf{W_1}\mathbf{x} + \mathbf{b_1}  \tag{1} \\
 \mathbf{h} &= \mathrm{ReLU}(\mathbf{z_1})  \tag{2} \\
\end{align}

First, compute $\mathbf{z_1}$.

In [62]:
z1 = np.dot(W1, x) + b1 # Compute z1 (values of first hidden layer before applying the ReLU function)

In [63]:
z1

array([[ 0.36483875],
       [ 0.63710329],
       [-0.3236647 ]])
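Because $\mathbf{x}$ is the average of the context words' one-hot vectors, the product $\mathbf{W_1}\mathbf{x}$ equals the average of the corresponding columns of $\mathbf{W_1}$. The sketch below checks this with the weights and the first training example from above; the names `context_indices` and `z1_from_columns` are introduced here for illustration.

```python
import numpy as np

word2Ind = {'am': 0, 'because': 1, 'happy': 2, 'i': 3, 'learning': 4}
W1 = np.array([[ 0.41687358,  0.08854191, -0.23495225,  0.28320538,  0.41800106],
               [ 0.32735501,  0.22795148, -0.23951958,  0.4117634 , -0.23924344],
               [ 0.26637602, -0.23846886, -0.37770863, -0.11399446,  0.34008124]])
b1 = np.array([[0.09688219], [0.29239497], [-0.27364426]])

context_words = ['i', 'am', 'because', 'i']
x = np.array([[0.25], [0.25], [0.0], [0.5], [0.0]])

# Matrix product, as in the cell above
z1 = np.dot(W1, x) + b1

# Average of the W1 columns selected by the context words, plus the bias
context_indices = [word2Ind[w] for w in context_words]
z1_from_columns = W1[:, context_indices].mean(axis=1, keepdims=True) + b1

print(z1.ravel())
print(np.allclose(z1, z1_from_columns))
```

In other words, the hidden layer input is the mean of the context words' embedding columns, shifted by the bias.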

Apply ReLU to $\mathbf{z_1}$ to obtain $\mathbf{h}$.

In [64]:
h = relu(z1) # Compute h (z1 after applying ReLU function)
h

array([[0.36483875],
       [0.63710329],
       [0.        ]])

### Values of the output layer


\begin{align}
 \mathbf{z_2} &= \mathbf{W_2}\mathbf{h} + \mathbf{b_2}   \tag{3} \\
 \mathbf{\hat y} &= \mathrm{softmax}(\mathbf{z_2})   \tag{4} \\
\end{align}

**First, compute $\mathbf{z_2}$.**

In [65]:
z2 = np.dot(W2, h) + b2 # Compute z2 (values of the output layer before applying the softmax function)
z2

array([[-0.31973737],
       [-0.28125477],
       [-0.09838369],
       [-0.33512159],
       [-0.19919612]])

Expected output:

    array([[-0.31973737],
           [-0.28125477],
           [-0.09838369],
           [-0.33512159],
           [-0.19919612]])

**Then compute $\mathbf{\hat y}$.**

In [66]:
y_hat = softmax(z2) # Compute y_hat (z2 after applying softmax function)
y_hat

array([[0.18519074],
       [0.19245626],
       [0.23107446],
       [0.18236353],
       [0.20891502]])

Expected output:

    array([[0.18519074],
           [0.19245626],
           [0.23107446],
           [0.18236353],
           [0.20891502]])


## Cross-entropy loss



In [67]:
y_hat

array([[0.18519074],
       [0.19245626],
       [0.23107446],
       [0.18236353],
       [0.20891502]])

The target value is:

In [68]:
y

array([[0.],
       [0.],
       [1.],
       [0.],
       [0.]])

The cross-entropy loss is defined as:

$$ J=-\sum\limits_{k=1}^{V}y_k\log{\hat{y}_k} \tag{6}$$



In [69]:
def cross_entropy_loss(y_predicted, y_actual):
    loss = np.sum(-np.log(y_predicted)*y_actual) # Fill the loss variable with your code
    return loss

In [70]:
cross_entropy_loss(y_hat, y) # Print value of cross entropy loss for prediction and target value

1.4650152923611106

Expected output:

    1.4650152923611106
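With a one-hot target, every term of the sum in equation (6) vanishes except the one for the true center word, so the loss reduces to $-\log \hat{y}_k$ at the target index. The sketch below checks this with the rounded `y_hat` values printed above; `target_index` and `shortcut` are names introduced here for illustration.

```python
import numpy as np

y_hat = np.array([[0.18519074], [0.19245626], [0.23107446], [0.18236353], [0.20891502]])
y = np.array([[0.], [0.], [1.], [0.], [0.]])

# Full sum over the vocabulary, as in equation (6)
full_sum = np.sum(-np.log(y_hat) * y)

# Only the entry at the target index contributes
target_index = int(np.argmax(y))
shortcut = -np.log(y_hat[target_index, 0])

print(full_sum, shortcut)
```

For comparison, a uniform guess over the 5 vocabulary words would give a loss of $\log 5 \approx 1.609$, so this prediction is only slightly better than chance.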

## Backpropagation

\begin{align}
 \frac{\partial J}{\partial \mathbf{W_1}} &= \rm{ReLU}\left ( \mathbf{W_2^\top} (\mathbf{\hat{y}} - \mathbf{y})\right )\mathbf{x}^\top \tag{7}\\
 \frac{\partial J}{\partial \mathbf{W_2}} &= (\mathbf{\hat{y}} - \mathbf{y})\mathbf{h^\top} \tag{8}\\
 \frac{\partial J}{\partial \mathbf{b_1}} &= \rm{ReLU}\left ( \mathbf{W_2^\top} (\mathbf{\hat{y}} - \mathbf{y})\right ) \tag{9}\\
 \frac{\partial J}{\partial \mathbf{b_2}} &= \mathbf{\hat{y}} - \mathbf{y} \tag{10}
\end{align}



### Exercise
Compute the variable `grad_b2` as follows:
$$\frac{\partial J}{\partial \mathbf{b_2}} = \mathbf{\hat{y}} - \mathbf{y}$$

In [71]:
# BEGIN your code here
grad_b2 = y_hat - y # Compute vector with partial derivatives of loss function with respect to b2
# END your code here

grad_b2

array([[ 0.18519074],
       [ 0.19245626],
       [-0.76892554],
       [ 0.18236353],
       [ 0.20891502]])

Expected output:

    array([[ 0.18519074],
           [ 0.19245626],
           [-0.76892554],
           [ 0.18236353],
           [ 0.20891502]])
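One way to gain confidence in a gradient formula is a finite-difference check: nudge each parameter slightly and compare the change in the loss with the analytic gradient. The sketch below does this for $\partial J / \partial \mathbf{b_2} = \mathbf{\hat{y}} - \mathbf{y}$ only, using random parameters that are assumptions made for the check, not the notebook's weights. The helper `loss_for_b2` is hypothetical.

```python
import numpy as np

def softmax(z):
    e_z = np.exp(z - np.max(z))
    return e_z / np.sum(e_z)

def loss_for_b2(b2, W2, h, y):
    """Cross-entropy loss viewed as a function of the output bias b2."""
    y_hat = softmax(np.dot(W2, h) + b2)
    return -np.sum(y * np.log(y_hat))

# Hypothetical random parameters; any reasonable values would do
rng = np.random.default_rng(0)
V, N = 5, 3
W2 = rng.normal(size=(V, N))
b2 = rng.normal(size=(V, 1))
h = np.abs(rng.normal(size=(N, 1)))  # non-negative, like a ReLU output
y = np.zeros((V, 1))
y[2] = 1

# Analytic gradient from equation (10)
analytic_grad_b2 = softmax(np.dot(W2, h) + b2) - y

# Central finite differences, one component of b2 at a time
epsilon = 1e-6
numeric_grad_b2 = np.zeros_like(b2)
for k in range(V):
    step = np.zeros_like(b2)
    step[k] = epsilon
    numeric_grad_b2[k] = (loss_for_b2(b2 + step, W2, h, y)
                          - loss_for_b2(b2 - step, W2, h, y)) / (2 * epsilon)

print(np.max(np.abs(analytic_grad_b2 - numeric_grad_b2)))
```

The printed maximum difference should be tiny compared with the gradient entries themselves.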

### Exercise
Compute the variable `grad_W2` as follows:

$$\frac{\partial J}{\partial \mathbf{W_2}} = (\mathbf{\hat{y}} - \mathbf{y})\mathbf{h^\top} \tag{8}$$

In [72]:
# BEGIN your code here
grad_W2 = np.dot(y_hat - y, h.T) # Compute matrix with partial derivatives of loss function with respect to W2
# END your code here

grad_W2

array([[ 0.06756476,  0.11798563,  0.        ],
       [ 0.0702155 ,  0.12261452,  0.        ],
       [-0.28053384, -0.48988499, -0.        ],
       [ 0.06653328,  0.1161844 ,  0.        ],
       [ 0.07622029,  0.13310045,  0.        ]])

Expected output:

    array([[ 0.06756476,  0.11798563,  0.        ],
           [ 0.0702155 ,  0.12261452,  0.        ],
           [-0.28053384, -0.48988499,  0.        ],
           [ 0.06653328,  0.1161844 ,  0.        ],
           [ 0.07622029,  0.13310045,  0.        ]])

### Exercise

Compute the variable `grad_b1` as follows:

$$\frac{\partial J}{\partial \mathbf{b_1}} = \rm{ReLU}\left ( \mathbf{W_2^\top} (\mathbf{\hat{y}} - \mathbf{y})\right ) \tag{9}$$

In [73]:
# BEGIN your code here
grad_b1 = relu(np.dot(W2.T, y_hat - y)) # Compute vector with partial derivatives of loss function with respect to b1
# END your code here

grad_b1

array([[0.        ],
       [0.        ],
       [0.17045858]])

Expected output:

    array([[0.        ],
           [0.        ],
           [0.17045858]])

### Exercise

Compute the variable `grad_W1` as follows:

$$\frac{\partial J}{\partial \mathbf{W_1}} = \rm{ReLU}\left ( \mathbf{W_2^\top} (\mathbf{\hat{y}} - \mathbf{y})\right )\mathbf{x}^\top \tag{7}$$

In [74]:
# BEGIN your code here
grad_W1 = np.dot(relu(np.dot(W2.T, y_hat - y)), x.T) # Compute matrix with partial derivatives of loss function with respect to W1
# END your code here

grad_W1

array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.04261464, 0.04261464, 0.        , 0.08522929, 0.        ]])

Expected output:

    array([[0.        , 0.        , 0.        , 0.        , 0.        ],
           [0.        , 0.        , 0.        , 0.        , 0.        ],
           [0.04261464, 0.04261464, 0.        , 0.08522929, 0.        ]])

Check the dimensions of all the gradients above.

In [75]:
# BEGIN your code here
print(f'V (vocabulary size): {V}')
print(f'N (embedding size / size of the hidden layer): {N}')
print(f'size of grad_W1: {grad_W1.shape} (NxV)')
print(f'size of grad_b1: {grad_b1.shape} (Nx1)')
print(f'size of grad_W2: {grad_W2.shape} (VxN)')
print(f'size of grad_b2: {grad_b2.shape} (Vx1)')
# END your code here

V (vocabulary size): 5
N (embedding size / size of the hidden layer): 3
size of grad_W1: (3, 5) (NxV)
size of grad_b1: (3, 1) (Nx1)
size of grad_W2: (5, 3) (VxN)
size of grad_b2: (5, 1) (Vx1)


## Gradient descent


\begin{align}
 \mathbf{W_1} &:= \mathbf{W_1} - \alpha \frac{\partial J}{\partial \mathbf{W_1}} \tag{11}\\
 \mathbf{W_2} &:= \mathbf{W_2} - \alpha \frac{\partial J}{\partial \mathbf{W_2}} \tag{12}\\
 \mathbf{b_1} &:= \mathbf{b_1} - \alpha \frac{\partial J}{\partial \mathbf{b_1}} \tag{13}\\
 \mathbf{b_2} &:= \mathbf{b_2} - \alpha \frac{\partial J}{\partial \mathbf{b_2}} \tag{14}\\
\end{align}



In [76]:
alpha = 0.01 # Define alpha

Update the weights $\mathbf{W_1}$:

In [77]:
W1_new = W1 - alpha * grad_W1 # Compute updated W1

Compare the old and new values of $\mathbf{W_1}$:

In [78]:
print('old value of W1:')
print(W1)
print()
print('new value of W1:')
print(W1_new)

old value of W1:
[[ 0.41687358  0.08854191 -0.23495225  0.28320538  0.41800106]
 [ 0.32735501  0.22795148 -0.23951958  0.4117634  -0.23924344]
 [ 0.26637602 -0.23846886 -0.37770863 -0.11399446  0.34008124]]

new value of W1:
[[ 0.41687358  0.08854191 -0.23495225  0.28320538  0.41800106]
 [ 0.32735501  0.22795148 -0.23951958  0.4117634  -0.23924344]
 [ 0.26594987 -0.23889501 -0.37770863 -0.11484675  0.34008124]]


### Exercise

Compute the remaining gradient descent updates.

\begin{align}
 \mathbf{W_2} &:= \mathbf{W_2} - \alpha \frac{\partial J}{\partial \mathbf{W_2}} \tag{12}\\
 \mathbf{b_1} &:= \mathbf{b_1} - \alpha \frac{\partial J}{\partial \mathbf{b_1}} \tag{13}\\
 \mathbf{b_2} &:= \mathbf{b_2} - \alpha \frac{\partial J}{\partial \mathbf{b_2}} \tag{14}\\
\end{align}

In [79]:
# BEGIN your code here
W2_new = W2 - alpha * grad_W2 # Compute updated W2
b1_new = b1 - alpha * grad_b1 # Compute updated b1
b2_new = b2 - alpha * grad_b2 # Compute updated b2
# END your code here

print('W2_new')
print(W2_new)
print()
print('b1_new')
print(b1_new)
print()
print('b2_new')
print(b2_new)

W2_new
[[-0.22249629 -0.43126617  0.13310965]
 [ 0.08406387  0.08000579  0.1772054 ]
 [ 0.18996044 -0.05617378 -0.1790735 ]
 [ 0.06988689 -0.02131322  0.36107434]
 [ 0.33404254 -0.39556489 -0.43959196]]

b1_new
[[ 0.09688219]
 [ 0.29239497]
 [-0.27534885]]

b2_new
[[ 0.03334889]
 [-0.3658584 ]
 [-0.12006629]
 [-0.3498469 ]
 [-0.0722673 ]]


Expected output:

    W2_new
    [[-0.22249629 -0.43126617  0.13310965]
     [ 0.08406387  0.08000579  0.1772054 ]
     [ 0.18996044 -0.05617378 -0.1790735 ]
     [ 0.06988689 -0.02131322  0.36107434]
     [ 0.33404254 -0.39556489 -0.43959196]]

    b1_new
    [[ 0.09688219]
     [ 0.29239497]
     [-0.27534885]]

    b2_new
    [[ 0.03334889]
     [-0.3658584 ]
     [-0.12006629]
     [-0.3498469 ]
     [-0.0722673 ]]
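The individual cells above can be collected into a single function that runs one forward pass, one backward pass, and one parameter update. The sketch below follows equations (1) to (14) as written in this notebook and uses the weights, biases, and first training example defined earlier with `alpha = 0.01`. The name `training_step` is hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    e_z = np.exp(z)
    return e_z / np.sum(e_z)

def training_step(x, y, W1, W2, b1, b2, alpha):
    """One forward pass, backward pass, and update, per equations (1) to (14)."""
    # Forward propagation
    h = relu(np.dot(W1, x) + b1)
    y_hat = softmax(np.dot(W2, h) + b2)
    loss = -np.sum(y * np.log(y_hat))
    # Backpropagation, using the notebook's gradient formulas
    grad_b2 = y_hat - y
    grad_W2 = np.dot(grad_b2, h.T)
    grad_b1 = relu(np.dot(W2.T, grad_b2))
    grad_W1 = np.dot(grad_b1, x.T)
    # Gradient descent update
    return (W1 - alpha * grad_W1, W2 - alpha * grad_W2,
            b1 - alpha * grad_b1, b2 - alpha * grad_b2, loss)

W1 = np.array([[ 0.41687358,  0.08854191, -0.23495225,  0.28320538,  0.41800106],
               [ 0.32735501,  0.22795148, -0.23951958,  0.4117634 , -0.23924344],
               [ 0.26637602, -0.23846886, -0.37770863, -0.11399446,  0.34008124]])
W2 = np.array([[-0.22182064, -0.43008631,  0.13310965],
               [ 0.08476603,  0.08123194,  0.1772054 ],
               [ 0.1871551 , -0.06107263, -0.1790735 ],
               [ 0.07055222, -0.02015138,  0.36107434],
               [ 0.33480474, -0.39423389, -0.43959196]])
b1 = np.array([[0.09688219], [0.29239497], [-0.27364426]])
b2 = np.array([[0.0352008], [-0.36393384], [-0.12775555], [-0.34802326], [-0.07017815]])
x = np.array([[0.25], [0.25], [0.0], [0.5], [0.0]])
y = np.array([[0.0], [0.0], [1.0], [0.0], [0.0]])

W1_new, W2_new, b1_new, b2_new, loss = training_step(x, y, W1, W2, b1, b2, alpha=0.01)
print(f'loss: {loss}')
print(b2_new)
```

Calling `training_step` repeatedly over the examples from `get_training_example` would give a basic training loop.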

## Word embedding


### Option 1: embedding from $\mathbf{W_1}$



In [80]:
W1

array([[ 0.41687358,  0.08854191, -0.23495225,  0.28320538,  0.41800106],
       [ 0.32735501,  0.22795148, -0.23951958,  0.4117634 , -0.23924344],
       [ 0.26637602, -0.23846886, -0.37770863, -0.11399446,  0.34008124]])

In [81]:
# Print corresponding word for each index within vocabulary's range
for i in range(V):
    print(Ind2word[i])

am
because
happy
i
learning


In [82]:
# loop through each word of the vocabulary
for word in word2Ind:
    # extract the column corresponding to the index of the word in the vocabulary
    word_embedding_vector = W1[:, word2Ind[word]]
    
    print(f'{word}: {word_embedding_vector}')

am: [0.41687358 0.32735501 0.26637602]
because: [ 0.08854191  0.22795148 -0.23846886]
happy: [-0.23495225 -0.23951958 -0.37770863]
i: [ 0.28320538  0.4117634  -0.11399446]
learning: [ 0.41800106 -0.23924344  0.34008124]


### Option 2: embedding from $\mathbf{W_2}$

In [83]:
W2.T

array([[-0.22182064,  0.08476603,  0.1871551 ,  0.07055222,  0.33480474],
       [-0.43008631,  0.08123194, -0.06107263, -0.02015138, -0.39423389],
       [ 0.13310965,  0.1772054 , -0.1790735 ,  0.36107434, -0.43959196]])

In [84]:
# loop through each word of the vocabulary
for word in word2Ind:
    # extract the column corresponding to the index of the word in the vocabulary
    word_embedding_vector = W2.T[:, word2Ind[word]]
    
    print(f'{word}: {word_embedding_vector}')

am: [-0.22182064 -0.43008631  0.13310965]
because: [0.08476603 0.08123194 0.1772054 ]
happy: [ 0.1871551  -0.06107263 -0.1790735 ]
i: [ 0.07055222 -0.02015138  0.36107434]
learning: [ 0.33480474 -0.39423389 -0.43959196]


### Option 3: embedding from the average of $\mathbf{W_1}$ and $\mathbf{W_2}$

In [85]:
# BEGIN your code here
W3 = (W1+W2.T)/2 # Compute W3 as the average of W1 and W2 transposed
# END your code here

W3

array([[ 0.09752647,  0.08665397, -0.02389858,  0.1768788 ,  0.3764029 ],
       [-0.05136565,  0.15459171, -0.15029611,  0.19580601, -0.31673866],
       [ 0.19974284, -0.03063173, -0.27839106,  0.12353994, -0.04975536]])

Expected output:

    array([[ 0.09752647,  0.08665397, -0.02389858,  0.1768788 ,  0.3764029 ],
           [-0.05136565,  0.15459171, -0.15029611,  0.19580601, -0.31673866],
           [ 0.19974284, -0.03063173, -0.27839106,  0.12353994, -0.04975536]])

In [86]:
# loop through each word of the vocabulary
for word in word2Ind:
    # extract the column corresponding to the index of the word in the vocabulary
    word_embedding_vector = W3[:, word2Ind[word]]
    
    print(f'{word}: {word_embedding_vector}')

am: [ 0.09752647 -0.05136565  0.19974284]
because: [ 0.08665397  0.15459171 -0.03063173]
happy: [-0.02389858 -0.15029611 -0.27839106]
i: [0.1768788  0.19580601 0.12353994]
learning: [ 0.3764029  -0.31673866 -0.04975536]
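Once each word has an embedding vector, a common way to compare two words is the cosine similarity of their vectors. The sketch below uses the columns of `W1` from Option 1. The helpers `cosine_similarity` and `embedding` are introduced here for illustration, and since these weights were not trained on a real corpus, the resulting numbers only demonstrate the computation.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

word2Ind = {'am': 0, 'because': 1, 'happy': 2, 'i': 3, 'learning': 4}
W1 = np.array([[ 0.41687358,  0.08854191, -0.23495225,  0.28320538,  0.41800106],
               [ 0.32735501,  0.22795148, -0.23951958,  0.4117634 , -0.23924344],
               [ 0.26637602, -0.23846886, -0.37770863, -0.11399446,  0.34008124]])

def embedding(word):
    """Column of W1 that corresponds to the given word."""
    return W1[:, word2Ind[word]]

for other in ['am', 'because', 'happy', 'learning']:
    similarity = cosine_similarity(embedding('i'), embedding(other))
    print(f"similarity('i', '{other}'): {similarity:.4f}")
```

Values close to 1 mean the vectors point in nearly the same direction, and values close to -1 mean they point in opposite directions.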
