<a href="https://colab.research.google.com/github/LeonardoGoncRibeiro/06_MachineLearning/blob/main/02_Advanced/09_Word2Vec_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Word2Vec: Training a word embedding

In this course, we will show how to train a word embedding model. First, we will see how to use spaCy to preprocess textual data. Then, we will look at how to set the hyperparameters of the Word2Vec model, which we will fit using Gensim. Finally, we will save our Word2Vec models to disk so we can reuse them later, and we will use them to build a text classifier.

In this course, we will use the following packages:

In [1]:
!python -m spacy download pt_core_news_sm --quiet

2022-08-03 20:09:27.128494: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
✔ Download and installation successful
You can now load the package via spacy.load('pt_core_news_sm')


In [81]:
import pandas as pd
import numpy as np
import spacy

from gensim.models import Word2Vec
from gensim.models import KeyedVectors

import logging

from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import classification_report

Also, we will use the following dataset:

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
df_train = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/treino.csv')
df_test  = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/teste.csv')

# Spacy

spaCy is a widely used open-source library for NLP. Here, we will use it to preprocess our data. Since our texts are in Portuguese, let's load spaCy's Portuguese pipeline:

In [55]:
nlp = spacy.load('pt_core_news_sm', disable = ["parser", "ner", "tagger", "textcat"])   # We disabled some features to improve efficiency

This object can be used to convert our text into a ```Doc``` object. For instance:

In [6]:
text = "Rio de Janeiro é uma cidade maravilhosa."
doc = nlp(text)
type(doc)

spacy.tokens.doc.Doc

A ```Doc``` stores the tokens of our text. For instance, we can get the first token using:

In [7]:
doc[0]

Rio

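Nice! We can also check the named entities in our text. Note that this requires the ```ner``` component: the output below comes from a run with ```ner``` enabled, and with it disabled, as in the ```spacy.load``` call above, ```doc.ents``` is empty.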

In [8]:
doc.ents

(Rio de Janeiro,)

We can also check if something in our text is a stopword. For instance:

In [9]:
doc[0].is_stop

False

So, it seems that the first word in our text is not a stopword.

Note that, in this course, we will not use a one-hot encoding representation. Instead, we will use Word2Vec. We will take the words from our corpus and learn a vector representation of fixed size for each one. These vectors are learned in a way that is similar to training a neural network.
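To make the contrast with one-hot encoding concrete, here is a minimal sketch in plain Python. The vocabulary and the dense values are hypothetical and hand-picked for illustration; they are not produced by any trained model.

```python
def one_hot(word, vocabulary):
    """Return a one-hot vector: 1.0 at the word's position, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocabulary]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

vocabulary = ["google", "apple", "cidade", "rio"]   # toy vocabulary (assumption)

google_1h = one_hot("google", vocabulary)
apple_1h = one_hot("apple", vocabulary)

# A one-hot vector is as long as the vocabulary, and two different
# words never overlap, so their dot product is always zero.
print(len(google_1h))
print(dot(google_1h, apple_1h))

# Dense vectors have a fixed size (2 here) regardless of vocabulary size.
# These values are invented so that "google" and "apple" point the same way.
dense = {
    "google": [0.9, 0.1],
    "apple":  [0.8, 0.2],
    "cidade": [0.1, 0.9],
}
print(dot(dense["google"], dense["apple"]))
print(dot(dense["google"], dense["cidade"]))
```

With one-hot vectors every pair of distinct words looks equally unrelated, while dense vectors can place related words close together, which is what Word2Vec tries to learn.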

To train a Word2Vec model, we first need to preprocess our text. We will:

* Convert the text to lower case.
* Exclude punctuation and other non-alphabetic tokens.
* Exclude stopwords.
* Keep only titles with at least three remaining words.

Thus, let's create a function to perform the preprocessing of our text:

In [10]:
def GetTreatedText(text_to_treat, nlp):
  valid_tokens = []

  text_lowered = text_to_treat.lower( )                  # Lowercase

  doc = nlp(text_lowered)                                # Getting the doc representation of our text

  for token in doc:
    if (not token.is_stop) and (token.is_alpha):         # Keeping only alphabetic tokens that are not stopwords
      valid_tokens.append(token.text)
      
  if len(valid_tokens) > 2:                              # Keeping only texts with at least 3 valid tokens
    return " ".join(valid_tokens)

  return None                                            # Fewer than 3 valid tokens: treated as missing (NaN in pandas)

Nice! Now, let's test our treatment:

In [11]:
treated_text = GetTreatedText(text, nlp)
treated_text

'rio janeiro cidade maravilhosa'

Nice! Everything worked out fine. Note that, now, to tokenize our text again, we should simply use a ```split( )``` method. 
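The preprocessing steps can also be sketched without spaCy, which may help to see what each step does. This is only an approximation: `TOY_STOPWORDS` and `treat_text_sketch` are hypothetical names, the stopword list is a tiny hand-written sample, and whitespace splitting is much cruder than spaCy's tokenizer.

```python
import string

# Hypothetical, tiny stopword list for illustration only.
TOY_STOPWORDS = {"de", "é", "uma", "que", "do"}

def treat_text_sketch(text, stopwords=TOY_STOPWORDS, min_tokens=3):
    """Lowercase, strip punctuation, drop stopwords and non-alphabetic tokens."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(string.punctuation)    # remove leading/trailing punctuation
        if token.isalpha() and token not in stopwords:
            tokens.append(token)
    if len(tokens) < min_tokens:                 # too short: treat as missing
        return None
    return " ".join(tokens)

treated = treat_text_sketch("Rio de Janeiro é uma cidade maravilhosa.")
print(treated)
print(treated.split(" "))                        # tokenizing the treated text again
print(treat_text_sketch("Olá, mundo!"))          # only two tokens remain
```

Because the treated text is a space-separated string of clean tokens, a plain `split` is enough to recover the token list later.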

# Applying the treatment over our dataset

Now, let's perform the treatment over our dataset. First, let's make a basic test with only one of our titles:

In [12]:
test = df_train.title.iloc[0]
test

'Após polêmica, Marine Le Pen diz que abomina negacionistas do Holocausto'

Let's get the treated text:

In [13]:
treated_text = GetTreatedText(test, nlp)
treated_text

'polêmica marine le pen abomina negacionistas holocausto'

It seems to have worked once again. So, let's apply it to the entire dataframe using ```apply```:

In [14]:
df_train['title_treat'] = df_train.title.apply(lambda x : GetTreatedText(x, nlp))
df_test['title_treat']  = df_test.title.apply(lambda x : GetTreatedText(x, nlp))

Finally, let's check if everything worked out:

In [15]:
df_train[['title', 'title_treat']].head(10)

| | title | title_treat |
|---|---|---|
| 0 | Após polêmica, Marine Le Pen diz que abomina n... | polêmica marine le pen abomina negacionistas h... |
| 1 | Macron e Le Pen vão ao 2º turno na França, em ... | macron le pen turno frança revés siglas tradic... |
| 2 | Apesar de larga vitória nas legislativas, Macr... | apesar larga vitória legislativas macron terá ... |
| 3 | Governo antecipa balanço, e Alckmin anuncia qu... | governo antecipa balanço alckmin anuncia queda... |
| 4 | Após queda em maio, a atividade econômica sobe... | queda maio atividade econômica sobe junho bc |
| 5 | Barcelona vence de virada; Atlético de Madri b... | barcelona vence virada atlético madri bate bay... |
| 6 | 'Spartacus' oferece um duplo retrato de batalh... | spartacus oferece duplo retrato batalhas perdidas |
| 7 | Sobe para 86 o número de mortos no atentado te... | sobe mortos atentado terrorista nice frança |
| 8 | Premiada em Sundance, Crystal Moselle retrata ... | premiada sundance crystal moselle retrata sexi... |
| 9 | Metroviários e ferroviários ameaçam parar na p... | metroviários ferroviários ameaçam parar terça ... |


Nice. Note that our function returns nothing for titles with fewer than three valid tokens, so we may have ended up with some null values. Let's check this:

In [16]:
df_train['title_treat'].isna( ).sum( )

5320

In [17]:
df_test['title_treat'].isna( ).sum( )

2178

Indeed, we have some null values in the train and test sets. We can remove those using the ```dropna( )``` method. Also, we can use the ```drop_duplicates( )``` to drop possible duplicated entries. Thus:

In [18]:
df_train = df_train.dropna(subset = ['title_treat']).drop_duplicates( )
df_test  = df_test.dropna(subset = ['title_treat']).drop_duplicates( )

# Fitting our Word2Vec model

Now that we have treated our data, we can fit our Word2Vec model, which will give us a vector representation for each word. We can do that using the ```Word2Vec``` class from the ```gensim``` package. Some important parameters are:

* ```sg```: Training algorithm. 1 for Skipgram, 0 for Continuous Bag of Words.
* ```window```: How many words will be considered before and after the focused word.
* ```size```: Size of the final vectors (renamed ```vector_size``` in Gensim 4.0).
* ```min_count```: Minimum frequency required for a word to be kept in the vocabulary.
* ```alpha```: Initial learning rate, similar to the learning rate of Neural Networks.
* ```min_alpha```: Final learning rate. ```alpha``` decays linearly to this value as training progresses.
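Two of the parameters above, ```min_count``` and the ```alpha```/```min_alpha``` pair, can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not Gensim's actual code: `prune_vocabulary` and `linear_alpha` are hypothetical helper names, and the toy sentences are invented.

```python
from collections import Counter

def prune_vocabulary(sentences, min_count):
    """Keep only the words that appear at least `min_count` times in the corpus."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {word for word, freq in counts.items() if freq >= min_count}

def linear_alpha(alpha, min_alpha, progress):
    """Learning rate after a fraction `progress` (0.0 to 1.0) of training."""
    return max(min_alpha, alpha - (alpha - min_alpha) * progress)

toy_sentences = [
    ["macron", "le", "pen"],
    ["le", "pen", "frança"],
    ["le", "pen", "turno"],
]

# With min_count = 3, only "le" and "pen" are frequent enough to be kept.
kept = prune_vocabulary(toy_sentences, min_count=3)
print(sorted(kept))

# The learning rate goes from alpha at the start to min_alpha at the end.
for progress in (0.0, 0.5, 1.0):
    print(progress, linear_alpha(0.03, 0.007, progress))
```

Rare words are dropped because they appear in too few contexts to learn a reliable vector, and the decaying learning rate lets the model make large updates early and smaller ones near the end.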

Now, let's instantiate our model:

In [31]:
w2v_model = Word2Vec(sg = 0,
                     window = 2,
                     size = 300,
                     min_count = 3,
                     alpha = 0.03,
                     min_alpha = 0.007)

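Before fitting our model, we first have to build our vocabulary. To do so, we use the ```build_vocab( )``` method, which receives a sequence of token lists (one list per text). A generator expression would use less memory, but a generator can only be consumed once, and we will need to go over the corpus again during training. So, we create a list using a list comprehension: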

In [32]:
list_tokens_list = [text.split(" ") for text in df_train.title_treat]
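The reason for building a list rather than a generator can be checked with a small sketch. The corpus is read once to build the vocabulary and again during training, so it must support more than one pass. The toy titles below are made up for illustration.

```python
titles = ["rio janeiro cidade maravilhosa", "polêmica marine le pen"]

# A generator expression can be iterated only once.
token_gen = (title.split(" ") for title in titles)
gen_first_pass = sum(1 for _ in token_gen)
gen_second_pass = sum(1 for _ in token_gen)     # the generator is already exhausted

# A list can be iterated as many times as needed.
token_list = [title.split(" ") for title in titles]
list_first_pass = sum(1 for _ in token_list)
list_second_pass = sum(1 for _ in token_list)

print(gen_first_pass, gen_second_pass)          # 2 0
print(list_first_pass, list_second_pass)        # 2 2
```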

Now, we can build our vocabulary. To visualize the building of the vocabulary, we can use a log:

In [33]:
logging.basicConfig(format = "%(asctime)s : - %(message)s", level = logging.INFO)
w2v_model.build_vocab(list_tokens_list, progress_per = 5000)

2022-08-03 20:29:24,936 : - collecting all words and their counts
2022-08-03 20:29:24,942 : - PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-08-03 20:29:24,964 : - PROGRESS: at sentence #5000, processed 31930 words, keeping 10193 word types
2022-08-03 20:29:24,983 : - PROGRESS: at sentence #10000, processed 63840 words, keeping 14986 word types
2022-08-03 20:29:25,003 : - PROGRESS: at sentence #15000, processed 95719 words, keeping 18273 word types
2022-08-03 20:29:25,028 : - PROGRESS: at sentence #20000, processed 127647 words, keeping 21016 word types
2022-08-03 20:29:25,049 : - PROGRESS: at sentence #25000, processed 159536 words, keeping 23481 word types
2022-08-03 20:29:25,068 : - PROGRESS: at sentence #30000, processed 191470 words, keeping 25476 word types
2022-08-03 20:29:25,087 : - PROGRESS: at sentence #35000, processed 223330 words, keeping 27311 word types
2022-08-03 20:29:25,107 : - PROGRESS: at sentence #40000, processed 255199 words, keeping 29028

Nice! We can see the number of texts (sentences) in our corpus using:

In [34]:
c_count = w2v_model.corpus_count
c_count

84680

Finally, we can train our Word2Vec model using:

In [36]:
w2v_model.train(list_tokens_list, total_examples = c_count, epochs = 30)

2022-08-03 20:29:41,642 : - training model with 3 workers on 18190 vocabulary and 300 features, using sg=0 hs=0 sample=0.001 negative=5 window=2
2022-08-03 20:29:42,732 : - EPOCH 1 - PROGRESS: at 35.15% examples, 166430 words/s, in_qsize 5, out_qsize 0
2022-08-03 20:29:43,783 : - EPOCH 1 - PROGRESS: at 75.75% examples, 180807 words/s, in_qsize 5, out_qsize 0
2022-08-03 20:29:44,168 : - worker thread finished; awaiting finish of 2 more threads
2022-08-03 20:29:44,172 : - worker thread finished; awaiting finish of 1 more threads
2022-08-03 20:29:44,176 : - worker thread finished; awaiting finish of 0 more threads
2022-08-03 20:29:44,177 : - EPOCH - 1 : training on 541297 raw words (505407 effective words) took 2.5s, 201300 effective words/s
2022-08-03 20:29:45,230 : - EPOCH 2 - PROGRESS: at 55.50% examples, 270032 words/s, in_qsize 5, out_qsize 0
2022-08-03 20:29:45,931 : - worker thread finished; awaiting finish of 2 more threads
2022-08-03 20:29:45,948 : - worker thread finished; await

(15162032, 16238910)

The training of our first model took 55.7 s.

## Evaluating our model

To see if our model is good, we can look at the similarity between words. For instance, if we pass "google" to our model, we expect it to return the most similar words (which will likely be related to big tech companies). Let's test it:

In [37]:
w2v_model.wv.most_similar("google")

2022-08-03 20:30:51,343 : - precomputing L2-norms of word weight vectors


[('apple', 0.591137707233429),
 ('amazon', 0.5327430963516235),
 ('facebook', 0.518150269985199),
 ('disney', 0.5077775120735168),
 ('tesla', 0.49346739053726196),
 ('airbnb', 0.49191713333129883),
 ('alibaba', 0.4917002022266388),
 ('volkswagen', 0.48766130208969116),
 ('yahoo', 0.48442187905311584),
 ('uber', 0.4781746566295624)]

Nice! We really got what we would expect. Let's test another word:

In [38]:
w2v_model.wv.most_similar("china")

[('rock', 0.40720683336257935),
 ('índia', 0.37196022272109985),
 ('chinesa', 0.3669336140155792),
 ('expedia', 0.35940343141555786),
 ('toshiba', 0.3531379699707031),
 ('méxico', 0.35062330961227417),
 ('malvinas', 0.34907084703445435),
 ('bc', 0.3440648913383484),
 ('bce', 0.34233248233795166),
 ('otimismo', 0.3391525149345398)]

Note that, for "china", the results are less coherent. The most similar word is "rock", with "índia" coming right after (which makes more sense). The similarity scores are also lower than those we got for "google".
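The scores returned by ```most_similar``` are cosine similarities between word vectors. The sketch below reproduces the idea in plain Python on hypothetical three-dimensional vectors; `most_similar_sketch` and the values in `toy_vectors` are assumptions made for illustration, not part of Gensim or of our trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar_sketch(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hand-picked toy vectors (not learned from data).
toy_vectors = {
    "google": [1.0, 0.2, 0.0],
    "apple":  [0.9, 0.3, 0.1],
    "china":  [0.0, 0.1, 1.0],
    "índia":  [0.1, 0.0, 0.9],
}

for neighbor, score in most_similar_sketch("google", toy_vectors):
    print(neighbor, round(score, 3))
```

A score close to 1 means that two vectors point in almost the same direction, while a score close to 0 means that they are nearly unrelated.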

# Fitting another Word2Vec model

Note that, here, our model was based on a continuous bag of words (CBOW). However, we can also fit a SKIPGRAM model to our data. To do so, we can instantiate a different model (this time, we also use a larger window):

In [39]:
w2v_skipg = Word2Vec(sg = 1,
                     window = 5,
                     size = 300,
                     min_count = 3,
                     alpha = 0.03,
                     min_alpha = 0.007)

w2v_skipg.build_vocab(list_tokens_list, progress_per = 5000)

c_count = w2v_skipg.corpus_count

w2v_skipg.train(list_tokens_list, total_examples = c_count, epochs = 30)

2022-08-03 20:31:45,091 : - collecting all words and their counts
2022-08-03 20:31:45,096 : - PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-08-03 20:31:45,115 : - PROGRESS: at sentence #5000, processed 31930 words, keeping 10193 word types
2022-08-03 20:31:45,132 : - PROGRESS: at sentence #10000, processed 63840 words, keeping 14986 word types
2022-08-03 20:31:45,149 : - PROGRESS: at sentence #15000, processed 95719 words, keeping 18273 word types
2022-08-03 20:31:45,167 : - PROGRESS: at sentence #20000, processed 127647 words, keeping 21016 word types
2022-08-03 20:31:45,187 : - PROGRESS: at sentence #25000, processed 159536 words, keeping 23481 word types
2022-08-03 20:31:45,209 : - PROGRESS: at sentence #30000, processed 191470 words, keeping 25476 word types
2022-08-03 20:31:45,230 : - PROGRESS: at sentence #35000, processed 223330 words, keeping 27311 word types
2022-08-03 20:31:45,250 : - PROGRESS: at sentence #40000, processed 255199 words, keeping 29028

(15161764, 16238910)

Note that the Skipgram model has a higher training time. Let's see the most similar words once again:

In [41]:
w2v_skipg.wv.most_similar("google")

2022-08-03 20:34:29,717 : - precomputing L2-norms of word weight vectors


[('alphabet', 0.4390574097633362),
 ('antitruste', 0.43469756841659546),
 ('android', 0.4245913624763489),
 ('reguladores', 0.4116258919239044),
 ('waze', 0.4096982181072235),
 ('paypal', 0.40019655227661133),
 ('lyft', 0.3987908363342285),
 ('apple', 0.39382725954055786),
 ('difusão', 0.38902547955513),
 ('buffett', 0.3868878483772278)]

Note that, this time, the neighbors are not just other big tech companies. We got "alphabet" (the company that owns Google), "antitruste" (antitrust, likely from news about regulatory action involving Google), "android", and others.

Let's check china:

In [43]:
w2v_skipg.wv.most_similar("china")

[('town', 0.4401307702064514),
 ('desvalorizar', 0.4094200134277344),
 ('taiwan', 0.3961268663406372),
 ('chinês', 0.3897581100463867),
 ('expedia', 0.38523411750793457),
 ('yuan', 0.38365963101387024),
 ('felipão', 0.37362807989120483),
 ('estabilização', 0.3668305575847626),
 ('bilateral', 0.36438506841659546),
 ('sobrevoam', 0.36179083585739136)]

Again, we got different words, such as "town" (likely from chinatown), "desvalorizar", "taiwan", and others.

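A likely explanation is that the Skipgram model takes into consideration more information about the context of the word. Note, however, that this model was also trained with a larger window (5 instead of 2), so the differences cannot be attributed to the architecture alone.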
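To make the difference between the two architectures and the effect of ```window``` more concrete, the sketch below builds the training examples each one would see for a single title. It is a simplification: the function names are hypothetical, and the real training also involves details such as subsampling and negative sampling (the ```sample``` and ```negative``` values shown in the training log).

```python
def context_window(tokens, index, window):
    """Words up to `window` positions before and after position `index`."""
    start = max(0, index - window)
    return tokens[start:index] + tokens[index + 1:index + 1 + window]

def cbow_examples(tokens, window):
    """CBOW: predict the center word from all of its context words at once."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]

def skipgram_examples(tokens, window):
    """Skip-gram: predict each context word separately from the center word."""
    return [(tokens[i], context)
            for i in range(len(tokens))
            for context in context_window(tokens, i, window)]

tokens = "rio janeiro cidade maravilhosa".split(" ")

print(cbow_examples(tokens, window=2)[0])        # (['janeiro', 'cidade'], 'rio')
print(len(cbow_examples(tokens, window=2)))      # one example per word
print(len(skipgram_examples(tokens, window=2)))  # one example per (word, context) pair
print(len(skipgram_examples(tokens, window=5)))  # a larger window yields more pairs
```

Skip-gram produces many more training examples from the same text, which is consistent with its longer training time.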

# Saving our Word2Vec models

Finally, to save our model, we can do:

In [44]:
w2v_model.wv.save_word2vec_format("/content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_cbow.txt", binary = False)
w2v_skipg.wv.save_word2vec_format("/content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_skip.txt", binary = False)

2022-08-03 20:38:33,880 : - storing 18190x300 projection weights into /content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_cbow.txt
2022-08-03 20:38:39,196 : - storing 18190x300 projection weights into /content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_skip.txt


To load the saved vectors later, we can do:

In [65]:
w2v_cbow = KeyedVectors.load_word2vec_format("/content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_cbow.txt")
w2v_skip = KeyedVectors.load_word2vec_format("/content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_skip.txt")

2022-08-03 21:02:56,581 : - loading projection weights from /content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_cbow.txt
2022-08-03 21:03:01,941 : - loaded (18190, 300) matrix from /content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_cbow.txt
2022-08-03 21:03:01,944 : - loading projection weights from /content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_skip.txt
2022-08-03 21:03:05,036 : - loaded (18190, 300) matrix from /content/drive/MyDrive/Colab Notebooks/Machine Learning/Avançado/model_skip.txt
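With ```binary = False```, the word2vec format is a plain text file: a header line with the vocabulary size and the vector size (the "18190x300" in the log above), followed by one line per word with its values. The sketch below writes and reads this layout with the standard library only. The helper names are hypothetical, and this is an illustration of the file layout, not a replacement for Gensim's functions.

```python
import io

def write_word2vec_text(vectors, handle):
    """Write a {word: vector} dict in the plain-text word2vec layout."""
    dim = len(next(iter(vectors.values())))
    handle.write(f"{len(vectors)} {dim}\n")              # header: vocab size, vector size
    for word, vec in vectors.items():
        handle.write(word + " " + " ".join(str(x) for x in vec) + "\n")

def read_word2vec_text(handle):
    """Read the layout back into a {word: vector} dict."""
    vocab_size, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        parts = handle.readline().rstrip("\n").split(" ")
        vector = [float(x) for x in parts[1:]]
        assert len(vector) == dim                        # every row must match the header
        vectors[parts[0]] = vector
    return vectors

toy = {"rio": [0.1, -0.2, 0.3], "janeiro": [0.0, 0.5, -1.0]}   # invented values

buffer = io.StringIO()                                   # in-memory stand-in for a file
write_word2vec_text(toy, buffer)
print(buffer.getvalue())

buffer.seek(0)
restored = read_word2vec_text(buffer)
print(restored == toy)
```

Note that this format stores only the word vectors, which is why we load them back as ```KeyedVectors``` and not as a full ```Word2Vec``` model.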


# Using our new model to assist in our classification

We have now built our Word2Vec models. Each one maps a word to a vector, which lets us create a numerical representation of our texts and then fit a classifier to it.

Our Word2Vec model returns a vectorized representation of a single word. There are many ways to get a vectorized representation of an entire text. The simplest approach is to sum the vectors of all the words in the text. Let's define a function to get the vectorized representation of a text:

In [66]:
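def SumCombination(treated_text, w2v_model, nlp):
  # Note: nlp is not used here, since the text was already treated
  resulting_vector = np.zeros(300)               # Initializing our vector (same size as our word vectors)

  tokens = treated_text.split(' ')               # Getting the tokens from our text
  
  for token in tokens:
    try:
      resulting_vector += w2v_model.get_vector(token)   # Adding the vector of each known word
    except KeyError:                                    # Words outside the vocabulary are skipped
      pass

  return resulting_vector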

Let's try an example:

In [67]:
example = df_train.title_treat.iloc[0]
example

'polêmica marine le pen abomina negacionistas holocausto'

In [68]:
SumCombination(example, w2v_cbow, nlp)

array([ 1.1639581 ,  0.99361866,  1.58946272, -1.70519099, -0.58655722,
        2.76381978, -1.48856495,  2.61652827, -2.8249495 , -1.11621271,
       -1.21121646, -0.35616846, -1.81881744, -4.00770722, -0.69931941,
        1.52200159,  1.20268416, -1.00832912,  4.1725359 , -0.66098908,
        2.01742315, -1.92499796,  0.29944097,  0.5762695 , -3.5113394 ,
       -1.78334784,  2.60668474, -0.52663851,  4.239319  ,  2.74070652,
       -0.5602636 ,  0.58146892,  0.46159234,  1.89427415, -0.50434637,
        1.54227377,  1.70399833, -0.57121015, -1.33123249,  2.57822677,
        0.66038442,  1.21749611, -1.81092528, -2.21237579,  1.45097527,
        0.39411859,  1.69874291, -0.18345772,  1.64049189, -0.52350726,
        0.43626305, -1.25838883,  1.78670114, -0.8610464 ,  0.67001021,
        0.42579739,  0.62606178, -0.14037363, -0.9631228 ,  0.7100436 ,
       -2.80344407,  1.17153793,  1.04110266, -2.50932276, -1.76449172,
        0.17380876,  1.36496691, -0.91505472,  1.61307287,  1.19

Nice! We got a resulting vector, which is given by the sum of all vectors from the treated text. Now, let's create a function to get the resulting vectors for our entire dataset:

In [78]:
def GetAllResVectors(texts, w2v_model, nlp):
  x = len(texts)
  y = 300
  resulting_vectors = np.zeros((x, y))

  for i in range(x):
    text_i = texts.iloc[i]
    vector = SumCombination(text_i, w2v_model, nlp)
    resulting_vectors[i] = vector

  return resulting_vectors

Now, we do:

In [79]:
X_train = GetAllResVectors(df_train.title_treat, w2v_cbow, nlp)
X_test  = GetAllResVectors(df_test.title_treat, w2v_cbow, nlp)

Note that, here, we created our vectors using the CBOW model. Later, we will compare our results with the ones using SKIPGRAM. We can also get our target labels using:

In [80]:
y_train = df_train.category
y_test  = df_test.category
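The sum combination used above can be checked by hand on toy vectors. The sketch below also shows the mean of the word vectors, a common alternative that is not used in this notebook and that keeps long and short titles on the same scale. All names and values are hypothetical.

```python
def sum_vectors(tokens, vectors, size):
    """Sum the vectors of the known tokens; unknown tokens are skipped."""
    total = [0.0] * size
    for token in tokens:
        if token in vectors:
            total = [a + b for a, b in zip(total, vectors[token])]
    return total

def mean_vectors(tokens, vectors, size):
    """Average the vectors of the known tokens (zeros if none is known)."""
    known = [token for token in tokens if token in vectors]
    if not known:
        return [0.0] * size
    return [x / len(known) for x in sum_vectors(known, vectors, size)]

toy_vectors = {"rio": [1.0, 0.0], "janeiro": [0.0, 2.0]}   # invented 2-d vectors
tokens = "rio janeiro desconhecida".split(" ")             # last token is out of vocabulary

print(sum_vectors(tokens, toy_vectors, size=2))    # [1.0, 2.0]
print(mean_vectors(tokens, toy_vectors, size=2))   # [0.5, 1.0]
```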

## Dummy classifier

Before fitting more complex models, let's fit a dummy classifier and see how it behaves. This model will serve as a baseline. We can do this with:

In [82]:
dummy = DummyClassifier( )
dummy.fit(X_train, y_train)
acc = dummy.score(X_test, y_test)*100
print("Accuracy: {:.2f}%".format(acc))

Accuracy: 25.40%


Our dummy classifier got a very low accuracy. Let's also get a classification report:

In [83]:
y_pred = dummy.predict(X_test)
cr = classification_report(y_test, y_pred)
print(cr)



              precision    recall  f1-score   support

     colunas       0.00      0.00      0.00      3940
   cotidiano       0.00      0.00      0.00      1696
     esporte       0.25      1.00      0.41      4657
   ilustrada       0.00      0.00      0.00       131
     mercado       0.00      0.00      0.00      5861
       mundo       0.00      0.00      0.00      2050

    accuracy                           0.25     18335
   macro avg       0.04      0.17      0.07     18335
weighted avg       0.06      0.25      0.10     18335





It seems that our dummy classifier identified 'esporte' as the most frequent category in the training set, and it predicted that category for every title. It showed a 100% recall for this category, but 0% for the others, and its accuracy is simply the share of 'esporte' titles in the test set (4657 out of 18335).
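The behaviour seen in the report can be reproduced with a few lines of plain Python. This is only a sketch of a majority-class baseline on invented labels, with hypothetical helper names; it is not scikit-learn's implementation.

```python
from collections import Counter

def fit_majority(labels):
    """Return the most frequent label in the training labels."""
    return Counter(labels).most_common(1)[0][0]

def recall_for(label, y_true, y_pred):
    """Share of the true `label` examples that were predicted as `label`."""
    predictions = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predictions) / len(predictions)

# Toy labels (assumption): 'esporte' is the most frequent training category.
toy_train = ["esporte", "esporte", "esporte", "mercado", "mundo"]
toy_test = ["esporte", "mercado", "mercado", "mundo"]

majority = fit_majority(toy_train)
toy_pred = [majority] * len(toy_test)            # same prediction for every example

accuracy = sum(t == p for t, p in zip(toy_test, toy_pred)) / len(toy_test)
print(majority)                                  # esporte
print(accuracy)                                  # 0.25
print(recall_for("esporte", toy_test, toy_pred)) # 1.0
print(recall_for("mercado", toy_test, toy_pred)) # 0.0
```

Any useful classifier should beat this baseline, since it ignores the input features completely.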

# Logistic regression - Using CBOW

Finally, let's fit a logistic regression using the vectors obtained via CBOW. These vectors can be obtained using:

In [84]:
X_train = GetAllResVectors(df_train.title_treat, w2v_cbow, nlp)
X_test  = GetAllResVectors(df_test.title_treat, w2v_cbow, nlp)

To fit the Logistic Regression model and get its accuracy, we can do:

In [87]:
logreg = LogisticRegression(max_iter = 200)
logreg.fit(X_train, y_train)
acc = logreg.score(X_test, y_test)*100
print("Accuracy: {:.2f}%".format(acc))

Accuracy: 76.25%


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Nice! Our model had some issues with convergence (it reached the iteration limit), but, still, we got a much higher accuracy this time. Let's print the classification report:

In [89]:
y_pred = logreg.predict(X_test)
cr = classification_report(y_test, y_pred)
print(cr)

              precision    recall  f1-score   support

     colunas       0.70      0.53      0.60      3940
   cotidiano       0.63      0.81      0.71      1696
     esporte       0.92      0.87      0.90      4657
   ilustrada       0.12      0.85      0.22       131
     mercado       0.83      0.79      0.81      5861
       mundo       0.73      0.84      0.78      2050

    accuracy                           0.76     18335
   macro avg       0.66      0.78      0.67     18335
weighted avg       0.79      0.76      0.77     18335



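Note that the recall and precision for most categories are now much higher. The exception is 'ilustrada', which has only 131 test examples: its recall is high (0.85), but its precision is only 0.12.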
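The columns of the classification report can be computed by hand from the true and predicted labels. The sketch below uses a hypothetical helper and invented labels in which a rare category is predicted too often, which gives the same pattern of high recall and low precision.

```python
def precision_recall_f1(label, y_true, y_pred):
    """Precision, recall and F1-score of a single category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)   # correctly predicted as label
    fp = sum(t != label and p == label for t, p in pairs)   # wrongly predicted as label
    fn = sum(t == label and p != label for t, p in pairs)   # label examples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data (assumption): one true 'ilustrada' title, but five predicted as such.
toy_true = ["ilustrada"] + ["colunas"] * 9
toy_pred = ["ilustrada"] * 5 + ["colunas"] * 5

precision, recall, f1 = precision_recall_f1("ilustrada", toy_true, toy_pred)
print(round(precision, 2), round(recall, 2), round(f1, 2))   # 0.2 1.0 0.33
```

Every true example of the rare category is found (recall of 1.0), but most predictions of that category are wrong (precision of 0.2).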

# Logistic regression - Using SKIPGRAM

Now, for comparison purposes, we can try to fit a similar model, but now using the SKIPGRAM model to get the resulting vectors. These vectors can be obtained using:

In [90]:
X_train = GetAllResVectors(df_train.title_treat, w2v_skip, nlp)
X_test  = GetAllResVectors(df_test.title_treat, w2v_skip, nlp)

To fit the Logistic Regression model and get its accuracy, we can do:

In [91]:
logreg = LogisticRegression(max_iter = 200)
logreg.fit(X_train, y_train)
acc = logreg.score(X_test, y_test)*100
print("Accuracy: {:.2f}%".format(acc))

Accuracy: 77.09%


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Using the SKIPGRAM, we got a slightly higher accuracy. Let's get the classification report:

In [92]:
y_pred = logreg.predict(X_test)
cr = classification_report(y_test, y_pred)
print(cr)

              precision    recall  f1-score   support

     colunas       0.70      0.53      0.61      3940
   cotidiano       0.64      0.80      0.71      1696
     esporte       0.93      0.88      0.91      4657
   ilustrada       0.14      0.88      0.24       131
     mercado       0.83      0.80      0.82      5861
       mundo       0.75      0.85      0.80      2050

    accuracy                           0.77     18335
   macro avg       0.67      0.79      0.68     18335
weighted avg       0.80      0.77      0.78     18335



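In general, it seems that the precision and recall are slightly higher (when compared to the CBOW model), and the accuracy went from 76.25% to 77.09%. In this case, the SKIPGRAM seems like a better method to perform the vectorization of our words. Still, the difference is small, the two Word2Vec models were trained with different window sizes, and neither logistic regression fully converged, so this comparison should be read with some caution.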