# word2vec implementation

The cells below implement a function that converts an array of keywords and an array of their respective weights into a single vector, using the weighted average of the keywords' vectors. The final cell gives an example of how it can be used to combine the meanings of words.
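
Written out, with $v_i$ for the vector of the $i$-th keyword and $w_i$ for its weight (symbols introduced here only for illustration), the combined vector $m$ is:

$$
m = \frac{\sum_{i=1}^{n} w_i \, v_i}{\sum_{i=1}^{n} w_i}
$$

With equal weights this reduces to the plain mean of the keyword vectors.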

I used the 320-dimensional model because it gives better results. If we end up needing a model with fewer dimensions, this is easy to change.

In [2]:
import numpy as np
from gensim.models import KeyedVectors

# this is the model containing all the vectors for different words
wv = KeyedVectors.load_word2vec_format('vectors/cow-320.txt', binary=False)

In [3]:
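def keywords2vec(keywords, weights):
    '''
    input: equal-sized arrays of keywords and their respective weights
    output: weighted average of the keywords' vectors
    '''
    if len(keywords) != len(weights):
        raise ValueError("keywords and weights must have the same length")

    # start from a zero vector with the model's dimensionality (320 for cow-320)
    meaning_vec = np.zeros(wv.vector_size)
    for keyword, weight in zip(keywords, weights):
        # wv[keyword] raises a KeyError if the keyword is not in the vocabulary
        meaning_vec += wv[keyword] * weight

    weights_tot = sum(weights)
    if weights_tot == 0:
        raise ValueError("weights must not sum to zero")

    return meaning_vec / weights_tot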
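
The same weighted average can probably be written more compactly with `np.average`. The cell below is only a sketch: `keywords2vec_vectorized`, its `lookup` parameter, and the toy 3-dimensional vectors are made up for illustration, so it runs without loading the model.

```python
import numpy as np

def keywords2vec_vectorized(keywords, weights, lookup):
    """Weighted average of keyword vectors, computed in one NumPy call.

    `lookup` is a hypothetical parameter: anything that maps a word to its
    vector, such as a plain dict (used below) or a loaded KeyedVectors object.
    """
    vectors = np.stack([lookup[word] for word in keywords])
    # np.average divides by the sum of the weights, like the loop version
    return np.average(vectors, axis=0, weights=weights)

# Toy vectors, invented for this example (not taken from the real model)
toy_vectors = {
    "appel": np.array([1.0, 0.0, 2.0]),
    "banaan": np.array([3.0, 4.0, 0.0]),
}

# (1 * appel + 3 * banaan) / 4 = [2.5, 3.0, 0.5]
toy_vec = keywords2vec_vectorized(["appel", "banaan"], [1.0, 3.0], toy_vectors)
print(toy_vec)
```

Passing `wv` as `lookup` should give the same result as `keywords2vec`, since both compute the same formula.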

In [4]:
keywords = ["appel", "banaan"]
weights = [0.5, 0.5]

vec = keywords2vec(keywords, weights)
print(wv.most_similar(vec))

[('appel', 0.9236449003219604), ('banaan', 0.9236448407173157), ('ananas', 0.8367544412612915), ('sinaasappel', 0.814830482006073), ('aardbei', 0.8037692308425903), ('meloen', 0.8004668951034546), ('sinasappel', 0.7890397906303406), ('watermeloen', 0.782245397567749), ('perzik', 0.778252363204956), ('dadel', 0.776611864566803)]
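
`most_similar` ranks words by cosine similarity, so with equal weights the combined vector scores almost identically against 'appel' and 'banaan', as in the output above. The sketch below illustrates the effect of the weights using made-up 2-dimensional unit vectors rather than the real model; `cosine_similarity` is a helper defined here only for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy unit vectors, invented for illustration only
appel = np.array([1.0, 0.0])
banaan = np.array([0.0, 1.0])

# Equal weights: the combined vector sits exactly between the two words
equal = (0.5 * appel + 0.5 * banaan) / (0.5 + 0.5)
# Unequal weights: the combined vector leans toward the heavier keyword
skewed = (0.9 * appel + 0.1 * banaan) / (0.9 + 0.1)

print(cosine_similarity(equal, appel), cosine_similarity(equal, banaan))
print(cosine_similarity(skewed, appel), cosine_similarity(skewed, banaan))
```

In the first line both similarities are equal; in the second, the similarity to 'appel' is clearly higher. This assumes both keyword vectors have the same length, which is the case for these toy vectors.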
