## Loading pre-trained sense vectors 

To experiment with word sense embeddings, you can use a pretrained model (sense vectors and sense probabilities). These sense vectors were induced from Wikipedia using word2vec similarities between words in ego-networks. Sense probabilities are stored in a separate file located next to the file with sense vectors.

In [2]:
import sensegram
sense_vectors_fpath = "model/wikipedia-ru-2018.txt.clusters.minsize5-1000-sum-score-20.vectors.bin"
sv = sensegram.SenseGram.load_word2vec_format(sense_vectors_fpath, binary=False)

## Getting the list of senses of a word 

Probabilities of senses will be loaded automatically if placed in the same folder as sense vectors and named according to the same scheme as our pretrained files.

To examine how many senses were learned for a word, call the `get_senses` function:

In [25]:
word = "ключ"
sv.get_senses(word)

[('ключ#1', 0.33494),
 ('ключ#2', 0.279518),
 ('ключ#3', 0.248193),
 ('ключ#4', 0.125301),
 ('ключ#5', 0.012048)]

## Sense aware nearest neighbors

The `get_senses` function returns a list of sense names with the probability of each sense. As the output above shows, our model has learned five senses for the word "ключ".
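Sense identifiers follow a `word#index` pattern, so the list returned by `get_senses` is easy to post-process. The sketch below works on the output shown above, copied in as a literal so that it runs without the model; `split_sense_id` is a hypothetical helper and not part of sensegram.

```python
# Output of sv.get_senses("ключ") shown above, copied as a literal list.
senses = [
    ("ключ#1", 0.33494),
    ("ключ#2", 0.279518),
    ("ключ#3", 0.248193),
    ("ключ#4", 0.125301),
    ("ключ#5", 0.012048),
]


def split_sense_id(sense_id):
    """Split 'word#index' into (word, index). Hypothetical helper."""
    word, _, index = sense_id.rpartition("#")
    return word, int(index)


# Most probable sense and the total probability mass over all senses.
best_sense, best_prob = max(senses, key=lambda pair: pair[1])
total_prob = sum(prob for _, prob in senses)

print(split_sense_id(best_sense), best_prob)
print(round(total_prob, 6))
```

For this word the probabilities sum to 1, and the first sense is the most probable one.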

To understand which word sense a sense vector represents, use the `most_similar` function:


In [19]:
word = "ключ"
for sense_id, prob in sv.get_senses(word):
    print(sense_id)
    print("="*20)
    for rsense_id, sim in sv.wv.most_similar(sense_id):
        print("{} {:f}".format(rsense_id, sim))
    print("\n")

ключ#1
Ключ#1 0.991653
колодец#2 0.915456
ЯМА#2 0.899536
Яма#2 0.899536
яма#2 0.899536
ищущий#2 0.897314
Ищущий#2 0.897314
Иллюзий#4 0.896384
Змеиной#3 0.894822
змеиной#4 0.894822


ключ#2
Ключ#2 0.995825
дубликат#1 0.942747
Дубликат#1 0.942747
чемоданчик#1 0.935276
брелок#2 0.929783
Брелок#2 0.929783
Кошелёк#1 0.926213
Кулон#1 0.923792
кулон#1 0.923792
Кейс#2 0.923318


ключ#3
симметричный_ключ#1 0.948021
ключевой_поток#1 0.945735
зашифрованный_текст#1 0.944024
сеансовый_ключ#1 0.944022
ПИН_код#1 0.939620
пин_код#1 0.939620
шифротекст#2 0.938751
шифротекст_formula_#1 0.937948
Секретный_ключ#1 0.936267
секретный_ключ#1 0.936267


ключ#4
Ключ#4 0.952905
яр#1 0.952719
овраг#1 0.951952
Яр#2 0.951806
порог#2 0.951373
Порог#2 0.951373
ЯР#2 0.949762
хуторок#2 0.945221
Овраг#1 0.940528
ерик#1 0.931127


ключ#5
Древу#2 0.919379
Эросу#1 0.899062
Юпитеру#1 0.893557
Содому#1 0.892363
Тебатиманкчсатт#2 0.886120
веку#1 0.883292
Создателю#1 0.883041
Веку#1 0.882586
возвращающим#1 0.877548
создателю#
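The neighbour lists above contain case variants of the same entry with identical scores (for example "дубликат#1" and "Дубликат#1"). If that is distracting, the variants can be collapsed after the fact. This is a minimal sketch that assumes the `(sense_id, similarity)` pair format printed above; `collapse_case_variants` is a hypothetical helper, and the input is a literal copy of part of the "ключ#2" output.

```python
def collapse_case_variants(neighbors):
    """Keep the first (highest-ranked) entry per case-insensitive sense id.

    The whole id, including the '#index' suffix, is used as the key, so
    entries such as 'Яр#2' and 'яр#1' are deliberately kept apart.
    """
    seen = set()
    unique = []
    for sense_id, sim in neighbors:
        key = sense_id.casefold()
        if key not in seen:
            seen.add(key)
            unique.append((sense_id, sim))
    return unique


# First six neighbours of "ключ#2" from the output above.
neighbors = [
    ("Ключ#2", 0.995825),
    ("дубликат#1", 0.942747),
    ("Дубликат#1", 0.942747),
    ("чемоданчик#1", 0.935276),
    ("брелок#2", 0.929783),
    ("Брелок#2", 0.929783),
]

collapsed = collapse_case_variants(neighbors)
for sense_id, sim in collapsed:
    print("{} {:f}".format(sense_id, sim))
```

Keying on the full sense id is the conservative choice: variants that the model assigned to different sense indices stay separate.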

## Word sense disambiguation: loading word embeddings

To use our word sense disambiguation (WSD) mechanism you also need word vectors or context vectors, depending on the disambiguation strategy. The word vectors are located in the ``model`` directory and have the extension ``.vectors``.

Our WSD mechanism is based on word similarities (`sim`) and requires word vectors to represent context words. In the following, we provide a disambiguation example using the similarity strategy.

First, load the word vectors using the gensim library:


In [24]:
from gensim.models import KeyedVectors
word_vectors_fpath = "model/wikipedia-ru-2018.txt.vectors"
wv = KeyedVectors.load_word2vec_format(word_vectors_fpath, binary=False, unicode_errors="ignore")

In [29]:
from wsd import WSD
wsd_model = WSD(sv, wv, window=5, method='sim', filter_ctx=3)

Disambiguation method: sim
Filter context: f = 3


In [38]:
word = "ключ"
context = "ключ — информация в криптографии, используемая алгоритмом для преобразования сообщения при шифровании или расшифровании."
wsd_model.dis_text(context, word, 0, 4)

('ключ#3',
 [-0.092931547260743735,
  0.28434546029012309,
  0.36909524453328935,
  -0.17320387329222187,
  -0.011932713897268587])
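In the result above, the returned sense coincides with the position of the largest score, which suggests that the list holds one score per sense, in sense order. This is an observation about this output rather than a documented guarantee. A small check on a literal copy of the result:

```python
# Result of the dis_text call above, copied as a literal.
predicted, scores = (
    "ключ#3",
    [
        -0.092931547260743735,
        0.28434546029012309,
        0.36909524453328935,
        -0.17320387329222187,
        -0.011932713897268587,
    ],
)

# Index of the highest score; sense numbering starts at 1.
best_index = max(range(len(scores)), key=scores.__getitem__)
inferred = "ключ#{}".format(best_index + 1)

print(inferred, inferred == predicted)
```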

In [31]:
context = "ключ - Металлический стержень особой формы для отпирания и запирания замка"
wsd_model.dis_text(context, word, 0, 4)

('ключ#2',
 [0.048491791449905092,
  0.45415751329647575,
  0.28697937282923136,
  -0.063768491451374393,
  0.075406151259497817])

In [36]:
context = "ключ, Родник — естественный выход подземных вод на земную поверхность на суше или под водой"
wsd_model.dis_text(context, word, 0, 4)

('ключ#4',
 [0.20482207192807622,
  0.23251588640731458,
  0.12376888477421301,
  0.32699153638902612,
  0.13794625108490285])
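In all three calls above, the last two arguments (`0` and `4`) match the character span of "ключ" at the start of the context, so they appear to be the start and end offsets of the target word. That reading is an assumption based on these examples. Under that assumption, the offsets for a target that is not at the start of the context could be computed with a hypothetical helper like this:

```python
def target_span(context, word):
    """Return (start, end) character offsets of the first occurrence of word."""
    start = context.find(word)
    if start == -1:
        raise ValueError("target word not found in context")
    return start, start + len(word)


context = "ключ - Металлический стержень особой формы для отпирания и запирания замка"
start, end = target_span(context, "ключ")
print(start, end)

# A made-up context in which the target word is not at position 0.
other = "Он потерял ключ от квартиры"
other_span = target_span(other, "ключ")
print(other_span)

# Assumed usage: wsd_model.dis_text(other, "ключ", *other_span)
```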