# Word2Vec


FastText provides two models for computing word representations: skipgram and cbow (continuous bag of words).
The skipgram model learns to predict a target word from a single nearby word. The cbow model, by contrast, predicts the target word from its whole context, represented as a bag of the words in a fixed-size window around the target.
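
To make the difference concrete, here is a minimal sketch of the training examples each model would see for a short sentence. The function names, the window size, and the example sentence are illustrative assumptions, not part of fastText; the real implementation also samples the window size and subsamples frequent words.

```python
def skipgram_pairs(tokens, window=2):
    """Return (nearby_word, target) pairs: one nearby word predicts the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((tokens[j], target))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, target) examples: the whole context predicts the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


if __name__ == "__main__":
    sentence = "no focal lung opacity".split()  # hypothetical sentence
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
```

Skipgram yields one example per (nearby word, target) pair, while cbow yields a single example per target position.
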
The model used here was trained on the Rx thorax corpus (3M words) with the skipgram model, using the following parameters:
- Vector size: 100 dimensions
- Subwords between 3 and 6 characters: the subwords are all the substrings of a word whose length lies between the minimum size (minn) and the maximum size (maxn).
- Number of epochs: 5
- Learning rate: 0.05
- Threads: 12
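
As a rough sketch, the settings above could be passed to the Python bindings as follows. This assumes the current `fasttext` package (lowercase module name) and its `train_unsupervised` function; the helper name `train_embeddings` and both file paths are hypothetical, and the actual training command used for this model is not shown in this notebook.

```python
# Hyperparameters from the list above, as keyword arguments
# for fasttext.train_unsupervised (assumed API of the `fasttext` package).
TRAIN_PARAMS = {
    "model": "skipgram",  # skipgram rather than cbow
    "dim": 100,           # vector size
    "minn": 3,            # minimum subword length
    "maxn": 6,            # maximum subword length
    "epoch": 5,           # number of epochs
    "lr": 0.05,           # learning rate
    "thread": 12,         # number of threads
}


def train_embeddings(corpus_path, output_path):
    """Train a model on a plain-text corpus and save it (paths are hypothetical)."""
    import fasttext  # imported lazily so the settings can be inspected without it

    model = fasttext.train_unsupervised(corpus_path, **TRAIN_PARAMS)
    model.save_model(output_path)
    return model
```

A call such as `train_embeddings("corpus.txt", "fasttext/text.bin")` would then produce a binary model file that `load_model` can read.
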

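```python
from fastText import load_model

# Load the trained skipgram model from disk.
f = load_model('fasttext/text.bin')

# Vocabulary words and their frequencies in the training corpus.
words, frequency = f.get_words(include_freq=True)

# Subwords of "epoc" (with their indices) and the vector of the word.
subwords = f.get_subwords("epoc")
vector = f.get_word_vector("epoc")
print(subwords)
print(vector)
```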

```
(['epoc', '<ep', '<epo', '<epoc', '<epoc>', 'epo', 'epoc', 'epoc>', 'poc', 'poc>', 'oc>'], array([     26,  681952, 1838757, 1912258, 1775324, 1905263,  133644,
        865630,  217019,  125063, 1127517]))
[-0.0093281  -0.15532416  0.0948198  -0.08907584  0.25190157 -0.10687302
  0.01649686 -0.20749712  0.11510517 -0.02903952 -0.0196301   0.28202081
  0.00626374 -0.03796449  0.31332636 -0.11211477 -0.07280819  0.09385599
  0.25875148 -0.03122052 -0.10758104  0.11866977 -0.14313541 -0.0205832
 -0.47141314 -0.14188825  0.28987271  0.26812938 -0.17702948  0.03432148
 -0.20997262 -0.03703849 -0.07540703 -0.0446909   0.13087593  0.23045629
  0.18910374 -0.17747886  0.01849453  0.52423888  0.04227362  0.12532395
 -0.34457079 -0.04104725  0.16589129  0.13506122 -0.07036062 -0.04251134
  0.14447868  0.12321639 -0.20340008  0.0634378  -0.01942538 -0.00913388
  0.19385919 -0.09477488 -0.54019481  0.0198158   0.20428528  0.18138929
  0.16437246  0.00601617  0.15107866  0.19145215 -0.08057079  0.0
```
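
The subword list printed above for "epoc" can be reproduced with a few lines of plain Python. This is only a sketch of the n-gram extraction: the function name `char_ngrams` is an assumption, and it leaves out the hashing that produces the integer indices in the output as well as any special handling of multi-byte characters.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the word followed by its character n-grams of length minn..maxn.

    The word is wrapped in the boundary markers '<' and '>' before the
    n-grams are extracted, as in the printed output above.
    """
    wrapped = "<" + word + ">"
    grams = []
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(wrapped):
                break  # longer n-grams would run past the end of the word
            grams.append(wrapped[start:end])
    return [word] + grams


if __name__ == "__main__":
    print(char_ngrams("epoc"))
```

With the default bounds of 3 and 6, `char_ngrams("epoc")` gives the same eleven strings, in the same order, as the list returned by `get_subwords`.
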

Nearest neighbors: A simple way to check the quality of a word vector is to look at its nearest neighbors. This gives an intuition of the type of semantic information the vectors are able to capture. Because the vectors are built from subwords, even a misspelled word is matched to reasonable words.
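
A nearest-neighbor lookup can be sketched with cosine similarity over a dictionary of word vectors. The function names and the tiny two-dimensional vectors below are made up for illustration and are not taken from the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(query_vec, vectors, k=3, exclude=None):
    """Return the k (similarity, word) pairs most similar to query_vec."""
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in vectors.items()
        if word != exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors, for illustration only.
    toy_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(nearest_neighbors(toy_vectors["a"], toy_vectors, k=2, exclude="a"))
```

With the model loaded earlier, the dictionary could be filled from `f.get_words()` and `f.get_word_vector(word)`. A misspelled query still gets a vector from `get_word_vector`, since it is composed from the word's subwords.
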