# Word Vectors with spaCy

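Word vectors, or word embeddings, are numerical representations of words as points in a multidimensional space. Each word maps to a fixed-length array of floating-point numbers, and the vectors for a whole vocabulary are stored together as a matrix. Words used in similar contexts tend to end up with similar vectors, which is what makes similarity comparisons possible.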
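
To make the idea of "similar vectors" concrete, here is a minimal sketch that compares toy vectors with cosine similarity, a common measure for word embeddings. The three-dimensional vectors and the `cosine_similarity` helper are invented for illustration and are not taken from any real model.

```python
import math

# Toy 3-dimensional vectors; the values are invented for illustration only.
toy_vectors = {
    "country": [0.9, 0.1, 0.3],
    "nation": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


related = cosine_similarity(toy_vectors["country"], toy_vectors["nation"])
unrelated = cosine_similarity(toy_vectors["country"], toy_vectors["banana"])
print(related, unrelated)
```

Vectors that point in nearly the same direction score close to 1, while vectors pointing in different directions score lower, so `related` comes out higher than `unrelated` here.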

In [1]:
import numpy as np
import spacy

In [2]:
model = spacy.load("en_core_web_md")

In [3]:
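# Read the sample text; the explicit encoding avoids platform-dependent defaults.
with open("data/wiki_us.txt", encoding="utf-8") as f:
    txt = f.read()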

In [4]:
doc = model(txt)

In [5]:
sentence = list(doc.sents)[0]
print(sentence)

The United States of America (U.S.A. or USA), commonly known as the United States (U.S. or US) or America, is a country primarily located in North America.


In [6]:
word = "country"

In [7]:
model.vocab.strings[word]

12290671265767728302
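
The integer above is the hash that spaCy's string store assigns to the word, and the same store maps a hash back to its string. The following is only a rough sketch of that two-way lookup: `ToyStringStore` and `toy_hash` are hypothetical names, the BLAKE2b hash is a stand-in rather than spaCy's own hash function, and the vector values are invented.

```python
import hashlib


def toy_hash(text):
    """Stand-in 64-bit hash; spaCy uses its own hash function instead."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyStringStore:
    """Maps strings to integer hashes and hashes back to strings."""

    def __init__(self):
        self._by_hash = {}

    def add(self, text):
        key = toy_hash(text)
        self._by_hash[key] = text
        return key

    def __getitem__(self, item):
        # A string returns its hash; an integer returns the stored string.
        if isinstance(item, str):
            return toy_hash(item)
        return self._by_hash[item]


strings = ToyStringStore()
key = strings.add("country")

# Vectors are keyed by hash, not by string (values invented).
toy_vector_table = {key: [0.9, 0.1, 0.3]}

print(strings["country"] == key)
print(strings[key])
print(toy_vector_table[strings["country"]])
```

Keying the vector table by integer hash is why the lookups in this notebook go through `model.vocab.strings` first.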

In [8]:
model.vocab.vectors[model.vocab.strings[word]]

array([ 7.3677e-02,  5.2156e-01,  3.1504e-01, -2.5743e-01,  6.3185e-01,
       -1.4061e-01,  4.4108e-02,  5.5817e-02, -4.5327e-01,  3.3915e+00,
       -8.8491e-01,  1.4696e-02, -4.4164e-01, -2.7146e-01,  6.5846e-02,
       -1.4115e-01, -1.8359e-01,  5.5882e-01,  3.4735e-01,  4.9206e-01,
       -2.9363e-01,  1.5864e-01,  1.8245e-01,  2.4616e-01, -5.4651e-02,
       -2.6116e-01, -2.0582e-01,  1.7782e-01, -2.3421e-01,  2.0985e-01,
        3.4395e-01, -7.5486e-03,  1.1200e-01, -4.3141e-01,  7.9923e-02,
       -1.0038e-01, -1.5685e-01, -2.8106e-01,  1.3058e-01,  4.1862e-01,
       -8.8605e-02,  2.7584e-01,  2.5731e-01,  1.9394e-01, -2.0442e-01,
       -4.4856e-01,  4.8795e-01, -2.4139e-01,  9.9646e-02, -3.3894e-01,
       -6.3024e-01,  2.4977e-01, -4.6698e-02, -3.4425e-01,  3.9504e-01,
       -1.0694e-01,  1.5900e-01, -2.0601e-01,  3.7994e-01, -1.8441e-01,
       -1.1135e-02,  4.4415e-01, -1.6894e-01,  4.8023e-01, -1.4859e-01,
        2.5886e-01, -1.5207e-01,  6.8721e-02,  3.1572e-01,  9.31

## Similar Words

In [9]:
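# most_similar expects a 2D array of query vectors, so the single vector is
# wrapped in a list. It returns a (keys, best_rows, scores) tuple.
similar_words = model.vocab.vectors.most_similar(
    np.asarray([model.vocab.vectors[model.vocab.strings[word]]]), n=10
)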

In [10]:
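# similar_words[0][0] holds the hash keys for the first (only) query;
# map each hash back to its string.
words = [model.vocab.strings[w] for w in similar_words[0][0]]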

In [11]:
print(words)

['country', 'COUNTRY', 'NATION', 'nation', 'COUNTIRES', 'nations', 'member-states', 'worLd', 'World', 'world']
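
Conceptually, a nearest-neighbour lookup like this ranks every stored vector by its similarity to the query and keeps the top `n`. The sketch below illustrates that idea with invented toy vectors and a hypothetical `most_similar_sketch` helper; spaCy's own implementation is vectorized and works on the full vector table, so treat this as an approximation of the idea only.

```python
import math

# Invented toy vectors; real models store hundreds of dimensions per key.
toy_vectors = {
    "country": [0.9, 0.1, 0.3],
    "COUNTRY": [0.9, 0.1, 0.3],
    "nation": [0.8, 0.2, 0.4],
    "world": [0.6, 0.3, 0.6],
    "banana": [0.1, 0.9, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_sketch(query, vectors, n=3):
    """Return the n (key, score) pairs most similar to the query vector."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


neighbours = most_similar_sketch(toy_vectors["country"], toy_vectors, n=3)
print([key for key, _ in neighbours])
```

The query word ranks first because its similarity to itself is the maximum possible. Case variants such as `COUNTRY` appear as separate results because the keys are case-sensitive, which matches the mixed-case entries in the output above.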


## Doc Similarity

In [12]:
doc1 = model("I like salty fries and hamburgers.")
doc2 = model("Fast food tastes very good.")

print(doc1, "<->", doc2, doc1.similarity(doc2))

I like salty fries and hamburgers. <-> Fast food tastes very good. 0.7799485853415737
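
By default, spaCy estimates `Doc.similarity` as the cosine similarity between the two documents' vectors, where a document vector is the average of its token vectors. The sketch below reproduces that calculation on invented per-token vectors; `mean_vector` and `cosine_similarity` are hypothetical helpers, not spaCy functions.

```python
import math

# Hypothetical per-token vectors for two short texts (values invented).
doc1_tokens = [[0.2, 0.8, 0.1], [0.4, 0.6, 0.3], [0.3, 0.7, 0.2]]
doc2_tokens = [[0.3, 0.9, 0.0], [0.5, 0.5, 0.4]]


def mean_vector(token_vectors):
    """Average the token vectors component by component."""
    count = len(token_vectors)
    return [sum(component) / count for component in zip(*token_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


doc1_vector = mean_vector(doc1_tokens)
doc2_vector = mean_vector(doc2_tokens)
score = cosine_similarity(doc1_vector, doc2_vector)
print(score)
```

Averaging discards word order, so the score reflects overall topical overlap rather than sentence meaning. That is one reason two short texts about food can score fairly high even when they share no words.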


## Word Similarity

In [13]:
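# Calling the model on a string returns a Doc, so even single words and short
# phrases are compared through their Doc vectors.
french_fries = model("french fries")
burgers = model("burgers")
print(french_fries, "<->", burgers, french_fries.similarity(burgers))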

french fries <-> burgers 0.6947320615296676
