## Introduction to Word Embeddings

Word embeddings represent words as dense, continuous vectors, typically with tens to a few hundred dimensions. Unlike sparse representations such as Bag of Words (BoW), whose vectors have one dimension per vocabulary word, word embeddings capture semantic relationships between words by placing words with similar meanings closer together in the vector space.
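
To make the contrast concrete, here is a minimal sketch of the one-hot vectors that underlie a Bag of Words representation. The three-word vocabulary and the `one_hot` helper are made up for illustration. Every pair of distinct words has a dot product of zero, so this representation carries no notion of similarity between words.

```python
import numpy as np

# Hypothetical three-word vocabulary, for illustration only.
vocab = ["king", "queen", "apple"]

def one_hot(word):
    """Return a vector with a 1 at the word's vocabulary index and 0 elsewhere."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# Distinct words never share a non-zero dimension, so their dot product is zero.
overlap = float(np.dot(one_hot("king"), one_hot("queen")))
print(overlap)
```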

### Key Concepts

1. **Word Embedding**: A dense vector representation of a word whose dimensions jointly encode aspects of its meaning.
2. **Pre-trained Embeddings**: Embeddings learned in advance from large corpora by methods such as Word2Vec, GloVe, and FastText.
3. **Semantic Similarity**: Words with similar meanings tend to have similar embeddings, which supports tasks such as word similarity and analogy solving.
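
Similarity between embeddings is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses hypothetical 3-dimensional vectors, not values from any real model, to show the idea: vectors that point in similar directions score closer to 1.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 1.0 means the same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional vectors, made up for illustration only.
toy = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

sim_cat_dog = cosine_similarity(toy["cat"], toy["dog"])
sim_cat_car = cosine_similarity(toy["cat"], toy["car"])
print(f"cat vs dog: {sim_cat_dog:.3f}")
print(f"cat vs car: {sim_cat_car:.3f}")
```

With these toy values, "cat" scores higher against "dog" than against "car", mirroring what we expect from real embeddings.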

In [None]:
from gensim.models import KeyedVectors

# Load pre-trained Word2Vec model (Google News vectors)
# Note: This model is quite large. For demonstration, use a smaller or different model as needed.
# model = KeyedVectors.load_word2vec_format('path/to/GoogleNews-vectors-negative300.bin', binary=True)

In [None]:
# For demonstration, we'll use a smaller pre-trained model available in gensim
from gensim.downloader import load
model = load('glove-wiki-gigaword-50')

In [None]:
# Example words
words = ['king', 'queen', 'man', 'woman']

In [None]:
# Get embeddings
embeddings = {word: model[word] for word in words}

In [None]:
# Display embeddings
for word, vector in embeddings.items():
    print(f"Word: {word}\nEmbedding: {vector}\n")

In [None]:
# Find similar words
similar_words = model.most_similar('computer', topn=5)

# TODO: Display similar words

In [None]:
# Solve analogy
analogy_result = model.most_similar(positive=['queen', 'man'], negative=['king'], topn=1)

# TODO: Display result

In [1]:
from gensim.models import KeyedVectors

In [2]:
from gensim.downloader import load
model = load('glove-wiki-gigaword-50')

In [3]:
words = ['prince', 'princess', 'boy', 'girl']

In [4]:
embeddings = {word: model[word] for word in words}

In [5]:
for word, vector in embeddings.items():
    print(f"Word: {word}\nEmbedding: {vector}\n")

Word: prince
Embedding: [ 9.8846e-01  1.4535e+00 -5.3081e-01  1.0509e-01  8.4058e-01  1.4018e-01
  6.6562e-02  1.3341e+00 -7.5813e-01 -3.5223e-01  1.6588e-01  1.0016e+00
  1.9623e-02 -6.6392e-01  9.2825e-02  2.5132e-01 -1.6274e-01 -1.1954e-01
 -5.0072e-01  6.7374e-01  6.6886e-01 -3.8679e-02  2.0223e-01 -1.5211e-01
  1.2169e-01 -1.8324e+00 -8.5664e-01 -6.2454e-01 -3.1896e-01  6.0221e-01
  1.4110e+00  5.0157e-01 -1.1413e-01  5.1808e-01  8.4700e-01  1.7618e-01
 -3.5265e-02  5.6405e-01 -3.8524e-01  6.0270e-01  3.4331e-01  1.1836e+00
 -3.7197e-01 -1.1069e+00  1.2758e-04 -1.8202e-01 -1.3696e+00 -1.4970e+00
  4.0618e-01 -4.2445e-01]

Word: princess
Embedding: [ 1.4992    1.6053   -1.1699    0.69597   0.63491   1.0803   -0.15271
  1.0974   -0.12842  -0.74608   0.59572   0.76493   0.18664  -0.47217
  0.72322   0.48368  -1.4334   -0.032644 -0.1652    0.38196   0.71329
  0.89524  -0.26091   0.44074   0.52343  -1.2422   -1.8396   -0.10232
 -0.081415 -0.087913  0.91303   0.69385   0.26242   1.0921 

In [6]:
similar_words = model.most_similar('pen', topn=5)
similar_words

[('ballpoint', 0.6782395839691162),
 ('pencil', 0.6529707908630371),
 ('pens', 0.6143139004707336),
 ('duick', 0.597665548324585),
 ('paper', 0.588976263999939)]

In [7]:
analogy_result = model.most_similar(positive=['princess','boy'], negative=['prince'], topn=1)
analogy_result

[('girl', 0.8908403515815735)]
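
The analogy query above can be read as vector arithmetic: add the `positive` vectors, subtract the `negative` ones, and return the word nearest to the result. The sketch below illustrates this with hypothetical 2-dimensional vectors (loosely "royalty" and "maleness") and a made-up `analogy` helper. It is a simplification: gensim's `most_similar` also normalizes the vectors before combining them, which this sketch skips.

```python
import numpy as np

# Hypothetical 2-d vectors (royalty, maleness), made up for illustration only.
toy = {
    "prince":   np.array([0.9, 0.9]),
    "princess": np.array([0.9, 0.1]),
    "boy":      np.array([0.1, 0.9]),
    "girl":     np.array([0.1, 0.1]),
    "castle":   np.array([0.8, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative), skipping the query words."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    candidates = [w for w in vectors if w not in positive + negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# princess - prince + boy should land near girl in this toy space.
best = analogy(["princess", "boy"], ["prince"], toy)
print(best)
```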