
## Introduction to Word Embeddings

Word embeddings represent words as dense, continuous vectors in a shared vector space, typically with a few dozen to a few hundred dimensions (far fewer than the size of the vocabulary). Unlike sparse representations such as Bag of Words (BoW), where every word is its own independent dimension, word embeddings capture semantic meaning and relationships between words by placing words that appear in similar contexts close together in the vector space.
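To make the contrast with Bag of Words concrete, the sketch below compares one-hot vectors (the per-word building block of BoW) with dense vectors, using cosine similarity: the dot product divided by the product of the vector lengths. The three dense vectors are made up for illustration and do not come from any trained model; only the qualitative pattern matters. Distinct one-hot vectors are always orthogonal (similarity 0), while dense vectors can express that "car" is closer to "bus" than to "banana".

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["car", "bus", "banana"]

# One-hot vectors: one dimension per vocabulary word, as in Bag of Words.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical 3-dimensional dense embeddings, hand-picked for illustration only.
dense = {
    "car": [0.9, 0.8, 0.1],
    "bus": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

print(cosine(one_hot["car"], one_hot["bus"]))     # 0.0: distinct one-hot vectors share nothing
print(cosine(one_hot["car"], one_hot["banana"]))  # 0.0
print(cosine(dense["car"], dense["bus"]))         # close to 1
print(cosine(dense["car"], dense["banana"]))      # much lower
```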

### Key Concepts

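1. **Word Embedding**: A dense vector representation of a word. Meaning is distributed across all of the dimensions; individual dimensions are usually not interpretable on their own.
2. **Pre-trained Embeddings**: Embeddings already trained on large corpora with methods such as Word2Vec, GloVe, and FastText, and distributed so they can be reused without retraining.
3. **Semantic Similarity**: Words with similar meanings tend to have similar embeddings, usually measured with cosine similarity, which makes tasks such as finding similar words and solving analogies possible.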
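For reference, the cosine similarity between two embeddings $\mathbf{u}$ and $\mathbf{v}$ is their dot product divided by the product of their lengths. It ranges from $-1$ to $1$ and equals the plain dot product once both vectors are scaled to unit length:

$$
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert} = \hat{\mathbf{u}} \cdot \hat{\mathbf{v}}, \qquad \hat{\mathbf{u}} = \frac{\mathbf{u}}{\lVert \mathbf{u} \rVert}, \quad \hat{\mathbf{v}} = \frac{\mathbf{v}}{\lVert \mathbf{v} \rVert}
$$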

In [1]:
from gensim.models import KeyedVectors

# Load the pre-trained Word2Vec model (Google News vectors, 300 dimensions).
# Note: this model is very large. For a quick demonstration, use a smaller model as in the next cell.
# model = KeyedVectors.load_word2vec_format('path/to/GoogleNews-vectors-negative300.bin', binary=True)

In [2]:
# For demonstration, use a smaller pre-trained model from gensim's downloader:
# 50-dimensional GloVe vectors trained on Wikipedia + Gigaword (downloaded on first use).
from gensim.downloader import load
model = load('glove-wiki-gigaword-50')
In [3]:
# Example words
words = ['school', 'bus', 'kid', 'car']

In [4]:
# Look up each word's vector (raises KeyError if a word is not in the vocabulary)
embeddings = {word: model[word] for word in words}
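Because `model[word]` raises a `KeyError` for a word that is not in the vocabulary, the dictionary comprehension above stops at the first unknown word. A minimal sketch of a more forgiving lookup follows; `get_embeddings` is a hypothetical helper, not part of gensim. It relies only on `in` and `[]`, which plain dicts support and which gensim's `KeyedVectors` should support as well. Lowercasing the key is an assumption that matches uncased vocabularies such as the GloVe Wikipedia + Gigaword vectors.

```python
def get_embeddings(vectors, words):
    """Hypothetical helper: return ({word: vector}, [missing words]).

    `vectors` can be any mapping-like object supporting `in` and `[]`.
    """
    found, missing = {}, []
    for word in words:
        key = word.lower()  # assumption: the vocabulary is lowercase (uncased)
        if key in vectors:
            found[word] = vectors[key]
        else:
            missing.append(word)
    return found, missing

# Usage with a tiny stand-in vocabulary (hypothetical values, not real embeddings)
toy_vectors = {"school": [0.1, 0.2], "bus": [0.3, 0.4]}
found, missing = get_embeddings(toy_vectors, ["School", "bus", "hoverboard"])
print(sorted(found))  # ['School', 'bus']
print(missing)        # ['hoverboard']
```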

In [5]:
# Display embeddings
for word, vector in embeddings.items():
    print(f"Word: {word}\nEmbedding: {vector}\n")

Word: school
Embedding: [-0.90629    1.2485    -0.79692   -1.4027    -0.038458  -0.25177
 -1.2838    -0.58413   -0.11179   -0.56908   -0.34842   -0.39626
 -0.0090178 -1.0691    -0.35368   -0.052826  -0.37056    1.0931
 -0.19205    0.44648    0.45169    0.72104   -0.61103    0.6315
 -0.49044   -1.7517     0.055979  -0.52281   -1.0248    -0.89142
  3.0695     0.14483   -0.13938   -1.3907     1.2123     0.40173
  0.4171     0.27364    0.98673    0.027599  -0.8724    -0.51648
 -0.30662    0.37784    0.016734   0.23813    0.49411   -0.56643
 -0.18744    0.62809  ]

Word: bus
Embedding: [ 0.84772   0.070253  0.96791  -0.27164  -0.37617   0.31978  -1.3108
  0.091093  0.59919  -0.90217  -0.050876 -0.83886  -0.61596   0.29642
 -0.42189  -0.21969  -0.94006   1.2221   -0.66526  -0.57745   0.76126
  0.51459  -0.88565   1.5135    0.42326  -1.2947    0.45522   0.67073
  0.80188  -0.65449   2.4117    0.62445  -0.046631  0.37524   1.0103
  0.25259   1.0913   -0.79427  -0.17027   1.4866   -0.24077   0.

In [7]:
# Find the 5 words closest to 'mobile' by cosine similarity
similar_words = model.most_similar('mobile', topn=5)

# Display the similar words
similar_words

[('wireless', 0.8815027475357056),
 ('phones', 0.805967390537262),
 ('broadband', 0.7867703437805176),
 ('cellular', 0.7820915579795837),
 ('operating', 0.7751185297966003)]
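Conceptually, `most_similar` for a single word ranks the rest of the vocabulary by cosine similarity to that word and leaves the word itself out of the results. The brute-force sketch below reproduces that idea in plain Python over a hypothetical four-word vocabulary; gensim computes the same kind of scores with vectorized array operations over the whole vocabulary, so treat this as an illustration rather than its actual implementation.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar(vectors, query, topn=5):
    """Rank every other word by cosine similarity to `query` (brute force)."""
    q = unit(vectors[query])
    scores = []
    for word, vec in vectors.items():
        if word == query:
            continue  # the query word itself is excluded from the results
        scores.append((word, sum(a * b for a, b in zip(q, unit(vec)))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical 3-dimensional vectors, for illustration only
toy = {
    "mobile": [0.9, 0.1, 0.2],
    "wireless": [0.8, 0.2, 0.1],
    "phones": [0.7, 0.1, 0.4],
    "banana": [0.0, 0.9, 0.1],
}
print(most_similar(toy, "mobile", topn=2))  # 'wireless' first, then 'phones'
```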

In [9]:
# Solve an analogy: find the word closest to teacher + student - school
analogy_result = model.most_similar(positive=['teacher', 'student'], negative=['school'], topn=1)

# Display the result
analogy_result

[('educator', 0.7146477699279785)]
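An analogy query combines vectors before searching: with `positive=['teacher', 'student']` and `negative=['school']`, the target direction is the sum of the unit-length vectors for "teacher" and "student" minus the unit-length vector for "school", and the input words themselves are excluded from the answers. The sketch below implements this additive approach (often called 3CosAdd) on hypothetical 3-dimensional vectors for the classic king/queen example; the vectors and the meaning given to each dimension are invented for illustration. gensim also provides `most_similar_cosmul`, a multiplicative variant of the same idea.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def analogy(vectors, positive, negative, topn=1):
    """3CosAdd: rank words by cosine similarity to
    sum(unit(positive)) - sum(unit(negative)), skipping the input words."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, unit(vectors[word]))]
    for word in negative:
        target = [t - x for t, x in zip(target, unit(vectors[word]))]
    target = unit(target)

    excluded = set(positive) | set(negative)
    scores = [
        (word, sum(a * b for a, b in zip(target, unit(vec))))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical vectors; dimensions loosely mean (royalty, male, female).
toy = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}

# king - man + woman: "man is to king as woman is to ?"
print(analogy(toy, positive=["king", "woman"], negative=["man"]))  # 'queen' ranks first
```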