<a href="https://colab.research.google.com/github/Tomisin510/LearnWithDSN/blob/main/Week_4/Week_4_Applied_Learning_Assignments_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computer Vision Take-Home Assignment

## **Name:** Tomisin Obijole
## **Date:** 24 February 2026

# Applied Learning Assignments 3

## 3.1 One-Hot Encoding
Define a vocabulary of at least 5 unique words and generate one‑hot encoded vectors.

In [1]:
import numpy as np

# Vocabulary
vocab = ['cat', 'dog', 'bird', 'fish', 'hamster']
word_to_index = {word: i for i, word in enumerate(vocab)}

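def one_hot(word):
    """Return the one-hot vector for `word`; raises KeyError if it is not in `vocab`."""
    vec = np.zeros(len(vocab))        # all zeros, one slot per vocabulary word
    vec[word_to_index[word]] = 1      # switch on the slot for this word
    return vec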

# Test
for word in vocab:
    print(f"{word}: {one_hot(word)}")

cat: [1. 0. 0. 0. 0.]
dog: [0. 1. 0. 0. 0.]
bird: [0. 0. 1. 0. 0.]
fish: [0. 0. 0. 1. 0.]
hamster: [0. 0. 0. 0. 1.]
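
One property worth checking is that one-hot vectors carry no notion of similarity: the dot product of any two different words is 0. The following is a minimal sketch, assuming the same five-word vocabulary. The name `one_hot_matrix` is introduced here for illustration, and `np.eye` is used as a shortcut for building all the vectors at once.

```python
import numpy as np

vocab = ['cat', 'dog', 'bird', 'fish', 'hamster']
word_to_index = {word: i for i, word in enumerate(vocab)}

# Row i of the identity matrix is the one-hot vector for vocab[i].
one_hot_matrix = np.eye(len(vocab))

cat = one_hot_matrix[word_to_index['cat']]
dog = one_hot_matrix[word_to_index['dog']]

print("cat . dog =", cat @ dog)  # different words: the dot product is 0
print("cat . cat =", cat @ cat)  # same word: the dot product is 1
```

Because every pair of distinct words is orthogonal, "cat" is exactly as far from "dog" as it is from "fish", which is the limitation that dense embeddings such as Word2Vec and GloVe address.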


## 3.2 Bag of Words & TF‑IDF

Use the sentences "The quick brown fox jumps over the lazy dog." and "The dog sleeps in the kernel" to generate BoW and TF‑IDF representations.

In [2]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog sleeps in the kernel"
]

# Bag of Words
count_vec = CountVectorizer()
bow = count_vec.fit_transform(sentences)
print("BoW feature names:", count_vec.get_feature_names_out())
print("BoW matrix:\n", bow.toarray())

# TF-IDF
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(sentences)
print("\nTF-IDF feature names:", tfidf_vec.get_feature_names_out())
print("TF-IDF matrix:\n", tfidf.toarray())

BoW feature names: ['brown' 'dog' 'fox' 'in' 'jumps' 'kernel' 'lazy' 'over' 'quick' 'sleeps'
 'the']
BoW matrix:
 [[1 1 1 0 1 0 1 1 1 0 2]
 [0 1 0 1 0 1 0 0 0 1 2]]

TF-IDF feature names: ['brown' 'dog' 'fox' 'in' 'jumps' 'kernel' 'lazy' 'over' 'quick' 'sleeps'
 'the']
TF-IDF matrix:
 [[0.342369   0.24359836 0.342369   0.         0.342369   0.
  0.342369   0.342369   0.342369   0.         0.48719673]
 [0.         0.30253071 0.         0.42519636 0.         0.42519636
  0.         0.         0.         0.42519636 0.60506143]]
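
The TF-IDF values above can be reproduced by hand from the BoW counts. The following is a minimal sketch, assuming scikit-learn's default `TfidfVectorizer` settings (smoothed IDF, `idf = ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation of each row). The names `counts`, `df`, `idf`, and `manual_tfidf` are introduced here for illustration.

```python
import numpy as np

# Raw counts copied from the BoW matrix above (same column order).
counts = np.array([
    [1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 2],
    [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 2],
], dtype=float)

n_docs = counts.shape[0]

# Document frequency: in how many sentences each term appears.
df = (counts > 0).sum(axis=0)

# Smoothed inverse document frequency (assumed scikit-learn default).
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Weight the counts by IDF, then scale each row to unit L2 length.
weighted = counts * idf
manual_tfidf = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

print(np.round(manual_tfidf, 6))
```

Terms that occur in both sentences ("dog", "the") get an IDF of exactly 1, while terms unique to one sentence get about 1.405. This is why "brown" (count 1) outweighs "dog" (count 1) in the first row.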


## 3.3 Word2Vec with Gensim

Create a small dataset of at least 3 sentences about animals. Train a Word2Vec model and retrieve the embedding for "dog".

In [4]:
# First, install gensim if not already installed
!pip install gensim

# Now import and run the Word2Vec code
from gensim.models import Word2Vec
import warnings
warnings.filterwarnings('ignore')  # Suppress all warnings, including deprecation warnings

# Dataset - properly tokenized sentences
sentences = [
    "The cat meows".split(),
    "The dog barks".split(),
    "The bird sings".split()
]

print("Training sentences:", sentences)

# Train Word2Vec model
# vector_size: dimensionality of the word vectors
# window: maximum distance between current and predicted word
# min_count: ignores words with total frequency lower than this
# workers: number of threads to use
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, workers=4)

# Get embedding for "dog"
if 'dog' in model.wv:
    print("\nEmbedding for 'dog':")
    print(model.wv['dog'])
    print(f"Shape: {model.wv['dog'].shape}")
else:
    print("'dog' not in vocabulary")

# Find similar words (though with such a small dataset, results may be limited)
if 'dog' in model.wv:
    try:
        similar = model.wv.most_similar('dog', topn=3)
        print("\nWords most similar to 'dog':")
        for word, score in similar:
            print(f"  {word}: {score:.4f}")
    except Exception:  # avoid a bare except, which would also trap KeyboardInterrupt
        print("Could not find similar words (dataset too small)")

# Save and load the model (optional)
# model.save("word2vec_animal.model")
# loaded_model = Word2Vec.load("word2vec_animal.model")

Collecting gensim
  Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (8.4 kB)
Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (27.9 MB)
Installing collected packages: gensim
Successfully installed gensim-4.4.0
Training sentences: [['The', 'cat', 'meows'], ['The', 'dog', 'barks'], ['The', 'bird', 'sings']]

Embedding for 'dog':
[ 0.00018913  0.00615464 -0.01362529 -0.00275093  0.01533716  0.01469282
 -0.00734659  0.0052854  -0.01663426  0.01241097 -0.00927464 -0.00632821
  0.01862271  0.00174677  0.01498141 -0.01214813  0.01032101  0.01984565
 -0.01691478 -0.01027138 -0.01412967 -0.0097253  -0.00755713 -0.0170724
  0.01591121 -0.00968788  0.01684723  0.01052514 -0.01310005  0.00791574
  0.0109403  -0.01485307 -0.01481144 -0.00495046 -0.01725145 -0.00316314
 -0.00080687  0.00659937  0
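
Note that `str.split()` preserves case, so "The" enters the vocabulary as a token distinct from "the". The following is a minimal sketch of a lowercasing tokenizer, assuming plain English text. `tokenize` is a hypothetical helper written for this example and is not part of gensim.

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", sentence.lower())

raw_sentences = ["The cat meows", "The dog barks", "The bird sings"]
tokenized = [tokenize(s) for s in raw_sentences]
print(tokenized)
```

The resulting list of token lists has the same shape as the training data used above, so it could be passed to `Word2Vec` in its place.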

## 3.4 Pretrained GloVe with Gensim

Load the GloVe model (glove-wiki-gigaword-50), then retrieve the embedding for "king" and its 5 most similar words.

In [5]:
import gensim.downloader as api

# Load the GloVe model (this may take a few minutes the first time)
glove = api.load("glove-wiki-gigaword-50")

# Embedding for "king"
if 'king' in glove:
    print("Embedding for 'king':", glove['king'])
else:
    print("'king' not in vocabulary")

# 5 most similar words to "king"
if 'king' in glove:
    similar = glove.most_similar('king', topn=5)
    print("Most similar words to 'king':", similar)
else:
    print("Cannot find 'king'")

Embedding for 'king': [ 0.50451   0.68607  -0.59517  -0.022801  0.60046  -0.13498  -0.08813
  0.47377  -0.61798  -0.31012  -0.076666  1.493    -0.034189 -0.98173
  0.68229   0.81722  -0.51874  -0.31503  -0.55809   0.66421   0.1961
 -0.13495  -0.11476  -0.30344   0.41177  -2.223    -1.0756   -1.0783
 -0.34354   0.33505   1.9927   -0.04234  -0.64319   0.71125   0.49159
  0.16754   0.34344  -0.25663  -0.8523    0.1661    0.40102   1.1685
 -1.0137   -0.21585  -0.15155   0.78321  -0.91241  -1.6106   -0.64426
 -0.51042 ]
Most similar words to 'king': [('prince', 0.8236179351806641), ('queen', 0.7839043140411377), ('ii', 0.7746230363845825), ('emperor', 0.7736247777938843), ('son', 0.766719400882721)]
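
The scores returned by `most_similar` are cosine similarities between embedding vectors. The following is a minimal sketch of that computation with NumPy. The `cosine_similarity` helper and the three-dimensional toy vectors are made up for illustration and are not GloVe values.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, chosen only to make the two extreme cases easy to see.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])    # same direction as u
w = np.array([-3.0, 0.0, 1.0])   # orthogonal to u: -3 + 0 + 3 = 0

print(cosine_similarity(u, v))   # close to 1: same direction
print(cosine_similarity(u, w))   # close to 0: unrelated directions
```

With the loaded model, applying the same function to `glove['king']` and `glove['queen']` should agree with `glove.similarity('king', 'queen')` up to floating-point error.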
