# **Programming Assessment \#5**

Names: Alyanna Abalos, Loben Tipan

More information on the assessment is found in our Canvas course.

# **Load Pre-trained Embeddings**

*While you don't have to separate your code into blocks, it might be easier if you separated loading / downloading your data from the main part of your solution. Consider placing all loading of data into the code block below.*

In [1]:
import gensim.downloader as api
import nltk
from gensim.models import KeyedVectors
from nltk.corpus import words

word_vectors = api.load("word2vec-google-news-300")
nltk.download('words')
english_words = set(words.words())

[nltk_data] Downloading package words to
[nltk_data]     /Users/alyannaabalos/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


# **Your Implementation**

*Again, you don't have to have everything in one block. Use the notebook according to your preferences with the goal of fulfilling the assessment in mind.*

# Random Word Generator

In [92]:
import random

def get_random_word(word_vectors, english_words):
    # Keep only lowercase dictionary words that the model knows, so the word
    # returned here is guaranteed to be a valid key in word_vectors.
    # (Lowercasing after selection could produce a key the model lacks.)
    valid_words = [
        word for word in word_vectors.key_to_index
        if word in english_words and word.islower()
    ]
    if not valid_words:
        raise ValueError("No valid words found that are common between the word vectors model and the nltk corpus.")
    return random.choice(valid_words)
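
The selection step above draws from words that appear in both the model's vocabulary and the nltk word list. Since that overlap does not change between rounds, one option is to build it once and reuse it. The following is a minimal sketch using toy data in place of `word_vectors.key_to_index` and `english_words`; `build_candidates` and the toy values are hypothetical, and restricting candidates to lowercase alphabetic words is our assumption rather than a requirement of the assessment.

```python
import random

def build_candidates(vocabulary, english_words):
    """Return a sorted list of lowercase alphabetic words found in both collections."""
    return sorted(
        word for word in vocabulary
        if word in english_words and word.isalpha() and word.islower()
    )

# Toy stand-ins (hypothetical) for word_vectors.key_to_index and the nltk word list.
toy_vocabulary = {"apple": 0, "Paris": 1, "new_york": 2, "river": 3, "zzz": 4}
toy_english_words = {"apple", "river", "Paris", "stone"}

candidates = build_candidates(toy_vocabulary, toy_english_words)
rng = random.Random(0)  # seeded only so the sketch is repeatable
print(candidates)
print(rng.choice(candidates))
```

With the real model, the same list could be computed once after loading and passed to the game loop instead of being rebuilt for every new target word.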

In [93]:
def get_similar_words(word_vectors, target_word, english_words, indices=(10, 50, 100)):
    # indices is a tuple so the default is not a shared mutable object.
    topn_value = max(indices) * 10
    similar_words = word_vectors.most_similar(target_word, topn=topn_value)
    
    normal_similar_words = [(word, score) for word, score in similar_words if word in english_words]
    
    num_similar_words = len(normal_similar_words)

    for idx in indices:
        if idx <= num_similar_words:
            word, score = normal_similar_words[idx - 1]
            print(f"{idx}th most similar word to '{target_word}': {word} with a similarity score of {score}")
        else:
            print(f"Only {num_similar_words} similar words were found for '{target_word}', not enough to show the {idx}th word.")

    return normal_similar_words

In [94]:
def get_similarity_score(word_vectors, correct_word, guess, precision=8):
    try:
        similarity_score = word_vectors.similarity(correct_word, guess)
        similarity_score = round(similarity_score, precision)
        return similarity_score
    except KeyError:
        # Raised by gensim when either word is missing from the vocabulary.
        return None
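
The score returned by `word_vectors.similarity` is the cosine similarity between the two word vectors, which is why guessing the target word itself gives a score of 1.0. The sketch below shows the computation in plain Python on made-up 3-dimensional vectors; `cosine_similarity` and the toy vectors are illustrative only and are not part of gensim.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors (made up for illustration, not real embeddings).
target = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # a scaled copy of target
orthogonal = [3.0, 0.0, -1.0]      # dot product with target is zero

print(cosine_similarity(target, same_direction))  # approximately 1
print(cosine_similarity(target, orthogonal))      # approximately 0
```

Because the measure depends only on direction, a scaled copy of a vector scores (up to floating-point error) the same as the vector itself, so the win check in the game compares against 1.0 with a small tolerance.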

In [101]:
def semantle_dupe(word_vectors, english_words):
    try:
        random_word = get_random_word(word_vectors, english_words)
    except ValueError:
        # get_random_word raises ValueError when no candidate words exist.
        print("No valid words found in the model's vocabulary.")
        return

    print(f"Randomly selected word: {random_word}")
    
    try:
        get_similar_words(word_vectors, random_word, english_words, indices=[10, 50, 100])
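    except KeyError:
        # Report the problem instead of exiting silently.
        print(f"'{random_word}' is not in the model's vocabulary.")
        return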

    print("\nInput your guess. Type 'q' to exit.\n")
    while True:
        user_guess = input("Enter your guess: ").lower()
        if user_guess == 'q':
            print(f"The target word was: {random_word}")
            break
        similarity = get_similarity_score(word_vectors, random_word, user_guess)
        if similarity is not None:
            print(f"Similarity score: {similarity}")
            if abs(similarity - 1.0) < 1e-7:
                print("Congratulations, you found the target word!")
                break
        else:
            print("Word not found in the model's vocabulary.")



# Main
if __name__ == "__main__":
    semantle_dupe(word_vectors, english_words)

Randomly selected word: foremost
10th most similar word to 'foremost': biggest with a similarity score of 0.3866763710975647
50th most similar word to 'foremost': distinguished with a similarity score of 0.3141373097896576
100th most similar word to 'foremost': primacy with a similarity score of 0.28819724917411804

Input your guess. Type 'q' to exit.

Enter your guess: first
Similarity score: 0.14847147464752197
Enter your guess: front
Similarity score: 0.028867339715361595
Enter your guess: important
Similarity score: 0.3441140651702881
Enter your guess: foremost
Similarity score: 1.0
Congratulations, you found the target word!
