# **Mini-Project \#2: Replicating Semantle**

Names: Balingit, Andrei Luis & Burayag, Ethan Axl

More information on the assessment is found in our Canvas course.

# **Environment Setup**



This code block installs all the necessary libraries for the Semantle replication.

In [1]:
!pip install gensim fasttext numpy --quiet

# **Load Pre-Trained Word Embeddings**

This code block loads the pre-trained word embeddings used for the Semantle replication. Two GloVe models are available through gensim's downloader: one trained on Wikipedia and Gigaword, and one trained on Twitter. Only one of them should be loaded at a time.

In [2]:
import gensim.downloader as api

In [3]:
# Uncomment the word embedding model to be used (keep only one active)

# GloVe (Stanford) - Trained on Wikipedia/Gigaword
model = api.load("glove-wiki-gigaword-200")



In [4]:
# Uncomment the word embedding model to be used (keep only one active)

# GloVe (Stanford) - Trained on Twitter
# model = api.load("glove-twitter-200")

# **Word Pool Definition & Filtering**
This code block downloads a list of standard English words, which serves as the basis for the target words later on. It keeps only the words that exist in our word embedding model and have more than three letters, so that each one is a reasonable target word. This gives the game a much cleaner and more realistic vocabulary than the raw model vocabulary.

In [None]:
# Downloads the standard English word list
!wget https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt --quiet

# Builds the valid English word pool
with open('words_alpha.txt') as f:
    english_words = set(f.read().split())

valid_words = [
    word for word in english_words
    if word in model.key_to_index
    and len(word) > 3
]

print(f"Valid Word Pool Size: {len(valid_words)}")

Valid Word Pool Size: 98280
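
As a minimal sketch of the same filtering rule, the snippet below applies it to a toy word list so the result can be checked by hand. The function name `build_word_pool`, the `min_length` parameter, and the toy data are assumptions made for illustration; they are not part of the notebook, and the plain dictionary only stands in for `model.key_to_index`.

```python
def build_word_pool(english_words, vocabulary, min_length=4):
    """Keep words that are in the vocabulary and have at least min_length letters."""
    return sorted(
        word for word in english_words
        if word in vocabulary and len(word) >= min_length
    )

# Toy stand-ins (assumptions): a small word list and a small "model" vocabulary
toy_english_words = {"cat", "house", "zebra", "qwertyish", "tree"}
toy_vocabulary = {"cat": 0, "house": 1, "tree": 2, "the": 3}

pool = build_word_pool(toy_english_words, toy_vocabulary)
print(pool)  # ['house', 'tree']
```

Here `cat` is dropped for being too short, while `zebra` and `qwertyish` are dropped because they are missing from the toy vocabulary. Sorting the pool is a design choice of this sketch: it makes the order stable, which helps if a random seed is ever used to reproduce a target word.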


# **Target Word Selection & Cosine Similarity Calculation**
This code block selects a random target word from the valid word pool. It then computes the cosine similarity between the target word and every other word in the word embedding model, ranked from most to least similar.



In [6]:
import random

# Selects a random target word to be used
target_word = random.choice(valid_words)

# Ranks every other word in the model by cosine similarity to the target (most similar first)
similarities = model.most_similar(target_word, topn=len(model))
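
To make the ranking step more concrete, here is a minimal sketch of cosine similarity computed by hand with NumPy on made-up two-dimensional vectors. The helper `cosine_similarity` and the `toy_vectors` dictionary are assumptions for illustration only; the notebook itself relies on gensim's `most_similar`, which likewise leaves the query word out of its results.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 2-D "embeddings" (assumption: invented values, not real GloVe vectors)
toy_vectors = {
    "king": [1.0, 0.0],
    "queen": [1.0, 1.0],
    "apple": [0.0, 1.0],
    "down": [-1.0, 0.0],
}

target = "king"

# Score every word except the target, then sort from most to least similar
ranked = sorted(
    (
        (word, cosine_similarity(toy_vectors[target], vector))
        for word, vector in toy_vectors.items()
        if word != target
    ),
    key=lambda pair: pair[1],
    reverse=True,
)

for word, score in ranked:
    print(f"{word}: {score:.5f}")
```

Scores range from -1 (opposite direction) to 1 (same direction), which is why negative similarity scores can appear for unrelated guesses in the game.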

# **Semantle Game Replication**

This code block implements the game loop of the Semantle replication.
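
The checks that the game loop runs on each guess (exit code, unknown word, repeated guess) can be sketched as a small pure function, which is easier to test than an interactive loop. The name `classify_guess` and its return labels are hypothetical and are not used in the notebook; the vocabulary argument stands in for `model.key_to_index`.

```python
def classify_guess(guess, vocabulary, previous_guesses, exit_code="1"):
    """Return what the game loop should do with a raw guess string."""
    guess = guess.strip().lower()
    if guess == exit_code:
        return "exit"        # player gives up
    if guess not in vocabulary:
        return "unknown"     # word has no embedding, so it cannot be scored
    if guess in previous_guesses:
        return "duplicate"   # already guessed, do not score it again
    return "score"           # valid new guess

# Toy data (assumption): a tiny vocabulary and one earlier guess
toy_vocabulary = {"hat", "subject", "rule"}
earlier = {"hat"}

print(classify_guess("  Hat ", toy_vocabulary, earlier))   # duplicate
print(classify_guess("rule", toy_vocabulary, earlier))     # score
print(classify_guess("xyzzy", toy_vocabulary, earlier))    # unknown
print(classify_guess("1", toy_vocabulary, earlier))        # exit
```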

**Note: User Play Starts Here**

In [7]:
def print_header():
    print("\n*===============================*")
    print("           SEMANTLE              ")
    print("*===============================*\n")

end_game = False
guesses = []
message = ""

while not end_game:

    print_header()
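    # NOTE: this line reveals the answer; it appears to be kept for demonstration. Remove it for actual play.
    print(f"Target Word: {target_word}\n")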
    print("Nearest Words & Similarity Scores:")
    print(f"  10. - {similarities[9][0]} {similarities[9][1]:.5f}")
    print(f"  100. - {similarities[99][0]} {similarities[99][1]:.5f}")
    print(f"  1000. - {similarities[999][0]} {similarities[999][1]:.5f}")

    print("\n*----- Your Guesses So Far -----*")
    if len(guesses) == 0:
        print("No guesses yet.")
    else:
        for rank, (word, score) in enumerate(guesses, start=1):
            print(f"  {rank}. {word} - {score:.5f}")

    if message:
        print(f"\n>>> {message}")
        message = ""

    guess_word = input("\nYour guess (Enter 1 to Exit): ").strip().lower()

    if guess_word == "1":
        print_header()
        print(f"Better luck next time! The word was '{target_word}' after {len(guesses)} guess(es)!")
        print("\n*----- Final Guesses -----*")
        if len(guesses) == 0:
            print("No guesses.")
        else:
            for rank, (word, score) in enumerate(guesses, start=1):
                print(f"  {rank}. {word} - {score:.5f}")
        end_game = True
        continue

    if guess_word not in model.key_to_index:
        message = f"'{guess_word}' is not in the model's vocabulary. Try another word."
        continue

    if any(word == guess_word for word, score in guesses):
        message = f"You've already guessed '{guess_word}'! Try a different word."
        continue

    similarity_score = model.similarity(target_word, guess_word)
    guesses.append((guess_word, similarity_score))
    guesses.sort(key=lambda x: x[1], reverse=True)

    if guess_word == target_word:
        print_header()
        print(f"Great job! You guessed the word '{target_word}' in {len(guesses)} guess(es)!")
        print("\n*----- Final Guesses -----*")
        for rank, (word, score) in enumerate(guesses, start=1):
            print(f"  {rank}. {word} - {score:.5f}")
        end_game = True
    else:
        message = f"'{guess_word}' has a similarity score of {similarity_score:.5f}"


           SEMANTLE              

Target Word: subjection

Nearest Words & Similarity Scores:
  10. - veiling 0.45311
  100. - francoism 0.38054
  1000. - luminiferous 0.32043

*----- Your Guesses So Far -----*
No guesses yet.

           SEMANTLE              

Target Word: subjection

Nearest Words & Similarity Scores:
  10. - veiling 0.45311
  100. - francoism 0.38054
  1000. - luminiferous 0.32043

*----- Your Guesses So Far -----*
  1. hat - -0.09685

>>> 'hat' has a similarity score of -0.09685

           SEMANTLE              

Target Word: subjection

Nearest Words & Similarity Scores:
  10. - veiling 0.45311
  100. - francoism 0.38054
  1000. - luminiferous 0.32043

*----- Your Guesses So Far -----*
  1. subject - -0.01682
  2. hat - -0.09685

>>> 'subject' has a similarity score of -0.01682

           SEMANTLE              

Target Word: subjection

Nearest Words & Similarity Scores:
  10. - veiling 0.45311
  100. - francoism 0.38054
  1000. - luminiferous 0.32043

*-----