# Leveraging Gen AI for SAT Prep - Semantic Similarity

This notebook shows how I used semantic similarity to pick the best-suited vocabulary word for a given genre, with the aim of reducing hallucination.

Semantic similarity is based on this paper: https://arxiv.org/pdf/2108.06130

The BAAI/bge-large-en-v1.5 embedding model is used to evaluate semantic similarity.

In [1]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding( model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)



The similarity score ranges from 0 to 1, where 0 means no similarity and 1 means high semantic similarity. The threshold is set to 0.5 so that each selected genre has at least one word with a similarity of 0.5 or higher.
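
To make the threshold concrete, here is a minimal sketch of the kind of comparison involved. It assumes the score is the cosine similarity between two embedding vectors and that a score at or above the threshold counts as passing, which is my understanding of the evaluator's default behavior. The three-dimensional vectors and the helper names `cosine_similarity` and `passes` are made up for illustration; real bge-large-en-v1.5 embeddings have 1024 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def passes(score, threshold=0.5):
    """Assumed pass rule: a score at or above the threshold passes."""
    return score >= threshold

# Toy vectors standing in for embeddings (hypothetical values)
genre_vec = [1.0, 0.0, 0.0]
close_vec = [0.8, 0.6, 0.0]   # points mostly in the same direction as genre_vec
far_vec = [0.0, 1.0, 0.0]     # orthogonal to genre_vec

close_score = cosine_similarity(genre_vec, close_vec)
far_score = cosine_similarity(genre_vec, far_vec)
print(close_score, passes(close_score))  # roughly 0.8, passes
print(far_score, passes(far_score))      # 0.0, does not pass
```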

In [2]:
from llama_index.core.evaluation import SemanticSimilarityEvaluator
from llama_index.core.embeddings import resolve_embed_model

evaluator = SemanticSimilarityEvaluator(
    embed_model=embed_model,
    similarity_threshold=0.5,
)

### Word dataset with definition

In [3]:
import pandas as pd
import random
from random import randrange
vocab_df = pd.read_csv('sat_words_with_definition.csv')
print("Sample word: {} ".format(vocab_df.head(5)))

Sample word:         word part_of_speech                                         definition
0      Abate              v  to reduce in amount, degree, or severity; The ...
1      Abhor              v  to hate or detest; She abhors cruelty to animals.
2    Abstain              v  to refrain or hold back voluntarily; He abstai...
3  Accretion              n  a gradual buildup or growth by addition; An ac...
4    Acerbic            adj  sharp, biting, or caustic in tone; She made an... 


### Genre dataset

In [4]:
genre_df = pd.read_csv('sat_genre.csv')
print("Sample genre: {} ".format(genre_df.head(5)))

Sample genre:                          genre
0    Emergence of Homo sapiens
1  Use of fire by early humans
2   Development of stone tools
3      Agricultural Revolution
4     Establishment of Jericho 


In [5]:
import pandas
from pandas import DataFrame 
import time

The following code selects a random genre and 10 random words, then scores the semantic similarity of each genre-word pair. It keeps the word with the highest score, provided at least one word scores 0.5 or above, and uses the three lowest-scoring words as invalid answer choices.
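
The selection step can be sketched on its own with made-up scores, without loading the embedding model. The function name `pick_choices` and the scores below are hypothetical and only illustrate the ranking logic; the notebook cell that follows computes real scores with the evaluator.

```python
def pick_choices(scored_words, invalid_choice_count=3, threshold=0.5):
    """Return (best_word, best_score, invalid_choices),
    or None if no word reaches the threshold."""
    # Rank ascending by score: lowest first, best match last
    ranked = sorted(scored_words, key=lambda pair: pair[1])
    best_word, best_score = ranked[-1]
    if best_score < threshold:
        return None
    # The least similar words become the invalid answer choices
    invalid = [word for word, _ in ranked[:invalid_choice_count]]
    return best_word, best_score, invalid

# Hypothetical similarity scores for one genre
scores = [("abate", 0.31), ("venerate", 0.58), ("wary", 0.22),
          ("opulent", 0.44), ("viable", 0.40)]
result = pick_choices(scores)
print(result)  # ('venerate', 0.58, ['wary', 'abate', 'viable'])
```

Keeping word-score pairs in a list, rather than a dictionary keyed by score, avoids silently dropping a word when two words happen to receive the same score.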

In [6]:
columns = ['genre', 'word', 'similarity_score', 'definition', 'invalid_answer_choices']
test_cases_count = 1000
word_count = 10
invalid_choice_count = 3
rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0
counter = 0
while counter < test_cases_count:
    # pick a random genre
    genre = genre_df['genre'].iloc[randrange(genre_df.shape[0])].lower()

    # pick distinct random rows so the same entry is never drawn twice
    indices = random.sample(range(vocab_df.shape[0]), word_count)
    words = [vocab_df['word'].iloc[i].lower() for i in indices]

    # score every genre-word pair
    scored = []  # (score, word) pairs; a list keeps words with equal scores
    passing_count = 0
    for word in words:
        result = await evaluator.aevaluate(
            response=genre,
            reference=word,
        )
        scored.append((result.score, word))
        if result.passing:
            passing_count += 1
        # print("{},{}".format(word, result.score))

    # we need at least one word at or above the similarity threshold
    if passing_count == 0:
        continue

    # sort ascending by score: the last entry is the best match and
    # the first 3 entries are used as invalid choices
    scored.sort()
    invalid_choices = [word for _, word in scored[:invalid_choice_count]]

    # word with highest similarity
    best_score, selected_word = scored[-1]

    # capture the word definition (first match, in case the word is listed twice)
    matches = vocab_df[vocab_df['word'].str.lower() == selected_word]
    word_definition = matches['definition'].iloc[0]

    # store the row
    rows.append({'genre': genre, 'word': selected_word,
        'similarity_score': best_score, 'definition': word_definition,
        'invalid_answer_choices': invalid_choices})

    print("{} - Genre: {}; Word: {} ".format(counter, genre, selected_word))
    counter += 1

# build the dataframe once all test cases have been collected
output_df = DataFrame(rows, columns=columns)


0 - Genre: formation of the modern atlantic ocean basin; Word: undermine 
1 - Genre: construction of the great wall of china; Word: viable 
2 - Genre: civil rights movement; Word: tribulation 
3 - Genre: development of stone tools; Word: wary 
4 - Genre: evolution of early reptiles; Word: succinct 
5 - Genre: formation of the mediterranean sea; Word: illusion 
6 - Genre: roman basilicas; Word: venerate 
7 - Genre: industrial revolution factories; Word: opulent 
8 - Genre: unification of upper and lower egypt; Word: relegate 
9 - Genre: evolution of first tetrapods (four-limbed vertebrates); Word: stalwart 
10 - Genre: fall of the western roman empire; Word: relegate 
11 - Genre: establishment of early ecosystems on land; Word: intuition 
12 - Genre: permian-triassic extinction event; Word: transient 
13 - Genre: african tribal huts; Word: solitary 
14 - Genre: brutalist structures; Word: solitary 
15 - Genre: formation of the soviet union; Word: rhetoric 
16 - Genre: egyptian pyramids;

In [7]:
# write the dataframe to a csv file
output_df.to_csv('website_eval_word_genre.csv', index=False)