# Leveraging Gen AI for SAT Prep - Prompt Engineering

## Overview

This notebook showcases the prompt engineering techniques I used to generate practice content for mastering SAT vocabulary, especially "Word-In-Context" type questions.

I used a single-GPU A100 machine from Lambda Labs.

We'll start by installing the transformers, torch, and accelerate libraries.

In [1]:
!pip install transformers
!pip install torch
# quote the requirement so the shell does not treat ">" as output redirection
!pip install "accelerate>=0.26.0"

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [2]:
import transformers
import torch
import accelerate
print(transformers.__version__)
print(torch.__version__)
print(accelerate.__version__)

  from .autonotebook import tqdm as notebook_tqdm


4.49.0
2.6.0
1.5.2


In [140]:
import pandas as pd
import random
from random import randrange
from pandas import DataFrame 
import os
import csv
import numpy as np
import traceback
import re
import json
import ast

To run this notebook, you will need a Hugging Face (HF) token, because Llama models are gated and require users to accept Meta’s usage terms. To get access, provide your contact information on the HF model page (https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) and accept the terms; you will receive an email once access has been approved. After that, create a token in your settings tab (https://huggingface.co/settings/tokens). If you are running in Google Colab, store it as a secret: open the "Secrets" tab in the left panel, create a new secret named "HF_TOKEN", and paste your Hugging Face token as the value. Restart your session afterwards.
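
Outside Colab (for example, on the Lambda Labs machine used here), one option is to export the token as an environment variable and read it at runtime, so it never appears in the notebook. The sketch below assumes the variable is named `HF_TOKEN`, matching the secret name above; `get_hf_token` is a hypothetical helper, not part of any library.

```python
import os


def get_hf_token(var_name="HF_TOKEN"):
    """Return the Hugging Face token stored in an environment variable.

    `var_name` defaults to the secret name used in this notebook (an
    assumption for non-Colab setups). Failing early gives a clearer error
    than a rejected model download later on.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the notebook."
        )
    return token
```

The returned value could then be assigned to `access_token` in the download cell instead of a hard-coded string.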

We use the 8B Llama Instruct model (see `model_id` below) due to resource restrictions. If you have a more powerful machine, you can use the 70B model instead.

In [3]:
#model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
model_id="meta-llama/Meta-Llama-3-8B-Instruct"

### SAT Vocabulary Dataset

I created an SAT vocabulary dataset containing 500+ words. You can download the CSV from my Git repository. I used a CSV format that can be converted to a Hugging Face Dataset if need be.

In [4]:
vocab_df = pd.read_csv('sat_vocab.csv')
print("Sample word: {} ".format(vocab_df.head(5)))

Sample word:          word
0       Abate
1  Aberration
2       Abhor
3      Abject
4      Abjure 


### SAT Style Genre for Paragraph Generation

I created a Genre dataset that can be used to instruct the model to create context based on a given Genre.

In [5]:
genre_df = pd.read_csv('sat_genre.csv')
print("Sample genre: {} ".format(genre_df.head(5)))

Sample genre:                          genre
0    Emergence of Homo sapiens
1  Use of fire by early humans
2   Development of stone tools
3      Agricultural Revolution
4     Establishment of Jericho 


We also have a test cases file containing random word and genre combinations, which we will use to evaluate the different experiments and improve generation.

In [6]:
# let's load the test cases file
test_cases_file='test_eval_word_genre.csv'
#test_cases_file='semantic_eval_1000_word_genre.csv'
# pandas is imported as pd above
test_cases = pd.read_csv(test_cases_file)

test_cases.head(5)

Unnamed: 0,genre,word,similarity_score,answer_choices,answer_choices_with_score
0,korean war,impasse,0.560486,{'breakthrough': 'a sudden and significant imp...,"{'sycophant': 0.4524288068094801, 'blanch': 0...."
1,invention of the printing press,catalyst,0.53147,{'inhibitor': 'a substance that prevents or sl...,"{'stagnant': 0.4349112972563168, 'deflate': 0...."
2,9/11 terrorist attacks,resilient,0.545534,{'arbiter': 'a person who has the authority to...,"{'arbiter': 0.39572552196368016, 'prolific': 0..."
3,unification of germany,laud,0.53511,{'obfuscate': ' to make something unclear or d...,"{'obfuscate': 0.4468186325794635, 'acumen': 0...."
4,futurist conceptual designs,whimsical,0.525331,"{'bequeath': "" to give (something) to someone ...","{'bequeath': 0.4239206552097495, 'accost': 0.4..."


## Download Llama Model
The following step downloads the model weights and takes about 8-10 minutes.

You will need a Hugging Face token to download the weights. You can pass the token in the API call, or run "huggingface-cli login" in a terminal window and enter the token there.

In [7]:
from transformers import LlamaForCausalLM, AutoModelForCausalLM, AutoTokenizer
import torch
from huggingface_hub import login

access_token="<your HF Token>"
login(token = access_token)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

cache_dir="/home/ubuntu/Pragyan/model_cache"

model=AutoModelForCausalLM.from_pretrained(model_id, token=access_token, cache_dir=cache_dir).to(device)
# Use the line below if you are loading a fine-tuned model
#model=AutoModelForCausalLM.from_pretrained("<finetuned-model-location>").to(device)
tokenizer= AutoTokenizer.from_pretrained(model_id, token=access_token, cache_dir=cache_dir)

2025-03-19 15:17:32.293401: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1742397452.342906    2754 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742397452.357587    2754 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


cuda:0


Loading checkpoint shards: 100%|██████████| 4/4 [00:31<00:00,  7.83s/it]


Inference configuration: sampling at a very low temperature for near-deterministic output, and returning logits, scores, and hidden states along with the generated tokens.

In [8]:
from transformers import GenerationConfig
generation_config = GenerationConfig(
        # maximum number of new tokens to generate
        max_new_tokens=100,
        # sample only from the 50 most likely tokens
        top_k=50,
        # sample from the distribution instead of using greedy decoding
        do_sample=True,
        # a very low temperature makes sampling nearly deterministic
        temperature=0.001,
        # use the tokenizer's EOS token as the padding token
        pad_token_id=tokenizer.eos_token_id,
        # return the raw (unprocessed) logits for each generated token
        output_logits=True,
        # return the processed scores (after temperature and top-k) for each generated token
        output_scores=True,
        # return the hidden states along with the output
        output_hidden_states=True,
        # return a dict-like output object instead of a plain tensor
        return_dict_in_generate=True,
        # optionally penalize repetition
        #repetition_penalty=1.5
    )

print(generation_config)

GenerationConfig {
  "do_sample": true,
  "max_new_tokens": 100,
  "output_hidden_states": true,
  "output_logits": true,
  "output_scores": true,
  "pad_token_id": 128009,
  "return_dict_in_generate": true,
  "temperature": 0.001
}
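
With `do_sample=True`, the output is only near-deterministic: a very low temperature sharpens the sampling distribution until almost all of the probability mass sits on the top token. The sketch below illustrates the effect with made-up logits (the values are assumptions for illustration, not model output).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    # subtract the maximum before exponentiating for numerical stability
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical logits for three candidate tokens
logits = [2.0, 1.5, 0.5]

for t in (1.0, 0.5, 0.001):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

At a temperature of 0.001 the top token's probability is indistinguishable from 1, so sampling behaves like greedy decoding. Setting `do_sample=False` is the usual way to request greedy decoding explicitly.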



Utility methods for preprocessing, running inference and post processing

In [10]:

paragraph_length=100
replacement_mask='________________'

# function for masking the SAT word in the generated paragraph
def mask_word(word, paragraph):
    return paragraph.replace(word, replacement_mask)

# write output to a file
def write_output_to_file(df, output_file):
    # mode "w" truncates an existing file, so no explicit delete is needed;
    # the context manager flushes and closes the file
    with open(output_file, "w", newline="") as f:
        df.to_csv(f)

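# get the generation probability of each token that is a piece of the SAT word
def get_probability_score(generated_tokens, transition_scores, word):
    prob = []
    for tok, score in zip(generated_tokens[0], transition_scores[0]):
        s = str(tokenizer.decode(tok)).strip()
        # skip empty strings, which would otherwise match every word
        if s and s in word:
            # transition scores are log-probabilities, so exp() recovers the probability
            prob.append(f"{tokenizer.decode(tok)}, {np.exp(score.numpy(force=True)):.2%}, {tok:5d} ")
    return prob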
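
`mask_word` uses plain substring replacement, so it is case-sensitive and would also blank out the word inside a longer one (for example, `laud` inside `applaud`). A possible alternative, sketched below under the assumption that only whole-word matches should be masked, uses a regular expression; `mask_whole_word` is a hypothetical name.

```python
import re

MASK = "________________"


def mask_whole_word(word, paragraph, mask=MASK):
    """Replace whole-word, case-insensitive occurrences of `word` with `mask`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.sub(mask, paragraph)


print(mask_whole_word("laud", "Critics laud the plan; crowds applaud it."))
```

Note that inflected forms such as `lauded` are left unmasked by this pattern; whether they should be masked is a design choice that depends on how the question is presented.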


In [79]:
def format_output(generated_text, word):
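    # if the word does not appear as a whole word, ignore the response
    # (a word-boundary match also accepts the word when followed by punctuation)
    try: 
        if re.search(r'\b' + re.escape(word) + r'\b', generated_text) is None:
            return [False, generated_text, "SAT word not included in the paragraph"]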
        
        generated_text = re.sub(r'\r?\n',' ', generated_text)
        generated_text = re.sub(r' +',' ', generated_text).strip()
        
        # format question
        if generated_text.find("The paragraph should be") != -1:
            sp = generated_text.split('.')
            sp.pop(0)
            generated_text = ".".join(sp)
        
        if generated_text.find("Here is the paragraph:") != -1:
            sp = generated_text.split(':')
            sp.pop(0)
            generated_text = ":".join(sp)
        
        if generated_text.find("The Answer is:") != -1:
            sp = generated_text.split(':')
            sp.pop(0)
            generated_text = ":".join(sp)
            
        if not generated_text.endswith("."):
            sp = generated_text.split('.')
            sp.pop(len(sp)-1)
            generated_text = ".".join(sp)
            generated_text = generated_text + "."
    except Exception:
        traceback.print_exc()
        return [False, generated_text, "Error when formatting"]
        
    return [True, generated_text]
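
Because generation stops after a fixed number of new tokens, the last sentence is often cut off, which is why `format_output` drops everything after the final period. The same idea in isolation, as a minimal sketch: `trim_to_last_sentence` is a hypothetical helper, and treating only `.` as a sentence terminator is a simplifying assumption carried over from the function above.

```python
def trim_to_last_sentence(text):
    """Drop a trailing fragment that does not end with a period."""
    text = text.strip()
    if text.endswith("."):
        return text
    last_period = text.rfind(".")
    if last_period == -1:
        # no complete sentence at all; return an empty string
        return ""
    return text[: last_period + 1]


print(trim_to_last_sentence("The war reached an impasse. Negotiations then"))
```

Abbreviations and decimal numbers also contain periods, so a production version would likely need a proper sentence splitter.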

In [141]:
def validate_generated_text(word, text):
    validation_prompt = "Is the word {} used properly in the following paragraph, yes or no? Here is the paragraph: {}".format(word, text)
    validation_output = run_inference(validation_prompt, 150)
    summary_prompt = "Summarize the following paragraph in one word, yes or no? Here is the paragraph: {}".format(validation_output[0])
    summary_output = run_inference(summary_prompt, 150)
    valid_usage = False
    if summary_output[0].lower().find("yes") != -1:
        valid_usage = True
    return [valid_usage, validation_prompt, validation_output[0], summary_prompt, summary_output[0]]


    
# function for running inference against the model
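def run_inference(prompt, paragraph_length):
    inputs = tokenizer([prompt], return_tensors="pt").to(device)
    # paragraph_length overrides max_new_tokens from generation_config for this call
    outputs=model.generate(**inputs, generation_config=generation_config, max_new_tokens=paragraph_length)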
    transition_scores = model.compute_transition_scores(outputs.sequences, outputs.scores, normalize_logits=True)  
    input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
    complete_text=''
    for t in outputs.sequences:
        complete_text += tokenizer.decode(t)
        
    generated_tokens = outputs.sequences[:,input_length:]
    generated_text = ''
    for t in generated_tokens:
        generated_text += tokenizer.decode(t)
        
    return [generated_text, generated_tokens, transition_scores, complete_text]

# Utility function for running experiment in a batch fashion
def run_experiment(input_prompt, test_cases_df, num_tests, select_random=False, add_word_meaning=False):
    output_df = DataFrame(columns=['word', 'genre', 'semantic_score', 'is_valid',
                'formatted_text', 'validation_output', 'summary_output',
                'prompt', 'answer_choices', 'validation_prompt', 'summary_prompt',
                'probability', 'answer_choices_with_score', 
                'full_word_tokenized', 'complete_response', 'generated_text'])
    for index, row in test_cases_df.iterrows():
        word = ""
        genre = ""
        final_prompt = ""
        
        if select_random:
            word = (vocab_df['word'][randrange(vocab_df.shape[0])]).lower()
            genre = (genre_df['genre'][randrange(genre_df.shape[0])]).lower()
        else:
            word = row['word'].lower()
            genre = row['genre'].lower()
        
        if add_word_meaning:
            word_def = row['answer_choices']
            d = ast.literal_eval(word_def)
            word_meaning = d[word]
            final_prompt = input_prompt.format(word, word_meaning, genre, word, word)
        else:
            final_prompt = input_prompt.format(genre, word)
            
        output = run_inference(final_prompt, paragraph_length)
        para = output[0]
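        # compute token probabilities first so the check below has data to inspect
        probability = get_probability_score(output[1], output[2], word)
        answer_choices = ""
            
        # True when the whole word was generated as a single token
        full_word_tokenized = False
        tokens_string = str(probability)
        if word in tokens_string:
            full_word_tokenized = True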
            
        formatted_text = ""
        is_valid= False
        validation_output=""
        validation_prompt=""
        summary_prompt=""
        summary_output=""
        if word in para:
            # format output
            formatted_output = format_output(para, word)
            formatting_status = formatted_output[0]
            formatted_text = formatted_output[1]

            # validated word usage
            validation = validate_generated_text(word, formatted_text)
            validation_prompt = validation[1]
            validation_output = validation[2]
            is_valid = validation[0]
            summary_prompt = validation[3]
            summary_output = validation[4]
            formatted_text = mask_word(word, formatted_text)
            probability = get_probability_score(output[1], output[2], word)
        else:
            probability = "SAT word not included in the paragraph"
            answer_choices = "N/A"
            
        print("Processing {}; word: {}; genre: {}; validity status: {}".format(index+1, word, genre, is_valid))
        
        new_row = DataFrame([{
                'word': word, 'genre': genre, 'semantic_score': row['similarity_score'], 'is_valid':is_valid,
                'formatted_text': formatted_text, 'validation_output': validation_output, 'summary_output': summary_output,
                'prompt': final_prompt, 'answer_choices': row['answer_choices'], 'validation_prompt': validation_prompt,
                'summary_prompt':summary_prompt, 'probability':  probability, 'answer_choices_with_score': row['answer_choices_with_score'], 
                'full_word_tokenized': full_word_tokenized,
                'complete_response': output[3], 'generated_text': para, 
                }])
        # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
        output_df = pd.concat([output_df, new_row], ignore_index=True)
        
        if (index + 1) >= num_tests:
            break
    return output_df
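
`validate_generated_text` treats the response as positive if the substring `yes` appears anywhere in it, which would also match words such as `eyes`. A stricter check, sketched here as an assumption about what a more robust parser could look like (`parse_yes_no` is hypothetical), takes the first standalone `yes` or `no` in the response:

```python
import re


def parse_yes_no(response):
    """Return True/False for the first standalone yes/no, or None if absent."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None
    return match.group(1) == "yes"


print(parse_yes_no("Yes, the word is used correctly."))
print(parse_yes_no("It caught the reader's eyes."))
```

Returning `None` for an unparseable response keeps "the model did not answer" separate from "the model said no", which can be useful when reviewing failures.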

## Experiment 1: Basic Prompt

In [120]:
basic_output_file='output_basic.csv'
basic_prompt = "Generate a paragraph on {} with the word {} in it."
num_tests=100
# select_random=True draws a random word and genre instead of using the test case row
df = run_experiment(basic_prompt, test_cases, num_tests, select_random=True)
print()
print("Generated Text: {}".format(df['generated_text'][0]))
print()
print("Answer Choices: {}".format(df['answer_choices'][0]))
print()
print("Word Generation Probability: {}".format(df['probability'][0]))

# write the output to file
write_output_to_file(df, basic_output_file)

Processing 1; word: didactic; genre: the history of skateboarding
Processing 2; word: jeopardize; genre: cooling of earth's crust
Processing 3; word: reticent; genre: the history of speed skating
Processing 4; word: stipulate; genre: earth's magnetic pole reversals
Processing 5; word: concur; genre: indian independence
Processing 6; word: revamp; genre: victorian mansions
Processing 7; word: yearning; genre: appearance of the first marsupial mammals
Processing 8; word: indignant; genre: evolution of early reptiles
Processing 9; word: defamation; genre: the history of bowling
Processing 10; word: finesse; genre: invention of the smartphone
Processing 11; word: loathe; genre: the history of track and field
Processing 12; word: relish; genre: widespread volcanic activity creating basalt plateaus
Processing 13; word: acquiesce; genre: development of the internet
Processing 14; word: nefarious; genre: emergence of early amphibians
Processing 15; word: defiant; genre: beaux-arts train statio

## Experiment 2: Zero-shot Prompt Engineering

In [116]:
zero_shot_output_file='output_zero_shot.csv'
zero_shot='''Question: Generate a paragraph on The Iguazu Falls with the word revered in it. 
The Answer is: Insert Paragraph here 
Generate a paragraph on {} with the word {} in it.'''
num_tests=100
df = run_experiment(zero_shot, test_cases, num_tests, False)
#print()
#print("Generated Text: {}".format(df['generated_text'][0]))
#print()
#print("Answer Choices: {}".format(df['answer_choices'][0]))
#print()
#print("Word Generation Probability: {}".format(df['probability'][0]))

# write the output to file
write_output_to_file(df, zero_shot_output_file)

Processing 1; word: impasse; genre: korean war
Processing 2; word: catalyst; genre: invention of the printing press
Processing 3; word: resilient; genre: 9/11 terrorist attacks
Processing 4; word: laud; genre: unification of germany
Processing 5; word: whimsical; genre: futurist conceptual designs
Processing 6; word: engross; genre: the history of curling
Processing 7; word: enrapture; genre: deconstructivist museums
Processing 8; word: exigent; genre: formation of the grand canyon
Processing 9; word: quandary; genre: advent of quantum computing
Processing 10; word: anachronistic; genre: the history of ice hockey
Processing 11; word: ubiquitous; genre: development of modern ocean currents
Processing 12; word: temptation; genre: persian wars
Processing 13; word: dilapidated; genre: african tribal huts
Processing 14; word: gregarious; genre: emergence of early bats
Processing 15; word: copious; genre: formation of the mediterranean sea
Processing 16; word: anomaly; genre: first tectonic 

## Experiment 3: Few-shot Prompt Engineering

I use few-shot prompting to teach the model the output format we would like: we add a specific example of the desired output for the model to follow.

We evaluate the validity of the test questions by checking whether the tokenizer has the word and by manually checking that the generated paragraph is meaningful, in other words, that the model is not hallucinating.
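
The few-shot template in the next cell hard-codes one worked example. As a sketch of how the same layout could be assembled from any number of examples, the hypothetical helper `build_few_shot_prompt` joins (genre, word, paragraph) triples in the `Question: ... The Answer is: ...` format and appends the open request:

```python
def build_few_shot_prompt(examples, genre, word):
    """Assemble a few-shot prompt from (genre, word, paragraph) examples."""
    request = "Generate a paragraph on {} with the word {} in it."
    parts = []
    for ex_genre, ex_word, ex_paragraph in examples:
        parts.append("Question: " + request.format(ex_genre, ex_word))
        parts.append("The Answer is: " + ex_paragraph)
    # the final, unanswered request is what the model should complete
    parts.append(request.format(genre, word))
    return "\n".join(parts)


examples = [
    (
        "The Iguazu Falls",
        "revered",
        "The Iguazu Falls, which lie on the border between Argentina and "
        "Brazil, are a popular and revered tourist destination.",
    ),
]
print(build_few_shot_prompt(examples, "korean war", "impasse"))
```

Keeping the examples in a list makes it straightforward to test whether adding a second or third example changes the validity rate.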

In [146]:
few_shot_output_file='output_few_shot_output.csv'
few_shot='''Question: Generate a paragraph on The Iguazu Falls with the word revered in it. 
The Answer is: The Iguazu Falls, which lie on the border between Argentina and Brazil, are a popular and revered tourist destination. The waterfalls have been visited by people from all over the world for over a century, and they are often cited as one of the most impressive natural wonders on Earth. 
Generate a paragraph on {} with the word {} in it.'''
num_tests=100
df = run_experiment(few_shot, test_cases, num_tests, False) 
#print()
#print("Generated Text: {}".format(df['generated_text'][0]))
#print()
#print("Answer Choices: {}".format(df['answer_choices'][0]))
#print()
#print("Word Generation Probability: {}".format(df['probability'][0]))

# write the output to file
write_output_to_file(df, few_shot_output_file)

Processing 1; word: impasse; genre: korean war; validity status: True
Processing 2; word: catalyst; genre: invention of the printing press; validity status: True
Processing 3; word: resilient; genre: 9/11 terrorist attacks; validity status: False
Processing 4; word: laud; genre: unification of germany; validity status: True
Processing 5; word: whimsical; genre: futurist conceptual designs; validity status: True
Processing 6; word: engross; genre: the history of curling; validity status: False
Processing 7; word: enrapture; genre: deconstructivist museums; validity status: False
Processing 8; word: exigent; genre: formation of the grand canyon; validity status: True
Processing 9; word: quandary; genre: advent of quantum computing; validity status: False
Processing 10; word: anachronistic; genre: the history of ice hockey; validity status: True
Processing 11; word: ubiquitous; genre: development of modern ocean currents; validity status: True
Processing 12; word: temptation; genre: persi

## Experiment 4: Role based Prompt Engineering

In [147]:
role_based_output_file='output_role_based.csv'
role_based_shot='''You are a teacher writing a research paper on a certain subject. Your task is to write a 100 words paragraph on the subject, however, you must include a given word in the paragraph.
Generate a paragraph on the subject {} with the word {} in it.'''
num_tests=100
df = run_experiment(role_based_shot, test_cases, num_tests, False) 
#print()
#print("Generated Text: {}".format(df['generated_text'][0]))
#print()
#print("Answer Choices: {}".format(df['answer_choices'][0]))
#print()
#print("Word Generation Probability: {}".format(df['probability'][0]))

# write the output to file
write_output_to_file(df, role_based_output_file)

Processing 1; word: impasse; genre: korean war; validity status: False
Processing 2; word: catalyst; genre: invention of the printing press; validity status: True
Processing 3; word: resilient; genre: 9/11 terrorist attacks; validity status: False
Processing 4; word: laud; genre: unification of germany; validity status: True
Processing 5; word: whimsical; genre: futurist conceptual designs; validity status: True
Processing 6; word: engross; genre: the history of curling; validity status: True
Processing 7; word: enrapture; genre: deconstructivist museums; validity status: True
Processing 8; word: exigent; genre: formation of the grand canyon; validity status: False
Processing 9; word: quandary; genre: advent of quantum computing; validity status: True
Processing 10; word: anachronistic; genre: the history of ice hockey; validity status: False
Processing 11; word: ubiquitous; genre: development of modern ocean currents; validity status: False
Processing 12; word: temptation; genre: pers

## Experiment 5: Chain of Thought Prompt Engineering

In [149]:
cot_output_file='output_chain_of_thought.csv'
cot_shot='''You are a teacher writing a research paper on a certain subject. Your task is to write a 100 words paragraph on the subject, however, you must include a given word in the paragraph.
1. know the meaning of the word
2. identify salient details about the subject
3. weave one or two salient details and the word into a 100 words paragraph

Generate a paragraph on the subject {} with the word {} in it.'''
num_tests=100
df = run_experiment(cot_shot, test_cases, num_tests, False) 
#print()
#print("Generated Text: {}".format(df['generated_text'][0]))
#print()
#print("Answer Choices: {}".format(df['answer_choices'][0]))
#print()
#print("Word Generation Probability: {}".format(df['probability'][0]))

# write the output to file
write_output_to_file(df, cot_output_file)

Processing 1; word: impasse; genre: korean war; validity status: True
Processing 2; word: catalyst; genre: invention of the printing press; validity status: False
Processing 3; word: resilient; genre: 9/11 terrorist attacks; validity status: False
Processing 4; word: laud; genre: unification of germany; validity status: True
Processing 5; word: whimsical; genre: futurist conceptual designs; validity status: True
Processing 6; word: engross; genre: the history of curling; validity status: False
Processing 7; word: enrapture; genre: deconstructivist museums; validity status: True
Processing 8; word: exigent; genre: formation of the grand canyon; validity status: True
Processing 9; word: quandary; genre: advent of quantum computing; validity status: True
Processing 10; word: anachronistic; genre: the history of ice hockey; validity status: False
Processing 11; word: ubiquitous; genre: development of modern ocean currents; validity status: True
Processing 12; word: temptation; genre: persi

## Experiment 6: Retrieval Augmented Generation
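
Here the retrieval step is a lookup: with `add_word_meaning=True`, `run_experiment` parses the `answer_choices` column (a dictionary stored as a string) and injects the target word's definition into the prompt. A minimal sketch of that lookup follows; the sample cell value is made up for illustration, the prompt is a shortened version of the one used in this experiment, and `lookup_meaning` is a hypothetical helper.

```python
import ast


def lookup_meaning(answer_choices, word):
    """Parse a dict literal of word -> definition and return one definition."""
    definitions = ast.literal_eval(answer_choices)
    # normalize keys and values so lookups ignore case and stray whitespace
    normalized = {k.strip().lower(): v.strip() for k, v in definitions.items()}
    return normalized.get(word.strip().lower())


# hypothetical cell value in the same shape as the answer_choices column
row_value = "{'impasse': ' a situation in which no progress is possible', 'laud': 'to praise'}"

prompt = "Given the meaning of the word {}: {}; Generate a paragraph on {} with the word {} in it."
meaning = lookup_meaning(row_value, "Impasse")
print(prompt.format("impasse", meaning, "korean war", "impasse"))
```

Returning `None` for a missing word lets the caller fall back to a prompt without a definition instead of raising a `KeyError`.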

In [145]:
rag_shot_output_file='output_rag_shot_output.csv'
rag_based_shot='''Given the meaning of the word {}: {}; Generate a paragraph on {} with the word {} in it. Make sure your paragraph is one single paragraph that is formally worded. If you are done generating the paragraph, stop. Make sure to use the given word {} as is and only use it once.'''
num_tests=100
df = run_experiment(rag_based_shot, test_cases, num_tests, False, add_word_meaning=True) 
#print()
#print("Generated Text: {}".format(df['generated_text'][0]))
#print()
#print("Answer Choices: {}".format(df['answer_choices'][0]))
#print()
#print("Word Generation Probability: {}".format(df['probability'][0]))

# write the output to file
write_output_to_file(df, rag_shot_output_file)

Processing 1; word: impasse; genre: korean war; validity status: False
Processing 2; word: catalyst; genre: invention of the printing press; validity status: True
Processing 3; word: resilient; genre: 9/11 terrorist attacks; validity status: False
Processing 4; word: laud; genre: unification of germany; validity status: True
Processing 5; word: whimsical; genre: futurist conceptual designs; validity status: False
Processing 6; word: engross; genre: the history of curling; validity status: False
Processing 7; word: enrapture; genre: deconstructivist museums; validity status: True
Processing 8; word: exigent; genre: formation of the grand canyon; validity status: True
Processing 9; word: quandary; genre: advent of quantum computing; validity status: True
Processing 10; word: anachronistic; genre: the history of ice hockey; validity status: True
Processing 11; word: ubiquitous; genre: development of modern ocean currents; validity status: False
Processing 12; word: temptation; genre: pers
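
To compare experiments, the `is_valid` column of each output file can be reduced to a single validity rate. The sketch below reads CSV text with the standard `csv` module; the sample rows are invented for illustration and are not results from this notebook, and `validity_rate` is a hypothetical helper.

```python
import csv
import io


def validity_rate(csv_text):
    """Return the fraction of rows whose is_valid column reads 'True'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    valid = sum(1 for row in rows if row["is_valid"].strip() == "True")
    return valid / len(rows)


# invented sample with only the columns needed for the tally
sample = "word,genre,is_valid\nimpasse,korean war,True\nlaud,unification of germany,False\n"
print(validity_rate(sample))
```

For a real run, the text of a file such as `output_few_shot_output.csv` could be passed in after reading it with `open(...).read()`.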