# Leveraging Gen AI for SAT Prep - Prompt Engineering

## Overview

This notebook showcases the prompt engineering techniques I used to generate content suited to mastering SAT vocabulary, especially for "Word-In-Context" type questions.

I used a single-GPU A100 machine from Lambda Labs.

We'll start by installing the transformers, torch, and accelerate libraries.

In [None]:
!pip install transformers
!pip install torch
!pip install "accelerate>=0.26.0"

In [None]:
import transformers
import torch
import accelerate
print(transformers.__version__)
print(torch.__version__)
print(accelerate.__version__)

In [None]:
import pandas as pd
import random
from random import randrange
from pandas import DataFrame 
import os
import csv
import numpy as np
import traceback
import re
import json
import ast
import decimal

To run this notebook, you will need a Hugging Face (HF) token, because Llama models are gated and require users to accept Meta’s usage terms. To get access, provide your contact information on the HF model page (https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) and accept the terms; you will receive an email once access has been approved. Then create a token in your settings tab (https://huggingface.co/settings/tokens). To set it as a secret in Google Colab, open the "Secrets" tab in the left panel, create a new secret named "HF_TOKEN", and paste your Hugging Face token as the value. Restart your session afterwards.
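
Outside Colab, one way to keep the token out of the notebook itself is to read it from an environment variable. The sketch below assumes the token has been exported as `HF_TOKEN` (the same name as the Colab secret); the helper name `get_hf_token` is hypothetical and not part of any library.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable.

    `env_var` defaults to HF_TOKEN, matching the secret name used above.
    This helper is illustrative only.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting the notebook."
        )
    return token
```

The returned value can then be passed wherever the notebook expects the token string.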

We use the 8B Llama Instruct model due to resource restrictions. If you have a more powerful machine, you can use the 70B model instead.

In [None]:
#model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
model_id="meta-llama/Meta-Llama-3-8B-Instruct"

### SAT Vocabulary Dataset

I created an SAT vocabulary dataset containing 500+ words. You can download the CSV from my Git repository. I used a CSV format that can be converted to Hugging Face Datasets if need be.

In [None]:
vocab_df = pd.read_csv('sat_vocab.csv')
print("Sample word: {} ".format(vocab_df.head(5)))

### SAT Style Genre for Paragraph Generation

I created a Genre dataset that can be used to instruct the model to create context based on a given Genre.

In [None]:
genre_df = pd.read_csv('sat_genre.csv')
print("Sample genre: {} ".format(genre_df.head(5)))

We have a test cases file containing random word and genre combinations, which we will use to evaluate the different experiments and improve the generation.
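
As an illustration of how random word and genre combinations can be drawn repeatably, here is a minimal sketch. It is not the script that produced the test cases file: the function name, the seed, and the toy word and genre lists are all assumptions. The real file also carries `similarity_score`, `answer_choices`, and `answer_choices_with_score` columns, which this sketch does not produce.

```python
import random


def sample_word_genre_pairs(words, genres, n, seed=42):
    """Draw n (word, genre) pairs with a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    return [
        {"word": rng.choice(words), "genre": rng.choice(genres)}
        for _ in range(n)
    ]


# Toy inputs standing in for the vocabulary and genre CSVs (made-up values).
toy_words = ["revered", "candid", "austere"]
toy_genres = ["natural wonders", "civil rights history"]

pairs = sample_word_genre_pairs(toy_words, toy_genres, n=5)
for pair in pairs:
    print(pair)
```

Fixing the seed means every experiment is evaluated on the same combinations, which keeps comparisons between prompts fair.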

In [None]:
# let's load the test cases file
test_cases_file='test_eval_word_genre.csv'
#test_cases_file='semantic_eval_1000_word_genre.csv'
test_cases = pd.read_csv(test_cases_file)

test_cases.head(5)

## Download Llama Model
The following step downloads the model weights and takes about 8-10 minutes.

You will need a Hugging Face token to download the weights. You can pass the token in the API call, or run "huggingface-cli login" in a terminal window and enter the token there.

In [None]:
from transformers import LlamaForCausalLM, AutoModelForCausalLM, AutoTokenizer
import torch
from huggingface_hub import login

access_token="<Your HF Token>"
login(token = access_token)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

cache_dir="/home/ubuntu/Pragyan/model_cache"

model=AutoModelForCausalLM.from_pretrained(model_id, token=access_token, cache_dir=cache_dir).to(device)
#Use the line below if you are loading a finetuned model
#model=AutoModelForCausalLM.from_pretrained("<finetuned-model-location>").to(device)
tokenizer= AutoTokenizer.from_pretrained(model_id, token=access_token, cache_dir=cache_dir)

Inference configuration: near-deterministic output (sampling at a very low temperature), with logits, scores, and hidden states returned alongside the generated tokens.

In [None]:
from transformers import GenerationConfig
generation_config = GenerationConfig(
        # maximum number of new tokens to generate
        max_new_tokens=100,
        # sample only from the top k most likely tokens
        top_k=50,
        # sample from the distribution instead of using greedy decoding
        do_sample=True,
        # controls randomness; a value this low makes sampling nearly deterministic
        temperature=0.01,
        # use the EOS token as the padding token
        pad_token_id=tokenizer.eos_token_id,
        # return the raw (unprocessed) logits for each generated token
        output_logits=True,
        # return the processed prediction scores for each generated token
        output_scores=True,
        # return the hidden states along with the output
        output_hidden_states=True,
        # return the output as a dict-like object instead of a plain tensor
        return_dict_in_generate=True,
        # reduce repetition
        #repetition_penalty=1.5
    )

print(generation_config)
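
To see why `do_sample=True` with `temperature=0.01` behaves almost like greedy decoding, the sketch below applies top-k filtering and temperature scaling to a handful of made-up logits in plain Python. It is a simplified illustration of the idea, not the transformers implementation; the function name and the toy logits are assumptions.

```python
import math


def top_k_temperature_probs(logits, top_k, temperature):
    """Return sampling probabilities after top-k filtering and temperature scaling.

    Tokens outside the top_k highest logits get probability 0; the rest are
    a softmax over logits / temperature.
    """
    # Indices of the top_k largest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = {i: logits[i] / temperature for i in keep}
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled.values())
    exps = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(exps.values())
    return [exps.get(i, 0.0) / total for i in range(len(logits))]


toy_logits = [2.0, 1.5, 0.5, -1.0]  # made-up scores for four tokens
print(top_k_temperature_probs(toy_logits, top_k=3, temperature=1.0))
print(top_k_temperature_probs(toy_logits, top_k=3, temperature=0.01))
```

At temperature 1.0 the three surviving tokens share the probability mass; at 0.01 virtually all of it lands on the highest-scoring token, so repeated runs produce nearly identical text.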

Utility methods for preprocessing, running inference and post processing

In [None]:

paragraph_length=100
replacement_mask='________________'

# function for masking the SAT word from the generated paragraph
def mask_word(word, paragraph):
  return paragraph.replace(word, replacement_mask)

# write output to a file
def write_output_to_file(df, output_file):
    # delete output file, if it exists
    if os.path.exists(output_file):
        os.remove(output_file)

    # write the results to file; the context manager closes the file on exit
    with open(output_file, "w", newline="") as f:
        df.to_csv(f)

# get per-token log-probability scores for the generated tokens
def get_probability_score(generated_tokens, transition_scores):
    prob = []
    for tok, score in zip(generated_tokens[0], transition_scores[0]):
        prob.append(f"{tokenizer.decode(tok)}, {score}, {tok:5d} ")
    return prob
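
`mask_word` uses a plain, case-sensitive `str.replace`, so it can miss a capitalized occurrence and can also mask the word where it appears inside a longer word. A possible alternative, sketched below with a hypothetical helper name, matches whole words only and ignores case.

```python
import re

MASK = "________________"


def mask_whole_word(word, paragraph, mask=MASK):
    """Mask only whole-word, case-insensitive occurrences of `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(mask, paragraph)


text = "Candid photos made the candidate seem approachable."
print(mask_whole_word("candid", text))
```

The trade-off is that whole-word matching leaves inflected forms unmasked, which is the case `format_output` reports as "Word included in a different form".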


In [None]:
def format_output(generated_text, word):
    # if word not included, ignore the response
    comment = ""
    try: 
        if not generated_text.endswith("."):
            sp = generated_text.split('.')
            sp.pop(len(sp)-1)
            generated_text = ".".join(sp)
            generated_text = generated_text + "."

        word_in_between = generated_text.find(" "+word+" ")
        word_ending_with_period = generated_text.find(" "+word+".")
        word_ending_with_comma = generated_text.find(" "+word+",")

        if(word_in_between == -1 and word_ending_with_period == -1
           and word_ending_with_comma == -1):
            if generated_text.find(word) == -1:
                return [False, generated_text, "Word not included"]
            else:
                comment = "Word included in a different form"

        generated_text = re.sub(r'\r?\n',' ', generated_text)
        generated_text = re.sub(r' +',' ', generated_text).strip()
        
        # format question
        if generated_text.find("The paragraph should be") != -1:
            sp = generated_text.split('.')
            sp.pop(0)
            generated_text = ".".join(sp)
        
        if generated_text.find("Here is the paragraph:") != -1:
            sp = generated_text.split(':')
            sp.pop(0)
            generated_text = ":".join(sp)
        
        if generated_text.find("The Answer is:") != -1:
            sp = generated_text.split(':')
            sp.pop(0)
            generated_text = ":".join(sp)
            
    except Exception:
        traceback.print_exc()
        return [False, generated_text, "Error when formatting"]
        
    return [True, generated_text, comment]
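
The first step of `format_output` drops a trailing sentence fragment, which appears when generation stops at the token limit. The same idea can be sketched with `rfind`. The helper name is hypothetical, and unlike the function above it returns the text unchanged when there is no period at all.

```python
def trim_to_last_sentence(text):
    """Drop a trailing sentence fragment, keeping text up to the last period.

    Returns the input unchanged when it contains no period at all.
    """
    text = text.strip()
    last_period = text.rfind(".")
    if last_period == -1:
        return text
    return text[: last_period + 1]


print(trim_to_last_sentence("The falls are revered. Visitors arrive every"))
# The falls are revered.
```

Like the original, this treats only "." as a sentence boundary, so it does not account for "?" or "!" endings or for abbreviations.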

In [None]:
def validate_generated_text(word, text):
    validation_prompt = "Is {} the most logical and precise word in following paragraph? Response should be, answer: yes or answer:no. Here is the paragraph: {}".format(word, text) 

    validation_output = run_inference(validation_prompt, 150)
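    valid_usage = False
    # match case-insensitively and tolerate "answer:no" as well as "answer: no",
    # since the prompt itself asks for the lowercase forms
    response = re.sub(r"answer:\s*", "answer: ", validation_output[0].lower())
    if response.find("answer: yes") != -1:
        valid_usage = True
    if response.find("answer: no") != -1:
        valid_usage = False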
    return [valid_usage, validation_prompt, validation_output[0]]


    
# function for running inference against the model
def run_inference(prompt, paragraph_length):
    inputs = tokenizer([prompt], return_tensors="pt").to(device)
    outputs=model.generate(**inputs, generation_config=generation_config)
    transition_scores = model.compute_transition_scores(outputs.sequences, outputs.scores, normalize_logits=True)  
    input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
    complete_text=''
    for t in outputs.sequences:
        complete_text += tokenizer.decode(t)
        
    generated_tokens = outputs.sequences[:,input_length:]
    generated_text = ''
    for t in generated_tokens:
        generated_text += tokenizer.decode(t)
        
    return [generated_text, generated_tokens, transition_scores, complete_text]

# Utility function for running experiment in a batch fashion
def run_experiment(input_prompt, test_cases_df, num_tests, select_random=False, add_word_meaning=False):
    output_df = DataFrame(columns=['word', 'genre', 'semantic_score', 'is_valid', 'error_text', 'transition_score',
                'formatted_text', 'validation_output','probability', 
                'prompt', 'answer_choices', 'validation_prompt', 
                'answer_choices_with_score', 'complete_response', 'generated_text'])
    for index, row in test_cases_df.iterrows():
        word = ""
        genre = ""
        final_prompt = ""
        
        if select_random:
            word = (vocab_df['word'][randrange(vocab_df.shape[0])]).lower()
            genre = (genre_df['genre'][randrange(genre_df.shape[0])]).lower()
        else:
            word = row['word'].lower()
            genre = row['genre'].lower()
       
        if add_word_meaning:
            word_def = row['answer_choices']
            d = ast.literal_eval(word_def)
            word_meaning = d[word]
            final_prompt = input_prompt.format(word, word_meaning, genre, word, word)
        else:
            final_prompt = input_prompt.format(genre, word)
            
        output = run_inference(final_prompt, paragraph_length)
        para = output[0]
        probability = []
        answer_choices = ""
            
        # format output
        formatted_output = format_output(para, word)
        is_valid = formatted_output[0]
        formatted_text = formatted_output[1]
        error_text = formatted_output[2]
        validation_output=""
        validation_prompt=""
        transition_score=""
        if is_valid:
            # validated word usage
            validation = validate_generated_text(word, formatted_text)
            is_valid = validation[0]
            if not is_valid:
                error_text = 'Automated Validation Failed'
            validation_prompt = validation[1]
            validation_output = validation[2]
            formatted_text = mask_word(word, formatted_text)
            transition_score = output[2].sum().cpu().item()
            transition_score=f"{transition_score:.8f}"
            probability = get_probability_score(output[1], output[2])
        else:
            error_text = formatted_output[2]
            answer_choices = "N/A"
            
        print("Processing {}; word: {}; genre: {}; validity status: {}; transition score: {}".format(index+1, word, genre, is_valid, transition_score))
        
        # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
        new_row = {
                'word': word, 'genre': genre, 'semantic_score': row['similarity_score'], 'is_valid':is_valid, 'error_text':error_text,
                'transition_score': transition_score, 'formatted_text': formatted_text, 'validation_output': validation_output, 
                'probability': probability, 'prompt': final_prompt, 'answer_choices': row['answer_choices'], 'validation_prompt': validation_prompt,
                'answer_choices_with_score': row['answer_choices_with_score'], 'complete_response': output[3],
                'generated_text': para, 
                }
        output_df = pd.concat([output_df, DataFrame([new_row])], ignore_index=True)
        
        if (index + 1) >= num_tests:
            break
    return output_df
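
`run_experiment` stores `output[2].sum()` as `transition_score`. Assuming `compute_transition_scores(..., normalize_logits=True)` returns per-token log-probabilities, as the transformers documentation describes, that sum is the log-probability of the whole generated sequence under the distribution actually sampled from. The sketch below uses made-up numbers and hypothetical helper names to show how such scores convert to probabilities.

```python
import math


def sequence_log_prob(token_log_probs):
    """Sum per-token log-probabilities into one sequence-level score."""
    return sum(token_log_probs)


def mean_token_prob(token_log_probs):
    """Geometric-mean token probability, which is comparable across lengths."""
    return math.exp(sum(token_log_probs) / len(token_log_probs))


toy_scores = [-0.10, -0.25, -0.05]  # made-up log-probabilities for three tokens
total = sequence_log_prob(toy_scores)
print(f"sequence log-prob: {total:.2f}")
print(f"sequence probability: {math.exp(total):.4f}")
print(f"mean token probability: {mean_token_prob(toy_scores):.4f}")
```

Because the raw sum grows more negative with every extra token, a length-normalized figure such as the mean token probability may be easier to compare across paragraphs of different lengths.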

## Experiment 1: Basic Prompt

In [None]:
basic_output_file='output_basic.csv'
basic_prompt = "Generate a paragraph on {} with the word {} in it."
num_tests=100
df = run_experiment(basic_prompt, test_cases, num_tests, select_random=True)
print()
print("Generated Text: {}".format(df['generated_text'][0]))
print()
print("Answer Choices: {}".format(df['answer_choices'][0]))


# write the output to file (comment out the line below to skip)
write_output_to_file(df, basic_output_file)

## Experiment 2: Zero-shot Prompt Engineering

In [None]:
zero_shot_output_file='output_zero_shot.csv'
zero_shot='''Question: Generate a paragraph on The Iguazu Falls with the word revered in it. 
The Answer is: Insert Paragraph here 
Generate a paragraph on {} with the word {} in it.'''
num_tests=100
df = run_experiment(zero_shot, test_cases, num_tests, False)

# write the output to file (comment out the line below to skip)
write_output_to_file(df, zero_shot_output_file)

## Experiment 3: Few-shot Prompt Engineering

I use few-shot prompting to teach the model the output format we would like: we add a specific example of the desired output for the model to follow.

We evaluate the validity of the test questions by checking if the tokenizer has the word and by manually checking that the generated paragraph is meaningful, in other words, that the model is not hallucinating.

In [None]:
few_shot_output_file='output_few_shot_output.csv'
few_shot='''Question: Generate a paragraph on The Iguazu Falls with the word revered in it. 
The Answer is: The Iguazu Falls, which lie on the border between Argentina and Brazil, are a popular and revered tourist destination. The waterfalls have been visited by people from all over the world for over a century, and they are often cited as one of the most impressive natural wonders on Earth. 
Generate a paragraph on {} with the word {} in it.'''
num_tests=100
df = run_experiment(few_shot, test_cases, num_tests, False) 

# write the output to file (comment out the line below to skip)
write_output_to_file(df, few_shot_output_file)
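
The prompt above hard-codes a single worked example. If more examples are needed, the prompt can be assembled from a list, as in the minimal sketch below. The helper name is hypothetical, the example answer is shortened from the one above, and the genre and word passed to `format` are toy values.

```python
def build_few_shot_prompt(examples, instruction):
    """Prepend worked (question, answer) examples to the final instruction."""
    parts = []
    for question, answer in examples:
        parts.append(f"Question: {question}")
        parts.append(f"The Answer is: {answer}")
    parts.append(instruction)
    return "\n".join(parts)


examples = [
    (
        "Generate a paragraph on The Iguazu Falls with the word revered in it.",
        "The Iguazu Falls are a popular and revered tourist destination.",
    ),
]
instruction = "Generate a paragraph on {} with the word {} in it."

# The instruction keeps its two placeholders, filled here with toy values.
prompt = build_few_shot_prompt(examples, instruction).format("marine biology", "candid")
print(prompt)
```

Note that calling `format` on the assembled prompt only works while the example text contains no literal braces.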

## Experiment 4: Role based Prompt Engineering

In [None]:
role_based_output_file='output_role_based.csv'
role_based_shot='''You are a teacher writing a research paper on a certain subject. Your task is to write a 100 words paragraph on the subject, however, you must include a given word in the paragraph.
Generate a paragraph on the subject {} with the word {} in it.'''
num_tests=1000
df = run_experiment(role_based_shot, test_cases, num_tests, False) 

# write the output to file (comment out the line below to skip)
write_output_to_file(df, role_based_output_file)

## Experiment 5: Chain of Thought Prompt Engineering

In [None]:
cot_output_file='output_chain_of_thought.csv'
cot_shot='''You are a teacher writing a research paper on a certain subject. Your task is to write a 100 words paragraph on the subject, however, you must include a given word in the paragraph.
1. know the meaning of the word
2. identify salient details about the subject
3. weave one or two salient details and the word into a 100 words paragraph

Generate a paragraph on the subject {} with the word {} in it.'''
num_tests=100
df = run_experiment(cot_shot, test_cases, num_tests, False) 

# write the output to file (comment out the line below to skip)
write_output_to_file(df, cot_output_file)

## Experiment 6: Retrieval Augmented Generation

In [None]:
rag_shot_output_file='output_rag_shot_output.csv'
rag_based_shot='''Given the meaning of the word {}: {}; Generate a paragraph on {} with the word {} in it. Make sure your paragraph is one single paragraph that is formally worded. If you are done generating the paragraph, stop. Make sure to use the given word {} as is and only use it once.'''

num_tests=100
df = run_experiment(rag_based_shot, test_cases, num_tests, False, add_word_meaning=True) 

# write the output to file (comment out the line below to skip)
write_output_to_file(df, rag_shot_output_file)
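
With `add_word_meaning=True`, `run_experiment` parses the `answer_choices` column with `ast.literal_eval`, looks up the meaning of the target word, and injects it into the prompt. The sketch below isolates that step. The helper name is hypothetical, the template is a shortened form of the prompt above with four placeholders instead of five, and the meanings are made-up toy values.

```python
import ast

RAG_TEMPLATE = (
    "Given the meaning of the word {}: {}; "
    "Generate a paragraph on {} with the word {} in it."
)


def build_definition_prompt(word, genre, answer_choices, template=RAG_TEMPLATE):
    """Fill the template with the word's meaning looked up from answer_choices.

    `answer_choices` is assumed to be a string holding a Python dict literal
    that maps each candidate word to its meaning.
    """
    meanings = ast.literal_eval(answer_choices)
    if word not in meanings:
        raise KeyError(f"No meaning found for {word!r}")
    return template.format(word, meanings[word], genre, word)


row_choices = "{'revered': 'deeply respected', 'candid': 'frank and honest'}"
print(build_definition_prompt("revered", "natural wonders", row_choices))
```

Checking for the word before formatting gives a clearer error than a bare dictionary lookup when a test case lacks a definition for its word.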