### Prompt Engineering Experiment

The problems are taken directly from the CS 372 Practice Midterms; the content of each problem is listed below the table. For each problem, the prompt passed to the LLM has the following structure:

"""

You are a helpful assistant. The user is working on the problem below.

Problem:
{problem}

Answer the user's question about this problem based on the provided context.
Keep your answer brief and to the point.

Context (retrieved chunks related to the problem):
{context}

Question about the problem: {question}
Answer:

"""

**Style Instructions:**

*Minimal*: You are a helpful assistant. Answer the user's question based ONLY on the provided context. Keep your answer brief and to the point.

*Explanatory*: You are an expert tutor. Answer the user's question using the provided context. You must cite the specific chunk index (e.g., [Chunk 1]) that supports each part

*Tutoring*: You are a Socratic tutor. Do not give the answer directly. Instead, use the context to guide the user toward the answer with a hint or a leading question.

*Similarity*: Analyze why the following chunks were retrieved for the user's question. Explain the relevance of each chunk to the query.
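
The prompt template shown above could be filled in along the following lines. This is only a minimal sketch, assuming plain `str.format` substitution: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names rather than the project's actual code, the `[Chunk i]` labeling is an assumed convention, and the problem and chunk texts are placeholders.

```python
# Hypothetical sketch: these names are illustrative, not taken from the project.
PROMPT_TEMPLATE = """You are a helpful assistant. The user is working on the problem below.

Problem:
{problem}

Answer the user's question about this problem based on the provided context.
Keep your answer brief and to the point.

Context (retrieved chunks related to the problem):
{context}

Question about the problem: {question}
Answer:"""


def build_prompt(problem: str, chunks: list[str], question: str) -> str:
    """Fill the template; chunks are labeled [Chunk i] (an assumed convention)."""
    context = "\n\n".join(
        f"[Chunk {i}] {text}" for i, text in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(problem=problem, context=context, question=question)


# Placeholder inputs, for illustration only.
prompt = build_prompt(
    problem="Placeholder problem text.",
    chunks=["First retrieved chunk.", "Second retrieved chunk."],
    question="Please explain how I should approach this problem",
)
print(prompt)
```

Numbering the chunks in the context gives a style such as Explanatory something concrete to cite.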


For this experiment, the "question" part is kept the same ("Please explain how I should approach this problem"); only the problem and the style were changed, with each style applied to each problem. The quantitative outputs are recorded below.


| Problem | Prompt Style | Answer latency (s) | Answer length (words) |
| --- | --- | --- | --- |
| Practice Midterm 1 #8 | Minimal | 5.8 | 193 |
| Practice Midterm 1 #8 | Explanatory | 11.4 | 435 |
| Practice Midterm 1 #8 | Tutoring | 4.6 | 174 |
| Practice Midterm 1 #8 | Similarity | 10.6 | 475 |
| Practice Midterm 1 #12 | Minimal | 3.9 | 105 |
| Practice Midterm 1 #12 | Explanatory | 8.5 | 333 |
| Practice Midterm 1 #12 | Tutoring | 5.0 | 180 |
| Practice Midterm 1 #12 | Similarity | 10.4 | 490 |
| Practice Midterm 1 #14 | Minimal | 2.8 | 73 |
| Practice Midterm 1 #14 | Explanatory | 9.4 | 339 |
| Practice Midterm 1 #14 | Tutoring | 3.2 | 120 |
| Practice Midterm 1 #14 | Similarity | 15.7 | 504 |
| Practice Midterm 1 #15 | Minimal | 3.9 | 99 |
| Practice Midterm 1 #15 | Explanatory | 9.7 | 302 |
| Practice Midterm 1 #15 | Tutoring | 2.4 | 79 |
| Practice Midterm 1 #15 | Similarity | 12.4 | 481 |
| Practice Midterm 2 #8 | Minimal | 2.8 | 67 |
| Practice Midterm 2 #8 | Explanatory | 11.9 | 307 |
| Practice Midterm 2 #8 | Tutoring | 1.0 | 49 |
| Practice Midterm 2 #8 | Similarity | 12.1 | 536 |
| Practice Midterm 2 #9 | Minimal | 2.7 | 57 |
| Practice Midterm 2 #9 | Explanatory | 9.1 | 318 |
| Practice Midterm 2 #9 | Tutoring | 3.1 | 111 |
| Practice Midterm 2 #9 | Similarity | 16.4 | 466 |
| Practice Midterm 2 #10 | Minimal | 2.1 | 82 |
| Practice Midterm 2 #10 | Explanatory | 7.3 | 273 |
| Practice Midterm 2 #10 | Tutoring | 1.3 | 51 |
| Practice Midterm 2 #10 | Similarity | 11.8 | 523 |
| Practice Midterm 2 #18 | Minimal | 2.6 | 80 |
| Practice Midterm 2 #18 | Explanatory | 11.7 | 46 |
| Practice Midterm 2 #18 | Tutoring | 1.7 | 46 |
| Practice Midterm 2 #18 | Similarity | 12.9 | 499 |
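
The per-style averages implied by the table can be recomputed with a short script. This is a sketch rather than part of the original notebook: the rows are copied by hand from the table, the lowercase style keys and the `summarize` helper are my own naming, and "words per second" is an extra derived figure that the experiment itself did not record.

```python
from collections import defaultdict
from statistics import mean

# (style, latency in seconds, answer length in words), copied from the table,
# four rows per problem in table order.
ROWS = [
    ("minimal", 5.8, 193), ("explanatory", 11.4, 435), ("tutoring", 4.6, 174), ("similarity", 10.6, 475),
    ("minimal", 3.9, 105), ("explanatory", 8.5, 333), ("tutoring", 5.0, 180), ("similarity", 10.4, 490),
    ("minimal", 2.8, 73), ("explanatory", 9.4, 339), ("tutoring", 3.2, 120), ("similarity", 15.7, 504),
    ("minimal", 3.9, 99), ("explanatory", 9.7, 302), ("tutoring", 2.4, 79), ("similarity", 12.4, 481),
    ("minimal", 2.8, 67), ("explanatory", 11.9, 307), ("tutoring", 1.0, 49), ("similarity", 12.1, 536),
    ("minimal", 2.7, 57), ("explanatory", 9.1, 318), ("tutoring", 3.1, 111), ("similarity", 16.4, 466),
    ("minimal", 2.1, 82), ("explanatory", 7.3, 273), ("tutoring", 1.3, 51), ("similarity", 11.8, 523),
    ("minimal", 2.6, 80), ("explanatory", 11.7, 46), ("tutoring", 1.7, 46), ("similarity", 12.9, 499),
]


def summarize(rows):
    """Group rows by style and compute mean latency, mean length, and throughput."""
    by_style = defaultdict(list)
    for style, latency_s, words in rows:
        by_style[style].append((latency_s, words))
    return {
        style: {
            "mean_latency_s": mean(lat for lat, _ in values),
            "mean_words": mean(words for _, words in values),
            "words_per_second": sum(w for _, w in values) / sum(lat for lat, _ in values),
        }
        for style, values in by_style.items()
    }


summary = summarize(ROWS)
for style, stats in summary.items():
    print(
        f"{style:<12} {stats['mean_latency_s']:.2f} s  "
        f"{stats['mean_words']:.1f} words  {stats['words_per_second']:.1f} words/s"
    )
```

With only eight problems per style, these averages are best read as a rough comparison between styles rather than as precise measurements.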


### Problem Content

**Practice Midterm 1 #8**: "Logistic regression models are typically trained to minimize the Cross-Entropy Loss (CE). Does a model with strictly positive CE on a given training dataset necessarily have less than 100% classification accuracy on that same dataset? Briefly explain."

**Practice Midterm 1 #12**: "Name and define (mathematically) the function most commonly used for normalizing the outputs of a neural network for a classification task into k>2 categories."

**Practice Midterm 1 #14**: "Both back propagation and the stochastic gradient descent algorithm are commonly used in training artificial neural networks. Briefly describe the difference between these two algorithms."

**Practice Midterm 1 #15**: "You are training a neural network with minibatch stochastic gradient descent. Briefly describe what the learning rate hyperparameter is. Why might you increase the learning rate? Why might you decrease the learning rate?"

**Practice Midterm 2 #8**: "Describe the Reinforcement Learning from Human Feedback (RLHF) paradigm for language model alignment. How does it differ from supervised fine-tuning (SFT) or instruction-tuning?"

**Practice Midterm 2 #9**: "Describe the technique of gradient accumulation during model training or fine-tuning. Explain how gradient accumulation allows you to maintain a greater effective batch size during training or fine-tuning while reducing memory requirements."

**Practice Midterm 2 #10**: "For a causal decoder language model, where should the system prompt be inserted alongside the user prompt in the input to the model? Briefly justify your answer with reference to the attention mechanism."

**Practice Midterm 2 #18**: "In Reinforcement Learning, we want to learn a policy. A policy is a function from what domain to what range? (In other words, what is the set of possible inputs to the policy, and what is the set of possible outputs?)"


### Qualitative Analysis
The style instructions at the top of this cell already describe the outputs precisely, so I will provide a more pragmatic analysis: how each prompt style can be used most effectively.
- Minimal: Best for checking understanding quickly and getting near-instant feedback when the student already has an answer and is confident in their understanding.
- Explanatory: If a student is completely lost on a subject, or this is one of the first problems of its type they have seen, the explanatory response gives a good template for solving this kind of problem.
- Tutoring: Once a student has seen a few problems of the same type (covering the same content or using the same techniques), they can use the tutoring prompt when they are close to the answer but are missing a small piece of vital information. The tutoring prompt leads them back through the process of discovering the answer by pointing to where in the notes the relevant content is, presumably letting the student find the missing piece on their own.
- Similarity: This proved to be the least helpful style. I think it would be much more useful to the student if I used semantic chunking rather than a fixed character limit, and threshold-based matching rather than top-k retrieval, since this combination would ensure that only information highly relevant to the problem is retrieved.
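
To make the top-k versus threshold distinction from the Similarity bullet concrete, here is a minimal sketch using cosine similarity over toy two-dimensional vectors. Everything in it is hypothetical: the chunk names, the vectors, the 0.9 cutoff, and the helper functions are made up for illustration and do not reflect the project's actual retrieval code or embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, chunks):
    """Return (score, name) pairs sorted from most to least similar."""
    return sorted(((cosine(query, vec), name) for name, vec in chunks.items()), reverse=True)


def top_k(query, chunks, k):
    """Always return k chunks, however weakly the last ones match."""
    return [name for _, name in rank(query, chunks)[:k]]


def above_threshold(query, chunks, threshold):
    """Return only chunks whose similarity reaches the threshold (possibly none)."""
    return [name for score, name in rank(query, chunks) if score >= threshold]


# Toy embeddings: names and vectors are invented for this example.
chunks = {
    "policy_definition": [1.0, 0.1],
    "bellman_equation": [0.8, 0.6],
    "course_logistics": [0.0, 1.0],
}
query = [1.0, 0.0]

print(top_k(query, chunks, k=3))
print(above_threshold(query, chunks, threshold=0.9))
```

In this toy setup, top-k with `k=3` returns all three chunks, including the unrelated one, while the threshold variant keeps only the closest match. The trade-off is that a threshold can also return nothing, so the prompt would need to handle an empty context.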

In [None]:
# code used to gather data

import time
from src.core.database import DatabaseManager
from src.core.rag import answer_question
from src.core.types import PromptStyle

db = DatabaseManager()

course = db.get_course_by_name("CS 372")
if not course:
    raise ValueError("Course 'CS 372' not found.")

exam = db.get_exam_by_name(course.course_id, "Final")
if not exam:
    raise ValueError("Exam 'Final' not found for course 'CS 372'.")

problems = db.list_problems_for_exam(exam.exam_id)
if not problems:
    raise ValueError("No problems found for the Final exam in CS 372.")

target_problem = problems[0]
question_text = "Please explain how I should approach this problem"
prompt_styles = [
    PromptStyle.MINIMAL,
    PromptStyle.EXPLANATORY,
    PromptStyle.TUTORING,
    PromptStyle.SIMILARITY,
]

rows = []
for style in prompt_styles:
    start = time.perf_counter()
    result = answer_question(
        question_text=question_text,
        problem_id=target_problem.problem_id,
        prompt_style=style,
        db_manager=db,
    )
    elapsed = time.perf_counter() - start
    rows.append(
        {
            "prompt_style": style.value,
            "latency_ms": round(elapsed * 1000, 2),
            "word_count": len(result.answer.split()),
            "answer": result.answer,
        }
    )

print(f"Problem: {target_problem.problem_text[:120]}...")
print(f"Question: {question_text}")
for row in rows:
    print("\n---")
    print(f"Prompt style: {row['prompt_style']}")
    print(f"Latency (ms): {row['latency_ms']}")
    print(f"Answer length (words): {row['word_count']}")
    print(f"Answer: {row['answer']}")


Problem: In Reinforcement Learning, we want to learn a policy. A policy is a function from what domain to what
range? (In other ...
Question: Please explain how I should approach this problem

---
Prompt style: minimal
Latency (ms): 2591.23
Answer length (words): 80
Answer: To approach this problem, you should focus on learning the optimal policy (𝜋∗) and the optimal value function (𝑣∗) by interacting with the environment. Use the Bellman Optimality Equation to iteratively compute the optimal value function, which will help you infer the optimal policy. Consider the dynamics of the environment and the rewards associated with actions in different states. If the environment is deterministic, you can simplify your calculations. Aim to maximize the expected return (𝔼𝐺𝑡) through your chosen actions.

---
Prompt style: explanatory
Latency (ms): 9145.06
Answer length (words): 319
Answer: To approach the problem effectively, you should follow these steps:

1. **Understand th