## Prompt Engineering for Item Generation

### Workshop Overview

This hands-on workshop covers three prompt engineering techniques (zero-shot,
few-shot, and chain-of-thought prompting) for generating high-quality assessment items with LLMs.

Learning Objectives:
- Master different types of prompting strategies
- Implement structured prompt templates for item generation
- Apply few-shot learning for consistent item quality
- Use chain-of-thought reasoning for complex assessments
- Implement evaluation and refinement workflows

### Basic Concept
Prompt engineering is the process of crafting effective instructions for language models to get the best possible results. It means carefully choosing words, providing context, and structuring requests in ways that help LLMs produce more accurate, helpful, and on-target responses.

In item development, prompt engineering ensures that generated items are educationally sound, technically accurate, appropriately challenging, and consistently aligned with learning objectives.

#### Prompting Strategies
This workshop uses three prompting strategies for item generation: zero-shot, few-shot, and chain-of-thought. Each one trades setup effort and token cost against consistency and reasoning quality in the LLM output.

##### Zero-shot Prompting
This is the simplest prompting strategy where we give the AI instructions
without any examples. It's like asking someone to do a task by just
explaining what you want, without showing them how others have done it.

**When to use Zero-Shot:**
1. Quick prototyping and testing
2. Simple, straightforward tasks
3. When you don't have good examples ready
4. For creative tasks where you want variety

**Limitations:**
1. Less consistent formatting
2. May misunderstand complex requirements
3. Output quality can vary
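
As a minimal sketch of the idea, a zero-shot prompt can be built from nothing but an instruction string. The helper name `build_zero_shot_prompt` and its `num_options` parameter are illustrative assumptions, not part of any library; the wording of the prompt follows the photosynthesis prompt used later in this workshop.

```python
def build_zero_shot_prompt(topic: str, num_options: int = 4) -> str:
    """Return an instruction-only prompt; no worked examples are included."""
    if not 2 <= num_options <= 26:
        raise ValueError("num_options must be between 2 and 26")
    # Build the option labels, e.g. "A, B, C, D" for four options
    letters = ", ".join(chr(ord("A") + i) for i in range(num_options))
    return (
        f"Create a multiple-choice question about {topic}.\n"
        f"Include {num_options} options ({letters}) and indicate the correct answer.\n"
    )


print(build_zero_shot_prompt("photosynthesis"))
```

Because the prompt carries no examples, the output format is left largely to the model, which is where the formatting inconsistency listed above tends to come from.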

##### Few-shot Prompting
Few-shot prompting is like teaching by showing examples. Instead of just
telling the AI what to do, we show it a few examples of inputs and their
desired outputs, and it infers the pattern from them without any retraining.

**When to use Few-Shot:**
1. When you need consistent formatting
2. When you have good examples available
3. For complex tasks with specific patterns
4. When zero-shot isn't giving consistent results

**Benefits:**
1. More consistent output format
2. Better understanding of requirements
3. Can encode domain expertise through examples
4. Reduces ambiguity

**Limitations:**
1. Requires good quality examples
2. Uses more tokens (costs more)
3. Can overfit to example patterns
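
To make the structure concrete, here is a rough sketch of how a few-shot prompt is assembled using plain string formatting. The function `build_few_shot_prompt` and the sample data are hypothetical and only illustrate the layout: a prefix, a few worked examples, and a final instruction that is left open for the model to complete.

```python
def build_few_shot_prompt(prefix: str, examples: list[dict], new_instruction: str) -> str:
    """Concatenate a prefix, worked examples, and the new instruction."""
    blocks = [prefix]
    for ex in examples:
        # Each example pairs an instruction with its desired output
        blocks.append(f"Instruction: {ex['instruction']}\n{ex['output']}")
    # The last block stops after the instruction so the model supplies the output
    blocks.append(f"Instruction: {new_instruction}\n")
    return "\n\n".join(blocks)


# Illustrative examples (shortened); real examples should be reviewed items
shots = [
    {
        "instruction": "Topic: Solving y - 3 = 12",
        "output": "Question: Solve for y: y - 3 = 12\nA) y = 9\nB) y = 15\nCorrect Answer: B",
    },
    {
        "instruction": "Topic: Solving m + 8 = 20",
        "output": "Question: Solve for m: m + 8 = 20\nA) m = 28\nB) m = 12\nCorrect Answer: B",
    },
]

prompt = build_few_shot_prompt(
    "You are an expert at creating math assessment items.",
    shots,
    "Topic: Solving n - 5 = 18",
)
print(prompt)
```

Every example is resent with each request, so the prompt length (and token cost) grows with the number of examples.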

##### Chain-of-Thought Prompting
Chain-of-thought prompting guides the AI through explicit step-by-step reasoning processes, making the model's thinking visible and systematic. This approach is particularly valuable for complex assessment item development where multiple considerations must be balanced.

**When to use Chain-of-Thought:**
1. For complex assessment scenarios requiring multi-step analysis
2. When generating items that test higher-order thinking skills
3. For quality assurance and validation of item construction
4. When transparency in the generation process is important
5. For developing items with sophisticated distractors

**Benefits:**
1. Improved reasoning quality: Forces systematic consideration of all aspects
2. Transparency: Makes the generation process auditable and reviewable
3. Better distractor development: Explicit focus on misconception-based options
4. Quality control: Each step can be evaluated independently

**Limitations:**
1. Longer prompts and responses use more tokens (costs more)
2. May generate unnecessarily complex items
3. More processing time required
4. Requires careful prompt structure to maintain consistency
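
One way to act on the "each step can be evaluated independently" benefit is to split a chain-of-thought response into its labeled steps before review. The sketch below assumes the model labels steps as `Step N - Title:` (optionally wrapped in Markdown bold); the function name `split_steps` and the sample response are illustrative, and real output may need a more forgiving pattern.

```python
import re

# Matches lines such as "Step 1 - Key concept analysis:" or "**Step 1 - ...:**"
STEP_HEADER = re.compile(r"^\**Step (\d+)\s*-\s*([^:\n]+):\**[ \t]*$", re.MULTILINE)


def split_steps(response: str) -> dict[int, dict[str, str]]:
    """Map each step number to its title and body text."""
    matches = list(STEP_HEADER.finditer(response))
    steps = {}
    for i, match in enumerate(matches):
        # A step body runs until the next header or the end of the response
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        steps[int(match.group(1))] = {
            "title": match.group(2).strip(),
            "body": response[match.end():end].strip(),
        }
    return steps


sample = (
    "**Step 1 - Key concept analysis:**\n"
    "Test solving equations with variables on both sides.\n\n"
    "**Step 2 - Grade level appropriateness:**\n"
    "Matches Grade 8 expectations.\n"
)

for number, step in split_steps(sample).items():
    print(number, step["title"], "->", step["body"])
```

A reviewer (or a follow-up prompt) can then check a single step, for example the distractor rationale, without rereading the whole response.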











### Hands-on Setup
We'll begin by loading the required libraries and configuring our environment.

In [7]:
%pip install langchain langchain-google-genai python-dotenv



In [2]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

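# Fail early with a clear message if the Google API key is missing
# (assigning None to os.environ would raise a TypeError)
if not os.getenv("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your environment or .env file")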

# Initialize the language model
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

##### Zero-shot Prompting

In [3]:
def basic_zero_shot_example():
    """
    The simplest form of zero-shot prompting: an instruction-only prompt sent directly to the model
    """

    print("BASIC ZERO-SHOT EXAMPLE")
    print("=" * 40)

    # Create a simple prompt
    prompt_text = (
        "Create a multiple-choice question about photosynthesis.\n"
        "Include 4 options (A, B, C, D) and indicate the correct answer.\n"
    )

    print("Generating question about 'photosynthesis'...\n")

    try:
        result = llm.invoke(prompt_text)
        print("Generated Question:")
        print(result.content)
        print("\n" + "=" * 40)
        return result
    except Exception as e:
        print(f"Error occurred: {e}")
        return None

# Call the function
basic_zero_shot_example()

BASIC ZERO-SHOT EXAMPLE
Generating question about 'photosynthesis'...

Generated Question:
Here is a multiple-choice question about photosynthesis:

**Question:** Which of the following correctly identifies the primary products of photosynthesis?

A) Carbon dioxide and water
B) Glucose (sugar) and oxygen
C) Light energy and chlorophyll
D) Nitrogen and water vapor

**Correct Answer:** B




##### Few-shot Prompting

In [4]:
# Mathematics Examples
MATH_EXAMPLES = [
    {
        "instruction": "Create a linear equation problem",
        "grade": "8th Grade",
        "topic": "Solving y - 3 = 12",
        "output": """Question: Solve for y: y - 3 = 12
A) y = 9
B) y = 15
C) y = 4
D) y = 36
Correct Answer: B
Explanation: To solve y - 3 = 12, add 3 to both sides: y = 12 + 3 = 15"""
    },
    {
        "instruction": "Create a linear equation problem",
        "grade": "8th Grade",
        "topic": "Solving m + 8 = 20",
        "output": """Question: Solve for m: m + 8 = 20
A) m = 28
B) m = 12
C) m = 8
D) m = 160
Correct Answer: B
Explanation: To solve m + 8 = 20, subtract 8 from both sides: m = 20 - 8 = 12"""
    },
    {
        "instruction": "Create a linear equation problem",
        "grade": "8th Grade",
        "topic": "Solving x + 7 = 15",
        "output": """Question: Solve for x: x + 7 = 15
A) x = 8
B) x = 22
C) x = 7
D) x = 15
Correct Answer: A
Explanation: To solve x + 7 = 15, subtract 7 from both sides: x = 15 - 7 = 8"""
    }
]

# Medical Examples
MEDICAL_EXAMPLES = [
    {
        "specialty": "Physician Assistant",
        "topic": "Acute Coronary Syndrome",
        "presentation": "58-year-old male with chest pain",
        "output": """Question: A 58-year-old male presents with crushing chest pain radiating to his left arm for 2 hours. ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis?
A) Anterior STEMI
B) Inferior STEMI
C) Non-ST elevation MI
D) Unstable angina
Correct Answer: B
Explanation: ST elevation in leads II, III, and aVF indicates an inferior wall STEMI."""
    },
    {
        "specialty": "Physician Assistant",
        "topic": "Acute Coronary Syndrome",
        "presentation": "62-year-old with chest pain and ECG changes",
        "output": """Question: A 62-year-old man has chest pain with ECG showing ST depression in leads V4-V6. Initial troponin is normal. What classification best describes this presentation?
A) STEMI
B) NSTEMI
C) Unstable angina
D) Stable angina
Correct Answer: C
Explanation: ST depression with normal troponin suggests unstable angina. NSTEMI would have elevated troponin."""
    }
]


##### Few-shot with Mathematics Example

In [5]:
def few_shot_math_example():
    """
    Few-shot for mathematics using examples
    """

    print(f"FEW-SHOT: MATHEMATICS LINEAR EQUATIONS")

    # Format examples for few-shot
    examples = []
    for ex in MATH_EXAMPLES[:2]:  # Use first 2 as examples
        examples.append({
            "instruction": f"Grade: {ex['grade']}, Topic: {ex['topic']}",
            "output": ex['output']
        })

    # Show the examples we're using
    print(f"Using these examples to teach the AI:")
    for i, ex in enumerate(examples, 1):
        print(f"\nExample {i}:")
        print(f"Input: {ex['instruction']}")
        print(f"Output: {ex['output'][:100]}...")

    # Create the prompts
    example_prompt = PromptTemplate(
        input_variables=["instruction", "output"],
        template="Instruction: {instruction}\n{output}"
    )

    few_shot_prompt = FewShotPromptTemplate(
        examples=examples,
        example_prompt=example_prompt,
        prefix="You are an expert at creating math assessment items. Follow these examples exactly:",
        suffix="Instruction: {instruction}\n",
        input_variables=["instruction"]
    )

    # Generate new question
    chain = few_shot_prompt | llm

    print("\nNow generating a new question...\n")

    # Invoke the chain, passing the new instruction as the template variable
    result = chain.invoke({
        "instruction": "Grade: 8th Grade, Topic: Solving n - 5 = 18"
    })

    print("Generated Question:")
    # Chat models return an AIMessage; fall back to the raw value otherwise
    print(result.content if hasattr(result, 'content') else result)

    return result

# Call the function
few_shot_math_example()

FEW-SHOT: MATHEMATICS LINEAR EQUATIONS
Using these examples to teach the AI:

Example 1:
Input: Grade: 8th Grade, Topic: Solving y - 3 = 12
Output: Question: Solve for y: y - 3 = 12
A) y = 9
B) y = 15
C) y = 4
D) y = 36
Correct Answer: B
Explanatio...

Example 2:
Input: Grade: 8th Grade, Topic: Solving m + 8 = 20
Output: Question: Solve for m: m + 8 = 20
A) m = 28
B) m = 12
C) m = 8
D) m = 160
Correct Answer: B
Explanat...

Now generating a new question...

Generated Question:
Question: Solve for n: n - 5 = 18
A) n = 13
B) n = 23
C) n = 5
D) n = 90
Correct Answer: B
Explanation: To solve n - 5 = 18, add 5 to both sides: n = 18 + 5 = 23
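
Because the few-shot output follows a fixed layout, it can be checked automatically before a human reviews it. The following is a minimal sketch of such a check, assuming the `Question / A) to D) / Correct Answer / Explanation` layout used by the examples in this workshop; `parse_item` and `validate_item` are hypothetical helpers, and they test format only, not mathematical correctness.

```python
import re


def parse_item(text: str) -> dict:
    """Parse an item in the 'Question / A)-D) / Correct Answer / Explanation' layout."""
    question = re.search(r"^Question:[ \t]*(.+)$", text, re.MULTILINE)
    options = dict(re.findall(r"^([A-D])\)[ \t]*(.+)$", text, re.MULTILINE))
    answer = re.search(r"^Correct Answer:[ \t]*([A-D])[ \t]*$", text, re.MULTILINE)
    explanation = re.search(r"^Explanation:[ \t]*(.+)$", text, re.MULTILINE)
    return {
        "question": question.group(1).strip() if question else None,
        "options": options,
        "answer": answer.group(1) if answer else None,
        "explanation": explanation.group(1).strip() if explanation else None,
    }


def validate_item(item: dict) -> list[str]:
    """Return a list of format problems; an empty list means the item passed."""
    problems = []
    if not item["question"]:
        problems.append("missing question stem")
    if sorted(item["options"]) != ["A", "B", "C", "D"]:
        problems.append("expected exactly options A-D")
    if item["answer"] not in item["options"]:
        problems.append("correct answer does not match an option")
    if len(set(item["options"].values())) != len(item["options"]):
        problems.append("duplicate option text")
    if not item["explanation"]:
        problems.append("missing explanation")
    return problems


generated = """Question: Solve for n: n - 5 = 18
A) n = 13
B) n = 23
C) n = 5
D) n = 90
Correct Answer: B
Explanation: To solve n - 5 = 18, add 5 to both sides: n = 18 + 5 = 23"""

item = parse_item(generated)
print(item["answer"], validate_item(item))
```

Items that fail such a check could be regenerated or routed to manual review, which is one simple form of the evaluation and refinement workflow named in the learning objectives.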



##### Chain-of-Thought Prompting

In [6]:
def chain_of_thought_math_example():
    """
    Chain-of-thought prompting for Grade 8 mathematics
    This shows the AI's step-by-step reasoning process
    """

    print("CHAIN-OF-THOUGHT: GRADE 8 MATHEMATICS")
    print("=" * 50)

    # Create a chain-of-thought prompt that guides step-by-step reasoning
    cot_prompt = PromptTemplate(
        input_variables=["grade", "topic", "concept"],
        template="""
You are creating a Grade {grade} mathematics assessment item about {topic}.

Think through this step-by-step:

Step 1: Identify the key mathematical concept
What specific aspect of {concept} should this question test?

Step 2: Determine appropriate difficulty level
What makes this appropriate for Grade {grade} students?

Step 3: Choose a real-world context
What everyday situation would make this concept meaningful?

Step 4: Design the problem scenario
Create a clear, engaging problem setup.

Step 5: Write the question stem
Make it specific and unambiguous.

Step 6: Develop answer choices
- Create one correct answer using proper mathematical reasoning
- Design three distractors based on common student errors

Step 7: Provide complete solution
Show the mathematical steps and reasoning.

Now work through each step:

Step 1 - Key concept analysis:
[Analyze what to test]

Step 2 - Grade level appropriateness:
[Justify difficulty level]

Step 3 - Real-world context:
[Choose meaningful scenario]

Step 4 - Problem scenario:
[Create the setup]

Step 5 - Question stem:
[Write the question]

Step 6 - Answer choices:
A) [Correct answer with reasoning]
B) [Common error: show what mistake leads here]
C) [Common error: show what mistake leads here]
D) [Common error: show what mistake leads here]

Step 7 - Complete solution:
[Show the mathematical steps and reasoning]
"""
    )

    # Create the chain
    chain = cot_prompt | llm

    print("Generating a chain-of-thought mathematics problem...\n")

    try:
        result = chain.invoke({
            "grade": "8",
            "topic": "Linear Equations",
            "concept": "solving multi-step equations with variables on both sides"
        })

        print("CHAIN-OF-THOUGHT REASONING AND RESULT:")
        print("=" * 50)
        print(result.content if hasattr(result, 'content') else result)

        return result

    except Exception as e:
        print(f"Error occurred: {e}")
        return None

# Call the function
chain_of_thought_math_example()

CHAIN-OF-THOUGHT: GRADE 8 MATHEMATICS
Generating a chain-of-thought mathematics problem...

CHAIN-OF-THOUGHT REASONING AND RESULT:
Here's the step-by-step creation of the Grade 8 mathematics assessment item:

---

**Step 1 - Key concept analysis:**
The question should test a student's ability to:
1.  Formulate a linear equation from a real-world scenario.
2.  Solve a multi-step linear equation with variables on both sides.
3.  Use inverse operations to isolate the variable.
4.  Correctly manage positive and negative integers during algebraic manipulation.
5.  Interpret the solution in the context of the problem.

**Step 2 - Grade level appropriateness:**
This is appropriate for Grade 8 because common core standards (e.g., CCSS.MATH.CONTENT.8.EE.C.7.B) expect students to "Solve linear equations with rational number coefficients, including equations whose solutions require expanding expressions using the distributive property and collecting like terms." This problem focuses on collecting like terms and variables on both sides, which is a core skill for this grade level. The numbers [...]